Re: Broken SQL Visualization?

2018-01-15 Thread Ted Yu
Did you include any picture ? Looks like the picture didn't go thru. Please use third party site.  Thanks Original message From: Tomasz Gawęda Date: 1/15/18 2:07 PM (GMT-08:00) To: dev@spark.apache.org, u...@spark.apache.org Subject: Broken SQL Visualization? Hi, today I hav

Re: [VOTE] Spark 2.3.0 (RC1)

2018-01-16 Thread Ted Yu
Is there going to be another RC ? With KafkaContinuousSourceSuite hanging, it is hard to get the rest of the tests going. Cheers On Sat, Jan 13, 2018 at 7:29 AM, Sean Owen wrote: > The signatures and licenses look OK. Except for the missing k8s package, > the contents look OK. Tests look prett

Re: DataSourceV2 write input requirements

2018-03-26 Thread Ted Yu
Interesting. Should requiredClustering return a Set of Expression's ? This way, we can determine the order of Expression's by looking at what requiredOrdering() returns. On Mon, Mar 26, 2018 at 5:45 PM, Ryan Blue wrote: > Hi Pat, > > Thanks for starting the discussion on this, we’re really inte

Re: DataSourceV2 write input requirements

2018-03-26 Thread Ted Yu
wrote: > Actually clustering is already supported, please take a look at > SupportsReportPartitioning > > Ordering is not proposed yet, might be similar to what Ryan proposed. > > On Mon, Mar 26, 2018 at 6:11 PM, Ted Yu wrote: > >> Interesting. >> >> Sh

Re: DataSourceV2 write input requirements

2018-03-28 Thread Ted Yu
>>> provide >>>>>>>>>> Spark a hash function for the other side of a join. It seems >>>>>>>>>> unlikely to me >>>>>>>>>> that many data sources would have partitioning that happens

Re: DataSourceV2 write input requirements

2018-03-30 Thread Ted Yu
+1 Original message From: Ryan Blue Date: 3/30/18 2:28 PM (GMT-08:00) To: Patrick Woody Cc: Russell Spitzer , Wenchen Fan , Ted Yu , Spark Dev List Subject: Re: DataSourceV2 write input requirements You're right. A global sort would change the clustering if it had

Re: Reply: Welcome Zhenhua Wang as a Spark committer

2018-04-02 Thread Ted Yu
Congratulations, Zhenhua  Original message From: 雨中漫步 <601450...@qq.com> Date: 4/1/18 11:30 PM (GMT-08:00) To: Yuanjian Li , Wenchen Fan Cc: dev Subject: Reply: Welcome Zhenhua Wang as a Spark committer Congratulations Zhenhua Wang -- Original message -- From

Re: Spark Kafka adapter questions

2018-08-17 Thread Ted Yu
If you have picked up all the changes for SPARK-18057, the Kafka “broker” supporting v1.0+ should be compatible with Spark's Kafka adapter. Can you post more details about the “failed to send SSL close message” errors ? (The default Kafka version is 2.0.0 in Spark Kafka adapter after SPARK-18057

Re: Spark Kafka adapter questions

2018-08-20 Thread Ted Yu
spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:203) > > 18/08/20 22:29:33 INFO AbstractCoordinator: Marking the coordinator > :9093 (id: 2147483647 rack: null) dead for group > spark-kafka-source-1aa50598-99d1-4c53-a73c-fa6637a219b2--1338794993-dri

Re: SPIP: Executor Plugin (SPARK-24918)

2018-08-31 Thread Ted Yu
+1 Original message From: Reynold Xin Date: 8/30/18 11:11 PM (GMT-08:00) To: Felix Cheung Cc: dev Subject: Re: SPIP: Executor Plugin (SPARK-24918) I actually had a similar use case a while ago, but not entirely the same. In my use case, Spark is already up, but I want to m

Re: Upgrade SBT to the latest

2018-08-31 Thread Ted Yu
+1 Original message From: Sean Owen Date: 8/31/18 6:40 AM (GMT-08:00) To: Darcy Shen Cc: dev@spark.apache.org Subject: Re: Upgrade SBT to the latest Certainly worthwhile. I think this should target Spark 3, which should come after 2.4, which is itself already just about rea

Re: from_csv

2018-09-19 Thread Ted Yu
+1 Original message From: Dongjin Lee Date: 9/19/18 7:20 AM (GMT-08:00) To: dev Subject: Re: from_csv Another +1. I already experienced this case several times. On Mon, Sep 17, 2018 at 11:03 AM Hyukjin Kwon wrote: +1 for this idea since text parsing in CSV/JSON is quite co

Re: [VOTE] SPARK 2.4.0 (RC2)

2018-10-01 Thread Ted Yu
+1 Original message From: Denny Lee Date: 9/30/18 10:30 PM (GMT-08:00) To: Stavros Kontopoulos Cc: Sean Owen , Wenchen Fan , dev Subject: Re: [VOTE] SPARK 2.4.0 (RC2) +1 (non-binding) On Sat, Sep 29, 2018 at 10:24 AM Stavros Kontopoulos wrote: +1 Stavros On Sat, Sep 2

Re: welcome a new batch of committers

2018-10-03 Thread Ted Yu
Congratulations to all ! Original message From: Jungtaek Lim Date: 10/3/18 2:41 AM (GMT-08:00) To: Marco Gaido Cc: dev Subject: Re: welcome a new batch of committers Congrats all! You all deserved it. On Wed, 3 Oct 2018 at 6:35 PM Marco Gaido wrote: Congrats you all! Il g

Re: [VOTE] Release Apache Spark 2.0.0 (RC2)

2016-07-06 Thread Ted Yu
Running the following command: build/mvn clean -Phive -Phive-thriftserver -Pyarn -Phadoop-2.6 -Psparkr -Dhadoop.version=2.7.0 package The build stopped with this test failure: - SPARK-9757 Persist Parquet relation with decimal column *** FAILED *** On Wed, Jul 6, 2016 at 6:25 AM, Sea

Re: Spark 2.0.0 performance; potential large Spark core regression

2016-07-08 Thread Ted Yu
bq. we turned it off when fixing a bug Adam: Can you refer to the bug JIRA ? Thanks On Fri, Jul 8, 2016 at 9:22 AM, Adam Roberts wrote: > Thanks Michael, we can give your options a try and aim for a 2.0.0 tuned > vs 2.0.0 default vs 1.6.2 default comparison, for future reference the > defaults

Re: Spark performance regression test suite

2016-07-08 Thread Ted Yu
Found a few issues: [SPARK-6810] Performance benchmarks for SparkR [SPARK-2833] performance tests for linear regression [SPARK-15447] Performance test for ALS in Spark 2.0 Haven't found one for Spark core. On Fri, Jul 8, 2016 at 8:58 AM, Michael Allman wrote: > Hello, > > I've seen a few messa

Re: Build speed

2016-07-22 Thread Ted Yu
I assume you have enabled Zinc. Cheers On Fri, Jul 22, 2016 at 7:54 AM, Mikael Ståldal wrote: > Is there any way to speed up an incremental build of Spark? > > For me it takes 8 minutes to build the project with just a few code > changes. > > -- > [image: MagineTV] > > *Mikael Ståldal* > Senior

Re: SQL Based Authorization for SparkSQL

2016-08-02 Thread Ted Yu
There was SPARK-12008 which was closed. Not sure if there is active JIRA in this regard. On Tue, Aug 2, 2016 at 6:40 PM, 马晓宇 wrote: > Hi guys, > > I wonder if anyone working on SQL based authorization already or not. > > This is something we needed badly right now and we tried to embedded a > H

Re: Welcoming Felix Cheung as a committer

2016-08-08 Thread Ted Yu
Congratulations, Felix. On Mon, Aug 8, 2016 at 11:15 AM, Matei Zaharia wrote: > Hi all, > > The PMC recently voted to add Felix Cheung as a committer. Felix has been > a major contributor to SparkR and we're excited to have him join > officially. Congrats and welcome, Felix! > > Matei >

Re: Spark 1.x/2.x qualifiers in downstream artifact names

2016-08-24 Thread Ted Yu
'Spark 1.x and Scala 2.10 & 2.11' was repeated. I guess your second line should read: org.bdgenomics.adam:adam-{core,apis,cli}-spark2_2.1[0,1] for Spark 2.x and Scala 2.10 & 2.11 On Wed, Aug 24, 2016 at 9:41 AM, Michael Heuer wrote: > Hello, > > We're a project downstream of Spark and need to
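The brace and bracket shorthand in the corrected artifact line above can be expanded mechanically. The following is a minimal Python sketch, assuming the naming pattern is prefix + module + qualifier + "_" + Scala version; the function name and its parameters are hypothetical and only illustrate the expansion.

```python
from itertools import product


def expand_artifacts(group, prefix, modules, qualifier, scala_versions):
    """Expand the shorthand into explicit group:artifact coordinates.

    Hypothetical helper: the pattern prefix + module + qualifier + "_" + scala
    is an assumption read off the shorthand quoted in the message.
    """
    return [
        f"{group}:{prefix}{module}{qualifier}_{scala}"
        for module, scala in product(modules, scala_versions)
    ]


# Spark 2.x artifacts for Scala 2.10 and 2.11
names = expand_artifacts(
    "org.bdgenomics.adam", "adam-", ["core", "apis", "cli"], "-spark2", ["2.10", "2.11"]
)
for name in names:
    print(name)
```

Three modules and two Scala versions give six coordinates, which makes it easy to see whether a qualifier such as spark2 was applied to every artifact.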

Replacement for SparkSqlSerializer.deserialize[

2016-09-06 Thread Ted Yu
Hi, In hbase-spark module of hbase, we previously had this code: def hbaseFieldToScalaType( f: Field, src: Array[Byte], offset: Int, length: Int): Any = { ... case BinaryType => val newArray = new Array[Byte](length) System.arraycopy(src, offse

Re: Issues in compiling spark 2.0.0 code using scala-maven-plugin

2016-09-30 Thread Ted Yu
Was there any error prior to 'LifecycleExecutionException' ? On Fri, Sep 30, 2016 at 2:43 PM, satyajit vegesna < satyajit.apas...@gmail.com> wrote: > >> i am trying to compile code using maven ,which was working with spark >> 1.6.2, but when i try for spark 2.0.0 then i get below error, >> >> org

Re: PSA: JIRA resolutions and meanings

2016-10-08 Thread Ted Yu
I think only committers should resolve JIRAs which were not created by himself / herself. > On Oct 8, 2016, at 6:53 AM, Hyukjin Kwon wrote: > > I am uncertain too. It'd be great if these are documented too. > > FWIW, in my case, I privately asked and told Sean first that I am going to > look

Re: PSA: JIRA resolutions and meanings

2016-10-08 Thread Ted Yu
Makes sense. I trust Hyukjin, Holden and Cody's judgement in respective areas. I just wish to see more participation from the committers. Thanks > On Oct 8, 2016, at 8:27 AM, Sean Owen wrote: > > Hyukjin - To unsubscribe

Re: Difference between netty and netty-all

2016-12-05 Thread Ted Yu
This should be in netty-all : $ jar tvf /home/x/.m2/repository/io/netty/netty-all/4.0.29.Final/netty-all-4.0.29.Final.jar | grep ThreadLocalRandom 967 Tue Jun 23 11:10:30 UTC 2015 io/netty/util/internal/ThreadLocalRandom$1.class 1079 Tue Jun 23 11:10:30 UTC 2015 io/netty/util/internal/ThreadL

Re: Approach: Incremental data load from HBASE

2016-12-21 Thread Ted Yu
I haven't used Gobblin. You can consider asking Gobblin mailing list of the first option. The second option would work. On Wed, Dec 21, 2016 at 2:28 AM, Chetan Khatri wrote: > Hello Guys, > > I would like to understand different approach for Distributed Incremental > load from HBase, Is there

Re: Approach: Incremental data load from HBASE

2016-12-21 Thread Ted Yu
processing is delivered to hbase. Cheers On Wed, Dec 21, 2016 at 8:00 AM, Chetan Khatri wrote: > Ok, Sure will ask. > > But what would be generic best practice solution for Incremental load from > HBASE. > > On Wed, Dec 21, 2016 at 8:42 PM, Ted Yu wrote: > >> I haven

Re: HBaseContext with Spark

2017-01-25 Thread Ted Yu
Though no hbase release has the hbase-spark module, you can find the backport patch on HBASE-14160 (for Spark 1.6) You can build the hbase-spark module yourself. Cheers On Wed, Jan 25, 2017 at 3:32 AM, Chetan Khatri wrote: > Hello Spark Community Folks, > > Currently I am using HBase 1.2.4 and

Re: HBaseContext with Spark

2017-01-25 Thread Ted Yu
The references are vendor specific. Suggest contacting vendor's mailing list for your PR. My initial interpretation of HBase repository is that of Apache. Cheers On Wed, Jan 25, 2017 at 7:38 AM, Chetan Khatri wrote: > @Ted Yu, Correct but HBase-Spark module available at HBase re

Re: HBaseContext with Spark

2017-01-25 Thread Ted Yu
Does the storage handler provide bulk load capability ? Cheers > On Jan 25, 2017, at 3:39 AM, Amrit Jangid wrote: > > Hi chetan, > > If you just need HBase Data into Hive, You can use Hive EXTERNAL TABLE with > STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'. > > Try this if you

Re: Should we consider a Spark 2.1.1 release?

2017-03-20 Thread Ted Yu
Timur: Mind starting a new thread ? I have the same question as you have. > On Mar 20, 2017, at 11:34 AM, Timur Shenkao wrote: > > Hello guys, > > Spark benefits from stable versions not frequent ones. > A lot of people still have 1.6.x in production. Those who wants the freshest > (like me)

Re: the compile of spark stoped without any hints, would you like help me please?

2017-06-25 Thread Ted Yu
Does adding -X to mvn command give you more information ? Cheers On Sun, Jun 25, 2017 at 5:29 AM, 萝卜丝炒饭 <1427357...@qq.com> wrote: > Hi all, > > Today I use new PC to compile SPARK. > At the beginning, it worked well. > But it stop at some point. > the content in consle is : > ==

Re: how to mention others in JIRA comment please?

2017-06-26 Thread Ted Yu
You can find the JIRA handle of the person you want to mention by going to a JIRA where that person has commented. e.g. you want to find the handle for Joseph. You can go to: https://issues.apache.org/jira/browse/SPARK-6635 and click on his name in comment: https://issues.apache.org/jira/secure/V

Re: Spark Hbase Connector

2017-06-29 Thread Ted Yu
Please take a look at HBASE-16179 (work in progress). On Thu, Jun 29, 2017 at 4:30 PM, Raj, Deepu wrote: > Hi Team, > > > > Is there stable Spark HBase connector for Spark 2.0 ? > > > > Thanks, > > Deepu Raj > > >

Re: Performance Benchmark Hbase vs Cassandra

2017-06-29 Thread Ted Yu
For Cassandra, I found: https://www.instaclustr.com/multi-data-center-sparkcassandra-benchmark-round-2/ My coworker (on vacation at the moment) was doing benchmark with hbase. When he comes back, the result can be published. Note: it is hard to find comparison results with same setup (hardware,

Spark 2.1.x client with 2.2.0 cluster

2017-08-10 Thread Ted Yu
Hi, Has anyone used Spark 2.1.x client with Spark 2.2.0 cluster ? If so, is there any compatibility issue observed ? Thanks

Re: Welcoming Saisai (Jerry) Shao as a committer

2017-08-28 Thread Ted Yu
Congratulations, Jerry ! On Mon, Aug 28, 2017 at 6:28 PM, Matei Zaharia wrote: > Hi everyone, > > The PMC recently voted to add Saisai (Jerry) Shao as a committer. Saisai > has been contributing to many areas of the project for a long time, so it’s > great to see him join. Join me in thanking an

Re: Build fails on master (f90ad5d)

2014-11-04 Thread Ted Yu
I built based on this commit today and the build was successful. What command did you use ? Cheers On Tue, Nov 4, 2014 at 2:08 PM, Alessandro Baretta wrote: > Fellow Sparkers, > > I am new here and still trying to learn to crawl. Please, bear with me. > > I just pulled f90ad5d from https://git

Re: Build fails on master (f90ad5d)

2014-11-04 Thread Ted Yu
G]^ > >> > >> [WARNING] > >> > /home/alex/git/spark/examples/src/main/scala/org/apache/spark/examples/mllib/StreamingKMeans.scala:22: > >> imported `StreamingKMeans' is permanently hidden by definition of obje

Re: create_image.sh contains broken hadoop web link

2014-11-05 Thread Ted Yu
Have you seen this thread ? http://search-hadoop.com/m/LgpTk2Pnw6O/andrew+apache+mirror&subj=Re+All+mirrored+download+links+from+the+Apache+Hadoop+site+are+broken Cheers On Wed, Nov 5, 2014 at 7:36 PM, Nicholas Chammas wrote: > As part of my work for SPARK-3821 >

Re: create_image.sh contains broken hadoop web link

2014-11-05 Thread Ted Yu
o also appears to be broken now: > http://apache.mesi.com.ar/hadoop/common/ > > Nick > > On Wed, Nov 5, 2014 at 10:43 PM, Ted Yu wrote: >> Have you seen this thread ? >> >> http://search-hadoop.com/m/LgpTk2Pnw6O/andrew+apache+mirror&subj=Re+All+mirrored

Re: Has anyone else observed this build break?

2014-11-15 Thread Ted Yu
Sorry for the late reply. I tested my patch on Mac with the following JDK: java version "1.7.0_60" Java(TM) SE Runtime Environment (build 1.7.0_60-b19) Java HotSpot(TM) 64-Bit Server VM (build 24.60-b09, mixed mode) Let me see if the problem can be solved upstream in HBase hbase-annotations modu

Re: Has anyone else observed this build break?

2014-11-15 Thread Ted Yu
uest to exclude it from various hbase modules: https://github.com/apache/spark/pull/3286 Cheers On Sat, Nov 15, 2014 at 6:56 AM, Ted Yu wrote: > Sorry for the late reply. > > I tested my patch on Mac with the following JDK: > > java ve

Re: How spark and hive integrate in long term?

2014-11-21 Thread Ted Yu
bq. spark-0.12 also has some nice feature added Minor correction: you meant Spark 1.2.0 I guess Cheers On Fri, Nov 21, 2014 at 3:45 PM, Zhan Zhang wrote: > Thanks Dean, for the information. > > Hive-on-spark is nice. Spark sql has the advantage to take the full > advantage of spark and allows

Re: Required file not found in building

2014-12-01 Thread Ted Yu
I tried the same command on MacBook and didn't experience the same error. Which OS are you using ? Cheers On Mon, Dec 1, 2014 at 6:42 PM, Stephen Boesch wrote: > It seems there were some additional settings required to build spark now . > This should be a snap for most of you ot there about wh

Re: Required file not found in building

2014-12-01 Thread Ted Yu
-help for information about locating necessary files > > 2014-12-01 19:02 GMT-08:00 Stephen Boesch : > > Mac as well. Just found the problem: I had created an alias to zinc a >> couple of months back. Apparently that is not happy with the build anymore. >> No problem now that th

Re: Required file not found in building

2014-12-01 Thread Ted Yu
zip for 0.3.5.3 was downloaded and exploded. Then I ran > sbt dist/create . zinc is being launched from > dist/target/zinc-0.3.5.3/bin/zinc > > 2014-12-01 20:12 GMT-08:00 Ted Yu : > > I use zinc 0.2.0 and started zinc with the same command shown below. >> >> I don&#

Re: Unit tests in < 5 minutes

2014-12-04 Thread Ted Yu
Have you seen this thread http://search-hadoop.com/m/JW1q5xxSAa2 ? Test categorization in HBase is done through maven-surefire-plugin Cheers On Thu, Dec 4, 2014 at 4:05 PM, Nicholas Chammas wrote: > fwiw, when we did this work in HBase, we categorized the tests. Then some > tests can share a s

Re: Unit tests in < 5 minutes

2014-12-06 Thread Ted Yu
bq. I may move on to trying Maven. Maven is my favorite :-) On Sat, Dec 6, 2014 at 10:54 AM, Nicholas Chammas < nicholas.cham...@gmail.com> wrote: > Ted, > > I posted some updates >

Re: Nabble mailing list mirror errors: "This post has NOT been accepted by the mailing list yet"

2014-12-19 Thread Ted Yu
Andy: I saw two emails from you from yesterday. See this thread: http://search-hadoop.com/m/JW1q5opRsY1 Cheers On Fri, Dec 19, 2014 at 12:51 PM, Andy Konwinski wrote: > Yesterday, I changed the domain name in the mailing list archive settings > to remove ".incubator" so maybe it'll work now. >

Re: Assembly jar file name does not match profile selection

2014-12-26 Thread Ted Yu
Can you try this command ? sbt/sbt -Pyarn -Phadoop-2.4 -Dhadoop.version=2.6.0 -Phive assembly On Fri, Dec 26, 2014 at 6:15 PM, Alessandro Baretta wrote: > I am building spark with sbt off of branch 1.2. I'm using the following > command: > > sbt/sbt -Pyarn -Phadoop-2.3 assembly > > (http://spar

Re: Why the major.minor version of the new hive-exec is 51.0?

2014-12-30 Thread Ted Yu
I extracted org/apache/hadoop/hive/common/CompressionUtils.class from the jar and used hexdump to view the class file. Bytes 6 and 7 are 00 and 33, respectively. According to http://en.wikipedia.org/wiki/Java_class_file, the jar was produced using Java 7. FYI On Tue, Dec 30, 2014 at 8:09 PM, Shi
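The hexdump check described above can be sketched in Python. A class file starts with the 4-byte magic number 0xCAFEBABE, followed by a 2-byte minor version and a 2-byte major version (bytes 6 and 7), all big-endian; major version 0x33 (51) corresponds to Java 7. The function name and the sample header bytes are assumptions for illustration and are not taken from the thread.

```python
import struct

# Class-file major versions and the Java release that introduced each.
JAVA_RELEASE_BY_MAJOR = {49: "Java 5", 50: "Java 6", 51: "Java 7", 52: "Java 8"}


def class_file_version(header: bytes):
    """Return (major, minor) parsed from the first 8 bytes of a .class file."""
    if len(header) < 8:
        raise ValueError("need at least 8 bytes")
    # Big-endian: 4-byte magic, 2-byte minor version, 2-byte major version.
    magic, minor, major = struct.unpack(">IHH", header[:8])
    if magic != 0xCAFEBABE:
        raise ValueError("not a Java class file")
    return major, minor


# Hypothetical header: magic, minor 0x0000, major 0x0033
sample = bytes.fromhex("cafebabe00000033")
major, minor = class_file_version(sample)
print(major, minor, JAVA_RELEASE_BY_MAJOR.get(major, "unknown"))
```

To inspect a real class, read the first 8 bytes of the extracted file (for example with open(path, "rb").read(8)) and pass them to the function.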

Re: python converter in HBaseConverter.scala(spark/examples)

2015-01-05 Thread Ted Yu
In my opinion this would be useful - there was another thread where returning only the value of first column in the result was mentioned. Please create a SPARK JIRA and a pull request. Cheers On Mon, Jan 5, 2015 at 6:42 AM, tgbaggio wrote: > Hi, > > In HBaseConverter.scala > < > https://githu

Re: python converter in HBaseConverter.scala(spark/examples)

2015-01-05 Thread Ted Yu
ld think that various custom converters would be part > of external projects that can be listed with http://spark-packages.org/ I > see your project is already listed there. > > — > Sent from Mailbox <https://www.dropbox.com/mailbox> > > > On Mon, Jan 5, 2015 at 5:37 PM,

Re: Results of tests

2015-01-08 Thread Ted Yu
Please take a look at https://amplab.cs.berkeley.edu/jenkins/view/Spark/ On Thu, Jan 8, 2015 at 5:40 AM, Tony Reix wrote: > Hi, > I'm checking that Spark works fine on a new environment (PPC64 hardware). > I've found some issues, with versions 1.1.0, 1.1.1, and 1.2.0, even when > running on Ubun

Re: Results of tests

2015-01-08 Thread Ted Yu
d/HADOOP_PROFILE=hadoop-2.4,label=centos/testReport/ > ? (I'm not authorized to look at the "configuration" part) > > Thx ! > > Tony > > -- > *From:* Ted Yu [yuzhih...@gmail.com] > *Sent:* Thursday, January 8, 2015 16:11 > *To:*

Re: Results of tests

2015-01-09 Thread Ted Yu
I have 3485 tests only > (like on Ubuntu/x86_64 with IBM JVM), with 6 or 285 failures... > > > > So, I need to learn more about how your Jenkins environment extracts > details about the results. > > Moreover, which JVM is used ? > > > > Do you plan to use IBM J

Re: Results of tests

2015-01-09 Thread Ted Yu
ala tests, but we might be able to > integrate the PySpark tests here, too (I think it's just a matter of > getting the Python test runner to generate the correct test result XML > output). > > On Fri, Jan 9, 2015 at 10:47 AM, Ted Yu wrote: > >> For a build which

Re: run time exceptions in Spark 1.2.0 manual build together with OpenStack hadoop driver

2015-01-18 Thread Ted Yu
Please take a look at SPARK-4048 and SPARK-5108 Cheers On Sat, Jan 17, 2015 at 10:26 PM, Gil Vernik wrote: > Hi, > > I took a source code of Spark 1.2.0 and tried to build it together with > hadoop-openstack.jar ( To allow Spark an access to OpenStack Swift ) > I used Hadoop 2.6.0. > > The buil

Re: Standardized Spark dev environment

2015-01-20 Thread Ted Yu
How many profiles (hadoop / hive /scala) would this development environment support ? Cheers On Tue, Jan 20, 2015 at 4:13 PM, Nicholas Chammas < nicholas.cham...@gmail.com> wrote: > What do y'all think of creating a standardized Spark development > environment, perhaps encoded as a Vagrantfile,

Re: Intellij IDEA 14 env setup; NoClassDefFoundError when run examples

2015-01-31 Thread Ted Yu
Have you read / followed this ? https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools#UsefulDeveloperTools-BuildingSparkinIntelliJIDEA Cheers On Sat, Jan 31, 2015 at 8:01 PM, Yafeng Guo wrote: > Hi, > > I'm setting up a dev environment with Intellij IDEA 14. I selected prof

Re: Welcoming three new committers

2015-02-03 Thread Ted Yu
Congratulations, Cheng, Joseph and Sean. On Tue, Feb 3, 2015 at 2:53 PM, Nicholas Chammas wrote: > Congratulations guys! > > On Tue Feb 03 2015 at 2:36:12 PM Matei Zaharia > wrote: > > > Hi all, > > > > The PMC recently voted to add three new committers: Cheng Lian, Joseph > > Bradley and Sean

Re: 1.2.1 start-all.sh broken?

2015-02-11 Thread Ted Yu
I downloaded 1.2.1 tar ball for hadoop 2.4 I got: ls lib/ datanucleus-api-jdo-3.2.6.jar datanucleus-rdbms-3.2.9.jar spark-assembly-1.2.1-hadoop2.4.0.jar datanucleus-core-3.2.10.jar spark-1.2.1-yarn-shuffle.jar spark-examples-1.2.1-hadoop2.4.0.jar FYI On Wed, Feb 11, 2015 at 2:27 PM, Nichola

Re: 1.2.1 start-all.sh broken?

2015-02-11 Thread Ted Yu
; datanucleus-api-jdo-3.2.6.jar >> datanucleus-core-3.2.10.jar >> datanucleus-rdbms-3.2.9.jar >> spark-1.2.1-yarn-shuffle.jar >> spark-assembly-1.2.1-hadoop2.4.0.jar >> spark-examples-1.2.1-hadoop2.4.0.jar >> >> So that looks correct… Hmm. >> &

Re: 1.2.1 start-all.sh broken?

2015-02-11 Thread Ted Yu
totally missed it... :( > > On Wed Feb 11 2015 at 2:46:35 PM Nicholas Chammas < > nicholas.cham...@gmail.com> wrote: > >> lol yeah, I changed the path for the email... turned out to be the issue >> itself. >> >> >> On Wed Feb 11 2015 at 2:43:09 PM

Re: trouble with sbt building network-* projects?

2015-02-27 Thread Ted Yu
bq. to be able to run my tests in sbt, though, it makes the development iterations much faster. Was the preference for sbt due to long maven build time ? Have you started Zinc on your machine ? Cheers On Fri, Feb 27, 2015 at 11:10 AM, Imran Rashid wrote: > Has anyone else noticed very strange

Re: trouble with sbt building network-* projects?

2015-02-27 Thread Ted Yu
> want to entirely skip compiling sql, graphx, mllib etc. -- I have to switch > branches often enough that i end up triggering a full rebuild of those > projects even when I haven't touched them. > > > > > > On Fri, Feb 27, 2015 at 1:14 PM, Ted Yu wrote: > >> bq. to

Re: GitHub Syncing Down

2015-03-11 Thread Ted Yu
Looks like github is functioning again (I no longer encounter this problem when pushing to hbase repo). Do you want to give it a try ? Cheers On Tue, Mar 10, 2015 at 6:54 PM, Michael Armbrust wrote: > FYI: https://issues.apache.org/jira/browse/INFRA-9259 >

Re: Wrong version on the Spark documentation page

2015-03-15 Thread Ted Yu
When I enter http://spark.apache.org/docs/latest/ into Chrome address bar, I saw 1.3.0 Cheers On Sun, Mar 15, 2015 at 11:12 AM, Patrick Wendell wrote: > Cheng - what if you hold shift+refresh? For me the /latest link > correctly points to 1.3.0 > > On Sun, Mar 15, 2015 at 10:40 AM, Cheng Lian

Re: Exception using the new createDirectStream util method

2015-03-19 Thread Ted Yu
Looking at KafkaCluster#getLeaderOffsets(): respMap.get(tp).foreach { por: PartitionOffsetsResponse => if (por.error == ErrorMapping.NoError) { ... } else { errs.append(ErrorMapping.exceptionFor(por.error)) } There should be some error ot

Re: Error: 'SparkContext' object has no attribute 'getActiveStageIds'

2015-03-20 Thread Ted Yu
Please take a look at core/src/main/scala/org/apache/spark/SparkStatusTracker.scala, around line 58: def getActiveStageIds(): Array[Int] = { Cheers On Fri, Mar 20, 2015 at 3:59 PM, xing wrote: > getStageInfo in self._jtracker.getStageInfo below seems not > implemented/included in the current

Re: Spark SQL(1.3.0) "import sqlContext.implicits._" seems not work for converting a case class RDD to DataFrame

2015-03-24 Thread Ted Yu
Please take a look at: ./sql/core/src/main/scala/org/apache/spark/sql/DataFrameHolder.scala ./sql/core/src/main/scala/org/apache/spark/sql/GroupedData.scala Cheers On Tue, Mar 24, 2015 at 8:46 PM, Zhiwei Chan wrote: > Hi all, > > I just upgraded spark from 1.2.1 to 1.3.0, and changed the "imp

Re: Jira Issues

2015-03-25 Thread Ted Yu
Issues are tracked on Apache JIRA: https://issues.apache.org/jira/browse/SPARK/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel Cheers On Wed, Mar 25, 2015 at 1:51 PM, Igor Costa wrote: > Hi there Guys. > > I want to be more collaborative to Spark, but I have two questions. >

Re: should we add a start-masters.sh script in sbin?

2015-03-31 Thread Ted Yu
Sounds good to me. On Tue, Mar 31, 2015 at 6:12 PM, sequoiadb wrote: > Hey, > > start-slaves.sh script is able to read from slaves file and start slaves > node in multiple boxes. > However in standalone mode if I want to use multiple masters, I’ll have to > start masters in each individual box,

Re: One corrupt gzip in a directory of 100s

2015-04-01 Thread Ted Yu
bq. writing the output (to Amazon S3) failed What's the value of "fs.s3.maxRetries" ? Increasing the value should help. Cheers On Wed, Apr 1, 2015 at 8:34 AM, Romi Kuntsman wrote: > What about communication errors and not corrupted files? > Both when reading input and when writing output. > We

Re: org.spark-project.jetty and guava repo locations

2015-04-02 Thread Ted Yu
Take a look at the maven-shade-plugin in pom.xml. Here is the snippet for org.spark-project.jetty : org.eclipse.jetty org.spark-project.jetty org.eclipse.jetty.** On Thu, Apr 2, 2015 at 3:59 AM, Ni

Re: One corrupt gzip in a directory of 100s

2015-04-02 Thread Ted Yu
> at java.lang.Thread.run(Thread.java:745) > > Romi Kuntsman, Big Data Engineer > http://www.totango.com > >> On Wed, Apr 1, 2015 at 6:46 PM, Ted Yu wrote: >> bq. writing the output (to Amazon S3) failed >> >> What's the value of "fs.s3

Re: wait time between start master and start slaves

2015-04-11 Thread Ted Yu
From SparkUI.scala : def getUIPort(conf: SparkConf): Int = { conf.getInt("spark.ui.port", SparkUI.DEFAULT_PORT) } Better retrieve effective UI port before probing. Cheers On Sat, Apr 11, 2015 at 2:38 PM, Nicholas Chammas < nicholas.cham...@gmail.com> wrote: > So basically, to tell if t
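The suggestion above, to resolve the effective UI port first and only then probe it, can be sketched in Python. This is an illustration under stated assumptions, not Spark's own code: the configuration is modelled as a plain dict, both helper names are hypothetical, and 4040 is used as the default value of spark.ui.port.

```python
import socket

DEFAULT_UI_PORT = 4040  # assumed default, mirroring SparkUI.DEFAULT_PORT


def get_ui_port(conf):
    """Return the configured UI port, falling back to the default (hypothetical helper)."""
    return int(conf.get("spark.ui.port", DEFAULT_UI_PORT))


def is_port_open(host, port, timeout=1.0):
    """Probe a TCP port; True if something accepts connections there."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


conf = {"spark.ui.port": "4041"}  # hypothetical user override
port = get_ui_port(conf)
print(port, is_port_open("127.0.0.1", port))
```

Probing the hard-coded default would miss a UI that was moved with spark.ui.port, which is why the lookup comes before the connection attempt.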

Re: how long does it takes for full build ?

2015-04-16 Thread Ted Yu
You can get some idea by looking at the builds here: https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-1.2-Maven-with-YARN/HADOOP_PROFILE=hadoop-2.4,label=centos/ Cheers On Thu, Apr 16, 2015 at 11:56 AM, Sree V wrote: > Hi Team, > How long does it takes for a full build 'mvn clean pa

Re: how long does it takes for full build ?

2015-04-16 Thread Ted Yu
urs. > ... > ExternalSorterSuite: > - empty data stream > - few elements per partition > - empty partitions with spilling > - empty partitions with spilling, bypass merge-sort > > Any pointers ? > > Thanking you. > > With Regards > Sree > > > > On Thur

Re: [Spark SQL] Java map/flatMap api broken with DataFrame in 1.3.{0,1}

2015-04-17 Thread Ted Yu
The image didn't go through. I think you were referring to: override def map[R: ClassTag](f: Row => R): RDD[R] = rdd.map(f) Cheers On Fri, Apr 17, 2015 at 6:07 AM, Olivier Girardot < o.girar...@lateral-thoughts.com> wrote: > Hi everyone, > I had an issue trying to use Spark SQL from Java (8 o

Re: [sql] Dataframe how to check null values

2015-04-20 Thread Ted Yu
I found: https://issues.apache.org/jira/browse/SPARK-6573 > On Apr 20, 2015, at 4:29 AM, Peter Rudenko wrote: > > Sounds very good. Is there a jira for this? Would be cool to have in 1.4, > because currently cannot use dataframe.describe function with NaN values, > need to filter manually al

Re: Should we let everyone set Assignee?

2015-04-24 Thread Ted Yu
bq. get newly created JIRAs posted onto a list (dev?) +1 On Fri, Apr 24, 2015 at 3:02 AM, Steve Loughran wrote: > > I actually think the assignee JIRA issue is a minor detail; what really > matters is do things get in and how. > > So far, in the bits I've worked on, I've not encountered any pro

Re: [discuss] DataFrame function namespacing

2015-04-30 Thread Ted Yu
IMHO I would go with choice #1 Cheers On Wed, Apr 29, 2015 at 10:03 PM, Reynold Xin wrote: > We definitely still have the name collision problem in SQL. > > On Wed, Apr 29, 2015 at 10:01 PM, Punyashloka Biswal < > punya.bis...@gmail.com > > wrote: > > > Do we still have to keep the names of the

Re: [discuss] ending support for Java 6?

2015-04-30 Thread Ted Yu
+1 on ending support for Java 6. BTW from https://www.java.com/en/download/faq/java_7.xml : After April 2015, Oracle will no longer post updates of Java SE 7 to its public download sites. On Thu, Apr 30, 2015 at 1:34 PM, Punyashloka Biswal wrote: > I'm in favor of ending support for Java 6. We

Re: [discuss] ending support for Java 6?

2015-04-30 Thread Ted Yu
But it is hard to know how long customers stay with their most recent download. Cheers On Thu, Apr 30, 2015 at 2:26 PM, Sree V wrote: > If there is any possibility of getting the download counts,then we can use > it as EOS criteria as well.Say, if download counts are lower than 30% (or > anothe

Re: Mima test failure in the master branch?

2015-04-30 Thread Ted Yu
Looks like this has been taken care of: commit beeafcfd6ee1e460c4d564cd1515d8781989b422 Author: Patrick Wendell Date: Thu Apr 30 20:33:36 2015 -0700 Revert "[SPARK-5213] [SQL] Pluggable SQL Parser Support" On Thu, Apr 30, 2015 at 7:58 PM, zhazhan wrote: > [info] spark-sql: found 1 poten

Re: Speeding up Spark build during development

2015-05-01 Thread Ted Yu
Pramod: Please remember to run Zinc so that the build is faster. Cheers On Fri, May 1, 2015 at 9:36 AM, Ulanov, Alexander wrote: > Hi Pramod, > > For cluster-like tests you might want to use the same code as in mllib's > LocalClusterSparkContext. You can rebuild only the package that you change

Re: [discuss] ending support for Java 6?

2015-05-02 Thread Ted Yu
+1 On Sat, May 2, 2015 at 1:09 PM, Mridul Muralidharan wrote: > We could build on minimum jdk we support for testing pr's - which will > automatically cause build failures in case code uses newer api ? > > Regards, > Mridul > > On Fri, May 1, 2015 at 2:46 PM, Reynold Xin wrote: > > It's really

Re: jackson.databind exception in RDDOperationScope.jsonMapper.writeValueAsString(this)

2015-05-06 Thread Ted Yu
Looks like mismatch of jackson version. Spark uses: 2.4.4 FYI On Wed, May 6, 2015 at 8:00 AM, A.M.Chan wrote: > Hey, guys. I meet this exception while testing SQL/Columns. > I didn't change the pom or the core project. > In the morning, it's fine to test my PR. > I don't know what happed. >

Re: unable to extract tgz files downloaded from spark

2015-05-06 Thread Ted Yu
From which site did you download the tar ball ? Which package type did you choose (pre-built for which distro) ? Thanks On Wed, May 6, 2015 at 7:16 PM, Praveen Kumar Muthuswamy < muthusamy...@gmail.com> wrote: > Hi > I have been trying to install latest spark version and downloaded the .tgz >

Re: Recent Spark test failures

2015-05-08 Thread Ted Yu
Andrew: Do you think the -M and -A options described here can be used in test runs ? http://scalatest.org/user_guide/using_the_runner Cheers On Wed, May 6, 2015 at 5:41 PM, Andrew Or wrote: > Dear all, > > I'm sure you have all noticed that the Spark tests have been fairly > unstable recently.

Re: Build fail...

2015-05-08 Thread Ted Yu
Looks like you're right: https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-1.3-Maven-with-YARN/HADOOP_PROFILE=hadoop-2.4,label=centos/427/console [error] /home/jenkins/workspace/Spark-1.3-Maven-with-YARN/HADOOP_PROFILE/hadoop-2.4/label/centos/core/src/main/scala/org/apache/spark/MapOut

Re: Recent Spark test failures

2015-05-11 Thread Ted Yu
ally the worst if tests > fail sometimes but not others, because we can't reproduce them > deterministically. Using -M and -A actually tolerates flaky tests to a > certain extent, and I would prefer to instead increase the determinism in > these tests. > > -Andrew > >

Re: [PySpark DataFrame] When a Row is not a Row

2015-05-11 Thread Ted Yu
In Row#equals(): while (i < len) { if (apply(i) != that.apply(i)) { '!=' should be !apply(i).equals(that.apply(i)) ? Cheers On Mon, May 11, 2015 at 1:49 PM, Nicholas Chammas < nicholas.cham...@gmail.com> wrote: > This is really strange. > > >>> # Spark 1.3.1 > >>> print type(resu

Re: How to link code pull request with JIRA ID?

2015-05-13 Thread Ted Yu
Subproject tag should follow SPARK JIRA number. e.g. [SPARK-5277][SQL] ... Cheers On Wed, May 13, 2015 at 11:50 AM, Stephen Boesch wrote: > following up from Nicholas, it is > > [SPARK-12345] Your PR description > > where 12345 is the jira number. > > > One thing I tend to forget is when/where
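The title convention above (JIRA number first, subproject tag after it) can be checked with a small parser. The following Python sketch assumes titles of the form [SPARK-<number>] followed by zero or more bracketed tags and then a summary; the regular expression, the function name and the sample summaries are illustrative assumptions, not an official Spark tool.

```python
import re

# Assumed shape: [SPARK-<number>], optional [TAG] groups, then the summary text.
TITLE_RE = re.compile(r"^\[SPARK-(\d+)\]((?:\s*\[[A-Za-z0-9 ]+\])*)\s*(.+)$")


def parse_pr_title(title):
    """Split a pull request title into JIRA id, subproject tags and summary."""
    match = TITLE_RE.match(title)
    if match is None:
        return None  # title does not start with a SPARK JIRA number
    tags = re.findall(r"\[([A-Za-z0-9 ]+)\]", match.group(2))
    return {
        "jira": "SPARK-" + match.group(1),
        "tags": tags,
        "summary": match.group(3),
    }


print(parse_pr_title("[SPARK-5277][SQL] Example summary"))
print(parse_pr_title("[SPARK-12345] Your PR description"))
print(parse_pr_title("Example summary without a JIRA number"))
```

A title without a leading JIRA number returns None, which is the case where the pull request would not be linked to an issue.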

Re: Recent Spark test failures

2015-05-15 Thread Ted Yu
uilds lately: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/ Maybe PR builder doesn't build against hadoop 2.4 ? Cheers On Mon, May 11, 2015 at 1:11 PM, Ted Yu wrote: > Makes sense. > > Having high determinism in these tests would make Jenkins build stable.

Re: Recent Spark test failures

2015-05-15 Thread Ted Yu
, 2015 at 9:23 AM, Ted Yu wrote: > Jenkins build against hadoop 2.4 has been unstable recently: > > https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-Master-Maven-with-YARN/HADOOP_PROFILE=hadoop-2.4,label=centos/ > > I haven't found the test which hung / failed in r

Re: Recent Spark test failures

2015-05-15 Thread Ted Yu
gt; = >> Running Spark unit tests >> = >> [info] Running Spark tests with these arguments: -Pyarn -Phadoop-2.3 >> -Dhadoop.version=2.3.0 -Pkinesis-asl test >> >> Is anyone testing individual pull reque
