1gb file processing...task doesn't launch on all the node...Unseen exception

2014-11-14 Thread Priya Ch
Hi All, We have set up a 2-node cluster (NODE-DSRV05 and NODE-DSRV02), each with 32 GB of RAM, 1 TB of hard disk capacity, and 8 CPU cores. We have set up HDFS with 2 TB of capacity and a block size of 256 MB. When we try to process a 1 GB file on Spark, we see the following exception

Re: 1gb file processing...task doesn't launch on all the node...Unseen exception

2014-11-14 Thread Akhil Das
It shows a NullPointerException; could your data be corrupted? Try putting a try/catch inside the operation you are performing. Are you also running a worker process on the master node? If not, only 1 node will be doing the processing. If yes, then try setting the level of parallelism and
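A rough sketch of why the level of parallelism matters here, assuming the 1 GB file is read with sc.textFile. Spark's default minPartitions for textFile is min(defaultParallelism, 2), and the old-API Hadoop FileInputFormat picks a split size of max(minSize, min(goalSize, blockSize)), where goalSize is the file size divided by the requested number of splits (the last split may run up to 10% over). The helper name estimate_splits is hypothetical, and real split counts also depend on the input format and file layout.

```python
# Hypothetical helper: estimates how many input splits (and therefore tasks)
# an old-API Hadoop FileInputFormat would produce for a single file.
SPLIT_SLOP = 1.1  # the last split may be up to 10% larger than split_size
MB = 1024 * 1024


def estimate_splits(file_size: int, block_size: int, requested_splits: int, min_size: int = 1) -> int:
    """Apply the split-size rule max(min_size, min(goal_size, block_size))."""
    goal_size = file_size // max(requested_splits, 1)
    split_size = max(min_size, min(goal_size, block_size))
    splits = 0
    remaining = file_size
    # Carve off full-size splits while the remainder exceeds 1.1x split_size.
    while remaining / split_size > SPLIT_SLOP:
        splits += 1
        remaining -= split_size
    if remaining > 0:
        splits += 1  # whatever is left becomes the final split
    return splits


if __name__ == "__main__":
    cores = 2 * 8  # two nodes, eight cores each
    for requested in (2, 16, 32):
        n = estimate_splits(1024 * MB, 256 * MB, requested)
        print(f"minPartitions={requested}: {n} tasks for {cores} cores")
```

With the default of 2 requested splits, the 256 MB block size caps this file at 4 tasks, so at most 4 of the 16 cores do any work. Passing a larger minPartitions to sc.textFile, or calling repartition on the RDD, raises the task count.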

Re: Skipping Bad Records in Spark

2014-11-14 Thread Ganelin, Ilya
Hi Qiuzhuang - you have two options: 1) Within the map step, define a validation function that will be executed on every record. 2) Use the filter function to create a filtered dataset prior to processing. On 11/14/14, 10:28 AM, Qiuzhuang Lian qiuzhuang.l...@gmail.com wrote: Hi, MapReduce has
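Both options can be sketched in plain Python, assuming a hypothetical "name,count" record format and a hypothetical parse function. The functions are written so they could be handed to PySpark's rdd.flatMap (a variant of option 1 that drops bad records in the same pass) or to rdd.filter followed by rdd.map (option 2); locally, a list comprehension and the built-in filter stand in for the RDD calls.

```python
from typing import List, Optional, Tuple

Record = Tuple[str, int]


def parse(line: str) -> Optional[Record]:
    """Hypothetical parser for 'name,count' lines; returns None for malformed input."""
    parts = line.split(",")
    if len(parts) != 2 or not parts[0].strip():
        return None
    try:
        return parts[0].strip(), int(parts[1])
    except ValueError:
        return None


# Option 1 (flatMap variant): validate inside the map step and emit zero
# records for a bad line instead of failing the whole task.
def parse_or_skip(line: str) -> List[Record]:
    rec = parse(line)
    return [rec] if rec is not None else []


# Option 2: filter out bad lines first, then map only the clean ones.
def is_valid(line: str) -> bool:
    return parse(line) is not None


if __name__ == "__main__":
    lines = ["a,1", "garbage", "b,2", ",3", "c,x"]
    option1 = [r for line in lines for r in parse_or_skip(line)]
    option2 = [parse(line) for line in filter(is_valid, lines)]
    print(option1)  # [('a', 1), ('b', 2)]
    print(option2)  # [('a', 1), ('b', 2)]
```

Option 1 parses each line once; option 2 parses valid lines twice (once in the filter, once in the map) but keeps the validation rule separate from the transformation.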

Spark Hadoop 2.5.1

2014-11-14 Thread Corey Nolet
I noticed Spark 1.2.0-SNAPSHOT still has 2.4.x in the pom. Since 2.5.x is the current stable Hadoop 2.x, would it make sense for us to update the poms?

Re: Spark Hadoop 2.5.1

2014-11-14 Thread Sean Owen
I don't think it's necessary. You're looking at the hadoop-2.4 profile, which works with anything >= 2.4. AFAIK there is no further specialization needed beyond that. The profile sets hadoop.version to 2.4.0 by default, but this can be overridden. On Fri, Nov 14, 2014 at 3:43 PM, Corey Nolet
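For reference, the build described here amounts to activating the hadoop-2.4 profile and overriding hadoop.version. The small Python helper below (its name is hypothetical) only assembles that Maven command line; -DskipTests and the clean package goals are assumptions borrowed from the usual Spark build instructions rather than from this thread.

```python
import shlex
from typing import List


def spark_build_command(hadoop_version: str, profile: str = "hadoop-2.4") -> List[str]:
    """Assemble a Maven invocation that keeps the hadoop-2.4 profile but
    overrides the hadoop.version property the profile sets by default."""
    return [
        "mvn",
        f"-P{profile}",
        f"-Dhadoop.version={hadoop_version}",
        "-DskipTests",
        "clean",
        "package",
    ]


if __name__ == "__main__":
    cmd = spark_build_command("2.5.1")
    # Prints: mvn -Phadoop-2.4 -Dhadoop.version=2.5.1 -DskipTests clean package
    print(shlex.join(cmd))
    # To actually run it from a Spark source checkout:
    #   import subprocess; subprocess.run(cmd, check=True)
```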

Re: Spark Hadoop 2.5.1

2014-11-14 Thread Corey Nolet
In the past, I've built it by providing -Dhadoop.version=2.5.1 exactly like you've mentioned. What prompted me to write this email was that I did not see any documentation that told me Hadoop 2.5.1 was officially supported by Spark (i.e. community has been using it, any bugs are being fixed,

Re: Spark Hadoop 2.5.1

2014-11-14 Thread sandy . ryza
You're the second person to request this today. Planning to include this in my PR for SPARK-4338. -Sandy On Nov 14, 2014, at 8:48 AM, Corey Nolet cjno...@gmail.com wrote: In the past, I've built it by providing -Dhadoop.version=2.5.1 exactly like you've mentioned. What prompted me to write

Re: Spark Hadoop 2.5.1

2014-11-14 Thread Sean Owen
Yeah, I think someone even just suggested that today in a separate thread? Couldn't hurt to just add an example. On Fri, Nov 14, 2014 at 4:48 PM, Corey Nolet cjno...@gmail.com wrote: In the past, I've built it by providing -Dhadoop.version=2.5.1 exactly like you've mentioned. What prompted me to

Re: [VOTE] Release Apache Spark 1.1.1 (RC1)

2014-11-14 Thread Andrew Or
Hi all, since the vote ends on a Sunday, please let me know if you would like to extend the deadline to allow more time for testing. 2014-11-13 12:10 GMT-08:00 Sean Owen so...@cloudera.com: Ah right. This is because I'm running Java 8. This was fixed in SPARK-3329 (

Re: [VOTE] Release Apache Spark 1.1.1 (RC1)

2014-11-14 Thread Matei Zaharia
+1 Tested on Mac OS X, and verified that sort-based shuffle bug is fixed. Matei On Nov 14, 2014, at 10:45 AM, Andrew Or and...@databricks.com wrote: Hi all, since the vote ends on a Sunday, please let me know if you would like to extend the deadline to allow more time for testing.

Has anyone else observed this build break?

2014-11-14 Thread Patrick Wendell
A recent patch broke clean builds for me. I am trying to see how widespread this issue is and whether we need to revert the patch. The error I've seen when building the examples project is this: spark-examples_2.10: Could not resolve dependencies for project

Re: Has anyone else observed this build break?

2014-11-14 Thread Patrick Wendell
A workaround for this issue is described here: http://dbknickerbocker.blogspot.com/2013/04/simple-fix-to-missing-toolsjar-in-jdk.html However, if this affects more users, I'd prefer to just fix it properly in our build. On Fri, Nov 14, 2014 at 12:17 PM, Patrick Wendell pwend...@gmail.com wrote:

Re: Has anyone else observed this build break?

2014-11-14 Thread Hari Shreedharan
Seems like a comment on that page mentions a fix, which would add yet another profile, though: specifically, telling mvn that if it is an Apple JDK, it should use classes.jar as tools.jar as well, since the Apple-packaged JDK 6 bundled them together. Link:
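The fallback described here can be sketched outside Maven as a lookup order: prefer the JDK's lib/tools.jar, and on Apple's JDK 6, where the tools classes were bundled into classes.jar, use that instead. This Python helper is hypothetical and only illustrates the lookup; the fix discussed in the thread would be a Maven profile, and the paths relative to java.home are assumptions based on the usual JDK directory layouts.

```python
import os
import tempfile
from typing import Optional


def find_tools_jar(java_home: str) -> Optional[str]:
    """Return the jar holding the JDK tools classes for a given java.home, or None.

    java_home is assumed to be the directory the JVM reports as java.home:
    <jdk>/jre on a typical Oracle/OpenJDK install, <jdk>/Contents/Home on Apple's JDK 6.
    """
    candidates = [
        # Oracle/OpenJDK layout: <jdk>/lib/tools.jar
        os.path.join(java_home, os.pardir, "lib", "tools.jar"),
        # Apple JDK 6 layout: tools bundled into <jdk>/Contents/Classes/classes.jar
        os.path.join(java_home, os.pardir, "Classes", "classes.jar"),
    ]
    for path in candidates:
        if os.path.isfile(path):
            return os.path.normpath(path)
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        # Simulate an Apple JDK 6 layout with an empty classes.jar.
        contents = os.path.join(root, "1.6.0.jdk", "Contents")
        os.makedirs(os.path.join(contents, "Home"))
        os.makedirs(os.path.join(contents, "Classes"))
        open(os.path.join(contents, "Classes", "classes.jar"), "w").close()
        print(find_tools_jar(os.path.join(contents, "Home")))
```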

Re: Has anyone else observed this build break?

2014-11-14 Thread Patrick Wendell
I think in this case we can probably just drop that dependency, so there is a simpler fix. But mostly I'm curious whether anyone else has observed this. On Fri, Nov 14, 2014 at 12:24 PM, Hari Shreedharan hshreedha...@cloudera.com wrote: Seems like a comment on that page mentions a fix, which

Re: [VOTE] Release Apache Spark 1.1.1 (RC1)

2014-11-14 Thread Zach Fry
+0 I expect to start testing on Monday but won't have enough results to change my vote from +0 until Monday night or Tuesday morning. Thanks, Zach

Re: [VOTE] Release Apache Spark 1.1.1 (RC1)

2014-11-14 Thread Cheng Lian
+1 Tested HiveThriftServer2 against Hive 0.12.0 on Mac OS X. Known issues are fixed. Hive version inspection works as expected. On 11/15/14 8:25 AM, Zach Fry wrote: +0 I expect to start testing on Monday but won't have enough results to change my vote from +0 until Monday night or Tuesday