please help with ClassNotFoundException

2015-08-13 Thread 周千昊
Hi, I am using Spark 1.4 and ran into an issue. I am trying to use the aggregate function: JavaRDD<String> rdd = some rdd; HashMap<Long, TypeA> zeroValue = new HashMap(); // add initial key-value pair for zeroValue rdd.aggregate(zeroValue, new
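
For context, a complete aggregate call of this shape might look like the minimal Scala sketch below (assuming the spark-shell `sc`; the element type and fold logic are illustrative, not the poster's actual code):

    import scala.collection.mutable
    val rdd = sc.parallelize(Seq("spark", "rdd", "aggregate"))
    val zeroValue = mutable.HashMap[Long, String]()    // initial accumulator
    val result = rdd.aggregate(zeroValue)(
      (acc, s) => { acc(s.length.toLong) = s; acc },   // seqOp: fold one element into the map
      (m1, m2) => { m1 ++= m2; m1 }                    // combOp: merge two partial maps
    )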

Re: please help with ClassNotFoundException

2015-08-13 Thread Sea
Are you using 1.4.0? If yes, use 1.4.1. ------ Original Message ------ From: 周千昊 <qhz...@apache.org>; Sent: Thursday, August 13, 2015, 6:04 PM; To: dev <dev@spark.apache.org>; Subject: please help with ClassNotFoundException Hi, I am using spark 1.4 when an issue occurs

What does NativeMethodAccessorImpl.java do?

2015-08-13 Thread freedafeng
I am running a Spark job with only two operations: mapPartitions and then collect(). The output data size of mapPartitions is very small: one integer per partition. I saw there is a stage 2 for this job that runs this Java program. I am not a Java programmer. Could anyone please let me know what
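
For reference, a job of the shape described might look like this minimal sketch (assuming the spark-shell `sc`); with no shuffle involved, mapPartitions plus collect() normally forms a single stage:

    val rdd = sc.parallelize(1 to 100, 4)      // 4 partitions
    val onePerPartition = rdd.mapPartitions { iter =>
      Iterator(iter.size)                      // emit a single Int per partition
    }
    val result = onePerPartition.collect()     // Array of 4 Ints, one per partition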

Re: possible bug: user SparkConf properties not copied to worker process

2015-08-13 Thread Reynold Xin
That was intentional. What's your use case that requires configs not starting with "spark."? On Thu, Aug 13, 2015 at 8:16 AM, rfarrjr <rfar...@gmail.com> wrote: Ran into an issue setting a property on the SparkConf that wasn't made available on the worker. After some digging[1] I noticed that

Re: please help with ClassNotFoundException

2015-08-13 Thread Sea
Yes, I guess so. I saw this bug before. ------ Original Message ------ From: 周千昊 <z.qian...@gmail.com>; Sent: Thursday, August 13, 2015, 9:30 PM; To: Sea <261810...@qq.com>; dev <dev@spark.apache.org>; Subject: Re: please help with ClassNotFoundException Hi

possible bug: user SparkConf properties not copied to worker process

2015-08-13 Thread rfarrjr
Ran into an issue setting a property on the SparkConf that wasn't made available on the worker. After some digging[1] I noticed that only properties that start with "spark." are sent by the scheduler. I'm not sure if this was intended behavior or not. Using Spark Streaming 1.4.1 running on Java
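
To illustrate the behavior described (the property names below are hypothetical): only keys carrying the "spark." prefix reach the worker-side executors.

    import org.apache.spark.SparkConf
    val conf = new SparkConf()
      .set("spark.myapp.registry.url", "http://registry:8081")  // forwarded: has the spark. prefix
      .set("myapp.registry.url", "http://registry:8081")        // set on the driver, but not sent to executors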

Graphx - how to add vertices to a HashSet of vertices ?

2015-08-13 Thread Ranjana Rajendran
Hi, sampledVertices is a HashSet of vertices: var sampledVertices: HashSet[VertexId] = HashSet(). In each iteration, I am making a list of neighbor vertex IDs: val neighborVertexIds = burnEdges.map((e: Edge[Int]) => e.dstId). I want to add this neighborVertexIds to the sampledVertices
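
One way to do this, sketched under the assumption that burnEdges is an RDD[Edge[Int]] and sampledVertices lives on the driver, is to collect() the IDs before adding them to the local set:

    import scala.collection.mutable.HashSet
    import org.apache.spark.graphx.{Edge, VertexId}
    val burnEdges = sc.parallelize(Seq(Edge(1L, 2L, 0), Edge(1L, 3L, 0)))  // stand-in data
    var sampledVertices: HashSet[VertexId] = HashSet()
    val neighborVertexIds = burnEdges.map((e: Edge[Int]) => e.dstId)
    sampledVertices ++= neighborVertexIds.collect()  // bring the IDs to the driver, then add them all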

Re: Spark runs into an Infinite loop even if the tasks are completed successfully

2015-08-13 Thread Imran Rashid
oh I see, you are defining your own RDD Partition types, and you had a bug where partition.index did not line up with the partitions slot in rdd.getPartitions. Is that correct? On Thu, Aug 13, 2015 at 2:40 AM, Akhil Das ak...@sigmoidanalytics.com wrote: I figured that out, And these are my
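
A minimal sketch of the invariant in question (class names are illustrative): the Partition stored at slot i of getPartitions must report index == i, otherwise the scheduler and the RDD disagree about which partition a task ran.

    import org.apache.spark.{Partition, SparkContext, TaskContext}
    import org.apache.spark.rdd.RDD

    class MyPartition(override val index: Int) extends Partition

    class MyRDD(sc: SparkContext, slices: Int) extends RDD[Int](sc, Nil) {
      override def getPartitions: Array[Partition] =
        Array.tabulate(slices)(i => new MyPartition(i))   // slot i carries index i
      override def compute(split: Partition, ctx: TaskContext): Iterator[Int] =
        Iterator(split.index)
    }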

subscribe

2015-08-13 Thread Naga Vij
subscribe

Fwd: - Spark 1.4.1 - run-example SparkPi - Failure ...

2015-08-13 Thread Naga Vij
Hello, Any idea on why this is happening? Thanks Naga -- Forwarded message -- From: Naga Vij nvbuc...@gmail.com Date: Wed, Aug 12, 2015 at 5:47 PM Subject: - Spark 1.4.1 - run-example SparkPi - Failure ... To: u...@spark.apache.org Hi, I am evaluating Spark 1.4.1 Any idea on

Re: - Spark 1.4.1 - run-example SparkPi - Failure ...

2015-08-13 Thread Dirceu Semighini Filho
Hi Naga, This has happened here sometimes when the memory of the Spark cluster wasn't enough and the Java GC entered an infinite loop trying to free some memory. To fix this I just added more memory to the workers of my cluster; or you can increase the number of partitions of your RDD, using the
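
As an illustration of the second suggestion (the path and count below are placeholders, assuming the spark-shell `sc`), raising the partition count shrinks the amount of data each task holds in memory at once:

    val rdd = sc.textFile("hdfs:///path/to/input")  // placeholder path
    val finer = rdd.repartition(200)                // 200 is an arbitrary example value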

Re: Switch from Sort based to Hash based shuffle

2015-08-13 Thread Akhil Das
Have a look at spark.shuffle.manager; you can switch between sort and hash with this configuration. From the configuration docs: spark.shuffle.manager (default: sort): implementation to use for shuffling data. There are two implementations available: sort and hash. Sort-based shuffle is more memory-efficient and is the default option
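
Programmatically, the switch might look like this minimal sketch (the same key can also be passed with --conf at submit time, as shown further down this digest):

    import org.apache.spark.{SparkConf, SparkContext}
    val conf = new SparkConf()
      .setAppName("shuffle-demo")            // illustrative app name
      .set("spark.shuffle.manager", "hash")  // or "sort", the default
    val sc = new SparkContext(conf)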

Re: please help with ClassNotFoundException

2015-08-13 Thread 周千昊
Hi Sea, Is it the same issue as https://issues.apache.org/jira/browse/SPARK-8368 ? Sea <261810...@qq.com> wrote on Thursday, August 13, 2015 at 6:52 PM: Are you using 1.4.0? If yes, use 1.4.1 ------ Original Message ------ From: 周千昊 <qhz...@apache.org>; Sent: Thursday, August 13, 2015, 6:04 PM; To:

Fwd: - Spark 1.4.1 - run-example SparkPi - Failure ...

2015-08-13 Thread Naga Vij
Has anyone run into this? -- Forwarded message -- From: Naga Vij nvbuc...@gmail.com Date: Wed, Aug 12, 2015 at 5:47 PM Subject: - Spark 1.4.1 - run-example SparkPi - Failure ... To: u...@spark.apache.org Hi, I am evaluating Spark 1.4.1 Any idea on why run-example SparkPi

Re: subscribe

2015-08-13 Thread Ted Yu
See first section on https://spark.apache.org/community On Thu, Aug 13, 2015 at 9:44 AM, Naga Vij nvbuc...@gmail.com wrote: subscribe

Re: possible bug: user SparkConf properties not copied to worker process

2015-08-13 Thread rfarrjr
That works.

Re: Developer API plugins for Hive/Hadoop?

2015-08-13 Thread Thomas Dudziak
Unfortunately it doesn't, because our version of Hive has different syntax elements and thus I need to patch them in (and a few other minor things). It would be great if there were a developer API at a somewhat higher level. On Thu, Aug 13, 2015 at 2:19 PM, Reynold Xin <r...@databricks.com>

Re: Developer API plugins for Hive/Hadoop?

2015-08-13 Thread Reynold Xin
I believe for Hive, there is already a client interface that can be used to build clients for different Hive metastores. That should also work for your heavily forked one. For Hadoop, it is definitely a bigger project to refactor. A good way to start evaluating this is to list what needs to be

Developer API plugins for Hive/Hadoop?

2015-08-13 Thread Thomas Dudziak
Hi, I have asked this before but didn't receive any comments; with the impending release of 1.5 I wanted to bring this up again. Right now, Spark is very tightly coupled with OSS Hive/Hadoop, which causes me a lot of work every time there is a new version, because I don't run OSS Hive/Hadoop

Re: What does NativeMethodAccessorImpl.java do?

2015-08-13 Thread freedafeng
Thanks Marcelo! The reason I was asking that question is that I was expecting my Spark job to be a map-only job. In other words, it should finish after the mapPartitions run for all partitions. This is because the job is only mapPartitions() plus count(), where mapPartitions only yields one

Re: RE: RE: RE: Package Release Announcement: Spark SQL on HBase Astro

2015-08-13 Thread Ted Malaska
Cool, seems like the designs are very close. Here is my latest blog on my work with HBase and Spark. Let me know if you have any questions. There should be two more blogs next month talking about bulk load through Spark (14150), which is committed, and SparkSQL (14181), which should be done next week.

Fwd: [ANNOUNCE] Spark 1.5.0-preview package

2015-08-13 Thread Reynold Xin
Retry sending this again ... -- Forwarded message -- From: Reynold Xin r...@databricks.com Date: Thu, Aug 13, 2015 at 12:15 AM Subject: [ANNOUNCE] Spark 1.5.0-preview package To: dev@spark.apache.org dev@spark.apache.org In order to facilitate community testing of the 1.5.0

Re: possible bug: user SparkConf properties not copied to worker process

2015-08-13 Thread Reynold Xin
Is this through Java properties? For Java properties, you can pass them using spark.executor.extraJavaOptions. On Thu, Aug 13, 2015 at 2:11 PM, rfarrjr <rfar...@gmail.com> wrote: Thanks for the response. In this particular case we passed a URL that would be leveraged when configuring some
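
A hedged sketch of that route (the property name is hypothetical): pass the value as a JVM system property so executor-side code can read it back.

    import org.apache.spark.SparkConf
    val conf = new SparkConf()
      .set("spark.executor.extraJavaOptions", "-Dmyapp.registry.url=http://registry:8081")
    // executor-side code can then read it with sys.props("myapp.registry.url")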

Re: - Spark 1.4.1 - run-example SparkPi - Failure ...

2015-08-13 Thread Dirceu Semighini Filho
Hi Naga, If you are trying to use classes from this jar, you will need to call the addJar method on the SparkContext, which will put this jar in all the workers' context, even when you execute it in standalone mode. 2015-08-13 16:02 GMT-03:00 Naga Vij <nvbuc...@gmail.com>: Hi Dirceu, Thanks for
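
For example, a minimal sketch assuming an existing SparkContext `sc` (the path is a placeholder):

    // ships the jar to every node so executor tasks can load its classes
    sc.addJar("/path/to/extra-classes.jar")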

Re: possible bug: user SparkConf properties not copied to worker process

2015-08-13 Thread rfarrjr
Thanks for the response. In this particular case we passed a URL that would be leveraged when configuring some serialization support for Kryo. We are using a schema registry and leveraging it to efficiently serialize Avro objects without the need to register specific records or schemas up
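
A sketch of the pattern described, under assumptions (the class and property names are hypothetical, and the registry-backed serializer itself is elided):

    import com.esotericsoftware.kryo.Kryo
    import org.apache.spark.serializer.KryoRegistrator

    class SchemaRegistryRegistrator extends KryoRegistrator {
      override def registerClasses(kryo: Kryo): Unit = {
        // hypothetical: read the registry URL, then wire an Avro serializer into kryo here
        val registryUrl = sys.props.getOrElse("myapp.registry.url", "http://localhost:8081")
      }
    }
    // enabled via spark.kryo.registrator=SchemaRegistryRegistrator
    // (together with spark.serializer=org.apache.spark.serializer.KryoSerializer)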

Re: Automatically deleting pull request comments left by AmplabJenkins

2015-08-13 Thread Ted Yu
Thanks, Josh, for the initiative. I think reducing the redundancy in QA bot posts would make discussion in the GitHub UI more focused. Cheers On Thu, Aug 13, 2015 at 7:21 PM, Josh Rosen <rosenvi...@gmail.com> wrote: Prototype is at https://github.com/databricks/spark-pr-dashboard/pull/59 On Wed,

Re: Automatically deleting pull request comments left by AmplabJenkins

2015-08-13 Thread Ted Yu
I tried accessing just now. It took several seconds before the page showed up. FYI On Thu, Aug 13, 2015 at 7:56 PM, Cheng, Hao hao.ch...@intel.com wrote: I found the https://spark-prs.appspot.com/ is super slow while open it in a new window recently, not sure just myself or everybody

RE: Automatically deleting pull request comments left by AmplabJenkins

2015-08-13 Thread Cheng, Hao
I found that https://spark-prs.appspot.com/ is super slow when opening it in a new window recently; not sure if it's just me or everybody experiences the same. Is there any way to speed it up? From: Josh Rosen [mailto:rosenvi...@gmail.com] Sent: Friday, August 14, 2015 10:21 AM To: dev Subject: Re:

RE: Automatically deleting pull request comments left by AmplabJenkins

2015-08-13 Thread Cheng, Hao
OK, thanks, probably just myself… From: Ted Yu [mailto:yuzhih...@gmail.com] Sent: Friday, August 14, 2015 11:04 AM To: Cheng, Hao Cc: Josh Rosen; dev Subject: Re: Automatically deleting pull request comments left by AmplabJenkins I tried accessing just now. It took several seconds before the

Re: Automatically deleting pull request comments left by AmplabJenkins

2015-08-13 Thread Josh Rosen
Prototype is at https://github.com/databricks/spark-pr-dashboard/pull/59 On Wed, Aug 12, 2015 at 7:51 PM, Josh Rosen rosenvi...@gmail.com wrote: *TL;DR*: would anyone object if I wrote a script to auto-delete pull request comments from AmplabJenkins? Currently there are two bots which post

Re: please help with ClassNotFoundException

2015-08-13 Thread Sea
I have no idea... We use Scala. You upgraded to 1.4 so quickly... are you using Spark in production? Spark 1.3 is better than Spark 1.4. ------ Original Message ------ From: 周千昊 <z.qian...@gmail.com>; Sent: Friday, August 14, 2015, 11:14; To: Sea <261810...@qq.com>;

Re: please help with ClassNotFoundException

2015-08-13 Thread 周千昊
Hi Sea, I have updated Spark to 1.4.1; however, the problem still exists. Any idea? Sea <261810...@qq.com> wrote on Friday, August 14, 2015 at 12:36 AM: Yes, I guess so. I saw this bug before. ------ Original Message ------ From: 周千昊 <z.qian...@gmail.com>; Sent: Thursday, August 13, 2015, 9:30 PM; To:

Re: Switch from Sort based to Hash based shuffle

2015-08-13 Thread Ranjana Rajendran
Hi Cheez, You can set the parameter spark.shuffle.manager when you submit the Spark job. --conf spark.shuffle.manager=hash Thank you, Ranjana On Thu, Aug 13, 2015 at 2:26 AM, cheez 11besemja...@seecs.edu.pk wrote: I understand that the current master branch of Spark uses Sort based shuffle.

Fwd: [ANNOUNCE] Spark 1.5.0-preview package

2015-08-13 Thread Reynold Xin
(I tried to send this last night but somehow the ASF mailing list rejected my mail.) In order to facilitate community testing of the 1.5.0 release, I've built a preview package. This is not a release candidate, so there is no voting involved. However, it'd be great if community members can start