Re: Spark streaming with Kinesis broken?

2015-12-11 Thread Nick Pentreath
cc'ing dev list. OK, it looks like when the KCL version was updated in https://github.com/apache/spark/pull/8957, the AWS SDK version was not, probably leading to a dependency conflict, though as Burak mentions it's hard to debug as no exceptions seem to get thrown... I've tested 1.5.2 locally and on my

Re: Spark streaming with Kinesis broken?

2015-12-11 Thread Nick Pentreath
Is that PR against the master branch? S3 read comes from Hadoop / jets3t afaik. On Fri, Dec 11, 2015 at 5:38 PM, Brian London wrote: > That's good news. I've got a PR in to up the SDK version to 1.10.40 and the > KCL to 1.6.1 which I'm running tests

Re: JIRA: Wrong dates from imported JIRAs

2015-12-11 Thread Lars Francke
That's a good point. I assume there's always a small risk but it's at least the documented way from Atlassian to change the creation date so I'd hope it should be okay. I'd build the minimal CSV file. I agree that probably not a lot of people are going to search across projects but on the other

Re: Multi-core support per task in Spark

2015-12-11 Thread Zhan Zhang
I noticed that it is configurable at the job level via spark.task.cpus. Is there any way to support it at the task level? Thanks. Zhan Zhang On Dec 11, 2015, at 10:46 AM, Zhan Zhang wrote: > Hi Folks, > > Is it possible to assign multiple cores per task and how? Suppose we have some >
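As a rough illustration of what the job-level spark.task.cpus setting mentioned above implies, the sketch below computes how many tasks can run at once on a single executor. It assumes the usual relationship between spark.executor.cores and spark.task.cpus (integer division); the helper name and the example values are hypothetical and are not taken from the thread.

```python
# Minimal sketch, not Spark code: models how a job-wide spark.task.cpus
# value limits the number of tasks an executor runs concurrently.
# The function name and the example settings below are illustrative assumptions.

def concurrent_tasks_per_executor(executor_cores: int, task_cpus: int = 1) -> int:
    """Return how many tasks fit on one executor at the same time.

    executor_cores corresponds to spark.executor.cores,
    task_cpus corresponds to spark.task.cpus (Spark's default is 1).
    """
    if executor_cores < 1 or task_cpus < 1:
        raise ValueError("core counts must be positive integers")
    # Each task reserves task_cpus cores; leftover cores stay idle.
    return executor_cores // task_cpus


# Hypothetical job-level settings, as they might be passed with --conf.
example_conf = {
    "spark.executor.cores": "8",
    "spark.task.cpus": "4",
}

slots = concurrent_tasks_per_executor(
    int(example_conf["spark.executor.cores"]),
    int(example_conf["spark.task.cpus"]),
)
print(f"tasks running concurrently per executor: {slots}")
```

Because the setting applies to every task in the job, raising it for the sake of a few heavy multi-threaded tasks also reduces the parallelism of the light ones, which is presumably why a per-task setting is being asked about.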

Re: JIRA: Wrong dates from imported JIRAs

2015-12-11 Thread Reynold Xin
Thanks for looking at this. Is it worth fixing? Is there a risk (although small) that the re-import would break other things? Most of those are done and I don't know how often people search JIRAs by date across projects. On Fri, Dec 11, 2015 at 3:40 PM, Lars Francke

A very Minor typo in the Spark paper

2015-12-11 Thread Fengdong Yu
Hi, I found a very minor typo in: http://www.cs.berkeley.edu/~matei/papers/2012/nsdi_spark.pdf Page 4: We complement the data mining example in Section 2.2.1 with two iterative applications: logistic regression and PageRank. I went back to Section 2.2.1, and these two examples are not there.

Re: coalesce at DataFrame missing argument for shuffle.

2015-12-11 Thread Reynold Xin
I am not sure if we need it. The RDD API has way too many methods and parameters. As you said, it is simply "repartition". On Fri, Dec 11, 2015 at 2:56 PM, Hyukjin Kwon wrote: > Hi all, > > I accidentally came across the coalesce() function and found that it takes arguments > different

Multi-core support per task in Spark

2015-12-11 Thread Zhan Zhang
Hi Folks, Is it possible to assign multiple cores per task and how? Suppose we have a scenario in which some tasks do really heavy processing on each record and require multi-threading, and we want to avoid similar tasks being assigned to the same executors/hosts. If it is not supported, does it

Maven build against Hadoop 2.4 times out

2015-12-11 Thread Ted Yu
Hi, You may have noticed that the Maven build against Hadoop 2.4 times out on Jenkins. The last module is spark-hive-thriftserver. This seemed to start with build #4440. FYI