Use cases for kafka direct stream messageHandler

2016-03-04 Thread Cody Koeninger
Wanted to survey what people are using the direct stream messageHandler for, besides just extracting key / value / offset. Would your use case still work if that argument were removed and the stream just contained ConsumerRecord objects?
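For context, a minimal Scala sketch of the kind of handler in question, assuming a Spark 1.x-era createDirectStream, an existing SparkContext `sc`, and placeholder broker/topic names and offsets; the handler does nothing beyond extracting key / value / offset:

import kafka.common.TopicAndPartition
import kafka.message.MessageAndMetadata
import kafka.serializer.StringDecoder
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

// Sketch only: broker address, topic name, and starting offsets are placeholders.
val ssc = new StreamingContext(sc, Seconds(5))
val kafkaParams = Map("metadata.broker.list" -> "localhost:9092")
val fromOffsets = Map(TopicAndPartition("events", 0) -> 0L)

// The messageHandler argument under discussion: it runs once per message,
// and here only pulls out key, value, and offset.
val stream = KafkaUtils.createDirectStream[
    String, String, StringDecoder, StringDecoder, (String, String, Long)](
  ssc, kafkaParams, fromOffsets,
  (mmd: MessageAndMetadata[String, String]) => (mmd.key, mmd.message, mmd.offset)
)

If the argument were dropped, the same fields would presumably be read off the ConsumerRecord objects in the stream instead.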

Re: Set up a Coverity scan for Spark

2016-03-04 Thread Sean Owen
No. Those are all in Java examples, and while we should show stopping the context, it has no big impact. It's worth touching up. I'm concerned about the ones with a potential correctness implication. They are easy to fix and already identified; why wouldn't we fix them? We take PRs to fix typos
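The example-hygiene fix Sean refers to is just stopping the context when the example finishes; a minimal Scala sketch of the pattern (the actual Coverity findings were in the Java examples):

import org.apache.spark.{SparkConf, SparkContext}

// Stop the context even if the body throws, so the example leaks nothing.
val sc = new SparkContext(new SparkConf().setAppName("Example").setMaster("local[*]"))
try {
  val evens = sc.parallelize(1 to 100).filter(_ % 2 == 0).count()
  println(s"even count: $evens")
} finally {
  sc.stop()
}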

Re: Set up a Coverity scan for Spark

2016-03-04 Thread Ted Yu
Is there a JIRA for fixing the resource leaks w.r.t. unclosed SparkContext? I wonder if such defects are really high priority. Cheers. On Fri, Mar 4, 2016 at 7:06 AM, Sean Owen wrote: > Hi Ted, I've already marked them. You should be able to see the ones > marked "Fix

Re: Set up a Coverity scan for Spark

2016-03-04 Thread Sean Owen
Hi Ted, I've already marked them. You should be able to see the ones marked "Fix Required" if you click through to the defects. Most are just bad form and probably have no impact. The few that looked reasonably important were:
- using platform char encoding, not UTF-8
- incorrect notify/wait -
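To make the two defect classes named above concrete, a hedged Scala sketch of each fix (the real findings were in Spark's Java code; these are generic illustrations, not the flagged sites):

import java.nio.charset.StandardCharsets

// Platform-charset defect: name UTF-8 explicitly rather than relying on the
// JVM default charset, which varies by platform.
val bytes = "héllo".getBytes(StandardCharsets.UTF_8) // not "héllo".getBytes
val text = new String(bytes, StandardCharsets.UTF_8) // not new String(bytes)

// notify/wait defect: wait() must sit in a loop that rechecks its condition,
// because threads can wake spuriously or after the state has changed again.
val lock = new Object
var ready = false
lock.synchronized {
  while (!ready) { // a bare `if (!ready) lock.wait()` is the flagged pattern
    lock.wait()
  }
}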

Re: Set up a Coverity scan for Spark

2016-03-04 Thread Ted Yu
Last time I checked there weren't high-impact defects. Mind pointing out the defects you think should be fixed? Thanks. On Fri, Mar 4, 2016 at 4:35 AM, Sean Owen wrote: > Yeah, it's not going to help with Scala, but it can at least find > stuff in the Java code. I'm not

Re: Set up a Coverity scan for Spark

2016-03-04 Thread Sean Owen
Yeah, it's not going to help with Scala, but it can at least find stuff in the Java code. I'm not suggesting anyone run it regularly, but one run to catch some bugs is useful. I've already triaged ~70 issues there just in the Java code, of which a handful are important. On Fri, Mar 4, 2016 at

Re: Set up a Coverity scan for Spark

2016-03-04 Thread Ted Yu
Since the majority of the code is written in Scala, which is not analyzed by Coverity, the efficacy of the tool seems limited. > On Mar 4, 2016, at 2:34 AM, Sean Owen wrote: > > https://scan.coverity.com/projects/apache-spark-2f9d080d-401d-47bc-9dd1-7956c411fbb4?tab=overview > >

Re: Mapper side join with DataFrames API

2016-03-04 Thread Deepak Gopalakrishnan
I have added this to SO; can you guys share any thoughts? http://stackoverflow.com/questions/35795518/spark-1-6-spills-to-disk-even-when-there-is-enough-memory
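For reference alongside the question, a minimal sketch of a mapper-side (broadcast) join in the Spark 1.6-era DataFrames API, assuming an existing SparkContext `sc` and made-up data:

import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions.broadcast

val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

// Placeholder tables: one large fact-like side, one small dimension-like side.
val largeDF = sc.parallelize(1 to 1000000).map(i => (i % 1000, s"v$i")).toDF("id", "value")
val smallDF = sc.parallelize(0 until 1000).map(i => (i, s"name$i")).toDF("id", "name")

// broadcast() hints that the small side be shipped to every executor, so the
// join happens map-side without shuffling the large side.
val joined = largeDF.join(broadcast(smallDF), "id")
joined.explain() // the plan should show a broadcast join rather than a sort-merge join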

Set up a Coverity scan for Spark

2016-03-04 Thread Sean Owen
https://scan.coverity.com/projects/apache-spark-2f9d080d-401d-47bc-9dd1-7956c411fbb4?tab=overview This has to be run manually, and is Java-only, but the inspection results are pretty good. Anyone should be able to browse them; let me know if anyone would like more access. Most are

Fwd: spark master ui to proxy app and worker ui

2016-03-04 Thread Gurvinder Singh
Forwarding to the development mailing list, as it might be more relevant to ask here. I am wondering whether I have missed something in the documentation and this might be possible already. If yes, please point me to the documentation on how to achieve it. If not, would it make sense to implement it