Re: Contribute to Spark - Need a mentor.

2014-06-18 Thread Reynold Xin
Hi Michael, Unfortunately the Apache mailing list filters out attachments. That said, you can usually just start by looking at the JIRA for Spark, finding issues tagged with the starter tag, and working on them. You can submit pull requests to the GitHub repo or email the dev list for feedback on

Re: Java IO Stream Corrupted - Invalid Type AC?

2014-06-18 Thread Surendranauth Hiraman
Patrick, My team is using shuffle consolidation but not speculation. We are also using persist(DISK_ONLY) for caching. Here are some config changes that are in our work-in-progress. We've been trying for 2 weeks to get our production flow (maybe around 50-70 stages, a few forks and joins with

Re: Java IO Stream Corrupted - Invalid Type AC?

2014-06-18 Thread Mridul Muralidharan
On Wed, Jun 18, 2014 at 6:19 PM, Surendranauth Hiraman suren.hira...@velos.io wrote: Patrick, My team is using shuffle consolidation but not speculation. We are also using persist(DISK_ONLY) for caching. Use of shuffle consolidation is probably what is causing the issue. Would be a good idea

question about Hive compatibility tests

2014-06-18 Thread Will Benton
Hi all, Does a "Failed to generate golden answer for query" message from HiveComparisonTests indicate that it isn't possible to run the query in question under Hive from Spark's test suite, rather than anything about Spark's implementation of HiveQL? The stack trace I'm getting implicates Hive

Re: Java IO Stream Corrupted - Invalid Type AC?

2014-06-18 Thread Patrick Wendell
Just wondering, do you get this particular exception if you are not consolidating shuffle data? On Wed, Jun 18, 2014 at 12:15 PM, Mridul Muralidharan mri...@gmail.com wrote: On Wed, Jun 18, 2014 at 6:19 PM, Surendranauth Hiraman suren.hira...@velos.io wrote: Patrick, My team is using shuffle

Re: Java IO Stream Corrupted - Invalid Type AC?

2014-06-18 Thread Surendranauth Hiraman
Good question. At this point, I'd have to re-run it to know for sure. We've been trying various different things, so I'd have to reset the flow config back to that state. I can say that by removing persist(DISK_ONLY), the flows are running more stably, probably due to removing disk contention. We

Re: Run ScalaTest inside IntelliJ IDEA

2014-06-18 Thread Doris Xin
Here's the JIRA on this known issue: https://issues.apache.org/jira/browse/SPARK-1835 tl;dr: manually delete mesos-0.18.1.jar from lib_managed/jars after running sbt/sbt gen-idea. You should be able to run unit tests inside IntelliJ after doing so. Doris On Tue, Jun 17, 2014 at 6:10 PM, Henry
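
The manual cleanup step described above could be scripted roughly as follows. This is only a sketch, not part of the Spark build: the function name remove_stale_jar is hypothetical, and it assumes the script is run from the root of a Spark checkout after sbt/sbt gen-idea has finished. The jar name and the lib_managed/jars directory are taken from the message above.

```python
from pathlib import Path


def remove_stale_jar(project_root, jar_name="mesos-0.18.1.jar"):
    """Delete jar_name from <project_root>/lib_managed/jars if it exists.

    Returns True if a file was removed, False if there was nothing to remove.
    The helper name and signature are illustrative, not part of Spark.
    """
    jar = Path(project_root) / "lib_managed" / "jars" / jar_name
    if jar.is_file():
        jar.unlink()
        return True
    return False


if __name__ == "__main__":
    # Assumption: the current directory is the root of the Spark checkout.
    removed = remove_stale_jar(".")
    print("removed" if removed else "nothing to remove")
```

Running it a second time is harmless, since it only deletes the file when it is actually present.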

Re: question about Hive compatibility tests

2014-06-18 Thread Michael Armbrust
I assume you are adding tests, because that is the only time you should see that message. That error could mean a couple of things: 1) The query is invalid and Hive threw an exception 2) Your Hive setup is bad. Regarding #2, you need to have the source for Hive 0.12.0 available and built as

Re: question about Hive compatibility tests

2014-06-18 Thread Will Benton
I assume you are adding tests, because that is the only time you should see that message. Yes, I had added the HAVING test to the whitelist. That error could mean a couple of things: 1) The query is invalid and Hive threw an exception 2) Your Hive setup is bad. Regarding #2, you need