creating hive packages for spark

2015-04-27 Thread Manku Timma
Hello Spark developers, I want to understand the procedure to create the org.spark-project.hive jars. Is this documented somewhere? I am having issues with -Phive-provided with my private hive13 jars and want to check if using spark's procedure helps.

Re: creating hive packages for spark

2015-04-27 Thread yash datta
Hi, you can build spark-project hive from here: https://github.com/pwendell/hive/tree/0.13.1-shaded-protobuf Hope this helps. On Mon, Apr 27, 2015 at 3:23 PM, Manku Timma manku.tim...@gmail.com wrote: Hello Spark developers, I want to understand the procedure to create the

Re: Plans for upgrading Hive dependency?

2015-04-27 Thread Punyashloka Biswal
Thanks Marcelo and Patrick - I don't know how I missed that ticket in my Jira search earlier. Is anybody working on the sub-issues yet, or is there a design doc I should look at before taking a stab? Regards, Punya On Mon, Apr 27, 2015 at 3:56 PM Patrick Wendell pwend...@gmail.com wrote: Hey

Re: Plans for upgrading Hive dependency?

2015-04-27 Thread Marcelo Vanzin
That's a lot more complicated than you might think. We've done some basic work to get HiveContext to compile against Hive 1.1.0. Here's the code: https://github.com/cloudera/spark/commit/00e2c7e35d4ac236bcfbcd3d2805b483060255ec We didn't send that upstream because that only solves half of the

Re: Is there any particular reason why there's no Java counterpart in Streaming Guide's Design Patterns for using foreachRDD section?

2015-04-27 Thread Sean Owen
My guess is, since it says "for example (in Scala)", that this started as Scala-only and then Python was tacked on as a one-off, and Java never got added. I think you'd be welcome to add it. It's not an obscure example, and it's one people might want to see in Java. On Mon, Apr 27, 2015 at 4:34 AM, Emre

Exception in using updateStateByKey

2015-04-27 Thread Sea
Hi, all: I use the function updateStateByKey in Spark Streaming. I need to store the states for one minute, so I set spark.cleaner.ttl to 120; the duration is 2 seconds, but it throws an exception: Caused by: org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File does not exist:

java.lang.StackOverflowError when recovery from checkpoint in Streaming

2015-04-27 Thread wyphao.2007
Hi everyone, I am using val messages = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](ssc, kafkaParams, topicsSet) to read data from Kafka (1k/second) and store the data in windows. The code snippet is as follows: val windowedStreamChannel =

Re: Design docs: consolidation and discoverability

2015-04-27 Thread Nicholas Chammas
I like the idea of having design docs be kept up to date and tracked in git. If the Apache repo isn't a good fit, perhaps we can have a separate repo just for design docs? Maybe something like github.com/spark-docs/spark-docs/ ? If there's other stuff we want to track but haven't, perhaps we can

Re: Design docs: consolidation and discoverability

2015-04-27 Thread Punyashloka Biswal
Nick, I like your idea of keeping it in a separate git repository. It seems to combine the advantages of the present Google Docs approach with the crisper history, discoverability, and text format simplicity of GitHub wikis. Punya On Mon, Apr 27, 2015 at 1:30 PM Nicholas Chammas

Re: Design docs: consolidation and discoverability

2015-04-27 Thread Sandy Ryza
My only issue with Google Docs is that they're mutable, so it's difficult to follow a design's history through its revisions and link up JIRA comments with the relevant version. -Sandy On Mon, Apr 27, 2015 at 7:54 AM, Steve Loughran ste...@hortonworks.com wrote: One thing to consider is that

Re: github pull request builder FAIL, now WIN(-ish)

2015-04-27 Thread shane knapp
sure, i'll kill all of the current spark prb build... On Mon, Apr 27, 2015 at 11:34 AM, Reynold Xin r...@databricks.com wrote: Shane - can we purge all the outstanding builds so we are not running stuff against stale PRs? On Mon, Apr 27, 2015 at 11:30 AM, Nicholas Chammas

Re: github pull request builder FAIL, now WIN(-ish)

2015-04-27 Thread shane knapp
never mind, looks like you guys are already on it. :) On Mon, Apr 27, 2015 at 11:35 AM, shane knapp skn...@berkeley.edu wrote: sure, i'll kill all of the current spark prb build... On Mon, Apr 27, 2015 at 11:34 AM, Reynold Xin r...@databricks.com wrote: Shane - can we purge all the

Plans for upgrading Hive dependency?

2015-04-27 Thread Punyashloka Biswal
Dear Spark devs, Is there a plan for staying up-to-date with current (and future) versions of Hive? Spark currently supports version 0.13 (June 2014), but the latest version of Hive is 1.1.0 (March 2015). I don't see any Jira tickets about updating beyond 0.13, so I was wondering if this was

Re: Design docs: consolidation and discoverability

2015-04-27 Thread Punyashloka Biswal
Github's wiki is just another Git repo. If we use a separate repo, it's probably easiest to use the wiki git repo rather than the primary git repo. Punya On Mon, Apr 27, 2015 at 1:50 PM Nicholas Chammas nicholas.cham...@gmail.com wrote: Oh, a GitHub wiki (which is separate from having docs in

Re: github pull request builder FAIL, now WIN(-ish)

2015-04-27 Thread Reynold Xin
Shane - can we purge all the outstanding builds so we are not running stuff against stale PRs? On Mon, Apr 27, 2015 at 11:30 AM, Nicholas Chammas nicholas.cham...@gmail.com wrote: And unfortunately, many Jenkins executor slots are being taken by stale Spark PRs... On Mon, Apr 27, 2015 at

github pull request builder FAIL, now WIN(-ish)

2015-04-27 Thread shane knapp
somehow, the power outage on friday caused the pull request builder to lose its config entirely... i'm not sure why, but after i added the oauth token back, we're now catching up on the weekend's pull request builds. have i mentioned how much i hate this plugin? ;) sorry for the

Re: github pull request builder FAIL, now WIN(-ish)

2015-04-27 Thread shane knapp
anyways, the build queue is SLAMMED... we're going to need at least a day to catch up w/this. i'll be keeping an eye on system loads and whatnot all day today. whee! On Mon, Apr 27, 2015 at 11:18 AM, shane knapp skn...@berkeley.edu wrote: somehow, the power outage on friday caused the pull

Re: github pull request builder FAIL, now WIN(-ish)

2015-04-27 Thread Nicholas Chammas
And unfortunately, many Jenkins executor slots are being taken by stale Spark PRs... On Mon, Apr 27, 2015 at 2:25 PM shane knapp skn...@berkeley.edu wrote: anyways, the build queue is SLAMMED... we're going to need at least a day to catch up w/this. i'll be keeping an eye on system loads and