Re: HIVE SparkSQL

2015-03-18 Thread 宫勐
Hi: I need to count some game player events, such as: how many players stay in game scene 1 ("Save the Princess from a Dragon"), how much money they have paid in the last 5 minutes, how many players pay money to get through this scene, much more esi…

Re: [spark-streaming] can shuffle write to disk be disabled?

2015-03-18 Thread Darren Hoo
Thanks, Shao On Wed, Mar 18, 2015 at 3:34 PM, Shao, Saisai wrote: > Yeah, as I said, your job processing time is much larger than the sliding > window, and streaming jobs are executed one by one in sequence, so the next > job will wait until the first job is finished, so the total latency will be…

Re: GraphX: Get edges for a vertex

2015-03-18 Thread Jeffrey Jedele
Hi Mas, I never actually worked with GraphX, but one idea: As far as I know, you can directly access the vertex and edge RDDs of your Graph object. Why not simply run a .filter() on the edge RDD to get all edges that originate from or end at your vertex? Regards, Jeff 2015-03-18 10:52 GMT+01:00…
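Jeff's suggestion can be sketched in plain Scala. This is a minimal illustration only: a local `Seq` stands in for the graph's edge RDD, and the `Edge` case class here merely mirrors the shape of GraphX's `Edge[ED]` (`srcId`, `dstId`, `attr`); on a real `Graph[VD, ED]` the same predicate would be applied via `graph.edges.filter(...)`.

```scala
// Stand-in for org.apache.spark.graphx.Edge[ED]; illustrative only.
case class Edge[ED](srcId: Long, dstId: Long, attr: ED)

// A local collection standing in for graph.edges.
val edges = Seq(
  Edge(1L, 2L, "follows"),
  Edge(2L, 3L, "follows"),
  Edge(3L, 1L, "likes")
)

val vertexId = 1L
// Keep every edge that originates from or ends at the target vertex.
val incident = edges.filter(e => e.srcId == vertexId || e.dstId == vertexId)
// incident holds the 1 -> 2 and 3 -> 1 edges
```

The same one-line filter is a full scan of the edge set, so it answers mas's question but does not exploit any index; for repeated per-vertex lookups, grouping edges by vertex once may be cheaper.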

Spark SQL weird exception after upgrading from 1.1.1 to 1.2.x

2015-03-18 Thread Roberto Coluccio
Hi everybody, When trying to upgrade from Spark 1.1.1 to Spark 1.2.x (tried both 1.2.0 and 1.2.1) I encounter a weird error that never occurred before, about which I'd kindly ask for any possible help. In particular, all my Spark SQL queries fail with the following exception: java.lang.RuntimeExcepti…

Re: Spark + Kafka

2015-03-18 Thread James King
Thanks Jeff, I'm planning to use it in standalone mode, so I will use the Hadoop 2.4 package. Ciao! On Wed, Mar 18, 2015 at 10:56 AM, Jeffrey Jedele wrote: > What you call "sub-category" are packages pre-built to run on certain > Hadoop environments. It really depends on where you want to run Spark.

Re: Spark + Kafka

2015-03-18 Thread Jeffrey Jedele
What you call "sub-category" are packages pre-built to run on certain Hadoop environments. It really depends on where you want to run Spark. As far as I know, this is mainly about the included HDFS binding, so if you just want to play around with Spark, any of the packages should be fine. I wouldn…

Re: GraphX: Get edges for a vertex

2015-03-18 Thread mas
Hi, Just to continue with the question: I need to find the edges of one particular vertex. However, collectNeighbors/collectNeighborIds provide the neighbors/neighbor IDs for all the vertices of the graph. Any help in this regard will be highly appreciated. Thanks.

Re: Spark + Kafka

2015-03-18 Thread Jeffrey Jedele
Probably 1.3.0 - it has some improvements in the included Kafka receiver for streaming. https://spark.apache.org/releases/spark-release-1-3-0.html Regards, Jeff 2015-03-18 10:38 GMT+01:00 James King : > Hi All, > > Which build of Spark is best when using Kafka? > > Regards > jk >

Spark + Kafka

2015-03-18 Thread James King
Hi All, Which build of Spark is best when using Kafka? Regards jk

Re: Idempotent count

2015-03-18 Thread Arush Kharbanda
Hi Binh, It stores the state as well as the unprocessed data. The state is a subset of the records that you have aggregated so far. This provides a good reference for checkpointing: http://spark.apache.org/docs/1.2.1/streaming-programming-guide.html#checkpointing On Wed, Mar 18, 2015 at 12:52 PM, Binh Nguye…
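The point Arush makes (the checkpoint holds the current state, not every record ever seen) can be simulated in plain Scala. This sketch is illustrative only: the fold below plays the role of Spark Streaming's `updateStateByKey`, and in a real job the checkpoint written via `ssc.checkpoint(dir)` would persist the running-count map, not the full input history. All names here are hypothetical.

```scala
// Update function in the updateStateByKey shape:
// new values for a key are folded into the running count.
def updateState(newValues: Seq[Long], running: Option[Long]): Option[Long] =
  Some(newValues.sum + running.getOrElse(0L))

// Two simulated micro-batches of (key, count) records.
val batches = Seq(
  Seq("a" -> 1L, "b" -> 1L),
  Seq("a" -> 2L)
)

// After each batch only this state map would need checkpointing,
// which is why the checkpoint stays a subset of everything aggregated.
val finalState = batches.foldLeft(Map.empty[String, Long]) { (state, batch) =>
  val grouped = batch.groupBy(_._1).map { case (k, vs) => k -> vs.map(_._2) }
  val keys = state.keySet ++ grouped.keySet
  keys.map { k =>
    k -> updateState(grouped.getOrElse(k, Seq.empty), state.get(k)).get
  }.toMap
}
// finalState: Map("a" -> 3, "b" -> 1)
```

Because the state is a pure function of (previous state, current batch), replaying a batch after recovery reproduces the same counts, which is what makes the count idempotent across failures.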

RE: [spark-streaming] can shuffle write to disk be disabled?

2015-03-18 Thread Shao, Saisai
Yeah, as I said, your job processing time is much larger than the sliding window, and streaming jobs are executed one by one in sequence, so the next job will wait until the first job is finished, so the total latency will be accumulated. I think you need to identify the bottleneck of your job at…
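The accumulation Saisai describes is easy to see numerically. A minimal sketch, using illustrative numbers (a 5-second slide interval and the 22-second per-job processing time reported in this thread):

```scala
// Why latency accumulates when processing time exceeds the slide interval.
// Illustrative numbers: 5 s batch interval, 22 s processing time per job.
val batchIntervalSec = 5.0
val processingTimeSec = 22.0

// Batch i is generated at i * interval, but streaming jobs run one at a
// time, so batch i only finishes after (i + 1) sequential jobs.
def endToEndLatency(batchIndex: Int): Double = {
  val generatedAt = batchIndex * batchIntervalSec
  val finishedAt = (batchIndex + 1) * processingTimeSec
  finishedAt - generatedAt
}

// Latency grows by (22 - 5) = 17 s with every batch.
val latencies = (0 until 4).map(endToEndLatency)
// latencies: Vector(22.0, 39.0, 56.0, 73.0)
```

The queue only stays bounded when processing time is at most the batch interval, which is why the advice in this thread is to find and fix the job's bottleneck rather than tune the shuffle.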

Re: Idempotent count

2015-03-18 Thread Binh Nguyen Van
Hi Arush, Thank you for answering! When you say checkpoints hold metadata and data, what is the data? Is it the data that is pulled from the input source, or is it the state? If it is the state, then is it the same number of records that I have aggregated since the beginning, or only a subset of it? How can I limit t…

Re: [spark-streaming] can shuffle write to disk be disabled?

2015-03-18 Thread Darren Hoo
Hi, Saisai Here is the duration of one of the jobs, 22 seconds in total, it is longer than the sliding window. Stage Id: 342 | Description: foreach at SimpleApp.scala:58 | Submitted: 2015/03/18 15:0…
