RE: DataFrame#show cost 2 Spark Jobs ?

2015-08-25 Thread Cheng, Hao
Ok, I see, thanks for the correction, but this should be optimized. From: Shixiong Zhu [mailto:zsxw...@gmail.com] Sent: Tuesday, August 25, 2015 2:08 PM To: Cheng, Hao Cc: Jeff Zhang; user@spark.apache.org Subject: Re: DataFrame#show cost 2 Spark Jobs ? That's two jobs. `SparkPlan.execut

Re: DataFrame#show cost 2 Spark Jobs ?

2015-08-24 Thread Shixiong Zhu
> 2 jobs, not 2 tasks. > > > > *From:* Shixiong Zhu [mailto:zsxw...@gmail.com] > *Sent:* Tuesday, August 25, 2015 1:29 PM > *To:* Cheng, Hao > *Cc:* Jeff Zhang; user@spark.apache.org > > *Subject:* Re: DataFrame#show cost 2 Spark Jobs ? > > > > Hao, > >

RE: DataFrame#show cost 2 Spark Jobs ?

2015-08-24 Thread Cheng, Hao
ay, August 25, 2015 8:11 AM To: Cheng, Hao Cc: user@spark.apache.org<mailto:user@spark.apache.org> Subject: Re: DataFrame#show cost 2 Spark Jobs ? Hi Cheng, I know that sqlContext.read will trigger one spark job to infer the schema. What I mean is DataFrame#show cost 2 spark jobs. So overa

Re: DataFrame#show cost 2 Spark Jobs ?

2015-08-24 Thread Shixiong Zhu
8:11 AM > *To:* Cheng, Hao > *Cc:* user@spark.apache.org > *Subject:* Re: DataFrame#show cost 2 Spark Jobs ? > > > > Hi Cheng, > > > > I know that sqlContext.read will trigger one spark job to infer the > schema. What I mean is DataFrame#show cost 2 spark

RE: DataFrame#show cost 2 Spark Jobs ?

2015-08-24 Thread Cheng, Hao
loading the data for JSON, it probably causes a longer ramp-up time with a large number of files/partitions. From: Jeff Zhang [mailto:zjf...@gmail.com] Sent: Tuesday, August 25, 2015 8:11 AM To: Cheng, Hao Cc: user@spark.apache.org Subject: Re: DataFrame#show cost 2 Spark Jobs ? Hi Cheng, I

Re: DataFrame#show cost 2 Spark Jobs ?

2015-08-24 Thread Shixiong Zhu
Because defaultMinPartitions is 2 (See https://github.com/apache/spark/blob/642c43c81c835139e3f35dfd6a215d668a474203/core/src/main/scala/org/apache/spark/SparkContext.scala#L2057 ), your input "people.json" will be split to 2 partitions. At first, `take` will start a job for the first partition. H
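Shixiong's point can be sketched outside Spark. The sketch below is a hypothetical, simplified `take(n)` that scans one partition first and launches a further pass only when that partition does not yield enough rows; the names and the one-partition-per-job escalation are illustrative, not Spark's actual internals (Spark scales up how many partitions it tries per job):

```python
def take(partitions, n):
    """Simplified sketch of a take(n) that scans partitions lazily.

    Scans one partition first; if that pass does not yield n rows, it
    launches additional "jobs" over the remaining partitions. Returns
    the rows taken and how many jobs (scan passes) were needed.
    """
    rows, jobs, scanned = [], 0, 0
    while len(rows) < n and scanned < len(partitions):
        jobs += 1                      # each pass over fresh partitions = one job
        part = partitions[scanned]
        scanned += 1
        rows.extend(part[: n - len(rows)])
    return rows, jobs

# Two partitions with one record each: take(20) needs two jobs, matching
# the behaviour described for a people.json split into 2 partitions.
parts = [[{"name": "Michael"}], [{"name": "Andy", "age": 30}]]
rows, jobs = take(parts, 20)
```

With a single partition holding all the data, the same call would finish in one job, which is why the partition count drives the job count here.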

Re: DataFrame#show cost 2 Spark Jobs ?

2015-08-24 Thread Jeff Zhang
Hi Cheng, I know that sqlContext.read will trigger one spark job to infer the schema. What I mean is DataFrame#show cost 2 spark jobs. So overall it would cost 3 jobs. Here's the command I use: >> val df = sqlContext.read.json("file:///Users/hadoop/github/spark/examples/src/main/resources/people

RE: DataFrame#show cost 2 Spark Jobs ?

2015-08-24 Thread Cheng, Hao
The first job is to infer the json schema, and the second one is what you mean of the query. You can provide the schema while loading the json file, like below: sqlContext.read.schema(xxx).json(“…”)? Hao From: Jeff Zhang [mailto:zjf...@gmail.com] Sent: Monday, August 24, 2015 6:20 PM To: user@sp
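The trade-off Hao describes can be sketched in plain Python: inferring a schema needs one full pass over the JSON records before the query itself runs, while a user-supplied schema (as with `sqlContext.read.schema(...).json(...)`) skips that pass. The function names and the pass counter below are illustrative; this is not Spark's implementation:

```python
import json

def infer_schema(lines):
    """One full pass over the data just to discover the columns."""
    cols = set()
    for line in lines:
        cols.update(json.loads(line).keys())
    return sorted(cols)

def run_query(lines, schema, passes=0):
    """Project every record onto the schema; one more pass over the data."""
    rows = [[json.loads(line).get(c) for c in schema] for line in lines]
    return rows, passes + 1

data = ['{"name": "Michael"}', '{"name": "Andy", "age": 30}']

# Without a schema: inference pass + query pass = 2 passes ("jobs").
schema = infer_schema(data)                  # pass 1
rows, passes = run_query(data, schema, passes=1)

# With a schema supplied up front: only the query pass runs.
rows2, passes2 = run_query(data, ["age", "name"])
```

The saving grows with data size, since the inference pass in real Spark has to read every input file to union the observed fields.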