Ok, I see, thanks for the correction, but this should be optimized.
From: Shixiong Zhu [mailto:zsxw...@gmail.com]
Sent: Tuesday, August 25, 2015 2:08 PM
To: Cheng, Hao
Cc: Jeff Zhang; user@spark.apache.org
Subject: Re: DataFrame#show cost 2 Spark Jobs ?
That's two jobs. `SparkPlan.execut
> 2 jobs, not 2 tasks.
>
>
>
> *From:* Shixiong Zhu [mailto:zsxw...@gmail.com]
> *Sent:* Tuesday, August 25, 2015 1:29 PM
> *To:* Cheng, Hao
> *Cc:* Jeff Zhang; user@spark.apache.org
>
> *Subject:* Re: DataFrame#show cost 2 Spark Jobs ?
>
>
>
> Hao,
>
>
Sent: Tuesday, August 25, 2015 8:11 AM
To: Cheng, Hao
Cc: user@spark.apache.org
Subject: Re: DataFrame#show cost 2 Spark Jobs ?
Hi Cheng,
I know that sqlContext.read will trigger one spark job to infer the schema.
What I mean is DataFrame#show cost 2 spark jobs. So overa
8:11 AM
> *To:* Cheng, Hao
> *Cc:* user@spark.apache.org
> *Subject:* Re: DataFrame#show cost 2 Spark Jobs ?
>
>
>
> Hi Cheng,
>
>
>
> I know that sqlContext.read will trigger one spark job to infer the
> schema. What I mean is DataFrame#show cost 2 spark
loading the data for JSON, it probably causes a longer ramp-up time with a
large number of files/partitions.
From: Jeff Zhang [mailto:zjf...@gmail.com]
Sent: Tuesday, August 25, 2015 8:11 AM
To: Cheng, Hao
Cc: user@spark.apache.org
Subject: Re: DataFrame#show cost 2 Spark Jobs ?
Hi Cheng,
I
Because defaultMinPartitions is 2 (see
https://github.com/apache/spark/blob/642c43c81c835139e3f35dfd6a215d668a474203/core/src/main/scala/org/apache/spark/SparkContext.scala#L2057
), your input "people.json" will be split into 2 partitions.
At first, `take` will start a job for the first partition. H
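To illustrate how the jobs add up, here is a minimal sketch in plain Scala (no Spark needed). It is a simplified model, not Spark's actual code: it assumes `take` first runs one job over the first partition and, if that did not return enough rows, runs one more job over all remaining partitions. The object `TakeJobs`, its methods and any sample rows are made up for illustration. Note that `show()` asks for 20 rows by default.

```scala
// Simplified model of an incremental take; names here are hypothetical.
object TakeJobs {
  // Mirrors the rule behind the linked line: never more than 2 by default.
  def defaultMinPartitions(defaultParallelism: Int): Int =
    math.min(defaultParallelism, 2)

  /** Returns the collected rows and the number of jobs that were needed. */
  def take(partitions: Seq[Seq[String]], n: Int): (Seq[String], Int) = {
    var collected = Vector.empty[String]
    var scanned = 0
    var jobs = 0
    while (collected.size < n && scanned < partitions.size) {
      // Assumption: first job scans only the first partition,
      // the next job scans everything that is left.
      val toTry = if (scanned == 0) 1 else partitions.size - scanned
      val batch = partitions.slice(scanned, scanned + toTry)
      collected ++= batch.flatten.take(n - collected.size)
      scanned += toTry
      jobs += 1
    }
    (collected, jobs)
  }
}
```

For example, `TakeJobs.take(Seq(Seq("row1", "row2"), Seq("row3")), 20)` reports 2 jobs, because the first partition alone cannot supply 20 rows, while asking for a single row reports 1 job.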
Hi Cheng,
I know that sqlContext.read will trigger one Spark job to infer the schema.
What I mean is that DataFrame#show costs 2 Spark jobs, so overall it would cost 3
jobs.
Here's the command I use:
>> val df =
sqlContext.read.json("file:///Users/hadoop/github/spark/examples/src/main/resources/people
The first job infers the JSON schema, and the second one runs the query you
mean.
You can provide the schema while loading the JSON file, like below:
sqlContext.read.schema(xxx).json("…")
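A slightly fuller sketch of that suggestion, meant for spark-shell (where `sqlContext` is already defined). The field names and types (`name`, `age`) and the path are assumptions for illustration only; adjust them to the real file.

```scala
import org.apache.spark.sql.types.{LongType, StringType, StructField, StructType}

// Assumed schema for illustration. With an explicit schema, Spark does not
// need the extra job that scans the file to infer it.
val peopleSchema = StructType(Seq(
  StructField("name", StringType, nullable = true),
  StructField("age", LongType, nullable = true)))

// Hypothetical path; replace it with the real location of the JSON file.
val path = "file:///path/to/people.json"
val df = sqlContext.read.schema(peopleSchema).json(path)
df.show()
```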
Hao
From: Jeff Zhang [mailto:zjf...@gmail.com]
Sent: Monday, August 24, 2015 6:20 PM
To: user@sp