> From: Michael Armbrust [mailto:mich...@databricks.com]
> Sent: Monday, August 24, 2015 2:13 PM
> To: Philip Weaver
> Cc: Jerrick Hoang; Raghavendra Pandey <raghavendra.pan...@gmail.com>; User; Cheng, Hao <hao.ch...@intel.com>
> Subject: Re: Spark Sql behaves strangely with tables with a lot of partitions
And also, it will be great if you can paste the physical plan for the simple
query.
From: Jerrick Hoang [mailto:jerrickho...@gmail.com]
Sent: Thursday, August 20, 2015 1:46 PM
To: Cheng, Hao
Cc: Philip Weaver; user
Subject: Re: Spark Sql behaves strangely with tables with a lot of partitions
Philip Weaver <philip.wea...@gmail.com> wrote:
> I hadn't heard of spark.sql.sources.partitionDiscovery.enabled before, and
> I couldn't find much information about it online. What does it do?
> I cloned from TOT after 1.5.0 cut off. I noticed there were a couple of
> CLs trying to speed up spark sql with tables with a huge number of
> partitions; I've made sure that those...
I'm wondering if the driver is busy with scanning the HDFS / S3. Like jstack
the driver?
And also, it will be great if you can paste the physical plan for the simple
query.
Yes, you can try setting spark.sql.sources.partitionDiscovery.enabled to false.
BTW, which version are you using?
Hao
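For reference, a session-level conf like the one named above is usually flipped with a SQL SET statement, or the matching line in spark-defaults.conf; a minimal sketch, assuming your Spark build accepts this property:

```sql
-- Session-level: disable eager partition discovery for data-source tables.
-- (Property name as given above; whether it is honored depends on the build.)
SET spark.sql.sources.partitionDiscovery.enabled=false;
```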
From: Jerrick Hoang [mailto:jerrickho...@gmail.com]
Sent: Thursday, August 20, 2015 12:16 PM
To: Philip Weaver
Cc: user
Subject: Re: Spark Sql behaves strangely with tables with a lot of partitions
I guess the question is why does spark have to do partition discovery with
all partitions when the query only needs to look at one partition? Is there
a conf flag to turn this off?
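The question above can be pictured with plain directories standing in for a partitioned parquet table; a minimal sketch (the layout mirrors Hive-style partitioning, but nothing here is Spark internals):

```python
# Toy contrast between eager partition discovery and partition pruning,
# using plain directories in place of a real parquet table.
import os
import tempfile

root = tempfile.mkdtemp()

# Hive-style layout: one sub-directory per partition value.
for day in range(20140701, 20140731):
    os.makedirs(os.path.join(root, "date=%d" % day))

# Eager discovery: enumerate every partition directory up front,
# regardless of what the query actually asks for.
discovered = sorted(d for d in os.listdir(root) if d.startswith("date="))

# Pruning: the predicate `date=20140701` names exactly one directory,
# so only that single path needs to be touched.
pruned = os.path.join(root, "date=20140701")

print(len(discovered))        # 30
print(os.path.isdir(pruned))  # True
```

The cost of the first step grows with the total number of partitions, which matches the slowdown described in this thread even for a query that reads only one partition.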
On Wed, Aug 19, 2015 at 9:02 PM, Philip Weaver <philip.wea...@gmail.com> wrote:
I've had the same problem. It turns out that Spark (specifically parquet)
is very slow at partition discovery. It got better in 1.5 (not yet
released), but was still unacceptably slow. Sadly, we ended up reading
parquet files manually in Python (via C++) and had to abandon Spark SQL
because of this.
Hi all,
I did a simple experiment with Spark SQL. I created a partitioned parquet
table with only one partition (date=20140701). A simple `select count(*)
from table where date=20140701` would run very fast (0.1 seconds). However,
as I added more partitions the query takes longer and longer.