Re: DataFrame creation delay?

Michael Armbrust Fri, 04 Sep 2015 12:55:51 -0700

What format is this table in? For Parquet and other optimized formats, we
cache a bunch of file metadata on first access to make interactive queries
faster, so the first access can take noticeably longer than later ones.
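
A quick way to check whether first-access caching explains the delay is to
time the same statement twice in one shell session. The sketch below is only
an illustration: `Timing` and `timed` are hypothetical helper names, not part
of Spark, and the commented usage assumes a spark-shell session where
`sqlContext` is already defined and `temp.log` is the table from the original
message.

```scala
// Hypothetical helper (not a Spark API): evaluates a block once and
// returns its result together with the elapsed wall-clock time in ms.
object Timing {
  def timed[A](block: => A): (A, Long) = {
    val start = System.nanoTime()
    val result = block // the by-name argument is evaluated here
    val elapsedMs = (System.nanoTime() - start) / 1000000L
    (result, elapsedMs)
  }
}

// Assumed usage inside spark-shell (sqlContext is provided by the shell):
// val (logs, firstMs) = Timing.timed(sqlContext.sql("select * from temp.log"))
// val (_, secondMs)   = Timing.timed(sqlContext.sql("select * from temp.log"))
// println(s"first: $firstMs ms, second: $secondMs ms")
```

If the second call comes back much faster than the first, that would be
consistent with the cost being one-time metadata loading rather than anything
tied to the query itself.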


On Thu, Sep 3, 2015 at 8:17 PM, Isabelle Phan <nlip...@gmail.com> wrote:

> Hello,
>
> I am using SparkSQL to query some Hive tables. Most of the time, when I
> create a DataFrame using the sqlContext.sql("select * from table") command,
> DataFrame creation takes less than 0.5 seconds.
> But I have this one table for which it takes almost 12 seconds!
>
> scala>  val start = scala.compat.Platform.currentTime; val logs =
> sqlContext.sql("select * from temp.log"); val execution =
> scala.compat.Platform.currentTime - start
> 15/09/04 12:07:02 INFO ParseDriver: Parsing command: select * from temp.log
> 15/09/04 12:07:02 INFO ParseDriver: Parse Completed
> start: Long = 1441336022731
> logs: org.apache.spark.sql.DataFrame = [user_id: string, option: int,
> log_time: string, tag: string, dt: string, test_id: int]
> execution: Long = *11567*
>
> This table has 3.6 billion rows and is partitioned on 2 columns (dt and
> test_id). I have created DataFrames on even larger tables and did not see
> such a delay.
> So my questions are:
> - What can impact DataFrame creation time?
> - Is it related to the table partitions?
>
>
> Thanks much for your help!
>
> Isabelle
>
