What format is this table. For parquet and other optimized formats we cache a bunch of file metadata on first access to make interactive queries faster.
On Thu, Sep 3, 2015 at 8:17 PM, Isabelle Phan <nlip...@gmail.com> wrote: > Hello, > > I am using SparkSQL to query some Hive tables. Most of the time, when I > create a DataFrame using sqlContext.sql("select * from table") command, > DataFrame creation is less than 0.5 second. > But I have this one table with which it takes almost 12 seconds! > > scala> val start = scala.compat.Platform.currentTime; val logs = > sqlContext.sql("select * from temp.log"); val execution = > scala.compat.Platform.currentTime - start > 15/09/04 12:07:02 INFO ParseDriver: Parsing command: select * from temp.log > 15/09/04 12:07:02 INFO ParseDriver: Parse Completed > start: Long = 1441336022731 > logs: org.apache.spark.sql.DataFrame = [user_id: string, option: int, > log_time: string, tag: string, dt: string, test_id: int] > execution: Long = *11567* > > This table has 3.6 B rows, and 2 partitions (on dt and test_id columns). > I have created DataFrames on even larger tables and do not see such delay. > So my questions are: > - What can impact DataFrame creation time? > - Is it related to the table partitions? > > > Thanks much your help! > > Isabelle >