Hi,

How can I control the number of Parquet files created under each
partition? I use the sqlContext queries below to create a table and insert
the records. They produce around 250 Parquet files under each partition,
though I expected around 2 or 3. Because of the large number of files,
scanning the records takes a long time. Any suggestions on how to control
the number of Parquet files under each partition would be a great help.

     sqlContext.sql(
       """CREATE EXTERNAL TABLE IF NOT EXISTS testUserDts
         |(userId STRING, savedDate STRING)
         |PARTITIONED BY (partitioner STRING)
         |STORED AS PARQUET LOCATION '/user/testId/testUserDts'""".stripMargin)

     sqlContext.sql(
       """FROM testUserDtsTemp ps
         |INSERT OVERWRITE TABLE testUserDts PARTITION (partitioner)
         |SELECT ps.userId, ps.savedDate, ps.partitioner""".stripMargin)
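
A possible way to reduce the file count, sketched here under assumptions
rather than as a verified fix: if each write task emits its own file for
every partition value it sees, then shuffling the rows with DISTRIBUTE BY on
the partition column should send all rows of one partition value to a single
task, so that each partition ends up with roughly one file. The sketch
assumes a Hive-enabled context (HiveContext), rewrites the insert in the
plain INSERT ... SELECT form, and enables dynamic partitioning explicitly;
the object name CompactPartitionFiles is hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

// Hypothetical wrapper object; only the table and column names come from the post.
object CompactPartitionFiles {

  // DISTRIBUTE BY hashes rows on the partition column before the write,
  // so rows with the same partitioner value reach the same task.
  val insertSql: String =
    """INSERT OVERWRITE TABLE testUserDts PARTITION (partitioner)
      |SELECT userId, savedDate, partitioner
      |FROM testUserDtsTemp
      |DISTRIBUTE BY partitioner""".stripMargin

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("CompactPartitionFiles"))
    val sqlContext = new HiveContext(sc) // assumption: Hive support is available

    // Assumption: dynamic partitioning is not already enabled in the session.
    sqlContext.sql("SET hive.exec.dynamic.partition=true")
    sqlContext.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

    sqlContext.sql(insertSql)
    sc.stop()
  }
}
```

A trade-off of this approach is that a heavily skewed partition value puts
all of its rows on one task, so the write for that partition is not
parallelized.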



Thanks!



