Re: Data frame created from hive table and its partition

2015-08-20 Thread VIJAYAKUMAR JAWAHARLAL
Thanks Michael. My bad regarding Hive table primary keys. I have one big 140 GB HDFS file with an external Hive table defined on it. The table is not partitioned. When I read the external Hive table using sqlContext.sql, how does Spark decide the number of partitions that should be created for that data frame?

Re: Data frame created from hive table and its partition

2015-08-20 Thread Michael Armbrust
There is no such thing as primary keys in the Hive metastore, but Spark SQL does support partitioned Hive tables: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-PartitionedTables DataFrameWriter also has a partitionBy method.

Data frame created from hive table and its partition

2015-08-20 Thread VIJAYAKUMAR JAWAHARLAL
Hi, I have a question regarding data frame partitioning. I read a Hive table from Spark, and the following Spark API converts it to a DataFrame: test_df = sqlContext.sql("select * from hivetable1"). How does Spark decide the partitioning of test_df? Is there a way to partition test_df based on some column while reading it?