Re: Does JavaSchemaRDD inherit the Hive partitioning of data?
Any suggestions, guys?

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Does-JavaSchemaRDD-inherit-the-Hive-partitioning-of-data-tp17410p17539.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
Re: Does JavaSchemaRDD inherit the Hive partitioning of data?
DISTRIBUTE BY only promises that rows with the same value will be collocated; it does not create a separate partition for each value. You are probably looking for Dynamic Partitions (https://cwiki.apache.org/confluence/display/Hive/DynamicPartitions), support for which was recently merged into HiveContext.

On Tue, Oct 28, 2014 at 11:49 AM, nitinkak001 <nitinkak...@gmail.com> wrote:
> Any suggestions guys??
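
To illustrate the point about DISTRIBUTE BY made in this reply, here is a toy model in plain Python. It is not Hive or Spark code: the `distribute_by` helper, the bucket count, and the sample rows are all made up for the sketch. It only shows that hash-style distribution keeps rows with equal keys together while still letting several distinct keys share a bucket.

```python
from collections import defaultdict


def distribute_by(rows, key_fn, num_buckets):
    """Toy model of hash distribution: rows with the same key share a bucket."""
    buckets = defaultdict(list)
    for row in rows:
        # Small ints hash to themselves in Python, so this example is deterministic.
        buckets[hash(key_fn(row)) % num_buckets].append(row)
    return dict(buckets)


# Hypothetical sample data: (key, payload) pairs with three distinct keys.
rows = [(1, "a"), (2, "b"), (3, "c"), (1, "d"), (3, "e")]
buckets = distribute_by(rows, key_fn=lambda r: r[0], num_buckets=2)

# Three distinct keys, but only two buckets: keys 1 and 3 end up together.
keys_per_bucket = {b: sorted({r[0] for r in rs}) for b, rs in buckets.items()}
```

Every row with key 1 lands in the same bucket, which is the collocation guarantee, yet that bucket also holds key 3, so there is no one-partition-per-value guarantee.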
Re: Does JavaSchemaRDD inherit the Hive partitioning of data?
So, this means that I can create a table and insert data into it with dynamic partitioning, and those partitions would be inherited by RDDs? Is it in Spark 1.1.0? If not, is there a way to partition the data in a file based on some attributes of the rows in the data (without hardcoding the number of partitions)?
Re: Does JavaSchemaRDD inherit the Hive partitioning of data?
This feature is not in 1.1, and it is not going to promise one file per unique value of the data. The only way to do that would be to write your own partitioner (http://stackoverflow.com/questions/23127329/how-to-define-custom-partitioner-for-spark-rdds-of-equally-sized-partition-where).

On Tue, Oct 28, 2014 at 1:57 PM, nitinkak001 <nitinkak...@gmail.com> wrote:
> So, this means that I can create a table and insert data into it with
> dynamic partitioning, and those partitions would be inherited by RDDs?
> Is it in Spark 1.1.0? If not, is there a way to partition the data in a
> file based on some attributes of the rows in the data (without
> hardcoding the number of partitions)?
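
A minimal sketch of the "write your own partitioner" idea, in plain Python so it runs without a cluster. The helper names `build_value_partitioner` and `partition` and the sample pairs are hypothetical, and this is one possible approach rather than the method from the linked answer: it derives the partition count from the distinct keys in the data, so nothing is hardcoded, and gives each distinct key its own partition.

```python
def build_value_partitioner(keys):
    """Give each distinct key its own partition index.

    Returns (num_partitions, partition_func); the partition count is
    derived from the data instead of being hardcoded.
    """
    index = {k: i for i, k in enumerate(sorted(set(keys)))}
    return len(index), lambda key: index[key]


def partition(pairs, num_partitions, partition_func):
    """Place each (key, value) pair into the partition chosen for its key."""
    parts = [[] for _ in range(num_partitions)]
    for key, value in pairs:
        parts[partition_func(key)].append((key, value))
    return parts


# Hypothetical sample data keyed by the attribute to partition on.
pairs = [("us", 1), ("de", 2), ("us", 3), ("in", 4)]
num_partitions, partition_func = build_value_partitioner(k for k, _ in pairs)
parts = partition(pairs, num_partitions, partition_func)
```

In PySpark, a pair like this could be handed to `RDD.partitionBy(numPartitions, partitionFunc)` on a key-value RDD; in the Scala/Java API the analogue would be a subclass of `org.apache.spark.Partitioner` implementing `numPartitions` and `getPartition`. Either way the distinct keys have to be gathered first (for example with `rdd.keys().distinct().collect()`), which is an assumption that only holds up when the number of distinct values is modest.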