Re: Does JavaSchemaRDD inherit the Hive partitioning of data?

2014-10-28 Thread nitinkak001
Any suggestions guys??



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Does-JavaSchemaRDD-inherit-the-Hive-partitioning-of-data-tp17410p17539.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Does JavaSchemaRDD inherit the Hive partitioning of data?

2014-10-28 Thread Michael Armbrust
DISTRIBUTE BY only promises that rows with the same value will be co-located;
it does not create a separate partition for each value.  You are probably
looking for dynamic partitions
(https://cwiki.apache.org/confluence/display/Hive/DynamicPartitions), support
for which was recently merged into HiveContext.
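
To illustrate the distinction above, here is a minimal plain-Java sketch (no
Spark or Hive dependency) of hash-based distribution. The class name
DistributeByDemo, the method names, and the "hash modulo bucket count" rule are
assumptions made for illustration only; they are not the actual Hive or Spark
SQL implementation. The point is that equal keys always land in the same
bucket, while several different keys may share one bucket.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical simulation of hash-based distribution; not Hive or Spark code.
final class DistributeByDemo {

    // Assumed routing rule: non-negative hash of the key modulo the bucket count.
    static int bucketFor(String key, int numBuckets) {
        return Math.floorMod(key.hashCode(), numBuckets);
    }

    // Groups every key occurrence under the bucket it is routed to.
    static Map<Integer, List<String>> distribute(List<String> keys, int numBuckets) {
        Map<Integer, List<String>> buckets = new TreeMap<>();
        for (String key : keys) {
            buckets.computeIfAbsent(bucketFor(key, numBuckets), b -> new ArrayList<>())
                   .add(key);
        }
        return buckets;
    }

    public static void main(String[] args) {
        // Five distinct values, but only three buckets: some values must share a bucket.
        List<String> keys = Arrays.asList("US", "IN", "DE", "US", "FR", "IN", "BR");
        System.out.println(distribute(keys, 3));
    }
}
```

With five distinct values and three buckets, at least one bucket necessarily
holds more than one value, which is why co-location alone does not give one
partition per value.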

On Tue, Oct 28, 2014 at 11:49 AM, nitinkak001 nitinkak...@gmail.com wrote:

> Any suggestions guys??







Re: Does JavaSchemaRDD inherit the Hive partitioning of data?

2014-10-28 Thread nitinkak001
So this means that I can create a table, insert data into it with dynamic
partitioning, and those partitions would be inherited by RDDs? Is this in
Spark 1.1.0?

If not, is there a way to partition the data in a file based on some
attributes of the rows (without hardcoding the number of
partitions)?





--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Does-JavaSchemaRDD-inherit-the-Hive-partitioning-of-data-tp17410p17558.html



Re: Does JavaSchemaRDD inherit the Hive partitioning of data?

2014-10-28 Thread Michael Armbrust
This feature is not in 1.1, and it will not guarantee one file per unique
value of the data.  The only way to do that would be to write your own
partitioner
(http://stackoverflow.com/questions/23127329/how-to-define-custom-partitioner-for-spark-rdds-of-equally-sized-partition-where).
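
As a rough sketch of the custom-partitioner idea, the plain-Java class below
assigns one partition index per distinct key, so the partition count comes
from the data rather than being hardcoded. It mirrors the two-method contract
of org.apache.spark.Partitioner (numPartitions and getPartition) but
deliberately does not extend it, so that it runs without Spark on the
classpath. The class name DistinctValuePartitioner and the sample keys are
hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: one partition per distinct key. In a real job this logic
// would live in a subclass of org.apache.spark.Partitioner (an assumption about
// how you would wire it in, not code taken from this thread).
final class DistinctValuePartitioner {
    // Maps each distinct key to its partition index, in first-seen order.
    private final Map<Object, Integer> indexByKey = new LinkedHashMap<>();

    DistinctValuePartitioner(Iterable<?> keys) {
        for (Object key : keys) {
            // Duplicates are ignored; a new key takes the next free index.
            indexByKey.putIfAbsent(key, indexByKey.size());
        }
    }

    // The partition count is derived from the data, not hardcoded.
    int numPartitions() {
        return indexByKey.size();
    }

    // Returns the partition for a key seen at construction time.
    int getPartition(Object key) {
        Integer index = indexByKey.get(key);
        if (index == null) {
            throw new IllegalArgumentException("Unknown key: " + key);
        }
        return index;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("US", "IN", "DE", "US");
        DistinctValuePartitioner partitioner = new DistinctValuePartitioner(keys);
        System.out.println("partitions = " + partitioner.numPartitions());
        for (String key : keys) {
            System.out.println(key + " -> " + partitioner.getPartition(key));
        }
    }
}
```

In Spark, one possible approach (an assumption, not something confirmed in
this thread) would be to collect the distinct keys of a pair RDD on the
driver, build such a partitioner from them, and pass it to partitionBy. This
only works when the set of distinct keys is small enough to collect.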

On Tue, Oct 28, 2014 at 1:57 PM, nitinkak001 nitinkak...@gmail.com wrote:

> So this means that I can create a table, insert data into it with dynamic
> partitioning, and those partitions would be inherited by RDDs? Is this in
> Spark 1.1.0?
>
> If not, is there a way to partition the data in a file based on some
> attributes of the rows (without hardcoding the number of
> partitions)?




