Re: Does JavaSchemaRDD inherit the Hive partitioning of data?

2014-10-28 Thread nitinkak001
Any suggestions, guys?



Re: Does JavaSchemaRDD inherit the Hive partitioning of data?

2014-10-28 Thread Michael Armbrust
DISTRIBUTE BY only promises that data will be collocated; it does not create
a partition for each value. You are probably looking for Dynamic Partitions
(https://cwiki.apache.org/confluence/display/Hive/DynamicPartitions), which
was recently merged into HiveContext.
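
For anyone who finds this thread later, here is a minimal sketch of what a
dynamic-partition insert through the Java API might look like once that
support is available. The partitioned table name, its column list, and the
use of JavaHiveContext.hql here are assumptions for illustration, not a
description of a released API:

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.sql.hive.api.java.JavaHiveContext;

    public class DynamicPartitionInsert {
      public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(
            new SparkConf().setAppName("dynamic-partition-insert"));
        JavaHiveContext hiveCtx = new JavaHiveContext(sc);

        // Standard Hive settings that permit fully dynamic partitioning.
        hiveCtx.hql("SET hive.exec.dynamic.partition = true");
        hiveCtx.hql("SET hive.exec.dynamic.partition.mode = nonstrict");

        // Hypothetical target table, partitioned by the two geo columns.
        hiveCtx.hql("CREATE TABLE IF NOT EXISTS spark_poc.table_name_part "
            + "(ip_address STRING, cookie_id STRING) "
            + "PARTITIONED BY (geo_region STRING, geo_country STRING)");

        // Hive derives one partition per distinct (geo_region, geo_country)
        // pair in the SELECT output; the partition columns must come last.
        hiveCtx.hql("INSERT OVERWRITE TABLE spark_poc.table_name_part "
            + "PARTITION (geo_region, geo_country) "
            + "SELECT ip_address, cookie_id, geo_region, geo_country "
            + "FROM spark_poc.table_name");

        sc.stop();
      }
    }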



Re: Does JavaSchemaRDD inherit the Hive partitioning of data?

2014-10-28 Thread nitinkak001
So, this means that I can create a table and insert data into it with dynamic
partitioning, and those partitions would be inherited by the RDDs. Is this in
Spark 1.1.0?

If not, is there a way to partition the data in a file based on some
attributes of the rows in the data (without hardcoding the number of
partitions)?



Re: Does JavaSchemaRDD inherit the Hive partitioning of data?

2014-10-28 Thread Michael Armbrust
This feature is not in 1.1, and it is not going to promise one file per
unique value of the data. The only way to do that would be to write your own
partitioner
(http://stackoverflow.com/questions/23127329/how-to-define-custom-partitioner-for-spark-rdds-of-equally-sized-partition-where).
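
For the record, a minimal sketch of such a partitioner in Java. The composite
string key and the idea of collecting the distinct keys up front are
illustrative assumptions, not a prescription:

    import java.util.List;
    import org.apache.spark.Partitioner;

    // Routes each record of a pair RDD to a partition chosen by its key,
    // e.g. a composite "GEO_REGION|GEO_COUNTRY" string, so that there is
    // one partition per known key value.
    public class KeyListPartitioner extends Partitioner {
      private final List<String> keys;  // distinct keys, collected up front

      public KeyListPartitioner(List<String> keys) {
        this.keys = keys;
      }

      @Override
      public int numPartitions() {
        return keys.size();
      }

      @Override
      public int getPartition(Object key) {
        int idx = keys.indexOf(String.valueOf(key));
        return idx >= 0 ? idx : 0;  // route unknown keys to partition 0
      }
    }

You would key the rows by that composite value (as a JavaPairRDD) and call
partitionBy(new KeyListPartitioner(distinctKeys)). Note that even this gives
one partition per value, not one output file per value, unless each partition
is also written out separately.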



Does JavaSchemaRDD inherit the Hive partitioning of data?

2014-10-27 Thread nitinkak001
Would the RDD resulting from the query below be partitioned on GEO_REGION and
GEO_COUNTRY? I ran some tests (using mapPartitions on the resulting RDD), and
it seems that 50 partitions are always generated, while there should be
around 1000.

    SELECT * FROM spark_poc.table_name
    DISTRIBUTE BY GEO_REGION, GEO_COUNTRY
    SORT BY IP_ADDRESS, COOKIE_ID

If not, how can I partition the data based on an attribute or a combination
of attributes in the data?
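
For reference, the mapPartitions check described above can be run roughly
like this (a sketch against the Spark 1.1 Java API; the app name and context
setup are assumptions):

    import java.util.Collections;
    import java.util.Iterator;
    import java.util.List;
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.api.java.function.FlatMapFunction;
    import org.apache.spark.sql.api.java.JavaSchemaRDD;
    import org.apache.spark.sql.api.java.Row;
    import org.apache.spark.sql.hive.api.java.JavaHiveContext;

    public class PartitionCountCheck {
      public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(
            new SparkConf().setAppName("partition-count-check"));
        JavaHiveContext hiveCtx = new JavaHiveContext(sc);

        JavaSchemaRDD rdd = hiveCtx.hql(
            "SELECT * FROM spark_poc.table_name "
                + "DISTRIBUTE BY GEO_REGION, GEO_COUNTRY "
                + "SORT BY IP_ADDRESS, COOKIE_ID");

        // Emit one row count per partition, then collect the counts,
        // so counts.size() is the number of partitions.
        List<Integer> counts = rdd.mapPartitions(
            new FlatMapFunction<Iterator<Row>, Integer>() {
              @Override
              public Iterable<Integer> call(Iterator<Row> rows) {
                int n = 0;
                while (rows.hasNext()) { rows.next(); n++; }
                return Collections.singletonList(n);
              }
            }).collect();

        System.out.println(counts.size() + " partitions; sizes: " + counts);
        sc.stop();
      }
    }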


