RE: Phoenix-Spark: Number of partitions in PhoenixRDD

2016-04-19 Thread Fustes, Diego
Hi Diego, The phoenix-spark RDD partition count is equal to the number of splits that the query planner returns. Adjusting the HBase region splits, table salting [1], as well as the guidepost width [2] should help

Re: Phoenix-Spark: Number of partitions in PhoenixRDD

2016-04-18 Thread Josh Mahonin
Hi Diego,

The phoenix-spark RDD partition count is equal to the number of splits that the query planner returns. Adjusting the HBase region splits, table salting [1], as well as the guidepost width [2] should help with the parallelization here. Using 'EXPLAIN' for the generated query in sqlline
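To act on the EXPLAIN suggestion, one option is to paste the plan text from sqlline into a small helper and read off the planner's chunk count. The sketch below assumes that the plan contains a token of the form `N-CHUNK` (as in `CLIENT 36-CHUNK PARALLEL 36-WAY FULL SCAN OVER EXAMPLE`, a hypothetical line) and that this chunk count is a reasonable proxy for the number of splits, and therefore RDD partitions, that phoenix-spark will produce. Both are assumptions, not confirmed in this thread; `chunk_count` is a hypothetical helper name.

```python
import re
from typing import Iterable, Optional

# Assumption: Phoenix EXPLAIN output reports scan parallelism as "N-CHUNK",
# e.g. "CLIENT 36-CHUNK PARALLEL 36-WAY FULL SCAN OVER EXAMPLE".
_CHUNK_RE = re.compile(r"\b(\d+)-CHUNK\b")


def chunk_count(plan_lines: Iterable[str]) -> Optional[int]:
    """Return N from the first plan line containing an 'N-CHUNK' token.

    Hypothetical helper: treats the chunk count as an approximation of the
    number of splits the query planner returns. Returns None if no plan
    line reports a chunk count.
    """
    for line in plan_lines:
        match = _CHUNK_RE.search(line)
        if match:
            return int(match.group(1))
    return None
```

Comparing this number before and after changing region splits, salting, or the guidepost width (followed by recollecting statistics) gives a quick check on whether the change actually increased the planner's parallelism, without launching a Spark job.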

Phoenix-Spark: Number of partitions in PhoenixRDD

2016-04-18 Thread Fustes, Diego
Hi all,

I'm working with the Phoenix Spark plugin to process a HUGE table. The table is salted into 100 buckets and split into 400 regions. When I read it with phoenixTableAsRDD, I get an RDD with 150 partitions. These partitions are too big, so I am getting OutOfMemory problems.