RE: How are DataFrames partitioned by default when using Spark?

2016-09-29 Thread Long, Xindian
Hi, Josh: Thanks for the reply. I still have some questions/comments.

> The phoenix-spark integration inherits the underlying splits provided by Phoenix, which is a function of the HBase regions, salting, and other aspects determined by the Phoenix Query Planner.

XD: Is there any documentation on …
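The salting mentioned above refers to Phoenix prepending a salt byte, derived from a hash of the row key modulo the table's SALT_BUCKETS setting, so that writes spread across pre-split regions. A minimal sketch of that idea (the hash function and bucket count below are illustrative stand-ins, not Phoenix's exact implementation):

```python
# Illustrative sketch of Phoenix-style row-key salting: a salt byte
# (hash of the key modulo the bucket count) is prepended to the row key,
# spreading rows across SALT_BUCKETS pre-split regions.
# The hash below is a stand-in, not Phoenix's actual algorithm.

SALT_BUCKETS = 4  # hypothetical table setting

def salt_byte(row_key: bytes, buckets: int = SALT_BUCKETS) -> int:
    # Stand-in hash of the key bytes; Phoenix uses its own hash function.
    return sum(row_key) % buckets

def salted_key(row_key: bytes) -> bytes:
    # The stored key is the salt byte followed by the original row key.
    return bytes([salt_byte(row_key)]) + row_key

for k in [b"row-001", b"row-002", b"row-003", b"row-004"]:
    print(k, "-> bucket", salt_byte(k))
```

Because each salt bucket corresponds to at least one region, a salted table yields at least SALT_BUCKETS splits, which phoenix-spark then inherits as partitions.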

How are DataFrames partitioned by default when using Spark?

2016-09-19 Thread Long, Xindian
How are DataFrames/Datasets/RDDs partitioned by default when using Spark, assuming the DataFrame/Dataset/RDD is the result of a query like: SELECT col1, col2, col3 FROM table3 WHERE col3 > xxx? I noticed that for HBase, a partitioner partitions the row keys based on region splits. Can …
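The HBase-style behavior described here, where each partition covers one region's contiguous row-key range bounded by the region split points, can be sketched with a toy partitioner. The split points and row keys below are made-up examples, not from a real table:

```python
import bisect

# Toy model of an HBase-region-style partitioner: each partition owns
# a contiguous row-key range whose boundaries are the region split points.
# With 3 split points there are 4 regions, hence 4 partitions.

split_points = [b"g", b"n", b"t"]  # made-up region split keys

def partition_for(row_key: bytes) -> int:
    # bisect_right: a key equal to a split point goes to the next region,
    # matching the convention that a split key starts a new region.
    return bisect.bisect_right(split_points, row_key)

for key in [b"apple", b"grape", b"melon", b"zebra"]:
    print(key, "-> partition", partition_for(key))
```

A connector that inherits these splits produces one Spark partition per region, so the default partition count follows the table's region layout rather than any Spark-side setting.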