Re: Get region for row key

2016-07-12 Thread James Taylor
Take a look at our MR integration, CSV bulk loader, and other tools. The typical way to get the row key is to run an UPSERT VALUES statement with bind parameters for the columns in the primary key. Then bind them to values and use the PhoenixRuntime.getUncommittedDataIterator(Connection conn) to

Re: Get region for row key

2016-07-12 Thread Simon Wang
Thanks, James. I will file a JIRA. Actually, I spent some time a few days ago trying to make Phoenix-Hive compatible with 0.13-1, but it did not seem easy. So before 0.13-1 is supported, we may want to use the workflow I proposed as a temporary workaround for 4.7. It would be great to know if these

Re: Get region for row key

2016-07-12 Thread James Taylor
If our Hive support would solve your use case, perhaps you could look into supporting Hive 0.13-1. I'm not sure of the level of effort, as others contributed this integration. How about filing a JIRA to discuss? We'll have an RC up for 4.8 in the next day or so. Thanks, James On Tue, Jul 12,

Re: Get region for row key

2016-07-12 Thread Simon Wang
Hi James, Sorry if I wasn’t clear enough. One example use case is: 1. load a Hive data frame, 2. repartition (using the default hash function), 3. forEachPartition, batch query the rows against Phoenix. This process is a bit slow. We figured that it might have something to do with Spark executor
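
A minimal sketch of those three steps in Java, assuming the Spark 2.x API; the Hive database/table, Phoenix table, key column, partition count, and ZooKeeper quorum are placeholders, not details from the thread:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.Iterator;

import org.apache.spark.api.java.function.VoidFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class PartitionedPhoenixLookup {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("partitioned-phoenix-lookup")
                .enableHiveSupport()
                .getOrCreate();

        // 1. Load a Hive data frame (placeholder database/table/column names).
        Dataset<Row> df = spark.sql("SELECT ID FROM my_db.my_hive_table");

        // 2. Repartition with Spark's default hash partitioning, then
        // 3. per partition, open one Phoenix connection and query each row.
        df.repartition(32).toJavaRDD().foreachPartition(
                (VoidFunction<Iterator<Row>>) rows -> {
                    // Placeholder JDBC URL; one connection per partition.
                    try (Connection conn =
                            DriverManager.getConnection("jdbc:phoenix:zk-host:2181")) {
                        PreparedStatement stmt = conn.prepareStatement(
                                "SELECT * FROM MY_PHOENIX_TABLE WHERE ID = ?");
                        while (rows.hasNext()) {
                            stmt.setString(1, rows.next().getString(0));
                            try (ResultSet rs = stmt.executeQuery()) {
                                while (rs.next()) {
                                    // consume the looked-up row
                                }
                            }
                        }
                    }
                });

        spark.stop();
    }
}

The default hash repartition ignores HBase region boundaries, so each partition's lookups can fan out across many region servers, which is presumably why the thread is asking how to find the region for a given row key.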

Re: Get region for row key

2016-07-12 Thread James Taylor
Hi Simon, I still don't understand the use case completely. Also, did you know Phoenix has Hive integration now (as of 4.8)? Would it be possible for you to try using that? My initial impression is that you're dipping down to too low of a level here, using many non-public APIs which may change in