Take a look at our MR integration, CSV bulk loader, and other tools. The
typical way to get the row key is to run an UPSERT VALUES statement with
bind parameters for the columns in the primary key. Then bind them to
values and use PhoenixRuntime.getUncommittedDataIterator(Connection conn)
to iterate over the KeyValues (including the encoded row key) that would
be committed, rolling back the connection afterward.
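A minimal sketch of that recipe, assuming a hypothetical table MY_TABLE with a two-column primary key (the JDBC URL, table, and column names here are placeholders):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.Iterator;
import java.util.List;

import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.util.Pair;
import org.apache.phoenix.util.PhoenixRuntime;

public class RowKeyExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn =
                 DriverManager.getConnection("jdbc:phoenix:localhost")) {
            conn.setAutoCommit(false); // keep the upsert uncommitted

            // UPSERT with bind parameters for the primary key columns.
            PreparedStatement stmt = conn.prepareStatement(
                "UPSERT INTO MY_TABLE (PK1, PK2) VALUES (?, ?)");
            stmt.setString(1, "foo");
            stmt.setLong(2, 42L);
            stmt.execute();

            // Each entry pairs a table name with the KeyValues that would
            // be written; the row key is embedded in each KeyValue.
            Iterator<Pair<byte[], List<KeyValue>>> it =
                PhoenixRuntime.getUncommittedDataIterator(conn);
            while (it.hasNext()) {
                for (KeyValue kv : it.next().getSecond()) {
                    byte[] rowKey = kv.getRow();
                    // ... use rowKey ...
                }
            }

            conn.rollback(); // discard the uncommitted upsert
        }
    }
}
```

This requires a running Phoenix cluster and the Phoenix client on the classpath, so it is illustrative rather than directly runnable here.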
Thanks James. I will file a JIRA. Actually, I spent some time a few days ago
trying to make Phoenix-Hive compatible with Hive 0.13-1, but it did not seem
easy. So until 0.13-1 is supported, we may want to use the workflow I
proposed as a temporary workaround for 4.7. It would be great to know if these
If our Hive support would solve your use case, perhaps you could look into
supporting Hive 0.13-1. I'm not sure of the level of effort, as others
contributed this integration. How about filing a JIRA to discuss?
We'll have an RC up for 4.8 in the next day or so.
Thanks,
James
On Tue, Jul 12,
Hi James,
Sorry if I wasn’t clear enough. One example use case is:
1. load a Hive data frame,
2. repartition (using the default hash function),
3. foreachPartition: batch query the rows against Phoenix.
This process is a bit slow. We figured that it might have something to do
with the Spark executor
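The three steps above might be sketched as follows, assuming Spark with Hive support and the Phoenix JDBC driver on the classpath (the table, column, and ZooKeeper host names are hypothetical):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class BatchLookup {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .appName("phoenix-batch-lookup")
            .enableHiveSupport()
            .getOrCreate();

        // 1. Load the Hive table as a DataFrame.
        Dataset<Row> df = spark.table("my_hive_table");

        // 2. Repartition using the default hash partitioner on a key column.
        // 3. Query Phoenix per partition, reusing one connection per executor task.
        df.repartition(64, df.col("id"))
          .foreachPartition(rows -> {
              try (Connection conn =
                       DriverManager.getConnection("jdbc:phoenix:zk-host")) {
                  PreparedStatement ps = conn.prepareStatement(
                      "SELECT VAL FROM MY_TABLE WHERE ID = ?");
                  while (rows.hasNext()) {
                      Row r = rows.next();
                      ps.setLong(1, r.getAs("id"));
                      try (ResultSet rs = ps.executeQuery()) {
                          // ... consume the result ...
                      }
                  }
              }
          });
    }
}
```

Opening one connection per partition (rather than per row) is the main thing this pattern buys; a connection per row is a common cause of the slowness described above.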
Hi Simon,
I still don't understand the use case completely. Also, did you know
Phoenix has Hive integration now (as of 4.8)? Would it be possible for you
to try using that? My initial impression is that you're dipping down to too
low a level here, using many non-public APIs which may change in