Re: Get region for row key

2016-07-11 Thread Simon Wang
As I read more Phoenix code, I feel that I should do the following:
1. Use `PhoenixRuntime.getTable` to get a `PTable`
2. Use `table.getPKColumns` to get a list of `PColumn`s
3. For each column, use `column.getDataType`; then `dataType.toBytes(value, column.getSortOrder)`
4. Finally, create a new
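Once the row key bytes are assembled, locating the region comes down to a search over the sorted region start keys. The following is a minimal sketch in plain Python, not the HBase or Phoenix API. It assumes the region start keys have already been fetched and are sorted, with the first region starting at the empty key; `find_region_index` and the sample keys are hypothetical.

```python
from bisect import bisect_right


def find_region_index(start_keys, row_key):
    """Return the index of the region whose key range contains row_key.

    Assumption: start_keys is a sorted list of byte strings and the first
    entry is b"" (the first region has an empty start key). Each region
    covers [start_keys[i], start_keys[i + 1]).
    """
    if not start_keys or start_keys[0] != b"":
        raise ValueError("start_keys must begin with the empty key b''")
    # The owning region is the last one whose start key is <= row_key.
    # Python compares bytes lexicographically by unsigned byte value.
    return bisect_right(start_keys, row_key) - 1


# Hypothetical region boundaries, for illustration only.
example_start_keys = [b"", b"g", b"n", b"t"]
example_region = find_region_index(example_start_keys, b"apple")
```

A row key equal to a start key belongs to the region that starts there, which is why the sketch uses `bisect_right` rather than `bisect_left`.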

Re: Read Full Phoenix Table

2016-07-11 Thread Mohanraj Ragupathiraj
Thank you for your reply. I tried passing the PKs through an IN clause. But the number of PKs to match between the files and the Phoenix table can sometimes be 70 million, and I felt it would be much slower if I used an IN clause. May I know how many PKs you passed through the IN clause?
On Tue, Jul 12, 2016 at

Re: Read Full Phoenix Table

2016-07-11 Thread Simon Wang
I actually did something similar recently. If you are joining on primary keys, you can do batch queries with the IN clause.
> On Jul 11, 2016, at 9:05 PM, Mohanraj Ragupathiraj wrote:
>
> Hi,
>
> I have a Scenario in which i have to load a phoenix table as a whole
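The batch query idea can be sketched as follows. This is only an illustration under stated assumptions, not the code used in the thread: the table name, column name, and batch size are hypothetical, and the `?` placeholder style is assumed to match the driver in use.

```python
def batched_in_queries(table, pk_column, keys, batch_size):
    """Yield (sql, params) pairs, one per batch of primary keys.

    Each statement uses bind placeholders instead of inlined values, so
    the same statement text is reused for every full batch.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(keys), batch_size):
        batch = keys[start:start + batch_size]
        placeholders = ", ".join("?" for _ in batch)
        sql = f"SELECT * FROM {table} WHERE {pk_column} IN ({placeholders})"
        yield sql, list(batch)


# Hypothetical table and column names, for illustration only.
example_batches = list(batched_in_queries("MY_TABLE", "ID", [1, 2, 3, 4, 5], 2))
```

With tens of millions of keys, the batch size becomes the main tuning knob: larger batches mean fewer round trips but longer statements.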

Read Full Phoenix Table

2016-07-11 Thread Mohanraj Ragupathiraj
Hi, I have a scenario in which I have to load a Phoenix table as a *whole* and join it with multiple files in Spark. But it takes around 30 minutes just to read 600 million records from the Phoenix table. I feel it is inappropriate to load the full table data, as HBase works best for random lookups.

Re: Index tables at scale

2016-07-11 Thread Simon Wang
Thanks, Mujtaba. This is good to know. It is possible to manipulate the key bit to avoid the hot-spotting, so we will probably try out an unsalted table. Still, it would be nice if combining indexes in a single table were possible.
> On Jul 11, 2016, at 2:41 PM, Mujtaba Chohan

Re: Index tables at scale

2016-07-11 Thread Mujtaba Chohan
FYI, if your keys are not written in order, i.e. you are not concerned about write hot-spotting/write throughput, then try writing your data to an un-salted table. Read performance for an un-salted table can be comparable to or better than a salted one with stats
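For context on what salting changes in the row key, here is a rough sketch. It assumes only the general scheme, a single leading byte derived from a hash of the key modulo the bucket count; `zlib.crc32` is a stand-in and is not Phoenix's actual hash function, and `salt_key` is a hypothetical helper.

```python
import zlib


def salt_key(row_key, buckets):
    """Prepend one salt byte derived from a hash of row_key.

    Assumption: crc32 stands in for the real hash. Only the overall shape
    (one leading byte in the range [0, buckets)) is the point here.
    """
    if not 1 <= buckets <= 256:
        raise ValueError("buckets must be between 1 and 256")
    salt = zlib.crc32(row_key) % buckets
    return bytes([salt]) + row_key


# Sequential keys now begin with different leading bytes, so writes no
# longer all land at the tail of the key space.
example_salted = [salt_key(b"row-%06d" % i, 256) for i in range(5)]
```

The trade-off is on the read side: a range scan over the original key order has to visit every bucket, which is why an un-salted table can read as well or better when writes are not sequential.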

Re: Index tables at scale

2016-07-11 Thread Simon Wang
These indexes will indeed be salted (so is the data table). If all indexes reside in the same table, there will be only 512 regions in total (256 for the data table, 256 for the combined index table). Indeed, the combined index table will be 12x as large as a single index table. But it doesn’t cover

Re: Index tables at scale

2016-07-11 Thread James Taylor
Will the index be salted (and that's why it's 256 regions per table)? If not, how many regions would there be if all indexes are in the same table (assuming the table is 12x bigger than one index table)?
On Monday, July 11, 2016, Simon Wang wrote:
> Thanks, Mujtaba. What

Re: Index tables at scale

2016-07-11 Thread Simon Wang
Thanks, Mujtaba. What you wrote is exactly what I meant. While not all our tables need this many regions and indexes, the number of regions per region server can grow quickly.
-Simon
> On Jul 11, 2016, at 2:17 PM, Mujtaba Chohan wrote:
>
> 12 index tables * 256 region per

Re: Index tables at scale

2016-07-11 Thread Mujtaba Chohan
12 index tables * 256 regions per table = ~3K regions for index tables, assuming we are talking about covered indexes, which implies 200+ regions per region server on a 15-node cluster.
On Mon, Jul 11, 2016 at 1:58 PM, James Taylor wrote:
> Hi Simon,
>
> I might be missing
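The arithmetic behind those figures, spelled out as a small sketch. The numbers come from the message; the function names are hypothetical, and the per-server figure assumes regions are spread evenly across the cluster.

```python
def index_region_count(index_tables, regions_per_table):
    """Total regions across all index tables."""
    return index_tables * regions_per_table


def regions_per_server(total_regions, region_servers):
    """Average regions per region server, assuming an even spread."""
    return total_regions / region_servers


total = index_region_count(12, 256)       # 3072, i.e. roughly 3K
per_server = regions_per_server(total, 15)  # a little over 200
```

This counts index regions only; the data table's own regions come on top of it.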

Re: Index tables at scale

2016-07-11 Thread James Taylor
Hi Simon, I might be missing something, but with 12 separate index tables or 1 index table, the amount of data will be the same. Won't there be the same number of regions either way?
Thanks, James
On Sun, Jul 10, 2016 at 10:50 PM, Simon Wang wrote:
> Hi James,
>