Gabriel,
Thanks for the tip, I will retry with the SALT_BUCKETS option.
Regards,
- kiru

 From: Gabriel Reid <gabriel.r...@gmail.com>
 To: user@phoenix.apache.org; Kiru Pakkirisamy <kirupakkiris...@yahoo.com> 
 Sent: Thursday, April 23, 2015 11:57 PM
 Subject: Re: CsvBulkLoadTool question
   
Hi Kiru,
The CSV bulk loader won't automatically make multiple regions for you; it 
simply loads data into the existing regions of the table. In your case, that 
means all of the data has been loaded into a single region (as you're seeing), 
so any operation that scans over a large number of rows (such as a 
"select count") will be very slow.
I would recommend pre-splitting your table before running the bulk load tool. 
If you're creating the table directly in Phoenix, you can supply the 
SALT_BUCKETS table option [1] when creating the table.
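For example, a rough sketch of what that could look like (the table name, 
columns, bucket count, and split points below are made up for illustration; 
the right number of salt buckets usually depends on your cluster size):

    -- Salting pre-splits the table into SALT_BUCKETS regions and spreads
    -- writes across them by prefixing each row key with a hashed salt byte.
    CREATE TABLE example_table (
        id BIGINT NOT NULL PRIMARY KEY,
        name VARCHAR,
        val DOUBLE
    ) SALT_BUCKETS = 16;

    -- Alternatively, if you already know the row key distribution, you can
    -- pre-split on explicit key boundaries instead of salting:
    CREATE TABLE example_table2 (
        id VARCHAR NOT NULL PRIMARY KEY,
        val DOUBLE
    ) SPLIT ON ('g', 'n', 'u');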
- Gabriel
1. http://phoenix.apache.org/language/index.html#options



On Fri, Apr 24, 2015 at 2:15 AM Kiru Pakkirisamy <kirupakkiris...@yahoo.com> 
wrote:

Hi,
We are trying to load a large number of rows (100/200M) into a table and 
benchmark it against Hive. We pretty much used the CsvBulkLoadTool as 
documented. But now, after completion, HBase is still in 'minor compaction' for 
quite a number of hours. (Also, we see only one region in the table.) A select 
count on this table does not seem to complete. Any ideas on how to proceed? 
Regards,
- kiru
