CSV bulk loading question

Bulvik, Noam Mon, 09 Mar 2015 23:05:21 -0700

Hi,

We are using the CSV bulk loading (MR) to load our data. We have a table with 
50 columns, and we did some testing to understand which factors affect loading 
performance.
We compared two cases:
A - each column in the data is a separate column in the HBase table
B - all non-key columns are combined into one column in the HBase table
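
To make the difference between the two cases concrete, here is a rough sketch of how many cells and bytes one logical row turns into under each layout. It relies on the fact that HBase stores every cell as its own KeyValue that repeats the row key, family, qualifier and timestamp. The number of key columns (1), the row key length (16 bytes), the family and qualifier lengths, and the 10-byte values are all assumptions made for illustration, not measurements from our table, and the sketch ignores any extra cells the loader itself may write.

```python
# Fixed per-cell bytes in the classic HBase KeyValue layout:
# key length (4) + value length (4) + row length (2) + family length (1)
# + timestamp (8) + type (1). Tags and compression are ignored here.
KEYVALUE_FIXED_OVERHEAD = 4 + 4 + 2 + 1 + 8 + 1


def keyvalue_size(row_key_len, family_len, qualifier_len, value_len):
    """Approximate serialized size of one cell in bytes."""
    return (KEYVALUE_FIXED_OVERHEAD + row_key_len + family_len
            + qualifier_len + value_len)


def row_footprint(num_cells, value_len_per_cell,
                  row_key_len=16, family_len=1, qualifier_len=2):
    """Return (cells written, approximate bytes written) for one logical row.

    The default lengths are hypothetical values chosen for illustration.
    """
    per_cell = keyvalue_size(row_key_len, family_len, qualifier_len,
                             value_len_per_cell)
    return num_cells, num_cells * per_cell


NON_KEY_COLUMNS = 49  # assumption: 50 columns, one of which is the key
VALUE_LEN = 10        # assumption: average bytes per column value

# Case A: one cell per non-key column.
layout_a = row_footprint(NON_KEY_COLUMNS, VALUE_LEN)
# Case B: a single cell holding all non-key values concatenated.
layout_b = row_footprint(1, NON_KEY_COLUMNS * VALUE_LEN)

print("A (cells, bytes):", layout_a)
print("B (cells, bytes):", layout_b)
```

Under these assumed sizes, case A writes many more cells per row than case B, and each of those cells has to be created, sorted and written by the MR job.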


We saw that the second option was 7 times faster than the first one and 
consumed less CPU.

Does this make sense? Can we do something to tune the system so that option A 
runs faster? (We prefer it this way because it lets us query and filter over 
all data columns.)


Regards,

Noam Bulvik


