Hi, I'm using CsvBulkLoadTool to load a CSV data file into a Phoenix/HBase table.
HDP version: 2.3.2 (Phoenix 4.4.0, HBase 1.1.2)
CSV file size: 97.6 GB
No. of records: 1,439,000,238
Cluster: 13 nodes
Phoenix table salt buckets: 13
Phoenix table compression: snappy
HBase table size after loading: 26.6 GB

The job completed in *1 hr, 39 min, 43 sec*:
Average Map Time: 5 min, 25 sec
Average Shuffle Time: *47 min, 46 sec*
Average Merge Time: 12 min, 22 sec
Average Reduce Time: *32 min, 9 sec*

I'm looking to tune this job; the shuffle phase alone accounts for nearly half the total runtime. Could someone give me some pointers? Please let me know if you need any of the cluster configuration parameters I'm using.

*This is only a performance test. My PRODUCTION data file is 7x bigger.*

Thanks,
Vamsi Attluri
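P.S. In case the invocation matters: I'm launching the job with a command of roughly this shape (the jar path, table name, input path, and ZooKeeper quorum below are placeholders, not my real values):

    hadoop jar /usr/hdp/current/phoenix-client/phoenix-client.jar \
        org.apache.phoenix.mapreduce.CsvBulkLoadTool \
        --table EXAMPLE_TABLE \
        --input /data/example/input.csv \
        --zookeeper zk1.example.com:2181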
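Since shuffle is the biggest chunk of the runtime, one direction I've been wondering about is compressing the map output. CsvBulkLoadTool runs through ToolRunner, so I believe standard Hadoop -D options can be passed ahead of the tool arguments; something like the following (the property names are the standard Hadoop 2 ones, but the values are guesses I haven't validated on this cluster):

    hadoop jar /usr/hdp/current/phoenix-client/phoenix-client.jar \
        org.apache.phoenix.mapreduce.CsvBulkLoadTool \
        -Dmapreduce.map.output.compress=true \
        -Dmapreduce.map.output.compress.codec=org.apache.hadoop.io.compress.SnappyCodec \
        -Dmapreduce.task.io.sort.mb=512 \
        -Dmapreduce.reduce.shuffle.parallelcopies=20 \
        --table EXAMPLE_TABLE \
        --input /data/example/input.csv \
        --zookeeper zk1.example.com:2181

Would those be sensible starting points, or is there something more Phoenix-specific I should look at first?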