Hi, I'm using CsvBulkLoadTool to load a CSV data file into a Phoenix/HBase table.
HDP version: 2.3.2 (Phoenix 4.4.0, HBase 1.1.2)
CSV file size: 97.6 GB
No. of records: 1,439,000,238
Cluster: 13 nodes
Phoenix table salt buckets: 13
Phoenix table compression: snappy
HBase table size after loading: 26.6 GB

The job completed in *1 hr, 39 min, 43 sec*:
Average Map Time: 5 min, 25 sec
Average Shuffle Time: *47 min, 46 sec*
Average Merge Time: 12 min, 22 sec
Average Reduce Time: *32 min, 9 sec*

I'm looking to tune this job; the shuffle phase alone accounts for nearly half the total runtime. Could someone give me some pointers? Please let me know if you need any of the cluster configuration parameters I'm using.

*This is only a performance test. My PRODUCTION data file is 7x bigger.*

Thanks,
Vamsi Attluri
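P.S. In case the invocation matters: I'm launching the job with a command of roughly this shape (the jar path, table name, input path, and ZooKeeper quorum below are placeholders, not my real values):

    hadoop jar /usr/hdp/current/phoenix-client/phoenix-client.jar \
        org.apache.phoenix.mapreduce.CsvBulkLoadTool \
        --table EXAMPLE_TABLE \
        --input /data/example/input.csv \
        --zookeeper zk1.example.com:2181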
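Since shuffle is the biggest chunk of the runtime, one direction I've been wondering about is compressing the map output. CsvBulkLoadTool runs through ToolRunner, so I believe standard Hadoop -D options can be passed ahead of the tool arguments; something like the following (the property names are the standard Hadoop 2 ones, but the values are guesses I haven't validated on this cluster):

    hadoop jar /usr/hdp/current/phoenix-client/phoenix-client.jar \
        org.apache.phoenix.mapreduce.CsvBulkLoadTool \
        -Dmapreduce.map.output.compress=true \
        -Dmapreduce.map.output.compress.codec=org.apache.hadoop.io.compress.SnappyCodec \
        -Dmapreduce.task.io.sort.mb=512 \
        -Dmapreduce.reduce.shuffle.parallelcopies=20 \
        --table EXAMPLE_TABLE \
        --input /data/example/input.csv \
        --zookeeper zk1.example.com:2181

Would those be sensible starting points, or is there something more Phoenix-specific I should look at first?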