For #2: You can use the RowCounter MapReduce job that ships with HBase to count the rows of a large table. You don't need to write any code. Here is a sample command to invoke it:

    hbase org.apache.hadoop.hbase.mapreduce.RowCounter <TABLE_NAME>
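
If you want to script the comparison against Hive, something along these lines might work. It is only a rough sketch: it assumes `hbase` is on the PATH, that the job's console output includes a counter line of the form `ROWS=<n>`, and that you obtain the Hive row count separately and pass it in. The function names are made up for illustration.

```python
import re
import subprocess
import sys

# Assumption: the RowCounter job reports its result as a "ROWS=<n>" counter
# in the console output. \b keeps counters such as FOO_ROWS from matching.
ROWS_COUNTER = re.compile(r"\bROWS=(\d+)")


def row_counter_command(table_name):
    """Build the same command line as shown above for one table."""
    return ["hbase", "org.apache.hadoop.hbase.mapreduce.RowCounter", table_name]


def parse_row_count(job_output):
    """Return the last ROWS=<n> value found in the job output."""
    matches = ROWS_COUNTER.findall(job_output)
    if not matches:
        raise ValueError("no ROWS counter found in RowCounter output")
    return int(matches[-1])


def count_rows(table_name):
    """Run RowCounter and parse the count; stderr is merged into stdout
    because the job counters may be logged on either stream."""
    result = subprocess.run(
        row_counter_command(table_name),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        check=True,
    )
    return parse_row_count(result.stdout)


if __name__ == "__main__":
    # Usage (hypothetical script name): python compare_counts.py <TABLE_NAME> <HIVE_ROW_COUNT>
    table, hive_count = sys.argv[1], int(sys.argv[2])
    hbase_count = count_rows(table)
    print(f"hive={hive_count} hbase={hbase_count} missing={hive_count - hbase_count}")
```

A non-zero `missing` value would only tell you that rows are absent, not which ones, so you would still need to compare keys to find them.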
~Anil

On Mon, Jun 22, 2015 at 12:08 PM, Ciureanu Constantin <[email protected]> wrote:

> Hive can connect to HBase and insert directly in either direction.
> Don't know if it also works via Phoenix...
>
> Counting is too slow on a single-threaded job / command line - you should
> write a map-reduce job, with some filter to load just the key, which is
> really fast.
>
> A map-reduce job is also the solution to load data from Hive to HBase
> (read from HDFS not Hive, prepare output to Phoenix format and bulk load
> the results).
>
> On 22 Jun 2015 9:34 p.m., "Riesland, Zack" <[email protected]> wrote:
>
>> I had a very large Hive table that I needed in HBase.
>>
>> After asking around, I came to the conclusion that my best bet was to:
>>
>> 1 - export the Hive table to a CSV ‘file’/folder on the HDFS
>>
>> 2 - use the org.apache.phoenix.mapreduce.CsvBulkLoadTool to import the
>> data.
>>
>> I found that if I tried to pass the entire folder (~ 1/2 TB of data) to
>> the CsvBulkLoadTool, my job would eventually fail.
>>
>> Empirically, it seems that on our particular cluster, 20-30 GB of data is
>> the most that the CsvBulkLoadTool can handle at one time without so many
>> map jobs timing out that the entire operation fails.
>>
>> So I passed one sub-file at a time and eventually got all the data into
>> HBase.
>>
>> I tried doing a select count(*) on the table to see whether all of the
>> rows were transferred, but this eventually fails.
>>
>> Today, I believe I found a set of data that is in Hive but NOT in HBase.
>>
>> So, I have 2 questions:
>>
>> 1) Are there any known errors with the CsvBulkLoadTool such that it might
>> skip some data without getting my attention with some kind of error?
>>
>> 2) Is there a straightforward way to count the rows in my Phoenix table
>> so that I can compare the Hive table with the HBase table?
>>
>> Thanks in advance!

--
Thanks & Regards,
Anil Gupta
