Re: How To Count Rows In Large Phoenix Table?

anil gupta Mon, 22 Jun 2015 21:11:34 -0700

For #2: You can use HBase's RowCounter MapReduce job to count the rows of a
large table. You don't need to write any code.
Here is a sample command to invoke it:
hbase org.apache.hadoop.hbase.mapreduce.RowCounter <TABLE_NAME>
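
If you want to script the comparison against the Hive count, here is a minimal
Python sketch. It assumes the job prints its counters to the console with a
line of the form ROWS=<n>; check the output on your own cluster first. The
helper names (parse_row_count, count_rows) are made up for illustration and
are not part of HBase or Phoenix.

```python
import re
import subprocess

# Assumption: the RowCounter job reports its result as a counter line
# such as "ROWS=12345" somewhere in the console output.
ROWS_PATTERN = re.compile(r"^\s*ROWS=(\d+)\s*$", re.MULTILINE)


def parse_row_count(job_output):
    """Return the ROWS counter found in the job output, or None if absent."""
    match = ROWS_PATTERN.search(job_output)
    return int(match.group(1)) if match else None


def count_rows(table_name):
    """Run RowCounter for table_name and return the parsed row count.

    Hypothetical wrapper: requires the `hbase` launcher on the PATH.
    stderr is merged into stdout because job counters are usually
    written through the logger rather than to plain stdout.
    """
    result = subprocess.run(
        ["hbase", "org.apache.hadoop.hbase.mapreduce.RowCounter", table_name],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
        check=True,
    )
    return parse_row_count(result.stdout)
```

Calling count_rows with your table name and comparing the result with the
count Hive reports should tell you whether any rows went missing.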


~Anil


On Mon, Jun 22, 2015 at 12:08 PM, Ciureanu Constantin <
[email protected]> wrote:

> Hive can connect to HBase and insert directly in either direction.
> I don't know if it also works via Phoenix...
>
> Counting is too slow as a single-threaded job or from the command line. You
> should write a map-reduce job with a filter that loads just the row key,
> which is really fast.
>
> A map-reduce job is also the solution for loading data from Hive into HBase
> (read from HDFS rather than Hive, prepare the output in Phoenix format, and
> bulk load the results).
> On 22 Jun 2015 at 9:34 p.m., "Riesland, Zack" <[email protected]>
> wrote:
>
>>  I had a very large Hive table that I needed in HBase.
>>
>>
>>
>> After asking around, I came to the conclusion that my best bet was to:
>>
>>
>>
>> 1 – Export the Hive table to a CSV 'file'/folder on HDFS
>>
>> 2 – Use the org.apache.phoenix.mapreduce.CsvBulkLoadTool to import the
>> data.
>>
>>
>>
>> I found that if I tried to pass the entire folder (~ 1/2 TB of data) to
>> the CsvBulkLoadTool, my job would eventually fail.
>>
>>
>>
>> Empirically, it seems that on our particular cluster, 20-30 GB of data is
>> the most that the CsvBulkLoadTool can handle at one time without so many
>> map tasks timing out that the entire operation fails.
>>
>>
>>
>> So I passed one sub-file at a time and eventually got all the data into
>> HBase.
>>
>>
>>
>> I tried running a select count(*) on the table to see whether all of the
>> rows were transferred, but it eventually fails.
>>
>>
>>
>> Today, I believe I found a set of data that is in Hive but NOT in HBase.
>>
>>
>>
>> So, I have 2 questions:
>>
>>
>>
>> 1) Are there any known errors with the CsvBulkLoadTool such that it might
>> skip some data without getting my attention with some kind of error?
>>
>>
>>
>> 2) Is there a straightforward way to count the rows in my Phoenix table
>> so that I can compare the Hive table with the HBase table?
>>
>>
>>
>> Thanks in advance!
>>
>


-- 
Thanks & Regards,
Anil Gupta
