Shaheen <sbahauddin@...> writes:
>
>
> Stack <stack@...> writes:
> >
> > You saw my previous set of questions about your issue? ('Wed, Mar 2,
> > 2011 at 10:39 AM')
> > St>Ack
> >
Here is more information on what we are doing, along with answers to your questions.
Our "BulkLoader" calls createTable with a set of startkeys.
The startKeys are keys sampled from the data that goes into the table.
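For illustration only, here is a minimal plain-Java sketch of that kind of start-key sampling. It is not our actual BulkLoader code: the class SplitKeySampler and the method pickSplitKeys are hypothetical names, it has no HBase dependency, and it assumes the row keys are ASCII strings so that Java String ordering matches HBase's byte ordering. It returns at most numRegions - 1 keys, since a table created with N split keys ends up with N + 1 regions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

public class SplitKeySampler {

    /**
     * Picks up to (numRegions - 1) roughly evenly spaced split keys
     * from a sample of row keys. Hypothetical helper, for illustration.
     */
    public static List<String> pickSplitKeys(List<String> sampledKeys, int numRegions) {
        // Sort and de-duplicate the sample; split keys must be strictly increasing.
        List<String> sorted = new ArrayList<>(new TreeSet<>(sampledKeys));
        List<String> splits = new ArrayList<>();
        if (numRegions < 2 || sorted.isEmpty()) {
            return splits; // a single region needs no split keys
        }
        for (int i = 1; i < numRegions; i++) {
            // Index of the i-th quantile boundary in the sorted sample.
            int idx = (int) ((long) i * sorted.size() / numRegions);
            String candidate = sorted.get(idx);
            // Skip repeats, which occur when the sample is smaller than numRegions.
            if (splits.isEmpty() || !splits.get(splits.size() - 1).equals(candidate)) {
                splits.add(candidate);
            }
        }
        return splits;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList("d", "a", "c", "b", "f", "e", "h", "g");
        System.out.println(pickSplitKeys(sample, 4)); // prints [c, e, g]
    }
}
```

The resulting keys would then be converted to byte arrays and passed to createTable as the split keys.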
>>If you scan '.META.', do regions show for your just-added files?
>>hbase> scan ".META."
.META. does show the table I created, and the regions' start keys are the
start keys I passed to createTable.
Here is a row from .META.:
SB_TEST,U|C|C C||H||9|V R|S P R E B,
1299194954706.7ac2a0c323cd9fe965b974aac1a149c3.
column=info:regioninfo, timestamp=1299194954840, value=REGION => {NAME =>
'SB_TEST,,1299194954706.4b63f58a7aee013c884717562dff2c3f.', STARTKEY => '',
ENDKEY => 'U|C|C C||H||9|V R|S P R E B', ENCODED =>
4b63f58a7aee013c884717562dff2c3f, TABLE => {{NAME => 'SB_TEST', FAMILIES =>
[{NAME => 'entity_key', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0',
COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE =>
'65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
>>Take one of these regions and try 'getting' its start key:
>>hbase> get 'TABLENAME', 'STARTKEY'
>>Does that work?
I picked one of the start keys and did a get; it returned 0 rows. A count
also returned 0.
>>Did you mess w/ timestamps when you were inserting?
We do pass in a timestamp when we call put.add. The timestamp is the value of
currentTimeMillis.
>>You used the totalorderpartitioner or something else?
We use a TotalOrderPartitioner with HFileOutputFormat.configureIncrementalLoad.
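As a rough illustration of why the partitioner's split points need to line up with the table's region start keys, here is a plain-Java sketch of total-order partitioning by binary search over sorted split points. It is not Hadoop's TotalOrderPartitioner code: the class TotalOrderSketch and the method partitionFor are hypothetical, and keys are assumed to be ASCII strings. A key equal to a split point goes to the partition that starts at that split point, which mirrors a region's start key being inclusive and its end key exclusive.

```java
import java.util.Arrays;

public class TotalOrderSketch {

    /**
     * Returns the partition index for a key: the number of split points
     * that are less than or equal to the key. With N split points there
     * are N + 1 partitions, numbered 0..N.
     */
    public static int partitionFor(String key, String[] sortedSplitPoints) {
        int pos = Arrays.binarySearch(sortedSplitPoints, key);
        if (pos >= 0) {
            // Key equals split point pos, so it starts partition pos + 1.
            return pos + 1;
        }
        // Not found: binarySearch returned -(insertionPoint) - 1.
        return -(pos + 1);
    }

    public static void main(String[] args) {
        String[] splits = {"c", "e", "g"};
        System.out.println(partitionFor("a", splits)); // prints 0
        System.out.println(partitionFor("c", splits)); // prints 1
        System.out.println(partitionFor("d", splits)); // prints 1
        System.out.println(partitionFor("z", splits)); // prints 3
    }
}
```

If the split points used when writing the HFiles differ from the region boundaries in the table, a file's key range can straddle regions, which is one thing worth ruling out here.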
>>Try with a small subset of the data first?
Yes, we tried on a small subset of the data: 379 rows.
-----
To check whether the problem was with the start keys passed to createTable, I
created the table first (in the HBase shell) and then called the BulkLoader.
Below is the output of scan '.META.' with the pre-created table.
>scan '.META.'
SB_TEST,,1299251737810.1cea0a172f273279744470411947698a.
column=info:regioninfo, timestamp=1299251737900, value=REGION => {NAME =>
'SB_TEST,,1299251737810.1cea0a172f273279744470411947698a.', STARTKEY => '',
ENDKEY => '', ENCODED => 1cea0a172f273279744470411947698a, TABLE => {{NAME =>
'SB_TEST', FAMILIES => [{NAME => 'entity_key', BLOOMFILTER => 'NONE',
REPLICATION_SCOPE => '0', COMPRESSION=> 'NONE', VERSIONS => '3', TTL =>
'2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE =>
'true'}]}}
>count 'SB_TEST'
0 row(s) in 0.0530 seconds
----
If I load the data directly into the table, here is what I get in .META.:
SB_TEST,,1299253653711.91771393876cb5701e7ddabe663cf3c1.
column=info:regioninfo, timestamp=1299253653763, value=REGION => {NAME =>
'SB_TEST,,1299253653711.91771393876cb5701e7ddabe663cf3c1.', STARTKEY => '',
ENDKEY => '', ENCODED => 91771393876cb5701e7ddabe663cf3c1, TABLE => {{NAME =>
'SB_TEST', FAMILIES => [{NAME => 'entity_key', BLOOMFILTER => 'NONE',
REPLICATION_SCOPE => '0', COMPRESSION=> 'NONE', VERSIONS => '3', TTL =>
'2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE =>
'true'}]}}
> count 'SB_TEST'
379 row(s) in 0.0600 seconds