stack-3 wrote:
> 
> On Thu, Aug 4, 2011 at 6:54 AM, Stan Barton <[email protected]> wrote:
>> I have spent a lot of time in order to track down the bug and found out
>> that
>> when I write the SequenceFile of KeyValues with HBase 0.90.3 I cannot
>> read
>> the content back using the same HBase version jar, however I am able to
>> read
>> it without any problems with HBase 0.20.* versions. It is easily
>> reproducible with this unit test.
>>
> 
> Stan:
> 
> You are writing kvs with 0.90 and they are readable with 0.20 but not
> w/ the jar that wrote them?
> 
> Where is the unit test you refer to?  Attachments usually don't make
> it across so you might have to pastebin it.
> 
> St.Ack
> 
> 

Exactly, I create the kvs with any of the > v0.90 jars and am not able to
read them back. By digging deeper, I have found a work-around that solves the
problem:

KeyValue kv2 = new KeyValue(kvOrig.getBuffer());

which means that the buffer is read properly by all jars, but somehow it is
parsed wrongly in the new versions. I have compared the length and offset
values that the KeyValue class reads in under the particular HBase versions:

I took a simple sequence file stored in HDFS containing Longs and kvs. I have
then output the offsets and lengths (as offset-length) of row, key, value,
family and qualifier respectively (plus some other kv related info; the whole
procedure can be found here http://pastebin.com/kxC5GrtM ):

version 0.20.6:
1-url/content:content/1264692453000/Put/vlen=2-0-39
r:10-3
k:8-29
v:37-2
f:14-7
q:21-7
39:\x00\x00\x00\x1D\x00\x00\x00\x02\x00\x03url\x07contentcontent\x00\x00\x01&u\x8B^\x88\x04\x00\x00
2-url/meta:statusCode/1264692453000/Put/vlen=3-0-40
r:10-3
k:8-29
v:37-3
f:14-4
q:18-10
40:\x00\x00\x00\x1D\x00\x00\x00\x03\x00\x03url\x04metastatusCode\x00\x00\x01&u\x8B^\x88\x04200
3-url/meta:length/1264692453000/Put/vlen=8-0-41
r:10-3
k:8-25
v:33-8
f:14-4
q:18-6
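
For reference, the offset-length pairs above can be recomputed by hand from
the dumped bytes. The following is only a sketch: it assumes the usual
KeyValue layout (4-byte key length, 4-byte value length, then the key made of
a 2-byte row length, row, 1-byte family length, family, qualifier, 8-byte
timestamp and 1-byte type, then the value), which is consistent with the
39-byte dump of the first kv. KeyValueLayout and FIRST_KV are hypothetical
names used for illustration; nothing here calls into HBase.

```java
import java.nio.ByteBuffer;

// Hypothetical helper, not an HBase class: recomputes offsets and lengths
// from a raw buffer under the assumed KeyValue layout described above.
final class KeyValueLayout {
    // The 39 bytes dumped for the first kv (url/content:content).
    static final byte[] FIRST_KV = {
        0, 0, 0, 0x1D,                        // key length = 29
        0, 0, 0, 2,                           // value length = 2
        0, 3, 'u', 'r', 'l',                  // row length + row
        7, 'c', 'o', 'n', 't', 'e', 'n', 't', // family length + family
        'c', 'o', 'n', 't', 'e', 'n', 't',    // qualifier
        0, 0, 1, 0x26, 0x75, (byte) 0x8B, 0x5E, (byte) 0x88, // timestamp
        4,                                    // type (Put)
        0, 0                                  // value
    };

    final int keyOffset, keyLength, valueOffset, valueLength;
    final int rowOffset, rowLength, familyOffset, familyLength;
    final int qualifierOffset, qualifierLength;
    final long timestamp;

    private KeyValueLayout(byte[] buf) {
        ByteBuffer bb = ByteBuffer.wrap(buf); // big-endian by default
        keyLength = bb.getInt(0);
        valueLength = bb.getInt(4);
        keyOffset = 8;
        rowLength = bb.getShort(keyOffset);
        rowOffset = keyOffset + 2;
        familyLength = bb.get(rowOffset + rowLength);
        familyOffset = rowOffset + rowLength + 1;
        qualifierOffset = familyOffset + familyLength;
        // The qualifier has no length field of its own: it is whatever is
        // left of the key after the other parts, so it depends on keyLength.
        qualifierLength = keyLength - (2 + rowLength + 1 + familyLength + 8 + 1);
        timestamp = bb.getLong(keyOffset + keyLength - 9);
        valueOffset = keyOffset + keyLength;
    }

    static KeyValueLayout parse(byte[] buf) {
        return new KeyValueLayout(buf);
    }

    @Override
    public String toString() {
        return "r:" + rowOffset + "-" + rowLength
            + " k:" + keyOffset + "-" + keyLength
            + " v:" + valueOffset + "-" + valueLength
            + " f:" + familyOffset + "-" + familyLength
            + " q:" + qualifierOffset + "-" + qualifierLength
            + " ts:" + timestamp;
    }

    public static void main(String[] args) {
        System.out.println(parse(FIRST_KV));
    }
}
```

Under that assumption the first kv gives r:10-3, k:8-29, v:37-2, f:14-7 and
q:21-7 with timestamp 1264692453000, the same as both versions print. Note
that the qualifier length and the timestamp position are both derived from
the key length, so a wrong key length shifts both.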



version 0.90.3:

1-url/content:content/1264692453000/Put/vlen=2-0-39
r:10-3
k:8-29
v:37-2
f:14-7
q:21-7
39:\x00\x00\x00\x1D\x00\x00\x00\x02\x00\x03url\x07contentcontent\x00\x00\x01&u\x8B^\x88\x04\x00\x00
2-url/meta:statusCode/1264692453000/Put/vlen=3-0-40
r:10-3
k:8-29
v:37-3
f:14-4
q:18-10
40:\x00\x00\x00\x1D\x00\x00\x00\x03\x00\x03url\x04metastatusCode\x00\x00\x01&u\x8B^\x88\x04200
3-url/meta:length\x00\x00\x01&/8469967462476021760/Minimum/vlen=8-0-41
r:10-3
k:8-29
v:37-8
f:14-4
q:18-10


You can see the discrepancy in the third kv read in, namely in the length of
the key as parsed by v0.20.6 (25) and by v0.90.3 (29). This garbles the
stream being read in. However, I have not found out why this is happening.
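
The two key lengths can be checked with a bit of arithmetic. This is a
sketch only, assuming the key consists of a 2-byte row length, the row, a
1-byte family length, the family, the qualifier, an 8-byte timestamp and a
1-byte type; KeyLengthCheck and its methods are hypothetical names and do not
use any HBase code.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: key-length arithmetic under the assumed key layout.
final class KeyLengthCheck {
    // Fixed overhead: row length (2) + family length (1) + timestamp (8) + type (1).
    static final int FIXED = 2 + 1 + 8 + 1;

    static int keyLength(String row, String family, String qualifier) {
        return FIXED + len(row) + len(family) + len(qualifier);
    }

    // Qualifier length as a parser would derive it from a given key length.
    static int qualifierLength(int keyLength, int rowLength, int familyLength) {
        return keyLength - FIXED - rowLength - familyLength;
    }

    private static int len(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println("url/meta:statusCode -> " + keyLength("url", "meta", "statusCode"));
        System.out.println("url/meta:length     -> " + keyLength("url", "meta", "length"));
        System.out.println("qualifier length if key length is taken as 29: "
            + qualifierLength(29, 3, 4));
    }
}
```

For url/meta:length this gives 25, which is what v0.20.6 reports. The 29
reported by v0.90.3 is the key length of the preceding kv
(url/meta:statusCode), and with a key length of 29 the derived qualifier
length becomes 10, which matches the q:18-10 line and the four extra bytes
(\x00\x00\x01&) appended to "length" in the v0.90.3 output. Whether a stale
key length is actually the cause is not something this output alone
establishes.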

Stan
-- 
View this message in context: 
http://old.nabble.com/Possible-bug-in-reading-KeyValues-from-sequence-files-in-HBase-0.90-tp32194680p32399356.html
Sent from the HBase User mailing list archive at Nabble.com.