So, seems like in 0.20.6, we're not doing compression right.
St.Ack
On Fri, Jan 28, 2011 at 11:23 AM, Nanheng Wu <[email protected]> wrote:
> Ah, sorry, I should've read the usage. I ran it just now and the
> metadata dump threw the same error: "Not in GZIP format"
>
> On Fri, Jan 28, 2011 at 10:51 AM, Stack <[email protected]> wrote:
>> hfile metadata, the -m option?
>> St.Ack
>>
>> On Fri, Jan 28, 2011 at 10:41 AM, Nanheng Wu <[email protected]> wrote:
>>> Sorry, by dumping the metadata did you mean running the same HFile
>>> tool on the ".region" file in each region?
>>>
>>> On Fri, Jan 28, 2011 at 10:25 AM, Stack <[email protected]> wrote:
>>>> If you dump the metadata, does it claim the GZIP compressor? If so,
>>>> yeah, there seems to be a mismatch between what the data is and what
>>>> the metadata says.
>>>> St.Ack
>>>>
>>>> On Fri, Jan 28, 2011 at 9:58 AM, Nanheng Wu <[email protected]> wrote:
>>>>> Awesome. I ran it on one of the hfiles and got this:
>>>>>
>>>>> 11/01/28 09:57:15 INFO compress.CodecPool: Got brand-new decompressor
>>>>> java.io.IOException: Not in GZIP format
>>>>>         at java.util.zip.GZIPInputStream.readHeader(GZIPInputStream.java:137)
>>>>>         at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:58)
>>>>>         at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:68)
>>>>>         at org.apache.hadoop.io.compress.GzipCodec$GzipInputStream$ResetableGZIPInputStream.<init>(GzipCodec.java:92)
>>>>>         at org.apache.hadoop.io.compress.GzipCodec$GzipInputStream.<init>(GzipCodec.java:101)
>>>>>         at org.apache.hadoop.io.compress.GzipCodec.createInputStream(GzipCodec.java:169)
>>>>>         at org.apache.hadoop.io.compress.GzipCodec.createInputStream(GzipCodec.java:179)
>>>>>         at org.apache.hadoop.hbase.io.hfile.Compression$Algorithm.createDecompressionStream(Compression.java:168)
>>>>>         at org.apache.hadoop.hbase.io.hfile.HFile$Reader.decompress(HFile.java:1013)
>>>>>         at org.apache.hadoop.hbase.io.hfile.HFile$Reader.readBlock(HFile.java:966)
>>>>>         at org.apache.hadoop.hbase.io.hfile.HFile$Reader$Scanner.seekTo(HFile.java:1291)
>>>>>         at org.apache.hadoop.hbase.io.hfile.HFile.main(HFile.java:1740)
>>>>>
>>>>> So the problem could be that the HFile writer is not writing properly
>>>>> gzipped output?
>>>>>
>>>>> On Fri, Jan 28, 2011 at 9:41 AM, Stack <[email protected]> wrote:
>>>>>> The section in the 0.90 book on the hfile tool should apply to 0.20.6:
>>>>>> http://hbase.apache.org/ch08s02.html#hfile_tool It might help you w/
>>>>>> your explorations.
>>>>>>
>>>>>> St.Ack
>>>>>>
>>>>>> On Fri, Jan 28, 2011 at 9:38 AM, Nanheng Wu <[email protected]> wrote:
>>>>>>> Hi Stack,
>>>>>>>
>>>>>>> Get doesn't work either. It was a fresh table created by
>>>>>>> loadtable.rb. Finally, the uncompressed version had the same number of
>>>>>>> regions (8 total). I totally understand you guys shouldn't be patching
>>>>>>> the older version; upgrading is an option for me but will be pretty
>>>>>>> painful. I wonder if I can figure something out by comparing the two
>>>>>>> versions' HFiles. Thanks again!
>>>>>>>
>>>>>>> On Fri, Jan 28, 2011 at 9:14 AM, Stack <[email protected]> wrote:
>>>>>>>> On Thu, Jan 27, 2011 at 9:35 PM, Nanheng Wu <[email protected]> wrote:
>>>>>>>>> In the compressed case, there are 8 regions and the region start/end
>>>>>>>>> keys do line up. Which actually is confusing to me: how can HBase read
>>>>>>>>> the files if they are compressed? Does each hfile have some metadata
>>>>>>>>> in it that has compression info?
>>>>>>>>
>>>>>>>> You got it.
>>>>>>>>
>>>>>>>>> Anyway, the regions are the same
>>>>>>>>> (numbers and boundaries are the same) in both the compressed and
>>>>>>>>> uncompressed versions. So what else should I look into to fix this?
>>>>>>>>> Thanks again!
>>>>>>>>
>>>>>>>> You can't scan. Can you Get from the table at all? Try getting the
>>>>>>>> start key from a few of the regions you see in .META.
>>>>>>>>
>>>>>>>> Did this table preexist or was this a fresh creation?
>>>>>>>>
>>>>>>>> When you created this table uncompressed, how many regions did it have?
>>>>>>>>
>>>>>>>> How about just running uncompressed while you are on 0.20.6? We'd
>>>>>>>> rather be fixing bugs in the new stuff, not the version that we are
>>>>>>>> leaving behind.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> St.Ack
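For reference, the HFile tool invocations discussed above look roughly like this on 0.20.x/0.90.x; the HDFS path, table, and family names are placeholders, so substitute a real hfile path from your own table:

# Dump the hfile's metadata, including the compression codec it claims (-m).
${HBASE_HOME}/bin/hbase org.apache.hadoop.hbase.io.hfile.HFile -m -f hdfs://namenode:9000/hbase/TESTTABLE/1418428042/family/1234567890123456789
# Walk and print the key/values (-v -p); this is the read path that threw
# "Not in GZIP format" in the stack trace above.
${HBASE_HOME}/bin/hbase org.apache.hadoop.hbase.io.hfile.HFile -v -p -f hdfs://namenode:9000/hbase/TESTTABLE/1418428042/family/1234567890123456789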

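And a minimal sketch of the other checks suggested in the thread, driven through the hbase shell; 'testtable' and the row key are hypothetical, and the row key should be a region start key taken from .META.:

# Does the column family descriptor actually claim COMPRESSION => 'GZ'?
echo "describe 'testtable'" | ${HBASE_HOME}/bin/hbase shell
# List the table's regions and their start keys.
echo "scan '.META.'" | ${HBASE_HOME}/bin/hbase shell
# Try a single Get against one region's start key, since full scans fail.
echo "get 'testtable', 'a-region-start-key'" | ${HBASE_HOME}/bin/hbase shell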