Hi,
I already posted this question on the Cloudera list, but I haven't received a
solution yet, so I want to ask here again.
I am currently running Hadoop and HBase in pseudo-distributed mode
using CDH3-u2. With this update, Snappy support was included for HBase
0.90.4-cdh3u2.
I wanted to try it out and compare size and speed against LZO (which works
fine), but when I try to create a table or alter an existing table, I
get an error.
Here is what I do in the HBase shell (same effect when I use the Java API):
create 'testsnappy', {NAME => 'f1', COMPRESSION => 'SNAPPY'}
ERROR: org.apache.hadoop.hbase.client.RegionOfflineException: Only 0
of 1 regions are online; retries exhausted.
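For completeness, the Java-API attempt looks roughly like this (a minimal
sketch; error handling is omitted and the table/family names just mirror the
shell example):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.io.hfile.Compression;

public class CreateSnappyTable {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);

        // same table as in the shell example: one family 'f1' compressed with Snappy
        HTableDescriptor desc = new HTableDescriptor("testsnappy");
        HColumnDescriptor family = new HColumnDescriptor("f1");
        family.setCompressionType(Compression.Algorithm.SNAPPY);
        desc.addFamily(family);

        // fails the same way: the region never comes online
        admin.createTable(desc);
    }
}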
On the HBase master web UI I can see the new table in the "Regions
in Transition" section; its state alternates between "opening" and
"closed".
The log is full of entries like this:
ERROR org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Failed open of region=testsnappy,,1321955383777.7e5e71006335788551c2d6e90d5f9dee.
java.io.IOException: Compression algorithm 'snappy' previously failed test.
        at org.apache.hadoop.hbase.util.CompressionTest.testCompression(CompressionTest.java:78)
        at org.apache.hadoop.hbase.regionserver.HRegion.checkCompressionCodecs(HRegion.java:2670)
        at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:2659)
        at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:2647)
        at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:312)
        at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:99)
        at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:158)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
It indicates a failed Snappy test, but when I run the test myself from a
shell:
hbase org.apache.hadoop.hbase.util.CompressionTest afolder snappy
11/11/22 10:57:17 WARN snappy.LoadSnappy: Snappy native library is
available
11/11/22 10:57:17 INFO util.NativeCodeLoader: Loaded the native-hadoop
library
11/11/22 10:57:17 INFO snappy.LoadSnappy: Snappy native library loaded
11/11/22 10:57:17 INFO compress.CodecPool: Got brand-new compressor
SUCCESS
So the test itself succeeds.
What am I missing? Is this just related to pseudo-distributed mode?
Starting probably next week I am going to run MapReduce jobs on HBase tables
on a real cluster (also using CDH3-u2), and I would like to avoid
these problems there :)
When I set Snappy as the compression codec for mapper outputs, I don't get any
errors and the jobs run fine:
conf.set("mapred.compress.map.output", "true");
conf.set("mapred.map.output.compression.codec",
"org.apache.hadoop.io.compress.SnappyCodec");
Thanks for your help,
Christopher