Hi all,
I am trying to load data from HBase into Pig with HBaseStorage. Something
is going wrong, because no data from the HBase table (test) shows up in
Pig; I only get errors.
I configured Hadoop and HBase in pseudo-distributed mode.
What follows are the commands that I ran and the output they produced.
//first try: Pig in mapreduce mode
pig -x mapreduce
B = load 'hbase://test' using
org.apache.pig.backend.hadoop.hbase.HBaseStorage('data') as (col_a);
dump B;
output:
2009-11-19 13:56:02,810 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
- MR plan size before optimization: 1
2009-11-19 13:56:02,810 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
- MR plan size after optimization: 1
2009-11-19 13:56:04,708 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
- Setting up single store job
2009-11-19 13:56:04,729 [main] INFO
org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics
with processName=JobTracker, sessionId= - already initialized
2009-11-19 13:56:04,739 [Thread-5] WARN
org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for
parsing the arguments. Applications should implement Tool for the same.
2009-11-19 13:56:05,024 [Thread-5] INFO
org.apache.pig.backend.hadoop.hbase.HBaseStorage - tablename:
file:/Users/jorislops/Desktop/pig-0.5.0/test
2009-11-19 13:56:05,231 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 0% complete
2009-11-19 13:56:06,222 [Thread-5] INFO org.apache.hadoop.ipc.Client -
Retrying connect to server: localhost/127.0.0.1:60000. Already tried 0
time(s).
2009-11-19 13:56:06,222 [Thread-5] INFO org.apache.hadoop.ipc.Client -
Retrying connect to server: localhost/127.0.0.1:60000. Already tried 1
time(s).
//port 60000 is in use by a Java process
pig -x local
B = load 'test' using
org.apache.pig.backend.hadoop.hbase.HBaseStorage('data') as (col_a);
dump B;
output:
2009-11-19 13:53:18,425 [main] INFO
org.apache.pig.backend.local.executionengine.LocalPigLauncher -
Successfully stored result in: "file:/tmp/temp-1663248768/tmp-1939618752"
2009-11-19 13:53:18,436 [main] INFO
org.apache.pig.backend.local.executionengine.LocalPigLauncher - Records
written : 0
2009-11-19 13:53:18,436 [main] INFO
org.apache.pig.backend.local.executionengine.LocalPigLauncher - Bytes
written : 0
2009-11-19 13:53:18,436 [main] INFO
org.apache.pig.backend.local.executionengine.LocalPigLauncher - 100%
complete!
2009-11-19 13:53:18,436 [main] INFO
org.apache.pig.backend.local.executionengine.LocalPigLauncher - Success!!
//there is nothing in /tmp/temp-1663248768/tmp-1939618752 (it's empty)
I tried different paths to the HBase table: 'hbase://test', 'test', and
'hbase://localhost:60000/test'.
This is how I started the system (Hadoop + HBase); I verified that it is
working as I expected.
bin/hadoop namenode -format
bin/start-all.sh
//both NameNode and JobTracker are running, verified via
http://localhost:50070 and http://localhost:500040
bin/start-hbase.sh
//both master and regionserver are running, checked via localhost:60010
localhost:20 localhost:30
//the ZooKeeper quorum is also started, at localhost:2181
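//the port checks above could also be scripted
What follows is only a minimal sketch, assuming Python 3 is available on the machine. The helper name port_is_open and the labels in SERVICES are hypothetical; the port numbers are the ones mentioned in this message (60000, 60010, 2181). It only tests whether something accepts TCP connections on each port, not whether the service behind it is healthy.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection raises OSError (e.g. connection refused, timeout)
        # when nothing is accepting connections on the port.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Ports taken from this message; the labels are illustrative only.
SERVICES = {
    "HBase master (RPC)": 60000,
    "HBase master (web UI)": 60010,
    "ZooKeeper": 2181,
}

if __name__ == "__main__":
    for name, port in SERVICES.items():
        state = "open" if port_is_open("localhost", port) else "closed"
        print(f"{name} localhost:{port} is {state}")
```

A port reported as open only shows that some process is listening there, which matches the earlier observation that port 60000 is held by a Java process.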
//fill a test table in hbase
hbase-0.20.1/bin/hbase shell
create 'test', 'data'
put 'test', 'row1', 'data', 'value1'
scan 'test'
//localhost:60010 shows that the test table is in HBase.
I hope that someone knows the solution.
Thanks,
Joris