Hi all,

I am trying to load data from HBase into Pig with HBaseStorage. Something is going wrong: no data from the HBase table (test) shows up in Pig, only errors.

I configured Hadoop and HBase in pseudo-distributed mode.

Below are the commands I ran and the output they produced.


//first attempt: Pig in mapreduce (remote) mode

pig -x mapreduce

B = load 'hbase://test' using org.apache.pig.backend.hadoop.hbase.HBaseStorage('data') as (col_a);

dump B;

output:

2009-11-19 13:56:02,810 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1

2009-11-19 13:56:02,810 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1

2009-11-19 13:56:04,708 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job

2009-11-19 13:56:04,729 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized

2009-11-19 13:56:04,739 [Thread-5] WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.

2009-11-19 13:56:05,024 [Thread-5] INFO org.apache.pig.backend.hadoop.hbase.HBaseStorage - tablename: file:/Users/jorislops/Desktop/pig-0.5.0/test

2009-11-19 13:56:05,231 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete

2009-11-19 13:56:06,222 [Thread-5] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: localhost/127.0.0.1:60000. Already tried 0 time(s).

2009-11-19 13:56:06,222 [Thread-5] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: localhost/127.0.0.1:60000. Already tried 1 time(s).



//port 60000 is in use by a Java process



//second attempt: Pig in local mode

pig -x local

B = load 'test' using org.apache.pig.backend.hadoop.hbase.HBaseStorage('data') as (col_a);

dump B;

output:

2009-11-19 13:53:18,425 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - Successfully stored result in: "file:/tmp/temp-1663248768/tmp-1939618752"

2009-11-19 13:53:18,436 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - Records written : 0

2009-11-19 13:53:18,436 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - Bytes written : 0

2009-11-19 13:53:18,436 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - 100% complete!

2009-11-19 13:53:18,436 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - Success!!

//there is nothing in /tmp/temp-1663248768/tmp-1939618752 (it's empty)



I tried different paths to the HBase table: 'hbase://test', 'test', and 'hbase://localhost:60000/test'.



Here is how I started the system (Hadoop + HBase) and verified that it is working as expected.



bin/hadoop namenode -format

bin/start-all.sh

//both NameNode and JobTracker are running, verified via http://localhost:50070 and http://localhost:500040



bin/start-hbase.sh

//both master and regionserver are running, checked via localhost:60010 localhost:20 localhost:30

//the ZooKeeper quorum is also running, at localhost:2181
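
In case it helps anyone reproduce this, the port checks above could also be scripted. This is only a rough sketch, assuming bash (it relies on bash's /dev/tcp feature). The helper name port_open is made up for illustration, and the port list is just a subset of the ports mentioned in this mail.

```shell
#!/usr/bin/env bash
# port_open is a hypothetical helper, not part of Hadoop, HBase or Pig.
# It returns 0 if something accepts TCP connections on host:port, non-zero otherwise.
port_open() {
  local host=$1 port=$2
  # bash's /dev/tcp pseudo-device opens a TCP connection on redirect;
  # the subshell ensures the file descriptor is closed again on exit.
  (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null
}

# Ports taken from this mail: NameNode UI, HBase master, HBase master UI, ZooKeeper.
for port in 50070 60000 60010 2181; do
  if port_open localhost "$port"; then
    echo "port ${port}: listening"
  else
    echo "port ${port}: not reachable"
  fi
done
```

This only shows whether something is listening on each port, not that it is the expected daemon.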



//fill a test table in hbase

hbase-0.20.1/bin/hbase shell

create 'test', 'data'

put 'test', 'row1', 'data', 'value1'

scan 'test'

//localhost:60010 shows that the test table is in HBase.



I hope someone knows the solution.

Thanks,

Joris









