I've been playing with Pig this week and I'm running into an issue that
seems like it should be trivial. I'm basically reading data from HBase
and counting the sessions associated with each cookie.
I'm running Pig 0.8.
My script looks like the following:
raw = LOAD 'hbase://sport_user'
    USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
        'session:*', '-loadKey true')
    AS (id:bytearray, session_map:map[]);
-- Convert maps to bags
B = FOREACH raw GENERATE id, mapToBag(session_map) AS session_bag;
--dump B;
-- Count the number of sessions
C = FOREACH B GENERATE id,
    COUNT(session_bag) AS sess_count;
describe C ;
dump C ;
This works fine. When I dump C, I see the cg cookie and its number of sessions.
For example:
(ANON_Cg+5EUka4wFOAAAAtRg,2)
(ANON_Cg+5EUknSmmLAAAA5CU,1)
(ANON_Cg+5EUlHWwwNAAAALQQ,1)
(ANON_Cg+5EUlSDOIJAAAAygw,1)
(ANON_Cg+5EUlgDESHAAAAWQ0,1)
(ANON_Cg+5EUli1UHBAAAA/xg,4)
(ANON_Cg+5EUmSc3sPAAAAsg4,2)
(ANON_Cg+5EUmo6i8PAAAAwxo,2)
(ANON_Cg+5EUn2X6HOAAAAWSM,1)
(ANON_Cg+5EUn5PmRCAQAA1xA,4)
(ANON_Cg+5EUnUT9+NAAAA0RE,3)
(ANON_Cg+5EUnjSD0BAAAACx0,1)
(ANON_Cg+5EUoJF82PAAAAkgI,1)
(ANON_Cg+5EUoWJW9GAAAAcx4,1)
(ANON_Cg+5EUorklmHAAAAxRk,1)
(ANON_Cg+5EUp1bXGFAAAAPwA,1)
(ANON_Cg+5EUp55I5OAAAAmR4,2)
(ANON_Cg+5EUp9XkHFAAAAYQ8,2)
(ANON_Cg+5EUpK/koEAAAAcRs,3)
(ANON_Cg+5EUpd/aDJAAAABBw,3)
If I then sort the alias C in descending order, I get an error when I dump the result:
D = ORDER C BY sess_count DESC ;
dump D ;
2011-03-10 16:10:59,325 [Thread-57] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0004
java.lang.RuntimeException: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/Users/keric/Documents/workspace/_Java/cnwk-hadoop/pigsample_368958259_1299791458629
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:139)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:527)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:613)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/Users/keric/Documents/workspace/_Java/cnwk-hadoop/pigsample_368958259_1299791458629
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:224)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigFileInputFormat.listStatus(PigFileInputFormat.java:37)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:241)
    at org.apache.pig.impl.io.ReadToEndLoader.init(ReadToEndLoader.java:153)
    at org.apache.pig.impl.io.ReadToEndLoader.<init>(ReadToEndLoader.java:115)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:112)
    ... 6 more
Any thoughts?
Thanks
Keric