For some reason Pig fails to find the sample file created by the sampling MR job of the ORDER BY. You seem to be running in local mode; do you see this error in map-reduce mode as well?
-Thejas
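For context, an ORDER BY first runs a sampling job that writes key boundaries to a temporary file (the pigsample_* path in the trace below). The sort job's partitioner then reads that file in setConf to decide which reducer each key goes to. If the file is missing, the partitioner cannot be configured. The Java below is a hypothetical, simplified sketch of that sample-then-partition idea. It is NOT Pig's WeightedRangePartitioner, which is more elaborate. The class and method names are made up for illustration, and it sorts ascending for simplicity, although the script below uses DESC.

```java
import java.util.Arrays;

// Hypothetical sketch of sample-based range partitioning for a global sort.
// NOT Pig's WeightedRangePartitioner; it only shows why the sort job cannot
// start without the boundaries produced by the sampling job.
final class RangePartitionSketch {

    // Sorted boundary keys; reducer i receives keys in [b[i-1], b[i]).
    private final long[] boundaries;

    RangePartitionSketch(long[] sortedBoundaries) {
        // Stand-in for the failure in this thread: without the sample
        // output, the partitioner cannot be configured at all.
        if (sortedBoundaries == null) {
            throw new IllegalStateException("sample boundaries are missing");
        }
        this.boundaries = sortedBoundaries.clone();
    }

    // "Sampling job": sort a sample of keys and take numReducers - 1
    // evenly spaced quantiles as partition boundaries.
    static long[] boundariesFromSample(long[] sample, int numReducers) {
        if (sample.length == 0 || numReducers < 1) {
            throw new IllegalArgumentException("need a non-empty sample and at least one reducer");
        }
        long[] sorted = sample.clone();
        Arrays.sort(sorted);
        long[] result = new long[numReducers - 1];
        for (int i = 1; i < numReducers; i++) {
            result[i - 1] = sorted[(int) ((long) i * sorted.length / numReducers)];
        }
        return result;
    }

    // "Sort job": a key goes to the reducer whose index equals the number of
    // boundaries <= key, so lower reducers always get smaller keys and the
    // concatenated reducer outputs are globally sorted.
    int getPartition(long key) {
        int lo = 0;
        int hi = boundaries.length;
        while (lo < hi) { // upper-bound binary search
            int mid = (lo + hi) >>> 1;
            if (boundaries[mid] <= key) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    public static void main(String[] args) {
        // The sess_count values from the dump of C quoted below.
        long[] sample = {2, 1, 1, 1, 1, 4, 2, 2, 1, 4, 3, 1, 1, 1, 1, 1, 2, 2, 3, 3};
        RangePartitionSketch p = new RangePartitionSketch(boundariesFromSample(sample, 3));
        for (long key = 1; key <= 4; key++) {
            System.out.println("sess_count " + key + " -> reducer " + p.getPartition(key));
        }
    }
}
```

With this sample and three reducers, the boundaries come out as [1, 2]. Reducer 0 receives nothing because the key 1 makes up half the sample. This kind of skew is something a weighted partitioner is meant to smooth out.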
On 3/11/11 8:35 AM, "Keric Donnelly" <[email protected]> wrote:

I've been playing with Pig this week and I'm running into an issue that seems like it should be trivial. I'm basically reading data from HBase and performing a count of sessions associated with a cookie. I'm running on Pig 0.8.

My script looks like the following:

raw = LOAD 'hbase://sport_user' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
        'session:*', '-loadKey true') AS (id:bytearray, session_map:map[]);

-- Convert maps to bags
B = FOREACH raw GENERATE id, mapToBag(session_map) AS session_bag;
--dump B;

-- Count the number of sessions
C = FOREACH B GENERATE id, COUNT(session_bag) as sess_count;
describe C;
dump C;

This works fine. When I dump "C" I see the cg cookie and the number of sessions. For example:

(ANON_Cg+5EUka4wFOAAAAtRg,2)
(ANON_Cg+5EUknSmmLAAAA5CU,1)
(ANON_Cg+5EUlHWwwNAAAALQQ,1)
(ANON_Cg+5EUlSDOIJAAAAygw,1)
(ANON_Cg+5EUlgDESHAAAAWQ0,1)
(ANON_Cg+5EUli1UHBAAAA/xg,4)
(ANON_Cg+5EUmSc3sPAAAAsg4,2)
(ANON_Cg+5EUmo6i8PAAAAwxo,2)
(ANON_Cg+5EUn2X6HOAAAAWSM,1)
(ANON_Cg+5EUn5PmRCAQAA1xA,4)
(ANON_Cg+5EUnUT9+NAAAA0RE,3)
(ANON_Cg+5EUnjSD0BAAAACx0,1)
(ANON_Cg+5EUoJF82PAAAAkgI,1)
(ANON_Cg+5EUoWJW9GAAAAcx4,1)
(ANON_Cg+5EUorklmHAAAAxRk,1)
(ANON_Cg+5EUp1bXGFAAAAPwA,1)
(ANON_Cg+5EUp55I5OAAAAmR4,2)
(ANON_Cg+5EUp9XkHFAAAAYQ8,2)
(ANON_Cg+5EUpK/koEAAAAcRs,3)
(ANON_Cg+5EUpd/aDJAAAABBw,3)

If I then do a descending sort on the alias "C", I get an error when I dump it:

D = ORDER C BY sess_count DESC;
dump D;

2011-03-10 16:10:59,325 [Thread-57] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0004
java.lang.RuntimeException: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/Users/keric/Documents/workspace/_Java/cnwk-hadoop/pigsample_368958259_1299791458629
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:139)
	at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
	at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:527)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:613)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/Users/keric/Documents/workspace/_Java/cnwk-hadoop/pigsample_368958259_1299791458629
	at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:224)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigFileInputFormat.listStatus(PigFileInputFormat.java:37)
	at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:241)
	at org.apache.pig.impl.io.ReadToEndLoader.init(ReadToEndLoader.java:153)
	at org.apache.pig.impl.io.ReadToEndLoader.<init>(ReadToEndLoader.java:115)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:112)
	... 6 more

Any thoughts?

Thanks,
Keric
