Chris, the console output mentions the file "/opt/shared_storage/log_analysis_pig_python_scripts/pig_1337299061301.log". Does it contain any kind of stack trace? Were you running the script in local mode or on a cluster? If the latter, there should at least be map task log output somewhere that may also hold some clues.
Does path '/logs/jive/internal/raw/2012/05/07/2012050795652.0627-720078349.seq' contain SequenceFile<Text, Text> data? If not, you'll have to configure SequenceFileLoader further to properly deserialize the key-value pairs.

Andy

On Thu, May 17, 2012 at 5:07 PM, Chris Diehl <[email protected]> wrote:

> Andy,
>
> Here's what I'm seeing when I run the following script. There's no
> information beyond what is here in the log file.
>
> Chris
>
> REGISTER '/opt/shared_storage/elephant-bird/build/elephant-bird-2.2.3-SNAPSHOT.jar';
> %declare SEQFILE_LOADER 'com.twitter.elephantbird.pig.load.SequenceFileLoader';
> %declare TEXT_CONVERTER 'com.twitter.elephantbird.pig.util.TextConverter';
> %declare NULL_CONVERTER 'com.twitter.elephantbird.pig.util.NullWritableConverter'
>
> rmf /data/SearchLogJSON;
>
> -- Load raw log data
> raw_logs = LOAD '/logs/jive/internal/raw/2012/05/07/2012050795652.0627-720078349.seq' USING $SEQFILE_LOADER ();
>
> -- Store the JSON
> STORE raw_logs INTO '/data/SearchLogJSON/';
>
> -------------------
>
> -sh-3.2$ pig dump_log_json.pig
> 2012-05-17 23:57:41,304 [main] INFO org.apache.pig.Main - Logging error messages to: /opt/shared_storage/log_analysis_pig_python_scripts/pig_1337299061301.log
> 2012-05-17 23:57:41,586 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: XXX
> 2012-05-17 23:57:41,932 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: XXX
> 2012-05-17 23:57:42,204 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
> 2012-05-17 23:57:42,204 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - pig.usenewlogicalplan is set to true. New logical plan will be used.
> 2012-05-17 23:57:42,301 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - (Name: raw_logs: Store(/data/SearchLogJSON:org.apache.pig.builtin.PigStorage) - scope-1 Operator Key: scope-1)
> 2012-05-17 23:57:42,317 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
> 2012-05-17 23:57:42,349 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
> 2012-05-17 23:57:42,349 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
> 2012-05-17 23:57:42,529 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
> 2012-05-17 23:57:42,545 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
> 2012-05-17 23:57:44,706 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
> 2012-05-17 23:57:44,734 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
> 2012-05-17 23:57:45,053 [Thread-4] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
> 2012-05-17 23:57:45,057 [Thread-4] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
> 2012-05-17 23:57:45,236 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
> 2012-05-17 23:57:45,849 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_201205170527_0003
> 2012-05-17 23:57:45,849 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: XXX
> 2012-05-17 23:58:25,816 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_201205170527_0003 has failed! Stop running all dependent jobs
> 2012-05-17 23:58:25,821 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
> 2012-05-17 23:58:25,824 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
> 2012-05-17 23:58:25,825 [main] INFO org.apache.pig.tools.pigstats.PigStats - Script Statistics:
>
> HadoopVersion PigVersion UserId StartedAt FinishedAt Features
> 0.20.2-cdh3u2 0.8.1-cdh3u2 chris.diehl 2012-05-17 23:57:42 2012-05-17 23:58:25 UNKNOWN
>
> Failed!
>
> Failed Jobs:
> JobId Alias Feature Message Outputs
> job_201205170527_0003 raw_logs MAP_ONLY Message: Job failed!
> Error - NA /data/SearchLogJSON,
>
> Input(s):
> Failed to read data from "/logs/jive/internal/raw/2012/05/07/2012050795652.0627-720078349.seq"
>
> Output(s):
> Failed to produce result in "/data/SearchLogJSON"
>
> Counters:
> Total records written : 0
> Total bytes written : 0
> Spillable Memory Manager spill count : 0
> Total bags proactively spilled: 0
> Total records proactively spilled: 0
>
> Job DAG:
> job_201205170527_0003
>
>
> 2012-05-17 23:58:25,825 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
> 2012-05-17 23:58:25,831 [main] ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2244: Job failed, hadoop does not return any error message
> Details at logfile: /opt/shared_storage/log_analysis_pig_python_scripts/pig_1337299061301.log
>
>
> On Thu, May 17, 2012 at 1:20 PM, Andy Schlaikjer <[email protected]> wrote:
>
> > Chris, could you send us any of your error logs? What kind of failures are
> > you running into?
> >
> > Andy
> >
> >
> > On Wed, May 16, 2012 at 11:47 AM, Chris Diehl <[email protected]> wrote:
> >
> > > Hi All,
> > >
> > > I'm attempting to load sequence files for the first time using Elephant Bird's
> > > sequence file loader and having absolutely no luck.
> > >
> > > I did a hadoop fs -text on one of the sequence files and noticed all the
> > > keys are (null). Not sure if that is throwing off things here.
> > >
> > > Here are various approaches I've tried that all have failed.
> > >
> > > REGISTER '/opt/shared_storage/elephant-bird/build/elephant-bird-2.2.3-SNAPSHOT.jar';
> > > %declare SEQFILE_LOADER 'com.twitter.elephantbird.pig.load.SequenceFileLoader';
> > > %declare TEXT_CONVERTER 'com.twitter.elephantbird.pig.util.TextConverter';
> > > %declare NULL_CONVERTER 'com.twitter.elephantbird.pig.util.NullWritableConverter'
> > >
> > > raw_logs = LOAD '/logs/jive/internal/raw/2012/05/07/2012050795652.0627-720078349.seq' USING $SEQFILE_LOADER ('-c $NULL_CONVERTER','-c $TEXT_CONVERTER') AS (key: bytearray, value: chararray);
> > > --raw_logs = LOAD '/logs/jive/internal/raw/2012/05/07/2012050795652.0627-720078349.seq' USING $SEQFILE_LOADER ('-c $TEXT_CONVERTER','-c $TEXT_CONVERTER') AS (key: chararray, value: chararray);
> > > --raw_logs = LOAD '/logs/jive/internal/raw/2012/05/07/2012050795652.0627-720078349.seq' USING $SEQFILE_LOADER ();
> > >
> > > STORE raw_logs INTO '/data/SearchLogJSON/';
> > >
> > > Any thoughts on what might be the problem? Anything else I should try? I'm
> > > totally out of ideas.
> > >
> > > Appreciate any pointers!
> > >
> > > Chris
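
On the question raised at the top of the thread, whether the file really holds SequenceFile<Text, Text> data: one way to check might be to read the key and value class names straight out of the file header. The following is only a rough Python sketch, not part of Elephant Bird or Pig. It assumes a local copy of the file (for example fetched with hadoop fs -get) and assumes a header format version of 4 or later, where the class names are stored as length-prefixed UTF-8 strings right after the "SEQ" magic bytes. The function names (read_vint, read_text_string, read_sequencefile_header) are made up for illustration.

```python
import io
import struct


def _read_exact(stream, n):
    """Read exactly n bytes, or raise if the stream ends early."""
    data = stream.read(n)
    if len(data) != n:
        raise ValueError("unexpected end of data while reading header")
    return data


def read_vint(stream):
    """Decode one Hadoop variable-length integer (WritableUtils layout)."""
    first = struct.unpack("b", _read_exact(stream, 1))[0]  # signed byte
    if first >= -112:
        return first  # small values are stored directly in a single byte
    negative = first < -120
    n_following = (-120 - first) if negative else (-112 - first)
    value = 0
    for byte in _read_exact(stream, n_following):
        value = (value << 8) | byte  # big-endian magnitude bytes
    return ~value if negative else value


def read_text_string(stream):
    """Read a string stored as a vint byte length followed by UTF-8 bytes."""
    length = read_vint(stream)
    if length < 0:
        raise ValueError("negative string length in header")
    return _read_exact(stream, length).decode("utf-8")


def read_sequencefile_header(stream):
    """Return (version, key_class_name, value_class_name) from a header."""
    if _read_exact(stream, 3) != b"SEQ":
        raise ValueError("not a SequenceFile: missing SEQ magic bytes")
    version = _read_exact(stream, 1)[0]
    if version < 4:
        # Assumption: older versions encode class names differently,
        # and this sketch does not try to handle them.
        raise ValueError("header version %d not handled by this sketch" % version)
    key_class = read_text_string(stream)
    value_class = read_text_string(stream)
    return version, key_class, value_class


if __name__ == "__main__":
    # Synthetic header bytes for demonstration only, not taken from the real file.
    key = b"org.apache.hadoop.io.NullWritable"
    val = b"org.apache.hadoop.io.Text"
    fake = b"SEQ" + bytes([6]) + bytes([len(key)]) + key + bytes([len(val)]) + val
    print(read_sequencefile_header(io.BytesIO(fake)))
```

Against a real local copy, the call would be something like read_sequencefile_header(open("local_copy.seq", "rb")), where local_copy.seq is a placeholder name. If the key class turns out to be something other than Text (the "(null)" keys shown by hadoop fs -text may hint at NullWritable), that would be consistent with needing explicit converters on the loader.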
