Andy,

Here's what I'm seeing when I run the following script. There's no information beyond what's here in the log file.
Chris

REGISTER '/opt/shared_storage/elephant-bird/build/elephant-bird-2.2.3-SNAPSHOT.jar';
%declare SEQFILE_LOADER 'com.twitter.elephantbird.pig.load.SequenceFileLoader';
%declare TEXT_CONVERTER 'com.twitter.elephantbird.pig.util.TextConverter';
%declare NULL_CONVERTER 'com.twitter.elephantbird.pig.util.NullWritableConverter';

rmf /data/SearchLogJSON;

-- Load raw log data
raw_logs = LOAD '/logs/jive/internal/raw/2012/05/07/2012050795652.0627-720078349.seq'
  USING $SEQFILE_LOADER ();

-- Store the JSON
STORE raw_logs INTO '/data/SearchLogJSON/';

-------------------

-sh-3.2$ pig dump_log_json.pig
2012-05-17 23:57:41,304 [main] INFO org.apache.pig.Main - Logging error messages to: /opt/shared_storage/log_analysis_pig_python_scripts/pig_1337299061301.log
2012-05-17 23:57:41,586 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: XXX
2012-05-17 23:57:41,932 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: XXX
2012-05-17 23:57:42,204 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2012-05-17 23:57:42,204 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - pig.usenewlogicalplan is set to true. New logical plan will be used.
2012-05-17 23:57:42,301 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - (Name: raw_logs: Store(/data/SearchLogJSON:org.apache.pig.builtin.PigStorage) - scope-1 Operator Key: scope-1)
2012-05-17 23:57:42,317 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2012-05-17 23:57:42,349 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2012-05-17 23:57:42,349 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2012-05-17 23:57:42,529 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2012-05-17 23:57:42,545 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2012-05-17 23:57:44,706 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2012-05-17 23:57:44,734 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2012-05-17 23:57:45,053 [Thread-4] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2012-05-17 23:57:45,057 [Thread-4] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2012-05-17 23:57:45,236 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2012-05-17 23:57:45,849 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_201205170527_0003
2012-05-17 23:57:45,849 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: XXX
2012-05-17 23:58:25,816 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_201205170527_0003 has failed! Stop running all dependent jobs
2012-05-17 23:58:25,821 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2012-05-17 23:58:25,824 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
2012-05-17 23:58:25,825 [main] INFO org.apache.pig.tools.pigstats.PigStats - Script Statistics:

HadoopVersion  PigVersion    UserId       StartedAt            FinishedAt           Features
0.20.2-cdh3u2  0.8.1-cdh3u2  chris.diehl  2012-05-17 23:57:42  2012-05-17 23:58:25  UNKNOWN

Failed!

Failed Jobs:
JobId                  Alias     Feature   Message                                 Outputs
job_201205170527_0003  raw_logs  MAP_ONLY  Message: Job failed! Error - NA        /data/SearchLogJSON,

Input(s):
Failed to read data from "/logs/jive/internal/raw/2012/05/07/2012050795652.0627-720078349.seq"

Output(s):
Failed to produce result in "/data/SearchLogJSON"

Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG:
job_201205170527_0003

2012-05-17 23:58:25,825 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2012-05-17 23:58:25,831 [main] ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2244: Job failed, hadoop does not return any error message
Details at logfile: /opt/shared_storage/log_analysis_pig_python_scripts/pig_1337299061301.log

On Thu, May 17, 2012 at 1:20 PM, Andy Schlaikjer <[email protected]> wrote:
> Chris, could you send us any of your error logs? What kind of failures are
> you running into?
>
> Andy
>
> On Wed, May 16, 2012 at 11:47 AM, Chris Diehl <[email protected]> wrote:
> >
> > Hi All,
> >
> > I'm attempting to load sequence files for the first time using Elephant
> > Bird's sequence file loader and having absolutely no luck.
> >
> > I did a hadoop fs -text on one of the sequence files and noticed all the
> > keys are (null). Not sure if that is throwing things off here.
> >
> > Here are various approaches I've tried that all have failed.
> >
> > REGISTER '/opt/shared_storage/elephant-bird/build/elephant-bird-2.2.3-SNAPSHOT.jar';
> > %declare SEQFILE_LOADER 'com.twitter.elephantbird.pig.load.SequenceFileLoader';
> > %declare TEXT_CONVERTER 'com.twitter.elephantbird.pig.util.TextConverter';
> > %declare NULL_CONVERTER 'com.twitter.elephantbird.pig.util.NullWritableConverter';
> >
> > raw_logs = LOAD '/logs/jive/internal/raw/2012/05/07/2012050795652.0627-720078349.seq'
> >   USING $SEQFILE_LOADER ('-c $NULL_CONVERTER', '-c $TEXT_CONVERTER')
> >   AS (key: bytearray, value: chararray);
> > --raw_logs = LOAD '/logs/jive/internal/raw/2012/05/07/2012050795652.0627-720078349.seq'
> > --  USING $SEQFILE_LOADER ('-c $TEXT_CONVERTER', '-c $TEXT_CONVERTER')
> > --  AS (key: chararray, value: chararray);
> > --raw_logs = LOAD '/logs/jive/internal/raw/2012/05/07/2012050795652.0627-720078349.seq'
> > --  USING $SEQFILE_LOADER ();
> >
> > STORE raw_logs INTO '/data/SearchLogJSON/';
> >
> > Any thoughts on what might be the problem? Anything else I should try?
> > I'm totally out of ideas.
> >
> > Appreciate any pointers!
> >
> > Chris
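Since `hadoop fs -text` shows the keys as (null), one useful check is to confirm which key/value Writable classes the file was actually written with, so the `-c` converter arguments can be matched to them. With Hadoop on the classpath, `SequenceFile.Reader`'s `getKeyClassName()`/`getValueClassName()` report this directly; the sketch below is a hedged, dependency-free alternative that peeks at the header bytes only. It assumes the standard SequenceFile header layout ("SEQ" magic, one version byte, then each class name as a vint length followed by UTF-8 bytes) and class names shorter than 128 bytes so each vint fits in a single byte; the class and method names here are hypothetical, not part of any library.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class SeqHeaderPeek {

    // Parse the key and value class names out of a SequenceFile header.
    // Assumption: class-name lengths are < 128, so each vint is one byte.
    static String[] parseHeader(DataInputStream in) throws IOException {
        byte[] magic = new byte[3];
        in.readFully(magic);
        if (!new String(magic, StandardCharsets.US_ASCII).equals("SEQ")) {
            throw new IOException("not a SequenceFile");
        }
        in.readByte(); // format version byte
        String keyClass = readShortString(in);
        String valueClass = readShortString(in);
        return new String[] { keyClass, valueClass };
    }

    // Read a vint-length-prefixed UTF-8 string (single-byte length assumed).
    static String readShortString(DataInputStream in) throws IOException {
        int len = in.readByte();
        byte[] buf = new byte[len];
        in.readFully(buf);
        return new String(buf, StandardCharsets.UTF_8);
    }

    // Build a synthetic header for demonstration, mirroring the same layout.
    static byte[] buildHeader(String keyClass, String valueClass) {
        byte[] k = keyClass.getBytes(StandardCharsets.UTF_8);
        byte[] v = valueClass.getBytes(StandardCharsets.UTF_8);
        byte[] out = new byte[4 + 1 + k.length + 1 + v.length];
        out[0] = 'S'; out[1] = 'E'; out[2] = 'Q'; out[3] = 6; // magic + version
        out[4] = (byte) k.length;
        System.arraycopy(k, 0, out, 5, k.length);
        out[5 + k.length] = (byte) v.length;
        System.arraycopy(v, 0, out, 6 + k.length, v.length);
        return out;
    }

    public static void main(String[] args) throws IOException {
        // In practice, the input would be the first bytes of the .seq file.
        byte[] header = buildHeader("org.apache.hadoop.io.NullWritable",
                                    "org.apache.hadoop.io.Text");
        String[] kv = parseHeader(new DataInputStream(new ByteArrayInputStream(header)));
        System.out.println("key class:   " + kv[0]);
        System.out.println("value class: " + kv[1]);
    }
}
```

If the key class turns out to be NullWritable and the value class Text, that would point at the first LOAD variant (`-c $NULL_CONVERTER` for the key, `-c $TEXT_CONVERTER` for the value) as the one that should match the file's contents.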
