Andy,

Here's what I'm seeing when I run the following script. There's no information beyond what's here in the log file.
Chris

REGISTER '/opt/shared_storage/elephant-bird/build/elephant-bird-2.2.3-SNAPSHOT.jar';
%declare SEQFILE_LOADER 'com.twitter.elephantbird.pig.load.SequenceFileLoader';
%declare TEXT_CONVERTER 'com.twitter.elephantbird.pig.util.TextConverter';
%declare NULL_CONVERTER 'com.twitter.elephantbird.pig.util.NullWritableConverter';

rmf /data/SearchLogJSON;

-- Load raw log data
raw_logs = LOAD '/logs/jive/internal/raw/2012/05/07/2012050795652.0627-720078349.seq'
  USING $SEQFILE_LOADER ();

-- Store the JSON
STORE raw_logs INTO '/data/SearchLogJSON/';

-------------------

-sh-3.2$ pig dump_log_json.pig
2012-05-17 23:57:41,304 [main] INFO org.apache.pig.Main - Logging error messages to: /opt/shared_storage/log_analysis_pig_python_scripts/pig_1337299061301.log
2012-05-17 23:57:41,586 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: XXX
2012-05-17 23:57:41,932 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: XXX
2012-05-17 23:57:42,204 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2012-05-17 23:57:42,204 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - pig.usenewlogicalplan is set to true. New logical plan will be used.
2012-05-17 23:57:42,301 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - (Name: raw_logs: Store(/data/SearchLogJSON:org.apache.pig.builtin.PigStorage) - scope-1 Operator Key: scope-1)
2012-05-17 23:57:42,317 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2012-05-17 23:57:42,349 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2012-05-17 23:57:42,349 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2012-05-17 23:57:42,529 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2012-05-17 23:57:42,545 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2012-05-17 23:57:44,706 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2012-05-17 23:57:44,734 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2012-05-17 23:57:45,053 [Thread-4] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2012-05-17 23:57:45,057 [Thread-4] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2012-05-17 23:57:45,236 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2012-05-17 23:57:45,849 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_201205170527_0003
2012-05-17 23:57:45,849 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: XXX
2012-05-17 23:58:25,816 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_201205170527_0003 has failed! Stop running all dependent jobs
2012-05-17 23:58:25,821 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2012-05-17 23:58:25,824 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
2012-05-17 23:58:25,825 [main] INFO org.apache.pig.tools.pigstats.PigStats - Script Statistics:

HadoopVersion  PigVersion    UserId       StartedAt            FinishedAt           Features
0.20.2-cdh3u2  0.8.1-cdh3u2  chris.diehl  2012-05-17 23:57:42  2012-05-17 23:58:25  UNKNOWN

Failed!

Failed Jobs:
JobId                  Alias     Feature   Message                                 Outputs
job_201205170527_0003  raw_logs  MAP_ONLY  Message: Job failed! Error - NA        /data/SearchLogJSON,

Input(s):
Failed to read data from "/logs/jive/internal/raw/2012/05/07/2012050795652.0627-720078349.seq"

Output(s):
Failed to produce result in "/data/SearchLogJSON"

Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG:
job_201205170527_0003

2012-05-17 23:58:25,825 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2012-05-17 23:58:25,831 [main] ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2244: Job failed, hadoop does not return any error message
Details at logfile: /opt/shared_storage/log_analysis_pig_python_scripts/pig_1337299061301.log

On Thu, May 17, 2012 at 1:20 PM, Andy Schlaikjer <[email protected]> wrote:
> Chris, could you send us any of your error logs? What kind of failures are
> you running into?
>
> Andy
>
> On Wed, May 16, 2012 at 11:47 AM, Chris Diehl <[email protected]> wrote:
> >
> > Hi All,
> >
> > I'm attempting to load sequence files for the first time using Elephant
> > Bird's sequence file loader and having absolutely no luck.
> >
> > I did a hadoop fs -text on one of the sequence files and noticed all the
> > keys are (null). Not sure if that is throwing things off here.
> >
> > Here are various approaches I've tried that all have failed.
> >
> > REGISTER '/opt/shared_storage/elephant-bird/build/elephant-bird-2.2.3-SNAPSHOT.jar';
> > %declare SEQFILE_LOADER 'com.twitter.elephantbird.pig.load.SequenceFileLoader';
> > %declare TEXT_CONVERTER 'com.twitter.elephantbird.pig.util.TextConverter';
> > %declare NULL_CONVERTER 'com.twitter.elephantbird.pig.util.NullWritableConverter';
> >
> > raw_logs = LOAD '/logs/jive/internal/raw/2012/05/07/2012050795652.0627-720078349.seq'
> >   USING $SEQFILE_LOADER ('-c $NULL_CONVERTER', '-c $TEXT_CONVERTER')
> >   AS (key: bytearray, value: chararray);
> > --raw_logs = LOAD '/logs/jive/internal/raw/2012/05/07/2012050795652.0627-720078349.seq'
> > --  USING $SEQFILE_LOADER ('-c $TEXT_CONVERTER', '-c $TEXT_CONVERTER')
> > --  AS (key: chararray, value: chararray);
> > --raw_logs = LOAD '/logs/jive/internal/raw/2012/05/07/2012050795652.0627-720078349.seq'
> > --  USING $SEQFILE_LOADER ();
> >
> > STORE raw_logs INTO '/data/SearchLogJSON/';
> >
> > Any thoughts on what might be the problem? Anything else I should try?
> > I'm totally out of ideas.
> >
> > Appreciate any pointers!
> >
> > Chris
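Since `hadoop fs -text` shows the keys as (null), one useful check is to confirm which key/value Writable classes the file was actually written with, so the `-c` converter arguments can be matched to them. With Hadoop on the classpath, `SequenceFile.Reader`'s `getKeyClassName()`/`getValueClassName()` report this directly; the sketch below is a hedged, dependency-free alternative that peeks at the header bytes only. It assumes the standard SequenceFile header layout ("SEQ" magic, one version byte, then each class name as a vint length followed by UTF-8 bytes) and class names shorter than 128 bytes so each vint fits in a single byte; the class and method names here are hypothetical, not part of any library.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class SeqHeaderPeek {

    // Parse the key and value class names out of a SequenceFile header.
    // Assumption: class-name lengths are < 128, so each vint is one byte.
    static String[] parseHeader(DataInputStream in) throws IOException {
        byte[] magic = new byte[3];
        in.readFully(magic);
        if (!new String(magic, StandardCharsets.US_ASCII).equals("SEQ")) {
            throw new IOException("not a SequenceFile");
        }
        in.readByte(); // format version byte
        String keyClass = readShortString(in);
        String valueClass = readShortString(in);
        return new String[] { keyClass, valueClass };
    }

    // Read a vint-length-prefixed UTF-8 string (single-byte length assumed).
    static String readShortString(DataInputStream in) throws IOException {
        int len = in.readByte();
        byte[] buf = new byte[len];
        in.readFully(buf);
        return new String(buf, StandardCharsets.UTF_8);
    }

    // Build a synthetic header for demonstration, mirroring the same layout.
    static byte[] buildHeader(String keyClass, String valueClass) {
        byte[] k = keyClass.getBytes(StandardCharsets.UTF_8);
        byte[] v = valueClass.getBytes(StandardCharsets.UTF_8);
        byte[] out = new byte[4 + 1 + k.length + 1 + v.length];
        out[0] = 'S'; out[1] = 'E'; out[2] = 'Q'; out[3] = 6; // magic + version
        out[4] = (byte) k.length;
        System.arraycopy(k, 0, out, 5, k.length);
        out[5 + k.length] = (byte) v.length;
        System.arraycopy(v, 0, out, 6 + k.length, v.length);
        return out;
    }

    public static void main(String[] args) throws IOException {
        // In practice, the input would be the first bytes of the .seq file.
        byte[] header = buildHeader("org.apache.hadoop.io.NullWritable",
                                    "org.apache.hadoop.io.Text");
        String[] kv = parseHeader(new DataInputStream(new ByteArrayInputStream(header)));
        System.out.println("key class:   " + kv[0]);
        System.out.println("value class: " + kv[1]);
    }
}
```

If the key class turns out to be NullWritable and the value class Text, that would point at the first LOAD variant (`-c $NULL_CONVERTER` for the key, `-c $TEXT_CONVERTER` for the value) as the one that should match the file's contents.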
