With a little bit of luck, we managed to find an answer.

It turns out we needed to remove the cast from the key and run the script
with Pig 0.10. I had been running the script with Pig 0.8.1 up until today.

raw_logs = LOAD '$INPUT_LOCATION'
    USING $SEQFILE_LOADER ('-c $NULL_CONVERTER', '-c $TEXT_CONVERTER')
    AS (key, value: chararray);
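
For the archives, the complete working script then looks like the sketch
below (the jar path and converter declares are the ones from earlier in this
thread, and $INPUT_LOCATION stands in for whatever your input path is). The
key point is that the key field stays untyped, since the NullWritable keys
carry no value that can be cast to chararray:

```pig
REGISTER '/opt/shared_storage/elephant-bird/build/elephant-bird-2.2.3-SNAPSHOT.jar';

%declare SEQFILE_LOADER 'com.twitter.elephantbird.pig.load.SequenceFileLoader';
%declare TEXT_CONVERTER 'com.twitter.elephantbird.pig.util.TextConverter';
%declare NULL_CONVERTER 'com.twitter.elephantbird.pig.util.NullWritableConverter'

-- Leave the key untyped: in this thread, casting the NullWritable key
-- (combined with Pig 0.8.1) was what made the job fail.
raw_logs = LOAD '$INPUT_LOCATION'
    USING $SEQFILE_LOADER ('-c $NULL_CONVERTER', '-c $TEXT_CONVERTER')
    AS (key, value: chararray);

STORE raw_logs INTO '/data/SearchLogJSON/';
```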

Chris

On Fri, May 18, 2012 at 2:27 PM, Chris Diehl <[email protected]> wrote:

> Hi Andy,
>
> Here's what is in the log file.
>
> Pig Stack Trace
> ---------------
> ERROR 2244: Job failed, hadoop does not return any error message
>
> org.apache.pig.backend.executionengine.ExecException: ERROR 2244: Job
> failed, hadoop does not return any error message
>         at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:119)
>         at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:172)
>         at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:144)
>         at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:90)
>         at org.apache.pig.Main.run(Main.java:500)
>         at org.apache.pig.Main.main(Main.java:107)
>
> ================================================================================
>
> I am running it on the cluster. I could not find any additional
> information on the job tracker.
>
> The keys in the sequence files are all null. The values are all JSON
> strings. Given that information, I tried configuring the SequenceFileLoader
> this way to no avail.
>
> %declare SEQFILE_LOADER 'com.twitter.elephantbird.pig.load.SequenceFileLoader';
> %declare TEXT_CONVERTER 'com.twitter.elephantbird.pig.util.TextConverter';
> %declare NULL_CONVERTER 'com.twitter.elephantbird.pig.util.NullWritableConverter'
>
> raw_logs = LOAD '$INPUT_LOCATION'
>     USING $SEQFILE_LOADER ('-c $NULL_CONVERTER', '-c $TEXT_CONVERTER')
>     AS (key: chararray, value: chararray);
>
> Is there another way I should be configuring it?
>
> Chris
>
> On Fri, May 18, 2012 at 11:24 AM, Andy Schlaikjer <
> [email protected]> wrote:
>
>> Chris, the console output mentions file
>> "/opt/shared_storage/log_analysis_pig_python_scripts/pig_1337299061301.log". Does this contain any
>> kind of stack trace? Were you running the script in local mode or on a
>> cluster? If the latter, there should be at least map task log output
>> someplace that may also have some clues.
>>
>> Does path
>> '/logs/jive/internal/raw/2012/05/07/2012050795652.0627-720078349.seq'
>> contain SequenceFile<Text, Text> data? If not, you'll have to configure
>> SequenceFileLoader further to properly deserialize the key-value pairs.
>>
>> Andy
>>
>>
>> On Thu, May 17, 2012 at 5:07 PM, Chris Diehl <[email protected]> wrote:
>>
>> > Andy,
>> >
>> > Here's what I'm seeing when I run the following script. There's no
>> > information beyond what is here in the log file.
>> >
>> > Chris
>> >
>> > REGISTER '/opt/shared_storage/elephant-bird/build/elephant-bird-2.2.3-SNAPSHOT.jar';
>> > %declare SEQFILE_LOADER 'com.twitter.elephantbird.pig.load.SequenceFileLoader';
>> > %declare TEXT_CONVERTER 'com.twitter.elephantbird.pig.util.TextConverter';
>> > %declare NULL_CONVERTER 'com.twitter.elephantbird.pig.util.NullWritableConverter'
>> >
>> > rmf /data/SearchLogJSON;
>> >
>> > -- Load raw log data
>> > raw_logs = LOAD '/logs/jive/internal/raw/2012/05/07/2012050795652.0627-720078349.seq'
>> >     USING $SEQFILE_LOADER ();
>> >
>> > -- Store the JSON
>> > STORE raw_logs INTO '/data/SearchLogJSON/';
>> >
>> > -------------------
>> >
>> > -sh-3.2$ pig dump_log_json.pig
>> > 2012-05-17 23:57:41,304 [main] INFO  org.apache.pig.Main - Logging error messages to: /opt/shared_storage/log_analysis_pig_python_scripts/pig_1337299061301.log
>> > 2012-05-17 23:57:41,586 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: XXX
>> > 2012-05-17 23:57:41,932 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: XXX
>> > 2012-05-17 23:57:42,204 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
>> > 2012-05-17 23:57:42,204 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - pig.usenewlogicalplan is set to true. New logical plan will be used.
>> > 2012-05-17 23:57:42,301 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - (Name: raw_logs: Store(/data/SearchLogJSON:org.apache.pig.builtin.PigStorage) - scope-1 Operator Key: scope-1)
>> > 2012-05-17 23:57:42,317 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
>> > 2012-05-17 23:57:42,349 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
>> > 2012-05-17 23:57:42,349 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
>> > 2012-05-17 23:57:42,529 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
>> > 2012-05-17 23:57:42,545 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
>> > 2012-05-17 23:57:44,706 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
>> > 2012-05-17 23:57:44,734 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
>> > 2012-05-17 23:57:45,053 [Thread-4] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
>> > 2012-05-17 23:57:45,057 [Thread-4] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
>> > 2012-05-17 23:57:45,236 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
>> > 2012-05-17 23:57:45,849 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_201205170527_0003
>> > 2012-05-17 23:57:45,849 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: XXX
>> > 2012-05-17 23:58:25,816 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_201205170527_0003 has failed! Stop running all dependent jobs
>> > 2012-05-17 23:58:25,821 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
>> > 2012-05-17 23:58:25,824 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
>> > 2012-05-17 23:58:25,825 [main] INFO  org.apache.pig.tools.pigstats.PigStats - Script Statistics:
>> >
>> > HadoopVersion    PigVersion      UserId       StartedAt            FinishedAt           Features
>> > 0.20.2-cdh3u2    0.8.1-cdh3u2    chris.diehl  2012-05-17 23:57:42  2012-05-17 23:58:25  UNKNOWN
>> >
>> > Failed!
>> >
>> > Failed Jobs:
>> > JobId                  Alias     Feature   Message                          Outputs
>> > job_201205170527_0003  raw_logs  MAP_ONLY  Message: Job failed! Error - NA  /data/SearchLogJSON,
>> >
>> > Input(s):
>> > Failed to read data from "/logs/jive/internal/raw/2012/05/07/2012050795652.0627-720078349.seq"
>> >
>> > Output(s):
>> > Failed to produce result in "/data/SearchLogJSON"
>> >
>> > Counters:
>> > Total records written : 0
>> > Total bytes written : 0
>> > Spillable Memory Manager spill count : 0
>> > Total bags proactively spilled: 0
>> > Total records proactively spilled: 0
>> >
>> > Job DAG:
>> > job_201205170527_0003
>> >
>> >
>> > 2012-05-17 23:58:25,825 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
>> > 2012-05-17 23:58:25,831 [main] ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2244: Job failed, hadoop does not return any error message
>> > Details at logfile: /opt/shared_storage/log_analysis_pig_python_scripts/pig_1337299061301.log
>> >
>> >
>> >
>> > On Thu, May 17, 2012 at 1:20 PM, Andy Schlaikjer <
>> > [email protected]> wrote:
>> >
>> > > Chris, could you send us any of your error logs? What kind of failures
>> > > are you running into?
>> > >
>> > > Andy
>> > >
>> > >
>> > > On Wed, May 16, 2012 at 11:47 AM, Chris Diehl <[email protected]>
>> wrote:
>> > >
>> > > > Hi All,
>> > > >
>> > > > I'm attempting to load sequence files for the first time using
>> > > > Elephant Bird's sequence file loader and having absolutely no luck.
>> > > >
>> > > > I did a hadoop fs -text on one of the sequence files and noticed all
>> > > > the keys are (null). Not sure if that is throwing things off here.
>> > > >
>> > > > Here are various approaches I've tried that all have failed.
>> > > >
>> > > > REGISTER '/opt/shared_storage/elephant-bird/build/elephant-bird-2.2.3-SNAPSHOT.jar';
>> > > > %declare SEQFILE_LOADER 'com.twitter.elephantbird.pig.load.SequenceFileLoader';
>> > > > %declare TEXT_CONVERTER 'com.twitter.elephantbird.pig.util.TextConverter';
>> > > > %declare NULL_CONVERTER 'com.twitter.elephantbird.pig.util.NullWritableConverter'
>> > > >
>> > > > raw_logs = LOAD '/logs/jive/internal/raw/2012/05/07/2012050795652.0627-720078349.seq'
>> > > >     USING $SEQFILE_LOADER ('-c $NULL_CONVERTER','-c $TEXT_CONVERTER')
>> > > >     AS (key: bytearray, value: chararray);
>> > > > --raw_logs = LOAD '/logs/jive/internal/raw/2012/05/07/2012050795652.0627-720078349.seq'
>> > > > --    USING $SEQFILE_LOADER ('-c $TEXT_CONVERTER','-c $TEXT_CONVERTER')
>> > > > --    AS (key: chararray, value: chararray);
>> > > > --raw_logs = LOAD '/logs/jive/internal/raw/2012/05/07/2012050795652.0627-720078349.seq'
>> > > > --    USING $SEQFILE_LOADER ();
>> > > >
>> > > > STORE raw_logs INTO '/data/SearchLogJSON/';
>> > > >
>> > > > Any thoughts on what might be the problem? Anything else I should
>> > > > try? I'm totally out of ideas.
>> > > >
>> > > > Appreciate any pointers!
>> > > >
>> > > > Chris
>> > > >
>> > >
>> >
>>
>
>
