Show the code of your func. How did you define the input and output schema?

On 07.11.2013 at 2:09, "Sameer Tilak" <ssti...@live.com> wrote:
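Serega's advice in the thread below (add a print statement to the UDF and see what it accepts and what it produces) is easiest to follow if the filtering logic is pulled out into a plain static method that can be exercised off-cluster, without Pig or MapReduce. The thread never shows what parser.customFilter actually does, so the sketch below is purely hypothetical: it assumes the filter keeps a key only when the key contains the given pattern.

```java
// Hypothetical core of a customFilter-style UDF, extracted so it can be
// tested without Pig or a Hadoop cluster. The real filter logic is not
// shown in the thread; "keep the key if it contains the pattern" is an
// assumption made for illustration only.
public class CustomFilterCore {

    // Returns the key when it matches, or null when it does not.
    // Inside an EvalFunc, returning null here would surface as an
    // empty result for that record.
    public static String filterKey(String key, String pattern) {
        if (key == null || pattern == null) {
            return null;
        }
        return key.contains(pattern) ? key : null;
    }

    public static void main(String[] args) {
        // Probe the logic directly, mirroring the "print what it accepts
        // and what it produces" debugging suggestion from the thread.
        System.out.println(filterKey("xxAAAAAyy", "AAAAA")); // matching key
        System.out.println(filterKey("xxBBByy", "AAAAA"));   // non-matching key
    }
}
```

Once the core logic is verified this way, the EvalFunc wrapper only has to convert tuples to and from this method, which narrows the mapreduce-mode mystery down to the wrapper and the schema.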
> Dear Serega,
>
> I am now using log4j for debugging my UDF. Here is what I found out. For
> some reason, in mapreduce mode my exec function does not get called. The
> log message in the constructor gets printed onto the console; however, the
> log message in the exec function does not get printed to the console. In
> local mode all the log messages can be seen on the console.
>
> public Tuple exec(Tuple input) throws IOException {
>     log.info("Into the exec function");
>     // rest of the code
> }
>
> public CustomFilter() {
>     log.info("Hello World!");
> }
>
> > From: serega.shey...@gmail.com
> > Date: Wed, 6 Nov 2013 21:19:10 +0400
> > Subject: Re: More on issue with local vs mapreduce mode
> > To: user@pig.apache.org
> >
> > You get 4 empty tuples.
> > Maybe your UDF parser.customFilter(key,'AAAAA') works differently? Maybe
> > you use an old version?
> > You can add a print statement to the UDF and see what it accepts and
> > what it produces.
> >
> > 2013/11/6 Sameer Tilak <ssti...@live.com>
> >
> > > Dear Serega,
> > >
> > > When I run the script in local mode, I get the correct o/p stored in the
> > > AU/part-m-000 file. However, when I run it in mapreduce mode (with i/p
> > > and o/p from HDFS), the file /scratch/AU/part-m-000 is of size 4 and
> > > there is nothing in it.
> > >
> > > I am not sure whether the AU relation somehow does not get realized
> > > correctly or whether the problem happens during the storing stage.
> > >
> > > Here are some of the console messages:
> > >
> > > HadoopVersion  PigVersion  UserId   StartedAt            FinishedAt           Features
> > > 1.0.3          0.11.1      p529444  2013-11-06 07:14:15  2013-11-06 07:14:40  UNKNOWN
> > >
> > > Success!
> > >
> > > Job Stats (time in seconds):
> > > JobId  Maps  Reduces  MaxMapTime  MinMapTime  AvgMapTime  MedianMapTime  MaxReduceTime  MinReduceTime  AvgReduceTime  MedianReduceTime  Alias  Feature  Outputs
> > > job_201311011343_0042  1  0  6  6  6  6  0  0  0  0  A,AU  MULTI_QUERY,MAP_ONLY  /scratch/A,/scratch/AU,
> > >
> > > Input(s):
> > > Successfully read 4 records (311082 bytes) from: "/scratch/file.seq"
> > >
> > > Output(s):
> > > Successfully stored 4 records (231 bytes) in: "/scratch/A"
> > > Successfully stored 4 records (3 bytes) in: "/scratch/AU"
> > >
> > > Counters:
> > > Total records written : 8
> > > Total bytes written : 234
> > > Spillable Memory Manager spill count : 0
> > > Total bags proactively spilled: 0
> > > Total records proactively spilled: 0
> > >
> > > Job DAG:
> > > job_201311011343_0042
> > >
> > > 2013-11-06 07:14:40,352 [main] INFO
> > > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
> > >
> > > Here is the o/p of the following command:
> > >
> > > hadoop --config $HADOOP_CONF_DIR fs -ls /scratch/AU/part-m-00000
> > >
> > > -rw-r--r-- 1 username groupname 4 2013-11-06 07:14 /scratch/AU/part-m-00000
> > >
> > > I am not sure whether DUMP will give me the correct result, but when I
> > > replaced STORE with DUMP AU in mapreduce mode, I get AU as:
> > > ()
> > > ()
> > > ()
> > > ()
> > >
> > > > From: serega.shey...@gmail.com
> > > > Date: Wed, 6 Nov 2013 11:19:03 +0400
> > > > Subject: Re: More on issue with local vs mapreduce mode
> > > > To: user@pig.apache.org
> > > >
> > > > "The same script does not work in the mapreduce mode."
> > > > What does it mean?
> > > >
> > > > 2013/11/6 Sameer Tilak <ssti...@live.com>
> > > >
> > > > > Hello,
> > > > >
> > > > > My script works perfectly in local mode. The same script does not
> > > > > work in mapreduce mode. For local mode, the o/p is saved in the
> > > > > current directory, whereas for mapreduce mode I use the /scratch
> > > > > directory on HDFS.
> > > > >
> > > > > Local mode:
> > > > >
> > > > > A = LOAD 'file.seq' USING SequenceFileLoader AS (key: chararray, value: chararray);
> > > > > DESCRIBE A;
> > > > > STORE A into 'A';
> > > > >
> > > > > AU = FOREACH A GENERATE FLATTEN(parser.customFilter(key,'AAAAA'));
> > > > > STORE AU into 'AU';
> > > > >
> > > > > Mapreduce mode:
> > > > >
> > > > > A = LOAD '/scratch/file.seq' USING SequenceFileLoader AS (key: chararray, value: chararray);
> > > > > DESCRIBE A;
> > > > > STORE A into '/scratch/A';
> > > > >
> > > > > AU = FOREACH A GENERATE FLATTEN(parser.customFilter(key,'AAAAA'));
> > > > > STORE AU into '/scratch/AU';
> > > > >
> > > > > Can someone please point me to tools that I can use to debug the
> > > > > script in mapreduce mode? Also, any thoughts on why this might be
> > > > > happening would be great!
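One detail in the thread is worth noting: the constructor's log line reaching the console in mapreduce mode does not prove exec runs on the client, because Pig also instantiates UDFs on the front end while building the plan; exec runs inside map tasks, whose log4j output goes to the task logs on the cluster rather than the client console. Moreover, the 4-byte part file together with the DUMP output of four empty tuples is consistent with exec running but producing empty results: with the default PigStorage serialization, an empty tuple is written as an empty, newline-terminated line, so four empty records come to exactly four bytes. A minimal sketch of that arithmetic, assuming one '\n' record separator per record as PigStorage writes by default:

```java
// Sketch: four empty tuples stored via a PigStorage-style writer.
// Each empty tuple serializes to an empty line terminated by '\n',
// so the part file contains only the record separators.
public class EmptyTupleSize {

    // Serialize n empty records as newline-terminated empty lines.
    public static String serializeEmptyRecords(int n) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < n; i++) {
            out.append('\n'); // empty tuple -> empty line
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String part = serializeEmptyRecords(4);
        // Matches the observed /scratch/AU/part-m-00000 size of 4 bytes.
        System.out.println("bytes: " + part.getBytes().length); // prints "bytes: 4"
    }
}
```

If this reading is right, the place to look is not whether exec is called but why parser.customFilter returns empty/null output on the cluster, e.g. a stale UDF jar or a schema mismatch, which is exactly what the questions at the top of the thread are probing.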