"The same script does not work in the mapreduce mode. "
What does it mean?
2013/11/6 Sameer Tilak
> Hello,
>
> My script in the local mode works perfectly. The same script does not work
> in the mapreduce mode. For the local mode, the output is saved in the current
> directory, whereas for the ma
Hello,
My script in the local mode works perfectly. The same script does not work in
the mapreduce mode. For the local mode, the output is saved in the current
directory, whereas for the mapreduce mode I use the /scratch directory on HDFS.
Local mode:
A = LOAD 'file.seq' USING SequenceFileLoader AS
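A complete LOAD along these lines might look like the sketch below. The field names and types are assumptions (not from the original script), and SequenceFileLoader here is taken to be the piggybank loader:

```pig
-- Hypothetical sketch: loading a sequence file with the piggybank loader.
-- The schema (key, value) is an assumption, not from the original script.
REGISTER /path/to/piggybank.jar;
DEFINE SequenceFileLoader org.apache.pig.piggybank.storage.SequenceFileLoader();

A = LOAD 'file.seq' USING SequenceFileLoader AS (key:chararray, value:chararray);
```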
I have some python code I'd like to deploy with a pig script. The .py
code takes input from sys.stdin and outputs to sys.stdout. It also
needs some parameter files to run properly.
The book "Programming Pig" tells me:
"The workaround for this is to create a TAR file and ship that, and
then have a
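A sketch of that workaround, with hypothetical file names: tar the .py script together with its parameter files, SHIP the tarball plus a small wrapper script, and have the wrapper untar the archive before invoking the Python code:

```pig
-- Hypothetical sketch of the "ship a TAR" workaround.
-- run.sh and script.tar.gz are made-up names; run.sh would untar
-- script.tar.gz and then exec the .py script reading stdin/stdout.
DEFINE mystream `run.sh` SHIP('run.sh', 'script.tar.gz');

A = LOAD 'input' AS (line:chararray);
B = STREAM A THROUGH mystream AS (result:chararray);
STORE B INTO 'output';
```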
Hi guys,
For some reason I cannot set up any version higher than Pig 0.10 with
Hadoop 1.2.1 and Cassandra 1.2.10. For example, using Pig 0.12, when I
try a very simple dump I get this error from the JobTracker log:
2013-11-05 17:44:12,000 INFO org.apache.hadoop.mapred.TaskInProgress:
Error from at
I see... do you have to do a full cross product or are you able to do a
join?
On Tue, Nov 5, 2013 at 11:07 AM, burakkk wrote:
> There are several small lookup files, and I need to process each lookup
> file individually. From your example it could work this way:
>
> a = LOAD 'small1'; --for ex
There are several small lookup files, and I need to process each lookup file
individually. From your example it could work this way:
a = LOAD 'small1'; --for example taking source_id=1 --> then find
source_name
d = LOAD 'small2'; --for example taking campaign_id=2 --> then find
campaign_name
e =
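If each lookup file fits in memory, a fragment-replicated join can attach the lookup values without a CROSS. This is only a sketch with hypothetical relation and field names:

```pig
-- Hypothetical sketch: attach source_name from a small lookup file to a
-- large relation with a fragment-replicated join. The small (replicated)
-- relation must be listed last in the JOIN.
big    = LOAD 'facts'  AS (source_id:int, campaign_id:int, metric:long);
small1 = LOAD 'small1' AS (source_id:int, source_name:chararray);

joined = JOIN big BY source_id, small1 BY source_id USING 'replicated';
```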
Hi Pig experts,
Sorry to post so many questions; I have one more about doing some analytics
on a bag of tuples.
My input has the following format:
{(id1,x,y,z), (id2, a, b, c), (id3,x,a)} /* User 1 info */
{(id10,x,y,z), (id9, a, b, c), (id1,x,a)} /* User 2 info */
{(id8,x,y,z), (id4, a, b,
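One common pattern for bag-per-line input like this is to LOAD each line as a bag and FLATTEN it into individual tuples. The schema below is an assumption (it presumes every tuple has the same arity, which the sample rows may not):

```pig
-- Hypothetical sketch: flatten a bag-per-line input into one tuple per row.
-- The tuple schema (id, f1, f2, f3) is assumed, not from the original post.
users = LOAD 'input' AS (info:bag{t:tuple(id:chararray, f1:chararray,
                                          f2:chararray, f3:chararray)});
flat  = FOREACH users GENERATE FLATTEN(info);
ids   = FOREACH flat GENERATE id;
```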
Yes, the input files are on HDFS.
> Date: Tue, 5 Nov 2013 09:37:08 -0800
> Subject: Re: Local vs mapreduce mode
> From: pradeep...@gmail.com
> To: user@pig.apache.org
>
> Really dumb question but... when running in MapReduce mode, is your input
> file on HDFS?
>
>
> On Tue, Nov 5, 2013 at 9:17
CROSS is grossly expensive to compute, so I'm not surprised that the
performance isn't good enough. Are you repeating your LOAD and FILTER ops for
every one of your small files? At the end of the day, what is it that
you're trying to accomplish? Find the 1 row you're after and attach it to all
rows in yo
Really dumb question but... when running in MapReduce mode, is your input
file on HDFS?
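One quick way to check is from the grunt shell itself, which passes `fs` commands through to HDFS (the path here is a placeholder):

```pig
-- From grunt (or inside a Pig script): confirm the input exists on HDFS.
fs -ls /path/to/input;
```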
On Tue, Nov 5, 2013 at 9:17 AM, Sameer Tilak wrote:
>
> Dear Pig experts,
>
> I have the following Pig script that works perfectly in local mode.
> However, in the mapreduce mode I get AU as:
>
> $HADOOP_CO
Dear Pig experts,
I have the following Pig script that works perfectly in local mode. However, in
the mapreduce mode I get AU as:
$HADOOP_CONF_DIR fs -cat /scratch/AU/part-m-0
Warning: $HADOOP_HOME is deprecated.
{}
{}
{}
{}
In both the local mode and the mapreduce mode, relation A is set c
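When a relation comes back as empty bags only in mapreduce mode, a common first step is to compare what Pig infers in each mode before blaming the mode itself. A minimal debugging sketch, reusing the relation name A from the script:

```pig
-- Debugging sketch: run these in both local and mapreduce mode and compare.
DESCRIBE A;   -- print the schema Pig inferred for A
DUMP A;       -- show the records A actually contains in this mode
```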
Hi Pradeep,
Yes, I implemented the outputSchema method and it fixed that issue.
We are also planning to evaluate to store intermediate and final results in
Cassandra.
> Date: Mon, 4 Nov 2013 17:08:56 -0800
> Subject: Re: Java UDF and incompatible schema
> From: pradeep...@gmail.com
> To: user@
Hi,
I'm using Pig 0.8.1-cdh3u5. Is there any way to use the distributed cache
inside Pig?
My problem is this: I have lots of small files in HDFS, say 10 files.
Each file contains more than one row, but I need only one row from each. There
isn't any relationship between the files. So I filter them
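Pig Latin does not expose the distributed cache directly, but a fragment-replicated join ships its small side through the distributed cache under the hood, which may cover this case. A sketch with hypothetical names:

```pig
-- Hypothetical sketch: rather than filtering each small file separately,
-- filter the one row you need and attach it via a replicated join, which
-- distributes the small relation through the distributed cache.
small  = LOAD 'small1' AS (key:int, value:chararray);
wanted = FILTER small BY key == 1;       -- keep only the row you need
big    = LOAD 'bigdata' AS (key:int, payload:chararray);
joined = JOIN big BY key, wanted BY key USING 'replicated';
```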