Re: More on issue with local vs mapreduce mode

2013-11-05 Thread Serega Sheypak
"The same script does not work in the mapreduce mode." What do you mean by that? 2013/11/6 Sameer Tilak > Hello, > > My script works perfectly in local mode. The same script does not work > in mapreduce mode. In local mode, the output is saved in the current > directory, whereas for the ma

More on issue with local vs mapreduce mode

2013-11-05 Thread Sameer Tilak
Hello, My script works perfectly in local mode. The same script does not work in mapreduce mode. In local mode the output is saved in the current directory, whereas in mapreduce mode I use the /scratch directory on HDFS. Local mode: A = LOAD 'file.seq' USING SequenceFileLoader AS
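A hedged reconstruction of the kind of script described above (the field names and the piggybank loader path are assumptions, not taken from the post):

```pig
-- Hypothetical sketch; field names and jar path are assumptions.
REGISTER /path/to/piggybank.jar;
DEFINE SequenceFileLoader org.apache.pig.piggybank.storage.SequenceFileLoader();

A = LOAD 'file.seq' USING SequenceFileLoader AS (key:chararray, value:chararray);
-- pig -x local resolves 'file.seq' on the local filesystem and writes output
-- relative to the current directory; pig -x mapreduce resolves both paths on
-- HDFS, so 'file.seq' must exist there as well.
STORE A INTO '/scratch/AU';
```

The most common cause of a script behaving differently between the two modes is exactly this path resolution: the same relative path points at two different filesystems.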

Need example of python code with dependency files

2013-11-05 Thread Ryan Compton
I have some Python code I'd like to deploy with a Pig script. The .py code takes input from sys.stdin and writes output to sys.stdout. It also needs some parameter files to run properly. The book "Programming Pig" tells me: "The workaround for this is to create a TAR file and ship that, and then have a
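A hedged sketch of a streaming script of the shape described: it reads records from stdin, looks each one up in a shipped parameter file, and writes results to stdout. All specifics here (the file name passed as argv[1], the tab-separated parameter format) are assumptions for illustration; Pig's DEFINE ... SHIP(...) clause is the usual way to ship the script and its tarball of dependencies to the tasks.

```python
# Hedged sketch of a streaming UDF; names and formats are assumptions.
import sys

def load_params(path):
    """Parse 'key<TAB>value' lines from a parameter file into a dict."""
    params = {}
    with open(path) as f:
        for line in f:
            key, _, value = line.rstrip("\n").partition("\t")
            params[key] = value
    return params

def process(lines, params):
    """Annotate each input line with its looked-up value (or 'UNKNOWN')."""
    for line in lines:
        key = line.rstrip("\n")
        yield "%s\t%s" % (key, params.get(key, "UNKNOWN"))

if __name__ == "__main__" and len(sys.argv) > 1:
    # Pig would invoke this as e.g. `myscript.py params.txt` after the shipped
    # dependency tarball has been unpacked into the task's working directory.
    lookup = load_params(sys.argv[1])
    for out in process(sys.stdin, lookup):
        print(out)
```

Keeping the I/O loop separate from the lookup logic makes the script easy to test locally by piping a sample file through it before deploying it with the Pig job.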

Pig > 0.10 always throws invalid stream header

2013-11-05 Thread Claudio Romo Otto
Hi guys, for some reason I cannot set up any version higher than Pig 0.10 with Hadoop 1.2.1 and Cassandra 1.2.10. For example, with Pig 0.12, when I try a very simple dump I get this error in the JobTracker log: 2013-11-05 17:44:12,000 INFO org.apache.hadoop.mapred.TaskInProgress: Error from at

Re: Pig Distributed Cache

2013-11-05 Thread Pradeep Gollakota
I see... do you have to do a full cross product or are you able to do a join? On Tue, Nov 5, 2013 at 11:07 AM, burakkk wrote: > There are some small different lookup files so that I need to process each > single lookup files. From your example it can be that way: > > a = LOAD 'small1'; --for ex

Re: Pig Distributed Cache

2013-11-05 Thread burakkk
There are several different small lookup files, and I need to process each lookup file individually. Following your example, it could work this way: a = LOAD 'small1'; --for example taking source_id=1 --> then find source_name d = LOAD 'small2'; --for example taking campaign_id=2 --> then find campaign_name e =

Bag of tuples

2013-11-05 Thread Sameer Tilak
Hi Pig experts, Sorry to post so many questions; I have one more about doing analytics on a bag of tuples. My input has the following format: {(id1,x,y,z), (id2, a, b, c), (id3,x,a)} /* User 1 info */ {(id10,x,y,z), (id9, a, b, c), (id1,x,a)} /* User 2 info */ {(id8,x,y,z), (id4, a, b,
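One common approach to input like this is to FLATTEN each bag into one row per inner tuple and then apply ordinary row-wise analytics. A hedged sketch follows; the field names and types are assumptions, and since the sample tuples have varying arity a looser schema (or none) may be needed in practice:

```pig
-- Hedged sketch; field names/types are assumptions.
users = LOAD 'users.txt'
        AS (info: bag{t: tuple(id:chararray, a:chararray, b:chararray, c:chararray)});
-- FLATTEN turns each bag into one output row per inner tuple, after which
-- ordinary relational operators (GROUP, COUNT, ...) apply.
rows    = FOREACH users GENERATE FLATTEN(info);
grouped = GROUP rows BY id;
counts  = FOREACH grouped GENERATE group AS id, COUNT(rows) AS occurrences;
```

For example, id1 appears in both User 1's and User 2's bags above, so after flattening and grouping its count would be 2.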

RE: Local vs mapreduce mode

2013-11-05 Thread Sameer Tilak
Yes, the input files are on HDFS. > Date: Tue, 5 Nov 2013 09:37:08 -0800 > Subject: Re: Local vs mapreduce mode > From: pradeep...@gmail.com > To: user@pig.apache.org > > Really dumb question but... when running in MapReduce mode, is your input > file on HDFS? > > > On Tue, Nov 5, 2013 at 9:17

Re: Pig Distributed Cache

2013-11-05 Thread Pradeep Gollakota
CROSS is grossly expensive to compute, so I'm not surprised that the performance isn't good enough. Are you repeating your LOAD and FILTER ops for every one of your small files? At the end of the day, what is it that you're trying to accomplish? Find the 1 row you're after and attach it to all rows in yo

Re: Local vs mapreduce mode

2013-11-05 Thread Pradeep Gollakota
Really dumb question but... when running in MapReduce mode, is your input file on HDFS? On Tue, Nov 5, 2013 at 9:17 AM, Sameer Tilak wrote: > > Dear Pig experts, > > I have the following Pig script that works perfectly in local mode. > However, in the mapreduce mode I get AU as : > > $HADOOP_CO

Local vs mapreduce mode

2013-11-05 Thread Sameer Tilak
Dear Pig experts, I have the following Pig script that works perfectly in local mode. However, in mapreduce mode I get AU as: $HADOOP_CONF_DIR fs -cat /scratch/AU/part-m-0 Warning: $HADOOP_HOME is deprecated. {} {} {} {} Both the local mode and the mapreduce mode relation A is set c

RE: Java UDF and incompatible schema

2013-11-05 Thread Sameer Tilak
Hi Pradeep, Yes, I implemented the outputSchema method and it fixed that issue. We are also planning to evaluate storing intermediate and final results in Cassandra. > Date: Mon, 4 Nov 2013 17:08:56 -0800 > Subject: Re: Java UDF and incompatible schema > From: pradeep...@gmail.com > To: user@
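For readers hitting the same "incompatible schema" problem: overriding outputSchema in a Java EvalFunc tells Pig what the UDF returns, so downstream operators no longer see an unknown schema. A hedged sketch follows; the class and field names are assumptions (not from the thread), and it needs the Pig jars on the classpath, so it is shown as a sketch rather than a complete UDF:

```java
// Hedged sketch of an outputSchema override; names are assumptions.
import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.DataType;
import org.apache.pig.data.Tuple;
import org.apache.pig.impl.logicalLayer.FrontendException;
import org.apache.pig.impl.logicalLayer.schema.Schema;

public class MyUDF extends EvalFunc<Tuple> {
    @Override
    public Schema outputSchema(Schema input) {
        // Declare the tuple this UDF returns so Pig can type-check the
        // script instead of reporting an incompatible schema.
        Schema tupleSchema = new Schema();
        tupleSchema.add(new Schema.FieldSchema("id", DataType.CHARARRAY));
        tupleSchema.add(new Schema.FieldSchema("score", DataType.DOUBLE));
        try {
            return new Schema(
                new Schema.FieldSchema("result", tupleSchema, DataType.TUPLE));
        } catch (FrontendException e) {
            throw new RuntimeException(e);
        }
    }

    @Override
    public Tuple exec(Tuple input) throws IOException {
        /* ... actual evaluation logic ... */
        return null;
    }
}
```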

Pig Distributed Cache

2013-11-05 Thread burakkk
Hi, I'm using Pig 0.8.1-cdh3u5. Is there any method to use the distributed cache inside Pig? My problem is this: I have lots of small files in HDFS, let's say 10 files. Each file contains more than one row, but I need only one row from each, and the files have no relationship to each other. So I filter them
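As discussed later in this thread, the usual way to get distributed-cache behavior in Pig without writing a UDF is a fragment-replicated join: with USING 'replicated', Pig ships the small right-hand relation(s) to every task via the distributed cache and joins them in memory. A hedged sketch (relation and field names are assumptions borrowed from the thread's examples):

```pig
-- Hedged sketch; relation and field names are assumptions.
big    = LOAD 'big_data' AS (source_id:int, campaign_id:int, value:chararray);
small1 = LOAD 'small1'   AS (source_id:int, source_name:chararray);
-- The replicated relation must be listed last and must fit in memory.
joined = JOIN big BY source_id, small1 BY source_id USING 'replicated';
```

This replaces an expensive CROSS over the full data with a map-side join, which is typically far cheaper when the lookup side is small.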