Re: HDFS issue

2014-03-25 Thread Keren Ouaknine
Issue solved. I configured Eclipse with additional environment variables and that resolved the error :) Thanks. On Mon, Mar 24, 2014 at 2:12 PM, Keren Ouaknine ker...@gmail.com wrote: Hello, I encounter an HDFS error when running Pig from Eclipse. The error doesn't occur when I run Pig from the command

pig-0.12.0+PIG-3285: Encounter NoClassDefFoundError: org.cloudera.htrace.Trace during reading hbase table in pig grunt

2014-03-25 Thread lulynn_2008
Hi All, I am reading an HBase table as follows: A = LOAD 'APE1_RATED_EVENT' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('', '-loadKey true') AS (id:bytearray); B = GROUP A BY id; X = FOREACH B GENERATE COUNT_STAR(A); DUMP X; The job failed, and I found the following error in hadoop task

Re: Could not estimate number of reducers

2014-03-25 Thread Vincent Barat
I hit https://issues.apache.org/jira/browse/PIG-3512 On 24/03/2014 14:40, Vincent Barat wrote: Hi, Since I moved from Pig 0.10.0 to 0.11.0 or 0.12.0, the estimation of the number of reducers no longer works. My script: A = load 'data'; B = group A by $0; store B into 'out'; My data:

Recordings from Pig user meetup at Linkedin, Mar 14

2014-03-25 Thread Jarek Jarcec Cecho
Sadly I was not able to attend the last bay area user meetup at Linkedin that was held on March 14. I'm very interested to see some of the presentations, so I'm wondering if there are plans to publish the recordings? Jarcec

Any way to join two aliases without using CROSS

2014-03-25 Thread Christopher Surage
I am trying to perform the following action, but the only solution I have been able to come up with is using a CROSS, but I don't want to use that statement as it is a very expensive process. (1,2,3,4,5) (10,11) (1,2,4,5,7) (10,11) (1,5,7,8,9) (10,11) I want to make

Re: Any way to join two aliases without using CROSS

2014-03-25 Thread Pradeep Gollakota
I don't understand what you're trying to do from your example. If you perform a cross on the data you have, the output will be the following: (1,2,3,4,5,10,11) (1,2,3,4,5,10,11) (1,2,3,4,5,10,11) (1,2,4,5,7,10,11) (1,2,4,5,7,10,11) (1,2,4,5,7,10,11) (1,5,7,8,9,10,11) (1,5,7,8,9,10,11)

Re: Any way to join two aliases without using CROSS

2014-03-25 Thread Christopher Surage
The output I would like to see is (1,2,3,4,5,10,11) (1,2,4,5,7,10,12) (1,5,7,8,9,10,13) On Tue, Mar 25, 2014 at 3:58 PM, Pradeep Gollakota pradeep...@gmail.com wrote: I don't understand what you're trying to do from your example. If you perform a cross on the data you have, the output will

Re: Any way to join two aliases without using CROSS

2014-03-25 Thread John Meagher
Try this: http://pig.apache.org/docs/r0.11.0/basic.html#rank Rank each data set then join on the rank. On Tue, Mar 25, 2014 at 4:03 PM, Christopher Surage csur...@gmail.com wrote: The output I would like to see is (1,2,3,4,5,10,11) (1,2,4,5,7,10,12) (1,5,7,8,9,10,13) On Tue, Mar 25, 2014

Re: Any way to join two aliases without using CROSS

2014-03-25 Thread Christopher Surage
yes On Tue, Mar 25, 2014 at 4:07 PM, Shahab Yunus shahab.yu...@gmail.com wrote: Oh, sorry. This new example is something different from what I understood before. I thought you were only trying to append one relation (with one tuple) to another (which has more than one tuple). So essentially

Re: Any way to join two aliases without using CROSS

2014-03-25 Thread Christopher Surage
@Pradeep, I know what the cross product will do, but I have many lines in many files, so the cross will take far too long to complete. On Tue, Mar 25, 2014 at 3:58 PM, Pradeep Gollakota pradeep...@gmail.com wrote: I don't understand what you're trying to do from your example. If you perform

Re: Any way to join two aliases without using CROSS

2014-03-25 Thread Andrew Musselman
John's answer about RANK sounds like it should solve your problem On Mar 25, 2014, at 1:13 PM, Christopher Surage csur...@gmail.com wrote: @ pradeep, I know what the cross product will do, but I have many lines in many files. So the cross will take far too long to complete. On Tue, Mar

RE: Any way to join two aliases without using CROSS

2014-03-25 Thread william.dowling
Here is how to use rank and join for this problem: sh cat xxx 1,2,3,4,5 1,2,4,5,7 1,5,7,8,9 sh cat yyy 10,11 10,12 10,13 a = load 'xxx' using PigStorage(','); b = load 'yyy' using PigStorage(','); a2 = rank a; b2 = rank b; c = join a2 by $0, b2 by $0; c2 = order c by $6; c3 = foreach c2

Re: Any way to join two aliases without using CROSS

2014-03-25 Thread Pradeep Gollakota
CROSS is by definition a very very expensive operation. Regardless, CROSS is the wrong operator for what you're trying to do. As was suggested by others, you want to RANK the relations then do a JOIN by the rank. On Tue, Mar 25, 2014 at 1:27 PM, william.dowl...@thomsonreuters.com wrote: Here
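For readers following the thread, the difference between the two approaches can be sketched outside Pig. The Python below is only an illustration of the semantics, not Pig code: the helper names `rank` and `join_on_rank` are hypothetical, and it assumes that RANK with no sort field simply prepends a sequential, 1-based number to each tuple in the order the rows are read. The sample rows are the ones from the `xxx` and `yyy` files quoted earlier in the thread.

```python
from itertools import product

def rank(rows):
    """Prepend a 1-based position to each row (assumed RANK-like behaviour)."""
    return [(i, *row) for i, row in enumerate(rows, start=1)]

def join_on_rank(left, right):
    """Join two ranked relations on the rank column, then drop that column."""
    right_by_rank = {r[0]: r[1:] for r in right}
    return [l[1:] + right_by_rank[l[0]] for l in left if l[0] in right_by_rank]

# Sample data taken from the thread (files 'xxx' and 'yyy').
xxx = [(1, 2, 3, 4, 5), (1, 2, 4, 5, 7), (1, 5, 7, 8, 9)]
yyy = [(10, 11), (10, 12), (10, 13)]

# Rank-then-join pairs rows by position: one output row per input row.
joined = join_on_rank(rank(xxx), rank(yyy))

# A cross product pairs every row with every row: len(xxx) * len(yyy) rows.
crossed = [l + r for l, r in product(xxx, yyy)]

for row in joined:
    print(row)
print(len(joined), "rows from rank+join vs", len(crossed), "rows from cross")
```

The rank-and-join result grows linearly with the input, while the cross product grows with the product of the two relation sizes, which is why CROSS becomes impractical on many lines in many files.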

Re: Any way to join two aliases without using CROSS

2014-03-25 Thread Christopher Surage
I don't think my version of Pig supports the RANK function; I keep getting an Internal Error. I would update it, but I am not in control of the cluster. On Tue, Mar 25, 2014 at 4:16 PM, Andrew Musselman andrew.mussel...@gmail.com wrote: John's answer about RANK sounds like it should solve your

Re: Any way to join two aliases without using CROSS

2014-03-25 Thread Andrew Musselman
In that situation you could write a script that tacks on the equivalent value that rank does, and stream the ordered relations through it. I'm assuming you have a sense of order on both these relations. After that join like you would after rank. I'm not at a computer so can't type up an
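A minimal sketch of what such a numbering script might look like, written in Python. This is not the script from the original message, which is cut off above. The function name `number_lines` is hypothetical, and the sketch assumes rows arrive one per line with tab-separated fields. It also assumes every row of the ordered relation passes through a single instance of the script; if the stream is split across several tasks, each instance would restart its counter and the numbers would not be usable as a join key.

```python
def number_lines(lines, start=1):
    """Yield each input line with a sequential number prepended as a new
    tab-separated first field, mimicking the value RANK would add."""
    for n, line in enumerate(lines, start=start):
        body = line.rstrip("\n")  # drop the trailing newline only
        yield f"{n}\t{body}"

# Demonstration on in-memory rows (tab-separated, as assumed above).
sample = ["1\t2\t3\t4\t5\n", "1\t2\t4\t5\t7\n", "1\t5\t7\t8\t9\n"]
numbered = list(number_lines(sample))
for out in numbered:
    print(out)
```

To use it as a streaming script, one would feed `sys.stdin` to `number_lines` and print each result, run both ordered relations through it, and then join on the first field as one would after RANK.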

Reply: Re: Any way to join two aliases without using CROSS

2014-03-25 Thread James
Hello, There is a similar UDF in DataFu named Enumerate. http://datafu.incubator.apache.org/docs/datafu/1.2.0/datafu/pig/bags/Enumerate.html I hope it helps. James

Re: Reply: Re: Any way to join two aliases without using CROSS

2014-03-25 Thread Pradeep Gollakota
Unfortunately, the Enumerate UDF from DataFu would not work in this case. The UDF works on Bags and in this case, we want to enumerate a relation. Implementing RANK is a very tricky thing to do correctly. I'm not even sure if it's doable just by using Pig operators, UDFs or macros. Best option is