Issue solved. I configured eclipse with additional env variables and it
solved the error :)
Thanks.
On Mon, Mar 24, 2014 at 2:12 PM, Keren Ouaknine ker...@gmail.com wrote:
Hello,
I encounter an HDFS error running Pig from eclipse. The error doesn't
occur when I run Pig from the command
Hi All, I am reading an HBase table as follows:
A = LOAD 'APE1_RATED_EVENT' USING
org.apache.pig.backend.hadoop.hbase.HBaseStorage('', '-loadKey true') AS
(id:bytearray);
B = GROUP A BY id;
X = FOREACH B GENERATE COUNT_STAR(A);
DUMP X;
The job failed, and I found the following error in the hadoop task
I hit https://issues.apache.org/jira/browse/PIG-3512
On 24/03/2014 14:40, Vincent Barat wrote:
Hi,
Since I moved from Pig 0.10.0 to 0.11.0 or 0.12.0, the estimation
of the number of reducers no longer works.
My script:
A = load 'data';
B = group A by $0;
store B into 'out';
My data:
Sadly I was not able to attend the last Bay Area user meetup at LinkedIn that
was held on March 14. I'm very interested to see some of the presentations, so
I'm wondering if there are plans to publish the recordings?
Jarcec
I am trying to perform the following action, but the only solution I have
been able to come up with is using a CROSS, but I don't want to use that
statement as it is a very expensive process.
(1,2,3,4,5) (10,11)
(1,2,4,5,7) (10,11)
(1,5,7,8,9) (10,11)
I want to make
I don't understand what you're trying to do from your example.
If you perform a cross on the data you have, the output will be the
following:
(1,2,3,4,5,10,11)
(1,2,3,4,5,10,11)
(1,2,3,4,5,10,11)
(1,2,4,5,7,10,11)
(1,2,4,5,7,10,11)
(1,2,4,5,7,10,11)
(1,5,7,8,9,10,11)
(1,5,7,8,9,10,11)
The output I would like to see is
(1,2,3,4,5,10,11)
(1,2,4,5,7,10,12)
(1,5,7,8,9,10,13)
On Tue, Mar 25, 2014 at 3:58 PM, Pradeep Gollakota pradeep...@gmail.com wrote:
I don't understand what you're trying to do from your example.
If you perform a cross on the data you have, the output will
Try this: http://pig.apache.org/docs/r0.11.0/basic.html#rank
Rank each data set then join on the rank.
On Tue, Mar 25, 2014 at 4:03 PM, Christopher Surage csur...@gmail.com wrote:
The output I would like to see is
(1,2,3,4,5,10,11)
(1,2,4,5,7,10,12)
(1,5,7,8,9,10,13)
On Tue, Mar 25, 2014
yes
On Tue, Mar 25, 2014 at 4:07 PM, Shahab Yunus shahab.yu...@gmail.com wrote:
Oh, sorry. This new example is something different from what I understood
before. I thought you were only trying to append one relation (with one
tuple) to another (which has more than one tuple).
So essentially
@ pradeep, I know what the cross product will do, but I have many lines in
many files. So the cross will take far too long to complete.
On Tue, Mar 25, 2014 at 3:58 PM, Pradeep Gollakota pradeep...@gmail.com wrote:
I don't understand what you're trying to do from your example.
If you perform
John's answer about RANK sounds like it should solve your problem
On Mar 25, 2014, at 1:13 PM, Christopher Surage csur...@gmail.com wrote:
@ pradeep, I know what the cross product will do, but I have many lines in
many files. So the cross will take far too long to complete.
On Tue, Mar
Here is how to use rank and join for this problem:
sh cat xxx
1,2,3,4,5
1,2,4,5,7
1,5,7,8,9
sh cat yyy
10,11
10,12
10,13
a= load 'xxx' using PigStorage(',');
b= load 'yyy' using PigStorage(',');
a2 = rank a;
b2 = rank b;
c = join a2 by $0, b2 by $0;
c2 = order c by $6;
c3 = foreach c2
CROSS is by definition a very very expensive operation. Regardless, CROSS
is the wrong operator for what you're trying to do.
As was suggested by others, you want to RANK the relations then do a JOIN
by the rank.
On Tue, Mar 25, 2014 at 1:27 PM, william.dowl...@thomsonreuters.com wrote:
Here
I don't think my version of Pig supports the RANK function; I keep getting
Internal Error. I would update it, but I am not in control of the cluster.
On Tue, Mar 25, 2014 at 4:16 PM, Andrew Musselman
andrew.mussel...@gmail.com wrote:
John's answer about RANK sounds like it should solve your
In that situation you could write a script that tacks on the equivalent value
that rank does, and stream the ordered relations through it.
I'm assuming you have a sense of order on both these relations.
After that join like you would after rank.
I'm not at a computer so can't type up an
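For illustration only, here is a rough Python sketch of what such a script might do: it tacks a sequential row number onto each line (the value RANK would have supplied) and then pairs lines from the two inputs that share a number. The function names add_row_numbers and join_on_row_number are made up for this example and are not part of Pig or of anyone's posted script. It assumes each relation arrives as a single, already ordered stream; if the data is split across several tasks, the numbers would restart in each one and the join would be wrong.

```python
def add_row_numbers(lines, start=1, sep=","):
    """Yield each non-blank line prefixed with a sequential row number."""
    n = start
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines so they do not consume a number
        yield f"{n}{sep}{line}"
        n += 1


def join_on_row_number(left, right, sep=","):
    """Pair numbered lines from two inputs that share the same row number."""
    right_by_key = {}
    for line in right:
        key, _, rest = line.partition(sep)
        right_by_key[key] = rest
    for line in left:
        key, _, rest = line.partition(sep)
        if key in right_by_key:
            # Drop the helper row number and concatenate the two records.
            yield f"{rest}{sep}{right_by_key[key]}"


# Sample data taken from the 'xxx' and 'yyy' files shown in this thread.
xxx = ["1,2,3,4,5\n", "1,2,4,5,7\n", "1,5,7,8,9\n"]
yyy = ["10,11\n", "10,12\n", "10,13\n"]

joined = list(join_on_row_number(add_row_numbers(xxx), add_row_numbers(yyy)))
for row in joined:
    print(row)
```

To use the numbering step as a streaming script, the same add_row_numbers function could be fed sys.stdin and its results printed to standard output; the join would then be done in Pig on the first field, as with RANK.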
Hello,
There is a similar UDF in DataFu named Enumerate.
http://datafu.incubator.apache.org/docs/datafu/1.2.0/datafu/pig/bags/Enumerate.html
I hope it helps.
James
Unfortunately, the Enumerate UDF from DataFu would not work in this case.
The UDF works on Bags and in this case, we want to enumerate a relation.
Implementing RANK is a very tricky thing to do correctly. I'm not even sure
if it's doable just by using Pig operators, UDFs or macros. Best option is