Yes, that is indeed better.
On Fri, Jun 29, 2012 at 06:39:58PM -0700, Jonathan Coveney wrote:
> Ideally, you should use the TOP function. It will be more efficient, as it
> is algebraic.
>
> 2012/6/29 Kris Coward
>
> >
> > LIMIT and ORDER BY are both allowed nested ops for a FOREACH statement.
http://pig.apache.org/docs/r0.10.0/func.html#topx
On Jun 29, 2012, at 5:19 PM, Benjamin Juhn wrote:
> Hi there,
>
> I'm trying to write a group by statement, only returning the top 100 records
> from each group. Does pig support this?
>
> Thanks,
> Ben
Ideally, you should use the TOP function. It will be more efficient, as it
is algebraic.
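To see why an algebraic TOP can be cheaper than a nested ORDER/LIMIT, here is a minimal Python sketch. It models the idea only and is not Pig's implementation. Each map-side chunk keeps just its own top n, the way a combiner would, and the final step merges those small partial lists. Less data then has to reach the reducer, and the result matches sorting everything and taking the first n. The chunking and the function names are hypothetical.

```python
import heapq
from itertools import chain
from operator import itemgetter

def partial_top(records, n, key):
    """Combiner-style step: keep only the n largest records of one chunk."""
    return heapq.nlargest(n, records, key=key)

def merge_top(partials, n, key):
    """Final step: the global top n is contained in the union of the partial top-n lists."""
    return heapq.nlargest(n, chain.from_iterable(partials), key=key)

def top_n_algebraic(chunks, n, key):
    # Each chunk is reduced independently, so at most n records per chunk are shipped.
    return merge_top([partial_top(c, n, key) for c in chunks], n, key)

def top_n_sort_limit(chunks, n, key):
    # Model of ORDER ... DESC followed by LIMIT: every record has to reach one place.
    everything = list(chain.from_iterable(chunks))
    return sorted(everything, key=key, reverse=True)[:n]

if __name__ == "__main__":
    chunks = [[("a", 5), ("b", 9), ("c", 1)], [("d", 7), ("e", 3)], [("f", 8)]]
    print(top_n_algebraic(chunks, 2, itemgetter(1)))   # [('b', 9), ('f', 8)]
    print(top_n_sort_limit(chunks, 2, itemgetter(1)))  # [('b', 9), ('f', 8)]
```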
2012/6/29 Kris Coward
LIMIT and ORDER BY are both allowed nested ops for a FOREACH statement.
These should be able to do what you want.
e.g.
B = GROUP A BY key;
C = FOREACH B {
    X = ORDER A BY orderingParam;
    Y = LIMIT X 100;
    GENERATE group, Y;
}
-Kris
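The nested ORDER/LIMIT pattern above can be modeled in plain Python to make the per-group semantics concrete. This is only an illustration. `key` and `orderingParam` come from the example, the sample rows are made up, and the sort is ascending like the Pig snippet. If "top" means largest, pass `reverse=True` to `sorted`, which corresponds to `ORDER ... DESC` in Pig.

```python
from collections import defaultdict

def top_per_group(rows, group_field, order_field, limit):
    """Model of GROUP A BY key followed by a nested ORDER ... / LIMIT ... per group."""
    groups = defaultdict(list)
    for row in rows:                                   # B = GROUP A BY key
        groups[row[group_field]].append(row)
    result = []
    for group, bag in groups.items():                  # C = FOREACH B { ... }
        ordered = sorted(bag, key=lambda r: r[order_field])  # X = ORDER A BY orderingParam
        result.append((group, ordered[:limit]))        # Y = LIMIT X n; GENERATE group, Y
    return result

rows = [
    {"key": "x", "orderingParam": 3},
    {"key": "x", "orderingParam": 1},
    {"key": "y", "orderingParam": 2},
    {"key": "x", "orderingParam": 2},
]
for group, bag in top_per_group(rows, "key", "orderingParam", 2):
    print(group, [r["orderingParam"] for r in bag])   # x [1, 2] then y [2]
```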
On Fri, Jun 29, 2012 at 04:19:18PM -0700, Benjamin Juh
You would want to do a FOREACH after the GROUP BY where you limit the contents
of each group. Usually you would also want to order the bag before you limit
it, so that you are taking the top 100 of something, rather than just a random
selection of 100. Here's an example that creates a list of th
Hey Ben,
You can do a nested ORDER => LIMIT inside a FOREACH
http://pig.apache.org/docs/r0.10.0/basic.html#foreach
Newer versions of Pig also have a TOP function that will replace the ORDER =>
LIMIT.
-Sal
On Jun 29, 2012, at 4:19 PM, Benjamin Juhn wrote:
Hi there,
I'm trying to write a group by statement, only returning the top 100 records
from each group. Does pig support this?
Thanks,
Ben
Thank you Bill, that should do the work :-)
On 29/06/12 18:16, Bill Graham wrote:
There's support for this in the trunk FYI:
https://issues.apache.org/jira/browse/PIG-2600
On Fri, Jun 29, 2012 at 5:05 AM, Vincent Barat wrote:
Hi !
I've
The answer here is to use a more powerful storage format, which is a good
habit to get into with complicated structures.
2012/6/28 Yang
> If I have a field that contains ",", and I add it to a tuple,
>
> then the tuple becomes part of a row in a relation.
>
> I STORE the relation using PigSto
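A comma inside a field causes trouble because a nested tuple is written out as text with "," between its fields. When the data is read back, the reader can no longer tell where one field ends. The Python sketch below models this with a naive serializer and parser. These are hypothetical helpers, not PigStorage's code. JSON appears here only as one example of "a more powerful storage format" that quotes its strings so embedded commas survive.

```python
import json

def naive_dump(fields):
    # Write a tuple the way a simple delimited text format might: "(f1,f2,...)".
    return "(" + ",".join(fields) + ")"

def naive_load(text):
    # Read it back by stripping the parentheses and splitting on the delimiter.
    return text[1:-1].split(",")

row = ["Smith, John", "42"]

print(naive_load(naive_dump(row)))   # ['Smith', ' John', '42']  (three fields instead of two)

# A self-describing format quotes string values, so the embedded comma survives.
print(json.loads(json.dumps(row)))   # ['Smith, John', '42']
```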
Does anyone have any insight into the problem here? Am I doing something wrong?
From: Matthew Hayes [mha...@linkedin.com]
Sent: Monday, June 11, 2012 10:15 AM
To: user@pig.apache.org
Subject: RE: NoClassDefFoundError after upgrading to pig 0.10.0 from 0.9.0
perfect, thanks
On Fri, Jun 29, 2012 at 10:11 AM, Jie Li wrote:
> Pig does have a "-c" to check the syntax:
>
> pig -x local -c -f x.pig
>
> Jie
>
> On Fri, Jun 29, 2012 at 5:02 AM, Ruslan Al-Fakikh
> wrote:
> > Hey Yang,
> >
> > For debugging you may want the local mode, try
> > pig -x local
>
I would run a perf test, but compared to the many other costs, I think it
will be minimal (unless it's a really massive bag). Pig should probably
allow for more graceful initialization in cases like this, but in my
experience I haven't noticed any serious degradation from this sort of
thing.
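A related thread proposes passing the bag to the UDF and initializing it on the first invocation. The idea is to build the expensive state once and reuse it, so each later call only pays for receiving the argument. Below is a small Python sketch of that pattern. It is not Pig's UDF API: `LookupUdf` and its `reference` argument are hypothetical, and a real Pig UDF would normally be a Java `EvalFunc` doing the same check inside `exec`.

```python
class LookupUdf:
    """Hypothetical UDF-like object that builds its lookup table lazily, on first use."""

    def __init__(self):
        self._lookup = None    # not built yet
        self.init_count = 0    # how many times the expensive setup actually ran

    def _init(self, reference):
        # Expensive one-time setup: turn the reference bag into a set for fast lookups.
        self._lookup = set(reference)
        self.init_count += 1

    def __call__(self, value, reference):
        # The reference bag arrives on every call, but it is only processed the first time.
        if self._lookup is None:
            self._init(reference)
        return value in self._lookup

udf = LookupUdf()
ref = ["a", "b", "c"]
print([udf(v, ref) for v in ["a", "z", "c"]])  # [True, False, True]
print(udf.init_count)                           # 1
```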
2012/
On a different topic, I'm interested in why you refuse to use a project in the
incubator. Incubation is the Apache process by which a community is built around
the code. It says nothing about the maturity of the code.
Alan.
On Jun 28, 2012, at 10:59 AM, Ruslan Al-Fakikh wrote:
> Hi Markus,
>
Pig does have a "-c" to check the syntax:
pig -x local -c -f x.pig
Jie
On Fri, Jun 29, 2012 at 5:02 AM, Ruslan Al-Fakikh
wrote:
> Hey Yang,
>
> For debugging you may want the local mode, try
> pig -x local
>
> Also, there are some useful commands, like DESCRIBE and ILLUSTRATE
>
> Ruslan
>
> On Fri,
Does putting the parameters in a file using -param_file help?
Alan.
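Roughly, a parameter file holds `name = value` lines, and those values replace `$name` references in the script. This is why one file can be shared by several modules. The Python sketch below models that substitution so the idea is concrete. It is a simplification: Pig's own preprocessor handles more cases than this. The parameter names and the path are made up.

```python
import re

def load_params(text):
    """Parse 'name = value' lines, skipping blank lines and '#' comments (simplified)."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.partition("=")
        params[name.strip()] = value.strip()
    return params

def substitute(script, params):
    """Replace $name with its value; unknown names are left untouched."""
    return re.sub(r"\$(\w+)", lambda m: params.get(m.group(1), m.group(0)), script)

param_file = """
# shared by every module
input_dir = /data/events
day = 2012-06-27
"""
module = "A = LOAD '$input_dir/$day' USING PigStorage();"
print(substitute(module, load_params(param_file)))
# A = LOAD '/data/events/2012-06-27' USING PigStorage();
```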
On Jun 27, 2012, at 9:02 AM, Markus Resch wrote:
> Hey everyone,
>
> we're still using CDH3u3 pig (0.8.1).
> As our pig scripts are growing, we'd like to split them into modules and call
> them via run. The parameter substitution
There's support for this in the trunk FYI:
https://issues.apache.org/jira/browse/PIG-2600
On Fri, Jun 29, 2012 at 5:05 AM, Vincent Barat wrote:
Hi !
I have a bag of maps...
([k3#v13,k1#v11,k2#v12])
([k1#v12,k2#v22])
([k4#v31])
... and would like to extract all key names:
(k3)
(k1)
(k2)
(k1)
(k2)
(k4)
I cannot figure out how to do this (except by writing a UDF).
Any idea ?
Thanks a lot
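The transformation being asked for amounts to flattening each map's key set into one single-field tuple per key. Below is a Python model of the expected output. It shows the logic a UDF would implement and is not Pig code; `map_keys` is a hypothetical name. Python dicts keep insertion order, so the output matches the listing above. Pig maps may not preserve key order.

```python
def map_keys(bag_of_maps):
    """Emit one single-field tuple per key, in the order the keys appear in each map."""
    return [(key,) for (m,) in bag_of_maps for key in m]

bag = [
    ({"k3": "v13", "k1": "v11", "k2": "v12"},),
    ({"k1": "v12", "k2": "v22"},),
    ({"k4": "v31"},),
]
for t in map_keys(bag):
    print(t)   # ('k3',) ('k1',) ('k2',) ('k1',) ('k2',) ('k4',), one per line
```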
Hey Yang,
For debugging you may want the local mode, try
pig -x local
Also, there are some useful commands, like DESCRIBE and ILLUSTRATE
Ruslan
On Fri, Jun 29, 2012 at 7:38 AM, Jonathan Coveney wrote:
> Do you have an example?
>
> 2012/6/28 Yang
>
>> thanks
>>
>>
>> it was simply "blahblah field
Very useful links, Thejas! Especially Talend Open Studio for Big Data.
Are there any benchmarks you might have tried, so that users can easily
evaluate Talend against other tools?
I edited the subject and removed old thread, so that the older thread is
not hijacked :-D
Thanks,
Hello All,
The link to the roadmap for Pig is
https://cwiki.apache.org/PIG/pig-journal.html
Is there an updated pig-journal for version 0.10.0 of Pig? Is there any
roadmap of the features that are planned?
Thanks, Subir
> -----Original Message-----
> From: Dexin Wang [mailto:wangde...@gmail.com]
> Sent: Wednesday, June 27, 2012 11:00 PM
> To: user@pig.apache.org
> Subject: Re: Passing a BAG to Pig UDF constructor?
>
> That's a good idea (to pass the bag to UDF and initialize it on first
> UDF invocation). Thank