Re: Group by Fetching top 100 from each group

2012-06-29 Thread Kris Coward
Yes, that is indeed better. On Fri, Jun 29, 2012 at 06:39:58PM -0700, Jonathan Coveney wrote: > Ideally, you should use the TOP function. It will be more efficient, as it is algebraic. > 2012/6/29 Kris Coward >> LIMIT and ORDER BY are both allowed nested ops for a FOREACH statement.

Re: Group by Fetching top 100 from each group

2012-06-29 Thread Corbin Hoenes
http://pig.apache.org/docs/r0.10.0/func.html#topx On Jun 29, 2012, at 5:19 PM, Benjamin Juhn wrote: > Hi there, I'm trying to write a group by statement, only returning the top 100 records from each group. Does pig support this? Thanks, Ben

Re: Group by Fetching top 100 from each group

2012-06-29 Thread Jonathan Coveney
Ideally, you should use the TOP function. It will be more efficient, as it is algebraic. 2012/6/29 Kris Coward > LIMIT and ORDER BY are both allowed nested ops for a FOREACH statement. These should be able to do what you want. e.g. B = GROUP A BY key C = FOREACH B { X = ORDER
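Jonathan's point about TOP being algebraic can be illustrated outside Pig. An algebraic function can be computed in stages: each map-side partial (the combiner) keeps only its own top N per key, and the reduce side merges those short lists and takes the top N again, so much less data crosses the shuffle. In Pig the call would look something like C = FOREACH B GENERATE group, TOP(100, 1, A); assuming the ordering field is the second column (index 1) of A. The plain-Python sketch below only models the staging idea; the function names and the split into two "chunks" are hypothetical, and this is not Pig's implementation of TOP.

```python
import heapq
from collections import defaultdict

def partial_top(records, n, key_field, score_field):
    """Combiner-like step: keep at most n highest-scoring records per key in one chunk."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key_field]].append(rec)
    return {k: heapq.nlargest(n, recs, key=lambda r: r[score_field])
            for k, recs in groups.items()}

def merge_top(partials, n, score_field):
    """Reducer-like step: merge the per-chunk lists and re-take the top n per key."""
    merged = defaultdict(list)
    for part in partials:
        for k, recs in part.items():
            merged[k].extend(recs)
    return {k: heapq.nlargest(n, recs, key=lambda r: r[score_field])
            for k, recs in merged.items()}

# Two hypothetical "map-side" chunks of (key, score) records.
chunk1 = [("a", 5), ("a", 9), ("b", 1), ("a", 2)]
chunk2 = [("a", 7), ("b", 8), ("b", 3)]

n = 2
partials = [partial_top(c, n, 0, 1) for c in (chunk1, chunk2)]
staged = merge_top(partials, n, 1)

# Computing the top n over all records in one pass gives the same answer.
direct = partial_top(chunk1 + chunk2, n, 0, 1)
print(staged)  # {'a': [('a', 9), ('a', 7)], 'b': [('b', 8), ('b', 3)]}
assert staged == direct  # staging does not change the result
```

The design point is that each partial result is bounded by n per key, no matter how large the group is, which is what lets Pig push the work into the combiner.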

Re: Group by Fetching top 100 from each group

2012-06-29 Thread Kris Coward
LIMIT and ORDER BY are both allowed nested ops for a FOREACH statement. These should be able to do what you want, e.g.:
    B = GROUP A BY key;
    C = FOREACH B {
        X = ORDER A BY orderingParam;
        Y = LIMIT X 100;
        GENERATE group, Y;
    }
-Kris On Fri, Jun 29, 2012 at 04:19:18PM -0700, Benjamin Juh
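What Kris's nested block computes can be modeled in plain Python: group the records by key, sort each group's bag, and keep the first N. The record layout and the field names `key` and `score` are hypothetical, and the limit is 2 instead of 100 so the output stays readable. The sketch sorts in descending order; the Pig snippet orders ascending, so if larger values should count as "top" you would write ORDER A BY orderingParam DESC. As Austin points out in this thread, skipping the sort means LIMIT just returns an arbitrary subset of each bag.

```python
from collections import defaultdict

def top_n_per_group(records, key, order_by, n, descending=True):
    """Model of GROUP ... BY key followed by a nested ORDER + LIMIT inside FOREACH."""
    groups = defaultdict(list)            # B = GROUP A BY key;
    for rec in records:
        groups[rec[key]].append(rec)
    result = {}
    for group, bag in groups.items():     # C = FOREACH B { ... }
        # X = ORDER A BY order_by (DESC when descending=True)
        ordered = sorted(bag, key=lambda r: r[order_by], reverse=descending)
        result[group] = ordered[:n]       # Y = LIMIT X n; GENERATE group, Y;
    return result

A = [
    {"key": "x", "score": 3},
    {"key": "y", "score": 10},
    {"key": "x", "score": 8},
    {"key": "x", "score": 5},
    {"key": "y", "score": 1},
]

for group, top in top_n_per_group(A, "key", "score", n=2).items():
    print(group, [r["score"] for r in top])
# x [8, 5]
# y [10, 1]
```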

RE: Group by Fetching top 100 from each group

2012-06-29 Thread Austin Stickney
You would want to do a FOREACH after the GROUP BY where you limit the contents of each group. Usually you would also want to order the bag before you limit it, so that you are taking the top 100 of something, rather than just a random selection of 100. Here's an example that creates a list of th

Re: Group by Fetching top 100 from each group

2012-06-29 Thread Sal Uryasev
Hey Ben, You can do a nested ORDER => LIMIT inside a FOREACH http://pig.apache.org/docs/r0.10.0/basic.html#foreach Newer versions of Pig also have a TOP function that will replace the ORDER => LIMIT. -Sal On Jun 29, 2012, at 4:19 PM, Benjamin Juhn wrote: Hi there, I'm trying to write a group

Group by Fetching top 100 from each group

2012-06-29 Thread Benjamin Juhn
Hi there, I'm trying to write a group by statement, only returning the top 100 records from each group. Does pig support this? Thanks, Ben

Re: How to extract the keys of a map ?

2012-06-29 Thread Vincent Barat
Thank you Bill, that should do the job :-) On 29/06/12 18:16, Bill Graham wrote: There's support for this in the trunk FYI: https://issues.apache.org/jira/browse/PIG-2600 On Fri, Jun 29, 2012 at 5:05 AM, Vincent Barat wrote: Hi! I've

Re: PigStorage problem for nested structures?

2012-06-29 Thread Jonathan Coveney
The answer here is to use a more powerful storage format, which is a good habit to get into with complicated structures. 2012/6/28 Yang > if I have a field that contains ",", and I add it to a tuple, then the tuple becomes part of a row in a relation. I STORE the relation using PigSto
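The problem Yang describes comes from a text format that uses "," both as the separator between tuple fields and as ordinary data. The plain-Python illustration below is not PigStorage's actual code: the hypothetical helpers write a tuple as "(a,b)" with no escaping and split on commas when reading it back, so a value containing a comma yields the wrong number of fields. A format that quotes or escapes values (JSON here, chosen only as an example of a "more powerful" format) round-trips the tuple intact.

```python
import json

def naive_dump(fields):
    # Join tuple fields with ',' inside parentheses, without any escaping.
    return "(" + ",".join(fields) + ")"

def naive_load(text):
    # Strip the parentheses and split on ',' again.
    return text[1:-1].split(",")

row = ("Smith, John", "42")

naive = naive_load(naive_dump(row))
print(naive)       # ['Smith', ' John', '42']  (3 fields instead of 2)

structured = tuple(json.loads(json.dumps(row)))
print(structured)  # ('Smith, John', '42')
```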

RE: NoClassDefFoundError after upgrading to pig 0.10.0 from 0.9.0

2012-06-29 Thread Matthew Hayes
Does anyone have any insight into the problem here? Am I doing something wrong? From: Matthew Hayes [mha...@linkedin.com] Sent: Monday, June 11, 2012 10:15 AM To: user@pig.apache.org Subject: RE: NoClassDefFoundError after upgrading to pig 0.10.0 from 0.9.0

Re: suggestion

2012-06-29 Thread Yang
perfect, thanks On Fri, Jun 29, 2012 at 10:11 AM, Jie Li wrote: > Pig does have a "-c" to check the syntax: pig -x local -c -f x.pig > Jie > On Fri, Jun 29, 2012 at 5:02 AM, Ruslan Al-Fakikh wrote: >> Hey Yang, >> For debugging you may want the local mode, try >> pig -x local

Re: Passing a BAG to Pig UDF constructor?

2012-06-29 Thread Jonathan Coveney
I would run a perf test, but compared to the many other costs, I think it will be minimal (unless it's a really massive bag). Pig should probably allow for more graceful initialization in cases like this, but in my experience I haven't noticed any serious degradation from this sort of thing. 2012/

Re: Best Practice: store depending on data content

2012-06-29 Thread Alan Gates
On a different topic, I'm interested in why you refuse to use a project in the incubator. Incubation is the Apache process by which a community is built around the code. It says nothing about the maturity of the code. Alan. On Jun 28, 2012, at 10:59 AM, Ruslan Al-Fakikh wrote: > Hi Markus,

Re: suggestion

2012-06-29 Thread Jie Li
Pig does have a "-c" to check the syntax: pig -x local -c -f x.pig Jie On Fri, Jun 29, 2012 at 5:02 AM, Ruslan Al-Fakikh wrote: > Hey Yang, > For debugging you may want the local mode, try pig -x local > Also there are some useful commands like, DESCRIBE, ILLUSTRATE > Ruslan > On Fri,

Re: modulize pig scripts via 'run'; pass param containing special chars

2012-06-29 Thread Alan Gates
Does putting the parameters in a file using -param_file help? Alan. On Jun 27, 2012, at 9:02 AM, Markus Resch wrote: > Hey everyone, we're still using CDH3u3 pig (0.8.1). As our pig scripts are growing we'd like to split them into modules and call them via run. The parameter substitution

Re: How to extract the keys of a map ?

2012-06-29 Thread Bill Graham
There's support for this in the trunk FYI: https://issues.apache.org/jira/browse/PIG-2600 On Fri, Jun 29, 2012 at 5:05 AM, Vincent Barat wrote: > Hi! I've a bag of map... ([k3#v13,k1#v11,k2#v12]) ([k1#v12,k2#v22]) ([k4#v31]) ... and would like to extract all key names: (k3)

How to extract the keys of a map ?

2012-06-29 Thread Vincent Barat
Hi! I have a bag of maps...
([k3#v13,k1#v11,k2#v12])
([k1#v12,k2#v22])
([k4#v31])
... and would like to extract all key names:
(k3)
(k1)
(k2)
(k1)
(k2)
(k4)
I cannot figure out how to do this (except by writing a UDF). Any ideas? Thanks a lot
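The output Vincent wants amounts to flattening each map's key set: one single-field tuple per key, across all input maps. The plain-Python model below reproduces his sample data and expected output to pin down those semantics; it is not Pig code. In Pig itself this needs either a UDF or the trunk support Bill mentions (PIG-2600), and the order of keys within a Pig map should not be relied on, whereas Python dicts keep insertion order, which is why the output here matches the listing exactly.

```python
# Each input tuple holds one map, mirroring ([k3#v13,k1#v11,k2#v12]) and so on.
relation = [
    ({"k3": "v13", "k1": "v11", "k2": "v12"},),
    ({"k1": "v12", "k2": "v22"},),
    ({"k4": "v31"},),
]

def map_keys(rel):
    """Emit one single-field tuple per map key, analogous to flattening each key set."""
    return [(k,) for (m,) in rel for k in m]

for t in map_keys(relation):
    print(t)
# ('k3',) ('k1',) ('k2',) ('k1',) ('k2',) ('k4',), one per line
```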

Re: suggestion

2012-06-29 Thread Ruslan Al-Fakikh
Hey Yang, For debugging you may want the local mode; try pig -x local. There are also some useful commands, like DESCRIBE and ILLUSTRATE. Ruslan On Fri, Jun 29, 2012 at 7:38 AM, Jonathan Coveney wrote: > Do you have an example? > 2012/6/28 Yang >> thanks >> it was simply "blahblah field

how to start working with hadoop in single Node and cluster environment

2012-06-29 Thread Subir S
Very useful links, Thejas! Especially Talend Open Studio for Big Data. Are there any benchmarks you might have tried, so that users can easily evaluate Talend against other tools? I edited the subject and removed the old thread so that the older thread is not hijacked :-D Thanks,

PigJournal update

2012-06-29 Thread Subir S
Hello All, Link to the Pig road map -> https://cwiki.apache.org/PIG/pig-journal.html Is there an updated Pig Journal based on version 0.10.0 of Pig? Are there any roadmaps for which features are planned? Thanks, Subir

RE: Passing a BAG to Pig UDF constructor?

2012-06-29 Thread Mridul Muralidharan
> -----Original Message----- > From: Dexin Wang [mailto:wangde...@gmail.com] > Sent: Wednesday, June 27, 2012 11:00 PM > To: user@pig.apache.org > Subject: Re: Passing a BAG to Pig UDF constructor? > That's a good idea (to pass the bag to UDF and initialize it on first UDF invocation). Thank
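The approach Dexin describes, passing the bag as an ordinary argument and building the lookup structure only on the first invocation, is a lazy-initialization pattern. The Python sketch below illustrates it in a language-neutral way; the class name `InBagFilter`, its `exec` method, and the `(value, bag)` argument layout are hypothetical stand-ins for a Pig UDF, not Pig's EvalFunc API.

```python
class InBagFilter:
    """Hypothetical UDF-like object: is a value contained in a reference bag?"""

    def __init__(self):
        self._lookup = None   # built lazily on the first call
        self.init_count = 0   # how many times the lookup set was built

    def exec(self, value, bag):
        if self._lookup is None:
            # First invocation: convert the bag (a list of 1-field tuples) to a set once.
            self._lookup = {t[0] for t in bag}
            self.init_count += 1
        # Later invocations ignore the bag argument and reuse the cached set.
        return value in self._lookup

ref_bag = [("apple",), ("pear",)]
udf = InBagFilter()
print([udf.exec(v, ref_bag) for v in ["apple", "plum", "pear"]])  # [True, False, True]
print(udf.init_count)  # 1
```

Note that in this model the bag is still passed on every call; only the set construction happens once, which matches Jonathan's expectation that the extra cost is minor unless the bag is very large.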