[jira] [Commented] (PIG-2353) RANK function like in SQL

2013-02-28 Thread David Ciemiewicz (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13589632#comment-13589632
 ] 

David Ciemiewicz commented on PIG-2353:
---

Did anyone look at the solution I proposed in PIG-821 for partitioned rank 
computation on billions of rows?  There may be better solutions, but I do know 
that one works without serializing all rows; only the histogram needs to be 
serialized.

 RANK function like in SQL
 -

 Key: PIG-2353
 URL: https://issues.apache.org/jira/browse/PIG-2353
 Project: Pig
  Issue Type: New Feature
Reporter: Gianmarco De Francisci Morales
Assignee: Allan Avendaño
  Labels: gsoc2012, mentor
 Fix For: 0.11

 Attachments: PIG-2353-2, PIG-2353-3.txt, PIG-2353-4.txt, 
 PIG-2353-5.txt, PIG2353.patch


 Implement a function that, given a (sorted) bag, adds to each tuple a unique, 
 increasing identifier without gaps, like what RANK does in SQL.
 This is a candidate project for Google Summer of Code 2012. More information 
 about the program can be found at 
 https://cwiki.apache.org/confluence/display/PIG/GSoc2012
 The functionality implemented so far is available at 
 https://reviews.apache.org/r/5523/diff/#index_header

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-2353) RANK function like in SQL

2013-01-07 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13546101#comment-13546101
 ] 

Rohini Palaniswamy commented on PIG-2353:
-

Shouldn't we make this part of the documentation or RELEASE_NOTES.txt, instead 
of just a release note in JIRA?



[jira] [Commented] (PIG-2353) RANK function like in SQL

2013-01-07 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13546114#comment-13546114
 ] 

Olga Natkovich commented on PIG-2353:
-

I believe we agreed that documentation changes are included and reviewed as 
part of the patch. Since that was not done here, we need a separate patch for 
the docs.



[jira] [Commented] (PIG-2353) RANK function like in SQL

2013-01-07 Thread Gianmarco De Francisci Morales (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13546296#comment-13546296
 ] 

Gianmarco De Francisci Morales commented on PIG-2353:
-

Hi, sorry, I guess I misunderstood.
I thought that PIG-2947 was sufficient as documentation and that we just wanted 
to clarify the release notes.

Should I open a separate Jira to include this Jira's release notes in 
RELEASE_NOTES.txt?



[jira] [Commented] (PIG-2353) RANK function like in SQL

2012-12-18 Thread Jonathan Coveney (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13535441#comment-13535441
 ] 

Jonathan Coveney commented on PIG-2353:
---

Did this make rank a reserved keyword? If so, we may need to document this as 
a non-backwards-compatible change, as many scripts use rank as a column name. 
Example:

{code}
A = load 'thing';
B = FOREACH (GROUP A all) GENERATE MIN(A.rank);
{code}

Of all the errors you'd expect, I wasn't expecting this one:


2012-12-18 23:18:36,142 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
1200: line 2, column 38  mismatched input '(' expecting SEMI_COLON
Details at logfile: /var/log/pig/pig_1355872714665.log

I culled this example from a larger script, and it looks like removing rank as 
a column name fixed it. Is this a known issue? I think we can refine the parser 
to work with rank in that position, but I thought it would be worth asking.



[jira] [Commented] (PIG-2353) RANK function like in SQL

2012-12-18 Thread Gianmarco De Francisci Morales (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13535455#comment-13535455
 ] 

Gianmarco De Francisci Morales commented on PIG-2353:
-

Hi Jonathan,
Yes, RANK is now an operator and thus a reserved keyword.
We can add it to the release notes.

The parser is definitely a bit rough and could use some reworking, especially 
in the error messages, so I am all for it. Not sure if it is a known issue. 
Can you use LOAD or FOREACH as column names?



[jira] [Commented] (PIG-2353) RANK function like in SQL

2012-12-18 Thread Jonathan Coveney (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13535506#comment-13535506
 ] 

Jonathan Coveney commented on PIG-2353:
---

You cannot, so this is not without precedent. We should document it and, 
ideally, introduce better error messages around it (in a separate JIRA; the 
errors for other keywords are equally bad).



[jira] [Commented] (PIG-2353) RANK function like in SQL

2012-10-22 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481537#comment-13481537
 ] 

Olga Natkovich commented on PIG-2353:
-

Can you please add a usage example to the release notes section? Thanks!



[jira] [Commented] (PIG-2353) RANK function like in SQL

2012-10-22 Thread Allan Avendaño (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481703#comment-13481703
 ] 

Allan Avendaño commented on PIG-2353:
-

Hi Olga!

Does PIG-2947 apply as release notes? 



[jira] [Commented] (PIG-2353) RANK function like in SQL

2012-10-22 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481817#comment-13481817
 ] 

Olga Natkovich commented on PIG-2353:
-

Yes, I think that's fine - I did not realize it was covered in a separate JIRA, 
thanks!



[jira] [Commented] (PIG-2353) RANK function like in SQL

2012-09-04 Thread Gianmarco De Francisci Morales (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13447590#comment-13447590
 ] 

Gianmarco De Francisci Morales commented on PIG-2353:
-

There is a regression in the latest patch.
It does not work properly in a multi-machine environment.
It seems that the values of the counters are not properly serialized in the 
JobConf.
We need to add a test and fix the bug before committing the patch.



[jira] [Commented] (PIG-2353) RANK function like in SQL

2012-06-22 Thread Allan Avendaño (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13399500#comment-13399500
 ] 

Allan Avendaño commented on PIG-2353:
-

The current implementation is now available for your review at 
https://reviews.apache.org/r/5523/diff/#index_header





[jira] [Commented] (PIG-2353) RANK function like in SQL

2012-05-23 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13281934#comment-13281934
 ] 

Daniel Dai commented on PIG-2353:
-

You mean the global rank is implemented by group all + UDF? Do we have a plan 
for a distributed implementation in this project?





[jira] [Commented] (PIG-2353) RANK function like in SQL

2012-05-23 Thread Gianmarco De Francisci Morales (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13281943#comment-13281943
 ] 

Gianmarco De Francisci Morales commented on PIG-2353:
-

No, sorry, there is a typo in my previous comment.
What I meant is that partitioned rank is only group by + UDF.
The main aim of this project is a distributed implementation of the global 
RANK, which needs to be implemented from scratch.
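
To make "group by + UDF" concrete, here is a minimal Python sketch of a 
partitioned rank. It is only an illustration under stated assumptions: the 
names rank_bag and partitioned_rank are hypothetical, rows are plain dicts, and 
nothing here is taken from the attached patches.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]


def rank_bag(bag: List[Row], sort_key: str) -> List[Row]:
    """Hypothetical stand-in for the UDF: sort one group's bag, number rows from 1."""
    ordered = sorted(bag, key=lambda row: row[sort_key])
    return [{**row, "rank": i} for i, row in enumerate(ordered, start=1)]


def partitioned_rank(rows: Iterable[Row], partition_key: str, sort_key: str) -> List[Row]:
    """Group rows by partition_key (the "group by"), then rank each group on its own."""
    groups: Dict[Any, List[Row]] = defaultdict(list)
    for row in rows:
        groups[row[partition_key]].append(row)
    ranked: List[Row] = []
    for bag in groups.values():
        ranked.extend(rank_bag(bag, sort_key))
    return ranked
```

In this sketch rank_bag holds one whole group in memory at a time. That is fine 
when partitions are small, but it does nothing for a single global group, which 
is the case the comment says needs a separate implementation.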





[jira] [Commented] (PIG-2353) RANK function like in SQL

2012-05-23 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13281954#comment-13281954
 ] 

Daniel Dai commented on PIG-2353:
-

So partitioned and non-partitioned RANK use different implementations, 
right?





[jira] [Commented] (PIG-2353) RANK function like in SQL

2012-05-23 Thread Gianmarco De Francisci Morales (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13281959#comment-13281959
 ] 

Gianmarco De Francisci Morales commented on PIG-2353:
-

Yes, partitioned rank can be simply group by + UDF.
Global rank should follow the implementation blueprint that I outlined in this 
Jira, or something similar to make it fully scalable.
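
The blueprint itself is not part of this excerpt, so the following Python 
sketch is only one plausible shape for a scalable global rank, not a 
reconstruction of it. It assumes the data is already globally sorted and split 
into ordered partitions. Each partition reports its row count, an exclusive 
prefix sum turns the counts into starting offsets, and each partition then 
numbers its own rows independently. All function names are hypothetical.

```python
from typing import Any, List, Sequence, Tuple


def partition_offsets(counts: Sequence[int]) -> List[int]:
    """Exclusive prefix sum: how many rows precede each partition."""
    offsets: List[int] = []
    running = 0
    for count in counts:
        offsets.append(running)
        running += count
    return offsets


def global_rank(partitions: Sequence[Sequence[Any]]) -> List[List[Tuple[int, Any]]]:
    """Assumes partitions are globally ordered; returns (rank, row) pairs per partition."""
    # Pass 1: every partition only has to report a single number.
    counts = [len(part) for part in partitions]
    offsets = partition_offsets(counts)
    # Pass 2: each partition ranks its rows locally, shifted by its offset.
    return [
        [(offset + i, row) for i, row in enumerate(part, start=1)]
        for part, offset in zip(partitions, offsets)
    ]
```

The only data that has to be gathered in one place is the list of counts, one 
integer per partition, so no single task needs to see every row.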





[jira] [Commented] (PIG-2353) RANK function like in SQL

2012-05-22 Thread Allan Avendaño (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13280877#comment-13280877
 ] 

Allan Avendaño commented on PIG-2353:
-

Hi to everybody,

I am working on this functionality for GSoC 2012, with Gianmarco as my mentor. 
I have been working on the syntax, and the parser now recognizes the following 
form, recommended by Gianmarco:

RANK relation ( BY column (ASC|DESC)? )?

I was also looking for other functionality that could be incorporated. In 
SQL Server, Oracle and PostgreSQL [1][2][3], it is also possible to specify 
a partition (ranking within a specific group) in the same rank operation. 
Gianmarco has already pointed out to me that this could hurt performance. 
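
For reference, here is a small Python sketch of what a SQL-style partitioned 
rank computes, in the spirit of RANK() OVER (PARTITION BY ... ORDER BY ...). 
The function name rank_over and the dict-based rows are assumptions made for 
illustration. Note that tied rows share a rank and leave a gap after them, 
which is the standard SQL RANK behaviour.

```python
from itertools import groupby
from operator import itemgetter
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]


def rank_over(rows: Iterable[Row], partition_by: str, order_by: str) -> List[Row]:
    """Rank rows within each partition; ties share a rank and leave gaps."""
    ordered = sorted(rows, key=itemgetter(partition_by, order_by))
    result: List[Row] = []
    for _, group in groupby(ordered, key=itemgetter(partition_by)):
        previous = None
        rank = 0
        for position, row in enumerate(group, start=1):
            # A new rank starts only when the ordering value changes.
            if position == 1 or row[order_by] != previous:
                rank = position
            previous = row[order_by]
            result.append({**row, "rank": rank})
    return result
```

The sort on (partition, order) columns is where the cost sits in this sketch, 
which may be related to the performance concern mentioned above.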


Looking forward to your feedback and suggestions.

References:

[1] http://msdn.microsoft.com/en-us/library/ms176102.aspx
[2] http://www.techonthenet.com/oracle/functions/rank.php
[3] http://www.postgresql.org/docs/9.1/static/tutorial-window.html





[jira] [Commented] (PIG-2353) RANK function like in SQL

2012-05-22 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13281166#comment-13281166
 ] 

Daniel Dai commented on PIG-2353:
-

We can use secondary sort to implement partitioned rank. However, I think 
partitioned rank and non-partitioned rank may have to adopt a totally different 
implementation. We can focus on non-partitioned rank first.
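
As a rough illustration of the secondary-sort idea, the Python sketch below 
assumes records reach the reducer already sorted by (partition key, sort 
value), which is what a secondary sort would provide. Under that assumption a 
partitioned rank needs only a counter that resets whenever the partition key 
changes. The function name is hypothetical.

```python
from typing import Any, Iterable, Iterator, Tuple


def rank_sorted_stream(
    records: Iterable[Tuple[Any, Any]]
) -> Iterator[Tuple[Any, Any, int]]:
    """Assumes records are pre-sorted by (key, value); yields (key, value, rank)."""
    current_key = object()  # sentinel that never equals a real key
    rank = 0
    for key, value in records:
        if key != current_key:
            # A new partition starts: reset the counter.
            current_key = key
            rank = 0
        rank += 1
        yield key, value, rank
```

Unlike collecting each group into a bag first, this keeps only one counter in 
memory, however large a partition is.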





[jira] [Commented] (PIG-2353) RANK function like in SQL

2012-03-25 Thread Apurv Verma (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237987#comment-13237987
 ] 

Apurv Verma commented on PIG-2353:
--

Hello,
I am an undergraduate student from India and I would be interested in working 
on this as a GSoC project. I have beginner-level knowledge of writing 
map-reduce tasks, so I would need help with that. I have understood the 
algorithm which Gianmarco has outlined in the comments.





[jira] [Commented] (PIG-2353) RANK function like in SQL

2012-03-14 Thread Gianmarco De Francisci Morales (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229114#comment-13229114
 ] 

Gianmarco De Francisci Morales commented on PIG-2353:
-

Thanks Daniel, I am excited this Jira is going to be a candidate for GSoC, I 
was going to propose it myself!





[jira] [Commented] (PIG-2353) RANK function like in SQL

2012-01-27 Thread David Ciemiewicz (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194966#comment-13194966
 ] 

David Ciemiewicz commented on PIG-2353:
---

There is a much more efficient way to compute RANK, DENSE_RANK, CUMULATIVE_SUM 
and more if you have billions of rows of data, especially if the data follows a 
power-law/Zipf distribution (as queries do).  It uses Map-Reduce to compute a 
histogram of the frequencies/counts, and then serializes and sorts only the 
histogram, which is something like 20,000 rows for 1B queries.

https://issues.apache.org/jira/browse/PIG-821
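
The general idea can be sketched in a few lines of Python. This is not the code 
from PIG-821, only an illustration under assumptions: a Counter stands in for 
the histogram that a Map-Reduce job would compute, and histogram_ranks is a 
hypothetical name. Only the distinct values are sorted, and one pass over that 
small table yields RANK, DENSE_RANK and a cumulative count for each value.

```python
from collections import Counter
from typing import Any, Dict, Iterable


def histogram_ranks(values: Iterable[Any]) -> Dict[Any, Dict[str, int]]:
    """Derive rank, dense rank and cumulative count from a value histogram."""
    histogram = Counter(values)  # stand-in for the distributed counting step
    table: Dict[Any, Dict[str, int]] = {}
    rows_before = 0
    # Sorting touches only the distinct values, not every input row.
    for dense, value in enumerate(sorted(histogram), start=1):
        count = histogram[value]
        table[value] = {
            "rank": rows_before + 1,         # SQL RANK: ties share it, gaps follow
            "dense_rank": dense,             # no gaps between distinct values
            "cumulative_count": rows_before + count,
        }
        rows_before += count
    return table
```

Each input row can then pick up its rank by looking up its value in the table, 
so the full data set never has to pass through a single sorted stream.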





[jira] [Commented] (PIG-2353) RANK function like in SQL

2011-12-21 Thread Gianmarco De Francisci Morales (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13173987#comment-13173987
 ] 

Gianmarco De Francisci Morales commented on PIG-2353:
-

Actually, I was thinking that RANK would only do the counting and appending.
This way you could get a sort + rank with:
{code}
B = RANK ( ORDER A BY column ASC);
{code}

But you could also load your dataset from a file and rank it directly, without 
any specific order:
{code}
A = LOAD 'path/to/file';
B = RANK A;
C = ORDER B BY column;
{code}

This, for example, gives you the permutation that was used to sort the dataset, 
which might be useful.
Also, RANK would let you create a data column that reflects the ordering that 
you have in your data.
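
The permutation point can be seen in a tiny Python model of the second script. 
It assumes, as proposed in this comment, that RANK without BY simply numbers 
rows in their current order. The functions rank and order_by are illustrative 
stand-ins, not Pig APIs.

```python
from typing import Any, Callable, List, Sequence, Tuple


def rank(rows: Sequence[Any]) -> List[Tuple[int, Any]]:
    """Model of RANK without BY: number rows 1..n in their current order."""
    return [(i, row) for i, row in enumerate(rows, start=1)]


def order_by(ranked: Sequence[Tuple[int, Any]], key: Callable[[Any], Any]) -> List[Tuple[int, Any]]:
    """Model of ORDER ... BY: sort (rank, row) pairs on a value taken from the row."""
    return sorted(ranked, key=lambda pair: key(pair[1]))


A = ["pear", "apple", "fig"]
B = rank(A)                            # [(1, 'pear'), (2, 'apple'), (3, 'fig')]
C = order_by(B, key=lambda row: row)   # sort on the row value itself
permutation = [original_position for original_position, _ in C]
```

Here permutation lists, for each slot of the sorted output, the position that 
row had in the input, which is the sorting permutation the comment refers to.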





[jira] [Commented] (PIG-2353) RANK function like in SQL

2011-12-21 Thread Jonathan Coveney (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13174413#comment-13174413
 ] 

Jonathan Coveney commented on PIG-2353:
---

Weird, the above got garbled and I can't edit it, but I think the idea is clear.





[jira] [Commented] (PIG-2353) RANK function like in SQL

2011-12-20 Thread Gianmarco De Francisci Morales (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173078#comment-13173078
 ] 

Gianmarco De Francisci Morales commented on PIG-2353:
-

Hi Jonathan,
thanks for giving it a try!

I think the approach is fine for an initial implementation.
To scale it out, we need a deeper integration with Pig (i.e., it needs to be an 
operator and not a UDF), but this is a subject for another JIRA.

Just one more comment.
I am not sure about testing in piggybank.
Should we use e2e testing instead of JUnit?






[jira] [Commented] (PIG-2353) RANK function like in SQL

2011-12-20 Thread Jonathan Coveney (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173394#comment-13173394
 ] 

Jonathan Coveney commented on PIG-2353:
---

Is there anything in the piggybank using the e2e? I'm not sure what the word is 
about piggybank and when to use e2e. Someone else will have to weigh in on that.

As far as that other JIRA... you should create it and link it, though I'm curious 
what benefit/optimization you foresee RANK having if it has access to Pig's 
internals.





[jira] [Commented] (PIG-2353) RANK function like in SQL

2011-12-20 Thread Gianmarco De Francisci Morales (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173405#comment-13173405
 ] 

Gianmarco De Francisci Morales commented on PIG-2353:
-

My idea would be to have a distributed implementation of RANK in the following 
manner:

Run a Map-only job with n mappers. Each mapper just counts the number of 
records in its input split and accumulates it in an internal variable (or, 
alternatively, uses dynamic counters).
At the end, we have a map(partition_id = number_of_records).
This map is small enough to be put in the distributed cache.
Compute the cumulative sum of the record counts across partitions.
Then launch a second Map-only job with exactly n mappers. Each one reads its 
input split and the cumulative number of records preceding it, initializes its 
counter with this value, and finally RANKs the records as they come in.

This would be a distributed implementation of RANK that could scale very well.
I haven't figured out how to integrate it into Pig yet.
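
The two-pass scheme above can be sketched in plain Python as a single-process simulation. This is only an illustration under stated assumptions: the function names (`count_records`, `starting_offsets`, `rank_split`) are hypothetical, each list stands in for one input split, and nothing here uses Hadoop or Pig APIs.

```python
from itertools import accumulate

def count_records(splits):
    """Pass 1: each 'mapper' reports how many records its split holds."""
    return {split_id: len(records) for split_id, records in enumerate(splits)}

def starting_offsets(counts):
    """Cumulative sum: number of records preceding each split."""
    ordered = [counts[i] for i in range(len(counts))]
    return [0] + list(accumulate(ordered))[:-1]

def rank_split(records, offset):
    """Pass 2: one 'mapper' ranks its own split, starting after `offset`."""
    return [(offset + i, record) for i, record in enumerate(records, start=1)]

# Three hypothetical input splits of an already ordered dataset.
splits = [["a", "b", "c"], ["d"], ["e", "f"]]

counts = count_records(splits)      # {0: 3, 1: 1, 2: 2}
offsets = starting_offsets(counts)  # [0, 3, 4]
ranked = [row
          for records, offset in zip(splits, offsets)
          for row in rank_split(records, offset)]
print(ranked)  # [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd'), (5, 'e'), (6, 'f')]
```

Only the small `counts` map has to be shared between the two passes; each split is ranked independently once its offset is known, which is what lets the second job stay Map-only.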





[jira] [Commented] (PIG-2353) RANK function like in SQL

2011-12-20 Thread Ashutosh Chauhan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173431#comment-13173431
 ] 

Ashutosh Chauhan commented on PIG-2353:
---

I was also thinking about this problem of implementing statistical measures (like 
top-K, median, quantiles, etc.) efficiently in a distributed manner that is 
amenable to the MR framework. Rank is a basis for them. I came up with a similar 
outline to yours; you have laid it out well. I think this is pretty useful to 
have in Pig, and these are the kind of features that a higher-level language like 
Pig should make available to its users. Sophisticated users will expect this, and 
it will drive adoption.
+1 for distributed implementation of RANK in Pig.
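
As a rough illustration of why rank is a basis for these measures, the Python sketch below derives top-K and a quantile from rows that already carry a gap-free 1-based rank. The helper names and the nearest-rank quantile definition are assumptions made for the example, not something specified in this issue.

```python
import math

def top_k(ranked_rows, k):
    """Keep the rows whose rank is at most k (a per-record filter)."""
    return [value for r, value in ranked_rows if r <= k]

def quantile_by_rank(ranked_rows, q):
    """Nearest-rank quantile: the value whose rank is ceil(q * n)."""
    n = len(ranked_rows)
    target = max(1, math.ceil(q * n))
    by_rank = dict(ranked_rows)  # rank -> value
    return by_rank[target]

# Hypothetical output of RANK over a dataset sorted in ascending order.
ranked = [(1, 10), (2, 20), (3, 30), (4, 40), (5, 50)]

print(top_k(ranked, 2))               # [10, 20]
print(quantile_by_rank(ranked, 0.5))  # 30 (the median)
```

Once the rank column exists, both measures reduce to selecting rows by rank, which needs no further sorting.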





[jira] [Commented] (PIG-2353) RANK function like in SQL

2011-12-20 Thread Jonathan Coveney (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173438#comment-13173438
 ] 

Jonathan Coveney commented on PIG-2353:
---

So Gianmarco, are you thinking this sort of syntax:

{code}
A = some relation
B = RANK A BY column name ASC | DESC;
{code}

I.e., it'd just follow the ORDER syntax, but add the rank to the end?

And I assume your n-mapper job would run after already sorting the data, right? 
So first RANK would run the ORDER BY, and then it would run the two jobs that 
would actually append the rank?
