[jira] Commented: (PIG-972) Make describe work with nested foreach

2010-06-03 Thread Aniket Mokashi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12875242#action_12875242
 ] 

Aniket Mokashi commented on PIG-972:


Approach -
1. To implement the functionality described above, the parser keeps track of 
all the nested aliases it comes across and adds them to LOForEach.
2. After we parse the query (foreach), we dump the schema for all 
nested aliases stored in the list.
3. When "describe foreach-alias;" is issued, we dump the schema for 
foreach-alias along with the schema for all nested aliases stored in the list.

Data Structures -
Adding mDescribedAliasList to LOForEach to list all the described nested 
aliases.
LOForEach has mForEachPlans, which creates a plan for each projection. This 
keeps track of the schemas for all nested aliases inside the leaves of its plans.

Issues -
Verification of aliases in nested describe - Since we do not create a map for 
nested aliases, it is not possible to validate up front the name of the alias 
used in a nested describe.
Multiple dumps - The above approach might dump the same schema more than once.
These issues can be solved by adding more state to LOForEach.
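
As a rough illustration of the bookkeeping described above, here is a minimal Java sketch. It is not Pig's LOForEach: the class name NestedDescribeSketch, its methods, and the "alias::nested" output format are assumptions made for illustration. It also shows one way the two issues could be handled with extra state: a map of nested aliases for up-front validation, and a check that avoids listing an alias twice.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the state LOForEach would carry; not Pig's real class.
final class NestedDescribeSketch {
    private final String alias;   // alias of the foreach itself, e.g. "c"
    private final String schema;  // schema of the foreach output, kept as text here

    // Every nested alias the parser came across, with its schema (insertion order kept).
    private final Map<String, String> nestedSchemas = new LinkedHashMap<>();

    // Nested aliases that were explicitly described (plays the role of mDescribedAliasList).
    private final List<String> describedAliases = new ArrayList<>();

    NestedDescribeSketch(String alias, String schema) {
        this.alias = alias;
        this.schema = schema;
    }

    // Called by the (hypothetical) parser for each nested alias it sees.
    void addNestedAlias(String name, String nestedSchema) {
        nestedSchemas.put(name, nestedSchema);
    }

    // Validates the name up front and records it at most once,
    // so the same schema is not dumped several times.
    void markDescribed(String name) {
        if (!nestedSchemas.containsKey(name)) {
            throw new IllegalArgumentException("Unknown nested alias: " + name);
        }
        if (!describedAliases.contains(name)) {
            describedAliases.add(name);
        }
    }

    // Schema of the foreach alias followed by the described nested aliases.
    List<String> describe() {
        List<String> lines = new ArrayList<>();
        lines.add(alias + ": " + schema);
        for (String name : describedAliases) {
            lines.add(alias + "::" + name + ": " + nestedSchemas.get(name));
        }
        return lines;
    }
}
```

Keeping the nested aliases in a map is only one possible form of the "more state" mentioned above; the actual patch may organise this differently.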


 Make describe work with nested foreach
 --

 Key: PIG-972
 URL: https://issues.apache.org/jira/browse/PIG-972
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Aniket Mokashi
 Fix For: 0.8.0


 Currently Parser can't deal with that. This is because describe is part of 
 Grunt parser while the rest of nested foreach is handled by the QueryParser

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1433) pig should create success file if mapreduce.fileoutputcommitter.marksuccessfuljobs is true

2010-06-03 Thread Pradeep Kamath (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12875292#action_12875292
 ] 

Pradeep Kamath commented on PIG-1433:
-

Hudson seems to be unresponsive, so I ran the unit tests locally and they 
completed successfully. The test-patch ant target also came back successfully, 
except for an HTML page change in the release audit warnings, which can be ignored.

Patch is ready for review.

 pig should create success file if 
 mapreduce.fileoutputcommitter.marksuccessfuljobs is true
 --

 Key: PIG-1433
 URL: https://issues.apache.org/jira/browse/PIG-1433
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
 Fix For: 0.8.0

 Attachments: PIG-1433.patch


 pig should create success file if 
 mapreduce.fileoutputcommitter.marksuccessfuljobs is true




[jira] Commented: (PIG-1433) pig should create success file if mapreduce.fileoutputcommitter.marksuccessfuljobs is true

2010-06-03 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12875319#action_12875319
 ] 

Ashutosh Chauhan commented on PIG-1433:
---

+1 for the commit. A couple of notes for the future:
* Since this is related to a Hadoop property, we should consider removing this 
from the Pig codebase when MAPREDUCE-1447 and MAPREDUCE-947 are fixed.
* We have a lot of constant strings in our codebase. For the sake of clean code, 
we should put all of those public static final strings in one top-level interface 
called Constants. This should be part of a separate clean-up jira.





[jira] Commented: (PIG-1433) pig should create success file if mapreduce.fileoutputcommitter.marksuccessfuljobs is true

2010-06-03 Thread Gianmarco De Francisci Morales (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12875321#action_12875321
 ] 

Gianmarco De Francisci Morales commented on PIG-1433:
-

Just for the sake of clean code: the constant interface is an anti-pattern.

http://en.wikipedia.org/wiki/Constant_interface

Public final classes that are instance-controlled (no instances can be created) 
are better for this purpose.
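
A minimal sketch of such a class is shown below, assuming a hypothetical name PigConstantsSketch. Only the property string comes from the issue title; the class and field names are illustrative. In a real codebase the class would be declared public in its own file.

```java
// Hypothetical constants holder: final, so it cannot be subclassed,
// and with a private constructor, so it cannot be instantiated.
final class PigConstantsSketch {

    // Hadoop property named in the title of PIG-1433.
    static final String MARK_SUCCESSFUL_JOBS =
            "mapreduce.fileoutputcommitter.marksuccessfuljobs";

    // Private constructor prevents instantiation, including through a subclass.
    private PigConstantsSketch() {
        throw new AssertionError("no instances");
    }
}
```

Callers refer to PigConstantsSketch.MARK_SUCCESSFUL_JOBS directly (or via a static import) instead of implementing an interface just to inherit its constants.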





[jira] Commented: (PIG-1433) pig should create success file if mapreduce.fileoutputcommitter.marksuccessfuljobs is true

2010-06-03 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12875326#action_12875326
 ] 

Ashutosh Chauhan commented on PIG-1433:
---

My point was to have all constant strings in one place instead of each class 
having some of them. It could be either an interface or a class. If an interface 
is considered an anti-pattern, doing it in a class is fine too.





[jira] Commented: (PIG-928) UDFs in scripting languages

2010-06-03 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12875341#action_12875341
 ] 

Julien Le Dem commented on PIG-928:
---

I like Register better as well.

With Java UDFs, you REGISTER a jar.
Then you can use the classes in the jar using their fully qualified class name.
Optionally you can use DEFINE to alias the functions or pass extra 
initialization parameters.

With scripting as implemented by Arnab, you REGISTER a script file (adding the 
script language information, as it is not only Java anymore) and you can use all 
the functions in it (just like you do in Java).
Then I would say you should be able to alias them using DEFINE and define a 
closure by passing extra parameters: DEFINE log2 logn(2, $0); (maybe I am 
asking too much here ;) )
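
To make the closure idea concrete, here is a small Java sketch of what DEFINE log2 logn(2, $0) would amount to: fixing the first argument of a two-argument function and leaving a one-argument function. The class and method names are hypothetical, and this says nothing about how Pig would implement the syntax.

```java
import java.util.function.DoubleUnaryOperator;

// Hypothetical illustration of partial application; not part of Pig.
final class PartialApplicationSketch {

    // logn(base, x): logarithm of x in the given base.
    static double logn(double base, double x) {
        return Math.log(x) / Math.log(base);
    }

    // Binds the base and returns a function of the remaining argument,
    // playing the role of the alias created by DEFINE.
    static DoubleUnaryOperator bindBase(double base) {
        return x -> logn(base, x);
    }

    public static void main(String[] args) {
        DoubleUnaryOperator log2 = bindBase(2);
        System.out.println(log2.applyAsDouble(8)); // approximately 3
    }
}
```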

 UDFs in scripting languages
 ---

 Key: PIG-928
 URL: https://issues.apache.org/jira/browse/PIG-928
 Project: Pig
  Issue Type: New Feature
Reporter: Alan Gates
 Fix For: 0.8.0

 Attachments: calltrace.png, package.zip, pig-greek.tgz, 
 pig.scripting.patch.arnab, pyg.tgz, scripting.tgz, scripting.tgz, test.zip


 It should be possible to write UDFs in scripting languages such as python, 
 ruby, etc.  This frees users from needing to compile Java, generate a jar, 
 etc.  It also opens Pig to programmers who prefer scripting languages over 
 Java.




[jira] Created: (PIG-1434) Allow casting relations to scalars

2010-06-03 Thread Olga Natkovich (JIRA)
Allow casting relations to scalars
--

 Key: PIG-1434
 URL: https://issues.apache.org/jira/browse/PIG-1434
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
 Fix For: 0.8.0


This jira is to implement a simplified version of the functionality described 
in https://issues.apache.org/jira/browse/PIG-801.

The proposal is to allow casting relations to scalar types in foreach.

Example:

A = load 'data' as (x, y, z);
B = group A all;
C = foreach B generate COUNT(A);
.
X = 
Y = foreach X generate $1/(long) C;

Couple of additional comments:

(1) You can only cast relations that contain a single value; otherwise an 
error will be reported
(2) Name resolution is needed since relation X might have field named C in 
which case that field takes precedence.
(3) Y will look for C closest to it.

Implementation thoughts:

The idea is to store C into a file and then convert it into scalar via a UDF. I 
believe we already have a UDF that Ben Reed contributed for this purpose. Most 
of the work would be to update the logical plan to
(1) Store C
(2) convert the cast to the UDF
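
The single-value rule from comment (1) can be sketched as follows. This is only an illustration under assumptions: the relation is modelled as an in-memory list of rows rather than a stored file, and the class ScalarCastSketch is hypothetical, not the UDF mentioned above.

```java
import java.util.List;

// Hypothetical illustration of the "exactly one value" rule; not Pig's UDF.
final class ScalarCastSketch {

    // Returns the only value of the relation, or fails if the relation
    // does not consist of exactly one row with exactly one field.
    static long toScalar(List<List<Long>> relation) {
        if (relation.size() != 1 || relation.get(0).size() != 1) {
            throw new IllegalStateException(
                    "A relation must contain exactly one value to be cast to a scalar");
        }
        return relation.get(0).get(0);
    }
}
```

In the example above, C holds the single row produced by COUNT(A) over the whole input, so the cast would succeed; a relation with several rows or several fields would be rejected.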




[jira] Created: (PIG-1435) make sure dependent jobs fail when a job in multiquery fails

2010-06-03 Thread Olga Natkovich (JIRA)
make sure dependent jobs fail when a job in multiquery fails


 Key: PIG-1435
 URL: https://issues.apache.org/jira/browse/PIG-1435
 Project: Pig
  Issue Type: Bug
Reporter: Olga Natkovich
 Fix For: 0.8.0


Currently, if one of the MQ jobs fails, Pig tries to run all remaining jobs. As 
a result, if data was partially generated by the failed job, you might get 
incorrect results from dependent jobs. 
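
One way to picture the desired behaviour is to walk the job dependency graph from the failed job and mark everything downstream as failed. The Java sketch below assumes a hypothetical map from each job name to the jobs that consume its output; it does not reflect how Pig's multiquery execution actually tracks jobs.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of failure propagation; not Pig's job control code.
final class DependentJobsSketch {

    // dependents maps a job name to the jobs that read its output.
    // Returns every job that directly or transitively depends on failedJob.
    static Set<String> jobsToFail(Map<String, List<String>> dependents, String failedJob) {
        Set<String> toFail = new LinkedHashSet<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.push(failedJob);
        while (!pending.isEmpty()) {
            String job = pending.pop();
            for (String child : dependents.getOrDefault(job, Collections.<String>emptyList())) {
                // add() returns false for jobs already seen, so each job is visited once.
                if (toFail.add(child)) {
                    pending.push(child);
                }
            }
        }
        return toFail;
    }
}
```

Jobs that do not depend on the failed one are left out of the result and could still run.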




[jira] Commented: (PIG-972) Make describe work with nested foreach

2010-06-03 Thread Aniket Mokashi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12875390#action_12875390
 ] 

Aniket Mokashi commented on PIG-972:


Approach mentioned above seems to work.

Here are some proposals on semantics of nested describe-
a = load '1.txt' as (a0:int, a1:int);
b = group a by $0;

Proposal 1- Explicit describe.
c = foreach b { d = order a by $0; describe d; e = ...; generate d.$0 ...;}
(1a: Instantaneous response - describes d after parsing the above statement)
describe c;
Prints schema for c and d (but not e)
Adv - Can select which one of nestedAlias to describe.
Disadv - Extra typing.

Proposal 2:- Implicit describe (no describe nested statements)
c = foreach b { d = order a by $0; e = ...; generate d.$0 ...;}
describe c;
Describes c, d and e;
Adv- less typing
Disadv- extra prints
(2a -  describe c prints for c, d and e. Also describe c-d to describe nested 
d)
(2b -  describe c prints for c only. describe c- d to describe nested d).

Alan/Olga, let me know your comments on this.





[jira] Updated: (PIG-972) Make describe work with nested foreach

2010-06-03 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-972:
---

Attachment: NestedDescribeProp1.patch

Attaching patch for prop1.

 Make describe work with nested foreach
 --

 Key: PIG-972
 URL: https://issues.apache.org/jira/browse/PIG-972
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: NestedDescribeProp1.patch


 Currently Parser can't deal with that. This is because describe is part of 
 Grunt parser while the rest of nested foreach is handled by the QueryParser




[jira] Commented: (PIG-972) Make describe work with nested foreach

2010-06-03 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12875410#action_12875410
 ] 

Daniel Dai commented on PIG-972:


If we have a statement:
{code}
a = load '1.txt' as (a0:int, a1:int);
b = group a by a0;
c = foreach b { d = order a by $0; generate d, *; }
{code}

Here is proposal 2.b from Aniket:
{code}
grunt> describe c;
c::d: {a0: int,a1: int}
c: {d: {a0: int,a1: int},group: int,a: {a0: int,a1: int}}
grunt> describe c::d;
c::d: {a0: int,a1: int}
{code}

I vote for this approach. Opinion?

 Make describe work with nested foreach
 --

 Key: PIG-972
 URL: https://issues.apache.org/jira/browse/PIG-972
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: NestedDescribeProp1.patch


 Currently Parser can't deal with that. This is because describe is part of 
 Grunt parser while the rest of nested foreach is handled by the QueryParser




[jira] Created: (PIG-1436) Print number of records outputted at each step of a Pig script

2010-06-03 Thread Russell Jurney (JIRA)
Print number of records outputted at each step of a Pig script
--

 Key: PIG-1436
 URL: https://issues.apache.org/jira/browse/PIG-1436
 Project: Pig
  Issue Type: New Feature
  Components: grunt
Affects Versions: 0.7.0
Reporter: Russell Jurney
Priority: Minor
 Fix For: 0.8.0


I often run a script multiple times, or have to go and look through Hadoop task 
logs, to figure out where I broke a long script in such a way that I get 0 
records out of it.  I think this is a common problem.

If someone can point me in the right direction, I can make a pass at this.




[jira] Created: (PIG-1437) [Optimization] Rewrite GroupBy-Foreach-flatten(group) to Distinct

2010-06-03 Thread Ashutosh Chauhan (JIRA)
[Optimization] Rewrite GroupBy-Foreach-flatten(group) to Distinct
-

 Key: PIG-1437
 URL: https://issues.apache.org/jira/browse/PIG-1437
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.7.0
Reporter: Ashutosh Chauhan
Priority: Minor







[jira] Updated: (PIG-1437) [Optimization] Rewrite GroupBy-Foreach-flatten(group) to Distinct

2010-06-03 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated PIG-1437:
--

Release Note:   (was: It's possible to rewrite queries like this
{code}
A = load 'data' as (name,age);
B = group A by (name,age);
C = foreach B generate group.name, group.age;
dump C;
{code}
or
{code}
A = load 'data' as (name,age);
B = group A by (name,age);
C = foreach B generate flatten(group);
dump C;
{code}

to
{code}
A = load 'data' as (name,age);
B = distinct A;
dump B;
{code}

This can only be done if no columns within the bags are referenced 
subsequently in the script. Since in the Pig-Hadoop world DISTINCT is executed 
more efficiently than group-by, this will be a huge win. )
 Description: 
It's possible to rewrite queries like this
{code}
A = load 'data' as (name,age);
B = group A by (name,age);
C = foreach B generate group.name, group.age;
dump C;
{code}
or
{code}
A = load 'data' as (name,age);
B = group A by (name,age);
C = foreach B generate flatten(group);
dump C;
{code}

to
{code}
A = load 'data' as (name,age);
B = distinct A;
dump B;
{code}

This can only be done if no columns within the bags are referenced 
subsequently in the script. Since in the Pig-Hadoop world DISTINCT is executed 
more efficiently than group-by, this will be a huge win. 
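
The equivalence behind this rewrite can be checked on in-memory data with a short Java sketch. It is purely illustrative: rows are modelled as lists of strings, the class GroupToDistinctSketch is hypothetical, and nothing here reflects Pig's optimizer.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical illustration of why the rewrite is safe; not Pig code.
final class GroupToDistinctSketch {

    // Mirrors "group A by (name,age)" followed by "generate flatten(group)":
    // group on the whole row, then keep only the group keys.
    static Set<List<String>> viaGroup(List<List<String>> rows) {
        Map<List<String>, List<List<String>>> grouped =
                rows.stream().collect(Collectors.groupingBy((List<String> row) -> row));
        return new HashSet<>(grouped.keySet());
    }

    // Mirrors "distinct A": drop duplicate rows.
    static Set<List<String>> viaDistinct(List<List<String>> rows) {
        return new HashSet<>(rows);
    }
}
```

Both methods return the same set of rows for any input, because the bags built by the grouping are discarded. That matches the condition stated above: the rewrite only applies when nothing inside the bags is referenced later.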

 [Optimization] Rewrite GroupBy-Foreach-flatten(group) to Distinct
 -

 Key: PIG-1437
 URL: https://issues.apache.org/jira/browse/PIG-1437
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.7.0
Reporter: Ashutosh Chauhan
Priority: Minor

 It's possible to rewrite queries like this
 {code}
 A = load 'data' as (name,age);
 B = group A by (name,age);
 C = foreach B generate group.name, group.age;
 dump C;
 {code}
 or
 {code}
 A = load 'data' as (name,age);
 B = group A by (name,age);
 C = foreach B generate flatten(group);
 dump C;
 {code}
 to
 {code}
 A = load 'data' as (name,age);
 B = distinct A;
 dump B;
 {code}
 This can only be done if no columns within the bags are referenced 
 subsequently in the script. Since in the Pig-Hadoop world DISTINCT is 
 executed more efficiently than group-by, this will be a huge win. 
