[jira] Commented: (PIG-1552) Nested describe failed when the alias is not referred in the first foreach inner plan

2010-08-20 Thread Aniket Mokashi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12900905#action_12900905
 ] 

Aniket Mokashi commented on PIG-1552:
-

+1

 Nested describe failed when the alias is not referred in the first foreach 
 inner plan
 -

 Key: PIG-1552
 URL: https://issues.apache.org/jira/browse/PIG-1552
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.8.0

 Attachments: PIG-1552-1.patch


 The following script fail:
 {code}
 A = load 'studentab10k' as (name, age, gpa);
 B = group A by name;
 C = foreach B {
 D = distinct A.age;
 generate group, COUNT(D);
 }
 describe C::D;
 {code}
 If we remove group from generate statement, then it works

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-506) Does pig need a NATIVE keyword?

2010-08-20 Thread Aniket Mokashi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12900911#action_12900911
 ] 

Aniket Mokashi commented on PIG-506:


Current patch doesnt consider case for parallel keyword -- we can fix this by 
adding -D mapred.reduce.tasks=n to params of RunJar. (ToDo)

 Does pig need a NATIVE keyword?
 ---

 Key: PIG-506
 URL: https://issues.apache.org/jira/browse/PIG-506
 Project: Pig
  Issue Type: New Feature
  Components: impl
Reporter: Alan Gates
Assignee: Aniket Mokashi
Priority: Minor
 Fix For: 0.8.0

 Attachments: NativeImplInitial.patch, NativeMapReduceFinale1.patch, 
 NativeMapReduceFinale2.patch, TestWordCount.jar


 Assume a user had a job that broke easily into three pieces.  Further assume 
 that pieces one and three were easily expressible in pig, but that piece two 
 needed to be written in map reduce for whatever reason (performance, 
 something that pig could not easily express, legacy job that was too 
 important to change, etc.).  Today the user would either have to use map 
 reduce for the entire job or manually handle the stitching together of pig 
 and map reduce jobs.  What if instead pig provided a NATIVE keyword that 
 would allow the script to pass off the data stream to the underlying system 
 (in this case map reduce).  The semantics of NATIVE would vary by underlying 
 system.  In the map reduce case, we would assume that this indicated a 
 collection of one or more fully contained map reduce jobs, so that pig would 
 store the data, invoke the map reduce jobs, and then read the resulting data 
 to continue.  It might look something like this:
 {code}
 A = load 'myfile';
 X = load 'myotherfile';
 B = group A by $0;
 C = foreach B generate group, myudf(B);
 D = native (jar=mymr.jar, infile=frompig outfile=topig);
 E = join D by $0, X by $0;
 ...
 {code}
 This differs from streaming in that it allows the user to insert an arbitrary 
 amount of native processing, whereas streaming allows the insertion of one 
 binary.  It also differs in that, for streaming, data is piped directly into 
 and out of the binary as part of the pig pipeline.  Here the pipeline would 
 be broken, data written to disk, and the native block invoked, then data read 
 back from disk.
 Another alternative is to say this is unnecessary because the user can do the 
 coordination from java, using the PIgServer interface to run pig and calling 
 the map reduce job explicitly.  The advantages of the native keyword are that 
 the user need not be worried about coordination between the jobs, pig will 
 take care of it.  Also the user can make use of existing java applications 
 without being a java programmer.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (PIG-506) Does pig need a NATIVE keyword?

2010-08-20 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi reassigned PIG-506:
--

Assignee: Thejas M Nair  (was: Aniket Mokashi)

 Does pig need a NATIVE keyword?
 ---

 Key: PIG-506
 URL: https://issues.apache.org/jira/browse/PIG-506
 Project: Pig
  Issue Type: New Feature
  Components: impl
Reporter: Alan Gates
Assignee: Thejas M Nair
Priority: Minor
 Fix For: 0.8.0

 Attachments: NativeImplInitial.patch, NativeMapReduceFinale1.patch, 
 NativeMapReduceFinale2.patch, NativeMapReduceFinale3.patch, TestWordCount.jar


 Assume a user had a job that broke easily into three pieces.  Further assume 
 that pieces one and three were easily expressible in pig, but that piece two 
 needed to be written in map reduce for whatever reason (performance, 
 something that pig could not easily express, legacy job that was too 
 important to change, etc.).  Today the user would either have to use map 
 reduce for the entire job or manually handle the stitching together of pig 
 and map reduce jobs.  What if instead pig provided a NATIVE keyword that 
 would allow the script to pass off the data stream to the underlying system 
 (in this case map reduce).  The semantics of NATIVE would vary by underlying 
 system.  In the map reduce case, we would assume that this indicated a 
 collection of one or more fully contained map reduce jobs, so that pig would 
 store the data, invoke the map reduce jobs, and then read the resulting data 
 to continue.  It might look something like this:
 {code}
 A = load 'myfile';
 X = load 'myotherfile';
 B = group A by $0;
 C = foreach B generate group, myudf(B);
 D = native (jar=mymr.jar, infile=frompig outfile=topig);
 E = join D by $0, X by $0;
 ...
 {code}
 This differs from streaming in that it allows the user to insert an arbitrary 
 amount of native processing, whereas streaming allows the insertion of one 
 binary.  It also differs in that, for streaming, data is piped directly into 
 and out of the binary as part of the pig pipeline.  Here the pipeline would 
 be broken, data written to disk, and the native block invoked, then data read 
 back from disk.
 Another alternative is to say this is unnecessary because the user can do the 
 coordination from java, using the PIgServer interface to run pig and calling 
 the map reduce job explicitly.  The advantages of the native keyword are that 
 the user need not be worried about coordination between the jobs, pig will 
 take care of it.  Also the user can make use of existing java applications 
 without being a java programmer.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-506) Does pig need a NATIVE keyword?

2010-08-20 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-506:
---

Attachment: NativeMapReduceFinale3.patch

 Does pig need a NATIVE keyword?
 ---

 Key: PIG-506
 URL: https://issues.apache.org/jira/browse/PIG-506
 Project: Pig
  Issue Type: New Feature
  Components: impl
Reporter: Alan Gates
Assignee: Aniket Mokashi
Priority: Minor
 Fix For: 0.8.0

 Attachments: NativeImplInitial.patch, NativeMapReduceFinale1.patch, 
 NativeMapReduceFinale2.patch, NativeMapReduceFinale3.patch, TestWordCount.jar


 Assume a user had a job that broke easily into three pieces.  Further assume 
 that pieces one and three were easily expressible in pig, but that piece two 
 needed to be written in map reduce for whatever reason (performance, 
 something that pig could not easily express, legacy job that was too 
 important to change, etc.).  Today the user would either have to use map 
 reduce for the entire job or manually handle the stitching together of pig 
 and map reduce jobs.  What if instead pig provided a NATIVE keyword that 
 would allow the script to pass off the data stream to the underlying system 
 (in this case map reduce).  The semantics of NATIVE would vary by underlying 
 system.  In the map reduce case, we would assume that this indicated a 
 collection of one or more fully contained map reduce jobs, so that pig would 
 store the data, invoke the map reduce jobs, and then read the resulting data 
 to continue.  It might look something like this:
 {code}
 A = load 'myfile';
 X = load 'myotherfile';
 B = group A by $0;
 C = foreach B generate group, myudf(B);
 D = native (jar=mymr.jar, infile=frompig outfile=topig);
 E = join D by $0, X by $0;
 ...
 {code}
 This differs from streaming in that it allows the user to insert an arbitrary 
 amount of native processing, whereas streaming allows the insertion of one 
 binary.  It also differs in that, for streaming, data is piped directly into 
 and out of the binary as part of the pig pipeline.  Here the pipeline would 
 be broken, data written to disk, and the native block invoked, then data read 
 back from disk.
 Another alternative is to say this is unnecessary because the user can do the 
 coordination from java, using the PIgServer interface to run pig and calling 
 the map reduce job explicitly.  The advantages of the native keyword are that 
 the user need not be worried about coordination between the jobs, pig will 
 take care of it.  Also the user can make use of existing java applications 
 without being a java programmer.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-506) Does pig need a NATIVE keyword?

2010-08-19 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-506:
---

Attachment: NativeMapReduceFinale1.patch

Attaching the final patch-
Includes - MR changes, optimizer related changes, test cases for basic mr.
ToDo- Test cases for optimizer

 Does pig need a NATIVE keyword?
 ---

 Key: PIG-506
 URL: https://issues.apache.org/jira/browse/PIG-506
 Project: Pig
  Issue Type: New Feature
  Components: impl
Reporter: Alan Gates
Assignee: Aniket Mokashi
Priority: Minor
 Fix For: 0.8.0

 Attachments: NativeImplInitial.patch, NativeMapReduceFinale1.patch


 Assume a user had a job that broke easily into three pieces.  Further assume 
 that pieces one and three were easily expressible in pig, but that piece two 
 needed to be written in map reduce for whatever reason (performance, 
 something that pig could not easily express, legacy job that was too 
 important to change, etc.).  Today the user would either have to use map 
 reduce for the entire job or manually handle the stitching together of pig 
 and map reduce jobs.  What if instead pig provided a NATIVE keyword that 
 would allow the script to pass off the data stream to the underlying system 
 (in this case map reduce).  The semantics of NATIVE would vary by underlying 
 system.  In the map reduce case, we would assume that this indicated a 
 collection of one or more fully contained map reduce jobs, so that pig would 
 store the data, invoke the map reduce jobs, and then read the resulting data 
 to continue.  It might look something like this:
 {code}
 A = load 'myfile';
 X = load 'myotherfile';
 B = group A by $0;
 C = foreach B generate group, myudf(B);
 D = native (jar=mymr.jar, infile=frompig outfile=topig);
 E = join D by $0, X by $0;
 ...
 {code}
 This differs from streaming in that it allows the user to insert an arbitrary 
 amount of native processing, whereas streaming allows the insertion of one 
 binary.  It also differs in that, for streaming, data is piped directly into 
 and out of the binary as part of the pig pipeline.  Here the pipeline would 
 be broken, data written to disk, and the native block invoked, then data read 
 back from disk.
 Another alternative is to say this is unnecessary because the user can do the 
 coordination from java, using the PIgServer interface to run pig and calling 
 the map reduce job explicitly.  The advantages of the native keyword are that 
 the user need not be worried about coordination between the jobs, pig will 
 take care of it.  Also the user can make use of existing java applications 
 without being a java programmer.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-506) Does pig need a NATIVE keyword?

2010-08-19 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-506:
---

Status: Patch Available  (was: Open)

 Does pig need a NATIVE keyword?
 ---

 Key: PIG-506
 URL: https://issues.apache.org/jira/browse/PIG-506
 Project: Pig
  Issue Type: New Feature
  Components: impl
Reporter: Alan Gates
Assignee: Aniket Mokashi
Priority: Minor
 Fix For: 0.8.0

 Attachments: NativeImplInitial.patch, NativeMapReduceFinale1.patch


 Assume a user had a job that broke easily into three pieces.  Further assume 
 that pieces one and three were easily expressible in pig, but that piece two 
 needed to be written in map reduce for whatever reason (performance, 
 something that pig could not easily express, legacy job that was too 
 important to change, etc.).  Today the user would either have to use map 
 reduce for the entire job or manually handle the stitching together of pig 
 and map reduce jobs.  What if instead pig provided a NATIVE keyword that 
 would allow the script to pass off the data stream to the underlying system 
 (in this case map reduce).  The semantics of NATIVE would vary by underlying 
 system.  In the map reduce case, we would assume that this indicated a 
 collection of one or more fully contained map reduce jobs, so that pig would 
 store the data, invoke the map reduce jobs, and then read the resulting data 
 to continue.  It might look something like this:
 {code}
 A = load 'myfile';
 X = load 'myotherfile';
 B = group A by $0;
 C = foreach B generate group, myudf(B);
 D = native (jar=mymr.jar, infile=frompig outfile=topig);
 E = join D by $0, X by $0;
 ...
 {code}
 This differs from streaming in that it allows the user to insert an arbitrary 
 amount of native processing, whereas streaming allows the insertion of one 
 binary.  It also differs in that, for streaming, data is piped directly into 
 and out of the binary as part of the pig pipeline.  Here the pipeline would 
 be broken, data written to disk, and the native block invoked, then data read 
 back from disk.
 Another alternative is to say this is unnecessary because the user can do the 
 coordination from java, using the PIgServer interface to run pig and calling 
 the map reduce job explicitly.  The advantages of the native keyword are that 
 the user need not be worried about coordination between the jobs, pig will 
 take care of it.  Also the user can make use of existing java applications 
 without being a java programmer.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-506) Does pig need a NATIVE keyword?

2010-08-19 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-506:
---

Attachment: TestWordCount.jar

We also need to add this jar in lib to get tests working.
ToDo- CreateTestJarAtRuntime

 Does pig need a NATIVE keyword?
 ---

 Key: PIG-506
 URL: https://issues.apache.org/jira/browse/PIG-506
 Project: Pig
  Issue Type: New Feature
  Components: impl
Reporter: Alan Gates
Assignee: Aniket Mokashi
Priority: Minor
 Fix For: 0.8.0

 Attachments: NativeImplInitial.patch, NativeMapReduceFinale1.patch, 
 TestWordCount.jar


 Assume a user had a job that broke easily into three pieces.  Further assume 
 that pieces one and three were easily expressible in pig, but that piece two 
 needed to be written in map reduce for whatever reason (performance, 
 something that pig could not easily express, legacy job that was too 
 important to change, etc.).  Today the user would either have to use map 
 reduce for the entire job or manually handle the stitching together of pig 
 and map reduce jobs.  What if instead pig provided a NATIVE keyword that 
 would allow the script to pass off the data stream to the underlying system 
 (in this case map reduce).  The semantics of NATIVE would vary by underlying 
 system.  In the map reduce case, we would assume that this indicated a 
 collection of one or more fully contained map reduce jobs, so that pig would 
 store the data, invoke the map reduce jobs, and then read the resulting data 
 to continue.  It might look something like this:
 {code}
 A = load 'myfile';
 X = load 'myotherfile';
 B = group A by $0;
 C = foreach B generate group, myudf(B);
 D = native (jar=mymr.jar, infile=frompig outfile=topig);
 E = join D by $0, X by $0;
 ...
 {code}
 This differs from streaming in that it allows the user to insert an arbitrary 
 amount of native processing, whereas streaming allows the insertion of one 
 binary.  It also differs in that, for streaming, data is piped directly into 
 and out of the binary as part of the pig pipeline.  Here the pipeline would 
 be broken, data written to disk, and the native block invoked, then data read 
 back from disk.
 Another alternative is to say this is unnecessary because the user can do the 
 coordination from java, using the PIgServer interface to run pig and calling 
 the map reduce job explicitly.  The advantages of the native keyword are that 
 the user need not be worried about coordination between the jobs, pig will 
 take care of it.  Also the user can make use of existing java applications 
 without being a java programmer.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-506) Does pig need a NATIVE keyword?

2010-08-19 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-506:
---

Attachment: NativeMapReduceFinale2.patch

Submitting the updated patch

 Does pig need a NATIVE keyword?
 ---

 Key: PIG-506
 URL: https://issues.apache.org/jira/browse/PIG-506
 Project: Pig
  Issue Type: New Feature
  Components: impl
Reporter: Alan Gates
Assignee: Aniket Mokashi
Priority: Minor
 Fix For: 0.8.0

 Attachments: NativeImplInitial.patch, NativeMapReduceFinale1.patch, 
 NativeMapReduceFinale2.patch, TestWordCount.jar


 Assume a user had a job that broke easily into three pieces.  Further assume 
 that pieces one and three were easily expressible in pig, but that piece two 
 needed to be written in map reduce for whatever reason (performance, 
 something that pig could not easily express, legacy job that was too 
 important to change, etc.).  Today the user would either have to use map 
 reduce for the entire job or manually handle the stitching together of pig 
 and map reduce jobs.  What if instead pig provided a NATIVE keyword that 
 would allow the script to pass off the data stream to the underlying system 
 (in this case map reduce).  The semantics of NATIVE would vary by underlying 
 system.  In the map reduce case, we would assume that this indicated a 
 collection of one or more fully contained map reduce jobs, so that pig would 
 store the data, invoke the map reduce jobs, and then read the resulting data 
 to continue.  It might look something like this:
 {code}
 A = load 'myfile';
 X = load 'myotherfile';
 B = group A by $0;
 C = foreach B generate group, myudf(B);
 D = native (jar=mymr.jar, infile=frompig outfile=topig);
 E = join D by $0, X by $0;
 ...
 {code}
 This differs from streaming in that it allows the user to insert an arbitrary 
 amount of native processing, whereas streaming allows the insertion of one 
 binary.  It also differs in that, for streaming, data is piped directly into 
 and out of the binary as part of the pig pipeline.  Here the pipeline would 
 be broken, data written to disk, and the native block invoked, then data read 
 back from disk.
 Another alternative is to say this is unnecessary because the user can do the 
 coordination from java, using the PIgServer interface to run pig and calling 
 the map reduce job explicitly.  The advantages of the native keyword are that 
 the user need not be worried about coordination between the jobs, pig will 
 take care of it.  Also the user can make use of existing java applications 
 without being a java programmer.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-506) Does pig need a NATIVE keyword?

2010-08-18 Thread Aniket Mokashi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12899965#action_12899965
 ] 

Aniket Mokashi commented on PIG-506:


Wiki page explaining details of specification and implementation has been 
uploaded at - http://wiki.apache.org/pig/NativeMapReduce

 Does pig need a NATIVE keyword?
 ---

 Key: PIG-506
 URL: https://issues.apache.org/jira/browse/PIG-506
 Project: Pig
  Issue Type: New Feature
  Components: impl
Reporter: Alan Gates
Assignee: Aniket Mokashi
Priority: Minor
 Fix For: 0.8.0

 Attachments: NativeImplInitial.patch


 Assume a user had a job that broke easily into three pieces.  Further assume 
 that pieces one and three were easily expressible in pig, but that piece two 
 needed to be written in map reduce for whatever reason (performance, 
 something that pig could not easily express, legacy job that was too 
 important to change, etc.).  Today the user would either have to use map 
 reduce for the entire job or manually handle the stitching together of pig 
 and map reduce jobs.  What if instead pig provided a NATIVE keyword that 
 would allow the script to pass off the data stream to the underlying system 
 (in this case map reduce).  The semantics of NATIVE would vary by underlying 
 system.  In the map reduce case, we would assume that this indicated a 
 collection of one or more fully contained map reduce jobs, so that pig would 
 store the data, invoke the map reduce jobs, and then read the resulting data 
 to continue.  It might look something like this:
 {code}
 A = load 'myfile';
 X = load 'myotherfile';
 B = group A by $0;
 C = foreach B generate group, myudf(B);
 D = native (jar=mymr.jar, infile=frompig outfile=topig);
 E = join D by $0, X by $0;
 ...
 {code}
 This differs from streaming in that it allows the user to insert an arbitrary 
 amount of native processing, whereas streaming allows the insertion of one 
 binary.  It also differs in that, for streaming, data is piped directly into 
 and out of the binary as part of the pig pipeline.  Here the pipeline would 
 be broken, data written to disk, and the native block invoked, then data read 
 back from disk.
 Another alternative is to say this is unnecessary because the user can do the 
 coordination from java, using the PIgServer interface to run pig and calling 
 the map reduce job explicitly.  The advantages of the native keyword are that 
 the user need not be worried about coordination between the jobs, pig will 
 take care of it.  Also the user can make use of existing java applications 
 without being a java programmer.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1434) Allow casting relations to scalars

2010-08-16 Thread Aniket Mokashi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12899042#action_12899042
 ] 

Aniket Mokashi commented on PIG-1434:
-

Comments on the finalized syntax--

With the above changes Pig now supports -
{code}
Y = foreach X generate $1/(long) C.count, $2-(long) C.max;
{code}
1. Casts are *optional* and the datatype of scalar depends on the schema of C 
(ie depending on the schema of C, we add the casts implicitly. So, typically, 
count is a long and max is a double). In case of undeclared(null) schema for C, 
default type of scalar is *chararray*.

2. Projections are mandatory. For example
{code}
Y = foreach X generate C; // is an *error*
{code}
We need to use-
{code}
Y = foreach X generate C.$0; 
{code}

3. Check if C is a scalar or not is not performed until runtime, thus it will 
fail at the time of execution of UDF with ExecException(Scalar has more than 
one row in the output).

 Allow casting relations to scalars
 --

 Key: PIG-1434
 URL: https://issues.apache.org/jira/browse/PIG-1434
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: scalarImpl.patch, ScalarImpl1.patch, ScalarImpl5.patch, 
 ScalarImplFinale.patch, ScalarImplFinale1.patch, ScalarImplFinaleRebase.patch


 This jira is to implement a simplified version of the functionality described 
 in https://issues.apache.org/jira/browse/PIG-801.
 The proposal is to allow casting relations to scalar types in foreach.
 Example:
 A = load 'data' as (x, y, z);
 B = group A all;
 C = foreach B generate COUNT(A);
 .
 X = 
 Y = foreach X generate $1/(long) C;
 Couple of additional comments:
 (1) You can only cast relations including a single value or an error will be 
 reported
 (2) Name resolution is needed since relation X might have field named C in 
 which case that field takes precedence.
 (3) Y will look for C closest to it.
 Implementation thoughts:
 The idea is to store C into a file and then convert it into scalar via a UDF. 
 I believe we already have a UDF that Ben Reed contributed for this purpose. 
 Most of the work would be to update the logical plan to
 (1) Store C
 (2) convert the cast to the UDF

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-506) Does pig need a NATIVE keyword?

2010-08-10 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-506:
---

Attachment: NativeImplInitial.patch

Attached patch has initial implementation for this feature--
Dump, store, explain work fine. PigStats are generated properly.

ToDos- 
Check for multiquery optimization related tests
Add test cases

Usage-
A = load 'dict.txt';
B = mapreduce 'hadoop-0.20.2-examples.jar' Store A into 'input' Load 'output' 
`wordcount input output`;

 Does pig need a NATIVE keyword?
 ---

 Key: PIG-506
 URL: https://issues.apache.org/jira/browse/PIG-506
 Project: Pig
  Issue Type: New Feature
  Components: impl
Reporter: Alan Gates
Assignee: Aniket Mokashi
Priority: Minor
 Fix For: 0.8.0

 Attachments: NativeImplInitial.patch


 Assume a user had a job that broke easily into three pieces.  Further assume 
 that pieces one and three were easily expressible in pig, but that piece two 
 needed to be written in map reduce for whatever reason (performance, 
 something that pig could not easily express, legacy job that was too 
 important to change, etc.).  Today the user would either have to use map 
 reduce for the entire job or manually handle the stitching together of pig 
 and map reduce jobs.  What if instead pig provided a NATIVE keyword that 
 would allow the script to pass off the data stream to the underlying system 
 (in this case map reduce).  The semantics of NATIVE would vary by underlying 
 system.  In the map reduce case, we would assume that this indicated a 
 collection of one or more fully contained map reduce jobs, so that pig would 
 store the data, invoke the map reduce jobs, and then read the resulting data 
 to continue.  It might look something like this:
 {code}
 A = load 'myfile';
 X = load 'myotherfile';
 B = group A by $0;
 C = foreach B generate group, myudf(B);
 D = native (jar=mymr.jar, infile=frompig outfile=topig);
 E = join D by $0, X by $0;
 ...
 {code}
 This differs from streaming in that it allows the user to insert an arbitrary 
 amount of native processing, whereas streaming allows the insertion of one 
 binary.  It also differs in that, for streaming, data is piped directly into 
 and out of the binary as part of the pig pipeline.  Here the pipeline would 
 be broken, data written to disk, and the native block invoked, then data read 
 back from disk.
 Another alternative is to say this is unnecessary because the user can do the 
 coordination from java, using the PIgServer interface to run pig and calling 
 the map reduce job explicitly.  The advantages of the native keyword are that 
 the user need not be worried about coordination between the jobs, pig will 
 take care of it.  Also the user can make use of existing java applications 
 without being a java programmer.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1434) Allow casting relations to scalars

2010-08-04 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-1434:


Attachment: ScalarImplFinaleRebase.patch

Attaching rebased version of the patch...

 Allow casting relations to scalars
 --

 Key: PIG-1434
 URL: https://issues.apache.org/jira/browse/PIG-1434
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: scalarImpl.patch, ScalarImpl1.patch, ScalarImpl5.patch, 
 ScalarImplFinale.patch, ScalarImplFinale1.patch, ScalarImplFinaleRebase.patch


 This jira is to implement a simplified version of the functionality described 
 in https://issues.apache.org/jira/browse/PIG-801.
 The proposal is to allow casting relations to scalar types in foreach.
 Example:
 A = load 'data' as (x, y, z);
 B = group A all;
 C = foreach B generate COUNT(A);
 .
 X = 
 Y = foreach X generate $1/(long) C;
 Couple of additional comments:
 (1) You can only cast relations including a single value or an error will be 
 reported
 (2) Name resolution is needed since relation X might have field named C in 
 which case that field takes precedence.
 (3) Y will look for C closest to it.
 Implementation thoughts:
 The idea is to store C into a file and then convert it into scalar via a UDF. 
 I believe we already have a UDF that Ben Reed contributed for this purpose. 
 Most of the work would be to update the logical plan to
 (1) Store C
 (2) convert the cast to the UDF

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1434) Allow casting relations to scalars

2010-08-04 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-1434:


Status: Patch Available  (was: Open)

 Allow casting relations to scalars
 --

 Key: PIG-1434
 URL: https://issues.apache.org/jira/browse/PIG-1434
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: scalarImpl.patch, ScalarImpl1.patch, ScalarImpl5.patch, 
 ScalarImplFinale.patch, ScalarImplFinale1.patch, ScalarImplFinaleRebase.patch


 This jira is to implement a simplified version of the functionality described 
 in https://issues.apache.org/jira/browse/PIG-801.
 The proposal is to allow casting relations to scalar types in foreach.
 Example:
 A = load 'data' as (x, y, z);
 B = group A all;
 C = foreach B generate COUNT(A);
 .
 X = 
 Y = foreach X generate $1/(long) C;
 Couple of additional comments:
 (1) You can only cast relations including a single value or an error will be 
 reported
 (2) Name resolution is needed since relation X might have field named C in 
 which case that field takes precedence.
 (3) Y will look for C closest to it.
 Implementation thoughts:
 The idea is to store C into a file and then convert it into scalar via a UDF. 
 I believe we already have a UDF that Ben Reed contributed for this purpose. 
 Most of the work would be to update the logical plan to
 (1) Store C
 (2) convert the cast to the UDF

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1434) Allow casting relations to scalars

2010-08-04 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-1434:


Status: Open  (was: Patch Available)

 Allow casting relations to scalars
 --

 Key: PIG-1434
 URL: https://issues.apache.org/jira/browse/PIG-1434
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: scalarImpl.patch, ScalarImpl1.patch, ScalarImpl5.patch, 
 ScalarImplFinale.patch, ScalarImplFinale1.patch, ScalarImplFinaleRebase.patch


 This jira is to implement a simplified version of the functionality described 
 in https://issues.apache.org/jira/browse/PIG-801.
 The proposal is to allow casting relations to scalar types in foreach.
 Example:
 A = load 'data' as (x, y, z);
 B = group A all;
 C = foreach B generate COUNT(A);
 .
 X = 
 Y = foreach X generate $1/(long) C;
 Couple of additional comments:
 (1) You can only cast relations including a single value or an error will be 
 reported
 (2) Name resolution is needed since relation X might have field named C in 
 which case that field takes precedence.
 (3) Y will look for C closest to it.
 Implementation thoughts:
 The idea is to store C into a file and then convert it into scalar via a UDF. 
 I believe we already have a UDF that Ben Reed contributed for this purpose. 
 Most of the work would be to update the logical plan to
 (1) Store C
 (2) convert the cast to the UDF

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1434) Allow casting relations to scalars

2010-08-03 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-1434:


Attachment: (was: ScalarImplFinale1.patch)

 Allow casting relations to scalars
 --

 Key: PIG-1434
 URL: https://issues.apache.org/jira/browse/PIG-1434
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: scalarImpl.patch, ScalarImpl1.patch, ScalarImpl5.patch, 
 ScalarImplFinale.patch, ScalarImplFinale1.patch


 This jira is to implement a simplified version of the functionality described 
 in https://issues.apache.org/jira/browse/PIG-801.
 The proposal is to allow casting relations to scalar types in foreach.
 Example:
 A = load 'data' as (x, y, z);
 B = group A all;
 C = foreach B generate COUNT(A);
 .
 X = 
 Y = foreach X generate $1/(long) C;
 Couple of additional comments:
 (1) You can only cast relations including a single value or an error will be 
 reported
 (2) Name resolution is needed since relation X might have field named C in 
 which case that field takes precedence.
 (3) Y will look for C closest to it.
 Implementation thoughts:
 The idea is to store C into a file and then convert it into scalar via a UDF. 
 I believe we already have a UDF that Ben Reed contributed for this purpose. 
 Most of the work would be to update the logical plan to
 (1) Store C
 (2) convert the cast to the UDF

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1434) Allow casting relations to scalars

2010-08-03 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-1434:


Attachment: ScalarImplFinale1.patch

Removed the unused variable(findbug warning)

 Allow casting relations to scalars
 --

 Key: PIG-1434
 URL: https://issues.apache.org/jira/browse/PIG-1434
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: scalarImpl.patch, ScalarImpl1.patch, ScalarImpl5.patch, 
 ScalarImplFinale.patch, ScalarImplFinale1.patch


 This jira is to implement a simplified version of the functionality described 
 in https://issues.apache.org/jira/browse/PIG-801.
 The proposal is to allow casting relations to scalar types in foreach.
 Example:
 A = load 'data' as (x, y, z);
 B = group A all;
 C = foreach B generate COUNT(A);
 .
 X = 
 Y = foreach X generate $1/(long) C;
 Couple of additional comments:
 (1) You can only cast relations including a single value or an error will be 
 reported
 (2) Name resolution is needed since relation X might have field named C in 
 which case that field takes precedence.
 (3) Y will look for C closest to it.
 Implementation thoughts:
 The idea is to store C into a file and then convert it into scalar via a UDF. 
 I believe we already have a UDF that Ben Reed contributed for this purpose. 
 Most of the work would be to update the logical plan to
 (1) Store C
 (2) convert the cast to the UDF

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1434) Allow casting relations to scalars

2010-08-02 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-1434:


Attachment: ScalarImplFinale1.patch

Missing field in scalar file are handled by returning null. Empty scalar 
file/empty scalar directory tested.
ScalarPhyFinder is moved as local variable. Removed redundant comments and apis 
inside visitors.
Added a new testcase for multiquery.
Fixed findbugs, javac and javadoc warnings (needs findbugs exclusion since we 
throw an error when second line is found (not_null) in UDF).


 Allow casting relations to scalars
 --

 Key: PIG-1434
 URL: https://issues.apache.org/jira/browse/PIG-1434
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: scalarImpl.patch, ScalarImpl1.patch, ScalarImpl5.patch, 
 ScalarImplFinale.patch, ScalarImplFinale1.patch


 This jira is to implement a simplified version of the functionality described 
 in https://issues.apache.org/jira/browse/PIG-801.
 The proposal is to allow casting relations to scalar types in foreach.
 Example:
 A = load 'data' as (x, y, z);
 B = group A all;
 C = foreach B generate COUNT(A);
 .
 X = 
 Y = foreach X generate $1/(long) C;
 Couple of additional comments:
 (1) You can only cast relations including a single value or an error will be 
 reported
 (2) Name resolution is needed since relation X might have field named C in 
 which case that field takes precedence.
 (3) Y will look for C closest to it.
 Implementation thoughts:
 The idea is to store C into a file and then convert it into scalar via a UDF. 
 I believe we already have a UDF that Ben Reed contributed for this purpose. 
 Most of the work would be to update the logical plan to
 (1) Store C
 (2) convert the cast to the UDF

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1434) Allow casting relations to scalars

2010-08-02 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-1434:


Status: Patch Available  (was: Open)

 Allow casting relations to scalars
 --

 Key: PIG-1434
 URL: https://issues.apache.org/jira/browse/PIG-1434
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: scalarImpl.patch, ScalarImpl1.patch, ScalarImpl5.patch, 
 ScalarImplFinale.patch, ScalarImplFinale1.patch


 This jira is to implement a simplified version of the functionality described 
 in https://issues.apache.org/jira/browse/PIG-801.
 The proposal is to allow casting relations to scalar types in foreach.
 Example:
 A = load 'data' as (x, y, z);
 B = group A all;
 C = foreach B generate COUNT(A);
 .
 X = 
 Y = foreach X generate $1/(long) C;
 Couple of additional comments:
 (1) You can only cast relations including a single value or an error will be 
 reported
 (2) Name resolution is needed since relation X might have field named C in 
 which case that field takes precedence.
 (3) Y will look for C closest to it.
 Implementation thoughts:
 The idea is to store C into a file and then convert it into scalar via a UDF. 
 I believe we already have a UDF that Ben Reed contributed for this purpose. 
 Most of the work would be to update the logical plan to
 (1) Store C
 (2) convert the cast to the UDF

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1434) Allow casting relations to scalars

2010-08-02 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-1434:


Status: Open  (was: Patch Available)

 Allow casting relations to scalars
 --

 Key: PIG-1434
 URL: https://issues.apache.org/jira/browse/PIG-1434
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: scalarImpl.patch, ScalarImpl1.patch, ScalarImpl5.patch, 
 ScalarImplFinale.patch, ScalarImplFinale1.patch


 This jira is to implement a simplified version of the functionality described 
 in https://issues.apache.org/jira/browse/PIG-801.
 The proposal is to allow casting relations to scalar types in foreach.
 Example:
 A = load 'data' as (x, y, z);
 B = group A all;
 C = foreach B generate COUNT(A);
 .
 X = 
 Y = foreach X generate $1/(long) C;
 Couple of additional comments:
 (1) You can only cast relations including a single value or an error will be 
 reported
 (2) Name resolution is needed since relation X might have field named C in 
 which case that field takes precedence.
 (3) Y will look for C closest to it.
 Implementation thoughts:
 The idea is to store C into a file and then convert it into scalar via a UDF. 
 I believe we already have a UDF that Ben Reed contributed for this purpose. 
 Most of the work would be to update the logical plan to
 (1) Store C
 (2) convert the cast to the UDF

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1434) Allow casting relations to scalars

2010-07-29 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-1434:


Attachment: ScalarImplFinale.patch

 Allow casting relations to scalars
 --

 Key: PIG-1434
 URL: https://issues.apache.org/jira/browse/PIG-1434
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: scalarImpl.patch, ScalarImpl1.patch, ScalarImpl5.patch, 
 ScalarImplFinale.patch


 This jira is to implement a simplified version of the functionality described 
 in https://issues.apache.org/jira/browse/PIG-801.
 The proposal is to allow casting relations to scalar types in foreach.
 Example:
 A = load 'data' as (x, y, z);
 B = group A all;
 C = foreach B generate COUNT(A);
 .
 X = 
 Y = foreach X generate $1/(long) C;
 Couple of additional comments:
 (1) You can only cast relations including a single value or an error will be 
 reported
 (2) Name resolution is needed since relation X might have field named C in 
 which case that field takes precedence.
 (3) Y will look for C closest to it.
 Implementation thoughts:
 The idea is to store C into a file and then convert it into scalar via a UDF. 
 I believe we already have a UDF that Ben Reed contributed for this purpose. 
 Most of the work would be to update the logical plan to
 (1) Store C
 (2) convert the cast to the UDF

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1434) Allow casting relations to scalars

2010-07-29 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-1434:


Status: Open  (was: Patch Available)

 Allow casting relations to scalars
 --

 Key: PIG-1434
 URL: https://issues.apache.org/jira/browse/PIG-1434
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: scalarImpl.patch, ScalarImpl1.patch, ScalarImpl5.patch, 
 ScalarImplFinale.patch


 This jira is to implement a simplified version of the functionality described 
 in https://issues.apache.org/jira/browse/PIG-801.
 The proposal is to allow casting relations to scalar types in foreach.
 Example:
 A = load 'data' as (x, y, z);
 B = group A all;
 C = foreach B generate COUNT(A);
 .
 X = 
 Y = foreach X generate $1/(long) C;
 Couple of additional comments:
 (1) You can only cast relations including a single value or an error will be 
 reported
 (2) Name resolution is needed since relation X might have field named C in 
 which case that field takes precedence.
 (3) Y will look for C closest to it.
 Implementation thoughts:
 The idea is to store C into a file and then convert it into scalar via a UDF. 
 I believe we already have a UDF that Ben Reed contributed for this purpose. 
 Most of the work would be to update the logical plan to
 (1) Store C
 (2) convert the cast to the UDF

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1434) Allow casting relations to scalars

2010-07-29 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-1434:


Status: Patch Available  (was: Open)

 Allow casting relations to scalars
 --

 Key: PIG-1434
 URL: https://issues.apache.org/jira/browse/PIG-1434
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: scalarImpl.patch, ScalarImpl1.patch, ScalarImpl5.patch, 
 ScalarImplFinale.patch


 This jira is to implement a simplified version of the functionality described 
 in https://issues.apache.org/jira/browse/PIG-801.
 The proposal is to allow casting relations to scalar types in foreach.
 Example:
 A = load 'data' as (x, y, z);
 B = group A all;
 C = foreach B generate COUNT(A);
 .
 X = 
 Y = foreach X generate $1/(long) C;
 Couple of additional comments:
 (1) You can only cast relations including a single value or an error will be 
 reported
 (2) Name resolution is needed since relation X might have field named C in 
 which case that field takes precedence.
 (3) Y will look for C closest to it.
 Implementation thoughts:
 The idea is to store C into a file and then convert it into scalar via a UDF. 
 I believe we already have a UDF that Ben Reed contributed for this purpose. 
 Most of the work would be to update the logical plan to
 (1) Store C
 (2) convert the cast to the UDF

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1517) Pig needs to support keywords in the package name

2010-07-27 Thread Aniket Mokashi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12892863#action_12892863
 ] 

Aniket Mokashi commented on PIG-1517:
-

This bug is an extension of https://issues.apache.org/jira/browse/PIG-656, does 
not need extra test cases.
Other tests pass manually.

 Pig needs to support keywords in the package name
 -

 Key: PIG-1517
 URL: https://issues.apache.org/jira/browse/PIG-1517
 Project: Pig
  Issue Type: Bug
  Components: grunt
Reporter: Aniket Mokashi
Assignee: Aniket Mokashi
Priority: Minor
 Fix For: 0.8.0

 Attachments: pigusergroup656.patch


 Pig needs to support keywords in the package name. Pig supports most of the 
 keywords as this was fixed in https://issues.apache.org/jira/browse/PIG-656. 
 There are a few missing tokens like eq,gt,lt,gte,lte,neq that 
 need to be supported.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (PIG-1517) Pig needs to support keywords in the package name

2010-07-26 Thread Aniket Mokashi (JIRA)
Pig needs to support keywords in the package name
-

 Key: PIG-1517
 URL: https://issues.apache.org/jira/browse/PIG-1517
 Project: Pig
  Issue Type: Bug
  Components: grunt
Reporter: Aniket Mokashi
Assignee: Aniket Mokashi
Priority: Minor
 Fix For: 0.8.0


Pig needs to support keywords in the package name. Pig supports most of the 
keywords as this was fixed in https://issues.apache.org/jira/browse/PIG-656. 
There are a few missing tokens like eq,gt,lt,gte,lte,neq that need 
to be supported.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-656) Use of eval or any other keyword in the package hierarchy of a UDF causes parse exception

2010-07-26 Thread Aniket Mokashi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12892353#action_12892353
 ] 

Aniket Mokashi commented on PIG-656:


eq,gt,lt,gte,lte,neq were missed as part of this fix. Opened jira 
at https://issues.apache.org/jira/browse/PIG-1517 to track further changes.

 Use of eval or any other keyword in the package hierarchy of a UDF causes 
 parse exception
 -

 Key: PIG-656
 URL: https://issues.apache.org/jira/browse/PIG-656
 Project: Pig
  Issue Type: Bug
  Components: documentation, grunt
Affects Versions: 0.3.0
Reporter: Viraj Bhat
Assignee: Milind Bhandarkar
 Fix For: 0.3.0

 Attachments: mywordcount.txt, pigusergroup656.patch, reserved.patch, 
 TOKENIZE.jar


 Consider a Pig script which does something similar to a word count. It uses 
 the built-in TOKENIZE function, but packages it inside a class hierarchy such 
 as mypackage.eval
 {code}
 register TOKENIZE.jar
 my_src  = LOAD '/user/viraj/mywordcount.txt' USING PigStorage('\t')  AS 
 (mlist: chararray);
 modules = FOREACH my_src GENERATE FLATTEN(mypackage.eval.TOKENIZE(mlist));
 describe modules;
 grouped = GROUP modules BY $0;
 describe grouped;
 counts  = FOREACH grouped GENERATE COUNT(modules), group;
 ordered = ORDER counts BY $0;
 dump ordered;
 {code}
 The parser complains:
 ===
 2009-02-05 01:17:29,231 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
 1000: Error during parsing. Invalid alias: mypackage in {mlist: chararray}
 ===
 I looked at the following source code at 
 (src/org/apache/pig/impl/logicalLayer/parser/QueryParser.jjt) and it seems 
 that : EVAL is a keyword in Pig. Here are some clarifications:
 1) Is there documentation on what the EVAL keyword actually is?
 2) Is EVAL keyword actually implemented?
 Viraj

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1517) Pig needs to support keywords in the package name

2010-07-26 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-1517:


Attachment: pigusergroup656.patch

 Pig needs to support keywords in the package name
 -

 Key: PIG-1517
 URL: https://issues.apache.org/jira/browse/PIG-1517
 Project: Pig
  Issue Type: Bug
  Components: grunt
Reporter: Aniket Mokashi
Assignee: Aniket Mokashi
Priority: Minor
 Fix For: 0.8.0

 Attachments: pigusergroup656.patch


 Pig needs to support keywords in the package name. Pig supports most of the 
 keywords as this was fixed in https://issues.apache.org/jira/browse/PIG-656. 
 There are a few missing tokens like eq,gt,lt,gte,lte,neq that 
 need to be supported.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-656) Use of eval or any other keyword in the package hierarchy of a UDF causes parse exception

2010-07-23 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-656:
---

Attachment: pigusergroup656.patch

 Use of eval or any other keyword in the package hierarchy of a UDF causes 
 parse exception
 -

 Key: PIG-656
 URL: https://issues.apache.org/jira/browse/PIG-656
 Project: Pig
  Issue Type: Bug
  Components: documentation, grunt
Affects Versions: 0.3.0
Reporter: Viraj Bhat
Assignee: Milind Bhandarkar
 Fix For: 0.3.0

 Attachments: mywordcount.txt, pigusergroup656.patch, reserved.patch, 
 TOKENIZE.jar


 Consider a Pig script which does something similar to a word count. It uses 
 the built-in TOKENIZE function, but packages it inside a class hierarchy such 
 as mypackage.eval
 {code}
 register TOKENIZE.jar
 my_src  = LOAD '/user/viraj/mywordcount.txt' USING PigStorage('\t')  AS 
 (mlist: chararray);
 modules = FOREACH my_src GENERATE FLATTEN(mypackage.eval.TOKENIZE(mlist));
 describe modules;
 grouped = GROUP modules BY $0;
 describe grouped;
 counts  = FOREACH grouped GENERATE COUNT(modules), group;
 ordered = ORDER counts BY $0;
 dump ordered;
 {code}
 The parser complains:
 ===
 2009-02-05 01:17:29,231 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
 1000: Error during parsing. Invalid alias: mypackage in {mlist: chararray}
 ===
 I looked at the following source code at 
 (src/org/apache/pig/impl/logicalLayer/parser/QueryParser.jjt) and it seems 
 that : EVAL is a keyword in Pig. Here are some clarifications:
 1) Is there documentation on what the EVAL keyword actually is?
 2) Is EVAL keyword actually implemented?
 Viraj

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-928) UDFs in scripting languages

2010-07-22 Thread Aniket Mokashi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12891357#action_12891357
 ] 

Aniket Mokashi commented on PIG-928:


bq. I am still not convinced about the changes required in POUserFunc. That 
logic should really be a part of pythonToPig(pyObject). If python UDF is 
returning byte[], it should be turned into DataByteArray before it gets back 
into Pig's pipeline. And if we do that conversion in pythonToPig() (which is a 
right place to do it) we will need no changes in POUserFunc.
I agree that it is better to move computation on JythonFunction side 
(JythonUtils) for type checking and should provide more type safety to avoid 
user defined types complexity. But I would still go for changes in POUserFunc 
for result.result for the case defined in above example (removing byte[] 
scenario).
bq. Instead of instanceof, doing class equality test will be a wee-bit faster. 
Like instead of (pyObject instanceof PyDictionary) do pyobject.getClass() == 
PyDictionary.class. Obviously, it will work when you know exact target class 
and not for the derived ones.
Jython code has derived classes for each of the basic Jython types, though they 
aren't used for most of the types as of now, they may start returning these 
derived objects (PyTupleDerived) in their future implementation, in which case 
we might break our code. Also, PyLongDerived are already used inside the code. 
__tojava__ function just returns the proxy java object until we ask for a 
specific type of object. I think its better to use instanceof instead of class 
equality here.
bq. For register command, we need to test not only for functionality but for 
regressions as well. Look at TestGrunt.java in test package to get an idea how 
to write test for it.
Code path for .jar registration is identical to old code, except that it doesnt 
use any engine or namespace.
bq. Also what will happen if user returned a nil python object (null equivalent 
of Java) from UDF. It looks to me that will result in NPE. Can you add a test 
for that and similar test case from pigToPython()
A java null object will be turned into PyNone object but __tojava__ function 
will always returns the special object Py.NoConversion  if this PyObject can 
not be converted to the desired Java class.

 UDFs in scripting languages
 ---

 Key: PIG-928
 URL: https://issues.apache.org/jira/browse/PIG-928
 Project: Pig
  Issue Type: New Feature
Reporter: Alan Gates
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: calltrace.png, package.zip, PIG-928.patch, 
 pig-greek.tgz, pig.scripting.patch.arnab, pyg.tgz, RegisterPythonUDF3.patch, 
 RegisterPythonUDF4.patch, RegisterPythonUDF_Final.patch, 
 RegisterPythonUDFFinale.patch, RegisterPythonUDFFinale3.patch, 
 RegisterPythonUDFFinale4.patch, RegisterPythonUDFFinale5.patch, 
 RegisterPythonUDFLatest.patch, RegisterScriptUDFDefineParse.patch, 
 scripting.tgz, scripting.tgz, test.zip


 It should be possible to write UDFs in scripting languages such as python, 
 ruby, etc.  This frees users from needing to compile Java, generate a jar, 
 etc.  It also opens Pig to programmers who prefer scripting languages over 
 Java.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-928) UDFs in scripting languages

2010-07-22 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-928:
---

Attachment: RegisterPythonUDFLatest2.patch

Added test for map-udf, null-inputoutput and grunt
Made required changes as per suggestions.

 UDFs in scripting languages
 ---

 Key: PIG-928
 URL: https://issues.apache.org/jira/browse/PIG-928
 Project: Pig
  Issue Type: New Feature
Reporter: Alan Gates
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: calltrace.png, package.zip, PIG-928.patch, 
 pig-greek.tgz, pig.scripting.patch.arnab, pyg.tgz, RegisterPythonUDF3.patch, 
 RegisterPythonUDF4.patch, RegisterPythonUDF_Final.patch, 
 RegisterPythonUDFFinale.patch, RegisterPythonUDFFinale3.patch, 
 RegisterPythonUDFFinale4.patch, RegisterPythonUDFFinale5.patch, 
 RegisterPythonUDFLatest.patch, RegisterPythonUDFLatest2.patch, 
 RegisterScriptUDFDefineParse.patch, scripting.tgz, scripting.tgz, test.zip


 It should be possible to write UDFs in scripting languages such as python, 
 ruby, etc.  This frees users from needing to compile Java, generate a jar, 
 etc.  It also opens Pig to programmers who prefer scripting languages over 
 Java.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-928) UDFs in scripting languages

2010-07-22 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-928:
---

Status: Open  (was: Patch Available)

 UDFs in scripting languages
 ---

 Key: PIG-928
 URL: https://issues.apache.org/jira/browse/PIG-928
 Project: Pig
  Issue Type: New Feature
Reporter: Alan Gates
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: calltrace.png, package.zip, PIG-928.patch, 
 pig-greek.tgz, pig.scripting.patch.arnab, pyg.tgz, RegisterPythonUDF3.patch, 
 RegisterPythonUDF4.patch, RegisterPythonUDF_Final.patch, 
 RegisterPythonUDFFinale.patch, RegisterPythonUDFFinale3.patch, 
 RegisterPythonUDFFinale4.patch, RegisterPythonUDFFinale5.patch, 
 RegisterPythonUDFLatest.patch, RegisterPythonUDFLatest2.patch, 
 RegisterScriptUDFDefineParse.patch, scripting.tgz, scripting.tgz, test.zip


 It should be possible to write UDFs in scripting languages such as python, 
 ruby, etc.  This frees users from needing to compile Java, generate a jar, 
 etc.  It also opens Pig to programmers who prefer scripting languages over 
 Java.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-928) UDFs in scripting languages

2010-07-22 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-928:
---

Status: Patch Available  (was: Open)

 UDFs in scripting languages
 ---

 Key: PIG-928
 URL: https://issues.apache.org/jira/browse/PIG-928
 Project: Pig
  Issue Type: New Feature
Reporter: Alan Gates
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: calltrace.png, package.zip, PIG-928.patch, 
 pig-greek.tgz, pig.scripting.patch.arnab, pyg.tgz, RegisterPythonUDF3.patch, 
 RegisterPythonUDF4.patch, RegisterPythonUDF_Final.patch, 
 RegisterPythonUDFFinale.patch, RegisterPythonUDFFinale3.patch, 
 RegisterPythonUDFFinale4.patch, RegisterPythonUDFFinale5.patch, 
 RegisterPythonUDFLatest.patch, RegisterPythonUDFLatest2.patch, 
 RegisterScriptUDFDefineParse.patch, scripting.tgz, scripting.tgz, test.zip


 It should be possible to write UDFs in scripting languages such as python, 
 ruby, etc.  This frees users from needing to compile Java, generate a jar, 
 etc.  It also opens Pig to programmers who prefer scripting languages over 
 Java.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1434) Allow casting relations to scalars

2010-07-20 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-1434:


Attachment: (was: ScalarImpl1.patch)

 Allow casting relations to scalars
 --

 Key: PIG-1434
 URL: https://issues.apache.org/jira/browse/PIG-1434
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: scalarImpl.patch, ScalarImpl1.patch


 This jira is to implement a simplified version of the functionality described 
 in https://issues.apache.org/jira/browse/PIG-801.
 The proposal is to allow casting relations to scalar types in foreach.
 Example:
 A = load 'data' as (x, y, z);
 B = group A all;
 C = foreach B generate COUNT(A);
 .
 X = 
 Y = foreach X generate $1/(long) C;
 Couple of additional comments:
 (1) You can only cast relations including a single value or an error will be 
 reported
 (2) Name resolution is needed since relation X might have field named C in 
 which case that field takes precedence.
 (3) Y will look for C closest to it.
 Implementation thoughts:
 The idea is to store C into a file and then convert it into scalar via a UDF. 
 I believe we already have a UDF that Ben Reed contributed for this purpose. 
 Most of the work would be to update the logical plan to
 (1) Store C
 (2) convert the cast to the UDF

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1434) Allow casting relations to scalars

2010-07-20 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-1434:


Attachment: ScalarImpl1.patch

 Allow casting relations to scalars
 --

 Key: PIG-1434
 URL: https://issues.apache.org/jira/browse/PIG-1434
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: scalarImpl.patch, ScalarImpl1.patch


 This jira is to implement a simplified version of the functionality described 
 in https://issues.apache.org/jira/browse/PIG-801.
 The proposal is to allow casting relations to scalar types in foreach.
 Example:
 A = load 'data' as (x, y, z);
 B = group A all;
 C = foreach B generate COUNT(A);
 .
 X = 
 Y = foreach X generate $1/(long) C;
 Couple of additional comments:
 (1) You can only cast relations including a single value or an error will be 
 reported
 (2) Name resolution is needed since relation X might have field named C in 
 which case that field takes precedence.
 (3) Y will look for C closest to it.
 Implementation thoughts:
 The idea is to store C into a file and then convert it into scalar via a UDF. 
 I believe we already have a UDF that Ben Reed contributed for this purpose. 
 Most of the work would be to update the logical plan to
 (1) Store C
 (2) convert the cast to the UDF

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1434) Allow casting relations to scalars

2010-07-20 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-1434:


Attachment: ScalarImpl1.patch

 Allow casting relations to scalars
 --

 Key: PIG-1434
 URL: https://issues.apache.org/jira/browse/PIG-1434
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: scalarImpl.patch, ScalarImpl1.patch


 This jira is to implement a simplified version of the functionality described 
 in https://issues.apache.org/jira/browse/PIG-801.
 The proposal is to allow casting relations to scalar types in foreach.
 Example:
 A = load 'data' as (x, y, z);
 B = group A all;
 C = foreach B generate COUNT(A);
 .
 X = 
 Y = foreach X generate $1/(long) C;
 Couple of additional comments:
 (1) You can only cast relations including a single value or an error will be 
 reported
 (2) Name resolution is needed since relation X might have field named C in 
 which case that field takes precedence.
 (3) Y will look for C closest to it.
 Implementation thoughts:
 The idea is to store C into a file and then convert it into scalar via a UDF. 
 I believe we already have a UDF that Ben Reed contributed for this purpose. 
 Most of the work would be to update the logical plan to
 (1) Store C
 (2) convert the cast to the UDF

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1434) Allow casting relations to scalars

2010-07-20 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-1434:


Attachment: (was: ScalarImpl1.patch)

 Allow casting relations to scalars
 --

 Key: PIG-1434
 URL: https://issues.apache.org/jira/browse/PIG-1434
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: scalarImpl.patch


 This jira is to implement a simplified version of the functionality described 
 in https://issues.apache.org/jira/browse/PIG-801.
 The proposal is to allow casting relations to scalar types in foreach.
 Example:
 A = load 'data' as (x, y, z);
 B = group A all;
 C = foreach B generate COUNT(A);
 .
 X = 
 Y = foreach X generate $1/(long) C;
 Couple of additional comments:
 (1) You can only cast relations including a single value or an error will be 
 reported
 (2) Name resolution is needed since relation X might have field named C in 
 which case that field takes precedence.
 (3) Y will look for C closest to it.
 Implementation thoughts:
 The idea is to store C into a file and then convert it into scalar via a UDF. 
 I believe we already have a UDF that Ben Reed contributed for this purpose. 
 Most of the work would be to update the logical plan to
 (1) Store C
 (2) convert the cast to the UDF

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1434) Allow casting relations to scalars

2010-07-19 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-1434:


Attachment: ScalarImpl1.patch

LOScalar keeps track of the scalars in the logical plan along with the 
reference to the scalar alias.
During compilation, we add LOStores to respective scalars, we also merge plans 
as needed.
POScalar is later replaced by POUserFunc and appropriate dependency is added 
between the MROpers.
Tested with store, dump, explain.

 Allow casting relations to scalars
 --

 Key: PIG-1434
 URL: https://issues.apache.org/jira/browse/PIG-1434
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: scalarImpl.patch, ScalarImpl1.patch


 This jira is to implement a simplified version of the functionality described 
 in https://issues.apache.org/jira/browse/PIG-801.
 The proposal is to allow casting relations to scalar types in foreach.
 Example:
 A = load 'data' as (x, y, z);
 B = group A all;
 C = foreach B generate COUNT(A);
 .
 X = 
 Y = foreach X generate $1/(long) C;
 Couple of additional comments:
 (1) You can only cast relations including a single value or an error will be 
 reported
 (2) Name resolution is needed since relation X might have field named C in 
 which case that field takes precedence.
 (3) Y will look for C closest to it.
 Implementation thoughts:
 The idea is to store C into a file and then convert it into scalar via a UDF. 
 I believe we already have a UDF that Ben Reed contributed for this purpose. 
 Most of the work would be to update the logical plan to
 (1) Store C
 (2) convert the cast to the UDF

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-928) UDFs in scripting languages

2010-07-16 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-928:
---

Attachment: RegisterPythonUDFLatest.patch

Added new test cases to test tuple and bag scenarios- moved to a new test file.
Fixed the exception handling.
Added detailed comments.

 UDFs in scripting languages
 ---

 Key: PIG-928
 URL: https://issues.apache.org/jira/browse/PIG-928
 Project: Pig
  Issue Type: New Feature
Reporter: Alan Gates
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: calltrace.png, package.zip, PIG-928.patch, 
 pig-greek.tgz, pig.scripting.patch.arnab, pyg.tgz, RegisterPythonUDF3.patch, 
 RegisterPythonUDF4.patch, RegisterPythonUDF_Final.patch, 
 RegisterPythonUDFFinale.patch, RegisterPythonUDFFinale3.patch, 
 RegisterPythonUDFFinale4.patch, RegisterPythonUDFFinale5.patch, 
 RegisterPythonUDFLatest.patch, RegisterScriptUDFDefineParse.patch, 
 scripting.tgz, scripting.tgz, test.zip


 It should be possible to write UDFs in scripting languages such as python, 
 ruby, etc.  This frees users from needing to compile Java, generate a jar, 
 etc.  It also opens Pig to programmers who prefer scripting languages over 
 Java.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-928) UDFs in scripting languages

2010-07-15 Thread Aniket Mokashi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888979#action_12888979
 ] 

Aniket Mokashi commented on PIG-928:


Commenting on behavior of EvalFuncObject, we consider following UDF-
{code}
public class UDF1 extends EvalFuncObject {
class Student{
int age;
String name;
Student(int a, String nm) {
age = a;
name = nm;
}
}
@Override
public Object exec(Tuple input) throws IOException {
return new Student(12, (String)input.get(0));
}
@Override
public Schema outputSchema(Schema input) {
return new Schema(new Schema.FieldSchema(null, DataType.BYTEARRAY));
}
}
{code}
Although, this one define its output schema as ByteArray we fail this one as we 
do not know how to deserialize Student. Clearly, this is due to the bug in 
POUserFunc which fails to convert to ByteArray. Hence, res.result != null 
should be changed to result.result !=null.

 UDFs in scripting languages
 ---

 Key: PIG-928
 URL: https://issues.apache.org/jira/browse/PIG-928
 Project: Pig
  Issue Type: New Feature
Reporter: Alan Gates
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: calltrace.png, package.zip, PIG-928.patch, 
 pig-greek.tgz, pig.scripting.patch.arnab, pyg.tgz, RegisterPythonUDF3.patch, 
 RegisterPythonUDF4.patch, RegisterPythonUDF_Final.patch, 
 RegisterPythonUDFFinale.patch, RegisterPythonUDFFinale3.patch, 
 RegisterPythonUDFFinale4.patch, RegisterPythonUDFFinale5.patch, 
 RegisterScriptUDFDefineParse.patch, scripting.tgz, scripting.tgz, test.zip


 It should be possible to write UDFs in scripting languages such as python, 
 ruby, etc.  This frees users from needing to compile Java, generate a jar, 
 etc.  It also opens Pig to programmers who prefer scripting languages over 
 Java.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-928) UDFs in scripting languages

2010-07-14 Thread Aniket Mokashi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888232#action_12888232
 ] 

Aniket Mokashi commented on PIG-928:


Thanks for your comments. I will make the required changes.

bq. Do you want to allow: register myJavaUDFs.jar using 'java' as 
'javaNameSpace' ? Use-case could be that if we are allowing namespaces for 
non-java, why not allow for Java udfs as well. But then define is exactly for 
this purpose. So, it may make sense to throw exception for such a case.
myJavaUDFs.jar can itself have package structure that can define its own 
namespace, for example- maths.jar has function math.sin etc, I will throw 
parseexception for such a case
bq. ScriptEngine.getInstance() should be a singleton, no?
getInstance is a factory method that returns an instance of scriptEngine based 
on its type. We create a newInstance of the scriptEngine so that if 
registerCode is called simultaneously, we can create a different interpreter 
for both the invocations to register these scripts to pig.
bq. In JythonScriptEngine.getFunction() I think you should check if 
interpreter.get(functionName) != null and then return it and call 
Interpreter.init(path) only if its null.
This behavior is consistent with interpreter.get method that returns null if 
some resource is not found inside the script. Callers of this function handle 
runtimeexceptions. Also, we will fail much earlier if we try to access 
functions that are not already present/registered so it should be safe.
Also, interpreter is never null because its a static member of the 
JythonScriptEngine, instantiated statically.
bq. I didn't get why the changes are required in POUserFunc. Can you explain 
and also add it as comments in the code.
POUserFunc has possible bug to check res.result != null when it is always null 
at this point. If the returntype expected is bytearray, we cast return object 
to byte[] with toString().getBytes() (which was never hit due to the bug 
mentioned above), but when return type is byte[] we need special handling (this 
is not case for other evalfuncs as they generally return pigtypes).
bq. Instead of adding query through pigServer.registerCode() api, add it 
through pigServer.registerQuery(register myscript.py using jython). This will 
make sure we are testing changes in QueryParser.jjt as well.
register is Grunt command parsed by gruntparser hence doesnt go through 
queryparser. We directly call registerCode from GruntParser. Also, parsing 
logic is trivial.



 UDFs in scripting languages
 ---

 Key: PIG-928
 URL: https://issues.apache.org/jira/browse/PIG-928
 Project: Pig
  Issue Type: New Feature
Reporter: Alan Gates
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: calltrace.png, package.zip, PIG-928.patch, 
 pig-greek.tgz, pig.scripting.patch.arnab, pyg.tgz, RegisterPythonUDF3.patch, 
 RegisterPythonUDF4.patch, RegisterPythonUDF_Final.patch, 
 RegisterPythonUDFFinale.patch, RegisterPythonUDFFinale3.patch, 
 RegisterPythonUDFFinale4.patch, RegisterPythonUDFFinale5.patch, 
 RegisterScriptUDFDefineParse.patch, scripting.tgz, scripting.tgz, test.zip


 It should be possible to write UDFs in scripting languages such as python, 
 ruby, etc.  This frees users from needing to compile Java, generate a jar, 
 etc.  It also opens Pig to programmers who prefer scripting languages over 
 Java.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1434) Allow casting relations to scalars

2010-07-13 Thread Aniket Mokashi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888073#action_12888073
 ] 

Aniket Mokashi commented on PIG-1434:
-

I mean the cases where users type in some alias which are currently not 
possible to include inside a foreach statement and accidentally getting them 
treated as scalar by pig and then failing scripts at runtime (and may not fail 
in one-liner sample cases).
For example-
Y = foreach Z generate C.$0; where C is not a scalar. Currently, this would 
throw an error upfront, for erroneous usage (logical (do not know restrictions 
on foreach statement) or typing mistake) of C, But, after we add support of 
scalars, pig may conclude C to be used as a scalar and generate the plans 
accordingly.
By introducing square bracketed syntax we can make sure that user intended to 
use C as a scalar and it wasn't introduced by mistake.A cast would also work 
for this, but as we have introduced scalar projections (C.count, C.max etc), we 
already have cases wherein user may mean to cast fields(count, max) rather than 
scalars themselves.

 Allow casting relations to scalars
 --

 Key: PIG-1434
 URL: https://issues.apache.org/jira/browse/PIG-1434
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: scalarImpl.patch


 This jira is to implement a simplified version of the functionality described 
 in https://issues.apache.org/jira/browse/PIG-801.
 The proposal is to allow casting relations to scalar types in foreach.
 Example:
 A = load 'data' as (x, y, z);
 B = group A all;
 C = foreach B generate COUNT(A);
 .
 X = 
 Y = foreach X generate $1/(long) C;
 Couple of additional comments:
 (1) You can only cast relations including a single value or an error will be 
 reported
 (2) Name resolution is needed since relation X might have field named C in 
 which case that field takes precedence.
 (3) Y will look for C closest to it.
 Implementation thoughts:
 The idea is to store C into a file and then convert it into scalar via a UDF. 
 I believe we already have a UDF that Ben Reed contributed for this purpose. 
 Most of the work would be to update the logical plan to
 (1) Store C
 (2) convert the cast to the UDF

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-928) UDFs in scripting languages

2010-07-12 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-928:
---

Status: Patch Available  (was: Open)

 UDFs in scripting languages
 ---

 Key: PIG-928
 URL: https://issues.apache.org/jira/browse/PIG-928
 Project: Pig
  Issue Type: New Feature
Reporter: Alan Gates
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: calltrace.png, package.zip, PIG-928.patch, 
 pig-greek.tgz, pig.scripting.patch.arnab, pyg.tgz, RegisterPythonUDF3.patch, 
 RegisterPythonUDF4.patch, RegisterPythonUDF_Final.patch, 
 RegisterPythonUDFFinale.patch, RegisterPythonUDFFinale3.patch, 
 RegisterPythonUDFFinale4.patch, RegisterPythonUDFFinale5.patch, 
 RegisterScriptUDFDefineParse.patch, scripting.tgz, scripting.tgz, test.zip


 It should be possible to write UDFs in scripting languages such as python, 
 ruby, etc.  This frees users from needing to compile Java, generate a jar, 
 etc.  It also opens Pig to programmers who prefer scripting languages over 
 Java.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-928) UDFs in scripting languages

2010-07-12 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-928:
---

Attachment: (was: RegisterPythonUDF2.patch)

 UDFs in scripting languages
 ---

 Key: PIG-928
 URL: https://issues.apache.org/jira/browse/PIG-928
 Project: Pig
  Issue Type: New Feature
Reporter: Alan Gates
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: calltrace.png, package.zip, PIG-928.patch, 
 pig-greek.tgz, pig.scripting.patch.arnab, pyg.tgz, RegisterPythonUDF3.patch, 
 RegisterPythonUDF4.patch, RegisterPythonUDF_Final.patch, 
 RegisterPythonUDFFinale.patch, RegisterPythonUDFFinale3.patch, 
 RegisterPythonUDFFinale4.patch, RegisterPythonUDFFinale5.patch, 
 RegisterScriptUDFDefineParse.patch, scripting.tgz, scripting.tgz, test.zip


 It should be possible to write UDFs in scripting languages such as python, 
 ruby, etc.  This frees users from needing to compile Java, generate a jar, 
 etc.  It also opens Pig to programmers who prefer scripting languages over 
 Java.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-928) UDFs in scripting languages

2010-07-12 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-928:
---

Status: Open  (was: Patch Available)

 UDFs in scripting languages
 ---

 Key: PIG-928
 URL: https://issues.apache.org/jira/browse/PIG-928
 Project: Pig
  Issue Type: New Feature
Reporter: Alan Gates
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: calltrace.png, package.zip, PIG-928.patch, 
 pig-greek.tgz, pig.scripting.patch.arnab, pyg.tgz, RegisterPythonUDF3.patch, 
 RegisterPythonUDF4.patch, RegisterPythonUDF_Final.patch, 
 RegisterPythonUDFFinale.patch, RegisterPythonUDFFinale3.patch, 
 RegisterPythonUDFFinale4.patch, RegisterPythonUDFFinale5.patch, 
 RegisterScriptUDFDefineParse.patch, scripting.tgz, scripting.tgz, test.zip


 It should be possible to write UDFs in scripting languages such as python, 
 ruby, etc.  This frees users from needing to compile Java, generate a jar, 
 etc.  It also opens Pig to programmers who prefer scripting languages over 
 Java.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-928) UDFs in scripting languages

2010-07-09 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-928:
---

Attachment: RegisterPythonUDFFinale4.patch

Fixed @@@ related stuff...
Parsing of schema from decorators is postponed until the constructor.
Fixed some test related changes. 

 UDFs in scripting languages
 ---

 Key: PIG-928
 URL: https://issues.apache.org/jira/browse/PIG-928
 Project: Pig
  Issue Type: New Feature
Reporter: Alan Gates
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: calltrace.png, package.zip, PIG-928.patch, 
 pig-greek.tgz, pig.scripting.patch.arnab, pyg.tgz, RegisterPythonUDF2.patch, 
 RegisterPythonUDF3.patch, RegisterPythonUDF4.patch, 
 RegisterPythonUDF_Final.patch, RegisterPythonUDFFinale.patch, 
 RegisterPythonUDFFinale3.patch, RegisterPythonUDFFinale4.patch, 
 RegisterScriptUDFDefineParse.patch, scripting.tgz, scripting.tgz, test.zip


 It should be possible to write UDFs in scripting languages such as python, 
 ruby, etc.  This frees users from needing to compile Java, generate a jar, 
 etc.  It also opens Pig to programmers who prefer scripting languages over 
 Java.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-928) UDFs in scripting languages

2010-07-09 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-928:
---

Status: Patch Available  (was: Open)

 UDFs in scripting languages
 ---

 Key: PIG-928
 URL: https://issues.apache.org/jira/browse/PIG-928
 Project: Pig
  Issue Type: New Feature
Reporter: Alan Gates
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: calltrace.png, package.zip, PIG-928.patch, 
 pig-greek.tgz, pig.scripting.patch.arnab, pyg.tgz, RegisterPythonUDF2.patch, 
 RegisterPythonUDF3.patch, RegisterPythonUDF4.patch, 
 RegisterPythonUDF_Final.patch, RegisterPythonUDFFinale.patch, 
 RegisterPythonUDFFinale3.patch, RegisterPythonUDFFinale4.patch, 
 RegisterScriptUDFDefineParse.patch, scripting.tgz, scripting.tgz, test.zip


 It should be possible to write UDFs in scripting languages such as python, 
 ruby, etc.  This frees users from needing to compile Java, generate a jar, 
 etc.  It also opens Pig to programmers who prefer scripting languages over 
 Java.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1434) Allow casting relations to scalars

2010-07-09 Thread Aniket Mokashi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12886914#action_12886914
 ] 

Aniket Mokashi commented on PIG-1434:
-

Adding this support makes pig code complicated/hacky, because we conclude any 
not parsed alias (AliasFieldOrSpec) as scalar and try to resolve it as scalar 
at runtime.

To simplify, square bracketed syntax is a better idea, for example- 
{code}
Y = foreach Z generate X::$1/(long) [C].count, X::$2-(long) [C].max;
{code}
Otherwise, such queries (if typed by mistakes) can result into non-intuitive 
errors for users.

 Allow casting relations to scalars
 --

 Key: PIG-1434
 URL: https://issues.apache.org/jira/browse/PIG-1434
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: scalarImpl.patch


 This jira is to implement a simplified version of the functionality described 
 in https://issues.apache.org/jira/browse/PIG-801.
 The proposal is to allow casting relations to scalar types in foreach.
 Example:
 A = load 'data' as (x, y, z);
 B = group A all;
 C = foreach B generate COUNT(A);
 .
 X = 
 Y = foreach X generate $1/(long) C;
 Couple of additional comments:
 (1) You can only cast relations including a single value or an error will be 
 reported
 (2) Name resolution is needed since relation X might have field named C in 
 which case that field takes precedence.
 (3) Y will look for C closest to it.
 Implementation thoughts:
 The idea is to store C into a file and then convert it into scalar via a UDF. 
 I believe we already have a UDF that Ben Reed contributed for this purpose. 
 Most of the work would be to update the logical plan to
 (1) Store C
 (2) convert the cast to the UDF

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-928) UDFs in scripting languages

2010-07-09 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-928:
---

Attachment: RegisterPythonUDFFinale5.patch

 UDFs in scripting languages
 ---

 Key: PIG-928
 URL: https://issues.apache.org/jira/browse/PIG-928
 Project: Pig
  Issue Type: New Feature
Reporter: Alan Gates
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: calltrace.png, package.zip, PIG-928.patch, 
 pig-greek.tgz, pig.scripting.patch.arnab, pyg.tgz, RegisterPythonUDF2.patch, 
 RegisterPythonUDF3.patch, RegisterPythonUDF4.patch, 
 RegisterPythonUDF_Final.patch, RegisterPythonUDFFinale.patch, 
 RegisterPythonUDFFinale3.patch, RegisterPythonUDFFinale4.patch, 
 RegisterPythonUDFFinale5.patch, RegisterScriptUDFDefineParse.patch, 
 scripting.tgz, scripting.tgz, test.zip


 It should be possible to write UDFs in scripting languages such as python, 
 ruby, etc.  This frees users from needing to compile Java, generate a jar, 
 etc.  It also opens Pig to programmers who prefer scripting languages over 
 Java.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-928) UDFs in scripting languages

2010-07-08 Thread Aniket Mokashi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12886530#action_12886530
 ] 

Aniket Mokashi commented on PIG-928:


I have uploaded a wiki page to mention the usage and syntax-- 
http://wiki.apache.org/pig/UDFsUsingScriptingLanguages.

 UDFs in scripting languages
 ---

 Key: PIG-928
 URL: https://issues.apache.org/jira/browse/PIG-928
 Project: Pig
  Issue Type: New Feature
Reporter: Alan Gates
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: calltrace.png, package.zip, PIG-928.patch, 
 pig-greek.tgz, pig.scripting.patch.arnab, pyg.tgz, RegisterPythonUDF2.patch, 
 RegisterPythonUDF3.patch, RegisterPythonUDF4.patch, 
 RegisterPythonUDF_Final.patch, RegisterPythonUDFFinale.patch, 
 RegisterPythonUDFFinale3.patch, RegisterScriptUDFDefineParse.patch, 
 scripting.tgz, scripting.tgz, test.zip


 It should be possible to write UDFs in scripting languages such as python, 
 ruby, etc.  This frees users from needing to compile Java, generate a jar, 
 etc.  It also opens Pig to programmers who prefer scripting languages over 
 Java.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-928) UDFs in scripting languages

2010-07-08 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-928:
---

Status: Open  (was: Patch Available)

 UDFs in scripting languages
 ---

 Key: PIG-928
 URL: https://issues.apache.org/jira/browse/PIG-928
 Project: Pig
  Issue Type: New Feature
Reporter: Alan Gates
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: calltrace.png, package.zip, PIG-928.patch, 
 pig-greek.tgz, pig.scripting.patch.arnab, pyg.tgz, RegisterPythonUDF2.patch, 
 RegisterPythonUDF3.patch, RegisterPythonUDF4.patch, 
 RegisterPythonUDF_Final.patch, RegisterPythonUDFFinale.patch, 
 RegisterPythonUDFFinale3.patch, RegisterScriptUDFDefineParse.patch, 
 scripting.tgz, scripting.tgz, test.zip


 It should be possible to write UDFs in scripting languages such as python, 
 ruby, etc.  This frees users from needing to compile Java, generate a jar, 
 etc.  It also opens Pig to programmers who prefer scripting languages over 
 Java.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-928) UDFs in scripting languages

2010-07-07 Thread Aniket Mokashi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12886163#action_12886163
 ] 

Aniket Mokashi commented on PIG-928:


I got what you mean, if user needs a generic square function he can write:
{code}
#!/usr/bin/python
@outputSchemaFunction(\squareSchema\)
def square(number):
return (number * number)
def squareSchema(input):
return input
{code}
I will make changes so that I can use similar approach as pig-greek. Since 
outputschema needs to know both input and name of outputSchemaFunction current 
code would need further changes.

 UDFs in scripting languages
 ---

 Key: PIG-928
 URL: https://issues.apache.org/jira/browse/PIG-928
 Project: Pig
  Issue Type: New Feature
Reporter: Alan Gates
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: calltrace.png, package.zip, PIG-928.patch, 
 pig-greek.tgz, pig.scripting.patch.arnab, pyg.tgz, RegisterPythonUDF2.patch, 
 RegisterPythonUDF3.patch, RegisterPythonUDF4.patch, 
 RegisterPythonUDFFinale.patch, RegisterPythonUDFFinale3.patch, 
 RegisterScriptUDFDefineParse.patch, scripting.tgz, scripting.tgz, test.zip


 It should be possible to write UDFs in scripting languages such as python, 
 ruby, etc.  This frees users from needing to compile Java, generate a jar, 
 etc.  It also opens Pig to programmers who prefer scripting languages over 
 Java.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-928) UDFs in scripting languages

2010-07-06 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-928:
---

Attachment: RegisterPythonUDFFinale3.patch

Thanks Dmitriy and Julien for your help.
Attached is the patch with test cases. Test manually passed.

 UDFs in scripting languages
 ---

 Key: PIG-928
 URL: https://issues.apache.org/jira/browse/PIG-928
 Project: Pig
  Issue Type: New Feature
Reporter: Alan Gates
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: calltrace.png, package.zip, PIG-928.patch, 
 pig-greek.tgz, pig.scripting.patch.arnab, pyg.tgz, RegisterPythonUDF2.patch, 
 RegisterPythonUDF3.patch, RegisterPythonUDF4.patch, 
 RegisterPythonUDFFinale.patch, RegisterPythonUDFFinale3.patch, 
 RegisterScriptUDFDefineParse.patch, scripting.tgz, scripting.tgz, test.zip


 It should be possible to write UDFs in scripting languages such as python, 
 ruby, etc.  This frees users from needing to compile Java, generate a jar, 
 etc.  It also opens Pig to programmers who prefer scripting languages over 
 Java.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-928) UDFs in scripting languages

2010-07-06 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-928:
---

Status: Patch Available  (was: Open)

 UDFs in scripting languages
 ---

 Key: PIG-928
 URL: https://issues.apache.org/jira/browse/PIG-928
 Project: Pig
  Issue Type: New Feature
Reporter: Alan Gates
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: calltrace.png, package.zip, PIG-928.patch, 
 pig-greek.tgz, pig.scripting.patch.arnab, pyg.tgz, RegisterPythonUDF2.patch, 
 RegisterPythonUDF3.patch, RegisterPythonUDF4.patch, 
 RegisterPythonUDFFinale.patch, RegisterPythonUDFFinale3.patch, 
 RegisterScriptUDFDefineParse.patch, scripting.tgz, scripting.tgz, test.zip


 It should be possible to write UDFs in scripting languages such as python, 
 ruby, etc.  This frees users from needing to compile Java, generate a jar, 
 etc.  It also opens Pig to programmers who prefer scripting languages over 
 Java.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-928) UDFs in scripting languages

2010-07-02 Thread Aniket Mokashi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12884841#action_12884841
 ] 

Aniket Mokashi commented on PIG-928:


The fix needed some changes in queryparser to support namespace, I found this 
in test cases I added. 
Current EvalFuncSpec logic is convoluted, I replaced it with a cleaner one.
I have attached the updated patch with changes mentioned above.

I am not sure what needs to be done for jython.jar, my guess was to check-in 
that in /lib. Thoughts?

 UDFs in scripting languages
 ---

 Key: PIG-928
 URL: https://issues.apache.org/jira/browse/PIG-928
 Project: Pig
  Issue Type: New Feature
Reporter: Alan Gates
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: calltrace.png, package.zip, PIG-928.patch, 
 pig-greek.tgz, pig.scripting.patch.arnab, pyg.tgz, RegisterPythonUDF2.patch, 
 RegisterPythonUDF3.patch, RegisterPythonUDF4.patch, 
 RegisterScriptUDFDefineParse.patch, scripting.tgz, scripting.tgz, test.zip


 It should be possible to write UDFs in scripting languages such as python, 
 ruby, etc.  This frees users from needing to compile Java, generate a jar, 
 etc.  It also opens Pig to programmers who prefer scripting languages over 
 Java.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-928) UDFs in scripting languages

2010-07-02 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-928:
---

Attachment: RegisterPythonUDFFinale.patch

Changes needed for script UDF.
TODO- jython.jar related changes

 UDFs in scripting languages
 ---

 Key: PIG-928
 URL: https://issues.apache.org/jira/browse/PIG-928
 Project: Pig
  Issue Type: New Feature
Reporter: Alan Gates
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: calltrace.png, package.zip, PIG-928.patch, 
 pig-greek.tgz, pig.scripting.patch.arnab, pyg.tgz, RegisterPythonUDF2.patch, 
 RegisterPythonUDF3.patch, RegisterPythonUDF4.patch, 
 RegisterPythonUDFFinale.patch, RegisterScriptUDFDefineParse.patch, 
 scripting.tgz, scripting.tgz, test.zip


 It should be possible to write UDFs in scripting languages such as python, 
 ruby, etc.  This frees users from needing to compile Java, generate a jar, 
 etc.  It also opens Pig to programmers who prefer scripting languages over 
 Java.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-928) UDFs in scripting languages

2010-07-01 Thread Aniket Mokashi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12884378#action_12884378
 ] 

Aniket Mokashi commented on PIG-928:


Extension of this jira to track progress for inline script udfs with define 
clause has been added at https://issues.apache.org/jira/browse/PIG-1471

 UDFs in scripting languages
 ---

 Key: PIG-928
 URL: https://issues.apache.org/jira/browse/PIG-928
 Project: Pig
  Issue Type: New Feature
Reporter: Alan Gates
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: calltrace.png, package.zip, pig-greek.tgz, 
 pig.scripting.patch.arnab, pyg.tgz, RegisterPythonUDF2.patch, 
 RegisterPythonUDF3.patch, RegisterPythonUDF4.patch, 
 RegisterScriptUDFDefineParse.patch, scripting.tgz, scripting.tgz, test.zip


 It should be possible to write UDFs in scripting languages such as python, 
 ruby, etc.  This frees users from needing to compile Java, generate a jar, 
 etc.  It also opens Pig to programmers who prefer scripting languages over 
 Java.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-506) Does pig need a NATIVE keyword?

2010-07-01 Thread Aniket Mokashi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12884492#action_12884492
 ] 

Aniket Mokashi commented on PIG-506:


for the better re-usability of parser code with less distortion to syntax, we 
can use -
{code}
B = NATIVE ('mymr.jar' [, 'other.jar' ...]) STORE A INTO 'storeLocation' USING 
storeFunc LOAD 'loadLocation' USING loadFunc [params, ... ];
{code}
params is needed as some map reduce jobs take parameters.
Also, we assume that mymr.jar has a main method responsible for setting up 
required jobconf.

Alternatively,
mymr.jar can have getJobConf() hook for pig (documented) so that pig can take 
the JobConf from the mymj job, add some more stuff if needed and run this job.

 Does pig need a NATIVE keyword?
 ---

 Key: PIG-506
 URL: https://issues.apache.org/jira/browse/PIG-506
 Project: Pig
  Issue Type: New Feature
  Components: impl
Reporter: Alan Gates
Assignee: Aniket Mokashi
Priority: Minor

 Assume a user had a job that broke easily into three pieces.  Further assume 
 that pieces one and three were easily expressible in pig, but that piece two 
 needed to be written in map reduce for whatever reason (performance, 
 something that pig could not easily express, legacy job that was too 
 important to change, etc.).  Today the user would either have to use map 
 reduce for the entire job or manually handle the stitching together of pig 
 and map reduce jobs.  What if instead pig provided a NATIVE keyword that 
 would allow the script to pass off the data stream to the underlying system 
 (in this case map reduce).  The semantics of NATIVE would vary by underlying 
 system.  In the map reduce case, we would assume that this indicated a 
 collection of one or more fully contained map reduce jobs, so that pig would 
 store the data, invoke the map reduce jobs, and then read the resulting data 
 to continue.  It might look something like this:
 {code}
 A = load 'myfile';
 X = load 'myotherfile';
 B = group A by $0;
 C = foreach B generate group, myudf(B);
 D = native (jar=mymr.jar, infile=frompig outfile=topig);
 E = join D by $0, X by $0;
 ...
 {code}
 This differs from streaming in that it allows the user to insert an arbitrary 
 amount of native processing, whereas streaming allows the insertion of one 
 binary.  It also differs in that, for streaming, data is piped directly into 
 and out of the binary as part of the pig pipeline.  Here the pipeline would 
 be broken, data written to disk, and the native block invoked, then data read 
 back from disk.
 Another alternative is to say this is unnecessary because the user can do the 
 coordination from java, using the PIgServer interface to run pig and calling 
 the map reduce job explicitly.  The advantages of the native keyword are that 
 the user need not be worried about coordination between the jobs, pig will 
 take care of it.  Also the user can make use of existing java applications 
 without being a java programmer.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1434) Allow casting relations to scalars

2010-07-01 Thread Aniket Mokashi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12884498#action_12884498
 ] 

Aniket Mokashi commented on PIG-1434:
-

I agree to Thejas that we should have a way for user to specify that he means 
to use C as scalar. This will avoid errors in pig code. 
Thus, we have, 
{code}
Y = foreach X generate $1/(int) [C].count, $2- [C],max, ([C].$1+2);
{code}
Do we fail if we find C to have more than one row, or do we just ignore it? 
Should we try to detect, that C has one row, in frontend?

 Allow casting relations to scalars
 --

 Key: PIG-1434
 URL: https://issues.apache.org/jira/browse/PIG-1434
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: scalarImpl.patch


 This jira is to implement a simplified version of the functionality described 
 in https://issues.apache.org/jira/browse/PIG-801.
 The proposal is to allow casting relations to scalar types in foreach.
 Example:
 A = load 'data' as (x, y, z);
 B = group A all;
 C = foreach B generate COUNT(A);
 .
 X = 
 Y = foreach X generate $1/(long) C;
 Couple of additional comments:
 (1) You can only cast relations including a single value or an error will be 
 reported
 (2) Name resolution is needed since relation X might have field named C in 
 which case that field takes precedence.
 (3) Y will look for C closest to it.
 Implementation thoughts:
 The idea is to store C into a file and then convert it into scalar via a UDF. 
 I believe we already have a UDF that Ben Reed contributed for this purpose. 
 Most of the work would be to update the logical plan to
 (1) Store C
 (2) convert the cast to the UDF

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1434) Allow casting relations to scalars

2010-07-01 Thread Aniket Mokashi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12884503#action_12884503
 ] 

Aniket Mokashi commented on PIG-1434:
-

bq Should we try to detect, that C has one row, in frontend?
We can try to detect the pattern that makes something as scalar (by marking B 
(group by all, limit 1) as scalar and then C as scalar etc) and fail upfront 
otherwise...

 Allow casting relations to scalars
 --

 Key: PIG-1434
 URL: https://issues.apache.org/jira/browse/PIG-1434
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: scalarImpl.patch


 This jira is to implement a simplified version of the functionality described 
 in https://issues.apache.org/jira/browse/PIG-801.
 The proposal is to allow casting relations to scalar types in foreach.
 Example:
 A = load 'data' as (x, y, z);
 B = group A all;
 C = foreach B generate COUNT(A);
 .
 X = 
 Y = foreach X generate $1/(long) C;
 Couple of additional comments:
 (1) You can only cast relations including a single value or an error will be 
 reported
 (2) Name resolution is needed since relation X might have field named C in 
 which case that field takes precedence.
 (3) Y will look for C closest to it.
 Implementation thoughts:
 The idea is to store C into a file and then convert it into scalar via a UDF. 
 I believe we already have a UDF that Ben Reed contributed for this purpose. 
 Most of the work would be to update the logical plan to
 (1) Store C
 (2) convert the cast to the UDF

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1434) Allow casting relations to scalars

2010-06-29 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-1434:


Status: Open  (was: Patch Available)

 Allow casting relations to scalars
 --

 Key: PIG-1434
 URL: https://issues.apache.org/jira/browse/PIG-1434
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: scalarImpl.patch


 This jira is to implement a simplified version of the functionality described 
 in https://issues.apache.org/jira/browse/PIG-801.
 The proposal is to allow casting relations to scalar types in foreach.
 Example:
 A = load 'data' as (x, y, z);
 B = group A all;
 C = foreach B generate COUNT(A);
 .
 X = 
 Y = foreach X generate $1/(long) C;
 Couple of additional comments:
 (1) You can only cast relations including a single value or an error will be 
 reported
 (2) Name resolution is needed since relation X might have field named C in 
 which case that field takes precedence.
 (3) Y will look for C closest to it.
 Implementation thoughts:
 The idea is to store C into a file and then convert it into scalar via a UDF. 
 I believe we already have a UDF that Ben Reed contributed for this purpose. 
 Most of the work would be to update the logical plan to
 (1) Store C
 (2) convert the cast to the UDF

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (PIG-1471) inline UDFs in scripting languages

2010-06-28 Thread Aniket Mokashi (JIRA)
inline UDFs in scripting languages
--

 Key: PIG-1471
 URL: https://issues.apache.org/jira/browse/PIG-1471
 Project: Pig
  Issue Type: New Feature
Reporter: Aniket Mokashi
Assignee: Aniket Mokashi
 Fix For: 0.8.0


It should be possible to write UDFs in scripting languages such as python, 
ruby, etc. This frees users from needing to compile Java, generate a jar, etc. 
It also opens Pig to programmers who prefer scripting languages over Java. It 
should be possible to write these scripts inline as part of pig scripts. This 
feature is an extension of https://issues.apache.org/jira/browse/PIG-928


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1471) inline UDFs in scripting languages

2010-06-28 Thread Aniket Mokashi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12883327#action_12883327
 ] 

Aniket Mokashi commented on PIG-1471:
-

The proposed syntax is
{code}
define hellopig using org.apache.pig.scripting.jython.JythonScriptEngine as 
'@outputSchema(x:{t:(word:chararray)})\ndef helloworld():\n\treturn ('Hello, 
World')';
{code}

 inline UDFs in scripting languages
 --

 Key: PIG-1471
 URL: https://issues.apache.org/jira/browse/PIG-1471
 Project: Pig
  Issue Type: New Feature
Reporter: Aniket Mokashi
Assignee: Aniket Mokashi
 Fix For: 0.8.0


 It should be possible to write UDFs in scripting languages such as python, 
 ruby, etc. This frees users from needing to compile Java, generate a jar, 
 etc. It also opens Pig to programmers who prefer scripting languages over 
 Java. It should be possible to write these scripts inline as part of pig 
 scripts. This feature is an extension of 
 https://issues.apache.org/jira/browse/PIG-928

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1434) Allow casting relations to scalars

2010-06-25 Thread Aniket Mokashi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12882711#action_12882711
 ] 

Aniket Mokashi commented on PIG-1434:
-

The proposal for scalars is as follows -
{code}
A = load '1.txt' as (a1, a2);
B = group A all;
C = foreach B generate COUNT(A);
Y = foreach A generate C;
store Y into 'Ystore';
{code}
Based on the schema of C, we detect that Y means to use C as a scalar and 
internally track it as scalar. Thus, operations like C * C are also allowed. 
The limitation is that C should have long convertible value (when stored into 
the file). Also (int) C would be allowed and will succeed if the cast operation 
succeeds.

As mentioned by Daniel earlier, there are two challenges in introducing 
scalars--
1. Addition of implicit store- We cannot do it too early (parsing), as we get 
redundant (implicit) store operation for rest of the commands in the script. If 
we do it too late, merge algorithm doesn't find the store and discards the 
branch that compiles and executes the store.
To solve this, whenever we process a store plan after the parsing stage, we 
detect the existence of scalars into the plan and add required branches that 
has those scalars into the current plan. We also attach LOStores for the 
scalars and merge the required plan.
2. Tracking of implicit dependency- Existence of scalar C needs to be converted 
into a implicit ReadScalar operation, but other than this it also needs to add 
dependency on the map-reduce job that generates this scalar value. We track 
this dependency by adding LOScalar, POScalar operators that carry the reference 
to the scalar they depend upon. When we compile the map reduce plan, we replace 
POScalar with POUserFunc to load the scalar value and mark the dependency 
between two map reduce jobs.

I am attaching the patch with above mentioned changes.

Few known issues-
To track the dependencies of scalars, we need access to map of operators from 
one type of plan to other, but this map is generated by visitors. The same 
visitors are responsible for converting LOScalar -POScalar - POUserFunc. So, 
if a visitor visits LOScalar before LO associated with scalar ( C in example) 
we do not find PO associated with C. 

 Allow casting relations to scalars
 --

 Key: PIG-1434
 URL: https://issues.apache.org/jira/browse/PIG-1434
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: scalarImpl.patch


 This jira is to implement a simplified version of the functionality described 
 in https://issues.apache.org/jira/browse/PIG-801.
 The proposal is to allow casting relations to scalar types in foreach.
 Example:
 A = load 'data' as (x, y, z);
 B = group A all;
 C = foreach B generate COUNT(A);
 .
 X = 
 Y = foreach X generate $1/(long) C;
 Couple of additional comments:
 (1) You can only cast relations including a single value or an error will be 
 reported
 (2) Name resolution is needed since relation X might have field named C in 
 which case that field takes precedence.
 (3) Y will look for C closest to it.
 Implementation thoughts:
 The idea is to store C into a file and then convert it into scalar via a UDF. 
 I believe we already have a UDF that Ben Reed contributed for this purpose. 
 Most of the work would be to update the logical plan to
 (1) Store C
 (2) convert the cast to the UDF

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1434) Allow casting relations to scalars

2010-06-25 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-1434:


Attachment: scalarImpl.patch

Initial implemenation

 Allow casting relations to scalars
 --

 Key: PIG-1434
 URL: https://issues.apache.org/jira/browse/PIG-1434
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: scalarImpl.patch


 This jira is to implement a simplified version of the functionality described 
 in https://issues.apache.org/jira/browse/PIG-801.
 The proposal is to allow casting relations to scalar types in foreach.
 Example:
 A = load 'data' as (x, y, z);
 B = group A all;
 C = foreach B generate COUNT(A);
 .
 X = 
 Y = foreach X generate $1/(long) C;
 Couple of additional comments:
 (1) You can only cast relations including a single value or an error will be 
 reported
 (2) Name resolution is needed since relation X might have field named C in 
 which case that field takes precedence.
 (3) Y will look for C closest to it.
 Implementation thoughts:
 The idea is to store C into a file and then convert it into scalar via a UDF. 
 I believe we already have a UDF that Ben Reed contributed for this purpose. 
 Most of the work would be to update the logical plan to
 (1) Store C
 (2) convert the cast to the UDF

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1434) Allow casting relations to scalars

2010-06-25 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-1434:


Status: Patch Available  (was: Open)

 Allow casting relations to scalars
 --

 Key: PIG-1434
 URL: https://issues.apache.org/jira/browse/PIG-1434
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: scalarImpl.patch


 This jira is to implement a simplified version of the functionality described 
 in https://issues.apache.org/jira/browse/PIG-801.
 The proposal is to allow casting relations to scalar types in foreach.
 Example:
 A = load 'data' as (x, y, z);
 B = group A all;
 C = foreach B generate COUNT(A);
 .
 X = 
 Y = foreach X generate $1/(long) C;
 Couple of additional comments:
 (1) You can only cast relations including a single value or an error will be 
 reported
 (2) Name resolution is needed since relation X might have field named C in 
 which case that field takes precedence.
 (3) Y will look for C closest to it.
 Implementation thoughts:
 The idea is to store C into a file and then convert it into scalar via a UDF. 
 I believe we already have a UDF that Ben Reed contributed for this purpose. 
 Most of the work would be to update the logical plan to
 (1) Store C
 (2) convert the cast to the UDF

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1434) Allow casting relations to scalars

2010-06-25 Thread Aniket Mokashi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12882725#action_12882725
 ] 

Aniket Mokashi commented on PIG-1434:
-

Submitting to hudson to check for test failures

 Allow casting relations to scalars
 --

 Key: PIG-1434
 URL: https://issues.apache.org/jira/browse/PIG-1434
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: scalarImpl.patch


 This jira is to implement a simplified version of the functionality described 
 in https://issues.apache.org/jira/browse/PIG-801.
 The proposal is to allow casting relations to scalar types in foreach.
 Example:
 A = load 'data' as (x, y, z);
 B = group A all;
 C = foreach B generate COUNT(A);
 .
 X = 
 Y = foreach X generate $1/(long) C;
 Couple of additional comments:
 (1) You can only cast relations including a single value or an error will be 
 reported
 (2) Name resolution is needed since relation X might have field named C in 
 which case that field takes precedence.
 (3) Y will look for C closest to it.
 Implementation thoughts:
 The idea is to store C into a file and then convert it into scalar via a UDF. 
 I believe we already have a UDF that Ben Reed contributed for this purpose. 
 Most of the work would be to update the logical plan to
 (1) Store C
 (2) convert the cast to the UDF

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-506) Does pig need a NATIVE keyword?

2010-06-22 Thread Aniket Mokashi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12881263#action_12881263
 ] 

Aniket Mokashi commented on PIG-506:


Revised syntax -- as from the proposal document (+ few changes)
{code}
B = native ('mymr.jar' [, 'other.jar' ...]) A store into 'storeLocation' using 
storeFunc load 'loadLocation' using loadFunc;
{code}
mymr.jar contains the MR code the user wants to run.
storeLocation is location the user's code expects to find the data.
storeFunc is the storage function Pig will use to store the data (from A)
loadLocation is where user's code will write the result data
loadFunc is the load function Pig will use to reload the data (into B)
other,jar contains jars to be shipped like InputFormat, OutputFormat for custom 
handling of mapreduce jobs.

 Does pig need a NATIVE keyword?
 ---

 Key: PIG-506
 URL: https://issues.apache.org/jira/browse/PIG-506
 Project: Pig
  Issue Type: New Feature
  Components: impl
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Minor

 Assume a user had a job that broke easily into three pieces.  Further assume 
 that pieces one and three were easily expressible in pig, but that piece two 
 needed to be written in map reduce for whatever reason (performance, 
 something that pig could not easily express, legacy job that was too 
 important to change, etc.).  Today the user would either have to use map 
 reduce for the entire job or manually handle the stitching together of pig 
 and map reduce jobs.  What if instead pig provided a NATIVE keyword that 
 would allow the script to pass off the data stream to the underlying system 
 (in this case map reduce).  The semantics of NATIVE would vary by underlying 
 system.  In the map reduce case, we would assume that this indicated a 
 collection of one or more fully contained map reduce jobs, so that pig would 
 store the data, invoke the map reduce jobs, and then read the resulting data 
 to continue.  It might look something like this:
 {code}
 A = load 'myfile';
 X = load 'myotherfile';
 B = group A by $0;
 C = foreach B generate group, myudf(B);
 D = native (jar=mymr.jar, infile=frompig outfile=topig);
 E = join D by $0, X by $0;
 ...
 {code}
 This differs from streaming in that it allows the user to insert an arbitrary 
 amount of native processing, whereas streaming allows the insertion of one 
 binary.  It also differs in that, for streaming, data is piped directly into 
 and out of the binary as part of the pig pipeline.  Here the pipeline would 
 be broken, data written to disk, and the native block invoked, then data read 
 back from disk.
 Another alternative is to say this is unnecessary because the user can do the 
 coordination from java, using the PIgServer interface to run pig and calling 
 the map reduce job explicitly.  The advantages of the native keyword are that 
 the user need not be worried about coordination between the jobs, pig will 
 take care of it.  Also the user can make use of existing java applications 
 without being a java programmer.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1405) Need to move many standard functions from piggybank into Pig

2010-06-21 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-1405:


Status: Open  (was: Patch Available)

 Need to move many standard functions from piggybank into Pig
 

 Key: PIG-1405
 URL: https://issues.apache.org/jira/browse/PIG-1405
 Project: Pig
  Issue Type: Improvement
Reporter: Alan Gates
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: StandardUDFtoPig.patch, StandardUDFtoPig3.patch, 
 StandardUDFtoPig4.patch, StandardUDFtoPigFinale.patch


 There are currently a number of functions in Piggybank that represent 
 features commonly supported by languages and database engines.  We need to 
 decide which of these Pig should support as built in functions and put them 
 in org.apache.pig.builtin.  This will also mean adding unit tests and 
 javadocs for some UDFs.  The existing classes will be left in Piggybank for 
 some time for backward compatibility.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1405) Need to move many standard functions from piggybank into Pig

2010-06-21 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-1405:


Attachment: StandardUDFtoPigFinale.patch

Reviewed all comments for all the files. Made required changes.
Test failures do not seem related (Test passes locally).
[junit] Running org.apache.pig.test.TestBuiltin
[junit] Tests run: 37, Failures: 0, Errors: 0, Time elapsed: 20.295 sec
Submitting for Hudson build again.

 Need to move many standard functions from piggybank into Pig
 

 Key: PIG-1405
 URL: https://issues.apache.org/jira/browse/PIG-1405
 Project: Pig
  Issue Type: Improvement
Reporter: Alan Gates
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: StandardUDFtoPig.patch, StandardUDFtoPig3.patch, 
 StandardUDFtoPig4.patch, StandardUDFtoPigFinale.patch


 There are currently a number of functions in Piggybank that represent 
 features commonly supported by languages and database engines.  We need to 
 decide which of these Pig should support as built in functions and put them 
 in org.apache.pig.builtin.  This will also mean adding unit tests and 
 javadocs for some UDFs.  The existing classes will be left in Piggybank for 
 some time for backward compatibility.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1405) Need to move many standard functions from piggybank into Pig

2010-06-18 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-1405:


Attachment: StandardUDFtoPig4.patch

fixed findbugs error
javac errors were due to having COR and COV implement serializable, removed 
those as pig doesnt need it
test failures doesn't seem to be related to these code changes.


 Need to move many standard functions from piggybank into Pig
 

 Key: PIG-1405
 URL: https://issues.apache.org/jira/browse/PIG-1405
 Project: Pig
  Issue Type: Improvement
Reporter: Alan Gates
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: StandardUDFtoPig.patch, StandardUDFtoPig3.patch, 
 StandardUDFtoPig4.patch


 There are currently a number of functions in Piggybank that represent 
 features commonly supported by languages and database engines.  We need to 
 decide which of these Pig should support as built in functions and put them 
 in org.apache.pig.builtin.  This will also mean adding unit tests and 
 javadocs for some UDFs.  The existing classes will be left in Piggybank for 
 some time for backward compatibility.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1405) Need to move many standard functions from piggybank into Pig

2010-06-18 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-1405:


Status: Patch Available  (was: Open)

 Need to move many standard functions from piggybank into Pig
 

 Key: PIG-1405
 URL: https://issues.apache.org/jira/browse/PIG-1405
 Project: Pig
  Issue Type: Improvement
Reporter: Alan Gates
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: StandardUDFtoPig.patch, StandardUDFtoPig3.patch, 
 StandardUDFtoPig4.patch


 There are currently a number of functions in Piggybank that represent 
 features commonly supported by languages and database engines.  We need to 
 decide which of these Pig should support as built in functions and put them 
 in org.apache.pig.builtin.  This will also mean adding unit tests and 
 javadocs for some UDFs.  The existing classes will be left in Piggybank for 
 some time for backward compatibility.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1405) Need to move many standard functions from piggybank into Pig

2010-06-18 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-1405:


Status: Open  (was: Patch Available)

 Need to move many standard functions from piggybank into Pig
 

 Key: PIG-1405
 URL: https://issues.apache.org/jira/browse/PIG-1405
 Project: Pig
  Issue Type: Improvement
Reporter: Alan Gates
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: StandardUDFtoPig.patch, StandardUDFtoPig3.patch, 
 StandardUDFtoPig4.patch


 There are currently a number of functions in Piggybank that represent 
 features commonly supported by languages and database engines.  We need to 
 decide which of these Pig should support as built in functions and put them 
 in org.apache.pig.builtin.  This will also mean adding unit tests and 
 javadocs for some UDFs.  The existing classes will be left in Piggybank for 
 some time for backward compatibility.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1405) Need to move many standard functions from piggybank into Pig

2010-06-17 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-1405:


Status: Patch Available  (was: Open)

 Need to move many standard functions from piggybank into Pig
 

 Key: PIG-1405
 URL: https://issues.apache.org/jira/browse/PIG-1405
 Project: Pig
  Issue Type: Improvement
Reporter: Alan Gates
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: StandardUDFtoPig.patch, StandardUDFtoPig3.patch


 There are currently a number of functions in Piggybank that represent 
 features commonly supported by languages and database engines.  We need to 
 decide which of these Pig should support as built in functions and put them 
 in org.apache.pig.builtin.  This will also mean adding unit tests and 
 javadocs for some UDFs.  The existing classes will be left in Piggybank for 
 some time for backward compatibility.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1405) Need to move many standard functions from piggybank into Pig

2010-06-17 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-1405:


Attachment: StandardUDFtoPig3.patch

Added test cases for all the supported functions in TestBuiltin.java
Test Cases added--
Math functions are tested using reflection with java.lang.Math class.
String functions are tested with a sample string.
Stats and misc functions are tested with sample input.

 Need to move many standard functions from piggybank into Pig
 

 Key: PIG-1405
 URL: https://issues.apache.org/jira/browse/PIG-1405
 Project: Pig
  Issue Type: Improvement
Reporter: Alan Gates
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: StandardUDFtoPig.patch, StandardUDFtoPig3.patch


 There are currently a number of functions in Piggybank that represent 
 features commonly supported by languages and database engines.  We need to 
 decide which of these Pig should support as built in functions and put them 
 in org.apache.pig.builtin.  This will also mean adding unit tests and 
 javadocs for some UDFs.  The existing classes will be left in Piggybank for 
 some time for backward compatibility.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-928) UDFs in scripting languages

2010-06-16 Thread Aniket Mokashi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12879621#action_12879621
 ] 

Aniket Mokashi commented on PIG-928:


I have attached the patch for proposed changes.

Few points to note-
1. As jar is treated in a different way (searched in system resources, 
classloader used etc) than other files, we differentiate a jar with its 
extension.
2. namespace is kept as default =  as per above comment, this is implemented 
as part of registerFunctions interface of ScriptEngine, so that different 
engines can have different behavior as necessary.
3. keyword python is supported along with custom scriptengine name.


 UDFs in scripting languages
 ---

 Key: PIG-928
 URL: https://issues.apache.org/jira/browse/PIG-928
 Project: Pig
  Issue Type: New Feature
Reporter: Alan Gates
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: calltrace.png, package.zip, pig-greek.tgz, 
 pig.scripting.patch.arnab, pyg.tgz, RegisterPythonUDF2.patch, 
 RegisterPythonUDF3.patch, RegisterScriptUDFDefineParse.patch, scripting.tgz, 
 scripting.tgz, test.zip


 It should be possible to write UDFs in scripting languages such as python, 
 ruby, etc.  This frees users from needing to compile Java, generate a jar, 
 etc.  It also opens Pig to programmers who prefer scripting languages over 
 Java.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-928) UDFs in scripting languages

2010-06-16 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-928:
---

Attachment: RegisterPythonUDF3.patch

 UDFs in scripting languages
 ---

 Key: PIG-928
 URL: https://issues.apache.org/jira/browse/PIG-928
 Project: Pig
  Issue Type: New Feature
Reporter: Alan Gates
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: calltrace.png, package.zip, pig-greek.tgz, 
 pig.scripting.patch.arnab, pyg.tgz, RegisterPythonUDF2.patch, 
 RegisterPythonUDF3.patch, RegisterScriptUDFDefineParse.patch, scripting.tgz, 
 scripting.tgz, test.zip


 It should be possible to write UDFs in scripting languages such as python, 
 ruby, etc.  This frees users from needing to compile Java, generate a jar, 
 etc.  It also opens Pig to programmers who prefer scripting languages over 
 Java.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-972) Make describe work with nested foreach

2010-06-14 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-972:
---

Attachment: NestedDescribeFinale1.patch

Findbug warning fixed

 Make describe work with nested foreach
 --

 Key: PIG-972
 URL: https://issues.apache.org/jira/browse/PIG-972
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: NestedDescribeFinale.patch, NestedDescribeFinale1.patch, 
 NestedDescribeProp1.patch, NestedDescribeProp2Initial.patch


 Currently Parser can't deal with that. This is because describe is part of 
 Grunt parser while the rest of nested foreach is handled by the QueryParser

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-972) Make describe work with nested foreach

2010-06-14 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-972:
---

Status: Open  (was: Patch Available)

 Make describe work with nested foreach
 --

 Key: PIG-972
 URL: https://issues.apache.org/jira/browse/PIG-972
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: NestedDescribeFinale.patch, NestedDescribeFinale1.patch, 
 NestedDescribeProp1.patch, NestedDescribeProp2Initial.patch


 Currently Parser can't deal with that. This is because describe is part of 
 Grunt parser while the rest of nested foreach is handled by the QueryParser

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-972) Make describe work with nested foreach

2010-06-14 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-972:
---

Status: Patch Available  (was: Open)

 Make describe work with nested foreach
 --

 Key: PIG-972
 URL: https://issues.apache.org/jira/browse/PIG-972
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: NestedDescribeFinale.patch, NestedDescribeFinale1.patch, 
 NestedDescribeProp1.patch, NestedDescribeProp2Initial.patch


 Currently Parser can't deal with that. This is because describe is part of 
 Grunt parser while the rest of nested foreach is handled by the QueryParser

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1405) Need to move many standard functions from piggybank into Pig

2010-06-11 Thread Aniket Mokashi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12877916#action_12877916
 ] 

Aniket Mokashi commented on PIG-1405:
-

As per the comments above, the existing classes will be left in Piggybank for 
some time for backward compatibility.

 Need to move many standard functions from piggybank into Pig
 

 Key: PIG-1405
 URL: https://issues.apache.org/jira/browse/PIG-1405
 Project: Pig
  Issue Type: Improvement
Reporter: Alan Gates
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: StandardUDFtoPig.patch


 There are currently a number of functions in Piggybank that represent 
 features commonly supported by languages and database engines.  We need to 
 decide which of these Pig should support as built in functions and put them 
 in org.apache.pig.builtin.  This will also mean adding unit tests and 
 javadocs for some UDFs.  The existing classes will be left in Piggybank for 
 some time for backward compatibility.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1405) Need to move many standard functions from piggybank into Pig

2010-06-11 Thread Aniket Mokashi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12877918#action_12877918
 ] 

Aniket Mokashi commented on PIG-1405:
-

Do we need to add a function variance? or we need to move COV and COR? thoughts?

 Need to move many standard functions from piggybank into Pig
 

 Key: PIG-1405
 URL: https://issues.apache.org/jira/browse/PIG-1405
 Project: Pig
  Issue Type: Improvement
Reporter: Alan Gates
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: StandardUDFtoPig.patch


 There are currently a number of functions in Piggybank that represent 
 features commonly supported by languages and database engines.  We need to 
 decide which of these Pig should support as built in functions and put them 
 in org.apache.pig.builtin.  This will also mean adding unit tests and 
 javadocs for some UDFs.  The existing classes will be left in Piggybank for 
 some time for backward compatibility.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1405) Need to move many standard functions from piggybank into Pig

2010-06-10 Thread Aniket Mokashi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12877619#action_12877619
 ] 

Aniket Mokashi commented on PIG-1405:
-

Currently, we have COR(elation) and COV(ariance) functions in piggybank as part 
of stats package. But, there is no existing implementation of VAR(iance).

 Need to move many standard functions from piggybank into Pig
 

 Key: PIG-1405
 URL: https://issues.apache.org/jira/browse/PIG-1405
 Project: Pig
  Issue Type: Improvement
Reporter: Alan Gates
Assignee: Aniket Mokashi
 Fix For: 0.8.0


 There are currently a number of functions in Piggybank that represent 
 features commonly supported by languages and database engines.  We need to 
 decide which of these Pig should support as built in functions and put them 
 in org.apache.pig.builtin.  This will also mean adding unit tests and 
 javadocs for some UDFs.  The existing classes will be left in Piggybank for 
 some time for backward compatibility.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-928) UDFs in scripting languages

2010-06-10 Thread Aniket Mokashi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12877667#action_12877667
 ] 

Aniket Mokashi commented on PIG-928:


I support above comment.
Also, in favor of not breaking old code. I think, we should avoid introducing 
new keywords.

In the above proposal, by adding python as a lang-keyword I meant to hide 
extensibility of ScriptEngine interface by natively supporting python. If we 
have to allow users add support for other languages. we need to allow using 
org.apache.pig.scripting.jython.JythonScriptEngine. But this will need us to 
document the scriptengine interface.

Following seems to be more suitable choice. Comments?
{code}
-- register all UDFs inside test.py using custom (or builtin) ScriptEngine
register 'test.py' using org.apache.pig.scripting.jython.JythonScriptEngine 
ship ('1.py', '2.py');
-- namespace? test.helloworld?
b = foreach a generate helloworld(a.$0), complex(a.$1);

-- register helloworld UDF as hello using JythonScriptEngine
define hello using org.apache.pig.scripting.jython.JythonScriptEngine from 
'test.py'#helloworld ship ('1.py', '2.py');
b = foreach a generate helloworld(a.$0); 
{code}

Also, register scalascript.jar would not be necessary if 
getStandardScriptJarPath() returns the path of the jar.

 UDFs in scripting languages
 ---

 Key: PIG-928
 URL: https://issues.apache.org/jira/browse/PIG-928
 Project: Pig
  Issue Type: New Feature
Reporter: Alan Gates
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: calltrace.png, package.zip, pig-greek.tgz, 
 pig.scripting.patch.arnab, pyg.tgz, RegisterPythonUDF2.patch, 
 RegisterScriptUDFDefineParse.patch, scripting.tgz, scripting.tgz, test.zip


 It should be possible to write UDFs in scripting languages such as python, 
 ruby, etc.  This frees users from needing to compile Java, generate a jar, 
 etc.  It also opens Pig to programmers who prefer scripting languages over 
 Java.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1405) Need to move many standard functions from piggybank into Pig

2010-06-10 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-1405:


Attachment: StandardUDFtoPig.patch

Initial patch..
ToDo- Check for documentation errors

 Need to move many standard functions from piggybank into Pig
 

 Key: PIG-1405
 URL: https://issues.apache.org/jira/browse/PIG-1405
 Project: Pig
  Issue Type: Improvement
Reporter: Alan Gates
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: StandardUDFtoPig.patch


 There are currently a number of functions in Piggybank that represent 
 features commonly supported by languages and database engines.  We need to 
 decide which of these Pig should support as built in functions and put them 
 in org.apache.pig.builtin.  This will also mean adding unit tests and 
 javadocs for some UDFs.  The existing classes will be left in Piggybank for 
 some time for backward compatibility.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-928) UDFs in scripting languages

2010-06-09 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-928:
---

Attachment: RegisterPythonUDF2.patch

 UDFs in scripting languages
 ---

 Key: PIG-928
 URL: https://issues.apache.org/jira/browse/PIG-928
 Project: Pig
  Issue Type: New Feature
Reporter: Alan Gates
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: calltrace.png, package.zip, pig-greek.tgz, 
 pig.scripting.patch.arnab, pyg.tgz, RegisterPythonUDF2.patch, scripting.tgz, 
 scripting.tgz, test.zip


 It should be possible to write UDFs in scripting languages such as python, 
 ruby, etc.  This frees users from needing to compile Java, generate a jar, 
 etc.  It also opens Pig to programmers who prefer scripting languages over 
 Java.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-928) UDFs in scripting languages

2010-06-09 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-928:
---

Attachment: RegisterScriptUDFDefineParse.patch

 UDFs in scripting languages
 ---

 Key: PIG-928
 URL: https://issues.apache.org/jira/browse/PIG-928
 Project: Pig
  Issue Type: New Feature
Reporter: Alan Gates
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: calltrace.png, package.zip, pig-greek.tgz, 
 pig.scripting.patch.arnab, pyg.tgz, RegisterPythonUDF2.patch, 
 RegisterScriptUDFDefineParse.patch, scripting.tgz, scripting.tgz, test.zip


 It should be possible to write UDFs in scripting languages such as python, 
 ruby, etc.  This frees users from needing to compile Java, generate a jar, 
 etc.  It also opens Pig to programmers who prefer scripting languages over 
 Java.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-972) Make describe work with nested foreach

2010-06-09 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-972:
---

Attachment: NestedDescribeFinale.patch

 Make describe work with nested foreach
 --

 Key: PIG-972
 URL: https://issues.apache.org/jira/browse/PIG-972
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: NestedDescribeFinale.patch, NestedDescribeProp1.patch, 
 NestedDescribeProp2Initial.patch


 Currently Parser can't deal with that. This is because describe is part of 
 Grunt parser while the rest of nested foreach is handled by the QueryParser

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-972) Make describe work with nested foreach

2010-06-09 Thread Aniket Mokashi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12877293#action_12877293
 ] 

Aniket Mokashi commented on PIG-972:


Submitted patch with above changes. Also added test cases to test different 
scenarios.

{code}
grunt describe c:
c::d: {a0: int,a1: int}
{code}
It does not print any nested aliases. For printing nested aliases, we have 
describe c::d;


 Make describe work with nested foreach
 --

 Key: PIG-972
 URL: https://issues.apache.org/jira/browse/PIG-972
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: NestedDescribeFinale.patch, NestedDescribeProp1.patch, 
 NestedDescribeProp2Initial.patch


 Currently Parser can't deal with that. This is because describe is part of 
 Grunt parser while the rest of nested foreach is handled by the QueryParser

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-972) Make describe work with nested foreach

2010-06-09 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-972:
---

Status: Patch Available  (was: Open)

 Make describe work with nested foreach
 --

 Key: PIG-972
 URL: https://issues.apache.org/jira/browse/PIG-972
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: NestedDescribeFinale.patch, NestedDescribeProp1.patch, 
 NestedDescribeProp2Initial.patch


 Currently Parser can't deal with that. This is because describe is part of 
 Grunt parser while the rest of nested foreach is handled by the QueryParser

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-972) Make describe work with nested foreach

2010-06-08 Thread Aniket Mokashi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12876764#action_12876764
 ] 

Aniket Mokashi commented on PIG-972:


describe c and describe c::d seems more intuitive. 
Further changes-
1. Currently, we are not using a deterministic way to search for nested alias 
in internal plans. 
With changes, we will dump the schema of latest statement for d.
For example, if we have,
{code}
c = foreach b { d = order a by $0; d = filter d by d.$0  0; generate d.$1;}
describe c::d;
{code}
This will dump the schema for last statement associated with d (filter). This 
will be achieved by traversing the plan from leaves to root while searching for 
nested alias d.
2. nested alias list is redundant and will be removed.



 Make describe work with nested foreach
 --

 Key: PIG-972
 URL: https://issues.apache.org/jira/browse/PIG-972
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: NestedDescribeProp1.patch, 
 NestedDescribeProp2Initial.patch


 Currently Parser can't deal with that. This is because describe is part of 
 Grunt parser while the rest of nested foreach is handled by the QueryParser

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-972) Make describe work with nested foreach

2010-06-04 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-972:
---

Attachment: NestedDescribeProp2Initial.patch

Attaching initial patch for prop2

 Make describe work with nested foreach
 --

 Key: PIG-972
 URL: https://issues.apache.org/jira/browse/PIG-972
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: NestedDescribeProp1.patch, 
 NestedDescribeProp2Initial.patch


 Currently Parser can't deal with that. This is because describe is part of 
 Grunt parser while the rest of nested foreach is handled by the QueryParser

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-972) Make describe work with nested foreach

2010-06-03 Thread Aniket Mokashi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12875242#action_12875242
 ] 

Aniket Mokashi commented on PIG-972:


Approach-
1. To implement above mentioned functionality, parser keeps track of all the 
nested aliases it comes across and adds them to LOForEach.
2. After we parse the query (foreach), we dump the schema for all 
nested-aliases stored in the list.
3. When describe foreach-alias; is queried, along with schema for 
foreach-alias, we dump the schema for all nested-aliases stored in the list.

Data Structures -
Adding mDescribedAliasList to LOForEach to list all the described nested 
aliases.
LOForeach has mForEachPlans which creates plan for all projections. This keeps 
track of schemas for all nested aliases inside leaves of its plans.

Issues-
Verification of aliases in nested describe- As, we do not create a map for 
nested aliases, it is not possible to validate upfront the name of the alias 
used in nested describe.
Multiple dumps- Above approach might lead to multiple dumping of schemas.
These issues can be solved with adding more state into LOForeach. 


 Make describe work with nested foreach
 --

 Key: PIG-972
 URL: https://issues.apache.org/jira/browse/PIG-972
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Aniket Mokashi
 Fix For: 0.8.0


 Currently Parser can't deal with that. This is because describe is part of 
 Grunt parser while the rest of nested foreach is handled by the QueryParser

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-972) Make describe work with nested foreach

2010-06-03 Thread Aniket Mokashi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12875390#action_12875390
 ] 

Aniket Mokashi commented on PIG-972:


Approach mentioned above seems to work.

Here are some proposals on semantics of nested describe-
a = load '1.txt' as (a0:int, a1:int);
b = group a by $0;

Proposal 1- Explicit describe.
c = foreach b { d = order a by $0; describe d; e = ...; generate d.$0 ...;}
(1a:Instantaneous responce - describes d after parsing above statement)
describe c;
Prints schema for c and d (but not e)
Adv - Can select which one of nestedAlias to describe.
Disadv - Extra typing.

Proposal 2:- Implicit describe (no describe nested statements)
c = foreach b { d = order a by $0; e = ...; generate d.$0 ...;}
describe c;
Describes c, d and e;
Adv- less typing
Disadv- extra prints
(2a -  describe c prints for c, d and e. Also describe c-d to describe nested 
d)
(2b -  describe c prints for c only. describe c- d to describe nested d).

Alan/Olga, Let me know your comments on this,

 Make describe work with nested foreach
 --

 Key: PIG-972
 URL: https://issues.apache.org/jira/browse/PIG-972
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Aniket Mokashi
 Fix For: 0.8.0


 Currently Parser can't deal with that. This is because describe is part of 
 Grunt parser while the rest of nested foreach is handled by the QueryParser

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-972) Make describe work with nested foreach

2010-06-03 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-972:
---

Attachment: NestedDescribeProp1.patch

Attaching patch for prop1.

 Make describe work with nested foreach
 --

 Key: PIG-972
 URL: https://issues.apache.org/jira/browse/PIG-972
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: NestedDescribeProp1.patch


 Currently Parser can't deal with that. This is because describe is part of 
 Grunt parser while the rest of nested foreach is handled by the QueryParser

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-282) Custom Partitioner

2010-06-02 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-282:
---

Attachment: CustomPartitionerFinale.patch

Added code review comments and some minor changes with test cases.

 Custom Partitioner
 --

 Key: PIG-282
 URL: https://issues.apache.org/jira/browse/PIG-282
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.7.0
Reporter: Amir Youssefi
Assignee: Aniket Mokashi
Priority: Minor
 Fix For: 0.8.0

 Attachments: CustomPartitioner.patch, CustomPartitionerFinale.patch, 
 CustomPartitionerTest.patch


 By adding custom partitioner we can give control over which output partition 
 a key (/value) goes to. We can add keywords to language e.g. 
 PARTITION BY UDF(...)
 or a similar syntax. UDF returns a number between 0 and n-1 where n is number 
 of output partitions.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-282) Custom Partitioner

2010-06-01 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-282:
---

Attachment: CustomPartitionerTest.patch

Adding test cases and some small fixes.

 Custom Partitioner
 --

 Key: PIG-282
 URL: https://issues.apache.org/jira/browse/PIG-282
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.7.0
Reporter: Amir Youssefi
Assignee: Aniket Mokashi
Priority: Minor
 Fix For: 0.8.0

 Attachments: CustomPartitioner.patch, CustomPartitionerTest.patch


 By adding custom partitioner we can give control over which output partition 
 a key (/value) goes to. We can add keywords to language e.g. 
 PARTITION BY UDF(...)
 or a similar syntax. UDF returns a number between 0 and n-1 where n is number 
 of output partitions.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-972) Make describe work with nested foreach

2010-06-01 Thread Aniket Mokashi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12874359#action_12874359
 ] 

Aniket Mokashi commented on PIG-972:


Describe functionality-
1. We describe schema for an alias when we parse the describe statement in the 
shell.
For example - b = foreach ... {... describe a ...}; will describe schema of a 
after the statement is processed (parsed).
2. We do NOTdescribe any schema as part of the dump command.
3. If user needs to describe a nested schema, he can do so by describing parent 
alias. 
For example, describe b; will print the schema of b as well as a.

Any comments?

 Make describe work with nested foreach
 --

 Key: PIG-972
 URL: https://issues.apache.org/jira/browse/PIG-972
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Aniket Mokashi
 Fix For: 0.8.0


 Currently Parser can't deal with that. This is because describe is part of 
 Grunt parser while the rest of nested foreach is handled by the QueryParser

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-282) Custom Partitioner

2010-05-27 Thread Aniket Mokashi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12872363#action_12872363
 ] 

Aniket Mokashi commented on PIG-282:


1. It is suitable to have PARTITION BY mapreduce.Partitioner than UDF. This 
will be followed by PARALLEL n.
2. Applicable to-
GROUP
COGROUP
CROSS
DISTINCT
JOIN (except 'skewed' which uses SkewedPartitioner)
3. ORDER partition by - not supported.
4. No check for validation of custom partitioners parameters 
(PigNullableWritable, Writable).

Approach- 
1. Added support for ClassType parsing and validation. Parsing for partition 
by is added to above mentioned clauses separately.
2. Custom Partitioner is stored as a String in LO, PO and MR plan. 
LogicalOperator holds the partitioner in LO plan. We add partitioner to 
POGlobalRearrangement as it decides the map-reduce boundary. We read and set 
the partitioner when we visit the POGlobalRearrangement.

Attaching a patch with initial changes...

 Custom Partitioner
 --

 Key: PIG-282
 URL: https://issues.apache.org/jira/browse/PIG-282
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.7.0
Reporter: Amir Youssefi
Assignee: Aniket Mokashi
Priority: Minor
 Fix For: 0.8.0


 By adding custom partitioner we can give control over which output partition 
 a key (/value) goes to. We can add keywords to language e.g. 
 PARTITION BY UDF(...)
 or a similar syntax. UDF returns a number between 0 and n-1 where n is number 
 of output partitions.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-282) Custom Partitioner

2010-05-27 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-282:
---

   Status: Patch Available  (was: Open)
 Release Note: Initial changes
Affects Version/s: 0.7.0

 Custom Partitioner
 --

 Key: PIG-282
 URL: https://issues.apache.org/jira/browse/PIG-282
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.7.0
Reporter: Amir Youssefi
Assignee: Aniket Mokashi
Priority: Minor
 Fix For: 0.8.0


 By adding custom partitioner we can give control over which output partition 
 a key (/value) goes to. We can add keywords to language e.g. 
 PARTITION BY UDF(...)
 or a similar syntax. UDF returns a number between 0 and n-1 where n is number 
 of output partitions.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-282) Custom Partitioner

2010-05-27 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-282:
---

Status: Open  (was: Patch Available)

 Custom Partitioner
 --

 Key: PIG-282
 URL: https://issues.apache.org/jira/browse/PIG-282
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.7.0
Reporter: Amir Youssefi
Assignee: Aniket Mokashi
Priority: Minor
 Fix For: 0.8.0


 By adding custom partitioner we can give control over which output partition 
 a key (/value) goes to. We can add keywords to language e.g. 
 PARTITION BY UDF(...)
 or a similar syntax. UDF returns a number between 0 and n-1 where n is number 
 of output partitions.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-282) Custom Partitioner

2010-05-27 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-282:
---

Attachment: CustomPartitioner.patch

Initial Changes

 Custom Partitioner
 --

 Key: PIG-282
 URL: https://issues.apache.org/jira/browse/PIG-282
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.7.0
Reporter: Amir Youssefi
Assignee: Aniket Mokashi
Priority: Minor
 Fix For: 0.8.0

 Attachments: CustomPartitioner.patch


 By adding custom partitioner we can give control over which output partition 
 a key (/value) goes to. We can add keywords to language e.g. 
 PARTITION BY UDF(...)
 or a similar syntax. UDF returns a number between 0 and n-1 where n is number 
 of output partitions.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



  1   2   >