[jira] Commented: (PIG-1552) Nested describe failed when the alias is not referred in the first foreach inner plan
[ https://issues.apache.org/jira/browse/PIG-1552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12900905#action_12900905 ] Aniket Mokashi commented on PIG-1552: - +1 Nested describe failed when the alias is not referred in the first foreach inner plan - Key: PIG-1552 URL: https://issues.apache.org/jira/browse/PIG-1552 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.8.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.8.0 Attachments: PIG-1552-1.patch The following script fails: {code} A = load 'studentab10k' as (name, age, gpa); B = group A by name; C = foreach B { D = distinct A.age; generate group, COUNT(D); } describe C::D; {code} If we remove group from the generate statement, it works. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-506) Does pig need a NATIVE keyword?
[ https://issues.apache.org/jira/browse/PIG-506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12900911#action_12900911 ] Aniket Mokashi commented on PIG-506: The current patch doesn't handle the parallel keyword. We can fix this by adding -D mapred.reduce.tasks=n to the params of RunJar. (ToDo) Does pig need a NATIVE keyword? --- Key: PIG-506 URL: https://issues.apache.org/jira/browse/PIG-506 Project: Pig Issue Type: New Feature Components: impl Reporter: Alan Gates Assignee: Aniket Mokashi Priority: Minor Fix For: 0.8.0 Attachments: NativeImplInitial.patch, NativeMapReduceFinale1.patch, NativeMapReduceFinale2.patch, TestWordCount.jar Assume a user had a job that broke easily into three pieces. Further assume that pieces one and three were easily expressible in pig, but that piece two needed to be written in map reduce for whatever reason (performance, something that pig could not easily express, a legacy job that was too important to change, etc.). Today the user would have to either use map reduce for the entire job or manually stitch together the pig and map reduce jobs. What if instead pig provided a NATIVE keyword that would allow the script to pass off the data stream to the underlying system (in this case map reduce)? The semantics of NATIVE would vary by underlying system. In the map reduce case, we would assume that this indicated a collection of one or more fully contained map reduce jobs, so that pig would store the data, invoke the map reduce jobs, and then read the resulting data to continue. It might look something like this: {code} A = load 'myfile'; X = load 'myotherfile'; B = group A by $0; C = foreach B generate group, myudf(B); D = native (jar=mymr.jar, infile=frompig, outfile=topig); E = join D by $0, X by $0; ... {code} This differs from streaming in that it allows the user to insert an arbitrary amount of native processing, whereas streaming allows the insertion of only one binary.
It also differs in that, for streaming, data is piped directly into and out of the binary as part of the pig pipeline. Here the pipeline would be broken: data would be written to disk, the native block invoked, and the data then read back from disk. Another alternative is to say this is unnecessary, because the user can do the coordination from Java, using the PigServer interface to run pig and calling the map reduce job explicitly. The advantage of the native keyword is that the user need not worry about coordination between the jobs, since pig takes care of it. Also, the user can make use of existing Java applications without being a Java programmer.
[jira] Assigned: (PIG-506) Does pig need a NATIVE keyword?
[ https://issues.apache.org/jira/browse/PIG-506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aniket Mokashi reassigned PIG-506: -- Assignee: Thejas M Nair (was: Aniket Mokashi) Does pig need a NATIVE keyword? --- Key: PIG-506 URL: https://issues.apache.org/jira/browse/PIG-506 Project: Pig Issue Type: New Feature Components: impl Reporter: Alan Gates Assignee: Thejas M Nair Priority: Minor Fix For: 0.8.0 Attachments: NativeImplInitial.patch, NativeMapReduceFinale1.patch, NativeMapReduceFinale2.patch, NativeMapReduceFinale3.patch, TestWordCount.jar
[jira] Updated: (PIG-506) Does pig need a NATIVE keyword?
[ https://issues.apache.org/jira/browse/PIG-506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aniket Mokashi updated PIG-506: --- Attachment: NativeMapReduceFinale3.patch Does pig need a NATIVE keyword? --- Key: PIG-506 URL: https://issues.apache.org/jira/browse/PIG-506 Project: Pig Issue Type: New Feature Components: impl Reporter: Alan Gates Assignee: Aniket Mokashi Priority: Minor Fix For: 0.8.0 Attachments: NativeImplInitial.patch, NativeMapReduceFinale1.patch, NativeMapReduceFinale2.patch, NativeMapReduceFinale3.patch, TestWordCount.jar
[jira] Updated: (PIG-506) Does pig need a NATIVE keyword?
[ https://issues.apache.org/jira/browse/PIG-506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aniket Mokashi updated PIG-506: --- Attachment: NativeMapReduceFinale1.patch Attaching the final patch. It includes the MR changes, the optimizer-related changes, and test cases for basic MR. ToDo: test cases for the optimizer. Does pig need a NATIVE keyword? --- Key: PIG-506 URL: https://issues.apache.org/jira/browse/PIG-506 Project: Pig Issue Type: New Feature Components: impl Reporter: Alan Gates Assignee: Aniket Mokashi Priority: Minor Fix For: 0.8.0 Attachments: NativeImplInitial.patch, NativeMapReduceFinale1.patch
[jira] Updated: (PIG-506) Does pig need a NATIVE keyword?
[ https://issues.apache.org/jira/browse/PIG-506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aniket Mokashi updated PIG-506: --- Status: Patch Available (was: Open) Does pig need a NATIVE keyword? --- Key: PIG-506 URL: https://issues.apache.org/jira/browse/PIG-506 Project: Pig Issue Type: New Feature Components: impl Reporter: Alan Gates Assignee: Aniket Mokashi Priority: Minor Fix For: 0.8.0 Attachments: NativeImplInitial.patch, NativeMapReduceFinale1.patch
[jira] Updated: (PIG-506) Does pig need a NATIVE keyword?
[ https://issues.apache.org/jira/browse/PIG-506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aniket Mokashi updated PIG-506: --- Attachment: TestWordCount.jar We also need to add this jar to lib to get the tests working. ToDo: CreateTestJarAtRuntime. Does pig need a NATIVE keyword? --- Key: PIG-506 URL: https://issues.apache.org/jira/browse/PIG-506 Project: Pig Issue Type: New Feature Components: impl Reporter: Alan Gates Assignee: Aniket Mokashi Priority: Minor Fix For: 0.8.0 Attachments: NativeImplInitial.patch, NativeMapReduceFinale1.patch, TestWordCount.jar
[jira] Updated: (PIG-506) Does pig need a NATIVE keyword?
[ https://issues.apache.org/jira/browse/PIG-506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aniket Mokashi updated PIG-506: --- Attachment: NativeMapReduceFinale2.patch Submitting the updated patch Does pig need a NATIVE keyword? --- Key: PIG-506 URL: https://issues.apache.org/jira/browse/PIG-506 Project: Pig Issue Type: New Feature Components: impl Reporter: Alan Gates Assignee: Aniket Mokashi Priority: Minor Fix For: 0.8.0 Attachments: NativeImplInitial.patch, NativeMapReduceFinale1.patch, NativeMapReduceFinale2.patch, TestWordCount.jar
[jira] Commented: (PIG-506) Does pig need a NATIVE keyword?
[ https://issues.apache.org/jira/browse/PIG-506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12899965#action_12899965 ] Aniket Mokashi commented on PIG-506: A wiki page explaining the details of the specification and implementation has been uploaded at http://wiki.apache.org/pig/NativeMapReduce Does pig need a NATIVE keyword? --- Key: PIG-506 URL: https://issues.apache.org/jira/browse/PIG-506 Project: Pig Issue Type: New Feature Components: impl Reporter: Alan Gates Assignee: Aniket Mokashi Priority: Minor Fix For: 0.8.0 Attachments: NativeImplInitial.patch
[jira] Commented: (PIG-1434) Allow casting relations to scalars
[ https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12899042#action_12899042 ] Aniket Mokashi commented on PIG-1434: - Comments on the finalized syntax: with the above changes, Pig now supports {code} Y = foreach X generate $1/(long) C.count, $2-(long) C.max; {code} 1. Casts are *optional*. The datatype of the scalar depends on the schema of C (i.e., we add the casts implicitly based on the schema of C, so typically count is a long and max is a double). If the schema of C is undeclared (null), the default type of the scalar is *chararray*. 2. Projections are mandatory. For example, {code} Y = foreach X generate C; // this is an error {code} Instead, we need to use {code} Y = foreach X generate C.$0; {code} 3. Whether C is actually a scalar is not checked until runtime, so a non-scalar C makes the job fail during execution of the UDF with an ExecException (Scalar has more than one row in the output). Allow casting relations to scalars -- Key: PIG-1434 URL: https://issues.apache.org/jira/browse/PIG-1434 Project: Pig Issue Type: Improvement Reporter: Olga Natkovich Assignee: Aniket Mokashi Fix For: 0.8.0 Attachments: scalarImpl.patch, ScalarImpl1.patch, ScalarImpl5.patch, ScalarImplFinale.patch, ScalarImplFinale1.patch, ScalarImplFinaleRebase.patch This jira is to implement a simplified version of the functionality described in https://issues.apache.org/jira/browse/PIG-801. The proposal is to allow casting relations to scalar types in foreach. Example: A = load 'data' as (x, y, z); B = group A all; C = foreach B generate COUNT(A); . X = Y = foreach X generate $1/(long) C; A couple of additional comments: (1) You can only cast relations containing a single value; otherwise an error will be reported. (2) Name resolution is needed, since relation X might have a field named C, in which case that field takes precedence. (3) Y will look for the C closest to it.
Implementation thoughts: The idea is to store C into a file and then convert it into a scalar via a UDF. I believe we already have a UDF that Ben Reed contributed for this purpose. Most of the work would be to update the logical plan to (1) store C and (2) convert the cast to the UDF.
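The scalar rules in the comment on the finalized syntax can be modeled roughly as follows. Those rules are: an implicit cast taken from C's schema, chararray when the schema is undeclared, a mandatory projection, and a failure only at UDF execution time when C has more than one row. This is a minimal Python sketch, not Pig's UDF. The function name `read_scalar`, the `PIG_CASTS` table, `ScalarError` (standing in for ExecException), Python `str` standing in for chararray, and the null result for an empty C are all hypothetical assumptions made for illustration.

```python
from typing import Any, Callable, Dict, Optional, Sequence

# Hypothetical mapping from Pig type names to Python converters (chararray -> str).
PIG_CASTS: Dict[str, Callable[[Any], Any]] = {
    "int": int,
    "long": int,
    "float": float,
    "double": float,
    "chararray": str,
}


class ScalarError(Exception):
    """Stands in for the ExecException Pig raises at run time."""


def read_scalar(rows: Sequence[Sequence[Any]], field: int,
                declared_type: Optional[str] = None) -> Any:
    """Return column `field` of the single row stored for relation C.

    rows: tuples read back from the file that C was stored into.
    field: the mandatory projection, e.g. C.$0 -> 0.
    declared_type: the field's type in C's schema, or None if undeclared.
    """
    # The single-row check happens only here, i.e. when the UDF executes.
    if len(rows) > 1:
        raise ScalarError("Scalar has more than one row in the output")
    if not rows:
        return None  # assumption: an empty C yields a null scalar
    value = rows[0][field]
    if value is None:
        return None
    # An undeclared (null) schema falls back to chararray.
    return PIG_CASTS[declared_type or "chararray"](value)
```

The sketch defers the row-count check to call time, mirroring point 3 of the comment. It does not model name resolution (point (2) of the description).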
[jira] Updated: (PIG-506) Does pig need a NATIVE keyword?
[ https://issues.apache.org/jira/browse/PIG-506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aniket Mokashi updated PIG-506: --- Attachment: NativeImplInitial.patch The attached patch has an initial implementation of this feature. Dump, store, and explain work fine, and PigStats are generated properly. ToDos: check the multiquery-optimization-related tests; add test cases. Usage: A = load 'dict.txt'; B = mapreduce 'hadoop-0.20.2-examples.jar' Store A into 'input' Load 'output' `wordcount input output`; Does pig need a NATIVE keyword? --- Key: PIG-506 URL: https://issues.apache.org/jira/browse/PIG-506 Project: Pig Issue Type: New Feature Components: impl Reporter: Alan Gates Assignee: Aniket Mokashi Priority: Minor Fix For: 0.8.0 Attachments: NativeImplInitial.patch
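In the usage given with the initial patch, the `mapreduce` statement breaks the pipeline into three ordered stages: store A into 'input', run the jar with the backquoted params, then load 'output' as B. The minimal Python sketch below lays out that ordering; it is not Pig's planner. The names `MapReduceStmt` and `plan` are hypothetical. The native stage is written as the equivalent `hadoop jar` command line for readability only; Pig may invoke the jar in-process (an earlier comment mentions passing params to RunJar) rather than shelling out.

```python
import shlex
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class MapReduceStmt:
    """Hypothetical parsed form of
    out = mapreduce 'jar' Store src into 'in' Load 'out' `params`;
    """
    out_alias: str
    jar: str
    src_alias: str
    store_path: str
    load_path: str
    params: str


def plan(stmt: MapReduceStmt) -> List[Tuple]:
    """Return the statement's three stages in the order they must run."""
    return [
        # 1. Materialize the input relation to disk; the Pig pipeline breaks here.
        ("store", stmt.src_alias, stmt.store_path),
        # 2. Run the native job, shown as the equivalent `hadoop jar` command.
        ("native", ["hadoop", "jar", stmt.jar] + shlex.split(stmt.params)),
        # 3. Read the job's output back in as a new relation.
        ("load", stmt.out_alias, stmt.load_path),
    ]


if __name__ == "__main__":
    # The usage example from the initial patch comment.
    stmt = MapReduceStmt("B", "hadoop-0.20.2-examples.jar", "A",
                         "input", "output", "wordcount input output")
    for step in plan(stmt):
        print(step)
```

Each stage must finish before the next starts. This is the coordination that the description says pig would take over from the user.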
[jira] Updated: (PIG-1434) Allow casting relations to scalars
[ https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aniket Mokashi updated PIG-1434: Attachment: ScalarImplFinaleRebase.patch Attaching a rebased version of the patch. Allow casting relations to scalars -- Key: PIG-1434 URL: https://issues.apache.org/jira/browse/PIG-1434 Project: Pig Issue Type: Improvement Reporter: Olga Natkovich Assignee: Aniket Mokashi Fix For: 0.8.0 Attachments: scalarImpl.patch, ScalarImpl1.patch, ScalarImpl5.patch, ScalarImplFinale.patch, ScalarImplFinale1.patch, ScalarImplFinaleRebase.patch
[jira] Updated: (PIG-1434) Allow casting relations to scalars
[ https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aniket Mokashi updated PIG-1434: Status: Patch Available (was: Open) Allow casting relations to scalars -- Key: PIG-1434 URL: https://issues.apache.org/jira/browse/PIG-1434 Project: Pig Issue Type: Improvement Reporter: Olga Natkovich Assignee: Aniket Mokashi Fix For: 0.8.0 Attachments: scalarImpl.patch, ScalarImpl1.patch, ScalarImpl5.patch, ScalarImplFinale.patch, ScalarImplFinale1.patch, ScalarImplFinaleRebase.patch
[jira] Updated: (PIG-1434) Allow casting relations to scalars
[ https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aniket Mokashi updated PIG-1434: Status: Open (was: Patch Available) Allow casting relations to scalars -- Key: PIG-1434 URL: https://issues.apache.org/jira/browse/PIG-1434 Project: Pig Issue Type: Improvement Reporter: Olga Natkovich Assignee: Aniket Mokashi Fix For: 0.8.0 Attachments: scalarImpl.patch, ScalarImpl1.patch, ScalarImpl5.patch, ScalarImplFinale.patch, ScalarImplFinale1.patch, ScalarImplFinaleRebase.patch
[jira] Updated: (PIG-1434) Allow casting relations to scalars
[ https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aniket Mokashi updated PIG-1434: Attachment: (was: ScalarImplFinale1.patch) Allow casting relations to scalars -- Key: PIG-1434 URL: https://issues.apache.org/jira/browse/PIG-1434 Project: Pig Issue Type: Improvement Reporter: Olga Natkovich Assignee: Aniket Mokashi Fix For: 0.8.0 Attachments: scalarImpl.patch, ScalarImpl1.patch, ScalarImpl5.patch, ScalarImplFinale.patch, ScalarImplFinale1.patch
[jira] Updated: (PIG-1434) Allow casting relations to scalars
[ https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aniket Mokashi updated PIG-1434: Attachment: ScalarImplFinale1.patch Removed the unused variable (findbugs warning). Allow casting relations to scalars -- Key: PIG-1434 URL: https://issues.apache.org/jira/browse/PIG-1434 Project: Pig Issue Type: Improvement Reporter: Olga Natkovich Assignee: Aniket Mokashi Fix For: 0.8.0 Attachments: scalarImpl.patch, ScalarImpl1.patch, ScalarImpl5.patch, ScalarImplFinale.patch, ScalarImplFinale1.patch
[jira] Updated: (PIG-1434) Allow casting relations to scalars
[ https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aniket Mokashi updated PIG-1434: Attachment: ScalarImplFinale1.patch Missing fields in the scalar file are handled by returning null. An empty scalar file and an empty scalar directory are tested. ScalarPhyFinder is now a local variable. Removed redundant comments and APIs inside the visitors. Added a new test case for multiquery. Fixed findbugs, javac and javadoc warnings (a findbugs exclusion is needed, since we throw an error when a second line is found (not_null) in the UDF). Allow casting relations to scalars -- Key: PIG-1434 URL: https://issues.apache.org/jira/browse/PIG-1434 Project: Pig Issue Type: Improvement Reporter: Olga Natkovich Assignee: Aniket Mokashi Fix For: 0.8.0 Attachments: scalarImpl.patch, ScalarImpl1.patch, ScalarImpl5.patch, ScalarImplFinale.patch, ScalarImplFinale1.patch
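The scalar-file behaviour listed in the ScalarImplFinale1.patch note (null for a missing field, an error once a second line is found) can be illustrated with a small reader over the lines of a stored scalar. This is a hypothetical Java sketch, not the UDF from the patch: it assumes tab-separated fields (PigStorage's default delimiter), returns fields as strings rather than typed values, and assumes an empty file also yields null, which the note says was tested but does not spell out.

```java
import java.util.Iterator;

/** Hypothetical sketch of reading a single-row scalar file; not the UDF from the patch. */
final class ScalarFileReader {

    /**
     * @param lines      lines of the stored scalar output, in file order
     * @param fieldIndex position of the wanted field in the single row
     * @return the field as a string, or null if the file is empty or the field is missing
     */
    static String readScalar(Iterator<String> lines, int fieldIndex) {
        if (!lines.hasNext()) {
            return null; // assumption: an empty scalar file or directory yields null
        }
        String row = lines.next();
        if (lines.hasNext()) {
            // A relation cast to a scalar must hold exactly one row.
            throw new IllegalStateException("Scalar has more than one row");
        }
        String[] fields = row.split("\t", -1); // -1 keeps trailing empty fields
        if (fieldIndex < 0 || fieldIndex >= fields.length) {
            return null; // a missing field is reported as null
        }
        return fields[fieldIndex];
    }
}
```

Reading lazily through an Iterator means the second-row check only needs to peek one line ahead instead of loading the whole file.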
[jira] Updated: (PIG-1434) Allow casting relations to scalars
[ https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aniket Mokashi updated PIG-1434: Status: Patch Available (was: Open) Allow casting relations to scalars -- Key: PIG-1434 URL: https://issues.apache.org/jira/browse/PIG-1434 Project: Pig Issue Type: Improvement Reporter: Olga Natkovich Assignee: Aniket Mokashi Fix For: 0.8.0 Attachments: scalarImpl.patch, ScalarImpl1.patch, ScalarImpl5.patch, ScalarImplFinale.patch, ScalarImplFinale1.patch
[jira] Updated: (PIG-1434) Allow casting relations to scalars
[ https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aniket Mokashi updated PIG-1434: Status: Open (was: Patch Available) Allow casting relations to scalars -- Key: PIG-1434 URL: https://issues.apache.org/jira/browse/PIG-1434 Project: Pig Issue Type: Improvement Reporter: Olga Natkovich Assignee: Aniket Mokashi Fix For: 0.8.0 Attachments: scalarImpl.patch, ScalarImpl1.patch, ScalarImpl5.patch, ScalarImplFinale.patch, ScalarImplFinale1.patch
[jira] Updated: (PIG-1434) Allow casting relations to scalars
[ https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aniket Mokashi updated PIG-1434: Attachment: ScalarImplFinale.patch Allow casting relations to scalars -- Key: PIG-1434 URL: https://issues.apache.org/jira/browse/PIG-1434 Project: Pig Issue Type: Improvement Reporter: Olga Natkovich Assignee: Aniket Mokashi Fix For: 0.8.0 Attachments: scalarImpl.patch, ScalarImpl1.patch, ScalarImpl5.patch, ScalarImplFinale.patch
[jira] Updated: (PIG-1434) Allow casting relations to scalars
[ https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aniket Mokashi updated PIG-1434: Status: Open (was: Patch Available) Allow casting relations to scalars -- Key: PIG-1434 URL: https://issues.apache.org/jira/browse/PIG-1434 Project: Pig Issue Type: Improvement Reporter: Olga Natkovich Assignee: Aniket Mokashi Fix For: 0.8.0 Attachments: scalarImpl.patch, ScalarImpl1.patch, ScalarImpl5.patch, ScalarImplFinale.patch
[jira] Updated: (PIG-1434) Allow casting relations to scalars
[ https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aniket Mokashi updated PIG-1434: Status: Patch Available (was: Open) Allow casting relations to scalars -- Key: PIG-1434 URL: https://issues.apache.org/jira/browse/PIG-1434 Project: Pig Issue Type: Improvement Reporter: Olga Natkovich Assignee: Aniket Mokashi Fix For: 0.8.0 Attachments: scalarImpl.patch, ScalarImpl1.patch, ScalarImpl5.patch, ScalarImplFinale.patch
[jira] Commented: (PIG-1517) Pig needs to support keywords in the package name
[ https://issues.apache.org/jira/browse/PIG-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12892863#action_12892863 ] Aniket Mokashi commented on PIG-1517: - This bug is an extension of https://issues.apache.org/jira/browse/PIG-656 and does not need extra test cases. The other tests pass when run manually. Pig needs to support keywords in the package name - Key: PIG-1517 URL: https://issues.apache.org/jira/browse/PIG-1517 Project: Pig Issue Type: Bug Components: grunt Reporter: Aniket Mokashi Assignee: Aniket Mokashi Priority: Minor Fix For: 0.8.0 Attachments: pigusergroup656.patch Pig needs to support keywords in the package name. Pig already supports most of the keywords, since this was fixed in https://issues.apache.org/jira/browse/PIG-656. A few tokens are still missing, such as eq, gt, lt, gte, lte and neq, and they need to be supported.
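The point of PIG-656 and PIG-1517 is that a keyword such as eval or gte should be accepted when it appears as one segment of a fully qualified UDF name. The Java check below is a hypothetical sketch of that rule, not the QueryParser.jjt grammar change: it accepts any dot-separated name whose segments are identifier-shaped, whether or not a segment is a Pig keyword, and it does not reject Java reserved words. The keyword set holds only the tokens named in these two issues.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch: Pig keywords are allowed as package segments of a UDF name. */
final class QualifiedUdfName {

    // Only the keywords mentioned in PIG-656 and PIG-1517; not Pig's full keyword list.
    static final Set<String> PIG_KEYWORDS =
            new HashSet<>(Arrays.asList("eval", "eq", "gt", "lt", "gte", "lte", "neq"));

    /** True if every dot-separated segment looks like a Java identifier; keywords are fine. */
    static boolean isValid(String name) {
        for (String segment : name.split("\\.", -1)) {
            if (!isIdentifier(segment)) {
                return false;
            }
        }
        return true;
    }

    /** True if some segment would be lexed as a Pig keyword (Pig keywords are case-insensitive). */
    static boolean usesPigKeyword(String name) {
        for (String segment : name.split("\\.", -1)) {
            if (PIG_KEYWORDS.contains(segment.toLowerCase(Locale.ROOT))) {
                return true;
            }
        }
        return false;
    }

    private static boolean isIdentifier(String s) {
        if (s.isEmpty() || !Character.isJavaIdentifierStart(s.charAt(0))) {
            return false;
        }
        for (int i = 1; i < s.length(); i++) {
            if (!Character.isJavaIdentifierPart(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }
}
```

A name like mypackage.eval.TOKENIZE both uses a Pig keyword and is valid; the fix is about making the parser treat such segments as plain identifiers.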
[jira] Created: (PIG-1517) Pig needs to support keywords in the package name
Pig needs to support keywords in the package name - Key: PIG-1517 URL: https://issues.apache.org/jira/browse/PIG-1517 Project: Pig Issue Type: Bug Components: grunt Reporter: Aniket Mokashi Assignee: Aniket Mokashi Priority: Minor Fix For: 0.8.0
[jira] Commented: (PIG-656) Use of eval or any other keyword in the package hierarchy of a UDF causes parse exception
[ https://issues.apache.org/jira/browse/PIG-656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12892353#action_12892353 ] Aniket Mokashi commented on PIG-656: The tokens eq, gt, lt, gte, lte and neq were missed as part of this fix. Opened https://issues.apache.org/jira/browse/PIG-1517 to track the remaining changes. Use of eval or any other keyword in the package hierarchy of a UDF causes parse exception - Key: PIG-656 URL: https://issues.apache.org/jira/browse/PIG-656 Project: Pig Issue Type: Bug Components: documentation, grunt Affects Versions: 0.3.0 Reporter: Viraj Bhat Assignee: Milind Bhandarkar Fix For: 0.3.0 Attachments: mywordcount.txt, pigusergroup656.patch, reserved.patch, TOKENIZE.jar Consider a Pig script that does something similar to a word count. It uses the built-in TOKENIZE function, but packages it inside a class hierarchy such as mypackage.eval:
{code}
register TOKENIZE.jar
my_src = LOAD '/user/viraj/mywordcount.txt' USING PigStorage('\t') AS (mlist: chararray);
modules = FOREACH my_src GENERATE FLATTEN(mypackage.eval.TOKENIZE(mlist));
describe modules;
grouped = GROUP modules BY $0;
describe grouped;
counts = FOREACH grouped GENERATE COUNT(modules), group;
ordered = ORDER counts BY $0;
dump ordered;
{code}
The parser complains:
===
2009-02-05 01:17:29,231 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Invalid alias: mypackage in {mlist: chararray}
===
I looked at the source code (src/org/apache/pig/impl/logicalLayer/parser/QueryParser.jjt) and it seems that EVAL is a keyword in Pig. Some clarifications are needed: 1) Is there documentation on what the EVAL keyword actually is? 2) Is the EVAL keyword actually implemented? Viraj
[jira] Updated: (PIG-1517) Pig needs to support keywords in the package name
[ https://issues.apache.org/jira/browse/PIG-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aniket Mokashi updated PIG-1517: Attachment: pigusergroup656.patch Pig needs to support keywords in the package name - Key: PIG-1517 URL: https://issues.apache.org/jira/browse/PIG-1517 Project: Pig Issue Type: Bug Components: grunt Reporter: Aniket Mokashi Assignee: Aniket Mokashi Priority: Minor Fix For: 0.8.0 Attachments: pigusergroup656.patch
[jira] Updated: (PIG-656) Use of eval or any other keyword in the package hierarchy of a UDF causes parse exception
[ https://issues.apache.org/jira/browse/PIG-656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aniket Mokashi updated PIG-656: --- Attachment: pigusergroup656.patch Use of eval or any other keyword in the package hierarchy of a UDF causes parse exception - Key: PIG-656 URL: https://issues.apache.org/jira/browse/PIG-656 Project: Pig Issue Type: Bug Components: documentation, grunt Affects Versions: 0.3.0 Reporter: Viraj Bhat Assignee: Milind Bhandarkar Fix For: 0.3.0 Attachments: mywordcount.txt, pigusergroup656.patch, reserved.patch, TOKENIZE.jar
[jira] Commented: (PIG-928) UDFs in scripting languages
[ https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12891357#action_12891357 ] Aniket Mokashi commented on PIG-928:
bq. I am still not convinced about the changes required in POUserFunc. That logic should really be a part of pythonToPig(pyObject). If python UDF is returning byte[], it should be turned into DataByteArray before it gets back into Pig's pipeline. And if we do that conversion in pythonToPig() (which is a right place to do it) we will need no changes in POUserFunc.
I agree that it is better to move the type-checking computation to the JythonFunction side (JythonUtils); that should give more type safety and avoid the complexity of user-defined types. But I would still make the change in POUserFunc for result.result, for the case in the example above (with the byte[] scenario removed).
bq. Instead of instanceof, doing class equality test will be a wee-bit faster. Like instead of (pyObject instanceof PyDictionary) do pyobject.getClass() == PyDictionary.class. Obviously, it will work when you know exact target class and not for the derived ones.
Jython has derived classes for each of the basic Jython types. Although most of them are not used yet, future Jython implementations may start returning these derived objects (for example PyTupleDerived), in which case class-equality checks would break our code. PyLongDerived is already used inside the Jython code. Also, the __tojava__ function just returns the proxy Java object until we ask for a specific type of object. So I think it is better to use instanceof instead of class equality here.
bq. For register command, we need to test not only for functionality but for regressions as well. Look at TestGrunt.java in test package to get an idea how to write test for it.
The code path for .jar registration is identical to the old code, except that it doesn't use any engine or namespace.
bq. Also what will happen if user returned a nil python object (null equivalent of Java) from UDF. It looks to me that will result in NPE. Can you add a test for that and similar test case from pigToPython()
A Java null object is turned into a PyNone object, and the __tojava__ function always returns the special object Py.NoConversion if the PyObject cannot be converted to the desired Java class.
UDFs in scripting languages --- Key: PIG-928 URL: https://issues.apache.org/jira/browse/PIG-928 Project: Pig Issue Type: New Feature Reporter: Alan Gates Assignee: Aniket Mokashi Fix For: 0.8.0 Attachments: calltrace.png, package.zip, PIG-928.patch, pig-greek.tgz, pig.scripting.patch.arnab, pyg.tgz, RegisterPythonUDF3.patch, RegisterPythonUDF4.patch, RegisterPythonUDF_Final.patch, RegisterPythonUDFFinale.patch, RegisterPythonUDFFinale3.patch, RegisterPythonUDFFinale4.patch, RegisterPythonUDFFinale5.patch, RegisterPythonUDFLatest.patch, RegisterScriptUDFDefineParse.patch, scripting.tgz, scripting.tgz, test.zip It should be possible to write UDFs in scripting languages such as python, ruby, etc. This frees users from needing to compile Java, generate a jar, etc. It also opens Pig to programmers who prefer scripting languages over Java.
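The instanceof versus class-equality question in the PIG-928 comment comes down to subclasses. The Java sketch below uses two invented stand-in classes, not the real Jython PyTuple and PyTupleDerived, to show that a getClass() equality test rejects a derived object that instanceof still accepts.

```java
/** Hypothetical stand-ins for a Jython base type and its "Derived" subclass. */
final class InstanceofVsClassEquality {

    static class TupleStandIn {}

    static class TupleDerivedStandIn extends TupleStandIn {}

    /** Accepts the base class and any subclass, as instanceof does. */
    static boolean matchesByInstanceof(Object value) {
        return value instanceof TupleStandIn;
    }

    /** Accepts only the exact base class; a subclass instance fails this test. */
    static boolean matchesByClassEquality(Object value) {
        return value != null && value.getClass() == TupleStandIn.class;
    }

    public static void main(String[] args) {
        Object derived = new TupleDerivedStandIn();
        System.out.println(matchesByInstanceof(derived));    // true
        System.out.println(matchesByClassEquality(derived)); // false
    }
}
```

If a library starts handing back subclass instances, exact-class checks silently stop matching, which is the risk the reply describes.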
[jira] Updated: (PIG-928) UDFs in scripting languages
[ https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aniket Mokashi updated PIG-928: --- Attachment: RegisterPythonUDFLatest2.patch Added tests for map-udf, null input/output and grunt. Made the required changes as per the suggestions. UDFs in scripting languages --- Key: PIG-928 URL: https://issues.apache.org/jira/browse/PIG-928 Project: Pig Issue Type: New Feature Reporter: Alan Gates Assignee: Aniket Mokashi Fix For: 0.8.0 Attachments: calltrace.png, package.zip, PIG-928.patch, pig-greek.tgz, pig.scripting.patch.arnab, pyg.tgz, RegisterPythonUDF3.patch, RegisterPythonUDF4.patch, RegisterPythonUDF_Final.patch, RegisterPythonUDFFinale.patch, RegisterPythonUDFFinale3.patch, RegisterPythonUDFFinale4.patch, RegisterPythonUDFFinale5.patch, RegisterPythonUDFLatest.patch, RegisterPythonUDFLatest2.patch, RegisterScriptUDFDefineParse.patch, scripting.tgz, scripting.tgz, test.zip
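The review on PIG-928 suggested wrapping a byte[] returned by a Python UDF as a DataByteArray inside pythonToPig(), and RegisterPythonUDFLatest2.patch adds tests for null input and output. The Java sketch below illustrates that kind of recursive conversion under stated assumptions: ByteArrayValue is a hypothetical stand-in for Pig's DataByteArray, plain Java maps and lists stand in for values coming back from Jython, and this is not the patch's JythonUtils code. Map keys are turned into strings because Pig map keys are chararrays.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of converting UDF results into Pig-style values. */
final class ToPigValueSketch {

    /** Stand-in for Pig's DataByteArray; not the real class. */
    static final class ByteArrayValue {
        final byte[] bytes;

        ByteArrayValue(byte[] bytes) {
            this.bytes = bytes.clone();
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof ByteArrayValue && Arrays.equals(bytes, ((ByteArrayValue) o).bytes);
        }

        @Override
        public int hashCode() {
            return Arrays.hashCode(bytes);
        }
    }

    static Object toPig(Object value) {
        if (value == null) {
            return null; // a null result stays null instead of causing an NPE later
        }
        if (value instanceof byte[]) {
            return new ByteArrayValue((byte[]) value); // wrap raw bytes before they enter the pipeline
        }
        if (value instanceof Map) {
            Map<String, Object> out = new LinkedHashMap<>();
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                out.put(String.valueOf(e.getKey()), toPig(e.getValue())); // map keys become strings
            }
            return out;
        }
        if (value instanceof List) {
            List<Object> out = new ArrayList<>();
            for (Object item : (List<?>) value) {
                out.add(toPig(item)); // convert nested values recursively
            }
            return out;
        }
        return value; // other values are passed through unchanged
    }
}
```

Doing the conversion in one recursive function keeps nested maps and lists consistent with top-level results, which is the design argument the reviewer made for pythonToPig().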
[jira] Updated: (PIG-928) UDFs in scripting languages
[ https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aniket Mokashi updated PIG-928: --- Status: Open (was: Patch Available) UDFs in scripting languages --- Key: PIG-928 URL: https://issues.apache.org/jira/browse/PIG-928 Project: Pig Issue Type: New Feature Reporter: Alan Gates Assignee: Aniket Mokashi Fix For: 0.8.0 Attachments: calltrace.png, package.zip, PIG-928.patch, pig-greek.tgz, pig.scripting.patch.arnab, pyg.tgz, RegisterPythonUDF3.patch, RegisterPythonUDF4.patch, RegisterPythonUDF_Final.patch, RegisterPythonUDFFinale.patch, RegisterPythonUDFFinale3.patch, RegisterPythonUDFFinale4.patch, RegisterPythonUDFFinale5.patch, RegisterPythonUDFLatest.patch, RegisterPythonUDFLatest2.patch, RegisterScriptUDFDefineParse.patch, scripting.tgz, scripting.tgz, test.zip
[jira] Updated: (PIG-928) UDFs in scripting languages
[ https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aniket Mokashi updated PIG-928: --- Status: Patch Available (was: Open) UDFs in scripting languages --- Key: PIG-928 URL: https://issues.apache.org/jira/browse/PIG-928 Project: Pig Issue Type: New Feature Reporter: Alan Gates Assignee: Aniket Mokashi Fix For: 0.8.0 Attachments: calltrace.png, package.zip, PIG-928.patch, pig-greek.tgz, pig.scripting.patch.arnab, pyg.tgz, RegisterPythonUDF3.patch, RegisterPythonUDF4.patch, RegisterPythonUDF_Final.patch, RegisterPythonUDFFinale.patch, RegisterPythonUDFFinale3.patch, RegisterPythonUDFFinale4.patch, RegisterPythonUDFFinale5.patch, RegisterPythonUDFLatest.patch, RegisterPythonUDFLatest2.patch, RegisterScriptUDFDefineParse.patch, scripting.tgz, scripting.tgz, test.zip
[jira] Updated: (PIG-1434) Allow casting relations to scalars
[ https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aniket Mokashi updated PIG-1434: Attachment: (was: ScalarImpl1.patch) Allow casting relations to scalars -- Key: PIG-1434 URL: https://issues.apache.org/jira/browse/PIG-1434 Project: Pig Issue Type: Improvement Reporter: Olga Natkovich Assignee: Aniket Mokashi Fix For: 0.8.0 Attachments: scalarImpl.patch, ScalarImpl1.patch
[jira] Updated: (PIG-1434) Allow casting relations to scalars
[ https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aniket Mokashi updated PIG-1434: Attachment: ScalarImpl1.patch Allow casting relations to scalars -- Key: PIG-1434 URL: https://issues.apache.org/jira/browse/PIG-1434 Project: Pig Issue Type: Improvement Reporter: Olga Natkovich Assignee: Aniket Mokashi Fix For: 0.8.0 Attachments: scalarImpl.patch, ScalarImpl1.patch
[jira] Updated: (PIG-1434) Allow casting relations to scalars
[ https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aniket Mokashi updated PIG-1434: Attachment: ScalarImpl1.patch Allow casting relations to scalars -- Key: PIG-1434 URL: https://issues.apache.org/jira/browse/PIG-1434 Project: Pig Issue Type: Improvement Reporter: Olga Natkovich Assignee: Aniket Mokashi Fix For: 0.8.0 Attachments: scalarImpl.patch, ScalarImpl1.patch
[jira] Updated: (PIG-1434) Allow casting relations to scalars
[ https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aniket Mokashi updated PIG-1434: Attachment: (was: ScalarImpl1.patch) Allow casting relations to scalars -- Key: PIG-1434 URL: https://issues.apache.org/jira/browse/PIG-1434 Project: Pig Issue Type: Improvement Reporter: Olga Natkovich Assignee: Aniket Mokashi Fix For: 0.8.0 Attachments: scalarImpl.patch
[jira] Updated: (PIG-1434) Allow casting relations to scalars
[ https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aniket Mokashi updated PIG-1434: Attachment: ScalarImpl1.patch LOScalar keeps track of the scalars in the logical plan, along with a reference to each scalar's alias. During compilation, we add an LOStore for each scalar and merge plans as needed. POScalar is later replaced by a POUserFunc, and the appropriate dependency is added between the MROpers. Tested with store, dump, and explain. Allow casting relations to scalars -- Key: PIG-1434 URL: https://issues.apache.org/jira/browse/PIG-1434 Project: Pig Issue Type: Improvement Reporter: Olga Natkovich Assignee: Aniket Mokashi Fix For: 0.8.0 Attachments: scalarImpl.patch, ScalarImpl1.patch This jira is to implement a simplified version of the functionality described in https://issues.apache.org/jira/browse/PIG-801. The proposal is to allow casting relations to scalar types in foreach. Example: A = load 'data' as (x, y, z); B = group A all; C = foreach B generate COUNT(A); . X = Y = foreach X generate $1/(long) C; Couple of additional comments: (1) You can only cast relations including a single value or an error will be reported (2) Name resolution is needed since relation X might have field named C in which case that field takes precedence. (3) Y will look for C closest to it. Implementation thoughts: The idea is to store C into a file and then convert it into scalar via a UDF. I believe we already have a UDF that Ben Reed contributed for this purpose. Most of the work would be to update the logical plan to (1) Store C (2) convert the cast to the UDF -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
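The patch note says the job that stores each scalar has to finish before the job that reads it, which is enforced by adding a dependency between MROpers. As a rough illustration only, the sketch below orders jobs with a topological sort (Kahn's algorithm) over plain strings; it does not use Pig's MapReduce plan classes, and `ScalarJobOrdering` and the job names in the usage note are hypothetical.

```java
import java.util.*;

// Hypothetical sketch: run every job only after the jobs it depends on.
// Jobs are plain strings here; Pig's real plan uses MapReduceOper nodes.
final class ScalarJobOrdering {
    private ScalarJobOrdering() {}

    /**
     * @param deps maps each job to the set of jobs that must finish before it
     * @return one valid execution order
     * @throws IllegalStateException if the dependencies contain a cycle
     */
    static List<String> order(Map<String, Set<String>> deps) {
        // Count unmet prerequisites per job and record reverse edges.
        Map<String, Integer> pending = new HashMap<>();
        Map<String, List<String>> dependents = new HashMap<>();
        for (Map.Entry<String, Set<String>> e : deps.entrySet()) {
            pending.merge(e.getKey(), 0, Integer::sum);
            for (String pre : e.getValue()) {
                pending.merge(pre, 0, Integer::sum);
                pending.merge(e.getKey(), 1, Integer::sum);
                dependents.computeIfAbsent(pre, k -> new ArrayList<>()).add(e.getKey());
            }
        }
        // Start with jobs that have no prerequisites (sorted for a deterministic order).
        Deque<String> ready = new ArrayDeque<>();
        new TreeMap<>(pending).forEach((job, n) -> { if (n == 0) ready.add(job); });
        List<String> result = new ArrayList<>();
        while (!ready.isEmpty()) {
            String job = ready.poll();
            result.add(job);
            // Releasing a job may make its dependents runnable.
            for (String next : dependents.getOrDefault(job, List.of())) {
                if (pending.merge(next, -1, Integer::sum) == 0) ready.add(next);
            }
        }
        if (result.size() != pending.size()) {
            throw new IllegalStateException("cyclic job dependencies");
        }
        return result;
    }
}
```

For example, if a hypothetical `foreachY` job depends on `storeC`, `order` returns `[storeC, foreachY]`, so the scalar file exists before it is read.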
[jira] Updated: (PIG-928) UDFs in scripting languages
[ https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aniket Mokashi updated PIG-928: --- Attachment: RegisterPythonUDFLatest.patch Added new test cases for tuple and bag scenarios and moved them to a new test file. Fixed the exception handling. Added detailed comments. UDFs in scripting languages --- Key: PIG-928 URL: https://issues.apache.org/jira/browse/PIG-928 Project: Pig Issue Type: New Feature Reporter: Alan Gates Assignee: Aniket Mokashi Fix For: 0.8.0 Attachments: calltrace.png, package.zip, PIG-928.patch, pig-greek.tgz, pig.scripting.patch.arnab, pyg.tgz, RegisterPythonUDF3.patch, RegisterPythonUDF4.patch, RegisterPythonUDF_Final.patch, RegisterPythonUDFFinale.patch, RegisterPythonUDFFinale3.patch, RegisterPythonUDFFinale4.patch, RegisterPythonUDFFinale5.patch, RegisterPythonUDFLatest.patch, RegisterScriptUDFDefineParse.patch, scripting.tgz, scripting.tgz, test.zip It should be possible to write UDFs in scripting languages such as python, ruby, etc. This frees users from needing to compile Java, generate a jar, etc. It also opens Pig to programmers who prefer scripting languages over Java. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-928) UDFs in scripting languages
[ https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888979#action_12888979 ] Aniket Mokashi commented on PIG-928: To illustrate the behavior of EvalFunc<Object>, consider the following UDF:
{code}
import java.io.IOException;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.DataType;
import org.apache.pig.data.Tuple;
import org.apache.pig.impl.logicalLayer.schema.Schema;

public class UDF1 extends EvalFunc<Object> {
    // A plain Java type that Pig has no serializer for.
    class Student {
        int age;
        String name;

        Student(int a, String nm) {
            age = a;
            name = nm;
        }
    }

    @Override
    public Object exec(Tuple input) throws IOException {
        return new Student(12, (String) input.get(0));
    }

    @Override
    public Schema outputSchema(Schema input) {
        // Declares the output as bytearray even though exec() returns a Student.
        return new Schema(new Schema.FieldSchema(null, DataType.BYTEARRAY));
    }
}
{code}
Although this UDF declares its output schema as bytearray, it fails because Pig does not know how to deserialize Student. This is caused by a bug in POUserFunc, which fails to convert the result to a bytearray: the check res.result != null should be changed to result.result != null. UDFs in scripting languages --- Key: PIG-928 URL: https://issues.apache.org/jira/browse/PIG-928 Project: Pig Issue Type: New Feature Reporter: Alan Gates Assignee: Aniket Mokashi Fix For: 0.8.0 Attachments: calltrace.png, package.zip, PIG-928.patch, pig-greek.tgz, pig.scripting.patch.arnab, pyg.tgz, RegisterPythonUDF3.patch, RegisterPythonUDF4.patch, RegisterPythonUDF_Final.patch, RegisterPythonUDFFinale.patch, RegisterPythonUDFFinale3.patch, RegisterPythonUDFFinale4.patch, RegisterPythonUDFFinale5.patch, RegisterScriptUDFDefineParse.patch, scripting.tgz, scripting.tgz, test.zip It should be possible to write UDFs in scripting languages such as python, ruby, etc. This frees users from needing to compile Java, generate a jar, etc. It also opens Pig to programmers who prefer scripting languages over Java. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
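The conversion this comment refers to can be sketched as a standalone rule: a null result stays null, a result that is already byte[] passes through, and anything else is serialized via toString(). This is a hypothetical helper, not POUserFunc's actual code, and the explicit UTF-8 charset is an assumption (the comment only says toString().getBytes(), which uses the platform default).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper mirroring the described POUserFunc behavior:
// only a non-null result is converted, and byte[] results are returned unchanged.
final class ByteArrayConversion {
    private ByteArrayConversion() {}

    static byte[] toByteArray(Object result) {
        if (result == null) {
            return null;                      // nothing to convert; keep null semantics
        }
        if (result instanceof byte[]) {
            return (byte[]) result;           // already raw bytes: pass through untouched
        }
        // Fallback described in the comment: serialize via toString() (UTF-8 assumed here).
        return result.toString().getBytes(StandardCharsets.UTF_8);
    }
}
```

Under this rule, the Student object from the UDF above would fall back to the default Object.toString() text (class name and hash code), so the resulting bytes would not carry the student's fields.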
[jira] Commented: (PIG-928) UDFs in scripting languages
[ https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888232#action_12888232 ] Aniket Mokashi commented on PIG-928: Thanks for your comments. I will make the required changes. bq. Do you want to allow: register myJavaUDFs.jar using 'java' as 'javaNameSpace' ? One use case: if we allow namespaces for non-Java UDFs, why not allow them for Java UDFs as well? But DEFINE exists exactly for this purpose, so it makes sense to throw an exception in such a case. myJavaUDFs.jar can itself have a package structure that defines its own namespace; for example, maths.jar has the function math.sin, etc. I will throw a ParseException for such a case. bq. ScriptEngine.getInstance() should be a singleton, no? getInstance is a factory method that returns an instance of ScriptEngine based on its type. We create a new instance of the ScriptEngine so that if registerCode is called concurrently, each invocation gets its own interpreter for registering its script with Pig. bq. In JythonScriptEngine.getFunction() I think you should check if interpreter.get(functionName) != null and then return it and call Interpreter.init(path) only if its null. This behavior is consistent with the interpreter.get method, which returns null if a resource is not found inside the script. Callers of this function handle RuntimeExceptions. Also, we fail much earlier if we try to access functions that are not already present/registered, so it should be safe. Moreover, interpreter is never null because it is a static member of JythonScriptEngine, instantiated statically. bq. I didn't get why the changes are required in POUserFunc. Can you explain and also add it as comments in the code. POUserFunc has a possible bug: it checks res.result != null when res.result is always null at this point.
If the expected return type is bytearray, we convert the returned object to byte[] with toString().getBytes() (a path that was never reached because of the bug mentioned above), but when the return type is already byte[] we need special handling (this is not the case for other EvalFuncs, as they generally return Pig types). bq. Instead of adding query through pigServer.registerCode() api, add it through pigServer.registerQuery(register myscript.py using jython). This will make sure we are testing changes in QueryParser.jjt as well. register is a Grunt command parsed by GruntParser, so it does not go through QueryParser. We call registerCode directly from GruntParser. Also, the parsing logic is trivial. UDFs in scripting languages --- Key: PIG-928 URL: https://issues.apache.org/jira/browse/PIG-928 Project: Pig Issue Type: New Feature Reporter: Alan Gates Assignee: Aniket Mokashi Fix For: 0.8.0 Attachments: calltrace.png, package.zip, PIG-928.patch, pig-greek.tgz, pig.scripting.patch.arnab, pyg.tgz, RegisterPythonUDF3.patch, RegisterPythonUDF4.patch, RegisterPythonUDF_Final.patch, RegisterPythonUDFFinale.patch, RegisterPythonUDFFinale3.patch, RegisterPythonUDFFinale4.patch, RegisterPythonUDFFinale5.patch, RegisterScriptUDFDefineParse.patch, scripting.tgz, scripting.tgz, test.zip It should be possible to write UDFs in scripting languages such as python, ruby, etc. This frees users from needing to compile Java, generate a jar, etc. It also opens Pig to programmers who prefer scripting languages over Java. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
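The reply about ScriptEngine.getInstance() describes a factory rather than a singleton: each call returns a fresh engine for the requested type, so concurrent registerCode calls do not share an engine object. The sketch below shows that pattern with hypothetical `ScriptEngineSketch`, `JythonEngineSketch`, and `ScriptEngineFactorySketch` types and an assumed type key `"jython"`; it is not Pig's ScriptEngine API.

```java
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical engine abstraction; Pig's real ScriptEngine has a different API.
interface ScriptEngineSketch {
    String language();
}

final class JythonEngineSketch implements ScriptEngineSketch {
    @Override
    public String language() {
        return "jython";
    }
}

final class ScriptEngineFactorySketch {
    // Maps a script type to a constructor; each lookup builds a new instance.
    private static final Map<String, Supplier<ScriptEngineSketch>> ENGINES =
            Map.of("jython", JythonEngineSketch::new);

    private ScriptEngineFactorySketch() {}

    /** Factory method: returns a new engine on every call, never a shared singleton. */
    static ScriptEngineSketch getInstance(String scriptType) {
        Supplier<ScriptEngineSketch> ctor = ENGINES.get(scriptType);
        if (ctor == null) {
            throw new IllegalArgumentException("Unsupported script type: " + scriptType);
        }
        return ctor.get();
    }
}
```

Registering a table of constructors keeps the factory open to other languages (for example a Ruby engine) without changing getInstance itself.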
[jira] Commented: (PIG-1434) Allow casting relations to scalars
[ https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888073#action_12888073 ] Aniket Mokashi commented on PIG-1434: - I mean cases where users type an alias that currently cannot be used inside a foreach statement, Pig accidentally treats it as a scalar, and the script then fails at runtime (and may not fail in one-liner sample cases). For example: Y = foreach Z generate C.$0; where C is not a scalar. Currently this throws an error up front for an erroneous use of C (either a logical mistake, because the user does not know the restrictions on foreach statements, or a typo). But once we add support for scalars, Pig may conclude that C is being used as a scalar and generate the plans accordingly. Introducing the square-bracket syntax ensures that the user intended to use C as a scalar and did not introduce it by mistake. A cast would also work for this, but since we have introduced scalar projections (C.count, C.max, etc.), there are already cases where the user may mean to cast fields (count, max) rather than the scalars themselves. Allow casting relations to scalars -- Key: PIG-1434 URL: https://issues.apache.org/jira/browse/PIG-1434 Project: Pig Issue Type: Improvement Reporter: Olga Natkovich Assignee: Aniket Mokashi Fix For: 0.8.0 Attachments: scalarImpl.patch This jira is to implement a simplified version of the functionality described in https://issues.apache.org/jira/browse/PIG-801. The proposal is to allow casting relations to scalar types in foreach. Example: A = load 'data' as (x, y, z); B = group A all; C = foreach B generate COUNT(A); . X = Y = foreach X generate $1/(long) C; Couple of additional comments: (1) You can only cast relations including a single value or an error will be reported (2) Name resolution is needed since relation X might have field named C in which case that field takes precedence. (3) Y will look for C closest to it.
Implementation thoughts: The idea is to store C into a file and then convert it into scalar via a UDF. I believe we already have a UDF that Ben Reed contributed for this purpose. Most of the work would be to update the logical plan to (1) Store C (2) convert the cast to the UDF -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
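The concern here is that, without explicit syntax, any unresolved alias could silently become a scalar. The sketch below contrasts the two rules under stated assumptions: a field of the input takes precedence (as the issue description requires), an explicitly bracketed reference is a scalar only if such a relation exists, and an unbracketed name that is not a field is rejected up front. `ScalarResolver` and its `Kind` values are hypothetical, not Pig classes.

```java
import java.util.Set;

// Hypothetical resolver contrasting explicit ([C]) and implicit (C) scalar references.
final class ScalarResolver {
    enum Kind { FIELD, SCALAR }

    private ScalarResolver() {}

    /**
     * @param name        identifier written inside the foreach, e.g. "C"
     * @param bracketed   true if the user wrote [C] instead of C
     * @param inputFields field names of the relation being iterated over
     * @param relations   aliases of relations defined earlier in the script
     */
    static Kind resolve(String name, boolean bracketed,
                        Set<String> inputFields, Set<String> relations) {
        if (bracketed) {
            // The user explicitly asked for a scalar, so a missing relation is an error.
            if (!relations.contains(name)) {
                throw new IllegalArgumentException("[" + name + "] is not a relation");
            }
            return Kind.SCALAR;
        }
        if (inputFields.contains(name)) {
            return Kind.FIELD; // a field named C takes precedence over relation C
        }
        // Without brackets, do not guess a scalar: fail up front instead of at runtime.
        throw new IllegalArgumentException(name + " is not a field of the input");
    }
}
```

The design choice is that a typo such as `B` instead of `b` produces an immediate front-end error rather than a scalar lookup that only fails when the job runs.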
[jira] Updated: (PIG-928) UDFs in scripting languages
[ https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aniket Mokashi updated PIG-928: --- Status: Patch Available (was: Open) UDFs in scripting languages --- Key: PIG-928 URL: https://issues.apache.org/jira/browse/PIG-928 Project: Pig Issue Type: New Feature Reporter: Alan Gates Assignee: Aniket Mokashi Fix For: 0.8.0 Attachments: calltrace.png, package.zip, PIG-928.patch, pig-greek.tgz, pig.scripting.patch.arnab, pyg.tgz, RegisterPythonUDF3.patch, RegisterPythonUDF4.patch, RegisterPythonUDF_Final.patch, RegisterPythonUDFFinale.patch, RegisterPythonUDFFinale3.patch, RegisterPythonUDFFinale4.patch, RegisterPythonUDFFinale5.patch, RegisterScriptUDFDefineParse.patch, scripting.tgz, scripting.tgz, test.zip It should be possible to write UDFs in scripting languages such as python, ruby, etc. This frees users from needing to compile Java, generate a jar, etc. It also opens Pig to programmers who prefer scripting languages over Java. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-928) UDFs in scripting languages
[ https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aniket Mokashi updated PIG-928: --- Attachment: (was: RegisterPythonUDF2.patch) UDFs in scripting languages --- Key: PIG-928 URL: https://issues.apache.org/jira/browse/PIG-928 Project: Pig Issue Type: New Feature Reporter: Alan Gates Assignee: Aniket Mokashi Fix For: 0.8.0 Attachments: calltrace.png, package.zip, PIG-928.patch, pig-greek.tgz, pig.scripting.patch.arnab, pyg.tgz, RegisterPythonUDF3.patch, RegisterPythonUDF4.patch, RegisterPythonUDF_Final.patch, RegisterPythonUDFFinale.patch, RegisterPythonUDFFinale3.patch, RegisterPythonUDFFinale4.patch, RegisterPythonUDFFinale5.patch, RegisterScriptUDFDefineParse.patch, scripting.tgz, scripting.tgz, test.zip It should be possible to write UDFs in scripting languages such as python, ruby, etc. This frees users from needing to compile Java, generate a jar, etc. It also opens Pig to programmers who prefer scripting languages over Java. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-928) UDFs in scripting languages
[ https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aniket Mokashi updated PIG-928: --- Status: Open (was: Patch Available) UDFs in scripting languages --- Key: PIG-928 URL: https://issues.apache.org/jira/browse/PIG-928 Project: Pig Issue Type: New Feature Reporter: Alan Gates Assignee: Aniket Mokashi Fix For: 0.8.0 Attachments: calltrace.png, package.zip, PIG-928.patch, pig-greek.tgz, pig.scripting.patch.arnab, pyg.tgz, RegisterPythonUDF3.patch, RegisterPythonUDF4.patch, RegisterPythonUDF_Final.patch, RegisterPythonUDFFinale.patch, RegisterPythonUDFFinale3.patch, RegisterPythonUDFFinale4.patch, RegisterPythonUDFFinale5.patch, RegisterScriptUDFDefineParse.patch, scripting.tgz, scripting.tgz, test.zip It should be possible to write UDFs in scripting languages such as python, ruby, etc. This frees users from needing to compile Java, generate a jar, etc. It also opens Pig to programmers who prefer scripting languages over Java. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-928) UDFs in scripting languages
[ https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aniket Mokashi updated PIG-928: --- Attachment: RegisterPythonUDFFinale4.patch Fixed @@@ related stuff... Parsing of schema from decorators is postponed until the constructor. Fixed some test related changes. UDFs in scripting languages --- Key: PIG-928 URL: https://issues.apache.org/jira/browse/PIG-928 Project: Pig Issue Type: New Feature Reporter: Alan Gates Assignee: Aniket Mokashi Fix For: 0.8.0 Attachments: calltrace.png, package.zip, PIG-928.patch, pig-greek.tgz, pig.scripting.patch.arnab, pyg.tgz, RegisterPythonUDF2.patch, RegisterPythonUDF3.patch, RegisterPythonUDF4.patch, RegisterPythonUDF_Final.patch, RegisterPythonUDFFinale.patch, RegisterPythonUDFFinale3.patch, RegisterPythonUDFFinale4.patch, RegisterScriptUDFDefineParse.patch, scripting.tgz, scripting.tgz, test.zip It should be possible to write UDFs in scripting languages such as python, ruby, etc. This frees users from needing to compile Java, generate a jar, etc. It also opens Pig to programmers who prefer scripting languages over Java. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-928) UDFs in scripting languages
[ https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aniket Mokashi updated PIG-928: --- Status: Patch Available (was: Open) UDFs in scripting languages --- Key: PIG-928 URL: https://issues.apache.org/jira/browse/PIG-928 Project: Pig Issue Type: New Feature Reporter: Alan Gates Assignee: Aniket Mokashi Fix For: 0.8.0 Attachments: calltrace.png, package.zip, PIG-928.patch, pig-greek.tgz, pig.scripting.patch.arnab, pyg.tgz, RegisterPythonUDF2.patch, RegisterPythonUDF3.patch, RegisterPythonUDF4.patch, RegisterPythonUDF_Final.patch, RegisterPythonUDFFinale.patch, RegisterPythonUDFFinale3.patch, RegisterPythonUDFFinale4.patch, RegisterScriptUDFDefineParse.patch, scripting.tgz, scripting.tgz, test.zip It should be possible to write UDFs in scripting languages such as python, ruby, etc. This frees users from needing to compile Java, generate a jar, etc. It also opens Pig to programmers who prefer scripting languages over Java. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1434) Allow casting relations to scalars
[ https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12886914#action_12886914 ] Aniket Mokashi commented on PIG-1434: - Adding this support makes the Pig code complicated and hacky, because we treat any otherwise unresolved alias (AliasFieldOrSpec) as a scalar and try to resolve it as a scalar at runtime. To simplify, a square-bracket syntax is a better idea, for example: {code} Y = foreach Z generate X::$1/(long) [C].count, X::$2-(long) [C].max; {code} Otherwise, such queries (if typed by mistake) can result in non-intuitive errors for users. Allow casting relations to scalars -- Key: PIG-1434 URL: https://issues.apache.org/jira/browse/PIG-1434 Project: Pig Issue Type: Improvement Reporter: Olga Natkovich Assignee: Aniket Mokashi Fix For: 0.8.0 Attachments: scalarImpl.patch This jira is to implement a simplified version of the functionality described in https://issues.apache.org/jira/browse/PIG-801. The proposal is to allow casting relations to scalar types in foreach. Example: A = load 'data' as (x, y, z); B = group A all; C = foreach B generate COUNT(A); . X = Y = foreach X generate $1/(long) C; Couple of additional comments: (1) You can only cast relations including a single value or an error will be reported (2) Name resolution is needed since relation X might have field named C in which case that field takes precedence. (3) Y will look for C closest to it. Implementation thoughts: The idea is to store C into a file and then convert it into scalar via a UDF. I believe we already have a UDF that Ben Reed contributed for this purpose. Most of the work would be to update the logical plan to (1) Store C (2) convert the cast to the UDF -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
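To make the proposed syntax concrete, the sketch below recognizes a reference of the form `[C].count` or `[C].$1` with a regular expression and splits it into the relation alias and the projected field. The pattern and the `BracketedScalarRef` type are assumptions for illustration; an actual implementation would live in Pig's QueryParser grammar, not in a regex.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for the proposed "[alias].field" scalar syntax.
final class BracketedScalarRef {
    // [alias] followed by either a named field or a positional field such as $1.
    private static final Pattern REF = Pattern.compile(
            "\\[([A-Za-z_][A-Za-z0-9_]*)\\]\\.([A-Za-z_][A-Za-z0-9_]*|\\$\\d+)");

    final String alias;
    final String field;

    private BracketedScalarRef(String alias, String field) {
        this.alias = alias;
        this.field = field;
    }

    /** Returns the parsed reference, or empty if the text is not a bracketed scalar. */
    static Optional<BracketedScalarRef> parse(String text) {
        Matcher m = REF.matcher(text.trim());
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(new BracketedScalarRef(m.group(1), m.group(2)));
    }
}
```

A plain `C.count` does not match, which is the point of the proposal: only the bracketed form is treated as a scalar reference.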
[jira] Updated: (PIG-928) UDFs in scripting languages
[ https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aniket Mokashi updated PIG-928: --- Attachment: RegisterPythonUDFFinale5.patch UDFs in scripting languages --- Key: PIG-928 URL: https://issues.apache.org/jira/browse/PIG-928 Project: Pig Issue Type: New Feature Reporter: Alan Gates Assignee: Aniket Mokashi Fix For: 0.8.0 Attachments: calltrace.png, package.zip, PIG-928.patch, pig-greek.tgz, pig.scripting.patch.arnab, pyg.tgz, RegisterPythonUDF2.patch, RegisterPythonUDF3.patch, RegisterPythonUDF4.patch, RegisterPythonUDF_Final.patch, RegisterPythonUDFFinale.patch, RegisterPythonUDFFinale3.patch, RegisterPythonUDFFinale4.patch, RegisterPythonUDFFinale5.patch, RegisterScriptUDFDefineParse.patch, scripting.tgz, scripting.tgz, test.zip It should be possible to write UDFs in scripting languages such as python, ruby, etc. This frees users from needing to compile Java, generate a jar, etc. It also opens Pig to programmers who prefer scripting languages over Java. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-928) UDFs in scripting languages
[ https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12886530#action_12886530 ] Aniket Mokashi commented on PIG-928: I have added a wiki page describing the usage and syntax: http://wiki.apache.org/pig/UDFsUsingScriptingLanguages. UDFs in scripting languages --- Key: PIG-928 URL: https://issues.apache.org/jira/browse/PIG-928 Project: Pig Issue Type: New Feature Reporter: Alan Gates Assignee: Aniket Mokashi Fix For: 0.8.0 Attachments: calltrace.png, package.zip, PIG-928.patch, pig-greek.tgz, pig.scripting.patch.arnab, pyg.tgz, RegisterPythonUDF2.patch, RegisterPythonUDF3.patch, RegisterPythonUDF4.patch, RegisterPythonUDF_Final.patch, RegisterPythonUDFFinale.patch, RegisterPythonUDFFinale3.patch, RegisterScriptUDFDefineParse.patch, scripting.tgz, scripting.tgz, test.zip It should be possible to write UDFs in scripting languages such as python, ruby, etc. This frees users from needing to compile Java, generate a jar, etc. It also opens Pig to programmers who prefer scripting languages over Java. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-928) UDFs in scripting languages
[ https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aniket Mokashi updated PIG-928: --- Status: Open (was: Patch Available) UDFs in scripting languages --- Key: PIG-928 URL: https://issues.apache.org/jira/browse/PIG-928 Project: Pig Issue Type: New Feature Reporter: Alan Gates Assignee: Aniket Mokashi Fix For: 0.8.0 Attachments: calltrace.png, package.zip, PIG-928.patch, pig-greek.tgz, pig.scripting.patch.arnab, pyg.tgz, RegisterPythonUDF2.patch, RegisterPythonUDF3.patch, RegisterPythonUDF4.patch, RegisterPythonUDF_Final.patch, RegisterPythonUDFFinale.patch, RegisterPythonUDFFinale3.patch, RegisterScriptUDFDefineParse.patch, scripting.tgz, scripting.tgz, test.zip It should be possible to write UDFs in scripting languages such as python, ruby, etc. This frees users from needing to compile Java, generate a jar, etc. It also opens Pig to programmers who prefer scripting languages over Java. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-928) UDFs in scripting languages
[ https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12886163#action_12886163 ] Aniket Mokashi commented on PIG-928: I see what you mean: if a user needs a generic square function, they can write:
{code}
#!/usr/bin/python
@outputSchemaFunction("squareSchema")
def square(number):
    return (number * number)

def squareSchema(input):
    return input
{code}
I will make changes so that I can use an approach similar to pig-greek. Since outputSchema needs to know both the input and the name of the outputSchemaFunction, the current code needs further changes. UDFs in scripting languages --- Key: PIG-928 URL: https://issues.apache.org/jira/browse/PIG-928 Project: Pig Issue Type: New Feature Reporter: Alan Gates Assignee: Aniket Mokashi Fix For: 0.8.0 Attachments: calltrace.png, package.zip, PIG-928.patch, pig-greek.tgz, pig.scripting.patch.arnab, pyg.tgz, RegisterPythonUDF2.patch, RegisterPythonUDF3.patch, RegisterPythonUDF4.patch, RegisterPythonUDFFinale.patch, RegisterPythonUDFFinale3.patch, RegisterScriptUDFDefineParse.patch, scripting.tgz, scripting.tgz, test.zip It should be possible to write UDFs in scripting languages such as python, ruby, etc. This frees users from needing to compile Java, generate a jar, etc. It also opens Pig to programmers who prefer scripting languages over Java. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
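The reason outputSchema needs both pieces of information is that the output schema is computed by calling a second, named script function on the input schema. The sketch below models that lookup on the Java side with plain strings standing in for schemas; `ScriptFunctionRegistry`, its methods, and the string schema format are hypothetical, not Pig's API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical registry; schemas are plain strings such as "number:long".
final class ScriptFunctionRegistry {
    // Schema functions defined in the script, keyed by name (e.g. "squareSchema").
    private final Map<String, UnaryOperator<String>> schemaFunctions = new HashMap<>();
    // For each UDF, the name given in its @outputSchemaFunction decorator.
    private final Map<String, String> outputSchemaFunctionOf = new HashMap<>();

    void registerSchemaFunction(String name, UnaryOperator<String> fn) {
        schemaFunctions.put(name, fn);
    }

    void registerUdf(String udfName, String schemaFunctionName) {
        outputSchemaFunctionOf.put(udfName, schemaFunctionName);
    }

    /** Computes a UDF's output schema by applying its named schema function to the input. */
    String outputSchema(String udfName, String inputSchema) {
        String fnName = outputSchemaFunctionOf.get(udfName);
        if (fnName == null) {
            throw new IllegalArgumentException(udfName + " has no outputSchemaFunction");
        }
        UnaryOperator<String> fn = schemaFunctions.get(fnName);
        if (fn == null) {
            throw new IllegalStateException("schema function not found: " + fnName);
        }
        return fn.apply(inputSchema);
    }
}
```

Mirroring squareSchema, which returns its input unchanged, registering `square` with `in -> in` makes its output schema follow whatever numeric type it is called with.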
[jira] Updated: (PIG-928) UDFs in scripting languages
[ https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aniket Mokashi updated PIG-928: --- Attachment: RegisterPythonUDFFinale3.patch Thanks, Dmitriy and Julien, for your help. Attached is the patch with test cases. The tests passed when run manually. UDFs in scripting languages --- Key: PIG-928 URL: https://issues.apache.org/jira/browse/PIG-928 Project: Pig Issue Type: New Feature Reporter: Alan Gates Assignee: Aniket Mokashi Fix For: 0.8.0 Attachments: calltrace.png, package.zip, PIG-928.patch, pig-greek.tgz, pig.scripting.patch.arnab, pyg.tgz, RegisterPythonUDF2.patch, RegisterPythonUDF3.patch, RegisterPythonUDF4.patch, RegisterPythonUDFFinale.patch, RegisterPythonUDFFinale3.patch, RegisterScriptUDFDefineParse.patch, scripting.tgz, scripting.tgz, test.zip It should be possible to write UDFs in scripting languages such as python, ruby, etc. This frees users from needing to compile Java, generate a jar, etc. It also opens Pig to programmers who prefer scripting languages over Java. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-928) UDFs in scripting languages
[ https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aniket Mokashi updated PIG-928: --- Status: Patch Available (was: Open) UDFs in scripting languages --- Key: PIG-928 URL: https://issues.apache.org/jira/browse/PIG-928 Project: Pig Issue Type: New Feature Reporter: Alan Gates Assignee: Aniket Mokashi Fix For: 0.8.0 Attachments: calltrace.png, package.zip, PIG-928.patch, pig-greek.tgz, pig.scripting.patch.arnab, pyg.tgz, RegisterPythonUDF2.patch, RegisterPythonUDF3.patch, RegisterPythonUDF4.patch, RegisterPythonUDFFinale.patch, RegisterPythonUDFFinale3.patch, RegisterScriptUDFDefineParse.patch, scripting.tgz, scripting.tgz, test.zip It should be possible to write UDFs in scripting languages such as python, ruby, etc. This frees users from needing to compile Java, generate a jar, etc. It also opens Pig to programmers who prefer scripting languages over Java. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-928) UDFs in scripting languages
[ https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12884841#action_12884841 ] Aniket Mokashi commented on PIG-928: The fix needed some changes in QueryParser to support namespaces; I found this through the test cases I added. The current EvalFuncSpec logic is convoluted, so I replaced it with a cleaner one. I have attached the updated patch with the changes mentioned above. I am not sure what needs to be done for jython.jar; my guess was to check it in under /lib. Thoughts? UDFs in scripting languages --- Key: PIG-928 URL: https://issues.apache.org/jira/browse/PIG-928 Project: Pig Issue Type: New Feature Reporter: Alan Gates Assignee: Aniket Mokashi Fix For: 0.8.0 Attachments: calltrace.png, package.zip, PIG-928.patch, pig-greek.tgz, pig.scripting.patch.arnab, pyg.tgz, RegisterPythonUDF2.patch, RegisterPythonUDF3.patch, RegisterPythonUDF4.patch, RegisterScriptUDFDefineParse.patch, scripting.tgz, scripting.tgz, test.zip It should be possible to write UDFs in scripting languages such as python, ruby, etc. This frees users from needing to compile Java, generate a jar, etc. It also opens Pig to programmers who prefer scripting languages over Java. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
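Supporting namespaces means a call such as `myfuncs.square` has to be split into a namespace and a function name, then looked up among the functions registered under that namespace. A minimal sketch, assuming a hypothetical `NamespacedRegistry` that stores functions per namespace and splits on the last dot; this is not the EvalFuncSpec logic from the patch, and the implementation identifiers are made-up strings.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical registry resolving "namespace.function" names.
final class NamespacedRegistry {
    // namespace -> (function name -> implementation identifier)
    private final Map<String, Map<String, String>> byNamespace = new HashMap<>();

    void register(String namespace, String function, String impl) {
        byNamespace.computeIfAbsent(namespace, k -> new HashMap<>()).put(function, impl);
    }

    /** Resolves a qualified name by splitting on the last '.', e.g. "myfuncs.square". */
    Optional<String> resolve(String qualifiedName) {
        int dot = qualifiedName.lastIndexOf('.');
        if (dot <= 0 || dot == qualifiedName.length() - 1) {
            return Optional.empty(); // not of the form namespace.function
        }
        String namespace = qualifiedName.substring(0, dot);
        String function = qualifiedName.substring(dot + 1);
        return Optional.ofNullable(byNamespace.getOrDefault(namespace, Map.of()).get(function));
    }
}
```

Splitting on the last dot lets a namespace itself contain dots, which matches how Java package-qualified names like math.sin are written.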
[jira] Updated: (PIG-928) UDFs in scripting languages
[ https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aniket Mokashi updated PIG-928: --- Attachment: RegisterPythonUDFFinale.patch Changes needed for script UDFs. TODO: jython.jar-related changes. UDFs in scripting languages --- Key: PIG-928 URL: https://issues.apache.org/jira/browse/PIG-928 Project: Pig Issue Type: New Feature Reporter: Alan Gates Assignee: Aniket Mokashi Fix For: 0.8.0 Attachments: calltrace.png, package.zip, PIG-928.patch, pig-greek.tgz, pig.scripting.patch.arnab, pyg.tgz, RegisterPythonUDF2.patch, RegisterPythonUDF3.patch, RegisterPythonUDF4.patch, RegisterPythonUDFFinale.patch, RegisterScriptUDFDefineParse.patch, scripting.tgz, scripting.tgz, test.zip It should be possible to write UDFs in scripting languages such as python, ruby, etc. This frees users from needing to compile Java, generate a jar, etc. It also opens Pig to programmers who prefer scripting languages over Java. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-928) UDFs in scripting languages
[ https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12884378#action_12884378 ] Aniket Mokashi commented on PIG-928: An extension of this JIRA, to track progress on inline script UDFs with a DEFINE clause, has been filed at https://issues.apache.org/jira/browse/PIG-1471 UDFs in scripting languages --- Key: PIG-928 URL: https://issues.apache.org/jira/browse/PIG-928 Project: Pig Issue Type: New Feature Reporter: Alan Gates Assignee: Aniket Mokashi Fix For: 0.8.0 Attachments: calltrace.png, package.zip, pig-greek.tgz, pig.scripting.patch.arnab, pyg.tgz, RegisterPythonUDF2.patch, RegisterPythonUDF3.patch, RegisterPythonUDF4.patch, RegisterScriptUDFDefineParse.patch, scripting.tgz, scripting.tgz, test.zip It should be possible to write UDFs in scripting languages such as python, ruby, etc. This frees users from needing to compile Java, generate a jar, etc. It also opens Pig to programmers who prefer scripting languages over Java. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-506) Does pig need a NATIVE keyword?
[ https://issues.apache.org/jira/browse/PIG-506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12884492#action_12884492 ] Aniket Mokashi commented on PIG-506: For better reuse of the parser code, with less distortion of the syntax, we can use: {code} B = NATIVE ('mymr.jar' [, 'other.jar' ...]) STORE A INTO 'storeLocation' USING storeFunc LOAD 'loadLocation' USING loadFunc [params, ... ]; {code} params is needed because some MapReduce jobs take parameters. Also, we assume that mymr.jar has a main method responsible for setting up the required JobConf. Alternatively, mymr.jar could provide a (documented) getJobConf() hook for Pig, so that Pig can take the JobConf from the mymr job, add more settings if needed, and run the job. Does pig need a NATIVE keyword? --- Key: PIG-506 URL: https://issues.apache.org/jira/browse/PIG-506 Project: Pig Issue Type: New Feature Components: impl Reporter: Alan Gates Assignee: Aniket Mokashi Priority: Minor Assume a user had a job that broke easily into three pieces. Further assume that pieces one and three were easily expressible in pig, but that piece two needed to be written in map reduce for whatever reason (performance, something that pig could not easily express, legacy job that was too important to change, etc.). Today the user would either have to use map reduce for the entire job or manually handle the stitching together of pig and map reduce jobs. What if instead pig provided a NATIVE keyword that would allow the script to pass off the data stream to the underlying system (in this case map reduce). The semantics of NATIVE would vary by underlying system. In the map reduce case, we would assume that this indicated a collection of one or more fully contained map reduce jobs, so that pig would store the data, invoke the map reduce jobs, and then read the resulting data to continue.
It might look something like this: {code} A = load 'myfile'; X = load 'myotherfile'; B = group A by $0; C = foreach B generate group, myudf(B); D = native (jar=mymr.jar, infile=frompig outfile=topig); E = join D by $0, X by $0; ... {code} This differs from streaming in that it allows the user to insert an arbitrary amount of native processing, whereas streaming allows the insertion of one binary. It also differs in that, for streaming, data is piped directly into and out of the binary as part of the pig pipeline. Here the pipeline would be broken, data written to disk, and the native block invoked, then data read back from disk. Another alternative is to say this is unnecessary because the user can do the coordination from java, using the PigServer interface to run pig and calling the map reduce job explicitly. The advantages of the native keyword are that the user need not worry about coordination between the jobs; pig will take care of it. Also the user can make use of existing java applications without being a java programmer. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
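The comment above assumes mymr.jar carries its own main method that builds the JobConf and interprets any params. A minimal sketch of how such a jar could be launched, assuming Pig simply hands the jar and the user's params to Hadoop's standard "hadoop jar" launcher (which runs the jar's main class through RunJar). build_native_command is a hypothetical helper for illustration, not Pig code:

```python
import shlex
from typing import List, Sequence

def build_native_command(jar: str, params: Sequence[str]) -> List[str]:
    """Hypothetical helper (not Pig code): argv for launching a native MapReduce jar.

    Assumes the jar's manifest names a main class that sets up its own JobConf
    and interprets `params` itself, as proposed in the comment above.
    """
    if not jar.endswith(".jar"):
        raise ValueError(f"expected a .jar file, got {jar!r}")
    # `hadoop jar <jar> [args...]` runs the jar's main class via Hadoop's RunJar
    return ["hadoop", "jar", jar, *params]

cmd = build_native_command("mymr.jar", ["frompig", "topig"])
print(shlex.join(cmd))  # hadoop jar mymr.jar frompig topig
```

Because the jar's main owns the whole job setup, Pig would only need to know the argv; the getJobConf() alternative would instead let Pig modify the JobConf before submitting it.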
[jira] Commented: (PIG-1434) Allow casting relations to scalars
[ https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12884498#action_12884498 ] Aniket Mokashi commented on PIG-1434: - I agree with Thejas that we should have a way for the user to specify that C is meant to be used as a scalar. This will avoid errors in pig code. For example: {code} Y = foreach X generate $1/(int) [C].count, $2- [C].max, ([C].$1+2); {code} Do we fail if we find that C has more than one row, or do we just ignore the extra rows? Should we try to detect, that C has one row, in frontend? Allow casting relations to scalars -- Key: PIG-1434 URL: https://issues.apache.org/jira/browse/PIG-1434 Project: Pig Issue Type: Improvement Reporter: Olga Natkovich Assignee: Aniket Mokashi Fix For: 0.8.0 Attachments: scalarImpl.patch This jira is to implement a simplified version of the functionality described in https://issues.apache.org/jira/browse/PIG-801. The proposal is to allow casting relations to scalar types in foreach. Example: A = load 'data' as (x, y, z); B = group A all; C = foreach B generate COUNT(A); . X = Y = foreach X generate $1/(long) C; Couple of additional comments: (1) You can only cast relations including a single value or an error will be reported (2) Name resolution is needed since relation X might have field named C in which case that field takes precedence. (3) Y will look for C closest to it. Implementation thoughts: The idea is to store C into a file and then convert it into scalar via a UDF. I believe we already have a UDF that Ben Reed contributed for this purpose. Most of the work would be to update the logical plan to (1) Store C (2) convert the cast to the UDF -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
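The comment leaves open whether a multi-row C should fail or be silently truncated. The issue description says that casting a relation with more than one value should report an error, so this sketch takes the failing route. It models the [C].field references above with rows as Python dicts; scalar_field and the sample values are hypothetical, for illustration only:

```python
from typing import Any, Callable, Dict, List, Optional, Union

def scalar_field(rows: List[Dict[str, Any]], field: Union[str, int],
                 cast: Optional[Callable[[Any], Any]] = None) -> Any:
    """Hypothetical sketch: read one field of a single-row relation as a scalar."""
    if len(rows) != 1:
        # error out rather than silently picking one row
        raise ValueError(f"relation used as scalar has {len(rows)} rows, expected 1")
    row = rows[0]
    # $n-style positional access relies on dicts preserving insertion order
    value = list(row.values())[field] if isinstance(field, int) else row[field]
    return cast(value) if cast is not None else value

C = [{"count": 10, "max": 7}]
print(scalar_field(C, "count", int))  # 10, like (int) [C].count
print(scalar_field(C, 1))             # 7, like [C].$1
```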
[jira] Commented: (PIG-1434) Allow casting relations to scalars
[ https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12884503#action_12884503 ] Aniket Mokashi commented on PIG-1434: - bq. Should we try to detect, that C has one row, in frontend? We can try to detect the patterns that make something a scalar (by marking B (group by all, limit 1) as a scalar, then C as a scalar, etc.) and fail up front otherwise. Allow casting relations to scalars -- Key: PIG-1434 URL: https://issues.apache.org/jira/browse/PIG-1434 Project: Pig Issue Type: Improvement Reporter: Olga Natkovich Assignee: Aniket Mokashi Fix For: 0.8.0 Attachments: scalarImpl.patch This jira is to implement a simplified version of the functionality described in https://issues.apache.org/jira/browse/PIG-801. The proposal is to allow casting relations to scalar types in foreach. Example: A = load 'data' as (x, y, z); B = group A all; C = foreach B generate COUNT(A); . X = Y = foreach X generate $1/(long) C; Couple of additional comments: (1) You can only cast relations including a single value or an error will be reported (2) Name resolution is needed since relation X might have field named C in which case that field takes precedence. (3) Y will look for C closest to it. Implementation thoughts: The idea is to store C into a file and then convert it into scalar via a UDF. I believe we already have a UDF that Ben Reed contributed for this purpose. Most of the work would be to update the logical plan to (1) Store C (2) convert the cast to the UDF -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
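The front-end check proposed above can be sketched as a single pass over the plan that marks which aliases provably produce at most one row: group-all and limit 1 seed the marking, and a plain foreach or filter propagates it. The Op tuple and the operator kinds are a hypothetical miniature plan, not Pig's logical-plan classes, and the sketch assumes foreach does not use FLATTEN:

```python
from typing import Dict, List, NamedTuple, Optional

class Op(NamedTuple):
    alias: str
    kind: str                # "load", "group_all", "limit", "foreach", "filter", "join", ...
    input: Optional[str]     # alias of the (single) input; None for load
    limit: int = 0           # row count, used only when kind == "limit"

def single_row_aliases(plan: List[Op]) -> Dict[str, bool]:
    """Mark aliases that provably produce at most one row (plan in topological order)."""
    single: Dict[str, bool] = {}
    for op in plan:
        if op.kind == "group_all":
            single[op.alias] = True                     # group ... all yields one group
        elif op.kind == "limit":
            single[op.alias] = op.limit <= 1 or single.get(op.input, False)
        elif op.kind in ("foreach", "filter"):
            # assumes no FLATTEN: a plain foreach or a filter never adds rows
            single[op.alias] = single.get(op.input, False)
        else:
            single[op.alias] = False                    # load, join, etc.: unknown row count
    return single

def check_scalar_use(plan: List[Op], alias: str) -> None:
    if not single_row_aliases(plan).get(alias, False):
        raise ValueError(f"{alias} is used as a scalar but may produce more than one row")

plan = [
    Op("A", "load", None),
    Op("B", "group_all", "A"),
    Op("C", "foreach", "B"),
]
check_scalar_use(plan, "C")  # passes: C derives from a group-all
```

A conservative check like this can reject some scripts that would in fact produce one row at runtime, which is the trade-off of failing up front.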
[jira] Updated: (PIG-1434) Allow casting relations to scalars
[ https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aniket Mokashi updated PIG-1434: Status: Open (was: Patch Available) Allow casting relations to scalars -- Key: PIG-1434 URL: https://issues.apache.org/jira/browse/PIG-1434 Project: Pig Issue Type: Improvement Reporter: Olga Natkovich Assignee: Aniket Mokashi Fix For: 0.8.0 Attachments: scalarImpl.patch This jira is to implement a simplified version of the functionality described in https://issues.apache.org/jira/browse/PIG-801. The proposal is to allow casting relations to scalar types in foreach. Example: A = load 'data' as (x, y, z); B = group A all; C = foreach B generate COUNT(A); . X = Y = foreach X generate $1/(long) C; Couple of additional comments: (1) You can only cast relations including a single value or an error will be reported (2) Name resolution is needed since relation X might have field named C in which case that field takes precedence. (3) Y will look for C closest to it. Implementation thoughts: The idea is to store C into a file and then convert it into scalar via a UDF. I believe we already have a UDF that Ben Reed contributed for this purpose. Most of the work would be to update the logical plan to (1) Store C (2) convert the cast to the UDF -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (PIG-1471) inline UDFs in scripting languages
inline UDFs in scripting languages -- Key: PIG-1471 URL: https://issues.apache.org/jira/browse/PIG-1471 Project: Pig Issue Type: New Feature Reporter: Aniket Mokashi Assignee: Aniket Mokashi Fix For: 0.8.0 It should be possible to write UDFs in scripting languages such as python, ruby, etc. This frees users from needing to compile Java, generate a jar, etc. It also opens Pig to programmers who prefer scripting languages over Java. It should be possible to write these scripts inline as part of pig scripts. This feature is an extension of https://issues.apache.org/jira/browse/PIG-928 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1471) inline UDFs in scripting languages
[ https://issues.apache.org/jira/browse/PIG-1471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12883327#action_12883327 ] Aniket Mokashi commented on PIG-1471: - The proposed syntax is {code} define hellopig using org.apache.pig.scripting.jython.JythonScriptEngine as '@outputSchema(x:{t:(word:chararray)})\ndef helloworld():\n\treturn ('Hello, World')'; {code} inline UDFs in scripting languages -- Key: PIG-1471 URL: https://issues.apache.org/jira/browse/PIG-1471 Project: Pig Issue Type: New Feature Reporter: Aniket Mokashi Assignee: Aniket Mokashi Fix For: 0.8.0 It should be possible to write UDFs in scripting languages such as python, ruby, etc. This frees users from needing to compile Java, generate a jar, etc. It also opens Pig to programmers who prefer scripting languages over Java. It should be possible to write these scripts inline as part of pig scripts. This feature is an extension of https://issues.apache.org/jira/browse/PIG-928 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
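In the proposed define, the whole Python body is packed into one string with \n and \t escapes. To show what such a string unpacks to, here is a small sketch that decodes an escaped body and executes it. Two assumptions: the schema argument is quoted and the inner string uses double quotes, because the literal text in the proposal (an unquoted schema and single quotes nested inside a single-quoted string) is not valid Python as written; and outputSchema below is a stub that only records its argument, not Pig's decorator. How JythonScriptEngine actually loads the body is not shown:

```python
from typing import Callable

def outputSchema(schema: str):
    """Stub stand-in for Pig's decorator: only records the schema string."""
    def wrap(func: Callable) -> Callable:
        func.output_schema = schema
        return func
    return wrap

# Inline body as it would appear inside the define string: literal backslash escapes.
inline_body = '@outputSchema("x:{t:(word:chararray)}")\\ndef helloworld():\\n\\treturn ("Hello, World")'

def load_inline_udf(body: str, name: str) -> Callable:
    source = body.encode().decode("unicode_escape")  # turn literal \n and \t into real characters
    namespace = {"outputSchema": outputSchema}
    exec(source, namespace)                          # defines the function inside `namespace`
    return namespace[name]

hello = load_inline_udf(inline_body, "helloworld")
print(hello())               # Hello, World
print(hello.output_schema)   # x:{t:(word:chararray)}
```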
[jira] Commented: (PIG-1434) Allow casting relations to scalars
[ https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12882711#action_12882711 ] Aniket Mokashi commented on PIG-1434: - The proposal for scalars is as follows: {code} A = load '1.txt' as (a1, a2); B = group A all; C = foreach B generate COUNT(A); Y = foreach A generate C; store Y into 'Ystore'; {code} Based on the schema of C, we detect that Y means to use C as a scalar, and we track it internally as a scalar. Thus, operations like C * C are also allowed. The limitation is that C must hold a value convertible to long (when stored into the file). Also, (int) C would be allowed and will succeed if the cast operation succeeds. As mentioned by Daniel earlier, there are two challenges in introducing scalars: 1. Addition of the implicit store: We cannot do it too early (during parsing), as we would get redundant (implicit) store operations for the rest of the commands in the script. If we do it too late, the merge algorithm doesn't find the store and discards the branch that compiles and executes the store. To solve this, whenever we process a store plan after the parsing stage, we detect scalars in the plan and add the branches that contain those scalars to the current plan. We also attach LOStores for the scalars and merge the required plan. 2. Tracking of the implicit dependency: A use of scalar C needs to be converted into an implicit ReadScalar operation, and it also needs to add a dependency on the MapReduce job that generates the scalar value. We track this dependency by adding LOScalar and POScalar operators that carry a reference to the scalar they depend upon. When we compile the MapReduce plan, we replace POScalar with POUserFunc to load the scalar value, and we mark the dependency between the two MapReduce jobs. I am attaching the patch with the above-mentioned changes. 
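Why the implicit dependency matters can be shown with a job graph: if the job computing Y and the job computing C both read '1.txt' directly, no data edge orders them, so only the scalar edge guarantees C is stored before Y reads it. A sketch assuming the script splits into two jobs named job_C and job_Y (hypothetical names; the patch tracks this through LOScalar/POScalar, not through this code). It uses graphlib from the Python 3.9+ standard library:

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

def schedule(data_deps: Dict[str, Set[str]], scalar_deps: Dict[str, Set[str]]) -> List[str]:
    """Order jobs so that every producer runs before its consumers.

    data_deps:   job -> jobs whose stored output it reads as input
    scalar_deps: job -> jobs whose stored output it reads as a scalar (the implicit edges)
    """
    graph: Dict[str, Set[str]] = {}
    for deps in (data_deps, scalar_deps):
        for job, producers in deps.items():
            graph.setdefault(job, set()).update(producers)
    return list(TopologicalSorter(graph).static_order())

# Both jobs load '1.txt' themselves, so there is no data edge between them.
data_deps = {"job_C": set(), "job_Y": set()}
scalar_deps = {"job_Y": {"job_C"}}  # Y reads the stored value of C
print(schedule(data_deps, scalar_deps))  # ['job_C', 'job_Y']
```

Dropping scalar_deps leaves the two jobs unordered, which is exactly the failure the dependency tracking is meant to prevent.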
A few known issues: To track the dependencies of scalars, we need access to the map of operators from one type of plan to the other, but this map is generated by visitors. The same visitors are responsible for converting LOScalar -> POScalar -> POUserFunc. So, if a visitor visits the LOScalar before the LO associated with the scalar (C in the example), we do not find the PO associated with C. Allow casting relations to scalars -- Key: PIG-1434 URL: https://issues.apache.org/jira/browse/PIG-1434 Project: Pig Issue Type: Improvement Reporter: Olga Natkovich Assignee: Aniket Mokashi Fix For: 0.8.0 Attachments: scalarImpl.patch This jira is to implement a simplified version of the functionality described in https://issues.apache.org/jira/browse/PIG-801. The proposal is to allow casting relations to scalar types in foreach. Example: A = load 'data' as (x, y, z); B = group A all; C = foreach B generate COUNT(A); . X = Y = foreach X generate $1/(long) C; Couple of additional comments: (1) You can only cast relations including a single value or an error will be reported (2) Name resolution is needed since relation X might have field named C in which case that field takes precedence. (3) Y will look for C closest to it. Implementation thoughts: The idea is to store C into a file and then convert it into scalar via a UDF. I believe we already have a UDF that Ben Reed contributed for this purpose. Most of the work would be to update the logical plan to (1) Store C (2) convert the cast to the UDF -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
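The visitor-ordering issue can be reproduced in miniature: a single-pass translator that fills the LO -> PO map as it visits works when C is visited first and fails when the scalar reference comes first. LogicalOp and translate are hypothetical stand-ins, and the LOScalar -> POScalar -> POUserFunc chain is collapsed into one step:

```python
from typing import Dict, List, NamedTuple

class LogicalOp(NamedTuple):
    name: str
    kind: str
    scalar_of: str = ""   # for "LOScalar": the logical operator whose output is the scalar

def translate(ops: List[LogicalOp]) -> Dict[str, str]:
    """Single-pass LO -> PO translation that fills the operator map while visiting."""
    lo_to_po: Dict[str, str] = {}
    for op in ops:
        if op.kind == "LOScalar":
            # needs the PO already created for the referenced LO; KeyError if not visited yet
            source_po = lo_to_po[op.scalar_of]
            lo_to_po[op.name] = f"POUserFunc(reads output of {source_po})"
        else:
            lo_to_po[op.name] = f"PO_{op.name}"
    return lo_to_po

ok = translate([LogicalOp("C", "LOForEach"), LogicalOp("C_ref", "LOScalar", "C")])
print(ok["C_ref"])   # POUserFunc(reads output of PO_C)

try:
    translate([LogicalOp("C_ref", "LOScalar", "C"), LogicalOp("C", "LOForEach")])
except KeyError as missing:
    print("PO not found for", missing)   # PO not found for 'C'
```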
[jira] Updated: (PIG-1434) Allow casting relations to scalars
[ https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aniket Mokashi updated PIG-1434: Attachment: scalarImpl.patch Initial implementation Allow casting relations to scalars -- Key: PIG-1434 URL: https://issues.apache.org/jira/browse/PIG-1434 Project: Pig Issue Type: Improvement Reporter: Olga Natkovich Assignee: Aniket Mokashi Fix For: 0.8.0 Attachments: scalarImpl.patch This jira is to implement a simplified version of the functionality described in https://issues.apache.org/jira/browse/PIG-801. The proposal is to allow casting relations to scalar types in foreach. Example: A = load 'data' as (x, y, z); B = group A all; C = foreach B generate COUNT(A); . X = Y = foreach X generate $1/(long) C; Couple of additional comments: (1) You can only cast relations including a single value or an error will be reported (2) Name resolution is needed since relation X might have field named C in which case that field takes precedence. (3) Y will look for C closest to it. Implementation thoughts: The idea is to store C into a file and then convert it into scalar via a UDF. I believe we already have a UDF that Ben Reed contributed for this purpose. Most of the work would be to update the logical plan to (1) Store C (2) convert the cast to the UDF -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1434) Allow casting relations to scalars
[ https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aniket Mokashi updated PIG-1434: Status: Patch Available (was: Open) Allow casting relations to scalars -- Key: PIG-1434 URL: https://issues.apache.org/jira/browse/PIG-1434 Project: Pig Issue Type: Improvement Reporter: Olga Natkovich Assignee: Aniket Mokashi Fix For: 0.8.0 Attachments: scalarImpl.patch This jira is to implement a simplified version of the functionality described in https://issues.apache.org/jira/browse/PIG-801. The proposal is to allow casting relations to scalar types in foreach. Example: A = load 'data' as (x, y, z); B = group A all; C = foreach B generate COUNT(A); . X = Y = foreach X generate $1/(long) C; Couple of additional comments: (1) You can only cast relations including a single value or an error will be reported (2) Name resolution is needed since relation X might have field named C in which case that field takes precedence. (3) Y will look for C closest to it. Implementation thoughts: The idea is to store C into a file and then convert it into scalar via a UDF. I believe we already have a UDF that Ben Reed contributed for this purpose. Most of the work would be to update the logical plan to (1) Store C (2) convert the cast to the UDF -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1434) Allow casting relations to scalars
[ https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12882725#action_12882725 ] Aniket Mokashi commented on PIG-1434: - Submitting to hudson to check for test failures Allow casting relations to scalars -- Key: PIG-1434 URL: https://issues.apache.org/jira/browse/PIG-1434 Project: Pig Issue Type: Improvement Reporter: Olga Natkovich Assignee: Aniket Mokashi Fix For: 0.8.0 Attachments: scalarImpl.patch This jira is to implement a simplified version of the functionality described in https://issues.apache.org/jira/browse/PIG-801. The proposal is to allow casting relations to scalar types in foreach. Example: A = load 'data' as (x, y, z); B = group A all; C = foreach B generate COUNT(A); . X = Y = foreach X generate $1/(long) C; Couple of additional comments: (1) You can only cast relations including a single value or an error will be reported (2) Name resolution is needed since relation X might have field named C in which case that field takes precedence. (3) Y will look for C closest to it. Implementation thoughts: The idea is to store C into a file and then convert it into scalar via a UDF. I believe we already have a UDF that Ben Reed contributed for this purpose. Most of the work would be to update the logical plan to (1) Store C (2) convert the cast to the UDF -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-506) Does pig need a NATIVE keyword?
[ https://issues.apache.org/jira/browse/PIG-506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12881263#action_12881263 ] Aniket Mokashi commented on PIG-506: Revised syntax, following the proposal document (with a few changes): {code} B = native ('mymr.jar' [, 'other.jar' ...]) A store into 'storeLocation' using storeFunc load 'loadLocation' using loadFunc; {code} mymr.jar contains the MR code the user wants to run. storeLocation is the location where the user's code expects to find the data. storeFunc is the storage function Pig will use to store the data (from A). loadLocation is where the user's code will write the result data. loadFunc is the load function Pig will use to reload the data (into B). other.jar contains additional jars to be shipped, such as a custom InputFormat or OutputFormat used by the MapReduce job. Does pig need a NATIVE keyword? --- Key: PIG-506 URL: https://issues.apache.org/jira/browse/PIG-506 Project: Pig Issue Type: New Feature Components: impl Reporter: Alan Gates Assignee: Alan Gates Priority: Minor Assume a user had a job that broke easily into three pieces. Further assume that pieces one and three were easily expressible in pig, but that piece two needed to be written in map reduce for whatever reason (performance, something that pig could not easily express, legacy job that was too important to change, etc.). Today the user would either have to use map reduce for the entire job or manually handle the stitching together of pig and map reduce jobs. What if instead pig provided a NATIVE keyword that would allow the script to pass off the data stream to the underlying system (in this case map reduce). The semantics of NATIVE would vary by underlying system. In the map reduce case, we would assume that this indicated a collection of one or more fully contained map reduce jobs, so that pig would store the data, invoke the map reduce jobs, and then read the resulting data to continue. 
It might look something like this: {code} A = load 'myfile'; X = load 'myotherfile'; B = group A by $0; C = foreach B generate group, myudf(B); D = native (jar=mymr.jar, infile=frompig outfile=topig); E = join D by $0, X by $0; ... {code} This differs from streaming in that it allows the user to insert an arbitrary amount of native processing, whereas streaming allows the insertion of one binary. It also differs in that, for streaming, data is piped directly into and out of the binary as part of the pig pipeline. Here the pipeline would be broken, data written to disk, and the native block invoked, then data read back from disk. Another alternative is to say this is unnecessary because the user can do the coordination from java, using the PigServer interface to run pig and calling the map reduce job explicitly. The advantages of the native keyword are that the user need not worry about coordination between the jobs; pig will take care of it. Also the user can make use of existing java applications without being a java programmer. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
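The revised syntax breaks the pipeline into three steps: store A at storeLocation with storeFunc, run the native job, then load loadLocation with loadFunc to produce B. A local simulation of that data flow, assuming tab-separated text files stand in for storeFunc/loadFunc and a small word-count function stands in for mymr.jar (all names here are hypothetical, not Pig or Hadoop APIs):

```python
import csv
import os
import tempfile
from collections import Counter
from typing import Callable, List, Tuple

Row = Tuple[str, ...]

def tsv_store(rows: List[Row], path: str) -> None:
    """Stand-in for storeFunc: write rows as tab-separated text."""
    with open(path, "w", newline="") as f:
        csv.writer(f, delimiter="\t").writerows(rows)

def tsv_load(path: str) -> List[Row]:
    """Stand-in for loadFunc: read tab-separated text back into rows."""
    with open(path, newline="") as f:
        return [tuple(r) for r in csv.reader(f, delimiter="\t")]

def word_count_job(in_path: str, out_path: str) -> None:
    """Stand-in for the user's mymr.jar: count the words in the first field."""
    counts: Counter = Counter()
    for row in tsv_load(in_path):
        counts.update(row[0].split())
    tsv_store(sorted((w, str(n)) for w, n in counts.items()), out_path)

def run_native(rows: List[Row], job: Callable[[str, str], None],
               store_location: str, load_location: str,
               store_func=tsv_store, load_func=tsv_load) -> List[Row]:
    store_func(rows, store_location)    # 1. store A where the user's code expects it
    job(store_location, load_location)  # 2. run the native job outside the Pig pipeline
    return load_func(load_location)     # 3. reload the job's output as B

with tempfile.TemporaryDirectory() as tmp:
    B = run_native([("a b",), ("b",)], word_count_job,
                   os.path.join(tmp, "frompig"), os.path.join(tmp, "topig"))
print(B)  # [('a', '1'), ('b', '2')]
```

The coordination the user would otherwise write by hand (paths, ordering, reloading) lives entirely in run_native, which is the convenience the NATIVE keyword is meant to provide.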
[jira] Updated: (PIG-1405) Need to move many standard functions from piggybank into Pig
[ https://issues.apache.org/jira/browse/PIG-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aniket Mokashi updated PIG-1405: Status: Open (was: Patch Available) Need to move many standard functions from piggybank into Pig Key: PIG-1405 URL: https://issues.apache.org/jira/browse/PIG-1405 Project: Pig Issue Type: Improvement Reporter: Alan Gates Assignee: Aniket Mokashi Fix For: 0.8.0 Attachments: StandardUDFtoPig.patch, StandardUDFtoPig3.patch, StandardUDFtoPig4.patch, StandardUDFtoPigFinale.patch There are currently a number of functions in Piggybank that represent features commonly supported by languages and database engines. We need to decide which of these Pig should support as built in functions and put them in org.apache.pig.builtin. This will also mean adding unit tests and javadocs for some UDFs. The existing classes will be left in Piggybank for some time for backward compatibility. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1405) Need to move many standard functions from piggybank into Pig
[ https://issues.apache.org/jira/browse/PIG-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aniket Mokashi updated PIG-1405: Attachment: StandardUDFtoPigFinale.patch Reviewed all comments for all the files. Made required changes. Test failures do not seem related (Test passes locally). [junit] Running org.apache.pig.test.TestBuiltin [junit] Tests run: 37, Failures: 0, Errors: 0, Time elapsed: 20.295 sec Submitting for Hudson build again. Need to move many standard functions from piggybank into Pig Key: PIG-1405 URL: https://issues.apache.org/jira/browse/PIG-1405 Project: Pig Issue Type: Improvement Reporter: Alan Gates Assignee: Aniket Mokashi Fix For: 0.8.0 Attachments: StandardUDFtoPig.patch, StandardUDFtoPig3.patch, StandardUDFtoPig4.patch, StandardUDFtoPigFinale.patch There are currently a number of functions in Piggybank that represent features commonly supported by languages and database engines. We need to decide which of these Pig should support as built in functions and put them in org.apache.pig.builtin. This will also mean adding unit tests and javadocs for some UDFs. The existing classes will be left in Piggybank for some time for backward compatibility. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1405) Need to move many standard functions from piggybank into Pig
[ https://issues.apache.org/jira/browse/PIG-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aniket Mokashi updated PIG-1405: Attachment: StandardUDFtoPig4.patch Fixed the findbugs error. The javac errors were caused by COR and COV implementing Serializable; removed that, as Pig doesn't need it. The test failures don't seem to be related to these code changes. Need to move many standard functions from piggybank into Pig Key: PIG-1405 URL: https://issues.apache.org/jira/browse/PIG-1405 Project: Pig Issue Type: Improvement Reporter: Alan Gates Assignee: Aniket Mokashi Fix For: 0.8.0 Attachments: StandardUDFtoPig.patch, StandardUDFtoPig3.patch, StandardUDFtoPig4.patch There are currently a number of functions in Piggybank that represent features commonly supported by languages and database engines. We need to decide which of these Pig should support as built in functions and put them in org.apache.pig.builtin. This will also mean adding unit tests and javadocs for some UDFs. The existing classes will be left in Piggybank for some time for backward compatibility. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1405) Need to move many standard functions from piggybank into Pig
[ https://issues.apache.org/jira/browse/PIG-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aniket Mokashi updated PIG-1405: Status: Patch Available (was: Open) Need to move many standard functions from piggybank into Pig Key: PIG-1405 URL: https://issues.apache.org/jira/browse/PIG-1405 Project: Pig Issue Type: Improvement Reporter: Alan Gates Assignee: Aniket Mokashi Fix For: 0.8.0 Attachments: StandardUDFtoPig.patch, StandardUDFtoPig3.patch, StandardUDFtoPig4.patch There are currently a number of functions in Piggybank that represent features commonly supported by languages and database engines. We need to decide which of these Pig should support as built in functions and put them in org.apache.pig.builtin. This will also mean adding unit tests and javadocs for some UDFs. The existing classes will be left in Piggybank for some time for backward compatibility. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1405) Need to move many standard functions from piggybank into Pig
[ https://issues.apache.org/jira/browse/PIG-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aniket Mokashi updated PIG-1405: Status: Open (was: Patch Available) Need to move many standard functions from piggybank into Pig Key: PIG-1405 URL: https://issues.apache.org/jira/browse/PIG-1405 Project: Pig Issue Type: Improvement Reporter: Alan Gates Assignee: Aniket Mokashi Fix For: 0.8.0 Attachments: StandardUDFtoPig.patch, StandardUDFtoPig3.patch, StandardUDFtoPig4.patch There are currently a number of functions in Piggybank that represent features commonly supported by languages and database engines. We need to decide which of these Pig should support as built in functions and put them in org.apache.pig.builtin. This will also mean adding unit tests and javadocs for some UDFs. The existing classes will be left in Piggybank for some time for backward compatibility. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1405) Need to move many standard functions from piggybank into Pig
[ https://issues.apache.org/jira/browse/PIG-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aniket Mokashi updated PIG-1405: Status: Patch Available (was: Open) Need to move many standard functions from piggybank into Pig Key: PIG-1405 URL: https://issues.apache.org/jira/browse/PIG-1405 Project: Pig Issue Type: Improvement Reporter: Alan Gates Assignee: Aniket Mokashi Fix For: 0.8.0 Attachments: StandardUDFtoPig.patch, StandardUDFtoPig3.patch There are currently a number of functions in Piggybank that represent features commonly supported by languages and database engines. We need to decide which of these Pig should support as built in functions and put them in org.apache.pig.builtin. This will also mean adding unit tests and javadocs for some UDFs. The existing classes will be left in Piggybank for some time for backward compatibility. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1405) Need to move many standard functions from piggybank into Pig
[ https://issues.apache.org/jira/browse/PIG-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aniket Mokashi updated PIG-1405: Attachment: StandardUDFtoPig3.patch Added test cases for all the supported functions in TestBuiltin.java. Test cases added: math functions are tested via reflection against the java.lang.Math class; string functions are tested with a sample string; stats and misc functions are tested with sample input. Need to move many standard functions from piggybank into Pig Key: PIG-1405 URL: https://issues.apache.org/jira/browse/PIG-1405 Project: Pig Issue Type: Improvement Reporter: Alan Gates Assignee: Aniket Mokashi Fix For: 0.8.0 Attachments: StandardUDFtoPig.patch, StandardUDFtoPig3.patch There are currently a number of functions in Piggybank that represent features commonly supported by languages and database engines. We need to decide which of these Pig should support as built in functions and put them in org.apache.pig.builtin. This will also mean adding unit tests and javadocs for some UDFs. The existing classes will be left in Piggybank for some time for backward compatibility. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
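The reflection-based math tests described above look up the reference function by the UDF's name and compare outputs. The actual tests are JUnit code in TestBuiltin.java using java.lang.Math; this is only a Python analog of the idea, using getattr on the math module, with hypothetical lambdas standing in for the builtin UDFs:

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical stand-ins for builtin UDFs, keyed by UDF name.
UDFS: Dict[str, Callable[[float], float]] = {
    "SQRT": lambda x: x ** 0.5,
    "EXP": lambda x: math.e ** x,
    "FLOOR": lambda x: float(math.floor(x)),
}

def check_against_math(udfs: Dict[str, Callable[[float], float]],
                       inputs: Sequence[float],
                       rel_tol: float = 1e-12) -> List[Tuple[str, float]]:
    """Return (udf_name, input) pairs where a UDF disagrees with the math module."""
    failures = []
    for name, udf in udfs.items():
        reference = getattr(math, name.lower())  # reflection: look up math.<name> by name
        for x in inputs:
            if not math.isclose(udf(x), reference(x), rel_tol=rel_tol):
                failures.append((name, x))
    return failures

print(check_against_math(UDFS, [0.0, 1.0, 2.5, 9.0]))  # []
```

Adding a new math UDF then only needs one more dictionary entry whose name matches the reference function, which is the appeal of the reflection approach.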
[jira] Commented: (PIG-928) UDFs in scripting languages
[ https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12879621#action_12879621 ] Aniket Mokashi commented on PIG-928: I have attached the patch for the proposed changes. A few points to note: 1. Since a jar is treated differently from other files (it is searched in system resources, a classloader is used, etc.), we recognize a jar by its extension. 2. The namespace is kept as the default, as per the above comment; this is implemented as part of the registerFunctions interface of ScriptEngine, so that different engines can behave differently as necessary. 3. The keyword python is supported, along with a custom ScriptEngine name. UDFs in scripting languages --- Key: PIG-928 URL: https://issues.apache.org/jira/browse/PIG-928 Project: Pig Issue Type: New Feature Reporter: Alan Gates Assignee: Aniket Mokashi Fix For: 0.8.0 Attachments: calltrace.png, package.zip, pig-greek.tgz, pig.scripting.patch.arnab, pyg.tgz, RegisterPythonUDF2.patch, RegisterPythonUDF3.patch, RegisterScriptUDFDefineParse.patch, scripting.tgz, scripting.tgz, test.zip It should be possible to write UDFs in scripting languages such as python, ruby, etc. This frees users from needing to compile Java, generate a jar, etc. It also opens Pig to programmers who prefer scripting languages over Java. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
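Points 1 and 3 amount to a small dispatch rule for register: a .jar extension selects the jar path, and anything else needs a script engine given either as the python keyword or as a custom ScriptEngine class name. A hypothetical sketch of that rule (resolve_register is illustrative, not the patch's code; the Jython class name is the one used in the PIG-1471 proposal, and namespace handling is left to each engine as point 2 says, so it is not modeled):

```python
from typing import Optional, Tuple

# Engine keyword -> ScriptEngine class; "python" maps to the Jython engine named in PIG-1471.
ENGINE_KEYWORDS = {"python": "org.apache.pig.scripting.jython.JythonScriptEngine"}

def resolve_register(path: str, using: Optional[str] = None) -> Tuple[str, str]:
    """Hypothetical dispatch for a register statement."""
    if path.lower().endswith(".jar"):
        # jars take the classloader / system-resource path, not a script engine
        return ("jar", path)
    if not using:
        raise ValueError(f"{path!r} is not a jar, so a script engine must be named")
    # a known keyword such as "python", otherwise a custom ScriptEngine class name
    return ("script", ENGINE_KEYWORDS.get(using, using))

print(resolve_register("myudfs.jar"))          # ('jar', 'myudfs.jar')
print(resolve_register("udfs.py", "python"))
```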
[jira] Updated: (PIG-928) UDFs in scripting languages
[ https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aniket Mokashi updated PIG-928: --- Attachment: RegisterPythonUDF3.patch UDFs in scripting languages --- Key: PIG-928 URL: https://issues.apache.org/jira/browse/PIG-928 Project: Pig Issue Type: New Feature Reporter: Alan Gates Assignee: Aniket Mokashi Fix For: 0.8.0 Attachments: calltrace.png, package.zip, pig-greek.tgz, pig.scripting.patch.arnab, pyg.tgz, RegisterPythonUDF2.patch, RegisterPythonUDF3.patch, RegisterScriptUDFDefineParse.patch, scripting.tgz, scripting.tgz, test.zip It should be possible to write UDFs in scripting languages such as python, ruby, etc. This frees users from needing to compile Java, generate a jar, etc. It also opens Pig to programmers who prefer scripting languages over Java. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-972) Make describe work with nested foreach
[ https://issues.apache.org/jira/browse/PIG-972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aniket Mokashi updated PIG-972: --- Attachment: NestedDescribeFinale1.patch Findbugs warning fixed Make describe work with nested foreach -- Key: PIG-972 URL: https://issues.apache.org/jira/browse/PIG-972 Project: Pig Issue Type: Improvement Reporter: Olga Natkovich Assignee: Aniket Mokashi Fix For: 0.8.0 Attachments: NestedDescribeFinale.patch, NestedDescribeFinale1.patch, NestedDescribeProp1.patch, NestedDescribeProp2Initial.patch Currently Parser can't deal with that. This is because describe is part of Grunt parser while the rest of nested foreach is handled by the QueryParser -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-972) Make describe work with nested foreach
[ https://issues.apache.org/jira/browse/PIG-972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aniket Mokashi updated PIG-972: --- Status: Open (was: Patch Available) Make describe work with nested foreach -- Key: PIG-972 URL: https://issues.apache.org/jira/browse/PIG-972 Project: Pig Issue Type: Improvement Reporter: Olga Natkovich Assignee: Aniket Mokashi Fix For: 0.8.0 Attachments: NestedDescribeFinale.patch, NestedDescribeFinale1.patch, NestedDescribeProp1.patch, NestedDescribeProp2Initial.patch Currently Parser can't deal with that. This is because describe is part of Grunt parser while the rest of nested foreach is handled by the QueryParser -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-972) Make describe work with nested foreach
[ https://issues.apache.org/jira/browse/PIG-972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aniket Mokashi updated PIG-972: --- Status: Patch Available (was: Open) Make describe work with nested foreach -- Key: PIG-972 URL: https://issues.apache.org/jira/browse/PIG-972 Project: Pig Issue Type: Improvement Reporter: Olga Natkovich Assignee: Aniket Mokashi Fix For: 0.8.0 Attachments: NestedDescribeFinale.patch, NestedDescribeFinale1.patch, NestedDescribeProp1.patch, NestedDescribeProp2Initial.patch Currently the parser can't deal with that. This is because describe is part of the Grunt parser, while the rest of nested foreach is handled by the QueryParser.
[jira] Commented: (PIG-1405) Need to move many standard functions from piggybank into Pig
[ https://issues.apache.org/jira/browse/PIG-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12877916#action_12877916 ] Aniket Mokashi commented on PIG-1405: - As per the comments above, the existing classes will be left in Piggybank for some time for backward compatibility. Need to move many standard functions from piggybank into Pig Key: PIG-1405 URL: https://issues.apache.org/jira/browse/PIG-1405 Project: Pig Issue Type: Improvement Reporter: Alan Gates Assignee: Aniket Mokashi Fix For: 0.8.0 Attachments: StandardUDFtoPig.patch There are currently a number of functions in Piggybank that represent features commonly supported by languages and database engines. We need to decide which of these Pig should support as built-in functions and put them in org.apache.pig.builtin. This will also mean adding unit tests and javadocs for some UDFs. The existing classes will be left in Piggybank for some time for backward compatibility.
[jira] Commented: (PIG-1405) Need to move many standard functions from piggybank into Pig
[ https://issues.apache.org/jira/browse/PIG-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12877918#action_12877918 ] Aniket Mokashi commented on PIG-1405: - Do we need to add a VAR (variance) function, or should we move COV and COR? Thoughts? Need to move many standard functions from piggybank into Pig Key: PIG-1405 URL: https://issues.apache.org/jira/browse/PIG-1405 Project: Pig Issue Type: Improvement Reporter: Alan Gates Assignee: Aniket Mokashi Fix For: 0.8.0 Attachments: StandardUDFtoPig.patch There are currently a number of functions in Piggybank that represent features commonly supported by languages and database engines. We need to decide which of these Pig should support as built-in functions and put them in org.apache.pig.builtin. This will also mean adding unit tests and javadocs for some UDFs. The existing classes will be left in Piggybank for some time for backward compatibility.
[jira] Commented: (PIG-1405) Need to move many standard functions from piggybank into Pig
[ https://issues.apache.org/jira/browse/PIG-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12877619#action_12877619 ] Aniket Mokashi commented on PIG-1405: - Currently, Piggybank has COR(relation) and COV(ariance) functions as part of its stats package, but there is no existing implementation of VAR(iance). Need to move many standard functions from piggybank into Pig Key: PIG-1405 URL: https://issues.apache.org/jira/browse/PIG-1405 Project: Pig Issue Type: Improvement Reporter: Alan Gates Assignee: Aniket Mokashi Fix For: 0.8.0 There are currently a number of functions in Piggybank that represent features commonly supported by languages and database engines. We need to decide which of these Pig should support as built-in functions and put them in org.apache.pig.builtin. This will also mean adding unit tests and javadocs for some UDFs. The existing classes will be left in Piggybank for some time for backward compatibility.
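Since no VAR(iance) exists yet, here is a minimal sketch of the computation such a function would perform. It is plain Java, not written against Pig's EvalFunc API. It assumes Welford's online algorithm and exposes both the population (divide by n) and sample (divide by n-1) variants. `RunningVariance` and its methods are hypothetical names. The `merge` step uses the pairwise update from Chan et al. It shows how partial results from different map tasks could be combined, which a combiner-friendly (for example, Algebraic) builtin would need.

```java
/** Hypothetical running-variance accumulator (Welford's algorithm); not an existing Pig class. */
final class RunningVariance {
    private long count;
    private double mean;
    private double m2; // sum of squared deviations from the current mean

    /** Adds one observation, updating the mean and M2 incrementally. */
    void add(double x) {
        count++;
        double delta = x - mean;
        mean += delta / count;
        m2 += delta * (x - mean);
    }

    /** Combines two partial accumulators, e.g. results computed by different map tasks. */
    static RunningVariance merge(RunningVariance a, RunningVariance b) {
        RunningVariance r = new RunningVariance();
        r.count = a.count + b.count;
        if (r.count == 0) {
            return r;
        }
        double delta = b.mean - a.mean;
        r.mean = a.mean + delta * b.count / r.count;
        r.m2 = a.m2 + b.m2 + delta * delta * a.count * b.count / r.count;
        return r;
    }

    /** Population variance (divides by n); NaN when no values were added. */
    double population() {
        return count == 0 ? Double.NaN : m2 / count;
    }

    /** Sample variance (divides by n - 1); NaN when fewer than two values were added. */
    double sample() {
        return count < 2 ? Double.NaN : m2 / (count - 1);
    }
}
```

For the values 2, 4, 4, 4, 5, 5, 7, 9, the population variance is 4 and the sample variance is 32/7. Splitting the values across two accumulators and merging them gives the same result.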
[jira] Commented: (PIG-928) UDFs in scripting languages
[ https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12877667#action_12877667 ] Aniket Mokashi commented on PIG-928: I support the above comment, and I am also in favor of not breaking old code. I think we should avoid introducing new keywords. In the earlier proposal, by adding python as a language keyword I meant to hide the extensibility of the ScriptEngine interface by supporting Python natively. If we want to allow users to add support for other languages, we need to allow specifying an engine class such as org.apache.pig.scripting.jython.JythonScriptEngine. But this will require us to document the ScriptEngine interface. The following seems to be a more suitable choice. Comments? {code}
-- register all UDFs inside test.py using a custom (or builtin) ScriptEngine
register 'test.py' using org.apache.pig.scripting.jython.JythonScriptEngine ship ('1.py', '2.py');
-- namespace? test.helloworld?
b = foreach a generate helloworld(a.$0), complex(a.$1);
-- register the helloworld UDF as hello using JythonScriptEngine
define hello using org.apache.pig.scripting.jython.JythonScriptEngine from 'test.py'#helloworld ship ('1.py', '2.py');
b = foreach a generate hello(a.$0);
{code} Also, register scalascript.jar would not be necessary if getStandardScriptJarPath() returns the path of the jar. UDFs in scripting languages --- Key: PIG-928 URL: https://issues.apache.org/jira/browse/PIG-928 Project: Pig Issue Type: New Feature Reporter: Alan Gates Assignee: Aniket Mokashi Fix For: 0.8.0 Attachments: calltrace.png, package.zip, pig-greek.tgz, pig.scripting.patch.arnab, pyg.tgz, RegisterPythonUDF2.patch, RegisterScriptUDFDefineParse.patch, scripting.tgz, scripting.tgz, test.zip It should be possible to write UDFs in scripting languages such as Python, Ruby, etc. This frees users from needing to compile Java, generate a jar, etc. It also opens Pig to programmers who prefer scripting languages over Java.
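The point of `register ... using <EngineClass>` is that the engine is only a class name resolved when the statement is parsed. That lets users plug in engines for other languages without new keywords. A minimal sketch of the idea follows. `UdfScriptEngine`, `ToyPythonEngine` and `UdfRegistry` are hypothetical names, not Pig's actual ScriptEngine API. The toy engine returns a fixed function list instead of parsing a script.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical engine contract; the real org.apache.pig.scripting API may differ. */
interface UdfScriptEngine {
    /** Returns the names of the functions defined in the given script. */
    List<String> functionsIn(String scriptPath);
}

/** Toy stand-in for a Jython-backed engine: it pretends every script defines these two functions. */
class ToyPythonEngine implements UdfScriptEngine {
    @Override
    public List<String> functionsIn(String scriptPath) {
        return List.of("helloworld", "complex");
    }
}

/** Maps UDF names visible in a Pig script to "script#function" references. */
class UdfRegistry {
    private final Map<String, String> udfs = new LinkedHashMap<>();

    /** Resolves the engine from its class name, as in "register 'x.py' using EngineClass". */
    private static UdfScriptEngine load(String engineClassName) {
        try {
            return (UdfScriptEngine) Class.forName(engineClassName)
                    .getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException("Cannot load script engine " + engineClassName, e);
        }
    }

    /** Like "register 'test.py' using EngineClass": every function in the script becomes a UDF. */
    void register(String scriptPath, String engineClassName) {
        for (String fn : load(engineClassName).functionsIn(scriptPath)) {
            udfs.put(fn, scriptPath + "#" + fn);
        }
    }

    /** Like "define hello using EngineClass from 'test.py'#helloworld": one function under a new alias. */
    void define(String alias, String engineClassName, String scriptPath, String function) {
        if (!load(engineClassName).functionsIn(scriptPath).contains(function)) {
            throw new IllegalArgumentException(function + " not found in " + scriptPath);
        }
        udfs.put(alias, scriptPath + "#" + function);
    }

    /** Returns the "script#function" reference for a UDF name, or null if it is unknown. */
    String lookup(String udfName) {
        return udfs.get(udfName);
    }
}
```

Keying the map by bare function name mirrors the first example in the proposal. Prefixing keys with the script name would be one way to answer its "namespace?" question.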
[jira] Updated: (PIG-1405) Need to move many standard functions from piggybank into Pig
[ https://issues.apache.org/jira/browse/PIG-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aniket Mokashi updated PIG-1405: Attachment: StandardUDFtoPig.patch Initial patch. To do: check for documentation errors. Need to move many standard functions from piggybank into Pig Key: PIG-1405 URL: https://issues.apache.org/jira/browse/PIG-1405 Project: Pig Issue Type: Improvement Reporter: Alan Gates Assignee: Aniket Mokashi Fix For: 0.8.0 Attachments: StandardUDFtoPig.patch There are currently a number of functions in Piggybank that represent features commonly supported by languages and database engines. We need to decide which of these Pig should support as built-in functions and put them in org.apache.pig.builtin. This will also mean adding unit tests and javadocs for some UDFs. The existing classes will be left in Piggybank for some time for backward compatibility.
[jira] Updated: (PIG-928) UDFs in scripting languages
[ https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aniket Mokashi updated PIG-928: --- Attachment: RegisterPythonUDF2.patch UDFs in scripting languages --- Key: PIG-928 URL: https://issues.apache.org/jira/browse/PIG-928 Project: Pig Issue Type: New Feature Reporter: Alan Gates Assignee: Aniket Mokashi Fix For: 0.8.0 Attachments: calltrace.png, package.zip, pig-greek.tgz, pig.scripting.patch.arnab, pyg.tgz, RegisterPythonUDF2.patch, scripting.tgz, scripting.tgz, test.zip It should be possible to write UDFs in scripting languages such as Python, Ruby, etc. This frees users from needing to compile Java, generate a jar, etc. It also opens Pig to programmers who prefer scripting languages over Java.
[jira] Updated: (PIG-928) UDFs in scripting languages
[ https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aniket Mokashi updated PIG-928: --- Attachment: RegisterScriptUDFDefineParse.patch UDFs in scripting languages --- Key: PIG-928 URL: https://issues.apache.org/jira/browse/PIG-928 Project: Pig Issue Type: New Feature Reporter: Alan Gates Assignee: Aniket Mokashi Fix For: 0.8.0 Attachments: calltrace.png, package.zip, pig-greek.tgz, pig.scripting.patch.arnab, pyg.tgz, RegisterPythonUDF2.patch, RegisterScriptUDFDefineParse.patch, scripting.tgz, scripting.tgz, test.zip It should be possible to write UDFs in scripting languages such as Python, Ruby, etc. This frees users from needing to compile Java, generate a jar, etc. It also opens Pig to programmers who prefer scripting languages over Java.
[jira] Updated: (PIG-972) Make describe work with nested foreach
[ https://issues.apache.org/jira/browse/PIG-972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aniket Mokashi updated PIG-972: --- Attachment: NestedDescribeFinale.patch Make describe work with nested foreach -- Key: PIG-972 URL: https://issues.apache.org/jira/browse/PIG-972 Project: Pig Issue Type: Improvement Reporter: Olga Natkovich Assignee: Aniket Mokashi Fix For: 0.8.0 Attachments: NestedDescribeFinale.patch, NestedDescribeProp1.patch, NestedDescribeProp2Initial.patch Currently the parser can't deal with that. This is because describe is part of the Grunt parser, while the rest of nested foreach is handled by the QueryParser.
[jira] Commented: (PIG-972) Make describe work with nested foreach
[ https://issues.apache.org/jira/browse/PIG-972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12877293#action_12877293 ] Aniket Mokashi commented on PIG-972: Submitted a patch with the above changes, and added test cases covering different scenarios. {code} grunt describe c: c::d: {a0: int,a1: int} {code} describe c does not print any nested aliases; to print a nested alias, use describe c::d; Make describe work with nested foreach -- Key: PIG-972 URL: https://issues.apache.org/jira/browse/PIG-972 Project: Pig Issue Type: Improvement Reporter: Olga Natkovich Assignee: Aniket Mokashi Fix For: 0.8.0 Attachments: NestedDescribeFinale.patch, NestedDescribeProp1.patch, NestedDescribeProp2Initial.patch Currently the parser can't deal with that. This is because describe is part of the Grunt parser, while the rest of nested foreach is handled by the QueryParser.
[jira] Updated: (PIG-972) Make describe work with nested foreach
[ https://issues.apache.org/jira/browse/PIG-972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aniket Mokashi updated PIG-972: --- Status: Patch Available (was: Open) Make describe work with nested foreach -- Key: PIG-972 URL: https://issues.apache.org/jira/browse/PIG-972 Project: Pig Issue Type: Improvement Reporter: Olga Natkovich Assignee: Aniket Mokashi Fix For: 0.8.0 Attachments: NestedDescribeFinale.patch, NestedDescribeProp1.patch, NestedDescribeProp2Initial.patch Currently the parser can't deal with that. This is because describe is part of the Grunt parser, while the rest of nested foreach is handled by the QueryParser.
[jira] Commented: (PIG-972) Make describe work with nested foreach
[ https://issues.apache.org/jira/browse/PIG-972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12876764#action_12876764 ] Aniket Mokashi commented on PIG-972: describe c and describe c::d seem more intuitive. Further changes: 1. Currently, we do not search for a nested alias in the inner plans deterministically. With these changes, we will dump the schema of the latest statement assigned to d. For example, given {code}
c = foreach b {
    d = order a by $0;
    d = filter d by d.$0 0;
    generate d.$1;
}
describe c::d;
{code} this will dump the schema of the last statement associated with d (the filter). This will be achieved by traversing the plan from the leaves to the root while searching for the nested alias d. 2. The nested alias list is redundant and will be removed. Make describe work with nested foreach -- Key: PIG-972 URL: https://issues.apache.org/jira/browse/PIG-972 Project: Pig Issue Type: Improvement Reporter: Olga Natkovich Assignee: Aniket Mokashi Fix For: 0.8.0 Attachments: NestedDescribeProp1.patch, NestedDescribeProp2Initial.patch Currently the parser can't deal with that. This is because describe is part of the Grunt parser, while the rest of nested foreach is handled by the QueryParser.
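The leaves-to-root search in point 1 can be sketched as a breadth-first walk over a toy plan model. `InnerOp` and `NestedAliasLookup` are hypothetical classes, not Pig's inner-plan types for LOForEach. The sketch assumes only two things: each operator records the alias it assigns, and it records its input operators.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Optional;
import java.util.Set;

/** Hypothetical, simplified operator in a foreach inner plan (not Pig's LogicalOperator). */
final class InnerOp {
    final String alias;          // nested alias this statement assigns, or null (e.g. a projection)
    final String kind;           // e.g. "order", "filter", "project"
    final List<InnerOp> inputs;  // predecessors, i.e. edges pointing back toward the roots

    InnerOp(String alias, String kind, InnerOp... inputs) {
        this.alias = alias;
        this.kind = kind;
        this.inputs = new ArrayList<>(Arrays.asList(inputs));
    }
}

final class NestedAliasLookup {
    /**
     * Breadth-first walk from the leaves toward the roots. The first operator bound to the
     * alias is the one closest to the leaves, i.e. the latest assignment in a chain such as
     * "d = order ...; d = filter d ...;".
     */
    static Optional<InnerOp> findLatest(List<InnerOp> leaves, String alias) {
        Deque<InnerOp> queue = new ArrayDeque<>(leaves);
        Set<InnerOp> visited = Collections.newSetFromMap(new IdentityHashMap<>());
        while (!queue.isEmpty()) {
            InnerOp op = queue.poll();
            if (!visited.add(op)) {
                continue; // an input shared by several operators is examined only once
            }
            if (alias.equals(op.alias)) {
                return Optional.of(op);
            }
            queue.addAll(op.inputs);
        }
        return Optional.empty();
    }
}
```

For the example above, the filter node is found before the order node because it sits closer to the generate leaf.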
[jira] Updated: (PIG-972) Make describe work with nested foreach
[ https://issues.apache.org/jira/browse/PIG-972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aniket Mokashi updated PIG-972: --- Attachment: NestedDescribeProp2Initial.patch Attaching the initial patch for Proposal 2. Make describe work with nested foreach -- Key: PIG-972 URL: https://issues.apache.org/jira/browse/PIG-972 Project: Pig Issue Type: Improvement Reporter: Olga Natkovich Assignee: Aniket Mokashi Fix For: 0.8.0 Attachments: NestedDescribeProp1.patch, NestedDescribeProp2Initial.patch Currently the parser can't deal with that. This is because describe is part of the Grunt parser, while the rest of nested foreach is handled by the QueryParser.
[jira] Commented: (PIG-972) Make describe work with nested foreach
[ https://issues.apache.org/jira/browse/PIG-972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12875242#action_12875242 ] Aniket Mokashi commented on PIG-972: Approach: 1. To implement the functionality mentioned above, the parser keeps track of all the nested aliases it comes across and adds them to LOForEach. 2. After we parse the query (foreach), we dump the schema of every nested alias stored in the list. 3. When describe foreach-alias; is issued, we dump the schema of every nested alias stored in the list along with the schema of foreach-alias. Data structures: We add mDescribedAliasList to LOForEach to list all the described nested aliases. LOForEach has mForEachPlans, which holds the plans for all projections; the leaves of these plans keep track of the schemas of all nested aliases. Issues: Verification of aliases in nested describe: since we do not create a map of nested aliases, it is not possible to validate the alias name used in a nested describe up front. Multiple dumps: the above approach might dump the same schema multiple times. These issues can be solved by adding more state to LOForEach. Make describe work with nested foreach -- Key: PIG-972 URL: https://issues.apache.org/jira/browse/PIG-972 Project: Pig Issue Type: Improvement Reporter: Olga Natkovich Assignee: Aniket Mokashi Fix For: 0.8.0 Currently the parser can't deal with that. This is because describe is part of the Grunt parser, while the rest of nested foreach is handled by the QueryParser.
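Both issues above (no up-front validation and repeated dumps) go away if the foreach operator keeps a map from nested alias to schema instead of a list. Here is a minimal sketch, assuming schemas are represented as plain strings. `ForEachDescribeState` and its methods are hypothetical and are not part of LOForEach.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of the extra state a foreach operator could keep for nested describe. */
final class ForEachDescribeState {
    // Insertion-ordered: each nested alias appears once, and reassigning it keeps only the latest schema.
    private final Map<String, String> nestedSchemas = new LinkedHashMap<>();

    /** Called (hypothetically) by the parser each time a nested statement assigns an alias. */
    void recordNestedAlias(String alias, String schema) {
        nestedSchemas.put(alias, schema);
    }

    /** Validates the alias up front instead of silently printing nothing. */
    String describeNested(String alias) {
        String schema = nestedSchemas.get(alias);
        if (schema == null) {
            throw new IllegalArgumentException("Unknown nested alias: " + alias);
        }
        return alias + ": " + schema;
    }

    /** One line per nested alias, so each schema is printed exactly once. */
    List<String> describeAll() {
        List<String> lines = new ArrayList<>();
        nestedSchemas.forEach((alias, schema) -> lines.add(alias + ": " + schema));
        return lines;
    }
}
```

Using a `LinkedHashMap` keeps the output in statement order. Its `put` on an existing key updates the value without moving the entry.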
[jira] Commented: (PIG-972) Make describe work with nested foreach
[ https://issues.apache.org/jira/browse/PIG-972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12875390#action_12875390 ] Aniket Mokashi commented on PIG-972: The approach mentioned above seems to work. Here are some proposals for the semantics of nested describe, given: a = load '1.txt' as (a0:int, a1:int); b = group a by $0; Proposal 1: explicit describe. c = foreach b { d = order a by $0; describe d; e = ...; generate d.$0 ...;} (1a: instantaneous response, i.e. d is described as soon as the above statement is parsed.) describe c; prints the schemas of c and d (but not e). Advantage: the user can select which nested alias to describe. Disadvantage: extra typing. Proposal 2: implicit describe (no describe statements inside the nested block). c = foreach b { d = order a by $0; e = ...; generate d.$0 ...;} describe c; describes c, d and e. Advantage: less typing. Disadvantage: extra output. (2a: describe c prints the schemas of c, d and e; in addition, describe c-d describes nested d.) (2b: describe c prints the schema of c only; describe c- d describes nested d.) Alan/Olga, let me know your comments on this. Make describe work with nested foreach -- Key: PIG-972 URL: https://issues.apache.org/jira/browse/PIG-972 Project: Pig Issue Type: Improvement Reporter: Olga Natkovich Assignee: Aniket Mokashi Fix For: 0.8.0 Currently the parser can't deal with that. This is because describe is part of the Grunt parser, while the rest of nested foreach is handled by the QueryParser.
[jira] Updated: (PIG-972) Make describe work with nested foreach
[ https://issues.apache.org/jira/browse/PIG-972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aniket Mokashi updated PIG-972: --- Attachment: NestedDescribeProp1.patch Attaching the patch for Proposal 1. Make describe work with nested foreach -- Key: PIG-972 URL: https://issues.apache.org/jira/browse/PIG-972 Project: Pig Issue Type: Improvement Reporter: Olga Natkovich Assignee: Aniket Mokashi Fix For: 0.8.0 Attachments: NestedDescribeProp1.patch Currently the parser can't deal with that. This is because describe is part of the Grunt parser, while the rest of nested foreach is handled by the QueryParser.
[jira] Updated: (PIG-282) Custom Partitioner
[ https://issues.apache.org/jira/browse/PIG-282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aniket Mokashi updated PIG-282: --- Attachment: CustomPartitionerFinale.patch Addressed the code review comments and made some minor changes, with test cases. Custom Partitioner -- Key: PIG-282 URL: https://issues.apache.org/jira/browse/PIG-282 Project: Pig Issue Type: New Feature Affects Versions: 0.7.0 Reporter: Amir Youssefi Assignee: Aniket Mokashi Priority: Minor Fix For: 0.8.0 Attachments: CustomPartitioner.patch, CustomPartitionerFinale.patch, CustomPartitionerTest.patch By adding a custom partitioner, we can give control over which output partition a key (/value) goes to. We can add keywords to the language, e.g. PARTITION BY UDF(...) or a similar syntax. The UDF returns a number between 0 and n-1, where n is the number of output partitions.
[jira] Updated: (PIG-282) Custom Partitioner
[ https://issues.apache.org/jira/browse/PIG-282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aniket Mokashi updated PIG-282: --- Attachment: CustomPartitionerTest.patch Adding test cases and some small fixes. Custom Partitioner -- Key: PIG-282 URL: https://issues.apache.org/jira/browse/PIG-282 Project: Pig Issue Type: New Feature Affects Versions: 0.7.0 Reporter: Amir Youssefi Assignee: Aniket Mokashi Priority: Minor Fix For: 0.8.0 Attachments: CustomPartitioner.patch, CustomPartitionerTest.patch By adding a custom partitioner, we can give control over which output partition a key (/value) goes to. We can add keywords to the language, e.g. PARTITION BY UDF(...) or a similar syntax. The UDF returns a number between 0 and n-1, where n is the number of output partitions.
[jira] Commented: (PIG-972) Make describe work with nested foreach
[ https://issues.apache.org/jira/browse/PIG-972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12874359#action_12874359 ] Aniket Mokashi commented on PIG-972: Describe functionality: 1. We describe the schema of an alias when we parse the describe statement in the shell. For example, b = foreach ... {... describe a ...}; will describe the schema of a after the statement is processed (parsed). 2. We do NOT describe any schema as part of the dump command. 3. If the user needs to describe a nested schema, they can do so by describing the parent alias. For example, describe b; will print the schema of b as well as a. Any comments? Make describe work with nested foreach -- Key: PIG-972 URL: https://issues.apache.org/jira/browse/PIG-972 Project: Pig Issue Type: Improvement Reporter: Olga Natkovich Assignee: Aniket Mokashi Fix For: 0.8.0 Currently the parser can't deal with that. This is because describe is part of the Grunt parser, while the rest of nested foreach is handled by the QueryParser.
[jira] Commented: (PIG-282) Custom Partitioner
[ https://issues.apache.org/jira/browse/PIG-282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12872363#action_12872363 ] Aniket Mokashi commented on PIG-282: 1. It is more suitable to have PARTITION BY take a Hadoop mapreduce.Partitioner class rather than a UDF. It will be followed by PARALLEL n. 2. Applicable to: GROUP, COGROUP, CROSS, DISTINCT and JOIN (except 'skewed' JOIN, which uses SkewedPartitioner). 3. PARTITION BY is not supported for ORDER. 4. There is no validation of the custom partitioner's parameter types (PigNullableWritable, Writable). Approach: 1. Added support for parsing and validating class types (ClassType). Parsing for PARTITION BY is added separately to each of the clauses mentioned above. 2. The custom partitioner is stored as a String in the LO, PO and MR plans. The LogicalOperator holds the partitioner in the LO plan. We add the partitioner to POGlobalRearrangement because it decides the map-reduce boundary, and we read and set the partitioner when we visit the POGlobalRearrangement. Attaching a patch with the initial changes. Custom Partitioner -- Key: PIG-282 URL: https://issues.apache.org/jira/browse/PIG-282 Project: Pig Issue Type: New Feature Affects Versions: 0.7.0 Reporter: Amir Youssefi Assignee: Aniket Mokashi Priority: Minor Fix For: 0.8.0 By adding a custom partitioner, we can give control over which output partition a key (/value) goes to. We can add keywords to the language, e.g. PARTITION BY UDF(...) or a similar syntax. The UDF returns a number between 0 and n-1, where n is the number of output partitions.
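A PARTITION BY class must honor the contract stated in the issue description: return a partition number between 0 and n-1, where n comes from PARALLEL n. Here is a minimal sketch, assuming plain String keys. In Pig, the keys a partitioner actually sees are wrapped (the comment above mentions PigNullableWritable), so a real one would extend Hadoop's Partitioner and unwrap them. `KeyPartitioner` and `CaseInsensitivePartitioner` are hypothetical names, and the example routing rule is illustrative only.

```java
import java.util.Locale;

/**
 * Hypothetical stand-in for Hadoop's org.apache.hadoop.mapreduce.Partitioner<K, V>,
 * reduced to the one method that matters here so the example has no Hadoop dependency.
 */
interface KeyPartitioner<K, V> {
    /** Must return a value in [0, numPartitions). */
    int getPartition(K key, V value, int numPartitions);
}

/** Example routing rule: keys that differ only in case go to the same reducer. */
class CaseInsensitivePartitioner implements KeyPartitioner<String, Object> {
    @Override
    public int getPartition(String key, Object value, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        if (key == null) {
            return 0; // route null keys to a fixed partition
        }
        int hash = key.toLowerCase(Locale.ROOT).hashCode();
        // Clear the sign bit (as Hadoop's HashPartitioner does) so the result is never negative.
        return (hash & Integer.MAX_VALUE) % numPartitions;
    }
}
```

Masking with `Integer.MAX_VALUE` rather than calling `Math.abs` avoids the `Integer.MIN_VALUE` corner case, where `Math.abs` still returns a negative number.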
[jira] Updated: (PIG-282) Custom Partitioner
[ https://issues.apache.org/jira/browse/PIG-282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aniket Mokashi updated PIG-282: --- Status: Patch Available (was: Open) Release Note: Initial changes Affects Version/s: 0.7.0 Custom Partitioner -- Key: PIG-282 URL: https://issues.apache.org/jira/browse/PIG-282 Project: Pig Issue Type: New Feature Affects Versions: 0.7.0 Reporter: Amir Youssefi Assignee: Aniket Mokashi Priority: Minor Fix For: 0.8.0 By adding a custom partitioner, we can give control over which output partition a key (/value) goes to. We can add keywords to the language, e.g. PARTITION BY UDF(...) or a similar syntax. The UDF returns a number between 0 and n-1, where n is the number of output partitions.
[jira] Updated: (PIG-282) Custom Partitioner
[ https://issues.apache.org/jira/browse/PIG-282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aniket Mokashi updated PIG-282: --- Status: Open (was: Patch Available) Custom Partitioner -- Key: PIG-282 URL: https://issues.apache.org/jira/browse/PIG-282 Project: Pig Issue Type: New Feature Affects Versions: 0.7.0 Reporter: Amir Youssefi Assignee: Aniket Mokashi Priority: Minor Fix For: 0.8.0 By adding a custom partitioner, we can give control over which output partition a key (/value) goes to. We can add keywords to the language, e.g. PARTITION BY UDF(...) or a similar syntax. The UDF returns a number between 0 and n-1, where n is the number of output partitions.
[jira] Updated: (PIG-282) Custom Partitioner
[ https://issues.apache.org/jira/browse/PIG-282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aniket Mokashi updated PIG-282: --- Attachment: CustomPartitioner.patch Initial changes. Custom Partitioner -- Key: PIG-282 URL: https://issues.apache.org/jira/browse/PIG-282 Project: Pig Issue Type: New Feature Affects Versions: 0.7.0 Reporter: Amir Youssefi Assignee: Aniket Mokashi Priority: Minor Fix For: 0.8.0 Attachments: CustomPartitioner.patch By adding a custom partitioner, we can give control over which output partition a key (/value) goes to. We can add keywords to the language, e.g. PARTITION BY UDF(...) or a similar syntax. The UDF returns a number between 0 and n-1, where n is the number of output partitions.