[jira] Assigned: (PIG-1512) PlanPrinter does not print LOJoin operator in the new logical optimization framework
[ https://issues.apache.org/jira/browse/PIG-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Swati Jain reassigned PIG-1512:
Assignee: Swati Jain

PlanPrinter does not print LOJoin operator in the new logical optimization framework
Key: PIG-1512
URL: https://issues.apache.org/jira/browse/PIG-1512
Project: Pig
Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Swati Jain
Assignee: Swati Jain
Fix For: 0.8.0
Attachments: printJoin.patch

PlanPrinter does not print the LOJoin relational operator, so LOJoin does not appear in the output of explain.

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
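A visitor-based printer produces exactly this symptom when it has no case for one operator type. The Java sketch below illustrates that failure mode under one assumption: a simplified visitor whose base class gives every `visit` method an empty default. `SketchNode`, `SketchLoad`, `SketchJoin`, `SketchVisitor`, `BuggyPrinter` and `FixedPrinter` are hypothetical stand-ins, not Pig's `PlanPrinter` or `LOJoin`, and this is not a description of what printJoin.patch actually changes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-ins; these are not Pig's operator or visitor classes.
abstract class SketchNode {
    final List<SketchNode> inputs = new ArrayList<>();

    abstract void accept(SketchVisitor v);
}

class SketchLoad extends SketchNode {
    final String path;

    SketchLoad(String path) { this.path = path; }

    @Override
    void accept(SketchVisitor v) { v.visit(this); }
}

class SketchJoin extends SketchNode {
    SketchJoin(SketchNode... joinInputs) { inputs.addAll(List.of(joinInputs)); }

    @Override
    void accept(SketchVisitor v) { v.visit(this); }
}

// Base visitor: every visit method is an empty default, so an operator type
// that a subclass forgets to override is skipped silently, with no error.
class SketchVisitor {
    void visit(SketchLoad n) {}

    void visit(SketchJoin n) {}

    // Depth-first walk starting at the plan's sink: the node first, then its inputs.
    void walk(SketchNode n) {
        n.accept(this);
        for (SketchNode in : n.inputs) walk(in);
    }
}

// Printer that handles loads but never overrides visit(SketchJoin).
class BuggyPrinter extends SketchVisitor {
    final StringBuilder out = new StringBuilder();

    @Override
    void visit(SketchLoad n) { out.append("Load ").append(n.path).append('\n'); }
}

// The fix: add the missing case for the join operator.
class FixedPrinter extends BuggyPrinter {
    @Override
    void visit(SketchJoin n) { out.append("Join\n"); }
}
```

Walking the plan `SketchJoin(SketchLoad("a"), SketchLoad("b"))` with `BuggyPrinter` prints only the two loads. `FixedPrinter` prints `Join` first. The empty-default design keeps visitors short, but it also turns a missing case into silent omission instead of a compile-time or runtime error.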
[jira] Updated: (PIG-1512) PlanPrinter does not print LOJoin operator in the new logical optimization framework
[ https://issues.apache.org/jira/browse/PIG-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Swati Jain updated PIG-1512:
Attachment: printJoin.patch
[jira] Updated: (PIG-1512) PlanPrinter does not print LOJoin operator in the new logical optimization framework
[ https://issues.apache.org/jira/browse/PIG-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Swati Jain updated PIG-1512:
Attachment: printJoin.patch
Fix tab character.
[jira] Updated: (PIG-1512) PlanPrinter does not print LOJoin operator in the new logical optimization framework
[ https://issues.apache.org/jira/browse/PIG-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Swati Jain updated PIG-1512:
Attachment: printJoin.patch
Attach the right file; final upload.
[jira] Work started: (PIG-1512) PlanPrinter does not print LOJoin operator in the new logical optimization framework
[ https://issues.apache.org/jira/browse/PIG-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Work on PIG-1512 started by Swati Jain.
[jira] Created: (PIG-1512) PlanPrinter does not print LOJoin operator in the new logical optimization framework
PlanPrinter does not print LOJoin operator in the new logical optimization framework
Key: PIG-1512
URL: https://issues.apache.org/jira/browse/PIG-1512
Project: Pig
Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Swati Jain
Fix For: 0.8.0
Attachments: printJoin.patch

PlanPrinter does not print the LOJoin relational operator, so LOJoin does not appear in the output of explain.
[jira] Updated: (PIG-1512) PlanPrinter does not print LOJoin operator in the new logical optimization framework
[ https://issues.apache.org/jira/browse/PIG-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Swati Jain updated PIG-1512:
Attachment: (was: printJoin.patch)
[jira] Updated: (PIG-1512) PlanPrinter does not print LOJoin operator in the new logical optimization framework
[ https://issues.apache.org/jira/browse/PIG-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Swati Jain updated PIG-1512:
Attachment: (was: printJoin.patch)
[jira] Updated: (PIG-1512) PlanPrinter does not print LOJoin operator in the new logical optimization framework
[ https://issues.apache.org/jira/browse/PIG-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Swati Jain updated PIG-1512:
Patch Info: [Patch Available]
[jira] Updated: (PIG-1512) PlanPrinter does not print LOJoin operator in the new logical optimization framework
[ https://issues.apache.org/jira/browse/PIG-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Swati Jain updated PIG-1512:
Status: Patch Available (was: In Progress)
[jira] Commented: (PIG-1500) guava.jar should be removed from the lib folder
[ https://issues.apache.org/jira/browse/PIG-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12891244#action_12891244 ]

niraj rai commented on PIG-1500:

I ran the tests with guava-r06.jar and all of them passed. If everyone is fine with it, we can move to r06.

guava.jar should be removed from the lib folder
Key: PIG-1500
URL: https://issues.apache.org/jira/browse/PIG-1500
Project: Pig
Issue Type: Bug
Components: build
Reporter: Giridharan Kesavan
Assignee: niraj rai
Fix For: 0.8.0
Attachments: removeGuavaJar.patch

The guava jar is available in the Maven repository, but it is still checked into the lib folder of Pig trunk. I've checked the availability of the guava jar in the Maven repository: http://mvnrepository.com/artifact/com.google.guava/guava
[jira] Commented: (PIG-1505) support jars and scripts in dfs
[ https://issues.apache.org/jira/browse/PIG-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12891258#action_12891258 ]

Alan Gates commented on PIG-1505:

I ran the core and contrib tests manually, and both pass. Richard will review the patch.

support jars and scripts in dfs
Key: PIG-1505
URL: https://issues.apache.org/jira/browse/PIG-1505
Project: Pig
Issue Type: Improvement
Reporter: Andrew Hitchcock
Assignee: Andrew Hitchcock
Attachments: pig-jars-and-scripts-from-dfs-3.patch, pig-jars-and-scripts-from-dfs-trunk-1.patch, pig-jars-and-scripts-from-dfs-trunk-2.patch, pig-jars-and-scripts-from-dfs-trunk.patch

Pig can't operate on files stored in Amazon S3.
[jira] Commented: (PIG-1511) Pig removes packages from its own jar when building the JAR to ship to Hadoop
[ https://issues.apache.org/jira/browse/PIG-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12891261#action_12891261 ]

Alan Gates commented on PIG-1511:

We don't want to do this by default. In a couple of cases, keeping the size of this jar down is more important: first, when the number of tasks is very large, since the jar is copied once to each task; and second, when the job itself is quite small and the setup costs become a concern.

Pig removes packages from its own jar when building the JAR to ship to Hadoop
Key: PIG-1511
URL: https://issues.apache.org/jira/browse/PIG-1511
Project: Pig
Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Eric Tschetter
Attachments: pig-1511.diff

Pig generates a new jar file to ship to Hadoop. Pig whitelists a couple of packages that it includes from its own jar and throws away everything else. I package all of my dependencies, Pig included, into a single jar file, because my code needs to run reliably and reproducibly in production. Pig throws away all of my dependencies. I don't see the performance gain of shaving ~5 MB off a jar that is pushed to the job tracker once and then used to process hundreds of GB of data. The overhead is minimal on my cluster.
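The disagreement above turns on two mechanics: prefix-based whitelisting of jar entries, and a copy cost that grows with the number of tasks. The Java sketch below models both under stated assumptions. `JobJarFilter` and the prefix list passed to it are hypothetical; Pig's real whitelist lives in its own code and is not reproduced here. `bytesCopied` encodes Alan's assumption that the jar is copied once per task.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch of whitelist-based filtering when building a job jar.
class JobJarFilter {
    private final List<String> allowedPrefixes;

    JobJarFilter(List<String> allowedPrefixes) {
        this.allowedPrefixes = List.copyOf(allowedPrefixes);
    }

    // An entry is shipped only if its path starts with a whitelisted prefix;
    // everything else (e.g. a user's bundled dependencies) is dropped.
    boolean shouldShip(String entryName) {
        return allowedPrefixes.stream().anyMatch(entryName::startsWith);
    }

    List<String> filter(List<String> entryNames) {
        return entryNames.stream().filter(this::shouldShip).collect(Collectors.toList());
    }

    // If the jar is copied once per task, the bytes moved grow linearly with the task count.
    static long bytesCopied(long jarSizeBytes, long numTasks) {
        return jarSizeBytes * numTasks;
    }
}
```

For example, under this per-task assumption, an extra 5 MB (5,242,880 bytes) across 1,000 tasks is 5,242,880,000 bytes copied. A single push to the job tracker, as described in the report, costs only 5 MB once.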
[jira] Created: (PIG-1513) Skewed join doesn't handle empty input directory
Skewed join doesn't handle empty input directory
Key: PIG-1513
URL: https://issues.apache.org/jira/browse/PIG-1513
Project: Pig
Issue Type: Bug
Reporter: Richard Ding
Assignee: Richard Ding
Fix For: 0.8.0

The following script

{code}
A = load 'input';
B = load 'emptydir';
C = join B by $0, A by $0 using 'skewed';
dump C;
{code}

fails with

ERROR: java.lang.RuntimeException: Empty samples file';

In this case, the sample job has 0 maps. Pig doesn't expect this and fails.
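A sampling step that tolerates an empty sample might look like the Java sketch below. It assumes that skewed keys are the keys whose share of the sample meets a threshold. That rule and the `SkewSample` class are illustrative assumptions, not Pig's skewed-join partitioner. The point of the sketch is that a sample with no rows, as produced by a 0-map sampling job, yields "no skewed keys" instead of an exception.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: choose "skewed" join keys from a sample of the join input.
class SkewSample {
    // Returns, in sorted order, the keys whose share of the sample is at least `threshold`.
    static List<String> skewedKeys(List<String> sampledKeys, double threshold) {
        List<String> result = new ArrayList<>();
        // An empty sample (e.g. the sampling job ran 0 maps because the input
        // directory was empty) means "nothing is skewed", not an error.
        if (sampledKeys.isEmpty()) {
            return result;
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (String key : sampledKeys) {
            counts.merge(key, 1, Integer::sum);
        }
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            double share = (double) e.getValue() / sampledKeys.size();
            if (share >= threshold) {
                result.add(e.getKey());
            }
        }
        return result;
    }
}
```

The loops would also produce an empty list on their own. The explicit early return simply makes the intended behaviour for empty input visible at the top of the method.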
[jira] Commented: (PIG-1513) Skewed join doesn't handle empty input directory
[ https://issues.apache.org/jira/browse/PIG-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12891281#action_12891281 ]

Olga Natkovich commented on PIG-1513:

Are we sure that the problem only occurs with skewed join? I would like to make this JIRA more generic, to make sure that Pig returns empty results given empty input and short-circuits processing as early as possible.
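Short-circuiting as early as possible requires a cheap check for "this input has no data" before any job is launched. The Java sketch below uses `java.nio` on the local filesystem as a stand-in for HDFS. `EmptyInputCheck` is hypothetical. The rule of skipping names that start with `_` or `.` follows the common Hadoop convention for hidden files such as `_SUCCESS`. Treating zero-byte files as empty is an additional assumption.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical sketch; java.nio on the local filesystem stands in for HDFS here.
class EmptyInputCheck {
    // Names starting with '_' or '.' (e.g. _SUCCESS, .crc files) are treated as hidden.
    static boolean isHidden(Path p) {
        String name = p.getFileName().toString();
        return name.startsWith("_") || name.startsWith(".");
    }

    // True when none of the directory's immediate children is a visible regular
    // file with at least one byte, i.e. the work could be skipped entirely.
    static boolean isEffectivelyEmpty(Path dir) {
        try (Stream<Path> children = Files.list(dir)) {
            return children.filter(Files::isRegularFile)
                           .filter(p -> !isHidden(p))
                           .noneMatch(EmptyInputCheck::hasData);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static boolean hasData(Path file) {
        try {
            return Files.size(file) > 0;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Only the immediate children are inspected, which mirrors a non-recursive input listing. A recursive layout would need `Files.walk` plus a hidden-directory check.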
[jira] Commented: (PIG-1505) support jars and scripts in dfs
[ https://issues.apache.org/jira/browse/PIG-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12891347#action_12891347 ]

Richard Ding commented on PIG-1505:

Thank you for the update. A few more comments:

* According to the Pig Latin manual, users can also register additional files (for use with their Pig script) on the command line with the -Dpig.additional.jars option, in addition to the REGISTER statement inside a Pig script. I suggest you call FileLocalizer.fetchFile from the shared method PigServer.registerJar so that both cases are covered.
* Can you change the method signature to
{code}
public static FetchFileRet fetchFile(Properties properties, String filePath) throws IOException
{code}
The reason is that we have deprecated all other public methods on FileLocalizer that take DataStorage as a parameter (so that we can deprecate DataStorage in the future). I think this is safe, since the condition in the method
{code}
((fileUri.getScheme() == null) (dfs == null))
{code}
is not used in the patch.
* You need to add a unit test to the patch (first copy a Pig script to the mini-cluster).
* Finally, since this is a new feature, can you add a release note (JIRA has a Release Note field) so that it will be incorporated into the next Pig release notes.
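The first two review points describe a structure in which a single fetch routine keyed on `Properties` sits behind one shared registration method, so that the REGISTER statement and command-line jars both gain remote-file support. The Java sketch below is only an illustration of that structure. `RegisterSketch`, `FetchResult` and the property name `sketch.local.dir` are hypothetical. The remote copy is stubbed out, and the command-line list is assumed to be split into individual paths already.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical sketch of the structure suggested in the review; not Pig's code.
class RegisterSketch {
    // Result holder, loosely modelled on the FetchFileRet name used in the review.
    static final class FetchResult {
        final String localPath;
        final boolean didFetch;

        FetchResult(String localPath, boolean didFetch) {
            this.localPath = localPath;
            this.didFetch = didFetch;
        }
    }

    // Local paths are used as-is; a path with any other scheme (hdfs://, s3n://, ...)
    // would be copied into a local directory. The copy itself is stubbed out here.
    static FetchResult fetchFile(Properties props, String filePath) {
        URI uri = URI.create(filePath);
        String scheme = uri.getScheme();
        if (scheme == null) {
            return new FetchResult(filePath, false);
        }
        if (scheme.equals("file")) {
            return new FetchResult(uri.getPath(), false);
        }
        String path = uri.getPath();
        String name = path.substring(path.lastIndexOf('/') + 1);
        String localDir = props.getProperty("sketch.local.dir", "/tmp"); // made-up property
        return new FetchResult(localDir + "/" + name, true);
    }

    final List<String> registered = new ArrayList<>();

    // Single shared entry point, so every caller gets remote-file support.
    void registerJar(Properties props, String path) {
        registered.add(fetchFile(props, path).localPath);
    }

    // REGISTER statement inside a script.
    void registerFromScript(Properties props, String path) {
        registerJar(props, path);
    }

    // Jars given on the command line, assumed to be split into paths already.
    void registerFromCommandLine(Properties props, List<String> paths) {
        for (String p : paths) {
            registerJar(props, p);
        }
    }
}
```

Because both entry points delegate to `registerJar`, a fix to fetching (for example, adding S3 support) only has to be made in one place. This is the motivation for the first review comment.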
[jira] Commented: (PIG-928) UDFs in scripting languages
[ https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12891357#action_12891357 ]

Aniket Mokashi commented on PIG-928:

bq. I am still not convinced about the changes required in POUserFunc. That logic should really be a part of pythonToPig(pyObject). If python UDF is returning byte[], it should be turned into DataByteArray before it gets back into Pig's pipeline. And if we do that conversion in pythonToPig() (which is a right place to do it) we will need no changes in POUserFunc.

I agree that it is better to move the type-checking computation to the JythonFunction side (JythonUtils); that should provide more type safety and avoid the complexity of user-defined types. But I would still keep the changes in POUserFunc for result.result, for the case in the example above (minus the byte[] scenario).

bq. Instead of instanceof, doing class equality test will be a wee-bit faster. Like instead of (pyObject instanceof PyDictionary) do pyobject.getClass() == PyDictionary.class. Obviously, it will work when you know exact target class and not for the derived ones.

Jython has derived classes for each of the basic Jython types. Although most of them aren't used yet, future Jython implementations may start returning these derived objects (e.g. PyTupleDerived), in which case we might break our code. PyLongDerived is already used inside the Jython code. Also, the __tojava__ function just returns the proxy Java object until we ask for a specific type of object. I think it's better to use instanceof instead of class equality here.

bq. For register command, we need to test not only for functionality but for regressions as well. Look at TestGrunt.java in test package to get an idea how to write test for it.

The code path for .jar registration is identical to the old code, except that it doesn't use any engine or namespace.

bq. Also what will happen if user returned a nil python object (null equivalent of Java) from UDF. It looks to me that will result in NPE. Can you add a test for that and similar test case from pigToPython()

A Java null object is turned into a PyNone object, and the __tojava__ function always returns the special object Py.NoConversion if the PyObject cannot be converted to the desired Java class.

UDFs in scripting languages
Key: PIG-928
URL: https://issues.apache.org/jira/browse/PIG-928
Project: Pig
Issue Type: New Feature
Reporter: Alan Gates
Assignee: Aniket Mokashi
Fix For: 0.8.0
Attachments: calltrace.png, package.zip, PIG-928.patch, pig-greek.tgz, pig.scripting.patch.arnab, pyg.tgz, RegisterPythonUDF3.patch, RegisterPythonUDF4.patch, RegisterPythonUDF_Final.patch, RegisterPythonUDFFinale.patch, RegisterPythonUDFFinale3.patch, RegisterPythonUDFFinale4.patch, RegisterPythonUDFFinale5.patch, RegisterPythonUDFLatest.patch, RegisterScriptUDFDefineParse.patch, scripting.tgz, scripting.tgz, test.zip

It should be possible to write UDFs in scripting languages such as Python, Ruby, etc. This frees users from needing to compile Java, generate a jar, etc. It also opens Pig to programmers who prefer scripting languages over Java.
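The instanceof-versus-class-equality argument depends on plain Java semantics, so it can be demonstrated without Jython. In the sketch below, `BaseValue` and `DerivedValue` are hypothetical stand-ins for a base type and its "Derived" subclass (by analogy with PyTuple and PyTupleDerived). They are not Jython classes.

```java
// Hypothetical stand-ins for a base type and its "Derived" subclass,
// by analogy with PyTuple / PyTupleDerived (these are not Jython classes).
class BaseValue {}

class DerivedValue extends BaseValue {}

class TypeCheckDemo {
    // Accepts the base type and every subclass; false for null.
    static boolean byInstanceof(Object o) {
        return o instanceof BaseValue;
    }

    // Accepts only the exact class; a subclass instance is rejected.
    // The explicit null check avoids the NullPointerException getClass() would throw.
    static boolean byClassEquality(Object o) {
        return o != null && o.getClass() == BaseValue.class;
    }
}
```

`byClassEquality(new DerivedValue())` returns false, which is the breakage the reply warns about if the runtime starts handing back derived objects. `instanceof` already evaluates to false for `null`, while `getClass()` needs an explicit guard. This ties in with the question about null UDF results.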
[jira] Updated: (PIG-1513) Pig doesn't handle empty input directory
[ https://issues.apache.org/jira/browse/PIG-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Ding updated PIG-1513:
Summary: Pig doesn't handle empty input directory (was: Skewed join doesn't handle empty input directory)
Description:
The following script

{code}
A = load 'input';
B = load 'emptydir';
C = join B by $0, A by $0 using 'skewed';
store C into 'output';
{code}

fails with

ERROR: java.lang.RuntimeException: Empty samples file';

In this case, the sample job has 0 maps. Pig doesn't expect this and fails.

For merge join, the script

{code}
A = load 'input';
B = load 'emptydir';
C = join A by $0, B by $0 using 'merge';
store C into 'output';
{code}

again produces a sample job with 0 maps, and the script fails with ERROR 2176: Error processing right input during merge join. But if we change the join order:

{code}
A = load 'input';
B = load 'emptydir';
C = join B by $0, A by $0 using 'merge';
store C into 'output';
{code}

the second job (merge) has 0 maps and 0 reduces, and it generates an empty 'output' directory.

Order by on an empty directory works fine and generates empty part files.

was:
The following script

{code}
A = load 'input';
B = load 'emptydir';
C = join B by $0, A by $0 using 'skewed';
dump C
{code}

fails with ERROR: java.lang.RuntimeException: Empty samples file'; In this case, the sample job has 0 maps. Pig doesn't expect this and fails.
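For comparison with the merge-join failures above, a textbook sort-merge equi-join handles an empty input on either side without any special case: the main loop never runs, so the result is empty. The Java sketch below is a generic in-memory illustration over sorted integer keys. `MergeJoinSketch` is hypothetical and does not model Pig's merge join, which also samples or indexes the right input.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: in-memory sort-merge equi-join over two key-sorted lists.
class MergeJoinSketch {
    // Returns one {leftKey, rightKey} pair per matching combination.
    static List<int[]> join(List<Integer> left, List<Integer> right) {
        List<int[]> out = new ArrayList<>();
        int i = 0, j = 0;
        // With an empty input on either side this loop never runs: empty in, empty out.
        while (i < left.size() && j < right.size()) {
            int l = left.get(i);
            int r = right.get(j);
            if (l < r) {
                i++;
            } else if (l > r) {
                j++;
            } else {
                // Find the run of equal keys on each side and emit their cross product.
                int iEnd = i, jEnd = j;
                while (iEnd < left.size() && left.get(iEnd) == l) iEnd++;
                while (jEnd < right.size() && right.get(jEnd) == r) jEnd++;
                for (int a = i; a < iEnd; a++) {
                    for (int b = j; b < jEnd; b++) {
                        out.add(new int[] {left.get(a), right.get(b)});
                    }
                }
                i = iEnd;
                j = jEnd;
            }
        }
        return out;
    }
}
```

Joining `[1, 2, 2, 3]` with `[2, 2, 4]` yields four `(2, 2)` pairs. Joining either list with an empty list yields nothing, regardless of which side is empty.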
[jira] Commented: (PIG-1513) Pig doesn't handle empty input directory
[ https://issues.apache.org/jira/browse/PIG-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12891366#action_12891366 ]

Richard Ding commented on PIG-1513:

Changed the JIRA title to cover the general problem of handling empty input directories.
[jira] Updated: (PIG-1178) LogicalPlan and Optimizer are too complex and hard to work with
[ https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-1178:
Attachment: PIG-1178-4.patch

LogicalPlan and Optimizer are too complex and hard to work with
Key: PIG-1178
URL: https://issues.apache.org/jira/browse/PIG-1178
Project: Pig
Issue Type: Improvement
Reporter: Alan Gates
Assignee: Daniel Dai
Fix For: 0.8.0
Attachments: expressions-2.patch, expressions.patch, lp.patch, lp.patch, PIG-1178-4.patch, pig_1178.patch, pig_1178.patch, PIG_1178.patch, pig_1178_2.patch, pig_1178_3.2.patch, pig_1178_3.3.patch, pig_1178_3.4.patch, pig_1178_3.patch

The current implementation of the logical plan and the logical optimizer in Pig has proven not to be easily extensible. Developer feedback has indicated that adding new rules to the optimizer is quite burdensome. In addition, the logical plan has been an area of numerous bugs, many of which have been difficult to fix. Developers also feel that the logical plan is difficult to understand and maintain. The root cause of these issues is that a number of design decisions made as part of the 0.2 rewrite of the front end have now proven to be sub-optimal. The heart of this proposal is to revisit a number of those decisions and rebuild the logical plan with a simpler design that will make it much easier to maintain the logical plan as well as extend the logical optimizer.

See http://wiki.apache.org/pig/PigLogicalPlanOptimizerRewrite for full details.
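The complaint that new optimizer rules are burdensome to add is easier to discuss against a concrete shape. Below is one common design, sketched in Java: each rule implements a single interface, and the optimizer applies all rules repeatedly until a pass changes nothing or an iteration cap is reached. This is a hypothetical illustration, not the design on the linked wiki page or the API of Pig's new optimizer. Plans are reduced to a linear list of operator names, and `SketchRule`, `MoveFilterBeforeForeach` and `SketchOptimizer` are invented names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a pluggable, rule-based optimizer; not Pig's actual API.
// A plan is simplified to a linear pipeline of operator names.
interface SketchRule {
    // Returns a rewritten copy of the plan, or null when the rule does not match.
    List<String> apply(List<String> plan);
}

// Toy rule: move a FILTER in front of an immediately preceding FOREACH.
// A real rule would first check that the filter uses no fields the FOREACH creates.
class MoveFilterBeforeForeach implements SketchRule {
    @Override
    public List<String> apply(List<String> plan) {
        for (int i = 0; i + 1 < plan.size(); i++) {
            if (plan.get(i).equals("FOREACH") && plan.get(i + 1).equals("FILTER")) {
                List<String> rewritten = new ArrayList<>(plan);
                rewritten.set(i, "FILTER");
                rewritten.set(i + 1, "FOREACH");
                return rewritten;
            }
        }
        return null;
    }
}

class SketchOptimizer {
    private final List<SketchRule> rules;
    private final int maxIterations;

    SketchOptimizer(List<SketchRule> rules, int maxIterations) {
        this.rules = List.copyOf(rules);
        this.maxIterations = maxIterations;
    }

    // Applies every rule once per pass; stops when a pass changes nothing
    // or after maxIterations passes.
    List<String> optimize(List<String> plan) {
        List<String> current = new ArrayList<>(plan);
        for (int pass = 0; pass < maxIterations; pass++) {
            boolean changed = false;
            for (SketchRule rule : rules) {
                List<String> next = rule.apply(current);
                if (next != null) {
                    current = next;
                    changed = true;
                }
            }
            if (!changed) {
                break;
            }
        }
        return current;
    }
}
```

With this shape, adding a rule means writing one class and adding it to the list passed to the optimizer. The iteration cap guards against rules that keep undoing each other.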
[jira] Updated: (PIG-1178) LogicalPlan and Optimizer are too complex and hard to work with
[ https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daniel Dai updated PIG-1178:
Status: Open (was: Patch Available)
[jira] Updated: (PIG-1178) LogicalPlan and Optimizer are too complex and hard to work with
[ https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daniel Dai updated PIG-1178:
Status: Patch Available (was: Open)
[jira] Commented: (PIG-1505) support jars and scripts in dfs
[ https://issues.apache.org/jira/browse/PIG-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12891436#action_12891436 ]

Andrew Hitchcock commented on PIG-1505:

Thanks Richard. Is there a unit test you recommend that I can model mine after? Something that uses the mini-cluster.
[jira] Updated: (PIG-928) UDFs in scripting languages
[ https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aniket Mokashi updated PIG-928:
Attachment: RegisterPythonUDFLatest2.patch

Added tests for map UDFs, null input/output, and Grunt. Made the required changes as per the suggestions.
[jira] Commented: (PIG-1178) LogicalPlan and Optimizer are too complex and hard to work with
[ https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12891438#action_12891438 ]

Daniel Dai commented on PIG-1178:

Attached PIG-1178-4.patch, which includes changes in the following areas:
1. Add all the relational operators
2. Add foreach nested plans
3. Add field schema to expression operators
4. Remove UidStamp; instead, the uid is generated and cached the first time we get the field schema
5. Fix the column pruner and all other new logical plan test cases
6. Add TypeCastInserter

Still polishing and refactoring the code.
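Item 4 replaces a separate uid-stamping pass with lazy generation. A minimal Java sketch of that idea follows, assuming a global counter and a per-operator cached schema. `SketchExpression` and its nested `FieldSchema` are hypothetical and are not the classes in PIG-1178-4.patch.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of item 4: no separate uid-stamping pass; an expression
// operator assigns its uid lazily, the first time its field schema is requested.
class SketchExpression {
    private static final AtomicLong NEXT_UID = new AtomicLong();

    static final class FieldSchema {
        final String alias;
        final long uid;

        FieldSchema(String alias, long uid) {
            this.alias = alias;
            this.uid = uid;
        }
    }

    private final String alias;
    private FieldSchema cached; // null until first requested

    SketchExpression(String alias) {
        this.alias = alias;
    }

    // Not thread-safe: assumes the plan is built and queried from a single thread.
    FieldSchema getFieldSchema() {
        if (cached == null) {
            cached = new FieldSchema(alias, NEXT_UID.incrementAndGet());
        }
        return cached;
    }
}
```

Repeated calls on the same operator return the same cached schema and therefore the same uid, while distinct operators receive distinct uids. An operator whose schema is never requested never consumes a uid.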
[jira] Updated: (PIG-928) UDFs in scripting languages
[ https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Aniket Mokashi updated PIG-928:
Status: Open (was: Patch Available)
[jira] Updated: (PIG-928) UDFs in scripting languages
[ https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Aniket Mokashi updated PIG-928:
Status: Patch Available (was: Open)
[jira] Created: (PIG-1514) Migrate logical optimization rule: OpLimitOptimizer
Migrate logical optimization rule: OpLimitOptimizer
Key: PIG-1514
URL: https://issues.apache.org/jira/browse/PIG-1514
Project: Pig
Issue Type: Sub-task
Components: impl
Affects Versions: 0.7.0
Reporter: Daniel Dai
Assignee: Xuefu Zhang
Fix For: 0.8.0
[jira] Created: (PIG-1515) Migrate logical optimization rule: PushDownForeachFlatten
Migrate logical optimization rule: PushDownForeachFlatten
Key: PIG-1515
URL: https://issues.apache.org/jira/browse/PIG-1515
Project: Pig
Issue Type: Sub-task
Components: impl
Affects Versions: 0.7.0
Reporter: Daniel Dai
Assignee: Xuefu Zhang
Fix For: 0.8.0