[jira] Commented: (PIG-1511) Pig removes packages from its own jar when building the JAR to ship to Hadoop
[ https://issues.apache.org/jira/browse/PIG-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891648#action_12891648 ] Alan Gates commented on PIG-1511: - The issue there is that blacklists are hard to maintain. Every time someone adds a package to Pig, they have to remember to add it to that blacklist. If you register your jar, Pig will wrap it up and take it along. Does this not work for your use case? Pig removes packages from its own jar when building the JAR to ship to Hadoop - Key: PIG-1511 URL: https://issues.apache.org/jira/browse/PIG-1511 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Eric Tschetter Attachments: pig-1511.diff Pig generates a new jar file to ship over to Hadoop. Pig has a couple of packages whitelisted that it includes from its own jar. Pig throws away everything else. I package all of my dependencies into a single jar file. Pig is included in this jar file. I do it this way because my code needs to run reliably and reproducibly in production. Pig throws away all of my dependencies. I don't know what the performance gain is of shaving ~5MB off of a jar that is pushed to a job tracker once and then used to run over 100s of GB of data. The overhead is minimal on my cluster. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-656) Use of eval or any other keyword in the package hierarchy of a UDF causes parse exception
[ https://issues.apache.org/jira/browse/PIG-656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aniket Mokashi updated PIG-656: --- Attachment: pigusergroup656.patch Use of eval or any other keyword in the package hierarchy of a UDF causes parse exception - Key: PIG-656 URL: https://issues.apache.org/jira/browse/PIG-656 Project: Pig Issue Type: Bug Components: documentation, grunt Affects Versions: 0.3.0 Reporter: Viraj Bhat Assignee: Milind Bhandarkar Fix For: 0.3.0 Attachments: mywordcount.txt, pigusergroup656.patch, reserved.patch, TOKENIZE.jar Consider a Pig script which does something similar to a word count. It uses the built-in TOKENIZE function, but packages it inside a class hierarchy such as mypackage.eval {code} register TOKENIZE.jar my_src = LOAD '/user/viraj/mywordcount.txt' USING PigStorage('\t') AS (mlist: chararray); modules = FOREACH my_src GENERATE FLATTEN(mypackage.eval.TOKENIZE(mlist)); describe modules; grouped = GROUP modules BY $0; describe grouped; counts = FOREACH grouped GENERATE COUNT(modules), group; ordered = ORDER counts BY $0; dump ordered; {code} The parser complains: === 2009-02-05 01:17:29,231 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Invalid alias: mypackage in {mlist: chararray} === I looked at the following source code at (src/org/apache/pig/impl/logicalLayer/parser/QueryParser.jjt) and it seems that : EVAL is a keyword in Pig. Here are some clarifications: 1) Is there documentation on what the EVAL keyword actually is? 2) Is EVAL keyword actually implemented? Viraj -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
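Until the parser handles this case, a UDF author could check a fully qualified function name for keyword collisions before packaging it. The following is a minimal sketch in Java, not part of Pig: the class name UdfNameCheck is hypothetical, and the keyword set is an illustrative subset (only "eval" is confirmed by the report above; the authoritative list is in QueryParser.jjt).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical pre-check for UDF names; not part of Pig. */
final class UdfNameCheck {
    // Illustrative subset of Pig Latin keywords (assumption); see the parser grammar for the full list.
    private static final Set<String> KEYWORDS = new HashSet<>(Arrays.asList(
            "eval", "load", "store", "foreach", "generate", "group", "filter", "order"));

    /** Returns the segments of a dotted name that match a keyword, ignoring case. */
    static List<String> keywordSegments(String qualifiedName) {
        List<String> hits = new ArrayList<>();
        for (String segment : qualifiedName.split("\\.")) {
            if (KEYWORDS.contains(segment.toLowerCase(Locale.ROOT))) {
                hits.add(segment);
            }
        }
        return hits;
    }
}
```

For the script above, keywordSegments("mypackage.eval.TOKENIZE") would report the "eval" segment, which suggests renaming the package as a workaround.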
[jira] Commented: (PIG-1505) support jars and scripts in dfs
[ https://issues.apache.org/jira/browse/PIG-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891685#action_12891685 ] Richard Ding commented on PIG-1505: --- You can take a look at the test cases in TestPigRunner, where local Pig scripts are passed to the PigRunner.run method. You can first copy a local Pig script to the mini-cluster using {code} Util.copyFromLocalToCluster(cluster, localScriptFileName, scriptFileNameOnCluster); {code} and then invoke the run method with the arguments {code} String[] args = { "-f", "hdfs://scriptFileNameOnCluste" }; PigRunner.run(args, null); {code} support jars and scripts in dfs --- Key: PIG-1505 URL: https://issues.apache.org/jira/browse/PIG-1505 Project: Pig Issue Type: Improvement Reporter: Andrew Hitchcock Assignee: Andrew Hitchcock Attachments: pig-jars-and-scripts-from-dfs-3.patch, pig-jars-and-scripts-from-dfs-trunk-1.patch, pig-jars-and-scripts-from-dfs-trunk-2.patch, pig-jars-and-scripts-from-dfs-trunk.patch Pig can't operate on files stored in Amazon S3.
[jira] Updated: (PIG-1435) make sure dependent jobs fail when a jon in multiquery fails
[ https://issues.apache.org/jira/browse/PIG-1435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-1435: -- Status: Resolved (was: Patch Available) Hadoop Flags: [Reviewed] Resolution: Fixed Patch committed to trunk. Thanks Niraj. make sure dependent jobs fail when a jon in multiquery fails Key: PIG-1435 URL: https://issues.apache.org/jira/browse/PIG-1435 Project: Pig Issue Type: Bug Reporter: Olga Natkovich Assignee: niraj rai Fix For: 0.8.0 Attachments: depJobs.patch, depJobsFailure.patch, depJobsFailure2.patch, depJobsFailure3.patch Currently, if one of the MQ jobs fails, Pig tries to run all remaining jobs. As a result, if data was partially generated by the failed job, you might get incorrect results from dependent jobs.
[jira] Assigned: (PIG-1516) finalize in bag implementations causes pig to run out of memory in reduce
[ https://issues.apache.org/jira/browse/PIG-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair reassigned PIG-1516: -- Assignee: Thejas M Nair finalize in bag implementations causes pig to run out of memory in reduce -- Key: PIG-1516 URL: https://issues.apache.org/jira/browse/PIG-1516 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Fix For: 0.8.0 *Problem:* Pig bag implementations that are subclasses of DefaultAbstractBag implement finalize methods. As a result, the garbage collector moves them to a finalization queue, and their memory is freed only after finalization has run. If the bags are not finalized fast enough, the finalization queue consumes a lot of memory and Pig runs out of memory. This can happen when a large number of small bags is being created. *Solution:* The finalize method exists to delete the spill files that are created when a bag is too large. If the bags are small enough, no spill files are created and the finalize method serves no purpose. A new class that holds a list of files (FileList) will be introduced. This class will have a finalize method that deletes the files. The bags will no longer have finalize methods, and they will use FileList instead of ArrayList<File>. *Possible workaround for earlier releases:* Since the fix is going into 0.8, here is a workaround: disabling the combiner reduces the number of bags created, because there is no stage that combines intermediate merge results. I would recommend disabling it only if you have this problem, as it is likely to slow down the query. To disable the combiner, set the property: -Dpig.exec.nocombiner=true
[jira] Created: (PIG-1516) finalize in bag implementations causes pig to run out of memory in reduce
finalize in bag implementations causes pig to run out of memory in reduce -- Key: PIG-1516 URL: https://issues.apache.org/jira/browse/PIG-1516 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Thejas M Nair Fix For: 0.8.0 *Problem:* Pig bag implementations that are subclasses of DefaultAbstractBag implement finalize methods. As a result, the garbage collector moves them to a finalization queue, and their memory is freed only after finalization has run. If the bags are not finalized fast enough, the finalization queue consumes a lot of memory and Pig runs out of memory. This can happen when a large number of small bags is being created. *Solution:* The finalize method exists to delete the spill files that are created when a bag is too large. If the bags are small enough, no spill files are created and the finalize method serves no purpose. A new class that holds a list of files (FileList) will be introduced. This class will have a finalize method that deletes the files. The bags will no longer have finalize methods, and they will use FileList instead of ArrayList<File>. *Possible workaround for earlier releases:* Since the fix is going into 0.8, here is a workaround: disabling the combiner reduces the number of bags created, because there is no stage that combines intermediate merge results. I would recommend disabling it only if you have this problem, as it is likely to slow down the query. To disable the combiner, set the property: -Dpig.exec.nocombiner=true
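The proposed FileList could look roughly like the sketch below. This is an illustration under stated assumptions, not the committed patch: the method names add and deleteAll, the SketchBag class, and the lazy creation of the holder are all hypothetical. The point it shows is that only a bag that has actually spilled owns a finalizable object, so small bags never enter the finalization queue.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the proposed FileList; the real class may differ. */
final class FileList {
    private final List<File> files = new ArrayList<>();

    void add(File f) { files.add(f); }

    /** Deletes every tracked spill file and returns how many were removed. */
    int deleteAll() {
        int deleted = 0;
        for (File f : files) {
            if (f.delete()) deleted++;
        }
        files.clear();
        return deleted;
    }

    @Override
    @SuppressWarnings({"deprecation", "removal"}) // finalize() is deprecated in recent JDKs
    protected void finalize() throws Throwable {
        try { deleteAll(); } finally { super.finalize(); }
    }
}

/** Stand-in for a bag: it has no finalize method of its own. */
final class SketchBag {
    private FileList spillFiles; // assumption: created lazily, stays null for small bags

    void recordSpill(File f) {
        if (spillFiles == null) spillFiles = new FileList();
        spillFiles.add(f);
    }

    boolean hasSpilled() { return spillFiles != null; }
}
```

With this shape, a job that creates millions of small bags creates no finalizable objects at all, while large bags still get their spill files cleaned up when the FileList is collected.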
Pig 0.8.0 branch plan
Pig Developers, I would like to propose that we branch for Pig 0.8.0 at the end of August and plan for the release by the end of October. Please let me know if you see a problem with either of the dates. If you are planning to contribute any patches to Pig 0.8.0, please make sure that you have a JIRA open and linked to the 0.8.0 release, and that you will be able to get the code in before the branch is created. If you have a JIRA assigned to you that is linked to Pig 0.8.0 and you don't think you can get it in before the branch, please unlink it from the release. Thanks, Olga
[jira] Commented: (PIG-1516) finalize in bag implementations causes pig to run out of memory in reduce
[ https://issues.apache.org/jira/browse/PIG-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891784#action_12891784 ] Thejas M Nair commented on PIG-1516: Regarding the workaround - I would recommend disabling the combiner only if other steps, such as increasing the heap size or increasing the number of reducers, do not help. finalize in bag implementations causes pig to run out of memory in reduce -- Key: PIG-1516 URL: https://issues.apache.org/jira/browse/PIG-1516 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Fix For: 0.8.0 *Problem:* Pig bag implementations that are subclasses of DefaultAbstractBag implement finalize methods. As a result, the garbage collector moves them to a finalization queue, and their memory is freed only after finalization has run. If the bags are not finalized fast enough, the finalization queue consumes a lot of memory and Pig runs out of memory. This can happen when a large number of small bags is being created. *Solution:* The finalize method exists to delete the spill files that are created when a bag is too large. If the bags are small enough, no spill files are created and the finalize method serves no purpose. A new class that holds a list of files (FileList) will be introduced. This class will have a finalize method that deletes the files. The bags will no longer have finalize methods, and they will use FileList instead of ArrayList<File>. *Possible workaround for earlier releases:* Since the fix is going into 0.8, here is a workaround: disabling the combiner reduces the number of bags created, because there is no stage that combines intermediate merge results. I would recommend disabling it only if you have this problem, as it is likely to slow down the query. To disable the combiner, set the property: -Dpig.exec.nocombiner=true
[jira] Commented: (PIG-1249) Safe-guards against misconfigured Pig scripts without PARALLEL keyword
[ https://issues.apache.org/jira/browse/PIG-1249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12891788#action_12891788 ] Olga Natkovich commented on PIG-1249: - Ashutosh, First, the changes are not going to be in framework till Hadoop 22 and I don't think we want to wait that far as we are seeing quite a few problems on our cluster. Second, I think we want to take a direction with pig of setting things up for users. Of course, we don't have stats right now to do so accurately but I think this is a step in the right direction Safe-guards against misconfigured Pig scripts without PARALLEL keyword -- Key: PIG-1249 URL: https://issues.apache.org/jira/browse/PIG-1249 Project: Pig Issue Type: Improvement Affects Versions: 0.8.0 Reporter: Arun C Murthy Assignee: Jeff Zhang Priority: Critical Fix For: 0.8.0 Attachments: PIG-1249-4.patch, PIG-1249.patch, PIG_1249_2.patch, PIG_1249_3.patch It would be *very* useful for Pig to have safe-guards against naive scripts which process a *lot* of data without the use of PARALLEL keyword. We've seen a fair number of instances where naive users process huge data-sets (10TB) with badly mis-configured #reduces e.g. 1 reduce. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1249) Safe-guards against misconfigured Pig scripts without PARALLEL keyword
[ https://issues.apache.org/jira/browse/PIG-1249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891789#action_12891789 ] Olga Natkovich commented on PIG-1249: - Jeff, sorry this patch has not had much attention for a while. Can I ask you to do the following: (1) Regenerate the patch for the latest trunk and make sure that the tests are passing and we get no additional warnings. (2) Add a docs comment that describes in one place what the exact heuristics are, when they are applied, and how they can be influenced. I will ask our doc writer to incorporate this information in the Pig 0.8.0 documentation. (3) If it is not already done, can we log the value that will be used so that the user knows what is happening? Thanks! Safe-guards against misconfigured Pig scripts without PARALLEL keyword -- Key: PIG-1249 URL: https://issues.apache.org/jira/browse/PIG-1249 Project: Pig Issue Type: Improvement Affects Versions: 0.8.0 Reporter: Arun C Murthy Assignee: Jeff Zhang Priority: Critical Fix For: 0.8.0 Attachments: PIG-1249-4.patch, PIG-1249.patch, PIG_1249_2.patch, PIG_1249_3.patch It would be *very* useful for Pig to have safe-guards against naive scripts which process a *lot* of data without the use of PARALLEL keyword. We've seen a fair number of instances where naive users process huge data-sets (10TB) with badly mis-configured #reduces e.g. 1 reduce.
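The thread does not spell out the heuristics in the patch, so the following Java sketch shows only one plausible shape for such a safeguard: derive the reducer count from the input size and clamp it to an upper bound. The class name ReducerEstimate and the parameters bytesPerReducer and maxReducers are hypothetical and are not taken from the PIG-1249 patch.

```java
/** Hypothetical reducer estimate; not the heuristic from the actual patch. */
final class ReducerEstimate {
    /**
     * Returns ceil(inputBytes / bytesPerReducer), clamped to the range [1, maxReducers].
     */
    static int estimate(long inputBytes, long bytesPerReducer, int maxReducers) {
        if (bytesPerReducer <= 0 || maxReducers <= 0) {
            throw new IllegalArgumentException("bytesPerReducer and maxReducers must be positive");
        }
        // Ceiling division without floating point.
        long needed = (inputBytes + bytesPerReducer - 1) / bytesPerReducer;
        return (int) Math.max(1, Math.min(maxReducers, needed));
    }
}
```

Under these assumed parameters, a 10 TB input with 1 GB per reducer and a cap of 999 would be clamped to 999 reducers instead of running with a single reduce. Logging the chosen value, as requested in (3), would let users see when the estimate was applied.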
[jira] Updated: (PIG-259) allow store to overwrite existing directroy
[ https://issues.apache.org/jira/browse/PIG-259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich updated PIG-259: --- Fix Version/s: (was: 0.8.0) Unlinking since there has been no activity since early May. Jeff, please feel free to link it back in if you are still planning to work on it for the 0.8 release. allow store to overwrite existing directroy --- Key: PIG-259 URL: https://issues.apache.org/jira/browse/PIG-259 Project: Pig Issue Type: Sub-task Affects Versions: 0.8.0 Reporter: Olga Natkovich Assignee: Jeff Zhang Attachments: Pig_259.patch, Pig_259_2.patch, Pig_259_3.patch, Pig_259_4.patch We have users who are asking for a flag to overwrite an existing directory.
[jira] Resolved: (PIG-466) PERFORMANCE: dropping the columns as soon as possible
[ https://issues.apache.org/jira/browse/PIG-466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich resolved PIG-466. Resolution: Fixed This is already resolved as part of PIG-1178 PERFORMANCE: dropping the columns as soon as possible - Key: PIG-466 URL: https://issues.apache.org/jira/browse/PIG-466 Project: Pig Issue Type: Improvement Affects Versions: 0.2.0 Reporter: Olga Natkovich Assignee: Daniel Dai Fix For: 0.8.0 Currently, each operator carries all the data until foreach is encountered. This can cause significant performance degradation. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (PIG-498) Pig does not error out while trying to use a input file to which the user does not have access permissions
[ https://issues.apache.org/jira/browse/PIG-498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich reassigned PIG-498: -- Assignee: niraj rai I am guessing this issue might have gone away with Pig 0.7.0. Niraj, could you verify and if it is gone, please, close Pig does not error out while trying to use a input file to which the user does not have access permissions -- Key: PIG-498 URL: https://issues.apache.org/jira/browse/PIG-498 Project: Pig Issue Type: Bug Affects Versions: 0.2.0 Reporter: Pradeep Kamath Assignee: niraj rai Fix For: 0.8.0 Session illustrating the issue. {code} bash-3.00$ hadoop fs -ls /data/statistics.txt ls: org.apache.hadoop.fs.permission.AccessControlException: Permission denied: user=username, access=READ_EXECUTE, inode=inodepermissions- bash-3.00$ pig -latest 2008-10-16 23:31:25,134 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to HOD... ... 2008-10-16 23:34:45,810 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: local grunt> a = load '/data/statistics.txt'; grunt> dump a; 2008-10-16 23:39:05,624 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete 2008-10-16 23:39:05,624 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success! grunt> {code}
[jira] Assigned: (PIG-348) -j command line option doesn't work
[ https://issues.apache.org/jira/browse/PIG-348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding reassigned PIG-348: Assignee: Richard Ding (was: Corinne Chandel) -j command line option doesn't work --- Key: PIG-348 URL: https://issues.apache.org/jira/browse/PIG-348 Project: Pig Issue Type: Improvement Components: documentation Reporter: Amir Youssefi Assignee: Richard Ding Fix For: 0.8.0 According to: $ pig --help ... -j, -jar jarfile load jarfile ... yet $pig -j my.jar doesn't work in place of: register my.jar in Pig script. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-348) -j command line option doesn't work
[ https://issues.apache.org/jira/browse/PIG-348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12891795#action_12891795 ] Richard Ding commented on PIG-348: -- I'll first remove the -j option from source code. -j command line option doesn't work --- Key: PIG-348 URL: https://issues.apache.org/jira/browse/PIG-348 Project: Pig Issue Type: Improvement Components: documentation Reporter: Amir Youssefi Assignee: Corinne Chandel Fix For: 0.8.0 According to: $ pig --help ... -j, -jar jarfile load jarfile ... yet $pig -j my.jar doesn't work in place of: register my.jar in Pig script. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1453) [zebra] Intermittent failure for TestOrderPreserveUnionHDFS
[ https://issues.apache.org/jira/browse/PIG-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1453: -- Status: Resolved (was: Patch Available) Resolution: Fixed Committed to the trunk. [zebra] Intermittent failure for TestOrderPreserveUnionHDFS --- Key: PIG-1453 URL: https://issues.apache.org/jira/browse/PIG-1453 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.8.0 Reporter: Daniel Dai Assignee: Yan Zhou Fix For: 0.8.0 Attachments: PIG-1453.patch, PIG-1453.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (PIG-602) Pass global configurations to UDF
[ https://issues.apache.org/jira/browse/PIG-602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich resolved PIG-602. Resolution: Fixed Pass global configurations to UDF - Key: PIG-602 URL: https://issues.apache.org/jira/browse/PIG-602 Project: Pig Issue Type: New Feature Components: impl Reporter: Yiping Han Fix For: 0.8.0 We are seeking an easy way to pass a large number of global configurations to UDFs. Since our application contains many pig jobs, and has a large number of configurations. Passing configurations through command line is not an ideal way (i.e. modifying single parameter needs to change multiple command lines). And to put everything into the hadoop conf is not an ideal way either. We would like to see if Pig can provide such a facility that allows us to pass a configuration file in some format(XML?) and then make it available through out all the UDFs. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-602) Pass global configurations to UDF
[ https://issues.apache.org/jira/browse/PIG-602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891800#action_12891800 ] Olga Natkovich commented on PIG-602: This work is already done. The user can propagate the properties via -propertyfile filename from the command line and then retrieve them via a call to UDFContext.getJobConf. We just need to document this for the Pig 0.8.0 release. Pass global configurations to UDF - Key: PIG-602 URL: https://issues.apache.org/jira/browse/PIG-602 Project: Pig Issue Type: New Feature Components: impl Reporter: Yiping Han Fix For: 0.8.0 We are seeking an easy way to pass a large number of global configurations to UDFs. Since our application contains many pig jobs, and has a large number of configurations. Passing configurations through command line is not an ideal way (i.e. modifying single parameter needs to change multiple command lines). And to put everything into the hadoop conf is not an ideal way either. We would like to see if Pig can provide such a facility that allows us to pass a configuration file in some format(XML?) and then make it available through out all the UDFs.
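To illustrate the idea of one shared settings file, here is a minimal standalone Java sketch. It does not call Pig: java.util.Properties stands in for the configuration a UDF would obtain through UDFContext.getJobConf, and it assumes the property file uses the usual key=value format. The class name UdfSettings and the keys myapp.threshold and myapp.region are made up for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

/** Hypothetical helper; stands in for reading settings inside a UDF. */
final class UdfSettings {
    /** Parses key=value text such as the contents of a property file. */
    static Properties parse(String text) throws IOException {
        Properties p = new Properties();
        p.load(new StringReader(text));
        return p;
    }

    /** Looks up an integer setting, returning the fallback when the key is absent. */
    static int intSetting(Properties p, String key, int fallback) {
        String raw = p.getProperty(key);
        return raw == null ? fallback : Integer.parseInt(raw.trim());
    }
}
```

The benefit described in the comment is that changing one value in the file affects every job that is launched with it, so no command line has to be edited.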
[jira] Updated: (PIG-348) -j command line option doesn't work
[ https://issues.apache.org/jira/browse/PIG-348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-348: - Attachment: PIG-348.path -j command line option doesn't work --- Key: PIG-348 URL: https://issues.apache.org/jira/browse/PIG-348 Project: Pig Issue Type: Improvement Components: documentation Reporter: Amir Youssefi Assignee: Richard Ding Fix For: 0.8.0 Attachments: PIG-348.path According to: $ pig --help ... -j, -jar jarfile load jarfile ... yet $pig -j my.jar doesn't work in place of: register my.jar in Pig script. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-348) -j command line option doesn't work
[ https://issues.apache.org/jira/browse/PIG-348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-348: - Status: Patch Available (was: Open) -j command line option doesn't work --- Key: PIG-348 URL: https://issues.apache.org/jira/browse/PIG-348 Project: Pig Issue Type: Improvement Components: documentation Reporter: Amir Youssefi Assignee: Richard Ding Fix For: 0.8.0 Attachments: PIG-348.path According to: $ pig --help ... -j, -jar jarfile load jarfile ... yet $pig -j my.jar doesn't work in place of: register my.jar in Pig script. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1379) Jars registered from command line should override the ones present in the script
[ https://issues.apache.org/jira/browse/PIG-1379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-1379: -- Status: Open (was: Patch Available) Jars registered from command line should override the ones present in the script - Key: PIG-1379 URL: https://issues.apache.org/jira/browse/PIG-1379 Project: Pig Issue Type: Improvement Reporter: Ankur Assignee: Richard Ding Fix For: 0.8.0 Jars that are registered from the command line when executing the pig script should override the ones that are specified via 'register' in the pig script itself. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1379) Jars registered from command line should override the ones present in the script
[ https://issues.apache.org/jira/browse/PIG-1379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-1379: -- Attachment: (was: PIG-1379.patch) Jars registered from command line should override the ones present in the script - Key: PIG-1379 URL: https://issues.apache.org/jira/browse/PIG-1379 Project: Pig Issue Type: Improvement Reporter: Ankur Assignee: Richard Ding Fix For: 0.8.0 Jars that are registered from the command line when executing the pig script should override the ones that are specified via 'register' in the pig script itself. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1379) Jars registered from command line should override the ones present in the script
[ https://issues.apache.org/jira/browse/PIG-1379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-1379: -- Attachment: (was: PIG-1379.patch) Jars registered from command line should override the ones present in the script - Key: PIG-1379 URL: https://issues.apache.org/jira/browse/PIG-1379 Project: Pig Issue Type: Improvement Reporter: Ankur Assignee: Richard Ding Fix For: 0.8.0 Jars that are registered from the command line when executing the pig script should override the ones that are specified via 'register' in the pig script itself. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1379) Jars registered from command line should override the ones present in the script
[ https://issues.apache.org/jira/browse/PIG-1379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12891815#action_12891815 ] Richard Ding commented on PIG-1379: --- Alan, I got your point. I now think that we should reconsider this feature request. It isn't clear to me why this is useful. Users can use parameter substitution if they don't want to change the Pig scripts. I moved the posted patch to PIG-348. Jars registered from command line should override the ones present in the script - Key: PIG-1379 URL: https://issues.apache.org/jira/browse/PIG-1379 Project: Pig Issue Type: Improvement Reporter: Ankur Assignee: Richard Ding Fix For: 0.8.0 Jars that are registered from the command line when executing the pig script should override the ones that are specified via 'register' in the pig script itself. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (PIG-1379) Jars registered from command line should override the ones present in the script
[ https://issues.apache.org/jira/browse/PIG-1379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich resolved PIG-1379. - Resolution: Won't Fix This is a non-backward compatible fix and it is not clear why we need to make it. Parameter substitution can be used to drive execution from command line Jars registered from command line should override the ones present in the script - Key: PIG-1379 URL: https://issues.apache.org/jira/browse/PIG-1379 Project: Pig Issue Type: Improvement Reporter: Ankur Assignee: Richard Ding Fix For: 0.8.0 Jars that are registered from the command line when executing the pig script should override the ones that are specified via 'register' in the pig script itself. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-348) -j command line option doesn't work
[ https://issues.apache.org/jira/browse/PIG-348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12891820#action_12891820 ] Olga Natkovich commented on PIG-348: +1, changes look good -j command line option doesn't work --- Key: PIG-348 URL: https://issues.apache.org/jira/browse/PIG-348 Project: Pig Issue Type: Improvement Components: documentation Reporter: Amir Youssefi Assignee: Richard Ding Fix For: 0.8.0 Attachments: PIG-348.path According to: $ pig --help ... -j, -jar jarfile load jarfile ... yet $pig -j my.jar doesn't work in place of: register my.jar in Pig script. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-621) Casts swallow exceptions when there are issues with conversion of bytes to Pig types
[ https://issues.apache.org/jira/browse/PIG-621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich updated PIG-621: --- Fix Version/s: 0.9.0 (was: 0.8.0) 0.9 is all about improved error handling Casts swallow exceptions when there are issues with conversion of bytes to Pig types Key: PIG-621 URL: https://issues.apache.org/jira/browse/PIG-621 Project: Pig Issue Type: Bug Affects Versions: 0.2.0 Reporter: Santhosh Srinivasan Fix For: 0.9.0 In the current implementation of casts, exceptions thrown while converting bytes to Pig types are swallowed. Pig needs to either return NULL or rethrow the exception. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (PIG-729) Use of default parallelism
[ https://issues.apache.org/jira/browse/PIG-729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich resolved PIG-729. Resolution: Duplicate We are going with the approach outlined in PIG-1249. Use of default parallelism -- Key: PIG-729 URL: https://issues.apache.org/jira/browse/PIG-729 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.3.0 Environment: Hadoop 0.20 Reporter: Santhosh Srinivasan Fix For: 0.8.0 Currently, if the user does not specify the number of reduce slots using the parallel keyword, Pig lets Hadoop decide on the default number of reducers. This model worked well with dynamically allocated clusters using HOD and for static clusters where the default number of reduce slots was explicitly set. With Hadoop 0.20, a single static cluster will be shared amongst a number of queues. As a result, a common scenario is to end up with default number of reducers set to one (1). When users migrate to Hadoop 0.20, they might see a dramatic change in the performance of their queries if they had not used the parallel keyword to specify the number of reducers. In order to mitigate such circumstances, Pig can support one of the following: 1. Specify a default parallelism for the entire script. This option will allow users to use the same parallelism for all operators that do not have the explicit parallel keyword. This will ensure that the scripts utilize more reducers than the default of one reducer. On the down side, due to data transformations, usually operations that are performed towards the end of the script will need smaller number of reducers compared to the operators that appear at the beginning of the script. 2. Display a warning message for each reduce side operator that does have the use of the explicit parallel keyword. Proceed with the execution. 3. Display an error message indicating the operator that does not have the explicit use of the parallel keyword. Stop the execution. 
Other suggestions/thoughts/solutions are welcome. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
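The three options listed in PIG-729 can be compared side by side with a small sketch. This is only an illustration, not the approach adopted in PIG-1249: the function name `resolve_parallelism`, its parameters, and the `policy` values are all hypothetical.

```python
import warnings


def resolve_parallelism(op_name, explicit=None, script_default=None, policy="warn"):
    """Pick a reducer count for one reduce-side operator (hypothetical helper).

    explicit       -- value of the operator's own PARALLEL clause, if any
    script_default -- script-wide default parallelism (option 1), if any
    policy         -- "warn" (option 2) or "error" (option 3) when neither is set
    """
    if explicit is not None:
        return explicit            # an explicit PARALLEL clause always wins
    if script_default is not None:
        return script_default      # option 1: fall back to the script default
    if policy == "error":
        # option 3: name the offending operator and stop
        raise ValueError(f"{op_name}: no PARALLEL clause and no script default")
    if policy == "warn":
        # option 2: warn, then proceed
        warnings.warn(f"{op_name}: no PARALLEL clause; Hadoop picks the reducer count")
    return None                    # None means "defer to Hadoop's default"
```

Returning `None` rather than a number keeps the existing behaviour (Hadoop decides) visible to the caller, which is where the single-reducer surprise described above would show up.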
[jira] Commented: (PIG-348) -j command line option doesn't work
[ https://issues.apache.org/jira/browse/PIG-348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12891827#action_12891827 ] Richard Ding commented on PIG-348: -- test-patch results: {code} [exec] +1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 3 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. [exec] {code} -j command line option doesn't work --- Key: PIG-348 URL: https://issues.apache.org/jira/browse/PIG-348 Project: Pig Issue Type: Improvement Components: documentation Reporter: Amir Youssefi Assignee: Richard Ding Fix For: 0.8.0 Attachments: PIG-348.path According to: $ pig --help ... -j, -jar jarfile load jarfile ... yet $pig -j my.jar doesn't work in place of: register my.jar in Pig script. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (PIG-787) Allow UDFs and their dependencies to be distributed via Hadoop's distributed cache
[ https://issues.apache.org/jira/browse/PIG-787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich resolved PIG-787. Resolution: Won't Fix Does not look like there is reason to do this Allow UDFs and their dependencies to be distributed via Hadoop's distributed cache -- Key: PIG-787 URL: https://issues.apache.org/jira/browse/PIG-787 Project: Pig Issue Type: New Feature Reporter: Olga Natkovich Assignee: Richard Ding Fix For: 0.8.0 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (PIG-873) Optimizer should allow search for global patterns
[ https://issues.apache.org/jira/browse/PIG-873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich reassigned PIG-873: -- Assignee: Daniel Dai Daniel, please review with Santhosh if additional work is required. If not, please close. If there is more work, let's discuss whether we need to do this in Pig 0.8.0. Thanks Optimizer should allow search for global patterns - Key: PIG-873 URL: https://issues.apache.org/jira/browse/PIG-873 Project: Pig Issue Type: Improvement Affects Versions: 0.4.0 Reporter: Santhosh Srinivasan Assignee: Daniel Dai Fix For: 0.8.0 Currently, the optimizer works on the following mechanism: 1. Specify the pattern to be searched 2. For each occurrence of the pattern, check and then apply a transformation With this approach, the search for a pattern is localized. An example will illustrate the problem. If the pattern to be searched for is foreach (with flatten) connected to any operator and if the graph has more than one foreach (with flatten) connected to an operator (cross, join, union, etc), then each instance of foreach connected to the operator is returned as a match. While this is fine for a localized view (per match), at a global view the pattern to be searched for is any number of foreach connected to an operator. The implication of not having a globalized view is more rules. There will be one rule for one foreach connected to an operator, one rule for two foreachs connected to an operator, etc. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
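The difference between the localized and the global view in PIG-873 can be shown on a toy plan. The sketch below is an assumption-laden illustration, not Pig's optimizer: the plan is a plain edge list, and the operator names and both function names are made up.

```python
from collections import defaultdict

# Toy plan as (predecessor, successor) edges; all names are illustrative.
edges = [
    ("foreach_flatten_1", "join_1"),
    ("foreach_flatten_2", "join_1"),
    ("load_1", "join_1"),
    ("foreach_flatten_3", "union_1"),
]


def local_matches(plan_edges):
    """Localized view: one match per foreach -> operator edge."""
    return [(src, dst) for src, dst in plan_edges
            if src.startswith("foreach_flatten")]


def global_matches(plan_edges):
    """Global view: one match per operator, with all its foreach inputs."""
    grouped = defaultdict(list)
    for src, dst in local_matches(plan_edges):
        grouped[dst].append(src)
    return dict(grouped)
```

With the localized view `join_1` is reported twice, once per foreach; the global view reports it once with both inputs, so a single rule can handle any number of foreach predecessors.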
[jira] Updated: (PIG-930) merge join should handle compressed bz2 sorted files
[ https://issues.apache.org/jira/browse/PIG-930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich updated PIG-930: --- Fix Version/s: (was: 0.8.0) Unlinking from the release. We have not really seen user asks for this. merge join should handle compressed bz2 sorted files Key: PIG-930 URL: https://issues.apache.org/jira/browse/PIG-930 Project: Pig Issue Type: Bug Reporter: Pradeep Kamath There are two issues. First, POLoad, which is used to read the right side input, does not handle bz2 files right now. This needs to be fixed. Second, in the index map job we bindTo(startOfBlockOffSet) (this will internally discard the first tuple if offset > 0). Then we do the following: {noformat} While(tuple survives pipeline) { Pos = getPosition() getNext() run the tuple through pipeline in the right side which could have filter } Emit(key, pos, filename). {noformat} Then in the map job which does the join, we bindTo(pos > 0 ? pos - 1 : pos) (we do pos - 1 because bindTo will discard the first tuple for pos > 0). Then we do getNext(). Now in bz2 compressed files, getPosition() returns a position which is not really accurate. The problem is that it could be a position in the middle of a compressed bz2 block. Then when we use that position to bindTo() in the final map job, the code would first hunt for a bz2 block header, thus skipping the whole current bz2 block. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
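The position bookkeeping described in PIG-930 is easier to follow on an uncompressed file, where byte offsets are exact. The sketch below is a toy model under that assumption: `LineReader`, `first_index_entry`, and `fetch` are hypothetical names, not Pig classes, and records are comma-separated lines whose first field is the key.

```python
import io


class LineReader:
    """Toy stand-in for a loader over an uncompressed, newline-delimited file."""

    def __init__(self, data: bytes):
        self._buf = io.BytesIO(data)

    def bind_to(self, offset: int) -> None:
        self._buf.seek(offset)
        if offset > 0:
            self._buf.readline()  # discard the (possibly partial) first tuple

    def get_position(self) -> int:
        return self._buf.tell()

    def get_next(self):
        line = self._buf.readline()
        return line.rstrip(b"\n").decode() if line else None


def first_index_entry(data: bytes, start_offset: int, keep=lambda t: True):
    """Index job: return (key, pos) of the first tuple that survives `keep`."""
    reader = LineReader(data)
    reader.bind_to(start_offset)
    while True:
        pos = reader.get_position()
        tup = reader.get_next()
        if tup is None:
            return None
        if keep(tup):
            return (tup.split(",")[0], pos)


def fetch(data: bytes, pos: int):
    """Join job: rebind one byte early so the discard step eats only a newline."""
    reader = LineReader(data)
    reader.bind_to(pos - 1 if pos > 0 else pos)
    return reader.get_next()
```

Here `fetch` returns exactly the indexed record because `pos` is a true byte offset. In a bz2 file, as the report explains, the recorded position may fall inside a compressed block, so the same rebind would skip ahead to the next block header instead.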
[jira] Resolved: (PIG-932) Required fields projection in Loader: nested fields in bag/tuple, map key lookup more than two levels
[ https://issues.apache.org/jira/browse/PIG-932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich resolved PIG-932. Resolution: Duplicate This is a duplicate of https://issues.apache.org/jira/browse/PIG-1324 Required fields projection in Loader: nested fields in bag/tuple, map key lookup more than two levels - Key: PIG-932 URL: https://issues.apache.org/jira/browse/PIG-932 Project: Pig Issue Type: Improvement Components: impl Affects Versions: 0.3.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.8.0 To leverage the performance features provided by Zebra, Pig should be able to figure out which input fields are actually used in a Pig script, and prune unnecessary inputs. This feature is being implemented in [PIG-922|https://issues.apache.org/jira/browse/PIG-922]. However, there are currently two limitations: 1. Pruning of nested fields applies only to maps. We do not prune sub-fields inside a bag or tuple 2. For maps, we currently only go one level deep. E.g., if the user uses a#'key0'#'key1' in a Pig script, a#'key0' will be asked for These two limitations are in line with the current limitations of the Zebra loader. Once the Zebra loader can handle this, we need to work to lift these limitations. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-947) Parsing Bags by PigStorage is not handled correctly if whitespace before start of tuple.
[ https://issues.apache.org/jira/browse/PIG-947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich updated PIG-947: --- Fix Version/s: (was: 0.8.0) I don't think anybody is signed up for this issue. Please relink to the release if you are interested in working on it, and assign it to yourself. Parsing Bags by PigStorage is not handled correctly if whitespace before start of tuple. Key: PIG-947 URL: https://issues.apache.org/jira/browse/PIG-947 Project: Pig Issue Type: Bug Components: data Environment: Pig on Hadoop 18 Reporter: Gandul Azul PigStorage parser for bags is not working correctly when a tuple in a bag is preceded by a space. For example, the following is parsed correctly: {(-5.243084,3.142401,0.000138,2.071200,0),(-6.021349,0.992683,0.44,0.992683,0),(-10.426160,20.251774,0.000892,5.691086,0)} while this is not: (Note the space before the second tuple) {(-5.243084,3.142401,0.000138,2.071200,0), (-6.021349,0.992683,0.44,0.992683,0),(-10.426160,20.251774,0.000892,5.691086,0)} It seems that the parser, when it encounters the space, treats the rest of the line as a String. With a schema, this results in a typecast of string to databag, which results in an exception. |WARN builtin.PigStorage: Unable to interpret value [...@2c9b42e6 in field being converted to type bag, caught ParseException Encountered STRING at |line 1, column 43. |Was expecting: |( ... | field discarded Below is the parser debug output for the parsing of the above error sequence: 2.071200,0), ( from above... ** FOUND A DOUBLENUMBER MATCH (2.071200) ** Call: AtomDatum Consumed token: DOUBLENUMBER: 2.071200 at line 1 column 31 Return: AtomDatum Return: Datum Matched the empty string as STRING token. Current character : , (44) at line 1 column 39 No more string literal token matches are possible. Currently matched the first 1 characters as a , token. ** FOUND A , MATCH (,) ** Consumed token: , at line 1 column 39 Call: Datum Matched the empty string as STRING token. 
Current character : 0 (48) at line 1 column 40 No string literal matches possible. Starting NFA to match one of : { STRING, SIGNEDINTEGER, DOUBLENUMBER } Current character : 0 (48) at line 1 column 40 Currently matched the first 1 characters as a SIGNEDINTEGER token. Possible kinds of longer matches : { STRING, SIGNEDINTEGER, DOUBLENUMBER, LONGINTEGER, FLOATNUMBER } Current character : ) (41) at line 1 column 41 Currently matched the first 1 characters as a SIGNEDINTEGER token. Putting back 1 characters into the input stream. ** FOUND A SIGNEDINTEGER MATCH (0) ** Call: AtomDatum Consumed token: SIGNEDINTEGER: 0 at line 1 column 40 Return: AtomDatum Return: Datum Matched the empty string as STRING token. Current character : ) (41) at line 1 column 41 No more string literal token matches are possible. Currently matched the first 1 characters as a ) token. ** FOUND A ) MATCH ()) ** Return: Tuple Consumed token: ) at line 1 column 41 Matched the empty string as STRING token. Current character : , (44) at line 1 column 42 No more string literal token matches are possible. Currently matched the first 1 characters as a , token. ** FOUND A , MATCH (,) ** Consumed token: , at line 1 column 42 Matched the empty string as STRING token. Current character : (32) at line 1 column 43 No string literal matches possible. Starting NFA to match one of : { STRING, SIGNEDINTEGER, DOUBLENUMBER } Current character : (32) at line 1 column 43 Currently matched the first 1 characters as a STRING token. Possible kinds of longer matches : { STRING, SIGNEDINTEGER, DOUBLENUMBER } Current character : ( (40) at line 1 column 44 Currently matched the first 1 characters as a STRING token. Putting back 1 characters into the input stream. ** FOUND A STRING MATCH ( ) ** Return: Bag Return: Datum Return: Parse -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
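One way to picture the behaviour the PIG-947 reporter expected is a parser that skips whitespace before every delimiter. The sketch below is a standalone toy, not PigStorage's generated parser: `parse_bag` is a hypothetical name and it only handles bags of tuples with numeric fields.

```python
def parse_bag(text: str):
    """Parse '{(v,...),(v,...)}' into a list of tuples.

    Whitespace around delimiters is ignored, so '), (' and '),(' are equivalent.
    """
    pos = 0

    def skip_ws():
        nonlocal pos
        while pos < len(text) and text[pos].isspace():
            pos += 1

    def peek():
        skip_ws()
        return text[pos] if pos < len(text) else ""

    def expect(ch):
        nonlocal pos
        skip_ws()
        if pos >= len(text) or text[pos] != ch:
            raise ValueError(f"expected {ch!r} at column {pos + 1}")
        pos += 1

    def parse_atom():
        nonlocal pos
        skip_ws()
        start = pos
        while pos < len(text) and text[pos] not in ",(){}":
            pos += 1
        token = text[start:pos].strip()
        try:
            return int(token)
        except ValueError:
            return float(token)  # raises ValueError for non-numeric fields

    def parse_tuple():
        expect("(")
        fields = [parse_atom()]
        while peek() == ",":
            expect(",")
            fields.append(parse_atom())
        expect(")")
        return tuple(fields)

    expect("{")
    tuples = []
    if peek() != "}":
        tuples.append(parse_tuple())
        while peek() == ",":
            expect(",")
            tuples.append(parse_tuple())
    expect("}")
    return tuples
```

Because `skip_ws` runs before each token is examined, the space at column 43 in the failing input is consumed instead of being matched as the start of a STRING token.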
[jira] Updated: (PIG-959) Merge Join fails when there is a blocking operator before it in query.
[ https://issues.apache.org/jira/browse/PIG-959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich updated PIG-959: --- Fix Version/s: (was: 0.8.0) We are not seeing any asks for this at this time Merge Join fails when there is a blocking operator before it in query. -- Key: PIG-959 URL: https://issues.apache.org/jira/browse/PIG-959 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.7.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: pig-959.patch If there is an order-by, distinct or any other blocking operator in query followed by Merge Join, pig fails to compile it. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1489) Pig MapReduceLauncher does not use jars in register statement
[ https://issues.apache.org/jira/browse/PIG-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-1489: -- Attachment: PIG-1489_1.patch New patch adding the source code of the test jar. Pig MapReduceLauncher does not use jars in register statement --- Key: PIG-1489 URL: https://issues.apache.org/jira/browse/PIG-1489 Project: Pig Issue Type: Bug Reporter: Olga Natkovich Assignee: Richard Ding Fix For: 0.8.0 Attachments: PIG-1489.patch, PIG-1489.patch, PIG-1489_1.patch If my Pig StoreFunc has its own OutputFormat class then the Pig MapReduceLauncher will try to instantiate it before launching the mapreduce job and fail with ClassNotFoundException. This happens because the Pig MapReduceLauncher uses its own classloader and ignores the classes in the jars in the register statement. The effect is that the jars not only have to be in the register statement in the script but also in the pig classpath with the -classpath tag. This can be remedied by having the Pig MapReduceLauncher construct a classloader that includes the registered jars and use that to instantiate the OutputFormat class. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1489) Pig MapReduceLauncher does not use jars in register statement
[ https://issues.apache.org/jira/browse/PIG-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12891856#action_12891856 ] Thejas M Nair commented on PIG-1489: +1 You can commit after verifying that the test checks are passing. Pig MapReduceLauncher does not use jars in register statement --- Key: PIG-1489 URL: https://issues.apache.org/jira/browse/PIG-1489 Project: Pig Issue Type: Bug Reporter: Olga Natkovich Assignee: Richard Ding Fix For: 0.8.0 Attachments: PIG-1489.patch, PIG-1489.patch, PIG-1489_1.patch If my Pig StoreFunc has its own OutputFormat class then the Pig MapReduceLauncher will try to instantiate it before launching the mapreduce job and fail with ClassNotFoundException. This happens because the Pig MapReduceLauncher uses its own classloader and ignores the classes in the jars in the register statement. The effect is that the jars not only have to be in the register statement in the script but also in the pig classpath with the -classpath tag. This can be remedied by having the Pig MapReduceLauncher construct a classloader that includes the registered jars and use that to instantiate the OutputFormat class. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1150) VAR() Variance UDF
[ https://issues.apache.org/jira/browse/PIG-1150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12891857#action_12891857 ] Olga Natkovich commented on PIG-1150: - Dmitriy, is the patch ready to be committed or are you planning to submit a new one? Thanks VAR() Variance UDF -- Key: PIG-1150 URL: https://issues.apache.org/jira/browse/PIG-1150 Project: Pig Issue Type: New Feature Affects Versions: 0.5.0 Environment: UDF, written in Pig 0.5 contrib/ Reporter: Russell Jurney Assignee: Dmitriy V. Ryaboy Fix For: 0.8.0 Attachments: var.patch I've implemented a UDF in Pig 0.5 that implements Algebraic and calculates variance in a distributed manner, based on the AVG() builtin. It works by calculating the count, sum and sum of squares, as described here: http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Parallel_algorithm Is this a worthwhile contribution? Taking the square root of this value using the contrib SQRT() function gives Standard Deviation, which is missing from Pig. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
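The count / sum / sum-of-squares scheme in PIG-1150 can be sketched in three stages that mirror the shape of an Algebraic function. This is a plain Python illustration, not the attached var.patch: the function names `initial`, `intermediate`, and `final` are chosen here for readability, and the result is the population variance.

```python
def initial(values):
    """Map side: reduce one chunk to (count, sum, sum of squares)."""
    n, s, ss = 0, 0.0, 0.0
    for v in values:
        n += 1
        s += v
        ss += v * v
    return (n, s, ss)


def intermediate(partials):
    """Combiner: add partial triples component by component."""
    n = sum(p[0] for p in partials)
    s = sum(p[1] for p in partials)
    ss = sum(p[2] for p in partials)
    return (n, s, ss)


def final(partial):
    """Reducer: population variance E[x^2] - (E[x])^2 from the merged triple."""
    n, s, ss = partial
    if n == 0:
        return None
    mean = s / n
    return ss / n - mean * mean
```

The triples combine by simple addition, which is what makes the computation distributable. The trade-off is that subtracting two large, nearly equal quantities in `final` can lose precision when the mean is large relative to the spread.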
[jira] Commented: (PIG-1205) Enhance HBaseStorage-- Make it support loading row key and implement StoreFunc
[ https://issues.apache.org/jira/browse/PIG-1205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12891858#action_12891858 ] Olga Natkovich commented on PIG-1205: - Jeff and Dmitriy, are you still planning to finish this for the Pig 0.8.0 release? Enhance HBaseStorage-- Make it support loading row key and implement StoreFunc -- Key: PIG-1205 URL: https://issues.apache.org/jira/browse/PIG-1205 Project: Pig Issue Type: Sub-task Affects Versions: 0.7.0 Reporter: Jeff Zhang Assignee: Jeff Zhang Fix For: 0.8.0 Attachments: PIG_1205.patch, PIG_1205_2.patch, PIG_1205_3.patch, PIG_1205_4.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1489) Pig MapReduceLauncher does not use jars in register statement
[ https://issues.apache.org/jira/browse/PIG-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12891861#action_12891861 ] Richard Ding commented on PIG-1489: --- test-patch results: {code} [exec] +1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 10 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. {code} Pig MapReduceLauncher does not use jars in register statement --- Key: PIG-1489 URL: https://issues.apache.org/jira/browse/PIG-1489 Project: Pig Issue Type: Bug Reporter: Olga Natkovich Assignee: Richard Ding Fix For: 0.8.0 Attachments: PIG-1489.patch, PIG-1489.patch, PIG-1489_1.patch If my Pig StoreFunc has its own OutputFormat class then the Pig MapReduceLauncher will try to instantiate it before launching the mapreduce job and fail with ClassNotFoundException. This happens because the Pig MapReduceLauncher uses its own classloader and ignores the classes in the jars in the register statement. The effect is that the jars not only have to be in the register statement in the script but also in the pig classpath with the -classpath tag. This can be remedied by having the Pig MapReduceLauncher construct a classloader that includes the registered jars and use that to instantiate the OutputFormat class. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1150) VAR() Variance UDF
[ https://issues.apache.org/jira/browse/PIG-1150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12891863#action_12891863 ] Dmitriy V. Ryaboy commented on PIG-1150: Meh. Go ahead and commit. Don't put it into builtin, since it has math problems at scale. Ok for piggybank. VAR() Variance UDF -- Key: PIG-1150 URL: https://issues.apache.org/jira/browse/PIG-1150 Project: Pig Issue Type: New Feature Affects Versions: 0.5.0 Environment: UDF, written in Pig 0.5 contrib/ Reporter: Russell Jurney Assignee: Dmitriy V. Ryaboy Fix For: 0.8.0 Attachments: var.patch I've implemented a UDF in Pig 0.5 that implements Algebraic and calculates variance in a distributed manner, based on the AVG() builtin. It works by calculating the count, sum and sum of squares, as described here: http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Parallel_algorithm Is this a worthwhile contribution? Taking the square root of this value using the contrib SQRT() function gives Standard Deviation, which is missing from Pig. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1205) Enhance HBaseStorage-- Make it support loading row key and implement StoreFunc
[ https://issues.apache.org/jira/browse/PIG-1205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12891864#action_12891864 ] Dmitriy V. Ryaboy commented on PIG-1205: When is the cut-off date for that? Enhance HBaseStorage-- Make it support loading row key and implement StoreFunc -- Key: PIG-1205 URL: https://issues.apache.org/jira/browse/PIG-1205 Project: Pig Issue Type: Sub-task Affects Versions: 0.7.0 Reporter: Jeff Zhang Assignee: Jeff Zhang Fix For: 0.8.0 Attachments: PIG_1205.patch, PIG_1205_2.patch, PIG_1205_3.patch, PIG_1205_4.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1500) guava.jar should be removed from the lib folder
[ https://issues.apache.org/jira/browse/PIG-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] niraj rai updated PIG-1500: --- Attachment: guava.jar.r06.patch Attaching the patch with the guava.jar r06 version, as no one had a problem migrating to that version. guava.jar should be removed from the lib folder --- Key: PIG-1500 URL: https://issues.apache.org/jira/browse/PIG-1500 Project: Pig Issue Type: Bug Components: build Reporter: Giridharan Kesavan Assignee: niraj rai Fix For: 0.8.0 Attachments: guava.jar.r06.patch, removeGuavaJar.patch The guava jar is available in the maven repository, but it is still checked into the pig trunk's lib folder. I've checked the availability of the guava jar in the maven repository. http://mvnrepository.com/artifact/com.google.guava/guava -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1500) guava.jar should be removed from the lib folder
[ https://issues.apache.org/jira/browse/PIG-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] niraj rai updated PIG-1500: --- Status: Open (was: Patch Available) guava.jar should be removed from the lib folder --- Key: PIG-1500 URL: https://issues.apache.org/jira/browse/PIG-1500 Project: Pig Issue Type: Bug Components: build Reporter: Giridharan Kesavan Assignee: niraj rai Fix For: 0.8.0 Attachments: guava.jar.r06.patch, removeGuavaJar.patch The guava jar is available in the maven repository, but it is still checked into the pig trunk's lib folder. I've checked the availability of the guava jar in the maven repository. http://mvnrepository.com/artifact/com.google.guava/guava -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1500) guava.jar should be removed from the lib folder
[ https://issues.apache.org/jira/browse/PIG-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] niraj rai updated PIG-1500: --- Status: Patch Available (was: Open) guava.jar should be removed from the lib folder --- Key: PIG-1500 URL: https://issues.apache.org/jira/browse/PIG-1500 Project: Pig Issue Type: Bug Components: build Reporter: Giridharan Kesavan Assignee: niraj rai Fix For: 0.8.0 Attachments: guava.jar.r06.patch, removeGuavaJar.patch The guava jar is available in the maven repository, but it is still checked into the pig trunk's lib folder. I've checked the availability of the guava jar in the maven repository. http://mvnrepository.com/artifact/com.google.guava/guava -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.