[jira] Commented: (PIG-882) log level not propogated to loggers
[ https://issues.apache.org/jira/browse/PIG-882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12738972#action_12738972 ] Hudson commented on PIG-882: Integrated in Pig-trunk #512 (See [http://hudson.zones.apache.org/hudson/job/Pig-trunk/512/]): log level not propogated to loggers Key: PIG-882 URL: https://issues.apache.org/jira/browse/PIG-882 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.3.0 Reporter: Thejas M Nair Assignee: Daniel Dai Fix For: 0.4.0 Attachments: PIG-882-1.patch, PIG-882-2.patch, PIG-882-3.patch, PIG-882-4.patch, PIG-882-5.patch Pig accepts a log level as a parameter, but the level it captures is not applied appropriately, so loggers in different classes do not log at the specified level. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
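The bug class here is general: a verbosity flag parsed on the command line must actually be pushed into the logging framework's logger hierarchy, or loggers created elsewhere never see it. Pig itself uses log4j, but the mechanism can be sketched with java.util.logging from the JDK; the class and method names below are illustrative, not Pig's actual code.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Illustrative sketch (not Pig's code): push a command-line log level into
// the root logger so that loggers obtained anywhere in the application
// inherit it as their effective level.
public class LogLevelDemo {
    static void applyLogLevel(String levelName) {
        Level level = Level.parse(levelName);   // e.g. "FINE", "WARNING"
        Logger root = Logger.getLogger("");     // the root logger
        root.setLevel(level);                   // children with no explicit level inherit this
    }

    public static void main(String[] args) {
        applyLogLevel("FINE");
        // A logger created in some other class inherits the root's level.
        Logger other = Logger.getLogger("org.example.SomeClass");
        System.out.println(other.isLoggable(Level.FINE));
    }
}
```

The key point is that setting the level only on the object that parsed the flag is not enough; it must reach a logger that the rest of the class hierarchy inherits from.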
RE: Is it possible to access Configuration in UDF ?
At the moment we can't make UDFs dependent on Hadoop, as people also use them for testing in local mode, which is currently not based on Hadoop local mode due to performance constraints. I agree that we need to provide a way to give UDFs a configuration/property object. Olga -Original Message- From: Daniel Dai [mailto:dai...@gmail.com] Sent: Monday, August 03, 2009 9:20 PM To: pig-dev@hadoop.apache.org; pig-u...@hadoop.apache.org Subject: Re: Is it possible to access Configuration in UDF ? Hi, Jeff, This is not an API at all; this is a hack to make things work. We do lack a couple of features for UDFs: 1. reporter and counter (PIG-889) 2. access to global properties 3. ability to maintain state across different UDF invocations 4. input schema 5. variable-length arguments (PIG-902) Your suggestion sounds reasonable. We need to provide a well-designed interface for these features. - Original Message - From: zhang jianfeng zjf...@gmail.com To: pig-u...@hadoop.apache.org; pig-dev@hadoop.apache.org Sent: Monday, August 03, 2009 8:03 PM Subject: Re: Is it possible to access Configuration in UDF ? Dmitriy, Thank you for your help. I find this way of using the API is not so intuitive; I recommend that the base class of UDFs implement the Configurable interface. Then each UDF can use getConf() to get the Configuration object. Because a UDF is part of MapReduce, it makes sense to make it Configurable. The following is what I recommend changing in EvalFunc:

    public abstract class EvalFunc<T> implements Configurable {
        ...
        protected Configuration conf;
        ...
        public EvalFunc() {
            conf = PigMapReduce.sJobConf;
        }
        ...
        @Override
        public void setConf(Configuration conf) {
            this.conf = conf;
        }

        @Override
        public Configuration getConf() {
            return this.conf;
        }
        ...
    }

Jeff Zhang On Mon, Aug 3, 2009 at 8:52 PM, Dmitriy Ryaboy dvrya...@cloudera.com wrote: You can access the JobConf with the following call: ConfigurationUtil.toProperties(PigMapReduce.sJobConf) On Mon, Aug 3, 2009 at 12:40 AM, zhang jianfeng zjf...@gmail.com wrote: Hi all, I'd like to set a property in the Configuration to customize my UDF. But it looks like I cannot access the Configuration object in a UDF. Does pig have a plan to support this feature? Thank you. Jeff Zhang
[jira] Created: (PIG-905) TOKENIZE throws exception on null data
TOKENIZE throws exception on null data -- Key: PIG-905 URL: https://issues.apache.org/jira/browse/PIG-905 Project: Pig Issue Type: Bug Reporter: Olga Natkovich it should just return null -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
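The requested behavior (null in, null out) is easy to sketch. The following is a minimal stand-in, not Pig's actual TOKENIZE: a List<String> plays the role of the DataBag of single-field tuples the real UDF returns, and the delimiter set is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Sketch of null-safe tokenizing. A List<String> stands in for Pig's DataBag;
// the delimiter set here is assumed, not copied from Pig's source.
public class TokenizeSketch {
    static List<String> tokenize(String input) {
        if (input == null) {
            return null;   // the fix requested in PIG-905: null in, null out
        }
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(input, " \",()*", false);
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }
}
```

The general pattern for UDF robustness is the same regardless of the function: check the input tuple and its fields for null before touching them, and return null rather than throwing.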
[jira] Updated: (PIG-901) InputSplit (SliceWrapper) created by Pig is big in size due to serialized PigContext
[ https://issues.apache.org/jira/browse/PIG-901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-901: --- Attachment: PIG-901-trunk.patch InputSplit (SliceWrapper) created by Pig is big in size due to serialized PigContext Key: PIG-901 URL: https://issues.apache.org/jira/browse/PIG-901 Project: Pig Issue Type: Bug Affects Versions: 0.3.1 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.4.0 Attachments: PIG-901-1.patch, PIG-901-branch-0.3.patch, PIG-901-trunk.patch InputSplit (SliceWrapper) created by Pig is big in size due to serialized PigContext. SliceWrapper only needs ExecType - so the entire PigContext should not be serialized and only the ExecType should be serialized. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-901) InputSplit (SliceWrapper) created by Pig is big in size due to serialized PigContext
[ https://issues.apache.org/jira/browse/PIG-901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-901: --- Status: Patch Available (was: Open) PIG-901-trunk.patch is for the trunk. The change is in SliceWrapper to serialize only the ExecType instead of the PigContext, since only the ExecType from the PigContext is used on deserialization. The package import list which Daniel referred to is a static member of PigContext which is explicitly set in SliceWrapper.makeRecordReader() and hence is taken care of. It is a good suggestion to include a test case to check that even with a sizeable PigContext we actually create small input splits. However, to do this in the current Pig code layout means opening up PigServer and JobControlCompiler so that we can compile a pig script up to job creation and then, instead of submitting the job to hadoop, instantiate PigInputFormat with the jobConf and get the input splits. This may require some design changes which we should address at some point for these kinds of tests. For now there is a regression test in the patch to ensure the package import list is correctly handled, and we have manually tested to ensure the split size is small (on the order of KBs). InputSplit (SliceWrapper) created by Pig is big in size due to serialized PigContext Key: PIG-901 URL: https://issues.apache.org/jira/browse/PIG-901 Project: Pig Issue Type: Bug Affects Versions: 0.3.1 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.4.0 Attachments: PIG-901-1.patch, PIG-901-branch-0.3.patch, PIG-901-trunk.patch InputSplit (SliceWrapper) created by Pig is big in size due to serialized PigContext. SliceWrapper only needs ExecType - so the entire PigContext should not be serialized and only the ExecType should be serialized. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
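The size difference is easy to demonstrate with plain Java serialization. This standalone sketch uses stand-ins (a hypothetical BigContext for PigContext, an enum for ExecType); it is illustrative only, not Pig's code.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Stand-ins: BigContext plays the role of PigContext, the enum plays ExecType.
public class SplitSizeSketch {
    enum ExecType { LOCAL, MAPREDUCE }

    static class BigContext implements Serializable {
        ExecType execType = ExecType.MAPREDUCE;
        byte[] ballast = new byte[64 * 1024];   // stands in for settings, import lists, ...
    }

    // Size of an object's default Java serialization, in bytes.
    static int serializedSize(Serializable obj) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(obj);
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
        return bos.size();
    }

    public static void main(String[] args) {
        BigContext ctx = new BigContext();
        // Serializing only the field the split actually needs keeps it small.
        System.out.println("whole context: " + serializedSize(ctx) + " bytes");
        System.out.println("execType only: " + serializedSize(ctx.execType) + " bytes");
    }
}
```

Since every InputSplit is shipped to the cluster, anything serialized per split is multiplied by the number of splits, which is why trimming the split payload to the one field used on deserialization matters.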
[jira] Commented: (PIG-901) InputSplit (SliceWrapper) created by Pig is big in size due to serialized PigContext
[ https://issues.apache.org/jira/browse/PIG-901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12739048#action_12739048 ] Arun C Murthy commented on PIG-901: --- bq. This may require some design changes which we should address at some point for these kinds of tests. Could you please track this with a new jira? Thanks! InputSplit (SliceWrapper) created by Pig is big in size due to serialized PigContext Key: PIG-901 URL: https://issues.apache.org/jira/browse/PIG-901 Project: Pig Issue Type: Bug Affects Versions: 0.3.1 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.4.0 Attachments: PIG-901-1.patch, PIG-901-branch-0.3.patch, PIG-901-trunk.patch InputSplit (SliceWrapper) created by Pig is big in size due to serialized PigContext. SliceWrapper only needs ExecType - so the entire PigContext should not be serialized and only the ExecType should be serialized. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (PIG-906) Need a way to test integration points with Hadoop from unit tests
Need a way to test integration points with Hadoop from unit tests - Key: PIG-906 URL: https://issues.apache.org/jira/browse/PIG-906 Project: Pig Issue Type: Improvement Affects Versions: 0.3.1 Reporter: Pradeep Kamath Priority: Minor Currently there is no easy mechanism for a unit test to get hold of the compiled JobConf (or Job) for a script. This may require some design changes, like having public methods in PigServer and JobControlCompiler to be able to compile a script up to launch and then get hold of the JobConf or Job to ensure things are set up right. The need for this showed up in PIG-901 as described in https://issues.apache.org/jira/browse/PIG-901?focusedCommentId=12739044&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12739044. That use case can be used as one of the requirements for the design change. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-901) InputSplit (SliceWrapper) created by Pig is big in size due to serialized PigContext
[ https://issues.apache.org/jira/browse/PIG-901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12739056#action_12739056 ] Pradeep Kamath commented on PIG-901: https://issues.apache.org/jira/browse/PIG-906 has been created to track changes to enable unit testing these types of hadoop integration scenarios. InputSplit (SliceWrapper) created by Pig is big in size due to serialized PigContext Key: PIG-901 URL: https://issues.apache.org/jira/browse/PIG-901 Project: Pig Issue Type: Bug Affects Versions: 0.3.1 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.4.0 Attachments: PIG-901-1.patch, PIG-901-branch-0.3.patch, PIG-901-trunk.patch InputSplit (SliceWrapper) created by Pig is big in size due to serialized PigContext. SliceWrapper only needs ExecType - so the entire PigContext should not be serialized and only the ExecType should be serialized. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-907) Provide multiple version of HashFNV (Piggybank)
[ https://issues.apache.org/jira/browse/PIG-907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-907: --- Attachment: PIG-907-1.patch Provide multiple version of HashFNV (Piggybank) --- Key: PIG-907 URL: https://issues.apache.org/jira/browse/PIG-907 Project: Pig Issue Type: Improvement Components: impl Affects Versions: 0.3.0 Reporter: Daniel Dai Priority: Minor Fix For: 0.4.0 Attachments: PIG-907-1.patch HashFNV takes 1 or 2 parameters. While PIG-902 is not solved, it is better to create 2 versions of HashFNV so that Pig can pick the right version and do the type cast; otherwise, the user has to do an explicit cast. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
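Until PIG-902 (variable-length UDF arguments) lands, the 1- and 2-argument forms map naturally onto two Java overloads. The sketch below implements the standard 32-bit FNV-1a hash; Pig's actual piggybank HashFNV may differ in variant and signature, so treat this as an illustration of the overload idea.

```java
import java.nio.charset.StandardCharsets;

// Standard 32-bit FNV-1a; Pig's piggybank HashFNV may use a different variant.
public class HashFnvSketch {
    private static final int FNV_OFFSET = 0x811C9DC5;  // FNV-1a 32-bit offset basis
    private static final int FNV_PRIME = 0x01000193;   // 16777619

    // 1-argument form: hash of the string.
    static long hashFnv(String s) {
        int h = FNV_OFFSET;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xFF);
            h *= FNV_PRIME;   // int overflow gives the required mod-2^32 wrap
        }
        return h & 0xFFFFFFFFL;   // return as unsigned 32-bit value
    }

    // 2-argument form: hash bucketed into [0, mod).
    static long hashFnv(String s, int mod) {
        return hashFnv(s) % mod;
    }
}
```

With two overloads, the runtime can resolve the call by argument count, which is the benefit the issue describes: the user never has to insert an explicit cast to pick a version.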
[jira] Created: (PIG-908) Need a way to correlate MR jobs with Pig statements
Need a way to correlate MR jobs with Pig statements --- Key: PIG-908 URL: https://issues.apache.org/jira/browse/PIG-908 Project: Pig Issue Type: Wish Reporter: Dmitriy V. Ryaboy Complex Pig Scripts often generate many Map-Reduce jobs, especially with the recent introduction of multi-store capabilities. For example, the first script in the Pig tutorial produces 5 MR jobs. There is currently very little support for debugging resulting jobs; if one of the MR jobs fails, it is hard to figure out which part of the script it was responsible for. Explain plans help, but even with the explain plan, a fair amount of effort (and sometimes, experimentation) is required to correlate the failing MR job with the corresponding PigLatin statements. This ticket is created to discuss approaches to alleviating this problem. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-908) Need a way to correlate MR jobs with Pig statements
[ https://issues.apache.org/jira/browse/PIG-908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739125#action_12739125 ] Dmitriy V. Ryaboy commented on PIG-908: --- An idea for something that might work (I haven't evaluated the complexity of implementing this): When LogicalOperators are created, a bit of metadata is attached to them, listing the line numbers that they come from. Multiple LOs may be created from a single line, and multiple lines may be associated with a single operator. This metadata is passed down to Physical Operators. When an MR job is created, a log message is written listing the line numbers that are associated with the POs in this map-reduce job, and the job name. Thoughts? Need a way to correlate MR jobs with Pig statements --- Key: PIG-908 URL: https://issues.apache.org/jira/browse/PIG-908 Project: Pig Issue Type: Wish Reporter: Dmitriy V. Ryaboy Complex Pig Scripts often generate many Map-Reduce jobs, especially with the recent introduction of multi-store capabilities. For example, the first script in the Pig tutorial produces 5 MR jobs. There is currently very little support for debugging resulting jobs; if one of the MR jobs fails, it is hard to figure out which part of the script it was responsible for. Explain plans help, but even with the explain plan, a fair amount of effort (and sometimes, experimentation) is required to correlate the failing MR job with the corresponding PigLatin statements. This ticket is created to discuss approaches to alleviating this problem. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
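The bookkeeping Dmitriy describes can be sketched with hypothetical names (a plain Operator class stands in for Pig's logical/physical operators): each operator carries the script line numbers it came from, and the label logged for an MR job is the union over the operators it contains.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical names; sketches the metadata flow proposed in the comment.
public class LineageSketch {
    static class Operator {
        final String name;
        final Set<Integer> scriptLines = new TreeSet<>();  // lines this operator came from
        Operator(String name, Integer... lines) {
            this.name = name;
            for (Integer l : lines) scriptLines.add(l);
        }
    }

    // When an MR job is created, log the union of its operators' line numbers.
    static Set<Integer> jobLineNumbers(List<Operator> opsInJob) {
        Set<Integer> union = new TreeSet<>();
        for (Operator op : opsInJob) union.addAll(op.scriptLines);
        return union;
    }
}
```

The many-to-many mapping noted in the comment (multiple operators per line, multiple lines per operator) is exactly why a set union per job, rather than a single line number, is the natural label.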
[jira] Commented: (PIG-908) Need a way to correlate MR jobs with Pig statements
[ https://issues.apache.org/jira/browse/PIG-908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12739147#action_12739147 ] Santhosh Srinivasan commented on PIG-908: - +1 This approach has been discussed but not documented. Need a way to correlate MR jobs with Pig statements --- Key: PIG-908 URL: https://issues.apache.org/jira/browse/PIG-908 Project: Pig Issue Type: Wish Reporter: Dmitriy V. Ryaboy Complex Pig Scripts often generate many Map-Reduce jobs, especially with the recent introduction of multi-store capabilities. For example, the first script in the Pig tutorial produces 5 MR jobs. There is currently very little support for debugging resulting jobs; if one of the MR jobs fails, it is hard to figure out which part of the script it was responsible for. Explain plans help, but even with the explain plan, a fair amount of effort (and sometimes, experimentation) is required to correlate the failing MR job with the corresponding PigLatin statements. This ticket is created to discuss approaches to alleviating this problem. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-907) Provide multiple version of HashFNV (Piggybank)
[ https://issues.apache.org/jira/browse/PIG-907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-907: --- Status: Patch Available (was: Open) Provide multiple version of HashFNV (Piggybank) --- Key: PIG-907 URL: https://issues.apache.org/jira/browse/PIG-907 Project: Pig Issue Type: Improvement Components: impl Affects Versions: 0.3.0 Reporter: Daniel Dai Priority: Minor Fix For: 0.4.0 Attachments: PIG-907-1.patch HashFNV takes 1 or 2 parameters. While PIG-902 is not solved, it is better to create 2 versions of HashFNV so that Pig can pick the right version and do the type cast; otherwise, the user has to do an explicit cast. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-907) Provide multiple version of HashFNV (Piggybank)
[ https://issues.apache.org/jira/browse/PIG-907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-907: --- Attachment: PIG-907-2.patch Changed the patch to include the license header and more robust error handling. Thanks to Thejas for pointing these out. Provide multiple version of HashFNV (Piggybank) --- Key: PIG-907 URL: https://issues.apache.org/jira/browse/PIG-907 Project: Pig Issue Type: Improvement Components: impl Affects Versions: 0.3.0 Reporter: Daniel Dai Priority: Minor Fix For: 0.4.0 Attachments: PIG-907-1.patch, PIG-907-2.patch HashFNV takes 1 or 2 parameters. While PIG-902 is not solved, it is better to create 2 versions of HashFNV so that Pig can pick the right version and do the type cast; otherwise, the user has to do an explicit cast. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-909) Allow Pig executable to use hadoop jars not bundled with pig
[ https://issues.apache.org/jira/browse/PIG-909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy V. Ryaboy updated PIG-909: -- Attachment: pig_909.patch The attached patch modifies bin/pig as described. Tested locally by setting and unsetting HADOOP_HOME and making sure the right configurations, etc, are picked up. Allow Pig executable to use hadoop jars not bundled with pig Key: PIG-909 URL: https://issues.apache.org/jira/browse/PIG-909 Project: Pig Issue Type: Improvement Reporter: Dmitriy V. Ryaboy Priority: Minor Attachments: pig_909.patch The current pig executable (bin/pig) looks for a file named hadoop${PIG_HADOOP_VERSION}.jar that comes bundled with Pig. The proposed change will allow Pig to look in $HADOOP_HOME for the hadoop jars, if that variable is set. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-909) Allow Pig executable to use hadoop jars not bundled with pig
[ https://issues.apache.org/jira/browse/PIG-909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739244#action_12739244 ] Daniel Dai commented on PIG-909: Seems like bin/pig has been broken for a while. Some libraries have been moved to build/ivy/lib/Pig, and the pig script does not take care of this correctly. Allow Pig executable to use hadoop jars not bundled with pig Key: PIG-909 URL: https://issues.apache.org/jira/browse/PIG-909 Project: Pig Issue Type: Improvement Reporter: Dmitriy V. Ryaboy Priority: Minor Attachments: pig_909.patch The current pig executable (bin/pig) looks for a file named hadoop${PIG_HADOOP_VERSION}.jar that comes bundled with Pig. The proposed change will allow Pig to look in $HADOOP_HOME for the hadoop jars, if that variable is set. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-909) Allow Pig executable to use hadoop jars not bundled with pig
[ https://issues.apache.org/jira/browse/PIG-909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy V. Ryaboy updated PIG-909: -- Attachment: pig_909.2.patch added ivy jars to classpath Allow Pig executable to use hadoop jars not bundled with pig Key: PIG-909 URL: https://issues.apache.org/jira/browse/PIG-909 Project: Pig Issue Type: Improvement Reporter: Dmitriy V. Ryaboy Priority: Minor Attachments: pig_909.2.patch, pig_909.patch The current pig executable (bin/pig) looks for a file named hadoop${PIG_HADOOP_VERSION}.jar that comes bundled with Pig. The proposed change will allow Pig to look in $HADOOP_HOME for the hadoop jars, if that variable is set. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-909) Allow Pig executable to use hadoop jars not bundled with pig
[ https://issues.apache.org/jira/browse/PIG-909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739269#action_12739269 ] Daniel Dai commented on PIG-909: Hi, Dmitriy, One problem is that the hadoop.jar that comes with pig actually bundles lots of external libraries needed by hadoop, such as log4j and commons-logging. If we skip hadoop.jar and use an external one, we miss all those libraries. Can we try this? If we have an external hadoop.jar, put it in front of pig.jar in the classpath, so Java will pick classes from the external hadoop.jar first. Allow Pig executable to use hadoop jars not bundled with pig Key: PIG-909 URL: https://issues.apache.org/jira/browse/PIG-909 Project: Pig Issue Type: Improvement Reporter: Dmitriy V. Ryaboy Priority: Minor Attachments: pig_909.2.patch, pig_909.patch The current pig executable (bin/pig) looks for a file named hadoop${PIG_HADOOP_VERSION}.jar that comes bundled with Pig. The proposed change will allow Pig to look in $HADOOP_HOME for the hadoop jars, if that variable is set. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-909) Allow Pig executable to use hadoop jars not bundled with pig
[ https://issues.apache.org/jira/browse/PIG-909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739282#action_12739282 ] Daniel Dai commented on PIG-909: Yes, Dmitriy, you said it. However, if we do not have an external hadoop, the pig script does not currently work. We need to fix it. Allow Pig executable to use hadoop jars not bundled with pig Key: PIG-909 URL: https://issues.apache.org/jira/browse/PIG-909 Project: Pig Issue Type: Improvement Reporter: Dmitriy V. Ryaboy Priority: Minor Attachments: pig_909.2.patch, pig_909.patch The current pig executable (bin/pig) looks for a file named hadoop${PIG_HADOOP_VERSION}.jar that comes bundled with Pig. The proposed change will allow Pig to look in $HADOOP_HOME for the hadoop jars, if that variable is set. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-909) Allow Pig executable to use hadoop jars not bundled with pig
[ https://issues.apache.org/jira/browse/PIG-909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12739287#action_12739287 ] Dmitriy V. Ryaboy commented on PIG-909: --- Daniel, not sure what you mean. Do you mean that the patch makes it necessary to have an external version of hadoop to build/run pig? That's not the case, as I wrapped the whole thing in an if -- external hadoop jars will only be used instead of the bundled hadoop.jar if HADOOP_HOME is defined (and valid). Allow Pig executable to use hadoop jars not bundled with pig Key: PIG-909 URL: https://issues.apache.org/jira/browse/PIG-909 Project: Pig Issue Type: Improvement Reporter: Dmitriy V. Ryaboy Priority: Minor Attachments: pig_909.2.patch, pig_909.patch The current pig executable (bin/pig) looks for a file named hadoop${PIG_HADOOP_VERSION}.jar that comes bundled with Pig. The proposed change will allow Pig to look in $HADOOP_HOME for the hadoop jars, if that variable is set. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-909) Allow Pig executable to use hadoop jars not bundled with pig
[ https://issues.apache.org/jira/browse/PIG-909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739292#action_12739292 ] Daniel Dai commented on PIG-909: Hi, Dmitriy, It is not related to the patch. What I mean is that the pig script in trunk was not working correctly even before the patch. Allow Pig executable to use hadoop jars not bundled with pig Key: PIG-909 URL: https://issues.apache.org/jira/browse/PIG-909 Project: Pig Issue Type: Improvement Reporter: Dmitriy V. Ryaboy Priority: Minor Attachments: pig_909.2.patch, pig_909.patch The current pig executable (bin/pig) looks for a file named hadoop${PIG_HADOOP_VERSION}.jar that comes bundled with Pig. The proposed change will allow Pig to look in $HADOOP_HOME for the hadoop jars, if that variable is set. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-909) Allow Pig executable to use hadoop jars not bundled with pig
[ https://issues.apache.org/jira/browse/PIG-909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739297#action_12739297 ] Dmitriy V. Ryaboy commented on PIG-909: --- Actually, I looked at build.xml for pig, and it includes the Ivy dependencies in pig.jar, which explains why this has been working for me. I'll delete the second patch -- that change is unnecessary. Allow Pig executable to use hadoop jars not bundled with pig Key: PIG-909 URL: https://issues.apache.org/jira/browse/PIG-909 Project: Pig Issue Type: Improvement Reporter: Dmitriy V. Ryaboy Priority: Minor Attachments: pig_909.patch The current pig executable (bin/pig) looks for a file named hadoop${PIG_HADOOP_VERSION}.jar that comes bundled with Pig. The proposed change will allow Pig to look in $HADOOP_HOME for the hadoop jars, if that variable is set. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-909) Allow Pig executable to use hadoop jars not bundled with pig
[ https://issues.apache.org/jira/browse/PIG-909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy V. Ryaboy updated PIG-909: -- Attachment: (was: pig_909.2.patch) Allow Pig executable to use hadoop jars not bundled with pig Key: PIG-909 URL: https://issues.apache.org/jira/browse/PIG-909 Project: Pig Issue Type: Improvement Reporter: Dmitriy V. Ryaboy Priority: Minor Attachments: pig_909.patch The current pig executable (bin/pig) looks for a file named hadoop${PIG_HADOOP_VERSION}.jar that comes bundled with Pig. The proposed change will allow Pig to look in $HADOOP_HOME for the hadoop jars, if that variable is set. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-660) Integration with Hadoop 0.20
[ https://issues.apache.org/jira/browse/PIG-660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy V. Ryaboy updated PIG-660: -- Attachment: pig_660_shims.patch The attached patch, pig_660_shims.patch, introduces a compatibility layer similar to the one in https://issues.apache.org/jira/browse/HIVE-487 . HadoopShims.java contains wrappers that hide interface differences between Hadoop 18 and 20; when an interface change affects Pig, a shim is added to this class and used by Pig. Separate versions of the shims are maintained for different Hadoop versions. This way, Pig users can compile against either Hadoop 18 or Hadoop 20 by simply changing an ant property, either via the -D flag or build.properties, instead of having to go through the process of patching. There has been discussion of officially moving Pig to 0.20; this way, we sidestep the whole question and only need to worry about version compatibility when using specific Hadoop APIs. I propose that we use this mechanism until Pig is moved to use the new, future-proofed API. Pig compiled against 18 won't be able to use some of the newest features, such as Zebra storage. Ant can be configured not to build it if the Hadoop version is not 20. Integration with Hadoop 0.20 Key: PIG-660 URL: https://issues.apache.org/jira/browse/PIG-660 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.2.0 Environment: Hadoop 0.20 Reporter: Santhosh Srinivasan Assignee: Santhosh Srinivasan Fix For: 0.4.0 Attachments: PIG-660-for-branch-0.3.patch, PIG-660.patch, PIG-660_1.patch, PIG-660_2.patch, PIG-660_3.patch, PIG-660_4.patch, PIG-660_5.patch With Hadoop 0.20, it will be possible to query the status of each map and reduce in a map reduce job. This will allow better error reporting. Some of the other items that could be on Hadoop's feature requests/bugs are documented here for tracking. 1. Hadoop should return objects instead of strings when exceptions are thrown 2. 
The JobControl should handle all exceptions and report them appropriately. For example, when the JobControl fails to launch jobs, it should handle exceptions appropriately and should support APIs that query this state, i.e., failure to launch jobs. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
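The shim pattern referenced from HIVE-487 is simple to sketch. Everything below is hypothetical (the divergent API shown is invented for illustration): one interface method per incompatible call, one implementation per Hadoop version, and the build compiles only the matching version's source directory.

```java
// Hypothetical sketch of the shim pattern described in the patch: Pig code
// calls through one interface and never sees the Hadoop version split.
public class ShimSketch {
    interface HadoopShims {
        String taskLogUrl(String host, String taskId);   // an invented divergent API
    }

    // Would live under a hadoop18-only source directory in a real layout.
    static class Hadoop18Shims implements HadoopShims {
        public String taskLogUrl(String host, String taskId) {
            return "http://" + host + "/tasklog?taskid=" + taskId;
        }
    }

    // Would live under a hadoop20-only source directory.
    static class Hadoop20Shims implements HadoopShims {
        public String taskLogUrl(String host, String taskId) {
            return "http://" + host + "/tasklog?attemptid=" + taskId;
        }
    }

    // Caller-side code is version-agnostic.
    static String describe(HadoopShims shims) {
        return shims.taskLogUrl("tracker:50060", "task_0001");
    }
}
```

Because only one implementation is compiled into a given build, the version choice collapses to the ant property the patch adds, with no runtime dispatch cost.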
[jira] Commented: (PIG-905) TOKENIZE throws exception on null data
[ https://issues.apache.org/jira/browse/PIG-905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739320#action_12739320 ] Jeff Zhang commented on PIG-905: I find that TOKENIZE cannot handle DataByteArray; it can only handle String. I believe it would be better to handle both DataByteArray and String. In my opinion, whenever a UDF supports one of them, it should support both, because they are almost the same except that DataByteArray is Comparable and Serializable. TOKENIZE throws exception on null data -- Key: PIG-905 URL: https://issues.apache.org/jira/browse/PIG-905 Project: Pig Issue Type: Bug Reporter: Olga Natkovich it should just return null -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
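The dispatch Jeff suggests can be sketched as input normalization; byte[] stands in for Pig's DataByteArray here, and the UTF-8 conversion is an assumption about how a real UDF would interpret the bytes.

```java
import java.nio.charset.StandardCharsets;

// byte[] stands in for Pig's DataByteArray; a UDF accepting either type can
// normalize its input to a String up front, then share one code path.
public class InputNormalizeSketch {
    static String asString(Object field) {
        if (field == null) return null;                  // null in, null out
        if (field instanceof String) return (String) field;
        if (field instanceof byte[]) {
            return new String((byte[]) field, StandardCharsets.UTF_8);
        }
        throw new IllegalArgumentException(
            "unsupported type: " + field.getClass().getName());
    }
}
```

Normalizing once at the top of exec() keeps the String and bytearray cases from diverging, which is the "support one, support both" principle in the comment.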