[jira] Commented: (PIG-882) log level not propogated to loggers

2009-08-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12738972#action_12738972
 ] 

Hudson commented on PIG-882:


Integrated in Pig-trunk #512 (See 
[http://hudson.zones.apache.org/hudson/job/Pig-trunk/512/])
: log level not propogated to loggers


 log level not propogated to loggers 
 

 Key: PIG-882
 URL: https://issues.apache.org/jira/browse/PIG-882
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.3.0
Reporter: Thejas M Nair
Assignee: Daniel Dai
 Fix For: 0.4.0

 Attachments: PIG-882-1.patch, PIG-882-2.patch, PIG-882-3.patch, 
 PIG-882-4.patch, PIG-882-5.patch


 Pig accepts a log level as a parameter. But the level it captures is not 
 applied, so loggers in different classes do not log at the specified 
 level.
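The mechanism at stake, propagating a single requested level so that loggers created in other classes honor it, can be sketched with plain java.util.logging (package names here are hypothetical; Pig itself uses commons-logging/log4j):

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class LogLevelDemo {
    // Hold a strong reference: java.util.logging keeps loggers weakly,
    // so the configured parent must not be garbage collected.
    static final Logger PARENT = Logger.getLogger("org.example.pig");

    // Setting the level on a common parent logger is what makes loggers
    // created in other classes (children in the dotted namespace) honor it.
    static void applyLevel(Level level) {
        PARENT.setLevel(level);
    }

    public static void main(String[] args) {
        applyLevel(Level.FINE);
        Logger child = Logger.getLogger("org.example.pig.SomeClass");
        // The child has no explicit level, so its effective level
        // is inherited from the parent.
        System.out.println(child.isLoggable(Level.FINE)); // true
    }
}
```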

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



RE: Is it possible to access Configuration in UDF ?

2009-08-04 Thread Olga Natkovich
At the moment we can't make UDFs dependent on Hadoop, as people also use
them for testing in local mode, which is currently not based on Hadoop
local mode due to performance constraints.

I agree that we need to provide a way to get a
configuration/property object to UDFs.

Olga

-Original Message-
From: Daniel Dai [mailto:dai...@gmail.com] 
Sent: Monday, August 03, 2009 9:20 PM
To: pig-dev@hadoop.apache.org; pig-u...@hadoop.apache.org
Subject: Re: Is it possible to access Configuration in UDF ?

Hi, Jeff,
This is not an API at all; it is a hack to make things work. We do lack 
a couple of features for UDFs:
1. reporter and counter (PIG-889)
2. access to global properties
3. ability to maintain state across different UDF invocations
4. input schema
5. variable-length arguments (PIG-902)

Your suggestion sounds reasonable. We need to provide a well-designed 
interface for these features.
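For context, the base-class idea under discussion can be sketched without any Hadoop dependency. All names below are hypothetical stand-ins, not Pig's actual classes:

```java
import java.util.Properties;

public class UdfConfigDemo {
    // Stand-in for the globally visible job configuration (hypothetical,
    // analogous in spirit to PigMapReduce.sJobConf).
    static final Properties JOB_CONF = new Properties();

    // Base class: every UDF sees the configuration through one field.
    static abstract class EvalFuncBase<T> {
        protected final Properties conf = JOB_CONF;
        abstract T exec(String input);
    }

    // A UDF customized via a property rather than a constructor argument.
    static class SuffixFunc extends EvalFuncBase<String> {
        String exec(String input) {
            return input + conf.getProperty("suffix", "");
        }
    }

    public static void main(String[] args) {
        JOB_CONF.setProperty("suffix", "!");
        System.out.println(new SuffixFunc().exec("pig")); // pig!
    }
}
```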

- Original Message - 
From: zhang jianfeng zjf...@gmail.com
To: pig-u...@hadoop.apache.org; pig-dev@hadoop.apache.org
Sent: Monday, August 03, 2009 8:03 PM
Subject: Re: Is it possible to access Configuration in UDF ?


 Dmitriy,

 Thank you for your help.

 I find this way of using the API not very intuitive; I recommend that the
base
 class of UDF implement the Configurable interface.
 Then each UDF can use getConf() to get the Configuration object.
 Because a UDF is part of MapReduce, it makes sense to make it
Configurable.

 The following is the change I recommend to EvalFunc:

 public abstract class EvalFunc<T> implements Configurable {
     ..
     protected Configuration conf;
     ..
     public EvalFunc() {
         conf = PigMapReduce.sJobConf;
     }
     ..
     @Override
     public void setConf(Configuration conf) {
         this.conf = conf;
     }

     @Override
     public Configuration getConf() {
         return this.conf;
     }
 }




 Jeff Zhang





 On Mon, Aug 3, 2009 at 8:52 PM, Dmitriy Ryaboy 
 dvrya...@cloudera.com wrote:

 You can access the JobConf with the following call:

 ConfigurationUtil.toProperties(PigMapReduce.sJobConf)

 On Mon, Aug 3, 2009 at 12:40 AM, zhang jianfeng zjf...@gmail.com
wrote:
  Hi all,
 
  I'd like to set property in Configuration to customize my UDF. But
it
 looks
  like I can not access the Configuration object in UDF.
 
  Does pig have a plan to support this feature ?
 
 
  Thank you.
 
  Jeff Zhang
 

 



[jira] Created: (PIG-905) TOKENIZE throws exception on null data

2009-08-04 Thread Olga Natkovich (JIRA)
TOKENIZE throws exception on null data
--

 Key: PIG-905
 URL: https://issues.apache.org/jira/browse/PIG-905
 Project: Pig
  Issue Type: Bug
Reporter: Olga Natkovich


it should just return null




[jira] Updated: (PIG-901) InputSplit (SliceWrapper) created by Pig is big in size due to serialized PigContext

2009-08-04 Thread Pradeep Kamath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pradeep Kamath updated PIG-901:
---

Attachment: PIG-901-trunk.patch

 InputSplit (SliceWrapper) created by Pig is big in size due to serialized 
 PigContext
 

 Key: PIG-901
 URL: https://issues.apache.org/jira/browse/PIG-901
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.3.1
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
 Fix For: 0.4.0

 Attachments: PIG-901-1.patch, PIG-901-branch-0.3.patch, 
 PIG-901-trunk.patch


 InputSplit (SliceWrapper) created by Pig is big in size due to serialized 
 PigContext. SliceWrapper only needs ExecType - so the entire PigContext 
 should not be serialized and only the ExecType should be serialized.




[jira] Updated: (PIG-901) InputSplit (SliceWrapper) created by Pig is big in size due to serialized PigContext

2009-08-04 Thread Pradeep Kamath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pradeep Kamath updated PIG-901:
---

Status: Patch Available  (was: Open)

PIG-901-trunk.patch is for the trunk. The change is in SliceWrapper to 
serialize ExecType only instead of PigContext since only the ExecType from the 
PigContext is used on deserialization. The package import list which Daniel 
referred to is a static member of PigContext which is explicitly set in 
SliceWrapper.makeRecordReader() and hence is taken care of.

It is a good suggestion to include a test case to check that even with a 
sizeable PigContext we actually create small input splits. However, doing this 
in the current Pig code layout means opening up PigServer and 
JobControlCompiler so that we can compile a pig script up to job creation and 
then, instead of submitting the job to hadoop, instantiate PigInputFormat with 
the jobConf and get the input splits. This may require some design changes 
which we should address at some point for these kinds of tests. For now there 
is a regression test in the patch to ensure the package import list is correctly 
handled, and we have manually tested to ensure the split size is small (on the 
order of KBs).
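The size argument can be illustrated with a self-contained Java serialization sketch. The class names are hypothetical stand-ins for PigContext and ExecType, not Pig's actual types:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class SplitSizeDemo {
    enum ExecMode { LOCAL, MAPREDUCE }   // stand-in for ExecType

    // Stand-in for a heavyweight context object (hypothetical payload).
    static class BigContext implements Serializable {
        byte[] payload = new byte[64 * 1024]; // imagine packages, scripts, etc.
        ExecMode mode = ExecMode.MAPREDUCE;
    }

    static byte[] serialize(Serializable o) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(o);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) {
        // Serializing the whole context drags the payload into every split;
        // serializing just the enum costs on the order of a hundred bytes.
        System.out.println(serialize(new BigContext()).length > 64 * 1024); // true
        System.out.println(serialize(ExecMode.MAPREDUCE).length < 200);     // true
    }
}
```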

 InputSplit (SliceWrapper) created by Pig is big in size due to serialized 
 PigContext
 

 Key: PIG-901
 URL: https://issues.apache.org/jira/browse/PIG-901
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.3.1
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
 Fix For: 0.4.0

 Attachments: PIG-901-1.patch, PIG-901-branch-0.3.patch, 
 PIG-901-trunk.patch


 InputSplit (SliceWrapper) created by Pig is big in size due to serialized 
 PigContext. SliceWrapper only needs ExecType - so the entire PigContext 
 should not be serialized and only the ExecType should be serialized.




[jira] Commented: (PIG-901) InputSplit (SliceWrapper) created by Pig is big in size due to serialized PigContext

2009-08-04 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739048#action_12739048
 ] 

Arun C Murthy commented on PIG-901:
---

bq. This may require some design changes which we should address at some point 
for these kinds of tests.

Could you please track this with a new jira? Thanks!

 InputSplit (SliceWrapper) created by Pig is big in size due to serialized 
 PigContext
 

 Key: PIG-901
 URL: https://issues.apache.org/jira/browse/PIG-901
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.3.1
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
 Fix For: 0.4.0

 Attachments: PIG-901-1.patch, PIG-901-branch-0.3.patch, 
 PIG-901-trunk.patch


 InputSplit (SliceWrapper) created by Pig is big in size due to serialized 
 PigContext. SliceWrapper only needs ExecType - so the entire PigContext 
 should not be serialized and only the ExecType should be serialized.




[jira] Created: (PIG-906) Need a way to test integration points with Hadoop from unit tests

2009-08-04 Thread Pradeep Kamath (JIRA)
Need a way to test integration points with Hadoop from unit tests
-

 Key: PIG-906
 URL: https://issues.apache.org/jira/browse/PIG-906
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.3.1
Reporter: Pradeep Kamath
Priority: Minor


Currently there is no easy mechanism for a unit test to get hold of the 
compiled JobConf (or Job) for a script. This may require some design changes, 
like having public methods in PigServer and JobControlCompiler, to be able to 
compile a script up to launch and then get hold of the JobConf or Job to ensure 
things are set up right. The need for this showed up in PIG-901 as described in 
https://issues.apache.org/jira/browse/PIG-901?focusedCommentId=12739044&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12739044.
 That use case can be used as one of the requirements for the design change.




[jira] Commented: (PIG-901) InputSplit (SliceWrapper) created by Pig is big in size due to serialized PigContext

2009-08-04 Thread Pradeep Kamath (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739056#action_12739056
 ] 

Pradeep Kamath commented on PIG-901:


https://issues.apache.org/jira/browse/PIG-906 has been created to track changes 
to enable unit testing these types of hadoop integration scenarios.

 InputSplit (SliceWrapper) created by Pig is big in size due to serialized 
 PigContext
 

 Key: PIG-901
 URL: https://issues.apache.org/jira/browse/PIG-901
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.3.1
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
 Fix For: 0.4.0

 Attachments: PIG-901-1.patch, PIG-901-branch-0.3.patch, 
 PIG-901-trunk.patch


 InputSplit (SliceWrapper) created by Pig is big in size due to serialized 
 PigContext. SliceWrapper only needs ExecType - so the entire PigContext 
 should not be serialized and only the ExecType should be serialized.




[jira] Updated: (PIG-907) Provide multiple version of HashFNV (Piggybank)

2009-08-04 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-907:
---

Attachment: PIG-907-1.patch

 Provide multiple version of HashFNV (Piggybank)
 ---

 Key: PIG-907
 URL: https://issues.apache.org/jira/browse/PIG-907
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.3.0
Reporter: Daniel Dai
Priority: Minor
 Fix For: 0.4.0

 Attachments: PIG-907-1.patch


 HashFNV takes 1 or 2 parameters. While PIG-902 is not solved, it is better to 
 create 2 versions of HashFNV so that Pig can pick the right version and do 
 the type cast. Otherwise, users have to do an explicit cast. 
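For reference, the hash family in question can be sketched as a generic 32-bit FNV-1a. This is a sketch with the standard published FNV constants; the Piggybank HashFNV UDF may use a different FNV variant or bit width:

```java
public class HashFnv {
    // Generic 32-bit FNV-1a over a byte sequence.
    static int fnv1a32(byte[] data) {
        int h = 0x811c9dc5;          // FNV-1a 32-bit offset basis
        for (byte b : data) {
            h ^= (b & 0xff);         // xor in the next octet
            h *= 0x01000193;         // multiply by the 32-bit FNV prime
        }
        return h;
    }

    public static void main(String[] args) {
        // Standard FNV-1a test vector: "a" hashes to e40c292c.
        System.out.printf("%08x%n", fnv1a32("a".getBytes()));
    }
}
```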




[jira] Created: (PIG-908) Need a way to correlate MR jobs with Pig statements

2009-08-04 Thread Dmitriy V. Ryaboy (JIRA)
Need a way to correlate MR jobs with Pig statements
---

 Key: PIG-908
 URL: https://issues.apache.org/jira/browse/PIG-908
 Project: Pig
  Issue Type: Wish
Reporter: Dmitriy V. Ryaboy


Complex Pig Scripts often generate many Map-Reduce jobs, especially with the 
recent introduction of multi-store capabilities.
For example, the first script in the Pig tutorial produces 5 MR jobs.

There is currently very little support for debugging resulting jobs; if one of 
the MR jobs fails, it is hard to figure out which part of the script it was 
responsible for. Explain plans help, but even with the explain plan, a fair 
amount of effort (and sometimes, experimentation) is required to correlate the 
failing MR job with the corresponding PigLatin statements.

This ticket is created to discuss approaches to alleviating this problem.




[jira] Commented: (PIG-908) Need a way to correlate MR jobs with Pig statements

2009-08-04 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739125#action_12739125
 ] 

Dmitriy V. Ryaboy commented on PIG-908:
---

An idea for something that might work (I haven't evaluated the complexity of 
implementing it):

When LogicalOperators are created, a bit of metadata is attached to them, 
listing the line numbers they come from.  Multiple LOs may be created from 
a single line, and multiple lines may be associated with a single operator. 

This metadata is passed down to Physical Operators.

When an MR job is created, a log message is written listing the line numbers 
that are associated with the POs in this map-reduce job, and the job name.
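The proposal can be sketched in miniature with plain Java (all names are hypothetical, not Pig's operator classes):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class LineageDemo {
    // Each operator carries the script line numbers it came from.
    static class Op {
        final Set<Integer> lines = new TreeSet<>();
        Op(Integer... ls) { lines.addAll(Arrays.asList(ls)); }
    }

    // A job logs the union of the lines of the operators it contains.
    static Set<Integer> jobLines(List<Op> ops) {
        Set<Integer> all = new TreeSet<>();
        for (Op op : ops) all.addAll(op.lines);
        return all;
    }

    public static void main(String[] args) {
        List<Op> mrJob = Arrays.asList(new Op(3), new Op(3, 4), new Op(7));
        System.out.println("job_1 built from script lines " + jobLines(mrJob));
        // prints: job_1 built from script lines [3, 4, 7]
    }
}
```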

Thoughts?

 Need a way to correlate MR jobs with Pig statements
 ---

 Key: PIG-908
 URL: https://issues.apache.org/jira/browse/PIG-908
 Project: Pig
  Issue Type: Wish
Reporter: Dmitriy V. Ryaboy

 Complex Pig Scripts often generate many Map-Reduce jobs, especially with the 
 recent introduction of multi-store capabilities.
 For example, the first script in the Pig tutorial produces 5 MR jobs.
 There is currently very little support for debugging resulting jobs; if one 
 of the MR jobs fails, it is hard to figure out which part of the script it 
 was responsible for. Explain plans help, but even with the explain plan, a 
 fair amount of effort (and sometimes, experimentation) is required to 
 correlate the failing MR job with the corresponding PigLatin statements.
 This ticket is created to discuss approaches to alleviating this problem.




[jira] Commented: (PIG-908) Need a way to correlate MR jobs with Pig statements

2009-08-04 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739147#action_12739147
 ] 

Santhosh Srinivasan commented on PIG-908:
-

+1

This approach has been discussed but not documented.

 Need a way to correlate MR jobs with Pig statements
 ---

 Key: PIG-908
 URL: https://issues.apache.org/jira/browse/PIG-908
 Project: Pig
  Issue Type: Wish
Reporter: Dmitriy V. Ryaboy

 Complex Pig Scripts often generate many Map-Reduce jobs, especially with the 
 recent introduction of multi-store capabilities.
 For example, the first script in the Pig tutorial produces 5 MR jobs.
 There is currently very little support for debugging resulting jobs; if one 
 of the MR jobs fails, it is hard to figure out which part of the script it 
 was responsible for. Explain plans help, but even with the explain plan, a 
 fair amount of effort (and sometimes, experimentation) is required to 
 correlate the failing MR job with the corresponding PigLatin statements.
 This ticket is created to discuss approaches to alleviating this problem.




[jira] Updated: (PIG-907) Provide multiple version of HashFNV (Piggybank)

2009-08-04 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-907:
---

Status: Patch Available  (was: Open)

 Provide multiple version of HashFNV (Piggybank)
 ---

 Key: PIG-907
 URL: https://issues.apache.org/jira/browse/PIG-907
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.3.0
Reporter: Daniel Dai
Priority: Minor
 Fix For: 0.4.0

 Attachments: PIG-907-1.patch


 HashFNV takes 1 or 2 parameters. While PIG-902 is not solved, it is better to 
 create 2 versions of HashFNV so that Pig can pick the right version and do 
 the type cast. Otherwise, users have to do an explicit cast. 




[jira] Updated: (PIG-907) Provide multiple version of HashFNV (Piggybank)

2009-08-04 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-907:
---

Attachment: PIG-907-2.patch

Changed the patch to include the license header and more robust error handling. 
Thanks to Thejas for pointing this out.

 Provide multiple version of HashFNV (Piggybank)
 ---

 Key: PIG-907
 URL: https://issues.apache.org/jira/browse/PIG-907
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.3.0
Reporter: Daniel Dai
Priority: Minor
 Fix For: 0.4.0

 Attachments: PIG-907-1.patch, PIG-907-2.patch


 HashFNV takes 1 or 2 parameters. While PIG-902 is not solved, it is better to 
 create 2 versions of HashFNV so that Pig can pick the right version and do 
 the type cast. Otherwise, users have to do an explicit cast. 




[jira] Updated: (PIG-909) Allow Pig executable to use hadoop jars not bundled with pig

2009-08-04 Thread Dmitriy V. Ryaboy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy V. Ryaboy updated PIG-909:
--

Attachment: pig_909.patch

The attached patch modifies bin/pig as described.

Tested locally by setting and unsetting HADOOP_HOME and making sure the right 
configurations, etc, are picked up.

 Allow Pig executable to use hadoop jars not bundled with pig
 

 Key: PIG-909
 URL: https://issues.apache.org/jira/browse/PIG-909
 Project: Pig
  Issue Type: Improvement
Reporter: Dmitriy V. Ryaboy
Priority: Minor
 Attachments: pig_909.patch


 The current pig executable (bin/pig) looks for a file named 
 hadoop${PIG_HADOOP_VERSION}.jar that comes bundled with Pig.
 The proposed change will allow Pig to look in $HADOOP_HOME for the hadoop 
 jars, if that variable is set.




[jira] Commented: (PIG-909) Allow Pig executable to use hadoop jars not bundled with pig

2009-08-04 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739244#action_12739244
 ] 

Daniel Dai commented on PIG-909:


It seems bin/pig has been broken for a while. Some libraries have been moved to 
build/ivy/lib/Pig, and the pig script does not handle this correctly.

 Allow Pig executable to use hadoop jars not bundled with pig
 

 Key: PIG-909
 URL: https://issues.apache.org/jira/browse/PIG-909
 Project: Pig
  Issue Type: Improvement
Reporter: Dmitriy V. Ryaboy
Priority: Minor
 Attachments: pig_909.patch


 The current pig executable (bin/pig) looks for a file named 
 hadoop${PIG_HADOOP_VERSION}.jar that comes bundled with Pig.
 The proposed change will allow Pig to look in $HADOOP_HOME for the hadoop 
 jars, if that variable is set.




[jira] Updated: (PIG-909) Allow Pig executable to use hadoop jars not bundled with pig

2009-08-04 Thread Dmitriy V. Ryaboy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy V. Ryaboy updated PIG-909:
--

Attachment: pig_909.2.patch

added ivy jars to classpath

 Allow Pig executable to use hadoop jars not bundled with pig
 

 Key: PIG-909
 URL: https://issues.apache.org/jira/browse/PIG-909
 Project: Pig
  Issue Type: Improvement
Reporter: Dmitriy V. Ryaboy
Priority: Minor
 Attachments: pig_909.2.patch, pig_909.patch


 The current pig executable (bin/pig) looks for a file named 
 hadoop${PIG_HADOOP_VERSION}.jar that comes bundled with Pig.
 The proposed change will allow Pig to look in $HADOOP_HOME for the hadoop 
 jars, if that variable is set.




[jira] Commented: (PIG-909) Allow Pig executable to use hadoop jars not bundled with pig

2009-08-04 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739269#action_12739269
 ] 

Daniel Dai commented on PIG-909:


Hi, Dmitriy,
One problem is that the hadoop.jar that comes with pig actually bundles lots of 
external libraries needed by hadoop, such as log4j and commons-logging. If we 
skip hadoop.jar and use an external one, we miss all those libraries. Can we 
try this: if we have an external hadoop.jar, put it in front of pig.jar in the 
classpath, so java will pick up classes in the external hadoop.jar first.

 Allow Pig executable to use hadoop jars not bundled with pig
 

 Key: PIG-909
 URL: https://issues.apache.org/jira/browse/PIG-909
 Project: Pig
  Issue Type: Improvement
Reporter: Dmitriy V. Ryaboy
Priority: Minor
 Attachments: pig_909.2.patch, pig_909.patch


 The current pig executable (bin/pig) looks for a file named 
 hadoop${PIG_HADOOP_VERSION}.jar that comes bundled with Pig.
 The proposed change will allow Pig to look in $HADOOP_HOME for the hadoop 
 jars, if that variable is set.




[jira] Commented: (PIG-909) Allow Pig executable to use hadoop jars not bundled with pig

2009-08-04 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739282#action_12739282
 ] 

Daniel Dai commented on PIG-909:


Yes, Dmitriy, you said it. However, if we do not have an external hadoop, the 
pig script currently does not work. We need to fix it.

 Allow Pig executable to use hadoop jars not bundled with pig
 

 Key: PIG-909
 URL: https://issues.apache.org/jira/browse/PIG-909
 Project: Pig
  Issue Type: Improvement
Reporter: Dmitriy V. Ryaboy
Priority: Minor
 Attachments: pig_909.2.patch, pig_909.patch


 The current pig executable (bin/pig) looks for a file named 
 hadoop${PIG_HADOOP_VERSION}.jar that comes bundled with Pig.
 The proposed change will allow Pig to look in $HADOOP_HOME for the hadoop 
 jars, if that variable is set.




[jira] Commented: (PIG-909) Allow Pig executable to use hadoop jars not bundled with pig

2009-08-04 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739287#action_12739287
 ] 

Dmitriy V. Ryaboy commented on PIG-909:
---

Daniel, not sure what you mean.
Do you mean that the patch makes it necessary to have an external version of 
hadoop to build/run pig?
That's not the case, as I wrapped the whole thing in an if -- external hadoop 
jars will only be used instead of the bundled hadoop.jar if HADOOP_HOME is 
defined (and valid).

 Allow Pig executable to use hadoop jars not bundled with pig
 

 Key: PIG-909
 URL: https://issues.apache.org/jira/browse/PIG-909
 Project: Pig
  Issue Type: Improvement
Reporter: Dmitriy V. Ryaboy
Priority: Minor
 Attachments: pig_909.2.patch, pig_909.patch


 The current pig executable (bin/pig) looks for a file named 
 hadoop${PIG_HADOOP_VERSION}.jar that comes bundled with Pig.
 The proposed change will allow Pig to look in $HADOOP_HOME for the hadoop 
 jars, if that variable is set.




[jira] Commented: (PIG-909) Allow Pig executable to use hadoop jars not bundled with pig

2009-08-04 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739292#action_12739292
 ] 

Daniel Dai commented on PIG-909:


Hi, Dmitriy, 
It is not related to the patch. What I mean is that the pig script in trunk is 
not working correctly even before the patch.

 Allow Pig executable to use hadoop jars not bundled with pig
 

 Key: PIG-909
 URL: https://issues.apache.org/jira/browse/PIG-909
 Project: Pig
  Issue Type: Improvement
Reporter: Dmitriy V. Ryaboy
Priority: Minor
 Attachments: pig_909.2.patch, pig_909.patch


 The current pig executable (bin/pig) looks for a file named 
 hadoop${PIG_HADOOP_VERSION}.jar that comes bundled with Pig.
 The proposed change will allow Pig to look in $HADOOP_HOME for the hadoop 
 jars, if that variable is set.




[jira] Commented: (PIG-909) Allow Pig executable to use hadoop jars not bundled with pig

2009-08-04 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739297#action_12739297
 ] 

Dmitriy V. Ryaboy commented on PIG-909:
---

Actually, I looked at build.xml for pig, and it includes the Ivy dependencies 
in pig.jar, which explains why this has been working for me.

I'll delete the second patch -- that change is unnecessary.

 Allow Pig executable to use hadoop jars not bundled with pig
 

 Key: PIG-909
 URL: https://issues.apache.org/jira/browse/PIG-909
 Project: Pig
  Issue Type: Improvement
Reporter: Dmitriy V. Ryaboy
Priority: Minor
 Attachments: pig_909.patch


 The current pig executable (bin/pig) looks for a file named 
 hadoop${PIG_HADOOP_VERSION}.jar that comes bundled with Pig.
 The proposed change will allow Pig to look in $HADOOP_HOME for the hadoop 
 jars, if that variable is set.




[jira] Updated: (PIG-909) Allow Pig executable to use hadoop jars not bundled with pig

2009-08-04 Thread Dmitriy V. Ryaboy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy V. Ryaboy updated PIG-909:
--

Attachment: (was: pig_909.2.patch)

 Allow Pig executable to use hadoop jars not bundled with pig
 

 Key: PIG-909
 URL: https://issues.apache.org/jira/browse/PIG-909
 Project: Pig
  Issue Type: Improvement
Reporter: Dmitriy V. Ryaboy
Priority: Minor
 Attachments: pig_909.patch


 The current pig executable (bin/pig) looks for a file named 
 hadoop${PIG_HADOOP_VERSION}.jar that comes bundled with Pig.
 The proposed change will allow Pig to look in $HADOOP_HOME for the hadoop 
 jars, if that variable is set.




[jira] Updated: (PIG-660) Integration with Hadoop 0.20

2009-08-04 Thread Dmitriy V. Ryaboy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy V. Ryaboy updated PIG-660:
--

Attachment: pig_660_shims.patch

Attached patch, pig_660_shims.patch, introduces a compatibility layer similar 
to that in https://issues.apache.org/jira/browse/HIVE-487 . HadoopShims.java 
contains wrappers that hide interface differences between Hadoop 18 and 20; 
when an interface change affects Pig, a shim is added into this class, and used 
by Pig.

Separate versions of the shims are maintained for different Hadoop versions.

This way, Pig users can compile against either Hadoop 18 or Hadoop 20 by simply 
changing an ant property, either via the -D flag, or build.properties, instead 
of having to go through the process of patching.

There has been discussion of officially moving Pig to 0.20; this way, we 
sidestep the whole question, and only need to worry about version compatibility 
when using specific Hadoop APIs.

I propose that we use this mechanism until Pig is moved to use the new, 
future-proofed API.  

Pig compiled against 18 won't be able to use some of the newest features, such 
as Zebra storage. Ant can be configured not to build Zebra if the Hadoop 
version is < 20.
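The shim idea can be sketched in miniature. The interface, classes, and configuration keys below are illustrative, not the actual contents of HadoopShims.java:

```java
import java.util.HashMap;
import java.util.Map;

public class ShimDemo {
    // One small interface hides version differences; each supported Hadoop
    // version gets its own implementation, selected at build time.
    interface HadoopShims {
        String jobTracker(Map<String, String> conf);
    }

    // Pre-0.20 style lookup (key name real; wiring hypothetical).
    static class Shims18 implements HadoopShims {
        public String jobTracker(Map<String, String> conf) {
            return conf.getOrDefault("mapred.job.tracker", "local");
        }
    }

    // Newer-style lookup behind the same interface (key illustrative).
    static class Shims20 implements HadoopShims {
        public String jobTracker(Map<String, String> conf) {
            return conf.getOrDefault("mapreduce.jobtracker.address", "local");
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("mapred.job.tracker", "jt:9001");
        // The rest of Pig only ever sees the interface.
        HadoopShims shims = new Shims18();
        System.out.println(shims.jobTracker(conf)); // jt:9001
    }
}
```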


 Integration with Hadoop 0.20
 

 Key: PIG-660
 URL: https://issues.apache.org/jira/browse/PIG-660
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.2.0
 Environment: Hadoop 0.20
Reporter: Santhosh Srinivasan
Assignee: Santhosh Srinivasan
 Fix For: 0.4.0

 Attachments: PIG-660-for-branch-0.3.patch, PIG-660.patch, 
 PIG-660_1.patch, PIG-660_2.patch, PIG-660_3.patch, PIG-660_4.patch, 
 PIG-660_5.patch, pig_660_shims.patch


 With Hadoop 0.20, it will be possible to query the status of each map and 
 reduce in a map reduce job. This will allow better error reporting. Some of 
 the other items that could be on Hadoop's feature requests/bugs are 
 documented here for tracking.
 1. Hadoop should return objects instead of strings when exceptions are thrown
 2. The JobControl should handle all exceptions and report them appropriately. 
 For example, when the JobControl fails to launch jobs, it should handle 
 exceptions appropriately and should support APIs that query this state, i.e., 
 failure to launch jobs.




[jira] Commented: (PIG-905) TOKENIZE throws exception on null data

2009-08-04 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739320#action_12739320
 ] 

Jeff Zhang commented on PIG-905:


I find that TOKENIZE cannot handle DataByteArray; it can only handle String. I 
believe it would be better to handle both DataByteArray and String. In my 
opinion, whenever a UDF supports one of them it should support both, because 
they are almost the same except that DataByteArray is Comparable and 
Serializable.



 TOKENIZE throws exception on null data
 --

 Key: PIG-905
 URL: https://issues.apache.org/jira/browse/PIG-905
 Project: Pig
  Issue Type: Bug
Reporter: Olga Natkovich

 it should just return null
