[jira] Assigned: (PIG-832) Make import list configurable

2009-06-18 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich reassigned PIG-832:
--

Assignee: Daniel Dai  (was: Olga Natkovich)

 Make import list configurable
 -

 Key: PIG-832
 URL: https://issues.apache.org/jira/browse/PIG-832
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.2.0
Reporter: Olga Natkovich
Assignee: Daniel Dai
 Fix For: 0.3.0


 Currently, it is hardwired in PigContext.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-832) Make import list configurable

2009-06-18 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12721353#action_12721353
 ] 

Olga Natkovich commented on PIG-832:


As part of this fix, we should also expand the default list to include piggybank 
functions.




[jira] Commented: (PIG-832) Make import list configurable

2009-06-18 Thread Milind Bhandarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12721361#action_12721361
 ] 

Milind Bhandarkar commented on PIG-832:
---

If we include the piggybank functions in the default import list, we need to 
make sure that they are compiled and tested in the default build, and that 
releases will be blocked if they fail to compile, etc. Is that the intention?




[jira] Commented: (PIG-832) Make import list configurable

2009-06-18 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12721373#action_12721373
 ] 

Olga Natkovich commented on PIG-832:


In response to Milind: I don't think we are committing to more support for 
piggybank. All this means is that if you use UDFs from piggybank, you don't need 
to use the full package name.




[jira] Commented: (PIG-832) Make import list configurable

2009-06-18 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12721387#action_12721387
 ] 

Olga Natkovich commented on PIG-832:


Milind, I am not quite sure what you are saying. We currently don't have any way 
to pass the list in. import.list does not exist in Pig as far as I know.




[jira] Commented: (PIG-832) Make import list configurable

2009-06-18 Thread Milind Bhandarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12721396#action_12721396
 ] 

Milind Bhandarkar commented on PIG-832:
---

Olga, what I am suggesting is a default import list that contains the default 
UDFs (tokenize, Max, Min, flatten), followed by the piggybank contribs. The 
same list can be added to or overridden on the command line. This has several 
advantages. Pig built-ins do not have to be reserved words, and can be 
overridden. For example, recent mails on pig-users have mentioned that 
tokenize+flatten should be a single UDF. This can be done by providing a 
flatten (which is null), and a tokenize that does tokenize+flatten, and 
existing scripts will still work. This simplifies the Pig grammar as well. Users 
can create UDF libraries, and use them with:

{code}
java -Dimport.list += `cat my-udf-lib.import`
{code}

Thoughts ?




[jira] Commented: (PIG-832) Make import list configurable

2009-06-18 Thread Milind Bhandarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12721413#action_12721413
 ] 

Milind Bhandarkar commented on PIG-832:
---

Instead of a list, if you make it a map (i.e. short name -> fully qualified 
class name), it will be much easier, as it will guarantee that each name has 
exactly one UDF class associated with it. It will also allow users to use UDFs 
that have class names which are Pig reserved words. For example, if I have an 
existing UDF with a class name such as load or store, I can still use it under 
a different name like myload, without having to rename the class.

So, I suggest:

{code}
java -jar pig.jar 
-Dimport.list+=MyLoad:com..Load,Flatten:com..Flatten,... 
{code}

If I do not specify -Dimport.list on the pig command line, then the default 
import.list is used.
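
A minimal sketch of how such a short-name-to-class map could be parsed, assuming a comma-separated list of `alias:fully.qualified.ClassName` entries. The class `UdfAliasMap`, its `parse` method, and the `com.example` package names are hypothetical illustrations and are not part of Pig.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Pig.
class UdfAliasMap {
    // Parses "Alias:fully.qualified.Class,..." into an insertion-ordered map.
    // A duplicate alias is rejected, so each short name maps to exactly one class.
    static Map<String, String> parse(String spec) {
        Map<String, String> aliases = new LinkedHashMap<>();
        if (spec == null || spec.trim().isEmpty()) {
            return aliases; // nothing specified: caller falls back to defaults
        }
        for (String entry : spec.split(",")) {
            String[] parts = entry.trim().split(":", 2);
            if (parts.length != 2 || parts[0].isEmpty() || parts[1].isEmpty()) {
                throw new IllegalArgumentException("Malformed entry: " + entry);
            }
            if (aliases.containsKey(parts[0])) {
                throw new IllegalArgumentException("Duplicate alias: " + parts[0]);
            }
            aliases.put(parts[0], parts[1]);
        }
        return aliases;
    }

    public static void main(String[] args) {
        // Example input with made-up class names.
        System.out.println(parse("MyLoad:com.example.Load,Flatten:com.example.Flatten"));
    }
}
```

Rejecting duplicate aliases is one way to get the "exactly one class per name" guarantee; an implementation could equally let later entries override earlier ones.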

Thoughts ?




[jira] Commented: (PIG-832) Make import list configurable

2009-06-18 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12721414#action_12721414
 ] 

Olga Natkovich commented on PIG-832:


Milind,

Couple of comments and clarifications:

(1) Builtin UDFs are not reserved words. (Flatten is reserved, but it is not a 
UDF.) The issue we have seen is users creating UDFs that had reserved words in 
the package name, and if the package name is registered as proposed in this 
JIRA, their problem will go away.
(2) I don't think we need to allow overwriting the defaults. We are not 
planning to expand the list beyond the default distribution (builtins + 
piggybank). The plan is to hardwire these values in the code since they are not 
likely to change.
(3) Our plan is to keep it simple and to just allow users to add packages based 
on what they use in their UDFs.




[jira] Commented: (PIG-832) Make import list configurable

2009-06-18 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12721423#action_12721423
 ] 

Olga Natkovich commented on PIG-832:


I also think you are suggesting UDF aliasing on the command line, which I am not 
sure is the right place for it. 

The scope of this work is just to make it easier for users to refer to their 
UDFs.




[VOTE] Release Pig 0.3.0 (candidate 0)

2009-06-18 Thread Olga Natkovich
Hi,
 
I created a candidate build for the Pig 0.3.0 release. The main feature of
this release is support for multiquery, which allows sharing computation
across multiple queries within the same script. We see significant
performance improvements (up to an order of magnitude) as a result of
this optimization.
 
I ran the rat report and made sure that all the source files contain
proper headers. (Not attaching the report since it caused trouble with
the last release.)
 
Keys used to sign the release candidate are at
http://svn.apache.org/viewvc/hadoop/pig/trunk/KEYS.
 
Please download and try the release candidate:
http://people.apache.org/~olga/pig-0.3.0-candidate-0/.
 
Please vote by Wednesday, June 24th.
 
Olga
 


[jira] Created: (PIG-857) Pig should implement Tool interface from Hadoop

2009-06-18 Thread Milind Bhandarkar (JIRA)
Pig should implement Tool interface from Hadoop
---

 Key: PIG-857
 URL: https://issues.apache.org/jira/browse/PIG-857
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.3.0
 Environment: All
Reporter: Milind Bhandarkar


Hadoop, Hadoop Streaming, and Hadoop Pipes all use the Tool interface, which 
provides support for parsing generic options. This has resulted in consistent 
options for all three Hadoop launch mechanisms. Pig should also implement Tool 
(or use GenericOptionsParser directly).




[jira] Commented: (PIG-832) Make import list configurable

2009-06-18 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12721456#action_12721456
 ] 

Daniel Dai commented on PIG-832:


Hi, Milind, in the use case you mentioned, the user can write his/her own 
PigStorage and put the jar in the import list. Pig will take the user-supplied 
UDF first, thus overriding the builtin PigStorage. How is this?




[jira] Updated: (PIG-734) Non-string keys in maps

2009-06-18 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-734:
---

Status: Open  (was: Patch Available)

 Non-string keys in maps
 ---

 Key: PIG-734
 URL: https://issues.apache.org/jira/browse/PIG-734
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.2.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Minor
 Fix For: 0.3.0

 Attachments: PIG-734.patch


 With the addition of types to pig, maps were changed to allow any atomic type 
 to be a key.  However, in practice we do not see people using keys other than 
 strings.  And allowing multiple types is causing us issues in serializing 
 data (we have to check what every key type is) and in the design for non-java 
 UDFs (since many scripting languages include associative arrays such as 
 Perl's hash).
 So I propose we scope back maps to only have string keys.  This would be a 
 non-compatible change.  But I am not aware of anyone using non-string keys, 
 so hopefully it would have little or no impact.
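
A minimal sketch of the serialization cost described above, assuming a made-up one-byte type tag per key. The class `MapKeySketch` and the tag values are hypothetical and do not reflect Pig's actual wire format; the point is only that arbitrary key types force a per-key type check and tag, while string-only keys do not.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical illustration; not Pig's serialization code.
class MapKeySketch {
    // Made-up type tags, for illustration only.
    static final byte TAG_STRING = 1;
    static final byte TAG_INT = 2;
    static final byte TAG_LONG = 3;

    // With arbitrary atomic keys, every key needs a type check and a tag byte.
    static byte[] writeTypedKey(Object key) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            if (key instanceof String) {
                out.writeByte(TAG_STRING);
                out.writeUTF((String) key);
            } else if (key instanceof Integer) {
                out.writeByte(TAG_INT);
                out.writeInt((Integer) key);
            } else if (key instanceof Long) {
                out.writeByte(TAG_LONG);
                out.writeLong((Long) key);
            } else {
                throw new IllegalArgumentException("Unsupported key type: " + key.getClass());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // With string-only keys, the key is written directly, with no tag.
    static byte[] writeStringKey(String key) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(key);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println("typed key bytes:  " + writeTypedKey("id").length);
        System.out.println("string key bytes: " + writeStringKey("id").length);
    }
}
```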




[jira] Updated: (PIG-734) Non-string keys in maps

2009-06-18 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-734:
---

Attachment: PIG-734_2.patch

New version of the patch, brought up to date with current trunk.

 Non-string keys in maps
 ---

 Key: PIG-734
 URL: https://issues.apache.org/jira/browse/PIG-734
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.2.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Minor
 Fix For: 0.3.0

 Attachments: PIG-734.patch, PIG-734_2.patch





[jira] Commented: (PIG-832) Make import list configurable

2009-06-18 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12721481#action_12721481
 ] 

Olga Natkovich commented on PIG-832:


Milind, we have parameter substitution for the example you are mentioning.

My proposal would be to keep this issue strictly for the packaging thing. This 
will already make a lot of people happy and users asked for just that.

We can discuss and understand more user requirements regarding aliases in a 
separate thread. 




[jira] Commented: (PIG-832) Make import list configurable

2009-06-18 Thread Milind Bhandarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12721490#action_12721490
 ] 

Milind Bhandarkar commented on PIG-832:
---

Daniel: For that to work, the user's class will have to be called PigStorage. 
Also, inserting the user's jars before the pig jar for looking up methods can 
have major unintended consequences. pig.jar should always be first in the 
classpath.

Olga: My use case cannot use parameter substitution, because the PigMix scripts 
do not specify PigStorage as, say, $storage. The solution I proposed is as simple 
to implement as Daniel's original proposal (+= is syntactic sugar; even = can 
be used to the same effect), and it fixes a specific ask, and also allows 
for extensibility. Am I missing something here?




[jira] Commented: (PIG-832) Make import list configurable

2009-06-18 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12721503#action_12721503
 ] 

Daniel Dai commented on PIG-832:


Hi, Milind,
For your first comment, yes, the user's class has to be named PigStorage. For 
your second comment, we do not put the user's jar before pig.jar. We put their 
UDF search path first. Let's say the user passes 
-Dudf.import.list=com.xxx.udf1:com.xxx.udf2. When we see an unknown UDF, we 
first search in the package com.xxx.udf1, then com.xxx.udf2, then 
org.apache.pig.builtin. We build this policy into our code. It is not the same 
as putting user.jar in front of pig.jar.
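
The search order described above can be sketched as follows. This is an illustration under assumptions, not Pig's actual lookup code: the class `UdfResolver` and its methods are hypothetical, and the import list is assumed to be colon-separated as in the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical resolver for illustration; Pig's real lookup may differ.
class UdfResolver {
    private final List<String> packages = new ArrayList<>();

    // userImportList is colon-separated; the builtin package is searched last.
    UdfResolver(String userImportList, String builtinPackage) {
        if (userImportList != null && !userImportList.isEmpty()) {
            for (String pkg : userImportList.split(":")) {
                if (!pkg.isEmpty()) {
                    packages.add(pkg);
                }
            }
        }
        packages.add(builtinPackage);
    }

    // Returns the fully qualified names in the order they would be tried.
    List<String> candidates(String udfName) {
        List<String> names = new ArrayList<>();
        for (String pkg : packages) {
            names.add(pkg + "." + udfName);
        }
        return names;
    }

    // Returns the first candidate class that loads, or null if none does.
    Class<?> resolve(String udfName) {
        for (String name : candidates(udfName)) {
            try {
                return Class.forName(name);
            } catch (ClassNotFoundException | LinkageError e) {
                // Not in this package; try the next one.
            }
        }
        return null;
    }

    public static void main(String[] args) {
        UdfResolver r = new UdfResolver("com.xxx.udf1:com.xxx.udf2", "org.apache.pig.builtin");
        System.out.println(r.candidates("PigStorage"));
    }
}
```

Because only the package prefix varies, the order of classes on the classpath is unaffected: a user class shadows a builtin only when it has the same simple name and lives in a package listed earlier.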




[jira] Commented: (PIG-832) Make import list configurable

2009-06-18 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12721506#action_12721506
 ] 

Olga Natkovich commented on PIG-832:


Milind,

The issue is not the complexity of implementation but that I am not sure we want 
to support command line aliasing, and I want to discuss and understand the use 
cases for it separately. And we can parameterize PigMix if we need to - that 
was just an example of an alternative solution for the issue you specified.

I am looking for a list of requirements - not a solution.

Another comment is that I don't think the solution you are proposing would work. 
The list is used by prepending the package name to the function name to 
see if the function exists. It does not do anything with the function name itself.




[jira] Commented: (PIG-832) Make import list configurable

2009-06-18 Thread Milind Bhandarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12721514#action_12721514
 ] 

Milind Bhandarkar commented on PIG-832:
---

Olga, specifying a list of packages as a path list will have the same issues as

{code}
import com.xyz.package.*;
{code}

in java, where it is considered to be a bad practice. So, in the solution that 
I have proposed, I am assuming the class name is specified on the commandline 
and not the package name.





[jira] Commented: (PIG-832) Make import list configurable

2009-06-18 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12721519#action_12721519
 ] 

Daniel Dai commented on PIG-832:


Hi, Milind,
If a user wrote 10 UDFs, I guess he/she is not supposed to put 10 entries on 
the command line, right?




[jira] Updated: (PIG-852) pig -version or pig -help returns exit code of 1

2009-06-18 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-852:
---

Fix Version/s: 0.3.0

 pig -version or pig -help returns exit code of 1
 

 Key: PIG-852
 URL: https://issues.apache.org/jira/browse/PIG-852
 Project: Pig
  Issue Type: Bug
  Components: grunt
Affects Versions: 0.3.0
 Environment: All
Reporter: Milind Bhandarkar
Assignee: Milind Bhandarkar
 Fix For: 0.3.0

 Attachments: rc.patch


 {code}
 java -jar pig.jar -x local [-version|-help]
 {code}
 returns an exit code of 1 to the shell.




[jira] Updated: (PIG-819) run -param -param; is a valid grunt command

2009-06-18 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-819:
---

Fix Version/s: 0.3.0

 run -param -param; is a valid grunt command
 ---

 Key: PIG-819
 URL: https://issues.apache.org/jira/browse/PIG-819
 Project: Pig
  Issue Type: Bug
  Components: grunt
Affects Versions: 0.3.0
 Environment: all
Reporter: Milind Bhandarkar
Assignee: Milind Bhandarkar
 Fix For: 0.3.0

 Attachments: invalidparam.patch


 By mistake, I typed 
 {code}
 run -param -param;
 {code}
 in grunt, and was surprised to find that it is a valid grunt command.




[jira] Updated: (PIG-818) Explain doesn't handle PODemux properly

2009-06-18 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-818:
---

Fix Version/s: 0.3.0

 Explain doesn't handle PODemux properly
 ---

 Key: PIG-818
 URL: https://issues.apache.org/jira/browse/PIG-818
 Project: Pig
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
 Fix For: 0.3.0

 Attachments: explain.patch


 The PODemux operator has nested plans but they are not expanded in the -dot 
 version of explain.
 Also, both split and demux are displayed as clusters of nodes, but it really 
 makes more sense to just show them as multi output operators.




[jira] Updated: (PIG-564) Parameter Substitution using -param option does not seem to work when parameters contain special characters such as +,=,-,?,'

2009-06-18 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-564:
---

Fix Version/s: 0.3.0

 Parameter Substitution using -param option does not seem to work when 
 parameters contain special characters such as +,=,-,?,' 
 ---

 Key: PIG-564
 URL: https://issues.apache.org/jira/browse/PIG-564
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.2.0
Reporter: Viraj Bhat
Assignee: Olga Natkovich
 Fix For: 0.3.0

 Attachments: PIG-564.patch


 Consider the following Pig script which uses parameter substitution
 {code}
 %default qual '/user/viraj'
 %default mydir 'mydir_myextraqual'
 VISIT_LOGS = load '$qual/$mydir' as (a,b,c);
 dump VISIT_LOGS;
 {code}
 If you run the script as:
 ==
 java -cp pig.jar:${HADOOP_HOME}/conf/ -Dhod.server='' org.apache.pig.Main 
 -param mydir=mydir-myextraqual mypigparamsub.pig
 ==
 You get the following error:
 ==
 2008-12-15 19:49:43,964 [main] ERROR 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
  - java.io.IOException: /user/viraj/mydir does not exist
 at 
 org.apache.pig.backend.executionengine.PigSlicer.validate(PigSlicer.java:109)
 at 
 org.apache.pig.impl.io.ValidatingInputFileSpec.validate(ValidatingInputFileSpec.java:59)
 at 
 org.apache.pig.impl.io.ValidatingInputFileSpec.init(ValidatingInputFileSpec.java:44)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:200)
 at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:742)
 at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:370)
 at 
 org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
 at 
 org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)
 at java.lang.Thread.run(Thread.java:619)
 java.io.IOException: Unable to open iterator for alias: VISIT_LOGS [Job 
 terminated with anomalous status FAILED]
 at org.apache.pig.PigServer.openIterator(PigServer.java:389)
 at 
 org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:269)
 at 
 org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:178)
 at 
 org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:84)
 at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:64)
 at org.apache.pig.Main.main(Main.java:306)
 Caused by: java.io.IOException: Job terminated with anomalous status FAILED
 ... 6 more
 ==
 Also tried using:  -param mydir='mydir\-myextraqual'
 This behavior occurs if the parameter value contains characters such as +,=, 
 ?. 
 A workaround for this behavior is using a param_file which contains 
 param_name=param_value on each line, with the param_value enclosed by 
 quotes. For example:
 mydir='mydir-myextraqual' and then running the pig script as:
 java -cp pig.jar:${HADOOP_HOME}/conf/ -Dhod.server='' org.apache.pig.Main 
 -param_file myparamfile mypigparamsub.pig
 The following issues need to be fixed:
 1) In the -param option, if the parameter value contains special characters, 
 it is truncated
 2) In param_file, if param_value contains special characters, it should be 
 enclosed in quotes
 3) If 2 is a known issue then it should be documented in 
 http://wiki.apache.org/pig/ParameterSubstitution
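
A minimal sketch of parameter handling that would not truncate such values, assuming each `-param` argument has the form `name=value` and that only the first `=` separates the two. The class `ParamSketch` and its methods are hypothetical; Pig's actual preprocessor may work differently.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

// Hypothetical illustration; not Pig's parameter substitution code.
class ParamSketch {
    // Splits "name=value" at the first '=' only, so that characters such as
    // '+', '=', '-', and '?' inside the value are preserved verbatim.
    static Map.Entry<String, String> parseParam(String arg) {
        int eq = arg.indexOf('=');
        if (eq <= 0) {
            throw new IllegalArgumentException("Expected name=value: " + arg);
        }
        String name = arg.substring(0, eq);
        String value = arg.substring(eq + 1);
        // Strip one pair of surrounding single quotes, as used in a param file.
        if (value.length() >= 2 && value.startsWith("'") && value.endsWith("'")) {
            value = value.substring(1, value.length() - 1);
        }
        return new SimpleEntry<>(name, value);
    }

    // Literal replacement of $name in the script text. This naive version
    // would also match a longer name that starts with the same prefix.
    static String substitute(String script, String name, String value) {
        return script.replace("$" + name, value);
    }

    public static void main(String[] args) {
        Map.Entry<String, String> p = parseParam("mydir=mydir-myextraqual");
        System.out.println(substitute("load '$qual/$mydir' as (a,b,c);", p.getKey(), p.getValue()));
    }
}
```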




[jira] Commented: (PIG-832) Make import list configurable

2009-06-18 Thread Milind Bhandarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12721523#action_12721523
 ] 

Milind Bhandarkar commented on PIG-832:
---

Daniel,

> Hi, Milind, If a user wrote 10 UDFs, I guess he/she does not suppose to put 
> 10 entries in the command line, right?

No, that's why I have a `cat myudflist` allowed on the command-line.






[jira] Updated: (PIG-627) PERFORMANCE: multi-query optimization

2009-06-18 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-627:
---

Fix Version/s: 0.3.0

 PERFORMANCE: multi-query optimization
 -

 Key: PIG-627
 URL: https://issues.apache.org/jira/browse/PIG-627
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.2.0
Reporter: Olga Natkovich
 Fix For: 0.3.0

 Attachments: doc-fix.patch, error_handling_0415.patch, 
 error_handling_0416.patch, file_cmds-0305.patch, fix_store_prob.patch, 
 merge-041409.patch, merge_741727_HEAD__0324.patch, 
 merge_741727_HEAD__0324_2.patch, merge_trunk_to_branch.patch, 
 multi-store-0303.patch, multi-store-0304.patch, multiquery-phase2_0313.patch, 
 multiquery-phase2_0323.patch, multiquery-phase3_0423.patch, 
 multiquery_0223.patch, multiquery_0224.patch, multiquery_0306.patch, 
 multiquery_explain_fix.patch, non_reversible_store_load_dependencies.patch, 
 non_reversible_store_load_dependencies_2.patch, 
 noop_filter_absolute_path_flag.patch, 
 noop_filter_absolute_path_flag_0401.patch, streaming-fix.patch


 Currently, if your Pig script contains multiple stores and some shared 
 computation, Pig will execute several independent queries. For instance:
 A = load 'data' as (a, b, c);
 B = filter A by a > 5;
 store B into 'output1';
 C = group B by b;
 store C into 'output2';
 This script will result in a map-only job that generates output1, followed by 
 a map-reduce job that generates output2. As a result, data is read, parsed, and 
 filtered twice, which is unnecessary and costly. 




[jira] Updated: (PIG-850) Dump produce wrong result while store into is ok

2009-06-18 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-850:
---

Fix Version/s: (was: 0.3.0)
   0.4.0

 Dump produce wrong result while store into is ok
 --

 Key: PIG-850
 URL: https://issues.apache.org/jira/browse/PIG-850
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.2.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.4.0

 Attachments: PIG-850.patch


 The following script will wrongly produce 20 output records; however, if we 
 change dump to store into, the result is correct. Not sure if the problem is 
 only for the limited sort case.
 A = load '/user/pig/tests/data/singlefile/studenttab10k' as (name, age, gpa);
 B = order A by gpa parallel 2;
 C = limit B 10;
 dump C;




[jira] Updated: (PIG-852) pig -version or pig -help returns exit code of 1

2009-06-18 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-852:
---

Fix Version/s: (was: 0.3.0)
   0.4.0

 pig -version or pig -help returns exit code of 1
 

 Key: PIG-852
 URL: https://issues.apache.org/jira/browse/PIG-852
 Project: Pig
  Issue Type: Bug
  Components: grunt
Affects Versions: 0.3.0
 Environment: All
Reporter: Milind Bhandarkar
Assignee: Milind Bhandarkar
 Fix For: 0.4.0

 Attachments: rc.patch





[jira] Updated: (PIG-849) Local engine loses records in splits

2009-06-18 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-849:
---

Fix Version/s: 0.4.0  (was: 0.3.0)

 Local engine loses records in splits
 

 Key: PIG-849
 URL: https://issues.apache.org/jira/browse/PIG-849
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.2.0
Reporter: Gunther Hagleitner
 Fix For: 0.4.0

 Attachments: local_engine.patch, local_engine.patch


 When there is a split in the physical plan, records can be dropped in certain 
 circumstances.
 The local split operator puts all records in a databag and turns over 
 iterators to the POSplitOutput operators. The problem is that the local split 
 also adds STATUS_NULL records to the bag. That will cause the databag's 
 iterator to prematurely return false on the hasNext call (so a STATUS_NULL 
 becomes a STATUS_EOP in the split output operators).
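
The failure mode described above can be sketched with a simplified model. This is an illustration only: the status constants mirror the names used in the description, but the bag, the two reader functions, and the sample data are hypothetical and do not reproduce Pig's actual classes.

```python
STATUS_OK, STATUS_NULL = "OK", "NULL"


def drain_stopping_at_null(bag):
    """Models the reported bug: a null entry is mistaken for end of input."""
    out = []
    for status, value in bag:
        if status != STATUS_OK:
            break  # a STATUS_NULL entry ends the scan early, as if it were EOP
        out.append(value)
    return out


def drain_skipping_null(bag):
    """Skips null entries and keeps reading until the bag is exhausted."""
    return [value for status, value in bag if status == STATUS_OK]


bag = [(STATUS_OK, "a"), (STATUS_NULL, None), (STATUS_OK, "b"), (STATUS_OK, "c")]
print(drain_stopping_at_null(bag))
print(drain_skipping_null(bag))
```

In this model the first reader loses every record after the null entry, which matches the dropped-record symptom; the second reader shows one possible remedy, assuming null entries carry no data.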




Build failed in Hudson: Pig-Patch-minerva.apache.org #92

2009-06-18 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/92/

--
[...truncated 93989 lines...]
 [exec] [junit] 09/06/18 22:16:02 INFO dfs.DataNode: PacketResponder 1 
for block blk_-3610909769110207607_1010 terminating
 [exec] [junit] 09/06/18 22:16:03 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:42932 is added to 
blk_-3610909769110207607_1010 size 6
 [exec] [junit] 09/06/18 22:16:03 INFO dfs.DataNode: Received block 
blk_-3610909769110207607_1010 of size 6 from /127.0.0.1
 [exec] [junit] 09/06/18 22:16:03 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:45477 is added to 
blk_-3610909769110207607_1010 size 6
 [exec] [junit] 09/06/18 22:16:03 INFO dfs.DataNode: PacketResponder 2 
for block blk_-3610909769110207607_1010 terminating
 [exec] [junit] 09/06/18 22:16:03 INFO dfs.StateChange: BLOCK* 
NameSystem.allocateBlock: /user/hudson/input2.txt. blk_-3111693600154221798_1011
 [exec] [junit] 09/06/18 22:16:03 INFO dfs.DataNode: Receiving block 
blk_-3111693600154221798_1011 src: /127.0.0.1:49669 dest: /127.0.0.1:42956
 [exec] [junit] 09/06/18 22:16:03 INFO dfs.DataNode: Receiving block 
blk_-3111693600154221798_1011 src: /127.0.0.1:45986 dest: /127.0.0.1:54021
 [exec] [junit] 09/06/18 22:16:03 INFO dfs.DataNode: Receiving block 
blk_-3111693600154221798_1011 src: /127.0.0.1:58580 dest: /127.0.0.1:45477
 [exec] [junit] 09/06/18 22:16:03 INFO dfs.DataNode: Received block 
blk_-3111693600154221798_1011 of size 6 from /127.0.0.1
 [exec] [junit] 09/06/18 22:16:03 INFO dfs.DataNode: PacketResponder 0 
for block blk_-3111693600154221798_1011 terminating
 [exec] [junit] 09/06/18 22:16:03 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:45477 is added to 
blk_-3111693600154221798_1011 size 6
 [exec] [junit] 09/06/18 22:16:03 INFO dfs.DataNode: Received block 
blk_-3111693600154221798_1011 of size 6 from /127.0.0.1
 [exec] [junit] 09/06/18 22:16:03 INFO dfs.DataNode: PacketResponder 1 
for block blk_-3111693600154221798_1011 terminating
 [exec] [junit] 09/06/18 22:16:03 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:54021 is added to 
blk_-3111693600154221798_1011 size 6
 [exec] [junit] 09/06/18 22:16:03 INFO dfs.DataNode: Received block 
blk_-3111693600154221798_1011 of size 6 from /127.0.0.1
 [exec] [junit] 09/06/18 22:16:03 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:42956 is added to 
blk_-3111693600154221798_1011 size 6
 [exec] [junit] 09/06/18 22:16:03 INFO dfs.DataNode: PacketResponder 2 
for block blk_-3111693600154221798_1011 terminating
 [exec] [junit] 09/06/18 22:16:03 INFO 
executionengine.HExecutionEngine: Connecting to hadoop file system at: 
hdfs://localhost:35520
 [exec] [junit] 09/06/18 22:16:03 INFO 
executionengine.HExecutionEngine: Connecting to map-reduce job tracker at: 
localhost:49012
 [exec] [junit] 09/06/18 22:16:03 INFO 
mapReduceLayer.MultiQueryOptimizer: MR plan size before optimization: 1
 [exec] [junit] 09/06/18 22:16:03 INFO 
mapReduceLayer.MultiQueryOptimizer: MR plan size after optimization: 1
 [exec] [junit] 09/06/18 22:16:04 INFO 
mapReduceLayer.JobControlCompiler: Setting up single store job
 [exec] [junit] 09/06/18 22:16:04 WARN mapred.JobClient: Use 
GenericOptionsParser for parsing the arguments. Applications should implement 
Tool for the same.
 [exec] [junit] 09/06/18 22:16:04 INFO dfs.StateChange: BLOCK* 
NameSystem.allocateBlock: 
/tmp/hadoop-hudson/mapred/system/job_200906182215_0002/job.jar. 
blk_-3197336206391371647_1012
 [exec] [junit] 09/06/18 22:16:04 INFO dfs.DataNode: Receiving block 
blk_-3197336206391371647_1012 src: /127.0.0.1:45988 dest: /127.0.0.1:54021
 [exec] [junit] 09/06/18 22:16:04 INFO dfs.DataNode: Receiving block 
blk_-3197336206391371647_1012 src: /127.0.0.1:34269 dest: /127.0.0.1:42932
 [exec] [junit] 09/06/18 22:16:04 INFO dfs.DataNode: Receiving block 
blk_-3197336206391371647_1012 src: /127.0.0.1:58583 dest: /127.0.0.1:45477
 [exec] [junit] 09/06/18 22:16:04 INFO dfs.DataNode: Received block 
blk_-3197336206391371647_1012 of size 1415240 from /127.0.0.1
 [exec] [junit] 09/06/18 22:16:04 INFO dfs.DataNode: PacketResponder 0 
for block blk_-3197336206391371647_1012 terminating
 [exec] [junit] 09/06/18 22:16:04 INFO dfs.DataNode: Received block 
blk_-3197336206391371647_1012 of size 1415240 from /127.0.0.1
 [exec] [junit] 09/06/18 22:16:04 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:45477 is added to 
blk_-3197336206391371647_1012 size 1415240
 [exec] [junit] 09/06/18 22:16:04 INFO dfs.DataNode: 

[jira] Commented: (PIG-734) Non-string keys in maps

2009-06-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12721527#action_12721527
 ] 

Hadoop QA commented on PIG-734:
---

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12411133/PIG-734_2.patch
  against trunk revision 785450.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 63 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 2 new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/92/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/92/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/92/console

This message is automatically generated.

 Non-string keys in maps
 ---

 Key: PIG-734
 URL: https://issues.apache.org/jira/browse/PIG-734
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.2.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Minor
 Fix For: 0.3.0

 Attachments: PIG-734.patch, PIG-734_2.patch


 With the addition of types to pig, maps were changed to allow any atomic type 
 to be a key.  However, in practice we do not see people using keys other than 
 strings.  And allowing multiple types is causing us issues in serializing 
 data (we have to check what every key type is) and in the design for non-java 
 UDFs (since many scripting languages include associative arrays such as 
 Perl's hash).
 So I propose we scope back maps to only have string keys.  This would be a 
 non-compatible change.  But I am not aware of anyone using non-string keys, 
 so hopefully it would have little or no impact.
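
A rough sketch of the serialization cost mentioned above, assuming a toy encoding in which a map is written as a list of entries. Both functions and the encoding are hypothetical and are not Pig's wire format; they only show why mixed key types need a per-key type tag while string-only keys do not.

```python
def encode_any_key_map(m):
    """Mixed key types: every key must carry a type tag so it can be decoded."""
    return [(type(k).__name__, str(k), v) for k, v in m.items()]


def encode_string_key_map(m):
    """String-only keys: no per-key tag, but non-string keys are rejected."""
    for k in m:
        if not isinstance(k, str):
            raise TypeError(f"map key must be a string, got {type(k).__name__}")
    return list(m.items())


mixed = {"name": "alice", 7: "seven"}
print(encode_any_key_map(mixed))
print(encode_string_key_map({"name": "alice"}))
```

The second function also illustrates the incompatibility: a map that was valid before the change (such as `mixed`) would now be rejected.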




[jira] Updated: (PIG-781) Error reporting for failed MR jobs

2009-06-18 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-781:
---

Fix Version/s: 0.3.0

 Error reporting for failed MR jobs
 --

 Key: PIG-781
 URL: https://issues.apache.org/jira/browse/PIG-781
 Project: Pig
  Issue Type: Improvement
Reporter: Gunther Hagleitner
 Fix For: 0.3.0

 Attachments: partial_failure.patch, partial_failure.patch, 
 partial_failure.patch, partial_failure.patch


 If we have multiple MR jobs to run and some of them fail the behavior of the 
 system is to not stop on the first failure but to keep going. That way jobs 
 that do not depend on the failed job might still succeed.
 The question is how best to report this scenario to a user. How do we tell 
 which jobs failed and which didn't?
 One way could be to tie jobs to stores and report which store locations won't 
 have data and which ones do.
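
The idea of tying jobs to stores can be sketched as follows. This is a minimal model under stated assumptions: job names, the dependency map, the store locations, and the `run` callback are all hypothetical, and the code does not reflect how Pig schedules MR jobs.

```python
def run_jobs(jobs, deps, stores, run):
    """Run jobs in order, skipping any whose prerequisites did not succeed.

    jobs   -- job names in dependency (topological) order
    deps   -- job name -> set of prerequisite job names
    stores -- job name -> output location written by that job
    run    -- callable(job) -> True on success, False on failure
    """
    status = {}
    for job in jobs:
        if any(status[d] != "succeeded" for d in deps.get(job, ())):
            status[job] = "skipped"
        else:
            status[job] = "succeeded" if run(job) else "failed"
    # Report per store location: True means the location has data.
    report = {loc: status[job] == "succeeded" for job, loc in stores.items()}
    return status, report


jobs = ["j1", "j2", "j3"]
deps = {"j2": {"j1"}}                      # j3 is independent of j1
stores = {"j2": "/out/a", "j3": "/out/b"}  # hypothetical store locations
status, report = run_jobs(jobs, deps, stores, run=lambda job: job != "j1")
print(status)
print(report)
```

Here the failure of `j1` prevents `j2` from running, while the independent `j3` still succeeds, so the report can tell the user which store locations hold data.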




[jira] Commented: (PIG-832) Make import list configurable

2009-06-18 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12721528#action_12721528
 ] 

Daniel Dai commented on PIG-832:


Yes, `cat myudflist` is a workaround. However, in my humble opinion, 
this syntax is not very intuitive to the ordinary user.  Many users may have 
the impression that they have to list their UDFs one by one.

 Make import list configurable
 -

 Key: PIG-832
 URL: https://issues.apache.org/jira/browse/PIG-832
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.2.0
Reporter: Olga Natkovich
Assignee: Daniel Dai
 Fix For: 0.3.0


 Currently, it is hardwired in PigContext.




[jira] Updated: (PIG-734) Non-string keys in maps

2009-06-18 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-734:
---

Fix Version/s: 0.4.0  (was: 0.3.0)
Status: Patch Available  (was: Open)

 Non-string keys in maps
 ---

 Key: PIG-734
 URL: https://issues.apache.org/jira/browse/PIG-734
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.2.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Minor
 Fix For: 0.4.0

 Attachments: PIG-734.patch, PIG-734_2.patch, PIG-734_3.patch


 With the addition of types to pig, maps were changed to allow any atomic type 
 to be a key.  However, in practice we do not see people using keys other than 
 strings.  And allowing multiple types is causing us issues in serializing 
 data (we have to check what every key type is) and in the design for non-java 
 UDFs (since many scripting languages include associative arrays such as 
 Perl's hash).
 So I propose we scope back maps to only have string keys.  This would be a 
 non-compatible change.  But I am not aware of anyone using non-string keys, 
 so hopefully it would have little or no impact.




[jira] Updated: (PIG-734) Non-string keys in maps

2009-06-18 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-734:
---

Attachment: PIG-734_3.patch

Attaching a version of the file that fixes some of the introduced compiler 
warnings.  The findbugs warnings have to do with naming convention.  All of the 
function names in QueryParser start with upper case, so I am only following the 
convention there.

 Non-string keys in maps
 ---

 Key: PIG-734
 URL: https://issues.apache.org/jira/browse/PIG-734
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.2.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Minor
 Fix For: 0.4.0

 Attachments: PIG-734.patch, PIG-734_2.patch, PIG-734_3.patch


 With the addition of types to pig, maps were changed to allow any atomic type 
 to be a key.  However, in practice we do not see people using keys other than 
 strings.  And allowing multiple types is causing us issues in serializing 
 data (we have to check what every key type is) and in the design for non-java 
 UDFs (since many scripting languages include associative arrays such as 
 Perl's hash).
 So I propose we scope back maps to only have string keys.  This would be a 
 non-compatible change.  But I am not aware of anyone using non-string keys, 
 so hopefully it would have little or no impact.




[jira] Commented: (PIG-832) Make import list configurable

2009-06-18 Thread Milind Bhandarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12721550#action_12721550
 ] 

Milind Bhandarkar commented on PIG-832:
---

Daniel,

Pig streaming already uses backquotes for executing external programs. So, 
users are familiar with this syntax. I believe an ordinary pig user already 
knows about doing such things in unix shells. But anyway, as Olga said, she is 
looking for requirements, and not solutions, so, here is a requirement:

I have two jars: xyz.jar, and abc.jar. I am using two UDFs in my scripts. I 
want to use function1 from xyz.jar, and function2 from abc.jar. How do I use 
function2 from abc.jar with full confidence that xyz.jar does not contain a UDF 
named function2? How do you propose I do that without modifying a whole bunch 
of Pig scripts that I am testing for my functions?

In the solution that I proposed, I can just change function2 mapping by 
including -Dimport.list=function2:com.yahoo.milind.function2 on the 
command-line.
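
A minimal sketch of the resolution order this requirement implies, assuming explicit name-to-class mappings are consulted before the import list. The `resolve_udf` function, the `com.xyz` package, and the set of known classes are hypothetical; only `com.yahoo.milind.function2` comes from the example above, and none of this is Pig's actual lookup code.

```python
def resolve_udf(name, mappings, import_list, known_classes):
    """Resolve a UDF name to a fully qualified class name.

    mappings      -- explicit name -> class overrides (checked first)
    import_list   -- package prefixes searched in order
    known_classes -- set of class names available on the classpath
    """
    if name in mappings:
        return mappings[name]
    if name in known_classes:          # already fully qualified
        return name
    for package in import_list:
        candidate = f"{package}.{name}"
        if candidate in known_classes:
            return candidate
    raise LookupError(f"cannot resolve UDF {name!r}")


# Hypothetical classpath: both jars ship a class named function2.
known = {"com.xyz.function1", "com.xyz.function2", "com.yahoo.milind.function2"}
imports = ["com.xyz", "com.yahoo.milind"]

print(resolve_udf("function2", {}, imports, known))
print(resolve_udf("function2", {"function2": "com.yahoo.milind.function2"},
                  imports, known))
```

With the import list alone, the first package that contains `function2` wins; the explicit mapping removes that ambiguity without touching any script.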

 Make import list configurable
 -

 Key: PIG-832
 URL: https://issues.apache.org/jira/browse/PIG-832
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.2.0
Reporter: Olga Natkovich
Assignee: Daniel Dai
 Fix For: 0.3.0


 Currently, it is hardwired in PigContext.




[jira] Commented: (PIG-832) Make import list configurable

2009-06-18 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12721551#action_12721551
 ] 

Olga Natkovich commented on PIG-832:


You use a fully qualified name for the other one.

I would like for us to continue on our original plan. It might not solve all 
the issues but it certainly helps and it is a very small change to the current 
implementation.

We can discuss improvements in a separate JIRA.

 Make import list configurable
 -

 Key: PIG-832
 URL: https://issues.apache.org/jira/browse/PIG-832
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.2.0
Reporter: Olga Natkovich
Assignee: Daniel Dai

 Currently, it is hardwired in PigContext.




[jira] Commented: (PIG-753) Provide support for UDFs without parameters

2009-06-18 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12721559#action_12721559
 ] 

Alan Gates commented on PIG-753:


+1

I tested the patch, and the issue was just with the bzip tests.

I'd like to have Santhosh's opinion on this as he is the expert in the logical 
plan and type checker area where these changes are.

 Provide support for UDFs without parameters
 ---

 Key: PIG-753
 URL: https://issues.apache.org/jira/browse/PIG-753
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.3.0
Reporter: Jeff Zhang
 Attachments: Pig_753_Patch.txt


 Pig does not support UDFs without parameters; it forces me to provide one.
 For example, the following statement generates an error:
  B = FOREACH A GENERATE bagGenerator();
 I have to provide a parameter, as follows:
  B = FOREACH A GENERATE bagGenerator($0);
  




[jira] Commented: (PIG-856) PERFORMANCE: reduce number of replicas

2009-06-18 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12721558#action_12721558
 ] 

Olga Natkovich commented on PIG-856:


The number of replicas can be set via the dfs.replication parameter in Hadoop's 
JobConf.

 PERFORMANCE: reduce number of replicas
 --

 Key: PIG-856
 URL: https://issues.apache.org/jira/browse/PIG-856
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.3.0
Reporter: Olga Natkovich

 Currently Pig uses the default number of replicas between MR jobs. Currently, 
 the number is 3. Given the temp nature of the data, we should never need more 
 than 2 and should explicitly set it to improve performance and to be nicer 
 to the name node.




[jira] Commented: (PIG-832) Make import list configurable

2009-06-18 Thread Milind Bhandarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12721562#action_12721562
 ] 

Milind Bhandarkar commented on PIG-832:
---

Olga,

As long as the suggested improvements do not result in redundancy or make the 
original solutions obsolete, it's fine. But I believe that the core issue, which 
is how Pig resolves UDFs, is not addressed properly by the small 
change to the current implementation.

 Make import list configurable
 -

 Key: PIG-832
 URL: https://issues.apache.org/jira/browse/PIG-832
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.2.0
Reporter: Olga Natkovich
Assignee: Daniel Dai

 Currently, it is hardwired in PigContext.




[jira] Commented: (PIG-832) Make import list configurable

2009-06-18 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12721563#action_12721563
 ] 

Olga Natkovich commented on PIG-832:


I don't believe this prevents future improvements

 Make import list configurable
 -

 Key: PIG-832
 URL: https://issues.apache.org/jira/browse/PIG-832
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.2.0
Reporter: Olga Natkovich
Assignee: Daniel Dai

 Currently, it is hardwired in PigContext.




[jira] Commented: (PIG-753) Provide support for UDFs without parameters

2009-06-18 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12721564#action_12721564
 ] 

Santhosh Srinivasan commented on PIG-753:
-

+1 for the code changes. The license header and the unit tests that failed have 
to be checked.

 Provide support for UDFs without parameters
 ---

 Key: PIG-753
 URL: https://issues.apache.org/jira/browse/PIG-753
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.3.0
Reporter: Jeff Zhang
 Attachments: Pig_753_Patch.txt


 Pig does not support UDFs without parameters; it forces me to provide one.
 For example, the following statement generates an error:
  B = FOREACH A GENERATE bagGenerator();
 I have to provide a parameter, as follows:
  B = FOREACH A GENERATE bagGenerator($0);
  




[jira] Commented: (PIG-856) PERFORMANCE: reduce number of replicas

2009-06-18 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12721570#action_12721570
 ] 

Olga Natkovich commented on PIG-856:


Hi Milind, yes, these are very good points. 

I was hoping that we could set the flag for jobs that produce temporary results 
only, but I did not clearly state this in the bug.

I am also considering replication of 1 as I agree it should yield much better 
performance gains. My plan is to run a test on a large query (join + order by) 
with replication factor of 1, 2, and default and see what perf differences are.


 PERFORMANCE: reduce number of replicas
 --

 Key: PIG-856
 URL: https://issues.apache.org/jira/browse/PIG-856
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.3.0
Reporter: Olga Natkovich

 Currently Pig uses the default number of replicas between MR jobs. Currently, 
 the number is 3. Given the temp nature of the data, we should never need more 
 than 2 and should explicitly set it to improve performance and to be nicer 
 to the name node.




[jira] Commented: (PIG-856) PERFORMANCE: reduce number of replicas

2009-06-18 Thread Milind Bhandarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12721574#action_12721574
 ] 

Milind Bhandarkar commented on PIG-856:
---

+1 on seeing performance differences. But is there code in Pig to determine 
that the output of a previous map-reduce stage is not accessible because of 
datanode failures (as opposed to some other reason), and to repeat the 
map-reduce stage? With a replication factor of 1, a single datanode failure 
will make temporary data unavailable, and such a failure is very likely for 
long-running queries.
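
A back-of-the-envelope sketch of the availability trade-off, under strong simplifying assumptions: datanodes fail independently with the same probability, replicas of a block sit on distinct nodes, and HDFS re-replication during the query is ignored. The functions, the failure probability, and the block count are hypothetical illustration values, not measurements.

```python
def block_unavailable_probability(p_node, replicas):
    """Chance that every replica of one block sits on a failed node."""
    return p_node ** replicas


def any_block_unavailable_probability(p_node, replicas, blocks):
    """Chance that at least one of `blocks` independent blocks is unavailable."""
    return 1.0 - (1.0 - p_node ** replicas) ** blocks


p = 0.01  # assumed per-node failure probability during the query
for r in (1, 2, 3):
    print(r, block_unavailable_probability(p, r),
          any_block_unavailable_probability(p, r, blocks=100))
```

Under these assumptions, each additional replica multiplies the per-block risk by the node failure probability, which is why a replication factor of 1 is markedly riskier than 2 for temporary data spread over many blocks.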

 PERFORMANCE: reduce number of replicas
 --

 Key: PIG-856
 URL: https://issues.apache.org/jira/browse/PIG-856
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.3.0
Reporter: Olga Natkovich

 Currently Pig uses the default number of replicas between MR jobs. Currently, 
 the number is 3. Given the temp nature of the data, we should never need more 
 than 2 and should explicitly set it to improve performance and to be nicer 
 to the name node.




[jira] Commented: (PIG-856) PERFORMANCE: reduce number of replicas

2009-06-18 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12721577#action_12721577
 ] 

Olga Natkovich commented on PIG-856:


If a job fails, the store connected to this job will fail as well. Pig has no 
retries beyond what hadoop provides. That's why no replication seems a little 
risky but I want to see what the perf difference is and whether it is worth the 
risk.

 PERFORMANCE: reduce number of replicas
 --

 Key: PIG-856
 URL: https://issues.apache.org/jira/browse/PIG-856
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.3.0
Reporter: Olga Natkovich

 Currently Pig uses the default number of replicas between MR jobs. Currently, 
 the number is 3. Given the temp nature of the data, we should never need more 
 than 2 and should explicitly set it to improve performance and to be nicer 
 to the name node.




[jira] Commented: (PIG-856) PERFORMANCE: reduce number of replicas

2009-06-18 Thread Milind Bhandarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12721581#action_12721581
 ] 

Milind Bhandarkar commented on PIG-856:
---

+1. I will file a separate Jira (if replication of 1 is decided upon) so that 
Pig retries a map-reduce stage if it fails for *external* reasons.

 PERFORMANCE: reduce number of replicas
 --

 Key: PIG-856
 URL: https://issues.apache.org/jira/browse/PIG-856
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.3.0
Reporter: Olga Natkovich

 Currently Pig uses the default number of replicas between MR jobs. Currently, 
 the number is 3. Given the temp nature of the data, we should never need more 
 than 2 and should explicitly set it to improve performance and to be nicer 
 to the name node.




[jira] Commented: (PIG-856) PERFORMANCE: reduce number of replicas

2009-06-18 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12721590#action_12721590
 ] 

Alan Gates commented on PIG-856:


My $0.02, based on the assumption that we see a significant performance 
improvement using only 1 replica instead of 2 or 3:

In the long term we might want Pig to retry jobs if they fail for this.  But in 
the short term, I would think some users would be willing to trade reliability 
for performance and some would not, so we should let them choose.  

 PERFORMANCE: reduce number of replicas
 --

 Key: PIG-856
 URL: https://issues.apache.org/jira/browse/PIG-856
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.3.0
Reporter: Olga Natkovich

 Currently Pig uses the default number of replicas between MR jobs. Currently, 
 the number is 3. Given the temp nature of the data, we should never need more 
 than 2 and should explicitly set it to improve performance and to be nicer 
 to the name node.




[jira] Commented: (PIG-856) PERFORMANCE: reduce number of replicas

2009-06-18 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12721593#action_12721593
 ] 

Olga Natkovich commented on PIG-856:


Yes, I agree: we should let users choose. I was thinking perhaps even for 
their final output.

 PERFORMANCE: reduce number of replicas
 --

 Key: PIG-856
 URL: https://issues.apache.org/jira/browse/PIG-856
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.3.0
Reporter: Olga Natkovich

 Currently Pig uses the default number of replicas between MR jobs. Currently, 
 the number is 3. Given the temp nature of the data, we should never need more 
 than 2 and should explicitly set it to improve performance and to be nicer 
 to the name node.




[jira] Commented: (PIG-856) PERFORMANCE: reduce number of replicas

2009-06-18 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12721592#action_12721592
 ] 

Santhosh Srinivasan commented on PIG-856:
-

Would that be through a configuration parameter? What would be the default, 1 
or 2?

 PERFORMANCE: reduce number of replicas
 --

 Key: PIG-856
 URL: https://issues.apache.org/jira/browse/PIG-856
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.3.0
Reporter: Olga Natkovich

 Currently Pig uses the default number of replicas between MR jobs. Currently, 
 the number is 3. Given the temp nature of the data, we should never need more 
 than 2 and should explicitly set it to improve performance and to be nicer 
 to the name node.




[jira] Commented: (PIG-856) PERFORMANCE: reduce number of replicas

2009-06-18 Thread Milind Bhandarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12721594#action_12721594
 ] 

Milind Bhandarkar commented on PIG-856:
---

+1 to both Alan and Olga. Default should still be hadoop's default 
dfs.replication.

 PERFORMANCE: reduce number of replicas
 --

 Key: PIG-856
 URL: https://issues.apache.org/jira/browse/PIG-856
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.3.0
Reporter: Olga Natkovich

 Currently Pig uses the default number of replicas between MR jobs. Currently, 
 the number is 3. Given the temp nature of the data, we should never need more 
 than 2 and should explicitly set it to improve performance and to be nicer 
 to the name node.




[jira] Commented: (PIG-856) PERFORMANCE: reduce number of replicas

2009-06-18 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12721596#action_12721596
 ] 

Santhosh Srinivasan commented on PIG-856:
-

Essentially, are we adding more knobs to tune Pig? We should document these 
knobs and explain how they interact with each other.

 PERFORMANCE: reduce number of replicas
 --

 Key: PIG-856
 URL: https://issues.apache.org/jira/browse/PIG-856
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.3.0
Reporter: Olga Natkovich

 Currently Pig uses the default number of replicas between MR jobs. Currently, 
 the number is 3. Given the temp nature of the data, we should never need more 
 than 2 and should explicitly set it to improve performance and to be nicer 
 to the name node.




Build failed in Hudson: Pig-Patch-minerva.apache.org #93

2009-06-18 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/93/

--
[...truncated 94294 lines...]
 [exec] [junit] 09/06/19 01:06:15 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:49808 is added to 
blk_2088255647506135164_1011 size 6
 [exec] [junit] 09/06/19 01:06:15 INFO dfs.DataNode: Received block 
blk_2088255647506135164_1011 of size 6 from /127.0.0.1
 [exec] [junit] 09/06/19 01:06:15 INFO dfs.DataNode: PacketResponder 1 
for block blk_2088255647506135164_1011 terminating
 [exec] [junit] 09/06/19 01:06:15 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:54748 is added to 
blk_2088255647506135164_1011 size 6
 [exec] [junit] 09/06/19 01:06:15 INFO dfs.DataNode: Received block 
blk_2088255647506135164_1011 of size 6 from /127.0.0.1
 [exec] [junit] 09/06/19 01:06:15 INFO dfs.DataNode: PacketResponder 2 
for block blk_2088255647506135164_1011 terminating
 [exec] [junit] 09/06/19 01:06:15 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:40871 is added to 
blk_2088255647506135164_1011 size 6
 [exec] [junit] 09/06/19 01:06:15 INFO 
executionengine.HExecutionEngine: Connecting to hadoop file system at: 
hdfs://localhost:55595
 [exec] [junit] 09/06/19 01:06:15 INFO 
executionengine.HExecutionEngine: Connecting to map-reduce job tracker at: 
localhost:38969
 [exec] [junit] 09/06/19 01:06:15 INFO dfs.DataNode: Deleting block 
blk_2919053229063530843_1005 file dfs/data/data2/current/blk_2919053229063530843
 [exec] [junit] 09/06/19 01:06:15 INFO dfs.DataNode: Deleting block 
blk_6688640043981499581_1006 file dfs/data/data1/current/blk_6688640043981499581
 [exec] [junit] 09/06/19 01:06:15 INFO dfs.DataNode: Deleting block 
blk_6773019531096958866_1004 file dfs/data/data1/current/blk_6773019531096958866
 [exec] [junit] 09/06/19 01:06:15 INFO 
mapReduceLayer.MultiQueryOptimizer: MR plan size before optimization: 1
 [exec] [junit] 09/06/19 01:06:15 INFO 
mapReduceLayer.MultiQueryOptimizer: MR plan size after optimization: 1
 [exec] [junit] 09/06/19 01:06:16 INFO dfs.StateChange: BLOCK* ask 
127.0.0.1:40871 to delete  blk_6773019531096958866_1004 
blk_6688640043981499581_1006
 [exec] [junit] 09/06/19 01:06:16 INFO dfs.StateChange: BLOCK* ask 
127.0.0.1:53215 to delete  blk_2919053229063530843_1005 
blk_6688640043981499581_1006
 [exec] [junit] 09/06/19 01:06:16 INFO 
mapReduceLayer.JobControlCompiler: Setting up single store job
 [exec] [junit] 09/06/19 01:06:16 WARN mapred.JobClient: Use 
GenericOptionsParser for parsing the arguments. Applications should implement 
Tool for the same.
 [exec] [junit] 09/06/19 01:06:16 INFO dfs.StateChange: BLOCK* 
NameSystem.allocateBlock: 
/tmp/hadoop-hudson/mapred/system/job_200906190105_0002/job.jar. 
blk_-557104969554073193_1012
 [exec] [junit] 09/06/19 01:06:16 INFO dfs.DataNode: Receiving block 
blk_-557104969554073193_1012 src: /127.0.0.1:48141 dest: /127.0.0.1:40871
 [exec] [junit] 09/06/19 01:06:16 INFO dfs.DataNode: Receiving block 
blk_-557104969554073193_1012 src: /127.0.0.1:60050 dest: /127.0.0.1:53215
 [exec] [junit] 09/06/19 01:06:16 INFO dfs.DataNode: Receiving block 
blk_-557104969554073193_1012 src: /127.0.0.1:49230 dest: /127.0.0.1:54748
 [exec] [junit] 09/06/19 01:06:17 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:54748 is added to 
blk_-557104969554073193_1012 size 1415279
 [exec] [junit] 09/06/19 01:06:17 INFO dfs.DataNode: Received block 
blk_-557104969554073193_1012 of size 1415279 from /127.0.0.1
 [exec] [junit] 09/06/19 01:06:17 INFO dfs.DataNode: PacketResponder 0 
for block blk_-557104969554073193_1012 terminating
 [exec] [junit] 09/06/19 01:06:17 INFO dfs.DataNode: Received block 
blk_-557104969554073193_1012 of size 1415279 from /127.0.0.1
 [exec] [junit] 09/06/19 01:06:17 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:53215 is added to 
blk_-557104969554073193_1012 size 1415279
 [exec] [junit] 09/06/19 01:06:17 INFO dfs.DataNode: PacketResponder 1 
for block blk_-557104969554073193_1012 terminating
 [exec] [junit] 09/06/19 01:06:17 INFO dfs.DataNode: Received block 
blk_-557104969554073193_1012 of size 1415279 from /127.0.0.1
 [exec] [junit] 09/06/19 01:06:17 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:40871 is added to 
blk_-557104969554073193_1012 size 1415279
 [exec] [junit] 09/06/19 01:06:17 INFO dfs.DataNode: PacketResponder 2 
for block blk_-557104969554073193_1012 terminating
 [exec] [junit] 09/06/19 01:06:17 INFO fs.FSNamesystem: Increasing 
replication for file 

[jira] Commented: (PIG-734) Non-string keys in maps

2009-06-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12721598#action_12721598
 ] 

Hadoop QA commented on PIG-734:
---

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12411160/PIG-734_3.patch
  against trunk revision 785450.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 63 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 2 new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/93/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/93/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/93/console

This message is automatically generated.

 Non-string keys in maps
 ---

 Key: PIG-734
 URL: https://issues.apache.org/jira/browse/PIG-734
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.2.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Minor
 Fix For: 0.4.0

 Attachments: PIG-734.patch, PIG-734_2.patch, PIG-734_3.patch


 With the addition of types to Pig, maps were changed to allow any atomic type 
 to be a key.  However, in practice we do not see people using keys other than 
 strings.  And allowing multiple types is causing us issues in serializing 
 data (we have to check what every key type is) and in the design for non-Java 
 UDFs (since many scripting languages include associative arrays such as 
 Perl's hash).
 So I propose we scope back maps to only have string keys.  This would be a 
 non-compatible change.  But I am not aware of anyone using non-string keys, 
 so hopefully it would have little or no impact.
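
 The serialization cost mentioned above can be illustrated with a minimal 
 Java sketch. It is not Pig's actual serialization code: the class name, 
 method names, and one-byte type tags are hypothetical, chosen only to show 
 that keys of any atomic type need a type check and a tag per key, while 
 string-only keys can be written directly.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;

public class MapKeySerialization {
    // Hypothetical one-byte type tags, for illustration only.
    static final byte TYPE_INT = 1;
    static final byte TYPE_STRING = 2;

    /** Keys of any supported type: every key needs a type check and a tag byte. */
    public static byte[] writeTypedKeys(Map<?, ?> map) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(map.size());
            for (Object key : map.keySet()) {
                if (key instanceof Integer) {
                    out.writeByte(TYPE_INT);
                    out.writeInt((Integer) key);
                } else if (key instanceof String) {
                    out.writeByte(TYPE_STRING);
                    out.writeUTF((String) key);
                } else {
                    throw new IllegalArgumentException("unsupported key type: " + key.getClass());
                }
            }
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrown unchecked for simplicity.
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** String-only keys: no per-key type check and no tag byte. */
    public static byte[] writeStringKeys(Map<String, ?> map) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(map.size());
            for (String key : map.keySet()) {
                out.writeUTF(key);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

 Only the keys are written here; values would still need their own type 
 handling in either design.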




[jira] Commented: (PIG-856) PERFORMANCE: reduce number of replicas

2009-06-18 Thread Milind Bhandarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12721599#action_12721599
 ] 

Milind Bhandarkar commented on PIG-856:
---

+1 to Sathosh on documenting knobs. It is better to add and document knobs 
than to modify the language like this:

{code}
%TempReplicate 2
store A into PigStorage('\t') with replication 2;
{code}
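
For comparison, the knob approach could look roughly like the Java sketch 
below. The property name `pig.temp.replication` is hypothetical (it is not an 
existing Pig setting), and the default of 2 is taken from the issue 
description. The sketch only reads the value from a `java.util.Properties` 
object; passing it on to Hadoop for temporary files is left out.

```java
import java.util.Properties;

public class TempReplicationKnob {
    // Hypothetical property name, used only for illustration.
    public static final String KEY = "pig.temp.replication";
    // Default suggested in the issue description for temporary data.
    public static final int DEFAULT_REPLICATION = 2;

    /** Returns the configured replication factor, or the default when unset or invalid. */
    public static int tempReplication(Properties props) {
        String raw = props.getProperty(KEY);
        if (raw == null) {
            return DEFAULT_REPLICATION;
        }
        try {
            int value = Integer.parseInt(raw.trim());
            // A replication factor below 1 makes no sense; fall back to the default.
            return value >= 1 ? value : DEFAULT_REPLICATION;
        } catch (NumberFormatException e) {
            return DEFAULT_REPLICATION;
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        System.out.println(tempReplication(props)); // knob unset
        props.setProperty(KEY, "3");
        System.out.println(tempReplication(props)); // knob set by the user
    }
}
```

With a knob like this, scripts stay unchanged and the setting lives in 
configuration, where it can be documented alongside the other properties.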

 PERFORMANCE: reduce number of replicas
 --

 Key: PIG-856
 URL: https://issues.apache.org/jira/browse/PIG-856
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.3.0
Reporter: Olga Natkovich

 Currently Pig uses the default number of replicas between MR jobs, which is 
 3. Given the temporary nature of the data, we should never need more than 2 
 and should explicitly set it to improve performance and to be nicer to the 
 name node.
