date:20130315


[ 
https://issues.apache.org/jira/browse/PIG-3249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13603540#comment-13603540
 ] 

Daniel Dai commented on PIG-3249:
-

We can change the script to run java -cp pig.jar org.apache.hadoop.util.VersionInfo 
to get the Hadoop version.

 Pig startup script prints out a wrong version of hadoop when using fat jar
 --

 Key: PIG-3249
 URL: https://issues.apache.org/jira/browse/PIG-3249
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.11
Reporter: Prashant Kommireddi
  Labels: newbie
 Fix For: 0.12


 Script suggests 0.20.2 is used with the bundled jar but we are using 1.0 at 
 the moment.
 {code}
 # fall back to use fat pig.jar
 if [ "$debug" == "true" ]; then
     echo "Cannot find local hadoop installation, using bundled hadoop 20.2"
 fi
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (PIG-3205) Passing arguments to python script does not work with -f option


[ 
https://issues.apache.org/jira/browse/PIG-3205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13603560#comment-13603560
 ] 

Cheolsoo Park commented on PIG-3205:


+1.

Do you mind deleting the following line if it's not necessary when you commit?
{code}
+System.out.println(---);
{code}

 Passing arguments to python script does not work with -f option
 ---

 Key: PIG-3205
 URL: https://issues.apache.org/jira/browse/PIG-3205
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.10.1
Reporter: Rohini Palaniswamy
Assignee: Rohini Palaniswamy
 Fix For: 0.12

 Attachments: PIG-3205.patch


 With pig sample.py arg1 arg2, arguments can be accessed in the embedded 
 python script using sys.argv[]. This does not work with pig -f sample.py 
 arg1 arg2. 
 In the case of ExecMode.FILE, we don't set PigContext.PIG_CMD_ARGS_REMAINDERS, 
 so the arguments are not passed to JythonScriptEngine or GroovyScriptEngine. 
 This is especially a problem with Oozie, as it always uses the -f option to 
 specify the pig script.


[jira] [Commented] (PIG-3205) Passing arguments to python script does not work with -f option

2013-03-15 Thread Rohini Palaniswamy (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-3205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13603563#comment-13603563
 ] 

Rohini Palaniswamy commented on PIG-3205:
-

Ah. Sure. Thanks for catching it. Left over from some debug statements.


[jira] [Commented] (PIG-2597) Move grunt from javacc to ANTRL

2013-03-15 Thread Koji Noguchi (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-2597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13603583#comment-13603583
 ] 

Koji Noguchi commented on PIG-2597:
---

bq. Jonathan, any update on this?
I'm interested in this status as well.  
Does Boski have a plan to continue working on this?


 Move grunt from javacc to ANTRL
 ---

 Key: PIG-2597
 URL: https://issues.apache.org/jira/browse/PIG-2597
 Project: Pig
  Issue Type: Improvement
Reporter: Jonathan Coveney
  Labels: GSoC2012
 Attachments: pig02.diff


 Currently, the parser for queries is in ANTLR, but Grunt is still javacc. The 
 javacc parser is very difficult to work with, and next to impossible to 
 understand or modify. ANTLR provides a much cleaner, more standard way to 
 generate parsers/lexers/ASTs/etc, and moving Grunt from javacc to ANTLR would 
 be huge as we continue to add features to Pig.
 This is a candidate project for Google Summer of Code 2012. More information 
 about the program can be found at 
 https://cwiki.apache.org/confluence/display/PIG/GSoc2012


Re: LoadFunc and LoadMetadata

getPartitionKeys should be called by default. Did you use an AS clause
in the load statement? That could add a foreach between Load and Filter,
and getPartitionKeys will only be invoked if the filter is right after
the load. Do an explain to check for it.

Thanks,
Daniel

On Thu, Mar 14, 2013 at 8:37 PM, Jeff Yuan quaintena...@gmail.com wrote:
 Hi all,

 For CustomLoader (a class I'm implementing), which extends LoadFunc and
 implements LoadMetadata, the getPartitionKeys function is supposed
 to be called by PartitionFilterOptimizer, right? I put some debug
 statements in getPartitionKeys, but this function doesn't seem like
 it's ever called.

 I've read through some Pig source; optimization rules can be disabled
 by properties, but by default the PartitionFilterOptimizer should be
 enabled. Also, in PartitionFilterOptimizer, I saw some
 other checks, like the Filter operator cannot have another dependency
 other than load, which is true in my case. Anyway, can someone shed
 some light on this? Am I understanding this interface incorrectly?

 My script is very simple (line 1 is load, line 2 is filter, and line 3
 is store), so the Logical Plan should be very simple. Also, I'm
 testing this in Pig local mode, not sure if that matters.

 Greatly appreciate any hints!

[jira] [Commented] (PIG-3223) AvroStorage does not handle comma separated input paths

2013-03-15 Thread Michael Kramer (JIRA)

[
https://issues.apache.org/jira/browse/PIG-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13603652#comment-13603652
]

Michael Kramer commented on PIG-3223:
-

[~cheolsoo], thanks for getting back to me so quickly!

We're using variable substitution and input path generation via Oozie
Coordinator. We include the hdfs://namenode:8020 at the beginning of our path
templates, which I think is pretty standard (e.g. something like
<uri-template>${nameNode}/data/</uri-template>). When Oozie constructs input
paths to be passed to the pig script or map reduce job, it enumerates the paths
via a comma separated list, something like
hdfs://namenode:8020/data/1,hdfs://namenode:8020/data/2. This is how we
figured out AvroStorage was breaking in the first place.

A good coordinator/workflow example that is indicative of the types of
workflows we're running can be found in the Oozie source examples:
https://github.com/apache/oozie/blob/trunk/examples/src/main/apps/aggregator/coordinator.xml

AvroStorage does not handle comma separated input paths
---

Key: PIG-3223
URL: https://issues.apache.org/jira/browse/PIG-3223
Project: Pig
Issue Type: Bug
Components: piggybank
Affects Versions: 0.10.0, 0.11
Reporter: Michael Kramer
Assignee: Johnny Zhang
Attachments: AvroStorage.patch, AvroStorage.patch-2,
AvroStorageUtils.patch, AvroStorageUtils.patch-2, PIG-3223.patch.txt

In pig 0.11, a patch was issued to AvroStorage to support globs and comma
separated input paths (PIG-2492). While this function works fine for
glob-formatted input paths, it fails when issued a standard comma separated
list of paths. fs.globStatus does not seem to be able to parse out such a
list, and a java.net.URISyntaxException is thrown when toURI is called on the
path.
I have a working fix for this, but it's extremely ugly (basically checking if
the string of input paths is globbed, otherwise splitting on ,). I'm sure
there's a more elegant solution. I'd be happy to post the relevant methods
and fixes if necessary.


Re: LoadFunc and LoadMetadata

2013-03-15 Thread Jeff Yuan

Yes, I do use AS in the load statement. I thought Filters are always
pushed as close to the Load operators as possible? What kind of
Foreach is added?

Thanks,
Jeff


[jira] [Assigned] (PIG-2630) Issue with setting b = a;

2013-03-15 Thread Johnny Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Johnny Zhang reassigned PIG-2630:
-

Assignee: (was: Johnny Zhang)

 Issue with setting b = a;
 ---

 Key: PIG-2630
 URL: https://issues.apache.org/jira/browse/PIG-2630
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.10.0, 0.11
Reporter: Jonathan Coveney
 Fix For: 0.12


 The following gives an error:
 {code}
 a = load 'thing' as (x:int);
 b = a; c = join a by x, b by x;
 {code}
 Error:
 {code}
 2012-04-03 14:02:47,434 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
 1200: Pig script failed to parse: 
 line 14, column 4 pig script failed to validate: 
 org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2225: Projection 
 with nothing to reference!
 {code}
 No issue with the following, however
 {code}
 a = load 'thing' as (x:int);
 b = foreach a generate *;
 c = join a by x, b by x;
 {code}
 oh and here is the log:
 {code}
 $ cat pig_1333487146863.log
 Pig Stack Trace
 ---
 ERROR 1200: Pig script failed to parse: 
 line 3, column 4 pig script failed to validate: 
 org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2225: Projection 
 with nothing to reference!
 Failed to parse: Pig script failed to parse: 
 line 3, column 4 pig script failed to validate: 
 org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2225: Projection 
 with nothing to reference!
   at 
 org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:182)
   at org.apache.pig.PigServer$Graph.validateQuery(PigServer.java:1566)
   at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1539)
   at org.apache.pig.PigServer.registerQuery(PigServer.java:541)
   at 
 org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:945)
   at 
 org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:392)
   at 
 org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:190)
   at 
 org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:166)
   at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
   at org.apache.pig.Main.run(Main.java:535)
   at org.apache.pig.Main.main(Main.java:153)
 Caused by: 
 line 3, column 4 pig script failed to validate: 
 org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2225: Projection 
 with nothing to reference!
   at 
 org.apache.pig.parser.LogicalPlanBuilder.buildJoinOp(LogicalPlanBuilder.java:363)
   at 
 org.apache.pig.parser.LogicalPlanGenerator.join_clause(LogicalPlanGenerator.java:11441)
   at 
 org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1491)
   at 
 org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:791)
   at 
 org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:509)
   at 
 org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:384)
   at 
 org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:175)
   ... 10 more
 
 {code}


Re: Contribute to PIG-3225

2013-03-15 Thread Russell Jurney

Can we have a GSoC entry to antlrize grunt? Who can mentor it?

Russell Jurney http://datasyndrome.com

On Mar 15, 2013, at 11:34 AM, Daniel Dai da...@hortonworks.com wrote:

 GSoC 2013 wiki is not up yet. You can find some information from last
 year's wiki: https://cwiki.apache.org/confluence/display/PIG/GSoc2012.

 Thanks,
 Daniel

 On Mon, Mar 11, 2013 at 6:38 PM, Dmitriy Ryaboy dvrya...@gmail.com wrote:
 + Gianmarco


 On Mon, Mar 11, 2013 at 11:20 AM, Sadari Jayawardena 
 sjayawardena...@gmail.com wrote:

 I am a final year undergraduate in Computer Science & Engineering. I have
 good experience in Java programming and am interested in mathematics and
 statistics. I would like to contribute to this project through GSoC 2013
 (https://issues.apache.org/jira/browse/PIG-3225).

 I went through the Wikipedia link provided. Could I be provided with
 additional references and study materials?


 Thanks in advance
 --
 Sadari Jayawardena

 Undergraduate
 Department of Computer Science & Engineering
 University of Moratuwa

[jira] [Updated] (PIG-3194) Changes to ObjectSerializer.java break compatibility with Hadoop 0.20.2


 [ 
https://issues.apache.org/jira/browse/PIG-3194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Kommireddi updated PIG-3194:
-

Attachment: PIG-3194_2.patch

Uploading a new patch with Dmitriy's feedback incorporated.

 Changes to ObjectSerializer.java break compatibility with Hadoop 0.20.2
 ---

 Key: PIG-3194
 URL: https://issues.apache.org/jira/browse/PIG-3194
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.11
Reporter: Kai Londenberg
Assignee: Prashant Kommireddi
 Fix For: 0.11.1

 Attachments: PIG-3194_2.patch, PIG-3194.patch


 The changes to ObjectSerializer.java in the following commit
 http://svn.apache.org/viewvc?view=revision&revision=1403934 break 
 compatibility with Hadoop 0.20.2 clusters.
 The reason is that the code uses methods from Apache Commons Codec 1.4 
 that are not available in Apache Commons Codec 1.3, which ships with 
 Hadoop 0.20.2.
 The offending methods are Base64.decodeBase64(String) and 
 Base64.encodeBase64URLSafeString(byte[]).
 If I revert these changes, Pig 0.11.0 candidate 2 works well with our Hadoop 
 0.20.2 clusters.


[jira] [Updated] (PIG-3244) Make PIG_HOME configurable


 [ 
https://issues.apache.org/jira/browse/PIG-3244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-3244:


  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Looks good. Committed to trunk.

 Make PIG_HOME configurable
 --

 Key: PIG-3244
 URL: https://issues.apache.org/jira/browse/PIG-3244
 Project: Pig
  Issue Type: Improvement
Reporter: Robert Schooley
Priority: Minor
 Attachments: make-pig-home-configurable.patch


 It looks like the pig shell script in v0.11 exports PIG_HOME without first 
 checking to see if it already exists.
 from line 78 in path/bin/pig:
 \# the root of the Pig installation
 export PIG_HOME=`dirname $this`/..
 The supplied patch checks to see if the env has already been set prior to 
 setting.
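
 The guard described above might be sketched as follows. This is only an
 illustration of the idea, not the contents of
 make-pig-home-configurable.patch: resolve_pig_home is a hypothetical helper
 name, and its argument stands in for the script's $this variable.

```shell
#!/bin/sh
# Hypothetical helper (not from the patch): keep PIG_HOME if the caller
# already set it, otherwise derive it from the script location, i.e. the
# parent of the bin/ directory.
resolve_pig_home() {
    script_path="$1"
    if [ -z "${PIG_HOME:-}" ]; then
        PIG_HOME="$(dirname "$script_path")/.."
    fi
    export PIG_HOME
}
```

 In bin/pig this would be called as resolve_pig_home "$this", so that a
 PIG_HOME exported by the user, or by a wrapper such as Oozie, wins over the
 derived default.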


[jira] [Updated] (PIG-3244) Make PIG_HOME configurable


 [ 
https://issues.apache.org/jira/browse/PIG-3244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-3244:


Assignee: Robert Schooley

 Make PIG_HOME configurable
 --

 Key: PIG-3244
 URL: https://issues.apache.org/jira/browse/PIG-3244
 Project: Pig
  Issue Type: Improvement
Reporter: Robert Schooley
Assignee: Robert Schooley
Priority: Minor
 Attachments: make-pig-home-configurable.patch



Re: LoadFunc and LoadMetadata

Yes, in theory the filter should be pushed above the foreach. I don't know
what happened; the easiest way is to do an explain and check the plan.

Daniel


[jira] [Commented] (PIG-3249) Pig startup script prints out a wrong version of hadoop when using fat jar


[ 
https://issues.apache.org/jira/browse/PIG-3249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13603946#comment-13603946
 ] 

Prashant Kommireddi commented on PIG-3249:
--

Thanks Daniel, that's a nice approach. 

{code}
# fall back to use fat pig.jar
if [ -f "$PIG_HOME/pig.jar" ]; then
    PIG_JAR=$PIG_HOME/pig.jar
else
    PIG_JAR=`echo $PIG_HOME/pig-?.!(*withouthadoop).jar`
fi

# quote the variable so an empty PIG_JAR really takes the else branch
if [ -n "$PIG_JAR" ]; then
    CLASSPATH=${CLASSPATH}:$PIG_JAR
else
    echo "Cannot locate pig.jar. do 'ant jar', and try again"
    exit 1
fi

if [ "$debug" == "true" ]; then
    echo "Cannot find local hadoop installation, using bundled `java -cp $PIG_JAR org.apache.hadoop.util.VersionInfo | head -1`"
fi
{code}

Please note I have placed the debug statement below the code that looks for pig 
jar. It makes sense that the debug statements execute only after pig jar is 
found. Do you agree? I will upload the patch shortly.
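
One possible refinement of the debug message, sketched here under stated assumptions rather than taken from the patch: wrap the VersionInfo lookup in a function that falls back to a neutral label when the JVM call fails or prints nothing. The function name bundled_hadoop_version and the JAVA_CMD override are hypothetical; only the org.apache.hadoop.util.VersionInfo class and the first-line extraction come from the discussion above.

```shell
#!/bin/sh
# Hypothetical helper: print the first line reported by Hadoop's VersionInfo
# class for the given jar, or a neutral label if the lookup fails or prints
# nothing. JAVA_CMD is an assumed override, not a variable bin/pig defines.
bundled_hadoop_version() {
    jar="$1"
    version="$("${JAVA_CMD:-java}" -cp "$jar" org.apache.hadoop.util.VersionInfo 2>/dev/null | head -n 1)"
    if [ -n "$version" ]; then
        echo "$version"
    else
        echo "hadoop (version unknown)"
    fi
}

# Mirrors the debug branch of the startup script; silent unless debug=true.
if [ "${debug:-false}" = "true" ]; then
    echo "Cannot find local hadoop installation, using bundled $(bundled_hadoop_version "${PIG_JAR:-}")"
fi
```

The fallback keeps the message readable even if java is missing from the PATH, at the cost of hiding the underlying error.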


[jira] [Commented] (PIG-3194) Changes to ObjectSerializer.java break compatibility with Hadoop 0.20.2

2013-03-15 Thread Dmitriy V. Ryaboy (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-3194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13603984#comment-13603984
 ] 

Dmitriy V. Ryaboy commented on PIG-3194:


+1


[jira] [Assigned] (PIG-3249) Pig startup script prints out a wrong version of hadoop when using fat jar


 [ 
https://issues.apache.org/jira/browse/PIG-3249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Kommireddi reassigned PIG-3249:


Assignee: Prashant Kommireddi

 Pig startup script prints out a wrong version of hadoop when using fat jar
 --

 Key: PIG-3249
 URL: https://issues.apache.org/jira/browse/PIG-3249
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.11
Reporter: Prashant Kommireddi
Assignee: Prashant Kommireddi
  Labels: newbie
 Fix For: 0.12

 Attachments: PIG-3249.patch



[jira] [Updated] (PIG-3249) Pig startup script prints out a wrong version of hadoop when using fat jar


 [ 
https://issues.apache.org/jira/browse/PIG-3249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Kommireddi updated PIG-3249:
-

Attachment: PIG-3249.patch

 Pig startup script prints out a wrong version of hadoop when using fat jar
 --

 Key: PIG-3249
 URL: https://issues.apache.org/jira/browse/PIG-3249
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.11
Reporter: Prashant Kommireddi
  Labels: newbie
 Fix For: 0.12

 Attachments: PIG-3249.patch



Are we ready for 0.11.1 release?

2013-03-15 Thread Dmitriy Ryaboy

I think all the critical patches we discussed as required for 0.11.1 have
gone in -- is there anything else people want to finish up, or can we roll
this?  Current change log:

Release 0.11.1 (unreleased)

INCOMPATIBLE CHANGES

IMPROVEMENTS

PIG-2988: start deploying pigunit maven artifact part of Pig release
process (njw45 via rohini)

PIG-3148: OutOfMemory exception while spilling stale DefaultDataBag. Extra
option to gc() before spilling large bag. (knoguchi via rohini)

PIG-3216: Groovy UDFs documentation has minor typos (herberts via rohini)

PIG-3202: CUBE operator not documented in user docs (prasanth_j via
billgraham)

OPTIMIZATIONS

BUG FIXES

PIG-3194: Changes to ObjectSerializer.java break compatibility with Hadoop
0.20.2 (prkommireddi via dvryaboy)

PIG-3241: ConcurrentModificationException in POPartialAgg (dvryaboy)

PIG-3144: Erroneous map entry alias resolution leading to Duplicate schema
alias errors (jcoveney via cheolsoo)

PIG-3212: Race Conditions in POSort and (Internal)SortedBag during
Proactive Spill (kadeng via dvryaboy)

PIG-3206: HBaseStorage does not work with Oozie pig action and secure HBase
(rohini)

Re: pig 0.11 candidate 2 feedback: Several problems

2013-03-15 Thread Prashant Kommireddi

Looks like all outstanding 0.11.1 critical bugs are fixed. Time for an
RC? Please let me know if I can help.

On Fri, Mar 8, 2013 at 3:51 PM, Dmitriy Ryaboy dvrya...@gmail.com wrote:

 Looks like Lohit found a critical bug we should fix for 11.1:
 https://issues.apache.org/jira/browse/PIG-3241 (only observed in hadoop
 2.0)

 D


 On Wed, Mar 6, 2013 at 12:57 PM, Prashant Kommireddi prash1...@gmail.com wrote:

 Dmitriy, are the gc fixes all in for 0.11.1? PIG-3148 and PIG-3212 are the
 2 JIRAs I know were fixed, any others?

 I have a patch up for 3194, I think we should be good for a release once
 that makes it in.

 -Prashant
 
 On Sat, Mar 2, 2013 at 11:16 AM, Prashant Kommireddi prash1...@gmail.com wrote:

 Great.

 I have commented regarding a possible approach for PIG-3194
 http://goo.gl/UQ3zs. Please take a look when you folks have a chance.
  
 On Fri, Mar 1, 2013 at 7:00 PM, Dmitriy Ryaboy dvrya...@gmail.com wrote:

 I'd like to get the gc fix in as well, but looks like Rohini is about to
 commit it so we are good there.
  
 On Mar 1, 2013, at 11:33 AM, Bill Graham billgra...@gmail.com wrote:

 +1 to releasing Pig 0.11.1 when this is addressed. I should be able to help
 with the release again.
   
   
   
 On Fri, Mar 1, 2013 at 11:25 AM, Prashant Kommireddi prash1...@gmail.com wrote:

 Hey Guys,

 I wanted to start a conversation on this again. If Kai is not looking at
 PIG-3194 I can start working on it to get 0.11 compatible with 20.2. If
 everyone agrees, we should roll out 0.11.1 sooner than usual and I
 volunteer to help with it in any way possible.

 Any objections to getting 0.11.1 out soon after 3194 is fixed?

 -Prashant
   
 On Wed, Feb 20, 2013 at 3:34 PM, Russell Jurney russell.jur...@gmail.com wrote:

 I stand corrected. Cool, 0.11 is good!
   
   
 On Wed, Feb 20, 2013 at 1:15 PM, Jarek Jarcec Cecho jar...@apache.org wrote:

 Just an unrelated note: CDH3 is closer to Hadoop 1.x than to 0.20.

 Jarcec
   
 On Wed, Feb 20, 2013 at 12:04:51PM -0800, Dmitriy Ryaboy wrote:
 I agree -- this is a good release. The bugs Kai pointed out should be
 fixed, but as they are not critical regressions, we can fix them in 0.11.1
 (if someone wants to roll 0.11.1 the minute these fixes are committed, I
 won't mind and will dutifully vote for the release).

 I think the Hadoop 20.2 incompatibility is unfortunate but iirc this is
 fixable by setting HADOOP_USER_CLASSPATH_FIRST=true (was that in 20.2?)

 FWIW Twitter's running CDH3 and this release works in our environment.

 At this point things that block a release are critical regressions in
 performance or correctness.

 D
   
   
 On Wed, Feb 20, 2013 at 11:52 AM, Alan Gates ga...@hortonworks.com wrote:

 No.  Bugs like these are supposed to be found and fixed after we branch
 from trunk (which happened several months ago in the case of 0.11). The
 point of RCs are to check that it's a good build, licenses are right, etc.
 Any bugs found this late in the game have to be seen as failures of
 earlier testing.

 Alan.
   
On Feb 20, 2013, at 11:33 AM, Russell Jurney wrote:
   
 Isn't the point of an RC to find and fix bugs like these?
   
   
 On Wed, Feb 20, 2013 at 11:31 AM, Bill Graham billgra...@gmail.com wrote:

 Regarding Pig 11 rc2, I propose we continue with the current vote as is
 (which closes today EOD). Patches for 0.20.2 issues can be rolled into a
 Pig 0.11.1 release whenever they're available and tested.
   
   
   
 On Wed, Feb 20, 2013 at 9:24 AM, Olga Natkovich onatkov...@yahoo.com wrote:

 I agree that supporting as much as we can is a good goal. The issue is who
 is going to be testing against all these versions? We found the issues
 under discussion because of a customer report, not because we consistently
 test against all versions. Perhaps when we decide which versions to support
 for next release we need also to agree who is going to be testing and
 maintaining compatibility with a particular version.

 For instance since Hadoop 23 compatibility is important for us at Yahoo we
 have been maintaining compatibility with this version for 0.9, 0.10 and
 will do the same for 0.11 and going forward. I think we would need others
 to step in and claim the versions of their interest.

 Olga
   
   

From: Kai Londenberg kai.londenb...@googlemail.com
To: dev@pig.apache.org
Sent: Wednesday, February 20, 2013 1:51 AM
Subject: Re: pig 0.11 candidate 2 feedback: Several problems
   
Hi,
   
 I strongly agree with Jonathan here. If

[jira] [Commented] (PIG-3194) Changes to ObjectSerializer.java break compatibility with Hadoop 0.20.2


[ 
https://issues.apache.org/jira/browse/PIG-3194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13604023#comment-13604023
 ] 

Prashant Kommireddi commented on PIG-3194:
--

Thanks for review/commit, Dmitriy.

 Changes to ObjectSerializer.java break compatibility with Hadoop 0.20.2
 ---

 Key: PIG-3194
 URL: https://issues.apache.org/jira/browse/PIG-3194
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.11
Reporter: Kai Londenberg
Assignee: Prashant Kommireddi
 Fix For: 0.12, 0.11.1

 Attachments: PIG-3194_2.patch, PIG-3194.patch


 The changes to ObjectSerializer.java in the following commit
 http://svn.apache.org/viewvc?view=revision&revision=1403934 break 
 compatibility with Hadoop 0.20.2 clusters.
 The reason is that the code uses methods from Apache Commons Codec 1.4 
 that are not available in Apache Commons Codec 1.3, which ships with 
 Hadoop 0.20.2.
 The offending methods are Base64.decodeBase64(String) and 
 Base64.encodeBase64URLSafeString(byte[]).
 If I revert these changes, Pig 0.11.0 candidate 2 works well with our Hadoop 
 0.20.2 clusters.
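
For readers unfamiliar with the URL-safe Base64 variant named above, here is a minimal Python analogue. It is only an illustration and not Pig's code. It assumes that encodeBase64URLSafeString uses the `-` and `_` characters in place of `+` and `/` and omits the trailing `=` padding; the helper names `encode_url_safe` and `decode_url_safe` are hypothetical.

```python
import base64


def encode_url_safe(data: bytes) -> str:
    """URL-safe Base64 with the trailing '=' padding stripped (assumed behaviour)."""
    return base64.urlsafe_b64encode(data).decode("ascii").rstrip("=")


def decode_url_safe(text: str) -> bytes:
    """Decode URL-safe Base64, restoring any stripped padding first."""
    padded = text + "=" * (-len(text) % 4)
    return base64.urlsafe_b64decode(padded)


sample = b"\xfb\xff"  # chosen so that the two alphabets produce different text
print(base64.b64encode(sample).decode("ascii"))  # standard alphabet, padded
print(encode_url_safe(sample))                   # URL-safe alphabet, unpadded
```

A string encoded one way is not guaranteed to decode with the other alphabet, which is one reason a serializer change of this kind needs matching encode and decode calls on both sides.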

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (PIG-3194) Changes to ObjectSerializer.java break compatibility with Hadoop 0.20.2


[ 
https://issues.apache.org/jira/browse/PIG-3194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13604037#comment-13604037
 ] 

Prashant Kommireddi commented on PIG-3194:
--

Kai, can you confirm 11.1 works for you?


Re: Contribute to PIG-3225

That's PIG-2597. I am not sure about its status; if it is not done, we can
continue it this year.

Daniel

On Fri, Mar 15, 2013 at 11:52 AM, Russell Jurney
russell.jur...@gmail.com wrote:
 Can we have a GSoC entry to antlrize grunt? Who can mentor it?

 Russell Jurney http://datasyndrome.com

 On Mar 15, 2013, at 11:34 AM, Daniel Dai da...@hortonworks.com wrote:

 GSoC 2013 wiki is not up yet. You can find some information from last
 year's wiki: https://cwiki.apache.org/confluence/display/PIG/GSoc2012.

 Thanks,
 Daniel

 On Mon, Mar 11, 2013 at 6:38 PM, Dmitriy Ryaboy dvrya...@gmail.com wrote:
 + Gianmarco


 On Mon, Mar 11, 2013 at 11:20 AM, Sadari Jayawardena 
 sjayawardena...@gmail.com wrote:

 I am a final year undergraduate in Computer Science & Engineering. I have
 good experience in Java programming and am interested in mathematics and
 statistics. I would like to contribute to this project through GSoC 2013
 (https://issues.apache.org/jira/browse/PIG-3225).

 I went through the Wikipedia link provided. Could I be provided with
 additional references and study materials?


 Thanks in advance
 --
 Sadari Jayawardena

 Undergraduate
 Department of Computer Science & Engineering
 University of Moratuwa

Re: Are we ready for 0.11.1 release?

Can I put PIG-3132 in?

Thanks,
Daniel

On Fri, Mar 15, 2013 at 5:55 PM, Julien Le Dem jul...@twitter.com wrote:
 +1 for a new release

 On Friday, March 15, 2013, Dmitriy Ryaboy wrote:

 I think all the critical patches we discussed as required for 0.11.1 have
 gone in -- is there anything else people want to finish up, or can we roll
 this?  Current change log:

 Release 0.11.1 (unreleased)

 INCOMPATIBLE CHANGES

 IMPROVEMENTS

 PIG-2988: start deploying pigunit maven artifact part of Pig release
 process (njw45 via rohini)

 PIG-3148: OutOfMemory exception while spilling stale DefaultDataBag. Extra
 option to gc() before spilling large bag. (knoguchi via rohini)

 PIG-3216: Groovy UDFs documentation has minor typos (herberts via rohini)

 PIG-3202: CUBE operator not documented in user docs (prasanth_j via
 billgraham)

 OPTIMIZATIONS

 BUG FIXES

 PIG-3194: Changes to ObjectSerializer.java break compatibility with Hadoop
 0.20.2 (prkommireddi via dvryaboy)

 PIG-3241: ConcurrentModificationException in POPartialAgg (dvryaboy)

 PIG-3144: Erroneous map entry alias resolution leading to Duplicate schema
 alias errors (jcoveney via cheolsoo)

 PIG-3212: Race Conditions in POSort and (Internal)SortedBag during
 Proactive Spill (kadeng via dvryaboy)

 PIG-3206: HBaseStorage does not work with Oozie pig action and secure HBase
 (rohini)

[jira] [Commented] (PIG-3181) MultiStorage - java.lang.OutOfMemoryError: Java heap space

2013-03-15 Thread Johnny Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-3181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13604062#comment-13604062
 ] 

Johnny Zhang commented on PIG-3181:
---

Hi, do you mean the script below?
{noformat}
a = load '/input' as (f1, f2); 
a = group a by f1; 
logs = foreach a
{ generate group, a.f2; }
store logs into '/output/' using 
org.apache.pig.piggybank.storage.MultiStorage('/output/', '0');
{noformat}

Can you share how large your /input file is, so that I can try to reproduce it?
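
As a rough Python analogue of what the script above computes (not Pig code; the function name `group_and_project` and the sample rows are made up for illustration): each distinct f1 produces a single output record whose second field is a bag holding every f2 for that key. If one key is heavily skewed, that one record could become very large, which would be consistent with the ByteArrayOutputStream growth in the reported stack trace, although the cause is not confirmed in this thread.

```python
from collections import defaultdict


def group_and_project(rows):
    """Mimic `group a by f1` followed by `generate group, a.f2`.

    rows: iterable of (f1, f2) pairs.
    Returns a dict mapping each f1 to the list ("bag") of its f2 values,
    i.e. one output record per distinct key.
    """
    bags = defaultdict(list)
    for f1, f2 in rows:
        bags[f1].append(f2)
    return dict(bags)


# Illustrative input only: key "x" is skewed, so its record holds most values.
rows = [("x", i) for i in range(5)] + [("y", 99)]
for key, bag in group_and_project(rows).items():
    print(key, len(bag))
```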

 MultiStorage - java.lang.OutOfMemoryError: Java heap space
 --

 Key: PIG-3181
 URL: https://issues.apache.org/jira/browse/PIG-3181
 Project: Pig
  Issue Type: Bug
  Components: piggybank
Affects Versions: 0.10.0
Reporter: Fabian Alenius

 Hi, I have a script that looks like this:
 a = load '/input' as (f1, f2);
 a = group a by f1;
 a = foreach logs {
     generate group, a.f2;
 }
 store logs into '/output/' using 
 org.apache.pig.piggybank.storage.MultiStorage('/output/', '0');
 But for some reason it fails with:
 FATAL org.apache.hadoop.mapred.Child: Error running child : 
 java.lang.OutOfMemoryError: Java heap space
   at java.util.Arrays.copyOf(Arrays.java:2786)
   at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94)
   at java.io.OutputStream.write(OutputStream.java:58)
   at org.apache.pig.impl.util.StorageUtil.putField(StorageUtil.java:145)
   at org.apache.pig.impl.util.StorageUtil.putField(StorageUtil.java:176)
   at org.apache.pig.impl.util.StorageUtil.putField(StorageUtil.java:194)
   at 
 org.apache.pig.piggybank.storage.MultiStorage$MultiStorageOutputFormat$1.write(MultiStorage.java:208)
   at 
 org.apache.pig.piggybank.storage.MultiStorage$MultiStorageOutputFormat$1.write(MultiStorage.java:187)
   at 
 org.apache.pig.piggybank.storage.MultiStorage.putNext(MultiStorage.java:138)
   at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:139)
   at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98)
   at 
 org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:537)
   at 
 org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:88)
   at 
 org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.write(WrappedReducer.java:99)
   at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:463)
   at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.processOnePackageOutput(PigGenericMapReduce.java:428)
   at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:408)
   at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:262)
   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:164)
   at 
 org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:595)
   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:433)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
   at org.apache.hadoop.mapred.Child.main(Child.java:262)
 *stderr logs*
 java.lang.RuntimeException: InternalCachedBag.spill() should not be called
   at 
 org.apache.pig.data.InternalCachedBag.spill(InternalCachedBag.java:159)
   at 
 org.apache.pig.impl.util.SpillableMemoryManager.handleNotification(SpillableMemoryManager.java:243)
   at 
 sun.management.NotificationEmitterSupport.sendNotification(NotificationEmitterSupport.java:138)
   at sun.management.MemoryImpl.createNotification(MemoryImpl.java:171)
   at

Re: Are we ready for 0.11.1 release?

2013-03-15 Thread Julien Le Dem

+1 for a new release

Julien

On Mar 15, 2013, at 17:08, Dmitriy Ryaboy dvrya...@gmail.com wrote:

 I think all the critical patches we discussed as required for 0.11.1 have
 gone in -- is there anything else people want to finish up, or can we roll
 this?  Current change log:
 
 Release 0.11.1 (unreleased)
 
 INCOMPATIBLE CHANGES
 
 IMPROVEMENTS
 
 PIG-2988: start deploying pigunit maven artifact part of Pig release
 process (njw45 via rohini)
 
 PIG-3148: OutOfMemory exception while spilling stale DefaultDataBag. Extra
 option to gc() before spilling large bag. (knoguchi via rohini)
 
 PIG-3216: Groovy UDFs documentation has minor typos (herberts via rohini)
 
 PIG-3202: CUBE operator not documented in user docs (prasanth_j via
 billgraham)
 
 OPTIMIZATIONS
 
 BUG FIXES
 
 PIG-3194: Changes to ObjectSerializer.java break compatibility with Hadoop
 0.20.2 (prkommireddi via dvryaboy)
 
 PIG-3241: ConcurrentModificationException in POPartialAgg (dvryaboy)
 
 PIG-3144: Erroneous map entry alias resolution leading to Duplicate schema
 alias errors (jcoveney via cheolsoo)
 
 PIG-3212: Race Conditions in POSort and (Internal)SortedBag during
 Proactive Spill (kadeng via dvryaboy)
 
 PIG-3206: HBaseStorage does not work with Oozie pig action and secure HBase
 (rohini)

Re: Welcome our new PMC chair, Julien Le Dem

2013-03-15 Thread Julien Le Dem

Thank you all !

Julien

On Mar 10, 2013, at 21:31, Xuefu Zhang xzh...@inadco.com wrote:

 Congrats!!!
 
 --Xuefu
 
 On Sun, Mar 10, 2013 at 9:00 PM, Jarek Jarcec Cecho jar...@apache.org wrote:
 
 Congratulations sir!
 
 Jarcec
 
 On Sun, Mar 10, 2013 at 08:55:55PM -0700, Aniket Mokashi wrote:
 Congrats Julien!
 
 
 On Sun, Mar 10, 2013 at 8:54 PM, Russell Jurney 
 russell.jur...@gmail.com wrote:
 
 Congrats!
 
 Russell Jurney http://datasyndrome.com
 
 On Mar 10, 2013, at 8:53 PM, Daniel Dai da...@hortonworks.com wrote:
 
 It is a bit late, Apache board approved the nomination of Julien Le
 Dem as our Pig PMC Chair last month. Welcome Julien!
 
 Thanks,
 Daniel
 
 
 
 --
 ...:::Aniket:::... Quetzalco@tl

[jira] Subscription: PIG patch available

2013-03-15 Thread jira

Issue Subscription
Filter: PIG patch available (33 issues)

Subscriber: pigdaily

Key Summary
PIG-3247  Piggybank functions to mimic OVER clause in SQL
https://issues.apache.org/jira/browse/PIG-3247
PIG-3238  Pig current releases lack a UDF Stuff(). This UDF deletes a specified length of characters and inserts another set of characters at a specified starting point.
https://issues.apache.org/jira/browse/PIG-3238
PIG-3237  Pig current releases lack a UDF MakeSet(). This UDF returns a set value (a string containing substrings separated by , characters) consisting of the strings that have the corresponding bit in the first argument
https://issues.apache.org/jira/browse/PIG-3237
PIG-3235  Enable DEBUG log messages in unit tests by default
https://issues.apache.org/jira/browse/PIG-3235
PIG-3215  [piggybank] Add LTSVLoader to load LTSV (Labeled Tab-separated Values) files
https://issues.apache.org/jira/browse/PIG-3215
PIG-3210  Pig fails to start when it cannot write log to log files
https://issues.apache.org/jira/browse/PIG-3210
PIG-3208  [zebra] TFile should not set io.compression.codec.lzo.buffersize
https://issues.apache.org/jira/browse/PIG-3208
PIG-3205  Passing arguments to python script does not work with -f option
https://issues.apache.org/jira/browse/PIG-3205
PIG-3198  Let users use any function from PigType - PigType as if it were builtlin
https://issues.apache.org/jira/browse/PIG-3198
PIG-3190  Add LuceneTokenizer and SnowballTokenizer to Pig - useful text tokenization
https://issues.apache.org/jira/browse/PIG-3190
PIG-3183  rm or rmf commands should respect globbing/regex of path
https://issues.apache.org/jira/browse/PIG-3183
PIG-3172  Partition filter push down does not happen when there is a non partition key map column filter
https://issues.apache.org/jira/browse/PIG-3172
PIG-3166  Update eclipse .classpath according to ivy library.properties
https://issues.apache.org/jira/browse/PIG-3166
PIG-3164  Pig current releases lack a UDF endsWith. This UDF tests if a given string ends with the specified suffix.
https://issues.apache.org/jira/browse/PIG-3164
PIG-3141  Giving CSVExcelStorage an option to handle header rows
https://issues.apache.org/jira/browse/PIG-3141
PIG-3123  Simplify Logical Plans By Removing Unneccessary Identity Projections
https://issues.apache.org/jira/browse/PIG-3123
PIG-3122  Operators should not implicitly become reserved keywords
https://issues.apache.org/jira/browse/PIG-3122
PIG-3114  Duplicated macro name error when using pigunit
https://issues.apache.org/jira/browse/PIG-3114
PIG-3105  Fix TestJobSubmission unit test failure.
https://issues.apache.org/jira/browse/PIG-3105
PIG-3088  Add a builtin udf which removes prefixes
https://issues.apache.org/jira/browse/PIG-3088
PIG-3069  Native Windows Compatibility for Pig E2E Tests and Harness
https://issues.apache.org/jira/browse/PIG-3069
PIG-3028  testGrunt dev test needs some command filters to run correctly without cygwin
https://issues.apache.org/jira/browse/PIG-3028
PIG-3027  pigTest unit test needs a newline filter for comparisons of golden multi-line
https://issues.apache.org/jira/browse/PIG-3027
PIG-3026  Pig checked-in baseline comparisons need a pre-filter to address OS-specific newline differences
https://issues.apache.org/jira/browse/PIG-3026
PIG-3024  TestEmptyInputDir unit test - hadoop version detection logic is brittle
https://issues.apache.org/jira/browse/PIG-3024
PIG-3015  Rewrite of AvroStorage
https://issues.apache.org/jira/browse/PIG-3015
PIG-3010  Allow UDF's to flatten themselves
https://issues.apache.org/jira/browse/PIG-3010
PIG-2959  Add a pig.cmd for Pig to run under Windows
https://issues.apache.org/jira/browse/PIG-2959
PIG-2955  Fix bunch of Pig e2e tests on Windows
https://issues.apache.org/jira/browse/PIG-2955
PIG-2643  Use bytecode generation to make a performance replacement for InvokeForLong, InvokeForString, etc
https://issues.apache.org/jira/browse/PIG-2643
PIG-2641  Create toJSON function for all complex types: tuples, bags and maps
https://issues.apache.org/jira/browse/PIG-2641
PIG-2591  Unit tests should not write to /tmp but respect java.io.tmpdir
https://issues.apache.org/jira/browse/PIG-2591
PIG-1914  Support load/store JSON data in Pig
https://issues.apache.org/jira/browse/PIG-1914

You may edit this subscription at:
https://issues.apache.org/jira/secure/FilterSubscription!default.jspa?subId=13225&filterId=12322384

[jira] [Commented] (PIG-3249) Pig startup script prints out a wrong version of hadoop when using fat jar


[ 
https://issues.apache.org/jira/browse/PIG-3249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13604064#comment-13604064
 ] 

Daniel Dai commented on PIG-3249:
-

Sure, can't agree more.

 Pig startup script prints out a wrong version of hadoop when using fat jar
 --

 Key: PIG-3249
 URL: https://issues.apache.org/jira/browse/PIG-3249
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.11
Reporter: Prashant Kommireddi
Assignee: Prashant Kommireddi
  Labels: newbie
 Fix For: 0.12

 Attachments: PIG-3249.patch


 Script suggests 0.20.2 is used with the bundled jar but we are using 1.0 at 
 the moment.
 {code}
 # fall back to use fat pig.jar
 if [ $debug == true ]; then
 echo Cannot find local hadoop installation, using bundled hadoop 
 20.2
 fi
 {code}
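
A minimal sketch of how the reported version could be derived rather than hard-coded, following the earlier suggestion on this issue to run `java -cp pig.jar org.apache.hadoop.util.VersionInfo`. It assumes the command's output contains a line of the form `Hadoop <version>`; the function name `parse_hadoop_version` and the sample text are illustrative and were not captured from a real run.

```python
import re


def parse_hadoop_version(version_info_output):
    """Return the version from the first line shaped like 'Hadoop <version>'.

    Returns None when no such line is present, so a caller can fall back
    to a generic message instead of printing a wrong version.
    """
    for line in version_info_output.splitlines():
        match = re.match(r"^Hadoop\s+(\S+)", line.strip())
        if match:
            return match.group(1)
    return None


# Illustrative sample only; the real output may carry additional lines.
sample_output = "Hadoop 1.0.4\n(further build details elided)\n"
print(parse_hadoop_version(sample_output))
```

The startup script could then print whatever version the bundled jar actually reports, which avoids the message drifting out of date when the bundled Hadoop changes.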


[jira] [Commented] (PIG-3243) Documentation error


[ 
https://issues.apache.org/jira/browse/PIG-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13604066#comment-13604066
 ] 

Daniel Dai commented on PIG-3243:
-

Would you like to create a patch? Changing 
src/docs/src/documentation/content/xdocs/udf.xml will do it.

 Documentation error
 ---

 Key: PIG-3243
 URL: https://issues.apache.org/jira/browse/PIG-3243
 Project: Pig
  Issue Type: Bug
Reporter: Tolga Konik
Priority: Trivial
   Original Estimate: 1h
  Remaining Estimate: 1h

 Error in documentation on web related to python udf usage:
 The document mentions JYTHON_PATH but it should be JYTHONPATH. Seasoned 
 jython users will easily figure this out, but for cpython users who are new to 
 jython, this error can easily be a show stopper.
 I observed this in 11.0 but it may be occurring in earlier versions.
 REFERENCE:
 http://pig.apache.org/docs/r0.11.0/udf.html#python-advanced
 Advanced Topics
 Importing Modules
 You can import Python modules in your Python script. Pig resolves Python 
 dependencies recursively, which means Pig will automatically ship all 
 dependent Python modules to the backend. Python modules should be found in 
 the jython search path: JYTHON_HOME, JYTHON_PATH, or current directory.
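
A small sketch of a sanity check for this mix-up, assuming (as the reporter states) that JYTHONPATH is the variable that is actually honoured. The helper names `jython_path_entries` and `misnamed_variables` are hypothetical and not part of Pig or Jython.

```python
import os


def jython_path_entries(env):
    """Split JYTHONPATH into its directories, skipping empty entries."""
    raw = env.get("JYTHONPATH", "")
    return [entry for entry in raw.split(os.pathsep) if entry]


def misnamed_variables(env):
    """Report JYTHON_PATH when it is set but JYTHONPATH is not.

    That combination suggests the user followed the documentation typo.
    """
    if "JYTHON_PATH" in env and "JYTHONPATH" not in env:
        return ["JYTHON_PATH"]
    return []


if __name__ == "__main__":
    print(jython_path_entries(os.environ))
    print(misnamed_variables(os.environ))
```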


[jira] [Commented] (PIG-3238) Pig current releases lack a UDF Stuff(). This UDF deletes a specified length of characters and inserts another set of characters at a specified starting point.


[ 
https://issues.apache.org/jira/browse/PIG-3238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13604071#comment-13604071
 ] 

Daniel Dai commented on PIG-3238:
-

Thanks for the patch. However, we need an example in the javadoc and a test 
case before we can commit it.

 Pig current releases lack a UDF Stuff(). This UDF deletes a specified length 
 of characters and inserts another set of characters at a specified starting 
 point.
 ---

 Key: PIG-3238
 URL: https://issues.apache.org/jira/browse/PIG-3238
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.10.0
Reporter: Sonu Prathap
 Fix For: 0.10.0

 Attachments: Stuff.java.patch


 Pig current releases lack a UDF Stuff(). This UDF deletes a specified length 
 of characters and inserts another set of characters at a specified starting 
 point.
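
A minimal sketch of the behaviour described above, written in Python rather than as the Java UDF in the attached patch. The name `stuff`, its argument order, and the 0-based start index are assumptions for illustration; the issue text does not specify them, and the patch may differ.

```python
def stuff(text, start, length, replacement):
    """Delete `length` characters of `text` beginning at `start`, then
    insert `replacement` at that position.

    Assumes a 0-based `start`; raises ValueError on out-of-range input
    rather than guessing what the UDF would do in that case.
    """
    if start < 0 or length < 0 or start > len(text):
        raise ValueError("start/length out of range")
    return text[:start] + replacement + text[start + length:]


# Remove "bcd" from "abcdef" and put "XY" in its place.
print(stuff("abcdef", 1, 3, "XY"))
```

Examples of this kind are the sort of thing that could go into the javadoc and a unit test once the indexing convention is settled.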


[jira] [Commented] (PIG-3235) Enable DEBUG log messages in unit tests by default


[ 
https://issues.apache.org/jira/browse/PIG-3235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13604076#comment-13604076
 ] 

Daniel Dai commented on PIG-3235:
-

Makes sense, but I would like to set the default to warning. We don't need that 
much logging in most cases.

 Enable DEBUG log messages in unit tests by default
 --

 Key: PIG-3235
 URL: https://issues.apache.org/jira/browse/PIG-3235
 Project: Pig
  Issue Type: Improvement
  Components: tools
Reporter: Cheolsoo Park
Assignee: Cheolsoo Park
Priority: Minor
 Attachments: PIG-3235.patch


 Currently, debug level messages are not logged for unit tests. It is helpful 
 to enable them to debug unit tests.


[jira] [Updated] (PIG-3236) parametrize snapshot and staging repo id


 [ 
https://issues.apache.org/jira/browse/PIG-3236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-3236:


Attachment: PIG-3236-0.12.patch

Attach patch for trunk.

 parametrize snapshot and staging repo id 
 -

 Key: PIG-3236
 URL: https://issues.apache.org/jira/browse/PIG-3236
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.10.0, 0.11
Reporter: Giridharan Kesavan
Assignee: Giridharan Kesavan
 Attachments: PIG-3236-0.12.patch, PIG-3236.patch


 This would allow users to override the repo IDs in order to publish artifacts 
 to different repos with different repo IDs.


[jira] [Resolved] (PIG-3236) parametrize snapshot and staging repo id


 [ 
https://issues.apache.org/jira/browse/PIG-3236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai resolved PIG-3236.
-

   Resolution: Fixed
Fix Version/s: 0.12
 Hadoop Flags: Reviewed

Committed to trunk.

 parametrize snapshot and staging repo id 
 -

 Key: PIG-3236
 URL: https://issues.apache.org/jira/browse/PIG-3236
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.10.0, 0.11
Reporter: Giridharan Kesavan
Assignee: Giridharan Kesavan
 Fix For: 0.12

 Attachments: PIG-3236-0.12.patch, PIG-3236.patch


 This would allow users to override the repo IDs in order to publish artifacts 
 to different repos with different repo IDs.


[jira] [Updated] (PIG-3235) Add log4j.properties for unit tests


 [ 
https://issues.apache.org/jira/browse/PIG-3235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-3235:
---

Summary: Add log4j.properties for unit tests  (was: Enable DEBUG log 
messages in unit tests by default)

 Add log4j.properties for unit tests
 ---

 Key: PIG-3235
 URL: https://issues.apache.org/jira/browse/PIG-3235
 Project: Pig
  Issue Type: Improvement
  Components: tools
Reporter: Cheolsoo Park
Assignee: Cheolsoo Park
Priority: Minor
 Attachments: PIG-3235-2.patch, PIG-3235.patch


 Currently, debug level messages are not logged for unit tests. It is helpful 
 to enable them to debug unit tests.


[jira] [Updated] (PIG-3235) Enable DEBUG log messages in unit tests by default


 [ 
https://issues.apache.org/jira/browse/PIG-3235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-3235:
---

Attachment: PIG-3235-2.patch

Sure. I lowered the logging level to INFO, which is the current default value.

 Enable DEBUG log messages in unit tests by default
 --

 Key: PIG-3235
 URL: https://issues.apache.org/jira/browse/PIG-3235
 Project: Pig
  Issue Type: Improvement
  Components: tools
Reporter: Cheolsoo Park
Assignee: Cheolsoo Park
Priority: Minor
 Attachments: PIG-3235-2.patch, PIG-3235.patch


 Currently, debug level messages are not logged for unit tests. It is helpful 
 to enable them to debug unit tests.


[jira] [Commented] (PIG-3235) Add log4j.properties for unit tests


[ 
https://issues.apache.org/jira/browse/PIG-3235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13604127#comment-13604127
 ] 

Daniel Dai commented on PIG-3235:
-

+1

 Add log4j.properties for unit tests
 ---

 Key: PIG-3235
 URL: https://issues.apache.org/jira/browse/PIG-3235
 Project: Pig
  Issue Type: Improvement
  Components: tools
Reporter: Cheolsoo Park
Assignee: Cheolsoo Park
Priority: Minor
 Attachments: PIG-3235-2.patch, PIG-3235.patch


 Currently, debug level messages are not logged for unit tests. It is helpful 
 to enable them to debug unit tests.


[jira] [Updated] (PIG-3235) Add log4j.properties for unit tests