[jira] [Commented] (PIG-5370) Union onschema + columnprune dropping used fields

2018-11-30 Thread Koji Noguchi (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-5370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16705097#comment-16705097
 ] 

Koji Noguchi commented on PIG-5370:
---

bq. Sorry, somehow my last patch didn't include a unit-test. Uploaded 
pig-5370-v2.patch with a test.

[~daijy], sorry to bug you again on this.  Can you take a look at my v2 patch 
that added a unit test? 

> Union onschema + columnprune dropping used fields 
> --
>
> Key: PIG-5370
> URL: https://issues.apache.org/jira/browse/PIG-5370
> Project: Pig
>  Issue Type: Bug
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Major
> Attachments: pig-5370-v1.patch, pig-5370-v2.patch
>
>
> After PIG-5312, the query below started failing.
> {code}
> A = load 'input.txt' as (a1:int, a2:chararray, a3:int);
> B = FOREACH (GROUP A by (a1,a2)) {
>     A_FOREACH = FOREACH A GENERATE a2,a3;
>     GENERATE A, FLATTEN(A_FOREACH) as (a2,a3);
> }
> C = load 'input2.txt' as (A:bag{tuple:(a1: int,a2: chararray,a3:int)},a2: chararray,a3:int);
> D = UNION ONSCHEMA B, C;
> dump D;
> {code}
> {code:title=input1.txt}
> 1   a   3
> 2   b   4
> 2   c   5
> 1   a   6
> 2   b   7
> 1   c   8
> {code}
> {code:title=input2.txt}
> {(10,a0,30),(20,b0,40)} zzz 222
> {code}
> {noformat:title=Expected output}
> ({(10,a0,30),(20,b0,40)},zzz,222)
> ({(1,a,6),(1,a,3)},a,6)
> ({(1,a,6),(1,a,3)},a,3)
> ({(1,c,8)},c,8)
> ({(2,b,7),(2,b,4)},b,7)
> ({(2,b,7),(2,b,4)},b,4)
> ({(2,c,5)},c,5)
> {noformat}
> {noformat:title=Actual (incorrect) output}
> ({(10,a0,30),(20,b0,40)})    <-- ONLY 1 Field
> ({(1,a,6),(1,a,3)},a,6)
> ({(1,a,6),(1,a,3)},a,3)
> ({(1,c,8)},c,8)
> ({(2,b,7),(2,b,4)},b,7)
> ({(2,b,7),(2,b,4)},b,4)
> ({(2,c,5)},c,5)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PIG-5370) Union onschema + columnprune dropping used fields

2018-11-30 Thread Daniel Dai (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-5370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16705272#comment-16705272
 ] 

Daniel Dai commented on PIG-5370:
-

+1



[jira] [Updated] (PIG-5362) Parameter substitution of shell cmd results doesn't handle backslash

2018-11-30 Thread Will Lauer (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-5362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Lauer updated PIG-5362:

Attachment: pig5.patch

> Parameter substitution of shell cmd results doesn't handle backslash  
> -
>
> Key: PIG-5362
> URL: https://issues.apache.org/jira/browse/PIG-5362
> Project: Pig
>  Issue Type: Bug
>  Components: parser
>Reporter: Will Lauer
>Assignee: Will Lauer
>Priority: Minor
> Fix For: 0.18.0
>
> Attachments: pig.patch, pig2.patch, pig3.patch, pig4.patch, 
> pig5.patch, test-failure.txt
>
>
> It looks like there is a bug in how parameter substitution is handled in 
> PreprocessorContext.java that causes parameter values that contain 
> backslashes to be processed incorrectly, so that the backslashes are 
> lost. For example, if you had the following:
> {code:java}
> %DECLARE A `echo \$foo\\bar`
> B = LOAD $A 
> {code}
> You would expect the echo command to produce the output {{$foo\bar}} but the 
> actual value that gets substituted is {{\$foobar}}. This is happening because 
> the {{substitute}} method in PreprocessorContext.java uses a regular 
> expression replacement instead of a basic string substitution and $ and \ are 
> special characters. The code attempts to escape $, but does not escape 
> backslash.
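
To illustrate the mechanism described above, here is a minimal Java sketch. It is
not the code from PreprocessorContext.java or from any of the attached patches;
the class and method names (SubstituteSketch, substituteEscapingDollarOnly,
substituteQuoted) are hypothetical. It assumes the substitution boils down to
String.replaceAll, whose replacement string treats $ and \ specially, and it
contrasts escaping only $ with java.util.regex.Matcher.quoteReplacement, which
escapes both.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SubstituteSketch {

    // Hypothetical helper: escapes only '$' in the value before using it as a
    // regex replacement string. Backslashes in the value are left unescaped.
    public static String substituteEscapingDollarOnly(String line, String name, String value) {
        String escaped = value.replace("$", "\\$"); // "$" becomes "\$"
        return line.replaceAll(Pattern.quote("$" + name), escaped);
    }

    // Hypothetical helper: lets the JDK escape both '$' and '\' so the value
    // is inserted literally.
    public static String substituteQuoted(String line, String name, String value) {
        return line.replaceAll(Pattern.quote("$" + name), Matcher.quoteReplacement(value));
    }

    public static void main(String[] args) {
        String value = "$foo\\bar"; // the eight characters $foo\bar

        // The unescaped "\b" is read as an escaped literal 'b', so the backslash is lost.
        System.out.println(substituteEscapingDollarOnly("B = LOAD $A", "A", value));
        // prints: B = LOAD $foobar

        // With quoteReplacement the backslash survives.
        System.out.println(substituteQuoted("B = LOAD $A", "A", value));
        // prints: B = LOAD $foo\bar
    }
}
```

If no regular expression is needed at all, a plain String.replace(CharSequence,
CharSequence) call would also avoid the problem, since it performs a literal
substitution.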





[jira] [Resolved] (PIG-5370) Union onschema + columnprune dropping used fields

2018-11-30 Thread Koji Noguchi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-5370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi resolved PIG-5370.
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 0.18.0

Thanks for the review Daniel!!! 

 Committed to trunk.



Build failed in Jenkins: Pig-trunk-commit #2586

2018-11-30 Thread Apache Jenkins Server
See 


Changes:

[knoguchi] PIG-5370: Union onschema + columnprune dropping used fields 
(knoguchi)

--
[...truncated 195.76 KB...]
jar:
 [echo] Compiling against Spark 2
Trying to override old definition of task propertycopy
Trying to override old definition of task propertycopy

clean-deps:
Trying to override old definition of task propertycopy
Trying to override old definition of task propertycopy

ivy-download:
  [get] Getting: 
http://repo2.maven.org/maven2/org/apache/ivy/ivy/2.2.0/ivy-2.2.0.jar
  [get] To: 


ivy-init-dirs:
[mkdir] Created dir: 

[mkdir] Created dir: 

[mkdir] Created dir: 

[mkdir] Created dir: 


ivy-probe-antlib:

ivy-init-antlib:

ivy-init:
[ivy:configure] :: Ivy 2.2.0 - 20100923230623 :: http://ant.apache.org/ivy/ ::
[ivy:configure] :: loading settings :: file = 


ivy-resolve:
 [echo] *** Ivy resolve with Hadoop 2, Spark 2 and HBase 1 ***
[ivy:report] DEPRECATED: 'ivy.conf.file' is deprecated, use 'ivy.settings.file' 
instead
[ivy:report] :: loading settings :: file = 

[ivy:report] Processing 
/home/jenkins/.ivy2/cache/org.apache.pig-pig-compile.xml to 

[ivy:report] Processing 
/home/jenkins/.ivy2/cache/org.apache.pig-pig-compile.xml to 


ivy-compile:
[ivy:cachepath] :: resolving dependencies :: org.apache.pig#pig;0.18.0-SNAPSHOT
[ivy:cachepath] confs: [compile]
[ivy:cachepath] found com.sun.jersey#jersey-bundle;1.8 in maven2
[ivy:cachepath] found com.sun.jersey#jersey-server;1.8 in maven2
[ivy:cachepath] found com.sun.jersey.contribs#jersey-guice;1.8 in maven2
[ivy:cachepath] found commons-codec#commons-codec;1.4 in fs
[ivy:cachepath] found commons-configuration#commons-configuration;1.6 
in fs
[ivy:cachepath] found commons-collections#commons-collections;3.2.1 in 
fs
[ivy:cachepath] found javax.servlet#servlet-api;2.5 in fs
[ivy:cachepath] found javax.ws.rs#jsr311-api;1.1.1 in fs
[ivy:cachepath] found com.google.protobuf#protobuf-java;2.5.0 in fs
[ivy:cachepath] found javax.inject#javax.inject;1 in fs
[ivy:cachepath] found javax.xml.bind#jaxb-api;2.2.2 in fs
[ivy:cachepath] found com.sun.xml.bind#jaxb-impl;2.2.3-1 in fs
[ivy:cachepath] found com.google.inject#guice;3.0 in fs
[ivy:cachepath] found com.google.inject.extensions#guice-servlet;3.0 in 
fs
[ivy:cachepath] found aopalliance#aopalliance;1.0 in fs
[ivy:cachepath] found org.glassfish#javax.el;3.0.1-b08 in maven2
[ivy:cachepath] found org.apache.hadoop#hadoop-annotations;2.7.3 in 
maven2
[ivy:cachepath] found org.apache.hadoop#hadoop-auth;2.7.3 in maven2
[ivy:cachepath] found org.apache.hadoop#hadoop-common;2.7.3 in maven2
[ivy:cachepath] found org.apache.hadoop#hadoop-hdfs;2.7.3 in maven2
[ivy:cachepath] found 
org.apache.hadoop#hadoop-mapreduce-client-core;2.7.3 in maven2
[ivy:cachepath] found 
org.apache.hadoop#hadoop-mapreduce-client-jobclient;2.7.3 in maven2
[ivy:cachepath] found org.apache.hadoop#hadoop-yarn-server-tests;2.7.3 
in maven2
[ivy:cachepath] found 
org.apache.hadoop#hadoop-mapreduce-client-app;2.7.3 in maven2
[ivy:cachepath] found 
org.apache.hadoop#hadoop-mapreduce-client-shuffle;2.7.3 in maven2
[ivy:cachepath] found 
org.apache.hadoop#hadoop-mapreduce-client-common;2.7.3 in maven2
[ivy:cachepath] found org.apache.hadoop#hadoop-yarn-api;2.7.3 in maven2
[ivy:cachepath] found org.apache.hadoop#hadoop-yarn-common;2.7.3 in 
maven2
[ivy:cachepath] found org.apache.hadoop#hadoop-yarn-server;2.7.3 in 
maven2
[ivy:cachepath] found 
org.apache.hadoop#hadoop-yarn-server-web-proxy;2.7.3 in maven2
[ivy:cachepath] found org.apache.hadoop#hadoop-yarn-server-common;2.7.3 
in maven2
[ivy:cachepath] found 
org.apache.hadoop#hadoop-yarn-server-nodemanager;2.7.3 in maven2
[ivy:cachepath] found 
org.apache.hadoop#hadoop-yarn-server-resourcemanager;2.7.3 in maven2
[ivy:cachepath] found org.apache.hadoop#hadoop-yarn-client;2.7.3 in 
maven2
[ivy:cachepath] found 
org.apach

Build failed in Jenkins: Pig-trunk #2093

2018-11-30 Thread Apache Jenkins Server
See 

Changes:

[knoguchi] PIG-5370: Union onschema + columnprune dropping used fields 
(knoguchi)

--
Started by an SCM change
[EnvInject] - Loading node environment variables.
Building remotely on H20 (ubuntu xenial) in workspace 

Updating http://svn.apache.org/repos/asf/pig/trunk at revision 
'2018-11-30T22:05:08.744 +'
U test/org/apache/pig/test/TestNewPlanColumnPrune.java
U CHANGES.txt
U src/org/apache/pig/newplan/logical/relational/LOUnion.java
At revision 1847856

[Pig-trunk] $ /bin/bash /tmp/jenkins3673392360174627551.sh
[Pig-trunk] $ /home/jenkins/tools/ant/apache-ant-1.8.4/bin/ant 
-Djavac.version=1.7 -Dtest.junit.output.format=xml -Dhadoopversion=23 clean 
test-commit
Buildfile: 
 [echo] Property setting hadoopversion=23 is deprecated. Overwriting to 
hadoopversion=2
Trying to override old definition of task propertycopy

clean:
   [delete] Deleting directory 


clean:

clean:

setWindowsPath:

setLinuxPath:

ivy-download:
  [get] Getting: 
http://repo2.maven.org/maven2/org/apache/ivy/ivy/2.2.0/ivy-2.2.0.jar
  [get] To: 
  [get] Not modified - so not downloaded

ivy-init-dirs:
[mkdir] Created dir: 
[mkdir] Created dir: 

[mkdir] Created dir: 

[mkdir] Created dir: 


ivy-probe-antlib:

ivy-init-antlib:

ivy-init:
[ivy:configure] :: Ivy 2.2.0 - 20100923230623 :: http://ant.apache.org/ivy/ ::
[ivy:configure] :: loading settings :: file = 


ivy-resolve:
 [echo] *** Ivy resolve with Hadoop 2, Spark 1 and HBase 1 ***
[ivy:report] DEPRECATED: 'ivy.conf.file' is deprecated, use 'ivy.settings.file' 
instead
[ivy:report] :: loading settings :: file = 

[ivy:report] Processing 
/home/jenkins/.ivy2/cache/org.apache.pig-pig-compile.xml to 

[ivy:report] Processing 
/home/jenkins/.ivy2/cache/org.apache.pig-pig-compile.xml to 


ivy-compile:
[ivy:cachepath] :: resolving dependencies :: org.apache.pig#pig;0.18.0-SNAPSHOT
[ivy:cachepath] confs: [compile]
[ivy:cachepath] found com.sun.jersey#jersey-bundle;1.8 in maven2
[ivy:cachepath] found com.sun.jersey#jersey-server;1.8 in maven2
[ivy:cachepath] found com.sun.jersey.contribs#jersey-guice;1.8 in maven2
[ivy:cachepath] found commons-codec#commons-codec;1.4 in fs
[ivy:cachepath] found commons-configuration#commons-configuration;1.6 
in fs
[ivy:cachepath] found commons-collections#commons-collections;3.2.1 in 
fs
[ivy:cachepath] found javax.servlet#servlet-api;2.5 in fs
[ivy:cachepath] found javax.ws.rs#jsr311-api;1.1.1 in fs
[ivy:cachepath] found com.google.protobuf#protobuf-java;2.5.0 in fs
[ivy:cachepath] found javax.inject#javax.inject;1 in fs
[ivy:cachepath] found javax.xml.bind#jaxb-api;2.2.2 in fs
[ivy:cachepath] found com.sun.xml.bind#jaxb-impl;2.2.3-1 in fs
[ivy:cachepath] found com.google.inject#guice;3.0 in fs
[ivy:cachepath] found com.google.inject.extensions#guice-servlet;3.0 in 
fs
[ivy:cachepath] found aopalliance#aopalliance;1.0 in fs
[ivy:cachepath] found org.glassfish#javax.el;3.0.1-b08 in maven2
[ivy:cachepath] found org.apache.hadoop#hadoop-annotations;2.7.3 in fs
[ivy:cachepath] found org.apache.hadoop#hadoop-auth;2.7.3 in fs
[ivy:cachepath] found org.apache.hadoop#hadoop-common;2.7.3 in fs
[ivy:cachepath] found org.apache.hadoop#hadoop-hdfs;2.7.3 in maven2
[ivy:cachepath] found 
org.apache.hadoop#hadoop-mapreduce-client-core;2.7.3 in maven2
[ivy:cachepath] found 
org.apache.hadoop#hadoop-mapreduce-client-jobclient;2.7.3 in maven2
[ivy:cachepath] found org.apache.hadoop#hadoop-yarn-server-tests;2.7.3 
in maven2
[ivy:cachepath] found 
org.apache.hadoop#hadoop-mapreduce-client-app;2.7.3 in maven2
[ivy:cachepath] found 
org.apache.hadoop#hadoop-mapreduce-client-shuffle;2.7.3 in maven2
[ivy:cachepath] found 
org.apache.hadoop#hadoop-mapreduce-client-common;2.7.3 in maven2
[ivy:cachepath] found org.apache.hadoop#hadoop-yarn-api;2.7.3 in maven2
[ivy:cachepath] found org.apache.hadoop#hadoop-yarn-

[jira] Subscription: PIG patch available

2018-11-30 Thread jira
Issue Subscription
Filter: PIG patch available (36 issues)

Subscriber: pigdaily

Key         Summary
PIG-5369    Add llap-client dependency
            https://issues.apache.org/jira/browse/PIG-5369
PIG-5360    Pig sets working directory of input file systems causes exception thrown
            https://issues.apache.org/jira/browse/PIG-5360
PIG-5338    Prevent deep copy of DataBag into Jython List
            https://issues.apache.org/jira/browse/PIG-5338
PIG-5323    Implement LastInputStreamingOptimizer in Tez
            https://issues.apache.org/jira/browse/PIG-5323
PIG-5273    _SUCCESS file should be created at the end of the job
            https://issues.apache.org/jira/browse/PIG-5273
PIG-5267    Review of org.apache.pig.impl.io.BufferedPositionedInputStream
            https://issues.apache.org/jira/browse/PIG-5267
PIG-5256    Bytecode generation for POFilter and POForeach
            https://issues.apache.org/jira/browse/PIG-5256
PIG-5160    SchemaTupleFrontend.java is not thread safe, cause PigServer thrown NPE in multithread env
            https://issues.apache.org/jira/browse/PIG-5160
PIG-5115    Builtin AvroStorage generates incorrect avro schema when the same pig field name appears in the alias
            https://issues.apache.org/jira/browse/PIG-5115
PIG-5106    Optimize when mapreduce.input.fileinputformat.input.dir.recursive set to true
            https://issues.apache.org/jira/browse/PIG-5106
PIG-5081    Can not run pig on spark source code distribution
            https://issues.apache.org/jira/browse/PIG-5081
PIG-5080    Support store alias as spark table
            https://issues.apache.org/jira/browse/PIG-5080
PIG-5057    IndexOutOfBoundsException when pig reducer processOnePackageOutput
            https://issues.apache.org/jira/browse/PIG-5057
PIG-5029    Optimize sort case when data is skewed
            https://issues.apache.org/jira/browse/PIG-5029
PIG-4926    Modify the content of start.xml for spark mode
            https://issues.apache.org/jira/browse/PIG-4926
PIG-4913    Reduce jython function initiation during compilation
            https://issues.apache.org/jira/browse/PIG-4913
PIG-4849    pig on tez will cause tez-ui to crash,because the content from timeline server is too long.
            https://issues.apache.org/jira/browse/PIG-4849
PIG-4750    REPLACE_MULTI should compile Pattern once and reuse it
            https://issues.apache.org/jira/browse/PIG-4750
PIG-4684    Exception should be changed to warning when job diagnostics cannot be fetched
            https://issues.apache.org/jira/browse/PIG-4684
PIG-4656    Improve String serialization and comparator performance in BinInterSedes
            https://issues.apache.org/jira/browse/PIG-4656
PIG-4598    Allow user defined plan optimizer rules
            https://issues.apache.org/jira/browse/PIG-4598
PIG-4551    Partition filter is not pushed down in case of SPLIT
            https://issues.apache.org/jira/browse/PIG-4551
PIG-4539    New PigUnit
            https://issues.apache.org/jira/browse/PIG-4539
PIG-4515    org.apache.pig.builtin.Distinct throws ClassCastException
            https://issues.apache.org/jira/browse/PIG-4515
PIG-4373    Implement PIG-3861 in Tez
            https://issues.apache.org/jira/browse/PIG-4373
PIG-4323    PackageConverter hanging in Spark
            https://issues.apache.org/jira/browse/PIG-4323
PIG-4313    StackOverflowError in LIMIT operation on Spark
            https://issues.apache.org/jira/browse/PIG-4313
PIG-4251    Pig on Storm
            https://issues.apache.org/jira/browse/PIG-4251
PIG-4002    Disable combiner when map-side aggregation is used
            https://issues.apache.org/jira/browse/PIG-4002
PIG-3952    PigStorage accepts '-tagSplit' to return full split information
            https://issues.apache.org/jira/browse/PIG-3952
PIG-3911    Define unique fields with @OutputSchema
            https://issues.apache.org/jira/browse/PIG-3911
PIG-3877    Getting Geo Latitude/Longitude from Address Lines
            https://issues.apache.org/jira/browse/PIG-3877
PIG-3873    Geo distance calculation using Haversine
            https://issues.apache.org/jira/browse/PIG-3873
PIG-3668    COR built-in function when atleast one of the coefficient values is NaN
            https://issues.apache.org/jira/browse/PIG-3668
PIG-3587    add functionality for rolling over dates
            https://issues.apache.org/jira/browse/PIG-3587
PIG-1804    Alow Jython function to implement Algebraic and/or Accumulator interfaces
            https://issues.apache.org/jira/browse/PIG-1804

You may edit this subscription at:
https://issues.apache.org/jira/secure/EditSubscription!default.jspa?subId=16328&filterId=12322384