[jira] Subscription: PIG patch available

2018-11-21 Thread jira
Issue Subscription
Filter: PIG patch available (36 issues)

Subscriber: pigdaily

Key       Summary
PIG-5369  Add llap-client dependency
https://issues.apache.org/jira/browse/PIG-5369
PIG-5360  Pig sets working directory of input file systems causes exception thrown
https://issues.apache.org/jira/browse/PIG-5360
PIG-5338  Prevent deep copy of DataBag into Jython List
https://issues.apache.org/jira/browse/PIG-5338
PIG-5323  Implement LastInputStreamingOptimizer in Tez
https://issues.apache.org/jira/browse/PIG-5323
PIG-5273  _SUCCESS file should be created at the end of the job
https://issues.apache.org/jira/browse/PIG-5273
PIG-5267  Review of org.apache.pig.impl.io.BufferedPositionedInputStream
https://issues.apache.org/jira/browse/PIG-5267
PIG-5256  Bytecode generation for POFilter and POForeach
https://issues.apache.org/jira/browse/PIG-5256
PIG-5160  SchemaTupleFrontend.java is not thread safe, cause PigServer thrown NPE in multithread env
https://issues.apache.org/jira/browse/PIG-5160
PIG-5115  Builtin AvroStorage generates incorrect avro schema when the same pig field name appears in the alias
https://issues.apache.org/jira/browse/PIG-5115
PIG-5106  Optimize when mapreduce.input.fileinputformat.input.dir.recursive set to true
https://issues.apache.org/jira/browse/PIG-5106
PIG-5081  Can not run pig on spark source code distribution
https://issues.apache.org/jira/browse/PIG-5081
PIG-5080  Support store alias as spark table
https://issues.apache.org/jira/browse/PIG-5080
PIG-5057  IndexOutOfBoundsException when pig reducer processOnePackageOutput
https://issues.apache.org/jira/browse/PIG-5057
PIG-5029  Optimize sort case when data is skewed
https://issues.apache.org/jira/browse/PIG-5029
PIG-4926  Modify the content of start.xml for spark mode
https://issues.apache.org/jira/browse/PIG-4926
PIG-4913  Reduce jython function initiation during compilation
https://issues.apache.org/jira/browse/PIG-4913
PIG-4849  pig on tez will cause tez-ui to crash,because the content from timeline server is too long.
https://issues.apache.org/jira/browse/PIG-4849
PIG-4750  REPLACE_MULTI should compile Pattern once and reuse it
https://issues.apache.org/jira/browse/PIG-4750
PIG-4684  Exception should be changed to warning when job diagnostics cannot be fetched
https://issues.apache.org/jira/browse/PIG-4684
PIG-4656  Improve String serialization and comparator performance in BinInterSedes
https://issues.apache.org/jira/browse/PIG-4656
PIG-4598  Allow user defined plan optimizer rules
https://issues.apache.org/jira/browse/PIG-4598
PIG-4551  Partition filter is not pushed down in case of SPLIT
https://issues.apache.org/jira/browse/PIG-4551
PIG-4539  New PigUnit
https://issues.apache.org/jira/browse/PIG-4539
PIG-4515  org.apache.pig.builtin.Distinct throws ClassCastException
https://issues.apache.org/jira/browse/PIG-4515
PIG-4373  Implement PIG-3861 in Tez
https://issues.apache.org/jira/browse/PIG-4373
PIG-4323  PackageConverter hanging in Spark
https://issues.apache.org/jira/browse/PIG-4323
PIG-4313  StackOverflowError in LIMIT operation on Spark
https://issues.apache.org/jira/browse/PIG-4313
PIG-4251  Pig on Storm
https://issues.apache.org/jira/browse/PIG-4251
PIG-4002  Disable combiner when map-side aggregation is used
https://issues.apache.org/jira/browse/PIG-4002
PIG-3952  PigStorage accepts '-tagSplit' to return full split information
https://issues.apache.org/jira/browse/PIG-3952
PIG-3911  Define unique fields with @OutputSchema
https://issues.apache.org/jira/browse/PIG-3911
PIG-3877  Getting Geo Latitude/Longitude from Address Lines
https://issues.apache.org/jira/browse/PIG-3877
PIG-3873  Geo distance calculation using Haversine
https://issues.apache.org/jira/browse/PIG-3873
PIG-3668  COR built-in function when atleast one of the coefficient values is NaN
https://issues.apache.org/jira/browse/PIG-3668
PIG-3587  add functionality for rolling over dates
https://issues.apache.org/jira/browse/PIG-3587
PIG-1804  Alow Jython function to implement Algebraic and/or Accumulator interfaces
https://issues.apache.org/jira/browse/PIG-1804

You may edit this subscription at:
https://issues.apache.org/jira/secure/EditSubscription!default.jspa?subId=16328=12322384


[jira] [Commented] (PIG-5370) Union onschema + columnprune dropping used fields

2018-11-21 Thread Koji Noguchi (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-5370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695231#comment-16695231
 ] 

Koji Noguchi commented on PIG-5370:
---

Pig does create the correct result if we skip ColumnMapKeyPrune optimization.

Running "explain D" with '-t ColumnMapKeyPrune' shows:
{noformat}
|---D: (Name: LOUnion Schema: A#40:bag{tuple#41:tuple(a1#42:int,a2#43:chararray,a3#44:int)},a2#43:chararray,a3#44:int)
|
|---B: (Name: LOForEach Schema: A#36:bag{#37:tuple(a1#9:int,a2#*10*:chararray,a3#*11*:int)},a2#*10*:chararray,a3#*11*:int)
|
|---C: (Name: LOForEach Schema: A#17:bag{tuple#49:tuple(a1#50:int,a2#51:chararray,a3#52:int)},a2#22:chararray,a3#23:int)
{noformat}

This issue only happens when a relation like B has an inner schema that 
contains a field with the same uid as one at the root level.  In the above 
example, these are uids {{\*10\*}} and {{\*11\*}}.

Before PIG-5312, the schema of the inner bag was set to null, so we didn't have 
this issue.
With PIG-5312, combined with the way LOUnion determines its output uids from 
the input uids, two things go wrong.
# The schema of LOUnion uses the same uid for the inner bag and the outer 
level (uids 43 & 44).
# ColumnMapKeyPrune (incorrectly) determines that a2#22 & a3#23 are not 
being used and drops them.

Reading {{DuplicateForEachColumnRewriteVisitor.java}}, "relation B using the 
same uid" is correct behavior, since the two fields are not at the same level.  
So I'm guessing the required fix would be in LOUnion.
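
The uid reuse described above can be checked mechanically against the explain output. The following is a minimal sketch in Python, not Pig code: it only scans the three schema strings copied from the {noformat} block for uids that occur more than once. The names SCHEMAS and repeated_uids are hypothetical helpers introduced for this illustration, and the script says nothing about how ColumnMapKeyPrune works internally.

```python
import re
from collections import Counter

# Schema strings copied from the "explain D" output in this comment.
# The *10* / *11* markers are the JIRA emphasis markers from the original text.
SCHEMAS = {
    "D": "A#40:bag{tuple#41:tuple(a1#42:int,a2#43:chararray,a3#44:int)},a2#43:chararray,a3#44:int",
    "B": "A#36:bag{#37:tuple(a1#9:int,a2#*10*:chararray,a3#*11*:int)},a2#*10*:chararray,a3#*11*:int",
    "C": "A#17:bag{tuple#49:tuple(a1#50:int,a2#51:chararray,a3#52:int)},a2#22:chararray,a3#23:int",
}


def repeated_uids(schema):
    """Return the sorted uids that appear more than once in a schema string."""
    # A uid is the number after '#', optionally wrapped in '*' emphasis markers.
    uids = [int(u) for u in re.findall(r"#\*?(\d+)", schema)]
    counts = Counter(uids)
    return sorted(uid for uid, n in counts.items() if n > 1)


if __name__ == "__main__":
    for name, schema in SCHEMAS.items():
        print(name, repeated_uids(schema))
```

For these three strings the script reports uids 43 and 44 for D, 10 and 11 for B, and nothing for C, which matches the uids called out in the comment.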

> Union onschema + columnprune dropping used fields 
> --
>
> Key: PIG-5370
> URL: https://issues.apache.org/jira/browse/PIG-5370
> Project: Pig
>  Issue Type: Bug
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Major
>
> After PIG-5312, the query below started failing.
> {code}
> A = load 'input1.txt' as (a1:int, a2:chararray, a3:int);
> B = FOREACH (GROUP A by (a1,a2)) {
>     A_FOREACH = FOREACH A GENERATE a2,a3;
>     GENERATE A, FLATTEN(A_FOREACH) as (a2,a3);
> }
> C = load 'input2.txt' as (A:bag{tuple:(a1: int,a2: chararray,a3:int)},a2: chararray,a3:int);
> D = UNION ONSCHEMA B, C;
> dump D;
> {code}
> {code:title=input1.txt}
> 1   a   3
> 2   b   4
> 2   c   5
> 1   a   6
> 2   b   7
> 1   c   8
> {code}
> {code:title=input2.txt}
> {(10,a0,30),(20,b0,40)} zzz 222
> {code}
> {noformat:title=Expected output}
> ({(10,a0,30),(20,b0,40)},zzz,222)
> ({(1,a,6),(1,a,3)},a,6)
> ({(1,a,6),(1,a,3)},a,3)
> ({(1,c,8)},c,8)
> ({(2,b,7),(2,b,4)},b,7)
> ({(2,b,7),(2,b,4)},b,4)
> ({(2,c,5)},c,5)
> {noformat}
> {noformat:title=Actual (incorrect) output}
> ({(10,a0,30),(20,b0,40)})    ONLY 1 Field
> ({(1,a,6),(1,a,3)},a,6)
> ({(1,a,6),(1,a,3)},a,3)
> ({(1,c,8)},c,8)
> ({(2,b,7),(2,b,4)},b,7)
> ({(2,b,7),(2,b,4)},b,4)
> ({(2,c,5)},c,5)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (PIG-5370) Union onschema + columnprune dropping used fields

2018-11-21 Thread Koji Noguchi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-5370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated PIG-5370:
--
Issue Type: Bug  (was: Task)






[jira] [Created] (PIG-5370) Union onschema + columnprune dropping used fields

2018-11-21 Thread Koji Noguchi (JIRA)
Koji Noguchi created PIG-5370:
-

 Summary: Union onschema + columnprune dropping used fields 
 Key: PIG-5370
 URL: https://issues.apache.org/jira/browse/PIG-5370
 Project: Pig
  Issue Type: Task
Reporter: Koji Noguchi
Assignee: Koji Noguchi


After PIG-5312, the query below started failing.
{code}
A = load 'input1.txt' as (a1:int, a2:chararray, a3:int);
B = FOREACH (GROUP A by (a1,a2)) {
    A_FOREACH = FOREACH A GENERATE a2,a3;
    GENERATE A, FLATTEN(A_FOREACH) as (a2,a3);
}
C = load 'input2.txt' as (A:bag{tuple:(a1: int,a2: chararray,a3:int)},a2: chararray,a3:int);

D = UNION ONSCHEMA B, C;

dump D;
{code}

{code:title=input1.txt}
1   a   3
2   b   4
2   c   5
1   a   6
2   b   7
1   c   8
{code}

{code:title=input2.txt}
{(10,a0,30),(20,b0,40)} zzz 222
{code}
{noformat:title=Expected output}
({(10,a0,30),(20,b0,40)},zzz,222)
({(1,a,6),(1,a,3)},a,6)
({(1,a,6),(1,a,3)},a,3)
({(1,c,8)},c,8)
({(2,b,7),(2,b,4)},b,7)
({(2,b,7),(2,b,4)},b,4)
({(2,c,5)},c,5)
{noformat}
{noformat:title=Actual (incorrect) output}
({(10,a0,30),(20,b0,40)})    ONLY 1 Field
({(1,a,6),(1,a,3)},a,6)
({(1,a,6),(1,a,3)},a,3)
({(1,c,8)},c,8)
({(2,b,7),(2,b,4)},b,7)
({(2,b,7),(2,b,4)},b,4)
({(2,c,5)},c,5)
{noformat}
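
To spot which rows lost fields, the dumped tuples can be compared by their number of top-level fields. This is a minimal Python sketch, not part of Pig: top_level_field_count and short_rows are hypothetical helper names, and the rows are copied from the "Actual (incorrect) output" block with the "ONLY 1 Field" annotation left out.

```python
# Rows copied from the "Actual (incorrect) output" block of this report.
ACTUAL_ROWS = [
    "({(10,a0,30),(20,b0,40)})",
    "({(1,a,6),(1,a,3)},a,6)",
    "({(1,a,6),(1,a,3)},a,3)",
    "({(1,c,8)},c,8)",
    "({(2,b,7),(2,b,4)},b,7)",
    "({(2,b,7),(2,b,4)},b,4)",
    "({(2,c,5)},c,5)",
]


def top_level_field_count(row):
    """Count the fields of a dumped tuple, ignoring commas nested in bags or tuples."""
    inner = row.strip()[1:-1]  # drop the outer parentheses of the tuple
    if not inner:
        return 0
    depth = 0
    count = 1
    for ch in inner:
        if ch in "({":
            depth += 1
        elif ch in ")}":
            depth -= 1
        elif ch == "," and depth == 0:
            count += 1  # a comma at depth 0 separates two top-level fields
    return count


def short_rows(rows, expected_fields=3):
    """Return the rows whose top-level field count differs from expected_fields."""
    return [r for r in rows if top_level_field_count(r) != expected_fields]


if __name__ == "__main__":
    for row in short_rows(ACTUAL_ROWS):
        print(top_level_field_count(row), row)
```

The schema of D has three top-level fields (A, a2, a3), so expected_fields defaults to 3. Only the row that originates from input2.txt is flagged.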





Build failed in Jenkins: Pig-trunk #2091

2018-11-21 Thread Apache Jenkins Server
See 

Changes:

[daijy] PIG-5366: Enable PigStreamingDepend to load from current directory in 
newer Perl versions (abstractdog via daijy)

--
[...truncated 195.58 KB...]

ivy-init-dirs:
[mkdir] Created dir: 
[mkdir] Created dir: 

[mkdir] Created dir: 

[mkdir] Created dir: 


ivy-probe-antlib:

ivy-init-antlib:

ivy-init:
[ivy:configure] :: Ivy 2.2.0 - 20100923230623 :: http://ant.apache.org/ivy/ ::
[ivy:configure] :: loading settings :: file = 


ivy-resolve:
 [echo] *** Ivy resolve with Hadoop 2, Spark 1 and HBase 1 ***
[ivy:resolve] 
[ivy:resolve] :: problems summary ::
[ivy:resolve]  ERRORS
[ivy:resolve]   unknown resolver public
[ivy:resolve]   unknown resolver public
[ivy:resolve]   unknown resolver public
[ivy:resolve]   unknown resolver public
[ivy:resolve]   unknown resolver public
[ivy:resolve]   unknown resolver public
[ivy:resolve]   unknown resolver public
[ivy:resolve]   unknown resolver public
[ivy:resolve]   unknown resolver public
[ivy:resolve]   unknown resolver public
[ivy:resolve]   unknown resolver public
[ivy:resolve]   unknown resolver public
[ivy:resolve]   unknown resolver public
[ivy:resolve]   unknown resolver main
[ivy:resolve]   unknown resolver public
[ivy:resolve]   unknown resolver public
[ivy:resolve]   unknown resolver public
[ivy:resolve]   unknown resolver main
[ivy:resolve]   unknown resolver public
[ivy:resolve]   unknown resolver public
[ivy:resolve]   unknown resolver public
[ivy:resolve]   unknown resolver public
[ivy:resolve]   unknown resolver public
[ivy:resolve]   unknown resolver public
[ivy:resolve] 
[ivy:resolve] :: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
[ivy:report] DEPRECATED: 'ivy.conf.file' is deprecated, use 'ivy.settings.file' instead
[ivy:report] :: loading settings :: file = 

[ivy:report] Processing 
/home/jenkins/.ivy2/cache/org.apache.pig-pig-compile.xml to 

[ivy:report] Processing 
/home/jenkins/.ivy2/cache/org.apache.pig-pig-compile.xml to 


ivy-compile:
[ivy:retrieve] 
[ivy:retrieve] :: problems summary ::
[ivy:retrieve]  ERRORS
[ivy:retrieve]  unknown resolver public
[ivy:retrieve]  unknown resolver public
[ivy:retrieve]  unknown resolver public
[ivy:retrieve]  unknown resolver public
[ivy:retrieve]  unknown resolver public
[ivy:retrieve]  unknown resolver public
[ivy:retrieve]  unknown resolver public
[ivy:retrieve]  unknown resolver main
[ivy:retrieve] 
[ivy:retrieve] :: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
[ivy:cachepath] :: resolving dependencies :: org.apache.pig#pig;0.18.0-SNAPSHOT
[ivy:cachepath] confs: [compile]
[ivy:cachepath] found com.sun.jersey#jersey-bundle;1.8 in maven2
[ivy:cachepath] found com.sun.jersey#jersey-server;1.8 in maven2
[ivy:cachepath] found com.sun.jersey.contribs#jersey-guice;1.8 in maven2
[ivy:cachepath] found commons-codec#commons-codec;1.4 in fs
[ivy:cachepath] found commons-configuration#commons-configuration;1.6 in fs
[ivy:cachepath] found commons-collections#commons-collections;3.2.1 in fs
[ivy:cachepath] found javax.servlet#servlet-api;2.5 in fs
[ivy:cachepath] found javax.ws.rs#jsr311-api;1.1.1 in fs
[ivy:cachepath] found com.google.protobuf#protobuf-java;2.5.0 in fs
[ivy:cachepath] found javax.inject#javax.inject;1 in fs
[ivy:cachepath] found javax.xml.bind#jaxb-api;2.2.2 in fs
[ivy:cachepath] found com.sun.xml.bind#jaxb-impl;2.2.3-1 in fs
[ivy:cachepath] found com.google.inject#guice;3.0 in fs
[ivy:cachepath] found com.google.inject.extensions#guice-servlet;3.0 in fs
[ivy:cachepath] found aopalliance#aopalliance;1.0 in fs
[ivy:cachepath] found org.glassfish#javax.el;3.0.1-b08 in maven2
[ivy:cachepath] found org.apache.hadoop#hadoop-annotations;2.7.3 in fs
[ivy:cachepath] found org.apache.hadoop#hadoop-auth;2.7.3 in fs
[ivy:cachepath] found org.apache.hadoop#hadoop-common;2.7.3 in fs
[ivy:cachepath] found org.apache.hadoop#hadoop-hdfs;2.7.3 in maven2
[ivy:cachepath] found org.apache.hadoop#hadoop-mapreduce-client-core;2.7.3 in maven2
[ivy:cachepath] found org.apache.hadoop#hadoop-mapreduce-client-jobclient;2.7.3 in maven2
[ivy:cachepath] found org.apache.hadoop#hadoop-yarn-server-tests;2.7.3 
in