[jira] [Commented] (PIG-5317) Upgrade old dependencies: commons-lang, hsqldb, commons-logging

2018-10-03 Thread Satish Subhashrao Saley (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16637635#comment-16637635
 ] 

Satish Subhashrao Saley commented on PIG-5317:
--

I tested PIG-5317_without_new_dep_2.patch, and it looks good. +1 (non-binding)

> Upgrade old dependencies: commons-lang, hsqldb, commons-logging
> ---
>
> Key: PIG-5317
> URL: https://issues.apache.org/jira/browse/PIG-5317
> Project: Pig
> Issue Type: Improvement
> Reporter: Nandor Kollar
> Assignee: Nandor Kollar
> Priority: Minor
> Fix For: 0.18.0
>
> Attachments: PIG-5317_1.patch, PIG-5317_2.patch, 
> PIG-5317_amend.patch, PIG-5317_without_new_dep.patch, 
> PIG-5317_without_new_dep_2.patch
>
>
> Pig depends on old versions of commons-lang, hsqldb, and commons-logging. It 
> would be nice to upgrade these dependencies. For commons-lang, Pig should 
> depend on commons-lang3 instead (which is already present in ivy.xml).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Build failed in Jenkins: Pig-trunk-commit #2576

2018-10-03 Thread Apache Jenkins Server
See 


Changes:

[rohini] PIG-5342: Add setting to turn off bloom join combiner (satishsaley via rohini)

--
[...truncated 197.20 KB...]
[ivy:cachepath] found javax.ws.rs#jsr311-api;1.1.1 in fs
[ivy:cachepath] found com.google.protobuf#protobuf-java;2.5.0 in default
[ivy:cachepath] found javax.inject#javax.inject;1 in default
[ivy:cachepath] found javax.xml.bind#jaxb-api;2.2.2 in default
[ivy:cachepath] found com.sun.xml.bind#jaxb-impl;2.2.3-1 in default
[ivy:cachepath] found com.google.inject#guice;3.0 in default
[ivy:cachepath] found com.google.inject.extensions#guice-servlet;3.0 in fs
[ivy:cachepath] found aopalliance#aopalliance;1.0 in default
[ivy:cachepath] found org.glassfish#javax.el;3.0.1-b08 in fs
[ivy:cachepath] found org.apache.hadoop#hadoop-annotations;2.7.3 in fs
[ivy:cachepath] found org.apache.hadoop#hadoop-auth;2.7.3 in fs
[ivy:cachepath] found org.apache.hadoop#hadoop-common;2.7.3 in fs
[ivy:cachepath] found org.apache.hadoop#hadoop-hdfs;2.7.3 in fs
[ivy:cachepath] found org.apache.hadoop#hadoop-mapreduce-client-core;2.7.3 in fs
[ivy:cachepath] found org.apache.hadoop#hadoop-mapreduce-client-jobclient;2.7.3 in fs
[ivy:cachepath] found org.apache.hadoop#hadoop-yarn-server-tests;2.7.3 in fs
[ivy:cachepath] found org.apache.hadoop#hadoop-mapreduce-client-app;2.7.3 in fs
[ivy:cachepath] found org.apache.hadoop#hadoop-mapreduce-client-shuffle;2.7.3 in fs
[ivy:cachepath] found org.apache.hadoop#hadoop-mapreduce-client-common;2.7.3 in fs
[ivy:cachepath] found org.apache.hadoop#hadoop-yarn-api;2.7.3 in fs
[ivy:cachepath] found org.apache.hadoop#hadoop-yarn-common;2.7.3 in fs
[ivy:cachepath] found org.apache.hadoop#hadoop-yarn-server;2.7.3 in fs
[ivy:cachepath] found org.apache.hadoop#hadoop-yarn-server-web-proxy;2.7.3 in fs
[ivy:cachepath] found org.apache.hadoop#hadoop-yarn-server-common;2.7.3 in fs
[ivy:cachepath] found org.apache.hadoop#hadoop-yarn-server-nodemanager;2.7.3 in fs
[ivy:cachepath] found org.apache.hadoop#hadoop-yarn-server-resourcemanager;2.7.3 in fs
[ivy:cachepath] found org.apache.hadoop#hadoop-yarn-client;2.7.3 in fs
[ivy:cachepath] found org.apache.hadoop#hadoop-yarn-server-applicationhistoryservice;2.7.3 in fs
[ivy:cachepath] found org.apache.hadoop#hadoop-mapreduce-client-hs;2.7.3 in fs
[ivy:cachepath] found org.apache.avro#avro-mapred;1.7.5 in maven2
[ivy:cachepath] found org.apache.avro#avro-ipc;1.7.5 in maven2
[ivy:cachepath] found org.apache.avro#avro;1.7.5 in fs
[ivy:cachepath] found com.thoughtworks.paranamer#paranamer;2.3 in default
[ivy:cachepath] found org.xerial.snappy#snappy-java;1.0.5 in fs
[ivy:cachepath] found org.apache.commons#commons-compress;1.4.1 in default
[ivy:cachepath] found org.tukaani#xz;1.0 in default
[ivy:cachepath] found org.slf4j#slf4j-api;1.6.4 in default
[ivy:cachepath] found org.mortbay.jetty#jetty;6.1.26 in fs
[ivy:cachepath] found org.mortbay.jetty#jetty-util;6.1.26 in default
[ivy:cachepath] found org.mortbay.jetty#servlet-api;2.5-20081211 in fs
[ivy:cachepath] found org.apache.velocity#velocity;1.7 in fs
[ivy:cachepath] found commons-lang#commons-lang;2.4 in fs
[ivy:cachepath] found org.htrace#htrace-core;3.0.4 in fs
[ivy:cachepath] found org.apache.htrace#htrace-core;3.1.0-incubating in fs
[ivy:cachepath] found org.fusesource.leveldbjni#leveldbjni-all;1.8 in fs
[ivy:cachepath] found org.apache.hive.shims#hive-shims-0.23;1.2.1 in fs
[ivy:cachepath] found org.apache.tez#tez;0.7.0 in maven2
[ivy:cachepath] found org.apache.tez#tez-common;0.7.0 in maven2
[ivy:cachepath] found org.apache.tez#tez-api;0.7.0 in maven2
[ivy:cachepath] found org.apache.tez#tez-dag;0.7.0 in maven2
[ivy:cachepath] found org.apache.tez#tez-runtime-internals;0.7.0 in maven2
[ivy:cachepath] found org.apache.tez#tez-runtime-library;0.7.0 in maven2
[ivy:cachepath] found org.apache.tez#tez-mapreduce;0.7.0 in maven2
[ivy:cachepath] found org.apache.tez#tez-yarn-timeline-history-with-acls;0.7.0 in maven2
[ivy:cachepath] found org.apache.commons#commons-collections4;4.0 in fs
[ivy:cachepath] found org.codehaus.jettison#jettison;1.3.4 in fs
[ivy:cachepath] found org.apache.commons#commons-math3;3.1.1 in default
[ivy:cachepath] found org.apache.curator#curator-framework;2.6.0 in fs
[ivy:cachepath] found org.apache.curator#curator-client;2.6.0 in fs
[ivy:cachepath] found org.apache.hbase#hbase-client;1.2.4 in fs
[ivy:cachepath] found 

[jira] [Updated] (PIG-5342) Add setting to turn off bloom join combiner

2018-10-03 Thread Rohini Palaniswamy (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-5342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-5342:

   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 0.18.0
       Status: Resolved  (was: Patch Available)

+1. Committed to trunk

> Add setting to turn off bloom join combiner
> ---
>
> Key: PIG-5342
> URL: https://issues.apache.org/jira/browse/PIG-5342
> Project: Pig
> Issue Type: Sub-task
> Reporter: Satish Subhashrao Saley
> Assignee: Satish Subhashrao Saley
> Priority: Major
> Fix For: 0.18.0
>
> Attachments: PIG-5342-1.patch, PIG-5342-2.patch, PIG-5342-3.patch, 
> PIG-5342-4.patch, PIG-5342-5.patch, PIG-5342-6.patch, PIG-5342-7.patch, 
> PIG-5342-8.patch
>
>
> 1) Need a new setting, pig.bloomjoin.nocombiner, to turn off the combiner for 
> bloom join. When the keys are all unique, the combiner is unnecessary overhead.
> 2) Previously, the keys were the bloom filter index and the values were the 
> join key. Combining involved doing a distinct on the bag of values, which has 
> memory issues for more than 10 million records. That needs to be flipped and 
> a distinct combiner used to scale to billions of records.
> 3) Mention in the documentation that bloom join is also ideal for right outer 
> joins with the smaller dataset on the right. Replicated join only supports 
> left outer join.
>  
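
A minimal sketch of point 1 above, in Python rather than Pig. It is not
Pig's implementation: the function names and the (join key, filter index)
record layout are assumptions made for illustration. It simulates a
map-side distinct combiner and shows that when every key is unique the
combiner emits as many records as it receives, so its work buys nothing.

```python
from collections import defaultdict


def combine_distinct(records):
    """Simulate a map-side combiner: group by key, keep distinct values.

    `records` is a list of (join_key, filter_index) pairs. This layout is
    a hypothetical stand-in, not the format Pig uses internally.
    """
    grouped = defaultdict(set)
    for key, value in records:
        grouped[key].add(value)
    # Emit one record per distinct (key, value) pair.
    return [(key, value)
            for key, values in grouped.items()
            for value in sorted(values)]


def reduction_ratio(records):
    """Fraction of records that survive the combiner (1.0 = no benefit)."""
    return len(combine_distinct(records)) / len(records)


# 1000 records over only 10 distinct keys: the combiner shrinks the data.
duplicated = [(k % 10, 0) for k in range(1000)]
# 1000 records with 1000 unique keys: the combiner removes nothing.
unique = [(k, 0) for k in range(1000)]

print(reduction_ratio(duplicated))  # 0.01
print(reduction_ratio(unique))      # 1.0
```

In the all-unique case the grouping and deduplication still run, which is
the overhead a switch such as pig.bloomjoin.nocombiner is meant to avoid.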





[jira] [Updated] (PIG-5342) Add setting to turn off bloom join combiner

2018-10-03 Thread Satish Subhashrao Saley (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-5342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Satish Subhashrao Saley updated PIG-5342:
-
Attachment: (was: PIG-5342-7.patch)

> Add setting to turn off bloom join combiner
> ---
>
> Key: PIG-5342
> URL: https://issues.apache.org/jira/browse/PIG-5342
> Project: Pig
> Issue Type: Sub-task
> Reporter: Satish Subhashrao Saley
> Assignee: Satish Subhashrao Saley
> Priority: Major
> Attachments: PIG-5342-1.patch, PIG-5342-2.patch, PIG-5342-3.patch, 
> PIG-5342-4.patch, PIG-5342-5.patch, PIG-5342-6.patch, PIG-5342-7.patch, 
> PIG-5342-8.patch





[jira] [Updated] (PIG-5342) Add setting to turn off bloom join combiner

2018-10-03 Thread Satish Subhashrao Saley (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-5342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Satish Subhashrao Saley updated PIG-5342:
-
Attachment: PIG-5342-8.patch

> Add setting to turn off bloom join combiner
> ---
>
> Key: PIG-5342
> URL: https://issues.apache.org/jira/browse/PIG-5342
> Project: Pig
> Issue Type: Sub-task
> Reporter: Satish Subhashrao Saley
> Assignee: Satish Subhashrao Saley
> Priority: Major
> Attachments: PIG-5342-1.patch, PIG-5342-2.patch, PIG-5342-3.patch, 
> PIG-5342-4.patch, PIG-5342-5.patch, PIG-5342-6.patch, PIG-5342-7.patch, 
> PIG-5342-7.patch, PIG-5342-8.patch





[jira] [Updated] (PIG-5342) Add setting to turn off bloom join combiner

2018-10-03 Thread Satish Subhashrao Saley (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-5342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Satish Subhashrao Saley updated PIG-5342:
-
Attachment: PIG-5342-7.patch

> Add setting to turn off bloom join combiner
> ---
>
> Key: PIG-5342
> URL: https://issues.apache.org/jira/browse/PIG-5342
> Project: Pig
> Issue Type: Sub-task
> Reporter: Satish Subhashrao Saley
> Assignee: Satish Subhashrao Saley
> Priority: Major
> Attachments: PIG-5342-1.patch, PIG-5342-2.patch, PIG-5342-3.patch, 
> PIG-5342-4.patch, PIG-5342-5.patch, PIG-5342-6.patch, PIG-5342-7.patch, 
> PIG-5342-7.patch, PIG-5342-8.patch





[jira] [Updated] (PIG-5342) Add setting to turn off bloom join combiner

2018-10-03 Thread Satish Subhashrao Saley (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-5342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Satish Subhashrao Saley updated PIG-5342:
-
Attachment: PIG-5342-7.patch

> Add setting to turn off bloom join combiner
> ---
>
> Key: PIG-5342
> URL: https://issues.apache.org/jira/browse/PIG-5342
> Project: Pig
> Issue Type: Sub-task
> Reporter: Satish Subhashrao Saley
> Assignee: Satish Subhashrao Saley
> Priority: Major
> Attachments: PIG-5342-1.patch, PIG-5342-2.patch, PIG-5342-3.patch, 
> PIG-5342-4.patch, PIG-5342-5.patch, PIG-5342-6.patch, PIG-5342-7.patch



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] Subscription: PIG patch available

2018-10-03 Thread jira
Issue Subscription
Filter: PIG patch available (39 issues)

Subscriber: pigdaily

Key       Summary
PIG-5359  Reduce time spent in split serialization
https://issues.apache.org/jira/browse/PIG-5359
PIG-5357  BagFactory interface should support creating a distinct bag from a set
https://issues.apache.org/jira/browse/PIG-5357
PIG-5354  Show fieldname and a line number for casting errors
https://issues.apache.org/jira/browse/PIG-5354
PIG-5342  Add setting to turn off bloom join combiner
https://issues.apache.org/jira/browse/PIG-5342
PIG-5338  Prevent deep copy of DataBag into Jython List
https://issues.apache.org/jira/browse/PIG-5338
PIG-5323  Implement LastInputStreamingOptimizer in Tez
https://issues.apache.org/jira/browse/PIG-5323
PIG-5317  Upgrade old dependencies: commons-lang, hsqldb, commons-logging
https://issues.apache.org/jira/browse/PIG-5317
PIG-5273  _SUCCESS file should be created at the end of the job
https://issues.apache.org/jira/browse/PIG-5273
PIG-5267  Review of org.apache.pig.impl.io.BufferedPositionedInputStream
https://issues.apache.org/jira/browse/PIG-5267
PIG-5256  Bytecode generation for POFilter and POForeach
https://issues.apache.org/jira/browse/PIG-5256
PIG-5160  SchemaTupleFrontend.java is not thread safe, cause PigServer thrown NPE in multithread env
https://issues.apache.org/jira/browse/PIG-5160
PIG-5115  Builtin AvroStorage generates incorrect avro schema when the same pig field name appears in the alias
https://issues.apache.org/jira/browse/PIG-5115
PIG-5106  Optimize when mapreduce.input.fileinputformat.input.dir.recursive set to true
https://issues.apache.org/jira/browse/PIG-5106
PIG-5081  Can not run pig on spark source code distribution
https://issues.apache.org/jira/browse/PIG-5081
PIG-5080  Support store alias as spark table
https://issues.apache.org/jira/browse/PIG-5080
PIG-5057  IndexOutOfBoundsException when pig reducer processOnePackageOutput
https://issues.apache.org/jira/browse/PIG-5057
PIG-5029  Optimize sort case when data is skewed
https://issues.apache.org/jira/browse/PIG-5029
PIG-4926  Modify the content of start.xml for spark mode
https://issues.apache.org/jira/browse/PIG-4926
PIG-4913  Reduce jython function initiation during compilation
https://issues.apache.org/jira/browse/PIG-4913
PIG-4849  pig on tez will cause tez-ui to crash,because the content from timeline server is too long.
https://issues.apache.org/jira/browse/PIG-4849
PIG-4750  REPLACE_MULTI should compile Pattern once and reuse it
https://issues.apache.org/jira/browse/PIG-4750
PIG-4684  Exception should be changed to warning when job diagnostics cannot be fetched
https://issues.apache.org/jira/browse/PIG-4684
PIG-4656  Improve String serialization and comparator performance in BinInterSedes
https://issues.apache.org/jira/browse/PIG-4656
PIG-4598  Allow user defined plan optimizer rules
https://issues.apache.org/jira/browse/PIG-4598
PIG-4551  Partition filter is not pushed down in case of SPLIT
https://issues.apache.org/jira/browse/PIG-4551
PIG-4539  New PigUnit
https://issues.apache.org/jira/browse/PIG-4539
PIG-4515  org.apache.pig.builtin.Distinct throws ClassCastException
https://issues.apache.org/jira/browse/PIG-4515
PIG-4373  Implement PIG-3861 in Tez
https://issues.apache.org/jira/browse/PIG-4373
PIG-4323  PackageConverter hanging in Spark
https://issues.apache.org/jira/browse/PIG-4323
PIG-4313  StackOverflowError in LIMIT operation on Spark
https://issues.apache.org/jira/browse/PIG-4313
PIG-4251  Pig on Storm
https://issues.apache.org/jira/browse/PIG-4251
PIG-4002  Disable combiner when map-side aggregation is used
https://issues.apache.org/jira/browse/PIG-4002
PIG-3952  PigStorage accepts '-tagSplit' to return full split information
https://issues.apache.org/jira/browse/PIG-3952
PIG-3911  Define unique fields with @OutputSchema
https://issues.apache.org/jira/browse/PIG-3911
PIG-3877  Getting Geo Latitude/Longitude from Address Lines
https://issues.apache.org/jira/browse/PIG-3877
PIG-3873  Geo distance calculation using Haversine
https://issues.apache.org/jira/browse/PIG-3873
PIG-3668  COR built-in function when atleast one of the coefficient values is NaN
https://issues.apache.org/jira/browse/PIG-3668
PIG-3587  add functionality for rolling over dates
https://issues.apache.org/jira/browse/PIG-3587
PIG-1804  Alow Jython function to implement Algebraic and/or Accumulator interfaces
https://issues.apache.org/jira/browse/PIG-1804

You may edit