[jira] [Updated] (PIG-5360) Pig sets working directory of input file systems causes exception thrown

2018-09-26 Thread Xuzhou Yin (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-5360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuzhou Yin updated PIG-5360:

Description: 
{color:#000000}In the getSplits() method of PigInputFormat, Pig tries to set the 
working directory of the input file system to jobContext.getWorkingDirectory(), 
which is always the default working directory of the default file system (e.g. 
hdfs://host:port/user/userId in the case of HDFS) unless 
"mapreduce.job.working.dir" is explicitly set to a non-default value. So if the 
input path uses a non-default file system, the call fails because it tries to 
set the working directory of the non-default file system to an HDFS path.{color}

{color:#000000}The proposed change is to remove this working-directory logic 
entirely. There are several reasons for doing so.{color}

{color:#000000}Firstly, getSplits() is only supposed to return a list of input 
splits. It should not have side effects (in particular, it can potentially 
change the output path). Having the InputFormat change the OutputFormat does 
not make much sense here.{color}

{color:#000000}Secondly, there is an inconsistency between the working 
directories of the input and output file systems. If "mapreduce.job.working.dir" 
is set to a non-default value, it affects only the output path (if that path is 
relative), because the input path is made fully qualified before this logic 
runs.{color}

{color:#000000}Thirdly, there is already a "cd" command that lets users change 
the working directory. However, this logic overrides the "cd" functionality 
when the input and output paths both use the default file system.{color}

{color:#000000}Lastly, if a user runs a sequence of jobs, changing the working 
directory may change the input paths of downstream jobs when those paths are 
specified as relative paths.{color}
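
For illustration, a minimal sketch of the failing pattern described above, 
assuming a JobContext in scope and an S3-style URI as the non-default file 
system; this is a simplification, not the exact PigInputFormat code:

{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.JobContext;

// Simplified sketch: the input path lives on a non-default file system, but the
// job's working directory belongs to the default file system (HDFS), so the
// setWorkingDirectory() call is handed a path from the wrong scheme.
public class WorkingDirIllustration {
    static void reproduce(JobContext jobContext) throws IOException {
        Path inputPath = new Path("s3://bucket/input");        // illustrative non-default FS
        FileSystem inputFs = inputPath.getFileSystem(jobContext.getConfiguration());
        Path workingDir = jobContext.getWorkingDirectory();    // e.g. hdfs://host:port/user/userId
        inputFs.setWorkingDirectory(workingDir);                // typically rejected: hdfs:// path on an s3:// file system
    }
}
{code}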

  was:
{color:#000000}In the getSplits() method of PigInputFormat, Pig tries to set the 
working directory of the input file system to jobContext.getWorkingDirectory(), 
which is always the default working directory of the default file system (e.g. 
hdfs://host:port/user/userId in the case of HDFS) unless 
"mapreduce.job.working.dir" is explicitly set to a non-default value. So if the 
input path uses a non-default file system, the call fails because it tries to 
set the working directory of the non-default file system to an HDFS path.{color}

{color:#000000}The proposed change is to remove this working-directory logic 
entirely. There are several reasons for doing so.{color}

{color:#000000}Firstly, getSplits() is only supposed to return a list of input 
splits. It should not have side effects (in particular, it can potentially 
change the output path).{color}

{color:#000000}Secondly, there is an inconsistency between the working 
directories of the input and output file systems. If "mapreduce.job.working.dir" 
is set to a non-default value, it affects only the output path (if that path is 
relative), because the input path is made fully qualified before this logic 
runs.{color}

{color:#000000}Thirdly, there is already a "cd" command that lets users change 
the working directory. However, this logic overrides the "cd" functionality 
when the input and output paths both use the default file system.{color}


> Pig sets working directory of input file systems causes exception thrown
> 
>
> Key: PIG-5360
> URL: https://issues.apache.org/jira/browse/PIG-5360
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.17.0
>Reporter: Xuzhou Yin
>Priority: Minor
>  Labels: patch
> Fix For: 0.18.0
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> {color:#000000}In the getSplits() method of PigInputFormat, Pig tries to set 
> the working directory of the input file system to 
> jobContext.getWorkingDirectory(), which is always the default working 
> directory of the default file system (e.g. hdfs://host:port/user/userId in the 
> case of HDFS) unless "mapreduce.job.working.dir" is explicitly set to a 
> non-default value. So if the input path uses a non-default file system, the 
> call fails because it tries to set the working directory of the non-default 
> file system to an HDFS path.{color}
> {color:#000000}The proposed change is to remove this working-directory logic 
> entirely. There are several reasons for doing so.{color}
> {color:#000000}Firstly, getSplits() is only supposed to return a list of 
> input splits. It should not have side effects (in particular, it can 
> potentially change the output path). Having the InputFormat change the 
> OutputFormat does not make much sense here.{color}
> {color:#000000}Secondly, there is an inconsistency between the working 
> directories of the input and output file systems. If "mapreduce.job.working.dir" 
> is set 

[jira] [Updated] (PIG-5360) Pig sets working directory of input file systems causes exception thrown

2018-09-26 Thread Xuzhou Yin (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-5360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuzhou Yin updated PIG-5360:

Description: 
{color:#000000}In the getSplits() method of PigInputFormat, Pig tries to set the 
working directory of the input file system to jobContext.getWorkingDirectory(), 
which is always the default working directory of the default file system (e.g. 
hdfs://host:port/user/userId in the case of HDFS) unless 
"mapreduce.job.working.dir" is explicitly set to a non-default value. So if the 
input path uses a non-default file system, the call fails because it tries to 
set the working directory of the non-default file system to an HDFS path.{color}

{color:#000000}The proposed change is to remove this working-directory logic 
entirely. There are several reasons for doing so.{color}

{color:#000000}Firstly, getSplits() is only supposed to return a list of input 
splits. It should not have side effects (in particular, it can potentially 
change the output path).{color}

{color:#000000}Secondly, there is an inconsistency between the working 
directories of the input and output file systems. If "mapreduce.job.working.dir" 
is set to a non-default value, it affects only the output path (if that path is 
relative), because the input path is made fully qualified before this logic 
runs.{color}

{color:#000000}Thirdly, there is already a "cd" command that lets users change 
the working directory. However, this logic overrides the "cd" functionality 
when the input and output paths both use the default file system.{color}

  was:
{color:#000000}In the getSplits() method of PigInputFormat, Pig tries to set the 
working directory of the input file system to jobContext.getWorkingDirectory(), 
which is always the default working directory of the default file system (e.g. 
hdfs://host:port/user/userId in the case of HDFS) unless 
"mapreduce.job.working.dir" is explicitly set to a non-default value. So if the 
input path uses a non-default file system (e.g. EmrFS), the call fails because 
it tries to set the working directory of EmrFS to an HDFS path.{color}

{color:#000000}The proposed change is to remove this working-directory logic 
entirely. There are several reasons for doing so.{color}

{color:#000000}Firstly, getSplits() is only supposed to return a list of input 
splits. It should not have side effects (in particular, it can potentially 
change the output path).{color}

{color:#000000}Secondly, there is an inconsistency between the working 
directories of the input and output file systems. If "mapreduce.job.working.dir" 
is set to a non-default value, it affects only the output path (if that path is 
relative), because the input path is made fully qualified before this logic 
runs.{color}

{color:#000000}Thirdly, there is already a "cd" command that lets users change 
the working directory. However, this logic overrides the "cd" functionality 
when the input and output paths both use the default file system.{color}


> Pig sets working directory of input file systems causes exception thrown
> 
>
> Key: PIG-5360
> URL: https://issues.apache.org/jira/browse/PIG-5360
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.17.0
>Reporter: Xuzhou Yin
>Priority: Minor
>  Labels: patch
> Fix For: 0.18.0
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> {color:#000000}In the getSplits() method of PigInputFormat, Pig tries to set 
> the working directory of the input file system to 
> jobContext.getWorkingDirectory(), which is always the default working 
> directory of the default file system (e.g. hdfs://host:port/user/userId in the 
> case of HDFS) unless "mapreduce.job.working.dir" is explicitly set to a 
> non-default value. So if the input path uses a non-default file system, the 
> call fails because it tries to set the working directory of the non-default 
> file system to an HDFS path.{color}
> {color:#000000}The proposed change is to remove this working-directory logic 
> entirely. There are several reasons for doing so.{color}
> {color:#000000}Firstly, getSplits() is only supposed to return a list of 
> input splits. It should not have side effects (in particular, it can 
> potentially change the output path).{color}
> {color:#000000}Secondly, there is an inconsistency between the working 
> directories of the input and output file systems. If "mapreduce.job.working.dir" 
> is set to a non-default value, it affects only the output path (if that path 
> is relative), because the input path is made fully qualified before this 
> logic runs.{color}
> {color:#000000}Thirdly, there is already a "cd" command that lets users change 
> the working directory. However, this logic overrides 
> the "cd" functionality 

[jira] [Updated] (PIG-5360) Pig sets working directory of input file systems causes exception thrown

2018-09-26 Thread Xuzhou Yin (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-5360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuzhou Yin updated PIG-5360:

Description: 
{color:#000000}In the getSplits() method of PigInputFormat, Pig tries to set the 
working directory of the input file system to jobContext.getWorkingDirectory(), 
which is always the default working directory of the default file system (e.g. 
hdfs://host:port/user/userId in the case of HDFS) unless 
"mapreduce.job.working.dir" is explicitly set to a non-default value. So if the 
input path uses a non-default file system (e.g. EmrFS), the call fails because 
it tries to set the working directory of EmrFS to an HDFS path.{color}

{color:#000000}The proposed change is to remove this working-directory logic 
entirely. There are several reasons for doing so.{color}

{color:#000000}Firstly, getSplits() is only supposed to return a list of input 
splits. It should not have side effects (in particular, it can potentially 
change the output path).{color}

{color:#000000}Secondly, there is an inconsistency between the working 
directories of the input and output file systems. If "mapreduce.job.working.dir" 
is set to a non-default value, it affects only the output path (if that path is 
relative), because the input path is made fully qualified before this logic 
runs.{color}

{color:#000000}Thirdly, there is already a "cd" command that lets users change 
the working directory. However, this logic overrides the "cd" functionality 
when the input and output paths both use the default file system.{color}

  was:
{color:#000000}In the getSplits() method of PigInputFormat, Pig tries to set the 
working directory of the input file system to jobContext.getWorkingDirectory(), 
which is always the default working directory of the default file system (e.g. 
hdfs://host:port/user/userId in the case of HDFS) unless 
"mapreduce.job.working.dir" is explicitly set to a non-default value. So if the 
input path uses a non-default file system (e.g. EmrFS), the call fails because 
it tries to set the working directory of EmrFS to an HDFS path.{color}

{color:#000000}The proposed change is to remove this working-directory logic 
entirely. There are several reasons for doing so.{color}

{color:#000000}Firstly, getSplits() is only supposed to return a list of input 
splits. It should not have side effects (in particular, it can potentially 
change the output path).{color}

{color:#000000}Secondly, there is an inconsistency between the working 
directories of the input and output file systems. If "mapreduce.job.working.dir" 
is set to a non-default value, it affects only the output path (if that path is 
relative), because the input path is made fully qualified before this logic 
runs.{color}

{color:#000000}Thirdly, there is already a "cd" command that lets users change 
the working directory. However, this logic overrides the "cd" functionality 
when the input and output paths both use the default file system.{color}


> Pig sets working directory of input file systems causes exception thrown
> 
>
> Key: PIG-5360
> URL: https://issues.apache.org/jira/browse/PIG-5360
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.17.0
>Reporter: Xuzhou Yin
>Priority: Minor
>  Labels: patch
> Fix For: 0.18.0
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> {color:#000000}In the getSplits() method of PigInputFormat, Pig tries to set 
> the working directory of the input file system to 
> jobContext.getWorkingDirectory(), which is always the default working 
> directory of the default file system (e.g. hdfs://host:port/user/userId in the 
> case of HDFS) unless "mapreduce.job.working.dir" is explicitly set to a 
> non-default value. So if the input path uses a non-default file system (e.g. 
> EmrFS), the call fails because it tries to set the working directory of EmrFS 
> to an HDFS path.{color}
> {color:#000000}The proposed change is to remove this working-directory logic 
> entirely. There are several reasons for doing so.{color}
> {color:#000000}Firstly, getSplits() is only supposed to return a list of 
> input splits. It should not have side effects (in particular, it can 
> potentially change the output path).{color}
> {color:#000000}Secondly, there is an inconsistency between the working 
> directories of the input and output file systems. If "mapreduce.job.working.dir" 
> is set to a non-default value, it affects only the output path (if that path 
> is relative), because the input path is made fully qualified before this 
> logic runs.{color}
> {color:#000000}Thirdly, there is already a "cd" command that lets users change 
> the working directory. However, this logic overrides 
> the "cd" functionality if the input and 

[jira] [Created] (PIG-5360) Pig sets working directory of input file systems causes exception thrown

2018-09-26 Thread Xuzhou Yin (JIRA)
Xuzhou Yin created PIG-5360:
---

 Summary: Pig sets working directory of input file systems causes 
exception thrown
 Key: PIG-5360
 URL: https://issues.apache.org/jira/browse/PIG-5360
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.17.0
Reporter: Xuzhou Yin
 Fix For: 0.18.0


{color:#000000}In the getSplits() method of PigInputFormat, Pig tries to set the 
working directory of the input file system to jobContext.getWorkingDirectory(), 
which is always the default working directory of the default file system (e.g. 
hdfs://host:port/user/userId in the case of HDFS) unless 
"mapreduce.job.working.dir" is explicitly set to a non-default value. So if the 
input path uses a non-default file system (e.g. EmrFS), the call fails because 
it tries to set the working directory of EmrFS to an HDFS path.{color}

{color:#000000}The proposed change is to remove this working-directory logic 
entirely. There are several reasons for doing so.{color}

{color:#000000}Firstly, getSplits() is only supposed to return a list of input 
splits. It should not have side effects (in particular, it can potentially 
change the output path).{color}

{color:#000000}Secondly, there is an inconsistency between the working 
directories of the input and output file systems. If "mapreduce.job.working.dir" 
is set to a non-default value, it affects only the output path (if that path is 
relative), because the input path is made fully qualified before this logic 
runs.{color}

{color:#000000}Thirdly, there is already a "cd" command that lets users change 
the working directory. However, this logic overrides the "cd" functionality 
when the input and output paths both use the default file system.{color}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Build failed in Jenkins: Pig-trunk #2080

2018-09-26 Thread Apache Jenkins Server
See 

Changes:

[rohini] PIG-3038: Support for Credentials for UDF,Loader and Storer 
(satishsaley via rohini)

--
[...truncated 187.65 KB...]
A 
contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/evaluation/math/RINT.java
A 
contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/evaluation/math/DoubleRound.java
A 
contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/evaluation/math/LOG10.java
A 
contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/evaluation/math/ATAN.java
A 
contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/evaluation/math/COS.java
A 
contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/evaluation/math/ROUND.java
A 
contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/evaluation/math/SQRT.java
A 
contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/evaluation/math/FloatAbs.java
A 
contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/evaluation/math/IntMin.java
A 
contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/evaluation/math/getExponent.java
A 
contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/evaluation/math/DoubleDoubleBase.java
A 
contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/evaluation/Over.java
A 
contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/evaluation/datetime
A 
contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/evaluation/datetime/convert
A 
contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/evaluation/datetime/convert/CustomFormatToISO.java
A 
contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/evaluation/datetime/convert/ISOToUnix.java
A 
contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/evaluation/datetime/convert/UnixToISO.java
A 
contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/evaluation/datetime/truncate
A 
contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/evaluation/datetime/truncate/ISOToYear.java
A 
contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/evaluation/datetime/truncate/ISOToMinute.java
A 
contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/evaluation/datetime/truncate/ISOToMonth.java
A 
contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/evaluation/datetime/truncate/ISOToSecond.java
A 
contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/evaluation/datetime/truncate/ISOToWeek.java
A 
contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/evaluation/datetime/truncate/ISOHelper.java
A 
contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/evaluation/datetime/truncate/ISOToDay.java
A 
contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/evaluation/datetime/truncate/ISOToHour.java
A 
contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/evaluation/datetime/DiffDate.java
A 
contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/evaluation/datetime/diff
A 
contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/evaluation/datetime/diff/ISOYearsBetween.java
A 
contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/evaluation/datetime/diff/ISOMinutesBetween.java
A 
contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/evaluation/datetime/diff/ISOMonthsBetween.java
A 
contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/evaluation/datetime/diff/ISOSecondsBetween.java
A 
contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/evaluation/datetime/diff/ISODaysBetween.java
A 
contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/evaluation/datetime/diff/ISOHoursBetween.java
A 
contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/evaluation/stats
A 
contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/evaluation/stats/COV.java
A 
contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/evaluation/stats/COR.java
A 
contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/evaluation/xml
A 
contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/evaluation/xml/XPathAll.java
A 
contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/evaluation/xml/XPath.java
AU contrib/piggybank/java/build.xml
A autocomplete
A tutorial
A tutorial/scripts
AU tutorial/scripts/script1-hadoop.pig
AU tutorial/scripts/script1-local.pig
AU tutorial/scripts/script2-hadoop.pig
AU tutorial/scripts/script2-local.pig
A tutorial/src
A tutorial/src/org
A tutorial/src/org/apache
A tutorial/src/org/apache/pig
A 

[jira] [Comment Edited] (PIG-5342) Add setting to turn off bloom join combiner

2018-09-26 Thread Rohini Palaniswamy (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-5342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629371#comment-16629371
 ] 

Rohini Palaniswamy edited comment on PIG-5342 at 9/26/18 8:35 PM:
--

1) For the reduce case, we can optimize by making the keys always 
NullableBytesWritable and doing the DataType.toBytes(key, keyType) in the 
POBuildBloomRearrangeTez itself on the map side. Comparator also needs to be 
set to PigBytesRawBytesComparator. Can you make that change?
 2) Can you remove these lines as we have a distinct combiner now?
 // In case of reduce, not adding a combiner and doing the distinct during 
reduce itself.
 // If needed one can be added later

Another optimization would be to use IntWritable instead of NullableTuple for 
the value type. But that needs more work. We can do that later in another jira.
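
For illustration, a minimal sketch of the map-side key conversion suggested 
above, assuming the key object and its type byte are already available in the 
rearrange operator; the helper below is a simplification, not the actual 
POBuildBloomRearrangeTez code:

{code:java}
import org.apache.pig.backend.executionengine.ExecException;
import org.apache.pig.data.DataByteArray;
import org.apache.pig.data.DataType;
import org.apache.pig.impl.io.NullableBytesWritable;

// Sketch only: serialize the join key to raw bytes once on the map side so the
// shuffle can compare raw bytes (PigBytesRawBytesComparator) and the reduce
// side can run a plain distinct over the serialized keys.
public class BloomKeyConversionSketch {
    static NullableBytesWritable toShuffleKey(Object key, byte keyType) throws ExecException {
        byte[] keyBytes = DataType.toBytes(key, keyType);   // conversion named in the comment above
        return new NullableBytesWritable(new DataByteArray(keyBytes));
    }
}
{code}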


was (Author: rohini):
 For the reduce case, we can optimize by making the keys always 
NullableBytesWritable and doing the DataType.toBytes(key, keyType) in the 
POBuildBloomRearrangeTez itself on the map side. Comparator also needs to be 
set to PigBytesRawBytesComparator. Can you make that change?

Another optimization would be to use IntWritable instead of NullableTuple for 
the value type. But that needs more work. We can do that later in another jira.

> Add setting to turn off bloom join combiner
> ---
>
> Key: PIG-5342
> URL: https://issues.apache.org/jira/browse/PIG-5342
> Project: Pig
>  Issue Type: Sub-task
>Reporter: Satish Subhashrao Saley
>Assignee: Satish Subhashrao Saley
>Priority: Major
> Attachments: PIG-5342-1.patch, PIG-5342-2.patch, PIG-5342-3.patch, 
> PIG-5342-4.patch, PIG-5342-5.patch
>
>
> 1) Need a new setting, pig.bloomjoin.nocombiner, to turn off the combiner for 
> bloom join. When the keys are all unique, the combiner is unnecessary overhead.
> 2) In the previous case, the keys were the bloom filter index and the values 
> were the join key. Combining involved doing a distinct on the bag of values, 
> which has memory issues for more than 10 million records. That needs to be 
> flipped and a distinct combiner used to scale to billions of records.
> 3) Mention in the documentation that bloom join is also ideal for right outer 
> joins with the smaller dataset on the right. Replicate join only supports 
> left outer join.
>  
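
As a usage illustration only (the setting name comes from this issue; the 
PigServer wiring and query below are assumptions, and bloom join itself runs on 
Tez):

{code:java}
import java.util.Properties;
import org.apache.pig.ExecType;
import org.apache.pig.PigServer;

// Sketch: disable the bloom join combiner when join keys are known to be unique.
public class BloomNoCombinerExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.setProperty("pig.bloomjoin.nocombiner", "true");   // setting proposed in this issue
        PigServer pig = new PigServer(ExecType.LOCAL, props);    // use the Tez exec type on a real cluster
        pig.registerQuery("big = LOAD 'big_input' AS (k:int, v:chararray);");
        pig.registerQuery("small = LOAD 'small_input' AS (k:int, w:chararray);");
        pig.registerQuery("joined = JOIN big BY k, small BY k USING 'bloom';");
        pig.store("joined", "bloom_join_out");
    }
}
{code}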



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PIG-5342) Add setting to turn off bloom join combiner

2018-09-26 Thread Rohini Palaniswamy (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-5342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629371#comment-16629371
 ] 

Rohini Palaniswamy commented on PIG-5342:
-

 For the reduce case, we can optimize by making the keys always 
NullableBytesWritable and doing the DataType.toBytes(key, keyType) in the 
POBuildBloomRearrangeTez itself on the map side. Comparator also needs to be 
set to PigBytesRawBytesComparator. Can you make that change?

Another optimization would be to use IntWritable instead of NullableTuple for 
the value type. But that needs more work. We can do that later in another jira.

> Add setting to turn off bloom join combiner
> ---
>
> Key: PIG-5342
> URL: https://issues.apache.org/jira/browse/PIG-5342
> Project: Pig
>  Issue Type: Sub-task
>Reporter: Satish Subhashrao Saley
>Assignee: Satish Subhashrao Saley
>Priority: Major
> Attachments: PIG-5342-1.patch, PIG-5342-2.patch, PIG-5342-3.patch, 
> PIG-5342-4.patch, PIG-5342-5.patch
>
>
> 1) Need a new setting, pig.bloomjoin.nocombiner, to turn off the combiner for 
> bloom join. When the keys are all unique, the combiner is unnecessary overhead.
> 2) In the previous case, the keys were the bloom filter index and the values 
> were the join key. Combining involved doing a distinct on the bag of values, 
> which has memory issues for more than 10 million records. That needs to be 
> flipped and a distinct combiner used to scale to billions of records.
> 3) Mention in the documentation that bloom join is also ideal for right outer 
> joins with the smaller dataset on the right. Replicate join only supports 
> left outer join.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (PIG-2557) CSVExcelStorage save : empty quotes "" becomes 4 quotes """". This should become a null field.

2018-09-26 Thread Adam Szita (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-2557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Szita resolved PIG-2557.
-
   Resolution: Duplicate
Fix Version/s: 0.17.0

> CSVExcelStorage save : empty quotes "" becomes 4 quotes """".  This should 
> become a null field.
> ---
>
> Key: PIG-2557
> URL: https://issues.apache.org/jira/browse/PIG-2557
> Project: Pig
>  Issue Type: Bug
>  Components: piggybank
>Affects Versions: 0.9.1
>Reporter: Peter Welch
>Priority: Minor
> Fix For: 0.17.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PIG-2557) CSVExcelStorage save : empty quotes "" becomes 4 quotes """". This should become a null field.

2018-09-26 Thread Adam Szita (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-2557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629193#comment-16629193
 ] 

Adam Szita commented on PIG-2557:
-

Since this is the same issue as PIG-5045, I'm resolving this one as a duplicate.

> CSVExcelStorage save : empty quotes "" becomes 4 quotes """".  This should 
> become a null field.
> ---
>
> Key: PIG-2557
> URL: https://issues.apache.org/jira/browse/PIG-2557
> Project: Pig
>  Issue Type: Bug
>  Components: piggybank
>Affects Versions: 0.9.1
>Reporter: Peter Welch
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PIG-3038) Support for Credentials for UDF,Loader and Storer

2018-09-26 Thread Satish Subhashrao Saley (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-3038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629099#comment-16629099
 ] 

Satish Subhashrao Saley commented on PIG-3038:
--

Updated patch

> Support for Credentials for UDF,Loader and Storer
> -
>
> Key: PIG-3038
> URL: https://issues.apache.org/jira/browse/PIG-3038
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.10.0
>Reporter: Rohini Palaniswamy
>Assignee: Satish Subhashrao Saley
>Priority: Major
> Fix For: 0.18.0
>
> Attachments: PIG-3038-5.patch
>
>
>   Pig does not have a clean way (APIs) to support adding Credentials (HBase 
> token, HCat/Hive metastore token) to the Job and retrieving them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (PIG-3038) Support for Credentials for UDF,Loader and Storer

2018-09-26 Thread Satish Subhashrao Saley (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-3038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Satish Subhashrao Saley updated PIG-3038:
-
Attachment: PIG-3038-5.patch

> Support for Credentials for UDF,Loader and Storer
> -
>
> Key: PIG-3038
> URL: https://issues.apache.org/jira/browse/PIG-3038
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.10.0
>Reporter: Rohini Palaniswamy
>Assignee: Satish Subhashrao Saley
>Priority: Major
> Fix For: 0.18.0
>
> Attachments: PIG-3038-5.patch
>
>
>   Pig does not have a clean way (APIs) to support adding Credentials (HBase 
> token, HCat/Hive metastore token) to the Job and retrieving them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Review Request 68799: [PIG-3038] Support for Credentials for UDF, Loader and Storer

2018-09-26 Thread Rohini Palaniswamy

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68799/#review209029
---


Ship it!




Ship It!

- Rohini Palaniswamy


On Sept. 26, 2018, 4:23 p.m., Satish Saley wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68799/
> ---
> 
> (Updated Sept. 26, 2018, 4:23 p.m.)
> 
> 
> Review request for pig.
> 
> 
> Repository: pig-git
> 
> 
> Description
> ---
> 
> [PIG-3038] Support for Credentials for UDF,Loader and Storer
> 
> 
> Diffs
> -
> 
>   src/org/apache/pig/EvalFunc.java fd139a8b4 
>   src/org/apache/pig/LoadFunc.java 83e89a34c 
>   src/org/apache/pig/StoreFuncInterface.java c590084dc 
>   
> src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java
>  4d3ab5086 
>   
> src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java
>  2c8dea608 
>   src/org/apache/pig/backend/hadoop/executionengine/tez/TezDagBuilder.java 
> f292487f0 
>   
> src/org/apache/pig/backend/hadoop/executionengine/tez/plan/optimizer/LoaderProcessor.java
>  7a12df784 
>   src/org/apache/pig/backend/hadoop/hbase/HBaseStorage.java 98040382f 
>   test/org/apache/pig/test/TestCredentials.java PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/68799/diff/5/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Satish Saley
> 
>



Re: Review Request 68799: [PIG-3038] Support for Credentials for UDF, Loader and Storer

2018-09-26 Thread Satish Saley

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68799/
---

(Updated Sept. 26, 2018, 9:23 a.m.)


Review request for pig.


Repository: pig-git


Description
---

[PIG-3038] Support for Credentials for UDF,Loader and Storer


Diffs (updated)
-

  src/org/apache/pig/EvalFunc.java fd139a8b4 
  src/org/apache/pig/LoadFunc.java 83e89a34c 
  src/org/apache/pig/StoreFuncInterface.java c590084dc 
  
src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java
 4d3ab5086 
  
src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java
 2c8dea608 
  src/org/apache/pig/backend/hadoop/executionengine/tez/TezDagBuilder.java 
f292487f0 
  
src/org/apache/pig/backend/hadoop/executionengine/tez/plan/optimizer/LoaderProcessor.java
 7a12df784 
  src/org/apache/pig/backend/hadoop/hbase/HBaseStorage.java 98040382f 
  test/org/apache/pig/test/TestCredentials.java PRE-CREATION 


Diff: https://reviews.apache.org/r/68799/diff/5/

Changes: https://reviews.apache.org/r/68799/diff/4-5/


Testing
---


Thanks,

Satish Saley



[jira] [Commented] (PIG-5357) BagFactory interface should support creating a distinct bag from a set

2018-09-26 Thread Rohini Palaniswamy (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16628924#comment-16628924
 ] 

Rohini Palaniswamy commented on PIG-5357:
-

All of the internal code now uses InternalDistinctBag instead of 
DistinctDataBag. It will take some more time to review the differences between 
the two to see if anything else is needed before committing the patch.

> BagFactory interface should support creating a distinct bag from a set
> --
>
> Key: PIG-5357
> URL: https://issues.apache.org/jira/browse/PIG-5357
> Project: Pig
>  Issue Type: Improvement
>Reporter: Jacob Tolar
>Priority: Minor
> Attachments: PIG-5357-1.patch, PIG-5357-2.patch
>
>
> It would be nice if BagFactory supported creating a distinct bag from a set 
> of tuples, similar to:
> {code:java}
> newDefaultBag(List<Tuple> listOfTuples);
> {code}
> [https://github.com/apache/pig/blob/trunk/src/org/apache/pig/data/BagFactory.java]
>  
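
For illustration, a minimal sketch of how the proposed overload could be layered 
on top of the existing public API; the helper class and method below are 
hypothetical, not part of BagFactory today:

{code:java}
import java.util.Collection;
import org.apache.pig.data.BagFactory;
import org.apache.pig.data.DataBag;
import org.apache.pig.data.Tuple;

// Hypothetical helper: build a distinct bag from an existing collection of
// tuples using the current BagFactory API (newDistinctBag() de-duplicates on add).
public class DistinctBagFromCollection {
    public static DataBag newDistinctBag(Collection<Tuple> tuples) {
        DataBag bag = BagFactory.getInstance().newDistinctBag();
        for (Tuple t : tuples) {
            bag.add(t);
        }
        return bag;
    }
}
{code}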



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PIG-5357) BagFactory interface should support creating a distinct bag from a set

2018-09-26 Thread Rohini Palaniswamy (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16628911#comment-16628911
 ] 

Rohini Palaniswamy commented on PIG-5357:
-

+1

> BagFactory interface should support creating a distinct bag from a set
> --
>
> Key: PIG-5357
> URL: https://issues.apache.org/jira/browse/PIG-5357
> Project: Pig
>  Issue Type: Improvement
>Reporter: Jacob Tolar
>Priority: Minor
> Attachments: PIG-5357-1.patch, PIG-5357-2.patch
>
>
> It would be nice if BagFactory supported creating a distinct bag from a set 
> of tuples, similar to:
> {code:java}
> newDefaultBag(List<Tuple> listOfTuples);
> {code}
> [https://github.com/apache/pig/blob/trunk/src/org/apache/pig/data/BagFactory.java]
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PIG-3038) Support for Credentials for UDF,Loader and Storer

2018-09-26 Thread Rohini Palaniswamy (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-3038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16628909#comment-16628909
 ] 

Rohini Palaniswamy commented on PIG-3038:
-

bq. The alternative is to have credential-related APIs added to EvalFunc, 
LoadFunc and StoreFunc. To deal with backward compatibility in that case, one 
would have to use reflection to determine whether the method is implemented and 
then call it, which is not ideal.
  We went with this approach since we are moving to Java 8 in 0.18 and can use 
default methods.
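
For illustration only, a minimal sketch of what the default-method approach can 
look like; the interface and method names below are assumptions, not the API 
committed in PIG-3038:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.Credentials;

// Hypothetical interface, for illustration only. A default no-op keeps existing
// UDFs, loaders and storers compiling unchanged, while new implementations can
// override it to add HBase/HCat/Hive tokens to the job's credentials.
public interface CredentialsAware {
    default void addCredentials(Credentials credentials, Configuration conf) {
        // no-op by default
    }
}
{code}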

> Support for Credentials for UDF,Loader and Storer
> -
>
> Key: PIG-3038
> URL: https://issues.apache.org/jira/browse/PIG-3038
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.10.0
>Reporter: Rohini Palaniswamy
>Assignee: Satish Subhashrao Saley
>Priority: Major
> Fix For: 0.18.0
>
>
>   Pig does not have a clean way (APIs) to support adding Credentials (HBase 
> token, HCat/Hive metastore token) to the Job and retrieving them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PIG-3038) Support for Credentials for UDF,Loader and Storer

2018-09-26 Thread Rohini Palaniswamy (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-3038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16628904#comment-16628904
 ] 

Rohini Palaniswamy commented on PIG-3038:
-

[~nkollar] and [~szita],
   Can one of you implement this for Spark as well?
   

> Support for Credentials for UDF,Loader and Storer
> -
>
> Key: PIG-3038
> URL: https://issues.apache.org/jira/browse/PIG-3038
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.10.0
>Reporter: Rohini Palaniswamy
>Assignee: Satish Subhashrao Saley
>Priority: Major
> Fix For: 0.18.0
>
>
>   Pig does not have a clean way (APIs) to support adding Credentials (HBase 
> token, HCat/Hive metastore token) to the Job and retrieving them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Review Request 68799: [PIG-3038] Support for Credentials for UDF, Loader and Storer

2018-09-26 Thread Rohini Palaniswamy

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68799/#review209028
---




test/org/apache/pig/test/TestCredentials.java
Lines 128 (patched)


Can you rename the methods to testCredentialsEvalFunc, testCredentialsLoadFunc 
 and testCredentialsStoreFunc?


- Rohini Palaniswamy


On Sept. 24, 2018, 7:39 p.m., Satish Saley wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68799/
> ---
> 
> (Updated Sept. 24, 2018, 7:39 p.m.)
> 
> 
> Review request for pig.
> 
> 
> Repository: pig-git
> 
> 
> Description
> ---
> 
> [PIG-3038] Support for Credentials for UDF,Loader and Storer
> 
> 
> Diffs
> -
> 
>   src/org/apache/pig/EvalFunc.java fd139a8b4 
>   src/org/apache/pig/LoadFunc.java 83e89a34c 
>   src/org/apache/pig/StoreFuncInterface.java c590084dc 
>   
> src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java
>  4d3ab5086 
>   
> src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java
>  2c8dea608 
>   src/org/apache/pig/backend/hadoop/executionengine/tez/TezDagBuilder.java 
> f292487f0 
>   
> src/org/apache/pig/backend/hadoop/executionengine/tez/plan/optimizer/LoaderProcessor.java
>  7a12df784 
>   src/org/apache/pig/backend/hadoop/hbase/HBaseStorage.java 98040382f 
>   test/org/apache/pig/test/TestCredentials.java PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/68799/diff/4/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Satish Saley
> 
>



[jira] Subscription: PIG patch available

2018-09-26 Thread jira
Issue Subscription
Filter: PIG patch available (40 issues)

Subscriber: pigdaily

Key Summary
PIG-5357    BagFactory interface should support creating a distinct bag from a set
        https://issues.apache.org/jira/browse/PIG-5357
PIG-5355    Negative progress report by HBaseTableRecordReader
        https://issues.apache.org/jira/browse/PIG-5355
PIG-5354    Show fieldname and a line number for casting errors
        https://issues.apache.org/jira/browse/PIG-5354
PIG-5342    Add setting to turn off bloom join combiner
        https://issues.apache.org/jira/browse/PIG-5342
PIG-5338    Prevent deep copy of DataBag into Jython List
        https://issues.apache.org/jira/browse/PIG-5338
PIG-5323    Implement LastInputStreamingOptimizer in Tez
        https://issues.apache.org/jira/browse/PIG-5323
PIG-5317    Upgrade old dependencies: commons-lang, hsqldb, commons-logging
        https://issues.apache.org/jira/browse/PIG-5317
PIG-5273    _SUCCESS file should be created at the end of the job
        https://issues.apache.org/jira/browse/PIG-5273
PIG-5267    Review of org.apache.pig.impl.io.BufferedPositionedInputStream
        https://issues.apache.org/jira/browse/PIG-5267
PIG-5256    Bytecode generation for POFilter and POForeach
        https://issues.apache.org/jira/browse/PIG-5256
PIG-5160    SchemaTupleFrontend.java is not thread safe, cause PigServer thrown NPE in multithread env
        https://issues.apache.org/jira/browse/PIG-5160
PIG-5115    Builtin AvroStorage generates incorrect avro schema when the same pig field name appears in the alias
        https://issues.apache.org/jira/browse/PIG-5115
PIG-5106    Optimize when mapreduce.input.fileinputformat.input.dir.recursive set to true
        https://issues.apache.org/jira/browse/PIG-5106
PIG-5081    Can not run pig on spark source code distribution
        https://issues.apache.org/jira/browse/PIG-5081
PIG-5080    Support store alias as spark table
        https://issues.apache.org/jira/browse/PIG-5080
PIG-5057    IndexOutOfBoundsException when pig reducer processOnePackageOutput
        https://issues.apache.org/jira/browse/PIG-5057
PIG-5029    Optimize sort case when data is skewed
        https://issues.apache.org/jira/browse/PIG-5029
PIG-4926    Modify the content of start.xml for spark mode
        https://issues.apache.org/jira/browse/PIG-4926
PIG-4913    Reduce jython function initiation during compilation
        https://issues.apache.org/jira/browse/PIG-4913
PIG-4849    pig on tez will cause tez-ui to crash, because the content from timeline server is too long.
        https://issues.apache.org/jira/browse/PIG-4849
PIG-4750    REPLACE_MULTI should compile Pattern once and reuse it
        https://issues.apache.org/jira/browse/PIG-4750
PIG-4684    Exception should be changed to warning when job diagnostics cannot be fetched
        https://issues.apache.org/jira/browse/PIG-4684
PIG-4656    Improve String serialization and comparator performance in BinInterSedes
        https://issues.apache.org/jira/browse/PIG-4656
PIG-4598    Allow user defined plan optimizer rules
        https://issues.apache.org/jira/browse/PIG-4598
PIG-4551    Partition filter is not pushed down in case of SPLIT
        https://issues.apache.org/jira/browse/PIG-4551
PIG-4539    New PigUnit
        https://issues.apache.org/jira/browse/PIG-4539
PIG-4515    org.apache.pig.builtin.Distinct throws ClassCastException
        https://issues.apache.org/jira/browse/PIG-4515
PIG-4373    Implement PIG-3861 in Tez
        https://issues.apache.org/jira/browse/PIG-4373
PIG-4323    PackageConverter hanging in Spark
        https://issues.apache.org/jira/browse/PIG-4323
PIG-4313    StackOverflowError in LIMIT operation on Spark
        https://issues.apache.org/jira/browse/PIG-4313
PIG-4251    Pig on Storm
        https://issues.apache.org/jira/browse/PIG-4251
PIG-4002    Disable combiner when map-side aggregation is used
        https://issues.apache.org/jira/browse/PIG-4002
PIG-3952    PigStorage accepts '-tagSplit' to return full split information
        https://issues.apache.org/jira/browse/PIG-3952
PIG-3911    Define unique fields with @OutputSchema
        https://issues.apache.org/jira/browse/PIG-3911
PIG-3877    Getting Geo Latitude/Longitude from Address Lines
        https://issues.apache.org/jira/browse/PIG-3877
PIG-3873    Geo distance calculation using Haversine
        https://issues.apache.org/jira/browse/PIG-3873
PIG-3668    COR built-in function when atleast one of the coefficient values is NaN
        https://issues.apache.org/jira/browse/PIG-3668
PIG-3587    add functionality for rolling over dates
        https://issues.apache.org/jira/browse/PIG-3587
PIG-3038    Support for Credentials for UDF,Loader and Storer
        https://issues.apache.org/jira/browse/PIG-3038
PIG-1804    Alow Jython