[jira] Subscription: PIG patch available

2018-11-27 Thread jira
Issue Subscription
Filter: PIG patch available (36 issues)

Subscriber: pigdaily

Key       Summary
PIG-5369  Add llap-client dependency
https://issues.apache.org/jira/browse/PIG-5369
PIG-5360  Pig sets working directory of input file systems causes exception thrown
https://issues.apache.org/jira/browse/PIG-5360
PIG-5338  Prevent deep copy of DataBag into Jython List
https://issues.apache.org/jira/browse/PIG-5338
PIG-5323  Implement LastInputStreamingOptimizer in Tez
https://issues.apache.org/jira/browse/PIG-5323
PIG-5273  _SUCCESS file should be created at the end of the job
https://issues.apache.org/jira/browse/PIG-5273
PIG-5267  Review of org.apache.pig.impl.io.BufferedPositionedInputStream
https://issues.apache.org/jira/browse/PIG-5267
PIG-5256  Bytecode generation for POFilter and POForeach
https://issues.apache.org/jira/browse/PIG-5256
PIG-5160  SchemaTupleFrontend.java is not thread safe, cause PigServer thrown NPE in multithread env
https://issues.apache.org/jira/browse/PIG-5160
PIG-5115  Builtin AvroStorage generates incorrect avro schema when the same pig field name appears in the alias
https://issues.apache.org/jira/browse/PIG-5115
PIG-5106  Optimize when mapreduce.input.fileinputformat.input.dir.recursive set to true
https://issues.apache.org/jira/browse/PIG-5106
PIG-5081  Can not run pig on spark source code distribution
https://issues.apache.org/jira/browse/PIG-5081
PIG-5080  Support store alias as spark table
https://issues.apache.org/jira/browse/PIG-5080
PIG-5057  IndexOutOfBoundsException when pig reducer processOnePackageOutput
https://issues.apache.org/jira/browse/PIG-5057
PIG-5029  Optimize sort case when data is skewed
https://issues.apache.org/jira/browse/PIG-5029
PIG-4926  Modify the content of start.xml for spark mode
https://issues.apache.org/jira/browse/PIG-4926
PIG-4913  Reduce jython function initiation during compilation
https://issues.apache.org/jira/browse/PIG-4913
PIG-4849  pig on tez will cause tez-ui to crash,because the content from timeline server is too long.
https://issues.apache.org/jira/browse/PIG-4849
PIG-4750  REPLACE_MULTI should compile Pattern once and reuse it
https://issues.apache.org/jira/browse/PIG-4750
PIG-4684  Exception should be changed to warning when job diagnostics cannot be fetched
https://issues.apache.org/jira/browse/PIG-4684
PIG-4656  Improve String serialization and comparator performance in BinInterSedes
https://issues.apache.org/jira/browse/PIG-4656
PIG-4598  Allow user defined plan optimizer rules
https://issues.apache.org/jira/browse/PIG-4598
PIG-4551  Partition filter is not pushed down in case of SPLIT
https://issues.apache.org/jira/browse/PIG-4551
PIG-4539  New PigUnit
https://issues.apache.org/jira/browse/PIG-4539
PIG-4515  org.apache.pig.builtin.Distinct throws ClassCastException
https://issues.apache.org/jira/browse/PIG-4515
PIG-4373  Implement PIG-3861 in Tez
https://issues.apache.org/jira/browse/PIG-4373
PIG-4323  PackageConverter hanging in Spark
https://issues.apache.org/jira/browse/PIG-4323
PIG-4313  StackOverflowError in LIMIT operation on Spark
https://issues.apache.org/jira/browse/PIG-4313
PIG-4251  Pig on Storm
https://issues.apache.org/jira/browse/PIG-4251
PIG-4002  Disable combiner when map-side aggregation is used
https://issues.apache.org/jira/browse/PIG-4002
PIG-3952  PigStorage accepts '-tagSplit' to return full split information
https://issues.apache.org/jira/browse/PIG-3952
PIG-3911  Define unique fields with @OutputSchema
https://issues.apache.org/jira/browse/PIG-3911
PIG-3877  Getting Geo Latitude/Longitude from Address Lines
https://issues.apache.org/jira/browse/PIG-3877
PIG-3873  Geo distance calculation using Haversine
https://issues.apache.org/jira/browse/PIG-3873
PIG-3668  COR built-in function when atleast one of the coefficient values is NaN
https://issues.apache.org/jira/browse/PIG-3668
PIG-3587  add functionality for rolling over dates
https://issues.apache.org/jira/browse/PIG-3587
PIG-1804  Alow Jython function to implement Algebraic and/or Accumulator interfaces
https://issues.apache.org/jira/browse/PIG-1804

You may edit this subscription at:
https://issues.apache.org/jira/secure/EditSubscription!default.jspa?subId=16328&filterId=12322384


[jira] [Comment Edited] (PIG-5370) Union onschema + columnprune dropping used fields

2018-11-27 Thread Koji Noguchi (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-5370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16701435#comment-16701435
 ] 

Koji Noguchi edited comment on PIG-5370 at 11/28/18 6:22 AM:
-

I can think of two different approaches.

\(i) Even for overlapping uids at different nesting levels, do not allow them and
 force IdentityColumn. This way, all uids will be unique.

(ii) Change the LOUnion uidMapping logic from (output_uid, input_uid) lists to
 (output_uid, nested_uids).

Attaching a patch that tries (ii). If possible, I'd like to avoid \(i), which
 already creates more uids to keep track of.

Taking one relation as an example:
{noformat}
B: (Name: LOForEach Schema: 
A#36:bag{#37:tuple(a1#9:int,a2#*10*:chararray,a3#*11*:int)},a2#*10*:chararray,a3#*11*:int)
{noformat}
Before the patch, the input_uids 36, 9, 10, 11, 10, 11 were used for uidMapping.

After the patch, it will use the nested_uids _36, _36_9, _36_10, _36_11, _10, _11.

This way, there won't be any incorrect list lookup.

 [~daijy], would this approach work? 
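
To make the mapping concrete, here is a purely illustrative Java sketch (a hypothetical helper, not the actual LOUnion/uidMapping code in the patch) of how a nested uid key could be derived so that a uid reached through an enclosing bag (e.g. 10 inside bag 36) maps to a different key than the same uid at the top level:
{code:java}
import java.util.ArrayList;
import java.util.List;

public class NestedUidExample {
    // Prefix the uid with the chain of enclosing uids, outermost first.
    static String nestedUid(List<Long> enclosingUids, long uid) {
        StringBuilder key = new StringBuilder();
        for (long parent : enclosingUids) {
            key.append('_').append(parent);
        }
        return key.append('_').append(uid).toString();
    }

    public static void main(String[] args) {
        List<Long> insideBag36 = new ArrayList<Long>();
        insideBag36.add(36L);
        // a2 has uid 10 both inside bag 36 and at the top level,
        // but the nested keys no longer collide.
        System.out.println(nestedUid(insideBag36, 10));           // _36_10
        System.out.println(nestedUid(new ArrayList<Long>(), 10)); // _10
    }
}
{code}
With keys of this shape, the two occurrences of uid 10 in the schema above no longer look identical to the union's lookup.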



was (Author: knoguchi):
I can think of two different approaches.

(i) Even for overlapping uids on different nested level, do not allow them and
 force IdentityColumn. This way, all uids will be unique.

(ii) Change LOUnion uidMapping logic from (output_uid, input_uid) lists to
 (output_uid, nested_uids).

Attaching a patch that tries (ii). If possible, I'd like to avoid (i) which is 
already
 creating more uids to keep track.

Taking one relation as example,
{noformat}
B: (Name: LOForEach Schema: 
A#36:bag{#37:tuple(a1#9:int,a2#*10*:chararray,a3#*11*:int)},a2#*10*:chararray,a3#*11*:int)
{noformat}
Before the patch, input_uid
 36,9,10,11,10,11
were used for uidMapping.

After the patch, it'll use nested_uids,
 _36, _36_9, _36_10, _36_11, _10, _11

This way, there won't be any incorrect list lookup.

 [~daijy], would this approach work? 


> Union onschema + columnprune dropping used fields 
> --
>
> Key: PIG-5370
> URL: https://issues.apache.org/jira/browse/PIG-5370
> Project: Pig
>  Issue Type: Bug
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Major
> Attachments: pig-5370-v1.patch
>
>
> After PIG-5312, the query below started failing.
> {code}
> A = load 'input.txt' as (a1:int, a2:chararray, a3:int);
> B = FOREACH (GROUP A by (a1,a2)) {
> A_FOREACH = FOREACH A GENERATE a2,a3;
> GENERATE A, FLATTEN(A_FOREACH) as (a2,a3);
> }
> C = load 'input2.txt' as (A:bag{tuple:(a1: int,a2: chararray,a3:int)},a2: 
> chararray,a3:int);
> D = UNION ONSCHEMA B, C;
> dump D;
> {code}
> {code:title=input1.txt}
> 1   a   3
> 2   b   4
> 2   c   5
> 1   a   6
> 2   b   7
> 1   c   8
> {code}
> {code:title=input2.txt}
> {(10,a0,30),(20,b0,40)} zzz 222
> {code}
> {noformat:title=Expected output}
> ({(10,a0,30),(20,b0,40)},zzz,222)
> ({(1,a,6),(1,a,3)},a,6)
> ({(1,a,6),(1,a,3)},a,3)
> ({(1,c,8)},c,8)
> ({(2,b,7),(2,b,4)},b,7)
> ({(2,b,7),(2,b,4)},b,4)
> ({(2,c,5)},c,5)
> {noformat}
> {noformat:title=Actual (incorrect) output}
> ({(10,a0,30),(20,b0,40)})ONLY 1 Field 
> ({(1,a,6),(1,a,3)},a,6)
> ({(1,a,6),(1,a,3)},a,3)
> ({(1,c,8)},c,8)
> ({(2,b,7),(2,b,4)},b,7)
> ({(2,b,7),(2,b,4)},b,4)
> ({(2,c,5)},c,5)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (PIG-5370) Union onschema + columnprune dropping used fields

2018-11-27 Thread Koji Noguchi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-5370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated PIG-5370:
--
Attachment: pig-5370-v1.patch

I can think of two different approaches.

(i) Even for overlapping uids at different nesting levels, do not allow them and
 force IdentityColumn. This way, all uids will be unique.

(ii) Change the LOUnion uidMapping logic from (output_uid, input_uid) lists to
 (output_uid, nested_uids).

Attaching a patch that tries (ii). If possible, I'd like to avoid (i), which
 already creates more uids to keep track of.

Taking one relation as an example:
{noformat}
B: (Name: LOForEach Schema: 
A#36:bag{#37:tuple(a1#9:int,a2#*10*:chararray,a3#*11*:int)},a2#*10*:chararray,a3#*11*:int)
{noformat}
Before the patch, the input_uids 36, 9, 10, 11, 10, 11 were used for uidMapping.

After the patch, it will use the nested_uids _36, _36_9, _36_10, _36_11, _10, _11.

This way, there won't be any incorrect list lookup.

 [~daijy], would this approach work? 


> Union onschema + columnprune dropping used fields 
> --
>
> Key: PIG-5370
> URL: https://issues.apache.org/jira/browse/PIG-5370
> Project: Pig
>  Issue Type: Bug
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Major
> Attachments: pig-5370-v1.patch
>
>
> After PIG-5312, the query below started failing.
> {code}
> A = load 'input.txt' as (a1:int, a2:chararray, a3:int);
> B = FOREACH (GROUP A by (a1,a2)) {
> A_FOREACH = FOREACH A GENERATE a2,a3;
> GENERATE A, FLATTEN(A_FOREACH) as (a2,a3);
> }
> C = load 'input2.txt' as (A:bag{tuple:(a1: int,a2: chararray,a3:int)},a2: 
> chararray,a3:int);
> D = UNION ONSCHEMA B, C;
> dump D;
> {code}
> {code:title=input1.txt}
> 1   a   3
> 2   b   4
> 2   c   5
> 1   a   6
> 2   b   7
> 1   c   8
> {code}
> {code:title=input2.txt}
> {(10,a0,30),(20,b0,40)} zzz 222
> {code}
> {noformat:title=Expected output}
> ({(10,a0,30),(20,b0,40)},zzz,222)
> ({(1,a,6),(1,a,3)},a,6)
> ({(1,a,6),(1,a,3)},a,3)
> ({(1,c,8)},c,8)
> ({(2,b,7),(2,b,4)},b,7)
> ({(2,b,7),(2,b,4)},b,4)
> ({(2,c,5)},c,5)
> {noformat}
> {noformat:title=Actual (incorrect) output}
> ({(10,a0,30),(20,b0,40)})ONLY 1 Field 
> ({(1,a,6),(1,a,3)},a,6)
> ({(1,a,6),(1,a,3)},a,3)
> ({(1,c,8)},c,8)
> ({(2,b,7),(2,b,4)},b,7)
> ({(2,b,7),(2,b,4)},b,4)
> ({(2,c,5)},c,5)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PIG-5362) Parameter substitution of shell cmd results doesn't handle backslash

2018-11-27 Thread Will Lauer (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-5362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16701389#comment-16701389
 ] 

Will Lauer commented on PIG-5362:
-

I've uploaded a new patch that changes how the parser detects and handles 
quoted strings. I think this is probably the correct approach for fixing the 
quoting issues, but I'm not sure yet whether there are any backward 
incompatibilities introduced by this fix.

> Parameter substitution of shell cmd results doesn't handle backslash  
> -
>
> Key: PIG-5362
> URL: https://issues.apache.org/jira/browse/PIG-5362
> Project: Pig
>  Issue Type: Bug
>  Components: parser
>Reporter: Will Lauer
>Assignee: Will Lauer
>Priority: Minor
> Fix For: 0.18.0
>
> Attachments: pig.patch, pig2.patch, pig3.patch, pig4.patch, 
> test-failure.txt
>
>
> It looks like there is a bug in how parameter substitution is handled in 
> PreprocessorContext.java that causes parameter values that contain 
> backslashes to not be processed correctly, resulting in the backslashes being 
> lost. For example, if you had the following:
> {code:java}
> %DECLARE A `echo \$foo\\bar`
> B = LOAD $A 
> {code}
> You would expect the echo command to produce the output {{$foo\bar}} but the 
> actual value that gets substituted is {{\$foobar}}. This is happening because 
> the {{substitute}} method in PreprocessorContext.java uses a regular 
> expression replacement instead of a basic string substitution and $ and \ are 
> special characters. The code attempts to escape $, but does not escape 
> backslash.
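
For readers unfamiliar with Java's replacement rules, here is a minimal, self-contained sketch of the pitfall described above (this is not Pig's PreprocessorContext code; the pattern and variable names are made up for the example): with a regex-based replacement, '$' and '\' in the replacement string are special, so escaping only '$' still loses the backslash, while quoting the whole replacement (e.g. with Matcher.quoteReplacement) preserves it.
{code:java}
import java.util.regex.Matcher;

public class BackslashSubstitution {
    public static void main(String[] args) {
        String line = "B = LOAD $A";
        String value = "$foo\\bar";   // i.e. the shell result $foo\bar

        // Escaping only '$' (roughly the behaviour described in the report):
        // the backslash is still consumed by the regex replacement.
        String dollarOnly = value.replace("$", "\\$");
        System.out.println(line.replaceAll("\\$A", dollarOnly));
        // prints: B = LOAD $foobar   (backslash lost)

        // Quoting the whole replacement keeps both '$' and '\'.
        System.out.println(line.replaceAll("\\$A", Matcher.quoteReplacement(value)));
        // prints: B = LOAD $foo\bar
    }
}
{code}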



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (PIG-5362) Parameter substitution of shell cmd results doesn't handle backslash

2018-11-27 Thread Will Lauer (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-5362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Lauer updated PIG-5362:

Attachment: pig4.patch

> Parameter substitution of shell cmd results doesn't handle backslash  
> -
>
> Key: PIG-5362
> URL: https://issues.apache.org/jira/browse/PIG-5362
> Project: Pig
>  Issue Type: Bug
>  Components: parser
>Reporter: Will Lauer
>Assignee: Will Lauer
>Priority: Minor
> Fix For: 0.18.0
>
> Attachments: pig.patch, pig2.patch, pig3.patch, pig4.patch, 
> test-failure.txt
>
>
> It looks like there is a bug in how parameter substitution is handled in 
> PreprocessorContext.java that causes parameter values that contain 
> backslashes to not be processed correctly, resulting in the backslashes being 
> lost. For example, if you had the following:
> {code:java}
> %DECLARE A `echo \$foo\\bar`
> B = LOAD $A 
> {code}
> You would expect the echo command to produce the output {{$foo\bar}} but the 
> actual value that gets substituted is {{\$foobar}}. This is happening because 
> the {{substitute}} method in PreprocessorContext.java uses a regular 
> expression replacement instead of a basic string substitution and $ and \ are 
> special characters. The code attempts to escape $, but does not escape 
> backslash.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Build failed in Jenkins: Pig-trunk-commit #2585

2018-11-27 Thread Apache Jenkins Server
See 


Changes:

[daijy] PIG-5368: Braces without escaping in regexes throws error in recent 
perl versions (abstractdog via daijy)

[daijy] Revert PIG-5366: Enable PigStreamingDepend to load from current 
directory in newer Perl versions

--
[...truncated 195.76 KB...]
jar:
 [echo] Compiling against Spark 2
Trying to override old definition of task propertycopy
Trying to override old definition of task propertycopy

clean-deps:
Trying to override old definition of task propertycopy
Trying to override old definition of task propertycopy

ivy-download:
  [get] Getting: 
http://repo2.maven.org/maven2/org/apache/ivy/ivy/2.2.0/ivy-2.2.0.jar
  [get] To: 


ivy-init-dirs:
[mkdir] Created dir: 

[mkdir] Created dir: 

[mkdir] Created dir: 

[mkdir] Created dir: 


ivy-probe-antlib:

ivy-init-antlib:

ivy-init:
[ivy:configure] :: Ivy 2.2.0 - 20100923230623 :: http://ant.apache.org/ivy/ ::
[ivy:configure] :: loading settings :: file = 


ivy-resolve:
 [echo] *** Ivy resolve with Hadoop 2, Spark 2 and HBase 1 ***
[ivy:report] DEPRECATED: 'ivy.conf.file' is deprecated, use 'ivy.settings.file' 
instead
[ivy:report] :: loading settings :: file = 

[ivy:report] Processing 
/home/jenkins/.ivy2/cache/org.apache.pig-pig-compile.xml to 

[ivy:report] Processing 
/home/jenkins/.ivy2/cache/org.apache.pig-pig-compile.xml to 


ivy-compile:
[ivy:cachepath] :: resolving dependencies :: org.apache.pig#pig;0.18.0-SNAPSHOT
[ivy:cachepath] confs: [compile]
[ivy:cachepath] found com.sun.jersey#jersey-bundle;1.8 in maven2
[ivy:cachepath] found com.sun.jersey#jersey-server;1.8 in maven2
[ivy:cachepath] found com.sun.jersey.contribs#jersey-guice;1.8 in maven2
[ivy:cachepath] found commons-codec#commons-codec;1.4 in fs
[ivy:cachepath] found commons-configuration#commons-configuration;1.6 
in fs
[ivy:cachepath] found commons-collections#commons-collections;3.2.1 in 
fs
[ivy:cachepath] found javax.servlet#servlet-api;2.5 in fs
[ivy:cachepath] found javax.ws.rs#jsr311-api;1.1.1 in fs
[ivy:cachepath] found com.google.protobuf#protobuf-java;2.5.0 in fs
[ivy:cachepath] found javax.inject#javax.inject;1 in fs
[ivy:cachepath] found javax.xml.bind#jaxb-api;2.2.2 in fs
[ivy:cachepath] found com.sun.xml.bind#jaxb-impl;2.2.3-1 in fs
[ivy:cachepath] found com.google.inject#guice;3.0 in fs
[ivy:cachepath] found com.google.inject.extensions#guice-servlet;3.0 in 
fs
[ivy:cachepath] found aopalliance#aopalliance;1.0 in fs
[ivy:cachepath] found org.glassfish#javax.el;3.0.1-b08 in maven2
[ivy:cachepath] found org.apache.hadoop#hadoop-annotations;2.7.3 in 
maven2
[ivy:cachepath] found org.apache.hadoop#hadoop-auth;2.7.3 in maven2
[ivy:cachepath] found org.apache.hadoop#hadoop-common;2.7.3 in maven2
[ivy:cachepath] found org.apache.hadoop#hadoop-hdfs;2.7.3 in maven2
[ivy:cachepath] found 
org.apache.hadoop#hadoop-mapreduce-client-core;2.7.3 in maven2
[ivy:cachepath] found 
org.apache.hadoop#hadoop-mapreduce-client-jobclient;2.7.3 in maven2
[ivy:cachepath] found org.apache.hadoop#hadoop-yarn-server-tests;2.7.3 
in maven2
[ivy:cachepath] found 
org.apache.hadoop#hadoop-mapreduce-client-app;2.7.3 in maven2
[ivy:cachepath] found 
org.apache.hadoop#hadoop-mapreduce-client-shuffle;2.7.3 in maven2
[ivy:cachepath] found 
org.apache.hadoop#hadoop-mapreduce-client-common;2.7.3 in maven2
[ivy:cachepath] found org.apache.hadoop#hadoop-yarn-api;2.7.3 in maven2
[ivy:cachepath] found org.apache.hadoop#hadoop-yarn-common;2.7.3 in 
maven2
[ivy:cachepath] found org.apache.hadoop#hadoop-yarn-server;2.7.3 in 
maven2
[ivy:cachepath] found 
org.apache.hadoop#hadoop-yarn-server-web-proxy;2.7.3 in maven2
[ivy:cachepath] found org.apache.hadoop#hadoop-yarn-server-common;2.7.3 
in maven2
[ivy:cachepath] found 
org.apache.hadoop#hadoop-yarn-server-nodemanager;2.7.3 in maven2
[ivy:cachepath] found 
org.apache.hadoop#hadoop-yarn-server-resourcemanager;2

[jira] [Commented] (PIG-5371) Hdfs bytes written assertions fail in TestPigRunner

2018-11-27 Thread Rohini Palaniswamy (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-5371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16701160#comment-16701160
 ] 

Rohini Palaniswamy commented on PIG-5371:
-

bq.  It seems like HDFS counter 'HDFS_BYTES_WRITTEN' returns the byte count not 
only for the result of pig store operator, but it includes the size of the jar 
files as well.
  Are you sure that jar files are included in it? It is a job counter. I don't 
understand how that can happen.
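
For context, here is a rough sketch of the kind of byte-count check under discussion (this is not the actual TestPigRunner code; the script name and the exact comparison are assumptions), using the public PigRunner/PigStats API:
{code:java}
import org.apache.pig.PigRunner;
import org.apache.pig.tools.pigstats.OutputStats;
import org.apache.pig.tools.pigstats.PigStats;

public class BytesWrittenCheck {
    public static void main(String[] args) {
        // Run a (hypothetical) script and look at the per-output byte counts.
        PigStats stats = PigRunner.run(new String[] { "simpleTest.pig" }, null);
        for (OutputStats output : stats.getOutputStats()) {
            // Comparing this value against the expected size of the stored
            // result is the fragile part: if HDFS_BYTES_WRITTEN also reflects
            // shipped jars, the comparison breaks intermittently.
            System.out.println(output.getLocation() + " -> " + output.getBytes() + " bytes");
        }
    }
}
{code}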

> Hdfs bytes written assertions fail in TestPigRunner
> ---
>
> Key: PIG-5371
> URL: https://issues.apache.org/jira/browse/PIG-5371
> Project: Pig
>  Issue Type: Bug
>Reporter: Laszlo Bodor
>Assignee: Laszlo Bodor
>Priority: Major
> Attachments: PIG-5371.01.patch, simpleTest.out
>
>
> Attached [^simpleTest.out]. It seems that the HDFS counter 'HDFS_BYTES_WRITTEN' 
> returns the byte count not only for the result of the pig store operator but 
> also includes the size of the jar files. The problem is that this could change 
> very easily, so in my opinion it would be best to remove these assertions 
> from TestPigRunner, as they just cause intermittent and/or persistent failures.
> The test class is for basic testing of PigRunner, and this is achieved well 
> enough without the asserts.
> {code}
> 2018-11-23 10:14:52,661 [IPC Server handler 5 on 54929] INFO  
> org.apache.hadoop.hdfs.StateChange - BLOCK* allocate blk_1073741827_1003, 
> replicas=127.0.0.1:54934, 127.0.0.1:54930, 127.0.0.1:54943 for 
> /tmp/temp-157262781/tmp-1057655772/automaton-1.11-8.jar
> ...
> 2018-11-23 10:14:52,735 [PacketResponder: 
> BP-26001448-10.200.50.195-1542964474138:blk_1073741827_1003, 
> type=HAS_DOWNSTREAM_IN_PIPELINE, downstreams=2:[127.0.0.1:54930, 
> 127.0.0.1:54943]] INFO  
> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace - src: 
> /127.0.0.1:54978, dest: /127.0.0.1:54934, bytes: 176285, op: HDFS_WRITE, 
> cliID: DFSClient_NONMAPREDUCE_-1959727442_1, offset: 0, srvID: 
> 108c4000-1ae0-402e-82cf-bf403629c0f7, blockid: 
> BP-26001448-10.200.50.195-1542964474138:blk_1073741827_1003, duration(ns): 
> 57162859
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PIG-5369) Add llap-client dependency

2018-11-27 Thread Rohini Palaniswamy (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-5369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16701158#comment-16701158
 ] 

Rohini Palaniswamy commented on PIG-5369:
-

bq. Input.class, LlapProxy.class};
  This would break for anyone running with older versions of Hive. You need to 
use reflection.
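
A minimal sketch of the reflection-based approach being suggested (this is not the PIG-5369 patch; the helper and its use are hypothetical, with the class name taken from the stack trace in the description below):
{code:java}
public class LlapProxyLookup {
    private static final String LLAP_PROXY =
            "org.apache.hadoop.hive.llap.io.api.LlapProxy";

    /** Returns the LlapProxy class if present on the classpath, or null for older Hive. */
    static Class<?> findLlapProxy() {
        try {
            return Class.forName(LLAP_PROXY);
        } catch (ClassNotFoundException e) {
            // Older Hive without llap-client: nothing to ship, no hard failure.
            return null;
        }
    }

    public static void main(String[] args) {
        Class<?> llapProxy = findLlapProxy();
        System.out.println(llapProxy == null
                ? "llap-client not available; skipping"
                : "would ship the jar containing " + llapProxy.getName());
    }
}
{code}
Referencing the class only by name keeps Pig compiling and running against Hive versions that do not ship llap-client.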

> Add llap-client dependency
> --
>
> Key: PIG-5369
> URL: https://issues.apache.org/jira/browse/PIG-5369
> Project: Pig
>  Issue Type: Bug
>Reporter: Laszlo Bodor
>Assignee: Laszlo Bodor
>Priority: Major
> Attachments: PIG-5369-1.patch
>
>
> Because of HIVE-20649, pig needs to ship llap-client.jar.
>  
> {code}
> Caused by: java.lang.NoClassDefFoundError: 
> org/apache/hadoop/hive/llap/io/api/LlapProxy
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcFile$WriterOptions.(OrcFile.java:155)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcFile.writerOptions(OrcFile.java:349)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcNewOutputFormat.getRecordWriter(OrcNewOutputFormat.java:76)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.getRecordWriter(PigOutputFormat.java:83)
>   at 
> org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigOutputFormatTez.getRecordWriter(PigOutputFormatTez.java:43)
>   at 
> org.apache.tez.mapreduce.output.MROutput.initWriter(MROutput.java:469)
>   at 
> org.apache.tez.mapreduce.output.MROutput.initialize(MROutput.java:391)
>   at org.apache
> ...
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Reopened] (PIG-5362) Parameter substitution of shell cmd results doesn't handle backslash

2018-11-27 Thread Rohini Palaniswamy (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-5362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy reopened PIG-5362:
-

Reopening it. Will said he had a fix for the issue. 

[~wla...@yahoo-inc.com],
   Can you upload the test fix?

> Parameter substitution of shell cmd results doesn't handle backslash  
> -
>
> Key: PIG-5362
> URL: https://issues.apache.org/jira/browse/PIG-5362
> Project: Pig
>  Issue Type: Bug
>  Components: parser
>Reporter: Will Lauer
>Assignee: Will Lauer
>Priority: Minor
> Fix For: 0.18.0
>
> Attachments: pig.patch, pig2.patch, pig3.patch, test-failure.txt
>
>
> It looks like there is a bug in how parameter substitution is handled in 
> PreprocessorContext.java that causes parameter values that contain 
> backslashes to not be processed correctly, resulting in the backslashes being 
> lost. For example, if you had the following:
> {code:java}
> %DECLARE A `echo \$foo\\bar`
> B = LOAD $A 
> {code}
> You would expect the echo command to produce the output {{$foo\bar}} but the 
> actual value that gets substituted is {{\$foobar}}. This is happening because 
> the {{substitute}} method in PreprocessorContext.java uses a regular 
> expression replacement instead of a basic string substitution and $ and \ are 
> special characters. The code attempts to escape $, but does not escape 
> backslash.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PIG-5362) Parameter substitution of shell cmd results doesn't handle backslash

2018-11-27 Thread Koji Noguchi (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-5362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16701104#comment-16701104
 ] 

Koji Noguchi commented on PIG-5362:
---

bq. There are test failures in TestParamSubPreproc. I have attached test log.

9 failures total
{noformat}
TestParamSubPreproc.testCmdlineFileDeclareCombo
TestParamSubPreproc.testSameParamInMultipleFiles
TestParamSubPreproc.testCmdlineFileComboDuplicate
TestParamSubPreproc.testCmdlineFileCombo
TestParamSubPreproc.testMultipleParamsinSingleLine
TestParamSubPreproc.testCmdlineFileDeclareDefaultComboDuplicates
TestParamSubPreproc.testFileParamsFromMultipleFiles
TestParamSubPreproc.testMultipleDeclareScope
TestParamSubPreproc.testCmdlineFileDeclareComboDuplicates
{noformat}

[~rohini], do we want to revert this?  Or create a new jira?

> Parameter substitution of shell cmd results doesn't handle backslash  
> -
>
> Key: PIG-5362
> URL: https://issues.apache.org/jira/browse/PIG-5362
> Project: Pig
>  Issue Type: Bug
>  Components: parser
>Reporter: Will Lauer
>Assignee: Will Lauer
>Priority: Minor
> Fix For: 0.18.0
>
> Attachments: pig.patch, pig2.patch, pig3.patch, test-failure.txt
>
>
> It looks like there is a bug in how parameter substitution is handled in 
> PreprocessorContext.java that causes parameter values that contain 
> backslashes to not be processed correctly, resulting in the backslashes being 
> lost. For example, if you had the following:
> {code:java}
> %DECLARE A `echo \$foo\\bar`
> B = LOAD $A 
> {code}
> You would expect the echo command to produce the output {{$foo\bar}} but the 
> actual value that gets substituted is {{\$foobar}}. This is happening because 
> the {{substitute}} method in PreprocessorContext.java uses a regular 
> expression replacement instead of a basic string substitution and $ and \ are 
> special characters. The code attempts to escape $, but does not escape 
> backslash.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)