[jira] Subscription: PIG patch available

2018-12-12 Thread jira
Issue Subscription
Filter: PIG patch available (36 issues)

Subscriber: pigdaily

Key         Summary
PIG-5369    Add llap-client dependency
            https://issues.apache.org/jira/browse/PIG-5369
PIG-5360    Pig sets working directory of input file systems causes exception thrown
            https://issues.apache.org/jira/browse/PIG-5360
PIG-5338    Prevent deep copy of DataBag into Jython List
            https://issues.apache.org/jira/browse/PIG-5338
PIG-5323    Implement LastInputStreamingOptimizer in Tez
            https://issues.apache.org/jira/browse/PIG-5323
PIG-5273    _SUCCESS file should be created at the end of the job
            https://issues.apache.org/jira/browse/PIG-5273
PIG-5267    Review of org.apache.pig.impl.io.BufferedPositionedInputStream
            https://issues.apache.org/jira/browse/PIG-5267
PIG-5256    Bytecode generation for POFilter and POForeach
            https://issues.apache.org/jira/browse/PIG-5256
PIG-5160    SchemaTupleFrontend.java is not thread safe, cause PigServer thrown NPE in multithread env
            https://issues.apache.org/jira/browse/PIG-5160
PIG-5115    Builtin AvroStorage generates incorrect avro schema when the same pig field name appears in the alias
            https://issues.apache.org/jira/browse/PIG-5115
PIG-5106    Optimize when mapreduce.input.fileinputformat.input.dir.recursive set to true
            https://issues.apache.org/jira/browse/PIG-5106
PIG-5081    Can not run pig on spark source code distribution
            https://issues.apache.org/jira/browse/PIG-5081
PIG-5080    Support store alias as spark table
            https://issues.apache.org/jira/browse/PIG-5080
PIG-5057    IndexOutOfBoundsException when pig reducer processOnePackageOutput
            https://issues.apache.org/jira/browse/PIG-5057
PIG-5029    Optimize sort case when data is skewed
            https://issues.apache.org/jira/browse/PIG-5029
PIG-4926    Modify the content of start.xml for spark mode
            https://issues.apache.org/jira/browse/PIG-4926
PIG-4913    Reduce jython function initiation during compilation
            https://issues.apache.org/jira/browse/PIG-4913
PIG-4849    pig on tez will cause tez-ui to crash,because the content from timeline server is too long.
            https://issues.apache.org/jira/browse/PIG-4849
PIG-4750    REPLACE_MULTI should compile Pattern once and reuse it
            https://issues.apache.org/jira/browse/PIG-4750
PIG-4684    Exception should be changed to warning when job diagnostics cannot be fetched
            https://issues.apache.org/jira/browse/PIG-4684
PIG-4656    Improve String serialization and comparator performance in BinInterSedes
            https://issues.apache.org/jira/browse/PIG-4656
PIG-4598    Allow user defined plan optimizer rules
            https://issues.apache.org/jira/browse/PIG-4598
PIG-4551    Partition filter is not pushed down in case of SPLIT
            https://issues.apache.org/jira/browse/PIG-4551
PIG-4539    New PigUnit
            https://issues.apache.org/jira/browse/PIG-4539
PIG-4515    org.apache.pig.builtin.Distinct throws ClassCastException
            https://issues.apache.org/jira/browse/PIG-4515
PIG-4373    Implement PIG-3861 in Tez
            https://issues.apache.org/jira/browse/PIG-4373
PIG-4323    PackageConverter hanging in Spark
            https://issues.apache.org/jira/browse/PIG-4323
PIG-4313    StackOverflowError in LIMIT operation on Spark
            https://issues.apache.org/jira/browse/PIG-4313
PIG-4251    Pig on Storm
            https://issues.apache.org/jira/browse/PIG-4251
PIG-4002    Disable combiner when map-side aggregation is used
            https://issues.apache.org/jira/browse/PIG-4002
PIG-3952    PigStorage accepts '-tagSplit' to return full split information
            https://issues.apache.org/jira/browse/PIG-3952
PIG-3911    Define unique fields with @OutputSchema
            https://issues.apache.org/jira/browse/PIG-3911
PIG-3877    Getting Geo Latitude/Longitude from Address Lines
            https://issues.apache.org/jira/browse/PIG-3877
PIG-3873    Geo distance calculation using Haversine
            https://issues.apache.org/jira/browse/PIG-3873
PIG-3668    COR built-in function when atleast one of the coefficient values is NaN
            https://issues.apache.org/jira/browse/PIG-3668
PIG-3587    add functionality for rolling over dates
            https://issues.apache.org/jira/browse/PIG-3587
PIG-1804    Alow Jython function to implement Algebraic and/or Accumulator interfaces
            https://issues.apache.org/jira/browse/PIG-1804



[jira] [Updated] (PIG-5372) SAMPLE/RANDOM(udf) before skewed join failing with NPE

2018-12-12 Thread Koji Noguchi (JIRA)


 [ https://issues.apache.org/jira/browse/PIG-5372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Noguchi updated PIG-5372:
--
Attachment: pig-5372-v1.patch

The same would happen for SAMPLE, since it internally calls RANDOM.

The NPE happens at 
{code:java}
int jobidhash = PigMapReduce.sJobConfInternal.get().get(MRConfiguration.JOB_ID).hashCode();
{code}
when get(MRConfiguration.JOB_ID) returns null.

This was happening because SkewedPartitioner.setConf was replacing the 
configuration, dropping the original one that held the necessary job info. 
Attaching a patch ({{pig-5372-v1.patch}}) that adds to the conf instead of 
replacing it.
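
The difference between replacing and adding to a configuration can be sketched in plain Java. This is only an illustration of the idea, not the contents of pig-5372-v1.patch: the class {{ConfMergeSketch}}, its methods, and the key names are hypothetical, and {{java.util.Properties}} stands in for Hadoop's Configuration so that the example runs without Hadoop on the classpath.

```java
import java.util.Properties;

public class ConfMergeSketch {
    // Hypothetical key names, used for illustration only.
    static final String JOB_ID = "example.job.id";
    static final String PARTITIONER_KEY = "example.partitioner.setting";

    // Thread-local holder for the current conf (an assumption of this sketch,
    // loosely modelled on the sJobConfInternal.get() call in the snippet above).
    static final ThreadLocal<Properties> CONF = new ThreadLocal<>();

    // Replacing: the holder now points at the incoming conf only,
    // so every entry of the previous conf (including the job id) is lost.
    static void replaceConf(Properties incoming) {
        CONF.set(incoming);
    }

    // Adding: copy the incoming entries into the existing conf,
    // so entries that were already there survive.
    static void addConf(Properties incoming) {
        Properties current = CONF.get();
        if (current == null) {
            CONF.set(incoming);
            return;
        }
        for (String name : incoming.stringPropertyNames()) {
            current.setProperty(name, incoming.getProperty(name));
        }
    }

    // Sets up a conf that carries a job id, applies a partitioner conf
    // without one, and returns the job id that is visible afterwards.
    static String jobIdAfter(boolean replace) {
        Properties jobConf = new Properties();
        jobConf.setProperty(JOB_ID, "job_0001");
        CONF.set(jobConf);

        Properties partitionerConf = new Properties();
        partitionerConf.setProperty(PARTITIONER_KEY, "some-value");

        if (replace) {
            replaceConf(partitionerConf);
        } else {
            addConf(partitionerConf);
        }
        return CONF.get().getProperty(JOB_ID);
    }

    public static void main(String[] args) {
        // A later .hashCode() on the first result would throw an NPE.
        System.out.println("after replace: " + jobIdAfter(true));  // prints "after replace: null"
        System.out.println("after add: " + jobIdAfter(false));     // prints "after add: job_0001"
    }
}
```

In the sketch, only the replacing variant leaves the job id lookup returning null, which is the condition under which calling hashCode() on the result fails.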

> SAMPLE/RANDOM(udf) before skewed join failing with NPE
> --
>
>              Key: PIG-5372
>              URL: https://issues.apache.org/jira/browse/PIG-5372
>          Project: Pig
>       Issue Type: Bug
> Affects Versions: 0.16.0
>         Reporter: Koji Noguchi
>         Assignee: Koji Noguchi
>         Priority: Major
>      Attachments: pig-5372-v1.patch
>

[jira] [Created] (PIG-5372) SAMPLE/RANDOM(udf) before skewed join failing with NPE

2018-12-12 Thread Koji Noguchi (JIRA)
Koji Noguchi created PIG-5372:
-

    Summary: SAMPLE/RANDOM(udf) before skewed join failing with NPE
        Key: PIG-5372
        URL: https://issues.apache.org/jira/browse/PIG-5372
    Project: Pig
 Issue Type: Bug
   Reporter: Koji Noguchi
   Assignee: Koji Noguchi


A short sample script like the one below:
{code}
A = LOAD 'input.txt' AS (a1:int, a2:chararray, a3:int);
B = LOAD 'input.txt' AS (b1:int, b2:chararray, b3:int);

A2 = FOREACH A generate *, RANDOM() as randnum;

D = join A2 by a1, B by b1 using 'skewed' parallel 2;

store D into '$output';
{code}

This fails with an NPE:
{noformat}
2018-12-12 16:06:04,860 [Dispatcher thread: Central] INFO  org.apache.tez.dag.history.HistoryEventHandler - [HISTORY][DAG:dag_1544648742542_0001_1][Event:TASK_FINISHED]: vertexName=scope-55, taskId=task_1544648742542_0001_1_02_00, startTime=1544648745036, finishTime=1544648764857, timeTaken=19821, status=KILLED, successfulAttemptID=null, diagnostics=TaskAttempt 0 failed, info=[Error: Failure while running task:org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing (Name: Local Rearrange[tuple]{int}(false) - scope-29 ->   scope-58 Operator Key: scope-29): org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing [POUserFunc (Name: POUserFunc(org.apache.pig.builtin.RANDOM)[double] - scope-40 Operator Key: scope-40) children: null at []]: java.lang.NullPointerException
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:315)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNextTuple(POLocalRearrange.java:287)
    at org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POLocalRearrangeTez.getNextTuple(POLocalRearrangeTez.java:131)
    at org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.runPipeline(PigProcessor.java:420)
    at org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.run(PigProcessor.java:282)
    at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:337)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
    at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing [POUserFunc (Name: POUserFunc(org.apache.pig.builtin.RANDOM)[double] - scope-40 Operator Key: scope-40) children: null at []]: java.lang.NullPointerException
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:367)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:408)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:325)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:305)
    ... 17 more
Caused by: java.lang.NullPointerException
    at org.apache.pig.builtin.RANDOM.exec(RANDOM.java:51)
    at org.apache.pig.builtin.RANDOM.exec(RANDOM.java:37)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:332)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNextDouble(POUserFunc.java:396)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:343)
    ... 20 more
]
{noformat}




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (PIG-5372) SAMPLE/RANDOM(udf) before skewed join failing with NPE

2018-12-12 Thread Koji Noguchi (JIRA)


 [ https://issues.apache.org/jira/browse/PIG-5372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Noguchi updated PIG-5372:
--
Affects Version/s: 0.16.0

> SAMPLE/RANDOM(udf) before skewed join failing with NPE
> --
>
>              Key: PIG-5372
>              URL: https://issues.apache.org/jira/browse/PIG-5372
>          Project: Pig
>       Issue Type: Bug
> Affects Versions: 0.16.0
>         Reporter: Koji Noguchi
>         Assignee: Koji Noguchi
>         Priority: Major
>