[jira] Subscription: PIG patch available
Issue Subscription
Filter: PIG patch available (36 issues)
Subscriber: pigdaily

Key         Summary
PIG-5369    Add llap-client dependency
            https://issues.apache.org/jira/browse/PIG-5369
PIG-5360    Pig sets working directory of input file systems causes exception thrown
            https://issues.apache.org/jira/browse/PIG-5360
PIG-5338    Prevent deep copy of DataBag into Jython List
            https://issues.apache.org/jira/browse/PIG-5338
PIG-5323    Implement LastInputStreamingOptimizer in Tez
            https://issues.apache.org/jira/browse/PIG-5323
PIG-5273    _SUCCESS file should be created at the end of the job
            https://issues.apache.org/jira/browse/PIG-5273
PIG-5267    Review of org.apache.pig.impl.io.BufferedPositionedInputStream
            https://issues.apache.org/jira/browse/PIG-5267
PIG-5256    Bytecode generation for POFilter and POForeach
            https://issues.apache.org/jira/browse/PIG-5256
PIG-5160    SchemaTupleFrontend.java is not thread safe, cause PigServer thrown NPE in multithread env
            https://issues.apache.org/jira/browse/PIG-5160
PIG-5115    Builtin AvroStorage generates incorrect avro schema when the same pig field name appears in the alias
            https://issues.apache.org/jira/browse/PIG-5115
PIG-5106    Optimize when mapreduce.input.fileinputformat.input.dir.recursive set to true
            https://issues.apache.org/jira/browse/PIG-5106
PIG-5081    Can not run pig on spark source code distribution
            https://issues.apache.org/jira/browse/PIG-5081
PIG-5080    Support store alias as spark table
            https://issues.apache.org/jira/browse/PIG-5080
PIG-5057    IndexOutOfBoundsException when pig reducer processOnePackageOutput
            https://issues.apache.org/jira/browse/PIG-5057
PIG-5029    Optimize sort case when data is skewed
            https://issues.apache.org/jira/browse/PIG-5029
PIG-4926    Modify the content of start.xml for spark mode
            https://issues.apache.org/jira/browse/PIG-4926
PIG-4913    Reduce jython function initiation during compilation
            https://issues.apache.org/jira/browse/PIG-4913
PIG-4849    pig on tez will cause tez-ui to crash,because the content from timeline server is too long.
            https://issues.apache.org/jira/browse/PIG-4849
PIG-4750    REPLACE_MULTI should compile Pattern once and reuse it
            https://issues.apache.org/jira/browse/PIG-4750
PIG-4684    Exception should be changed to warning when job diagnostics cannot be fetched
            https://issues.apache.org/jira/browse/PIG-4684
PIG-4656    Improve String serialization and comparator performance in BinInterSedes
            https://issues.apache.org/jira/browse/PIG-4656
PIG-4598    Allow user defined plan optimizer rules
            https://issues.apache.org/jira/browse/PIG-4598
PIG-4551    Partition filter is not pushed down in case of SPLIT
            https://issues.apache.org/jira/browse/PIG-4551
PIG-4539    New PigUnit
            https://issues.apache.org/jira/browse/PIG-4539
PIG-4515    org.apache.pig.builtin.Distinct throws ClassCastException
            https://issues.apache.org/jira/browse/PIG-4515
PIG-4373    Implement PIG-3861 in Tez
            https://issues.apache.org/jira/browse/PIG-4373
PIG-4323    PackageConverter hanging in Spark
            https://issues.apache.org/jira/browse/PIG-4323
PIG-4313    StackOverflowError in LIMIT operation on Spark
            https://issues.apache.org/jira/browse/PIG-4313
PIG-4251    Pig on Storm
            https://issues.apache.org/jira/browse/PIG-4251
PIG-4002    Disable combiner when map-side aggregation is used
            https://issues.apache.org/jira/browse/PIG-4002
PIG-3952    PigStorage accepts '-tagSplit' to return full split information
            https://issues.apache.org/jira/browse/PIG-3952
PIG-3911    Define unique fields with @OutputSchema
            https://issues.apache.org/jira/browse/PIG-3911
PIG-3877    Getting Geo Latitude/Longitude from Address Lines
            https://issues.apache.org/jira/browse/PIG-3877
PIG-3873    Geo distance calculation using Haversine
            https://issues.apache.org/jira/browse/PIG-3873
PIG-3668    COR built-in function when atleast one of the coefficient values is NaN
            https://issues.apache.org/jira/browse/PIG-3668
PIG-3587    add functionality for rolling over dates
            https://issues.apache.org/jira/browse/PIG-3587
PIG-1804    Alow Jython function to implement Algebraic and/or Accumulator interfaces
            https://issues.apache.org/jira/browse/PIG-1804

You may edit this subscription at:
https://issues.apache.org/jira/secure/EditSubscription!default.jspa?subId=16328=12322384
[jira] [Updated] (PIG-5372) SAMPLE/RANDOM(udf) before skewed join failing with NPE
[ https://issues.apache.org/jira/browse/PIG-5372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Noguchi updated PIG-5372:
------------------------------
    Attachment: pig-5372-v1.patch

The same would happen for SAMPLE, since it internally calls RANDOM.

The NPE happens at
{code:java}
int jobidhash = PigMapReduce.sJobConfInternal.get().get(MRConfiguration.JOB_ID).hashCode();
{code}
when the JOB_ID lookup returns null.

This happened because SkewedPartitioner.setConf was replacing the configuration, dropping the original one that carried the necessary job info.

Attaching a patch ({{pig-5372-v1.patch}}) that adds to the conf instead of replacing it.

> SAMPLE/RANDOM(udf) before skewed join failing with NPE
> ------------------------------------------------------
>
>                 Key: PIG-5372
>                 URL: https://issues.apache.org/jira/browse/PIG-5372
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.16.0
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Major
>         Attachments: pig-5372-v1.patch
>
>
> Sample short code like below
> {code}
> A = LOAD 'input.txt' AS (a1:int, a2:chararray, a3:int);
> B = LOAD 'input.txt' AS (b1:int, b2:chararray, b3:int);
> A2 = FOREACH A generate *, RANDOM() as randnum;
> D = join A2 by a1, B by b1 using 'skewed' parallel 2;
> store D into '$output';
> {code}
> Fails with NPE.
> {noformat}
> 2018-12-12 16:06:04,860 [Dispatcher thread: Central] INFO org.apache.tez.dag.history.HistoryEventHandler - [HISTORY][DAG:dag_1544648742542_0001_1][Event:TASK_FINISHED]: vertexName=scope-55, taskId=task_1544648742542_0001_1_02_00, startTime=1544648745036, finishTime=1544648764857, timeTaken=19821, status=KILLED, successfulAttemptID=null, diagnostics=TaskAttempt 0 failed, info=[Error: Failure while running task:org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing (Name: Local Rearrange[tuple]{int}(false) - scope-29 -> scope-58 Operator Key: scope-29): org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing [POUserFunc (Name: POUserFunc(org.apache.pig.builtin.RANDOM)[double] - scope-40 Operator Key: scope-40) children: null at []]: java.lang.NullPointerException
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:315)
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNextTuple(POLocalRearrange.java:287)
>         at org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POLocalRearrangeTez.getNextTuple(POLocalRearrangeTez.java:131)
>         at org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.runPipeline(PigProcessor.java:420)
>         at org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.run(PigProcessor.java:282)
>         at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:337)
>         at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
>         at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
>         at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
>         at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
>         at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing [POUserFunc (Name: POUserFunc(org.apache.pig.builtin.RANDOM)[double] - scope-40 Operator Key: scope-40) children: null at []]: java.lang.NullPointerException
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:367)
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:408)
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:325)
>         at
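
The replace-versus-add distinction described in the comment on PIG-5372 can be illustrated with a minimal, self-contained sketch. This is not the content of pig-5372-v1.patch and does not use Pig or Hadoop classes: `ConfMergeSketch`, `JOB_CONF`, the `"job.id"` key, and both `setConf...` helpers are hypothetical stand-ins, with a plain `Map` playing the role of the job configuration. It only shows why a lookup followed by `.hashCode()` fails once the original entries are dropped.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the configuration handling; not Pig's actual classes. */
public class ConfMergeSketch {

    // Hypothetical key name standing in for MRConfiguration.JOB_ID.
    static final String JOB_ID = "job.id";

    // Thread-local configuration, loosely modelled on PigMapReduce.sJobConfInternal.
    static final ThreadLocal<Map<String, String>> JOB_CONF = new ThreadLocal<>();

    /** Replacing: previously stored entries, including the job id, are lost. */
    static void setConfReplacing(Map<String, String> incoming) {
        JOB_CONF.set(new HashMap<>(incoming));
    }

    /** Adding: incoming entries are merged into the existing configuration. */
    static void setConfAdding(Map<String, String> incoming) {
        Map<String, String> current = JOB_CONF.get();
        if (current == null) {
            current = new HashMap<>();
            JOB_CONF.set(current);
        }
        current.putAll(incoming);
    }

    /** Mirrors the failing expression: throws NullPointerException if the job id is absent. */
    static int jobIdHash() {
        return JOB_CONF.get().get(JOB_ID).hashCode();
    }

    public static void main(String[] args) {
        Map<String, String> original = new HashMap<>();
        original.put(JOB_ID, "job_0001");
        Map<String, String> partitionerConf = new HashMap<>();
        partitionerConf.put("partitioner.option", "x");

        // Adding keeps the job id, so the hash can be computed.
        JOB_CONF.set(new HashMap<>(original));
        setConfAdding(partitionerConf);
        System.out.println("adding keeps the job id, hash = " + jobIdHash());

        // Replacing drops the job id, so the same expression throws.
        JOB_CONF.set(new HashMap<>(original));
        setConfReplacing(partitionerConf);
        try {
            jobIdHash();
        } catch (NullPointerException e) {
            System.out.println("replacing dropped the job id: NullPointerException");
        }
    }
}
```

Whether the real fix merges entries in this way or by some other means is not stated in the message; the sketch only demonstrates the failure mode and one way of avoiding it.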
[jira] [Created] (PIG-5372) SAMPLE/RANDOM(udf) before skewed join failing with NPE
Koji Noguchi created PIG-5372:
---------------------------------

             Summary: SAMPLE/RANDOM(udf) before skewed join failing with NPE
                 Key: PIG-5372
                 URL: https://issues.apache.org/jira/browse/PIG-5372
             Project: Pig
          Issue Type: Bug
            Reporter: Koji Noguchi
            Assignee: Koji Noguchi


Sample short code like below
{code}
A = LOAD 'input.txt' AS (a1:int, a2:chararray, a3:int);
B = LOAD 'input.txt' AS (b1:int, b2:chararray, b3:int);
A2 = FOREACH A generate *, RANDOM() as randnum;
D = join A2 by a1, B by b1 using 'skewed' parallel 2;
store D into '$output';
{code}
Fails with NPE.
{noformat}
2018-12-12 16:06:04,860 [Dispatcher thread: Central] INFO org.apache.tez.dag.history.HistoryEventHandler - [HISTORY][DAG:dag_1544648742542_0001_1][Event:TASK_FINISHED]: vertexName=scope-55, taskId=task_1544648742542_0001_1_02_00, startTime=1544648745036, finishTime=1544648764857, timeTaken=19821, status=KILLED, successfulAttemptID=null, diagnostics=TaskAttempt 0 failed, info=[Error: Failure while running task:org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing (Name: Local Rearrange[tuple]{int}(false) - scope-29 -> scope-58 Operator Key: scope-29): org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing [POUserFunc (Name: POUserFunc(org.apache.pig.builtin.RANDOM)[double] - scope-40 Operator Key: scope-40) children: null at []]: java.lang.NullPointerException
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:315)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNextTuple(POLocalRearrange.java:287)
        at org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POLocalRearrangeTez.getNextTuple(POLocalRearrangeTez.java:131)
        at org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.runPipeline(PigProcessor.java:420)
        at org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.run(PigProcessor.java:282)
        at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:337)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
        at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing [POUserFunc (Name: POUserFunc(org.apache.pig.builtin.RANDOM)[double] - scope-40 Operator Key: scope-40) children: null at []]: java.lang.NullPointerException
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:367)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:408)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:325)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:305)
        ... 17 more
Caused by: java.lang.NullPointerException
        at org.apache.pig.builtin.RANDOM.exec(RANDOM.java:51)
        at org.apache.pig.builtin.RANDOM.exec(RANDOM.java:37)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:332)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNextDouble(POUserFunc.java:396)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:343)
        ... 20 more
]
{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Updated] (PIG-5372) SAMPLE/RANDOM(udf) before skewed join failing with NPE
[ https://issues.apache.org/jira/browse/PIG-5372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Noguchi updated PIG-5372:
------------------------------
    Affects Version/s: 0.16.0

> SAMPLE/RANDOM(udf) before skewed join failing with NPE
> ------------------------------------------------------
>
>                 Key: PIG-5372
>                 URL: https://issues.apache.org/jira/browse/PIG-5372
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.16.0
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Major
>
> Sample short code like below
> {code}
> A = LOAD 'input.txt' AS (a1:int, a2:chararray, a3:int);
> B = LOAD 'input.txt' AS (b1:int, b2:chararray, b3:int);
> A2 = FOREACH A generate *, RANDOM() as randnum;
> D = join A2 by a1, B by b1 using 'skewed' parallel 2;
> store D into '$output';
> {code}
> Fails with NPE.
> {noformat}
> 2018-12-12 16:06:04,860 [Dispatcher thread: Central] INFO org.apache.tez.dag.history.HistoryEventHandler - [HISTORY][DAG:dag_1544648742542_0001_1][Event:TASK_FINISHED]: vertexName=scope-55, taskId=task_1544648742542_0001_1_02_00, startTime=1544648745036, finishTime=1544648764857, timeTaken=19821, status=KILLED, successfulAttemptID=null, diagnostics=TaskAttempt 0 failed, info=[Error: Failure while running task:org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing (Name: Local Rearrange[tuple]{int}(false) - scope-29 -> scope-58 Operator Key: scope-29): org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing [POUserFunc (Name: POUserFunc(org.apache.pig.builtin.RANDOM)[double] - scope-40 Operator Key: scope-40) children: null at []]: java.lang.NullPointerException
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:315)
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNextTuple(POLocalRearrange.java:287)
>         at org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POLocalRearrangeTez.getNextTuple(POLocalRearrangeTez.java:131)
>         at org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.runPipeline(PigProcessor.java:420)
>         at org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.run(PigProcessor.java:282)
>         at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:337)
>         at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
>         at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
>         at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
>         at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
>         at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing [POUserFunc (Name: POUserFunc(org.apache.pig.builtin.RANDOM)[double] - scope-40 Operator Key: scope-40) children: null at []]: java.lang.NullPointerException
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:367)
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:408)
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:325)
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:305)
>         ... 17 more
> Caused by: java.lang.NullPointerException
>         at org.apache.pig.builtin.RANDOM.exec(RANDOM.java:51)
>         at org.apache.pig.builtin.RANDOM.exec(RANDOM.java:37)
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:332)
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNextDouble(POUserFunc.java:396)
>         at