[jira] Subscription: PIG patch available

2019-02-27 Thread jira
Issue Subscription
Filter: PIG patch available (36 issues)

Subscriber: pigdaily

Key Summary
PIG-5377Move supportsParallelWriteToStoreLocation from StoreFunc to 
StoreFuncInterfce
https://issues.apache.org/jira/browse/PIG-5377
PIG-5369Add llap-client dependency
https://issues.apache.org/jira/browse/PIG-5369
PIG-5360Pig sets working directory of input file systems causes exception 
thrown
https://issues.apache.org/jira/browse/PIG-5360
PIG-5338Prevent deep copy of DataBag into Jython List
https://issues.apache.org/jira/browse/PIG-5338
PIG-5323Implement LastInputStreamingOptimizer in Tez
https://issues.apache.org/jira/browse/PIG-5323
PIG-5273_SUCCESS file should be created at the end of the job
https://issues.apache.org/jira/browse/PIG-5273
PIG-5256Bytecode generation for POFilter and POForeach
https://issues.apache.org/jira/browse/PIG-5256
PIG-5160SchemaTupleFrontend.java is not thread safe, cause PigServer thrown 
NPE in multithread env
https://issues.apache.org/jira/browse/PIG-5160
PIG-5115Builtin AvroStorage generates incorrect avro schema when the same 
pig field name appears in the alias
https://issues.apache.org/jira/browse/PIG-5115
PIG-5106Optimize when mapreduce.input.fileinputformat.input.dir.recursive 
set to true
https://issues.apache.org/jira/browse/PIG-5106
PIG-5081Can not run pig on spark source code distribution
https://issues.apache.org/jira/browse/PIG-5081
PIG-5080Support store alias as spark table
https://issues.apache.org/jira/browse/PIG-5080
PIG-5057IndexOutOfBoundsException when pig reducer processOnePackageOutput
https://issues.apache.org/jira/browse/PIG-5057
PIG-5029Optimize sort case when data is skewed
https://issues.apache.org/jira/browse/PIG-5029
PIG-4926Modify the content of start.xml for spark mode
https://issues.apache.org/jira/browse/PIG-4926
PIG-4913Reduce jython function initiation during compilation
https://issues.apache.org/jira/browse/PIG-4913
PIG-4849pig on tez will cause tez-ui to crash,because the content from 
timeline server is too long. 
https://issues.apache.org/jira/browse/PIG-4849
PIG-4750REPLACE_MULTI should compile Pattern once and reuse it
https://issues.apache.org/jira/browse/PIG-4750
PIG-4684Exception should be changed to warning when job diagnostics cannot 
be fetched
https://issues.apache.org/jira/browse/PIG-4684
PIG-4656Improve String serialization and comparator performance in 
BinInterSedes
https://issues.apache.org/jira/browse/PIG-4656
PIG-4598Allow user defined plan optimizer rules
https://issues.apache.org/jira/browse/PIG-4598
PIG-4551Partition filter is not pushed down in case of SPLIT
https://issues.apache.org/jira/browse/PIG-4551
PIG-4539New PigUnit
https://issues.apache.org/jira/browse/PIG-4539
PIG-4515org.apache.pig.builtin.Distinct throws ClassCastException
https://issues.apache.org/jira/browse/PIG-4515
PIG-4373Implement PIG-3861 in Tez
https://issues.apache.org/jira/browse/PIG-4373
PIG-4323PackageConverter hanging in Spark
https://issues.apache.org/jira/browse/PIG-4323
PIG-4313StackOverflowError in LIMIT operation on Spark
https://issues.apache.org/jira/browse/PIG-4313
PIG-4251Pig on Storm
https://issues.apache.org/jira/browse/PIG-4251
PIG-4002Disable combiner when map-side aggregation is used
https://issues.apache.org/jira/browse/PIG-4002
PIG-3952PigStorage accepts '-tagSplit' to return full split information
https://issues.apache.org/jira/browse/PIG-3952
PIG-3911Define unique fields with @OutputSchema
https://issues.apache.org/jira/browse/PIG-3911
PIG-3877Getting Geo Latitude/Longitude from Address Lines
https://issues.apache.org/jira/browse/PIG-3877
PIG-3873Geo distance calculation using Haversine
https://issues.apache.org/jira/browse/PIG-3873
PIG-3668COR built-in function when atleast one of the coefficient values is 
NaN
https://issues.apache.org/jira/browse/PIG-3668
PIG-3587add functionality for rolling over dates
https://issues.apache.org/jira/browse/PIG-3587
PIG-1804Alow Jython function to implement Algebraic and/or Accumulator 
interfaces
https://issues.apache.org/jira/browse/PIG-1804

You may edit this subscription at:
https://issues.apache.org/jira/secure/EditSubscription!default.jspa?subId=16328=12322384


[jira] [Updated] (PIG-5380) SortedDataBag hitting ConcurrentModificationException or producing incorrect output in a corner-case

2019-02-27 Thread Koji Noguchi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-5380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated PIG-5380:
--
Attachment: pig-5380-v01.patch

Attaching a patch {{pig-5380-v01.patch}}.
Without the change to SortedDataBag, test cases will fail with
{noformat}
Testcase: testSortedSpillDuringPriorityQueueCreation took 0.213 sec
Caused an ERROR
null
java.util.ConcurrentModificationException
at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901)
at java.util.ArrayList$Itr.next(ArrayList.java:851)
at 
org.apache.pig.data.SortedDataBag$SortedDataBagIterator.readFromPriorityQ(SortedDataBag.java:348)
at 
org.apache.pig.data.SortedDataBag$SortedDataBagIterator.next(SortedDataBag.java:322)
at 
org.apache.pig.data.SortedDataBag$SortedDataBagIterator.hasNext(SortedDataBag.java:235)
at 
org.apache.pig.test.TestDataBag.testSortedSpillDuringPriorityQueueCreation(TestDataBag.java:1333)

{noformat}
and
{noformat}
Testcase: testSortedSpillDuringPriorityQueueCreation2 took 1.012 sec
FAILED
tuples should be the same expected:<(-2147483648)> but was:<(-2055861747)>
junit.framework.AssertionFailedError: tuples should be the same 
expected:<(-2147483648)> but was:<(-2055861747)>
at 
org.apache.pig.test.TestDataBag.testSortedSpillDuringPriorityQueueCreation2(TestDataBag.java:1419)

Testcase: testSortedFirstSpillDuringRead took 0.003 sec
{noformat}
Basically ConcurrentModificationException can happen when new spill file is 
added while SortedDataBag is creating a priority queue at 
 
[https://github.com/apache/pig/blob/01b7a50657b46d346f0a8f472c92fdba72819a24/src/org/apache/pig/data/SortedDataBag.java#L344-L360]

and missing value can happen when spilling occurs after files are read but 
before memory is being checked at 
 
[https://github.com/apache/pig/blob/01b7a50657b46d346f0a8f472c92fdba72819a24/src/org/apache/pig/data/SortedDataBag.java#L361]
Also the smallest value has to be in memory.

In short, ConcurrentModificationException can happen when there are a lot of 
spills but chances of missing value is very small. Please note that test cases 
may not reliably fail. I tried to insert a short sleep to increase the chances 
of reproducing these race conditions.

Also, note that we probably didn't observe these bugs since our framework 
stopped using SortedDataBag a long time back when we switched to using 
InternalSortedBag.

> SortedDataBag hitting ConcurrentModificationException or producing incorrect 
> output in a corner-case 
> -
>
> Key: PIG-5380
> URL: https://issues.apache.org/jira/browse/PIG-5380
> Project: Pig
>  Issue Type: Bug
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Major
> Attachments: pig-5380-v01.patch
>
>
> User had a UDF that created large SortedDataBag.  This UDF was failing with 
> {noformat}
> java.util.ConcurrentModificationException
>   at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901)
>   at java.util.ArrayList$Itr.next(ArrayList.java:851)
>   at 
> org.apache.pig.data.SortedDataBag$SortedDataBagIterator.readFromPriorityQ(SortedDataBag.java:346)
>   at 
> org.apache.pig.data.SortedDataBag$SortedDataBagIterator.next(SortedDataBag.java:322)
>   at 
> org.apache.pig.data.SortedDataBag$SortedDataBagIterator.hasNext(SortedDataBag.java:235)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (PIG-5380) SortedDataBag hitting ConcurrentModificationException or producing incorrect output in a corner-case

2019-02-27 Thread Koji Noguchi (JIRA)
Koji Noguchi created PIG-5380:
-

 Summary: SortedDataBag hitting ConcurrentModificationException or 
producing incorrect output in a corner-case 
 Key: PIG-5380
 URL: https://issues.apache.org/jira/browse/PIG-5380
 Project: Pig
  Issue Type: Bug
Reporter: Koji Noguchi
Assignee: Koji Noguchi


User had a UDF that created large SortedDataBag.  This UDF was failing with 

{noformat}
java.util.ConcurrentModificationException
  at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901)
  at java.util.ArrayList$Itr.next(ArrayList.java:851)
  at 
org.apache.pig.data.SortedDataBag$SortedDataBagIterator.readFromPriorityQ(SortedDataBag.java:346)
  at 
org.apache.pig.data.SortedDataBag$SortedDataBagIterator.next(SortedDataBag.java:322)
  at 
org.apache.pig.data.SortedDataBag$SortedDataBagIterator.hasNext(SortedDataBag.java:235)
{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)