[jira] [Updated] (PIG-5380) SortedDataBag hitting ConcurrentModificationException or producing incorrect output in a corner-case

2019-07-09 Thread Koji Noguchi (JIRA)


[https://issues.apache.org/jira/browse/PIG-5380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]

Koji Noguchi updated PIG-5380:
--
Status: Patch Available  (was: Open)

> SortedDataBag hitting ConcurrentModificationException or producing incorrect output in a corner-case
>
> Key: PIG-5380
> URL: https://issues.apache.org/jira/browse/PIG-5380
> Project: Pig
> Issue Type: Bug
> Reporter: Koji Noguchi
> Assignee: Koji Noguchi
> Priority: Major
> Attachments: pig-5380-v01.patch, pig-5380-v02.patch, pig-5380-v03.patch
>
>
> A user had a UDF that created a large SortedDataBag. This UDF was failing with
> {noformat}
> java.util.ConcurrentModificationException
>   at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901)
>   at java.util.ArrayList$Itr.next(ArrayList.java:851)
>   at org.apache.pig.data.SortedDataBag$SortedDataBagIterator.readFromPriorityQ(SortedDataBag.java:346)
>   at org.apache.pig.data.SortedDataBag$SortedDataBagIterator.next(SortedDataBag.java:322)
>   at org.apache.pig.data.SortedDataBag$SortedDataBagIterator.hasNext(SortedDataBag.java:235)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (PIG-5380) SortedDataBag hitting ConcurrentModificationException or producing incorrect output in a corner-case

2019-07-08 Thread Koji Noguchi (JIRA)


Koji Noguchi updated PIG-5380:
--
Attachment: pig-5380-v03.patch

{quote}
Also, while comparing the two source files, I think I found one section that was missing a lock while reading from memory. Added.
{quote}
Actually, there was more than one. I'm creating (yet another) JIRA for that race condition and updating the patch to focus only on the issue reported in this JIRA.



[jira] [Updated] (PIG-5380) SortedDataBag hitting ConcurrentModificationException or producing incorrect output in a corner-case

2019-07-05 Thread Koji Noguchi (JIRA)


Koji Noguchi updated PIG-5380:
--
Attachment: pig-5380-v02.patch

{quote}
Looking back on the changes, I think I understand that this is a bigger bug than this JIRA, given that all these self-spilling bags are designed on the premise that no other threads would touch them (and thus locking was dropped).
{quote}
This turned out to be false. PIG-5390 has the details.

bq. I think the same change is required in InternalSortedBag as well, as the code is exactly the same and it can spill too.
Made the same changes to InternalSortedBag.
Also, while comparing the two source files, I think I found one section that was missing a lock while reading from memory. Added.
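A minimal sketch of the pattern in the last comment: reading the in-memory contents while holding the same lock the spill path holds, so a reader never iterates a list that a spill is clearing. The class `MemoryReadSketch`, its fields `memory` and `spillFiles`, and the helpers `spillFileCount` and `stressLocked` are hypothetical stand-ins, not Pig's SortedDataBag or InternalSortedBag code, and which monitor the actual patch uses is an assumption here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model (not Pig's classes): "memory" stands in for a bag's
// in-memory tuples and "spillFiles" for its on-disk runs.
class MemoryReadSketch {
    final List<Integer> memory = new ArrayList<>();
    final List<List<Integer>> spillFiles = new ArrayList<>();

    void add(int v) {
        synchronized (memory) {
            memory.add(v);
        }
    }

    // Spilling moves everything out of memory while holding the memory lock.
    void spill() {
        synchronized (memory) {
            spillFiles.add(new ArrayList<>(memory));
            memory.clear();
        }
    }

    // The reader takes the same lock and works on a private copy, so it never
    // observes the list in the middle of a spill.
    List<Integer> readMemoryLocked() {
        synchronized (memory) {
            return new ArrayList<>(memory);
        }
    }

    int spillFileCount() {
        synchronized (memory) {
            return spillFiles.size();
        }
    }

    // Runs a spilling thread against a locked reader; returns true if the
    // reader never failed while the spiller was running.
    static boolean stressLocked(int rounds) {
        MemoryReadSketch bag = new MemoryReadSketch();
        Thread spiller = new Thread(() -> {
            for (int i = 0; i < rounds; i++) {
                bag.add(i);
                if (i % 16 == 15) {
                    bag.spill();
                }
            }
        });
        spiller.start();
        boolean ok = true;
        try {
            while (spiller.isAlive()) {
                bag.readMemoryLocked();
            }
            spiller.join();
        } catch (RuntimeException e) {
            ok = false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            ok = false;
        }
        return ok;
    }
}
```

`stressLocked(10000)` should return `true`. Iterating `memory` directly without the lock, while the spiller runs, can instead fail intermittently with a ConcurrentModificationException, which is why such failures are hard to reproduce.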




[jira] [Updated] (PIG-5380) SortedDataBag hitting ConcurrentModificationException or producing incorrect output in a corner-case

2019-02-27 Thread Koji Noguchi (JIRA)


Koji Noguchi updated PIG-5380:
--
Attachment: pig-5380-v01.patch

Attaching a patch {{pig-5380-v01.patch}}.
Without the change to SortedDataBag, test cases will fail with
{noformat}
Testcase: testSortedSpillDuringPriorityQueueCreation took 0.213 sec
Caused an ERROR
null
java.util.ConcurrentModificationException
  at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901)
  at java.util.ArrayList$Itr.next(ArrayList.java:851)
  at org.apache.pig.data.SortedDataBag$SortedDataBagIterator.readFromPriorityQ(SortedDataBag.java:348)
  at org.apache.pig.data.SortedDataBag$SortedDataBagIterator.next(SortedDataBag.java:322)
  at org.apache.pig.data.SortedDataBag$SortedDataBagIterator.hasNext(SortedDataBag.java:235)
  at org.apache.pig.test.TestDataBag.testSortedSpillDuringPriorityQueueCreation(TestDataBag.java:1333)

{noformat}
and
{noformat}
Testcase: testSortedSpillDuringPriorityQueueCreation2 took 1.012 sec
FAILED
tuples should be the same expected:<(-2147483648)> but was:<(-2055861747)>
junit.framework.AssertionFailedError: tuples should be the same expected:<(-2147483648)> but was:<(-2055861747)>
  at org.apache.pig.test.TestDataBag.testSortedSpillDuringPriorityQueueCreation2(TestDataBag.java:1419)

Testcase: testSortedFirstSpillDuringRead took 0.003 sec
{noformat}
Basically, a ConcurrentModificationException can happen when a new spill file is added while SortedDataBag is creating its priority queue at
[https://github.com/apache/pig/blob/01b7a50657b46d346f0a8f472c92fdba72819a24/src/org/apache/pig/data/SortedDataBag.java#L344-L360]

and a value can go missing when spilling occurs after the spill files are read but before memory is checked at
[https://github.com/apache/pig/blob/01b7a50657b46d346f0a8f472c92fdba72819a24/src/org/apache/pig/data/SortedDataBag.java#L361]
For the value to go missing, the smallest value also has to be in memory.
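Both races can be modeled in a single thread by running a "spill" at the unlucky moment through a hook. This is a simplified, hypothetical sketch: `PriorityQueueRaceSketch`, `buildUnsafe`, `buildSafe`, and the List-based "spill files" are assumptions, and it loads whole runs into the queue instead of one tuple per file. The snapshot-under-one-lock fix is one possible approach, not necessarily what the attached patch does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical single-threaded model of the two races (not Pig's classes).
// Each "spill file" is a sorted List<Integer>; a Runnable hook simulates
// another thread spilling at an unlucky moment.
class PriorityQueueRaceSketch {
    final List<Integer> memory = new ArrayList<>();
    final List<List<Integer>> spillFiles = new ArrayList<>();

    // Moves the in-memory values into a new sorted "spill file".
    void spill() {
        synchronized (memory) {
            List<Integer> run = new ArrayList<>(memory);
            run.sort(null); // natural ordering
            spillFiles.add(run);
            memory.clear();
        }
    }

    // Unsafe: iterates the live spill list, then reads memory afterwards.
    // A spill inside the loop makes the list iterator throw
    // ConcurrentModificationException; a spill between the loop and the
    // memory read moves values into a file that is never read.
    PriorityQueue<Integer> buildUnsafe(Runnable duringFiles, Runnable beforeMemory) {
        PriorityQueue<Integer> pq = new PriorityQueue<>();
        for (List<Integer> run : spillFiles) {
            duringFiles.run();
            pq.addAll(run);
        }
        beforeMemory.run();
        pq.addAll(memory);
        return pq;
    }

    // Safer: copy the spill list and the memory contents under one lock, so a
    // concurrent spill happens either entirely before or entirely after.
    PriorityQueue<Integer> buildSafe(Runnable afterSnapshot) {
        List<List<Integer>> files;
        List<Integer> mem;
        synchronized (memory) {
            files = new ArrayList<>(spillFiles);
            mem = new ArrayList<>(memory);
        }
        afterSnapshot.run();
        PriorityQueue<Integer> pq = new PriorityQueue<>();
        for (List<Integer> run : files) {
            pq.addAll(run);
        }
        pq.addAll(mem);
        return pq;
    }
}
```

With spill files [5] and [3] and the value 1 still in memory, `buildUnsafe(bag::spill, () -> { })` throws ConcurrentModificationException, `buildUnsafe(() -> { }, bag::spill)` returns a queue whose head is 3 (the smallest value, 1, is lost), and `buildSafe(bag::spill)` still returns all three values.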

In short, a ConcurrentModificationException can happen when there are a lot of spills, but the chance of a missing value is very small. Please note that the test cases may not fail reliably; I inserted a short sleep to increase the chances of reproducing these race conditions.
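As an alternative to a short sleep (a swapped-in technique, not what the test cases in this JIRA use), the missing-value interleaving can be forced on every run with `java.util.concurrent.CountDownLatch`. This is a hypothetical two-thread sketch: `LatchReproSketch` and `readWithForcedSpill` are illustrative names, and real code would need test-only hook points at the spots marked in the comments. Each step takes the lock on its own, which is not enough, because the files and the memory are still read in two separate critical sections.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CountDownLatch;

// Hypothetical two-thread reproduction of the "missing value" race.
// Latches replace a sleep, so the bad interleaving happens on every run.
class LatchReproSketch {
    final List<Integer> memory = new ArrayList<>();
    final List<List<Integer>> spillFiles = new ArrayList<>();

    void spill() {
        synchronized (memory) {
            spillFiles.add(new ArrayList<>(memory));
            memory.clear();
        }
    }

    // Returns how many of the bag's three values the reader saw (-1 if interrupted).
    static int readWithForcedSpill() {
        LatchReproSketch bag = new LatchReproSketch();
        bag.spillFiles.add(Arrays.asList(7, 9)); // already "on disk"
        bag.memory.add(1);                       // smallest value, still in memory
        CountDownLatch filesRead = new CountDownLatch(1);
        CountDownLatch spilled = new CountDownLatch(1);
        List<Integer> seen = new ArrayList<>();

        Thread reader = new Thread(() -> {
            List<List<Integer>> files;
            synchronized (bag.memory) {
                files = new ArrayList<>(bag.spillFiles);
            }
            for (List<Integer> run : files) {
                seen.addAll(run);
            }
            filesRead.countDown(); // hook point: files read, memory not yet checked
            try {
                spilled.await();
            } catch (InterruptedException e) {
                return;
            }
            synchronized (bag.memory) {
                seen.addAll(bag.memory); // too late: the spill already emptied it
            }
        });
        Thread spiller = new Thread(() -> {
            try {
                filesRead.await();
            } catch (InterruptedException e) {
                return;
            }
            bag.spill();
            spilled.countDown();
        });
        reader.start();
        spiller.start();
        try {
            reader.join();
            spiller.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return -1;
        }
        return seen.size();
    }
}
```

`readWithForcedSpill()` returns 2 on every run: the reader sees 7 and 9 from the existing spill file but misses 1, because the spill moved it out of memory after the files were read.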

Also note that we probably didn't observe these bugs because our framework stopped using SortedDataBag a long time ago, when we switched to InternalSortedBag.
