[jira] Subscription: PIG patch available
Issue Subscription
Filter: PIG patch available (36 issues)
Subscriber: pigdaily

Key         Summary
PIG-5377    Move supportsParallelWriteToStoreLocation from StoreFunc to StoreFuncInterfce
            https://issues.apache.org/jira/browse/PIG-5377
PIG-5369    Add llap-client dependency
            https://issues.apache.org/jira/browse/PIG-5369
PIG-5360    Pig sets working directory of input file systems causes exception thrown
            https://issues.apache.org/jira/browse/PIG-5360
PIG-5338    Prevent deep copy of DataBag into Jython List
            https://issues.apache.org/jira/browse/PIG-5338
PIG-5323    Implement LastInputStreamingOptimizer in Tez
            https://issues.apache.org/jira/browse/PIG-5323
PIG-5273    _SUCCESS file should be created at the end of the job
            https://issues.apache.org/jira/browse/PIG-5273
PIG-5256    Bytecode generation for POFilter and POForeach
            https://issues.apache.org/jira/browse/PIG-5256
PIG-5160    SchemaTupleFrontend.java is not thread safe, cause PigServer thrown NPE in multithread env
            https://issues.apache.org/jira/browse/PIG-5160
PIG-5115    Builtin AvroStorage generates incorrect avro schema when the same pig field name appears in the alias
            https://issues.apache.org/jira/browse/PIG-5115
PIG-5106    Optimize when mapreduce.input.fileinputformat.input.dir.recursive set to true
            https://issues.apache.org/jira/browse/PIG-5106
PIG-5081    Can not run pig on spark source code distribution
            https://issues.apache.org/jira/browse/PIG-5081
PIG-5080    Support store alias as spark table
            https://issues.apache.org/jira/browse/PIG-5080
PIG-5057    IndexOutOfBoundsException when pig reducer processOnePackageOutput
            https://issues.apache.org/jira/browse/PIG-5057
PIG-5029    Optimize sort case when data is skewed
            https://issues.apache.org/jira/browse/PIG-5029
PIG-4926    Modify the content of start.xml for spark mode
            https://issues.apache.org/jira/browse/PIG-4926
PIG-4913    Reduce jython function initiation during compilation
            https://issues.apache.org/jira/browse/PIG-4913
PIG-4849    pig on tez will cause tez-ui to crash,because the content from timeline server is too long.
            https://issues.apache.org/jira/browse/PIG-4849
PIG-4750    REPLACE_MULTI should compile Pattern once and reuse it
            https://issues.apache.org/jira/browse/PIG-4750
PIG-4684    Exception should be changed to warning when job diagnostics cannot be fetched
            https://issues.apache.org/jira/browse/PIG-4684
PIG-4656    Improve String serialization and comparator performance in BinInterSedes
            https://issues.apache.org/jira/browse/PIG-4656
PIG-4598    Allow user defined plan optimizer rules
            https://issues.apache.org/jira/browse/PIG-4598
PIG-4551    Partition filter is not pushed down in case of SPLIT
            https://issues.apache.org/jira/browse/PIG-4551
PIG-4539    New PigUnit
            https://issues.apache.org/jira/browse/PIG-4539
PIG-4515    org.apache.pig.builtin.Distinct throws ClassCastException
            https://issues.apache.org/jira/browse/PIG-4515
PIG-4373    Implement PIG-3861 in Tez
            https://issues.apache.org/jira/browse/PIG-4373
PIG-4323    PackageConverter hanging in Spark
            https://issues.apache.org/jira/browse/PIG-4323
PIG-4313    StackOverflowError in LIMIT operation on Spark
            https://issues.apache.org/jira/browse/PIG-4313
PIG-4251    Pig on Storm
            https://issues.apache.org/jira/browse/PIG-4251
PIG-4002    Disable combiner when map-side aggregation is used
            https://issues.apache.org/jira/browse/PIG-4002
PIG-3952    PigStorage accepts '-tagSplit' to return full split information
            https://issues.apache.org/jira/browse/PIG-3952
PIG-3911    Define unique fields with @OutputSchema
            https://issues.apache.org/jira/browse/PIG-3911
PIG-3877    Getting Geo Latitude/Longitude from Address Lines
            https://issues.apache.org/jira/browse/PIG-3877
PIG-3873    Geo distance calculation using Haversine
            https://issues.apache.org/jira/browse/PIG-3873
PIG-3668    COR built-in function when atleast one of the coefficient values is NaN
            https://issues.apache.org/jira/browse/PIG-3668
PIG-3587    add functionality for rolling over dates
            https://issues.apache.org/jira/browse/PIG-3587
PIG-1804    Alow Jython function to implement Algebraic and/or Accumulator interfaces
            https://issues.apache.org/jira/browse/PIG-1804

You may edit this subscription at:
https://issues.apache.org/jira/secure/EditSubscription!default.jspa?subId=16328=12322384
[jira] [Updated] (PIG-5380) SortedDataBag hitting ConcurrentModificationException or producing incorrect output in a corner-case
[ https://issues.apache.org/jira/browse/PIG-5380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Noguchi updated PIG-5380:
------------------------------
    Attachment: pig-5380-v01.patch

Attaching a patch {{pig-5380-v01.patch}}.

Without the change to SortedDataBag, the test cases fail with
{noformat}
Testcase: testSortedSpillDuringPriorityQueueCreation took 0.213 sec
    Caused an ERROR
null
java.util.ConcurrentModificationException
    at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901)
    at java.util.ArrayList$Itr.next(ArrayList.java:851)
    at org.apache.pig.data.SortedDataBag$SortedDataBagIterator.readFromPriorityQ(SortedDataBag.java:348)
    at org.apache.pig.data.SortedDataBag$SortedDataBagIterator.next(SortedDataBag.java:322)
    at org.apache.pig.data.SortedDataBag$SortedDataBagIterator.hasNext(SortedDataBag.java:235)
    at org.apache.pig.test.TestDataBag.testSortedSpillDuringPriorityQueueCreation(TestDataBag.java:1333)
{noformat}
and
{noformat}
Testcase: testSortedSpillDuringPriorityQueueCreation2 took 1.012 sec
    FAILED
tuples should be the same expected:<(-2147483648)> but was:<(-2055861747)>
junit.framework.AssertionFailedError: tuples should be the same expected:<(-2147483648)> but was:<(-2055861747)>
    at org.apache.pig.test.TestDataBag.testSortedSpillDuringPriorityQueueCreation2(TestDataBag.java:1419)

Testcase: testSortedFirstSpillDuringRead took 0.003 sec
{noformat}

Basically, a ConcurrentModificationException can happen when a new spill file is added while SortedDataBag is creating a priority queue at
[https://github.com/apache/pig/blob/01b7a50657b46d346f0a8f472c92fdba72819a24/src/org/apache/pig/data/SortedDataBag.java#L344-L360]
and a missing value can happen when spilling occurs after the files are read but before memory is checked at
[https://github.com/apache/pig/blob/01b7a50657b46d346f0a8f472c92fdba72819a24/src/org/apache/pig/data/SortedDataBag.java#L361]
Also, the smallest value has to be in memory.

In short, a ConcurrentModificationException can happen when there are a lot of spills, but the chance of a missing value is very small.

Please note that the test cases may not fail reliably. I tried inserting a short sleep to increase the chances of reproducing these race conditions.

Also, note that we probably didn't observe these bugs because our framework stopped using SortedDataBag a long time ago, when we switched to InternalSortedBag.

> SortedDataBag hitting ConcurrentModificationException or producing incorrect output in a corner-case
> ----------------------------------------------------------------------------------------------------
>
>                 Key: PIG-5380
>                 URL: https://issues.apache.org/jira/browse/PIG-5380
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Major
>         Attachments: pig-5380-v01.patch
>
>
> User had a UDF that created large SortedDataBag. This UDF was failing with
> {noformat}
> java.util.ConcurrentModificationException
>     at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901)
>     at java.util.ArrayList$Itr.next(ArrayList.java:851)
>     at org.apache.pig.data.SortedDataBag$SortedDataBagIterator.readFromPriorityQ(SortedDataBag.java:346)
>     at org.apache.pig.data.SortedDataBag$SortedDataBagIterator.next(SortedDataBag.java:322)
>     at org.apache.pig.data.SortedDataBag$SortedDataBagIterator.hasNext(SortedDataBag.java:235)
> {noformat}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
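
For readers unfamiliar with the failure mode, the first race in the PIG-5380 update (a spill file being added while the spill-file list is being iterated) can be illustrated with a small, single-threaded sketch. This is not Pig's code and not the attached patch: the class {{SpillListSketch}}, its {{spillFiles}} field, and its methods are hypothetical stand-ins, and the "spill" is triggered from inside the loop only to make the failure deterministic. The sketch shows that iterating the live ArrayList fails fast, while iterating a copy taken under a lock does not.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;

// Hypothetical stand-in for a bag that tracks its spill files in an ArrayList.
class SpillListSketch {
    private final List<String> spillFiles = new ArrayList<>();

    SpillListSketch() {
        spillFiles.add("spill-0");
        spillFiles.add("spill-1");
    }

    // Simulates the memory manager adding a spill file at an arbitrary time.
    synchronized void spill(String name) {
        spillFiles.add(name);
    }

    synchronized int size() {
        return spillFiles.size();
    }

    // Unsafe pattern: iterate the live list while a spill modifies it.
    // ArrayList's iterator is fail-fast, so the next() call after the
    // modification throws ConcurrentModificationException.
    boolean unsafeReadHitsCme() {
        try {
            for (String file : spillFiles) {
                spill("late-" + file); // spill arrives mid-iteration
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // One possible mitigation (an assumption, not the patch): copy the list
    // while holding the lock, then iterate the private snapshot.
    int safeReadCount() {
        List<String> snapshot;
        synchronized (this) {
            snapshot = new ArrayList<>(spillFiles);
        }
        int visited = 0;
        for (String file : snapshot) {
            spill("late-" + file); // does not affect the snapshot
            visited++;
        }
        return visited;
    }

    public static void main(String[] args) {
        System.out.println("unsafe read hit CME: " + new SpillListSketch().unsafeReadHitsCme());
        System.out.println("safe read visited: " + new SpillListSketch().safeReadCount());
    }
}
```

Note that a snapshot of the file list alone would not address the second problem in the update (a value going missing when a spill lands after the files are read but before memory is checked). Avoiding that presumably requires the file list and the in-memory contents to be captured consistently, which this sketch does not attempt.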
[jira] [Created] (PIG-5380) SortedDataBag hitting ConcurrentModificationException or producing incorrect output in a corner-case
Koji Noguchi created PIG-5380:
---------------------------------

             Summary: SortedDataBag hitting ConcurrentModificationException or producing incorrect output in a corner-case
                 Key: PIG-5380
                 URL: https://issues.apache.org/jira/browse/PIG-5380
             Project: Pig
          Issue Type: Bug
            Reporter: Koji Noguchi
            Assignee: Koji Noguchi

User had a UDF that created large SortedDataBag. This UDF was failing with
{noformat}
java.util.ConcurrentModificationException
    at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901)
    at java.util.ArrayList$Itr.next(ArrayList.java:851)
    at org.apache.pig.data.SortedDataBag$SortedDataBagIterator.readFromPriorityQ(SortedDataBag.java:346)
    at org.apache.pig.data.SortedDataBag$SortedDataBagIterator.next(SortedDataBag.java:322)
    at org.apache.pig.data.SortedDataBag$SortedDataBagIterator.hasNext(SortedDataBag.java:235)
{noformat}