[jira] [Updated] (PIG-5273) _SUCCESS file should be created at the end of the job
[ https://issues.apache.org/jira/browse/PIG-5273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Satish Subhashrao Saley updated PIG-5273:
-----------------------------------------
    Description:

One of the users ran into issues because the _SUCCESS file was created by FileOutputCommitter.commitJob(), and storeCleanup(), called after it in PigOutputCommitter, failed to store the schema due to a network outage. abortJob was then called, and the StoreFunc.cleanupOnFailure method in it deleted the output directory. Downstream jobs that had started because of the _SUCCESS file ran with empty data.

Possible solutions:

1) Move storeCleanup before the commit. The order was reversed in https://issues.apache.org/jira/browse/PIG-2642, probably due to FileOutputCommitter version 1, and might not be a problem with FileOutputCommitter version 2. This would still not help when there are multiple outputs, as the main problem is cleanupOnFailure in abortJob deleting directories.

2) Change cleanupOnFailure to not delete output directories. This still does not help: an Oozie action retry might kick in and delete the directory while the downstream job has already started running because of the _SUCCESS file.

3) It cannot be done in the OutputCommitter at all, as multiple output committers are called in parallel in Tez. We can have Pig suppress _SUCCESS creation and create the files at the end in TezLauncher, if the job has succeeded, before calling cleanupOnSuccess. This can probably be added as a configurable setting and turned on by default in our clusters. This is probably the right solution.

Thank you [~rohini] for finding the issue and providing the solution.

> _SUCCESS file should be created at the end of the job
> -----------------------------------------------------
>
>                 Key: PIG-5273
>                 URL: https://issues.apache.org/jira/browse/PIG-5273
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Satish Subhashrao Saley
>            Assignee: Satish Subhashrao Saley
>

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
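The ordering problem described in the issue can be mimicked with a small local-filesystem sketch (the paths and the polling check below are hypothetical stand-ins for HDFS and an Oozie coordinator; this illustrates the sequence of events, not actual Pig code):

```shell
# Illustration of the ordering problem: the _SUCCESS flag exists before the
# output is known to be good, so a poller can trigger downstream work that
# later reads a deleted (or empty) directory. All paths are hypothetical.
OUT="$(mktemp -d)/output"
mkdir -p "$OUT"
touch "$OUT/_SUCCESS"                # commitJob() writes the flag first
if [ -e "$OUT/_SUCCESS" ]; then      # a coordinator polling at this moment fires immediately
    echo "downstream triggered"
fi
rm -rf "$OUT"                        # abortJob()/cleanupOnFailure deletes the output afterwards
[ -d "$OUT" ] || echo "output directory gone; downstream reads nothing"
```

Solution 3 avoids this window by not writing the flag until every store has committed successfully.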
[jira] [Created] (PIG-5273) _SUCCESS file should be created at the end of the job
Satish Subhashrao Saley created PIG-5273:
--------------------------------------------

             Summary: _SUCCESS file should be created at the end of the job
                 Key: PIG-5273
                 URL: https://issues.apache.org/jira/browse/PIG-5273
             Project: Pig
          Issue Type: Bug
            Reporter: Satish Subhashrao Saley
            Assignee: Satish Subhashrao Saley

One of the users ran into issues because the _SUCCESS file was created by FileOutputCommitter.commitJob(), and storeCleanup(), called after it in PigOutputCommitter, failed to store the schema due to a network outage. abortJob was then called, and the StoreFunc.cleanupOnFailure method in it deleted the output directory. Downstream jobs that had started because of the _SUCCESS file ran with empty data.

Possible solutions:

1) Move storeCleanup before the commit. The order was reversed in https://issues.apache.org/jira/browse/PIG-2642, probably due to FileOutputCommitter version 1, and might not be a problem with FileOutputCommitter version 2. This would still not help when there are multiple outputs, as the main problem is cleanupOnFailure in abortJob deleting directories.

2) Change cleanupOnFailure to not delete output directories. This still does not help: an Oozie action retry might kick in and delete the directory while the downstream job has already started running because of the _SUCCESS file.

3) It cannot be done in the OutputCommitter at all, as multiple output committers are called in parallel in Tez. We can have Pig suppress _SUCCESS creation and create the files at the end in TezLauncher, if the job has succeeded, before calling cleanupOnSuccess. This can probably be added as a configurable setting and turned on by default in our clusters. This is probably the right solution.
[jira] [Commented] (PIG-4767) Partition filter not pushed down when filter clause references variable from another load path
[ https://issues.apache.org/jira/browse/PIG-4767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16087938#comment-16087938 ]

Anthony Hsu commented on PIG-4767:
----------------------------------

No problem, [~knoguchi]. Thanks for the fix!

> Partition filter not pushed down when filter clause references variable from
> another load path
> ---------------------------------------------------------------------------
>
>                 Key: PIG-4767
>                 URL: https://issues.apache.org/jira/browse/PIG-4767
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.15.0
>            Reporter: Anthony Hsu
>            Assignee: Koji Noguchi
>             Fix For: 0.18.0
>
>         Attachments: pig-4767-v01.patch
>
> To reproduce:
> {noformat:title=test.pig}
> a = load 'a.txt';
> a_group = group a all;
> a_count = foreach a_group generate COUNT(a) as count;
> b = load 'mytable' using org.apache.hcatalog.pig.HCatLoader();
> b = filter b by datepartition == '2015-09-01-00' and foo == a_count.count;
> dump b;
> {noformat}
> The above query ends up reading all the table partitions. If you remove the
> {{foo == a_count.count}} clause or replace {{a_count.count}} with a constant,
> then partition filtering happens properly.
[jira] [Updated] (PIG-5271) StackOverflowError when compiling in Tez mode (with union and replicated join)
[ https://issues.apache.org/jira/browse/PIG-5271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Noguchi updated PIG-5271:
------------------------------
    Attachment: pig-5271-v02.patch

Only updating a comment in the test from the previous patch.

> StackOverflowError when compiling in Tez mode (with union and replicated join)
> ------------------------------------------------------------------------------
>
>                 Key: PIG-5271
>                 URL: https://issues.apache.org/jira/browse/PIG-5271
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>         Attachments: pig-5271-v01.patch, pig-5271-v02.patch
>
> Sample script
> {code}
> a4 = LOAD 'studentnulltab10k' as (name, age:int, gpa:float);
> a4_1 = filter a4 by gpa is null or gpa >= 3.9;
> a4_2 = filter a4 by gpa < 1;
> b4 = union a4_1, a4_2;
> b4_1 = filter b4 by age < 30;
> b4_2 = foreach b4 generate name, age, FLOOR(gpa) as gpa;
> c4 = load 'voternulltab10k' as (name, age, registration, contributions);
> d4 = join b4_2 by name, c4 by name using 'replicated';
> e4 = foreach d4 generate b4_2::name as name, b4_2::age as age, gpa, registration, contributions;
> f4 = order e4 by name, age DESC;
> store f4 into 'tmp_table_4' ;
> a5_1 = filter a4 by gpa is null or gpa <= 3.9;
> a5_2 = filter a4 by gpa < 2;
> b5 = union a5_1, a5_2;
> d5 = join c4 by name, b5 by name using 'replicated';
> store d5 into 'tmp_table_5' ;
> {code}
> This script fails to compile with a StackOverflowError.
> {noformat}
>         at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:323)
>
> Pig Stack Trace
> ---------------
> ERROR 2998: Unhandled internal error. null
>
> java.lang.StackOverflowError
>         at java.lang.reflect.Constructor.newInstance(Constructor.java:415)
>         at java.lang.Class.newInstance(Class.java:442)
>         at org.apache.pig.impl.util.Utils.mergeCollection(Utils.java:490)
>         at org.apache.pig.impl.plan.DependencyOrderWalker.doAllPredecessors(DependencyOrderWalker.java:101)
>         at org.apache.pig.impl.plan.DependencyOrderWalker.doAllPredecessors(DependencyOrderWalker.java:105)
>         at org.apache.pig.impl.plan.DependencyOrderWalker.doAllPredecessors(DependencyOrderWalker.java:105)
>         at org.apache.pig.impl.plan.DependencyOrderWalker.doAllPredecessors(DependencyOrderWalker.java:105)
>         at org.apache.pig.impl.plan.DependencyOrderWalker.doAllPredecessors(DependencyOrderWalker.java:105)
>         at org.apache.pig.impl.plan.DependencyOrderWalker.doAllPredecessors(DependencyOrderWalker.java:105)
>         ...
> {noformat}
[jira] [Updated] (PIG-5272) BagToString Output Schema
[ https://issues.apache.org/jira/browse/PIG-5272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joshua Juen updated PIG-5272:
-----------------------------
    Description:

The output schema from BagToTuple is nonsensical, causing problems when using the tuple later in the same script.

For example: given a bag {data: chararray}, calling BagToTuple yields the schema (data: chararray). But this makes no sense, since if the above bag contains the entries {data1, data2, data3}, the output tuple from BagToTuple will be (data1: chararray, data2: chararray, data3: chararray) != (data: chararray), the declared output schema from the UDF.

Unfortunately, the schema of the tuple cannot be known during the initial validation phase. Thus, I believe the output schema of the UDF should be changed to type tuple, without the number of fields being fixed to the number of columns in the input bag. As it stands, the elements of the tuple cannot be accessed in the script after calling BagToTuple without getting an incompatible-type error.

We have modified the UDF in our internal UDF jars to work around the issue. Let me know if this sounds reasonable and I can generate the patch.

> BagToString Output Schema
> -------------------------
>
>                 Key: PIG-5272
>                 URL: https://issues.apache.org/jira/browse/PIG-5272
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Joshua Juen
>            Priority: Minor
>
[jira] [Created] (PIG-5272) BagToString Output Schema
Joshua Juen created PIG-5272:
--------------------------------

             Summary: BagToString Output Schema
                 Key: PIG-5272
                 URL: https://issues.apache.org/jira/browse/PIG-5272
             Project: Pig
          Issue Type: Improvement
            Reporter: Joshua Juen
            Priority: Minor

The output schema from BagToTuple is nonsensical, causing problems when using the tuple later in the same script.

For example: given a bag {data: chararray}, calling BagToTuple yields the schema (data: chararray). But this makes no sense, since if the above bag contains the entries {data1, data2, data3}, the output tuple from BagToTuple will be (data1: chararray, data2: chararray, data3: chararray) != (data: chararray), the declared output schema from the UDF.

Unfortunately, the schema of the tuple cannot be known during the initial validation phase. Thus, I believe the output schema of the UDF should be changed to type tuple, without the number of fields being fixed to the number of columns in the input bag. As it stands, the elements of the tuple cannot be accessed in the script after calling BagToTuple without getting an incompatible-type error.
[jira] [Commented] (PIG-5268) Review of org.apache.pig.backend.hadoop.datastorage.HDataStorage
[ https://issues.apache.org/jira/browse/PIG-5268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16087361#comment-16087361 ]

BELUGA BEHR commented on PIG-5268:
----------------------------------

[~rohini] [~nkollar] I understand that these trivial changes can be burdensome for the gain, but attracting more developers can sometimes hinge on how beat-up the code is. Additionally, someone evaluating the code base to build a product on may be turned off by dead code, bad formatting, and bad code re-use. I know I would question it; I would question how much care was put into the product. So I don't think there's anything wrong with putting some polish and shine on a project used by many production systems. My goal would be to mold the project, over time, in a better direction that people open up and are excited by.

These are marked appropriately as trivial; spare cycles can be dedicated to some cleanup. I have a customer interested in using Pig, so I was just starting to get a feel for the project by submitting a couple of trivial patches. I don't know how helpful I'll be on full feature requests, but I can review the open tickets. Thanks for considering this patch.

> Review of org.apache.pig.backend.hadoop.datastorage.HDataStorage
> ----------------------------------------------------------------
>
>                 Key: PIG-5268
>                 URL: https://issues.apache.org/jira/browse/PIG-5268
>             Project: Pig
>          Issue Type: Improvement
>          Components: data
>    Affects Versions: 0.17.0
>            Reporter: BELUGA BEHR
>            Priority: Trivial
>         Attachments: PIG-5268.1.patch, PIG-5268.2.patch
>
> # Optimize for case where {{asCollection}} is empty
> # Tidy up
[jira] [Commented] (PIG-5246) Modify bin/pig about SPARK_HOME, SPARK_ASSEMBLY_JAR after upgrading spark to 2
[ https://issues.apache.org/jira/browse/PIG-5246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16087189#comment-16087189 ]

Nandor Kollar commented on PIG-5246:
------------------------------------

[~kellyzly] does this mean that the problem you mentioned in PIG-5157 with the basic skew join script is now fixed?

> Modify bin/pig about SPARK_HOME, SPARK_ASSEMBLY_JAR after upgrading spark to 2
> ------------------------------------------------------------------------------
>
>                 Key: PIG-5246
>                 URL: https://issues.apache.org/jira/browse/PIG-5246
>             Project: Pig
>          Issue Type: Bug
>            Reporter: liyunzhang_intel
>            Assignee: liyunzhang_intel
>         Attachments: HBase9498.patch, PIG-5246.1.patch, PIG-5246_2.patch, PIG-5246.3.patch, PIG-5246_4.patch, PIG-5246.patch
>
> In bin/pig we copy the assembly jar to Pig's classpath in spark 1.6:
> {code}
> # For spark mode:
> # Please specify SPARK_HOME first so that we can locate $SPARK_HOME/lib/spark-assembly*.jar,
> # we will add spark-assembly*.jar to the classpath.
> if [ "$isSparkMode" == "true" ]; then
>     if [ -z "$SPARK_HOME" ]; then
>         echo "Error: SPARK_HOME is not set!"
>         exit 1
>     fi
>
>     # Please specify SPARK_JAR which is the hdfs path of spark-assembly*.jar
>     # to allow YARN to cache spark-assembly*.jar on nodes so that it doesn't
>     # need to be distributed each time an application runs.
>     if [ -z "$SPARK_JAR" ]; then
>         echo "Error: SPARK_JAR is not set, SPARK_JAR stands for the hdfs location of spark-assembly*.jar. This allows YARN to cache spark-assembly*.jar on nodes so that it doesn't need to be distributed each time an application runs."
>         exit 1
>     fi
>
>     if [ -n "$SPARK_HOME" ]; then
>         echo "Using Spark Home: " ${SPARK_HOME}
>         SPARK_ASSEMBLY_JAR=`ls ${SPARK_HOME}/lib/spark-assembly*`
>         CLASSPATH=${CLASSPATH}:$SPARK_ASSEMBLY_JAR
>     fi
> fi
> {code}
> After the upgrade to spark 2.0, we may need to modify it.
[jira] [Updated] (PIG-5246) Modify bin/pig about SPARK_HOME, SPARK_ASSEMBLY_JAR after upgrading spark to 2
[ https://issues.apache.org/jira/browse/PIG-5246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

liyunzhang_intel updated PIG-5246:
----------------------------------
    Attachment: PIG-5246_4.patch

[~nkollar]: changes in PIG-5246_4.patch:
{code}
CLASSPATH=${CLASSPATH}:${SPARK_HOME}/lib/spark-assembly*
{code}
to
{code}
SPARK_ASSEMBLY_JAR=`ls ${SPARK_HOME}/lib/spark-assembly*`
CLASSPATH=${CLASSPATH}:$SPARK_ASSEMBLY_JAR
{code}
We cannot use a wildcard to locate the spark-assembly jar on the classpath. Will close PIG-5157 after all unit tests pass on my local Jenkins.

> Modify bin/pig about SPARK_HOME, SPARK_ASSEMBLY_JAR after upgrading spark to 2
> ------------------------------------------------------------------------------
>
>                 Key: PIG-5246
>                 URL: https://issues.apache.org/jira/browse/PIG-5246
>             Project: Pig
>          Issue Type: Bug
>            Reporter: liyunzhang_intel
>            Assignee: liyunzhang_intel
>         Attachments: HBase9498.patch, PIG-5246.1.patch, PIG-5246_2.patch, PIG-5246.3.patch, PIG-5246_4.patch, PIG-5246.patch
>
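The bin/pig fragment quoted in this issue builds the classpath from the single Spark 1.x assembly jar under $SPARK_HOME/lib. A hedged sketch of what the post-upgrade logic might look like, assuming the Spark 2.x layout of individual jars under $SPARK_HOME/jars with no spark-assembly jar (the demo directory and jar names below are hypothetical, used only to make the loop runnable):

```shell
# Sketch only: assumes Spark 2.x ships $SPARK_HOME/jars/*.jar instead of
# lib/spark-assembly*.jar. A temporary directory stands in for a real install.
SPARK_HOME="$(mktemp -d)"
mkdir -p "$SPARK_HOME/jars"
touch "$SPARK_HOME/jars/spark-core.jar" "$SPARK_HOME/jars/spark-sql.jar"

CLASSPATH=""
for jar in "$SPARK_HOME"/jars/*.jar; do
    # Append each jar, adding the ':' separator only after the first entry.
    CLASSPATH="${CLASSPATH:+$CLASSPATH:}$jar"
done
echo "$CLASSPATH"
```

This avoids relying on an unexpanded wildcard inside CLASSPATH, which is the same problem the PIG-5246_4.patch change above works around for Spark 1.x.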
[jira] Subscription: PIG patch available
Issue Subscription
Filter: PIG patch available (35 issues)
Subscriber: pigdaily

Key         Summary
PIG-5268    Review of org.apache.pig.backend.hadoop.datastorage.HDataStorage
            https://issues.apache.org/jira/browse/PIG-5268
PIG-5267    Review of org.apache.pig.impl.io.BufferedPositionedInputStream
            https://issues.apache.org/jira/browse/PIG-5267
PIG-5264    Remove deprecated keys from PigConfiguration
            https://issues.apache.org/jira/browse/PIG-5264
PIG-5246    Modify bin/pig about SPARK_HOME, SPARK_ASSEMBLY_JAR after upgrading spark to 2
            https://issues.apache.org/jira/browse/PIG-5246
PIG-5160    SchemaTupleFrontend.java is not thread safe, cause PigServer thrown NPE in multithread env
            https://issues.apache.org/jira/browse/PIG-5160
PIG-5157    Upgrade to Spark 2.0
            https://issues.apache.org/jira/browse/PIG-5157
PIG-5115    Builtin AvroStorage generates incorrect avro schema when the same pig field name appears in the alias
            https://issues.apache.org/jira/browse/PIG-5115
PIG-5106    Optimize when mapreduce.input.fileinputformat.input.dir.recursive set to true
            https://issues.apache.org/jira/browse/PIG-5106
PIG-5081    Can not run pig on spark source code distribution
            https://issues.apache.org/jira/browse/PIG-5081
PIG-5080    Support store alias as spark table
            https://issues.apache.org/jira/browse/PIG-5080
PIG-5057    IndexOutOfBoundsException when pig reducer processOnePackageOutput
            https://issues.apache.org/jira/browse/PIG-5057
PIG-5029    Optimize sort case when data is skewed
            https://issues.apache.org/jira/browse/PIG-5029
PIG-4926    Modify the content of start.xml for spark mode
            https://issues.apache.org/jira/browse/PIG-4926
PIG-4913    Reduce jython function initiation during compilation
            https://issues.apache.org/jira/browse/PIG-4913
PIG-4849    pig on tez will cause tez-ui to crash, because the content from timeline server is too long
            https://issues.apache.org/jira/browse/PIG-4849
PIG-4750    REPLACE_MULTI should compile Pattern once and reuse it
            https://issues.apache.org/jira/browse/PIG-4750
PIG-4684    Exception should be changed to warning when job diagnostics cannot be fetched
            https://issues.apache.org/jira/browse/PIG-4684
PIG-4656    Improve String serialization and comparator performance in BinInterSedes
            https://issues.apache.org/jira/browse/PIG-4656
PIG-4598    Allow user defined plan optimizer rules
            https://issues.apache.org/jira/browse/PIG-4598
PIG-4551    Partition filter is not pushed down in case of SPLIT
            https://issues.apache.org/jira/browse/PIG-4551
PIG-4539    New PigUnit
            https://issues.apache.org/jira/browse/PIG-4539
PIG-4515    org.apache.pig.builtin.Distinct throws ClassCastException
            https://issues.apache.org/jira/browse/PIG-4515
PIG-4323    PackageConverter hanging in Spark
            https://issues.apache.org/jira/browse/PIG-4323
PIG-4313    StackOverflowError in LIMIT operation on Spark
            https://issues.apache.org/jira/browse/PIG-4313
PIG-4251    Pig on Storm
            https://issues.apache.org/jira/browse/PIG-4251
PIG-4002    Disable combiner when map-side aggregation is used
            https://issues.apache.org/jira/browse/PIG-4002
PIG-3952    PigStorage accepts '-tagSplit' to return full split information
            https://issues.apache.org/jira/browse/PIG-3952
PIG-3911    Define unique fields with @OutputSchema
            https://issues.apache.org/jira/browse/PIG-3911
PIG-3877    Getting Geo Latitude/Longitude from Address Lines
            https://issues.apache.org/jira/browse/PIG-3877
PIG-3873    Geo distance calculation using Haversine
            https://issues.apache.org/jira/browse/PIG-3873
PIG-3864    ToDate(userstring, format, timezone) computes DateTime with strange handling of Daylight Saving Time with location based timezones
            https://issues.apache.org/jira/browse/PIG-3864
PIG-3668    COR built-in function when atleast one of the coefficient values is NaN
            https://issues.apache.org/jira/browse/PIG-3668
PIG-3655    BinStorage and InterStorage approach to record markers is broken
            https://issues.apache.org/jira/browse/PIG-3655
PIG-3587    add functionality for rolling over dates
            https://issues.apache.org/jira/browse/PIG-3587
PIG-1804    Alow Jython function to implement Algebraic and/or Accumulator interfaces
            https://issues.apache.org/jira/browse/PIG-1804

You may edit this subscription at:
https://issues.apache.org/jira/secure/FilterSubscription!default.jspa?subId=16328=12322384