[jira] [Updated] (PIG-5273) _SUCCESS file should be created at the end of the job

2017-07-14 Thread Satish Subhashrao Saley (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-5273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Satish Subhashrao Saley updated PIG-5273:
-----------------------------------------
Description: 
One of the users ran into issues because the _SUCCESS file was created by 
FileOutputCommitter.commitJob(), and storeCleanup(), called after that in 
PigOutputCommitter, failed to store the schema due to a network outage. 
abortJob was then called, and the StoreFunc.cleanupOnFailure method in it 
deleted the output directory. Downstream jobs that had started because of 
the _SUCCESS file ran with empty data.
Possible solutions:
1) Move storeCleanup before the commit. That order was reversed in 
https://issues.apache.org/jira/browse/PIG-2642, probably because of 
FileOutputCommitter version 1, and it might not be a problem with 
FileOutputCommitter version 2. This still would not help when there are 
multiple outputs, since the main problem is cleanupOnFailure in abortJob 
deleting directories.
2) Change cleanupOnFailure to not delete output directories. That still 
does not help: an Oozie action retry might kick in and delete the directory 
while a downstream job has already started running because of the _SUCCESS 
file.
3) This cannot be done in the OutputCommitter at all, as multiple output 
committers are called in parallel in Tez. Instead, Pig can suppress 
_SUCCESS creation and create the markers for all outputs at the end in 
TezLauncher, once the job has succeeded and before calling cleanupOnSuccess 
(see the sketch below). This can be a configurable setting, turned on by 
default in our clusters. This is probably the only feasible solution.
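
For illustration, a minimal sketch of option 3, assuming the launcher knows 
every store location. The class and method names here are hypothetical; 
only the configuration key and the _SUCCESS name come from Hadoop's 
FileOutputCommitter:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SuccessMarkerSketch {
    // Hadoop switch read by FileOutputCommitter.commitJob(); setting it to
    // false stops each committer from writing its own _SUCCESS file.
    private static final String MARK_SUCCESSFUL_JOBS =
            "mapreduce.fileoutputcommitter.marksuccessfuljobs";
    private static final String SUCCESS_FILE = "_SUCCESS";

    // Before launching the DAG: suppress per-committer markers.
    public static void suppressMarkers(Configuration conf) {
        conf.setBoolean(MARK_SUCCESSFUL_JOBS, false);
    }

    // At the end of TezLauncher, only after the whole job has succeeded and
    // the schemas were stored: write one zero-length marker per output.
    public static void createMarkers(Configuration conf,
            Iterable<String> storeLocations) throws java.io.IOException {
        for (String location : storeLocations) {
            Path marker = new Path(location, SUCCESS_FILE);
            FileSystem fs = marker.getFileSystem(conf);
            fs.create(marker, true).close();
        }
    }
}
{code}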

Thank you [~rohini] for finding the issue and providing the solution.


> _SUCCESS file should be created at the end of the job
> ------------------------------------------------------
>
> Key: PIG-5273
> URL: https://issues.apache.org/jira/browse/PIG-5273
> Project: Pig
>  Issue Type: Bug
>Reporter: Satish Subhashrao Saley
>Assignee: Satish Subhashrao Saley
>




[jira] [Created] (PIG-5273) _SUCCESS file should be created at the end of the job

2017-07-14 Thread Satish Subhashrao Saley (JIRA)
Satish Subhashrao Saley created PIG-5273:
-----------------------------------------

 Summary: _SUCCESS file should be created at the end of the job
 Key: PIG-5273
 URL: https://issues.apache.org/jira/browse/PIG-5273
 Project: Pig
  Issue Type: Bug
Reporter: Satish Subhashrao Saley
Assignee: Satish Subhashrao Saley


One of the users ran into issues because the _SUCCESS file was created by 
FileOutputCommitter.commitJob(), and storeCleanup(), called after that in 
PigOutputCommitter, failed to store the schema due to a network outage. 
abortJob was then called, and the StoreFunc.cleanupOnFailure method in it 
deleted the output directory. Downstream jobs that had started because of 
the _SUCCESS file ran with empty data.
Possible solutions:
1) Move storeCleanup before the commit. That order was reversed in 
https://issues.apache.org/jira/browse/PIG-2642, probably because of 
FileOutputCommitter version 1, and it might not be a problem with 
FileOutputCommitter version 2. This still would not help when there are 
multiple outputs, since the main problem is cleanupOnFailure in abortJob 
deleting directories.
2) Change cleanupOnFailure to not delete output directories. That still 
does not help: an Oozie action retry might kick in and delete the directory 
while a downstream job has already started running because of the _SUCCESS 
file.
3) This cannot be done in the OutputCommitter at all, as multiple output 
committers are called in parallel in Tez. Instead, Pig can suppress 
_SUCCESS creation and create the markers for all outputs at the end in 
TezLauncher, once the job has succeeded and before calling cleanupOnSuccess. 
This can be a configurable setting, turned on by default in our clusters. 
This is probably the only feasible solution.





[jira] [Commented] (PIG-4767) Partition filter not pushed down when filter clause references variable from another load path

2017-07-14 Thread Anthony Hsu (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16087938#comment-16087938
 ] 

Anthony Hsu commented on PIG-4767:
----------------------------------

No problem, [~knoguchi]. Thanks for the fix!

> Partition filter not pushed down when filter clause references variable from 
> another load path
> -----------------------------------------------------------------------------------------------
>
> Key: PIG-4767
> URL: https://issues.apache.org/jira/browse/PIG-4767
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.15.0
>Reporter: Anthony Hsu
>Assignee: Koji Noguchi
> Fix For: 0.18.0
>
> Attachments: pig-4767-v01.patch
>
>
> To reproduce:
> {noformat:title=test.pig}
> a = load 'a.txt';
> a_group = group a all;
> a_count = foreach a_group generate COUNT(a) as count;
> b = load 'mytable' using org.apache.hcatalog.pig.HCatLoader();
> b = filter b by datepartition == '2015-09-01-00' and foo == a_count.count;
> dump b;
> {noformat}
> The above query ends up reading all the table partitions. If you remove the 
> {{foo == a_count.count}} clause or replace {{a_count.count}} with a constant, 
> then partition filtering happens properly.





[jira] [Updated] (PIG-5271) StackOverflowError when compiling in Tez mode (with union and replicated join)

2017-07-14 Thread Koji Noguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-5271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated PIG-5271:
------------------------------
Attachment: pig-5271-v02.patch

Only updating a comment in the test from the previous patch.
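
For context, a simplified, hypothetical model of the recursive predecessor 
walk that shows up in the trace below (DependencyOrderWalker); the real 
class differs in detail, but the shape of the recursion is the point:
{code}
import java.util.List;
import java.util.Set;

class WalkerSketch {
    interface Node { List<Node> getPredecessors(); }

    // Every node recurses into all of its predecessors before visiting
    // itself, so a deep enough (or heavily duplicated, as with union plus
    // replicated join in Tez) plan exhausts the call stack.
    void doAllPredecessors(Node node, Set<Node> seen, List<Node> fifo) {
        if (!seen.contains(node)) {
            for (Node pred : node.getPredecessors()) {
                doAllPredecessors(pred, seen, fifo); // unbounded depth
            }
            seen.add(node); // visit only after all predecessors
            fifo.add(node);
        }
    }
}
{code}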

> StackOverflowError when compiling in Tez mode (with union and replicated join)
> -------------------------------------------------------------------------------
>
> Key: PIG-5271
> URL: https://issues.apache.org/jira/browse/PIG-5271
> Project: Pig
>  Issue Type: Bug
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
> Attachments: pig-5271-v01.patch, pig-5271-v02.patch
>
>
> Sample script
> {code}
> a4 = LOAD 'studentnulltab10k' as (name, age:int, gpa:float);
> a4_1 = filter a4 by gpa is null or gpa >= 3.9;
> a4_2 = filter a4 by gpa < 1;
> b4 = union a4_1, a4_2;
> b4_1 = filter b4 by age < 30;
> b4_2 = foreach b4 generate name, age, FLOOR(gpa) as gpa;
> c4 = load 'voternulltab10k' as (name, age, registration, contributions);
> d4 = join b4_2 by name, c4 by name using 'replicated';
> e4 = foreach d4 generate b4_2::name as name, b4_2::age as age, gpa, 
> registration, contributions;
> f4 = order e4 by name, age DESC;
> store f4 into 'tmp_table_4' ;
> a5_1 = filter a4 by gpa is null or gpa <= 3.9;
> a5_2 = filter a4 by gpa < 2;
> b5 = union a5_1, a5_2;
> d5 = join c4 by name, b5 by name using 'replicated';
> store d5 into 'tmp_table_5' ;
> {code}
> This script fails to compile with StackOverflowError.
> {noformat}
> at 
> org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:323)
> Pig Stack Trace
> ---------------
> ERROR 2998: Unhandled internal error. null
> java.lang.StackOverflowError
> at java.lang.reflect.Constructor.newInstance(Constructor.java:415)
> at java.lang.Class.newInstance(Class.java:442)
> at org.apache.pig.impl.util.Utils.mergeCollection(Utils.java:490)
> at 
> org.apache.pig.impl.plan.DependencyOrderWalker.doAllPredecessors(DependencyOrderWalker.java:101)
> at 
> org.apache.pig.impl.plan.DependencyOrderWalker.doAllPredecessors(DependencyOrderWalker.java:105)
> at 
> org.apache.pig.impl.plan.DependencyOrderWalker.doAllPredecessors(DependencyOrderWalker.java:105)
> at 
> org.apache.pig.impl.plan.DependencyOrderWalker.doAllPredecessors(DependencyOrderWalker.java:105)
> at 
> org.apache.pig.impl.plan.DependencyOrderWalker.doAllPredecessors(DependencyOrderWalker.java:105)
> at 
> org.apache.pig.impl.plan.DependencyOrderWalker.doAllPredecessors(DependencyOrderWalker.java:105)
> ...
> {noformat}





[jira] [Updated] (PIG-5272) BagToString Output Schema

2017-07-14 Thread Joshua Juen (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-5272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joshua Juen updated PIG-5272:
-----------------------------
Description: 
The output schema from BagToTuple is nonsensical, causing problems when the 
tuple is used later in the same script.

For example, given a bag { data:chararray }, calling BagToTuple yields the 
schema ( data:chararray ).

But this makes no sense: if the above bag contains the entries {data1, 
data2, data3}, the output tuple from BagToTuple will be 
(data1:chararray, data2:chararray, data3:chararray) != (data:chararray), 
the declared output schema of the UDF.

Unfortunately, the schema of the tuple cannot be known during the initial 
validation phase. Thus, I believe the output schema of the UDF should be 
changed to a plain tuple whose number of fields is not fixed to the number 
of columns in the input bag (see the sketch below).

As it stands, the elements of the tuple cannot be accessed in the script 
after calling BagToTuple without getting an incompatible type error. We 
have modified the UDF in our internal UDF jars to work around the issue. 
Let me know if this sounds reasonable and I can generate the patch.
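
For illustration, a minimal, hypothetical sketch of the proposed change 
(not our internal patch), showing only the outputSchema part:
{code}
import org.apache.pig.EvalFunc;
import org.apache.pig.data.DataType;
import org.apache.pig.data.Tuple;
import org.apache.pig.impl.logicalLayer.schema.Schema;

// Hypothetical variant of BagToTuple; everything except outputSchema is
// left out.
public abstract class BagToTupleSketch extends EvalFunc<Tuple> {
    @Override
    public Schema outputSchema(Schema input) {
        // Declare a bare tuple with no inner fields: the real arity equals
        // the number of entries in the input bag, which is only known at
        // run time, so the front end should not pin it to the bag's
        // declared column count.
        return new Schema(new Schema.FieldSchema(null, DataType.TUPLE));
    }
}
{code}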


> BagToString Output Schema
> -------------------------
>
> Key: PIG-5272
> URL: https://issues.apache.org/jira/browse/PIG-5272
> Project: Pig
>  Issue Type: Improvement
>Reporter: Joshua Juen
>Priority: Minor
>





[jira] [Created] (PIG-5272) BagToString Output Schema

2017-07-14 Thread Joshua Juen (JIRA)
Joshua Juen created PIG-5272:
----------------------------

 Summary: BagToString Output Schema
 Key: PIG-5272
 URL: https://issues.apache.org/jira/browse/PIG-5272
 Project: Pig
  Issue Type: Improvement
Reporter: Joshua Juen
Priority: Minor


The output schema from BagToTuple is nonsensical, causing problems when the 
tuple is used later in the same script.

For example, given a bag { data:chararray }, calling BagToTuple yields the 
schema ( data:chararray ).

But this makes no sense: if the above bag contains the entries {data1, 
data2, data3}, the output tuple from BagToTuple will be 
(data1:chararray, data2:chararray, data3:chararray) != (data:chararray), 
the declared output schema of the UDF.

Unfortunately, the schema of the tuple cannot be known during the initial 
validation phase. Thus, I believe the output schema of the UDF should be 
changed to a plain tuple whose number of fields is not fixed to the number 
of columns in the input bag.

As it stands, the elements of the tuple cannot be accessed in the script 
after calling BagToTuple without getting an incompatible type error.





[jira] [Commented] (PIG-5268) Review of org.apache.pig.backend.hadoop.datastorage.HDataStorage

2017-07-14 Thread BELUGA BEHR (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16087361#comment-16087361
 ] 

BELUGA BEHR commented on PIG-5268:
----------------------------------

[~rohini] [~nkollar]

I understand that these trivial changes can be burdensome for the gain, but 
attracting more developers can sometimes hinge on how beat-up the code is. 
Additionally, someone evaluating the code base to build a product on may be 
turned off by dead code, bad formatting, and poor code re-use. I know I 
would question it; I would question how much care was put into the product.

So I don't think there's anything wrong with putting some polish and shine 
on a project used by many production systems. My goal would be to mold the 
project, over time, in a better direction, one that people open up and are 
excited by. These tickets are marked appropriately as trivial, so spare 
cycles can be dedicated to some cleanup.

I have a customer interested in using Pig, so I was just starting to get a 
feel for the project by submitting a couple of trivial patches. I don't 
know how helpful I'll be on full feature requests, but I can review the 
open tickets.

Thanks for considering this patch.

> Review of org.apache.pig.backend.hadoop.datastorage.HDataStorage
> -----------------------------------------------------------------
>
> Key: PIG-5268
> URL: https://issues.apache.org/jira/browse/PIG-5268
> Project: Pig
>  Issue Type: Improvement
>  Components: data
>Affects Versions: 0.17.0
>Reporter: BELUGA BEHR
>Priority: Trivial
> Attachments: PIG-5268.1.patch, PIG-5268.2.patch
>
>
> # Optimize for the case where {{asCollection}} is empty
> # Tidy up





[jira] [Commented] (PIG-5246) Modify bin/pig about SPARK_HOME, SPARK_ASSEMBLY_JAR after upgrading spark to 2

2017-07-14 Thread Nandor Kollar (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16087189#comment-16087189
 ] 

Nandor Kollar commented on PIG-5246:
------------------------------------

[~kellyzly] does this mean that the problem you mentioned in PIG-5157 with 
the basic skew join script is now fixed?

> Modify bin/pig about SPARK_HOME, SPARK_ASSEMBLY_JAR after upgrading spark to 2
> -------------------------------------------------------------------------------
>
> Key: PIG-5246
> URL: https://issues.apache.org/jira/browse/PIG-5246
> Project: Pig
>  Issue Type: Bug
>Reporter: liyunzhang_intel
>Assignee: liyunzhang_intel
> Attachments: HBase9498.patch, PIG-5246.1.patch, PIG-5246_2.patch, 
> PIG-5246.3.patch, PIG-5246_4.patch, PIG-5246.patch
>
>
> In bin/pig, we copy the assembly jar to Pig's classpath for Spark 1.6:
> {code}
> # For spark mode:
> # Please specify SPARK_HOME first so that we can locate $SPARK_HOME/lib/spark-assembly*.jar;
> # we will add spark-assembly*.jar to the classpath.
> if [ "$isSparkMode" == "true" ]; then
>     if [ -z "$SPARK_HOME" ]; then
>         echo "Error: SPARK_HOME is not set!"
>         exit 1
>     fi
>     # Please specify SPARK_JAR, the hdfs path of spark-assembly*.jar, to allow YARN to cache spark-assembly*.jar on nodes so that it doesn't need to be distributed each time an application runs.
>     if [ -z "$SPARK_JAR" ]; then
>         echo "Error: SPARK_JAR is not set, SPARK_JAR stands for the hdfs location of spark-assembly*.jar. This allows YARN to cache spark-assembly*.jar on nodes so that it doesn't need to be distributed each time an application runs."
>         exit 1
>     fi
>     if [ -n "$SPARK_HOME" ]; then
>         echo "Using Spark Home: " ${SPARK_HOME}
>         SPARK_ASSEMBLY_JAR=`ls ${SPARK_HOME}/lib/spark-assembly*`
>         CLASSPATH=${CLASSPATH}:$SPARK_ASSEMBLY_JAR
>     fi
> fi
> {code}
> After upgrading to Spark 2.0, we may need to modify this.





[jira] [Updated] (PIG-5246) Modify bin/pig about SPARK_HOME, SPARK_ASSEMBLY_JAR after upgrading spark to 2

2017-07-14 Thread liyunzhang_intel (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-5246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liyunzhang_intel updated PIG-5246:
----------------------------------
Attachment: PIG-5246_4.patch

[~nkollar]:
PIG-5246_4.patch changes
{code}
CLASSPATH=${CLASSPATH}:${SPARK_HOME}/lib/spark-assembly*
{code}
to
{code}
SPARK_ASSEMBLY_JAR=`ls ${SPARK_HOME}/lib/spark-assembly*`
CLASSPATH=${CLASSPATH}:$SPARK_ASSEMBLY_JAR
{code}
because we cannot use a wildcard in CLASSPATH to locate the spark-assembly 
jar; the glob is resolved with `ls` instead. After all unit tests pass on 
my local Jenkins, I will close PIG-5157.

> Modify bin/pig about SPARK_HOME, SPARK_ASSEMBLY_JAR after upgrading spark to 2
> -------------------------------------------------------------------------------
>
> Key: PIG-5246
> URL: https://issues.apache.org/jira/browse/PIG-5246
> Project: Pig
>  Issue Type: Bug
>Reporter: liyunzhang_intel
>Assignee: liyunzhang_intel
> Attachments: HBase9498.patch, PIG-5246.1.patch, PIG-5246_2.patch, 
> PIG-5246.3.patch, PIG-5246_4.patch, PIG-5246.patch
>
>





[jira] Subscription: PIG patch available

2017-07-14 Thread jira
Issue Subscription
Filter: PIG patch available (35 issues)

Subscriber: pigdaily

Key       Summary
PIG-5268  Review of org.apache.pig.backend.hadoop.datastorage.HDataStorage
          https://issues.apache.org/jira/browse/PIG-5268
PIG-5267  Review of org.apache.pig.impl.io.BufferedPositionedInputStream
          https://issues.apache.org/jira/browse/PIG-5267
PIG-5264  Remove deprecated keys from PigConfiguration
          https://issues.apache.org/jira/browse/PIG-5264
PIG-5246  Modify bin/pig about SPARK_HOME, SPARK_ASSEMBLY_JAR after upgrading spark to 2
          https://issues.apache.org/jira/browse/PIG-5246
PIG-5160  SchemaTupleFrontend.java is not thread safe, cause PigServer thrown NPE in multithread env
          https://issues.apache.org/jira/browse/PIG-5160
PIG-5157  Upgrade to Spark 2.0
          https://issues.apache.org/jira/browse/PIG-5157
PIG-5115  Builtin AvroStorage generates incorrect avro schema when the same pig field name appears in the alias
          https://issues.apache.org/jira/browse/PIG-5115
PIG-5106  Optimize when mapreduce.input.fileinputformat.input.dir.recursive set to true
          https://issues.apache.org/jira/browse/PIG-5106
PIG-5081  Can not run pig on spark source code distribution
          https://issues.apache.org/jira/browse/PIG-5081
PIG-5080  Support store alias as spark table
          https://issues.apache.org/jira/browse/PIG-5080
PIG-5057  IndexOutOfBoundsException when pig reducer processOnePackageOutput
          https://issues.apache.org/jira/browse/PIG-5057
PIG-5029  Optimize sort case when data is skewed
          https://issues.apache.org/jira/browse/PIG-5029
PIG-4926  Modify the content of start.xml for spark mode
          https://issues.apache.org/jira/browse/PIG-4926
PIG-4913  Reduce jython function initiation during compilation
          https://issues.apache.org/jira/browse/PIG-4913
PIG-4849  pig on tez will cause tez-ui to crash,because the content from timeline server is too long.
          https://issues.apache.org/jira/browse/PIG-4849
PIG-4750  REPLACE_MULTI should compile Pattern once and reuse it
          https://issues.apache.org/jira/browse/PIG-4750
PIG-4684  Exception should be changed to warning when job diagnostics cannot be fetched
          https://issues.apache.org/jira/browse/PIG-4684
PIG-4656  Improve String serialization and comparator performance in BinInterSedes
          https://issues.apache.org/jira/browse/PIG-4656
PIG-4598  Allow user defined plan optimizer rules
          https://issues.apache.org/jira/browse/PIG-4598
PIG-4551  Partition filter is not pushed down in case of SPLIT
          https://issues.apache.org/jira/browse/PIG-4551
PIG-4539  New PigUnit
          https://issues.apache.org/jira/browse/PIG-4539
PIG-4515  org.apache.pig.builtin.Distinct throws ClassCastException
          https://issues.apache.org/jira/browse/PIG-4515
PIG-4323  PackageConverter hanging in Spark
          https://issues.apache.org/jira/browse/PIG-4323
PIG-4313  StackOverflowError in LIMIT operation on Spark
          https://issues.apache.org/jira/browse/PIG-4313
PIG-4251  Pig on Storm
          https://issues.apache.org/jira/browse/PIG-4251
PIG-4002  Disable combiner when map-side aggregation is used
          https://issues.apache.org/jira/browse/PIG-4002
PIG-3952  PigStorage accepts '-tagSplit' to return full split information
          https://issues.apache.org/jira/browse/PIG-3952
PIG-3911  Define unique fields with @OutputSchema
          https://issues.apache.org/jira/browse/PIG-3911
PIG-3877  Getting Geo Latitude/Longitude from Address Lines
          https://issues.apache.org/jira/browse/PIG-3877
PIG-3873  Geo distance calculation using Haversine
          https://issues.apache.org/jira/browse/PIG-3873
PIG-3864  ToDate(userstring, format, timezone) computes DateTime with strange handling of Daylight Saving Time with location based timezones
          https://issues.apache.org/jira/browse/PIG-3864
PIG-3668  COR built-in function when atleast one of the coefficient values is NaN
          https://issues.apache.org/jira/browse/PIG-3668
PIG-3655  BinStorage and InterStorage approach to record markers is broken
          https://issues.apache.org/jira/browse/PIG-3655
PIG-3587  add functionality for rolling over dates
          https://issues.apache.org/jira/browse/PIG-3587
PIG-1804  Alow Jython function to implement Algebraic and/or Accumulator interfaces
          https://issues.apache.org/jira/browse/PIG-1804

You may edit this subscription at:
https://issues.apache.org/jira/secure/FilterSubscription!default.jspa?subId=16328&filterId=12322384