[jira] [Commented] (PIG-5387) Test failures on JRE 11

2019-05-20 Thread Adam Szita (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-5387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16843746#comment-16843746
 ] 

Adam Szita commented on PIG-5387:
-

Thanks Nandor, +1 for [^PIG-5387_3.patch]. [~rohini], [~knoguchi] any 
objections?

> Test failures on JRE 11
> ---
>
> Key: PIG-5387
> URL: https://issues.apache.org/jira/browse/PIG-5387
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.17.0
>Reporter: Nandor Kollar
>Assignee: Nandor Kollar
>Priority: Major
> Attachments: PIG-5387_1.patch, PIG-5387_2.patch, PIG-5387_3.patch
>
>
> I tried to compile Pig with JDK 8 and execute the tests with Java 11, and 
> faced several test failures. For example, TestCommit#testCheckin2 failed 
> with the following exception:
> {code}
> 2019-05-08 16:06:09,712 WARN  [Thread-108] mapred.LocalJobRunner 
> (LocalJobRunner.java:run(590)) - job_local1000317333_0003
> java.lang.Exception: java.io.IOException: Deserialization error: null
>   at 
> org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:492)
>   at 
> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:552)
> Caused by: java.io.IOException: Deserialization error: null
>   at 
> org.apache.pig.impl.util.ObjectSerializer.deserialize(ObjectSerializer.java:62)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.setup(PigGenericMapBase.java:183)
>   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
>   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:799)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:347)
>   at 
> org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:271)
>   at 
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
>   at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
>   at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>   at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>   at java.base/java.lang.Thread.run(Thread.java:834)
> Caused by: java.lang.NullPointerException
>   at org.apache.pig.impl.plan.Operator.hashCode(Operator.java:106)
>   at java.base/java.util.HashMap.hash(HashMap.java:339)
>   at java.base/java.util.HashMap.readObject(HashMap.java:1461)
>   at 
> java.base/jdk.internal.reflect.GeneratedMethodAccessor12.invoke(Unknown 
> Source)
>   at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>   at 
> java.base/java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1160)
> {code}
> As deserialization of one of the map plans failed, it appears we ran into 
> [JDK-8201131|https://bugs.openjdk.java.net/browse/JDK-8201131]. It seems that 
> the workaround mentioned in the issue report works: adding a readObject 
> method to org.apache.pig.impl.plan.Operator:
> {code}
> private void readObject(ObjectInputStream in) throws ClassNotFoundException, IOException {
>     in.defaultReadObject();
> }
> {code}
> solves the problem; however, I'm not sure that this is the optimal solution.
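To make the quoted workaround concrete, here is a self-contained round-trip sketch. DemoOperator, its key field, and the roundTrip helper are illustrative stand-ins, not Pig's actual org.apache.pig.impl.plan.Operator:

```java
import java.io.*;

// Sketch of the PIG-5387 workaround: declaring a readObject method keeps
// deserialization on the classic path, avoiding the JDK-8201131 behaviour
// where an overridden hashCode() can be invoked on a not-yet-initialised
// object while a HashMap containing it is being deserialized.
public class DemoOperator implements Serializable {
    private static final long serialVersionUID = 1L;
    private String key;

    public DemoOperator(String key) { this.key = key; }

    @Override
    public int hashCode() { return key.hashCode(); } // NPEs if key is still null

    // Workaround from the issue report: keep this method even though it only
    // delegates, and document why, so nobody removes it by mistake.
    private void readObject(ObjectInputStream in)
            throws ClassNotFoundException, IOException {
        in.defaultReadObject();
    }

    // Serialize and deserialize one instance in memory.
    public static DemoOperator roundTrip(DemoOperator op) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(op);
        }
        try (ObjectInputStream ois = new ObjectInputStream(
                new ByteArrayInputStream(bos.toByteArray()))) {
            return (DemoOperator) ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        DemoOperator copy = roundTrip(new DemoOperator("scope-1"));
        System.out.println(copy.hashCode() == "scope-1".hashCode()); // true
    }
}
```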



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PIG-5387) Test failures on JRE 11

2019-05-16 Thread Adam Szita (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-5387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16841368#comment-16841368
 ] 

Adam Szita commented on PIG-5387:
-

So as far as I understand:
 * in the testing part you are injecting a URLClassLoader if we're running 
Java 11. Quite hacky, but not worse than the current implementation, which uses 
reflection to invoke addURL on URLClassLoader;
 * in the Operator class I think using readObject like this will be fine, but 
please add javadoc or comments to the readObject method that include a 
reference to the JVM bug. We don't want people to remove it by mistake :)

Having this in mind [^PIG-5387_2.patch] looks good to me.
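What the first bullet describes might be sketched as follows; the class and method names here are invented for illustration, and Pig's actual test harness code may differ:

```java
import java.net.URL;
import java.net.URLClassLoader;

// On Java 9+ the application class loader is no longer a URLClassLoader,
// so the old trick of reflectively calling its protected addURL() breaks.
// One alternative is to wrap the extra jars in a child URLClassLoader and
// install it as the thread's context class loader.
public class LoaderDemo {
    public static ClassLoader withExtraUrls(URL[] extra) {
        URLClassLoader loader = new URLClassLoader(
                extra, Thread.currentThread().getContextClassLoader());
        Thread.currentThread().setContextClassLoader(loader);
        return loader;
    }

    public static void main(String[] args) {
        ClassLoader cl = withExtraUrls(new URL[0]);
        System.out.println(cl instanceof URLClassLoader); // true
    }
}
```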






[jira] [Assigned] (PIG-5387) Test failures on JRE 11

2019-05-16 Thread Adam Szita (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-5387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Szita reassigned PIG-5387:
---

Assignee: Nandor Kollar






[jira] [Updated] (PIG-5374) Use CircularFifoBuffer in InterRecordReader

2019-01-08 Thread Adam Szita (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-5374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Szita updated PIG-5374:

   Resolution: Fixed
Fix Version/s: 0.18.0
   Status: Resolved  (was: Patch Available)

> Use CircularFifoBuffer in InterRecordReader
> ---
>
> Key: PIG-5374
> URL: https://issues.apache.org/jira/browse/PIG-5374
> Project: Pig
>  Issue Type: Bug
>Reporter: Adam Szita
>Assignee: Adam Szita
>Priority: Major
> Fix For: 0.18.0
>
> Attachments: PIG-5374.0.patch
>
>
> We're currently using CircularFifoQueue in InterRecordReader, and it comes 
> from the commons-collections4 dependency. Hadoop 2.8 installations do not have 
> this dependency by default, so for now we should switch to the older 
> CircularFifoBuffer instead (which comes from commons-collections and is 
> present).
> We should open a separate ticket to investigate which libraries we should 
> update.





[jira] [Commented] (PIG-5374) Use CircularFifoBuffer in InterRecordReader

2019-01-08 Thread Adam Szita (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-5374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16736987#comment-16736987
 ] 

Adam Szita commented on PIG-5374:
-

Thanks Nandor, committed to trunk.






[jira] [Updated] (PIG-5374) Use CircularFifoBuffer in InterRecordReader

2019-01-08 Thread Adam Szita (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-5374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Szita updated PIG-5374:

Attachment: PIG-5374.0.patch






[jira] [Created] (PIG-5374) Use CircularFifoBuffer in InterRecordReader

2019-01-08 Thread Adam Szita (JIRA)
Adam Szita created PIG-5374:
---

 Summary: Use CircularFifoBuffer in InterRecordReader
 Key: PIG-5374
 URL: https://issues.apache.org/jira/browse/PIG-5374
 Project: Pig
  Issue Type: Bug
Reporter: Adam Szita
Assignee: Adam Szita


We're currently using CircularFifoQueue in InterRecordReader, and it comes from 
the commons-collections4 dependency. Hadoop 2.8 installations do not have this 
dependency by default, so for now we should switch to the older 
CircularFifoBuffer instead (which comes from commons-collections and is 
present).

We should open a separate ticket to investigate which libraries we should 
update.
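What InterRecordReader needs from either class is only a bounded FIFO that evicts the oldest element when full. A minimal stdlib-only sketch of that behaviour (illustrative; Pig itself would use commons-collections' CircularFifoBuffer, not this class):

```java
import java.util.ArrayDeque;

// A fixed-capacity FIFO: adding to a full buffer silently drops the oldest
// element, which is the contract both CircularFifoBuffer (commons-collections)
// and CircularFifoQueue (commons-collections4) provide.
public class BoundedFifo<E> {
    private final ArrayDeque<E> deque;
    private final int capacity;

    public BoundedFifo(int capacity) {
        this.capacity = capacity;
        this.deque = new ArrayDeque<>(capacity);
    }

    public void add(E e) {
        if (deque.size() == capacity) {
            deque.removeFirst(); // evict oldest
        }
        deque.addLast(e);
    }

    public Object[] toArray() { return deque.toArray(); }

    public static void main(String[] args) {
        BoundedFifo<Integer> buf = new BoundedFifo<>(3);
        for (int i = 1; i <= 5; i++) buf.add(i);
        // only the last three elements survive
        System.out.println(java.util.Arrays.toString(buf.toArray())); // [3, 4, 5]
    }
}
```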





[jira] [Commented] (PIG-5362) Parameter substitution of shell cmd results doesn't handle backslash

2019-01-07 Thread Adam Szita (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-5362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16735845#comment-16735845
 ] 

Adam Szita commented on PIG-5362:
-

Hi [~wla...@yahoo-inc.com], is there any update on fixing the failing tests?

> Parameter substitution of shell cmd results doesn't handle backslash  
> -
>
> Key: PIG-5362
> URL: https://issues.apache.org/jira/browse/PIG-5362
> Project: Pig
>  Issue Type: Bug
>  Components: parser
>Reporter: Will Lauer
>Assignee: Will Lauer
>Priority: Minor
> Fix For: 0.18.0
>
> Attachments: pig.patch, pig2.patch, pig3.patch, pig4.patch, 
> pig5.patch, test-failure.txt
>
>
> It looks like there is a bug in how parameter substitution is handled in 
> PreprocessorContext.java that causes parameter values containing 
> backslashes to not be processed correctly, resulting in the backslashes being 
> lost. For example, if you had the following:
> {code:java}
> %DECLARE A `echo \$foo\\bar`
> B = LOAD $A 
> {code}
> You would expect the echo command to produce the output {{$foo\bar}}, but the 
> actual value that gets substituted is {{\$foobar}}. This happens because the 
> {{substitute}} method in PreprocessorContext.java uses a regular expression 
> replacement instead of a basic string substitution, and $ and \ are special 
> characters in a replacement string. The code attempts to escape $, but does 
> not escape the backslash.





[jira] [Updated] (PIG-5373) InterRecordReader might skip records if certain sync markers are used

2019-01-03 Thread Adam Szita (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-5373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Szita updated PIG-5373:

   Resolution: Fixed
Fix Version/s: 0.18.0
   Status: Resolved  (was: Patch Available)

> InterRecordReader might skip records if certain sync markers are used
> -
>
> Key: PIG-5373
> URL: https://issues.apache.org/jira/browse/PIG-5373
> Project: Pig
>  Issue Type: Bug
>Reporter: Adam Szita
>Assignee: Adam Szita
>Priority: Major
> Fix For: 0.18.0
>
> Attachments: PIG-5373.0.patch, PIG-5373.1.patch
>
>
> Due to a bug in InterRecordReader#skipUntilMarkerOrSplitEndOrEOF(), it can 
> happen that sync markers are not identified while reading the interim binary 
> file used to hold data between jobs.
> In such files, sync markers are placed upon writing, which later helps during 
> reading of the data. The markers are randomly generated, and it seems that in 
> some rare combinations of a marker and the data preceding it, the marker 
> cannot be found. This can result in reading through all the bytes (looking 
> for the marker), reaching the split end or EOF, and extracting no records at 
> all.
> This symptom is also observable from JobHistory stats: tasks of a job 
> affected by this issue will have HDFS_BYTES_READ or FILE_BYTES_READ about 
> equal to the number of bytes in the split, but at the same time 
> MAP_INPUT_RECORDS=0.
> One such (test) example is this:
> {code:java}
> marker: [-128, -128, 4] , data: [127, -1, 2, -128, -128, -128, 4, 1, 2, 3]{code}
> Due to the bug, markers whose prefix overlaps with the preceding data chunk 
> are not seen by the reader.
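A correct scan has to keep a sliding window of the last marker-length bytes, so a marker whose prefix overlaps the preceding data is still matched. A minimal sketch of that idea, using the failing byte combination from the issue (MarkerScan is illustrative, not Pig's actual reader code):

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class MarkerScan {
    // Scan data one byte at a time (as a record reader consuming a stream
    // would), keeping the last marker.length bytes in a FIFO window.
    // Returns the offset where the marker starts, or -1 if absent.
    static int findMarker(byte[] data, byte[] marker) {
        Deque<Byte> window = new ArrayDeque<>(marker.length);
        for (int i = 0; i < data.length; i++) {
            if (window.size() == marker.length) {
                window.removeFirst(); // slide: drop the oldest byte
            }
            window.addLast(data[i]);
            if (window.size() == marker.length && matches(window, marker)) {
                return i - marker.length + 1;
            }
        }
        return -1;
    }

    private static boolean matches(Deque<Byte> window, byte[] marker) {
        int j = 0;
        for (byte b : window) {
            if (b != marker[j++]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // The combination from the issue: the marker's prefix (-128, -128)
        // overlaps bytes that also appear in the data just before it.
        byte[] marker = {-128, -128, 4};
        byte[] data = {127, -1, 2, -128, -128, -128, 4, 1, 2, 3};
        System.out.println(findMarker(data, marker)); // 4
    }
}
```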





[jira] [Commented] (PIG-5373) InterRecordReader might skip records if certain sync markers are used

2019-01-03 Thread Adam Szita (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-5373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16733171#comment-16733171
 ] 

Adam Szita commented on PIG-5373:
-

Committed to trunk, thanks a lot for reviewing Nandor!






[jira] [Updated] (PIG-5373) InterRecordReader might skip records if certain sync markers are used

2019-01-03 Thread Adam Szita (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-5373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Szita updated PIG-5373:

Attachment: PIG-5373.1.patch






[jira] [Updated] (PIG-5373) InterRecordReader might skip records if certain sync markers are used

2019-01-03 Thread Adam Szita (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-5373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Szita updated PIG-5373:

Attachment: (was: PIG-5373.1.patch)






[jira] [Commented] (PIG-5373) InterRecordReader might skip records if certain sync markers are used

2019-01-03 Thread Adam Szita (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-5373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16732852#comment-16732852
 ] 

Adam Szita commented on PIG-5373:
-

Thanks for taking a look [~nkollar], I've uploaded a new patch that uses 
CircularFifoQueue from commons-collections4.






[jira] [Updated] (PIG-5373) InterRecordReader might skip records if certain sync markers are used

2019-01-03 Thread Adam Szita (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-5373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Szita updated PIG-5373:

Attachment: PIG-5373.1.patch






[jira] [Commented] (PIG-5373) InterRecordReader might skip records if certain sync markers are used

2018-12-23 Thread Adam Szita (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-5373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16728019#comment-16728019
 ] 

Adam Szita commented on PIG-5373:
-

[~rohini], right, it is not even released yet, so I just leave it blank then






[jira] [Updated] (PIG-5373) InterRecordReader might skip records if certain sync markers are used

2018-12-23 Thread Adam Szita (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-5373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Szita updated PIG-5373:

Affects Version/s: (was: 0.17.0)






[jira] [Comment Edited] (PIG-5371) Hdfs bytes written assertions fail in TestPigRunner

2018-12-20 Thread Adam Szita (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-5371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16725861#comment-16725861
 ] 

Adam Szita edited comment on PIG-5371 at 12/20/18 1:56 PM:
---

Hi [~abstractdog],

Yeah sorry that's a typo, indeed -Dtestcase should be used.

The test doesn't hang on my side, it finishes successfully in 9 minutes.
 Do you see which test method completes on your side and which one doesn't?

In the past, when I faced the hanging issue, it was because my Mac's HDD had 
over 90% utilisation, which some HDFS code in MiniCluster did not like.


was (Author: szita):
Hi [~abstractdog],

Yeah sorry that's a typo, indeed -Dtestcase should be used.

The test doesn't hang on my side, it finishes successfully in 9 minutes.
Do you see which test method completes on your side and which one doesn't?

In the past, when I faced the hanging issue, it was because my Mac's HDD had 
over 90% utilisation, which some HDFS code in MiniCluster did not like.

> Hdfs bytes written assertions fail in TestPigRunner
> ---
>
> Key: PIG-5371
> URL: https://issues.apache.org/jira/browse/PIG-5371
> Project: Pig
>  Issue Type: Bug
>Reporter: Laszlo Bodor
>Assignee: Laszlo Bodor
>Priority: Major
> Attachments: PIG-5371.01.patch, simpleTest.out
>
>
> Attached  [^simpleTest.out]. It seems like the HDFS counter 'HDFS_BYTES_WRITTEN' 
> returns the byte count not only for the result of the Pig store operator, but 
> also includes the size of the jar files. The problem is that this could change 
> very easily, so in my opinion the best option would be to remove these 
> assertions from TestPigRunner, as they are just causing intermittent and/or 
> persistent failures.
> The test class is for basic testing of PigRunner, and this is achieved well 
> enough without the asserts.
> {code}
> 2018-11-23 10:14:52,661 [IPC Server handler 5 on 54929] INFO  
> org.apache.hadoop.hdfs.StateChange - BLOCK* allocate blk_1073741827_1003, 
> replicas=127.0.0.1:54934, 127.0.0.1:54930, 127.0.0.1:54943 for 
> /tmp/temp-157262781/tmp-1057655772/automaton-1.11-8.jar
> ...
> 2018-11-23 10:14:52,735 [PacketResponder: 
> BP-26001448-10.200.50.195-1542964474138:blk_1073741827_1003, 
> type=HAS_DOWNSTREAM_IN_PIPELINE, downstreams=2:[127.0.0.1:54930, 
> 127.0.0.1:54943]] INFO  
> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace - src: 
> /127.0.0.1:54978, dest: /127.0.0.1:54934, bytes: 176285, op: HDFS_WRITE, 
> cliID: DFSClient_NONMAPREDUCE_-1959727442_1, offset: 0, srvID: 
> 108c4000-1ae0-402e-82cf-bf403629c0f7, blockid: 
> BP-26001448-10.200.50.195-1542964474138:blk_1073741827_1003, duration(ns): 
> 57162859
> {code}





[jira] [Commented] (PIG-5371) Hdfs bytes written assertions fail in TestPigRunner

2018-12-20 Thread Adam Szita (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-5371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16725861#comment-16725861
 ] 

Adam Szita commented on PIG-5371:
-

Hi [~abstractdog],

Yeah, sorry, that's a typo; indeed -Dtestcase should be used.

The test doesn't hang on my side; it finishes successfully in 9 minutes.
Do you see which test method completes on your side and which one doesn't?

In the past, when I faced the hanging issue, it was because my Mac's HDD was 
over 90% utilised, which some HDFS code in MiniCluster did not like.

> Hdfs bytes written assertions fail in TestPigRunner
> ---
>
> Key: PIG-5371
> URL: https://issues.apache.org/jira/browse/PIG-5371
> Project: Pig
>  Issue Type: Bug
>Reporter: Laszlo Bodor
>Assignee: Laszlo Bodor
>Priority: Major
> Attachments: PIG-5371.01.patch, simpleTest.out
>
>
> Attached [^simpleTest.out]. It seems the HDFS counter 'HDFS_BYTES_WRITTEN' 
> returns the byte count not only for the result of the Pig store operator but 
> also includes the size of the jar files. This could change very easily, so 
> in my opinion the best option would be to remove these assertions from 
> TestPigRunner, as they are just causing intermittent and/or persistent 
> failures.
> The test class is for basic testing of PigRunner, and this is achieved well 
> enough without the asserts.
> {code}
> 2018-11-23 10:14:52,661 [IPC Server handler 5 on 54929] INFO  
> org.apache.hadoop.hdfs.StateChange - BLOCK* allocate blk_1073741827_1003, 
> replicas=127.0.0.1:54934, 127.0.0.1:54930, 127.0.0.1:54943 for 
> /tmp/temp-157262781/tmp-1057655772/automaton-1.11-8.jar
> ...
> 2018-11-23 10:14:52,735 [PacketResponder: 
> BP-26001448-10.200.50.195-1542964474138:blk_1073741827_1003, 
> type=HAS_DOWNSTREAM_IN_PIPELINE, downstreams=2:[127.0.0.1:54930, 
> 127.0.0.1:54943]] INFO  
> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace - src: 
> /127.0.0.1:54978, dest: /127.0.0.1:54934, bytes: 176285, op: HDFS_WRITE, 
> cliID: DFSClient_NONMAPREDUCE_-1959727442_1, offset: 0, srvID: 
> 108c4000-1ae0-402e-82cf-bf403629c0f7, blockid: 
> BP-26001448-10.200.50.195-1542964474138:blk_1073741827_1003, duration(ns): 
> 57162859
> {code}





[jira] [Updated] (PIG-5373) InterRecordReader might skip records if certain sync markers are used

2018-12-20 Thread Adam Szita (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-5373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Szita updated PIG-5373:

Attachment: PIG-5373.0.patch

> InterRecordReader might skip records if certain sync markers are used
> -
>
> Key: PIG-5373
> URL: https://issues.apache.org/jira/browse/PIG-5373
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.17.0
>Reporter: Adam Szita
>Assignee: Adam Szita
>Priority: Major
> Attachments: PIG-5373.0.patch
>
>
> Due to a bug in InterRecordReader#skipUntilMarkerOrSplitEndOrEOF(), it can 
> happen that sync markers are not identified while reading the interim binary 
> file used to hold data between jobs.
> In such files, sync markers are placed upon writing to help locate records 
> when reading the data back. The markers are randomly generated, and in some 
> rare combinations of a marker and the data preceding it, the marker cannot 
> be found. This can result in reading through all the bytes (looking for the 
> marker), reaching the split end or EOF, and extracting no records at all.
> This symptom is also observable in JobHistory stats: a job affected by this 
> issue will have tasks whose HDFS_BYTES_READ or FILE_BYTES_READ is about 
> equal to the number of bytes in the split while MAP_INPUT_RECORDS=0.
> One such (test) example is this:
> {code:java}
> marker: [-128, -128, 4] , data: [127, -1, 2, -128, -128, -128, 4, 1, 2, 
> 3]{code}
> Because of the bug, markers whose prefix overlaps with the preceding data 
> chunk are not seen by the reader.





[jira] [Updated] (PIG-5373) InterRecordReader might skip records if certain sync markers are used

2018-12-20 Thread Adam Szita (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-5373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Szita updated PIG-5373:

Status: Patch Available  (was: Open)

> InterRecordReader might skip records if certain sync markers are used
> -
>
> Key: PIG-5373
> URL: https://issues.apache.org/jira/browse/PIG-5373
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.17.0
>Reporter: Adam Szita
>Assignee: Adam Szita
>Priority: Major
> Attachments: PIG-5373.0.patch
>
>
> Due to a bug in InterRecordReader#skipUntilMarkerOrSplitEndOrEOF(), it can 
> happen that sync markers are not identified while reading the interim binary 
> file used to hold data between jobs.
> In such files, sync markers are placed upon writing to help locate records 
> when reading the data back. The markers are randomly generated, and in some 
> rare combinations of a marker and the data preceding it, the marker cannot 
> be found. This can result in reading through all the bytes (looking for the 
> marker), reaching the split end or EOF, and extracting no records at all.
> This symptom is also observable in JobHistory stats: a job affected by this 
> issue will have tasks whose HDFS_BYTES_READ or FILE_BYTES_READ is about 
> equal to the number of bytes in the split while MAP_INPUT_RECORDS=0.
> One such (test) example is this:
> {code:java}
> marker: [-128, -128, 4] , data: [127, -1, 2, -128, -128, -128, 4, 1, 2, 
> 3]{code}
> Because of the bug, markers whose prefix overlaps with the preceding data 
> chunk are not seen by the reader.





[jira] [Commented] (PIG-5373) InterRecordReader might skip records if certain sync markers are used

2018-12-20 Thread Adam Szita (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-5373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16725832#comment-16725832
 ] 

Adam Szita commented on PIG-5373:
-

Attached [^PIG-5373.0.patch], which corrects the reading of sync markers by 
keeping a FIFO of the most recently read bytes and comparing the FIFO content 
with the expected marker.

A test case is attached as well, which verifies in a brute-force way that such 
prefix scenarios are handled correctly.

[~nkollar], [~rohini] can you take a look please?
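For illustration only, the difference between the buggy scan and the FIFO-based approach can be sketched as follows. The class and method names (SyncMarkerScan, naiveFind, fifoFind) are hypothetical and are not Pig's actual InterRecordReader code; this is a minimal sketch of the two strategies, using the marker/data example from the issue description.

```java
import java.util.ArrayDeque;

public class SyncMarkerScan {

    // Buggy approach: on a mismatch the marker index is reset, but bytes
    // already consumed are never re-examined, so a marker whose prefix
    // overlaps the preceding data is skipped.
    static int naiveFind(byte[] data, byte[] marker) {
        int i = 0; // position within marker
        for (int pos = 0; pos < data.length; pos++) {
            if (data[pos] == marker[i]) {
                if (++i == marker.length) {
                    return pos - marker.length + 1; // offset where marker starts
                }
            } else {
                i = 0; // restart matching, losing any partial overlap
            }
        }
        return -1; // split end / EOF reached without finding the marker
    }

    // FIFO-based approach: keep a sliding window of the last marker.length
    // bytes and compare the window to the marker after every byte read.
    static int fifoFind(byte[] data, byte[] marker) {
        ArrayDeque<Byte> window = new ArrayDeque<>(marker.length);
        for (int pos = 0; pos < data.length; pos++) {
            if (window.size() == marker.length) {
                window.removeFirst(); // drop the oldest byte
            }
            window.addLast(data[pos]);
            if (window.size() == marker.length && matches(window, marker)) {
                return pos - marker.length + 1;
            }
        }
        return -1;
    }

    private static boolean matches(ArrayDeque<Byte> window, byte[] marker) {
        int i = 0;
        for (byte b : window) {
            if (b != marker[i++]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] marker = {-128, -128, 4};
        byte[] data = {127, -1, 2, -128, -128, -128, 4, 1, 2, 3};
        System.out.println("naive: " + naiveFind(data, marker)); // -1 (missed)
        System.out.println("fifo:  " + fifoFind(data, marker));  // 4
    }
}
```

With these bytes the naive scan matches the two -128 bytes starting at offset 3, mismatches on the third -128, and restarts too late to see the marker at offset 4; the sliding window never loses those bytes, so the FIFO comparison finds it.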

> InterRecordReader might skip records if certain sync markers are used
> -
>
> Key: PIG-5373
> URL: https://issues.apache.org/jira/browse/PIG-5373
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.17.0
>Reporter: Adam Szita
>Assignee: Adam Szita
>Priority: Major
> Attachments: PIG-5373.0.patch
>
>
> Due to a bug in InterRecordReader#skipUntilMarkerOrSplitEndOrEOF(), it can 
> happen that sync markers are not identified while reading the interim binary 
> file used to hold data between jobs.
> In such files, sync markers are placed upon writing to help locate records 
> when reading the data back. The markers are randomly generated, and in some 
> rare combinations of a marker and the data preceding it, the marker cannot 
> be found. This can result in reading through all the bytes (looking for the 
> marker), reaching the split end or EOF, and extracting no records at all.
> This symptom is also observable in JobHistory stats: a job affected by this 
> issue will have tasks whose HDFS_BYTES_READ or FILE_BYTES_READ is about 
> equal to the number of bytes in the split while MAP_INPUT_RECORDS=0.
> One such (test) example is this:
> {code:java}
> marker: [-128, -128, 4] , data: [127, -1, 2, -128, -128, -128, 4, 1, 2, 
> 3]{code}
> Because of the bug, markers whose prefix overlaps with the preceding data 
> chunk are not seen by the reader.





[jira] [Issue Comment Deleted] (PIG-5373) InterRecordReader might skip records if certain sync markers are used

2018-12-20 Thread Adam Szita (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-5373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Szita updated PIG-5373:

Comment: was deleted

(was: Attached [^PIG-5373.0.patch], which corrects the reading of sync markers 
by keeping a FIFO of the most recently read bytes and comparing the FIFO 
content with the expected marker.

A test case is attached as well, which verifies in a brute-force way that such 
prefix scenarios are handled correctly.

[~nkollar], [~rohini] can you take a look please?)

> InterRecordReader might skip records if certain sync markers are used
> -
>
> Key: PIG-5373
> URL: https://issues.apache.org/jira/browse/PIG-5373
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.17.0
>Reporter: Adam Szita
>Assignee: Adam Szita
>Priority: Major
> Attachments: PIG-5373.0.patch
>
>
> Due to a bug in InterRecordReader#skipUntilMarkerOrSplitEndOrEOF(), it can 
> happen that sync markers are not identified while reading the interim binary 
> file used to hold data between jobs.
> In such files, sync markers are placed upon writing to help locate records 
> when reading the data back. The markers are randomly generated, and in some 
> rare combinations of a marker and the data preceding it, the marker cannot 
> be found. This can result in reading through all the bytes (looking for the 
> marker), reaching the split end or EOF, and extracting no records at all.
> This symptom is also observable in JobHistory stats: a job affected by this 
> issue will have tasks whose HDFS_BYTES_READ or FILE_BYTES_READ is about 
> equal to the number of bytes in the split while MAP_INPUT_RECORDS=0.
> One such (test) example is this:
> {code:java}
> marker: [-128, -128, 4] , data: [127, -1, 2, -128, -128, -128, 4, 1, 2, 
> 3]{code}
> Because of the bug, markers whose prefix overlaps with the preceding data 
> chunk are not seen by the reader.





[jira] [Commented] (PIG-5373) InterRecordReader might skip records if certain sync markers are used

2018-12-20 Thread Adam Szita (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-5373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16725831#comment-16725831
 ] 

Adam Szita commented on PIG-5373:
-

Attached [^PIG-5373.0.patch], which corrects the reading of sync markers by 
keeping a FIFO of the most recently read bytes and comparing the FIFO content 
with the expected marker.

A test case is attached as well, which verifies in a brute-force way that such 
prefix scenarios are handled correctly.

[~nkollar], [~rohini] can you take a look please?

> InterRecordReader might skip records if certain sync markers are used
> -
>
> Key: PIG-5373
> URL: https://issues.apache.org/jira/browse/PIG-5373
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.17.0
>Reporter: Adam Szita
>Assignee: Adam Szita
>Priority: Major
> Attachments: PIG-5373.0.patch
>
>
> Due to a bug in InterRecordReader#skipUntilMarkerOrSplitEndOrEOF(), it can 
> happen that sync markers are not identified while reading the interim binary 
> file used to hold data between jobs.
> In such files, sync markers are placed upon writing to help locate records 
> when reading the data back. The markers are randomly generated, and in some 
> rare combinations of a marker and the data preceding it, the marker cannot 
> be found. This can result in reading through all the bytes (looking for the 
> marker), reaching the split end or EOF, and extracting no records at all.
> This symptom is also observable in JobHistory stats: a job affected by this 
> issue will have tasks whose HDFS_BYTES_READ or FILE_BYTES_READ is about 
> equal to the number of bytes in the split while MAP_INPUT_RECORDS=0.
> One such (test) example is this:
> {code:java}
> marker: [-128, -128, 4] , data: [127, -1, 2, -128, -128, -128, 4, 1, 2, 
> 3]{code}
> Because of the bug, markers whose prefix overlaps with the preceding data 
> chunk are not seen by the reader.





[jira] [Updated] (PIG-5373) InterRecordReader might skip records if certain sync markers are used

2018-12-20 Thread Adam Szita (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-5373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Szita updated PIG-5373:

Description: 
Due to a bug in InterRecordReader#skipUntilMarkerOrSplitEndOrEOF(), it can 
happen that sync markers are not identified while reading the interim binary 
file used to hold data between jobs.

In such files, sync markers are placed upon writing to help locate records 
when reading the data back. The markers are randomly generated, and in some 
rare combinations of a marker and the data preceding it, the marker cannot be 
found. This can result in reading through all the bytes (looking for the 
marker), reaching the split end or EOF, and extracting no records at all.

This symptom is also observable in JobHistory stats: a job affected by this 
issue will have tasks whose HDFS_BYTES_READ or FILE_BYTES_READ is about equal 
to the number of bytes in the split while MAP_INPUT_RECORDS=0.

One such (test) example is this:
{code:java}
marker: [-128, -128, 4] , data: [127, -1, 2, -128, -128, -128, 4, 1, 2, 3]{code}
Because of the bug, markers whose prefix overlaps with the preceding data 
chunk are not seen by the reader.

  was:
Due to a bug in InterRecordReader#skipUntilMarkerOrSplitEndOrEOF(), it can 
happen that sync markers are not identified while reading the interim binary 
file used to hold data between jobs.

In such files, sync markers are placed upon writing to help locate records 
when reading the data back. The markers are randomly generated, and in some 
rare combinations of a marker and the data preceding it, the marker cannot be 
found. This can result in reading through all the bytes (looking for the 
marker), reaching the split end or EOF, and extracting no records at all.

This symptom is also observable in JobHistory stats: a job affected by this 
issue will have tasks whose HDFS_BYTES_READ or FILE_BYTES_READ is about equal 
to the number of bytes in the split while MAP_INPUT_RECORDS=0.


> InterRecordReader might skip records if certain sync markers are used
> -
>
> Key: PIG-5373
> URL: https://issues.apache.org/jira/browse/PIG-5373
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.17.0
>Reporter: Adam Szita
>Assignee: Adam Szita
>Priority: Major
>
> Due to a bug in InterRecordReader#skipUntilMarkerOrSplitEndOrEOF(), it can 
> happen that sync markers are not identified while reading the interim binary 
> file used to hold data between jobs.
> In such files, sync markers are placed upon writing to help locate records 
> when reading the data back. The markers are randomly generated, and in some 
> rare combinations of a marker and the data preceding it, the marker cannot 
> be found. This can result in reading through all the bytes (looking for the 
> marker), reaching the split end or EOF, and extracting no records at all.
> This symptom is also observable in JobHistory stats: a job affected by this 
> issue will have tasks whose HDFS_BYTES_READ or FILE_BYTES_READ is about 
> equal to the number of bytes in the split while MAP_INPUT_RECORDS=0.
> One such (test) example is this:
> {code:java}
> marker: [-128, -128, 4] , data: [127, -1, 2, -128, -128, -128, 4, 1, 2, 
> 3]{code}
> Because of the bug, markers whose prefix overlaps with the preceding data 
> chunk are not seen by the reader.





[jira] [Created] (PIG-5373) InterRecordReader might skip records if certain sync markers are used

2018-12-20 Thread Adam Szita (JIRA)
Adam Szita created PIG-5373:
---

 Summary: InterRecordReader might skip records if certain sync 
markers are used
 Key: PIG-5373
 URL: https://issues.apache.org/jira/browse/PIG-5373
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.17.0
Reporter: Adam Szita
Assignee: Adam Szita


Due to a bug in InterRecordReader#skipUntilMarkerOrSplitEndOrEOF(), it can 
happen that sync markers are not identified while reading the interim binary 
file used to hold data between jobs.

In such files, sync markers are placed upon writing to help locate records 
when reading the data back. The markers are randomly generated, and in some 
rare combinations of a marker and the data preceding it, the marker cannot be 
found. This can result in reading through all the bytes (looking for the 
marker), reaching the split end or EOF, and extracting no records at all.

This symptom is also observable in JobHistory stats: a job affected by this 
issue will have tasks whose HDFS_BYTES_READ or FILE_BYTES_READ is about equal 
to the number of bytes in the split while MAP_INPUT_RECORDS=0.





[jira] [Commented] (PIG-5371) Hdfs bytes written assertions fail in TestPigRunner

2018-12-14 Thread Adam Szita (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-5371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16721419#comment-16721419
 ] 

Adam Szita commented on PIG-5371:
-

Hi [~abstractdog], can you please elaborate on
{quote}TestPigRunner - work, only on an internal maintenance line
{quote}
I am able to run TestPigRunner checked out from trunk as per:
{code:java}
ant clean jar
ant test -Dtest=TestPigRunner{code}
...and it succeeds:
{code:java}
BUILD SUCCESSFUL
Total time: 8 minutes 47 seconds{code}

> Hdfs bytes written assertions fail in TestPigRunner
> ---
>
> Key: PIG-5371
> URL: https://issues.apache.org/jira/browse/PIG-5371
> Project: Pig
>  Issue Type: Bug
>Reporter: Laszlo Bodor
>Assignee: Laszlo Bodor
>Priority: Major
> Attachments: PIG-5371.01.patch, simpleTest.out
>
>
> Attached [^simpleTest.out]. It seems the HDFS counter 'HDFS_BYTES_WRITTEN' 
> returns the byte count not only for the result of the Pig store operator but 
> also includes the size of the jar files. This could change very easily, so 
> in my opinion the best option would be to remove these assertions from 
> TestPigRunner, as they are just causing intermittent and/or persistent 
> failures.
> The test class is for basic testing of PigRunner, and this is achieved well 
> enough without the asserts.
> {code}
> 2018-11-23 10:14:52,661 [IPC Server handler 5 on 54929] INFO  
> org.apache.hadoop.hdfs.StateChange - BLOCK* allocate blk_1073741827_1003, 
> replicas=127.0.0.1:54934, 127.0.0.1:54930, 127.0.0.1:54943 for 
> /tmp/temp-157262781/tmp-1057655772/automaton-1.11-8.jar
> ...
> 2018-11-23 10:14:52,735 [PacketResponder: 
> BP-26001448-10.200.50.195-1542964474138:blk_1073741827_1003, 
> type=HAS_DOWNSTREAM_IN_PIPELINE, downstreams=2:[127.0.0.1:54930, 
> 127.0.0.1:54943]] INFO  
> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace - src: 
> /127.0.0.1:54978, dest: /127.0.0.1:54934, bytes: 176285, op: HDFS_WRITE, 
> cliID: DFSClient_NONMAPREDUCE_-1959727442_1, offset: 0, srvID: 
> 108c4000-1ae0-402e-82cf-bf403629c0f7, blockid: 
> BP-26001448-10.200.50.195-1542964474138:blk_1073741827_1003, duration(ns): 
> 57162859
> {code}





[jira] [Resolved] (PIG-2557) CSVExcelStorage save : empty quotes "" becomes 4 quotes """". This should become a null field.

2018-09-26 Thread Adam Szita (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-2557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Szita resolved PIG-2557.
-
   Resolution: Duplicate
Fix Version/s: 0.17.0

> CSVExcelStorage save : empty quotes "" becomes 4 quotes "".  This should 
> become a null field.
> ---
>
> Key: PIG-2557
> URL: https://issues.apache.org/jira/browse/PIG-2557
> Project: Pig
>  Issue Type: Bug
>  Components: piggybank
>Affects Versions: 0.9.1
>Reporter: Peter Welch
>Priority: Minor
> Fix For: 0.17.0
>
>






[jira] [Commented] (PIG-2557) CSVExcelStorage save : empty quotes "" becomes 4 quotes """". This should become a null field.

2018-09-26 Thread Adam Szita (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-2557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629193#comment-16629193
 ] 

Adam Szita commented on PIG-2557:
-

Since this is the same issue as PIG-5045, I'm resolving this as a duplicate.

> CSVExcelStorage save : empty quotes "" becomes 4 quotes "".  This should 
> become a null field.
> ---
>
> Key: PIG-2557
> URL: https://issues.apache.org/jira/browse/PIG-2557
> Project: Pig
>  Issue Type: Bug
>  Components: piggybank
>Affects Versions: 0.9.1
>Reporter: Peter Welch
>Priority: Minor
>






[jira] [Updated] (PIG-5358) Remove hive-contrib jar from lib directory

2018-09-18 Thread Adam Szita (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-5358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Szita updated PIG-5358:

   Resolution: Fixed
Fix Version/s: 0.18.0
   Status: Resolved  (was: Patch Available)

> Remove hive-contrib jar from lib directory
> --
>
> Key: PIG-5358
> URL: https://issues.apache.org/jira/browse/PIG-5358
> Project: Pig
>  Issue Type: Improvement
>  Components: build
>Reporter: Adam Szita
>Assignee: Adam Szita
>Priority: Minor
> Fix For: 0.18.0
>
> Attachments: PIG-5358.0.patch
>
>
> As per HIVE-20020, the hive-contrib jar has been moved out of Hive's lib 
> directory. We 'export' some of our Hive dependencies into our lib folder 
> too, and that includes hive-contrib.jar, so in order to stay in sync with 
> Hive we should remove it as well.
> We don't depend on this jar at runtime, so there's no use in it being in 
> Pig's lib dir anyway.





[jira] [Commented] (PIG-5358) Remove hive-contrib jar from lib directory

2018-09-18 Thread Adam Szita (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-5358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619576#comment-16619576
 ] 

Adam Szita commented on PIG-5358:
-

Committed to trunk, thanks for reviewing Nandor!

> Remove hive-contrib jar from lib directory
> --
>
> Key: PIG-5358
> URL: https://issues.apache.org/jira/browse/PIG-5358
> Project: Pig
>  Issue Type: Improvement
>  Components: build
>Reporter: Adam Szita
>Assignee: Adam Szita
>Priority: Minor
> Fix For: 0.18.0
>
> Attachments: PIG-5358.0.patch
>
>
> As per HIVE-20020, the hive-contrib jar has been moved out of Hive's lib 
> directory. We 'export' some of our Hive dependencies into our lib folder 
> too, and that includes hive-contrib.jar, so in order to stay in sync with 
> Hive we should remove it as well.
> We don't depend on this jar at runtime, so there's no use in it being in 
> Pig's lib dir anyway.





[jira] [Updated] (PIG-5358) Remove hive-contrib jar from lib directory

2018-09-14 Thread Adam Szita (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-5358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Szita updated PIG-5358:

Status: Patch Available  (was: In Progress)

> Remove hive-contrib jar from lib directory
> --
>
> Key: PIG-5358
> URL: https://issues.apache.org/jira/browse/PIG-5358
> Project: Pig
>  Issue Type: Improvement
>  Components: build
>Reporter: Adam Szita
>Assignee: Adam Szita
>Priority: Minor
> Attachments: PIG-5358.0.patch
>
>
> As per HIVE-20020, the hive-contrib jar has been moved out of Hive's lib 
> directory. We 'export' some of our Hive dependencies into our lib folder 
> too, and that includes hive-contrib.jar, so in order to stay in sync with 
> Hive we should remove it as well.
> We don't depend on this jar at runtime, so there's no use in it being in 
> Pig's lib dir anyway.





[jira] [Updated] (PIG-5358) Remove hive-contrib jar from lib directory

2018-09-14 Thread Adam Szita (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-5358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Szita updated PIG-5358:

Attachment: PIG-5358.0.patch

> Remove hive-contrib jar from lib directory
> --
>
> Key: PIG-5358
> URL: https://issues.apache.org/jira/browse/PIG-5358
> Project: Pig
>  Issue Type: Improvement
>  Components: build
>Reporter: Adam Szita
>Assignee: Adam Szita
>Priority: Minor
> Attachments: PIG-5358.0.patch
>
>
> As per HIVE-20020, the hive-contrib jar has been moved out of Hive's lib 
> directory. We 'export' some of our Hive dependencies into our lib folder 
> too, and that includes hive-contrib.jar, so in order to stay in sync with 
> Hive we should remove it as well.
> We don't depend on this jar at runtime, so there's no use in it being in 
> Pig's lib dir anyway.





[jira] [Created] (PIG-5358) Remove hive-contrib jar from lib directory

2018-09-14 Thread Adam Szita (JIRA)
Adam Szita created PIG-5358:
---

 Summary: Remove hive-contrib jar from lib directory
 Key: PIG-5358
 URL: https://issues.apache.org/jira/browse/PIG-5358
 Project: Pig
  Issue Type: Improvement
  Components: build
Reporter: Adam Szita
Assignee: Adam Szita


As per HIVE-20020, the hive-contrib jar has been moved out of Hive's lib 
directory. We 'export' some of our Hive dependencies into our lib folder too, 
and that includes hive-contrib.jar, so in order to stay in sync with Hive we 
should remove it as well.

We don't depend on this jar at runtime, so there's no use in it being in 
Pig's lib dir anyway.





[jira] [Work started] (PIG-5358) Remove hive-contrib jar from lib directory

2018-09-14 Thread Adam Szita (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-5358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on PIG-5358 started by Adam Szita.
---
> Remove hive-contrib jar from lib directory
> --
>
> Key: PIG-5358
> URL: https://issues.apache.org/jira/browse/PIG-5358
> Project: Pig
>  Issue Type: Improvement
>  Components: build
>Reporter: Adam Szita
>Assignee: Adam Szita
>Priority: Minor
>
> As per HIVE-20020, the hive-contrib jar has been moved out of Hive's lib 
> directory. We 'export' some of our Hive dependencies into our lib folder 
> too, and that includes hive-contrib.jar, so in order to stay in sync with 
> Hive we should remove it as well.
> We don't depend on this jar at runtime, so there's no use in it being in 
> Pig's lib dir anyway.





[jira] [Updated] (PIG-5343) Upgrade developer build environment

2018-09-07 Thread Adam Szita (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-5343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Szita updated PIG-5343:

   Resolution: Fixed
Fix Version/s: 0.18.0
   Status: Resolved  (was: Patch Available)

> Upgrade developer build environment
> ---
>
> Key: PIG-5343
> URL: https://issues.apache.org/jira/browse/PIG-5343
> Project: Pig
>  Issue Type: Improvement
>  Components: build
>Reporter: Niels Basjes
>Assignee: Niels Basjes
>Priority: Major
> Fix For: 0.18.0
>
> Attachments: PIG-5343-1.patch, PIG-5343-2.patch, PIG-5343-3.patch
>
>
> The docker image that can be used to setup the build environment still uses 
> Java 1.7 and is based on a very old version of Ubuntu.
> Both of these should be updated.





[jira] [Commented] (PIG-5343) Upgrade developer build environment

2018-09-07 Thread Adam Szita (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-5343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607128#comment-16607128
 ] 

Adam Szita commented on PIG-5343:
-

[~nielsbasjes],

Thanks for the investigation on this one. +1 on [^PIG-5343-3.patch]; it is now 
committed to trunk.

> Upgrade developer build environment
> ---
>
> Key: PIG-5343
> URL: https://issues.apache.org/jira/browse/PIG-5343
> Project: Pig
>  Issue Type: Improvement
>  Components: build
>Reporter: Niels Basjes
>Assignee: Niels Basjes
>Priority: Major
> Fix For: 0.18.0
>
> Attachments: PIG-5343-1.patch, PIG-5343-2.patch, PIG-5343-3.patch
>
>
> The docker image that can be used to setup the build environment still uses 
> Java 1.7 and is based on a very old version of Ubuntu.
> Both of these should be updated.





[jira] [Commented] (PIG-5340) Unable to compile files in PIG

2018-09-03 Thread Adam Szita (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-5340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16601871#comment-16601871
 ] 

Adam Szita commented on PIG-5340:
-

Me neither. I did the following:
{code:java}
git clone https://github.com/apache/pig.git
cd pig
git checkout tags/release-0.17.0
ant clean jar
cd tutorial
ant jar{code}

After this I got a BUILD SUCCESSFUL.

I'm resolving this JIRA as not a problem; we can reopen it in the very 
unlikely case that it indeed turns out to be an issue.

> Unable to compile files in PIG
> --
>
> Key: PIG-5340
> URL: https://issues.apache.org/jira/browse/PIG-5340
> Project: Pig
>  Issue Type: Bug
>Reporter: Remil
>Priority: Major
>
> hadoopuser@sherin-VirtualBox:/usr/local/pig/pig-0.17.0-src/tutorial$ sudo ant 
> jar
> Buildfile: /usr/local/pig/pig-0.17.0-src/tutorial/build.xml
> init:
> compile:
>  [echo] *** Compiling Tutorial files ***
>  [javac] /usr/local/pig/pig-0.17.0-src/tutorial/build.xml:66: warning: 
> 'includeantruntime' was not set, defaulting to build.sysclasspath=last; set 
> to false for repeatable builds
>  [javac] Compiling 7 source files to 
> /usr/local/pig/pig-0.17.0-src/tutorial/build/classes
>  [javac] warning: [options] bootstrap class path not set in conjunction with 
> -source 1.5
>  [javac] warning: [options] source value 1.5 is obsolete and will be removed 
> in a future release
>  [javac] warning: [options] target value 1.5 is obsolete and will be removed 
> in a future release
>  [javac] warning: [options] To suppress warnings about obsolete options, use 
> -Xlint:-options.
>  [javac] 
> /usr/local/pig/pig-0.17.0-src/tutorial/src/org/apache/pig/tutorial/ExtractHour.java:24:
>  error: cannot find symbol
>  [javac] import org.apache.pig.EvalFunc;
>  [javac] ^
>  [javac] symbol: class EvalFunc
>  [javac] location: package org.apache.pig
>  [javac] 
> /usr/local/pig/pig-0.17.0-src/tutorial/src/org/apache/pig/tutorial/ExtractHour.java:25:
>  error: cannot find symbol
>  [javac] import org.apache.pig.FuncSpec;
>  [javac] ^
>  [javac] symbol: class FuncSpec
>  [javac] location: package org.apache.pig
>  [javac] 
> /usr/local/pig/pig-0.17.0-src/tutorial/src/org/apache/pig/tutorial/ExtractHour.java:26:
>  error: package org.apache.pig.data does not exist
>  [javac] import org.apache.pig.data.Tuple;
>  [javac] ^
>  [javac] 
> /usr/local/pig/pig-0.17.0-src/tutorial/src/org/apache/pig/tutorial/ExtractHour.java:27:
>  error: package org.apache.pig.data does not exist
>  [javac] import org.apache.pig.data.DataType;
>  [javac] ^
>  [javac] 
> /usr/local/pig/pig-0.17.0-src/tutorial/src/org/apache/pig/tutorial/ExtractHour.java:28:
>  error: package org.apache.pig.impl.logicalLayer.schema does not exist
>  [javac] import org.apache.pig.impl.logicalLayer.schema.Schema;
>  [javac] ^
>  [javac] 
> /usr/local/pig/pig-0.17.0-src/tutorial/src/org/apache/pig/tutorial/ExtractHour.java:29:
>  error: package org.apache.pig.impl.logicalLayer does not exist
>  [javac] import org.apache.pig.impl.logicalLayer.FrontendException;
>  [javac] ^
>  [javac] 
> /usr/local/pig/pig-0.17.0-src/tutorial/src/org/apache/pig/tutorial/ExtractHour.java:35:
>  error: cannot find symbol
>  [javac] public class ExtractHour extends EvalFunc<String> {
>  [javac] ^
>  [javac] symbol: class EvalFunc
>  [javac] 
> /usr/local/pig/pig-0.17.0-src/tutorial/src/org/apache/pig/tutorial/ExtractHour.java:36:
>  error: cannot find symbol
>  [javac] public String exec(Tuple input) throws IOException {
>  [javac] ^
>  [javac] symbol: class Tuple
>  [javac] location: class ExtractHour
>  [javac] 
> /usr/local/pig/pig-0.17.0-src/tutorial/src/org/apache/pig/tutorial/ExtractHour.java:54:
>  error: cannot find symbol
>  [javac] public Schema outputSchema(Schema input) {
>  [javac] ^
>  [javac] symbol: class Schema
>  [javac] location: class ExtractHour
>  [javac] 
> /usr/local/pig/pig-0.17.0-src/tutorial/src/org/apache/pig/tutorial/ExtractHour.java:54:
>  error: cannot find symbol
>  [javac] public Schema outputSchema(Schema input) {
>  [javac] ^
>  [javac] symbol: class Schema
>  [javac] location: class ExtractHour
>  [javac] 
> /usr/local/pig/pig-0.17.0-src/tutorial/src/org/apache/pig/tutorial/ExtractHour.java:63:
>  error: cannot find symbol
>  [javac] public List<FuncSpec> getArgToFuncMapping() throws FrontendException 
> {
>  [javac] ^
>  [javac] symbol: class FuncSpec
>  [javac] location: class ExtractHour
>  [javac] 
> /usr/local/pig/pig-0.17.0-src/tutorial/src/org/apache/pig/tutorial/ExtractHour.java:63:
>  error: cannot find symbol
>  [javac] public List<FuncSpec> getArgToFuncMapping() throws FrontendException 
> {
>  [javac] ^
>  [javac] symbol: class FrontendException
>  [javac] location: class ExtractHour
>  [javac] 
> /usr/local/pig/pig-0.17.0-src/tutorial/src/org/apache/pig/tutorial/NGramGenerator.java:26:
>  error: cannot find symbol
>  [javac] import 

[jira] [Resolved] (PIG-5340) Unable to compile files in PIG

2018-09-03 Thread Adam Szita (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-5340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Szita resolved PIG-5340.
-
Resolution: Not A Problem

> Unable to compile files in PIG
> --
>
> Key: PIG-5340
> URL: https://issues.apache.org/jira/browse/PIG-5340
> Project: Pig
>  Issue Type: Bug
>Reporter: Remil
>Priority: Major
>

[jira] [Commented] (PIG-5191) Pig HBase 2.0.0 support

2018-08-29 Thread Adam Szita (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-5191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16596222#comment-16596222
 ] 

Adam Szita commented on PIG-5191:
-

Looks good to me too. Committed to trunk.

Thanks for the patch Nandor, and thanks for reviewing Rohini, Daniel.

> Pig HBase 2.0.0 support
> ---
>
> Key: PIG-5191
> URL: https://issues.apache.org/jira/browse/PIG-5191
> Project: Pig
>  Issue Type: Improvement
>Reporter: Nandor Kollar
>Assignee: Nandor Kollar
>Priority: Major
> Fix For: 0.18.0
>
> Attachments: PIG-5191_1.patch, PIG-5191_2.patch
>
>
> Pig doesn't support HBase 2.0.0. Since the new HBase API introduces several 
> API changes, we should find a way to support both the 1.x and 2.x HBase APIs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (PIG-5191) Pig HBase 2.0.0 support

2018-08-29 Thread Adam Szita (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-5191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Szita updated PIG-5191:

Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Pig HBase 2.0.0 support
> ---
>
> Key: PIG-5191
> URL: https://issues.apache.org/jira/browse/PIG-5191
> Project: Pig
>  Issue Type: Improvement
>Reporter: Nandor Kollar
>Assignee: Nandor Kollar
>Priority: Major
> Fix For: 0.18.0
>
> Attachments: PIG-5191_1.patch, PIG-5191_2.patch
>
>
> Pig doesn't support HBase 2.0.0. Since the new HBase API introduces several 
> API changes, we should find a way to support both the 1.x and 2.x HBase APIs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PIG-5347) Add new target for generating dependency tree

2018-07-12 Thread Adam Szita (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-5347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16541676#comment-16541676
 ] 

Adam Szita commented on PIG-5347:
-

[~satishsaley] thanks for looking into this.

Yes, the ivy:report task is there, but currently it only works for the active ivy 
configuration (e.g. spark1- and spark2-related libs are not included in the report).

It'd be very useful to have them all - right now, when I look for something that 
Spark pulls in, I have to look in ~/.ivy2/ and dig through XMLs.

So this might not be such an invalid Jira ticket after all.

> Add new target for generating dependency tree
> -
>
> Key: PIG-5347
> URL: https://issues.apache.org/jira/browse/PIG-5347
> Project: Pig
>  Issue Type: Bug
>Reporter: Satish Subhashrao Saley
>Assignee: Satish Subhashrao Saley
>Priority: Major
>
> It would be really helpful in debugging dependency conflicts if we have some 
> easy way to get dependency tree. ivy:report - 
> http://ant.apache.org/ivy/history/latest-milestone/use/report.html task 
> generates html showing dependencies. 
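A sketch of what such a target could look like (illustrative only - the target name, property names, and the spark1/spark2 configuration names are assumptions, not Pig's actual build.xml; it relies on the standard Ivy antlib tasks {{ivy:resolve}} and {{ivy:report}}):

```xml
<!-- Resolve and report every configuration of interest in one pass,
     not just the currently active one. Requires the Ivy antlib. -->
<target name="ivy-report-all" xmlns:ivy="antlib:org.apache.ivy.ant">
  <ivy:resolve conf="spark1,spark2"/>
  <ivy:report todir="${build.dir}/ivy-report" conf="spark1,spark2"/>
</target>
```

Each listed configuration then gets its own HTML dependency report under the report directory, which would cover the spark1/spark2 gap described above.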



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PIG-5344) Update Apache HTTPD LogParser to latest version

2018-07-02 Thread Adam Szita (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-5344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530078#comment-16530078
 ] 

Adam Szita commented on PIG-5344:
-

+1, patch committed to trunk.

Thanks for the patch [~nielsbasjes], and [~nkollar] for the review!

> Update Apache HTTPD LogParser to latest version
> ---
>
> Key: PIG-5344
> URL: https://issues.apache.org/jira/browse/PIG-5344
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.18.0
>Reporter: Niels Basjes
>Assignee: Niels Basjes
>Priority: Major
> Fix For: 0.18.0
>
> Attachments: PIG-5344-1.patch
>
>
> Similar to PIG-4717 this is to simply upgrade the 
> [logparser|https://github.com/nielsbasjes/logparser] library.
> I had to postpone this for a while because the latest version requires Java 8.
> I will simply update the version of the library.
> The new features are supported transparently.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (PIG-5344) Update Apache HTTPD LogParser to latest version

2018-07-02 Thread Adam Szita (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-5344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Szita updated PIG-5344:

   Resolution: Fixed
Fix Version/s: 0.18.0
   Status: Resolved  (was: Patch Available)

> Update Apache HTTPD LogParser to latest version
> ---
>
> Key: PIG-5344
> URL: https://issues.apache.org/jira/browse/PIG-5344
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.18.0
>Reporter: Niels Basjes
>Assignee: Niels Basjes
>Priority: Major
> Fix For: 0.18.0
>
> Attachments: PIG-5344-1.patch
>
>
> Similar to PIG-4717 this is to simply upgrade the 
> [logparser|https://github.com/nielsbasjes/logparser] library.
> I had to postpone this for a while because the latest version requires Java 8.
> I will simply update the version of the library.
> The new features are supported transparently.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PIG-5343) Upgrade developer build environment

2018-07-02 Thread Adam Szita (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-5343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530007#comment-16530007
 ] 

Adam Szita commented on PIG-5343:
-

[~nielsbasjes], are the failing tests passing under Java 7?

> Upgrade developer build environment
> ---
>
> Key: PIG-5343
> URL: https://issues.apache.org/jira/browse/PIG-5343
> Project: Pig
>  Issue Type: Improvement
>  Components: build
>Reporter: Niels Basjes
>Assignee: Niels Basjes
>Priority: Major
> Attachments: PIG-5343-1.patch, PIG-5343-2.patch, PIG-5343-3.patch
>
>
> The docker image that can be used to setup the build environment still uses 
> Java 1.7 and is based on a very old version of Ubuntu.
> Both of these should be updated.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (PIG-5122) data

2018-06-25 Thread Adam Szita (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-5122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Szita updated PIG-5122:

Fix Version/s: (was: site)

> data
> 
>
> Key: PIG-5122
> URL: https://issues.apache.org/jira/browse/PIG-5122
> Project: Pig
>  Issue Type: Bug
>  Components: data
>Affects Versions: 0.16.0
>Reporter: muhammad hamdani
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PIG-5343) Upgrade developer build environment

2018-06-25 Thread Adam Szita (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-5343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16522044#comment-16522044
 ] 

Adam Szita commented on PIG-5343:
-

In {{TestDriverPig.pm}} we might want to keep the MaxPermSize setting. E2E 
tests can be used to compare results of the same Pig script on a new and an 
old release of Pig, and some test clusters might still have Java 7 installed on them.

> Upgrade developer build environment
> ---
>
> Key: PIG-5343
> URL: https://issues.apache.org/jira/browse/PIG-5343
> Project: Pig
>  Issue Type: Improvement
>  Components: build
>Reporter: Niels Basjes
>Assignee: Niels Basjes
>Priority: Major
> Attachments: PIG-5343-1.patch, PIG-5343-2.patch
>
>
> The docker image that can be used to setup the build environment still uses 
> Java 1.7 and is based on a very old version of Ubuntu.
> Both of these should be updated.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PIG-5341) PigStorage with -tagFile/-tagPath produces incorrect results with column pruning

2018-06-05 Thread Adam Szita (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-5341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16501494#comment-16501494
 ] 

Adam Szita commented on PIG-5341:
-

+1, thanks for fixing this Koji!

> PigStorage with -tagFile/-tagPath produces incorrect results with column 
> pruning
> 
>
> Key: PIG-5341
> URL: https://issues.apache.org/jira/browse/PIG-5341
> Project: Pig
>  Issue Type: Bug
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Critical
> Attachments: pig-5341-v01.patch
>
>
> I don't know why we didn't see this till now.
> {code}
> A = load 'test.txt' using PigStorage('\t', '-tagFile') as 
> (filename:chararray, a0:int, a1:int, a2:int, a3:int);
> B = FOREACH A GENERATE a0,a2;
> dump B;
> {code}
> Input 
> {noformat}
> knoguchi@pig > cat  test.txt
> 0   1   2   3
> 0   1   2   3
> 0   1   2   3
> {noformat}
> Expected Results
> {noformat}
> (0,2)
> (0,2)
> (0,2)
> {noformat}
> Actual Results
> {noformat}
> (,1)
> (,1)
> (,1)
> {noformat}
> This is really bad...



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PIG-5338) Prevent deep copy of DataBag into Jython List

2018-04-23 Thread Adam Szita (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448317#comment-16448317
 ] 

Adam Szita commented on PIG-5338:
-

Yeah, those errors look to be related to this patch - perhaps a classloading 
issue: the classpath on the MR side differs from what is tested locally.

> Prevent deep copy of DataBag into Jython List
> -
>
> Key: PIG-5338
> URL: https://issues.apache.org/jira/browse/PIG-5338
> Project: Pig
>  Issue Type: Improvement
>Reporter: Greg Phillips
>Assignee: Greg Phillips
>Priority: Major
> Attachments: PIG-5338.patch
>
>
> Pig Python UDFs currently perform deep copies on Bags converting them into 
> Jython PyLists. This can cause Jython UDFs to run out of memory and fail. A 
> Jython DataBag which extends PyList could allow for iterative access to 
> DataBag elements, while only performing a deep copy when necessary.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PIG-5338) Prevent deep copy of DataBag into Jython List

2018-04-20 Thread Adam Szita (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16445478#comment-16445478
 ] 

Adam Szita commented on PIG-5338:
-

This looks like a good idea - although we'll also need to run the (Scripting) e2e 
tests for verification.

> Prevent deep copy of DataBag into Jython List
> -
>
> Key: PIG-5338
> URL: https://issues.apache.org/jira/browse/PIG-5338
> Project: Pig
>  Issue Type: Improvement
>Reporter: Greg Phillips
>Assignee: Greg Phillips
>Priority: Major
> Attachments: PIG-5338.patch
>
>
> Pig Python UDFs currently perform deep copies on Bags converting them into 
> Jython PyLists. This can cause Jython UDFs to run out of memory and fail. A 
> Jython DataBag which extends PyList could allow for iterative access to 
> DataBag elements, while only performing a deep copy when necessary.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PIG-5253) Pig Hadoop 3 support

2018-01-22 Thread Adam Szita (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16334305#comment-16334305
 ] 

Adam Szita commented on PIG-5253:
-

{quote}If there is no change required for hadoop 2, we can just point it to 
compile from hadoop 2 shims directory.
{quote}
 
Wouldn't that be a source of confusion? Also, if we keep the shims layer, what 
do we do with the Maven classifiers in build.xml? I guess we'd want to keep 
that structure as well (for future use, as said before), although I don't think 
we want to have -h2 and -h3 jars with the very same content.

> Pig Hadoop 3 support
> 
>
> Key: PIG-5253
> URL: https://issues.apache.org/jira/browse/PIG-5253
> Project: Pig
>  Issue Type: Improvement
>Reporter: Nandor Kollar
>Assignee: Nandor Kollar
>Priority: Major
> Fix For: 0.18.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PIG-5320) TestCubeOperator#testRollupBasic is flaky on Spark 2.2

2018-01-15 Thread Adam Szita (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16326329#comment-16326329
 ] 

Adam Szita commented on PIG-5320:
-

+1 on [^PIG-5320_2.patch], and committed to trunk. Thanks Nandor!

> TestCubeOperator#testRollupBasic is flaky on Spark 2.2
> --
>
> Key: PIG-5320
> URL: https://issues.apache.org/jira/browse/PIG-5320
> Project: Pig
>  Issue Type: Bug
>  Components: spark
>Reporter: Nandor Kollar
>Assignee: Nandor Kollar
>Priority: Major
> Fix For: 0.18.0
>
> Attachments: PIG-5320_1.patch, PIG-5320_2.patch
>
>
> TestCubeOperator#testRollupBasic occasionally fails with
> {code}
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to 
> store alias c
>   at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1779)
>   at org.apache.pig.PigServer.registerQuery(PigServer.java:708)
>   at 
> org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:1110)
>   at 
> org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:512)
>   at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:230)
>   at org.apache.pig.PigServer.registerScript(PigServer.java:781)
>   at org.apache.pig.PigServer.registerScript(PigServer.java:858)
>   at org.apache.pig.PigServer.registerScript(PigServer.java:821)
>   at org.apache.pig.test.Util.registerMultiLineQuery(Util.java:972)
>   at 
> org.apache.pig.test.TestCubeOperator.testRollupBasic(TestCubeOperator.java:124)
> Caused by: org.apache.pig.impl.plan.VisitorException: ERROR 0: fail to get 
> the rdds of this spark operator: 
>   at 
> org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.visitSparkOp(JobGraphBuilder.java:115)
>   at 
> org.apache.pig.backend.hadoop.executionengine.spark.plan.SparkOperator.visit(SparkOperator.java:140)
>   at 
> org.apache.pig.backend.hadoop.executionengine.spark.plan.SparkOperator.visit(SparkOperator.java:37)
>   at 
> org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:87)
>   at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:46)
>   at 
> org.apache.pig.backend.hadoop.executionengine.spark.SparkLauncher.launchPig(SparkLauncher.java:237)
>   at 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:293)
>   at org.apache.pig.PigServer.launchPlan(PigServer.java:1475)
>   at 
> org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1460)
>   at org.apache.pig.PigServer.execute(PigServer.java:1449)
>   at org.apache.pig.PigServer.access$500(PigServer.java:119)
>   at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1774)
> Caused by: java.lang.RuntimeException: Unexpected job execution status RUNNING
>   at 
> org.apache.pig.tools.pigstats.spark.SparkStatsUtil.isJobSuccess(SparkStatsUtil.java:138)
>   at 
> org.apache.pig.tools.pigstats.spark.SparkPigStats.addJobStats(SparkPigStats.java:75)
>   at 
> org.apache.pig.tools.pigstats.spark.SparkStatsUtil.waitForJobAddStats(SparkStatsUtil.java:59)
>   at 
> org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.sparkOperToRDD(JobGraphBuilder.java:225)
>   at 
> org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.visitSparkOp(JobGraphBuilder.java:112)
> {code}
> I think the problem is that in JobStatisticCollector#waitForJobToEnd 
> {{sparkListener.wait()}} is not inside a loop, as suggested in wait's 
> javadoc:
> {code}
>  * As in the one argument version, interrupts and spurious wakeups are
>  * possible, and this method should always be used in a loop:
> {code}
> Thus, due to a spurious wakeup, the wait might return without notify ever 
> having been called.
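The fix described above can be sketched as follows. This is a minimal, self-contained illustration of the guarded-wait idiom from {{Object.wait()}}'s javadoc - {{JobEndListener}}, {{jobFinished}}, and {{runDemo}} are made-up names, not Pig's actual JobStatisticCollector code. The key point is that the predicate is re-checked in a loop after every wakeup, so a spurious wakeup cannot let the caller proceed early:

```java
import java.util.concurrent.TimeUnit;

// Illustrative stand-in for a job-completion listener.
class JobEndListener {
    private boolean jobFinished = false;

    // Correct pattern: wait() is guarded by a condition checked in a loop,
    // so spurious wakeups simply loop back into wait().
    synchronized void waitForJobToEnd() throws InterruptedException {
        while (!jobFinished) {
            wait();
        }
    }

    // Called when the job completes; wakes up all waiters.
    synchronized void onJobEnd() {
        jobFinished = true;
        notifyAll();
    }
}

public class WaitLoopDemo {
    static String runDemo() {
        JobEndListener listener = new JobEndListener();
        Thread job = new Thread(() -> {
            try {
                TimeUnit.MILLISECONDS.sleep(50); // simulated job running
            } catch (InterruptedException ignored) {
            }
            listener.onJobEnd();
        });
        job.start();
        try {
            listener.waitForJobToEnd(); // returns only once jobFinished is true
        } catch (InterruptedException e) {
            return "interrupted";
        }
        return "job ended";
    }

    public static void main(String[] args) {
        System.out.println(runDemo());
    }
}
```

With the loop in place, waking up while the job status is still RUNNING just re-enters wait(), instead of falling through and hitting the "Unexpected job execution status RUNNING" check.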



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (PIG-5320) TestCubeOperator#testRollupBasic is flaky on Spark 2.2

2018-01-15 Thread Adam Szita (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-5320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Szita updated PIG-5320:

   Resolution: Fixed
Fix Version/s: 0.18.0
   Status: Resolved  (was: Patch Available)

> TestCubeOperator#testRollupBasic is flaky on Spark 2.2
> --
>
> Key: PIG-5320
> URL: https://issues.apache.org/jira/browse/PIG-5320
> Project: Pig
>  Issue Type: Bug
>  Components: spark
>Reporter: Nandor Kollar
>Assignee: Nandor Kollar
>Priority: Major
> Fix For: 0.18.0
>
> Attachments: PIG-5320_1.patch, PIG-5320_2.patch
>
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (PIG-5325) Schema disambiguation can't be turned off for nested schemas

2018-01-09 Thread Adam Szita (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Szita updated PIG-5325:

   Resolution: Fixed
Fix Version/s: 0.18.0
   Status: Resolved  (was: Patch Available)

> Schema disambiguation can't be turned off for nested schemas
> 
>
> Key: PIG-5325
> URL: https://issues.apache.org/jira/browse/PIG-5325
> Project: Pig
>  Issue Type: Bug
>Reporter: Adam Szita
>Assignee: Adam Szita
> Fix For: 0.18.0
>
> Attachments: PIG-5325.0.patch
>
>
> PIG-5110 introduced the feature to turn off automatic schema field alias 
> disambiguation, removing the parent alias and the '::' separator. It seems like this 
> doesn't work for nested schemas.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (PIG-5325) Schema disambiguation can't be turned off for nested schemas

2018-01-09 Thread Adam Szita (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16318178#comment-16318178
 ] 

Adam Szita commented on PIG-5325:
-

Patch committed to trunk, thanks for the review Rohini!

> Schema disambiguation can't be turned off for nested schemas
> 
>
> Key: PIG-5325
> URL: https://issues.apache.org/jira/browse/PIG-5325
> Project: Pig
>  Issue Type: Bug
>Reporter: Adam Szita
>Assignee: Adam Szita
> Fix For: 0.18.0
>
> Attachments: PIG-5325.0.patch
>
>
> PIG-5110 introduced the feature to turn off automatic schema field alias 
> disambiguation, removing the parent alias and the '::' separator. It seems like this 
> doesn't work for nested schemas.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (PIG-5325) Schema disambiguation can't be turned off for nested schemas

2018-01-03 Thread Adam Szita (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Szita updated PIG-5325:

Status: Patch Available  (was: In Progress)

> Schema disambiguation can't be turned off for nested schemas
> 
>
> Key: PIG-5325
> URL: https://issues.apache.org/jira/browse/PIG-5325
> Project: Pig
>  Issue Type: Bug
>Reporter: Adam Szita
>Assignee: Adam Szita
> Attachments: PIG-5325.0.patch
>
>
> PIG-5110 introduced the feature to turn off automatic schema field alias 
> disambiguation, removing the parent alias and the '::' separator. It seems like this 
> doesn't work for nested schemas.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (PIG-5325) Schema disambiguation can't be turned off for nested schemas

2018-01-03 Thread Adam Szita (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16309875#comment-16309875
 ] 

Adam Szita commented on PIG-5325:
-

Attached [^PIG-5325.0.patch] to fix this. [~mikebush], [~rohini], can you 
please take a look?

> Schema disambiguation can't be turned off for nested schemas
> 
>
> Key: PIG-5325
> URL: https://issues.apache.org/jira/browse/PIG-5325
> Project: Pig
>  Issue Type: Bug
>Reporter: Adam Szita
>Assignee: Adam Szita
> Attachments: PIG-5325.0.patch
>
>
> PIG-5110 introduced the feature to turn off automatic schema field alias 
> disambiguation, removing the parent alias and the '::' separator. It seems like this 
> doesn't work for nested schemas.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (PIG-5325) Schema disambiguation can't be turned off for nested schemas

2018-01-03 Thread Adam Szita (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Szita updated PIG-5325:

Attachment: PIG-5325.0.patch

> Schema disambiguation can't be turned off for nested schemas
> 
>
> Key: PIG-5325
> URL: https://issues.apache.org/jira/browse/PIG-5325
> Project: Pig
>  Issue Type: Bug
>Reporter: Adam Szita
>Assignee: Adam Szita
> Attachments: PIG-5325.0.patch
>
>
> PIG-5110 introduced the feature to turn off automatic schema field alias 
> disambiguation, removing parent alias and the ':' char. It seems like this 
> doesn't work for nested schemas.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (PIG-5110) Removing schema alias and :: coming from parent relation

2018-01-03 Thread Adam Szita (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16309727#comment-16309727
 ] 

Adam Szita commented on PIG-5110:
-

Thanks for catching this [~mikebush], I'll address this problem in PIG-5325.

> Removing schema alias and :: coming from parent relation
> 
>
> Key: PIG-5110
> URL: https://issues.apache.org/jira/browse/PIG-5110
> Project: Pig
>  Issue Type: New Feature
>Reporter: Adam Szita
>Assignee: Adam Szita
> Fix For: 0.17.0
>
> Attachments: PIG-5110.0.patch, PIG-5110.1.patch, PIG-5110.2.patch
>
>
> Customers have asked for a feature to get rid of the schema alias prefixes: 
> CROSS, JOIN, FLATTEN, etc. prepend the field name with the parent field 
> alias and ::
> I would like to find a way to disable this behavior. (The burden of making 
> sure not to have duplicate aliases - and hence avoiding the appropriate 
> FrontendException being thrown - is on the user.)





[jira] [Work started] (PIG-5325) Schema disambiguation can't be turned off for nested schemas

2018-01-03 Thread Adam Szita (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on PIG-5325 started by Adam Szita.
---
> Schema disambiguation can't be turned off for nested schemas
> 
>
> Key: PIG-5325
> URL: https://issues.apache.org/jira/browse/PIG-5325
> Project: Pig
>  Issue Type: Bug
>Reporter: Adam Szita
>Assignee: Adam Szita
>
> PIG-5110 introduced the feature to turn off automatic schema field alias 
> disambiguation, removing parent alias and the ':' char. It seems like this 
> doesn't work for nested schemas.





[jira] [Created] (PIG-5325) Schema disambiguation can't be turned off for nested schemas

2018-01-03 Thread Adam Szita (JIRA)
Adam Szita created PIG-5325:
---

 Summary: Schema disambiguation can't be turned off for nested 
schemas
 Key: PIG-5325
 URL: https://issues.apache.org/jira/browse/PIG-5325
 Project: Pig
  Issue Type: Bug
Reporter: Adam Szita
Assignee: Adam Szita


PIG-5110 introduced the feature to turn off automatic schema field alias 
disambiguation, removing parent alias and the ':' char. It seems like this 
doesn't work for nested schemas.





[jira] [Commented] (PIG-4764) Make Pig work with Hive 2.0

2018-01-03 Thread Adam Szita (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16309473#comment-16309473
 ] 

Adam Szita commented on PIG-4764:
-

I think this would be handy to have in 0.18

> Make Pig work with Hive 2.0
> ---
>
> Key: PIG-4764
> URL: https://issues.apache.org/jira/browse/PIG-4764
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.18.0
>
> Attachments: PIG-4764-0.patch, PIG-4764-1.patch, PIG-4764-2.patch, 
> PIG-4764-3.patch, PIG-4764-4.patch
>
>
> There are a lot of changes especially around ORC in Hive 2.0. We need to make 
> Pig work with it.





[jira] [Commented] (PIG-5320) TestCubeOperator#testRollupBasic is flaky on Spark 2.2

2017-12-14 Thread Adam Szita (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16290835#comment-16290835
 ] 

Adam Szita commented on PIG-5320:
-

[~nkollar] patch looks good, but can you elaborate on your reason for changing 
hash-based implementations to tree-based ones for the sets and maps used in 
this class? I would think that the number of jobs here would very rarely be 
high (most Pig jobs are started in batch mode with a script specified, so the 
only jobs here are the ones that one script generates).

> TestCubeOperator#testRollupBasic is flaky on Spark 2.2
> --
>
> Key: PIG-5320
> URL: https://issues.apache.org/jira/browse/PIG-5320
> Project: Pig
>  Issue Type: Bug
>  Components: spark
>Reporter: Nandor Kollar
>Assignee: Nandor Kollar
> Attachments: PIG-5320_1.patch
>
>
> TestCubeOperator#testRollupBasic occasionally fails with
> {code}
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to 
> store alias c
>   at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1779)
>   at org.apache.pig.PigServer.registerQuery(PigServer.java:708)
>   at 
> org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:1110)
>   at 
> org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:512)
>   at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:230)
>   at org.apache.pig.PigServer.registerScript(PigServer.java:781)
>   at org.apache.pig.PigServer.registerScript(PigServer.java:858)
>   at org.apache.pig.PigServer.registerScript(PigServer.java:821)
>   at org.apache.pig.test.Util.registerMultiLineQuery(Util.java:972)
>   at 
> org.apache.pig.test.TestCubeOperator.testRollupBasic(TestCubeOperator.java:124)
> Caused by: org.apache.pig.impl.plan.VisitorException: ERROR 0: fail to get 
> the rdds of this spark operator: 
>   at 
> org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.visitSparkOp(JobGraphBuilder.java:115)
>   at 
> org.apache.pig.backend.hadoop.executionengine.spark.plan.SparkOperator.visit(SparkOperator.java:140)
>   at 
> org.apache.pig.backend.hadoop.executionengine.spark.plan.SparkOperator.visit(SparkOperator.java:37)
>   at 
> org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:87)
>   at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:46)
>   at 
> org.apache.pig.backend.hadoop.executionengine.spark.SparkLauncher.launchPig(SparkLauncher.java:237)
>   at 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:293)
>   at org.apache.pig.PigServer.launchPlan(PigServer.java:1475)
>   at 
> org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1460)
>   at org.apache.pig.PigServer.execute(PigServer.java:1449)
>   at org.apache.pig.PigServer.access$500(PigServer.java:119)
>   at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1774)
> Caused by: java.lang.RuntimeException: Unexpected job execution status RUNNING
>   at 
> org.apache.pig.tools.pigstats.spark.SparkStatsUtil.isJobSuccess(SparkStatsUtil.java:138)
>   at 
> org.apache.pig.tools.pigstats.spark.SparkPigStats.addJobStats(SparkPigStats.java:75)
>   at 
> org.apache.pig.tools.pigstats.spark.SparkStatsUtil.waitForJobAddStats(SparkStatsUtil.java:59)
>   at 
> org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.sparkOperToRDD(JobGraphBuilder.java:225)
>   at 
> org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.visitSparkOp(JobGraphBuilder.java:112)
> {code}
> I think the problem is that in JobStatisticCollector#waitForJobToEnd 
> {{sparkListener.wait()}} is not inside a loop, as suggested in wait's 
> javadoc:
> {code}
>  * As in the one argument version, interrupts and spurious wakeups are
>  * possible, and this method should always be used in a loop:
> {code}
> Thus, due to a spurious wakeup, the wait might return without a notify ever 
> being called.
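The guarded-wait pattern that the wait() javadoc calls for can be sketched as 
below. This is a minimal self-contained illustration, not Pig's actual 
JobStatisticCollector code; the class and field names (JobWaiter, jobEnded) 
are made up for the example.

```java
// Minimal sketch of the guarded-wait pattern suggested above; the class and
// field names (JobWaiter, jobEnded) are illustrative, not Pig's actual code.
public class JobWaiter {
    private final Object lock = new Object();
    private boolean jobEnded = false;

    public void waitForJobToEnd() throws InterruptedException {
        synchronized (lock) {
            // Re-check the condition after every wakeup: a spurious wakeup
            // (or a notify meant for another waiter) simply loops again.
            while (!jobEnded) {
                lock.wait();
            }
        }
    }

    public void markJobEnded() {
        synchronized (lock) {
            jobEnded = true;
            lock.notifyAll(); // wake all waiters; each re-checks the guard
        }
    }

    public static void main(String[] args) throws InterruptedException {
        JobWaiter w = new JobWaiter();
        w.markJobEnded();
        w.waitForJobToEnd(); // returns immediately: the guard is already true
        System.out.println("wait returned");
    }
}
```

With the guard in a while loop, a wakeup that arrives before the job actually 
ended just goes back to waiting instead of falsely reporting completion.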





[jira] [Updated] (PIG-5318) Unit test failures on Pig on Spark with Spark 2.2

2017-12-13 Thread Adam Szita (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Szita updated PIG-5318:

   Resolution: Fixed
Fix Version/s: 0.18.0
   Status: Resolved  (was: Patch Available)

> Unit test failures on Pig on Spark with Spark 2.2
> -
>
> Key: PIG-5318
> URL: https://issues.apache.org/jira/browse/PIG-5318
> Project: Pig
>  Issue Type: Bug
>  Components: spark
>Reporter: Nandor Kollar
>Assignee: Nandor Kollar
> Fix For: 0.18.0
>
> Attachments: PIG-5318_1.patch, PIG-5318_2.patch, PIG-5318_3.patch, 
> PIG-5318_4.patch, PIG-5318_5.patch, PIG-5318_6.patch
>
>
> There are several failing cases when executing the unit tests with Spark 2.2:
> {code}
>  org.apache.pig.test.TestAssert#testNegativeWithoutFetch
>  org.apache.pig.test.TestAssert#testNegative
>  org.apache.pig.test.TestEvalPipeline2#testNonStandardDataWithoutFetch
>  org.apache.pig.test.TestScalarAliases#testScalarErrMultipleRowsInInput
>  org.apache.pig.test.TestStore#testCleanupOnFailureMultiStore
>  org.apache.pig.test.TestStoreInstances#testBackendStoreCommunication
>  org.apache.pig.test.TestStoreLocal#testCleanupOnFailureMultiStore
> {code}
> All of these are related to fixes/changes in Spark.
> TestAssert, TestScalarAliases and TestEvalPipeline2 failures could be fixed 
> by asserting on the message of the exception's root cause; it looks like on 
> Spark 2.2 the exception is wrapped in an additional layer.
> The TestStore and TestStoreLocal failures are also test-related problems: it 
> looks like SPARK-7953 is fixed in Spark 2.2.
> The root cause of TestStoreInstances is yet to be found out.





[jira] [Commented] (PIG-5318) Unit test failures on Pig on Spark with Spark 2.2

2017-12-13 Thread Adam Szita (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16289058#comment-16289058
 ] 

Adam Szita commented on PIG-5318:
-

[~nkollar], +1 for [^PIG-5318_6.patch], committed to trunk.
I think we should also upgrade the Spark 2 minor version in Pig On Spark to 
2.2. We don't want to maintain support for 1.6.1, 2.1.1, and 2.2.0 at the same 
time; rather, we should have one minor version per major.
Created PIG-5321 to track the upgrade.

> Unit test failures on Pig on Spark with Spark 2.2
> -
>
> Key: PIG-5318
> URL: https://issues.apache.org/jira/browse/PIG-5318
> Project: Pig
>  Issue Type: Bug
>  Components: spark
>Reporter: Nandor Kollar
>Assignee: Nandor Kollar
> Attachments: PIG-5318_1.patch, PIG-5318_2.patch, PIG-5318_3.patch, 
> PIG-5318_4.patch, PIG-5318_5.patch, PIG-5318_6.patch
>
>
> There are several failing cases when executing the unit tests with Spark 2.2:
> {code}
>  org.apache.pig.test.TestAssert#testNegativeWithoutFetch
>  org.apache.pig.test.TestAssert#testNegative
>  org.apache.pig.test.TestEvalPipeline2#testNonStandardDataWithoutFetch
>  org.apache.pig.test.TestScalarAliases#testScalarErrMultipleRowsInInput
>  org.apache.pig.test.TestStore#testCleanupOnFailureMultiStore
>  org.apache.pig.test.TestStoreInstances#testBackendStoreCommunication
>  org.apache.pig.test.TestStoreLocal#testCleanupOnFailureMultiStore
> {code}
> All of these are related to fixes/changes in Spark.
> TestAssert, TestScalarAliases and TestEvalPipeline2 failures could be fixed 
> by asserting on the message of the exception's root cause; it looks like on 
> Spark 2.2 the exception is wrapped in an additional layer.
> The TestStore and TestStoreLocal failures are also test-related problems: it 
> looks like SPARK-7953 is fixed in Spark 2.2.
> The root cause of TestStoreInstances is yet to be found out.





[jira] [Created] (PIG-5321) Upgrade Spark 2 version to 2.2.0 for Pig on Spark

2017-12-13 Thread Adam Szita (JIRA)
Adam Szita created PIG-5321:
---

 Summary: Upgrade Spark 2 version to 2.2.0 for Pig on Spark
 Key: PIG-5321
 URL: https://issues.apache.org/jira/browse/PIG-5321
 Project: Pig
  Issue Type: Improvement
  Components: spark
Reporter: Adam Szita


Right now we maintain support for 2 versions of Spark for PoS jobs:
spark1.version=1.6.1
spark2.version=2.1.1

I believe we should move forward with the latter.





[jira] [Commented] (PIG-5310) MergeJoin throwing NullPointer Exception

2017-11-29 Thread Adam Szita (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16270572#comment-16270572
 ] 

Adam Szita commented on PIG-5310:
-

+1 on [^PIG-5310-2.patch]

> MergeJoin throwing NullPointer Exception
> 
>
> Key: PIG-5310
> URL: https://issues.apache.org/jira/browse/PIG-5310
> Project: Pig
>  Issue Type: Bug
>Reporter: Satish Subhashrao Saley
>Assignee: Satish Subhashrao Saley
> Attachments: PIG-5310-1.patch, PIG-5310-2.patch
>
>
> Merge join throws a NullPointerException if the left input's first key 
> doesn't exist in the right input and is smaller than the right input's first 
> key.
> For ex
> |left|right|
> |1|3|
> |1|5|
> |1| |
> Error we get - 
> {code}
> ERROR 2998: Unhandled internal error. Vertex failed, vertexName=scope-16, 
> vertexId=vertex_1509400259446_0001_1_02, diagnostics=[Task failed, 
> taskId=task_1509400259446_0001_1_02_00, diagnostics=[TaskAttempt 0 
> failed, info=[Error: Error while running task ( failure ) : 
> attempt_1509400259446_0001_1_02_00_0:java.lang.NullPointerException
>   at java.lang.Integer.compareTo(Integer.java:1216)
>   at java.lang.Integer.compareTo(Integer.java:52)
>   at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POMergeJoin.getNextTuple(POMergeJoin.java:525)
>   at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:305)
>   at 
> org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POStoreTez.getNextTuple(POStoreTez.java:123)
>   at 
> org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.runPipeline(PigProcessor.java:416)
>   at 
> org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.run(PigProcessor.java:281)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1945)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> Here, the key used in the join is an integer. The Integer.compareTo(other) 
> method throws a NullPointerException if the comparison is made against null.
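The failure mode above can be reproduced in isolation. The nullSafeCompare 
helper below is a hypothetical illustration of guarding the comparison, not 
the actual fix in the attached patch.

```java
public class CompareToNull {
    // Hypothetical null-safe comparison: treat a missing (null) right-hand
    // key as smaller, so a non-null left key sorts after it instead of
    // throwing. This is an illustration, not Pig's actual MergeJoin fix.
    static int nullSafeCompare(Integer a, Integer b) {
        if (b == null) {
            return 1;
        }
        return a.compareTo(b);
    }

    public static void main(String[] args) {
        Integer key = 1;
        boolean threw = false;
        try {
            key.compareTo(null); // NPE, as in the stack trace above
        } catch (NullPointerException e) {
            threw = true;
        }
        System.out.println("compareTo(null) threw NPE: " + threw);
        System.out.println("nullSafeCompare(1, null) = " + nullSafeCompare(1, null));
    }
}
```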





[jira] [Updated] (PIG-5316) Initialize mapred.task.id property for PoS jobs

2017-11-28 Thread Adam Szita (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-5316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Szita updated PIG-5316:

Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Initialize mapred.task.id property for PoS jobs
> ---
>
> Key: PIG-5316
> URL: https://issues.apache.org/jira/browse/PIG-5316
> Project: Pig
>  Issue Type: Improvement
>  Components: spark
>Reporter: Adam Szita
>Assignee: Nandor Kollar
> Fix For: 0.18.0
>
> Attachments: PIG-5316_1.patch, PIG-5316_2.patch
>
>
> Some downstream systems may require the presence of {{mapred.task.id}} 
> property (e.g. HCatalog). This is currently not set when Pig On Spark jobs 
> are started. Let's initialise it.





[jira] [Commented] (PIG-5316) Initialize mapred.task.id property for PoS jobs

2017-11-28 Thread Adam Szita (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16268913#comment-16268913
 ] 

Adam Szita commented on PIG-5316:
-

Good catch Nandor, fix committed!

> Initialize mapred.task.id property for PoS jobs
> ---
>
> Key: PIG-5316
> URL: https://issues.apache.org/jira/browse/PIG-5316
> Project: Pig
>  Issue Type: Improvement
>  Components: spark
>Reporter: Adam Szita
>Assignee: Nandor Kollar
> Fix For: 0.18.0
>
> Attachments: PIG-5316_1.patch, PIG-5316_2.patch
>
>
> Some downstream systems may require the presence of {{mapred.task.id}} 
> property (e.g. HCatalog). This is currently not set when Pig On Spark jobs 
> are started. Let's initialise it.





[jira] [Commented] (PIG-5316) Initialize mapred.task.id property for PoS jobs

2017-11-28 Thread Adam Szita (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16268704#comment-16268704
 ] 

Adam Szita commented on PIG-5316:
-

Committed to trunk, thanks Nandor and Xuefu!

> Initialize mapred.task.id property for PoS jobs
> ---
>
> Key: PIG-5316
> URL: https://issues.apache.org/jira/browse/PIG-5316
> Project: Pig
>  Issue Type: Improvement
>  Components: spark
>Reporter: Adam Szita
>Assignee: Nandor Kollar
> Fix For: 0.18.0
>
> Attachments: PIG-5316_1.patch
>
>
> Some downstream systems may require the presence of {{mapred.task.id}} 
> property (e.g. HCatalog). This is currently not set when Pig On Spark jobs 
> are started. Let's initialise it.





[jira] [Updated] (PIG-5316) Initialize mapred.task.id property for PoS jobs

2017-11-28 Thread Adam Szita (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-5316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Szita updated PIG-5316:

   Resolution: Fixed
Fix Version/s: 0.18.0
   Status: Resolved  (was: Patch Available)

> Initialize mapred.task.id property for PoS jobs
> ---
>
> Key: PIG-5316
> URL: https://issues.apache.org/jira/browse/PIG-5316
> Project: Pig
>  Issue Type: Improvement
>  Components: spark
>Reporter: Adam Szita
>Assignee: Nandor Kollar
> Fix For: 0.18.0
>
> Attachments: PIG-5316_1.patch
>
>
> Some downstream systems may require the presence of {{mapred.task.id}} 
> property (e.g. HCatalog). This is currently not set when Pig On Spark jobs 
> are started. Let's initialise it.





[jira] [Commented] (PIG-5316) Initialize mapred.task.id property for PoS jobs

2017-11-23 Thread Adam Szita (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263989#comment-16263989
 ] 

Adam Szita commented on PIG-5316:
-

[~nkollar] +1 on the patch, unless objections by [~xuefuz] I'll commit tomorrow

> Initialize mapred.task.id property for PoS jobs
> ---
>
> Key: PIG-5316
> URL: https://issues.apache.org/jira/browse/PIG-5316
> Project: Pig
>  Issue Type: Improvement
>  Components: spark
>Reporter: Adam Szita
>Assignee: Nandor Kollar
> Attachments: PIG-5316_1.patch
>
>
> Some downstream systems may require the presence of {{mapred.task.id}} 
> property (e.g. HCatalog). This is currently not set when Pig On Spark jobs 
> are started. Let's initialise it.





[jira] [Created] (PIG-5316) Initialize mapred.task.id property for PoS jobs

2017-11-20 Thread Adam Szita (JIRA)
Adam Szita created PIG-5316:
---

 Summary: Initialize mapred.task.id property for PoS jobs
 Key: PIG-5316
 URL: https://issues.apache.org/jira/browse/PIG-5316
 Project: Pig
  Issue Type: Improvement
  Components: spark
Reporter: Adam Szita
Assignee: Nandor Kollar


Some downstream systems may require the presence of {{mapred.task.id}} property 
(e.g. HCatalog). This is currently not set when Pig On Spark jobs are started. 
Let's initialise it.





[jira] [Commented] (PIG-5302) Remove HttpClient dependency

2017-11-14 Thread Adam Szita (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16251104#comment-16251104
 ] 

Adam Szita commented on PIG-5302:
-

Committed to trunk. Thanks for the patch Nandor, and for the review Rohini.

> Remove HttpClient dependency
> 
>
> Key: PIG-5302
> URL: https://issues.apache.org/jira/browse/PIG-5302
> Project: Pig
>  Issue Type: Improvement
>Reporter: Nandor Kollar
>Assignee: Nandor Kollar
> Fix For: 0.18.0
>
> Attachments: PIG-5302_1.patch, PIG-5302_2.patch, PIG-5302_3.patch, 
> PIG-5302_4.patch, ivy-report.css, org.apache.pig-pig-compile.html
>
>
> Pig depends on Apache Commons HttpClient 3.1, an old version with known 
> security problems 
> ([CVE-2015-5262|https://cve.mitre.org/cgi-bin/cvename.cgi?name=%20CVE-2015-5262]).
> Pig also depends on Apache HttpComponents, the successor of HttpClient, which 
> likewise needs an update to a newer version for similar reasons. We should 
> therefore remove the HttpClient dependency and update HttpComponents to 4.4+.





[jira] [Updated] (PIG-5302) Remove HttpClient dependency

2017-11-14 Thread Adam Szita (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-5302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Szita updated PIG-5302:

   Resolution: Fixed
Fix Version/s: 0.18.0
   Status: Resolved  (was: Patch Available)

> Remove HttpClient dependency
> 
>
> Key: PIG-5302
> URL: https://issues.apache.org/jira/browse/PIG-5302
> Project: Pig
>  Issue Type: Improvement
>Reporter: Nandor Kollar
>Assignee: Nandor Kollar
> Fix For: 0.18.0
>
> Attachments: PIG-5302_1.patch, PIG-5302_2.patch, PIG-5302_3.patch, 
> PIG-5302_4.patch, ivy-report.css, org.apache.pig-pig-compile.html
>
>
> Pig depends on Apache Commons HttpClient 3.1, an old version with known 
> security problems 
> ([CVE-2015-5262|https://cve.mitre.org/cgi-bin/cvename.cgi?name=%20CVE-2015-5262]).
> Pig also depends on Apache HttpComponents, the successor of HttpClient, which 
> likewise needs an update to a newer version for similar reasons. We should 
> therefore remove the HttpClient dependency and update HttpComponents to 4.4+.





[jira] [Updated] (PIG-5302) Remove HttpClient dependency

2017-11-14 Thread Adam Szita (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-5302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Szita updated PIG-5302:

Issue Type: Improvement  (was: Bug)

> Remove HttpClient dependency
> 
>
> Key: PIG-5302
> URL: https://issues.apache.org/jira/browse/PIG-5302
> Project: Pig
>  Issue Type: Improvement
>Reporter: Nandor Kollar
>Assignee: Nandor Kollar
> Attachments: PIG-5302_1.patch, PIG-5302_2.patch, PIG-5302_3.patch, 
> PIG-5302_4.patch, ivy-report.css, org.apache.pig-pig-compile.html
>
>
> Pig depends on Apache Commons HttpClient 3.1, an old version with known 
> security problems 
> ([CVE-2015-5262|https://cve.mitre.org/cgi-bin/cvename.cgi?name=%20CVE-2015-5262]).
> Pig also depends on Apache HttpComponents, the successor of HttpClient, which 
> likewise needs an update to a newer version for similar reasons. We should 
> therefore remove the HttpClient dependency and update HttpComponents to 4.4+.





[jira] [Commented] (PIG-5302) Remove HttpClient dependency

2017-11-13 Thread Adam Szita (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16249526#comment-16249526
 ] 

Adam Szita commented on PIG-5302:
-

+1 on latest patch. I successfully ran test-commit for verification.

> Remove HttpClient dependency
> 
>
> Key: PIG-5302
> URL: https://issues.apache.org/jira/browse/PIG-5302
> Project: Pig
>  Issue Type: Bug
>Reporter: Nandor Kollar
>Assignee: Nandor Kollar
> Attachments: PIG-5302_1.patch, PIG-5302_2.patch, PIG-5302_3.patch, 
> PIG-5302_4.patch, ivy-report.css, org.apache.pig-pig-compile.html
>
>
> Pig depends on Apache Commons HttpClient 3.1, an old version with known 
> security problems 
> ([CVE-2015-5262|https://cve.mitre.org/cgi-bin/cvename.cgi?name=%20CVE-2015-5262]).
> Pig also depends on Apache HttpComponents, the successor of HttpClient, which 
> likewise needs an update to a newer version for similar reasons. We should 
> therefore remove the HttpClient dependency and update HttpComponents to 4.4+.





[jira] [Commented] (PIG-5302) Remove HttpClient dependency

2017-11-09 Thread Adam Szita (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16245403#comment-16245403
 ] 

Adam Szita commented on PIG-5302:
-

There seem to be a lot of unused/old copy-pasted entries in our ivy file. 
Thanks for your efforts to clean this up. [^PIG-5302_3.patch] looks good to me; 
+1 pending that all unit tests pass.

> Remove HttpClient dependency
> 
>
> Key: PIG-5302
> URL: https://issues.apache.org/jira/browse/PIG-5302
> Project: Pig
>  Issue Type: Bug
>Reporter: Nandor Kollar
>Assignee: Nandor Kollar
> Attachments: PIG-5302_1.patch, PIG-5302_2.patch, PIG-5302_3.patch, 
> ivy-report.css, org.apache.pig-pig-compile.html
>
>
> Pig depends on Apache Commons HttpClient 3.1, an old version with known 
> security problems 
> ([CVE-2015-5262|https://cve.mitre.org/cgi-bin/cvename.cgi?name=%20CVE-2015-5262]).
> Pig also depends on Apache HttpComponents, the successor of HttpClient, which 
> likewise needs an update to a newer version for similar reasons. We should 
> therefore remove the HttpClient dependency and update HttpComponents to 4.4+.





[jira] [Updated] (PIG-5305) Enable yarn-client mode execution of tests in Spark (1) mode

2017-10-06 Thread Adam Szita (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-5305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Szita updated PIG-5305:

   Resolution: Fixed
Fix Version/s: 0.18.0
   Status: Resolved  (was: Patch Available)

> Enable yarn-client mode execution of tests in Spark (1) mode
> 
>
> Key: PIG-5305
> URL: https://issues.apache.org/jira/browse/PIG-5305
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: Adam Szita
>Assignee: Adam Szita
> Fix For: 0.18.0
>
> Attachments: PIG-5305.0.patch, PIG-5305.1.patch, PIG-5305.2.patch
>
>
> See parent jira (PIG-5305) for problem description





[jira] [Commented] (PIG-5305) Enable yarn-client mode execution of tests in Spark (1) mode

2017-10-06 Thread Adam Szita (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16194512#comment-16194512
 ] 

Adam Szita commented on PIG-5305:
-

Thanks for the review [~kellyzly], latest patch is now committed to trunk.

> Enable yarn-client mode execution of tests in Spark (1) mode
> 
>
> Key: PIG-5305
> URL: https://issues.apache.org/jira/browse/PIG-5305
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: Adam Szita
>Assignee: Adam Szita
> Attachments: PIG-5305.0.patch, PIG-5305.1.patch, PIG-5305.2.patch
>
>
> See parent jira (PIG-5305) for problem description





[jira] [Commented] (PIG-3864) ToDate(userstring, format, timezone) computes DateTime with strange handling of Daylight Saving Time with location based timezones

2017-10-03 Thread Adam Szita (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16189577#comment-16189577
 ] 

Adam Szita commented on PIG-3864:
-

+1 on [^PIG-3864-1.patch], it is a good fix [~daijy]

> ToDate(userstring, format, timezone) computes DateTime with strange handling 
> of Daylight Saving Time with location based timezones
> --
>
> Key: PIG-3864
> URL: https://issues.apache.org/jira/browse/PIG-3864
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.12.0, 0.11.1
>Reporter: Frederic Schmaljohann
>Assignee: Daniel Dai
> Fix For: 0.18.0
>
> Attachments: PIG-3864-1.patch
>
>
> When using ToDate with a location-based timezone (e.g. "Europe/Berlin"), the 
> timezone offset is computed based on whether the timezone is *currently* in 
> daylight saving time, not on whether the given timestamp falls in daylight 
> saving time.
> Example:
> {noformat}
> B = FOREACH A GENERATE ToDate('2014-02-02 18:00:00.000Z', '-MM-dd 
> HH:mm:ss.SSSZ', 'Europe/Berlin') AS Timestamp;
> {noformat}
> This yields 
> {noformat}2014-02-02 20:00:00.000+02{noformat}
> when called during daylight saving in Europe/Berlin although I would expect 
> {noformat}2014-02-02 19:00:00.000+01{noformat}
> During standard time In Europe/Berlin, the above call yields 
> {noformat}2014-02-02 19:00:00.000+01{noformat}
> In Europe/Berlin DST started on March 30th, 2014.
> This seems pretty strange to me. If it is on purpose it should at least be 
> noted in the documentation.
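For comparison, here is a small java.time sketch (not Pig's Joda-Time-based 
ToDate implementation) of the behavior the reporter expects: the offset should 
be derived from the instant being converted, not from the current date.

```java
import java.time.Instant;
import java.time.ZoneId;

public class DstOffset {
    // Offset of Europe/Berlin at the given UTC instant (not at "now"); this
    // mirrors the expectation in the report above. Illustrative helper only.
    static String offsetAt(String utcInstant) {
        return Instant.parse(utcInstant)
                      .atZone(ZoneId.of("Europe/Berlin"))
                      .getOffset()
                      .toString();
    }

    public static void main(String[] args) {
        // Winter instant: standard time, +01:00, regardless of today's date.
        System.out.println(offsetAt("2014-02-02T18:00:00Z")); // +01:00
        // Summer instant: daylight saving time, +02:00.
        System.out.println(offsetAt("2014-07-02T18:00:00Z")); // +02:00
    }
}
```

Resolving the offset from the instant itself gives 2014-02-02 19:00:00+01 for 
the example above, matching the reporter's expectation.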





[jira] [Commented] (PIG-5302) Remove HttpClient dependency

2017-10-03 Thread Adam Szita (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16189514#comment-16189514
 ] 

Adam Szita commented on PIG-5302:
-

[~nkollar] do all unit tests pass with this change?

> Remove HttpClient dependency
> 
>
> Key: PIG-5302
> URL: https://issues.apache.org/jira/browse/PIG-5302
> Project: Pig
>  Issue Type: Bug
>Reporter: Nandor Kollar
>Assignee: Nandor Kollar
> Attachments: PIG-5302_1.patch, PIG-5302_2.patch
>
>
> Pig depends on Apache Commons HttpClient 3.1, an old version with known 
> security problems 
> ([CVE-2015-5262|https://cve.mitre.org/cgi-bin/cvename.cgi?name=%20CVE-2015-5262]).
> Pig also depends on Apache HttpComponents, the successor of HttpClient, which 
> likewise needs an update to a newer version for similar reasons. We should 
> therefore remove the HttpClient dependency and update HttpComponents to 4.4+.





[jira] [Commented] (PIG-5305) Enable yarn-client mode execution of tests in Spark (1) mode

2017-09-22 Thread Adam Szita (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16176186#comment-16176186
 ] 

Adam Szita commented on PIG-5305:
-

[~kellyzly] do you think this is ready for commit now?

> Enable yarn-client mode execution of tests in Spark (1) mode
> 
>
> Key: PIG-5305
> URL: https://issues.apache.org/jira/browse/PIG-5305
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: Adam Szita
>Assignee: Adam Szita
> Attachments: PIG-5305.0.patch, PIG-5305.1.patch, PIG-5305.2.patch
>
>
> See parent jira (PIG-5305) for problem description





[jira] [Commented] (PIG-5305) Enable yarn-client mode execution of tests in Spark (1) mode

2017-09-21 Thread Adam Szita (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16174837#comment-16174837
 ] 

Adam Szita commented on PIG-5305:
-

[~kellyzly] yes, {{src.exclude.dir}} was probably just left there and had no 
use since the removal of Hadoop 1 support. Then Spark 2 support came with 
PIG-5157, and as you correctly point out, resetting src.exclude.dir does 
influence the {{jar}} target.

The reason we didn't see this before is that nobody used the {{test-tez}} 
target; in the Apache Jenkins job we use {{test-core-mrtez}}, which runs all MR 
and then all Tez unit tests.

> Enable yarn-client mode execution of tests in Spark (1) mode
> 
>
> Key: PIG-5305
> URL: https://issues.apache.org/jira/browse/PIG-5305
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: Adam Szita
>Assignee: Adam Szita
> Attachments: PIG-5305.0.patch, PIG-5305.1.patch, PIG-5305.2.patch
>
>
> See parent jira (PIG-5305) for problem description





[jira] [Updated] (PIG-5298) Verify if org.mortbay.jetty is removable

2017-09-21 Thread Adam Szita (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-5298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Szita updated PIG-5298:

   Resolution: Fixed
Fix Version/s: 0.18.0
   Status: Resolved  (was: Patch Available)

> Verify if org.mortbay.jetty is removable
> 
>
> Key: PIG-5298
> URL: https://issues.apache.org/jira/browse/PIG-5298
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.17.0
>Reporter: Adam Szita
>Assignee: Nandor Kollar
> Fix For: 0.18.0
>
> Attachments: PIG-5298_1.patch, PIG-5298_2.patch, PIG-5298_3.patch
>
>
> Although we pull in jetty libraries via ivy, Pig does not depend on 
> org.mortbay.jetty explicitly. The only exception I see is in Piggybank, where 
> I think this can be swapped for javax.el-api and log4j.
> We should investigate (check the build, run unit tests across all exec modes) 
> and remove it if it turns out to be unnecessary.





[jira] [Commented] (PIG-5298) Verify if org.mortbay.jetty is removable

2017-09-21 Thread Adam Szita (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16174768#comment-16174768
 ] 

Adam Szita commented on PIG-5298:
-

+1, [^PIG-5298_3.patch] committed to trunk. Thanks for taking care of this 
Nandor

> Verify if org.mortbay.jetty is removable
> 
>
> Key: PIG-5298
> URL: https://issues.apache.org/jira/browse/PIG-5298
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.17.0
>Reporter: Adam Szita
>Assignee: Nandor Kollar
> Attachments: PIG-5298_1.patch, PIG-5298_2.patch, PIG-5298_3.patch
>
>
> Although we pull in jetty libraries via ivy, Pig does not depend on 
> org.mortbay.jetty explicitly. The only exception I see is in Piggybank, where 
> I think this can be swapped for javax.el-api and log4j.
> We should investigate (check the build, run unit tests across all exec modes) 
> and remove it if it turns out to be unnecessary.





[jira] [Commented] (PIG-5305) Enable yarn-client mode execution of tests in Spark (1) mode

2017-09-19 Thread Adam Szita (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16171478#comment-16171478
 ] 

Adam Szita commented on PIG-5305:
-

[~kellyzly]
1: removed the dependency from test-tez. I also checked: test-tez had not been 
running properly since the Spark 2 support commit, because {{setTezEnv}} was 
clearing the excluded-sources property. I fixed this in my latest patch as well.

2: There were quite a few failures at first; that's why I had to add a 
SparkContext reset feature to SparkLauncher. With the latest patch there 
shouldn't be any failures.

> Enable yarn-client mode execution of tests in Spark (1) mode
> 
>
> Key: PIG-5305
> URL: https://issues.apache.org/jira/browse/PIG-5305
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: Adam Szita
>Assignee: Adam Szita
> Attachments: PIG-5305.0.patch, PIG-5305.1.patch, PIG-5305.2.patch
>
>
> See parent jira (PIG-5305) for problem description





[jira] [Updated] (PIG-5305) Enable yarn-client mode execution of tests in Spark (1) mode

2017-09-19 Thread Adam Szita (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-5305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Szita updated PIG-5305:

Attachment: PIG-5305.2.patch

> Enable yarn-client mode execution of tests in Spark (1) mode
> 
>
> Key: PIG-5305
> URL: https://issues.apache.org/jira/browse/PIG-5305
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: Adam Szita
>Assignee: Adam Szita
> Attachments: PIG-5305.0.patch, PIG-5305.1.patch, PIG-5305.2.patch
>
>
> See parent jira (PIG-5305) for problem description





[jira] [Updated] (PIG-5305) Enable yarn-client mode execution of tests in Spark (1) mode

2017-09-18 Thread Adam Szita (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-5305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Szita updated PIG-5305:

Status: Patch Available  (was: Open)

> Enable yarn-client mode execution of tests in Spark (1) mode
> 
>
> Key: PIG-5305
> URL: https://issues.apache.org/jira/browse/PIG-5305
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: Adam Szita
>Assignee: Adam Szita
> Attachments: PIG-5305.0.patch, PIG-5305.1.patch
>
>
> See parent jira (PIG-5305) for problem description





[jira] [Commented] (PIG-5305) Enable yarn-client mode execution of tests in Spark (1) mode

2017-09-18 Thread Adam Szita (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16169815#comment-16169815
 ] 

Adam Szita commented on PIG-5305:
-

Thanks for the comments [~kellyzly].
Attached [^PIG-5305.1.patch].

1. Correct, test-core-mrtez indeed doesn't need jar-simple; I removed that. 
However, I'd like to keep the pigtest-jar target calls in test-related targets: 
for example, if someone launches {{ant clean test -Dtest.exec.type=spark}}, we 
have to keep it on the {{test-core}} target as well.
2. Added the comment as requested.


> Enable yarn-client mode execution of tests in Spark (1) mode
> 
>
> Key: PIG-5305
> URL: https://issues.apache.org/jira/browse/PIG-5305
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: Adam Szita
>Assignee: Adam Szita
> Attachments: PIG-5305.0.patch, PIG-5305.1.patch
>
>
> See parent jira (PIG-5305) for problem description





[jira] [Updated] (PIG-5305) Enable yarn-client mode execution of tests in Spark (1) mode

2017-09-18 Thread Adam Szita (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-5305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Szita updated PIG-5305:

Attachment: PIG-5305.1.patch

> Enable yarn-client mode execution of tests in Spark (1) mode
> 
>
> Key: PIG-5305
> URL: https://issues.apache.org/jira/browse/PIG-5305
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: Adam Szita
>Assignee: Adam Szita
> Attachments: PIG-5305.0.patch, PIG-5305.1.patch
>
>
> See parent jira (PIG-5305) for problem description





[jira] [Updated] (PIG-5305) Enable yarn-client mode execution of tests in Spark (1) mode

2017-09-18 Thread Adam Szita (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-5305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Szita updated PIG-5305:

Attachment: (was: PIG-5305.1.patch)

> Enable yarn-client mode execution of tests in Spark (1) mode
> 
>
> Key: PIG-5305
> URL: https://issues.apache.org/jira/browse/PIG-5305
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: Adam Szita
>Assignee: Adam Szita
> Attachments: PIG-5305.0.patch
>
>
> See parent jira (PIG-5305) for problem description





[jira] [Updated] (PIG-5305) Enable yarn-client mode execution of tests in Spark (1) mode

2017-09-18 Thread Adam Szita (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-5305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Szita updated PIG-5305:

Attachment: PIG-5305.1.patch

> Enable yarn-client mode execution of tests in Spark (1) mode
> 
>
> Key: PIG-5305
> URL: https://issues.apache.org/jira/browse/PIG-5305
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: Adam Szita
>Assignee: Adam Szita
> Attachments: PIG-5305.0.patch, PIG-5305.1.patch
>
>
> See parent jira (PIG-5305) for problem description





[jira] [Commented] (PIG-5298) Verify if org.mortbay.jetty is removable

2017-09-15 Thread Adam Szita (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16167730#comment-16167730
 ] 

Adam Szita commented on PIG-5298:
-

[~nkollar], if currently the only reason we pull in jetty is to make use of 
its EL implementation, which, as you say, is in newer versions basically 
borrowed from Glassfish's, then the logical thing to do IMHO would be to use 
just the Glassfish EL: no jetty and no tomcat.

I believe we should always try to reduce the number and size of dependent 
libraries to only those that we actually make use of.

> Verify if org.mortbay.jetty is removable
> 
>
> Key: PIG-5298
> URL: https://issues.apache.org/jira/browse/PIG-5298
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.17.0
>Reporter: Adam Szita
>Assignee: Nandor Kollar
> Attachments: PIG-5298_1.patch
>
>
> Although we pull in jetty libraries via ivy, Pig does not depend on 
> org.mortbay.jetty explicitly. The only exception I see is in Piggybank, where 
> I think this can be swapped for javax.el-api and log4j.
> We should investigate (check the build, run unit tests across all exec modes) 
> and remove it if it turns out to be unnecessary.





[jira] [Commented] (PIG-5305) Enable yarn-client mode execution of tests in Spark (1) mode

2017-09-14 Thread Adam Szita (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16166543#comment-16166543
 ] 

Adam Szita commented on PIG-5305:
-

Attached [^PIG-5305.0.patch] to enable running tests in yarn-client mode for 
Spark execution.

Main changes:
* build.xml: added a target to build a jar with all test classes. This is 
required so that we can pass this test jar to SparkContext, which then 
distributes it among the Spark executors; also set the SPARK_MASTER env var to 
"yarn-client"
* SparkLauncher: added a feature to re-initialize the SparkContext when 
switching between cluster- and local-mode PigServers, and to set 
ChildFirstURLClassLoader only in cluster mode
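
A hypothetical sketch of the SparkContext re-initialization idea above: keep 
one cached context and replace it when the requested master (e.g. "local" vs 
"yarn-client") changes. Class and method names are illustrative, not Pig's 
actual SparkLauncher code:

```java
public class SparkContextCache {

    /** Stand-in for an org.apache.spark.SparkContext bound to a master URL. */
    public static final class Context {
        public final String master;
        Context(String master) { this.master = master; }
    }

    private static Context current;

    /**
     * Returns the cached context, replacing it first if the requested master
     * differs from the cached one. A real implementation would stop() the old
     * SparkContext before creating the new one.
     */
    public static synchronized Context getOrReset(String master) {
        if (current == null || !current.master.equals(master)) {
            current = new Context(master);
        }
        return current;
    }
}
```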

[~kellyzly] can you please take a look?


> Enable yarn-client mode execution of tests in Spark (1) mode
> 
>
> Key: PIG-5305
> URL: https://issues.apache.org/jira/browse/PIG-5305
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: Adam Szita
>Assignee: Adam Szita
> Attachments: PIG-5305.0.patch
>
>
> See parent jira (PIG-5305) for problem description





[jira] [Updated] (PIG-5305) Enable yarn-client mode execution of tests in Spark (1) mode

2017-09-14 Thread Adam Szita (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-5305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Szita updated PIG-5305:

Description: See parent jira (PIG-5305) for problem description

> Enable yarn-client mode execution of tests in Spark (1) mode
> 
>
> Key: PIG-5305
> URL: https://issues.apache.org/jira/browse/PIG-5305
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: Adam Szita
>Assignee: Adam Szita
> Attachments: PIG-5305.0.patch
>
>
> See parent jira (PIG-5305) for problem description





[jira] [Updated] (PIG-5305) Enable yarn-client mode execution of tests in Spark (1) mode

2017-09-14 Thread Adam Szita (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-5305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Szita updated PIG-5305:

Attachment: PIG-5305.0.patch

> Enable yarn-client mode execution of tests in Spark (1) mode
> 
>
> Key: PIG-5305
> URL: https://issues.apache.org/jira/browse/PIG-5305
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: Adam Szita
>Assignee: Adam Szita
> Attachments: PIG-5305.0.patch
>
>






[jira] [Updated] (PIG-5297) Yarn-client mode doesn't work with Spark 2

2017-09-14 Thread Adam Szita (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-5297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Szita updated PIG-5297:

Attachment: PIG-5297.0.patch

> Yarn-client mode doesn't work with Spark 2
> --
>
> Key: PIG-5297
> URL: https://issues.apache.org/jira/browse/PIG-5297
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: Adam Szita
>Assignee: Adam Szita
>
> When running tests built with Spark 2 in yarn-client mode, I'm getting the 
> following exception:
> {code}
> Caused by: java.lang.IllegalStateException: Library directory 
> './pig/assembly/target/scala-2.11/jars' does not exist; make sure Spark 
> is built.
>   at 
> org.apache.spark.launcher.CommandBuilderUtils.checkState(CommandBuilderUtils.java:248)
>   at 
> org.apache.spark.launcher.CommandBuilderUtils.findJarsDir(CommandBuilderUtils.java:368)
>   at 
> org.apache.spark.launcher.YarnCommandBuilderUtils$.findJarsDir(YarnCommandBuilderUtils.scala:38)
>   at 
> org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:558)
>   at 
> org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:882)
> {code}
> After overcoming this with symlinks and setting SPARK_HOME I hit another 
> issue:
> {code}
> Caused by: java.lang.NoSuchMethodError: 
> io.netty.channel.DefaultFileRegion.(Ljava/io/File;JJ)V
>   at 
> org.apache.spark.network.buffer.FileSegmentManagedBuffer.convertToNetty(FileSegmentManagedBuffer.java:133)
>   at 
> org.apache.spark.network.protocol.MessageEncoder.encode(MessageEncoder.java:58)
>   at 
> org.apache.spark.network.protocol.MessageEncoder.encode(MessageEncoder.java:33)
>   at 
> io.netty.handler.codec.MessageToMessageEncoder.write(MessageToMessageEncoder.java:89)
> {code}
> I believe this is an incompatibility between the netty-all versions required 
> by Hadoop and Spark.





[jira] [Updated] (PIG-5297) Yarn-client mode doesn't work with Spark 2

2017-09-14 Thread Adam Szita (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-5297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Szita updated PIG-5297:

Attachment: (was: PIG-5297.0.patch)

> Yarn-client mode doesn't work with Spark 2
> --
>
> Key: PIG-5297
> URL: https://issues.apache.org/jira/browse/PIG-5297
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: Adam Szita
>Assignee: Adam Szita
>
> When running tests built with Spark 2 in yarn-client mode, I'm getting the 
> following exception:
> {code}
> Caused by: java.lang.IllegalStateException: Library directory 
> './pig/assembly/target/scala-2.11/jars' does not exist; make sure Spark 
> is built.
>   at 
> org.apache.spark.launcher.CommandBuilderUtils.checkState(CommandBuilderUtils.java:248)
>   at 
> org.apache.spark.launcher.CommandBuilderUtils.findJarsDir(CommandBuilderUtils.java:368)
>   at 
> org.apache.spark.launcher.YarnCommandBuilderUtils$.findJarsDir(YarnCommandBuilderUtils.scala:38)
>   at 
> org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:558)
>   at 
> org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:882)
> {code}
> After overcoming this with symlinks and setting SPARK_HOME I hit another 
> issue:
> {code}
> Caused by: java.lang.NoSuchMethodError: 
> io.netty.channel.DefaultFileRegion.(Ljava/io/File;JJ)V
>   at 
> org.apache.spark.network.buffer.FileSegmentManagedBuffer.convertToNetty(FileSegmentManagedBuffer.java:133)
>   at 
> org.apache.spark.network.protocol.MessageEncoder.encode(MessageEncoder.java:58)
>   at 
> org.apache.spark.network.protocol.MessageEncoder.encode(MessageEncoder.java:33)
>   at 
> io.netty.handler.codec.MessageToMessageEncoder.write(MessageToMessageEncoder.java:89)
> {code}
> I believe this is an incompatibility between the netty-all versions required 
> by Hadoop and Spark.





[jira] [Created] (PIG-5305) Enable yarn-client mode execution of tests in Spark (1) mode

2017-09-14 Thread Adam Szita (JIRA)
Adam Szita created PIG-5305:
---

 Summary: Enable yarn-client mode execution of tests in Spark (1) 
mode
 Key: PIG-5305
 URL: https://issues.apache.org/jira/browse/PIG-5305
 Project: Pig
  Issue Type: Sub-task
  Components: spark
Reporter: Adam Szita
Assignee: Adam Szita








[jira] [Commented] (PIG-5298) Verify if org.mortbay.jetty is removable

2017-09-11 Thread Adam Szita (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16161025#comment-16161025
 ] 

Adam Szita commented on PIG-5298:
-

Thanks for looking into this [~nkollar].

A few comments:
* please list an entry for the tomcat version (9.0.0.M26) in 
libraries.properties instead of ivy.xml
* list the tomcat dependencies in the pom templates for pig and piggybank
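
For illustration, the kind of entry meant here. The property name is a guess 
at the usual {{name.version}} convention in libraries.properties; the version 
is the one from the comment above:

```
# ivy/libraries.properties -- pin the version here ...
tomcat.version=9.0.0.M26
```

ivy.xml would then reference it as {{rev="${tomcat.version}"}} instead of 
hard-coding the number.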

> Verify if org.mortbay.jetty is removable
> 
>
> Key: PIG-5298
> URL: https://issues.apache.org/jira/browse/PIG-5298
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.17.0
>Reporter: Adam Szita
>Assignee: Nandor Kollar
> Attachments: PIG-5298_1.patch
>
>
> Although we pull in jetty libraries via ivy, Pig does not depend on 
> org.mortbay.jetty explicitly. The only exception I see is in Piggybank, where 
> I think this can be swapped for javax.el-api and log4j.
> We should investigate (check the build, run unit tests across all exec modes) 
> and remove it if it turns out to be unnecessary.




