[jira] [Commented] (PIG-5318) Unit test failures on Pig on Spark with Spark 2.2

2017-12-13 Thread Adam Szita (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16289058#comment-16289058
 ] 

Adam Szita commented on PIG-5318:
-

[~nkollar], +1 for [^PIG-5318_6.patch], committed to trunk.
I think we should also upgrade the Spark 2 minor version in Pig on Spark to 
2.2. We don't want to maintain support for 1.6.1, 2.1.1, and 2.2.0 at the same 
time; we should rather support one minor version per major version.
Created PIG-5321 to track the upgrade.

> Unit test failures on Pig on Spark with Spark 2.2
> -
>
> Key: PIG-5318
> URL: https://issues.apache.org/jira/browse/PIG-5318
> Project: Pig
>  Issue Type: Bug
>  Components: spark
>Reporter: Nandor Kollar
>Assignee: Nandor Kollar
> Attachments: PIG-5318_1.patch, PIG-5318_2.patch, PIG-5318_3.patch, 
> PIG-5318_4.patch, PIG-5318_5.patch, PIG-5318_6.patch
>
>
> There are several failing cases when executing the unit tests with Spark 2.2:
> {code}
>  org.apache.pig.test.TestAssert#testNegativeWithoutFetch
>  org.apache.pig.test.TestAssert#testNegative
>  org.apache.pig.test.TestEvalPipeline2#testNonStandardDataWithoutFetch
>  org.apache.pig.test.TestScalarAliases#testScalarErrMultipleRowsInInput
>  org.apache.pig.test.TestStore#testCleanupOnFailureMultiStore
>  org.apache.pig.test.TestStoreInstances#testBackendStoreCommunication
>  org.apache.pig.test.TestStoreLocal#testCleanupOnFailureMultiStore
> {code}
> All of these are related to fixes/changes in Spark.
> The TestAssert, TestScalarAliases and TestEvalPipeline2 failures could be fixed 
> by asserting on the message of the exception's root cause; it looks like on 
> Spark 2.2 the exception is wrapped in an additional layer.
> The TestStore and TestStoreLocal failures are also test-related problems: it 
> looks like SPARK-7953 is fixed in Spark 2.2.
> The root cause of the TestStoreInstances failure is yet to be found.
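
The "assert on the root cause" idea from the description can be sketched as follows. This is only an illustration, not the code from the attached patches: the class name `RootCauseDemo`, the helper `rootCause` and the exception messages are hypothetical.

```java
final class RootCauseDemo {
    // Follow the cause chain down to the innermost exception.
    static Throwable rootCause(Throwable t) {
        Throwable current = t;
        while (current.getCause() != null) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) {
        Exception original = new IllegalStateException("Assertion violated");
        // Simulate an engine that wraps the original failure in extra layers.
        Exception wrapped = new RuntimeException("Job failed",
                new RuntimeException("Task failed", original));
        // A test that checks the root cause is unaffected by the number of wrappers.
        System.out.println(rootCause(wrapped).getMessage());
    }
}
```

Checking the innermost exception keeps the assertion stable regardless of how many wrapping layers a given engine version adds.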



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (PIG-5318) Unit test failures on Pig on Spark with Spark 2.2

2017-12-08 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16284463#comment-16284463
 ] 

Rohini Palaniswamy commented on PIG-5318:
-

+1 on the patch from my side.

bq. found a universal way to tell the current Spark version
   I did not suggest that because [~gezapeti] mentioned earlier that it is 
internal to Spark: 
https://issues.apache.org/jira/browse/OOZIE-2606?focusedCommentId=15528793=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15528793.
 It is OK here, though, as it is only for tests.

bq. testKeepGoingFailed is excluded because of SPARK-7953
  As you mentioned, SPARK-7953 still does not appear to be resolved. Can you try 
to find which JIRA actually fixed it, and possibly close SPARK-7953 if it is no 
longer required? Identifying the behavior change that caused this might also help 
find other places in Pig on Spark that have to be fixed or changed for the new 
behavior.





[jira] [Commented] (PIG-5318) Unit test failures on Pig on Spark with Spark 2.2

2017-12-08 Thread Nandor Kollar (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16283321#comment-16283321
 ] 

Nandor Kollar commented on PIG-5318:


Attached PIG-5318_6.patch. I found a universal way to tell the current Spark 
version that works with both Spark 1.6.x and Spark 2.x, and there's no 
need to start a SparkContext. (thanks [~gezapeti] :) )



[jira] [Commented] (PIG-5318) Unit test failures on Pig on Spark with Spark 2.2

2017-12-08 Thread Nandor Kollar (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16283255#comment-16283255
 ] 

Nandor Kollar commented on PIG-5318:


[~kellyzly] thanks for the explanation. In this case I think enabling this test 
is fine, and there's no need to check the Spark version, since we don't support 
older Spark versions.



[jira] [Commented] (PIG-5318) Unit test failures on Pig on Spark with Spark 2.2

2017-12-07 Thread liyunzhang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16283023#comment-16283023
 ] 

liyunzhang commented on PIG-5318:
-

[~nkollar]: testKeepGoingFailed was excluded because of SPARK-7953. At that 
time we used Spark 1.3, and after upgrading to 1.6 the test was not enabled again.



[jira] [Commented] (PIG-5318) Unit test failures on Pig on Spark with Spark 2.2

2017-12-07 Thread Nandor Kollar (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16281815#comment-16281815
 ] 

Nandor Kollar commented on PIG-5318:


Attached PIG-5318_5.patch, which includes fixes for the TestAssert, 
TestScalarAliases, TestEvalPipeline2, TestStore and TestStoreLocal test cases, 
but doesn't fix the TestStoreInstances failure. The Spark version is determined 
as Rohini suggested. I also noticed that testKeepGoigFailed (I fixed the typo in 
the method name; it is now testKeepGoingFailed) was excluded from the Spark exec 
type. I enabled this test case, since it passed in my environment with Spark 
1.6, 2.1 and 2.2. [~kellyzly], do you remember why this was excluded? It looks 
like the Jira it refers to is not yet fixed, yet the test passes with 
Spark 1.6.x.



[jira] [Commented] (PIG-5318) Unit test failures on Pig on Spark with Spark 2.2

2017-12-06 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16280635#comment-16280635
 ] 

Rohini Palaniswamy commented on PIG-5318:
-

bq. Should we open a separate Jira for fixing TestStoreInstances in spark mode?
 Sure. It will require more time for you to come up with a solution. We can get 
the other ones fixed in this jira.



[jira] [Commented] (PIG-5318) Unit test failures on Pig on Spark with Spark 2.2

2017-12-04 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16277122#comment-16277122
 ] 

Rohini Palaniswamy commented on PIG-5318:
-

bq. it looks like the way I wanted to tell Spark version doesn't work on Spark 
1.x
  I missed this earlier. If the spark-version-info.properties file is not there, 
you could just return false from isSpark2_2_plus, which would be easier.
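
A minimal sketch of that fallback, assuming the file is looked up on the classpath and exposes a `version` key. The class name `SparkVersionFile`, its helpers, and the key name are assumptions made for illustration, not the code from the patch.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

final class SparkVersionFile {
    // Resource name from the comment above; the "version" key is an assumption.
    static final String RESOURCE = "spark-version-info.properties";

    // Returns the version string, or null when the resource is missing or unreadable.
    static String readVersion(ClassLoader loader, String resource) {
        try (InputStream in = loader.getResourceAsStream(resource)) {
            if (in == null) {
                return null; // file not on the classpath
            }
            Properties props = new Properties();
            props.load(in);
            return props.getProperty("version");
        } catch (IOException e) {
            return null; // treat an unreadable file like a missing one
        }
    }

    // True for 2.2 and later; false for older versions and unparsable input.
    static boolean isAtLeast2_2(String version) {
        if (version == null) {
            return false;
        }
        String[] parts = version.trim().split("\\.");
        if (parts.length < 2) {
            return false;
        }
        try {
            int major = Integer.parseInt(parts[0]);
            int minor = Integer.parseInt(parts[1]);
            return major > 2 || (major == 2 && minor >= 2);
        } catch (NumberFormatException e) {
            return false;
        }
    }

    // A missing file simply yields false, as suggested in the comment above.
    static boolean isSpark2_2_plus(ClassLoader loader) {
        return isAtLeast2_2(readVersion(loader, RESOURCE));
    }

    public static void main(String[] args) {
        System.out.println(isSpark2_2_plus(SparkVersionFile.class.getClassLoader()));
    }
}
```

With this shape no SparkContext has to be started, and an environment without the properties file needs no special handling.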



[jira] [Commented] (PIG-5318) Unit test failures on Pig on Spark with Spark 2.2

2017-12-04 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16277120#comment-16277120
 ] 

Rohini Palaniswamy commented on PIG-5318:
-

bq. but how about modifying PigOutputFormat, like I did in the patch (making 
the relevant variables static)?
 This cannot be done. It is hacky and will break Pig local mode and Tez. In 
local mode, the same JVM is used to execute the whole script, which can have 
parallel STORE statements. Tez also allows storing to multiple outputs from the 
same vertex in a DAG, i.e. multiple PigOutputFormat instances in the same JVM.
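
The objection to static fields can be illustrated with a small sketch. `StaticOutputFormat` below is a hypothetical stand-in, not the real PigOutputFormat; it only shows what happens when two stores in one JVM share a static field.

```java
final class StaticStateDemo {
    // Hypothetical stand-in for an output format whose state was made static.
    static final class StaticOutputFormat {
        private static String currentLocation; // one value for the whole JVM

        void configure(String location) {
            currentLocation = location;
        }

        String location() {
            return currentLocation;
        }
    }

    public static void main(String[] args) {
        StaticOutputFormat storeA = new StaticOutputFormat();
        StaticOutputFormat storeB = new StaticOutputFormat();
        storeA.configure("out/a");
        storeB.configure("out/b"); // a second STORE in the same JVM
        // storeA no longer sees its own configuration.
        System.out.println("store A now sees: " + storeA.location());
    }
}
```

The second `configure` call overwrites the state of the first store, which is the kind of interference described above for parallel STORE statements.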

bq. isSpark2_1_minus
  Can you make it isSpark2_2_plus, which is slightly more intuitive than 
2_1_minus? Also, instantiating SparkContext just to get the version seems like 
overkill. I prefer the previous logic you had. Is there any reason it could not be used?




[jira] [Commented] (PIG-5318) Unit test failures on Pig on Spark with Spark 2.2

2017-12-04 Thread Nandor Kollar (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16276747#comment-16276747
 ] 

Nandor Kollar commented on PIG-5318:


Attached PIG-5318_4.patch. It looks like the way I wanted to tell Spark version 
doesn't work on Spark 1.x, so I'm using SparkContext#version instead.



[jira] [Commented] (PIG-5318) Unit test failures on Pig on Spark with Spark 2.2

2017-12-04 Thread Nandor Kollar (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16276557#comment-16276557
 ] 

Nandor Kollar commented on PIG-5318:


bq. You should just do isSpark2_x (sparkVersion.startsWith("2.")) instead of 
isSpark2_2_x . If Spark 2.3 gets released, then code will have to change.

You're right, but matching for 2.x is not good enough. On Spark 2.1, abortTask 
and abortJob are not called (see SPARK-7953), but in Spark 2.2 this appears to 
be fixed. I'll update the patch soon; we should match Spark 2.2+.
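
The difference between the two checks can be sketched as follows. The method names mirror the ones discussed in this thread, but the bodies are only an illustration; treating later major versions as "2.2+" is an assumption.

```java
final class SparkVersionMatch {
    // The prefix check from the review comment: true for every 2.x release.
    static boolean isSpark2_x(String version) {
        return version.startsWith("2.");
    }

    // Numeric check: true for 2.2 and later (assumes later majors behave like 2.2).
    static boolean isSpark2_2_plus(String version) {
        String[] parts = version.split("\\.");
        if (parts.length < 2) {
            return false;
        }
        try {
            int major = Integer.parseInt(parts[0]);
            int minor = Integer.parseInt(parts[1]);
            return major > 2 || (major == 2 && minor >= 2);
        } catch (NumberFormatException e) {
            return false; // unparsable version strings are treated as "not 2.2+"
        }
    }

    public static void main(String[] args) {
        for (String v : new String[] {"1.6.1", "2.1.1", "2.2.0", "2.3.0"}) {
            System.out.println(v + ": 2.x=" + isSpark2_x(v)
                    + ", 2.2+=" + isSpark2_2_plus(v));
        }
    }
}
```

The prefix check cannot tell 2.1.1 apart from 2.2.0, while the numeric check does and needs no change when 2.3 is released.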

bq. Spark should consistently use the same OutputFormat instance in this case

OK, so I guess this should be a new Jira for Spark. However, Spark 2.2 is 
already released, and it creates more OutputFormat instances, as described 
earlier. Indeed, we shouldn't modify the test case, but how about modifying 
PigOutputFormat, like I did in the patch (making the relevant variables static)?



[jira] [Commented] (PIG-5318) Unit test failures on Pig on Spark with Spark 2.2

2017-12-01 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16274766#comment-16274766
 ] 

Rohini Palaniswamy commented on PIG-5318:
-

You should just do isSpark2_x (sparkVersion.startsWith("2.")) instead of 
isSpark2_2_x . If Spark 2.3 gets released, then code will have to change. 

bq. Not sure if it is a bug in Pig, or in Spark, should Spark consistently use 
the same OutputFormat instance in this case?
  Spark should consistently use the same OutputFormat instance in this case. We 
should not be modifying the test case. There will be users who will be using 
local variables in StoreFunc for some computation at least.



[jira] [Commented] (PIG-5318) Unit test failures on Pig on Spark with Spark 2.2

2017-12-01 Thread Nandor Kollar (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16274385#comment-16274385
 ] 

Nandor Kollar commented on PIG-5318:


Attached PIG-5318_2.patch; I addressed Rohini's comments there.

As for the {{TestStoreInstances}} failure, it looks like Spark (unlike Tez and 
MapReduce) creates multiple instances of {{PigOutputFormat}} while setting up 
the output committers: 
[setupCommitter|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala#L74]
 is called from both 
[setupJob|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala#L138]
 and 
[setupTask|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala#L165],
 and {{setupCommitter}} creates a new {{PigOutputFormat}} each time, saving it in 
a private variable. In addition, when Spark writes to files, a new 
{{PigOutputFormat}} is [getting 
created|https://github.com/apache/spark/blob/branch-2.2/core/src/main/scala/org/apache/spark/internal/io/SparkHadoopMapReduceWriter.scala#L75]
 too. Since POStores are saved in the configuration and deserialized from it, but 
the StoreFuncInterface inside each store is 
[transient|https://github.com/apache/pig/blob/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POStore.java#L53],
 a new instance of {{STFuncCheckInstances}} gets created each time, so 
{{putNext}} and {{commitTask}} use different array instances. Not sure if 
it is a bug in Pig, or in Spark, should Spark consistently use the same 
OutputFormat instance in this case?

Making {{reduceStores}}, {{mapStores}}, {{currentConf}} static inside 
{{TestStoreInstances}} would solve the problem, [~rohini], [~kellyzly] what do 
you think about this solution?
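
The effect described above can be reproduced in isolation with a small sketch. `BufferingStore` is a hypothetical stand-in for a store function that keeps rows in an instance field; it is not the `STFuncCheckInstances` class from the test.

```java
import java.util.ArrayList;
import java.util.List;

final class StoreInstanceDemo {
    // Hypothetical stand-in for a store function that buffers rows per instance.
    static final class BufferingStore {
        private final List<String> outRows = new ArrayList<>();

        void putNext(String row) {
            outRows.add(row);
        }

        // Returns how many rows this instance can see at commit time.
        int commitTask() {
            return outRows.size();
        }
    }

    public static void main(String[] args) {
        // Write and commit through the same instance: the rows are visible.
        BufferingStore shared = new BufferingStore();
        shared.putNext("a");
        shared.putNext("b");
        System.out.println("same instance: " + shared.commitTask());

        // Write through one instance and commit through a freshly created one:
        // the committing instance has an empty buffer.
        BufferingStore writer = new BufferingStore();
        writer.putNext("a");
        writer.putNext("b");
        BufferingStore committer = new BufferingStore();
        System.out.println("separate instances: " + committer.commitTask());
    }
}
```

When the write and the commit go through different instances, the committing instance finds an empty buffer, which matches the empty array observed in the failing test.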



[jira] [Commented] (PIG-5318) Unit test failures on Pig on Spark with Spark 2.2

2017-11-30 Thread Nandor Kollar (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16272501#comment-16272501
 ] 

Nandor Kollar commented on PIG-5318:


Thanks [~rohini] and [~kellyzly] for your review!
Hm, I think I now understand the point of TestStoreInstances, and indeed, my 
change to that test looks pointless. I'm afraid this might be a bug and not a 
test issue. I'll continue investigating why it is failing and how to fix it. So 
far it looks like commitTask is not called on the correct 
OutputCommitterTestInstances instance; the array is empty.



[jira] [Commented] (PIG-5318) Unit test failures on Pig on Spark with Spark 2.2

2017-11-29 Thread liyunzhang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16271998#comment-16271998
 ] 

liyunzhang commented on PIG-5318:
-

[~nkollar]:
Thanks for working on this, and thanks [~rohini] for the comments.
I just did a quick scan and have several questions:
1. Are the modifications to 
{{TestAssert}}, {{TestEvalPipeline}}, {{TestScalarAliases}} also suitable for Pig 
on MR and Pig on Tez? I guess they will not hurt the other engines; I just want 
to confirm.
2. I don't quite understand the purpose of the modification to 
TestStoreInstances. Were there problems with the previous code?
 Before
 {code}
 private ArrayList outRows;
 {code}
 
 After
 {code}
  private static Map outRows = new HashMap<>();
 private String location;
 {code}






[jira] [Commented] (PIG-5318) Unit test failures on Pig on Spark with Spark 2.2

2017-11-29 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16271252#comment-16271252
 ] 

Rohini Palaniswamy commented on PIG-5318:
-

A few comments:
  1) TestStoreBase 
   - Please add an isSpark2_x() method to Util.java after isHadoop1_x() and use 
that.
   - mode.toString().startsWith("SPARK") -> Util.isSparkExecType(mode)
  2) Changing the test case of TestStoreInstances defeats the purpose of the test.
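
The first comment asks for the version and exec-type checks to live in shared helpers rather than inline string tests. A rough, self-contained sketch of that idea follows. It is not Pig's Util.java: the class name is hypothetical, both methods take plain strings as stand-ins (the helper named in the comment is called with the test's mode object), and how the Spark version is obtained is left out as an assumption.

```java
import java.util.Locale;

final class SparkVersionCheck {
    // Hypothetical helper: true when the given Spark version string belongs to the 2.x line.
    // Where the version string comes from is deliberately not shown here.
    static boolean isSpark2_x(String sparkVersion) {
        return sparkVersion != null && sparkVersion.startsWith("2.");
    }

    // Hypothetical stand-in for Util.isSparkExecType(mode), using the exec type's
    // name as a plain string so the sketch stays self-contained.
    static boolean isSparkExecType(String execTypeName) {
        return execTypeName != null
                && execTypeName.toUpperCase(Locale.ROOT).startsWith("SPARK");
    }

    public static void main(String[] args) {
        // Illustrative inputs only.
        System.out.println(isSpark2_x("2.2.0"));
        System.out.println(isSpark2_x("1.6.1"));
        System.out.println(isSparkExecType("SPARK_LOCAL"));
    }
}
```

Keeping the checks in one place means a test only states what it needs to know (is this Spark, is this Spark 2.x), and the matching rule can change without touching each test.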







[jira] [Commented] (PIG-5318) Unit test failures on Pig on Spark with Spark 2.2

2017-11-29 Thread Nandor Kollar (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16270990#comment-16270990
 ] 

Nandor Kollar commented on PIG-5318:


[~szita], [~kellyzly] could you please review?



