Re: [VOTE] Release Apache Spark 3.4.0 (RC2)

2023-03-03 Thread Jonathan Kelly
Small correction: I found a mention of it on
https://github.com/apache/spark/pull/39807 from a month ago.

On Fri, Mar 3, 2023 at 9:44 AM Jonathan Kelly wrote:

> So did I... :-( However, there had been no new JIRA issue or PR that
> mentioned this test case specifically until
> https://issues.apache.org/jira/browse/SPARK-42665, created just a minute
> ago.
>
> On Fri, Mar 3, 2023 at 5:13 AM Sean Owen wrote:
>
>> Oh OK, I thought this RC was meant to fix that.
>>
>> On Fri, Mar 3, 2023 at 12:35 AM Jonathan Kelly wrote:
>>
>>> I see that one too but have not investigated it myself. In the RC1
>>> thread, it was mentioned that this occurs when running the tests via Maven
>>> but not via SBT. Does the test class path get set up differently when
>>> running via SBT vs. Maven?
>>>
>>> On Thu, Mar 2, 2023 at 5:37 PM Sean Owen wrote:
>>>
>>>> Thanks, that's good to know. The workaround (deleting the thriftserver
>>>> target dir) works for me. Who knows?
>>>>
>>>> But I'm also still seeing:
>>>>
>>>> - simple udf *** FAILED ***
>>>>   io.grpc.StatusRuntimeException: INTERNAL: org.apache.spark.sql.ClientE2ETestSuite
>>>>   at io.grpc.Status.asRuntimeException(Status.java:535)
>>>>   at io.grpc.stub.ClientCalls$BlockingResponseStream.hasNext(ClientCalls.java:660)
>>>>   at org.apache.spark.sql.connect.client.SparkResult.org$apache$spark$sql$connect$client$SparkResult$$processResponses(SparkResult.scala:61)
>>>>   at org.apache.spark.sql.connect.client.SparkResult.length(SparkResult.scala:106)
>>>>   at org.apache.spark.sql.connect.client.SparkResult.toArray(SparkResult.scala:123)
>>>>   at org.apache.spark.sql.Dataset.$anonfun$collect$1(Dataset.scala:2426)
>>>>   at org.apache.spark.sql.Dataset.withResult(Dataset.scala:2747)
>>>>   at org.apache.spark.sql.Dataset.collect(Dataset.scala:2425)
>>>>   at org.apache.spark.sql.ClientE2ETestSuite.$anonfun$new$8(ClientE2ETestSuite.scala:85)
>>>>   at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>>>>
>>>> On Thu, Mar 2, 2023 at 4:38 PM Jonathan Kelly wrote:
>>>>
>>>>> Yes, this issue has driven me quite crazy as well! I hit this issue
>>>>> for a long time when compiling the master branch and running tests.
>>>>> Strangely, it would only occur, as you say, when running the tests and
>>>>> not during an initial build that skips running the tests. (However, I
>>>>> have seen instances where it does occur even in the initial build with
>>>>> tests skipped, but only on AWS CodeBuild, not when building locally or
>>>>> on Amazon Linux.)
>>>>>
>>>>> I thought for a long time that I was alone in this bizarre issue, but
>>>>> I eventually found sbt#6183 and SPARK-41063, but both are unfortunately
>>>>> still open.
>>>>>
>>>>> I found at one point that the issue magically disappeared once
>>>>> [SPARK-41408] [BUILD] Upgrade scala-maven-plugin to 4.8.0 was merged,
>>>>> but then it cropped back up again at some point after that, and I used
>>>>> git bisect to find that the issue appeared again when [SPARK-27561]
>>>>> [SQL] Support implicit lateral column alias resolution on Project was
>>>>> merged. This commit didn't even directly affect anything in
>>>>> hive-thriftserver, but it does make some pretty big changes to pretty
>>>>> core classes in sql/catalyst, so it's not too surprising that this
>>>>> could trigger an issue that seems to have to do with "very complicated
>>>>> inheritance hierarchies involving both Java and Scala", which is a
>>>>> phrase mentioned on sbt#6183.
>>>>>
>>>>> One thing that I did find to help was to delete
>>>>> sql/hive-thriftserver/target between building Spark and running the
>>>>> tests. This helps in my builds where the issue only occurs during the
>>>>> testing phase and not during the initial build phase, but of course it
>>>>> doesn't help in my builds where the issue occurs during that first build
>>>>> phase.
>>>>>
>>>>> ~ Jonathan Kelly
>>>>>
>>>>> On Thu, Mar 2, 2023 at 1:47 PM Sean Owen wrote:
>>>>>
>>>>>> Has anyone seen this behavior -- I've never seen it before. The Hive
>>>>>> thriftserver module for me just goes into an infinite loop when running
>>>>>> tests:
>>>>>>
>>>>>> ...
>>>>>> [INFO] done compiling
>>>>>> [INFO] compiling 22 Scala sources and 24 Java sources to
>>>>>> /mnt/data/testing/spark-3.4.0/sql/hive-thriftserver/target/scala-2.12/classes
>>>>>> ...
>>>>>> [INFO] done compiling
>>>>>> [INFO] compiling 22 Scala sources and 9 Java sources to
>>>>>> /mnt/data/testing/spark-3.4.0/sql/hive-thrift

Re: [VOTE] Release Apache Spark 3.4.0 (RC2)

2023-03-03 Thread Jonathan Kelly
So did I... :-( However, there had been no new JIRA issue or PR that
mentioned this test case specifically until
https://issues.apache.org/jira/browse/SPARK-42665, created just a minute
ago.

On Fri, Mar 3, 2023 at 5:13 AM Sean Owen wrote:

> Oh OK, I thought this RC was meant to fix that.
>
> On Fri, Mar 3, 2023 at 12:35 AM Jonathan Kelly wrote:
>
>> I see that one too but have not investigated it myself. In the RC1
>> thread, it was mentioned that this occurs when running the tests via Maven
>> but not via SBT. Does the test class path get set up differently when
>> running via SBT vs. Maven?
>>
>> On Thu, Mar 2, 2023 at 5:37 PM Sean Owen wrote:
>>
>>> Thanks, that's good to know. The workaround (deleting the thriftserver
>>> target dir) works for me. Who knows?
>>>
>>> But I'm also still seeing:
>>>
>>> - simple udf *** FAILED ***
>>>   io.grpc.StatusRuntimeException: INTERNAL: org.apache.spark.sql.ClientE2ETestSuite
>>>   at io.grpc.Status.asRuntimeException(Status.java:535)
>>>   at io.grpc.stub.ClientCalls$BlockingResponseStream.hasNext(ClientCalls.java:660)
>>>   at org.apache.spark.sql.connect.client.SparkResult.org$apache$spark$sql$connect$client$SparkResult$$processResponses(SparkResult.scala:61)
>>>   at org.apache.spark.sql.connect.client.SparkResult.length(SparkResult.scala:106)
>>>   at org.apache.spark.sql.connect.client.SparkResult.toArray(SparkResult.scala:123)
>>>   at org.apache.spark.sql.Dataset.$anonfun$collect$1(Dataset.scala:2426)
>>>   at org.apache.spark.sql.Dataset.withResult(Dataset.scala:2747)
>>>   at org.apache.spark.sql.Dataset.collect(Dataset.scala:2425)
>>>   at org.apache.spark.sql.ClientE2ETestSuite.$anonfun$new$8(ClientE2ETestSuite.scala:85)
>>>   at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>>>
>>> On Thu, Mar 2, 2023 at 4:38 PM Jonathan Kelly wrote:
>>>
>>>> Yes, this issue has driven me quite crazy as well! I hit this issue for
>>>> a long time when compiling the master branch and running tests. Strangely,
>>>> it would only occur, as you say, when running the tests and not during an
>>>> initial build that skips running the tests. (However, I have seen instances
>>>> where it does occur even in the initial build with tests skipped, but only
>>>> on AWS CodeBuild, not when building locally or on Amazon Linux.)
>>>>
>>>> I thought for a long time that I was alone in this bizarre issue, but I
>>>> eventually found sbt#6183 and SPARK-41063, but both are unfortunately
>>>> still open.
>>>>
>>>> I found at one point that the issue magically disappeared once
>>>> [SPARK-41408] [BUILD] Upgrade scala-maven-plugin to 4.8.0 was merged, but
>>>> then it cropped back up again at some point after that, and I used git
>>>> bisect to find that the issue appeared again when [SPARK-27561] [SQL]
>>>> Support implicit lateral column alias resolution on Project was merged.
>>>> This commit didn't even directly affect anything in hive-thriftserver,
>>>> but it does make some pretty big changes to pretty core classes in
>>>> sql/catalyst, so it's not too surprising that this could trigger an issue
>>>> that seems to have to do with "very complicated inheritance hierarchies
>>>> involving both Java and Scala", which is a phrase mentioned on sbt#6183.
>>>>
>>>> One thing that I did find to help was to delete
>>>> sql/hive-thriftserver/target between building Spark and running the
>>>> tests. This helps in my builds where the issue only occurs during the
>>>> testing phase and not during the initial build phase, but of course it
>>>> doesn't help in my builds where the issue occurs during that first build
>>>> phase.
>>>>
>>>> ~ Jonathan Kelly
>>>>
>>>> On Thu, Mar 2, 2023 at 1:47 PM Sean Owen wrote:
>>>>
>>>>> Has anyone seen this behavior -- I've never seen it before. The Hive
>>>>> thriftserver module for me just goes into an infinite loop when running
>>>>> tests:
>>>>>
>>>>> ...
>>>>> [INFO] done compiling
>>>>> [INFO] compiling 22 Scala sources and 24 Java sources to
>>>>> /mnt/data/testing/spark-3.4.0/sql/hive-thriftserver/target/scala-2.12/classes
>>>>> ...
>>>>> [INFO] done compiling
>>>>> [INFO] compiling 22 Scala sources and 9 Java sources to
>>>>> /mnt/data/testing/spark-3.4.0/sql/hive-thriftserver/target/scala-2.12/classes
>>>>> ...
>>>>> [WARNING] [Warn]
>>>>> /mnt/data/testing/spark-3.4.0/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:25:29:
>>>>>  [deprecation] GnuParser in org.apache.commons.cli has been deprecated
>>>>> [WARNING] [W

Re: [VOTE] Release Apache Spark 3.4.0 (RC2)

2023-03-03 Thread Sean Owen
Oh OK, I thought this RC was meant to fix that.

On Fri, Mar 3, 2023 at 12:35 AM Jonathan Kelly wrote:

> I see that one too but have not investigated it myself. In the RC1 thread,
> it was mentioned that this occurs when running the tests via Maven but not
> via SBT. Does the test class path get set up differently when running via
> SBT vs. Maven?
>
> On Thu, Mar 2, 2023 at 5:37 PM Sean Owen wrote:
>
>> Thanks, that's good to know. The workaround (deleting the thriftserver
>> target dir) works for me. Who knows?
>>
>> But I'm also still seeing:
>>
>> - simple udf *** FAILED ***
>>   io.grpc.StatusRuntimeException: INTERNAL: org.apache.spark.sql.ClientE2ETestSuite
>>   at io.grpc.Status.asRuntimeException(Status.java:535)
>>   at io.grpc.stub.ClientCalls$BlockingResponseStream.hasNext(ClientCalls.java:660)
>>   at org.apache.spark.sql.connect.client.SparkResult.org$apache$spark$sql$connect$client$SparkResult$$processResponses(SparkResult.scala:61)
>>   at org.apache.spark.sql.connect.client.SparkResult.length(SparkResult.scala:106)
>>   at org.apache.spark.sql.connect.client.SparkResult.toArray(SparkResult.scala:123)
>>   at org.apache.spark.sql.Dataset.$anonfun$collect$1(Dataset.scala:2426)
>>   at org.apache.spark.sql.Dataset.withResult(Dataset.scala:2747)
>>   at org.apache.spark.sql.Dataset.collect(Dataset.scala:2425)
>>   at org.apache.spark.sql.ClientE2ETestSuite.$anonfun$new$8(ClientE2ETestSuite.scala:85)
>>   at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>>
>> On Thu, Mar 2, 2023 at 4:38 PM Jonathan Kelly wrote:
>>
>>> Yes, this issue has driven me quite crazy as well! I hit this issue for
>>> a long time when compiling the master branch and running tests. Strangely,
>>> it would only occur, as you say, when running the tests and not during an
>>> initial build that skips running the tests. (However, I have seen instances
>>> where it does occur even in the initial build with tests skipped, but only
>>> on AWS CodeBuild, not when building locally or on Amazon Linux.)
>>>
>>> I thought for a long time that I was alone in this bizarre issue, but I
>>> eventually found sbt#6183 and SPARK-41063, but both are unfortunately
>>> still open.
>>>
>>> I found at one point that the issue magically disappeared once
>>> [SPARK-41408] [BUILD] Upgrade scala-maven-plugin to 4.8.0 was merged, but
>>> then it cropped back up again at some point after that, and I used git
>>> bisect to find that the issue appeared again when [SPARK-27561] [SQL]
>>> Support implicit lateral column alias resolution on Project was merged.
>>> This commit didn't even directly affect anything in hive-thriftserver,
>>> but it does make some pretty big changes to pretty core classes in
>>> sql/catalyst, so it's not too surprising that this could trigger an issue
>>> that seems to have to do with "very complicated inheritance hierarchies
>>> involving both Java and Scala", which is a phrase mentioned on sbt#6183.
>>>
>>> One thing that I did find to help was to
>>> delete sql/hive-thriftserver/target between building Spark and running the
>>> tests. This helps in my builds where the issue only occurs during the
>>> testing phase and not during the initial build phase, but of course it
>>> doesn't help in my builds where the issue occurs during that first build
>>> phase.
>>>
>>> ~ Jonathan Kelly
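A minimal sketch of the workaround described above (deleting sql/hive-thriftserver/target between the build and the test run), written in Python purely for illustration. The function name remove_stale_target and its parameters are hypothetical and are not part of Spark's build tooling; the only detail taken from the thread is the module path.

```python
import shutil
from pathlib import Path


def remove_stale_target(spark_home: str, module: str = "sql/hive-thriftserver") -> bool:
    """Delete <spark_home>/<module>/target if it exists.

    Hypothetical helper: returns True if a directory was removed,
    False if there was nothing to remove.
    """
    target = Path(spark_home) / module / "target"
    if target.is_dir():
        # Remove the whole compiled-output tree so that the test phase
        # starts from a clean state for this one module only.
        shutil.rmtree(target)
        return True
    return False
```

It would be called once, with the root of the Spark checkout, after the build step finishes and before the test step starts. As the message notes, this can only help when the repeated recompilation shows up in the test phase, not when it already occurs during the initial build.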
>>>
>>> On Thu, Mar 2, 2023 at 1:47 PM Sean Owen wrote:
>>>
>>>> Has anyone seen this behavior -- I've never seen it before. The Hive
>>>> thriftserver module for me just goes into an infinite loop when running
>>>> tests:
>>>>
>>>> ...
>>>> [INFO] done compiling
>>>> [INFO] compiling 22 Scala sources and 24 Java sources to
>>>> /mnt/data/testing/spark-3.4.0/sql/hive-thriftserver/target/scala-2.12/classes
>>>> ...
>>>> [INFO] done compiling
>>>> [INFO] compiling 22 Scala sources and 9 Java sources to
>>>> /mnt/data/testing/spark-3.4.0/sql/hive-thriftserver/target/scala-2.12/classes
>>>> ...
>>>> [WARNING] [Warn]
>>>> /mnt/data/testing/spark-3.4.0/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:25:29:
>>>>  [deprecation] GnuParser in org.apache.commons.cli has been deprecated
>>>> [WARNING] [Warn]
>>>> /mnt/data/testing/spark-3.4.0/sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/HiveAuthFactory.java:333:18:
>>>>  [deprecation] authorize(UserGroupInformation,String,Configuration) in
>>>> ProxyUsers has been deprecated
>>>> [WARNING] [Warn]
>>>> /mnt/data/testing/spark-3.4.0/sql/hive-thriftserver/src/main/java/org/apach