Re: [VOTE] Release Spark 3.4.2 (RC1)

2023-11-29 Thread L. C. Hsieh
+1

Thanks Dongjoon!

On Wed, Nov 29, 2023 at 7:53 PM Mridul Muralidharan  wrote:
>
> +1
>
> Signatures, digests, etc check out fine.
> Checked out tag and build/tested with -Phive -Pyarn -Pmesos -Pkubernetes
>
> Regards,
> Mridul
>
> On Wed, Nov 29, 2023 at 5:08 AM Yang Jie  wrote:
>>
>> +1(non-binding)
>>
>> Jie Yang
>>
>> On 2023/11/29 02:08:04 Kent Yao wrote:
>> > +1(non-binding)
>> >
>> > Kent Yao
>> >
>> > On 2023/11/27 01:12:53 Dongjoon Hyun wrote:
>> > > Hi, Marc.
>> > >
>> > > Given that it exists in 3.4.0 and 3.4.1, I don't think it's a release
>> > > blocker for Apache Spark 3.4.2.
>> > >
>> > > When the patch is ready, we can consider it for 3.4.3.
>> > >
>> > > In addition, note that we categorized release-blocker-level issues by
>> > > marking 'Blocker' priority with `Target Version` before the vote.
>> > >
>> > > Best,
>> > > Dongjoon.
>> > >
>> > >
>> > > On Sat, Nov 25, 2023 at 12:01 PM Marc Le Bihan  
>> > > wrote:
>> > >
>> > > > -1 if you can wait until the last remaining problem with generics (?) is
>> > > > entirely solved, the one that causes this exception to be thrown:
>> > > >
>> > > > java.lang.ClassCastException: class [Ljava.lang.Object; cannot be cast 
>> > > > to class [Ljava.lang.reflect.TypeVariable; ([Ljava.lang.Object; and 
>> > > > [Ljava.lang.reflect.TypeVariable; are in module java.base of loader 
>> > > > 'bootstrap')
>> > > > at 
>> > > > org.apache.spark.sql.catalyst.JavaTypeInference$.encoderFor(JavaTypeInference.scala:116)
>> > > > at 
>> > > > org.apache.spark.sql.catalyst.JavaTypeInference$.$anonfun$encoderFor$1(JavaTypeInference.scala:140)
>> > > > at scala.collection.ArrayOps$.map$extension(ArrayOps.scala:929)
>> > > > at 
>> > > > org.apache.spark.sql.catalyst.JavaTypeInference$.encoderFor(JavaTypeInference.scala:138)
>> > > > at 
>> > > > org.apache.spark.sql.catalyst.JavaTypeInference$.encoderFor(JavaTypeInference.scala:60)
>> > > > at 
>> > > > org.apache.spark.sql.catalyst.JavaTypeInference$.encoderFor(JavaTypeInference.scala:53)
>> > > > at 
>> > > > org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.javaBean(ExpressionEncoder.scala:62)
>> > > > at org.apache.spark.sql.Encoders$.bean(Encoders.scala:179)
>> > > > at org.apache.spark.sql.Encoders.bean(Encoders.scala)
>> > > >
>> > > >
>> > > > https://issues.apache.org/jira/browse/SPARK-45311
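
[Editor's note: independent of Spark, the cast in the trace above can be reproduced in a few lines. An array's runtime component type is what matters: mapping over an array with an `Object` element type allocates an `Object[]`, and the JVM refuses the downcast to `TypeVariable[]` even when every element is a `TypeVariable`. A minimal Scala illustration of that JVM behavior, not the actual Spark fix:]

```scala
import java.lang.reflect.TypeVariable

def castFails: Boolean = {
  // map with an Object element type allocates an Object[] at runtime,
  // even though every element is in fact a TypeVariable.
  val boxed: Array[Object] =
    classOf[java.util.List[_]].getTypeParameters.map(identity[Object])
  try {
    val vars = boxed.asInstanceOf[Array[TypeVariable[_]]] // throws here
    vars.isEmpty // unreachable
  } catch {
    case _: ClassCastException => true
  }
}

println(if (castFails) "cast failed as expected" else "cast succeeded")
```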
>> > > >
>> > > > Thanks!
>> > > >
>> > > > Marc Le Bihan
>> > > >
>> > > >
>> > > > On 25/11/2023 11:48, Dongjoon Hyun wrote:
>> > > >
>> > > > Please vote on releasing the following candidate as Apache Spark 
>> > > > version
>> > > > 3.4.2.
>> > > >
>> > > > The vote is open until November 30th 1AM (PST) and passes if a 
>> > > > majority +1
>> > > > PMC votes are cast, with a minimum of 3 +1 votes.
>> > > >
>> > > > [ ] +1 Release this package as Apache Spark 3.4.2
>> > > > [ ] -1 Do not release this package because ...
>> > > >
>> > > > To learn more about Apache Spark, please see https://spark.apache.org/
>> > > >
>> > > > The tag to be voted on is v3.4.2-rc1 (commit
>> > > > 0c0e7d4087c64efca259b4fb656b8be643be5686)
>> > > > https://github.com/apache/spark/tree/v3.4.2-rc1
>> > > >
>> > > > The release files, including signatures, digests, etc. can be found at:
>> > > > https://dist.apache.org/repos/dist/dev/spark/v3.4.2-rc1-bin/
>> > > >
>> > > > Signatures used for Spark RCs can be found in this file:
>> > > > https://dist.apache.org/repos/dist/dev/spark/KEYS
>> > > >
>> > > > The staging repository for this release can be found at:
>> > > > https://repository.apache.org/content/repositories/orgapachespark-1450/
>> > > >
>> > > > The documentation corresponding to this release can be found at:
>> > > > https://dist.apache.org/repos/dist/dev/spark/v3.4.2-rc1-docs/
>> > > >
>> > > > The list of bug fixes going into 3.4.2 can be found at the following 
>> > > > URL:
>> > > > https://issues.apache.org/jira/projects/SPARK/versions/12353368
>> > > >
>> > > > This release is using the release script of the tag v3.4.2-rc1.
>> > > >
>> > > > FAQ
>> > > >
>> > > > =
>> > > > How can I help test this release?
>> > > > =
>> > > >
>> > > > If you are a Spark user, you can help us test this release by taking
>> > > > an existing Spark workload and running on this release candidate, then
>> > > > reporting any regressions.
>> > > >
>> > > > If you're working in PySpark you can set up a virtual env and install
>> > > > the current RC and see if anything important breaks; in Java/Scala,
>> > > > you can add the staging repository to your project's resolvers and test
>> > > > with the RC (make sure to clean up the artifact cache before/after so
>> > > > you don't end up building with an out-of-date RC going forward).
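
[Editor's note: for the Java/Scala path above, a resolver entry for the staging repository might look like this — an sbt sketch using the staging URL from this vote; the artifact coordinates are the usual Spark ones and are assumed here:]

```scala
// build.sbt — resolve the RC from the staging repository listed in the vote.
resolvers += "Apache Spark 3.4.2 RC1 staging" at
  "https://repository.apache.org/content/repositories/orgapachespark-1450/"

libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.4.2"
```

Remember to clear the local artifact cache afterwards so later builds don't pick up the RC.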
>> > > >
>> > > > ===
>> > > > What should happen to JIRA tickets still targeting 3.4.2?
>> > > > ===
>> > > >
>> > > > The current list of open tickets targeted at 3.4.2 can be found at:
>> > > > https://issues.apache.org/jira/projects/SPARK and search for "Target
>> > > > Version/s" = 3.4.2

Re: [DISCUSS] SPIP: Structured Streaming - Arbitrary State API v2

2023-11-29 Thread Anish Shrigondekar
Hi dev,

Addressed the comments that Jungtaek had on the doc. Bumping the thread
once again to see if other folks have any feedback on the proposal.

Thanks,
Anish

On Mon, Nov 27, 2023 at 8:15 PM Jungtaek Lim 
wrote:

> Kindly bump for better reach after the long holiday. Please kindly review
> the proposal which opens the chance to address complex use cases of
> streaming. Thanks!
>
> On Thu, Nov 23, 2023 at 8:19 AM Jungtaek Lim 
> wrote:
>
>> Thanks Anish for proposing SPIP and initiating this thread! I believe
>> this SPIP will help a bunch of complex use cases on streaming.
>>
>> dev@: We are coincidentally initiating this discussion in thanksgiving
>> holidays. We understand people in the US may not have time to review the
>> SPIP, and we plan to bump this thread early next week. We are open to
>> any feedback from outside the US during the holiday. We can either address
>> feedback altogether after the holiday (Anish is in the US), or I can answer
>> if the feedback is more of a question. Thanks!
>>
>> On Thu, Nov 23, 2023 at 5:27 AM Anish Shrigondekar <
>> anish.shrigonde...@databricks.com> wrote:
>>
>>> Hi dev,
>>>
>>> I would like to start a discussion on "Structured Streaming - Arbitrary
>>> State API v2". This proposal aims to address a bunch of limitations we see
>>> today using mapGroupsWithState/flatMapGroupsWithState operator. The
>>> detailed set of limitations is described in the SPIP doc.
>>>
>>> We propose to support various features such as multiple state variables
>>> (flexible data modeling), composite types, enhanced timer functionality,
>>> support for chaining operators after the new operator, handling initial state
>>> along with state data source, schema evolution, etc. This will allow users to
>>> write more powerful streaming state management logic primarily used in
>>> operational use-cases. Other built-in stateful operators could also benefit
>>> from such changes in the future.
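
[Editor's note: for readers new to the operator the SPIP aims to improve — today each key gets a single state variable of a single type. A hedged Scala sketch of the current API; `Event`, `RunningTotal`, and the field names are hypothetical, for illustration only:]

```scala
import org.apache.spark.sql.streaming.GroupState

case class Event(user: String, count: Long)
case class RunningTotal(total: Long)

def addToTotal(user: String, events: Iterator[Event],
               state: GroupState[RunningTotal]): Iterator[(String, Long)] = {
  val prev = state.getOption.getOrElse(RunningTotal(0L))
  val next = RunningTotal(prev.total + events.map(_.count).sum)
  state.update(next)            // the one state slot available per key
  Iterator((user, next.total))
}

// events.as[Event]
//   .groupByKey(_.user)
//   .flatMapGroupsWithState(OutputMode.Update,
//     GroupStateTimeout.NoTimeout)(addToTotal _)
```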
>>>
>>> JIRA: https://issues.apache.org/jira/browse/SPARK-45939
>>> SPIP:
>>> https://docs.google.com/document/d/1QtC5qd4WQEia9kl1Qv74WE0TiXYy3x6zeTykygwPWig/edit?usp=sharing
>>> Design Doc:
>>> https://docs.google.com/document/d/1QjZmNZ-fHBeeCYKninySDIoOEWfX6EmqXs2lK097u9o/edit?usp=sharing
>>>
>>> cc - @Jungtaek Lim   who has graciously
>>> agreed to be the shepherd for this project
>>>
>>> Looking forward to your feedback!
>>>
>>> Thanks,
>>> Anish
>>>
>>


Re: [DISCUSSION] SPIP: An Official Kubernetes Operator for Apache Spark

2023-11-29 Thread Shiqi Sun
Hi Zhou,

Thanks for the reply. For the language choice, since I don't think I've
used many k8s components written in Java on k8s, I can't really tell, but
at least for the components written in Golang, they are well-organized,
easy to read/maintain and run well in general. In addition, goroutines
really ease things a lot when writing concurrent code. Golang also has a
lot less boilerplate, no complicated inheritance, and easier dependency
management and linting tooling. For all these reasons,
I prefer Golang for this k8s operator. I understand the Spark maintainers
are more familiar with JVM languages, but I think we should consider the
performance and maintainability vs the learning curve, to choose an option
that can win in the long run. Plus, I believe most of the Spark maintainers
who touch k8s related parts in the Spark project already have experiences
with Golang, so it shouldn't be a big problem. Our team had some experience
with the fabric8 client a couple of years ago, and we experienced some
reliability issues, mainly requests being dropped (i.e. a
code call is made but the apiserver never receives the request), but that
was a while ago and I'm not sure whether everything is good with the client
now. Anyway, this is my opinion about the language choice, and I will let
other people comment about it as well.

For compatibility, yes please make the CRD compatible from the user's
standpoint, so that it's easy for people to adopt the new operator. The
goal is to consolidate the many spark operators on the market to this new
official operator, so an easy adoption experience is the key.

Also, I feel that the discussion is pretty high level, and it's because the
only info revealed for this new operator is the SPIP doc and I haven't got
a chance to see the code yet. I understand the new operator project might
still not be open-sourced yet, but is there any way for me to take an early
peek into the code of your operator, so that we can discuss more
specifically about the points of language choice and compatibility? Thank
you so much!

Best,
Shiqi

On Tue, Nov 28, 2023 at 10:42 AM Zhou Jiang  wrote:

> Hi Shiqi,
>
> Thanks for the cross-posting here - sorry for the response delay during
> the holiday break :)
> We prefer Java for the operator project as it's JVM-based and widely
> familiar within the Spark community. This choice aims to facilitate better
> adoption and ease of onboarding for future maintainers. In addition, the
> Java API client can also be considered a mature option, widely used by
> Spark itself and by other operator implementations like Flink's.
> For easier onboarding and potential migration, we'll consider
> compatibility with existing CRD designs - the goal is to maintain
> compatibility as best as possible while minimizing duplication efforts.
> I'm enthusiastic about the idea of a lean, version-agnostic submission
> worker. It aligns with one of the primary goals of the operator design.
> Let's continue exploring this idea further in the design doc.
>
> Thanks,
> Zhou
>
>
> On Wed, Nov 22, 2023 at 3:35 PM Shiqi Sun  wrote:
>
>> Hi all,
>>
>> Sorry for being late to the party. I went through the SPIP doc and I
>> think this is a great proposal! I left a comment in the SPIP doc a couple
>> days ago, but I don't see much activity there and no one replied, so I
>> wanted to cross-post it here to get some feedback.
>>
>> I'm Shiqi Sun, and I work for Big Data Platform in Salesforce. My team
>> has been running the Spark on k8s operator
>>  (OSS from
>> Google) in my company to serve Spark users on production for 4+ years, and
>> we've been actively contributing to the Spark on k8s operator OSS and also,
>> occasionally, the Spark OSS. According to our experience, Google's Spark
>> Operator has its own problems, like its close coupling with the spark
>> version, as well as the JVM overhead during job submission. However on the
>> other side, it's been a great component in our team's service in the
>> company, especially being written in golang, it's really easy to have it
>> interact with k8s, and also its CRD covers a lot of different use cases, as
>> it has been built up through time thanks to many users' contribution during
>> these years. There were also a handful of sessions about Google's Spark
>> Operator at Spark Summit that made it widely adopted.
>>
>> For this SPIP, I really love the idea of this proposal for the official
>> k8s operator of Spark project, as well as the separate layer of the
>> submission worker and being spark version agnostic. I think we can get the
>> best of both:
>> 1. I would advocate the new project to still use golang for the
>> implementation, as golang is the go-to cloud native language that works the
>> best with k8s.
>> 2. We make sure the functionality of the current Google's spark operator
>> CRD is preserved in the new official Spark Operator; if we can 

Re: [VOTE] Release Spark 3.4.2 (RC1)

2023-11-29 Thread Mridul Muralidharan
+1

Signatures, digests, etc check out fine.
Checked out tag and build/tested with -Phive -Pyarn -Pmesos -Pkubernetes

Regards,
Mridul

On Wed, Nov 29, 2023 at 5:08 AM Yang Jie  wrote:

> +1(non-binding)
>
> Jie Yang
>
> On 2023/11/29 02:08:04 Kent Yao wrote:
> > +1(non-binding)
> >
> > Kent Yao
> >
> > On 2023/11/27 01:12:53 Dongjoon Hyun wrote:
> > > Hi, Marc.
> > >
> > > Given that it exists in 3.4.0 and 3.4.1, I don't think it's a release
> > > blocker for Apache Spark 3.4.2.
> > >
> > > When the patch is ready, we can consider it for 3.4.3.
> > >
> > > In addition, note that we categorized release-blocker-level issues by
> > > marking 'Blocker' priority with `Target Version` before the vote.
> > >
> > > Best,
> > > Dongjoon.
> > >
> > >
> > > On Sat, Nov 25, 2023 at 12:01 PM Marc Le Bihan 
> wrote:
> > >
> > > > -1 if you can wait until the last remaining problem with generics (?) is
> > > > entirely solved, the one that causes this exception to be thrown:
> > > >
> > > > java.lang.ClassCastException: class [Ljava.lang.Object; cannot be cast
> > > > to class [Ljava.lang.reflect.TypeVariable; ([Ljava.lang.Object; and
> > > > [Ljava.lang.reflect.TypeVariable; are in module java.base of loader
> > > > 'bootstrap')
> > > > at org.apache.spark.sql.catalyst.JavaTypeInference$.encoderFor(JavaTypeInference.scala:116)
> > > > at org.apache.spark.sql.catalyst.JavaTypeInference$.$anonfun$encoderFor$1(JavaTypeInference.scala:140)
> > > > at scala.collection.ArrayOps$.map$extension(ArrayOps.scala:929)
> > > > at org.apache.spark.sql.catalyst.JavaTypeInference$.encoderFor(JavaTypeInference.scala:138)
> > > > at org.apache.spark.sql.catalyst.JavaTypeInference$.encoderFor(JavaTypeInference.scala:60)
> > > > at org.apache.spark.sql.catalyst.JavaTypeInference$.encoderFor(JavaTypeInference.scala:53)
> > > > at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.javaBean(ExpressionEncoder.scala:62)
> > > > at org.apache.spark.sql.Encoders$.bean(Encoders.scala:179)
> > > > at org.apache.spark.sql.Encoders.bean(Encoders.scala)
> > > >
> > > > https://issues.apache.org/jira/browse/SPARK-45311
> > > >
> > > > Thanks!
> > > >
> > > > Marc Le Bihan
> > > >
> > > >
> > > > On 25/11/2023 11:48, Dongjoon Hyun wrote:
> > > >
> > > > Please vote on releasing the following candidate as Apache Spark
> > > > version 3.4.2.
> > > >
> > > > The vote is open until November 30th 1AM (PST) and passes if a
> > > > majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
> > > >
> > > > [ ] +1 Release this package as Apache Spark 3.4.2
> > > > [ ] -1 Do not release this package because ...
> > > >
> > > > To learn more about Apache Spark, please see https://spark.apache.org/
> > > >
> > > > The tag to be voted on is v3.4.2-rc1 (commit
> > > > 0c0e7d4087c64efca259b4fb656b8be643be5686)
> > > > https://github.com/apache/spark/tree/v3.4.2-rc1
> > > >
> > > > The release files, including signatures, digests, etc. can be found at:
> > > > https://dist.apache.org/repos/dist/dev/spark/v3.4.2-rc1-bin/
> > > >
> > > > Signatures used for Spark RCs can be found in this file:
> > > > https://dist.apache.org/repos/dist/dev/spark/KEYS
> > > >
> > > > The staging repository for this release can be found at:
> > > > https://repository.apache.org/content/repositories/orgapachespark-1450/
> > > >
> > > > The documentation corresponding to this release can be found at:
> > > > https://dist.apache.org/repos/dist/dev/spark/v3.4.2-rc1-docs/
> > > >
> > > > The list of bug fixes going into 3.4.2 can be found at the following
> > > > URL: https://issues.apache.org/jira/projects/SPARK/versions/12353368
> > > >
> > > > This release is using the release script of the tag v3.4.2-rc1.
> > > >
> > > > FAQ
> > > >
> > > > =
> > > > How can I help test this release?
> > > > =
> > > >
> > > > If you are a Spark user, you can help us test this release by taking
> > > > an existing Spark workload and running on this release candidate, then
> > > > reporting any regressions.
> > > >
> > > > If you're working in PySpark you can set up a virtual env and install
> > > > the current RC and see if anything important breaks; in Java/Scala,
> > > > you can add the staging repository to your project's resolvers and test
> > > > with the RC (make sure to clean up the artifact cache before/after so
> > > > you don't end up building with an out-of-date RC going forward).
> > > >
> > > > ===
> > > > What should happen to JIRA tickets still targeting 3.4.2?
> > > > ===
> > > >
> > > > The current list of open tickets targeted at 3.4.2 can be found at:
> > > > https://issues.apache.org/jira/projects/SPARK and search for "Target
> > > > Version/s" = 3.4.2
> > > >
> > > > Committers should look at those and triage. Extremely important bug
> > > > fixes, documentation, and API tweaks that impact compatibility should
> > > > be worked on immediately. Everything else please retarget to an
> > > > appropriate release.

[sql] how to connect query stage to Spark job/stages?

2023-11-29 Thread Chenghao Lyu
Hi,

I am seeking advice on measuring the performance of each QueryStage (QS) when 
AQE is enabled in Spark SQL. Specifically, I need help to automatically map a 
QS to its corresponding jobs (or stages) to get the QS runtime metrics.

I recorded the QS structure via a customized injected Query Stage Optimizer 
Rule. However, I am blocked on mapping a QS to its corresponding jobs (or 
stages) to aggregate its runtime metrics. I have tried the SparkListener, but 
neither SparkListenerJobStart nor SparkListenerStageSubmitted carries the 
level of detail needed to match an event back to a QS.
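
[Editor's note: one hook worth knowing here — jobs launched for a SQL query carry the execution id in their start-event properties, which at least groups jobs per query; mapping a job to an individual QueryStage still needs extra bookkeeping (e.g. correlating with the AQE plan recorded in the injected rule). An untested Scala sketch:]

```scala
import org.apache.spark.scheduler.{SparkListener, SparkListenerJobStart}

// Jobs submitted under a SQL execution carry "spark.sql.execution.id"
// in their local properties; log it to group jobs by query.
class ExecutionIdListener extends SparkListener {
  override def onJobStart(jobStart: SparkListenerJobStart): Unit = {
    val execId = Option(jobStart.properties)
      .flatMap(p => Option(p.getProperty("spark.sql.execution.id")))
    println(s"job ${jobStart.jobId} started under SQL execution $execId")
  }
}

// spark.sparkContext.addSparkListener(new ExecutionIdListener)
```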

I am thinking of re-compiling Spark to enable the mapping. However, I am not 
experienced in the Spark source code…

Thanks for your help!

Cheers,
Chenghao


Re: Remove HiveContext from Apache Spark 4.0

2023-11-29 Thread Yang Jie
Thank you very much for the feedback from Dongjoon and Xiao Li. 

After carefully reading 
https://lists.apache.org/thread/mrx0y078cf3ozs7czykvv864y6dr55xq, I have 
decided to abandon the deletion of HiveContext. As Xiao Li said, its 
maintenance cost is not high, while removing it would increase the cost for 
users migrating to Spark 4.0, so I also believe it is not worth deleting in 
this context.

Thanks 
Jie Yang

On 2023/11/29 17:13:16 Xiao Li wrote:
> Thank you for raising it in the dev list. I do not think we should remove
> HiveContext, given the cost of breaking users relative to its maintenance
> cost.
> 
> FYI, when releasing Spark 3.0, we had a lot of discussions about the
> related topics
> https://lists.apache.org/thread/mrx0y078cf3ozs7czykvv864y6dr55xq
> 
> 
> Dongjoon Hyun  于2023年11月29日周三 08:43写道:
> 
> > Thank you for the heads-up.
> >
> > I agree with your intention and the fact that it's not useful in Apache
> > Spark 4.0.0.
> >
> > However, as you know, historically, it was removed once and explicitly
> > added back to the Apache Spark 3.0 via the vote.
> >
> > SPARK-31088 Add back HiveContext and createExternalTable
> > (As a subtask of SPARK-31085 Amend Spark's Semantic Versioning Policy)
> >
> > Like you, I'd love to remove that too, but it's a little hard to remove it
> > from Apache Spark 4.0.0 under our AS-IS versioning policy and history.
> >
> > I believe a new specific vote could make it possible to remove HiveContext
> > (if we need to remove it).
> >
> > So, do you want to delete it from Apache Spark 4.0.0 via the official
> > community vote with this thread context?
> >
> > Thanks,
> > Dongjoon.
> >
> >
> > On Wed, Nov 29, 2023 at 3:03 AM 杨杰  wrote:
> >
> >> Hi all,
> >>
> >> In SPARK-46171 (apache/spark#44077 [1]), I’m trying to remove the
> >> deprecated HiveContext from Apache Spark 4.0 since HiveContext has been
> >> marked as deprecated after Spark 2.0. This is a long-deprecated API, it
> >> should be replaced with SparkSession with enableHiveSupport now, so I think
> >> it's time to remove it.
> >>
> >> Feel free to comment if you have any concerns.
> >>
> >> [1] https://github.com/apache/spark/pull/44077
> >>
> >> Thanks,
> >> Jie Yang
> >>
> >
> 




Re: Remove HiveContext from Apache Spark 4.0

2023-11-29 Thread Xiao Li
Thank you for raising it in the dev list. I do not think we should remove
HiveContext, given the cost of breaking users relative to its maintenance
cost.

FYI, when releasing Spark 3.0, we had a lot of discussions about the
related topics
https://lists.apache.org/thread/mrx0y078cf3ozs7czykvv864y6dr55xq


Dongjoon Hyun  于2023年11月29日周三 08:43写道:

> Thank you for the heads-up.
>
> I agree with your intention and the fact that it's not useful in Apache
> Spark 4.0.0.
>
> However, as you know, historically, it was removed once and explicitly
> added back to the Apache Spark 3.0 via the vote.
>
> SPARK-31088 Add back HiveContext and createExternalTable
> (As a subtask of SPARK-31085 Amend Spark's Semantic Versioning Policy)
>
> Like you, I'd love to remove that too, but it's a little hard to remove it
> from Apache Spark 4.0.0 under our AS-IS versioning policy and history.
>
> I believe a new specific vote could make it possible to remove HiveContext
> (if we need to remove it).
>
> So, do you want to delete it from Apache Spark 4.0.0 via the official
> community vote with this thread context?
>
> Thanks,
> Dongjoon.
>
>
> On Wed, Nov 29, 2023 at 3:03 AM 杨杰  wrote:
>
>> Hi all,
>>
>> In SPARK-46171 (apache/spark#44077 [1]), I’m trying to remove the
>> deprecated HiveContext from Apache Spark 4.0 since HiveContext has been
>> marked as deprecated after Spark 2.0. This is a long-deprecated API, it
>> should be replaced with SparkSession with enableHiveSupport now, so I think
>> it's time to remove it.
>>
>> Feel free to comment if you have any concerns.
>>
>> [1] https://github.com/apache/spark/pull/44077
>>
>> Thanks,
>> Jie Yang
>>
>


Re: Remove HiveContext from Apache Spark 4.0

2023-11-29 Thread Dongjoon Hyun
Thank you for the heads-up.

I agree with your intention and the fact that it's not useful in Apache
Spark 4.0.0.

However, as you know, historically, it was removed once and explicitly
added back to the Apache Spark 3.0 via the vote.

SPARK-31088 Add back HiveContext and createExternalTable
(As a subtask of SPARK-31085 Amend Spark's Semantic Versioning Policy)

Like you, I'd love to remove that too, but it's a little hard to remove it
from Apache Spark 4.0.0 under our AS-IS versioning policy and history.

I believe a new specific vote could make it possible to remove HiveContext
(if we need to remove it).

So, do you want to delete it from Apache Spark 4.0.0 via the official
community vote with this thread context?

Thanks,
Dongjoon.


On Wed, Nov 29, 2023 at 3:03 AM 杨杰  wrote:

> Hi all,
>
> In SPARK-46171 (apache/spark#44077 [1]), I’m trying to remove the
> deprecated HiveContext from Apache Spark 4.0 since HiveContext has been
> marked as deprecated after Spark 2.0. This is a long-deprecated API, it
> should be replaced with SparkSession with enableHiveSupport now, so I think
> it's time to remove it.
>
> Feel free to comment if you have any concerns.
>
> [1] https://github.com/apache/spark/pull/44077
>
> Thanks,
> Jie Yang
>


Re: [VOTE] Release Spark 3.4.2 (RC1)

2023-11-29 Thread Yang Jie
+1(non-binding)

Jie Yang

On 2023/11/29 02:08:04 Kent Yao wrote:
> +1(non-binding)
> 
> Kent Yao
> 
> On 2023/11/27 01:12:53 Dongjoon Hyun wrote:
> > Hi, Marc.
> > 
> > Given that it exists in 3.4.0 and 3.4.1, I don't think it's a release
> > blocker for Apache Spark 3.4.2.
> > 
> > When the patch is ready, we can consider it for 3.4.3.
> > 
> > In addition, note that we categorized release-blocker-level issues by
> > marking 'Blocker' priority with `Target Version` before the vote.
> > 
> > Best,
> > Dongjoon.
> > 
> > 
> > On Sat, Nov 25, 2023 at 12:01 PM Marc Le Bihan  wrote:
> > 
> > > -1 if you can wait until the last remaining problem with generics (?) is
> > > entirely solved, the one that causes this exception to be thrown:
> > >
> > > java.lang.ClassCastException: class [Ljava.lang.Object; cannot be cast to 
> > > class [Ljava.lang.reflect.TypeVariable; ([Ljava.lang.Object; and 
> > > [Ljava.lang.reflect.TypeVariable; are in module java.base of loader 
> > > 'bootstrap')
> > > at 
> > > org.apache.spark.sql.catalyst.JavaTypeInference$.encoderFor(JavaTypeInference.scala:116)
> > > at 
> > > org.apache.spark.sql.catalyst.JavaTypeInference$.$anonfun$encoderFor$1(JavaTypeInference.scala:140)
> > > at scala.collection.ArrayOps$.map$extension(ArrayOps.scala:929)
> > > at 
> > > org.apache.spark.sql.catalyst.JavaTypeInference$.encoderFor(JavaTypeInference.scala:138)
> > > at 
> > > org.apache.spark.sql.catalyst.JavaTypeInference$.encoderFor(JavaTypeInference.scala:60)
> > > at 
> > > org.apache.spark.sql.catalyst.JavaTypeInference$.encoderFor(JavaTypeInference.scala:53)
> > > at 
> > > org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.javaBean(ExpressionEncoder.scala:62)
> > > at org.apache.spark.sql.Encoders$.bean(Encoders.scala:179)
> > > at org.apache.spark.sql.Encoders.bean(Encoders.scala)
> > >
> > >
> > > https://issues.apache.org/jira/browse/SPARK-45311
> > >
> > > Thanks!
> > >
> > > Marc Le Bihan
> > >
> > >
> > > On 25/11/2023 11:48, Dongjoon Hyun wrote:
> > >
> > > Please vote on releasing the following candidate as Apache Spark version
> > > 3.4.2.
> > >
> > > The vote is open until November 30th 1AM (PST) and passes if a majority +1
> > > PMC votes are cast, with a minimum of 3 +1 votes.
> > >
> > > [ ] +1 Release this package as Apache Spark 3.4.2
> > > [ ] -1 Do not release this package because ...
> > >
> > > To learn more about Apache Spark, please see https://spark.apache.org/
> > >
> > > The tag to be voted on is v3.4.2-rc1 (commit
> > > 0c0e7d4087c64efca259b4fb656b8be643be5686)
> > > https://github.com/apache/spark/tree/v3.4.2-rc1
> > >
> > > The release files, including signatures, digests, etc. can be found at:
> > > https://dist.apache.org/repos/dist/dev/spark/v3.4.2-rc1-bin/
> > >
> > > Signatures used for Spark RCs can be found in this file:
> > > https://dist.apache.org/repos/dist/dev/spark/KEYS
> > >
> > > The staging repository for this release can be found at:
> > > https://repository.apache.org/content/repositories/orgapachespark-1450/
> > >
> > > The documentation corresponding to this release can be found at:
> > > https://dist.apache.org/repos/dist/dev/spark/v3.4.2-rc1-docs/
> > >
> > > The list of bug fixes going into 3.4.2 can be found at the following URL:
> > > https://issues.apache.org/jira/projects/SPARK/versions/12353368
> > >
> > > This release is using the release script of the tag v3.4.2-rc1.
> > >
> > > FAQ
> > >
> > > =
> > > How can I help test this release?
> > > =
> > >
> > > If you are a Spark user, you can help us test this release by taking
> > > an existing Spark workload and running on this release candidate, then
> > > reporting any regressions.
> > >
> > > If you're working in PySpark you can set up a virtual env and install
> > > the current RC and see if anything important breaks; in Java/Scala,
> > > you can add the staging repository to your project's resolvers and test
> > > with the RC (make sure to clean up the artifact cache before/after so
> > > you don't end up building with an out-of-date RC going forward).
> > >
> > > ===
> > > What should happen to JIRA tickets still targeting 3.4.2?
> > > ===
> > >
> > > The current list of open tickets targeted at 3.4.2 can be found at:
> > > https://issues.apache.org/jira/projects/SPARK and search for "Target
> > > Version/s" = 3.4.2
> > >
> > > Committers should look at those and triage. Extremely important bug
> > > fixes, documentation, and API tweaks that impact compatibility should
> > > be worked on immediately. Everything else please retarget to an
> > > appropriate release.
> > >
> > > ==
> > > But my bug isn't fixed?
> > > ==
> > >
> > > In order to make timely releases, we will typically not hold the
> > > release unless the bug in question is a regression from the previous
> > > release. That 

Remove HiveContext from Apache Spark 4.0

2023-11-29 Thread 杨杰
Hi all,

In SPARK-46171 (apache/spark#44077 [1]), I’m trying to remove the
deprecated HiveContext from Apache Spark 4.0, as it has been
marked as deprecated since Spark 2.0. This is a long-deprecated API that
should be replaced with SparkSession with enableHiveSupport now, so I think
it's time to remove it.
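
[Editor's note: the replacement path mentioned above is small. A sketch; the application name is illustrative:]

```scala
import org.apache.spark.sql.SparkSession

// enableHiveSupport on the session builder stands in for the
// deprecated `new HiveContext(sc)` pattern.
val spark = SparkSession.builder()
  .appName("hive-context-migration")
  .enableHiveSupport()
  .getOrCreate()

spark.sql("SHOW TABLES").show()
```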

Feel free to comment if you have any concerns.

[1] https://github.com/apache/spark/pull/44077

Thanks,
Jie Yang