Re: [VOTE] SPARK-46122: Set spark.sql.legacy.createHiveTableByDefault to false

2024-04-26 Thread Gengliang Wang
+1

On Fri, Apr 26, 2024 at 10:01 AM Dongjoon Hyun  wrote:

> I'll start with my +1.
>
> Dongjoon.
>
> On 2024/04/26 16:45:51 Dongjoon Hyun wrote:
> > Please vote on SPARK-46122 to set
> spark.sql.legacy.createHiveTableByDefault
> > to `false` by default. The technical scope is defined in the following
> PR.
> >
> > - DISCUSSION:
> > https://lists.apache.org/thread/ylk96fg4lvn6klxhj6t6yh42lyqb8wmd
> > - JIRA: https://issues.apache.org/jira/browse/SPARK-46122
> > - PR: https://github.com/apache/spark/pull/46207
> >
> > The vote is open until April 30th 1AM (PST) and passes
> > if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
> >
> > [ ] +1 Set spark.sql.legacy.createHiveTableByDefault to false by default
> > [ ] -1 Do not change spark.sql.legacy.createHiveTableByDefault because
> ...
> >
> > Thank you in advance.
> >
> > Dongjoon
> >
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>
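For readers skimming the thread, the change under vote can be sketched in Spark SQL. This is a hypothetical session, not text from the vote; the native provider follows `spark.sql.sources.default` (typically Parquet):

```sql
-- With the legacy default (spark.sql.legacy.createHiveTableByDefault=true),
-- CREATE TABLE without a USING clause produces a Hive SerDe table.
SET spark.sql.legacy.createHiveTableByDefault=false;

-- After the change, the same statement creates a native data source table:
CREATE TABLE t(id INT);              -- native table using the default provider
CREATE TABLE h(id INT) USING hive;   -- a Hive table can still be requested explicitly
```

Nothing is removed by the vote: setting the flag back to `true` restores the legacy behavior.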


Re: [VOTE] Release Spark 3.4.3 (RC2)

2024-04-16 Thread Gengliang Wang
+1

On Tue, Apr 16, 2024 at 11:57 AM L. C. Hsieh  wrote:

> +1
>
> On Tue, Apr 16, 2024 at 4:08 AM Wenchen Fan  wrote:
> >
> > +1
> >
> > On Mon, Apr 15, 2024 at 12:31 PM Dongjoon Hyun 
> wrote:
> >>
> >> I'll start with my +1.
> >>
> >> - Checked checksum and signature
> >> - Checked Scala/Java/R/Python/SQL Document's Spark version
> >> - Checked published Maven artifacts
> >> - All CIs passed.
> >>
> >> Thanks,
> >> Dongjoon.
> >>
> >> On 2024/04/15 04:22:26 Dongjoon Hyun wrote:
> >> > Please vote on releasing the following candidate as Apache Spark
> version
> >> > 3.4.3.
> >> >
> >> > The vote is open until April 18th 1AM (PDT) and passes if a majority
> +1 PMC
> >> > votes are cast, with a minimum of 3 +1 votes.
> >> >
> >> > [ ] +1 Release this package as Apache Spark 3.4.3
> >> > [ ] -1 Do not release this package because ...
> >> >
> >> > To learn more about Apache Spark, please see
> https://spark.apache.org/
> >> >
> >> > The tag to be voted on is v3.4.3-rc2 (commit
> >> > 1eb558c3a6fbdd59e5a305bc3ab12ce748f6511f)
> >> > https://github.com/apache/spark/tree/v3.4.3-rc2
> >> >
> >> > The release files, including signatures, digests, etc. can be found
> at:
> >> > https://dist.apache.org/repos/dist/dev/spark/v3.4.3-rc2-bin/
> >> >
> >> > Signatures used for Spark RCs can be found in this file:
> >> > https://dist.apache.org/repos/dist/dev/spark/KEYS
> >> >
> >> > The staging repository for this release can be found at:
> >> >
> https://repository.apache.org/content/repositories/orgapachespark-1453/
> >> >
> >> > The documentation corresponding to this release can be found at:
> >> > https://dist.apache.org/repos/dist/dev/spark/v3.4.3-rc2-docs/
> >> >
> >> > The list of bug fixes going into 3.4.3 can be found at the following
> URL:
> >> > https://issues.apache.org/jira/projects/SPARK/versions/12353987
> >> >
> >> > This release is using the release script of the tag v3.4.3-rc2.
> >> >
> >> > FAQ
> >> >
> >> > =
> >> > How can I help test this release?
> >> > =
> >> >
> >> > If you are a Spark user, you can help us test this release by taking
> >> > an existing Spark workload and running it on this release candidate, then

> >> > reporting any regressions.
> >> >
> >> > If you're working in PySpark you can set up a virtual env and install
> >> > the current RC and see if anything important breaks; in Java/Scala,
> >> > you can add the staging repository to your project's resolvers and test
> >> > with the RC (make sure to clean up the artifact cache before/after so
> >> > you don't end up building with an out of date RC going forward).
> >> >
> >> > ===
> >> > What should happen to JIRA tickets still targeting 3.4.3?
> >> > ===
> >> >
> >> > The current list of open tickets targeted at 3.4.3 can be found at:
> >> > https://issues.apache.org/jira/projects/SPARK and search for "Target
> >> > Version/s" = 3.4.3
> >> >
> >> > Committers should look at those and triage. Extremely important bug
> >> > fixes, documentation, and API tweaks that impact compatibility should
> >> > be worked on immediately. Everything else please retarget to an
> >> > appropriate release.
> >> >
> >> > ==
> >> > But my bug isn't fixed?
> >> > ==
> >> >
> >> > In order to make timely releases, we will typically not hold the
> >> > release unless the bug in question is a regression from the previous
> >> > release. That being said, if there is something which is a regression
> >> > that has not been correctly targeted please ping me or a committer to
> >> > help target the issue.
> >> >
> >>
> >> -
> >> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> >>
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: [VOTE] SPARK-44444: Use ANSI SQL mode by default

2024-04-13 Thread Gengliang Wang
+1

On Sat, Apr 13, 2024 at 3:26 PM Dongjoon Hyun  wrote:

> I'll start with my +1.
>
> Dongjoon.
>
> On 2024/04/13 22:22:05 Dongjoon Hyun wrote:
> > Please vote on SPARK-44444 to use ANSI SQL mode by default.
> > The technical scope is defined in the following PR which is
> > one line of code change and one line of migration guide.
> >
> > - DISCUSSION:
> > https://lists.apache.org/thread/ztlwoz1v1sn81ssks12tb19x37zozxlz
> > - JIRA: https://issues.apache.org/jira/browse/SPARK-44444
> > - PR: https://github.com/apache/spark/pull/46013
> >
> > The vote is open until April 17th 1AM (PST) and passes
> > if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
> >
> > [ ] +1 Use ANSI SQL mode by default
> > [ ] -1 Do not use ANSI SQL mode by default because ...
> >
> > Thank you in advance.
> >
> > Dongjoon
> >
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: [DISCUSS] SPARK-44444: Use ANSI SQL mode by default

2024-04-11 Thread Gengliang Wang
+1, enabling Spark's ANSI SQL mode in version 4.0 will significantly
enhance data quality and integrity. I fully support this initiative.

> In other words, the current Spark ANSI SQL implementation becomes the
first implementation for Spark SQL users to face at first while providing
`spark.sql.ansi.enabled=false` in the same way without losing any
capability.

BTW, the try_*
<https://spark.apache.org/docs/latest/sql-ref-ansi-compliance.html#useful-functions-for-ansi-mode>
functions and SQL Error Attribution Framework
<https://issues.apache.org/jira/browse/SPARK-38615> will also be beneficial
in migrating to ANSI SQL mode.
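As a plain-Python analogue of the `try_*` semantics (illustrative code, not Spark's implementation): under ANSI mode a failing expression raises an error, while its `try_` counterpart returns NULL instead.

```python
def try_divide(a, b):
    """Analogue of Spark SQL's try_divide: where ANSI mode makes 1/0
    raise a DIVIDE_BY_ZERO error, try_divide(1, 0) yields NULL (None)."""
    try:
        return a / b
    except ZeroDivisionError:
        return None

print(try_divide(6, 3))  # 2.0
print(try_divide(1, 0))  # None
```

This is the migration path the `try_*` functions offer: keep ANSI mode's strictness by default, and opt into null-on-error per expression.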


Gengliang


On Thu, Apr 11, 2024 at 7:56 PM Dongjoon Hyun 
wrote:

> Hi, All.
>
> Thanks to you, we've been achieving many things and have on-going SPIPs.
> I believe it's time to scope Apache Spark 4.0.0 (SPARK-44111) more narrowly
> by asking your opinions about Apache Spark's ANSI SQL mode.
>
> https://issues.apache.org/jira/browse/SPARK-44111
> Prepare Apache Spark 4.0.0
>
> SPARK-44444 was proposed last year (on 15/Jul/23) as one of the desirable
> items for 4.0.0 because it's a big behavior change.
>
> https://issues.apache.org/jira/browse/SPARK-44444
> Use ANSI SQL mode by default
>
> Historically, spark.sql.ansi.enabled was added at Apache Spark 3.0.0 and
> has
> been aiming to provide better Spark SQL compatibility in a standard way.
> We also have a daily CI to protect the behavior.
>
> https://github.com/apache/spark/actions/workflows/build_ansi.yml
>
> However, it's still behind the configuration with several known issues,
> e.g.,
>
> SPARK-41794 Reenable ANSI mode in test_connect_column
> SPARK-41547 Reenable ANSI mode in test_connect_functions
> SPARK-46374 Array Indexing is 1-based via ANSI SQL Standard
>
> To be clear, we know that many DBMSes have their own implementations of
> the SQL standard, and they are not all the same. Like them, SPARK-44444 aims to enable
> only the existing Spark's configuration, `spark.sql.ansi.enabled=true`.
> There is nothing more than that.
>
> In other words, the current Spark ANSI SQL implementation becomes the first
> implementation for Spark SQL users to face at first while providing
> `spark.sql.ansi.enabled=false` in the same way without losing any
> capability.
>
> If we don't want this change for some reasons, we can simply exclude
> SPARK-44444 from SPARK-44111 as a part of Apache Spark 4.0.0 preparation.
> It's just time to make a go/no-go decision on this item as part of the
> global optimization for the Apache Spark 4.0.0 release. After 4.0.0, it's
> unlikely we will aim for this again for the next four years, until 2028.
>
> WDYT?
>
> Bests,
> Dongjoon
>
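A hedged sketch of what the flip actually toggles (hypothetical session; the exact error classes vary by Spark version):

```sql
SET spark.sql.ansi.enabled = true;
SELECT CAST('abc' AS INT);  -- raises an error (e.g. CAST_INVALID_INPUT)
SELECT 1 / 0;               -- raises an error (e.g. DIVIDE_BY_ZERO)

SET spark.sql.ansi.enabled = false;
SELECT CAST('abc' AS INT);  -- silently returns NULL
SELECT 1 / 0;               -- silently returns NULL
```

As the proposal stresses, only the default changes; `spark.sql.ansi.enabled=false` remains available without losing any capability.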


Re: [VOTE] SPIP: Pure Python Package in PyPI (Spark Connect)

2024-03-31 Thread Gengliang Wang
+1

On Sun, Mar 31, 2024 at 8:24 PM Dongjoon Hyun 
wrote:

> +1
>
> Thank you, Hyukjin.
>
> Dongjoon
>
> On Sun, Mar 31, 2024 at 19:07 Haejoon Lee
>  wrote:
>
>> +1
>>
>> On Mon, Apr 1, 2024 at 10:15 AM Hyukjin Kwon 
>> wrote:
>>
>>> Hi all,
>>>
>>> I'd like to start the vote for SPIP: Pure Python Package in PyPI (Spark
>>> Connect)
>>>
>>> JIRA 
>>> Prototype 
>>> SPIP doc
>>> 
>>>
>>> Please vote on the SPIP for the next 72 hours:
>>>
>>> [ ] +1: Accept the proposal as an official SPIP
>>> [ ] +0
>>> [ ] -1: I don’t think this is a good idea because …
>>>
>>> Thanks.
>>>
>>


Re: Allowing Unicode Whitespace in Lexer

2024-03-27 Thread Gengliang Wang
+1, this is a reasonable change.

Gengliang

On Wed, Mar 27, 2024 at 9:54 AM serge rielau.com  wrote:

> Going once, going twice, …. last call for objections
> On Mar 23, 2024 at 5:29 PM -0700, serge rielau.com ,
> wrote:
>
> Hello,
>
> I have a PR https://github.com/apache/spark/pull/45620  ready to go that
> will extend the definition of whitespace (what separates token) from the
> small set of ASCII characters space, tab, linefeed to those defined in
> Unicode.
> While this is a small and safe change, it is one where we would have a
> hard time changing our minds about later.
> It is also a change that, AFAIK, cannot be controlled under a config.
>
> What does the community think?
>
> Cheers
> Serge
> SQL Architect at Databricks
>
>
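For intuition about the proposed lexer change, Python's notion of Unicode whitespace shows the kind of characters that would newly count as token separators (`str.isspace()` is only an approximation of whatever definition the PR adopts):

```python
# Characters an ASCII-only lexer treats as separators vs. a Unicode-aware one.
ascii_ws = {" ", "\t", "\n"}

for name, ch in [
    ("space", " "),
    ("no-break space", "\u00a0"),
    ("em space", "\u2003"),
    ("ideographic space", "\u3000"),
]:
    print(f"{name}: ascii={ch in ascii_ws}, unicode={ch.isspace()}")
```

The three non-ASCII spaces above separate tokens only under the extended definition, which is why the change is hard to reverse once queries start relying on it.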


[VOTE][RESULT] SPIP: Structured Logging Framework for Apache Spark

2024-03-13 Thread Gengliang Wang
The vote passes with 24 +1s (13 binding +1s).
Thanks to all who reviewed the SPIP doc and voted!

(* = binding)
+1:
- Haejoon Lee
- Jie Yang
- Hyukjin Kwon (*)
- Wenchen Fan (*)
- Mich Talebzadeh
- Kent Yao
- Denny Lee
- Mridul Muralidharan (*)
- Huaxin Gao (*)
- Dongjoon Hyun (*)
- Xinrong Meng (*)
- Scott
- Jungtaek Lim
- Reynold Xin (*)
- Holden Karau (*)
- Xiao Li (*)
- Chao Sun (*)
- Liang-Chi Hsieh (*)
- rhatlnux
- Robyn Nameth
- John Zhuge
- Ruifeng Zheng (*)
- Tom Graves (*)
- Bo Yang

+0: None

-1: None

Thanks,
Gengliang Wang


Re: [VOTE] SPIP: Structured Logging Framework for Apache Spark

2024-03-13 Thread Gengliang Wang
Thanks all for participating! The vote passed. I'll send out the result in
a separate thread.

On Wed, Mar 13, 2024 at 9:43 AM bo yang  wrote:

> +1
>
> On Wed, Mar 13, 2024 at 7:19 AM Tom Graves 
> wrote:
>
>> Similar as others,  will be interested in working out api's and details
>> but overall in favor of it.
>>
>> +1
>>
>> Tom Graves
>> On Monday, March 11, 2024 at 11:25:38 AM CDT, Mridul Muralidharan <
>> mri...@gmail.com> wrote:
>>
>>
>>
>>   I am supportive of the proposal - this is a step in the right direction
>> !
>> Additional metadata (explicit and inferred) for log records, and exposing
>> them for indexing is extremely useful.
>>
>> The specifics of the API still need some work IMO and does not need to be
>> this disruptive, but I consider that is orthogonal to this vote itself -
>> and something we need to iterate upon during PR reviews.
>>
>> +1
>>
>> Regards,
>> Mridul
>>
>>
>> On Mon, Mar 11, 2024 at 11:09 AM Mich Talebzadeh <
>> mich.talebza...@gmail.com> wrote:
>>
>> +1
>>
>> Mich Talebzadeh,
>> Dad | Technologist | Solutions Architect | Engineer
>> London
>> United Kingdom
>>
>>
>>view my Linkedin profile
>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>
>>
>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>
>>
>>
>> *Disclaimer:* The information provided is correct to the best of my
>> knowledge but of course cannot be guaranteed . It is essential to note
>> that, as with any advice, quote "one test result is worth one-thousand
>> expert opinions (Werner
>> <https://en.wikipedia.org/wiki/Wernher_von_Braun>Von Braun
>> <https://en.wikipedia.org/wiki/Wernher_von_Braun>)".
>>
>>
>> On Mon, 11 Mar 2024 at 09:27, Hyukjin Kwon  wrote:
>>
>> +1
>>
>> On Mon, 11 Mar 2024 at 18:11, yangjie01 
>> wrote:
>>
>> +1
>>
>>
>>
>> Jie Yang
>>
>>
>>
>> *From:* Haejoon Lee 
>> *Date:* Monday, March 11, 2024 17:09
>> *To:* Gengliang Wang 
>> *Cc:* dev 
>> *Subject:* Re: [VOTE] SPIP: Structured Logging Framework for Apache Spark
>>
>>
>>
>> +1
>>
>>
>>
>> On Mon, Mar 11, 2024 at 10:36 AM Gengliang Wang  wrote:
>>
>> Hi all,
>>
>> I'd like to start the vote for SPIP: Structured Logging Framework for
>> Apache Spark
>>
>>
>> References:
>>
>>- JIRA ticket
>>
>> <https://mailshield.baidu.com/check?q=godVZoGJGzagfL5fHFKDXe8FOsAuf3UaY0E7uyGx6HVUGGWsmD%2fgOW2x6J1A1XYt8pai0Y8FBhY%3d>
>>- SPIP doc
>>
>> <https://mailshield.baidu.com/check?q=qnzij19o7FucfHJ%2f4C2cBnMVM2kxjtEi9Gv4zA05b3oPw5UX986BZOwzaJ30UdGRMv%2fix31TYpjtazJC5uyypG0pZVBCfSjQGqlzkUoZozkFtgMXfpmRMSSp1%2bq83gkbLyrm1g%3d%3d>
>>- Discussion thread
>>
>> <https://mailshield.baidu.com/check?q=6PGfLtMnDpsSvIF5SlbpQ4%2bwdg53GCedx5r%2b7AOnYMjYwomNs%2fBioZOabP9Ml3b%2bE8jzqXF0xR3j607DdbjV0JOnlvU%3d>
>>
>> Please vote on the SPIP for the next 72 hours:
>>
>> [ ] +1: Accept the proposal as an official SPIP
>> [ ] +0
>> [ ] -1: I don’t think this is a good idea because …
>>
>> Thanks!
>>
>> Gengliang Wang
>>
>>


Re: [VOTE] SPIP: Structured Logging Framework for Apache Spark

2024-03-11 Thread Gengliang Wang
Hi Steve,

thanks for the suggestion in this email thread and the SPIP doc! I will
read the Audit Log and seek your feedback through PR reviews during the
implementation process.

> So worrying about how pass and manage that at the thread level matters.

We can have a specific logger for org.apache.spark and only show specific
keys from log context (MDC).

> The files get really big fast. I'd recommend considering Avro as an
option from the outset.

Agree, I have mentioned how to address this issue in section Q6. What are
the risks?
<https://docs.google.com/document/d/1rATVGmFLNVLmtxSpWrEceYm7d-ocgu8ofhryVs4g3XU/edit#bookmark=id.8zbyavz648i6>


Thanks,
Gengliang

On Mon, Mar 11, 2024 at 9:30 AM huaxin gao  wrote:

> +1
>
> On Mon, Mar 11, 2024 at 7:02 AM Wenchen Fan  wrote:
>
>> +1
>>
>> On Mon, Mar 11, 2024 at 5:26 PM Hyukjin Kwon 
>> wrote:
>>
>>> +1
>>>
>>> On Mon, 11 Mar 2024 at 18:11, yangjie01 
>>> wrote:
>>>
>>>> +1
>>>>
>>>>
>>>>
>>>> Jie Yang
>>>>
>>>>
>>>>
>>>> *From:* Haejoon Lee 
>>>> *Date:* Monday, March 11, 2024 17:09
>>>> *To:* Gengliang Wang 
>>>> *Cc:* dev 
>>>> *Subject:* Re: [VOTE] SPIP: Structured Logging Framework for Apache Spark
>>>>
>>>>
>>>>
>>>> +1
>>>>
>>>>
>>>>
>>>> On Mon, Mar 11, 2024 at 10:36 AM Gengliang Wang 
>>>> wrote:
>>>>
>>>> Hi all,
>>>>
>>>> I'd like to start the vote for SPIP: Structured Logging Framework for
>>>> Apache Spark
>>>>
>>>>
>>>> References:
>>>>
>>>>- JIRA ticket
>>>>
>>>> <https://mailshield.baidu.com/check?q=godVZoGJGzagfL5fHFKDXe8FOsAuf3UaY0E7uyGx6HVUGGWsmD%2fgOW2x6J1A1XYt8pai0Y8FBhY%3d>
>>>>- SPIP doc
>>>>
>>>> <https://mailshield.baidu.com/check?q=qnzij19o7FucfHJ%2f4C2cBnMVM2kxjtEi9Gv4zA05b3oPw5UX986BZOwzaJ30UdGRMv%2fix31TYpjtazJC5uyypG0pZVBCfSjQGqlzkUoZozkFtgMXfpmRMSSp1%2bq83gkbLyrm1g%3d%3d>
>>>>- Discussion thread
>>>>
>>>> <https://mailshield.baidu.com/check?q=6PGfLtMnDpsSvIF5SlbpQ4%2bwdg53GCedx5r%2b7AOnYMjYwomNs%2fBioZOabP9Ml3b%2bE8jzqXF0xR3j607DdbjV0JOnlvU%3d>
>>>>
>>>> Please vote on the SPIP for the next 72 hours:
>>>>
>>>> [ ] +1: Accept the proposal as an official SPIP
>>>> [ ] +0
>>>> [ ] -1: I don’t think this is a good idea because …
>>>>
>>>> Thanks!
>>>>
>>>> Gengliang Wang
>>>>
>>>>


[VOTE] SPIP: Structured Logging Framework for Apache Spark

2024-03-10 Thread Gengliang Wang
Hi all,

I'd like to start the vote for SPIP: Structured Logging Framework for
Apache Spark

References:

   - JIRA ticket <https://issues.apache.org/jira/browse/SPARK-47240>
   - SPIP doc
   
<https://docs.google.com/document/d/1rATVGmFLNVLmtxSpWrEceYm7d-ocgu8ofhryVs4g3XU/edit?usp=sharing>
   - Discussion thread
   <https://lists.apache.org/thread/gocslhbfv1r84kbcq3xt04nx827ljpxq>

Please vote on the SPIP for the next 72 hours:

[ ] +1: Accept the proposal as an official SPIP
[ ] +0
[ ] -1: I don’t think this is a good idea because …

Thanks!
Gengliang Wang


Re: [DISCUSS] SPIP: Structured Spark Logging

2024-03-10 Thread Gengliang Wang
Thanks everyone for the valuable feedback!

Given the generally positive feedback received, I plan to move forward by
initiating the voting thread. I encourage you to participate in the
upcoming thread.

Warm regards,
Gengliang

On Sat, Mar 9, 2024 at 12:55 PM Mich Talebzadeh 
wrote:

> Splendid. Thanks Gengliang
>
> Mich Talebzadeh,
> Dad | Technologist | Solutions Architect | Engineer
> London
> United Kingdom
>
>
>view my Linkedin profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
>
>  https://en.everybodywiki.com/Mich_Talebzadeh
>
>
>
> *Disclaimer:* The information provided is correct to the best of my
> knowledge but of course cannot be guaranteed . It is essential to note
> that, as with any advice, quote "one test result is worth one-thousand
> expert opinions (Werner  <https://en.wikipedia.org/wiki/Wernher_von_Braun>Von
> Braun <https://en.wikipedia.org/wiki/Wernher_von_Braun>)".
>
>
> On Sat, 9 Mar 2024 at 18:10, Gengliang Wang  wrote:
>
>> Hi Mich,
>>
>> Thanks for your suggestions. I agree that we should avoid confusion with
>> Spark Structured Streaming.
>>
>> So, I'll go with "Structured Logging Framework for Apache Spark". This
>> keeps the standard term "Structured Logging" and distinguishes it from
>> "Structured Streaming" clearly.
>>
>> Thanks for helping shape this!
>>
>> Best,
>> Gengliang
>>
>> On Sat, Mar 2, 2024 at 12:19 PM Mich Talebzadeh <
>> mich.talebza...@gmail.com> wrote:
>>
>>> Hi Gengliang,
>>>
>>> Thanks for taking the initiative to improve the Spark logging system.
>>> Transitioning to structured logs seems like a worthy way to enhance the
>>> ability to analyze and troubleshoot Spark jobs and hopefully  the future
>>> integration with cloud logging systems. While "Structured Spark Logging"
>>> sounds good, I was wondering if we could consider an alternative name.
>>> Since we already use "Spark Structured Streaming", there might be a slight
>>> initial confusion with the terminology. I must confess it was my initial
>>> reaction so to speak.
>>>
>>> Here are a few alternative names I came up with if I may
>>>
>>>- Spark Log Schema Initiative
>>>- Centralized Logging with Structured Data for Spark
>>>- Enhanced Spark Logging with Queryable Format
>>>
>>> These options all highlight the key aspects of your proposal namely;
>>> schema, centralized logging and queryability and might be even clearer for
>>> everyone at first glance.
>>>
>>> Cheers
>>>
>>> Mich Talebzadeh,
>>> Dad | Technologist | Solutions Architect | Engineer
>>> London
>>> United Kingdom
>>>
>>>
>>>view my Linkedin profile
>>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>>
>>>
>>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>>
>>>
>>>
>>> *Disclaimer:* The information provided is correct to the best of my
>>> knowledge but of course cannot be guaranteed . It is essential to note
>>> that, as with any advice, quote "one test result is worth one-thousand
>>> expert opinions (Werner
>>> <https://en.wikipedia.org/wiki/Wernher_von_Braun>Von Braun
>>> <https://en.wikipedia.org/wiki/Wernher_von_Braun>)".
>>>
>>>
>>> On Fri, 1 Mar 2024 at 10:07, Gengliang Wang  wrote:
>>>
>>>> Hi All,
>>>>
>>>> I propose to enhance our logging system by transitioning to structured
>>>> logs. This initiative is designed to tackle the challenges of analyzing
>>>> distributed logs from drivers, workers, and executors by allowing them to
>>>> be queried using a fixed schema. The goal is to improve the informativeness
>>>> and accessibility of logs, making it significantly easier to diagnose
>>>> issues.
>>>>
>>>> Key benefits include:
>>>>
>>>>- Clarity and queryability of distributed log files.
>>>>- Continued support for log4j, allowing users to switch back to
>>>>traditional text logging if preferred.
>>>>
>>>> The improvement will simplify debugging and enhance productivity
>>>> without disrupting existing logging practices. The implementation is
>>>> estimated to take around 3 months.
>>>>
>>>> *SPIP*:
>>>> https://docs.google.com/document/d/1rATVGmFLNVLmtxSpWrEceYm7d-ocgu8ofhryVs4g3XU/edit?usp=sharing
>>>> *JIRA*: SPARK-47240 <https://issues.apache.org/jira/browse/SPARK-47240>
>>>>
>>>> Your comments and feedback would be greatly appreciated.
>>>>
>>>


Re: [DISCUSS] SPIP: Structured Spark Logging

2024-03-09 Thread Gengliang Wang
Hi Mich,

Thanks for your suggestions. I agree that we should avoid confusion with
Spark Structured Streaming.

So, I'll go with "Structured Logging Framework for Apache Spark". This
keeps the standard term "Structured Logging" and distinguishes it from
"Structured Streaming" clearly.

Thanks for helping shape this!

Best,
Gengliang

On Sat, Mar 2, 2024 at 12:19 PM Mich Talebzadeh 
wrote:

> Hi Gengliang,
>
> Thanks for taking the initiative to improve the Spark logging system.
> Transitioning to structured logs seems like a worthy way to enhance the
> ability to analyze and troubleshoot Spark jobs and hopefully  the future
> integration with cloud logging systems. While "Structured Spark Logging"
> sounds good, I was wondering if we could consider an alternative name.
> Since we already use "Spark Structured Streaming", there might be a slight
> initial confusion with the terminology. I must confess it was my initial
> reaction so to speak.
>
> Here are a few alternative names I came up with if I may
>
>- Spark Log Schema Initiative
>- Centralized Logging with Structured Data for Spark
>- Enhanced Spark Logging with Queryable Format
>
> These options all highlight the key aspects of your proposal namely;
> schema, centralized logging and queryability and might be even clearer for
> everyone at first glance.
>
> Cheers
>
> Mich Talebzadeh,
> Dad | Technologist | Solutions Architect | Engineer
> London
> United Kingdom
>
>
>view my Linkedin profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
>
>  https://en.everybodywiki.com/Mich_Talebzadeh
>
>
>
> *Disclaimer:* The information provided is correct to the best of my
> knowledge but of course cannot be guaranteed . It is essential to note
> that, as with any advice, quote "one test result is worth one-thousand
> expert opinions (Werner  <https://en.wikipedia.org/wiki/Wernher_von_Braun>Von
> Braun <https://en.wikipedia.org/wiki/Wernher_von_Braun>)".
>
>
> On Fri, 1 Mar 2024 at 10:07, Gengliang Wang  wrote:
>
>> Hi All,
>>
>> I propose to enhance our logging system by transitioning to structured
>> logs. This initiative is designed to tackle the challenges of analyzing
>> distributed logs from drivers, workers, and executors by allowing them to
>> be queried using a fixed schema. The goal is to improve the informativeness
>> and accessibility of logs, making it significantly easier to diagnose
>> issues.
>>
>> Key benefits include:
>>
>>- Clarity and queryability of distributed log files.
>>- Continued support for log4j, allowing users to switch back to
>>traditional text logging if preferred.
>>
>> The improvement will simplify debugging and enhance productivity without
>> disrupting existing logging practices. The implementation is estimated to
>> take around 3 months.
>>
>> *SPIP*:
>> https://docs.google.com/document/d/1rATVGmFLNVLmtxSpWrEceYm7d-ocgu8ofhryVs4g3XU/edit?usp=sharing
>> *JIRA*: SPARK-47240 <https://issues.apache.org/jira/browse/SPARK-47240>
>>
>> Your comments and feedback would be greatly appreciated.
>>
>


[DISCUSS] SPIP: Structured Spark Logging

2024-02-29 Thread Gengliang Wang
Hi All,

I propose to enhance our logging system by transitioning to structured
logs. This initiative is designed to tackle the challenges of analyzing
distributed logs from drivers, workers, and executors by allowing them to
be queried using a fixed schema. The goal is to improve the informativeness
and accessibility of logs, making it significantly easier to diagnose
issues.

Key benefits include:

   - Clarity and queryability of distributed log files.
   - Continued support for log4j, allowing users to switch back to
   traditional text logging if preferred.

The improvement will simplify debugging and enhance productivity without
disrupting existing logging practices. The implementation is estimated to
take around 3 months.

*SPIP*:
https://docs.google.com/document/d/1rATVGmFLNVLmtxSpWrEceYm7d-ocgu8ofhryVs4g3XU/edit?usp=sharing
*JIRA*: SPARK-47240 

Your comments and feedback would be greatly appreciated.
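To illustrate the idea in the proposal, here is a minimal stdlib sketch (not Spark's actual log4j-based implementation): each log record becomes one JSON object with a fixed schema plus explicit context fields, so distributed logs can be loaded and queried. The `ctx` field name and `executor_id` key are hypothetical.

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Minimal structured-log formatter: one JSON object per record."""
    def format(self, record):
        return json.dumps({
            "ts": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "msg": record.getMessage(),
            # explicit context (e.g. an executor id) rides along as queryable fields
            **getattr(record, "ctx", {}),
        })

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("sketch")
logger.addHandler(handler)
# `extra` attaches the context dict to the record as the `ctx` attribute
logger.warning("task failed", extra={"ctx": {"executor_id": "7"}})
```

Because every record shares one schema, the output can be read back with any JSON-capable query engine, which is the core benefit the SPIP describes.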


Re: Re: [DISCUSS] Release Spark 3.5.1?

2024-02-04 Thread Gengliang Wang
+1

On Sun, Feb 4, 2024 at 1:57 PM Hussein Awala  wrote:

> +1
>
> On Sun, Feb 4, 2024 at 10:13 PM John Zhuge  wrote:
>
>> +1
>>
>> John Zhuge
>>
>>
>> On Sun, Feb 4, 2024 at 11:23 AM Santosh Pingale
>>  wrote:
>>
>>> +1
>>>
>>> On Sun, Feb 4, 2024, 8:18 PM Xiao Li 
>>> wrote:
>>>
 +1

 On Sun, Feb 4, 2024 at 6:07 AM beliefer  wrote:

> +1
>
>
>
> On 2024-02-04 15:26:13, "Dongjoon Hyun" wrote:
>
> +1
>
> On Sat, Feb 3, 2024 at 9:18 PM yangjie01 
> wrote:
>
>> +1
>>
>> On 2024/2/4 13:13, “Kent Yao” mailto:y...@apache.org>>
>> wrote:
>>
>>
>> +1
>>
>>
>> Jungtaek Lim > kabhwan.opensou...@gmail.com>> wrote on Sat, Feb 3, 2024 at 21:14:
>> >
>> > Hi dev,
>> >
>> > looks like there are a huge number of commits being pushed to
>> branch-3.5 after 3.5.0 was released, 200+ commits.
>> >
>> > $ git log --oneline v3.5.0..HEAD | wc -l
>> > 202
>> >
>> > Also, there are 180 JIRA tickets containing 3.5.1 as fixed version,
>> and 10 resolved issues are either marked as blocker (even correctness
>> issues) or critical, which justifies the release.
>> > https://issues.apache.org/jira/projects/SPARK/versions/12353495
>> >
>> > What do you think about releasing 3.5.1 with the current head of
>> branch-3.5? I'm happy to volunteer as the release manager.
>> >
>> > Thanks,
>> > Jungtaek Lim (HeartSaVioR)
>>
>>
>> -
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>
>>
>>
>>
>>
>>
>> -
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>
>>

 --




Re: Algolia search on website is broken

2023-12-10 Thread Gengliang Wang
Hi Nick,

Thank you for reporting the issue with our web crawler.

I've found that the issue was due to a change (specifically, pull request
#40269 <https://github.com/apache/spark/pull/40269>) in the website's HTML
structure, where the JavaScript selector ".container-wrapper" is now
".container". I've updated the crawler accordingly, and it's working
properly now.

Gengliang

On Sun, Dec 10, 2023 at 8:15 AM Nicholas Chammas 
wrote:

> Pinging Gengliang and Xiao about this, per these docs
> <https://github.com/apache/spark-website/blob/0ceaaaf528ec1d0201e1eab1288f37cce607268b/release-process.md#update-the-configuration-of-algolia-crawler>
> .
>
> It looks like to fix this problem you need access to the Algolia Crawler
> Admin Console.
>
>
> On Dec 5, 2023, at 11:28 AM, Nicholas Chammas 
> wrote:
>
> Should I report this instead on Jira? Apologies if the dev list is not the
> right place.
>
> Search on the website appears to be broken. For example, here is a search
> for “analyze”:
>
> [image: Image 12-5-23 at 11.26 AM.jpeg]
>
> And here is the same search using DDG
> <https://duckduckgo.com/?q=site:https://spark.apache.org/docs/latest/+analyze=osx=web>
> .
>
> Nick
>
>
>


Re: [VOTE] SPIP: Testing Framework for Spark UI Javascript files

2023-11-25 Thread Gengliang Wang
+1

On Sat, Nov 25, 2023 at 2:50 AM yangjie01 
wrote:

> +1
>
>
>
> *From:* Reynold Xin 
> *Date:* Saturday, November 25, 2023 14:35
> *To:* Dongjoon Hyun 
> *Cc:* Ye Zhou , Mridul Muralidharan <
> mri...@gmail.com>, Kent Yao , dev 
> *Subject:* Re: [VOTE] SPIP: Testing Framework for Spark UI Javascript files
>
>
>
> +1
>
>
>
>
>
>
> On Fri, Nov 24, 2023 at 10:19 PM, Dongjoon Hyun 
> wrote:
>
> +1
>
>
>
> Thanks,
>
> Dongjoon.
>
>
>
> On Fri, Nov 24, 2023 at 7:14 PM Ye Zhou  wrote:
>
> +1(non-binding)
>
>
>
> On Fri, Nov 24, 2023 at 11:16 Mridul Muralidharan 
> wrote:
>
>
>
> +1
>
>
>
> Regards,
>
> Mridul
>
>
>
> On Fri, Nov 24, 2023 at 8:21 AM Kent Yao  wrote:
>
> Hi Spark Dev,
>
> Following the discussion [1], I'd like to start the vote for the SPIP [2].
>
> The SPIP aims to improve the test coverage and develop experience for
> Spark UI-related javascript codes.
>
> This thread will be open for at least the next 72 hours.  Please vote
> accordingly,
>
> [ ] +1: Accept the proposal as an official SPIP
> [ ] +0
> [ ] -1: I don’t think this is a good idea because …
>
>
> Thank you!
> Kent Yao
>
> [1] https://lists.apache.org/thread/5rqrho4ldgmqlc173y2229pfll5sgkff
> 
> [2]
> https://docs.google.com/document/d/1hWl5Q2CNNOjN5Ubyoa28XmpJtDyD9BtGtiEG2TT94rg/edit?usp=sharing
> 
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>
>


Re: Welcome to Our New Apache Spark Committer and PMCs

2023-10-02 Thread Gengliang Wang
Congratulations to all! Well deserved!

On Mon, Oct 2, 2023 at 10:16 PM Xiao Li  wrote:

> Hi all,
>
> The Spark PMC is delighted to announce that we have voted to add one new
> committer and two new PMC members. These individuals have consistently
> contributed to the project and have clearly demonstrated their expertise.
>
> New Committer:
> - Jiaan Geng (focusing on Spark Connect and Spark SQL)
>
> New PMCs:
> - Yuanjian Li
> - Yikun Jiang
>
> Please join us in extending a warm welcome to them in their new roles!
>
> Sincerely,
> The Spark PMC
>


Re: [VOTE] Release Apache Spark 3.5.0 (RC5)

2023-09-11 Thread Gengliang Wang
+1

On Mon, Sep 11, 2023 at 11:28 AM Xiao Li  wrote:

> +1
>
> Xiao
>
> Yuanjian Li  wrote on Mon, Sep 11, 2023 at 10:53:
>
>> @Peter Toth  I've looked into the details of this
>> issue, and it appears that it's neither a regression in version 3.5.0 nor a
>> correctness issue. It's a bug related to a new feature. I think we can fix
>> this in 3.5.1 and list it as a known issue of the Scala client of Spark
>> Connect in 3.5.0.
>>
>> Mridul Muralidharan  wrote on Sun, Sep 10, 2023 at 04:12:
>>
>>>
>>> +1
>>>
>>> Signatures, digests, etc check out fine.
>>> Checked out tag and build/tested with -Phive -Pyarn -Pmesos -Pkubernetes
>>>
>>> Regards,
>>> Mridul
>>>
>>> On Sat, Sep 9, 2023 at 10:02 AM Yuanjian Li 
>>> wrote:
>>>
 Please vote on releasing the following candidate(RC5) as Apache Spark
 version 3.5.0.

 The vote is open until 11:59pm Pacific time Sep 11th and passes if a
 majority +1 PMC votes are cast, with a minimum of 3 +1 votes.

 [ ] +1 Release this package as Apache Spark 3.5.0

 [ ] -1 Do not release this package because ...

 To learn more about Apache Spark, please see http://spark.apache.org/

 The tag to be voted on is v3.5.0-rc5 (commit
 ce5ddad990373636e94071e7cef2f31021add07b):

 https://github.com/apache/spark/tree/v3.5.0-rc5

 The release files, including signatures, digests, etc. can be found at:

 https://dist.apache.org/repos/dist/dev/spark/v3.5.0-rc5-bin/

 Signatures used for Spark RCs can be found in this file:

 https://dist.apache.org/repos/dist/dev/spark/KEYS

 The staging repository for this release can be found at:

 https://repository.apache.org/content/repositories/orgapachespark-1449

 The documentation corresponding to this release can be found at:

 https://dist.apache.org/repos/dist/dev/spark/v3.5.0-rc5-docs/

 The list of bug fixes going into 3.5.0 can be found at the following
 URL:

 https://issues.apache.org/jira/projects/SPARK/versions/12352848

 This release is using the release script of the tag v3.5.0-rc5.


 FAQ

 =

 How can I help test this release?

 =

 If you are a Spark user, you can help us test this release by taking

 an existing Spark workload and running on this release candidate, then

 reporting any regressions.

 If you're working in PySpark you can set up a virtual env and install

 the current RC and see if anything important breaks, in the Java/Scala

 you can add the staging repository to your projects resolvers and test

 with the RC (make sure to clean up the artifact cache before/after so

 you don't end up building with an out of date RC going forward).
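(Editor's note: for Java/Scala users, adding the staging repository to a project's resolvers can be sketched as the following sbt fragment. The repository number matches the staging URL above; the dependency shown is illustrative:)

```scala
// build.sbt fragment for testing the 3.5.0 RC from the Apache staging repo.
resolvers += "Apache Spark RC staging" at
  "https://repository.apache.org/content/repositories/orgapachespark-1449/"

libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.5.0"
```

As the FAQ above notes, clear the artifact cache (e.g. `~/.ivy2/cache` or the Coursier cache) before and after testing so you don't keep building against an out of date RC.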

 ===

 What should happen to JIRA tickets still targeting 3.5.0?

 ===

 The current list of open tickets targeted at 3.5.0 can be found at:

 https://issues.apache.org/jira/projects/SPARK and search for "Target
 Version/s" = 3.5.0

 Committers should look at those and triage. Extremely important bug

 fixes, documentation, and API tweaks that impact compatibility should

 be worked on immediately. Everything else please retarget to an

 appropriate release.

 ==

 But my bug isn't fixed?

 ==

 In order to make timely releases, we will typically not hold the

 release unless the bug in question is a regression from the previous

 release. That being said, if there is something which is a regression

 that has not been correctly targeted please ping me or a committer to

 help target the issue.

 Thanks,

 Yuanjian Li

>>>


Re: [VOTE] Release Apache Spark 3.5.0 (RC4)

2023-09-06 Thread Gengliang Wang
+1

On Wed, Sep 6, 2023 at 9:46 PM Yuanjian Li  wrote:

> +1 (non-binding)
>
> Xiao Li  wrote on Wed, Sep 6, 2023 at 15:27:
>
>> +1
>>
>> Xiao
>>
>> Herman van Hovell  wrote on Wed, Sep 6, 2023 at 22:08:
>>
>>> Tested connect, and everything looks good.
>>>
>>> +1
>>>
>>> On Wed, Sep 6, 2023 at 8:11 AM Yuanjian Li 
>>> wrote:
>>>
 Please vote on releasing the following candidate(RC4) as Apache Spark
 version 3.5.0.

 The vote is open until 11:59pm Pacific time Sep 8th and passes if a
 majority +1 PMC votes are cast, with a minimum of 3 +1 votes.

 [ ] +1 Release this package as Apache Spark 3.5.0

 [ ] -1 Do not release this package because ...

 To learn more about Apache Spark, please see http://spark.apache.org/

 The tag to be voted on is v3.5.0-rc4 (commit
 c2939589a29dd0d6a2d3d31a8d833877a37ee02a):

 https://github.com/apache/spark/tree/v3.5.0-rc4

 The release files, including signatures, digests, etc. can be found at:

 https://dist.apache.org/repos/dist/dev/spark/v3.5.0-rc4-bin/

 Signatures used for Spark RCs can be found in this file:

 https://dist.apache.org/repos/dist/dev/spark/KEYS

 The staging repository for this release can be found at:

 https://repository.apache.org/content/repositories/orgapachespark-1448

 The documentation corresponding to this release can be found at:

 https://dist.apache.org/repos/dist/dev/spark/v3.5.0-rc4-docs/

 The list of bug fixes going into 3.5.0 can be found at the following
 URL:

 https://issues.apache.org/jira/projects/SPARK/versions/12352848

 This release is using the release script of the tag v3.5.0-rc4.


 FAQ

 =

 How can I help test this release?

 =

 If you are a Spark user, you can help us test this release by taking

 an existing Spark workload and running on this release candidate, then

 reporting any regressions.

 If you're working in PySpark you can set up a virtual env and install

 the current RC and see if anything important breaks, in the Java/Scala

 you can add the staging repository to your projects resolvers and test

 with the RC (make sure to clean up the artifact cache before/after so

 you don't end up building with an out of date RC going forward).

 ===

 What should happen to JIRA tickets still targeting 3.5.0?

 ===

 The current list of open tickets targeted at 3.5.0 can be found at:

 https://issues.apache.org/jira/projects/SPARK and search for "Target
 Version/s" = 3.5.0

 Committers should look at those and triage. Extremely important bug

 fixes, documentation, and API tweaks that impact compatibility should

 be worked on immediately. Everything else please retarget to an

 appropriate release.

 ==

 But my bug isn't fixed?

 ==

 In order to make timely releases, we will typically not hold the

 release unless the bug in question is a regression from the previous

 release. That being said, if there is something which is a regression

 that has not been correctly targeted please ping me or a committer to

 help target the issue.

 Thanks,

 Yuanjian Li

>>>


Re: Welcome two new Apache Spark committers

2023-08-06 Thread Gengliang Wang
Congratulations! Peter and Xiduo!

On Sun, Aug 6, 2023 at 7:37 PM Jungtaek Lim 
wrote:

> Congrats Peter and Xiduo!
>
> On Mon, Aug 7, 2023 at 11:33 AM yangjie01 
> wrote:
>
>> Congratulations, Peter and Xiduo ~
>>
>>
>>
>> *From:* Hyukjin Kwon 
>> *Date:* Monday, August 7, 2023, 10:30
>> *To:* Ruifeng Zheng 
>> *Cc:* Xiao Li , Debasish Das <
>> debasish.da...@gmail.com>, Wenchen Fan , Spark dev
>> list 
>> *Subject:* Re: Welcome two new Apache Spark committers
>>
>>
>>
>> Woohoo!
>>
>>
>>
>> On Mon, 7 Aug 2023 at 11:28, Ruifeng Zheng  wrote:
>>
>> Congratulations! Peter and Xiduo!
>>
>>
>>
>> On Mon, Aug 7, 2023 at 10:13 AM Xiao Li  wrote:
>>
>> Congratulations, Peter and Xiduo!
>>
>>
>>
>>
>>
>>
>>
>> Debasish Das  wrote on Sun, Aug 6, 2023 at 19:08:
>>
>> Congratulations Peter and Xiduo.
>>
>> On Sun, Aug 6, 2023, 7:05 PM Wenchen Fan  wrote:
>>
>> Hi all,
>>
>>
>>
>> The Spark PMC recently voted to add two new committers. Please join me in
>> welcoming them to their new role!
>>
>>
>>
>> - Peter Toth (Spark SQL)
>>
>> - Xiduo You (Spark SQL)
>>
>>
>>
>> They consistently make contributions to the project and clearly showed
>> their expertise. We are very excited to have them join as committers.
>>
>>


Re: [Reminder] Spark 3.5 Branch Cut

2023-07-14 Thread Gengliang Wang
Hi Yuanjian,

Besides the abovementioned changes, it would be great to include the UI
page for Spark Connect: SPARK-44394
<https://issues.apache.org/jira/browse/SPARK-44394>.

Best Regards,
Gengliang

On Fri, Jul 14, 2023 at 11:44 AM Julek Sompolski
 wrote:

> Thank you,
> My changes that you listed are tracked under this Epic:
> https://issues.apache.org/jira/browse/SPARK-43754
> I am also working on https://issues.apache.org/jira/browse/SPARK-44422,
> didn't mention it before because I have hopes that this one will make it
> before the cut.
>
> (Unrelated) My colleague is also working on
> https://issues.apache.org/jira/browse/SPARK-43923 and I am reviewing
> https://github.com/apache/spark/pull/41443, so I hope that that one will
> also make it before the cut.
>
> Best regards,
> Juliusz Sompolski
>
> On Fri, Jul 14, 2023 at 7:34 PM Yuanjian Li 
> wrote:
>
>> Hi everyone,
>> As discussed earlier in "Time for Spark v3.5.0 release", I will cut
>> branch-3.5 on *Monday, July 17th at 1 pm PST* as scheduled.
>>
>> Please plan your PR merge accordingly with the given timeline. Currently,
>> we have received the following exception merge requests:
>>
>>- SPARK-44421: Reattach to existing execute in Spark Connect (server
>>mechanism)
>>- SPARK-44423:  Reattach to existing execute in Spark Connect (scala
>>client)
>>- SPARK-44424:  Reattach to existing execute in Spark Connect (python
>>client)
>>
>> If there are any other exception feature requests, please reply to this
>> email. We will not merge any new features in 3.5 after the branch cut.
>>
>> Best,
>> Yuanjian
>>
>


Introducing English SDK for Apache Spark - Seeking Your Feedback and Contributions

2023-07-03 Thread Gengliang Wang
Dear Apache Spark community,

We are delighted to announce the launch of a groundbreaking tool that aims
to make Apache Spark more user-friendly and accessible - the English SDK
<https://github.com/databrickslabs/pyspark-ai/>. Powered by the application
of Generative AI, the English SDK
<https://github.com/databrickslabs/pyspark-ai/> allows you to execute
complex tasks with simple English instructions. This exciting news was
announced
recently at the Data+AI Summit
<https://www.youtube.com/watch?v=yj7XlTB1Jvc&t=511s> and also introduced
through a detailed blog post
<https://www.databricks.com/blog/introducing-english-new-programming-language-apache-spark>
.

Now, we need your invaluable feedback and contributions. The aim of the
English SDK is not only to simplify and enrich your Apache Spark experience
but also to grow with the community. We're calling upon Spark developers
and users to explore this innovative tool, offer your insights, provide
feedback, and contribute to its evolution.

You can find more details about the SDK and usage examples on the GitHub
repository https://github.com/databrickslabs/pyspark-ai/. If you have any
feedback or suggestions, please feel free to open an issue directly on the
repository. We are actively monitoring the issues and value your insights.

We also welcome pull requests and are eager to see how you might extend or
refine this tool. Let's come together to continue making Apache Spark more
approachable and user-friendly.

Thank you in advance for your attention and involvement. We look forward to
hearing your thoughts and seeing your contributions!

Best,
Gengliang Wang


Re: [VOTE] Release Spark 3.4.1 (RC1)

2023-06-22 Thread Gengliang Wang
+1

On Thu, Jun 22, 2023 at 11:14 AM Driesprong, Fokko 
wrote:

> Thank you for running the release Dongjoon
>
> +1
>
> Tested against Iceberg and it looks good.
>
>
> On Thu, Jun 22, 2023 at 18:03, yangjie01  wrote:
>
>> +1
>>
>>
>>
>> *From:* Dongjoon Hyun 
>> *Date:* Thursday, June 22, 2023, 23:35
>> *To:* Chao Sun 
>> *Cc:* Yuming Wang , Jacek Laskowski <
>> ja...@japila.pl>, dev 
>> *Subject:* Re: [VOTE] Release Spark 3.4.1 (RC1)
>>
>>
>>
>> Thank you everyone for your participation.
>>
>> The vote is open until June 23rd 1AM (PST) and I'll conclude this vote
>> after that.
>>
>> Dongjoon.
>>
>>
>>
>>
>>
>>
>>
>> On Thu, Jun 22, 2023 at 8:29 AM Chao Sun  wrote:
>>
>> +1
>>
>> On Thu, Jun 22, 2023 at 6:52 AM Yuming Wang  wrote:
>> >
>> > +1.
>> >
>> > On Thu, Jun 22, 2023 at 4:41 PM Jacek Laskowski 
>> wrote:
>> >>
>> >> +1
>> >>
>> >> Builds and runs fine on Java 17, macOS.
>> >>
>> >> $ ./dev/change-scala-version.sh 2.13
>> >> $ mvn \
>> >>
>> -Pkubernetes,hadoop-cloud,hive,hive-thriftserver,scala-2.13,volcano,connect
>> \
>> >> -DskipTests \
>> >> clean install
>> >>
>> >> $ python/run-tests --parallelism=1 --testnames 'pyspark.sql.session
>> SparkSession.sql'
>> >> ...
>> >> Tests passed in 28 seconds
>> >>
>> >> Best regards,
>> >> Jacek Laskowski
>> >> 
>> >> "The Internals Of" Online Books
>> >> Follow me on https://twitter.com/jaceklaskowski
>> 
>> >>
>> >>
>> >>
>> >> On Tue, Jun 20, 2023 at 4:41 AM Dongjoon Hyun 
>> wrote:
>> >>>
>> >>> Please vote on releasing the following candidate as Apache Spark
>> version 3.4.1.
>> >>>
>> >>> The vote is open until June 23rd 1AM (PST) and passes if a majority
>> +1 PMC votes are cast, with a minimum of 3 +1 votes.
>> >>>
>> >>> [ ] +1 Release this package as Apache Spark 3.4.1
>> >>> [ ] -1 Do not release this package because ...
>> >>>
>> >>> To learn more about Apache Spark, please see
>> https://spark.apache.org/
>> 
>> >>>
>> >>> The tag to be voted on is v3.4.1-rc1 (commit
>> 6b1ff22dde1ead51cbf370be6e48a802daae58b6)
>> >>> https://github.com/apache/spark/tree/v3.4.1-rc1
>> 
>> >>>
>> >>> The release files, including signatures, digests, etc. can be found
>> at:
>> >>> https://dist.apache.org/repos/dist/dev/spark/v3.4.1-rc1-bin/
>> 
>> >>>
>> >>> Signatures used for Spark RCs can be found in this file:
>> >>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>> 
>> >>>
>> >>> The staging repository for this release can be found at:
>> >>>
>> https://repository.apache.org/content/repositories/orgapachespark-1443/
>> 
>> >>>
>> >>> The documentation corresponding to this release can be found at:
>> >>> https://dist.apache.org/repos/dist/dev/spark/v3.4.1-rc1-docs/
>> 
>> >>>
>> >>> The list of bug fixes going into 3.4.1 can be found at the following
>> URL:
>> >>> https://issues.apache.org/jira/projects/SPARK/versions/12352874
>> 
>> >>>
>> >>> This release is using the release script of the tag v3.4.1-rc1.
>> >>>
>> >>> FAQ
>> >>>
>> >>> =
>> >>> How can I help test this release?
>> >>> =
>> >>>
>> >>> If you are a Spark user, you can help us test this release by taking
>> >>> an existing Spark workload and running on this release candidate, then
>> >>> reporting any regressions.
>> >>>
>> >>> If you're working in PySpark you can set up a virtual env and install
>> >>> the current RC and see if anything important breaks, in the Java/Scala
>> >>> you can add the staging repository to your projects resolvers and test
>> >>> with the RC (make sure to clean up the artifact cache before/after so
>> >>> you don't end up building with an out of date RC going forward).
>> >>>
>> >>> ===
>> >>> What should happen to JIRA tickets still targeting 3.4.1?
>> >>> ===
>> >>>
>> >>> The current list of open tickets targeted at 3.4.1 can be found at:
>> >>> https://issues.apache.org/jira/projects/SPARK
>> 

Re: [ANNOUNCE] Apache Spark 3.4.0 released

2023-04-14 Thread Gengliang Wang
Congratulations everyone!
Thank you Xinrong for driving the release!

On Fri, Apr 14, 2023 at 12:47 PM Xinrong Meng 
wrote:

> Hi All,
>
> We are happy to announce the availability of *Apache Spark 3.4.0*!
>
> Apache Spark 3.4.0 is the fifth release of the 3.x line.
>
> To download Spark 3.4.0, head over to the download page:
> https://spark.apache.org/downloads.html
>
> To view the release notes:
> https://spark.apache.org/releases/spark-release-3-4-0.html
>
> We would like to acknowledge all community members for contributing to this
> release. This release would not have been possible without you.
>
> Thanks,
>
> Xinrong Meng
>


Re: [VOTE] Release Apache Spark 3.4.0 (RC7)

2023-04-10 Thread Gengliang Wang
+1

On Sun, Apr 9, 2023 at 3:17 PM Dongjoon Hyun 
wrote:

> +1
>
> I verified the same steps like previous RCs.
>
> Dongjoon.
>
>
> On Sat, Apr 8, 2023 at 7:47 PM Mridul Muralidharan 
> wrote:
>
>>
>> +1
>>
>> Signatures, digests, etc check out fine.
>> Checked out tag and build/tested with -Phive -Pyarn -Pmesos -Pkubernetes
>>
>> Regards,
>> Mridul
>>
>>
>> On Sat, Apr 8, 2023 at 12:13 PM L. C. Hsieh  wrote:
>>
>>> +1
>>>
>>> Thanks Xinrong.
>>>
>>> On Sat, Apr 8, 2023 at 8:23 AM yangjie01  wrote:
>>> >
>>> > +1
>>> >
>>> >
>>> >
>>> > From: Sean Owen 
>>> > Date: Saturday, April 8, 2023, 20:27
>>> > To: Xinrong Meng 
>>> > Cc: dev 
>>> > Subject: Re: [VOTE] Release Apache Spark 3.4.0 (RC7)
>>> >
>>> >
>>> >
>>> > +1 form me, same result as last time.
>>> >
>>> >
>>> >
>>> > On Fri, Apr 7, 2023 at 6:30 PM Xinrong Meng 
>>> wrote:
>>> >
>>> > Please vote on releasing the following candidate(RC7) as Apache Spark
>>> version 3.4.0.
>>> >
>>> > The vote is open until 11:59pm Pacific time April 12th and passes if a
>>> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>> >
>>> > [ ] +1 Release this package as Apache Spark 3.4.0
>>> > [ ] -1 Do not release this package because ...
>>> >
>>> > To learn more about Apache Spark, please see http://spark.apache.org/
>>> >
>>> > The tag to be voted on is v3.4.0-rc7 (commit
>>> 87a5442f7ed96b11051d8a9333476d080054e5a0):
>>> > https://github.com/apache/spark/tree/v3.4.0-rc7
>>> >
>>> > The release files, including signatures, digests, etc. can be found at:
>>> > https://dist.apache.org/repos/dist/dev/spark/v3.4.0-rc7-bin/
>>> >
>>> > Signatures used for Spark RCs can be found in this file:
>>> > https://dist.apache.org/repos/dist/dev/spark/KEYS
>>> >
>>> > The staging repository for this release can be found at:
>>> > https://repository.apache.org/content/repositories/orgapachespark-1441
>>> >
>>> > The documentation corresponding to this release can be found at:
>>> > https://dist.apache.org/repos/dist/dev/spark/v3.4.0-rc7-docs/
>>> >
>>> > The list of bug fixes going into 3.4.0 can be found at the following
>>> URL:
>>> > https://issues.apache.org/jira/projects/SPARK/versions/12351465
>>> >
>>> > This release is using the release script of the tag v3.4.0-rc7.
>>> >
>>> >
>>> > FAQ
>>> >
>>> > =
>>> > How can I help test this release?
>>> > =
>>> > If you are a Spark user, you can help us test this release by taking
>>> > an existing Spark workload and running on this release candidate, then
>>> > reporting any regressions.
>>> >
>>> > If you're working in PySpark you can set up a virtual env and install
>>> > the current RC and see if anything important breaks, in the Java/Scala
>>> > you can add the staging repository to your projects resolvers and test
>>> > with the RC (make sure to clean up the artifact cache before/after so
>>> > you don't end up building with an out of date RC going forward).
>>> >
>>> > ===
>>> > What should happen to JIRA tickets still targeting 3.4.0?
>>> > ===
>>> > The current list of open tickets targeted at 3.4.0 can be found at:
>>> > https://issues.apache.org/jira/projects/SPARK and search for "Target
>>> Version/s" = 3.4.0
>>> >
>>> > Committers should look at those and triage. Extremely important bug
>>> > fixes, documentation, and API tweaks that impact compatibility should
>>> > be worked on immediately. Everything else please retarget to an
>>> > appropriate release.
>>> >
>>> > ==
>>> > But my bug isn't fixed?
>>> > ==
>>> > In order to make timely releases, we will typically not hold the
>>> > release unless the bug in question is a regression from the previous
>>> > release. That being said, if there is something which is a regression
>>> > that has not been correctly targeted please ping me or a committer to
>>> > help target the issue.
>>> >
>>> > Thanks,
>>> > Xinrong Meng
>>>
>>> -
>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>
>>>


Re: Apache Spark 3.2.4 EOL Release?

2023-04-05 Thread Gengliang Wang
+1

On Wed, Apr 5, 2023 at 11:27 AM kazuyuki tanimura
 wrote:

> +1
>
> On Apr 5, 2023, at 6:53 AM, Tom Graves 
> wrote:
>
> +1
>
> Tom
>
> On Tuesday, April 4, 2023 at 12:25:13 PM CDT, Dongjoon Hyun <
> dongjoon.h...@gmail.com> wrote:
>
>
> Hi, All.
>
> Since Apache Spark 3.2.0 passed RC7 vote on October 12, 2021, branch-3.2
> has been maintained and served well until now.
>
> - https://github.com/apache/spark/releases/tag/v3.2.0 (tagged on Oct 6,
> 2021)
> - https://lists.apache.org/thread/jslhkh9sb5czvdsn7nz4t40xoyvznlc7
>
> As of today, branch-3.2 has 62 additional patches after v3.2.3 and reaches
> the end-of-life this month according to the Apache Spark release cadence. (
> https://spark.apache.org/versioning-policy.html)
>
> $ git log --oneline v3.2.3..HEAD | wc -l
> 62
>
> With the upcoming Apache Spark 3.4, I hope the users can get a chance to
> have these last bits of Apache Spark 3.2.x, and I'd like to propose to have
> Apache Spark 3.2.4 EOL Release next week and volunteer as the release
> manager. WDTY? Please let me know if you need more patches on branch-3.2.
>
> Thanks,
> Dongjoon.
>
>
>


Re: [VOTE] Release Apache Spark 3.4.0 (RC5)

2023-04-05 Thread Gengliang Wang
Hi Anton,

+1 for adding the old constructors back!
Could you raise a PR for this? I will review it ASAP.

Thanks
Gengliang

On Wed, Apr 5, 2023 at 9:37 AM Anton Okolnychyi 
wrote:

> Sorry, I think my last message did not land on the list.
>
> I have a question about changes to exceptions used in the public connector
> API, such as NoSuchTableException and TableAlreadyExistsException.
>
> I consider those as part of the public Catalog API (TableCatalog uses them
> in method definitions). However, it looks like PR #37887 has changed them
> in an incompatible way. Old constructors accepting Identifier objects got
> removed. The only way to construct such exceptions is either by passing
> database and table strings or Scala Seq. Shall we add back old constructors
> to avoid breaking connectors?
>
> [1] - https://github.com/apache/spark/pull/37887/
> [2] - https://issues.apache.org/jira/browse/SPARK-40360
> [3] -
> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/NoSuchItemException.scala
>
> - Anton
>
> On 2023/04/05 16:23:52 Xinrong Meng wrote:
> > Considering the above blockers have been resolved, I am about to
> > cut v3.4.0-rc6 if no objections.
> >
> > On Tue, Apr 4, 2023 at 8:20 AM Xinrong Meng 
> > wrote:
> >
> > > Thank you Wenchen for the report. I marked them as blockers just now.
> > >
> > > On Tue, Apr 4, 2023 at 10:52 AM Wenchen Fan 
> wrote:
> > >
> > >> Sorry for the last-minute change, but we found two wrong behaviors and
> > >> want to fix them before the release:
> > >>
> > >> https://github.com/apache/spark/pull/40641
> > >> We missed a corner case when the input index for `array_insert` is 0.
> It
> > >> should fail as 0 is an invalid index.
> > >>
> > >> https://github.com/apache/spark/pull/40623
> > >> We found some usability issues with a new API and need to change the
> API
> > >> to fix it. If people have concerns we can also remove the new API
> entirely.
> > >>
> > >> Thus I'm -1 to this RC. I'll merge these 2 PRs today if no objections.
> > >>
> > >> Thanks,
> > >> Wenchen
> > >>
> > >> On Tue, Apr 4, 2023 at 3:47 AM L. C. Hsieh  wrote:
> > >>
> > >>> +1
> > >>>
> > >>> Thanks Xinrong.
> > >>>
> > >>> On Mon, Apr 3, 2023 at 12:35 PM Dongjoon Hyun <
> dongjoon.h...@gmail.com>
> > >>> wrote:
> > >>> >
> > >>> > +1
> > >>> >
> > >>> > I also verified that RC5 has SBOM artifacts.
> > >>> >
> > >>> >
> > >>>
> https://repository.apache.org/content/repositories/orgapachespark-1439/org/apache/spark/spark-core_2.12/3.4.0/spark-core_2.12-3.4.0-cyclonedx.json
> > >>> >
> > >>>
> https://repository.apache.org/content/repositories/orgapachespark-1439/org/apache/spark/spark-core_2.13/3.4.0/spark-core_2.13-3.4.0-cyclonedx.json
> > >>> >
> > >>> > Thanks,
> > >>> > Dongjoon.
> > >>> >
> > >>> >
> > >>> >
> > >>> > On Mon, Apr 3, 2023 at 1:57 AM yangjie01 
> wrote:
> > >>> >>
> > >>> >> +1, checked Java 17 + Scala 2.13 + Python 3.10.10.
> > >>> >>
> > >>> >>
> > >>> >>
> > >>> >> 发件人: Herman van Hovell 
> > >>> >> 日期: 2023年3月31日 星期五 12:12
> > >>> >> 收件人: Sean Owen 
> > >>> >> 抄送: Xinrong Meng , dev <
> > >>> dev@spark.apache.org>
> > >>> >> 主题: Re: [VOTE] Release Apache Spark 3.4.0 (RC5)
> > >>> >>
> > >>> >>
> > >>> >>
> > >>> >> +1
> > >>> >>
> > >>> >>
> > >>> >>
> > >>> >> On Thu, Mar 30, 2023 at 11:05 PM Sean Owen 
> wrote:
> > >>> >>
> > >>> >> +1 same result from me as last time.
> > >>> >>
> > >>> >>
> > >>> >>
> > >>> >> On Thu, Mar 30, 2023 at 3:21 AM Xinrong Meng <
> > >>> xinrong.apa...@gmail.com> wrote:
> > >>> >>
> > >>> >> Please vote on releasing the following candidate(RC5) as Apache
> Spark

Re: [VOTE] Release Apache Spark 3.4.0 (RC1)

2023-02-23 Thread Gengliang Wang
Thanks for creating the RC1, Xinrong!

Besides the blockers mentioned by Tom, let's include the following bug fix
in Spark 3.4.0 as well:
[SPARK-42406][SQL] Fix check for missing required fields of to_protobuf
<https://github.com/apache/spark/commit/fb5647732fa2f49838f803f67ea11b20fc14282b>

Gengliang

On Wed, Feb 22, 2023 at 3:09 PM Tom Graves 
wrote:

> It looks like there are still blockers open, we need to make sure they are
> addressed before doing a release:
>
> https://issues.apache.org/jira/browse/SPARK-41793
> https://issues.apache.org/jira/browse/SPARK-42444
>
> Tom
> On Tuesday, February 21, 2023 at 10:35:45 PM CST, Xinrong Meng <
> xinrong.apa...@gmail.com> wrote:
>
>
> Please vote on releasing the following candidate as Apache Spark version
> 3.4.0.
>
> The vote is open until 11:59pm Pacific time *February 27th* and passes if
> a majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 3.4.0
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is *v3.4.0-rc1* (commit
> e2484f626bb338274665a49078b528365ea18c3b):
> https://github.com/apache/spark/tree/v3.4.0-rc1
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.4.0-rc1-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1435
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.4.0-rc1-docs/
>
> The list of bug fixes going into 3.4.0 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12351465
>
> This release is using the release script of the tag v3.4.0-rc1.
>
>
> FAQ
>
> =
> How can I help test this release?
> =
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install
> the current RC and see if anything important breaks, in the Java/Scala
> you can add the staging repository to your projects resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with an out of date RC going forward).
>
> ===
> What should happen to JIRA tickets still targeting 3.4.0?
> ===
> The current list of open tickets targeted at 3.4.0 can be found at:
> https://issues.apache.org/jira/projects/SPARK and search for "Target
> Version/s" = 3.4.0
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
>
> ==
> But my bug isn't fixed?
> ==
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.
>
> Thanks,
> Xinrong Meng
>


Re: Time for Spark 3.4.0 release?

2023-01-04 Thread Gengliang Wang
+1, thanks for driving the release!

Gengliang

On Tue, Jan 3, 2023 at 10:55 PM Dongjoon Hyun 
wrote:

> +1
>
> Thank you!
>
> Dongjoon
>
> On Tue, Jan 3, 2023 at 9:44 PM Rui Wang  wrote:
>
>> +1 to cut the branch starting from a workday!
>>
>> Great to see this is happening!
>>
>> Thanks Xinrong!
>>
>> -Rui
>>
>> On Tue, Jan 3, 2023 at 9:21 PM 416161...@qq.com 
>> wrote:
>>
>>> +1, thank you Xinrong for driving this release!
>>>
>>> --
>>> Ruifeng Zheng
>>> ruife...@foxmail.com
>>>
>>>
>>>
>>>
>>> -- Original --
>>> *From:* "Hyukjin Kwon" ;
>>> *Date:* Wed, Jan 4, 2023 01:15 PM
>>> *To:* "Xinrong Meng";
>>> *Cc:* "dev";
>>> *Subject:* Re: Time for Spark 3.4.0 release?
>>>
>>> SGTM +1
>>>
>>> On Wed, Jan 4, 2023 at 2:13 PM Xinrong Meng 
>>> wrote:
>>>
>>>> Hi All,
>>>>
>>>> Shall we cut *branch-3.4* on *January 16th, 2023*? We proposed January
>>>> 15th per
>>>> https://spark.apache.org/versioning-policy.html, but I would suggest
>>>> we postpone one day since January 15th is a Sunday.
>>>>
>>>> I would like to volunteer as the release manager for *Apache Spark
>>>> 3.4.0*.
>>>>
>>>> Thanks,
>>>>
>>>> Xinrong Meng
>>>>
>>>>


[VOTE][RESULT] SPIP: Better Spark UI scalability and Driver stability for large applications

2022-11-19 Thread Gengliang Wang
The vote passes with 11 +1s(3 binding +1s)
+1:
Kent Yao
Mridul Muralidharan*
Jie Yang
Yuming Wang
Maciej Szymkiewicz*
Chris Nauroth
Jungtaek Lim
Ye Zhou
Wenchen Fan*
Ruifeng Zheng
Peter Toth

0: None

-1: None

(* = binding)

Thank you all for chiming in and for your votes!

Cheers,
Gengliang


[VOTE][SPIP] Better Spark UI scalability and Driver stability for large applications

2022-11-16 Thread Gengliang Wang
Hi all,

I’d like to start a vote for SPIP: "Better Spark UI scalability and Driver
stability for large applications"

The goal of the SPIP is to improve the Driver's stability by supporting
storing Spark's UI data on RocksDB. Furthermore, to speed up the read and
write operations on RocksDB, it introduces a new Protobuf serializer.

Please also refer to the following:

   - Previous discussion in the dev mailing list: [DISCUSS] SPIP: Better
   Spark UI scalability and Driver stability for large applications
   <https://lists.apache.org/thread/f30owdts644hk2oojyf9jr308mrx2m9l>
   - Design Doc: Better Spark UI scalability and Driver stability for large
   applications
   <https://docs.google.com/document/d/1cuKnFwlTodyVhUQPMuakq2YDaLH05jaY9FRu_aD1zMo/edit?usp=sharing>
   - JIRA: SPARK-41053 <https://issues.apache.org/jira/browse/SPARK-41053>


Please vote on the SPIP for the next 72 hours:

[ ] +1: Accept the proposal as an official SPIP
[ ] +0
[ ] -1: I don’t think this is a good idea because …

Kind Regards,
Gengliang


Re: [DISCUSS] SPIP: Better Spark UI scalability and Driver stability for large applications

2022-11-16 Thread Gengliang Wang
With the positive feedback from Mridul and Wenchen, I will officially start
the vote.

On Tue, Nov 15, 2022 at 8:57 PM Wenchen Fan  wrote:

> This looks great! UI stability/scalability has been a pain point for a
> long time.
>
> On Sat, Nov 12, 2022 at 5:24 AM Gengliang Wang  wrote:
>
>> Hi Everyone,
>>
>> I want to discuss the "Better Spark UI scalability and Driver stability
>> for large applications" proposal. Please find the links below:
>>
>> *JIRA* - https://issues.apache.org/jira/browse/SPARK-41053
>> *SPIP Document* -
>> https://docs.google.com/document/d/1cuKnFwlTodyVhUQPMuakq2YDaLH05jaY9FRu_aD1zMo/edit?usp=sharing
>>
>> *Excerpt from the document: *
>>
>> After SPARK-18085 <https://issues.apache.org/jira/browse/SPARK-18085>,
>> the Spark history server(SHS) becomes more scalable for processing large
>> applications by supporting a persistent KV-store(LevelDB/RocksDB) as the
>> storage layer.
>>
>> As for the live Spark UI, all the data is still stored in memory, which
>> can bring memory pressures to the Spark driver for large applications.
>>
>> For better Spark UI scalability and Driver stability, I propose to
>>
>>- Support storing all the UI data in a persistent KV store.
>>RocksDB/LevelDB provides low memory overhead. Their write/read performance
>>is fast enough to serve the workloads of live UI. Spark UI can retain more
>>data with the new backend, while SHS can leverage it to speed up its
>>startup.
>>- Support a new Protobuf serializer for all the UI data. The new
>>serializer is supposed to be faster, according to benchmarks. It will be
>>the default serializer for the persistent KV store of live UI.
>>
>>
>>
>>
>> I appreciate any suggestions you can provide,
>> Gengliang
>>
>
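(Editor's note: as a concrete illustration of the SPIP above, the disk-backed store would be enabled through configuration. A hedged sketch of what such a setup could look like — the property names follow the SPARK-41053 work and are assumptions here, not part of the vote text:)

```properties
# spark-defaults.conf — hypothetical sketch. Verify these property names
# against your Spark version's configuration docs before relying on them.

# Live UI: keep UI data in a disk-backed store instead of driver memory
spark.ui.store.path            /var/spark/ui-store

# History server: location of the persistent KV store
spark.history.store.path       /var/spark/history-store
```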


Re: Apache Spark 3.2.3 Release?

2022-10-18 Thread Gengliang Wang
+1. Thanks Chao!

On Tue, Oct 18, 2022 at 11:45 AM huaxin gao  wrote:

> +1 Thanks Chao!
>
> Huaxin
>
> On Tue, Oct 18, 2022 at 11:29 AM Dongjoon Hyun 
> wrote:
>
>> +1
>>
>> Thank you for volunteering, Chao!
>>
>> Dongjoon.
>>
>>
>> On Tue, Oct 18, 2022 at 9:55 AM Sean Owen  wrote:
>>
>>> OK by me, if someone is willing to drive it.
>>>
>>> On Tue, Oct 18, 2022 at 11:47 AM Chao Sun  wrote:
>>>
 Hi All,

 It's been more than 3 months since 3.2.2 (tagged at Jul 11) was
 released There are now 66 patches accumulated in branch-3.2, including
 2 correctness issues.

 Is it a good time to start a new release? If there's no objection, I'd
 like to volunteer as the release manager for the 3.2.3 release, and
 start preparing the first RC next week.

 # Correctness issues

 SPARK-39833: Filtered parquet data frame count() and show() produce
 inconsistent results when spark.sql.parquet.filterPushdown is true
 SPARK-40002: Limit improperly pushed down through window using ntile
 function

 Best,
 Chao

 -
 To unsubscribe e-mail: dev-unsubscr...@spark.apache.org




Re: [VOTE] Release Spark 3.3.1 (RC4)

2022-10-18 Thread Gengliang Wang
+1 from me, same as last time.

On Tue, Oct 18, 2022 at 11:45 AM L. C. Hsieh  wrote:

> +1
>
> Thanks Yuming!
>
> On Tue, Oct 18, 2022 at 11:28 AM Dongjoon Hyun 
> wrote:
> >
> > +1
> >
> > Thank you, Yuming and all!
> >
> > Dongjoon.
> >
> >
> > On Tue, Oct 18, 2022 at 9:22 AM Yang,Jie(INF) 
> wrote:
> >>
> >> Used Maven to test Java 17 + Scala 2.13 and the tests passed; +1 from me
> >>
> >>
> >>
> >> From: Sean Owen 
> >> Date: Monday, October 17, 2022, 21:34
> >> To: Yuming Wang 
> >> Cc: dev 
> >> Subject: Re: [VOTE] Release Spark 3.3.1 (RC4)
> >>
> >>
> >>
> >> +1 from me, same as last time
> >>
> >>
> >>
> >> On Sun, Oct 16, 2022 at 9:14 PM Yuming Wang  wrote:
> >>
> >> Please vote on releasing the following candidate as Apache Spark
> version 3.3.1.
> >>
> >> The vote is open until 11:59pm Pacific time October 21st and passes if
> a majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
> >>
> >> [ ] +1 Release this package as Apache Spark 3.3.1
> >> [ ] -1 Do not release this package because ...
> >>
> >> To learn more about Apache Spark, please see https://spark.apache.org
> >>
> >> The tag to be voted on is v3.3.1-rc4 (commit
> fbbcf9434ac070dd4ced4fb9efe32899c6db12a9):
> >> https://github.com/apache/spark/tree/v3.3.1-rc4
> >>
> >> The release files, including signatures, digests, etc. can be found at:
> >> https://dist.apache.org/repos/dist/dev/spark/v3.3.1-rc4-bin
> >>
> >> Signatures used for Spark RCs can be found in this file:
> >> https://dist.apache.org/repos/dist/dev/spark/KEYS
> >>
> >> The staging repository for this release can be found at:
> >> https://repository.apache.org/content/repositories/orgapachespark-1430
> >>
> >> The documentation corresponding to this release can be found at:
> >> https://dist.apache.org/repos/dist/dev/spark/v3.3.1-rc4-docs
> >>
> >> The list of bug fixes going into 3.3.1 can be found at the following
> URL:
> >> https://s.apache.org/ttgz6
> >>
> >> This release is using the release script of the tag v3.3.1-rc4.
> >>
> >>
> >> FAQ
> >>
> >> ==
> >> What happened to v3.3.1-rc3?
> >> ==
> >> A performance regression(SPARK-40703) was found after tagging
> v3.3.1-rc3, which the Iceberg community hopes Spark 3.3.1 could fix.
> >> So we skipped the vote on v3.3.1-rc3.
> >>
> >> =
> >> How can I help test this release?
> >> =
> >> If you are a Spark user, you can help us test this release by taking
> >> an existing Spark workload and running on this release candidate, then
> >> reporting any regressions.
> >>
> >> If you're working in PySpark you can set up a virtual env and install
> >> the current RC and see if anything important breaks, in the Java/Scala
> >> you can add the staging repository to your project's resolvers and test
> >> with the RC (make sure to clean up the artifact cache before/after so
> >> you don't end up building with an out-of-date RC going forward).
> >>
> >> ===
> >> What should happen to JIRA tickets still targeting 3.3.1?
> >> ===
> >> The current list of open tickets targeted at 3.3.1 can be found at:
> >> https://issues.apache.org/jira/projects/SPARK and search for "Target
> Version/s" = 3.3.1
> >>
> >> Committers should look at those and triage. Extremely important bug
> >> fixes, documentation, and API tweaks that impact compatibility should
> >> be worked on immediately. Everything else please retarget to an
> >> appropriate release.
> >>
> >> ==
> >> But my bug isn't fixed?
> >> ==
> >> In order to make timely releases, we will typically not hold the
> >> release unless the bug in question is a regression from the previous
> >> release. That being said, if there is something which is a regression
> >> that has not been correctly targeted please ping me or a committer to
> >> help target the issue.
> >>
> >>
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: Welcome Yikun Jiang as a Spark committer

2022-10-09 Thread Gengliang Wang
Congratulations, Yikun!

On Sun, Oct 9, 2022 at 12:33 AM 416161...@qq.com 
wrote:

> Congrats, Yikun!
>
> --
> Ruifeng Zheng
> ruife...@foxmail.com
>
> 
>
>
>
> -- Original --
> *From:* "Martin Grigorov" ;
> *Date:* Sun, Oct 9, 2022 05:01 AM
> *To:* "Hyukjin Kwon";
> *Cc:* "dev";"Yikun Jiang";
> *Subject:* Re: Welcome Yikun Jiang as a Spark committer
>
> Congratulations, Yikun!
>
> On Sat, Oct 8, 2022 at 7:41 AM Hyukjin Kwon  wrote:
>
>> Hi all,
>>
>> The Spark PMC recently added Yikun Jiang as a committer on the project.
>> Yikun is the major contributor of the infrastructure and GitHub Actions
>> in Apache Spark, as well as Kubernetes and PySpark.
>> He has put a lot of effort into stabilizing and optimizing the builds
>> so we all can work together in Apache Spark more
>> efficiently and effectively. He's also driving the SPIP for Docker
>> official image in Apache Spark for users and developers as well.
>> Please join me in welcoming Yikun!
>>
>>


Re: [VOTE] Release Spark 3.3.1 (RC2)

2022-10-03 Thread Gengliang Wang
+1. I ran some simple tests and also verified that SPARK-40389 is fixed.

Gengliang

On Mon, Oct 3, 2022 at 8:56 AM Thomas Graves  wrote:

> +1. Ran our internal tests and everything looks good.
>
> Tom Graves
>
> On Wed, Sep 28, 2022 at 12:20 AM Yuming Wang  wrote:
> >
> > Please vote on releasing the following candidate as Apache Spark version
> 3.3.1.
> >
> > The vote is open until 11:59pm Pacific time October 3rd and passes if a
> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
> >
> > [ ] +1 Release this package as Apache Spark 3.3.1
> > [ ] -1 Do not release this package because ...
> >
> > To learn more about Apache Spark, please see https://spark.apache.org
> >
> > The tag to be voted on is v3.3.1-rc2 (commit
> 1d3b8f7cb15283a1e37ecada6d751e17f30647ce):
> > https://github.com/apache/spark/tree/v3.3.1-rc2
> >
> > The release files, including signatures, digests, etc. can be found at:
> > https://dist.apache.org/repos/dist/dev/spark/v3.3.1-rc2-bin
> >
> > Signatures used for Spark RCs can be found in this file:
> > https://dist.apache.org/repos/dist/dev/spark/KEYS
> >
> > The staging repository for this release can be found at:
> > https://repository.apache.org/content/repositories/orgapachespark-1421
> >
> > The documentation corresponding to this release can be found at:
> > https://dist.apache.org/repos/dist/dev/spark/v3.3.1-rc2-docs
> >
> > The list of bug fixes going into 3.3.1 can be found at the following URL:
> > https://issues.apache.org/jira/projects/SPARK/versions/12351710
> >
> > This release is using the release script of the tag v3.3.1-rc2.
> >
> >
> > FAQ
> >
> > =
> > How can I help test this release?
> > =
> > If you are a Spark user, you can help us test this release by taking
> > an existing Spark workload and running on this release candidate, then
> > reporting any regressions.
> >
> > If you're working in PySpark you can set up a virtual env and install
> > the current RC and see if anything important breaks, in the Java/Scala
> > you can add the staging repository to your project's resolvers and test
> > with the RC (make sure to clean up the artifact cache before/after so
> > you don't end up building with an out-of-date RC going forward).
> >
> > ===
> > What should happen to JIRA tickets still targeting 3.3.1?
> > ===
> > The current list of open tickets targeted at 3.3.1 can be found at:
> > https://issues.apache.org/jira/projects/SPARK and search for "Target
> Version/s" = 3.3.1
> >
> > Committers should look at those and triage. Extremely important bug
> > fixes, documentation, and API tweaks that impact compatibility should
> > be worked on immediately. Everything else please retarget to an
> > appropriate release.
> >
> > ==
> > But my bug isn't fixed?
> > ==
> > In order to make timely releases, we will typically not hold the
> > release unless the bug in question is a regression from the previous
> > release. That being said, if there is something which is a regression
> > that has not been correctly targeted please ping me or a committer to
> > help target the issue.
> >
> >
> >
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: [VOTE] SPIP: Support Docker Official Image for Spark

2022-09-21 Thread Gengliang Wang
+1

On Wed, Sep 21, 2022 at 7:26 PM Xiangrui Meng  wrote:

> +1
>
> On Wed, Sep 21, 2022 at 6:53 PM Kent Yao  wrote:
>
>> +1
>>
>> *Kent Yao *
>> @ Data Science Center, Hangzhou Research Institute, NetEase Corp.
>> *a spark enthusiast*
>>
>>
>>
>>  Replied Message 
>> From Hyukjin Kwon 
>> Date 09/22/2022 09:43
>> To dev 
>> Subject Re: [VOTE] SPIP: Support Docker Official Image for Spark
>> Starting with my +1.
>>
>> On Thu, 22 Sept 2022 at 10:41, Hyukjin Kwon  wrote:
>>
>>> Hi all,
>>>
>>> I would like to start a vote for SPIP: "Support Docker Official Image
>>> for Spark"
>>>
>>> The goal of the SPIP is to add a Docker Official Image (DOI) to ensure
>>> the Spark Docker images meet the quality standards for Docker images,
>>> and to provide these images for users who want to use Apache Spark via
>>> a Docker image.
>>>
>>> Please also refer to:
>>>
>>> - Previous discussion in dev mailing list: [DISCUSS] SPIP: Support
>>> Docker Official Image for Spark
>>> 
>>> - SPIP doc: SPIP: Support Docker Official Image for Spark
>>> 
>>> - JIRA: SPARK-40513 
>>>
>>> Please vote on the SPIP for the next 72 hours:
>>>
>>> [ ] +1: Accept the proposal as an official SPIP
>>> [ ] +0
>>> [ ] -1: I don’t think this is a good idea because …
>>>
>>> - To
>> unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: [DISCUSS] SPIP: Support Docker Official Image for Spark

2022-09-18 Thread Gengliang Wang
+1, thanks for the work!

On Sun, Sep 18, 2022 at 6:20 PM Hyukjin Kwon  wrote:

> +1
>
> On Mon, 19 Sept 2022 at 09:15, Yikun Jiang  wrote:
>
>> Hi, all
>>
>> I would like to start the discussion for supporting Docker Official Image
>> for Spark.
>>
>> This SPIP proposes to add a Docker Official Image (DOI) to ensure the
>> Spark Docker images meet the quality standards for Docker images, and to
>> provide these images for users who want to use Apache Spark via a Docker
>> image.
>>
>> There are also several Apache projects that release Docker Official
>> Images, such as flink, storm, solr, zookeeper, and httpd (with 50M+ to
>> 1B+ downloads each). The huge download statistics show the real demand
>> from users, and the support from other Apache projects suggests we
>> should be able to do it as well.
>>
>> After support:
>>
>>    - The Dockerfile will still be maintained by the Apache Spark
>>    community and reviewed by Docker.
>>    - The images will be maintained by the Docker community to ensure
>>    they meet the Docker community's quality standards for Docker images.
>>
>>
>> It will also reduce the Apache Spark community's extra Docker image
>> maintenance effort (such as frequent rebuilding and image security updates).
>>
>> See more in SPIP DOC:
>> https://docs.google.com/document/d/1nN-pKuvt-amUcrkTvYAQ-bJBgtsWb9nAkNoVNRM2S2o
>>
>> cc: Ruifeng (co-author) and Hyukjin (shepherd)
>>
>> Regards,
>> Yikun
>>
>


Re: Time for Spark 3.3.1 release?

2022-09-12 Thread Gengliang Wang
+1.
Thank you, Yuming!

On Mon, Sep 12, 2022 at 12:10 PM L. C. Hsieh  wrote:

> +1
>
> Thanks Yuming!
>
> On Mon, Sep 12, 2022 at 11:50 AM Dongjoon Hyun 
> wrote:
> >
> > +1
> >
> > Thanks,
> > Dongjoon.
> >
> > On Mon, Sep 12, 2022 at 6:38 AM Yuming Wang  wrote:
> >>
> >> Hi, All.
> >>
> >>
> >>
> > Since Apache Spark 3.3.0 tag creation (Jun 10), 138 new patches,
> > including 7 correctness patches, have arrived at branch-3.3.
> >>
> >>
> >>
> >> Shall we make a new release, Apache Spark 3.3.1, as the second release
> on branch-3.3? I'd like to volunteer as the release manager for Apache
> Spark 3.3.1.
> >>
> >>
> >>
> >> All changes:
> >>
> >> https://github.com/apache/spark/compare/v3.3.0...branch-3.3
> >>
> >>
> >>
> >> Correctness issues:
> >>
> >> SPARK-40149: Propagate metadata columns through Project
> >>
> >> SPARK-40002: Don't push down limit through window using ntile
> >>
> >> SPARK-39976: ArrayIntersect should handle null in left expression
> correctly
> >>
> >> SPARK-39833: Disable Parquet column index in DSv1 to fix a correctness
> issue in the case of overlapping partition and data columns
> >>
> >> SPARK-39061: Set nullable correctly for Inline output attributes
> >>
> >> SPARK-39887: RemoveRedundantAliases should keep aliases that make the
> output of projection nodes unique
> >>
> >> SPARK-38614: Don't push down limit through window that's using
> percent_rank
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>
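One of the correctness issues listed in the thread above, SPARK-40002 (and its sibling SPARK-38614 for percent_rank), can be illustrated outside Spark: ntile's bucket assignment depends on the total row count of the ordered partition, so pushing a LIMIT below the window changes the answer. The following is a pure-Python model of the behavior, not Spark code; the `ntile` helper is an illustrative reimplementation of the SQL semantics:

```python
def ntile(rows, n):
    """Assign each row of an ordered partition to one of n buckets,
    as evenly as possible (SQL ntile semantics, modeled in Python)."""
    total = len(rows)
    base, extra = divmod(total, n)
    out, i = [], 0
    for bucket in range(1, n + 1):
        # The first `extra` buckets get one additional row.
        size = base + (1 if bucket <= extra else 0)
        out.extend((r, bucket) for r in rows[i:i + size])
        i += size
    return out

rows = list(range(1, 9))      # an ordered 8-row partition

# Correct plan: compute ntile over the full partition, then apply LIMIT 4.
correct = ntile(rows, 4)[:4]  # buckets computed over all 8 rows

# Broken plan: LIMIT pushed below the window, ntile sees only 4 rows.
broken = ntile(rows[:4], 4)   # buckets computed over 4 rows

print(correct)  # [(1, 1), (2, 1), (3, 2), (4, 2)]
print(broken)   # [(1, 1), (2, 2), (3, 3), (4, 4)]
```

The two plans disagree on every row but the first, which is why this class of pushdown is a correctness bug rather than a mere optimization difference.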


Re: Welcome Xinrong Meng as a Spark committer

2022-08-09 Thread Gengliang Wang
Congratulations, Xinrong! Well deserved.


On Tue, Aug 9, 2022 at 7:09 AM Yi Wu  wrote:

> Congrats Xinrong!!
>
>
> On Tue, Aug 9, 2022 at 7:07 PM Maxim Gekk
>  wrote:
>
>> Congratulations, Xinrong!
>>
>> Maxim Gekk
>>
>> Software Engineer
>>
>> Databricks, Inc.
>>
>>
>> On Tue, Aug 9, 2022 at 3:15 PM Weichen Xu
>>  wrote:
>>
>>> Congrats!
>>>
>>> On Tue, Aug 9, 2022 at 5:55 PM Jungtaek Lim <
>>> kabhwan.opensou...@gmail.com> wrote:
>>>
 Congrats Xinrong! Well deserved.

 On Tue, Aug 9, 2022 at 5:13 PM, Hyukjin Kwon wrote:

> Hi all,
>
> The Spark PMC recently added Xinrong Meng as a committer on the
> project. Xinrong is the major contributor of PySpark, especially the Pandas API
> on Spark. She has guided a lot of new contributors enthusiastically. 
> Please
> join me in welcoming Xinrong!
>
>


Re: [VOTE] Release Spark 3.2.2 (RC1)

2022-07-14 Thread Gengliang Wang
Hi Bruce,

FYI we had further discussions on
https://github.com/apache/spark/pull/35313#issuecomment-1185195455.
Thanks for pointing that out, but this document issue should not be a
blocker of the release.

+1 on the RC.

Gengliang

On Thu, Jul 14, 2022 at 10:22 PM sarutak  wrote:

> Hi Dongjoon and Bruce,
>
> SPARK-36724 is about SessionWindow, while SPARK-38017 and PR #35313 are
> about TimeWindow, and TimeWindow already supports TimestampNTZ in
> v3.2.1.
>
>
> https://github.com/apache/spark/blob/4f25b3f71238a00508a356591553f2dfa89f8290/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/TimeWindow.scala#L99
>
> So, I think that change is still valid.
>
> Kousuke
>
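For readers unfamiliar with the two types being discussed: TimestampNTZ is a timestamp without a time zone (a wall-clock reading only), while TimestampType denotes a time-zone-adjusted instant. A rough stdlib analogy — plain Python, not Spark code — is the naive vs. aware `datetime` distinction:

```python
from datetime import datetime, timedelta, timezone

# "TimestampNTZ"-like: a wall-clock reading with no zone attached
# (analogous to a naive Python datetime).
ntz = datetime(2022, 7, 14, 12, 0, 0)

# "TimestampType"-like: the same reading pinned to a zone (here UTC),
# so it identifies a specific instant.
tz_aware = datetime(2022, 7, 14, 12, 0, 0, tzinfo=timezone.utc)

print(ntz.tzinfo)            # None
print(tz_aware.utcoffset())  # 0:00:00
```

This is only an analogy for the semantics; whether a given Spark function accepts the NTZ variant is exactly the per-release question the thread above is settling.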
> > Thank you so much, Bruce.
> >
> > After SPARK-36724 landed in Spark 3.3.0, SPARK-38017 seems to have
> > landed on branch-3.2 mistakenly here.
> >
> > https://github.com/apache/spark/pull/35313
> >
> > I believe I can remove those four places after uploading the docs to
> > our website.
> >
> > Dongjoon.
> >
> > On Thu, Jul 14, 2022 at 2:16 PM Bruce Robbins 
> > wrote:
> >
> >> A small thing. The function API doc (here [1]) claims that the
> >> window function accepts a timeColumn of TimestampType or
> >> TimestampNTZType. The update to the API doc was made since v3.2.1.
> >>
> >> As far as I can tell, 3.2.2 doesn't support TimestampNTZType.
> >>
> >> On Mon, Jul 11, 2022 at 2:58 PM Dongjoon Hyun
> >>  wrote:
> >>
> >>> Please vote on releasing the following candidate as Apache Spark
> >>> version 3.2.2.
> >>>
> >>> The vote is open until July 15th 1AM (PST) and passes if a
> >>> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
> >>>
> >>> [ ] +1 Release this package as Apache Spark 3.2.2
> >>> [ ] -1 Do not release this package because ...
> >>>
> >>> To learn more about Apache Spark, please see
> >>> https://spark.apache.org/
> >>>
> >>> The tag to be voted on is v3.2.2-rc1 (commit
> >>> 78a5825fe266c0884d2dd18cbca9625fa258d7f7):
> >>> https://github.com/apache/spark/tree/v3.2.2-rc1
> >>>
> >>> The release files, including signatures, digests, etc. can be
> >>> found at:
> >>> https://dist.apache.org/repos/dist/dev/spark/v3.2.2-rc1-bin/
> >>>
> >>> Signatures used for Spark RCs can be found in this file:
> >>> https://dist.apache.org/repos/dist/dev/spark/KEYS
> >>>
> >>> The staging repository for this release can be found at:
> >>>
> >>
> > https://repository.apache.org/content/repositories/orgapachespark-1409/
> >>>
> >>> The documentation corresponding to this release can be found at:
> >>> https://dist.apache.org/repos/dist/dev/spark/v3.2.2-rc1-docs/
> >>>
> >>> The list of bug fixes going into 3.2.2 can be found at the
> >>> following URL:
> >>> https://issues.apache.org/jira/projects/SPARK/versions/12351232
> >>>
> >>> This release is using the release script of the tag v3.2.2-rc1.
> >>>
> >>> FAQ
> >>>
> >>> =
> >>> How can I help test this release?
> >>> =
> >>>
> >>> If you are a Spark user, you can help us test this release by
> >>> taking
> >>> an existing Spark workload and running on this release candidate,
> >>> then
> >>> reporting any regressions.
> >>>
> >>> If you're working in PySpark you can set up a virtual env and
> >>> install
> >>> the current RC and see if anything important breaks, in the
> >>> Java/Scala
> >>> you can add the staging repository to your project's resolvers and test
> >>> with the RC (make sure to clean up the artifact cache before/after so
> >>> you don't end up building with an out-of-date RC going forward).
> >>>
> >>> ===
> >>> What should happen to JIRA tickets still targeting 3.2.2?
> >>> ===
> >>>
> >>> The current list of open tickets targeted at 3.2.2 can be found
> >>> at:
> >>> https://issues.apache.org/jira/projects/SPARK and search for
> >>> "Target Version/s" = 3.2.2
> >>>
> >>> Committers should look at those and triage. Extremely important
> >>> bug
> >>> fixes, documentation, and API tweaks that impact compatibility
> >>> should
> >>> be worked on immediately. Everything else please retarget to an
> >>> appropriate release.
> >>>
> >>> ==
> >>> But my bug isn't fixed?
> >>> ==
> >>>
> >>> In order to make timely releases, we will typically not hold the
> >>> release unless the bug in question is a regression from the
> >>> previous
> >>> release. That being said, if there is something which is a
> >>> regression
> >>> that has not been correctly targeted please ping me or a committer
> >>> to
> >>> help target the issue.
> >>>
> >>> Dongjoon
> >
> >
> > Links:
> > --
> > [1]
> >
> https://dist.apache.org/repos/dist/dev/spark/v3.2.2-rc1-docs/_site/api/scala/org/apache/spark/sql/functions$.html
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: Apache Spark 3.2.2 Release?

2022-07-06 Thread Gengliang Wang
+1.
Thank you, Dongjoon.

On Wed, Jul 6, 2022 at 10:21 PM Wenchen Fan  wrote:

> +1
>
> On Thu, Jul 7, 2022 at 10:41 AM Xinrong Meng
>  wrote:
>
>> +1
>>
>> Thanks!
>>
>>
>> Xinrong Meng
>>
>> Software Engineer
>>
>> Databricks
>>
>>
>> On Wed, Jul 6, 2022 at 7:25 PM Xiao Li  wrote:
>>
>>> +1
>>>
>>> Xiao
>>>
 On Wed, Jul 6, 2022 at 19:16, Cheng Su  wrote:
>>>
 +1 (non-binding)

 Thanks,
 Cheng Su

 On Wed, Jul 6, 2022 at 6:01 PM Yuming Wang  wrote:

> +1
>
> On Thu, Jul 7, 2022 at 5:53 AM Maxim Gekk
>  wrote:
>
>> +1
>>
>> On Thu, Jul 7, 2022 at 12:26 AM John Zhuge  wrote:
>>
>>> +1  Thanks for the effort!
>>>
>>> On Wed, Jul 6, 2022 at 2:23 PM Bjørn Jørgensen <
>>> bjornjorgen...@gmail.com> wrote:
>>>
 +1

 On Wed, Jul 6, 2022 at 23:05, Hyukjin Kwon  wrote:

> Yeah +1
>
> On Thu, Jul 7, 2022 at 5:40 AM Dongjoon Hyun <
> dongjoon.h...@gmail.com> wrote:
>
>> Hi, All.
>>
>> Since Apache Spark 3.2.1 tag creation (Jan 19), 197 new patches,
>> including 11 correctness patches, have arrived at branch-3.2.
>>
>> Shall we make a new release, Apache Spark 3.2.2, as the third
>> release
>> in the 3.2 line? I'd like to volunteer as the release manager for
>> Apache
>> Spark 3.2.2. I'm thinking about starting the first RC next week.
>>
>> $ git log --oneline v3.2.1..HEAD | wc -l
>>  197
>>
>> # Correctness issues
>>
>> SPARK-38075 Hive script transform with order by and limit will
>> return fake rows
>> SPARK-38204 All state operators are at a risk of inconsistency
>> between state partitioning and operator partitioning
>> SPARK-38309 SHS has incorrect percentiles for shuffle read
>> bytes
>> and shuffle total blocks metrics
>> SPARK-38320 (flat)MapGroupsWithState can timeout groups which
>> just
>> received inputs in the same microbatch
>> SPARK-38614 After Spark update, df.show() shows incorrect
>> F.percent_rank results
>> SPARK-38655 OffsetWindowFunctionFrameBase cannot find the
>> offset
>> row whose input is not null
>> SPARK-38684 Stream-stream outer join has a possible
>> correctness
>> issue due to weakly read consistent on outer iterators
>> SPARK-39061 Incorrect results or NPE when using Inline
>> function
>> against an array of dynamically created structs
>> SPARK-39107 Silent change in regexp_replace's handling of
>> empty strings
>> SPARK-39259 Timestamps returned by now() and equivalent
>> functions
>> are not consistent in subqueries
>> SPARK-39293 The accumulator of ArrayAggregate should copy the
>> intermediate result if string, struct, array, or map
>>
>> Best,
>> Dongjoon.
>>
>>
>> -
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>
>> --
>>> John Zhuge
>>>
>>


Docker images for Spark 3.3.0 release are now available

2022-06-27 Thread Gengliang Wang
Hi all,

The official Docker images for the Spark 3.3.0 release are now available!

   - To run Spark with Scala/Java API only:
   https://hub.docker.com/r/apache/spark
   - To run Python on Spark: https://hub.docker.com/r/apache/spark-py
   - To run R on Spark: https://hub.docker.com/r/apache/spark-r


Gengliang


Re: Re: [VOTE][SPIP] Spark Connect

2022-06-15 Thread Gengliang Wang
+1 (non-binding)

On Wed, Jun 15, 2022 at 9:32 AM Dongjoon Hyun 
wrote:

> +1
>
> On Wed, Jun 15, 2022 at 9:22 AM Xiao Li  wrote:
>
>> +1
>>
>> Xiao
>>
>>> On Tue, Jun 14, 2022 at 03:35, beliefer  wrote:
>>
>>> +1
>>> Yeah, I tried to use Apache Livy so that we can run interactive queries.
>>> But the Spark driver in Livy looks heavy.
>>>
>>> The SPIP may resolve the issue.
>>>
>>>
>>>
>>> At 2022-06-14 18:11:21, "Wenchen Fan"  wrote:
>>>
>>> +1
>>>
>>> On Tue, Jun 14, 2022 at 9:38 AM Ruifeng Zheng 
>>> wrote:
>>>
 +1


 -- Original Message --
 *From:* "huaxin gao" ;
 *Sent:* Tuesday, June 14, 2022, 8:47 AM
 *To:* "L. C. Hsieh";
 *Cc:* "Spark dev list";
 *Subject:* Re: [VOTE][SPIP] Spark Connect

 +1

 On Mon, Jun 13, 2022 at 5:42 PM L. C. Hsieh  wrote:

> +1
>
> On Mon, Jun 13, 2022 at 5:41 PM Chao Sun  wrote:
> >
> > +1 (non-binding)
> >
> > On Mon, Jun 13, 2022 at 5:11 PM Hyukjin Kwon 
> wrote:
> >>
> >> +1
> >>
> >> On Tue, 14 Jun 2022 at 08:50, Yuming Wang  wrote:
> >>>
> >>> +1.
> >>>
> >>> On Tue, Jun 14, 2022 at 2:20 AM Matei Zaharia <
> matei.zaha...@gmail.com> wrote:
> 
>  +1, very excited about this direction.
> 
>  Matei
> 
>  On Jun 13, 2022, at 11:07 AM, Herman van Hovell
>  wrote:
> 
>  Let me kick off the voting...
> 
>  +1
> 
>  On Mon, Jun 13, 2022 at 2:02 PM Herman van Hovell <
> her...@databricks.com> wrote:
> >
> > Hi all,
> >
> > I’d like to start a vote for SPIP: "Spark Connect"
> >
> > The goal of the SPIP is to introduce a Dataframe based
> client/server API for Spark
> >
> > Please also refer to:
> >
> > - Previous discussion in dev mailing list: [DISCUSS] SPIP: Spark
> Connect - A client and server interface for Apache Spark.
> > - Design doc: Spark Connect - A client and server interface for
> Apache Spark.
> > - JIRA: SPARK-39375
> >
> > Please vote on the SPIP for the next 72 hours:
> >
> > [ ] +1: Accept the proposal as an official SPIP
> > [ ] +0
> > [ ] -1: I don’t think this is a good idea because …
> >
> > Kind Regards,
> > Herman
> 
> 
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: Stickers and Swag

2022-06-14 Thread Gengliang Wang
FYI now you can find the shopping information on
https://spark.apache.org/community as well :)


Gengliang



> On Jun 14, 2022, at 7:47 PM, Hyukjin Kwon  wrote:
> 
> Woohoo
> 
> On Tue, 14 Jun 2022 at 15:04, Xiao Li  wrote:
> Hi, all, 
> 
> The ASF has an official store at RedBubble
> (https://www.redbubble.com/people/comdev/shop) that Apache Community
> Development (ComDev) runs. If you are interested in buying Spark Swag, 70
> products featuring the Spark logo are available:
> https://www.redbubble.com/shop/ap/113203780
> 
> Go Spark! 
> 
> Xiao



Re: [VOTE] Release Spark 3.3.0 (RC6)

2022-06-13 Thread Gengliang Wang
+1 (non-binding)

On Mon, Jun 13, 2022 at 10:20 AM Herman van Hovell
 wrote:

> +1
>
> On Mon, Jun 13, 2022 at 12:53 PM Wenchen Fan  wrote:
>
>> +1, tests are all green and there are no more blocker issues AFAIK.
>>
>> On Fri, Jun 10, 2022 at 12:27 PM Maxim Gekk
>>  wrote:
>>
>>> Please vote on releasing the following candidate as
>>> Apache Spark version 3.3.0.
>>>
>>> The vote is open until 11:59pm Pacific time June 14th and passes if a
>>> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>>
>>> [ ] +1 Release this package as Apache Spark 3.3.0
>>> [ ] -1 Do not release this package because ...
>>>
>>> To learn more about Apache Spark, please see http://spark.apache.org/
>>>
>>> The tag to be voted on is v3.3.0-rc6 (commit
>>> f74867bddfbcdd4d08076db36851e88b15e66556):
>>> https://github.com/apache/spark/tree/v3.3.0-rc6
>>>
>>> The release files, including signatures, digests, etc. can be found at:
>>> https://dist.apache.org/repos/dist/dev/spark/v3.3.0-rc6-bin/
>>>
>>> Signatures used for Spark RCs can be found in this file:
>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>>
>>> The staging repository for this release can be found at:
>>> https://repository.apache.org/content/repositories/orgapachespark-1407
>>>
>>> The documentation corresponding to this release can be found at:
>>> https://dist.apache.org/repos/dist/dev/spark/v3.3.0-rc6-docs/
>>>
>>> The list of bug fixes going into 3.3.0 can be found at the following URL:
>>> https://issues.apache.org/jira/projects/SPARK/versions/12350369
>>>
>>> This release is using the release script of the tag v3.3.0-rc6.
>>>
>>>
>>> FAQ
>>>
>>> =
>>> How can I help test this release?
>>> =
>>> If you are a Spark user, you can help us test this release by taking
>>> an existing Spark workload and running on this release candidate, then
>>> reporting any regressions.
>>>
>>> If you're working in PySpark you can set up a virtual env and install
>>> the current RC and see if anything important breaks, in the Java/Scala
>>> you can add the staging repository to your project's resolvers and test
>>> with the RC (make sure to clean up the artifact cache before/after so
>>> you don't end up building with an out-of-date RC going forward).
>>>
>>> ===
>>> What should happen to JIRA tickets still targeting 3.3.0?
>>> ===
>>> The current list of open tickets targeted at 3.3.0 can be found at:
>>> https://issues.apache.org/jira/projects/SPARK and search for "Target
>>> Version/s" = 3.3.0
>>>
>>> Committers should look at those and triage. Extremely important bug
>>> fixes, documentation, and API tweaks that impact compatibility should
>>> be worked on immediately. Everything else please retarget to an
>>> appropriate release.
>>>
>>> ==
>>> But my bug isn't fixed?
>>> ==
>>> In order to make timely releases, we will typically not hold the
>>> release unless the bug in question is a regression from the previous
>>> release. That being said, if there is something which is a regression
>>> that has not been correctly targeted please ping me or a committer to
>>> help target the issue.
>>>
>>> Maxim Gekk
>>>
>>> Software Engineer
>>>
>>> Databricks, Inc.
>>>
>>


Re: [VOTE] Release Spark 3.3.0 (RC2)

2022-05-19 Thread Gengliang Wang
Hi Kent and Wenchen,

Thanks for reporting. I just created
https://github.com/apache/spark/pull/36609 to fix the issue.

Gengliang

On Thu, May 19, 2022 at 5:40 PM Wenchen Fan  wrote:

> I think it should have been fixed  by
> https://github.com/apache/spark/commit/0fdb6757946e2a0991256a3b73c0c09d6e764eed
> . Maybe the fix is not completed...
>
> On Thu, May 19, 2022 at 2:16 PM Kent Yao  wrote:
>
>> Thanks, Maxim.
>>
>> Leave my -1 for this release candidate.
>>
>> Unfortunately, I don't know which PR fixed this.
>> Does anyone happen to know?
>>
>> BR,
>> Kent Yao
>>
>> On Thu, May 19, 2022 at 13:42, Maxim Gekk  wrote:
>> >
>> > Hi Kent,
>> >
>> > > Shall we backport the fix from the master to 3.3 too?
>> >
>> > Yes, we shall.
>> >
>> > Maxim Gekk
>> >
>> > Software Engineer
>> >
>> > Databricks, Inc.
>> >
>> >
>> >
>> > On Thu, May 19, 2022 at 6:44 AM Kent Yao  wrote:
>> >>
>> >> Hi,
>> >>
>> >> I verified the simple case below with the binary release, and it looks
>> >> like a bug to me.
>> >>
>> >> bin/spark-sql -e "select date '2018-11-17' > 1"
>> >>
>> >> Error in query: Invalid call to toAttribute on unresolved object;
>> >> 'Project [unresolvedalias((2018-11-17 > 1), None)]
>> >> +- OneRowRelation
>> >>
>> >> Both 3.2 releases and the master branch work fine with the correct
>> >> error: 'due to data type mismatch'.
>> >>
>> >> Shall we backport the fix from the master to 3.3 too?
>> >>
>> >> Bests
>> >>
>> >> Kent Yao
>> >>
>> >>
>> >> On Wed, May 18, 2022 at 19:04, Yuming Wang  wrote:
>> >> >
>> >> > -1. There is a regression:
>> https://github.com/apache/spark/pull/36595
>> >> >
>> >> > On Wed, May 18, 2022 at 4:11 PM Martin Grigorov <
>> mgrigo...@apache.org> wrote:
>> >> >>
>> >> >> Hi,
>> >> >>
>> >> >> [X] +1 Release this package as Apache Spark 3.3.0
>> >> >>
>> >> >> Tested:
>> >> >> - make local distribution from sources (with
>> ./dev/make-distribution.sh --tgz --name with-volcano
>> -Pkubernetes,volcano,hadoop-3)
>> >> >> - create a Docker image (with JDK 11)
>> >> >> - run Pi example on
>> >> >> -- local
>> >> >> -- Kubernetes with default scheduler
>> >> >> -- Kubernetes with Volcano scheduler
>> >> >>
>> >> >> On both x86_64 and aarch64 !
>> >> >>
>> >> >> Regards,
>> >> >> Martin
>> >> >>
>> >> >>
>> >> >> On Mon, May 16, 2022 at 3:44 PM Maxim Gekk <
>> maxim.g...@databricks.com.invalid> wrote:
>> >> >>>
>> >> >>> Please vote on releasing the following candidate as Apache Spark
>> version 3.3.0.
>> >> >>>
>> >> >>> The vote is open until 11:59pm Pacific time May 19th and passes if
>> a majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>> >> >>>
>> >> >>> [ ] +1 Release this package as Apache Spark 3.3.0
>> >> >>> [ ] -1 Do not release this package because ...
>> >> >>>
>> >> >>> To learn more about Apache Spark, please see
>> http://spark.apache.org/
>> >> >>>
>> >> >>> The tag to be voted on is v3.3.0-rc2 (commit
>> c8c657b922ac8fd8dcf9553113e11a80079db059):
>> >> >>> https://github.com/apache/spark/tree/v3.3.0-rc2
>> >> >>>
>> >> >>> The release files, including signatures, digests, etc. can be
>> found at:
>> >> >>> https://dist.apache.org/repos/dist/dev/spark/v3.3.0-rc2-bin/
>> >> >>>
>> >> >>> Signatures used for Spark RCs can be found in this file:
>> >> >>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>> >> >>>
>> >> >>> The staging repository for this release can be found at:
>> >> >>>
>> https://repository.apache.org/content/repositories/orgapachespark-1403
>> >> >>>
>> >> >>> The documentation corresponding t

Re: SIGMOD System Award for Apache Spark

2022-05-12 Thread Gengliang Wang
Congratulations to the whole spark community!

On Fri, May 13, 2022 at 10:14 AM Jungtaek Lim 
wrote:

> Congrats Spark community!
>
> On Fri, May 13, 2022 at 10:40 AM Qian Sun  wrote:
>
>> Congratulations !!!
>>
>> On May 13, 2022, at 3:44 AM, Matei Zaharia  wrote:
>>
>> Hi all,
>>
>> We recently found out that Apache Spark received
>>  the SIGMOD System Award this
>> year, given by SIGMOD (the ACM’s data management research organization) to
>> impactful real-world and research systems. This puts Spark in good company
>> with some very impressive previous recipients
>> . This award is
>> really an achievement by the whole community, so I wanted to say congrats
>> to everyone who contributes to Spark, whether through code, issue reports,
>> docs, or other means.
>>
>> Matei
>>
>>
>>


Re: [VOTE] Release Spark 3.3.0 (RC1)

2022-05-06 Thread Gengliang Wang
Hi Maxim,

Thanks for the work!
There is a bug fix from Bruce merged on branch-3.3 right after the RC1 is
cut:
SPARK-39093: Dividing interval by integral can result in codegen
compilation error
<https://github.com/apache/spark/commit/fd998c8a6783c0c8aceed8dcde4017cd479e42c8>

So -1 from me. We should have RC2 to include the fix.

Thanks
Gengliang

On Fri, May 6, 2022 at 6:15 PM Maxim Gekk 
wrote:

> Hi Dongjoon,
>
>  > https://issues.apache.org/jira/projects/SPARK/versions/12350369
> > Since RC1 is started, could you move them out from the 3.3.0 milestone?
>
> I have removed the 3.3.0 label from Fix version(s). Thank you, Dongjoon.
>
> Maxim Gekk
>
> Software Engineer
>
> Databricks, Inc.
>
>
> On Fri, May 6, 2022 at 11:06 AM Dongjoon Hyun 
> wrote:
>
>> Hi, Sean.
>> It's interesting. I didn't see those failures from my side.
>>
>> Hi, Maxim.
>> In the following link, there are 17 in-progress and 6 to-do JIRA issues
>> which look irrelevant to this RC1 vote.
>>
>> https://issues.apache.org/jira/projects/SPARK/versions/12350369
>>
>> Since RC1 is started, could you move them out from the 3.3.0 milestone?
>> Otherwise, we cannot distinguish new real blocker issues from those
>> obsolete JIRA issues.
>>
>> Thanks,
>> Dongjoon.
>>
>>
>> On Thu, May 5, 2022 at 11:46 AM Adam Binford  wrote:
>>
>>> I looked back at the first one (SPARK-37618), it expects/assumes a 0022
>>> umask to correctly test the behavior. I'm not sure how to get that to not
>>> fail or be ignored with a more open umask.
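To see why a test like SPARK-37618 only passes under the assumed 0022 umask, here is a standalone Python sketch (not Spark's test code) showing that the permission bits of a newly created directory are the requested mode filtered through the process umask, so any assertion on exact bits depends on the umask in effect:

```python
import os
import stat
import tempfile

def create_dir_mode(umask: int) -> int:
    """Create a directory with requested mode 0o777 under the given
    umask and return the permission bits it actually ends up with."""
    old = os.umask(umask)
    try:
        base = tempfile.mkdtemp()
        path = os.path.join(base, "subdir")
        os.mkdir(path, 0o777)  # requested bits are filtered by the umask
        return stat.S_IMODE(os.stat(path).st_mode)
    finally:
        os.umask(old)

# Under the 0022 umask the test assumes, group-write is masked out...
assert create_dir_mode(0o022) == 0o755
# ...but under a more permissive 0002 umask, GROUP_WRITE survives,
# so an assertion about exact permission bits flips.
assert create_dir_mode(0o002) == 0o775
```

This is why the suite reports an unexpected GROUP_WRITE bit on machines configured with a more open umask.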
>>>
>>> On Thu, May 5, 2022 at 1:56 PM Sean Owen  wrote:
>>>
>>>> I'm seeing test failures; is anyone seeing ones like this? This is Java
>>>> 8 / Scala 2.12 / Ubuntu 22.04:
>>>>
>>>> - SPARK-37618: Sub dirs are group writable when removing from shuffle
>>>> service enabled *** FAILED ***
>>>>   [OWNER_WRITE, GROUP_READ, GROUP_WRITE, GROUP_EXECUTE, OTHERS_READ,
>>>> OWNER_READ, OTHERS_EXECUTE, OWNER_EXECUTE] contained GROUP_WRITE
>>>> (DiskBlockManagerSuite.scala:155)
>>>>
>>>> - Check schemas for expression examples *** FAILED ***
>>>>   396 did not equal 398 Expected 396 blocks in result file but got 398.
>>>> Try regenerating the result files. (ExpressionsSchemaSuite.scala:161)
>>>>
>>>>  Function 'bloom_filter_agg', Expression class
>>>> 'org.apache.spark.sql.catalyst.expressions.aggregate.BloomFilterAggregate'
>>>> "" did not start with "
>>>>   Examples:
>>>>   " (ExpressionInfoSuite.scala:142)
>>>>
>>>> On Thu, May 5, 2022 at 6:01 AM Maxim Gekk
>>>>  wrote:
>>>>
>>>>> Please vote on releasing the following candidate as Apache Spark
>>>>>  version 3.3.0.
>>>>>
>>>>> The vote is open until 11:59pm Pacific time May 10th and passes if a
>>>>> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>>>>
>>>>> [ ] +1 Release this package as Apache Spark 3.3.0
>>>>> [ ] -1 Do not release this package because ...
>>>>>
>>>>> To learn more about Apache Spark, please see http://spark.apache.org/
>>>>>
>>>>> The tag to be voted on is v3.3.0-rc1 (commit
>>>>> 482b7d54b522c4d1e25f3e84eabbc78126f22a3d):
>>>>> https://github.com/apache/spark/tree/v3.3.0-rc1
>>>>>
>>>>> The release files, including signatures, digests, etc. can be found
>>>>> at:
>>>>> https://dist.apache.org/repos/dist/dev/spark/v3.3.0-rc1-bin/
>>>>>
>>>>> Signatures used for Spark RCs can be found in this file:
>>>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>>>>
>>>>> The staging repository for this release can be found at:
>>>>> https://repository.apache.org/content/repositories/orgapachespark-1402
>>>>>
>>>>> The documentation corresponding to this release can be found at:
>>>>> https://dist.apache.org/repos/dist/dev/spark/v3.3.0-rc1-docs/
>>>>>
>>>>> The list of bug fixes going into 3.3.0 can be found at the following
>>>>> URL:
>>>>> https://issues.apache.org/jira/projects/SPARK/versions/12350369
>>>>>
>>>>> This release is using the release script of the tag v3.3.0-rc1.
>>>>>
>>>>>
>>>>> FAQ
>>&

Re: Apache Spark 3.3 Release

2022-03-17 Thread Gengliang Wang
I'd like to add the following new SQL functions in the 3.3 release. These
functions are useful when overflow or encoding errors occur:

   - [SPARK-38548][SQL] New SQL function: try_sum
   <https://github.com/apache/spark/pull/35848>
   - [SPARK-38589][SQL] New SQL function: try_avg
   <https://github.com/apache/spark/pull/35896>
   - [SPARK-38590][SQL] New SQL function: try_to_binary
   <https://github.com/apache/spark/pull/35897>

Gengliang
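The behavior these try_ variants provide can be sketched in plain Python — an illustration of the NULL-on-overflow semantics only, not Spark's implementation:

```python
# Sketch of try_sum semantics: aggregate over 64-bit integers, ignoring
# SQL NULLs, and return NULL (None) on overflow instead of raising an
# error. Illustrative only; Spark implements this in Catalyst.
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1

def try_sum(values):
    total = 0
    for v in values:
        if v is None:          # SQL NULLs are ignored by aggregates
            continue
        total += v
        if not (INT64_MIN <= total <= INT64_MAX):
            return None        # overflow -> NULL rather than an exception
    return total

assert try_sum([1, 2, None, 3]) == 6
# Summing past the 64-bit boundary yields NULL instead of failing:
assert try_sum([INT64_MAX, 1]) is None
```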

On Thu, Mar 17, 2022 at 7:59 AM Andrew Melo  wrote:

> Hello,
>
> I've been trying for a bit to get the following two PRs merged and
> into a release, and I'm having some difficulty moving them forward:
>
> https://github.com/apache/spark/pull/34903 - This passes the current
> python interpreter to spark-env.sh to allow some currently-unavailable
> customization to happen
> https://github.com/apache/spark/pull/31774 - This fixes a bug in the
> SparkUI reverse proxy-handling code where it does a greedy match for
> "proxy" in the URL, and will mistakenly replace the App-ID in the
> wrong place.
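The second issue — a greedy match on "proxy" clobbering an app ID — can be illustrated with a small Python sketch (hypothetical URLs and patterns, not the actual SparkUI rewrite code):

```python
import re

def rewrite_greedy(url: str, new_id: str) -> str:
    # A greedy pattern intended to rewrite "/proxy/<id>/" swallows
    # everything up to the LAST slash, destroying the rest of the path.
    return re.sub(r"proxy/.*/", f"proxy/{new_id}/", url)

def rewrite_precise(url: str, new_id: str) -> str:
    # Restrict the match to a single path segment instead.
    return re.sub(r"proxy/[^/]+/", f"proxy/{new_id}/", url, count=1)

url = "/proxy/app-proxy-123/jobs/"
# Greedy: the trailing "/jobs/" is lost along with the old app ID.
assert rewrite_greedy(url, "app-456") == "/proxy/app-456/"
# Precise: only the app-ID segment is replaced.
assert rewrite_precise(url, "app-456") == "/proxy/app-456/jobs/"
```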
>
> I'm not exactly sure of how to get attention of PRs that have been
> sitting around for a while, but these are really important to our
> use-cases, and it would be nice to have them merged in.
>
> Cheers
> Andrew
>
> On Wed, Mar 16, 2022 at 6:21 PM Holden Karau  wrote:
> >
> > I'd like to add/backport the logging in
> https://github.com/apache/spark/pull/35881 PR so that when users submit
> issues with dynamic allocation we can better debug what's going on.
> >
> > On Wed, Mar 16, 2022 at 3:45 PM Chao Sun  wrote:
> >>
> >> There is one item on our side that we want to backport to 3.3:
> >> - vectorized DELTA_BYTE_ARRAY/DELTA_LENGTH_BYTE_ARRAY encodings for
> >> Parquet V2 support (https://github.com/apache/spark/pull/35262)
> >>
> >> It's already reviewed and approved.
> >>
> >> On Wed, Mar 16, 2022 at 9:13 AM Tom Graves 
> wrote:
> >> >
> >> > It looks like the version hasn't been updated on master and still
> shows 3.3.0-SNAPSHOT, can you please update that.
> >> >
> >> > Tom
> >> >
> >> > On Wednesday, March 16, 2022, 01:41:00 AM CDT, Maxim Gekk <
> maxim.g...@databricks.com.invalid> wrote:
> >> >
> >> >
> >> > Hi All,
> >> >
> >> > I have created the branch for Spark 3.3:
> >> > https://github.com/apache/spark/commits/branch-3.3
> >> >
> >> > Please, backport important fixes to it, and if you have some doubts,
> ping me in the PR. Regarding new features, we are still building the allow
> list for branch-3.3.
> >> >
> >> > Best regards,
> >> > Max Gekk
> >> >
> >> >
> >> > On Wed, Mar 16, 2022 at 5:51 AM Dongjoon Hyun <
> dongjoon.h...@gmail.com> wrote:
> >> >
> >> > Yes, I agree with you for your whitelist approach for backporting. :)
> >> > Thank you for summarizing.
> >> >
> >> > Thanks,
> >> > Dongjoon.
> >> >
> >> >
> >> > On Tue, Mar 15, 2022 at 4:20 PM Xiao Li  wrote:
> >> >
> >> > I think I finally got your point. What you want to keep unchanged is
> the branch cut date of Spark 3.3. Today? or this Friday? This is not a big
> deal.
> >> >
> >> > My major concern is whether we should keep merging the feature work
> or the dependency upgrade after the branch cut. To make our release time
> more predictable, I am suggesting we should finalize the exception PR list
> first, instead of merging them in an ad hoc way. In the past, we spent a
> lot of time on the revert of the PRs that were merged after the branch cut.
> I hope we can minimize unnecessary arguments in this release. Do you agree,
> Dongjoon?
> >> >
> >> >
> >> >
> >> > Dongjoon Hyun  于2022年3月15日周二 15:55写道:
> >> >
> >> > That is not totally fine, Xiao. It sounds like you are asking a
> change of plan without a proper reason.
> >> >
> >> > Although we cut the branch Today according our plan, you still can
> collect the list and make a list of exceptions. I'm not blocking what you
> want to do.
> >> >
> >> > Please let the community start to ramp down as we agreed before.
> >> >
> >> > Dongjoon
> >> >
> >> >
> >> >
> >> > On Tue, Mar 15, 2022 at 3:07 PM Xiao Li  wrote:
> >> >
> >> > Please do not get me wrong. If we don't cut a branch, we are al

Re: [VOTE] Spark 3.1.3 RC4

2022-02-16 Thread Gengliang Wang
+1 (non-binding)

On Wed, Feb 16, 2022 at 1:28 PM Wenchen Fan  wrote:

> +1
>
> On Tue, Feb 15, 2022 at 3:59 PM Yuming Wang  wrote:
>
>> +1 (non-binding).
>>
>> On Tue, Feb 15, 2022 at 10:22 AM Ruifeng Zheng 
>> wrote:
>>
>>> +1 (non-binding)
>>>
>>> checked the release script issue Dongjoon mentioned:
>>>
>>> curl -s
>>> https://dist.apache.org/repos/dist/dev/spark/v3.1.3-rc4-bin/spark-3.1.3-bin-hadoop2.7.tgz
>>> | tar tz | grep hadoop-common
>>> spark-3.1.3-bin-hadoop2.7/jars/hadoop-common-2.7.4
>>>
>>>
>>> -- Original Message --
>>> *From:* "Sean Owen" ;
>>> *Sent:* Tuesday, February 15, 2022, 10:01 AM
>>> *To:* "Holden Karau";
>>> *Cc:* "dev";
>>> *Subject:* Re: [VOTE] Spark 3.1.3 RC4
>>>
>>> Looks good to me, same results as last RC, +1
>>>
>>> On Mon, Feb 14, 2022 at 2:55 PM Holden Karau 
>>> wrote:
>>>
 Please vote on releasing the following candidate as Apache Spark
 version 3.1.3.

 The vote is open until Feb. 18th at 1 PM pacific (9 PM GMT) and passes
 if a majority
 +1 PMC votes are cast, with a minimum of 3 + 1 votes.

 [ ] +1 Release this package as Apache Spark 3.1.3
 [ ] -1 Do not release this package because ...

 To learn more about Apache Spark, please see http://spark.apache.org/

 There are currently no open issues targeting 3.1.3 in Spark's JIRA
 https://issues.apache.org/jira/browse
 (try project = SPARK AND "Target Version/s" = "3.1.3" AND status in
 (Open, Reopened, "In Progress"))
 at https://s.apache.org/n79dw



 The tag to be voted on is v3.1.3-rc4 (commit
 d1f8a503a26bcfb4e466d9accc5fa241a7933667):
 https://github.com/apache/spark/tree/v3.1.3-rc4

 The release files, including signatures, digests, etc. can be found at:
 https://dist.apache.org/repos/dist/dev/spark/v3.1.3-rc4-bin/

 Signatures used for Spark RCs can be found in this file:
 https://dist.apache.org/repos/dist/dev/spark/KEYS

 The staging repository for this release can be found at
 https://repository.apache.org/content/repositories/orgapachespark-1401

 The documentation corresponding to this release can be found at:
 https://dist.apache.org/repos/dist/dev/spark/v3.1.3-rc4-docs/

 The list of bug fixes going into 3.1.3 can be found at the following
 URL:
 https://s.apache.org/x0q9b

 This release is using the release script from 3.1.3
 The release docker container was rebuilt since the previous version
 didn't have the necessary components to build the R documentation.

 FAQ


 =
 How can I help test this release?
 =

 If you are a Spark user, you can help us test this release by taking
 an existing Spark workload and running on this release candidate, then
 reporting any regressions.

 If you're working in PySpark you can set up a virtual env and install
 the current RC and see if anything important breaks, in the Java/Scala
 you can add the staging repository to your projects resolvers and test
 with the RC (make sure to clean up the artifact cache before/after so
 you don't end up building with an out of date RC going forward).

 ===
 What should happen to JIRA tickets still targeting 3.1.3?
 ===

 The current list of open tickets targeted at 3.1.3 can be found at:
 https://issues.apache.org/jira/projects/SPARK and search for "Target
 Version/s" = 3.1.3

 Committers should look at those and triage. Extremely important bug
 fixes, documentation, and API tweaks that impact compatibility should
 be worked on immediately. Everything else please retarget to an
 appropriate release.

 ==
 But my bug isn't fixed?
 ==

 In order to make timely releases, we will typically not hold the
 release unless the bug in question is a regression from the previous
 release. That being said, if there is something that is a regression
 that has not been correctly targeted please ping me or a committer to
 help target the issue.

 Note: I added an extra day to the vote since I know some folks are
 likely busy on the 14th with partner(s).


 --
 Twitter: https://twitter.com/holdenkarau
 Books (Learning Spark, High Performance Spark, etc.):
 https://amzn.to/2MaRAG9  
 YouTube Live Streams: https://www.youtube.com/user/holdenkarau

>>>


Re: [ANNOUNCE] Apache Spark 3.2.1 released

2022-01-29 Thread Gengliang Wang
Thanks to Huaxin for driving the release!

Fengyu, this is a known issue that will be fixed in the 3.3 release.
Currently, the "hadoop3.2" suffix means Hadoop 3.2 or higher. See the thread
https://lists.apache.org/thread/yov8xsggo3g2qr2p1rrr2xtps25wkbvj for
more details.


On Sat, Jan 29, 2022 at 3:26 PM FengYu Cao  wrote:

> https://spark.apache.org/downloads.html
>
> *2. Choose a package type:* menu shows that Pre-built for Hadoop 3.3
>
> but download link is *spark-3.2.1-bin-hadoop3.2.tgz*
>
> need an update?
>
> L. C. Hsieh  于2022年1月29日周六 14:26写道:
>
>> Thanks Huaxin for the 3.2.1 release!
>>
>> On Fri, Jan 28, 2022 at 10:14 PM Dongjoon Hyun 
>> wrote:
>> >
>> > Thank you again, Huaxin!
>> >
>> > Dongjoon.
>> >
>> > On Fri, Jan 28, 2022 at 6:23 PM DB Tsai  wrote:
>> >>
>> >> Thank you, Huaxin for the 3.2.1 release!
>> >>
>> >> Sent from my iPhone
>> >>
>> >> On Jan 28, 2022, at 5:45 PM, Chao Sun  wrote:
>> >>
>> >> 
>> >> Thanks Huaxin for driving the release!
>> >>
>> >> On Fri, Jan 28, 2022 at 5:37 PM Ruifeng Zheng 
>> wrote:
>> >>>
>> >>> It's Great!
>> >>> Congrats and thanks, huaxin!
>> >>>
>> >>>
>> >>> -- Original Message --
>> >>> From: "huaxin gao" ;
>> >>> Sent: Saturday, January 29, 2022, 9:07 AM
>> >>> To: "dev";"user";
>> >>> Subject: [ANNOUNCE] Apache Spark 3.2.1 released
>> >>>
>> >>> We are happy to announce the availability of Spark 3.2.1!
>> >>>
>> >>> Spark 3.2.1 is a maintenance release containing stability fixes. This
>> >>> release is based on the branch-3.2 maintenance branch of Spark. We
>> strongly
>> >>> recommend all 3.2 users to upgrade to this stable release.
>> >>>
>> >>> To download Spark 3.2.1, head over to the download page:
>> >>> https://spark.apache.org/downloads.html
>> >>>
>> >>> To view the release notes:
>> >>> https://spark.apache.org/releases/spark-release-3-2-1.html
>> >>>
>> >>> We would like to acknowledge all community members for contributing
>> to this
>> >>> release. This release would not have been possible without you.
>> >>>
>> >>> Huaxin Gao
>>
>> -
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>
>>
>
> --
> *camper42 (曹丰宇)*
> Douban, Inc.
>
> Mobile: +86 15691996359
> E-mail:  camper.x...@gmail.com
>


Re: [VOTE] Release Spark 3.2.1 (RC2)

2022-01-24 Thread Gengliang Wang
+1 (non-binding)

On Mon, Jan 24, 2022 at 6:26 PM Dongjoon Hyun 
wrote:

> +1
>
> Dongjoon.
>
> On Sat, Jan 22, 2022 at 7:19 AM Mridul Muralidharan 
> wrote:
>
>>
>> +1
>>
>> Signatures, digests, etc check out fine.
>> Checked out tag and build/tested with -Pyarn -Pmesos -Pkubernetes
>>
>> Regards,
>> Mridul
>>
>> On Fri, Jan 21, 2022 at 9:01 PM Sean Owen  wrote:
>>
>>> +1 with same result as last time.
>>>
>>> On Thu, Jan 20, 2022 at 9:59 PM huaxin gao 
>>> wrote:
>>>
 Please vote on releasing the following candidate as Apache Spark
 version 3.2.1. The vote is open until 8:00pm Pacific time January 25 and
 passes if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes.

 [ ] +1 Release this package as Apache Spark 3.2.1
 [ ] -1 Do not release this package because ...

 To learn more about Apache Spark, please see http://spark.apache.org/

 The tag to be voted on is v3.2.1-rc2 (commit
 4f25b3f71238a00508a356591553f2dfa89f8290):
 https://github.com/apache/spark/tree/v3.2.1-rc2

 The release files, including signatures, digests, etc. can be found at:
 https://dist.apache.org/repos/dist/dev/spark/v3.2.1-rc2-bin/

 Signatures used for Spark RCs can be found in this file:
 https://dist.apache.org/repos/dist/dev/spark/KEYS

 The staging repository for this release can be found at:
 https://repository.apache.org/content/repositories/orgapachespark-1398/

 The documentation corresponding to this release can be found at:
 https://dist.apache.org/repos/dist/dev/spark/v3.2.1-rc2-docs/_site/

 The list of bug fixes going into 3.2.1 can be found at the following URL:
 https://s.apache.org/yu0cy

 This release is using the release script of the tag v3.2.1-rc2.

 FAQ

 =
 How can I help test this release?
 =

 If you are a Spark user, you can help us test this release by taking
 an existing Spark workload and running on this release candidate, then
 reporting any regressions.

 If you're working in PySpark you can set up a virtual env and install
 the current RC and see if anything important breaks, in the Java/Scala
 you can add the staging repository to your projects resolvers and test
 with the RC (make sure to clean up the artifact cache before/after so
 you don't end up building with an out-of-date RC going forward).

 ===
 What should happen to JIRA tickets still targeting 3.2.1?
 ===

 The current list of open tickets targeted at 3.2.1 can be found at:
 https://issues.apache.org/jira/projects/SPARK and search for "Target
 Version/s" = 3.2.1

 Committers should look at those and triage. Extremely important bug
 fixes, documentation, and API tweaks that impact compatibility should
 be worked on immediately. Everything else please retarget to an
 appropriate release.

 ==
 But my bug isn't fixed?
 ==

 In order to make timely releases, we will typically not hold the
 release unless the bug in question is a regression from the previous
 release. That being said, if there is something that is a regression
 that has not been correctly targeted please ping me or a committer to
 help target the issue.

>>>


Re: [Apache Spark Jenkins] build system shutting down Dec 23th, 2021

2021-12-07 Thread Gengliang Wang
Thanks for the works, Shane!

On Wed, Dec 8, 2021 at 9:19 AM shane knapp ☠  wrote:

> created an issue to track stuff:
>
> https://issues.apache.org/jira/browse/SPARK-37571
>
> On Tue, Dec 7, 2021 at 8:25 AM shane knapp ☠  wrote:
>
>> Will you be nuking all the Jenkins-related code in the repo after the
>>> 23rd?
>>>
>>> probably not right away...  but soon after jenkins is shut down.  bits
>> of the docs and spark website will need to be updated as well.
>>
>> shane
>> --
>> Shane Knapp
>> Computer Guy / Voice of Reason
>> UC Berkeley EECS Research / RISELab Staff Technical Lead
>> https://rise.cs.berkeley.edu
>>
>
>
> --
> Shane Knapp
> Computer Guy / Voice of Reason
> UC Berkeley EECS Research / RISELab Staff Technical Lead
> https://rise.cs.berkeley.edu
>


Re: Time for Spark 3.2.1?

2021-12-07 Thread Gengliang Wang
+1 for new maintenance releases for all 3.x branches as well.

On Wed, Dec 8, 2021 at 8:19 AM Hyukjin Kwon  wrote:

> SGTM!
>
> On Wed, 8 Dec 2021 at 09:07, huaxin gao  wrote:
>
>> I prefer to start rolling the release in January if there is no need to
>> publish it sooner :)
>>
>> On Tue, Dec 7, 2021 at 3:59 PM Hyukjin Kwon  wrote:
>>
>>> Oh BTW, I realised that it's a holiday season soon this month including
>>> Christmas and new year.
>>> Shall we maybe start rolling the release around next January? I would
>>> leave it to @huaxin gao  :-).
>>>
>>> On Wed, 8 Dec 2021 at 06:19, Dongjoon Hyun 
>>> wrote:
>>>
 +1 for new releases.

 Dongjoon.

 On Mon, Dec 6, 2021 at 8:51 PM Wenchen Fan  wrote:

> +1 to make new maintenance releases for all 3.x branches.
>
> On Tue, Dec 7, 2021 at 8:57 AM Sean Owen  wrote:
>
>> Always fine by me if someone wants to roll a release.
>>
>> It's been ~6 months since the last 3.0.x and 3.1.x releases, too; a
>> new release of those wouldn't hurt either, if any of our release managers
>> have the time or inclination. 3.0.x is reaching unofficial end-of-life
>> around now anyway.
>>
>>
>> On Mon, Dec 6, 2021 at 6:55 PM Hyukjin Kwon 
>> wrote:
>>
>>> Hi all,
>>>
>>> It's been two months since Spark 3.2.0 release, and we have resolved
>>> many bug fixes and regressions. What do you guys think about rolling 
>>> Spark
>>> 3.2.1 release?
>>>
>>> cc @huaxin gao  FYI who I happened to
>>> overhear that is interested in rolling the maintenance release :-).
>>>
>>


Re: [VOTE] SPIP: Row-level operations in Data Source V2

2021-11-16 Thread Gengliang Wang
+1 (non-binding)

On Tue, Nov 16, 2021 at 9:03 PM Wenchen Fan  wrote:

> +1
>
> On Mon, Nov 15, 2021 at 2:54 AM John Zhuge  wrote:
>
>> +1 (non-binding)
>>
>> On Sun, Nov 14, 2021 at 10:33 AM Chao Sun  wrote:
>>
>>> +1 (non-binding). Thanks Anton for the work!
>>>
>>> On Sun, Nov 14, 2021 at 10:01 AM Ryan Blue  wrote:
>>>
 +1

 Thanks to Anton for all this great work!

 On Sat, Nov 13, 2021 at 8:24 AM Mich Talebzadeh <
 mich.talebza...@gmail.com> wrote:

> +1 non-binding
>
>
>
>view my Linkedin profile
> 
>
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for
> any loss, damage or destruction of data or any other property which may
> arise from relying on this email's technical content is explicitly
> disclaimed. The author will in no case be liable for any monetary damages
> arising from such loss, damage or destruction.
>
>
>
>
> On Sat, 13 Nov 2021 at 15:07, Russell Spitzer <
> russell.spit...@gmail.com> wrote:
>
>> +1 (never binding)
>>
>> On Sat, Nov 13, 2021 at 1:10 AM Dongjoon Hyun <
>> dongjoon.h...@gmail.com> wrote:
>>
>>> +1
>>>
>>> On Fri, Nov 12, 2021 at 6:58 PM huaxin gao 
>>> wrote:
>>>
 +1

 On Fri, Nov 12, 2021 at 6:44 PM Yufei Gu 
 wrote:

> +1
>
> > On Nov 12, 2021, at 6:25 PM, L. C. Hsieh 
> wrote:
> >
> > Hi all,
> >
> > I’d like to start a vote for SPIP: Row-level operations in Data
> Source V2.
> >
> > The proposal is to add support for executing row-level operations
> > such as DELETE, UPDATE, MERGE for v2 tables (SPARK-35801). The
> > execution should be the same across data sources and the best
> way to do
> > that is to implement it in Spark.
> >
> > Right now, Spark can only parse and to some extent analyze
> DELETE, UPDATE,
> > MERGE commands. Data sources that support row-level changes have
> to build
> > custom Spark extensions to execute such statements. The goal of
> this effort
> > is to come up with a flexible and easy-to-use API that will work
> across
> > data sources.
> >
> > Please also refer to:
> >
> >   - Previous discussion in dev mailing list: [DISCUSS] SPIP:
> > Row-level operations in Data Source V2
> >   <
> https://lists.apache.org/thread/kd8qohrk5h3qx8d6y4lhrm67vnn8p6bv>
> >
> >   - JIRA: SPARK-35801 <
> https://issues.apache.org/jira/browse/SPARK-35801>
> >   - PR for handling DELETE statements:
> > 
> >
> >   - Design doc
> > <
> https://docs.google.com/document/d/12Ywmc47j3l2WF4anG5vL4qlrhT2OKigb7_EbIKhxg60/
> >
> >
> > Please vote on the SPIP for the next 72 hours:
> >
> > [ ] +1: Accept the proposal as an official SPIP
> > [ ] +0
> > [ ] -1: I don’t think this is a good idea because …
> >
> >
> -
> > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> >
>
>
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>

 --
 Ryan Blue
 Tabular

>>> --
>> John Zhuge
>>
>


Re: Update Spark 3.3 release window?

2021-10-28 Thread Gengliang Wang
+1, Mid-March 2022 sounds good.

Gengliang

On Thu, Oct 28, 2021 at 10:54 PM Tom Graves 
wrote:

> +1 for updating, mid march sounds good.  I'm also fine with EOL 2.x.
>
> Tom
>
> On Thursday, October 28, 2021, 09:37:00 AM CDT, Mridul Muralidharan <
> mri...@gmail.com> wrote:
>
>
>
> +1 to EOL 2.x
> Mid march sounds like a good placeholder for 3.3.
>
> Regards,
> Mridul
>
> On Wed, Oct 27, 2021 at 10:38 PM Sean Owen  wrote:
>
> Seems fine to me - as good a placeholder as anything.
> Would that be about time to call 2.x end-of-life?
>
> On Wed, Oct 27, 2021 at 9:36 PM Hyukjin Kwon  wrote:
>
> Hi all,
>
> Spark 3.2. is out. Shall we update the release window
> https://spark.apache.org/versioning-policy.html?
> I am thinking of Mid March 2022 (5 months after the 3.2 release) for code
> freeze and onward.
>
>


Re: [ANNOUNCE] Apache Spark 3.2.0

2021-10-19 Thread Gengliang Wang
Hi Prasad,

Thanks for reporting the issue. The link was wrong. It should be fixed now.
Could you try again on https://spark.apache.org/downloads.html?

On Tue, Oct 19, 2021 at 10:53 PM Prasad Paravatha <
prasad.parava...@gmail.com> wrote:

>
> https://www.apache.org/dyn/closer.lua/spark/spark-3.2.0/spark-3.2.0-bin-hadoop3.3.tgz
>
> FYI, unable to download from this location.
> Also, I don’t see Hadoop 3.3 version in the dist
>
>
> On Oct 19, 2021, at 9:39 AM, Bode, Meikel, NMA-CFD <
> meikel.b...@bertelsmann.de> wrote:
>
> 
>
> Many thanks! 
>
>
>
> *From:* Gengliang Wang 
> *Sent:* Dienstag, 19. Oktober 2021 16:16
> *To:* dev ; user 
> *Subject:* [ANNOUNCE] Apache Spark 3.2.0
>
>
>
> Hi all,
>
>
>
> Apache Spark 3.2.0 is the third release of the 3.x line. With tremendous
> contribution from the open-source community, this release managed to
> resolve in excess of 1,700 Jira tickets.
>
>
>
> We'd like to thank our contributors and users for their contributions and
> early feedback to this release. This release would not have been possible
> without you.
>
>
>
> To download Spark 3.2.0, head over to the download page:
> https://spark.apache.org/downloads.html
>
>
>
> To view the release notes:
> https://spark.apache.org/releases/spark-release-3-2-0.html
>
>


[ANNOUNCE] Apache Spark 3.2.0

2021-10-19 Thread Gengliang Wang
Hi all,

Apache Spark 3.2.0 is the third release of the 3.x line. With tremendous
contribution from the open-source community, this release managed to
resolve in excess of 1,700 Jira tickets.

We'd like to thank our contributors and users for their contributions and
early feedback to this release. This release would not have been possible
without you.

To download Spark 3.2.0, head over to the download page:
https://spark.apache.org/downloads.html

To view the release notes:
https://spark.apache.org/releases/spark-release-3-2-0.html


Re: [VOTE][RESULT] Release Spark 3.2.0 (RC7)

2021-10-14 Thread Gengliang Wang
Hi all,

FYI the size of the PySpark tarball exceeds the file size limit of PyPI. I
am still waiting for the issue
https://github.com/pypa/pypi-support/issues/1374 to be resolved.

Gengliang

On Tue, Oct 12, 2021 at 3:26 PM Bode, Meikel, NMA-CFD <
meikel.b...@bertelsmann.de> wrote:

> Yes. Gengliang. Many thanks.
>
>
>
> *From:* Mich Talebzadeh 
> *Sent:* Dienstag, 12. Oktober 2021 09:25
> *To:* Gengliang Wang 
> *Cc:* dev 
> *Subject:* Re: [VOTE][RESULT] Release Spark 3.2.0 (RC7)
>
>
>
> great work Gengliang. Thanks for your tremendous contribution!
>
>
>
>
>
>view my Linkedin profile
>
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
>
>
>
>
> On Tue, 12 Oct 2021 at 08:15, Gengliang Wang  wrote:
>
> The vote passes with 28 +1s (10 binding +1s).
> Thanks to all who helped with the release!
>
>
>
> (* = binding)
> +1:
>
> - Gengliang Wang
>
> - Michael Heuer
>
> - Mridul Muralidharan *
>
> - Sean Owen *
>
> - Ruifeng Zheng
>
> - Dongjoon Hyun *
>
> - Yuming Wang
>
> - Reynold Xin *
>
> - Cheng Su
>
> - Peter Toth
>
> - Mich Talebzadeh
>
> - Maxim Gekk
>
> - Chao Sun
>
> - Xinli Shang
>
> - Huaxin Gao
>
> - Kent Yao
>
> - Liang-Chi Hsieh *
>
> - Kousuke Saruta *
>
> - Ye Zhou
>
> - Cheng Pan
>
> - Angers Zhu
>
> - Wenchen Fan *
>
> - Holden Karau *
>
> - Yi Wu
>
> - Ricardo Almeida
>
> - DB Tsai *
>
> - Thomas Graves *
>
> - Terry Kim
>
>
>
> +0: None
>
> -1: None
>
>


[VOTE][RESULT] Release Spark 3.2.0 (RC7)

2021-10-12 Thread Gengliang Wang
The vote passes with 28 +1s (10 binding +1s).
Thanks to all who helped with the release!

(* = binding)
+1:
- Gengliang Wang
- Michael Heuer
- Mridul Muralidharan *
- Sean Owen *
- Ruifeng Zheng
- Dongjoon Hyun *
- Yuming Wang
- Reynold Xin *
- Cheng Su
- Peter Toth
- Mich Talebzadeh
- Maxim Gekk
- Chao Sun
- Xinli Shang
- Huaxin Gao
- Kent Yao
- Liang-Chi Hsieh *
- Kousuke Saruta *
- Ye Zhou
- Cheng Pan
- Angers Zhu
- Wenchen Fan *
- Holden Karau *
- Yi Wu
- Ricardo Almeida
- DB Tsai *
- Thomas Graves *
- Terry Kim

+0: None

-1: None


Please take a look at the draft of the Spark 3.2.0 release notes

2021-10-08 Thread Gengliang Wang
Hi all,

I am preparing to publish and announce Spark 3.2.0
This is the draft of the release note, and I plan to edit a bit more and
use it as the final release note.
Please take a look and let me know if I missed any major changes or
something.

https://docs.google.com/document/d/1Wvc7K2ep96HeGFOa4gsSUDhpCTj7U7p8EVRCj8dcjM0/edit?usp=sharing

Thanks
Gengliang


Re: [VOTE] Release Spark 3.2.0 (RC7)

2021-10-06 Thread Gengliang Wang
Starting with my +1(non-binding)

Thanks,
Gengliang

On Thu, Oct 7, 2021 at 12:48 AM Gengliang Wang  wrote:

> Please vote on releasing the following candidate as
> Apache Spark version 3.2.0.
>
> The vote is open until 11:59pm Pacific time October 11 and passes if a
> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 3.2.0
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v3.2.0-rc7 (commit
> 5d45a415f3a29898d92380380cfd82bfc7f579ea):
> https://github.com/apache/spark/tree/v3.2.0-rc7
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc7-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1394
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc7-docs/
>
> The list of bug fixes going into 3.2.0 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12349407
>
> This release is using the release script of the tag v3.2.0-rc7.
>
>
> FAQ
>
> =
> How can I help test this release?
> =
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install
> the current RC and see if anything important breaks, in the Java/Scala
> you can add the staging repository to your project's resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with an out-of-date RC going forward).
>
> ===
> What should happen to JIRA tickets still targeting 3.2.0?
> ===
> The current list of open tickets targeted at 3.2.0 can be found at:
> https://issues.apache.org/jira/projects/SPARK and search for "Target
> Version/s" = 3.2.0
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
>
> ==
> But my bug isn't fixed?
> ==
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.
>


[VOTE] Release Spark 3.2.0 (RC7)

2021-10-06 Thread Gengliang Wang
Please vote on releasing the following candidate as
Apache Spark version 3.2.0.

The vote is open until 11:59pm Pacific time October 11 and passes if a
majority +1 PMC votes are cast, with a minimum of 3 +1 votes.

[ ] +1 Release this package as Apache Spark 3.2.0
[ ] -1 Do not release this package because ...
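For anyone tallying the thread, the passing rule stated above can be sketched as follows. This is only an illustration of the rule as worded in this email (majority of binding +1 over -1 votes, with at least three +1s), not official ASF voting policy:

```python
def vote_passes(pmc_plus_ones: int, pmc_minus_ones: int) -> bool:
    """Sketch of the release-vote rule as stated in this thread:
    the vote passes if binding +1 votes outnumber binding -1 votes
    and there are at least three +1 votes."""
    return pmc_plus_ones >= 3 and pmc_plus_ones > pmc_minus_ones

# e.g. 4 binding +1s against 1 binding -1 passes; 2 +1s alone does not.
```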

To learn more about Apache Spark, please see http://spark.apache.org/

The tag to be voted on is v3.2.0-rc7 (commit
5d45a415f3a29898d92380380cfd82bfc7f579ea):
https://github.com/apache/spark/tree/v3.2.0-rc7

The release files, including signatures, digests, etc. can be found at:
https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc7-bin/
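Apache release staging directories conventionally publish a `.sha512` sidecar file next to each artifact; assuming that layout, a downloaded tarball's digest can be checked with a small stdlib-only sketch like this before testing the RC:

```python
import hashlib

def sha512_hex(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file and return its SHA-512 hex digest, for comparison
    against the .sha512 sidecar published next to the release artifact."""
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        while block := f.read(chunk_size):
            digest.update(block)
    return digest.hexdigest()
```

Compare the result against the digest string in the downloaded `.sha512` file; verifying the GPG signature against the KEYS file below is a separate, additional step.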

Signatures used for Spark RCs can be found in this file:
https://dist.apache.org/repos/dist/dev/spark/KEYS

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1394

The documentation corresponding to this release can be found at:
https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc7-docs/

The list of bug fixes going into 3.2.0 can be found at the following URL:
https://issues.apache.org/jira/projects/SPARK/versions/12349407

This release is using the release script of the tag v3.2.0-rc7.


FAQ

=
How can I help test this release?
=
If you are a Spark user, you can help us test this release by taking
an existing Spark workload and running on this release candidate, then
reporting any regressions.

If you're working in PySpark you can set up a virtual env and install
the current RC and see if anything important breaks, in the Java/Scala
you can add the staging repository to your project's resolvers and test
with the RC (make sure to clean up the artifact cache before/after so
you don't end up building with an out-of-date RC going forward).

===
What should happen to JIRA tickets still targeting 3.2.0?
===
The current list of open tickets targeted at 3.2.0 can be found at:
https://issues.apache.org/jira/projects/SPARK and search for "Target
Version/s" = 3.2.0

Committers should look at those and triage. Extremely important bug
fixes, documentation, and API tweaks that impact compatibility should
be worked on immediately. Everything else please retarget to an
appropriate release.

==
But my bug isn't fixed?
==
In order to make timely releases, we will typically not hold the
release unless the bug in question is a regression from the previous
release. That being said, if there is something which is a regression
that has not been correctly targeted please ping me or a committer to
help target the issue.


Re: [VOTE] Release Spark 3.2.0 (RC6)

2021-10-01 Thread Gengliang Wang
Hi all,

Thanks for testing this RC and the votes.
Since Mridul created SPARK-36892
<https://issues.apache.org/jira/browse/SPARK-36892> as a blocker ticket,
this RC fails.
As per Mridul and Min, the LinkedIn Spark team is testing Spark 3.2.0 RC
with the push-based shuffle feature enabled this week. Thus, I will start
RC7 after their tests are completed and the known blockers are resolved,
probably next week.

Gengliang

On Fri, Oct 1, 2021 at 2:26 AM Shardul Mahadik  wrote:

> I ran into https://issues.apache.org/jira/browse/SPARK-36905 when testing
> on some views in our organization. This used to work in 3.1.1. Should this
> be an RC blocker?
>
> On 2021/09/30 11:35:28, Jacek Laskowski  wrote:
> > Hi,
> >
> > I don't want to hijack the voting thread but given I faced
> > https://issues.apache.org/jira/browse/SPARK-36904 in RC6 I wonder if
> it's
> > -1.
> >
> > Pozdrawiam,
> > Jacek Laskowski
> > 
> > https://about.me/JacekLaskowski
> > "The Internals Of" Online Books <https://books.japila.pl/>
> > Follow me on https://twitter.com/jaceklaskowski
> >
> > <https://twitter.com/jaceklaskowski>
> >
> >
> > On Wed, Sep 29, 2021 at 10:28 PM Mridul Muralidharan 
> > wrote:
> >
> > >
> > > Yi Wu helped identify an issue
> > > <https://issues.apache.org/jira/browse/SPARK-36892> which causes
> > > correctness (duplication) and hangs - waiting for validation to
> complete
> > > before submitting a patch.
> > >
> > > Regards,
> > > Mridul
> > >
> > > On Wed, Sep 29, 2021 at 11:34 AM Holden Karau 
> > > wrote:
> > >
> > >> PySpark smoke tests pass, I'm going to do a last pass through the
> JIRAs
> > >> before my vote though.
> > >>
> > >> On Wed, Sep 29, 2021 at 8:54 AM Sean Owen  wrote:
> > >>
> > >>> +1 looks good to me as before, now that a few recent issues are
> resolved.
> > >>>
> > >>>
> > >>> On Tue, Sep 28, 2021 at 10:45 AM Gengliang Wang 
> > >>> wrote:
> > >>>
> > >>>> Please vote on releasing the following candidate as
> > >>>> Apache Spark version 3.2.0.
> > >>>>
> > >>>> The vote is open until 11:59pm Pacific time September 30 and passes
> if
> > >>>> a majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
> > >>>>
> > >>>> [ ] +1 Release this package as Apache Spark 3.2.0
> > >>>> [ ] -1 Do not release this package because ...
> > >>>>
> > >>>> To learn more about Apache Spark, please see
> http://spark.apache.org/
> > >>>>
> > >>>> The tag to be voted on is v3.2.0-rc6 (commit
> > >>>> dde73e2e1c7e55c8e740cb159872e081ddfa7ed6):
> > >>>> https://github.com/apache/spark/tree/v3.2.0-rc6
> > >>>>
> > >>>> The release files, including signatures, digests, etc. can be found
> at:
> > >>>> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc6-bin/
> > >>>>
> > >>>> Signatures used for Spark RCs can be found in this file:
> > >>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
> > >>>>
> > >>>> The staging repository for this release can be found at:
> > >>>>
> https://repository.apache.org/content/repositories/orgapachespark-1393
> > >>>>
> > >>>> The documentation corresponding to this release can be found at:
> > >>>> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc6-docs/
> > >>>>
> > >>>> The list of bug fixes going into 3.2.0 can be found at the following
> > >>>> URL:
> > >>>> https://issues.apache.org/jira/projects/SPARK/versions/12349407
> > >>>>
> > >>>> This release is using the release script of the tag v3.2.0-rc6.
> > >>>>
> > >>>>
> > >>>> FAQ
> > >>>>
> > >>>> =
> > >>>> How can I help test this release?
> > >>>> =
> > >>>> If you are a Spark user, you can help us test this release by taking
> > >>>> an existing Spark workload and running on this release candidate,
> then
> > >>>> reporting any regressions.
> > >>>>
> >

Re: [VOTE] Release Spark 3.2.0 (RC6)

2021-09-28 Thread Gengliang Wang
Starting with my +1(non-binding)

Thanks,
Gengliang

On Tue, Sep 28, 2021 at 11:45 PM Gengliang Wang  wrote:

> Please vote on releasing the following candidate as
> Apache Spark version 3.2.0.
>
> The vote is open until 11:59pm Pacific time September 30 and passes if a
> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 3.2.0
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v3.2.0-rc6 (commit
> dde73e2e1c7e55c8e740cb159872e081ddfa7ed6):
> https://github.com/apache/spark/tree/v3.2.0-rc6
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc6-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1393
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc6-docs/
>
> The list of bug fixes going into 3.2.0 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12349407
>
> This release is using the release script of the tag v3.2.0-rc6.
>
>
> FAQ
>
> =
> How can I help test this release?
> =
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install
> the current RC and see if anything important breaks, in the Java/Scala
> you can add the staging repository to your project's resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with an out-of-date RC going forward).
>
> ===
> What should happen to JIRA tickets still targeting 3.2.0?
> ===
> The current list of open tickets targeted at 3.2.0 can be found at:
> https://issues.apache.org/jira/projects/SPARK and search for "Target
> Version/s" = 3.2.0
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
>
> ==
> But my bug isn't fixed?
> ==
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.
>
>


[VOTE] Release Spark 3.2.0 (RC6)

2021-09-28 Thread Gengliang Wang
Please vote on releasing the following candidate as
Apache Spark version 3.2.0.

The vote is open until 11:59pm Pacific time September 30 and passes if a
majority +1 PMC votes are cast, with a minimum of 3 +1 votes.

[ ] +1 Release this package as Apache Spark 3.2.0
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see http://spark.apache.org/

The tag to be voted on is v3.2.0-rc6 (commit
dde73e2e1c7e55c8e740cb159872e081ddfa7ed6):
https://github.com/apache/spark/tree/v3.2.0-rc6

The release files, including signatures, digests, etc. can be found at:
https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc6-bin/

Signatures used for Spark RCs can be found in this file:
https://dist.apache.org/repos/dist/dev/spark/KEYS

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1393

The documentation corresponding to this release can be found at:
https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc6-docs/

The list of bug fixes going into 3.2.0 can be found at the following URL:
https://issues.apache.org/jira/projects/SPARK/versions/12349407

This release is using the release script of the tag v3.2.0-rc6.


FAQ

=
How can I help test this release?
=
If you are a Spark user, you can help us test this release by taking
an existing Spark workload and running on this release candidate, then
reporting any regressions.

If you're working in PySpark you can set up a virtual env and install
the current RC and see if anything important breaks, in the Java/Scala
you can add the staging repository to your project's resolvers and test
with the RC (make sure to clean up the artifact cache before/after so
you don't end up building with an out-of-date RC going forward).

===
What should happen to JIRA tickets still targeting 3.2.0?
===
The current list of open tickets targeted at 3.2.0 can be found at:
https://issues.apache.org/jira/projects/SPARK and search for "Target
Version/s" = 3.2.0

Committers should look at those and triage. Extremely important bug
fixes, documentation, and API tweaks that impact compatibility should
be worked on immediately. Everything else please retarget to an
appropriate release.

==
But my bug isn't fixed?
==
In order to make timely releases, we will typically not hold the
release unless the bug in question is a regression from the previous
release. That being said, if there is something which is a regression
that has not been correctly targeted please ping me or a committer to
help target the issue.


Re: [VOTE] Release Spark 3.2.0 (RC5)

2021-09-28 Thread Gengliang Wang
Hi all,

As this RC has multiple minor issues, I have decided to mark this vote as
failed and start building RC6 now.

On Tue, Sep 28, 2021 at 2:20 PM Chao Sun  wrote:

> Looks like it's related to https://github.com/apache/spark/pull/34085. I
> filed https://issues.apache.org/jira/browse/SPARK-36873 to fix it.
>
> On Mon, Sep 27, 2021 at 6:00 PM Chao Sun  wrote:
>
>> Thanks. Trying it on my local machine now but it will probably take a
>> while. I think https://github.com/apache/spark/pull/34085 is more likely
>> to be relevant, but I don't yet have a clue how it could cause the issue.
>> Spark CI also passed for these.
>>
>> On Mon, Sep 27, 2021 at 5:29 PM Sean Owen  wrote:
>>
>>> I'm building and testing with
>>>
>>> mvn -Phadoop-3.2 -Phive -Phive-2.3 -Phive-thriftserver -Pkinesis-asl
>>> -Pkubernetes -Pmesos -Pnetlib-lgpl -Pscala-2.12 -Pspark-ganglia-lgpl
>>> -Psparkr -Pyarn ...
>>>
>>> I did a '-DskipTests clean install' and then 'test'; the problem arises
>>> only in 'test'.
>>>
>>> On Mon, Sep 27, 2021 at 6:58 PM Chao Sun  wrote:
>>>
>>>> Hmm it may be related to the commit. Sean: how do I reproduce this?
>>>>
>>>> On Mon, Sep 27, 2021 at 4:56 PM Sean Owen  wrote:
>>>>
>>>>> Another "is anyone else seeing this"? in compiling common/yarn-network:
>>>>>
>>>>> [ERROR] [Error]
>>>>> /mnt/data/testing/spark-3.2.0/common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java:32:
>>>>> package com.google.common.annotations does not exist
>>>>> [ERROR] [Error]
>>>>> /mnt/data/testing/spark-3.2.0/common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java:33:
>>>>> package com.google.common.base does not exist
>>>>> [ERROR] [Error]
>>>>> /mnt/data/testing/spark-3.2.0/common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java:34:
>>>>> package com.google.common.collect does not exist
>>>>> ...
>>>>>
> >>>>> I didn't see this in RC4, so I wonder if a recent change affected
>>>>> something, but there are barely any changes since RC4. Anything touching
>>>>> YARN or Guava maybe, like:
>>>>>
>>>>> https://github.com/apache/spark/commit/540e45c3cc7c64e37aa5c1673c03a0f2d7462878
>>>>> ?
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Sep 27, 2021 at 7:56 AM Gengliang Wang 
>>>>> wrote:
>>>>>
>>>>>> Please vote on releasing the following candidate as
>>>>>> Apache Spark version 3.2.0.
>>>>>>
>>>>>> The vote is open until 11:59pm Pacific time September 29 and passes
>>>>>> if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>>>>>
>>>>>> [ ] +1 Release this package as Apache Spark 3.2.0
>>>>>> [ ] -1 Do not release this package because ...
>>>>>>
>>>>>> To learn more about Apache Spark, please see http://spark.apache.org/
>>>>>>
>>>>>> The tag to be voted on is v3.2.0-rc5 (commit
>>>>>> 49aea14c5afd93ae1b9d19b661cc273a557853f5):
>>>>>> https://github.com/apache/spark/tree/v3.2.0-rc5
>>>>>>
>>>>>> The release files, including signatures, digests, etc. can be found
>>>>>> at:
>>>>>> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc5-bin/
>>>>>>
>>>>>> Signatures used for Spark RCs can be found in this file:
>>>>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>>>>>
>>>>>> The staging repository for this release can be found at:
>>>>>> https://repository.apache.org/content/repositories/orgapachespark-1392
>>>>>>
>>>>>> The documentation corresponding to this release can be found at:
>>>>>> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc5-docs/
>>>>>>
>>>>>> The list of bug fixes going into 3.2.0 can be found at the following
>>>>>> URL:
>>>>>> https://issues.apache.org/jira/projects/SPARK/versions/12349407
>>>>>>
>>>>>> This release is using the release script of the tag v3.2.0-rc5.
>>>>>>
>>>>>>
>>>>>> FAQ
>>>

Re: [VOTE] Release Spark 3.2.0 (RC5)

2021-09-27 Thread Gengliang Wang
Hi Kousuke,

I tend to agree with Sean. It only affects macOS developers who build
Spark from the released Spark 3.2 source tarball without setting
JAVA_HOME.
I can mention this one as a known issue in the release note if this vote
passes.

Thanks,
Gengliang

On Mon, Sep 27, 2021 at 11:47 PM sarutak  wrote:

> I think it affects devs but there are some workarounds.
> So, if you all don't think it's necessary for the fix to make 3.2.0, I'm
> OK not to do it.
>
> - Kousuke
>
> > Hm... it does just affect Mac OS (?) and only if you don't have
> > JAVA_HOME set (which people often do set) and only affects build/mvn,
> > vs built-in maven (which people often have installed). Only affects
> > those building. I'm on the fence about whether it blocks 3.2.0, as it
> > doesn't affect downstream users and is easily resolvable.
> >
> > On Mon, Sep 27, 2021 at 10:26 AM sarutak 
> > wrote:
> >
> >> Hi All,
> >>
> >> SPARK-35887 seems to have introduced another issue where building
> >> with build/mvn on macOS gets stuck, and SPARK-36856 will resolve it.
> >> Should we include the fix in 3.2.0?
> >>
> >> - Kousuke
> >>
> >>> Please vote on releasing the following candidate as Apache Spark
> >>> version 3.2.0.
> >>>
> >>> The vote is open until 11:59pm Pacific time September 29 and
> >> passes if
> >>> a majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
> >>>
> >>> [ ] +1 Release this package as Apache Spark 3.2.0
> >>>
> >>> [ ] -1 Do not release this package because ...
> >>>
> >>> To learn more about Apache Spark, please see
> >> http://spark.apache.org/
> >>>
> >>> The tag to be voted on is v3.2.0-rc5 (commit
> >>> 49aea14c5afd93ae1b9d19b661cc273a557853f5):
> >>>
> >>> https://github.com/apache/spark/tree/v3.2.0-rc5
> >>>
> >>> The release files, including signatures, digests, etc. can be
> >> found
> >>> at:
> >>>
> >>> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc5-bin/
> >>>
> >>> Signatures used for Spark RCs can be found in this file:
> >>>
> >>> https://dist.apache.org/repos/dist/dev/spark/KEYS
> >>>
> >>> The staging repository for this release can be found at:
> >>>
> >>>
> >>
> > https://repository.apache.org/content/repositories/orgapachespark-1392
> >>>
> >>> The documentation corresponding to this release can be found at:
> >>>
> >>> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc5-docs/
> >>>
> >>> The list of bug fixes going into 3.2.0 can be found at the
> >> following
> >>> URL:
> >>>
> >>> https://issues.apache.org/jira/projects/SPARK/versions/12349407
> >>>
> >>> This release is using the release script of the tag v3.2.0-rc5.
> >>>
> >>> FAQ
> >>>
> >>> =
> >>>
> >>> How can I help test this release?
> >>>
> >>> =
> >>>
> >>> If you are a Spark user, you can help us test this release by
> >> taking
> >>>
> >>> an existing Spark workload and running on this release candidate,
> >> then
> >>>
> >>> reporting any regressions.
> >>>
> >>> If you're working in PySpark you can set up a virtual env and
> >> install
> >>>
> >>> the current RC and see if anything important breaks, in the
> >> Java/Scala
> >>>
> >>> you can add the staging repository to your project's resolvers and
> >> test
> >>>
> >>> with the RC (make sure to clean up the artifact cache before/after
> >> so
> >>>
> >>> you don't end up building with an out-of-date RC going forward).
> >>>
> >>> ===
> >>>
> >>> What should happen to JIRA tickets still targeting 3.2.0?
> >>>
> >>> ===
> >>>
> >>> The current list of open tickets targeted at 3.2.0 can be found
> >> at:
> >>>
> >>> https://issues.apache.org/jira/projects/SPARK and search for
> >> "Target
> >>> Version/s" = 3.2.0
> >>>
> >>> Committers should look at those and triage. Extremely important
> >> bug
> >>>
> >>> fixes, documentation, and API tweaks that impact compatibility
> >> should
> >>>
> >>> be worked on immediately. Everything else please retarget to an
> >>>
> >>> appropriate release.
> >>>
> >>> ==
> >>>
> >>> But my bug isn't fixed?
> >>>
> >>> ==
> >>>
> >>> In order to make timely releases, we will typically not hold the
> >>>
> >>> release unless the bug in question is a regression from the
> >> previous
> >>>
> >>> release. That being said, if there is something which is a
> >> regression
> >>>
> >>> that has not been correctly targeted please ping me or a committer
> >> to
> >>>
> >>> help target the issue.
> >>
> >>
> > -
> >> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>


Re: [VOTE] Release Spark 3.2.0 (RC5)

2021-09-27 Thread Gengliang Wang
Starting with my +1(non-binding)

Thanks,
Gengliang

On Mon, Sep 27, 2021 at 8:55 PM Gengliang Wang  wrote:

> Please vote on releasing the following candidate as
> Apache Spark version 3.2.0.
>
> The vote is open until 11:59pm Pacific time September 29 and passes if a
> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 3.2.0
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v3.2.0-rc5 (commit
> 49aea14c5afd93ae1b9d19b661cc273a557853f5):
> https://github.com/apache/spark/tree/v3.2.0-rc5
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc5-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1392
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc5-docs/
>
> The list of bug fixes going into 3.2.0 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12349407
>
> This release is using the release script of the tag v3.2.0-rc5.
>
>
> FAQ
>
> =
> How can I help test this release?
> =
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install
> the current RC and see if anything important breaks, in the Java/Scala
> you can add the staging repository to your project's resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with an out-of-date RC going forward).
>
> ===
> What should happen to JIRA tickets still targeting 3.2.0?
> ===
> The current list of open tickets targeted at 3.2.0 can be found at:
> https://issues.apache.org/jira/projects/SPARK and search for "Target
> Version/s" = 3.2.0
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
>
> ==
> But my bug isn't fixed?
> ==
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.
>


[VOTE] Release Spark 3.2.0 (RC5)

2021-09-27 Thread Gengliang Wang
Please vote on releasing the following candidate as
Apache Spark version 3.2.0.

The vote is open until 11:59pm Pacific time September 29 and passes if a
majority +1 PMC votes are cast, with a minimum of 3 +1 votes.

[ ] +1 Release this package as Apache Spark 3.2.0
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see http://spark.apache.org/

The tag to be voted on is v3.2.0-rc5 (commit
49aea14c5afd93ae1b9d19b661cc273a557853f5):
https://github.com/apache/spark/tree/v3.2.0-rc5

The release files, including signatures, digests, etc. can be found at:
https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc5-bin/

Signatures used for Spark RCs can be found in this file:
https://dist.apache.org/repos/dist/dev/spark/KEYS

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1392

The documentation corresponding to this release can be found at:
https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc5-docs/

The list of bug fixes going into 3.2.0 can be found at the following URL:
https://issues.apache.org/jira/projects/SPARK/versions/12349407

This release is using the release script of the tag v3.2.0-rc5.


FAQ

=
How can I help test this release?
=
If you are a Spark user, you can help us test this release by taking
an existing Spark workload and running on this release candidate, then
reporting any regressions.

If you're working in PySpark you can set up a virtual env and install
the current RC and see if anything important breaks, in the Java/Scala
you can add the staging repository to your project's resolvers and test
with the RC (make sure to clean up the artifact cache before/after so
you don't end up building with an out-of-date RC going forward).

===
What should happen to JIRA tickets still targeting 3.2.0?
===
The current list of open tickets targeted at 3.2.0 can be found at:
https://issues.apache.org/jira/projects/SPARK and search for "Target
Version/s" = 3.2.0

Committers should look at those and triage. Extremely important bug
fixes, documentation, and API tweaks that impact compatibility should
be worked on immediately. Everything else please retarget to an
appropriate release.

==
But my bug isn't fixed?
==
In order to make timely releases, we will typically not hold the
release unless the bug in question is a regression from the previous
release. That being said, if there is something which is a regression
that has not been correctly targeted please ping me or a committer to
help target the issue.


Re: [VOTE] Release Spark 3.2.0 (RC4)

2021-09-23 Thread Gengliang Wang
Thank you, Peter.
I will start RC5 today.

On Fri, Sep 24, 2021 at 12:06 AM Peter Toth  wrote:

> Hi All,
>
> Sorry, but I've just run into this issue:
> https://issues.apache.org/jira/browse/SPARK-35672?focusedCommentId=17419285&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17419285
> I think SPARK-35672 is a breaking change.
>
> Peter
>
>
> On Thu, Sep 23, 2021 at 5:32 PM Yi Wu  wrote:
>
>> +1 (non-binding)
>>
>> Thanks for the work, Gengliang!
>>
>> Bests,
>> Yi
>>
>> On Thu, Sep 23, 2021 at 10:03 PM Gengliang Wang  wrote:
>>
>>> Starting with my +1(non-binding)
>>>
>>> Thanks,
>>> Gengliang
>>>
>>> On Thu, Sep 23, 2021 at 10:02 PM Gengliang Wang 
>>> wrote:
>>>
>>>> Please vote on releasing the following candidate as
>>>> Apache Spark version 3.2.0.
>>>>
>>>> The vote is open until 11:59pm Pacific time September 27 and passes if
>>>> a majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>>>
>>>> [ ] +1 Release this package as Apache Spark 3.2.0
>>>> [ ] -1 Do not release this package because ...
>>>>
>>>> To learn more about Apache Spark, please see http://spark.apache.org/
>>>>
>>>> The tag to be voted on is v3.2.0-rc4 (commit
>>>> b609f2fe0c1dd9a7e7b3aedd31ab81e6311b9b3f):
>>>> https://github.com/apache/spark/tree/v3.2.0-rc4
>>>>
>>>> The release files, including signatures, digests, etc. can be found at:
>>>> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc4-bin/
>>>>
>>>> Signatures used for Spark RCs can be found in this file:
>>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>>>
>>>> The staging repository for this release can be found at:
>>>> https://repository.apache.org/content/repositories/orgapachespark-1391
>>>>
>>>> The documentation corresponding to this release can be found at:
>>>> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc4-docs/
>>>>
>>>> The list of bug fixes going into 3.2.0 can be found at the following
>>>> URL:
>>>> https://issues.apache.org/jira/projects/SPARK/versions/12349407
>>>>
>>>> This release is using the release script of the tag v3.2.0-rc4.
>>>>
>>>>
>>>> FAQ
>>>>
>>>> =
>>>> How can I help test this release?
>>>> =
>>>> If you are a Spark user, you can help us test this release by taking
>>>> an existing Spark workload and running on this release candidate, then
>>>> reporting any regressions.
>>>>
>>>> If you're working in PySpark you can set up a virtual env and install
>>>> the current RC and see if anything important breaks, in the Java/Scala
> >>>> you can add the staging repository to your project's resolvers and test
> >>>> with the RC (make sure to clean up the artifact cache before/after so
> >>>> you don't end up building with an out-of-date RC going forward).
>>>>
>>>> ===
>>>> What should happen to JIRA tickets still targeting 3.2.0?
>>>> ===
>>>> The current list of open tickets targeted at 3.2.0 can be found at:
>>>> https://issues.apache.org/jira/projects/SPARK and search for "Target
>>>> Version/s" = 3.2.0
>>>>
>>>> Committers should look at those and triage. Extremely important bug
>>>> fixes, documentation, and API tweaks that impact compatibility should
>>>> be worked on immediately. Everything else please retarget to an
>>>> appropriate release.
>>>>
>>>> ==
>>>> But my bug isn't fixed?
>>>> ==
>>>> In order to make timely releases, we will typically not hold the
>>>> release unless the bug in question is a regression from the previous
>>>> release. That being said, if there is something which is a regression
>>>> that has not been correctly targeted please ping me or a committer to
>>>> help target the issue.
>>>>
>>>


Re: [VOTE] Release Spark 3.2.0 (RC4)

2021-09-23 Thread Gengliang Wang
Starting with my +1(non-binding)

Thanks,
Gengliang

On Thu, Sep 23, 2021 at 10:02 PM Gengliang Wang  wrote:

> Please vote on releasing the following candidate as
> Apache Spark version 3.2.0.
>
> The vote is open until 11:59pm Pacific time September 27 and passes if a
> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 3.2.0
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v3.2.0-rc4 (commit
> b609f2fe0c1dd9a7e7b3aedd31ab81e6311b9b3f):
> https://github.com/apache/spark/tree/v3.2.0-rc4
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc4-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1391
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc4-docs/
>
> The list of bug fixes going into 3.2.0 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12349407
>
> This release is using the release script of the tag v3.2.0-rc4.
>
>
> FAQ
>
> =
> How can I help test this release?
> =
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install
> the current RC and see if anything important breaks; in Java/Scala,
> you can add the staging repository to your project's resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with an out-of-date RC going forward).
>
> ===
> What should happen to JIRA tickets still targeting 3.2.0?
> ===
> The current list of open tickets targeted at 3.2.0 can be found at:
> https://issues.apache.org/jira/projects/SPARK and search for "Target
> Version/s" = 3.2.0
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
>
> ==
> But my bug isn't fixed?
> ==
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.
>


[VOTE] Release Spark 3.2.0 (RC4)

2021-09-23 Thread Gengliang Wang
Please vote on releasing the following candidate as
Apache Spark version 3.2.0.

The vote is open until 11:59pm Pacific time September 27 and passes if a
majority +1 PMC votes are cast, with a minimum of 3 +1 votes.

[ ] +1 Release this package as Apache Spark 3.2.0
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see http://spark.apache.org/

The tag to be voted on is v3.2.0-rc4 (commit
b609f2fe0c1dd9a7e7b3aedd31ab81e6311b9b3f):
https://github.com/apache/spark/tree/v3.2.0-rc4

The release files, including signatures, digests, etc. can be found at:
https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc4-bin/

Signatures used for Spark RCs can be found in this file:
https://dist.apache.org/repos/dist/dev/spark/KEYS
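The signature and checksum check mentioned here can be sketched roughly as follows. The tarball below is a local stand-in; in a real check you would first download the artifact, its `.asc` and `.sha512` files from the `v3.2.0-rc4-bin/` URL above.

```shell
# Rough sketch of RC artifact verification. The file created here is a
# stand-in for the real release tarball so the checksum steps can run.
echo "stand-in artifact contents" > spark-3.2.0-bin.tgz
sha512sum spark-3.2.0-bin.tgz > spark-3.2.0-bin.tgz.sha512

# Verify the checksum; exits non-zero on a mismatch.
sha512sum -c spark-3.2.0-bin.tgz.sha512

# Signature check against the KEYS file above (requires the real artifacts):
# curl -O https://dist.apache.org/repos/dist/dev/spark/KEYS
# gpg --import KEYS
# gpg --verify spark-3.2.0-bin.tgz.asc spark-3.2.0-bin.tgz
```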

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1391

The documentation corresponding to this release can be found at:
https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc4-docs/

The list of bug fixes going into 3.2.0 can be found at the following URL:
https://issues.apache.org/jira/projects/SPARK/versions/12349407

This release is using the release script of the tag v3.2.0-rc4.


FAQ

=
How can I help test this release?
=
If you are a Spark user, you can help us test this release by taking
an existing Spark workload and running on this release candidate, then
reporting any regressions.

If you're working in PySpark you can set up a virtual env and install
the current RC and see if anything important breaks; in Java/Scala,
you can add the staging repository to your project's resolvers and test
with the RC (make sure to clean up the artifact cache before/after so
you don't end up building with an out-of-date RC going forward).
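For the Java/Scala path, adding the staging repository can look like the following build.sbt sketch. The resolver URL is the orgapachespark-1391 one from this email; the dependency shown is just an illustrative module.

```scala
// build.sbt sketch: resolve the 3.2.0 RC from the staging repository.
// The resolver URL comes from the vote email; spark-sql is an example module.
resolvers += "Spark 3.2.0 RC staging" at
  "https://repository.apache.org/content/repositories/orgapachespark-1391"

libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.2.0"
```

Removing `~/.ivy2/cache/org.apache.spark` (or the local Maven equivalent) before and after testing avoids building against a stale RC.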

===
What should happen to JIRA tickets still targeting 3.2.0?
===
The current list of open tickets targeted at 3.2.0 can be found at:
https://issues.apache.org/jira/projects/SPARK and search for "Target
Version/s" = 3.2.0
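That search corresponds to a JQL filter along these lines (the resolution clause is an assumption about which tickets are of interest):

```
project = SPARK AND "Target Version/s" = 3.2.0 AND resolution = Unresolved
```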

Committers should look at those and triage. Extremely important bug
fixes, documentation, and API tweaks that impact compatibility should
be worked on immediately. Everything else please retarget to an
appropriate release.

==
But my bug isn't fixed?
==
In order to make timely releases, we will typically not hold the
release unless the bug in question is a regression from the previous
release. That being said, if there is something which is a regression
that has not been correctly targeted please ping me or a committer to
help target the issue.


Re: [VOTE] Release Spark 3.2.0 (RC3)

2021-09-23 Thread Gengliang Wang
Hi All,

Thanks for the votes and suggestions!
Because of the issues above and SPARK-36782
<https://issues.apache.org/jira/browse/SPARK-36782>, I decided to build RC4
and start a new vote now.


On Wed, Sep 22, 2021 at 10:18 AM Venkatakrishnan Sowrirajan <
vsowr...@asu.edu> wrote:

> Yes, that's correct: the failure is observed with both Hadoop-2.7 and
> Hadoop-2.10 (internal use)
>
> On Tue, Sep 21, 2021, 7:15 PM Mridul Muralidharan 
> wrote:
>
>> The failure I observed looks the same as what Venkat mentioned, lz4 tests
>> in FileSuite in core were failing with hadoop-2.7 profile.
>>
>> Regards,
>> Mridul
>>
>> On Tue, Sep 21, 2021 at 7:44 PM Chao Sun  wrote:
>>
>>> Hi Venkata, I'm not aware of the FileSuite test failures. In fact I just
>>> tried it locally on the master branch and the tests are all passing. Could
>>> you provide more details?
>>>
>>> The reason we want to disable the LZ4 test is because it requires the
>>> native LZ4 library when running with Hadoop 2.x, which the Spark CI doesn't
>>> have.
>>>
>>> On Tue, Sep 21, 2021 at 3:46 PM Venkatakrishnan Sowrirajan <
>>> vsowr...@asu.edu> wrote:
>>>
>>>> Hi Chao,
>>>>
>>>> But there are tests failing in core as well, for
>>>> example org.apache.spark.FileSuite. These tests pass in 3.1, so why do
>>>> you think we should disable them for Hadoop versions < 3.x?
>>>>
>>>> Regards
>>>> Venkata krishnan
>>>>
>>>>
>>>> On Tue, Sep 21, 2021 at 3:33 PM Chao Sun  wrote:
>>>>
>>>>> I just created SPARK-36820 for the above LZ4 test issue. Will post a
>>>>> PR there soon.
>>>>>
>>>>> On Tue, Sep 21, 2021 at 2:05 PM Chao Sun  wrote:
>>>>>
>>>>>> Mridul, is the LZ4 failure about Parquet? I think Parquet currently
>>>>>> uses the Hadoop compression codec, while Hadoop 2.7 still depends on the
>>>>>> native lib for LZ4. Maybe we should run the test only with the Hadoop 3.2
>>>>>> profile.
>>>>>>
>>>>>> On Tue, Sep 21, 2021 at 10:08 AM Mridul Muralidharan <
>>>>>> mri...@gmail.com> wrote:
>>>>>>
>>>>>>>
>>>>>>> Signatures, digests, etc check out fine.
>>>>>>> Checked out tag and build/tested with -Pyarn -Pmesos -Pkubernetes,
>>>>>>> this worked fine.
>>>>>>>
>>>>>>> I found that including "-Phadoop-2.7" failed on lz4 tests ("native
>>>>>>> lz4 library not available").
>>>>>>>
>>>>>>> Regards,
>>>>>>> Mridul
>>>>>>>
>>>>>>> On Tue, Sep 21, 2021 at 10:18 AM Gengliang Wang 
>>>>>>> wrote:
>>>>>>>
>>>>>>>> To Stephen: Thanks for pointing that out. I agree with that.
>>>>>>>> To Sean: I made a PR
>>>>>>>> <https://github.com/apache/spark/pull/34059> to
>>>>>>>> remove the test dependency so that we can start RC4 ASAP.
>>>>>>>>
>>>>>>>> Gengliang
>>>>>>>>
>>>>>>>> On Tue, Sep 21, 2021 at 8:14 PM Sean Owen  wrote:
>>>>>>>>
>>>>>>>>> Hm yeah I tend to agree. See
>>>>>>>>> https://github.com/apache/spark/pull/33912
>>>>>>>>> This _is_ a test-only dependency which makes it less of an issue.
>>>>>>>>> I'm guessing it's not in Maven as it's a small one-off utility; we
>>>>>>>>> _could_ just inline the ~100 lines of code in test code instead?
>>>>>>>>>
>>>>>>>>> On Tue, Sep 21, 2021 at 12:33 AM Stephen Coy
>>>>>>>>>  wrote:
>>>>>>>>>
>>>>>>>>>> Hi there,
>>>>>>>>>>
>>>>>>>>>> I was going to -1 this because of the
>>>>

Re: [VOTE] Release Spark 3.2.0 (RC3)

2021-09-21 Thread Gengliang Wang
To Stephen: Thanks for pointing that out. I agree with that.
To Sean: I made a PR <https://github.com/apache/spark/pull/34059> to remove
the test dependency so that we can start RC4 ASAP.

Gengliang

On Tue, Sep 21, 2021 at 8:14 PM Sean Owen  wrote:

> Hm yeah I tend to agree. See https://github.com/apache/spark/pull/33912
> This _is_ a test-only dependency which makes it less of an issue.
> I'm guessing it's not in Maven as it's a small one-off utility; we _could_
> just inline the ~100 lines of code in test code instead?
>
> On Tue, Sep 21, 2021 at 12:33 AM Stephen Coy 
> wrote:
>
>> Hi there,
>>
>> I was going to -1 this because of the
>> com.github.rdblue:brotli-codec:0.1.1 dependency, which is not available on
>> Maven Central, and therefore is not available from our repository manager
>> (Nexus).
>>
>> Historically, most places I have worked have avoided other public Maven
>> repositories because they are not well curated, i.e. artifacts with the same
>> GAV have been known to change over time, which never happens with Maven
>> Central.
>>
>> I know that I can address this by changing my settings.xml file.
>>
>> Anyway, I can see this biting other people so I thought that I would
>> mention it.
>>
>> Steve C
>>
>> On 19 Sep 2021, at 1:18 pm, Gengliang Wang  wrote:
>>
>> Please vote on releasing the following candidate as
>> Apache Spark version 3.2.0.
>>
>> The vote is open until 11:59pm Pacific time September 24 and passes if a
>> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>
>> [ ] +1 Release this package as Apache Spark 3.2.0
>> [ ] -1 Do not release this package because ...
>>
>> To learn more about Apache Spark, please see http://spark.apache.org/
>>
>> The tag to be voted on is v3.2.0-rc3 (commit
>> 96044e97353a079d3a7233ed3795ca82f3d9a101):
>> https://github.com/apache/spark/tree/v3.2.0-rc3
>>
>> The release files, including signatures, digests, etc. can be found at:
>> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc3-bin/
>>
>> Signatures used for Spark RCs can be found in this file:
>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>
>> The staging repository for this release can be found at:
>> https://repository.apache.org/content/repositories/orgapachespark-1390
>>
>> The documentation corresponding to this release can be found at:
>> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc3-docs/

Re: [VOTE] Release Spark 3.2.0 (RC3)

2021-09-18 Thread Gengliang Wang
Starting with my +1 (non-binding)

Thanks,
Gengliang

On Sun, Sep 19, 2021 at 11:18 AM Gengliang Wang  wrote:

> Please vote on releasing the following candidate as
> Apache Spark version 3.2.0.
>
> The vote is open until 11:59pm Pacific time September 24 and passes if a
> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 3.2.0
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v3.2.0-rc3 (commit
> 96044e97353a079d3a7233ed3795ca82f3d9a101):
> https://github.com/apache/spark/tree/v3.2.0-rc3
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc3-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1390
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc3-docs/
>
> The list of bug fixes going into 3.2.0 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12349407
>
> This release is using the release script of the tag v3.2.0-rc3.
>
>
> FAQ
>
> =
> How can I help test this release?
> =
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install
> the current RC and see if anything important breaks; in Java/Scala,
> you can add the staging repository to your project's resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with an out-of-date RC going forward).
>
> ===
> What should happen to JIRA tickets still targeting 3.2.0?
> ===
> The current list of open tickets targeted at 3.2.0 can be found at:
> https://issues.apache.org/jira/projects/SPARK and search for "Target
> Version/s" = 3.2.0
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
>
> ==
> But my bug isn't fixed?
> ==
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.
>


[VOTE] Release Spark 3.2.0 (RC3)

2021-09-18 Thread Gengliang Wang
Please vote on releasing the following candidate as
Apache Spark version 3.2.0.

The vote is open until 11:59pm Pacific time September 24 and passes if a
majority +1 PMC votes are cast, with a minimum of 3 +1 votes.

[ ] +1 Release this package as Apache Spark 3.2.0
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see http://spark.apache.org/

The tag to be voted on is v3.2.0-rc3 (commit
96044e97353a079d3a7233ed3795ca82f3d9a101):
https://github.com/apache/spark/tree/v3.2.0-rc3

The release files, including signatures, digests, etc. can be found at:
https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc3-bin/

Signatures used for Spark RCs can be found in this file:
https://dist.apache.org/repos/dist/dev/spark/KEYS

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1390

The documentation corresponding to this release can be found at:
https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc3-docs/

The list of bug fixes going into 3.2.0 can be found at the following URL:
https://issues.apache.org/jira/projects/SPARK/versions/12349407

This release is using the release script of the tag v3.2.0-rc3.


FAQ

=
How can I help test this release?
=
If you are a Spark user, you can help us test this release by taking
an existing Spark workload and running on this release candidate, then
reporting any regressions.

If you're working in PySpark you can set up a virtual env and install
the current RC and see if anything important breaks; in Java/Scala,
you can add the staging repository to your project's resolvers and test
with the RC (make sure to clean up the artifact cache before/after so
you don't end up building with an out-of-date RC going forward).
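The PySpark setup described here can be sketched as follows. The pip line is commented out because the exact pyspark artifact name under the `v3.2.0-rc3-bin/` directory is an assumption.

```shell
# Sketch of a throwaway virtualenv for RC testing.
python3 -m venv rc3-test-env
. rc3-test-env/bin/activate

# Hypothetical install of the RC's pyspark package from the staging dist area:
# pip install \
#   "https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc3-bin/pyspark-3.2.0.tar.gz"
# Minimal smoke test once installed:
# python -c "from pyspark.sql import SparkSession; \
#            SparkSession.builder.getOrCreate().range(10).count()"

deactivate
```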

===
What should happen to JIRA tickets still targeting 3.2.0?
===
The current list of open tickets targeted at 3.2.0 can be found at:
https://issues.apache.org/jira/projects/SPARK and search for "Target
Version/s" = 3.2.0

Committers should look at those and triage. Extremely important bug
fixes, documentation, and API tweaks that impact compatibility should
be worked on immediately. Everything else please retarget to an
appropriate release.

==
But my bug isn't fixed?
==
In order to make timely releases, we will typically not hold the
release unless the bug in question is a regression from the previous
release. That being said, if there is something which is a regression
that has not been correctly targeted please ping me or a committer to
help target the issue.


Re: [VOTE] Release Spark 3.2.0 (RC2)

2021-09-17 Thread Gengliang Wang
Hi Tom,

I will cut RC3 right after SPARK-36772
<https://issues.apache.org/jira/browse/SPARK-36772> is resolved.

Thanks,
Gengliang

On Fri, Sep 17, 2021 at 10:03 PM Tom Graves  wrote:

> Hey folks,
>
> just curious what the status was on doing an rc3?  I didn't see any
> blockers left since it looks like parquet change got merged.
>
> Thanks,
> Tom
>
> On Thursday, September 9, 2021, 12:27:58 PM CDT, Mridul Muralidharan <
> mri...@gmail.com> wrote:
>
>
>
> I have filed a blocker, SPARK-36705
> <https://issues.apache.org/jira/browse/SPARK-36705> which will need to be
> addressed.
>
> Regards,
> Mridul
>
>
> On Sun, Sep 5, 2021 at 8:47 AM Gengliang Wang  wrote:
>
> Hi all,
>
> the vote has failed.
> Liang-Chi reported a new blocker, SPARK-36669
> <https://issues.apache.org/jira/browse/SPARK-36669>. We will have RC3
> when the existing issues are resolved.
>
>
> On Thu, Sep 2, 2021 at 5:01 AM Sean Owen  wrote:
>
> This RC looks OK to me too, understanding we may need to have RC3 for the
> outstanding issues though.
>
> The issue with the Scala 2.13 POM is still there; I wasn't able to figure
> it out (anyone?), though it may not affect 'normal' usage (and is
> work-around-able in other uses, it seems), so may be sufficient if Scala
> 2.13 support is experimental as of 3.2.0 anyway.
>
>
> On Wed, Sep 1, 2021 at 2:08 AM Gengliang Wang  wrote:
>
> Please vote on releasing the following candidate as
> Apache Spark version 3.2.0.
>
> The vote is open until 11:59pm Pacific time September 3 and passes if a
> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 3.2.0
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v3.2.0-rc2 (commit
> 6bb3523d8e838bd2082fb90d7f3741339245c044):
> https://github.com/apache/spark/tree/v3.2.0-rc2
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc2-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1389
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc2-docs/
>
> The list of bug fixes going into 3.2.0 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12349407
>
> This release is using the release script of the tag v3.2.0-rc2.
>
>
> FAQ
>
> =
> How can I help test this release?
> =
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install
> the current RC and see if anything important breaks; in Java/Scala,
> you can add the staging repository to your project's resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with an out-of-date RC going forward).
>
> ===
> What should happen to JIRA tickets still targeting 3.2.0?
> ===
> The current list of open tickets targeted at 3.2.0 can be found at:
> https://issues.apache.org/jira/projects/SPARK and search for "Target
> Version/s" = 3.2.0
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
>
> ==
> But my bug isn't fixed?
> ==
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.
>
>


Re: [VOTE] Release Spark 3.2.0 (RC2)

2021-09-05 Thread Gengliang Wang
Hi all,

the vote has failed.
Liang-Chi reported a new blocker, SPARK-36669
<https://issues.apache.org/jira/browse/SPARK-36669>. We will have RC3 when
the existing issues are resolved.


On Thu, Sep 2, 2021 at 5:01 AM Sean Owen  wrote:

> This RC looks OK to me too, understanding we may need to have RC3 for the
> outstanding issues though.
>
> The issue with the Scala 2.13 POM is still there; I wasn't able to figure
> it out (anyone?), though it may not affect 'normal' usage (and is
> work-around-able in other uses, it seems), so may be sufficient if Scala
> 2.13 support is experimental as of 3.2.0 anyway.
>
>
> On Wed, Sep 1, 2021 at 2:08 AM Gengliang Wang  wrote:
>
>> Please vote on releasing the following candidate as
>> Apache Spark version 3.2.0.
>>
>> The vote is open until 11:59pm Pacific time September 3 and passes if a
>> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>
>> [ ] +1 Release this package as Apache Spark 3.2.0
>> [ ] -1 Do not release this package because ...
>>
>> To learn more about Apache Spark, please see http://spark.apache.org/
>>
>> The tag to be voted on is v3.2.0-rc2 (commit
>> 6bb3523d8e838bd2082fb90d7f3741339245c044):
>> https://github.com/apache/spark/tree/v3.2.0-rc2
>>
>> The release files, including signatures, digests, etc. can be found at:
>> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc2-bin/
>>
>> Signatures used for Spark RCs can be found in this file:
>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>
>> The staging repository for this release can be found at:
>> https://repository.apache.org/content/repositories/orgapachespark-1389
>>
>> The documentation corresponding to this release can be found at:
>> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc2-docs/
>>
>> The list of bug fixes going into 3.2.0 can be found at the following URL:
>> https://issues.apache.org/jira/projects/SPARK/versions/12349407
>>
>> This release is using the release script of the tag v3.2.0-rc2.
>>
>>
>> FAQ
>>
>> =
>> How can I help test this release?
>> =
>> If you are a Spark user, you can help us test this release by taking
>> an existing Spark workload and running on this release candidate, then
>> reporting any regressions.
>>
>> If you're working in PySpark you can set up a virtual env and install
>> the current RC and see if anything important breaks; in Java/Scala,
>> you can add the staging repository to your project's resolvers and test
>> with the RC (make sure to clean up the artifact cache before/after so
>> you don't end up building with an out-of-date RC going forward).
>>
>> ===
>> What should happen to JIRA tickets still targeting 3.2.0?
>> ===
>> The current list of open tickets targeted at 3.2.0 can be found at:
>> https://issues.apache.org/jira/projects/SPARK and search for "Target
>> Version/s" = 3.2.0
>>
>> Committers should look at those and triage. Extremely important bug
>> fixes, documentation, and API tweaks that impact compatibility should
>> be worked on immediately. Everything else please retarget to an
>> appropriate release.
>>
>> ==
>> But my bug isn't fixed?
>> ==
>> In order to make timely releases, we will typically not hold the
>> release unless the bug in question is a regression from the previous
>> release. That being said, if there is something which is a regression
>> that has not been correctly targeted please ping me or a committer to
>> help target the issue.
>>
>


Re: [VOTE] Release Spark 3.2.0 (RC2)

2021-09-01 Thread Gengliang Wang
Hi all,

After reviewing and testing RC1, the community has fixed multiple bugs and
improved the documentation. Thanks for the efforts, everyone!
Even though there are known issues in RC2 now, we can still test it and
find more potential issues as early as possible.

Changes after RC1

   - Updates AuthEngine to pass the correct SecretKeySpec format
     <https://github.com/apache/spark/commit/243bfafd5cb58c1d3ae6c2a1a9e2c14c3a13526c>
   - [SPARK-36552 <https://issues.apache.org/jira/browse/SPARK-36552>][SQL]
     Fix different behavior for writing char/varchar to hive and datasource table
     <https://github.com/apache/spark/commit/bdd3b490263405a45537b406e20d1877980ab372>
   - [SPARK-36564 <https://issues.apache.org/jira/browse/SPARK-36564>][CORE]
     Fix NullPointerException in LiveRDDDistribution.toApi
     <https://github.com/apache/spark/commit/36df86c0d058977f0f202abd0106881474f18f0e>
   - Revert "[SPARK-34415 <https://issues.apache.org/jira/browse/SPARK-34415>][ML]
     Randomization in hyperparameter optimization"
     <https://github.com/apache/spark/commit/5463caac0d51d850166e09e2a33e55e213ab5752>
   - [SPARK-36398 <https://issues.apache.org/jira/browse/SPARK-36398>][SQL]
     Redact sensitive information in Spark Thrift Server log
     <https://github.com/apache/spark/commit/fb38887e001d33adef519d0288bd0844dcfe2bd5>
   - [SPARK-36594 <https://issues.apache.org/jira/browse/SPARK-36594>][SQL][3.2]
     ORC vectorized reader should properly check maximal number of fields
     <https://github.com/apache/spark/commit/c21303f02c582e97fefc130415e739ddda8dd43e>
   - [SPARK-36509 <https://issues.apache.org/jira/browse/SPARK-36509>][CORE]
     Fix the issue that executors are never re-scheduled if the worker stops
     with standalone cluster
     <https://github.com/apache/spark/commit/93f2b00501c7fad20fb6bc130b548cb87e9f91f1>
   - [SPARK-36367 <https://issues.apache.org/jira/browse/SPARK-36367>] Fix
     the behavior to follow pandas >= 1.3
   - Many documentation improvements


Known Issues after RC2 cut

   - PARQUET-2078 <https://issues.apache.org/jira/browse/PARQUET-2078>: Failed
   to read parquet file after writing with the same parquet version if
   `spark.sql.hive.convertMetastoreParquet` is false
   - SPARK-36629 <https://issues.apache.org/jira/browse/SPARK-36629>:
   Upgrade aircompressor to 1.21


Thanks,
Gengliang

On Wed, Sep 1, 2021 at 3:07 PM Gengliang Wang  wrote:

> Please vote on releasing the following candidate as
> Apache Spark version 3.2.0.
>
> The vote is open until 11:59pm Pacific time September 3 and passes if a
> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 3.2.0
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v3.2.0-rc2 (commit
> 6bb3523d8e838bd2082fb90d7f3741339245c044):
> https://github.com/apache/spark/tree/v3.2.0-rc2
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc2-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1389
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc2-docs/
>
> The list of bug fixes going into 3.2.0 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12349407
>
> This release is using the release script of the tag v3.2.0-rc2.
>
>
> FAQ
>
> =
> How can I help test this release?
> =
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install
> the current RC and see if anything important breaks; in Java/Scala,
> you can add the staging repository 

[VOTE] Release Spark 3.2.0 (RC2)

2021-09-01 Thread Gengliang Wang
Please vote on releasing the following candidate as
Apache Spark version 3.2.0.

The vote is open until 11:59pm Pacific time September 3 and passes if a
majority +1 PMC votes are cast, with a minimum of 3 +1 votes.

[ ] +1 Release this package as Apache Spark 3.2.0
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see http://spark.apache.org/

The tag to be voted on is v3.2.0-rc2 (commit
6bb3523d8e838bd2082fb90d7f3741339245c044):
https://github.com/apache/spark/tree/v3.2.0-rc2

The release files, including signatures, digests, etc. can be found at:
https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc2-bin/

Signatures used for Spark RCs can be found in this file:
https://dist.apache.org/repos/dist/dev/spark/KEYS

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1389

The documentation corresponding to this release can be found at:
https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc2-docs/

The list of bug fixes going into 3.2.0 can be found at the following URL:
https://issues.apache.org/jira/projects/SPARK/versions/12349407

This release is using the release script of the tag v3.2.0-rc2.


FAQ

=
How can I help test this release?
=
If you are a Spark user, you can help us test this release by taking
an existing Spark workload and running on this release candidate, then
reporting any regressions.

If you're working in PySpark you can set up a virtual env and install
the current RC and see if anything important breaks, in the Java/Scala
you can add the staging repository to your projects resolvers and test
with the RC (make sure to clean up the artifact cache before/after so
you don't end up building with an out-of-date RC going forward).
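As a concrete illustration of the steps above, here is a minimal sketch using the RC2 links from this message (the exact pyspark tarball name under the `-bin/` directory and the cache paths are assumptions, not quoted from the thread):

```shell
# PySpark: install the RC into a fresh virtual env and smoke-test it.
python3 -m venv rc-test && . rc-test/bin/activate
pip install "https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc2-bin/pyspark-3.2.0.tar.gz"

# Java/Scala: point your build at the staging repository, e.g. in sbt:
#   resolvers += "spark-rc" at
#     "https://repository.apache.org/content/repositories/orgapachespark-1389"

# Clean the local artifact cache before/after so you don't keep stale RC jars:
rm -rf ~/.ivy2/cache/org.apache.spark ~/.m2/repository/org/apache/spark
```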

===
What should happen to JIRA tickets still targeting 3.2.0?
===
The current list of open tickets targeted at 3.2.0 can be found at:
https://issues.apache.org/jira/projects/SPARK and search for "Target
Version/s" = 3.2.0

Committers should look at those and triage. Extremely important bug
fixes, documentation, and API tweaks that impact compatibility should
be worked on immediately. Everything else please retarget to an
appropriate release.

==
But my bug isn't fixed?
==
In order to make timely releases, we will typically not hold the
release unless the bug in question is a regression from the previous
release. That being said, if there is something which is a regression
that has not been correctly targeted please ping me or a committer to
help target the issue.


Re: [VOTE] Release Spark 3.2.0 (RC1)

2021-08-31 Thread Gengliang Wang
Hi Chao & DB,

Actually, I cut RC2 yesterday, before you posted the Parquet issue:
https://github.com/apache/spark/tree/v3.2.0-rc2
It has been 11 days since RC1. I think we can have RC2 today so that the
community can test and find potential issues earlier.
As for the Parquet issue, we can treat it as a known blocker. If it takes
more than one week (which is not likely to happen), we will have to consider
reverting Parquet 1.12 and related features from branch-3.2.

Gengliang

On Wed, Sep 1, 2021 at 5:40 AM DB Tsai  wrote:

> Hello Xiao, there are multiple patches in Spark 3.2 depending on parquet
> 1.12, so it might be easier to wait for the fix in parquet community
> instead of reverting all the related changes. The fix in parquet community
> is very trivial, and we hope that it will not take too long. Thanks.
> DB Tsai  |  https://www.dbtsai.com/  |  PGP 42E5B25A8F7A82C1
>
>
> On Tue, Aug 31, 2021 at 1:09 PM Chao Sun  wrote:
>
>> Hi Xiao, I'm still checking with the Parquet community on this. Since the
>> fix is already +1'd, I'm hoping this won't take long. The delta in
>> parquet-1.12.x branch is also small with just 2 commits so far.
>>
>> Chao
>>
>> On Tue, Aug 31, 2021 at 12:03 PM Xiao Li  wrote:
>>
>>> Hi, Chao,
>>>
>>> How long will it take? Normally, in the RC stage, we always revert the
>>> upgrade made in the current release. We did the parquet upgrade multiple
>>> times in the previous releases for avoiding the major delay in our Spark
>>> release
>>>
>>> Thanks,
>>>
>>> Xiao
>>>
>>>
>>> On Tue, Aug 31, 2021 at 11:03 AM Chao Sun  wrote:
>>>
>>>> The Apache Parquet community found an issue [1] in 1.12.0 which could
>>>> cause incorrect file offset being written and subsequently reading of the
>>>> same file to fail. A fix has been proposed in the same JIRA and we may have
>>>> to wait until a new release is available so that we can upgrade Spark with
>>>> the hot fix.
>>>>
>>>> [1]: https://issues.apache.org/jira/browse/PARQUET-2078
>>>>
>>>> On Fri, Aug 27, 2021 at 7:06 AM Sean Owen  wrote:
>>>>
>>>>> Maybe, I'm just confused why it's needed at all. Other profiles that
>>>>> add a dependency seem OK, but something's different here.
>>>>>
>>>>> One thing we can/should change is to simply remove the
>>>>>  block in the profile. It should always be a direct
>>>>> dep in Scala 2.13 (which lets us take out the profiles in submodules, 
>>>>> which
>>>>> just repeat that)
>>>>> We can also update the version, by the by.
>>>>>
>>>>> I tried this and the resulting POM still doesn't look like what I
>>>>> expect though.
>>>>>
>>>>> (The binary release is OK, FWIW - it gets pulled in as a JAR as
>>>>> expected)
>>>>>
>>>>> On Thu, Aug 26, 2021 at 11:34 PM Stephen Coy 
>>>>> wrote:
>>>>>
>>>>>> Hi Sean,
>>>>>>
>>>>>> I think that maybe the https://www.mojohaus.org/flatten-maven-plugin/ 
>>>>>> will
>>>>>> help you out here.
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> Steve C
>>>>>>
>>>>>> On 27 Aug 2021, at 12:29 pm, Sean Owen  wrote:
>>>>>>
>>>>>> OK right, you would have seen a different error otherwise.
>>>>>>
>>>>>> Yes profiles are only a compile-time thing, but they should affect
>>>>>> the effective POM for the artifact. mvn -Pscala-2.13 help:effective-pom
>>>>>> shows scala-parallel-collections as a dependency in the POM as expected
>>>>>> (not in a profile). However I see what you see in the .pom in the release
>>>>>> repo, and in my local repo after building - it's just sitting there as a
>>>>>> profile as if it weren't activated or something.
>>>>>>
>>>>>> I'm confused then, that shouldn't be what happens. I'd say maybe
>>>>>> there is a problem with the release script, but it seems to affect a simple
>>>>>> local build. Anyone else more expert in this see the problem, while I try
>>>>>> to debug more?
>>>>>> The binary distro may actually be fine, I'll check; it may even not
>>>>>> matter much for users who generally

Re: spark 3.2 release date

2021-08-30 Thread Gengliang Wang
Hi,

There is no exact release date yet. As per
https://spark.apache.org/release-process.html, we need a Release Candidate
which passes the release vote.
Spark 3.2 RC1 failed recently. I will cut RC2 after
https://issues.apache.org/jira/browse/SPARK-36619 is resolved.


Gengliang Wang




> On Aug 31, 2021, at 12:06 PM, infa elance  wrote:
> 
> What is the expected ballpark release date of spark 3.2 ? 
> 
> Thanks and Regards,
> Ajay.



Re: [VOTE] Release Spark 3.2.0 (RC1)

2021-08-25 Thread Gengliang Wang
Hi all,

So, RC1 failed.
After RC1 cut, we have merged the following bug fixes to branch-3.2:

   - Updates AuthEngine to pass the correct SecretKeySpec format
   
<https://github.com/apache/spark/commit/243bfafd5cb58c1d3ae6c2a1a9e2c14c3a13526c>
   - Fix NullPointerException in LiveRDDDistribution.toAPI
   
<https://github.com/apache/spark/commit/36df86c0d058977f0f202abd0106881474f18f0e>
   - Revert "[
   
<https://github.com/apache/spark/commit/5463caac0d51d850166e09e2a33e55e213ab5752>
   SPARK-34415 <https://issues.apache.org/jira/browse/SPARK-34415>][ML]
   Randomization in hyperparameter optimization"
   
<https://github.com/apache/spark/commit/5463caac0d51d850166e09e2a33e55e213ab5752>
   - Redact sensitive information in Spark Thrift Server
   
<https://github.com/apache/spark/commit/fb38887e001d33adef519d0288bd0844dcfe2bd5>

I will cut RC2 after the following issues are resolved:

   - Add back transformAllExpressions to AnalysisHelper(SPARK-36581
   <https://issues.apache.org/jira/browse/SPARK-36581>)
   - Review and fix issues in API docs(SPARK-36457
   <https://issues.apache.org/jira/browse/SPARK-36457>)
   - Support setting "since" version in FunctionRegistry (SPARK-36585
   <https://issues.apache.org/jira/browse/SPARK-36585>)
   - pushDownPredicate=false failed to prevent push down filters to JDBC
   data source(SPARK-36574
   <https://issues.apache.org/jira/browse/SPARK-36574>)

Please let me know if you know of any other new bugs/blockers for the 3.2.0
release.

Thanks,
Gengliang

On Wed, Aug 25, 2021 at 2:50 AM Sean Owen  wrote:

> I think we'll need this revert:
> https://github.com/apache/spark/pull/33819
>
> Between that and a few other minor but important issues I think I'd say -1
> myself and ask for another RC.
>
> On Tue, Aug 24, 2021 at 1:01 PM Jacek Laskowski  wrote:
>
>> Hi Yi Wu,
>>
>> Looks like the issue has got resolution: Won't Fix. How about your -1?
>>
>> Pozdrawiam,
>> Jacek Laskowski
>> 
>> https://about.me/JacekLaskowski
>> "The Internals Of" Online Books <https://books.japila.pl/>
>> Follow me on https://twitter.com/jaceklaskowski
>>
>> <https://twitter.com/jaceklaskowski>
>>
>>
>> On Mon, Aug 23, 2021 at 4:58 AM Yi Wu  wrote:
>>
>>> -1. I found a bug (https://issues.apache.org/jira/browse/SPARK-36558)
>>> in the push-based shuffle, which could lead to job hang.
>>>
>>> Bests,
>>> Yi
>>>
>>> On Sat, Aug 21, 2021 at 1:05 AM Gengliang Wang  wrote:
>>>
>>>> Please vote on releasing the following candidate as Apache Spark
>>>>  version 3.2.0.
>>>>
>>>> The vote is open until 11:59pm Pacific time Aug 25 and passes if a
>>>> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>>>
>>>> [ ] +1 Release this package as Apache Spark 3.2.0
>>>> [ ] -1 Do not release this package because ...
>>>>
>>>> To learn more about Apache Spark, please see http://spark.apache.org/
>>>>
>>>> The tag to be voted on is v3.2.0-rc1 (commit
>>>> 6bb3523d8e838bd2082fb90d7f3741339245c044):
>>>> https://github.com/apache/spark/tree/v3.2.0-rc1
>>>>
>>>> The release files, including signatures, digests, etc. can be found at:
>>>> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc1-bin/
>>>>
>>>> Signatures used for Spark RCs can be found in this file:
>>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>>>
>>>> The staging repository for this release can be found at:
>>>> https://repository.apache.org/content/repositories/orgapachespark-1388
>>>>
>>>> The documentation corresponding to this release can be found at:
>>>> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc1-docs/
>>>>
>>>> The list of bug fixes going into 3.2.0 can be found at the following
>>>> URL:
>>>> https://issues.apache.org/jira/projects/SPARK/versions/12349407
>>>>
>>>> This release is using the release script of the tag v3.2.0-rc1.
>>>>
>>>>
>>>> FAQ
>>>>
>>>> =
>>>> How can I help test this release?
>>>> =
>>>> If you are a Spark user, you can help us test this release by taking
>>>> an existing Spark workload and running on this release candidate, then
>>>> reporting any regressions.
>>>>
>>>> If you're working in PySpark you can set up a vi

Re: Add option to Spark UI to proxy to the executors?

2021-08-22 Thread Gengliang Wang
Hi Holden,

FYI there are already some related features in Spark:

   - Spark Master UI to reverse proxy Application and Workers UI
   
   - Support Spark UI behind front-end reverse proxy using a path prefix
   (reverse proxy URL)

Not sure if they are helpful to you.
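For context, a sketch of the configuration behind those two features (the property names are the standard Spark UI reverse-proxy settings; the gateway URL is a placeholder):

```shell
# spark-defaults.conf fragment: serve Spark UIs behind a front-end
# reverse proxy under a path prefix (placeholder URL).
spark.ui.reverseProxy     true
spark.ui.reverseProxyUrl  https://gateway.example.com/spark
```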

On Sat, Aug 21, 2021 at 3:16 PM Mich Talebzadeh 
wrote:

> Yes I can see your point.
>
> Will that work in kubernetes deployment?
>
> Mich
>
>
>view my Linkedin profile
> 
>
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
>
> On Sat, 21 Aug 2021 at 00:02, Holden Karau  wrote:
>
>> Hi Folks,
>>
>> I'm wondering what people think about the idea of having the Spark UI
>> (optionally) act as a proxy to the executors? This could help with exec UI
>> access in some deployment environments.
>>
>> Cheers,
>>
>> Holden :)
>>
>> --
>> Twitter: https://twitter.com/holdenkarau
>> Books (Learning Spark, High Performance Spark, etc.):
>> https://amzn.to/2MaRAG9  
>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>
>


Re: [VOTE] Release Spark 3.2.0 (RC1)

2021-08-22 Thread Gengliang Wang
Hi Mridul,

yes, Spark 3.2.0 should include the fix.
The PR was merged after the RC1 cut and there was no JIRA for the issue,
so it was missed.

On Sun, Aug 22, 2021 at 2:27 PM Mridul Muralidharan 
wrote:

> Hi,
>
>   Signatures, digests, etc check out fine.
> Checked out tag and build/tested with -Pyarn -Phadoop-2.7 -Pmesos
> -Pkubernetes
>
> I am seeing test failures which are addressed by #33790
> <https://github.com/apache/spark/pull/33790> - this is in branch-3.2, but
> after the RC tag.
> After updating to the head of branch-3.2, I can get that test to pass.
>
> Given the failure, and as the fix is already in the branch, will -1 the RC.
>
> Regards,
> Mridul
>
>
> On Fri, Aug 20, 2021 at 12:05 PM Gengliang Wang  wrote:
>
>> Please vote on releasing the following candidate as Apache Spark version
>> 3.2.0.
>>
>> The vote is open until 11:59pm Pacific time Aug 25 and passes if a
>> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>
>> [ ] +1 Release this package as Apache Spark 3.2.0
>> [ ] -1 Do not release this package because ...
>>
>> To learn more about Apache Spark, please see http://spark.apache.org/
>>
>> The tag to be voted on is v3.2.0-rc1 (commit
>> 6bb3523d8e838bd2082fb90d7f3741339245c044):
>> https://github.com/apache/spark/tree/v3.2.0-rc1
>>
>> The release files, including signatures, digests, etc. can be found at:
>> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc1-bin/
>>
>> Signatures used for Spark RCs can be found in this file:
>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>
>> The staging repository for this release can be found at:
>> https://repository.apache.org/content/repositories/orgapachespark-1388
>>
>> The documentation corresponding to this release can be found at:
>> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc1-docs/
>>
>> The list of bug fixes going into 3.2.0 can be found at the following URL:
>> https://issues.apache.org/jira/projects/SPARK/versions/12349407
>>
>> This release is using the release script of the tag v3.2.0-rc1.
>>
>>
>> FAQ
>>
>> =
>> How can I help test this release?
>> =
>> If you are a Spark user, you can help us test this release by taking
>> an existing Spark workload and running on this release candidate, then
>> reporting any regressions.
>>
>> If you're working in PySpark you can set up a virtual env and install
>> the current RC and see if anything important breaks, in the Java/Scala
>> you can add the staging repository to your projects resolvers and test
>> with the RC (make sure to clean up the artifact cache before/after so
>> you don't end up building with an out-of-date RC going forward).
>>
>> ===
>> What should happen to JIRA tickets still targeting 3.2.0?
>> ===
>> The current list of open tickets targeted at 3.2.0 can be found at:
>> https://issues.apache.org/jira/projects/SPARK and search for "Target
>> Version/s" = 3.2.0
>>
>> Committers should look at those and triage. Extremely important bug
>> fixes, documentation, and API tweaks that impact compatibility should
>> be worked on immediately. Everything else please retarget to an
>> appropriate release.
>>
>> ==
>> But my bug isn't fixed?
>> ==
>> In order to make timely releases, we will typically not hold the
>> release unless the bug in question is a regression from the previous
>> release. That being said, if there is something which is a regression
>> that has not been correctly targeted please ping me or a committer to
>> help target the issue.
>>
>>


Re: [VOTE] Release Spark 3.2.0 (RC1)

2021-08-22 Thread Gengliang Wang
Hi Jacek,

The current GitHub Actions CI for Spark includes a Java 11 build. The build
succeeds with the options "-Xss64m -Xmx2g -XX:ReservedCodeCacheSize=1g":
https://github.com/apache/spark/blob/master/.github/workflows/build_and_test.yml#L506
The default Java thread stack size is small, so we have to raise it for the
Spark build with the option "-Xss64m".

On Sat, Aug 21, 2021 at 9:33 PM Jacek Laskowski  wrote:

> Hi,
>
> I've been building the tag and I'm facing the following StackOverflowError:
>
> Exception in thread "main" java.lang.StackOverflowError
> at
> scala.tools.nsc.transform.ExtensionMethods$Extender.transform(ExtensionMethods.scala:275)
> at
> scala.tools.nsc.transform.ExtensionMethods$Extender.transform(ExtensionMethods.scala:133)
> at
> scala.reflect.api.Trees$Transformer.$anonfun$transformStats$1(Trees.scala:2597)
> at scala.reflect.api.Trees$Transformer.transformStats(Trees.scala:2595)
> at
> scala.tools.nsc.transform.ExtensionMethods$Extender.transformStats(ExtensionMethods.scala:280)
> at
> scala.tools.nsc.transform.ExtensionMethods$Extender.transformStats(ExtensionMethods.scala:133)
> at scala.reflect.internal.Trees.itransform(Trees.scala:1430)
> at scala.reflect.internal.Trees.itransform$(Trees.scala:1400)
> at scala.reflect.internal.SymbolTable.itransform(SymbolTable.scala:28)
> at scala.reflect.internal.SymbolTable.itransform(SymbolTable.scala:28)
> at scala.reflect.api.Trees$Transformer.transform(Trees.scala:2563)
> at
> scala.tools.nsc.transform.TypingTransformers$TypingTransformer.transform(TypingTransformers.scala:57)
> at
> scala.tools.nsc.transform.ExtensionMethods$Extender.transform(ExtensionMethods.scala:275)
> at
> scala.tools.nsc.transform.ExtensionMethods$Extender.transform(ExtensionMethods.scala:133)
> at scala.reflect.internal.Trees.itransform(Trees.scala:1409)
> at scala.reflect.internal.Trees.itransform$(Trees.scala:1400)
> at scala.reflect.internal.SymbolTable.itransform(SymbolTable.scala:28)
> at scala.reflect.internal.SymbolTable.itransform(SymbolTable.scala:28)
> at scala.reflect.api.Trees$Transformer.transform(Trees.scala:2563)
> at
> scala.tools.nsc.transform.TypingTransformers$TypingTransformer.transform(TypingTransformers.scala:57)
> at
> scala.tools.nsc.transform.ExtensionMethods$Extender.transform(ExtensionMethods.scala:275)
> at
> scala.tools.nsc.transform.ExtensionMethods$Extender.transform(ExtensionMethods.scala:133)
> ...
>
> The command I use:
>
> ./build/mvn \
> -Pyarn,kubernetes,hadoop-cloud,hive,hive-thriftserver \
> -DskipTests \
> clean install
>
> $ java --version
> openjdk 11.0.11 2021-04-20
> OpenJDK Runtime Environment AdoptOpenJDK-11.0.11+9 (build 11.0.11+9)
> OpenJDK 64-Bit Server VM AdoptOpenJDK-11.0.11+9 (build 11.0.11+9, mixed
> mode)
>
> $ ./build/mvn -v
> Using `mvn` from path: /usr/local/bin/mvn
> Apache Maven 3.8.1 (05c21c65bdfed0f71a2f2ada8b84da59348c4c5d)
> Maven home: /usr/local/Cellar/maven/3.8.1/libexec
> Java version: 11.0.11, vendor: AdoptOpenJDK, runtime:
> /Users/jacek/.sdkman/candidates/java/11.0.11.hs-adpt
> Default locale: en_PL, platform encoding: UTF-8
> OS name: "mac os x", version: "11.5", arch: "x86_64", family: "mac"
>
> $ echo $MAVEN_OPTS
> -Xmx8g -XX:ReservedCodeCacheSize=1g
>
> Pozdrawiam,
> Jacek Laskowski
> 
> https://about.me/JacekLaskowski
> "The Internals Of" Online Books <https://books.japila.pl/>
> Follow me on https://twitter.com/jaceklaskowski
>
> <https://twitter.com/jaceklaskowski>
>
>
> On Fri, Aug 20, 2021 at 7:05 PM Gengliang Wang  wrote:
>
>> Please vote on releasing the following candidate as Apache Spark version
>> 3.2.0.
>>
>> The vote is open until 11:59pm Pacific time Aug 25 and passes if a
>> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>
>> [ ] +1 Release this package as Apache Spark 3.2.0
>> [ ] -1 Do not release this package because ...
>>
>> To learn more about Apache Spark, please see http://spark.apache.org/
>>
>> The tag to be voted on is v3.2.0-rc1 (commit
>> 6bb3523d8e838bd2082fb90d7f3741339245c044):
>> https://github.com/apache/spark/tree/v3.2.0-rc1
>>
>> The release files, including signatures, digests, etc. can be found at:
>> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc1-bin/
>>
>> Signatures used for Spark RCs can be found in this file:
>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>
>> The staging repository for this release can be found at:
>> https://repository.apache.org/content/repositories/orgapachespark-1388
>>
>> The documentation corresponding to this release can be f

[VOTE] Release Spark 3.2.0 (RC1)

2021-08-20 Thread Gengliang Wang
Please vote on releasing the following candidate as Apache Spark version
3.2.0.

The vote is open until 11:59pm Pacific time Aug 25 and passes if a majority
+1 PMC votes are cast, with a minimum of 3 +1 votes.

[ ] +1 Release this package as Apache Spark 3.2.0
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see http://spark.apache.org/

The tag to be voted on is v3.2.0-rc1 (commit
6bb3523d8e838bd2082fb90d7f3741339245c044):
https://github.com/apache/spark/tree/v3.2.0-rc1

The release files, including signatures, digests, etc. can be found at:
https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc1-bin/

Signatures used for Spark RCs can be found in this file:
https://dist.apache.org/repos/dist/dev/spark/KEYS

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1388

The documentation corresponding to this release can be found at:
https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc1-docs/

The list of bug fixes going into 3.2.0 can be found at the following URL:
https://issues.apache.org/jira/projects/SPARK/versions/12349407

This release is using the release script of the tag v3.2.0-rc1.


FAQ

=
How can I help test this release?
=
If you are a Spark user, you can help us test this release by taking
an existing Spark workload and running on this release candidate, then
reporting any regressions.

If you're working in PySpark you can set up a virtual env and install
the current RC and see if anything important breaks, in the Java/Scala
you can add the staging repository to your projects resolvers and test
with the RC (make sure to clean up the artifact cache before/after so
you don't end up building with an out-of-date RC going forward).

===
What should happen to JIRA tickets still targeting 3.2.0?
===
The current list of open tickets targeted at 3.2.0 can be found at:
https://issues.apache.org/jira/projects/SPARK and search for "Target
Version/s" = 3.2.0

Committers should look at those and triage. Extremely important bug
fixes, documentation, and API tweaks that impact compatibility should
be worked on immediately. Everything else please retarget to an
appropriate release.

==
But my bug isn't fixed?
==
In order to make timely releases, we will typically not hold the
release unless the bug in question is a regression from the previous
release. That being said, if there is something which is a regression
that has not been correctly targeted please ping me or a committer to
help target the issue.


Re: Spark 3.2.0 first RC next week

2021-08-11 Thread Gengliang Wang
Thanks for your information, Sean and Min!

On Tue, Aug 10, 2021 at 9:59 PM Min Shen  wrote:

> Hi Gengliang,
>
> SPARK-36378 (Switch to using RPCResponse to communicate common block push
> failures to the client) should be another one.
> This introduces a slight protocol change to push-based shuffle to improve
> code robustness and performance, and is almost ready to be committed.
> Because of the protocol change, it’s best to include it with 3.2.0
> release.
>
> Best,
> Min
>
> On Tue, Aug 10, 2021 at 01:13 Gengliang Wang  wrote:
>
>> Hi all,
>>
>> As of now, there are still some open/in-progress blockers for Spark 3.2.0
>> release:
>>
>>- Prohibit update mode in native support of session window (
>>SPARK-36463 <https://issues.apache.org/jira/browse/SPARK-36463>)
>>- Avoid inlining non-deterministic With-CTEs(SPARK-36447
>><https://issues.apache.org/jira/browse/SPARK-36447>)
>>- Data Source V2: Remove read specific distributions(SPARK-33807
>><https://issues.apache.org/jira/browse/SPARK-33807>)
>>- Support fetching shuffle blocks in batch with i/o encryption(
>>SPARK-34827 <https://issues.apache.org/jira/browse/SPARK-34827>)
>>- Add a new Maven profile "no-shaded-client" for older Hadoop 3.x
>>versions(SPARK-35959
>><https://issues.apache.org/jira/browse/SPARK-35959>)
>>- Review and fix issues in API docs(SPARK-34185
>><https://issues.apache.org/jira/browse/SPARK-34185>)
>>- Introduce the RocksDBStateStoreProvider in the programming guide(
>>SPARK-36041 <https://issues.apache.org/jira/browse/SPARK-36041>)
>>- Push-based shuffle documentation(SPARK-36374
>><https://issues.apache.org/jira/browse/SPARK-36374>)
>>
>> Thus, I propose to cut RC1 next week after all the blockers are resolved.
>> If there are any other blockers, please reply to this email.
>>
>> Thanks
>> Gengliang
>>
>


Spark 3.2.0 first RC next week

2021-08-10 Thread Gengliang Wang
Hi all,

As of now, there are still some open/in-progress blockers for Spark 3.2.0
release:

   - Prohibit update mode in native support of session window (SPARK-36463
   <https://issues.apache.org/jira/browse/SPARK-36463>)
   - Avoid inlining non-deterministic With-CTEs(SPARK-36447
   <https://issues.apache.org/jira/browse/SPARK-36447>)
   - Data Source V2: Remove read specific distributions(SPARK-33807
   <https://issues.apache.org/jira/browse/SPARK-33807>)
   - Support fetching shuffle blocks in batch with i/o encryption(
   SPARK-34827 <https://issues.apache.org/jira/browse/SPARK-34827>)
   - Add a new Maven profile "no-shaded-client" for older Hadoop 3.x
   versions(SPARK-35959 <https://issues.apache.org/jira/browse/SPARK-35959>)
   - Review and fix issues in API docs(SPARK-34185
   <https://issues.apache.org/jira/browse/SPARK-34185>)
   - Introduce the RocksDBStateStoreProvider in the programming guide(
   SPARK-36041 <https://issues.apache.org/jira/browse/SPARK-36041>)
   - Push-based shuffle documentation(SPARK-36374
   <https://issues.apache.org/jira/browse/SPARK-36374>)

Thus, I propose to cut RC1 next week after all the blockers are resolved.
If there are any other blockers, please reply to this email.

Thanks
Gengliang


Re: Apache Spark 3.2 Expectation

2021-07-01 Thread Gengliang Wang
Hi all,

I just cut branch-3.2 on GitHub and created version 3.3.0 in JIRA.
When merging PRs to the master branch before the 3.2.0 RC, please help by
cherry-picking bug fixes and ongoing major features mentioned in this
thread to branch-3.2. Thanks!

On Fri, Jul 2, 2021 at 2:31 AM Dongjoon Hyun 
wrote:

> Thank you, Gengliang!
>
> On Wed, Jun 30, 2021 at 10:56 PM Gengliang Wang  wrote:
>
>> Hi all,
>>
>> Just as a gentle reminder, I will do the branch cut tomorrow. Please
>> focus on finalizing the works to land in Spark 3.2.0.
>> After the branch cut, we can still merge the ongoing major features
>> mentioned in this thread. There should be no other new features in
>> branch-3.2.
>> Thanks!
>>
>> On Thu, Jun 17, 2021 at 2:57 PM Hyukjin Kwon  wrote:
>>
>>> *GA -> QA
>>>
>>> On Thu, 17 Jun 2021, 15:16 Hyukjin Kwon,  wrote:
>>>
>>>> I think we should make sure to treat these items in the list as
>>>> exceptions from the code freeze, and discourage pushing new APIs and
>>>> features, though.
>>>>
>>>> GA period ideally we should focus on bug fixes and polishing.
>>>>
>>>> It would be great if we can speed up on these items in the list too.
>>>>
>>>>
>>>> On Thu, 17 Jun 2021, 15:08 Gengliang Wang,  wrote:
>>>>
>>>>> Thanks for the suggestions from Dongjoon, Liangchi, Min, and Xiao!
>>>>> Now we make it clear that it's a soft cut and we can still merge
>>>>> important code changes to branch-3.2 before RC. Let's keep the branch cut
>>>>> date as July 1st.
>>>>>
>>>>> On Thu, Jun 17, 2021 at 1:41 PM Dongjoon Hyun 
>>>>> wrote:
>>>>>
>>>>>> > First, I think you are saying "branch-3.2";
>>>>>>
>>>>>> To Xiao. Yes, it was a typo of "branch-3.2".
>>>>>>
>>>>>> > We do strongly prefer to cut the release for Spark 3.2.0 including
>>>>>> all the patches under SPARK-30602.
>>>>>> > This way, we can backport the other performance/operability
>>>>>> enhancements tickets under SPARK-33235 into branch-3.2 to be released in
>>>>>> future Spark 3.2.x patch releases.
>>>>>>
>>>>>> To Min, after releasing 3.2.0, only bug fixes are allowed for 3.2.1+
>>>>>> as Xiao wrote.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Jun 16, 2021 at 9:42 PM Xiao Li  wrote:
>>>>>>
>>>>>>> To Liang-Chi, I'm -1 for postponing the branch cut because this is a
>>>>>>>> soft cut and the committers still are able to commit to `branch-3.3`
>>>>>>>> according to their decisions.
>>>>>>>
>>>>>>>
>>>>>>> First, I think you are saying "branch-3.2";
>>>>>>>
>>>>>>> Second, "soft cut" means no "code freeze", although we cut the
>>>>>>> branch. To avoid releasing half-baked and unready features, the release
>>>>>>> manager needs to be very careful when cutting the RC. Based on what is
>>>>>>> proposed here, the RC date is the actual code freeze date.
>>>>>>>
>>>>>>> This way, we can backport the other performance/operability
>>>>>>>> enhancements tickets under SPARK-33235 into branch-3.2 to be released 
>>>>>>>> in
>>>>>>>> future Spark 3.2.x patch releases.
>>>>>>>
>>>>>>>
>>>>>>> This is not allowed based on the policy. Only bug fixes can be
>>>>>>> merged to the patch releases. Thus, if we know it will introduce major
>>>>>>> performance regression, we have to turn the feature off by default.
>>>>>>>
>>>>>>> Xiao
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Min Shen wrote on Wed, Jun 16, 2021 at 3:22 PM:
>>>>>>>
>>>>>>>> Hi Gengliang,
>>>>>>>>
>>>>>>>> Thanks for volunteering as the release manager for Spark 3.2.0.
>>>>>>>> Regarding the ongoing work of push-based shuffle in SPARK-30602, we
>>>>>>>> are close to having all the patches merged to master to enable 
>&

Re: Apache Spark 3.2 Expectation

2021-06-30 Thread Gengliang Wang
Hi all,

Just as a gentle reminder, I will do the branch cut tomorrow. Please focus
on finalizing the works to land in Spark 3.2.0.
After the branch cut, we can still merge the ongoing major features
mentioned in this thread. There should be no other new features in
branch-3.2.
Thanks!

On Thu, Jun 17, 2021 at 2:57 PM Hyukjin Kwon  wrote:

> *GA -> QA
>
> On Thu, 17 Jun 2021, 15:16 Hyukjin Kwon,  wrote:
>
>> I think we should make sure to treat these items in the list as exceptions
>> from the code freeze, and discourage pushing new APIs and features, though.
>>
>> GA period ideally we should focus on bug fixes and polishing.
>>
>> It would be great if we can speed up on these items in the list too.
>>
>>
>> On Thu, 17 Jun 2021, 15:08 Gengliang Wang,  wrote:
>>
>>> Thanks for the suggestions from Dongjoon, Liangchi, Min, and Xiao!
>>> Now we make it clear that it's a soft cut and we can still merge
>>> important code changes to branch-3.2 before RC. Let's keep the branch cut
>>> date as July 1st.
>>>
>>> On Thu, Jun 17, 2021 at 1:41 PM Dongjoon Hyun 
>>> wrote:
>>>
>>>> > First, I think you are saying "branch-3.2";
>>>>
>>>> To Xiao. Yes, it was a typo of "branch-3.2".
>>>>
>>>> > We do strongly prefer to cut the release for Spark 3.2.0 including
>>>> all the patches under SPARK-30602.
>>>> > This way, we can backport the other performance/operability
>>>> enhancements tickets under SPARK-33235 into branch-3.2 to be released in
>>>> future Spark 3.2.x patch releases.
>>>>
>>>> To Min, after releasing 3.2.0, only bug fixes are allowed for 3.2.1+ as
>>>> Xiao wrote.
>>>>
>>>>
>>>>
>>>> On Wed, Jun 16, 2021 at 9:42 PM Xiao Li  wrote:
>>>>
>>>>> To Liang-Chi, I'm -1 for postponing the branch cut because this is a
>>>>>> soft cut and the committers still are able to commit to `branch-3.3`
>>>>>> according to their decisions.
>>>>>
>>>>>
>>>>> First, I think you are saying "branch-3.2";
>>>>>
>>>>> Second, "soft cut" means no "code freeze", although we cut the
>>>>> branch. To avoid releasing half-baked and unready features, the release
>>>>> manager needs to be very careful when cutting the RC. Based on what is
>>>>> proposed here, the RC date is the actual code freeze date.
>>>>>
>>>>> This way, we can backport the other performance/operability
>>>>>> enhancements tickets under SPARK-33235 into branch-3.2 to be released in
>>>>>> future Spark 3.2.x patch releases.
>>>>>
>>>>>
>>>>> This is not allowed based on the policy. Only bug fixes can be merged
>>>>> to the patch releases. Thus, if we know it will introduce major 
>>>>> performance
>>>>> regression, we have to turn the feature off by default.
>>>>>
>>>>> Xiao
>>>>>
>>>>>
>>>>>
>>>>> Min Shen wrote on Wed, Jun 16, 2021 at 3:22 PM:
>>>>>
>>>>>> Hi Gengliang,
>>>>>>
>>>>>> Thanks for volunteering as the release manager for Spark 3.2.0.
>>>>>> Regarding the ongoing work of push-based shuffle in SPARK-30602, we
>>>>>> are close to having all the patches merged to master to enable push-based
>>>>>> shuffle.
>>>>>> Currently, there are 2 PRs under SPARK-30602 that are under active
>>>>>> review (SPARK-32922 and SPARK-35671), and hopefully can be merged soon.
>>>>>> We should be able to post the PRs for the other 2 remaining tickets
>>>>>> (SPARK-32923 and SPARK-35546) early next week.
>>>>>>
>>>>>> The tickets under SPARK-30602 are the minimum set of patches to
>>>>>> enable push-based shuffle.
>>>>>> We do have other performance/operability enhancements tickets under
>>>>>> SPARK-33235 that are needed to fully contribute what we have internally 
>>>>>> for
>>>>>> push-based shuffle.
>>>>>> However, these are optional for enabling push-based shuffle.
>>>>>> We do strongly prefer to cut the release for Spark 3.2.0 including
>>>>>> all the patches under SPARK-30602.
>>>>>

Re: [DISCUSS] Rename hadoop-3.2/hadoop-2.7 profile to hadoop-3/hadoop-2?

2021-06-24 Thread Gengliang Wang
+1 for targeting the renaming for Apache Spark 3.3 at the current phase.

On Fri, Jun 25, 2021 at 6:55 AM DB Tsai  wrote:

> +1 on renaming.
>
> DB Tsai  |  https://www.dbtsai.com/  |  PGP 42E5B25A8F7A82C1
>
> On Jun 24, 2021, at 11:41 AM, Chao Sun  wrote:
>
> Hi,
>
> As Spark master has upgraded to Hadoop-3.3.1, the current Maven profile
> name hadoop-3.2 is no longer accurate, and it may confuse Spark users when
> they realize the actual version is not Hadoop 3.2.x. Therefore, I created
> https://issues.apache.org/jira/browse/SPARK-33880 to change the profile
> name to hadoop-3 and hadoop-2 respectively. What do you think? Is this
> something worth doing as part of Spark 3.2.0 release?
>
> Best,
> Chao
>
>
>


Re: [VOTE] Release Spark 3.0.3 (RC1)

2021-06-20 Thread Gengliang Wang
+1 (non-binding)

> On Jun 21, 2021, at 1:33 PM, Hyukjin Kwon  wrote:
> 
> +1
> 
> On Mon, Jun 21, 2021 at 2:19 PM, Dongjoon Hyun wrote:
> +1
> 
> Thank you, Yi.
> 
> Bests,
> Dongjoon.
> 
> 
> On Sat, Jun 19, 2021 at 6:57 PM Yuming Wang  > wrote:
> +1
> 
> Tested a batch of production query with Thrift Server.
> 
> On Sat, Jun 19, 2021 at 3:04 PM Mridul Muralidharan  > wrote:
> 
> +1
> 
> Signatures, digests, etc check out fine.
> Checked out tag and build/tested with -Pyarn -Phadoop-2.7 -Pmesos -Pkubernetes
> 
> Regards,
> Mridul
> 
> PS: Might be related to some quirk of my local env - the first test run 
> (after clean + package) usually fails for me (typically for hive tests) - 
> with a second run succeeding : this is not specific to this RC though.
> 
> On Fri, Jun 18, 2021 at 6:14 PM Liang-Chi Hsieh  > wrote:
> +1. Docs looks good. Binary looks good.
> 
> Ran simple test and some tpcds queries.
> 
> Thanks for working on this!
> 
> 
> wuyi wrote
> > Please vote on releasing the following candidate as Apache Spark version
> > 3.0.3.
> > 
> > The vote is open until Jun 21st 3AM (PST) and passes if a majority +1 PMC
> > votes are cast, with
> > a minimum of 3 +1 votes.
> > 
> > [ ] +1 Release this package as Apache Spark 3.0.3
> > [ ] -1 Do not release this package because ...
> > 
> > To learn more about Apache Spark, please see https://spark.apache.org/ 
> > 
> > 
> > The tag to be voted on is v3.0.3-rc1 (commit
> > 65ac1e75dc468f53fc778cd2ce1ba3f21067aab8):
> > https://github.com/apache/spark/tree/v3.0.3-rc1 
> > 
> > 
> > The release files, including signatures, digests, etc. can be found at:
> > https://dist.apache.org/repos/dist/dev/spark/v3.0.3-rc1-bin/ 
> > 
> > 
> > Signatures used for Spark RCs can be found in this file:
> > https://dist.apache.org/repos/dist/dev/spark/KEYS 
> > 
> > 
> > The staging repository for this release can be found at:
> > https://repository.apache.org/content/repositories/orgapachespark-1386/ 
> > 
> > 
> > The documentation corresponding to this release can be found at:
> > https://dist.apache.org/repos/dist/dev/spark/v3.0.3-rc1-docs/ 
> > 
> > 
> > The list of bug fixes going into 3.0.3 can be found at the following URL:
> > https://issues.apache.org/jira/projects/SPARK/versions/12349723 
> > 
> > 
> > This release is using the release script of the tag v3.0.3-rc1.
> > 
> > FAQ
> > 
> > =
> > How can I help test this release?
> > =
> > 
> > If you are a Spark user, you can help us test this release by taking
> > an existing Spark workload and running on this release candidate, then
> > reporting any regressions.
> > 
> > If you're working in PySpark you can set up a virtual env and install
> > the current RC and see if anything important breaks, in the Java/Scala
> > you can add the staging repository to your projects resolvers and test
> > with the RC (make sure to clean up the artifact cache before/after so
> > you don't end up building with an out-of-date RC going forward).
> > 
> > ===
> > What should happen to JIRA tickets still targeting 3.0.3?
> > ===
> > 
> > The current list of open tickets targeted at 3.0.3 can be found at:
> > https://issues.apache.org/jira/projects/SPARK 
> >  and search for "Target
> > Version/s" = 3.0.3
> > 
> > Committers should look at those and triage. Extremely important bug
> > fixes, documentation, and API tweaks that impact compatibility should
> > be worked on immediately. Everything else please retarget to an
> > appropriate release.
> > 
> > ==
> > But my bug isn't fixed?
> > ==
> > 
> > In order to make timely releases, we will typically not hold the
> > release unless the bug in question is a regression from the previous
> > release. That being said, if there is something which is a regression
> > that has not been correctly targeted please ping me or a committer to
> > help target the issue.
> 
> 
> 
> 
> 
> --
> Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/ 
> 
> 
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org 
> 
> 
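For readers of the archive: the "checked checksum and signature" steps mentioned in the +1 votes above boil down to recomputing the SHA-512 digest of a downloaded artifact and comparing it against the published `.sha512` file. A minimal sketch in Python follows — the file used here is a throwaway stand-in, not a real release tarball, and the function names are illustrative, not part of any release tooling:

```python
import hashlib
import os
import tempfile

def sha512_of(path, chunk_size=1 << 20):
    """Stream a file in chunks and return its hex SHA-512 digest."""
    h = hashlib.sha512()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            h.update(block)
    return h.hexdigest()

def matches_published(path, published_hex):
    """Compare a local file's digest with the published checksum string."""
    return sha512_of(path) == published_hex.strip().lower()

# Demo against a throwaway file standing in for the release tarball:
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"pretend this is a spark release tarball")
    tmp_path = f.name

digest = sha512_of(tmp_path)
print(matches_published(tmp_path, digest))     # True
print(matches_published(tmp_path, "0" * 128))  # False
os.unlink(tmp_path)
```

In practice one would download the `.tgz` and `.sha512` files from the `dist.apache.org` URLs listed in the vote e-mail and compare them this way; signatures are checked separately with `gpg --verify` against the published KEYS file.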



Re: Apache Spark 3.2 Expectation

2021-06-17 Thread Gengliang Wang
Thanks for the suggestions from Dongjoon, Liangchi, Min, and Xiao!
Now we make it clear that it's a soft cut and we can still merge important
code changes to branch-3.2 before RC. Let's keep the branch cut date as
July 1st.

On Thu, Jun 17, 2021 at 1:41 PM Dongjoon Hyun 
wrote:

> > First, I think you are saying "branch-3.2";
>
> To Xiao. Yes, it was a typo of "branch-3.2".
>
> > We do strongly prefer to cut the release for Spark 3.2.0 including all
> the patches under SPARK-30602.
> > This way, we can backport the other performance/operability
> enhancements tickets under SPARK-33235 into branch-3.2 to be released in
> future Spark 3.2.x patch releases.
>
> To Min, after releasing 3.2.0, only bug fixes are allowed for 3.2.1+ as
> Xiao wrote.
>
>
>
> On Wed, Jun 16, 2021 at 9:42 PM Xiao Li  wrote:
>
>> To Liang-Chi, I'm -1 for postponing the branch cut because this is a soft
>>> cut and the committers still are able to commit to `branch-3.3` according
>>> to their decisions.
>>
>>
>> First, I think you are saying "branch-3.2";
>>
>> Second, the "soft cut" means no "code freeze", although we cut the branch.
>> To avoid releasing half-baked and unready features, the release
>> manager needs to be very careful when cutting the RC. Based on what is
>> proposed here, the RC date is the actual code freeze date.
>>
>> This way, we can backport the other performance/operability enhancements
>>> tickets under SPARK-33235 into branch-3.2 to be released in future Spark
>>> 3.2.x patch releases.
>>
>>
>> This is not allowed based on the policy. Only bug fixes can be merged to
>> the patch releases. Thus, if we know it will introduce major performance
>> regression, we have to turn the feature off by default.
>>
>> Xiao
>>
>>
>>
>> On Wed, Jun 16, 2021 at 3:22 PM, Min Shen wrote:
>>
>>> Hi Gengliang,
>>>
>>> Thanks for volunteering as the release manager for Spark 3.2.0.
>>> Regarding the ongoing work of push-based shuffle in SPARK-30602, we are
>>> close to having all the patches merged to master to enable push-based
>>> shuffle.
>>> Currently, there are 2 PRs under SPARK-30602 that are under active
>>> review (SPARK-32922 and SPARK-35671), and hopefully can be merged soon.
>>> We should be able to post the PRs for the other 2 remaining tickets
>>> (SPARK-32923 and SPARK-35546) early next week.
>>>
>>> The tickets under SPARK-30602 are the minimum set of patches to enable
>>> push-based shuffle.
>>> We do have other performance/operability enhancements tickets under
>>> SPARK-33235 that are needed to fully contribute what we have internally for
>>> push-based shuffle.
>>> However, these are optional for enabling push-based shuffle.
>>> We do strongly prefer to cut the release for Spark 3.2.0 including all
>>> the patches under SPARK-30602.
>>> This way, we can backport the other performance/operability enhancements
>>> tickets under SPARK-33235 into branch-3.2 to be released in future Spark
>>> 3.2.x patch releases.
>>> I understand the preference of not postponing the branch cut date.
>>> We will check with Dongjoon regarding the soft cut date and the
>>> flexibility for including the remaining tickets under SPARK-30602 into
>>> branch-3.2.
>>>
>>> Best,
>>> Min
>>>
>>> On Wed, Jun 16, 2021 at 1:20 PM Liang-Chi Hsieh 
>>> wrote:
>>>
>>>>
>>>> Thanks Dongjoon. I've talked with Dongjoon offline to know more this.
>>>> As it is soft cut date, there is no reason to postpone it.
>>>>
>>>> It sounds good then to keep original branch cut date.
>>>>
>>>> Thank you.
>>>>
>>>>
>>>>
>>>> Dongjoon Hyun-2 wrote
>>>> > Thank you for volunteering, Gengliang.
>>>> >
>>>> > Apache Spark 3.2.0 is the first version enabling AQE by default. I'm
>>>> also
>>>> > watching some on-going improvements on that.
>>>> >
>>>> > https://issues.apache.org/jira/browse/SPARK-33828 (SQL Adaptive
>>>> Query
>>>> > Execution QA)
>>>> >
>>>> > To Liang-Chi, I'm -1 for postponing the branch cut because this is a
>>>> soft
>>>> > cut and the committers still are able to commit to `branch-3.3`
>>>> according
>>>> > to their decisions.
>>>> 

Re: Apache Spark 3.2 Expectation

2021-06-16 Thread Gengliang Wang
Thanks, Hyukjin.

The expected target branch cut date of Spark 3.2 is *July 1st* on
https://spark.apache.org/versioning-policy.html. However, I notice that
there are still multiple important projects in progress now:

[Core]

   - SPIP: Support push-based shuffle to improve shuffle efficiency
   <https://issues.apache.org/jira/browse/SPARK-30602>

[SQL]

   - Support ANSI SQL INTERVAL types
   <https://issues.apache.org/jira/browse/SPARK-27790>
   - Support Timestamp without time zone data type
   <https://issues.apache.org/jira/browse/SPARK-35662>
   - Aggregate (Min/Max/Count) push down for Parquet
   <https://issues.apache.org/jira/browse/SPARK-34952>

[Streaming]

   - EventTime based sessionization (session window)
   <https://issues.apache.org/jira/browse/SPARK-10816>
   - Add RocksDB StateStore as external module
   <https://issues.apache.org/jira/browse/SPARK-34198>


I wonder whether we should postpone the branch cut date.
cc Min Shen, Yi Wu, Max Gekk, Huaxin Gao, Jungtaek Lim, Yuanjian
Li, Liang-Chi Hsieh, who work on the projects above.

On Tue, Jun 15, 2021 at 4:34 PM Hyukjin Kwon  wrote:

> +1, thanks.
>
> On Tue, 15 Jun 2021, 16:17 Gengliang Wang,  wrote:
>
>> Hi,
>>
>> As the expected release date is close,  I would like to volunteer as the
>> release manager for Apache Spark 3.2.0.
>>
>> Thanks,
>> Gengliang
>>
>> On Mon, Apr 12, 2021 at 1:59 PM Wenchen Fan  wrote:
>>
>>> An update: we found a mistake that we picked the Spark 3.2 release date
>>> based on the scheduled release date of 3.1. However, 3.1 was delayed and
>>> released on March 2. In order to have a full 6 months development for 3.2,
>>> the target release date for 3.2 should be September 2.
>>>
>>> I'm updating the release dates in
>>> https://github.com/apache/spark-website/pull/331
>>>
>>> Thanks,
>>> Wenchen
>>>
>>> On Thu, Mar 11, 2021 at 11:17 PM Dongjoon Hyun 
>>> wrote:
>>>
>>>> Thank you, Xiao, Wenchen and Hyukjin.
>>>>
>>>> Bests,
>>>> Dongjoon.
>>>>
>>>>
>>>> On Thu, Mar 11, 2021 at 2:15 AM Hyukjin Kwon 
>>>> wrote:
>>>>
>>>>> Just for an update, I will send a discussion email about my idea late
>>>>> this week or early next week.
>>>>>
>>>>> On Thu, Mar 11, 2021 at 7:00 PM, Wenchen Fan wrote:
>>>>>
>>>>>> There are many projects going on right now, such as new DS v2 APIs,
>>>>>> ANSI interval types, join improvement, disaggregated shuffle, etc. I 
>>>>>> don't
>>>>>> think it's realistic to do the branch cut in April.
>>>>>>
>>>>>> I'm +1 to release 3.2 around July, but it doesn't mean we have to cut
>>>>>> the branch 3 months earlier. We should make the release process faster 
>>>>>> and
>>>>>> cut the branch around June probably.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Mar 11, 2021 at 4:41 AM Xiao Li  wrote:
>>>>>>
>>>>>>> Below are some nice-to-have features we can work on in Spark 3.2: 
>>>>>>> Lateral
>>>>>>> Join support <https://issues.apache.org/jira/browse/SPARK-28379>,
>>>>>>> interval data type, timestamp without time zone, un-nesting arbitrary
>>>>>>> queries, the returned metrics of DSV2, and error message 
>>>>>>> standardization.
>>>>>>> Spark 3.2 will be another exciting release I believe!
>>>>>>>
>>>>>>> Go Spark!
>>>>>>>
>>>>>>> Xiao
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Mar 10, 2021 at 12:25 PM, Dongjoon Hyun wrote:
>>>>>>>
>>>>>>>> Hi, Xiao.
>>>>>>>>
>>>>>>>> This thread started 13 days ago. Since you asked the community
>>>>>>>> about major features or timelines at that time, could you share your
>>>>>>>> roadmap or expectations if you have something in your mind?
>>>>>>>>
>>>>>>>> > Thank you, Dongjoon, for initiating this discussion. Let us keep
>>>>>>>> it open. It might take 1-2 weeks to collect from the community all the
>>>>>>>> features we plan to build and ship in 3.2 since w

Re: Apache Spark 3.2 Expectation

2021-06-15 Thread Gengliang Wang
Hi,

As the expected release date is close,  I would like to volunteer as the
release manager for Apache Spark 3.2.0.

Thanks,
Gengliang

On Mon, Apr 12, 2021 at 1:59 PM Wenchen Fan  wrote:

> An update: we found a mistake that we picked the Spark 3.2 release date
> based on the scheduled release date of 3.1. However, 3.1 was delayed and
> released on March 2. In order to have a full 6 months development for 3.2,
> the target release date for 3.2 should be September 2.
>
> I'm updating the release dates in
> https://github.com/apache/spark-website/pull/331
>
> Thanks,
> Wenchen
>
> On Thu, Mar 11, 2021 at 11:17 PM Dongjoon Hyun 
> wrote:
>
>> Thank you, Xiao, Wenchen and Hyukjin.
>>
>> Bests,
>> Dongjoon.
>>
>>
>> On Thu, Mar 11, 2021 at 2:15 AM Hyukjin Kwon  wrote:
>>
>>> Just for an update, I will send a discussion email about my idea late
>>> this week or early next week.
>>>
>>> On Thu, Mar 11, 2021 at 7:00 PM, Wenchen Fan wrote:
>>>
>>>> There are many projects going on right now, such as new DS v2 APIs,
>>>> ANSI interval types, join improvement, disaggregated shuffle, etc. I don't
>>>> think it's realistic to do the branch cut in April.
>>>>
>>>> I'm +1 to release 3.2 around July, but it doesn't mean we have to cut
>>>> the branch 3 months earlier. We should make the release process faster and
>>>> cut the branch around June probably.
>>>>
>>>>
>>>>
>>>> On Thu, Mar 11, 2021 at 4:41 AM Xiao Li  wrote:
>>>>
>>>>> Below are some nice-to-have features we can work on in Spark 3.2: Lateral
>>>>> Join support <https://issues.apache.org/jira/browse/SPARK-28379>,
>>>>> interval data type, timestamp without time zone, un-nesting arbitrary
>>>>> queries, the returned metrics of DSV2, and error message standardization.
>>>>> Spark 3.2 will be another exciting release I believe!
>>>>>
>>>>> Go Spark!
>>>>>
>>>>> Xiao
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Mar 10, 2021 at 12:25 PM, Dongjoon Hyun wrote:
>>>>>
>>>>>> Hi, Xiao.
>>>>>>
>>>>>> This thread started 13 days ago. Since you asked the community about
>>>>>> major features or timelines at that time, could you share your roadmap or
>>>>>> expectations if you have something in your mind?
>>>>>>
>>>>>> > Thank you, Dongjoon, for initiating this discussion. Let us keep it
>>>>>> open. It might take 1-2 weeks to collect from the community all the
>>>>>> features we plan to build and ship in 3.2 since we just finished the 3.1
>>>>>> voting.
>>>>>> > TBH, cutting the branch this April does not look good to me. That
>>>>>> means, we only have one month left for feature development of Spark 3.2. 
>>>>>> Do
>>>>>> we have enough features in the current master branch? If not, are we able
>>>>>> to finish major features we collected here? Do they have a timeline or
>>>>>> project plan?
>>>>>>
>>>>>> Bests,
>>>>>> Dongjoon.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Mar 3, 2021 at 2:58 PM Dongjoon Hyun 
>>>>>> wrote:
>>>>>>
>>>>>>> Hi, John.
>>>>>>>
>>>>>>> This thread aims to share your expectations and goals (and maybe
>>>>>>> work progress) to Apache Spark 3.2 because we are making this together. 
>>>>>>> :)
>>>>>>>
>>>>>>> Bests,
>>>>>>> Dongjoon.
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Mar 3, 2021 at 1:59 PM John Zhuge  wrote:
>>>>>>>
>>>>>>>> Hi Dongjoon,
>>>>>>>>
>>>>>>>> Is it possible to get ViewCatalog in? The community already had
>>>>>>>> fairly detailed discussions.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> John
>>>>>>>>
>>>>>>>> On Thu, Feb 25, 2021 at 8:57 AM Dongjoon Hyun <
>>>>>>>> dongjoon.h...@gmail.com> wrote:
>>>>>>>>

Re: Apache Spark 3.0.3 Release?

2021-06-09 Thread Gengliang Wang
+1, thanks Yi

Gengliang Wang




> On Jun 9, 2021, at 6:03 PM, 郑瑞峰  wrote:
> 
> +1, thanks Yi



Re: [ANNOUNCE] Apache Spark 3.1.2 released

2021-06-02 Thread Gengliang Wang
Thank you, Dongjoon!



> On Jun 2, 2021, at 2:11 PM, Maxim Gekk  wrote:
> 
> Congratulations everyone with the new release, and thanks to Dongjoon!
> 
> On Wed, Jun 2, 2021 at 9:08 AM Jungtaek Lim  > wrote:
> Nice! Thanks Dongjoon for your amazing efforts!
> 
> On Wed, Jun 2, 2021 at 2:59 PM Liang-Chi Hsieh  > wrote:
> Thank you, Dongjoon!
> 
> 
> 
> Takeshi Yamamuro wrote
> > Thank you, Dongjoon!
> > 
> > On Wed, Jun 2, 2021 at 2:29 PM Xiao Li wrote:
> > 
> >> Thank you!
> >>
> >> Xiao
> >>
> >> On Tue, Jun 1, 2021 at 9:29 PM Hyukjin Kwon wrote:
> >>
> >>> awesome!
> >>>
> >>> On Wed, Jun 2, 2021 at 9:59 AM, Dongjoon Hyun wrote:
> >>>
>  We are happy to announce the availability of Spark 3.1.2!
> 
>  Spark 3.1.2 is a maintenance release containing stability fixes. This
>  release is based on the branch-3.1 maintenance branch of Spark. We
>  strongly
>  recommend all 3.1 users to upgrade to this stable release.
> 
>  To download Spark 3.1.2, head over to the download page:
>  https://spark.apache.org/downloads.html 
>  
> 
>  To view the release notes:
>  https://spark.apache.org/releases/spark-release-3-1-2.html 
>  
> 
>  We would like to acknowledge all community members for contributing to
>  this
>  release. This release would not have been possible without you.
> 
>  Dongjoon Hyun
> 
> >>>
> >>
> >> --
> >>
> >>
> > 
> > -- 
> > ---
> > Takeshi Yamamuro
> 
> 
> 
> 
> 
> --
> Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/ 
> 
> 
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org 
> 
> 


