Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-02-21 Thread Pavan Kotikalapudi
Hi Spark PMC members,

I think we have a few upvotes for this effort here, and more people are
showing interest (see the PR comments).

Is anyone interested in mentoring and reviewing this effort?

Also, can a repository admin/owner re-open the PR? (I believe only people
with admin access to the repository can do that.)

Thank you,

Pavan

On Tue, Feb 20, 2024 at 2:08 PM Krystal Mitchell 
wrote:

> +1
>
> On 2024/01/17 17:49:32 Pavan Kotikalapudi wrote:
> > Thanks for proposing and voting for the feature Mich.
> >
> > adding some references to the thread.
> >
> >    - Jira ticket - SPARK-24815
> >    - Design Doc -
> >      https://docs.google.com/document/d/1_YmfCsQQb9XhRdKh0ijbc-j8JKGtGBxYsk_30NVSTWo/edit?usp=sharing
> >    - discussion thread
> >    - PR with initial implementation -
> >      https://github.com/apache/spark/pull/42352
> >
> > Please vote with:
> >
> > [ ] +1: Accept the proposal and start with the development.
> > [ ] +0
> > [ ] -1: I don’t think this is a good idea because …
> >
> > Thank you,
> >
> > Pavan
> >
> > On Wed, Jan 17, 2024 at 9:52 PM Mich Talebzadeh 
> > wrote:
> >
> > >
> > > +1 for me (non-binding)
> > >
> >
>


Re: Generating config docs automatically

2024-02-21 Thread Holden Karau
I think this is a good idea. I like having everything in one source of
truth rather than two (so option 1 sounds like a good idea); but that’s
just my opinion. I'd be happy to help with reviews though.

On Wed, Feb 21, 2024 at 6:37 AM Nicholas Chammas 
wrote:

> I know config documentation is not the most exciting thing. If there is
> anything I can do to make this as easy as possible for a committer to
> shepherd, I’m all ears!
>
>
> On Feb 14, 2024, at 8:53 PM, Nicholas Chammas 
> wrote:
>
> I’m interested in automating our config documentation and need input from
> a committer who is interested in shepherding this work.
>
> We have around 60 tables of configs across our documentation. Here’s a
> typical example.
> 
>
> These tables span several thousand lines of manually maintained HTML,
> which poses a few problems:
>
>- The documentation for a given config is sometimes out of sync across
>the HTML table and its source `ConfigEntry`.
>- Internal configs that are not supposed to be documented publicly
>sometimes are.
>- Many config names and defaults are extremely long, posing formatting
>problems.
>
>
> Contributors waste time dealing with these issues in a losing battle to
> keep everything up-to-date and consistent.
>
> I’d like to solve all these problems by generating HTML tables
> automatically from the `ConfigEntry` instances where the configs are
> defined.
>
> I’ve proposed two alternative solutions:
>
>- #44755 : Enhance
>`ConfigEntry` so a config can be associated with one or more groups, and
>use that new metadata to generate the tables we need.
>- #44756 : Add a
>standalone YAML file where we define config groups, and use that to
>generate the tables we need.
>
>
> If you’re a committer and are interested in this problem, please chime in
> on whatever approach appeals to you. If you think this is a bad idea, I’m
> also eager to hear your feedback.
>
> Nick
>
>
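
For context on the mechanics, here is a minimal sketch (hypothetical Python, not the proposed implementation; the field names `name`, `default`, `doc`, and `internal` are assumptions, not Spark's actual `ConfigEntry` schema) of rendering config metadata into a docs-style HTML table while filtering out internal configs:

```python
# Illustrative sketch only: generate a docs-style HTML config table from
# config metadata, so the table can never drift from its source of truth.
# Field names ("name", "default", "doc", "internal") are assumptions.
from html import escape

def render_config_table(entries):
    """Render non-internal configs as one HTML table row per config."""
    rows = []
    for e in entries:
        if e.get("internal"):  # internal configs never reach the public docs
            continue
        rows.append(
            "<tr>"
            f"<td><code>{escape(e['name'])}</code></td>"
            f"<td><code>{escape(str(e['default']))}</code></td>"
            f"<td>{escape(e['doc'])}</td>"
            "</tr>"
        )
    return (
        '<table class="table">'
        "<thead><tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr></thead>"
        "<tbody>" + "".join(rows) + "</tbody></table>"
    )

configs = [
    {"name": "spark.sql.shuffle.partitions", "default": 200,
     "doc": "Number of partitions for shuffles.", "internal": False},
    {"name": "spark.sql.test.internalFlag", "default": True,
     "doc": "Not for public docs.", "internal": True},
]
html_table = render_config_table(configs)
```

Whichever option is chosen, a generator like this makes the `ConfigEntry` definitions the single source of truth, so the out-of-sync and leaked-internal-config problems disappear by construction.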


Re: Generating config docs automatically

2024-02-21 Thread Nicholas Chammas
I know config documentation is not the most exciting thing. If there is 
anything I can do to make this as easy as possible for a committer to shepherd, 
I’m all ears!


> On Feb 14, 2024, at 8:53 PM, Nicholas Chammas  
> wrote:
> 
> I’m interested in automating our config documentation and need input from a 
> committer who is interested in shepherding this work.
> 
> We have around 60 tables of configs across our documentation. Here’s a 
> typical example. 
> 
> 
> These tables span several thousand lines of manually maintained HTML, which 
> poses a few problems:
> - The documentation for a given config is sometimes out of sync across the
> HTML table and its source `ConfigEntry`.
> - Internal configs that are not supposed to be documented publicly sometimes
> are.
> - Many config names and defaults are extremely long, posing formatting
> problems.
> 
> Contributors waste time dealing with these issues in a losing battle to keep 
> everything up-to-date and consistent.
> 
> I’d like to solve all these problems by generating HTML tables automatically 
> from the `ConfigEntry` instances where the configs are defined.
> 
> I’ve proposed two alternative solutions:
> - #44755: Enhance `ConfigEntry` so a config can be associated with one or
> more groups, and use that new metadata to generate the tables we need.
> - #44756: Add a standalone YAML file where we define config groups, and use
> that to generate the tables we need.
> 
> If you’re a committer and are interested in this problem, please chime in on 
> whatever approach appeals to you. If you think this is a bad idea, I’m also 
> eager to hear your feedback.
> 
> Nick
> 



[VOTE][RESULT] Release Apache Spark 3.5.1 (RC2)

2024-02-21 Thread Jungtaek Lim
The vote passes with 6 +1s (4 binding +1s).
Thanks to all who helped with the release!

(* = binding)
+1:
Jungtaek Lim
Wenchen Fan (*)
Cheng Pan
Xiao Li (*)
Hyukjin Kwon (*)
Maxim Gekk (*)

+0: None

-1: None


Re: [VOTE] Release Apache Spark 3.5.1 (RC2)

2024-02-21 Thread Jungtaek Lim
Thanks everyone for participating in the vote! The vote passed.
I'll send out the vote result and proceed to the next steps.

On Wed, Feb 21, 2024 at 4:36 PM Maxim Gekk 
wrote:

> +1
>
> On Wed, Feb 21, 2024 at 9:50 AM Hyukjin Kwon  wrote:
>
>> +1
>>
>> On Tue, 20 Feb 2024 at 22:00, Cheng Pan  wrote:
>>
>>> +1 (non-binding)
>>>
>>> - Build successfully from source code.
>>> - Pass integration tests with Spark ClickHouse Connector[1]
>>>
>>> [1] https://github.com/housepower/spark-clickhouse-connector/pull/299
>>>
>>> Thanks,
>>> Cheng Pan
>>>
>>>
>>> > On Feb 20, 2024, at 10:56, Jungtaek Lim 
>>> wrote:
>>> >
>>> > Thanks Sean, let's continue the process for this RC.
>>> >
>>> > +1 (non-binding)
>>> >
>>> > - downloaded all files from URL
>>> > - checked signature
>>> > - extracted all archives
>>> > - ran all tests from source files in source archive file, via running
>>> "sbt clean test package" - Ubuntu 20.04.4 LTS, OpenJDK 17.0.9.
>>> >
>>> > Also bumping dev@ to encourage participation - looks like the timing
>>> is not good for US folks, but let's give it a few more days.
>>> >
>>> >
>>> > On Sat, Feb 17, 2024 at 1:49 AM Sean Owen  wrote:
>>> > Yeah let's get that fix in, but it seems to be a minor test only issue
>>> so should not block release.
>>> >
>>> > On Fri, Feb 16, 2024, 9:30 AM yangjie01  wrote:
>>> > Very sorry. When I was fixing SPARK-45242
>>> (https://github.com/apache/spark/pull/43594), I noticed that its
>>> `Affects Version` and `Fix Version` were both 4.0, and I
>>> didn't realize that it had also been merged into branch-3.5, so I didn't
>>> advocate for SPARK-45357 to be backported to branch-3.5.
>>> >  As far as I know, the condition to trigger this test failure is: when
>>> using Maven to test the `connect` module, if  `sparkTestRelation` in
>>> `SparkConnectProtoSuite` is not the first `DataFrame` to be initialized,
>>> then the `id` of `sparkTestRelation` will no longer be 0. So, I think this
>>> is indeed related to the order in which Maven executes the test cases in
>>> the `connect` module.
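
The ordering dependence described above can be boiled down to a toy sketch (illustrative Python, not Spark's actual implementation): with a process-wide ID counter, a test that hard-codes "my relation has id 0" only passes when its suite happens to create the very first relation in the process:

```python
# Illustrative sketch only (not Spark's actual code): a process-wide ID
# counter like the one behind DataFrame plan ids. A test asserting a
# hard-coded id depends on which suite runs first in the same process.
import itertools

_plan_ids = itertools.count(0)  # shared across every "suite" in the process

def new_relation():
    """Create a relation whose id comes from the shared counter."""
    return {"id": next(_plan_ids)}

# The suite that runs first satisfies an `id == 0` expectation...
suite_a_relation = new_relation()
# ...but any suite running later sees a nonzero id for its first relation.
suite_b_relation = new_relation()
```

This also suggests why the failure shows up only under some runners: sbt and Maven (and different machines) can execute the suites in different orders, shifting which suite gets id 0.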
>>> >  I have submitted a backport PR to branch-3.5, and if necessary, we
>>> can merge it to fix this test issue.
>>> >  Jie Yang
>>> >   From: Jungtaek Lim
>>> > Date: Friday, February 16, 2024, 22:15
>>> > To: Sean Owen, Rui Wang
>>> > Cc: dev
>>> > Subject: Re: [VOTE] Release Apache Spark 3.5.1 (RC2)
>>> >   I traced back relevant changes and got a sense of what happened.
>>> >   Yangjie figured out the issue via link. It's a tricky issue
>>> according to the comments from Yangjie - the test is dependent on ordering
>>> of execution for test suites. He said it does not fail in sbt, hence CI
>>> build couldn't catch it.
>>> > He fixed it via link, but we missed that the offending commit was also
>>> ported back to 3.5 as well, hence the fix wasn't ported back to 3.5.
>>> >   Surprisingly, I can't reproduce it locally even with Maven. In my
>>> attempt to reproduce, SparkConnectProtoSuite was executed third - first
>>> SparkConnectStreamingQueryCacheSuite, then ExecuteEventsManagerSuite, and
>>> then SparkConnectProtoSuite. Maybe it's very specific to the environment, not
>>> just Maven? My env: MBP with M1 Pro chip, macOS 14.3.1, OpenJDK 17.0.9. I used
>>> build/mvn (Maven 3.8.8).
>>> >   I'm not 100% sure this is something we should fail the release as
>>> it's a test only and sounds very environment dependent, but I'll respect
>>> your call on vote.
>>> >   Btw, looks like Rui also made a relevant fix via link (not to fix
>>> the failing test but to fix other issues), but this also wasn't ported back
>>> to 3.5. @Rui Wang Do you think this is a regression issue and warrants a
>>> new RC?
>>> > On Fri, Feb 16, 2024 at 11:38 AM Sean Owen 
>>> wrote:
>>> > Is anyone seeing this Spark Connect test failure? Then again, I have
>>> some weird issue with this env that always fails 1 or 2 tests that nobody
>>> else can replicate.
>>> >   - Test observe *** FAILED ***
>>> >   == FAIL: Plans do not match ===
>>> >   !CollectMetrics my_metric, [min(id#0) AS min_val#0, max(id#0) AS max_val#0, sum(id#0) AS sum(id)#0L], 0
>>> >    CollectMetrics my_metric, [min(id#0) AS min_val#0, max(id#0) AS max_val#0, sum(id#0) AS sum(id)#0L], 44
>>> >   +- LocalRelation , [id#0, name#0]
>>> >    +- LocalRelation , [id#0, name#0] (PlanTest.scala:179)
>>> >   On Thu, Feb 15, 2024 at 1:34 PM Jungtaek Lim <
>>> kabhwan.opensou...@gmail.com> wrote:
>>> > DISCLAIMER: The RC for Apache Spark 3.5.1 starts with RC2, as I belatedly
>>> discovered a doc generation issue after tagging RC1.
>>> >   Please vote on releasing the following candidate as Apache Spark
>>> version 3.5.1.
>>> >
>>> > The vote is open until February 18th 9AM (PST) and passes if a
>>> majority of +1 PMC votes are cast, with
>>> > a minimum of 3 +1 votes.
>>> >
>>> > [ ] +1 Release this package as Apache Spark 3.5.1
>>> > [ ] -1 Do not release this package because ...
>>> >
>>> > To learn more