Re: How do you debug a code-generated aggregate?

2024-02-12 Thread Herman van Hovell
There is no really easy way of getting the state of the aggregation buffer,
unless you are willing to modify the code generation and sprinkle in some
logging.

I would start by dumping the generated code, by calling explain('codegen')
on the DataFrame. That has helped me find similar issues in most cases.
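
For reference, from PySpark that looks roughly like this (a minimal sketch;
the mode string follows the DataFrame.explain API, and the exact shape of the
dumped code differs between Spark versions):

from pyspark.sql import SparkSession
from pyspark.sql.functions import sum as sum_  # avoid shadowing the built-in sum

spark = SparkSession.builder.getOrCreate()
df = spark.range(4).repartition(2).select(sum_("id"))

# Prints the whole-stage codegen subtrees together with the Java code Spark
# generates for them, including the aggregate without grouping keys.
df.explain(mode="codegen")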

HTH

On Sun, Feb 11, 2024 at 11:26 PM Nicholas Chammas <
nicholas.cham...@gmail.com> wrote:

> Consider this example:
>
> >>> from pyspark.sql.functions import sum
> >>> spark.range(4).repartition(2).select(sum("id")).show()
> +-------+
> |sum(id)|
> +-------+
> |      6|
> +-------+
>
> I’m trying to understand how this works because I’m investigating a bug in
> this kind of aggregate.
>
> I see that doProduceWithoutKeys and doConsumeWithoutKeys are called, and I
> believe they are responsible for computing a declarative aggregate like
> `sum`. But I’m not sure how I would debug the generated code, or the
> inputs that drive what code gets generated.
>
> Say you were running the above example and it was producing an incorrect
> result, and you knew the problem was somehow related to the sum. How would
> you troubleshoot it to identify the root cause?
>
> Ideally, I would like some way to track how the aggregation buffer mutates
> as the computation is executed, so I can see something roughly like:
>
> [0, 1, 2, 3]
> [1, 5]
> [6]
>
> Is there some way to trace a declarative aggregate like this?
>
> Nick
>
>


Re: [VOTE] Updating documentation hosted for EOL and maintenance releases

2023-09-26 Thread Herman van Hovell
+1

On Tue, Sep 26, 2023 at 10:39 AM yangjie01 
wrote:

> +1
>
>
>
> *From**: *Yikun Jiang 
> *Date**: *Tuesday, September 26, 2023, 18:06
> *To**: *dev 
> *Cc**: *Hyukjin Kwon , Ruifeng Zheng <
> ruife...@apache.org>
> *Subject**: *Re: [VOTE] Updating documentation hosted for EOL and maintenance
> releases
>
>
>
> +1, I believe it is a wise choice to update the documentation EOL policy
> based on the real demands of community users.
>
>
> Regards,
>
> Yikun
>
>
>
>
>
> On Tue, Sep 26, 2023 at 1:06 PM Ruifeng Zheng  wrote:
>
> +1
>
>
>
> On Tue, Sep 26, 2023 at 12:51 PM Hyukjin Kwon 
> wrote:
>
> Hi all,
>
> I would like to start the vote for updating the documentation hosted for EOL
> and maintenance releases, to improve usability and so that end users can
> read the proper and correct documentation.
>
>
> For discussion thread, please refer to
> https://lists.apache.org/thread/1675rzxx5x4j2x03t9x0kfph8tlys0cx
> .
>
>
>
>
> Here is one example:
> - https://github.com/apache/spark/pull/42989
> 
>
> - https://github.com/apache/spark-website/pull/480
> 
>
>
>
> Starting with my own +1.
>
>


Re: [VOTE] Release Apache Spark 3.5.0 (RC4)

2023-09-06 Thread Herman van Hovell
Tested connect, and everything looks good.

+1

On Wed, Sep 6, 2023 at 8:11 AM Yuanjian Li  wrote:

> Please vote on releasing the following candidate(RC4) as Apache Spark
> version 3.5.0.
>
> The vote is open until 11:59pm Pacific time Sep 8th and passes if a
> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 3.5.0
>
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v3.5.0-rc4 (commit
> c2939589a29dd0d6a2d3d31a8d833877a37ee02a):
>
> https://github.com/apache/spark/tree/v3.5.0-rc4
>
> The release files, including signatures, digests, etc. can be found at:
>
> https://dist.apache.org/repos/dist/dev/spark/v3.5.0-rc4-bin/
>
> Signatures used for Spark RCs can be found in this file:
>
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
>
> https://repository.apache.org/content/repositories/orgapachespark-1448
>
> The documentation corresponding to this release can be found at:
>
> https://dist.apache.org/repos/dist/dev/spark/v3.5.0-rc4-docs/
>
> The list of bug fixes going into 3.5.0 can be found at the following URL:
>
> https://issues.apache.org/jira/projects/SPARK/versions/12352848
>
> This release is using the release script of the tag v3.5.0-rc4.
>
>
> FAQ
>
> =
>
> How can I help test this release?
>
> =
>
> If you are a Spark user, you can help us test this release by taking
>
> an existing Spark workload and running on this release candidate, then
>
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install
>
> the current RC and see if anything important breaks, in the Java/Scala
>
> you can add the staging repository to your project's resolvers and test
>
> with the RC (make sure to clean up the artifact cache before/after so
>
> you don't end up building with an out of date RC going forward).
>
> ===
>
> What should happen to JIRA tickets still targeting 3.5.0?
>
> ===
>
> The current list of open tickets targeted at 3.5.0 can be found at:
>
> https://issues.apache.org/jira/projects/SPARK and search for "Target
> Version/s" = 3.5.0
>
> Committers should look at those and triage. Extremely important bug
>
> fixes, documentation, and API tweaks that impact compatibility should
>
> be worked on immediately. Everything else please retarget to an
>
> appropriate release.
>
> ==
>
> But my bug isn't fixed?
>
> ==
>
> In order to make timely releases, we will typically not hold the
>
> release unless the bug in question is a regression from the previous
>
> release. That being said, if there is something which is a regression
>
> that has not been correctly targeted please ping me or a committer to
>
> help target the issue.
>
> Thanks,
>
> Yuanjian Li
>


Re: [Reminder] Spark 3.5 Branch Cut

2023-07-16 Thread Herman van Hovell
Hi Yuanjian,

For the ongoing encoder work for the connect scala client I'd like to get
the following tickets in:

   - SPARK-44396  :
   Direct Arrow Deserialization
   - SPARK-9  :
   Upcasting for Arrow Deserialization
   - SPARK-44450  : Make
   direct Arrow encoding work with SQL/API.

Cheers,
Herman

On Sat, Jul 15, 2023 at 7:53 AM Enrico Minack 
wrote:

> Speaking of JdbcDialect, is there any interest in getting upserts for JDBC
> into 3.5.0?
>
> [SPARK-19335][SPARK-38200][SQL] Add upserts for writing to JDBC:
> https://github.com/apache/spark/pull/41518
> [SPARK-19335][SPARK-38200][SQL] Add upserts for writing to JDBC using
> MERGE INTO with temp table: https://github.com/apache/spark/pull/41611
>
> Enrico
>
>
> Am 15.07.23 um 04:10 schrieb Jia Fan:
>
> Can we put [SPARK-44262][SQL] Add `dropTable` and `getInsertStatement` to
> JdbcDialect into 3.5.0?
> https://github.com/apache/spark/pull/41855
> Since this is the last major version update of 3.x, I think we need to
> make sure JdbcDialect can support more databases.
>
>
> Gengliang Wang  wrote on Saturday, July 15, 2023 at 05:20:
>
>> Hi Yuanjian,
>>
>> Besides the abovementioned changes, it would be great to include the UI
>> page for Spark Connect: SPARK-44394
>> .
>>
>> Best Regards,
>> Gengliang
>>
>> On Fri, Jul 14, 2023 at 11:44 AM Julek Sompolski
>>   wrote:
>>
>>> Thank you,
>>> My changes that you listed are tracked under this Epic:
>>> https://issues.apache.org/jira/browse/SPARK-43754
>>> I am also working on https://issues.apache.org/jira/browse/SPARK-44422,
>>> didn't mention it before because I have hopes that this one will make it
>>> before the cut.
>>>
>>> (Unrelated) My colleague is also working on
>>> https://issues.apache.org/jira/browse/SPARK-43923 and I am reviewing
>>> https://github.com/apache/spark/pull/41443, so I hope that that one
>>> will also make it before the cut.
>>>
>>> Best regards,
>>> Juliusz Sompolski
>>>
>>> On Fri, Jul 14, 2023 at 7:34 PM Yuanjian Li 
>>> wrote:
>>>
 Hi everyone,
 As discussed earlier in "Time for Spark v3.5.0 release", I will cut
 branch-3.5 on *Monday, July 17th at 1 pm PST* as scheduled.

 Please plan your PR merge accordingly with the given timeline.
 Currently, we have received the following exception merge requests:

- SPARK-44421: Reattach to existing execute in Spark Connect
(server mechanism)
- SPARK-44423:  Reattach to existing execute in Spark Connect
(scala client)
- SPARK-44424:  Reattach to existing execute in Spark Connect
(python client)

 If there are any other exception feature requests, please reply to this
 email. We will not merge any new features in 3.5 after the branch cut.

 Best,
 Yuanjian

>>>
>


Re: [VOTE][RESULT] Release Plan for Apache Spark 4.0.0 (June 2024)

2023-06-19 Thread Herman van Hovell
Dongjoon, I am not sure if I follow the line of thought here.

Multiple people have asked for clarification on what Spark 4.0 would mean
(Holden, Mridul, Jia & Xiao). You can - for the record - also add me to
this list. However, you choose to single out Xiao because he asks this
question and wants to do a preview release as well? So again, what does Spark
4 mean, and why does it need to take almost a year? Historically, major Spark
releases tend to break APIs, but if it only entails moving to Scala 2.13
and dropping support for JDK 8, then we could also just release a month
after 3.5.

How about we do this? We get 3.5 released, and afterwards we do a couple of
meetings where we build this roadmap. Using that, we can - hopefully - have
a grounded discussion.

Cheers,
Herman

On Mon, Jun 19, 2023 at 4:01 PM Dongjoon Hyun  wrote:

> Thank you. I reviewed the threads, vote and result once more.
>
> I found that I missed the binding vote mark on Holden in the vote result
> email. The following should be "-0: Holden Karau *". Sorry for this
> mistake, Holden and all.
>
> > -0: Holden Karau
>
> To Hyukjin, I disagree with you on the following point, because the thread
> started clearly with your and Sean's Apache Spark 4.0 requirement in order
> to move away from Scala 2.12. In addition, we also discussed another item
> (dropping Java 8) from another current dev thread. The vote scope and goal
> are clear and specific.
>
> > we're unclear on the picture of Spark 4.0.0.
>
> Instead of the vote scope and result, what is really unclear is what you
> propose here. If Xiao wants a preview, Xiao can propose the preview plan in
> more detail. It's welcome. If you have many 4.0 dev ideas which are not
> exposed to the community yet, please share them with the community. It's
> welcome, too. Apache Spark is an open source community. If you don't share
> them, there is no way for us to know what you want.
>
> Dongjoon
>
> On 2023/06/19 04:31:46 Hyukjin Kwon wrote:
> > The major concerns raised in the thread were that we should initiate the
> > discussion for the below first:
> > - Apache Spark 4.0.0 Preview (and Dates)
> > - Apache Spark 4.0.0 Items
> > - Apache Spark 4.0.0 Plan Adjustment
> >
> > before setting the timeline for Spark 4.0.0, because we're unclear on the
> > picture of Spark 4.0.0. So discussing the timeline for 4.0.0 first is the
> > opposite order procedurally.
> > The vote passed as a procedural issue, but I would prefer to consider this
> > as a tentative date, and we should probably have another vote to adjust the
> > date considering the plans, preview dates, and items we aim for in 4.0.0.
> >
> >
> > On Sat, 17 Jun 2023 at 04:33, Dongjoon Hyun  wrote:
> >
> > > This was a part of the following on-going discussions.
> > >
> > > 2023-05-28  Apache Spark 3.5.0 Expectations (?)
> > > https://lists.apache.org/thread/3x6dh17bmy20n3frtt3crgxjydnxh2o0
> > >
> > > 2023-05-30 Apache Spark 4.0 Timeframe?
> > > https://lists.apache.org/thread/xhkgj60j361gdpywoxxz7qspp2w80ry6
> > >
> > > 2023-06-05 ASF policy violation and Scala version issues
> > > https://lists.apache.org/thread/k7gr65wt0fwtldc7hp7bd0vkg1k93rrb
> > >
> > > 2023-06-12 [VOTE] Release Plan for Apache Spark 4.0.0 (June 2024)
> > > https://lists.apache.org/thread/r0zn6rd8y25yn2dg59ktw3ttrwxzqrfb
> > >
> > > I'm looking forward to seeing the upcoming detailed discussions
> including
> > > the following
> > > - Apache Spark 4.0.0 Preview (and Dates)
> > > - Apache Spark 4.0.0 Items
> > > - Apache Spark 4.0.0 Plan Adjustment
> > >
> > > Please initiate the discussion.
> > >
> > > Thanks,
> > > Dongjoon.
> > >
> > >
> > > On 2023/06/16 19:30:42 Dongjoon Hyun wrote:
> > > > The vote passes with 6 +1s (4 binding +1s), one -0, and one -1.
> > > > Thank you all for your participation and
> > > > especially your additional comments during this voting,
> > > > Mridul, Hyukjin, and Jungtaek.
> > > >
> > > > (* = binding)
> > > > +1:
> > > > - Dongjoon Hyun *
> > > > - Huaxin Gao *
> > > > - Liang-Chi Hsieh *
> > > > - Kazuyuki Tanimura
> > > > - Chao Sun *
> > > > - Jia Fan
> > > >
> > > > -0: Holden Karau
> > > >
> > > > -1: Xiao Li *
> > > >
> > >
> > > -
> > > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> > >
> > >
> >
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: [VOTE] Release Apache Spark 3.4.0 (RC5)

2023-03-30 Thread Herman van Hovell
+1

On Thu, Mar 30, 2023 at 11:05 PM Sean Owen  wrote:

> +1 same result from me as last time.
>
> On Thu, Mar 30, 2023 at 3:21 AM Xinrong Meng 
> wrote:
>
>> Please vote on releasing the following candidate(RC5) as Apache Spark
>> version 3.4.0.
>>
>> The vote is open until 11:59pm Pacific time *April 4th* and passes if a
>> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>
>> [ ] +1 Release this package as Apache Spark 3.4.0
>> [ ] -1 Do not release this package because ...
>>
>> To learn more about Apache Spark, please see http://spark.apache.org/
>>
>> The tag to be voted on is *v3.4.0-rc5* (commit
>> f39ad617d32a671e120464e4a75986241d72c487):
>> https://github.com/apache/spark/tree/v3.4.0-rc5
>>
>> The release files, including signatures, digests, etc. can be found at:
>> https://dist.apache.org/repos/dist/dev/spark/v3.4.0-rc5-bin/
>>
>> Signatures used for Spark RCs can be found in this file:
>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>
>> The staging repository for this release can be found at:
>> https://repository.apache.org/content/repositories/orgapachespark-1439
>>
>> The documentation corresponding to this release can be found at:
>> https://dist.apache.org/repos/dist/dev/spark/v3.4.0-rc5-docs/
>>
>> The list of bug fixes going into 3.4.0 can be found at the following URL:
>> https://issues.apache.org/jira/projects/SPARK/versions/12351465
>>
>> This release is using the release script of the tag v3.4.0-rc5.
>>
>>
>> FAQ
>>
>> =
>> How can I help test this release?
>> =
>> If you are a Spark user, you can help us test this release by taking
>> an existing Spark workload and running on this release candidate, then
>> reporting any regressions.
>>
>> If you're working in PySpark you can set up a virtual env and install
>> the current RC and see if anything important breaks, in the Java/Scala
>> you can add the staging repository to your project's resolvers and test
>> with the RC (make sure to clean up the artifact cache before/after so
>> you don't end up building with an out of date RC going forward).
>>
>> ===
>> What should happen to JIRA tickets still targeting 3.4.0?
>> ===
>> The current list of open tickets targeted at 3.4.0 can be found at:
>> https://issues.apache.org/jira/projects/SPARK and search for "Target
>> Version/s" = 3.4.0
>>
>> Committers should look at those and triage. Extremely important bug
>> fixes, documentation, and API tweaks that impact compatibility should
>> be worked on immediately. Everything else please retarget to an
>> appropriate release.
>>
>> ==
>> But my bug isn't fixed?
>> ==
>> In order to make timely releases, we will typically not hold the
>> release unless the bug in question is a regression from the previous
>> release. That being said, if there is something which is a regression
>> that has not been correctly targeted please ping me or a committer to
>> help target the issue.
>>
>> Thanks,
>> Xinrong Meng
>>
>>


Re: Ammonite as REPL for Spark Connect

2023-03-23 Thread Herman van Hovell
The goal of adding this is to make it easy for a user to connect a Scala
REPL to a Spark Connect server, just like the Spark shell makes it easy to
work with a regular Spark environment.

It is not meant as a Spark shell replacement. They represent two different
modes of working with Spark, and they have very different API surfaces
(Connect being a subset of what regular Spark has to offer). I do think we
should consider using Ammonite for the Spark shell at some point, since it
has better UX and does not require us to fork a REPL. That discussion is
for another day though.

I guess you can use it as an example of building an integration. In itself
I wouldn't call it that, because I think this is a key part of getting
started with Connect, and/or doing debugging.
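
For illustration, connecting a client to a Spark Connect server looks roughly
like this from the Python side (a minimal sketch, not the Scala/Ammonite flow
itself; it assumes a Spark Connect server is already running on the default
port 15002):

from pyspark.sql import SparkSession

# Connect to a running Spark Connect server instead of starting a local JVM.
# The "sc://" scheme is the same connection-string format the Scala client uses.
spark = SparkSession.builder.remote("sc://localhost:15002").getOrCreate()

spark.range(10).selectExpr("sum(id)").show()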

On Thu, Mar 23, 2023 at 4:00 AM Mridul Muralidharan 
wrote:

>
> What is unclear to me is why we are introducing this integration, and how
> users will leverage it.
>
> * Are we replacing spark-shell with it ?
> Given the existing gaps, this is not the case.
>
> * Is it an example to showcase how to build an integration ?
> That could be interesting, and we can add it to external/
>
> Anything else I am missing ?
>
> Regards,
> Mridul
>
>
>
> On Wed, Mar 22, 2023 at 6:58 PM Herman van Hovell 
> wrote:
>
>> Ammonite is maintained externally by Li Haoyi et al. We are including it
>> as a 'provided' dependency. The integration bits and pieces (1 file) are
>> included in Apache Spark.
>>
>> On Wed, Mar 22, 2023 at 7:53 PM Mridul Muralidharan 
>> wrote:
>>
>>>
>>> Will this be maintained externally or included into Apache Spark ?
>>>
>>> Regards ,
>>> Mridul
>>>
>>>
>>>
>>> On Wed, Mar 22, 2023 at 6:50 PM Herman van Hovell
>>>  wrote:
>>>
>>>> Hi All,
>>>>
>>>> For Spark Connect Scala Client we are working on making the REPL
>>>> experience a bit nicer <https://github.com/apache/spark/pull/40515>.
>>>> In a nutshell we want to give users a turn key scala REPL, that works even
>>>> if you don't have a Spark distribution on your machine (through
>>>> coursier <https://get-coursier.io/>). We are using Ammonite
>>>> <https://ammonite.io/> instead of the standard scala REPL for this,
>>>> the main reason for going with Ammonite is that it is easier to customize,
>>>> and IMO has a superior user experience.
>>>>
>>>> Does anyone object to doing this?
>>>>
>>>> Kind regards,
>>>> Herman
>>>>
>>>>


Re: Ammonite as REPL for Spark Connect

2023-03-22 Thread Herman van Hovell
Ammonite is maintained externally by Li Haoyi et al. We are including it as
a 'provided' dependency. The integration bits and pieces (1 file) are
included in Apache Spark.

On Wed, Mar 22, 2023 at 7:53 PM Mridul Muralidharan 
wrote:

>
> Will this be maintained externally or included into Apache Spark ?
>
> Regards ,
> Mridul
>
>
>
> On Wed, Mar 22, 2023 at 6:50 PM Herman van Hovell
>  wrote:
>
>> Hi All,
>>
>> For Spark Connect Scala Client we are working on making the REPL
>> experience a bit nicer <https://github.com/apache/spark/pull/40515>. In
>> a nutshell we want to give users a turn key scala REPL, that works even if
>> you don't have a Spark distribution on your machine (through coursier
>> <https://get-coursier.io/>). We are using Ammonite <https://ammonite.io/>
>> instead of the standard scala REPL for this, the main reason for going with
>> Ammonite is that it is easier to customize, and IMO has a superior user
>> experience.
>>
>> Does anyone object to doing this?
>>
>> Kind regards,
>> Herman
>>
>>


Ammonite as REPL for Spark Connect

2023-03-22 Thread Herman van Hovell
Hi All,

For the Spark Connect Scala Client we are working on making the REPL
experience a bit nicer <https://github.com/apache/spark/pull/40515>. In a
nutshell, we want to give users a turnkey Scala REPL that works even if you
don't have a Spark distribution on your machine (through coursier
<https://get-coursier.io/>). We are using Ammonite <https://ammonite.io/>
instead of the standard Scala REPL for this; the main reason for going with
Ammonite is that it is easier to customize, and IMO it has a superior user
experience.

Does anyone object to doing this?

Kind regards,
Herman


Re: [Question] Can't start Spark Connect

2023-03-08 Thread Herman van Hovell
Hi Jia,

How are you building connect?

Kind regards,
Herman

On Wed, Mar 8, 2023 at 8:48 AM Jia Fan  wrote:

> Thanks for the reply,
> I have done a clean build with Maven a few times, but it always reports:
>
> /Users/xxx/Code/spark/core/target/generated-sources/org/apache/spark/status/protobuf/StoreTypes.java:658:9
> java: symbol not found
>    Symbol: class UnusedPrivateParameter
>    Location: class org.apache.spark.status.protobuf.StoreTypes.JobData
>
> I think maybe it's a protobuf version conflict?
> https://user-images.githubusercontent.com/32387433/223716946-85761a34-f86c-4ba1-9557-a59d0d5b9958.png
>
>
> Hyukjin Kwon  wrote on Wednesday, March 8, 2023 at 19:09:
>
>> Just doing a clean build with Maven, and running a test case like
>> `SparkConnectServiceSuite` in IntelliJ should work.
>>
>> On Wed, 8 Mar 2023 at 15:02, Jia Fan  wrote:
>>
>>> Hi developers,
>>>I want to contribute some code for Spark Connect. Any doc for
>>> starters? I want to debug SimpleSparkConnectService but I can't start it
>>> with IDEA. I would appreciate any help.
>>>
>>> Thanks
>>>
>>> 
>>>
>>>
>>> Jia Fan
>>>
>>


Re: [VOTE] Release Apache Spark 3.4.0 (RC1)

2023-02-22 Thread Herman van Hovell
Hi All,

Thanks for testing the 3.4.0 RC! I apologize for the Maven testing failures
for the Spark Connect Scala Client. We will try to get those sorted out as
soon as possible.

This is an artifact of having multiple build systems, and only running CI
for one (SBT). That, however, is a debate for another day :)...

Cheers,
Herman

On Wed, Feb 22, 2023 at 5:32 PM Bjørn Jørgensen 
wrote:

> ./build/mvn clean package
>
> I'm using ubuntu rolling, python 3.11 openjdk 17
>
> CompatibilitySuite:
> - compatibility MiMa tests *** FAILED ***
>   java.lang.AssertionError: assertion failed: Failed to find the jar
> inside folder: /home/bjorn/spark-3.4.0/connector/connect/client/jvm/target
>   at scala.Predef$.assert(Predef.scala:223)
>   at
> org.apache.spark.sql.connect.client.util.IntegrationTestUtils$.findJar(IntegrationTestUtils.scala:67)
>   at
> org.apache.spark.sql.connect.client.CompatibilitySuite.clientJar$lzycompute(CompatibilitySuite.scala:57)
>   at
> org.apache.spark.sql.connect.client.CompatibilitySuite.clientJar(CompatibilitySuite.scala:53)
>   at
> org.apache.spark.sql.connect.client.CompatibilitySuite.$anonfun$new$1(CompatibilitySuite.scala:69)
>   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
>   at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
>   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
>   at org.scalatest.Transformer.apply(Transformer.scala:22)
>   at org.scalatest.Transformer.apply(Transformer.scala:20)
>   ...
> - compatibility API tests: Dataset *** FAILED ***
>   java.lang.AssertionError: assertion failed: Failed to find the jar
> inside folder: /home/bjorn/spark-3.4.0/connector/connect/client/jvm/target
>   at scala.Predef$.assert(Predef.scala:223)
>   at
> org.apache.spark.sql.connect.client.util.IntegrationTestUtils$.findJar(IntegrationTestUtils.scala:67)
>   at
> org.apache.spark.sql.connect.client.CompatibilitySuite.clientJar$lzycompute(CompatibilitySuite.scala:57)
>   at
> org.apache.spark.sql.connect.client.CompatibilitySuite.clientJar(CompatibilitySuite.scala:53)
>   at
> org.apache.spark.sql.connect.client.CompatibilitySuite.$anonfun$new$7(CompatibilitySuite.scala:110)
>   at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
>   at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
>   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
>   at org.scalatest.Transformer.apply(Transformer.scala:22)
>   ...
> SparkConnectClientSuite:
> - Placeholder test: Create SparkConnectClient
> - Test connection
> - Test connection string
> - Check URI: sc://host, isCorrect: true
> - Check URI: sc://localhost/, isCorrect: true
> - Check URI: sc://localhost:1234/, isCorrect: true
> - Check URI: sc://localhost/;, isCorrect: true
> - Check URI: sc://host:123, isCorrect: true
> - Check URI: sc://host:123/;user_id=a94, isCorrect: true
> - Check URI: scc://host:12, isCorrect: false
> - Check URI: http://host, isCorrect: false
> - Check URI: sc:/host:1234/path, isCorrect: false
> - Check URI: sc://host/path, isCorrect: false
> - Check URI: sc://host/;parm1;param2, isCorrect: false
> - Check URI: sc://host:123;user_id=a94, isCorrect: false
> - Check URI: sc:///user_id=123, isCorrect: false
> - Check URI: sc://host:-4, isCorrect: false
> - Check URI: sc://:123/, isCorrect: false
> - Non user-id parameters throw unsupported errors
> DatasetSuite:
> - limit
> - select
> - filter
> - write
> UserDefinedFunctionSuite:
> - udf and encoder serialization
> Run completed in 21 seconds, 944 milliseconds.
> Total number of tests run: 389
> Suites: completed 10, aborted 0
> Tests: succeeded 386, failed 3, canceled 0, ignored 0, pending 0
> *** 3 TESTS FAILED ***
> [INFO]
> 
> [INFO] Reactor Summary for Spark Project Parent POM 3.4.0:
> [INFO]
> [INFO] Spark Project Parent POM ... SUCCESS [
> 47.096 s]
> [INFO] Spark Project Tags . SUCCESS [
> 14.759 s]
> [INFO] Spark Project Sketch ... SUCCESS [
> 21.628 s]
> [INFO] Spark Project Local DB . SUCCESS [
> 20.311 s]
> [INFO] Spark Project Networking ... SUCCESS [01:07
> min]
> [INFO] Spark Project Shuffle Streaming Service  SUCCESS [
> 15.921 s]
> [INFO] Spark Project Unsafe ... SUCCESS [
> 16.020 s]
> [INFO] Spark Project Launcher . SUCCESS [
> 10.873 s]
> [INFO] Spark Project Core . SUCCESS [37:10
> min]
> [INFO] Spark Project ML Local Library . SUCCESS [
> 40.841 s]
> [INFO] Spark Project GraphX ... SUCCESS [02:39
> min]
> [INFO] Spark Project Streaming  SUCCESS [05:53
> min]
> [INFO] Spark Project Catalyst . SUCCESS [11:22
> min]
> 

Re: Deploying stage-level scheduling for Spark SQL

2022-09-29 Thread Herman van Hovell
I think issue 2 is caused by adaptive query execution. This will break
apart queries into multiple jobs; each subsequent job will generate an RDD
that is based on the previous ones.

As for issue 1, I am not sure how much you want to expose to an end user
here. SQL is declarative, and it does not specify how a query should be
executed. I can imagine that you might use different resources for different
types of stages, e.g. a scan stage versus more compute-heavy stages. This,
IMO, should be based on analyzing and costing the plan. For this, RDD-only
stage-level scheduling should be sufficient.
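
For what it is worth, the RDD-level stage-level scheduling API looks roughly
like this from PySpark (a minimal sketch; the resource amounts are purely
illustrative, and stage-level scheduling has cluster-manager and
dynamic-allocation requirements described in its documentation):

from pyspark import SparkContext
from pyspark.resource import (
    ExecutorResourceRequests,
    ResourceProfileBuilder,
    TaskResourceRequests,
)

sc = SparkContext.getOrCreate()

# Ask for larger executors and 2 CPUs per task for this particular stage.
ereqs = ExecutorResourceRequests().cores(8).memory("8g")
treqs = TaskResourceRequests().cpus(2)
profile = ResourceProfileBuilder().require(ereqs).require(treqs).build

rdd = sc.parallelize(range(1000), 8).withResources(profile)
print(rdd.map(lambda x: x * x).sum())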

On Thu, Sep 29, 2022 at 8:56 AM Chenghao Lyu  wrote:

> Hi,
>
> I plan to deploy the stage-level scheduling for Spark SQL to apply some
> fine-grained optimizations over the DAG of stages. However, I am blocked by
> the following issues:
>
>    1. The current stage-level scheduling supports RDD APIs only. So is
>    there a way to reuse stage-level scheduling for Spark SQL? E.g., how can
>    we expose the RDD code (the transformations and actions) from a Spark
>    SQL query (with SQL syntax)?
>    2. We do not quite understand why a Spark SQL query could trigger
>    multiple jobs and have some RDDs regenerated, as posted *here*. Can
>    anyone give us some insight on the reasons, and whether we can avoid the
>    RDD regeneration to save execution time?
>
> Thanks in advance.
>
> Cheers,
> Chenghao
>


Re: Why are hash functions seeded with 42?

2022-09-26 Thread Herman van Hovell
Sorry about that, it made me laugh 6 years ago, I didn't expect this to
come back and haunt me :)...

There are ways out of this, but none of them are particularly appealing:
- Add a SQL conf to make the value configurable.
- Add a seed parameter to the function. I am not sure if we can make this
work well with star expansion (e.g. xxhash64(*) is allowed).
- Add a new function that allows you to set the seed, e.g.
xxhash64_with_seed(<seed>, <col1>, ..., <colN>).
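
To make the seed mismatch concrete, here is a minimal sketch from PySpark
(the comparison against the third-party xxhash package is an assumption on my
part: it only has a chance of lining up when the external side also uses seed
42, hashes the same byte representation, and the result is interpreted as a
signed 64-bit integer):

import xxhash  # third-party package, not part of Spark

from pyspark.sql import SparkSession
from pyspark.sql.functions import xxhash64

spark = SparkSession.builder.getOrCreate()
spark_value = spark.createDataFrame([("spark",)], ["s"]).select(xxhash64("s")).first()[0]

# Spark hard-codes seed 42; most other xxHash64 implementations default to 0.
external = xxhash.xxh64(b"spark", seed=42).intdigest()
# Spark returns a signed 64-bit long, while xxhash returns an unsigned value.
external_signed = external - (1 << 64) if external >= (1 << 63) else external

print(spark_value, external_signed)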

On Mon, Sep 26, 2022 at 8:27 PM Sean Owen  wrote:

> Oh yeah, I get why we love to pick 42 for random things. I'm guessing it
> was a bit of an oversight here, as the 'seed' is directly the initial state
> and 0 makes much more sense.
>
> On Mon, Sep 26, 2022, 7:24 PM Nicholas Gustafson 
> wrote:
>
>> I don’t know the reason, however would offer a hunch that perhaps it’s a
>> nod to Douglas Adams (author of The Hitchhiker’s Guide to the Galaxy).
>>
>>
>> https://news.mit.edu/2019/answer-life-universe-and-everything-sum-three-cubes-mathematics-0910
>>
>> On Sep 26, 2022, at 16:59, Sean Owen  wrote:
>>
>> 
>> OK, it came to my attention today that hash functions in spark, like
>> xxhash64, actually always seed with 42:
>> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/hash.scala#L655
>>
>> This is an issue if you want the hash of some value in Spark to match the
>> hash you compute with xxhash64 somewhere else, and, AFAICT most any other
>> impl will start with seed=0.
>>
>> I'm guessing there wasn't a *great* reason for this, just seemed like 42
>> was a nice default seed. And we can't change it now without maybe subtly
>> changing program behaviors. And, I am guessing it's messy to let the
>> function now take a seed argument, esp. in SQL.
>>
>> So I'm left with, I guess we should doc that? I can do it if so.
>> And just a cautionary tale I guess, for hash function users.
>>
>>


[VOTE][RESULT] SPIP: Spark Connect

2022-06-16 Thread Herman van Hovell
The vote passes with 17 +1s (10 binding +1s).
+1:
Herman van Hovell*
Matei Zaharia*
Yuming Wang
Hyukjin Kwon*
Chao Sun
L.C. Hsieh*
Huaxin Gao
Ruifeng Zheng
Wenchen Fan*
Believer
Xiao Li*
Reynold Xin*
Dongjoon Hyun*
Gengliang Wang
Yikun Jiang
Tom Graves *
Holden Karau *

0: None
(Tom has voiced some architectural concerns)

-1: None

(* = binding)

The next step is that we are going to create a high level design doc, which
will give clarity on the design and should (hopefully) take away any
remaining concerns.

Thank you all for chiming in and your votes!

Cheers,
Herman


Re: [VOTE][SPIP] Spark Connect

2022-06-13 Thread Herman van Hovell
Let me kick off the voting...

+1

On Mon, Jun 13, 2022 at 2:02 PM Herman van Hovell 
wrote:

> Hi all,
>
> I’d like to start a vote for SPIP: "Spark Connect"
>
> The goal of the SPIP is to introduce a DataFrame-based client/server API
> for Spark.
>
> Please also refer to:
>
> - Previous discussion in dev mailing list: [DISCUSS] SPIP: Spark Connect
> - A client and server interface for Apache Spark.
> <https://lists.apache.org/thread/3fd2n34hlyg872nr55rylbv5cg8m1556>
> - Design doc: Spark Connect - A client and server interface for Apache
> Spark.
> <https://docs.google.com/document/d/1Mnl6jmGszixLW4KcJU5j9IgpG9-UabS0dcM6PM2XGDc/edit?usp=sharing>
> - JIRA: SPARK-39375 <https://issues.apache.org/jira/browse/SPARK-39375>
>
> Please vote on the SPIP for the next 72 hours:
>
> [ ] +1: Accept the proposal as an official SPIP
> [ ] +0
> [ ] -1: I don’t think this is a good idea because …
>
> Kind Regards,
> Herman
>


[VOTE][SPIP] Spark Connect

2022-06-13 Thread Herman van Hovell
Hi all,

I’d like to start a vote for SPIP: "Spark Connect"

The goal of the SPIP is to introduce a DataFrame-based client/server API
for Spark.

Please also refer to:

- Previous discussion in dev mailing list: [DISCUSS] SPIP: Spark Connect -
A client and server interface for Apache Spark.

- Design doc: Spark Connect - A client and server interface for Apache
Spark.

- JIRA: SPARK-39375 

Please vote on the SPIP for the next 72 hours:

[ ] +1: Accept the proposal as an official SPIP
[ ] +0
[ ] -1: I don’t think this is a good idea because …

Kind Regards,
Herman


Re: [VOTE] Release Spark 3.3.0 (RC6)

2022-06-13 Thread Herman van Hovell
+1

On Mon, Jun 13, 2022 at 12:53 PM Wenchen Fan  wrote:

> +1, tests are all green and there are no more blocker issues AFAIK.
>
> On Fri, Jun 10, 2022 at 12:27 PM Maxim Gekk
>  wrote:
>
>> Please vote on releasing the following candidate as
>> Apache Spark version 3.3.0.
>>
>> The vote is open until 11:59pm Pacific time June 14th and passes if a
>> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>
>> [ ] +1 Release this package as Apache Spark 3.3.0
>> [ ] -1 Do not release this package because ...
>>
>> To learn more about Apache Spark, please see http://spark.apache.org/
>>
>> The tag to be voted on is v3.3.0-rc6 (commit
>> f74867bddfbcdd4d08076db36851e88b15e66556):
>> https://github.com/apache/spark/tree/v3.3.0-rc6
>>
>> The release files, including signatures, digests, etc. can be found at:
>> https://dist.apache.org/repos/dist/dev/spark/v3.3.0-rc6-bin/
>>
>> Signatures used for Spark RCs can be found in this file:
>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>
>> The staging repository for this release can be found at:
>> https://repository.apache.org/content/repositories/orgapachespark-1407
>>
>> The documentation corresponding to this release can be found at:
>> https://dist.apache.org/repos/dist/dev/spark/v3.3.0-rc6-docs/
>>
>> The list of bug fixes going into 3.3.0 can be found at the following URL:
>> https://issues.apache.org/jira/projects/SPARK/versions/12350369
>>
>> This release is using the release script of the tag v3.3.0-rc6.
>>
>>
>> FAQ
>>
>> =
>> How can I help test this release?
>> =
>> If you are a Spark user, you can help us test this release by taking
>> an existing Spark workload and running on this release candidate, then
>> reporting any regressions.
>>
>> If you're working in PySpark you can set up a virtual env and install
>> the current RC and see if anything important breaks, in the Java/Scala
>> you can add the staging repository to your project's resolvers and test
>> with the RC (make sure to clean up the artifact cache before/after so
>> you don't end up building with an out of date RC going forward).
>>
>> ===
>> What should happen to JIRA tickets still targeting 3.3.0?
>> ===
>> The current list of open tickets targeted at 3.3.0 can be found at:
>> https://issues.apache.org/jira/projects/SPARK and search for "Target
>> Version/s" = 3.3.0
>>
>> Committers should look at those and triage. Extremely important bug
>> fixes, documentation, and API tweaks that impact compatibility should
>> be worked on immediately. Everything else please retarget to an
>> appropriate release.
>>
>> ==
>> But my bug isn't fixed?
>> ==
>> In order to make timely releases, we will typically not hold the
>> release unless the bug in question is a regression from the previous
>> release. That being said, if there is something which is a regression
>> that has not been correctly targeted please ping me or a committer to
>> help target the issue.
>>
>> Maxim Gekk
>>
>> Software Engineer
>>
>> Databricks, Inc.
>>
>


Re: [VOTE] Release Spark 2.4.8 (RC2)

2021-04-13 Thread Herman van Hovell
+1

On Tue, Apr 13, 2021 at 2:40 AM sarutak  wrote:

> +1 (non-binding)
>
> > +1
> >
> > On Tue, 13 Apr 2021, 02:58 Sean Owen,  wrote:
> >
> >> +1 same result as last RC for me.
> >>
> >> On Mon, Apr 12, 2021, 12:53 AM Liang-Chi Hsieh 
> >> wrote:
> >>
> >>> Please vote on releasing the following candidate as Apache Spark
> >>> version
> >>> 2.4.8.
> >>>
> >>> The vote is open until Apr 15th at 9AM PST and passes if a
> >>> majority +1 PMC
> >>> votes are cast, with a minimum of 3 +1 votes.
> >>>
> >>> [ ] +1 Release this package as Apache Spark 2.4.8
> >>> [ ] -1 Do not release this package because ...
> >>>
> >>> To learn more about Apache Spark, please see
> >>> http://spark.apache.org/
> >>>
> >>> There are currently no issues targeting 2.4.8 (try project = SPARK
> >>> AND
> >>> "Target Version/s" = "2.4.8" AND status in (Open, Reopened, "In
> >>> Progress"))
> >>>
> >>> The tag to be voted on is v2.4.8-rc2 (commit
> >>> a0ab27ca6b46b8e5a7ae8bb91e30546082fc551c):
> >>> https://github.com/apache/spark/tree/v2.4.8-rc2
> >>>
> >>> The release files, including signatures, digests, etc. can be
> >>> found at:
> >>> https://dist.apache.org/repos/dist/dev/spark/v2.4.8-rc2-bin/
> >>>
> >>> Signatures used for Spark RCs can be found in this file:
> >>> https://dist.apache.org/repos/dist/dev/spark/KEYS
> >>>
> >>> The staging repository for this release can be found at:
> >>>
> >>
> > https://repository.apache.org/content/repositories/orgapachespark-1373/
> >>>
> >>> The documentation corresponding to this release can be found at:
> >>> https://dist.apache.org/repos/dist/dev/spark/v2.4.8-rc2-docs/
> >>>
> >>> The list of bug fixes going into 2.4.8 can be found at the
> >>> following URL:
> >>> https://s.apache.org/spark-v2.4.8-rc2
> >>>
> >>> This release is using the release script of the tag v2.4.8-rc2.
> >>>
> >>> FAQ
> >>>
> >>> =
> >>> How can I help test this release?
> >>> =
> >>>
> >>> If you are a Spark user, you can help us test this release by
> >>> taking
> >>> an existing Spark workload and running on this release candidate,
> >>> then
> >>> reporting any regressions.
> >>>
> >>> If you're working in PySpark you can set up a virtual env and
> >>> install
> >>> the current RC and see if anything important breaks, in the
> >>> Java/Scala
> >>> you can add the staging repository to your project's resolvers and
> >>> test
> >>> with the RC (make sure to clean up the artifact cache before/after
> >>> so
> >>> you don't end up building with an out of date RC going forward).
> >>>
> >>> ===
> >>> What should happen to JIRA tickets still targeting 2.4.8?
> >>> ===
> >>>
> >>> The current list of open tickets targeted at 2.4.8 can be found
> >>> at:
> >>> https://issues.apache.org/jira/projects/SPARK and search for
> >>> "Target
> >>> Version/s" = 2.4.8
> >>>
> >>> Committers should look at those and triage. Extremely important
> >>> bug
> >>> fixes, documentation, and API tweaks that impact compatibility
> >>> should
> >>> be worked on immediately. Everything else please retarget to an
> >>> appropriate release.
> >>>
> >>> ==
> >>> But my bug isn't fixed?
> >>> ==
> >>>
> >>> In order to make timely releases, we will typically not hold the
> >>> release unless the bug in question is a regression from the
> >>> previous
> >>> release. That being said, if there is something which is a
> >>> regression
> >>> that has not been correctly targeted please ping me or a committer
> >>> to
> >>> help target the issue.
> >>>
> >>> --
> >>> Sent from:
> >>> http://apache-spark-developers-list.1001551.n3.nabble.com/
> >>>
> >>>
> >>
> > -
> >>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: [VOTE] Release Spark 3.1.1 (RC3)

2021-02-22 Thread Herman van Hovell
+1

On Mon, Feb 22, 2021 at 12:59 PM Jungtaek Lim 
wrote:

> +1 (non-binding)
>
> Verified signatures. Only a few commits added after RC2 which don't seem
> to change the SS behavior, so I'd carry over my +1 from RC2.
>
> On Mon, Feb 22, 2021 at 3:57 PM Hyukjin Kwon  wrote:
>
>> Starting with my +1 (binding).
>>
>> On Mon, Feb 22, 2021 at 3:56 PM, Hyukjin Kwon  wrote:
>>
>>> Please vote on releasing the following candidate as Apache Spark version
>>> 3.1.1.
>>>
>>> The vote is open until February 24th 11PM PST and passes if a majority
>>> +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>>
>>> [ ] +1 Release this package as Apache Spark 3.1.1
>>> [ ] -1 Do not release this package because ...
>>>
>>> To learn more about Apache Spark, please see http://spark.apache.org/
>>>
>>> The tag to be voted on is v3.1.1-rc3 (commit
>>> 1d550c4e90275ab418b9161925049239227f3dc9):
>>> https://github.com/apache/spark/tree/v3.1.1-rc3
>>>
>>> The release files, including signatures, digests, etc. can be found at:
>>> 
>>> https://dist.apache.org/repos/dist/dev/spark/v3.1.1-rc3-bin/
>>>
>>> Signatures used for Spark RCs can be found in this file:
>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>>
>>> The staging repository for this release can be found at:
>>> https://repository.apache.org/content/repositories/orgapachespark-1367
>>>
>>> The documentation corresponding to this release can be found at:
>>> https://dist.apache.org/repos/dist/dev/spark/v3.1.1-rc3-docs/
>>>
>>> The list of bug fixes going into 3.1.1 can be found at the following URL:
>>> https://s.apache.org/41kf2
>>>
>>> This release is using the release script of the tag v3.1.1-rc3.
>>>
>>> FAQ
>>>
>>> ===
>>> What happened to 3.1.0?
>>> ===
>>>
>>> There was a technical issue during Apache Spark 3.1.0 preparation, and
>>> it was discussed and decided to skip 3.1.0.
>>> Please see
>>> https://spark.apache.org/news/next-official-release-spark-3.1.1.html for
>>> more details.
>>>
>>> =
>>> How can I help test this release?
>>> =
>>>
>>> If you are a Spark user, you can help us test this release by taking
>>> an existing Spark workload and running on this release candidate, then
>>> reporting any regressions.
>>>
>>> If you're working in PySpark you can set up a virtual env and install
>>> the current RC via "pip install
>>> https://dist.apache.org/repos/dist/dev/spark/v3.1.1-rc3-bin/pyspark-3.1.1.tar.gz
>>> "
>>> and see if anything important breaks.
>>> In Java/Scala, you can add the staging repository to your project's
>>> resolvers and test
>>> with the RC (make sure to clean up the artifact cache before/after so
>>> you don't end up building with an out of date RC going forward).
>>>
>>> ===
>>> What should happen to JIRA tickets still targeting 3.1.1?
>>> ===
>>>
>>> The current list of open tickets targeted at 3.1.1 can be found at:
>>> https://issues.apache.org/jira/projects/SPARK and search for "Target
>>> Version/s" = 3.1.1
>>>
>>> Committers should look at those and triage. Extremely important bug
>>> fixes, documentation, and API tweaks that impact compatibility should
>>> be worked on immediately. Everything else please retarget to an
>>> appropriate release.
>>>
>>> ==
>>> But my bug isn't fixed?
>>> ==
>>>
>>> In order to make timely releases, we will typically not hold the
>>> release unless the bug in question is a regression from the previous
>>> release. That being said, if there is something which is a regression
>>> that has not been correctly targeted please ping me or a committer to
>>> help target the issue.
>>>
>>>


Re: [VOTE] Release Spark 3.0.2 (RC1)

2021-02-16 Thread Herman van Hovell
+1

On Tue, Feb 16, 2021 at 11:08 AM Hyukjin Kwon  wrote:

> +1
>
> On Tue, Feb 16, 2021 at 5:10 PM, Prashant Sharma  wrote:
>
>> +1
>>
>> On Tue, Feb 16, 2021 at 1:22 PM Dongjoon Hyun 
>> wrote:
>>
>>> Please vote on releasing the following candidate as Apache Spark version
>>> 3.0.2.
>>>
>>> The vote is open until February 19th 9AM (PST) and passes if a majority
>>> +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>>
>>> [ ] +1 Release this package as Apache Spark 3.0.2
>>> [ ] -1 Do not release this package because ...
>>>
>>> To learn more about Apache Spark, please see https://spark.apache.org/
>>>
>>> The tag to be voted on is v3.0.2-rc1 (commit
>>> 648457905c4ea7d00e3d88048c63f360045f0714):
>>> https://github.com/apache/spark/tree/v3.0.2-rc1
>>>
>>> The release files, including signatures, digests, etc. can be found at:
>>> https://dist.apache.org/repos/dist/dev/spark/v3.0.2-rc1-bin/
>>>
>>> Signatures used for Spark RCs can be found in this file:
>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>>
>>> The staging repository for this release can be found at:
>>> https://repository.apache.org/content/repositories/orgapachespark-1366/
>>>
>>> The documentation corresponding to this release can be found at:
>>> https://dist.apache.org/repos/dist/dev/spark/v3.0.2-rc1-docs/
>>>
>>> The list of bug fixes going into 3.0.2 can be found at the following URL:
>>> https://issues.apache.org/jira/projects/SPARK/versions/12348739
>>>
>>> FAQ
>>>
>>> =
>>> How can I help test this release?
>>> =
>>>
>>> If you are a Spark user, you can help us test this release by taking
>>> an existing Spark workload and running on this release candidate, then
>>> reporting any regressions.
>>>
>>> If you're working in PySpark you can set up a virtual env and install
>>> the current RC and see if anything important breaks, in the Java/Scala
>>> you can add the staging repository to your project's resolvers and test
>>> with the RC (make sure to clean up the artifact cache before/after so
>>> you don't end up building with an out of date RC going forward).
>>>
>>> ===
>>> What should happen to JIRA tickets still targeting 3.0.2?
>>> ===
>>>
>>> The current list of open tickets targeted at 3.0.2 can be found at:
>>> https://issues.apache.org/jira/projects/SPARK and search for "Target
>>> Version/s" = 3.0.2
>>>
>>> Committers should look at those and triage. Extremely important bug
>>> fixes, documentation, and API tweaks that impact compatibility should
>>> be worked on immediately. Everything else please retarget to an
>>> appropriate release.
>>>
>>> ==
>>> But my bug isn't fixed?
>>> ==
>>>
>>> In order to make timely releases, we will typically not hold the
>>> release unless the bug in question is a regression from the previous
>>> release. That being said, if there is something which is a regression
>>> that has not been correctly targeted please ping me or a committer to
>>> help target the issue.
>>>
>>


Re: [VOTE] Standardize Spark Exception Messages SPIP

2020-11-09 Thread Herman van Hovell
+1

On Mon, Nov 9, 2020 at 2:06 AM Takeshi Yamamuro 
wrote:

> +1
>
> On Thu, Nov 5, 2020 at 3:41 AM Xinyi Yu  wrote:
>
>> Hi all,
>>
>> We had the discussion of SPIP: Standardize Spark Exception Messages at
>>
>> http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-SPIP-Standardize-Spark-Exception-Messages-td30341.html
>> <
>> http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-SPIP-Standardize-Spark-Exception-Messages-td30341.html>
>>
>> . The SPIP document link is at
>>
>> https://docs.google.com/document/d/1XGj1o3xAFh8BA7RCn3DtwIPC6--hIFOaNUNSlpaOIZs/edit?usp=sharing
>> <
>> https://docs.google.com/document/d/1XGj1o3xAFh8BA7RCn3DtwIPC6--hIFOaNUNSlpaOIZs/edit?usp=sharing>
>>
>> . We want to have the vote on this, for 72 hours.
>>
>> Please vote before November 7th at noon:
>>
>> [ ] +1: Accept this SPIP proposal
>> [ ] -1: Do not agree to standardize Spark exception messages, because ...
>>
>>
>> Thanks for your time and feedback!
>>
>> --
>> Xinyi
>>
>>
>>
>> --
>> Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/
>>
>> -
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>
>>
>
> --
> ---
> Takeshi Yamamuro
>


Re: Welcoming some new Apache Spark committers

2020-07-15 Thread Herman van Hovell
Congratulations!

On Wed, Jul 15, 2020 at 9:00 AM angers.zhu  wrote:

> Congratulations !
>
> angers.zhu
> angers@gmail.com
>
> 
>
> On 07/15/2020 14:53,Wenchen Fan 
> wrote:
>
> Congrats and welcome!
>
> On Wed, Jul 15, 2020 at 2:18 PM Mridul Muralidharan 
> wrote:
>
>>
>> Congratulations !
>>
>> Regards,
>> Mridul
>>
>> On Tue, Jul 14, 2020 at 12:37 PM Matei Zaharia 
>> wrote:
>>
>>> Hi all,
>>>
>>> The Spark PMC recently voted to add several new committers. Please join
>>> me in welcoming them to their new roles! The new committers are:
>>>
>>> - Huaxin Gao
>>> - Jungtaek Lim
>>> - Dilip Biswal
>>>
>>> All three of them contributed to Spark 3.0 and we’re excited to have
>>> them join the project.
>>>
>>> Matei and the Spark PMC
>>> -
>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>
>>>