Re: Improving Dynamic Allocation Logic for Spark 4+

2023-08-08 Thread Thomas Graves
> > - Advisory user input (e.g. a way to say after X is done I know I need Y 
> > where Y might be a bunch of GPU machines)

Are you thinking of something more advanced than Stage Level
Scheduling?  Or perhaps configuring it differently, or pre-starting
things you know you will need?

Tom

On Mon, Aug 7, 2023 at 3:27 PM Holden Karau  wrote:
>
> So I'm wondering if there is interest in revisiting some of how Spark is 
> doing its dynamic allocation for Spark 4+?
>
> Some things that I've been thinking about:
>
> - Advisory user input (e.g. a way to say after X is done I know I need Y 
> where Y might be a bunch of GPU machines)
> - Configurable tolerance (e.g. if we have at most Z% over target no-op)
> - Past runs of same job (e.g. stage X of job Y had a peak of K)
> - Faster executor launches (I'm a little fuzzy on what we can do here but, 
> one area for example is we setup and tear down an RPC connection to the 
> driver with a blocking call which does seem to have some locking inside of 
> the driver at first glance)
>
> Is this an area other folks are thinking about? Should I make an epic we can 
> track ideas in? Or are folks generally happy with today's dynamic allocation 
> (or just busy with other things)?
>
> --
> Twitter: https://twitter.com/holdenkarau
> Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
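
The "configurable tolerance" idea above could be sketched roughly as below. This is a purely hypothetical illustration of the no-op decision, not part of any existing Spark API; the function name, parameters, and percentage-based rule are all invented for the example.

```python
def should_resize(current, target, tolerance_pct):
    """Decide whether the allocator should act (True) or no-op (False).

    Hypothetical sketch: if we are over target by at most tolerance_pct
    percent, leave the executor count alone instead of scaling down.
    """
    if current <= target:
        # At or under target: only act if we are actually short.
        return current < target
    overshoot = (current - target) / target * 100.0
    # Small overshoot within tolerance: no-op; large overshoot: act.
    return overshoot > tolerance_pct

# 105 executors vs. a target of 100 with 10% tolerance: within tolerance,
# so the allocator would no-op; 120 executors would trigger a resize.
print(should_resize(105, 100, 10.0))  # False
print(should_resize(120, 100, 10.0))  # True
```

A real implementation would hang this check off the existing allocation manager's target computation; the sketch only shows where a tolerance knob could short-circuit it.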

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [VOTE] Release Spark 3.4.1 (RC1)

2023-06-22 Thread Thomas graves
+1

Tom

On Mon, Jun 19, 2023 at 9:41 PM Dongjoon Hyun  wrote:

> Please vote on releasing the following candidate as Apache Spark version
> 3.4.1.
>
> The vote is open until June 23rd 1AM (PST) and passes if a majority +1 PMC
> votes are cast, with a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 3.4.1
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see https://spark.apache.org/
>
> The tag to be voted on is v3.4.1-rc1 (commit
> 6b1ff22dde1ead51cbf370be6e48a802daae58b6)
> https://github.com/apache/spark/tree/v3.4.1-rc1
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.4.1-rc1-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1443/
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.4.1-rc1-docs/
>
> The list of bug fixes going into 3.4.1 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12352874
>
> This release is using the release script of the tag v3.4.1-rc1.
>
> FAQ
>
> =
> How can I help test this release?
> =
>
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install
> the current RC and see if anything important breaks, in the Java/Scala
> you can add the staging repository to your project's resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with an out of date RC going forward).
>
> ===
> What should happen to JIRA tickets still targeting 3.4.1?
> ===
>
> The current list of open tickets targeted at 3.4.1 can be found at:
> https://issues.apache.org/jira/projects/SPARK and search for "Target
> Version/s" = 3.4.1
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
>
> ==
> But my bug isn't fixed?
> ==
>
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.
>
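
The digests mentioned above (the .sha512 files published alongside each release artifact) can be verified with a few lines of stdlib Python. This is a hedged sketch: the demo writes a throwaway file because the real artifact names and published digest values are not reproduced here.

```python
import hashlib
import os
import tempfile

def sha512_hex(path, chunk=1 << 20):
    """Stream a file in chunks and return its hex SHA-512 digest."""
    h = hashlib.sha512()
    with open(path, "rb") as f:
        while True:
            block = f.read(chunk)
            if not block:
                break
            h.update(block)
    return h.hexdigest()

# Demo against a throwaway file; a real check would point `path` at a
# downloaded artifact (e.g. a spark-*-bin-*.tgz from the RC bin/ directory)
# and compare the result with the corresponding published .sha512 file.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"hello spark\n")
    path = f.name
digest = sha512_hex(path)
os.unlink(path)
print(digest)
```

Signature verification is separate: the .asc files are checked with GPG against the KEYS file, which this snippet does not cover.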


Re: [VOTE] Release Spark 3.3.1 (RC4)

2022-10-18 Thread Thomas graves
+1. Ran internal test suite.

Tom

On Sun, Oct 16, 2022 at 9:14 PM Yuming Wang  wrote:
>
> Please vote on releasing the following candidate as Apache Spark version 
> 3.3.1.
>
> The vote is open until 11:59pm Pacific time October 21st and passes if a 
> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 3.3.1
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see https://spark.apache.org
>
> The tag to be voted on is v3.3.1-rc4 (commit 
> fbbcf9434ac070dd4ced4fb9efe32899c6db12a9):
> https://github.com/apache/spark/tree/v3.3.1-rc4
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.3.1-rc4-bin
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1430
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.3.1-rc4-docs
>
> The list of bug fixes going into 3.3.1 can be found at the following URL:
> https://s.apache.org/ttgz6
>
> This release is using the release script of the tag v3.3.1-rc4.
>
>
> FAQ
>
> ==
> What happened to v3.3.1-rc3?
> ==
> A performance regression (SPARK-40703) was found after tagging v3.3.1-rc3, 
> which the Iceberg community hopes Spark 3.3.1 could fix.
> So we skipped the vote on v3.3.1-rc3.
>
> =
> How can I help test this release?
> =
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install
> the current RC and see if anything important breaks, in the Java/Scala
> you can add the staging repository to your project's resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with an out of date RC going forward).
>
> ===
> What should happen to JIRA tickets still targeting 3.3.1?
> ===
> The current list of open tickets targeted at 3.3.1 can be found at:
> https://issues.apache.org/jira/projects/SPARK and search for "Target 
> Version/s" = 3.3.1
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
>
> ==
> But my bug isn't fixed?
> ==
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.
>
>




Re: [VOTE] Release Spark 3.3.1 (RC2)

2022-10-03 Thread Thomas Graves
+1. Ran our internal tests and everything looks good.

Tom Graves

On Wed, Sep 28, 2022 at 12:20 AM Yuming Wang  wrote:
>
> Please vote on releasing the following candidate as Apache Spark version 
> 3.3.1.
>
> The vote is open until 11:59pm Pacific time October 3rd and passes if a 
> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 3.3.1
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see https://spark.apache.org
>
> The tag to be voted on is v3.3.1-rc2 (commit 
> 1d3b8f7cb15283a1e37ecada6d751e17f30647ce):
> https://github.com/apache/spark/tree/v3.3.1-rc2
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.3.1-rc2-bin
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1421
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.3.1-rc2-docs
>
> The list of bug fixes going into 3.3.1 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12351710
>
> This release is using the release script of the tag v3.3.1-rc2.
>
>
> FAQ
>
> =
> How can I help test this release?
> =
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install
> the current RC and see if anything important breaks, in the Java/Scala
> you can add the staging repository to your project's resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with an out of date RC going forward).
>
> ===
> What should happen to JIRA tickets still targeting 3.3.1?
> ===
> The current list of open tickets targeted at 3.3.1 can be found at:
> https://issues.apache.org/jira/projects/SPARK and search for "Target 
> Version/s" = 3.3.1
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
>
> ==
> But my bug isn't fixed?
> ==
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.
>
>
>




Re: [VOTE][SPIP] Spark Connect

2022-06-16 Thread Thomas Graves
+1 for the concept.
Correct me if I'm wrong, but at a high level this proposes adding a new
user API (which is language agnostic), starting with something like the
Logical Plan, with the addition of being able to call this API remotely.

+0 on architecture/design as it's not clear from the doc how much
impact this truly has. But that is a problem with SPIPs which I have
voiced my concern about in the past.
I can see how this could be fleshed out while keeping the overall impact
minimal (vs. a blow-up-the-world architecture change) since it's not just
a drop-in replacement for all existing APIs.  For instance,
conceptually this is just a version of the Spark thriftserver which
uses grpc and passes the new API and internally we add a new API
runPlan(LogicalPlan).  You could potentially also not use the internal
version of the catalyst Logical Plan API but have some conversion
still to allow changes to catalyst internals, not sure if that is
needed but a possibility.
With any API addition it will have to be kept stable and require more
testing and likely more dev work, so weighing that vs usefulness is
the question for me.

Tom

On Mon, Jun 13, 2022 at 1:04 PM Herman van Hovell
 wrote:
>
> Hi all,
>
> I’d like to start a vote for SPIP: "Spark Connect"
>
> The goal of the SPIP is to introduce a Dataframe based client/server API for 
> Spark
>
> Please also refer to:
>
> - Previous discussion in dev mailing list: [DISCUSS] SPIP: Spark Connect - A 
> client and server interface for Apache Spark.
> - Design doc: Spark Connect - A client and server interface for Apache Spark.
> - JIRA: SPARK-39375
>
> Please vote on the SPIP for the next 72 hours:
>
> [ ] +1: Accept the proposal as an official SPIP
> [ ] +0
> [ ] -1: I don’t think this is a good idea because …
>
> Kind Regards,
> Herman




Re: [VOTE] Release Spark 3.3.0 (RC5)

2022-06-07 Thread Thomas Graves
+1

Tom Graves

On Sat, Jun 4, 2022 at 9:50 AM Maxim Gekk
 wrote:
>
> Please vote on releasing the following candidate as Apache Spark version 
> 3.3.0.
>
> The vote is open until 11:59pm Pacific time June 8th and passes if a majority 
> +1 PMC votes are cast, with a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 3.3.0
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v3.3.0-rc5 (commit 
> 7cf29705272ab8e8c70e8885a3664ad8ae3cd5e9):
> https://github.com/apache/spark/tree/v3.3.0-rc5
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.3.0-rc5-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1406
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.3.0-rc5-docs/
>
> The list of bug fixes going into 3.3.0 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12350369
>
> This release is using the release script of the tag v3.3.0-rc5.
>
>
> FAQ
>
> =
> How can I help test this release?
> =
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install
> the current RC and see if anything important breaks, in the Java/Scala
> you can add the staging repository to your project's resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with an out of date RC going forward).
>
> ===
> What should happen to JIRA tickets still targeting 3.3.0?
> ===
> The current list of open tickets targeted at 3.3.0 can be found at:
> https://issues.apache.org/jira/projects/SPARK and search for "Target 
> Version/s" = 3.3.0
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
>
> ==
> But my bug isn't fixed?
> ==
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.
>
> Maxim Gekk
>
> Software Engineer
>
> Databricks, Inc.




Re: [VOTE] Release Spark 3.3.0 (RC3)

2022-05-27 Thread Thomas graves
+1. Ran through internal tests.

Tom Graves

On Tue, May 24, 2022 at 12:14 PM Maxim Gekk
 wrote:
>
> Please vote on releasing the following candidate as Apache Spark version 
> 3.3.0.
>
> The vote is open until 11:59pm Pacific time May 27th and passes if a majority 
> +1 PMC votes are cast, with a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 3.3.0
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v3.3.0-rc3 (commit 
> a7259279d07b302a51456adb13dc1e41a6fd06ed):
> https://github.com/apache/spark/tree/v3.3.0-rc3
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.3.0-rc3-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1404
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.3.0-rc3-docs/
>
> The list of bug fixes going into 3.3.0 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12350369
>
> This release is using the release script of the tag v3.3.0-rc3.
>
>
> FAQ
>
> =
> How can I help test this release?
> =
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install
> the current RC and see if anything important breaks, in the Java/Scala
> you can add the staging repository to your project's resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with an out of date RC going forward).
>
> ===
> What should happen to JIRA tickets still targeting 3.3.0?
> ===
> The current list of open tickets targeted at 3.3.0 can be found at:
> https://issues.apache.org/jira/projects/SPARK and search for "Target 
> Version/s" = 3.3.0
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
>
> ==
> But my bug isn't fixed?
> ==
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.
>
> Maxim Gekk
>
> Software Engineer
>
> Databricks, Inc.




Re: [VOTE] Release Spark 3.3.0 (RC1)

2022-05-10 Thread Thomas graves
Is there going to be an RC2? I thought there were a couple of issues
mentioned in the thread.

On Tue, May 10, 2022 at 11:53 AM Maxim Gekk
 wrote:
>
> Hi All,
>
> Today is the last day for voting. Please, test the RC1 and vote.
>
> Maxim Gekk
>
> Software Engineer
>
> Databricks, Inc.
>
>
>
> On Sat, May 7, 2022 at 10:58 AM beliefer  wrote:
>>
>>
>>  @Maxim Gekk  Glad to hear that!
>>
>> But there is a bug https://github.com/apache/spark/pull/36457
>> I think we should merge it into 3.3.0
>>
>>
>> At 2022-05-05 19:00:27, "Maxim Gekk"  
>> wrote:
>>
>> Please vote on releasing the following candidate as Apache Spark version 
>> 3.3.0.
>>
>> The vote is open until 11:59pm Pacific time May 10th and passes if a 
>> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>
>> [ ] +1 Release this package as Apache Spark 3.3.0
>> [ ] -1 Do not release this package because ...
>>
>> To learn more about Apache Spark, please see http://spark.apache.org/
>>
>> The tag to be voted on is v3.3.0-rc1 (commit 
>> 482b7d54b522c4d1e25f3e84eabbc78126f22a3d):
>> https://github.com/apache/spark/tree/v3.3.0-rc1
>>
>> The release files, including signatures, digests, etc. can be found at:
>> https://dist.apache.org/repos/dist/dev/spark/v3.3.0-rc1-bin/
>>
>> Signatures used for Spark RCs can be found in this file:
>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>
>> The staging repository for this release can be found at:
>> https://repository.apache.org/content/repositories/orgapachespark-1402
>>
>> The documentation corresponding to this release can be found at:
>> https://dist.apache.org/repos/dist/dev/spark/v3.3.0-rc1-docs/
>>
>> The list of bug fixes going into 3.3.0 can be found at the following URL:
>> https://issues.apache.org/jira/projects/SPARK/versions/12350369
>>
>> This release is using the release script of the tag v3.3.0-rc1.
>>
>>
>> FAQ
>>
>> =
>> How can I help test this release?
>> =
>> If you are a Spark user, you can help us test this release by taking
>> an existing Spark workload and running on this release candidate, then
>> reporting any regressions.
>>
>> If you're working in PySpark you can set up a virtual env and install
>> the current RC and see if anything important breaks, in the Java/Scala
>> you can add the staging repository to your project's resolvers and test
>> with the RC (make sure to clean up the artifact cache before/after so
>> you don't end up building with an out of date RC going forward).
>>
>> ===
>> What should happen to JIRA tickets still targeting 3.3.0?
>> ===
>> The current list of open tickets targeted at 3.3.0 can be found at:
>> https://issues.apache.org/jira/projects/SPARK and search for "Target 
>> Version/s" = 3.3.0
>>
>> Committers should look at those and triage. Extremely important bug
>> fixes, documentation, and API tweaks that impact compatibility should
>> be worked on immediately. Everything else please retarget to an
>> appropriate release.
>>
>> ==
>> But my bug isn't fixed?
>> ==
>> In order to make timely releases, we will typically not hold the
>> release unless the bug in question is a regression from the previous
>> release. That being said, if there is something which is a regression
>> that has not been correctly targeted please ping me or a committer to
>> help target the issue.
>>
>> Maxim Gekk
>>
>> Software Engineer
>>
>> Databricks, Inc.
>>
>>
>>
>>




Re: [VOTE] Spark 3.1.3 RC4

2022-02-16 Thread Thomas graves
+1

Tom

On Mon, Feb 14, 2022 at 2:55 PM Holden Karau  wrote:
>
> Please vote on releasing the following candidate as Apache Spark version 
> 3.1.3.
>
> The vote is open until Feb. 18th at 1 PM pacific (9 PM GMT) and passes if a 
> majority
> +1 PMC votes are cast, with a minimum of 3 + 1 votes.
>
> [ ] +1 Release this package as Apache Spark 3.1.3
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> There are currently no open issues targeting 3.1.3 in Spark's JIRA 
> https://issues.apache.org/jira/browse
> (try project = SPARK AND "Target Version/s" = "3.1.3" AND status in (Open, 
> Reopened, "In Progress"))
> at https://s.apache.org/n79dw
>
>
>
> The tag to be voted on is v3.1.3-rc4 (commit
> d1f8a503a26bcfb4e466d9accc5fa241a7933667):
> https://github.com/apache/spark/tree/v3.1.3-rc4
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.1.3-rc4-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1401
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.1.3-rc4-docs/
>
> The list of bug fixes going into 3.1.3 can be found at the following URL:
> https://s.apache.org/x0q9b
>
> This release is using the release script from 3.1.3
> The release docker container was rebuilt since the previous version didn't 
> have the necessary components to build the R documentation.
>
> FAQ
>
>
> =
> How can I help test this release?
> =
>
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install
> the current RC and see if anything important breaks, in the Java/Scala
> you can add the staging repository to your project's resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with an out of date RC going forward).
>
> ===
> What should happen to JIRA tickets still targeting 3.1.3?
> ===
>
> The current list of open tickets targeted at 3.1.3 can be found at:
> https://issues.apache.org/jira/projects/SPARK and search for "Target
> Version/s" = 3.1.3
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
>
> ==
> But my bug isn't fixed?
> ==
>
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something that is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.
>
> Note: I added an extra day to the vote since I know some folks are likely 
> busy on the 14th with partner(s).
>
>
> --
> Twitter: https://twitter.com/holdenkarau
> Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau




Re: [VOTE] Spark 3.1.3 RC3

2022-02-02 Thread Thomas Graves
It was discussed doing all the maintenance lines back at beginning of
December (Dec 6) when we were talking about release 3.2.1.

Tom

On Wed, Feb 2, 2022 at 2:07 AM Mridul Muralidharan  wrote:
>
> Hi Holden,
>
>   Not that I am against releasing 3.1.3 (given the fixes that have already 
> gone in), but did we discuss releasing it ? I might have missed the thread ...
>
> Regards,
> Mridul
>
> On Tue, Feb 1, 2022 at 7:12 PM Holden Karau  wrote:
>>
>> Please vote on releasing the following candidate as Apache Spark version 
>> 3.1.3.
>>
>> The vote is open until Feb. 4th at 5 PM PST (1 AM UTC + 1 day) and passes if 
>> a majority
>> +1 PMC votes are cast, with a minimum of 3 + 1 votes.
>>
>> [ ] +1 Release this package as Apache Spark 3.1.3
>> [ ] -1 Do not release this package because ...
>>
>> To learn more about Apache Spark, please see http://spark.apache.org/
>>
>> There are currently no open issues targeting 3.1.3 in Spark's JIRA 
>> https://issues.apache.org/jira/browse
>> (try project = SPARK AND "Target Version/s" = "3.1.3" AND status in (Open, 
>> Reopened, "In Progress"))
>> at https://s.apache.org/n79dw
>>
>>
>>
>> The tag to be voted on is v3.1.3-rc3 (commit
>> b8c0799a8cef22c56132d94033759c9f82b0cc86):
>> https://github.com/apache/spark/tree/v3.1.3-rc3
>>
>> The release files, including signatures, digests, etc. can be found at:
>> https://dist.apache.org/repos/dist/dev/spark/v3.1.3-rc3-bin/
>>
>> Signatures used for Spark RCs can be found in this file:
>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>
>> The staging repository for this release can be found at:
>> https://repository.apache.org/content/repositories/orgapachespark-1400/
>>
>> The documentation corresponding to this release can be found at:
>> https://dist.apache.org/repos/dist/dev/spark/v3.1.3-rc3-docs/
>>
>> The list of bug fixes going into 3.1.3 can be found at the following URL:
>> https://s.apache.org/x0q9b
>>
>> This release is using the release script in master as of 
>> ddc77fb906cb3ce1567d277c2d0850104c89ac25
>> The release docker container was rebuilt since the previous version didn't 
>> have the necessary components to build the R documentation.
>>
>> FAQ
>>
>>
>> =
>> How can I help test this release?
>> =
>>
>> If you are a Spark user, you can help us test this release by taking
>> an existing Spark workload and running on this release candidate, then
>> reporting any regressions.
>>
>> If you're working in PySpark you can set up a virtual env and install
>> the current RC and see if anything important breaks, in the Java/Scala
>> you can add the staging repository to your project's resolvers and test
>> with the RC (make sure to clean up the artifact cache before/after so
>> you don't end up building with an out of date RC going forward).
>>
>> ===
>> What should happen to JIRA tickets still targeting 3.1.3?
>> ===
>>
>> The current list of open tickets targeted at 3.1.3 can be found at:
>> https://issues.apache.org/jira/projects/SPARK and search for "Target
>> Version/s" = 3.1.3
>>
>> Committers should look at those and triage. Extremely important bug
>> fixes, documentation, and API tweaks that impact compatibility should
>> be worked on immediately. Everything else please retarget to an
>> appropriate release.
>>
>> ==
>> But my bug isn't fixed?
>> ==
>>
>> In order to make timely releases, we will typically not hold the
>> release unless the bug in question is a regression from the previous
>> release. That being said, if there is something that is a regression
>> that has not been correctly targeted please ping me or a committer to
>> help target the issue.
>>
>> ==
>> What happened to RC1 & RC2?
>> ==
>>
>> When I first went to build RC1 the build process failed due to the
>> lack of the R markdown package in my local rm container. By the time
>> I had time to debug and rebuild there was already another bug fix commit in
>> branch-3.1 so I decided to skip ahead to RC2 and pick it up directly.
>> When I went to go send the RC2 vote e-mail I noticed a correctness issue had
>> been fixed in branch-3.1 so I rolled RC3 to contain the correctness fix.
>>
>> --
>> Twitter: https://twitter.com/holdenkarau
>> Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau




Re: [VOTE][SPIP] Support Customized Kubernetes Schedulers Proposal

2022-01-11 Thread Thomas Graves
+1 (binding).

One minor note since I haven't had time to look at the implementation
details is please make sure resource aware scheduling and the stage
level scheduling still work or any caveats are documented. Feel free
to ping me if questions in these areas.

Tom

On Wed, Jan 5, 2022 at 7:07 PM Yikun Jiang  wrote:
>
> Hi all,
>
> I’d like to start a vote for SPIP: "Support Customized Kubernetes Schedulers 
> Proposal"
>
> The SPIP is to support customized Kubernetes schedulers in Spark on 
> Kubernetes.
>
> Please also refer to:
>
> - Previous discussion in dev mailing list: [DISCUSSION] SPIP: Support 
> Volcano/Alternative Schedulers Proposal
> - Design doc: [SPIP] Spark-36057 Support Customized Kubernetes Schedulers 
> Proposal
> - JIRA: SPARK-36057
>
> Please vote on the SPIP:
>
> [ ] +1: Accept the proposal as an official SPIP
> [ ] +0
> [ ] -1: I don’t think this is a good idea because …
>
> Regards,
> Yikun




Re: [VOTE] Release Spark 3.2.1 (RC1)

2022-01-11 Thread Thomas Graves
+1, ran our internal tests and everything looks good.

Tom

On Mon, Jan 10, 2022 at 12:10 PM huaxin gao  wrote:
>
> Please vote on releasing the following candidate as Apache Spark version 
> 3.2.1.
>
> The vote is open until Jan. 13th at 12 PM PST (8 PM UTC) and passes if a 
> majority
> +1 PMC votes are cast, with a minimum of 3 + 1 votes.
>
> [ ] +1 Release this package as Apache Spark 3.2.1
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> There are currently no issues targeting 3.2.1 (try project = SPARK AND
> "Target Version/s" = "3.2.1" AND status in (Open, Reopened, "In Progress"))
>
> The tag to be voted on is v3.2.1-rc1 (commit
> 2b0ee226f8dd17b278ad11139e62464433191653):
> https://github.com/apache/spark/tree/v3.2.1-rc1
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.2.1-rc1-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1395/
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.2.1-rc1-docs/
>
> The list of bug fixes going into 3.2.1 can be found at the following URL:
> https://s.apache.org/7tzik
>
> This release is using the release script of the tag v3.2.1-rc1.
>
> FAQ
>
>
> =
> How can I help test this release?
> =
>
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install
> the current RC and see if anything important breaks, in the Java/Scala
> you can add the staging repository to your projects resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with an out of date RC going forward).
>
> ===
> What should happen to JIRA tickets still targeting 3.2.1?
> ===
>
> The current list of open tickets targeted at 3.2.1 can be found at:
> https://issues.apache.org/jira/projects/SPARK and search for "Target
> Version/s" = 3.2.1
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
>
> ==
> But my bug isn't fixed?
> ==
>
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.




Re: [VOTE] Release Spark 3.2.0 (RC7)

2021-10-11 Thread Thomas graves
+1

Tom Graves

On Wed, Oct 6, 2021 at 11:49 AM Gengliang Wang  wrote:
>
> Please vote on releasing the following candidate as Apache Spark version 
> 3.2.0.
>
> The vote is open until 11:59pm Pacific time October 11 and passes if a 
> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 3.2.0
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v3.2.0-rc7 (commit 
> 5d45a415f3a29898d92380380cfd82bfc7f579ea):
> https://github.com/apache/spark/tree/v3.2.0-rc7
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc7-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1394
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc7-docs/
>
> The list of bug fixes going into 3.2.0 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12349407
>
> This release is using the release script of the tag v3.2.0-rc7.
>
>
> FAQ
>
> =
> How can I help test this release?
> =
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install
> the current RC and see if anything important breaks, in the Java/Scala
> you can add the staging repository to your projects resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with an out-of-date RC going forward).
>
> ===
> What should happen to JIRA tickets still targeting 3.2.0?
> ===
> The current list of open tickets targeted at 3.2.0 can be found at:
> https://issues.apache.org/jira/projects/SPARK and search for "Target 
> Version/s" = 3.2.0
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
>
> ==
> But my bug isn't fixed?
> ==
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.




Re: Removing references to Master

2021-07-09 Thread Thomas Graves
Hey everyone,

Looking at this again since we cut spark 3.2 branch thinking this
might be something to target for Spark 3.3.

Based on the feedback, I'd like to propose using "Leader" to replace
"Master".   If there are objections to this please let me know in the
next few days.

Thanks,
Tom

On Tue, Jan 19, 2021 at 10:13 AM Tom Graves
 wrote:
>
> thanks for the interest, I haven't had time to work on replacing Master, 
> hopefully for the next release but it's time dependent. If you follow the jira - 
> https://issues.apache.org/jira/browse/SPARK-32333 - I will post there when I 
> start, or if someone else picks it up you should see activity there.
>
> Tom
>
> On Saturday, January 16, 2021, 07:56:14 AM CST, João Paulo Leonidas Fernandes 
> Dias da Silva  wrote:
>
>
> So, it looks like slave was already replaced in the docs. Waiting for a 
> definition on the replacement(s) for master so I can create a PR for the docs 
> only.
>
> On Sat, Jan 16, 2021 at 8:30 AM jpaulorio  wrote:
>
> What about updating the documentation as well? Does it depend on the codebase
> changes or can we treat it as a separate issue? I volunteer to update both
> Master and Slave terms when there's an agreement on what should be used as
> replacement. Since [SPARK-32004] was already resolved,
> can I start replacing slave with worker?
>
>
>
> --
> Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>




Re: [VOTE] Release Spark 3.1.2 (RC1)

2021-05-26 Thread Thomas Graves
+1


Tom Graves

On Mon, May 24, 2021 at 1:14 AM Dongjoon Hyun  wrote:
>
> Please vote on releasing the following candidate as Apache Spark version 
> 3.1.2.
>
> The vote is open until May 27th 1AM (PST) and passes if a majority +1 PMC 
> votes are cast, with a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 3.1.2
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see https://spark.apache.org/
>
> The tag to be voted on is v3.1.2-rc1 (commit 
> de351e30a90dd988b133b3d00fa6218bfcaba8b8):
> https://github.com/apache/spark/tree/v3.1.2-rc1
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.1.2-rc1-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1384/
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.1.2-rc1-docs/
>
> The list of bug fixes going into 3.1.2 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12349602
>
> This release is using the release script of the tag v3.1.2-rc1.
>
> FAQ
>
> =
> How can I help test this release?
> =
>
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install
> the current RC and see if anything important breaks, in the Java/Scala
> you can add the staging repository to your projects resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with an out-of-date RC going forward).
>
> ===
> What should happen to JIRA tickets still targeting 3.1.2?
> ===
>
> The current list of open tickets targeted at 3.1.2 can be found at:
> https://issues.apache.org/jira/projects/SPARK and search for "Target 
> Version/s" = 3.1.2
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
>
> ==
> But my bug isn't fixed?
> ==
>
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



GitHub action permissions

2020-02-28 Thread Thomas graves
Does anyone know how the GitHub action permissions are setup?

I see a lot of random failures and want to be able to rerun them, but
I don't seem to have a "rerun" button like some folks do.

Thanks,
Tom




Committing while Jenkins down?

2019-10-10 Thread Thomas graves
This is directed towards committers/PMC members.

It looks like Jenkins will be down for a while, what is everyone's
thoughts on committing PRs while its down?  Do we want to wait for
Jenkins to come back up, manually run things ourselves and commit?

Tom




Re: [RESULT][VOTE] [SPARK-27495] SPIP: Support Stage level resource configuration and scheduling

2019-09-14 Thread Thomas graves
Resending fixing my typo.

Hi all,
>
> The vote passed with 6 +1's (4 binding) and no -1's.
>
>  +1s (* = binding) :
> Bobby Evans*
> Thomas Graves*
> Dongjoon Hyun*
> Felix Cheung*
> Bryan Cutler
> Ryan Blue
>
>
> Thanks,
> Tom Graves
>


[RESULT][VOTE] [SPARK-27495] SPIP: Support Stage level resource configuration and scheduling

2019-09-13 Thread Thomas graves
Hi all,

The vote passed with 6 +1's (4 binding) and no -1's.

 +1s (* = binding) :
Bobby Evans*
Thomas Graves*
Dongjoon Hyuni*
Felix Cheung*
Bryan Cutler
Ryan Blue


Thanks,
Tom Graves




Re: [VOTE] [SPARK-27495] SPIP: Support Stage level resource configuration and scheduling

2019-09-13 Thread Thomas Graves
Thanks everyone so far for the voting and the feedback, bumping this
up as vote is scheduling to end today.

Tom

On Wed, Sep 11, 2019 at 1:10 PM Bryan Cutler  wrote:
>
> +1 (non-binding), looks good!
>
> On Wed, Sep 11, 2019 at 10:05 AM Ryan Blue  wrote:
>>
>> +1
>>
>> This is going to be really useful. Thanks for working on it!
>>
>> On Wed, Sep 11, 2019 at 9:38 AM Felix Cheung  
>> wrote:
>>>
>>> +1
>>>
>>> 
>>> From: Thomas graves 
>>> Sent: Wednesday, September 4, 2019 7:24:26 AM
>>> To: dev 
>>> Subject: [VOTE] [SPARK-27495] SPIP: Support Stage level resource 
>>> configuration and scheduling
>>>
>>> Hey everyone,
>>>
>>> I'd like to call for a vote on SPARK-27495 SPIP: Support Stage level
>>> resource configuration and scheduling
>>>
>>> This is for supporting stage level resource configuration and
>>> scheduling.  The basic idea is to allow the user to specify executor
>>> and task resource requirements for each stage to allow the user to
>>> control the resources required at a finer grain. One good example here
>>> is doing some ETL to preprocess your data in one stage and then feed
>>> that data into an ML algorithm (like tensorflow) that would run as a
>>> separate stage.  The ETL could need totally different resource
>>> requirements for the executors/tasks than the ML stage does.
>>>
>>> The text for the SPIP is in the jira description:
>>>
>>> https://issues.apache.org/jira/browse/SPARK-27495
>>>
>>> I split the API and Design parts into a google doc that is linked to
>>> from the jira.
>>>
>>> This vote is open until next Fri (Sept 13th).
>>>
>>> [ ] +1: Accept the proposal as an official SPIP
>>> [ ] +0
>>> [ ] -1: I don't think this is a good idea because ...
>>>
>>> I'll start with my +1
>>>
>>> Thanks,
>>> Tom
>>>
>>> -
>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>
>>
>>
>> --
>> Ryan Blue
>> Software Engineer
>> Netflix




Re: Thoughts on Spark 3 release, or a preview release

2019-09-13 Thread Thomas Graves
+1, I think having preview release would be great.

Tom

On Fri, Sep 13, 2019 at 4:55 AM Stavros Kontopoulos <
stavros.kontopou...@lightbend.com> wrote:

> +1 as a contributor and as a user. Given the amount of testing required
> for all the new cool stuff like java 11 support, major
> refactorings/deprecations etc, a preview version would help a lot the
> community making adoption smoother long term. I would also add to the list
> of issues, Scala 2.13 support (
> https://issues.apache.org/jira/browse/SPARK-25075) assuming things will
> move forward faster the next few months.
>
> On Fri, Sep 13, 2019 at 11:08 AM Driesprong, Fokko 
> wrote:
>
>> Michael Heuer, that's an interesting issue.
>>
>> 1.8.2 to 1.9.0 is almost binary compatible (94%):
>> http://people.apache.org/~busbey/avro/1.9.0-RC4/1.8.2_to_1.9.0RC4_compat_report.html.
>> Most of the stuff is removing the Jackson and Netty API from Avro's public
>> API and deprecating the Joda library. I would strongly advise moving to
>> 1.9.1 since there are some regression issues, for Java most important:
>> https://jira.apache.org/jira/browse/AVRO-2400
>>
>> I'd love to dive into the issue that you describe and I'm curious if the
>> issue is still there with Avro 1.9.1. I'm a bit busy at the moment but
>> might have some time this weekend to dive into it.
>>
>> Cheers, Fokko Driesprong
>>
>>
>> On Fri, Sep 13, 2019 at 02:32, Reynold Xin wrote:
>>
>>> +1! Long due for a preview release.
>>>
>>>
>>> On Thu, Sep 12, 2019 at 5:26 PM, Holden Karau 
>>> wrote:
>>>
 I like the idea from the PoV of giving folks something to start testing
 against and exploring so they can raise issues with us earlier in the
 process and we have more time to make calls around this.

 On Thu, Sep 12, 2019 at 4:15 PM John Zhuge  wrote:

 +1  Like the idea as a user and a DSv2 contributor.

 On Thu, Sep 12, 2019 at 4:10 PM Jungtaek Lim  wrote:

 +1 (as a contributor) from me to have preview release on Spark 3 as it
 would help to test the feature. When to cut preview release is
 questionable, as major works are ideally to be done before that - if we are
 intended to introduce new features before official release, that should
 work regardless of this, but if we are intended to have opportunity to test
 earlier, ideally it should.

 As a one of contributors in structured streaming area, I'd like to add
 some items for Spark 3.0, both "must be done" and "better to have". For
 "better to have", I pick some items for new features which committers
 reviewed couple of rounds and dropped off without soft-reject (No valid
 reason to stop). For Spark 2.4 users, only added feature for structured
 streaming is Kafka delegation token. (given we assume revising Kafka
 consumer pool as improvement) I hope we provide some gifts for structured
 streaming users in Spark 3.0 envelope.

 > must be done
 * SPARK-26154 Stream-stream joins - left outer join gives inconsistent
 output
 It's a correctness issue with multiple users reported, being reported
 at Nov. 2018. There's a way to reproduce it consistently, and we have a
 patch submitted at Jan. 2019 to fix it.

 > better to have
 * SPARK-23539 Add support for Kafka headers in Structured Streaming
 * SPARK-26848 Introduce new option to Kafka source - specify timestamp
 to start and end offset
 * SPARK-20568 Delete files after processing in structured streaming

 There're some more new features/improvements items in SS, but given
 we're talking about ramping-down, above list might be realistic one.



 On Thu, Sep 12, 2019 at 9:53 AM Jean Georges Perrin 
 wrote:

 As a user/non committer, +1

 I love the idea of an early 3.0.0 so we can test current dev against
 it, I know the final 3.x will probably need another round of testing when
 it gets out, but less for sure... I know I could checkout and compile, but
 having a “packaged” preversion is great if it does not take too much time
 to the team...

 jg


 On Sep 11, 2019, at 20:40, Hyukjin Kwon  wrote:

 +1 from me too but I would like to know what other people think too.

 On Thu, Sep 12, 2019 at 9:07 AM, Dongjoon Hyun wrote:

 Thank you, Sean.

 I'm also +1 for the following three.

 1. Start to ramp down (by the official branch-3.0 cut)
 2. Apache Spark 3.0.0-preview in 2019
 3. Apache Spark 3.0.0 in early 2020

 For JDK11 clean-up, it will meet the timeline and `3.0.0-preview` helps
 it a lot.

 After this discussion, can we have some timeline for `Spark 3.0 Release
 Window` in our versioning-policy page?

 - https://spark.apache.org/versioning-policy.html

 Bests,
 Dongjoon.


 On Wed, Sep 11, 2019 at 11:54 AM Michael Heuer 
 wrote:

 I would love to see 

[VOTE] [SPARK-27495] SPIP: Support Stage level resource configuration and scheduling

2019-09-04 Thread Thomas graves
Hey everyone,

I'd like to call for a vote on SPARK-27495 SPIP: Support Stage level
resource configuration and scheduling

This is for supporting stage level resource configuration and
scheduling.  The basic idea is to allow the user to specify executor
and task resource requirements for each stage to allow the user to
control the resources required at a finer grain. One good example here
is doing some ETL to preprocess your data in one stage and then feed
that data into an ML algorithm (like tensorflow) that would run as a
separate stage.  The ETL could need totally different resource
requirements for the executors/tasks than the ML stage does.
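
To make the per-stage idea above concrete, here is a toy model in plain
Python. This is not the proposed Spark API; the class, stage names, and
resource keys are all invented for illustration:

```python
# Toy sketch of per-stage resource requirements: each stage may override the
# application-level defaults, so an ETL stage and an ML stage can run with
# very different executors. Names here are made up, not Spark's API.
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceProfile:
    executor_resources: dict  # e.g. {"cores": 4, "memoryMb": 8192, "gpus": 0}
    task_resources: dict      # e.g. {"cpus": 1, "gpus": 0}


def profile_for(stage, overrides, default):
    """Pick a stage's profile, falling back to the app-level default."""
    return overrides.get(stage, default)


default = ResourceProfile({"cores": 4, "memoryMb": 8192, "gpus": 0},
                          {"cpus": 1, "gpus": 0})
overrides = {
    # Only the ML stage asks for GPUs and bigger executors.
    "ml-train": ResourceProfile({"cores": 8, "memoryMb": 32768, "gpus": 2},
                                {"cpus": 2, "gpus": 1}),
}

# The ETL stage uses the default profile; the ML stage gets its override.
assert profile_for("etl-preprocess", overrides, default).executor_resources["gpus"] == 0
assert profile_for("ml-train", overrides, default).executor_resources["gpus"] == 2
```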

The text for the SPIP is in the jira description:

https://issues.apache.org/jira/browse/SPARK-27495

I split the API and Design parts into a google doc that is linked to
from the jira.

This vote is open until next Fri (Sept 13th).

[ ] +1: Accept the proposal as an official SPIP
[ ] +0
[ ] -1: I don't think this is a good idea because ...

I'll start with my +1

Thanks,
Tom




[RESULT][VOTE] SPIP: Public APIs for extended Columnar Processing Support

2019-05-29 Thread Thomas graves
Hi all,

The vote passed with 9 +1's (4 binding) and 1 +0 and no -1's.

 +1s (* = binding) :
Bobby Evans*
Thomas Graves*
DB Tsai*
Felix Cheung*
Bryan Cutler
Kazuaki Ishizaki
Tyson Condie
Dongjoon Hyun
Jason Lowe

+0s:
Xiangrui Meng

Thanks,
Tom Graves




Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-05-14 Thread Thomas graves
Thanks for replying, I'll extend the vote til May 26th to allow your
and other people feedback who haven't had time to look at it.

Tom

On Mon, May 13, 2019 at 4:43 PM Holden Karau  wrote:
>
> I’d like to ask for this vote period to be extended; I’m interested but I don’t 
> have the cycles to review it in detail and make an informed vote until the 
> 25th.
>
> On Tue, May 14, 2019 at 1:49 AM Xiangrui Meng  wrote:
>>
>> My vote is 0. Since the updated SPIP focuses on ETL use cases, I don't feel 
>> strongly about it. I would still suggest doing the following:
>>
>> 1. Link the POC mentioned in Q4. So people can verify the POC result.
>> 2. List public APIs we plan to expose in Appendix A. I did a quick check. 
>> Beside ColumnarBatch and ColumnarVector, we also need to make the following 
>> public. People who are familiar with SQL internals should help assess the 
>> risk.
>> * ColumnarArray
>> * ColumnarMap
>> * unsafe.types.CalendarInterval
>> * ColumnarRow
>> * UTF8String
>> * ArrayData
>> * ...
>> 3. I still feel using Pandas UDF as the mid-term success doesn't match the 
>> purpose of this SPIP. It does make some code cleaner. But I guess for ETL 
>> use cases, it won't bring much value.
>>
> --
> Twitter: https://twitter.com/holdenkarau
> Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau




Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-05-13 Thread Thomas graves
It would be nice to get feedback from people who responded on the
other vote thread - Reynold, Matei, Xiangrui, does the new version
look good?

Thanks,
Tom

On Mon, May 13, 2019 at 8:22 AM Jason Lowe  wrote:
>
> +1 (non-binding)
>
> Jason
>
> On Tue, May 7, 2019 at 1:37 PM Thomas graves  wrote:
>>
>> Hi everyone,
>>
>> I'd like to call for another vote on SPARK-27396 - SPIP: Public APIs
>> for extended Columnar Processing Support.  The proposal is to extend
>> the support to allow for more columnar processing.  We had previous
>> vote and discussion threads and have updated the SPIP based on the
>> comments to clarify a few things and reduce the scope.
>>
>> You can find the updated proposal in the jira at:
>> https://issues.apache.org/jira/browse/SPARK-27396.
>>
>> Please vote as early as you can, I will leave the vote open until next
>> Monday (May 13th), 2pm CST to give people plenty of time.
>>
>> [ ] +1: Accept the proposal as an official SPIP
>> [ ] +0
>> [ ] -1: I don't think this is a good idea because ...
>>
>> Thanks!
>> Tom Graves
>>
>> -
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>




[VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-05-07 Thread Thomas graves
Hi everyone,

I'd like to call for another vote on SPARK-27396 - SPIP: Public APIs
for extended Columnar Processing Support.  The proposal is to extend
the support to allow for more columnar processing.  We had previous
vote and discussion threads and have updated the SPIP based on the
comments to clarify a few things and reduce the scope.
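
For intuition about what "columnar processing" buys, here is a toy sketch in
plain Python. It is not Spark's ColumnarBatch/ColumnVector API; the column
names and layout are invented for illustration:

```python
# Toy sketch of row vs. column layout: a batch stored column-wise lets a
# processor scan one contiguous array per column instead of touching every
# field of every row tuple. Column names ("id", "value") are invented.
rows = [(1, 10.0), (2, 20.0), (3, 30.0)]                  # row-oriented batch

# The same batch stored column-wise: one array per column, which is the
# layout Arrow-style columnar processing operates on.
columns = {"id": [1, 2, 3], "value": [10.0, 20.0, 30.0]}

row_sum = sum(r[1] for r in rows)      # visits every row tuple
col_sum = sum(columns["value"])        # scans a single contiguous column
assert row_sum == col_sum == 60.0
```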

You can find the updated proposal in the jira at:
https://issues.apache.org/jira/browse/SPARK-27396.

Please vote as early as you can, I will leave the vote open until next
Monday (May 13th), 2pm CST to give people plenty of time.

[ ] +1: Accept the proposal as an official SPIP
[ ] +0
[ ] -1: I don't think this is a good idea because ...

Thanks!
Tom Graves




Re: [VOTE] SPIP: Executor Plugin (SPARK-24918)

2018-08-28 Thread Thomas Graves
+1

Tom

On Tue, Aug 28, 2018 at 8:50 AM Imran Rashid 
wrote:

> There has been discussion on jira & the PR, all generally positive, so I'd
> like to call a vote for this spip.
>
> I'll start with own +1.
>
> On Fri, Aug 3, 2018 at 11:59 AM Imran Rashid  wrote:
>
>> I'd like to propose adding a plugin api for Executors, primarily for
>> instrumentation and debugging (
>> https://issues.apache.org/jira/browse/SPARK-24918).  The changes are
>> small, but as its adding a new api, it might be spip-worthy.  I mentioned
>> it as well in a recent email I sent about memory monitoring
>>
>> The spip proposal is here (and attached to the jira as well):
>> https://docs.google.com/document/d/1a20gHGMyRbCM8aicvq4LhWfQmoA5cbHBQtyqIA2hgtc/edit?usp=sharing
>>
>> There are already some comments on the jira and pr, and I hope to get
>> more thoughts and opinions on it.
>>
>> thanks,
>> Imran
>>
>
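
The plugin lifecycle proposed above can be sketched with a toy model. Spark's
actual interface is Java-side, and these names are illustrative only:

```python
# Toy sketch of the executor-plugin lifecycle from the SPIP: user code gets an
# init hook when an executor starts and a shutdown hook when it exits. Not
# Spark's actual interface; class and method names are illustrative.
class ExecutorPlugin:
    def init(self):        # called once when the executor starts
        pass

    def shutdown(self):    # called once when the executor exits
        pass


class MemoryMonitorPlugin(ExecutorPlugin):
    """Hypothetical instrumentation plugin recording its lifecycle calls."""

    def __init__(self):
        self.events = []

    def init(self):
        self.events.append("init")        # e.g. start a polling thread here

    def shutdown(self):
        self.events.append("shutdown")    # e.g. flush collected metrics here


# An executor host would drive the lifecycle roughly like this:
plugin = MemoryMonitorPlugin()
plugin.init()
plugin.shutdown()
assert plugin.events == ["init", "shutdown"]
```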