Re: How to set platform-level defaults for array-like configs?

2022-08-18 Thread Shrikant Prasad
Hi Mridul,

If you are using Spark on Kubernetes, you can make use of an admission
controller to validate or mutate the confs set in the Spark defaults
ConfigMap. But this approach works only for cluster deploy mode, not for
client mode.
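
As a rough illustration only (the webhook name, namespace, and backing service
below are placeholders, not an existing Spark component), such a webhook could
be registered along these lines; the webhook backend would then merge the
platform-owned values into the submitted ConfigMap, or reject the change:

  apiVersion: admissionregistration.k8s.io/v1
  kind: MutatingWebhookConfiguration
  metadata:
    name: spark-defaults-guard                 # hypothetical name
  webhooks:
    - name: spark-defaults-guard.example.com   # hypothetical
      admissionReviewVersions: ["v1"]
      sideEffects: None
      rules:
        - apiGroups: [""]
          apiVersions: ["v1"]
          operations: ["CREATE", "UPDATE"]
          resources: ["configmaps"]
      clientConfig:
        service:
          name: spark-defaults-guard           # service that applies the merge
          namespace: spark-platform
          path: /mutate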

Regards,
Shrikant

On Fri, 12 Aug 2022 at 12:26 AM, Tom Graves 
wrote:

> A few years ago, when I was doing more deployment management, I kicked
> around the idea of having different types of configs or different ways to
> specify the configs. One of the problems at the time was actually that
> users would specify their own properties file and therefore not pick up
> spark-defaults.conf, so I was thinking about creating something like a
> spark-admin.conf.
>
> I think there is benefit in it; it just comes down to how to implement it
> best. The other thing I don't think I saw addressed was the ability to
> prevent users from overriding configs. If you just do the defaults, I
> presume users could still override them. That gets a bit trickier,
> especially if they can override the entire spark-defaults.conf file.
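>
> (To illustrate that last point: spark-submit's --properties-file flag replaces
> spark-defaults.conf wholesale rather than layering on top of it, so even
> admin-managed defaults in that file disappear. For example, with made-up names,
>
>   spark-submit --properties-file my-own.conf --class com.example.MyApp app.jar
>
> would ignore spark-defaults.conf entirely.)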
>
>
> Tom
> On Thursday, August 11, 2022, 12:16:10 PM CDT, Mridul Muralidharan <
> mri...@gmail.com> wrote:
>
>
>
> Hi,
>
>   Wenchen, it would be great if you could chime in with your thoughts, given
> the feedback you originally had on the PR.
> It would also be great to hear feedback from others on this, particularly
> folks managing Spark deployments: how is this mitigated/avoided in your
> case, and are there any other pain points with configs in this context?
>
>
> Regards,
> Mridul
>
> On Wed, Jul 27, 2022 at 12:28 PM Erik Krogen  wrote:
>
> I find there's substantial value in being able to set defaults, and I
> think we can see that the community finds value in it as well, given the
> handful of "default"-like configs that exist today as mentioned in
> Shardul's email. The mismatch of conventions used today (suffix with
> ".defaultList", change "extra" to "default", ...) is confusing and
> inconsistent, plus requires one-off additions for each config.
>
> My proposal here would be:
>
>    - Define a clear convention, e.g. a suffix of ".default", that enables
>      a default to be set and merged.
>    - Document this convention in configuration.md so that we can avoid
>      separately documenting each default config, and instead just add a note
>      in the docs for the normal config.
>    - Adjust the withPrepended method added in #24804 to leverage this
>      convention instead of each usage site re-defining the additional config
>      name (a rough sketch follows below).
>    - Do a comprehensive review of applicable configs and enable them all to
>      use the newly updated withPrepended method.
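>
> To make that concrete, a rough sketch (internal API shown approximately; the
> derived-key variant in the second comment is hypothetical, not something
> Spark has today):
>
>   // org.apache.spark.internal.config, current style: the prepended key is
>   // spelled out by hand for each config that supports an admin default.
>   val PLUGINS = ConfigBuilder("spark.plugins")
>     .withPrepended("spark.plugins.defaultList", separator = ",")
>     .stringConf
>     .toSequence
>     .createWithDefault(Nil)
>
>   // Under the proposed convention, withPrepended could derive the companion
>   // key from the config's own name, e.g.
>   //   ConfigBuilder("spark.plugins").withPrepended(separator = ",")
>   // which would read "spark.plugins.default" without a per-config key.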
>
> Wenchen, you expressed some concerns with adding more default configs in
> #34856; would this proposal address those concerns?
>
> Thanks,
> Erik
>
> On Wed, Jul 13, 2022 at 11:54 PM Shardul Mahadik <
> shardulsmaha...@gmail.com> wrote:
>
> Hi Spark devs,
>
> Spark contains a bunch of array-like configs (comma-separated lists). Some
> examples include `spark.sql.extensions`,
> `spark.sql.queryExecutionListeners`, `spark.jars.repositories`,
> `spark.extraListeners`, `spark.driver.extraClassPath`, and so on (there are
> a dozen or so more). As owners of the Spark platform in our organization,
> we would like to set platform-level defaults, e.g. custom SQL extensions and
> listeners, and we use some of the above-mentioned properties to do so. At
> the same time, we have power users writing their own listeners and setting
> the same Spark confs, thus unintentionally overriding our platform defaults.
> This leads to a loss of functionality within our platform.
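>
> For example (class names here are made up), our spark-defaults.conf might set
>
>   spark.sql.extensions=com.example.platform.AuditExtensions
>
> and a user who then submits with
>
>   spark-submit --conf spark.sql.extensions=com.example.user.MyExtensions ...
>
> replaces that list rather than extending it, so the platform extension
> silently stops loading.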
>
> Previously, Spark has introduced "default" confs for a few of these
> array-like configs, e.g. `spark.plugins.defaultList` for `spark.plugins` and
> `spark.driver.defaultJavaOptions` for `spark.driver.extraJavaOptions`.
> These properties are meant to be set only by cluster admins, thus allowing
> separation between platform defaults and user configs. However, as discussed
> in https://github.com/apache/spark/pull/34856, these configs are still
> client-side and can still be overridden, and they are not a scalable
> solution either, as we cannot introduce one new "default" config for every
> array-like config.
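>
> As an illustration of how the existing pair behaves (class names made up):
>
>   spark.plugins.defaultList=com.example.platform.MetricsPlugin   # admin-owned
>   spark.plugins=com.example.user.MyPlugin                        # user-owned
>
> Spark prepends the default list to the user list, so both plugins load. But
> since `spark.plugins.defaultList` is read on the client like any other conf,
> nothing stops a user from overriding or clearing it.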
>
> I wanted to know if others have experienced this issue and what systems
> have been implemented to tackle it. Are there any existing solutions,
> either client-side or server-side (e.g. at a job submission server)?
> Even though we cannot easily enforce this on the client side, the
> simplicity of a solution may make it more appealing.
>
> Thanks,
> Shardul
>
> --
Regards,
Shrikant Prasad

