Re: spark config params conventions
Based on the Typesafe Config maintainer's response, with the latest version of Typesafe Config the double quotes are no longer needed for a key like spark.speculation, so you don't need code to strip the quotes.

Chester
Alpine Data Labs

Sent from my iPhone

On Mar 12, 2014, at 2:50 PM, Aaron Davidson wrote:

> One solution for Typesafe Config is to use
>
> "spark.speculation" = true
>
> Typesafe will recognize the key as a string rather than a path, so the name
> will actually be "\"spark.speculation\"", and you need to handle this
> contingency when passing the config options to Spark (stripping the quotes
> from the key).
>
> Solving this in Spark itself is a little tricky because there are ~5 such
> conflicts (spark.serializer, spark.speculation, spark.locality.wait,
> spark.shuffle.spill, and spark.cleaner.ttl), some of which are used pretty
> frequently. We could provide aliases for all of these in Spark, but actually
> deprecating the old ones would affect many users, so we could only do that
> if enough users would benefit from fully hierarchical config options.
>
> On Wed, Mar 12, 2014 at 9:24 AM, Mark Hamstra wrote:
>
>> That's the whole reason why some of the intended configuration changes were
>> backed out just before the 0.9.0 release. It's a well-known issue, even if
>> a completely satisfactory solution isn't as well-known and is probably
>> something we should do another iteration on.
>>
>> On Wed, Mar 12, 2014 at 9:10 AM, Koert Kuipers wrote:
>>
>>> I am reading the Spark configuration params from another configuration
>>> object (Typesafe Config) before setting them as system properties.
>>>
>>> I noticed Typesafe Config has trouble with settings like:
>>>
>>> spark.speculation=true
>>> spark.speculation.interval=0.5
>>>
>>> The issue seems to be that if spark.speculation is a "container" that has
>>> more values inside it, then it cannot also be a value itself, I think. So
>>> this would work fine:
>>>
>>> spark.speculation.enabled=true
>>> spark.speculation.interval=0.5
>>>
>>> Just a heads up. I would suggest we avoid this situation.
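For versions of Typesafe Config that still require the quoted-key workaround, the quote-stripping Aaron describes is straightforward. A minimal sketch, in Python for brevity (the helper name is hypothetical, not part of any library):

```python
def strip_quoted_key(key: str) -> str:
    """Strip the literal double quotes that older Typesafe Config versions
    keep around a quoted key such as "spark.speculation", so the name can
    be passed to Spark as a plain dotted key."""
    if len(key) >= 2 and key.startswith('"') and key.endswith('"'):
        return key[1:-1]
    return key

# A quoted key comes back from the config library with embedded quotes:
print(strip_quoted_key('"spark.speculation"'))       # spark.speculation
# Unquoted keys pass through untouched:
print(strip_quoted_key('spark.speculation.interval'))  # spark.speculation.interval
```

With the newer library version Chester mentions, this step becomes unnecessary.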
Re: spark config params conventions
One solution for Typesafe Config is to use

"spark.speculation" = true

Typesafe will recognize the key as a string rather than a path, so the name will
actually be "\"spark.speculation\"", and you need to handle this contingency
when passing the config options to Spark (stripping the quotes from the key).

Solving this in Spark itself is a little tricky because there are ~5 such
conflicts (spark.serializer, spark.speculation, spark.locality.wait,
spark.shuffle.spill, and spark.cleaner.ttl), some of which are used pretty
frequently. We could provide aliases for all of these in Spark, but actually
deprecating the old ones would affect many users, so we could only do that if
enough users would benefit from fully hierarchical config options.

On Wed, Mar 12, 2014 at 9:24 AM, Mark Hamstra wrote:

> That's the whole reason why some of the intended configuration changes were
> backed out just before the 0.9.0 release. It's a well-known issue, even if
> a completely satisfactory solution isn't as well-known and is probably
> something we should do another iteration on.
>
> On Wed, Mar 12, 2014 at 9:10 AM, Koert Kuipers wrote:
>
>> I am reading the Spark configuration params from another configuration
>> object (Typesafe Config) before setting them as system properties.
>>
>> I noticed Typesafe Config has trouble with settings like:
>>
>> spark.speculation=true
>> spark.speculation.interval=0.5
>>
>> The issue seems to be that if spark.speculation is a "container" that has
>> more values inside it, then it cannot also be a value itself, I think. So
>> this would work fine:
>>
>> spark.speculation.enabled=true
>> spark.speculation.interval=0.5
>>
>> Just a heads up. I would suggest we avoid this situation.
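The alias idea amounts to a lookup table consulted before a key is applied. A sketch in Python, with the caveat that only spark.speculation.enabled is actually proposed in this thread; the other hierarchical names below are invented placeholders for illustration, not real Spark settings:

```python
# Hypothetical alias table for the ~5 conflicting flat keys. Only
# spark.speculation.enabled appears in this discussion; the rest are
# illustrative stand-ins, not actual Spark configuration names.
ALIASES = {
    "spark.speculation": "spark.speculation.enabled",
    "spark.serializer": "spark.serializer.class",
    "spark.locality.wait": "spark.locality.wait.default",
    "spark.shuffle.spill": "spark.shuffle.spill.enabled",
    "spark.cleaner.ttl": "spark.cleaner.ttl.seconds",
}

def canonicalize(key: str) -> str:
    """Map a legacy flat key to its hierarchical alias, if one exists."""
    return ALIASES.get(key, key)

print(canonicalize("spark.speculation"))           # spark.speculation.enabled
print(canonicalize("spark.speculation.interval"))  # spark.speculation.interval
```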
Re: spark config params conventions
+1. I agree with keeping the old ones only for backward-compatibility purposes.

On Wed, Mar 12, 2014 at 12:38 PM, Evan Chan wrote:

> +1.
>
> Not just for Typesafe Config, but if we want to consider hierarchical
> configs like JSON rather than flat key mappings, it is necessary. It is
> also clearer.
>
> On Wed, Mar 12, 2014 at 9:58 AM, Aaron Davidson wrote:
>
>> Should we try to deprecate these types of configs for 1.0.0? We can start
>> by accepting both and giving a warning if you use the old one, and then
>> actually remove them in the next minor release. I think
>> "spark.speculation.enabled=true" is better than "spark.speculation=true",
>> and if we decide to use Typesafe configs again ourselves, this change is
>> necessary.
>>
>> We actually don't have to ever complete the deprecation: we can always
>> accept both spark.speculation and spark.speculation.enabled, and people
>> just have to use the latter if they want to use Typesafe Config.
>>
>> On Wed, Mar 12, 2014 at 9:24 AM, Mark Hamstra wrote:
>>
>>> That's the whole reason why some of the intended configuration changes
>>> were backed out just before the 0.9.0 release. It's a well-known issue,
>>> even if a completely satisfactory solution isn't as well-known and is
>>> probably something we should do another iteration on.
>>>
>>> On Wed, Mar 12, 2014 at 9:10 AM, Koert Kuipers wrote:
>>>
>>>> I am reading the Spark configuration params from another configuration
>>>> object (Typesafe Config) before setting them as system properties.
>>>>
>>>> I noticed Typesafe Config has trouble with settings like:
>>>>
>>>> spark.speculation=true
>>>> spark.speculation.interval=0.5
>>>>
>>>> The issue seems to be that if spark.speculation is a "container" that
>>>> has more values inside it, then it cannot also be a value itself, I
>>>> think. So this would work fine:
>>>>
>>>> spark.speculation.enabled=true
>>>> spark.speculation.interval=0.5
>>>>
>>>> Just a heads up. I would suggest we avoid this situation.
>
> --
> Evan Chan
> Staff Engineer
> e...@ooyala.com
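A hierarchical config such as JSON maps onto Spark's flat dotted keys through a simple flattening pass, which is only well-defined when no key is both a value and a container. A rough sketch of the correspondence:

```python
import json

def flatten(d: dict, prefix: str = "") -> dict:
    """Flatten a nested dict (e.g. parsed JSON config) into dotted keys."""
    flat = {}
    for k, v in d.items():
        dotted = f"{prefix}.{k}" if prefix else k
        if isinstance(v, dict):
            flat.update(flatten(v, dotted))
        else:
            flat[dotted] = v
    return flat

cfg = json.loads('{"spark": {"speculation": {"enabled": true, "interval": 0.5}}}')
print(flatten(cfg))
# {'spark.speculation.enabled': True, 'spark.speculation.interval': 0.5}
```

Note that the inverse direction (flat keys back to a tree) is exactly where spark.speculation=true alongside spark.speculation.interval=0.5 breaks down.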
Re: spark config params conventions
+1.

Not just for Typesafe Config, but if we want to consider hierarchical configs
like JSON rather than flat key mappings, it is necessary. It is also clearer.

On Wed, Mar 12, 2014 at 9:58 AM, Aaron Davidson wrote:

> Should we try to deprecate these types of configs for 1.0.0? We can start
> by accepting both and giving a warning if you use the old one, and then
> actually remove them in the next minor release. I think
> "spark.speculation.enabled=true" is better than "spark.speculation=true",
> and if we decide to use Typesafe configs again ourselves, this change is
> necessary.
>
> We actually don't have to ever complete the deprecation: we can always
> accept both spark.speculation and spark.speculation.enabled, and people
> just have to use the latter if they want to use Typesafe Config.
>
> On Wed, Mar 12, 2014 at 9:24 AM, Mark Hamstra wrote:
>
>> That's the whole reason why some of the intended configuration changes
>> were backed out just before the 0.9.0 release. It's a well-known issue,
>> even if a completely satisfactory solution isn't as well-known and is
>> probably something we should do another iteration on.
>>
>> On Wed, Mar 12, 2014 at 9:10 AM, Koert Kuipers wrote:
>>
>>> I am reading the Spark configuration params from another configuration
>>> object (Typesafe Config) before setting them as system properties.
>>>
>>> I noticed Typesafe Config has trouble with settings like:
>>>
>>> spark.speculation=true
>>> spark.speculation.interval=0.5
>>>
>>> The issue seems to be that if spark.speculation is a "container" that has
>>> more values inside it, then it cannot also be a value itself, I think. So
>>> this would work fine:
>>>
>>> spark.speculation.enabled=true
>>> spark.speculation.interval=0.5
>>>
>>> Just a heads up. I would suggest we avoid this situation.

--
Evan Chan
Staff Engineer
e...@ooyala.com
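The accept-both-with-a-warning transition Aaron proposes can be sketched as a lookup that prefers the new key and warns on the old one. This is illustrative only, not Spark's actual implementation:

```python
import warnings

def get_conf(conf: dict, new_key: str, old_key: str, default=None):
    """Prefer new_key; fall back to old_key with a deprecation warning.
    Sketch of the accept-both migration path, not real Spark code."""
    if new_key in conf:
        return conf[new_key]
    if old_key in conf:
        warnings.warn(f"{old_key} is deprecated; use {new_key}",
                      DeprecationWarning)
        return conf[old_key]
    return default

conf = {"spark.speculation": "true"}
print(get_conf(conf, "spark.speculation.enabled", "spark.speculation"))  # true
```

Keeping the fallback indefinitely matches Aaron's point that the deprecation never has to complete: old configs keep working, while users who need Typesafe Config can switch to the hierarchical names.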
Re: spark config params conventions
Should we try to deprecate these types of configs for 1.0.0? We can start by
accepting both and giving a warning if you use the old one, and then actually
remove them in the next minor release. I think "spark.speculation.enabled=true"
is better than "spark.speculation=true", and if we decide to use Typesafe
configs again ourselves, this change is necessary.

We actually don't have to ever complete the deprecation: we can always accept
both spark.speculation and spark.speculation.enabled, and people just have to
use the latter if they want to use Typesafe Config.

On Wed, Mar 12, 2014 at 9:24 AM, Mark Hamstra wrote:

> That's the whole reason why some of the intended configuration changes were
> backed out just before the 0.9.0 release. It's a well-known issue, even if
> a completely satisfactory solution isn't as well-known and is probably
> something we should do another iteration on.
>
> On Wed, Mar 12, 2014 at 9:10 AM, Koert Kuipers wrote:
>
>> I am reading the Spark configuration params from another configuration
>> object (Typesafe Config) before setting them as system properties.
>>
>> I noticed Typesafe Config has trouble with settings like:
>>
>> spark.speculation=true
>> spark.speculation.interval=0.5
>>
>> The issue seems to be that if spark.speculation is a "container" that has
>> more values inside it, then it cannot also be a value itself, I think. So
>> this would work fine:
>>
>> spark.speculation.enabled=true
>> spark.speculation.interval=0.5
>>
>> Just a heads up. I would suggest we avoid this situation.
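The underlying restriction is that in a hierarchical config a path is either a container or a value, never both. A small Python sketch that reproduces the conflict Koert hit when building a tree from flat dotted keys:

```python
def build_tree(flat: dict) -> dict:
    """Insert dotted keys into a nested dict, raising when a key is used
    both as a value and as a container -- the spark.speculation vs.
    spark.speculation.interval conflict described in this thread."""
    root = {}
    for key, value in flat.items():
        node = root
        parts = key.split(".")
        for part in parts[:-1]:
            child = node.setdefault(part, {})
            if not isinstance(child, dict):
                raise ValueError(f"{part} is already a value, not a container")
            node = child
        if isinstance(node.get(parts[-1]), dict):
            raise ValueError(f"{key} is already a container, not a value")
        node[parts[-1]] = value
    return root

# Fine: both keys live under the spark.speculation container.
print(build_tree({"spark.speculation.enabled": "true",
                  "spark.speculation.interval": "0.5"}))

# Conflict: spark.speculation as both a value and a container.
try:
    build_tree({"spark.speculation": "true",
                "spark.speculation.interval": "0.5"})
except ValueError as e:
    print("conflict:", e)
```

This is why renaming the flat boolean to spark.speculation.enabled resolves the problem: every path then has exactly one role.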