Re: [Rd] WISH: set.seed(seed) to produce error if length(seed) != 1 (now silent)
What about splitting the baby and having set.seed(1:2), set.seed(6.1), etc. issue a warning rather than throw an error? A warning informs users that their expectations have deviated from reality, encourages proper programming practices, and carries a substantially lower risk of breaking things than an error does.

On Fri, Sep 17, 2021 at 1:13 PM Avi Gross via R-devel wrote:
>
> R wobbles a bit as there is no normal datatype that is a singleton variable. Saying x <- 5 just creates a vector of current length 1. It is perfectly legal to then write x[2] <- 6 and so on. The vector lengthens. You can truncate it back to 1, if you wish: length(x) <- 1
>
> So the question here is what happens if you supply more info than is needed? If it is an integer vector of length greater than one, should it ignore everything but the first entry? I note it happily accepts not-quite integers like TRUE and FALSE. It also accepts floating point numbers like 1.23 or 1.2e5.
>
> The goal seems to be to set a unique starting point, rounded or transformed if needed. The visible part of the function does not even look at the seed before calling the internal representation. So although superficially choosing the first integer in a vector makes some sense, it can be a problem if a program assumes the entire vector is consumed and perhaps hashed in some way to make a seed. If the program later changes parts of the vector other than the first entry, it may assume re-setting the seed gets something else and yet it may be exactly the same.
>
> So, yes, I suspect it is an ERROR to take anything that cannot be coerced by something like as.integer() into a vector of length 1.
>
> I have noted other places in R where I may get a warning when giving a longer vector that only the first element will be used. Are they all problems that need to be addressed?
>
> Here is a short one:
>
> > x <- c(1:3)
> > if (x > 2) y <- TRUE
> Warning message:
> In if (x > 2) y <- TRUE :
>   the condition has length > 1 and only the first element will be used
> > y
> Error: object 'y' not found
>
> The above is not vectorized and makes the choice of x==1 and thus does not set y.
>
> Now a vectorized variant works as expected, making a vector of length 3 for y:
>
> > x
> [1] 1 2 3
> > y <- ifelse(x > 2, TRUE, FALSE)
> > y
> [1] FALSE FALSE  TRUE
>
> I have no doubt fixing lots of this stuff, if indeed it is a fix, can break lots of existing code. Sure, it is not harmful to ask a programmer to always say x[1] to guarantee they are getting what they want, or to add a function like first(x) that does the same.
>
> R has some compromises or features I sometimes wonder about. If it had a concept of a numeric scalar, then some things that now happen might start being an error.
>
> What happens when you multiply a vector by a scalar as in 5*x is that every component of x is multiplied by 5. But x*x does componentwise multiplication. So say x is c(1:3), what should this do using a twosome times a threesome?
>
> > x[1:2]*x
> [1] 1 4 3
> Warning message:
> In x[1:2] * x :
>   longer object length is not a multiple of shorter object length
>
> Is it recycling to get a 1 in pseudo-position 3?
>
> Yep, this shows recycling (the output implies x is now 1:9):
>
> > x[1:2]*x
> [1]  1  4  3  8  5 12  7 16  9
> Warning message:
> In x[1:2] * x :
>   longer object length is not a multiple of shorter object length
>
> You do get a warning, but it does not tell you what it did.
>
> In essence, the earlier case of 5*x arguably recycled the 5 as many times as needed but with no warning.
>
> My point is that many languages, especially older ones, were designed a certain way and have been updated but we may be stuck with what we have. A brand new language might come up with a new way that includes vectorizing the heck out of things but allowing and even demanding that you explicitly convert things to a scalar in a context that needs it, or explicitly ask for recycling when you want it, or ...
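The warning-instead-of-error proposal at the top of this message could be sketched as a wrapper, without touching base R. This is only an illustration under stated assumptions: the name set_seed_warn and the wording of both warnings are hypothetical; set.seed(), warning(), and trunc() are the only base R functions it relies on.

```r
# Hypothetical wrapper (not part of base R): warn, rather than fail,
# when the seed is longer than one element or is not a whole number.
set_seed_warn <- function(seed) {
  if (length(seed) > 1L) {
    warning("'seed' has length ", length(seed),
            "; only the first element will be used")
    seed <- seed[1L]  # make the choice of the first element explicit
  } else if (is.numeric(seed) && length(seed) == 1L &&
             is.finite(seed) && seed != trunc(seed)) {
    warning("'seed' is not a whole number; set.seed() will use ",
            trunc(seed))
  }
  set.seed(seed)
}

# Capture the warning text instead of printing it.
msg_len <- tryCatch(set_seed_warn(1:2), warning = function(w) conditionMessage(w))
msg_dbl <- tryCatch(set_seed_warn(6.1), warning = function(w) conditionMessage(w))

# With the warning suppressed, the stream matches a plain set.seed(1).
suppressWarnings(set_seed_warn(1:2))
from_vector <- runif(3)
set.seed(1)
from_scalar <- runif(3)
identical(from_vector, from_scalar)
```

Because the wrapper only warns, existing scripts that pass a long or fractional seed keep producing the same random streams; they just become noisy about it.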
Re: [Rd] WISH: set.seed(seed) to produce error if length(seed) != 1 (now silent)
R wobbles a bit as there is no normal datatype that is a singleton variable. Saying x <- 5 just creates a vector of current length 1. It is perfectly legal to then write x[2] <- 6 and so on. The vector lengthens. You can truncate it back to 1, if you wish: length(x) <- 1

So the question here is what happens if you supply more info than is needed? If it is an integer vector of length greater than one, should it ignore everything but the first entry? I note it happily accepts not-quite integers like TRUE and FALSE. It also accepts floating point numbers like 1.23 or 1.2e5.

The goal seems to be to set a unique starting point, rounded or transformed if needed. The visible part of the function does not even look at the seed before calling the internal representation. So although superficially choosing the first integer in a vector makes some sense, it can be a problem if a program assumes the entire vector is consumed and perhaps hashed in some way to make a seed. If the program later changes parts of the vector other than the first entry, it may assume re-setting the seed gets something else and yet it may be exactly the same.

So, yes, I suspect it is an ERROR to take anything that cannot be coerced by something like as.integer() into a vector of length 1.

I have noted other places in R where I may get a warning when giving a longer vector that only the first element will be used. Are they all problems that need to be addressed?

Here is a short one:

> x <- c(1:3)
> if (x > 2) y <- TRUE
Warning message:
In if (x > 2) y <- TRUE :
  the condition has length > 1 and only the first element will be used
> y
Error: object 'y' not found

The above is not vectorized and makes the choice of x==1 and thus does not set y.

Now a vectorized variant works as expected, making a vector of length 3 for y:

> x
[1] 1 2 3
> y <- ifelse(x > 2, TRUE, FALSE)
> y
[1] FALSE FALSE  TRUE

I have no doubt fixing lots of this stuff, if indeed it is a fix, can break lots of existing code. Sure, it is not harmful to ask a programmer to always say x[1] to guarantee they are getting what they want, or to add a function like first(x) that does the same.

R has some compromises or features I sometimes wonder about. If it had a concept of a numeric scalar, then some things that now happen might start being an error.

What happens when you multiply a vector by a scalar as in 5*x is that every component of x is multiplied by 5. But x*x does componentwise multiplication. So say x is c(1:3), what should this do using a twosome times a threesome?

> x[1:2]*x
[1] 1 4 3
Warning message:
In x[1:2] * x :
  longer object length is not a multiple of shorter object length

Is it recycling to get a 1 in pseudo-position 3?

Yep, this shows recycling (the output implies x is now 1:9):

> x[1:2]*x
[1]  1  4  3  8  5 12  7 16  9
Warning message:
In x[1:2] * x :
  longer object length is not a multiple of shorter object length

You do get a warning, but it does not tell you what it did.

In essence, the earlier case of 5*x arguably recycled the 5 as many times as needed but with no warning.

My point is that many languages, especially older ones, were designed a certain way and have been updated but we may be stuck with what we have. A brand new language might come up with a new way that includes vectorizing the heck out of things but allowing and even demanding that you explicitly convert things to a scalar in a context that needs it, or explicitly ask for recycling when you want it, or ...

-----Original Message-----
From: R-devel On Behalf Of Henrik Bengtsson
Sent: Friday, September 17, 2021 8:39 AM
To: GILLIBERT, Andre
Cc: R-devel
Subject: Re: [Rd] WISH: set.seed(seed) to produce error if length(seed) != 1 (now silent)

> I’m curious, other than proper programming practice, why?

Life's too short for troubleshooting silent mistakes - mine or others.

While at it, searching the interwebs for use of set.seed() gives mistakes/misunderstandings like calling set.seed() with a non-integer number, e.g.

> set.seed(6.1); sum(.Random.seed)
[1] 73930104
> set.seed(6.2); sum(.Random.seed)
[1] 73930104

which clearly is not what the user expected. There are also a few cases of set.seed() with a character string, e.g.

> set.seed("42"); sum(.Random.seed)
[1] -2119381568
> set.seed(42); sum(.Random.seed)
[1] -2119381568

which works just because as.numeric("42") is used.

/Henrik
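The recycling behaviour discussed in this message can be reproduced in a few lines. The sketch below collects the warning text instead of printing it, and adds a helper, as_scalar, that is purely hypothetical (in the spirit of the first(x) idea above). As far as I know, since R 4.2.0 a condition of length greater than one in if() is an error rather than a warning, so the example reduces the vector with any() before testing it.

```r
x <- 1:3

scaled  <- 5 * x   # the length-1 vector 5 is recycled silently
squared <- x * x   # equal lengths, multiplied element by element

# Lengths 2 and 3: R recycles the shorter operand and warns.
# Collect the warning text instead of printing it.
warnings_seen <- character(0)
partial <- withCallingHandlers(
  x[1:2] * x,
  warning = function(w) {
    warnings_seen <<- c(warnings_seen, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
partial  # c(1, 2, 1) * c(1, 2, 3)

# Hypothetical helper: insist on a scalar where one is required.
as_scalar <- function(v) {
  if (length(v) != 1L) stop("expected length 1, got length ", length(v))
  v
}

# For conditions, reduce the vector explicitly before if().
y <- if (any(x > 2)) TRUE else FALSE
```

Calling as_scalar(x) here stops with an error, which is the "demand an explicit scalar" behaviour the message speculates a new language might adopt.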
Re: [Rd] WISH: set.seed(seed) to produce error if length(seed) != 1 (now silent)
> I'd say a more serious problem would be using set.seed(.Random.seed) ...

Exactly, I'm pretty sure I also tried that at some point. This leads to another thing I wanted to get to, which is to add support for exactly that case. So, instead of having to poke around with:

    globalenv()$.Random.seed <- new_seed

where 'new_seed' is a valid ".Random.seed" seed, it would be convenient to be able to do just set.seed(new_seed), which comes in handy in parallel processing.

/Henrik

On Fri, Sep 17, 2021 at 3:10 PM Duncan Murdoch wrote:
>
> I'd say a more serious problem would be using set.seed(.Random.seed), because the first entry codes for RNGkind and hardly varies at all. So this sequence could really mislead someone:
>
> > set.seed(.Random.seed)
> > sum(.Random.seed)
> [1] 24428993419
>
> # Use it to get a new .Random.seed value:
> > runif(1)
> [1] 0.3842704
>
> > sum(.Random.seed)
> [1] -13435151647
>
> # So let's make things really random, by using the new seed as a seed:
> > set.seed(.Random.seed)
> > sum(.Random.seed)
> [1] 24428993419
>
> # Back to the original!
>
> Duncan Murdoch

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
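Saving and restoring the complete RNG state, as wished for in this message, can be sketched with get() and assign() on the global environment. This is a minimal illustration, not a proposal for how set.seed() should work; the variable names saved_state, first_draw, and second_draw are made up for the example.

```r
# Save the full RNG state, draw, restore, and draw again.
set.seed(42)
saved_state <- get(".Random.seed", envir = globalenv())

first_draw <- runif(3)

# Restore the complete state. set.seed() takes a single integer seed,
# so the saved vector is assigned back directly instead.
assign(".Random.seed", saved_state, envir = globalenv())
second_draw <- runif(3)

identical(first_draw, second_draw)
```

The same pattern is what makes it possible to hand a worker process an exact RNG state rather than a single integer seed.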
Re: [Rd] WISH: set.seed(seed) to produce error if length(seed) != 1 (now silent)
I'd say a more serious problem would be using set.seed(.Random.seed), because the first entry codes for RNGkind and hardly varies at all. So this sequence could really mislead someone:

> set.seed(.Random.seed)
> sum(.Random.seed)
[1] 24428993419

# Use it to get a new .Random.seed value:
> runif(1)
[1] 0.3842704

> sum(.Random.seed)
[1] -13435151647

# So let's make things really random, by using the new seed as a seed:
> set.seed(.Random.seed)
> sum(.Random.seed)
[1] 24428993419

# Back to the original!

Duncan Murdoch

On 17/09/2021 8:38 a.m., Henrik Bengtsson wrote:
>> I’m curious, other than proper programming practice, why?
>
> Life's too short for troubleshooting silent mistakes - mine or others.
>
> While at it, searching the interwebs for use of set.seed() gives mistakes/misunderstandings like calling set.seed() with a non-integer number, e.g.
>
> > set.seed(6.1); sum(.Random.seed)
> [1] 73930104
> > set.seed(6.2); sum(.Random.seed)
> [1] 73930104
>
> which clearly is not what the user expected. There are also a few cases of set.seed() with a character string, e.g.
>
> > set.seed("42"); sum(.Random.seed)
> [1] -2119381568
> > set.seed(42); sum(.Random.seed)
> [1] -2119381568
>
> which works just because as.numeric("42") is used.
>
> /Henrik
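The point about the first entry of .Random.seed can be checked without passing a vector to set.seed() at all. The sketch below assumes the RNG kind is not changed between the calls; the variable names are illustrative only.

```r
set.seed(123)
kind_before <- globalenv()$.Random.seed[1]
invisible(runif(10))                      # advance the generator
kind_after <- globalenv()$.Random.seed[1]
kind_before == kind_after                 # the first element did not move

# Seeding from that first element therefore lands on one fixed state,
# no matter how far the stream has advanced in between.
set.seed(kind_before)
state_a <- globalenv()$.Random.seed
invisible(runif(50))
set.seed(globalenv()$.Random.seed[1])
state_b <- globalenv()$.Random.seed
identical(state_a, state_b)
```

This is why "re-seeding from the current state" does not add randomness: only the element that encodes the generator kinds is ever consulted.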
Re: [Rd] WISH: set.seed(seed) to produce error if length(seed) != 1 (now silent)
> I’m curious, other than proper programming practice, why?

Life's too short for troubleshooting silent mistakes - mine or others.

While at it, searching the interwebs for use of set.seed() gives mistakes/misunderstandings like calling set.seed() with a non-integer number, e.g.

> set.seed(6.1); sum(.Random.seed)
[1] 73930104
> set.seed(6.2); sum(.Random.seed)
[1] 73930104

which clearly is not what the user expected. There are also a few cases of set.seed() with a character string, e.g.

> set.seed("42"); sum(.Random.seed)
[1] -2119381568
> set.seed(42); sum(.Random.seed)
[1] -2119381568

which works just because as.numeric("42") is used.

/Henrik

On Fri, Sep 17, 2021 at 12:55 PM GILLIBERT, Andre wrote:
>
> Hello,
>
> A vector with a length >= 2 to set.seed would probably be a bug. An error message will help the user to fix his R code. The bug may be accidental or due to bad understanding of the set.seed function. For instance, a user may think that the whole state of the PRNG can be passed to set.seed.
>
> The "if" instruction emits a warning when the condition has length >= 2, because it is often a bug. I would expect a warning or error with set.seed().
>
> Validating inputs and emitting errors early is a good practice.
>
> Just my 2 cents.
>
> Sincerely.
> Andre GILLIBERT
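The silent truncation shown in this message can be verified by comparing the full RNG state rather than its sum. This is a small sketch: state_after and is_whole_seed are hypothetical helper names, and the guard is just one possible way to detect a fractional seed before it reaches set.seed().

```r
# Return the complete RNG state produced by a given seed.
state_after <- function(seed) {
  set.seed(seed)
  get(".Random.seed", envir = globalenv())
}

# Fractional seeds are interpreted as integers, so these all coincide.
frac_pair_same   <- identical(state_after(6.1), state_after(6.2))
frac_vs_int_same <- identical(state_after(6.1), state_after(6L))

# Hypothetical guard: TRUE only for a single, finite, whole-number seed.
is_whole_seed <- function(seed) {
  is.numeric(seed) && length(seed) == 1L &&
    is.finite(seed) && seed == trunc(seed)
}

is_whole_seed(6)    # TRUE
is_whole_seed(6.1)  # FALSE
```

A script that derives seeds from floating-point values (for example 6.1, 6.2, 6.3 for successive runs) would thus repeat the same stream every time, which is the kind of silent mistake the message is about.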
Re: [Rd] WISH: set.seed(seed) to produce error if length(seed) != 1 (now silent)
Hello,

A vector with a length >= 2 to set.seed would probably be a bug. An error message will help the user to fix his R code. The bug may be accidental or due to bad understanding of the set.seed function. For instance, a user may think that the whole state of the PRNG can be passed to set.seed.

The "if" instruction emits a warning when the condition has length >= 2, because it is often a bug. I would expect a warning or error with set.seed().

Validating inputs and emitting errors early is a good practice.

Just my 2 cents.

Sincerely.
Andre GILLIBERT

-----Original Message-----
From: R-devel [mailto:r-devel-boun...@r-project.org] On Behalf Of Avraham Adler
Sent: Friday, September 17, 2021 12:07
To: Henrik Bengtsson
Cc: R-devel
Subject: Re: [Rd] WISH: set.seed(seed) to produce error if length(seed) != 1 (now silent)

Hi, Henrik.

I’m curious, other than proper programming practice, why?

Avi
Re: [Rd] WISH: set.seed(seed) to produce error if length(seed) != 1 (now silent)
Hi, Henrik.

I’m curious, other than proper programming practice, why?

Avi

On Fri, Sep 17, 2021 at 11:48 AM Henrik Bengtsson <henrik.bengts...@gmail.com> wrote:

> Hi,
>
> according to help("set.seed"), argument 'seed' to set.seed() should be:
>
>     a single value, interpreted as an integer, or NULL (see ‘Details’).
>
> From code inspection (src/main/RNG.c) and testing, it turns out that if you pass a 'seed' with length greater than one, it silently uses seed[1], e.g.
>
> > set.seed(1); sum(.Random.seed)
> [1] 4070365163
> > set.seed(1:3); sum(.Random.seed)
> [1] 4070365163
> > set.seed(1:100); sum(.Random.seed)
> [1] 4070365163
>
> I'd like to suggest that set.seed() produces an error if length(seed) > 1. As a reference, for length(seed) == 0, we get:
>
> > set.seed(integer(0))
> Error in set.seed(integer(0)) : supplied seed is not a valid integer
>
> /Henrik
[Rd] WISH: set.seed(seed) to produce error if length(seed) != 1 (now silent)
Hi,

according to help("set.seed"), argument 'seed' to set.seed() should be:

    a single value, interpreted as an integer, or NULL (see ‘Details’).

From code inspection (src/main/RNG.c) and testing, it turns out that if you pass a 'seed' with length greater than one, it silently uses seed[1], e.g.

> set.seed(1); sum(.Random.seed)
[1] 4070365163
> set.seed(1:3); sum(.Random.seed)
[1] 4070365163
> set.seed(1:100); sum(.Random.seed)
[1] 4070365163

I'd like to suggest that set.seed() produces an error if length(seed) > 1. As a reference, for length(seed) == 0, we get:

> set.seed(integer(0))
Error in set.seed(integer(0)) : supplied seed is not a valid integer

/Henrik
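Until something like this is decided for base R, the stricter behaviour suggested above can be approximated in user code. The wrapper below is a minimal sketch; the name set_seed_strict and its error message are hypothetical, and it simply refuses anything other than NULL or a length-one seed before delegating to set.seed().

```r
# Hypothetical helper (not part of base R): accept only NULL or a
# single seed value, and fail loudly for anything else.
set_seed_strict <- function(seed) {
  if (!is.null(seed) && length(seed) != 1L) {
    stop("'seed' must be a single value or NULL, got length ", length(seed))
  }
  set.seed(seed)
}

# A valid seed behaves exactly like plain set.seed().
set_seed_strict(1)
strict_draw <- runif(3)
set.seed(1)
plain_draw <- runif(3)
identical(strict_draw, plain_draw)

# A longer vector is rejected instead of silently using seed[1].
res <- tryCatch(set_seed_strict(1:3), error = function(e) conditionMessage(e))
res
```

Keeping the check in a wrapper means existing code that calls set.seed() directly is unaffected, while new code can opt in to the stricter contract.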