Re: [R] Improvement: function cut

2021-09-18 Thread David Winsemius



On 9/18/21 5:28 AM, Leonard Mada via R-help wrote:

Hello Andrew,


I am adding this information for completeness (so other users can get a better
understanding):

If we want to perform a survival analysis, then the intervals should be
closed on the right, but we should also include the first time point (as
per Intention-to-Treat):

[0, 4](4, 8](8, 12](12, 16]

[0, 4](4, 8](8, 12](12, 16](16, 20]


So the series is extendible to the right without any errors!

But the 1st interval (which is the same in both series) is different
from the other intervals: [0, 4].


I feel that this should have been the default behaviour for cut().


To Leonard;

If you do not like the behavior of `cut`, then you should "roll your 
own". It's very unlikely that R Core will modify a base function like 
cut. You might want to look at Hmisc::cut2. Frank Harrell didn't like 
that default behavior and thought he could make a better cut, so he just 
put it in his package. I did like his version better and often used it 
when I was actively programming. I suspect there is also a tidyverse 
cut-like function, but I'm not terribly familiar with that fork of R. 
(It's really not the same language IMHO.)
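
As a rough illustration of "rolling your own", a thin wrapper around cut() can flip the default locally. The name cut_closed is hypothetical (it is neither in base R nor in Hmisc), and the sketch assumes the caller never passes include.lowest explicitly:

```r
# Hypothetical user-level wrapper; not part of base R or Hmisc.
# Same as cut(), except that the extreme break is always included.
cut_closed <- function(x, breaks, right = TRUE, ...) {
  cut(x, breaks, right = right, include.lowest = TRUE, ...)
}

x <- 0:16
f <- cut_closed(x, seq.int(0, 16, 4))
anyNA(f)      # FALSE: the value 0 is kept in the first bin
levels(f)[1]  # "[0,4]"
```

Keeping the change in a private wrapper leaves base cut() untouched, so no existing package sees different behaviour.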


But it's a waste of time and energy to try to propose modifications of core 
R functions unless *you* can show that the change is stable across 20,000 
packages and will not offend long-time users. The likelihood of that 
happening for your proposal is vanishingly small in my estimation. You 
shouldn't ask R Core to do that for you. They are busy fixing real bugs.



If you want to persist despite my negativity, then you should make a 
complete proposal by submitting a proper diff file that incorporates 
your tested efforts to the R-devel mailing list.



--

David



Note:

I was led to think about a different situation in my previous
message, as you constructed intervals open on the right, and also
extended to the right. But survival analysis should be as described in
this mail, and that should probably be the default.


Sincerely,


Leonard


On 9/18/2021 1:29 AM, Andrew Simmons wrote:

I disagree, I don't really think it's too long or ugly, but if you
think it is, you could abbreviate it as 'i'.


x <- 0:20
breaks1 <- seq.int(0, 16, 4)
breaks2 <- seq.int(0, 20, 4)
data.frame(
     cut(x, breaks1, right = FALSE, i = TRUE),
     cut(x, breaks2, right = FALSE, i = TRUE),
     check.names = FALSE
)


I hope this helps.

On Fri, Sep 17, 2021 at 6:26 PM Leonard Mada <leo.m...@syonic.eu> wrote:

 Hello Andrew,


 But "cut" generates factors. In most cases with real data one
 expects to have also the ends of the interval: the argument
 "include.lowest" is both ugly and too long.

 [The test code in the ftable thread contains this error! I have
 run into this error a couple of times.]


 The only real situation that I can imagine to be problematic:

 - if the interval goes to +Inf (or -Inf): I do not know if there
 would be any effects when including +Inf (or -Inf).


 Leonard


 On 9/18/2021 1:14 AM, Andrew Simmons wrote:

 While it is not explicitly mentioned anywhere in the
 documentation for .bincode, I suspect 'include.lowest = FALSE' is
 the default to keep the definitions of the bins consistent. For
 example:


 x <- 0:20
 breaks1 <- seq.int(0, 16, 4)
 breaks2 <- seq.int(0, 20, 4)
 cbind(
     .bincode(x, breaks1, right = FALSE, include.lowest = TRUE),
     .bincode(x, breaks2, right = FALSE, include.lowest = TRUE)
 )


 by having 'include.lowest = TRUE' with different ends, you can
 get inconsistent behaviour. While this probably wouldn't be an
 issue with 'real' data, this would seem like something you'd want
 to avoid by default. The definitions of the bins are


 [0, 4)
 [4, 8)
 [8, 12)
 [12, 16]


 and


 [0, 4)
 [4, 8)
 [8, 12)
 [12, 16)
 [16, 20]


 so you can see where the inconsistent behaviour comes from. You
 might be able to get R-core to add argument 'warn', but probably
 not to change the default of 'include.lowest'. I hope this helps


 On Fri, Sep 17, 2021 at 6:01 PM Leonard Mada <leo.m...@syonic.eu> wrote:

 Thank you Andrew.


 Is there any reason not to make: include.lowest = TRUE the
 default?


 Regarding the NA:

 The user still has to suspect that some values were not
 included and run that test.


 Leonard


 On 9/18/2021 12:53 AM, Andrew Simmons wrote:

 Regarding your first point, argument 'include.lowest'
 already handles this specific case, see ?.bincode

 Your second point, maybe it could be helpful, but since both
 'cut.default' and '.bincode' return NA if a value isn't
 within a bin, you could make something like this on your own.
 Might be worth pitching to R-bugs on the 

Re: [R] Improvement: function cut

2021-09-18 Thread Leonard Mada via R-help
Hello Andrew,


I am adding this information for completeness (so other users can get a better 
understanding):

If we want to perform a survival analysis, then the intervals should be 
closed on the right, but we should also include the first time point (as 
per Intention-to-Treat):

[0, 4](4, 8](8, 12](12, 16]

[0, 4](4, 8](8, 12](12, 16](16, 20]


So the series is extendible to the right without any errors!

But the 1st interval (which is the same in both series) is different 
from the other intervals: [0, 4].


I feel that this should have been the default behaviour for cut().
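
The two series above can be reproduced with the existing arguments. This is only a sketch of the current behaviour, using the same 0:20 example data as elsewhere in the thread; the variable names f1 and f2 are illustrative:

```r
x <- 0:20
breaks1 <- seq.int(0, 16, 4)
breaks2 <- seq.int(0, 20, 4)

# Right-closed bins, with the first time point (0) included.
f1 <- cut(x, breaks1, right = TRUE, include.lowest = TRUE)
f2 <- cut(x, breaks2, right = TRUE, include.lowest = TRUE)

levels(f1)  # "[0,4]" "(4,8]" "(8,12]" "(12,16]"
levels(f2)  # "[0,4]" "(4,8]" "(8,12]" "(12,16]" "(16,20]"

# Values up to 16 land in the same labelled interval in both series.
identical(as.character(f1[x <= 16]), as.character(f2[x <= 16]))  # TRUE
```

Extending the breaks to the right only adds a new last interval; the bins already defined keep their meaning.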

Note:

I was led to think about a different situation in my previous 
message, as you constructed intervals open on the right, and also 
extended to the right. But survival analysis should be as described in 
this mail, and that should probably be the default.


Sincerely,


Leonard


On 9/18/2021 1:29 AM, Andrew Simmons wrote:
> I disagree, I don't really think it's too long or ugly, but if you 
> think it is, you could abbreviate it as 'i'.
>
>
> x <- 0:20
> breaks1 <- seq.int(0, 16, 4)
> breaks2 <- seq.int(0, 20, 4)
> data.frame(
>     cut(x, breaks1, right = FALSE, i = TRUE),
>     cut(x, breaks2, right = FALSE, i = TRUE),
>     check.names = FALSE
> )
>
>
> I hope this helps.
>
> On Fri, Sep 17, 2021 at 6:26 PM Leonard Mada wrote:
>
> Hello Andrew,
>
>
> But "cut" generates factors. In most cases with real data one
> expects to have also the ends of the interval: the argument
> "include.lowest" is both ugly and too long.
>
> [The test-code on the ftable thread contains this error! I have
> run through this error a couple of times.]
>
>
> The only real situation that I can imagine to be problematic:
>
> - if the interval goes to +Inf (or -Inf): I do not know if there
> would be any effects when including +Inf (or -Inf).
>
>
> Leonard
>
>
> On 9/18/2021 1:14 AM, Andrew Simmons wrote:
>> While it is not explicitly mentioned anywhere in the
>> documentation for .bincode, I suspect 'include.lowest = FALSE' is
>> the default to keep the definitions of the bins consistent. For
>> example:
>>
>>
>> x <- 0:20
>> breaks1 <- seq.int(0, 16, 4)
>> breaks2 <- seq.int(0, 20, 4)
>> cbind(
>>     .bincode(x, breaks1, right = FALSE, include.lowest = TRUE),
>>     .bincode(x, breaks2, right = FALSE, include.lowest = TRUE)
>> )
>>
>>
>> by having 'include.lowest = TRUE' with different ends, you can
>> get inconsistent behaviour. While this probably wouldn't be an
>> issue with 'real' data, this would seem like something you'd want
>> to avoid by default. The definitions of the bins are
>>
>>
>> [0, 4)
>> [4, 8)
>> [8, 12)
>> [12, 16]
>>
>>
>> and
>>
>>
>> [0, 4)
>> [4, 8)
>> [8, 12)
>> [12, 16)
>> [16, 20]
>>
>>
>> so you can see where the inconsistent behaviour comes from. You
>> might be able to get R-core to add argument 'warn', but probably
>> not to change the default of 'include.lowest'. I hope this helps
>>
>>
>> On Fri, Sep 17, 2021 at 6:01 PM Leonard Mada wrote:
>>
>> Thank you Andrew.
>>
>>
>> Is there any reason not to make: include.lowest = TRUE the
>> default?
>>
>>
>> Regarding the NA:
>>
>> The user still has to suspect that some values were not
>> included and run that test.
>>
>>
>> Leonard
>>
>>
>> On 9/18/2021 12:53 AM, Andrew Simmons wrote:
>>> Regarding your first point, argument 'include.lowest'
>>> already handles this specific case, see ?.bincode
>>>
>>> Your second point, maybe it could be helpful, but since both
>>> 'cut.default' and '.bincode' return NA if a value isn't
>>> within a bin, you could make something like this on your own.
>>> Might be worth pitching to R-bugs on the wishlist.
>>>
>>>
>>>
>>> On Fri, Sep 17, 2021, 17:45 Leonard Mada via R-help
>>> <r-help@r-project.org> wrote:
>>>
>>> Hello List members,
>>>
>>>
>>> the following improvements would be useful for function
>>> cut (and .bincode):
>>>
>>>
>>> 1.) Argument: Include extremes
>>> extremes = TRUE
>>> if(right == FALSE) {
>>>     # include also right for last interval;
>>> } else {
>>>     # include also left for first interval;
>>> }
>>>
>>>
>>> 2.) Argument: warn = TRUE
>>>
>>> Warn if any values are not included in the intervals.
>>>
>>>
>>> Motivation:
>>> - reduce risk of errors when using function cut();
>>>
>>>
>>> Sincerely,
>>>
>>>
>>> Leonard
>>>
>>> __
>>> R-help@r-project.org 

Re: [R] Improvement: function cut

2021-09-17 Thread Bert Gunter
Perhaps you and Andrew should take this discussion off list...

Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Fri, Sep 17, 2021 at 3:45 PM Leonard Mada via R-help
 wrote:
>
> Why would you want to merge different factors?
>
> It makes no sense on real data. Even if some names are the same, the
> factors are not the same!
>
>
> The only real-data application that springs to mind is censoring (right
> or left, depending on the choice): but here we have both open and closed
> intervals, e.g. to the right (in the same data-set).
>
>
> Leonard
>
>
> On 9/18/2021 1:29 AM, Andrew Simmons wrote:
> > I disagree, I don't really think it's too long or ugly, but if you
> > think it is, you could abbreviate it as 'i'.
> >
> >
> > x <- 0:20
> > breaks1 <- seq.int(0, 16, 4)
> > breaks2 <- seq.int(0, 20, 4)
> > data.frame(
> > cut(x, breaks1, right = FALSE, i = TRUE),
> > cut(x, breaks2, right = FALSE, i = TRUE),
> > check.names = FALSE
> > )
> >
> >
> > I hope this helps.
> >
> > On Fri, Sep 17, 2021 at 6:26 PM Leonard Mada wrote:
> >
> > Hello Andrew,
> >
> >
> > But "cut" generates factors. In most cases with real data one
> > expects to have also the ends of the interval: the argument
> > "include.lowest" is both ugly and too long.
> >
> > [The test-code on the ftable thread contains this error! I have
> > run through this error a couple of times.]
> >
> >
> > The only real situation that I can imagine to be problematic:
> >
> > - if the interval goes to +Inf (or -Inf): I do not know if there
> > would be any effects when including +Inf (or -Inf).
> >
> >
> > Leonard
> >
> >
> > On 9/18/2021 1:14 AM, Andrew Simmons wrote:
> >> While it is not explicitly mentioned anywhere in the
> >> documentation for .bincode, I suspect 'include.lowest = FALSE' is
> >> the default to keep the definitions of the bins consistent. For
> >> example:
> >>
> >>
> >> x <- 0:20
> >> breaks1 <- seq.int(0, 16, 4)
> >> breaks2 <- seq.int(0, 20, 4)
> >> cbind(
> >> .bincode(x, breaks1, right = FALSE, include.lowest = TRUE),
> >> .bincode(x, breaks2, right = FALSE, include.lowest = TRUE)
> >> )
> >>
> >>
> >> by having 'include.lowest = TRUE' with different ends, you can
> >> get inconsistent behaviour. While this probably wouldn't be an
> >> issue with 'real' data, this would seem like something you'd want
> >> to avoid by default. The definitions of the bins are
> >>
> >>
> >> [0, 4)
> >> [4, 8)
> >> [8, 12)
> >> [12, 16]
> >>
> >>
> >> and
> >>
> >>
> >> [0, 4)
> >> [4, 8)
> >> [8, 12)
> >> [12, 16)
> >> [16, 20]
> >>
> >>
> >> so you can see where the inconsistent behaviour comes from. You
> >> might be able to get R-core to add argument 'warn', but probably
> >> not to change the default of 'include.lowest'. I hope this helps
> >>
> >>
> >> On Fri, Sep 17, 2021 at 6:01 PM Leonard Mada wrote:
> >>
> >> Thank you Andrew.
> >>
> >>
> >> Is there any reason not to make: include.lowest = TRUE the
> >> default?
> >>
> >>
> >> Regarding the NA:
> >>
> >> The user still has to suspect that some values were not
> >> included and run that test.
> >>
> >>
> >> Leonard
> >>
> >>
> >> On 9/18/2021 12:53 AM, Andrew Simmons wrote:
> >>> Regarding your first point, argument 'include.lowest'
> >>> already handles this specific case, see ?.bincode
> >>>
> >>> Your second point, maybe it could be helpful, but since both
> >>> 'cut.default' and '.bincode' return NA if a value isn't
> >>> within a bin, you could make something like this on your own.
> >>> Might be worth pitching to R-bugs on the wishlist.
> >>>
> >>>
> >>>
> >>> On Fri, Sep 17, 2021, 17:45 Leonard Mada via R-help
> >>> <r-help@r-project.org> wrote:
> >>>
> >>> Hello List members,
> >>>
> >>>
> >>> the following improvements would be useful for function
> >>> cut (and .bincode):
> >>>
> >>>
> >>> 1.) Argument: Include extremes
> >>> extremes = TRUE
> >>> if(right == FALSE) {
> >>> # include also right for last interval;
> >>> } else {
> >>> # include also left for first interval;
> >>> }
> >>>
> >>>
> >>> 2.) Argument: warn = TRUE
> >>>
> >>> Warn if any values are not included in the intervals.
> >>>
> >>>
> >>> Motivation:
> >>> - reduce risk of errors when using function cut();
> >>>
> >>>
> >>> Sincerely,
> >>>
> >>>
> >>>  

Re: [R] Improvement: function cut

2021-09-17 Thread Leonard Mada via R-help

The warning should be raised in cut() => .bincode().

It should be generated whenever a real value (excluding NA, NaN or +/- 
Inf) is not included in any of the bins.



If the user writes a script and doesn't want any warnings, they can set 
warn = FALSE. But otherwise it would be very helpful to catch the error 
immediately (and not after a number of steps, or miss the error 
altogether).
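
Until something like this exists in base R, the idea can be approximated at user level. The following is a minimal sketch only: cut_warn and its warn argument are hypothetical names, not an existing API, and the check relies on cut() returning NA for values outside every bin:

```r
# Hypothetical wrapper; 'warn' is not an argument of base cut().
cut_warn <- function(x, breaks, ..., warn = TRUE) {
  out <- cut(x, breaks, ...)
  # Finite inputs that ended up in no bin (NA, NaN and +/-Inf are ignored).
  missed <- is.finite(x) & is.na(out)
  if (warn && any(missed)) {
    warning(sum(missed), " finite value(s) fell outside all bins",
            call. = FALSE)
  }
  out
}

# The default right-closed bins (0,2] (2,4] silently drop the value 0;
# here the warning is captured so that its text can be inspected.
msg <- tryCatch(
  { cut_warn(0:4, c(0, 2, 4)); NA_character_ },
  warning = function(w) conditionMessage(w)
)
msg  # "1 finite value(s) fell outside all bins"
```

With warn = FALSE the wrapper behaves exactly like cut(), which covers the scripted case mentioned above.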



Leonard


On 9/18/2021 1:28 AM, Jeff Newmiller wrote:

Your objection that "the user has to suspect that some values were not 
included" applies equally to your proposed warn option. There are a lot of ways to 
introduce NAs... in real projects all analysts should be suspecting this problem.

On September 17, 2021 3:01:35 PM PDT, Leonard Mada via R-help 
 wrote:

Thank you Andrew.


Is there any reason not to make: include.lowest = TRUE the default?


Regarding the NA:

The user still has to suspect that some values were not included and run
that test.


Leonard


On 9/18/2021 12:53 AM, Andrew Simmons wrote:

Regarding your first point, argument 'include.lowest' already handles
this specific case, see ?.bincode

Your second point, maybe it could be helpful, but since both
'cut.default' and '.bincode' return NA if a value isn't within a bin,
you could make something like this on your own.
Might be worth pitching to R-bugs on the wishlist.



On Fri, Sep 17, 2021, 17:45 Leonard Mada via R-help
<r-help@r-project.org> wrote:

 Hello List members,


 the following improvements would be useful for function cut (and
 .bincode):


 1.) Argument: Include extremes
 extremes = TRUE
 if(right == FALSE) {
     # include also right for last interval;
 } else {
     # include also left for first interval;
 }


 2.) Argument: warn = TRUE

 Warn if any values are not included in the intervals.


 Motivation:
 - reduce risk of errors when using function cut();


 Sincerely,


 Leonard

 __
 R-help@r-project.org  mailing list --
 To UNSUBSCRIBE and more, see
 https://stat.ethz.ch/mailman/listinfo/r-help
 
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 
 and provide commented, minimal, self-contained, reproducible code.






Re: [R] Improvement: function cut

2021-09-17 Thread Leonard Mada via R-help
Why would you want to merge different factors?

It makes no sense on real data. Even if some names are the same, the 
factors are not the same!


The only real-data application that springs to mind is censoring (right 
or left, depending on the choice): but here we have both open and closed 
intervals, e.g. to the right (in the same data-set).


Leonard


On 9/18/2021 1:29 AM, Andrew Simmons wrote:
> I disagree, I don't really think it's too long or ugly, but if you 
> think it is, you could abbreviate it as 'i'.
>
>
> x <- 0:20
> breaks1 <- seq.int(0, 16, 4)
> breaks2 <- seq.int(0, 20, 4)
> data.frame(
>     cut(x, breaks1, right = FALSE, i = TRUE),
>     cut(x, breaks2, right = FALSE, i = TRUE),
>     check.names = FALSE
> )
>
>
> I hope this helps.
>
> On Fri, Sep 17, 2021 at 6:26 PM Leonard Mada wrote:
>
> Hello Andrew,
>
>
> But "cut" generates factors. In most cases with real data one
> expects to have also the ends of the interval: the argument
> "include.lowest" is both ugly and too long.
>
> [The test-code on the ftable thread contains this error! I have
> run through this error a couple of times.]
>
>
> The only real situation that I can imagine to be problematic:
>
> - if the interval goes to +Inf (or -Inf): I do not know if there
> would be any effects when including +Inf (or -Inf).
>
>
> Leonard
>
>
> On 9/18/2021 1:14 AM, Andrew Simmons wrote:
>> While it is not explicitly mentioned anywhere in the
>> documentation for .bincode, I suspect 'include.lowest = FALSE' is
>> the default to keep the definitions of the bins consistent. For
>> example:
>>
>>
>> x <- 0:20
>> breaks1 <- seq.int(0, 16, 4)
>> breaks2 <- seq.int(0, 20, 4)
>> cbind(
>>     .bincode(x, breaks1, right = FALSE, include.lowest = TRUE),
>>     .bincode(x, breaks2, right = FALSE, include.lowest = TRUE)
>> )
>>
>>
>> by having 'include.lowest = TRUE' with different ends, you can
>> get inconsistent behaviour. While this probably wouldn't be an
>> issue with 'real' data, this would seem like something you'd want
>> to avoid by default. The definitions of the bins are
>>
>>
>> [0, 4)
>> [4, 8)
>> [8, 12)
>> [12, 16]
>>
>>
>> and
>>
>>
>> [0, 4)
>> [4, 8)
>> [8, 12)
>> [12, 16)
>> [16, 20]
>>
>>
>> so you can see where the inconsistent behaviour comes from. You
>> might be able to get R-core to add argument 'warn', but probably
>> not to change the default of 'include.lowest'. I hope this helps
>>
>>
>> On Fri, Sep 17, 2021 at 6:01 PM Leonard Mada wrote:
>>
>> Thank you Andrew.
>>
>>
>> Is there any reason not to make: include.lowest = TRUE the
>> default?
>>
>>
>> Regarding the NA:
>>
>> The user still has to suspect that some values were not
>> included and run that test.
>>
>>
>> Leonard
>>
>>
>> On 9/18/2021 12:53 AM, Andrew Simmons wrote:
>>> Regarding your first point, argument 'include.lowest'
>>> already handles this specific case, see ?.bincode
>>>
>>> Your second point, maybe it could be helpful, but since both
>>> 'cut.default' and '.bincode' return NA if a value isn't
>>> within a bin, you could make something like this on your own.
>>> Might be worth pitching to R-bugs on the wishlist.
>>>
>>>
>>>
>>> On Fri, Sep 17, 2021, 17:45 Leonard Mada via R-help
>>> <r-help@r-project.org> wrote:
>>>
>>> Hello List members,
>>>
>>>
>>> the following improvements would be useful for function
>>> cut (and .bincode):
>>>
>>>
>>> 1.) Argument: Include extremes
>>> extremes = TRUE
>>> if(right == FALSE) {
>>>     # include also right for last interval;
>>> } else {
>>>     # include also left for first interval;
>>> }
>>>
>>>
>>> 2.) Argument: warn = TRUE
>>>
>>> Warn if any values are not included in the intervals.
>>>
>>>
>>> Motivation:
>>> - reduce risk of errors when using function cut();
>>>
>>>
>>> Sincerely,
>>>
>>>
>>> Leonard
>>>
>>>


Re: [R] Improvement: function cut

2021-09-17 Thread Jeff Newmiller
Your objection that "the user has to suspect that some values were not 
included" applies equally to your proposed warn option. There are a lot of ways 
to introduce NAs... in real projects all analysts should be suspecting this 
problem.
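
A routine check of this kind might look like the sketch below. The data are made up for illustration; the point is only that NAs introduced by cut() can be told apart from NAs already present in the input:

```r
x <- c(0, 3, 7, NA, 25)          # illustrative data with one genuine NA
f <- cut(x, breaks = c(0, 5, 10))

# Tabulate the bins, showing NA as its own category when present.
table(f, useNA = "ifany")

# Positions that were not NA on input but became NA after binning.
dropped <- which(is.na(f) & !is.na(x))
dropped     # 1 5
x[dropped]  # 0 25
```

Here 0 is lost because the first bin is (0,5], and 25 is lost because it lies beyond the last break.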

On September 17, 2021 3:01:35 PM PDT, Leonard Mada via R-help 
 wrote:
>Thank you Andrew.
>
>
>Is there any reason not to make: include.lowest = TRUE the default?
>
>
>Regarding the NA:
>
>The user still has to suspect that some values were not included and run 
>that test.
>
>
>Leonard
>
>
>On 9/18/2021 12:53 AM, Andrew Simmons wrote:
>> Regarding your first point, argument 'include.lowest' already handles 
>> this specific case, see ?.bincode
>>
>> Your second point, maybe it could be helpful, but since both 
>> 'cut.default' and '.bincode' return NA if a value isn't within a bin, 
>> you could make something like this on your own.
>> Might be worth pitching to R-bugs on the wishlist.
>>
>>
>>
>> On Fri, Sep 17, 2021, 17:45 Leonard Mada via R-help 
>> <r-help@r-project.org> wrote:
>>
>> Hello List members,
>>
>>
>> the following improvements would be useful for function cut (and
>> .bincode):
>>
>>
>> 1.) Argument: Include extremes
>> extremes = TRUE
>> if(right == FALSE) {
>>     # include also right for last interval;
>> } else {
>>     # include also left for first interval;
>> }
>>
>>
>> 2.) Argument: warn = TRUE
>>
>> Warn if any values are not included in the intervals.
>>
>>
>> Motivation:
>> - reduce risk of errors when using function cut();
>>
>>
>> Sincerely,
>>
>>
>> Leonard
>>
>>
>

-- 
Sent from my phone. Please excuse my brevity.



Re: [R] Improvement: function cut

2021-09-17 Thread Leonard Mada via R-help
Hello Andrew,


But "cut" generates factors. In most cases with real data one also expects 
the ends of the interval to be included: the argument "include.lowest" is 
both ugly and too long.

[The test code in the ftable thread contains this error! I have run 
into this error a couple of times.]


The only real situation that I can imagine to be problematic:

- if the interval goes to +Inf (or -Inf): I do not know if there would 
be any effects when including +Inf (or -Inf).
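
This can be checked directly. The sketch below uses made-up data with Inf as the last break and compares the integer bin codes; it describes what current cut() does rather than any proposed change:

```r
x <- c(0, 5, 10, Inf)
breaks <- c(0, 10, Inf)

# Right-closed bins: Inf is the closed right end of the last bin.
r_default <- as.integer(cut(x, breaks))                         # NA 1 1 2
r_lowest  <- as.integer(cut(x, breaks, include.lowest = TRUE))  # 1 1 1 2

# Left-closed bins: Inf is only binned when include.lowest = TRUE.
l_default <- as.integer(cut(x, breaks, right = FALSE))          # 1 1 2 NA
l_lowest  <- as.integer(cut(x, breaks, right = FALSE,
                            include.lowest = TRUE))             # 1 1 2 2
```

In this small example an infinite break is treated like any other extreme break: including it only changes whether the value Inf itself is assigned to the outermost bin.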


Leonard


On 9/18/2021 1:14 AM, Andrew Simmons wrote:
> While it is not explicitly mentioned anywhere in the documentation for 
> .bincode, I suspect 'include.lowest = FALSE' is the default to keep 
> the definitions of the bins consistent. For example:
>
>
> x <- 0:20
> breaks1 <- seq.int(0, 16, 4)
> breaks2 <- seq.int(0, 20, 4)
> cbind(
>     .bincode(x, breaks1, right = FALSE, include.lowest = TRUE),
>     .bincode(x, breaks2, right = FALSE, include.lowest = TRUE)
> )
>
>
> by having 'include.lowest = TRUE' with different ends, you can get 
> inconsistent behaviour. While this probably wouldn't be an issue with 
> 'real' data, this would seem like something you'd want to avoid by 
> default. The definitions of the bins are
>
>
> [0, 4)
> [4, 8)
> [8, 12)
> [12, 16]
>
>
> and
>
>
> [0, 4)
> [4, 8)
> [8, 12)
> [12, 16)
> [16, 20]
>
>
> so you can see where the inconsistent behaviour comes from. You might 
> be able to get R-core to add argument 'warn', but probably not to 
> change the default of 'include.lowest'. I hope this helps
>
>
> On Fri, Sep 17, 2021 at 6:01 PM Leonard Mada wrote:
>
> Thank you Andrew.
>
>
> Is there any reason not to make: include.lowest = TRUE the default?
>
>
> Regarding the NA:
>
> The user still has to suspect that some values were not included
> and run that test.
>
>
> Leonard
>
>
> On 9/18/2021 12:53 AM, Andrew Simmons wrote:
>> Regarding your first point, argument 'include.lowest' already
>> handles this specific case, see ?.bincode
>>
>> Your second point, maybe it could be helpful, but since both
>> 'cut.default' and '.bincode' return NA if a value isn't within a
>> bin, you could make something like this on your own.
>> Might be worth pitching to R-bugs on the wishlist.
>>
>>
>>
>> On Fri, Sep 17, 2021, 17:45 Leonard Mada via R-help
>> <r-help@r-project.org> wrote:
>>
>> Hello List members,
>>
>>
>> the following improvements would be useful for function cut
>> (and .bincode):
>>
>>
>> 1.) Argument: Include extremes
>> extremes = TRUE
>> if(right == FALSE) {
>>     # include also right for last interval;
>> } else {
>>     # include also left for first interval;
>> }
>>
>>
>> 2.) Argument: warn = TRUE
>>
>> Warn if any values are not included in the intervals.
>>
>>
>> Motivation:
>> - reduce risk of errors when using function cut();
>>
>>
>> Sincerely,
>>
>>
>> Leonard
>>
>>



Re: [R] Improvement: function cut

2021-09-17 Thread Andrew Simmons
I disagree, I don't really think it's too long or ugly, but if you think it
is, you could abbreviate it as 'i'.


x <- 0:20
breaks1 <- seq.int(0, 16, 4)
breaks2 <- seq.int(0, 20, 4)
data.frame(
cut(x, breaks1, right = FALSE, i = TRUE),
cut(x, breaks2, right = FALSE, i = TRUE),
check.names = FALSE
)
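
The abbreviation works through R's partial argument matching: 'i' is a unique prefix of 'include.lowest' among the arguments of cut.default. A quick check that both spellings give the same result (variable names here are illustrative only):

```r
x <- 0:20
breaks <- seq.int(0, 20, 4)

short <- cut(x, breaks, right = FALSE, i = TRUE)
full  <- cut(x, breaks, right = FALSE, include.lowest = TRUE)

identical(short, full)  # TRUE
```

The short form is convenient interactively; the full name is safer in scripts, since a prefix stops being unique if another argument starting with 'i' is ever added.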


I hope this helps.

On Fri, Sep 17, 2021 at 6:26 PM Leonard Mada  wrote:

> Hello Andrew,
>
>
> But "cut" generates factors. In most cases with real data one expects to
> have also the ends of the interval: the argument "include.lowest" is both
> ugly and too long.
>
> [The test-code on the ftable thread contains this error! I have run
> through this error a couple of times.]
>
>
> The only real situation that I can imagine to be problematic:
>
> - if the interval goes to +Inf (or -Inf): I do not know if there would be
> any effects when including +Inf (or -Inf).
>
>
> Leonard
>
>
> On 9/18/2021 1:14 AM, Andrew Simmons wrote:
>
> While it is not explicitly mentioned anywhere in the documentation for
> .bincode, I suspect 'include.lowest = FALSE' is the default to keep the
> definitions of the bins consistent. For example:
>
>
> x <- 0:20
> breaks1 <- seq.int(0, 16, 4)
> breaks2 <- seq.int(0, 20, 4)
> cbind(
> .bincode(x, breaks1, right = FALSE, include.lowest = TRUE),
> .bincode(x, breaks2, right = FALSE, include.lowest = TRUE)
> )
>
>
> by having 'include.lowest = TRUE' with different ends, you can get
> inconsistent behaviour. While this probably wouldn't be an issue with
> 'real' data, this would seem like something you'd want to avoid by default.
> The definitions of the bins are
>
>
> [0, 4)
> [4, 8)
> [8, 12)
> [12, 16]
>
>
> and
>
>
> [0, 4)
> [4, 8)
> [8, 12)
> [12, 16)
> [16, 20]
>
>
> so you can see where the inconsistent behaviour comes from. You might be
> able to get R-core to add argument 'warn', but probably not to change the
> default of 'include.lowest'. I hope this helps
>
>
> On Fri, Sep 17, 2021 at 6:01 PM Leonard Mada  wrote:
>
>> Thank you Andrew.
>>
>>
>> Is there any reason not to make: include.lowest = TRUE the default?
>>
>>
>> Regarding the NA:
>>
>> The user still has to suspect that some values were not included and run
>> that test.
>>
>>
>> Leonard
>>
>>
>> On 9/18/2021 12:53 AM, Andrew Simmons wrote:
>>
>> Regarding your first point, argument 'include.lowest' already handles
>> this specific case, see ?.bincode
>>
>> Your second point, maybe it could be helpful, but since both
>> 'cut.default' and '.bincode' return NA if a value isn't within a bin, you
>> could make something like this on your own.
>> Might be worth pitching to R-bugs on the wishlist.
>>
>>
>>
>> On Fri, Sep 17, 2021, 17:45 Leonard Mada via R-help 
>> wrote:
>>
>>> Hello List members,
>>>
>>>
>>> the following improvements would be useful for function cut (and
>>> .bincode):
>>>
>>>
>>> 1.) Argument: Include extremes
>>> extremes = TRUE
>>> if(right == FALSE) {
>>> # include also right for last interval;
>>> } else {
>>> # include also left for first interval;
>>> }
>>>
>>>
>>> 2.) Argument: warn = TRUE
>>>
>>> Warn if any values are not included in the intervals.
>>>
>>>
>>> Motivation:
>>> - reduce risk of errors when using function cut();
>>>
>>>
>>> Sincerely,
>>>
>>>
>>> Leonard
>>>
>>>
>>



Re: [R] Improvement: function cut

2021-09-17 Thread Andrew Simmons
While it is not explicitly mentioned anywhere in the documentation for
.bincode, I suspect 'include.lowest = FALSE' is the default to keep the
definitions of the bins consistent. For example:


x <- 0:20
breaks1 <- seq.int(0, 16, 4)
breaks2 <- seq.int(0, 20, 4)
cbind(
.bincode(x, breaks1, right = FALSE, include.lowest = TRUE),
.bincode(x, breaks2, right = FALSE, include.lowest = TRUE)
)


by having 'include.lowest = TRUE' with different ends, you can get
inconsistent behaviour. While this probably wouldn't be an issue with
'real' data, this would seem like something you'd want to avoid by default.
The definitions of the bins are


[0, 4)
[4, 8)
[8, 12)
[12, 16]


and


[0, 4)
[4, 8)
[8, 12)
[12, 16)
[16, 20]


so you can see where the inconsistent behaviour comes from. You might be
able to get R-core to add argument 'warn', but probably not to change the
default of 'include.lowest'. I hope this helps
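
To make the inconsistency concrete, here is the bin code assigned to the value 16 under each set of breaks, next to the default behaviour (a small sketch built on the example above; the names b1, b2, d1 and d2 are illustrative):

```r
x <- 0:20
breaks1 <- seq.int(0, 16, 4)
breaks2 <- seq.int(0, 20, 4)

# With include.lowest = TRUE the value 16 changes bin when breaks are extended.
b1 <- .bincode(x, breaks1, right = FALSE, include.lowest = TRUE)
b2 <- .bincode(x, breaks2, right = FALSE, include.lowest = TRUE)
b1[x == 16]  # 4, i.e. [12, 16]
b2[x == 16]  # 5, i.e. [16, 20]

# With the default include.lowest = FALSE it is never placed in bin 4.
d1 <- .bincode(x, breaks1, right = FALSE)
d2 <- .bincode(x, breaks2, right = FALSE)
d1[x == 16]  # NA
d2[x == 16]  # 5
```

Under the default, a value is either in the same bin for both sets of breaks or not binned at all.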





Re: [R] Improvement: function cut

2021-09-17 Thread Leonard Mada via R-help
Thank you Andrew.


Is there any reason not to make include.lowest = TRUE the default?


Regarding the NA:

The user still has to suspect that some values were not included, and then
run that test by hand.


Leonard
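
For readers following along, here is a small sketch of what the question is about: with the defaults of cut() (right = TRUE, include.lowest = FALSE) the lowest break itself is not assigned to any interval, while include.lowest = TRUE closes the first interval on the left. The data are invented for the example.

```r
x <- c(0, 4, 8, 12, 16)
breaks <- seq(0, 16, 4)

default <- cut(x, breaks)                         # first interval is (0,4]
closed  <- cut(x, breaks, include.lowest = TRUE)  # first interval is [0,4]

is.na(default[1])  # TRUE: 0 is not inside (0,4]
levels(closed)     # "[0,4]" "(4,8]" "(8,12]" "(12,16]"
anyNA(closed)      # FALSE: every value now has an interval
```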





Re: [R] Improvement: function cut

2021-09-17 Thread Andrew Simmons
Regarding your first point: the argument 'include.lowest' already handles
this specific case, see ?.bincode.

As for your second point, it could be helpful, but since both 'cut.default'
and '.bincode' return NA when a value isn't within any bin, you could write
something like this on your own.
It might be worth pitching to R-bugs as a wishlist item.
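
A minimal sketch of such a do-it-yourself wrapper, assuming that "not included" means a non-missing input that comes back as NA. The name cut_warn is hypothetical; it is not a base R function and not the proposed 'warn' argument itself.

```r
# Hypothetical wrapper around cut(): warns when non-missing values
# fall outside all intervals (i.e. are turned into NA by cut)
cut_warn <- function(x, breaks, ...) {
  res <- cut(x, breaks, ...)
  dropped <- is.na(res) & !is.na(x)
  if (any(dropped)) {
    warning(sprintf("%d value(s) outside the intervals", sum(dropped)),
            call. = FALSE)
  }
  res
}

# With the defaults, 0 is not in (0,4], so this warns about 1 value
f1 <- suppressWarnings(cut_warn(0:16, seq(0, 16, 4)))
# With include.lowest = TRUE every value is binned and nothing is reported
f2 <- cut_warn(0:16, seq(0, 16, 4), include.lowest = TRUE)
```

The same idea should carry over to .bincode by swapping the call inside the wrapper, since it also returns NA for values outside the bins.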



On Fri, Sep 17, 2021, 17:45 Leonard Mada via R-help wrote:

> Hello List members,
>
>
> the following improvements would be useful for function cut (and .bincode):
>
>
> 1.) Argument: Include extremes
> extremes = TRUE
> if(right == FALSE) {
>     # include also right for last interval;
> } else {
>     # include also left for first interval;
> }
>
>
> 2.) Argument: warn = TRUE
>
> Warn if any values are not included in the intervals.
>
>
> Motivation:
> - reduce risk of errors when using function cut();
>
>
> Sincerely,
>
>
> Leonard
>