Re: [Rd] suggestion for extending ?as.factor

2009-05-12 Thread Petr Savicky
On Mon, May 11, 2009 at 05:06:38PM +0200, Martin Maechler wrote: The version I have committed a few hours ago is indeed a much re-simplified version, using as.character(.) explicitly The current development version (2009-05-11 r48528) contains in ?factor a description of levels parameter

Re: [Rd] suggestion for extending ?as.factor

2009-05-12 Thread Martin Maechler
PS == Petr Savicky savi...@cs.cas.cz on Tue, 12 May 2009 13:17:15 +0200 writes: PS On Mon, May 11, 2009 at 05:06:38PM +0200, Martin Maechler wrote: The version I have committed a few hours ago is indeed a much re-simplified version, using as.character(.) explicitly PS

Re: [Rd] suggestion for extending ?as.factor

2009-05-11 Thread Martin Maechler
PS == Petr Savicky savi...@cs.cas.cz on Sun, 10 May 2009 13:52:53 +0200 writes: PS On Sat, May 09, 2009 at 10:55:17PM +0200, Martin Maechler wrote: PS [...] If we'd revert to such a solution, we'd have to get back to Peter's point about the issue that he'd think

Re: [Rd] suggestion for extending ?as.factor

2009-05-11 Thread Petr Savicky
On Mon, May 11, 2009 at 05:06:38PM +0200, Martin Maechler wrote: [...] The version I have committed a few hours ago is indeed a much re-simplified version, using as.character(.) explicitly and consequently no longer providing the extra optional arguments that we have had for a couple of days.

Re: [Rd] suggestion for extending ?as.factor

2009-05-10 Thread Petr Savicky
On Sat, May 09, 2009 at 10:55:17PM +0200, Martin Maechler wrote: [...] If we'd revert to such a solution, we'd have to get back to Peter's point about the issue that he'd think table(.) should be more tolerant than as.character() about almost equality. For compatibility reasons, we could also

Re: [Rd] suggestion for extending ?as.factor

2009-05-09 Thread Michael Dewey
At 14:18 08/05/2009, Martin Maechler wrote: PS == Petr Savicky savi...@cs.cas.cz on Fri, 8 May 2009 11:01:55 +0200 writes: Somewhere below Martin asks for alternatives from list readers. I do not have alternatives, but I do have two comments, one immediately below this, the other

Re: [Rd] suggestion for extending ?as.factor

2009-05-09 Thread Martin Maechler
PS == Petr Savicky savi...@cs.cas.cz on Fri, 8 May 2009 18:10:56 +0200 writes: PS On Fri, May 08, 2009 at 05:14:48PM +0200, Petr Savicky wrote: Let me suggest to consider the following modification, where match() is done on the strings, not on the original values. levels

Re: [Rd] suggestion for extending ?as.factor

2009-05-08 Thread Petr Savicky
On Wed, May 06, 2009 at 10:41:58AM +0200, Martin Maechler wrote: PD I think that the real issue is that we actually do want almost-equal PD numbers to be folded together. yes, this now (revision 48469) will happen by default, using signif(x, 15) where '15' is the default for the

Re: [Rd] suggestion for extending ?as.factor

2009-05-08 Thread Martin Maechler
PS == Petr Savicky savi...@cs.cas.cz on Fri, 8 May 2009 11:01:55 +0200 writes: PS On Wed, May 06, 2009 at 10:41:58AM +0200, Martin Maechler wrote: PD I think that the real issue is that we actually do want almost-equal PD numbers to be folded together. yes, this now

Re: [Rd] suggestion for extending ?as.factor

2009-05-08 Thread Petr Savicky
On Fri, May 08, 2009 at 03:18:01PM +0200, Martin Maechler wrote: As long as we don't want to allow factor(numeric) to fail --rarely -- I think (and that actually has been a recurring daunting thought for quite a few days) that we probably need an extra step of checking for duplicate levels,

Re: [Rd] suggestion for extending ?as.factor

2009-05-08 Thread Petr Savicky
On Fri, May 08, 2009 at 05:14:48PM +0200, Petr Savicky wrote: Let me suggest to consider the following modification, where match() is done on the strings, not on the original values. levels <- unique(as.character(sort(unique(x)))); x <- as.character(x); f <- match(x, levels) An alternative
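
The string-based matching proposed in this message can be sketched in R as below. This is only an illustration under stated assumptions, not the code that was committed to R-devel: the input vector x is a made-up example, and the three assignments follow the preview text of the message.

```r
# Hypothetical input: two doubles that differ only in the last bit, plus 1/3.
x <- c(0.3, 0.1 + 0.2, 1/3)

# Build the levels from the character representation of the sorted unique
# values, then drop any strings that coincide after conversion.
levels <- unique(as.character(sort(unique(x))))

# Match the strings, not the original numeric values, against the levels.
x <- as.character(x)
f <- match(x, levels)

print(levels)
print(f)
```

Because as.character() converts doubles with at most 15 significant digits, 0.3 and 0.1 + 0.2 map to the same string here, so both values receive the same integer code and no duplicated level can arise.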

Re: [Rd] suggestion for extending ?as.factor

2009-05-08 Thread Martin Maechler
PS == Petr Savicky savi...@cs.cas.cz on Fri, 8 May 2009 18:10:56 +0200 writes: PS On Fri, May 08, 2009 at 05:14:48PM +0200, Petr Savicky wrote: Let me suggest to consider the following modification, where match() is done on the strings, not on the original values. levels

Re: [Rd] suggestion for extending ?as.factor

2009-05-08 Thread Petr Savicky
On Fri, May 08, 2009 at 06:48:40PM +0200, Martin Maechler wrote: PS == Petr Savicky savi...@cs.cas.cz on Fri, 8 May 2009 18:10:56 +0200 writes: [...] PS ... I have PS strong objections against the existing implementation of as.character(), {(because it is not *accurate*

Re: [Rd] suggestion for extending ?as.factor

2009-05-07 Thread Petr Savicky
On Wed, May 06, 2009 at 10:41:58AM +0200, Martin Maechler wrote: PD I think that the real issue is that we actually do want almost-equal PD numbers to be folded together. yes, this now (revision 48469) will happen by default, using signif(x, 15) where '15' is the default for the

Re: [Rd] suggestion for extending ?as.factor

2009-05-06 Thread Martin Maechler
MM == Martin Maechler maech...@stat.math.ethz.ch on Tue, 5 May 2009 10:35:42 +0200 writes: PD == Peter Dalgaard p.dalga...@biostat.ku.dk on Mon, 04 May 2009 19:28:06 +0200 writes: PD Petr Savicky wrote: On Mon, May 04, 2009 at 05:39:52PM +0200, Martin Maechler wrote:

Re: [Rd] suggestion for extending ?as.factor

2009-05-05 Thread Petr Savicky
On Mon, May 04, 2009 at 07:28:06PM +0200, Peter Dalgaard wrote: Petr Savicky wrote: For this, we get convert(0.3) [1] 0.3 convert(1/3) [1] 0. # 16 digits suffice convert(0.12345) [1] 0.12345 convert(0.12345678901234567) [1]

Re: [Rd] suggestion for extending ?as.factor

2009-05-05 Thread Martin Maechler
PD == Peter Dalgaard p.dalga...@biostat.ku.dk on Mon, 04 May 2009 19:28:06 +0200 writes: PD Petr Savicky wrote: On Mon, May 04, 2009 at 05:39:52PM +0200, Martin Maechler wrote: [snip] Let me quickly expand the tasks we have wanted to address, when I started changing

Re: [Rd] suggestion for extending ?as.factor

2009-05-05 Thread Peter Dalgaard
Petr Savicky wrote: Notice that the discrepancy comes from sums that really are identical values (in decimal arithmetic), but where the binary FP inaccuracy makes them slightly different. [for a nice picture, continue the example with tt <- table(signif(zz,7))
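
The effect described in this message can be reproduced with a small R sketch. The vector zz below is a hypothetical stand-in, since the original definition of zz is not shown in this preview; only the call table(signif(zz, 7)) comes from the message.

```r
# Hypothetical data: three sums that all equal 0.3 in decimal arithmetic.
zz <- c(0.1 + 0.2, 0.3, 0.15 + 0.15)

# In binary floating point they are not all the same double.
n_distinct <- length(unique(zz))
print(n_distinct)

# Rounding to 7 significant digits folds the almost-equal values together,
# so the table has a single cell holding all three observations.
tt <- table(signif(zz, 7))
print(tt)
```

Without the rounding step, tabulating values at full precision could split one intended value across several cells, which is the "2 or 3 small bars" problem raised later in the thread.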

Re: [Rd] suggestion for extending ?as.factor

2009-05-05 Thread Petr Savicky
On Tue, May 05, 2009 at 11:27:36AM +0200, Peter Dalgaard wrote: I know. The point was rather that if you are not careful with rounding, you get some of the bars wrong (you get 2 or 3 small bars very close to each other instead of one longer one). Computed p values from permutation tests

Re: [Rd] suggestion for extending ?as.factor

2009-05-04 Thread Martin Maechler
PS == Petr Savicky savi...@cs.cas.cz on Sun, 3 May 2009 22:32:04 +0200 writes: PS In R-2.10.0, the development version, function as.factor() uses 17 digit PS precision for conversion of numeric

Re: [Rd] suggestion for extending ?as.factor

2009-05-04 Thread Martin Maechler
PD == Peter Dalgaard p.dalga...@biostat.ku.dk on Mon, 04 May 2009 15:34:09 +0200 writes: PD Martin Maechler wrote: PS == Petr Savicky savi...@cs.cas.cz on Sun, 3 May 2009 22:32:04 +0200 writes:

Re: [Rd] suggestion for extending ?as.factor

2009-05-04 Thread Petr Savicky
On Mon, May 04, 2009 at 05:39:52PM +0200, Martin Maechler wrote: [snip] Let me quickly expand the tasks we have wanted to address, when I started changing factor() for R-devel. 1) R-core had unanimously decided that R 2.10.0 should not allow duplicated levels in factors anymore. When

Re: [Rd] suggestion for extending ?as.factor

2009-05-04 Thread Peter Dalgaard
Petr Savicky wrote: On Mon, May 04, 2009 at 05:39:52PM +0200, Martin Maechler wrote: [snip] Let me quickly expand the tasks we have wanted to address, when I started changing factor() for R-devel. 1) R-core had unanimously decided that R 2.10.0 should not allow duplicated levels in

Re: [Rd] suggestion for extending ?as.factor

2009-05-04 Thread Peter Dalgaard
Martin Maechler wrote: PS == Petr Savicky savi...@cs.cas.cz on Sun, 3 May 2009 22:32:04 +0200 writes: PS In R-2.10.0, the development version, function as.factor() uses 17 digit PS precision