Re: [Rd] S4 and connection slot [Sec=Unclassified]

2009-06-30 Thread Wacek Kusnierczyk

Martin Morgan wrote:

[...]




## Attempt two -- initialize
setClass("Element",
 representation=representation(conn="file"))

setMethod("initialize", "Element", function(.Object, ..., conn=file()) {
    callNextMethod(.Object, ..., conn=conn)
})

new("Element")
## oops, connection created but not closed; gc() closes (eventually)
## but with an ugly warning
##  gc()
##            used  (Mb) gc trigger  (Mb) max used  (Mb)
## Ncells   717240  38.4    1166886  62.4  1073225  57.4
## Vcells 3795 284.9   63274729 482.8 60051033 458.2
##  gc()
##            used  (Mb) gc trigger  (Mb) max used  (Mb)
## Ncells   715906  38.3    1166886  62.4  1073225  57.4
## Vcells 37335626 284.9   63274729 482.8 60051033 458.2
## Warning messages:
## 1: closing unused connection 3 ()

setClass("ElementX", contains="Element")
## oops, two connections opened (!)


yes, that's because of the nonsense double call to the initializer while 
creating a subclass.  the conceptual bug in the s4 system leads to this 
ridiculous behaviour in your essentially correct and useful pattern.


vQ

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Show location of workspace image on quit?

2009-06-05 Thread Wacek Kusnierczyk
Barry Rowlingson wrote:
 Would something like

   q()
  Save workspace image (/home/me/workspace/.RData)? [y/n/c]:

  be useful to anyone else? Just thought I'd ask before I dive into
 internals or wrap the q function for myself.
   


yes, it would be very useful to me.

vQ



Re: [Rd] Show location of workspace image on quit?

2009-06-05 Thread Wacek Kusnierczyk
Mathieu Ribatet wrote:
 I guess that having something like this
   
 q()
 Save workspace image (/home/me/workspace/.RData)? [y/n/c/e]:
 

 where e means Editing the path should be clear enough, isn't it?
   

good idea;  maybe 'o' for 'other path';  or 'a' for 'alternative path'; 
or 'd' for 'different path'; or 'm' for 'modify path';  or 'p' for
'path';  or... ?


vQ



[Rd] bug tracker

2009-06-04 Thread Wacek Kusnierczyk
the post 13613 has been classified as 'feature&faq' and annotated with
"As documented in the Warning section!".  however, the bug has actually
been fixed. 

may i kindly suggest that the annotation be changed to a more
appropriate note?

regards,
vQ



Re: [Rd] reference counting bug related to break and next in loops

2009-06-03 Thread Wacek Kusnierczyk
William Dunlap wrote:
 One of our R users here just showed me the following problem while
 investigating the return value of a while loop.  I added some
 information
 on a similar bug in for loops.  I think he was using 2.9.0
 but I see the same problem on today's development version of 2.10.0
 (svn 48703).

 Should the semantics of while and for loops be changed slightly to avoid
 the memory
 buildup that fixing this to reflect the current docs would entail?  S+'s
 loops return nothing useful - that change was made long ago to avoid
 memory buildup resulting from semantics akin to R's present semantics.

 Bill Dunlap
 TIBCO Software Inc - Spotfire Division
 wdunlap tibco.com 

 Forwarded (and edited) message
 below---
 --

  I think I have found another reference counting bug.

 If you type in the following in R you get what I think is the wrong
 result.

   
 i = 1; y = 1:10; q = while(T) { y[i] = 42; if (i == 8) { break }; i = i + 1; y}; q
  [1] 42 42 42 42 42 42 42 42  9 10

 I had expected  [1] 42 42 42 42 42 42 42  8  9 10 which is what you get
 if you add 0 to y in the last statement in the while loop:
   

a simplified example may help to get a clear picture:

i = 1; y = 1:3;
(while(TRUE) {
   y[i] = 0
   if (i == 2) break
   i = i + 1
   y })
# 0 0 3

i = 1; y = 1:3;
(while(TRUE) {
   y[i] = 0
   if (i == 2) break
   i = i + 1
   y + 0 })
# 0 2 3

the test on i is done after the assignment to y[i].  when the loop
breaks, y is 0 0 3, and one might expect this to be the final result. 
it looks like the result is the value of y from the previous iteration,
and it does not seem particularly intuitive to me.  (using common sense,
i mean;  an informed expert on the copy-when-scared semantics may have a
different opinion, but why should a casual user ever suspect such magic.) 

anyway, i'd rather expect NULL to be returned.  for the oracle,
?'while', says:

'for', 'while' and 'repeat' return the value of the last expression
evaluated (or 'NULL' if none was), invisibly. [...]  'if' returns the
value of the expression evaluated, or 'NULL' if none was. [...]  'break'
and 'next' have value 'NULL', although it would be strange to look for a
return value.

when i is 2, i == 2 is TRUE.  hence, if (i == 2) break evaluates to
break.  break evaluates to NULL, breaks the loop, and the return value
should be NULL.  while it is, following the docs, strange to have q =
while(...) ... in the code, the result above is not compliant with the
docs at all -- seems like a plain bug.  there is no reason for while to
return the value of y, be it 0 0 3 or 0 2 3.

one might naively suspect that it is the syntactically last expression
in the body of while that provides the return value, but the docs
explicitly say the last expression evaluated.  and indeed,

(while (TRUE) { break; 'foo' })
# NULL

however,

i = FALSE
(while (TRUE) { if (i) break; i = !i; i })
# TRUE

which again reveals the bug. 

one could suspect that the last expression evaluated is actually the
whole body of the while loop;  so in the above, the value of { if (i)
break; i = !i; i } should be returned, even if the loop breaks in the
middle.  hence, the result should be TRUE (or maybe FALSE?).  however,

(while (TRUE) { break; while(TRUE) { 'foo' } })
# NULL

has no problem with returning NULL -- obviously, so to speak.

it seems to me that the bug is not in reference counting, but in that
the while loop incorrectly returns the value of the *previous* iteration
while executing a break, instead of the break's NULL.

likewise,

(for (i in 1:2) {
   if (i == 2) break
   i })
# 1

instead of the specification-promised NULL.


  i = 1; y = 1:10; q = while(T) { y[i] = 42; if (i == 8) { break }; i = i + 1; y + 0}; q
  [1] 42 42 42 42 42 42 42  8  9 10  
   



 Also, 

   
 i = 1; y = 1:10; q = while(T) { y[i] = 42; if (i == 8) { break }; i<-i+1 ; if (i<=8&&i>3)next ; cat("Completing iteration", i, "\n"); y}; q
 Completing iteration 2
 Completing iteration 3
  [1] 42 42 42 42 42 42 42 42  9 10

 but if the last statement in the while loop is y+0 instead of y I get
 the
 expected result:

   
 i = 1; y = 1:10; q = while(T) { y[i] = 42; if (i == 8) { break }; i<-i+1 ; if (i<=8&&i>3)next ; cat("Completing iteration", i, "\n"); y+0L}; q
 Completing iteration 2
 Completing iteration 3
  [1] 42 42  3  4  5  6  7  8  9 10
   

 A background to the problem is that in R a while-loop returns the value
 of the last iteration. 

not according to the docs;  the last expression evaluated. 
specifically, not the value of the last non-break-broken iteration.

vQ



Re: [Rd] Print bug for matrix(list(NA_complex_, ...))

2009-06-03 Thread Wacek Kusnierczyk
Stavros Macrakis wrote:
 In R 2.8.0 on Windows (tested both under ESS and under R Console in case
 there was an I/O issue)

 There is a bug in printing val <- matrix(list(NA_complex_,NA_complex_),1).

   
 dput(val)
 
 structure(list(NA_complex_, NA_complex_), .Dim = 1:2)

   
 print(val)
 

 [,1]

 [1,]


 [,2]

 [1,]


 Note that a large number of spaces are printed instead of NA.  

on ubuntu 8.04 with r 2.10.0 r48703 there is almost no problem (still
some unnecessary spaces):

 [,1]  [,2]
[1,]NANA



 Compare the
 unproblematic real case:

 print(structure(list(NA_real_, NA_real_), .Dim = 1:2))
  [,1] [,2]
 [1,] NA   NA

 Also, when printed in the read-eval-print loop, printing takes a very very
 long time:

   
 proc.time(); matrix(list(NA_complex_,NA_complex_),1); proc.time()
 
user  system elapsed
   74.35    0.09  329.45

 [,1]

 [1,]


 [,2]

 [1,]

user  system elapsed
   92.63    0.15  347.86

 18 seconds runtime!
   

   user  system elapsed
  0.648   0.056 155.843
 [,1] [,2]
[1,]  NA   NA
   user  system elapsed
  0.648   0.056 155.843

vQ



Re: [Rd] reference counting bug related to break and next in loops

2009-06-03 Thread Wacek Kusnierczyk
Wacek Kusnierczyk wrote:

 a simplified example may help to get a clear picture:

 i = 1; y = 1:3;
 (while(TRUE) {
    y[i] = 0
    if (i == 2) break
    i = i + 1
    y })
 # 0 0 3

 i = 1; y = 1:3;
 (while(TRUE) {
    y[i] = 0
    if (i == 2) break
    i = i + 1
    y + 0 })
 # 0 2 3

 the test on i is done after the assignment to y[i].  when the loop
 breaks, y is 0 0 3, and one might expect this to be the final result. 
 it looks like the result is the value of y from the previous iteration,
 and it does not seem particularly intuitive to me.  (using common sense,
 i mean;  an informed expert on the copy-when-scared semantics may have a
 different opinion, but why should a casual user ever suspect such magic.) 

 anyway, i'd rather expect NULL to be returned.  for the oracle,
 ?'while', says:

 'for', 'while' and 'repeat' return the value of the last expression
 evaluated (or 'NULL' if none was), invisibly. [...]  'if' returns the
 value of the expression evaluated, or 'NULL' if none was. [...]  'break'
 and 'next' have value 'NULL', although it would be strange to look for a
 return value.

 when i is 2, i == 2 is TRUE.  hence, if (i == 2) break evaluates to
 break.  break evaluates to NULL, breaks the loop, and the return value
 should be NULL.  while it is, following the docs, strange to have q =
 while(...) ... in the code, the result above is not compliant with the
 docs at all -- seems like a plain bug.  there is no reason for while to
 return the value of y, be it 0 0 3 or 0 2 3.
   

somewhat surprising to learn,

i = 1
y = 1:3
(while (TRUE) {
   y[i] = 0
   if (i == 2) { 2*y; break }
   i = i + 1
   y })
# 0 0 3

where clearly the last expression evaluated (before the break, that is)
is 2*y -- or?


vQ



Re: [Rd] reference counting bug related to break and next in loops

2009-06-03 Thread Wacek Kusnierczyk
William Dunlap wrote:
 help('while') says:
   Usage:
  for(var in seq) expr
  while(cond) expr
  repeat expr
  break
  next
   Value:
  'for', 'while' and 'repeat' return the value of the last
  expression evaluated (or 'NULL' if none was), invisibly. 'for'
  sets 'var' to the last used element of 'seq', or to 'NULL' if it
  was of length zero.

  'break' and 'next' have value 'NULL', although it would be strange
  to look for a return value.

 Does the 'the last expression evaluated' mean (a) the value from
 evaluating 'expr' the last time it was completely evaluated or
 does it mean (b) the value of the last element of a {} expr that was
 evaluated?  

it's interesting (if not obvious) that

i = 1;  y = 1:3
(while (TRUE) {
   y[i] = 0
   if (i==2) break
   i = i +1
   y + 0 })
# 0 2 3

does not reflect in the final value the modification made to y in the
second, incomplete iteration, and that

i = 1;  y = 1:3
(while (TRUE) {
   y[i] = 0
   if (i==2) break
   i = i +1
   y })
# 0 0 3

does reflect this modification, yet

i = 1;  y = 1:3
(while (TRUE) {
   y[i] = 0
   if (i==2) { y = 1:3; break }
   i = i +1
   y })
# 0 0 3

makes a copy of y on y = 1:3 and returns the previous value.  again,
this surely has a straightforward explanation in the copy-when-scared
mechanics, yet, intuitively, the returned value seems completely out of
place.



 R currently follow interpretation (a), modulo reference
 counting bugs.   My suggestion is to move to interpretation (b),
 so that the fact that break and next return NULL would mean that
 a broken-out-of loop would have value NULL.  (Personally, I'm happy
 with S+'s return value for all loops being NULL in all cases, but
 that might break existing R code.)
   

i'm truly impressed by s+'s superiority over r.

 Of course, if the reference counting bug can be fixed without degrading
 performance in ordinary situations (does anyone look at the return
 value of a loop, particularly one that is broken out of?), then I'm
 happy
 retaining the current semantics.

   

... with the current lousy documentation improved to match the actual
semantics.

vQ



Re: [Rd] reference counting bug: overwriting for loop 'seq' variable

2009-06-02 Thread Wacek Kusnierczyk
William Dunlap wrote:
 It looks like the 'seq' variable to 'for' can be altered from
 within the loop, leading to incorrect answers.  E.g., in
 the following I'd expect 'sum' to be 1+2=3, but R 2.10.0
 (svn 48686) gives 44.5.

 x = c(1,2);  sum = 0; for (i in x) { x[i+1] = i + 42.5; sum = sum + i }; sum
[1] 44.5
 or, with a debugging cat()s,
 x = c(1,2);  sum = 0; for (i in x) { cat("before, i=", i, "\n"); x[i+1] = i + 42.5; cat("after, i=", i, "\n"); sum = sum + i }; sum
before, i= 1
after, i= 1
before, i= 43.5
after, i= 43.5
[1] 44.5
  
 If I force the for's 'seq' to be a copy of x by adding 0 to it, then I
 do get the expected answer.

 x = c(1,2);  sum = 0; for (i in x+0) { x[i+1] = i + 42.5; sum = sum + i }; sum
[1] 3

 It looks like an error in reference counting. 
   

indeed;  seems like you've hit the issue of when r triggers data
duplication and when it doesn't, discussed some time ago in the context
of names() etc.  consider:

x = 1:2
for (i in x)
   x[i+1] = i-1
x
# 1 0 1

y = c(1, 2)
for (i in y)
   y[i+1] = i-1
y
# -1 0
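
The effect can be imitated outside R. The following is only an analogy in C, not a description of R's internals: a loop reads its sequence from `seq` while writing into `x`, and the two results differ depending on whether `seq` is the same storage as `x` or a snapshot taken before the loop. All names (`sum_while_writing`, `sum_aliased`, `sum_snapshot`, `CAP`) are made up for the illustration, and the fixed capacity stands in for R's growable vector.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CAP 64  /* fixed capacity standing in for R's growable vector */

/* Mimics `for (i in x) { x[i+1] = i + 42.5; sum = sum + i }`.
 * R's one-based x[i+1] is the zero-based slot i here.  If seq aliases x,
 * the writes change the values the loop reads on later iterations. */
static double sum_while_writing(const double *seq, double *x, size_t n)
{
    assert(seq != NULL && x != NULL);
    double sum = 0.0;
    for (size_t k = 0; k < n; k++) {
        double i = seq[k];
        if (i >= 0.0 && i < (double)CAP)
            x[(size_t)i] = i + 42.5;   /* truncates i, as R does for indices */
        sum += i;
    }
    return sum;
}

/* The loop iterates over the very storage it modifies. */
static double sum_aliased(void)
{
    double x[CAP] = {1.0, 2.0};
    return sum_while_writing(x, x, 2);
}

/* The loop iterates over a copy made before the first write. */
static double sum_snapshot(void)
{
    double x[CAP] = {1.0, 2.0};
    double seq[2];
    memcpy(seq, x, sizeof seq);
    return sum_while_writing(seq, x, 2);
}
```

Under these assumptions `sum_aliased()` yields 44.5 and `sum_snapshot()` yields 3, matching the two results reported in the quoted message.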


vQ



Re: [Rd] setdiff bizarre

2009-06-02 Thread Wacek Kusnierczyk
Stavros Macrakis wrote:

  '1:3' %in% data.frame(a=2:4,b=1:3)  # TRUE
   

utterly weird.  so what would x have to be so that

x %in% data.frame('a')
# TRUE

hint: 

'1' %in% data.frame(1)
# TRUE

vQ



Re: [Rd] setdiff bizarre

2009-06-02 Thread Wacek Kusnierczyk
William Dunlap wrote:
 %in% is a thin wrapper on a call to match().  match() is
 not a generic function (and is not documented to be one),
 so it treats data.frames as lists, as their underlying
 representation is a list of columns.  match is documented
 to convert lists to character and to then run the character
 version of match on that character data.  match does not
 bail out if the types of the x and table arguments don't match
 (that would be undesirable in the integer/numeric mismatch case).
   

yes, i understand that this is documented behaviour, and that it's not a
bug.  nevertheless, the example is odd, and hints that there's a design
flaw.  i also do not understand why the following should be useful and
desirable:

as.character(list('a'))
# a

as.character(data.frame('a'))
# 1

and hence

'a' %in% list('a')
# TRUE

while

'a' %in% data.frame('a')
# FALSE
'1' %in% data.frame('a')
# TRUE

there is a mechanistic explanation for how this works, but is there one
for why this works this way?


 Hence
'1' %in% data.frame(1) # -> TRUE
 is acting consistently with
match(as.character(pi), c(1, pi, exp(1))) # -> 2
 and
1L %in% c(1.0, 2.0, 3.0) # -> TRUE

 The related functions, duplicated() and unique(), do have
 row-wise data.frame methods.  E.g.,
 duplicated(data.frame(x=c(1,2,2,3,3),y=letters[c(1,1,2,2,2)]))
[1] FALSE FALSE FALSE FALSE  TRUE
 Perhaps match() ought to have one also.  S+'s match is generic
 and has a data.frame method (which is row-oriented) so there we get:
  match(data.frame(x=c(1,3,5), y=letters[c(1,3,5)]),
 data.frame(x=1:10,y=letters[1:10]))
[1] 1 3 5
 is.element(data.frame(x=1:10,y=letters[1:10]),
 data.frame(x=c(1,3,5), y=letters[c(1,3,5)]))
 [1]  TRUE FALSE  TRUE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE

 I think that %in% and is.element() ought to remain calls to match()
 and that if you want them to work row-wise on data.frames then
 match should get a data.frame method.
   

sounds good to me.  how is

'a' %in% data.frame('a')

in S+?

thanks for the response.

regards,
vQ



Re: [Rd] setdiff bizarre

2009-06-02 Thread Wacek Kusnierczyk
Barry Rowlingson wrote:

[...]

 I suspect it's using 'deparse()' to get the character representation.
 This function is mentioned in ?as.character, but as.character.default
 disappears into the infernal .Internal and I don't have time to chase
 source code - it's sunny outside!
   

on the side, as.character triggers do_ascharacter, which in turn calls
DispatchOrEval, a function with the following beautiful comment:

To call this an ugly hack would be to insult all existing ugly hacks at
large in the world.

a fortune?

vQ



Re: [Rd] as.numeric(levels(factor(x))) may be a decreasing sequence

2009-06-01 Thread Wacek Kusnierczyk
Martin Maechler wrote:
 PS == Petr Savicky savi...@cs.cas.cz
 on Sun, 31 May 2009 10:29:41 +0200 writes:
 

 []

 PS I appreciate the current version, which contains static
 PS const char* dropTrailing0(char *s, char cdec) ...
 PS mkChar(dropTrailing0((char *)EncodeReal(x, w, d, e,
 PS OutDec), ...

 PS Here, is better visible that the cast (char *) is used
 PS than if it was hidden inside dropTrailing0(). Also, it
 PS makes dropTrailing0() more consistent.

 PS I would like to recall the already discussed
 PS modification if (replace != p) while((*(replace++) =
 PS *(p++))) ; which saves a few instructions in the more
 PS frequent case that there are no trailing zeros.

 Yes,  thank you.  This already was in my working version,
 and I had managed to lose it again.
 Will put it back

  still hoping this topic would be closed now ...
   

i would rather hope for the EncodeReal flaw to be repaired...


vQ



Re: [Rd] as.numeric(levels(factor(x))) may be a decreasing sequence

2009-05-30 Thread Wacek Kusnierczyk
Martin Maechler wrote:
 Hi Waclav (and other interested parties),

 I have committed my working version of src/main/coerce.c
 so you can prepare your patch against that.
   

some further investigation and reflections on the code in StringFromReal
(henceforth SFR), src/main/coerce.c:315 (as in the patched version, now
in r-devel).

petr's elim_trailing (renamed to dropTrailing, henceforth referred to as
DT) takes as input a const char*, and returns a const char*.

const-ness of the return is not a problem;  it is fed into mkChar, which
(via mkCharLenCE) makes a local memcpy of the string, and there's no
violation of the contract here.

const-ness of the input is a consequence of the return type of
EncodeReal (henceforth ER).  however, it is hardly ever, in principle,
a good idea to destructively modify const input (as DT does) if it comes
from a function that explicitly provides it as const (as ER does).

the first question is, why does ER return the string as const?  it
appears that the returned pointer provides the address of a buffer used
internally in ER, which is allocated *statically*.  that is, each call
to ER operates on the same memory location, and each call to ER returns
the address of that same location.  i suspect this is intended to be a
smart optimization, to avoid heap- or stack-allocating a new buffer in
each call to ER, and deallocating it after use.  however, this approach
is problematic, in that any two calls to ER return the address of the
same piece of memory, and this may easily lead to data corruption. 
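
A minimal sketch of the hazard, assuming nothing about the real EncodeReal beyond what is said above (one statically allocated buffer, whose address is returned on every call). The function names `encode_static` and `encode_copy` and the `"%.3f"` format are hypothetical stand-ins chosen for the illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for EncodeReal: formats x into ONE static buffer
 * and returns its address, so every call hands back the same pointer. */
static const char *encode_static(double x)
{
    static char buf[64];
    snprintf(buf, sizeof buf, "%.3f", x);
    return buf;
}

/* Safe use: copy the result into caller-owned storage right away,
 * before another call can overwrite the shared buffer. */
static void encode_copy(double x, char *out, size_t outlen)
{
    assert(out != NULL && outlen > 0);
    const char *s = encode_static(x);
    assert(strlen(s) < outlen);
    snprintf(out, outlen, "%s", s);
}
```

With this stand-in, two successive calls to `encode_static` return equal pointers, and the text produced by the first call is gone after the second; `encode_copy` keeps each result intact because the caller owns the destination.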

under the assumption that the content of this piece of memory is copied
before any destructive use, and that after the string is copied the
address is not further distributed, the hack is relatively harmless. 
this is what mkChar (via mkCharLenCE) does;  in SFR it copies the
content of s with memcpy, and wraps it into a SEXP that becomes the
return value from SFR.

the original author of this hack seems to have had some concern about
exporting (from ER) the address of a static buffer, hence the returned
buffer is const.  in principle, this should prevent corruption of the
buffer's content in situations such as

// hypothetical
char *p1 = ER(...);
// p1 is some string returned from ER
char *p2 = ER(...);
// p2 is some other string returned from ER

// some modifications performed on the string referred to by p1
p1[0] = 'x';
// p2[0] is 'x' -- possible data corruption

still worse in a scenario with concurrent calls to ER.
  
however, since the output from ER is const, this is no longer possible
-- at least, not without a deconstifying cast in the petr style.  the
problem with petr's solution is not only that it modifies shared memory
purposefully qualified as const (by virtue of ER's return type), but
also that it effectively distributes the address for further use. 

unfortunately, like most of the r source code, ER is not appropriately
commented at the declaration and the definition, and without looking at
the code, one can hardly have any clue that ER always returns the same
address of a static location.  while the original developer might be
careful enough not to misuse ER, in a large multideveloper project it's
hard to expect that from others.  petr's function is precisely an example
of such misuse, and as it adds (again, without an appropriate comment) a
step of indirection; any use of petr's function other than what you have
in SFR (and can you guarantee no one will ever use DT for other
purposes?) is even more likely to end up in data corruption.

one simple way to improve the code is as follows;  instead of (simplified)

const char* dropTrailing(const char* s, ...) {
   const char *p = s;
   char *replace;
   ...
   replace = (char*) p;
   ...
   return s; }

...mkChar(dropTrailing(EncodeReal(...), ...) ...

you can have something like

const char* dropTrailing(char* s, ...) {
   char *p = s, *replace;
   ...
   replace = p;
   ...
   return s; }

...mkChar(dropTrailing((char*)EncodeReal(...), ...) ...
  
where it is clear, from DT's signature, that it may (as it purposefully
does, in fact) modify the content of s.  that is, you drop the
promise-not-to-modify contract in DT, and move the need for
deconstifying ER's return out of DT, making it more explicit.

however, this is still an ad hoc hack;  it still breaks the original
developer's assumption (if i'm correct) that the return from ER
(pointing to its internal buffer) should not be destructively modified
outside of ER.

another issue is that even making the return from ER const does not
protect against data corruption.  for example,

const char *p1 = ER(...);
// p1 is some string returned from ER
const char *p2 = ER(...);
// p2 is some other string returned from ER
// but p1 == p2

if p1 is used after the second call to ER, it's likely to lead to data
corruption problems.  frankly, i'd consider the design of ER 

Re: [Rd] as.numeric(levels(factor(x))) may be a decreasing sequence

2009-05-30 Thread Wacek Kusnierczyk
Martin Maechler wrote:

[...]

 vQ the first question is, why does ER return the string as const?  it
 vQ appears that the returned pointer provides the address of a buffer used
 vQ internally in ER, which is allocated *statically*.  that is, each call
 vQ to ER operates on the same memory location, and each call to ER returns
 vQ the address of that same location.  i suspect this is intended to be a
 vQ smart optimization, to avoid heap- or stack-allocating a new buffer in
 vQ each call to ER, and deallocating it after use.  however, this approach
 vQ is problematic, in that any two calls to ER return the address of the
 vQ same piece of memory, and this may easily lead to data corruption. 

 Well, that would be ok if R could be used threaded / parallel / ...
   

this can cause severe problems even without concurrency, as one of my
examples hinted.


 and we all know that there are many other pieces of code {not
 just R's own, but also in Fortran/C algorithms ..} that are
 not thread-safe.
   

absolutely.  again, ER is unsafe even in a sequential execution environment.

 Yes, of course, R looks like a horrible piece of software 

telepathy?


 to
 some,  because of that

 vQ under the assumption that the content of this piece of memory is copied
 vQ before any destructive use, and that after the string is copied the
 vQ address is not further distributed, the hack is relatively harmless. 
 vQ this is what mkChar (via mkCharLenCE) does;  in SFR it copies the
 vQ content of s with memcpy, and wraps it into a SEXP that becomes the
 vQ return value from SFR.

 exactly.
   

but it should be made clear, by means of a comment, that ER is supposed
to be used in this way.  there is no hint at the interface level.


 vQ the original author of this hack seems to have had some concern about
 vQ exporting (from ER) the address of a static buffer, hence the returned
 vQ buffer is const.  in principle, this should prevent corruption of the
 vQ buffer's content in situations such as

 vQ // hypothetical
 vQ char *p1 = ER(...);
 vQ // p1 is some string returned from ER
 vQ char *p2 = ER(...);
 vQ // p2 is some other string returned from ER

 vQ // some modifications performed on the string referred to by p1
 vQ p1[0] = 'x';
 vQ // p2[0] is 'x' -- possible data corruption

 vQ still worse in a scenario with concurrent calls to ER.
   
 (which will not happen in the near future)
   

unless you know a powerful and willing magician.


 vQ however, since the output from ER is const, this is no longer possible
 vQ -- at least, not without a deconstifying cast in the petr style.  the
 vQ problem with petr's solution is not only that it modifies shared memory
 vQ purposefully qualified as const (by virtue of ER's return type), but
 vQ also that it effectively distributes the address for further use. 

 vQ unfortunately, like most of the r source code, ER is not appropriately
 vQ commented at the declaration and the definition, and without looking at
 vQ the code, one can hardly have any clue that ER always returns the same
 vQ address of a static location.  while the original developer might be
 vQ careful enough not to misuse ER, in a large multideveloper project it's
 vQ hard to expect that from others.  petr's function is precisely an example
 vQ of such misuse, and as it adds (again, without an appropriate comment) a
 vQ step of indirection; any use of petr's function other than what you have
 vQ in SFR (and can you guarantee no one will ever use DT for other
 vQ purposes?) is even more likely to end up in data corruption.

 you have a point here, and as a consequence, I'm proposing to
 put the following version of DT  into the source :
 

 /* Note that we modify a  'const char*'  which is unsafe in general,
  * but ok in the context of filtering an Encode*() value into mkChar(): */
 static const char* dropTrailing0(char *s, char cdec)
 {
     char *p = s;
     for (p = s; *p; p++) {
         if(*p == cdec) {
             char *replace = p++;
             while ('0' <= *p && *p <= '9')
                 if(*(p++) != '0')
                     replace = p;
             while((*(replace++) = *(p++)))
                 ;
             break;
         }
     }
     return s;
 }

   

the first line appears inessential;  to an informed programmer, taking a
string as char* (as opposed to const char*) means that it *may* be
modified within the call, irrespectively of whether it actually is, and
on what occasions, and one should not assume the string is not
destructively modified.
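
To make the behaviour of the quoted function concrete, here is a self-contained copy of it (with the comparisons written out as `<=` and `&&`), together with a small helper that applies it to caller-owned storage rather than to a shared buffer. The helper `trimmed` is hypothetical and exists only for this illustration; it is not part of the R sources.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* The filter as quoted: removes trailing zeros after the decimal mark
 * cdec, modifying s in place, and returns s. */
static const char *dropTrailing0(char *s, char cdec)
{
    char *p;
    for (p = s; *p; p++) {
        if (*p == cdec) {
            char *replace = p++;          /* start at the decimal mark */
            while ('0' <= *p && *p <= '9')
                if (*(p++) != '0')
                    replace = p;          /* just past the last nonzero digit */
            while ((*(replace++) = *(p++)))
                ;                         /* shift the tail (e.g. exponent) left */
            break;
        }
    }
    return s;
}

/* Hypothetical helper: copy `in` into buf, then filter the private copy,
 * so the destructive step never touches the caller's input. */
static const char *trimmed(const char *in, char cdec, char *buf, size_t buflen)
{
    assert(in != NULL && buf != NULL);
    assert(strlen(in) < buflen);
    snprintf(buf, buflen, "%s", in);
    return dropTrailing0(buf, cdec);
}
```

For example, `trimmed("1.500", '.', buf, sizeof buf)` gives `1.5`, and `1.000e+05` becomes `1e+05`; a string without a decimal mark is returned unchanged.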

i think it is much more appropriate to comment (a) ER, with a warning to
the effect that it always returns the same address, hence the output
should be used immediately and never written to, (b) the use of ER in
SFR where 

[Rd] bug in strsplit?

2009-05-29 Thread Wacek Kusnierczyk
src/main/character.c:435-438 (do_strsplit) contains the following code:

for (i = 0; i < tlen; i++)
    if (getCharCE(STRING_ELT(tok, 0)) == CE_UTF8) use_UTF8 = TRUE;
for (i = 0; i < len; i++)
    if (getCharCE(STRING_ELT(x, 0)) == CE_UTF8) use_UTF8 = TRUE;

since both loops iterate over loop-invariant expressions and statements,
either the loops are redundant, or the fixed index '0' was meant to
actually be the variable i.  i guess it's the latter, hence 'bug?' in
the subject.

it also appears that if *any* element of tok (or x) positively passes
the test, use_UTF8 is set to TRUE;  in such a case, further checks make
no sense.  the following rewrite cuts the inessential computation:

for (i = 0; i < tlen; i++)
    if (getCharCE(STRING_ELT(tok, i)) == CE_UTF8) {
        use_UTF8 = TRUE;
        break; }
for (i = 0; i < len; i++)
    if (getCharCE(STRING_ELT(x, i)) == CE_UTF8) {
        use_UTF8 = TRUE;
        break; }

since the pattern is repetitive, the following generic approach would
help (and the macro could possibly be reused in other places):

#define CHECK_CE(CHARACTER, LENGTH, USEUTF8) \
    for (i = 0; i < (LENGTH); i++) \
        if (getCharCE(STRING_ELT((CHARACTER), i)) == CE_UTF8) { \
            (USEUTF8) = TRUE; \
            break; }
CHECK_CE(tok, tlen, use_UTF8)
CHECK_CE(x, len, use_UTF8)

if you like it, i can provide a patch.
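
The difference between the two loops can be checked in isolation. The sketch below does not use R's API: `enc_t` is a hypothetical stand-in for the value returned by getCharCE(STRING_ELT(x, i)), and both function names are invented for the example. One function scans every element with an early exit; the other reproduces the fixed index 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the encoding tag of one string element. */
typedef enum { ENC_NATIVE = 0, ENC_UTF8 = 1, ENC_LATIN1 = 2 } enc_t;

/* Intended check: true as soon as any of the n elements is UTF-8. */
static bool any_utf8(const enc_t *enc, size_t n)
{
    assert(enc != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (enc[i] == ENC_UTF8)
            return true;               /* early exit, no further checks */
    return false;
}

/* The loop as it stands in the quoted source: it runs n times but
 * inspects element 0 on every iteration. */
static bool first_only_utf8(const enc_t *enc, size_t n)
{
    assert(enc != NULL || n == 0);
    bool use_utf8 = false;
    for (size_t i = 0; i < n; i++)
        if (enc[0] == ENC_UTF8)
            use_utf8 = true;
    return use_utf8;
}
```

For an input whose only UTF-8 element is not the first one, `any_utf8` reports true while `first_only_utf8` reports false, which is the discrepancy the message points at.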

vQ



Re: [Rd] as.numeric(levels(factor(x))) may be a decreasing sequence

2009-05-29 Thread Wacek Kusnierczyk
Petr Savicky wrote:
 On Fri, May 29, 2009 at 03:53:02PM +0200, Martin Maechler wrote:
   
 my version of *using* the function was

 1 SEXP attribute_hidden StringFromReal(double x, int *warn)
 2 {
 3   int w, d, e;
 4   formatReal(&x, 1, &w, &d, &e, 0);
 5   if (ISNA(x)) return NA_STRING;
 6   else return mkChar(dropTrailing0(EncodeReal(x, w, d, e, OutDec), OutDec));
 7 }

 where you need to consider that mkChar() expects a 'const char*' 
 and EncodeReal(.) returns one, and I am pretty sure this was the
 main reason why Petr had used the two 'const char*' in (the
 now-named) dropTrailing0() definition. 
 

 Yes, the goal was to accept the output of EncodeReal() with exactly the
 same type, which EncodeReal() produces. A question is, whether the
 output type of EncodeReal() could be changed to (char *). Then, changing
 the output string could be done without casting const to non-const.

   
exactly.  my suggestion was to modify your function so that no 'modify a
constant string'-cheating is done, by either (a) keeping the const but
returning a *new* string (hence no const-to-nonconst cast would be
needed), or (b) modify your function to accept a non-const string *and*
modify the code that connects to your function via the input and output
strings. 
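
Option (a) could look roughly like the following: a variant that never writes to its input and instead fills a caller-supplied buffer, so no const-to-nonconst cast is needed. This is a sketch under assumptions, not the code that went into R; the name `drop_trailing0_copy` and its buffer-passing interface are invented here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copies s into out, dropping trailing zeros of the fractional part (and
 * the decimal mark itself if no fractional digits remain).  s is only
 * read.  Returns out, or NULL if out cannot hold the input length. */
static char *drop_trailing0_copy(const char *s, char cdec,
                                 char *out, size_t outlen)
{
    assert(s != NULL && out != NULL && cdec != '\0');
    size_t n = strlen(s);
    if (n + 1 > outlen)
        return NULL;

    const char *dot = strchr(s, cdec);
    if (dot == NULL) {                 /* no decimal mark: plain copy */
        memcpy(out, s, n + 1);
        return out;
    }

    const char *keep = dot;            /* end (exclusive) of the kept head */
    const char *p = dot + 1;
    while ('0' <= *p && *p <= '9') {
        if (*p != '0')
            keep = p + 1;              /* keep through the last nonzero digit */
        p++;
    }

    size_t head = (size_t)(keep - s);
    memcpy(out, s, head);              /* digits worth keeping */
    strcpy(out + head, p);             /* tail after the digits, e.g. "e+05" */
    return out;
}
```

The cost is one extra copy per call; in exchange the input can stay `const char *` throughout, and the function is safe to use on the result of a function that returns a shared buffer.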

note, if a solution in which your function serves as a destructive
filter is just fine (martin seems to have accepted it already), then
EncodeReal probably can produce just a string, with no const qualifier,
and analogously for mkChar.  on the other hand, if EncodeReal is
purposefully designed to return a const string (i.e., there is an
important reason for doing so), and analogously for mkChar, then your
function violates the assumptions and can potentially be harmful to the
rest of the code.


 This solution may be in conflict with the structure of the rest of R code,
 so i cannot evaluate, whether this is possible.

   

well, either the rest of the code does *not* need const, and it can be
safely removed, or it *does* rely on const, and your solution violates
the expectation.

vQ



Re: [Rd] as.numeric(levels(factor(x))) may be a decreasing sequence

2009-05-29 Thread Wacek Kusnierczyk
Martin Maechler wrote:

[...]
 vQ you return s, which should be the same pointer value (given the actual
 vQ code that does not modify the local variable s) with the same pointed-to
 vQ string value (given the signature of the function).

 vQ was perhaps

 vQ char *elim_trailing(char* const s, char cdec)

 vQ intended?

 yes that would seem slightly more logical to my eyes, 
 and in principle I also agree with the other remarks you make above,
   

what does ' in principle ' mean, as opposed to 'in principle'?  (is it
emphasis, or sneer quotes?)

 ...

 vQ anyway, having the pointer s itself declared as const does
 vQ make sense, as the code seems to assume that exactly the input pointer
 vQ value should be returned.  or maybe the argument to elim_trailing 
 should
 vQ not be declared as const, since elim_trailing violates the 
 declaration. 

 vQ one way out is to drop the violated const in both the actual argument
 vQ and in elim_trailing, which would then be simplified by removing all
 vQ const qualifiers and (char*) casts.  

 I've tried that, but   ``it does not work'' later:
 {after having renamed  'elim_trailing'  to  'dropTrailing0' }
 my version of *using* the function was

 1 SEXP attribute_hidden StringFromReal(double x, int *warn)
 2 {
 3   int w, d, e;
 4   formatReal(&x, 1, &w, &d, &e, 0);
 5   if (ISNA(x)) return NA_STRING;
 6   else return mkChar(dropTrailing0(EncodeReal(x, w, d, e, OutDec), OutDec));
 7 }

 where you need to consider that mkChar() expects a 'const char*' 
 and EncodeReal(.) returns one, and I am pretty sure this was the
 main reason why Petr had used the two 'const char*' in (the
 now-named) dropTrailing0() definition. 
 If I use your proposed signature

 char* dropTrailing0(char *s, char cdec);

 line 6 above gives warnings in all of several incantations I've tried
 including this one :

 else return mkChar((const char *) dropTrailing0((char *)EncodeReal(x, w, 
 d, e, OutDec), OutDec));

 which (the warnings) leave me somewhat clue-less or rather
 unmotivated to dig further, though I must say that I'm not the
 expert on the subject char*  / const char* ..
   

of course, if the input *is* const and the output is expected to be
const, you should get an error/warning in the first case, and at least a
warning in the other (depending on the level of verbosity/pedanticity
you choose).

but my point was not to light-headedly change the signature/return of
elim_trailing and its implementation and use it in the original
context;  it was to either modify the context as well (if const is
inessential), or drop modifying the const string if the const is in fact
essential.


 vQ   another way out is to make
 vQ elim_trailing actually allocate and return a new string, keeping the
 vQ input truly constant, at a performance cost.  yet another way is 
 to
 vQ ignore the issue, of course.

 vQ the original (martin/petr) version may quietly pass -Wall, but the
 vQ compiler would complain (rightfully) with -Wcast-qual.

 hmm, yes, but actually I haven't found a solution along your
 proposition that even passes   -pedantic -Wall -Wcast-align
 (the combination I've personally been using for a long time).
   

one way is to return from elim_trailing a new, const copy of the const
string.  using memcpy should be efficient enough.  care should be taken
to deallocate s when no longer needed.  (my guess is that using the
approach suggested here, s can be deallocated as soon as it is copied,
which means pretty much that it does not really have to be const.)
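
a rough sketch of the copying variant, under the assumption that plain
malloc is acceptable for illustration (inside r one would presumably use
r's own allocation routines instead).  the name drop_trailing0_copy is
hypothetical:

```c
#include <stdlib.h>
#include <string.h>

/* hypothetical copying variant: the input stays untouched, the result is
   a freshly allocated string that the caller must free.
   returns NULL if the allocation fails. */
char *drop_trailing0_copy(const char *s, char cdec)
{
    size_t len = strlen(s);
    char *out = malloc(len + 1);
    const char *dec, *keep, *p;
    size_t head;

    if (out == NULL)
        return NULL;

    /* a nul decimal mark can never match a character of the string */
    dec = cdec ? strchr(s, cdec) : NULL;
    if (dec == NULL) {
        memcpy(out, s, len + 1);
        return out;
    }

    keep = dec;                 /* end of the part kept before the tail */
    p = dec + 1;
    while ('0' <= *p && *p <= '9')
        if (*(p++) != '0')
            keep = p;           /* just past the last non-zero digit */

    head = (size_t) (keep - s);
    memcpy(out, s, head);                   /* digits worth keeping */
    memcpy(out + head, p, strlen(p) + 1);   /* tail, e.g. an exponent */
    return out;
}
```

no cast from const to non-const appears anywhere, so this should compile
quietly even with -Wcast-qual;  the cost is one allocation and one copy
per call.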

 Maybe we can try to solve this more esthetically
 in private e-mail exchange?
   

sure, we can discuss aesthetics offline.  as long as we do not discuss
aesthetics (do we?), it seems appropriate to me to keep the discussion
online.

i will experiment with a patch to solve this issue, and let you know
when i have something reasonable.

best,
vQ



Re: [Rd] as.numeric(levels(factor(x))) may be a decreasing sequence

2009-05-29 Thread Wacek Kusnierczyk
Martin Maechler wrote:
 Hi Waclav (and other interested parties),

 I have committed my working version of src/main/coerce.c
 so you can prepare your patch against that.
   

Hi Martin,

One quick reaction (which does not resolve my original complaint):  you
can have p non-const, and cast s to char* on the first occasion its
value is assigned to p, thus being able to copy from p to replace
without repetitive casts.  make check-ed patch attached.

vQ
Index: src/main/coerce.c
===
--- src/main/coerce.c	(revision 48689)
+++ src/main/coerce.c	(working copy)
@@ -297,13 +297,13 @@
 
 const char* dropTrailing0(const char *s, char cdec)
 {
-const char *p;
-for (p = s; *p; p++) {
+char *p;
+for (p = (char *)s; *p; p++) {
 	if(*p == cdec) {
-	char *replace = (char *) p++;
+	char *replace = p++;
 	while ('0' <= *p && *p <= '9')
 		if(*(p++) != '0')
-		replace = (char *) p;
+		replace = p;
 	while((*(replace++) = *(p++)))
 		;
 	break;


Re: [Rd] Why change data type when dropping to one-dimension?

2009-05-29 Thread Wacek Kusnierczyk
Stavros Macrakis wrote:
 This is another example of the general preference of the designers of R for
 convenience over consistency.

 In my opinion, this is a design flaw even for non-programmers, because I
 find that inconsistencies make the system harder to learn.  Yes, the naive
 user may stumble over the difference between m[[1,1]] and m[1,1] a few times
 before getting it, but once he or she understands the principle, it is
 general.
   

+1

vQ



Re: [Rd] [R] split strings

2009-05-28 Thread Wacek Kusnierczyk
(diverted to r-devel, a source code patch attached)

Wacek Kusnierczyk wrote:
 Allan Engelhardt wrote:
   
 Immaterial, yes, but it is always good to test :) and your solution
 *is* faster and it is even faster if you can assume byte strings:
 

 :)

 indeed;  though if the speed is immaterial (and in this case it
 supposedly was), it's probably not worth risking fixed=TRUE removing
 '.tif' from the middle of the name, however unlikely this might be (cf
 murphy's laws).

 but if you can assume that each string ends with a '.tif' (or any other
 \..{3} substring), then substr is marginally faster than sub, even as a
 three-pass approach, while avoiding the risk of removing '.tif' from the
 middle:

 strings = sprintf('f:/foo/bar//%s.tif', replicate(1000,
 paste(sample(letters, 10), collapse='')))
 library(rbenchmark)
 benchmark(columns=c('test', 'elapsed'), replications=1000, order=NULL,
substr={basenames=basename(strings); substr(basenames, 1,
 nchar(basenames)-4)},
sub=sub('.tif', '', basename(strings), fixed=TRUE, useBytes=TRUE))
 # test elapsed
 # 1 substr   3.176
 # 2sub   3.296
   

btw., i wonder why negative indices default to 1 in substr:

substr('foobar', -5, 5)
# fooba
# substr('foobar', 1, 5)
substr('foobar', 2, -2)
# 
# substr('foobar', 2, 1)

this does not seem to be documented in ?substr.  there are ways to make
negative indices meaningful, e.g., by taking them as indexing from
behind (as in, e.g., perl):

# hypothetical
substr('foobar', -5, 5)
# ooba
# substr('foobar', 6-5+1, 5)
substr('foobar', 2, -2)
# ooba
# substr('foobar', 2, 6-2+1)
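
the hypothetical semantics above can be sketched in standalone c as
follows.  this is only an illustration, not the patched character.c:  the
name substr_neg is made up, malloc stands in for r's string buffer, and
the clamping of a still-too-small start is an extra safeguard added for
the sketch:

```c
#include <stdlib.h>
#include <string.h>

/* hypothetical helper: 1-based, inclusive indices; a negative index
   counts from the end of the string (-1 is the last character).
   returns a malloc'ed string (possibly empty), or NULL on allocation
   failure; the caller frees it. */
char *substr_neg(const char *s, int start, int stop)
{
    int slen = (int) strlen(s);
    size_t n = 0;
    char *out;

    if (start == 0)
        start = 1;
    else if (start < 0)
        start = slen + start + 1;
    if (stop < 0)
        stop = slen + stop + 1;

    if (start < 1) start = 1;        /* safeguard: start before the string */
    if (stop > slen) stop = slen;    /* stop past the end of the string */

    if (start <= stop)               /* otherwise the selection is empty */
        n = (size_t) (stop - start + 1);

    out = malloc(n + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, s + (n ? start - 1 : 0), n);
    out[n] = '\0';
    return out;
}
```

with this, substr_neg("foobar", -5, 5) and substr_neg("foobar", 2, -2)
both select characters 2 through 5.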

there is a trivial fix to src/main/character.c that gives substr the
extended functionality -- see the attached patch.  the patch has been
created and tested as follows:

svn co https://svn.r-project.org/R/trunk r-devel
cd r-devel
# modifications made to src/main/character.c
svn diff > character.c.diff
svn revert -R .
patch -p0 < character.c.diff
   
./configure
make
make check-all
# no problems reported

with the patched substr, the original problem can now be solved more
concisely, using a two-pass approach, with performance still better than
the sub/fixed/bytes one, as follows:

strings = sprintf('f:/foo/bar//%s.tif', replicate(1000,
paste(sample(letters, 10), collapse='')))
library(rbenchmark)
benchmark(columns=c('test', 'elapsed'), replications=1000, order=NULL,
substr=substr(basename(strings), 1, -5),
'substr-nchar'={
basenames=basename(strings)
substr(basenames, 1, nchar(basenames)-4) },
sub=sub('.tif', '', basename(strings), fixed=TRUE, useBytes=TRUE))
# test elapsed
# 1   substr   2.981
# 2 substr-nchar   3.206
# 3  sub   3.273

if this sounds interesting, i can update the docs accordingly.

vQ
Index: src/main/character.c
===
--- src/main/character.c	(revision 48667)
+++ src/main/character.c	(working copy)
@@ -244,7 +244,12 @@
 	ss = CHAR(el);
 	slen = strlen(ss); /* FIXME -- should handle embedded nuls */
 	buf = R_AllocStringBuffer(slen+1, cbuff);
-	if (start < 1) start = 1;
+	if (start == 0) 
+		start = 1;
+	else if (start < 0) 
+		start = slen + start + 1;
+	if (stop < 0) 
+		stop = slen + stop + 1;
 	if (start > stop || start > slen) {
 		buf[0] = '\0';
 	} else {


Re: [Rd] [R] split strings

2009-05-28 Thread Wacek Kusnierczyk
William Dunlap wrote:

 Would your patched code affect the following
 use of regexpr's output as input to substr, to
 pull out the matched text from the string?
 x<-c("ooo","good food","bad")
 r<-regexpr("o+", x)
 substring(x,r,attr(r,"match.length")+r-1)
[1] "ooo" "oo"  ""
   

no; same output

 substr(x,r,attr(r,"match.length")+r-1)
[1] "ooo" "oo"  ""
   

no; same output

 r
[1]  1  2 -1
attr(,"match.length")
[1]  3  2 -1
 attr(r,"match.length")+r-1
[1]  3  3 -3
attr(,"match.length")
[1]  3  2 -1
   

for the positive indices there is no change, as you might expect.

if i understand your concern, the issue is that regexpr returns -1 (with
the corresponding attribute -1) where there is no match.  in this case,
you expect "" as the substring. 

if there is no match, we have:

start = r = -1 (the start index you provide)
stop = attr(r) + r - 1 = -1 + -1 - 1 = -3 (the stop index you provide)

for a string of length n, my patch computes the final indices as follows:

start' = n + start + 1
stop' = n + stop + 1

whatever the value of n, stop' - start' = stop - start = -3 - 1 = -4. 
that is, stop' < start', hence an empty string is returned, by virtue of
the original code.  (see the sources for details.)
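
the no-match case can be checked mechanically with a small sketch,
assuming the index arithmetic of the patch (a negative index i becomes
n + i + 1 for a string of length n).  the function names are
hypothetical:

```c
/* map a 1-based index that may be negative (counting from the end) onto
   a plain 1-based index for a string of length slen */
static int from_end(int idx, int slen)
{
    return idx < 0 ? slen + idx + 1 : idx;
}

/* returns 1 if (start, stop) selects nothing once both indices have been
   mapped, mirroring the test 'start > stop || start > slen' */
int selects_nothing(int slen, int start, int stop)
{
    int s = from_end(start, slen);
    int e = from_end(stop, slen);
    if (s == 0)
        s = 1;
    return s > e || s > slen;
}
```

for the pair (-1, -3) that regexpr-based code produces on a failed match,
the mapped stop is always two less than the mapped start, so
selects_nothing(n, -1, -3) holds for every n.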

does this answer your question?

vQ



Re: [Rd] [R] split strings

2009-05-28 Thread Wacek Kusnierczyk
Wacek Kusnierczyk wrote:
 William Dunlap wrote:
   
 Would your patched code affect the following
 use of regexpr's output as input to substr, to
 pull out the matched text from the string?
 x<-c("ooo","good food","bad")
 r<-regexpr("o+", x)
 substring(x,r,attr(r,"match.length")+r-1)
[1] "ooo" "oo"  ""
   
 

 no; same output

   
 substr(x,r,attr(r,"match.length")+r-1)
[1] "ooo" "oo"  ""
   
 

 no; same output

   
 r
[1]  1  2 -1
attr(,"match.length")
[1]  3  2 -1
 attr(r,"match.length")+r-1
[1]  3  3 -3
attr(,"match.length")
[1]  3  2 -1
   
 

 for the positive indices there is no change, as you might expect.

 if i understand your concern, the issue is that regexpr returns -1 (with
 the corresponding attribute -1) where there is no match.  in this case,
 you expect "" as the substring. 

 if there is no match, we have:

 start = r = -1 (the start index you provide)
 stop = attr(r) + r - 1 = -1 + -1 - 1 = -3 (the stop index you provide)

 for a string of length n, my patch computes the final indices as follows:

 start' = n + start + 1
 stop' = n + stop + 1

 whatever the value of n, stop' - start' = stop - start = -3 - 1 = -4. 
   

except for that stop - start = -3 - -1 = -2, but that's still negative,
i.e., stop' < start'.
silly me, sorry.

vQ

 that is, stop' < start', hence an empty string is returned, by virtue of
 the original code.  (see the sources for details.)

 does this answer your question?





[Rd] minor correction to the r internals manual

2009-05-27 Thread Wacek Kusnierczyk
sec. 1.1 says:

both types of node structure have as their first three fields a 32-bit
sxpinfo header and then three pointers [...]

that's *four* fields, as seen in src/include/Rinternals.h:208+:

#define SEXPREC_HEADER \
struct sxpinfo_struct sxpinfo; \
struct SEXPREC *attrib; \
struct SEXPREC *gengc_next_node, *gengc_prev_node

vQ



Re: [Rd] as.numeric(levels(factor(x))) may be a decreasing sequence

2009-05-27 Thread Wacek Kusnierczyk
Martin Maechler wrote:

 I have very slightly  modified the changes (to get rid of -Wall
 warnings) and also exported the function as Rf_dropTrailing0(),
 and tested the result with 'make check-all' .
 As the change seems reasonable and consequent, and as
 it seems not to produce any problems in our tests, 
 I'm hereby proposing to commit it (my version of it),
 [to R-devel only] within a few days,
 unless someone speaks up.

   

i may be misunderstanding the code, but:


 Martin Maechler, ETH Zurich

PS --- R-devel/src/main/coerce.c  2009-04-17 17:53:35.0 +0200
 PS +++ R-devel-elim-trailing/src/main/coerce.c   2009-05-23 
 08:39:03.914774176 +0200
 PS @@ -294,12 +294,33 @@
 PS else return mkChar(EncodeInteger(x, w));
 PS }
  
 PS +const char *elim_trailing(const char *s, char cdec)
   

the first argument is const char*, which usually means a contract
promising not to change the content of the pointed-to object

 PS +{
 PS +const char *p;
 PS +char *replace;
 PS +for (p = s; *p; p++) {
 PS +if (*p == cdec) {
 PS +replace = (char *) p++;
   

const char* p is cast to non-const char* replace

 PS +while ('0' <= *p && *p <= '9') {
 PS +if (*(p++) != '0') {
 PS +replace = (char *) p;
   

likewise

 PS +}
 PS +}
 PS +while (*(replace++) = *(p++)) {
   

the char* replace is assigned to -- effectively, the content of the
promised-to-be-constant string s is modified, and the modification may
involve any character in the string.  (it's a no-compile-error contract
violation;  not an uncommon pattern, but not good practice either.)

 PS +;
 PS +}
 PS +break;
 PS +}
 PS +}
 PS +return s;
   

you return s, which should be the same pointer value (given the actual
code that does not modify the local variable s) with the same pointed-to
string value (given the signature of the function).

was perhaps

char *elim_trailing(char* const s, char cdec)

intended?  anyway, having the pointer s itself declared as const does
make sense, as the code seems to assume that exactly the input pointer
value should be returned.  or maybe the argument to elim_trailing should
not be declared as const, since elim_trailing violates the declaration. 

one way out is to drop the violated const in both the actual argument
and in elim_trailing, which would then be simplified by removing all
const qualifiers and (char*) casts.  another way out is to make
elim_trailing actually allocate and return a new string, keeping the
input truly constant, at a performance cost.  yet another way is to
ignore the issue, of course.

the original (martin/petr) version may quietly pass -Wall, but the
compiler would complain (rightfully) with -Wcast-qual.


vQ



Re: [Rd] Qs: The list of arguments, wrapping functions...

2009-05-19 Thread Wacek Kusnierczyk
Kynn Jones wrote:
 Hi.  I'm pretty new to R, but I've been programming in other languages for
 some time.  I have a couple of questions regarding programming with function
 objects.
 1. Is there a way for a function to refer generically to all its actual
 arguments as a list?  I'm thinking of something like the @_ array in Perl or
 the arguments variable in JavaScript.  (By actual I mean the ones that
 were actually passed, as opposed to its formal arguments, as returned by
 formals()).
   

a quick shot from a naive r user:

f = function(a=1, b, ...)
as.list(match.call()[-1])

f(2)
f(b=2)
f(1,2,3)


 2. I have a package in which most of the functions have the form:

 the.function <- function(some, list, of, params) {
 return( some.other.function(the.list.of.params.to.this.function));
 }

 Is there a way that I can use a loop to define all these functions?
   

what do you mean, precisely?

 In general, I'm looking for all the information I can find on the subject of
 dynamic function definition (i.e. using code to automate the definition of
 functions at runtime).  I'm most interested in introspection facilities and
 dynamic code generation.  E.g. is it possible to write a module that
 redefines itself when sourced?  Or can a function redefine itself when
 first run?  Or how can a function find out about how it was called?
   

another quick shot from a naive r user:

f = function()
   assign(
   as.character(match.call()[[1]]),
   function() evil(),
   envir=parent.frame())
  
f
f()
f

you can then use stuff like formals, body, match.call, parent.frame,
etc. to have your function reimplement itself based on how and where it
is called.

 FWIW, Some of the things I'd like to do are in the spirit of a decorator in
 Python, which is a function that take a function f an argument and return
 another function g that is somehow based on f.  For example, this makes it
 very easy to write functions as wrappers to other simpler functions.
   

recall that decorators, when applied using the @syntax, do not just
return a new function, but rather redefine the one to which they are
applied.  so in r it would not be enough to write a function that takes
a function and returns another one;  it'd have to establish the input
function's name and the environment it resides in, and then replace that
entry in that environment with the new function.

yet another quick shot from the same naive r user:

# the decorator operator
'%...@%' = function(decorator, definition) {
   definition = substitute(definition)
   name = definition[[2]][[2]]
   definition = definition[[2]][[3]]
   assign(
   as.character(name),
   decorator(eval(definition, envir=parent.frame())),
   envir=parent.frame()) }

# a decorator
twice = function(f)
   function(...)
   do.call(f, as.list(f(...)))

# a function
inv = function(a, b)
   c(b, a)

inv(1,2)
# 2 1
twice(inv)(1,2)
# 1 2

# a decorated function
twice %...@% {
   square = function(x) x^2 }

square(2)
# 16

# another decorator
verbose = function(f)
   function(...) {
  cat('computing...\n')
  f(...) }

# another decorated function
verbose %...@% {
   square = function(x) x^2 }

square(2)
# computing...
# 4

there is certainly a lot of space for improvements, and there are
possibly bugs in the code above, but i hope it helps a little.

vQ



Re: [Rd] Qs: The list of arguments, wrapping functions...

2009-05-19 Thread Wacek Kusnierczyk
Wacek Kusnierczyk wrote:
 Kynn Jones wrote:
   

 In general, I'm looking for all the information I can find on the subject of
 dynamic function definition (i.e. using code to automate the definition of
 functions at runtime).  I'm most interested in introspection facilities and
 dynamic code generation.  E.g. is it possible to write a module that
 redefines itself when sourced?  Or can a function redefine itself when
 first run?  Or how can a function find out about how it was called?
   
 

 another quick shot from a naive r user:

 f = function()
assign(
as.character(match.call()[[1]]),
function() evil(),
envir=parent.frame())
   
or maybe

f = function()
   body(f) <<- expression(evil())


   
 f
 f()
 f
   

vQ



Re: [Rd] Qs: The list of arguments, wrapping functions...

2009-05-19 Thread Wacek Kusnierczyk
Wacek Kusnierczyk wrote:
 Wacek Kusnierczyk wrote:
   
 Kynn Jones wrote:
   

 
 In general, I'm looking for all the information I can find on the subject of
 dynamic function definition (i.e. using code to automate the definition of
 functions at runtime).  I'm most interested in introspection facilities and
 dynamic code generation.  E.g. is it possible to write a module that
 redefines itself when sourced?  Or can a function redefine itself when
 first run?  Or how can a function find out about how it was called?
   
 
   
 another quick shot from a naive r user:

 f = function()
assign(
as.character(match.call()[[1]]),
function() evil(),
envir=parent.frame())
   
 
 or maybe

 f = function()
body(f) <<- expression(evil())

   

though, 'of course', these two versions are not effectively equivalent; try

g = f
f()
c(g, f)

with both definitions.

vQ



Re: [Rd] View() crashy on Ubuntu 9.04

2009-05-13 Thread Wacek Kusnierczyk
Ben Bolker wrote:
   It's my vague impression that View() is workable on Windows and maybe
 on MacOS, but on Ubuntu Linux 9.04 (intrepid) it seems completely
 unstable.  I can reliably crash R by trying to look  at a very small,
 simple data frame ...
   

on my 8.04, r is reliable at crashing with, e.g.,

View(1)

with a subsequent attempt to move through the spreadsheet with an arrow
key.  this always causes a segfault.

I was going to try to run with debug turned on, but my installed
 version (2.9.0) doesn't have debugging symbols, and I'm having trouble
 building the latest SVN version (./configure gives checking for
 recommended packages... ls: cannot access
 ./src/library/Recommended/boot_*.tar.gz: No such file or directory)
   

tools/rsync-recommended


vQ



Re: [Rd] unsplit list of data.frames with one column

2009-05-09 Thread Wacek Kusnierczyk
Peter Dalgaard wrote:
 Will Gray wrote:

 Perhaps this is the intended behavior, but I discovered that unsplit
 throws an error when it tries to set rownames of a variable that has
 no dimension.  This occurs when unsplit is passed a list of
 data.frames that have only a single column.

 An example:

 df <- data.frame(letters[seq(25)])
 fac <- rep(seq(5), 5)
 unsplit(split(df, fac), fac)

 For reference, I'm using R version 2.9.0 (2009-04-17), subversion
 revision 48333, on Ubuntu 8.10.


 That's a bug. The line

 x <- value[[1L]][rep(NA, len), ]

 should be

 x <- value[[1L]][rep(NA, len), , drop=FALSE]


looks like someone got caught by the drop=TRUE design...?

vQ



Re: [Rd] proposed changes to RSiteSearch

2009-05-08 Thread Wacek Kusnierczyk
Romain Francois wrote:

txt <- grep( '^tr.*td align=right.*a', readLines( url ), value =
 TRUE )
  rx <- '^.*?a href=(.*?)(.*?)/a.*td(.*?)/td.*$'
out <- data.frame(
url = gsub( rx, "\\1", txt ),
group = gsub( rx, "\\2", txt ),
description = gsub( rx, "\\3", txt ),

looking at this bit of your code, i wonder why gsub is not vectorized
for the pattern and replacement arguments, although it is for the x
argument.  the three lines above could be collapsed to just one with a
vectorized gsub:

gsubm = function(pattern, replacement, x, ...)
   mapply(USE.NAMES=FALSE, SIMPLIFY=FALSE,
   gsub, pattern=pattern, replacement=replacement, x=x, ...)

for example, given the sample data

txt = '<foo>foo</foo><bar>bar</bar>'
rx = '<(.*?)>(.*?)</(.*?)>'

the sequence

open = gsub(rx, '\\1', txt, perl=TRUE)
content = gsub(rx, '\\2', txt, perl=TRUE)
close = gsub(rx, '\\3', txt, perl=TRUE)

print(list(open, content, close))
   
could be replaced with

data = structure(names=c('open', 'content', 'close'),
gsubm(rx, paste('\\', 1:3, sep=''), txt, perl=TRUE))

print(data)

surely, a call to mapply does not improve performance, but a
source-level fix should not be too difficult;  unfortunately, i can't
find myself willing to struggle with r sources right now.


note also that .*? does not work as a non-greedy .* with the default
regex engine, e.g.,

txt = "foo='FOO' bar='BAR'"
gsub("(.*?)='(.*?)'", '\\1', txt)
# foo='FOO' bar
gsub("(.*?)='(.*?)'", '\\2', txt)
# BAR

because the first .*? matches everything up to and exclusive of the
second, *not* the first, '='.  for a non-greedy match, you'd need pcre
(and using pcre generally improves performance anyway):

txt = "foo='FOO' bar='BAR'"
gsub("(.*?)='(.*?)'", '\\1', txt, perl=TRUE)
# foo bar
gsub("(.*?)='(.*?)'", '\\2', txt, perl=TRUE)
# FOO BAR

vQ



Re: [Rd] proposed changes to RSiteSearch

2009-05-08 Thread Wacek Kusnierczyk
Romain Francois wrote:
 strapply in package gsubfn brings elegance here:

  txt <- '<foo>bar</foo>'
  rx <- "<(.*?)>(.*?)</(.*?)>"
  strapply( txt, rx, c , perl = T )
 [[1]]
 [1] "foo" "bar" "foo"


sure, but this does not, in any way, make it less strange that gsub is
not vectorized. 


 Too bad you have to pay this on performance:

  txt <- rep( '<foo>bar</foo>', 1000 )
  rx <- "<(.*?)>(.*?)</(.*?)>"
  system.time( out <- strapply( txt, rx, c , perl = T ) )
   user  system elapsed
  2.923   0.005   3.063
  system.time( out2 <- sapply( paste('\\', 1:3, sep=''), function(x){
 + gsub(rx, x, txt, perl=TRUE)
 + } ) )
   user  system elapsed
  0.011   0.000   0.011

strapply

and you know why.


vQ



Re: [Rd] proposed changes to RSiteSearch

2009-05-08 Thread Wacek Kusnierczyk
hadley wickham wrote:
 On Fri, May 8, 2009 at 10:11 AM, Romain Francois
 romain.franc...@dbmail.com wrote:
   
 strapply in package gsubfn brings elegance here:

 
 txt <- '<foo>bar</foo>'
 rx <- "<(.*?)>(.*?)</(.*?)>"
 strapply( txt, rx, c , perl = T )
   
 [[1]]
 [1] "foo" "bar" "foo"

 Too bad you have to pay this on performance:

 
 txt <- rep( '<foo>bar</foo>', 1000 )
 rx <- "<(.*?)>(.*?)</(.*?)>"
 system.time( out <- strapply( txt, rx, c , perl = T ) )
   
  user  system elapsed
  2.923   0.005   3.063
 
 system.time( out2 <- sapply( paste('\\', 1:3, sep=''), function(x){
   
 + gsub(rx, x, txt, perl=TRUE)
 + } ) )
  user  system elapsed
  0.011   0.000   0.011

 Not sure what the right play i
 

 For me:

   
 system.time( out <- strapply( txt, rx, c , perl = T ) )
 
user  system elapsed
   0.004   0.000   0.004

   
 system.time( out2 <- sapply( paste('\\', 1:3, sep=''), function(x){
 
 + gsub(rx, x, txt, perl=TRUE)
 + } ) )
user  system elapsed
   0   0   0
   

for me:

txt <- '<foo>bar</foo>'
rx <- '<(.*?)>(.*?)</(.*?)>'

library(rbenchmark)
benchmark(replications=1000, columns=c('test', 'elapsed'),
order='elapsed',
   sapply=sapply(paste('\\', 1:3, sep=''), function(x) gsub(rx, x,
txt, perl=TRUE)),
   mapply=mapply(gsub, rx, paste('\\', 1:3, sep=''), txt, perl=TRUE),
   strapply=strapply(txt, rx, c, perl=TRUE))
# 2   mapply   0.151
# 1   sapply   0.166
# 3 strapply   1.917

vQ



Re: [Rd] Some extensions to class inheritance and method selection

2009-04-29 Thread Wacek Kusnierczyk
Stavros Macrakis wrote:
 These look like important improvements.  As a relative newcomer to the R
 community, I'm not sure I understand what the procedures are for such
 changes.

 In particular, does the fact that the changes were committed to R-devel mean
 that the changes have already been reviewed and approved by R Core?  Are R
 Core's discussions / deliberations archived somewhere? What is the role of
 the larger R community in reviewing and approving changes like this?

 How is documentation handled? Who is responsible for developing and
 maintaining a definitive reference manual (not just man pages) which
 includes all the cumulative changes and describes them comprehensively and
 in black-box way (not referring to history and implementation details)?
   

as another newcomer, i admit the procedures mentioned above are quite
opaque to me, too.  from my perspective, it seems like quite many, if
not most, improvements (changes, at least) to r code are committed in an
ad hoc fashion, by a single developer, without any publicly visible
discussion.  this is likely to lead, and in certain circumstances does
lead, to bizarre, eclectic patches visible in the sources. 

it would be indeed interesting and desirable to make the process more
open, at least for review, by users.  or is r not *that* open?

vQ



Re: [Rd] incorrect output and segfaults from sprintf with %*d (PR#13667)

2009-04-27 Thread Wacek Kusnierczyk
Gabor Grothendieck wrote:
 On Fri, Apr 24, 2009 at 6:45 AM,  maech...@stat.math.ethz.ch wrote:
   
 Yes, the documentation will also have to be amended, but apart
 from that, would people see a big problem with the 8192 limit
 which now is suddenly of greater importance
 {{as I said all along;  hence my question to Wacek (and the
  R-develers)  if anybody found that limit too low}}
 

 I haven't been following all this but in working with strings for
 the gsubfn package my own usage of the package was primarily
 for small strings but then I discovered that others wanted to use
 it for much larger strings of 25,000 characters, say, and it was
 necessary to raise the limits (and there are also performance
 implications which could be addressed too). I don't know what
 the situation is particularly here but cases where
 very large strings can be used include linguistic analysis and
 computer generated R code.
   

in principle, instead of the quite arbitrary and not justified constant
size limit 8192 [1], one could use dynamic arrays.  this would allow
strings of arbitrary length without adding much performance penalty for
strings shorter than 8193 bytes.
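
a minimal sketch of the dynamic-buffer idea in plain c99, not taken from
the r sources:  the helper name format_alloc is hypothetical, and malloc
is used only for illustration.  the output is measured first with
vsnprintf and a null buffer, then exactly as much memory as needed is
allocated:

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* hypothetical helper: format into a buffer sized to fit the result.
   returns a malloc'ed string that the caller must free, or NULL on a
   formatting or allocation error. */
char *format_alloc(const char *fmt, ...)
{
    va_list ap, ap2;
    int n;
    char *buf;

    va_start(ap, fmt);
    va_copy(ap2, ap);                  /* the arguments are walked twice */
    n = vsnprintf(NULL, 0, fmt, ap);   /* length the output would need */
    va_end(ap);

    if (n < 0) {
        va_end(ap2);
        return NULL;
    }

    buf = malloc((size_t) n + 1);
    if (buf != NULL)
        vsnprintf(buf, (size_t) n + 1, fmt, ap2);
    va_end(ap2);
    return buf;
}
```

a call such as format_alloc("%*d", 9000, 1) then yields a 9000-character
string regardless of any fixed limit.  short strings pay for one extra
formatting pass;  a fixed-size buffer could still be tried first to avoid
that.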

[1] src/include/Defn.h:60



Re: [Rd] incorrect output and segfaults from sprintf with %*d (PR#13667)

2009-04-24 Thread Wacek Kusnierczyk
maech...@stat.math.ethz.ch wrote:

 vQ sprintf has a documented limit on strings included in the output 
 using the
 vQ format '%s'.  It appears that there is a limit on the length of 
 strings included
 vQ with, e.g., the format '%d' beyond which surprising things happen 
 (output
 vQ modified for conciseness):
  

 vQ ... and this limit is *not* documented.

 MM well, it is basically (+ a few bytes ?)
 MM the same  8192  limit that *is* documented.

 indeed, I was right with that..
   

hmm, i'd guess this limit is valid for all strings included in the
output with any format?  not just %s (and, as it appears, undocumentedly
%d)?

 vQ while snprintf would help avoid buffer overflow, it may not be a
 vQ solution to the issue of confused output.

 MM I think it would / will.  We would be able to give warnings and
 MM errors, by checking the  snprintf()  return codes.
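
the kind of return-code check described above might look roughly like
this in plain c.  it is a sketch only, not the code in r's sprintf.c;
the function name is hypothetical and the format is fixed to '%*d' for
brevity:

```c
#include <stdio.h>

/* hypothetical check: format value with the given field width into buf.
   returns 0 on success and -1 if the result would not fit into size
   bytes (including the terminating nul) or if snprintf reports an
   error. */
int format_width_checked(char *buf, size_t size, int width, int value)
{
    int n = snprintf(buf, size, "%*d", width, value);
    if (n < 0 || (size_t) n >= size)
        return -1;   /* output was truncated: report it, do not use buf */
    return 0;
}
```

with an 8192-byte buffer, a field width of 8191 still fits, while 9000
is reported as an error instead of overrunning the buffer.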

 My current working code gives an error for all the above
 examples, e.g.,

   sprintf('%d', 1)
  Error in sprintf(%d, 1) : 
required resulting string length  is  maximal 8191

 it passes  'make check-devel' and I am inclined to commit that
 code to R-devel (e.g. tomorrow). 

 Yes, the documentation will also have to be amended, but apart
 from that, would people see a big problem with the 8192 limit
 which now is suddenly of greater importance
 {{as I said all along;  hence my question to Wacek (and the
   R-develers)  if anybody found that limit too low}}
   

i didn't find the limit itself problematic.  (so far?)

btw. (i do know what that means ;)), after your recent fix:

sprintf('%q%s', 1)
# Error in sprintf("%q%s", 1) :
#  use format %f, %e, %g or %a for numeric objects

sprintf('%s', 1)
# [1] "1"

you may want to add '%s' (and '%x', and ...) to the error message.  or
perhaps make it say sth like 'invalid format: ...'.  the problem is not
that %q is not applicable to numeric, but that it is not a valid format
at all.

there's also an issue with the additional arguments supplied after the
format:  any superfluous arguments are ignored (this is not documented,
as far as i can see), but they *are* evaluated nevertheless, e.g.:

sprintf('%d', 0, {print(1)})
# 1
# [1] 0

it might be a good idea to document this behaviour.

best,
vQ

vQ



Re: [Rd] incorrect output and segfaults from sprintf with %*d (PR#13667)

2009-04-23 Thread Wacek Kusnierczyk
maech...@stat.math.ethz.ch wrote:

 vQ sprintf has a documented limit on strings included in the output using the
 vQ format '%s'.  It appears that there is a limit on the length of strings included
 vQ with, e.g., the format '%d' beyond which surprising things happen (output
 vQ modified for conciseness):
   

... and this limit is *not* documented.


 vQ gregexpr('1', sprintf('%9000d', 1))
 vQ # [1] 9000 9801

 vQ gregexpr('1', sprintf('%9000d', 1))
 vQ # [1]  9000  9801 10602

 vQ gregexpr('1', sprintf('%9000d', 1))
 vQ # [1]  9000  9801 10602 11403

 vQ gregexpr('1', sprintf('%9000d', 1))
 vQ # [1]  9000  9801 10602 11403 12204

 vQ ...

 vQ Note that not only more than one '1' is included in the output, but also that
 vQ the same functional expression (no side effects used beyond the interface) gives
 vQ different results on each execution.  Analogous behaviour can be observed with
 vQ '%nd' where n > 8200.

 vQ The actual output above is consistent across separate sessions.

 vQ With sufficiently large field width values, R segfaults:

 vQ sprintf('%*d', 10^5, 1)
 vQ # *** caught segfault ***
 vQ # address 0xbfcfc000, cause 'memory not mapped'
 vQ # Segmentation fault


 Thank you, Wacek.
 That's all ``interesting''  ... unfortunately, 

 my version of  'man 3 sprintf' contains

   
 BUGS
Because sprintf() and vsprintf() assume an arbitrarily
long string, callers must be careful not to overflow the
actual space; this is often impossible to assure. Note
that the length of the strings produced is
locale-dependent and difficult to predict.  Use
snprintf() and vsnprintf() instead (or asprintf() and vasprintf).
   

   

yes, but this is c documentation, not r documentation.  it's applicable
to a degree, since ?sprintf does say that sprintf is a wrapper for the
C function 'sprintf'.  however, in c you use a buffer and you usually
have control over its capacity, while in r this is a hidden
implementational detail, which should not be visible to the user, or
should cause an attempt to overflow the buffer to fail more gracefully
than with a segfault.

in r, sprintf('%9000d', 1) will produce a confused output with a count
of 1's variable (!) across runs (while sprintf('%*d', 9000, 1) seems to
do fine):

gregexpr('1', sprintf('%*d', 9000, 1))
# [1] 9000

gregexpr('1', sprintf('%9000d', 1))
# [1] 9000 9801 ..., variable across executions
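
the same kind of check can be sketched with a small field width, well
below the 8192 limit, where exactly one '1' is expected at the end of the
padded string.  the width of 20 is an arbitrary choice for illustration:

```r
width <- 20L
padded <- sprintf("%*d", width, 1L)   # field width passed as an argument

# positions of every '1' in the output; a correct result has a single hit
hits <- as.integer(gregexpr("1", padded, fixed = TRUE)[[1]])

nchar(padded)   # should equal the requested width
hits            # should be the last position only
```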

on one execution in a series i actually got this:

Warning message:
In gregexpr("1", sprintf("%9000d", 1)) :
  input string 1 is invalid in this locale

while the very next execution, still in the same session, gave

# [1]  9000  9801 10602

with sprintf('%*d', 1, 1) i got segfaults on some executions but
correct output on others, while sprintf('%1d', 1) is confused again.



 (note the impossible part above)   
   

yes, but it does also say must be careful, and it seems that someone
has not been careful enough.

 and we haven't used  snprintf() yet, probably because it
 requires the  C99 C standard, and AFAIK, we have only relatively
 recently started to more or less rely on C99 in the R sources.
   

while snprintf would help avoid buffer overflow, it may not be a
solution to the issue of confused output.


 More precisely, I see that some windows-only code relies on
 snprintf() being available  whereas in at least on non-Windows
 section, I read   /* we cannot assume snprintf here */

 Now such platform dependency issues and corresponding configure
 settings I do typically leave to other R-corers with a much
 wider overview about platforms and their compilers and C libraries.
   

it looks like src/main/sprintf.c is just buggy, and it's plausible that
the bug could be repaired in a platform-independent manner.




 BTW,  
 1) sprintf("%n %g", 1,1)   also seg.faults
   

as do

sprintf('%n%g', 1, 1)
sprintf('%n%')

etc., while

sprintf('%q%g', 1, 1)
sprintf('%q%')
  
work just fine.  strange, because per ?sprintf 'n' is not recognized as
a format specifier, so the output from the first two above should be as
from the last two above, respectively.  (and likewise in the %S case,
discussed and bug-reported earlier.)


 2) Did you have a true use case where  the  8192  limit was an
undesirable limit?
   

how does it matter?  if you set a limit, be sure to consistently enforce
it and warn the user on attempts to exceed it.  or write clearly in the
docs that such attempts will cause the output to be silently truncated. 
examples such as

sprintf('%9000d', 1)

do not contribute to the reliability of r, and neither to the user's
confidence in it.

vQ



[Rd] sprintf limits output string length with no warning/error message

2009-04-21 Thread Wacek Kusnierczyk
sprintf has a limit on the length of a string produced with a '%s'
specification:

   nchar(sprintf('%1s', ''))
   # 8191

   nchar(sprintf('%*s', 1, ''))
   # 8191

This is sort of documented in ?sprintf:

 There is a limit of 8192 bytes on elements of 'fmt' and also on
 strings included by a '%s' conversion specification.

but it should be a good idea for sprintf to at least warn when the
output is shorter than specified.
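
A minimal sketch of such a check done at the R level, assuming a
hypothetical wrapper named pad_checked (not part of R); it only compares
the length of the result with the requested width:

```r
# hypothetical helper: left-pad 's' to 'width' characters and warn if the
# result came back shorter than requested
pad_checked <- function(s, width) {
    out <- sprintf("%*s", as.integer(width), s)
    if (any(nchar(out) < width))
        warning("sprintf output is shorter than the requested width")
    out
}

res <- pad_checked("ab", 10)
nchar(res)
```

Newer versions of R may refuse over-long widths inside sprintf itself, in
which case the wrapper never gets to the comparison.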

vQ



Re: [Rd] [R] Definition of = vs. <-

2009-04-02 Thread Wacek Kusnierczyk
Peter Dalgaard wrote:
 Wacek Kusnierczyk wrote:
 Stavros Macrakis wrote:
 `->`
 
 Error: object "->" not found
   

 that's weird!

 Why???


partly because it was april fools. 

but more seriously, it's because one could assume that in any syntactic
expression with an operator involved, the operator maps to a semantic
object.  it has been claimed on this list (as far as i recall;  don't
ask me for reference, but if pressed, i'll find it) that any expression
of the form

lhs op rhs

is a syntactic variant for

`op`(lhs, rhs)

(which would, following that argumentation, make r a lisp-like language)
but this apparently does not apply to '->'.  i would (naively, perhaps)
expect that `->` is a function, which, internally, may well just invert
the order of arguments and immediately call `<-`.  the fact that
expressions involving '->' are converted, at the parse time, into ones
using '<-' is far from obvious to me (it is now, but not a priori):

quote(1->a)
# a <- 1
# why not: 1 -> a
# why not: `->`(1, a)

and btw. the following is also weird:

quote(a=1)
# 1

not because '=' works as named argument specifier (so that the result
would be something like `=`(a, 1)), but because quote has no parameter
named 'a', and i would expect an error to be raised:

# hypothetical
quote(a=1)
# error: unused argument(s): (a = 1)

as in, say

vector(mode='list', i=1)
# error: unused argument(s): (i = 1)

it appears that, in fact, quite many r functions will gladly match a
*named* argument with a *differently named* parameter.  it is weird to
the degree that it is *wrong* wrt. the 'r language definition', sec.
4.3.2 'argument matching', which says:

The first thing that occurs in a function evaluation is the matching of
formal to the actual or supplied arguments. This is done by a three-pass
process:
 1. Exact matching on tags. For each named supplied argument the list of
    formal arguments is searched for an item whose name matches exactly.
    It is an error to have the same formal argument match several actuals
    or vice versa.
 2. Partial matching on tags. Each remaining named supplied argument is
    compared to the remaining formal arguments using partial matching. If
    the name of the supplied argument matches exactly with the first part
    of a formal argument then the two arguments are considered to be
    matched. It is an error to have multiple partial matches. Notice that
    if f <- function(fumble, fooey) fbody, then f(f = 1, fo = 2) is
    illegal, even though the 2nd actual argument only matches fooey.
    f(f = 1, fooey = 2) is legal though since the second argument matches
    exactly and is removed from consideration for partial matching. If the
    formal arguments contain ‘...’ then partial matching is only applied
    to arguments that precede it.
 3. Positional matching. Any unmatched formal arguments are bound to
    unnamed supplied arguments, in order. If there is a ‘...’ argument, it
    will take up the remaining arguments, tagged or not.
If any arguments remain unmatched an error is declared.
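
for an ordinary closure the three passes can be checked directly.  a
minimal sketch, reusing the f(fumble, fooey) example from the definition;
the tryCatch wrappers only turn the expected errors into a marker string:

```r
f <- function(fumble, fooey) list(fumble = fumble, fooey = fooey)

# legal: 'fooey' matches exactly, then 'f' partially matches only 'fumble'
ok <- f(f = 1, fooey = 2)

# illegal: 'f' is a partial match for both formals
ambiguous <- tryCatch(f(f = 1, fo = 2), error = function(e) "error")

# an actual argument that matches no formal is an error for a closure
unused <- tryCatch(f(1, 2, a = 3), error = function(e) "error")
```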


if you now consider the example of quote(a=1), with quote having *one*
formal argument (parameter) named 'expr' (see ?quote), we see that:

1. there is no exact match between the formal 'expr' and the actual 'a'

2. there is no partial match between the formal 'expr' and the actual 'a'

3a. there is an unmatched formal argument ('expr'), but no unnamed
actual argument.  hence, 'expr' remains unmatched. 
3b. there is no argument '...' (i think the r language definition is
lousy and should say 'formal argument' here, as you can have it as an
actual, too, as in quote('...'=1)).  hence, the actual argument named
'a' will not be 'taken up'.

there remain unmatched arguments (i guess the r language definition is
lousy and should say 'unmatched actual arguments', as you can obviously
have unmatched formals, as in eval(1)), hence an error should be
'declared' (i guess 'raised' is more appropriate). 

this does not happen in quote(a=1) (and many, many other cases), and
this makes me infer that there is a *bug* in the implementation of
argument matching, since it clearly does not conform to the definition. 
hence, i cc: to r-devel, and will also report a bug in the usual way.

vQ



Re: [Rd] [R] Definition of = vs. <-

2009-04-02 Thread Wacek Kusnierczyk
Wacek Kusnierczyk wrote:

 and btw. the following is also weird:

 quote(a=1)
 # 1

 not because '=' works as named argument specifier (so that the result
 would be something like `=`(a, 1)), 

i meant to write: not because '=' does not work as an assignment
operator (or otherwise the result would be ...)

 but because quote has no parameter
 named 'a', and i would expect an error to be raised:

 # hypothetical
 quote(a=1)
 # error: unused argument(s): (a = 1)

 as in, say

 vector(mode='list', i=1)
 # error: unused argument(s): (i = 1)

 it appears that, in fact, quite many r functions will gladly match a
 *named* argument with a *differently named* parameter.  it is weird to
 the degree that it is *wrong* wrt. the 'r language definition', sec.
 4.3.2 'argument matching', which says:

 The first thing that occurs in a function evaluation is the matching of
 formal to the actual or supplied arguments. This is done by a three-pass
 process:
  1. Exact matching on tags. For each named supplied argument the list of
     formal arguments is searched for an item whose name matches exactly.
     It is an error to have the same formal argument match several actuals
     or vice versa.
  2. Partial matching on tags. Each remaining named supplied argument is
     compared to the remaining formal arguments using partial matching. If
     the name of the supplied argument matches exactly with the first part
     of a formal argument then the two arguments are considered to be
     matched. It is an error to have multiple partial matches. Notice that
     if f <- function(fumble, fooey) fbody, then f(f = 1, fo = 2) is
     illegal, even though the 2nd actual argument only matches fooey.
     f(f = 1, fooey = 2) is legal though since the second argument matches
     exactly and is removed from consideration for partial matching. If the
     formal arguments contain ‘...’ then partial matching is only applied
     to arguments that precede it.
  3. Positional matching. Any unmatched formal arguments are bound to
     unnamed supplied arguments, in order. If there is a ‘...’ argument, it
     will take up the remaining arguments, tagged or not.
 If any arguments remain unmatched an error is declared.
 

 if you now consider the example of quote(a=1), with quote having *one*
 formal argument (parameter) named 'expr' (see ?quote), we see that:

 1. there is no exact match between the formal 'expr' and the actual 'a'

 2. there is no partial match between the formal 'expr' and the actual 'a'

 3a. there is an unmatched formal argument ('expr'), but no unnamed
 actual argument.  hence, 'expr' remains unmatched. 
 3b. there is no argument '...' (i think the r language definition is
 lousy and should say 'formal argument' here, as you can have it as an
 actual, too, as in quote('...'=1)).  hence, the actual argument named
 'a' will not be 'taken up'.

 there remain unmatched arguments (i guess the r language definition is
 lousy and should say 'unmatched actual arguments', as you can obviously
 have unmatched formals, as in eval(1)), hence an error should be
 'declared' (i guess 'raised' is more appropriate). 

 this does not happen in quote(a=1) (and many, many other cases), and
 this makes me infer that there is a *bug* in the implementation of
 argument matching, since it clearly does not conform to the definition. 
 hence, i cc: to r-devel, and will also report a bug in the usual way.




Re: [Rd] Assignment to string

2009-04-02 Thread Wacek Kusnierczyk
Stavros Macrakis wrote:
 On Wed, Apr 1, 2009 at 5:11 PM, Wacek Kusnierczyk 
 waclaw.marcin.kusnierc...@idi.ntnu.no wrote:

   
 Stavros Macrakis wrote:
 ...
 i think this concords with the documentation in the sense that in an
 assignment a string can work as a name.  note that

`foo bar` = 1
is.name(`foo`)
# FALSE

 the issue is different here in that in is.name("foo") "foo" evaluates to
 a string (it works as a string literal), while in is.name(`foo`) `foo`
 evaluates to the value of the variable named 'foo' (with the quotes
 *not* belonging to the name).

 

 Wacek, surely you are joking here.  The object written `foo` (a name)
 *evaluates to* its value.  

yes, which is the value of a variable named 'foo' (quotes not included
in the name), or with other words, the value of the variable foo.

 The object written "foo" (a string) evaluates to
 itself.  This has nothing to do with the case at hand, since the left-hand
 side of an assignment statement is not evaluated in the normal way.
   

yes.  i did support your point that the documentation is confusing wrt.

"foo" = 1

because "foo" is not a name (and in particular, not a quoted name).



   
 ...with only a quick look at the sources (src/main/envir.c:1511), i guess
 the first element to an assignment operator (i mean the left-assignment
 operators) is converted to a name
 


 Yes, clearly when the LHS of an assignment is a string it is being coerced
 to a name.  I was simply pointing out that that is not consistent with the
 documentation, which requires a name on the LHS.
   

... but there is probably something going on in do_set (in
src/main/eval.c) before do_assign is called.

 - maclisp was designed by computer scientists in a research project,
   
 - r is being implemented by statisticians for practical purposes.

 

 Well, I think it is overstating things to say that Maclisp was designed at
 all.  Maclisp grew out of PDP-6 Lisp, with new features being added
 regularly. Maclisp itself wasn't a research project -- 

didn't say that;  it was, as far as i know (and that's little) developed
as part, or in support of, the MIT research project MAC.


 there are vanishingly
 few papers about it in the academic literature, unlike contemporary research
 languages like Planner, EL/1, CLU, etc. In fact, there are many parallels
 with R -- it was in some sense a service project supporting AI and symbolic
 algebra research, with ad hoc features (a.k.a. hacks) 

that's a parallel to r, i guess?

 being added regularly
 to support some new idea in AI or algebra.  To circle back to the current
 discussion, Maclisp didn't even have strings as a data type until the
 mid-70's -- before that, atoms ('symbols' in more modern terminology) were
 the only way to represent strings. (And that lived on in Maxima for many
 decades...)  See http://www.softwarepreservation.org/projects/LISP/ for
 documentation on the history of many different Lisps.
   

interesting, thanks.

 We learned many lessons with Maclisp.  Well, actually two different sets of
 lessons were learned by two different communities.  The Scheme community
 learned the importance of minimalist, clean, principled design.  

and scheme is claimed to be the inspiration for r...

 The Common
 Lisp community learned the importance of large, well-designed libraries.
 Both learned the importance of standardization and clear specification.
 There is much to learn.
   
yes...

best,
vQ



Re: [Rd] actual argument matching does not conform to the definition (PR#13634)

2009-04-02 Thread Wacek Kusnierczyk
Thomas Lumley wrote:

 The explanation is that quote() is a primitive function and that the
 argument matching rules do not apply to primitives.  That section of
 the R Language definition should say that primitives are excluded;  it
 is documented in ?.Primitive.

thanks.  indeed, the documentation --  the language *definition* --
should make this clear.  so this is a bug in the definition, which does
not match the implementation, which in turn is as intended (right?)

?.Primitive says:

 The advantage of '.Primitive' over '.Internal' functions is the
 potential efficiency of argument passing.  However, this is done
 by ignoring argument names and using positional matching of
 arguments (unless arranged differently for specific primitives
 such as 'rep'), so this is discouraged for functions of more than
 one argument.
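
a quick way to see which functions fall outside the usual matching rules
is to ask whether they are primitives.  a small sketch;  the closure 'g'
is made up for comparison:

```r
prim <- is.primitive(quote)              # quote is implemented as a primitive
no_formals <- is.null(formals(quote))    # primitives carry no formals list

g <- function(expr) expr                 # an ordinary closure for comparison
closure_prim <- is.primitive(g)
closure_formals <- names(formals(g))
```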


what is discouraged?

vQ



Re: [Rd] duplicated.data.frame {was [R] which rows are duplicates?}

2009-04-01 Thread Wacek Kusnierczyk
Martin Maechler wrote:

 WK i attach the patch post for reference.  note that you need to fix all of
 WK the functions in duplicated.R that share the buggy code.  (yes, this was
 WK another thread;  i submitted a bug report, and then sent a follow-up
 WK post with a patch).

 Thank you; yes, in the mean time I have also seen your bug
 report and patch.  
 Interestingly (or not), I have myself patched identically to
 what you propose, withOUT even having known about your bug report + patch.
   

this means, the solution has greater chances to be correct.

 

 { hmmm, it seems your thinking can be very close to mine, so why
   can't you like R properly  ;-b }
   

actually, i think i *do* like r properly.

vQ



Re: [Rd] [R] variance/mean

2009-04-01 Thread Wacek Kusnierczyk
Martin Maechler wrote:

 Your patch is basically only affecting the default  
 method = "pearson". For (most) other cases, 'y = NULL' would
 still remain  *the* way to save computations, unless we'd start
 to use an R-level equivalent [which I think does not exist] of
 your C  trick   (DATAPTR(x) == DATAPTR(y)).

   

yes, my patch was constrained to the c code, but i don't think it would
be particularly difficult to fix the relevant r-level code as well.  i
did think about it, but didn't want to invest more time in this until
(or unless) someone would respond.  (thanks for the response.)

 Also, for S- and R- backcompatibility reasons, we'd need to
 continue allowing  y = NULL (as your patch would, too), 

only in its current form -- indeed, the (unimplemented) intention was to
detach from the old misdesign, and fix everything so that y=x by default
anywhere.

 so
 currently I think this whole idea -- as slick as it is, I
 learned something!  --  
 does not make sense applying here.
   

i think it does, because the current state is somewhat funny, including
both the difference in performance between var(x) and var(x,x) (with x
being a matrix), and the respective comment in ?var.
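
the three calls in question give the same numbers and differ only in how
much work is done.  a small sketch with made-up data:

```r
set.seed(1)
x <- matrix(rnorm(20), nrow = 5)

v1 <- var(x)             # y omitted
v2 <- var(x, x)          # the same matrix passed twice
v3 <- var(x, y = NULL)   # the documented way to say 'no second matrix'

same_12 <- isTRUE(all.equal(v1, v2))
same_13 <- isTRUE(all.equal(v1, v3))
```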

  the attached patch suggests modifications to src/main/cov.c and
  src/library/stats/man/cor.Rd.

 BTW: since you didn't (and shouldn't , because of method != "pearson" !) 
  change the R code, 

i would suggest it be done, though.

 the docs  \usage{.} part should not have been
  changed either ! 
   

indeed, the change in the docs didn't match what i *have* actually fixed
in the code.

  and as I mentioned: using 'y = NULL' in the function call must
   

*MUST* ?

  continue to work, hence should also be documented as
  possibility
  ==  the docs would not really become more clear, I think 
   

no, of course, without the change in r code having the docs say y=x by
default would be a nonsense.  but again, this was a start, not a
complete modification (and i admit i failed to acknowledge this).

vQ



Re: [Rd] Gamma funtion(s) bug

2009-04-01 Thread Wacek Kusnierczyk
Martin Maechler wrote:

 Using 'bug' (without any qualifying ? or possible ..) 
 in the subject line is still a bit unfriendly...
   


is suggesting that a poster includes 'excel bug' in the subject line [1]
friendly??

vQ



[1] https://stat.ethz.ch/pipermail/r-help/2009-March/190119.html



Re: [Rd] Assignment to string

2009-04-01 Thread Wacek Kusnierczyk
Stavros Macrakis wrote:
 The documentation for assignment says:

  In all the assignment operator expressions, 'x' can be a name or
  an expression defining a part of an object to be replaced (e.g.,
  'z[[1]]').  A syntactic name does not need to be quoted, though it
  can be (preferably by backticks).

 But the implementation allows assignment to a character string (i.e. not a
 name), which it coerces to a name:

  "foo" <- 23; foo
  # returns 23
   is.name("foo")
  [1] FALSE

 Is this a documentation error or an implementation error?
   

i think this concords with the documentation in the sense that in an
assignment a string can work as a name.  note that

`foo bar` = 1
is.name(`foo`)
# FALSE

the issue is different here in that in is.name("foo") "foo" evaluates to
a string (it works as a string literal), while in is.name(`foo`) `foo`
evaluates to the value of the variable named 'foo' (with the quotes
*not* belonging to the name).

with only a quick look at the sources (src/main/envir.c:1511), i guess
the first element to an assignment operator (i mean the left-assignment
operators) is converted to a name, so that in

"foo" <- 1

"foo" evaluates to a string and not a name (hence is.name("foo") is
false), but internally it is sort of 'coerced' to a name, as in

as.name("foo")
# `foo`
is.name(as.name("foo"))
# TRUE
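
put together as a runnable sketch (nothing here goes beyond the calls
already shown in this thread):

```r
"foo" <- 23                  # a string on the left-hand side is accepted
foo                          # ... and the variable foo now exists

str_is_name <- is.name("foo")               # the literal itself is a string
sym_is_name <- is.name(as.name("foo"))      # explicit coercion gives a name

# the parser keeps the string as the first argument of the assignment call
lhs_class <- class(quote("foo" <- 3)[[2]])
```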

 The coercion is not happening at parse time:

 class(quote("foo"<-3)[[2]])
 [1] "character"
   

i think the internal assignment op really receives a string in a case
like "foo" <- 1, it knows it has to treat it as a name without the
parser classifying the string as a name.  (pure guesswork, again.)

the documentation might avoid calling a plain string a 'quoted name',
though, it is confusing.  a quoted name is something like quote(name) or
quote(`name`):

is(quote(name))
# name language

is(quote(`name`))
# name language

but *not* something like "name":
   
is("name")
# character vector data.frameRowLabels

and *not* like quote("name"):
   
is(quote("name"))
# character vector data.frameRowLabels


 In fact, bizarrely, not only does it coerce to a name, it actually
 *modifies* the parse tree:

  gg <- quote("hij" <- 4)
  gg
 "hij" <- 4
  eval(gg)
  gg
 hij <- 4
   

wow!  that's called 'functional programming' ;)
you're right:

gg = quote({a = 1})
is(gg[[2]][[2]])
# character ...
eval(gg)
is(gg[[2]][[2]])
# name ...
  

 *** The cases below only come up with expression trees generated
 programmatically as far as I know, so are much more marginal cases. ***

 The <- operator even allows the left-hand-side to be of length > 1, though
 it just ignores the other elements, with the same side effect as before:
   

that's clear from the sources;  see src/main/envir.c:1521.  it should be
documented (maybe it is, i haven't investigated this issue).

  gg <- quote(x<-44)
  gg[[2]] <- c("x","y")
  gg
 c("x", "y") <- 44
   
  eval(gg)

but also this:

rm(list=ls())
do.call('=', list(letters, 1))
# just fine
a
# 1
b
# error


weird these work.  i think it deserves a warning, at the very least, as in

c('x', 'y') = 4
# error: assignment to non-language object
c(x, y) = 4
# error: could not find function "c<-"

(provided that x and y are already there)

btw., that's what you can do with rvalues (using the otherwise
semantically void operator `:=`).

these could seem equivalent, but they're (obviously) not:

'x' = 1
c('x') = 1

x = 1
c(x) = 1

  x
 [1] 44
  y
 Error: object "y" not found
  gg
 x <- 44

 None of this is documented in ? <-, and it is rather a surprise that
 evaluating an expression tree can modify it.  I admit we had a feature
 (performance hack) like this in MacLisp years ago, where expanded syntax
 macros replaced the source code of the macro, but it was a documented,
 general, and optional part of the macro mechanism.
   

but

- maclisp was designed by computer scientists in a research project,
- r is being implemented by statisticians for practical purposes.

almost every part differs here (and almost no pun intended).

 Another little glitch:

 gg <- quote(x<-44); gg[[2]] <- character(0); eval(gg)
 Error in eval(expr, envir, enclos) :
   'getEncChar' must be called on a CHARSXP

 This looks like an internal error that users shouldn't see.
   

by no means the only example that the interface is no blood-brain barrier.

vQ



Re: [Rd] [R] incoherent conversions from/to raw

2009-03-31 Thread Wacek Kusnierczyk
Martin Maechler wrote:

(...)

 WK which shows that raw won't coerce to the four first types in the
 WK 'hierarchy' (excluding NULL), but it will to character, list, and
 WK expression.

 WK suggestion:   improve the documentation, or adapt the implementation to
 WK a more coherent design.

 Thank you, Wacek.

 I've decided to adapt the implementation
 such that all the above  c(raw , type)  calls' implicit
 coercions will work.
   

great!


 WK (2)
 WK incidentally, there's a bug somewhere there related to the condition
 WK system and printing:

 WK tryCatch(stop(), error=function(e) print(e))
 WK # works just fine

 WK tryCatch(stop(), error=function(e) sprintf('%s', e))
 WK # *** caught segfault ***
 WK # address (nil), cause 'memory not mapped'

 WK # Traceback:
 WK # 1: sprintf("%s", e)
 WK # 2: value[[3]](cond)
 WK # 3: tryCatchOne(expr, names, parentenv, handlers[[1]])
 WK # 4: tryCatchList(expr, classes, parentenv, handlers)
 WK # 5: tryCatch(stop(), error = function(e) sprintf("%s", e))

 WK # Possible actions:
 WK # 1: abort (with core dump, if enabled)
 WK # 2: normal R exit
 WK # 3: exit R without saving workspace
 WK # 4: exit R saving workspace
 WK # Selection:
  
 WK interestingly, it is possible to stay in the session by typing ^C.  the
 WK session seems to work, but if the tryCatch above is tried once again, a
 WK segfault causes r to crash immediately:

 WK # ^C
 WK tryCatch(stop(), error=function(e) sprintf('%s', e))
 WK # [whoe...@wherever] $

 WK however, this doesn't happen if some other code is evaluated first:

 WK # ^C
 WK x = 1:10^8
 WK tryCatch(stop(), error=function(e) sprintf('%s', e))
 WK # Error in sprintf("%s", e) : 'getEncChar' must be called on a CHARSXP
   
 WK this can't be a feature.  (tried in both 2.8.0 and r-devel;  version
 WK info at the bottom.)

 WK suggestion:  trace down and fix the bug.

 [not me, at least not now.]
   

sure;  i might try to find the bug in spare time, but can't promise.


 WK (3)
 WK the error argument to tryCatch is used in two examples in ?tryCatch, but
 WK it is not explained anywhere in the help page.  one can guess that the
 WK argument name corresponds to the class of conditions the handler will
 WK handle, but it would be helpful to have this stated explicitly.  the
 WK help page simply says:

 WK 
 WK If a condition is signaled while evaluating 'expr' then
 WK established handlers are checked, starting with the most recently
 WK established ones, for one matching the class of the condition.
 WK When several handlers are supplied in a single 'tryCatch' then the
 WK first one is considered more recent than the second. 
 WK 

 WK which is uninformative in this respect -- what does 'one matching the
 WK class' mean?

 WK suggestion:  improve the documentation.
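
a small sketch of what 'matching the class' amounts to in practice:  the
name of a handler argument is compared against the class vector of the
signalled condition.  the class 'myCondition' is made up for illustration,
and conditionMessage() is used to get at the text of a condition:

```r
# 'error' selects conditions that inherit from class "error"
r1 <- tryCatch(stop("boom"), error = function(e) conditionMessage(e))

# a warning is not an error, so only the 'warning' handler applies here
r2 <- tryCatch(warning("careful"),
               error   = function(e) "handled as error",
               warning = function(w) "handled as warning")

# a custom class works the same way: the handler name is matched
# against class(cond)
cond <- structure(class = c("myCondition", "condition"),
                  list(message = "custom", call = NULL))
r3 <- tryCatch(signalCondition(cond),
               myCondition = function(c) class(c)[1])
```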

 Patches to  tryCatch.Rd  are gladly accepted
 and quite possibly applied to the sources without much changes.
   

ok, if you're willing to accept my suggestions i can try to suggest a
patch to the rd.


 Thanks in advance!
   

you're welcome.

best,
vQ



Re: [Rd] duplicated.data.frame {was [R] which rows are duplicates?}

2009-03-31 Thread Wacek Kusnierczyk
Martin Maechler wrote:

 WK what the documentation *fails* to tell you is that the parameter
 WK 'incomparables' is defunct

 No, not defunct, but the contrary of it,
 not yet implemented !
   

that's my bad english, again.  sorry.

 WK # data as above, or any data frame
 WK duplicated(data, incomparables=NA)
 WK # Error in if (!is.logical(incomparables) || incomparables)
 WK .NotYetUsed("incomparables != FALSE") :
 WK #   missing value where TRUE/FALSE needed

 WK the error message here is *confusing*.  
 yes!
   

!

 WK the error is raised because the
 WK author of the code made a mistake and apparently haven't carefully
 ((plural or singular ??))
   

i guess hasn't was intended.  i'd need to ask the author.

 WK examined and tested his product;  the code goes:
 ((aah, ... singular ...))
   

my guesswork, anyway.

 WK duplicated.data.frame
 WK # function (x, incomparables = FALSE, fromLast = FALSE, ...)
 WK # {
 WK #    if (!is.logical(incomparables) || incomparables)
 WK #        .NotYetUsed("incomparables != FALSE")
 WK #    duplicated(do.call(paste, c(x, sep = "\r")), fromLast = fromLast)
 WK # }
 WK # <environment: namespace:base>

 WK clearly, the intention here is to raise an error with a (still hardly
 WK clear) message as in:

 WK .NotYetUsed("incomparables != FALSE")
 WK # Error: argument 'incomparables != FALSE' is not used (yet)

 WK but instead, if(NA) is evaluated (because '!is.logical(NA) || NA'
 WK evaluates, *obviously*, to NA) and hence the uninformative error message.
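
the NA can be reproduced in isolation.  the sketch below also shows one
possible, more defensive test;  the helper name 'needs_notice' is made up
and this is not necessarily how the code in base was eventually fixed:

```r
incomparables <- NA
cond <- !is.logical(incomparables) || incomparables
cond   # NA, so 'if (cond)' fails with "missing value where TRUE/FALSE needed"

# hypothetical alternative: anything other than a plain FALSE triggers the
# 'not yet used' notice, and the result is never NA
needs_notice <- function(incomparables) !identical(incomparables, FALSE)
```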

 WK take home point:  rtfm, *but* don't believe it.

 and then be helpful to the R community and send a bug report
 *with* a patch if {as in this case} you are able to...

 Well, that' no longer needed here,
 I'll fix that easily myself.
   

but i *have* sent a patch already!

vQ



Re: [Rd] as.data.frame peculiarities

2009-03-31 Thread Wacek Kusnierczyk
Stavros Macrakis wrote:
 The documentation of as.data.frame is not explicit about how it generates
 column names for the simple vector case, but it seems to use the character
 form of the quoted argument, e.g.

 names(as.data.frame(1:3))
 [1] "1:3"

 But there is a strange case:

 names(as.data.frame(c("a")))
 [1] "if (stringsAsFactors) factor(x) else x"

   

gosh!  you don't even need the c():

names(as.data.frame(''))
# same as above

i thought you don't even need the '', but then you're served with the
following highly informative message:

names(as.data.frame())
# Error in as.data.frame() :
#   element 1 is empty;
#the part of the args list of 'is.null' being evaluated was:
#(x)
   
which actually comes from as.data.frame().


 I feel fairly comfortable calling this a bug, though there is no explicit
 specification.
   

maybe there is none so that it can always be claimed that you deal with
an intentional, but not (yet) documented feature, rather than a bug.

let's investigate this feature.  in

names(as.data.frame('a'))

as.data.frame is generic, 'a' is character, thus
as.data.frame.character(x, ...)  is called with x = 'a'.  here's  the
code for as.data.frame.character:

function (x, ..., stringsAsFactors = default.stringsAsFactors())
as.data.frame.vector(if (stringsAsFactors) factor(x) else x, ...)

and the as.data.frame.vector it calls:

function (x, row.names = NULL, optional = FALSE, ...)
{
    nrows <- length(x)
    nm <- paste(deparse(substitute(x), width.cutoff = 500L),
        collapse = " ")
    if (is.null(row.names)) {
        if (nrows == 0L)
            row.names <- character(0L)
        else if (length(row.names <- names(x)) == nrows &&
            !any(duplicated(row.names))) {
        }
        else row.names <- .set_row_names(nrows)
    }
    names(x) <- NULL
    value <- list(x)
    if (!optional)
        names(value) <- nm
    attr(value, "row.names") <- row.names
    class(value) <- "data.frame"
    value
}

watch carefully:  nm = paste(deparse(substitute(x), width.cutoff=500L),
collapse=' '), and substitute(x) is the unevaluated expression that
as.data.frame.character passed in, that is:

nm = 'if (stringsAsFactors) factor(x) else x'


x = factor('a'), row.names==NULL, names(x)==NULL, and nrows = 1, and
thus row.names = .set_row_names(1) = c(NA, -1)  (interesting; see
.set_row_names).

and then we have:

x = factor('a') # the input
names(x) = NULL
value = list(x) # value == list(factor('a'))
names(value) = 'if (stringsAsFactors) factor(x) else x' # the value of nm
attr(value, 'row.names') = c(NA, -1) # the value of row.names
class(value) = 'data.frame'
value

here you go:  as some say, the answer is always in the code.  that's how
ugly hacks with deparse/substitute lead r core developers to produce
ugly bugs.  very useful, indeed.
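
a minimal sketch of the same mechanism outside of as.data.frame (the names label_of and wrapper are made up for illustration, they are not part of base r):  substitute() captures whatever expression the immediate caller wrote, so a wrapper that passes an expression instead of a plain variable leaks that expression into the label.

```r
# hypothetical helper: builds a label the way as.data.frame.vector does
label_of <- function(x) {
    paste(deparse(substitute(x), width.cutoff = 500L), collapse = " ")
}

# hypothetical wrapper: passes an expression, not a plain variable
wrapper <- function(x, flag = TRUE) {
    label_of(if (flag) toupper(x) else x)
}

direct <- label_of(1:3)   # the expression typed by the user
leaked <- wrapper("a")    # the wrapper's internal expression

print(direct)
print(leaked)
```

calling the helper directly gives the label the user would expect;  going through the wrapper gives the wrapper's own if/else text, which is what happens between as.data.frame.character and as.data.frame.vector.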
   

 There is another strange case which I don't understand.

 The specification of 'optional' is:

optional: logical. If 'TRUE', setting row names and converting column
   names (to syntactic names: see 'make.names') is optional.

 I am not sure what this means and why it is useful.  In practice, it seems
 to produce a structure of class data.frame which exhibits some very odd
 behavior:

   
 d <- as.data.frame(c("a"),optional=TRUE)
 class(d)
 [1] "data.frame"
 d
   structure("a", class = "AsIs")    # where does this column name come from?
 1                              a
   

gosh...  rtfc, again; code as above, but this time optional=TRUE so
names(value) = nm does not apply:

x = factor('a') # the input
names(x) = NULL
value = list(x) # value == list(factor('a'))
attr(value, 'row.names') = c(NA, -1) # the value of row.names
class(value) = 'data.frame'
value

here you go.


 names(d)
 NULL    # not from names()
   

yes, because it was explicitly set to NULL, second line above.

 dput(d)
 structure(list(structure(1L, .Label = "a", class = "factor")), row.names = c(NA,
 -1L), class = "data.frame")    # and it doesn't show up in dput
   

yes, because there are no names there!  it's format.data.frame, called
from print.data.frame, called from print(value), that makes up this
column name;  rtfc.

seems like there's a need for post-implementation design.


for the desserts, here's another curious, somewhat related example:

data = data.frame(1)
row.names(data) = TRUE
data
#  X1
# TRUE  1
   
as.data.frame(1, row.names=TRUE)
# Error in attr(value, "row.names") <- row.names :
#   row names must be 'character' or 'integer', not 'logical'

probably not a bug, because ?as.data.frame says:


row.names: 'NULL' or a character vector giving the row names for the
  data frame.  Missing values are not allowed.


so it's rather a design flaw.  much harder to fix in r.


best,
vQ


Re: [Rd] duplicated.data.frame {was [R] which rows are duplicates?}

2009-03-31 Thread Wacek Kusnierczyk
Martin Maechler wrote:

  
  and then be helpful to the R community and send a bug report
  *with* a patch if {as in this case} you are able to...
  
  Well, that's no longer needed here,
  I'll fix that easily myself.
  

 WK but i *have* sent a patch already!

 Ok, I believe you.  But I think you did not mention that during
 this thread, ... and/or I must have overlooked your patch.

 In any case the problem is now solved
 [well, a better solution of course would add the not-yet
  functionality..]; 
 thank you for the contribution.
   

i attach the patch post for reference.  note that you need to fix all of
the functions in duplicated.R that share the buggy code.  (yes, this was
another thread;  i submitted a bug report, and then sent a follow-up
post with a patch).

vQ



Re: [Rd] duplicated fails to rise correct errors (PR#13632)

2009-03-30 Thread Wacek Kusnierczyk
the bug seems to have a trivial solution;  as far as i can see, it suffices to 
replace

if (!is.logical(incomparables) || incomparables)

with

if(!identical(incomparables, FALSE))

in all its occurrences in src/library/base/R/duplicated.R
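
a minimal sketch of why the two conditions differ (old_check and new_check are illustrative names, not functions from duplicated.R):  with a logical NA the old condition evaluates to NA, which if() cannot handle, while identical() always yields a single TRUE or FALSE.

```r
# hypothetical wrappers around the two conditions discussed above
old_check <- function(incomparables) !is.logical(incomparables) || incomparables
new_check <- function(incomparables) !identical(incomparables, FALSE)

old_false <- old_check(FALSE)        # FALSE: the default passes through
old_na    <- old_check(NA)           # NA: if (NA) then fails before .NotYetUsed
new_false <- new_check(FALSE)        # FALSE: the default still passes through
new_na    <- new_check(NA)           # TRUE: .NotYetUsed would be reached
new_vec   <- new_check(c(FALSE, TRUE))  # TRUE: longer vectors are caught as well

print(c(old_false, old_na, new_false, new_na, new_vec))
```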

attached is a patch created, successfully tested and installed on Ubuntu 8.04 
Linux 32 bit as follows:

svn co https://svn.r-project.org/R/trunk trunk
cd trunk
# edit src/library/base/R/duplicated.R
svn diff > duplicated.R.diff

svn revert -R src
patch -p0 < duplicated.R.diff
tools/rsync-recommended
./configure
make
make check

and now

duplicated(data.frame(), incomparables=NA)
# error: argument 'incomparables != FALSE' is not used (yet)

regards,
vQ



waclaw.marcin.kusnierc...@idi.ntnu.no wrote:
 Full_Name: Wacek Kusnierczyk
 Version: 2.8.0 and 2.10.0 r48242
 OS: Ubuntu 8.04 Linux 32 bit
 Submission from: (NULL) (129.241.110.161)


 In the following code:

duplicated(data.frame(), incomparables=NA)
# Error in if (!is.logical(incomparables) || incomparables) .NotYetUsed("incomparables != FALSE") :
# missing value where TRUE/FALSE needed

 the raised error is clearly not the one intended to be raised.

 ?duplicated says:

 
 incomparables: a vector of values that cannot be compared. 'FALSE' is a
   special value, meaning that all values can be compared, and
   may be the only value accepted for methods other than the
   default.  It will be coerced internally to the same type as
   'x'.

 (...)

  Values in 'incomparables' will never be marked as duplicated. This
  is intended to be used for a fairly small set of values and will
  not be efficient for a very large set.
 

 However, in duplicated.data.frame (which is called when duplicated is applied
 to a data frame, as above) the parameter 'incomparables' is defunct.  The
 documentation fails to explain this, and it might be a good idea to improve it.

 In the code for duplicated.data.frame there is an attempt to intercept any use
 of the parameter 'incomparables' with a value other than FALSE and to raise an
 appropriate error, but this attempt fails with, e.g., incomparables=NA.

 Incidentally, the attempt to intercept incomparables != FALSE fails completely
 (i.e., the call to duplicated succeeds) with certain inputs:

duplicated(data.frame(logical=c(TRUE, TRUE)), incomparables=c(FALSE, TRUE))
# [1] FALSE TRUE

 while

duplicated(c(TRUE, TRUE), incomparables=c(FALSE, TRUE))
# [1] FALSE FALSE


 Regards,
 vQ

   


-- 
---
Wacek Kusnierczyk, MD PhD

Email: w...@idi.ntnu.no
Phone: +47 73591875, +47 72574609

Department of Computer and Information Science (IDI)
Faculty of Information Technology, Mathematics and Electrical Engineering (IME)
Norwegian University of Science and Technology (NTNU)
Sem Saelands vei 7, 7491 Trondheim, Norway
Room itv303

Bioinformatics  Gene Regulation Group
Department of Cancer Research and Molecular Medicine (IKM)
Faculty of Medicine (DMF)
Norwegian University of Science and Technology (NTNU)
Laboratory Center, Erling Skjalgsons gt. 1, 7030 Trondheim, Norway
Room 231.05.060

---

Index: src/library/base/R/duplicated.R
===
--- src/library/base/R/duplicated.R	(revision 48242)
+++ src/library/base/R/duplicated.R	(working copy)
@@ -25,7 +25,7 @@
 
 duplicated.data.frame <- function(x, incomparables = FALSE, fromLast = FALSE, ...)
 {
-if(!is.logical(incomparables) || incomparables)
+if (!identical(incomparables, FALSE))
 	.NotYetUsed("incomparables != FALSE")
 duplicated(do.call(paste, c(x, sep="\r")), fromLast = fromLast)
 }
@@ -33,7 +33,7 @@
 duplicated.matrix <- duplicated.array <-
 function(x, incomparables = FALSE , MARGIN = 1L, fromLast = FALSE, ...)
 {
-if(!is.logical(incomparables) || incomparables)
+if (!identical(incomparables, FALSE))
 	.NotYetUsed("incomparables != FALSE")
 ndim <- length(dim(x))
 if (length(MARGIN) > ndim || any(MARGIN > ndim))
@@ -67,7 +67,7 @@
 
 unique.data.frame <- function(x, incomparables = FALSE, fromLast = FALSE, ...)
 {
-if(!is.logical(incomparables) || incomparables)
+if (!identical(incomparables, FALSE))
 	.NotYetUsed("incomparables != FALSE")
 x[!duplicated(x, fromLast = fromLast),  , drop = FALSE]
 }
@@ -75,7 +75,7 @@
 unique.matrix <- unique.array <-
 function(x, incomparables = FALSE , MARGIN = 1, fromLast = FALSE, ...)
 {
-if(!is.logical(incomparables) || incomparables)
+if (!identical(incomparables, FALSE))
 	.NotYetUsed("incomparables != FALSE")
 ndim <- length(dim(x))
 if (length(MARGIN) > 1L || any(MARGIN > ndim

Re: [Rd] [R] [.data.frame and lapply

2009-03-28 Thread Wacek Kusnierczyk
Romain Francois wrote:
 Wacek Kusnierczyk wrote:
 redirected to r-devel, because there are implementational details of
 [.data.frame discussed here.  spoiler: at the bottom there is a fairly
 interesting performance result.

 Romain Francois wrote:
  
 Hi,

 This is a bug I think. [.data.frame treats its arguments differently
 depending on the number of arguments.
 

 you might want to hesitate a bit before you say that something in r is a
 bug, if only because it drives certain people mad.  r is a carefully
 tested software, and [.data.frame is such a basic function that if what
 you talk about were a bug, it wouldn't have persisted until now.
   
 I did hesitate, and would be prepared to look the other way if someone
 shows me proper evidence that this makes sense.

  d <- data.frame( x = 1:10, y = 1:10, z = 1:10 )
  d[ j=1 ]
x  y  z
 1   1  1  1
 2   2  2  2
 3   3  3  3
 4   4  4  4
 5   5  5  5
 6   6  6  6
 7   7  7  7
 8   8  8  8
 9   9  9  9
 10 10 10 10

 If a single index is supplied, it is interpreted as indexing the list
 of columns. Clearly this does not happen here, and this is because
 NextMethod gets confused.

obviously.  it seems that there is a bug here, and that it results from
the lack of clear design specification.


 I have not looked at your implementation in detail, but it misses array
 indexing, as in:

yes;  i didn't take it into consideration, but (still without detailed
analysis) i guess it should not be difficult to extend the code to
handle this.




  d <- data.frame( x = 1:10, y = 1:10, z = 1:10 )
  m <- cbind( 5:7, 1:3 )
  m
      [,1] [,2]
 [1,]    5    1
 [2,]    6    2
 [3,]    7    3
  d[m]
 [1] 5 6 7
  subdf( d, m )
 Error in subdf(d, m) : undefined columns selected

this should be easy to handle by checking if i is a matrix and then
indexing by its first column as i and the second as j.


 Matrix indexing using '[' is not recommended, and barely
 supported.  For extraction, 'x' is first coerced to a matrix. For
 replacement a logical matrix (only) can be used to select the
 elements to be replaced in the same way as for a matrix.

yes, here's how it's done (original comment):

if(is.matrix(i))
return(as.matrix(x)[i])  # desperate measures

and i can easily add this to my code, at virtually no additional expense.

it's probably not a good idea to convert x to a matrix, x would often be
much more data than the index matrix m, so it's presumably much more
efficient, on average, to fiddle with i instead.

there are some potentially confusing issues here:

m = cbind(8:10, 1:3)
   
d[m]
# 3-element vector, as you could expect

d[t(m)]
# 6-element vector

t(m) has dimensionality inappropriate for matrix indexing (it has 3
columns), so it gets flattened into a vector;  however, it does not work
like in the case of a single vector index where columns would be selected:

d[as.vector(t(m))]
# error: undefined columns selected

i think it would be more appropriate to raise an error in a case like
d[t(m)].

furthermore, if a matrix is used in a two-index form, the matrix is
flattened again and is used to select rows (not elements, as in
d[t(m)]).  note also that the help page says that for extraction, 'x'
is first coerced to a matrix.  it fails to explain that if *two*
indices are used of which at least one is a matrix, no coercion is
done.  that is, the matrix is again flattened into a vector, but here
[.data.frame forgets that it was a matrix (unlike in d[t(m)]):

is(d[m])
# a character vector, matrix indexing

is(d[t(m)])
# a character vector, vector indexing of elements, not columns

is(d[m,])
# a data frame, row indexing
   
and finally, the fact that d[m] converts x (i.e., d) to a matrix
before the indexing means that the values in some columns of
d may get coerced to another type:

d[,2] = as.character(d[,2])
is(d[,1])
# integer vector
is(d[,2])
# character vector

is(d[1:2, 1])
# integer vector
is(d[cbind(1:2, 1)])
# character vector


for what it's worth, i think matrix indexing of data frames should be
dropped:

d[m]
# error: ...

 and if one needs it, it's as simple as

as.matrix(d)[m]

where the conversion of d to a matrix is explicit.
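
a small sketch of the implicit conversion discussed above, using only base r (the variable names are illustrative, and the printed classes reflect the current '[.data.frame', which may change):

```r
d <- data.frame(x = 1:10, y = 1:10, z = 1:10)
m <- cbind(8:10, 1:3)            # (row, column) pairs

implicit <- d[m]                 # '[.data.frame' converts d to a matrix first
explicit <- as.matrix(d)[m]      # the same conversion, spelled out

d2 <- d
d2$y <- as.character(d2$y)       # one non-numeric column
by_rc  <- d2[1:2, 1]             # two-index form keeps the column's own type
by_mat <- d2[cbind(1:2, 1)]      # matrix index goes through as.matrix(d2)

print(identical(implicit, explicit))
print(class(by_rc))
print(class(by_mat))
```

with the explicit as.matrix(d)[m] the reader at least sees where the coercion of the integer column to character comes from.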

on the side, [.data.frame is able to index matrices:

'[.data.frame'(as.matrix(d), m)
# same as as.matrix(d)[m]

which is, so to speak, nonsense, since '[.data.frame' is designed
specifically to handle data frames;  i'd expect an error to be raised
here (or a warning, at the very least).

to summarize, the fact that subdf does not handle matrix indices is not
an issue.  anyway, thanks for the comment!

best,
vQ



Re: [Rd] [R] [.data.frame and lapply

2009-03-27 Thread Wacek Kusnierczyk
redirected to r-devel, because there are implementational details of
[.data.frame discussed here.  spoiler: at the bottom there is a fairly
interesting performance result.

Romain Francois wrote:

 Hi,

 This is a bug I think. [.data.frame treats its arguments differently
 depending on the number of arguments.

you might want to hesitate a bit before you say that something in r is a
bug, if only because it drives certain people mad.  r is a carefully
tested software, and [.data.frame is such a basic function that if what
you talk about were a bug, it wouldn't have persisted until now.

treating the arguments differently depending on their number is actually
(if clearly...) documented:  if there is one index (the 'i'), it selects
columns.  if there are two, 'i' selects rows.

however, not all seems fine, there might be a design flaw:

# dummy data frame
d = structure(names=paste('col', 1:3, sep='.'),
data.frame(row.names=paste('row', 1:3, sep='.'),
   matrix(1:9, 3, 3)))

d[1:2]
# correctly selects two first columns
# 1:2 passed to [.data.frame as i, no j given

d[,1:2]
# correctly selects two first columns
# 1:2 passed to [.data.frame as j, i given the missing argument value
# (note the comma)

d[,i=1:2]
# correctly selects two first rows
# 1:2 passed to [.data.frame as i, j given the missing argument value
# (note the comma)

d[j=1:2,]
# correctly selects two first columns
# 1:2 passed to [.data.frame as j, i given the missing argument value
# (note the comma)

d[i=1:2]
# correctly (arguably) selects the first two columns
# 1:2 passed to [.data.frame as i, no j given
  
d[j=1:2]
# wrong: returns the whole data frame
# does not recognize the index as i because it is explicitly named 'j'
# does not recognize the index as j because there is only one index

i say this *might* be a design flaw because it's hard to judge what the
design really is.  the r language definition (!) [1, sec. 3.4.3 p. 18] says:

   The most important example of a class method for [ is that used for
   data frames. It is not be described in detail here (see the help page
   for [.data.frame, but in broad terms, if two indices are supplied
   (even if one is empty) it creates matrix-like indexing for a structure
   that is basically a list of vectors of the same length. If a single
   index is supplied, it is interpreted as indexing the list of columns --
   in that case the drop argument is ignored, with a warning.

it does not say what happens when only one *named* index argument is
given.  from the above, it would indeed seem that there is a *bug*
here:  in the last example above only one index is given, and yet
columns are not selected, even though the *language definition* says
they should.  (so it's not a documented feature, it's a
contra-definitional misfeature -- a bug?)

somewhat on the side, the 'matrix-like indexing' above is fairly
misleading;  just try the same patterns of indexing -- one index, two
indices, named indices -- on a data frame and a matrix of the same shape:

m = matrix(1:9, 3, 3)
md = data.frame(m)

md[1]
# the first column
m[1]
# the first element (i.e., m[1,1])

md[,i=3]
# third row
m[,i=3]
# third column


the quote above refers to ?'[.data.frame' for details.
unfortunately, the help page is a lump of explanations for various
'['-like operators, and it is *not* a definition of any sort.  it does
not provide much more detail on '[.data.frame' -- it hardly serves as a
design specification.  in particular, it does not explain the issue of
named arguments to '[.data.frame' at all.


 `[.data.frame` only is called with two arguments in the second case, so
 the following condition is true:

 if(Narg < 3L) {  # list-like indexing or matrix indexing

 And then, the function assumes the argument it has been passed is i, and
 eventually calls NextMethod("[") which I think calls
 `[.listof`(x,i,...), since i is missing in `[.data.frame` it is not
 passed to `[.listof`, so you have something equivalent to as.list(d)[].

 I think we can replace the condition with this one:

 if(Narg < 3L && !has.j) {  # list-like indexing or matrix indexing

 or this:

 if(Narg < 3L) {  # list-like indexing or matrix indexing
    if(has.j) i <- j


indeed, for a moment i thought a trivial fix somewhere there would
suffice.  unfortunately, the code for [.data.frame [2, lines 500-641] is
so clean and readable that i had to give up reading it, forget fixing. 
instead, i wrote a new version of '[.data.frame' from scratch.  it
fixes (or at least seems to fix, as far as my quick assessment goes) the
problem.  the function subdf (see the attached dataframe.r) is the new
version of '[.data.frame':

# dummy data frame
d = structure(names=paste('col', 1:3, sep='.'),
data.frame(row.names=paste('row', 1:3, sep='.'),
   matrix(1:9, 3, 3)))

d[j=1:2]
# incorrect: the whole data frame

subdf(d, 

Re: [Rd] typo in sprintf format string segfaults R

2009-03-26 Thread Wacek Kusnierczyk
Sklyar, Oleg (London) wrote:
 typo as simple as %S instead of %s segfaults R devel:
   

not exactly:

sprintf('%S', 'aa')
# error: unrecognised format at end of string

without a segfault.  but with another format specifier behind, it will
cause a segfault.

interestingly, here's again the same problem i have reported recently: 
that you are given a number of options for how to leave the session, but
you can type ^c and stay in a semi-working session.  (and the next
execution of the above will  then cause a segfault with immediate exit.)

vQ



Re: [Rd] [R] variance/mean

2009-03-24 Thread Wacek Kusnierczyk
William Dunlap wrote:
 Doesn't Fortran still require that the arguments to
 a function not alias each other (in whole or in part)?
   

what do you mean?  the following works pretty fine:

echo '
program foo
implicit none

integer, target :: a = 1
integer, pointer :: p1, p2, p3
integer :: gee

p1 => a
p2 => a
p3 => a
write(*,*) p1, p2, p3
call bar (p1, p2, p3)
write(*,*) p1, p2, p3
a = gee(p1, p2, p3)
write(*,*) p1, p2, p3
  
end program foo

subroutine bar (p1, p2, p3)
integer :: p1, p2, p3
p3 = p1 + p2
end subroutine bar

function gee(p1, p2, p3)
integer :: p1, p2, p3, gee
p3 = p1 + p2
gee = p3
return
end function gee

' > foo.f95

gfortran foo.f95 -o foo
./foo
# 1 1 1
# 2 2 2
# 4 4 4

clearly, p1, p2, and p3 are aliases of each other, and there is an
assignment made in both the subroutine and the function.  have i
misunderstood what you said?

vQ



[Rd] incoherent treatment of NULL

2009-03-23 Thread Wacek Kusnierczyk
somewhat related to a previous discussion [1] on how 'names<-' would
sometimes modify its argument in place, and sometimes produce a modified
copy without changing the original, here's another example of how it
becomes visible to the user when r makes or doesn't make a copy of an
object:

x = NULL
dput(x)
# NULL
class(x) = 'integer'
# error: invalid (NULL) left side of assignment

x = c()
dput(x)
# NULL
class(x) = 'integer'
dput(x)
# integer(0)

in both cases, x ends up with the value NULL (the no-value object).  in
both cases, dput explains that x is NULL.  in both cases, an attempt is
made to make x be an empty integer vector.  the first fails, because it
tries to modify NULL itself, the latter apparently does not and succeeds.

however, the following has a different pattern:

x = NULL
dput(x)
# NULL
names(x) = character(0)
# error: attempt to set an attribute on NULL

x = c()
dput(x)
# NULL
names(x) = character(0)
# error: attempt to set an attribute on NULL

and also:

x = c()
class(x) = 'integer'
# fine
class(x) = 'foo'
# error: attempt to set an attribute on NULL

how come?  the behaviour can obviously be explained by looking at the
source code (hardly surprisingly, because it is as it is because the
source is as it is), and referring to the NAMED property (i.e., the
sxpinfo.named field of a SEXPREC struct).  but can the *design* be
justified?  can the apparent incoherences visible above the interface be
defended? 

why should the first example above be unable to produce an empty integer
vector? 

why is it possible to set a class attribute, but not a names attribute,
on c()? 

why is it possible to set the class attribute in c() to 'integer', but
not to 'foo'? 

why are there different error messages for apparently the same problem?
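
a short sketch for reproducing the class part of the puzzle;  as the follow-ups in this thread show, the exact outcome depends on the r version, so treat the comments as what a recent release is expected to do rather than as a specification:

```r
x <- NULL
same_object <- identical(x, c())   # c() is just NULL
class(x) <- "integer"              # a basic type name: x gets coerced
coerced <- x

y <- NULL
res <- tryCatch({ class(y) <- "foo"; "ok" },      # not a basic type name
                error = function(e) conditionMessage(e))

print(same_object)
print(coerced)
print(res)
```

catching the condition with tryCatch keeps the script running and shows which error message a given r build produces.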


vQ


[1] search the rd archives for 'surprising behaviour of names<-'



Re: [Rd] incoherent treatment of NULL

2009-03-23 Thread Wacek Kusnierczyk
Martin Maechler wrote:
 WK == Wacek Kusnierczyk waclaw.marcin.kusnierc...@idi.ntnu.no

 
 WK somewhat related to a previous discussion [1] on how 'names<-' would
 WK sometimes modify its argument in place, and sometimes produce a modified
 WK copy without changing the original, here's another example of how it
 WK becomes visible to the user when r makes or doesn't make a copy of an
 WK object:

 WK x = NULL
 WK dput(x)
 WK # NULL
 WK class(x) = 'integer'
 WK # error: invalid (NULL) left side of assignment

 does not happen for me in R-2.8.1,  R-patched or newer

 So you must be using your own patched version of  R ?
   

oops, i meant to use 2.8.1 or devel for testing.  you're right, in this
example there is no error reported in  2.8.0, but see below.


 WK x = c()
 WK dput(x)
 WK # NULL
 WK class(x) = 'integer'
 WK dput(x)
 WK # integer(0)

 WK in both cases, x ends up with the value NULL (the no-value object).  in
 WK both cases, dput explains that x is NULL.  in both cases, an attempt is
 WK made to make x be an empty integer vector.  the first fails, because it
 WK tries to modify NULL itself, the latter apparently does not and succeeds.

 WK however, the following has a different pattern:

 WK x = NULL
 WK dput(x)
 WK # NULL
 WK names(x) = character(0)
 WK # error: attempt to set an attribute on NULL
   

i get the error in devel.


 WK x = c()
 WK dput(x)
 WK # NULL
 WK names(x) = character(0)
 WK # error: attempt to set an attribute on NULL
   

i get the error in devel.

 WK and also:

 WK x = c()
 WK class(x) = 'integer'
 WK # fine
 WK class(x) = 'foo'
 WK # error: attempt to set an attribute on NULL
   

i get the error in devel.

it doesn't seem coherent to me:  why can i set the class, but not names
attribute on both NULL and c()?  why can i set the class attribute to
'integer', but not to 'foo', as i could on a non-empty vector:

x = 1
class(x) = 'foo'
# just fine

i'd naively expect to be able to create an empty vector classed 'foo',
displayed perhaps as

# speculation
x = NULL
class(x) = 'foo'
x
# foo(0)

or maybe as

x
# NULL
# attr(,"class")
# [1] "foo"

vQ



Re: [Rd] incoherent treatment of NULL

2009-03-23 Thread Wacek Kusnierczyk
Martin Maechler wrote:

  more verbosely,  all NULL objects in R are identical, or as the
  help page says, there's only ``*The* NULL Object'' in R,
  i.e., NULL cannot get any attributes.
  

 WK yes, but that's not the issue.  the issue is that names(x)<- seems to
 WK try to attach an attribute to NULL, while it could, in principle, do the
 WK same as class(x)<-, i.e., coerce x to some type (and hence attach the
 WK name attribute not to NULL, but to the coerced-to object).

 yes, it could;  but really, the  fact that  'class<-' works is
 the exception.  The other variants (with the error message) are
 the rule.
   

ok.

 Also, note (here and further below),
 that using   class(.) <-  className
 is an S3 idiom   and S3 classes  ``don't really exist'',
 the class attribute being a useful hack,
 and many of us would rather like to work and improve working
 with S4 classes (& generics & methods) than to fiddle with  'class<-'.

 In S4, you'd  use  setClass(.), new(.) and  setAs(.),
 typically, for defining and changing classes of objects.
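
A minimal sketch of that S4 route, using only setClass(), new(), setAs() and as() from the methods package; the class names Celsius and Fahrenheit are invented for illustration and are not from the thread:

```r
library(methods)

# two hypothetical classes with a single numeric slot each
setClass("Celsius", representation(value = "numeric"))
setClass("Fahrenheit", representation(value = "numeric"))

# an explicit coercion method replaces fiddling with the class attribute
setAs("Celsius", "Fahrenheit", function(from) {
    new("Fahrenheit", value = from@value * 9 / 5 + 32)
})

boiling_c <- new("Celsius", value = 100)
boiling_f <- as(boiling_c, "Fahrenheit")

print(is(boiling_f, "Fahrenheit"))
print(boiling_f@value)
```

Here the change of class is an explicit, user-defined conversion, whereas class(x) <- "Fahrenheit" would merely relabel the object.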

 But maybe I have now led you into a direction I will later
 regret, 
 
 when you start telling us about the perceived inconsistencies of
 S4 classes, methods, etc.
 BTW: If you go there, please do use  R 2.9.0 (or newer)
   

using latest r-devel for the most part.

i think you will probably not regret your words;  from what i've seen
already, s4 classes are the last thing i'd ever try to learn in r.  but
yes, there would certainly be lots of issues to complain about.  i'll
rather wait for s5.

regards,
vQ



Re: [Rd] dput(as.list(function...)...) bug

2009-03-23 Thread Wacek Kusnierczyk
Stavros Macrakis wrote:
 Tested in R 2.8.1 Windows

   
 ff <- formals(function(x)1)
 ff1 <- as.list(function(x)1)[1]
 # ff1 acts the same as ff in the examples below, but is a list rather
 # than a pairlist

 dput( ff , control=c("warnIncomplete"))
 
 list(x = )

 This string is not parsable, but dput does not give a warning as specified.

   

same in 2.10.0 r48200, ubuntu 8.04 linux 32 bit


 dput( ff , control=c("all","warnIncomplete"))
 list(x = quote())
   

likewise.

 This string is parseable, but quote() is not evaluable, and again dput
 does not give a warning as specified.

 In fact, I don't know how to write out ff$x.  It appears to be the
 zero-length name:

 is.name(ff$x) = TRUE
 as.character(ff$x) = ""

 but there is no obvious way to create such an object:

 as.name("") = execution error
 quote(``) = parse error

 The above examples should either produce a parseable and evaluable
 output (preferable), or give a warning.
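
a short sketch that reproduces the observations quoted above with base r only (the variable names are illustrative):

```r
ff <- formals(function(x) 1)

is_sym  <- is.name(ff$x)            # the default of x is a symbol ...
as_text <- as.character(ff$x)       # ... whose name is the empty string

# building such a symbol from a string is refused
attempt <- tryCatch(as.name(""), error = function(e) "error")

print(is_sym)
print(nchar(as_text))
print(attempt)
```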
   

interestingly,

quote(NULL)
# NULL

as.name(NULL)
# Error in as.name(NULL) :
#  invalid type/length (symbol/0) in vector allocation

yuck.

vQ

 -s

 PS As a matter of comparative linguistics, many versions of Lisp allow
 zero-length symbols/names.  But R coerces strings to symbols/names in
 a way that Lisp does not, so that might be an invitation to obscure
 bugs in R where it is rarely problematic in Lisp.

 PPS dput(pairlist(23),control="all") also gives the same output as
 dput(list(23),control="all"), but as I understand it, pairlists will
 become non-user-visible at some point.

   


-- 
---
Wacek Kusnierczyk, MD PhD

Email: w...@idi.ntnu.no
Phone: +47 73591875, +47 72574609

Department of Computer and Information Science (IDI)
Faculty of Information Technology, Mathematics and Electrical Engineering (IME)
Norwegian University of Science and Technology (NTNU)
Sem Saelands vei 7, 7491 Trondheim, Norway
Room itv303

Bioinformatics & Gene Regulation Group
Department of Cancer Research and Molecular Medicine (IKM)
Faculty of Medicine (DMF)
Norwegian University of Science and Technology (NTNU)
Laboratory Center, Erling Skjalgsons gt. 1, 7030 Trondheim, Norway
Room 231.05.060



Re: [Rd] [R] variance/mean

2009-03-23 Thread Wacek Kusnierczyk

(this post suggests a patch to the sources, so i allow myself to divert
it to r-devel)

Bert Gunter wrote:
 x a numeric vector, matrix or data frame. 
 y NULL (default) or a vector, matrix or data frame with compatible
 dimensions to x. The default is equivalent to y = x (but more efficient). 

   
bert points to an interesting fragment of ?var:  it suggests that
computing var(x) is more efficient than computing var(x,x), for any x
valid as input to var.  indeed:

set.seed(0)
x = matrix(rnorm(10000), 100, 100)

library(rbenchmark)
benchmark(replications=1000, columns=c('test', 'elapsed'),
   var(x),
   var(x, x))
#        test elapsed
# 1    var(x)   1.091
# 2 var(x, x)   2.051

that's of course, so to speak, unreasonable:  for what var(x) does is
actually computing the covariance of x and x, which should be the same
as var(x,x). 

the catch is that when y is given, there's an overhead of memory allocation
for *both* x and y, as seen in src/main/cov.c:720+.
incidentally, it seems that the problem can be solved with a trivial fix
(see the attached patch), so that

set.seed(0)
x = matrix(rnorm(10000), 100, 100)

library(rbenchmark)
benchmark(replications=1000, columns=c('test', 'elapsed'),
   var(x),
   var(x, x))
#        test elapsed
# 1    var(x)   1.121
# 2 var(x, x)   1.107

with the quick checks

all.equal(var(x), var(x, x))
# TRUE
   
all(var(x) == var(x, x))
# TRUE
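
For readers without the rbenchmark package, roughly the same comparison can be sketched with base R's system.time. The helper `time_it` and the replication count are assumptions made for this illustration; timings depend on the machine and the R version, so none are shown.

```r
set.seed(0)
x <- matrix(rnorm(10000), 100, 100)

# time n evaluations of an expression; returns elapsed seconds
# (time_it is a hypothetical helper, not part of base R)
time_it <- function(expr, n = 100) {
  e <- substitute(expr)
  env <- parent.frame()
  unname(system.time(for (i in seq_len(n)) eval(e, env))["elapsed"])
}

timings <- c("var(x)" = time_it(var(x)), "var(x, x)" = time_it(var(x, x)))
print(timings)

# whatever the cost, the two calls should agree numerically
same <- isTRUE(all.equal(var(x), var(x, x)))
print(same)
```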

and for cor it seems to make cor(x,x) slightly faster than cor(x), while
originally it was twice as slow:

# original
benchmark(replications=1000, columns=c('test', 'elapsed'),
   cor(x),
   cor(x, x))
#        test elapsed
# 1    cor(x)   1.196
# 2 cor(x, x)   2.253
   
# patched
benchmark(replications=1000, columns=c('test', 'elapsed'),
   cor(x),
   cor(x, x))
#        test elapsed
# 1    cor(x)   1.207
# 2 cor(x, x)   1.204

(there is a visible penalty due to an additional pointer test, but it's
10ms on 1000 replications with 10000 data points, which i think is
negligible.)

 This is as clear as I would know how to state. 

i believe bert is right.

however, with the above fix, this can now be rewritten as:


x: a numeric vector, matrix or data frame. 
y: a vector, matrix or data frame with dimensions compatible to those of x. 
By default, y = x. 


which, to my simple mind, is even more clear than what bert would know
how to state, and less likely to cause the sort of confusion that
originated this thread.

the attached patch suggests modifications to src/main/cov.c and
src/library/stats/man/cor.Rd.
it has been prepared and checked as follows:

svn co https://svn.r-project.org/R/trunk trunk
cd trunk
# edited the sources
svn diff > cov.diff
svn revert -R src
patch -p0 < cov.diff

tools/rsync-recommended
./configure
make
make check
bin/R
# subsequent testing within R

if you happen to consider this patch for a commit, please be sure to
examine and test it carefully first.

vQ
Index: src/library/stats/man/cor.Rd
===
--- src/library/stats/man/cor.Rd	(revision 48200)
+++ src/library/stats/man/cor.Rd	(working copy)
@@ -6,9 +6,9 @@
 \name{cor}
 \title{Correlation, Variance and Covariance (Matrices)}
 \usage{
-var(x, y = NULL, na.rm = FALSE, use)
+var(x, y = x, na.rm = FALSE, use)
 
-cov(x, y = NULL, use = "everything",
+cov(x, y = x, use = "everything",
 method = c("pearson", "kendall", "spearman"))
 
 cor(x, y = NULL, use = "everything",
@@ -32,9 +32,7 @@
 }
 \arguments{
   \item{x}{a numeric vector, matrix or data frame.}
-  \item{y}{\code{NULL} (default) or a vector, matrix or data frame with
-compatible dimensions to \code{x}.   The default is equivalent to
-\code{y = x} (but more efficient).}
+  \item{y}{a vector, matrix or data frame with dimensions compatible to those of \code{x}. By default, y = x.}
   \item{na.rm}{logical. Should missing values be removed?}
   \item{use}{an optional character string giving a
 method for computing covariances in the presence
Index: src/main/cov.c
===
--- src/main/cov.c	(revision 48200)
+++ src/main/cov.c	(working copy)
@@ -689,7 +689,7 @@
 if (ansmat) PROTECT(ans = allocMatrix(REALSXP, ncx, ncy));
 else PROTECT(ans = allocVector(REALSXP, ncx * ncy));
 sd_0 = FALSE;
-if (isNull(y)) {
+if (isNull(y) || (DATAPTR(x) == DATAPTR(y))) {
 	if (everything) { /* NA's are propagated */
 	PROTECT(xm = allocVector(REALSXP, ncx));
 	PROTECT(ind = allocVector(LGLSXP, ncx));


[Rd] gsub('(.).(.)(.)', '\\3\\2\\1', 'gsub')

2009-03-21 Thread Wacek Kusnierczyk
there seems to be something wrong with r's regexing.  consider the
following example:

gregexpr('a*|b', 'ab')
# positions: 1 2
# lengths: 1 1

gsub('a*|b', '.', 'ab')
# ..

where the pattern matches any number of 'a's or one b, and replaces the
match with a dot, globally.  the answer is correct (assuming a dfa
engine).  however,

gregexpr('a*|b', 'ab', perl=TRUE)
# positions: 1 2
# lengths: 1 0

gsub('a*|b', '.', 'ab', perl=TRUE)
# .b.

where the pattern is identical, but the result is wrong.  perl uses an
nfa (if it used a dfa, the result would still be wrong), and in the
above example it should find *four* matches, collectively including
*all* letters in the input, thus producing *four* dots (and *only* dots)
in the output:

perl -le '
   $input = qq|ab|;
   print qq|match: $_| foreach $input =~ /a*|b/g;
   $input =~ s/a*|b/./g;
   print qq|output: $input|;'
# match: a
# match: 
# match: b
# match: 
# output: ....

since with perl=TRUE both gregexpr and gsub seem to use pcre, i've
checked the example with pcretest, and also with a trivial c program
(available on demand) using the pcre api;  there were four matches,
exactly as in the perl bit above.

the results above are surprising, and suggest a bug in r's use of pcre
rather than in pcre itself.  possibly, the issue is that when an empty
string is matched (with a*, for example), the next attempt is not trying
to match a non-empty string at the same position, but rather an empty
string again at the next position.  for example,

gsub('a|b|c', '.', 'abc', perl=TRUE)
# ..., correct

gsub('a*|b|c', '.', 'abc', perl=TRUE)
# .b.c., wrong

gsub('a|b*|c', '.', 'abc', perl=TRUE)
# ..c., wrong (but now only 'c' remains)

gsub('a|b*|c', '.', 'aba', perl=TRUE)
# ..., incidentally correct
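
The comparison above can be run side by side for both engines. This is only a sketch for checking a given R build: the table layout and the names `patterns`, `input` and `results` are made up here, and the outputs for the patterns with a star depend on the R version, so they are not listed.

```r
patterns <- c("a|b|c", "a*|b|c", "a|b*|c")
input <- "abc"

# replace every match with a dot, once per engine
results <- data.frame(
  pattern = patterns,
  default = vapply(patterns, function(p) gsub(p, ".", input), character(1)),
  perl    = vapply(patterns, function(p) gsub(p, ".", input, perl = TRUE),
                   character(1)),
  row.names = NULL,
  stringsAsFactors = FALSE
)

# flag the patterns on which the two engines disagree
results$agree <- results$default == results$perl
print(results)
```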


without detailed analysis of the code, i guess the bug is located
somewhere in src/main/pcre.c, and is distributed among the do_p*
functions, so that multiple fixes may be needed.

vQ



Re: [Rd] sprintf causes a segfault (PR#13613)

2009-03-20 Thread Wacek Kusnierczyk
strangely enough, the way r handles the same sequence of expressions on
different occasions varies:

# fresh session 1
e = simpleError('foo')
sprintf('%s', e)
# segfault: address 0x202, cause memory not mapped
# ^c
sprintf('%s', e)
# error in sprintf("%s", e) : 'getEncChar' must be called on a CHARSXP
  
# fresh session 2
e = simpleError('foo')
sprintf('%s', e)
# segfault: address (nil), cause memory not mapped
# ^c
sprintf('%s', e)
# segfault, exit

note the difference in the address and how this relates to the outcome
of the second execution of sprintf('%s', e)
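
Independently of the crash, a condition object is a list, not a string, so formatting it with '%s' is fragile. A minimal sketch of a safer pattern, assuming only the message text is wanted, is to extract it with conditionMessage first:

```r
e <- simpleError("foo")

# pull the message out of the condition object before formatting it
msg <- sprintf("%s", conditionMessage(e))
print(msg)
# [1] "foo"
```

This sidesteps the problem rather than fixing it; sprintf should still signal an ordinary error when given the object itself.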

vQ


waclaw.marcin.kusnierc...@idi.ntnu.no wrote:
 the following code illustrates a problem with sprintf which consistently
 causes a segfault when applied to certain type of arguments.  it also shows
 inconsistent consequences of the segfault:

(e = tryCatch(stop(), error=identity))
# e is an error object

sprintf('%d', e)
# error in sprintf("%d", e) : unsupported type

sprintf('%f', e)
# error in sprintf("%f", e) : (list) object cannot be coerced to type 'double'

sprintf('%s', e)
# segfault reported, with a choice of options for how to exit the session

 it is possible not to leave the session, by simply typing ^c (ctrl-c).  (which
 should probably be prohibited.)  if one stays in the session, then trying to
 evaluate sprintf('%s', e) will cause a segfault with immediate crash (r is
 silently closed), but not necessarily if some other code is executed first.
 in the latter case, there may be no segfault, but an error message might be
 printed instead:

e = tryCatch(stop(), error=identity)
sprintf('%s', e)
# segfault, choice of options
# ^c, stay in the session
e = tryCatch(stop(), error=identity)
sprintf('%s', e)
# segfault, immediate exit
  
e = tryCatch(stop(), error=identity)
sprintf('%s', e)
# segfault, choice of options
# ^c, stay in the session
e = tryCatch(stop(), error=identity)
x = 1 # possibly, whatever code would do
sprintf('%s', e)
# [1] "Error in doTryCatch(return(expr), name, parentenv, handler): \n"
# [2] "Error in doTryCatch(return(expr), name, parentenv, handler): \n"
sprintf('%s', e)
# segfault, immediate exit

 in the second code snippet above, on some executions the error message was
 printed. on others a segfault caused immediate exit.  (the pattern seems to
 differ between 2.8.0 and 2.10.0-devel.)



Re: [Rd] [R] incoherent conversions from/to raw

2009-03-19 Thread Wacek Kusnierczyk
Wacek Kusnierczyk wrote:
 interestingly,

 c(1, as.raw(1))
 # error: type 'raw' is unimplemented in 'RealAnswer'

   

three more comments.


(1)
the above is interesting in the light of what ?c says:


The output type is determined from the highest type of the
 components in the hierarchy NULL < raw < logical < integer < real
 < complex < character < list < expression.


which seems to suggest that raw components should be coerced to whatever
the highest type among all arguments to c is, which clearly doesn't happen:

test = function(type)
   c(as.raw(1), get(sprintf('as.%s', type))(1))

for (type in c('null', 'logical', 'integer', 'real', 'complex',
               'character', 'list', 'expression'))
   tryCatch(test(type),
            error = function(e) cat(sprintf("raw won't coerce to %s type\n", type)))

which shows that raw won't coerce to the first four types in the
'hierarchy' (excluding NULL), but it will to character, list, and
expression.

suggestion:   improve the documentation, or adapt the implementation to
a more coherent design.



(2)
incidentally, there's a bug somewhere there related to the condition
system and printing:

tryCatch(stop(), error=function(e) print(e))
# works just fine

tryCatch(stop(), error=function(e) sprintf('%s', e))
# *** caught segfault ***
# address (nil), cause 'memory not mapped'

# Traceback:
# 1: sprintf("%s", e)
# 2: value[[3]](cond)
# 3: tryCatchOne(expr, names, parentenv, handlers[[1]])
# 4: tryCatchList(expr, classes, parentenv, handlers)
# 5: tryCatch(stop(), error = function(e) sprintf("%s", e))

# Possible actions:
# 1: abort (with core dump, if enabled)
# 2: normal R exit
# 3: exit R without saving workspace
# 4: exit R saving workspace
# Selection:
 
interestingly, it is possible to stay in the session by typing ^C.  the
session seems to work, but if the tryCatch above is tried once again, a
segfault causes r to crash immediately:

# ^C
tryCatch(stop(), error=function(e) sprintf('%s', e))
# [whoe...@wherever] $

however, this doesn't happen if some other code is evaluated first:

# ^C
x = 1:10^8
tryCatch(stop(), error=function(e) sprintf('%s', e))
# Error in sprintf("%s", e) : 'getEncChar' must be called on a CHARSXP
  
this can't be a feature.  (tried in both 2.8.0 and r-devel;  version
info at the bottom.)

suggestion:  trace down and fix the bug.



(3)
the error argument to tryCatch is used in two examples in ?tryCatch, but
it is not explained anywhere in the help page.  one can guess that the
argument name corresponds to the class of conditions the handler will
handle, but it would be helpful to have this stated explicitly.  the
help page simply says:


   If a condition is signaled while evaluating 'expr' then
 established handlers are checked, starting with the most recently
 established ones, for one matching the class of the condition.
 When several handlers are supplied in a single 'tryCatch' then the
 first one is considered more recent than the second. 


which is uninformative in this respect -- what does 'one matching the
class' mean?

suggestion:  improve the documentation.
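
Until the help page says so explicitly, the matching rule can be checked by experiment. The sketch below assumes that a handler's argument name is compared against the class vector of the signaled condition; the class name "myCondition" and the variables r1, r2, r3 are invented for this illustration.

```r
# stop() signals a condition of class c("simpleError", "error", "condition"),
# so a handler named 'error' matches it
r1 <- tryCatch(stop("boom"), error = function(e) conditionMessage(e))

# within one tryCatch the first matching handler wins, even if a later
# one names a more specific class
r2 <- tryCatch(stop("boom"),
               condition = function(c) "condition handler",
               error     = function(e) "error handler")

# a custom class is matched by a handler of the same name
cond <- structure(class = c("myCondition", "condition"),
                  list(message = "hello", call = NULL))
r3 <- tryCatch(signalCondition(cond),
               myCondition = function(c) paste("caught:", conditionMessage(c)))

print(c(r1, r2, r3))
# [1] "boom"              "condition handler" "caught: hello"
```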

vQ


 version
               _
platform       i686-pc-linux-gnu
arch           i686
os             linux-gnu
system         i686, linux-gnu
status
major          2
minor          8.0
year           2008
month          10
day            20
svn rev        46754
language       R
version.string R version 2.8.0 (2008-10-20)



 version
               _
platform       i686-pc-linux-gnu
arch           i686
os             linux-gnu
system         i686, linux-gnu
status         Under development (unstable)
major          2
minor          9.0
year           2009
month          03
day            19
svn rev        48152
language       R
version.string R version 2.9.0 Under development (unstable) (2009-03-19 r48152)


Re: [Rd] Match .3 in a sequence

2009-03-17 Thread Wacek Kusnierczyk
Petr Savicky wrote:
 On Mon, Mar 16, 2009 at 07:39:23PM -0400, Stavros Macrakis wrote:
 ...
   
 Let's look at the extraordinarily poor behavior I was mentioning. Consider:

 nums <- (.3 + 2e-16 * c(-2,-1,1,2)); nums
 [1] 0.3 0.3 0.3 0.3

 Though they all print as .3 with the default precision (which is
 normal and expected), they are all different from .3:

 nums - .3 =  -3.885781e-16 -2.220446e-16  2.220446e-16  3.885781e-16

 When we convert nums to a factor, we get:

 fact <- as.factor(nums); fact
 [1] 0.300000000000000 0.3               0.3               0.300000000000000
 Levels: 0.300000000000000 0.3 0.3 0.300000000000000

 Not clear what the difference between 0.300000000000000 and 0.3 is
 supposed to be, nor why some 0.300000000000000 are < .3 and others are > .3.
 
 ...

 When creating a factor from numeric vector, the list of levels and the
 assignment of original elements to the levels is done using
 double precision. Since the four elements in the vector are distinct,
 we get four distinct levels. After this is done, the levels attribute is
 formed using as.character(). This can map different numbers to the same
 string, so in the example above, this leads to a factor, which contains
 repeated levels.

 This part of the problem may be avoided using

   fact <- as.factor(as.character(nums)); fact
   [1] 0.300000000000000 0.3               0.3               0.300000000000000
   Levels: 0.3 0.300000000000000

 The reason for having 0.300000000000000 and 0.3 is that as.character()
 works the same as printing with digits=15. The R printing mechanism
 works in two steps. In the first step it tries to determine the shortest 
 format needed to achieve the required relative precision of the output.
 This step uses an algorithm, which need not provide an accurate result.
 The next step is that the number is printed using C function sprintf
 with the chosen format. This step is accurate, so we cannot get wrong
 digits. We only can get wrong number of digits.

 In order to avoid using 15 digits in as.character(), we can use
 round(,digits), with digits argument appropriate for the current situation.

   fact <- as.factor(round(nums,digits=1)); fact
   [1] 0.3 0.3 0.3 0.3
   Levels: 0.3

   

with the examples above, it looks like a design flaw that factor levels
and their *labels* are messed up into one clump.  if, in the above,
levels were the numbers, and their labels were produced with
as.character, as you show, but kept separately (or generated on the fly,
when displaying the factor), the problem would have been solved.  you
would then have something like:
  
nums <- (.3 + 2e-16 * c(-2,-1,1,2)); nums
# [1] 0.3 0.3 0.3 0.3

sum(nums[rep(1:4, each=4)] == nums[rep(1:4, 4)])
# 4

fact <- as.factor(nums); fact
# [1] 0.300000000000000 0.3 0.3 0.300000000000000
# Levels: 0.300000000000000 0.3 0.3 0.300000000000000

sum(fact[rep(1:4, each=4)] == fact[rep(1:4, 4)])
# 4 (currently, it's 8)
   
there's one more curiosity about factors, in particular, ordered factors:

ord <- as.ordered(nums); ord
# [1] 0.300000000000000 0.3 0.3 0.300000000000000
# Levels: 0.300000000000000 < 0.3 < 0.3 < 0.300000000000000

ord[1] < ord[4]
# TRUE
ord[1] == ord[4]
# TRUE

vQ



Re: [Rd] Match .3 in a sequence

2009-03-17 Thread Wacek Kusnierczyk
Wacek Kusnierczyk wrote:

  
 there's one more curiosity about factors, in particular, ordered factors:

 ord <- as.ordered(nums); ord
 # [1] 0.300000000000000 0.3 0.3 0.300000000000000
 # Levels: 0.300000000000000 < 0.3 < 0.3 < 0.300000000000000

 ord[1] < ord[4]
 # TRUE
 ord[1] == ord[4]
 # TRUE
   

as a corollary, the warning printed when comparing elements of a factor
is misleading:

f = factor(1:2)
f[1] < f[2]
# [1] NA
# Warning message:
# In Ops.factor(f[1], f[2]) : < not meaningful for factors

g = as.ordered(f)
is.factor(g)
# TRUE
g[1] < g[2]
# TRUE


< *is* meaningful for factors, though not for unordered ones.  the
warning is generated in Ops.factor, src/library/base/all.R:7162, and
with my limited knowledge of the r internals i can't judge how easy it
is to fix the problem.
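
In user code the distinction can be made explicit before comparing. A small sketch, using is.ordered from base R; the variable names are made up for this example:

```r
f <- factor(1:2)
g <- as.ordered(f)

# both are factors, but only g carries an ordering of its levels
both_factors <- is.factor(f) && is.factor(g)
f_ordered <- is.ordered(f)   # FALSE
g_ordered <- is.ordered(g)   # TRUE

# '<' yields NA (with a warning) for f, and a proper answer for g
cmp_f <- suppressWarnings(f[1] < f[2])
cmp_g <- g[1] < g[2]

print(c(cmp_f = cmp_f, cmp_g = cmp_g))
```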

vQ



Re: [Rd] surprising behaviour of names<-

2009-03-16 Thread Wacek Kusnierczyk
Berwin A Turlach wrote:

 '*tmp*' = 0
 `*tmp*`
 # 0

 x = 1
 names(x) = 'foo'
 `*tmp*`
 # error: object *tmp* not found

 `*ugly*`
 

 I agree, and I am a bit flabbergasted.  I had not expected that
 something like this would happen and I am indeed not aware of anything
 in the documentation that warns about this; but others may prove me
 wrong on this.
   

hopefully.

   
 given that `*tmp*` is a perfectly legal (though some would say
 'non-standard') name, it would be good if somewhere here a warning
 were issued -- perhaps where i assign to `*tmp*`, because `*tmp*` is
 not just any non-standard name, but one that is 'obviously' used
 under the hood to perform black magic.
 

 Now I wonder whether there are any other objects (with non-standard)
 names) that can be nuked by operations performed under the hood.  
   

any such risk should be clearly documented, if not with a warning issued
each time the user risks having h{is,er} workspace corrupted by what
happens under the hood.


 I guess the best thing is to stay away from non-standard names, if only
 to save the typing of back-ticks. :)
   

agree.  but then, there may be -- and probably are -- other such 'best
to stay away' things in r, all of which should be documented so that a
user knows what may happen on the surface, *without* having to peek under
the hood.


 Thanks for letting me know, I have learned something new today.
   

wow.  most of my fiercely truculent ranting is meant to point out things
that may not be intentional, or if they are, they seem to me design
flaws rather than features -- so that either i learn that i am ignorant
or wrong, or someone else does, pro bono.  hopefully.

vQ



Re: [Rd] Definition of [[

2009-03-16 Thread Wacek Kusnierczyk
somewhat on the side,

l = list(1)
   
l[[2]]
# error, index out of bounds

l[2][[1]]
# NULL

that is, we can't extract from l any element at an index exceeding the
list's length (if we could, it would have been NULL or some sort of
_NA_list), but we can extract a sublist at an index out of bounds, and
from that sublist extract the element (which is NULL, 'the _NA_list').

that's not necessarily wrong, but the item at index i (l[[i]]) is not
equivalent to the item in the sublist at index i.
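
The asymmetry can be captured in a few lines. This is only a sketch of the behaviour described above, with tryCatch used to turn the error into a value; the variable names are invented here.

```r
l <- list(1)

# [[ beyond the end of the list signals an error ...
direct <- tryCatch(l[[2]], error = function(e) conditionMessage(e))

# ... while [ returns a one-element list holding NULL,
# from which [[ then extracts NULL without complaint
sub <- l[2]
via_sublist <- sub[[1]]

print(direct)
print(length(sub))
print(is.null(via_sublist))
```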

vQ



Thomas Lumley wrote:
 On Sun, 15 Mar 2009, Stavros Macrakis wrote:

 The semantics of [ and [[ don't seem to be fully specified in the
 Reference manual.  In particular, I can't find where the following
 cases are covered:

 cc <- c(1); ll <- list(1)

 cc[3]
 [1] NA
 OK, RefMan says: If i is positive and exceeds length(x) then the
 corresponding selection is NA.

 dput(ll[3])
 list(NULL)
 ? i is positive and exceeds length(x); why isn't this list(NA)?

 I think some of these are because there are only NAs for character,
 logical, and the numeric types. There isn't an NA of list type.

 This one shouldn't be list(NA) - which NA would it use?  It should be
 some sort of list(_NA_list_) type, and list(NULL) is playing that role.


 ll[[3]]
 Error in list(1)[[3]] : subscript out of bounds
 ? Why does this return NA for an atomic vector, but give an error for
 a generic vector?

 Again, because there isn't an NA of generic vector type.

 cc[[3]] <- 34; dput(cc)
 c(1, NA, 34)
 OK

 ll[[3]] <- 34; dput(ll)
 list(1, NULL, 34)
 Why is second element NULL, not NA?
 And why is it OK to set an undefined ll[[3]], but not to get it?

 Same reason for NULL vs NA.  The fact that setting works may just be
 an inconsistency -- as you can see from previous discussions, R often
 does not effectively forbid code that shouldn't work -- or it may be
 bug-compatibility with some version of S or S-PLUS.



Re: [Rd] surprising behaviour of names<-

2009-03-16 Thread Wacek Kusnierczyk
Thomas Lumley wrote:

 Wacek,

 In this case I think the *tmp* dates from the days before backticks,
 when it was not a legal name (it still isn't) and it was much, much
 harder to use illegal names, so the collision issue really didn't exist.


thanks for the explanation.

 You're right about the documentation.



thanks for the acknowledgement.

vQ



Re: [Rd] Match .3 in a sequence

2009-03-16 Thread Wacek Kusnierczyk
Duncan Murdoch wrote:
 On 3/16/2009 9:36 AM, Daniel Murphy wrote:
 Hello:I am trying to match the value 0.3 in the sequence seq(.2,.3).
 I get
 0.3 %in% seq(from=.2,to=.3)
 [1] FALSE
 Yet
 0.3 %in% c(.2,.3)
 [1] TRUE
 For arbitrary sequences, this invisible .3 has been problematic.
 What is
 the best way to work around this?

 Don't assume that computations on floating point values are exact.
 Generally computations on small integers *are* exact, so you could
 change that to

 3 %in% seq(from=2, to=3)

 and get the expected result.  You can divide by 10 just before you use
 the number, or if you're starting with one decimal place, multiply by
 10 *and round to an integer* before doing the test.  Alternatively,
 use some approximate test rather than an exact one, e.g. all.equal()
 (but you'll need a bit of work to make use of all.equal() in an
 expression like 0.3 %in% c(.2,.3)).


there's also the problem that seq(from=0.2, to=0.3) does *not* include
0.3 (in whatever internal form), simply because the default step is 1. 
however,

0.3 %in% seq(from=.2,to=.3, by=0.1)
# FALSE

so it won't help anyway.  (but in general be careful about using seq and
the like.)
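
For the approximate test mentioned in the quoted advice, a tolerance-based membership check can be sketched as follows. The function `near_in` and its default tolerance are assumptions made for this illustration, not part of base R.

```r
# TRUE for each element of x that lies within tol of some element of table
# (near_in is a hypothetical helper; the tolerance is an arbitrary choice)
near_in <- function(x, table, tol = sqrt(.Machine$double.eps)) {
  vapply(x, function(xi) any(abs(xi - table) <= tol), logical(1))
}

candidates <- c(0.2, 0.1 + 0.2)   # 0.1 + 0.2 is not exactly 0.3

exact  <- 0.3 %in% candidates           # FALSE: exact comparison
approx <- near_in(0.3, candidates)      # TRUE: within tolerance
miss   <- near_in(0.25, candidates)     # FALSE: genuinely absent

print(c(exact = exact, approx = approx, miss = miss))
```

The tolerance has to suit the scale of the data; the default here is only reasonable for values of magnitude around 1.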

vQ



Re: [Rd] Match .3 in a sequence

2009-03-16 Thread Wacek Kusnierczyk
Petr Savicky wrote:
 On Mon, Mar 16, 2009 at 06:36:53AM -0700, Daniel Murphy wrote:
   
 Hello:I am trying to match the value 0.3 in the sequence seq(.2,.3). I get
 
 0.3 %in% seq(from=.2,to=.3)
   
 [1] FALSE
 

 As others already pointed out, you should use seq(from=0.2,to=0.3,by=0.1)
 to get 0.3 in the sequence. In order to get correct %in%, it is also
 possible to use round(), for example
0.3 %in% round(seq(from=0.2,to=0.3,by=0.1),digits=1)
   [1] TRUE

   

half-jokingly, there's another solution, which avoids rounding:

0.3 %in% (seq(0.4, 0.5, 0.1)-0.2)
# TRUE

vQ



Re: [Rd] surprising behaviour of names<-

2009-03-15 Thread Wacek Kusnierczyk
Berwin A Turlach wrote:

 Obviously, assuming that R really executes 
   `*tmp*` <- x
   x <- "names<-"(`*tmp*`, value=c("a","b"))
 under the hood, in the C code, then *tmp* does not end up in the symbol
 table and does not persist beyond the execution of 
   names(x) <- c("a","b")

   

to prove that i take you seriously, i have peeked into the code, and
found that indeed there is a temporary binding for *tmp* made behind the
scenes -- sort of. unfortunately, it is not done carefully enough to
avoid possible interference with the user's code:

'*tmp*' = 0
`*tmp*`
# 0

x = 1
names(x) = 'foo'
`*tmp*`
# error: object *tmp* not found

`*ugly*`

given that `*tmp*` is a perfectly legal (though some would say
'non-standard') name, it would be good if somewhere here a warning were
issued -- perhaps where i assign to `*tmp*`, because `*tmp*` is not just
any non-standard name, but one that is 'obviously' used under the hood
to perform black magic.

it also appears that the explanation given in, e.g., the r language
definition (draft, of course) sec. 3.4.4:


Assignment to subsets of a structure is a special case of a general
mechanism for complex assignment:
x[3:5] <- 13:15
The result of this commands is as if the following had been executed
`*tmp*` <- x
x <- "[<-"(`*tmp*`, 3:5, value=13:15)


is incomplete (because the final result is not '*tmp*' having the value
of x, as it might seem, but rather '*tmp*' having been unbound).

so the suggestion for the documenters is to add to the end of the
section (or wherever else it is appropriate) a warning to the effect
that in the end '*tmp*' will be removed, even if the user has explicitly
defined it earlier in the same scope.
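
Whether a given R build still removes a user-defined `*tmp*` can be checked without triggering an error. The sketch below only reports what it finds; the outcome for `after` may differ between R versions and between interpreted and byte-compiled code, so no value is claimed for it.

```r
# bind the name the evaluator uses internally
assign("*tmp*", 0)
before <- exists("*tmp*", inherits = FALSE)

# a complex assignment in the same scope
x <- 1
names(x) <- "foo"

# is the user's binding still there afterwards?
after <- exists("*tmp*", inherits = FALSE)

cat("bound before:", before, " bound after:", after, "\n")
```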

or maybe have the implementation not rely on a user-forgeable name? for
example, the '.Last.value' name is automatically bound to the most
recently returned value, but it resides in package:base and does not
collide with bindings using it made by the user:

.Last.value = 0

1
.Last.value
# 0, not 1

1
base::.Last.value
# 1, not 0


why could not '*tmp*' be bound and unbound outside of the user's
namespace? (i guess it's easier to update the docs -- or just ignore the
issue.)


on the margin, trace('<-') will pick only one of the uses of '<-'
suggested by the code above:

x <- 1:10

trace('<-')
x[3:5] <- 13:15
# trace: x[3:5] <- 13:15
# trace: x <- `[<-`(`*tmp*`, 3:5, value = 13:15)

which is somewhat confusing, because then '*tmp*' appears in the trace
somewhat ex machina. (again, the explanation is in the source code, but
the trace output could have been more informative.)

cheers,
vQ



Re: [Rd] Definition of [[

2009-03-15 Thread Wacek Kusnierczyk
Stavros Macrakis wrote:

 Well, that's one issue.  But another is that there should be a
 specification addressed to users, who should not have to understand
 internals.
   

this should really be taken seriously.

vQ



Re: [Rd] surprising behaviour of names<-

2009-03-14 Thread Wacek Kusnierczyk
Berwin A Turlach wrote:
 On Sat, 14 Mar 2009 07:22:34 +0100
 Wacek Kusnierczyk waclaw.marcin.kusnierc...@idi.ntnu.no wrote:

 [...]
   
 Well, I don't see any new object created in my workspace after
 x <- 4
 names(x) <- "foo"
 Do you?
   
   
 of course not.  that's why i'd say the two above are *not*
 equivalent. 

 i haven't noticed the 'in the c code';  do you mean the r interpreter
 actually generates, in the c code, such r expressions for itself to
 evaluate?
 

 As I said before, I have little knowledge about how the parser works and
 what goes on under the hood; and I have also little time and
 inclination to learn about it.  

 But if you are interested in these details, then by all means invest
 the time to investigate.

   

berwin, you're playing radio erewan now.  i talk about what the user
sees at the interface, and you talk about c code.  then you admit you
don't know the code, and suggest i examine it if i'm interested.  i
incidentally am, but the whole point was that the user should not be
forced to look under the hood to know the interface to a function. 
prefix 'names<-' seems to have a certain behaviour that is not properly
documented.

 Alternatively, you would hope that Simon eventually finishes the book
 that he is writing on programming in R; as I understand it, that book
 would explain part of these issues in details.  Hopefully, along with
 the book he makes the tools that he has for introspection available.
   

simon:  i'd be happy to contribute in any way you might find useful.

   
 i guess you have looked under the hood;  point me to the relevant
 code. 
 
 No I did not, because I am not interested in knowing such intimate
 details of R, but it seems you were interested.
   
   
 yes, but then your claim about what happens under the hood, in the c
 code, is a pure stipulation.  
 

 I made no claim about what is going on under the hood because I have no
 knowledge about these matters.  But, yes, I was speculating of what
 might go on.
   

owe me a beer.

   
 and you got the example from the r language definition sec. 10.2,
 which says the forms are equivalent, with no 'under the hood, in the
 c code' comment.
 

 Trying to figure out what a writer/painter actually means/says beyond
 the explicitly stated/painted, something that is summed up in Australia
 (and other places) under the term critical thinking, was not high in
 the curriculum of your school, was it? :-)
   

sure, but probably not the way you seem to think about.  have you
incidentally read ferdydurke by gombrowicz? 


   
 you're just showing that your statements cannot be taken seriously.
 

 Usually, my statement can be taken seriously, unless followed by some
 indication that I said them tongue-in-cheek.  Of course, statements
 that I allegedly made but were in fact put into my mouth cannot, and
 should not, be taken seriously.
   

i'm talking about your speculations about what the parser does (wrt.
infix and prefix forms having exactly the same parse tree), rather vague
statements such as "'names<-'(x,'foo') should create (more or less) a
parse tree equivalent to that expression", and other statements (surely,
qualified with 'assuming', 'strongly suggests', and the like).  all this,
coupled with your admitting that you in fact don't know what happens
there, is not particularly reassuring.
   
 yes, *if* you are able to predict the refcount of the object
 passed to 'names<-' *then* you can predict what 'names<-' will do,
 [...] 
 
 I think Simon pointed already out that you seem to have a wrong
 picture of what is going on.  [...]
   
 so what you quote effectively talks about a specific refcount
 mechanism.  it's not refcount that would be used by the garbage
 collector, but it's a refcount, or maybe refflag.
 

 Fair enough, if you call this a refcount then there is no problem.
 Whenever I came across the term refcount in my readings, it was
 referring to different mechanisms, typically mechanisms that kept exact
 track on how often an object was referred too.  So I would not call the
 value of the named field a refcount.  And we can agree to call it from
 now on a refcount as long as we realise what mechanism is really used.
   

the major point of the discussion was that 'names<-' will sometimes
modify and other times copy its argument.  you chose to justify this by
looking under the hood, and i suppose you were pretty clear what i meant
by refcount, because it should have been clear from the context.
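
The modify-or-copy question can be probed from user code. The sketch below calls 'names<-' in prefix form and then inspects the original object; whether x ends up with names depends on the R version and on how many references to x exist, so the script only prints what it observes.

```r
x <- c(1, 2)

# prefix call, not the assignment form names(x) <- ...
y <- `names<-`(x, c("a", "b"))

print(names(y))   # the returned object always carries the names
print(names(x))   # NULL if x was copied; the names if x was changed in place
```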

  
   
 yes, that's my opinion:  the effects of implementation tricks should
 not be observable by the user, because they can lead to hard to
 explain and debug behaviour in the user's program.  you surely don't
 suggest that all users consult the source code before writing
 programs in r.
 

 Indeed, I am not suggesting this.  Only users who use/rely on
 features that are not sufficiently documented would have to study the
 source code to find out what the exact

Re: [Rd] surprising behaviour of names<-

2009-03-13 Thread Wacek Kusnierczyk
Berwin A Turlach wrote:

 foo = function(arg) arg$foo = foo

 e = new.env()
 foo(e)
 e$foo
   
 are you sure this is pass by value?
 

 But that is what environments are for, aren't they?  

might be.

 And it is
 documented behaviour.  

sure!

 Read section 2.1.10 (Environments) in the R
 Language Definition, 

haven't objected to that.  i object to your 'r uses pass by value',
which is only partially correct.

 in particular the last paragraph:

   Unlike most other R objects, environments are not copied when 
   passed to functions or used in assignments.  Thus, if you assign the
   same environment to several symbols and change one, the others will
   change too.  In particular, assigning attributes to an environment can
   lead to surprises.
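
A minimal sketch of the contrast described in the quoted paragraph, assuming only base R. The helper name add_field is hypothetical and does not come from the thread. The same function is applied to an environment and to a list, and only the environment shows the change afterwards.

```r
# add_field is a hypothetical helper: it sets a component named "foo"
# on whatever it is given and returns nothing useful.
add_field <- function(arg) {
  arg$foo <- "foo"
  invisible(NULL)
}

e <- new.env()
l <- list()

add_field(e)   # environments are not copied: the caller's e is changed
add_field(l)   # the list is modified only inside the function

env_changed  <- exists("foo", envir = e, inherits = FALSE)
list_changed <- !is.null(l$foo)
```

Here env_changed records whether the caller's environment gained the component, and list_changed does the same for the list.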

 [..]
   
 and actually, in the example we discuss, 'names<-' does *not* return
 an updated *tmp*, so there's even less to entertain.  
 

 How do you know?  Are you sure?  Have you by now studied what goes on
 under the hood?
   

yes, a bit.  but in this example, it's enough to look into *tmp* to see
that it hasn't got the names added, and since x does have names, 'names<-'
must have returned a copy of *tmp* rather than *tmp* changed:
   
x = 1
tmp = x
x = 'names<-'(tmp, 'foo')
names(tmp)
# NULL

you suggested that One reads the manual, (...) one reflects and
investigates, ... -- had you done it, you wouldn't have asked the question.



   
 for fun and more guesswork, the example could have been:

 x = x
 x = 'names<-'(x, value=c('a', 'b'))
 

 But it is manifestly not written that way in the manual; and for good
 reasons since 'names<-' might have side effects which invokes in the
 last line undefined behaviour.  Just as in the equivalent C snippet
 that I mentioned.
   

i just can't get it why the manual does not manifestly explain what
'names<-' does, and leaves you doing the guesswork you suggest.

vQ



Re: [Rd] surprising behaviour of names-

2009-03-13 Thread Wacek Kusnierczyk
Berwin A Turlach wrote:

 sure!
 

 Glad to see that we agree on this.
   

owe you a beer.

   
 Read section 2.1.10 (Environments) in the R
 Language Definition, 
   
 haven't objected to that.  i object to your 'r uses pass by value',
 which is only partially correct.
 

 Well, I used qualifiers and did not state it categorically. 
   

indeed, you said R supposedly uses call-by-value (though we know how to
circumvent that, don't we?).

in that vein, R supposedly can be used to do valid statistical
computations (though we know how to circumvent it) ;)


  
   
 and actually, in the example we discuss, 'names<-' does *not*
 return an updated *tmp*, so there's even less to entertain.  
 
 
 How do you know?  Are you sure?  Have you by now studied what goes
 on under the hood?
   
 yes, a bit.  but in this example, it's enough to look into *tmp* to
 see that it hasn't got the names added, and since x does have names,
 'names<-' must have returned a copy of *tmp* rather than *tmp* changed:

 x = 1
 tmp = x
 x = 'names<-'(tmp, 'foo')
 names(tmp)
 # NULL
 

 Indeed, if you type these two commands on the command line, then it is
 not surprising that a copy of tmp is returned since you create a
 temporary object that ends up in the symbol table and persist after the
 commands are finished.
   

what does command line have to do with it?

 Obviously, assuming that R really executes 
   `*tmp*` <- x
   x <- "names<-"(`*tmp*`, value=c("a","b"))
 under the hood, in the C code, then *tmp* does not end up in the symbol
 table 

no?

 and does not persist beyond the execution of 
   names(x) <- c("a","b")
   

no?

i guess you have looked under the hood;  point me to the relevant code.

 This looks to me as one of the situations where a value of 1 is used
 for the named field of some of the objects involved so that a copy can
 be avoided.  That's why I asked whether you looked under the hood.
   

anyway, what happens under the hood is much less interesting from the
user's perspective than what can be seen over the hood.  what i can see,
is that 'names<-' will incoherently perform in-place modification or
copy-on-assignment. 

yes, *if* you are able to predict the refcount of the object passed to
'names<-' *then* you can predict what 'names<-' will do, but in general
you may not have the chance.  and in general, this should not matter
because it should be unobservable, but it isn't.

back to your i += i++ example, the outcome may differ from a compiler to
a compiler, but, i guess, compilers will implement the order coherently,
so that whatever version they choose, the outcome will be predictable,
and not dependent on some earlier code.  (prove me wrong.  or maybe i'll
do it myself.)

   
 you suggested that One reads the manual, (...) one reflects and
 investigates, ...
 

 Indeed, and I am not giving up hope that one day you will master this
 art.
   

well, this time i meant you.


   
 -- had you done it, you wouldn't have asked the  question.
 

 Sorry, I forgot that you have a tendency to interpret statements
 extremely verbatim 

yes, i have two hooks installed:  one says \begin{verbatim}, the other
says \end{verbatim}.


 and with little reference to the context in which
 they are made.  

not that you're trying to be extremely accurate or polite here...

 I will try to be more explicit in future.
   

it will certainly do good to you.



 i just can't get it why the manual does not manifestly explain what
 'names<-' does, and leaves you doing the guesswork you suggest.
 

 As I said before, patches to documentation are also welcome.
   

i'll give it a try.


 Best wishes,
   

hope you mean it.

likewise,
vQ



Re: [Rd] surprising behaviour of names-

2009-03-13 Thread Wacek Kusnierczyk
William Dunlap wrote:
 Would it make anyone any happier if the manual said
 that the replacement functions should not be called
 in the form
xNew <- `func<-`(xOld, value)
 and should only be used as
func(xToBeChanged) <- value
   

surely better than guesswork.

 ? 

 The explanation
   names(x) <- c("a","b")
   is equivalent to
   `*tmp*` <- x
   x <- "names<-"(`*tmp*`, value=c("a","b"))
 could also be extended a bit, adding a line like
   rm(`*tmp*`)
 Those 3 lines should be considered an atomic operation:
 the value that `*tmp*` or `x` may have or what is
 in the symbol table at various points in that sequence 
 is not defined.  (Letting details be explicitly undefined
 is important: it gives developers room to improve the
 efficiency of the interpreter and tells users where not to go.) 
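
The three-line sequence quoted above can be written out by hand as a sketch, assuming base R only. It shows that the explicit expansion yields the same value as the infix form and leaves no `*tmp*` behind once the rm() line has run. It says nothing about which intermediate states the interpreter itself goes through.

```r
# Explicit expansion of names(x) <- c("a", "b"), including the
# clean-up line proposed in the message.
x <- c(1, 2)
`*tmp*` <- x
x <- `names<-`(`*tmp*`, value = c("a", "b"))
rm(`*tmp*`)

# The ordinary infix form, for comparison.
y <- c(1, 2)
names(y) <- c("a", "b")

same_value <- identical(x, y)
tmp_left   <- exists("*tmp*", inherits = FALSE)
```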
   

there is a difference between letting things be undefined and explicitly
stating that things are unspecified.  the c99 standard [1], for example,
is explicit about the non-determinism of expressions that involve side
effects, as it is about that some expressions may actually not be
evaluated if the optimizer decides so. 

berwin has already suggested that one reads from what docs do *not*
say;  it's a very bad idea.  it's best that the documentation *does* say
that, for example, a particular function should be used only in the
infix form because the semantics of the prefix form are not guaranteed
and may change in future versions.

if the current state is that 'names<-' will modify the object it is
given as an argument in some situations, but not in others, and this is
visible to the user, the best thing to do is to give an explicit warning
-- perhaps with an annotation that things may change, if they may.

best,
vQ


[1] http://www.open-std.org/JTC1/SC22/WG14/www/docs/n1256.pdf



Re: [Rd] surprising behaviour of names-

2009-03-13 Thread Wacek Kusnierczyk
Tony Plate wrote:
 Wacek Kusnierczyk wrote:
 [snip]
 i just can't get it why the manual does not manifestly explain what
 'names<-' does, and leaves you doing the guesswork you suggest.

   
 I'm having trouble understanding the point of this discussion. 
 Someone is calling a replacement function in a way that it's not meant
 to be used, and is then complaining about it not doing what he thinks
 it should, or about the documentation not describing what happens when
 one does that?

where is it written that the function is not meant to be used this way? 
you get an example in the man page, showing precisely how it could be
used that way.  it also explains the value of 'names<-':


 For 'names<-', the updated object.  (Note that the value of
 'names(x) <- value' is that of the assignment, 'value', not the
 return value from the left-hand side.)


it does speak of 'names<-' used in prefix form, and does not do it in
any negative (discouraging) way.


 Is there anything incorrect or missing in the help page for normal
 usage of the replacement function for 'names'? (i.e., when used in an
 expression like 'names(x) <- ...')

what is missing here in the first place is a specification of what
'normal' means.  as far as i can see from the man page, 'normal' does
not exclude prefix use.  and if so, what is missing in the help page is
a clear statement what an application of 'names<-' will do, in the sense
of what a user may observe.


 R does give one the ability to use its facilities in non-standard
 ways.  However, I don't see much value in the help page for 'gun'
 attempting to describe the ways in which the bones in your foot will
 be shattered should you choose to point the gun at your foot and pull
 the trigger.  Reminds me of the story of the guy in New York, who
 after injuring his back in refrigerator-carrying race, sued the
 manufacturer of the refrigerator for not having a warning label
 against that sort of use.

very funny.  little relevant.

vQ



Re: [Rd] surprising behaviour of names-

2009-03-13 Thread Wacek Kusnierczyk
Tony Plate wrote:
 Wacek Kusnierczyk wrote:
 Tony Plate wrote:

 Is there anything incorrect or missing in the help page for normal
 usage of the replacement function for 'names'? (i.e., when used in an
 expression like 'names(x) <- ...')
 

 what is missing here in the first place is a specification of what
 'normal' means.  as far as i can see from the man page, 'normal' does
 not exclude prefix use.  and if so, what is missing in the help page is
 a clear statement what an application of 'names<-' will do, in the sense
 of what a user may observe.
   
 Fair enough.  I looked at the help page for names after sending my
 email, and was surprised to see the following in the DETAILS section:

   It is possible to update just part of the names attribute via the
 general rules: see the examples. This works because the expression
 there is evaluated as 'z <- "names<-"(z, "[<-"(names(z), 3, "c2"))'. 

 To me, this paragraph is far more confusing than enlightening,
 especially as it also gives the impression that it's OK to use a
 replacement function in a functional form.  In my own personal opinion
 it would be an enhancement to remove that example from the
 documentation, and just say you can do things like 'names(x)[2:3] <-
 c("a","b")'.

i must say that this part of the man page does explain things to me. 
much less the code [1] berwin suggested as a piece to read and
investigate (slightly modified):

tmp = x
x = 'names<-'(tmp, 'foo')

berwin's conclusion seemed to be that this code
hints/suggests/fortune-tells the user that 'names<-' might be doing side
effects. 

this code illustrates what names(x) = 'foo' (the infix form) does --
that it destructively modifies x.  now, if the code were to illustrate
that the prefix form does perform side effects too, then the following
would be enough:

'names<-'(x, 'foo')

if the code were to illustrate that the prefix form, unlike the infix
form, does not perform side effects, then the following would suffice
for a discussion:

x = 'names<-'(x, 'foo')

if the code were to illustrate that the prefix form may or may not do
side effects depending on the situation, then it surely fails to show
that, unless the user performs some sophisticated inference which i am
not capable of, or, more likely, unless the user already knows that this
was to be shown.

without a discussion, the example is simply an unworked rubbish.  and
it's obviously wrong; it says that (slightly and irrelevantly simplified)

names(x) = 'foo'

is equivalent to

tmp = x
x = 'names<-'(tmp, 'foo')

which is nonsense, because in the latter case you either have an
additional binding that you don't have in the former case, or, worse,
you rebind, possibly with a different value, a name that has had a
binding already.  it's a nitty-gritty detail, but so is most of
statistics based on nitty-gritty details which non-statisticians are
happy to either ignore or be ignorant about.


[1] http://stat.ethz.ch/R-manual/R-devel/doc/manual/R-lang.html#Comments


 I often use name replacement functions in a functional way, and
 because one can't use 'names<-' etc in this way, 

note, this 'because' does not follow in any way from the man page, or
the section of 'r language definition' referred to above.


 I define my own functions like the following:

 set.names <- function(n,x) {names(x) <- n; x}

it appears that

set.names = function(n, x) 'names<-'(x, n)

would do the job (guess why).


 (and similarly for set.rownames(), set.colnames(), etc.)

 I would highly recommend you do this rather than try to use a call
 like 'names<-'(x, ...).

i'm almost tempted to extend your recommendation to 'define your own
function for about every function already in r' ;)

vQ



Re: [Rd] surprising behaviour of names-

2009-03-12 Thread Wacek Kusnierczyk
Berwin A Turlach wrote:
 On Wed, 11 Mar 2009 20:31:18 +0100
 Wacek Kusnierczyk waclaw.marcin.kusnierc...@idi.ntnu.no wrote:

   
 Simon Urbanek wrote:
 
 On Mar 11, 2009, at 10:52 , Simon Urbanek wrote:

   
 Wacek,

 Peter gave you a full answer explaining it very well. If you really
 want to be able to trace each instance yourself, you have to learn
 far more about R internals than you apparently know (and Peter
 hinted at that). Internally x=1 and x=c(1) are slightly different
 in that the former has NAMED(x) = 2 whereas the latter has
 NAMED(x) = 0 which is what causes the difference in behavior as
 Peter explained. The reason is that c(1) creates a copy of the 1
 (which is a constant [=immutable] thus requiring a copy) and the
 new copy has no other references and thus can be modified and
 hence NAMED(x) = 0.

 
 Errata: to be precise replace NAMED(x) = 0 with NAMED(x) = 1 above
 -- since NAMED(c(1)) = 0 and once it's assigned to x it becomes
 NAMED(x) = 1 -- this is just a detail on how things work with
 assignment, the explanation above is still correct since
 duplication happens conditional on NAMED == 2.
   
 i guess this is what every user needs to know to understand the
 behaviour one can observe on the surface? 
 

 Nope, only users who prefer to write '+'(1,2) instead of 1+2, or
 'names<-'(x, 'foo') instead of names(x)='foo'.

   

well, as far as i remember, it has been said on this list that in r the
infix syntax is equivalent to the prefix syntax, so no one wanting to
use the form above should be afraid of different semantics;  these two
forms should be perfectly equivalent.  after all,

x = 1
names(x) = 'foo'
names(x)

should return NULL, because when the second assignment is made, we need
to make a copy of the value of x, so it is the copy that should have
changed names, not the value of x (which would still be the original 1).

on the other hand, the fact that

names(x) = 'foo'

is (or so it seems) a shorthand for

x = 'names<-'(x, 'foo')

is precisely why i'd think that the prefix 'names<-' should never do
destructive modifications, because that's what x = 'names<-'(x, 'foo'),
and thus also names(x) = 'foo', is for.

i guess the above is sort of blasphemy.

 Attempting to change the name attribute of x via 'names<-'(x, 'foo')
 looks to me as if one relies on a side effect of the function
 'names<-'; which, in my book would be a bad thing.  

indeed;  so, for coherence, 'names<-' should always do the modification
on a copy.  it would then have semantics different from the infix form
of 'names<-', but at least consistently so.



 I.e. relying on side
 effects of a function, or writing functions with side effects which are
 then called for their side-effects;  this, of course, excludes
 functions like plot() :)  I never had the need to call 'names<-'()
 directly and cannot foresee circumstances in which I would do so.
   

 Plenty of users, including me, are happy using the latter forms and,
 hence, never have to bother with understanding these implementation
 details or have to bother about them.  

 Your mileage obviously varies, but that is when you have to learn about
 these internal details.  If you call functions because of their
 side-effects, you better learn what the side-effects are exactly.
   

well, i can imagine a user using the prefix 'names<-' precisely under
the assumption that it will perform functionally;  i.e., 'names<-'(x,
'foo') will always produce a copy of x with the new names, and never
change the x.  that there will be a destructive modification made to x
on some, but not all, occasions, is hardly a good thing in this context
-- and it's not a situation where a user wants to use the function
because of its side effects, quite to the contrary.  this was actually
the situation i had when i first discovered the surprising behaviour of
'names<-';  i thought 'names<-' did *not* have side effects.

cheers, and thanks for the discussion.
vQ



Re: [Rd] surprising behaviour of names-

2009-03-12 Thread Wacek Kusnierczyk
Berwin A Turlach wrote:
 On Wed, 11 Mar 2009 20:29:14 +0100
 Wacek Kusnierczyk waclaw.marcin.kusnierc...@idi.ntnu.no wrote:

   
 Simon Urbanek wrote:
 
 Wacek,

 Peter gave you a full answer explaining it very well. If you really
 want to be able to trace each instance yourself, you have to learn
 far more about R internals than you apparently know (and Peter
 hinted at that). Internally x=1 and x=c(1) are slightly different in
 that the former has NAMED(x) = 2 whereas the latter has NAMED(x) =
 0 which is what causes the difference in behavior as Peter
 explained. The reason is that c(1) creates a copy of the 1 (which
 is a constant [=immutable] thus requiring a copy) and the new copy
 has no other references and thus can be modified and hence NAMED(x)
 = 0.
   
 simon, thanks for the explanation, it's now as clear as i might
 expect.

 now i'm concerned with what you say:  that to understand something
 visible to the user one needs to learn far more about R internals
 than one apparently knows.  your response suggests that to use r
 without confusion one needs to know the internals, 
 

 Simon can probably speak for himself, but according to my reading he
 has not suggested anything similar to what you suggest he suggested. :)
   

so i did not say *he* suggested this.  'your response suggests' does
not, on my reading, imply any intention from simon's side.  but it's you
who is an expert in (a dialect of) english, so i won't argue.


   
 and this would be a really bad thing to say.. 
 

 No problems, since he did not say anything vaguely similar to what you
 suggest he said.
   

let's not depart from the point.

vQ



Re: [Rd] surprising behaviour of names-

2009-03-12 Thread Wacek Kusnierczyk
Berwin A Turlach wrote:

 Whoever said that must have been at that moment not as precise as he or
 she could have been.  Also, R does not behave according to what people
 say on this list (which is good, because sometimes people say wrong
 things on this list) but according to how it is documented to do; at
 least that is what people on this list (and others) say. :)
   

well, ?'names<-' says:


Value:
 For 'names<-', the updated object. 


which is only partially correct, in that the value will sometimes be an
updated *copy* of the object.

 And the R Language manual (ignoring for the moment that it is a draft
 and all that), 

since we must...

 clearly states that 

   names(x) <- c("a","b")

 is equivalent to
   
   `*tmp*` <- x
   x <- "names<-"(`*tmp*`, value=c("a","b"))
   

... and?  does this say anything about what 'names<-'(...) actually
returns?  updated *tmp*, or a copy of it?


 [...]
   
 well, i can imagine a user using the prefix 'names<-' precisely under
 the assumption that it will perform functionally;  
 

 You mean
   y <- 'names<-'(x, "foo")
 instead of
   y <- x
   names(y) <- "foo"
 ?
   

what i mean is, rather precisely, that 'names<-'(x, 'foo') will produce
a *new* object with a copy of the value of x and names as specified, and
will *not*, under any circumstances, modify x.

the first line above does not quite address this, e.g.:

x = c(1)
y = 'names<-'(x, 'foo')
names(x)
# foo, 'should' be NULL


 Fair enough.  But I would still prefer the latter version as it is
 (for me) easier to read and to decipher the intention of the code.
   

you're welcome to use it.  but this is personal preference, and i'm
trying to discuss the semantics of r here.  what you show is a way to
clutter the code, and you need to explicitly name the new object, while,
in functional programming, it is typical to operate on anonymous objects
passed from one function to another, e.g.

f('names<-'(x, 'foo'))

which would have to become

y = x
names(y) = 'foo'
f(y)

or

f({y = x; names(y) = 'foo'; y})

with 'y' being a nuisance name.
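
One way to pass a named copy straight into another function without a nuisance variable is stats::setNames(), which ships with R. The sketch below assumes base R only; f is a stand-in for whatever consumer function is meant in the message and is purely illustrative.

```r
# f is a hypothetical consumer that uses the names of its argument.
f <- function(v) paste(names(v), v, sep = "=")

x <- c(1, 2)

# setNames() returns a named copy, so no intermediate name is needed
# and x itself is not given any names.
out <- f(stats::setNames(x, c("a", "b")))

x_untouched <- is.null(names(x))
```

The design choice here is only about readability: the anonymous named copy exists for the duration of the call to f and nothing else in the calling scope changes.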


 i.e., 'names<-'(x, 'foo') will always produce a copy of x with the
 new names, and never change the x.  
 

 I am not sure whether R ever behaved in that way, but as Peter pointed
 out, this would be quite undesirable from a memory management and
 performance point of view.  

why?  you can still use the infix 'names<-' with destructive semantics to
avoid copying. 


 Imagine that every time you modify a (name)
 component of a large object a new copy of that object is created.
   

see above.  besides, r has been several times claimed here (but see your
remark above) to be a functional language, and in this context it is
surprising that the smart (i mean it) copy-on-assignment mechanism,
which is an implementational optimization, not only becomes visible, but
also makes functions (hmm, procedures?) such as 'names<-' non-functional
-- in some, but not all, cases.

vQ



Re: [Rd] surprising behaviour of names-

2009-03-12 Thread Wacek Kusnierczyk
Wacek Kusnierczyk wrote:

 is precisely why i'd think that the prefix 'names<-' should never do
 destructive modifications, because that's what x = 'names<-'(x, 'foo'),
 and thus also names(x) = 'foo', is for.

   

to make the point differently, i'd expect the following two to be
equivalent:

x = c(1); 'names<-'(x, 'foo'); names(x)
# foo

x = c(1); do.call('names<-', list(x, 'foo')); names(x)
# NULL

but they're obviously not.  and of course, just that i'd expect it is
not a strong argument.
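
The comparison can be wrapped into two small probes, as a sketch assuming base R only. The function names are hypothetical. Whether the direct prefix call touches the caller's x depends on the R version in use, so no expected value is given for it; the do.call() variant corresponds to the second line of the example above.

```r
# TRUE would mean the caller's x was modified in place.  The result may
# differ between R versions, so it is reported rather than assumed.
direct_call_modifies <- function() {
  x <- c(1)
  `names<-`(x, "foo")          # return value deliberately discarded
  !is.null(names(x))
}

# Same probe through do.call(): x is also referenced from the argument
# list, which is the case reported in the thread as leaving x unchanged.
do_call_modifies <- function() {
  x <- c(1)
  do.call("names<-", list(x, "foo"))
  !is.null(names(x))
}

direct  <- direct_call_modifies()
through <- do_call_modifies()
```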

vQ



Re: [Rd] surprising behaviour of names-

2009-03-12 Thread Wacek Kusnierczyk
Berwin A Turlach wrote:
 On Thu, 12 Mar 2009 10:53:19 +0100
 Wacek Kusnierczyk waclaw.marcin.kusnierc...@idi.ntnu.no wrote:

   
 well, ?'names<-' says:

 
 Value:
  For 'names<-', the updated object. 
 

 which is only partially correct, in that the value will sometimes be
 an updated *copy* of the object.
 

 But since R supposedly 

*supposedly*

 uses call-by-value (though we know how to
 circumvent that, don't we?) 

we know how a lot of built-ins hack around this, don't we, and we also
know that call-by-value is not really the argument passing mechanism in r.

 wouldn't you always expect that a copy of
 the object is returned?
   

indeed!  that's what i have said previously, no?  there is still space
for the smart (i mean it) copy-on-assignment behaviour, but it should
not be visible to the user, in particular, not in that 'names<-'
destructively modifies the object it is given when the refcount is 1. 
in my humble opinion, there is either a design flaw or a bug here.


  
   
 And the R Language manual (ignoring for the moment that it is a
 draft and all that), 
   
 since we must...

 
 clearly states that 

 names(x) <- c("a","b")

 is equivalent to
 
 `*tmp*` <- x
 x <- "names<-"(`*tmp*`, value=c("a","b"))
   
   
 ... and?  
 

 This seems to suggest 

seems to suggest?  is not the purpose of documentation to clearly,
ideally beyond any doubt, specify what is to be specified?

 that in this case the infix and prefix syntax
 is not equivalent as it does not say that 
   

are you suggesting fortune telling from what the docs do *not* say?

   names(x) <- c("a","b")
 is equivalent to
   x <- "names<-"(x, value=c("a","b"))
 and I was commenting on the claim that the infix syntax is equivalent
 to the prefix syntax.

   
 does this say anything about what 'names<-'(...) actually
 returns?  updated *tmp*, or a copy of it?
 

 Since R uses pass-by-value, 

since?  it doesn't!

 you would expect the latter, wouldn't
 you?  

yes, that's what i'd expect in a functional language.

 If you entertain the idea that 'names<-' updates *tmp* and
 returns the updated *tmp*, then you believe that 'names<-' behaves in a
 non-standard way and should take appropriate care.
   

i got lost in your argumentation.  i have given examples of where
'names<-' destructively modifies and returns the updated object, not a
copy.  what is your point here?

 And the fact that a variable *tmp* is used hints to the fact that
 'names<-' might have side effects.  

are you suggesting fortune telling from the fact that a variable *tmp*
is used?


 If 'names<-' has side effects,
 then it might not be well defined with what value x ends up with if
 one executes:
   x <- 'names<-'(x, value=c("a","b"))  
   

not really, unless you mean the returned object in the referential sense
(memory location) versus value conceptually.  here x will obviously have
the value of the original x plus the names, *but* indeed you cannot tell
from this snippet whether after the assignment x will be the same,
though updated, object or will rather be an updated copy:

x = c(1)
x = 'names<-'(x, 'foo')
# x is the same object

x = c(1)
y = x
x = 'names<-'(x, 'foo')
# x is another object

so, as you say, it is not well defined with what object will x end up as
its value, though the value of the object visible to the user is well
defined.  rewrite the above and play:

x = c(1)
y = 'names<-'(x, 'foo')
names(x)

what are the names of x?  is y identical (sensu reference) with x, is y
different (sensu reference) but indiscernible (sensu value) from x, or
is y different (sensu value) from x in that y has names and x doesn't?



 This is similar to the discussion what value i should have in the
 following C snippet:
   i = 0;
   i += i++;
   

nonsense, it's a *completely* different issue.  here you touch the issue
of the order of evaluation, and not of whether an object is copied or
modified;  above, the inverse is true.

in fact, your example is useless because the result here is clearly
specified by the semantics (as far as i know -- prove me wrong).  you
lookup i (0) and i (0) (the order does not matter here), add these
values (0), assign to i (0), and increase i (1). 

i have a better example for you:

int i = 0;
i += ++i - ++i;

which will give different final values for i in c (2 with gcc 4.2, 1
with gcc 3.4), c# and java (-1), perl (2) and php (1).  again, this has
nothing to do with the above.



  
 [..]
   
 I am not sure whether R ever behaved in that way, but as Peter
 pointed out, this would be quite undesirable from a memory
 management and performance point of view.  
   
 why?  you can still use the infix 'names<-' with destructive semantics
 to avoid copying. 
 

 I guess that would require a rewrite (or extension) of the parser.  To
 me, Section 10.1.2 of the Language Definition manual suggests that once
 an expression is parsed, you cannot distinguish any more whether

Re: [Rd] surprising behaviour of names-

2009-03-12 Thread Wacek Kusnierczyk
Wacek Kusnierczyk wrote:
 Berwin A Turlach wrote:
   

 This is similar to the discussion what value i should have in the
 following C snippet:
  i = 0;
  i += i++;
   
 


 in fact, your example is useless because the result here is clearly
 specified by the semantics (as far as i know -- prove me wrong).  you
 lookup i (0) and i (0) (the order does not matter here), add these
 values (0), assign to i (0), and increase i (1). 
   

i'm happy to prove myself wrong.  the c programming language, 2nd ed. by
kernighan and ritchie, has the following discussion:


One unhappy situation is typified by the statement

a[i] = i++;

The question is whether the subscript is the old value of i or the new.
Compilers can interpret
this in different ways, and generate different answers depending on
their interpretation. The
standard intentionally leaves most such matters unspecified.


vQ



Re: [Rd] Errors in recursive default argument references

2009-03-12 Thread Wacek Kusnierczyk
l...@stat.uiowa.edu wrote:
 Thanks to Stavros for the report.  This should now be fixed in R-devel.

indeed, though i find some of the error messages strange:

(function(a=a) -a)()
# Error in (function(a = a) -a)() :
#  element 1 is empty;
#   the part of the args list of '-' being evaluated was:
#   (a)

(function(a=a) c(a))()
# Error in (function(a = a) c(a))() :
#   promise already under evaluation: recursive default argument
#   reference or earlier problems?

why are they different?
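
A small sketch, assuming base R only, of how the second error can be captured and of one way to avoid it. The default a = a is evaluated inside the function's own frame, so it refers to the argument itself; giving the outer value a different name (a_outer is a hypothetical name) removes the self-reference. The sketch does not explain why the two messages differ.

```r
# Capture the error message instead of stopping.
msg <- tryCatch(
  (function(a = a) c(a))(),
  error = function(e) conditionMessage(e)
)

# Avoid the recursive default by using a distinct outer name.
a_outer <- 1
ok <- (function(a = a_outer) c(a))()
```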

vQ



Re: [Rd] surprising behaviour of names-

2009-03-12 Thread Wacek Kusnierczyk
Simon Urbanek wrote:

 On Mar 11, 2009, at 10:52 , Simon Urbanek wrote:

 Wacek,

 Peter gave you a full answer explaining it very well. If you really
 want to be able to trace each instance yourself, you have to learn
 far more about R internals than you apparently know (and Peter hinted
 at that). Internally x=1 and x=c(1) are slightly different in that the
 former has NAMED(x) = 2 whereas the latter has NAMED(x) = 0 which is
 what causes the difference in behavior as Peter explained. The reason
 is that c(1) creates a copy of the 1 (which is a constant
 [=immutable] thus requiring a copy) and the new copy has no other
 references and thus can be modified and hence NAMED(x) = 0.


 Errata: to be precise replace NAMED(x) = 0 with NAMED(x) = 1 above --
 since NAMED(c(1)) = 0 and once it's assigned to x it becomes NAMED(x)
 = 1 -- this is just a detail on how things work with assignment, the
 explanation above is still correct since duplication happens
 conditional on NAMED == 2.

there is an interesting corollary.  self-assignment seems to increase
the reference count:

x = 1;  'names<-'(x, 'foo'); names(x)
# NULL

x = 1;  x = x;  'names<-'(x, 'foo'); names(x)
# foo

vQ



Re: [Rd] surprising behaviour of names-

2009-03-12 Thread Wacek Kusnierczyk
Berwin A Turlach wrote:
 On Thu, 12 Mar 2009 15:21:50 +0100
 Wacek Kusnierczyk waclaw.marcin.kusnierc...@idi.ntnu.no wrote:

   
 seems to suggest?  is not the purpose of documentation to clearly,
 ideally beyond any doubt, specify what is to be specified?
 

 The R Language Definition manual is still a draft. :)
   

this is indeed a good explanation for all sorts of nonsense.  worse if
stuff tends to persist despite critique.

   
 that in this case the infix and prefix syntax
 is not equivalent as it does not say that 
   
   
 are you suggesting fortune telling from what the docs do *not* say?
 

 My experience is that sometimes you have to realise what is not
 stated.  

in general, yes.  in r, this often ends up with 'have you seen the
documentation saying that??' in response.

 I remember a discussion with somebody who asked why he could
 not run, on windows, R CMD INSTALL on a *.zip file.  I pointed out to
 him that the documentation states that you can run R CMD INSTALL on
 *.tar.gz or *.tgz files and, thus, there should be no expectation that
 it can be run on *.zip file.
   

yes, that's a good point.  this reminds me of a (possibly anecdotal)
lady who sued the manufacturer of her microwave after she had dried her
cat in it after a bath.

 YMMV, but when I read a passage like this in R documentation, I start
 to wonder why it is stated that 
   names(x) <- c("a","b")
 is equivalent to 
   `*tmp*` <- x
   x <- "names<-"(`*tmp*`, value=c("a","b"))
 and the simpler construct
   x <- "names<-"(x, value=c("a", "b"))
 is not used.  There must be a reason, 

got an explanation:  because it probably is as drafty as the
aforementioned document.

 nobody likes to type
 unnecessarily long code.  And, after thinking about this for a while,
 the penny might drop.
   

that's cool.  instead of stating what 'names<-' does or does not do, one
expresses it in a convoluted way and makes you guess from a *tmp*
variable. a nice exercise, i like it.

 [...] 
   
 does this say anything about what 'names<-'(...) actually
 returns?  updated *tmp*, or a copy of it?
 
 
 Since R uses pass-by-value, 
   
 since?  it doesn't!
 

 For all practical purposes it is as long as standard evaluation is
 used.  One just has to be aware that some functions evaluate their
 arguments in a non-standard way.  
   

it's maybe a bit of hairsplitting, but what you have in r is not exactly
what is called 'pass by value'.  here's a relevant quote from [1], p. 309:


In the call-by-name (CBN) mechanism, a formal parameter names the
computation designated by an unevaluated argument expression.

In the call-by-value (CBV) mechanism, a formal parameter names the value
of an evaluated argument expression.

In the call-by-need or lazy evaluation (CBL), the formal parameter name
can be bound to a location that originally stores the computation of the
argument expression. The first time the parameter is referenced, the
computation is performed, but the resulting value is cached at the
location and is used on every subsequent reference. Thus, the argument
expression is evaluated at most once and is never evaluated at all if
the parameter is never referenced.


note the 'unevaluated' and 'evaluated'.  you're free to have your pick. 
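
to make the quoted call-by-need description concrete, here is a minimal
sketch in r.  the helper names (noisy, ignores_arg, uses_arg_twice) are made
up for illustration; the point is only that an argument expression is
evaluated at most once, and not at all if the parameter is never referenced.

```r
# counts how many times the argument expression is actually evaluated
evaluations <- 0
noisy <- function() {
    evaluations <<- evaluations + 1
    21
}

# hypothetical helpers: one never touches its argument, one uses it twice
ignores_arg <- function(arg) "done"
uses_arg_twice <- function(arg) arg + arg

ignores_arg(noisy())
after_ignore <- evaluations      # the promise was never forced

result <- uses_arg_twice(noisy())
after_use <- evaluations         # forced once, the cached value is reused
```

if the argument were passed by value in the strict sense, noisy() would have
run before either function body was entered.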

but it is possible to send an argument to a function that makes an
assignment to the argument, and yet the assignment is made to the
original, not to a copy:

foo = function(arg) arg$foo = foo

e = new.env()
foo(e)
e$foo
  
are you sure this is pass by value?

it appears that r has a pass-by-need mechanism that dispatches to
pass-by-value or pass-by-reference depending on the type of the object. 
with this semantics, all sorts of mess are possible, and 'names<-'
provides one example.
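
a small sketch of the contrast, with made-up helper names (set_first,
set_field):  an ordinary vector modified inside a function leaves the
caller's object alone, while an environment modified inside a function is
changed for the caller as well.  this only illustrates the observable
behaviour; it says nothing about how the interpreter implements it.

```r
# hypothetical helpers for illustration
set_first <- function(v) { v[1] <- 99; v }                   # works on a local copy
set_field <- function(e) { e$value <- 99; invisible(NULL) }  # mutates the environment

x <- c(1, 2, 3)
y <- set_first(x)     # x keeps its old value, y holds the modified copy

env <- new.env()
env$value <- 1
set_field(env)        # env$value is changed for the caller too
```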

[1] design concepts in programming languages, turbak and gifford, mit
press 2008


 [...]
   
 If you entertain the idea that 'names<-' updates *tmp* and
 returns the updated *tmp*, then you believe that 'names<-' behaves
 in a non-standard way and should take appropriate care. 
   
 i got lost in your argumentation.  [..]
 

 I was commenting on "does this say anything about what 'names<-'(...)
 actually returns?  updated *tmp*, or a copy of it?"

 As I said, if you entertain the idea that 'names<-' returns an updated
 *tmp*, then you believe that 'names<-' behaves in a non-standard way
 and appropriate care has to be taken.

   

i can check, by experimentation, whether 'names<-' returns a copy or the
original; even if i can establish that it returns the original after
having modified it, it's not something to entertain.  maybe you
entertain the idea of your users performing the guesswork instead of
reading an unambiguous specification.  you have already said that you
don't care if your users get confused, it would fit the image.

and actually, in the example we discuss, 'names<-' does *not* return an
updated *tmp*, so there's even less to entertain.  for fun and more
guesswork, the example could

Re: [Rd] surprising behaviour of names<-

2009-03-12 Thread Wacek Kusnierczyk
Simon Urbanek wrote:

 On Mar 12, 2009, at 11:12 , Wacek Kusnierczyk wrote:

 Simon Urbanek wrote:

 On Mar 11, 2009, at 10:52 , Simon Urbanek wrote:

 Wacek,

 Peter gave you a full answer explaining it very well. If you really
 want to be able to trace each instance yourself, you have to learn
 far more about R internals than you apparently know (and Peter hinted
 at that). Internally x=1 and x=c(1) are slightly different in that the
 former has NAMED(x) = 2 whereas the latter has NAMED(x) = 0 which is
 what causes the difference in behavior as Peter explained. The reason
 is that c(1) creates a copy of the 1 (which is a constant
 [=unmutable] thus requiring a copy) and the new copy has no other
 references and thus can be modified and hence NAMED(x) = 0.


 Errata: to be precise replace NAMED(x) = 0 with NAMED(x) = 1 above --
 since NAMED(c(1)) = 0 and once it's assigned to x it becomes NAMED(x)
 = 1 -- this is just a detail on how things work with assignment, the
 explanation above is still correct since duplication happens
 conditional on NAMED == 2.

 there is an interesting corollary.  self-assignment seems to increase
 the reference count:

x = 1;  'names<-'(x, 'foo'); names(x)
# NULL

x = 1;  x = x;  'names<-'(x, 'foo'); names(x)
# foo


 Not for me, at least in current R:

not for me either.  i messed up the example, sorry.  here's the intended
version:

x = c(1);  'names<-'(x, 'foo');  names(x)
# foo

x = c(1);  x = x; 'names<-'(x, 'foo');  names(x)
# NULL
  


  > x = 1;  'names<-'(x, 'foo'); names(x)
 foo
   1
 NULL
  > x = 1;  x = x;  'names<-'(x, 'foo'); names(x)
 foo
   1
 NULL

 (both R 2.8.1 and R-devel 3/11/09, darwin 9.6)

 In addition, you still got it backwards - your output suggests that
 the assignment created a new, clean copy. Functional call of `names<-`
 (whose side-effect on x is undefined BTW) is destructive when you get
 a clean copy (e.g. as a result of the c function) and non-destructive
 when the object was referenced. It is left as an exercise to the
 reader to reason why constants such as 1 are referenced.

all true, again because of my mistake. 

anyway, it may be surprising that with all its smartness (i mean it)
about copy-on-assignment, r does not see that it makes no sense to
increase refcount here.  of course, you can't judge from just the
syntactic form 'x=x', but still it should not be very difficult to have
the interpreter see when it finds an object named 'x' in the same
environment where it attempts the assignment.  (of course, who'd do
self-assignments in practical code?)

cheers,
vQ

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] surprising behaviour of names<-

2009-03-12 Thread Wacek Kusnierczyk
G. Jay Kerns wrote:
 Wacek Kusnierczyk wrote:


   
 I am prompted to imagine someone pointing out to the volunteers of the
 International Red Cross - on the field of a natural disaster, no less
 - that their uniforms are not an acceptably consistent shade of
 pink... or that the screws on their tourniquets do not have the
 appropriate pitch as to minimize the friction for the turner...

   

not that it is very accurate, because unintuitive and confusing
semantics may lead to hidden and dangerous errors in users' code.  wrong
shade of a uniform might lead to the person being shot, for example, but
then your point vanishes.


 As a practicing statistician I am simply thankful that the bleeding is
 stopped.   :-)
   

when it is stopped, not turned to an internal bleeding, which you simply
don't see.

 Cheers to R-Core (and the hundreds of other volunteers).

   

absolutely.

vQ

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] surprising behaviour of names<-

2009-03-11 Thread Wacek Kusnierczyk
Simon Urbanek wrote:
 Wacek,

 Peter gave you a full answer explaining it very well. If you really
 want to be able to trace each instance yourself, you have to learn far
 more about R internals than you apparently know (and Peter hinted at
 that). Internally x=1 and x=c(1) are slightly different in that the
 former has NAMED(x) = 2 whereas the latter has NAMED(x) = 0 which is
 what causes the difference in behavior as Peter explained. The reason
 is that c(1) creates a copy of the 1 (which is a constant [=unmutable]
 thus requiring a copy) and the new copy has no other references and
 thus can be modified and hence NAMED(x) = 0.


simon, thanks for the explanation, it's now as clear as i might expect.

now i'm concerned with what you say:  that to understand something
visible to the user one needs to learn far more about R internals than
one apparently knows.  your response suggests that to use r without
confusion one needs to know the internals, and this would be a really
bad thing to say.  i have long been concerned that r unnecessarily
exposes users to its internals, and here's one more example of how the
interface fails to hide the guts.  (and peter did not give me a full
answer, but a vague hint.)

vQ

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] surprising behaviour of names<-

2009-03-11 Thread Wacek Kusnierczyk
Simon Urbanek wrote:

 On Mar 11, 2009, at 10:52 , Simon Urbanek wrote:

 Wacek,

 Peter gave you a full answer explaining it very well. If you really
 want to be able to trace each instance yourself, you have to learn
 far more about R internals than you apparently know (and Peter hinted
 at that). Internally x=1 and x=c(1) are slightly different in that the
 former has NAMED(x) = 2 whereas the latter has NAMED(x) = 0 which is
 what causes the difference in behavior as Peter explained. The reason
 is that c(1) creates a copy of the 1 (which is a constant
 [=unmutable] thus requiring a copy) and the new copy has no other
 references and thus can be modified and hence NAMED(x) = 0.


 Errata: to be precise replace NAMED(x) = 0 with NAMED(x) = 1 above --
 since NAMED(c(1)) = 0 and once it's assigned to x it becomes NAMED(x)
 = 1 -- this is just a detail on how things work with assignment, the
 explanation above is still correct since duplication happens
 conditional on NAMED == 2.

i guess this is what every user needs to know to understand the
behaviour one can observe on the surface?  thanks for further
clarifications.

vQ

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] surprising behaviour of names<-

2009-03-10 Thread Wacek Kusnierczyk
playing with 'names<-', i observed the following:
  
x = 1
names(x)
# NULL
'names<-'(x, 'foo')
# c(foo=1)
names(x)
# NULL

where 'names<-' has a functional flavour (does not change x), but:

x = 1:2
names(x)
# NULL
'names<-'(x, 'foo')
# c(foo=1, 2)
names(x)
# foo NA
  
where 'names<-' seems to perform a side effect on x (destructively
modifies x).  furthermore:

x = c(foo=1)
names(x)
# foo
'names<-'(x, NULL)
names(x)
# NULL
'names<-'(x, 'bar')
names(x)
# bar !!!

x = c(foo=1)
names(x)
# foo
'names<-'(x, 'bar')
names(x)
# bar !!!

where 'names<-' is not only able to destructively remove names from x,
but also destructively add or modify them (quite unlike in the first
example above).

analogous code but using 'dimnames<-' on a matrix performs a side effect
on the matrix even if it initially does not have dimnames:

x = matrix(1,1,1)
dimnames(x)
# NULL
'dimnames<-'(x, list('foo', 'bar'))
dimnames(x)
# list(foo, bar)

this is incoherent with the first example above, in that in both cases
the structure initially has no names or dimnames attribute, but the end
result is different in the two examples.

is there something i misunderstand here?


there is another, minor issue with names:

'names<-'(1, c('foo', 'bar'))
# error: 'names' attribute [2] must be the same length as the vector [1]

'names<-'(1:2, 'foo')
# no error

since ?names says that "If 'value' is shorter than 'x', it is extended
by character 'NA's to the length of 'x'" (where x is the vector and
value is the names vector), the error message above should say that the
names attribute must be *at most*, not *exactly*, of the length of the
vector.
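
the asymmetry can be checked with the ordinary replacement form, which does
not depend on the copying behaviour discussed above.  this is a minimal
sketch; the variable names (padded, too_long) are made up for illustration.

```r
x <- 1:2
names(x) <- "foo"            # shorter value: padded with NA
padded <- names(x)

# longer value: signals an error, caught here so the script keeps running
too_long <- tryCatch({ names(x) <- c("a", "b", "c"); "no error" },
                     error = function(e) "error")
```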

regards,
vQ

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] surprising behaviour of names<-

2009-03-10 Thread Wacek Kusnierczyk
Peter Dalgaard wrote:
 Wacek Kusnierczyk wrote:
   
 playing with 'names<-', i observed the following:
   
 x = 1
 names(x)
 # NULL
 'names<-'(x, 'foo')
 # c(foo=1)
 names(x)
 # NULL

 where 'names<-' has a functional flavour (does not change x), but:

 x = 1:2
 names(x)
 # NULL
 'names<-'(x, 'foo')
 # c(foo=1, 2)
 names(x)
 # foo NA
   
 where 'names<-' seems to perform a side effect on x (destructively
 modifies x).  furthermore:

 x = c(foo=1)
 names(x)
 # foo
 'names<-'(x, NULL)
 names(x)
 # NULL
 'names<-'(x, 'bar')
 names(x)
 # bar !!!

 x = c(foo=1)
 names(x)
 # foo
 'names<-'(x, 'bar')
 names(x)
 # bar !!!

 where 'names<-' is not only able to destructively remove names from x,
 but also destructively add or modify them (quite unlike in the first
 example above).

 analogous code but using 'dimnames<-' on a matrix performs a side effect
 on the matrix even if it initially does not have dimnames:

 x = matrix(1,1,1)
 dimnames(x)
 # NULL
 'dimnames<-'(x, list('foo', 'bar'))
 dimnames(x)
 # list(foo, bar)

 this is incoherent with the first example above, in that in both cases
 the structure initially has no names or dimnames attribute, but the end
 result is different in the two examples.

 is there something i misunderstand here?
 

 Only the ideology/pragmatism... In principle, R has call-by-value
 semantics and a function does not destructively modify its arguments(*),
 and foo(x)<-bar behaves like x <- `foo<-`(x, bar). HOWEVER, this has
 obvious performance repercussions (think x <- rnorm(1e7); x[1] <- 0), so
 we do allow destructive modification by replacement functions, PROVIDED
 that the x is not used by anything else. On the least suspicion that
 something else is using the object, a copy of x is made before the
 modification.

 So

 (A) you should not use code like y <- `foo<-`(x, bar)

 because

 (B) you cannot (easily) predict whether or not x will be modified
 destructively

   

that's fine, thanks, but i must be terribly stupid as i do not see how
this explains the examples above.  where is the x used by something else
in the first example, so that 'names<-'(x, 'foo') does *not* modify x
destructively, while it does in the other cases?

i just can't see how your explanation fits the examples -- it probably
does, but i beg you show it explicitly.
thanks.

vQ

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] surprising behaviour of names<-

2009-03-10 Thread Wacek Kusnierczyk
Stavros Macrakis wrote:
 (B) you cannot (easily) predict whether or not x will be modified
 destructively
   
 that's fine, thanks, but i must be terribly stupid as i do not see how
 this explains the examples above.  where is the x used by something else
 in the first example, so that 'names<-'(x, 'foo') does *not* modify x
 destructively, while it does in the other cases?

 i just can't see how your explanation fits the examples -- it probably
 does, but i beg you show it explicitly.
 

 I think the following shows what Peter was referring to:

 In this case, there is only one pointer to the value of x:

 > x <- c(1,2)
 > "names<-"(x,"foo")
  foo <NA>
    1    2
 > x
  foo <NA>
    1    2

 In this case, there are two:

 > x <- c(1,2)
 > y <- x
 > "names<-"(x,"foo")
  foo <NA>
    1    2
 > x
 [1] 1 2
 > y
 [1] 1 2
   

that is and was clear to me, but none of my examples was of the second
form, and hence i think peter's answer did not answer my question. 
what's the difference here:

x = 1
'names<-'(x, 'foo')
names(x)
# NULL

x = c(foo=1)
'names<-'(x, 'foo')
names(x)
# foo

certainly not something like what you show.   what's the difference here:

x = 1
'names<-'(x, 'foo')
names(x)
# NULL
  
x = 1:2
'names<-'(x, c('foo', 'bar'))
names(x)
# foo bar

certainly not something like what you show.

 It seems as though `names<-` and the like cannot be treated as R
 functions (which do not modify their arguments) but as special
 internal routines which do sometimes modify their arguments.
   

they seem to behave somewhat like macros:

'names<-'(a, b)

with the destructive 'names<-' is sort of replaced with

a = 'names<-'(a, b)

with a functional 'names<-'.  but this still does not explain the
incoherence above.  my problem was and is not that 'names<-' is not a
pure function, but that it sometimes is, sometimes is not, without any
obvious explanation.  that is, i suspect (not claim) that the behaviour
is not a design feature, but an accident.
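
for what it's worth, here is a sketch of forms that should not depend on
whether the bare functional call happens to copy.  the variable names are
made up, and setNames() is the helper from the stats package;  whether these
cover every case is an assumption, not something the documentation promises.

```r
x <- c(1)

# replacement form: the variable itself is updated, by design
x1 <- x
names(x1) <- "foo"

# functional form with the result assigned back to the same variable
x2 <- x
x2 <- `names<-`(x2, value = "foo")

# named copy that leaves the original alone
y <- stats::setNames(x, "foo")
```

in all three cases the outcome is the same whether or not a copy was made
internally, because the result is never read back from an unassigned call.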

vQ

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] surprising behaviour of names<-

2009-03-10 Thread Wacek Kusnierczyk
Peter Dalgaard wrote:

 (*) unless you mess with match.call() or substitute() and the like. But
 that's a different story.
   

different or not, it is a story that happens quite often -- too often,
perhaps -- to the degree that one may be tempted to say that the
semantics of argument passing in r is a mess. which of course is not
true, but since it is possible to mess with match.call & co, people
(including r core) do mess with them, and the result is obviously a
mess.  on top of the clear call-by-need semantics -- and on the surface,
you cannot tell how the arguments of a function will be taken (by
value?  by reference?  not at all?), which in effect looks like a messy
semantics.

vQ

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] logical comparison of functions (PR#13588)

2009-03-10 Thread Wacek Kusnierczyk
Duncan Murdoch wrote:
 On 10/03/2009 4:35 PM, michael_ka...@earthlink.net wrote:
 Full_Name: Michael Aaron Karsh
 Version: 2.8.0
 OS: Windows XP
 Submission from: (NULL) (164.67.71.215)


 When I try to say if (method==f), where f is a function, it says that
 the
 comparison is only possible for list and atomic types.  I tried
 saying if
 (method!=f), and it gave the same error message.  Would it be
 possible to repair
 it say that == and != comparisons would be possible for functions?

 This is not a bug.  Please don't report things as bugs when they
 aren't.  == and != are for atomic vectors, as documented.

 Use identical() for more general comparisons, as documented on the man
 page for ==.

note that in most programming languages comparing function objects is
either not supported or returns false unless you compare a function
object to itself.  r is a notable exception:

identical(function(a) a, function(a) a)
# TRUE

which would be false in all other languages i know;  however,

identical(function(a) a, function(b) b)
# FALSE

though they are surely identical functionally.

btw. it's not necessarily intuitive that == works only for atomic vectors.
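
a minimal sketch of the options, with made-up names (f, g, h, inputs).  the
behavioural check at the end is only evidence on a handful of inputs, not a
proof that two functions agree everywhere.

```r
f <- function(a) a
g <- f                      # same function object under another name
h <- function(b) b          # same behaviour, different formal argument name

same_object <- identical(f, g)
different_text <- identical(f, h)

# `==` is not defined for functions, so the comparison signals an error
eq_fails <- inherits(try(f == g, silent = TRUE), "try-error")

# hypothetical behavioural check on a few inputs
inputs <- list(1, "x", NULL, 1:3)
agree <- all(vapply(inputs, function(v) identical(f(v), h(v)), logical(1)))
```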

vQ

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] surprising behaviour of names<-

2009-03-10 Thread Wacek Kusnierczyk
i got an offline response saying that my original post may have not been
clear as to what the problem was, essentially, and that i may need to
restate it in words, in addition to code.

the problem is:  the behaviour of 'names<-' is incoherent, in that in
some situations it acts in a functional manner, producing a copy of its
argument with the names changed, while in others it changes the object
in-place (and returns it), without copying first.  your explanation
below is of course valid, but does not seem to address the issue.  in
the examples below, there is always (or so it seems) just one reference
to the object.

why are the following functional:

x = 1;  'names<-'(x, 'foo'); names(x)
x = 'foo'; 'names<-'(x, 'foo');  names(x)

while these are destructive:

x = c(1);  'names<-'(x, 'foo'); names(x)
x = c('foo'); 'names<-'(x, 'foo');  names(x)

it is claimed that in r a singular value is a one-element vector, and
indeed,

identical(1, c(1))
# TRUE
all.equal(is(1), is(c(1)))
# TRUE

i also do not understand the difference here:

x = c(1); 'names<-'(x, 'foo'); names(x)
# foo
x = c(1); names(x); 'names<-'(x, 'foo'); names(x)
# foo
x = c(1); print(x); 'names<-'(x, 'foo'); names(x)
# NULL
x = c(1); print(c(x)); 'names<-'(x, 'foo'); names(x)
# foo

does print, but not names, increase the reference count for x when
applied to x, but not to c(x)?

if the issue is that there is, in those examples where x is left
unchanged, an additional reference to x that causes the value of x to be
copied, could you please explain how and when this additional reference
is created?


thanks,
vQ




Peter Dalgaard wrote:

 is there something i misunderstand here?
 

 Only the ideology/pragmatism... In principle, R has call-by-value
 semantics and a function does not destructively modify its arguments(*),
 and foo(x)<-bar behaves like x <- `foo<-`(x, bar). HOWEVER, this has
 obvious performance repercussions (think x <- rnorm(1e7); x[1] <- 0), so
 we do allow destructive modification by replacement functions, PROVIDED
 that the x is not used by anything else. On the least suspicion that
 something else is using the object, a copy of x is made before the
 modification.

 So

 (A) you should not use code like y <- `foo<-`(x, bar)

 because

 (B) you cannot (easily) predict whether or not x will be modified
 destructively

 
 (*) unless you mess with match.call() or substitute() and the like. But
 that's a different story.


   


-- 
---
Wacek Kusnierczyk, MD PhD

Email: w...@idi.ntnu.no
Phone: +47 73591875, +47 72574609

Department of Computer and Information Science (IDI)
Faculty of Information Technology, Mathematics and Electrical Engineering (IME)
Norwegian University of Science and Technology (NTNU)
Sem Saelands vei 7, 7491 Trondheim, Norway
Room itv303

Bioinformatics & Gene Regulation Group
Department of Cancer Research and Molecular Medicine (IKM)
Faculty of Medicine (DMF)
Norwegian University of Science and Technology (NTNU)
Laboratory Center, Erling Skjalgsons gt. 1, 7030 Trondheim, Norway
Room 231.05.060

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Errors in recursive default argument references

2009-03-09 Thread Wacek Kusnierczyk
Stavros Macrakis wrote:
 Tested in: R version 2.8.1 (2008-12-22) / Windows

 Recursive default argument references normally give nice clear errors.
  In the first set of examples, you get the error:

   Error in ... :
   promise already under evaluation: recursive default argument
 reference or earlier problems?

   (function(a = a) a  ) ()
   (function(a = a) c(a)   ) ()
   (function(a = a) a[1]   ) ()
   (function(a = a) a[[1]] ) ()
   (function(a = a) a$x) ()
   (function(a = a) mean(a) )   ()
   (function(a = a) sort(a) ) ()
   (function(a = a) as.list(a) ) ()

 But in the following examples, R seems not to detect the 'promise
 already under evaluation' condition and instead gets a stack overflow,
 with the error message:

   Error: C stack usage is too close to the limit
   

when i run these examples, the execution seems to get into an endless
loop with no error messages whatsoever.  how much time does it take
before you get the error?  (using r 2.8.0 and also the latest r-devel).
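
for the first set of examples, the error can be captured rather than read
off the console.  this is a minimal sketch with a made-up function name
(self_ref);  it covers only the simplest case from the list above and says
nothing about the second set.

```r
# a default that refers to its own parameter cannot be resolved
self_ref <- function(a = a) a

msg <- tryCatch(self_ref(), error = function(e) conditionMessage(e))

# supplying the argument explicitly sidesteps the default entirely
ok <- self_ref(a = 1)
```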

vQ

   (function(a = a)  (a)) ()
   (function(a = a)  -a ) ()
   

btw. ?'-' talks about '-' as a *binary* operator, but the only example
given there which uses '-' uses it as a *unary* operator.  since '-'()
complains that '-' takes 1 or 2 arguments, it might be a good idea to
acknowledge it in the man page.

   (function(a = a) var(a) ) ()
   (function(a = a) sum(a) ) ()
   (function(a = a) is.vector(a) ) ()
   (function(a = a) as.numeric(a) ) ()

 I don't understand why the two sets of examples behave differently.
   

a bug in excel?

vQ

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] question

2009-03-08 Thread Wacek Kusnierczyk
ivo...@gmail.com wrote:
 Gentlemen---these are all very clever workarounds, but please forgive me  
 for voicing my own opinion: IMHO, returning multiple values in a  
 statistical language should really be part of the language itself. there  
 should be a standard syntax of some sort, whatever it may be, that everyone  
 should be able to use and which easily transfers from one local computer to  
 another. It should not rely on clever hacks in the .Rprofile that are  
 different from user to user, and which leave a reader of end user R code  
 baffled at first by all the magic that is going on. Even the R tutorials  
 for beginners should show a multiple-value return example right at the  
 point where function calls and return values are first explained.

   

hi again,

i was playing a bit with the idea of multiple assignment, and came up
with a simple codebit [1] that redefines the operator '='.  it hasn't
been extensively tested and is by no means foolproof, but allows various
sorts of tricks with multiple assignments:

source('http://miscell.googlecode.com/svn/rvalues/rvalues.r',
local=TRUE)

a = function(n) 1:n
# a is a function

b = a(3)
# b is c(1, 2, 3)

c(c, d) = a(1)
# c is 1, d is NULL

c(a, b) = list(b, a)
# swap: a is 1:3, b is a function

# these are equivalent:
c(a, b) = 1:2
{a; b} = 1:2
list(a, b) = 1:2

a = data.frame(x=1:3, y=3)
# a is a 2-column data frame

c(a, b) = data.frame(x=1:3, b=3)
# a is c(1, 2, 3), b is c(3, 3, 3)

and so on.  this is sort of pattern matching as in some functional
languages, but only sort of:  it does not do recursive matching, for
example:

c(c(a, b), c) = list(1:2, 3)
# error
# not: a = 1, b = 2, c = 3
 
anyway, it's just a toy for which there is no need.

vQ


[1] svn checkout http://miscell.googlecode.com/svn/rvalues

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] question

2009-03-07 Thread Wacek Kusnierczyk
mark.braving...@csiro.au wrote:
   
 The syntax for returning multiple arguments does not strike me as
 particularly appealing.  would it not possible to allow syntax like:

   f= function() { return( rnorm(10), rnorm(20) ) }
   (a,d$b) = f()

 


 FWIW, my own solution is to define a multi-assign operator:

 '%<-%' <- function( a, b){
   # a must be of the form '{thing1;thing2;...}'
   a <- as.list( substitute( a))[-1]
   e <- sys.parent()
   stopifnot( length( b) == length( a))
   for( i in seq_along( a))
     eval( call( '<-', a[[ i]], b[[i]]), envir=e)
   NULL
 }
   

you might want to have the check less stringent, so that rhs may consist
of more values than the lhs has variables.  or even skip the check and
assign NULL to a[i] for i > length(b).  another idea is to allow %<-% to
be used with just one variable on the lhs.

here's a modified version:

'%<-%' <- function(a, b){
    a <- as.list( substitute(a))
    if (length(a) > 1)
        a <- a[-1]
    if (length(a) > length(b))
        b <- c(b, rep(list(NULL), length(a) - length(b)))
    e <- sys.parent()
    for( i in seq_along( a))
        eval( call( '<-', a[[ i]], b[[i]]), envir=e)
    NULL }

{a; b} %<-% 1:2
# a = 1; b = 2
a %<-% 3:4
# a = 3
{a; b} %<-% 5
# a = 5; b = NULL

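for comparison, the plain base-r way to get several values out of a function
needs no new operator:  bundle them in a named list and pick the pieces
apart afterwards.  a minimal sketch, with a made-up function name
(two_samples):

```r
# hypothetical function returning two results bundled in a named list
two_samples <- function() list(first = 1:3, second = c("u", "v"))

res <- two_samples()
a <- res$first
b <- res$second
```

it is wordier than a multi-assign operator, but a reader needs nothing
beyond base r to follow it.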

vQ

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

