Re: [Rd] Partial matching performance in data frame rownames using [

2023-12-21 Thread Hilmar Berger via R-devel

Dear Toby and Ivan,

thanks a lot for the proposed patch and this detailed analysis. The
timing analysis nicely shows what I suspected: that partial matching in
large tables (>>10^5 rows) can get prohibitively slow. For 10^6 rows
with 50% non-hits in exact matching, I would roughly expect 10,000
seconds, i.e. almost 3 h.

That would be quite slow even if one did want partial matching. My
suspicion, however, is that most users do not want partial matching at
all and use row-name indexing with character vectors in the same way as
data.table or tibble do, i.e. as a unique key into table rows.

I can't remember a valid use case where I would have used partial
matching for a rowname index in the last 10 years, and I would be happy
to learn how widespread such use cases are.
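For readers unfamiliar with the behaviour in question, a minimal sketch (toy data, not from the original post) of what partial matching does to a row-name lookup:

```r
# A truncated key silently matches a longer row name via pmatch()
df <- data.frame(v = 1:2, row.names = c("alpha", "beta"))

df["alp", , drop = FALSE]    # partial match: returns the "alpha" row
match("alp", rownames(df))   # exact matching would report a miss (NA)
```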

Regarding the workaround, I do not fully agree that adding match() to
the call of [.data.frame() is a preferable solution. In cases where one
cannot rule out that the data.frame will grow large and that a
considerable proportion of the queries will miss in exact matching, the
workaround would have to be applied always in order to achieve
predictable performance. That, in my opinion, raises the question of
whether and when the ordinary partial-matching behaviour would still be
applicable.
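"Applying the workaround always" amounts to wrapping every row-name lookup; a sketch with a hypothetical helper name (not proposed in the thread):

```r
# Hypothetical helper: exact row-name lookup; misses come back as all-NA rows
rows_exact <- function(df, keys) df[match(keys, rownames(df)), , drop = FALSE]

d <- data.frame(v = 1:3, row.names = c("a", "b", "c"))
r <- rows_exact(d, c("b", "nope"))
r$v   # 2 followed by NA
```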

I am not knowledgeable enough to say how much work it would be, but I believe
that we could test the impact of Ivan's proposed solution by running
CRAN/BioC package tests against a patched R compared to an unpatched
one. I can offer to have a look at failing test cases to see if those
are intentional or unintentional uses of partial matching.

Best regards

Hilmar

On 19.12.23 21:57, Toby Hocking wrote:

Hi Hilmar and Ivan,
I have used your code examples to write a blog post about this topic,
which has figures that show the asymptotic time complexity of the
various approaches,
https://tdhock.github.io/blog/2023/df-partial-match/
The asymptotic complexity of partial matching appears to be quadratic
O(N^2) whereas the other approaches are asymptotically faster: linear
O(N) or log-linear O(N log N).
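The quadratic behaviour can be reproduced at small scale; a rough sketch (timings are machine-dependent, and sizes are kept small so it finishes quickly):

```r
# Compare exact vs partial matching on all-miss queries of growing size
for (n in c(1000, 4000)) {
  ids  <- sprintf("cg%06d", seq_len(n))
  miss <- sprintf("xx%06d", seq_len(n))   # guaranteed non-hits
  t_m <- system.time(match(miss, ids))["elapsed"]                         # roughly O(n)
  t_p <- system.time(pmatch(miss, ids, duplicates.ok = TRUE))["elapsed"]  # roughly O(n^2)
  cat(sprintf("n=%d  match: %.3fs  pmatch: %.3fs\n", n, t_m, t_p))
}
```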
I think that accepting Ivan's pmatch.rows patch would add unnecessary
complexity to base R, since base R already provides an efficient
workaround: d1[match(q1, rownames(d1)), ]
I do think the CheckUserInterrupt patch is a good idea, though.
Best,
Toby

On Sat, Dec 16, 2023 at 2:49 AM Ivan Krylov  wrote:

On Wed, 13 Dec 2023 09:04:18 +0100
Hilmar Berger via R-devel  wrote:


Still, I feel that default partial matching cripples the functionality
of data.frame for larger tables.

Changing the default now would require a long deprecation cycle to give
everyone who uses `[.data.frame` and relies on partial matching
(whether they know it or not) enough time to adjust.

Still, adding an argument feels like a small change: edit
https://svn.r-project.org/R/trunk/src/library/base/R/dataframe.R and
add a condition before calling pmatch(). Adjust the warning() for named
arguments. Don't forget to document the new argument in the man page at
https://svn.r-project.org/R/trunk/src/library/base/man/Extract.data.frame.Rd

Index: src/library/base/R/dataframe.R
===
--- src/library/base/R/dataframe.R  (revision 85664)
+++ src/library/base/R/dataframe.R  (working copy)
@@ -591,14 +591,14 @@
  ###  These are a little less general than S

  `[.data.frame` <-
-function(x, i, j, drop = if(missing(i)) TRUE else length(cols) == 1)
+function(x, i, j, drop = if(missing(i)) TRUE else length(cols) == 1, pmatch.rows = TRUE)
  {
  mdrop <- missing(drop)
  Narg <- nargs() - !mdrop  # number of arg from x,i,j that were specified
  has.j <- !missing(j)
-if(!all(names(sys.call()) %in% c("", "drop"))
+if(!all(names(sys.call()) %in% c("", "drop", "pmatch.rows"))
 && !isS4(x)) # at least don't warn for callNextMethod!
-warning("named arguments other than 'drop' are discouraged")
+warning("named arguments other than 'drop', 'pmatch.rows' are discouraged")

  if(Narg < 3L) {  # list-like indexing or matrix indexing
  if(!mdrop) warning("'drop' argument will be ignored")
@@ -679,7 +679,11 @@
  ## for consistency with [, ]
  if(is.character(i)) {
  rows <- attr(xx, "row.names")
-i <- pmatch(i, rows, duplicates.ok = TRUE)
+i <- if (pmatch.rows) {
+pmatch(i, rows, duplicates.ok = TRUE)
+} else {
+match(i, rows)
+}
  }
  ## need to figure which col was selected:
  ## cannot use .subset2 directly as that may
@@ -699,7 +703,11 @@
   # as this can be expensive.
  if(is.character(i)) {
  rows <- attr(xx, "row.names")
-i

Re: [Rd] Partial matching performance in data frame rownames using [

2023-12-13 Thread Hilmar Berger via R-devel

Dear Ivan,

thanks a lot, that is helpful.

Still, I feel that default partial matching cripples the functionality
of data.frame for larger tables.

Thanks again and best regards

Hilmar

On 12.12.23 13:55, Ivan Krylov wrote:

On Mon, 11 Dec 2023 21:11:48 +0100
Hilmar Berger via R-devel wrote:


What was unexpected in this case was that [.data.frame was hanging for
a long time (I waited about 10 minutes and then restarted R). Also,
this could not be interrupted in interactive mode.

That's unfortunate. If an operation takes a long time, it ought to be
interruptible. Here's a patch that passes make check-devel:

--- src/main/unique.c   (revision 85667)
+++ src/main/unique.c   (working copy)
@@ -1631,6 +1631,7 @@
}
  }

+unsigned int ic = ;
  if(nexact < n_input) {
/* Second pass, partial matching */
for (R_xlen_t i = 0; i < n_input; i++) {
@@ -1642,6 +1643,10 @@
mtch = 0;
mtch_count = 0;
for (int j = 0; j < n_target; j++) {
+   if (!--ic) {
+   R_CheckUserInterrupt();
+   ic = ;
+   }
if (no_dups && used[j]) continue;
if (strncmp(ss, tar[j], temp) == 0) {
mtch = j + 1;



__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Partial matching performance in data frame rownames using [

2023-12-11 Thread Hilmar Berger via R-devel

Dear all,

I have seen that others have discussed the partial matching behaviour of
data.frame[idx,] in the past, in particular with respect to unexpected
results sets.

I am aware of the fact that one can work around this using either
match() or switching to tibble/data.table or similar altogether.

I have a different issue with the partial matching, in particular its
performance when used on large data frames or more specifically, with
large queries matched against its row names.

I came across a case where I wanted to extract data from a large table
(approx 1M rows) using an index which matched only about 50% to the row
names, i.e. about 50% row name hits and 50% misses.

What was unexpected in this case was that [.data.frame was hanging for
a long time (I waited about 10 minutes and then restarted R). Also,
this could not be interrupted in interactive mode.

ids <- paste0("cg", sprintf("%06d",0:(1e6-1)))
d1 <- data.frame(row.names=ids, v=1:(1e6) )

q1 <- sample(ids, 1e6, replace=F)
system.time({r <- d1[q1,,drop=F]})
#   user  system elapsed
#  0.464   0.000   0.465

# those will hang a long time, I stopped R after 10 minutes
q2 <- c(q1[1:5e5], gsub("cg", "ct", q1[(5e5+1):1e6]) )
system.time({r <- d1[q2,,drop=F]})

# same here
q3 <- c(q1[1:5e5], rep("FOO",5e5) )
system.time({r <- d1[q3,,drop=F]})

It seems that the penalty of partial matching the non-hits across the
whole row name vector is not negligible any more with large tables and
queries, compared to small and medium tables.

I checked, and pmatch(q2, rownames(d1)) is equally slow.
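For reference, the match() workaround mentioned above, on a scaled-down version of the same objects (10^4 rows instead of 10^6), returns immediately and gives all-NA rows for the misses:

```r
# Scaled-down version of the example above: 50% row-name misses
ids <- sprintf("cg%06d", 0:9999)
d1  <- data.frame(row.names = ids, v = 1:10000)
q2  <- c(ids[1:5000], sub("cg", "ct", ids[5001:10000]))

r <- d1[match(q2, rownames(d1)), , drop = FALSE]  # exact matching only: fast
```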

Is there a chance to (a) document this in the help page ("with large
indexes/tables use match()") or, even better, (b) add an exact-matching
flag to [.data.frame?

Thanks a lot!

Best regards

Hilmar



Re: [Rd] dbinom link

2020-05-23 Thread Hilmar Berger



What about using the Wayback Machine archive? The web archive should be
more stable than other links, which might also disappear in the future.


E.g.
https://web.archive.org/web/20070610002602/http://www.herine.net/stat/software/dbinom.html,
which also links to an archived copy of the PDF.

Best regards,

Hilmar

On 18.05.20 10:57, peter dalgaard wrote:

In principle a good idea, but I'm not sure the whereabouts of Catherine Loader 
are known at this point. Last peeps from her on the net seem to be about a 
decade old.

.pd


On 18 May 2020, at 10:31 , Abby Spurdle  wrote:

This has come up before.

Here's the last time:
https://stat.ethz.ch/pipermail/r-devel/2019-March/077478.html

I guess my answer to the following question...

Perhaps we should ask permission to
nail the thing down somewhere on r-project.org?

...would be, to reproduce it somewhere.
And then update the link in the binom help file.

Given that the article was previously available freely (with no
apparent restrictions on reproducing it), and that the author has
significant published works which are open access, I'd be surprised if
there's any objection to reproducing it.


On Mon, May 18, 2020 at 8:01 PM Koenker, Roger W  wrote:

FWIW the link from ?dbinom to the Loader paper on Binomials is broken but the 
paper seems to be
available here:   
https://octave.1599824.n4.nabble.com/attachment/3829107/0/loader2000Fast.pdf

Roger Koenker
r.koen...@ucl.ac.uk
Honorary Professor of Economics
Department of Economics, UCL
Emeritus Professor of Economics
and Statistics, UIUC






Re: [Rd] Why is matrix product slower when matrix has very small values?

2019-11-20 Thread Hilmar Berger

Hi Florian,

Just a guess, but couldn't it be that the multiplication of very small
values leads to FP underflow exceptions which have to be handled by BLAS
in a less efficient way than "normal" multiplications handled by SIMD
instructions?
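If this guess is right, the culprit would be subnormal ("denormal") doubles in B. Detecting and flushing them to zero is straightforward in R; whether that actually restores full speed on a given BLAS is an assumption to be tested (a flush-to-zero workaround along these lines was also suggested in the Stack Overflow thread):

```r
# Detect entries that are positive but below the smallest normal double
# (i.e. subnormal), then flush them to zero before multiplying
n <- 100
B <- matrix(rexp(n * n), n, n) * 1e-320   # scale values into the subnormal range
any(B > 0 & B < .Machine$double.xmin)     # TRUE: subnormals are present

Bz <- B
Bz[Bz < .Machine$double.xmin] <- 0        # flush-to-zero workaround
```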


Best regards,
Hilmar

On 19/11/2019 15:09, Florian Gerber wrote:

Hi,

I experience surprisingly large timing differences for the
multiplication of matrices of the same dimension. An example is given
below. How can this be explained?
I posted the question on Stackoverflow:
https://stackoverflow.com/questions/58886111/r-why-is-matrix-product-slower-when-matrix-has-very-small-values
Somebody could reproduce the behavior but I did not get any useful
explanations yet.

Many thanks for hints!
Florian

## disable openMP
library(RhpcBLASctl); blas_set_num_threads(1); omp_set_num_threads(1)

A <- exp(-as.matrix(dist(expand.grid(1:60, 1:60))))
summary(c(A))
# Min.  1st Qu.   Median Mean  3rd Qu. Max.
# 0.00 0.00 0.00 0.001738 0.00 1.00

B <- exp(-as.matrix(dist(expand.grid(1:60, 1:60)))*10)
summary(c(B))
#  Min.   1st Qu.Median  Mean   3rd Qu.  Max.
# 0.000 0.000 0.000 0.0002778 0.000 1.000

identical(dim(A), dim(B))
## [1] TRUE

system.time(A %*% A)
#user  system elapsed
#   2.387   0.001   2.389
system.time(B %*% B)
#user  system elapsed
#  21.285   0.020  21.310

sessionInfo()
# R version 3.6.1 (2019-07-05)
# Platform: x86_64-pc-linux-gnu (64-bit)
# Running under: Linux Mint 19.2

# Matrix products: default
# BLAS:   /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
# LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so



--
Dr. Hilmar Berger, MD
Max Planck Institute for Infection Biology
Charitéplatz 1
D-10117 Berlin
GERMANY

Phone:  + 49 30 28460 430
Fax:    + 49 30 28460 401
E-Mail: ber...@mpiib-berlin.mpg.de
Web   : www.mpiib-berlin.mpg.de



Re: [Rd] '==' operator: inconsistency in data.frame(...) == NULL

2019-09-24 Thread Hilmar Berger

Dear Martin,

thanks a lot for looking into this. Of course you were right that the 
fix was not complete - I apologize for not having tested what I believed 
to be the solution.


My comments on the S4 classes stemmed from a misunderstanding on my
side. I now understand that S4 classes inheriting from base R object
types may dispatch Ops methods defined for those base types.

If the base object value of such an S4 class is unset and therefore
empty, this empty value will be passed on to e.g. Ops.data.frame, where
it triggers the same issue as e.g. logical(0).


setClass("MyClass", slots = list(x="numeric", label="character"),
         contains = "numeric")

a = new("MyClass", x=3, label="FOO")
a@.Data

> logical(0)

a == data.frame(a=1:3)
# error

I understand that this is all as expected and the error should most 
likely disappear with the fix you submitted for other 0-extent cases.


Thanks again and best regards,

Hilmar

Am 18/09/2019 um 11:29 schrieb Martin Maechler:

Martin Maechler
     on Wed, 18 Sep 2019 10:35:42 +0200 writes:

   >>>>> Hilmar Berger
   >>>>> on Sat, 14 Sep 2019 13:31:27 +0200 writes:

>> Dear all,
>> I did some more tests regarding the == operator in Ops.data.frame 
(see
>> below).  All tests done in R 3.6.1 (x86_64-w64-mingw32).

>> I find that errors are thrown also when comparing a zero length
>> data.frame to atomic objects with length>0 which should be a valid case
>> according to the documentation. This can be traced to a check in the
>> last line of Ops.data.frame which tests for the presence of an empty
>> result value (i.e. list() ) but does not handle a list of empty values
>> (i.e. list(logical(0))) which in fact is generated in those cases.

>> There  is a simple fix (see also below).

 > I'm pretty sure what you write above is wrong:  For some reason
 > you must have changed more in your own version of Ops.data.frame :

 > Because there's a line

 > value <- unlist(value, ...)

 > there, value is *not*  list(logical(0)) there, but rather  logical(0)
 > and then indeed, your proposed line change (at the end of Ops.data.frame)
 > has no effect for the examples you give.

On the other hand, there *is* a simple "fix" at the end of
Ops.data.frame()  which makes all your examples "work" (i.e. not
give an error), namely

--

@@ -1685,7 +1684,7 @@
  else { ## 'Logic' ("&","|")  and  'Compare' ("==",">","<","!=","<=",">=") :
value <- unlist(value, recursive = FALSE, use.names = FALSE)
matrix(if(is.null(value)) logical() else value,
-  nrow = nr, dimnames = list(rn,cn))
+  nrow = nr, ncol = length(cn), dimnames = list(rn,cn))
  }

--

i.e., explicitly specifying 'ncol' compatibly with the column names.
However, I guess that this change would *not* signal errors
where it *should* and so am *not* (yet?) proposing to "do" it.

Another remark, on  S4  which you've raised several times:
As you may know that the 'Matrix' package (part of every
"regular" R installation) uses S4 "everywhere" and it does
define many methods for its Matrix classes, all in source file  Matrix/R/Ops.R
the development version (in svn / subversion) being online on R-forge here:

   
https://r-forge.r-project.org/scm/viewvc.php/pkg/Matrix/R/Ops.R?view=markup=matrix

and "of course", there we define S4 group methods for Ops all
the time, and (almost) never S3 ones...
[[but I hope you don't want to start combining data frames
  with Matrix package matrices, now !]]

Martin Maechler
ETH Zurich  and  R Core Team




Re: [Rd] '==' operator: inconsistency in data.frame(...) == NULL

2019-09-14 Thread Hilmar Berger

Dear all,

I did some more tests regarding the == operator in Ops.data.frame (see 
below).  All tests done in R 3.6.1 (x86_64-w64-mingw32).


I find that errors are thrown also when comparing a zero length 
data.frame to atomic objects with length>0 which should be a valid case 
according to the documentation. This can be traced to a check in the 
last line of Ops.data.frame which tests for the presence of an empty 
result value (i.e. list() ) but does not handle a list of empty values 
(i.e. list(logical(0))) which in fact is generated in those cases. There 
is a simple fix (see also below).


There are other issues with the S4 class example (i.e. data.frame() ==
<S4 object>), which fails for different reasons.


##

d_0 = data.frame(a = numeric(0)) # zero length data.frame
d_00 = data.frame(numeric(0)) # zero length data.frame without names
names(d_00) <- NULL # remove names to obtain value being an empty list() at the end of Ops.data.frame

d_3 = data.frame(a=1:3) # non-empty data.frame

m_0 = matrix(logical(0)) # zero length matrix
#
# error A:
# Error in matrix(if (is.null(value)) logical() else value, nrow = nr,
#   dimnames = list(rn,  :
#   length of 'dimnames' [2] not equal to array extent

d_0 == 1   # error A
d_00 == 1  # <0 x 0 matrix>
d_3 == 1   # <3 x 1 matrix>

d_0 == logical(0) # error A
d_00 == logical(0) # <0 x 0 matrix>
d_3 == logical(0) # error A

d_0 == NULL # error A
d_00 == NULL # <0 x 0 matrix>
d_3 == NULL # error A

m_0 == d_0  # error A
m_0 == d_00 # <0 x 0 matrix>
m_0 == d_3  # error A

# empty matrix for comparison
m_0 == 1 # < 0 x 1 matrix>
m_0 == logical(0) # < 0 x 1 matrix>
m_0 == NULL # < 0 x 1 matrix>

# All errors above could be solved by changing the last line in Ops.data.frame from
#   matrix(if (is.null(value)) logical() else value, nrow = nr, dimnames = list(rn, cn))
# to
#   matrix(if (length(value) == 0) logical() else value, nrow = nr, dimnames = list(rn, cn))
# Alternatively or in addition one could add an explicit test for
# data.frame() == NULL if desired and raise an error


#
# non-empty return value, but failing in the same code line due to
# incompatible dimensions.
# should Ops.data.frame be dispatched at all for <data.frame> == <S4 object>?

setClass("FOOCLASS",
  representation("list")
)
ma = new("FOOCLASS", list(M=matrix(rnorm(300), 30,10)))
isS4(ma)
d_3 == ma # error A
##########

Best regards,
Hilmar

Am 11/09/2019 um 13:26 schrieb Hilmar Berger:
Sorry, I can't reproduce the example below even on the same machine. 
However, the following example produces the same error as NULL values 
in prior examples:


> setClass("FOOCLASS",
+  representation("list")
+ )
> ma = new("FOOCLASS", list(M=matrix(rnorm(300), 30,10)))
> isS4(ma)
[1] TRUE
> data.frame(a=1:3) == ma
Error in matrix(unlist(value, recursive = FALSE, use.names = FALSE), 
nrow = nr,  :
  length of 'dimnames' [2] not equal to array extent

Best,
Hilmar


On 11/09/2019 12:24, Hilmar Berger wrote:
Another example where a data.frame is compared to (here non-null, 
non-empty) non-atomic values in Ops.data.frame, resulting in an error 
message:


setClass("FOOCLASS2",
 slots = c(M="matrix")
)
ma = new("FOOCLASS2", M=matrix(rnorm(300), 30,10))

> isS4(ma)
[1] TRUE
> ma == data.frame(a=1:3)
Error in eval(f) : dims [product 1] do not match the length of object 
[3]


As for the NULL/logical(0) cases I would suggest to explicitly test 
for invalid conditions in Ops.data.frame and generate a 
comprehensible message (e.g. "comparison is possible only for atomic 
and list types") if appropriate.


Best regards,
Hilmar


On 11/09/2019 11:55, Hilmar Berger wrote:


In the data.frame()==NULL cases I have the impression that the fact 
that both sides are non-atomic is not properly detected and 
therefore R tries to go on with the == method for data.frames.


From a cursory check in Ops.data.frame() and some debugging I have 
the impression that the case of the second argument being non-atomic 
or empty is not handled at all and the function progresses until the 
end, where it fails in the last step on an empty value:


matrix(unlist(value, recursive = FALSE, use.names = FALSE),
    nrow = nr, dimnames = list(rn, cn)) 








Re: [Rd] '==' operator: inconsistency in data.frame(...) == NULL

2019-09-11 Thread Hilmar Berger
Sorry, I can't reproduce the example below even on the same machine. 
However, the following example produces the same error as NULL values in 
prior examples:


> setClass("FOOCLASS",
+  representation("list")
+ )
> ma = new("FOOCLASS", list(M=matrix(rnorm(300), 30,10)))
> isS4(ma)
[1] TRUE
> data.frame(a=1:3) == ma
Error in matrix(unlist(value, recursive = FALSE, use.names = FALSE), 
nrow = nr,  :
  length of 'dimnames' [2] not equal to array extent

Best,
Hilmar


On 11/09/2019 12:24, Hilmar Berger wrote:
Another example where a data.frame is compared to (here non-null, 
non-empty) non-atomic values in Ops.data.frame, resulting in an error 
message:


setClass("FOOCLASS2",
 slots = c(M="matrix")
)
ma = new("FOOCLASS2", M=matrix(rnorm(300), 30,10))

> isS4(ma)
[1] TRUE
> ma == data.frame(a=1:3)
Error in eval(f) : dims [product 1] do not match the length of object [3]

As for the NULL/logical(0) cases I would suggest to explicitly test 
for invalid conditions in Ops.data.frame and generate a comprehensible 
message (e.g. "comparison is possible only for atomic and list types") 
if appropriate.


Best regards,
Hilmar


On 11/09/2019 11:55, Hilmar Berger wrote:


In the data.frame()==NULL cases I have the impression that the fact 
that both sides are non-atomic is not properly detected and therefore 
R tries to go on with the == method for data.frames.


From a cursory check in Ops.data.frame() and some debugging I have 
the impression that the case of the second argument being non-atomic 
or empty is not handled at all and the function progresses until the 
end, where it fails in the last step on an empty value:


matrix(unlist(value, recursive = FALSE, use.names = FALSE),
    nrow = nr, dimnames = list(rn, cn)) 







Re: [Rd] '==' operator: inconsistency in data.frame(...) == NULL

2019-09-11 Thread Hilmar Berger
Another example where a data.frame is compared to (here non-null, 
non-empty) non-atomic values in Ops.data.frame, resulting in an error 
message:


setClass("FOOCLASS2",
 slots = c(M="matrix")
)
ma = new("FOOCLASS2", M=matrix(rnorm(300), 30,10))

> isS4(ma)
[1] TRUE
> ma == data.frame(a=1:3)
Error in eval(f) : dims [product 1] do not match the length of object [3]

As for the NULL/logical(0) cases I would suggest to explicitly test for 
invalid conditions in Ops.data.frame and generate a comprehensible 
message (e.g. "comparison is possible only for atomic and list types") 
if appropriate.


Best regards,
Hilmar


On 11/09/2019 11:55, Hilmar Berger wrote:


In the data.frame()==NULL cases I have the impression that the fact 
that both sides are non-atomic is not properly detected and therefore 
R tries to go on with the == method for data.frames.


From a cursory check in Ops.data.frame() and some debugging I have the 
impression that the case of the second argument being non-atomic or 
empty is not handled at all and the function progresses until the end, 
where it fails in the last step on an empty value:


matrix(unlist(value, recursive = FALSE, use.names = FALSE),
    nrow = nr, dimnames = list(rn, cn)) 





[Rd] '==' operator: inconsistency in data.frame(...) == NULL

2019-09-04 Thread Hilmar Berger
Dear all,

I just stumbled upon some behavior of the == operator which is at least 
somewhat inconsistent.

R version 3.6.1 (2019-07-05) -- "Action of the Toes"
Copyright (C) 2019 The R Foundation for Statistical Computing
Platform: x86_64-w64-mingw32/x64 (64-bit)

 > list(a=1:3, b=LETTERS[1:3]) == NULL
logical(0)
 > matrix(1:6, 2,3) == NULL
logical(0)
 > data.frame(a=1:3, b=LETTERS[1:3]) == NULL # same for == logical(0)
Error in matrix(if (is.null(value)) logical() else value, nrow = nr, 
dimnames = list(rn,  :
   length of 'dimnames' [2] not equal to array extent

 > data.frame(NULL) == 1
<0 x 0 matrix>
 > data.frame(NULL) == NULL
<0 x 0 matrix>
 > data.frame(NULL) == logical(0)
<0 x 0 matrix>

I wonder if data.frame() == NULL should also return 
a value instead of an error. R help reads:

"At least one of x and y must be an atomic vector, but if the other
is a list R attempts to coerce it to the type of the atomic vector:
this will succeed if the list is made up of elements of length one that
can be coerced to the correct type.

If the two arguments are atomic vectors of different types, one is
coerced to the type of the other, the (decreasing) order of precedence
being character, complex, numeric, integer, logical and raw."
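The quoted rules can be exercised directly; a minimal sketch of the documented behaviour:

```r
# Integer is coerced to character (character has highest precedence)
1:3 == c("1", "2", "3")        # TRUE TRUE TRUE

# A list of length-one elements is coerced to the atomic type
list(1, 2) == c(1, 2)          # TRUE TRUE

# NULL behaves like a zero-length vector against an atomic vector
matrix(1:6, 2, 3) == NULL      # logical(0)
```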

It is not clear from the help what to expect for NULL or empty atomic 
vectors. It is also strange that for list() there is no error but for 
data.frame() with the same data an error is thrown. I can see that there 
might be reasons to return logical(0) instead of FALSE, but I do not 
fully understand why there should be differences between e.g. matrix() 
and data.frame().

Also, it is at least somewhat strange that data.frame(NULL) == NULL and
similar expressions return an empty matrix, while comparing a normal 
filled matrix to NULL returns logical(0).

Even if this behavior is expected, the error message shown by 
data.frame(...) == NULL is not very informative.

Thanks and best regards,

Hilmar








[Rd] Seg fault stats::runmed

2018-10-05 Thread Hilmar Berger

Dear all,

I just found this issue:

dd1 = c(rep(NaN,82), rep(-1, 144), rep(1, 74))
xx = runmed(dd1, 21)

-> R crashes reproducibly in R 3.4.3, R3.4.4 (Ubuntu 14.04/Ubuntu 16.04)

With GDB:
Program received signal SIGSEGV, Segmentation fault.
swap (l=53, r=86, window=window@entry=0xc59308, 
outlist=outlist@entry=0x12ea2e8, nrlist=nrlist@entry=0x114fdd8, 
print_level=print_level@entry=0) at Trunmed.c:64

64        outlist[nr/* = nrlist[l] */] = l;

Valgrind also reports access to unallocated memory and/or writing past 
the end of the heap.


The crash does not happen if the order is changed:

dd2 = c(rep(-1, 144), rep(1, 74), rep(NaN,82))
xx = runmed(dd2,21)

Error in if (a < b) { : missing value where TRUE/FALSE needed
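Until the crash is fixed, a defensive sketch is to strip the NaN values before calling runmed() (this assumes NaNs can simply be dropped for the application at hand; later R versions also grew an na.action argument for runmed()):

```r
dd1 <- c(rep(NaN, 82), rep(-1, 144), rep(1, 74))

ok <- !is.nan(dd1)          # drop the NaN block up front
xx <- runmed(dd1[ok], 21)   # runs without reaching the crashing code path
length(xx) == sum(ok)       # TRUE
```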

Best regards,
Hilmar

--
Dr. Hilmar Berger, MD
Max Planck Institute for Infection Biology
Charitéplatz 1
D-10117 Berlin
GERMANY

Phone:  + 49 30 28460 430
Fax:+ 49 30 28460 401
 
E-Mail: ber...@mpiib-berlin.mpg.de

Web   : www.mpiib-berlin.mpg.de



Re: [Rd] A few suggestions and perspectives from a PhD student

2017-05-09 Thread Hilmar Berger


On 09/05/17 11:22, Joris Meys wrote:
>
>
> On Tue, May 9, 2017 at 9:47 AM, Hilmar Berger 
> <ber...@mpiib-berlin.mpg.de> wrote:
>
> Hi,
>
> On 08/05/17 16:37, Ista Zahn wrote:
>
> One of the key strengths of R is that packages are not akin to
> "fan
> created mods". They are a central and necessary part of the R
> system.
>
> I would tend to disagree here. R packages are in their majority
> not maintained by the core R developers. Concepts, features and
> lifetime depend mainly on the maintainers of the package (even
> though in theory GPL will allow to somebody to take over anytime).
> Several packages that are critical for processing big data and
> providing "modern" visualizations introduce concepts quite
> different from the legacy S/R language. I do feel that in a way,
> current core R shows strongly its origin in S, while modern
> concepts (e.g. data.table, dplyr, ggplot, ...) are often only
> available via extension packages. This is fine if one considers R
> to be a statistical toolkit; as a programming language, however,
> it introduces inconsistencies and uncertainties which could be
> avoided if some of the "modern" parts (including language
> concepts) could be more integrated in core-R.
>
> Best regards,
> Hilmar
>
>
> And I would tend to disagree here. R is build upon the paradigm of a 
> functional programming language, and falls in the same group as 
> clojure, haskell and the likes. It is a turing complete programming 
> language on its own. That's quite a bit more than "a statistical 
> toolkit". You can say that about eg the macro language of SPSS, but 
> not about R.
>
My point was that inconsistencies are harder to tolerate when using R as 
a programming language as opposed to a toolkit that just has to do a job.
> Second, there's little "modern" about the ideas behind the tidyverse. 
> Piping is about as old as unix itself. The grammar of graphics, on 
> which ggplot is based, stems from the SYStat graphics system from the 
> nineties. Hadley and colleagues did (and do) a great job implementing 
> these ideas in R, but the ideas do have a respectable age.
Those ideas still seem more modern than e.g. stock R graphics, designed
probably in the seventies or eighties. These still do their job for
lots and lots of applications; however, the fact that many newer
packages use ggplot instead of plot() forces users to learn and use
different paradigms for things as simple as drawing a line.

I also would like to make clear that I do not advocate for including the 
whole tidyverse in core R. I just believe that having core concepts well 
supported in core R instead of implemented in a package might make 
things more consistent. E.g. method chaining ("%>%") is a core language 
feature in many languages.
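For context, the chaining style under discussion, shown with the native |> pipe that base R itself eventually added in 4.1.0 (after this thread; requires R >= 4.1):

```r
# The same call written nested and as a chain; both produce identical results
nested <- head(subset(mtcars, cyl == 4), 2)
piped  <- mtcars |> subset(cyl == 4) |> head(2)
identical(nested, piped)   # TRUE
```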
>
> The one thing I would like to see though, is the adaptation of the 
> statistical toolkit so that it can work with data.table and tibble 
> objects directly, as opposed to having to convert to a data.frame once 
> you start building the models. And I believe that eventually there 
> will be a replacement for the data.frame that increases R's 
> performance and lessens its burden on the memory.
>
Which is a perfect example of what I mean: improved functionality should
find its way into core R at some point, replacing or extending outdated
functionality. Otherwise, I don't know how hard it will be to develop
21st-century methods on top of a 1980s/90s language core.
Although I admit that the R developers are doing a great job to make it 
possible.

Best,
Hilmar

> So all in all, I do admire the tidyverse and how it speeds up data 
> preparation for analysis. But tidyverse is a powerful data toolkit, 
> not a programming language. And it won't make R a programming language 
> either. Because R is already.
>
> Cheers
> Joris
>
>

Re: [Rd] A few suggestions and perspectives from a PhD student

2017-05-09 Thread Hilmar Berger

Hi,

On 08/05/17 16:37, Ista Zahn wrote:

One of the key strengths of R is that packages are not akin to "fan
created mods". They are a central and necessary part of the R system.

I would tend to disagree here. The majority of R packages are not 
maintained by the core R developers. Concepts, features and lifetime 
depend mainly on the maintainers of the package (even though in theory 
the GPL would allow somebody to take over at any time). Several packages 
that are critical for processing big data and providing "modern" 
visualizations introduce concepts quite different from the legacy S/R 
language. I do feel that, in a way, current core R strongly shows its 
origin in S, while modern concepts (e.g. data.table, dplyr, ggplot, ...) 
are often only available via extension packages. This is fine if one 
considers R to be a statistical toolkit; as a programming language, 
however, it introduces inconsistencies and uncertainties which could be 
avoided if some of the "modern" parts (including language concepts) 
were more integrated into core R.
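One small, concrete illustration of the legacy semantics I mean (a sketch in base R; the uneven drop behavior of [.data.frame is exactly the kind of inconsistency that data.table and tibble deliberately redesign):

```r
# Sketch (base R only): subsetting a data.frame with one vs. two columns
# silently changes the type of the result.
df <- data.frame(x = 1:3, y = 4:6)

class(df[, c("x", "y")])   # "data.frame": two columns keep the class
class(df[, "x"])           # "integer": a single column drops to a bare vector
```

data.table and tibble both return an object of the same class in either case, which is one of the concepts "quite different from the legacy S/R language" mentioned above.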


Best regards,
Hilmar

--
Dr. Hilmar Berger, MD
Max Planck Institute for Infection Biology
Charitéplatz 1
D-10117 Berlin
GERMANY

Phone:  + 49 30 28460 430
Fax:    + 49 30 28460 401
 
E-Mail: ber...@mpiib-berlin.mpg.de

Web   : www.mpiib-berlin.mpg.de


Re: [Rd] Crash after (wrongly) applying product operator on object from LIMMA package

2017-04-24 Thread Hilmar Berger
Hi January,

I believe the root of the xlsx issue has been identified and a fix 
suggested by Tomas Kalibera (see https://github.com/s-u/rJava/pull/102). 
In a nutshell, Oracle Java on Linux modifies the stack in a way that 
makes it smaller and at the same time makes it impossible for R to 
detect this change, leading to segfaults. It is not clear to me that the 
same problem would occur on a Mac, since the behavior of Oracle Java seems 
to be Linux-specific. Possibly even Linux users on OpenJDK might not 
encounter any problems (not tested).

So possibly the next release of rJava should also fix the xlsx problems 
with other packages.
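As an aside, the stack limit that R believes it has (the number quoted in the "C stack usage ... is too close to the limit" errors) can be inspected from base R, which may help when checking whether a given Java runtime has shrunk the stack underneath R:

```r
# Cstack_info() reports the stack size R assumes, an estimate of the
# current usage, and the growth direction; "size" is NA when R cannot
# determine the limit.
info <- Cstack_info()
info[c("size", "current")]
```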

Best regards,
Hilmar


On 24/04/17 11:46, January W. wrote:
> Hi Hilmar,
>
> weird. The memory problem seems to be due to recursion (my R, version 
> 3.3.3, says: Error: evaluation nested too deeply: infinite recursion / 
> options(expressions=)?; just write traceback() to see how it happens), 
> but why does it segfault with xlsx? NB: xlsx is the culprit: neither 
> rJava nor xlsxjars cause the problem.
>
> On the other hand, quick googling for r+xlsx+segfault returns tons of 
> reports of how xlsx crashes in dozens of situations. See for example 
> http://r.789695.n4.nabble.com/segfault-in-gplots-heatmap-2-td4641808.html. 
> Also, the problem might be platform-specific. It would be interesting 
> to see whether anyone with a Mac can reproduce it.
>
> kind regards,
>
> j.
>
>
>
>
>
> On 19 April 2017 at 10:01, Hilmar Berger <ber...@mpiib-berlin.mpg.de 
> <mailto:ber...@mpiib-berlin.mpg.de>> wrote:
>
> Hi,
>
> following up on my own question, I found a smaller example that does
> not require LIMMA:
>
> setClass("FOOCLASS",
>  representation("list")
> )
> ma = new("FOOCLASS", list(M=matrix(rnorm(300), 30,10)))
>
> > ma * ma$M
> Error: C stack usage  7970512 is too close to the limit
>
> > library(xlsx)
> Loading required package: rJava
> Loading required package: xlsxjars
> > ma * ma$M
> ---> Crash
>
> xlsx seems to act like a catalyst here, with the product operator
> running in a deep nested iteration, exhausting the stack. Valgrind
> shows thousands of invalid stack accesses when loading xlsx, which
> might contribute to the problem. Package xlsx has not been updated
> since 2014, so it might fail with more current versions of R or
> Java (I'm using Oracle Java 8).
>
> Still, even if xlsx was the package to be blamed for the crash, I
> fail to understand what exactly the product operator is trying to
> do in the multiplication of the matrix with the object.
>
> Best regards,
> Hilmar

Re: [Rd] Crash after (wrongly) applying product operator on object from LIMMA package

2017-04-19 Thread Hilmar Berger

Hi,

following up on my own question, I found a smaller example that does not 
require LIMMA:


setClass("FOOCLASS",
 representation("list")
)
ma = new("FOOCLASS", list(M=matrix(rnorm(300), 30,10)))

> ma * ma$M
Error: C stack usage  7970512 is too close to the limit

> library(xlsx)
Loading required package: rJava
Loading required package: xlsxjars
> ma * ma$M
---> Crash

xlsx seems to act like a catalyst here, with the product operator 
running in a deeply nested iteration, exhausting the stack. Valgrind shows 
thousands of invalid stack accesses when loading xlsx, which might 
contribute to the problem. Package xlsx has not been updated since 2014, 
so it might fail with more current versions of R or Java (I'm using 
Oracle Java 8).


Still, even if xlsx was the package to be blamed for the crash, I fail 
to understand what exactly the product operator is trying to do in the 
multiplication of the matrix with the object.
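For what it's worth, one way to make the operation well-defined is to give the class an arithmetic method of its own, so that dispatch has somewhere to land instead of recursing. A minimal sketch (the method body, an elementwise product with the M component, is my own illustrative choice and not anything limma defines):

```r
library(methods)

setClass("FOOCLASS", representation("list"))

# Hypothetical "*" method: multiply the M component of the underlying
# list elementwise with a plain matrix.
setMethod("*", signature("FOOCLASS", "matrix"), function(e1, e2) {
  e1@.Data$M * e2
})

ma  <- new("FOOCLASS", list(M = matrix(1, 3, 3)))
res <- ma * matrix(2, 3, 3)   # an ordinary 3 x 3 matrix, no recursion
```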


Best regards,
Hilmar


[Rd] Crash after (wrongly) applying product operator on object from LIMMA package

2017-04-18 Thread Hilmar Berger

Hi,

this is a problem that occurs in the presence of two libraries (limma, 
xlsx) and leads to a crash of R. The problematic code is the wrong 
application of sweep or of the product ("*") function on a LIMMA MAList 
object. To my knowledge, limma does not define a "*" method for MAList 
objects.


If only LIMMA is loaded but not package xlsx, the code does not crash 
but rather produces an error ("Error: C stack usage  7970512 is too 
close to the limit"). Loading only package rJava instead of xlsx also 
does not produce the crash, only the error message. Note that xlsx 
functions are not explicitly used.


It could be reproduced on two different Linux machines running R-3.2.5, 
R-3.3.0 and R-3.3.2.


Code to reproduce the problem:
-
library(limma)
library(xlsx)

# a MAList
ma = new("MAList", list(A=matrix(rnorm(300), 30,10), 
M=matrix(rnorm(300), 30,10)))


# This should actually be sweep(ma$M, ...) for functional code, but I 
omitted the $M...

#sweep(ma, 2, c(1:10), "*")
# sweep will crash when doing the final operation of applying the 
function over the input matrix, which in this case is function "*"


f = match.fun("*")
# This is not exactly the same as in sweep but it also tries to multiply 
the MAList object with a matrix of same size and leads to the crash

f(ma, ma$M)
# ma * ma$M has the same effect
-

My output:

R version 3.3.0 (2016-05-03) -- "Supposedly Educational"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(limma)
> library(xlsx)
Loading required package: rJava
Loading required package: xlsxjars
>
> sessionInfo()
R version 3.3.0 (2016-05-03)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.5 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8  LC_NUMERIC=C LC_TIME=en_US.UTF-8
 [4] LC_COLLATE=en_US.UTF-8LC_MONETARY=en_US.UTF-8 
LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8  LC_NAME=en_US.UTF-8 
LC_ADDRESS=en_US.UTF-8
[10] LC_TELEPHONE=en_US.UTF-8  LC_MEASUREMENT=en_US.UTF-8 
LC_IDENTIFICATION=en_US.UTF-8


attached base packages:
[1] stats graphics  grDevices utils datasets  methods base

other attached packages:
[1] xlsx_0.5.7 xlsxjars_0.6.1 rJava_0.9-8limma_3.30.7

loaded via a namespace (and not attached):
[1] tools_3.3.0
>
> ma = new("MAList", list(A=matrix(rnorm(300), 30,10), 
M=matrix(rnorm(300), 30,10)))

> #sweep(ma, 2, c(1:10), "*")
>
> f = match.fun("*")
> f
function (e1, e2)  .Primitive("*")

> f(ma, ma$M)

> crash to command line with segfault.

Best regards,
Hilmar


[Rd] ifelse drops classes

2008-02-29 Thread Hilmar Berger
Hi all,

I guess that this is rather a feature request than a bug report, but I'm 
not really sure:

I stumbled over this today (R 2.6.2, WinXP):

> c = c(as.Date("2007-01-01"))
> class(c)
[1] "Date"
> ifelse(is.na(c), as.Date(Sys.time()), c)
[1] 13514
> typeof(ifelse(is.na(c), as.Date(Sys.time()), c))
[1] "double"
> class(ifelse(is.na(c), as.Date(Sys.time()), c))
[1] "numeric"
> mode(ifelse(is.na(c), as.Date(Sys.time()), c))
[1] "numeric"

So - unexpected by me - ifelse drops the date class.

Afterwards I found in the Help page:

"The mode of the result may depend on the value of test, and the class 
attribute of the result is taken from test and may be inappropriate for 
the values selected from yes and no."

So even though I should have expected that the class of the result might 
not be Date:

1. Shouldn't it be 'logical' (class(TRUE)) instead of 'numeric'?
2. Isn't it pretty useless to take the class from the test argument, 
given that this will usually be class(TRUE/FALSE)?
3. If we can take the mode from the value selected, why can't we take 
the class attribute as well?

The Help page advises:
"Sometimes it is better to use a construction such as (tmp <- yes; 
tmp[!test] <- no[!test]; tmp), possibly extended to handle missing 
values in test."

So - why doesn't ifelse() use a similar construction?

In conclusion, I plead for changing ifelse() to not drop class attributes.
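In the meantime, the construction recommended by the help page can be wrapped into a small helper (the name safe_ifelse is hypothetical, not part of base R), which keeps the class of the values actually selected:

```r
# Sketch of the help page's recommended pattern; unlike ifelse(), the
# class attribute of 'yes' survives the subassignment.
safe_ifelse <- function(test, yes, no) {
  tmp <- yes
  tmp[!test] <- no[!test]
  tmp
}

d   <- as.Date(c("2007-01-01", NA))
res <- safe_ifelse(is.na(d), rep(as.Date("2000-01-01"), length(d)), d)
class(res)   # "Date"
```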

Regards,
Hilmar



[Rd] tapply on empty data.frames (PR#10644)

2008-01-27 Thread hilmar . berger
Full_Name: Hilmar Berger
Version: 2.4.1/2.6.2alpha
OS: WinXP
Submission from: (NULL) (84.185.128.110)


Hi all,

If I use tapply on an empty data.frame I get an error. I'm not quite sure if one
can actually expect the function to return a result here; however, the error
message suggests that this case does not get handled well.

This happens both in R-2.4.1 and 2.6.2alpha (version 2008-01-26).

> z = data.frame(a = c(1,2,3,4), b = c("a","b","c","d"))
> z1 = subset(z, a == 5)
> tapply(z1$a, z1$b, length)
Error in ansmat[index] <- ans : 
  incompatible types (from NULL to logical) in subassignment type fix

Deleting unused factor levels from the group parameter gives:

> tapply(z1$a, factor(z1$b), length)
logical(0)
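For readers on current R versions: tapply() no longer errors on this input (unused levels yield NA counts instead), and dropping the unused levels beforehand, e.g. with droplevels() (added in R 2.12.0, equivalent to the factor(z1$b) call above), gives a well-defined empty result. A sketch:

```r
z  <- data.frame(a = c(1, 2, 3, 4), b = factor(c("a", "b", "c", "d")))
z1 <- subset(z, a == 5)    # zero rows, but b still carries its 4 levels

# Dropping the unused levels first gives a well-defined empty result.
res <- tapply(z1$a, droplevels(z1$b), length)
length(res)   # 0
```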


Regards,
Hilmar

platform   i386-pc-mingw32  
arch   i386 
os mingw32  
system i386, mingw32
status alpha
major  2
minor  6.2  
year   2008 
month  01   
day26   
svn rev44181
language   R
version.string R version 2.6.2 alpha (2008-01-26 r44181)
