> concat() doesn't get a lot of use

How do you know? Maybe it's used a lot but the users had no need to tell you what they were using. The exact opposite might in fact be the case, i.e. because concat is so good in splus, you just never hear of problems with it from the users. That might be a very good sign.

> perhaps that model would work well for a concatenation function in R

I'd be happy to test it. I'm a bit concerned about performance though, given what you said about repeated recursive calls and dispatch. Could you run the following test in s-plus please and post back the timing? If this small 100MB example was fine, then we could proceed to a 64-bit 10GB test. This is quite nippy at the moment in R (1.1 sec). I'd be happy with a better way as long as speed wasn't compromised.

set.seed(1)
L = as.vector(outer(LETTERS,LETTERS,paste,sep=""))   # union set of 676 levels
F = lapply(1:100, function(i) {                      # create 100 factors
    f = sample(1:100, 1*1024^2 / 4, replace=TRUE)    # each factor 1MB large (262144 integers), plus small amount for the levels
    levels(f) = sample(L,100)                        # pick 100 levels from the union set
    class(f) = "factor"
    f
})

> head(F[[1]])
[1] RT DM CO JV BG KU
100 Levels: YC FO PN IL CB CY HQ ...
> head(F[[2]])
[1] RK PD FE SG SJ CQ
100 Levels: JV FV DX NL XB ND CY QQ ...
>

With c.factor from data.table, as posted, placed in .GlobalEnv:

> system.time(G <- do.call("c",F))
   user  system elapsed
   0.81    0.32    1.12
> head(G)
[1] RT DM CO JV BG KU    # looks right, comparing to F[[1]] above
676 Levels: AA AB AC AD AE AF AG AH AI AJ AK AL AM AN AO AP AQ AR AS AT AU AV AW AX AY AZ BA BB BC BD BE BF ... ZZ
> G[262145:262150]
[1] RK PD FE SG SJ CQ    # looks right, comparing to F[[2]] above
676 Levels: AA AB AC AD AE AF AG AH AI AJ AK AL AM AN AO AP AQ AR AS AT AU AV AW AX AY AZ BA BB BC BD BE BF ... ZZ
> identical(as.character(G),as.character(unlist(F)))
[1] TRUE

So I guess this would be compared to the following in splus?

system.time(G <- do.call("concat", F))

or maybe it's just the following:

system.time(G <- concat(F))

I don't have splus so I can't test that myself.

"William Dunlap" <wdun...@tibco.com> wrote in message news:77eb52c6dd32ba4d87471dcd70c8d7000275b...@na-pa-vbe03.na.tibco.com...
> -----Original Message-----
> From: r-devel-boun...@r-project.org
> [mailto:r-devel-boun...@r-project.org] On Behalf Of Peter Dalgaard
> Sent: Friday, February 05, 2010 7:41 AM
> To: Hadley Wickham
> Cc: John Fox; r-devel@r-project.org; Thomas Lumley
> Subject: Re: [Rd] Why is there no c.factor?
>
> Hadley Wickham wrote:
> > On Thu, Feb 4, 2010 at 12:03 PM, Hadley Wickham <had...@rice.edu> wrote:
> >>> I'd propose the following: If the sets of levels of all arguments are the
> >>> same, then c.factor() would return a factor with the common set of levels;
> >>> if the sets of levels differ, then, as Hadley suggests, the level-set of the
> >>> result would be the union of sets of levels of the arguments, but a warning
> >>> would be issued.
> >> I like this compromise (as long as there was an argument to suppress
> >> the warning)
> >
> > If I provided code to do this, along with the warnings for ordered
> > factors and using the optimisation suggested by Matthew, is there any
> > member of R core would be interested in sponsoring it?
> >
> > Hadley
> >
>
> Messing with c() is a bit unattractive (I'm not too happy with the other
> c methods either; normally c() strips attributes and reduces to the base
> class, and those obviously do not), but a more general concat() function
> has been suggested a number of times. With a suitable range of methods,
> this could also be used to reimplement rbind.data.frame (which,
> incidentally, already contains a method for concatenating factors, with
> several ugly warts!)

Yes, c() should have been put on the deprecated list a couple of decades ago, since people expect it to do too many incompatible things. And factor should have been a virtual class, with subclasses "FixedLevels" (e.g., Sex) or "AdHocLevels" (e.g., FamilyName), so c() and [()<- could do the appropriate thing in either case.

Back to reality, S+ has a concat(...)
function, whose comments say

   # This function works like c() except that names of arguments are
   # ignored.  That is, it concatenates its arguments into a single
   # S vector object, without considering the names of the arguments,
   # in the order that the arguments are given.
   #
   # To make this function work for new classes, it is only necessary
   # to make methods for the concat.two function, which concatenates
   # two vectors; recursion will take care of the rest.

concat() is not generic but it repeatedly calls concat.two(x,y), an SV4-generic that dispatches on the classes of x and y. Thus you can easily predict the class of concat(x,y,z), although it may not be the same as the class of concat(z,y,x), given suitably bizarre methods for concat.two().

concat() doesn't get a lot of use but I think the idea is sound. Perhaps that model would work well for a concatenation function in R.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

>
> --
>    O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
>   c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
>  (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
> ~~~~~~~~~~ - (p.dalga...@biostat.ku.dk)              FAX: (+45) 35327907
>
> ______________________________________________
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
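
For anyone wanting to try the concat()/concat.two() design described above in R, here is a minimal sketch. It is not the S+ source: only the names concat and concat.two and the overall shape (a non-generic driver that repeatedly calls a two-argument generic dispatching on both classes) come from the description in this thread. The method bodies, including the union-of-levels rule for factors, are assumptions made for illustration.

```r
library(methods)

# Two-argument generic that dispatches on the classes of both x and y.
# The name follows the S+ description; the definition is an assumption.
setGeneric("concat.two", function(x, y) standardGeneric("concat.two"))

# Assumed fallback: ordinary c() on the pair.
setMethod("concat.two", signature("ANY", "ANY"),
          function(x, y) c(x, y))

# Assumed factor method: the result's levels are the union of both level sets.
setMethod("concat.two", signature("factor", "factor"), function(x, y) {
  lev <- union(levels(x), levels(y))
  factor(c(as.character(x), as.character(y)), levels = lev)
})

# Non-generic driver: drops argument names, then folds concat.two()
# over the arguments from left to right.
concat <- function(...) {
  args <- unname(list(...))
  if (length(args) == 0L) return(NULL)
  Reduce(concat.two, args)
}

f1 <- factor(c("a", "b"))
f2 <- factor(c("c", "a"))
g  <- concat(f1, f2)
levels(g)
```

Note that folding pairwise like this re-copies the accumulated result at every step, so for many large arguments it is likely to be slower than a single-pass method; that is the performance concern raised earlier in this message, and it would need timing rather than guessing.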