Problem fixed by R-patched, thanks; see comments below.

>On Thu, 11 Oct 2007, [EMAIL PROTECTED] wrote:
>
>> I'm encountering excruciatingly slow load times for character vectors
>> in R 2.6.0: up to 30 sec for a 15K file that contains a no-attributes
>> character vector of length ~1e4 and object size ~0.5MB. In R 2.5.1,
>> repeated loads of the same set of files are near-instantaneous.
>>
>> The problem is proving tricky to reproduce consistently from scratch,
>> so I have attached the 3 files used in the examples below.
>
>There was no attachment: since these are (I presume) binary files, can you
>not put them on a website (as suggested by the posting guide)?

Sorry, I would have if I could, but I can't at present. The attachments got
through OK to me at least, though. If anyone is interested in the files, let
me know off-list and I'll re-send them as a zip or some such.

>
>> If I create a similar-looking object from scratch, then save it and
>> re-load it a few times, the problem doesn't always occur... at least not
>> in that session.
>>
>> FWIW I have noticed that the time taken to load seems to be roughly a
>> power of 2 of the "base slow load time", which could be a red herring.
>>
>> The problem seems specific to character vectors; I noticed it with
>> entire workspaces and have whittled it down to char vecs only.
>>
>> The example below is from a brand-new session with only the basic
>> packages loaded; delays in my real sessions are much longer.
>
>Can you please try R-patched or R-devel? We've found and solved a couple
>of performance issues with creating STRSXPs, but with character vectors of
>millions of elements.

Thanks; R-patched fixed it. I did look in the R-devel NEWS before posting,
but it doesn't mention the CHARSXP bug fix that is in the R-patched NEWS, so
I didn't pursue it further.

FWIW, in case work is still being done on the new CHARSXP code: my problems
were with much shorter vectors (~1e4 elements) than the millions mentioned in
the R-patched NEWS, and the strings were short too: 90% were '' and the other
10% were 'a'.
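For anyone who wants to poke at this without the attached files, here is a rough sketch that builds a vector matching the description above: ~1e4 elements, 90% '' and 10% 'a', no attributes. It is an assumption-laden stand-in, not the original data: putting 'a' at every 10th position is a guess at the pattern, and tempfile() replaces the original d:/r2.0/t*.rda paths. As noted earlier in the thread, a from-scratch object did not always reproduce the slowdown, so this may not trigger it either.

```r
## Rough sketch (assumption): a vector resembling the one described,
## i.e. ~1e4 short strings, 90% "" and 10% "a", with no attributes.
n <- 1e4
x <- rep("", n)
x[seq(1, n, by = 10)] <- "a"   # every 10th element is "a" (assumed pattern)

## tempfile() stands in for the original d:/r2.0/t*.rda files
f <- tempfile(fileext = ".rda")
save(x, file = f)

## Memory footprint; worth comparing between R versions
print(object.size(x))

## Time repeated loads of the same file; in the affected 2.6.0 build
## the elapsed time grew from one load to the next
for (i in 1:4) print(system.time(load(f)))

unlink(f)                      # clean up the temporary file
```

In an unaffected build, all four timings should be near zero; a steady increase across loads would match the symptom reported here.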
Also, when the previously offending objects are loaded into 2.6.0 patched,
they are 3-10X smaller (according to object.size) than in unpatched R; I was
also amazed by the compression! It looks like unpatched R was allocating at
least a 32-byte memory entry per individual zero-character string. It is down
to about 4 bytes per (zero-character) string in R-patched.

Mark Bravington

>
>I tried several examples of around 10000 elements and got times of at most
>0.05 secs in 2.6.0. These included parts of those examples on which we
>had seen performance issues.
>
>A few clues:
>
>- even your base time is much slower than I would expect.
>
>- you say 'a 15K file ... object size ~0.5MB'. That's pretty phenomenal
>  compression, and I am seeing file sizes more like 100Kb for objects that
>  size. Since object.size does take into account duplication, one way to
>  get that would be to have all unique elements. At ca 50 bytes per
>  element you would need an average string length of about 15 chars. Such
>  an object takes about 200Kb as a .rda file.
>
>>
>> Mark Bravington
>> CSIRO Mathematical & Information Sciences
>> Marine Laboratory
>> Castray Esplanade
>> Hobart 7001
>> TAS
>>
>> ph (+61) 3 6232 5118
>> fax (+61) 3 6232 5012
>> mob (+61) 438 315 623
>>
>> Type 'demo()' for some demos, 'help()' for on-line help, or
>> 'help.start()' for an HTML browser interface to help.
>> Type 'q()' to quit R.
>>
>>> system.time( load( 'd:/r2.0/t1.rda'))
>>    user  system elapsed
>>     0.5     0.0     0.5
>>> system.time( load( 'd:/r2.0/t1.rda')) # same file; slower
>>    user  system elapsed
>>     3.5     0.0     3.5
>>> system.time( load( 'd:/r2.0/t1.rda'))
>>    user  system elapsed
>>    4.13    0.00    4.13
>>> system.time( load( 'd:/r2.0/t1.rda'))
>>    user  system elapsed
>>    3.51    0.00    3.52
>>
>>> system.time( load( 'd:/r2.0/t2.rda')) # different bigger file
>>    user  system elapsed
>>    4.42    0.00    4.42
>>> system.time( load( 'd:/r2.0/t2.rda')) # same file; slower
>>    user  system elapsed
>>   10.44    0.00   10.44
>>> system.time( load( 'd:/r2.0/t2.rda'))
>>    user  system elapsed
>>   10.79    0.00   10.80
>>> system.time( load( 'd:/r2.0/t2.rda'))
>>    user  system elapsed
>>   10.39    0.00   10.41
>>> system.time( load( 'd:/r2.0/t1.rda')) # the smaller file again; slower
>>    user  system elapsed
>>   10.67    0.00   10.69
>>> system.time( load( 'd:/r2.0/t3.rda')) # different smaller file
>>    user  system elapsed
>>   10.51    0.00   10.52
>>> system.time( load( 'd:/r2.0/t2.rda')) # now bigger file again: slower
>>    user  system elapsed
>>   14.61    0.00   14.61
>>
>> --please do not edit the information below--
>>
>> Version:
>>  platform = i386-pc-mingw32
>>  arch = i386
>>  os = mingw32
>>  system = i386, mingw32
>>  status =
>>  major = 2
>>  minor = 6.0
>>  year = 2007
>>  month = 10
>>  day = 03
>>  svn rev = 43063
>>  language = R
>>  version.string = R version 2.6.0 (2007-10-03)
>>
>> Windows XP (build 2600) Service Pack 2.0
>>
>> Locale:
>> LC_COLLATE=English_Australia.1252;LC_CTYPE=English_Australia.1252;LC_MONETARY=English_Australia.1252;LC_NUMERIC=C;LC_TIME=English_Australia.1252
>>
>> Search Path:
>>  .GlobalEnv, package:stats, package:graphics, package:grDevices,
>>  package:utils, package:datasets, package:methods, Autoloads,
>>  package:base
>>
>
>--
>Brian D. Ripley,                  [EMAIL PROTECTED]
>Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
>University of Oxford,             Tel:  +44 1865 272861 (self)
>1 South Parks Road,                     +44 1865 272866 (PA)
>Oxford OX1 3TG, UK                Fax:  +44 1865 272595
>

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel