Re: [R] how to change the ff properties of a ff-related R object after the original ff output folder has been moved
Tao,

I do assume that the ff files are still at some location and were not deleted by a finalizer. The following explains how to manipulate file locations with ff and ffdf objects.

Kind regards
Jens

library(ff)
path1 <- "c:/tmp"
path2 <- "c:/tmp2"

# create an ffdf; using a non-standard path sets the finalizer to 'close' instead of 'delete'
fdf1 <- as.ffdf(iris, col_args=list(pattern=file.path(path1, "iris")))

# let's copy the old metadata (but not the files, use clone for that),
# using ff's hybrid copying semantics
fdf2 <- fdf1

# note both are open
is.open(fdf1)
is.open(fdf2)

# close the files
close(fdf1)
# and note that
is.open(fdf1)
is.open(fdf2)
# the magic has kept the physical metadata in sync even in the copy
# (virtual metadata is not kept in sync, which allows different virtual
# views into the same files, not unlike SQL VIEWs virtualizing database TABLEs)

# filename on an ffdf
filename(fdf2)
# is a shortcut for
lapply(physical(fdf2), filename)
# so filename is a physical attribute

# actually moving the files can be done with the filename<- method
lapply(physical(fdf2), function(x) filename(x) <- sub(path1, path2, filename(x)))
# check this
filename(fdf1)
filename(fdf2)

# filename on an ff
filename(fdf1$Species)
# is a shortcut for
attr(attr(fdf1$Species, "physical"), "filename")
# and if you directly manipulate this attribute you circumvent the
# filename<- method and the file itself will not be moved
attr(attr(fdf1$Species, "physical"), "filename") <- sub(path2, path1, filename(fdf1$Species))
# now the metadata points to a different location
filename(fdf1$Species)
# note that this physical attribute was also changed for the copy
filename(fdf2$Species)

# of course you can fix the erroneous metadata by
attr(attr(fdf1$Species, "physical"), "filename") <- sub(path1, path2, filename(fdf1$Species))
# or for all columns in an ffdf by
lapply(physical(fdf2), function(x) attr(attr(x, "physical"), "filename") <- sub(path2, path1, filename(x)))

# now we have your situation with broken metadata
open(fdf2)
# and we can fix that by
lapply(physical(fdf2), function(x) attr(attr(x, "physical"), "filename") <- sub(path1, path2, filename(x)))
# check
open(fdf2)

On 26.06.2015 at 01:04, Shi, Tao wrote: Hi all, I'm new to the ff package through using the Bioconductor package crlmm. Here is my problem: I've created a few R objects (e.g. a CNSet) using crlmm based on my data and saved them in a .RData file. crlmm heavily uses the ff package to store results in a local folder. For certain reasons, I have moved the ff output folder somewhere else. Now when I go back to R, I can't open those CNSet objects anymore, as each file has a property still storing the old ff output folder path. My question is: is there a quick way to change these paths to the new ones, so I don't have to re-run the whole analysis? Many thanks! Tao __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
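For the crlmm case in the original question, the essence of the above boils down to a few lines. A condensed sketch (the paths and the object name 'fdf' are placeholders; for a CNSet you would apply the same attribute surgery to the ff objects it contains):

```r
library(ff)
load("myresults.RData")         # restores only the R-side metadata

oldpath <- "c:/old_ff_output"   # placeholder: where the metadata still points
newpath <- "c:/new_ff_output"   # placeholder: where the files were moved to

# rewrite the stored filenames directly, without trying to move the
# (already moved) files as the filename<- method would
lapply(physical(fdf), function(x)
  attr(attr(x, "physical"), "filename") <- sub(oldpath, newpath, filename(x), fixed=TRUE))

open(fdf)   # should now find the files at the new location
```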
[R] [R-pkgs] package bit64 with new functionality
Dear R community, The new version of package 'bit64' - which extends R with fast 64-bit integers - now has fast (single-threaded) implementations of the most important univariate algorithmic operations (those based on hashing and sorting). Package 'bit64' now has methods for 'match', '%in%', 'duplicated', 'unique', 'table', 'sort', 'order', 'rank', 'quantile', 'median' and 'summary'. Regarding data management it has the novel generics 'unipos' (positions of the unique values), 'tiepos' (positions of ties), 'keypos' (positions of values in a sorted unique table) and derived methods 'as.factor' and 'as.ordered'. This 64-bit functionality is implemented carefully so as not to be slower than the respective 32-bit operations in base R, and it also avoids the excessive execution times observed with 'order', 'rank' and 'table' (speedup factors of about 20/16/200 respectively). This increases the dataset size with which we can work truly interactively. The speed is achieved by simple heuristic optimizers: the mentioned high-level functions choose the best from multiple low-level algorithms and further take advantage of a novel optional caching mechanism. In an example R session using a couple of these operations, the 64-bit integers performed 22x faster than base 32-bit integers; hash-caching improved this to 24x amortized, and sortorder-caching was most efficient with 38x (caching both hashing and sorting is not worth it: 32x at doubled RAM consumption). Since the package covers the most important functions for (univariate) data exploration and data management, I think it is now appropriate to claim that R has sound 64-bit integer support, for example for working with keys or counts imported from large databases. For details concerning approach, implementation and roadmap please check the ANNOUNCEMENT-0.9-Details.txt file and the package help files.
Kind regards Jens Oehlschlägel Munich, 8.11.2012 ___ R-packages mailing list r-packa...@r-project.org https://stat.ethz.ch/mailman/listinfo/r-packages
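As a small illustration of the operations listed in the announcement (a minimal sketch; the function names are from the package, the data is made up):

```r
library(bit64)

x <- as.integer64(c(3, 1, 2, 3, NA, 1))

unique(x)                     # 64-bit unique values
table(x)                      # 64-bit contingency table
match(x, as.integer64(1:3))   # 64-bit matching
sort(x, na.last=TRUE)

# optional caching: precompute the hash and/or sort order once,
# so that subsequent operations on y can reuse the cache
y <- as.integer64(sample(1e6))
hashcache(y)
sortordercache(y)
```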
Re: [R] [R-sig-hpc] Quickest way to make a large empty file on disk?
Jonathan,

On some filesystems (e.g. NTFS, see below) it is possible to create 'sparse' memory-mapped files, i.e. reserving the space without the cost of actually writing initial values. Package 'ff' does this automatically and also allows accessing the file in parallel. Check the example below and see how creation of the big file is immediate.

Jens Oehlschlägel

library(ff)
library(snowfall)
ncpus <- 2
n <- 1e8
system.time(
  x <- ff(vmode="double", length=n, filename="c:/Temp/x.ff")
)
##  user  system elapsed
##  0.01    0.00    0.02

# check the finalizer; with an explicit filename we should have a 'close' finalizer
finalizer(x)
## [1] "close"
# if not, set it to 'close' in order not to let slaves delete x on slave shutdown
finalizer(x) <- "close"

sfInit(parallel=TRUE, cpus=ncpus, type="SOCK")
## R Version: R version 2.15.0 (2012-03-30)
## snowfall 1.84 initialized (using snow 0.3-9): parallel execution on 2 CPUs.
sfLibrary(ff)
## Library ff loaded.
## Library ff loaded in cluster.
## Warning message:
## In library(package = ff, character.only = TRUE, pos = 2, warn.conflicts = TRUE, :
##   'keep.source' is deprecated and will be ignored

sfExport("x")  # note: do not export the same ff multiple times
# explicitly opening avoids a gc problem; opening with 'mmeachflush' instead of
# 'mmnoflush' is a bit slower but prevents OS write storms when the file is larger than RAM
sfClusterEval(open(x, caching="mmeachflush"))
## [[1]]
## [1] TRUE
## [[2]]
## [1] TRUE

system.time(
  sfLapply(chunk(x, length=ncpus), function(i){
    x[i] <- runif(sum(i))
    invisible()
  })
)
##  user  system elapsed
##  0.00    0.00   30.78

system.time(
  s <- sfLapply(chunk(x, length=ncpus), function(i) quantile(x[i], c(0.05, 0.95)))
)
##  user  system elapsed
##  0.00    0.00    4.38

# for completeness
sfClusterEval(close(x))
## [[1]]
## [1] TRUE
## [[2]]
## [1] TRUE

csummary(s)
##              5%  95%
## Min.    0.04998 0.95
## 1st Qu. 0.04999 0.95
## Median  0.05001 0.95
## Mean    0.05001 0.95
## 3rd Qu. 0.05002 0.95
## Max.    0.05003 0.95

# stop slaves
sfStop()
## Stopping cluster

# with the close finalizer we are responsible for deleting the file
# explicitly (unless we want to keep it)
delete(x)
## [1] TRUE
# remove R-side metadata
rm(x)
# truly free memory
gc()

Sent: Thursday, 03 May 2012, 00:23
From: Jonathan Greenberg j...@illinois.edu
To: r-help r-help@r-project.org, r-sig-...@r-project.org
Subject: [R-sig-hpc] Quickest way to make a large empty file on disk?

R-helpers: What would be the absolute fastest way to make a large empty file (e.g. filled with all zeroes) on disk, given a byte size and a given number of empty values? I know I can use writeBin, but the object in this case may be far too large to store in main memory. I'm asking because I'm going to use this file in conjunction with mmap to do parallel writes to this file. Say, I want to create a blank file of 10,000 floating point numbers. Thanks! --j

-- Jonathan A. Greenberg, PhD Assistant Professor Department of Geography and Geographic Information Science University of Illinois at Urbana-Champaign 607 South Mathews Avenue, MC 150 Urbana, IL 61801 Phone: 415-763-5476 AIM: jgrn307, MSN: jgrn...@hotmail.com, Gchat: jgrn307, Skype: jgrn3007 [1]http://www.geog.illinois.edu/people/JonathanGreenberg.html [[alternative HTML version deleted]] ___ R-sig-hpc mailing list r-sig-...@r-project.org [2]https://stat.ethz.ch/mailman/listinfo/r-sig-hpc References 1. http://www.geog.illinois.edu/people/JonathanGreenberg.html 2. https://stat.ethz.ch/mailman/listinfo/r-sig-hpc
Re: [R] ff objects saving problem
Xiaobo, You indeed need external 'zip' and 'unzip' utilities in the path; citing from ffsave's help: "using an external zip utility, e.g. for windows in Rtools on http://www.murdoch-sutherland.com/Rtools/". Please note that the mentioned utilities have a 4 GB limit for the zip file, AFAIK. For the next release I will check for a way to get rid of this limit and also to get rid of inconsistencies in upper/lower-case spelling of drive letters which can cause ffsave to fail. Note that - even without ffsave - ff objects can be made permanent simply by creating them with 'filename' resp. 'pattern' outside of fftempdir and saving the R-side ff object with the usual 'save' or 'save.image' function. In a new R session, after 'library(ff)' and 'load' you again have access, assuming your ff files are still in the same location. And yes, each column of an ffdf dataframe is stored as a separate ff file. Jens Oehlschlägel
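The "permanent without ffsave" recipe from the paragraph above, spelled out as a sketch (paths are placeholders):

```r
library(ff)

# create ff files outside fftempdir so they are permanent
x  <- ff(vmode="double", length=1e6, filename="d:/mydata/x.ff")
df <- as.ffdf(data.frame(a=1:10, b=runif(10)),
              col_args=list(pattern="d:/mydata/df"))

# save only the small R-side metadata
save(x, df, file="d:/mydata/meta.RData")

## --- in a new R session ---
library(ff)
load("d:/mydata/meta.RData")   # works as long as the ff files stayed in place
open(x)
x[1:5]
```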
[R] [R-pkgs] ff version 2.2.0
Dear R community, The next release of package ff is available on CRAN. With the kind help of Brian Ripley it now supports the Win64 and Sun versions of R. It has three major functional enhancements: a) new fast in-memory sorting and ordering functions (single-threaded) b) ff now supports on-disk sorting and ordering of ff vectors and ffdf dataframes c) ff integer vectors can now be used as subscripts of ff vectors and ffdf dataframes. a) is achieved by careful implementation of NA-handling and exploiting context information. b) although permanently stored, sorting and ordering of ff objects can be faster than the standard routines in R. c) applying an order to ff vectors and ffdf dataframes is substantially slower than in pure R because it involves disk access AND sorting index positions (to avoid random access). There is still room for improvement; however, the current status should already be useful. I ran some comparisons with SAS (see end of mail): - both could sort German census size (81e6 rows) on a 3GB notebook - ff sorts and orders faster on single columns - sorting big multicolumn tables is faster in SAS. Win64 binaries and version 2.2.1 supporting Sun should appear on CRAN during the next days. For the impatient: check out from r-forge with revision 67 or higher. Non-Windows users: please note that you need to set appropriate values for options 'ffbatchbytes' and 'ffmaxbytes' yourself. Note that virtual window support is deprecated now because it leads to too complex code. Let us know if you urgently need this and why. Feedback, ideas and contributions appreciated. To those who offered code during the last months: please forgive us that integrating and documenting was not possible with this release. Jens Daniel P.S.
NEWS

CHANGES IN ff VERSION 2.2.0

NEW FEATURES

o ff now supports the 64 bit Windows and Sun versions of R (thanks to Brian Ripley)
o ff now supports sorting and ordering of ff vectors and dataframes (see ramsort, ffsort, ffdfsort, ramorder, fforder, ffdforder)
o ff now supports ff vectors as subscripts of ff objects (currently positive integers only, booleans are planned)
o New option 'ffmaxbytes' which allows certain ff procedures like sorting to use a larger RAM limit than 'ffbatchbytes' in chunked processing. Such a higher limit is useful for (single-R-process) sorting compared to some multi-R-process chunked processing. It is a good idea to reduce 'ffmaxbytes' on slaves or to avoid ff sorting there completely.
o New generic 'pagesize' with method 'pagesize.ff' which returns the current pagesize as defined on opening the ff object.

USER VISIBLE CHANGES

o [.ff now returns with the same vmode as the ff object
o Certain operations are faster now because we worked around unnecessary copying triggered by many of R's assignment functions. For example, reading a factor from a (well-cached) file is now 20% faster and thus as fast as just creating this factor in RAM using levels<- and class<- assignments. (consider this tuning temporary, hoping for a generic fix in base R)
o ff() can now open files larger than .Machine$integer.max elements (but gives access only to the first .Machine$integer.max elements)
o ff now has default pattern NULL translating to the pattern in 'filename' (and only to the previous default 'ff' if no filename is given)
o ff now sets the pattern in synch with a requested 'filename'
o clone.ff now always creates a file consistent with the previous pattern
o clone.ff now always creates a finalizer consistent with the file location
o clone.ffdf has a new argument 'nrow' which allows creating an empty copy with a different number of rows (currently requires 'initdata=NULL')
o clone.default now deep-copies lists and atomic vectors

DEPRECATED

o virtual window support is deprecated. Let us know if you urgently need this and why.

BUG FIXES

o read.table.ffdf now also works if transFUN filters and returns fewer rows

BUG FIXES at 2.1.4

o [<-.ffdf no longer calculates the number of elements in an ffdf, which could lead to an integer overflow

BUG FIXES at 2.1.3

o ffsave now always closes ffdf objects - also partially closed ones
o ffsave no longer passes arguments 'add' and 'move' to 'save'
o ffsave and friends now work around the fact that under windows getwd() can report the same path in upper and lower case versions.

CHANGES IN bit VERSION 1.1.5

NEW FEATURES

o new utility functions setattr() and setattributes() allow setting attributes by reference (unlike attr<- and attributes<-, without copying the object)
o new utility unattr() returns a copy of the input with attributes removed

USER VISIBLE CHANGES

o certain
Re: [R] Pass By Value Questions
Jeff, R has 'environments' as a general mechanism to pass around objects by reference. However, that does not help with most functions like 'apply', which take arguments other than environments. "I'm familiar with FF and BigMemory, but are there any packages/tricks which allow for passing such objects by reference without having to code in C?" With ff (and I assume with bigmemory as well) you can pass around objects by reference without C coding. To be more precise with regard to ff: atomic ff objects have 'hybrid copying semantics', which means that two references to an ff object will share the data and SOME features (like the 'length'), while OTHER features (like 'dim') are copied on modify (see 'vt' for a powerful application of this concept). You might want to have a look at 'ffapply' and friends and at 'chunk'. HTH Jens Oehlschlägel
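A minimal sketch of these hybrid copying semantics (illustrative, not from the original post):

```r
library(ff)

a <- ff(1:6, dim=c(2, 3))   # a small ff matrix
b <- a                      # a second R-side reference to the same file

b[1, 1] <- 99L              # the data is shared by reference:
a[1, 1]                     # the change is visible through 'a' as well

dim(b) <- c(3, 2)           # 'dim' is virtual metadata, copied on modify:
dim(a)                      # 'a' keeps its original 2 x 3 shape
```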
[R] ff objects and ordinary analytical functions.
Xiaobo.Gu, "Can the plenty of analytical functions provided by base R and contributed packages be called with ff objects as parameters directly, or do we have to write special versions of the functions for ff objects? If it is the latter case, is there a list of functions which support ff objects already? Xiaobo.Gu" ff is an add-on package that allows you to store and access larger datasets - it is not part of the language. ff objects have different copy semantics than standard R objects (partially by reference), so it is unlikely that you can write R code that uses ff objects exactly the same way as standard R objects. There is no comprehensive list, but some functions allow ff objects, e.g. 'biglars', which you find if you look at the reverse dependencies of ff on CRAN. Other functions are prepared to handle large datasets in chunks - like 'biglm' - and it is your responsibility to extract those chunks from ff, a database or whatever other source. HTH Jens Oehlschlägel
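A rough sketch of the chunked hand-off to 'biglm' mentioned above (assumes an existing ffdf 'bigdf' with columns y and x; untested against current package versions):

```r
library(ff)
library(biglm)

ch  <- chunk(bigdf)                           # row ranges sized by ffbatchbytes
fit <- biglm(y ~ x, data = bigdf[ch[[1]], ])  # fit on the first chunk
for (i in ch[-1])
  fit <- update(fit, bigdf[i, ])              # feed the remaining chunks
summary(fit)
```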
Re: [R] Can saved R object .RData files be loaded by more than one R sessions for read only purpose?
Xiaobo.Gu, Shared reading should be fine. Shared writing is also possible, but it is important to understand that .RData files only contain the metadata of ff objects, not the ff data itself. This means you cannot have multiple processes updating the same .RData metadata, but you can have multiple processes writing simultaneously to the same ff datafile. (It is your responsibility to avoid conflicts and to make sure you do not suffer problems with delayed cache refreshes, as can happen on network drives.) HTH Jens Oehlschlägel
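The shared-reading setup can be sketched as follows (run identically in each reading session; the path is a placeholder):

```r
## in each of several independent R sessions
library(ff)
load("/share/meta.RData")   # the small .RData holds only the ff metadata
open(x, readonly=TRUE)      # all sessions may read the same ff datafile
x[1:10]
close(x)
```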
Re: [R] How to deal with more than 6GB dataset using R?
Matthew, You might want to look at function read.table.ffdf in the ff package, which can read large csv files in chunks and store the result in a binary format on disk that can be quickly accessed from R. ff allows you to access complete columns (returned as a vector or array) or subsets of the data identified by row positions (and column selection, returned as a data.frame). As Jim pointed out: it all depends on what you are doing with the data. If you want to access subsets not by row position but rather by search conditions, you are better off with an indexed database. Please let me know if you write a fast read.fwf.ffdf - we would be happy to include it in the ff package. Jens
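A minimal sketch of the chunked import described above (file name, column name and chunk sizes are placeholders):

```r
library(ff)

# read a large csv in chunks; only one chunk is in RAM at a time,
# the result is stored in ff's binary format on disk
bigdf <- read.table.ffdf(file="big.csv", header=TRUE, sep=",",
                         first.rows=10000, next.rows=500000)

bigdf$price[]            # a complete column, materialized as an R vector
bigdf[1:100, c(1, 3)]    # row positions + column selection -> data.frame
```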
Re: [R] Assign Formulas to Arrays or Matrices?
For assigning formulas to arrays, use an array of list:

form.arr <- vector("list", 31*5)
dim(form.arr) <- c(31, 5)
form.arr[[31, 5]] <- y ~ 1 + 2

Jens Oehlschlägel

-----Original Message-----
From: McLovin
Sent: Jul 6, 2010 9:13:49 AM
To: r-help@r-project.org
Subject: [R] Assign Formulas to Arrays or Matrices?

Hi, I am very new to R. I am hoping to create formulas and assign them to locations within an array (or matrix, if it will work). Here's a simplified example of what I'm trying to do:

form.arr
for (i in seq(from=1, to=31, by=1)) {
  for (j in seq(from=1, to=5, by=1)) {
    form.arr[i,j,]
  }
}

which results in this error: Error in form.arr[i, j, ] : incorrect number of subscripts. The reason I had made the 3rd dimension of the array size 3 is because that's the length R tells me that formula is. When I had tried to do this using a matrix, using this code:

form.mat
for (i in seq(from=1, to=31, by=1)) {
  for (j in seq(from=1, to=5, by=1)) {
    form.mat[i,j] = as.formula(y~1+2)
  }
}

I was told: Error in form.mat[i, j] = as.formula(y ~ 1 + 2) : number of items to replace is not a multiple of replacement length. My question is: is it possible to assign formulas within a matrix or array? If so, how? thanks@real.com -- View this message in context: http://r.789695.n4.nabble.com/Assign-Formulas-to-Arrays-or-Matrices-tp2279136p2279136.html Sent from the R help mailing list archive at Nabble.com.
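Spelled out a bit more (base R only; a minimal sketch of the list-array idiom):

```r
# a 31 x 5 array whose cells can each hold an arbitrary R object
form.arr <- vector("list", 31 * 5)
dim(form.arr) <- c(31, 5)

for (i in 1:31)
  for (j in 1:5)
    form.arr[[i, j]] <- y ~ 1 + 2   # [[ ]] stores into a single cell

class(form.arr[[31, 5]])  # "formula"
```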
Re: [R] memory management in R
You might want to mention/talk about packages that enhance R's ability to work with less RAM / more data, such as package SOAR (transparently moving objects between RAM and disk) and ff (which allows vectors and dataframes larger than RAM and which supports dense datatypes like true boolean, short integers etc.). Jens Oehlschlägel

-----Original Message-----
From: john mull...@fastmail.fm
Sent: Jun 16, 2010 12:20:17 PM
To: r-help@r-project.org
Subject: [R] memory management in R

I have volunteered to give a short talk on memory management in R to my local R user group, mainly to motivate myself to learn about it. The focus will be on what a typical R coder might want to know (e.g. how objects are created, call by value, basics of garbage collection), but I want to go a little deeper just in case there are some advanced users in the crowd. Here are the resources I am using right now: Chambers' book "Software for Data Analysis"; manuals such as "R Internals" and "Writing R Extensions". Any suggestions on other sources of information? There are still some things that are not clear to me, such as - how to make sense of the output from various memory diagnostics such as memory.profile ... are these counts? How to get the amount of memory used: gc() and memory.size() seem to differ - what gets allocated on the heap versus the stack - why the name "cons cells" for the stack allocation. Any help with these would be greatly appreciated. Thanks greatly, John Muller
Re: [R] how to read CSV file in R?
If you have memory problems reading csv you can use read.csv.ffdf from package ff, which reads in chunks. The result is an ffdf object, say myffdf, from which binary subscripting [,] returns standard data.frames, such as

myffdf[,]          # returns all data (if it fits into memory)
myffdf[somerows,]  # returns a subset of the data

Do read and understand the help concerning filename location and the implications for finalizers and permanency.

Cheers
Jens Oehlschlägel

-----Original Message-----
From: Joris Meys jorism...@gmail.com
Sent: Jun 8, 2010 1:11:20 PM
To: dhanush dhana...@gmail.com
Subject: Re: [R] how to read CSV file in R?

That will be R 2.10.1 if I'm correct. For reading in csv files, there's a function read.csv that does just that:

los <- read.csv("file.csv", header=TRUE)

But that is just a detail. You have problems with your memory, but that's not caused by the size of your dataframe. On my system, a matrix with 100,000 rows and 75 columns takes only 28 Mb. So I guess your workspace is cluttered with other stuff. Check the following help pages:

?Memory
?memory.size
?Memory.limits

It generally doesn't make a difference, but sometimes using gc() can set some memory free again. If none of this information helps, please provide us with a bit more info regarding your system and the content of your current workspace. Cheers Joris

On Tue, Jun 8, 2010 at 8:46 AM, dhanush dhana...@gmail.com wrote: I tried to read a CSV file in R. The file has about 100,000 records and 75 columns. When I used read.delim, I got this error. I am using R ver 10.1.

los <- read.delim("file.csv", header=TRUE, sep=",")
Warning message:
In scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
  Reached total allocation of 1535Mb: see help(memory.size)

Thanks -- View this message in context: http://r.789695.n4.nabble.com/how-to-read-CSV-file-in-R-tp2246930p2246930.html Sent from the R help mailing list archive at Nabble.com.

-- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
Re: [R] appending objects to file created with save()
If you work with large data you might want to look at the ff package - useful if your data is close to or above your RAM. The package has ffsave, where with option add=TRUE you can add data to an existing ff archive. With ff, data is stored outside of R in files; only metadata is stored within R. An ff archive consists of two files: a zip file which stores the data (and to which you can add) and a standard .RData file which stores the metadata using standard save(). HTH Jens

-----Original Message-----
From: Jannis bt_jan...@yahoo.de
Sent: May 25, 2010 8:22:26 PM
To: r-help@r-project.org
Subject: [R] appending objects to file created with save()

Dears, is there a way to append R objects, similar to the function save(), to a binary file that already contains some previously saved R objects? I browsed the mailing list archive and only found some suggestions that include reading in the old file first and then saving the new objects together with the old ones. This would not be handy for me as my data is rather large. I have tried dump() but this does not seem to compress my data. Cheers Jannis
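A rough sketch of the ffsave/add workflow (names and paths are placeholders; remember that external zip/unzip utilities must be on the path):

```r
library(ff)

x <- ff(rnorm(1e6))
ffsave(x, file="d:/archive/mydata")   # creates mydata.ffData (zip) + mydata.RData

y <- ff(runif(1e6))
ffsave(y, file="d:/archive/mydata", add=TRUE)   # append to the existing archive

## later, possibly in a new session
ffload(file="d:/archive/mydata")      # restores both x and y
```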
Re: [R] ff for 64-bit windows and 64-bit R
Lawrence, My understanding is that only a minor change is needed in ff's C++ layer in order to remove the 64bit compiler warnings/errors. The C++ layer is maintained by Daniel Adler, who can give you an outlook on if/when he plans to attack this. Until a 64bit version of ff is available, you might consider using the 32bit win version of R and ff on a 64bit win machine: while 32bit R itself has limited memory access, ff can handle larger objects faster because it benefits from *all* RAM via filesystem caching. Jens

-----Original Message-----
From: Hunsicker, Lawrence lawrence-hunsic...@uiowa.edu
Sent: May 13, 2010 3:32:25 PM
To: jens.oehlschlae...@truecluster.com
Subject: ff for 64-bit windows and 64-bit R

Jens: I am running R on a 64 bit PC, 64 bit Windows 7, and 64 bit R. I have to handle rather large data sets, and I need the 64 bit environment to run some of my analyses. Use of ff has been recommended to me to help with some of the memory problems, but I am told that ff has not yet been ported to a 64-bit Windows environment. I have access, of course, to the native code, but I am not the world's best compiler operator. Do you have any plans to port ff to a 64-bit Windows and R environment? Is there anything that I can do to encourage this? I would be happy to make a contribution to the "ff project" if such a thing exists. Let me know your plans. L. G. Hunsicker, M.D. Professor, Internal Medicine U. Iowa College of Medicine Phone: (319) 356-4763 Fax: (319) 356-7488 lawrence-hunsic...@uiowa.edu
Re: [R] how to work with big matrices and the ff-package?
Anne, "After the above step I need to convert my ff_matrix to a data.frame to discretize the whole matrix and calculate the mutual information. The calculated result should be saved as an ffdf object or something similar. disc <- as.ffdf(discretize(as.data.frame(as.ffdf(ffmat)), disc="equalwidth", nbins=5))" ffdf is ff's equivalent of data.frames: they handle many rows (2^31-1) and a limited number of columns (with potentially different column types). Like data.frames, they are not suitable for millions of columns. You probably want to store your data in one big ff matrix. If you use ff objects because you don't have the RAM for standard R objects, converting ff to a data.frame is not an option because it will require too much RAM. If 'discretize' expects a data.frame, you cannot call it on an ff matrix either. But if 'discretize' works on single columns, you can call discretize on chunks of columns that you coerce to data.frames, something like

for (i in chunk(from=1, to=ncol(ffmat), by=10))
  ffmat[,i] <- as.matrix(discretize(as.data.frame(ffmat[,i])))

If discretize returns integers, you might want to write the results to an integer ff matrix instead, because this saves disk space and improves caching. HTH Jens Oehlschlägel
Re: [R] as.ffdf.data.frame now breaks if using pattern
Ramon, for me this works:

setwd("d:/tmp")
ffd <- as.ffdf(d, col_args=list(pattern = paste(getwd(), "/fftmp", sep = "")))
filename(ffd)
## $x
## [1] "d:/tmp/fftmp35c34861.ff"
## $y
## [1] "d:/tmp/fftmp5be946bb.ff"
## $z
## [1] "d:/tmp/fftmp26c49ce.ff"

Jens

-----Original Message-----
From: Ramon Diaz-Uriarte rdia...@gmail.com
Sent: Apr 7, 2010 7:01:23 PM
To: r-help@r-project.org
Subject: as.ffdf.data.frame now breaks if using pattern

Dear All, I am using package ff. In version 2.1-1 it was possible to use pattern with as.ffdf.data.frame:

d <- data.frame(x=1:26, y=letters, z=Sys.time()+1:26)
as.ffdf(d, pattern = paste(getwd(), "/fftmp", sep = ""))

With the latest version, the last command crashes. I wonder if the new behavior is intentional or a bug. If intentional, what is the recommended way of using pattern now? Thanks, R. -- Ramon Diaz-Uriarte Structural Biology and Biocomputing Programme Spanish National Cancer Centre (CNIO) http://ligarto.org/rdiaz Phone: +34-91-732-8000 ext. 3019 Fax: +34-91-224-6972
Re: [R] ff package: ff objects don't reload completely on NFS drives from a different machine
Try to close the file on the first NFS client before reopening it on the second NFS client. NFS has something called close-to-open cache consistency. This means that two clients which have the same NFS file open cannot rely on seeing the updates of the respective other client. If one client closes and the other client opens thereafter, it should see the changes. If you want multiple clients to write at the same time, you should make sure they only write non-overlapping sections (and then all need to close for syncing). Let me know if this worked for you.

J.
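A minimal sketch of the close-before-open protocol described above. The shared path /nfs/share/x.ff and the split into a "writing" and a "reading" session are hypothetical illustrations, not from the original post:

```r
library(ff)

## -- in the R session on the writing NFS client --
x <- ff(vmode = "double", length = 1e6, filename = "/nfs/share/x.ff")
x[1:10] <- runif(10)
close(x)   # flush and close so NFS propagates the changes to other clients

## -- in the R session on the reading NFS client,
## -- only after the writer has closed --
y <- ff(vmode = "double", length = 1e6, filename = "/nfs/share/x.ff")
open(y)
y[1:10]    # should now reflect the writer's updates
close(y)
```

The key point is the ordering: any client that opens the file after another client has closed it should see a consistent view; two clients holding the file open simultaneously have no such guarantee.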
[R] [R-pkgs] New version of package ff
Dear R community,

Package bit version 1.1-3 and ff version 2.1.2 are available on CRAN and should be useful for handling large datasets. This release adds convenient utilities for managing ff objects and files (see ?ffsave) and removes some performance bottlenecks. In case you experience unexpected performance problems with ff, here are a couple of recommendations based on FAQs:

1) Compare the size of the data to be written at the same time to the RAM available for your file-system cache. If the data exceeds available RAM, then consider using caching="mmeachflush" instead of caching="mmnoflush"; this will make write operations predictably slower but prevent write storms stalling some systems (observed under NTFS win32+64). You can set ff's caching option either with options(ffcaching="mmeachflush") before creating ff objects, or create ff objects with ffobj <- ff(..., caching="mmeachflush"), or open your existing ff object with open(ffobj, caching="mmeachflush") (while it is closed); ff objects will remember this setting.

2) If you use caching="mmnoflush": check the writeback cache configuration of your filesystem (e.g. set data=writeback for ext3, tune the limits for dirty pages, consider a different filesystem, consider a different OS).

3) Choose a reasonable size for options(ffbatchbytes), which limits the amount of RAM used for one chunk. With too small chunks you pay more performance overhead. Note that bigger chunks are not always better, for example if you distribute chunked processing over many cores or if some operation involved does not scale well with chunk size.

Final remark: testing ff access functionality on a Core i7 920 (4 cores, 8 threads with HT) shows that hyperthreading with 8 parallel processes (snowfall, sockets) gives about 5x the performance of a single process, but already 7 processes with HT perform worse than 4 processes without HT. Conclusion: if a machine is dedicated to R for RAM-critical applications, try switching hyperthreading off.

Hope you find this useful. We appreciate any feedback.

Jens
Daniel

___
R-packages mailing list
r-packa...@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-packages
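The three ways of setting the caching option mentioned in point 1) can be sketched as follows (a minimal sketch; the object name x is just a placeholder):

```r
library(ff)

# (a) set the default before creating any ff objects
options(ffcaching = "mmeachflush")

# (b) or set it per object at creation time
x <- ff(vmode = "double", length = 1e6, caching = "mmeachflush")

# (c) or set it when re-opening an existing ff object (while it is closed)
close(x)
open(x, caching = "mmeachflush")

# ff objects remember the setting afterwards
close(x)
```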
Re: [R] A question about the ff package
Peter,

ff objects are not allowed as subscripts to ff objects. You can take several routes:

1) Use bit objects instead of logical or ff logical vectors. This is fast and takes a factor of 32 less RAM than logicals. (BTW, bit objects can be coerced from/to ff via as.ff() and as.bit(), but they convert to vmode "boolean" (1 bit), not "logical" (2 bits).) Examples for working with bit are in http://ff.r-forge.r-project.org/ffbit_UseR!2009.pdf

2) Convert your logicals into positive integer subscripts (assuming that there are not too many elements selected, as you assume when writing bigData[select,]).

3) Keep your logicals in an ff logical or ff boolean object and then do chunked looping over both - the ff with the subscripts and the ffdf - and in each chunk convert the logical selection to integers, see 2).

HTH
Jens Oehlschlägel

P.S. You might want to try the newer version on r-forge. It has several improvements but is not yet on CRAN because there is currently some issue with Snow Leopard.
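Routes 1) and 2) can be combined in a few lines. This is a sketch, not from the original post; bigData stands for the hypothetical large ffdf of the question:

```r
library(bit)

n <- 1e6
# an in-RAM selection; a bit vector needs ~32x less RAM than a logical
sel <- as.bit(runif(n) > 0.99)

# convert the boolean selection to positive integer subscripts,
# reasonable here because the selection is sparse
idx <- as.which(sel)

# idx is an ordinary integer vector, so it is a valid subscript
# for an ffdf: rows <- bigData[idx, ]
head(idx)
```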
Re: [R] Ops method does not dispatch on either of two classes
Thanks Brian,

> This is as documented on the help page for Ops (the page in base, not the one in methods) which is linked from ?"|". For background you should also read the reference on that help page.

Unfortunately I have no access to that book.

> You are wrong in asserting that the internal method is 'for logicals': please do study the help page. It covers e.g. integer vectors, which is what I suspect you have here (assuming this is something to do with package 'bit', unmentioned).

Yes, both bit and bitwhich are integer with an S3 class attribute (bitwhich sometimes is logical instead). I am lost. What do I need to do to get "|.a" dispatched when calling a | b, where a and b are objects of S3 classes "a" and "b" that both have methods defined for "|"? In the R Language Definition I find: "If they do not suggest a single method then the default method is used." Does this mean it is not possible to write Ops methods for classes "a" and "b" such that "|.a" is called in a | b? I don't see how I can get any hook into the dispatch mechanism; my methods are always bypassed if the classes of e1 and e2 differ (simple example below).
Best wishes for 2010
Jens Oehlschlägel

ca <- function(x){
  x <- as.integer(x)
  oldClass(x) <- "a"
  x
}
cb <- function(x){
  x <- as.integer(x)
  oldClass(x) <- "b"
  x
}
a <- ca(1)
b <- cb(1)
Ops.a <- function(e1, e2){
  cat("here Ops.a \n")
  NULL
}
Ops.b <- function(e1, e2){
  cat("here Ops.b \n")
  NULL
}
# OK, Ops.a dispatched
a | a
here Ops.a 
NULL
# BUT both Ops.a and Ops.b bypassed
a | b
[1] TRUE
Warning message:
Incompatible methods ("Ops.a", "Ops.b") for "|"

"|.a" <- function(e1, e2){
  cat("here |.a \n")
  NULL
}
"|.b" <- function(e1, e2){
  cat("here |.b \n")
  NULL
}
# OK, |.a dispatched
a | a
here |.a 
NULL
# BUT both |.a and |.b bypassed
a | b
[1] TRUE
Warning message:
Incompatible methods ("|.a", "|.b") for "|"
[R] Ops method does not dispatch on either of two classes
I have defined boolean methods for bit and bitwhich objects, for example "|.bit" <- function(e1,e2) and "|.bitwhich" <- function(e1,e2). Both methods coerce their arguments to the respective class. However, if I do something like

bit_obj | bitwhich_obj

then I get a warning

Warning message:
Incompatible methods ("|.bit", "|.bitwhich") for "|"

and neither of the two methods is called. Instead the (internal) method for logicals seems to be called - without even coercing its arguments to logical. Same problem with Ops.bit and Ops.bitwhich. What is the recommended way to get my methods reliably dispatched?

Jens Oehlschlägel
Re: [R] Save workspace with ff objects
> My script generates a mixture of normal and ff objects. Very often I would like to save the workspace for each parameter setting, so that I can get back to it later on. Is there an easy way to do this, instead of needing to save individual ff objects separately?

With one save() you can store as many ff objects as you like. However, this does not save the ff files to a different location.

> I've tried the naive way of just saving the workspace, only to find that the ff objects are empty.

When loading the ff objects, the ff files need to be in their original locations. You need to make sure that you do not overwrite those and that they survive finalizers and tempdir removal at rm(ff) or q() time. Do read the ff help on the parameters 'filename', 'pattern', 'finalizer', 'finonexit'.

The next version of ff will have ffsave(), which will store a mixture of normal and ff objects *and* all ff files into an ffarchive, i.e. two files <ffarchive>.RData and <ffarchive>.ffData, from which you can restore all or a selection of the ff objects / files using the ffload() command.

Regards
Jens Oehlschlägel
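The save()/load() part of the advice above can be sketched as follows. The paths are hypothetical; the essential point is that save() stores only ff's small metadata, so the ff file itself must survive in place (hence the "close" finalizer):

```r
library(ff)

# a permanent ff file outside fftempdir gets the "close" finalizer,
# so the file is not deleted when the R object is garbage-collected
x <- ff(1:10, filename = "/tmp/x.ff", finalizer = "close")
save(x, file = "/tmp/meta.RData")  # saves metadata only, not the 40 bytes of data
close(x)

## -- in a later R session --
load("/tmp/meta.RData")  # works only if /tmp/x.ff still exists at that path
open(x)
x[1:3]
close(x)
```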
Re: [R] questions on the ff package
> I wonder how efficient it is to do the following command on a frequent basis: nrow(matFF) <- nrow(matFF)+1

Obviously there is overhead (closing the file, enlarging the file, opening the file). I recommend you measure yourself whether this is acceptable for you.

> no large file copying is needed each time the nrow is changed?

With a decent filesystem there is *no* copying from a smaller to a larger file.

> would you think I can open 2000 large matrices and leave them open, or do I need to close each after it is opened and used?

Not tested yet. I guess the number of open files can be configured when compiling your OS. Please test and let us know your experience.

Regards
Jens Oehlschlägel
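A sketch of the "measure yourself" advice above. The matrix is created in major row order (dimorder=c(2,1)) because, as discussed elsewhere in this thread, that is the layout that supports growing by rows:

```r
library(ff)

# a row-major ff matrix so that nrow<- can extend it
m <- ff(vmode = "double", dim = c(1e5, 5), dimorder = c(2, 1))

# time the close/enlarge/reopen overhead of adding one row
system.time(nrow(m) <- nrow(m) + 1L)

# repeat in a loop to see whether frequent growth is acceptable for you
system.time(for (i in 1:100) nrow(m) <- nrow(m) + 1L)
```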
[R] questions on the ff package
Jeff,

> I need to save a matrix as a memory-mapped file and load it back later. To save the matrix, I use
> mat <- matrix(1:20, 4, 5)
> matFF <- ff(mat, dim=dim(mat), filename="~/a.mat", overwrite=TRUE, dimnames=dimnames(mat))

# This stores the data in an ff file,
# but not the metadata in R's ff object.
# To do the latter you need to do
save(matFF, file="~/matFF.RData")
# Assuming that your ff file remains in the same location,
# in a new R session you simply
load(file="~/matFF.RData")
# and the ff file is available automagically

> However, I don't always know the dimension when loading the matrix back. If I miss the dim attributes, ff will return it as a vector. Is there a way to load the matrix without specifying the dimension?

# You can create an ff object using your existing ff file by
matFF <- ff(filename="~/a.mat", vmode="double", dim=c(4,5))
# You can do the same at unknown file size with
matFF <- ff(filename="~/a.mat", vmode="double")
# which gives you the length of the ff object
length(matFF)
# if you know the number of columns you can calculate the number of rows
# and give your ff object the interpretation of a matrix
dim(matFF) <- c(length(matFF)/5, 5)

> the matrix may grow in terms of the number of rows. Is there an efficient way to do this?

# there are two ways to grow a matrix by rows
# 1) you create the matrix in major row order
matFF <- ff(1:20, dim=c(4,5), dimorder=c(2,1))
# then you require a higher number of rows
nrow(matFF) <- 6
# as you can see there are new empty rows in the file
matFF
# 2) instead of a matrix you create an ffdf data.frame,
#    which you can also give more rows using nrow<-
#    An example of this is in read.table.ffdf,
#    which reads a csv file in chunks and extends the
#    number of rows of the ffdf

Jens Oehlschlägel
Re: [R] Error: cannot allocate vector of size...
For me with ff - on a 3 GB notebook - 3e6x100 works out of the box, even without compression: the doubles consume 2.2 GB on disk, but the R process remains under 100 MB; the rest of the RAM is used by the file-system cache. If you are under Windows, you can create the ffdf files in a compressed folder. For the random doubles this reduces the size on disk to 230 MB - which should even work on a 1 GB notebook.

BTW: the most compressed datatype (vmode) that can handle NAs is "logical": it consumes 2 bits per tri-bool. The next most compressed is "byte", covering c(NA, -127:127) and consuming one byte per element on disk and in the fs-cache.

The code below should give an idea of how to do pairwise stats on columns where each pair fits easily into RAM. In the real world, you would not create the data but import it using read.csv.ffdf (expect that reading your csv file takes longer than reading/writing the ffdf).

Regards
Jens Oehlschlägel

library(ff)
k <- 100
n <- 3e6

# creating a ffdf dataframe of the required size
l <- vector("list", k)
for (i in 1:k)
  l[[i]] <- ff(vmode="double", length=n, update=FALSE)
names(l) <- paste("c", 1:k, sep="")
d <- do.call("ffdf", l)

# writing 100 columns of 3e6 random doubles takes 90 sec
system.time(
  for (i in 1:k){
    cat(i, " ")
    print(system.time(d[,i] <- rnorm(n))["elapsed"])
  }
)["elapsed"]

m <- matrix(as.double(NA), k, k)

# pairwise correlating one column against all others takes ~17.5 sec
# pairwise correlating all combinations takes 15 min
system.time(
  for (i in 2:k){
    cat(i, " ")
    print(system.time({
      x <- d[[i]][]
      for (j in 1:(i-1)){
        m[i,j] <- cor(x, d[[j]][])
      }
    })["elapsed"])
  }
)["elapsed"]
[R] [R-pkgs] New version of package ff
Dear R community,

ff version 2.1.1 is available on CRAN. It now supports large data.frames, csv import/export, packed atomic datatypes, and bit filtering from package 'bit', on which it depends from now on. Some performance results in seconds from test data with 78 million rows and 7 columns on a 3 GB notebook:

sequential reading of 1 million rows: csv = 32.7, ffdf = 1.3
sequential writing of 1 million rows: csv = 35.5, ffdf = 1.5

Examples of things you can do with ff and bit:
- direct random access to rows of a large data.frame instead of talking to an SQL database (?ffdf)
- store a 4-level factor like A,T,G,C with 2 bits instead of 32 bits (?vmode)
- fast chunked iteration (?chunk)
- run a linear model on a large dataset using biglm (?chunk.ffdf)
- handle boolean selections a factor of 32 faster and with less RAM consumption (?bit)
- handle very skewed selections very fast (?bitwhich)
- parallel access to a large dataset just by sending ff's small metadata from master to slaves (e.g. with snowfall)

ff is hosted on r-forge now, and you can find some presentations on ff at http://ff.r-forge.r-project.org/

Hope you find this useful. We appreciate any feedback.

Jens
Daniel
Re: [R] Hand-crafting an .RData file
If you can manage to write out your data in separate binary files, one for each column, then another possibility is using package ff. You can link those binary columns into R by defining an ffdf dataframe: the columns are memory-mapped and you access only those parts you need - without initially importing them. This is much faster than a csv import and also works for files that are too large to import at once. If all your columns have the same storage.mode (vmode in ff), then another alternative is writing out all your data as one single binary matrix in major row order (because that can be written row by row from your program) and linking the file into R as a single ff matrix.

Since ffdf in ff is new, I give a mini-tutorial below. Let me know how that works for you.

Kind regards
Jens Oehlschlägel

library(ff)

# Create example csv
fnam <- "/tmp/example.csv"
write.csv(data.frame(a=1:9, b=1:9+0.1), file=fnam, row.names=FALSE)

# Create example binary files on disk.
# Reading a csv into an ffdf actually stores
# each column as a binary file on disk.
# Using a pattern outside fftempdir automatically sets finalizer="close"
# and thus makes those binary files permanent.
path <- "/tmp/example_"
x <- read.csv.ffdf(file=fnam, ff_args=list(pattern=path))
close(x)

# Note that a standard ffdf is made up column by column from simple ff objects.
# More complex mappings from ff objects into ffdf are possible,
# but let's keep it simple for now.
p <- physical(x)
p

# Now let's just create an ffdf from existing binary files.
# Step one: create an ff object for each binary file (without reading them).
# Note that because we open ff files outside fftempdir,
# the default finalizer is "close", not "delete",
# so the files will not be deleted on finalization.
# The files are opened for memory mapping, but not read.
ffcols <- vector("list", length(p))
for (i in 1:length(p)){
  ffcols[[i]] <- ff(filename=filename(p[[i]]), vmode=vmode(p[[i]]))
}
ffcols

# Step two: bundle several ff objects into one ffdf data.frame
# (still without reading data)
ffdafr <- ffdf(a=ffcols[[1]], b=ffcols[[2]])
# now reading rows from this will return a standard data.frame
# (and only read the required rows)
ffdafr[1:4,]
ffdafr[5:9,]

# As an alternative, create an example binary
# (double) matrix in major row order
y <- as.ff(t(ffdafr[,]), filename="/tmp/example_single_matrix.ff")
# Again we can link this existing binary file.
# If we know the size of the matrix we can do
z <- ff(filename=filename(y), vmode="double", dim=c(9,2), dimorder=c(2,1))
z
rm(z)
# If we only know the number of columns we can do
z <- ff(filename=filename(y), vmode="double")
# and set dim later
dim(z) <- c(length(z)/2, 2)
# Note that so far we have interpreted the file in major column order
z
# To interpret the file in major row order we set dimorder
# (a generalization for n-way arrays)
dimorder(z) <- c(2,1)
z

# Removing the ff objects will trigger the finalizers
# at the next garbage collection
rm(x, ffcols, ffdafr, y, z)
gc()
# since we carefully selected the "close" finalizer,
# the files still exist
dir(path="/tmp", pattern="example_")
# now remove them physically
unlink(file.path("/tmp", dir(path="/tmp", pattern="example_")))
Re: [R] Incremental ReadLines
Gene,

You might want to look at function read.csv.ffdf from package ff, which can read large csv files into an ffdf object. That is a kind of data.frame which is stored on disk resp. in the file-system cache. Once you subscript part of it, you get a regular data.frame.

Jens Oehlschlägel
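A minimal sketch of the workflow described above, with a small self-generated csv standing in for the large file (the chunk sizes are illustrative):

```r
library(ff)

# write a sample csv, then read it back in chunks of 1000 rows
write.csv(data.frame(a = 1:5000, b = rnorm(5000)),
          "/tmp/big.csv", row.names = FALSE)
x <- read.csv.ffdf(file = "/tmp/big.csv",
                   first.rows = 1000, next.rows = 1000)
dim(x)

# subscripting a part returns a regular in-RAM data.frame
df <- x[1:10, ]
class(df)
```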
[R] (no subject)
Hi,

Does anyone know where the following package is available?

Holleczek B, Gondos A, Brenner H. periodR - an R package to calculate long-term survival estimates using period analysis. Methods of Information in Medicine 2009; 48: 123-128.

Thanks
Jens Oehlschlägel
[R] questions on csv reading
Hi,

Is there an official way to determine the colClasses of a data.frame? Why has POSIXct such a strange class structure? Why is colClasses "ordered" not allowed (and doesn't work)?

Background
==========

I am writing a chunked csv reader that provides the functionality of read.table for large files (in the next version of package ff). In chunked reading, one wants to learn the colClasses from the data.frame returned for the first chunk and submit this as argument colClasses= to the following chunks (the following calls to read.table). For most column types

colClasses <- sapply(df, class)

(with df the first chunk) works fine. However, two column types have more than one class:
- ordered has c("ordered", "factor") - currently we can't tell read.table that a column is an ordered factor
- POSIXct has c("POSIXt", "POSIXct") - here the LESS specific class "POSIXt" is in the first position and would win in class dispatch over the MORE specific class "POSIXct". Why?

Jens Oehlschlägel
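The multi-class problem described above can be demonstrated in base R (note that the class order of POSIXct has changed between R versions, so the output depends on the version you run):

```r
# a data.frame with the two problematic column types
d <- data.frame(n = 1:3,
                f = factor(letters[1:3]),
                o = factor(letters[1:3], ordered = TRUE),
                t = Sys.time())

lapply(d, class)  # 'o' and 't' have length-2 class vectors
sapply(d, class)  # hence this cannot simplify to a plain character vector

# one workaround: keep only the first class of each column
sapply(d, function(x) class(x)[1])
```

In the R version discussed in this post the first class of a POSIXct column was "POSIXt", the less specific one, so this workaround would feed the wrong class back into colClasses=.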
Re: [R] Is there any difference between <- and =
Sean,

> would like to receive expert opinion to avoid potential trouble [..] i think the following is the most secure way if one really really has to do assignment in a function call f({a=3}) and if one keeps this convention, <- can be dropped altogether.

"Secure" is relative: due to R's lazy evaluation you never know whether a function's argument is being evaluated. Look at:

f <- function(x) TRUE
x <- 1
f((x = 2))  # obscured attempt to assign in a function call
[1] TRUE
x
[1] 1

Thus there is dangerous advice in the referenced blog, which reads: "f(x <- 3) ... means assign 3 to x, and call f with the first argument set to the value 3". This might be the case in C, but not in R. Actually in R, f(x <- 3) means: call f with a first unevaluated argument x <- 3, and if and only if f decides to evaluate its first argument, the assignment is done. To make this very clear:

f <- function(x) if (runif(1) > 0.5) TRUE else x
x <- 1
print(f(x <- x + 1))
[1] TRUE
print(f(x <- x + 1))
[1] 2
print(f(x <- x + 1))
[1] 3
print(f(x <- x + 1))
[1] TRUE
print(f(x <- x + 1))
[1] 4
print(f(x <- x + 1))
[1] 5
print(f(x <- x + 1))
[1] TRUE
print(f(x <- x + 1))
[1] 6
print(f(x <- x + 1))
[1] TRUE

Here it is unpredictable whether your assignment takes place. Thus assigning like f({x=1}) or f((x=1)) is the maximally dangerous thing to do: even if you have a code reviewer who is aware of the danger of f(x <- 1), he will probably miss it, because f((x=1)) looks too similar to a standard call f(x=1). According to help("<-"), R's assignment operator is rather <- than =:

"The operators <- and = assign into the environment in which they are evaluated. The operator <- can be used anywhere, whereas the operator = is only allowed at the top level (e.g., in the complete expression typed at the command prompt) or as one of the subexpressions in a braced list of expressions."
So my recommendation is:

1) use R's assignment operator <- with spaces around it (or assign()) and don't obscure assignments by using C's assignment operator (or other languages' equality operator)
2) do not assign in function arguments unless you have good reasons, like in system.time(x <- something)

HTH
Jens Oehlschlägel

P.S. Disclaimer: you can consider me biased towards <-; never trust experts, whether experienced or not.

P.P.S. A puzzle, following an old tradition: what is going on here? (And what would you need to do to prove it?)

search()
[1] ".GlobalEnv"        "package:stats"     "package:graphics"  "package:grDevices"
[5] "package:utils"     "package:datasets"  "package:methods"   "Autoloads"
[9] "package:base"
ls(all.names = TRUE)
[1] "y"
y
[1] 1 2 3
identical(y, 1:3)
[1] TRUE
y[] <- 1  # assigning 1 fails
y
[1] 1 2 3
y[] <- 2  # assigning 2 works
y
[1] 2 2 2

# Tip: no standard packages modified, no extra packages loaded,
# neither classes nor methods defined, no print methods hiding anything;
# if you investigated my R you would not find any false bottom anymore

version
platform       i386-pc-mingw32
arch           i386
os             mingw32
system         i386, mingw32
status
major          2
minor          8.1
year           2008
month          12
day            22
svn rev        47281
language       R
version.string R version 2.8.1 (2008-12-22)
[R] [R-pkgs] New package: bit 1.0
Dear R community,

Package 'bit' version 1.0 is available on CRAN. It provides bitmapped vectors of booleans (no NAs), coercion from and to logicals, integers and integer subscripts, fast boolean operators, and fast summary statistics. With bit vectors you can store true binary booleans {FALSE,TRUE} at the expense of 1 bit only; on a 32-bit architecture this means a factor of 32 less RAM and a factor of 32 more speed on boolean operations. With this speed gain it even pays off to convert to bit in order to avoid a single boolean operation on logicals or a single set operation on (longer) integer subscripts; the pay-off is dramatic when such components are used more than once.

Reading from and writing to bit is approximately as fast as accessing standard logicals - mostly due to R's time for memory allocation. The package allows working with pre-allocated memory for return values by calling .Call() directly: when evaluating the speed of C access with pre-allocated vector memory, copying from bit to logical requires only 70% of the time of copying from logical to logical, and copying from logical to bit comes at a performance penalty of 150%.

Functions 'which' and 'xor' are made S3 generic; 'xor.default' is implemented much faster than in base R (this should go into base R). The package has automated regression tests and is hopefully useful for better handling large datasets, together with packages 'rindex' and 'ff'.

Best regards
Jens Oehlschlägel
Munich, 10.10.2008
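A minimal sketch of the announced functionality: coercion, boolean operations, and summary statistics on bit vectors (the size ratio comment is a rough expectation, not a measured value):

```r
library(bit)

n <- 1e6
l <- runif(n) > 0.5          # ordinary logical: 4 bytes per element
b <- as.bit(l)               # bit vector: 1 bit per element, ~32x smaller

# fast boolean operators stay in the bit representation
b2 <- !b & as.bit(runif(n) > 0.5)

# fast summary statistics
sum(b2)

# lossless round-trip back to logical
identical(as.logical(b), l)
```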
Re: [R] R 2.7.0 is released
Many thanks to the core team for an impressive list of new improvements ...

o strwidth() and strheight() gain 'font' and 'vfont' arguments and accept in-line pars such as 'family' in the same way as text() does. (Longstanding wish of PR#776)

... and for not having forgotten an 8-year-old wish!

Jens Oehlschlaegel
Re: [R] cbind function
And here is a second solution, which differs in what happens if the variables have differing lengths:

var1 <- 1:4
var2 <- 1:3
sapply(ls(patt="^var[0-9]"), get)
$var1
[1] 1 2 3 4
$var2
[1] 1 2 3
do.call("cbind", lapply(ls(patt="^var[0-9]"), get))
     [,1] [,2]
[1,]    1    1
[2,]    2    2
[3,]    3    3
[4,]    4    1
Warning message:
In cbind(1:4, 1:3) :
  number of rows of result is not a multiple of vector length (arg 2)

Best regards
Jens Oehlschlägel