Re: [R] sapply returning list instead of matrix
Can I follow up with what I've learned about my own myopia regarding sapply()? First, I appreciate all the feedback.

After thinking about it for a while I realized R designers have often chosen to accommodate interactive usage, and in that context, sapply() returning different types makes perfect sense. If applying both 'mean' and 'var' to multiple data sets in a list, it makes sense to return a matrix, but if applying just 'mean' to the same list of data sets it makes sense to return a vector, not a 1xN matrix. This works well in an interactive context, but when writing robust applications it is essential that routines return consistent types, especially if the parameters are determined from unpredictable user input. The behavior of functions like sapply() in R seems extraordinary compared to languages I am more familiar with like C, Java, or Python.

In my case I was using sapply() to extract alignments from multiple BAM files that overlap exons of a gene. My application of sapply() returned a matrix with data sets across columns and exons down the rows. This worked well for most genes, but failed when run on a gene with only a single exon because sapply() returned a list instead of a matrix. This bug in my code was just waiting for the right set of inputs to trigger it.

[ Some suggested using vapply() but I don't think that would help in this case because the length of the return value from the applied function is variable and depends on how many exons are in the gene. Or perhaps I just don't understand vapply() well. ]

sapply() is behaving very similarly to the way the '[' operator treats data frames. The extract operator '[' returns a vector when extracting a single column from a data frame, otherwise it returns a data frame. However, '[' takes a 'drop' argument to control this behavior so you can get a consistent type back if you need it. I wish sapply() had a similar option.
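For readers who hit the same single-exon case, here is a minimal sketch of one way to get a matrix back every time, by binding the lapply() results as columns instead of relying on sapply()'s simplification. The names sapply_matrix and count_exons are hypothetical (they are not from the original code), and the numeric values merely stand in for the per-exon alignment results. The last two lines show the 'drop' behavior of '[' on a data frame for comparison.

```r
# Hypothetical stand-in for the real per-file work: one value per exon.
count_exons <- function(file_id, n_exons) rep(file_id, n_exons)

# Hypothetical helper: bind each element's result as a column, so the
# result is a matrix even when every result has length one.
sapply_matrix <- function(X, FUN, ...) {
  do.call(cbind, lapply(X, FUN, ...))
}

multi  <- sapply_matrix(1:3, count_exons, n_exons = 4)  # 4 exons, 3 files
single <- sapply_matrix(1:3, count_exons, n_exons = 1)  # 1 exon,  3 files

dim(multi)    # 4 3
dim(single)   # 1 3

# The data frame analogue: 'drop' controls whether '[' simplifies.
df <- data.frame(x = 1:3, y = 4:6)
is.data.frame(df[, "x"])                # FALSE, simplified to a vector
is.data.frame(df[, "x", drop = FALSE])  # TRUE, still a data frame
```

Two caveats with this sketch: it returns NULL when X is empty, and cbind() recycles shorter results instead of failing, so it assumes every call returns the same number of values.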
-csw __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] sapply returning list instead of matrix
Can anyone suggest a rationale for why sapply() returns different types (list and matrix) in the two examples below? Is there any way to get sapply() or any other apply() function to return a matrix in both cases? simplify=TRUE doesn't change the outcome. I understand why it is happening, I just can't understand why such unpredictable behavior makes sense. [[alternative HTML version deleted]]
Re: [R] sapply returning list instead of matrix
Hey thanks for the helpful snark, Bert. To everyone else, I apologize for neglecting to actually include the examples.

  a <- function(i) { list(1) }
  b <- function(i) { list(1,2) }
  ll <- sapply(seq(3), a)
  mm <- sapply(seq(3), b)
  class(ll)
  class(mm)

  > class(ll)
  [1] "list"
  > class(mm)
  [1] "matrix"

I can read the documentation, I see why it happens, but who in their right mind would design a function this way? Can you imagine how many bugs are lurking because people haven't yet hit the right set of input that is going to cause sapply() to return a list instead of a matrix? The point is that having the type of the return value depend on the length of output from the applied function is simply madness. It is a terrible design decision. What is to be gained from the fact that I have to test the type of value returned from sapply()? I was hoping plyr::laply() would be better but it perpetuates the same bad interface.

[So sorry for sending HTML, if that is what's happening. I guess Gmail sends HTML by default?]

On Fri, Jan 31, 2014 at 1:44 PM, Bert Gunter gunter.ber...@gene.com wrote:

As you ignored the posting guide and posted in HTML, your below didn't get through. So one can only guess that it has something to do with (see ?sapply)

Simplification in sapply is only attempted if X has length greater than zero and if the return values from all elements of X are all of the same (positive) length. If the common length is one the result is a vector, and if greater than one is a matrix with a column corresponding to each element of X.

Return values must also be of the same type, also, obviously.

Cheers, Bert

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

Data is not information. Information is not knowledge. And knowledge is certainly not wisdom.
H. Gilbert Welch

On Fri, Jan 31, 2014 at 1:36 PM, chris warth cswa...@gmail.com wrote:

Can anyone suggest a rationale for why sapply() returns different types (list and matrix) in the two examples below? Is there any way to get sapply() or any other apply() function to return a matrix in both cases? simplify=TRUE doesn't change the outcome. I understand why it is happening, I just can't understand why such unpredictable behavior makes sense.
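The simplification rule quoted from ?sapply can be seen case by case in a short sketch. The anonymous functions below are illustrative only; the checks use is.matrix() and is.list() rather than class(), since what class() prints for a matrix differs between R versions.

```r
len1   <- sapply(1:3, function(i) i)           # common length 1
len2   <- sapply(1:3, function(i) c(i, i^2))   # common length 2
ragged <- sapply(1:3, function(i) seq_len(i))  # lengths 1, 2, 3
empty  <- sapply(integer(0), function(i) i)    # X has length zero
lst1   <- sapply(1:3, function(i) list(1))     # length-1 lists, as in 'a'
lst2   <- sapply(1:3, function(i) list(1, 2))  # length-2 lists, as in 'b'

is.matrix(len1)    # FALSE: simplified to a plain vector
is.matrix(len2)    # TRUE:  one column per element of X
is.list(ragged)    # TRUE:  no simplification attempted
is.list(empty)     # TRUE:  an empty list
is.matrix(lst1)    # FALSE: stays a list of length 3
is.matrix(lst2)    # TRUE:  a 2 x 3 matrix whose cells are list elements
```

So the return type is decided entirely by the lengths of the individual results, which is why a change in the data alone can switch the result between a list, a vector and a matrix.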
[R] readLines() behavior is really strange
This might also be titled, "How do I use R as a streaming process?"

I would like to use R as a streaming processor, but it seems to have trouble capturing all the input. Can someone explain why this script skips the first few lines of input? Is this a bug in R or some interaction with line buffering on /dev/stdin?

  $ R --slave --no-site-file --no-init-file -e 'readLines("/dev/stdin")'
  a
  b
  c
  d
  e
  f
  ^D
  [1] "d" "e" "f"

I would use something like 'readLines(stdin())' but stdin() doesn't seem to be hooked up to the tty in slave mode (it doesn't wait for input).

  $ R --slave -e 'readLines(stdin())'
  [1]

This odd behavior is not limited to slave sessions, and it seems to be skipping a minimum number of characters rather than a certain number of lines.

  $ R --no-site-file --no-init-file -q
  > readLines("/dev/stdin")
  a
  b
  c
  d
  e
  f
  g
  ^D
  [1] "d" "e" "f" "g"
  > readLines("/dev/stdin")
  aa
  b
  ^D
  [1] "b"

Thanks in advance,
-csw

[Thx also to Yihui Xie for helping identify this as a behavior in base R, not in a package]

  $ uname -a
  Linux xx 3.5.0-43-generic #66~precise1-Ubuntu SMP Thu Oct 24 14:52:23 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

  $ R --slave --no-site-file --no-init-file -e 'sessionInfo()'
  R version 3.0.2 (2013-09-25)
  Platform: x86_64-unknown-linux-gnu (64-bit)

  locale:
   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
   [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
   [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
   [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
   [9] LC_ADDRESS=C               LC_TELEPHONE=C
  [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

  attached base packages:
  [1] stats     graphics  grDevices utils     datasets  methods   base
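One pattern that is commonly used for streaming input is to run a script file and read from file("stdin"), which ?file documents as the C-level standard input of the process (as distinct from stdin(), which refers to the console). The sketch below is offered under that assumption and is not a diagnosis of the skipped lines above. The helper name process_stream and its default handler are hypothetical.

```r
# Hypothetical helper: read a connection one line at a time and pass each
# line to 'handler'. Returns the number of lines processed.
process_stream <- function(con,
                           handler = function(line) cat(toupper(line), "\n", sep = "")) {
  # Open the connection once; calling readLines() on an unopened connection
  # would open and close it again on every call.
  if (!isOpen(con)) {
    open(con, "r")
    on.exit(close(con))
  }
  n <- 0L
  while (length(line <- readLines(con, n = 1L)) > 0L) {
    handler(line)
    n <- n + 1L
  }
  n
}

# Intended use at the end of a script file (not run here):
# process_stream(file("stdin"))
```

With that last line uncommented and the code saved as, say, stream.R (a hypothetical file name), it would be invoked as 'Rscript stream.R < input.txt' or at the end of a shell pipeline. Taking the connection as an argument also lets the same function be exercised on a textConnection() without touching standard input.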