Re: [R] sapply returning list instead of matrix

2014-02-02 Thread chris warth
May I follow up with what I've learned about my own myopia regarding
sapply()?

First, I appreciate all the feedback. After thinking about it for a
while, I realized that R's designers have often chosen to accommodate
interactive usage, and in that context sapply() returning different
types makes perfect sense.

If applying both 'mean' and 'var' to multiple data sets in a list, it
makes sense to return a matrix, but if applying just 'mean' to the same
list of data sets, it makes sense to return a plain vector (or list),
not a 1xN matrix. This works well in an interactive context, but when
writing robust applications it is essential that routines return
consistent types, especially if the parameters are determined from
unpredictable user input. The behavior of functions like sapply() in R
seems extraordinary compared to languages I am more familiar with, like
C, Java, or Python.

In my case I was using sapply() to extract alignments from multiple
BAM files that overlap exons of a gene. My application of sapply()
returned a matrix with data sets across columns and exons down the
rows. This worked well for most genes, but failed when run on a gene
with only a single exon, because sapply() returned a list instead of a
matrix. This bug in my code was just waiting for the right set of
inputs to trigger it.
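
The failure mode described above can be reproduced with a small sketch. The function count_reads() and its inputs are hypothetical stand-ins, since the BAM-file code was not posted, and the do.call(cbind, lapply(...)) idiom is just one way to get a matrix every time. Here the applied function returns vectors, so the single-exon case collapses to a vector; with a function that returns lists it collapses to a list.

```r
# Hypothetical stand-in: returns one count per exon for a data set.
count_reads <- function(dataset, n_exons) {
  rep(length(dataset), n_exons)
}

datasets <- list(a = 1:3, b = 1:5, c = 1:2)

# sapply(): a matrix for several exons, but no matrix for a single exon.
multi  <- sapply(datasets, count_reads, n_exons = 4)
single <- sapply(datasets, count_reads, n_exons = 1)
is.matrix(multi)   # TRUE
is.matrix(single)  # FALSE

# lapply() + cbind(): always a matrix, with exons down the rows.
stable_multi  <- do.call(cbind, lapply(datasets, count_reads, n_exons = 4))
stable_single <- do.call(cbind, lapply(datasets, count_reads, n_exons = 1))
dim(stable_multi)   # 4 3
dim(stable_single)  # 1 3
```

The trade-off is that cbind() recycles vectors of unequal length (with a warning when the lengths are not multiples), so this only guards against the shape change, not against inconsistent output lengths.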

[ Some suggested using vapply(), but I don't think that would help in
this case because the length of the return value from the applied
function is variable and depends on how many exons are in the gene.
Or perhaps I just don't understand vapply() well. ]
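
For what it's worth, here is a sketch of how vapply() might be applied when the number of exons is only known at run time. The names per_exon() and to_matrix() are made up for illustration. The FUN.VALUE template can be built from the exon count, so a variable length is not a blocker in itself, but vapply() also returns a plain vector when that length is 1, so the result still has to be reshaped to guarantee a matrix.

```r
# Hypothetical per-dataset function: one numeric value per exon.
per_exon <- function(dataset, n_exons) {
  rep(mean(dataset), n_exons)
}

datasets <- list(a = c(1, 2, 3), b = c(4, 5, 6))

to_matrix <- function(datasets, n_exons) {
  # FUN.VALUE is built at run time; vapply() errors if any result
  # is not a numeric vector of exactly this length.
  res <- vapply(datasets, per_exon, FUN.VALUE = numeric(n_exons),
                n_exons = n_exons)
  # vapply() yields a vector when n_exons == 1, so force the shape.
  matrix(res, nrow = n_exons, dimnames = list(NULL, names(datasets)))
}

dim(to_matrix(datasets, 3))  # 3 2
dim(to_matrix(datasets, 1))  # 1 2
```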

sapply() behaves much like the '[' operator does on data frames. With
matrix-style indexing, '[' returns a vector when extracting a single
column from a data frame; otherwise it returns a data frame. However,
'[' takes a 'drop' argument to control this behavior, so you can get a
consistent type back if you need it.
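
As a small illustration of the 'drop' behavior described above (the data frame and matrix here are made up for the example):

```r
df <- data.frame(x = 1:3, y = c("a", "b", "c"))

class(df[, "x"])                # "integer": dropped to a vector
class(df[, "x", drop = FALSE])  # "data.frame": shape preserved
class(df[, c("x", "y")])        # "data.frame"

# The same argument applies to matrices.
m <- matrix(1:6, nrow = 2)
dim(m[1, ])                # NULL
dim(m[1, , drop = FALSE])  # 1 3
```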

I wish sapply() had a similar option.

-csw

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] sapply returning list instead of matrix

2014-01-31 Thread chris warth
Can anyone suggest a rationale for why sapply() returns different types
(list and matrix) in the two examples below?   Is there any way to get
sapply() or any other apply() function to return a matrix in both cases?
simplify=TRUE doesn't change the outcome.

I understand why it is happening, I just can't understand why such
unpredictable behavior makes sense.

[[alternative HTML version deleted]]



Re: [R] sapply returning list instead of matrix

2014-01-31 Thread chris warth
Hey thanks for the helpful snark, Bert.
To everyone else, I apologize for neglecting to actually include the
examples.

a <- function(i) { list(1) }
b <- function(i) { list(1, 2) }
ll <- sapply(seq(3), a, simplify = TRUE)
mm <- sapply(seq(3), b)

> class(ll)
[1] "list"
> class(mm)
[1] "matrix"

I can read the documentation, I see why it happens, but who in their right
mind would design a function this way? Can you imagine how many bugs are
lurking because people haven't yet hit the right set of inputs that is going
to cause sapply() to return a list instead of a matrix?

The point is that having the type of the return value depend on the length
of the output from the applied function is simply madness. It is a terrible
design decision. What is to be gained from the fact that I have to test
the type of value returned from sapply()? I was hoping plyr::laply()
would be better, but it perpetuates the same bad interface.
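
The dependence on output length can be seen in a short sketch. The helper lens() is hypothetical and exists only to produce results of a chosen length.

```r
# lens(n) builds a function whose result always has length n.
lens <- function(n) function(i) rep(i, n)

v <- sapply(1:3, lens(1))                 # common length 1: vector
m <- sapply(1:3, lens(2))                 # common length 2: 2 x 3 matrix
l <- sapply(1:3, function(i) seq_len(i))  # differing lengths: list
e <- sapply(integer(0), lens(2))          # zero-length input: empty list

dim(v)      # NULL
dim(m)      # 2 3
is.list(l)  # TRUE
is.list(e)  # TRUE
```

So a caller who needs one particular shape has to handle four cases, which is the testing burden complained about above.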

[So sorry for sending HTML, if that is what's happening. I guess Gmail
sends HTML by default? ]


On Fri, Jan 31, 2014 at 1:44 PM, Bert Gunter gunter.ber...@gene.com wrote:

 As you ignored the posting guide and posted in HTML, your below
 didn't get through. So one can only guess that it has something to do
 with (see ?sapply)

 Simplification in sapply is only attempted if X has length greater
 than zero and if the return values from all elements of X are all of
 the same (positive) length. If the common length is one the result is
 a vector, and if greater than one is a matrix with a column
 corresponding to each element of X. 

 Return values must also be of the same type, obviously.

 Cheers,
 Bert

 Bert Gunter
 Genentech Nonclinical Biostatistics
 (650) 467-7374

 Data is not information. Information is not knowledge. And knowledge
 is certainly not wisdom.
 H. Gilbert Welch




 On Fri, Jan 31, 2014 at 1:36 PM, chris warth cswa...@gmail.com wrote:
  Can anyone suggest a rationale for why sapply() returns different types
  (list and matrix) in the two examples below?   Is there any way to get
  sapply() or any other apply() function to return a matrix in both cases?
  simplify=TRUE doesn't change the outcome.
 
  I understand why it is happening, I just can't understand why such
  unpredictable behavior makes sense.
 




[R] readLines() behavior is really strange

2014-01-24 Thread chris warth
This might also be titled "How do I use R as a streaming processor?"
I would like to use R as a streaming processor, but it seems to have trouble
capturing all the input.

Can someone explain why this script skips the first few lines of input?
Is this a bug in R or some interaction with line buffering on /dev/stdin?

$ R --slave  --no-site-file  --no-init-file -e 'readLines("/dev/stdin")'
 a
 b
 c
 d
 e
 f
 ^D
 [1] "d" "e" "f"


I would use something like 'readLines(stdin())' but stdin() doesn't seem to
be hooked up to the tty in slave mode (doesn't wait for input).

$ R --slave -e 'readLines(stdin())'
 [1] 
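
One thing that may be worth trying, as a sketch rather than a confirmed fix for the skipped input: ?file documents file("stdin") as the C-level standard input of the process, whereas stdin() refers to the R console. The helper stream_lines() below is hypothetical, and it is demonstrated on an in-memory connection so that it runs without a terminal.

```r
# Hypothetical helper: process lines one at a time from any connection.
stream_lines <- function(con, handle) {
  n <- 0L
  # readLines() returns character(0) once the input is exhausted.
  while (length(line <- readLines(con, n = 1L)) > 0L) {
    handle(line)
    n <- n + 1L
  }
  n
}

# Demonstration with a text connection standing in for standard input.
demo <- textConnection(c("a", "b", "c"))
seen <- character(0)
count <- stream_lines(demo, function(line) seen <<- c(seen, toupper(line)))
close(demo)
seen   # "A" "B" "C"

# In a script run as `Rscript script.R < input.txt`, the connection
# would instead be opened on the process's standard input:
#   con <- file("stdin", open = "r")
#   stream_lines(con, print)
#   close(con)
```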



This odd behavior is not limited to slave sessions, and seems to be skipping
a minimum number of characters rather than skipping a certain number of
lines.

$ R --no-site-file --no-init-file -q
 > readLines("/dev/stdin")
 a
 b
 c
 d
 e
 f
 g
 ^D
 [1] "d" "e" "f" "g"
 > readLines("/dev/stdin")
 aa
 b
 ^D
 [1] "b"
 



Thanks in advance,   -csw

[Thx also to Yihui Xie for helping identify this as a behavior in base R,
not in a package]



 $ uname -a
 Linux xx 3.5.0-43-generic #66~precise1-Ubuntu SMP Thu Oct 24 14:52:23
 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
 $ R --slave  --no-site-file  --no-init-file -e 'sessionInfo()'
 R version 3.0.2 (2013-09-25)
 Platform: x86_64-unknown-linux-gnu (64-bit)
 locale:
  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
  [9] LC_ADDRESS=C               LC_TELEPHONE=C
 [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
 attached base packages:
 [1] stats graphics  grDevices utils datasets  methods   base
