Re: [R] Randomly remove condition-selected rows from a matrix

2009-01-02 Thread Wacek Kusnierczyk
Stavros Macrakis wrote:
 On Wed, Dec 31, 2008 at 12:44 PM, Guillaume Chapron
 carnivorescie...@gmail.com wrote:
   
 m[-sample(which(m[,1]<8 & m[,2]>12),2),]
   
 Supposing I sample only one row among the ones matching my criteria. Then
 consider the case where there is just one row matching these criteria. Sure,
 there is no need to sample, but the instruction would still be executed.
 Then if this row index is 15, my instruction becomes which(15,1), and this
 can give me any row from 1 to 15, which is not correct. I have to add a
 condition in case there is only one row matching the criteria.
 

 Yes, this is a (documented!) design flaw in 'sample' -- see the man page.

 For some reason, the designers of R have chosen to document the flaw
 and leave it up to individual users to work around it rather than fix
 it definitively.  A related case is sample(c(),0), which gives an
 error rather than giving an empty vector, though in general R deals
 with empty vectors correctly (e.g. sum(c()) = 0).

   

interestingly, ?sample says:


 'sample' takes a sample of the specified size from the elements of
 'x' using either with or without replacement.

   x: Either a (numeric, complex, character or logical) vector of
  more than one element from which to choose, or a positive
  integer.

If 'x' has length 1, is numeric (in the sense of 'is.numeric') and
 'x >= 1', sampling takes place from '1:x'.  _Note_ that this
 convenience feature may lead to undesired behaviour when 'x' is of
 varying length 'sample(x)'.  See the 'resample()' example below.



yet the following works, even though x has length 1 and is *not* numeric:

x = "foolme"
is.numeric(x)
sample(x, 1)
sample(x)

x = NA
is.numeric(NA)
sample(x, 1)
sample(x)

is this a bug in the code, or a bug in the documentation?
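
The two cases above can be checked with a short sketch. It assumes a current version of R (the behaviour of the 2009 release is taken from the post, not re-verified): a length-one x that is not numeric is handled as an ordinary vector, so the only possible draw is the element itself.

```r
# A length-one, non-numeric x is treated as an ordinary vector:
# the only possible draw is the element itself.
x <- "foolme"
is.numeric(x)        # FALSE, so the 1:x shortcut is not taken
r1 <- sample(x, 1)   # "foolme"
r2 <- sample(x)      # "foolme"

# NA on its own is a logical value, not a numeric one.
y <- NA
is.numeric(y)        # FALSE
r3 <- sample(y, 1)   # NA
```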



 To my mind, it is bizarre to have an important basic function which
 works for some argument lengths but not others.  The convenience of
 being able to write sample(5,2) for sample(1:5,2) hardly seems worth
 inflicting inconsistency on all users -- but perhaps one of the
 designers of R/S can enlighten us on the design rationale here.

   

hopefully.

vQ

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Randomly remove condition-selected rows from a matrix

2009-01-02 Thread Duncan Murdoch

On 02/01/2009 10:07 AM, Wacek Kusnierczyk wrote:

Stavros Macrakis wrote:

On Wed, Dec 31, 2008 at 12:44 PM, Guillaume Chapron
carnivorescie...@gmail.com wrote:
  

m[-sample(which(m[,1]<8 & m[,2]>12),2),]
  

Supposing I sample only one row among the ones matching my criteria. Then
consider the case where there is just one row matching these criteria. Sure,
there is no need to sample, but the instruction would still be executed.
Then if this row index is 15, my instruction becomes which(15,1), and this
can give me any row from 1 to 15, which is not correct. I have to add a
condition in case there is only one row matching the criteria.


Yes, this is a (documented!) design flaw in 'sample' -- see the man page.

For some reason, the designers of R have chosen to document the flaw
and leave it up to individual users to work around it rather than fix
it definitively.  A related case is sample(c(),0), which gives an
error rather than giving an empty vector, though in general R deals
with empty vectors correctly (e.g. sum(c()) = 0).

  


interestingly, ?sample says:


 'sample' takes a sample of the specified size from the elements of
 'x' using either with or without replacement.

   x: Either a (numeric, complex, character or logical) vector of
  more than one element from which to choose, or a positive
  integer.

If 'x' has length 1, is numeric (in the sense of 'is.numeric') and
 'x >= 1', sampling takes place from '1:x'.  _Note_ that this
 convenience feature may lead to undesired behaviour when 'x' is of
 varying length 'sample(x)'.  See the 'resample()' example below.



yet the following works, even though x has length 1 and is *not* numeric:

x = "foolme"
is.numeric(x)
sample(x, 1)
sample(x)

x = NA
is.numeric(NA)
sample(x, 1)
sample(x)

is this a bug in the code, or a bug in the documentation?




To my mind, it is bizarre to have an important basic function which
works for some argument lengths but not others.  The convenience of
being able to write sample(5,2) for sample(1:5,2) hardly seems worth
inflicting inconsistency on all users -- but perhaps one of the
designers of R/S can enlighten us on the design rationale here.

  


hopefully.


This is more of an R-devel sort of question.  My guess is that this is 
in the S blue book, but I don't have a copy here to check.


Duncan Murdoch



Re: [R] Randomly remove condition-selected rows from a matrix

2009-01-02 Thread Wacek Kusnierczyk
xxx wrote:
 On Fri, Jan 2, 2009 at 10:07 AM, Wacek Kusnierczyk
 waclaw.marcin.kusnierc...@idi.ntnu.no wrote:
   
 ... 'sample' takes a sample of the specified size from the elements of
 'x' using either with or without replacement.

   x: Either a (numeric, complex, character or logical) vector of
  more than one element from which to choose, or a positive
  integer.

If 'x' has length 1, is numeric (in the sense of 'is.numeric') and
 'x >= 1', sampling takes place from '1:x'.  _Note_ that this
 convenience feature may lead to undesired behaviour when 'x' is of
 varying length 'sample(x)'.  See the 'resample()' example below.
 ...
 yet the following works, even though x has length 1 and is *not* numeric:...
 is this a bug in the code, or a bug in the documentation?
 

 I would guess it's a bug in the documentation.

  

possibly.  looking at the r code for sample, it's clear why
sample("foo") works:

function (x, size, replace = FALSE, prob = NULL)
{
    if (length(x) == 1 && is.numeric(x) && x >= 1) {
        if (missing(size))
            size <- x
        .Internal(sample(x, size, replace, prob))
    }
    else {
        if (missing(size))
            size <- length(x)
        x[.Internal(sample(length(x), size, replace, prob))]
    }
}

what is also clear from the code is that the function has another,
supposedly buggy behaviour due to the smart behaviour of the : operator:

sample(1.1)
# 1, not 1.1

this is consistent with


 If 'x' has length 1, is numeric (in the sense of 'is.numeric') and
 'x >= 1', sampling takes place from '1:x'.


due to the downcast performed by the colon operator, but not with


   x: Either a (numeric, complex, character or logical) vector of
  more than one element from which to choose, or a positive
  integer.


both from ?sample.  tfm is seemingly wrong wrt. the implementation, and
i find sample(1.1) returning 1 a design flaw.  (i guess the note "_Note_
that this convenience feature may lead to undesired behaviour when 'x'
is of varying length 'sample(x)'." is supposed to explain away such cases.)
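
The downcast by the colon operator mentioned above can be seen in isolation. This is only a sketch of what ':' does with a fractional upper bound; it does not call sample() itself, whose internals may differ between R versions.

```r
# The colon operator silently drops the fractional part of its
# upper bound, so 1:1.1 is just the single value 1.
s <- 1:1.1
s                 # 1
length(s)         # 1

# With a larger bound the same thing happens: 1:3.9 stops at 3.
u <- 1:3.9
u                 # 1 2 3
```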

vQ



Re: [R] Randomly remove condition-selected rows from a matrix

2009-01-02 Thread Stavros Macrakis
There is another undocumented glitch in sample:

 sample(2^31-1,1) => OK
 sample(2^31 ,1) => Error

I suppose you could interpret "sampling takes place from '1:x'" to
mean that 1:x is actually generated, but that doesn't work as an
explanation either; on my 32-bit Windows box, 1:(2^29) gives an error,
but sample(2^29,1) works fine.
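
One possible reading of the 2^31 boundary, offered as an assumption rather than something confirmed in this thread, is that it coincides with the largest value R's integer type can hold. The sketch below only checks that limit; it does not call sample() with these sizes, since that behaviour may depend on the R version.

```r
# The largest representable R integer is 2^31 - 1.
.Machine$integer.max == 2^31 - 1                 # TRUE

# One step further no longer fits in an integer: the conversion
# yields NA (with a warning, suppressed here).
over <- suppressWarnings(as.integer(2^31))
is.na(over)                                      # TRUE
```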

  -s



Re: [R] Randomly remove condition-selected rows from a matrix

2008-12-31 Thread Guillaume Chapron



I believe this does what you want:

m[-sample(which(m[,1]<8 & m[,2]>12),2),]

Analysis:

Get a boolean vector of rows fitting criteria:
   m[,1]<8 & m[,2]>12

What are their indexes?
   which(...)

Choose two among those indexes:
sample(...,2)


Thanks, but this does not seem to always work.

Supposing I sample only one row among the ones matching my criteria.
Then consider the case where there is just one row matching these
criteria. Sure, there is no need to sample, but the instruction would
still be executed. Then if this row index is 15, my instruction
becomes which(15,1), and this can give me any row from 1 to 15, which
is not correct. I have to add a condition in case there is only one
row matching the criteria.
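
The problem described above can be reproduced with a small sketch. It reads which(15,1) as sample(15,1), and the seed and the index 15 are arbitrary choices for illustration.

```r
set.seed(1)  # arbitrary seed, only to make the sketch repeatable

idx <- 15    # pretend which(...) matched exactly one row, number 15

# sample() sees a single number and draws from 1:15 instead of from c(15)
draws <- replicate(1000, sample(idx, 1))
range(draws)   # the draws are spread over 1..15, not fixed at 15

# The explicit condition mentioned above: only sample when there is a choice
pick <- if (length(idx) == 1) idx else sample(idx, 1)
pick           # 15
```

The guard works, but it has to be repeated at every call site, which is why a small wrapper function is usually the more convenient fix.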




Re: [R] Randomly remove condition-selected rows from a matrix

2008-12-31 Thread Charles C. Berry

On Wed, 31 Dec 2008, Guillaume Chapron wrote:




I believe this does what you want:

m[-sample(which(m[,1]<8 & m[,2]>12),2),]

Analysis:

Get a boolean vector of rows fitting criteria:
   m[,1]<8 & m[,2]>12

What are their indexes?
   which(...)

Choose two among those indexes:
sample(...,2)


Thanks, but this does not seem to always work.

Supposing I sample only one row among the ones matching my criteria. Then 
consider the case where there is just one row matching these criteria. Sure, 
there is no need to sample, but the instruction would still be executed. Then 
if this row index is 15, my instruction becomes which(15,1), and this can


I think you mean 'sample(15,1)', no?


From ?sample:


---

Details

If x has length 1, is numeric (in the sense of is.numeric) and x >= 1, 
sampling takes place from 1:x. Note that this convenience feature may lead 
to undesired behaviour when x is of varying length sample(x). See the 
resample() example below.


---

So define and use 'resample'.
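
The help page's own example is not reproduced in this thread, so what follows is only a sketch in its spirit, assuming a current R where sample.int() is available. The matrix and the condition m[,1] > 9 are made up so that exactly one row (row 10) matches.

```r
# Hypothetical helper in the spirit of the 'resample()' example in ?sample:
# always sample positions, then index, so a length-one x is never
# reinterpreted as 1:x.
resample <- function(x, ...) x[sample.int(length(x), ...)]

m <- matrix(1:20, nrow = 10, ncol = 2)

# Only one row satisfies this (made-up) condition: row 10
idx <- which(m[, 1] > 9)
gone <- resample(idx, 1)   # always 10, never a draw from 1:10
m2 <- m[-gone, ]           # the matrix without that row
```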


It often helps to reread help pages and rerun example()s, when things are 
not going your way!



HTH,

Chuck



give me any row from 1 to 15, which is not correct. I have to add a 
condition in case there is only one row matching the criteria.





Charles C. Berry                            (858) 534-2098
                                            Dept of Family/Preventive Medicine
E mailto:cbe...@tajo.ucsd.edu               UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901



Re: [R] Randomly remove condition-selected rows from a matrix

2008-12-31 Thread Stavros Macrakis
On Wed, Dec 31, 2008 at 12:44 PM, Guillaume Chapron
carnivorescie...@gmail.com wrote:
 m[-sample(which(m[,1]<8 & m[,2]>12),2),]
 Supposing I sample only one row among the ones matching my criteria. Then
 consider the case where there is just one row matching these criteria. Sure,
 there is no need to sample, but the instruction would still be executed.
 Then if this row index is 15, my instruction becomes which(15,1), and this
 can give me any row from 1 to 15, which is not correct. I have to add a
 condition in case there is only one row matching the criteria.

Yes, this is a (documented!) design flaw in 'sample' -- see the man page.

For some reason, the designers of R have chosen to document the flaw
and leave it up to individual users to work around it rather than fix
it definitively.  A related case is sample(c(),0), which gives an
error rather than giving an empty vector, though in general R deals
with empty vectors correctly (e.g. sum(c()) = 0).

To my mind, it is bizarre to have an important basic function which
works for some argument lengths but not others.  The convenience of
being able to write sample(5,2) for sample(1:5,2) hardly seems worth
inflicting inconsistency on all users -- but perhaps one of the
designers of R/S can enlighten us on the design rationale here.

   -s



Re: [R] Randomly remove condition-selected rows from a matrix

2008-12-30 Thread Sarah Goslee
Assuming your values aren't always in such neat order, you could use
something like:

valtoremove1 <- sample((1:nrow(m))[m[,1] < 8], 1)
valtoremove2 <- sample((1:nrow(m))[m[,2] > 12], 1)

Sarah

On Tue, Dec 30, 2008 at 9:59 AM, Guillaume Chapron
carnivorescie...@gmail.com wrote:
 Hello all,

 I create the following matrix:

 m <- matrix(1:20, nrow = 10, ncol = 2)

 which looks like:

       [,1] [,2]
  [1,]    1   11
  [2,]    2   12
  [3,]    3   13
  [4,]    4   14
  [5,]    5   15
  [6,]    6   16
  [7,]    7   17
  [8,]    8   18
  [9,]    9   19
 [10,]   10   20

 Then, I want to remove randomly 2 rows among the ones where m[,1]<8 and
 m[,2]>12

 I suppose the best way is to use the sample() function. I understand how to
 do it when I remove among any rows, but I have not been able to do it when I
 remove among specific rows only. What I could do is split the matrix into
 two matrices, one with the rows to be sampled and removed, one with the
 other rows. I would sample and remove, and then merge the two matrices
 again. But since this part of the code is going to be done many times, I
 would like to have it the most efficient possible without creating new
 objects. Any idea? Thanks!

 Cheers

 Guillaume



-- 
Sarah Goslee
http://www.functionaldiversity.org



Re: [R] Randomly remove condition-selected rows from a matrix

2008-12-30 Thread Daniel Malter

Hi,

The approach below uses a function. The nice thing about it is that you can
define the cutoff values dynamically (i.e. what is 8 and 12 in your
example). The functions extract a row index to remove. Be aware that there
is no warning if both return the same row index. You might have to adjust
for that.

x=1:10
y=11:20
z=cbind(x,y)

a=function(x,m){which(x==sample(x[x<m],1))}
b=function(y,n){which(y==sample(y[y>n],1))}

z[-c(a(x,8),b(y,12)),]
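
One way to adjust for the two functions returning the same row index is to exclude the first pick from the candidates for the second. This is only a sketch: pick_one is a hypothetical helper, and the cutoffs are assumed to mean x < 8 and y > 12.

```r
# Hypothetical helper: pick one element by position, which stays safe
# even when only one candidate is left.
pick_one <- function(idx) idx[sample.int(length(idx), 1)]

x <- 1:10
y <- 11:20
z <- cbind(x, y)

first  <- pick_one(which(x < 8))                      # a row with x < 8
second <- pick_one(setdiff(which(y > 12), first))     # a different row with y > 12

z[-c(first, second), ]                                # always 8 rows remain
```

If the second candidate set could become empty after removing the first pick, that case would still need its own check.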

Cheers,
Daniel






Re: [R] Randomly remove condition-selected rows from a matrix

2008-12-30 Thread Stavros Macrakis
I believe this does what you want:

m[-sample(which(m[,1]<8 & m[,2]>12),2),]

Analysis:

Get a boolean vector of rows fitting criteria:
m[,1]<8 & m[,2]>12

What are their indexes?
which(...)

Choose two among those indexes:
 sample(...,2)

Choose all except the selected rows from the original:
 m[- ... , ]
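
The same steps can be written out one at a time. This is a sketch that takes the condition to be m[,1] < 8 and m[,2] > 12; the intermediate variable names and the seed are made up for illustration.

```r
set.seed(42)  # arbitrary seed so the sketch is repeatable

m <- matrix(1:20, nrow = 10, ncol = 2)

matches <- m[, 1] < 8 & m[, 2] > 12     # logical vector, one entry per row
candidates <- which(matches)            # row indexes 3, 4, 5, 6, 7
chosen <- sample(candidates, 2)         # two of those indexes
result <- m[-chosen, ]                  # everything except the chosen rows

nrow(result)                            # 8
```

Here five rows match, so sample() behaves as intended. With a single matching row it would treat that index as an upper bound and draw from 1:index.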

  -s

On Tue, Dec 30, 2008 at 9:59 AM, Guillaume Chapron
carnivorescie...@gmail.com wrote:

 m <- matrix(1:20, nrow = 10, ncol = 2)
       [,1] [,2]
  [1,]    1   11
  [2,]    2   12
  [3,]    3   13
  [4,]    4   14
  [5,]    5   15
  [6,]    6   16
  [7,]    7   17
  [8,]    8   18
  [9,]    9   19
 [10,]   10   20

 Then, I want to remove randomly 2 rows among the ones where m[,1]<8 and
 m[,2]>12
