Re: [R] Randomly remove condition-selected rows from a matrix
Stavros Macrakis wrote:
> On Wed, Dec 31, 2008 at 12:44 PM, Guillaume Chapron
> carnivorescie...@gmail.com wrote:
>>> m[-sample(which(m[,1]<8 & m[,2]>12),2),]
>>
>> Supposing I sample only one row among the ones matching my criteria.
>> Then consider the case where there is just one row matching this
>> criteria. Sure, there is no need to sample, but the instruction would
>> still be executed. Then if this row index is 15, my instruction
>> becomes which(15,1), and this can give me any row from 1 to 15, which
>> is not correct. I have to make a condition in case there is only one
>> row matching the criteria.
>
> Yes, this is a (documented!) design flaw in 'sample' -- see the man
> page. For some reason, the designers of R have chosen to document the
> flaw and leave it up to individual users to work around it rather than
> fix it definitively.
>
> A related case is sample(c(),0), which gives an error rather than
> giving an empty vector, though in general R deals with empty vectors
> correctly (e.g. sum(c()) = 0).

interestingly, ?sample says:

    'sample' takes a sample of the specified size from the elements of
    'x' using either with or without replacement.

    x: Either a (numeric, complex, character or logical) vector of more
       than one element from which to choose, or a positive integer.

    If 'x' has length 1, is numeric (in the sense of 'is.numeric') and
    'x >= 1', sampling takes place from '1:x'. _Note_ that this
    convenience feature may lead to undesired behaviour when 'x' is of
    varying length 'sample(x)'. See the 'resample()' example below.

yet the following works, even though x has length 1 and is *not* numeric:

    x = "foolme"
    is.numeric(x)
    sample(x, 1)
    sample(x)

    x = NA
    is.numeric(NA)
    sample(x, 1)
    sample(x)

is this a bug in the code, or a bug in the documentation?

> To my mind, it is bizarre to have an important basic function which
> works for some argument lengths but not others. The convenience of
> being able to write sample(5,2) for sample(1:5,2) hardly seems worth
> inflicting inconsistency on all users -- but perhaps one of the
> designers of R/S can enlighten us on the design rationale here.

hopefully.

vQ

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
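A small check of the two cases raised in this message, offered as a sketch rather than a statement about R's internals. It assumes the first value is the character string "foolme"; the names `x`, `y` and `z` are only for illustration. Neither "foolme" nor a bare NA is numeric in the sense of is.numeric, so neither triggers the 1:x shortcut, and sampling a one-element vector can only return that element.

```r
# Neither value is numeric, so sample() treats each one as a
# one-element vector rather than as an upper bound for 1:x.
x <- "foolme"
is.numeric(x)        # FALSE: a character vector
sample(x, 1)         # the only element, "foolme"

y <- NA
is.numeric(y)        # FALSE: a bare NA is logical
sample(y)            # the only element, NA

# A length-one numeric value takes the other branch and is read as 1:x.
set.seed(1)          # arbitrary seed, only for repeatability
z <- sample(15, 1)   # some integer between 1 and 15, not necessarily 15
```

Only the numeric case is affected by the convenience feature, which is why the documented wording and the observed behaviour appear to disagree.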
Re: [R] Randomly remove condition-selected rows from a matrix
On 02/01/2009 10:07 AM, Wacek Kusnierczyk wrote:
> Stavros Macrakis wrote:
>> On Wed, Dec 31, 2008 at 12:44 PM, Guillaume Chapron
>> carnivorescie...@gmail.com wrote:
>>>> m[-sample(which(m[,1]<8 & m[,2]>12),2),]
>>>
>>> Supposing I sample only one row among the ones matching my criteria.
>>> Then consider the case where there is just one row matching this
>>> criteria. Sure, there is no need to sample, but the instruction would
>>> still be executed. Then if this row index is 15, my instruction
>>> becomes which(15,1), and this can give me any row from 1 to 15, which
>>> is not correct. I have to make a condition in case there is only one
>>> row matching the criteria.
>>
>> Yes, this is a (documented!) design flaw in 'sample' -- see the man
>> page. For some reason, the designers of R have chosen to document the
>> flaw and leave it up to individual users to work around it rather than
>> fix it definitively.
>>
>> A related case is sample(c(),0), which gives an error rather than
>> giving an empty vector, though in general R deals with empty vectors
>> correctly (e.g. sum(c()) = 0).
>
> interestingly, ?sample says:
>
>     'sample' takes a sample of the specified size from the elements of
>     'x' using either with or without replacement.
>
>     x: Either a (numeric, complex, character or logical) vector of more
>        than one element from which to choose, or a positive integer.
>
>     If 'x' has length 1, is numeric (in the sense of 'is.numeric') and
>     'x >= 1', sampling takes place from '1:x'. _Note_ that this
>     convenience feature may lead to undesired behaviour when 'x' is of
>     varying length 'sample(x)'. See the 'resample()' example below.
>
> yet the following works, even though x has length 1 and is *not* numeric:
>
>     x = "foolme"
>     is.numeric(x)
>     sample(x, 1)
>     sample(x)
>
>     x = NA
>     is.numeric(NA)
>     sample(x, 1)
>     sample(x)
>
> is this a bug in the code, or a bug in the documentation?
>
>> To my mind, it is bizarre to have an important basic function which
>> works for some argument lengths but not others. The convenience of
>> being able to write sample(5,2) for sample(1:5,2) hardly seems worth
>> inflicting inconsistency on all users -- but perhaps one of the
>> designers of R/S can enlighten us on the design rationale here.
>
> hopefully.

This is more of an R-devel sort of question. My guess is that this is in
the S blue book, but I don't have a copy here to check.

Duncan Murdoch
Re: [R] Randomly remove condition-selected rows from a matrix
xxx wrote:
> On Fri, Jan 2, 2009 at 10:07 AM, Wacek Kusnierczyk
> waclaw.marcin.kusnierc...@idi.ntnu.no wrote:
>> ...
>>     'sample' takes a sample of the specified size from the elements of
>>     'x' using either with or without replacement.
>>
>>     x: Either a (numeric, complex, character or logical) vector of more
>>        than one element from which to choose, or a positive integer.
>>
>>     If 'x' has length 1, is numeric (in the sense of 'is.numeric') and
>>     'x >= 1', sampling takes place from '1:x'. _Note_ that this
>>     convenience feature may lead to undesired behaviour when 'x' is of
>>     varying length 'sample(x)'. See the 'resample()' example below.
>> ...
>> yet the following works, even though x has length 1 and is *not*
>> numeric:...
>> is this a bug in the code, or a bug in the documentation?
>
> I would guess it's a bug in the documentation.

possibly. looking at the r code for sample, it's clear why sample("foo")
works:

    function (x, size, replace = FALSE, prob = NULL)
    {
        if (length(x) == 1 && is.numeric(x) && x >= 1) {
            if (missing(size))
                size <- x
            .Internal(sample(x, size, replace, prob))
        }
        else {
            if (missing(size))
                size <- length(x)
            x[.Internal(sample(length(x), size, replace, prob))]
        }
    }

what is also clear from the code is that the function has another,
supposedly buggy behaviour due to the smart behaviour of the : operator:

    sample(1.1)
    # 1, not 1.1

this is consistent with

    If 'x' has length 1, is numeric (in the sense of 'is.numeric') and
    'x >= 1', sampling takes place from '1:x'.

due to the downcast performed by the colon operator, but not with

    x: Either a (numeric, complex, character or logical) vector of more
       than one element from which to choose, or a positive integer.

both from ?sample. tfm is seemingly wrong wrt. the implementation, and i
find sample(1.1) returning 1 a design flaw.

(i guess the note

    _Note_ that this convenience feature may lead to undesired behaviour
    when 'x' is of varying length 'sample(x)'.

is supposed to explain away such cases.)

vQ
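The truncation described in this message can be checked directly. This is a minimal sketch; the names `r` and `keep` are illustrative, and the last line shows only one possible way to keep a fractional value intact (indexing by position), not something proposed in the thread.

```r
# 1:1.1 stops at 1 because the next step, 2, would exceed 1.1.
1:1.1                          # 1

# sample() reads a single numeric x >= 1 as "draw from 1:x",
# so the fractional part never shows up in the result.
r <- sample(1.1)               # 1, not 1.1

# Sampling a position and then indexing returns the value itself.
keep <- (1.1)[sample.int(1)]   # 1.1
```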
Re: [R] Randomly remove condition-selected rows from a matrix
There is another undocumented glitch in sample:

    sample(2^31-1,1) => OK
    sample(2^31  ,1) => Error

I suppose you could interpret "sampling takes place from '1:x'" to mean
that 1:x is actually generated, but that doesn't work as an explanation
either; on my 32-bit Windows box, 1:(2^29) gives an error, but
sample(2^29,1) works fine.

          -s

On Fri, Jan 2, 2009 at 2:18 PM, Wacek Kusnierczyk
waclaw.marcin.kusnierc...@idi.ntnu.no wrote:
> xxx wrote:
>> On Fri, Jan 2, 2009 at 10:07 AM, Wacek Kusnierczyk
>> waclaw.marcin.kusnierc...@idi.ntnu.no wrote:
>>> ...
>>>     'sample' takes a sample of the specified size from the elements of
>>>     'x' using either with or without replacement.
>>>
>>>     x: Either a (numeric, complex, character or logical) vector of more
>>>        than one element from which to choose, or a positive integer.
>>>
>>>     If 'x' has length 1, is numeric (in the sense of 'is.numeric') and
>>>     'x >= 1', sampling takes place from '1:x'. _Note_ that this
>>>     convenience feature may lead to undesired behaviour when 'x' is of
>>>     varying length 'sample(x)'. See the 'resample()' example below.
>>> ...
>>> yet the following works, even though x has length 1 and is *not*
>>> numeric:...
>>> is this a bug in the code, or a bug in the documentation?
>>
>> I would guess it's a bug in the documentation.
>
> possibly. looking at the r code for sample, it's clear why sample("foo")
> works:
>
>     function (x, size, replace = FALSE, prob = NULL)
>     {
>         if (length(x) == 1 && is.numeric(x) && x >= 1) {
>             if (missing(size))
>                 size <- x
>             .Internal(sample(x, size, replace, prob))
>         }
>         else {
>             if (missing(size))
>                 size <- length(x)
>             x[.Internal(sample(length(x), size, replace, prob))]
>         }
>     }
>
> what is also clear from the code is that the function has another,
> supposedly buggy behaviour due to the smart behaviour of the : operator:
>
>     sample(1.1)
>     # 1, not 1.1
>
> this is consistent with
>
>     If 'x' has length 1, is numeric (in the sense of 'is.numeric') and
>     'x >= 1', sampling takes place from '1:x'.
>
> due to the downcast performed by the colon operator, but not with
>
>     x: Either a (numeric, complex, character or logical) vector of more
>        than one element from which to choose, or a positive integer.
>
> both from ?sample. tfm is seemingly wrong wrt. the implementation, and i
> find sample(1.1) returning 1 a design flaw.
>
> (i guess the note
>
>     _Note_ that this convenience feature may lead to undesired behaviour
>     when 'x' is of varying length 'sample(x)'.
>
> is supposed to explain away such cases.)
>
> vQ
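One plausible reading of the 2^31 boundary, offered as a guess that the thread itself does not confirm: the limit coincides with the largest value R's integer type can hold, so the count passed to the internal sampler may simply not fit. The sketch below only inspects that limit; `too_big` is an illustrative name, and whether sample(2^31, 1) fails may differ between R versions.

```r
# The largest value R's integer type can hold is 2^31 - 1.
.Machine$integer.max               # 2147483647
.Machine$integer.max == 2^31 - 1   # TRUE

# 2^31 itself is only representable as a double; forcing it into an
# integer yields NA (with a warning, suppressed here).
too_big <- suppressWarnings(as.integer(2^31))
is.na(too_big)                     # TRUE
```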
Re: [R] Randomly remove condition-selected rows from a matrix
> I believe this does what you want:
>
>     m[-sample(which(m[,1]<8 & m[,2]>12),2),]
>
> Analysis:
>
> Get a boolean vector of rows fitting criteria:
>     m[,1]<8 & m[,2]>12
>
> What are their indexes?
>     which(...)
>
> Choose two among those indexes:
>     sample(...,2)

Thanks, but this does not seem to always work. Supposing I sample only
one row among the ones matching my criteria. Then consider the case
where there is just one row matching this criteria. Sure, there is no
need to sample, but the instruction would still be executed. Then if
this row index is 15, my instruction becomes which(15,1), and this can
give me any row from 1 to 15, which is not correct. I have to make a
condition in case there is only one row matching the criteria.
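The concern raised here can be reproduced in a few lines. This is a minimal sketch with hypothetical names (`matching`, `draws`); it assumes a single matching row at index 15 and uses sample(15, 1), which appears to be the call the message has in mind.

```r
set.seed(42)                  # arbitrary seed, only for repeatability

matching <- 15                # suppose which() returned a single row index

# sample() treats the lone number as an upper bound and draws from 1:15.
draws <- replicate(1000, sample(matching, 1))
range(draws)                  # stays within 1..15 but is not just 15
length(unique(draws))         # more than one distinct value
```

So with exactly one candidate row, the row that gets removed is usually not the one that matched the condition.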
Re: [R] Randomly remove condition-selected rows from a matrix
On Wed, 31 Dec 2008, Guillaume Chapron wrote:

>> I believe this does what you want:
>>
>>     m[-sample(which(m[,1]<8 & m[,2]>12),2),]
>>
>> Analysis:
>>
>> Get a boolean vector of rows fitting criteria:
>>     m[,1]<8 & m[,2]>12
>>
>> What are their indexes?
>>     which(...)
>>
>> Choose two among those indexes:
>>     sample(...,2)
>
> Thanks, but this does not seem to always work. Supposing I sample only
> one row among the ones matching my criteria. Then consider the case
> where there is just one row matching this criteria. Sure, there is no
> need to sample, but the instruction would still be executed. Then if
> this row index is 15, my instruction becomes which(15,1), and this can

I think you mean 'sample(15,1)', no?

From ?sample:

---
Details

If x has length 1, is numeric (in the sense of is.numeric) and x >= 1,
sampling takes place from 1:x. Note that this convenience feature may
lead to undesired behaviour when x is of varying length sample(x). See
the resample() example below.
---

So define and use 'resample'.

It often helps to reread help pages and rerun example()s, when things
are not going your way!

HTH,

Chuck

> give me any row from 1 to 15, which is not correct. I have to make a
> condition in case there is only one row matching the criteria.

Charles C. Berry                            (858) 534-2098
                                            Dept of Family/Preventive Medicine
E mailto:cbe...@tajo.ucsd.edu               UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
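A sketch of what "define and use 'resample'" could look like. The definition below is modelled on the resample() example in recent versions of ?sample; the help page from 2008 may have worded it differently, so treat it as an assumption. The matrix `m` is the one from the original question, and the condition is chosen here (hypothetically) so that exactly one row matches.

```r
# Always sample positions, then index, so a length-one x is never
# reinterpreted as 1:x.
resample <- function(x, ...) x[sample.int(length(x), ...)]

m <- matrix(1:20, nrow = 10, ncol = 2)
hits <- which(m[, 1] > 9 & m[, 2] > 12)      # only row 10 matches
m2 <- m[-resample(hits, 1), , drop = FALSE]  # removes row 10, never another
nrow(m2)                                     # 9
```

This sketch does not handle the case where no row matches; an empty index would need its own check before the negative subscript is applied.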
Re: [R] Randomly remove condition-selected rows from a matrix
On Wed, Dec 31, 2008 at 12:44 PM, Guillaume Chapron
carnivorescie...@gmail.com wrote:
>> m[-sample(which(m[,1]<8 & m[,2]>12),2),]
>
> Supposing I sample only one row among the ones matching my criteria.
> Then consider the case where there is just one row matching this
> criteria. Sure, there is no need to sample, but the instruction would
> still be executed. Then if this row index is 15, my instruction
> becomes which(15,1), and this can give me any row from 1 to 15, which
> is not correct. I have to make a condition in case there is only one
> row matching the criteria.

Yes, this is a (documented!) design flaw in 'sample' -- see the man
page. For some reason, the designers of R have chosen to document the
flaw and leave it up to individual users to work around it rather than
fix it definitively.

A related case is sample(c(),0), which gives an error rather than giving
an empty vector, though in general R deals with empty vectors correctly
(e.g. sum(c()) = 0).

To my mind, it is bizarre to have an important basic function which
works for some argument lengths but not others. The convenience of being
able to write sample(5,2) for sample(1:5,2) hardly seems worth
inflicting inconsistency on all users -- but perhaps one of the
designers of R/S can enlighten us on the design rationale here.

          -s
Re: [R] Randomly remove condition-selected rows from a matrix
Assuming your values aren't always in such neat order, you could use
something like:

    valtoremove1 <- sample((1:nrow(m))[m[,1] < 8], 1)
    valtoremove2 <- sample((1:nrow(m))[m[,2] > 12], 1)

Sarah

On Tue, Dec 30, 2008 at 9:59 AM, Guillaume Chapron
carnivorescie...@gmail.com wrote:
> Hello all,
>
> I create the following matrix:
>
>     m <- matrix(1:20, nrow = 10, ncol = 2)
>
> which looks like:
>
>           [,1] [,2]
>      [1,]    1   11
>      [2,]    2   12
>      [3,]    3   13
>      [4,]    4   14
>      [5,]    5   15
>      [6,]    6   16
>      [7,]    7   17
>      [8,]    8   18
>      [9,]    9   19
>     [10,]   10   20
>
> Then, I want to remove randomly 2 rows among the ones where m[,1]<8 and
> m[,2]>12
>
> I suppose the best way is to use the sample() function. I understand
> how to do it when I remove among any rows, but I have not been able to
> do it when I remove among specific rows only. What I could do is split
> the matrix into two matrices, one with the rows to be sampled and
> removed, one with the other rows. I would sample and remove, and then
> merge the two matrices again. But since this part of the code is going
> to be done many times, I would like to have it the most efficient
> possible without creating new objects.
>
> Any idea?
>
> Thanks!
>
> Cheers
>
> Guillaume

--
Sarah Goslee
http://www.functionaldiversity.org
Re: [R] Randomly remove condition-selected rows from a matrix
Hi,

The approach below uses a function. The nice thing about it is that you
can define the cutoff values dynamically (i.e. what is 8 and 12 in your
example). The functions extract a row index to remove. Be aware that
there is no warning if both return the same row index. You might have to
adjust for that.

    x=1:10
    y=11:20
    z=cbind(x,y)
    a=function(x,m){which(x==sample(x[x<m],1))}
    b=function(y,n){which(y==sample(y[y>n],1))}
    z[-c(a(x,8),b(y,12)),]

Cheers,
Daniel

Guillaume Chapron-3 wrote:
> Hello all,
>
> I create the following matrix:
>
>     m <- matrix(1:20, nrow = 10, ncol = 2)
>
> which looks like:
>
>           [,1] [,2]
>      [1,]    1   11
>      [2,]    2   12
>      [3,]    3   13
>      [4,]    4   14
>      [5,]    5   15
>      [6,]    6   16
>      [7,]    7   17
>      [8,]    8   18
>      [9,]    9   19
>     [10,]   10   20
>
> Then, I want to remove randomly 2 rows among the ones where m[,1]<8 and
> m[,2]>12
>
> I suppose the best way is to use the sample() function. I understand
> how to do it when I remove among any rows, but I have not been able to
> do it when I remove among specific rows only. What I could do is split
> the matrix into two matrices, one with the rows to be sampled and
> removed, one with the other rows. I would sample and remove, and then
> merge the two matrices again. But since this part of the code is going
> to be done many times, I would like to have it the most efficient
> possible without creating new objects.
>
> Any idea?
>
> Thanks!
>
> Cheers
>
> Guillaume
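One way the "same row index" caveat in this message might be handled, as a sketch only. It assumes the cutoffs mean x < 8 and y > 12; the helper `pick` and the names `first`, `second` and `result` are hypothetical and not part of the posted code. The second draw is taken from the candidates that remain after the first, so the two indices cannot coincide.

```r
z <- cbind(x = 1:10, y = 11:20)

# Draw by position so a single candidate is never expanded to 1:x.
pick <- function(idx) idx[sample.int(length(idx), 1)]

first  <- pick(which(z[, "x"] < 8))
# Exclude the first pick from the second pool so the two never coincide.
second <- pick(setdiff(which(z[, "y"] > 12), first))

result <- z[-c(first, second), ]
nrow(result)    # 8
```

If either pool were empty, `pick` would fail, so real code would need a check for that case.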
Re: [R] Randomly remove condition-selected rows from a matrix
I believe this does what you want:

    m[-sample(which(m[,1]<8 & m[,2]>12),2),]

Analysis:

Get a boolean vector of rows fitting criteria:
    m[,1]<8 & m[,2]>12

What are their indexes?
    which(...)

Choose two among those indexes:
    sample(...,2)

Choose all except the selected rows from the original:
    m[- ... , ]

          -s

On Tue, Dec 30, 2008 at 9:59 AM, Guillaume Chapron
carnivorescie...@gmail.com wrote:
>     m <- matrix(1:20, nrow = 10, ncol = 2)
>
>           [,1] [,2]
>      [1,]    1   11
>      [2,]    2   12
>      [3,]    3   13
>      [4,]    4   14
>      [5,]    5   15
>      [6,]    6   16
>      [7,]    7   17
>      [8,]    8   18
>      [9,]    9   19
>     [10,]   10   20
>
> Then, I want to remove randomly 2 rows among the ones where m[,1]<8 and
> m[,2]>12
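The four steps of the analysis can be run one at a time on the example matrix. This is a sketch that assumes the condition reads m[,1] < 8 and m[,2] > 12; the intermediate names (`matches`, `candidates`, `drop_rows`, `m_reduced`) are illustrative and do not appear in the thread.

```r
m <- matrix(1:20, nrow = 10, ncol = 2)

matches    <- m[, 1] < 8 & m[, 2] > 12   # logical vector, TRUE for rows 3 to 7
candidates <- which(matches)             # 3 4 5 6 7

set.seed(1)                              # arbitrary seed, for repeatability
drop_rows <- sample(candidates, 2)       # two distinct rows among 3..7
m_reduced <- m[-drop_rows, ]             # the remaining 8 rows
```

With five candidates the one-liner behaves as intended; it is only when `candidates` has length one that sample() switches to drawing from 1:x.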