[R] Subsets of Boolean string model
Dear R-help list,

I have a problem regarding text manipulation in R that goes beyond my basic knowledge. It might be a bigger problem, but any help would be greatly appreciated and acknowledged.

As input, I have a character string representing some Boolean function, such as aB+Bc+D, where + means OR, AND is omitted between two factors represented by single letters, and a lower-case letter x simply means NOT X. I would like to form all sub-models, excluding those that are not redundancy-free. For example, D, a+D and B+c+D would be OK, but aB+B+D, B+Bc and B+B+D would not, because B is a (strict) superset of both aB and Bc as well as a (trivial) superset of B. For D+aB+Bc, there would thus be 24 permissible and unique sub-models (including the empty set):

a, B, c, D, aB, Bc, a+B, a+c, a+D, B+c, B+D, c+D, a+Bc, aB+c, aB+D, aB+Bc, Bc+D, a+B+D, a+c+D, a+Bc+D, aB+c+D, B+c+D, aB+Bc+D, and the empty model.

How could I generate a character vector of all permissible and unique sub-models from any Boolean function of the form given above?

Best wishes,
Alrik

Alrik Thiem
Post-Doctoral Researcher
Department of Philosophy, University of Geneva
http://www.alrik-thiem.net
http://www.compasss.org

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
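No reply to this question appears in the archive. One possible reading of the request can be sketched in base R as follows. The function name submodels and its helpers are hypothetical, and the sketch assumes that every literal is a single letter, that each disjunct contributes at most one (possibly reduced) term, and that a sub-model is redundant whenever the literals of one chosen term are all contained in another chosen term.

```r
# Enumerate redundancy-free sub-models of a Boolean expression such as "aB+Bc+D".
# Assumption: single-letter literals, "+" as OR, juxtaposition as AND.
submodels <- function(expr) {
  # Split into disjuncts, then split each disjunct into its literals.
  terms <- strsplit(strsplit(expr, "+", fixed = TRUE)[[1]], "")

  # All subsets of one disjunct's literals, including the empty one ("").
  sub_terms <- function(lits) {
    masks <- expand.grid(rep(list(c(FALSE, TRUE)), length(lits)))
    apply(masks, 1, function(m) paste(lits[m], collapse = ""))
  }

  # Every way of picking one (possibly empty) reduced term per disjunct.
  combos <- expand.grid(lapply(terms, sub_terms), stringsAsFactors = FALSE)

  # TRUE if no chosen term's literals are all contained in another chosen term.
  redundancy_free <- function(chosen) {
    sets <- strsplit(chosen, "")
    for (i in seq_along(sets)) for (j in seq_along(sets)) {
      if (i != j && all(sets[[i]] %in% sets[[j]])) return(FALSE)
    }
    TRUE
  }

  out <- character(0)
  for (r in seq_len(nrow(combos))) {
    chosen <- unlist(combos[r, ], use.names = FALSE)
    chosen <- chosen[nzchar(chosen)]          # drop disjuncts that were left out
    if (redundancy_free(chosen)) {
      # Sorting gives a canonical form so that duplicates can be removed.
      out <- c(out, paste(sort(chosen), collapse = "+"))
    }
  }
  unique(out)
}

res <- submodels("aB+Bc+D")
length(res)  # 24, matching the count given in the post (the empty model is "")
```

The order of terms within each returned string depends on the session's collation rules, so a model may appear as B+a rather than a+B. The enumeration grows exponentially with the number of literals, which is fine for expressions of this size but not for long ones.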
Re: [R] Subsets of a function
Another option is the plyr package.

library(plyr)
result <- dlply(size, ~ Year + Season, function(.sub) {
  with(.sub, smooth.spline(Size, Prop, spar = 0.25))
})

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
thierry.onkel...@inbo.be
www.inbo.be

-----Original message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On behalf of jim holtman
Sent: Tuesday, 20 May 2014 2:44
To: Marlin Keith Cox
CC: r-help@r-project.org
Subject: Re: [R] Subsets of a function

> It would have been nice if you at least supplied a subset of the data, but here is a try at it: [...]
[R] Subsets of a function
Hi all,

This is a recurring theme in my programming and I need some help with it. When I use a built-in function and need to apply it to a subset of my data frame, I always end up using the subset function first, but this seems very clunky.

For example, if I have years 2003:2013 with seasons a and b within each year, and I want to create a smooth.spline, I end up creating a subset for each year and season, and then have a smooth.spline call for each year and season. Can I do this more efficiently?

The subsets are below:

size.2003 <- subset(size, Year == 2003 & Season == "a")
size.2004 <- subset(size, Year == 2004 & Season == "a")
size.2005 <- subset(size, Year == 2005 & Season == "a")
size.2006 <- subset(size, Year == 2006 & Season == "a")
size.2007 <- subset(size, Year == 2007 & Season == "a")
size.2008 <- subset(size, Year == 2008 & Season == "a")
size.2009 <- subset(size, Year == 2009 & Season == "a")
size.2010 <- subset(size, Year == 2010 & Season == "a")
size.2011 <- subset(size, Year == 2011 & Season == "a")
size.2012 <- subset(size, Year == 2012 & Season == "a")
size.2013 <- subset(size, Year == 2013 & Season == "a")
size.2003b <- subset(size, Year == 2003 & Season == "b")
size.2004b <- subset(size, Year == 2004 & Season == "b")
size.2005b <- subset(size, Year == 2005 & Season == "b")
size.2006b <- subset(size, Year == 2006 & Season == "b")
size.2007b <- subset(size, Year == 2007 & Season == "b")
size.2008b <- subset(size, Year == 2008 & Season == "b")
size.2009b <- subset(size, Year == 2009 & Season == "b")
size.2010b <- subset(size, Year == 2010 & Season == "b")
size.2011b <- subset(size, Year == 2011 & Season == "b")
size.2012b <- subset(size, Year == 2012 & Season == "b")
size.2013b <- subset(size, Year == 2013 & Season == "b")

The smooth.spline is below:

`2003` <- with(size.2003, smooth.spline(Size, Prop, spar = 0.25))
`2004` <- with(size.2004, smooth.spline(Size, Prop, spar = 0.25))
`2005` <- with(size.2005, smooth.spline(Size, Prop, spar = 0.25))
etc. etc.

M. Keith Cox, Ph.D.
Principal, MKConsulting
Re: [R] Subsets of a function
Have you read An Introduction to R and the sections on indexing (?"["), where this is discussed? Have you read about the apply-type functions there, such as ?tapply? If not, don't you think you should? If so, read again.

Cheers,
Bert

Bert Gunter
Genentech Nonclinical Biostatistics

On Mon, May 19, 2014 at 5:34 PM, Marlin Keith Cox marlink...@gmail.com wrote:
> Hi all, this is a reoccurring theme in my programming and I need some help with it. [...]
Re: [R] Subsets of a function
It would have been nice if you had at least supplied a subset of the data, but here is a try at it:

myList <- split(size, list(size$Year, size$Season))
result <- lapply(myList, function(.sub) {
  smooth.spline(.sub$Size, .sub$Prop, spar = 0.25)
})

Jim Holtman
Data Munger Guru

On Mon, May 19, 2014 at 8:34 PM, Marlin Keith Cox marlink...@gmail.com wrote:
> Hi all, this is a reoccurring theme in my programming and I need some help with it. [...]
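Since no data were posted, the split/lapply idea can only be tried on made-up data. The following is a minimal sketch under that assumption: the simulated size data frame (three years, two seasons, a logistic-looking Prop curve with noise) is purely hypothetical and only mimics the column names used in the thread.

```r
set.seed(1)

# Hypothetical stand-in for the poster's 'size' data frame.
size <- expand.grid(Size = seq(10, 100, by = 5),
                    Season = c("a", "b"),
                    Year = 2003:2005,
                    stringsAsFactors = FALSE)
size$Prop <- plogis((size$Size - 50) / 15) + rnorm(nrow(size), sd = 0.05)

# One data frame per Year/Season combination, named like "2003.a".
groups <- split(size, list(size$Year, size$Season))

# One smoothing spline per group, kept together in a named list.
fits <- lapply(groups, function(.sub) {
  with(.sub, smooth.spline(Size, Prop, spar = 0.25))
})

names(fits)
# Individual fits are retrieved by name instead of by separate objects.
predict(fits[["2003.a"]], x = 50)$y
```

Keeping the fits in one named list replaces the 22 separate objects of the original post, and further steps (predictions, plots) can again be written as a single lapply or sapply over fits.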
Re: [R] subsets
require(data.table)
DT = as.data.table(df)

# 1. Patients with ah and ihd
DT[, .SD["ah" %in% diagnosis & "ihd" %in% diagnosis], by = id]
     id diagnosis
[1,]  2        ah
[2,]  2       ihd
[3,]  2        im
[4,]  4        ah
[5,]  4       ihd
[6,]  4    angina

# 2. Patients with ah but no ihd
DT[, .SD["ah" %in% diagnosis & !"ihd" %in% diagnosis], by = id]
     id diagnosis
[1,]  1        ah
[2,]  3        ah
[3,]  3    stroke

# 3. Patients with ihd but no ah?
DT[, .SD[!"ah" %in% diagnosis & "ihd" %in% diagnosis], by = id]
     id diagnosis
[1,]  5       ihd
[R] subsets
Dear R people,

Could you please help? Basically, there are two variables in my data set. Each patient ('id') may have one or more diseases ('diagnosis'). It looks like this:

id  diagnosis
1   ah
2   ah
2   ihd
2   im
3   ah
3   stroke
4   ah
4   ihd
4   angina
5   ihd
..

Q: How can I make three data sets:
1. Patients with ah and ihd
2. Patients with ah but no ihd
3. Patients with ihd but no ah?

If you have any ideas, could you just point me to what I should look for? Is it subset, or aggregate, or loops, or something else? I am a bit lost. (F1 F1 F1 !!! :)

Thank you
Re: [R] subsets
Hi!

I think you should read the intro to R, as well as ?"[" and ?subset. It should help you to understand.

Let's say your data is in a data.frame called df:

# 1. ah and ihd
df_ah_ihd <- df[df$diagnosis == "ah" | df$diagnosis == "ihd", ]
## the | is the boolean OR (you want one OR the other). Note the last comma
# 2. ah
df_ah <- df[df$diagnosis == "ah", ]
# 3. ihd
df_ihd <- df[df$diagnosis == "ihd", ]

You could do the same using subset() if you feel more comfortable with that function.

HTH,
Ivan

On 1/20/2011 09:53, Den wrote:
> Dear R people Could you please help. Basically, there are two variables in my data set. [...]

--
Ivan CALANDRA
PhD Student
University of Hamburg
Biozentrum Grindel und Zoologisches Museum, Abt. Säugetiere
ivan.calan...@uni-hamburg.de
http://www.for771.uni-bonn.de
http://webapp5.rrz.uni-hamburg.de/mammals/eng/1525_8_1.php
Re: [R] subsets
I don't think Ivan's solution meets the OP's needs. I think you could do it using %in% and the appropriate logical operations, e.g.

aDF <- data.frame(id = c(1, 2, 2, 2, 3, 3, 4, 4, 4, 5),
                  diagnosis = c("ah", "ah", "ihd", "im", "ah", "stroke", "ah", "ihd", "angina", "ihd"))

aDF[with(aDF, (id %in% id[diagnosis == "ah"]) & (id %in% id[diagnosis == "ihd"])), ]
aDF[with(aDF, (id %in% id[diagnosis == "ah"]) & !(id %in% id[diagnosis == "ihd"])), ]
aDF[with(aDF, !(id %in% id[diagnosis == "ah"]) & (id %in% id[diagnosis == "ihd"])), ]

That starts to feel a bit fiddly to me. You might want to look at the sqldf package.

HTH
Keith J

Ivan Calandra ivan.calan...@uni-hamburg.de wrote in message news:4d37fbea.5070...@uni-hamburg.de...
> Hi! I think you should read the intro to R, as well as ?[ and ?subset. [...]
Re: [R] subsets
Try this:

lapply(list(c('ah', 'ihd'), 'ah', 'ihd'), function(x) subset(aDF, diagnosis == x))

On Thu, Jan 20, 2011 at 6:53 AM, Den d.kazakiew...@gmail.com wrote:
> Dear R people Could you please help. Basically, there are two variables in my data set. [...]

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
Re: [R] subsets
Hello Den,

your problem is not as simple as it may seem, so Ivan's suggestion is only a partial answer. I see that each patient can have more than one diagnosis, and I take it that you want to isolate patients based on particular conditions. Simply looking for ah or ihd, as Ivan suggests, will yield patients who have either of those but not necessarily patients who have both. Instead, one must apply the condition to the whole set of diagnoses associated with each patient.

I think this is best done with the aggregate function. It splits the data according to some factor (in our case the patient id) and performs a routine on each subset (in our case a condition test). Depending on which of the three conditions you want, use one of:

# 1. ah and ihd
ids <- aggregate(diagnosis ~ id, df, function(x) "ah" %in% x & "ihd" %in% x)
# 2. ah but no ihd
ids <- aggregate(diagnosis ~ id, df, function(x) "ah" %in% x & !"ihd" %in% x)
# 3. ihd but no ah
ids <- aggregate(diagnosis ~ id, df, function(x) !"ah" %in% x & "ihd" %in% x)

Now ids will contain a data frame like:

id  diagnosis
1   TRUE
2   FALSE
3   FALSE
...

which shows which patients have the set of diagnoses you asked for. You can then apply these patients to the original data with something like:

subset(df, id %in% subset(ids, diagnosis == TRUE)$id)

This extracts only those patients from the 'ids' data frame for which the condition holds, and then extracts the associated diagnosis sets from the original 'df' data frame.

Hope it helps,
Taras

On Jan 20, 2011, at 9:53, Den wrote:
> Dear R people Could you please help. Basically, there are two variables in my data set. [...]
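The two steps Taras describes (flag each patient, then pull that patient's rows) can be put together on the sample data from the original post. This is only a sketch: the names flags, rows_for, both, ah_only and ihd_only are hypothetical, and the data frame is retyped from the post rather than taken from the poster's real data.

```r
# Sample data as posted in the thread.
df <- data.frame(
  id = c(1, 2, 2, 2, 3, 3, 4, 4, 4, 5),
  diagnosis = c("ah", "ah", "ihd", "im", "ah", "stroke",
                "ah", "ihd", "angina", "ihd"),
  stringsAsFactors = FALSE
)

# Step 1: one row per patient, with a logical flag per diagnosis of interest.
flags <- aggregate(diagnosis ~ id, df, function(x) "ah" %in% x)
names(flags)[2] <- "has_ah"
flags$has_ihd <- aggregate(diagnosis ~ id, df, function(x) "ihd" %in% x)$diagnosis

# Step 2: all rows of df that belong to the patients selected by 'keep'.
rows_for <- function(keep) df[df$id %in% flags$id[keep], ]

both     <- rows_for(flags$has_ah & flags$has_ihd)    # patients 2 and 4
ah_only  <- rows_for(flags$has_ah & !flags$has_ihd)   # patients 1 and 3
ihd_only <- rows_for(!flags$has_ah & flags$has_ihd)   # patient 5
```

Computing both flags once and combining them afterwards avoids rerunning aggregate for each of the three conditions.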
Re: [R] subsets
I did try it. It gave me

[[1]]
   id diagnosis
1   1        ah
5   3        ah
7   4        ah
8   4       ihd
10  5       ihd

[[2]]
  id diagnosis
1  1        ah
2  2        ah
5  3        ah
7  4        ah

[[3]]
   id diagnosis
3   2       ihd
8   4       ihd
10  5       ihd

which isn't what the OP asked for:

Q: How to make three data sets:

1. Patients with ah and ihd
  id diagnosis
2  2        ah
3  2       ihd
4  2        im
7  4        ah
8  4       ihd
9  4    angina

2. Patients with ah but no ihd
  id diagnosis
1  1        ah
5  3        ah
6  3    stroke

3. Patients with ihd but no ah?
   id diagnosis
10  5       ihd

Regards,
KJ

Henrique Dallazuanna www...@gmail.com wrote in message news:aanlktikqnw_hntdyxdrj+ytyqf6tghlmh0qsleouf...@mail.gmail.com...
> Try this: lapply(list(c('ah', 'ihd'), 'ah', 'ihd'), function(x)subset(aDF, diagnosis == x)) [...]
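The first element of that output is explained by R's recycling rule for ==. A small sketch on the diagnosis column alone (the variable names by_position and by_membership are made up for this illustration):

```r
diagnosis <- c("ah", "ah", "ihd", "im", "ah", "stroke",
               "ah", "ihd", "angina", "ihd")

# '==' recycles c("ah", "ihd") along the vector: odd positions are compared
# with "ah" and even positions with "ihd".
by_position <- diagnosis == c("ah", "ihd")
which(by_position)    # 1 5 7 8 10, the rows shown under [[1]]

# '%in%' tests each element for membership, regardless of its position.
by_membership <- diagnosis %in% c("ah", "ihd")
which(by_membership)  # 1 2 3 5 7 8 10
```

Even the %in% version only answers a row-wise question (is this row ah or ihd?). The per-patient conditions in the original question still need the diagnoses to be grouped by id first.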
Re: [R] subsets
Hi Taras,

Indeed, I overlooked the problem. Anyway, I'm not sure I would have been able to give a complete answer like you did!

Ivan

On 1/20/2011 11:05, Taras Zakharko wrote:
> Hello Den, your problem is not as it may seem so Ivan's suggestion is only a partial answer. [...]

--
Ivan CALANDRA
PhD Student
University of Hamburg
Biozentrum Grindel und Zoologisches Museum, Abt. Säugetiere
ivan.calan...@uni-hamburg.de
http://www.for771.uni-bonn.de
http://webapp5.rrz.uni-hamburg.de/mammals/eng/1525_8_1.php
Re: [R] subsets
On Thu, Jan 20, 2011 at 10:53:01AM +0200, Den wrote:
> Each patient ('id') may have one or more diseases ('diagnosis'). [...]
> Q: How to make three data sets:
> 1. Patients with ah and ihd
> 2. Patients with ah but no ihd
> 3. Patients with ihd but no ah?

This may be understood as a two-step procedure:
1. Split the ids into disjoint groups according to the above criteria.
2. Split the data cases into the groups from step 1.

If this is what you want, then the function table() may be used to collect information on each id.

df <- structure(list(id = c(1L, 2L, 2L, 2L, 3L, 3L, 4L, 4L, 4L, 5L),
    diagnosis = structure(c(1L, 1L, 3L, 4L, 1L, 5L, 1L, 3L, 2L, 3L),
    .Label = c("ah", "angina", "ihd", "im", "stroke"), class = "factor")),
    .Names = c("id", "diagnosis"), class = "data.frame", row.names = c(NA, -10L))

tab <- table(df$id, df$diagnosis)

Then, for example, the data cases for

  2. Patients with ah but no ihd

may be obtained by

sel <- tab[, "ah"] != 0 & tab[, "ihd"] == 0
ah.noihd <- dimnames(tab)[[1]][sel]
# [1] "1" "3"
df[df$id %in% ah.noihd, ]
#   id diagnosis
# 1  1        ah
# 5  3        ah
# 6  3    stroke

I hope this helps.

Petr Savicky.
Re: [R] subsets
On 2011-01-20 02:05, Taras Zakharko wrote:
> Hello Den, your problem is not as it may seem so Ivan's suggestion is only a partial answer. [...]

Here's a tidy version using the plyr package:

require(plyr)
df1 <- ddply(df, .(id), summarize,
  has.both     = ("ah" %in% diagnosis) & ("ihd" %in% diagnosis),
  has.only.ah  = ("ah" %in% diagnosis) & !("ihd" %in% diagnosis),
  has.only.ihd = !("ah" %in% diagnosis) & ("ihd" %in% diagnosis)
)

Further processing on the columns of df1 is straightforward.

Peter Ehlers
[R] subsets, %in%
Hi,

I have a question about %in% and subsetting data frames. Say I need to keep ID 1, 2, 4, 5, 10 from the data frame dat. I can do:

    dat <- data.frame(ID = 1:10, var = 1:10)
    someID <- c(1,2,4,5,10)
    subset(dat, dat$ID %in% someID)

Is there a quick way to do the opposite, i.e. to do a subset that contains all ID but someID? Something like %not in%, which would *remove* lines with ID in someID?

I am asking because I need this in a more complex example where there are multiple lines with the same ID (data in long format) and I need to remove selected ID.

thanks,
MP
Re: [R] subsets, %in%
Well, %in% returns a logical vector... So

    subset(dat, ! ID %in% someID)

Also, from ?subset:

    Note that ‘subset’ will be evaluated in the data frame, so columns can be referred to (by name) as variables in the expression

Thus, you don't need 'dat$ID', but just 'ID' in the subset argument.

-Erik

mp.sylves...@gmail.com wrote:

Hi,

I have a question about %in% and subsetting data frames. Say I need to keep ID 1, 2, 4, 5, 10 from the data frame dat. I can do:

    dat <- data.frame(ID = 1:10, var = 1:10)
    someID <- c(1,2,4,5,10)
    subset(dat, dat$ID %in% someID)

Is there a quick way to do the opposite, i.e. to do a subset that contains all ID but someID? Something like %not in%, which would *remove* lines with ID in someID?

I am asking because I need this in a more complex example where there are multiple lines with the same ID (data in long format) and I need to remove selected ID.

thanks,
MP
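A short sketch of why the unparenthesised form in the reply above works: in R, `%in%` binds more tightly than unary `!`, so `!ID %in% someID` is read as `!(ID %in% someID)`. The long-format data frame `long` is made up for illustration and is not the poster's data.

```r
# Made-up long-format data: several rows per ID.
long <- data.frame(ID = rep(1:4, each = 2), var = 1:8)
someID <- c(2, 4)

# Both spellings give the same logical vector, because %in% is evaluated before !.
keep_a <- !long$ID %in% someID
keep_b <- !(long$ID %in% someID)

# Drop every row whose ID is listed in someID; ID is looked up inside long.
kept <- subset(long, !ID %in% someID)
```

Adding the parentheses costs nothing and spares the reader from having to recall the precedence rules.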
Re: [R] subsets, %in%
Any logical value can be negated using !. Does:

    subset(dat, !(dat$ID %in% someID))

provide what you need?

--
Jonathan P. Daily
Technician - USGS Leetown Science Center
11649 Leetown Road
Kearneysville WV, 25430
(304) 724-4480

"Is the room still a room when it's empty? Does the room, the thing itself have purpose? Or do we, what's the word... imbue it." - Jubal Early, Firefly

From: mp.sylves...@gmail.com
To: r-help@r-project.org
Date: 11/05/2010 02:21 PM
Subject: [R] subsets, %in%
Sent by: r-help-boun...@r-project.org

Hi,

I have a question about %in% and subsetting data frames. Say I need to keep ID 1, 2, 4, 5, 10 from the data frame dat. I can do:

    dat <- data.frame(ID = 1:10, var = 1:10)
    someID <- c(1,2,4,5,10)
    subset(dat, dat$ID %in% someID)

Is there a quick way to do the opposite, i.e. to do a subset that contains all ID but someID? Something like %not in%, which would *remove* lines with ID in someID?

I am asking because I need this in a more complex example where there are multiple lines with the same ID (data in long format) and I need to remove selected ID.

thanks,
MP
Re: [R] subsets, %in%
Hi MP,

Try

    subset(dat, ! dat$ID %in% someID)  # ! symbol

HTH,
Jorge

On Fri, Nov 5, 2010 at 10:13 AM, wrote:

Hi,

I have a question about %in% and subsetting data frames. Say I need to keep ID 1, 2, 4, 5, 10 from the data frame dat. I can do:

    dat <- data.frame(ID = 1:10, var = 1:10)
    someID <- c(1,2,4,5,10)
    subset(dat, dat$ID %in% someID)

Is there a quick way to do the opposite, i.e. to do a subset that contains all ID but someID? Something like %not in%, which would *remove* lines with ID in someID?

I am asking because I need this in a more complex example where there are multiple lines with the same ID (data in long format) and I need to remove selected ID.

thanks,
MP
Re: [R] subsets, %in%
Say I need to keep ID 1, 2, 4, 5, 10 from the data frame dat. I can do:

    dat <- data.frame(ID = 1:10, var = 1:10)
    someID <- c(1,2,4,5,10)
    subset(dat, dat$ID %in% someID)

Is there a quick way to do the opposite ...

Two operators spring to mind: ! and %nin%

    subset(dat, !(dat$ID %in% someID))
    subset(dat, dat$ID %nin% someID)

--
Curt Seeliger, Data Ranger
Raytheon Information Services - Contractor to ORD
seeliger.c...@epa.gov
541/754-4638
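Note that `%nin%` is not part of base R; the Hmisc package provides an operator of that name. If loading a package is not wanted, an equivalent can be defined in one line. The definition below is an illustrative stand-in written for this example, not Hmisc's source.

```r
# User-defined "not in" operator (illustrative stand-in for Hmisc's %nin%).
`%nin%` <- function(x, table) !(x %in% table)

dat <- data.frame(ID = 1:10, var = 1:10)
someID <- c(1, 2, 4, 5, 10)

# Keep only the rows whose ID does not appear in someID.
removed <- subset(dat, ID %nin% someID)
```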
[R] subsets with a small cardinality for variable selection
Hello,

I am working on a variable selection problem and would like to have some suggestions. Thank you.

In my data, the number of observations/samples is much smaller than the number of variables. I am not interested in generating only a few models; instead I will need a couple of hundred models. For each model, I only need a fixed number of variables, in other words, a specific cardinality.

I've tried leaps (subselect package) and regsubsets (leaps package). However, to use leaps in the subselect package I have to reduce the number of variables, which is not what I want, and regsubsets in the leaps package does not take a specific cardinality; it only accepts a maximal subset size.

Thank you.

--
View this message in context: http://r.789695.n4.nabble.com/subsets-with-a-small-cardinality-for-variable-selection-tp2965552p2965552.html
Sent from the R help mailing list archive at Nabble.com.
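One way to obtain many models of a fixed size without an exhaustive search is to draw random variable subsets of the required cardinality and rank them by fit. This is only a sketch under stated assumptions: the data are simulated, the fit is ordinary least squares via `lm.fit`, the criterion is the residual sum of squares, and the names `k`, `n_models` and `fit_subset` are invented for illustration. It does not find the best subsets; it merely yields up to a couple of hundred candidates with exactly `k` variables each.

```r
set.seed(1)
n <- 30; p <- 200          # fewer observations than variables
k <- 5                     # required cardinality of every model
n_models <- 200            # how many candidate models to draw

# Simulated predictors and response (assumption: stands in for the real data).
X <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("v", 1:p)))
y <- X[, 1] - 2 * X[, 2] + rnorm(n)

# Fit y on the chosen columns plus an intercept; return the residual sum of squares.
fit_subset <- function(cols) {
  fit <- lm.fit(cbind(1, X[, cols, drop = FALSE]), y)
  sum(fit$residuals^2)
}

# Draw random k-subsets; indices are sorted so repeated draws can be dropped.
subsets <- replicate(n_models, sort(sample(p, k)), simplify = FALSE)
subsets <- unique(subsets)

# One row per candidate model, best fit first.
models <- data.frame(
  vars = vapply(subsets, function(s) paste(colnames(X)[s], collapse = "+"),
                character(1)),
  rss  = vapply(subsets, fit_subset, numeric(1))
)
models <- models[order(models$rss), ]
```

Because `k` is smaller than `n`, each individual fit is well defined even though the full problem has more variables than observations.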
[R] subsets problem
Help with this much appreciated.

I have a large dataframe that I would like to subset under the constraint

    Test1 <- subset(df, date == uniques[[1]])

where uniques is a list of dates that must be matched to create Test1.

I would like to perform an operation on Test1 that results in a single column of data. So far so good.

How do I loop through all values in the uniques list (say there are 50), perform an operation on Test1 ... Test50, and then bolt all the lists together in a single list please?

Regards
Glenn
Re: [R] subsets problem
You can try

    lapply(lapply(uniques, function(x) subset(df, date == x)), myfun)

or possibly more accurate (subset may be finicky due to scoping):

    lapply(lapply(uniques, function(x) df[df$date == x, ]), myfun)

or use ?split

    lapply(split(df, df$date), myfun)

HTH,
--sundar

On Sun, Feb 8, 2009 at 5:00 PM, glenn g1enn.robe...@btinternet.com wrote:

Help with this much appreciated.

I have a large dataframe that I would like to subset under the constraint

    Test1 <- subset(df, date == uniques[[1]])

where uniques is a list of dates that must be matched to create Test1.

I would like to perform an operation on Test1 that results in a single column of data. So far so good.

How do I loop through all values in the uniques list (say there are 50), perform an operation on Test1 ... Test50, and then bolt all the lists together in a single list please?

Regards
Glenn
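The `split` approach from the reply above, followed by bolting the per-date results together, might look like this. The data frame `trades`, its columns, and the function `myfun` are placeholders invented for the example; `myfun` stands for whatever operation returns a single column of values per date.

```r
# Placeholder data: three dates, a few rows each.
trades <- data.frame(
  date  = as.Date(c("2009-02-02", "2009-02-02", "2009-02-03",
                    "2009-02-03", "2009-02-04")),
  value = c(10, 20, 30, 40, 50)
)

# Placeholder operation returning one vector per subset.
myfun <- function(d) d$value / sum(d$value)

# One list element per unique date, named by date.
per_date <- lapply(split(trades, trades$date), myfun)

# Bolt the pieces together into a single vector ...
all_values <- unlist(per_date, use.names = FALSE)

# ... or into one data frame that keeps track of the date.
combined <- do.call(rbind, lapply(names(per_date), function(d) {
  data.frame(date = as.Date(d), result = per_date[[d]])
}))
```

With `split` there is no need to build the list of unique dates by hand or to create fifty separate Test objects; the list names carry the dates.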
Re: [R] subsets problem
See if this illustration using the %in% operator within subset() is helpful:

    df1 <- data.frame(x = 1:10, y = sample(c("a","b","c"), 10, replace = TRUE))
    uniques <- list("a","b")
    Test1 <- subset(df1, y %in% uniques)
    Test1
      x y
    1 1 b
    4 4 a
    5 5 b
    6 6 b
    7 7 a
    9 9 a

Next question of course is whether you were using the word "list" in an R-specific fashion? Fortunately, I think %in% will also work with vector input.

You might not want to make 50 "Testn"s. That would be very much against the spirit of R. Provide a simpler example involving 3 or 4 lists and someone might step up and solve it. Of course, I may have given you a one-step solution if you were thinking that uniques[[1]] was a single number.

It might be best to name your dataframe something other than "df", which is also a valid function name (the density of the F distribution).

--
David Winsemius

On Feb 8, 2009, at 8:00 PM, glenn wrote:

Help with this much appreciated.

I have a large dataframe that I would like to subset under the constraint

    Test1 <- subset(df, date == uniques[[1]])

where uniques is a list of dates that must be matched to create Test1.

I would like to perform an operation on Test1 that results in a single column of data. So far so good.

How do I loop through all values in the uniques list (say there are 50), perform an operation on Test1 ... Test50, and then bolt all the lists together in a single list please?

Regards
Glenn