[R] Subsets of Boolean string model

2015-01-05 Thread Alrik Thiem
Dear R-help list,

I have a problem regarding text manipulation in R, where my basic knowledge
doesn't suffice anymore. It might be a bigger problem, but any help would be
greatly appreciated and acknowledged.

As input, I have a character string representing some Boolean function, such
as aB+Bc+D, for instance, where + means OR, AND has been omitted between
two factors represented by single letters, and lower-case x simply means
NOT X.

Now I would like to form all sub-models without including models that are
not redundancy-free. For example, D, a+D and B+c+D would be ok, but
aB+B+D, B+Bc and B+B+D would not because B is a (strict) superset of
both aB and Bc as well as a (trivial) superset of B.

With regard to D+aB+Bc, there would thus be 24 permissible and unique
sub-models (including the empty set):

a, B, c, D, aB, Bc, a+B, a+c, a+D, B+c, B+D, c+D,
a+Bc, aB+c, aB+D, aB+Bc, Bc+D, a+B+D, a+c+D, a+Bc+D,
aB+c+D, B+c+D, aB+Bc+D, and the empty model.

How could I generate a character vector of all permissible and unique
sub-models from any Boolean function of the form given above?

Best wishes,
Alrik
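
One possible way to enumerate such sub-models in R is sketched below. It is not a reply from the list: the function name sub_models is made up for illustration, and it assumes that every literal is a single letter and that terms are separated by +. For each term it takes every non-empty subset of its literals (or drops the term), combines these choices across terms, and discards any model in which the literals of one term are all contained in another term.

```r
# Hypothetical helper, not from the thread: enumerate redundancy-free sub-models.
sub_models <- function(expr) {
  terms <- strsplit(expr, "+", fixed = TRUE)[[1]]
  # Per term: "" (term dropped) plus every non-empty subset of its literals
  choices <- lapply(terms, function(term) {
    lits <- strsplit(term, "")[[1]]
    c("", unlist(lapply(seq_along(lits), function(k)
      combn(lits, k, FUN = paste, collapse = ""))))
  })
  grid <- as.matrix(expand.grid(choices, stringsAsFactors = FALSE))
  out <- character(0)
  for (r in seq_len(nrow(grid))) {
    row <- grid[r, ]
    parts <- unname(row[row != ""])
    sets <- strsplit(parts, "")
    ok <- TRUE
    # Reject the model if one term's literals are a subset of another term's
    for (i in seq_along(sets)) for (j in seq_along(sets)) {
      if (i != j && all(sets[[i]] %in% sets[[j]])) ok <- FALSE
    }
    if (ok) out <- c(out, paste(parts, collapse = "+"))
  }
  unique(out)
}

models <- sub_models("aB+Bc+D")
length(models)
```

For aB+Bc+D this yields the 24 sub-models listed above, with the empty model represented as "". The number of candidates grows exponentially with the number of literals, so the sketch is only meant for small expressions.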



Alrik Thiem
Post-Doctoral Researcher

Department of Philosophy
University of Geneva
Rue de Candolle 2
CH-1211 Geneva

+41 76 527 80 83

http://www.alrik-thiem.net
http://www.compasss.org

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Subsets of a function

2014-05-20 Thread ONKELINX, Thierry
Another option is the plyr package.

library(plyr)
# one smooth.spline fit per Year/Season combination, returned as a list
result <- dlply(size, ~ Year + Season, function(.sub) {
  with(.sub, smooth.spline(Size, Prop, spar = 0.25))
})


ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium
+ 32 2 525 02 51
+ 32 54 43 61 85
thierry.onkel...@inbo.be
www.inbo.be

To call in the statistician after the experiment is done may be no more than 
asking him to perform a post-mortem examination: he may be able to say what the 
experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not ensure 
that a reasonable answer can be extracted from a given body of data.
~ John Tukey



[R] Subsets of a function

2014-05-19 Thread Marlin Keith Cox
Hi all, this is a recurring theme in my programming and I need some help
with it.  When I use a built-in function and need to apply it to a subset of
my data frame, I always end up using the subset function first, but this
seems very clunky.  For example, if I have years 2003:2013 with seasons a
and b within each year, and I want to create a smooth.spline, I end up
creating a subset for each year and season, and then have a smooth spline
function for each year and season.  Can I do this more efficiently?


The subsets are below:


size.2003 <- subset(size, Year == 2003 & Season == "a")
size.2004 <- subset(size, Year == 2004 & Season == "a")
size.2005 <- subset(size, Year == 2005 & Season == "a")
size.2006 <- subset(size, Year == 2006 & Season == "a")
size.2007 <- subset(size, Year == 2007 & Season == "a")
size.2008 <- subset(size, Year == 2008 & Season == "a")
size.2009 <- subset(size, Year == 2009 & Season == "a")
size.2010 <- subset(size, Year == 2010 & Season == "a")
size.2011 <- subset(size, Year == 2011 & Season == "a")
size.2012 <- subset(size, Year == 2012 & Season == "a")
size.2013 <- subset(size, Year == 2013 & Season == "a")
size.2003b <- subset(size, Year == 2003 & Season == "b")
size.2004b <- subset(size, Year == 2004 & Season == "b")
size.2005b <- subset(size, Year == 2005 & Season == "b")
size.2006b <- subset(size, Year == 2006 & Season == "b")
size.2007b <- subset(size, Year == 2007 & Season == "b")
size.2008b <- subset(size, Year == 2008 & Season == "b")
size.2009b <- subset(size, Year == 2009 & Season == "b")
size.2010b <- subset(size, Year == 2010 & Season == "b")
size.2011b <- subset(size, Year == 2011 & Season == "b")
size.2012b <- subset(size, Year == 2012 & Season == "b")
size.2013b <- subset(size, Year == 2013 & Season == "b")

The smooth.spline is below

# a name that starts with a digit has to be quoted with backticks
`2003` <- with(size.2003, smooth.spline(Size, Prop, spar = 0.25))
`2004` <- with(size.2004, smooth.spline(Size, Prop, spar = 0.25))
`2005` <- with(size.2005, smooth.spline(Size, Prop, spar = 0.25))
etc. etc.

M. Keith Cox, Ph.D.
Principal
MKConsulting
17105 Glacier Hwy
Juneau, AK 99801
U.S. 907.957.4606



Re: [R] Subsets of a function

2014-05-19 Thread Bert Gunter
Have you read "An Introduction to R" and its sections on indexing (?"["),
where this is discussed? Have you read about the apply-type functions
there, such as ?tapply? If not, don't you think you should? If so, read
again.
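
As a rough illustration of the tapply route mentioned here, the sketch below fits one spline per Year/Season cell. The data frame is invented (the thread never shows the real size data); only the column names Year, Season, Size and Prop are taken from the original question.

```r
# Made-up stand-in for the poster's 'size' data frame
size <- data.frame(
  Year   = rep(c(2003, 2004), each = 20),
  Season = rep(rep(c("a", "b"), each = 10), times = 2),
  Size   = rep(1:10, times = 4)
)
size$Prop <- sqrt(size$Size) / 4

# tapply over row indices; each cell of the result holds one fitted spline
fits <- tapply(seq_len(nrow(size)), list(size$Year, size$Season),
               function(idx) smooth.spline(size$Size[idx], size$Prop[idx],
                                           spar = 0.25))

dim(fits)               # one cell per Year x Season combination
fits[["2003", "a"]]     # the fit for 2003, season a
```

Because the fitted objects are not single atomic values, tapply returns an array of mode list, so individual fits are extracted with [[ and the two dimension names.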

Cheers,
Bert

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom.
H. Gilbert Welch






Re: [R] Subsets of a function

2014-05-19 Thread jim holtman
It would have been nice if you at least supplied a subset of the data, but
here is a try at it:

myList <- split(size, list(size$Year, size$Season))
result <- lapply(myList, function(.sub) {
  # fit Prop against Size within each Year/Season group
  smooth.spline(.sub$Size, .sub$Prop, spar = 0.25)
})
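
Since no data were supplied, here is a self-contained sketch of the same split/lapply pattern on simulated data. The values are invented purely so the code runs; only the column names come from the original question.

```r
set.seed(1)
# Simulated stand-in for 'size': 20 sizes per season and year
size <- expand.grid(Size = 1:20, Season = c("a", "b"), Year = 2003:2005,
                    stringsAsFactors = FALSE)
size$Prop <- plogis((size$Size - 10) / 3) + rnorm(nrow(size), sd = 0.05)

# One data frame per Year/Season combination, then one fit per data frame
groups <- split(size, list(size$Year, size$Season))
fits <- lapply(groups, function(.sub) {
  with(.sub, smooth.spline(Size, Prop, spar = 0.25))
})

names(fits)                               # e.g. "2003.a", "2004.a", ...
predict(fits[["2003.a"]], x = 10.5)$y     # fitted value for one group
```

The list names combine the two grouping variables, so a single fit can be picked out by name instead of keeping 22 separately named objects.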




Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.




Re: [R] subsets

2011-01-23 Thread Matthew Dowle


require(data.table)
DT = as.data.table(df)

# 1. Patients with ah and ihd
DT[, .SD["ah" %in% diagnosis & "ihd" %in% diagnosis], by = id]

     id diagnosis
[1,]  2        ah
[2,]  2       ihd
[3,]  2        im
[4,]  4        ah
[5,]  4       ihd
[6,]  4    angina

# 2. Patients with ah but no ihd
DT[, .SD["ah" %in% diagnosis & !"ihd" %in% diagnosis], by = id]

     id diagnosis
[1,]  1        ah
[2,]  3        ah
[3,]  3    stroke


# 3. Patients with ihd but no ah?
DT[, .SD[!"ah" %in% diagnosis & "ihd" %in% diagnosis], by = id]

     id diagnosis
[1,]  5       ihd
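
For readers without data.table, the same per-patient test can be sketched in base R with ave(), which spreads a group-level result back over the rows of each group. This is an alternative technique, not the data.table method above; the data frame is rebuilt from the example in the original question.

```r
df <- data.frame(
  id = c(1, 2, 2, 2, 3, 3, 4, 4, 4, 5),
  diagnosis = c("ah", "ah", "ihd", "im", "ah", "stroke",
                "ah", "ihd", "angina", "ihd")
)

# TRUE on every row of a patient who has the diagnosis at least once
has_ah  <- ave(df$diagnosis == "ah",  df$id, FUN = any)
has_ihd <- ave(df$diagnosis == "ihd", df$id, FUN = any)

both     <- df[has_ah & has_ihd, ]    # 1. ah and ihd
ah_only  <- df[has_ah & !has_ihd, ]   # 2. ah but no ihd
ihd_only <- df[!has_ah & has_ihd, ]   # 3. ihd but no ah
```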
 


[R] subsets

2011-01-20 Thread Den
Dear R people
Could you please help.

Basically, there are two variables in my data set. Each patient ('id')
may have one or more diseases ('diagnosis'). It looks like 

id  diagnosis
1   ah
2   ah
2   ihd
2   im
3   ah
3   stroke
4   ah
4   ihd
4   angina
5   ihd
..
Q: How can I make three data sets:
1. Patients with ah and ihd
2. Patients with ah but no ihd
3. Patients with ihd but no ah?

If you have any ideas, could you just point me to what I should look for? Is it
subset, or aggregate, or loops, or something else? I am a bit lost. (F1
F1 F1 !!!:)
Thank you



Re: [R] subsets

2011-01-20 Thread Ivan Calandra

Hi!

I think you should read the intro to R, as well as ?"[" and ?subset. It
should help you to understand.


Let's say your data is in a data.frame called df:
# 1. ah and ihd
## the | is the boolean OR (you want one OR the other). Note the last comma
df_ah_ihd <- df[df$diagnosis == "ah" | df$diagnosis == "ihd", ]


#2. ah
df_ah <- df[df$diagnosis == "ah", ]

#3. ihd
df_ihd <- df[df$diagnosis == "ihd", ]

You could do the same using subset() if you feel better with this function.

HTH,
Ivan




--
Ivan CALANDRA
PhD Student
University of Hamburg
Biozentrum Grindel und Zoologisches Museum
Abt. Säugetiere
Martin-Luther-King-Platz 3
D-20146 Hamburg, GERMANY
+49(0)40 42838 6231
ivan.calan...@uni-hamburg.de

**
http://www.for771.uni-bonn.de
http://webapp5.rrz.uni-hamburg.de/mammals/eng/1525_8_1.php



Re: [R] subsets

2011-01-20 Thread Keith Jewell
I don't think Ivan's solution meets the OP's needs.

I think you could do it using %in% and the appropriate logical operations,
e.g.

aDF <- data.frame(id = c(1, 2, 2, 2, 3, 3, 4, 4, 4, 5),
                  diagnosis = c("ah", "ah", "ihd", "im", "ah", "stroke",
                                "ah", "ihd", "angina", "ihd"))
aDF[with(aDF, (id %in% id[diagnosis == "ah"]) & (id %in% id[diagnosis == "ihd"])), ]
aDF[with(aDF, (id %in% id[diagnosis == "ah"]) & !(id %in% id[diagnosis == "ihd"])), ]
aDF[with(aDF, !(id %in% id[diagnosis == "ah"]) & (id %in% id[diagnosis == "ihd"])), ]

That starts to feel a bit fiddly for me. You might want to look at package
sqldf.
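
If the three nearly identical lines feel fiddly, the pattern can be wrapped in a small helper. The sketch below is only an illustration: the function patients_with and its arguments has and lacks are invented names, and it assumes the columns are called id and diagnosis as in aDF.

```r
aDF <- data.frame(id = c(1, 2, 2, 2, 3, 3, 4, 4, 4, 5),
                  diagnosis = c("ah", "ah", "ihd", "im", "ah", "stroke",
                                "ah", "ihd", "angina", "ihd"))

# Hypothetical helper: keep all rows of patients who have every diagnosis
# in 'has' and none of the diagnoses in 'lacks'.
patients_with <- function(data, has = character(0), lacks = character(0)) {
  keep <- rep(TRUE, nrow(data))
  for (code in has) {
    keep <- keep & data$id %in% data$id[data$diagnosis == code]
  }
  for (code in lacks) {
    keep <- keep & !(data$id %in% data$id[data$diagnosis == code])
  }
  data[keep, ]
}

patients_with(aDF, has = c("ah", "ihd"))         # 1. ah and ihd
patients_with(aDF, has = "ah", lacks = "ihd")    # 2. ah but no ihd
patients_with(aDF, has = "ihd", lacks = "ah")    # 3. ihd but no ah
```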

HTH

Keith J


Re: [R] subsets

2011-01-20 Thread Henrique Dallazuanna
Try this:

lapply(list(c('ah', 'ihd'), 'ah', 'ihd'),
       function(x) subset(aDF, diagnosis == x))






-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O



Re: [R] subsets

2011-01-20 Thread Taras Zakharko
Hello Den,

your problem is not as simple as it may seem, so Ivan's suggestion is only a
partial answer. I see that each patient can have more than one diagnosis, and I
take it that you want to isolate patients based on particular conditions.
Thus, simply looking for "ah" or "ihd" as Ivan suggests will yield patients
who have either of those, but not necessarily patients who have both.

Instead, what one must do is apply the condition to the whole set of diagnoses
associated with each patient. I think that it's done best with the aggregate
function. This function splits the data according to some factor (in our case
it will be the patient id) and performs a routine on each subset (in our case
it will be a condition test). Use one of the following, depending on the
condition you want:


ids <- aggregate(diagnosis ~ id, df, function(x) "ah" %in% x & "ihd" %in% x)
ids <- aggregate(diagnosis ~ id, df, function(x) "ah" %in% x & !"ihd" %in% x)
ids <- aggregate(diagnosis ~ id, df, function(x) !"ah" %in% x & "ihd" %in% x)

Now, ids will contain a data frame like:

id  diagnosis
1   TRUE
2   FALSE
3   FALSE
...

which shows which patients have the set of diagnoses you asked for. You can
then apply these patients to the original data by something like:

subset(df, id %in% subset(ids, diagnosis == TRUE)$id)

This will extract only those patients from the 'ids' data frame for which the
condition holds and then extract the associated diagnosis sets from the
original 'df' data frame.
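
The two steps can be put together in a short, runnable sketch. The data frame is rebuilt from the example in the original question, and select_patients is an invented wrapper name used only for illustration.

```r
df <- data.frame(
  id = c(1, 2, 2, 2, 3, 3, 4, 4, 4, 5),
  diagnosis = c("ah", "ah", "ihd", "im", "ah", "stroke",
                "ah", "ihd", "angina", "ihd")
)

# Hypothetical wrapper: 'test' receives all diagnoses of one patient and
# must return a single TRUE or FALSE.
select_patients <- function(data, test) {
  ids <- aggregate(diagnosis ~ id, data, test)   # one logical per patient
  data[data$id %in% ids$id[ids$diagnosis], ]     # rows of matching patients
}

both     <- select_patients(df, function(x) "ah" %in% x & "ihd" %in% x)
ah_only  <- select_patients(df, function(x) "ah" %in% x & !"ihd" %in% x)
ihd_only <- select_patients(df, function(x) !"ah" %in% x & "ihd" %in% x)
```

Passing the condition as a function keeps the aggregate step in one place, so each of the three data sets differs only in the test supplied.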

Hope it helps,

Taras


Re: [R] subsets

2011-01-20 Thread Keith Jewell
I did try it. It gave me
[[1]]
   id diagnosis
1   1        ah
5   3        ah
7   4        ah
8   4       ihd
10  5       ihd

[[2]]
  id diagnosis
1  1        ah
2  2        ah
5  3        ah
7  4        ah

[[3]]
   id diagnosis
3   2       ihd
8   4       ihd
10  5       ihd

Which isn't what the OP asked for

 Q: How to make three data sets:
1. Patients with ah and ihd
  id diagnosis
2  2        ah
3  2       ihd
4  2        im
7  4        ah
8  4       ihd
9  4    angina

2. Patients with ah but no ihd
  id diagnosis
1  1        ah
5  3        ah
6  3    stroke

3. Patients with  ihd but no ah?
   id diagnosis
10  5       ihd

Regards,

KJ
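
A short sketch of why the first element of that result looks odd: comparing a column with a length-2 vector using == recycles the vector along the rows, so odd rows are tested against 'ah' and even rows against 'ihd'. The diagnosis vector below is copied from the example data.

```r
diagnosis <- c("ah", "ah", "ihd", "im", "ah", "stroke",
               "ah", "ihd", "angina", "ihd")

# == recycles c("ah", "ihd") element by element along the rows
which(diagnosis == c("ah", "ihd"))     # 1 5 7 8 10

# %in% tests set membership row by row instead
which(diagnosis %in% c("ah", "ihd"))   # 1 2 3 5 7 8 10
```

Neither comparison looks at a patient's whole set of diagnoses, which is what the three requested data sets need.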


Re: [R] subsets

2011-01-20 Thread Ivan Calandra

Hi Taras,

Indeed, I've overlooked the problem. Anyway, I'm not sure I would have 
been able to give a complete answer like you did!


Ivan



Re: [R] subsets

2011-01-20 Thread Petr Savicky
On Thu, Jan 20, 2011 at 10:53:01AM +0200, Den wrote:
 Dear R people
 Could you please help.
 
 Basically, there are two variables in my data set. Each patient ('id')
 may have one or more diseases ('diagnosis'). It looks like 
 
 id diagnosis
 1 ah
 2 ah
 2 ihd
 2 im
 3 ah
 3 stroke
 4 ah
 4 ihd
 4 angina
 5 ihd
 ..
 Q: How to make three data sets:
   1. Patients with ah and ihd
   2. Patients with ah but no ihd
   3. Patients with  ihd but no ah?

This may be understood as a two-step procedure:
1. Split the ids into disjoint groups according to the above criteria.
2. Split the data cases into the groups from step 1.

If this is what you want, then the function table() may be used to
collect information on each id.

  df <- structure(list(id = c(1L, 2L, 2L, 2L, 3L, 3L, 4L, 4L, 4L, 5L),
  diagnosis = structure(c(1L, 1L, 3L, 4L, 1L, 5L, 1L, 3L, 2L, 3L),
  .Label = c("ah", "angina", "ihd", "im", "stroke"), class = "factor")),
  .Names = c("id", "diagnosis"), class = "data.frame", row.names = c(NA, -10L))

  tab <- table(df$id, df$diagnosis)

Then, for example, the data cases for "2. Patients with ah but no ihd"
may be obtained by

  sel <- tab[, "ah"] != 0 & tab[, "ihd"] == 0
  ah.noihd <- dimnames(tab)[[1]][sel] # [1] "1" "3"
  df[df$id %in% ah.noihd, ]
  #   id diagnosis
  # 1  1        ah
  # 5  3        ah
  # 6  3    stroke
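
Building on the same table, all three groups can be produced in one pass. This is only a sketch: the group labels and the extra column group are invented for illustration, and patients with neither ah nor ihd would get NA and be dropped by split().

```r
df <- data.frame(
  id = c(1, 2, 2, 2, 3, 3, 4, 4, 4, 5),
  diagnosis = c("ah", "ah", "ihd", "im", "ah", "stroke",
                "ah", "ihd", "angina", "ihd")
)

tab <- table(df$id, df$diagnosis)
has_ah  <- tab[, "ah"]  > 0      # named by id
has_ihd <- tab[, "ihd"] > 0

# Step 1: assign every id to one of the disjoint groups
group <- ifelse(has_ah & has_ihd, "ah.and.ihd",
         ifelse(has_ah, "ah.only",
         ifelse(has_ihd, "ihd.only", NA)))

# Step 2: carry the group back to the data cases and split on it
df$group <- unname(group[as.character(df$id)])
sets <- split(df, df$group)

names(sets)
sets$ah.only
```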

I hope, this helps.

Petr Savicky.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] subsets

2011-01-20 Thread Peter Ehlers

On 2011-01-20 02:05, Taras Zakharko wrote:

Hello Den,

your problem is not what it may seem, so Ivan's suggestion is only a partial
answer. I see that each patient can have more than one diagnosis, and I take
it that you want to isolate patients based on particular conditions.
Thus, simply looking for "ah" or "ihd" as Ivan suggests will yield patients
who have either of those, but not necessarily patients who have both.

Instead, what one must do is apply the condition to the whole set of diagnoses
associated with each patient. I think that it's best done with the aggregate
function. This function splits the data according to some factor (in our case,
the patient id) and performs a routine on each subset (in our case, a
condition test):


# use whichever of the three conditions you need
ids <- aggregate(diagnosis ~ id, df, function(x) "ah" %in% x & "ihd" %in% x)
ids <- aggregate(diagnosis ~ id, df, function(x) "ah" %in% x & !"ihd" %in% x)
ids <- aggregate(diagnosis ~ id, df, function(x) !"ah" %in% x & "ihd" %in% x)

Now, ids will contain a data frame like:

id  diagnosis
1   TRUE
2   FALSE
3   FALSE
...

which shows which patients have the set of diagnoses you asked for. You can
then use these patients to subset the original data with something like:

subset(df, id %in% subset(ids, diagnosis == TRUE)$id)

This picks from the 'ids' data frame only the patients for which the
condition holds and then extracts their diagnosis sets from the original
'df' data frame.

Hope it helps,

Taras


Here's a tidy version using the plyr package:

require(plyr)
df1 <- ddply(df, .(id), summarize,
 has.both = ("ah" %in% diagnosis) & ("ihd" %in% diagnosis),
 has.only.ah = ("ah" %in% diagnosis) & !("ihd" %in% diagnosis),
 has.only.ihd = !("ah" %in% diagnosis) & ("ihd" %in% diagnosis)
)

Further processing on the columns of df1 is straightforward.
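
As a sketch of what that further processing might look like: the per-patient flags are rebuilt here in base R, purely so that the example runs without plyr, and are then used to pull rows from the original data. The sample data and the object names by.id and only.ah are assumptions for illustration.

```r
# Example data as posted in the question
df <- data.frame(
  id = c(1, 2, 2, 2, 3, 3, 4, 4, 4, 5),
  diagnosis = c("ah", "ah", "ihd", "im", "ah", "stroke",
                "ah", "ihd", "angina", "ihd"),
  stringsAsFactors = FALSE
)

# One flag per patient; a base R stand-in for the ddply() result df1
by.id <- split(df$diagnosis, df$id)
df1 <- data.frame(
  id           = as.numeric(names(by.id)),
  has.both     = sapply(by.id, function(d) "ah" %in% d && "ihd" %in% d),
  has.only.ah  = sapply(by.id, function(d) "ah" %in% d && !("ihd" %in% d)),
  has.only.ihd = sapply(by.id, function(d) !("ah" %in% d) && "ihd" %in% d)
)

# Further processing: all rows of the patients with ah but no ihd
only.ah <- df[df$id %in% df1$id[df1$has.only.ah], ]
```

The other two data sets follow by swapping in has.both or has.only.ihd.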

Peter Ehlers


On Jan 20, 2011, at 9:53 , Den wrote:


Dear R people
Could you please help.

Basically, there are two variables in my data set. Each patient ('id')
may have one or more diseases ('diagnosis'). It looks like

id  diagnosis
1   ah
2   ah
2   ihd
2   im
3   ah
3   stroke
4   ah
4   ihd
4   angina
5   ihd
..
Q: How to make three data sets:
1. Patients with ah and ihd
2. Patients with ah but no ihd
3. Patients with  ihd but no ah?

If you have any ideas, could you just guide me on what I should look for? Is
it subset or aggregate, or loops, or something else??? I am a bit lost. (F1
F1 F1 !!!:)
Thank you



[R] subsets, %in%

2010-11-05 Thread MP . Sylvestre
Hi,

I have a question about %in% and subsetting data frames.

Say I need to keep ID 1, 2, 4, 5, 10 from the data frame dat. I can do:

dat <- data.frame(ID = 1:10, var = 1:10)
someID <- c(1,2,4,5,10)
subset(dat, dat$ID %in% someID)

Is there a quick way to do the opposite, i.e. to do a subset that contains
all IDs but someID? Something like %not in%, which would *remove* lines with
ID in someID?

I am asking because I need this in a more complex example where there are  
multiple lines with the same ID (data in long format) and I need to remove  
selected ID.

thanks,

MP



Re: [R] subsets, %in%

2010-11-05 Thread Erik Iverson

Well, %in% returns a logical vector...

So

subset(dat, ! ID %in% someID)

Also, from ?subset:

Note
 that ‘subset’ will be evaluated in the data frame, so columns can
 be referred to (by name) as variables in the expression

Thus, you don't need 'dat$ID', but just 'ID' in the subset argument.
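
A small sketch of the negated test on long-format data, where each ID occurs on several rows. The data frame here is made up for illustration, and kept and kept2 are hypothetical names.

```r
# Hypothetical long-format data: two rows per ID
dat <- data.frame(ID = rep(1:4, each = 2), var = 1:8)
someID <- c(2, 4)

# Keep every row whose ID is NOT in someID; columns are referred to by name
kept <- subset(dat, !(ID %in% someID))

# The same selection with plain logical indexing
kept2 <- dat[!(dat$ID %in% someID), ]
```

Because %in% returns one logical value per row, all rows belonging to a removed ID are dropped together.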

-Erik



Re: [R] subsets, %in%

2010-11-05 Thread Jonathan P Daily
Any logical value can be negated using !
Does:
subset(dat, !(dat$ID %in% someID))

provide what you need?
--
Jonathan P. Daily
Technician - USGS Leetown Science Center
11649 Leetown Road
Kearneysville WV, 25430
(304) 724-4480
Is the room still a room when its empty? Does the room,
 the thing itself have purpose? Or do we, what's the word... imbue it.
 - Jubal Early, Firefly





Re: [R] subsets, %in%

2010-11-05 Thread Jorge Ivan Velez
Hi MP,

Try

subset(dat, ! dat$ID %in% someID) # ! symbol

HTH,
Jorge




Re: [R] subsets, %in%

2010-11-05 Thread Seeliger . Curt
 Say I need to keep ID 1,2,4,5, 10 from the data frame dat. I can do:
   dat <- data.frame(ID = 1:10, var = 1:10)
   someID <- c(1,2,4,5,10)
   subset(dat, dat$ID %in% someID)
 Is there a quick way to do the opposite ...
 

Two operators spring to mind: ! and %nin% (the latter is provided by the
Hmisc package, not by base R):
subset(dat, !(dat$ID %in% someID))
subset(dat, dat$ID %nin% someID)
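
If loading a package just for this operator is not wanted, a similar one can be defined in base R. The following is a minimal sketch; the name %notin% is made up for this example and is not part of base R.

```r
# Hypothetical operator name, built by negating base R's %in%
`%notin%` <- Negate(`%in%`)

dat <- data.frame(ID = 1:10, var = 1:10)
someID <- c(1, 2, 4, 5, 10)

# Rows whose ID is not among someID
rest <- subset(dat, ID %notin% someID)
```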


-- 
Curt Seeliger, Data Ranger
Raytheon Information Services - Contractor to ORD
seeliger.c...@epa.gov
541/754-4638





[R] subsets with a small cardinality for variable selection

2010-10-06 Thread CZ

Hello,

I am working on a variable selection problem and would like to have some
suggestions.  Thank you. 

In my data, the number of observations/samples is much smaller than the number
of variables. I am not interested in generating only a few models;
instead I need a couple of hundred models. For each model, I need only
a fixed number of variables, in other words, a specific cardinality.

I've tried leaps (subselect package) and regsubsets (leaps package). However,
to use leaps in the subselect package I have to reduce the number of
variables, which is not what I want, and regsubsets in the leaps package
doesn't take a specific cardinality; it accepts only a maximal subset size.

Thank you.




[R] subsets problem

2009-02-08 Thread glenn
Help with this much appreciated.

I have a large data frame that I would like to subset, where the constraint is

Test1 <- subset(df, date == uniques[[1]])

and uniques is a list of dates that must be matched to create Test1.

I would like to perform an operation on Test1 that results in a single
column of data. So far so good.

How do I loop through all values in the uniques list (say there are 50),
perform an operation on Test1 ... Test50, and then bolt all the lists together
in a single list, please?

Regards

Glenn



Re: [R] subsets problem

2009-02-08 Thread Sundar Dorai-Raj
you can try

lapply(lapply(uniques, function(x) subset(df, date == x)), myfun)

or, possibly more robust (subset() can be finicky because of scoping):

lapply(lapply(uniques, function(x) df[df$date == x, ]), myfun)

or use ?split

lapply(split(df, df$date), myfun)
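
A small worked sketch of the split() route, including the final step of bolting the per-date results together. The data and the body of myfun are placeholders invented for this example; the real operation would go in its place.

```r
# Hypothetical data: three dates with one or two rows each
dat <- data.frame(
  date  = as.Date(c("2009-01-01", "2009-01-01", "2009-01-02",
                    "2009-01-02", "2009-01-03")),
  value = c(1, 2, 3, 4, 5)
)

# Placeholder for the real per-date operation; here it returns one number
myfun <- function(d) sum(d$value)

pieces   <- split(dat, dat$date)   # one data frame per unique date
results  <- lapply(pieces, myfun)  # a list with one result per date
combined <- unlist(results)        # a single named vector
```

If myfun returned a data frame or a column rather than a single number, do.call(rbind, results) would be one way to combine the pieces instead of unlist().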

HTH,

--sundar



Re: [R] subsets problem

2009-02-08 Thread David Winsemius
See if this illustration using the %in% operator within subset() is
helpful:

> df1 <- data.frame(x=1:10, y=sample(c("a","b","c"), 10, replace=TRUE))
> uniques <- list("a","b")
> Test1 <- subset(df1, y %in% uniques)
> Test1
  x y
1 1 b
4 4 a
5 5 b
6 6 b
7 7 a
9 9 a

The next question, of course, is whether you were using the word "list" in
an R-specific fashion. Fortunately, I think %in% will also work with
vector input.
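
A quick check of that point, as a sketch with made-up data: the same subset is taken once with the lookup values in a list and once in a plain character vector. The object names here are hypothetical.

```r
df1 <- data.frame(x = 1:6, y = c("a", "b", "c", "a", "b", "c"),
                  stringsAsFactors = FALSE)

uniques.list <- list("a", "b")  # an R list of length-one strings
uniques.vec  <- c("a", "b")     # an ordinary character vector

from.list <- subset(df1, y %in% uniques.list)
from.vec  <- subset(df1, y %in% uniques.vec)

# Both forms should select the same rows
same <- isTRUE(all.equal(from.list, from.vec))
```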


You might not want to make 50 Testn's. That would be very much
against the spirit of R. Provide a simpler example involving 3 or 4
lists and someone might step up and solve it. Of course, I may have
given you a one-step solution if you were thinking that uniques[[1]]
was a single number.


It might be best to name your data frame something other than df, which is
also a valid function name (the density of the F distribution).


--
David Winsemius
