Dear R-help list,
I have a problem regarding text manipulation in R, where my basic knowledge
doesn't suffice anymore. It might be a bigger problem, but any help would be
greatly appreciated and acknowledged.
As input, I have a character string representing some Boolean function, such
as
message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On behalf of
jim holtman
Sent: Tuesday 20 May 2014 2:44
To: Marlin Keith Cox
CC: r-help@r-project.org
Subject: Re: [R] Subsets of a function
It would have been nice if you at least supplied a subset of the data
Hi all, this is a recurring theme in my programming and I need some help
with it. When I use a built-in function and need to use it on a subset of
my data frame, I always end up using the subset function first, but this
seems very clunky. For example, if I have years 2003:2013 with season a
Have you read An Introduction to R and its sections on indexing (?"[")
where this is discussed? Have you read about the apply-type functions
there, like ?tapply? If not, don't you think you should? If so, read
again.
Cheers,
Bert
Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374
Data is not
It would have been nice if you at least supplied a subset of the data, but
here is a try at it:
myList <- split(size, list(size$Year, size$Season))
result <- lapply(myList, function(.sub){
    smooth.spline(.sub$Size, spar = 0.25)
})
Jim Holtman
Data Munger Guru
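
For readers without the original data, here is a self-contained sketch of the split/lapply idea above. The data frame `size` and its columns are invented to mirror the names used in the reply, and the `Obs` column supplying explicit x positions is an assumption of this sketch, not something stated in the thread.

```r
# Toy data (hypothetical): 10 measurements per Year x Season combination
set.seed(1)
size <- expand.grid(Obs = 1:10, Year = 2003:2004, Season = c("a", "b"))
size$Size <- 50 + size$Obs + rnorm(nrow(size))

# One data frame per Year/Season combination
myList <- split(size, list(size$Year, size$Season), drop = TRUE)

# Fit a smoothing spline within each piece
result <- lapply(myList, function(.sub) {
  smooth.spline(.sub$Obs, .sub$Size, spar = 0.25)
})

names(result)  # one fitted object per Year.Season group
```

Because the grouping happens once in split(), no repeated subset() calls are needed; each element of `result` can then be passed to predict() or plotted on its own.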
What is the problem that you
require(data.table)
DT = as.data.table(df)
# 1. Patients with ah and ihd
DT[, .SD["ah" %in% diagnosis & "ihd" %in% diagnosis], by = id]
     id diagnosis
[1,]  2        ah
[2,]  2       ihd
[3,]  2        im
[4,]  4        ah
[5,]  4       ihd
[6,]  4    angina
# 2. Patients with ah but no ihd
Dear R people
Could you please help.
Basically, there are two variables in my data set. Each patient ('id')
may have one or more diseases ('diagnosis'). It looks like
id diagnosis
1 ah
2 ah
2 ihd
2 im
3 ah
3 stroke
4 ah
4 ihd
4 angina
5
Hi!
I think you should read the intro to R, as well as ?"[" and ?subset. It
should help you to understand.
Let's say your data is in a data.frame called df:
# 1. ah and ihd
df_ah_ihd <- df[df$diagnosis == "ah" | df$diagnosis == "ihd", ] ## the |
is the boolean OR (you want one OR the other). Note the
I don't think Ivan's solution meets the OP's needs.
I think you could do it using %in% and the appropriate logical operations
e.g.
aDF <- data.frame(id=c(1,2,2,2,3,3,4,4,4,5),
diagnosis=c("ah", "ah", "ihd", "im", "ah", "stroke", "ah", "ihd",
"angina", "ihd"))
aDF[with(aDF,(id %in% id[diagnosis=="ah"]) & (id %in%
Try this:
lapply(list(c('ah', 'ihd'), 'ah', 'ihd'), function(x)subset(aDF, diagnosis
== x))
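
A note on the call above, as a small sketch using the example data frame from earlier in the thread: comparing a column to a length-two vector with `==` recycles the vector by position, which is different from testing set membership with `%in%`. The object names `by_equals` and `by_in` are made up for this illustration.

```r
aDF <- data.frame(
  id = c(1, 2, 2, 2, 3, 3, 4, 4, 4, 5),
  diagnosis = c("ah", "ah", "ihd", "im", "ah", "stroke",
                "ah", "ihd", "angina", "ihd")
)

# `==` recycles c("ah", "ihd") along the column, so row 1 is compared
# with "ah", row 2 with "ihd", row 3 with "ah", and so on
by_equals <- subset(aDF, diagnosis == c("ah", "ihd"))

# `%in%` tests every row against the whole set
by_in <- subset(aDF, diagnosis %in% c("ah", "ihd"))

nrow(by_equals)
nrow(by_in)
```

Even the `%in%` version selects rows, not patients, so it still does not answer a question of the form "patients who have both diagnoses".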
On Thu, Jan 20, 2011 at 6:53 AM, Den d.kazakiew...@gmail.com wrote:
Dear R people
Could you please help.
Basically, there are two variables in my data set. Each patient ('id')
may have one or more
Hello Den,
your problem is not as it may seem so Ivan's suggestion is only a partial
answer. I see that each patient can have
more than one diagnosis and I take that you want to isolate patients based on
particular conditions.
Thus, simply looking for ah or ihd as Ivan suggests will yield
I did try it. It gave me
[[1]]
   id diagnosis
1   1        ah
5   3        ah
7   4        ah
8   4       ihd
10  5       ihd

[[2]]
  id diagnosis
1  1        ah
2  2        ah
5  3        ah
7  4        ah

[[3]]
   id diagnosis
3   2       ihd
8   4       ihd
10  5       ihd

Which isn't what
Hi Taras,
Indeed, I've overlooked the problem. Anyway, I'm not sure I would have
been able to give a complete answer like you did!
Ivan
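
For reference, the patient-level selection discussed in this thread can be sketched in base R with set operations on the ids. This is only one possible approach, not necessarily the one given in the truncated replies; it reuses the example data frame posted earlier, and the names `ids_ah`, `ids_ihd`, `both` and `ah_no_ihd` are invented here.

```r
aDF <- data.frame(
  id = c(1, 2, 2, 2, 3, 3, 4, 4, 4, 5),
  diagnosis = c("ah", "ah", "ihd", "im", "ah", "stroke",
                "ah", "ihd", "angina", "ihd")
)

# Ids of patients having each diagnosis at least once
ids_ah  <- unique(aDF$id[aDF$diagnosis == "ah"])
ids_ihd <- unique(aDF$id[aDF$diagnosis == "ihd"])

# 1. Patients with ah and ihd: all rows for ids present in both sets
both <- aDF[aDF$id %in% intersect(ids_ah, ids_ihd), ]

# 2. Patients with ah but no ihd: ids in the first set only
ah_no_ihd <- aDF[aDF$id %in% setdiff(ids_ah, ids_ihd), ]

unique(both$id)
unique(ah_no_ihd$id)
```

Working on the set of ids first and filtering rows afterwards keeps every diagnosis row for the selected patients, which is what the data.table output earlier in the listing also shows for case 1.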
Le 1/20/2011 11:05, Taras Zakharko a écrit :
Hello Den,
your problem is not as it may seem so Ivan's suggestion is only a partial
answer. I see that each
On Thu, Jan 20, 2011 at 10:53:01AM +0200, Den wrote:
Dear R people
Could you please help.
Basically, there are two variables in my data set. Each patient ('id')
may have one or more diseases ('diagnosis'). It looks like
id diagnosis
1 ah
2 ah
2 ihd
2 im
3 ah
On 2011-01-20 02:05, Taras Zakharko wrote:
Hello Den,
your problem is not as it may seem so Ivan's suggestion is only a partial
answer. I see that each patient can have
more than one diagnosis and I take that you want to isolate patients based on
particular conditions.
Thus, simply looking
Hi,
I have a question about %in% and subsetting data frames.
Say I need to keep ID 1,2,4,5, 10 from the data frame dat. I can do:
dat <- data.frame(ID = 1:10, var = 1:10)
someID <- c(1,2,4,5,10)
subset(dat, dat$ID %in% someID)
Is there a quick way to do the opposite, ie to do a subset that
Well, %in% returns a logical vector...
So
subset(dat, ! ID %in% someID)
Also, from ?subset:
Note
that ‘subset’ will be evaluated in the data frame, so columns can
be referred to (by name) as variables in the expression
Thus, you don't need 'dat$ID', but just 'ID' in the subset
when its empty? Does the room,
the thing itself have purpose? Or do we, what's the word... imbue it.
- Jubal Early, Firefly
From: mp.sylves...@gmail.com
To: r-help@r-project.org
Date: 11/05/2010 02:21 PM
Subject: [R] subsets, %in%
Sent by: r-help-boun...@r-project.org
Hi,
I have
Hi MP,
Try
subset(dat, ! dat$ID %in% someID) # ! symbol
HTH,
Jorge
On Fri, Nov 5, 2010 at 10:13 AM, wrote:
Hi,
I have a question about %in% and subsetting data frames.
Say I need to keep ID 1,2,4,5, 10 from the data frame dat. I can do:
dat <- data.frame(ID = 1:10, var = 1:10)
someID
Say I need to keep ID 1,2,4,5, 10 from the data frame dat. I can do:
dat <- data.frame(ID = 1:10, var = 1:10)
someID <- c(1,2,4,5,10)
subset(dat, dat$ID %in% someID)
Is there a quick way to do the opposite ...
Two operators spring to mind: ! and %nin%
subset(dat, !(dat$ID %in% someID))
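
The variants mentioned in these replies can be compared side by side in a short sketch. The `%nin%` operator below is defined locally with Negate() for illustration only; that definition is an assumption of this example and is not taken from any package, although add-on packages such as Hmisc do provide an operator of that name.

```r
dat <- data.frame(ID = 1:10, var = 1:10)
someID <- c(1, 2, 4, 5, 10)

# Negate the logical vector returned by %in%; inside subset() the
# column can be referred to by name
kept <- subset(dat, !(ID %in% someID))

# Locally defined "not in" operator (hypothetical helper for this sketch)
`%nin%` <- Negate(`%in%`)
kept_nin <- subset(dat, ID %nin% someID)

# The same selection with plain indexing
kept_idx <- dat[!dat$ID %in% someID, ]

kept$ID
```

All three give the same rows; note that `!` binds less tightly than `%in%`, so the parentheses in the first form are optional and only there for readability.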
Hello,
I am working on a variable selection problem and would like to have some
suggestions. Thank you.
In my data, the number of observations/samples is much less than the number
of variables. And I am not interested in generating only a few models,
instead I will need a couple of hundred
Help with this much appreciated
I have a large dataframe that I would like to subset where the constraint
Test1 <- subset(df, date == uniques[[1]]), where uniques is a list of dates
that must be matched to create Test1.
I would like to perform an operation on Test1 that results in a
you can try
lapply(lapply(uniques, function(x) subset(df, date == x)), myfun)
or possibly more accurate (subset may be finicky due to scoping):
lapply(lapply(uniques, function(x) df[df$date == x, ]), myfun)
or use ?split
lapply(split(df, df$date), myfun)
HTH,
--sundar
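
To see that the suggestions above agree, here is a small runnable sketch. The data frame, the character dates and the summary function `myfun` are all made up for the example; the original poster's data and function are not shown in the thread.

```r
# Hypothetical data: a few rows per date (dates kept as character strings)
df <- data.frame(
  date  = c("2009-02-01", "2009-02-01", "2009-02-02",
            "2009-02-03", "2009-02-03"),
  value = c(1, 2, 3, 4, 5)
)
uniques <- as.list(unique(df$date))

# Hypothetical per-subset operation
myfun <- function(d) sum(d$value)

# Explicit loop over the unique dates, using plain indexing
by_loop <- lapply(uniques, function(x) myfun(df[df$date == x, ]))

# split() builds all the per-date pieces in one step
by_split <- lapply(split(df, df$date), myfun)

unlist(by_loop)
unlist(by_split)
```

The split() version also names each result by its date, which tends to make the output easier to match back to the groups.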
On Sun, Feb 8, 2009
See if this illustration using the %in% operator within subset() is
helpful:
df1 <- data.frame(x=1:10, y=sample(c("a","b","c"), 10,
replace=TRUE) )
uniques <- list("a","b")
Test1 <- subset(df1, y %in% uniques)
Test1
x y
1 1 b
4 4 a
5 5 b
6 6 b
7 7 a
9 9 a
Next question of course is whether you