On 21/05/10 00:08, John Owens wrote:
> I'd like to do two (actually three) things:
>
> 1) Using a grep-like operator, delete rows in a dataframe that match a
> particular pattern in a particular column (in my case, every row that
> has a '#' as the first character in column 'a')
> 2) Set elements in a dataframe based on the characteristics of other
> elements, across all rows (in my case, if an element in column 'c'
> is NA, set it to 2*that row's value in column 'b')
> 2a) Only do this if column 'd''s value is a particular value (in my
> case, the character 'J')
>
> I'm trying to do this with calling the R code directly using ro.r,
> but that's (a) not satisfying because I'd rather do it in python (how?),
> (b) rpy/R doesn't seem to like doing code like "df = ro.r("function(df)"),
> and (c) it doesn't work anyway.
>
> I'm having some coredump problems when instantiating the dataframe below
> with NAs in it, so forgive any errors in the code since I can't run it.
> Thanks for any help!
>
> JDO
>
> ==========================================================
>
> #!/usr/bin/env python2.6
> import rpy2.robjects as ro
>
> df = ro.DataFrame({'a': ro.StrVector(('# x','y','z')),
> 'b': ro.IntVector((4,5,6)),
> 'c': ro.IntVector((8,ro.NA_integer,10)),
> 'd': ro.StrVector(('I','J','K')),
> })
>
> # would like to delete all rows whose name in column 'a' begins with a '#'
> df = ro.r("df[grep('^#', sdpf[,%d], invert=TRUE),]" % \
> tuple(df.colnames).index('a'))
from rpy2.robjects.packages import importr
base = importr('base')
# if column 'a' is a factor (what data.frame() makes of strings by default),
# you have to work on the levels. That's one more level of indirection, so I
# avoid it for the sake of clarity by wrapping the strings in I().
df = ro.DataFrame({'a': base.I(ro.StrVector(('# x','y','z'))),
'b': ro.IntVector((4,5,6)),
'c': ro.IntVector((8,ro.NA_integer[0],10)),
'd': base.I(ro.StrVector(('I','J','K'))),
})
# leave out rows with elements of 'a' starting with '#'
base.subset(df, base.parse(text='! grepl("^#", a)'))
# same, with more of the logic done in Python (True selects all columns)
df.rx(ro.BoolVector([not x.startswith("#") for x in df.rx2('a')]),
True)
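If the data passes through plain Python before the DataFrame is built, the same row filter can be applied to the raw columns instead. A minimal sketch, assuming the columns are held in a dict of Python lists with None standing in for NA (the dict layout and the name drop_comment_rows are illustrative, not rpy2 API):

```python
# Hypothetical plain-Python columns, before conversion to an R data.frame;
# None stands in for R's NA.
cols = {'a': ['# x', 'y', 'z'],
        'b': [4, 5, 6],
        'c': [8, None, 10],
        'd': ['I', 'J', 'K']}

def drop_comment_rows(cols, key='a', prefix='#'):
    """Return a new dict of columns without the rows whose `key` value
    starts with `prefix`; the input dict is left unchanged."""
    keep = [not s.startswith(prefix) for s in cols[key]]
    return {name: [v for v, k in zip(values, keep) if k]
            for name, values in cols.items()}

filtered = drop_comment_rows(cols)
print(filtered)
# {'a': ['y', 'z'], 'b': [5, 6], 'c': [None, 10], 'd': ['J', 'K']}
```

This only helps when the data is in Python first; once it lives in an R data.frame, the subset() or rx() versions above avoid copying it back out.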
> # would like to set all NAs in 'c' to 2*value in 'b'
> df = ro.r("ifelse(is.na(df$c), 2*df$b, df$c)")
>
> # would really like to do this only if column 'd' is 'J' - not sure how
>
Something like:
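# set 'c' to 2 * 'b' on rows where 'c' is NA and 'd' is 'J'
# ('d' is a plain character column here thanks to I(), see above)
def myfunc(i, df):
    c = df.rx2('c')
    if base.is_na(c)[i] and df.rx2('d')[i] == 'J':
        c[i] = df.rx2('b')[i] * 2
for i in range(df.nrow):
    myfunc(i, df)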
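If you would rather keep the whole computation in Python, the conditional fill can also be done on the raw columns before they are handed to ro.DataFrame. A minimal sketch, assuming a dict of plain Python lists with None standing in for NA (the dict layout and the name fill_missing_c are illustrative, not rpy2 API). In R itself, the extra test on 'd' is just another condition joined with &, e.g. ifelse(is.na(df$c) & df$d == 'J', 2*df$b, df$c).

```python
# Hypothetical plain-Python columns; None stands in for R's NA.
cols = {'a': ['# x', 'y', 'z'],
        'b': [4, 5, 6],
        'c': [8, None, 10],
        'd': ['I', 'J', 'K']}

def fill_missing_c(cols, flag='J'):
    """Return a new 'c' column: missing (None) entries become 2 * b,
    but only on rows where d == flag; every other entry is kept."""
    return [2 * b if c is None and d == flag else c
            for b, c, d in zip(cols['b'], cols['c'], cols['d'])]

cols['c'] = fill_missing_c(cols)
print(cols['c'])  # [8, 10, 10]
```

The resulting lists can then be wrapped in ro.StrVector / ro.IntVector as in the original example, with any remaining None swapped for the integer NA value.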
>
>
> ------------------------------------------------------------------------------
>
> _______________________________________________
> rpy-list mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/rpy-list