I'd like to do two (actually three) things:

1) Using a grep-like operator, delete rows in a dataframe that match a particular pattern in a particular column (in my case, every row that has a '#' as the first character in column 'a').
2) Set elements in a dataframe based on the characteristics of other elements, across all rows (in my case, if an element in column 'c' is NA, set it to 2*that row's value in column 'b').
2a) Only do this if column 'd' has a particular value (in my case, the character 'J').
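For reference, the logic of the three operations can be sketched in plain Python. This is only an illustration, not rpy2 code: it assumes the columns have been pulled out into ordinary lists, with None standing in for NA, and the helper names drop_rows_matching and fill_missing are made up for this example.

```python
import re

# Hypothetical stand-in for the data frame: a dict of equal-length columns,
# with None playing the role of R's NA. Plain Python, not the rpy2 API.
columns = {
    'a': ['# x', 'y', 'z'],
    'b': [4, 5, 6],
    'c': [8, None, 10],
    'd': ['I', 'J', 'K'],
}

def drop_rows_matching(cols, name, pattern):
    """Return a copy of cols without the rows whose cols[name] matches pattern."""
    keep = [i for i, value in enumerate(cols[name])
            if not re.match(pattern, value)]
    return dict((key, [vals[i] for i in keep]) for key, vals in cols.items())

def fill_missing(cols, target, source, factor, where=None):
    """Replace None in cols[target] with factor * cols[source], row by row.

    where is an optional (column, value) pair; when given, only rows whose
    value in that column equals the given value are filled.
    """
    out = dict((key, list(vals)) for key, vals in cols.items())
    for i, value in enumerate(out[target]):
        selected = where is None or out[where[0]][i] == where[1]
        if value is None and selected:
            out[target][i] = factor * out[source][i]
    return out

# 1) drop rows whose 'a' starts with '#'
cleaned = drop_rows_matching(columns, 'a', r'^#')
# 2) and 2a) fill missing 'c' with 2 * 'b', but only where 'd' is 'J'
filled = fill_missing(cleaned, 'c', 'b', 2, where=('d', 'J'))
print(filled)
```

Both helpers return new dicts rather than modifying their input, so the original columns stay available for comparison. How best to move the columns in and out of an rpy2 DataFrame, and how NA values should map to None, is left open here.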
I'm trying to do this by calling the R code directly with ro.r, but (a) that's not satisfying because I'd rather do it in Python (how?), (b) rpy/R doesn't seem to like code such as df = ro.r("function(df)"), and (c) it doesn't work anyway. I'm also getting core dumps when instantiating the dataframe below with NAs in it, so forgive any errors in the code, since I can't run it.

Thanks for any help!
JDO

==========================================================
#!/usr/bin/env python2.6
import rpy2.robjects as ro

df = ro.DataFrame({'a': ro.StrVector(('# x','y','z')),
                   'b': ro.IntVector((4,5,6)),
                   'c': ro.IntVector((8,ro.NA_integer,10)),
                   'd': ro.StrVector(('I','J','K')),
                   })

# would like to delete all rows whose name in column 'a' begins with a '#'
# (R column indices are 1-based, hence the + 1)
df = ro.r("df[grep('^#', df[,%d], invert=TRUE),]" % \
          (tuple(df.colnames).index('a') + 1))

# would like to set all NAs in 'c' to 2*value in 'b'
df = ro.r("ifelse(is.na(df$c), 2*df$b, df$c)")

# would really like to do this only if column 'd' is 'J' - not sure how
------------------------------------------------------------------------------
_______________________________________________
rpy-list mailing list
rpy-list@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rpy-list