Re: [Rpy] dataframe: removing rows and changing internal elements

Laurent Thu, 20 May 2010 22:52:56 -0700

On 21/05/10 00:08, John Owens wrote:
> I'd like to do two (actually three) things:
>
> 1) Using a grep-like operator, delete rows in a dataframe that match a
>     particular pattern in a particular column (in my case, every row that
>     has a '#' as the first character in column 'a')
> 2) Set elements in a dataframe based on the characteristics of other
>     elements, across all rows (in my case, if an element in column 'c'
>     is NA, set it to 2*that row's value in column 'b')
> 2a) Only do this if column 'd''s value is a particular value (in my
>      case, the character 'J')
>
> I'm trying to do this with calling the R code directly using ro.r,
> but that's (a) not satisfying because I'd rather do it in python (how?),
> (b) rpy/R doesn't seem to like doing code like "df = ro.r("function(df)"),
> and (c) it doesn't work anyway.
>
> I'm having some coredump problems when instantiating the dataframe below
> with NAs in it, so forgive any errors in the code since I can't run it.
> Thanks for any help!
>
> JDO
>
> ==========================================================
>
> #!/usr/bin/env python2.6
> import rpy2.robjects as ro
>
> df = ro.DataFrame({'a': ro.StrVector(('# x','y','z')),
>                     'b': ro.IntVector((4,5,6)),
>                     'c': ro.IntVector((8,ro.NA_integer,10)),
>                     'd': ro.StrVector(('I','J','K')),
>                     })
>
> # would like to delete all rows whose name in column 'a' begins with a '#'
> df = ro.r("df[grep('^#', df[,%d], invert=TRUE),]" % \
> tuple(df.colnames).index('a'))


from rpy2.robjects.packages import importr
base = importr('base')

# if column 'a' is a factor, you have to work on the levels. That's one
# more level of indirection, so I avoid it (by wrapping the strings in
# I()) for the sake of clarity.
df = ro.DataFrame({'a': base.I(ro.StrVector(('# x','y','z'))),
                    'b': ro.IntVector((4,5,6)),
                    'c': ro.IntVector((8,ro.NA_integer[0],10)),
                    'd': base.I(ro.StrVector(('I','J','K'))),
                    })

# leave out rows with elements of 'a' starting with '#'
base.subset(df, base.parse(text='! grepl("^#", a)'))

# same with more logic done in Python
df.rx(ro.BoolVector([not x.startswith("#") for x in df.rx2('a')]),
                      True)
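
The mask passed to df.rx above is just one boolean per row. A minimal sketch of how it is built, using a plain Python list in place of the rpy2 column (the names col_a, keep and kept_rows are illustrative only, not part of rpy2):

```python
# Plain-Python stand-in for column 'a' of the data frame (illustrative data).
col_a = ['# x', 'y', 'z']

# True for rows to keep, False for rows whose 'a' value starts with '#'.
keep = [not x.startswith('#') for x in col_a]

# Applying the mask by hand, the way a row selection would.
kept_rows = [x for x, k in zip(col_a, keep) if k]

print(keep)
print(kept_rows)
```

Building the mask in Python keeps the pattern logic out of R strings, at the cost of iterating over the column once on the Python side.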


> # would like to set all NAs in 'c' to 2*value in 'b'
> df = ro.r("ifelse(is.na(df$c), 2*df$b, df$c)")
>
> # would really like to do this only if column 'd' is 'J' - not sure how
>

Something like:

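def myfunc(i, df):
   # only touch rows where 'c' is NA and 'd' is 'J'; the NA test assumes
   # the same ro.NA_integer[0] value used when building df above
   if df.rx2('c')[i] == ro.NA_integer[0] and df.rx2('d')[i] == 'J':
     df.rx2('c')[i] = df.rx2('b')[i] * 2

for i in range(df.nrow):
   myfunc(i, df)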
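
The per-row rule from the question (fill 'c' with 2*b only where 'c' is NA and 'd' is 'J') can be checked in isolation with plain Python lists. This is only a sketch: None stands in for R's NA, and fill_missing is a hypothetical helper, not an rpy2 function.

```python
def fill_missing(b, c, d, flag='J'):
    """Return a copy of c where missing entries (None) are replaced by 2*b,
    but only in rows where d equals flag."""
    out = list(c)
    for i, (bi, ci, di) in enumerate(zip(b, c, d)):
        if ci is None and di == flag:
            out[i] = 2 * bi
    return out

# Same values as the example data frame, with None in place of NA.
b = [4, 5, 6]
c = [8, None, 10]
d = ['I', 'J', 'K']

filled = fill_missing(b, c, d)
print(filled)
```

Returning a new list rather than modifying c in place makes the rule easy to test before applying the same condition to the real data frame.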


>
>
> ------------------------------------------------------------------------------
>
> _______________________________________________
> rpy-list mailing list
> rpy-list@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rpy-list

