Re: [Rpy] Handling NAs in data.frames and vectors

Laurent Gautier Fri, 15 Jan 2010 08:42:09 -0800

Hi Luca,

Unfortunately this does not seem to be caused by your installation.


The problem exists for IntVector, and FactorVector inherits from it. A few 
features are likely missing from FactorVector, but the good news is 
that they can already be implemented simply.

Let's take an example:

import rpy2.robjects as ro
fcr = ro.r('factor(c("a", "b", NA, "a", NA))')

'fcr' is now a FactorVector, that is, an IntVector with levels.

 >>> list(fcr)
[1, 2, -2147483648, 1, -2147483648]

That large negative integer is the one used by R to encode missing 
"integer" values:

 >>> ro.NA_integer[0]
-2147483648
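
As a side note, that value is simply the smallest 32-bit signed integer, 
which R reserves as its integer NA. A minimal check in plain Python, no 
rpy2 needed (the name NA_INTEGER is made up for this illustration and is 
not part of rpy2):

```python
import struct

# Smallest 32-bit signed integer; R reserves it to mean "missing integer".
# NA_INTEGER is a hypothetical name used only in this sketch.
NA_INTEGER = -2**31

# Round-trip through a 4-byte signed int to confirm it fits in 32 bits.
packed = struct.pack("<i", NA_INTEGER)
assert struct.unpack("<i", packed)[0] == NA_INTEGER

print(NA_INTEGER)  # -2147483648
```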

When you call 'list(fcr)', fcr is iterated over and each element is 
stored in a new Python list.
The issue is that Python has no "missing integer" value, so the NA 
sentinel comes through as a plain int. That should not stop us from 
writing a simple function to deal with it as needed.

def as_character_list(factor):
    """Return the factor's labels as a Python list, with None for NAs."""
    # R's sentinel for a missing integer
    na_val = ro.NA_integer[0]
    res = [None] * len(factor)
    for i, elt in enumerate(factor):
        if elt != na_val:
            # NOTE: R uses 1-based indices
            res[i] = factor.levels[elt - 1]
    return res

 >>> as_character_list(fcr)
['a', 'b', None, 'a', None]
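
If you want to try the same logic without R at hand, here is a rough 
sketch on plain Python lists. It assumes the integer codes and the levels 
have already been pulled out of the factor; decode_factor and NA_INTEGER 
are names made up for this illustration, not rpy2 API:

```python
# Hypothetical stand-in for ro.NA_integer[0]
NA_INTEGER = -2147483648


def decode_factor(codes, levels, na_val=NA_INTEGER):
    """Map 1-based factor codes to their labels, using None for NA."""
    return [None if code == na_val else levels[code - 1] for code in codes]


codes = [1, 2, NA_INTEGER, 1, NA_INTEGER]
levels = ["a", "b"]
print(decode_factor(codes, levels))  # ['a', 'b', None, 'a', None]
```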


What we have implemented is a variant of the R base function 
"as.character.factor":

from rpy2.robjects.packages import importr
base = importr("base")

 >>> list(base.as_character(fcr))
['a', 'b', 'NA', 'a', 'NA']
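
Note that the missing values come back as the string 'NA' here. If you 
go this route and still want None, you could post-process the list, along 
the lines of the sketch below (na_strings_to_none is a made-up helper, 
not part of rpy2). Be aware that a genuine level spelled "NA" would be 
replaced too, since the two cannot be told apart once they are strings:

```python
def na_strings_to_none(values, na_string="NA"):
    """Replace every occurrence of na_string with None.

    Caveat: a real label equal to na_string is replaced as well.
    """
    return [None if v == na_string else v for v in values]


labels = ["a", "b", "NA", "a", "NA"]
print(na_strings_to_none(labels))  # ['a', 'b', None, 'a', None]
```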



L.




On 1/15/10 2:36 PM, Luca Beltrame wrote:
> Hello,
>
> in my code, I need to convert the columns from a robjects.DataFrame to other
> data types (list, for example). However, I've found a problem when dealing with
> data that contains NAs. In particular, I'm referring to non-numeric columns,
> that are represented as FactorVectors.
>
> Example code:
>
> import rpy2.robjects as robjects
>
> data = robjects.DataFrame.from_csvfile("file_with_NAs_in_columns", sep="\t")
>
> column_with_na = data.rx2("Column")
>
> print column_with_na
>
> [1] <NA>  <NA>  <NA>  some_value
> Levels: some_value
>
> and if I issue
>
> print column_with_na[0]
>
> I get:
> -2147483648
>
> And of course, accessing the levels I only get some_value. Converting to other
> types of Vector doesn't seem to help.
>
> Notice that this works if I do
>
> base = importr("base")
> column_value = base.as_vector(column_with_na)
> column_value = list(column_value)
> print column_value
> ['NA', 'NA', 'NA', 'some_value']
>
> Is there a way to translate the column *including* the NAs, into a Python list
> without doing the hackish way described above?
>
> This is with RPy 2.1 alpha 2. I admit that there may be a problem with my
> installation as I'm running a local copy of rpy2 2.1 as I still have a system-
> wide 2.0.x needed for some projects.
>


_______________________________________________
rpy-list mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rpy-list
