Very helpful, thanks!

As for having data in the "wrong" order, it's a little odd that a datafile 
that's perfect for loading into R as a dataframe (via read.table) is 
inherently in the "wrong" order for dataframe creation after reading it 
into Python (using numpy.genfromtxt(), f.readlines(), or whatever).
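
To make the ordering issue concrete, here is a minimal sketch of the 
transposition step in plain Python. The column names, the sample rows, and 
the helper rows_to_columns are all hypothetical, made up for illustration; 
nothing here comes from rnumpy or rpy2. It only shows that row-ordered 
records have to be regrouped column by column before a dataframe can be 
built from them.

```python
# Hypothetical example data: each inner list is one row, as it would
# arrive when a file is read line by line.
colnames = ["subject", "age", "score"]
rows = [
    ["s01", 34, 0.91],
    ["s02", 28, 0.77],
    ["s03", 45, 0.64],
]


def rows_to_columns(rows, colnames):
    """Transpose row-ordered records into a dict of column lists."""
    if any(len(row) != len(colnames) for row in rows):
        raise ValueError("every row must have one value per column name")
    # zip(*rows) regroups the i-th value of every row into one tuple,
    # which is the column-wise layout a dataframe constructor expects.
    return {name: list(col) for name, col in zip(colnames, zip(*rows))}


columns = rows_to_columns(rows, colnames)
```

Each value in the resulting dict is one whole column, so in principle it 
could be handed on as a keyword argument per column name. Whether this 
beats copying everything into an object array is something only profiling 
will tell.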

As for NAs, I can handle those when I set up my data prior to looping, so 
they shouldn't be a problem. Thanks for giving me a clue about what to use 
in Python/rnumpy to get a proper NA conversion ... that's been far from 
obvious.
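
Roughly what I have in mind for the NA setup, as a sketch only: replace 
whatever marks a missing value with a single NA value before any looping 
starts. The helper mark_missing, the set of missing markers, and the 
NA_STANDIN object are my own assumptions for illustration. In real use the 
na_value argument would presumably be the float Nathaniel mentions below 
(rnumpy.NA_NUMERIC[0]); I use a stand-in here so the example runs without 
R installed.

```python
import math

# Stand-in for R's NA, used only so this sketch runs without rnumpy.
# Assumption: in practice this would be the NA float pulled out of R.
NA_STANDIN = object()


def mark_missing(values, na_value, missing=("", "NA", None)):
    """Return a copy of one column with missing entries set to na_value."""
    cleaned = []
    for v in values:
        # NaN never compares equal to anything, so test for it explicitly.
        is_nan = isinstance(v, float) and math.isnan(v)
        cleaned.append(na_value if is_nan or v in missing else v)
    return cleaned


raw_column = [1.5, float("nan"), 2.0, "NA"]
cleaned_column = mark_missing(raw_column, NA_STANDIN)
```

The markers I treat as missing are a guess at how my own files code them; 
other data types would need their own rule.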

I'll head off to do some tinkering and profiling now.

-best
Gary

On Tue, 29 Sep 2009, Nathaniel Smith wrote:

> On Tue, Sep 29, 2009 at 4:21 AM, Gary Strangman
> <str...@nmr.mgh.harvard.edu> wrote:
>> Without benchmarking, that seems mighty inefficient. Nathaniel Smith's
>> rnumpy mostly allows the following:
>>
>> df = rnumpy.r.data_frame(numpy.array(d, dtype=object))
>>
>> ... which is 2 conversions (rather than 4), but I haven't been able to get
>> the column names attached in this case. (My inexperience, I'm sure.)
>
> Something like
>  d_array = np.array(d, dtype=object)
>  named_columns = dict([(name, d_array[:, i])
>                        for i, name in enumerate(colnames)])
>  df = rnumpy.r.data_frame(**named_columns)
> should work, and doesn't add any extra data copies compared to what
> you had (because numpy slicing is cheap).
>
> Fundamentally there isn't that much you can do to make this
> crazy-fast, though. You have a jumble of individual Python objects
> laid out in the wrong way in memory, and that's going to take some
> Python-land fiddling around to get that straightened out. You have to
> transpose the data somehow -- I don't know whether a Python loop or
> copying the data into a np.array is faster. Then you have to detect
> the Python object types and convert them into R. rnumpy uses numpy's
> type detection code; this code is in C and quite fast, but it also has
> to be conservative and check the entire column to make sure that
> everything is of the same type, and it copies the data into a np.array
> again before calling rinterface.*SexpVector -- you might be able to
> beat it since you know your data is uniform and can construct the
> SexpVector directly from the list/object array.
>
> Also, if you need proper NA handling, you need another pass to handle
> that, and making that pass work may defeat all your optimization
> tricks. How exactly this should work depends on how you've coded NA's
> in your data. (I see from your example that you use np.nan for a float
> column, but what about other data types?) Note that np.nan is not the
> same as R's NA. (However, you can pull a float representing NA out of
> R and use it when building your data structures in the first place --
> in rnumpy it's rnumpy.NA_NUMERIC[0]. It will look like a NaN to
> Python, but convert properly.)
>
> I'd suggest writing something that works, getting the analysis
> pipeline running, and then profiling to see if this part matters...
>
> -- Nathaniel
>
> ------------------------------------------------------------------------------
> Come build with us! The BlackBerry(R) Developer Conference in SF, CA
> is the only developer event you need to attend this year. Jumpstart your
> developing skills, take BlackBerry mobile applications to market and stay
> ahead of the curve. Join us from November 9-12, 2009. Register now!
> http://p.sf.net/sfu/devconf
> _______________________________________________
> rpy-list mailing list
> rpy-list@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rpy-list
>
>
>
