On 2011-09-13 19:40, Artur Wroblewski wrote:
> On Tue, Sep 13, 2011 at 7:01 AM, Laurent Gautier<lgaut...@gmail.com>  wrote:
>> On 2011-09-12 21:16, Artur Wroblewski wrote:
>>> On Mon, Sep 12, 2011 at 12:26 PM, Laurent Gautier<lgaut...@gmail.com>
>>>   wrote:
>>>> Probably not.
>>>>
>>>> R is doing a lot of things under the hood. Sometimes it is good, sometimes
>>>> it is bad.
>>>> The code snippet given to you has a quadratic time-complexity ( O(nm) ). It
>>>> can be made linearithmic ( O(n log(m)) ) simply:
>>>>
>>>> from rpy2.robjects.vector import BoolVector
>>>> ref = set(differential)
>>>> select_b = BoolVector(tuple(x in ref for x in source.rx2('gene')))
>>>> mysubset = source.rx(select_b, True)
>>> If I understand correctly, BoolVector(...genexpr...) is not possible here
>>> due to an R API limitation: we need the length of the iterable, don't we?
>> You might have missed the call to tuple(). This will make it work
>> independently of having a generator with a length.
>> >>> tuple(x for x in [1,2,3])
>> (1, 2, 3)
> In the case of Luca's data (please correct me if I am wrong):
> 1. A tuple with 2 million items is created.
> 2. The tuple is iterated to create a BoolVector with 2 million items.
> 3. The vector and (hopefully) the data frame are iterated to filter the data.
>

In the code snippet you commented on, a BoolVector is created from a 
tuple, and that BoolVector is used to extract a subset.
During the construction of the BoolVector, the presence of each element is 
now checked against a reference set (with a lookup assumed to be of 
time complexity O(log(n)) ), rather than against an iterable as in the first 
code snippet (where the lookup is of time complexity O(n) ).

For reference, the snippet was:

from rpy2.robjects.vector import BoolVector
ref = set(differential)
select_b = BoolVector(tuple(x in ref for x in source.rx2('gene')))
mysubset = source.rx(select_b, True)
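
To illustrate the difference in lookup cost without needing R, here is a
minimal pure-Python sketch. The names `genes`, `mask_with_list` and
`mask_with_set` are made up for the illustration, and the sample values are
stand-ins for the 'gene' column and for `differential`. Note that a Python
set is hash-based, so a membership test takes constant time on average,
which is at least as good as the O(log(n)) assumed above.

```python
# Hypothetical stand-ins for source.rx2('gene') and for `differential`.
genes = ["g%d" % i for i in range(10)]
differential = ["g2", "g5", "g7", "g99"]


def mask_with_list(values, reference):
    # `x in reference` scans the list: O(m) per lookup, O(n*m) overall.
    return tuple(x in reference for x in values)


def mask_with_set(values, reference):
    # Build the set once; each lookup is then O(1) on average.
    ref = set(reference)
    return tuple(x in ref for x in values)


slow = mask_with_list(genes, differential)
fast = mask_with_set(genes, differential)
```

Both functions return the same tuple of booleans; only the cost of building
it differs. The tuple is what would then be handed to BoolVector().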




>>> It seems that copying R at the Python level is not always nice
>>> and can be quite inefficient.
>>>
>>> The above could probably be rewritten in a more Pythonic way (it would
>>> be more efficient as well, I believe)
>>>
>>>     mysubset = source.rx((x in ref for x in source.rx2('gene')), True)
>>>
>>> or
>>>
>>>     mysubset = DataFrame(row for row in source if row['gene'] in ref)
>>>
>>> but of course this is not supported by rpy2.
>>>
>>> Is there a chance to make rpy2 a bit more integrated with Python? :)
>> As Luca wrote earlier, what is missing for it to work is for the
>> generator to have a length.
> Let's say
>
>      select_b = BoolVector(...gen..., n)
>
> Will the above create a BoolVector with 2 million items?

I don't think so. The constructor for BoolVector does not take a second 
argument that would be the length.

> On the other hand
>
>      mysubset = source.rx(...gen..., True)
>
> could avoid generating large vectors/tuples, couldn't it?

It would. Unfortunately, this would require that vanilla R be able to operate 
with iterators (here, to extract subsets of a vector)... and R can't.
The construction of a large tuple could be avoided with a 
custom iterator with a length for R vectors (note that, for example, 
IntVector(xrange(100)) will work), but the construction of a large 
BoolVector could not.
[Well, technically there is a way, maybe using the R package 'iterators', 
but that would make it more of an extension to the current rpy2 as 
long as 'iterators' is not part of the default R distribution.]
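
The difference between xrange(100) and a bare generator expression, as far
as a length is concerned, can be checked in plain Python. This is only a
sketch: `has_length` is a made-up helper, and range() is used because it is
the Python 3 counterpart of xrange().

```python
def has_length(obj):
    """Return True if len(obj) works, False otherwise."""
    try:
        len(obj)
    except TypeError:
        return False
    return True


lazy_range = range(100)              # xrange(100) in Python 2
generator = (i for i in range(100))  # a generator has no __len__

range_ok = has_length(lazy_range)
generator_ok = has_length(generator)

# tuple() consumes the generator and yields an object with a length,
# at the price of holding all the items in memory at once.
materialized = tuple(generator)
```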

>> This can probably be addressed by adding a custom iterator for R vectors,
>> and should I appear a little slow to have it implemented you are welcome to
>> submit a patch. ;-)
> Depends on the amount of knowledge of R internals required. ;) ;P

You are in luck then: almost no knowledge of R internals is required to allow the 
creation of generators-with-length from R vectors. You should be able to 
put something together in pure Python to get something working at the 
rpy2.robjects level. A more definitive implementation will require C and 
be at the rpy2.rinterface level.
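
As a rough starting point, a generator-with-length can be sketched in pure
Python as below. The class `SizedGenerator` and the function `membership`
are hypothetical names, not part of rpy2, and whether the rpy2 vector
constructors accept such an object directly is an assumption that would
need to be tested.

```python
class SizedGenerator:
    """Hypothetical helper: a lazy iterable that also reports a length."""

    def __init__(self, make_iter, length):
        self._make_iter = make_iter  # callable returning a fresh iterator
        self._length = length        # known in advance, e.g. len(vector)

    def __iter__(self):
        # A new iterator on every call, so the object can be iterated
        # more than once.
        return self._make_iter()

    def __len__(self):
        return self._length


def membership(values, ref):
    # One boolean per item of `values`, computed lazily on iteration.
    return SizedGenerator(lambda: (x in ref for x in values), len(values))


# Made-up sample data standing in for the 'gene' column and the reference set.
genes = ["a", "b", "c", "d"]
selected = membership(genes, {"b", "d"})
```

The length is known without evaluating any item, and the items themselves
are only computed when the object is iterated over.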


Best,


Laurent


> Best regards,
>
> w
>
> ------------------------------------------------------------------------------
> BlackBerry® DevCon Americas, Oct. 18-20, San Francisco, CA
> Learn about the latest advances in developing for the
> BlackBerry® mobile platform with sessions, labs & more.
> See new tools and technologies. Register for BlackBerry® DevCon today!
> http://p.sf.net/sfu/rim-devcon-copy1
> _______________________________________________
> rpy-list mailing list
> rpy-list@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rpy-list

