On 2011-09-13 19:40, Artur Wroblewski wrote:
> On Tue, Sep 13, 2011 at 7:01 AM, Laurent Gautier<lgaut...@gmail.com> wrote:
>> On 2011-09-12 21:16, Artur Wroblewski wrote:
>>> On Mon, Sep 12, 2011 at 12:26 PM, Laurent Gautier<lgaut...@gmail.com>
>>> wrote:
>>>> Probably not.
>>>>
>>>> R is doing a lot of things under the hood. Sometimes it is good,
>>>> sometimes it is bad.
>>>> The code snippet given to you has a quadratic time-complexity ( O(nm) ).
>>>> It can be made linearithmic ( O(n log(m) ) ) simply:
>>>>
>>>> from rpy2.robjects.vector import BoolVector
>>>> ref = set(differential)
>>>> select_b = BoolVector(tuple(x in ref for x in source.rx2('gene')))
>>>> mysubset = source.rx(select_b, True)
>>> If I reckon well BoolVector(...genexpr...) is not possible here due to
>>> R API limitation - we need length of iterable, isn't it?
>> You might have missed the call to tuple(). This will make it work
>> independently of having a generator with a length.
>> >>> tuple(x for x in [1,2,3])
>> (1, 2, 3)
> In case of Luca's data (please correct me if I am wrong):
> 1. Tuple with 2mln items is created.
> 2. The tuple is iterated to create a BoolVector with 2mln items.
> 3. The vector (and hopefully) data frame are iterated to filter data.
>
In the code snippet you commented on, a BoolVector is created from a
tuple, and that BoolVector is used to extract a subset. During the
construction of the BoolVector, the presence of an element is now
checked against a reference set (lookup assumed earlier to be of time
complexity O(log(n)); Python sets are hash-based, so it is O(1) on
average), rather than against an iterable as in the first code snippet
(where each lookup is of time complexity O(n)).

For reference, the snippet was:

from rpy2.robjects.vector import BoolVector
ref = set(differential)
select_b = BoolVector(tuple(x in ref for x in source.rx2('gene')))
mysubset = source.rx(select_b, True)

>>> It seems like copying R on Python level is not always nice
>>> and can be quite inefficient.
>>>
>>> Above probably could be rewritten in more Pythonic way (it would
>>> be more efficient I believe, as well)
>>>
>>> mysubset = source.rx((x in ref for x in source.rx2('gene')), True)
>>>
>>> or
>>>
>>> mysubset = DataFrame(row for row in source if row['gene'] in ref)
>>>
>>> but of course is not supported by rpy2.
>>>
>>> Is there a chance to make rpy2 bit more Python integrated? :)
>> As Luca wrote it earlier, what is missing for it to work is that the
>> generator had a length.
> Let's say
>
> select_b = BoolVector(...gen..., n)
>
> Will above create bool vector with 2mln items?

I don't think so. The constructor for BoolVector does not take a second
argument that would be the length.

> On other side
>
> mysubset = source.rx(...gen..., True)
>
> could allow to avoid generating of large vectors/tuples, isn't it?

It would. Unfortunately, this would require that vanilla R be able to
operate on iterators (for that matter, to extract subsets of a vector
with them)... and R can't. The construction of a large tuple could be
avoided with a custom iterator with a length for R vectors (note that,
for example, IntVector(xrange(100)) will work), but not the construction
of a large BoolVector.
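To make the "iterator with a length" idea concrete, here is a minimal
pure-Python sketch. SizedIterable is a hypothetical name (it is not part
of rpy2), and the two gene lists are made-up stand-ins for
source.rx2('gene') and differential. Whether a given rpy2 vector
constructor accepts such an object is an assumption; the sketch only
shows the Python side, that is, an object that reports its length up
front and produces its values lazily.

```python
class SizedIterable(object):
    """Pair a lazily evaluated iterable with a length known in advance.

    Hypothetical helper for illustration only; not part of rpy2.
    """

    def __init__(self, make_iter, length):
        # make_iter: zero-argument callable returning a fresh iterator
        self._make_iter = make_iter
        self._length = length

    def __len__(self):
        return self._length

    def __iter__(self):
        return self._make_iter()


# Made-up stand-ins for source.rx2('gene') and `differential`.
genes = ["g1", "g2", "g3", "g4", "g5"]
differential = ["g2", "g5", "g9"]

# Hash-based membership test instead of scanning a list each time.
ref = set(differential)

# One boolean per gene, computed on demand; no intermediate tuple is built.
mask = SizedIterable(lambda: (g in ref for g in genes), len(genes))

print(len(mask))   # 5
print(list(mask))  # [False, True, False, False, True]
```

Even if a constructor consumes such an object directly, the resulting
BoolVector still holds one element per row on the R side; only the
intermediate Python tuple is saved.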
[well, technically there is a way, maybe using the R package
'iterators', but that would make it more like an extension to the
current rpy2 as long as 'iterators' is not part of the default R
distribution].

>> This can probably be addressed by adding a custom iterator for R vectors,
>> and should I appear a little slow to have it implemented you are welcome to
>> submit a patch. ;-)
> Depends on amount of knowledge of R internals required. ;) ;P

You are in luck then: almost no R internals are required to allow the
creation of generators-with-length from R vectors. You should be able to
put something working together in pure Python at the rpy2.robjects
level. A more definitive implementation will require C and be at the
rpy2.rinterface level.

Best,

Laurent

> Best regards,
>
> w
>
> ------------------------------------------------------------------------------
> BlackBerry® DevCon Americas, Oct. 18-20, San Francisco, CA
> Learn about the latest advances in developing for the
> BlackBerry® mobile platform with sessions, labs & more.
> See new tools and technologies. Register for BlackBerry® DevCon today!
> http://p.sf.net/sfu/rim-devcon-copy1
> _______________________________________________
> rpy-list mailing list
> rpy-list@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rpy-list