Hi Jan and TJ,

Thank you very much for your comments. Yes, I'm going to use fingerprints, but I was hoping to use UTL_RAW bitwise operations to handle them (we'll see how this goes). What worries me is that invoking structure matching via PYPL for each molecule would be slow. Do you see any way of doing it batchwise (for example, using Oracle's table functions)?

Best wishes,
Michal
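
A minimal sketch of the fingerprint screen discussed in this thread, written in plain Python so that it runs without RDKit, PYPL or Oracle. It is an illustration under stated assumptions, not anyone's actual implementation: the names `is_candidate` and `screen_batch` are hypothetical, fingerprints are assumed to be equal-length byte strings (as a RAW column might hold them), and the test `(query AND mol) = query` is the usual subset screen for substructure fingerprints. The whole hit list is filtered in one call rather than one call per molecule, which is the point of passing fingerprints in bulk.

```python
from typing import Iterable, List, Tuple


def is_candidate(query_fp: bytes, mol_fp: bytes) -> bool:
    """Return True if every bit set in query_fp is also set in mol_fp.

    This is only a screen: a molecule that passes may still fail the real
    substructure match, but one that fails cannot contain the query
    (assuming a substructure-type fingerprint).
    """
    if len(query_fp) != len(mol_fp):
        raise ValueError("fingerprints must have the same length")
    q = int.from_bytes(query_fp, "big")
    m = int.from_bytes(mol_fp, "big")
    return (q & m) == q


def screen_batch(query_fp: bytes, rows: Iterable[Tuple[int, bytes]]) -> List[int]:
    """Screen many (id, fingerprint) rows in a single call and return the
    ids that pass, preserving input order."""
    return [mol_id for mol_id, fp in rows if is_candidate(query_fp, fp)]


# Hypothetical toy data: 16-bit fingerprints for three molecules.
rows = [
    (1, bytes([0b1111, 0b0001])),
    (2, bytes([0b0101, 0b0000])),
    (3, bytes([0b0111, 0b0011])),
]
query = bytes([0b0101, 0b0001])
hits = screen_batch(query, rows)
```

Only the ids in `hits` would then need the expensive `HasSubstructMatch()` check. On the Oracle side the same comparison could presumably be done with `UTL_RAW.BIT_AND`, but that is not tested here.
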
On 13 March 2015 at 07:50, Jan Holst Jensen <[email protected]> wrote:
> Hi Michal and TJ,
>
> The nice thing about Postgres extensions is that they are loaded directly
> into the session's process space. Therefore the overhead is minimal, almost
> non-existing. Not so with Oracle cartridges/extensions that are loaded in a
> separate process, the extproc process.
>
> The overhead per call into PYPL is on the order of tens of microseconds,
> which could be a lot or not, depending on how many calls you do and what
> kind of calls.
>
> I have tried to do a naïve SSS search with PYPL and HasSubstructMatch() on a
> database of 70 000 compounds (seventy thousand) and it took several minutes
> to complete so it was not really usable. If you need any kind of speed you
> need to use fingerprints to find an initial hit list, and you need to pass
> fingerprints in bulk to PYPL to avoid too much call overhead.
>
>> Do consecutive pypl calls always share the same interpreter?
>
> On Oracle 10g and 11g, yes. I do have a disclaimer that it might not be the
> case if you run shared server, but in my experience even shared server
> ensures that each session gets its own private instance of an interpreter
> (its own extproc process). And, if you run a multi-threaded extproc
> configuration then there are no guarantees, but I don't know anyone who does
> that.
>
> On 12c I just don't know yet. The little I have done with it seems to
> indicate that it behaves like 10 and 11, so looking good so far.
>
> Cheers
> -- Jan
>
> On 2015-03-13 00:43, TJ O'Donnell wrote:
>
> I've implemented a suite of rdkit functions
> for postgres using plpython
> https://github.com/tjod/rdchord
> and the overhead is minimal
> since most of the heavy lifting of substructure searching
> is done by rdkit.
>
> I think the same would be true of oracle.
> -------------------
> TJ O'Donnell
>
> On Thu, Mar 12, 2015 at 4:24 PM, Michal Krompiec <[email protected]>
> wrote:
>>
>> Hello, has anybody tried to implement substructure searching in an Oracle
>> database using PYPL and RDKit? Is it just a matter of writing a wrapper
>> function for molecule.HasSubstructMatch(pattern) or is the overhead of
>> calling pypl each time too costly timewise? Do consecutive pypl calls always
>> share the same interpreter?
>> Best wishes,
>> Michal
>>
>>
>
_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss

