Here are my changes:
http://openbabel.svn.sourceforge.net/viewvc/openbabel/openbabel/trunk/src/formats/smilesformat.cpp?r1=4758&r2=4763&
Craig
On Thu, May 24, 2012 at 8:00 AM, Craig James <cja...@emolecules.com> wrote:
> On Thu, May 24, 2012 at 7:45 AM, Craig James <cja...@emolecules.com>wrote:
>
>> On Thu, May 24, 2012 at 6:03 AM, Noel O'Boyle <baoille...@gmail.com>wrote:
>>
>>> Looking at "blame" online, it seems that Tim made the change to allow
>>> ferrocene structures to be normalised for the eMolecule
>>> canonicalisation tests:
>>>
>>>
>>> http://openbabel.svn.sourceforge.net/viewvc/openbabel/openbabel/trunk/src/formats/smilesformat.cpp?r1=4033&r2=4034&
>>>
>>> IMO, structure normalisations like this are probably best done as an
>>> op (like the normalisation of dative bonds is currently) rather than
>>> inline in the smilesformat.
>>>
>>
>> Interesting. Even more interesting is that this ferrocene code was taken
>> out at some point.
>>
>>
>> http://openbabel.svn.sourceforge.net/viewvc/openbabel/openbabel/trunk/src/formats/smilesformat.cpp?r1=4034&r2=4758&
>>
>> I have a vague recollection of a discussion about the wisdom of doing
>> this in OpenBabel versus adding either normalization plug-ins or letting
>> the application decide how to do it, but I could be wrong.
>>
>> I'll look in detail at the rest of the changes to smilesformat.cpp and
>> see if I can spot any reason for copying the molecule. Since there's only
>> one BeginModify/EndModify anywhere in the file, and all it does is add
>> hydrogens, I suspect it will be safe to remove the copying operation.
>>
>
> I scanned through all the changes, and it seems to be nice improvements to
> stereo handling -- in particular, cis/trans double bonds on ring-closure
> digits, and correct chirality when there's just a lone pair instead of a
> hydrogen.
>
> The ferrocene code is gone. I suspect that when it was removed, the copy
> operation was left behind accidentally.
>
> Craig
>
>
>
>> In fact, whether or not it's safe, I have to do it. It's dramatically
>> slower for fullerenes, but even for ordinary molecules it slows me down by
>> a factor of 5 to 100.
>>
>> I should probably devise a test application to add to OpenBabel so that
>> this doesn't happen again. Is there some way to add a test to the system
>> where success is defined by performance rather than correct output?
>>
>> In fact, I tested it by removing the copy operation, and it improved the
>> speed dramatically ... the roughly 10,000 slowdown disappeared completely
>> on analyzing fullerenes.
>>
>> Craig
>>
>>>
>>> I haven't checked whether there are other modifications to the
>>> structure, but it would be easy to test with a before and after
>>> comparison.
>>>
>>> - Noel
>>>
>>> On 23 May 2012 23:50, Craig James <cja...@emolecules.com> wrote:
>>> > More information on this question...
>>> >
>>> > On Wed, May 23, 2012 at 3:05 PM, Craig James <cja...@emolecules.com>
>>> wrote:
>>> >>
>>> >> Hi,
>>> >>
>>> >> I've run into a case where one of my algorithms runs something like
>>> 10,000
>>> >> time slower in 2.3.x than it did in 2.2.x, and running valgrind with
>>> the
>>> >> "callgrind" tool seems to indicate that aromaticity detection is the
>>> >> problem.
>>> >>
>>> >> Here's the problem: Normally when you generate a SMILES, it detects
>>> >> aromaticity and you're done. But in my case, I'm generating
>>> thousands of
>>> >> "fragment SMILES" on the same molecule (taking various substructures
>>> and
>>> >> generating a SMILES for each structure using an OBBitVec of atoms).
>>> It
>>> >> appears that each time you ask for a SMILES, it discards the
>>> aromaticity and
>>> >> starts over, even though the molecule hasn't changed.
>>> >
>>> >
>>> > It looks like the source of the problem is here:
>>> >
>>> > SMIBaseFormat::WriteMolecule(OBBase* pOb,OBConversion* pConv)
>>> > {
>>> > OBMol* pmol = dynamic_cast<OBMol*>(pOb);
>>> > ....
>>> > OBMol mol = *pmol;
>>> >
>>> > In other words, it makes a fresh new copy of the molecule object every
>>> time
>>> > you ask for a SMILES. The operator= method copies most of the
>>> molecule's
>>> > data, but not aromaticity, so the aromaticity has to be recomputed
>>> every
>>> > time. Plus, the time to copy the molecule object is significant.
>>> >
>>> > This turns out to be a horrible performance hit for my algorithm. I'm
>>> > generating fingerprints using fragment SMILES, and each fragment that
>>> goes
>>> > into the fingerprint requires a canonical "fragment SMILES". As
>>> mentioned
>>> > above, for large systems like fullerenes it's about 10,000 times
>>> slower.
>>> > Even for ordinary molecules it's a huge performance hit.
>>> >
>>> > As far as I can tell, there is only one place where the molecule is
>>> > modified: explicit hydrogens are added to chiral atoms to make the
>>> analysis
>>> > easier.
>>> >
>>> > Is there some reason for this change? In 2.2.x the molecule wasn't
>>> copied.
>>> >
>>> > Thanks,
>>> > Craig
>>> >
>>> >>
>>> >>
>>> >> I haven't dug into the code yet ... I wanted to check to see if this
>>> makes
>>> >> sense to anyone. In 2.2.x this didn't happen, so it's a recent
>>> change, and
>>> >> I know several OB contributors made significant improvements to
>>> >> aromaticity. My "use case" is unusual so I'm not surprised this
>>> problem
>>> >> didn't show up in anyone else's app.
>>> >>
>>> >> It is dramatic with fullerenes -- a C70 molecule takes less than a
>>> second
>>> >> to analyze in 2.2.x and over five minutes in 2.3.x.
>>> >>
>>> >> Thanks,
>>> >> Craig
>>> >>
>>> >
>>> >
>>> >
>>> ------------------------------------------------------------------------------
>>> > Live Security Virtual Conference
>>> > Exclusive live event will cover all the ways today's security and
>>> > threat landscape has changed and how IT managers can respond.
>>> Discussions
>>> > will include endpoint security, mobile security and the latest in
>>> malware
>>> > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
>>> > _______________________________________________
>>> > OpenBabel-Devel mailing list
>>> > OpenBabel-Devel@lists.sourceforge.net
>>> > https://lists.sourceforge.net/lists/listinfo/openbabel-devel
>>> >
>>>
>>
>>
>
------------------------------------------------------------------------------
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and
threat landscape has changed and how IT managers can respond. Discussions
will include endpoint security, mobile security and the latest in malware
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
_______________________________________________
OpenBabel-Devel mailing list
OpenBabel-Devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openbabel-devel