Dear JP,
On Tue, May 10, 2011 at 11:30 AM, JP <[email protected]> wrote:
> Thanks for this.
> I, for one, think it is useful - not in a "parse this particular smiles
> string" fashion.
> But consider this use case. I have 8,000,000 molecules in a few hundred
> smiles files on which I am calculating descriptors on the cloud. I only
> have access to log files.
> I get some ten thousand "SMILES Parse Error" without any additional info.
> Also, I think this error should be just one line (no need to bloat log
> files with redundant static data).
Yeah, the use case is clear.
> These should have a bit of static info which is the same for both (so you
> can grep on that) and must have (on the same line) the offending smiles
> string, which you could extract easily with regex, so I suggest something
> structured like:
> In [2]: Chem.MolFromSmiles('Ccc1XXXcCCC')
> [06:06:25] SMILES Parse Error: Ccc1XXXcCCC (reason: unknown atoms X)
> In [3]: Chem.MolFromSmiles('C1C')
> [06:06:28] SMILES Parse Error: C1C (reason: unclosed ring for input)
Providing a good reason for the failure would certainly be useful in
some cases. It is theoretically possible, but it would require a lot of
work (there are many, many reasons a SMILES could fail to parse). I
think the initial version of this is going to have to just include the
SMILES that caused the failure. Adding explanations is something that
will need to wait.
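To make the log-scraping use case concrete, here is a minimal sketch of
how such one-line messages could be pulled out of log files. It assumes
the format proposed above ("[HH:MM:SS] SMILES Parse Error: <smiles>",
optionally followed by "(reason: ...)"), which is a suggestion in this
thread and not necessarily what RDKit prints. The names PARSE_ERROR_RE
and failed_smiles are made up for illustration.

```python
import re

# Hypothetical one-line format, following the proposal in this thread:
#   [HH:MM:SS] SMILES Parse Error: <smiles> (reason: <text>)
# The "(reason: ...)" suffix is treated as optional, since an initial
# version would carry only the offending SMILES.
PARSE_ERROR_RE = re.compile(
    r"^\[(?P<time>\d{2}:\d{2}:\d{2})\] SMILES Parse Error: "
    r"(?P<smiles>\S+)(?: \(reason: (?P<reason>.*)\))?\s*$"
)


def failed_smiles(lines):
    """Yield (smiles, reason) for each parse-error line; reason may be None."""
    for line in lines:
        match = PARSE_ERROR_RE.match(line)
        if match:
            yield match.group("smiles"), match.group("reason")


# Sample log lines in the assumed format, plus one unrelated line.
log = [
    "[06:06:25] SMILES Parse Error: Ccc1XXXcCCC (reason: unknown atoms X)",
    "[06:06:28] SMILES Parse Error: C1C",
    "[06:06:30] some unrelated message",
]
results = list(failed_smiles(log))
```

Because a SMILES string contains no whitespace, matching it with \S+ is
enough to separate it from any trailing explanation, and the same
pattern keeps working if reasons are added later.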
Best,
-greg
_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss