Re: [Rdkit-devel] polymers

2014-12-30 Thread Igor Filippov
Thank you, Greg!
Perhaps I should have posted this on rdkit-discuss; there usually seems to
be more traffic there, and the topic would get more notice.

Igor

On Tue, Dec 30, 2014 at 12:55 PM, Greg Landrum 
wrote:

>
>
> On Mon, Dec 29, 2014 at 12:42 PM, Igor Filippov wrote:
>
>> Greg,
>>
>> Thanks for the fast reply, as always!
>>
>> > I could imagine a couple solutions to this:
>> > 1) adding additional arguments to the mol file parser that allow
>> > calling code to specify that they are willing to accept polymers and then
>> > using some new data structure to return info about the polymer.
>> > 2) extending the applicability of the "strictParsing" flag (this
>> > already exists) to disable the tests for S groups and either just ignore
>> > them or return them as molecule properties.
>>
>> I think I would prefer 1) personally but I can live with 2) if that's
>> what the community chooses.
>> How difficult would it be to implement?
>>
>
> 1) is probably not a huge amount of work, once we figure out the
> appropriate data structure for the polymer info, but it's certainly more
> than 2), which probably only takes an hour or so.
>
> Given that the polymer problem is a somewhat larger one, and it would be
> nice to solve "right", I'd be inclined to go with 2) as an interim solution
> and start a conversation around the polymer representation if there's real
> interest there.
>
> -greg
>
>
--
Dive into the World of Parallel Programming! The Go Parallel Website,
sponsored by Intel and developed in partnership with Slashdot Media, is your
hub for all things parallel software development, from weekly thought
leadership blogs to news, videos, case studies, tutorials and more. Take a
look and join the conversation now. http://goparallel.sourceforge.net
Rdkit-devel mailing list
Rdkit-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-devel


Re: [Rdkit-devel] polymers

2014-12-30 Thread Greg Landrum
On Mon, Dec 29, 2014 at 12:42 PM, Igor Filippov 
wrote:

> Greg,
>
> Thanks for the fast reply, as always!
>
> > I could imagine a couple solutions to this:
> > 1) adding additional arguments to the mol file parser that allow
> > calling code to specify that they are willing to accept polymers and then
> > using some new data structure to return info about the polymer.
> > 2) extending the applicability of the "strictParsing" flag (this already
> > exists) to disable the tests for S groups and either just ignore them or
> > return them as molecule properties.
>
> I think I would prefer 1) personally but I can live with 2) if that's
> what the community chooses.
> How difficult would it be to implement?
>

1) is probably not a huge amount of work, once we figure out the
appropriate data structure for the polymer info, but it's certainly more
than 2), which probably only takes an hour or so.

Given that the polymer problem is a somewhat larger one, and it would be
nice to solve "right", I'd be inclined to go with 2) as an interim solution
and start a conversation around the polymer representation if there's real
interest there.

-greg


Re: [Rdkit-devel] polymers

2014-12-29 Thread Igor Filippov
Greg,

Thanks for the fast reply, as always!

> I could imagine a couple solutions to this:
> 1) adding additional arguments to the mol file parser that allow calling
> code to specify that they are willing to accept polymers and then using
> some new data structure to return info about the polymer.
> 2) extending the applicability of the "strictParsing" flag (this already
> exists) to disable the tests for S groups and either just ignore them or
> return them as molecule properties.

I think I would prefer 1) personally but I can live with 2) if that's what
the community chooses.
How difficult would it be to implement?

Igor


Re: [Rdkit-devel] polymers

2014-12-28 Thread Greg Landrum
Hi Igor,

On Mon, Dec 29, 2014 at 3:43 AM, Igor Filippov 
wrote:

> I was wondering how complicated it would be to add
> the ability to read polymers from molfiles. Right now I am getting
> something like this:
> Unhandled CTAB feature: S group SRU on line: 75. Molecule skipped
>

The RDKit doesn't do much with most information from the S Group section of
CTABs. SRU is a special case because it's a clear indication that the CTAB
contains information about something that the RDKit cannot currently
properly represent: a polymer. Rather than constructing a molecule which is
definitely wrong, the code generates an error. This is the usual RDKit
approach. What is at least somewhat different with this case is that there
is no way to disable the check.
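
To make the check described above concrete, here is a minimal sketch in
plain Python of how a reader could spot an SRU S group in a V2000 mol block.
It is not the RDKit's actual code: the function names are hypothetical, and
it assumes the "M  STY" fields can be split on whitespace, which holds for
typical files but not for every fixed-width edge case.

```python
def sgroup_types(mol_block):
    """Map S group index -> type code (e.g. 'SRU') from V2000 'M  STY' lines."""
    types = {}
    for line in mol_block.splitlines():
        if not line.startswith("M  STY"):
            continue
        fields = line[6:].split()
        # First field: number of entries on this line; index/type pairs follow.
        count = int(fields[0])
        pairs = fields[1:1 + 2 * count]
        for idx, code in zip(pairs[0::2], pairs[1::2]):
            types[int(idx)] = code
    return types


def check_supported(mol_block):
    """Refuse blocks that declare a structural repeat unit (hypothetical helper)."""
    for idx, code in sgroup_types(mol_block).items():
        if code == "SRU":
            raise ValueError(f"Unhandled CTAB feature: S group SRU (S group {idx})")


# Only the property lines matter for this sketch, so the atom block is omitted.
EXAMPLE_SGROUP_LINES = "M  STY  2   1 SRU   2 DAT\nM  END"
```

Calling check_supported(EXAMPLE_SGROUP_LINES) raises ValueError, mirroring
the "refuse rather than build something wrong" behaviour.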


> What I would prefer:
> 1) The molecule is read in and some kind of flag is set to signify that it
> is a polymer
> 2) the position of the brackets is saved in some structure a user can query
>
> A crude way to achieve "1" would be to just skip the "M  STY" and similar
> lines
> while setting "is_polymer" flag, not sure if this is the right approach
> though.
>

The problem with having the standard mol block parser set an isPolymer flag
by default is that code expecting "normal" molecules would always have to
check it in order to ensure that they aren't getting polymers.

I could imagine a couple solutions to this:
1) adding additional arguments to the mol file parser that allow calling
code to specify that they are willing to accept polymers and then using
some new data structure to return info about the polymer.
2) extending the applicability of the "strictParsing" flag (this already
exists) to disable the tests for S groups and either just ignore them or
return them as molecule properties.
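
As a rough illustration of what option 1) might look like, the sketch below
uses an opt-in argument plus a small data structure for the polymer info.
Everything here is an assumption made for illustration: the RepeatUnit
class, the read_repeat_units function and the accept_polymers argument are
hypothetical names, not part of the RDKit. It also assumes whitespace-
separated V2000 "M  STY", "M  SAL" and "M  SDI" lines, with STY appearing
before the other two.

```python
from dataclasses import dataclass, field


@dataclass
class RepeatUnit:
    """Hypothetical container for one SRU S group."""
    index: int
    atoms: list = field(default_factory=list)     # 1-based atom indices ('M  SAL')
    brackets: list = field(default_factory=list)  # (x1, y1, x2, y2) per 'M  SDI' line


def read_repeat_units(mol_block, accept_polymers=False):
    """Collect SRU S groups; raise unless the caller opted in to polymers."""
    units = {}
    for line in mol_block.splitlines():
        tag, fields = line[:6], line[6:].split()
        if tag == "M  STY":
            count = int(fields[0])
            pairs = fields[1:1 + 2 * count]
            for idx, code in zip(pairs[0::2], pairs[1::2]):
                if code == "SRU":
                    units[int(idx)] = RepeatUnit(int(idx))
        elif tag == "M  SAL" and int(fields[0]) in units:
            n_atoms = int(fields[1])
            units[int(fields[0])].atoms.extend(int(a) for a in fields[2:2 + n_atoms])
        elif tag == "M  SDI" and int(fields[0]) in units:
            # Four coordinates describe one bracket as a line segment.
            coords = tuple(float(v) for v in fields[2:6])
            units[int(fields[0])].brackets.append(coords)
    if units and not accept_polymers:
        raise ValueError("mol block contains SRU S groups; pass accept_polymers=True")
    return [units[k] for k in sorted(units)]


# Only the S group property lines are shown; a real block has atoms and bonds too.
EXAMPLE_POLYMER_LINES = "\n".join([
    "M  STY  1   1 SRU",
    "M  SAL   1  2   2   3",
    "M  SDI   1  4    1.5000   -0.5000    1.5000    0.5000",
    "M  SDI   1  4    3.5000    0.5000    3.5000   -0.5000",
    "M  END",
])
```

Code that expects "normal" molecules keeps the default and gets an error;
code that can handle polymers passes accept_polymers=True and can query the
atoms and bracket positions of each repeat unit.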


> Commercial packages seem to be able to handle this - cactvs, chemaxon,
> accelrys draw,
> so there should be no technical reason RDKit cannot read such files.
>

Ignoring the S group is easy. :-)

-greg


[Rdkit-devel] polymers

2014-12-28 Thread Igor Filippov
I was wondering how complicated it would be to add
the ability to read polymers from molfiles. Right now I am getting
something like this:
Unhandled CTAB feature: S group SRU on line: 75. Molecule skipped

What I would prefer:
1) The molecule is read in and some kind of flag is set to signify that it
is a polymer
2) the position of the brackets is saved in some structure a user can query

A crude way to achieve "1" would be to just skip the "M  STY" and similar
lines
while setting "is_polymer" flag, not sure if this is the right approach
though.
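
The crude approach could be sketched in plain Python as a pre-processing
step applied before the block is handed to any parser. This is only an
illustration under stated assumptions: strip_sgroups is a hypothetical
helper, and SGROUP_TAGS is a hand-written list of V2000 S group property
tags that may well be incomplete.

```python
# V2000 property tags assumed to belong to S groups (possibly incomplete).
SGROUP_TAGS = (
    "STY", "SST", "SLB", "SCN", "SDS", "SAL", "SBL", "SPA", "SMT", "SPL",
    "SNC", "SBV", "SDT", "SDD", "SCD", "SED", "SDI", "SAP", "SCL", "SBT",
)


def strip_sgroups(mol_block):
    """Drop S group property lines; return (cleaned_block, is_polymer)."""
    kept = []
    is_polymer = False
    for line in mol_block.splitlines():
        if line.startswith("M  ") and line[3:6] in SGROUP_TAGS:
            # An SRU entry on an 'M  STY' line marks a structural repeat unit.
            if line[3:6] == "STY" and "SRU" in line[6:].split():
                is_polymer = True
            continue
        kept.append(line)
    return "\n".join(kept), is_polymer


# Property lines only; the header, counts line, atoms and bonds are omitted.
EXAMPLE_PROPERTY_LINES = "\n".join([
    "M  CHG  1   1  -1",
    "M  STY  1   1 SRU",
    "M  SAL   1  1   2",
    "M  END",
])

cleaned, is_polymer = strip_sgroups(EXAMPLE_PROPERTY_LINES)
```

The cleaned block then describes only the atoms and bonds of the repeat
unit, so the caller has to check is_polymer before treating the result as
an ordinary molecule.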

Commercial packages seem to be able to handle this - cactvs, chemaxon,
accelrys draw,
so there should be no technical reason RDKit cannot read such files.

Happy New Year to everybody!
Igor