Re: [Rdkit-devel] polymers
Thank you, Greg!

Perhaps I should have posted this on rdkit-discuss; there usually seems to be more traffic there, and the topic would get more notice.

Igor

On Tue, Dec 30, 2014 at 12:55 PM, Greg Landrum wrote:
> On Mon, Dec 29, 2014 at 12:42 PM, Igor Filippov wrote:
>> Greg,
>>
>> Thanks for the fast reply, as always!
>>
>>> I could imagine a couple solutions to this:
>>> 1) adding additional arguments to the mol file parser that allow calling code to specify that they are willing to accept polymers and then using some new data structure to return info about the polymer.
>>> 2) extending the applicability of the "strictParsing" flag (this already exists) to disable the tests for S groups and either just ignore them or return them as molecule properties.
>>
>> I think I would prefer 1) personally but I can live with 2) if that's what the community chooses.
>> How difficult would it be to implement?
>
> 1) is probably not a huge amount of work, once we figure out the appropriate data structure for the polymer info, but it's certainly more than 2), which probably only takes an hour or so.
>
> Given that the polymer problem is a somewhat larger one, and it would be nice to solve "right", I'd be inclined to go with 2) as an interim solution and start a conversation around the polymer representation if there's real interest there.
>
> -greg

--
Dive into the World of Parallel Programming! The Go Parallel Website, sponsored by Intel and developed in partnership with Slashdot Media, is your hub for all things parallel software development, from weekly thought leadership blogs to news, videos, case studies, tutorials and more. Take a look and join the conversation now.
http://goparallel.sourceforge.net
_______________________________________________
Rdkit-devel mailing list
Rdkit-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-devel
Re: [Rdkit-devel] polymers
On Mon, Dec 29, 2014 at 12:42 PM, Igor Filippov wrote:
> Greg,
>
> Thanks for the fast reply, as always!
>
>> I could imagine a couple solutions to this:
>> 1) adding additional arguments to the mol file parser that allow calling code to specify that they are willing to accept polymers and then using some new data structure to return info about the polymer.
>> 2) extending the applicability of the "strictParsing" flag (this already exists) to disable the tests for S groups and either just ignore them or return them as molecule properties.
>
> I think I would prefer 1) personally but I can live with 2) if that's what the community chooses.
> How difficult would it be to implement?

1) is probably not a huge amount of work, once we figure out the appropriate data structure for the polymer info, but it's certainly more than 2), which probably only takes an hour or so.

Given that the polymer problem is a somewhat larger one, and it would be nice to solve "right", I'd be inclined to go with 2) as an interim solution and start a conversation around the polymer representation if there's real interest there.

-greg
Re: [Rdkit-devel] polymers
Greg,

Thanks for the fast reply, as always!

> I could imagine a couple solutions to this:
> 1) adding additional arguments to the mol file parser that allow calling code to specify that they are willing to accept polymers and then using some new data structure to return info about the polymer.
> 2) extending the applicability of the "strictParsing" flag (this already exists) to disable the tests for S groups and either just ignore them or return them as molecule properties.

I think I would prefer 1) personally but I can live with 2) if that's what the community chooses.
How difficult would it be to implement?

Igor
Re: [Rdkit-devel] polymers
Hi Igor,

On Mon, Dec 29, 2014 at 3:43 AM, Igor Filippov wrote:
> I was wondering how complicated it would be to add the ability to read polymers from molfiles. Right now I am getting something like this:
> Unhandled CTAB feature: S group SRU on line: 75. Molecule skipped

The RDKit doesn't do much with most information from the S Group section of CTABs. SRU is a special case because it's a clear indication that the CTAB contains information about something that the RDKit cannot currently properly represent: a polymer. Rather than constructing a molecule which is definitely wrong, the code generates an error. This is the usual RDKit approach. What is at least somewhat different with this case is that there is no way to disable the check.

> What I would prefer:
> 1) The molecule is read in and some kind of flag is set to signify that it is a polymer
> 2) the position of the brackets is saved in some structure a user can query
>
> A crude way to achieve "1" would be to just skip the "M STY" and similar lines while setting "is_polymer" flag, not sure if this is the right approach though.

The problem with having the standard mol block parser set an isPolymer flag by default is that code expecting "normal" molecules would always have to check it in order to ensure that it isn't getting polymers.

I could imagine a couple solutions to this:
1) adding additional arguments to the mol file parser that allow calling code to specify that they are willing to accept polymers and then using some new data structure to return info about the polymer.
2) extending the applicability of the "strictParsing" flag (this already exists) to disable the tests for S groups and either just ignore them or return them as molecule properties.

> Commercial packages seem to be able to handle this - cactvs, chemaxon, accelrys draw, so there should be no technical reason RDKit cannot read such files.

Ignoring the S group is easy. :-)

-greg
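
For readers following the thread, option 1) above could look roughly like the sketch below. This is only an illustration in plain Python, not RDKit code: the names `PolymerInfo`, `PolymerNotAccepted`, `scan_polymer_info` and the `accept_polymers` argument are all hypothetical. It assumes a V2000 mol block whose `M  STY` and `M  SDI` fields are separated by whitespace, which holds for typical files but is not guaranteed by the fixed-width format.

```python
from dataclasses import dataclass, field


class PolymerNotAccepted(ValueError):
    """Raised when an SRU S group is found and the caller did not opt in."""


@dataclass
class PolymerInfo:
    # Hypothetical container: S group index -> type string (e.g. "SRU").
    sgroup_types: dict = field(default_factory=dict)
    # Hypothetical container: S group index -> list of (x1, y1, x2, y2) brackets.
    brackets: dict = field(default_factory=dict)


def scan_polymer_info(mol_block, accept_polymers=False):
    """Collect S group types and bracket coordinates from a V2000 mol block.

    By default an SRU S group is an error, so code that expects "normal"
    molecules never receives a polymer without asking for it.
    """
    info = PolymerInfo()
    for lineno, line in enumerate(mol_block.splitlines(), start=1):
        if line.startswith("M  STY"):
            # Layout: M  STY <count> <index> <type> [<index> <type> ...]
            fields = line.split()
            for idx, kind in zip(fields[3::2], fields[4::2]):
                if kind == "SRU" and not accept_polymers:
                    raise PolymerNotAccepted(
                        f"S group SRU on line: {lineno}"
                    )
                info.sgroup_types[int(idx)] = kind
        elif line.startswith("M  SDI"):
            # Layout: M  SDI <index> 4 <x1> <y1> <x2> <y2>, one bracket per line.
            fields = line.split()
            coords = tuple(float(v) for v in fields[4:8])
            info.brackets.setdefault(int(fields[2]), []).append(coords)
    return info
```

The design point is the one made above: the polymer data is only returned when the caller passes `accept_polymers=True`, so existing callers keep getting an error instead of a silently wrong molecule.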
[Rdkit-devel] polymers
I was wondering how complicated it would be to add the ability to read polymers from molfiles. Right now I am getting something like this:

Unhandled CTAB feature: S group SRU on line: 75. Molecule skipped

What I would prefer:
1) The molecule is read in and some kind of flag is set to signify that it is a polymer
2) the position of the brackets is saved in some structure a user can query

A crude way to achieve "1" would be to just skip the "M STY" and similar lines while setting "is_polymer" flag, not sure if this is the right approach though.

Commercial packages seem to be able to handle this - cactvs, chemaxon, accelrys draw, so there should be no technical reason RDKit cannot read such files.

Happy New Year to everybody!

Igor
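
The "crude way" described in this message can be tried outside the parser as a pre-processing step. The sketch below is plain Python and does not touch RDKit; `strip_sgroup_lines` and `SGROUP_TAGS` are hypothetical names, and the tag list is an assumption about which V2000 property lines belong to S groups. It also assumes the fields on the `M  STY` line are separated by whitespace.

```python
# Property-line tags assumed to belong to S groups in a V2000 mol block.
SGROUP_TAGS = (
    "STY", "SST", "SLB", "SCN", "SDS", "SAL", "SBL", "SPA", "SMT", "SPL",
    "SNC", "SBV", "SDT", "SDD", "SCD", "SED", "SDI", "SAP", "SCL", "SBT",
)
SGROUP_PREFIXES = tuple("M  " + tag for tag in SGROUP_TAGS)


def strip_sgroup_lines(mol_block):
    """Drop S group lines and report whether an SRU S group was present.

    Returns (cleaned_block, is_polymer). The cleaned block describes only
    the drawn atoms and bonds, so for a polymer it is not a correct
    representation of the substance.
    """
    kept = []
    is_polymer = False
    for line in mol_block.splitlines():
        if line.startswith("M  STY"):
            # Layout: M  STY <count> <index> <type> [<index> <type> ...]
            if "SRU" in line.split()[4::2]:
                is_polymer = True
            continue
        if line.startswith(SGROUP_PREFIXES):
            continue
        kept.append(line)
    return "\n".join(kept), is_polymer
```

A caller would pass the cleaned block to whatever mol block reader it uses and carry `is_polymer` alongside the result, keeping in mind the caveat from the reply that the resulting molecule is only the repeat unit as drawn.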