Re: [Rdkit-discuss] SDF tags and -
On 2015-04-29 23:08, Greg Landrum wrote:

> Here are my thoughts on this: The RDKit is usually strict while parsing molecules from SDF, SMILES, or other formats.

My point was that given

'''
> <my_property2>
1234

> <my_property3>
'''

a lexer shouldn't have a problem recognizing the 2 tags. A lenient parser would return the stuff in between as the value: 1234\n\n

> There are exceptions to this: the RDKit ignores the limit on line length while reading SDFs: there's no chance of confusion here, so I believe it's safe to do so.

Similarly, a lenient parser could ignore the line length and value length limits.

> I still need to put some thought into patching the SDWriter so that it can recognize things like consecutive line endings in property values. The big question is what it should do when it encounters such a case. Is that an error? Should it just write the output up to the blank line?

A conservative writer should never write out 1234\n\n. Squash the multiple newlines. And/or give it a strict flag that makes it error out instead.

I'm sure Andrew's seen a lot of badly broken SDFs. It doesn't mean you can't handle the ones you can unambiguously parse.

Dimitri
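A minimal sketch of the lenient lexing described above, in plain Python 3 without RDKit. The function name `lenient_data_items` and the header regular expression are assumptions made for illustration; this is not how SDMolSupplier is implemented. It treats every line that looks like a data header as a tag and returns everything up to the next header as the value, dropping only the trailing blank lines.

```python
import re

# Assumption: a data header is a line starting with '>' that carries <tagname>.
_HEADER = re.compile(r"^>.*<([^>]+)>")


def _finish(lines):
    """Join value lines, dropping the trailing blank line(s) that end the item."""
    while lines and not lines[-1].strip():
        lines.pop()
    return "\n".join(lines)


def lenient_data_items(data_block):
    """Return {tag: value} for the text that follows 'M  END' in one record.

    Blank lines inside a value are kept, which is more lenient than the
    specification allows. A value line that itself looks like a header, or
    that is exactly '$$$$', is still misread; nothing can be done about that.
    """
    items = {}
    tag = None
    lines = []
    for line in data_block.splitlines():
        if line.strip() == "$$$$":  # end of the record
            break
        match = _HEADER.match(line)
        if match:
            if tag is not None:
                items[tag] = _finish(lines)
            tag, lines = match.group(1), []
        elif tag is not None:
            lines.append(line)
    if tag is not None:
        items[tag] = _finish(lines)
    return items
```

With this approach a value such as X, two blank lines, Y survives as one value instead of being cut at the first blank line, at the price of accepting input that a strict reader would reject.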
Re: [Rdkit-discuss] SDF tags and -
Ahh ok… Interesting way to format a file! Got to love ChemAxon...

Best,
Nick

Nicholas C. Firth | PhD Student | Cancer Therapeutics
The Institute of Cancer Research | 15 Cotswold Road | Belmont | Sutton | Surrey | SM2 5NG
E nicholas.fi...@icr.ac.uk | W www.icr.ac.uk

On 29 Apr 2015, at 12:23, Paolo Tosco paolo.to...@unito.it wrote:

> Hi Nick,
>
> newlines in data properties are fine, but they should not include blank lines (i.e., multiple newlines). For example, in:
>
>     > <my_property1>
>     1
>
>     2
>     3
>     4
>
>     > <my_property2>
>     1234
>
>     > <my_property3>
>     5678
>
> my_property1 will be truncated to just 1. Based on the specifications, if you want to include a blank line, it should actually be either a space or a \t, rather than being completely blank.
>
> Cheers,
> Paolo
>
> On 04/29/15 12:16, Nicholas Firth wrote:
>
>> I use SD files with new lines in the properties quite frequently (inherited from Pipeline Pilot's merge function) and I've never had a problem reading them. I've attached an SD file that works fine for me.
>>
>>     In [2]: suppl = Chem.SDMolSupplier('/Volumes/nfirth/tempf.sdf')
>>     In [3]: m = suppl[0]
>>     In [4]: t = m.GetProp('genNum')
>>     In [5]: print t
>>     1
>>     2
>>     3
>>     4
>>     In [6]: print t.split('\n')
>>     ['1', '2', '3', '4']
>>
>> So I guess the problem is in the writer?
>>
>> Best,
>> Nick
Re: [Rdkit-discuss] SDF tags and -
Dear all,

Indeed, as Riccardo mentions, according to the specifications in CTfile.pdf a property should be truncated after the first blank line. This is also what other SDF parsers I have tried actually do. What I noticed is that other SDF parsers are tolerant of spurious lines not starting with a '>', either blank or containing characters. Currently the RDKit isn't tolerant on read, while it is on write. I think the easiest solution is to make the SDF parser more tolerant in such cases, printing a warning rather than throwing an exception. I have just submitted a pull request about it - feel free to ignore it if you do not agree with me!

Cheers,
Paolo

On 04/29/15 11:27, Tuomo Kalliokoski wrote:

> Hello Riccardo,
>
> That sounds like a very reasonable solution to the issue.
>
> [I replied to rdkit-discuss to bring this thread on the list back again]
>
> Best regards,
> Tuomo
>
> From: riccardo.viane...@gmail.com
> Date: Wed, 29 Apr 2015 12:08:48 +0200
> Subject: Re: [Rdkit-discuss] SDF tags and -
> To: tkall...@live.com
>
>> Hi Tuomo,
>>
>> yes, I agree the behavior seems a bit inconsistent. I suppose that if the correctness of the parser is confirmed, then a change could be suggested for the writer, consisting in raising an error if blank lines are present inside the data item.
>>
>> [but once again, I didn't notice the default reply-to settings of rdkit-discuss and accidentally brought the thread off-list, sorry.]
>>
>> Regards,
>> Riccardo
>>
>> On Wed, Apr 29, 2015 at 11:46 AM, Tuomo Kalliokoski tkall...@live.com wrote:
>>
>>> Hello Riccardo,
>>>
>>> Thanks for the swift reply! Indeed, it is the extra line-feed, not the -. It was just around the same line where I had the issue, so it got me confused.
>>>
>>> I suppose the current functionality of RDKit, irrespective of the SDF file format specifications, is a bit odd: SDWriter produces a file that SDMolSupplier can't handle.
>>>
>>> Best regards,
>>> Tuomo
>>>
>>> From: riccardo.viane...@gmail.com
>>> Date: Wed, 29 Apr 2015 11:33:14 +0200
>>> Subject: Re: [Rdkit-discuss] SDF tags and -
>>> To: tkall...@live.com
>>>
>>>> Hi Tuomo,
>>>>
>>>> On Wed, Apr 29, 2015 at 10:47 AM, Tuomo Kalliokoski tkall...@live.com wrote:
>>>>
>>>>> I have got a bunch of SDF-files with molecules and some long descriptions in SDF-tags on them that include stuff like - inside. These files have been produced by ChemAxon's software and are handled fine by their software. Such files can be written out also from RDKit 2014_09_02, but they fail when you try to read them in.
>>>>
>>>> I suspect the parse error could be independent from the -, but due to the blank line (\n\n) that appears inside the TESTFIELD data:
>>>>
>>>>     mol.SetProp("TESTFIELD","This should not work - Let's see\n\nI guess this is not visible\n")
>>>>
>>>> and that is interpreted as the data item terminator. Iirc this interpretation is compliant with the specifications for the SDF file format, but I could be mistaken.
>>>>
>>>> Best regards,
>>>> Riccardo
Re: [Rdkit-discuss] SDF tags and -
Riccardo Vianello:

> I suppose that if the correctness of the parser is confirmed, then a change could be suggested for the writer, consisting in raising an error if blank lines are present inside the data item.

Yes, the SD tag data is not a general purpose data field. It's not possible, for example, to embed the contents of an SD file as a value. According to the spec:

    A [Data] value may extend over multiple lines containing up to 200 characters each. A blank line terminates each data item.

The failure cases are: a data value that starts with a newline, or ends with a newline, or contains two or more successive newline characters. Some file readers, like Python's universal mode, will normalize \r\n to \n on input, so \r\n\r\n and variants are also problematic.

However, experience shows that there are many incorrectly written SD parsers. Some think that the data extends until the next line that starts with '>' or line that is '$$$$'. For example, one organization decided to use '$', '$$', '$$$', and '$$$$' as rough estimates of the cost of a compound. It might have looked like this:

    M  END
    > <cost>
    $$$$

    > <MW>
    123.45
    ...

Some non-compliant parsers interpret the '$$$$' as the end of the compound record, rather than the data value it's supposed to be.

In practice, enough SD parsers are broken in this regard that the best practice for a workflow is to avoid those byte sequences if at all possible. A common solution to store free-form data is to use base64 encoding.

    >>> data = "\nHello\n\n> <and>\ngoodbye"*10
    >>> print(data.encode("base64"))
    CkhlbGxvCgo+IDxhbmQ+Cmdvb2RieWUKSGVsbG8KCj4gPGFuZD4KZ29vZGJ5ZQpIZWxsbwoKPiA8
    YW5kPgpnb29kYnllCkhlbGxvCgo+IDxhbmQ+Cmdvb2RieWUKSGVsbG8KCj4gPGFuZD4KZ29vZGJ5
    ZQpIZWxsbwoKPiA8YW5kPgpnb29kYnllCkhlbGxvCgo+IDxhbmQ+Cmdvb2RieWUKSGVsbG8KCj4g
    PGFuZD4KZ29vZGJ5ZQpIZWxsbwoKPiA8YW5kPgpnb29kYnllCkhlbGxvCgo+IDxhbmQ+Cmdvb2Ri
    eWU=

This also ensures that the line length never exceeds 200 characters.
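The base64 approach can be sketched for Python 3, where `str.encode("base64")` is no longer available and the standard `base64` module is used instead. The helper names `encode_tag_value` and `decode_tag_value` are hypothetical, and UTF-8 is an assumed text encoding, since the SD format itself has no way to declare one.

```python
import base64


def encode_tag_value(text):
    """Encode arbitrary text so it is safe to store as an SD data value.

    base64.encodebytes() breaks the output into lines of at most 76
    characters; the trailing newline is stripped so that the value does
    not end with a newline.
    """
    raw = text.encode("utf-8")  # assumed encoding
    return base64.encodebytes(raw).decode("ascii").rstrip("\n")


def decode_tag_value(encoded):
    """Reverse encode_tag_value(); embedded line breaks are ignored."""
    return base64.b64decode(encoded).decode("utf-8")
```

Because the base64 alphabet contains neither '>' nor '$', no encoded line can be mistaken for a data header or a record terminator, and the encoded value never contains a blank line.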
(I have no experience on what sort of errors might occur should the line length exceed 200 characters.)

The SD format has no way to indicate the encoding, so the information about how tag data is encoded must be passed through other means. This is unfortunate.

And while I'm here, it's also a bad idea to have a \0 (NUL) in the data, as some tools use C's string functions on the assumption that a \0 does not exist. RDKit will write the \0:

    >>> from rdkit import Chem
    >>> mol = Chem.MolFromSmiles("C")
    >>> mol.SetProp("abc", "x\0z")
    >>> mol.GetProp("abc")
    'x\x00z'
    >>> writer = Chem.SDWriter("tmp.sdf")
    >>> writer.write(mol)
    >>> writer.close()
    >>> content = open("tmp.sdf").read()
    >>> "\0" in content
    True

On Apr 29, 2015, at 12:27 PM, Tuomo Kalliokoski wrote:

> I suppose the current functionality of RDKit, irrespective of the SDF file format specifications, is a bit odd: SDWriter produces a file that SDMolSupplier can't handle.

All of the toolkits I've used have the same behavior. They trust that the user of the API knows to not pass arbitrary data as the value.

ChemAxon's toolkit, as you saw, can produce invalid SD files. You might see what happens if you add "x\n\n> <data2>\nvalue" as the value; I suspect you'll end up with a new data2 tag. I know that Open Babel and OEChem will also forward the value unchanged.

I can see the argument that RDKit should check for \n\n in the data. What should it do? It's built on the GetProp/SetProp mechanism, which allows arbitrary string data, and it's reasonable to SetProp() a value containing a \n\n for purposes other than writing to an SD file, so it has to be done in the reader or writer layer.

Here are some possibilities for changing the writer:

  - stop writing to the file, with an error
  - skip records which contain tags with forbidden values
  - skip tags which contain forbidden values
  - convert multiple newlines into one (including the edge cases)
  - also enforce the 200 character restriction
  - also enforce a check for well-known legal but ill-advised character sequences like a line starting with '>', or starting with '$$$$', or containing a \0.

Paolo's suggestion is to change the reader to be more lenient:

    -throw FileParseException("Problems encountered parsing data fields");
    +BOOST_LOG(rdWarningLog) << "Ignoring spurious lines encountered parsing data fields" << std::endl;

While it's true that this would be lenient, it wouldn't handle Tuomo's problem. Tuomo has the following:

    > <data1>
    X


    Y

    > <data2>
    Whatever

Tuomo wants data1 to become X\n\n\nY. Paolo's patch will set data1 to X, and generate a warning for the spurious Y but otherwise ignore it. I do not believe this is any better for Tuomo than what RDKit does now -- actually, it's worse because it's data loss and people tend to ignore warnings.

I think the problem is, what should the writer do if given data which cannot be represented as an SD data value? Suppose that one of Tuomo's
Re: [Rdkit-discuss] SDF tags and -
Hello Riccardo,

That sounds like a very reasonable solution to the issue.

[I replied to rdkit-discuss to bring this thread on the list back again]

Best regards,
Tuomo

From: riccardo.viane...@gmail.com
Date: Wed, 29 Apr 2015 12:08:48 +0200
Subject: Re: [Rdkit-discuss] SDF tags and -
To: tkall...@live.com

> Hi Tuomo,
>
> yes, I agree the behavior seems a bit inconsistent. I suppose that if the correctness of the parser is confirmed, then a change could be suggested for the writer, consisting in raising an error if blank lines are present inside the data item.
>
> [but once again, I didn't notice the default reply-to settings of rdkit-discuss and accidentally brought the thread off-list, sorry.]
>
> Regards,
> Riccardo
>
> On Wed, Apr 29, 2015 at 11:46 AM, Tuomo Kalliokoski tkall...@live.com wrote:
>
>> Hello Riccardo,
>>
>> Thanks for the swift reply! Indeed, it is the extra line-feed, not the -. It was just around the same line where I had the issue, so it got me confused.
>>
>> I suppose the current functionality of RDKit, irrespective of the SDF file format specifications, is a bit odd: SDWriter produces a file that SDMolSupplier can't handle.
>>
>> Best regards,
>> Tuomo
>>
>> From: riccardo.viane...@gmail.com
>> Date: Wed, 29 Apr 2015 11:33:14 +0200
>> Subject: Re: [Rdkit-discuss] SDF tags and -
>> To: tkall...@live.com
>>
>>> Hi Tuomo,
>>>
>>> On Wed, Apr 29, 2015 at 10:47 AM, Tuomo Kalliokoski tkall...@live.com wrote:
>>>
>>>> I have got a bunch of SDF-files with molecules and some long descriptions in SDF-tags on them that include stuff like - inside. These files have been produced by ChemAxon's software and are handled fine by their software. Such files can be written out also from RDKit 2014_09_02, but they fail when you try to read them in.
>>>
>>> I suspect the parse error could be independent from the -, but due to the blank line (\n\n) that appears inside the TESTFIELD data:
>>>
>>>     mol.SetProp("TESTFIELD","This should not work - Let's see\n\nI guess this is not visible\n")
>>>
>>> and that is interpreted as the data item terminator. Iirc this interpretation is compliant with the specifications for the SDF file format, but I could be mistaken.
>>>
>>> Best regards,
>>> Riccardo
Re: [Rdkit-discuss] SDF tags and -
On 04/29/2015 01:47 PM, Andrew Dalke wrote:

> Postel's Robustness principle is a mistake. See RFC 3117 for elaboration, ... Or from http://cacm.acm.org/magazines/2011/8/114933-the-robustness-principle-reconsidered/fulltext :

There is a difference between ACM members writing network protocols and domain people writing junk.

> Or http://www.tbray.org/ongoing/When/200x/2004/01/11/PostelPilgrim for an example with XML.

XML in this example is written by a ball street wanker. Much of xml is. Similarly, MOL/SDF is written by chemists.

> On the pro-ish side, which recommends a patch to the law, see http://langsec.org/papers/postel-patch.pdf .

I've spent enough time looking for definitive documentation on any number of file formats to know: domain people don't do that. With one or two exceptions to reinforce the rule. Again, c.f. to computer scientists: every RFC starts with the definitions of may, must, etc.

>> - if a record contains forbidden values, stop writing to the file, with an error.
>
> Yes, I agree with this. What constitutes forbidden?

Simply put, the ones that lexer will match as not values.

> If there is an error, does the writer generate a partial record,

My interpretation of conservative is wipe out the file then crash and burn. With a useful error message.

> For what it's worth, those values are acceptable. The following is legal, according to the specification: ...

If you define your lexical tokens properly, no problem. The problem is when lexer can't decide what's what.

--
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
Re: [Rdkit-discuss] SDF tags and -
On 04/29/2015 07:54 AM, Andrew Dalke wrote:

> I don't have a good solution. Were it me, I would have the writer fail should any unsupported value be present in the output, including those which are allowed by the SD specification but will cause problems in practice, like embedded \0 and leading .

Based on "be liberal in what you accept and conservative in what you produce", the writer should

  - convert multiple newlines into one (including the edge cases)
  - also enforce the 200 character restriction
  - also enforce a check for well-known legal but ill-advised character sequences
  - if a record contains forbidden values, stop writing to the file, with an error.

With the reader it looks like you can't help it if someone makes a value like 55 or . With that caveat, you should be able to find tags and read everything in between as a value.

--
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
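The writer-side rules listed above could be prototyped outside the toolkit before calling SetProp(). This is only a sketch in plain Python 3: the names `squash_blank_lines`, `check_sd_value` and `prepare_sd_value` are hypothetical, nothing here is RDKit API, and treating lines that start with '>' or '$$$$' as the ill-advised sequences is an assumption drawn from this thread.

```python
MAX_LINE_LENGTH = 200  # per-line limit quoted from the SD specification


def _normalize(value):
    """Treat \\r\\n and bare \\r as \\n so that \\r\\n\\r\\n is also caught."""
    return value.replace("\r\n", "\n").replace("\r", "\n")


def squash_blank_lines(value):
    """Drop empty lines: leading, trailing and repeated newlines collapse."""
    return "\n".join(line for line in _normalize(value).split("\n") if line)


def check_sd_value(value):
    """Return a list of reasons why value is unsafe as an SD data value."""
    text = _normalize(value)
    lines = text.split("\n")
    problems = []
    if text and "" in lines:
        problems.append("blank line (leading, trailing or repeated newline)")
    if any(len(line) > MAX_LINE_LENGTH for line in lines):
        problems.append("line longer than 200 characters")
    if "\0" in text:
        problems.append("embedded NUL")
    if any(line.startswith(">") for line in lines):
        problems.append("line starting with '>'")
    if any(line.startswith("$$$$") for line in lines):
        problems.append("line starting with '$$$$'")
    return problems


def prepare_sd_value(value):
    """Squash newlines, then fail loudly if the value is still unsafe."""
    cleaned = squash_blank_lines(value)
    problems = check_sd_value(cleaned)
    if problems:
        raise ValueError("unsafe SD data value: " + "; ".join(problems))
    return cleaned
```

Squashing changes the data, so a caller who needs the original text back would be better served by an error or by an encoding such as base64; the sketch only shows where each rule would sit.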
Re: [Rdkit-discuss] SDF tags and -
On 04/29/2015 05:32 PM, Andrew Dalke wrote:

> On Apr 29, 2015, at 9:19 PM, Dimitri Maziuk wrote:
>
>> There is a difference between ACM members writing network protocols and domain people writing junk.
>
> I think that you are saying that the MDL connection table file formats are junk. I do not disagree. But it's something we have to deal with so my personal views matter little. The MDL file formats are definitely not network protocols, but as you brought up Postel's Robustness Principle I thought you were suggesting that the principle applies more broadly than just network protocols. And for what it's worth, I used to be an ACM member.

Me too ;)

No, what I was suggesting is that something as well-defined as an RFC'ed protocol should not need Postel's principle in the first place. No, it should be applied to the stuff we have to deal with: that way we'll generate fewer bad files and the users will be happier when it doesn't crash on whatever stuff they have to deal with. Or at least not on every input file.

> If the output is to a stream then there is no file to wipe.

Yeah, there's that I suppose...

> P.S.
>
>> XML in this example ... is written by a ball street wanker.
>
> This slur is both gratuitous and wrong.

You misunderstood: I was just rephrasing Tim's

    Then it is clearly not OK to guess that someone just forgot the /amount and /trade but didn’t also drop a trailing zero or two. A programmer in a position of responsibility who did this would be spanked and maybe fired. A manager who mandated or authorized such an implementation would be spanked, maybe fired, and maybe subject to legal action.

The problem with that argument, though, is that XML is well defined and "Anyone who can’t make a syndication feed that’s well-formed XML is an incompetent fool" (ibid). Blaming Postel for incompetence of fools is like blaming Jesus for Salem witch trials.

--
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
Re: [Rdkit-discuss] SDF tags and -
Actually, you want to send your loving thoughts to MDL (now: Biovia). They defined the SDF format :-).

Cheers
-- Jan

On 2015-04-29 13:26, Nicholas Firth wrote:

> Ahh ok… Interesting way to format a file! Got to love ChemAxon...
>
> Best,
> Nick
>
> Nicholas C. Firth | PhD Student | Cancer Therapeutics
> The Institute of Cancer Research | 15 Cotswold Road | Belmont | Sutton | Surrey | SM2 5NG
>
> On 29 Apr 2015, at 12:23, Paolo Tosco paolo.to...@unito.it wrote:
>
>> Hi Nick,
>>
>> newlines in data properties are fine, but they should not include blank lines (i.e., multiple newlines). For example, in:
>>
>>     > <my_property1>
>>     1
>>
>>     2
>>     3
>>     4
>>
>>     > <my_property2>
>>     1234
>>
>>     > <my_property3>
>>     5678
>>
>> my_property1 will be truncated to just 1. Based on the specifications, if you want to include a blank line, it should actually be either a space or a \t, rather than being completely blank.
>>
>> Cheers,
>> Paolo
>>
>> On 04/29/15 12:16, Nicholas Firth wrote:
>>
>>> I use SD files with new lines in the properties quite frequently (inherited from Pipeline Pilot's merge function) and I've never had a problem reading them. I've attached an SD file that works fine for me.
>>>
>>>     In [2]: suppl = Chem.SDMolSupplier('/Volumes/nfirth/tempf.sdf')
>>>     In [3]: m = suppl[0]
>>>     In [4]: t = m.GetProp('genNum')
>>>     In [5]: print t
>>>     1
>>>     2
>>>     3
>>>     4
>>>     In [6]: print t.split('\n')
>>>     ['1', '2', '3', '4']
>>>
>>> So I guess the problem is in the writer?
>>>
>>> Best,
>>> Nick
Re: [Rdkit-discuss] SDF tags and -
Here are my thoughts on this:

The RDKit is usually strict while parsing molecules from SDF, SMILES, or other formats. This is done for one simple reason: it tends to be difficult/impossible to recover from syntax errors in input in a way that doesn't result in a significant chance of producing a result that is different from what the original writer intended. In this case, as Andrew pointed out elsewhere on the thread, if Paolo's suggested patch is applied, the molecule will be loaded with the TESTFIELD property present, but different from what it was in the input. Since people ignore warning messages (again quoting Andrew) this difference is not going to be noticed most of the time.

There are exceptions to this: the RDKit ignores the limit on line length while reading SDFs: there's no chance of confusion here, so I believe it's safe to do so.

I'm planning on accepting Paolo's patch, but after it has been modified to only accept the extra blank lines if the SDMolSupplier is not in strict mode. This will allow these files to be parsed if the client/user indicates that they are willing to take the risk of incorrect data.

I still need to put some thought into patching the SDWriter so that it can recognize things like consecutive line endings in property values. The big question is what it should do when it encounters such a case. Is that an error? Should it just write the output up to the blank line?

-greg

On Wed, Apr 29, 2015 at 10:47 AM, Tuomo Kalliokoski tkall...@live.com wrote:

> Hello all,
>
> I have got a bunch of SDF-files with molecules and some long descriptions in SDF-tags on them that include stuff like - inside. These files have been produced by ChemAxon's software and are handled fine by their software. Such files can be written out also from RDKit 2014_09_02, but they fail when you try to read them in.
>
> Here is an example code:
>
> 1. Generate t.sdf in Python:
>
>     from rdkit import Chem
>     mol = Chem.MolFromSmiles("CC")
>     mol.SetProp("TESTFIELD","This should not work - Let's see\n\nI guess this is not visible\n")
>     mol.SetProp("TESTFIELD2","Beep")
>     mol2 = Chem.MolFromSmiles("CCC")
>     mol2.SetProp("TESTFIELD","Added another molecule - Here the same thing\n\nI guess this is not visible\n")
>     mol2.SetProp("TESTFIELD2","Beep")
>     w = Chem.SDWriter("t.sdf")
>     w.write(mol)
>     w.write(mol2)
>     w.close()
>
> 2. Trying to read the file in Python fails:
>
>     from rdkit import Chem
>     s = Chem.SDMolSupplier("t.sdf")
>     for mol in s:
>         print(mol.GetProp("TESTFIELD"))
>         # The TESTFIELD text is cropped and TESTFIELD2 is skipped completely
>         # so the line below will fail:
>         # print(mol.GetProp("TESTFIELD2"))
>
>     [10:29:43] ERROR: Problems encountered parsing data fields
>     [10:29:43] ERROR: moving to the begining of the next molecule
>
> I guess in this case I will do some pre-processing for the files before reading them in SDMolSupplier, but I just wanted to point out this special case. Apologies if this was old news, but at least I was unable to find it after a quick look.
>
> Best regards,
> Tuomo