On Feb 18, 2013, at 5:50 PM, [email protected] wrote:
> My issue2solve: read in a sdf.gz & simply extract the SD tags.

If you don't mind digging into the undocumented chemfp API
(which means that it may change in the future), then you can
use the simple-minded SDF reader I wrote for it.

It's "simple-minded" because there are valid SD files which will
cause it to break. However, it should work for the large majority
of SD files that you'll come across.




from chemfp import sdf_reader
import re

# Match the tag and its data lines. Data lines include the newlines.
tag_pattern = re.compile(r"""
^>                # start of line (using re.MULTILINE mode)
\s*               # ignore whitespace
<([^>]+)>         # tag field (allow any character except '>')
.*\n              # skip junk to the end of line
(                 # zero or more data lines
  (?:[^\n].*\n)*  #   data lines start with something other than a newline,
                  #   then can have anything
)
\n                # data must end with a blank line
""", re.MULTILINE | re.VERBOSE )

# For each record, extract the title and (tag, data) fields
def read_tags(records):
    for record in records:
        title = record[:record.find("\n")]
        ct_end = record.find("\nM  END\n")
        assert ct_end != -1, "Missing 'M  END' for record %r" % (title,)
        yield title, tag_pattern.findall(record, ct_end), record
    

# This iterates over the records in the SD file. Ignore any errors.
reader = sdf_reader.open_sdf("/Users/dalke/databases/chembl_14.sdf.gz",
                             errors = "ignore")

# Extract the tag data for each record and print it out
for title, tag_fields, record in read_tags(reader):
    print "Title:", title
    for tag, data in tag_fields:
        if len(data) > 40:
            data_repr = repr(data[:38] + " ...")
        else:
            data_repr = repr(data)
        print tag, "=>", data_repr
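
To see what the pattern returns without needing chemfp or a real file, here is a minimal sketch that applies the same regular expression to a single hand-written record. The record contents (title, tag names, values) are made up for illustration and are not taken from ChEMBL.

```python
import re

# Same pattern as above: match a tag header line and its data lines.
tag_pattern = re.compile(r"""
^>                # start of line (using re.MULTILINE mode)
\s*               # ignore whitespace
<([^>]+)>         # tag field (allow any character except '>')
.*\n              # skip junk to the end of line
(                 # zero or more data lines
  (?:[^\n].*\n)*  #   each data line starts with a non-newline character
)
\n                # data must end with a blank line
""", re.MULTILINE | re.VERBOSE)

# A made-up record, for illustration only.
record = (
    "example title\n"
    "  program\n"
    "\n"
    "  0  0  0  0  0  0  0  0  0  0999 V2000\n"
    "M  END\n"
    "> <ID>\n"
    "CHEMBL1\n"
    "\n"
    "> <NOTE> (1)\n"
    "first line\n"
    "second line\n"
    "\n"
    "$$$$\n"
)

# Start searching at the end of the connection table, as read_tags() does.
ct_end = record.find("\nM  END\n")
tags = tag_pattern.findall(record, ct_end)
# tags is a list of (tag, data) pairs; each data string keeps its newlines
```

Here tags comes out as [("ID", "CHEMBL1\n"), ("NOTE", "first line\nsecond line\n")], so call data.splitlines() or data.rstrip("\n") if you don't want the trailing newline.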



                                Andrew
                                [email protected]



_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
