Hi Paul,

On Friday, December 28, 2012, wrote:

> Dear RDKitters,
>
> I would like to write out sdf.gz files.
>
> When looking into the manual:
> http://www.rdkit.org/docs/api/rdkit.Chem.rdmolfiles.SDWriter-class.html
>
> "
> file_name = sys.argv[1]+".onlylargestfrag.sdf.gz"
> test_output = gzip.open(file_name,'w+')
> test_cpd_out = Chem.ForwardSDMolSupplier(test_output)
> "
> this does not work =>
> "
> AttributeError: 'ForwardSDMolSupplier' object has no attribute 'write'
> "
> I guess that the documentation is not correct in this respect.
>

Ooo, nice catch. Thanks! I will fix that.


>
> The solution from the mailing list
>
> http://www.mail-archive.com/[email protected]/msg02127.html^
> "
> file_name = sys.argv[1]+".onlylargestfrag.sdf.gz"
> test_output = gzip.open(file_name,'w+')
> test_cpd_out = Chem.SDWriter(test_output)
> # [...] inside a loop..
>         test_cpd_out.write(largest_frag)
> # outside the loop
> test_cpd_out.flush()
> test_output.flush()
> test_cpd_out=None
> test_output=None
> "
>
> gives an sdf.gz, but it appears to be corrupted when I try to gunzip it on
> the command line:
> "gzip: f.sdf.gz.onlylargestfrag.sdf.gz: unexpected end of file"
>
>
> When unzipping the file via "gunzip < a.sdf.gz > blubb.sdf"
> =>
> Here comes the very last part of this file:
> "
> $$$$
> CHEMBL70380
>      RDKit          3D
>
>  30 33  0  0  0  0  0  0  0  0999 V2000
>    -0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>     1.2320    0.8556    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
>     2.5379    0.1177    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>     2.4393   -1.3791    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
>     3.9294    0.6779    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>     5.0376   -0.3330    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
>     6.4672    0.1213    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>     6.7885    1.5865    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
>     7.575
> "
>
>
> Apparently the flush did not do its job, since the output file is
> incomplete.
> Moving the flush calls into the loop (in which I loop over the
> compounds_to_be_outputted) does not work either.
>
>
> Does anyone have a suggestion?
>

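It seems like you have to call the .close() method, not flush(). flush()
only pushes out the data compressed so far; the gzip trailer (CRC and
length) is written on close(), and gunzip complains without it.
Here's an example.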

This is the equivalent of what you are doing:

In [36]: gz = gzip.open('out.sdf.gz','w+')

In [37]: w = Chem.SDWriter(gz)

In [38]: for m in ms: w.write(m)

In [39]: w.flush()

In [40]: gz.flush()


and it produces an error:

In [41]: !gzcat out.sdf.gz | tail

gzip: out.sdf.gz: unexpected end of file
>  <NUM_ROTATABLEBONDS>  (200)
2

>  <P1>  (200)
2.17

>  <SMILES>  (200)
CC(=O)Nc1cc(C)cc(C)c1

$$$$


But if I use close() it seems to work:

In [30]: gz = gzip.open('out.sdf.gz','w+')

In [31]: w = Chem.SDWriter(gz)

In [32]: for m in ms: w.write(m)

In [33]: w.close()

In [34]: gz.close()

In [35]: !gzcat out.sdf.gz | tail
>  <NUM_ROTATABLEBONDS>  (200)
2

>  <P1>  (200)
2.17

>  <SMILES>  (200)
CC(=O)Nc1cc(C)cc(C)c1

$$$$
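
The same flush-versus-close difference can be reproduced with the gzip
module alone. This is only a minimal sketch: it assumes Python 3, and the
payload is a made-up stand-in for the records SDWriter would write. RDKit is
not involved at all, and the can_read_back helper is hypothetical.

```python
import gzip
import os
import tempfile


def can_read_back(path, expected):
    """Return True if the gzip file at `path` decompresses fully to `expected`."""
    try:
        with gzip.open(path, "rb") as fh:
            return fh.read() == expected
    except EOFError:
        # Raised when the compressed stream ends before its end-of-stream marker.
        return False


# Hypothetical stand-in for the SD records that SDWriter would produce.
payload = b"fake molblock\n$$$$\n" * 1000
path = os.path.join(tempfile.mkdtemp(), "demo.sdf.gz")

gz = gzip.open(path, "wb")
gz.write(payload)
gz.flush()  # pushes out the data compressed so far, but no trailer yet
ok_after_flush = can_read_back(path, payload)

gz.close()  # finishes the stream and writes the CRC/length trailer
ok_after_close = can_read_back(path, payload)

print(ok_after_flush, ok_after_close)
```

This should print "False True": the file is only a complete gzip stream once
close() has run. Opening the gzip handle in a with block is one way to make
sure close() is always called, provided the writer's own close() is called
before the block ends.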


I will take a look at the underlying C++ code and see if I can figure out
some way to make at least some of those explicit calls unnecessary.

-greg