Dear Greg,
thanks a lot for your help!
Cheers,
Paul
>
> The solution from the mailing list
>
> http://www.mail-archive.com/[email protected]/msg02127.html
> "
> file_name = sys.argv[1]+".onlylargestfrag.sdf.gz"
> test_output = gzip.open(file_name,'w+')
> test_cpd_out = Chem.SDWriter(test_output)
> # [...] inside a loop..
> test_cpd_out.write(largest_frag)
> # outside the loop
> test_cpd_out.flush()
> test_output.flush()
> test_cpd_out=None
> test_output=None
> "
>
> gives an sdf.gz file, but it seems to be corrupted when I try to gunzip it
> on the command line:
> "gzip: f.sdf.gz.onlylargestfrag.sdf.gz: unexpected end of file"
>
>
> Unzipping the file via "gunzip < a.sdf.gz > blubb.sdf" still produces
> output. Here is the very last part of that file:
> "
> $$$$
> CHEMBL70380
> RDKit 3D
>
> 30 33 0 0 0 0 0 0 0 0999 V2000
> -0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
> 1.2320 0.8556 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0
> 2.5379 0.1177 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
> 2.4393 -1.3791 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
> 3.9294 0.6779 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
> 5.0376 -0.3330 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0
> 6.4672 0.1213 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
> 6.7885 1.5865 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
> 7.575
> "
>
>
> Apparently, the flush didn't do its job, since the output file
> is incomplete.
> Moving the flush calls into the loop (in which I loop over the
> compounds_to_be_outputted) does not work either.
>
>
> Does anyone have a suggestion?
>
> It seems like you have to call the .close() method, not flush().
> Here's an example.
>
> This is the equivalent of what you are doing:
>
> In [36]: gz = gzip.open('out.sdf.gz','w+')
>
> In [37]: w = Chem.SDWriter(gz)
>
> In [38]: for m in ms: w.write(m)
>
> In [39]: w.flush()
>
> In [40]: gz.flush()
>
> and it produces an error:
>
> In [41]: !gzcat out.sdf.gz | tail
>
> gzip: out.sdf.gz: unexpected end of file
> > <NUM_ROTATABLEBONDS> (200)
> 2
>
> > <P1> (200)
> 2.17
>
> > <SMILES> (200)
> CC(=O)Nc1cc(C)cc(C)c1
>
> $$$$
>
> But if I use close() it seems to work:
>
> In [30]: gz = gzip.open('out.sdf.gz','w+')
>
> In [31]: w = Chem.SDWriter(gz)
>
> In [32]: for m in ms: w.write(m)
>
> In [33]: w.close()
>
> In [34]: gz.close()
>
> In [35]: !gzcat out.sdf.gz | tail
> > <NUM_ROTATABLEBONDS> (200)
> 2
>
> > <P1> (200)
> 2.17
>
> > <SMILES> (200)
> CC(=O)Nc1cc(C)cc(C)c1
>
> $$$$
>
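The difference between the two sessions above can be reproduced with the standard library alone. The following is a minimal sketch, not RDKit code: it assumes Python 3, writes to an in-memory buffer instead of out.sdf.gz, and uses placeholder strings in place of real SD records. The helper names gzip_bytes and is_complete are made up for illustration. It shows that flush() emits the compressed data written so far but not the gzip trailer, which is only written by close().

```python
import gzip
import io


def gzip_bytes(records, finish_with_close):
    """Compress records into an in-memory gzip stream.

    finish_with_close=False mimics the flush()-only pattern from the thread.
    """
    buf = io.BytesIO()
    gz = gzip.GzipFile(fileobj=buf, mode="wb")
    for rec in records:
        gz.write(rec.encode("ascii"))
    if finish_with_close:
        gz.close()  # ends the deflate stream and writes the CRC32/length trailer
    else:
        gz.flush()  # pushes pending data out, but writes no trailer
    return buf.getvalue()


def is_complete(data):
    """Return True if data decompresses as a complete gzip stream."""
    try:
        gzip.decompress(data)
        return True
    except EOFError:
        return False


# Placeholder records; real SD records would come from the writer.
records = ["placeholder record 1\n$$$$\n", "placeholder record 2\n$$$$\n"]
flushed_only = gzip_bytes(records, finish_with_close=False)
closed = gzip_bytes(records, finish_with_close=True)
```

Here is_complete(flushed_only) is False while is_complete(closed) is True, which matches the "unexpected end of file" message reported by gzip on the command line.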
> I will take a look at the underlying C++ code and see if I can
> figure out some way to make at least some of that unnecessary.
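Until then, one way to make the close order hard to get wrong is to nest context managers, so the writer is closed before the gzip stream. This is only a sketch under stated assumptions: RecordWriter is a hypothetical stand-in for a buffering writer such as Chem.SDWriter (it is not part of RDKit), the records are placeholders, and the output goes to an in-memory buffer rather than a file.

```python
import contextlib
import gzip
import io


class RecordWriter:
    """Hypothetical stand-in for a buffering writer such as Chem.SDWriter."""

    def __init__(self, fileobj):
        self._fileobj = fileobj
        self._pending = []

    def write(self, record):
        # Hold records back until close(), like a writer with its own buffer.
        self._pending.append(record)

    def close(self):
        # Hand buffered records to the underlying file object exactly once.
        for record in self._pending:
            self._fileobj.write(record)
        self._pending = []


buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
    with contextlib.closing(RecordWriter(gz)) as writer:
        for record in (b"mol 1\n$$$$\n", b"mol 2\n$$$$\n"):
            writer.write(record)
# Leaving the inner block closes the writer first; leaving the outer block
# then closes the gzip stream, so the trailer is written last.
result = gzip.decompress(buf.getvalue())
```

The inner block plays the role of w.close() and the outer block the role of gz.close(), so the two calls cannot be forgotten or swapped.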
_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss