To solve the problem with the new zip_istream object created in OBConversion::Convert(istream* is, ostream* os), I have added the following lines at the end of this method:

#ifdef HAVE_LIBZ
  if ( CheckedForGzip ){
    delete zIn;
    pInStream = is;
  }
#endif

I am not sure whether the assignment pInStream = is; is really necessary, but when a standard stream is used, the internal pInStream also refers to is at the end of the call to OBConversion::Convert(istream* is, ostream* os).

On Sep 16, 2010, at 2:10 PM, Gert Thijs wrote:

> Digging into this memory issue somewhat deeper, it seems to me that
> there is a problem with a new zip stream object which is created but is
> never deleted.
>
> Within OBConversion::Convert(istream* is, ostream* os) there is this
> part of code
>
> #ifdef HAVE_LIBZ
>   zlib_stream::zip_istream *zIn;
>
>   // only try to decode the gzip stream once
>   if (!CheckedForGzip) {
>     zIn = new zlib_stream::zip_istream(*pInStream);
>     if (zIn->is_gzip()) {
>       pInStream = zIn;
>       CheckedForGzip = true;
>     }
>     else
>       delete zIn;
>   }
> #endif
>
> As far as I understand, zIn is purely an object local to
> OBConversion::Convert(istream* is, ostream* os). When zIn is an actual
> gzipped stream it is not deleted when the function exits. So I guess
> adding some code to delete this zip stream would help.
>
> On Sep 15, 2010, at 4:28 PM, Noel O'Boyle wrote:
>
>> How does it perform with an unzipped SD file?
>>
>> On 15 September 2010 15:19, Gert Thijs <gert.th...@silicos.com> wrote:
>>> Dear all,
>>>
>>> I have encountered a memory issue when using OBConversion in a large
>>> batch run. What I am trying to do is to process a large set of gzipped
>>> SD files, transform them into canonical smiles and write these
>>> smiles strings to std::cout. The file names are generated on
>>> the fly based on some information about the directory structure.
>>>
>>> Below I have copied the main code used in the test script in which I
>>> encountered a serious memory error.
>>>
>>> OpenBabel::OBConversion conv;
>>> conv.SetInFormat("sdf");
>>> conv.SetOutFormat("can");
>>>
>>> for ( unsigned int i=1; i<100000; ++i ){
>>>   // get file name of sd file from i and store it in d1
>>>   // d1 is then of the form "/here/is/my/sdf/dir/mol.sdf.gz"
>>>   int2dir(i,d1);
>>>
>>>   std::ifstream ifs(d1.c_str());
>>>
>>>   conv.Convert(&ifs,&std::cout);
>>>
>>>   ifs.close();
>>> }
>>>
>>> If I run this code, I can see that it gradually eats all the RAM until
>>> the program crashes with a memory allocation error. I have done
>>> several tests to check where the problem could come from. As far as I
>>> understand it, OBConversion seems to be the main source of the
>>> problem. For instance, when I open the stream, read one line from it
>>> and print this line (and do not use OBConversion), the same program
>>> can easily handle more than 1,000,000 files without any hassle.
>>>
>>> Furthermore, when I use the same code but recreate the
>>> OBConversion object each time within the for loop, exactly the same
>>> kind of behavior is observed.
>>>
>>> for ( unsigned int i=1; i<100000; ++i ){
>>>   // get file name of sd file from i
>>>   // d1 = /my/dir/mol.sdf.gz
>>>   int2dir(i,d1);
>>>
>>>   std::ifstream ifs(d1.c_str());
>>>
>>>   OpenBabel::OBConversion conv;
>>>   conv.SetInFormat("sdf");
>>>   conv.SetOutFormat("can");
>>>   conv.Convert(&ifs,&std::cout);
>>>
>>>   ifs.close();
>>> }
>>>
>>> So my guess is that there is something strange going on within
>>> OBConversion. But as I am not really familiar with the inner workings
>>> of OBConversion, I am not sure where to start looking.
>>>
>>> Any thoughts on this one?
>>>
>>> I am working on Mac OS X 10.5.8 using g++ 4.0.1
>>>
>>> many thanks,
>>> Gert
>>>
>>> ------------------------------------------------------------------------------
>>> Start uncovering the many advantages of virtual appliances
>>> and start using them to simplify application deployment and
>>> accelerate your shift to cloud computing.
>>> http://p.sf.net/sfu/novell-sfdev2dev
>>> _______________________________________________
>>> OpenBabel-Devel mailing list
>>> OpenBabel-Devel@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/openbabel-devel

Gert Thijs

Director Chemoinformatics
Silicos NV.
Wetenschapspark 7
B-3590 Diepenbeek
Belgium

Tel: +32 11 350703
Fax: +32 11 220525

http://www.silicos.com/
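
For anyone who wants to look at the ownership issue outside of OpenBabel, here is a minimal, self-contained sketch of the cleanup described at the top of this message. It is an illustration under stated assumptions, not the actual OpenBabel code: CountingWrapper, SketchConverter and LiveWrappersAfter are hypothetical names, and the wrapper does no decompression at all. It only counts live instances so that a missing delete becomes visible. One thing that may be worth checking in a real patch: in the code quoted above, zIn is only assigned when CheckedForGzip is false on entry, so the pointer would presumably need to be initialized before the final delete can be reached safely on a later call.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical stand-in for zlib_stream::zip_istream. It only shares the
// source stream's buffer and counts live instances; it does no decompression.
class CountingWrapper : public std::istream {
public:
    explicit CountingWrapper(std::istream& src) : std::istream(src.rdbuf()) { ++live; }
    ~CountingWrapper() override { --live; }
    static int live;  // number of wrappers currently allocated
};
int CountingWrapper::live = 0;

// Hypothetical converter with the same ownership shape as the quoted code:
// a heap-allocated wrapper is installed as the working input stream.
class SketchConverter {
public:
    // Reads all lines from `is` through the wrapper and returns the line
    // count. With cleanup == false the wrapper is never deleted.
    int Convert(std::istream* is, bool cleanup) {
        assert(is != nullptr);
        pInStream = is;
        CountingWrapper* wrapped = new CountingWrapper(*pInStream);
        pInStream = wrapped;

        int lines = 0;
        std::string line;
        while (std::getline(*pInStream, line)) ++lines;

        if (cleanup) {
            delete wrapped;  // release the wrapper created above
            pInStream = is;  // avoid keeping a pointer to the deleted object
        }
        return lines;
    }

private:
    std::istream* pInStream = nullptr;
};

// Runs `runs` conversions on one converter and reports how many wrappers
// are still alive afterwards.
int LiveWrappersAfter(int runs, bool cleanup) {
    SketchConverter conv;
    for (int i = 0; i < runs; ++i) {
        std::istringstream in("mol1\nmol2\n");
        conv.Convert(&in, cleanup);
    }
    return CountingWrapper::live;
}
```

With cleanup enabled, LiveWrappersAfter should report that no wrappers remain alive, whatever the number of runs. With cleanup disabled, one wrapper stays allocated per call, which matches the steady growth seen in the batch run. The sketch also suggests why resetting the stream pointer after the delete looks prudent: otherwise the member would keep pointing at an object that no longer exists.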