To fix the leak of the zip_istream object created in
OBConversion::Convert(istream* is, ostream* os), I have added the
following lines at the end of this method:

#ifdef HAVE_LIBZ
        if ( CheckedForGzip ){
                delete zIn;
                pInStream = is;
        }
#endif


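I am not sure whether the assignment pInStream = is; is really
necessary. However, when a standard (uncompressed) stream is used, the
internal pInStream also refers to is at the end of the call to
OBConversion::Convert(istream* is, ostream* os), so the assignment
keeps the two cases consistent and avoids leaving pInStream pointing
at the deleted zIn.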
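
The same idea can be sketched without manual delete calls. The example
below is only an illustration, not the actual OpenBabel code:
WrapperStream, Converter and run_batch are hypothetical stand-ins
(WrapperStream plays the role of zlib_stream::zip_istream and counts
its live instances), and it assumes a C++11 compiler for
std::unique_ptr, which is newer than the g++ 4.0.1 mentioned further
down in this thread.

```cpp
#include <cassert>
#include <istream>
#include <memory>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for zlib_stream::zip_istream: it wraps another
// stream and counts live instances so that a leak becomes visible.
class WrapperStream : public std::istream {
public:
    static int live;
    explicit WrapperStream(std::istream& src) : std::istream(src.rdbuf()) { ++live; }
    ~WrapperStream() override { --live; }
    bool is_gzip() const { return true; }  // pretend every input is gzipped
};
int WrapperStream::live = 0;

// Hypothetical stand-in for OBConversion, reduced to the stream handling.
class Converter {
public:
    bool Convert(std::istream* is, std::ostream* os) {
        assert(is && os);
        pInStream = is;

        // The wrapper is owned by a unique_ptr, so it is destroyed on
        // every exit path from this function.
        std::unique_ptr<WrapperStream> zIn(new WrapperStream(*pInStream));
        if (zIn->is_gzip())
            pInStream = zIn.get();
        else
            zIn.reset();

        // Placeholder for the real conversion: copy lines to the output.
        std::string line;
        while (std::getline(*pInStream, line))
            *os << line << '\n';

        // Point the member back at the caller's stream before zIn is
        // destroyed, mirroring the pInStream = is; assignment above.
        pInStream = is;
        return true;
    }

private:
    std::istream* pInStream = nullptr;
};

// Mimics the batch loop from the original report and returns the number
// of wrapper objects that are still alive afterwards.
int run_batch(int n) {
    Converter conv;
    std::ostringstream sink;
    for (int i = 0; i < n; ++i) {
        std::istringstream in("mol\n");
        conv.Convert(&in, &sink);
    }
    return WrapperStream::live;
}
```

With this ownership scheme, run_batch(1000) should return 0, meaning
that no wrapper object outlives the Convert call that created it.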



On Sep 16, 2010, at 2:10 PM, Gert Thijs wrote:

> Digging somewhat deeper into this memory issue, it seems to me that
> the problem is a new zip stream object that is created but never
> deleted.
>
> Within OBConversion::Convert(istream* is, ostream* os) there is this
> part of the code:
>
> #ifdef HAVE_LIBZ
>     zlib_stream::zip_istream *zIn;
>
>     // only try to decode the gzip stream once
>     if (!CheckedForGzip) {
>       zIn = new zlib_stream::zip_istream(*pInStream);
>       if (zIn->is_gzip()) {
>         pInStream = zIn;
>         CheckedForGzip = true;
>       }
>       else
>         delete zIn;
>     }
> #endif
>
> As far as I understand, zIn is an object purely local to
> OBConversion::Convert(istream* is, ostream* os). When the input is an
> actual gzipped stream, zIn is not deleted when the function exits. So
> I guess adding some code to delete this zip stream would help.
>
>
>
>
> On Sep 15, 2010, at 4:28 PM, Noel O'Boyle wrote:
>
>> How does it perform with an unzipped SD file?
>>
>> On 15 September 2010 15:19, Gert Thijs <gert.th...@silicos.com> wrote:
>>> Dear all,
>>>
>>> I have encountered a memory issue when using OBConversion in a large
>>> batch run. What I am trying to do is process a large set of gzipped
>>> SD files, transform them into canonical SMILES, and write these
>>> SMILES strings to std::cout. The file names are generated on
>>> the fly based on some information about the directory structure.
>>>
>>> Below I have copied the main code used in the test script in which I
>>> encountered a serious memory error.
>>>
>>> OpenBabel::OBConversion conv;
>>> conv.SetInFormat("sdf");
>>> conv.SetOutFormat("can");
>>>
>>> for ( unsigned int i=1; i<100000; ++i ){
>>>       // get file name of sd file from i and store it in d1
>>>       // d1 is then of the form "/here/is/my/sdf/dir/mol.sdf.gz"
>>>       int2dir(i,d1);
>>>
>>>       std::ifstream ifs(d1.c_str());
>>>
>>>       conv.Convert(&ifs,&std::cout);
>>>
>>>       ifs.close();
>>> }
>>>
>>>
>>> If I run this code, I can see that it gradually eats all the RAM until
>>> the program crashes with a memory allocation error. I have done
>>> several tests to check where the problem could come from. As far as I
>>> understand it, OBConversion seems to be the main source of the
>>> problem. For instance, when I open the stream, read one line from it
>>> and print this line (and do not use OBConversion), the same program
>>> can easily handle more than 1,000,000 files without any hassle.
>>>
>>> Furthermore, when I use the same code but recreate the
>>> OBConversion object each time within the for loop, exactly the same
>>> kind of behavior is observed.
>>> for ( unsigned int i=1; i<100000; ++i ){
>>>       // get file name of sd file from i
>>>       // d1 = /my/dir/mol.sdf.gz
>>>       int2dir(i,d1);
>>>
>>>       std::ifstream ifs(d1.c_str());
>>>
>>>       OpenBabel::OBConversion conv;
>>>       conv.SetInFormat("sdf");
>>>       conv.SetOutFormat("can");
>>>       conv.Convert(&ifs,&std::cout);
>>>
>>>       ifs.close();
>>> }
>>>
>>> So my guess is that there is something strange going on within
>>> OBConversion. But as I am not really familiar with the inner workings
>>> of OBConversion, I am not sure where to start looking.
>>>
>>>
>>> Any thoughts on this one?
>>>
>>> I am working on Mac OS X 10.5.8 using g++ 4.0.1
>>>
>>> many thanks,
>>> Gert
>>>
>>>
>>>
>>>
>>> ------------------------------------------------------------------------------
>>> Start uncovering the many advantages of virtual appliances
>>> and start using them to simplify application deployment and
>>> accelerate your shift to cloud computing.
>>> http://p.sf.net/sfu/novell-sfdev2dev
>>> _______________________________________________
>>> OpenBabel-Devel mailing list
>>> OpenBabel-Devel@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/openbabel-devel
>>>
>
> Gert Thijs
>
> Director Chemoinformatics
> Silicos NV.
> Wetenschapspark 7
> B-3590 Diepenbeek
> Belgium
>
> Tel:   +32 11 350703
> Fax:  +32 11 220525
>
> http://www.silicos.com/
>
>
>
>

Gert Thijs

Director Chemoinformatics
Silicos NV.
Wetenschapspark 7
B-3590 Diepenbeek
Belgium

Tel:   +32 11 350703
Fax:  +32 11 220525

http://www.silicos.com/



