Hi,

I've been doing some tests using serialize() to a raw vector:

        df <- data.frame(runif(50e6, 1, 10))
        ser <- serialize(df, NULL)

In this example the data frame and the serialized raw vector occupy ~400MB 
each, for a total of ~800MB. However, the memory peak during serialize() is 
~1.2GB:

        $ cat /proc/15155/status |grep Vm
        ...
        VmHWM:   1207792 kB
        VmRSS:    817272 kB

We work with very large data frames, and in many cases this extra peak kills 
R with an "out of memory" error.

This is the relevant code in R 3.1.3, at src/main/serialize.c:2494:

        InitMemOutPStream(&out, &mbs, type, version, hook, fun);
        R_Serialize(object, &out);
        val =  CloseMemOutPStream(&out);

The serialized object is first written to a malloc'd buffer pointed to by 
out.data. Then, in CloseMemOutPStream(), R copies the whole buffer into a 
newly allocated SEXP (the raw vector that holds the final result):

        PROTECT(val = allocVector(RAWSXP, mb->count));
        memcpy(RAW(val), mb->buf, mb->count);
        free_mem_buffer(mb);
        UNPROTECT(1);

Just before free_mem_buffer() runs, the process is therefore holding ~1.2GB: 
the original data frame, the intermediate serialization buffer, and the final 
serialized raw vector.

One possible solution would be to allocate the final raw vector up front and 
have the serialization stream write directly into its buffer, skipping the 
intermediate copy. That would bring the memory peak down from ~1.2GB to 
~800MB.
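
For illustration, here is a minimal two-pass sketch of that idea using the 
public R_InitOutPStream()/R_Serialize() API (serialize_direct, direct_buf, 
count_char and count_bytes are hypothetical names, not anything in the R 
sources). Since the final size isn't known before serializing, the first pass 
only counts bytes and the second writes directly into the freshly allocated 
raw vector:

        /* Hypothetical sketch, not a patch against the actual sources:
         * pass 1 serializes into nothing and just counts bytes; pass 2
         * allocates the result and writes straight into RAW(val). */

        #include <string.h>
        #include <Rinternals.h>

        typedef struct {
            R_xlen_t count;       /* bytes seen (pass 1) / written (pass 2) */
            unsigned char *buf;   /* NULL in pass 1, RAW(val) in pass 2 */
        } direct_buf;

        static void count_char(R_outpstream_t stream, int c)
        {
            direct_buf *db = (direct_buf *) stream->data;
            if (db->buf) db->buf[db->count] = (unsigned char) c;
            db->count++;
        }

        static void count_bytes(R_outpstream_t stream, void *src, int length)
        {
            direct_buf *db = (direct_buf *) stream->data;
            if (db->buf) memcpy(db->buf + db->count, src, length);
            db->count += length;
        }

        SEXP serialize_direct(SEXP object)
        {
            direct_buf db = { 0, NULL };
            struct R_outpstream_st out;

            R_InitOutPStream(&out, (R_pstream_data_t) &db,
                             R_pstream_xdr_format, 2,
                             count_char, count_bytes, NULL, R_NilValue);

            R_Serialize(object, &out);           /* pass 1: measure only */

            SEXP val = PROTECT(allocVector(RAWSXP, db.count));
            db.buf = RAW(val);
            db.count = 0;
            R_Serialize(object, &out);           /* pass 2: fill val */

            UNPROTECT(1);
            return val;
        }

The obvious drawback is walking the object twice, since raw vectors can't be 
resized after allocation; whether that CPU-for-memory trade is acceptable 
probably depends on the workload.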

Thanks,
-Jorge
