Jorge, what you propose is not possible. The size of the output is unknown, which is why a dynamically growing PStream buffer is used; it cannot be pre-allocated.

Cheers,
Simon
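[Editorial sketch, not part of Simon's reply: the pattern he describes, writing into a scratch buffer that grows as bytes arrive and copying into an exact-size result only once the total length is known, looks roughly like the C below. The names grow_buf, buf_write and buf_close are invented for illustration; this is not the code in serialize.c, and error checks are omitted.]

#include <stdlib.h>
#include <string.h>

typedef struct {
    unsigned char *data;   /* scratch buffer, starts NULL, grows on demand */
    size_t count;          /* bytes written so far */
    size_t size;           /* current capacity */
} grow_buf;

/* Append n bytes, enlarging the scratch buffer when needed.  The final
   size cannot be known here, so the buffer can only grow reactively. */
static void buf_write(grow_buf *b, const void *src, size_t n)
{
    if (b->count + n > b->size) {
        size_t newsize = b->size ? b->size : 1024;
        while (b->count + n > newsize)
            newsize *= 2;   /* geometric growth; the exact policy is not the point */
        b->data = realloc(b->data, newsize);
        b->size = newsize;
    }
    memcpy(b->data + b->count, src, n);
    b->count += n;
}

/* Only now is the total length known, so only now can an exact-size
   result be allocated.  For a moment the scratch buffer and the copy
   are both live, which is the transient extra ~400MB on top of the
   steady-state usage reported below. */
static unsigned char *buf_close(grow_buf *b, size_t *len)
{
    unsigned char *result = malloc(b->count);
    memcpy(result, b->data, b->count);
    *len = b->count;
    free(b->data);
    b->data = NULL;
    b->size = b->count = 0;
    return result;
}

In this sketch a zero-initialised grow_buf is the analogue of the memory buffer set up by InitMemOutPStream(), buf_write() plays the role of the stream's write callback, and buf_close() corresponds to the allocVector()/memcpy() step quoted below.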
> On Mar 17, 2015, at 1:37 PM, Martinez de Salinas, Jorge <jorge.martinez-de-sali...@hp.com> wrote:
>
> Hi,
>
> I've been doing some tests using serialize() to a raw vector:
>
> df <- data.frame(runif(50e6,1,10))
> ser <- serialize(df,NULL)
>
> In this example the data frame and the serialized raw vector occupy ~400MB
> each, for a total of ~800MB. However, the memory peak during serialize() is
> ~1.2GB:
>
> $ cat /proc/15155/status | grep Vm
> ...
> VmHWM: 1207792 kB
> VmRSS:  817272 kB
>
> We work with very large data frames, and in many cases this is killing R
> with an "out of memory" error.
>
> This is the relevant code in R 3.1.3, in src/main/serialize.c:2494:
>
> InitMemOutPStream(&out, &mbs, type, version, hook, fun);
> R_Serialize(object, &out);
> val = CloseMemOutPStream(&out);
>
> The serialized object is stored in a buffer pointed to by out.data. Then, in
> CloseMemOutPStream(), R copies the whole buffer into a newly allocated SEXP
> object (the raw vector that holds the final result):
>
> PROTECT(val = allocVector(RAWSXP, mb->count));
> memcpy(RAW(val), mb->buf, mb->count);
> free_mem_buffer(mb);
> UNPROTECT(1);
>
> Before calling free_mem_buffer(), the process is using ~1.2GB (the original
> data frame + the serialization buffer + the final serialized raw vector).
>
> One possible solution would be to allocate a buffer for the final raw vector
> and store the serialization result directly into that buffer. This would
> bring the memory peak down from ~1.2GB to ~800MB.
>
> Thanks,
> -Jorge
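[Editorial sketch: the two /proc counters Jorge greps for measure different things. VmHWM is the peak resident size over the life of the process, while VmRSS is the current one; that is why VmHWM still reflects the freed serialization buffer (~1.2GB) while VmRSS has dropped back to ~800MB. A stand-alone, Linux-only C example, with an invented helper name, showing the two counters moving:]

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Print the same VmHWM/VmRSS lines as `grep Vm /proc/<pid>/status`,
   but from inside the process itself. */
static void print_mem_usage(const char *label)
{
    FILE *f = fopen("/proc/self/status", "r");
    char line[256];
    if (!f) return;
    printf("-- %s\n", label);
    while (fgets(line, sizeof line, f))
        if (!strncmp(line, "VmHWM:", 6) || !strncmp(line, "VmRSS:", 6))
            fputs(line, stdout);
    fclose(f);
}

int main(void)
{
    print_mem_usage("start");

    /* Allocate and touch ~400MB so it counts towards RSS and the peak. */
    size_t n = (size_t)400 * 1024 * 1024;
    char *p = malloc(n);
    if (p) memset(p, 1, n);
    print_mem_usage("while the 400MB block is live");

    free(p);   /* large blocks are typically returned to the OS, so RSS drops */
    print_mem_usage("after free: VmRSS falls back, VmHWM keeps the peak");
    return 0;
}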