Re: Orc writer - continuous memory flush out

Owen O'Malley Mon, 25 Sep 2017 10:37:18 -0700

ORC has to buffer the entire stripe in memory so that it can write the data
in column order rather than row order. If you have large blobs that you
can't buffer, I'd suggest writing them to a side file and storing their
offsets and lengths in the ORC file. That way you can write the large blobs
without spending all of your memory caching them (on either read or write).
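
A minimal sketch of the side-file idea, using only the JDK. The class name
BlobSideFile and its methods are hypothetical, not part of ORC; the sketch
only shows how each blob can be streamed to a separate file in small chunks
while its offset and length are recorded for storage in the ORC row.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper (not an ORC API): appends blobs to a side file and
// reports where each one landed.
public class BlobSideFile implements AutoCloseable {
    private final OutputStream out;
    private long position = 0; // bytes written to the side file so far

    public BlobSideFile(Path path) throws IOException {
        // Creates the file, or truncates it if it already exists.
        this.out = new BufferedOutputStream(Files.newOutputStream(path));
    }

    /** Streams one blob in 64 KiB chunks; returns {offset, length}. */
    public long[] append(InputStream blob) throws IOException {
        long offset = position;
        byte[] chunk = new byte[64 * 1024];
        int n;
        while ((n = blob.read(chunk)) != -1) {
            out.write(chunk, 0, n);
            position += n;
        }
        return new long[] {offset, position - offset};
    }

    /** Reads a single blob back, given the offset and length stored in ORC. */
    public static byte[] read(Path path, long offset, long length)
            throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(path.toFile(), "r")) {
            byte[] buffer = new byte[Math.toIntExact(length)];
            file.seek(offset);
            file.readFully(buffer);
            return buffer;
        }
    }

    @Override
    public void close() throws IOException {
        out.close(); // flushes any buffered bytes
    }
}
```

Under this assumption the ORC schema would carry two integer columns per
blob (for example blob_offset and blob_length, names chosen here only for
illustration) in place of the binary column, so each stripe stays small no
matter how large the blobs are.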


.. Owen

On Mon, Aug 21, 2017 at 6:44 AM, Ozsvath, Tamas (GE Corporate, consultant) <
[email protected]> wrote:

> Dear Apache users,
>
> We would like to create ORC files with org.apache.orc.Writer. Our tests
> were fine until we tried creating an ORC file from a database table that
> contained blobs. We have tried changing the following settings, but
> none of them helped:
>
> org.apache.orc.OrcFile.WriterOptions:
>
>   bufferSize()
>   stripeSize()
>   blockSize()
>   enforceBufferSize()
>
> Is there a way to populate the ORC file continuously (flushing data out
> of memory as we go), instead of flushing the data out of memory only upon
> closing the file writer? What is the best practice for creating an ORC file
> from a data source that contains blobs and can't be handled purely in memory?
>
>
>
> Any information is appreciated!
>
>
>
> Thanks,
> Tamas
>
>
>
