Hello,
To write data in the ORC file format, I am doing the following:
=================================================================
1) Creating the "writer" object, as shown below:
_writer = OrcFile.createWriter(new Path(_fileName),
        OrcFile.writerOptions(conf)
                .fileSystem(fs)
                .inspector(ObjInspector)
                .stripeSize(100000)
                .bufferSize(10000)
                .compress(CompressionKind.ZLIB)
                .version(OrcFile.Version.V_0_12));
2) Adding rows by calling "_writer.addRow(_record)"
3) After writing all the input rows, calling "_writer.close()"
=================================================================
This logic works fine when the file size is small, but when the input
data is more than 100 GB, I get an out-of-memory (OOM) error. As I
understand it, the writer object flushes the data only when
"_writer.close()" is called, hence the issue. I do not see any API such
as "flush" that can be called on the writer object after writing a
portion of the input data.
In this context, I am trying to understand how to flush the data during
an ORC file write, after processing some portion of the input data and
before calling "close" (which is called only after all input rows have
been processed).
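To make the intent concrete, here is a minimal Java sketch of the
behaviour I am after. It does not use the ORC API: "BatchingWriter",
"flushEveryRows" and the counters are hypothetical stand-ins, and the
"flush" here only clears an in-memory buffer. It is meant to show
buffered rows being released every N rows instead of only at close.

```java
import java.util.ArrayList;
import java.util.List;

class FlushSketch {

    /** Hypothetical stand-in for a row writer; NOT the ORC Writer API. */
    static class BatchingWriter {
        private final int flushEveryRows;          // assumed flush threshold
        private final List<String> buffer = new ArrayList<>();
        private int flushCount = 0;                // number of flushes performed
        private long rowsWritten = 0;              // rows released from memory
        private int maxBuffered = 0;               // peak number of buffered rows

        BatchingWriter(int flushEveryRows) {
            this.flushEveryRows = flushEveryRows;
        }

        /** Buffers one row and flushes once the threshold is reached. */
        void addRow(String row) {
            buffer.add(row);
            maxBuffered = Math.max(maxBuffered, buffer.size());
            if (buffer.size() >= flushEveryRows) {
                flush();
            }
        }

        /** Releases buffered rows; a real writer would persist them here. */
        void flush() {
            if (buffer.isEmpty()) {
                return;
            }
            rowsWritten += buffer.size();
            buffer.clear();
            flushCount++;
        }

        /** Flushes whatever is left, as the final step. */
        void close() {
            flush();
        }

        int getFlushCount() { return flushCount; }
        long getRowsWritten() { return rowsWritten; }
        int getMaxBuffered() { return maxBuffered; }
    }

    public static void main(String[] args) {
        BatchingWriter writer = new BatchingWriter(1000);
        for (int i = 0; i < 2500; i++) {
            writer.addRow("row-" + i);
        }
        writer.close();
        System.out.println("flushes=" + writer.getFlushCount()
                + " rows=" + writer.getRowsWritten()
                + " maxBuffered=" + writer.getMaxBuffered());
    }
}
```

With 2,500 rows and a threshold of 1,000, this sketch flushes three
times and never holds more than 1,000 rows in memory at once, which is
the bounded-memory behaviour I would like from the ORC writer.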
Could you please share your inputs on this?
Thanks,
Ravi