Hi,

I have a couple of questions about the persistence and consistency of data 
written to an ORC file. In my use case I generally expect the data rate to be 
high enough that I can write sizable (1 GB or more) ORC files in a short time 
(less than 60 seconds). There could be occasions, however, where the data 
stream is significantly reduced. What I'm afraid of is having a substantial 
amount of data already in an open ORC file that is still being written to, 
albeit now slowly, and risking the loss of that already "written" data if the 
writer process dies (file not cleanly closed). I would still like to have large 
files even if the data rate is slow, and I am willing to wait for the data to 
accumulate up to the desired file size.

I'm specifically interested in the C++ writer.

How does the ORC writer handle the situation where some data has been written 
to the file, and then the writer process dies?

In what state is such a file left? Can its contents be recovered?

How is data persisted to the file: is it buffered in the ORC library, buffered 
by the OS, or written directly to the file?

Can a file be "temporarily closed" as a precaution, on demand (i.e. to achieve 
consistency on read), but then still be written to further until the desired 
file size is reached and the file is closed for good? I'm imagining 
intermediate "footers" that would be superseded by the final footer.

Thank you,
Hinko
