I was testing the conversion of a table to ORC. Following previous posts, I ran alter table tablename set fileformat ORC; and this worked great. All new partitions created were ORC, and the RC and ORC files played nicely next to each other.
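
For anyone following along, this is roughly what that step looks like in HiveQL. It is only a sketch: tablename is the placeholder from above, and the partition column dt and its value are made up for illustration. As far as I understand it, the table-level statement only affects the format recorded for partitions created afterwards, and DESCRIBE FORMATTED on a partition shows the InputFormat, OutputFormat, and SerDe recorded for that partition.

```sql
-- Switch the table's default file format. Existing partitions keep
-- whatever format is already recorded in their own metadata.
ALTER TABLE tablename SET FILEFORMAT ORC;

-- Inspect the format recorded for a single partition.
-- dt and '2014-01-01' are hypothetical; use your own partition spec.
DESCRIBE FORMATTED tablename PARTITION (dt='2014-01-01');
```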
Then I had a hypothesis. I have tables that almost always have Hive jobs running and inserting data, and ideally I don't want to stop those. I saw a potential problem: if I converted the table in the middle of an INSERT job, what would happen? Ideally, the RC format that was in effect when the job started would be honored, the files would be written as RC files, and all would be well.

What I think actually happened is that the setting was not honored. Either the writers switched to ORC mid-file, causing major breakage, or (and this is what I suspect) the writers used the RC file format, but the partition metadata was recorded as ORC when it was updated. I am not an expert, but either way I was able to make all subsequent queries fail by doing this.

Like I said, almost everything about the conversion to ORC is going well, but I'd recommend a change so that the setting can be altered while jobs are running: jobs already in flight would keep honoring the old setting for the partitions they write, and any new jobs would use the new setting.

Also, for the group: how do things behave when you are doing insert-append operations, where the first jobs wrote RC files and later files in the same partition are ORC? Thanks!
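
P.S. If my suspicion is right (RC files on disk, partition metadata saying ORC), one thing that might repair an affected partition is pointing its metadata back at RCFile. I have not confirmed that this is the actual failure mode, so treat the following as a sketch. The partition column dt and its value are hypothetical, and I am not sure every Hive version also resets the SerDe with this statement, so it is worth checking the DESCRIBE FORMATTED output afterwards.

```sql
-- Assumption: the files in this partition were written as RCFile,
-- but the partition metadata was recorded as ORC.
ALTER TABLE tablename PARTITION (dt='2014-01-01') SET FILEFORMAT RCFILE;

-- Confirm what the partition metadata now says (InputFormat, OutputFormat, SerDe).
DESCRIBE FORMATTED tablename PARTITION (dt='2014-01-01');

-- Try reading a few rows back from the repaired partition.
SELECT * FROM tablename WHERE dt='2014-01-01' LIMIT 10;
```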
