Re: "Transactional" conversion of CSV to Parquet?

2016-10-24 Thread MattK
All of this is trivial on a conventional file system or on MapR. Don't think it works out of the box on HDFS (but am willing to be corrected). I did not mention that I am using MapR-FS, so links are an option. On 24 Oct 2016, at 17:34, Ted Dunning wrote: Yeah... it is quite doable. It helps a

Re: "Transactional" conversion of CSV to Parquet?

2016-10-24 Thread Ted Dunning
Yeah... it is quite doable. It helps a bit to have hard links. The basic idea is to have one symbolic link that points to either of two ping-pong staging directories. Whichever staging directory the symbolic link points to is called the active staging directory; the other is called inactive. To
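
The snippet above is cut off before the full procedure, so the following is only a minimal sketch of the one piece it does describe: a single symbolic link that is flipped between two staging directories. It assumes a POSIX file system where renaming over an existing link is atomic (as on a conventional local file system). The function names `point_link_at` and `inactive_dir` and all paths are hypothetical, not taken from the thread.

```python
import os


def point_link_at(link_path: str, target_dir: str) -> None:
    """Repoint link_path at target_dir in one atomic step.

    Assumption: POSIX rename semantics, where renaming a new symlink over
    an existing one replaces it atomically. Readers resolving link_path
    see either the old target or the new one, never a missing link.
    """
    tmp_link = f"{link_path}.tmp-{os.getpid()}"  # hypothetical temp name
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)  # clear a leftover from an earlier failed run
    os.symlink(target_dir, tmp_link)
    os.replace(tmp_link, link_path)  # rename acts on the link itself


def inactive_dir(link_path: str, dir_a: str, dir_b: str) -> str:
    """Return whichever staging directory the link does NOT point to."""
    active = os.path.realpath(link_path)
    return dir_b if active == os.path.realpath(dir_a) else dir_a
```

A writer would prepare new data in `inactive_dir(...)` and call `point_link_at(...)` only once that directory is complete, so queries that go through the link never see a half-written state.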

Re: "Transactional" conversion of CSV to Parquet?

2016-10-24 Thread Jim Scott
I would think that the best way to accommodate this would be: When landing the CSV, place it into folder A, then convert it to Parquet format and put the result in folder B... This will give you isolation between the file formats, and you can then choose to only query the Parquet files. This is the
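
As a rough illustration of the folder A to folder B flow described above, here is a minimal sketch in which each converted file appears in folder B only after the conversion has finished. It is not the poster's implementation. The converter is passed in as a callable because the thread does not name a conversion tool; `convert_and_publish` and its parameters are hypothetical. It also assumes that a rename within one directory is atomic, and that readers of folder B ignore dot-prefixed temporary files, which should be verified for the query engine in use.

```python
import os
import tempfile
from typing import Callable


def convert_and_publish(
    src_path: str,
    dest_dir: str,
    convert: Callable[[str, str], None],
    new_ext: str = ".parquet",
) -> str:
    """Convert src_path and publish the result into dest_dir atomically.

    convert(src, dst) is a caller-supplied function (an assumption here)
    that reads the CSV at src and writes the converted file to dst.
    """
    base = os.path.splitext(os.path.basename(src_path))[0]
    final_path = os.path.join(dest_dir, base + new_ext)
    # Write under a hidden temporary name in the SAME directory, so the
    # final rename stays on one file system and can be atomic.
    fd, tmp_path = tempfile.mkstemp(
        dir=dest_dir, prefix="." + base + ".", suffix=".tmp"
    )
    os.close(fd)
    try:
        convert(src_path, tmp_path)
        os.replace(tmp_path, final_path)  # publish only on success
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)  # never leave a partial file behind
        raise
    return final_path
```

If the conversion fails partway, folder B is left unchanged and the CSV remains in folder A for a retry.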