Hello, I'm writing a process that ingests JSON files and saves them as Parquet files. The process is as follows:
    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    val jsonRequests = sqlContext.jsonFile("/requests")
    val parquetRequests = sqlContext.parquetFile("/requests_parquet")
    jsonRequests.registerTempTable("jsonRequests")
    parquetRequests.registerTempTable("parquetRequests")
    val unified_requests = sqlContext.sql("select * from jsonRequests union select * from parquetRequests")
    unified_requests.saveAsParquetFile("/tempdir")

Then I delete /requests_parquet and rename /tempdir to /requests_parquet. Is there a better way to achieve this?

Another problem I have is that I receive a lot of small JSON files, and as a result a lot of small Parquet files. I'd like to merge the JSON files into a few Parquet files. How do I do that?

Thank you,
Daniel
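Edit: here is one variant I've been sketching, assuming the Spark 1.x SchemaRDD API behaves as documented — unionAll should skip the deduplicating shuffle that the SQL union above performs (I'd keep union if deduplication is actually wanted), and coalesce should collapse the many small input partitions into a few output files (the 4 below is an arbitrary guess, to be tuned to the data volume):

    import org.apache.spark.sql.SQLContext

    val sqlContext = new SQLContext(sc)

    // Read the incoming JSON files and the existing Parquet data.
    val jsonRequests = sqlContext.jsonFile("/requests")
    val parquetRequests = sqlContext.parquetFile("/requests_parquet")

    // unionAll concatenates without deduplicating; coalesce(4) merges the
    // many small partitions so only a few Parquet files are written out.
    jsonRequests
      .unionAll(parquetRequests)
      .coalesce(4)
      .saveAsParquetFile("/tempdir")

This still leaves the delete-and-rename dance at the end, which is the part I'd most like to avoid.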