Thanks Reynold and Cheng. It does seem like quite a bit of heavy lifting to
have a schema per row. For now, I will settle for building a union schema of
all the schema versions and complaining about any incompatibilities :-)
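A minimal sketch of what that union could look like, assuming the
org.apache.spark.sql.types API of Spark 1.3+; SchemaUnion and unionSchemas
are hypothetical names, not part of Spark:

import org.apache.spark.sql.types.{StructField, StructType}

object SchemaUnion {
  // Fold all known schema versions into one union schema, failing
  // loudly when two versions disagree on a column's type.
  def unionSchemas(versions: Seq[StructType]): StructType =
    versions.reduce { (left, right) =>
      val leftByName = left.fields.map(f => f.name -> f).toMap
      // Complain about incompatibilities: same name, different type.
      for (f <- right.fields; existing <- leftByName.get(f.name))
        require(existing.dataType == f.dataType,
          s"Incompatible types for column '${f.name}': " +
            s"${existing.dataType} vs ${f.dataType}")
      val extras = right.fields.filterNot(f => leftByName.contains(f.name))
      // A column missing from some versions must be nullable in the union.
      StructType((left.fields ++ extras).map(_.copy(nullable = true)))
    }
}

With per-version schemas in hand, unionSchemas(Seq(v1Schema, v2Schema))
yields one table schema, and the require call surfaces exactly the type
conflicts discussed below.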
Looking forward to doing great things with the API!
Thanks,
Aniket
On Thu, Jan 29, 2015:
I saw the talk on Spark data sources and, looking at the interfaces, it
seems that the schema needs to be provided upfront. This works for many
data sources, but I have a situation in which I would need to integrate a
system that supports schema evolution by allowing users to change the
schema without
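To make the "schema needs to be provided upfront" point concrete, here is a
minimal sketch of a relation under the data sources API (import paths per
Spark 1.3+); EvolvingSourceRelation and its columns are made up for
illustration:

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.sources.{BaseRelation, TableScan}
import org.apache.spark.sql.types.{LongType, StringType, StructField, StructType}

class EvolvingSourceRelation(override val sqlContext: SQLContext)
  extends BaseRelation with TableScan {

  // Spark SQL asks for the schema once, before scanning; every Row
  // returned by buildScan() is expected to conform to it.
  override def schema: StructType = StructType(Seq(
    StructField("id", LongType, nullable = false),
    StructField("payload", StringType, nullable = true)))

  override def buildScan(): RDD[Row] =
    sqlContext.sparkContext.parallelize(Seq(Row(1L, "v1")))
}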
Hi Aniket,

In general, the schema of all rows in a single table must be the same; this
is a basic assumption made by Spark SQL. Schema union does make sense, and
we're planning to support this for Parquet. But as you've mentioned, it
doesn't help if the types of different versions of a column differ.
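For later readers of this thread: Parquet schema merging did ship in a
later release. A minimal sketch against the Spark 1.4+ DataFrameReader API
(the /data/events paths are hypothetical); note it still fails when the
same column carries incompatible types across versions:

import org.apache.spark.sql.{DataFrame, SQLContext}

// Read two generations of the same dataset, letting Spark union the
// column sets found in the Parquet footers.
def readMerged(sqlContext: SQLContext): DataFrame =
  sqlContext.read
    .option("mergeSchema", "true")
    .parquet("/data/events/v1", "/data/events/v2")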
It's an interesting idea, but there are major challenges with per-row
schema.

1. Performance - the query optimizer and execution engine rely on
assumptions about schema and data to generate optimized query plans. Having
to re-reason about the schema for each row can substantially slow down the
engine, but due to