Re: Met a problem when using spark to load parquet files with different version schemas

2015-05-11 Thread Wei Yan
Creating separate dataframes and unioning them looks reasonable. thanks, Wei On Mon, May 11, 2015 at 6:39 PM, Michael Armbrust wrote: > Ah, yeah sorry. I should have read closer and realized that what you are > asking for is not supported. It might be possible to add simple coercions > such as this one,
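The workaround agreed on here can be sketched in plain Python standing in for Spark's DataFrame API (names and data are illustrative, not from the thread): load each file version separately, cast the narrower column up to the wider type, then union the results.

```python
def cast_column(rows, column, caster):
    """Return a copy of `rows` (list of dicts) with `column` passed
    through `caster`. Stands in for DataFrame.withColumn + cast."""
    return [{**row, column: caster(row[column])} for row in rows]

# Rows from a file written with the old schema: "value" was an int.
old_rows = [{"id": 1, "value": 7}, {"id": 2, "value": 9}]
# Rows from a file written with the new schema: "value" is a long,
# so it can hold values outside 32-bit range.
new_rows = [{"id": 3, "value": 2 ** 40}]

# Cast the old data up to the wider type, then union the two sets,
# mirroring "creating different dataframes and unioning them".
merged = cast_column(old_rows, "value", int) + new_rows
```

In real Spark code the cast would be explicit (e.g. casting the int column to long in each per-version DataFrame before `unionAll`), which sidesteps the merge failure entirely because both inputs then share one schema.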

Re: Met a problem when using spark to load parquet files with different version schemas

2015-05-11 Thread Michael Armbrust
Ah, yeah sorry. I should have read closer and realized that what you are asking for is not supported. It might be possible to add simple coercions such as this one, but today, compatible schemas must only add/remove columns and cannot change types. You could try creating different dataframes and
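The rule Michael describes, that compatible schemas may only add or remove columns and never change a column's type, can be sketched as a small merge function (an illustrative model, not Spark's actual implementation):

```python
def merge_schemas(a, b):
    """Merge two {column: type} schemas under the rule described in
    the thread: columns may be added or removed between versions, but
    a column present in both schemas must keep the same type.
    Otherwise raise, mirroring the "failed to merge incompatible
    schemas" error. Illustrative sketch only."""
    merged = dict(a)
    for col, typ in b.items():
        if col in merged and merged[col] != typ:
            raise ValueError(
                f"failed to merge incompatible schemas: column {col!r} "
                f"is {merged[col]} in one file and {typ} in the other"
            )
        merged[col] = typ
    return merged
```

Under this rule, an `int` column in one file and a `long` column in the other is rejected rather than widened, which is exactly the failure reported below.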

Re: Met a problem when using spark to load parquet files with different version schemas

2015-05-11 Thread Wei Yan
Thanks for the reply, Michael. The problem is, if I set "spark.sql.parquet.useDataSourceApi" to true, spark cannot create a DataFrame. The exception shows it "failed to merge incompatible schemas". I think this means that the "int" schema cannot be merged with the "long" one. Does it mean that

Re: Met a problem when using spark to load parquet files with different version schemas

2015-05-11 Thread Michael Armbrust
> > BTW, I use spark 1.3.1, and already set > "spark.sql.parquet.useDataSourceApi" to false. > Schema merging is only supported when this flag is set to true (setting it to false uses old code that will be removed once the new code is proven).
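For reference, the flag in question is a SQL configuration option, and schema merging is only attempted on the new data source path. A pyspark-flavored sketch for Spark 1.3.x (assumes an existing `sqlContext`; the paths are placeholders, and this fragment is not runnable outside a Spark session):

```python
# Sketch only: enable the new Parquet data source API so the schema
# merging code path is used at all (with the flag false, the old
# reader is used and no merging is attempted).
sqlContext.setConf("spark.sql.parquet.useDataSourceApi", "true")

# Read both file versions together; this is where the merge of the
# two schemas is attempted.
df = sqlContext.parquetFile("/data/v1.parquet", "/data/v2.parquet")
```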

Met a problem when using spark to load parquet files with different version schemas

2015-05-11 Thread Wei Yan
Hi, devs, I met a problem when using spark to read two parquet files with two different versions of schemas. For example, the first file has one field with "int" type, while the same field in the second file is a "long". I thought spark would automatically generate a merged schema "long", and use t