Re: schema changes of custom data source in persistent tables DataSourceV1

2020-07-20 Thread fansparker
Makes sense, Russell. I am trying to figure out if there is a way to force a metadata reload in "createRelation" when the schema provided in the new SparkSession differs from the existing metadata schema. -- Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
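[Editor's note] The approach being asked about could be sketched roughly as follows, assuming a DSV1 `SchemaRelationProvider`. The helper `loadStoredSchema` and the relation class `CustomRelation` are hypothetical placeholders for the custom source's own metadata access; as the rest of the thread notes, the session's relation cache may still serve the stale schema.

```scala
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.sources.{BaseRelation, SchemaRelationProvider}
import org.apache.spark.sql.types.StructType

class CustomSource extends SchemaRelationProvider {
  override def createRelation(
      sqlContext: SQLContext,
      parameters: Map[String, String],
      schema: StructType): BaseRelation = {
    // Hypothetical helper: read the schema the source itself has persisted.
    val stored: StructType = loadStoredSchema(parameters)
    if (schema != null && schema != stored) {
      // Schemas diverge: try to drop Spark's cached relation for this table
      // so the next resolution sees the source's current schema.
      parameters.get("table").foreach { t =>
        sqlContext.sparkSession.catalog.refreshTable(t)
      }
    }
    new CustomRelation(sqlContext, stored) // hypothetical BaseRelation impl
  }

  private def loadStoredSchema(parameters: Map[String, String]): StructType = ???
}
```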

Re: schema changes of custom data source in persistent tables DataSourceV1

2020-07-20 Thread Russell Spitzer
The code you linked to is very old and I don't think that method works anymore (HiveContext no longer exists). My latest attempt at this was on Spark 2.2, and I ran into the issues I wrote about before. In DSV2 it's done via a catalog implementation, so you basically can write a new
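[Editor's note] The DSV2 route Russell refers to might look like the skeleton below (Spark 3.x `TableCatalog`). Because `loadTable` is invoked when Spark resolves the table, a catalog that re-reads the source's schema there never serves a stale copy. `com.example.MyCatalog` and the storage behind the `???` stubs are illustrative assumptions, not a real implementation.

```scala
import java.util

import org.apache.spark.sql.connector.catalog.{Identifier, Table, TableCatalog, TableChange}
import org.apache.spark.sql.connector.expressions.Transform
import org.apache.spark.sql.types.StructType
import org.apache.spark.sql.util.CaseInsensitiveStringMap

class MyCatalog extends TableCatalog {
  private var catalogName: String = _

  override def initialize(name: String, options: CaseInsensitiveStringMap): Unit = {
    catalogName = name
  }
  override def name(): String = catalogName

  // Called on each table resolution, so the schema returned here
  // reflects the source's current state rather than cached metadata.
  override def loadTable(ident: Identifier): Table = ???

  override def listTables(namespace: Array[String]): Array[Identifier] = ???
  override def createTable(
      ident: Identifier,
      schema: StructType,
      partitions: Array[Transform],
      properties: util.Map[String, String]): Table = ???
  override def alterTable(ident: Identifier, changes: TableChange*): Table = ???
  override def dropTable(ident: Identifier): Boolean = ???
  override def renameTable(oldIdent: Identifier, newIdent: Identifier): Unit = ???
}
```

The catalog is registered through configuration, e.g. `spark.sql.catalog.mycat=com.example.MyCatalog`, after which tables resolve as `mycat.db.tbl`.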

Re: schema changes of custom data source in persistent tables DataSourceV1

2020-07-20 Thread fansparker
Thanks, Russell. This shows that "refreshTable" and "invalidateTable" could be used to reload the metadata, but they do not work in our case. I have tried to
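[Editor's note] For reference, the refresh calls under discussion are typically issued like this against a live SparkSession; as the poster reports, they may not pick up a schema change inside a custom DSV1 source, since they target Spark's own caches rather than the source's metadata.

```scala
// Invalidate Spark's cached metadata and file listings for the table.
spark.catalog.refreshTable("db.events")

// Equivalent SQL form.
spark.sql("REFRESH TABLE db.events")
```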

Re: schema changes of custom data source in persistent tables DataSourceV1

2020-07-20 Thread Russell Spitzer
The last time I looked into this, the answer was no. Since there is a relation cache internal to the SparkSession, I believe the only way to update a session's information was a full drop and create. That was my experience with a custom Hive metastore and entries read from it. I could change the entries in the
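[Editor's note] The drop-and-recreate workaround Russell describes amounts to the following; the table name, provider class, and path are hypothetical examples.

```scala
// Drop the stale catalog entry, then re-register the table so the
// catalog picks up the source's current schema.
spark.sql("DROP TABLE IF EXISTS db.events")
spark.sql(
  """CREATE TABLE db.events
    |USING com.example.CustomSource
    |OPTIONS (path '/data/events')""".stripMargin)
```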

Re: schema changes of custom data source in persistent tables DataSourceV1

2020-07-20 Thread Piyush Acharya
Do you want to merge the schema when the incoming data has changed? spark.conf.set("spark.sql.parquet.mergeSchema", "true") https://kontext.tech/column/spark/381/schema-merging-evolution-with-parquet-in-spark-and-hive On Mon, Jul 20, 2020 at 3:48 PM fansparker wrote: > Does anybody know if there
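[Editor's note] The schema-merging suggestion applies to Parquet sources specifically (not to an arbitrary custom DSV1 source); it can be enabled globally or per read, with a hypothetical path shown here.

```scala
// Session-wide: merge schemas across Parquet part files.
spark.conf.set("spark.sql.parquet.mergeSchema", "true")

// Or per read, which avoids the global cost of schema merging.
val df = spark.read
  .option("mergeSchema", "true")
  .parquet("/data/events")
```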

Re: schema changes of custom data source in persistent tables DataSourceV1

2020-07-20 Thread fansparker
Does anybody know if there is a way to get the persisted table's schema updated when the underlying custom data source schema is changed? Currently, we have to drop and re-create the table.