Hi Chris, Thank you for the input, I know I can always write the table DDL manually.
But here I would like to rely on Spark generating the schema. What I don't understand is the change in the behaviour of Spark: having the storage path specified does not necessarily mean it should be an external table. Is there any way to control/override this? Thanks, Peter On Wed, Feb 13, 2019, 13:09 Chris Teoh <chris.t...@gmail.com wrote: > Hey there, > > Could you not just create a managed table using the DDL in Spark SQL and > then written the data frame to the underlying folder or use Spark SQL to do > an insert? > > Alternatively try create table as select. Iirc hive creates managed tables > this way. > > I've not confirmed this works but I think that might be worth trying. > > I hope that helps. > > Kind regards > Chris > > On Wed., 13 Feb. 2019, 10:44 pm Horváth Péter Gergely, < > horvath.peter.gerg...@gmail.com> wrote: > >> Dear All, >> >> I am facing a strange issue with Spark 2.3, where I would like to create >> a MANAGED table out of the content of a DataFrame with the storage path >> overridden. >> >> Apparently, when one tries to create a Hive table via >> DataFrameWriter.saveAsTable, supplying a "path" option causes Spark to >> automatically create an external table. >> >> This demonstrates the behaviour: >> >> scala> val numbersDF = sc.parallelize((1 to 100).toList).toDF("numbers") >> numbersDF: org.apache.spark.sql.DataFrame = [numbers: int] >> >> scala> numbersDF.write.format("orc").saveAsTable("numbers_table1") >> >> scala> spark.sql("describe formatted >> numbers_table1").filter(_.get(0).toString == "Type").show >> +--------+---------+-------+ >> |col_name|data_type|comment| >> +--------+---------+-------+ >> | Type| MANAGED| | >> +--------+---------+-------+ >> >> >> scala> numbersDF.write.format("orc").option("path", >> "/user/foobar/numbers_table_data").saveAsTable("numbers_table2") >> >> scala> spark.sql("describe formatted >> numbers_table2").filter(_.get(0).toString == "Type").show >> +--------+---------+-------+ >> |col_name|data_type|comment| >> +--------+---------+-------+ >> | Type| EXTERNAL| | >> +--------+---------+-------+ >> >> >> >> I am wondering if there is any way to force creation of a managed table >> with a custom path (which as far as I know, should be possible via standard >> Hive commands). >> >> I often seem to have the problem that I cannot find the appropriate >> documentation for the option configuration of Spark APIs. Could someone >> please point me to the right direction and tell me where these things are >> documented? >> >> Thanks, >> Peter >> >>