[jira] [Assigned] (HUDI-852) Add validation to check Table name when Append Mode is used in DataSource writer
[ https://issues.apache.org/jira/browse/HUDI-852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aakash Pradeep reassigned HUDI-852:
-----------------------------------

    Assignee: Aakash Pradeep

> Add validation to check Table name when Append Mode is used in DataSource
> writer
> -------------------------------------------------------------------------
>
>                 Key: HUDI-852
>                 URL: https://issues.apache.org/jira/browse/HUDI-852
>             Project: Apache Hudi (incubating)
>          Issue Type: Bug
>          Components: newbie, Writer Core
>            Reporter: Bhavani Sudha
>            Assignee: Aakash Pradeep
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 0.6.0
>
> Copied from the user's description on the mailing list:
> The table name is not respected when inserting records with a different
> table name in Append mode.
>
> {code:java}
> // While running the commands from the Hudi quick start guide, I found that
> // the library does not check the table name in the request against the table
> // name in the metadata available under the base path. I think it should throw
> // TableAlreadyExist; in the case of SaveMode Overwrite it only warns.
>
> spark-2.4.4-bin-hadoop2.7/bin/spark-shell --packages org.apache.hudi:hudi-spark-bundle_2.11:0.5.1-incubating,org.apache.spark:spark-avro_2.11:2.4.4 --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
>
> scala> df.write.format("hudi").
>      |   options(getQuickstartWriteConfigs).
>      |   option(PRECOMBINE_FIELD_OPT_KEY, "ts").
>      |   option(RECORDKEY_FIELD_OPT_KEY, "uuid").
>      |   option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").
>      |   option(TABLE_NAME, "test_table").
>      |   mode(Append).
>      |   save(basePath)
> 20/04/29 17:23:42 WARN DefaultSource: Snapshot view not supported yet via data source, for MERGE_ON_READ tables. Please query the Hive table registered using Spark SQL.
>
> // No exception is thrown if we run this with a different table name:
> scala> df.write.format("hudi").
>      |   options(getQuickstartWriteConfigs).
>      |   option(PRECOMBINE_FIELD_OPT_KEY, "ts").
>      |   option(RECORDKEY_FIELD_OPT_KEY, "uuid").
>      |   option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").
>      |   option(TABLE_NAME, "foo_table").
>      |   mode(Append).
>      |   save(basePath)
> 20/04/29 17:24:37 WARN DefaultSource: Snapshot view not supported yet via data source, for MERGE_ON_READ tables. Please query the Hive table registered using Spark SQL.
>
> // With SaveMode Overwrite, by contrast, a warning is logged:
> scala> df.write.format("hudi").
>      |   options(getQuickstartWriteConfigs).
>      |   option(PRECOMBINE_FIELD_OPT_KEY, "ts").
>      |   option(RECORDKEY_FIELD_OPT_KEY, "uuid").
>      |   option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").
>      |   option(TABLE_NAME, tableName).
>      |   mode(Overwrite).
>      |   save(basePath)
> 20/04/29 22:25:16 WARN HoodieSparkSqlWriter$: hoodie table at file:/tmp/hudi_trips_cow already exists. Deleting existing data & overwriting with new data.
> 20/04/29 22:25:18 WARN DefaultSource: Snapshot view not supported yet via data source, for MERGE_ON_READ tables. Please query the Hive table registered using Spark SQL.
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
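One possible shape for the requested validation is a hypothetical sketch, not Hudi's actual implementation: before an Append write, compare the table name recorded in the table's metadata under the base path (Hudi stores it in `.hoodie/hoodie.properties` as `hoodie.table.name`) against the name supplied in the write options, and fail fast on a mismatch. The class, method, and exception names below are illustrative.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

public class TableNameValidator {

    /** Thrown when the requested table name differs from the one recorded on disk. */
    public static class TableNameMismatchException extends RuntimeException {
        public TableNameMismatchException(String message) {
            super(message);
        }
    }

    /**
     * Checks that the table name in the write request matches the name
     * recorded in the table's properties file under the base path. If no
     * properties file exists yet, the table is new and any name is accepted.
     */
    public static void validateTableName(Path basePath, String requestedName) throws IOException {
        Path propsFile = basePath.resolve(".hoodie").resolve("hoodie.properties");
        if (!Files.exists(propsFile)) {
            return; // new table: nothing to validate against
        }
        Properties props = new Properties();
        try (InputStream in = Files.newInputStream(propsFile)) {
            props.load(in);
        }
        String existingName = props.getProperty("hoodie.table.name");
        if (existingName != null && !existingName.equals(requestedName)) {
            throw new TableNameMismatchException(
                "Table at " + basePath + " is named '" + existingName
                    + "' but the write requested '" + requestedName + "'");
        }
    }
}
```

With this check wired into the Append path, the second `foo_table` write in the transcript above would fail with a clear error instead of silently appending.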
[jira] [Assigned] (HUDI-852) Add validation to check Table name when Append Mode is used in DataSource writer
[ https://issues.apache.org/jira/browse/HUDI-852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bhavani Sudha reassigned HUDI-852:
----------------------------------

    Assignee:     (was: Bhavani Sudha)

> Add validation to check Table name when Append Mode is used in DataSource
> writer
> -------------------------------------------------------------------------
>
>                 Key: HUDI-852
>                 URL: https://issues.apache.org/jira/browse/HUDI-852
>             Project: Apache Hudi (incubating)
>          Issue Type: Bug
>          Components: newbie, Writer Core
>            Reporter: Bhavani Sudha
>            Priority: Minor
>             Fix For: 0.6.0

--
This message was sent by Atlassian Jira
(v8.3.4#803005)