[GitHub] [hudi] pengzhiwei2018 edited a comment on pull request #2645: [HUDI-1659] Basic Implementation Of Spark Sql Support
pengzhiwei2018 edited a comment on pull request #2645: URL: https://github.com/apache/hudi/pull/2645#issuecomment-835134388

Hi @vinothchandar @umehrot2, the PR has been updated, mainly with the following changes:

- Make CTAS atomic.
- Support using a timestamp type as the partition field (a small CTAS sketch follows this comment).
- Fix the exception thrown when the partition column is not the rightmost column of the select list in CTAS.
- Add `TestSqlStatement`, which runs a sequence of statements; see [sql-statements.sql](https://github.com/apache/hudi/blob/171d607b1adc3972aa2c9e3efce5362368599d00/hudi-spark-datasource/hudi-spark/src/test/resources/sql-statements.sql).
- Add more test cases for CTAS and partitioned tables.
- Rename `SparkSqlAdpater` to `SparkAdapter`.

For the other issues you have mentioned above, I have filed a JIRA for each:

- Support Truncate Command For Hoodie: [HUDI-1883](https://issues.apache.org/jira/browse/HUDI-1883)
- Support Partial Update For MergeInto: [HUDI-1884](https://issues.apache.org/jira/browse/HUDI-1884)
- Support Delete/Update Non-pk Table: [HUDI-1885](https://issues.apache.org/jira/browse/HUDI-1885)

After this first PR has been merged, we can continue to work on these JIRAs.
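For illustration, a minimal CTAS sketch exercising the timestamp partition field and the partition-column ordering described above; the table and column names (`s0`, `h0`, `ts`) are hypothetical and follow the style of the CTAS example discussed later in this thread.

```sql
-- Hypothetical source table s0(id int, name string, price double, ts timestamp).
-- CTAS into a Hudi table partitioned by the timestamp-typed column `ts`;
-- the partition column sits last in the select list (or is moved there
-- by the translator added in this PR).
create table h0 using hudi
partitioned by (ts)
location 'file:///tmp/h0'
as select id, name, price, ts from s0;
```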
[GitHub] [hudi] pengzhiwei2018 edited a comment on pull request #2645: [HUDI-1659] Basic Implementation Of Spark Sql Support
pengzhiwei2018 edited a comment on pull request #2645: URL: https://github.com/apache/hudi/pull/2645#issuecomment-833192518

Hi @vinothchandar, thanks for your work on the test.

- CREATE TABLE

> Even if it fails, it ends up creating the table (i.e. it's not atomic per se)

Yes, CTAS is not atomic currently. I can fix this later in this PR.

> When selecting all columns (probably need more tests across data types)

Yes, I will add more tests across data types.

> Truncate table

`Truncate table` needs some Hudi-specific work that is not covered in this PR. I will file another PR to solve this.

- MergeInto

> 1. Fails due to assignment field/schema mismatch

Currently, merge into cannot support partial updates; all the fields of the target table must be specified in the update set assignments (see the sketch after this comment).

> 2. Merges only allowed by PK

Yes, this is currently a limitation of `merge into` in this PR, as we discussed in RFC-25. I think we can solve this in another PR.

> 3. Merge not updating to new value

This is the same issue as 1: currently we do not support partial updates.

- Delete Table

> Non PK based deletes are not working atm

Currently we cannot delete or update a non-pk Hudi table. For this case, we can use the `_hoodie_record_key` to identify a record and do the delete & update. We can file a PR to support this.

> Why do we have to encode the column name into each record key? i.e. _hoodie_record_key = '1' vs being _hoodie_record_key = 'id:1'

This is Hoodie's original behavior for `_hoodie_record_key`.

- Create or Replace table

This is not supported in this PR. I will file a PR for this.

- Create table, partitioned by

> create table hudi_gh_ext using hudi partitioned by (type) location 'file:///tmp/hudi-gh-ext' as select type, public, payload, repo, actor, org, id, other from gh_raw] java.lang.AssertionError: assertion failed

The partition column must be the rightmost column of the select list; this is a requirement of Spark SQL. So we should move `type` to the last select column, like this:

> create table hudi_gh_ext using hudi partitioned by (type) location 'file:///tmp/hudi-gh-ext' as select public, payload, repo, actor, org, id, other, type from gh_raw

I will add a translator to move the partition columns to the rightmost position of the select list.
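A minimal, hedged sketch of a `merge into` whose update clause assigns every field of the target table, as required while partial updates are unsupported; the table and column names (`h0`, `s0`, `id`, `name`, `price`) are hypothetical:

```sql
-- Sketch assuming target h0(id, name, price) with primary key `id`
-- and source s0(id, name, price); since partial updates are not supported,
-- every field of h0 is assigned in the update set clause.
merge into h0
using (select id, name, price from s0) as s0
on h0.id = s0.id
when matched then update set id = s0.id, name = s0.name, price = s0.price
when not matched then insert (id, name, price) values (s0.id, s0.name, s0.price);
```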
[GitHub] [hudi] pengzhiwei2018 edited a comment on pull request #2645: [HUDI-1659] Basic Implementation Of Spark Sql Support
pengzhiwei2018 edited a comment on pull request #2645: URL: https://github.com/apache/hudi/pull/2645#issuecomment-833195752

> @pengzhiwei2018 few suggestions/questions. Could you please clarify?
>
> * How does one pass any `HoodieWriteConfig` for an INSERT/UPDATE/DELETE/MERGE statement?
> * Setting user specified metadata with each commit. This is very important for interplay with deltastreamer etc.
> * Do we support setting table properties? SET command.
> * Can we add a functional test that does a sequence of statements and parameterize it for both COW/MOR? Want to ensure we can document the entire support matrix.
> * Should we add more rigorous tests for partitioned tables/data types?
>
> Next step for me is to run it at a larger scale on a cluster.

Hi @vinothchandar

> How does one pass any `HoodieWriteConfig` for an INSERT/UPDATE/DELETE/MERGE statement?

Using the set options, e.g. `set hoodie.insert.shuffle.parallelism = 4` (a small example follows this comment).

> Do we support setting table properties? SET command.

Yes, we support the SET command for write configs and other runtime properties. For table properties such as table type and table name, we can use the alter table command.

> Can we add a functional test that does a sequence of statements and parameterize it for both COW/MOR? Want to ensure we can document the entire support matrix.

Yes, that is great. I will do this.

> Should we add more rigorous tests for partitioned tables/data types?

Of course, I will add more test cases to cover more data types.
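A small, hedged illustration of the SET-based approach; `hoodie.insert.shuffle.parallelism` is the config quoted above, while the table `h0` and its values are hypothetical:

```sql
-- Pass a Hudi write config for the statements that follow via SET
set hoodie.insert.shuffle.parallelism = 4;

-- The config above applies to subsequent writes, for example:
insert into h0 values (1, 'a1', 10, 1000);
```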
[GitHub] [hudi] pengzhiwei2018 edited a comment on pull request #2645: [HUDI-1659] Basic Implementation Of Spark Sql Support
pengzhiwei2018 edited a comment on pull request #2645: URL: https://github.com/apache/hudi/pull/2645#issuecomment-828893245

> @pengzhiwei2018 could we make the spark-shell experience better? I think we need the extensions added by default when the jar is pulled in?
>
> ```scala
> $ spark-shell --jars $HUDI_SPARK_BUNDLE --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
>
> scala> spark.sql("create table t1 (id int, name string, price double, ts long) using hudi options(primaryKey= 'id', preCombineField = 'ts')").show
> t, returning NoSuchObjectException
> org.apache.hudi.exception.HoodieException: 'path' or 'hoodie.datasource.read.paths' or both must be specified.
>   at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:77)
>   at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:337)
>   at org.apache.spark.sql.execution.command.CreateDataSourceTableCommand.run(createDataSourceTables.scala:78)
>   at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
>   at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
>   at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
>   at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:229)
>   at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3616)
>   at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:100)
>   at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:160)
>   at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:87)
>   at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:763)
>   at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
>   at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3614)
>   at org.apache.spark.sql.Dataset.<init>(Dataset.scala:229)
>   at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:100)
>   at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:763)
>   at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:97)
>   at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:606)
>   at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:763)
>   at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:601)
> ```

Hi @vinothchandar, you can test this with the following commands:

- Using spark-sql

> spark-sql --jars $HUDI_SPARK_BUNDLE \
>   --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
>   --conf 'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension'

- Using spark-shell

> spark-shell --jars $HUDI_SPARK_BUNDLE \
>   --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
>   --conf 'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension'

Just set `spark.sql.extensions` to `org.apache.spark.sql.hudi.HoodieSparkSessionExtension` (a quick check follows this comment). IMO this conf is just like `spark.serializer`, which must be specified when the `SparkSession` is created, so it is hard to set it automatically when installing the Hudi jar. Thanks!
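With `spark.sql.extensions` configured as above, the statement from the stack trace is handled by the Hudi session extension; a quick check using the same statement as in the report:

```sql
-- Runs successfully once the HoodieSparkSessionExtension is active
create table t1 (id int, name string, price double, ts long)
using hudi
options (primaryKey = 'id', preCombineField = 'ts');
```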
[GitHub] [hudi] pengzhiwei2018 edited a comment on pull request #2645: [HUDI-1659] Basic Implementation Of Spark Sql Support
pengzhiwei2018 edited a comment on pull request #2645: URL: https://github.com/apache/hudi/pull/2645#issuecomment-824759341

> @pengzhiwei2018 can we file followups from this review as sub tasks under the same umbrella JIRA?
>
> I spent some time looking at snowflake and bigquery and what kind of experience users have there writing data out.
> Here are my recommendations (mostly borrowing from ANSI SQL):
>
> * [x] We can support `PRIMARY KEY(col1, col2,..)` definition; if no PK is specified we will generate a synthetic key or have it be null.
> * [ ] Multi table inserts. `INSERT ALL WHEN condition1 INTO t1 WHEN condition2 INTO t2`
> * [x] Update statement `UPDATE t1 SET t1.a = t2.b + 1 FROM t2 WHERE condition`
> * [x] Merge into statement with matched and not matched clauses.
> * [x] Delete from statement
> * [ ] COPY INTO statement that integrates with Hudi bootstrap functionality
> * [ ] CREATE TABLE with support for unique constraint check.
> * [ ] ALTER TABLE statement to alter schema constraints.
> * [ ] CREATE TABLE with `CLUSTER BY(col1, col2)`
> * [ ] CREATE INDEX for adding indexes (future, as we complete RFC-08, 27)
> * [ ] CREATE TABLE with `FOREIGN KEY`, `DATABASE, SCHEMA` (future plans, needs multi table txns + our metaserver)
> * [ ] Expose all Hudi table services (cleaning, compaction, clustering, ...) using a `CALL cleaner` kind of syntax. Over time we can expose more standard functions there. E.g. more advanced compaction and clustering strategies can be specified there. We may need a `SHOW services t1` to show information for these scheduled calls.
>
> Checked off items I think are already covered in this PR. If not, please raise JIRA subtasks for these as well.

That is great! I will file a JIRA for each of the items not covered in this PR (hedged sketches of the checked-off DML statements follow this comment).
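For reference, hedged sketches of the checked-off update and delete statements against a hypothetical pk-keyed table `h0(id, name, price)`; treat these as illustrative rather than the exact supported grammar:

```sql
-- Update rows identified by the primary key
update h0 set price = price + 1 where id = 1;

-- Delete rows identified by the primary key
delete from h0 where id = 1;
```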
[GitHub] [hudi] pengzhiwei2018 edited a comment on pull request #2645: [HUDI-1659] Basic Implementation Of Spark Sql Support
pengzhiwei2018 edited a comment on pull request #2645: URL: https://github.com/apache/hudi/pull/2645#issuecomment-815583131

Hi @vinothchandar @kwondw, thanks for the review on this feature. The code has been updated. Main changes:

- SQL support for Spark 3, based on the same `HoodieAnalysis` rules and `Commands` as Spark 2. The test cases can be run against Spark 3 with the following commands:
  `mvn clean install -DskipTests -Pspark3`
  `mvn test -Punit-tests -Pspark3 -pl hudi-spark-datasource/hudi-spark`
- Fix the bug where MergeInto does not work when the source column name differs from the target table column name, e.g. `merge into ... on t0.id = s0.s_id`.
- Support expressions on source columns in the merge-on condition, e.g. `merge into ... on t0.id = s0.id + 1 ...`.
- Add more test cases for `TestMergeInto`, `TestUpdate`, `TestDelete` and `TestCreateTable`.
- Remove `tableSchema` and use `writeSchema` instead.

Sketches of the two merge-related fixes follow this comment. Please take another look when you have time, thanks!
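Hedged sketches of the two merge-related fixes listed above; the table and column names (`h0`, `s0`, `s_id`) are hypothetical:

```sql
-- Source key column named differently from the target key (s_id vs id)
merge into h0
using (select s_id, name, price from s0) as s0
on h0.id = s0.s_id
when matched then update set id = s0.s_id, name = s0.name, price = s0.price;

-- Expression on the source columns in the merge-on condition
merge into h0
using (select id, name, price from s0) as s0
on h0.id = s0.id + 1
when matched then update set id = s0.id + 1, name = s0.name, price = s0.price;
```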
[GitHub] [hudi] pengzhiwei2018 edited a comment on pull request #2645: [HUDI-1659] Basic Implementation Of Spark Sql Support
pengzhiwei2018 edited a comment on pull request #2645: URL: https://github.com/apache/hudi/pull/2645#issuecomment-806465222

> Let me see how/if we can simplify the inputSchema vs writeSchema thing.
>
> I went over the PR now. LGTM at a high level.
> Few questions though:
>
> * I see we are introducing some antlr parsing and injecting a custom parser for Spark 2.x. Is this done for backwards compat with Spark 2 and will it eventually be removed?
> * Do we reuse the MERGE/DELETE keywords from Spark 3? Are the Spark 3 and Spark 2 syntaxes different? Can you comment on how we are approaching all this?
> * Have you done any production testing of this PR?
>
> cc @kwondw could you also please chime in. We would like to land something basic and iterate and get this out for 0.9.0 next month.

Thanks for your review @vinothchandar!

> I see we are introducing some antlr parsing and injecting a custom parser for Spark 2.x. Is this done for backwards compat with Spark 2 and will it eventually be removed?

Yes, it is for backwards compatibility with Spark 2, and it will eventually be removed in favor of Spark 3 if there are no further syntax extensions beyond what Spark 3 provides.

> Do we reuse the MERGE/DELETE keywords from Spark 3? Are the Spark 3 and Spark 2 syntaxes different? Can you comment on how we are approaching all this?

Yes, I reused the extended syntax (MERGE) from Spark 3, so the syntax is the same between Spark 2 and Spark 3. For Spark 3, Spark can recognize the MERGE/DELETE syntax and parse it to a LogicalPlan. For Spark 2, our extended SQL parser also parses it to the same LogicalPlan. After parsing, the LogicalPlan goes through the same rules (in `HoodieAnalysis`) to resolve and rewrite it into a Hoodie Command, and the Hoodie Command translates the logical plan into Hoodie API calls. The Hoodie Commands are shared between Spark 2 and Spark 3, so except for the SQL parser for Spark 2, all other parts are shared.

> Have you done any production testing of this PR?

Yes, I have tested it in Aliyun's EMR cluster, and more test cases will be done this week.
[GitHub] [hudi] pengzhiwei2018 edited a comment on pull request #2645: [HUDI-1659] Basic Implementation Of Spark Sql Support
pengzhiwei2018 edited a comment on pull request #2645: URL: https://github.com/apache/hudi/pull/2645#issuecomment-803355840

> Getting started on this. Sorry for the delay.
>
> How important are the changes around writeSchema vs inputSchema and such changes to the SQL implementation?

Hi @vinothchandar, thanks for your review. It is necessary to introduce the `inputSchema` and `tableSchema` to replace the original `writeSchema` for MergeInto. For example:

```
Merge Into h0
using (select id, name, flag from s) as s0
on s0.id = h0.id
when matched and flag = 'u' then update set id = s0.name, name = s0.name
when not matched then insert (id, name) values (s0.id, s0.name)
```

The input is `select id, name, flag from s`, whose schema is `(id, name, flag)`, but the record written to the table is `(id, name)` after the update is translated, so the input schema is not equal to the write schema and the original `writeSchema` cannot cover this scenario. I introduced `inputSchema` and `tableSchema` to solve this problem: the `inputSchema` is used to parse the incoming record, and the `tableSchema` is used to write and read records from the table. In most cases other than MergeInto, the `inputSchema` is the same as the `tableSchema`, so this should not affect the original logic, IMO.