Re: [PR] Spark: Support CREATE TABLE LIKE with Spark [iceberg]
github-actions[bot] closed pull request #14269: Spark: Support CREATE TABLE LIKE with Spark URL: https://github.com/apache/iceberg/pull/14269 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] Spark: Support CREATE TABLE LIKE with Spark [iceberg]
github-actions[bot] commented on PR #14269: URL: https://github.com/apache/iceberg/pull/14269#issuecomment-3726485710 This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.
Re: [PR] Spark: Support CREATE TABLE LIKE with Spark [iceberg]
github-actions[bot] commented on PR #14269: URL: https://github.com/apache/iceberg/pull/14269#issuecomment-3703094047 This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions.
Re: [PR] Spark: Support CREATE TABLE LIKE with Spark [iceberg]
MaxNevermind commented on PR #14269: URL: https://github.com/apache/iceberg/pull/14269#issuecomment-3593227342 @anuragmantri @singhpk234 Can you please look at it again?
Re: [PR] Spark: Support CREATE TABLE LIKE with Spark [iceberg]
github-actions[bot] commented on PR #14269: URL: https://github.com/apache/iceberg/pull/14269#issuecomment-3587562485 This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions.
Re: [PR] Spark: Support CREATE TABLE LIKE with Spark [iceberg]
MaxNevermind commented on PR #14269: URL: https://github.com/apache/iceberg/pull/14269#issuecomment-3454499631 @anuragmantri @singhpk234 Can you please look at it again? The current state of test coverage and some questions:
- ✅ Partition spec preservation - 2 tests, for partitioned and unpartitioned tables
- ✅ IF NOT EXISTS behavior
- ✅ TBLPROPERTIES - a single test for merge and override
- ❓ Sort order preservation - Do we want to support sort order in CREATE TABLE LIKE? I see that [Spark's DDL](https://spark.apache.org/docs/latest/sql-ref-syntax-ddl-create-table-like.html) doesn't have it. In Iceberg DDL, ordering is also added separately using ALTER TABLE.
- ❓ Cross-catalog table copying - Do we want this functionality? I might be wrong, but I don't see tests for that in other DDL statements. Does Iceberg even support cross-catalog DDL?
- ❓ Error cases (source table doesn't exist, non-Iceberg source, etc.) - I'm not sure what I'm supposed to test here. Do I have to test that an exception is thrown?
Re: [PR] Spark: Support CREATE TABLE LIKE with Spark [iceberg]
MaxNevermind commented on code in PR #14269:
URL: https://github.com/apache/iceberg/pull/14269#discussion_r2442707422
##
spark/v4.0/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestTables.java:
##
@@ -0,0 +1,96 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.spark.extensions;
+
+import static org.assertj.core.api.Assertions.assertThat;
+
+import org.apache.iceberg.Parameter;
+import org.apache.iceberg.ParameterizedTestExtension;
+import org.apache.iceberg.Parameters;
+import org.apache.iceberg.Schema;
+import org.apache.iceberg.Table;
+import org.apache.iceberg.spark.SparkCatalogConfig;
+import org.apache.iceberg.types.Types;
+import org.junit.jupiter.api.AfterEach;
+import org.junit.jupiter.api.BeforeEach;
+import org.junit.jupiter.api.TestTemplate;
+import org.junit.jupiter.api.extension.ExtendWith;
+
+@ExtendWith(ParameterizedTestExtension.class)
+public class TestTables extends ExtensionsTestBase {
+
+  @Parameter(index = 3)
+  private String sourceName;
+
+  @BeforeEach
+  public void createTableIfNotExists() {
+    sql(
+        "CREATE TABLE IF NOT EXISTS %s (id bigint NOT NULL, data string) "
+            + "USING iceberg PARTITIONED BY (truncate(id, 3))",
+        sourceName);
+    sql("INSERT INTO %s VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd'), (5, 'e')", sourceName);
+  }
+
+  @AfterEach
+  public void removeTables() {
+    sql("DROP TABLE IF EXISTS %s", tableName);
+  }
+
+  @Parameters(name = "catalogName = {0}, implementation = {1}, config = {2}, sourceName = {3}")
+  protected static Object[][] parameters() {
+    return new Object[][] {
+      {
+        SparkCatalogConfig.HIVE.catalogName(),
+        SparkCatalogConfig.HIVE.implementation(),
+        SparkCatalogConfig.HIVE.properties(),
+        SparkCatalogConfig.HIVE.catalogName() + ".default.source"
+      },
+      {
+        SparkCatalogConfig.HADOOP.catalogName(),
+        SparkCatalogConfig.HADOOP.implementation(),
+        SparkCatalogConfig.HADOOP.properties(),
+        SparkCatalogConfig.HADOOP.catalogName() + ".default.source"
+      },
+      {
+        SparkCatalogConfig.SPARK_SESSION.catalogName(),
+        SparkCatalogConfig.SPARK_SESSION.implementation(),
+        SparkCatalogConfig.SPARK_SESSION.properties(),
+        "default.source"
+      }
+    };
+  }
+
+  @TestTemplate
+  public void testCreateTableLike() {
Review Comment:
@anuragmantri @singhpk234
The current state of test coverage and some questions:
- ✅ Partition spec preservation - 2 tests, for partitioned and unpartitioned tables
- ✅ IF NOT EXISTS behavior
- ✅ TBLPROPERTIES - a single test for merge and override
- ❓ Sort order preservation - Do we want to support sort order in CREATE TABLE LIKE? I see that [Spark's DDL](https://spark.apache.org/docs/latest/sql-ref-syntax-ddl-create-table-like.html) doesn't have it. In Iceberg DDL, ordering is also added separately using ALTER TABLE.
- ❓ Cross-catalog table copying - Do we want this functionality? I might be wrong, but I don't see tests for that in other DDL statements. Does Iceberg even support cross-catalog DDL?
- ❓ Error cases (source table doesn't exist, non-Iceberg source, etc.) - I'm not sure what I'm supposed to test here. Do I have to test that an exception is thrown?
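The behaviors in the coverage list (TBLPROPERTIES merge and override, IF NOT EXISTS, and a missing-source error case) can be pinned down with a small in-memory model before writing the Spark tests. This is a hypothetical sketch, not the PR's implementation and not an Iceberg or Spark API: `CreateTableLikeModel`, `createTable`, `properties`, and `createTableLike` are invented names, tables are reduced to property maps, and it assumes that explicit TBLPROPERTIES win over the source table's properties and that the source is resolved before the target-exists check.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical model of CREATE TABLE LIKE semantics; NOT an Iceberg/Spark API.
// A "table" here is only its name and its properties map.
class CreateTableLikeModel {

  // Table name -> table properties; stands in for a real catalog.
  private final Map<String, Map<String, String>> tables = new HashMap<>();

  public void createTable(String name, Map<String, String> props) {
    tables.put(name, new LinkedHashMap<>(props));
  }

  public Map<String, String> properties(String name) {
    return tables.get(name);
  }

  /**
   * Creates target with the source's properties, then applies TBLPROPERTIES on top.
   *
   * @return true if a table was created, false if IF NOT EXISTS skipped creation
   */
  public boolean createTableLike(
      String target, String source, Map<String, String> tblProperties, boolean ifNotExists) {
    // Assumed order: resolve the source first, so a missing source always fails.
    Map<String, String> sourceProps = tables.get(source);
    if (sourceProps == null) {
      throw new NoSuchElementException("Source table not found: " + source);
    }
    if (tables.containsKey(target)) {
      if (ifNotExists) {
        return false; // IF NOT EXISTS: leave the existing target untouched
      }
      throw new IllegalStateException("Table already exists: " + target);
    }
    // Merge: start from the source's properties, explicit TBLPROPERTIES override.
    Map<String, String> merged = new LinkedHashMap<>(sourceProps);
    merged.putAll(tblProperties);
    tables.put(target, merged);
    return true;
  }
}
```

Framed this way, each error case is just "expect an exception for this input", which is likely how the real Spark tests would assert them as well.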
Re: [PR] Spark: Support CREATE TABLE LIKE with Spark [iceberg]
MaxNevermind commented on code in PR #14269:
URL: https://github.com/apache/iceberg/pull/14269#discussion_r2415364353
##
spark/v4.0/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTables.scala:
##
@@ -0,0 +1,33 @@
+
+package org.apache.spark.sql.catalyst.analysis
+
+import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.catalyst.plans.logical.CreateIcebergTableLike
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.catalyst.rules.Rule
+
+case class ResolveTables(spark: SparkSession) extends Rule[LogicalPlan] {
+
+  override def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperatorsUp {
+    case x @ CreateIcebergTableLike(_, _, _, _) => x
Review Comment:
Actually, I just realized that the entire ResolveTables class might not be needed, since tables are resolved in ExtendedDataSourceV2Strategy.
Re: [PR] Spark: Support CREATE TABLE LIKE with Spark [iceberg]
MaxNevermind commented on PR #14269: URL: https://github.com/apache/iceberg/pull/14269#issuecomment-3375120630 Hey @anuragmantri, can you look over this draft implementation for any red flags? I don't contribute to Iceberg very often, so I'm not entirely certain about my approach.
Re: [PR] Spark: Support CREATE TABLE LIKE with Spark [iceberg]
anuragmantri commented on code in PR #14269:
URL: https://github.com/apache/iceberg/pull/14269#discussion_r2412033643
##
spark/v4.0/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTables.scala:
##
@@ -0,0 +1,33 @@
+case class ResolveTables(spark: SparkSession) extends Rule[LogicalPlan] {
+
+  override def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperatorsUp {
+    case x @ CreateIcebergTableLike(_, _, _, _) => x
Review Comment:
This should resolve the catalogs and tables. See
https://github.com/apache/iceberg/blob/main/spark/v4.0/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveViews.scala
##
spark/v4.0/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestTables.java:
##
@@ -0,0 +1,96 @@
+ @TestTemplate
+ public void testCreateTableLike() {
Review Comment:
Thanks for adding the test, but we may need to cover more cases:
- IF NOT EXISTS behavior (tested in
