Re: [PR] Spark 3.5: Parallelize reading files in snapshot and migrate procedures [iceberg]

2024-07-10 Thread via GitHub
manuzhang commented on PR #10037: URL: https://github.com/apache/iceberg/pull/10037#issuecomment-2221875007 @puchengy Feel free to open a new issue to track.

Re: [PR] Spark 3.5: Parallelize reading files in snapshot and migrate procedures [iceberg]

2024-07-10 Thread via GitHub
puchengy commented on PR #10037: URL: https://github.com/apache/iceberg/pull/10037#issuecomment-2221660261 Leaving an idea for a further speed-up for tables with partition-level data skew: we can further divide the files of a given partition into X buckets.
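As a rough illustration of the bucketing idea above, the sketch below splits one partition's file list round-robin into a fixed number of buckets, so each bucket could be read by a separate task instead of one task handling the whole skewed partition. `PartitionBuckets` and `split` are hypothetical names, and round-robin is only one possible assignment; balancing by file size would cope better with files of very different sizes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for the bucketing idea; not part of Iceberg. */
final class PartitionBuckets {
  private PartitionBuckets() {}

  /**
   * Assigns the files of one partition to at most {@code numBuckets} buckets, round-robin,
   * so bucket sizes differ by at most one file. Returns an empty list for an empty partition.
   */
  static List<List<String>> split(List<String> files, int numBuckets) {
    if (numBuckets <= 0) {
      throw new IllegalArgumentException("numBuckets must be positive: " + numBuckets);
    }
    // Never create more buckets than there are files.
    int bucketCount = Math.min(numBuckets, files.size());
    List<List<String>> buckets = new ArrayList<>(bucketCount);
    for (int i = 0; i < bucketCount; i++) {
      buckets.add(new ArrayList<>());
    }
    for (int i = 0; i < files.size(); i++) {
      buckets.get(i % bucketCount).add(files.get(i));
    }
    return buckets;
  }
}
```

Each returned bucket is an independent unit of work, so buckets from a skewed partition can be interleaved with work from smaller partitions on the same thread pool.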

Re: [PR] Spark 3.5: Parallelize reading files in snapshot and migrate procedures [iceberg]

2024-06-11 Thread via GitHub
manuzhang commented on PR #10037: URL: https://github.com/apache/iceberg/pull/10037#issuecomment-2162137634 @nastra @RussellSpitzer @aokolnychyi Could you please take another look?

Re: [PR] Spark 3.5: Parallelize reading files in snapshot and migrate procedures [iceberg]

2024-05-18 Thread via GitHub
manuzhang commented on code in PR #10037: URL: https://github.com/apache/iceberg/pull/10037#discussion_r1605809547 ## data/src/main/java/org/apache/iceberg/data/MigrationService.java: ## @@ -0,0 +1,39 @@

Re: [PR] Spark 3.5: Parallelize reading files in snapshot and migrate procedures [iceberg]

2024-04-16 Thread via GitHub
aokolnychyi commented on code in PR #10037: URL: https://github.com/apache/iceberg/pull/10037#discussion_r1568016911 ## data/src/main/java/org/apache/iceberg/data/MigrationService.java: ## @@ -0,0 +1,39 @@

Re: [PR] Spark 3.5: Parallelize reading files in snapshot and migrate procedures [iceberg]

2024-04-15 Thread via GitHub
manuzhang commented on code in PR #10037: URL: https://github.com/apache/iceberg/pull/10037#discussion_r1566576095 ## data/src/main/java/org/apache/iceberg/data/MigrationService.java: ## @@ -0,0 +1,39 @@

Re: [PR] Spark 3.5: Parallelize reading files in snapshot and migrate procedures [iceberg]

2024-04-15 Thread via GitHub
aokolnychyi commented on code in PR #10037: URL: https://github.com/apache/iceberg/pull/10037#discussion_r1566291691 ## data/src/main/java/org/apache/iceberg/data/MigrationService.java: ## @@ -0,0 +1,39 @@

Re: [PR] Spark 3.5: Parallelize reading files in snapshot and migrate procedures [iceberg]

2024-04-15 Thread via GitHub
aokolnychyi commented on code in PR #10037: URL: https://github.com/apache/iceberg/pull/10037#discussion_r1566289821 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/procedures/BaseProcedure.java: ## @@ -237,4 +237,11 @@ protected ExecutorService executorService(int t
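The `BaseProcedure` hunk above adds code next to `executorService(int t…`, whose full signature is cut off in this preview. As a JDK-only sketch of what a procedure-level helper taking a thread count might do, the block below builds a fixed-size pool of named daemon threads. `ProcedureExecutors.newDaemonPool` is a hypothetical name; this is not Iceberg's implementation.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper; not Iceberg's BaseProcedure implementation. */
final class ProcedureExecutors {
  private ProcedureExecutors() {}

  /** Creates a fixed-size pool whose threads are daemons named {@code namePrefix-N}. */
  static ExecutorService newDaemonPool(int threadPoolSize, String namePrefix) {
    if (threadPoolSize <= 0) {
      throw new IllegalArgumentException("threadPoolSize must be positive: " + threadPoolSize);
    }
    AtomicInteger counter = new AtomicInteger();
    ThreadFactory factory =
        runnable -> {
          Thread thread = new Thread(runnable, namePrefix + "-" + counter.getAndIncrement());
          // Daemon threads do not keep the JVM alive if the pool is never shut down.
          thread.setDaemon(true);
          return thread;
        };
    return Executors.newFixedThreadPool(threadPoolSize, factory);
  }
}
```

Named threads make the procedure's workers easy to spot in thread dumps; the caller is still expected to call `shutdown()` once the procedure finishes.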

Re: [PR] Spark 3.5: Parallelize reading files in snapshot and migrate procedures [iceberg]

2024-04-10 Thread via GitHub
RussellSpitzer commented on code in PR #10037: URL: https://github.com/apache/iceberg/pull/10037#discussion_r1559706583 ## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestMigrateTableProcedure.java: ## @@ -232,4 +237,42 @@ public void testMigrat

Re: [PR] Spark 3.5: Parallelize reading files in snapshot and migrate procedures [iceberg]

2024-04-10 Thread via GitHub
manuzhang commented on code in PR #10037: URL: https://github.com/apache/iceberg/pull/10037#discussion_r1559705114 ## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestMigrateTableProcedure.java: ## @@ -232,4 +237,42 @@ public void testMigrateEmpt

Re: [PR] Spark 3.5: Parallelize reading files in snapshot and migrate procedures [iceberg]

2024-04-10 Thread via GitHub
nastra commented on code in PR #10037: URL: https://github.com/apache/iceberg/pull/10037#discussion_r1559655890 ## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestMigrateTableProcedure.java: ## @@ -232,4 +237,42 @@ public void testMigrateEmptyTa

Re: [PR] Spark 3.5: Parallelize reading files in snapshot and migrate procedures [iceberg]

2024-04-10 Thread via GitHub
nastra commented on code in PR #10037: URL: https://github.com/apache/iceberg/pull/10037#discussion_r1559638434 ## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/ProcedureUtil.java: ## @@ -51,4 +56,29 @@ static String statsFileLocation(String table

Re: [PR] Spark 3.5: Parallelize reading files in snapshot and migrate procedures [iceberg]

2024-04-02 Thread via GitHub
manuzhang commented on code in PR #10037: URL: https://github.com/apache/iceberg/pull/10037#discussion_r1547578684 ## data/src/main/java/org/apache/iceberg/data/TableMigrationUtil.java: ## @@ -171,6 +176,10 @@ public static List listPartition( } } + public static bool

Re: [PR] Spark 3.5: Parallelize reading files in snapshot and migrate procedures [iceberg]

2024-04-02 Thread via GitHub
nastra commented on code in PR #10037: URL: https://github.com/apache/iceberg/pull/10037#discussion_r1547529529 ## data/src/main/java/org/apache/iceberg/data/TableMigrationUtil.java: ## @@ -171,6 +176,10 @@ public static List listPartition( } } + public static boolean
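The `TableMigrationUtil` helpers discussed here are where the per-file work of a snapshot or migrate happens, but the thread only shows truncated previews of the changed code. Assuming each file can be read independently, the sketch below shows one common way to parallelize such a loop with an `ExecutorService`: submit one task per file, then collect the futures in input order. `ParallelFileReader.readAll` is a hypothetical helper, and the null-executor fallback to sequential reading is a design choice of this sketch, not something the PR confirms.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.function.Function;

/** Hypothetical helper; not Iceberg's TableMigrationUtil. */
final class ParallelFileReader {
  private ParallelFileReader() {}

  /**
   * Applies {@code reader} to every file and returns the results in input order. A null
   * {@code service} falls back to reading the files one by one on the caller's thread.
   */
  static <R> List<R> readAll(
      List<String> files, Function<String, R> reader, ExecutorService service) {
    List<R> results = new ArrayList<>(files.size());
    if (service == null) {
      for (String file : files) {
        results.add(reader.apply(file));
      }
      return results;
    }

    // Submit one task per file; the futures list preserves the original file order.
    List<Future<R>> futures = new ArrayList<>(files.size());
    for (String file : files) {
      futures.add(service.submit(() -> reader.apply(file)));
    }

    try {
      for (Future<R> future : futures) {
        results.add(future.get());
      }
    } catch (InterruptedException e) {
      futures.forEach(future -> future.cancel(true));
      Thread.currentThread().interrupt();
      throw new RuntimeException("Interrupted while reading files", e);
    } catch (ExecutionException e) {
      // Stop outstanding reads once any single file fails.
      futures.forEach(future -> future.cancel(true));
      throw new RuntimeException("Failed to read a file", e.getCause());
    }
    return results;
  }
}
```

Keeping results in input order means the resulting list of data files does not depend on which thread finished first.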

Re: [PR] Spark 3.5: Parallelize reading files in snapshot and migrate procedures [iceberg]

2024-04-02 Thread via GitHub
manuzhang commented on code in PR #10037: URL: https://github.com/apache/iceberg/pull/10037#discussion_r1547447584 ## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestMigrateTableProcedure.java: ## @@ -232,4 +232,25 @@ public void testMigrateEmpt

Re: [PR] Spark 3.5: Parallelize reading files in snapshot and migrate procedures [iceberg]

2024-04-02 Thread via GitHub
nastra commented on code in PR #10037: URL: https://github.com/apache/iceberg/pull/10037#discussion_r1547402202 ## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestMigrateTableProcedure.java: ## @@ -232,4 +232,25 @@ public void testMigrateEmptyTa

Re: [PR] Spark 3.5: Parallelize reading files in snapshot and migrate procedures [iceberg]

2024-04-01 Thread via GitHub
manuzhang commented on code in PR #10037: URL: https://github.com/apache/iceberg/pull/10037#discussion_r1547081267 ## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestMigrateTableProcedure.java: ## @@ -232,4 +232,25 @@ public void testMigrateEmpt

Re: [PR] Spark 3.5: Parallelize reading files in snapshot and migrate procedures [iceberg]

2024-04-01 Thread via GitHub
RussellSpitzer commented on code in PR #10037: URL: https://github.com/apache/iceberg/pull/10037#discussion_r1546825628 ## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestMigrateTableProcedure.java: ## @@ -232,4 +232,25 @@ public void testMigrat

Re: [PR] Spark 3.5: Parallelize reading files in snapshot and migrate procedures [iceberg]

2024-04-01 Thread via GitHub
RussellSpitzer commented on code in PR #10037: URL: https://github.com/apache/iceberg/pull/10037#discussion_r1546814116 ## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestMigrateTableProcedure.java: ## @@ -232,4 +232,25 @@ public void testMigrat

Re: [PR] Spark 3.5: Parallelize reading files in snapshot and migrate procedures [iceberg]

2024-03-28 Thread via GitHub
manuzhang commented on code in PR #10037: URL: https://github.com/apache/iceberg/pull/10037#discussion_r1542702819 ## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestMigrateTableProcedure.java: ## @@ -232,4 +232,25 @@ public void testMigrateEmpt

Re: [PR] Spark 3.5: Parallelize reading files in snapshot and migrate procedures [iceberg]

2024-03-28 Thread via GitHub
nastra commented on code in PR #10037: URL: https://github.com/apache/iceberg/pull/10037#discussion_r1542559775 ## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestMigrateTableProcedure.java: ## @@ -232,4 +232,25 @@ public void testMigrateEmptyTa

Re: [PR] Spark 3.5: Parallelize reading files in snapshot and migrate procedures [iceberg]

2024-03-27 Thread via GitHub
manuzhang commented on code in PR #10037: URL: https://github.com/apache/iceberg/pull/10037#discussion_r1542198454 ## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestMigrateTableProcedure.java: ## @@ -232,4 +232,25 @@ public void testMigrateEmpt

Re: [PR] Spark 3.5: Parallelize reading files in snapshot and migrate procedures [iceberg]

2024-03-27 Thread via GitHub
nastra commented on code in PR #10037: URL: https://github.com/apache/iceberg/pull/10037#discussion_r1541275556 ## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestMigrateTableProcedure.java: ## @@ -232,4 +232,25 @@ public void testMigrateEmptyTa

Re: [PR] Spark 3.5: Parallelize reading files in snapshot and migrate procedures [iceberg]

2024-03-27 Thread via GitHub
manuzhang commented on code in PR #10037: URL: https://github.com/apache/iceberg/pull/10037#discussion_r1541039378 ## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestMigrateTableProcedure.java: ## @@ -232,4 +232,25 @@ public void testMigrateEmpt

Re: [PR] Spark 3.5: Parallelize reading files in snapshot and migrate procedures [iceberg]

2024-03-27 Thread via GitHub
nastra commented on code in PR #10037: URL: https://github.com/apache/iceberg/pull/10037#discussion_r1540956163 ## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestMigrateTableProcedure.java: ## @@ -232,4 +232,25 @@ public void testMigrateEmptyTa

Re: [PR] Spark 3.5: Parallelize reading files in snapshot and migrate procedures [iceberg]

2024-03-27 Thread via GitHub
manuzhang commented on code in PR #10037: URL: https://github.com/apache/iceberg/pull/10037#discussion_r1540660926 ## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestMigrateTableProcedure.java: ## @@ -232,4 +232,25 @@ public void testMigrateEmpt

Re: [PR] Spark 3.5: Parallelize reading files in snapshot and migrate procedures [iceberg]

2024-03-27 Thread via GitHub
nastra commented on code in PR #10037: URL: https://github.com/apache/iceberg/pull/10037#discussion_r1540633541 ## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestMigrateTableProcedure.java: ## @@ -232,4 +232,25 @@ public void testMigrateEmptyTa

Re: [PR] Spark 3.5: Parallelize reading files in snapshot and migrate procedures [iceberg]

2024-03-26 Thread via GitHub
nastra commented on code in PR #10037: URL: https://github.com/apache/iceberg/pull/10037#discussion_r1538856714 ## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestMigrateTableProcedure.java: ## @@ -232,4 +232,20 @@ public void testMigrateEmptyTa

Re: [PR] Spark 3.5: Parallelize reading files in snapshot and migrate procedures [iceberg]

2024-03-26 Thread via GitHub
nastra commented on code in PR #10037: URL: https://github.com/apache/iceberg/pull/10037#discussion_r1538854414 ## docs/docs/spark-procedures.md: ## @@ -588,6 +589,8 @@ By default, the original table is retained with the name `table_BACKUP_`. | `properties` | ️ | map | Prop

Re: [PR] Spark 3.5: Parallelize reading files in snapshot and migrate procedures [iceberg]

2024-03-26 Thread via GitHub
nastra commented on code in PR #10037: URL: https://github.com/apache/iceberg/pull/10037#discussion_r1538851936 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/MigrateTableSparkAction.java: ## @@ -108,6 +109,12 @@ public MigrateTableSparkAction backupTableName

Re: [PR] Spark 3.5: Parallelize reading files in snapshot and migrate procedures [iceberg]

2024-03-26 Thread via GitHub
nastra commented on code in PR #10037: URL: https://github.com/apache/iceberg/pull/10037#discussion_r1538851136 ## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestSnapshotTableProcedure.java: ## @@ -223,4 +223,31 @@ public void testInvalidSnapsh

Re: [PR] Spark 3.5: Parallelize reading files in snapshot and migrate procedures [iceberg]

2024-03-26 Thread via GitHub
nastra commented on code in PR #10037: URL: https://github.com/apache/iceberg/pull/10037#discussion_r1538853534 ## api/src/main/java/org/apache/iceberg/actions/MigrateTable.java: ## @@ -60,6 +60,16 @@ default MigrateTable backupTableName(String tableName) { throw new Unsupp
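The `MigrateTable.java` hunk adds lines right after the `backupTableName` default method, whose body is cut off in the preview at `throw new Unsupp`. Assuming that is an `UnsupportedOperationException`, the sketch below shows the general default-method pattern: an optional capability is added to an interface with a throwing default, so existing implementations keep compiling and only engines that support it override the method. `MigrateActionSketch`, `withExecutorService`, and both implementing classes are hypothetical and are not the lines added in the PR.

```java
import java.util.concurrent.ExecutorService;

/** Hypothetical, trimmed-down action interface; not Iceberg's MigrateTable. */
interface MigrateActionSketch {
  MigrateActionSketch tableProperty(String name, String value);

  /** Optional capability: implementations that cannot read files in parallel keep this default. */
  default MigrateActionSketch withExecutorService(ExecutorService service) {
    throw new UnsupportedOperationException(
        getClass().getName() + " does not implement withExecutorService");
  }
}

/** Keeps the default, so asking for parallel reads fails fast. */
final class SequentialMigrateAction implements MigrateActionSketch {
  @Override
  public MigrateActionSketch tableProperty(String name, String value) {
    return this; // properties are ignored in this sketch
  }
}

/** Overrides the default and remembers the executor for later file reading. */
final class ParallelMigrateAction implements MigrateActionSketch {
  private ExecutorService service;

  @Override
  public MigrateActionSketch tableProperty(String name, String value) {
    return this; // properties are ignored in this sketch
  }

  @Override
  public MigrateActionSketch withExecutorService(ExecutorService service) {
    this.service = service;
    return this;
  }

  ExecutorService service() {
    return service;
  }
}
```

Returning `this` keeps the builder-style chaining used by the other action setters.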

Re: [PR] Spark 3.5: Parallelize reading files in snapshot and migrate procedures [iceberg]

2024-03-26 Thread via GitHub
nastra commented on code in PR #10037: URL: https://github.com/apache/iceberg/pull/10037#discussion_r1538852499 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/SnapshotTableSparkAction.java: ## @@ -98,6 +99,12 @@ public SnapshotTableSparkAction tableProperty(S