[GitHub] [hudi] ssdong commented on issue #2707: [SUPPORT] insert_ovewrite_table failed on archiving
ssdong commented on issue #2707: URL: https://github.com/apache/hudi/issues/2707#issuecomment-815156685 Hi @satishkotha @jsbali! I've created the pull request for this issue. I observed a few more things along the way, tried my best to clarify them, and hopefully wrote a detailed enough description in the PR. Let me know. Thanks!
[GitHub] [hudi] ssdong commented on issue #2707: [SUPPORT] insert_ovewrite_table failed on archiving
ssdong commented on issue #2707: URL: https://github.com/apache/hudi/issues/2707#issuecomment-813959142 The original issue brought up by @zherenyu831 has a bigger impact than we had imagined. The simple fix of omitting the empty `partitionToReplaceFileIds` works on release `0.7.0`, but it no longer works on the latest master, i.e. `0.9.0-SNAPSHOT`. Ticket [HUDI-1740](https://issues.apache.org/jira/browse/HUDI-1740) has been updated to reflect the newest findings, and we believe it has something to do with the new clustering feature.
[GitHub] [hudi] ssdong commented on issue #2707: [SUPPORT] insert_ovewrite_table failed on archiving
ssdong commented on issue #2707: URL: https://github.com/apache/hudi/issues/2707#issuecomment-813825461 @nsivabalan @bvaradar Sorry for the bother. Could you kindly add me as a contributor so I can self-assign issues? I've set up my local dev env successfully and would like to move on from there. I sent an email to the dev mailing list but haven't gotten a reply yet. Thanks!
[GitHub] [hudi] ssdong commented on issue #2707: [SUPPORT] insert_ovewrite_table failed on archiving
ssdong commented on issue #2707: URL: https://github.com/apache/hudi/issues/2707#issuecomment-813353690 @jsbali Thanks! Can someone grant me permission to self-assign tickets in JIRA? I've emailed `d...@hudi.apache.org` but haven't been approved yet. It requires action from a PMC member...
[GitHub] [hudi] ssdong commented on issue #2707: [SUPPORT] insert_ovewrite_table failed on archiving
ssdong commented on issue #2707: URL: https://github.com/apache/hudi/issues/2707#issuecomment-812964282 btw, @jsbali, are you working on both of these tickets? https://issues.apache.org/jira/browse/HUDI-1739 https://issues.apache.org/jira/browse/HUDI-1740 Can I pick one up? I would be happy to contribute.
[GitHub] [hudi] ssdong commented on issue #2707: [SUPPORT] insert_ovewrite_table failed on archiving
ssdong commented on issue #2707: URL: https://github.com/apache/hudi/issues/2707#issuecomment-812856042

@satishkotha Thanks for the tips! In fact, I was also able to reproduce the `insert_overwrite_table` issue locally on my machine, the same one @jsbali raised tickets https://issues.apache.org/jira/browse/HUDI-1739 and https://issues.apache.org/jira/browse/HUDI-1740 against. For tracking, and to help you test the fix in the future, I am pasting the script I used to reproduce the issue here:

```
import org.apache.hudi.QuickstartUtils._
import scala.collection.JavaConversions._
import org.apache.spark.sql.SaveMode._
import org.apache.hudi.DataSourceReadOptions._
import org.apache.hudi.DataSourceWriteOptions._
import org.apache.hudi.config.HoodieWriteConfig._
import java.util.UUID
import java.sql.Timestamp

val tableName = "hudi_date_mor"
val basePath = "" // fill in: absolute path to a local folder

val writeConfigs = Map(
  "hoodie.cleaner.incremental.mode" -> "true",
  "hoodie.insert.shuffle.parallelism" -> "20",
  "hoodie.upsert.shuffle.parallelism" -> "2",
  "hoodie.clean.automatic" -> "false",
  "hoodie.datasource.write.operation" -> "insert_overwrite_table",
  "hoodie.table.name" -> tableName,
  "hoodie.datasource.write.table.type" -> "MERGE_ON_READ",
  "hoodie.cleaner.policy" -> "KEEP_LATEST_FILE_VERSIONS",
  "hoodie.keep.max.commits" -> "3",
  "hoodie.cleaner.commits.retained" -> "1",
  "hoodie.keep.min.commits" -> "2",
  "hoodie.compact.inline.max.delta.commits" -> "1"
)

val dateSMap: Map[Int, String] = Map(
  0 -> "2020-07-01",
  1 -> "2020-08-01",
  2 -> "2020-09-01"
)

val dateMap: Map[Int, Timestamp] = Map(
  0 -> Timestamp.valueOf("2010-07-01 11:00:15"),
  1 -> Timestamp.valueOf("2010-08-01 11:00:15"),
  2 -> Timestamp.valueOf("2010-09-01 11:00:15")
)

var seq = Seq(
  (0, "value", dateMap(0), dateSMap(0), UUID.randomUUID.toString)
)
for (i <- 501 to 1000) {
  seq :+= ((i, "value", dateMap(i % 3), dateSMap(i % 3), UUID.randomUUID.toString))
}

// in spark-shell, spark.implicits._ is already in scope for toDF
val df = seq.toDF("id", "string_column", "timestamp_column", "date_string", "uuid")
```

Run the spark-shell (the command is the one from the Hudi quick start page; I am using Spark `spark-3.0.1-bin-hadoop2.7`):

```
./spark-shell --packages org.apache.hudi:hudi-spark-bundle_2.12:0.7.0,org.apache.spark:spark-avro_2.12:3.0.1 --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
```

Copy the script above into the shell and run `df.write.format("hudi").options(writeConfigs).mode(Overwrite).save(basePath)` four times; on the fifth run, it throws the anticipated `Caused by: java.lang.IllegalArgumentException: Positive number of partitions required`.

Now, the thing is, we _still_ have to _manually_ delete the first commit file, the one containing the empty `partitionToReplaceFileIds`; otherwise it keeps throwing the `Positive number of partitions required` error.

Setting `"hoodie.embed.timeline.server" -> "false"` _does_ help, as it forces the write to refresh its timeline, so we no longer see the second error, `java.io.FileNotFoundException: /.hoodie/20210403201659.replacecommit does not exist`. However, `"hoodie.embed.timeline.server" -> "false"` appears to be not _quite_ necessary, since on the _6th_ write the writer is automatically refreshed with the _newest_ timeline, which puts all `*replacecommit` files back into a consistent state.
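As a side note on where that exact message comes from: Spark's `ParallelCollectionRDD` throws `java.lang.IllegalArgumentException: Positive number of partitions required` whenever an RDD is built with fewer than one slice. My assumption (not a quote of Hudi's code) is that some code path derives its parallelism from the number of replaced partitions, which is zero for the empty `partitionToReplaceFileIds`. A minimal sketch you can paste into the same spark-shell:

```
// Assumption: the parallelism is derived from the replaced-partitions count,
// which is 0 for the empty replacecommit. Spark then fails lazily, on the
// first action, with:
//   java.lang.IllegalArgumentException: Positive number of partitions required
val replacedPartitions = Seq.empty[String] // stands in for the empty partitionToReplaceFileIds
val rdd = spark.sparkContext.parallelize(replacedPartitions, replacedPartitions.size)
rdd.collect() // <- throws here
```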
If we fix the empty `partitionToReplaceFileIds` issue, we might not need to dig into the `replacecommit does not exist` issue at all, since that one is only triggered by the workaround of _manually_ deleting the empty commit file; fixing the root cause would fix everything from the start (see the sketch at the end of this comment). However, I would still be curious to learn _why_ we need a `reset` of the timeline server within the `close` action on the `HoodieTableFileSystemView`. It looks unnecessary to me and could be removed if there is no strong reason behind it. After a bit of digging in that code, it turns out the `reset` within `close` was originally introduced in #600. I hope that helps you narrow down the scope a little bit. Maybe @bvaradar could explain it if the memory is still fresh, since that PR is about two years old by now. Thanks.
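For completeness, here is the hypothetical guard mentioned above. All names are made up for illustration and this is not the actual patch; it only shows the shape of the fix, i.e. treating an empty `partitionToReplaceFileIds` as a no-op instead of feeding a zero partition count to Spark:

```
import java.util.Collections
import scala.collection.JavaConverters._

// Hypothetical guard (names invented): skip replacecommit metadata whose
// partitionToReplaceFileIds is empty, so downstream parallelism can never be 0.
val partitionToReplaceFileIds: java.util.Map[String, java.util.List[String]] =
  Collections.emptyMap[String, java.util.List[String]]() // the problematic first replacecommit
val replaced = partitionToReplaceFileIds.asScala
if (replaced.nonEmpty) {
  spark.sparkContext.parallelize(replaced.keys.toSeq, replaced.size).collect()
}
// with the guard in place, the empty map becomes a no-op rather than an
// IllegalArgumentException during archiving
```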
[GitHub] [hudi] ssdong commented on issue #2707: [SUPPORT] insert_ovewrite_table failed on archiving
ssdong commented on issue #2707: URL: https://github.com/apache/hudi/issues/2707#issuecomment-811619487

@jsbali To give some extra insight and detail, here is what @zherenyu831 posted in the beginning:

```
[20210323080718__replacecommit__COMPLETED]: size : 0
[20210323081449__replacecommit__COMPLETED]: size : 1
[20210323082046__replacecommit__COMPLETED]: size : 1
[20210323082758__replacecommit__COMPLETED]: size : 1
[20210323084004__replacecommit__COMPLETED]: size : 1
[20210323085044__replacecommit__COMPLETED]: size : 1
[20210323085823__replacecommit__COMPLETED]: size : 1
[20210323090550__replacecommit__COMPLETED]: size : 1
[20210323091700__replacecommit__COMPLETED]: size : 1
```

If we keep everything the same and let the archive logic handle everything, it fails on the empty `partitionToReplaceFileIds` of `20210323080718__replacecommit__COMPLETED` (the first item in the list above, with size 0); this is the known issue. To make the archive work, we tried to _manually_ delete that first _empty_ commit file. This let the archive succeed, but it then failed with `User class threw exception: org.apache.hudi.exception.HoodieIOException: Could not read commit details from s3://xxx/data/.hoodie/20210323081449.replacecommit` (the second item in the list above).

Now, to reason through the underlying mechanism of this error: since the archive was successful, a few commit files have been moved into the `.archive` folder. Let's say

```
[20210323081449__replacecommit__COMPLETED]: size : 1
[20210323082046__replacecommit__COMPLETED]: size : 1
[20210323082758__replacecommit__COMPLETED]: size : 1
[20210323084004__replacecommit__COMPLETED]: size : 1
[20210323085044__replacecommit__COMPLETED]: size : 1
```

have been successfully moved and placed in `.archive`.
At this moment, the timeline has been updated and there are 3 remaining commit files:

```
[20210323085823__replacecommit__COMPLETED]: size : 1
[20210323090550__replacecommit__COMPLETED]: size : 1
[20210323091700__replacecommit__COMPLETED]: size : 1
```

Now, pay attention to the stack trace behind `User class threw exception: org.apache.hudi.exception.HoodieIOException: Could not read commit details from s3://xxx/data/.hoodie/20210323081449.replacecommit`, which I am pasting again:

```
User class threw exception: org.apache.hudi.exception.HoodieIOException: Could not read commit details from s3://xxx/data/.hoodie/20210323081449.replacecommit
    at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.readDataFromPath(HoodieActiveTimeline.java:530)
    at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.getInstantDetails(HoodieActiveTimeline.java:194)
    at org.apache.hudi.common.table.view.AbstractTableFileSystemView.lambda$resetFileGroupsReplaced$8(AbstractTableFileSystemView.java:217)
    at java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:269)
    at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
    at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:566)
    at org.apache.hudi.common.table.view.AbstractTableFileSystemView.resetFileGroupsReplaced(AbstractTableFileSystemView.java:228)
    at org.apache.hudi.common.table.view.AbstractTableFileSystemView.init(AbstractTableFileSystemView.java:106)
    at org.apache.hudi.common.table.view.HoodieTableFileSystemView.init(HoodieTableFileSystemView.java:106)
    at org.apache.hudi.common.table.view.AbstractTableFileSystemView.reset(AbstractTableFileSystemView.java:248)
    at org.apache.hudi.common.table.view.HoodieTableFileSystemView.close(HoodieTableFileSystemView.java:353)
    at java.util.concurrent.ConcurrentHashMap$ValuesView.forEach(ConcurrentHashMap.java:4707)
    at org.apache.hudi.common.table.view.FileSystemViewManager.close(FileSystemViewManager.java:118)
    at org.apache.hudi.timeline.service.TimelineService.close(TimelineService.java:179)
    at org.apache.hudi.client.embedded.EmbeddedTimelineService.stop(EmbeddedTimelineService.java:112)
```

After a `close` action is triggered on `TimelineService`, which is understandable, it propagates to `HoodieTableFileSystemView.close`, and there is:

```
    at org.apache.hudi.common.table.view.AbstractTableFileSystemView.init(AbstractTableFileSystemView.java:106)
    at org.apache.hudi.common.table.view.HoodieTableFileSystemView.init(HoodieTableFileSystemView.java:106)
    at
```
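To make the surprising part of that chain concrete, here is a tiny self-contained paraphrase of the pattern the trace shows; the class and method names below are invented for illustration and this is not the actual Hudi source. `close` triggers `reset`, `reset` re-runs `init`, and `init` re-reads every replacecommit from storage, so a file that was archived (or manually deleted) in the meantime makes the close itself blow up:

```
import java.nio.file.{Files, Paths}

// Toy paraphrase of the call chain in the stack trace above (names invented).
class ToyFileSystemView(replaceCommitPaths: Seq[String]) {
  // mirrors init -> resetFileGroupsReplaced -> getInstantDetails -> readDataFromPath
  private def init(): Unit =
    replaceCommitPaths.foreach(p => Files.readAllBytes(Paths.get(p)))

  // mirrors AbstractTableFileSystemView.reset, which re-runs init
  def reset(): Unit = init()

  // mirrors HoodieTableFileSystemView.close: closing triggers a reset, i.e. a
  // full re-read of all commit files. If one was archived away, this throws
  // (NoSuchFileException here; HoodieIOException/FileNotFoundException in Hudi).
  def close(): Unit = reset()
}
```

This is why a `close` that happens after archiving can fail on a commit file that no longer exists, even though the archive step itself succeeded.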