[GitHub] [hudi] ssdong commented on issue #2707: [SUPPORT] insert_ovewrite_table failed on archiving

2021-04-07 Thread GitBox


ssdong commented on issue #2707:
URL: https://github.com/apache/hudi/issues/2707#issuecomment-815156685


   Hi @satishkotha @jsbali! I've created the pull request for this issue. I observed a few more things along the way and have tried my best to clarify them; hopefully the PR description is detailed enough. Let me know. Thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] ssdong commented on issue #2707: [SUPPORT] insert_ovewrite_table failed on archiving

2021-04-06 Thread GitBox


ssdong commented on issue #2707:
URL: https://github.com/apache/hudi/issues/2707#issuecomment-813959142


   The original issue brought up by @zherenyu831 has a bigger impact than we had imagined. The simple fix of omitting the empty `partitionToReplaceFileIds` works on release `0.7.0`, but it no longer works on the latest master, i.e. `0.9.0-SNAPSHOT`. Ticket [HUDI-1740](https://issues.apache.org/jira/browse/HUDI-1740) has been updated to reflect the newest findings, and we believe they have something to do with the new clustering feature.






[GitHub] [hudi] ssdong commented on issue #2707: [SUPPORT] insert_ovewrite_table failed on archiving

2021-04-05 Thread GitBox


ssdong commented on issue #2707:
URL: https://github.com/apache/hudi/issues/2707#issuecomment-813825461


   @nsivabalan @bvaradar Sorry to bother you. Could you kindly add me as a contributor so I can self-assign issues? I've set up my local dev environment successfully and would like to move on from there. I sent an email to the dev mailing list but haven't gotten a reply yet. Thanks!






[GitHub] [hudi] ssdong commented on issue #2707: [SUPPORT] insert_ovewrite_table failed on archiving

2021-04-05 Thread GitBox


ssdong commented on issue #2707:
URL: https://github.com/apache/hudi/issues/2707#issuecomment-813353690


   @jsbali Thanks!
   Can someone grant me permission to self-assign tickets in JIRA?
   I've emailed `d...@hudi.apache.org` but haven't been approved yet. It requires action from a PMC member...






[GitHub] [hudi] ssdong commented on issue #2707: [SUPPORT] insert_ovewrite_table failed on archiving

2021-04-03 Thread GitBox


ssdong commented on issue #2707:
URL: https://github.com/apache/hudi/issues/2707#issuecomment-812964282


   By the way, @jsbali, are you working on both of these tickets?
   https://issues.apache.org/jira/browse/HUDI-1739
   https://issues.apache.org/jira/browse/HUDI-1740
   Could I pick one up? I would be happy to contribute.






[GitHub] [hudi] ssdong commented on issue #2707: [SUPPORT] insert_ovewrite_table failed on archiving

2021-04-03 Thread GitBox


ssdong commented on issue #2707:
URL: https://github.com/apache/hudi/issues/2707#issuecomment-812856042


   @satishkotha Thanks for the tips! In fact, I was also able to reproduce the `insert_overwrite_table` issue locally on my machine, the one @jsbali has raised tickets https://issues.apache.org/jira/browse/HUDI-1739 and https://issues.apache.org/jira/browse/HUDI-1740 against.
   
   For tracking, and maybe to help you test the fix in the future, I am pasting the script I used to reproduce the issue here.
   ```
   import org.apache.hudi.QuickstartUtils._
   import scala.collection.JavaConversions._
   import org.apache.spark.sql.SaveMode._
   import org.apache.hudi.DataSourceReadOptions._
   import org.apache.hudi.DataSourceWriteOptions._
   import org.apache.hudi.config.HoodieWriteConfig._
   import java.util.UUID
   import java.sql.Timestamp

   val tableName = "hudi_date_mor"
   // Fill out this value to point to your local folder (as an absolute path)
   val basePath = ""

   val writeConfigs = Map(
     "hoodie.cleaner.incremental.mode" -> "true",
     "hoodie.insert.shuffle.parallelism" -> "20",
     "hoodie.upsert.shuffle.parallelism" -> "2",
     "hoodie.clean.automatic" -> "false",
     "hoodie.datasource.write.operation" -> "insert_overwrite_table",
     "hoodie.table.name" -> tableName,
     "hoodie.datasource.write.table.type" -> "MERGE_ON_READ",
     "hoodie.cleaner.policy" -> "KEEP_LATEST_FILE_VERSIONS",
     "hoodie.keep.max.commits" -> "3",
     "hoodie.cleaner.commits.retained" -> "1",
     "hoodie.keep.min.commits" -> "2",
     "hoodie.compact.inline.max.delta.commits" -> "1"
   )

   val dateSMap: Map[Int, String] = Map(
     0 -> "2020-07-01",
     1 -> "2020-08-01",
     2 -> "2020-09-01"
   )
   val dateMap: Map[Int, Timestamp] = Map(
     0 -> Timestamp.valueOf("2010-07-01 11:00:15"),
     1 -> Timestamp.valueOf("2010-08-01 11:00:15"),
     2 -> Timestamp.valueOf("2010-09-01 11:00:15")
   )

   var seq = Seq(
     (0, "value", dateMap(0), dateSMap(0), UUID.randomUUID.toString)
   )
   for (i <- 501 to 1000) {
     seq :+= ((i, "value", dateMap(i % 3), dateSMap(i % 3), UUID.randomUUID.toString))
   }
   val df = seq.toDF("id", "string_column", "timestamp_column", "date_string", "uuid")
   ```
   
   Run the spark shell (the command is taken from the Hudi quick start page; I am using Spark `spark-3.0.1-bin-hadoop2.7`):
   ```
   ./spark-shell \
     --packages org.apache.hudi:hudi-spark-bundle_2.12:0.7.0,org.apache.spark:spark-avro_2.12:3.0.1 \
     --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
   ```
   Copy the above script into the shell and run `df.write.format("hudi").options(writeConfigs).mode(Overwrite).save(basePath)` 4 times; on the fifth time, it throws the anticipated `Caused by: java.lang.IllegalArgumentException: Positive number of partitions required` error.
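   For convenience, the repeated write can also be scripted; this is a minimal sketch, assuming the `df`, `writeConfigs`, and `basePath` defined in the script above:
   ```
   // Repeat the insert_overwrite_table write; with the archival/cleaner settings
   // above, the failure appears once archiving kicks in (the 5th write here).
   (1 to 5).foreach { _ =>
     df.write.format("hudi").options(writeConfigs).mode(Overwrite).save(basePath)
   }
   ```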
   
   Now, the thing is, we _still_ have to _manually_ delete the first commit file, which contains the empty `partitionToReplaceFileIds`; otherwise, it keeps throwing the `Positive number of partitions required` error.
   Setting `"hoodie.embed.timeline.server" -> "false"` _does_ help, as it forces the write to refresh its timeline, so we would not see the second error again, which is `java.io.FileNotFoundException: /.hoodie/20210403201659.replacecommit does not exist`.
   However, `"hoodie.embed.timeline.server" -> "false"` appears not to be _strictly_ necessary, since on the _6th_ write the writer is automatically refreshed with the _newest_ timeline and all `*replacecommit` files are put back into a consistent state again.
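   For completeness, disabling the embedded timeline server in the repro is just one extra entry in the write config; a minimal sketch, reusing the `writeConfigs` map from the script above (the `writeConfigsNoTls` name is only illustrative):
   ```
   // Same configs plus the embedded timeline server disabled, as discussed above.
   // Per the 6th-write observation, this tweak is not strictly required.
   val writeConfigsNoTls = writeConfigs + ("hoodie.embed.timeline.server" -> "false")
   df.write.format("hudi").options(writeConfigsNoTls).mode(Overwrite).save(basePath)
   ```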
   
   If we fix the empty `partitionToReplaceFileIds` issue, we might not need to dig into the `replacecommit does not exist` issue anymore, since the latter is caused by the workaround of _manually_ deleting the empty commit file; fixing the former would fix everything from the start. However, I would still be curious to learn _why_ we need a `reset` of the timeline server within the `close` action on the `HoodieTableFileSystemView`. It appears unnecessary to me and could be removed if there is no strong reason behind it.
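   To make the question concrete, here is a simplified sketch of the chain I mean, with class and method names taken from the stack trace quoted in my earlier comment below; the bodies are illustrative only and are not Hudi's actual implementation:
   ```
   // Illustrative sketch: the surprising part is that close() goes through reset(),
   // which re-runs init() and therefore re-reads replacecommit details from storage.
   abstract class AbstractTableFileSystemView {
     def init(): Unit = resetFileGroupsReplaced()   // ends up in getInstantDetails -> readDataFromPath
     def reset(): Unit = init()                     // rebuilds the whole view
     def resetFileGroupsReplaced(): Unit            // reads every replacecommit's details
   }

   class HoodieTableFileSystemView extends AbstractTableFileSystemView {
     def resetFileGroupsReplaced(): Unit = ???      // placeholder
     def close(): Unit = reset()                    // closing still touches files that may just have been archived
   }
   ```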
   
   After a bit of digging in that code, it looks like the `reset` within `close` was originally introduced in #600. I hope that helps narrow down the scope a little bit. Maybe @bvaradar could explain it if the memory is still fresh, since that PR is from about 2 years ago. Thanks.
   
   
   






[GitHub] [hudi] ssdong commented on issue #2707: [SUPPORT] insert_ovewrite_table failed on archiving

2021-03-31 Thread GitBox


ssdong commented on issue #2707:
URL: https://github.com/apache/hudi/issues/2707#issuecomment-811619487


   @jsbali To give some extra insight and detail, here is the timeline @zherenyu831 posted at the beginning:
   ```
   [20210323080718__replacecommit__COMPLETED]: size : 0
   [20210323081449__replacecommit__COMPLETED]: size : 1
   [20210323082046__replacecommit__COMPLETED]: size : 1
   [20210323082758__replacecommit__COMPLETED]: size : 1
   [20210323084004__replacecommit__COMPLETED]: size : 1
   [20210323085044__replacecommit__COMPLETED]: size : 1
   [20210323085823__replacecommit__COMPLETED]: size : 1
   [20210323090550__replacecommit__COMPLETED]: size : 1
   [20210323091700__replacecommit__COMPLETED]: size : 1
   ```
   If we keep everything the same and let the archive logic handle everything, it fails on the empty (size 0) `partitionToReplaceFileIds` of `20210323080718__replacecommit__COMPLETED` (the first item in the list above), and this is a known issue.
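   As a side note, the `Positive number of partitions required` message itself is easy to provoke in a plain spark-shell; the snippet below only illustrates the generic Spark failure mode when a partition count is derived from an empty map, not necessarily the exact code path Hudi takes during archiving:
   ```
   // 0 slices (from an empty partitionToReplaceFileIds-like map) is rejected by Spark.
   val partitionToReplaceFileIds = Map.empty[String, Seq[String]]
   spark.sparkContext
     .parallelize(partitionToReplaceFileIds.toSeq, partitionToReplaceFileIds.size)
     .collect()
   // java.lang.IllegalArgumentException: Positive number of partitions required
   ```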
   
   To make the archive work, we tried to _manually_ delete the first _empty_ commit file, which is `20210323080718__replacecommit__COMPLETED` (the first item in the list above). This allowed the archive to succeed, but it then failed with `User class threw exception: org.apache.hudi.exception.HoodieIOException: Could not read commit details from s3://xxx/data/.hoodie/20210323081449.replacecommit` (the second item in the list above).
   
   Now, to reason through the underlying mechanism of this error: given the archive was successful, a few commit files have been placed within the `.archive` folder. Let's say
   ```
   [20210323081449__replacecommit__COMPLETED]: size : 1
   [20210323082046__replacecommit__COMPLETED]: size : 1
   [20210323082758__replacecommit__COMPLETED]: size : 1
   [20210323084004__replacecommit__COMPLETED]: size : 1
   [20210323085044__replacecommit__COMPLETED]: size : 1
   ```
   have been successfully moved into `.archive`. At this moment, the timeline has been updated and there are 3 remaining commit files, which are:
   ```
   [20210323085823__replacecommit__COMPLETED]: size : 1
   [20210323090550__replacecommit__COMPLETED]: size : 1
   [20210323091700__replacecommit__COMPLETED]: size : 1
   ```
   
   Now, pay attention to the stack trace behind `User class threw exception: org.apache.hudi.exception.HoodieIOException: Could not read commit details from s3://xxx/data/.hoodie/20210323081449.replacecommit`, which I am pasting again:
   ```
   User class threw exception: org.apache.hudi.exception.HoodieIOException: Could not read commit details from s3://xxx/data/.hoodie/20210323081449.replacecommit
   at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.readDataFromPath(HoodieActiveTimeline.java:530)
   at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.getInstantDetails(HoodieActiveTimeline.java:194)
   at org.apache.hudi.common.table.view.AbstractTableFileSystemView.lambda$resetFileGroupsReplaced$8(AbstractTableFileSystemView.java:217)
   at java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:269)
   at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
   at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
   at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
   at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
   at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
   at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:566)
   at org.apache.hudi.common.table.view.AbstractTableFileSystemView.resetFileGroupsReplaced(AbstractTableFileSystemView.java:228)
   at org.apache.hudi.common.table.view.AbstractTableFileSystemView.init(AbstractTableFileSystemView.java:106)
   at org.apache.hudi.common.table.view.HoodieTableFileSystemView.init(HoodieTableFileSystemView.java:106)
   at org.apache.hudi.common.table.view.AbstractTableFileSystemView.reset(AbstractTableFileSystemView.java:248)
   at org.apache.hudi.common.table.view.HoodieTableFileSystemView.close(HoodieTableFileSystemView.java:353)
   at java.util.concurrent.ConcurrentHashMap$ValuesView.forEach(ConcurrentHashMap.java:4707)
   at org.apache.hudi.common.table.view.FileSystemViewManager.close(FileSystemViewManager.java:118)
   at org.apache.hudi.timeline.service.TimelineService.close(TimelineService.java:179)
   at org.apache.hudi.client.embedded.EmbeddedTimelineService.stop(EmbeddedTimelineService.java:112)
   ```
   
   After a `close` action is triggered on `TimelineService`, which is understandable, it propagates to `HoodieTableFileSystemView.close` and there is:
   ```
   at org.apache.hudi.common.table.view.AbstractTableFileSystemView.init(AbstractTableFileSystemView.java:106)
   at org.apache.hudi.common.table.view.HoodieTableFileSystemView.init(HoodieTableFileSystemView.java:106)
   at