[GitHub] [flink] klion26 commented on issue #8751: [FLINK-11937][StateBackend]Resolve small file problem in RocksDB incremental checkpoint

2020-02-07 Thread GitBox
klion26 commented on issue #8751: [FLINK-11937][StateBackend]Resolve small file 
problem in RocksDB incremental checkpoint
URL: https://github.com/apache/flink/pull/8751#issuecomment-583382052
 
 
   @yumengz5 thanks for the question. This PR does not solve the storage 
amplification problem.
   @streaming-olap thanks for the information and the feedback. Good to know 
that you have introduced this at your company.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] klion26 commented on issue #8751: [FLINK-11937][StateBackend]Resolve small file problem in RocksDB incremental checkpoint

2019-07-28 Thread GitBox
klion26 commented on issue #8751: [FLINK-11937][StateBackend]Resolve small file 
problem in RocksDB incremental checkpoint
URL: https://github.com/apache/flink/pull/8751#issuecomment-515767642
 
 
   @StephanEwen @tzulitai @aljoscha @kl0u Could you please take a look at 
this? Thanks.
   In addition to the earlier comments, I also updated the description of the 
PR, mainly adding these two points:
   - Reuse the same underlying file within one checkpoint of one operator, 
which means we generate just a single file for one checkpoint.
   - `ByteStreamStateHandle` is not supported in the new feature FSCSOS; state 
is always flushed to a file.
   
   The failed test is unrelated; the issue 
[FLINK-9900](https://issues.apache.org/jira/browse/FLINK-9900) tracks it.
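
   To illustrate the first point, here is a minimal sketch of appending 
several state blobs to one file and recording where each one lands. It is not 
the code in this PR: `Segment` and `SharedCheckpointFileWriter` are 
hypothetical names, and an in-memory buffer stands in for the real checkpoint 
file stream.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical: the byte range [start, end) of one state blob inside a shared file. */
final class Segment {
    final String filePath;
    final long start;
    final long end;

    Segment(String filePath, long start, long end) {
        this.filePath = filePath;
        this.start = start;
        this.end = end;
    }

    long length() {
        return end - start;
    }
}

/** Hypothetical writer: every state blob of one checkpoint goes into a single file. */
final class SharedCheckpointFileWriter {
    private final String filePath;
    // Assumption: an in-memory buffer stands in for the file output stream.
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();

    SharedCheckpointFileWriter(String filePath) {
        this.filePath = filePath;
    }

    /** Appends the blob and returns the range it occupies in the shared file. */
    Segment append(byte[] state) {
        long start = out.size();
        out.write(state, 0, state.length);
        return new Segment(filePath, start, out.size());
    }

    byte[] fileContents() {
        return out.toByteArray();
    }
}
```

   Each call to `append` yields a distinct range in the same file, so a 
checkpoint with many small state blobs still produces only one file.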




[GitHub] [flink] klion26 commented on issue #8751: [FLINK-11937][StateBackend]Resolve small file problem in RocksDB incremental checkpoint

2019-07-05 Thread GitBox
klion26 commented on issue #8751: [FLINK-11937][StateBackend]Resolve small file 
problem in RocksDB incremental checkpoint
URL: https://github.com/apache/flink/pull/8751#issuecomment-508657468
 
 
   Travis failed because of a checkstyle problem in another component. Closing 
and reopening to trigger the Travis build again.




[GitHub] [flink] klion26 commented on issue #8751: [FLINK-11937][StateBackend]Resolve small file problem in RocksDB incremental checkpoint

2019-07-02 Thread GitBox
klion26 commented on issue #8751: [FLINK-11937][StateBackend]Resolve small file 
problem in RocksDB incremental checkpoint
URL: https://github.com/apache/flink/pull/8751#issuecomment-507920049
 
 
   @StephanEwen thanks for the comments. I'll try to answer the questions 
below:
   - As a high-level description: this change introduces a new state handle 
(`FsSegmentStateHandle`), modifies the checkpoint metadata (**the layout is 
not modified**; it just adds a type for `FsSegmentStateHandle` in 
`SavepointV2Serializer`), adds some information to `SharedRegistry` to track 
references to the underlying file, and makes other necessary modifications.
   - This is a new option that users need to activate.
   - I think there are no compatibility problems. **For the checkpoint 
metadata we don't change the layout**, we just add a new type for the new 
state handle. When restoring from an existing `FileStateHandle`, we delegate 
to `RocksDBStateDownloader#downloadDataForStateHandle`, which reads both 
`FileStateHandle` and `FsSegmentStateHandle` correctly. For 
`SharedStateRegistry`, all the modifications affect only the newly introduced 
state handle.
   - In my opinion, we can't make this change in the state backend alone. I'll 
try to give the reasons below:
     - First, we need to track the position (start and end) of the current 
state handle within the file, because after applying this change each state 
handle maps to a block of one file.
     - Second, we need to track the reference count of the underlying file so 
that we can delete the file at the right time in the future. Deleting the 
file too early causes `IOException`s, and deleting it too late consumes too 
much disk space.
   
   Please let me know if I need to give more information, sir.
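
   The two reasons above can be sketched roughly as follows. This is only an 
illustration under stated assumptions, not the PR's implementation: 
`SegmentRef` and `FileRefCounter` are hypothetical names, the file is modeled 
as a byte array, and the real bookkeeping would live in the shared state 
registry.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical handle: one block [start, end) of a shared underlying file. */
final class SegmentRef {
    final String filePath;
    final long start;
    final long end;

    SegmentRef(String filePath, long start, long end) {
        this.filePath = filePath;
        this.start = start;
        this.end = end;
    }

    /** Reads only this handle's block; the file is modeled as a byte array. */
    byte[] readFrom(byte[] fileContents) {
        return Arrays.copyOfRange(fileContents, (int) start, (int) end);
    }
}

/** Hypothetical per-file reference counter. */
final class FileRefCounter {
    private final Map<String, Integer> counts = new HashMap<>();

    /** Records one more handle that points into the file. */
    void register(SegmentRef ref) {
        counts.merge(ref.filePath, 1, Integer::sum);
    }

    /** Returns true when the last reference is released and the file may be deleted. */
    boolean release(SegmentRef ref) {
        Integer count = counts.get(ref.filePath);
        if (count == null) {
            throw new IllegalStateException("unregistered file: " + ref.filePath);
        }
        if (count == 1) {
            counts.remove(ref.filePath);
            return true;
        }
        counts.put(ref.filePath, count - 1);
        return false;
    }
}
```

   Deleting the file only when `release` returns true avoids both failure 
modes: no handle is left pointing at a missing file, and the file does not 
outlive its last handle.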

