Yang Yun created HDFS-14627:
-------------------------------

             Summary: Improvements to make slow archive storage work on HDFS
                 Key: HDFS-14627
                 URL: https://issues.apache.org/jira/browse/HDFS-14627
             Project: Hadoop HDFS
          Issue Type: Improvement
         Environment: !data_flow_between_datanode_and_aws_s3.jpg!
            Reporter: Yang Yun
         Attachments: data_flow_between_datanode_and_aws_s3.jpg

In our setup, we mount archival storage from a remote source. The write speed is 
about 20 MB/s, the read speed is about 40 MB/s, and normal file operations, for 
example 'ls', are time consuming.
We added some improvements to make this kind of archive storage work in the 
current HDFS system.

1. Add a multiplier to the read/write timeout if the block is stored on archive 
storage.
2. Save the replica cache file of archive storage to another, faster disk for 
quick DataNode restart; the shutdown hook may not execute if saving the cache 
takes too long.
3. Check the mounted file system before using mounted archive storage.
4. Reduce or avoid calling DF when generating the heartbeat report for archive 
storage.
5. Add an option to skip archive blocks during decommission.
6. Use multiple threads to scan archive storage.
7. Check for archive storage errors with a configurable number of retries.
8. Add an option to disable block scanning on archive storage.
9. Sleep for one heartbeat interval if there are too many differences when 
calling checkAndUpdate in DirectoryScanner.
10. An auto-service to scan the fsimage and set the storage policy of files 
according to policy.
11. An auto-service to call the Mover to move blocks to the right storage.
12. Dedup files on remote storage if the storage is reliable.
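To illustrate item 1, a minimal sketch of how a timeout multiplier for archive 
storage might be applied. The class, method, and the multiplier value are all 
illustrative assumptions, not actual HDFS code or configuration keys:

```java
// Hypothetical sketch for item 1: scale the DataNode read timeout when the
// replica lives on slow (remote) ARCHIVE storage. All names and values here
// are illustrative, not real HDFS APIs or config defaults.
public class ArchiveTimeoutSketch {

    // Assumed base read timeout (similar in spirit to dfs.client.socket-timeout).
    static final long BASE_READ_TIMEOUT_MS = 60_000;

    // Assumed multiplier applied only for blocks on archive storage.
    static final int ARCHIVE_TIMEOUT_MULTIPLIER = 4;

    // Returns the effective timeout: base for normal disks, scaled for archive.
    static long effectiveReadTimeoutMs(boolean onArchiveStorage) {
        return onArchiveStorage
                ? BASE_READ_TIMEOUT_MS * ARCHIVE_TIMEOUT_MULTIPLIER
                : BASE_READ_TIMEOUT_MS;
    }

    public static void main(String[] args) {
        System.out.println(effectiveReadTimeoutMs(false)); // 60000
        System.out.println(effectiveReadTimeoutMs(true));  // 240000
    }
}
```

The same pattern would apply to the write path; the point is that the multiplier 
is applied per-replica based on its storage type rather than raising the global 
timeout for all disks.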



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
