[jira] [Commented] (HDFS-4412) Support HDFS IO throttling
[ https://issues.apache.org/jira/browse/HDFS-4412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716577#comment-14716577 ]

Yong Zhang commented on HDFS-4412:

Why not try IO throttling in a fire mode, like HADOOP-9640?

Support HDFS IO throttling

Key: HDFS-4412
URL: https://issues.apache.org/jira/browse/HDFS-4412
Project: Hadoop HDFS
Issue Type: New Feature
Reporter: Zhenxiao Luo

When an application uploads files to or downloads files from an HDFS cluster, it would be nice if the IO could be throttled so that it does not exceed a specified maximum bandwidth.

Two options to implement this IO throttling:

#1. IO throttling happens at the FSDataInputStream and FSDataOutputStream level. Add an IO throttler to FSDataInputStream/FSDataOutputStream, and whenever a read/write happens, throttle it first (if a throttler is set), then do the actual read/write. We may need to add new FileSystem APIs that take an IO throttler as an input parameter.

#2. IO throttling happens at the application level. Instead of changing FSDataInputStream/FSDataOutputStream, all IO throttling is done at the application level. In this approach, the FileSystem API remains unchanged.

In either case, an IO throttler interface is needed, which has a method:

public void throttle(long numOfBytes);

The current DataTransferThrottler could be an implementation of this IO throttler interface.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
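For readers following the thread, the interface proposed in the issue description could look roughly like the sketch below. Only the method signature throttle(long numOfBytes) comes from the issue; the interface name IOThrottler is taken from a later comment in this thread, and SimpleBandwidthThrottler is a hypothetical, simplified implementation written for illustration. It is not the existing DataTransferThrottler.

```java
import java.util.concurrent.TimeUnit;

/** Interface with the single method proposed in the issue description. */
interface IOThrottler {
    void throttle(long numOfBytes);
}

/**
 * Hypothetical implementation (an assumption, not Hadoop code): blocks the
 * caller so that the average rate since construction stays at or below
 * bytesPerSecond.
 */
final class SimpleBandwidthThrottler implements IOThrottler {
    private final long bytesPerSecond;
    private final long startNanos = System.nanoTime();
    private long totalBytes;

    SimpleBandwidthThrottler(long bytesPerSecond) {
        if (bytesPerSecond <= 0) {
            throw new IllegalArgumentException("bytesPerSecond must be positive");
        }
        this.bytesPerSecond = bytesPerSecond;
    }

    @Override
    public synchronized void throttle(long numOfBytes) {
        totalBytes += numOfBytes;
        // Time, relative to start, by which totalBytes is allowed to have moved.
        long allowedAtNanos = (long) (totalBytes * 1e9 / bytesPerSecond);
        long sleepNanos = allowedAtNanos - (System.nanoTime() - startNanos);
        if (sleepNanos > 0) {
            try {
                TimeUnit.NANOSECONDS.sleep(sleepNanos);
            } catch (InterruptedException e) {
                // Preserve the interrupt for the caller instead of swallowing it.
                Thread.currentThread().interrupt();
            }
        }
    }
}
```

A caller would invoke throttle(n) once per read or write of n bytes. With a limit of 1000 bytes per second, a first call of throttle(200) blocks for roughly 200 ms.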
[jira] [Commented] (HDFS-4412) Support HDFS IO throttling
[ https://issues.apache.org/jira/browse/HDFS-4412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716580#comment-14716580 ]

Yong Zhang commented on HDFS-4412:

sorry, fair mode
[jira] [Commented] (HDFS-4412) Support HDFS IO throttling
[ https://issues.apache.org/jira/browse/HDFS-4412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13590743#comment-13590743 ]

Denis Petrov commented on HDFS-4412:

It would be nice to throttle on a per-datanode basis, limiting not the bandwidth of the current stream but the IO bandwidth on the bottleneck datanode. If several throttled writes go to the same datanode, the throttling threshold should be adjusted.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
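One way to read the per-datanode suggestion is that a fixed per-datanode budget gets divided among the throttled streams currently targeting that datanode. The sketch below illustrates only that arithmetic. The class DatanodeBandwidthShares, its methods, and the even-split policy are all assumptions made for illustration. It tracks streams inside a single process, whereas a real cluster would need some way to coordinate across clients, which the thread does not specify.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical helper: splits a per-datanode bandwidth budget evenly among
 * the throttled streams currently writing to that datanode.
 */
final class DatanodeBandwidthShares {
    private final long perDatanodeBytesPerSecond;
    private final Map<String, AtomicInteger> activeStreams = new ConcurrentHashMap<>();

    DatanodeBandwidthShares(long perDatanodeBytesPerSecond) {
        this.perDatanodeBytesPerSecond = perDatanodeBytesPerSecond;
    }

    /** Record that a throttled stream started writing to the given datanode. */
    void streamOpened(String datanodeId) {
        activeStreams.computeIfAbsent(datanodeId, k -> new AtomicInteger()).incrementAndGet();
    }

    /** Record that a throttled stream finished; the count never drops below zero. */
    void streamClosed(String datanodeId) {
        AtomicInteger count = activeStreams.get(datanodeId);
        if (count != null) {
            count.updateAndGet(v -> Math.max(0, v - 1));
        }
    }

    /** Current per-stream limit for the datanode: budget / max(1, active streams). */
    long shareFor(String datanodeId) {
        AtomicInteger count = activeStreams.get(datanodeId);
        int active = (count == null) ? 0 : count.get();
        return perDatanodeBytesPerSecond / Math.max(1, active);
    }
}
```

Each stream would re-read shareFor(datanodeId) periodically and adjust its own throttler, so the threshold changes as streams to the same datanode come and go.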
[jira] [Commented] (HDFS-4412) Support HDFS IO throttling
[ https://issues.apache.org/jira/browse/HDFS-4412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13590824#comment-13590824 ]

Colin Patrick McCabe commented on HDFS-4412:

I think you need to decide what problem you're trying to solve. Are you trying to implement cluster-wide QoS (quality of service)? Are you trying to avoid a problem with hot spots in the network? (And if so, have you quantified how big a problem that really is?)
[jira] [Commented] (HDFS-4412) Support HDFS IO throttling
[ https://issues.apache.org/jira/browse/HDFS-4412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13590895#comment-13590895 ]

Denis Petrov commented on HDFS-4412:

I am trying to avoid a problem with hot spots in disk IO, in particular during major compaction of Accumulo tablets (https://issues.apache.org/jira/browse/ACCUMULO-1128). This problem is difficult to solve at the application level, because the actual disk IO can be (and often is) performed on another server, not on the server that runs the Accumulo tablet server doing the compaction. Two or three compactions running on different servers can result in heavy disk IO on the same HDFS datanode, degrading query performance and latency.
[jira] [Commented] (HDFS-4412) Support HDFS IO throttling
[ https://issues.apache.org/jira/browse/HDFS-4412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556461#comment-13556461 ]

Zhenxiao Luo commented on HDFS-4412:

Thanks, Alejandro and Daryn.

I am thinking of providing an IOThrottler interface, which DataTransferThrottler would implement.

An app-side change would be to extend IOUtils.copyBytes with an additional IOThrottler parameter, which does the throttling when users run DFSShell -put or -get.

Cluster-side throttling might need additional APIs in FileSystem open() and create() that also take an IOThrottler parameter and put the IOThrottler in FSDataInputStream/FSDataOutputStream.

As Alejandro said, if we want to enforce throttling for the cluster, we go cluster side; if we only want application throttling, we could go app side. Or maybe, in general, we could support both?

Comments and suggestions are welcome.
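The app-side idea of a copy routine that takes a throttler could be sketched as follows. This is a standalone illustration, not a patch to Hadoop's IOUtils: the class ThrottledCopy, the extra parameter, and the long return value are assumptions, and IOThrottler is declared locally with the signature from the issue description.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Interface with the method signature from the issue description. */
interface IOThrottler {
    void throttle(long numOfBytes);
}

/** Hypothetical stand-in for a copyBytes overload that accepts a throttler. */
final class ThrottledCopy {
    private ThrottledCopy() {
    }

    /**
     * Copies everything from in to out, calling throttler.throttle(n) for each
     * chunk of n bytes before it is written. A null throttler means no limit.
     * Returns the number of bytes copied; neither stream is closed.
     */
    static long copyBytes(InputStream in, OutputStream out, int buffSize,
                          IOThrottler throttler) throws IOException {
        byte[] buf = new byte[buffSize];
        long total = 0;
        int n;
        while ((n = in.read(buf)) >= 0) {
            if (throttler != null) {
                throttler.throttle(n); // throttle first, then do the actual write
            }
            out.write(buf, 0, n);
            total += n;
        }
        return total;
    }
}
```

A shell command such as -put or -get would then pass a throttler built from a user-supplied bandwidth limit, and pass null when no limit is configured.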
[jira] [Commented] (HDFS-4412) Support HDFS IO throttling
[ https://issues.apache.org/jira/browse/HDFS-4412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556481#comment-13556481 ]

Colin Patrick McCabe commented on HDFS-4412:

I think it makes sense to create some kind of throttler class that wraps a generic {{OutputStream}}. That doesn't need to be in HDFS-- in fact, that code should probably go in common.
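A wrapper of the kind suggested here could be built on java.io.FilterOutputStream, as in the sketch below. The class name ThrottledOutputStream and the locally declared IOThrottler interface are assumptions for illustration; nothing here depends on HDFS, which fits the suggestion that such code could live in common.

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Objects;

/** Interface with the method signature from the issue description. */
interface IOThrottler {
    void throttle(long numOfBytes);
}

/** Hypothetical wrapper: throttles every write before delegating it. */
final class ThrottledOutputStream extends FilterOutputStream {
    private final IOThrottler throttler;

    ThrottledOutputStream(OutputStream out, IOThrottler throttler) {
        super(out);
        this.throttler = Objects.requireNonNull(throttler, "throttler");
    }

    @Override
    public void write(int b) throws IOException {
        throttler.throttle(1);
        out.write(b);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        throttler.throttle(len);
        // Delegate the whole range; FilterOutputStream's default would
        // otherwise write one byte at a time.
        out.write(b, off, len);
    }
}
```

Because write(byte[]) in FilterOutputStream forwards to write(byte[], int, int), all three write variants pass through the throttler. An application would wrap whatever stream it got from FileSystem.create() without any change to the FileSystem API.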
[jira] [Commented] (HDFS-4412) Support HDFS IO throttling
[ https://issues.apache.org/jira/browse/HDFS-4412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13555390#comment-13555390 ]

Zhenxiao Luo commented on HDFS-4412:

Any comments are welcome. Which approach is better?
[jira] [Commented] (HDFS-4412) Support HDFS IO throttling
[ https://issues.apache.org/jira/browse/HDFS-4412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13555399#comment-13555399 ]

Alejandro Abdelnur commented on HDFS-4412:

Is the objective to be able to enforce IO throttling for the cluster, or for certain applications to be 'nice'?

If the former, this must be enforced on the cluster side, not on the client side.

If the latter, apps wanting to be 'nice' could wrap the IO streams with throttling-aware ones.
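For the 'nice' application case on the read path, a throttling-aware input wrapper might look like the sketch below. The name ThrottledInputStream as used here and the locally declared IOThrottler interface are assumptions for illustration. The wrapper charges the throttler for the bytes actually returned, so short reads are accounted correctly; skip() is left unthrottled in this sketch.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Objects;

/** Interface with the method signature from the issue description. */
interface IOThrottler {
    void throttle(long numOfBytes);
}

/** Hypothetical wrapper: charges the throttler for every byte read. */
final class ThrottledInputStream extends FilterInputStream {
    private final IOThrottler throttler;

    ThrottledInputStream(InputStream in, IOThrottler throttler) {
        super(in);
        this.throttler = Objects.requireNonNull(throttler, "throttler");
    }

    @Override
    public int read() throws IOException {
        int b = in.read();
        if (b >= 0) {
            throttler.throttle(1);
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = in.read(buf, off, len);
        if (n > 0) {
            throttler.throttle(n); // charge only what was actually read
        }
        return n;
    }
}
```

This only makes a cooperating client slow itself down. As the comment above notes, it cannot enforce a limit on clients that choose not to wrap their streams.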
[jira] [Commented] (HDFS-4412) Support HDFS IO throttling
[ https://issues.apache.org/jira/browse/HDFS-4412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13555433#comment-13555433 ]

Daryn Sharp commented on HDFS-4412:

A simple app side change might be to extend IOUtils.copyBytes.