[jira] [Updated] (HDFS-14292) Introduce Java ExecutorService to DataXceiverServer
[ https://issues.apache.org/jira/browse/HDFS-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Mollitor updated HDFS-14292:
--
Description:
I wanted to investigate {{dfs.datanode.max.transfer.threads}} from {{hdfs-site.xml}}. It is described as "Specifies the maximum number of threads to use for transferring data in and out of the DN." The default value is 4096. I found it interesting because 4096 threads sounds like a lot to me. I'm not sure how a system with 8-16 cores would react to this large a thread count. Intuitively, I would say that the overhead of context switching would be immense.

During my investigation, I discovered the [following|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiverServer.java#L203-L216] setup in the {{DataXceiverServer}} class:
# A peer connects to a DataNode
# A new thread is spun up to service this connection
# The thread runs to completion
# The thread dies

It would perhaps be better if we used a thread pool to better manage the lifecycle of the service threads and to allow the DataNode to re-use existing threads, saving on the need to create and spin up threads on demand.

In this JIRA, I have added a couple of things:
# Added a thread pool to the {{DataXceiverServer}} class that, on demand, will create up to {{dfs.datanode.max.transfer.threads}} threads. A thread that has completed its prior duties will stay idle for up to 60 seconds (configurable); it will be retired if no new work has arrived.
# Added new methods to the {{Peer}} interface to allow for better logging and less code within each Thread ({{DataXceiver}}).
# Updated the Thread code ({{DataXceiver}}) regarding its interactions with the {{blockReceiver}} instance variable.

was:
I wanted to investigate {{dfs.datanode.max.transfer.threads}} from {{hdfs-site.xml}}. It is described as "Specifies the maximum number of threads to use for transferring data in and out of the DN." The default value is 4096. I found it interesting because 4096 threads sounds like a lot to me. I'm not sure how a system with 8-16 cores would react to this large a thread count. Intuitively, I would say that the overhead of context switching would be immense.

During mt investigation, I discovered the [following|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiverServer.java#L203-L216] setup in the {{DataXceiverServer}} class:
# A peer connects to a DataNode
# A new thread is spun up to service this connection
# The thread runs to completion
# The thread dies

It would perhaps be better if we used a thread pool to better manage the lifecycle of the service threads and to allow the DataNode to re-use existing threads, saving on the need to create and spin up threads on demand.

In this JIRA, I have added a couple of things:
# Added a thread pool to the {{DataXceiverServer}} class that, on demand, will create up to {{dfs.datanode.max.transfer.threads}} threads. A thread that has completed its prior duties will stay idle for up to 60 seconds (configurable); it will be retired if no new work has arrived.
# Added new methods to the {{Peer}} interface to allow for better logging and less code within each Thread ({{DataXceiver}}).
# Updated the Thread code ({{DataXceiver}}) regarding its interactions with the {{blockReceiver}} instance variable.

> Introduce Java ExecutorService to DataXceiverServer
> ---
>
> Key: HDFS-14292
> URL: https://issues.apache.org/jira/browse/HDFS-14292
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode
> Affects Versions: 3.2.0
> Reporter: David Mollitor
> Assignee: David Mollitor
> Priority: Major
> Attachments: HDFS-14292.1.patch, HDFS-14292.2.patch, HDFS-14292.3.patch, HDFS-14292.4.patch, HDFS-14292.5.patch, HDFS-14292.6.patch, HDFS-14292.7.patch, HDFS-14292.8.patch, HDFS-14292.8.patch
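
The pooling behaviour described in the issue (threads created on demand up to a maximum, idle threads retired after 60 seconds) can be sketched with the JDK's {{ThreadPoolExecutor}}. This is only an illustration under stated assumptions, not the code from the attached patches: the class name {{XceiverPoolSketch}} and both helper methods are hypothetical, and the rejection of work once every thread is busy is simply the JDK default ({{AbortPolicy}}), which may differ from what the patch does.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; not taken from the HDFS-14292 patches.
public class XceiverPoolSketch {

  /**
   * Builds a pool that starts with no threads, creates one per task on demand
   * up to maxThreads, and retires a thread after idleSeconds without work.
   * The SynchronousQueue hands a task straight to an idle thread if one is
   * waiting; otherwise the executor tries to start a new thread.
   */
  static ThreadPoolExecutor newXceiverPool(int maxThreads, long idleSeconds) {
    return new ThreadPoolExecutor(
        0, maxThreads, idleSeconds, TimeUnit.SECONDS, new SynchronousQueue<>());
  }

  /** Occupies every thread, then reports whether one more task is rejected. */
  static boolean rejectsWhenSaturated(int maxThreads) throws InterruptedException {
    ThreadPoolExecutor pool = newXceiverPool(maxThreads, 60L);
    CountDownLatch release = new CountDownLatch(1);
    try {
      // Each task blocks, so every execute() call has to start a new thread.
      for (int i = 0; i < maxThreads; i++) {
        pool.execute(() -> {
          try {
            release.await();
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
          }
        });
      }
      try {
        pool.execute(() -> { });   // no idle thread and no room for a new one
        return false;
      } catch (RejectedExecutionException e) {
        return true;               // JDK default policy: reject the task
      }
    } finally {
      release.countDown();         // let the blocked tasks finish
      pool.shutdown();
      pool.awaitTermination(5, TimeUnit.SECONDS);
    }
  }

  public static void main(String[] args) throws InterruptedException {
    // 4096 mirrors the default of dfs.datanode.max.transfer.threads quoted above.
    ThreadPoolExecutor pool = newXceiverPool(4096, 60L);
    System.out.println("core=" + pool.getCorePoolSize()
        + " max=" + pool.getMaximumPoolSize());
    pool.shutdown();
    System.out.println("rejects when saturated: " + rejectsWhenSaturated(2));
  }
}
```

With a core size of 0 the pool holds no threads while the DataNode is idle, which matches the "retired if no new work has arrived" behaviour; a bounded or unbounded work queue would instead delay thread creation and change how the maximum is reached.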
[jira] [Updated] (HDFS-14292) Introduce Java ExecutorService to DataXceiverServer
[ https://issues.apache.org/jira/browse/HDFS-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

BELUGA BEHR updated HDFS-14292:
---
Attachment: (was: HDFS-14292.8.patch)

> Introduce Java ExecutorService to DataXceiverServer
> ---
>
> Key: HDFS-14292
> URL: https://issues.apache.org/jira/browse/HDFS-14292
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode
> Affects Versions: 3.2.0
> Reporter: BELUGA BEHR
> Assignee: BELUGA BEHR
> Priority: Major
> Attachments: HDFS-14292.1.patch, HDFS-14292.2.patch, HDFS-14292.3.patch, HDFS-14292.4.patch, HDFS-14292.5.patch, HDFS-14292.6.patch, HDFS-14292.7.patch, HDFS-14292.8.patch, HDFS-14292.8.patch

--
This message was sent by Atlassian JIRA (v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14292) Introduce Java ExecutorService to DataXceiverServer
[ https://issues.apache.org/jira/browse/HDFS-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

BELUGA BEHR updated HDFS-14292:
---
Status: Open (was: Patch Available)

> Attachments: HDFS-14292.1.patch, HDFS-14292.2.patch, HDFS-14292.3.patch, HDFS-14292.4.patch, HDFS-14292.5.patch, HDFS-14292.6.patch, HDFS-14292.7.patch, HDFS-14292.8.patch, HDFS-14292.8.patch, HDFS-14295.8.patch
[jira] [Updated] (HDFS-14292) Introduce Java ExecutorService to DataXceiverServer
[ https://issues.apache.org/jira/browse/HDFS-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

BELUGA BEHR updated HDFS-14292:
---
Status: Patch Available (was: Open)

> Attachments: HDFS-14292.1.patch, HDFS-14292.2.patch, HDFS-14292.3.patch, HDFS-14292.4.patch, HDFS-14292.5.patch, HDFS-14292.6.patch, HDFS-14292.7.patch, HDFS-14292.8.patch, HDFS-14292.8.patch
[jira] [Updated] (HDFS-14292) Introduce Java ExecutorService to DataXceiverServer
[ https://issues.apache.org/jira/browse/HDFS-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

BELUGA BEHR updated HDFS-14292:
---
Attachment: HDFS-14292.8.patch

> Attachments: HDFS-14292.1.patch, HDFS-14292.2.patch, HDFS-14292.3.patch, HDFS-14292.4.patch, HDFS-14292.5.patch, HDFS-14292.6.patch, HDFS-14292.7.patch, HDFS-14292.8.patch, HDFS-14292.8.patch
[jira] [Updated] (HDFS-14292) Introduce Java ExecutorService to DataXceiverServer
[ https://issues.apache.org/jira/browse/HDFS-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

BELUGA BEHR updated HDFS-14292:
---
Attachment: (was: HDFS-14295.8.patch)

> Attachments: HDFS-14292.1.patch, HDFS-14292.2.patch, HDFS-14292.3.patch, HDFS-14292.4.patch, HDFS-14292.5.patch, HDFS-14292.6.patch, HDFS-14292.7.patch, HDFS-14292.8.patch, HDFS-14292.8.patch
[jira] [Updated] (HDFS-14292) Introduce Java ExecutorService to DataXceiverServer
[ https://issues.apache.org/jira/browse/HDFS-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

BELUGA BEHR updated HDFS-14292:
---
Attachment: HDFS-14292.8.patch

> Attachments: HDFS-14292.1.patch, HDFS-14292.2.patch, HDFS-14292.3.patch, HDFS-14292.4.patch, HDFS-14292.5.patch, HDFS-14292.6.patch, HDFS-14292.7.patch, HDFS-14292.8.patch, HDFS-14292.8.patch, HDFS-14295.8.patch
[jira] [Updated] (HDFS-14292) Introduce Java ExecutorService to DataXceiverServer
[ https://issues.apache.org/jira/browse/HDFS-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

BELUGA BEHR updated HDFS-14292:
---
Status: Open (was: Patch Available)

> Attachments: HDFS-14292.1.patch, HDFS-14292.2.patch, HDFS-14292.3.patch, HDFS-14292.4.patch, HDFS-14292.5.patch, HDFS-14292.6.patch, HDFS-14292.7.patch, HDFS-14292.8.patch, HDFS-14292.8.patch, HDFS-14295.8.patch
[jira] [Updated] (HDFS-14292) Introduce Java ExecutorService to DataXceiverServer
[ https://issues.apache.org/jira/browse/HDFS-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

BELUGA BEHR updated HDFS-14292:
---
Status: Patch Available (was: Open)

> Attachments: HDFS-14292.1.patch, HDFS-14292.2.patch, HDFS-14292.3.patch, HDFS-14292.4.patch, HDFS-14292.5.patch, HDFS-14292.6.patch, HDFS-14292.7.patch, HDFS-14292.8.patch, HDFS-14292.8.patch, HDFS-14295.8.patch
[jira] [Updated] (HDFS-14292) Introduce Java ExecutorService to DataXceiverServer
[ https://issues.apache.org/jira/browse/HDFS-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

BELUGA BEHR updated HDFS-14292:
---
Attachment: (was: HDFS-14292.6.patch)

> Attachments: HDFS-14292.1.patch, HDFS-14292.2.patch, HDFS-14292.3.patch, HDFS-14292.4.patch, HDFS-14292.5.patch, HDFS-14292.6.patch, HDFS-14292.7.patch, HDFS-14292.8.patch, HDFS-14295.8.patch
[jira] [Updated] (HDFS-14292) Introduce Java ExecutorService to DataXceiverServer
[ https://issues.apache.org/jira/browse/HDFS-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

BELUGA BEHR updated HDFS-14292:
---
Attachment: HDFS-14295.8.patch

> Attachments: HDFS-14292.1.patch, HDFS-14292.2.patch, HDFS-14292.3.patch, HDFS-14292.4.patch, HDFS-14292.5.patch, HDFS-14292.6.patch, HDFS-14292.6.patch, HDFS-14292.7.patch, HDFS-14292.8.patch, HDFS-14295.8.patch
[jira] [Updated] (HDFS-14292) Introduce Java ExecutorService to DataXceiverServer
[ https://issues.apache.org/jira/browse/HDFS-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

BELUGA BEHR updated HDFS-14292:
---
    Status: Patch Available (was: Open)

OK. I believe these last two unit tests are flaky and not related to the changes in this patch. I have just resubmitted the patch to see what we get.

Also, the checkstyle errors are a bit beyond my control: the {{Unused Import}} is already in the code and not caused by this patch, and some of the line-length issues stem from the existing formatting; changing my code would make it inconsistent with the surrounding code.

Please consider this patch for inclusion in the project.

> Introduce Java ExecutorService to DataXceiverServer
> ---
>
> Key: HDFS-14292
> URL: https://issues.apache.org/jira/browse/HDFS-14292
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode
> Affects Versions: 3.2.0
> Reporter: BELUGA BEHR
> Assignee: BELUGA BEHR
> Priority: Major
> Attachments: HDFS-14292.1.patch, HDFS-14292.2.patch, HDFS-14292.3.patch, HDFS-14292.4.patch, HDFS-14292.5.patch, HDFS-14292.6.patch, HDFS-14292.6.patch, HDFS-14292.7.patch, HDFS-14292.8.patch, HDFS-14295.8.patch
>
> I wanted to investigate {{dfs.datanode.max.transfer.threads}} from {{hdfs-site.xml}}. It is described as "Specifies the maximum number of threads to use for transferring data in and out of the DN." The default value is 4096. I found it interesting because 4096 threads sounds like a lot to me. I'm not sure how a system with 8-16 cores would react to this large a thread count. Intuitively, I would say that the overhead of context switching would be immense.
>
> During my investigation, I discovered the [following|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiverServer.java#L203-L216] setup in the {{DataXceiverServer}} class:
> # A peer connects to a DataNode
> # A new thread is spun up to service this connection
> # The thread runs to completion
> # The thread dies
>
> It would perhaps be better if we used a thread pool to better manage the lifecycle of the service threads and to allow the DataNode to re-use existing threads, saving on the need to create and spin up threads on demand.
>
> In this JIRA, I have added a couple of things:
> # Added a thread pool to the {{DataXceiverServer}} class that, on demand, will create up to {{dfs.datanode.max.transfer.threads}} threads. A thread that has completed its prior duties will stay idle for up to 60 seconds (configurable); it will be retired if no new work has arrived.
> # Added new methods to the {{Peer}} interface to allow for better logging and less code within each Thread ({{DataXceiver}}).
> # Updated the Thread code ({{DataXceiver}}) regarding its interactions with the {{blockReceiver}} instance variable
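For readers following the thread, the pool behaviour described in the issue (threads created on demand up to a maximum, idle threads retired after 60 seconds) can be sketched with the JDK's {{ThreadPoolExecutor}}. This is only an illustration under assumptions, not the code from any attached patch: the class name {{XceiverPoolSketch}} and the method {{newPool}} are hypothetical, and the choice of a {{SynchronousQueue}} with the default rejection policy is a guess at one reasonable way to get "grow on demand, never queue" semantics.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; names are illustrative and do not come from the patch.
public class XceiverPoolSketch {

  /** Builds a pool that grows on demand up to maxThreads and retires idle threads. */
  public static ThreadPoolExecutor newPool(int maxThreads, long idleSeconds) {
    // corePoolSize = 0: no thread is kept once it has been idle for idleSeconds.
    // SynchronousQueue: tasks are never queued; each one is handed to an idle
    // thread, or a new thread is created if none is waiting.
    // Default AbortPolicy: when maxThreads are all busy, execute() throws
    // RejectedExecutionException.
    return new ThreadPoolExecutor(0, maxThreads, idleSeconds, TimeUnit.SECONDS,
        new SynchronousQueue<>());
  }

  public static void main(String[] args) throws InterruptedException {
    ThreadPoolExecutor pool = newPool(2, 60L);
    CountDownLatch release = new CountDownLatch(1);

    // Stand-in for servicing one peer connection: block until released.
    Runnable connection = () -> {
      try {
        release.await();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    };

    pool.execute(connection); // creates thread 1
    pool.execute(connection); // creates thread 2
    try {
      pool.execute(connection); // both threads busy, limit reached
    } catch (RejectedExecutionException e) {
      System.out.println("third connection rejected: pool is at its limit");
    }

    release.countDown();
    pool.shutdown();
    pool.awaitTermination(5, TimeUnit.SECONDS);
  }
}
```

With this configuration a connection arriving while all threads are busy is rejected rather than queued, which roughly mirrors a hard cap such as {{dfs.datanode.max.transfer.threads}}; how the real patch handles that case is not stated in this thread.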
[jira] [Updated] (HDFS-14292) Introduce Java ExecutorService to DataXceiverServer
[ https://issues.apache.org/jira/browse/HDFS-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

BELUGA BEHR updated HDFS-14292:
---
    Status: Open (was: Patch Available)

> Introduce Java ExecutorService to DataXceiverServer
> ---
>
> Key: HDFS-14292
> URL: https://issues.apache.org/jira/browse/HDFS-14292
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode
> Affects Versions: 3.2.0
> Reporter: BELUGA BEHR
> Assignee: BELUGA BEHR
> Priority: Major
> Attachments: HDFS-14292.1.patch, HDFS-14292.2.patch, HDFS-14292.3.patch, HDFS-14292.4.patch, HDFS-14292.5.patch, HDFS-14292.6.patch, HDFS-14292.6.patch, HDFS-14292.7.patch, HDFS-14292.8.patch, HDFS-14295.8.patch
[jira] [Updated] (HDFS-14292) Introduce Java ExecutorService to DataXceiverServer
[ https://issues.apache.org/jira/browse/HDFS-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

BELUGA BEHR updated HDFS-14292:
---
    Status: Open (was: Patch Available)

> Introduce Java ExecutorService to DataXceiverServer
> ---
>
> Key: HDFS-14292
> URL: https://issues.apache.org/jira/browse/HDFS-14292
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode
> Affects Versions: 3.2.0
> Reporter: BELUGA BEHR
> Assignee: BELUGA BEHR
> Priority: Major
> Attachments: HDFS-14292.1.patch, HDFS-14292.2.patch, HDFS-14292.3.patch, HDFS-14292.4.patch, HDFS-14292.5.patch, HDFS-14292.6.patch, HDFS-14292.6.patch, HDFS-14292.7.patch, HDFS-14292.8.patch
[jira] [Updated] (HDFS-14292) Introduce Java ExecutorService to DataXceiverServer
[ https://issues.apache.org/jira/browse/HDFS-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

BELUGA BEHR updated HDFS-14292:
---
    Status: Patch Available (was: Open)

With the last patch I provided, the {{hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes}} unit test failed. It's a bit weird because the test itself failed with an NPE. I took a look at the test and it looks a bit flaky, so I tightened it up a bit in this latest patch.

> Introduce Java ExecutorService to DataXceiverServer
> ---
>
> Key: HDFS-14292
> URL: https://issues.apache.org/jira/browse/HDFS-14292
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode
> Affects Versions: 3.2.0
> Reporter: BELUGA BEHR
> Assignee: BELUGA BEHR
> Priority: Major
> Attachments: HDFS-14292.1.patch, HDFS-14292.2.patch, HDFS-14292.3.patch, HDFS-14292.4.patch, HDFS-14292.5.patch, HDFS-14292.6.patch, HDFS-14292.6.patch, HDFS-14292.7.patch, HDFS-14292.8.patch
[jira] [Updated] (HDFS-14292) Introduce Java ExecutorService to DataXceiverServer
[ https://issues.apache.org/jira/browse/HDFS-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

BELUGA BEHR updated HDFS-14292:
---
    Attachment: HDFS-14292.8.patch

> Introduce Java ExecutorService to DataXceiverServer
> ---
>
> Key: HDFS-14292
> URL: https://issues.apache.org/jira/browse/HDFS-14292
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode
> Affects Versions: 3.2.0
> Reporter: BELUGA BEHR
> Assignee: BELUGA BEHR
> Priority: Major
> Attachments: HDFS-14292.1.patch, HDFS-14292.2.patch, HDFS-14292.3.patch, HDFS-14292.4.patch, HDFS-14292.5.patch, HDFS-14292.6.patch, HDFS-14292.6.patch, HDFS-14292.7.patch, HDFS-14292.8.patch
[jira] [Updated] (HDFS-14292) Introduce Java ExecutorService to DataXceiverServer
[ https://issues.apache.org/jira/browse/HDFS-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

BELUGA BEHR updated HDFS-14292:
---
    Status: Open (was: Patch Available)

> Introduce Java ExecutorService to DataXceiverServer
> ---
>
> Key: HDFS-14292
> URL: https://issues.apache.org/jira/browse/HDFS-14292
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode
> Affects Versions: 3.2.0
> Reporter: BELUGA BEHR
> Assignee: BELUGA BEHR
> Priority: Major
> Attachments: HDFS-14292.1.patch, HDFS-14292.2.patch, HDFS-14292.3.patch, HDFS-14292.4.patch, HDFS-14292.5.patch, HDFS-14292.6.patch, HDFS-14292.6.patch, HDFS-14292.7.patch
[jira] [Updated] (HDFS-14292) Introduce Java ExecutorService to DataXceiverServer
[ https://issues.apache.org/jira/browse/HDFS-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

BELUGA BEHR updated HDFS-14292:
---
    Attachment: HDFS-14292.7.patch

> Introduce Java ExecutorService to DataXceiverServer
> ---
>
> Key: HDFS-14292
> URL: https://issues.apache.org/jira/browse/HDFS-14292
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode
> Affects Versions: 3.2.0
> Reporter: BELUGA BEHR
> Assignee: BELUGA BEHR
> Priority: Major
> Attachments: HDFS-14292.1.patch, HDFS-14292.2.patch, HDFS-14292.3.patch, HDFS-14292.4.patch, HDFS-14292.5.patch, HDFS-14292.6.patch, HDFS-14292.6.patch, HDFS-14292.7.patch
[jira] [Updated] (HDFS-14292) Introduce Java ExecutorService to DataXceiverServer
[ https://issues.apache.org/jira/browse/HDFS-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

BELUGA BEHR updated HDFS-14292:
---
    Status: Patch Available (was: Open)

> Introduce Java ExecutorService to DataXceiverServer
> ---
>
> Key: HDFS-14292
> URL: https://issues.apache.org/jira/browse/HDFS-14292
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode
> Affects Versions: 3.2.0
> Reporter: BELUGA BEHR
> Assignee: BELUGA BEHR
> Priority: Major
> Attachments: HDFS-14292.1.patch, HDFS-14292.2.patch, HDFS-14292.3.patch, HDFS-14292.4.patch, HDFS-14292.5.patch, HDFS-14292.6.patch, HDFS-14292.6.patch, HDFS-14292.7.patch
[jira] [Updated] (HDFS-14292) Introduce Java ExecutorService to DataXceiverServer
[ https://issues.apache.org/jira/browse/HDFS-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

BELUGA BEHR updated HDFS-14292:
---
    Status: Open (was: Patch Available)

> Introduce Java ExecutorService to DataXceiverServer
> ---
>
> Key: HDFS-14292
> URL: https://issues.apache.org/jira/browse/HDFS-14292
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode
> Affects Versions: 3.2.0
> Reporter: BELUGA BEHR
> Assignee: BELUGA BEHR
> Priority: Major
> Attachments: HDFS-14292.1.patch, HDFS-14292.2.patch, HDFS-14292.3.patch, HDFS-14292.4.patch, HDFS-14292.5.patch, HDFS-14292.6.patch, HDFS-14292.6.patch
[jira] [Updated] (HDFS-14292) Introduce Java ExecutorService to DataXceiverServer
[ https://issues.apache.org/jira/browse/HDFS-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

BELUGA BEHR updated HDFS-14292:
---
    Attachment: HDFS-14292.6.patch

> Introduce Java ExecutorService to DataXceiverServer
> ---
>
> Key: HDFS-14292
> URL: https://issues.apache.org/jira/browse/HDFS-14292
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode
> Affects Versions: 3.2.0
> Reporter: BELUGA BEHR
> Assignee: BELUGA BEHR
> Priority: Major
> Attachments: HDFS-14292.1.patch, HDFS-14292.2.patch, HDFS-14292.3.patch, HDFS-14292.4.patch, HDFS-14292.5.patch, HDFS-14292.6.patch, HDFS-14292.6.patch
[jira] [Updated] (HDFS-14292) Introduce Java ExecutorService to DataXceiverServer
[ https://issues.apache.org/jira/browse/HDFS-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

BELUGA BEHR updated HDFS-14292:
---
    Status: Patch Available (was: Open)

> Introduce Java ExecutorService to DataXceiverServer
> ---
>
> Key: HDFS-14292
> URL: https://issues.apache.org/jira/browse/HDFS-14292
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode
> Affects Versions: 3.2.0
> Reporter: BELUGA BEHR
> Assignee: BELUGA BEHR
> Priority: Major
> Attachments: HDFS-14292.1.patch, HDFS-14292.2.patch, HDFS-14292.3.patch, HDFS-14292.4.patch, HDFS-14292.5.patch, HDFS-14292.6.patch, HDFS-14292.6.patch
[jira] [Updated] (HDFS-14292) Introduce Java ExecutorService to DataXceiverServer
[ https://issues.apache.org/jira/browse/HDFS-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

BELUGA BEHR updated HDFS-14292:
---
    Status: Patch Available (was: Open)

New patch. Checkstyle said I should make a class final, but that causes Mockito (and therefore some unit tests) to fail.

> Introduce Java ExecutorService to DataXceiverServer
> ---
>
> Key: HDFS-14292
> URL: https://issues.apache.org/jira/browse/HDFS-14292
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode
> Affects Versions: 3.2.0
> Reporter: BELUGA BEHR
> Assignee: BELUGA BEHR
> Priority: Major
> Attachments: HDFS-14292.1.patch, HDFS-14292.2.patch, HDFS-14292.3.patch, HDFS-14292.4.patch, HDFS-14292.5.patch, HDFS-14292.6.patch
[jira] [Updated] (HDFS-14292) Introduce Java ExecutorService to DataXceiverServer
[ https://issues.apache.org/jira/browse/HDFS-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

BELUGA BEHR updated HDFS-14292:
---
    Status: Open (was: Patch Available)

> Introduce Java ExecutorService to DataXceiverServer
> ---
>
> Key: HDFS-14292
> URL: https://issues.apache.org/jira/browse/HDFS-14292
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode
> Affects Versions: 3.2.0
> Reporter: BELUGA BEHR
> Assignee: BELUGA BEHR
> Priority: Major
> Attachments: HDFS-14292.1.patch, HDFS-14292.2.patch, HDFS-14292.3.patch, HDFS-14292.4.patch, HDFS-14292.5.patch, HDFS-14292.6.patch
[jira] [Updated] (HDFS-14292) Introduce Java ExecutorService to DataXceiverServer
[ https://issues.apache.org/jira/browse/HDFS-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated HDFS-14292: --- Attachment: HDFS-14292.6.patch > Attachments: HDFS-14292.1.patch, HDFS-14292.2.patch, HDFS-14292.3.patch, HDFS-14292.4.patch, HDFS-14292.5.patch, HDFS-14292.6.patch
[jira] [Updated] (HDFS-14292) Introduce Java ExecutorService to DataXceiverServer
[ https://issues.apache.org/jira/browse/HDFS-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated HDFS-14292: --- Status: Patch Available (was: Open) > Attachments: HDFS-14292.1.patch, HDFS-14292.2.patch, HDFS-14292.3.patch, HDFS-14292.4.patch, HDFS-14292.5.patch
[jira] [Updated] (HDFS-14292) Introduce Java ExecutorService to DataXceiverServer
[ https://issues.apache.org/jira/browse/HDFS-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated HDFS-14292: --- Status: Open (was: Patch Available) > Attachments: HDFS-14292.1.patch, HDFS-14292.2.patch, HDFS-14292.3.patch, HDFS-14292.4.patch, HDFS-14292.5.patch
[jira] [Updated] (HDFS-14292) Introduce Java ExecutorService to DataXceiverServer
[ https://issues.apache.org/jira/browse/HDFS-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated HDFS-14292: --- Attachment: HDFS-14292.5.patch > Attachments: HDFS-14292.1.patch, HDFS-14292.2.patch, HDFS-14292.3.patch, HDFS-14292.4.patch, HDFS-14292.5.patch
[jira] [Updated] (HDFS-14292) Introduce Java ExecutorService to DataXceiverServer
[ https://issues.apache.org/jira/browse/HDFS-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated HDFS-14292: --- Status: Patch Available (was: Open) I don't think I got all of the unit tests passing, but I could use a new baseline with all of the tests having run. > Attachments: HDFS-14292.1.patch, HDFS-14292.2.patch, HDFS-14292.3.patch, HDFS-14292.4.patch
[jira] [Updated] (HDFS-14292) Introduce Java ExecutorService to DataXceiverServer
[ https://issues.apache.org/jira/browse/HDFS-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated HDFS-14292: --- Attachment: HDFS-14292.4.patch > Attachments: HDFS-14292.1.patch, HDFS-14292.2.patch, HDFS-14292.3.patch, HDFS-14292.4.patch
[jira] [Updated] (HDFS-14292) Introduce Java ExecutorService to DataXceiverServer
[ https://issues.apache.org/jira/browse/HDFS-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated HDFS-14292: --- Status: Open (was: Patch Available) > Attachments: HDFS-14292.1.patch, HDFS-14292.2.patch, HDFS-14292.3.patch
[jira] [Updated] (HDFS-14292) Introduce Java ExecutorService to DataXceiverServer
[ https://issues.apache.org/jira/browse/HDFS-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated HDFS-14292: --- Attachment: HDFS-14292.3.patch > Attachments: HDFS-14292.1.patch, HDFS-14292.2.patch, HDFS-14292.3.patch
[jira] [Updated] (HDFS-14292) Introduce Java ExecutorService to DataXceiverServer
[ https://issues.apache.org/jira/browse/HDFS-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated HDFS-14292: --- Status: Patch Available (was: Open) New patch; rebased. > Attachments: HDFS-14292.1.patch, HDFS-14292.2.patch, HDFS-14292.3.patch
[jira] [Updated] (HDFS-14292) Introduce Java ExecutorService to DataXceiverServer
[ https://issues.apache.org/jira/browse/HDFS-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated HDFS-14292: --- Status: Open (was: Patch Available) > Attachments: HDFS-14292.1.patch, HDFS-14292.2.patch, HDFS-14292.3.patch
[jira] [Updated] (HDFS-14292) Introduce Java ExecutorService to DataXceiverServer
[ https://issues.apache.org/jira/browse/HDFS-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated HDFS-14292: --- Description:
I wanted to investigate {{dfs.datanode.max.transfer.threads}} from {{hdfs-site.xml}}. It is described as "Specifies the maximum number of threads to use for transferring data in and out of the DN." The default value is 4096. I found it interesting because 4096 threads sounds like a lot to me. I'm not sure how a system with 8-16 cores would react to this large a thread count. Intuitively, I would say that the overhead of context switching would be immense.
During my investigation, I discovered the [following|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiverServer.java#L203-L216] setup in the {{DataXceiverServer}} class:
# A peer connects to a DataNode
# A new thread is spun up to service this connection
# The thread runs to completion
# The thread dies
It would perhaps be better if we used a thread pool to manage the lifecycle of the service threads and to allow the DataNode to re-use existing threads, saving on the need to create and spin up threads on demand.
In this JIRA, I have added a couple of things:
# Added a thread pool to the {{DataXceiverServer}} class that, on demand, will create up to {{dfs.datanode.max.transfer.threads}} threads. A thread that has completed its prior duties will stay idle for up to 60 seconds (configurable); it will be retired if no new work has arrived.
# Added new methods to the {{Peer}} interface to allow for better logging and less code within each thread ({{DataXceiver}}).
# Updated the thread code ({{DataXceiver}}) regarding its interactions with the {{blockReceiver}} instance variable.
was:
I wanted to investigate {{dfs.datanode.max.transfer.threads}} from {{hdfs-site.xml}}. It is described as "Specifies the maximum number of threads to use for transferring data in and out of the DN." The default value is 4096. I found it interesting because 4096 threads sounds like a lot to me. I'm not sure how a system with 8-16 cores would react to this large a thread count. Intuitively, I would say that the overhead of context switching would be immense.
During my investigation, I discovered the [following|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiverServer.java#L203-L216] setup in the {{DataXceiverServer}} class:
# A peer connects to a DataNode
# A new thread is spun up to service this connection
# The thread runs to completion
# The thread dies
It would perhaps be better if we used a thread pool to manage the lifecycle of the service threads and to allow the DataNode to re-use existing threads, saving on the need to create and spin up threads on demand.
In this JIRA, I have added a couple of things:
# Added a thread pool that will always maintain a single thread running, always awaiting a new connection should one arrive. On demand, it will create up to {{dfs.datanode.max.transfer.threads}} threads. A thread that has completed its prior duties will stay idle for up to 30 seconds; it will be retired if no new work has arrived.
# Added new methods to the {{Peer}} interface to allow for better logging and less code within each thread ({{DataXceiver}}).
# Updated the thread code ({{DataXceiver}}) regarding its interactions with the {{blockReceiver}} instance variable.
> Attachments: HDFS-14292.1.patch, HDFS-14292.2.patch
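The earlier revision of the description (the "was:" text in the message above) kept one thread alive at all times and used a 30-second idle timeout. A rough sketch of that variant is shown here for comparison. It is not taken from the patches: the names WarmXceiverPoolSketch and newWarmPool are hypothetical, and using a core pool size of 1 together with prestartCoreThread() is only one assumed way to get a single thread that is always waiting.

```java
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class WarmXceiverPoolSketch {

  // Hypothetical helper (not from the patch): one thread is started eagerly
  // and never retired; extra threads are created on demand and retired
  // after 30 idle seconds.
  static ThreadPoolExecutor newWarmPool(int maxThreads) {
    ThreadPoolExecutor pool = new ThreadPoolExecutor(
        1,                         // one core thread is kept alive
        maxThreads,                // stand-in for dfs.datanode.max.transfer.threads
        30L, TimeUnit.SECONDS,     // idle timeout for the non-core threads
        new SynchronousQueue<>()); // direct hand-off, no queueing
    // Core threads are otherwise created lazily on first submit;
    // this starts the single core thread right away.
    pool.prestartCoreThread();
    return pool;
  }

  public static void main(String[] args) throws InterruptedException {
    ThreadPoolExecutor pool = newWarmPool(4096);
    System.out.println("threads before any work: " + pool.getPoolSize());
    pool.shutdown();
    pool.awaitTermination(5, TimeUnit.SECONDS);
  }
}
```

The difference from the later revision comes down to the first constructor argument: a core size of 1 keeps a thread resident, while a core size of 0 lets the pool shrink to nothing when the DataNode is idle.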
[jira] [Updated] (HDFS-14292) Introduce Java ExecutorService to DataXceiverServer
[ https://issues.apache.org/jira/browse/HDFS-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

BELUGA BEHR updated HDFS-14292:
---
    Status: Open (was: Patch Available)

> Introduce Java ExecutorService to DataXceiverServer
> ---
>
> Key: HDFS-14292
> URL: https://issues.apache.org/jira/browse/HDFS-14292
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode
> Affects Versions: 3.2.0
> Reporter: BELUGA BEHR
> Assignee: BELUGA BEHR
> Priority: Major
> Attachments: HDFS-14292.1.patch
>
>
> I wanted to investigate {{dfs.datanode.max.transfer.threads}} from
> {{hdfs-site.xml}}. It is described as "Specifies the maximum number of
> threads to use for transferring data in and out of the DN." The default
> value is 4096. I found it interesting because 4096 threads sounds like a lot
> to me. I'm not sure how a system with 8-16 cores would react to this large a
> thread count. Intuitively, I would say that the overhead of context
> switching would be immense.
> During my investigation, I discovered the
> [following|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiverServer.java#L203-L216]
> setup in the {{DataXceiverServer}} class:
> # A peer connects to a DataNode
> # A new thread is spun up to service this connection
> # The thread runs to completion
> # The thread dies
> It would perhaps be better if we used a thread pool to better manage the
> lifecycle of the service threads and to allow the DataNode to re-use existing
> threads, saving on the need to create and spin-up threads on demand.
> In this JIRA, I have added a couple of things:
> # Added a thread pool that will always maintain a single thread running,
> always awaiting a new connection should one arrive. On demand, it will
> create up to {{dfs.datanode.max.transfer.threads}} threads. A thread that has
> completed its prior duties will stay idle for up to 30 seconds; it will be
> retired if no new work has arrived in that time.
> # Added new methods to the {{Peer}} Interface to allow for better logging and
> less code within each Thread ({{DataXceiver}}).
> # Updated the Thread code ({{DataXceiver}}) regarding its interactions with
> {{blockReceiver}} instance variable

--
This message was sent by Atlassian JIRA (v7.6.3#76005)

To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
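The pool behaviour described in the issue (one thread kept alive, growth on demand up to a cap, idle threads retired after a timeout) can be sketched with the standard java.util.concurrent.ThreadPoolExecutor. This is only an illustration and is not taken from the attached patches: the class name XceiverPoolSketch, the method newPool, and the choice of a SynchronousQueue with the default rejection policy are all assumptions made for the example.

```java
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; names and queue choice are assumptions, not the patch.
class XceiverPoolSketch {

  // maxThreads stands in for dfs.datanode.max.transfer.threads;
  // idleSeconds stands in for the 30-second idle timeout in the description.
  static ThreadPoolExecutor newPool(int maxThreads, long idleSeconds) {
    ThreadPoolExecutor pool = new ThreadPoolExecutor(
        1,                         // core size: one thread is always kept
        maxThreads,                // upper bound on threads created on demand
        idleSeconds, TimeUnit.SECONDS,
        new SynchronousQueue<>()); // no queueing: hand off or start a thread
    // Start the single core thread now so it is already waiting for work.
    pool.prestartCoreThread();
    return pool;
  }

  public static void main(String[] args) throws InterruptedException {
    ThreadPoolExecutor pool = newPool(4, 30);
    for (int i = 0; i < 6; i++) {
      final int id = i;
      try {
        // Each task stands in for servicing one peer connection.
        pool.execute(() -> {
          try {
            Thread.sleep(200);
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
          }
        });
        System.out.println("accepted connection " + id);
      } catch (RejectedExecutionException e) {
        // All maxThreads workers are busy and nothing can be queued.
        System.out.println("rejected connection " + id);
      }
    }
    pool.shutdown();
    pool.awaitTermination(5, TimeUnit.SECONDS);
  }
}
```

With a SynchronousQueue, a submitted task is either handed to an idle worker or causes a new worker to be started; once the cap is reached, further submissions are rejected rather than queued, which is one way to keep the configured limit a hard bound.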
[jira] [Updated] (HDFS-14292) Introduce Java ExecutorService to DataXceiverServer
[ https://issues.apache.org/jira/browse/HDFS-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

BELUGA BEHR updated HDFS-14292:
---
    Status: Patch Available (was: Open)

> Introduce Java ExecutorService to DataXceiverServer
> ---
>
> Key: HDFS-14292
> URL: https://issues.apache.org/jira/browse/HDFS-14292
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode
> Affects Versions: 3.2.0
> Reporter: BELUGA BEHR
> Assignee: BELUGA BEHR
> Priority: Major
> Attachments: HDFS-14292.1.patch, HDFS-14292.2.patch
[jira] [Updated] (HDFS-14292) Introduce Java ExecutorService to DataXceiverServer
[ https://issues.apache.org/jira/browse/HDFS-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

BELUGA BEHR updated HDFS-14292:
---
    Attachment: HDFS-14292.2.patch

> Introduce Java ExecutorService to DataXceiverServer
> ---
>
> Key: HDFS-14292
> URL: https://issues.apache.org/jira/browse/HDFS-14292
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode
> Affects Versions: 3.2.0
> Reporter: BELUGA BEHR
> Assignee: BELUGA BEHR
> Priority: Major
> Attachments: HDFS-14292.1.patch, HDFS-14292.2.patch
[jira] [Updated] (HDFS-14292) Introduce Java ExecutorService to DataXceiverServer
[ https://issues.apache.org/jira/browse/HDFS-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

BELUGA BEHR updated HDFS-14292:
---
    Attachment: (was: HDFS-14292.2.patc)

> Introduce Java ExecutorService to DataXceiverServer
> ---
>
> Key: HDFS-14292
> URL: https://issues.apache.org/jira/browse/HDFS-14292
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode
> Affects Versions: 3.2.0
> Reporter: BELUGA BEHR
> Assignee: BELUGA BEHR
> Priority: Major
> Attachments: HDFS-14292.1.patch
[jira] [Updated] (HDFS-14292) Introduce Java ExecutorService to DataXceiverServer
[ https://issues.apache.org/jira/browse/HDFS-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

BELUGA BEHR updated HDFS-14292:
---
    Attachment: HDFS-14292.2.patc

> Introduce Java ExecutorService to DataXceiverServer
> ---
>
> Key: HDFS-14292
> URL: https://issues.apache.org/jira/browse/HDFS-14292
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode
> Affects Versions: 3.2.0
> Reporter: BELUGA BEHR
> Assignee: BELUGA BEHR
> Priority: Major
> Attachments: HDFS-14292.1.patch
[jira] [Updated] (HDFS-14292) Introduce Java ExecutorService to DataXceiverServer
[ https://issues.apache.org/jira/browse/HDFS-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

BELUGA BEHR updated HDFS-14292:
---
    Summary: Introduce Java ExecutorService to DataXceiverServer (was: Introduce Java ExecutorService to DataNode)

> Introduce Java ExecutorService to DataXceiverServer
> ---
>
> Key: HDFS-14292
> URL: https://issues.apache.org/jira/browse/HDFS-14292
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode
> Affects Versions: 3.2.0
> Reporter: BELUGA BEHR
> Assignee: BELUGA BEHR
> Priority: Major
> Attachments: HDFS-14292.1.patch