[jira] [Updated] (HDFS-14292) Introduce Java ExecutorService to DataXceiverServer

2019-03-13 Thread David Mollitor (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HDFS-14292:
--
Description: 
I wanted to investigate {{dfs.datanode.max.transfer.threads}} from 
{{hdfs-site.xml}}.  It is described as "Specifies the maximum number of threads 
to use for transferring data in and out of the DN."   The default value is 
4096.  I found it interesting because 4096 threads sounds like a lot to me.  
I'm not sure how a system with 8-16 cores would react to this large a thread 
count.  Intuitively, I would say that the overhead of context switching would 
be immense.

During my investigation, I discovered the 
[following|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiverServer.java#L203-L216]
 setup in the {{DataXceiverServer}} class:

# A peer connects to a DataNode
# A new thread is spun up to service this connection
# The thread runs to completion
# The tread dies

It would perhaps be better if we used a thread pool to better manage the 
lifecycle of the service threads and to allow the DataNode to re-use existing 
threads, saving on the need to create and spin-up threads on demand.

In this JIRA, I have added a couple of things:

# Added a thread pool to {{DataXceiverServer}} class that, on demand, will 
create up to {{dfs.datanode.max.transfer.threads}}.  A thread that has 
completed its prior duties will stay idle for up to 60 seconds (configurable), 
it will be retired if no new work has arrived.
# Added new methods to the {{Peer}} Interface to allow for better logging and 
less code within each Thread ({{DataXceiver}}).
# Updated the Thread code ({{DataXceiver}}) regarding its interactions with 
{{blockReceiver}} instance variable

  was:
I wanted to investigate {{dfs.datanode.max.transfer.threads}} from 
{{hdfs-site.xml}}.  It is described as "Specifies the maximum number of threads 
to use for transferring data in and out of the DN."   The default value is 
4096.  I found it interesting because 4096 threads sounds like a lot to me.  
I'm not sure how a system with 8-16 cores would react to this large a thread 
count.  Intuitively, I would say that the overhead of context switching would 
be immense.

During mt investigation, I discovered the 
[following|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiverServer.java#L203-L216]
 setup in the {{DataXceiverServer}} class:

# A peer connects to a DataNode
# A new thread is spun up to service this connection
# The thread runs to completion
# The tread dies

It would perhaps be better if we used a thread pool to better manage the 
lifecycle of the service threads and to allow the DataNode to re-use existing 
threads, saving on the need to create and spin-up threads on demand.

In this JIRA, I have added a couple of things:

# Added a thread pool to {{DataXceiverServer}} class that, on demand, will 
create up to {{dfs.datanode.max.transfer.threads}}.  A thread that has 
completed its prior duties will stay idle for up to 60 seconds (configurable), 
it will be retired if no new work has arrived.
# Added new methods to the {{Peer}} Interface to allow for better logging and 
less code within each Thread ({{DataXceiver}}).
# Updated the Thread code ({{DataXceiver}}) regarding its interactions with 
{{blockReceiver}} instance variable


> Introduce Java ExecutorService to DataXceiverServer
> ---
>
> Key: HDFS-14292
> URL: https://issues.apache.org/jira/browse/HDFS-14292
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
> Attachments: HDFS-14292.1.patch, HDFS-14292.2.patch, 
> HDFS-14292.3.patch, HDFS-14292.4.patch, HDFS-14292.5.patch, 
> HDFS-14292.6.patch, HDFS-14292.7.patch, HDFS-14292.8.patch, HDFS-14292.8.patch
>
>
> I wanted to investigate {{dfs.datanode.max.transfer.threads}} from 
> {{hdfs-site.xml}}.  It is described as "Specifies the maximum number of 
> threads to use for transferring data in and out of the DN."   The default 
> value is 4096.  I found it interesting because 4096 threads sounds like a lot 
> to me.  I'm not sure how a system with 8-16 cores would react to this large a 
> thread count.  Intuitively, I would say that the overhead of context 
> switching would be immense.
> During my investigation, I discovered the 
> [following|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiverServer.java#L203-L216]
>  setup in the {{DataXceiverServer}} class:
> # A peer connects to a DataNode
> # A new 

[jira] [Updated] (HDFS-14292) Introduce Java ExecutorService to DataXceiverServer

2019-02-27 Thread BELUGA BEHR (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HDFS-14292:
---
Attachment: (was: HDFS-14292.8.patch)

> Introduce Java ExecutorService to DataXceiverServer
> ---
>
> Key: HDFS-14292
> URL: https://issues.apache.org/jira/browse/HDFS-14292
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.2.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Major
> Attachments: HDFS-14292.1.patch, HDFS-14292.2.patch, 
> HDFS-14292.3.patch, HDFS-14292.4.patch, HDFS-14292.5.patch, 
> HDFS-14292.6.patch, HDFS-14292.7.patch, HDFS-14292.8.patch, HDFS-14292.8.patch
>
>
> I wanted to investigate {{dfs.datanode.max.transfer.threads}} from 
> {{hdfs-site.xml}}.  It is described as "Specifies the maximum number of 
> threads to use for transferring data in and out of the DN."   The default 
> value is 4096.  I found it interesting because 4096 threads sounds like a lot 
> to me.  I'm not sure how a system with 8-16 cores would react to this large a 
> thread count.  Intuitively, I would say that the overhead of context 
> switching would be immense.
> During mt investigation, I discovered the 
> [following|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiverServer.java#L203-L216]
>  setup in the {{DataXceiverServer}} class:
> # A peer connects to a DataNode
> # A new thread is spun up to service this connection
> # The thread runs to completion
> # The tread dies
> It would perhaps be better if we used a thread pool to better manage the 
> lifecycle of the service threads and to allow the DataNode to re-use existing 
> threads, saving on the need to create and spin-up threads on demand.
> In this JIRA, I have added a couple of things:
> # Added a thread pool to {{DataXceiverServer}} class that, on demand, will 
> create up to {{dfs.datanode.max.transfer.threads}}.  A thread that has 
> completed its prior duties will stay idle for up to 60 seconds 
> (configurable), it will be retired if no new work has arrived.
> # Added new methods to the {{Peer}} Interface to allow for better logging and 
> less code within each Thread ({{DataXceiver}}).
> # Updated the Thread code ({{DataXceiver}}) regarding its interactions with 
> {{blockReceiver}} instance variable



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14292) Introduce Java ExecutorService to DataXceiverServer

2019-02-27 Thread BELUGA BEHR (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HDFS-14292:
---
Status: Open  (was: Patch Available)

> Introduce Java ExecutorService to DataXceiverServer
> ---
>
> Key: HDFS-14292
> URL: https://issues.apache.org/jira/browse/HDFS-14292
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.2.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Major
> Attachments: HDFS-14292.1.patch, HDFS-14292.2.patch, 
> HDFS-14292.3.patch, HDFS-14292.4.patch, HDFS-14292.5.patch, 
> HDFS-14292.6.patch, HDFS-14292.7.patch, HDFS-14292.8.patch, 
> HDFS-14292.8.patch, HDFS-14295.8.patch
>
>
> I wanted to investigate {{dfs.datanode.max.transfer.threads}} from 
> {{hdfs-site.xml}}.  It is described as "Specifies the maximum number of 
> threads to use for transferring data in and out of the DN."   The default 
> value is 4096.  I found it interesting because 4096 threads sounds like a lot 
> to me.  I'm not sure how a system with 8-16 cores would react to this large a 
> thread count.  Intuitively, I would say that the overhead of context 
> switching would be immense.
> During mt investigation, I discovered the 
> [following|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiverServer.java#L203-L216]
>  setup in the {{DataXceiverServer}} class:
> # A peer connects to a DataNode
> # A new thread is spun up to service this connection
> # The thread runs to completion
> # The tread dies
> It would perhaps be better if we used a thread pool to better manage the 
> lifecycle of the service threads and to allow the DataNode to re-use existing 
> threads, saving on the need to create and spin-up threads on demand.
> In this JIRA, I have added a couple of things:
> # Added a thread pool to {{DataXceiverServer}} class that, on demand, will 
> create up to {{dfs.datanode.max.transfer.threads}}.  A thread that has 
> completed its prior duties will stay idle for up to 60 seconds 
> (configurable), it will be retired if no new work has arrived.
> # Added new methods to the {{Peer}} Interface to allow for better logging and 
> less code within each Thread ({{DataXceiver}}).
> # Updated the Thread code ({{DataXceiver}}) regarding its interactions with 
> {{blockReceiver}} instance variable



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14292) Introduce Java ExecutorService to DataXceiverServer

2019-02-27 Thread BELUGA BEHR (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HDFS-14292:
---
Status: Patch Available  (was: Open)

> Introduce Java ExecutorService to DataXceiverServer
> ---
>
> Key: HDFS-14292
> URL: https://issues.apache.org/jira/browse/HDFS-14292
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.2.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Major
> Attachments: HDFS-14292.1.patch, HDFS-14292.2.patch, 
> HDFS-14292.3.patch, HDFS-14292.4.patch, HDFS-14292.5.patch, 
> HDFS-14292.6.patch, HDFS-14292.7.patch, HDFS-14292.8.patch, HDFS-14292.8.patch
>
>
> I wanted to investigate {{dfs.datanode.max.transfer.threads}} from 
> {{hdfs-site.xml}}.  It is described as "Specifies the maximum number of 
> threads to use for transferring data in and out of the DN."   The default 
> value is 4096.  I found it interesting because 4096 threads sounds like a lot 
> to me.  I'm not sure how a system with 8-16 cores would react to this large a 
> thread count.  Intuitively, I would say that the overhead of context 
> switching would be immense.
> During mt investigation, I discovered the 
> [following|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiverServer.java#L203-L216]
>  setup in the {{DataXceiverServer}} class:
> # A peer connects to a DataNode
> # A new thread is spun up to service this connection
> # The thread runs to completion
> # The tread dies
> It would perhaps be better if we used a thread pool to better manage the 
> lifecycle of the service threads and to allow the DataNode to re-use existing 
> threads, saving on the need to create and spin-up threads on demand.
> In this JIRA, I have added a couple of things:
> # Added a thread pool to {{DataXceiverServer}} class that, on demand, will 
> create up to {{dfs.datanode.max.transfer.threads}}.  A thread that has 
> completed its prior duties will stay idle for up to 60 seconds 
> (configurable), it will be retired if no new work has arrived.
> # Added new methods to the {{Peer}} Interface to allow for better logging and 
> less code within each Thread ({{DataXceiver}}).
> # Updated the Thread code ({{DataXceiver}}) regarding its interactions with 
> {{blockReceiver}} instance variable



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14292) Introduce Java ExecutorService to DataXceiverServer

2019-02-27 Thread BELUGA BEHR (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HDFS-14292:
---
Attachment: HDFS-14292.8.patch

> Introduce Java ExecutorService to DataXceiverServer
> ---
>
> Key: HDFS-14292
> URL: https://issues.apache.org/jira/browse/HDFS-14292
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.2.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Major
> Attachments: HDFS-14292.1.patch, HDFS-14292.2.patch, 
> HDFS-14292.3.patch, HDFS-14292.4.patch, HDFS-14292.5.patch, 
> HDFS-14292.6.patch, HDFS-14292.7.patch, HDFS-14292.8.patch, HDFS-14292.8.patch
>
>
> I wanted to investigate {{dfs.datanode.max.transfer.threads}} from 
> {{hdfs-site.xml}}.  It is described as "Specifies the maximum number of 
> threads to use for transferring data in and out of the DN."   The default 
> value is 4096.  I found it interesting because 4096 threads sounds like a lot 
> to me.  I'm not sure how a system with 8-16 cores would react to this large a 
> thread count.  Intuitively, I would say that the overhead of context 
> switching would be immense.
> During mt investigation, I discovered the 
> [following|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiverServer.java#L203-L216]
>  setup in the {{DataXceiverServer}} class:
> # A peer connects to a DataNode
> # A new thread is spun up to service this connection
> # The thread runs to completion
> # The tread dies
> It would perhaps be better if we used a thread pool to better manage the 
> lifecycle of the service threads and to allow the DataNode to re-use existing 
> threads, saving on the need to create and spin-up threads on demand.
> In this JIRA, I have added a couple of things:
> # Added a thread pool to {{DataXceiverServer}} class that, on demand, will 
> create up to {{dfs.datanode.max.transfer.threads}}.  A thread that has 
> completed its prior duties will stay idle for up to 60 seconds 
> (configurable), it will be retired if no new work has arrived.
> # Added new methods to the {{Peer}} Interface to allow for better logging and 
> less code within each Thread ({{DataXceiver}}).
> # Updated the Thread code ({{DataXceiver}}) regarding its interactions with 
> {{blockReceiver}} instance variable



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14292) Introduce Java ExecutorService to DataXceiverServer

2019-02-27 Thread BELUGA BEHR (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HDFS-14292:
---
Attachment: (was: HDFS-14295.8.patch)

> Introduce Java ExecutorService to DataXceiverServer
> ---
>
> Key: HDFS-14292
> URL: https://issues.apache.org/jira/browse/HDFS-14292
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.2.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Major
> Attachments: HDFS-14292.1.patch, HDFS-14292.2.patch, 
> HDFS-14292.3.patch, HDFS-14292.4.patch, HDFS-14292.5.patch, 
> HDFS-14292.6.patch, HDFS-14292.7.patch, HDFS-14292.8.patch, HDFS-14292.8.patch
>
>
> I wanted to investigate {{dfs.datanode.max.transfer.threads}} from 
> {{hdfs-site.xml}}.  It is described as "Specifies the maximum number of 
> threads to use for transferring data in and out of the DN."   The default 
> value is 4096.  I found it interesting because 4096 threads sounds like a lot 
> to me.  I'm not sure how a system with 8-16 cores would react to this large a 
> thread count.  Intuitively, I would say that the overhead of context 
> switching would be immense.
> During mt investigation, I discovered the 
> [following|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiverServer.java#L203-L216]
>  setup in the {{DataXceiverServer}} class:
> # A peer connects to a DataNode
> # A new thread is spun up to service this connection
> # The thread runs to completion
> # The tread dies
> It would perhaps be better if we used a thread pool to better manage the 
> lifecycle of the service threads and to allow the DataNode to re-use existing 
> threads, saving on the need to create and spin-up threads on demand.
> In this JIRA, I have added a couple of things:
> # Added a thread pool to {{DataXceiverServer}} class that, on demand, will 
> create up to {{dfs.datanode.max.transfer.threads}}.  A thread that has 
> completed its prior duties will stay idle for up to 60 seconds 
> (configurable), it will be retired if no new work has arrived.
> # Added new methods to the {{Peer}} Interface to allow for better logging and 
> less code within each Thread ({{DataXceiver}}).
> # Updated the Thread code ({{DataXceiver}}) regarding its interactions with 
> {{blockReceiver}} instance variable



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14292) Introduce Java ExecutorService to DataXceiverServer

2019-02-26 Thread BELUGA BEHR (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HDFS-14292:
---
Attachment: HDFS-14292.8.patch

> Introduce Java ExecutorService to DataXceiverServer
> ---
>
> Key: HDFS-14292
> URL: https://issues.apache.org/jira/browse/HDFS-14292
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.2.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Major
> Attachments: HDFS-14292.1.patch, HDFS-14292.2.patch, 
> HDFS-14292.3.patch, HDFS-14292.4.patch, HDFS-14292.5.patch, 
> HDFS-14292.6.patch, HDFS-14292.7.patch, HDFS-14292.8.patch, 
> HDFS-14292.8.patch, HDFS-14295.8.patch
>
>
> I wanted to investigate {{dfs.datanode.max.transfer.threads}} from 
> {{hdfs-site.xml}}.  It is described as "Specifies the maximum number of 
> threads to use for transferring data in and out of the DN."   The default 
> value is 4096.  I found it interesting because 4096 threads sounds like a lot 
> to me.  I'm not sure how a system with 8-16 cores would react to this large a 
> thread count.  Intuitively, I would say that the overhead of context 
> switching would be immense.
> During mt investigation, I discovered the 
> [following|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiverServer.java#L203-L216]
>  setup in the {{DataXceiverServer}} class:
> # A peer connects to a DataNode
> # A new thread is spun up to service this connection
> # The thread runs to completion
> # The tread dies
> It would perhaps be better if we used a thread pool to better manage the 
> lifecycle of the service threads and to allow the DataNode to re-use existing 
> threads, saving on the need to create and spin-up threads on demand.
> In this JIRA, I have added a couple of things:
> # Added a thread pool to {{DataXceiverServer}} class that, on demand, will 
> create up to {{dfs.datanode.max.transfer.threads}}.  A thread that has 
> completed its prior duties will stay idle for up to 60 seconds 
> (configurable), it will be retired if no new work has arrived.
> # Added new methods to the {{Peer}} Interface to allow for better logging and 
> less code within each Thread ({{DataXceiver}}).
> # Updated the Thread code ({{DataXceiver}}) regarding its interactions with 
> {{blockReceiver}} instance variable



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14292) Introduce Java ExecutorService to DataXceiverServer

2019-02-26 Thread BELUGA BEHR (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HDFS-14292:
---
Status: Open  (was: Patch Available)

> Introduce Java ExecutorService to DataXceiverServer
> ---
>
> Key: HDFS-14292
> URL: https://issues.apache.org/jira/browse/HDFS-14292
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.2.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Major
> Attachments: HDFS-14292.1.patch, HDFS-14292.2.patch, 
> HDFS-14292.3.patch, HDFS-14292.4.patch, HDFS-14292.5.patch, 
> HDFS-14292.6.patch, HDFS-14292.7.patch, HDFS-14292.8.patch, 
> HDFS-14292.8.patch, HDFS-14295.8.patch
>
>
> I wanted to investigate {{dfs.datanode.max.transfer.threads}} from 
> {{hdfs-site.xml}}.  It is described as "Specifies the maximum number of 
> threads to use for transferring data in and out of the DN."   The default 
> value is 4096.  I found it interesting because 4096 threads sounds like a lot 
> to me.  I'm not sure how a system with 8-16 cores would react to this large a 
> thread count.  Intuitively, I would say that the overhead of context 
> switching would be immense.
> During mt investigation, I discovered the 
> [following|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiverServer.java#L203-L216]
>  setup in the {{DataXceiverServer}} class:
> # A peer connects to a DataNode
> # A new thread is spun up to service this connection
> # The thread runs to completion
> # The tread dies
> It would perhaps be better if we used a thread pool to better manage the 
> lifecycle of the service threads and to allow the DataNode to re-use existing 
> threads, saving on the need to create and spin-up threads on demand.
> In this JIRA, I have added a couple of things:
> # Added a thread pool to {{DataXceiverServer}} class that, on demand, will 
> create up to {{dfs.datanode.max.transfer.threads}}.  A thread that has 
> completed its prior duties will stay idle for up to 60 seconds 
> (configurable), it will be retired if no new work has arrived.
> # Added new methods to the {{Peer}} Interface to allow for better logging and 
> less code within each Thread ({{DataXceiver}}).
> # Updated the Thread code ({{DataXceiver}}) regarding its interactions with 
> {{blockReceiver}} instance variable



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14292) Introduce Java ExecutorService to DataXceiverServer

2019-02-26 Thread BELUGA BEHR (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HDFS-14292:
---
Status: Patch Available  (was: Open)

> Introduce Java ExecutorService to DataXceiverServer
> ---
>
> Key: HDFS-14292
> URL: https://issues.apache.org/jira/browse/HDFS-14292
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.2.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Major
> Attachments: HDFS-14292.1.patch, HDFS-14292.2.patch, 
> HDFS-14292.3.patch, HDFS-14292.4.patch, HDFS-14292.5.patch, 
> HDFS-14292.6.patch, HDFS-14292.7.patch, HDFS-14292.8.patch, 
> HDFS-14292.8.patch, HDFS-14295.8.patch
>
>
> I wanted to investigate {{dfs.datanode.max.transfer.threads}} from 
> {{hdfs-site.xml}}.  It is described as "Specifies the maximum number of 
> threads to use for transferring data in and out of the DN."   The default 
> value is 4096.  I found it interesting because 4096 threads sounds like a lot 
> to me.  I'm not sure how a system with 8-16 cores would react to this large a 
> thread count.  Intuitively, I would say that the overhead of context 
> switching would be immense.
> During mt investigation, I discovered the 
> [following|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiverServer.java#L203-L216]
>  setup in the {{DataXceiverServer}} class:
> # A peer connects to a DataNode
> # A new thread is spun up to service this connection
> # The thread runs to completion
> # The tread dies
> It would perhaps be better if we used a thread pool to better manage the 
> lifecycle of the service threads and to allow the DataNode to re-use existing 
> threads, saving on the need to create and spin-up threads on demand.
> In this JIRA, I have added a couple of things:
> # Added a thread pool to {{DataXceiverServer}} class that, on demand, will 
> create up to {{dfs.datanode.max.transfer.threads}}.  A thread that has 
> completed its prior duties will stay idle for up to 60 seconds 
> (configurable), it will be retired if no new work has arrived.
> # Added new methods to the {{Peer}} Interface to allow for better logging and 
> less code within each Thread ({{DataXceiver}}).
> # Updated the Thread code ({{DataXceiver}}) regarding its interactions with 
> {{blockReceiver}} instance variable



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14292) Introduce Java ExecutorService to DataXceiverServer

2019-02-26 Thread BELUGA BEHR (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HDFS-14292:
---
Attachment: (was: HDFS-14292.6.patch)

> Introduce Java ExecutorService to DataXceiverServer
> ---
>
> Key: HDFS-14292
> URL: https://issues.apache.org/jira/browse/HDFS-14292
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.2.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Major
> Attachments: HDFS-14292.1.patch, HDFS-14292.2.patch, 
> HDFS-14292.3.patch, HDFS-14292.4.patch, HDFS-14292.5.patch, 
> HDFS-14292.6.patch, HDFS-14292.7.patch, HDFS-14292.8.patch, HDFS-14295.8.patch
>
>
> I wanted to investigate {{dfs.datanode.max.transfer.threads}} from 
> {{hdfs-site.xml}}.  It is described as "Specifies the maximum number of 
> threads to use for transferring data in and out of the DN."   The default 
> value is 4096.  I found it interesting because 4096 threads sounds like a lot 
> to me.  I'm not sure how a system with 8-16 cores would react to this large a 
> thread count.  Intuitively, I would say that the overhead of context 
> switching would be immense.
> During mt investigation, I discovered the 
> [following|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiverServer.java#L203-L216]
>  setup in the {{DataXceiverServer}} class:
> # A peer connects to a DataNode
> # A new thread is spun up to service this connection
> # The thread runs to completion
> # The tread dies
> It would perhaps be better if we used a thread pool to better manage the 
> lifecycle of the service threads and to allow the DataNode to re-use existing 
> threads, saving on the need to create and spin-up threads on demand.
> In this JIRA, I have added a couple of things:
> # Added a thread pool to {{DataXceiverServer}} class that, on demand, will 
> create up to {{dfs.datanode.max.transfer.threads}}.  A thread that has 
> completed its prior duties will stay idle for up to 60 seconds 
> (configurable), it will be retired if no new work has arrived.
> # Added new methods to the {{Peer}} Interface to allow for better logging and 
> less code within each Thread ({{DataXceiver}}).
> # Updated the Thread code ({{DataXceiver}}) regarding its interactions with 
> {{blockReceiver}} instance variable



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14292) Introduce Java ExecutorService to DataXceiverServer

2019-02-26 Thread BELUGA BEHR (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HDFS-14292:
---
Attachment: HDFS-14295.8.patch

> Introduce Java ExecutorService to DataXceiverServer
> ---
>
> Key: HDFS-14292
> URL: https://issues.apache.org/jira/browse/HDFS-14292
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.2.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Major
> Attachments: HDFS-14292.1.patch, HDFS-14292.2.patch, 
> HDFS-14292.3.patch, HDFS-14292.4.patch, HDFS-14292.5.patch, 
> HDFS-14292.6.patch, HDFS-14292.6.patch, HDFS-14292.7.patch, 
> HDFS-14292.8.patch, HDFS-14295.8.patch
>
>
> I wanted to investigate {{dfs.datanode.max.transfer.threads}} from 
> {{hdfs-site.xml}}.  It is described as "Specifies the maximum number of 
> threads to use for transferring data in and out of the DN."   The default 
> value is 4096.  I found it interesting because 4096 threads sounds like a lot 
> to me.  I'm not sure how a system with 8-16 cores would react to this large a 
> thread count.  Intuitively, I would say that the overhead of context 
> switching would be immense.
> During mt investigation, I discovered the 
> [following|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiverServer.java#L203-L216]
>  setup in the {{DataXceiverServer}} class:
> # A peer connects to a DataNode
> # A new thread is spun up to service this connection
> # The thread runs to completion
> # The tread dies
> It would perhaps be better if we used a thread pool to better manage the 
> lifecycle of the service threads and to allow the DataNode to re-use existing 
> threads, saving on the need to create and spin-up threads on demand.
> In this JIRA, I have added a couple of things:
> # Added a thread pool to {{DataXceiverServer}} class that, on demand, will 
> create up to {{dfs.datanode.max.transfer.threads}}.  A thread that has 
> completed its prior duties will stay idle for up to 60 seconds 
> (configurable), it will be retired if no new work has arrived.
> # Added new methods to the {{Peer}} Interface to allow for better logging and 
> less code within each Thread ({{DataXceiver}}).
> # Updated the Thread code ({{DataXceiver}}) regarding its interactions with 
> {{blockReceiver}} instance variable



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14292) Introduce Java ExecutorService to DataXceiverServer

2019-02-26 Thread BELUGA BEHR (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HDFS-14292:
---
Status: Patch Available  (was: Open)

OK.  I believe these last two unit tests are flaky and not related to the 
changes in this patch.  I have just submitted the patch once more to see what 
we get.  Also, the checkstyle errors are things that are a bit beyond my 
control,... i.e., the {{Unused Import}} is already in the code, not caused by 
this patch, and some of the line-length issues are due to the existing formats 
and to change my code would make it not match the current stuff.

Please consider this patch for inclusion into the project.

> Introduce Java ExecutorService to DataXceiverServer
> ---
>
> Key: HDFS-14292
> URL: https://issues.apache.org/jira/browse/HDFS-14292
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.2.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Major
> Attachments: HDFS-14292.1.patch, HDFS-14292.2.patch, 
> HDFS-14292.3.patch, HDFS-14292.4.patch, HDFS-14292.5.patch, 
> HDFS-14292.6.patch, HDFS-14292.6.patch, HDFS-14292.7.patch, 
> HDFS-14292.8.patch, HDFS-14295.8.patch
>
>
> I wanted to investigate {{dfs.datanode.max.transfer.threads}} from 
> {{hdfs-site.xml}}.  It is described as "Specifies the maximum number of 
> threads to use for transferring data in and out of the DN."   The default 
> value is 4096.  I found it interesting because 4096 threads sounds like a lot 
> to me.  I'm not sure how a system with 8-16 cores would react to this large a 
> thread count.  Intuitively, I would say that the overhead of context 
> switching would be immense.
> During mt investigation, I discovered the 
> [following|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiverServer.java#L203-L216]
>  setup in the {{DataXceiverServer}} class:
> # A peer connects to a DataNode
> # A new thread is spun up to service this connection
> # The thread runs to completion
> # The tread dies
> It would perhaps be better if we used a thread pool to better manage the 
> lifecycle of the service threads and to allow the DataNode to re-use existing 
> threads, saving on the need to create and spin-up threads on demand.
> In this JIRA, I have added a couple of things:
> # Added a thread pool to {{DataXceiverServer}} class that, on demand, will 
> create up to {{dfs.datanode.max.transfer.threads}}.  A thread that has 
> completed its prior duties will stay idle for up to 60 seconds 
> (configurable), it will be retired if no new work has arrived.
> # Added new methods to the {{Peer}} Interface to allow for better logging and 
> less code within each Thread ({{DataXceiver}}).
> # Updated the Thread code ({{DataXceiver}}) regarding its interactions with 
> {{blockReceiver}} instance variable



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14292) Introduce Java ExecutorService to DataXceiverServer

2019-02-26 Thread BELUGA BEHR (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HDFS-14292:
---
Status: Open  (was: Patch Available)

> Introduce Java ExecutorService to DataXceiverServer
> ---
>
> Key: HDFS-14292
> URL: https://issues.apache.org/jira/browse/HDFS-14292
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.2.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Major
> Attachments: HDFS-14292.1.patch, HDFS-14292.2.patch, 
> HDFS-14292.3.patch, HDFS-14292.4.patch, HDFS-14292.5.patch, 
> HDFS-14292.6.patch, HDFS-14292.6.patch, HDFS-14292.7.patch, 
> HDFS-14292.8.patch, HDFS-14295.8.patch
>
>
> I wanted to investigate {{dfs.datanode.max.transfer.threads}} from 
> {{hdfs-site.xml}}.  It is described as "Specifies the maximum number of 
> threads to use for transferring data in and out of the DN."   The default 
> value is 4096.  I found it interesting because 4096 threads sounds like a lot 
> to me.  I'm not sure how a system with 8-16 cores would react to this large a 
> thread count.  Intuitively, I would say that the overhead of context 
> switching would be immense.
> During mt investigation, I discovered the 
> [following|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiverServer.java#L203-L216]
>  setup in the {{DataXceiverServer}} class:
> # A peer connects to a DataNode
> # A new thread is spun up to service this connection
> # The thread runs to completion
> # The tread dies
> It would perhaps be better if we used a thread pool to better manage the 
> lifecycle of the service threads and to allow the DataNode to re-use existing 
> threads, saving on the need to create and spin-up threads on demand.
> In this JIRA, I have added a couple of things:
> # Added a thread pool to {{DataXceiverServer}} class that, on demand, will 
> create up to {{dfs.datanode.max.transfer.threads}}.  A thread that has 
> completed its prior duties will stay idle for up to 60 seconds 
> (configurable), it will be retired if no new work has arrived.
> # Added new methods to the {{Peer}} Interface to allow for better logging and 
> less code within each Thread ({{DataXceiver}}).
> # Updated the Thread code ({{DataXceiver}}) regarding its interactions with 
> {{blockReceiver}} instance variable



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14292) Introduce Java ExecutorService to DataXceiverServer

2019-02-26 Thread BELUGA BEHR (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HDFS-14292:
---
Status: Open  (was: Patch Available)

> Introduce Java ExecutorService to DataXceiverServer
> ---
>
> Key: HDFS-14292
> URL: https://issues.apache.org/jira/browse/HDFS-14292
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.2.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Major
> Attachments: HDFS-14292.1.patch, HDFS-14292.2.patch, 
> HDFS-14292.3.patch, HDFS-14292.4.patch, HDFS-14292.5.patch, 
> HDFS-14292.6.patch, HDFS-14292.6.patch, HDFS-14292.7.patch, HDFS-14292.8.patch
>
>
> I wanted to investigate {{dfs.datanode.max.transfer.threads}} from 
> {{hdfs-site.xml}}.  It is described as "Specifies the maximum number of 
> threads to use for transferring data in and out of the DN."   The default 
> value is 4096.  I found it interesting because 4096 threads sounds like a lot 
> to me.  I'm not sure how a system with 8-16 cores would react to this large a 
> thread count.  Intuitively, I would say that the overhead of context 
> switching would be immense.
> During mt investigation, I discovered the 
> [following|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiverServer.java#L203-L216]
>  setup in the {{DataXceiverServer}} class:
> # A peer connects to a DataNode
> # A new thread is spun up to service this connection
> # The thread runs to completion
> # The tread dies
> It would perhaps be better if we used a thread pool to better manage the 
> lifecycle of the service threads and to allow the DataNode to re-use existing 
> threads, saving on the need to create and spin-up threads on demand.
> In this JIRA, I have added a couple of things:
> # Added a thread pool to {{DataXceiverServer}} class that, on demand, will 
> create up to {{dfs.datanode.max.transfer.threads}}.  A thread that has 
> completed its prior duties will stay idle for up to 60 seconds 
> (configurable), it will be retired if no new work has arrived.
> # Added new methods to the {{Peer}} Interface to allow for better logging and 
> less code within each Thread ({{DataXceiver}}).
> # Updated the Thread code ({{DataXceiver}}) regarding its interactions with 
> {{blockReceiver}} instance variable



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14292) Introduce Java ExecutorService to DataXceiverServer

2019-02-26 Thread BELUGA BEHR (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HDFS-14292:
---
Status: Patch Available  (was: Open)

With the last patch I provided, the 
{{hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes}} unit test failed.  
It's a bit weird because the test itself failed with a NPE.  I took a look at 
the test and it looks a bit flaky so I tightened it up a bit in this latest 
patch.

> Introduce Java ExecutorService to DataXceiverServer
> ---
>
> Key: HDFS-14292
> URL: https://issues.apache.org/jira/browse/HDFS-14292
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.2.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Major
> Attachments: HDFS-14292.1.patch, HDFS-14292.2.patch, 
> HDFS-14292.3.patch, HDFS-14292.4.patch, HDFS-14292.5.patch, 
> HDFS-14292.6.patch, HDFS-14292.6.patch, HDFS-14292.7.patch, HDFS-14292.8.patch
>
>
> I wanted to investigate {{dfs.datanode.max.transfer.threads}} from 
> {{hdfs-site.xml}}.  It is described as "Specifies the maximum number of 
> threads to use for transferring data in and out of the DN."   The default 
> value is 4096.  I found it interesting because 4096 threads sounds like a lot 
> to me.  I'm not sure how a system with 8-16 cores would react to this large a 
> thread count.  Intuitively, I would say that the overhead of context 
> switching would be immense.
> During mt investigation, I discovered the 
> [following|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiverServer.java#L203-L216]
>  setup in the {{DataXceiverServer}} class:
> # A peer connects to a DataNode
> # A new thread is spun up to service this connection
> # The thread runs to completion
> # The tread dies
> It would perhaps be better if we used a thread pool to better manage the 
> lifecycle of the service threads and to allow the DataNode to re-use existing 
> threads, saving on the need to create and spin-up threads on demand.
> In this JIRA, I have added a couple of things:
> # Added a thread pool to {{DataXceiverServer}} class that, on demand, will 
> create up to {{dfs.datanode.max.transfer.threads}}.  A thread that has 
> completed its prior duties will stay idle for up to 60 seconds 
> (configurable), it will be retired if no new work has arrived.
> # Added new methods to the {{Peer}} Interface to allow for better logging and 
> less code within each Thread ({{DataXceiver}}).
> # Updated the Thread code ({{DataXceiver}}) regarding its interactions with 
> {{blockReceiver}} instance variable



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14292) Introduce Java ExecutorService to DataXceiverServer

2019-02-26 Thread BELUGA BEHR (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HDFS-14292:
---
Attachment: HDFS-14292.8.patch

> Introduce Java ExecutorService to DataXceiverServer
> ---
>
> Key: HDFS-14292
> URL: https://issues.apache.org/jira/browse/HDFS-14292
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.2.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Major
> Attachments: HDFS-14292.1.patch, HDFS-14292.2.patch, 
> HDFS-14292.3.patch, HDFS-14292.4.patch, HDFS-14292.5.patch, 
> HDFS-14292.6.patch, HDFS-14292.6.patch, HDFS-14292.7.patch, HDFS-14292.8.patch
>
>
> I wanted to investigate {{dfs.datanode.max.transfer.threads}} from 
> {{hdfs-site.xml}}.  It is described as "Specifies the maximum number of 
> threads to use for transferring data in and out of the DN."   The default 
> value is 4096.  I found it interesting because 4096 threads sounds like a lot 
> to me.  I'm not sure how a system with 8-16 cores would react to this large a 
> thread count.  Intuitively, I would say that the overhead of context 
> switching would be immense.
> During mt investigation, I discovered the 
> [following|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiverServer.java#L203-L216]
>  setup in the {{DataXceiverServer}} class:
> # A peer connects to a DataNode
> # A new thread is spun up to service this connection
> # The thread runs to completion
> # The tread dies
> It would perhaps be better if we used a thread pool to better manage the 
> lifecycle of the service threads and to allow the DataNode to re-use existing 
> threads, saving on the need to create and spin-up threads on demand.
> In this JIRA, I have added a couple of things:
> # Added a thread pool to {{DataXceiverServer}} class that, on demand, will 
> create up to {{dfs.datanode.max.transfer.threads}}.  A thread that has 
> completed its prior duties will stay idle for up to 60 seconds 
> (configurable), it will be retired if no new work has arrived.
> # Added new methods to the {{Peer}} Interface to allow for better logging and 
> less code within each Thread ({{DataXceiver}}).
> # Updated the Thread code ({{DataXceiver}}) regarding its interactions with 
> {{blockReceiver}} instance variable



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14292) Introduce Java ExecutorService to DataXceiverServer

2019-02-21 Thread BELUGA BEHR (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HDFS-14292:
---
Status: Open  (was: Patch Available)

> Introduce Java ExecutorService to DataXceiverServer
> ---
>
> Key: HDFS-14292
> URL: https://issues.apache.org/jira/browse/HDFS-14292
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.2.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Major
> Attachments: HDFS-14292.1.patch, HDFS-14292.2.patch, 
> HDFS-14292.3.patch, HDFS-14292.4.patch, HDFS-14292.5.patch, 
> HDFS-14292.6.patch, HDFS-14292.6.patch, HDFS-14292.7.patch
>
>
> I wanted to investigate {{dfs.datanode.max.transfer.threads}} from 
> {{hdfs-site.xml}}.  It is described as "Specifies the maximum number of 
> threads to use for transferring data in and out of the DN."   The default 
> value is 4096.  I found it interesting because 4096 threads sounds like a lot 
> to me.  I'm not sure how a system with 8-16 cores would react to this large a 
> thread count.  Intuitively, I would say that the overhead of context 
> switching would be immense.
> During mt investigation, I discovered the 
> [following|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiverServer.java#L203-L216]
>  setup in the {{DataXceiverServer}} class:
> # A peer connects to a DataNode
> # A new thread is spun up to service this connection
> # The thread runs to completion
> # The tread dies
> It would perhaps be better if we used a thread pool to better manage the 
> lifecycle of the service threads and to allow the DataNode to re-use existing 
> threads, saving on the need to create and spin-up threads on demand.
> In this JIRA, I have added a couple of things:
> # Added a thread pool to {{DataXceiverServer}} class that, on demand, will 
> create up to {{dfs.datanode.max.transfer.threads}}.  A thread that has 
> completed its prior duties will stay idle for up to 60 seconds 
> (configurable), it will be retired if no new work has arrived.
> # Added new methods to the {{Peer}} Interface to allow for better logging and 
> less code within each Thread ({{DataXceiver}}).
> # Updated the Thread code ({{DataXceiver}}) regarding its interactions with 
> {{blockReceiver}} instance variable



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14292) Introduce Java ExecutorService to DataXceiverServer

2019-02-21 Thread BELUGA BEHR (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HDFS-14292:
---
Attachment: HDFS-14292.7.patch

> Introduce Java ExecutorService to DataXceiverServer
> ---
>
> Key: HDFS-14292
> URL: https://issues.apache.org/jira/browse/HDFS-14292
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.2.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Major
> Attachments: HDFS-14292.1.patch, HDFS-14292.2.patch, 
> HDFS-14292.3.patch, HDFS-14292.4.patch, HDFS-14292.5.patch, 
> HDFS-14292.6.patch, HDFS-14292.6.patch, HDFS-14292.7.patch
>
>
> I wanted to investigate {{dfs.datanode.max.transfer.threads}} from 
> {{hdfs-site.xml}}.  It is described as "Specifies the maximum number of 
> threads to use for transferring data in and out of the DN."   The default 
> value is 4096.  I found it interesting because 4096 threads sounds like a lot 
> to me.  I'm not sure how a system with 8-16 cores would react to this large a 
> thread count.  Intuitively, I would say that the overhead of context 
> switching would be immense.
> During mt investigation, I discovered the 
> [following|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiverServer.java#L203-L216]
>  setup in the {{DataXceiverServer}} class:
> # A peer connects to a DataNode
> # A new thread is spun up to service this connection
> # The thread runs to completion
> # The tread dies
> It would perhaps be better if we used a thread pool to better manage the 
> lifecycle of the service threads and to allow the DataNode to re-use existing 
> threads, saving on the need to create and spin-up threads on demand.
> In this JIRA, I have added a couple of things:
> # Added a thread pool to {{DataXceiverServer}} class that, on demand, will 
> create up to {{dfs.datanode.max.transfer.threads}}.  A thread that has 
> completed its prior duties will stay idle for up to 60 seconds 
> (configurable), it will be retired if no new work has arrived.
> # Added new methods to the {{Peer}} Interface to allow for better logging and 
> less code within each Thread ({{DataXceiver}}).
> # Updated the Thread code ({{DataXceiver}}) regarding its interactions with 
> {{blockReceiver}} instance variable



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14292) Introduce Java ExecutorService to DataXceiverServer

2019-02-21 Thread BELUGA BEHR (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HDFS-14292:
---
Status: Patch Available  (was: Open)

> Introduce Java ExecutorService to DataXceiverServer
> ---
>
> Key: HDFS-14292
> URL: https://issues.apache.org/jira/browse/HDFS-14292
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.2.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Major
> Attachments: HDFS-14292.1.patch, HDFS-14292.2.patch, 
> HDFS-14292.3.patch, HDFS-14292.4.patch, HDFS-14292.5.patch, 
> HDFS-14292.6.patch, HDFS-14292.6.patch, HDFS-14292.7.patch
>
>
> I wanted to investigate {{dfs.datanode.max.transfer.threads}} from 
> {{hdfs-site.xml}}.  It is described as "Specifies the maximum number of 
> threads to use for transferring data in and out of the DN."   The default 
> value is 4096.  I found it interesting because 4096 threads sounds like a lot 
> to me.  I'm not sure how a system with 8-16 cores would react to this large a 
> thread count.  Intuitively, I would say that the overhead of context 
> switching would be immense.
> During mt investigation, I discovered the 
> [following|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiverServer.java#L203-L216]
>  setup in the {{DataXceiverServer}} class:
> # A peer connects to a DataNode
> # A new thread is spun up to service this connection
> # The thread runs to completion
> # The tread dies
> It would perhaps be better if we used a thread pool to better manage the 
> lifecycle of the service threads and to allow the DataNode to re-use existing 
> threads, saving on the need to create and spin-up threads on demand.
> In this JIRA, I have added a couple of things:
> # Added a thread pool to {{DataXceiverServer}} class that, on demand, will 
> create up to {{dfs.datanode.max.transfer.threads}}.  A thread that has 
> completed its prior duties will stay idle for up to 60 seconds 
> (configurable), it will be retired if no new work has arrived.
> # Added new methods to the {{Peer}} Interface to allow for better logging and 
> less code within each Thread ({{DataXceiver}}).
> # Updated the Thread code ({{DataXceiver}}) regarding its interactions with 
> {{blockReceiver}} instance variable



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14292) Introduce Java ExecutorService to DataXceiverServer

2019-02-21 Thread BELUGA BEHR (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HDFS-14292:
---
Status: Open  (was: Patch Available)

> Introduce Java ExecutorService to DataXceiverServer
> ---
>
> Key: HDFS-14292
> URL: https://issues.apache.org/jira/browse/HDFS-14292
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.2.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Major
> Attachments: HDFS-14292.1.patch, HDFS-14292.2.patch, 
> HDFS-14292.3.patch, HDFS-14292.4.patch, HDFS-14292.5.patch, 
> HDFS-14292.6.patch, HDFS-14292.6.patch
>
>
> I wanted to investigate {{dfs.datanode.max.transfer.threads}} from 
> {{hdfs-site.xml}}.  It is described as "Specifies the maximum number of 
> threads to use for transferring data in and out of the DN."   The default 
> value is 4096.  I found it interesting because 4096 threads sounds like a lot 
> to me.  I'm not sure how a system with 8-16 cores would react to this large a 
> thread count.  Intuitively, I would say that the overhead of context 
> switching would be immense.
> During mt investigation, I discovered the 
> [following|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiverServer.java#L203-L216]
>  setup in the {{DataXceiverServer}} class:
> # A peer connects to a DataNode
> # A new thread is spun up to service this connection
> # The thread runs to completion
> # The tread dies
> It would perhaps be better if we used a thread pool to better manage the 
> lifecycle of the service threads and to allow the DataNode to re-use existing 
> threads, saving on the need to create and spin-up threads on demand.
> In this JIRA, I have added a couple of things:
> # Added a thread pool to {{DataXceiverServer}} class that, on demand, will 
> create up to {{dfs.datanode.max.transfer.threads}}.  A thread that has 
> completed its prior duties will stay idle for up to 60 seconds 
> (configurable), it will be retired if no new work has arrived.
> # Added new methods to the {{Peer}} Interface to allow for better logging and 
> less code within each Thread ({{DataXceiver}}).
> # Updated the Thread code ({{DataXceiver}}) regarding its interactions with 
> {{blockReceiver}} instance variable



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14292) Introduce Java ExecutorService to DataXceiverServer

2019-02-21 Thread BELUGA BEHR (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HDFS-14292:
---
Attachment: HDFS-14292.6.patch

> Introduce Java ExecutorService to DataXceiverServer
> ---
>
> Key: HDFS-14292
> URL: https://issues.apache.org/jira/browse/HDFS-14292
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.2.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Major
> Attachments: HDFS-14292.1.patch, HDFS-14292.2.patch, 
> HDFS-14292.3.patch, HDFS-14292.4.patch, HDFS-14292.5.patch, 
> HDFS-14292.6.patch, HDFS-14292.6.patch
>
>
> I wanted to investigate {{dfs.datanode.max.transfer.threads}} from 
> {{hdfs-site.xml}}.  It is described as "Specifies the maximum number of 
> threads to use for transferring data in and out of the DN."   The default 
> value is 4096.  I found it interesting because 4096 threads sounds like a lot 
> to me.  I'm not sure how a system with 8-16 cores would react to this large a 
> thread count.  Intuitively, I would say that the overhead of context 
> switching would be immense.
> During mt investigation, I discovered the 
> [following|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiverServer.java#L203-L216]
>  setup in the {{DataXceiverServer}} class:
> # A peer connects to a DataNode
> # A new thread is spun up to service this connection
> # The thread runs to completion
> # The tread dies
> It would perhaps be better if we used a thread pool to better manage the 
> lifecycle of the service threads and to allow the DataNode to re-use existing 
> threads, saving on the need to create and spin-up threads on demand.
> In this JIRA, I have added a couple of things:
> # Added a thread pool to {{DataXceiverServer}} class that, on demand, will 
> create up to {{dfs.datanode.max.transfer.threads}}.  A thread that has 
> completed its prior duties will stay idle for up to 60 seconds 
> (configurable), it will be retired if no new work has arrived.
> # Added new methods to the {{Peer}} Interface to allow for better logging and 
> less code within each Thread ({{DataXceiver}}).
> # Updated the Thread code ({{DataXceiver}}) regarding its interactions with 
> {{blockReceiver}} instance variable



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14292) Introduce Java ExecutorService to DataXceiverServer

2019-02-21 Thread BELUGA BEHR (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HDFS-14292:
---
Status: Patch Available  (was: Open)

> Introduce Java ExecutorService to DataXceiverServer
> ---
>
> Key: HDFS-14292
> URL: https://issues.apache.org/jira/browse/HDFS-14292
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.2.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Major
> Attachments: HDFS-14292.1.patch, HDFS-14292.2.patch, 
> HDFS-14292.3.patch, HDFS-14292.4.patch, HDFS-14292.5.patch, 
> HDFS-14292.6.patch, HDFS-14292.6.patch
>
>
> I wanted to investigate {{dfs.datanode.max.transfer.threads}} from 
> {{hdfs-site.xml}}.  It is described as "Specifies the maximum number of 
> threads to use for transferring data in and out of the DN."   The default 
> value is 4096.  I found it interesting because 4096 threads sounds like a lot 
> to me.  I'm not sure how a system with 8-16 cores would react to this large a 
> thread count.  Intuitively, I would say that the overhead of context 
> switching would be immense.
> During mt investigation, I discovered the 
> [following|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiverServer.java#L203-L216]
>  setup in the {{DataXceiverServer}} class:
> # A peer connects to a DataNode
> # A new thread is spun up to service this connection
> # The thread runs to completion
> # The tread dies
> It would perhaps be better if we used a thread pool to better manage the 
> lifecycle of the service threads and to allow the DataNode to re-use existing 
> threads, saving on the need to create and spin-up threads on demand.
> In this JIRA, I have added a couple of things:
> # Added a thread pool to {{DataXceiverServer}} class that, on demand, will 
> create up to {{dfs.datanode.max.transfer.threads}}.  A thread that has 
> completed its prior duties will stay idle for up to 60 seconds 
> (configurable), it will be retired if no new work has arrived.
> # Added new methods to the {{Peer}} Interface to allow for better logging and 
> less code within each Thread ({{DataXceiver}}).
> # Updated the Thread code ({{DataXceiver}}) regarding its interactions with 
> {{blockReceiver}} instance variable



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14292) Introduce Java ExecutorService to DataXceiverServer

2019-02-21 Thread BELUGA BEHR (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HDFS-14292:
---
Status: Patch Available  (was: Open)

New patch... Checkstyle said I should make a class final, but that causes 
Mockito (and therefore some unit tests) to fail.

> Introduce Java ExecutorService to DataXceiverServer
> ---
>
> Key: HDFS-14292
> URL: https://issues.apache.org/jira/browse/HDFS-14292
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.2.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Major
> Attachments: HDFS-14292.1.patch, HDFS-14292.2.patch, 
> HDFS-14292.3.patch, HDFS-14292.4.patch, HDFS-14292.5.patch, HDFS-14292.6.patch
>
>
> I wanted to investigate {{dfs.datanode.max.transfer.threads}} from 
> {{hdfs-site.xml}}.  It is described as "Specifies the maximum number of 
> threads to use for transferring data in and out of the DN."   The default 
> value is 4096.  I found it interesting because 4096 threads sounds like a lot 
> to me.  I'm not sure how a system with 8-16 cores would react to this large a 
> thread count.  Intuitively, I would say that the overhead of context 
> switching would be immense.
> During mt investigation, I discovered the 
> [following|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiverServer.java#L203-L216]
>  setup in the {{DataXceiverServer}} class:
> # A peer connects to a DataNode
> # A new thread is spun up to service this connection
> # The thread runs to completion
> # The tread dies
> It would perhaps be better if we used a thread pool to better manage the 
> lifecycle of the service threads and to allow the DataNode to re-use existing 
> threads, saving on the need to create and spin-up threads on demand.
> In this JIRA, I have added a couple of things:
> # Added a thread pool to {{DataXceiverServer}} class that, on demand, will 
> create up to {{dfs.datanode.max.transfer.threads}}.  A thread that has 
> completed its prior duties will stay idle for up to 60 seconds 
> (configurable), it will be retired if no new work has arrived.
> # Added new methods to the {{Peer}} Interface to allow for better logging and 
> less code within each Thread ({{DataXceiver}}).
> # Updated the Thread code ({{DataXceiver}}) regarding its interactions with 
> {{blockReceiver}} instance variable



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14292) Introduce Java ExecutorService to DataXceiverServer

2019-02-21 Thread BELUGA BEHR (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HDFS-14292:
---
Status: Open  (was: Patch Available)

> Introduce Java ExecutorService to DataXceiverServer
> ---
>
> Key: HDFS-14292
> URL: https://issues.apache.org/jira/browse/HDFS-14292
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.2.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Major
> Attachments: HDFS-14292.1.patch, HDFS-14292.2.patch, 
> HDFS-14292.3.patch, HDFS-14292.4.patch, HDFS-14292.5.patch, HDFS-14292.6.patch
>
>
> I wanted to investigate {{dfs.datanode.max.transfer.threads}} from 
> {{hdfs-site.xml}}.  It is described as "Specifies the maximum number of 
> threads to use for transferring data in and out of the DN."   The default 
> value is 4096.  I found it interesting because 4096 threads sounds like a lot 
> to me.  I'm not sure how a system with 8-16 cores would react to this large a 
> thread count.  Intuitively, I would say that the overhead of context 
> switching would be immense.
> During mt investigation, I discovered the 
> [following|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiverServer.java#L203-L216]
>  setup in the {{DataXceiverServer}} class:
> # A peer connects to a DataNode
> # A new thread is spun up to service this connection
> # The thread runs to completion
> # The tread dies
> It would perhaps be better if we used a thread pool to better manage the 
> lifecycle of the service threads and to allow the DataNode to re-use existing 
> threads, saving on the need to create and spin-up threads on demand.
> In this JIRA, I have added a couple of things:
> # Added a thread pool to {{DataXceiverServer}} class that, on demand, will 
> create up to {{dfs.datanode.max.transfer.threads}}.  A thread that has 
> completed its prior duties will stay idle for up to 60 seconds 
> (configurable), it will be retired if no new work has arrived.
> # Added new methods to the {{Peer}} Interface to allow for better logging and 
> less code within each Thread ({{DataXceiver}}).
> # Updated the Thread code ({{DataXceiver}}) regarding its interactions with 
> {{blockReceiver}} instance variable



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14292) Introduce Java ExecutorService to DataXceiverServer

2019-02-21 Thread BELUGA BEHR (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HDFS-14292:
---
Attachment: HDFS-14292.6.patch

> Introduce Java ExecutorService to DataXceiverServer
> ---
>
> Key: HDFS-14292
> URL: https://issues.apache.org/jira/browse/HDFS-14292
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.2.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Major
> Attachments: HDFS-14292.1.patch, HDFS-14292.2.patch, 
> HDFS-14292.3.patch, HDFS-14292.4.patch, HDFS-14292.5.patch, HDFS-14292.6.patch
>
>
> I wanted to investigate {{dfs.datanode.max.transfer.threads}} from 
> {{hdfs-site.xml}}.  It is described as "Specifies the maximum number of 
> threads to use for transferring data in and out of the DN."   The default 
> value is 4096.  I found it interesting because 4096 threads sounds like a lot 
> to me.  I'm not sure how a system with 8-16 cores would react to this large a 
> thread count.  Intuitively, I would say that the overhead of context 
> switching would be immense.
> During mt investigation, I discovered the 
> [following|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiverServer.java#L203-L216]
>  setup in the {{DataXceiverServer}} class:
> # A peer connects to a DataNode
> # A new thread is spun up to service this connection
> # The thread runs to completion
> # The tread dies
> It would perhaps be better if we used a thread pool to better manage the 
> lifecycle of the service threads and to allow the DataNode to re-use existing 
> threads, saving on the need to create and spin-up threads on demand.
> In this JIRA, I have added a couple of things:
> # Added a thread pool to {{DataXceiverServer}} class that, on demand, will 
> create up to {{dfs.datanode.max.transfer.threads}}.  A thread that has 
> completed its prior duties will stay idle for up to 60 seconds 
> (configurable), it will be retired if no new work has arrived.
> # Added new methods to the {{Peer}} Interface to allow for better logging and 
> less code within each Thread ({{DataXceiver}}).
> # Updated the Thread code ({{DataXceiver}}) regarding its interactions with 
> {{blockReceiver}} instance variable



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14292) Introduce Java ExecutorService to DataXceiverServer

2019-02-20 Thread BELUGA BEHR (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HDFS-14292:
---
Status: Patch Available  (was: Open)

> Introduce Java ExecutorService to DataXceiverServer
> ---
>
> Key: HDFS-14292
> URL: https://issues.apache.org/jira/browse/HDFS-14292
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.2.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Major
> Attachments: HDFS-14292.1.patch, HDFS-14292.2.patch, 
> HDFS-14292.3.patch, HDFS-14292.4.patch, HDFS-14292.5.patch
>
>
> I wanted to investigate {{dfs.datanode.max.transfer.threads}} from 
> {{hdfs-site.xml}}.  It is described as "Specifies the maximum number of 
> threads to use for transferring data in and out of the DN."   The default 
> value is 4096.  I found it interesting because 4096 threads sounds like a lot 
> to me.  I'm not sure how a system with 8-16 cores would react to this large a 
> thread count.  Intuitively, I would say that the overhead of context 
> switching would be immense.
> During mt investigation, I discovered the 
> [following|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiverServer.java#L203-L216]
>  setup in the {{DataXceiverServer}} class:
> # A peer connects to a DataNode
> # A new thread is spun up to service this connection
> # The thread runs to completion
> # The tread dies
> It would perhaps be better if we used a thread pool to better manage the 
> lifecycle of the service threads and to allow the DataNode to re-use existing 
> threads, saving on the need to create and spin-up threads on demand.
> In this JIRA, I have added a couple of things:
> # Added a thread pool to {{DataXceiverServer}} class that, on demand, will 
> create up to {{dfs.datanode.max.transfer.threads}}.  A thread that has 
> completed its prior duties will stay idle for up to 60 seconds 
> (configurable), it will be retired if no new work has arrived.
> # Added new methods to the {{Peer}} Interface to allow for better logging and 
> less code within each Thread ({{DataXceiver}}).
> # Updated the Thread code ({{DataXceiver}}) regarding its interactions with 
> {{blockReceiver}} instance variable



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14292) Introduce Java ExecutorService to DataXceiverServer

2019-02-20 Thread BELUGA BEHR (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HDFS-14292:
---
Status: Open  (was: Patch Available)

> Introduce Java ExecutorService to DataXceiverServer
> ---
>
> Key: HDFS-14292
> URL: https://issues.apache.org/jira/browse/HDFS-14292
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.2.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Major
> Attachments: HDFS-14292.1.patch, HDFS-14292.2.patch, 
> HDFS-14292.3.patch, HDFS-14292.4.patch, HDFS-14292.5.patch
>
>
> I wanted to investigate {{dfs.datanode.max.transfer.threads}} from 
> {{hdfs-site.xml}}.  It is described as "Specifies the maximum number of 
> threads to use for transferring data in and out of the DN."   The default 
> value is 4096.  I found it interesting because 4096 threads sounds like a lot 
> to me.  I'm not sure how a system with 8-16 cores would react to this large a 
> thread count.  Intuitively, I would say that the overhead of context 
> switching would be immense.
> During mt investigation, I discovered the 
> [following|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiverServer.java#L203-L216]
>  setup in the {{DataXceiverServer}} class:
> # A peer connects to a DataNode
> # A new thread is spun up to service this connection
> # The thread runs to completion
> # The tread dies
> It would perhaps be better if we used a thread pool to better manage the 
> lifecycle of the service threads and to allow the DataNode to re-use existing 
> threads, saving on the need to create and spin-up threads on demand.
> In this JIRA, I have added a couple of things:
> # Added a thread pool to {{DataXceiverServer}} class that, on demand, will 
> create up to {{dfs.datanode.max.transfer.threads}}.  A thread that has 
> completed its prior duties will stay idle for up to 60 seconds 
> (configurable), it will be retired if no new work has arrived.
> # Added new methods to the {{Peer}} Interface to allow for better logging and 
> less code within each Thread ({{DataXceiver}}).
> # Updated the Thread code ({{DataXceiver}}) regarding its interactions with 
> {{blockReceiver}} instance variable



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14292) Introduce Java ExecutorService to DataXceiverServer

2019-02-20 Thread BELUGA BEHR (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HDFS-14292:
---
Attachment: HDFS-14292.5.patch

> Introduce Java ExecutorService to DataXceiverServer
> ---
>
> Key: HDFS-14292
> URL: https://issues.apache.org/jira/browse/HDFS-14292
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.2.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Major
> Attachments: HDFS-14292.1.patch, HDFS-14292.2.patch, 
> HDFS-14292.3.patch, HDFS-14292.4.patch, HDFS-14292.5.patch
>
>
> I wanted to investigate {{dfs.datanode.max.transfer.threads}} from 
> {{hdfs-site.xml}}.  It is described as "Specifies the maximum number of 
> threads to use for transferring data in and out of the DN."   The default 
> value is 4096.  I found it interesting because 4096 threads sounds like a lot 
> to me.  I'm not sure how a system with 8-16 cores would react to this large a 
> thread count.  Intuitively, I would say that the overhead of context 
> switching would be immense.
> During mt investigation, I discovered the 
> [following|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiverServer.java#L203-L216]
>  setup in the {{DataXceiverServer}} class:
> # A peer connects to a DataNode
> # A new thread is spun up to service this connection
> # The thread runs to completion
> # The tread dies
> It would perhaps be better if we used a thread pool to better manage the 
> lifecycle of the service threads and to allow the DataNode to re-use existing 
> threads, saving on the need to create and spin-up threads on demand.
> In this JIRA, I have added a couple of things:
> # Added a thread pool to {{DataXceiverServer}} class that, on demand, will 
> create up to {{dfs.datanode.max.transfer.threads}}.  A thread that has 
> completed its prior duties will stay idle for up to 60 seconds 
> (configurable), it will be retired if no new work has arrived.
> # Added new methods to the {{Peer}} Interface to allow for better logging and 
> less code within each Thread ({{DataXceiver}}).
> # Updated the Thread code ({{DataXceiver}}) regarding its interactions with 
> {{blockReceiver}} instance variable



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14292) Introduce Java ExecutorService to DataXceiverServer

2019-02-20 Thread BELUGA BEHR (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HDFS-14292:
---
Status: Patch Available  (was: Open)

I don't think I got all of the UT passing, but could use a new baseline of all 
the test having run.

> Introduce Java ExecutorService to DataXceiverServer
> ---
>
> Key: HDFS-14292
> URL: https://issues.apache.org/jira/browse/HDFS-14292
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.2.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Major
> Attachments: HDFS-14292.1.patch, HDFS-14292.2.patch, 
> HDFS-14292.3.patch, HDFS-14292.4.patch
>
>
> I wanted to investigate {{dfs.datanode.max.transfer.threads}} from 
> {{hdfs-site.xml}}.  It is described as "Specifies the maximum number of 
> threads to use for transferring data in and out of the DN."   The default 
> value is 4096.  I found it interesting because 4096 threads sounds like a lot 
> to me.  I'm not sure how a system with 8-16 cores would react to this large a 
> thread count.  Intuitively, I would say that the overhead of context 
> switching would be immense.
> During mt investigation, I discovered the 
> [following|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiverServer.java#L203-L216]
>  setup in the {{DataXceiverServer}} class:
> # A peer connects to a DataNode
> # A new thread is spun up to service this connection
> # The thread runs to completion
> # The tread dies
> It would perhaps be better if we used a thread pool to better manage the 
> lifecycle of the service threads and to allow the DataNode to re-use existing 
> threads, saving on the need to create and spin-up threads on demand.
> In this JIRA, I have added a couple of things:
> # Added a thread pool to {{DataXceiverServer}} class that, on demand, will 
> create up to {{dfs.datanode.max.transfer.threads}}.  A thread that has 
> completed its prior duties will stay idle for up to 60 seconds 
> (configurable), it will be retired if no new work has arrived.
> # Added new methods to the {{Peer}} Interface to allow for better logging and 
> less code within each Thread ({{DataXceiver}}).
> # Updated the Thread code ({{DataXceiver}}) regarding its interactions with 
> {{blockReceiver}} instance variable



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14292) Introduce Java ExecutorService to DataXceiverServer

2019-02-20 Thread BELUGA BEHR (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HDFS-14292:
---
Attachment: HDFS-14292.4.patch

> Introduce Java ExecutorService to DataXceiverServer
> ---
>
> Key: HDFS-14292
> URL: https://issues.apache.org/jira/browse/HDFS-14292
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.2.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Major
> Attachments: HDFS-14292.1.patch, HDFS-14292.2.patch, 
> HDFS-14292.3.patch, HDFS-14292.4.patch
>
>
> I wanted to investigate {{dfs.datanode.max.transfer.threads}} from 
> {{hdfs-site.xml}}.  It is described as "Specifies the maximum number of 
> threads to use for transferring data in and out of the DN."   The default 
> value is 4096.  I found it interesting because 4096 threads sounds like a lot 
> to me.  I'm not sure how a system with 8-16 cores would react to this large a 
> thread count.  Intuitively, I would say that the overhead of context 
> switching would be immense.
> During mt investigation, I discovered the 
> [following|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiverServer.java#L203-L216]
>  setup in the {{DataXceiverServer}} class:
> # A peer connects to a DataNode
> # A new thread is spun up to service this connection
> # The thread runs to completion
> # The tread dies
> It would perhaps be better if we used a thread pool to better manage the 
> lifecycle of the service threads and to allow the DataNode to re-use existing 
> threads, saving on the need to create and spin-up threads on demand.
> In this JIRA, I have added a couple of things:
> # Added a thread pool to {{DataXceiverServer}} class that, on demand, will 
> create up to {{dfs.datanode.max.transfer.threads}}.  A thread that has 
> completed its prior duties will stay idle for up to 60 seconds 
> (configurable), it will be retired if no new work has arrived.
> # Added new methods to the {{Peer}} Interface to allow for better logging and 
> less code within each Thread ({{DataXceiver}}).
> # Updated the Thread code ({{DataXceiver}}) regarding its interactions with 
> {{blockReceiver}} instance variable



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14292) Introduce Java ExecutorService to DataXceiverServer

2019-02-20 Thread BELUGA BEHR (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HDFS-14292:
---
Status: Open  (was: Patch Available)

> Introduce Java ExecutorService to DataXceiverServer
> ---
>
> Key: HDFS-14292
> URL: https://issues.apache.org/jira/browse/HDFS-14292
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.2.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Major
> Attachments: HDFS-14292.1.patch, HDFS-14292.2.patch, 
> HDFS-14292.3.patch
>
>
> I wanted to investigate {{dfs.datanode.max.transfer.threads}} from 
> {{hdfs-site.xml}}.  It is described as "Specifies the maximum number of 
> threads to use for transferring data in and out of the DN."   The default 
> value is 4096.  I found it interesting because 4096 threads sounds like a lot 
> to me.  I'm not sure how a system with 8-16 cores would react to this large a 
> thread count.  Intuitively, I would say that the overhead of context 
> switching would be immense.
> During mt investigation, I discovered the 
> [following|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiverServer.java#L203-L216]
>  setup in the {{DataXceiverServer}} class:
> # A peer connects to a DataNode
> # A new thread is spun up to service this connection
> # The thread runs to completion
> # The tread dies
> It would perhaps be better if we used a thread pool to better manage the 
> lifecycle of the service threads and to allow the DataNode to re-use existing 
> threads, saving on the need to create and spin-up threads on demand.
> In this JIRA, I have added a couple of things:
> # Added a thread pool to {{DataXceiverServer}} class that, on demand, will 
> create up to {{dfs.datanode.max.transfer.threads}}.  A thread that has 
> completed its prior duties will stay idle for up to 60 seconds 
> (configurable), it will be retired if no new work has arrived.
> # Added new methods to the {{Peer}} Interface to allow for better logging and 
> less code within each Thread ({{DataXceiver}}).
> # Updated the Thread code ({{DataXceiver}}) regarding its interactions with 
> {{blockReceiver}} instance variable



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14292) Introduce Java ExecutorService to DataXceiverServer

2019-02-18 Thread BELUGA BEHR (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HDFS-14292:
---
Attachment: HDFS-14292.3.patch

> Introduce Java ExecutorService to DataXceiverServer
> ---
>
> Key: HDFS-14292
> URL: https://issues.apache.org/jira/browse/HDFS-14292
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.2.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Major
> Attachments: HDFS-14292.1.patch, HDFS-14292.2.patch, 
> HDFS-14292.3.patch
>
>
> I wanted to investigate {{dfs.datanode.max.transfer.threads}} from 
> {{hdfs-site.xml}}.  It is described as "Specifies the maximum number of 
> threads to use for transferring data in and out of the DN."   The default 
> value is 4096.  I found it interesting because 4096 threads sounds like a lot 
> to me.  I'm not sure how a system with 8-16 cores would react to this large a 
> thread count.  Intuitively, I would say that the overhead of context 
> switching would be immense.
> During mt investigation, I discovered the 
> [following|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiverServer.java#L203-L216]
>  setup in the {{DataXceiverServer}} class:
> # A peer connects to a DataNode
> # A new thread is spun up to service this connection
> # The thread runs to completion
> # The tread dies
> It would perhaps be better if we used a thread pool to better manage the 
> lifecycle of the service threads and to allow the DataNode to re-use existing 
> threads, saving on the need to create and spin-up threads on demand.
> In this JIRA, I have added a couple of things:
> # Added a thread pool to {{DataXceiverServer}} class that, on demand, will 
> create up to {{dfs.datanode.max.transfer.threads}}.  A thread that has 
> completed its prior duties will stay idle for up to 60 seconds 
> (configurable), it will be retired if no new work has arrived.
> # Added new methods to the {{Peer}} Interface to allow for better logging and 
> less code within each Thread ({{DataXceiver}}).
> # Updated the Thread code ({{DataXceiver}}) regarding its interactions with 
> {{blockReceiver}} instance variable



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14292) Introduce Java ExecutorService to DataXceiverServer

2019-02-18 Thread BELUGA BEHR (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HDFS-14292:
---
Status: Patch Available  (was: Open)

New patch; rebased.

> Introduce Java ExecutorService to DataXceiverServer
> ---
>
> Key: HDFS-14292
> URL: https://issues.apache.org/jira/browse/HDFS-14292
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.2.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Major
> Attachments: HDFS-14292.1.patch, HDFS-14292.2.patch, 
> HDFS-14292.3.patch
>
>
> I wanted to investigate {{dfs.datanode.max.transfer.threads}} from 
> {{hdfs-site.xml}}.  It is described as "Specifies the maximum number of 
> threads to use for transferring data in and out of the DN."   The default 
> value is 4096.  I found it interesting because 4096 threads sounds like a lot 
> to me.  I'm not sure how a system with 8-16 cores would react to this large a 
> thread count.  Intuitively, I would say that the overhead of context 
> switching would be immense.
> During mt investigation, I discovered the 
> [following|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiverServer.java#L203-L216]
>  setup in the {{DataXceiverServer}} class:
> # A peer connects to a DataNode
> # A new thread is spun up to service this connection
> # The thread runs to completion
> # The tread dies
> It would perhaps be better if we used a thread pool to better manage the 
> lifecycle of the service threads and to allow the DataNode to re-use existing 
> threads, saving on the need to create and spin-up threads on demand.
> In this JIRA, I have added a couple of things:
> # Added a thread pool to {{DataXceiverServer}} class that, on demand, will 
> create up to {{dfs.datanode.max.transfer.threads}}.  A thread that has 
> completed its prior duties will stay idle for up to 60 seconds 
> (configurable), it will be retired if no new work has arrived.
> # Added new methods to the {{Peer}} Interface to allow for better logging and 
> less code within each Thread ({{DataXceiver}}).
> # Updated the Thread code ({{DataXceiver}}) regarding its interactions with 
> {{blockReceiver}} instance variable



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14292) Introduce Java ExecutorService to DataXceiverServer

2019-02-18 Thread BELUGA BEHR (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HDFS-14292:
---
Status: Open  (was: Patch Available)

> Introduce Java ExecutorService to DataXceiverServer
> ---
>
> Key: HDFS-14292
> URL: https://issues.apache.org/jira/browse/HDFS-14292
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.2.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Major
> Attachments: HDFS-14292.1.patch, HDFS-14292.2.patch, 
> HDFS-14292.3.patch
>
>
> I wanted to investigate {{dfs.datanode.max.transfer.threads}} from 
> {{hdfs-site.xml}}.  It is described as "Specifies the maximum number of 
> threads to use for transferring data in and out of the DN."   The default 
> value is 4096.  I found it interesting because 4096 threads sounds like a lot 
> to me.  I'm not sure how a system with 8-16 cores would react to this large a 
> thread count.  Intuitively, I would say that the overhead of context 
> switching would be immense.
> During mt investigation, I discovered the 
> [following|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiverServer.java#L203-L216]
>  setup in the {{DataXceiverServer}} class:
> # A peer connects to a DataNode
> # A new thread is spun up to service this connection
> # The thread runs to completion
> # The tread dies
> It would perhaps be better if we used a thread pool to better manage the 
> lifecycle of the service threads and to allow the DataNode to re-use existing 
> threads, saving on the need to create and spin-up threads on demand.
> In this JIRA, I have added a couple of things:
> # Added a thread pool to {{DataXceiverServer}} class that, on demand, will 
> create up to {{dfs.datanode.max.transfer.threads}}.  A thread that has 
> completed its prior duties will stay idle for up to 60 seconds 
> (configurable), it will be retired if no new work has arrived.
> # Added new methods to the {{Peer}} Interface to allow for better logging and 
> less code within each Thread ({{DataXceiver}}).
> # Updated the Thread code ({{DataXceiver}}) regarding its interactions with 
> {{blockReceiver}} instance variable



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14292) Introduce Java ExecutorService to DataXceiverServer

2019-02-18 Thread BELUGA BEHR (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HDFS-14292:
---
Description: 
I wanted to investigate {{dfs.datanode.max.transfer.threads}} from 
{{hdfs-site.xml}}.  It is described as "Specifies the maximum number of threads 
to use for transferring data in and out of the DN."   The default value is 
4096.  I found it interesting because 4096 threads sounds like a lot to me.  
I'm not sure how a system with 8-16 cores would react to this large a thread 
count.  Intuitively, I would say that the overhead of context switching would 
be immense.

During mt investigation, I discovered the 
[following|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiverServer.java#L203-L216]
 setup in the {{DataXceiverServer}} class:

# A peer connects to a DataNode
# A new thread is spun up to service this connection
# The thread runs to completion
# The tread dies

It would perhaps be better if we used a thread pool to better manage the 
lifecycle of the service threads and to allow the DataNode to re-use existing 
threads, saving on the need to create and spin-up threads on demand.

In this JIRA, I have added a couple of things:

# Added a thread pool to {{DataXceiverServer}} class that, on demand, will 
create up to {{dfs.datanode.max.transfer.threads}}.  A thread that has 
completed its prior duties will stay idle for up to 60 seconds (configurable), 
it will be retired if no new work has arrived.
# Added new methods to the {{Peer}} Interface to allow for better logging and 
less code within each Thread ({{DataXceiver}}).
# Updated the Thread code ({{DataXceiver}}) regarding its interactions with 
{{blockReceiver}} instance variable

  was:
I wanted to investigate {{dfs.datanode.max.transfer.threads}} from 
{{hdfs-site.xml}}.  It is described as "Specifies the maximum number of threads 
to use for transferring data in and out of the DN."   The default value is 
4096.  I found it interesting because 4096 threads sounds like a lot to me.  
I'm not sure how a system with 8-16 cores would react to this large a thread 
count.  Intuitively, I would say that the overhead of context switching would 
be immense.

During mt investigation, I discovered the 
[following|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiverServer.java#L203-L216]
 setup in the {{DataXceiverServer}} class:

# A peer connects to a DataNode
# A new thread is spun up to service this connection
# The thread runs to completion
# The tread dies

It would perhaps be better if we used a thread pool to better manage the 
lifecycle of the service threads and to allow the DataNode to re-use existing 
threads, saving on the need to create and spin-up threads on demand.

In this JIRA, I have added a couple of things:

# Added a thread pool that will always maintain a single thread running, always 
awaiting a new connection should one arrive.  On-demand, it will create up to 
{{dfs.datanode.max.transfer.threads}}.  A thread that has completed its prior 
duties will stay idle for up to 30 seconds, it will be retired if no new work 
has arrived.
# Added new methods to the {{Peer}} Interface to allow for better logging and 
less code within each Thread ({{DataXceiver}}).
# Updated the Thread code ({{DataXceiver}}) regarding its interactions with 
{{blockReceiver}} instance variable


> Introduce Java ExecutorService to DataXceiverServer
> ---
>
> Key: HDFS-14292
> URL: https://issues.apache.org/jira/browse/HDFS-14292
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.2.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Major
> Attachments: HDFS-14292.1.patch, HDFS-14292.2.patch
>
>
> I wanted to investigate {{dfs.datanode.max.transfer.threads}} from 
> {{hdfs-site.xml}}.  It is described as "Specifies the maximum number of 
> threads to use for transferring data in and out of the DN."   The default 
> value is 4096.  I found it interesting because 4096 threads sounds like a lot 
> to me.  I'm not sure how a system with 8-16 cores would react to this large a 
> thread count.  Intuitively, I would say that the overhead of context 
> switching would be immense.
> During mt investigation, I discovered the 
> [following|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiverServer.java#L203-L216]
>  setup in the {{DataXceiverServer}} class:
> # A peer connects to a DataNode
> # A new thread is spun up to service this connection
> # The thread runs to completion
> # The tread dies
> It 

[jira] [Updated] (HDFS-14292) Introduce Java ExecutorService to DataXceiverServer

2019-02-18 Thread BELUGA BEHR (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HDFS-14292:
---
Status: Open  (was: Patch Available)

> Introduce Java ExecutorService to DataXceiverServer
> ---
>
> Key: HDFS-14292
> URL: https://issues.apache.org/jira/browse/HDFS-14292
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.2.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Major
> Attachments: HDFS-14292.1.patch
>
>
> I wanted to investigate {{dfs.datanode.max.transfer.threads}} from 
> {{hdfs-site.xml}}.  It is described as "Specifies the maximum number of 
> threads to use for transferring data in and out of the DN."   The default 
> value is 4096.  I found it interesting because 4096 threads sounds like a lot 
> to me.  I'm not sure how a system with 8-16 cores would react to this large a 
> thread count.  Intuitively, I would say that the overhead of context 
> switching would be immense.
> During mt investigation, I discovered the 
> [following|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiverServer.java#L203-L216]
>  setup in the {{DataXceiverServer}} class:
> # A peer connects to a DataNode
> # A new thread is spun up to service this connection
> # The thread runs to completion
> # The tread dies
> It would perhaps be better if we used a thread pool to better manage the 
> lifecycle of the service threads and to allow the DataNode to re-use existing 
> threads, saving on the need to create and spin-up threads on demand.
> In this JIRA, I have added a couple of things:
> # Added a thread pool that will always maintain a single thread running, 
> always awaiting a new connection should one arrive.  On-demand, it will 
> create up to {{dfs.datanode.max.transfer.threads}}.  A thread that has 
> completed its prior duties will stay idle for up to 30 seconds, it will be 
> retired if no new work has arrived.
> # Added new methods to the {{Peer}} Interface to allow for better logging and 
> less code within each Thread ({{DataXceiver}}).
> # Updated the Thread code ({{DataXceiver}}) regarding its interactions with 
> {{blockReceiver}} instance variable



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14292) Introduce Java ExecutorService to DataXceiverServer

2019-02-18 Thread BELUGA BEHR (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HDFS-14292:
---
Status: Patch Available  (was: Open)

> Introduce Java ExecutorService to DataXceiverServer
> ---
>
> Key: HDFS-14292
> URL: https://issues.apache.org/jira/browse/HDFS-14292
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.2.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Major
> Attachments: HDFS-14292.1.patch, HDFS-14292.2.patch
>
>
> I wanted to investigate {{dfs.datanode.max.transfer.threads}} from 
> {{hdfs-site.xml}}.  It is described as "Specifies the maximum number of 
> threads to use for transferring data in and out of the DN."   The default 
> value is 4096.  I found it interesting because 4096 threads sounds like a lot 
> to me.  I'm not sure how a system with 8-16 cores would react to this large a 
> thread count.  Intuitively, I would say that the overhead of context 
> switching would be immense.
> During mt investigation, I discovered the 
> [following|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiverServer.java#L203-L216]
>  setup in the {{DataXceiverServer}} class:
> # A peer connects to a DataNode
> # A new thread is spun up to service this connection
> # The thread runs to completion
> # The tread dies
> It would perhaps be better if we used a thread pool to better manage the 
> lifecycle of the service threads and to allow the DataNode to re-use existing 
> threads, saving on the need to create and spin-up threads on demand.
> In this JIRA, I have added a couple of things:
> # Added a thread pool that will always maintain a single thread running, 
> always awaiting a new connection should one arrive.  On-demand, it will 
> create up to {{dfs.datanode.max.transfer.threads}}.  A thread that has 
> completed its prior duties will stay idle for up to 30 seconds, it will be 
> retired if no new work has arrived.
> # Added new methods to the {{Peer}} Interface to allow for better logging and 
> less code within each Thread ({{DataXceiver}}).
> # Updated the Thread code ({{DataXceiver}}) regarding its interactions with 
> {{blockReceiver}} instance variable



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14292) Introduce Java ExecutorService to DataXceiverServer

2019-02-18 Thread BELUGA BEHR (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HDFS-14292:
---
Attachment: HDFS-14292.2.patch

> Introduce Java ExecutorService to DataXceiverServer
> ---
>
> Key: HDFS-14292
> URL: https://issues.apache.org/jira/browse/HDFS-14292
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.2.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Major
> Attachments: HDFS-14292.1.patch, HDFS-14292.2.patch
>
>
> I wanted to investigate {{dfs.datanode.max.transfer.threads}} from 
> {{hdfs-site.xml}}.  It is described as "Specifies the maximum number of 
> threads to use for transferring data in and out of the DN."   The default 
> value is 4096.  I found it interesting because 4096 threads sounds like a lot 
> to me.  I'm not sure how a system with 8-16 cores would react to this large a 
> thread count.  Intuitively, I would say that the overhead of context 
> switching would be immense.
> During mt investigation, I discovered the 
> [following|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiverServer.java#L203-L216]
>  setup in the {{DataXceiverServer}} class:
> # A peer connects to a DataNode
> # A new thread is spun up to service this connection
> # The thread runs to completion
> # The tread dies
> It would perhaps be better if we used a thread pool to better manage the 
> lifecycle of the service threads and to allow the DataNode to re-use existing 
> threads, saving on the need to create and spin-up threads on demand.
> In this JIRA, I have added a couple of things:
> # Added a thread pool that will always maintain a single thread running, 
> always awaiting a new connection should one arrive.  On-demand, it will 
> create up to {{dfs.datanode.max.transfer.threads}}.  A thread that has 
> completed its prior duties will stay idle for up to 30 seconds, it will be 
> retired if no new work has arrived.
> # Added new methods to the {{Peer}} Interface to allow for better logging and 
> less code within each Thread ({{DataXceiver}}).
> # Updated the Thread code ({{DataXceiver}}) regarding its interactions with 
> {{blockReceiver}} instance variable



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14292) Introduce Java ExecutorService to DataXceiverServer

2019-02-18 Thread BELUGA BEHR (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HDFS-14292:
---
Attachment: (was: HDFS-14292.2.patc)

> Introduce Java ExecutorService to DataXceiverServer
> ---
>
> Key: HDFS-14292
> URL: https://issues.apache.org/jira/browse/HDFS-14292
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.2.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Major
> Attachments: HDFS-14292.1.patch
>
>
> I wanted to investigate {{dfs.datanode.max.transfer.threads}} from 
> {{hdfs-site.xml}}.  It is described as "Specifies the maximum number of 
> threads to use for transferring data in and out of the DN."   The default 
> value is 4096.  I found it interesting because 4096 threads sounds like a lot 
> to me.  I'm not sure how a system with 8-16 cores would react to this large a 
> thread count.  Intuitively, I would say that the overhead of context 
> switching would be immense.
> During mt investigation, I discovered the 
> [following|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiverServer.java#L203-L216]
>  setup in the {{DataXceiverServer}} class:
> # A peer connects to a DataNode
> # A new thread is spun up to service this connection
> # The thread runs to completion
> # The tread dies
> It would perhaps be better if we used a thread pool to better manage the 
> lifecycle of the service threads and to allow the DataNode to re-use existing 
> threads, saving on the need to create and spin-up threads on demand.
> In this JIRA, I have added a couple of things:
> # Added a thread pool that will always maintain a single thread running, 
> always awaiting a new connection should one arrive.  On-demand, it will 
> create up to {{dfs.datanode.max.transfer.threads}}.  A thread that has 
> completed its prior duties will stay idle for up to 30 seconds, it will be 
> retired if no new work has arrived.
> # Added new methods to the {{Peer}} Interface to allow for better logging and 
> less code within each Thread ({{DataXceiver}}).
> # Updated the Thread code ({{DataXceiver}}) regarding its interactions with 
> {{blockReceiver}} instance variable



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14292) Introduce Java ExecutorService to DataXceiverServer

2019-02-18 Thread BELUGA BEHR (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HDFS-14292:
---
Attachment: HDFS-14292.2.patc

> Introduce Java ExecutorService to DataXceiverServer
> ---
>
> Key: HDFS-14292
> URL: https://issues.apache.org/jira/browse/HDFS-14292
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.2.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Major
> Attachments: HDFS-14292.1.patch
>
>
> I wanted to investigate {{dfs.datanode.max.transfer.threads}} from 
> {{hdfs-site.xml}}.  It is described as "Specifies the maximum number of 
> threads to use for transferring data in and out of the DN."   The default 
> value is 4096.  I found it interesting because 4096 threads sounds like a lot 
> to me.  I'm not sure how a system with 8-16 cores would react to this large a 
> thread count.  Intuitively, I would say that the overhead of context 
> switching would be immense.
> During mt investigation, I discovered the 
> [following|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiverServer.java#L203-L216]
>  setup in the {{DataXceiverServer}} class:
> # A peer connects to a DataNode
> # A new thread is spun up to service this connection
> # The thread runs to completion
> # The tread dies
> It would perhaps be better if we used a thread pool to better manage the 
> lifecycle of the service threads and to allow the DataNode to re-use existing 
> threads, saving on the need to create and spin-up threads on demand.
> In this JIRA, I have added a couple of things:
> # Added a thread pool that will always maintain a single thread running, 
> always awaiting a new connection should one arrive.  On-demand, it will 
> create up to {{dfs.datanode.max.transfer.threads}}.  A thread that has 
> completed its prior duties will stay idle for up to 30 seconds, it will be 
> retired if no new work has arrived.
> # Added new methods to the {{Peer}} Interface to allow for better logging and 
> less code within each Thread ({{DataXceiver}}).
> # Updated the Thread code ({{DataXceiver}}) regarding its interactions with 
> {{blockReceiver}} instance variable



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14292) Introduce Java ExecutorService to DataXceiverServer

2019-02-18 Thread BELUGA BEHR (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HDFS-14292:
---
Summary: Introduce Java ExecutorService to DataXceiverServer  (was: 
Introduce Java ExecutorService to DataNode)

> Introduce Java ExecutorService to DataXceiverServer
> ---
>
> Key: HDFS-14292
> URL: https://issues.apache.org/jira/browse/HDFS-14292
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.2.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Major
> Attachments: HDFS-14292.1.patch
>
>
> I wanted to investigate {{dfs.datanode.max.transfer.threads}} from 
> {{hdfs-site.xml}}.  It is described as "Specifies the maximum number of 
> threads to use for transferring data in and out of the DN."   The default 
> value is 4096.  I found it interesting because 4096 threads sounds like a lot 
> to me.  I'm not sure how a system with 8-16 cores would react to this large a 
> thread count.  Intuitively, I would say that the overhead of context 
> switching would be immense.
> During mt investigation, I discovered the 
> [following|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiverServer.java#L203-L216]
>  setup in the {{DataXceiverServer}} class:
> # A peer connects to a DataNode
> # A new thread is spun up to service this connection
> # The thread runs to completion
> # The tread dies
> It would perhaps be better if we used a thread pool to better manage the 
> lifecycle of the service threads and to allow the DataNode to re-use existing 
> threads, saving on the need to create and spin-up threads on demand.
> In this JIRA, I have added a couple of things:
> # Added a thread pool that will always maintain a single thread running, 
> always awaiting a new connection should one arrive.  On-demand, it will 
> create up to {{dfs.datanode.max.transfer.threads}}.  A thread that has 
> completed its prior duties will stay idle for up to 30 seconds, it will be 
> retired if no new work has arrived.
> # Added new methods to the {{Peer}} Interface to allow for better logging and 
> less code within each Thread ({{DataXceiver}}).
> # Updated the Thread code ({{DataXceiver}}) regarding its interactions with 
> {{blockReceiver}} instance variable



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org