[jira] [Updated] (HDFS-14308) DFSStripedInputStream curStripeBuf is not freed by unbuffer()

2019-09-17 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-14308:
---
Component/s: ec

> DFSStripedInputStream curStripeBuf is not freed by unbuffer()
> -------------------------------------------------------------
>
>                 Key: HDFS-14308
>                 URL: https://issues.apache.org/jira/browse/HDFS-14308
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: ec
>    Affects Versions: 3.0.0
>            Reporter: Joe McDonnell
>            Priority: Major
>         Attachments: ec_heap_dump.png
>
>
> Some users of HDFS cache opened HDFS file handles to avoid repeated 
> roundtrips to the NameNode. For example, Impala caches up to 20,000 HDFS file 
> handles by default. Recent tests on erasure coded files show that the open 
> file handles can consume a large amount of memory when not in use.
> For example, here is output from Impala's JMX endpoint when 608 file handles 
> are cached
> {noformat}
> {
> "name": "java.nio:type=BufferPool,name=direct",
> "modelerType": "sun.management.ManagementFactoryHelper$1",
> "Name": "direct",
> "TotalCapacity": 1921048960,
> "MemoryUsed": 1921048961,
> "Count": 633,
> "ObjectName": "java.nio:type=BufferPool,name=direct"
> },{noformat}
> This shows direct buffer memory usage of 3MB per DFSStripedInputStream. 
> Attached is output from Eclipse MAT showing that the direct buffers come from 
> DFSStripedInputStream objects. Both Impala and HBase call unbuffer() when a 
> file handle is being cached and potentially unused for significant chunks of 
> time, yet this shows that the memory remains in use.
> To support caching file handles on erasure coded files, DFSStripedInputStream 
> should avoid holding buffers after the unbuffer() call. See HDFS-7694. 
> "unbuffer()" is intended to move an input stream to a lower memory state to 
> support these caching use cases. In particular, the curStripeBuf seems to be 
> allocated from the BUFFER_POOL on a resetCurStripeBuffer(true) call. It is 
> not freed until close().



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14308) DFSStripedInputStream curStripeBuf is not freed by unbuffer()

2019-02-26 Thread Joe McDonnell (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell updated HDFS-14308:
-
Description: 
Some users of HDFS cache opened HDFS file handles to avoid repeated roundtrips 
to the NameNode. For example, Impala caches up to 20,000 HDFS file handles by 
default. Recent tests on erasure coded files show that the open file handles 
can consume a large amount of memory when not in use.

For example, here is the output from Impala's JMX endpoint when 608 file
handles are cached:
{noformat}
{
"name": "java.nio:type=BufferPool,name=direct",
"modelerType": "sun.management.ManagementFactoryHelper$1",
"Name": "direct",
"TotalCapacity": 1921048960,
"MemoryUsed": 1921048961,
"Count": 633,
"ObjectName": "java.nio:type=BufferPool,name=direct"
},
{noformat}
This shows direct buffer memory usage of roughly 3 MB per DFSStripedInputStream.
Attached is output from Eclipse MAT showing that the direct buffers come from
DFSStripedInputStream objects. Both Impala and HBase call unbuffer() when a
file handle is cached and may sit unused for long periods, yet the memory
remains in use.
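
The per-stream figure can be checked with a little arithmetic. The sketch below is only a back-of-the-envelope estimate: it assumes that all of the reported direct memory belongs to the 608 cached handles (the Count of 633 suggests a few buffers have other owners) and that per-handle usage stays constant when projecting to the 20,000-handle default. The class and method names are made up for illustration.

```java
/** Back-of-the-envelope check of the JMX figures quoted above. */
final class DirectBufferEstimate {
  // Values copied from the BufferPool "direct" bean in the report.
  static final long TOTAL_CAPACITY_BYTES = 1_921_048_960L;
  static final int CACHED_HANDLES = 608;

  /** Average direct memory per handle, assuming the streams own all of it. */
  static long bytesPerHandle(long totalBytes, int handles) {
    return totalBytes / handles;
  }

  /** Projection to a larger cache, assuming per-handle usage stays constant. */
  static long projectedBytes(long bytesPerHandle, int handles) {
    return bytesPerHandle * handles;
  }

  public static void main(String[] args) {
    long perHandle = bytesPerHandle(TOTAL_CAPACITY_BYTES, CACHED_HANDLES);
    System.out.printf("per handle: %d bytes (%.2f MiB)%n",
        perHandle, perHandle / (1024.0 * 1024.0));

    long atDefault = projectedBytes(perHandle, 20_000);
    System.out.printf("20,000 handles: %.1f GiB%n",
        atDefault / (1024.0 * 1024.0 * 1024.0));
  }
}
```

Under these assumptions the average is 3,159,620 bytes per handle, and a full cache of 20,000 such handles would hold roughly 59 GiB of direct memory.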

To support caching file handles on erasure coded files, DFSStripedInputStream 
should avoid holding buffers after the unbuffer() call. See HDFS-7694. 
"unbuffer()" is intended to move an input stream to a lower memory state to 
support these caching use cases. In particular, the curStripeBuf seems to be 
allocated from the BUFFER_POOL on a resetCurStripeBuffer(true) call. It is not 
freed until close().
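
The requested behaviour can be illustrated with a minimal sketch. This is not the Hadoop code and not a proposed patch: SketchBufferPool and StripedStreamSketch are hypothetical stand-ins, the pool is passed in rather than held in a static BUFFER_POOL so the example stays self-contained, and thread safety of the stream is ignored. Only the names curStripeBuf, resetCurStripeBuffer, unbuffer and close are borrowed from the description above.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical fixed-size pool standing in for the real BUFFER_POOL. */
final class SketchBufferPool {
  private final Deque<ByteBuffer> free = new ArrayDeque<>();
  private final int bufferSize;

  SketchBufferPool(int bufferSize) {
    this.bufferSize = bufferSize;
  }

  /** Reuses an idle buffer if one exists, otherwise allocates a direct one. */
  synchronized ByteBuffer getBuffer() {
    ByteBuffer buf = free.poll();
    return buf != null ? buf : ByteBuffer.allocateDirect(bufferSize);
  }

  synchronized void putBuffer(ByteBuffer buf) {
    buf.clear();
    free.push(buf);
  }

  synchronized int idleBuffers() {
    return free.size();
  }
}

/** Hypothetical stream; only the buffer lifecycle is modelled. */
final class StripedStreamSketch implements AutoCloseable {
  private final SketchBufferPool pool;
  private ByteBuffer curStripeBuf;

  StripedStreamSketch(SketchBufferPool pool) {
    this.pool = pool;
  }

  /** Lazily takes a buffer from the pool when allocation is requested. */
  void resetCurStripeBuffer(boolean shouldAllocateBuf) {
    if (shouldAllocateBuf && curStripeBuf == null) {
      curStripeBuf = pool.getBuffer();
    }
    if (curStripeBuf != null) {
      curStripeBuf.clear();
    }
  }

  /** Drops to a low-memory state; the stream stays usable afterwards. */
  void unbuffer() {
    releaseStripeBuffer();
  }

  @Override
  public void close() {
    releaseStripeBuffer();
  }

  boolean holdsBuffer() {
    return curStripeBuf != null;
  }

  /** Returns the buffer to the pool so an idle stream pins no direct memory. */
  private void releaseStripeBuffer() {
    if (curStripeBuf != null) {
      pool.putBuffer(curStripeBuf);
      curStripeBuf = null;
    }
  }
}
```

In this sketch a cached but idle stream holds no buffer after unbuffer(), and the next resetCurStripeBuffer(true) simply takes one from the pool again, so releasing early costs at most one pool lookup on the next read.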

  was:
Some users of HDFS cache opened HDFS file handles to avoid repeated roundtrips 
to the NameNode. For example, Impala caches up to 20,000 HDFS file handles by 
default. Recent tests on erasure coded files show that the open file handles 
can consume a large amount of memory when not in use.

For example, here is output from Impala's JMX endpoint when 608 file handles 
are cached
{noformat}
{
"name": "java.nio:type=BufferPool,name=direct",
"modelerType": "sun.management.ManagementFactoryHelper$1",
"Name": "direct",
"TotalCapacity": 1921048960,
"MemoryUsed": 1921048961,
"Count": 633,
"ObjectName": "java.nio:type=BufferPool,name=direct"
},{noformat}
This shows direct buffer memory usage of 3MB per DFSStripedInputStream. 
Attached is output from Eclipse MAT showing that the direct buffers come from 
DFSStripedInputStream objects.

To support caching file handles on erasure coded files, DFSStripedInputStream 
should implement the unbuffer() call. See HDFS-7694. "unbuffer()" is intended 
to move an input stream to a lower memory state to support these caching use 
cases. Both Impala and HBase call unbuffer() when a file handle is being cached 
and potentially unused for significant chunks of time.


> DFSStripedInputStream curStripeBuf is not freed by unbuffer()
> -------------------------------------------------------------
>
>                 Key: HDFS-14308
>                 URL: https://issues.apache.org/jira/browse/HDFS-14308
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 3.0.0
>            Reporter: Joe McDonnell
>            Priority: Major
>         Attachments: ec_heap_dump.png
>
>
> Some users of HDFS cache opened HDFS file handles to avoid repeated 
> roundtrips to the NameNode. For example, Impala caches up to 20,000 HDFS file 
> handles by default. Recent tests on erasure coded files show that the open 
> file handles can consume a large amount of memory when not in use.
> For example, here is output from Impala's JMX endpoint when 608 file handles 
> are cached
> {noformat}
> {
> "name": "java.nio:type=BufferPool,name=direct",
> "modelerType": "sun.management.ManagementFactoryHelper$1",
> "Name": "direct",
> "TotalCapacity": 1921048960,
> "MemoryUsed": 1921048961,
> "Count": 633,
> "ObjectName": "java.nio:type=BufferPool,name=direct"
> },{noformat}
> This shows direct buffer memory usage of 3MB per DFSStripedInputStream. 
> Attached is output from Eclipse MAT showing that the direct buffers come from 
> DFSStripedInputStream objects. Both Impala and HBase call unbuffer() when a 
> file handle is being cached and potentially unused for significant chunks of 
> time, yet this shows that the memory remains in use.
> To support caching file handles on erasure coded files, DFSStripedInputStream 
> should avoid holding buffers after the unbuffer() call. See HDFS-7694. 
> "unbuffer()" is intended to move an input stream to a lower memory state to 
> support these caching use cases. In particular, the curStripeBuf seems to be 
> allocated from the BUFFER_POOL on a resetCurStripeBuffer(true) call. It is 
> not freed until close().




[jira] [Updated] (HDFS-14308) DFSStripedInputStream curStripeBuf is not freed by unbuffer()

2019-02-26 Thread Joe McDonnell (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell updated HDFS-14308:
-
Summary: DFSStripedInputStream curStripeBuf is not freed by unbuffer()  
(was: DFSStripedInputStream should implement unbuffer())

> DFSStripedInputStream curStripeBuf is not freed by unbuffer()
> -------------------------------------------------------------
>
>                 Key: HDFS-14308
>                 URL: https://issues.apache.org/jira/browse/HDFS-14308
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 3.0.0
>            Reporter: Joe McDonnell
>            Priority: Major
>         Attachments: ec_heap_dump.png
>
>
> Some users of HDFS cache opened HDFS file handles to avoid repeated 
> roundtrips to the NameNode. For example, Impala caches up to 20,000 HDFS file 
> handles by default. Recent tests on erasure coded files show that the open 
> file handles can consume a large amount of memory when not in use.
> For example, here is output from Impala's JMX endpoint when 608 file handles 
> are cached
> {noformat}
> {
> "name": "java.nio:type=BufferPool,name=direct",
> "modelerType": "sun.management.ManagementFactoryHelper$1",
> "Name": "direct",
> "TotalCapacity": 1921048960,
> "MemoryUsed": 1921048961,
> "Count": 633,
> "ObjectName": "java.nio:type=BufferPool,name=direct"
> },{noformat}
> This shows direct buffer memory usage of 3MB per DFSStripedInputStream. 
> Attached is output from Eclipse MAT showing that the direct buffers come from 
> DFSStripedInputStream objects.
> To support caching file handles on erasure coded files, DFSStripedInputStream 
> should implement the unbuffer() call. See HDFS-7694. "unbuffer()" is intended 
> to move an input stream to a lower memory state to support these caching use 
> cases. Both Impala and HBase call unbuffer() when a file handle is being 
> cached and potentially unused for significant chunks of time.


