[jira] [Updated] (HDFS-6912) SharedFileDescriptorFactory should not allocate sparse files

2014-09-15 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-6912:
---
  Resolution: Fixed
   Fix Version/s: 2.6.0
Target Version/s: 2.6.0
  Status: Resolved  (was: Patch Available)

 SharedFileDescriptorFactory should not allocate sparse files
 

 Key: HDFS-6912
 URL: https://issues.apache.org/jira/browse/HDFS-6912
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: caching
Affects Versions: 2.5.0
 Environment: HDFS DataNode, with 8 GB tmpfs in /dev/shm
Reporter: Gopal V
Assignee: Colin Patrick McCabe
Priority: Minor
 Fix For: 2.6.0

 Attachments: HDFS-6912.001.patch, HDFS-6912.002.patch, 
 HDFS-6912.003.patch


 SharedFileDescriptorFactory should not allocate sparse files.  Sparse files 
 can lead to a SIGBUS later in the short-circuit reader, when we try to read 
 from the sparse file and no memory is available to back it.
 Note that if swap is enabled, we can still get a SIGBUS even with a 
 non-sparse file, since the JVM passes MAP_NORESERVE to mmap.
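
 The sketch below is not the committed patch; it is a minimal illustration in 
 Java, with hypothetical class, method, and path names.  The idea: instead of 
 only extending the file to its final length (which leaves a hole), write real 
 zero bytes so every page of the segment is backed before it is ever mmapped, 
 and an out-of-space condition surfaces as an ordinary IOException at create 
 time rather than a SIGBUS at read time.
 {code}
 // Minimal sketch of the idea (not the actual HDFS-6912 patch; names and the
 // segment size are hypothetical).  Every byte of the shared-memory file is
 // written up front, so no page is left as a hole that a later mmap read
 // would have to fault in.
 import java.io.IOException;
 import java.io.RandomAccessFile;
 import java.nio.ByteBuffer;
 import java.nio.channels.FileChannel;

 public class NonSparseAllocSketch {
   /** Create a file of the given length whose pages are actually backed. */
   static void allocateFilled(String path, int length) throws IOException {
     try (RandomAccessFile raf = new RandomAccessFile(path, "rw");
          FileChannel ch = raf.getChannel()) {
       ByteBuffer zeros = ByteBuffer.allocate(8192);  // reusable buffer of zeros
       int remaining = length;
       while (remaining > 0) {
         zeros.clear();
         zeros.limit(Math.min(zeros.capacity(), remaining));
         // If tmpfs is out of space we fail here with an IOException,
         // instead of crashing later with SIGBUS inside the reader.
         remaining -= ch.write(zeros);
       }
     }
   }

   public static void main(String[] args) throws IOException {
     allocateFilled("/dev/shm/hdfs_shm_sketch", 8192 * 4);
   }
 }
 {code}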





[jira] [Updated] (HDFS-6912) SharedFileDescriptorFactory should not allocate sparse files

2014-08-26 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-6912:
---

Summary: SharedFileDescriptorFactory should not allocate sparse files  
(was: HDFS Short-circuit read implementation throws SIGBUS from misc.Unsafe 
usage)

 SharedFileDescriptorFactory should not allocate sparse files
 

 Key: HDFS-6912
 URL: https://issues.apache.org/jira/browse/HDFS-6912
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: caching
Affects Versions: 2.5.0
 Environment: HDFS DataNode, with 8 GB tmpfs in /dev/shm
Reporter: Gopal V
Assignee: Colin Patrick McCabe
 Attachments: HDFS-6912.001.patch


 The short-circuit reader throws SIGBUS errors from Unsafe code and crashes 
 the JVM when the tmpfs backing /dev/shm is depleted.
 {code}
 ---  T H R E A D  ---
 Current thread (0x7eff387df800):  JavaThread xxx daemon [_thread_in_vm, 
 id=5880, stack(0x7eff28b93000,0x7eff28c94000)]
 siginfo:si_signo=SIGBUS: si_errno=0, si_code=2 (BUS_ADRERR), 
 si_addr=0x7eff3e51d000
 {code}
 The entire backtrace of the JVM crash is
 {code}
 Stack: [0x7eff28b93000,0x7eff28c94000],  sp=0x7eff28c90a10,  free 
 space=1014k
 Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native 
 code)
 V  [libjvm.so+0x88232c]  Unsafe_GetLongVolatile+0x6c
 j  sun.misc.Unsafe.getLongVolatile(Ljava/lang/Object;J)J+0
 j  org.apache.hadoop.hdfs.ShortCircuitShm$Slot.setFlag(J)V+8
 j  org.apache.hadoop.hdfs.ShortCircuitShm$Slot.makeValid()V+4
 j  
 org.apache.hadoop.hdfs.ShortCircuitShm.allocAndRegisterSlot(Lorg/apache/hadoop/hdfs/ExtendedBlockId;)Lorg/apache/hadoop/hdfs/ShortCircuitShm$Slot;+70
 j  
 org.apache.hadoop.hdfs.client.DfsClientShmManager$EndpointShmManager.allocSlotFromExistingShm(Lorg/apache/hadoop/hdfs/ExtendedBlockId;)Lorg/apache/hadoop/hdfs/ShortCircuitShm$Slot;+38
 j  
 org.apache.hadoop.hdfs.client.DfsClientShmManager$EndpointShmManager.allocSlot(Lorg/apache/hadoop/hdfs/net/DomainPeer;Lorg/apache/commons/lang/mutable/MutableBoolean;Ljava/lang/String;Lorg/apache/hadoop/hdfs/ExtendedBlockId;)Lorg/apache/hadoop/hdfs/ShortCircuitShm$Slot;+100
 j  
 org.apache.hadoop.hdfs.client.DfsClientShmManager.allocSlot(Lorg/apache/hadoop/hdfs/protocol/DatanodeInfo;Lorg/apache/hadoop/hdfs/net/DomainPeer;Lorg/apache/commons/lang/mutable/MutableBoolean;Lorg/apache/hadoop/hdfs/ExtendedBlockId;Ljava/lang/String;)Lorg/apache/hadoop/hdfs/ShortCircuitShm$Slot;+102
 j  
 org.apache.hadoop.hdfs.client.ShortCircuitCache.allocShmSlot(Lorg/apache/hadoop/hdfs/protocol/DatanodeInfo;Lorg/apache/hadoop/hdfs/net/DomainPeer;Lorg/apache/commons/lang/mutable/MutableBoolean;Lorg/apache/hadoop/hdfs/ExtendedBlockId;Ljava/lang/String;)Lorg/apache/hadoop/hdfs/ShortCircuitShm$Slot;+18
 j  
 org.apache.hadoop.hdfs.BlockReaderFactory.createShortCircuitReplicaInfo()Lorg/apache/hadoop/hdfs/client/ShortCircuitReplicaInfo;+151
 j  
 org.apache.hadoop.hdfs.client.ShortCircuitCache.create(Lorg/apache/hadoop/hdfs/ExtendedBlockId;Lorg/apache/hadoop/hdfs/client/ShortCircuitCache$ShortCircuitReplicaCreator;Lorg/apache/hadoop/util/Waitable;)Lorg/apache/hadoop/hdfs/client/ShortCircuitReplicaInfo;+46
 j  
 org.apache.hadoop.hdfs.client.ShortCircuitCache.fetchOrCreate(Lorg/apache/hadoop/hdfs/ExtendedBlockId;Lorg/apache/hadoop/hdfs/client/ShortCircuitCache$ShortCircuitReplicaCreator;)Lorg/apache/hadoop/hdfs/client/ShortCircuitReplicaInfo;+230
 j  
 org.apache.hadoop.hdfs.BlockReaderFactory.getBlockReaderLocal()Lorg/apache/hadoop/hdfs/BlockReader;+175
 j  
 org.apache.hadoop.hdfs.BlockReaderFactory.build()Lorg/apache/hadoop/hdfs/BlockReader;+87
 j  
 org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(J)Lorg/apache/hadoop/hdfs/protocol/DatanodeInfo;+291
 j  
 org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(Lorg/apache/hadoop/hdfs/DFSInputStream$ReaderStrategy;II)I+83
 j  org.apache.hadoop.hdfs.DFSInputStream.read([BII)I+15
 {code}
 This can easily be reproduced by starting the DataNode, filling up tmpfs (dd 
 if=/dev/zero bs=1M of=/dev/shm/dummy.zero), and running a simple task.
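
 For illustration only (this sketch is not part of the JIRA), the same failure 
 mode can be seen with a sparse file created by setLength() alone: the mmap 
 itself succeeds, but the first touch of an unbacked page forces tmpfs to 
 allocate it, and with no space left the page fault comes back as SIGBUS.  In 
 the crash above that fault arrived through sun.misc.Unsafe and took down the 
 whole JVM; the exact behaviour of the sketch depends on the JVM version.
 {code}
 // Hypothetical demo (file name and size are made up): allocate a sparse file
 // in /dev/shm the way a bare ftruncate()/setLength() would, then map and
 // touch it.  With tmpfs already full (see the dd command above), the first
 // write to an unbacked page cannot be satisfied and faults with SIGBUS.
 import java.io.RandomAccessFile;
 import java.nio.MappedByteBuffer;
 import java.nio.channels.FileChannel;

 public class SparseShmDemo {
   public static void main(String[] args) throws Exception {
     final int len = 8 * 1024 * 1024;
     try (RandomAccessFile raf =
              new RandomAccessFile("/dev/shm/sparse_demo", "rw");
          FileChannel ch = raf.getChannel()) {
       raf.setLength(len);   // extends the file without backing it (a hole)
       MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_WRITE, 0, len);
       map.put(0, (byte) 1); // first touch must allocate a tmpfs page
     }
   }
 }
 {code}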





[jira] [Updated] (HDFS-6912) SharedFileDescriptorFactory should not allocate sparse files

2014-08-26 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-6912:
---

Description: 
SharedFileDescriptorFactory should not allocate sparse files.  Sparse files 
can lead to a SIGBUS later in the short-circuit reader, when we try to read 
from the sparse file and no memory is available to back it.

Note that if swap is enabled, we can still get a SIGBUS even with a non-sparse 
file, since the JVM passes MAP_NORESERVE to mmap.

  was:
The short-circuit reader throws SIGBUS errors from Unsafe code and crashes the 
JVM when the tmpfs backing /dev/shm is depleted.  (The full crash report and 
backtrace are quoted verbatim in the summary-change notification above.)

   Priority: Minor  (was: Major)



[jira] [Updated] (HDFS-6912) SharedFileDescriptorFactory should not allocate sparse files

2014-08-26 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-6912:
---

Attachment: HDFS-6912.002.patch



[jira] [Updated] (HDFS-6912) SharedFileDescriptorFactory should not allocate sparse files

2014-08-26 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-6912:
---

Attachment: HDFS-6912.003.patch

The unit test was relying on the file position being 0.  I don't think anything 
else relies on this (we use mmap to access the file), but in v3 of the patch I 
made it restore the file position to 0, just for simplicity.
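
As a hedged illustration of that detail (hypothetical names, not the patch 
itself): filling the file with zeros leaves the descriptor's offset at 
end-of-file, so anything that reads the fd sequentially, like the unit test, 
would start at EOF.  Seeking back to 0 after preallocation preserves the old 
contract.
{code}
// Sketch only: preallocate by writing zeros, then restore the position the
// caller expects.  Path, size, and names are illustrative.
import java.io.IOException;
import java.io.RandomAccessFile;

public class RestorePositionSketch {
  static RandomAccessFile createFilled(String path, int length) throws IOException {
    RandomAccessFile raf = new RandomAccessFile(path, "rw");
    byte[] zeros = new byte[8192];
    for (int written = 0; written < length; written += zeros.length) {
      raf.write(zeros, 0, Math.min(zeros.length, length - written));
    }
    raf.seek(0);  // put the file position back at 0, as consumers of the fd expect
    return raf;
  }

  public static void main(String[] args) throws IOException {
    try (RandomAccessFile raf = createFilled("/dev/shm/position_sketch", 8192)) {
      System.out.println("position after preallocation: " + raf.getFilePointer());
    }
  }
}
{code}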
