[jira] [Updated] (HDFS-6912) SharedFileDescriptorFactory should not allocate sparse files
[ https://issues.apache.org/jira/browse/HDFS-6912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Colin Patrick McCabe updated HDFS-6912:
---------------------------------------
          Resolution: Fixed
       Fix Version/s: 2.6.0
    Target Version/s: 2.6.0
              Status: Resolved  (was: Patch Available)

> SharedFileDescriptorFactory should not allocate sparse files
> ------------------------------------------------------------
>
>                 Key: HDFS-6912
>                 URL: https://issues.apache.org/jira/browse/HDFS-6912
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: caching
>    Affects Versions: 2.5.0
>         Environment: HDFS DataNode, with 8 GB tmpfs in /dev/shm
>            Reporter: Gopal V
>            Assignee: Colin Patrick McCabe
>            Priority: Minor
>             Fix For: 2.6.0
>
>         Attachments: HDFS-6912.001.patch, HDFS-6912.002.patch, HDFS-6912.003.patch
>
> SharedFileDescriptorFactory should not allocate sparse files. Sparse files can lead to a SIGBUS later in the short-circuit reader when we try to read from the sparse file and memory is not available. Note that if swap is enabled, we can still get a SIGBUS even with a non-sparse file, since the JVM uses MAP_NORESERVE in mmap.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (HDFS-6912) SharedFileDescriptorFactory should not allocate sparse files
[ https://issues.apache.org/jira/browse/HDFS-6912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Colin Patrick McCabe updated HDFS-6912:
---------------------------------------
    Summary: SharedFileDescriptorFactory should not allocate sparse files  (was: HDFS Short-circuit read implementation throws SIGBUS from misc.Unsafe usage)

> The short-circuit reader throws SIGBUS errors from Unsafe code and crashes the JVM when tmpfs on a disk is depleted.
>
> {code}
> ---  T H R E A D  ---
> Current thread (0x7eff387df800): JavaThread xxx daemon [_thread_in_vm, id=5880, stack(0x7eff28b93000,0x7eff28c94000)]
> siginfo: si_signo=SIGBUS: si_errno=0, si_code=2 (BUS_ADRERR), si_addr=0x7eff3e51d000
> {code}
>
> The entire backtrace of the JVM crash is:
> {code}
> Stack: [0x7eff28b93000,0x7eff28c94000], sp=0x7eff28c90a10, free space=1014k
> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
> V  [libjvm.so+0x88232c]  Unsafe_GetLongVolatile+0x6c
> j  sun.misc.Unsafe.getLongVolatile(Ljava/lang/Object;J)J+0
> j  org.apache.hadoop.hdfs.ShortCircuitShm$Slot.setFlag(J)V+8
> j  org.apache.hadoop.hdfs.ShortCircuitShm$Slot.makeValid()V+4
> j  org.apache.hadoop.hdfs.ShortCircuitShm.allocAndRegisterSlot(Lorg/apache/hadoop/hdfs/ExtendedBlockId;)Lorg/apache/hadoop/hdfs/ShortCircuitShm$Slot;+70
> j  org.apache.hadoop.hdfs.client.DfsClientShmManager$EndpointShmManager.allocSlotFromExistingShm(Lorg/apache/hadoop/hdfs/ExtendedBlockId;)Lorg/apache/hadoop/hdfs/ShortCircuitShm$Slot;+38
> j  org.apache.hadoop.hdfs.client.DfsClientShmManager$EndpointShmManager.allocSlot(Lorg/apache/hadoop/hdfs/net/DomainPeer;Lorg/apache/commons/lang/mutable/MutableBoolean;Ljava/lang/String;Lorg/apache/hadoop/hdfs/ExtendedBlockId;)Lorg/apache/hadoop/hdfs/ShortCircuitShm$Slot;+100
> j  org.apache.hadoop.hdfs.client.DfsClientShmManager.allocSlot(Lorg/apache/hadoop/hdfs/protocol/DatanodeInfo;Lorg/apache/hadoop/hdfs/net/DomainPeer;Lorg/apache/commons/lang/mutable/MutableBoolean;Lorg/apache/hadoop/hdfs/ExtendedBlockId;Ljava/lang/String;)Lorg/apache/hadoop/hdfs/ShortCircuitShm$Slot;+102
> j  org.apache.hadoop.hdfs.client.ShortCircuitCache.allocShmSlot(Lorg/apache/hadoop/hdfs/protocol/DatanodeInfo;Lorg/apache/hadoop/hdfs/net/DomainPeer;Lorg/apache/commons/lang/mutable/MutableBoolean;Lorg/apache/hadoop/hdfs/ExtendedBlockId;Ljava/lang/String;)Lorg/apache/hadoop/hdfs/ShortCircuitShm$Slot;+18
> j  org.apache.hadoop.hdfs.BlockReaderFactory.createShortCircuitReplicaInfo()Lorg/apache/hadoop/hdfs/client/ShortCircuitReplicaInfo;+151
> j  org.apache.hadoop.hdfs.client.ShortCircuitCache.create(Lorg/apache/hadoop/hdfs/ExtendedBlockId;Lorg/apache/hadoop/hdfs/client/ShortCircuitCache$ShortCircuitReplicaCreator;Lorg/apache/hadoop/util/Waitable;)Lorg/apache/hadoop/hdfs/client/ShortCircuitReplicaInfo;+46
> j  org.apache.hadoop.hdfs.client.ShortCircuitCache.fetchOrCreate(Lorg/apache/hadoop/hdfs/ExtendedBlockId;Lorg/apache/hadoop/hdfs/client/ShortCircuitCache$ShortCircuitReplicaCreator;)Lorg/apache/hadoop/hdfs/client/ShortCircuitReplicaInfo;+230
> j  org.apache.hadoop.hdfs.BlockReaderFactory.getBlockReaderLocal()Lorg/apache/hadoop/hdfs/BlockReader;+175
> j  org.apache.hadoop.hdfs.BlockReaderFactory.build()Lorg/apache/hadoop/hdfs/BlockReader;+87
> j  org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(J)Lorg/apache/hadoop/hdfs/protocol/DatanodeInfo;+291
> j  org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(Lorg/apache/hadoop/hdfs/DFSInputStream$ReaderStrategy;II)I+83
> j  org.apache.hadoop.hdfs.DFSInputStream.read([BII)I+15
> {code}
>
> This can be easily reproduced by starting the DataNode, filling up tmpfs (dd if=/dev/zero bs=1M of=/dev/shm/dummy.zero) and running a simple task.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Updated] (HDFS-6912) SharedFileDescriptorFactory should not allocate sparse files
[ https://issues.apache.org/jira/browse/HDFS-6912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Colin Patrick McCabe updated HDFS-6912:
---------------------------------------
    Description: SharedFileDescriptorFactory should not allocate sparse files. Sparse files can lead to a SIGBUS later in the short-circuit reader when we try to read from the sparse file and memory is not available. Note that if swap is enabled, we can still get a SIGBUS even with a non-sparse file, since the JVM uses MAP_NORESERVE in mmap.
    (was: the original SIGBUS crash report, quoted in full in the previous message)
    Priority: Minor  (was: Major)

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Updated] (HDFS-6912) SharedFileDescriptorFactory should not allocate sparse files
[ https://issues.apache.org/jira/browse/HDFS-6912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Colin Patrick McCabe updated HDFS-6912:
---------------------------------------
    Attachment: HDFS-6912.002.patch

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Updated] (HDFS-6912) SharedFileDescriptorFactory should not allocate sparse files
[ https://issues.apache.org/jira/browse/HDFS-6912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Colin Patrick McCabe updated HDFS-6912:
---------------------------------------
    Attachment: HDFS-6912.003.patch

The unit test was relying on the file position being 0. I don't think anything else relies on this (we access the file through mmap), but in v3 of the patch I made it restore the file position to 0, just for simplicity.

--
This message was sent by Atlassian JIRA
(v6.2#6252)