[jira] [Commented] (HDFS-9806) Allow HDFS block replicas to be provided by an external storage system
[ https://issues.apache.org/jira/browse/HDFS-9806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15958353#comment-15958353 ]

Thomas Demoor commented on HDFS-9806:
-------------------------------------

We will post an updated design doc next week. Quick status update:
* General infrastructure, protocol changes and read path are almost done
* Write path and dynamic mounting are ongoing

> Allow HDFS block replicas to be provided by an external storage system
> ----------------------------------------------------------------------
>
> Key: HDFS-9806
> URL: https://issues.apache.org/jira/browse/HDFS-9806
> Project: Hadoop HDFS
> Issue Type: New Feature
> Reporter: Chris Douglas
> Attachments: HDFS-9806-design.001.pdf
>
> In addition to heterogeneous media, many applications work with heterogeneous
> storage systems. The guarantees and semantics provided by these systems are
> often similar, but not identical to those of
> [HDFS|https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/filesystem/index.html].
> Any client accessing multiple storage systems is responsible for reasoning
> about each system independently, and must propagate and renew credentials for
> each store.
> Remote stores could be mounted under HDFS. Block locations could be mapped to
> immutable file regions, opaque IDs, or other tokens that represent a
> consistent view of the data. While correctness for arbitrary operations
> requires careful coordination between stores, in practice we can provide
> workable semantics with weaker guarantees.

--
This message was sent by Atlassian JIRA (v6.3.15#6346)

To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11026) Convert BlockTokenIdentifier to use Protobuf
[ https://issues.apache.org/jira/browse/HDFS-11026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15632918#comment-15632918 ]

Thomas Demoor commented on HDFS-11026:
--------------------------------------

Ewan's stacktraces match [~andrew.wang]'s remarks.

[~daryn], once HDFS-11096 gets resolved we expect the current patch to work across 2.x and 3.0. Thanks for looking at our patch.

> Convert BlockTokenIdentifier to use Protobuf
> --------------------------------------------
>
> Key: HDFS-11026
> URL: https://issues.apache.org/jira/browse/HDFS-11026
> Project: Hadoop HDFS
> Issue Type: Task
> Components: hdfs, hdfs-client
> Affects Versions: 2.9.0, 3.0.0-alpha1
> Reporter: Ewan Higgs
> Fix For: 3.0.0-alpha2
> Attachments: HDFS-11026.002.patch, blocktokenidentifier-protobuf.patch
>
> {{BlockTokenIdentifier}} currently uses a {{DataInput}}/{{DataOutput}}
> (basically a {{byte[]}}) and manual serialization to get data into and out of
> the encrypted buffer (in {{BlockKeyProto}}). Other TokenIdentifiers (e.g.
> {{ContainerTokenIdentifier}}, {{AMRMTokenIdentifier}}) use Protobuf. The
> {{BlockTokenIdentifier}} should use Protobuf as well so it can be expanded
> more easily and will be consistent with the rest of the system.
> NB: Release of this will require a version update since 2.8.x won't be able
> to decipher {{BlockKeyProto.keyBytes}} from 2.8.y.
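The description above argues that manual, position-based serialization is harder to expand than a tagged format such as Protobuf. The following is a minimal Python sketch of that difference only. The field names, the tag numbers and the tag-length-value layout are assumptions made for illustration; this is neither the real {{BlockTokenIdentifier}} layout nor the actual Protobuf wire encoding.

```python
import struct

# Hypothetical token fields; NOT the real BlockTokenIdentifier layout.
POSITIONAL_HEADER = ">qiH"  # expiry (int64), key id (int32), user length (uint16)


def write_positional(expiry, key_id, user):
    """Manual serialization: field order and sizes are fixed by convention."""
    data = user.encode("utf-8")
    return struct.pack(POSITIONAL_HEADER, expiry, key_id, len(data)) + data


def read_positional(buf):
    """A reader must know the exact layout; extra or reordered fields break it."""
    expiry, key_id, n = struct.unpack_from(POSITIONAL_HEADER, buf, 0)
    off = struct.calcsize(POSITIONAL_HEADER)
    return {"expiry": expiry, "key_id": key_id,
            "user": buf[off:off + n].decode("utf-8")}


def write_tagged(fields):
    """Tagged serialization: each field is (tag: uint8, length: uint16, value)."""
    out = b""
    for tag, value in sorted(fields.items()):
        out += struct.pack(">BH", tag, len(value)) + value
    return out


def read_tagged(buf, known_tags):
    """An older reader keeps the tags it knows and skips the rest."""
    result = {}
    off = 0
    while off < len(buf):
        tag, n = struct.unpack_from(">BH", buf, off)
        off += struct.calcsize(">BH")
        if tag in known_tags:
            result[tag] = buf[off:off + n]
        off += n  # unknown fields are skipped by their declared length
    return result
```

In this sketch a writer can add a new tag without breaking a reader that only knows the older tags, which is the kind of extensibility the issue is after.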
[jira] [Commented] (HDFS-7343) A comprehensive and flexible storage policy engine
[ https://issues.apache.org/jira/browse/HDFS-7343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15393680#comment-15393680 ]

Thomas Demoor commented on HDFS-7343:
-------------------------------------

Seems to me this (partly) overlaps with HDFS-10285, which already has a design doc. [~drankye], as you've been active in both tickets, do you think these should be linked up? And what part of it is exclusive to the current ticket?

> A comprehensive and flexible storage policy engine
> --------------------------------------------------
>
> Key: HDFS-7343
> URL: https://issues.apache.org/jira/browse/HDFS-7343
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: namenode
> Reporter: Kai Zheng
> Assignee: Kai Zheng
>
> As discussed in HDFS-7285, it would be better to have a comprehensive and
> flexible storage policy engine considering file attributes, metadata, data
> temperature, storage type, EC codec, available hardware capabilities,
> user/application preference, etc.
[jira] [Commented] (HDFS-9806) Allow HDFS block replicas to be provided by an external storage system
[ https://issues.apache.org/jira/browse/HDFS-9806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15292968#comment-15292968 ]

Thomas Demoor commented on HDFS-9806:
-------------------------------------

Thanks [~chris.douglas] for the architecture doc. Very interesting feature.

First, the way we interpreted the document, the external (provided) storage is the source of truth, so any changes there should be updated in HDFS and any inconsistencies that arise would favour the external store.

With that in mind, we had some questions, mostly relating to the following two paragraphs in section 3.4:

{quote}
Periodically, and/or when a particular directory or file is accessed on the Namenode, the Namenode queries the PROVIDED store to validate its cache. If the ID changed since its last update, the Namenode updates the corresponding metadata and block information.

The Datanode is also responsible for verifying the nonce when servicing read requests. Without this check, it may return data that does not match the record in the Namenode (e.g., if another file is renamed onto the same path in the external store).
{quote}

Questions:
# If the Namenode is accessing the PROVIDED storage to update its mapping, shouldn’t it also update the nonce data at the same time and instruct the Datanode to refresh too? Or is the intention for the Namenode to only update the directory information and not the actual nonce data for the files? (If so, how could the Namenode apply heuristics to detect “promoting output to a parent directory”?)
# How should this work in the face of Storage Policies? For example, if we have a StoragePolicy of {SSD, DISK, PROVIDED}, it seems to us that it would make sense for the Namenode to use a HEAD request (or equivalent) to see if the data is still valid. If so, tell the client to talk to the Datanode with the file on SSD. Otherwise, the data needs to be refreshed across all three Datanodes. As the Namenode currently manages replication requests, it seems that it would make sense for it to trigger requests to refresh the data from the PROVIDED storage system.
# When you say “Periodically, and/or when a particular directory or file is accessed on the Namenode”, do you mean this is something to be configured, or just that it hasn’t been decided if both are required? We think periodic validation is required, since this is the only way to clean up directory listings with files that have been removed from the PROVIDED storage. On access, it makes sense to always make a HEAD request (or equivalent) to make sure the data isn’t stale.
# Finally, do you anticipate changes to the wire protocol between the Namenode and Datanode?

> Allow HDFS block replicas to be provided by an external storage system
> ----------------------------------------------------------------------
>
> Key: HDFS-9806
> URL: https://issues.apache.org/jira/browse/HDFS-9806
> Project: Hadoop HDFS
> Issue Type: New Feature
> Reporter: Chris Douglas
> Attachments: HDFS-9806-design.001.pdf
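The validation step discussed in the comment above (compare a cached nonce against what the external store currently reports, then refresh or drop the entry) can be sketched as follows. This is a toy Python model under stated assumptions, not HDFS code: {{ExternalStore}}, its {{head()}} method, {{CachedFile}} and {{validate()}} are hypothetical names, and {{head()}} merely stands in for "a HEAD request (or equivalent)".

```python
from dataclasses import dataclass


@dataclass
class CachedFile:
    """What the (hypothetical) Namenode-side cache remembers about one file."""
    path: str
    nonce: str   # opaque ID recorded at the last refresh
    length: int


class ExternalStore:
    """Stand-in for the PROVIDED store; head() plays the role of a HEAD request."""

    def __init__(self):
        self._objects = {}

    def put(self, path, data, nonce):
        self._objects[path] = (data, nonce)

    def delete(self, path):
        self._objects.pop(path, None)

    def head(self, path):
        """Return (nonce, length) without reading the data, or None if missing."""
        entry = self._objects.get(path)
        return None if entry is None else (entry[1], len(entry[0]))


def validate(cache, store, path):
    """Compare nonces and return 'valid', 'refreshed' or 'removed'."""
    meta = store.head(path)
    if meta is None:
        cache.pop(path, None)      # file vanished from the external store
        return "removed"
    nonce, length = meta
    cached = cache.get(path)
    if cached is not None and cached.nonce == nonce:
        return "valid"             # cached replicas may still be served
    cache[path] = CachedFile(path, nonce, length)
    return "refreshed"             # metadata updated; replicas need a refresh
```

Whether such a check runs periodically, on access, or both is exactly the open question raised in the comment; the sketch takes no position on it.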
[jira] [Commented] (HDFS-7240) Object store in HDFS
[ https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14614705#comment-14614705 ]

Thomas Demoor commented on HDFS-7240:
-------------------------------------

[~john.jian.fang] and [~jnp]:
* Avoiding rename happens in [HADOOP-9565] by introducing ObjectStore (extends FileSystem) and letting FileOutputCommitter, Hadoop CLI, ... act on this (by avoiding rename). Ozone could easily extend ObjectStore and benefit from this.
* [HADOOP-11262] extends DelegateToFileSystem to implement s3a as an AbstractFileSystem and works around issues such as modification times for directories (cf. Azure).

> Object store in HDFS
> --------------------
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
> Issue Type: New Feature
> Reporter: Jitendra Nath Pandey
> Assignee: Jitendra Nath Pandey
> Attachments: Ozone-architecture-v1.pdf
>
> This jira proposes to add object store capabilities into HDFS.
> As part of the federation work (HDFS-1052) we separated block storage as a
> generic storage layer. Using the Block Pool abstraction, new kinds of
> namespaces can be built on top of the storage layer, i.e. datanodes.
> In this jira I will explore building an object store using the datanode
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.
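The first bullet in the comment above describes callers that behave differently once they know they are talking to an object store. A rough Python sketch of that idea follows. Everything in it is hypothetical: {{RenameFS}}, {{ToyObjectStore}}, the {{supports_atomic_rename}} flag and {{commit_output()}} are illustrative names, not the API proposed in HADOOP-9565 and not the behaviour of the real FileOutputCommitter.

```python
class RenameFS:
    """Toy file system where rename is a cheap metadata operation."""
    supports_atomic_rename = True

    def __init__(self):
        self.files = {}
        self.ops = []   # record of operations, to make the cost visible

    def write(self, path, data):
        self.files[path] = data
        self.ops.append(("write", path))

    def rename(self, src, dst):
        self.files[dst] = self.files.pop(src)
        self.ops.append(("rename", src, dst))


class ToyObjectStore(RenameFS):
    """Toy object store: rename degrades to copy plus delete."""
    supports_atomic_rename = False

    def rename(self, src, dst):
        self.write(dst, self.files[src])   # full copy of the data
        del self.files[src]
        self.ops.append(("delete", src))


def commit_output(fs, final_path, data):
    """Write via a temporary path only where rename is cheap."""
    if fs.supports_atomic_rename:
        tmp = final_path + "._temporary"
        fs.write(tmp, data)
        fs.rename(tmp, final_path)
    else:
        fs.write(final_path, data)         # skip the rename entirely
```

The sketch only shows the branching on store type; a direct write gives up the isolation that a temporary path provides, so a real committer would need another way to handle failed or duplicate task attempts.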
[jira] [Commented] (HDFS-7240) Object store in HDFS
[ https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573635#comment-14573635 ]

Thomas Demoor commented on HDFS-7240:
-------------------------------------

Very interesting call yesterday. Might be interesting to have a group discussion at Hadoop Summit next week?

> Object store in HDFS
> --------------------
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
> Issue Type: New Feature
> Reporter: Jitendra Nath Pandey
> Assignee: Jitendra Nath Pandey
> Attachments: Ozone-architecture-v1.pdf
[jira] [Commented] (HDFS-7240) Object store in HDFS
[ https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14567258#comment-14567258 ]

Thomas Demoor commented on HDFS-7240:
-------------------------------------

Maybe some of the (ongoing) work for currently supported object stores can be reused here (e.g. HADOOP-9565)? Will probably call in.

> Object store in HDFS
> --------------------
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
> Issue Type: New Feature
> Reporter: Jitendra Nath Pandey
> Assignee: Jitendra Nath Pandey
> Attachments: Ozone-architecture-v1.pdf