[jira] [Commented] (HDFS-7240) Object store in HDFS

2018-01-26 Thread Jitendra Nath Pandey (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16341956#comment-16341956
 ] 

Jitendra Nath Pandey commented on HDFS-7240:


I have created HDFS-13074 to track the Ozone KV namespace work separately, as 
suggested by [~owen.omalley].

> Object store in HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
>Priority: Major
> Attachments: HDFS Scalability and Ozone.pdf, HDFS-7240.001.patch, 
> HDFS-7240.002.patch, HDFS-7240.003.patch, HDFS-7240.003.patch, 
> HDFS-7240.004.patch, HDFS-7240.005.patch, HDFS-7240.006.patch, 
> HadoopStorageLayerSecurity.pdf, MeetingMinutes.pdf, 
> Ozone-architecture-v1.pdf, Ozonedesignupdate.pdf, ozone_user_v0.pdf
>
>
> This jira proposes to add object store capabilities into HDFS. 
> As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-7240) Object store in HDFS

2018-01-26 Thread Jitendra Nath Pandey (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16341735#comment-16341735
 ] 

Jitendra Nath Pandey commented on HDFS-7240:


[~owen.omalley], that's a great suggestion. I agree it is better to track the 
key-value namespace work in a separate jira, while this jira focuses on the 
new storage layer.

Hadoop Storage Layer is a good name; I would suggest a slight modification: 
call it Hadoop Distributed Storage Layer (HDSL).

It is important to keep the storage layer separate and independent of the 
namespace implementation, as multiple different namespaces can be built on top 
of HDSL. For example, HDFS-10419 implements a hierarchical namespace/NN on this 
storage layer.

I will create a separate jira for Ozone's KV namespace.

> Object store in HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
>Priority: Major
> Attachments: HDFS Scalability and Ozone.pdf, HDFS-7240.001.patch, 
> HDFS-7240.002.patch, HDFS-7240.003.patch, HDFS-7240.003.patch, 
> HDFS-7240.004.patch, HDFS-7240.005.patch, HDFS-7240.006.patch, 
> HadoopStorageLayerSecurity.pdf, MeetingMinutes.pdf, 
> Ozone-architecture-v1.pdf, Ozonedesignupdate.pdf, ozone_user_v0.pdf
>
>
> This jira proposes to add object store capabilities into HDFS. 
> As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-7240) Object store in HDFS

2018-01-26 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16341429#comment-16341429
 ] 

Owen O'Malley commented on HDFS-7240:
-

I think that the major contribution of this work is pulling out the block 
management layer, and the naming should reflect that.

I'd propose that:
 * Ozone should be the object store.
 * The block layer should have a different name, such as Hadoop Storage Layer 
(HSL).

> Object store in HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
>Priority: Major
> Attachments: HDFS Scalability and Ozone.pdf, HDFS-7240.001.patch, 
> HDFS-7240.002.patch, HDFS-7240.003.patch, HDFS-7240.003.patch, 
> HDFS-7240.004.patch, HDFS-7240.005.patch, HDFS-7240.006.patch, 
> HadoopStorageLayerSecurity.pdf, MeetingMinutes.pdf, 
> Ozone-architecture-v1.pdf, Ozonedesignupdate.pdf, ozone_user_v0.pdf
>
>
> This jira proposes to add object store capabilities into HDFS. 
> As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-7240) Object store in HDFS

2018-01-22 Thread Anu Engineer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16334748#comment-16334748
 ] 

Anu Engineer commented on HDFS-7240:


{quote}I think we should at least have a design doc for security before merging 
in order to avoid API changes.
{quote}
[~shv] I have attached [^HadoopStorageLayerSecurity.pdf], which covers security 
for the block layer and how it can be used with HDFS and Ozone. Please take a 
look when you get a chance. Thanks.

> Object store in HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
>Priority: Major
> Attachments: HDFS Scalability and Ozone.pdf, HDFS-7240.001.patch, 
> HDFS-7240.002.patch, HDFS-7240.003.patch, HDFS-7240.003.patch, 
> HDFS-7240.004.patch, HDFS-7240.005.patch, HDFS-7240.006.patch, 
> HadoopStorageLayerSecurity.pdf, MeetingMinutes.pdf, 
> Ozone-architecture-v1.pdf, Ozonedesignupdate.pdf, ozone_user_v0.pdf
>
>
> This jira proposes to add object store capabilities into HDFS. 
> As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-7240) Object store in HDFS

2017-12-19 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16297454#comment-16297454
 ] 

Sanjay Radia commented on HDFS-7240:


One of the issues raised is that connecting the NN to the new block-container 
layer will be very difficult because removing the FSN/BM lock is challenging.
I have attached a doc [Evolving NN using new block container 
layer|https://issues.apache.org/jira/secure/attachment/12902931/Evolving%20NN%20using%20new%20block-container%20layer.pdf]
 to HDFS-10419 that describes two milestones for connecting the NN to the new 
block-container layer. The first one does *not* require removing the FSN/BM 
lock and still gives close to 2x scalability, because the block map (which 
becomes the container map) shrinks significantly.

I would also like to point out, as stated above and in the doc, that the new 
block-container layer keeps a consistent state using Raft and hence eliminates 
the coupling between the namespace layer and the block layer, and that the 
second milestone of removing the FSN/BM lock is much easier with the new block 
layer. Even if you disagree with my lock argument, the first milestone still 
gets good scalability without removing the lock.
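
To make the consistency argument concrete, here is a minimal, self-contained 
sketch (a hypothetical illustration; it is neither Ozone's actual code nor the 
Ratis API) of how a replicated, totally ordered log lets every datanode replica 
converge on the same finalized block length without a central coordinator: each 
replica applies the committed entries in the same order, so the finalized 
length is decided by the log rather than by the NN.

{code:java}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Toy illustration (hypothetical; not Ozone or Ratis code): block finalization
 * driven by a replicated, totally ordered log. Every replica applies the same
 * committed entries in the same order, so all replicas converge on the same
 * finalized block length without a central NameNode deciding it.
 */
public class ReplicatedFinalizeDemo {

  /** A committed log entry: "block B is finalized at length L". */
  static final class FinalizeBlock {
    final long blockId;
    final long length;
    FinalizeBlock(long blockId, long length) {
      this.blockId = blockId;
      this.length = length;
    }
  }

  /** One datanode's state machine over the replicated log. */
  static final class ReplicaStateMachine {
    private final Map<Long, Long> finalizedLength = new HashMap<Long, Long>();

    void apply(FinalizeBlock entry) {
      // The first committed finalization wins; a retried duplicate is a no-op.
      finalizedLength.putIfAbsent(entry.blockId, entry.length);
    }

    Long lengthOf(long blockId) {
      return finalizedLength.get(blockId);
    }
  }

  public static void main(String[] args) {
    // The consensus layer (Raft in the real system) yields one agreed, ordered
    // log of committed entries; here it is simply hard-coded.
    List<FinalizeBlock> committedLog = new ArrayList<FinalizeBlock>();
    committedLog.add(new FinalizeBlock(3L, 134217728L));
    committedLog.add(new FinalizeBlock(3L, 67108864L)); // late duplicate, ignored

    ReplicaStateMachine dn1 = new ReplicaStateMachine();
    ReplicaStateMachine dn2 = new ReplicaStateMachine();
    ReplicaStateMachine dn3 = new ReplicaStateMachine();
    for (FinalizeBlock entry : committedLog) {
      dn1.apply(entry);
      dn2.apply(entry);
      dn3.apply(entry);
    }

    // All three replicas report the same finalized length for block 3.
    System.out.println(dn1.lengthOf(3L) + " " + dn2.lengthOf(3L) + " " + dn3.lengthOf(3L));
  }
}
{code}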

> Object store in HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: HDFS Scalability and Ozone.pdf, HDFS-7240.001.patch, 
> HDFS-7240.002.patch, HDFS-7240.003.patch, HDFS-7240.003.patch, 
> HDFS-7240.004.patch, HDFS-7240.005.patch, HDFS-7240.006.patch, 
> MeetingMinutes.pdf, Ozone-architecture-v1.pdf, Ozonedesignupdate.pdf, 
> ozone_user_v0.pdf
>
>
> This jira proposes to add object store capabilities into HDFS. 
> As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-7240) Object store in HDFS

2017-12-07 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16282441#comment-16282441
 ] 

Andrew Wang commented on HDFS-7240:
---

Hi Sanjay,

Thanks for writing up that summary. It's clear there's still disagreement on 
the merge. How should we proceed on reaching consensus? On the last call you 
suggested making a document, or we could do another call.

> Object store in HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: HDFS Scalability and Ozone.pdf, HDFS-7240.001.patch, 
> HDFS-7240.002.patch, HDFS-7240.003.patch, HDFS-7240.003.patch, 
> HDFS-7240.004.patch, HDFS-7240.005.patch, HDFS-7240.006.patch, 
> MeetingMinutes.pdf, Ozone-architecture-v1.pdf, Ozonedesignupdate.pdf, 
> ozone_user_v0.pdf
>
>
> This jira proposes to add object store capabilities into HDFS. 
> As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-7240) Object store in HDFS

2017-11-22 Thread Anu Engineer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263515#comment-16263515
 ] 

Anu Engineer commented on HDFS-7240:


h1. Ozone - Second community meeting
Time: Friday, November 17, 2017, at 4:00 pm PST


_Participants: Arpit Agarwal, Robert Boyd, Wei Chiu, Marton Elek, Anu Engineer, 
Aaron Fabbri, Manoj Govindassamy, Virajith Jalaparti, Aaron Myers, Jitendra 
Pandey, Sanjay Radia, Chao Sun, Bharat Viswanadham, Andrew Wang, Lei (Eddy) 
Xu, Wei Yan, Xiaoyu Yao. \[Apologies to anyone that I might have missed who 
joined over the phone\]_


We started the meeting by discussing the notions of Ozone's block storage 
layer, followed by a deep dive into the code.
We discussed the block layer, which is similar to the HDFS block layer, Ozone's 
container layer, and how replication works via pipelines. Then we did a code 
walk-through of the Ozone codebase, starting with KSM, SCM, the container 
layer, and the REST handler.

We had some technical questions about containers: is the unit of replication 
the container, and can we truncate a block that is already part of a 
container, say block three inside a container? Both were answered in the 
affirmative: the unit of replication is indeed a container, and you can 
truncate block three inside a container without any issues.


Once we finished the technical discussion, we moved on to some of the merge 
issues; essentially the question was whether we should postpone the merge of 
Ozone into HDFS.

* Andrew Wang wanted to know how this would benefit enterprise customers.
** It was pointed out that customers can use the storage via a Hadoop-compatible 
filesystem (FileSystem or FileContext), and, more importantly, apps such as 
Hive and Spark that use those APIs will work (we are testing Hive and Spark). 
In fact, all the data in Ozone is expected to come via Hive, YARN, Spark, etc. 
Making Ozone work seamlessly through such Hadoop frameworks is very important 
because it enables real customer use (a minimal sketch of this FileSystem usage 
follows after these notes).

* ATM objected to the Ozone merge, as he wanted to see the new block layer 
integrated with the existing NN. He argued that creating the block layer is 
just the first phase, and that the separation of the block layer inside the 
Namenode still needs to be done. He further argued that we should merge only 
after the Namenode block separation is completely done.
** Sanjay countered that a project of this size can only be implemented in 
phases. Fixing HDFS's scalability in a fundamental way requires fixing both the 
namespace layer and the block layer. We provide a simpler namespace (key-value) 
as an intermediate step to allow real customer usage via Spark and Hive, and 
also as a way of stabilizing the new block layer. This is a good consistency 
point from which to start working on integrating with the hierarchical 
namespace of the NN.

* Aaron Fabbri was concerned that the code is new and may not be stable, and 
that the support load for HDFS is already quite high; this could further 
destabilize HDFS.
** Sanjay's response: it was pointed out that the feature is not on by default 
and that the code is in a separate module. New shareable parts, like the new 
Netty protocol engine in the DN, will replace the old thread-based protocol 
engine only with the HDFS community's blessing, after they have been 
sufficiently tested via the Ozone path. Further, a customer can keep Ozone 
disabled if so desired.

* ATM’s concern is that connecting the NN to the new block layer will require 
separating the FSN/BM lock (a good thing to do), which is very hard to do.
** Sanjay’s response: this issue was raised and explained at yesterday’s 
meeting. A very strong coupling was added between the block layer and the 
namespace layer when we wrote the new block pipeline as part of the append work 
in 2010: the block length of each replica at finalization time, especially 
under failures, has to be consistent. This is done in the central NN today (due 
to the lack of a Raft/Paxos-like protocol in the original block layer). The new 
block-container layer uses Raft for consistency and no longer needs a central 
agent like the NN. Thus the new block-container layer’s built-in consistent 
state management eliminates this coupling and hence simplifies the separation 
of the lock.
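
As referenced in the notes above, here is a minimal sketch of the 
Hadoop-compatible-filesystem point: the same org.apache.hadoop.fs code that 
applications (or Hive and Spark) run against hdfs:// can run against an 
Ozone-backed filesystem. The o3://bucket.volume/ URI and any fs.<scheme>.impl 
configuration it implies are illustrative assumptions, not the definitive 
setup.

{code:java}
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/**
 * Sketch only: generic FileSystem code that an application (or Hive/Spark)
 * would run unchanged against hdfs:// or an Ozone-backed Hadoop-compatible
 * filesystem. The o3:// URI below is illustrative; the real scheme/authority
 * and filesystem implementation class depend on the Ozone jar and config used.
 */
public class HadoopCompatibleFsExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Hypothetical bucket/volume authority; replace with a real one.
    URI uri = URI.create("o3://bucket.volume/");
    FileSystem fs = FileSystem.get(uri, conf);

    Path file = new Path("/warehouse/events/part-00000");

    // Write through the generic FileSystem API.
    try (FSDataOutputStream out = fs.create(file, true)) {
      out.writeUTF("hello from a Hadoop-compatible filesystem");
    }

    // Read it back the same way.
    try (FSDataInputStream in = fs.open(file)) {
      System.out.println(in.readUTF());
    }
  }
}
{code}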


> Object store in HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: HDFS Scalability and Ozone.pdf, HDFS-7240.001.patch, 
> HDFS-7240.002.patch, HDFS-7240.003.patch, HDFS-7240.003.patch, 
> HDFS-7240.004.patch, HDFS-7240.005.patch, HDFS-7240.006.patch, 
> MeetingMinutes.pdf, Ozone-architecture-v1.pdf, Ozonedesignupdate.pdf, 
> ozone_user_v0.pdf
>
>
> This jira proposes to add 

[jira] [Commented] (HDFS-7240) Object store in HDFS

2017-11-22 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263485#comment-16263485
 ] 

Aaron Fabbri commented on HDFS-7240:


Thanks for taking notes, and thank you for the lively discussion.  A lot of 
valid concerns on all sides here. A couple of minor corrections:
{quote}AaronF: Ozone is a lot of new code and Hadoop already has so much code.. 
{quote}
My concerns are particularly around things that affect stability and 
operability. I am not concerned about lines of code so much as how manageable 
the codebase is in terms of getting a stable release and supporting it in 
production.

{quote}
shallow data copy is practical only if within same project and daemon otherwise 
have deal with security setting and coordinations across daemons.
{quote}
We can factor common code, if any, into a shared dependency. I don't see how 
the repository the code lives in really affects fast copy between storage 
systems. I can think of ways to do it both within a JVM process consisting of 
code from multiple git repositories, and via IPC (hand off ownership of a file 
to another process--not even talking about fancy stuff like shmem).

{quote}
The opponents will raise the same issue as today: show feature parity 
{quote}

I get your concern, but I didn't hear anyone say feature parity. I only heard 
"integrate with HDFS".  Even integrated with HDFS, there is still a high bar of 
"utility" to pass, IMO, to justify a very large patch which affects production 
code.

We all want stable, scalable HDFS. Nobody opposes that ideal. 

I'm not sure trying to evolve HDFS to scale is a better approach than being 
separate with maybe some shared, well-factored dependencies.  The latter, IMO, 
could result in better code and dramatically less risk to HDFS.

Appreciate all your hard work thus far and appreciate the challenges you guys 
face here. I hope you can understand my perspective as well.

> Object store in HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: HDFS Scalability and Ozone.pdf, HDFS-7240.001.patch, 
> HDFS-7240.002.patch, HDFS-7240.003.patch, HDFS-7240.003.patch, 
> HDFS-7240.004.patch, HDFS-7240.005.patch, HDFS-7240.006.patch, 
> MeetingMinutes.pdf, Ozone-architecture-v1.pdf, Ozonedesignupdate.pdf, 
> ozone_user_v0.pdf
>
>
> This jira proposes to add object store capabilities into HDFS. 
> As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-7240) Object store in HDFS

2017-11-21 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261791#comment-16261791
 ] 

Sanjay Radia commented on HDFS-7240:


Ozone Cloudera Meeting
Date: Thursday, November 16th, 2017
Location: online conferencing

Attendees: ATM, Andrew, Anu, Aaron Fabbri, Jitendra, Sanjay, Sean Mackrory, 
other listeners on the phone

Main discussion centered around:
* Wouldn't Ozone be better off as a separate project?
* Why should it be merged now?

Discussion: (This incorporates Andrew’s minutes and adds to them.)

* Anu: Don't want to have this separate, since it confuses people about the 
long-term vision of Ozone. It's intended as block management for HDFS.
   * Andrew: In its current state, Ozone cannot be plugged into the NN as the 
BM layer, so it seems premature to merge. It can't benefit existing users, and 
they can't test it.
   * Response: The Ozone block layer is at a good integration point, and we 
want to move on with the NameNode integration using the new block layer. The 
benefits via the KV namespace/FileSystem API are already there and completely 
usable for Hive and Spark apps.
   * Andrew: We can do the FSN/BM lock split without merging Ozone. Separate 
efforts. This lock split is also a major effort by itself, and is a dangerous 
change. It's something that should be baked in production.
  * Sanjay: Agree that the lock split should be done in a branch, but disagree 
on how hard it will be. The split was hard in the past but will be easier with 
the new block layer: one of the key reasons for the coupling of the block layer 
to the namespace layer is that the block length of each replica at block close 
time, especially under failures, has to be consistent. This is done in the 
central NN today (due to the lack of a Raft/Paxos-like protocol in the original 
block layer). The block-container layer uses Raft for consistency and no longer 
needs a central agent like the NN. The new block layer's built-in consistent 
state management simplifies the separation.
* Sanjay: Ozone developers "willing to take the hit" of the slow Hadoop release 
cadence. Want to make this part of HDFS since it's easier for users to test and 
consume without installing a new cluster.
   * ATM: Can still share the same hardware, and run the Ozone daemons 
alongside.
   * Sanjay countered this 
* Sanjay: Want to keep Ozone block management inside the Datanode process to 
enable various synergies, such as sharing the new Netty-based protocol engine 
or fast-copy between HDFS and Ozone. Not all data needs all the HDFS features 
like encryption, erasure coding, etc., and this data could be stored in Ozone.
   * Andrew: This fast-copy hasn't been implemented or discussed yet. Unclear 
if it'll work at all with existing HDFS block management. Won't work with 
encryption or erasure coding. Not clear whether it requires being in the same 
DN process even.
   * It does not have to work with encryption and EC to give value. It can 
work with non-encrypted and non-EC data, which make up the majority of blocks 
in most Hadoop clusters. We will provide a design for the shallow copy.
Sanjay/Anu: Ozone is also useful to test with just the key-value interface. 
It's a Hadoop-compatible FileSystem, so many apps such as Hive and Spark can 
also work on Ozone, since they have ensured that they work well on a flat KV 
namespace.
   * Andrew: If it provides a new API and doesn't support the HDFS feature-set, 
doesn't this support it being its own project?
  * Sanjay: It provides the EXISTING Hadoop FileSystem interface now. Note that 
customers are used to having different parts of the namespace(s) with different 
features: customers have asked for zones with different features enabled [see 
summary, to avoid duplication].
 * AaronF: Ozone is a lot of new code, and Hadoop already has so much code. It 
is better to have separate projects and not add to Hadoop/HDFS.
Sanjay: Agree it is a lot of code. Sometimes we have to add significant new 
code for a project to move forward. We have tried to incrementally work around 
HDFS scaling, the NN’s manageability, and slow startup issues. This new code 
base fundamentally moves us forward in addressing these long-standing issues. 
Besides, the “lots of new code” argument could be used later to prevent the 
merge of the projects.


Summary: 
There is agreement that the new block-container layer is a good way to solve 
the block scaling issue of HDFS. There is no consensus on merging the branch in 
vs. forking Ozone into a new project. The main objection to merging into HDFS 
is that integrating the new block-container layer with the existing NN will be 
a very hard project, since the lock split in the NN is very challenging.

Cloudera’s team perspective: (taken from Andrew’s minutes)
* Ozone could be its own project and integrated later, or remain on an HDFS 
branch. There are benefits to Ozone being a separate project. Can release 
faster, iterate more quickly on feedback, and 

[jira] [Commented] (HDFS-7240) Object store in HDFS

2017-11-20 Thread Anu Engineer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16260017#comment-16260017
 ] 

Anu Engineer commented on HDFS-7240:


h1. Ozone - First community meeting
{{Time: Thursday, November 16, 2017, at 1:00:00 am PST}}
_Participants:  Anu Engineer, Mukul Kumar Singh, Nandakumar Vadivelu, Weiwei 
Yang, Steve Loughran, Thomas Demoor, Shashikant Banerjee, Lokesh Jain_

We discussed quite a large number of technical issues at this meeting.

We went over how Ozone works, the namespace architecture of KSM, and how it 
interacts with SCM. We traced both the write I/O path and the read I/O path.
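
As a rough outline of the write path that was walked through (the client goes 
to KSM for the key, KSM/SCM allocate a block inside a container, and the data 
itself goes to the datanode container pipeline), here is a hedged sketch; every 
interface, class, and method name below is a hypothetical stand-in for 
illustration, not the actual KSM/SCM client API.

{code:java}
// Hypothetical interfaces sketching the write path that was walked through;
// none of these names correspond to real Ozone client classes.
interface KeySpaceManagerClient {
  // Opens a key in a volume/bucket and returns where its data should go.
  BlockLocation allocateKey(String volume, String bucket, String key, long size);
  // Makes the key visible in the namespace once its data has been written.
  void commitKey(String volume, String bucket, String key);
}

interface ContainerClient {
  // Writes a chunk of the key's data to the chosen container replica set
  // (the replication pipeline) on the datanodes.
  void writeChunk(BlockLocation location, byte[] data);
}

// Value object: which container/block holds the data.
class BlockLocation {
  final long containerId;
  final long localBlockId;
  BlockLocation(long containerId, long localBlockId) {
    this.containerId = containerId;
    this.localBlockId = localBlockId;
  }
}

class WritePathSketch {
  static void putKey(KeySpaceManagerClient ksm, ContainerClient containers,
                     byte[] data) {
    // 1. KSM owns the key namespace; behind the scenes it asks SCM for a
    //    block inside an open container.
    BlockLocation loc = ksm.allocateKey("vol1", "bucket1", "key1", data.length);
    // 2. The data itself goes straight to the datanode container pipeline,
    //    never through KSM or SCM.
    containers.writeChunk(loc, data);
    // 3. Committing the key makes it visible in the KSM namespace.
    ksm.commitKey("vol1", "bucket1", "key1");
  }
}
{code}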

There was some discussion of the REST protocol and of making sure that the REST 
protocol is good enough to support Hadoop-based workloads. We looked at various 
REST APIs of Ozone and also discussed the O3 FS working over RPC instead of the 
REST protocol. This is a work in progress.

Steve Loughran suggested that we add Storm to the applications that are tested 
against Ozone. Currently we use Hive, Spark, and YARN as the applications to 
test against Ozone. We will add Storm to this testing mix.

We discussed performance and scale testing; Ozone has been tested with millions 
of keys. We have also tested with cluster sizes of up to 300 nodes.

Steve suggested that we upgrade the Ratis version and lock that down before the 
merge.

Thomas Demoor pointed out the difference between the commit ordering of S3 and 
Ozone. Ozone uses the actual commit time to decide key ordering, while S3 uses 
the key creation time. He also mentioned that this should not matter in the 
real world, as he is not aware of any hard-coded dependency on commit ordering.



> Object store in HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: HDFS Scalability and Ozone.pdf, HDFS-7240.001.patch, 
> HDFS-7240.002.patch, HDFS-7240.003.patch, HDFS-7240.003.patch, 
> HDFS-7240.004.patch, HDFS-7240.005.patch, HDFS-7240.006.patch, 
> MeetingMinutes.pdf, Ozone-architecture-v1.pdf, Ozonedesignupdate.pdf, 
> ozone_user_v0.pdf
>
>
> This jira proposes to add object store capabilities into HDFS. 
> As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-7240) Object store in HDFS

2017-11-17 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257730#comment-16257730
 ] 

Andrew Wang commented on HDFS-7240:
---

Some Hortonworkers and Clouderans met yesterday, here are my meeting notes. I 
wanted to get them up before the broader meeting today. I already sent these 
around to the attendees, but please comment if I got anything incorrect.

Attendees: ATM, Andrew, Anu, Aaron Fabbri, Jitendra, Sanjay, other listeners on 
the phone

High-level questions raised:

* Wouldn't Ozone be better off as a separate project?
* Why should it be merged now?

Things we agree on:

* We're all on Team Ozone, and applaud any effort to address scaling HDFS.
* There are benefits to Ozone being a separate project. Can release faster, 
iterate more quickly on feedback, and mature without having to worry about 
features like high-availability, security, encryption, etc. that not all 
customers need.
* No agreement on whether the benefits of separation outweigh the downsides.

Discussion:

* Anu: Don't want to have this separate since it confuses people about the 
long-term vision of Ozone. It's intended as block management for HDFS.
* Andrew: In its current state, Ozone cannot be plugged into the NN as the 
BM layer, so it seems premature to merge. Can't benefit existing users, and 
they can't test it.
* Response: The Ozone block layer is at a good integration point, and we 
want to move on to the NameNode changes, like splitting the FSN/BM lock.
* Andrew: We can do the FSN/BM lock split without merging Ozone. Separate 
efforts. This lock split is also a major effort by itself, and is a dangerous 
change. It's something that should be baked in production.
* Sanjay: Ozone developers "willing to take the hit" of the slow Hadoop release 
cadence. Want to make this part of HDFS since it's easier for users to test and 
consume without installing a new cluster.
* ATM: Can still share the same hardware, and run the Ozone daemons 
alongside.
* Sanjay: Want to keep Ozone block management inside the Datanode process to 
enable a fast-copy between HDFS and Ozone. Not all data needs all the HDFS 
features like encryption, erasure coding, etc, and this data could be stored in 
Ozone.
* Andrew: This fast-copy hasn't been implemented or discussed yet. Unclear 
if it'll work at all with existing HDFS block management. Won't work with 
encryption or erasure coding. Not clear whether it requires being in the same 
DN process even.
* Sanjay/Anu: Ozone is also useful to test with just the key-value interface. 
It's a Hadoop-compatible FileSystem, so apps that work on S3 will work on Ozone 
too.
* Andrew: If it provides a new API and doesn't support the HDFS 
feature-set, doesn't this support it being its own project?

Summary

* No consensus on the high-level questions raised
* Ozone could be its own project and integrated later, or remain on an HDFS 
branch
* Without the FSN/BM lock split, it can't serve as the block management layer 
for HDFS
* Without fast copy, there's no need for it to be part of the DataNode 
process, and it might not need to be in the same process anyway.

> Object store in HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: HDFS Scalability and Ozone.pdf, HDFS-7240.001.patch, 
> HDFS-7240.002.patch, HDFS-7240.003.patch, HDFS-7240.003.patch, 
> HDFS-7240.004.patch, HDFS-7240.005.patch, HDFS-7240.006.patch, 
> MeetingMinutes.pdf, Ozone-architecture-v1.pdf, Ozonedesignupdate.pdf, 
> ozone_user_v0.pdf
>
>
> This jira proposes to add object store capabilities into HDFS. 
> As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-7240) Object store in HDFS

2017-11-16 Thread Anu Engineer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16256323#comment-16256323
 ] 

Anu Engineer commented on HDFS-7240:


bq. Thanks for organizing community meeting(s). Hope there will be a deep-dive 
into Ozone impl, as it may take a long time to go through the code on your own.
I will be happy to do it.

bq. Anything on Ozone security design?
We are working on a design; we will post it soon. 

> Object store in HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: HDFS Scalability and Ozone.pdf, HDFS-7240.001.patch, 
> HDFS-7240.002.patch, HDFS-7240.003.patch, HDFS-7240.003.patch, 
> HDFS-7240.004.patch, HDFS-7240.005.patch, HDFS-7240.006.patch, 
> MeetingMinutes.pdf, Ozone-architecture-v1.pdf, Ozonedesignupdate.pdf, 
> ozone_user_v0.pdf
>
>
> This jira proposes to add object store capabilities into HDFS. 
> As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-7240) Object store in HDFS

2017-11-16 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16256314#comment-16256314
 ] 

Konstantin Shvachko commented on HDFS-7240:
---

Thanks for organizing community meeting(s). Hope there will be a deep-dive into 
Ozone impl, as it may take a long time to go through the code on your own.
Would be good to give people some time to review the code before starting the 
vote.

*Anything on Ozone security design?*

> Object store in HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: HDFS Scalability and Ozone.pdf, HDFS-7240.001.patch, 
> HDFS-7240.002.patch, HDFS-7240.003.patch, HDFS-7240.003.patch, 
> HDFS-7240.004.patch, HDFS-7240.005.patch, HDFS-7240.006.patch, 
> MeetingMinutes.pdf, Ozone-architecture-v1.pdf, Ozonedesignupdate.pdf, 
> ozone_user_v0.pdf
>
>
> This jira proposes to add object store capabilities into HDFS. 
> As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-7240) Object store in HDFS

2017-11-16 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16256311#comment-16256311
 ] 

Konstantin Shvachko commented on HDFS-7240:
---

?? How does this align with the router-based federation HDFS-10467? ??

Hey [~ywskycn], router-based federation (in fact, all federation approaches) is 
orthogonal to a distributed NN. One should be able to run RBF over multiple 
HDFS clusters, potentially having different versions.

> Object store in HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: HDFS Scalability and Ozone.pdf, HDFS-7240.001.patch, 
> HDFS-7240.002.patch, HDFS-7240.003.patch, HDFS-7240.003.patch, 
> HDFS-7240.004.patch, HDFS-7240.005.patch, HDFS-7240.006.patch, 
> MeetingMinutes.pdf, Ozone-architecture-v1.pdf, Ozonedesignupdate.pdf, 
> ozone_user_v0.pdf
>
>
> This jira proposes to add object store capabilities into HDFS. 
> As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-7240) Object store in HDFS

2017-11-15 Thread Anu Engineer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16254498#comment-16254498
 ] 

Anu Engineer commented on HDFS-7240:


[~eddyxu], yes, it is the same. Thanks for checking.

> Object store in HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: HDFS Scalability and Ozone.pdf, HDFS-7240.001.patch, 
> HDFS-7240.002.patch, HDFS-7240.003.patch, HDFS-7240.003.patch, 
> HDFS-7240.004.patch, HDFS-7240.005.patch, HDFS-7240.006.patch, 
> MeetingMinutes.pdf, Ozone-architecture-v1.pdf, Ozonedesignupdate.pdf, 
> ozone_user_v0.pdf
>
>
> This jira proposes to add object store capabilities into HDFS. 
> As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-7240) Object store in HDFS

2017-11-15 Thread Lei (Eddy) Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16254360#comment-16254360
 ] 

Lei (Eddy) Xu commented on HDFS-7240:
-

[~anu], [~pono], thanks for posting the meeting details. One question: does the 
Americas time zone meeting (Friday, November 17th, 4 PM) have the same dial-in 
numbers and zoom.us URL? We'd like to join this one.

> Object store in HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: HDFS Scalability and Ozone.pdf, HDFS-7240.001.patch, 
> HDFS-7240.002.patch, HDFS-7240.003.patch, HDFS-7240.003.patch, 
> HDFS-7240.004.patch, HDFS-7240.005.patch, HDFS-7240.006.patch, 
> MeetingMinutes.pdf, Ozone-architecture-v1.pdf, Ozonedesignupdate.pdf, 
> ozone_user_v0.pdf
>
>
> This jira proposes to add object store capabilities into HDFS. 
> As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-7240) Object store in HDFS

2017-11-14 Thread Anu Engineer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16251939#comment-16251939
 ] 

Anu Engineer commented on HDFS-7240:


[~pono] Thank you very much for your help. I really appreciate it.

My apologies for not posting this meeting message earlier. I have been trying 
for a while to post this message, but apparently if you try to post a phone 
number, you get banned from JIRA. I did *not* know that and had been without 
JIRA access for a while. Thanks to [~pono] for using his admin superpowers and 
helping me post the meeting invite.


> Object store in HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: HDFS Scalability and Ozone.pdf, HDFS-7240.001.patch, 
> HDFS-7240.002.patch, HDFS-7240.003.patch, HDFS-7240.003.patch, 
> HDFS-7240.004.patch, HDFS-7240.005.patch, HDFS-7240.006.patch, 
> MeetingMinutes.pdf, Ozone-architecture-v1.pdf, Ozonedesignupdate.pdf, 
> ozone_user_v0.pdf
>
>
> This jira proposes to add object store capabilities into HDFS. 
> As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-7240) Object store in HDFS

2017-11-14 Thread Daniel Takamori (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16251930#comment-16251930
 ] 

Daniel Takamori commented on HDFS-7240:
---

*+This message is being proxy posted by an Infra admin because it contains 
banned strings triggering our spam filter.
+*
Before we send out the [Vote] thread for Ozone, I propose that we do two 
community meetings. This allows us to address any questions or issues over a 
high-bandwidth medium.
Since many contributors/committers of Ozone are spread across the world, the 
first meeting is friendly toward Europe/Asia time zones. I propose the 
following time for the meeting.
 
 
||Location||Local Time||Time zone||UTC Offset||
|Seattle / San Jose|Thursday, November 16, 2017 at 1:00:00 am (night)|PST|UTC-8 hours|
|London|Thursday, November 16, 2017 at 9:00:00 am|GMT|UTC|
|Budapest/Brussels|Thursday, November 16, 2017 at 10:00:00 am|CET|UTC+1 hour|
|Bangalore|Thursday, November 16, 2017 at 2:30:00 pm|IST|UTC+5:30 hours|
|Shanghai|Thursday, November 16, 2017 at 5:00:00 pm|CST|UTC+8 hours|
 
I propose that we have a follow-up meeting targeting the Americas time zone, 
and I propose *Friday, November 17, 2017, at 4:00 pm PST* for that meeting.
 
Here is the meeting info:
{noformat}
 
Topic: Ozone Merge meeting
Time: Nov 16, 2017 1:00 AM Pacific Time (US and Canada)
 
Join from PC, Mac, Linux, iOS or Android: 
https://hortonworks.zoom.us/j/5451676776
Or join by phone:
 
+1 646 558 8656 (US Toll) or +1 669 900 6833 (US Toll)
+1 877 369 0926 (US Toll Free)
+1 877 853 5247  (US Toll Free)
Meeting ID: 545 167 6776 
International numbers available: 
https://hortonworks.zoom.us/zoomconference?m=rYZYSAOVLYtFE6wkwIrjJeqO3CP_I6ij
{noformat}

> Object store in HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: HDFS Scalability and Ozone.pdf, HDFS-7240.001.patch, 
> HDFS-7240.002.patch, HDFS-7240.003.patch, HDFS-7240.003.patch, 
> HDFS-7240.004.patch, HDFS-7240.005.patch, HDFS-7240.006.patch, 
> MeetingMinutes.pdf, Ozone-architecture-v1.pdf, Ozonedesignupdate.pdf, 
> ozone_user_v0.pdf
>
>
> This jira proposes to add object store capabilities into HDFS. 
> As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-7240) Object store in HDFS

2017-11-14 Thread Jitendra Nath Pandey (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16251078#comment-16251078
 ] 

Jitendra Nath Pandey commented on HDFS-7240:


Thanks for posting the meeting minutes, [~shv] and [~anu]. It is great to see 
alignment on the roadmap.


> Object store in HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: HDFS Scalability and Ozone.pdf, HDFS-7240.001.patch, 
> HDFS-7240.002.patch, HDFS-7240.003.patch, HDFS-7240.003.patch, 
> HDFS-7240.004.patch, HDFS-7240.005.patch, HDFS-7240.006.patch, 
> MeetingMinutes.pdf, Ozone-architecture-v1.pdf, Ozonedesignupdate.pdf, 
> ozone_user_v0.pdf
>
>
> This jira proposes to add object store capabilities into HDFS. 
> As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-7240) Object store in HDFS

2017-11-13 Thread Wei Yan (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250063#comment-16250063
 ] 

Wei Yan commented on HDFS-7240:
---

Thanks [~shv] for the detailed notes.

I have a quick question here:
{quote}
2. A single NameNode with namespace implemented as KV-collection. The 
KV-collection is partitionable in memory, which allows breaking the single lock 
restriction of current NN. Performance gains not measured yet.
3. Split the KV-namespace into two or more physical NNs.
{quote}
How does this align with the router-based federation HDFS-10467?

> Object store in HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: HDFS Scalability and Ozone.pdf, HDFS-7240.001.patch, 
> HDFS-7240.002.patch, HDFS-7240.003.patch, HDFS-7240.003.patch, 
> HDFS-7240.004.patch, HDFS-7240.005.patch, HDFS-7240.006.patch, 
> Ozone-architecture-v1.pdf, Ozonedesignupdate.pdf, ozone_user_v0.pdf
>
>
> This jira proposes to add object store capabilities into HDFS. 
> As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-7240) Object store in HDFS

2017-11-12 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16249112#comment-16249112
 ] 

Konstantin Shvachko commented on HDFS-7240:
---

We had an F2F meeting with the Ozone authors. Anu is publishing his notes. The 
focus was on the following issues:
* Should Ozone be a part of HDFS or a separate project?
* How can Ozone help address scalable RPC performance?
* Can Ozone be used as a block management layer for HDFS?
* Migration from HDFS to Ozone

h3. Ozone as a block management layer
I think we made pretty good progress in understanding the role of Ozone and the 
future of HDFS.
On large production Hadoop clusters, such as LinkedIn's and others tracked via 
multiple publications, we see that:
# We read 90% of the data that we write; there is no cold metadata.
# RPC load on the NameNode increases proportionally to the growth of storage, 
which is exponential.

Thus, the idea of a NameNode with a partial namespace in memory does not fully 
solve these growth problems, because a) it is still limited by single-NN 
performance, and b) we will still have to provision the NN to keep most of the 
namespace in memory.

We came to the following high-level roadmap for evolving HDFS:
# A NameNode with block management delegated to the Ozone layer. There is a 
prototype of such a NN, which is believed to show a 30-50% performance 
improvement. A POC would be good.
# A single NameNode with the namespace implemented as a KV-collection. The 
KV-collection is partitionable in memory, which allows breaking the single-lock 
restriction of the current NN (a sketch of the idea follows below). Performance 
gains not measured yet.
# Split the KV-namespace into two or more physical NNs.
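
As referenced in point 2, here is a minimal illustrative sketch (an 
assumption-laden toy, not a design) of an in-memory KV namespace partitioned by 
hashing the parent path, so operations on different partitions take different 
locks instead of one global namespace lock.

{code:java}
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/**
 * Sketch of a partitionable in-memory KV namespace (illustrative only).
 * Keys are full paths; the partition is chosen by hashing the parent path,
 * so unrelated directories can be updated under different locks rather than
 * one global namespace lock.
 */
public class PartitionedKvNamespace {
  private static final int PARTITIONS = 16;

  private final Map<String, byte[]>[] maps;
  private final ReentrantReadWriteLock[] locks;

  @SuppressWarnings("unchecked")
  public PartitionedKvNamespace() {
    maps = (Map<String, byte[]>[]) new Map[PARTITIONS];
    locks = new ReentrantReadWriteLock[PARTITIONS];
    for (int i = 0; i < PARTITIONS; i++) {
      maps[i] = new HashMap<String, byte[]>();
      locks[i] = new ReentrantReadWriteLock();
    }
  }

  /** Hash the parent path so siblings land in the same partition. */
  private static int partitionOf(String path) {
    int slash = path.lastIndexOf('/');
    String parent = slash <= 0 ? "/" : path.substring(0, slash);
    return (parent.hashCode() & 0x7fffffff) % PARTITIONS;
  }

  public void put(String path, byte[] inodeBytes) {
    int p = partitionOf(path);
    locks[p].writeLock().lock();   // only this partition is blocked
    try {
      maps[p].put(path, inodeBytes);
    } finally {
      locks[p].writeLock().unlock();
    }
  }

  public byte[] get(String path) {
    int p = partitionOf(path);
    locks[p].readLock().lock();
    try {
      return maps[p].get(path);
    } finally {
      locks[p].readLock().unlock();
    }
  }
}
{code}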

_Important requirement:_ we should provide a *no-data-copy migration of the 
clusters* along the entire transformation.
It is not feasible to DistCp, e.g., a 100 PB cluster, since it requires 
prolonged downtime and is expensive - it doubles the amount of hardware 
involved. Thus, an upgrade should keep the data blocks on the same DataNodes, 
and may need to provide an offline tool to convert metadata (fsimage) to the 
new format.

There is a lot to design here, but it looks to me like a gradual path from the 
current single NN to a distributed namespace architecture. So if people agree 
with the direction in general, I'll be glad to create a Wiki page describing 
this intention so that folks can comment and discuss.
Could the Ozone authors ([~anu], [~jnp], [~sanjay.radia]) please confirm our 
common understanding of the roadmap?

h3. Merging Ozone to HDFS
There are pros and cons to merging Ozone into Hadoop vs a separate project. The 
pros include (please expand):
* Code sharing
* Ozone should improve DataNode pipeline code
* Better testing for Ozone within Hadoop

Some cons:
* As part of HDFS it will need to support standard HDFS features, like 
security, snapshots, erasure coding, etc., while as a separate project it could 
implement them later
* As a separate project Ozone can benefit from frequent release cycles
* Bugs in Ozone can affect HDFS and vice versa
* Incompatible changes may be allowed in Ozone on early stages, but not allowed 
in Hadoop
* Rolling upgrades are required for HDFS, which may not be possible for Ozone
 
The roadmap above sets Ozone up as a step toward a partitioned NameNode, which 
solves both the RPC scalability and cluster growth problems for big Hadoop 
installations. This validates merging Ozone into Hadoop for me. Given the cons, 
though, I'm not sure when the right time is. I think we should at least have a 
design doc for security before merging, in order to avoid API changes.

> Object store in HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: HDFS Scalability and Ozone.pdf, HDFS-7240.001.patch, 
> HDFS-7240.002.patch, HDFS-7240.003.patch, HDFS-7240.003.patch, 
> HDFS-7240.004.patch, HDFS-7240.005.patch, HDFS-7240.006.patch, 
> Ozone-architecture-v1.pdf, Ozonedesignupdate.pdf, ozone_user_v0.pdf
>
>
> This jira proposes to add object store capabilities into HDFS. 
> As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-7240) Object store in HDFS

2017-11-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16248971#comment-16248971
 ] 

Hadoop QA commented on HDFS-7240:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 14m 
59s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 137 new or modified 
test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
38s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
 4s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 14m 
16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 13m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 11s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-project hadoop-tools/hadoop-tools-dist hadoop-tools 
hadoop-client-modules/hadoop-client-minicluster 
hadoop-client-modules/hadoop-client-check-test-invariants hadoop-dist . {color} 
|
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  6m 
49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  8m  
1s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
33s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 48m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 24m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 24m 
58s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 24m 58s{color} 
| {color:red} root generated 26 new + 1208 unchanged - 26 fixed = 1234 total 
(was 1234) {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
4m 33s{color} | {color:orange} root: The patch generated 16 new + 1399 
unchanged - 19 fixed = 1415 total (was 1418) {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red}  0m 
44s{color} | {color:red} root in the patch failed. {color} |
| {color:green}+1{color} | {color:green} shellcheck {color} | {color:green}  0m 
27s{color} | {color:green} There were no new shellcheck issues. {color} |
| {color:orange}-0{color} | {color:orange} shelldocs {color} | {color:orange}  
0m 26s{color} | {color:orange} The patch generated 260 new + 100 unchanged - 4 
fixed = 360 total (was 104) {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
2s{color} | {color:red} The patch 1 line(s) with tabs. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m 
43s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:red}-1{color} | {color:red} shadedclient {color} | {color:red} 14m  
5s{color} | {color:red} patch has errors when building and testing our client 
artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-project hadoop-tools/hadoop-tools-dist hadoop-tools 
hadoop-client-modules/hadoop-client-minicluster 
hadoop-client-modules/hadoop-client-check-test-invariants hadoop-dist . {color} 
|
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  7m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  5m 
56s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 

[jira] [Commented] (HDFS-7240) Object store in HDFS

2017-11-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16248904#comment-16248904
 ] 

Hadoop QA commented on HDFS-7240:
-

(!) A patch to the testing environment has been detected. 
Re-executing against the patched versions to perform further tests. 
The console is at 
https://builds.apache.org/job/PreCommit-HDFS-Build/22054/console in case of 
problems.


> Object store in HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: HDFS Scalability and Ozone.pdf, HDFS-7240.001.patch, 
> HDFS-7240.002.patch, HDFS-7240.003.patch, HDFS-7240.003.patch, 
> HDFS-7240.004.patch, HDFS-7240.005.patch, HDFS-7240.006.patch, 
> Ozone-architecture-v1.pdf, Ozonedesignupdate.pdf, ozone_user_v0.pdf
>
>
> This jira proposes to add object store capabilities into HDFS. 
> As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-7240) Object store in HDFS

2017-11-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16243550#comment-16243550
 ] 

Hadoop QA commented on HDFS-7240:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
25s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 134 new or modified 
test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
53s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 13m 
36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 11m 
31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m  5s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-project hadoop-tools/hadoop-tools-dist hadoop-tools 
hadoop-client-modules/hadoop-client-minicluster 
hadoop-client-modules/hadoop-client-check-test-invariants hadoop-dist . {color} 
|
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  5m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  5m 
11s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
48s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 31m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m  
7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 15m  
7s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 15m  7s{color} 
| {color:red} root generated 24 new + 1216 unchanged - 24 fixed = 1240 total 
(was 1240) {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
3m 14s{color} | {color:orange} root: The patch generated 15 new + 1397 
unchanged - 19 fixed = 1412 total (was 1416) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 13m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} shellcheck {color} | {color:green}  0m 
27s{color} | {color:green} There were no new shellcheck issues. {color} |
| {color:green}+1{color} | {color:green} shelldocs {color} | {color:green}  0m 
10s{color} | {color:green} The patch generated 0 new + 100 unchanged - 4 fixed 
= 100 total (was 104) {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
2s{color} | {color:red} The patch 1 line(s) with tabs. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m 
21s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m  0s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-project . hadoop-client-modules/hadoop-client-check-test-invariants 
hadoop-client-modules/hadoop-client-minicluster hadoop-dist hadoop-tools 
hadoop-tools/hadoop-tools-dist {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  7m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  7m  
2s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | 

[jira] [Commented] (HDFS-7240) Object store in HDFS

2017-11-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16243418#comment-16243418
 ] 

Hadoop QA commented on HDFS-7240:
-

(!) A patch to the testing environment has been detected. 
Re-executing against the patched versions to perform further tests. 
The console is at 
https://builds.apache.org/job/PreCommit-HDFS-Build/21996/console in case of 
problems.


> Object store in HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: HDFS Scalability and Ozone.pdf, HDFS-7240.001.patch, 
> HDFS-7240.002.patch, HDFS-7240.003.patch, HDFS-7240.003.patch, 
> HDFS-7240.004.patch, HDFS-7240.005.patch, Ozone-architecture-v1.pdf, 
> Ozonedesignupdate.pdf, ozone_user_v0.pdf
>
>
> This jira proposes to add object store capabilities into HDFS. 
> As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-7240) Object store in HDFS

2017-11-06 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16240584#comment-16240584
 ] 

stack commented on HDFS-7240:
-

The posted document needs an author, date, and a ref to this issue. Can it be made a 
Google doc so we can comment inline rather than here?

I skipped to the end, "So why put the Ozone in HDFS and not keep it a separate 
project". There is no argument here on why Ozone needs to be part of Apache Hadoop. 
As per [~shv] above, Ozone as a separate project does not preclude its being brought 
in as a dependency, nor does it dictate the shape of the deployment (bullet #3 is an 
aspiration, not an argument).




> Object store in HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: HDFS Scalability and Ozone.pdf, HDFS-7240.001.patch, 
> HDFS-7240.002.patch, HDFS-7240.003.patch, HDFS-7240.003.patch, 
> HDFS-7240.004.patch, Ozone-architecture-v1.pdf, Ozonedesignupdate.pdf, 
> ozone_user_v0.pdf
>
>
> This jira proposes to add object store capabilities into HDFS. 
> As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-7240) Object store in HDFS

2017-11-04 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16239313#comment-16239313
 ] 

Konstantin Shvachko commented on HDFS-7240:
---

[~sanjay.radia], thank you for sharing the doc, your vision for Ozone 
evolution, motivation, and compelling use cases.
I am glad I had a generally correct understanding that you envisioned Ozone as a 
block management layer for HDFS, with a NameNode keeping only a partial namespace in memory.
[As I mentioned 
above|https://issues.apache.org/jira/browse/HDFS-7240?focusedCommentId=16235080=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16235080]
 the partial namespace architecture does not fully address the problem of 
scaling RPCs on Hadoop clusters, which is the main pain point for me and I 
believe everybody else running big analytics clusters.

You give three main reasons for including Ozone in Hadoop. I think Ozone can 
do all three as a separate project as well.
People run different systems on the same cluster along with Hadoop, e.g. HBase and 
Spark, so Ozone would be yet one more.
A separate Ozone project does not prevent HDFS from using it as a scalable 
block-container layer; HDFS can always include Ozone as a dependency, 
especially if Ozone is already optimized for large IO scans.

> Object store in HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: HDFS Scalability and Ozone.pdf, HDFS-7240.001.patch, 
> HDFS-7240.002.patch, HDFS-7240.003.patch, HDFS-7240.003.patch, 
> HDFS-7240.004.patch, Ozone-architecture-v1.pdf, Ozonedesignupdate.pdf, 
> ozone_user_v0.pdf
>
>
> This jira proposes to add object store capabilities into HDFS. 
> As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-7240) Object store in HDFS

2017-11-02 Thread Mukul Kumar Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236483#comment-16236483
 ] 

Mukul Kumar Singh commented on HDFS-7240:
-

[~ste...@apache.org] I have filed HDFS-12768, HDFS-12767, HDFS-12762 & 
HDFS-12764 to address OzoneFileSystem related review comments. Thanks a lot for 
taking a look at the code and for your valuable comments.

> Object store in HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
>Priority: Major
> Attachments: HDFS-7240.001.patch, HDFS-7240.002.patch, 
> HDFS-7240.003.patch, HDFS-7240.003.patch, HDFS-7240.004.patch, 
> Ozone-architecture-v1.pdf, Ozonedesignupdate.pdf, ozone_user_v0.pdf
>
>
> This jira proposes to add object store capabilities into HDFS. 
> As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-7240) Object store in HDFS

2017-11-02 Thread Anu Engineer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236371#comment-16236371
 ] 

Anu Engineer commented on HDFS-7240:


[~ste...@apache.org] I have filed HDFS-12761 to discuss code/design/arch 
questions on the merge. I really appreciate your feedback and time. Please do 
share any other issues you find. Sharing HDFS-12761 here so others know which 
jira to follow to see the merge discussion comments.

> Object store in HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
>Priority: Major
> Attachments: HDFS-7240.001.patch, HDFS-7240.002.patch, 
> HDFS-7240.003.patch, HDFS-7240.003.patch, HDFS-7240.004.patch, 
> Ozone-architecture-v1.pdf, Ozonedesignupdate.pdf, ozone_user_v0.pdf
>
>
> This jira proposes to add object store capabilities into HDFS. 
> As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-7240) Object store in HDFS

2017-11-02 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236271#comment-16236271
 ] 

Steve Loughran commented on HDFS-7240:
--

Anu, I've got some more comments. Given the size of this JIRA and the number of 
watchers, I'm going to suggest a low-level "merge HDFS-7240" JIRA where we 
can discuss the low-level code details and attach a .patch of the entire thing 
for Jenkins/yetus to handle. This is how we've done the s3guard work, and it 
helps split code issues from more strategic things like Konstantin's.

For large scale tests, make sure you have tests which scale and a test-runner 
timeout designed to support multi-hour tests. AWS S3 now supports multi-TB 
files through multipart uploads; we do run multi-GB long-haul uploads as part 
of the release process as follows: run

Now, more client-side comments. I think this is my client side done, and it's 
more at the HTTP REST API level than anything else.


h3. OzoneFileSystem

* Implement {{getDefaultBlockSize()}}; add a config option to let people set it, 
and add a sensible default like 64 or 128 MB (a sketch of the config lookup is below).
* You could implement {{copyFromLocalFile}} and {{copyToLocalFile}} trivially 
using bucket.putKey(dst, path) & bucket.getKey(). This lines you up for 
HADOOP-14766, which is a high-performance upload from the local FS to a store.
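
A minimal sketch of the block-size suggestion, assuming a hypothetical 
"ozone.fs.block.size" key and a 128 MB default (not actual Ozone config names); 
getDefaultBlockSize(Path) in the filesystem would simply delegate to this lookup:

{code}
import org.apache.hadoop.conf.Configuration;

// Sketch only: wiring a configurable default block size.
// "ozone.fs.block.size" is a made-up key used for illustration.
public class BlockSizeConfigSketch {
  static final String FS_BLOCK_SIZE_KEY = "ozone.fs.block.size";
  static final long FS_BLOCK_SIZE_DEFAULT = 128L * 1024 * 1024; // 128 MB

  static long defaultBlockSize(Configuration conf) {
    return conf.getLong(FS_BLOCK_SIZE_KEY, FS_BLOCK_SIZE_DEFAULT);
  }

  public static void main(String[] args) {
    System.out.println(defaultBlockSize(new Configuration()));
  }
}
{code}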


h3. org.apache.hadoop.ozone.web.ozShell.Shell

# {{dispatch}} always returns "1", so it doesn't differentiate success (0) from 
any other error.
# run(String[] args) doesn't check for `parseHandler` returning null; it will NPE 
on a parse error, when normally you'd want to print a usage command.
# Shell bucket options should return some explicit usage error.


Have a look at what we've done in 
{{org.apache.hadoop.fs.s3a.s3guard.S3GuardTool}} w.r.t. raising exceptions 
(there is a sketch of the pattern after this list).
* Any tool (for you: handler) can raise an UnknownOptionException which triggers 
the usage() command, no stack trace printed.
* Any `ExitUtils.ExitException` sets the exit code of the process.
* We use `ExitUtil.terminate(status, text)` to exit the shell in main(); this 
can be turned off, so that tests can invoke the full CLI and verify failure modes.
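
A minimal sketch of that exit-code pattern, with hypothetical command and usage 
text (not the S3GuardTool or Ozone Shell code): handlers throw 
{{ExitUtil.ExitException}} with a status, and main() turns it into the process 
exit code without printing a stack trace.

{code}
import org.apache.hadoop.util.ExitUtil;

// Sketch only: CLI exit handling via ExitUtil, so tests can disable System.exit()
// and assert on the ExitException status instead of the process exiting.
public class ShellExitSketch {
  static int run(String[] args) {
    if (args.length == 0) {
      // usage error: non-zero status, no stack trace needed
      throw new ExitUtil.ExitException(1, "Usage: shell <command> [options]");
    }
    return 0; // success
  }

  public static void main(String[] args) {
    try {
      ExitUtil.terminate(run(args), "done");
    } catch (ExitUtil.ExitException e) {
      ExitUtil.terminate(e.status, e.getMessage());
    }
  }
}
{code}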


Tests: I don't see any CLI tests. Look at {{TestS3GuardCLI}} to see how we do 
CLI testing, looking for those ExitExceptions & asserting on the return value,
as well as grabbing all the string output (which can be printed to a string as 
well as stdout) & checking that.

h3. OzoneRestClient

Need to plan for a common failure mode being wrong endpoint, possibly returning 
plaintext or HTML error messages. Causes include: client config, proxy things. 
Also failures where the response is cut off partway through the read. 
Content-length is your friend here.

* All HTTP requests MUST verify the content-type of the response. Otherwise a GET of an 
HTTP page will return 200 & trigger a parse failure, when it's probably "you 
just got the URL wrong".
* Error handling should handle content of text/plain or text/html and build a 
string from it. Why? Jetty can raise its own exceptions and they will 
return text.
* Probably good to print the URL being used here on HTTP failures, as it helps 
debug foundational config issues of "Wrong endpoint"
* Anything downloading to a string before parsing should look for a 
Content-Length header, and, if set (1) verify that it's within a sensible range 
& (2) use it to size the download. Actually, EntityUtils.toString(entity) does 
that, but it limits the size of a response to 4KB for that reason. Are you 
confident that all responses parsed that way will be <= 4KB long? I'd write 
tests there.
* You can parse JSON straight off an InputStream: do that for any operation 
which can return large amounts of data.
* Why does {{setEndPointURI(URI)}} not just use {{Preconditions.checkArgument}} 
and throw an IllegalArgumentException?
* For consistency, {{setEndPoint(String clusterFQDN)}} needs a 
{{Preconditions.checkArgument(!Strings.isNullOrEmpty(clusterFQDN))}}.
* delete(). Allow for the full set of delete responses (200, 204). Less 
critical against your own endpoint, but still best practise.
* putKey. It's very inefficient to generate the checksum before the upload, as 
it means 1x scan of the data before the upload. This won't scale efficiently to 
multiGB uploads. What about: calculate during the put and have it returned by 
the endpoint; client to verify the resulting value. This does mean that a 
corrupted PUT will overwrite the dest data.
* If you do use the checksum, verify that the result of an invalid checksum on 
PUT returns an exception which is handled in OzoneBucket.executePutKey(), that 
is: 400, 422 or whatever is sent back by the web engine. Of course, an invalid 
checksum must trigger a failure.
* listVolumes: what's the max # of volumes? Can all volumes be returned in a 
single 4KB payload?
* getKey: verify the content type; require content-length & use it when reading the file. 
Otherwise you won't pick up a broken GET.
* 

[jira] [Commented] (HDFS-7240) Object store in HDFS

2017-11-01 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235080#comment-16235080
 ] 

Konstantin Shvachko commented on HDFS-7240:
---

??I hope this addresses your concerns.??

I don't think _that_ addressed any of my concerns.
* Ozone by itself does not solve any of HDFS's problems. It uses an HDFS-agnostic, 
S3-like API, and I cannot use it on my clusters, unless I can convince thousands 
of my users to rewrite their thousands of applications, along with the existing 
computational frameworks (YARN, Hive, Pig, Spark, ...) created over the past 10 years.
* I was talking about the futuristic architecture, where you start using Ozone for 
block management and rewrite the NameNode to store its namespace in LevelDB, if 
this is still your plan. I agree this architecture solves the object-count 
problem. But it does not solve the problem of scaling RPC requests, which is 
more important to me than the # of objects, since you still cannot grow the 
cluster beyond a single NameNode's RPC-processing capability.

> Object store in HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
>Priority: Major
> Attachments: HDFS-7240.001.patch, HDFS-7240.002.patch, 
> HDFS-7240.003.patch, HDFS-7240.003.patch, HDFS-7240.004.patch, 
> Ozone-architecture-v1.pdf, Ozonedesignupdate.pdf, ozone_user_v0.pdf
>
>
> This jira proposes to add object store capabilities into HDFS. 
> As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-7240) Object store in HDFS

2017-11-01 Thread Anu Engineer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234959#comment-16234959
 ] 

Anu Engineer commented on HDFS-7240:


[~ste...@apache.org] Thank you for the comments. 
bq. For now, biggest issue I have is that OzoneException needs to become an IOE
I have filed HDFS-12755 for converting the OzoneException to an IOException. 

bq. What's your scale limit? I see a single PUT for the upload, GET path > tmp 
in open() . Is there a test for different sizes of file?
We have tested with different sizes from 1-byte files to 2 GB. There is no size 
limit imposed by the ozone architecture. However, we have always planned to follow 
the S3 limit of 5 GB. We can certainly add tests for different sizes of files -- 
but creating these data files during unit tests takes time. We have strived to 
keep the unit tests of ozone under 4 mins so far, and large key sizes add 
prohibitive unit test times. So our approach is to use Corona, which is a 
load-generation tool for ozone; we run this 4 times daily with different key 
sizes. It is trivial to set up and run.

For the comments on the OzoneFileSystem, I will let the appropriate person 
respond.





> Object store in HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
>Priority: Major
> Attachments: HDFS-7240.001.patch, HDFS-7240.002.patch, 
> HDFS-7240.003.patch, HDFS-7240.003.patch, HDFS-7240.004.patch, 
> Ozone-architecture-v1.pdf, Ozonedesignupdate.pdf, ozone_user_v0.pdf
>
>
> This jira proposes to add object store capabilities into HDFS. 
> As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-7240) Object store in HDFS

2017-11-01 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234770#comment-16234770
 ] 

Steve Loughran commented on HDFS-7240:
--

I'm starting with hadoop-common and hadoop-ozone; more to follow on thursday.

For now, the biggest issue I have is that OzoneException needs to become an IOE, 
simplifying exception handling all round, preserving information, not losing 
stack traces, and generally leading to happy support teams as well as 
developers. Changing the base class isn't itself traumatic, but it will 
affect the client code, as there's almost no longer any need to catch & wrap 
things.


Other: What's your scale limit? I see a single PUT for the upload, and GET path > 
tmp in open(). Is there a test for different sizes of files?

h2. hadoop-common

h3. Config


I've filed some comments on the newly created HADOOP-15007, "Stabilize and document 
the Configuration tag element", to cover making sure that there are tests & 
docs for this to go in.

* HDFSPropertyTag: s/DEPRICATED/DEPRECATED/
* OzonePropertyTag: s/there/their/
* OzoneConfig Property.toString() is going to be "key valuenull" if there is no 
tag defined. Missing a space?



h3. FileUtils

minor: imports all shuffled about compared to trunk & branch-2. revert.

h3. OzoneException

This is its own exception, not an IOE, and at least in OzoneFileSystem the 
process to build an IOE from it invariably loses the inner stack trace and all 
meaningful information about the exception type. Equally, OzoneBucket catches 
all forms of IOException and converts them to an {{OzoneRestClientException}}.

We don't need to do this.

It loses stack trace data, causes confusion, and is already making the client 
code over-complex: catching IOEs, wrapping them in OzoneException, then catching 
OzoneException and converting back to an IOE, at which point all core information is 
lost.

1. Make this a subclass of IOE, consistent with the rest of our code, and then 
clients can throw it up untouched, except in the special case that they need to 
perform some form of exception translation.
1. Except for (any?) special cases, pass up IOEs raised in the HTTP client as 
is.


Also:
* Confused by the overriding of message/getMessage(). Is it for serialization?
* Consider adding a setMessage(String format, String... args) and calling 
String.format: it would tie in with uses in the code.
* When setting the nested exception, also call setMessage() (so the full stack 
and message are kept) and handle the case where the exception returns null for 
getMessage(), e.g.:

{code}
OzoneException initCause(Throwable t) {
  super.initCause(t);
  // fall back to toString() when the cause has no message
  setMessage(t.getMessage() != null ? t.getMessage() : t.toString());
  return this;
}
{code}
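
For reference, a minimal sketch of the direction being argued for here (not the 
actual class): once OzoneException extends IOException, wrapping preserves the 
cause and callers can simply rethrow.

{code}
import java.io.IOException;

// Sketch only: OzoneException as an IOException subclass, so client code can
// let it propagate and the nested stack trace survives.
public class OzoneException extends IOException {
  public OzoneException(String message) {
    super(message);
  }
  public OzoneException(String message, Throwable cause) {
    super(message, cause); // keeps the original exception and its stack trace
  }
}
{code}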

h2. OzoneFileSystem

h3. general


* Various places use LOG.info("text " + something); they should all move to 
LOG.info("text {}", something).
* Once OzoneException -> IOE, you can cut the catch-and-translate here.
* Qualify paths before all uses (a sketch follows this list). That's needed to stop them being relative, and 
to catch things like someone calling ozfs.rename("o3://bucket/src", 
"s3a://bucket/dest"), delete("s3a://bucket/path"), etc., as well as problems 
with validation happening before paths are made absolute.
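
A minimal sketch of the path-qualification point, using the standard 
{{Path.makeQualified(URI, Path)}} helper; the o3:// URI and paths below are only 
illustrative.

{code}
import java.net.URI;
import org.apache.hadoop.fs.Path;

// Sketch only: qualifying a path against the filesystem URI and working directory
// before any validation, so relative paths and wrong-scheme paths are caught early.
public class QualifyPathSketch {
  public static void main(String[] args) {
    URI fsUri = URI.create("o3://bucket1/");
    Path workingDir = new Path("/user/alice");
    Path relative = new Path("data/file.txt");
    // Prints o3://bucket1/user/alice/data/file.txt
    System.out.println(relative.makeQualified(fsUri, workingDir));
  }
}
{code}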



* {{RenameIterator.iterate()}} is going to log @ warn whenever it can't 
delete a temp file because it doesn't exist, which may be a distraction in 
failures. Better: {{if (!tmpFile.delete() && tmpFile.exists())}}, as that will 
only warn if the temp file is actually there.

h3. OzoneFileSystem.rename(). 
Rename() is the operation to fear on an object store. I haven't looked at it in 
full detail.
* Qualify all the paths before doing directory validation. Otherwise you can 
defeat the "don't rename into self" checks with e.g. rename("/path/src", 
"/path/../path/src/dest").
* Log @ debug all the paths taken before returning so you can debug if needed.
* S3A rename ended up having a special RenameFailedException() which 
innerRename() raises, with text and a return code; outer rename logs the text and 
returns the return code (a sketch follows this list). This means that all failing paths have an exception 
clearly thrown, and when we eventually make rename/3 public, it's lined up to 
throw exceptions back to the caller. Consider copying this code.
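
A minimal sketch of that split, with hypothetical names and a single illustrative 
check (modelled on the S3A pattern, not actual Ozone code):

{code}
import java.io.IOException;

// Sketch only: innerRename() throws a RenameFailedException carrying both a reason
// and the boolean result; the outer rename() logs the reason and honours the
// boolean contract of FileSystem.rename().
public class RenameSketch {
  static class RenameFailedException extends IOException {
    final boolean result;
    RenameFailedException(String reason, boolean result) {
      super(reason);
      this.result = result;
    }
  }

  static boolean innerRename(String src, String dst) throws RenameFailedException {
    if (dst.startsWith(src + "/")) {
      throw new RenameFailedException("cannot rename " + src + " under itself: " + dst, false);
    }
    return true;
  }

  static boolean rename(String src, String dst) {
    try {
      return innerRename(src, dst);
    } catch (RenameFailedException e) {
      System.err.println("rename(" + src + ", " + dst + "): " + e.getMessage());
      return e.result;
    }
  }

  public static void main(String[] args) {
    System.out.println(rename("/path/src", "/path/src/dest")); // false
    System.out.println(rename("/a/src", "/b/dest"));           // true
  }
}
{code}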

h3. OzoneFileSystem.delete

* Qualify the path before use.
* Don't log at error if you can't delete a nonexistent path; delete() is used 
everywhere for silent cleanup. Cut it.

h3. OzoneFileSystem.ListStatusIterator

* make status field final

h3. OzoneFileSystem.mkdir

Liked your algorithm here; it took me a moment to understand how rollback didn't 
need to track all created directories. Nice.
* Do qualify the path first.

h3. OzoneFileSystem.getFileStatus

{{getKeyInfo()}} catches all exceptions and maps them to null, which is interpreted 
as "not found" and eventually surfaces as an FNFE. This is misleading if the failure is 
for any other reason.

Once OzoneException -> IOException, {{getKeyInfo()}} should only catch & 
downgrade the explicit not found (404?).
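
A minimal sketch of that narrowing, with a hypothetical status-carrying exception 
and lookup standing in for the real client call (not actual Ozone code): only a 
404 becomes null/FNFE, everything else propagates with its original cause.

{code}
import java.io.FileNotFoundException;
import java.io.IOException;

// Sketch only: downgrade just the "not found" case; rethrow every other failure.
public class NotFoundHandlingSketch {
  static class StatusCodeException extends IOException {
    final int status;
    StatusCodeException(int status, String msg) { super(msg); this.status = status; }
  }

  /** Returns null only for a genuine 404; auth, network and server errors are rethrown. */
  static String getKeyInfo(String key) throws IOException {
    try {
      return lookup(key);
    } catch (StatusCodeException e) {
      if (e.status == 404) {
        return null;
      }
      throw e; // keep the real cause for every other failure
    }
  }

  static void getFileStatus(String key) throws IOException {
    if (getKeyInfo(key) == null) {
      throw new FileNotFoundException("No such key: " + key);
    }
  }

  // Hypothetical backend call, present only to make the sketch self-contained.
  static String lookup(String key) throws StatusCodeException {
    throw new StatusCodeException(404, "key not found: " + key);
  }
}
{code}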

[jira] [Commented] (HDFS-7240) Object store in HDFS

2017-10-30 Thread Jitendra Nath Pandey (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16225532#comment-16225532
 ] 

Jitendra Nath Pandey commented on HDFS-7240:


[~shv] Thank you for taking the time to review ozone. I appreciate your 
comments and questions.

{quote} 
There are two main limitations in HDFS
a) The throughput of Namespace operations. Which is limited by the number of 
RPCs the NameNode can handle
b) The number of objects (files + blocks) the system can maintain. Which is 
limited by the memory size of the NameNode.
{quote}

I agree completely. We believe ozone attempts to address both these issues for HDFS.

Let us look at the number-of-objects problem first. Ozone directly addresses the 
scalability of the number of blocks by introducing storage containers that can hold 
multiple blocks together. The earlier efforts on this were complicated by the 
fact that the block manager and the namespace are intertwined in the HDFS Namenode; 
there have been efforts in the past to separate the block manager from the namespace, 
e.g. HDFS-5477. Ozone addresses this problem by cleanly separating the block layer. 
Separation of the block layer also addresses file/directory scalability, 
because it frees the block map from the namenode.

A separate block layer relieves the namenode from handling block reports, IBRs, 
heartbeats, the replication monitor etc., and thus reduces contention on the 
FSNamesystem lock and significantly reduces GC pressure on the namenode. 
These improvements will greatly help the RPC performance of the Namenode.

bq. Ozone is probably just the first step in rebuilding HDFS under a new 
architecture. With the next steps presumably being HDFS-10419 and HDFS-8. 
The design doc for the new architecture has never been published. 
We do believe that the Namenode can leverage ozone's storage container 
layer; however, that is also a big effort. We would like to first have the block 
layer stabilized in ozone before taking that up. That said, we would certainly 
support any community effort on that, and in fact it was brought up in the last BoF 
session at the summit.

Big data is evolving rapidly. We see our customers needing scalable file 
systems, object stores (like S3), and block stores (for Docker and VMs). Ozone 
improves HDFS in two ways: it addresses the throughput and scale issues of HDFS, 
and enriches it with newer capabilities.


bq. Ozone is a big enough system to deserve its own project.

I took a quick look at the core code in ozone and the cloc command reports 
22,511 lines of functionality changes in Java.

This patch also brings in web framework code like Angular.js, and that brings in 
a bunch of CSS and JS files that contribute to the size of the patch; the 
rest are test and documentation changes.

I hope this addresses your concerns.


> Object store in HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: HDFS-7240.001.patch, HDFS-7240.002.patch, 
> HDFS-7240.003.patch, HDFS-7240.003.patch, HDFS-7240.004.patch, 
> Ozone-architecture-v1.pdf, Ozonedesignupdate.pdf, ozone_user_v0.pdf
>
>
> This jira proposes to add object store capabilities into HDFS. 
> As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-7240) Object store in HDFS

2017-10-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16225462#comment-16225462
 ] 

Hadoop QA commented on HDFS-7240:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 12m 
44s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
1s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 132 new or modified 
test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
27s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m 
58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 
 7s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 14m 
32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m  7s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-project hadoop-tools/hadoop-tools-dist hadoop-tools 
hadoop-client-modules/hadoop-client-minicluster 
hadoop-client-modules/hadoop-client-check-test-invariants hadoop-dist . {color} 
|
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  7m  
1s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  7m 
27s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
52s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 37m 
 0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 17m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 17m 
26s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 17m 26s{color} 
| {color:red} root generated 17 new + 1231 unchanged - 17 fixed = 1248 total 
(was 1248) {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
3m 24s{color} | {color:orange} root: The patch generated 9 new + 858 unchanged 
- 16 fixed = 867 total (was 874) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 16m  
1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} shellcheck {color} | {color:green}  0m 
29s{color} | {color:green} There were no new shellcheck issues. {color} |
| {color:green}+1{color} | {color:green} shelldocs {color} | {color:green}  0m 
15s{color} | {color:green} The patch generated 0 new + 100 unchanged - 4 fixed 
= 100 total (was 104) {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
2s{color} | {color:red} The patch 1 line(s) with tabs. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m 
28s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 50s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-project . hadoop-client-modules/hadoop-client-check-test-invariants 
hadoop-client-modules/hadoop-client-minicluster hadoop-dist hadoop-tools 
hadoop-tools/hadoop-tools-dist {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  9m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  9m  
0s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 

[jira] [Commented] (HDFS-7240) Object store in HDFS

2017-10-28 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16223735#comment-16223735
 ] 

Konstantin Shvachko commented on HDFS-7240:
---

It is an interesting question whether Ozone should be a part of Hadoop. There 
are two main reasons why I think it should not.

# With close to 500 sub-tasks, with 6 MB of code changes, and with a sizable 
community behind it, it looks to me like a whole new project.
It is essentially a new storage system, with a different architecture than HDFS 
and separate S3-like APIs. This is really great - the world sure 
needs more distributed file systems. But it is not clear why Ozone should 
co-exist with HDFS under the same roof.
# Ozone is probably just the first step in rebuilding HDFS under a new 
architecture, with the next steps presumably being HDFS-10419 and HDFS-8.
The design doc for the new architecture has never been published. I can only 
assume, based on some presentations and personal communications, that the idea is 
to use Ozone as block storage and re-implement the NameNode so that it stores 
only a partial namespace in memory, while the bulk of it (cold data) is 
persisted to local storage.
Such an architecture makes me wonder if it solves Hadoop's main problems. There 
are two main limitations in HDFS:
a) _The throughput of Namespace operations_. Which is limited by the number of 
RPCs the NameNode can handle
b) _The number of objects_ (files + blocks) the system can maintain. Which is 
limited by the memory size of the NameNode.
The RPC performance (a) is more important for Hadoop scalability than the 
object count (b), with read RPCs being the main priority.
The new architecture targets the object count problem, but at the expense of 
RPC throughput, which seems to be the wrong resolution of the tradeoff.
Also, based on the usage patterns on our large clusters, we read up to 90% of the 
data we write, so cold data is a small fraction and most of it must be cached.

To summarize:
- Ozone is a big enough system to deserve its own project.
- The architecture that Ozone leads to does not seem to solve the intrinsic 
problems of current HDFS.

> Object store in HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: HDFS-7240.001.patch, HDFS-7240.002.patch, 
> HDFS-7240.003.patch, HDFS-7240.003.patch, HDFS-7240.004.patch, 
> Ozone-architecture-v1.pdf, Ozonedesignupdate.pdf, ozone_user_v0.pdf
>
>
> This jira proposes to add object store capabilities into HDFS. 
> As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-7240) Object store in HDFS

2017-10-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16223291#comment-16223291
 ] 

Hadoop QA commented on HDFS-7240:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
21s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 132 new or modified 
test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
38s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 
18s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 11m 
28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  9m  
4s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
8m 45s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-project hadoop-tools/hadoop-tools-dist hadoop-tools 
hadoop-client-modules/hadoop-client-minicluster 
hadoop-client-modules/hadoop-client-check-test-invariants hadoop-dist . {color} 
|
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  5m 
22s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
46s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 11m  
7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 11m  
7s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 11m  7s{color} 
| {color:red} root generated 17 new + 1231 unchanged - 17 fixed = 1248 total 
(was 1248) {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
2m 27s{color} | {color:orange} root: The patch generated 10 new + 859 unchanged 
- 16 fixed = 869 total (was 875) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  9m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} shellcheck {color} | {color:green}  0m 
25s{color} | {color:green} There were no new shellcheck issues. {color} |
| {color:green}+1{color} | {color:green} shelldocs {color} | {color:green}  0m 
10s{color} | {color:green} The patch generated 0 new + 100 unchanged - 4 fixed 
= 100 total (was 104) {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
2s{color} | {color:red} The patch 1 line(s) with tabs. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m 
18s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 24s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-project . hadoop-client-modules/hadoop-client-check-test-invariants 
hadoop-client-modules/hadoop-client-minicluster hadoop-dist hadoop-tools 
hadoop-tools/hadoop-tools-dist {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  6m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  5m 
41s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | 

[jira] [Commented] (HDFS-7240) Object store in HDFS

2017-10-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219730#comment-16219730
 ] 

Hadoop QA commented on HDFS-7240:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 11m 
48s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} shelldocs {color} | {color:blue}  0m 
18s{color} | {color:blue} Shelldocs was not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
1s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 132 new or modified 
test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
28s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 
31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 16m 
56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 14m 
54s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 58s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-project hadoop-tools/hadoop-tools-dist hadoop-tools 
hadoop-client-modules/hadoop-client-minicluster 
hadoop-client-modules/hadoop-client-check-test-invariants hadoop-dist . {color} 
|
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  7m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  8m 
12s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m  
0s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 35m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 18m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 18m 
43s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 18m 43s{color} 
| {color:red} root generated 17 new + 1231 unchanged - 17 fixed = 1248 total 
(was 1248) {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
3m 32s{color} | {color:orange} root: The patch generated 61 new + 858 unchanged 
- 16 fixed = 919 total (was 874) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 15m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} shellcheck {color} | {color:green}  0m 
29s{color} | {color:green} There were no new shellcheck issues. {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
3s{color} | {color:red} The patch 1 line(s) with tabs. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m 
37s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 37s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-project . hadoop-client-modules/hadoop-client-check-test-invariants 
hadoop-client-modules/hadoop-client-minicluster hadoop-dist hadoop-tools 
hadoop-tools/hadoop-tools-dist {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 10m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  8m 
57s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}170m 24s{color} 
| {color:red} root in the patch 

[jira] [Commented] (HDFS-7240) Object store in HDFS

2017-10-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16200998#comment-16200998
 ] 

Hadoop QA commented on HDFS-7240:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
25s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} shelldocs {color} | {color:blue}  0m 
17s{color} | {color:blue} Shelldocs was not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
1s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 118 new or modified 
test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
25s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 
 1s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m 
50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  9m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m  6s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-project hadoop-tools/hadoop-tools-dist hadoop-tools 
hadoop-client-modules/hadoop-client-minicluster 
hadoop-client-modules/hadoop-client-check-test-invariants . {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  5m 
41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  5m 
30s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
48s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 13m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 13m 
17s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 13m 17s{color} 
| {color:red} root generated 160 new + 1115 unchanged - 156 fixed = 1275 total 
(was 1271) {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
2m 32s{color} | {color:orange} root: The patch generated 60 new + 872 unchanged 
- 16 fixed = 932 total (was 888) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 11m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} shellcheck {color} | {color:green}  0m 
26s{color} | {color:green} There were no new shellcheck issues. {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
2s{color} | {color:red} The patch has 11 line(s) that end in whitespace. Use 
git apply --whitespace=fix <>. Refer 
https://git-scm.com/docs/git-apply {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
2s{color} | {color:red} The patch 1 line(s) with tabs. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m 
18s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 50s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-project . hadoop-client-modules/hadoop-client-check-test-invariants 
hadoop-client-modules/hadoop-client-minicluster hadoop-tools 
hadoop-tools/hadoop-tools-dist {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  7m 
12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  6m  

[jira] [Commented] (HDFS-7240) Object store in HDFS

2017-08-10 Thread Anu Engineer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16122281#comment-16122281
 ] 

Anu Engineer commented on HDFS-7240:


[~johnament] Would you care to comment on the ASF slack usage for people 
without an Apache email ID? 

> Object store in HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: Ozone-architecture-v1.pdf, Ozonedesignupdate.pdf, 
> ozone_user_v0.pdf
>
>
> This jira proposes to add object store capabilities into HDFS. 
> As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-7240) Object store in HDFS

2017-08-10 Thread Anu Engineer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16122004#comment-16122004
 ] 

Anu Engineer commented on HDFS-7240:


[~steve_l], [~elek] I have asked the slack community how this can be solved. I 
am hopeful there is a way to invite people without an Apache email ID. I will 
update this discussion when I hear back from the community in slack. If we 
cannot add people without an apache ID, I will move this to IRC as Steve suggested.

> Object store in HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: Ozone-architecture-v1.pdf, Ozonedesignupdate.pdf, 
> ozone_user_v0.pdf
>
>
> This jira proposes to add object store capabilities into HDFS. 
> As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-7240) Object store in HDFS

2017-08-10 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16121940#comment-16121940
 ] 

Steve Loughran commented on HDFS-7240:
--

Putting my ASF process hat on: it is important that anyone interested in 
collaborating is allowed to join in, especially as real-time chats tend to be 
exclusive enough anyway.

IRC channel, perhaps?

> Object store in HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: Ozone-architecture-v1.pdf, Ozonedesignupdate.pdf, 
> ozone_user_v0.pdf
>
>
> This jira proposes to add object store capabilities into HDFS. 
> As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-7240) Object store in HDFS

2017-08-09 Thread Elek, Marton (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16120502#comment-16120502
 ] 

Elek, Marton commented on HDFS-7240:


Yeah, but it seems that registration requires an {{@apache.org}} email 
address, and there is no information about invites anywhere.

> Object store in HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: Ozone-architecture-v1.pdf, Ozonedesignupdate.pdf, 
> ozone_user_v0.pdf
>
>
> This jira proposes to add object store capabilities into HDFS. 
> As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-7240) Object store in HDFS

2017-08-09 Thread Anu Engineer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16120438#comment-16120438
 ] 

Anu Engineer commented on HDFS-7240:


[~elek] Nope, all are welcome.

> Object store in HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: Ozone-architecture-v1.pdf, Ozonedesignupdate.pdf, 
> ozone_user_v0.pdf
>
>
> This jira proposes to add object store capabilities into HDFS. 
> As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-7240) Object store in HDFS

2017-08-09 Thread Elek, Marton (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16120436#comment-16120436
 ] 

Elek, Marton commented on HDFS-7240:


Is this chat for committers only?

> Object store in HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: Ozone-architecture-v1.pdf, Ozonedesignupdate.pdf, 
> ozone_user_v0.pdf
>
>
> This jira proposes to add object store capabilities into HDFS. 
> As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-7240) Object store in HDFS

2017-06-09 Thread Anu Engineer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16044831#comment-16044831
 ] 

Anu Engineer commented on HDFS-7240:


I have opened an ozone channel on the ASF slack. Since we are deploying and 
testing ozone, real-time communication is very useful for flagging issues as we 
see them.

Please sign up at the following page using your apache ID, and join the #Ozone 
channel if you would like to be notified about how the testing and deployment 
of ozone is going.

https://the-asf.slack.com/signup

> Object store in HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: Ozone-architecture-v1.pdf, Ozonedesignupdate.pdf, 
> ozone_user_v0.pdf
>
>
> This jira proposes to add object store capabilities into HDFS. 
> As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-7240) Object store in HDFS

2016-06-13 Thread Kai Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15328259#comment-15328259
 ] 

Kai Zheng commented on HDFS-7240:
-

Thanks all for the discussion and [~anu] for this nice summary.

bq. To support Erasure coding, SCM will have to return more than 3 machines; 
let us say we were using a 6 + 3 model of erasure coding, then a container is 
spread across nine machines. Once we modify SCM to support this model, the 
container client will have to write data to those locations and update the RAFT 
state with the metadata of this block.
This looks like support for striped erasure coding on the client when 
putting/updating a k/v in the store, right? For small objects, the write will 
trigger the relatively expensive work of encoding and writing to 6+3 locations, 
so I have doubts about the performance/overhead versus the benefit. For large 
objects, it sounds fine. So, as we did for striped files, users should also be 
able to opt in or out of striping according to their bucket conditions, I guess.
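
To put rough numbers on the small-object concern, a tiny illustrative sketch (plain Java; the 100 KB object size and the ideal-overhead arithmetic are assumptions for illustration, not figures from the design doc) comparing 3x replication with a 6+3 stripe:

{code:java}
public class EcWriteFanOutSketch {
  public static void main(String[] args) {
    int data = 6, parity = 3;            // the 6 + 3 model from the discussion
    long object = 100L * 1024;           // a 100 KB object (assumed size)

    // 3x replication: ~3x raw bytes, and each write touches 3 datanodes.
    System.out.printf("replication: raw=%d bytes, nodes touched=%d%n", 3 * object, 3);

    // Ideal 6+3 striping: ~1.5x raw bytes, but every write fans out to 9 nodes
    // and pays the encoding cost, which is what makes small objects expensive.
    System.out.printf("striping:    raw=%d bytes, nodes touched=%d%n",
        object * (data + parity) / data, data + parity);
  }
}
{code}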

For HDFS files, in addition to striping, there is another way to do erasure 
coding at the block level, as discussed in HDFS-8030, mainly targeting the 
conversion of old/cold data from replicas into erasure-coded form to save 
storage. How about this approach in Ozone? Would we have old/cold buckets that 
can be frozen and never updated again? I'm not sure about this from the users' 
point of view, but we might not reuse the same sets of buckets/containers 
across many years, right? 

> Object store in HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: Ozone-architecture-v1.pdf, Ozonedesignupdate.pdf, 
> ozone_user_v0.pdf
>
>
> This jira proposes to add object store capabilities into HDFS. 
> As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-7240) Object store in HDFS

2016-06-10 Thread Anu Engineer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15325389#comment-15325389
 ] 

Anu Engineer commented on HDFS-7240:


*Ozone meeting notes – Jun, 9th, 2016*

Attendees: ??Thomas Demoor, Arpit Agarwal, JV Jujjuri, Jing Zhao, Andrew Wang, 
Lei Xu, Aaron Myers, Colin McCabe, Aaron Fabbri, Lars Francke, Stiwari, Anu 
Engineer??


We started the discussion with how Erasure coding will be supported in ozone. 
This was quite a lengthy discussion, taking over half the meeting time. Jing 
Zhao explained the high-level architecture and pointed to similar work done by 
Dropbox. 

We then dived into the details of this problem, since we wanted to make sure 
that supporting Erasure coding will be easy and efficient in ozone.

Here are the major points:

SCM currently supports a simple replicated container. To support Erasure 
coding, SCM will have to return more than 3 machines; let us say we were using 
a 6 + 3 model of erasure coding, then a container is spread across nine 
machines. Once we modify SCM to support this model, the container client will 
have to write data to those locations and update the RAFT state with the 
metadata of this block.

When a file read happens in ozone, the container client will go to KSM/SCM and 
find out which container to read the metadata from. The metadata will tell the 
client where the actual data resides, and the client will reconstruct the data 
from the EC-coded blocks.

We all agreed that getting EC done for ozone is an important goal, and to get 
to that objective, we will need to get the SCM and KSM done first.

We also discussed how small files will cause an issue with EC, especially since 
a container would pack lots of them together, and how this would eventually 
require compaction due to deletes.

Eddy brought up the issue of making sure that data is spread evenly across the 
cluster. Currently our plan is to maintain a list of machines based on 
container reports. The container reports would contain the number of keys, the 
bytes stored, and the number of accesses to that container. Based on this, SCM 
would be able to maintain a list that allows it to pick under-utilized machines 
from the cluster, thus ensuring a good data spread. Andrew Wang pointed out 
that counting I/O requests is not good enough and we actually need the number 
of bytes read/written. That is an excellent suggestion; we will modify 
container reports to carry this information and will use it in SCM's allocation 
decisions.
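
As a rough illustration of the allocation idea above, a minimal sketch (the {{ContainerReport}} record and {{pickLeastLoaded}} helper are hypothetical names, not the actual SCM code) that picks the least-loaded datanodes from per-container byte counts:

{code:java}
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical, simplified view of a container report; the real report format differs.
record ContainerReport(String datanode, long keyCount, long bytesStored, long bytesWritten) {}

class AllocationSketch {
  /** Pick the n datanodes carrying the least stored plus recently written bytes. */
  static List<String> pickLeastLoaded(List<ContainerReport> reports, int n) {
    Map<String, Long> load = reports.stream().collect(Collectors.groupingBy(
        ContainerReport::datanode,
        Collectors.summingLong(r -> r.bytesStored() + r.bytesWritten())));
    return load.entrySet().stream()
        .sorted(Map.Entry.comparingByValue())   // least-loaded datanodes first
        .limit(n)
        .map(Map.Entry::getKey)
        .collect(Collectors.toList());
  }
}
{code}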

Eddy followed up with a question about how something like Hive would behave 
over ozone. Say Hive creates a bucket, creates lots of tables, and after the 
work is done deletes all the tables. Ozone would have allocated containers to 
accommodate the overflowing bucket, so it is possible to have many empty 
containers on an ozone cluster.

SCM is free to delete any container that does not have a key. This is because, 
in the ozone world, metadata exists inside a container. Therefore, if a 
container is empty, we know that no objects (ozone volume, bucket or key) exist 
in that container, which gives us the freedom to delete any empty container. 
This is how containers would be removed in the ozone world.

Andrew Wang pointed out that it is possible to create thousands of volumes and 
map them to a similar number of containers. He was worried that this would 
become a scalability bottleneck. While this is possible, in reality, if you 
have a cluster with only volumes, then KSM is free to map many ozone volumes to 
a single container. We agreed that if this indeed becomes a problem, we can 
write a simple compaction tool for KSM which will move all these volumes into a 
few containers. SCM's container deletion would then kick in and clean up the 
cluster.

We iterated through all the scenarios for merge and concluded that, for v1, 
ozone can live without supporting merges of containers.

Then Eddy pointed out that by switching from hash partitions to range 
partitions we have introduced variability in the list operations for a 
container. Since the reason for switching to range partitioning is not 
documented on JIRA, we discussed the issue that caused the switch.

The original design called for hash partitioning, with operations like list 
relying on a secondary index. This would create an eventual-consistency model 
where you might create a key, but it becomes visible in the namespace only 
after the secondary index is updated. Colin argued that it is easier for our 
users to see consistent namespace operations. This is the core reason why we 
moved to range partitions.
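
To make that trade-off concrete, a toy sketch with plain Java collections (not Ozone code): with a range-partitioned, sorted key space a newly created key is visible to the very next list, whereas a hash-partitioned design with an asynchronously updated secondary index can miss it:

{code:java}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ListConsistencySketch {
  public static void main(String[] args) {
    // Range-partitioned design: the list reads the same sorted structure the put updated.
    TreeMap<String, String> bucket = new TreeMap<>();
    bucket.put("key1", "value1");
    System.out.println("range list: " + bucket.keySet());    // [key1] -- visible immediately

    // Hash + secondary-index design: the put lands in the key store first and the
    // listing index is updated asynchronously, so a list can briefly miss the key.
    Map<String, String> keyStore = new HashMap<>();
    List<String> listingIndex = new ArrayList<>();
    keyStore.put("key1", "value1");
    System.out.println("hash list:  " + listingIndex);        // [] until the index catches up
    listingIndex.add("key1");                                  // the eventual index update
  }
}
{code}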

However, range partitions do pose the issue that a bucket might be split across 
a large number of containers, so the list operation does not have fixed-time 
guarantees. The worst-case scenario is a bucket with thousands of 5 GB objects, 
which internally causes the bucket to be mapped over a set of 

[jira] [Commented] (HDFS-7240) Object store in HDFS

2016-06-07 Thread Anu Engineer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15319857#comment-15319857
 ] 

Anu Engineer commented on HDFS-7240:


Just posting a reminder here for the ozone design review. It is scheduled @ Jun 
9, 2016 2:00 PM (GMT-7:00) Pacific Time. 
This meeting is to review ozone's proposed design. Hopefully everyone has had 
a chance to read the posted doc already.

> Object store in HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: Ozone-architecture-v1.pdf, Ozonedesignupdate.pdf, 
> ozone_user_v0.pdf
>
>
> This jira proposes to add object store capabilities into HDFS. 
> As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-7240) Object store in HDFS

2016-06-06 Thread Anu Engineer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15317035#comment-15317035
 ] 

Anu Engineer commented on HDFS-7240:


Hi [~eddyxu], thank you for reviewing the design doc and for your comments. 
Please see my responses below.


bq. Since Ozone has decided to use range partitioning, how would key/data 
distribution achieve balancing from the initial state? For example, a user Foo 
runs Hive and creates 10GB of data; is this data distributed to up to 6 DNs 
(containers)?

You bring up a very valid point. This was the most contentious issue in the 
ozone world for a while. We originally went with a hash partitioning scheme and 
a secondary index because of these concerns. The issue with that approach (and 
very rightly so) was that the secondary index is eventually consistent, which 
makes it hard to use. So we switched over to this scheme. 

So our current thought is this: each of the containers will report its size, 
number of operations, and number of keys to SCM. This will allow SCM to balance 
the allocation of the key space. So if you have a large number of reads and 
writes that are completely independent, they will fill up the cluster/container 
space evenly.

But we have an opposing requirement here: generally there is locality of access 
in the namespace. So in most cases, if you are reading and writing to a bucket, 
it is most efficient to keep that data together.

Now let us look at this specific case: if you have containers configured at, 
say, 2GB, then 10GB of data maps to 5 containers. These containers will be 
spread across a set of machines by SCM's location-choosing algorithm.

bq. Would you explain the benefit of recovering a failed pipeline by using 
parallel writes to all 3 containers? It is not very clear in the design.

The point I was trying to make is that the pipeline relies on a quorum as 
defined by the RSM. 
So if we decide to use this pipeline with RAFT, the pipeline can be broken and 
we will not attempt to heal it. Please let me know if this makes sense. 

bq. How does ozone differentiate a recovery write from a malicious (or buggy) 
re-write?

Thanks for flagging this; right now we do not. We can always prevent it in the 
container layer. It is a small extension to make: we can write to a temporary 
file and replace the original if and only if the hashes match. I will file a 
work item to fix this.
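
A minimal sketch of that temporary-file-plus-hash idea (a hypothetical helper, not the actual container code): the rewritten chunk is staged in a temp file and promoted only if its digest matches the digest recorded for the original chunk:

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

class ChunkRewriteGuard {
  /** Re-write the chunk only when the new bytes hash to the recorded digest. */
  static void recoverChunk(Path chunkFile, byte[] data, byte[] expectedSha256)
      throws IOException, NoSuchAlgorithmException {
    byte[] actual = MessageDigest.getInstance("SHA-256").digest(data);
    if (!Arrays.equals(actual, expectedSha256)) {
      throw new IOException("Rejected re-write: digest does not match the original chunk");
    }
    Path tmp = chunkFile.resolveSibling(chunkFile.getFileName() + ".tmp");
    Files.write(tmp, data);                                   // stage the bytes
    Files.move(tmp, chunkFile, StandardCopyOption.REPLACE_EXISTING,
        StandardCopyOption.ATOMIC_MOVE);                      // promote atomically
  }
}
{code}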

bq. You mentioned that the KSM/SCM separation is for future scalability. Do 
KSM/SCM maintain a 1:1, 1:n or n:m relationship? Though it is not in this 
phase, I'd like to know whether it has been considered. Btw, are they also Raft 
replicated?

KSM:SCM has an n:m relationship, even though in the simplest deployment 
configuration it is 1:1. So yes, it is defined that way. They are always Raft 
replicated.

bq. The raft ring / leader is per-container?

Yes and no. Let me explain this a little more. If you think only in terms of 
the RAFT protocol, then the RAFT leader is per machine set. That is, we are 
going to have a leader for 3 machines (assuming a 3-machine RAFT ring). Now let 
us switch over to a developer's point of view. Someone like me who is writing 
code against containers thinks strictly in terms of containers. So from an 
ozone developer's point of view, we have a Raft leader for a container. In 
other words, containers provide an abstraction that makes you think the RAFT 
protocol is per container, whereas in reality it is a shared ring used by the 
many containers that share those 3 machines. This might be something 
that we want to explore in greater depth during the call.
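
A tiny sketch of that abstraction (hypothetical names, not the real pipeline code): many containers resolve to one Raft ring keyed by the machine set that hosts them, so "the container's leader" is really the leader of that shared ring:

{code:java}
import java.util.Map;
import java.util.Set;

// Hypothetical types for illustration only.
record RaftRing(Set<String> datanodes, String leader) {}

class RingLookupSketch {
  private final Map<Long, Set<String>> containerHosts;   // containerId -> hosting machine set
  private final Map<Set<String>, RaftRing> rings;         // machine set -> its one shared ring

  RingLookupSketch(Map<Long, Set<String>> containerHosts, Map<Set<String>, RaftRing> rings) {
    this.containerHosts = containerHosts;
    this.rings = rings;
  }

  /** What a container client perceives as "the container's Raft leader". */
  String leaderFor(long containerId) {
    return rings.get(containerHosts.get(containerId)).leader();
  }
}
{code}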

bq. For the pipeline, say we have a pipeline A->B->C; if the data write 
succeeds on A->B, and the metadata Raft writes succeed on B,C, IIUC, what would 
be the result for a read request sent to A or C?

I am going to walk through this in a little more detail, so that we are all on 
the same page. 

What you are describing is a situation where the RAFT leader is either B or C 
(since RAFT is an active-leader protocol). For the sake of this illustration, 
let us assume we are talking about two scenarios: one where data is written to 
the leader and one other datanode, and scenario two, where data is written to 
the followers but not to the leader. 

Let us look at both in greater detail.

Case 1: Data is written to machine B (the leader) and machine A. But when the 
RAFT commit happens, machine A is offline and the RAFT data is written to 
machines B and C.

So we have a situation where B is the only machine with both the metadata and 
the data. We deal with this issue in two ways: first, when the commit callback 
happens on C, C will check whether it has the data block, and since it does 
not, it will attempt to copy that block from either B or A.

Also, when A's RAFT ring comes back up, it will catch up with the RAFT log and 

[jira] [Commented] (HDFS-7240) Object store in HDFS

2016-06-05 Thread Lei (Eddy) Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15316202#comment-15316202
 ] 

Lei (Eddy) Xu commented on HDFS-7240:
-

Hi [~anu], thanks a lot for organizing the meeting.

I also have a few questions that will hopefully be answered in the meeting:

* Since Ozone has decided to use range partitioning, how would key/data 
distribution achieve balancing from the initial state? For example, a user Foo 
runs Hive and creates 10GB of data; is this data distributed to up to 6 DNs 
(containers)? 
* Would you explain the benefit of recovering a failed pipeline by using 
parallel writes to all 3 containers? It is not very clear in the design.
* It seems to me that in the new pipeline in Ozone, there are no multiple 
intermediate states for each chunk? 
bq. due to the immutability of chunks, write chunk is an idempotent operation
How does ozone differentiate a recovery write from a malicious (or buggy) 
re-write?
* You mentioned that the KSM/SCM separation is for future scalability. Do 
KSM/SCM maintain a 1:1, 1:n or n:m relationship? Though it is not in this 
phase, I'd like to know whether it has been considered. Btw, are they also Raft 
replicated? 
* The raft ring / leader is per-container? 
* For the pipeline, say we have a pipeline A->B->C; if the data write succeeds 
on A->B, and the metadata Raft writes succeed on B,C, IIUC, what would be the 
result for a read request sent to A or C?
* How are container splits (merges, migrations) handled during writes? 
* Since container size is determined by space usage instead of the # of keys, 
would that result in a large performance variance for the listing operation, 
because {{# of DNs reached for a list operation = total # of keys / (# of keys 
per container)}}? And the # of keys per container is determined by the average 
object size in the container. (See the sketch after this list.)

Thanks.
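
To illustrate the variance in the last question with made-up numbers (the 5 GB container size and the two average object sizes are assumptions, not figures from the design):

{code:java}
class ListFanOutSketch {
  public static void main(String[] args) {
    long totalKeys = 1_000_000;
    long containerBytes = 5L << 30;                                  // assumed 5 GB containers
    for (long avgObjectBytes : new long[]{1L << 20, 100L << 20}) {   // 1 MB vs 100 MB objects
      long keysPerContainer = containerBytes / avgObjectBytes;
      long containersTouched = totalKeys / keysPerContainer;
      System.out.printf("avgObject=%d MB -> keys/container=%d, containers touched by list=%d%n",
          avgObjectBytes >> 20, keysPerContainer, containersTouched);
    }
  }
}
{code}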


> Object store in HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: Ozone-architecture-v1.pdf, Ozonedesignupdate.pdf, 
> ozone_user_v0.pdf
>
>
> This jira proposes to add object store capabilities into HDFS. 
> As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-7240) Object store in HDFS

2016-06-02 Thread Nikhil Joshi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15312249#comment-15312249
 ] 

Nikhil Joshi commented on HDFS-7240:


Unsubscribe


Nikhil
@nikhilj0shi 




> Object store in HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: Ozone-architecture-v1.pdf, Ozonedesignupdate.pdf, 
> ozone_user_v0.pdf
>
>
> This jira proposes to add object store capabilities into HDFS. 
> As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-7240) Object store in HDFS

2016-06-01 Thread Anu Engineer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15311534#comment-15311534
 ] 

Anu Engineer commented on HDFS-7240:


As promised earlier, we would like to host an ozone design review meeting. 
The agenda is to discuss the ozone design and future work.
{noformat}
Anu Engineer is inviting you to a scheduled Zoom meeting. 

Topic: Ozone design review
Time: Jun 9, 2016 2:00 PM (GMT-7:00) Pacific Time (US and Canada) 

Join from PC, Mac, Linux, iOS or Android: 
https://hortonworks.zoom.us/j/679978944

Or join by phone:

+1 646 558 8656 (US Toll) or +1 408 638 0968 (US Toll)
+1 855 880 1246 (US Toll Free)
+1 888 974 9888 (US Toll Free)
Meeting ID: 679 978 944 
International numbers available: 
https://hortonworks.zoom.us/zoomconference?m=VJJvnfHtsvBoBXaaCftwMsOm8b-4ZkBj 
{noformat}

[~drankye] [~steve_l] [~ajisakaa] My apologies for a very North America-centric 
meeting time; we will host a follow-up meeting for contributors from Asia and 
Europe. 

> Object store in HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: Ozone-architecture-v1.pdf, Ozonedesignupdate.pdf, 
> ozone_user_v0.pdf
>
>
> This jira proposes to add object store capabilities into HDFS. 
> As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-7240) Object store in HDFS

2016-05-27 Thread Anu Engineer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15304408#comment-15304408
 ] 

Anu Engineer commented on HDFS-7240:


[~lars_francke] Sorry for the confusion. {{Ozone-architecture-v1.pdf}} is the 
original ozone architecture document that the update refers to, so you are on 
the right track. This is an update of the original design, in which we propose 
that SCM -- which was similar to the Namenode in that it did both namespace 
management and block management -- be separated into KSM and SCM. So most of 
the original document stands as is. This design update also contains a section 
on the data pipeline with details on how we would like to use an RSM to get 
strong consistency. The talk we gave at ApacheCon ( 
http://schd.ws/hosted_files/apachebigdata2016/fc/Hadoop%20Object%20Store%20-%20Ozone.pdf
 ) assumes no prior knowledge of the current state of ozone. If you like, you 
can look at those slides and then read the updated design; that will also give 
you the continuity to follow the updated design doc.


> Object store in HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: Ozone-architecture-v1.pdf, Ozonedesignupdate.pdf, 
> ozone_user_v0.pdf
>
>
> This jira proposes to add object store capabilities into HDFS. 
> As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-7240) Object store in HDFS

2016-05-27 Thread Lars Francke (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15303963#comment-15303963
 ] 

Lars Francke commented on HDFS-7240:


I'm trying to get up to speed on the current proposal. Your new document starts 
with

{quote}This document is an Ozone design update that builds on the original 
Ozone Architecture ​and describes in greater detail Ozone namespace management 
and data replication consistency.{quote}

Does that mean I can disregard everything in the {{Ozone-architecture-v1.pdf}} 
document? I have to admit to being a bit confused about what the current state 
is. I started reading both.

> Object store in HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: Ozone-architecture-v1.pdf, Ozonedesignupdate.pdf, 
> ozone_user_v0.pdf
>
>
> This jira proposes to add object store capabilities into HDFS. 
> As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-7240) Object store in HDFS

2016-05-26 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15303085#comment-15303085
 ] 

Colin Patrick McCabe commented on HDFS-7240:


bq. Correct me if I am wrong – before Andrew Wang's contribution, symlink was 
somehow working (based on Eli Collins's work). After Andrew's work, we had no 
choice but to disable the symlink feature. In this sense, symlink became even 
worse. Anyway, Andrew/Eli, any plan to fix symlink?

Symlinks were broken before Andrew started working on them.  They had serious 
security, performance, and usability issues.  If you are interested in learning 
more about the issues and helping to fix them, take a look at HADOOP-10019.  
They were disabled to avoid exposing people to serious security risks.  In the 
meantime, I will note that you were one of the reviewers on the JIRA that 
initially introduced symlinks, HDFS-245, before Andrew or I had even started 
working on Hadoop.

> Object store in HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: Ozone-architecture-v1.pdf, Ozonedesignupdate.pdf, 
> ozone_user_v0.pdf
>
>
> This jira proposes to add object store capabilities into HDFS. 
> As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-7240) Object store in HDFS

2016-05-26 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15302890#comment-15302890
 ] 

Tsz Wo Nicholas Sze commented on HDFS-7240:
---

> ... and added symlink support to FileSystem. The last one was just 
> contributing a new API to the FileSystem class, not implementing the symlink 
> feature itself. You are probably thinking of Eli Collins, who became a 
> committer partly by working on HDFS symlinks.

Thanks Colin for clarifying it.

Correct me if I am wrong -- before [~andrew.wang]'s contribution, symlink was 
somehow working (based on [~eli]'s work). After Andrew's work, we had no 
choice but to disable the symlink feature. In this sense, symlink became even 
worse. Anyway, Andrew/Eli, any plan to fix symlink?

Indeed, this JIRA is about the object store, so we should not discuss symlinks 
too much here. My previous comment was just a suggestion to Andrew. Let's 
discuss symlinks on the dev mailing list or in another JIRA. Thanks.

> Object store in HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: Ozone-architecture-v1.pdf, Ozonedesignupdate.pdf, 
> ozone_user_v0.pdf
>
>
> This jira proposes to add object store capabilities into HDFS. 
> As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-7240) Object store in HDFS

2016-05-26 Thread Jitendra Nath Pandey (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15302604#comment-15302604
 ] 

Jitendra Nath Pandey commented on HDFS-7240:


bq. Why an object store as part of HDFS?
It is one of the goals to have both hdfs and ozone available in the same 
deployment. That means the same datanodes serve both ozone and hdfs data. 
Therefore, having ozone as a separate subproject in hadoop is ok as long as 
they can share the storage layer. The datanode changes would still be needed in 
hdfs.
   There is another proposal, HDFS-10419, that moves HDFS data into storage 
containers. I think that effort will need a new datanode implementation that 
shares the storage container layer with ozone. 

> Object store in HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: Ozone-architecture-v1.pdf, Ozonedesignupdate.pdf, 
> ozone_user_v0.pdf
>
>
> This jira proposes to add object store capabilities into HDFS. 
> As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-7240) Object store in HDFS

2016-05-26 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15302344#comment-15302344
 ] 

stack commented on HDFS-7240:
-

bq. It is unfair to say that you are being rebuffed.

Can we please move to discussion of the design. Back and forth on what is 
'fair', 'tone', and how folks got commit bits is corrosive and derails what is 
important here; i.e. landing this big one.

> Object store in HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: Ozone-architecture-v1.pdf, Ozonedesignupdate.pdf, 
> ozone_user_v0.pdf
>
>
> This jira proposes to add object store capabilities into HDFS. 
> As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-7240) Object store in HDFS

2016-05-26 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15302329#comment-15302329
 ] 

Arpit Agarwal commented on HDFS-7240:
-

bq. So far though it feels like I'm being rebuffed.
As you pointed out, your and Colin's feedback from our last discussion has 
influenced the design (and Anu rightly credited you for that during the 
ApacheCon talk too). Also I recall Anu spending over an hour with you in person 
at ApacheCon to go over your comments. It is unfair to say that you are being 
rebuffed. I again request you avoid such remarks and share your technical 
feedback/ideas with us to help identify gaps in our thinking. We'd be happy to 
schedule a webex. Many of us working on Ozone are remote but perhaps we can get 
together at the Hadoop Summit in June.

> Object store in HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: Ozone-architecture-v1.pdf, Ozonedesignupdate.pdf, 
> ozone_user_v0.pdf
>
>
> This jira proposes to add object store capabilities into HDFS. 
> As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-7240) Object store in HDFS

2016-05-26 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15302191#comment-15302191
 ] 

stack commented on HDFS-7240:
-

bq. Now, can people stop being territorial or making any form of criticism of 
each other. It is fundamentally against the ASF philosophy of collaborative, 
community development, doesn't help long term collaboration and makes the 
entire project look bad. Thanks.

Amen.

Thanks for posting design [~anu]

bq. Datanodes provide a shared generic storage service called the container 
layer.

Is this the HDFS Datanode? Would we add block manager functionality to the 
Datanode? (Did we answer [~zhz]'s question, "why an object store as part of 
HDFS"?)

Thanks


> Object store in HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: Ozone-architecture-v1.pdf, Ozonedesignupdate.pdf, 
> ozone_user_v0.pdf
>
>
> This jira proposes to add object store capabilities into HDFS. 
> As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-7240) Object store in HDFS

2016-05-26 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15301781#comment-15301781
 ] 

Steve Loughran commented on HDFS-7240:
--

bq. For example, in (all of) Hadoop's s3 filesystem implementations, listStatus 
uses this quick listing of keys between A and B. When someone does "listStatus 
/a/b/c", we can ask s3 for all the keys between /a/b/c/ and /a/b/c0 (0 is the 
ASCII value right after slash). Of course, s3 does not really have directories, 
but we can treat the keys in this range as being in the directory /a/b/c for 
the purposes of s3a or s3n. If we just had hash partitioning, this kind of 
operation would be O(N^2) where N is the number of keys. It would just be 
infeasible for any large bucket.

FWIW I'm looking at bulk recursive directory listing in s3a for listStatus, 
moving the cost of listing from a very slow O(all-directories) to 
O(all-files/1000). It would be nice to retain that, as otherwise directory 
listing is a very expensive operation, which kills split calculation.
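
A small sketch of the key-range trick described in the quote, over an in-memory sorted key set rather than the actual s3a code: the listing asks for every key from {{prefix + "/"}} (inclusive) up to {{prefix + "0"}} (exclusive), since '0' is the character right after '/':

{code:java}
import java.util.NavigableSet;
import java.util.TreeSet;

class S3ListRangeSketch {
  public static void main(String[] args) {
    NavigableSet<String> keys = new TreeSet<>();
    keys.add("/a/b/c/file1");
    keys.add("/a/b/c/sub/file2");
    keys.add("/a/b/cat");                  // must NOT be listed under /a/b/c

    String dir = "/a/b/c";
    String start = dir + "/";              // inclusive lower bound
    String end = dir + "0";                // exclusive upper bound: '0' == '/' + 1 in ASCII
    System.out.println(keys.subSet(start, true, end, false));
    // prints [/a/b/c/file1, /a/b/c/sub/file2]
  }
}
{code}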

Now, can people stop being territorial or making any form of criticism of each 
other. It is fundamentally against the ASF philosophy of collaborative, 
community development, doesn't help long term collaboration and makes the 
entire project look bad. Thanks.

> Object store in HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: Ozone-architecture-v1.pdf, Ozonedesignupdate.pdf, 
> ozone_user_v0.pdf
>
>
> This jira proposes to add object store capabilities into HDFS. 
> As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-7240) Object store in HDFS

2016-05-25 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15301560#comment-15301560
 ] 

Colin Patrick McCabe commented on HDFS-7240:


bq. [~szetszwo] wrote: I seem to recall that you got your committership by 
contributing the symlink feature, however, the symlink feature is still not 
working as of today. Why don't you fix it? I think you want to build up a good 
track record for yourself.

[~andrew.wang] did not get his commitership by contributing the symlink 
feature.  By the time he was elected as a committer, he had contributed a 
system for efficiently storing and reporting high-percentile metrics, an API to 
expose disk location information to advanced HDFS clients, converted all 
remaining JUnit 3 HDFS tests to JUnit 4, and added symlink support to 
FileSystem.  The last one was just contributing a new API to the FileSystem 
class, not implementing the symlink feature itself.  You are probably thinking 
of [~eli], who became a committer partly by working on HDFS symlinks.

> Object store in HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: Ozone-architecture-v1.pdf, Ozonedesignupdate.pdf, 
> ozone_user_v0.pdf
>
>
> This jira proposes to add object store capabilities into HDFS. 
> As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-7240) Object store in HDFS

2016-05-25 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15301324#comment-15301324
 ] 

Bikas Saha commented on HDFS-7240:
--

In case there is a conference call, please send an email to hdfs-dev with the 
proposed meeting details for wider dispersal and participation since that is 
the right forum to organize community activities.

> Object store in HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: Ozone-architecture-v1.pdf, Ozonedesignupdate.pdf, 
> ozone_user_v0.pdf
>
>
> This jira proposes to add object store capabilities into HDFS. 
> As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-7240) Object store in HDFS

2016-05-25 Thread Anu Engineer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15301254#comment-15301254
 ] 

Anu Engineer commented on HDFS-7240:


Thank you, I have updated the JIRA and assigned this back to Jitendra

> Object store in HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: Ozone-architecture-v1.pdf, Ozonedesignupdate.pdf, 
> ozone_user_v0.pdf
>
>
> This jira proposes to add object store capabilities into HDFS. 
> As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-7240) Object store in HDFS

2016-05-25 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15301243#comment-15301243
 ] 

Jing Zhao commented on HDFS-7240:
-

Looks like contributors do not have permission to attach files anymore? I am 
assigning the jira to [~anu] so that he can upload the updated design doc.

> Object store in HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Anu Engineer
> Attachments: Ozone-architecture-v1.pdf, ozone_user_v0.pdf
>
>
> This jira proposes to add object store capabilities into HDFS. 
> As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-7240) Object store in HDFS

2016-05-25 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15301172#comment-15301172
 ] 

Tsz Wo Nicholas Sze commented on HDFS-7240:
---

[~andrew.wang], I understand you would like to contribute to this issue.  
However, why don't you fix HDFS symlinks first?  That is also a very useful and 
important feature, one of the most wanted, and many people are asking for it.

I seem to recall that you got your committership by contributing the symlink 
feature; however, the feature is still not working as of today.  Why don't you 
fix it?  I think you want to build up a good track record for yourself.

> Object store in HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: Ozone-architecture-v1.pdf, ozone_user_v0.pdf
>
>
> This jira proposes to add object store capabilities into HDFS. 
> As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-7240) Object store in HDFS

2016-05-25 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15300602#comment-15300602
 ] 

Jing Zhao commented on HDFS-7240:
-

Talking about EC in ozone, I had a general discussion with [~drankye] last week 
while he was visiting us. We think ozone's storage container layer can make EC 
work easier and cleaner, especially considering that we're planning EC phase 
II, i.e., doing EC in an offline mode.

Fundamentally, EC/replication should be handled in the storage layer (i.e., the 
block in HDFS, and the storage container in ozone) as two options for 
maintaining data durability. An ozone storage container will have the 
capability to support both. The general design to support EC in ozone can be 
very similar to some existing object stores such as [magic pocket | 
https://blogs.dropbox.com/tech/2016/05/inside-the-magic-pocket/]. We can have a 
more detailed discussion about the design and finally have a section in the 
design doc, but I do not think supporting EC will become a hurdle for us.
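To make the durability idea above concrete, here is a minimal illustrative 
sketch of a per-container durability policy. All type and field names below are 
hypothetical and are not part of the actual ozone/HDFS code; the 6+3 EC layout 
is just an example choice.

{code:java}
// Hypothetical sketch only -- not actual ozone code. It illustrates the idea
// that replication and erasure coding are two interchangeable durability
// policies a storage container could carry as a property.
public final class DurabilitySketch {

  enum DurabilityPolicy { REPLICATION, ERASURE_CODING }

  // Java 16+ record, used here only for brevity.
  record ContainerDescriptor(long containerId,
                             DurabilityPolicy policy,
                             int replicationFactor,  // meaningful for REPLICATION
                             int ecDataBlocks,       // meaningful for ERASURE_CODING
                             int ecParityBlocks) {}

  public static void main(String[] args) {
    ContainerDescriptor replicated =
        new ContainerDescriptor(1L, DurabilityPolicy.REPLICATION, 3, 0, 0);
    ContainerDescriptor erasureCoded =
        new ContainerDescriptor(2L, DurabilityPolicy.ERASURE_CODING, 1, 6, 3);
    System.out.println(replicated);
    System.out.println(erasureCoded);
  }
}
{code}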

> Object store in HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: Ozone-architecture-v1.pdf, ozone_user_v0.pdf
>
>
> This jira proposes to add object store capabilities into HDFS. 
> As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-7240) Object store in HDFS

2016-05-25 Thread Anu Engineer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15300565#comment-15300565
 ] 

Anu Engineer commented on HDFS-7240:


[~steve_l] [~drankye] We will certainly have a call to discuss the design once 
a detailed design doc is posted.

[~andrew.wang] Thanks for your comments. 

bq. Here, I'm also trying to be as constructive as possible, raising questions 
as well as proposing possible solutions.
I appreciate the spirit, and rest assured that we really do appreciate you 
raising questions. It is just that writing a design doc takes a little time.

bq. We discussed the need for range instead of hash partitioning (which I'm 
happy to see made it), as well as the overhead of doing metadata and data 
lookups (which could motivate storing Ozone metadata in Raft instead of in a 
container). 

This has been my sentiment all along: we have been listening to the community 
feedback and making changes, and we will certainly do the same going forward. I 
look forward to your comments and thoughts on ozone once we post the design 
doc.

[~zhz] [~cmccabe] and [~andrew.wang] I would like to discuss the technical 
issues that have been raised in this JIRA after I post the design doc. It will 
allow us to have a shared understanding of where we are and will eliminate a 
lot of repetition.  I personally believe it would be much more productive to 
have the discussion once we all have a shared view of the issues and suggested 
solutions.


> Object store in HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: Ozone-architecture-v1.pdf, ozone_user_v0.pdf
>
>
> This jira proposes to add object store capabilities into HDFS. 
> As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-7240) Object store in HDFS

2016-05-25 Thread Jitendra Nath Pandey (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15300552#comment-15300552
 ] 

Jitendra Nath Pandey commented on HDFS-7240:


Of course the design is flexible, and the project would benefit from a 
constructive discussion here. As repeatedly mentioned before, an updated 
document will be posted soon, and posting it is precisely an invitation to 
discuss any input and concerns. No design gets frozen until it is implemented. 
All the implementation so far is in the jiras, and that will continue to be the 
case.

> Object store in HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: Ozone-architecture-v1.pdf, ozone_user_v0.pdf
>
>
> This jira proposes to add object store capabilities into HDFS. 
> As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-7240) Object store in HDFS

2016-05-25 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15300521#comment-15300521
 ] 

Steve Loughran commented on HDFS-7240:
--

+1 for a meetup; an online hangout/webex would be good for remote people like 
me to catch up.

Arguing with each other over a JIRA isn't the way to review designs or their 
implementations.

> Object store in HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: Ozone-architecture-v1.pdf, ozone_user_v0.pdf
>
>
> This jira proposes to add object store capabilities into HDFS. 
> As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-7240) Object store in HDFS

2016-05-25 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15300518#comment-15300518
 ] 

Andrew Wang commented on HDFS-7240:
---

Sorry, hit reply too early. Quoting from my earlier response to Anu:

bq. Really though, even if the community hadn't explicitly expressed interest, 
all of this activity should still have been done in a public forum. It's very 
hard for newcomers to ramp up unless design discussions are being done publicly.

This is how software is supposed to be developed at Apache, so everyone can 
watch and contribute. It's not a reasonable standard to require each of the 160 
watchers on this JIRA to explicitly reach out to be involved in the 
conversation. And, like I said above, it's very hard to contribute unless this 
conversation is happening publicly.

I'm a bit annoyed here since we did reach out in late Feb, and we had a nice 
design convo. We discussed the need for range instead of hash partitioning 
(which I'm happy to see made it), as well as the overhead of doing metadata and 
data lookups (which could motivate storing Ozone metadata in Raft instead of in 
a container). Then, as now, I also asked to be involved in the design 
discussions since this is a topic I'm very interested in. Here, I'm also trying 
to be as constructive as possible, raising questions as well as proposing 
possible solutions.

I keep saying this, but I would like to collaborate on this project. If you're 
willing to revisit some of the design points we're discussing above, we can put 
the past behind us and move forward. So far though it feels like I'm being 
rebuffed.

> Object store in HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: Ozone-architecture-v1.pdf, ozone_user_v0.pdf
>
>
> This jira proposes to add object store capabilities into HDFS. 
> As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-7240) Object store in HDFS

2016-05-25 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15300488#comment-15300488
 ] 

Andrew Wang commented on HDFS-7240:
---

Quoting my earlier response to Anu:

bq. Really though, even if the community hadn't explicitly expressed interest, 
all of this activity should still have been done in a public forum. It's very 
hard for newcomers to ramp up unless design discussions are being done publicly.

Development is supposed to be done in the open so everyone can watch and 
contribute.

Also, code is not the only form of contribution. As also mentioned above, we 
had a call in late February this year, which is where we discussed the need for 
range partitioning (something I'm glad is being done in the new design) as well 
as raising concerns about the number of hops to lookup and read data (which I'm 
guessing is why metadata is now replicated via Raft rather than stored in 
containers).

> Object store in HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Andrew Wang
> Attachments: Ozone-architecture-v1.pdf, ozone_user_v0.pdf
>
>
> This jira proposes to add object store capabilities into HDFS. 
> As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-7240) Object store in HDFS

2016-05-24 Thread Kai Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15299336#comment-15299336
 ] 

Kai Zheng commented on HDFS-7240:
-

Looking forward, things could be better, given the prototype implementation, 
the upcoming updated design doc, and, just as important, the fact that this is 
now on the right track under active discussion. IMHO, it may help if you could 
meet and discuss this together, as the HDFS erasure coding effort did, 
considering this is another significant architecture change to the project. I 
hope the overall direction and design doc can be settled soon, and I will also 
try to catch up.

> Object store in HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: Ozone-architecture-v1.pdf, ozone_user_v0.pdf
>
>
> This jira proposes to add object store capabilities into HDFS. 
> As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-7240) Object store in HDFS

2016-05-24 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15299319#comment-15299319
 ] 

Arpit Agarwal commented on HDFS-7240:
-

bq. I actually would be delighted to commit my time and energy to Ozone 
development
bq. I would love to collaborate with everyone on this project. 
Andrew, what has been your technical contribution over the last year to help 
move the project forward? Did you give any thought to how the architecture spec 
could be converted to a technically feasible design and did you at any time 
post your ideas on the Jira or approach the developers who were prototyping in 
the feature branch?

> Object store in HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: Ozone-architecture-v1.pdf, ozone_user_v0.pdf
>
>
> This jira proposes to add object store capabilities into HDFS. 
> As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-7240) Object store in HDFS

2016-05-24 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15299197#comment-15299197
 ] 

Andrew Wang commented on HDFS-7240:
---

[~arpitagarwal] the impedance mismatch here is illustrated in your most recent 
comment:

bq. We will post a design doc soon and there is ample opportunity/need for 
community contributions as the implementation is still in an early stage. This 
is in line with how features are developed in Apache.

The Apache community is supposed to be involved in the design too, not just the 
implementation. I thought we were doing this, since we had a nice design 
discussion when the architecture doc was released, and when we last spoke in 
late February this year, the design seemed unchanged from the design doc.

Since then, it's clear that a lot of work has been done internally at 
Hortonworks, without community involvement. I consider changing how metadata is 
stored to be a very significant design change, as well as the addition of a new 
master service.

If the design is still flexible and under discussion, great. What it feels like 
though is a completed design being dropped on us. It's hard for external 
contributors to interpret these design changes without the related context and 
discussions. If the design is viewed as completed and just needs 
implementation, it's also hard for us to make meaningful design changes.

Again, I would love to collaborate with everyone on this project. HDFS scale is 
a topic at the forefront of my mind, and we would all benefit from working 
together on a single solution. But that requires opening it up so 
non-Hortonworkers can be deeply involved in the requirements and design, not 
just implementation.

> Object store in HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: Ozone-architecture-v1.pdf, ozone_user_v0.pdf
>
>
> This jira proposes to add object store capabilities into HDFS. 
> As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-7240) Object store in HDFS

2016-05-20 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15294022#comment-15294022
 ] 

Arpit Agarwal commented on HDFS-7240:
-

Andrew, this tone is not helpful. Nothing we presented at ApacheCon (also an 
Apache forum) was a significant change from the architecture doc. We will post 
a design doc soon and there is ample opportunity/need for community 
contributions as the implementation is still in an early stage. This is in line 
with how features are developed in Apache.

> Object store in HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: Ozone-architecture-v1.pdf, ozone_user_v0.pdf
>
>
> This jira proposes to add object store capabilities into HDFS. 
> As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-7240) Object store in HDFS

2016-05-20 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15293822#comment-15293822
 ] 

Colin Patrick McCabe commented on HDFS-7240:


bq. Another question about reading the ApacheCon slides: the question "Why an 
Object Store" was well answered. How about "why an object store as part of 
HDFS"? IIUC Ozone is only leveraging a very small portion of HDFS code. Why 
should it be a part of HDFS instead of a separate project?

That's a very good question.  Why can't ozone be its own subproject within 
Hadoop?  We could add a hadoop-ozone directory at the top level of the git 
repo.  Ozone seems to be reusing very little of the HDFS code.  For example, it 
doesn't store blocks the way the DataNode stores blocks.  It doesn't run the 
HDFS NameNode.  It doesn't use the HDFS client code.

> Object store in HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: Ozone-architecture-v1.pdf, ozone_user_v0.pdf
>
>
> This jira proposes to add object store capabilities into HDFS. 
> As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-7240) Object store in HDFS

2016-05-19 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15292351#comment-15292351
 ] 

Zhe Zhang commented on HDFS-7240:
-

Thanks for the discussions [~andrew.wang] [~anu].

I'm still trying to catch up on all the new updates (looking forward to the 
updated design doc; maybe also post the ApacheCon video?). Meanwhile, some 
thoughts around EC:

bq. My biggest concern is that erasure coding is not a first-class 
consideration in this system, and seems like it will be quite difficult to 
implement.
I agree. The most difficult part of building HDFS-EC was removing / 
generalizing inflexible assumptions about replicas in HDFS block management and 
namespace logic. So it would help a lot to at least conceptually discuss the 
plan for EC implementation in the new design doc.

Another question about reading the ApacheCon slides: the question "Why an 
Object Store" was well answered. How about "why an object store as part of 
HDFS"? IIUC Ozone is only leveraging a very small portion of HDFS code. Why 
should it be a part of HDFS instead of a separate project?

> Object store in HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: Ozone-architecture-v1.pdf, ozone_user_v0.pdf
>
>
> This jira proposes to add object store capabilities into HDFS. 
> As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-7240) Object store in HDFS

2016-05-17 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15288087#comment-15288087
 ] 

Andrew Wang commented on HDFS-7240:
---

Thanks for the reply Anu, I'd like to follow up on some points.

bq. Nothing in ozone prevents a chunk being EC encoded. In fact ozone makes no 
assumptions about the location or the types of chunks at all...
bq. The chunks will support remote blocks...

My understanding was that the SCM was the entity responsible for the equivalent 
of BlockPlacementPolicy, and doing it on containers. It sounds like that's 
incorrect, and each container is independently doing chunk placement. That 
raises a number of questions:

* How are we coordinating distributed data placement and replication? Are all 
containers heartbeating to other containers to determine liveness? Giving up 
global coordination of replication makes it hard to do throttling and control 
use of top-of-rack switches. It also makes it harder to understand the 
operation of the system.
* Aren't 4MB chunks a rather small unit for cross-machine replication? We've 
been growing the HDFS block size over the years as networks get faster, since 
it amortizes overheads.
* Does this mean also we have a "chunk report" from the remote chunk servers to 
the master?

I also still have the same questions about mutability of an EC group requiring 
the parities to be rewritten. How are we forming and potentially rewriting EC 
groups?

bq. The fact that QJM was not written as a library makes it very hard for us to 
pull it out in a clean fashion. Again if you feel very strongly about it, 
please feel free to move QJM to a library which can be reused and all of us 
will benefit from it.

I don't follow the argument that a new consensus implementation is more 
understandable than the one we've been supporting and using for years. Working 
with QJM, and adding support for missing functionality like multiple logs and 
dynamic quorum membership, would also have benefits in HDFS.

I'm also just asking questions here. I'm not required to refactor QJM into a 
library to discuss the merits of code reuse.

bq. Nothing in the chunk architecture assumes that chunk files are separate 
files. The fact that a chunk is a triplet {FileName, Offset, Length} gives you 
the flexibility to store 1000s of chunks in a physical file.

Understood, but in this scenario how do you plan to handle compaction? We 
essentially need to implement mutability on immutability. The traditional 
answer here is an LSM tree, a la Kudu or HBase. If this is important, it really 
should be discussed.

One easy option would be storing the data in LevelDB as well. I'm not sure 
about the performance though, and it also doesn't blend well with the 
mutability of EC groups.

bq. So once more – just make sure we are on the same page – Merges are rare(not 
required generally) and splits happen if we want to re-distribute data on a 
same machine.

I think I didn't explain my two points thoroughly enough. Let me try again:

The first problem is the typical write/delete pattern for a system like HDFS. 
IIUC in Ozone, each container is allocated a contiguous range of the keyspace 
by the KSM. As an example, perhaps the KSM decides to allocate the range 
{{(i,j]}} to a container. Then, the user decides to kick off a job that writes 
a whole bunch of files with the format {{ingest/file_N}}. Until we do a split, 
all those files are landing in that {{(i,j]}} container. So we split. Then, 
it's common for ingested data to be ETL'd and deleted. If we split earlier, 
that means we now have a lot of very small containers. This kind of hotspotting 
is less common in HBase, since DB users aren't encoding this type of nested 
structure in their keys.
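
As a purely illustrative toy sketch of this hotspotting concern (hypothetical 
names, not Ozone code): every key with the ingest prefix lands in the one 
container that owns the range, until a split introduces a pivot key.

{code:java}
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy sketch with hypothetical names, not Ozone code. All keys sharing a
// prefix fall into the single container owning that key range; a split at a
// pivot key divides them between two containers.
public final class RangeSplitSketch {

  public static void main(String[] args) {
    List<String> keys = List.of(
        "ingest/file_1", "ingest/file_2", "ingest/file_3", "ingest/file_4");

    String rangeStart = "i", rangeEnd = "j";   // the (i, j] range in the example
    System.out.println("All keys land in container (" + rangeStart + ", " + rangeEnd + "]:");
    keys.forEach(k -> System.out.println("  " + k));

    // A split introduces a pivot; existing keys are divided between the halves.
    String pivot = "ingest/file_3";
    Map<String, List<String>> afterSplit = new TreeMap<>();
    afterSplit.put("(" + rangeStart + ", " + pivot + "]",
        keys.stream().filter(k -> k.compareTo(pivot) <= 0).toList());
    afterSplit.put("(" + pivot + ", " + rangeEnd + "]",
        keys.stream().filter(k -> k.compareTo(pivot) > 0).toList());
    afterSplit.forEach((range, ks) -> System.out.println(range + " -> " + ks));
  }
}
{code}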

The other problem is that files can be pretty big. 1GB is common for data 
warehouses. If we have a 5GB container, a few deletes could quickly drop us 
below that target size. Similarly, a few additions can quickly raise us past it.

Would appreciate an answer in light of the above concerns.

bq. So the container infrastructure that we have built is something that can be 
used by both ozone and HDFS...In future, if we want to scale HDFS, containers 
might be an easy way to do it.

This sounds like a major refactoring of HDFS. We'd need to start by splitting 
the FSNS and BM locks, which is a massive undertaking, and possibly 
incompatible for operations like setrep. Moving the BM across an RPC boundary 
is also a demanding task.

I think a split FSN / BM is a great architecture, but it's also something that 
has been attempted unsuccessfully a number of times in the community.

bq. These applications are written with the assumption of a Posix file system, 
so migrating them to Ozone does not make much sense.

If we do not plan to support Hive and HBase, what is the envisioned set of 
target applications for Ozone?

bq. Yes, we have seen this issue in real 

[jira] [Commented] (HDFS-7240) Object store in HDFS

2016-05-16 Thread Anu Engineer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15285948#comment-15285948
 ] 

Anu Engineer commented on HDFS-7240:


[~andrew.wang] Thank you for your comments; they are well thought out and 
extremely valuable questions. I will make sure that all the areas you are 
asking about are discussed in the next update of the design doc.

bq. Anu said he'd be posting a new design doc soon to address my questions.
 
I am working on that, but just to make sure your questions are not lost in the 
big picture of the design doc, I am answering them individually here.
 
bq. My biggest concern is that erasure coding is not a first-class 
consideration in this system.
Nothing in ozone prevents a chunk from being EC-encoded. In fact, ozone makes 
no assumptions about the location or the types of chunks at all, so it is quite 
trivial to create a new chunk type and write it into containers. We are focused 
on the overall picture of ozone right now, and I would welcome any contribution 
you can make on EC and ozone chunks if that is a concern you would like us to 
address earlier. From the architecture point of view I do not see any issues.

bq. Since LevelDB is being used for metadata storage and separately being 
replicated via Raft, are there concerns about metadata write amplification?
Metadata is such a small slice of information for a block – really what you are 
saying is that the block name and hash get written twice, once through the Raft 
log and a second time when Raft commits this information. Since the data we are 
talking about is so small, I am not worried about it at all.
 
bq. Can we re-use the QJM code instead of writing a new replicated log 
implementation? QJM is battle-tested, and consensus is a known hard problem to 
get right.
We considered this; however, the consensus was to write a *consensus protocol* 
that is easier to understand, making it easier for more contributors to work on 
it. The fact that QJM was not written as a library makes it very hard for us to 
pull it out in a clean fashion. Again, if you feel very strongly about it, 
please feel free to move QJM into a library that can be reused, and all of us 
will benefit from it.

bq. Are there concerns about storing millions of chunk files per disk? Writing 
each chunk as a separate file requires more metadata ops and fsyncs than 
appending to a file. We also need to be very careful to never require a full 
scan of the filesystem. The HDFS DN does full scans right now (DU, volume 
scanner).
Nothing in the chunk architecture assumes that chunk files are separate files. 
The fact that a chunk is a triplet \{FileName, Offset, Length\} gives you the 
flexibility to store 1000s of chunks in a physical file.
 
 bq. Any thoughts about how we go about packing multiple chunks into a larger 
file?
Yes, write the first chunk and then write the second chunk to the same file. In 
fact, chunks are specifically designed to address the small file problem, so 
two keys can point to the same file.
For example
KeyA -> \{File,0, 100\}
KeyB -> \{File,101, 1000\} is a perfectly valid layout under the container 
architecture.
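
As a small illustrative sketch of that layout (hypothetical class names, not 
the actual container code), a chunk being a \{FileName, Offset, Length\} 
triplet means several keys can point into the same physical file:

{code:java}
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: hypothetical types showing how the (file, offset, length)
// chunk triplet lets multiple small objects share one physical container file.
public final class ChunkLayoutSketch {

  record Chunk(String fileName, long offset, long length) {}

  public static void main(String[] args) {
    Map<String, Chunk> keyToChunk = new LinkedHashMap<>();
    // Mirrors the KeyA / KeyB example above: both chunks live in the same file.
    keyToChunk.put("KeyA", new Chunk("container-0001.data", 0L, 100L));
    keyToChunk.put("KeyB", new Chunk("container-0001.data", 101L, 1000L));

    keyToChunk.forEach((key, c) ->
        System.out.printf("%s -> {%s, %d, %d}%n",
            key, c.fileName(), c.offset(), c.length()));
  }
}
{code}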
 
bq. Merges and splits of containers. We need nice large 5GB containers to hit 
the SCM scalability targets. Together, these factors mean Ozone will likely be 
doing many more merges and splits than HBase to keep the container size high
  
Ozone actively tries to avoid merges and splits only when needed. A container 
can be thought of as a really large block, so I am not sure that we will see 
anything other than the standard block workload on containers. The fact that 
containers can be split is something that allows us to avoid pre-allocation of 
container space. That is merely a convenience, and if you think of these as 
blocks, you will see that it is very similar.
 
Ozone will never try to do merges and splits at the HBase level. From the 
container and ozone perspective we are more focused on good data distribution 
across the cluster – aka what the balancer does today – and containers are a 
flat namespace, just like blocks, which we allocate when needed.
 
So once more – just make sure we are on the same page – Merges are rare(not 
required generally) and splits happen if we want to re-distribute data on a 
same machine.
 
bq. What kind of sharing do we get with HDFS, considering that HDFS doesn't use 
block containers, and the metadata services are separate from the NN? not 
shared?

Great question. We initially started off by attacking the scalability question 
of ozone and soon realized that HDFS scalability and ozone scalability have to 
solve the same problems. So the container infrastructure that we have built is 
something that can be used by both ozone and HDFS. Currently we are focused on 
ozone, and containers will co-exist on datanodes with block pools. That is, 
ozone should be and will be deployable on a vanilla HDFS cluster. In 

[jira] [Commented] (HDFS-7240) Object store in HDFS

2016-05-16 Thread Anu Engineer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15285579#comment-15285579
 ] 

Anu Engineer commented on HDFS-7240:


[~andrew.wang] Thank you for showing up at the talk and having a very 
interesting follow-up conversation. I am glad that we are continuing that on 
this JIRA. Just to make sure that others who might read our discussion get the 
right context, here are the slides from the talk:

[http://schd.ws/hosted_files/apachebigdata2016/fc/Hadoop%20Object%20Store%20-%20Ozone.pdf]




> Object store in HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: Ozone-architecture-v1.pdf, ozone_user_v0.pdf
>
>
> This jira proposes to add object store capabilities into HDFS. 
> As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-7240) Object store in HDFS

2016-05-16 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15285372#comment-15285372
 ] 

Andrew Wang commented on HDFS-7240:
---

Hi all, I had the opportunity to hear more about Ozone at Apache Big Data, and 
chatted with Anu afterwards. Quite interesting, I learned a lot. Thanks Anu for 
the presentation and fielding my questions.

I'm re-posting my notes and questions here. Anu said he'd be posting a new 
design doc soon to address my questions.

Notes:

* Key Space Manager and Storage Container Manager are the "master" services in 
Ozone, and are the equivalent of FSNamesystem and the BlockManager in HDFS. 
Both are Raft-replicated services. There is a new Raft implementation being 
worked on internally.
* The block container abstraction is a mutable range of KV pairs. It's 
essentially a ~5GB LevelDB for metadata + on-disk files for the data. Container 
metadata is replicated via Raft. Container data is replicated via chain 
replication.
* Since containers are mutable and the replicas are independent, the on-disk 
state will be different. This means we need to do logical rather than physical 
replication.
* Container data is stored as chunks, where a chunk is maybe 4-8MB. Chunks are 
immutable. Chunks are a (file, offset, length) triplet. Currently each chunk is 
stored as a separate file.
* Use of copysets to reduce the risk of data loss due to independent node 
failures.

Questions:

* My biggest concern is that erasure coding is not a first-class consideration 
in this system, and it seems like it will be quite difficult to implement. EC 
is table stakes in the blobstore world; it's implemented by all the cloud 
blobstores I'm aware of (S3, WASB, etc.). Since containers are mutable, we are 
not able to erasure-code containers together, or else we suffer from the 
equivalent of the RAID-5 write hole. It's the same issue we're dealing with on 
HDFS-7661 for hflush/hsync EC support. There's also the complexity that a 
container is replicated to 3 nodes via Raft, but EC data is typically stored 
across 14 nodes.
* Since LevelDB is being used for metadata storage and separately being 
replicated via Raft, are there concerns about metadata write amplification?
* Can we re-use the QJM code instead of writing a new replicated log 
implementation? QJM is battle-tested, and consensus is a known hard problem to 
get right.
* Are there concerns about storing millions of chunk files per disk? Writing 
each chunk as a separate file requires more metadata ops and fsyncs than 
appending to a file. We also need to be very careful to never require a full 
scan of the filesystem. The HDFS DN does full scans right now (DU, volume 
scanner).
* Any thoughts about how we go about packing multiple chunks into a larger file?
* Merges and splits of containers. We need nice large 5GB containers to hit the 
SCM scalability targets. However, I think we're going to have a harder time 
with this than a system like HBase. HDFS sees a relatively high delete rate for 
recently written data, e.g. intermediate data in a processing pipeline. HDFS 
also sees a much higher variance in key/value size. Together, these factors 
mean Ozone will likely be doing many more merges and splits than HBase to keep 
the container size high. This is concerning since splits and merges are 
expensive operations, and based on HBase's experience, are hard to get right.
* What kind of sharing do we get with HDFS, considering that HDFS doesn't use 
block containers, and the metadata services are separate from the NN? not 
shared?
* Any thoughts on how we will transition applications like Hive and HBase to 
Ozone? These apps use rename and directories for synchronization, which are not 
possible on Ozone.
* Have you experienced data loss from independent node failures, thus 
motivating the need for copysets? I think the idea is cool, but the RAMCloud 
network hardware expectations are quite different from ours. Limiting the set 
of nodes for re-replication means you have less flexibility to avoid 
top-of-rack switches and decreased parallelism. It's also not clear how this 
type of data placement meshes with EC, or the other quite sophisticated types 
of block placement we currently support in HDFS.
* How do you plan to handle files larger than 5GB? Large files right now are 
also not spread across multiple nodes and disks, limiting IO performance.
* Are all reads and writes served by the container's Raft master? IIUC that's 
how you get strong consistency, but it means we don't have the same performance 
benefits we have now in HDFS from 3-node replication.

I also ask that more of this information and decision making be shared on 
public mailing lists and JIRA. The KSM is not mentioned in the architecture 
document, nor the fact that the Ozone metadata is being replicated via Raft 
rather than stored in containers. I was not aware that there is already progress 

[jira] [Commented] (HDFS-7240) Object store in HDFS

2015-11-23 Thread Anu Engineer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15022706#comment-15022706
 ] 

Anu Engineer commented on HDFS-7240:


Please see INFRA-10720 for the status of branch deletions. I presume that 
sooner or later we will be able to delete the lower-case "hdfs-7240" branch.

> Object store in HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: Ozone-architecture-v1.pdf
>
>
> This jira proposes to add object store capabilities into HDFS. 
> As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7240) Object store in HDFS

2015-11-23 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15022724#comment-15022724
 ] 

Tsuyoshi Ozawa commented on HDFS-7240:
--

Thank you for following up, Anu and Chris. 

> Object store in HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: Ozone-architecture-v1.pdf
>
>
> This jira proposes to add object store capabilities into HDFS. 
> As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7240) Object store in HDFS

2015-11-22 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15021179#comment-15021179
 ] 

Tsuyoshi Ozawa commented on HDFS-7240:
--

Two branches, HDFS-7240 and hdfs-7240, seem to have been created in the repository.

{quote}
$ git pull
>From https://git-wip-us.apache.org/repos/asf/hadoop
 * [new branch]  HDFS-7240  -> origin/HDFS-7240
 * [new branch]  hdfs-7240  -> origin/hdfs-7240
{quote}

Is this intended?

> Object store in HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: Ozone-architecture-v1.pdf
>
>
> This jira proposes to add object store capabilities into HDFS. 
> As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7240) Object store in HDFS

2015-11-22 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15021191#comment-15021191
 ] 

Chris Nauroth commented on HDFS-7240:
-

The upper-case one is the correct one.  The lower-case one was created by 
mistake.  We haven't been able to delete it yet though because of the recent 
ASF infrastructure changes to prohibit force pushes.

> Object store in HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: Ozone-architecture-v1.pdf
>
>
> This jira proposes to add object store capabilities into HDFS. 
> As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7240) Object store in HDFS

2015-07-28 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14644810#comment-14644810
 ] 

Colin Patrick McCabe commented on HDFS-7240:


[~jnp], [~sanjay.radia], did we come to a conclusion about range partitioning?

 Object store in HDFS
 

 Key: HDFS-7240
 URL: https://issues.apache.org/jira/browse/HDFS-7240
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Jitendra Nath Pandey
Assignee: Jitendra Nath Pandey
 Attachments: Ozone-architecture-v1.pdf


 This jira proposes to add object store capabilities into HDFS. 
 As part of the federation work (HDFS-1052) we separated block storage as a 
 generic storage layer. Using the Block Pool abstraction, new kinds of 
 namespaces can be built on top of the storage layer i.e. datanodes.
 In this jira I will explore building an object store using the datanode 
 storage, but independent of namespace metadata.
 I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7240) Object store in HDFS

2015-07-06 Thread Thomas Demoor (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14614705#comment-14614705
 ] 

Thomas Demoor commented on HDFS-7240:
-

[~john.jian.fang] and [~jnp]: 
* Avoiding rename happens in [HADOOP-9565] by introducing ObjectStore (extends 
FileSystem) and letting FileOutputCommitter, the Hadoop CLI, ... act on it by 
avoiding rename. Ozone could easily extend ObjectStore and benefit from this.
* [HADOOP-11262] extends DelegateToFileSystem to implement s3a as an 
AbstractFileSystem and works around issues such as modification times for 
directories (cf. Azure).

 Object store in HDFS
 

 Key: HDFS-7240
 URL: https://issues.apache.org/jira/browse/HDFS-7240
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Jitendra Nath Pandey
Assignee: Jitendra Nath Pandey
 Attachments: Ozone-architecture-v1.pdf


 This jira proposes to add object store capabilities into HDFS. 
 As part of the federation work (HDFS-1052) we separated block storage as a 
 generic storage layer. Using the Block Pool abstraction, new kinds of 
 namespaces can be built on top of the storage layer i.e. datanodes.
 In this jira I will explore building an object store using the datanode 
 storage, but independent of namespace metadata.
 I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7240) Object store in HDFS

2015-07-06 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615639#comment-14615639
 ] 

Colin Patrick McCabe commented on HDFS-7240:


bq. 3) We will partition objects using hash partitioning and range 
partitioning. The document already talks about hash partitioning, I will add 
more details for range partition support. The small objects will also be stored 
in the container. In the document I mentioned leveldbjni, but we are also 
looking at rocksDB for container implementation to store objects in the 
container.

Interesting, thanks for posting more details about this.

Maybe this has been discussed on another JIRA (I apologize if so),  but does 
this mean that the admin will have to choose between hash and range 
partitioning for a particular bucket?  This seems suboptimal to me... we will 
have to support both approaches, which is more complex, and admins will be left 
with a difficult choice.

It seems better just to make everything range-partitioned.  Although this is 
more complex than simple hash partitioning, it provides performance 
compatibility with s3 and other object stores.  s3 provides a fast 
(sub-linear) way of getting all the keys in between some A and B.  It will be 
very difficult to really position ozone as s3-compatible if operations that are 
quick in s3 such as listing all the keys between A and B are O(num_keys^2) in 
ozone.

For example, in (all of) Hadoop's s3 filesystem implementations, listStatus 
uses this quick listing of keys between A and B.  When someone does listStatus 
on /a/b/c, we can ask s3 for all the keys between /a/b/c/ and /a/b/c0 ('0' is 
the ASCII character right after '/').  Of course, s3 does not really have 
directories, but we can treat the keys in this range as being in the directory 
/a/b/c for the purposes of s3a or s3n.  If we just had hash partitioning, this 
kind of operation would be O(N^2) where N is the number of keys.  It would just 
be infeasible for any large bucket.
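
As a rough sketch of that prefix-range trick (not the real s3a/s3n code; the 
helper below is hypothetical), listing the children of /a/b/c amounts to 
fetching all keys in the half-open range ["/a/b/c/", "/a/b/c0"):

{code:java}
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Rough illustration only, not the real s3a/s3n implementation. The keys under
// a "directory" /a/b/c are exactly the keys in ["/a/b/c/", "/a/b/c0"), since
// '0' is the ASCII character immediately after '/'.
public final class RangeListingSketch {

  static List<String> listStatus(NavigableMap<String, byte[]> keys, String dir) {
    String from = dir + "/";
    String to = dir + "0";
    return List.copyOf(keys.subMap(from, true, to, false).keySet());
  }

  public static void main(String[] args) {
    NavigableMap<String, byte[]> keys = new TreeMap<>();
    keys.put("/a/b/c/x", new byte[0]);
    keys.put("/a/b/c/y", new byte[0]);
    keys.put("/a/b/d", new byte[0]);

    // Only the keys that sort inside the range form the listing.
    System.out.println(listStatus(keys, "/a/b/c"));  // [/a/b/c/x, /a/b/c/y]
  }
}
{code}

A sorted (range-partitioned) key space is what makes this lookup sub-linear; 
with hash partitioning there is no contiguous range to scan.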

 Object store in HDFS
 

 Key: HDFS-7240
 URL: https://issues.apache.org/jira/browse/HDFS-7240
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Jitendra Nath Pandey
Assignee: Jitendra Nath Pandey
 Attachments: Ozone-architecture-v1.pdf


 This jira proposes to add object store capabilities into HDFS. 
 As part of the federation work (HDFS-1052) we separated block storage as a 
 generic storage layer. Using the Block Pool abstraction, new kinds of 
 namespaces can be built on top of the storage layer i.e. datanodes.
 In this jira I will explore building an object store using the datanode 
 storage, but independent of namespace metadata.
 I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7240) Object store in HDFS

2015-07-06 Thread Jian Fang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615318#comment-14615318
 ] 

Jian Fang commented on HDFS-7240:
-

Thanks for all your explanations; however, I think you missed my points. Doable 
and performant are two different concepts. From my own experience with S3 and 
the S3 native file system, the most costly operations are listing keys and 
copying data from one bucket to another to simulate a rename. The former takes 
a very long time for a bucket with millions of objects, and the latter carries 
a double performance penalty: if your objects total 1TB, you end up uploading 
almost 2TB of data to S3. That is why fast key listing and a native, fast 
rename are two of the most desirable features for S3.

Before you decide to follow the S3N API, I would suggest that you actually 
test the performance of S3N and learn what is good and what is bad about it. 
Why follow the bad parts at all?

It is still not very clear to me how you guarantee that your partitions are 
balanced. HBase uses region auto-splitting to achieve that, and my concern is 
that the code and logic would grow rapidly as your object store matures. In my 
personal opinion, it is better to build the object store on top of HDFS and 
keep HDFS simple.
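
To illustrate the double-write penalty described above, here is a minimal 
sketch of how rename is typically emulated on an object store that has no 
native rename; the ObjectStore interface is a hypothetical stand-in, not a 
real S3 client API.

{code:java}
// Hypothetical sketch: rename emulated as copy-then-delete.
import java.io.IOException;
import java.util.List;

interface ObjectStore {
  List<String> listKeysWithPrefix(String bucket, String prefix) throws IOException;
  void copy(String bucket, String srcKey, String dstKey) throws IOException;
  void delete(String bucket, String key) throws IOException;
}

class RenameByCopy {
  /**
   * "Renames" srcPrefix to dstPrefix by re-writing every object and then
   * deleting the original, so renaming 1TB of data costs roughly 2TB of I/O.
   */
  static void rename(ObjectStore store, String bucket, String srcPrefix, String dstPrefix)
      throws IOException {
    for (String key : store.listKeysWithPrefix(bucket, srcPrefix)) {
      String dstKey = dstPrefix + key.substring(srcPrefix.length());
      store.copy(bucket, key, dstKey);   // data written a second time
      store.delete(bucket, key);         // original removed only afterwards
    }
  }
}
{code}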

  

> Object store in HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: Ozone-architecture-v1.pdf
>
>
> This jira proposes to add object store capabilities into HDFS. 
> As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7240) Object store in HDFS

2015-06-30 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14608751#comment-14608751
 ] 

Andrew Wang commented on HDFS-7240:
---

I see some JIRAs related to volumes; did we resolve the question of the 2-level 
vs. 3-level scheme? Based on Colin's (and my own) experience using S3, we did 
not feel the need for users to be able to create buckets, which seemed to be 
the primary motivation for the volume -> bucket split. Typically users also 
create their own hierarchy under the bucket anyway, and prefix scans become 
important then.

My main reason for asking is to make the API as simple and as similar to S3 as 
possible, which should help with porting applications.

> Object store in HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: Ozone-architecture-v1.pdf
>
>
> This jira proposes to add object store capabilities into HDFS. 
> As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7240) Object store in HDFS

2015-06-30 Thread Jitendra Nath Pandey (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14609050#comment-14609050
 ] 

Jitendra Nath Pandey commented on HDFS-7240:


[~khanderao], thanks for the review.
1) Ozone stores data on the datanodes themselves, so it can provide locality 
for computations running on the datanodes. The hardware can be chosen based on 
the use case and the computational needs on the datanodes.
2) Of course, it would be possible to have a dedicated object store or HDFS 
deployment, with datanodes configured to talk to the respective block pool.

> Object store in HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: Ozone-architecture-v1.pdf
>
>
> This jira proposes to add object store capabilities into HDFS. 
> As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7240) Object store in HDFS

2015-06-30 Thread Jitendra Nath Pandey (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14609037#comment-14609037
 ] 

Jitendra Nath Pandey commented on HDFS-7240:


[~john.jian.fang], thanks for the review and comments.
1) There is some work going on to support YARN with S3 (HADOOP-11262) by 
[~thodemoor] and [~steve_l]. We hope we can leverage that to support the 
MapReduce use case.
2) This will be part of HDFS natively, unlike HBase, which is a separate 
service. Only one additional daemon (the storage container manager) is needed, 
and only if Ozone is deployed. HDFS already supports multiple namespaces and 
block pools, and we will leverage that work. This work will not impact HDFS 
manageability in any way. There will be some additional code in the datanode 
to support storage containers. In the future, we hope to use containers to 
store HDFS blocks as well, for HDFS block-space scalability.
3) We will partition objects using hash partitioning and range partitioning. 
The document already talks about hash partitioning; I will add more details on 
range-partition support. Small objects will also be stored in the container. 
In the document I mentioned leveldbjni, but we are also looking at RocksDB for 
the container implementation to store objects in the container (see the sketch 
after this list).
4) The containers will be replicated and kept consistent using a data 
pipeline, similar to HDFS blocks. The document covers this at a high level; we 
will add more details in the related jira.
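
As a rough illustration of point 3, here is a minimal sketch of a 
per-container key-value store for small objects, assuming the RocksDB JNI 
bindings (org.rocksdb); the class, key scheme, and on-disk layout are 
illustrative only, not the actual container design.

{code:java}
// Minimal sketch: one embedded KV store per storage container on a datanode.
// The container layout and key scheme here are illustrative only.
import java.nio.charset.StandardCharsets;
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;

class ContainerKVStore implements AutoCloseable {
  private final RocksDB db;

  ContainerKVStore(String containerDbPath) throws RocksDBException {
    RocksDB.loadLibrary();
    db = RocksDB.open(new Options().setCreateIfMissing(true), containerDbPath);
  }

  /** Small objects are stored inline in the container's KV store. */
  void putObject(String objectKey, byte[] data) throws RocksDBException {
    db.put(objectKey.getBytes(StandardCharsets.UTF_8), data);
  }

  /** Returns the object bytes, or null if the key is not in this container. */
  byte[] getObject(String objectKey) throws RocksDBException {
    return db.get(objectKey.getBytes(StandardCharsets.UTF_8));
  }

  @Override
  public void close() {
    db.close();
  }
}
{code}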

> Object store in HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: Ozone-architecture-v1.pdf
>
>
> This jira proposes to add object store capabilities into HDFS. 
> As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7240) Object store in HDFS

2015-06-30 Thread Jitendra Nath Pandey (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14609099#comment-14609099
 ] 

Jitendra Nath Pandey commented on HDFS-7240:


For prefix scans we are planning to support a range-partitioned index in the 
storage container manager. I will update the design document. That will allow 
users to have a hierarchy under the bucket.

The storage volumes do not really change the semantics that much from S3. 
For a URL like {{http://host:port/volume/bucket/key}}, apart from the volume 
prefix, the semantics are very similar. I believe having a notion of 
'admin-created volumes' lets us have buckets that are very similar to S3 
buckets, as it provides clear domains of admin control vs. user control. A 
volume can be viewed as an account, similar to accounts in S3 or Azure (see 
the addressing sketch at the end of this comment).
It would be a bigger deviation if we insisted that only the admin can create 
buckets. It is useful to organize data inside a bucket using key prefixes, but 
having that as the only mechanism for organizing data seems too restrictive.
For similarity to S3, we will also try to have similar headers and auth 
semantics, which are very important for portability.
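
To make the three-level addressing concrete, here is a small sketch; only the 
/volume/bucket/key path shape comes from the comment above, and the helper 
class, host, and port are made up for illustration.

{code:java}
// Hypothetical sketch of the volume/bucket/key addressing; the class name,
// host and port below are illustrative only.
class UrlScheme {
  /** http://host:port/volume/bucket/key -- drop the leading volume segment
   *  and what remains looks just like S3's bucket/key addressing. */
  static String objectUrl(String hostPort, String volume, String bucket, String key) {
    return "http://" + hostPort + "/" + volume + "/" + bucket + "/" + key;
  }

  public static void main(String[] args) {
    // Volumes are admin-created and act like accounts; buckets and keys are
    // user-created inside a volume the user owns.
    System.out.println(objectUrl("ozone-host:1234", "sales", "reports", "2015/q2/summary.csv"));
    // -> http://ozone-host:1234/sales/reports/2015/q2/summary.csv
  }
}
{code}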

> Object store in HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: Ozone-architecture-v1.pdf
>
>
> This jira proposes to add object store capabilities into HDFS. 
> As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

