[jira] [Commented] (HDDS-1499) OzoneManager Cache

2019-08-12 Thread Anu Engineer (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905335#comment-16905335
 ] 

Anu Engineer commented on HDDS-1499:


Just to add some historical context: many of these arguments predate our 
inventing the current method of doing things. Let me add the notes for the 
historical context; please note, this explains how we got here -- it is not a 
defense of the design decisions. At this point, I believe the question you 
asked is entirely valid.

1. When we first started working on this, we found that when you make an entry 
into the Raft log, it has to generate a log ID. This ID is a monotonically 
increasing number and is *strictly* serial.

2. The first approach of the HA code was to move the current code into Raft 
callbacks. That created a problem: we were inside this Raft +critical 
section+ doing a lot of heavy work for Ozone. For example, during 
createVolume, we would end up looking up Ozone metadata to verify whether the 
volume already exists, etc. That is, all this complicated code was getting 
executed in the callback from Ratis, which is serial.

3. At this point, it should have been evident that if we had stepped back and 
looked at the problem, we would have seen the solution that Nanda and you have 
proposed. However, there was a catch that prevented us from going there.

4. The Raft protocol does not guarantee that when you read data from a leader, 
it is the real leader. Please take a look at this issue ( 
https://github.com/etcd-io/etcd/issues/741 ) and this paper for a deeper dive 
into the problem: 
https://www.usenix.org/system/files/conference/hotcloud17/hotcloud17-paper-arora.pdf

So whatever solution we came up with failed the correctness test because of 
this issue: we could never guarantee that the leader we were reading from was 
the real leader.
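To make the hazard in point 4 concrete, here is a toy model of a deposed leader serving a stale read. This is a hypothetical illustration in plain Java with made-up names; it is not real Raft or Ratis code, just the failure mode those links describe:

```java
import java.util.HashMap;
import java.util.Map;

// Toy model: a leader that was deposed by a partition, but has not yet
// heard about the new term, can still answer reads from local state.
class StaleReadSketch {
    final Map<String, String> oldLeaderState = new HashMap<>();
    final Map<String, String> newLeaderState = new HashMap<>();
    boolean oldLeaderStillBelievesItLeads = true;

    StaleReadSketch() {
        // Both replicas start in sync.
        oldLeaderState.put("volume1", "quota=10GB");
        newLeaderState.put("volume1", "quota=10GB");
        // A partition deposes the old leader; a new leader is elected and
        // commits an update that the old leader never hears about.
        newLeaderState.put("volume1", "quota=20GB");
    }

    String readFromOldLeader() {
        // Without a leader lease (or a read-index round), the deposed
        // leader happily serves the read from its local, now-stale state.
        return oldLeaderStillBelievesItLeads
            ? oldLeaderState.get("volume1") : null;
    }
}
```

The standard mitigations are a leader lease or a read-index round before answering reads, which is exactly the "lack of lease for the leader" issue mentioned later in this comment.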

At first, we tried to solve the problem within these constraints of the Raft 
protocol. That is, we thought we could solve the issue of Raft reads and 
writes without needing any extra protocol -- and it led to this callback 
architecture.

For SCM, it is very obvious that moving to the architecture you are proposing 
would greatly simplify the code; alternatively, after the first version solves 
all the HA issues in Ozone Manager, we will go back and rewrite this code.

That is why we have the current code; it is *a combination of three things* -- 
prototype-class code, the lack of Raft leader guarantees, and trying to fit 
the current code into the existing Ratis code base.


For people who have minimal background and are trying to understand what the 
eventual solution should look like:

The current code (the Non-HA code) does the following.

1. A call comes in -- say, create volume -- and the server looks up the 
metadata and proceeds to update the metadata of the system in a consistent way.

2. With HA, the last update step will move away from updating the metadata in 
place to updating it via Ratis. That is, a future object will be returned.

3. When the future is completed, Ozone Manager or SCM will reply to the caller.
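The three steps above can be sketched roughly as follows. This is a hypothetical illustration in plain Java; `createVolume`, the in-memory map, and the single-threaded executor are stand-ins for the real Ozone Manager code and the serial Ratis apply path, not the actual APIs:

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch of the HA flow: validation happens up front, only the final
// metadata update goes through the (here, mocked) replicated log, and the
// server replies to the caller when the returned future completes.
class HaFlowSketch {
    private final Map<String, String> metadata = new ConcurrentHashMap<>();
    // Stand-in for the Ratis state-machine apply thread, which is strictly
    // serial; marked daemon so it does not keep the JVM alive.
    private final ExecutorService applyThread =
        Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "apply");
            t.setDaemon(true);
            return t;
        });

    CompletableFuture<String> createVolume(String volume) {
        // Step 1: look up metadata and validate outside the serial path.
        if (metadata.containsKey(volume)) {
            return CompletableFuture.completedFuture("ALREADY_EXISTS");
        }
        // Step 2: only the final update is submitted to the "log";
        // a future is returned immediately.
        return CompletableFuture.supplyAsync(() -> {
            metadata.put(volume, "created");
            return "OK";
        }, applyThread);
        // Step 3: the caller is answered when the future completes.
    }
}
```

The point of the sketch is what is *not* inside the serial apply step: the existence check from step 1 runs before the submission, which is exactly the heavy work that the original callback architecture had trapped inside the Ratis critical section.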

So yes; eventually we will get to that proposed architecture -- I am just 
telling you about the problems that we worked through.

[~arp], [~bharatviswa], [~hanishakoneru] as the original developers of this 
feature, please feel free to jump in and correct me. I have just been an 
observer of this code's evolution and am commenting from what I recall. I was 
not a participant in many of these discussions and may have missed lots of 
details on why certain decisions were made.

Closing comment: *Yes, I agree we should move to what [~nandakumar131] and 
you ([~elek]) have been proposing*. The current code allows us to get the HA 
feature complete and working. The biggest issue we have is the lack of a lease 
for the leader -- at this point that is a question of correctness. Once we fix 
that, we can go back and refactor the code in Ozone Manager to reflect the 
changes proposed by both of you.





> OzoneManager Cache
> --
>
> Key: HDDS-1499
> URL: https://issues.apache.org/jira/browse/HDDS-1499
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Manager
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.4.1, 0.5.0
>
>  Time Spent: 12h
>  Remaining Estimate: 0h
>
> In this Jira, we shall implement a cache for Table.
> As with OM HA, we are planning a double buffer implementation to 
> flush transactions in a batch, instead of using rocksdb put() for every 
> operation. When this comes into place, we need a cache in OzoneManager HA to 
> handle/serve the requests for validation/returning responses.
>  
> This Jira will 

[jira] [Commented] (HDDS-1499) OzoneManager Cache

2019-08-05 Thread Elek, Marton (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16900108#comment-16900108
 ] 

Elek, Marton commented on HDDS-1499:


Sorry, I am very late to this party.

I found a problem with create volume (volume creation is not cached) and I am 
trying to understand the current design.

bq. well, I was really hoping that the fact that there is a cache is not 
visible to the layer that is reading and writing. Is there a reason why that 
should be exposed to calling applications?

This was a comment by [~anu], and I soon had the same question. It was not 
clear to me why we need separate methods on the TypedTable (cached put/get 
instead of the simple put/get, which may or may not be cached).

What I expected was a Table implementation with an in-memory map under the 
hood and a real RocksDB Table.

If I understood the arguments in the PR well (but correct me if I am wrong), 
that was not possible, as this cache is not a traditional cache. When a value 
is added to the cache, it may not be committed yet (as the cache is 
independent from the write path). It's more like an in-memory overlay table. 
If something is added to the in-memory table, it should be used as the return 
value.

But in this case the in-memory overlay table seems to be an independent 
component. As far as I can see, the TableCache interface is a simplified 
version of a key-value table. In the original TypedTable, a lot of methods 
just ignore the cache, and if the cache is not updated manually (as it is not 
updated by the put method), the behavior will be inconsistent.

It seems safer to separate the TableCache from the TypedTable.
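The overlay-table semantics described in this comment could be sketched like this. This is a hypothetical illustration: `OverlayTableSketch` and its methods are made-up names backed by plain maps, not the real TypedTable/TableCache API, and the "backing" map merely stands in for RocksDB:

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of an overlay table: get() consults the in-memory overlay first,
// because an entry there may not be flushed to the backing store yet; only
// on a miss does it fall back to the backing store.
class OverlayTableSketch {
    private final Map<String, String> cache = new ConcurrentHashMap<>();   // overlay
    private final Map<String, String> backing = new ConcurrentHashMap<>(); // "RocksDB"

    void put(String key, String value) {
        // Writes land in the overlay immediately; a separate double-buffer
        // flush would batch them into the backing store later.
        cache.put(key, value);
    }

    Optional<String> get(String key) {
        String v = cache.get(key);
        if (v != null) {
            return Optional.of(v); // uncommitted-but-visible entry wins
        }
        return Optional.ofNullable(backing.get(key));
    }

    void flush(String key) {
        // Simulates cleanup after commit: move the entry down and drop it
        // from the overlay.
        String v = cache.remove(key);
        if (v != null) {
            backing.put(key, v);
        }
    }
}
```

The inconsistency concern raised above corresponds to any path that reads `backing` directly without going through a `get()` like this one: such a method would silently miss everything still sitting in the overlay.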


> OzoneManager Cache
> --
>
> Key: HDDS-1499
> URL: https://issues.apache.org/jira/browse/HDDS-1499
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Manager
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.4.1, 0.5.0
>
>  Time Spent: 12h
>  Remaining Estimate: 0h
>
> In this Jira, we shall implement a cache for Table.
> As with OM HA, we are planning a double buffer implementation to 
> flush transactions in a batch, instead of using rocksdb put() for every 
> operation. When this comes into place, we need a cache in OzoneManager HA to 
> handle/serve the requests for validation/returning responses.
>  
> This Jira will implement the Cache as an integral part of the table. In this 
> way, users of this table do not need to check the cache and the db 
> separately. For this, we can update the get API in the table to consult the 
> cache.
>  
> This Jira will implement:
>  # Cache as a part of each Table.
>  # Use of this cache in get().
>  # APIs for cleanup and for adding entries to the cache.
> Adding entries to the cache will be done in further jiras.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-1499) OzoneManager Cache

2019-05-19 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16843593#comment-16843593
 ] 

Hudson commented on HDDS-1499:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #16573 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/16573/])
HDDS-1499. OzoneManager Cache. (#798) (arp7: rev 
0d1d7c86ec34fabc62c0e3844aca3733024bc172)
* (add) 
hadoop-hdds/common/src/test/java/org/apache/hadoop/utils/db/cache/package-info.java
* (add) 
hadoop-hdds/common/src/main/java/org/apache/hadoop/utils/db/cache/CacheKey.java
* (edit) 
hadoop-hdds/common/src/main/java/org/apache/hadoop/utils/db/DBStore.java
* (edit) 
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OmMetadataManagerImpl.java
* (edit) 
hadoop-hdds/common/src/test/java/org/apache/hadoop/utils/db/TestTypedRDBTableStore.java
* (edit) 
hadoop-hdds/common/src/main/java/org/apache/hadoop/utils/db/RDBTable.java
* (edit) hadoop-hdds/common/src/main/java/org/apache/hadoop/utils/db/Table.java
* (edit) 
hadoop-hdds/common/src/main/java/org/apache/hadoop/utils/db/TypedTable.java
* (add) 
hadoop-hdds/common/src/main/java/org/apache/hadoop/utils/db/cache/PartialTableCache.java
* (add) 
hadoop-hdds/common/src/main/java/org/apache/hadoop/utils/db/cache/TableCache.java
* (add) 
hadoop-hdds/common/src/main/java/org/apache/hadoop/utils/db/cache/CacheValue.java
* (add) 
hadoop-hdds/common/src/test/java/org/apache/hadoop/utils/db/cache/TestPartialTableCache.java
* (add) 
hadoop-hdds/common/src/main/java/org/apache/hadoop/utils/db/cache/EpochEntry.java
* (add) 
hadoop-hdds/common/src/main/java/org/apache/hadoop/utils/db/cache/package-info.java




