[jira] [Commented] (CASSANDRA-13475) First version of pluggable storage engine API.

2017-11-10 Thread Dikang Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16247935#comment-16247935
 ] 

Dikang Gu commented on CASSANDRA-13475:
---

I agree it's a huge task, and there will be significant effort needed to 
refactor the existing storage engine. IMO, this is the cost we need to pay in 
order to make Cassandra a world class database. It helps us to estimate the 
resources and timeline of this project, but it should not be an excuse for 
not doing it.

RocksDB is not within the scope of this particular "pluggable storage engine" 
project; it's the motivation for why we want the pluggable storage engine so 
much. Cassandra's read performance, especially P99 read latency, is not great, 
while RocksDB is a solid and well tuned LSM engine. We have had multiple 
engineers spend 6+ months proving that we can get huge performance gains by 
leveraging RocksDB as the storage engine, and we have deployed it in our 
production environment, under real traffic. So we are committed to the 
pluggable storage engine project, to avoid a fork of Cassandra within 
Instagram/Facebook.

Back to step 1, the scope/expectation/guideline of the project: I agree with 
Blake, Aleksey and Sylvain that we want to do it the right way, definitely not 
as a hack in the database. I think we are on the same page about the high 
quality of the refactoring, and I'm very happy to discuss more details on 
step 1.

I chatted with Nate, Blake and Jon offline; I will convert the design quip to a 
more formal design doc, and we can discuss there.

Also, I will change the title of this jira; "First version of pluggable 
storage engine API." is probably a bit misleading at this moment.


> First version of pluggable storage engine API.
> --
>
> Key: CASSANDRA-13475
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13475
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Dikang Gu
>Assignee: Dikang Gu
>
> In order to support a pluggable storage engine, we need to define a unified 
> interface/API, which allows us to plug in different storage engines for 
> different requirements. 
> Here is a design quip we are currently working on:  
> https://quip.com/bhw5ABUCi3co
> At a very high level, the storage engine interface should include APIs to:
> 1. Apply updates to the engine.
> 2. Query data from the engine.
> 3. Stream data in/out to/from the engine.
> 4. Perform table operations, like create/drop/truncate a table, etc.
> 5. Report various stats about the engine.
> I created this ticket to start the discussions about the interface.
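
As a rough illustration only, the five API groups listed in the ticket description could be sketched as a single Java interface. Every name below (StorageEngine, InMemoryEngine, the method names and signatures) is hypothetical and comes neither from the design quip nor from Cassandra's code base; the toy implementation is single-threaded and exists just to show that the shape is implementable.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical interface: one method group per item in the ticket description.
interface StorageEngine {
    void apply(String table, String key, String value);            // 1. apply update
    String query(String table, String key);                        // 2. query data
    List<Map.Entry<String, String>> streamOut(String table,
                                              String fromKey,
                                              String toKey);       // 3. stream out
    void streamIn(String table, List<Map.Entry<String, String>> rows); // 3. stream in
    void createTable(String table);                                // 4. table operations
    void dropTable(String table);
    Map<String, Long> stats();                                     // 5. engine stats
}

// Toy, single-threaded implementation backed by sorted in-memory maps.
class InMemoryEngine implements StorageEngine {
    private final Map<String, TreeMap<String, String>> tables = new HashMap<>();
    private long writes = 0;

    public void createTable(String table) { tables.putIfAbsent(table, new TreeMap<>()); }

    public void dropTable(String table) { tables.remove(table); }

    public void apply(String table, String key, String value) {
        rows(table).put(key, value);
        writes++;
    }

    public String query(String table, String key) { return rows(table).get(key); }

    // Copies the rows in [fromKey, toKey] so the caller gets a stable snapshot.
    public List<Map.Entry<String, String>> streamOut(String table, String fromKey, String toKey) {
        List<Map.Entry<String, String>> out = new ArrayList<>();
        for (Map.Entry<String, String> e : rows(table).subMap(fromKey, true, toKey, true).entrySet()) {
            out.add(new AbstractMap.SimpleImmutableEntry<>(e));
        }
        return out;
    }

    public void streamIn(String table, List<Map.Entry<String, String>> incoming) {
        for (Map.Entry<String, String> e : incoming) {
            apply(table, e.getKey(), e.getValue());
        }
    }

    public Map<String, Long> stats() {
        Map<String, Long> stats = new HashMap<>();
        stats.put("tables", (long) tables.size());
        stats.put("writes", writes);
        return stats;
    }

    private TreeMap<String, String> rows(String table) {
        TreeMap<String, String> rows = tables.get(table);
        if (rows == null) throw new IllegalArgumentException("unknown table: " + table);
        return rows;
    }
}
```

A real interface would of course deal in partitions, mutations and token ranges rather than strings; the point is only that callers would depend on the interface and not on a concrete engine.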



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13475) First version of pluggable storage engine API.

2017-11-10 Thread Joshua McKenzie (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16247780#comment-16247780
 ] 

Joshua McKenzie commented on CASSANDRA-13475:
-

bq. I'd say it's at the bare minimum a full-time one man-year project assuming 
a solid engineer that is pretty familiar with the code base to start with
Given it took you, arguably one of the most knowledgeable engineers on the 
project since the start, a year and a half just to _refactor the Storage 
Engine_, refactoring even "just" the inter-connected static state and 
tracking down and plugging sufficient abstraction leaks, not to mention 
invisible reliance on side effects / performance implications of the current 
formats, for things that touch that Storage Engine to make it safe to have it 
be pluggable...

Yeah, I'd be super impressed if a single person working full-time got to a 
deliverable place in two years TBH. It takes a hell of a lot of work and 
deliberation to unwind a decade's worth of code-base debt to where making a 
change like this wouldn't be super high risk.

bq. Don't get me wrong, I'd be very happy to see someone start to tackle that 
first step seriously, I think it's actually important for the project moving 
forward, but we're imo a long long way from being in a state where we can start 
to talk seriously about having pluggable storage engine in a clean way.
I second that. This is something we've talked about extensively for years but 
nobody has ever really been able to start chewing on it in an incremental way.




[jira] [Commented] (CASSANDRA-13475) First version of pluggable storage engine API.

2017-11-10 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16247651#comment-16247651
 ] 

Sylvain Lebresne commented on CASSANDRA-13475:
--

To chime in a bit here and for what it's worth, I concur with Blake's sentiment 
that a 'pluggable storage engine' is basically not imo realistic at this point.

Cassandra is anything but modular; we've never really been terribly careful 
in isolating the different parts of the code behind clearly defined and well 
isolated interfaces/APIs. Why that is can probably be debated for hours on end, 
and it's a big code-base so there are certainly parts that are much better than 
others. But on the whole, I'm not sure how one could qualify our story around 
modularity and abstraction of high level concepts in other terms than "it's a 
mess".

So, and I'm kind of paraphrasing Blake here, I do feel that talking about a 
pluggable storage API today is completely missing step 1, which is a pretty 
massive refactor of the code to modularize and abstract much more cleanly the 
different parts of the code base. Trying to bolt a pluggable storage API on our 
current mess without that first step would, in my professional opinion, be a 
terrible mistake for the project: I suspect it'll create a maintenance mess 
while having minuscule chances of bearing real fruit.

And I hope no-one is underestimating that first step: if I had to guess, and 
assuming we're talking about doing this in a somewhat incremental way so it 
doesn't disturb all other dev in the meantime, I'd say it's _at the bare 
minimum_ a full-time one man-year project assuming a solid engineer that is 
pretty familiar with the code base to start with. I'd be of course the first to 
admit I suck at that kind of estimation so I'm probably way off, but I also 
can't remember the last time I _under_estimated something like that.

Don't get me wrong, I'd be very happy to see someone start to tackle that first 
step seriously, I think it's actually important for the project moving forward, 
but we're imo a long long way from being in a state where we can start to talk 
seriously about having pluggable storage engine in a clean way.






[jira] [Commented] (CASSANDRA-13475) First version of pluggable storage engine API.

2017-11-09 Thread Dikang Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246135#comment-16246135
 ] 

Dikang Gu commented on CASSANDRA-13475:
---

[~bdeggleston] okay, which one do you want to discuss at this moment? 




[jira] [Commented] (CASSANDRA-13475) First version of pluggable storage engine API.

2017-11-09 Thread Blake Eggleston (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246022#comment-16246022
 ] 

Blake Eggleston commented on CASSANDRA-13475:
-

No, I don’t think we’ve agreed on the scope and boundaries of the project. There 
have been a few ideas thrown around, but we haven't agreed on anything concrete.

Also, let's try to keep the conversation focused on one thing at a time. We 
tentatively agreed on a rough plan last week, then we started talking about 
expectations, guidelines, and non-technical stuff. Let’s finish talking about 
that. After that, let’s talk about the scope. If you want to revisit the 
tentative plan after that, that’s fine, but let’s avoid jumping around too much.




[jira] [Commented] (CASSANDRA-13475) First version of pluggable storage engine API.

2017-11-09 Thread Blake Eggleston (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16245913#comment-16245913
 ] 

Blake Eggleston commented on CASSANDRA-13475:
-

Steps 4-11 all involve moving stuff into something called the CQLStorageEngine. 
Earlier in the discussion, this was the native cassandra implementation of the 
proposed storage interface. I guess that might have changed, but it's not 
really clear what CQLStorageEngine is.




[jira] [Commented] (CASSANDRA-13475) First version of pluggable storage engine API.

2017-11-09 Thread Dikang Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16245921#comment-16245921
 ] 

Dikang Gu commented on CASSANDRA-13475:
---

[~bdeggleston], I thought we already agreed on the scope and boundary of the 
storage engine layer. If that's true, then even when there is no storage engine 
api layer, we should create a CQLStorageEngine package, and start to move 
things that belong to the storage engine into that package, right? It will be 
all the stuff within the storage engine layer, and we can create an interface 
from them at the end, in step 16.




[jira] [Commented] (CASSANDRA-13475) First version of pluggable storage engine API.

2017-11-09 Thread Dikang Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16245843#comment-16245843
 ] 

Dikang Gu commented on CASSANDRA-13475:
---

[~bdeggleston], sorry, I don't understand, I'm not adding a new storage engine 
api in the proposed plan, right?




[jira] [Commented] (CASSANDRA-13475) First version of pluggable storage engine API.

2017-11-09 Thread Blake Eggleston (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16245784#comment-16245784
 ] 

Blake Eggleston commented on CASSANDRA-13475:
-

Personally, I’m pretty -1 on the idea of adding a new storage engine api. This 
should focus on fixing the leaks in the interfaces currently exposed by 
Keyspace and CFS, and using them as the primary extension points. Using these 2 
classes as extension points should also reduce the surface area of things 
you’ll need to refactor.




[jira] [Commented] (CASSANDRA-13475) First version of pluggable storage engine API.

2017-11-08 Thread Dikang Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16245246#comment-16245246
 ] 

Dikang Gu commented on CASSANDRA-13475:
---

Update: I'm working with [~bdeggleston] and [~iamaleksey] on the plan, here is 
an updated one:

1. Agree on the scope, expectation, and guidelines of this project.
2. Agree on the boundaries of the storage engine layer.
3. Work out the (high level) designs about how to refactor each major component 
in Cassandra.
4. Create an empty package for a CQLStorageEngine.
5. Migrate read path into the CQLStorageEngine. Keep the Partition interface, 
but move other implementations into the CQLStorageEngine package.
6. Migrate write path into the CQLStorageEngine.
7. Migrate compaction into the CQLStorageEngine.
8. Migrate most of the CFS operations implementations, like create, drop, 
flush, etc into the CQLStorageEngine.
9. Migrate other components, like Cache, Indexes, into CQLStorageEngine.
10. Refactor streaming, move logic into CQLStorageEngine.
11. Refactor repair, move logic into CQLStorageEngine.
12. Refactor each leaky group of cfs components.
13. Refactor each non-cfs system that interacts with storage layer.
14. Refactor metrics.
15. Refactor schema, metadata changes.
16. Extract interfaces from CFS and keyspace.
17. Introduce pluggable Keyspace/CFS factories controlled by schema.
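
To make step 17 a little more concrete, here is a minimal sketch of what a factory lookup controlled by schema could look like. It is only an assumption about the eventual shape: TableStore, TableStoreFactories, the "storage_engine" option name and the "native" default are all made up for illustration and are not existing Cassandra classes or schema options.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stand-in for whatever interface step 16 extracts from CFS.
interface TableStore {
    String engineName();
}

// Hypothetical registry: maps an engine name found in schema to a factory.
class TableStoreFactories {
    private static final String OPTION = "storage_engine"; // assumed option name
    private static final String DEFAULT_ENGINE = "native"; // assumed default

    private final Map<String, Supplier<TableStore>> factories = new HashMap<>();

    void register(String engineName, Supplier<TableStore> factory) {
        factories.put(engineName, factory);
    }

    // schemaOptions stands in for the keyspace or table parameters read from schema.
    TableStore open(Map<String, String> schemaOptions) {
        String engineName = schemaOptions.getOrDefault(OPTION, DEFAULT_ENGINE);
        Supplier<TableStore> factory = factories.get(engineName);
        if (factory == null) {
            throw new IllegalArgumentException("unknown storage engine: " + engineName);
        }
        return factory.get();
    }
}
```

The rest of the code base would only ever call open(...) and work against the extracted interface, so the choice of implementation stays a schema-level decision.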

Blake thinks we should prioritize the work on streaming and repair.

After we get consensus on it, I will start to create sub jiras for each of the 
items. 




[jira] [Commented] (CASSANDRA-13475) First version of pluggable storage engine API.

2017-11-04 Thread Blake Eggleston (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16239329#comment-16239329
 ] 

Blake Eggleston commented on CASSANDRA-13475:
-

Dikang and I spoke offline, and my proposed plan seems reasonable to him. 

So I think the next step would be to talk about the non-technical side of 
this: the pluggable storage project’s place in Cassandra, and some general 
guidelines for how to approach the sub projects. Once we’ve converged on 
something in this jira, we should put it up on the dev list for a wider 
audience / additional feedback. My thoughts are below:

First, pluggable storage’s place in the Cassandra project:

For the time being, I think we should approach this as an effort to properly 
modularize storage related parts of Cassandra. The motivation being to enable 
experimentation with alternate storage ideas without having to resort to awful 
hacks, not ‘add pluggable storage to Cassandra’.

I think this work could definitely lead to pluggable storage being a part of 
Cassandra at some point, and that it could be beneficial to users. However, I 
don’t think it’s a good idea to start with the intention of supporting, 
directly or indirectly, secondary storage layers. Both because of how it would 
impact development on core Cassandra, and also because of how it would affect 
user expectations about the storage options available to them.

Let’s start with making it possible, and then see where things go from there.

The short term implications for rocksdb would be that there may be api changes 
in minor releases they’d have to worry about, and they’ll still need a fork. 
The long term implications would be that pluggable storage may never really 
become an official part of Cassandra, so there’s risk in investing a lot of 
time in it.

Next, guidelines on approaching each incremental component.

Whenever we commit some code modularizing something, the overarching storage 
modularization project itself should remain abandon-able. In other words, if 
work stops on this project for some reason, there shouldn’t be any need to go 
back and revert any of the previous work.

Each component refactor should, as much as possible, make sense on its own, 
especially the larger ones. Each project’s effect on internal decoupling and 
testability should be positive. We also can’t make core development work more 
difficult.

Finally, I’ve discussed this with Dikang offline, but just so no one’s 
surprised if I say this in the future: I don’t think a rocksdb backend makes 
sense for Cassandra. Cassandra is a sorted lsm, rocksdb is a sorted lsm, and we 
don’t need 2 of them. If we want rocksdb performance in Cassandra, it would 
probably take less time to close the gap by optimizing the existing engine than 
it will to do all this work to make storage pluggable. 

That said, I think this project is a good thing, and I’m happy to help. I think 
that the modularization work will be good for Cassandra. It enables a member of 
our dev community to try something new without committing the entire project to 
it, it will clean up some of our messy internals, and I think it will help us 
more quickly adapt to some of the changes in storage technology that are on the 
horizon.




[jira] [Commented] (CASSANDRA-13475) First version of pluggable storage engine API.

2017-11-03 Thread Blake Eggleston (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16237818#comment-16237818
 ] 

Blake Eggleston commented on CASSANDRA-13475:
-

[~dikanggu] SLOW DOWN!

I’m not asking you questions, or asking for more details, I’m trying to have a 
discussion with you. This is going to be a long process, and without a high 
level plan or strategy, it’s not going anywhere.

Each point in the plan proposal I posted is going to end up being its own 
lengthy discussion. A few sentences in a quip doc about a major cassandra 
component is nearly meaningless, and is just noise at this point. Honestly, I’d 
suggest you edit out the pasted doc here just to remove an unnecessary wall of 
text which is linked elsewhere.

Now look, each of my points (well pair of points, discussion/implementation are 
split up chronologically) is a component that’s going to need to be 
individually refactored. As an incremental approach to abstracting the storage 
layer, does this make sense?




[jira] [Commented] (CASSANDRA-13475) First version of pluggable storage engine API.

2017-11-03 Thread Stefan Podkowinski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16237367#comment-16237367
 ] 

Stefan Podkowinski commented on CASSANDRA-13475:


This sounds... **very** ambitious. ;) 




[jira] [Commented] (CASSANDRA-13475) First version of pluggable storage engine API.

2017-11-02 Thread Dikang Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16237080#comment-16237080
 ] 

Dikang Gu commented on CASSANDRA-13475:
---

[~bdeggleston], I think I merged most of your questions into the quip; here is 
a snapshot of it:

==
Apache Cassandra Pluggable Storage Engine

What is a Cassandra pluggable storage engine

A Cassandra pluggable storage engine is the component in the Cassandra database 
server that is responsible for managing how data is stored, both in memory and 
on disk, and performing actual data I/O operations for a database, as well as 
enabling certain feature sets that target a specific application need.

More concretely, the storage engine will be **responsible** for:

1. Physical Storage: This involves everything from supporting C* data types and 
table schema to the format used for storing data on physical disk.
2. Query: The storage engine will support point queries and range queries of 
data stored in the database.
3. Memory Caches: The storage engine may implement a row cache or block cache 
for query performance optimization.
4. Advanced Data Types: It's up to the storage engine whether to support 
advanced data types such as list/map/counter.
5. Index Support: It's up to the storage engine whether to support secondary 
indexes on the stored data.

The storage engine will **NOT be responsible** for any distributed or network 
features, like schema, gossip, replication, streaming, repair, etc. Those 
features need to be implemented on top of the storage engine.


Project Goal

* Clear interface of the Pluggable Storage Engine, which means there is a clear 
boundary on the storage engine, and we can drop in any storage engine 
implementation without any change to other components.
* Refactor the existing Cassandra code base to follow the pluggable storage 
engine architecture.

Timelines/Guidelines

I expect it will be a year-long effort to refactor the existing storage engine 
to follow a mature pluggable storage engine API. During that time, we will 
refactor the existing storage engine piece by piece, and no regression 
(performance, reliability or testability) should be introduced in the process.

(Very high level) Designs

streaming

Current streaming is coupled with the storage engine, but it doesn't have to 
be. The StreamSession class could be a very general streaming framework. My 
proposal, for the three streaming phases, is:

1. Connection initialization: It could remain unchanged.
2. Stream preparation phase: We abstract the StreamTransferTask and 
StreamReceiveTask; each storage engine will implement its own TransferTask and 
ReceiveTask, which hide the details of how to bulk read/write to the storage 
engine.
3. Streaming phase: Each storage engine implements its own StreamReader and 
StreamWriter, to read/write data from/into the stream. On the receiving side, 
once the streamed message is fully received, the implementation will be 
responsible for ingesting the streamed files into the engine and making them 
available for client requests.
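
A minimal Java sketch of the streaming phase, under stated assumptions: the 
types SketchStreamWriter, SketchStreamReader, ToyStreamingEngine and 
SketchStreamSession are hypothetical stand-ins and do not match the real 
StreamReader/StreamWriter classes. The point is only that the session moves 
bytes, while each engine decides how its data is serialized and ingested.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical per-engine hooks (assumed names, not Cassandra's real classes).
interface SketchStreamWriter { void writeTo(DataOutputStream out) throws IOException; }
interface SketchStreamReader { void readFrom(DataInputStream in) throws IOException; }

// Toy engine that streams its rows as a count followed by key/value pairs.
class ToyStreamingEngine implements SketchStreamWriter, SketchStreamReader {
    final TreeMap<String, String> rows = new TreeMap<>();

    public void writeTo(DataOutputStream out) throws IOException {
        out.writeInt(rows.size());
        for (Map.Entry<String, String> e : rows.entrySet()) {
            out.writeUTF(e.getKey());
            out.writeUTF(e.getValue());
        }
        out.flush();
    }

    public void readFrom(DataInputStream in) throws IOException {
        // Stage the whole message first, then ingest it in one step, so that
        // partially received data never becomes visible to reads.
        TreeMap<String, String> staged = new TreeMap<>();
        int n = in.readInt();
        for (int i = 0; i < n; i++) {
            String key = in.readUTF();
            String value = in.readUTF();
            staged.put(key, value);
        }
        rows.putAll(staged);
    }
}

// Engine-agnostic "session": it only carries bytes between the two hooks.
class SketchStreamSession {
    static void transfer(SketchStreamWriter from, SketchStreamReader to) {
        try {
            ByteArrayOutputStream wire = new ByteArrayOutputStream();
            from.writeTo(new DataOutputStream(wire));
            to.readFrom(new DataInputStream(new ByteArrayInputStream(wire.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Here an in-memory byte array stands in for the network connection; in a real 
session the two hooks would run on different nodes.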

repair

For repair, my idea is that we can keep the high level design, which uses 
Merkle trees to calculate the difference and then uses the streaming framework 
to stream the data. Different storage engines will have different 
implementations for calculating the Merkle trees; a naive way is to 
sequentially scan a token range to build the Merkle trees, and then stream the 
inconsistent token ranges. It should be doable. But incremental repair may not 
be supported by all storage engines.
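
The naive approach can be sketched in Java as follows. This is an illustrative 
assumption, not Cassandra's MerkleTree implementation: tokens are plain longs, 
the range is split into a fixed power-of-two number of equal-width leaves, and 
the class name NaiveMerkleSketch and its methods are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;

class NaiveMerkleSketch {
    static MessageDigest sha256() {
        try {
            return MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Sequentially scan tokens in [0, leaves * width) and hash each
    // sub-range of 'width' tokens into one leaf.
    static byte[][] leafHashes(SortedMap<Long, String> rows, int leaves, long width) {
        byte[][] out = new byte[leaves][];
        for (int i = 0; i < leaves; i++) {
            MessageDigest md = sha256();
            for (Map.Entry<Long, String> e : rows.subMap(i * width, (i + 1) * width).entrySet()) {
                md.update(Long.toString(e.getKey()).getBytes(StandardCharsets.UTF_8));
                md.update((byte) 0); // separator between token and value
                md.update(e.getValue().getBytes(StandardCharsets.UTF_8));
                md.update((byte) 0); // separator between rows
            }
            out[i] = md.digest();
        }
        return out;
    }

    // Combine hashes pairwise up to a single root (leaf count must be a power of two).
    static byte[] root(byte[][] level) {
        while (level.length > 1) {
            byte[][] next = new byte[level.length / 2][];
            for (int i = 0; i < next.length; i++) {
                MessageDigest md = sha256();
                md.update(level[2 * i]);
                md.update(level[2 * i + 1]);
                next[i] = md.digest();
            }
            level = next;
        }
        return level[0];
    }

    // Indices of leaf ranges whose hashes differ; these are the ranges to stream.
    static List<Integer> mismatchedLeaves(byte[][] a, byte[][] b) {
        List<Integer> diff = new ArrayList<>();
        for (int i = 0; i < a.length; i++) {
            if (!Arrays.equals(a[i], b[i])) diff.add(i);
        }
        return diff;
    }
}
```

Two replicas would compare roots first and only look at individual leaves when 
the roots differ. An engine with a cheaper way to produce per-range hashes could 
replace the sequential scan without changing the comparison step.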


keyspace Metadata

Let's say we can configure the storage engine per keyspace. Under this design, 
we can add a storage engine option to the KeyspaceParams, which is stored in 
KeyspaceMetadata. We can support setting the storage engine during the creation 
of the keyspace, in CQL.

Also, we can support a mechanism to override the option per server. In this 
case, streaming between different storage engines needs to be supported.

When we open or initialize a keyspace, we will pick the specific storage engine 
based on the option in KeyspaceParams.
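
A small Java sketch of that selection step, assuming a simple name-to-factory 
registry. The option name "storage_engine", the default value "cassandra", and 
the types SketchEngine and SketchEngineRegistry are all hypothetical; the real 
KeyspaceParams class is represented here by a plain map.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical marker type standing in for a real engine.
interface SketchEngine {
    String name();
}

class SketchEngineRegistry {
    static final String OPTION = "storage_engine"; // assumed option name
    static final String DEFAULT = "cassandra";     // assumed default engine

    private final Map<String, Supplier<SketchEngine>> factories = new HashMap<>();

    void register(String name, Supplier<SketchEngine> factory) {
        factories.put(name, factory);
    }

    // Pick the engine for a keyspace from its params. A per-server override,
    // when present, wins over the keyspace-level option.
    SketchEngine open(Map<String, String> keyspaceParams, String serverOverride) {
        String chosen = serverOverride != null
                ? serverOverride
                : keyspaceParams.getOrDefault(OPTION, DEFAULT);
        Supplier<SketchEngine> factory = factories.get(chosen);
        if (factory == null) {
            throw new IllegalArgumentException("unknown storage engine: " + chosen);
        }
        return factory.get();
    }
}
```

Failing on an unknown engine name at open time, rather than silently falling 
back to the default, is a design choice of this sketch and not something the 
proposal specifies.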

table metadata

I think we can keep most of the options in the TableParams, 
https://github.com/apache/cassandra/blob/8b3a60b9a7dbefeecc06bace617279612ec7092d/src/java/org/apache/cassandra/schema/TableParams.java#L36

The storage engine needs to respect the options in the TableParams, and apply 
them if possible. For example, if the storage engine is not an LSM tree based 
implementation, then it may not need compaction, and it will ignore that 
option.

Storage engine specific options, such as compaction, can be moved out of the 
general params and loaded from config files.

Metrics

Each storage engine can implement its own JMX/MBeans, then metrics can still be 
exposed through JMX.

read path

Each storage 

[jira] [Commented] (CASSANDRA-13475) First version of pluggable storage engine API.

2017-11-02 Thread Blake Eggleston (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236746#comment-16236746
 ] 

Blake Eggleston commented on CASSANDRA-13475:
-

Let's keep discussion on this jira for the time being. Also, we're just talking 
about a plan at this point. What do you think of the plan as proposed? Any 
concerns? Things you think should be added, removed, or reordered?

> First version of pluggable storage engine API.
> --
>
> Key: CASSANDRA-13475
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13475
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Dikang Gu
>Assignee: Dikang Gu
>Priority: Major
>
> In order to support pluggable storage engine, we need to define a unified 
> interface/API, which can allow us to plug in different storage engines for 
> different requirements. 
> In very high level, the storage engine interface should include APIs to:
> 1. Apply update into the engine.
> 2. Query data from the engine.
> 3. Stream data in/out to/from the engine.
> 4. Table operations, like create/drop/truncate a table, etc.
> 5. Various stats about the engine.
> I create this ticket to start the discussions about the interface.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13475) First version of pluggable storage engine API.

2017-11-02 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236734#comment-16236734
 ] 

Jason Brown commented on CASSANDRA-13475:
-

[~dikanggu] please send out a message to the dev@ ML with the link to your quip 
doc, that way folks who aren't following this ticket (right now) can know where 
the action is taking place.




[jira] [Commented] (CASSANDRA-13475) First version of pluggable storage engine API.

2017-11-02 Thread Dikang Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236710#comment-16236710
 ] 

Dikang Gu commented on CASSANDRA-13475:
---

[~bdeggleston], yeah, they are very good points. To have a central place for 
the discussion, I will try to answer your questions, and add more details to 
the quip: https://quip.com/bhw5ABUCi3co. Everyone should have access to the 
quip, and please feel free to edit/comment on it.




[jira] [Commented] (CASSANDRA-13475) First version of pluggable storage engine API.

2017-11-02 Thread Blake Eggleston (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236648#comment-16236648
 ] 

Blake Eggleston commented on CASSANDRA-13475:
-

I think it’s too early to start looking at code, or talking about api 
specifics. We should start by getting a rough plan together. My thoughts on an 
initial plan are below. This is just a rough idea dump, so let me know if I’ve 
missed anything.

# Discuss expectations, guidelines, non-technical stuff, etc.
** Let’s start off by making sure we’re all on the same page about:
*** What we expect the end result to be
*** Guidelines on planning / implementing component refactors
*** Any approximate timelines you have in mind, if any
*** Pluggable storage's place in the cassandra project
# Agree on the boundaries of the storage engine layer. What it is and isn’t 
responsible for.
** This has already been discussed to some degree, but let’s agree on a 
definition.
# Work out a strategy for streaming and repair
** This is a bit hand wavy at the moment, and not having a solid streaming and 
repair story is a non starter. So let’s figure out how that’s going to work 
(including incremental repair) before we get too deep into anything else.
# Decide how schema ui / metadata will be refactored to support multiple 
storage engines
# Work out a strategy for exposing metrics / monitoring from different engines.
# Migrate read command and write logic into cfs
# Identify remaining leaky parts of CFS class.
** Some of this will be legit storage implementation details. Other parts will 
be systems we’ve missed, or things that need to be abstracted.
# Identify systems not controlled by CFS that interact with the storage layer 
on their own
# Implement streaming / repair changes
# Refactor each leaky group of cfs components
# Refactor each non-cfs system that interacts with storage layer.
# Refactor metrics/monitoring systems
# Refactor schema ui, metadata implementation
# Extract interfaces from CFS and keyspace
# Introduce pluggable Keyspace/CFS factories controlled by schema

Thoughts?




[jira] [Commented] (CASSANDRA-13475) First version of pluggable storage engine API.

2017-11-01 Thread Dikang Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235210#comment-16235210
 ] 

Dikang Gu commented on CASSANDRA-13475:
---

[~iamaleksey], [~bdeggleston], I wrote a new 
[patch|https://github.com/DikangGu/cassandra/commit/6d690c859cb640c320f25888cea1bdeb41565c2b]
 of the interfaces abstracted from the Keyspace/, and also a quip about the 
components we need (and do not need) to refactor and move to a CQLStorageEngine, 
https://quip.com/bhw5ABUCi3co.

Before I go deep and start to refactor the Keyspace/CFS classes, I'd like to 
hear your thoughts on it.




[jira] [Commented] (CASSANDRA-13475) First version of pluggable storage engine API.

2017-10-31 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16227275#comment-16227275
 ] 

Aleksey Yeschenko commented on CASSANDRA-13475:
---

bq. So next step, I will try to abstract the keyspace and CFS classes, as you 
suggested, and have a list of steps of how I will refactor existing engine

I'd suggest starting with a plan first, before you invest too much work into 
coding it up, so that you don't waste any time in case there is disagreement 
regarding the roadmap (though some code changes just for 
exploration/discovery/illustration could help too).




[jira] [Commented] (CASSANDRA-13475) First version of pluggable storage engine API.

2017-10-31 Thread Dikang Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16227235#comment-16227235
 ] 

Dikang Gu commented on CASSANDRA-13475:
---

[~iamaleksey], thanks for commenting on it. I agree with most of what you said, 
such as the points on metadata, metrics, etc. And I totally understand that 
it's a huge effort to refactor the current storage engine; that's why I want to 
do it gradually, so that we can have better test coverage at each stage. I 
still think a StorageEngine API is a better approach than cleaning up 
Keyspace/CFS as the first step. But at the same time, I understand your concern 
that the storage engine API I proposed only shows how I will implement a new 
engine but probably does not demonstrate how I will refactor the existing 
storage engine.

So next step, I will try to abstract the keyspace and CFS classes, as you 
suggested, and have a list of steps of how I will refactor existing engine. As 
in the storage engine API, I think streaming will be the biggest challenge.

Taking one step back, I'm very sold on the pluggable storage engine idea, given 
all the gains we have seen. I will continue to try to push it forward.




[jira] [Commented] (CASSANDRA-13475) First version of pluggable storage engine API.

2017-10-31 Thread Blake Eggleston (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16226871#comment-16226871
 ] 

Blake Eggleston commented on CASSANDRA-13475:
-

+1 on a plan. I think the CFS vs StorageEngine debate indicates we have very 
different ideas about how this should be done at a high level.




[jira] [Commented] (CASSANDRA-13475) First version of pluggable storage engine API.

2017-10-31 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16226828#comment-16226828
 ] 

Aleksey Yeschenko commented on CASSANDRA-13475:
---

I’m with Blake on this.

Now, I’ll be upfront: I’m not 100% sold on the idea of making the storage 
engine pluggable, although I am warming up to it, at least if someone did a 
really good job.

What I was expecting from the patch:

1. Keyspace abstracted away, ColumnFamilyStore (Table) interfaces extracted, 
with all sstable/compaction/etc. specific logic isolated into concrete default 
engine implementation
2. Some kind of StorageEngine interface that would consume metadata and return 
initialized engine-specific instances of Keyspace/Table classes
3. Abstracted away metadata, now that I think of it; default compaction, 
compression, and other params don’t make sense for rocksdb engine, I assume, 
nor would they for some other non-default implementations; also should be able 
to set options for non-default engine via CQL
4. Same goes for metrics; you’d still be creating all the default metrics per 
ks/table, many of which make no sense in other engines; leaky
5. Acknowledgement and enumeration of everything that needs to be modified. And 
a lot needs to be - from startup checks to periodic tasks, to even some read 
logic that uses CFS for row size estimation

Assuming that the engine would be set on per-keyspace basis (which IMO makes 
most sense), Keyspace and CFS would be the primary extension points; not 
‘CFS/Keyspace will become a thin wrapper around the storage engine API’, but 
the exact opposite.

The suggested patch/API is very minimal, and not in a good way. It’s a hacky 
default engine bypass mechanism, not a proper abstraction of the storage 
engine, and not even a good starting point, imo :\

More importantly, I feel like the full scope of a properly implemented 
abstracted out storage engine is realized; done right, it’s a *huge* effort. I 
would like to see some kind of plan, with the full scope at least described, 
and perhaps suggested phases/steps of getting there.




[jira] [Commented] (CASSANDRA-13475) First version of pluggable storage engine API.

2017-10-31 Thread Dikang Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16226344#comment-16226344
 ] 

Dikang Gu commented on CASSANDRA-13475:
---

[~bdeggleston], I see what you mean, but my point is that ColumnFamilyStore is 
a very complicated class; it leaks storage details, like the sstable concept, 
in almost every function it provides. In the future, all the call stacks that 
deal with APIs which leak storage details should be moved to a 
CQLStorageEngine (or CQLColumnFamilyStore in your words). And I'm not sure that 
cleaning up the ColumnFamilyStore is the top priority at this moment.

The process I have in mind is:
1. We define the new API for the common workload, which does not require a big 
refactor of Cassandra's code yet, but can hide a new storage engine 
implementation. This is demonstrated in our RocksDBEngine implementation.
2. Start to refactor/clean up ColumnFamilyStore and Keyspace, which means we 
implement a CQLStorageEngine and move the current storage related business into 
the CQLStorageEngine. As you said, 99.99% of the work will be involved here. And 
according to our experience of implementing the RocksDBEngine, we should be 
able to do it step by step, moving things piece by piece.
3. I can imagine we will go through a lot of iterations of steps 1 & 2, 
refining the API and cleaning up the CFS/Keyspace classes. At the end, I think 
CFS/Keyspace will become a thin wrapper around the storage engine API. 

I don't think there are big differences between our proposals; even for the 
IColumnFamilyStore interface, I can imagine it will be pretty similar to the 
StorageEngine interface I propose. But I don't want to change everything to use 
the IColumnFamilyStore interface at step 1, since that requires so much 
refactoring work at once, and I prefer many small patches over one big patch 
for the refactoring. Also, for testing purposes, I think small patches are 
better and make it easier to get good test coverage.

What do you think?




[jira] [Commented] (CASSANDRA-13475) First version of pluggable storage engine API.

2017-10-30 Thread Blake Eggleston (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16226281#comment-16226281
 ] 

Blake Eggleston commented on CASSANDRA-13475:
-

Sure, but everywhere it’s used, Cassandra is interacting with the storage 
layer. You’re going to have to make changes everywhere it’s used anyway. I 
don’t think it’s going to be practical to start with an api and refactor 
Cassandra to conform to it.

In other words, a storage api (albeit a leaky one) already exists. Adding 
another api on top of the existing one is just going to complicate things. I 
think any effort to make storage pluggable needs to start with the loose api 
that already exists, and start working out ways to refine it and hide 
implementation details behind a more generic interface as needed, piece by 
piece. That’s going to be 99.99% of the work involved here.





[jira] [Commented] (CASSANDRA-13475) First version of pluggable storage engine API.

2017-10-30 Thread Dikang Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16226190#comment-16226190
 ] 

Dikang Gu commented on CASSANDRA-13475:
---

[~bdeggleston], thanks for bringing it up. Yeah, I thought about that; my 
concern is that ColumnFamilyStore is widely used in the Cassandra source code. 
If I change it to be an interface like IColumnFamilyStore, then I will end up 
needing to change every usage of ColumnFamilyStore. So instead of inheritance, 
I chose to use composition, where I keep the ColumnFamilyStore class and 
delegate the storage engine related calls to the new storage engine API. I 
think it will have much less impact on the current call stack, and it will be 
easier for me to refactor the code base step by step. Does that make sense?
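
To make the composition idea concrete, here is a minimal Java sketch. It is an 
assumption about the shape of the approach, not code from the patch: 
DelegateEngine, MapBackedEngine and ColumnFamilyStoreSketch are hypothetical 
names, and the real ColumnFamilyStore has a far larger surface.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical narrow engine API that the existing class delegates to.
interface DelegateEngine {
    void write(String key, String value);
    String read(String key);
}

// Toy engine backed by a map, standing in for any concrete storage engine.
class MapBackedEngine implements DelegateEngine {
    private final Map<String, String> data = new HashMap<>();

    public void write(String key, String value) {
        data.put(key, value);
    }

    public String read(String key) {
        return data.get(key);
    }
}

// Stand-in for the existing ColumnFamilyStore: the class and its method names
// stay the same, so call sites do not change; only the bodies forward to the
// engine.
class ColumnFamilyStoreSketch {
    private final DelegateEngine engine;

    ColumnFamilyStoreSketch(DelegateEngine engine) {
        this.engine = engine;
    }

    void apply(String key, String value) {
        engine.write(key, value);
    }

    String query(String key) {
        return engine.read(key);
    }
}
```

The trade-off is that the wrapper keeps every existing method, including any 
that expose storage details, until each one is moved behind the engine.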




[jira] [Commented] (CASSANDRA-13475) First version of pluggable storage engine API.

2017-10-30 Thread Blake Eggleston (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16226027#comment-16226027
 ] 

Blake Eggleston commented on CASSANDRA-13475:
-

What I mean is that CFS basically exposes a storage api to the rest of 
Cassandra, and has a lot of the per-table storage implementation details and 
state. So instead of adding another interface, why not make CFS one of the 
primary extension points for the storage api? So you’d extract an 
IColumnFamilyStore interface, the current ColumnFamilyStore class would be 
CassandraColumnFamilyStore or whatever, and rocksdb would have a 
RocksDBColumnFamilyStore.

This will be less invasive, highlight the areas where the storage layer 
implementation details leak into other parts of the project, and prevent 
cassandra storage layer specific stuff (like compaction strategies) from 
getting created/started/accessed.
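
A hedged Java sketch of that extraction: only the names IColumnFamilyStore, 
CassandraColumnFamilyStore and RocksDBColumnFamilyStore come from the comment 
above (with a Sketch suffix added here), while the start method and its log 
parameter are hypothetical and exist just to make the behavior observable.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical extracted interface: only engine-neutral operations appear here.
interface IColumnFamilyStoreSketch {
    void start(List<String> log);
}

// Default engine: it owns the compaction machinery, so it is the only
// implementation that ever starts it.
class CassandraColumnFamilyStoreSketch implements IColumnFamilyStoreSketch {
    public void start(List<String> log) {
        log.add("open sstables");
        log.add("start compaction strategy");
    }
}

// Alternative engine: no sstable or compaction objects are created at all.
class RocksDBColumnFamilyStoreSketch implements IColumnFamilyStoreSketch {
    public void start(List<String> log) {
        log.add("open rocksdb instance");
    }
}

class ColumnFamilyStoreDemo {
    // Callers hold the interface type and never see engine-specific steps.
    static List<String> startAndLog(IColumnFamilyStoreSketch cfs) {
        List<String> log = new ArrayList<>();
        cfs.start(log);
        return log;
    }
}
```

Any caller that needs something outside the interface has to cast to a concrete 
class, which makes the leaky spots easy to find.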




[jira] [Commented] (CASSANDRA-13475) First version of pluggable storage engine API.

2017-10-30 Thread Dikang Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16225983#comment-16225983
 ] 

Dikang Gu commented on CASSANDRA-13475:
---

My major motivation is that we probably don't really need the CFS reference 
inside the StorageEngine, at least in our RocksDB based implementation. In most 
cases, I just need an identifier for the column family I'm currently working 
on, which could be the table name or table uuid (maybe uuid is better than 
table name), not necessarily the CFS. That's why I addressed 
[~spo...@gmail.com]'s comments on the CFS.

Can you explain more about `main hooks for a pluggable storage layer should be 
a CFS implementation`? Currently in my proposal, write requests will be 
implemented under the `applyMutate` API and read requests will be handled by 
implementing the Partition interface. For Keyspace and CFS, most of the 
coordinating logic should remain unchanged.




[jira] [Commented] (CASSANDRA-13475) First version of pluggable storage engine API.

2017-10-30 Thread Blake Eggleston (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16225906#comment-16225906
 ] 

Blake Eggleston commented on CASSANDRA-13475:
-

Why get rid of (or minimize the importance of) the ColumnFamilyStore class? CFS 
is a pretty central component of the current storage engine. I think it would 
be a lot more natural to focus on extracting an interface from the current api, 
and not introduce another interface. In fact, I’d say the main hooks for a 
pluggable storage layer should be a CFS implementation (and Keyspace#apply)




[jira] [Commented] (CASSANDRA-13475) First version of pluggable storage engine API.

2017-10-27 Thread Dikang Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16222867#comment-16222867
 ] 

Dikang Gu commented on CASSANDRA-13475:
---

Uploaded a new [patch | 
https://github.com/DikangGu/cassandra/commit/68c0fe2526034ad21216454a6ccd97a0db7124b6],
 which removed the dependency on CFS.




[jira] [Commented] (CASSANDRA-13475) First version of pluggable storage engine API.

2017-10-20 Thread Dikang Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213287#comment-16213287
 ] 

Dikang Gu commented on CASSANDRA-13475:
---

[~spo...@gmail.com], back to your comment, here is a short write-up of what I 
think the responsibilities and boundaries of a Cassandra storage engine are.
=
A Cassandra pluggable storage engine is the component in the Cassandra 
database server that is responsible for managing how data is stored, both in 
memory and on disk, and for performing the actual data I/O operations for a 
database, as well as enabling certain feature sets that target a specific 
application need.

More concretely, the storage engine will be responsible for:

1. Physical Storage: This covers everything from supporting C* data types and 
table schemas to the format used for storing data on physical disk.
2. Query: The storage engine will support point queries and range queries of 
data stored in the database.
3. Memory Caches: The storage engine may implement a row cache or block cache 
for query performance optimization.
4. Advanced Data Types: It's up to the storage engine whether to support 
advanced data types such as list/map/counter.
5. Index Support: It's up to the storage engine whether to support secondary 
indexes on the stored data.


The storage engine will NOT be responsible for any distributed or network 
features, like gossip, replication, streaming, repair, etc. Those features need 
to be implemented on top of the storage engine.
=
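
One way to read the list above is as a small set of mandatory operations plus 
a set of optional capabilities that an engine advertises. The following is 
only a sketch under that reading; the names `EngineContract`, `EngineFeature` 
and `SortedMapEngine` are hypothetical and do not appear in the proposed API.

```java
import java.util.EnumSet;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;

// Optional capabilities (items 3-5 above); hypothetical names.
enum EngineFeature { ROW_CACHE, BLOCK_CACHE, ADVANCED_DATA_TYPES, SECONDARY_INDEX }

// Mandatory part of the contract (items 1-2 above), heavily simplified:
// keys and values are plain strings instead of C* types.
interface EngineContract {
    void put(String key, String value);

    // Point query: returns null when the key is absent.
    String pointQuery(String key);

    // Range query over [fromInclusive, toExclusive), in key order.
    NavigableMap<String, String> rangeQuery(String fromInclusive, String toExclusive);

    // Callers check this before relying on an optional capability.
    Set<EngineFeature> supportedFeatures();
}

// Toy engine: a sorted in-memory map stands in for physical storage.
final class SortedMapEngine implements EngineContract {
    private final TreeMap<String, String> data = new TreeMap<>();

    @Override
    public synchronized void put(String key, String value) {
        data.put(key, value);
    }

    @Override
    public synchronized String pointQuery(String key) {
        return data.get(key);
    }

    @Override
    public synchronized NavigableMap<String, String> rangeQuery(String fromInclusive, String toExclusive) {
        // Copy the view so the result is not affected by later writes.
        return new TreeMap<>(data.subMap(fromInclusive, true, toExclusive, false));
    }

    @Override
    public Set<EngineFeature> supportedFeatures() {
        // This toy engine implements none of the optional capabilities.
        return EnumSet.noneOf(EngineFeature.class);
    }
}
```

Nothing distributed (gossip, replication, repair) shows up in the contract, 
which matches the boundary described above.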

As for CFS in the storage engine API, I'm thinking that in the future the CFS 
will simply be the holder of the metadata for a Cassandra table, so that it's 
convenient to get the necessary table metadata inside the storage engine 
through the CFS class. But I'm open to changing that and loading the metadata 
inside the storage engine instead.






[jira] [Commented] (CASSANDRA-13475) First version of pluggable storage engine API.

2017-10-20 Thread Blake Eggleston (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16212904#comment-16212904
 ] 

Blake Eggleston commented on CASSANDRA-13475:
-

This doesn’t abstract away the storage layer, so much as it bypasses it in a 
few places. Unfortunately, storage layer implementation details leak all over 
the project. So even though you’ve modified a few paths, C* will still be 
interacting directly with the C* storage layer in a lot of places.




[jira] [Commented] (CASSANDRA-13475) First version of pluggable storage engine API.

2017-10-19 Thread Dikang Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16211918#comment-16211918
 ] 

Dikang Gu commented on CASSANDRA-13475:
---

Also, the Partition interface 
(https://github.com/Instagram/cassandra/blob/rocks_3.0/src/java/org/apache/cassandra/db/partitions/Partition.java)
 could be part of the storage engine interface. We implemented a 
RocksDBPartition under it, and it works very well with the other iterators in C*.
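
As a rough illustration of why a partition-level interface composes well with 
generic iterators, consider the sketch below. It is not the real Partition 
interface or the RocksDBPartition class; `RowPartition`, 
`SortedStorePartition` and `FilteringRowIterator` are hypothetical, simplified 
stand-ins, and a TreeMap plays the role of the engine's sorted key space.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.TreeMap;
import java.util.function.Predicate;

// Hypothetical, simplified partition abstraction: rows in clustering order.
interface RowPartition {
    String partitionKey();
    Iterator<Map.Entry<String, String>> rowIterator();
}

// Engine-specific implementation; only this class knows how rows are stored.
final class SortedStorePartition implements RowPartition {
    private final String key;
    private final TreeMap<String, String> rows;

    SortedStorePartition(String key, Map<String, String> rows) {
        this.key = key;
        this.rows = new TreeMap<>(rows);
    }

    @Override public String partitionKey() { return key; }

    @Override public Iterator<Map.Entry<String, String>> rowIterator() {
        return rows.entrySet().iterator();
    }
}

// Engine-agnostic iterator: depends only on the RowPartition interface.
final class FilteringRowIterator implements Iterator<Map.Entry<String, String>> {
    private final Iterator<Map.Entry<String, String>> source;
    private final Predicate<Map.Entry<String, String>> keep;
    private Map.Entry<String, String> next;

    FilteringRowIterator(RowPartition partition, Predicate<Map.Entry<String, String>> keep) {
        this.source = partition.rowIterator();
        this.keep = keep;
        advance();
    }

    // Move to the next row that passes the filter, or to null at the end.
    private void advance() {
        next = null;
        while (source.hasNext()) {
            Map.Entry<String, String> candidate = source.next();
            if (keep.test(candidate)) {
                next = candidate;
                return;
            }
        }
    }

    @Override public boolean hasNext() { return next != null; }

    @Override public Map.Entry<String, String> next() {
        if (next == null) throw new NoSuchElementException();
        Map.Entry<String, String> result = next;
        advance();
        return result;
    }
}
```

The filtering iterator never touches the storage format, so swapping in a 
different partition implementation would not require changing it.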




[jira] [Commented] (CASSANDRA-13475) First version of pluggable storage engine API.

2017-10-19 Thread Dikang Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16210649#comment-16210649
 ] 

Dikang Gu commented on CASSANDRA-13475:
---

[~bdeggleston], [~spo...@gmail.com], this little patch shows how the 
read/write paths and streaming work with the storage engine API: 
https://github.com/DikangGu/cassandra/commit/3238dfbd8f4abfa2d95d5343edf07985fb732bd1.
 The commented-out part should be moved to a CQLEngine in the future.

For streaming, I think the RocksDB streaming implementation is a good example 
of how a new storage engine should implement the streaming flow: 
https://github.com/Instagram/cassandra/tree/rocks_3.0/src/java/org/apache/cassandra/rocksdb/streaming.
 Most of the logic is implemented within the storage engine; there are very 
few changes outside the engine.






[jira] [Commented] (CASSANDRA-13475) First version of pluggable storage engine API.

2017-10-18 Thread Dikang Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16210021#comment-16210021
 ] 

Dikang Gu commented on CASSANDRA-13475:
---

[~bdeggleston], makes sense, I will come up with a smaller patch, to 
demonstrate the usage.




[jira] [Commented] (CASSANDRA-13475) First version of pluggable storage engine API.

2017-10-18 Thread Blake Eggleston (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16209930#comment-16209930
 ] 

Blake Eggleston commented on CASSANDRA-13475:
-

I mean something more narrowly scoped. That rocksdb branch is several months of 
work interspersed with commits merged in from upstream.




[jira] [Commented] (CASSANDRA-13475) First version of pluggable storage engine API.

2017-10-18 Thread Dikang Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16209723#comment-16209723
 ] 

Dikang Gu commented on CASSANDRA-13475:
---

Sure, here is the implementation of the RocksDB-based storage engine, and how 
we integrate it with other parts of Cassandra. 
https://github.com/Instagram/cassandra/commits/rocks_3.0




[jira] [Commented] (CASSANDRA-13475) First version of pluggable storage engine API.

2017-10-18 Thread Blake Eggleston (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16209717#comment-16209717
 ] 

Blake Eggleston commented on CASSANDRA-13475:
-

Could you add some code showing how you intend to integrate this with C*? The 
linked branch is just the proposed api, which is kind of difficult to talk 
about in isolation




[jira] [Commented] (CASSANDRA-13475) First version of pluggable storage engine API.

2017-10-18 Thread Stefan Podkowinski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16209496#comment-16209496
 ] 

Stefan Podkowinski commented on CASSANDRA-13475:


Can we really get away with designing a StorageEngine interface that expects 
ColumnFamilyStore instances for most actions? Actually, the defined methods 
look like operations on the CFS to me, so it's a bit unclear how the 
implementation would differ between engines. 

I'd also suggest clearly defining the term "storage engine" first. What are 
the properties and guarantees? What does the life cycle of such an engine 
look like? What are the possible (error) states?
