Re: storing indexes on ssd

2018-02-11 Thread sankalp kohli
Cassandra does not support this currently. You can create a JIRA and start
the conversation.


On Sat, Feb 10, 2018 at 11:09 PM, Dan Kinder  wrote:

> Hi,
>
> We're optimizing Cassandra right now for fairly random reads on a large
> dataset. In this dataset, the values are much larger than the keys. I was
> wondering, is it possible to have Cassandra write the *index* files
> (*-Index.db) to one drive (SSD), but write the *data* files (*-Data.db) to
> another (HDD)? This would be an overall win for us since it's
> cost-prohibitive to store the data itself all on SSD, but we hit the limits
> if we just use HDD; effectively we would need to buy double, since we are
> doing 2 random reads (index + data).
>
> Thanks,
> -dan
>
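
(Not from the thread, but to help size the question: a minimal Python
sketch that sums the on-disk footprint of the *-Index.db components versus
the *-Data.db components under a Cassandra data directory, to estimate how
much SSD capacity the index files alone would need. The path is an
assumption; adjust it to your data_file_directories.)

#!/usr/bin/env python3
import os

DATA_DIR = "/var/lib/cassandra/data"  # assumption: default data directory

index_bytes = 0
data_bytes = 0
for root, _, files in os.walk(DATA_DIR):
    for name in files:
        path = os.path.join(root, name)
        if name.endswith("-Index.db"):
            index_bytes += os.path.getsize(path)
        elif name.endswith("-Data.db"):
            data_bytes += os.path.getsize(path)

print("Index.db total: %.1f GiB" % (index_bytes / 2.0**30))
print("Data.db  total: %.1f GiB" % (data_bytes / 2.0**30))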


Roadmap for 4.0

2018-02-11 Thread kurt greaves
Hi friends,
*TL;DR: Making a plan for 4.0. Ideally everyone interested should provide
up to two lists: one for tickets they can contribute resources to getting
finished, and one for features they think would be desirable for 4.0 but
don't necessarily have the resources to commit to helping with.*

So we had that Roadmap for 4.0 discussion last year, but there was never a
conclusion or a plan that came from it. Time's getting on and the changes
list for 4.0 is getting pretty big. I'm thinking it would probably make
sense to define some goals for getting 4.0 released and have an actual
plan. 4.0 is already going to be quite an unwieldy release with a lot of
testing required.

Note: the following is open to discussion; if people don't like the plan,
feel free to speak up. But in the end it's a pretty basic plan and I don't
think we should over-complicate it, and I also don't want to end up in a
discussion where we "make a plan to make a plan". Regardless of whatever
plan we end up following, it would still be valuable to have a list of
tickets for 4.0, which is the overall goal of this email - so let's not get
too worked up about the details just yet (save that for after I
summarise/follow up).

// TODO
I think the best way to go about this would be for us to come up with a
list of JIRAs that we want included in 4.0, tag these as 4.0, and tag all
other improvements as 4.x. We can then aim to release 4.0 once all the
4.0-tagged tickets (+bug fixes/blockers) are complete.

Now, the catch is that we obviously don't want to include too many tickets
in 4.0, but at the same time we want to make sure 4.0 has an appealing
feature set for users, operators and developers alike. To minimise scope
creep I think the following strategy will help:

We should maintain two lists:

   1. JIRAs that people want in 4.0 and can commit resources to getting
   implemented in 4.0.
   2. JIRAs that people simply think would be desirable for 4.0, but that
   currently don't have anyone assigned to them or a planned assignment. It
   would probably make sense to label these with an additional tag in
   JIRA. *(Users, please feel free to point out what you want here)*

From list 1 will come our source of truth for when we release 4.0 (after
aggregating the list I will summarise it and we can vote on it).

List 2 would be the "hopeful" list, where stories can be picked up from if
resourcing allows, or where someone comes along and decides they're good
enough to work on. I guess we can also base this on a vote system if we
reach the point of including some of them (but for the moment it's purely
to get an idea of what users actually want).

Please don't refrain from listing something that's already been mentioned.
The purpose is to get an idea of everyone's priorities/interests and the
resources available. We will need multiple resources for each ticket, so
anywhere we share an interest will make sharing the work a lot easier.

Note that we are only talking about improvements here. Bugs will be treated
the same as always, and major issues/regressions we'll need to fix prior to
4.0 anyway.

TIME FRAME
Generally I think it's a bad idea to commit to any hard deadline, but we
should have some time frames in mind. My idea would be to aim for a Q3/4
2018 release, and as we go we just review the outstanding improvements and
decide whether it's worth pushing it back or if we've got enough to
release. I suppose keep this time frame in mind when choosing your tickets.

We can aim for an earlier date (midyear?) but I figure the
testing/validation/bugfixing period prior to release might drag on a bit,
so I'm being a bit conservative here.
The main goal would be to not let list 1 grow unless we're well ahead, and
only cull from it if we're heavily over-committed or we decide the
improvement can wait. I assume this all sounds like common sense but
figured it's better to spell it out now.


NEXT STEPS
After 2 weeks/whenever the discussion dies off I'll consolidate all the
tickets and relevant comments and follow up with a summary, where we can
discuss/nitpick issues and come up with a final list to go ahead with.

On a side note, in conjunction with this effort we'll obviously have to do
something about validation and testing. I'll keep that out of this email
for now, but there will be a follow up so that those of us willing to help
validate/test trunk can avoid duplicating effort.

REVIEW
This is the list of "huge/breaking" tickets that got mentioned in the last
roadmap discussion, and their statuses. This is not terribly important, but
it helps us keep in mind what we previously talked about. I think we should
leave it up to the relevant contributors to decide whether they want to get
the still-open tickets into 4.0.

CASSANDRA-9425 Immutable node-local schema - Committed
CASSANDRA-10699 Strongly consistent schema alterations - Open, no
discussion in quite some time.
CASSANDRA-12229

Re: LWT broken?

2018-02-11 Thread DuyHai Doan
Mahdi, the issue in your code is here:

else // we lost LWT, fetch the winning value
    existing_id = SELECT id FROM hash_id WHERE hash = computed_hash |
                  consistency = ONE

You lost the LWT, which means that a concurrent LWT has won the
Paxos round and has applied its value using QUORUM/SERIAL.

In the best case, the winning LWT value has been applied to at least
2 replicas out of 3 (assuming RF=3).
In the worst case, the winning LWT value has not yet been applied to any
replica, or is still pending.

Now, if you immediately read with CL=ONE, you may:

1) Read a stale value from the 3rd replica, which has not yet received the
winning LWT value
2) Or worse, read a stale value because the winning LWT is still being
applied while the read operation is made

That's the main reason reading with CL=SERIAL is recommended (CL=QUORUM is
not sufficient).

Reading with CL=SERIAL will:

a. like QUORUM, contact a strict majority of replicas
b. unlike QUORUM, look for any validated (but not yet applied) previous
Paxos round value and force-apply it before actually reading the new value
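
For illustration (not from the thread): a minimal sketch of that pattern
with the DataStax Python driver, assuming a hash_id table like the one in
the pseudocode above (columns hash and id; the keyspace, contact point and
names are placeholders). The conditional insert runs as an LWT, and the
fallback read after a lost round uses SERIAL instead of ONE.

# Minimal sketch, assuming: CREATE TABLE hash_id (hash blob PRIMARY KEY, id bigint)
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

cluster = Cluster(["127.0.0.1"])          # placeholder contact point
session = cluster.connect("my_keyspace")  # placeholder keyspace

def get_or_create_id(computed_hash, new_id):
    # LWT insert: the Paxos round runs at SERIAL, the commit at QUORUM.
    insert = SimpleStatement(
        "INSERT INTO hash_id (hash, id) VALUES (%s, %s) IF NOT EXISTS",
        consistency_level=ConsistencyLevel.QUORUM,
        serial_consistency_level=ConsistencyLevel.SERIAL,
    )
    result = session.execute(insert, (computed_hash, new_id))
    if result.was_applied:
        return new_id
    # We lost the LWT: re-read at SERIAL (not ONE) so any validated but
    # not-yet-applied Paxos value is force-applied before the read.
    read = SimpleStatement(
        "SELECT id FROM hash_id WHERE hash = %s",
        consistency_level=ConsistencyLevel.SERIAL,
    )
    return session.execute(read, (computed_hash,)).one().id

(In practice the response to a lost conditional insert also carries the
existing row, but the point of the sketch is the consistency level of the
follow-up read.)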




On Sun, Feb 11, 2018 at 5:36 PM, Mahdi Ben Hamida wrote:

> Totally understood that it's not worthwhile (or rather, it's incorrect) to
> mix serial and non-serial operations for LWT tables. It would be highly
> satisfying to my engineer mind if someone could explain why that would cause
> issues in this particular situation. The only explanation I have is that a
> non-serial read may cause a read repair to happen and that could interfere
> with a concurrent serial write, although I still can't explain how that
> would cause two different "insert if not exist" transactions to both
> succeed.
>
> --
> Mahdi.
>
> On 2/9/18 2:40 PM, Jonathan Haddad wrote:
>
> If you want consistent reads you have to use the CL that enforces it.
> There’s no way around it.
> On Fri, Feb 9, 2018 at 2:35 PM Mahdi Ben Hamida 
> wrote:
>
>> In this case, we only write using CAS (code guarantees that). We also
>> never update, just insert if not exist. Once a hash exists, it never
>> changes (it may get deleted later and that'll be a CAS delete as well).
>>
>> --
>> Mahdi.
>>
>> On 2/9/18 1:38 PM, Jeff Jirsa wrote:
>>
>>
>>
>> On Fri, Feb 9, 2018 at 1:33 PM, Mahdi Ben Hamida 
>> wrote:
>>
>>> Under what circumstances would we be reading inconsistent results? Is
>>> there a case where we end up reading a value that actually ends up not
>>> being written?
>>>
>>>
>>>
>>
>> If you ever write the same value with CAS and without CAS (different code
>> paths both updating the same value), you're using CAS wrong, and
>> inconsistencies can happen.
>>
>>
>>
>>
>


Re: LWT broken?

2018-02-11 Thread Mahdi Ben Hamida
Totally understood that it's not worthwhile (or rather, it's incorrect) to
mix serial and non-serial operations for LWT tables. It would be highly
satisfying to my engineer mind if someone could explain why that would
cause issues in this particular situation. The only explanation I have
is that a non-serial read may cause a read repair to happen and that
could interfere with a concurrent serial write, although I still can't
explain how that would cause two different "insert if not exist"
transactions to both succeed.


--
Mahdi.

On 2/9/18 2:40 PM, Jonathan Haddad wrote:
If you want consistent reads you have to use the CL that enforces it. 
There’s no way around it.
On Fri, Feb 9, 2018 at 2:35 PM Mahdi Ben Hamida wrote:


In this case, we only write using CAS (code guarantees that). We
also never update, just insert if not exist. Once a hash exists,
it never changes (it may get deleted later and that'll be a CAS
delete as well).

-- 
Mahdi.


On 2/9/18 1:38 PM, Jeff Jirsa wrote:



On Fri, Feb 9, 2018 at 1:33 PM, Mahdi Ben Hamida wrote:

Under what circumstances would we be reading inconsistent
results? Is there a case where we end up reading a value
that actually ends up not being written?




If you ever write the same value with CAS and without CAS
(different code paths both updating the same value), you're using
CAS wrong, and inconsistencies can happen.