Re: [VOTE] CEP-34: mTLS based client and internode authenticators

2023-07-21 Thread Yuki Morishita
+1

One update to the CEP that I would like to propose is to clarify which
"appropriate permissions" are necessary to add/drop identities.
According to the current patch, a role requires the "CREATE ROLE"
permission to add identities and the "DROP ROLE" permission to drop them.
I have no objection here; I just want to make sure we are not introducing
another role resource.
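
To make the permission mapping concrete, here is a minimal sketch of the check as I read it from the patch description. All class, enum, and method names are hypothetical and only illustrate the idea of reusing the existing role permissions; this is not the patch's actual code.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical names throughout; this is not Cassandra's authorization code.
class IdentityPermissionSketch {
    enum Permission { CREATE_ROLE, DROP_ROLE, SELECT }
    enum IdentityOp { ADD_IDENTITY, DROP_IDENTITY }

    // Each identity statement reuses an existing role permission,
    // so no new role resource has to be introduced.
    static Permission requiredFor(IdentityOp op) {
        switch (op) {
            case ADD_IDENTITY:
                return Permission.CREATE_ROLE;
            case DROP_IDENTITY:
                return Permission.DROP_ROLE;
            default:
                throw new IllegalArgumentException("unknown op: " + op);
        }
    }

    // True when the caller's granted permissions cover the statement.
    static boolean isAllowed(Set<Permission> granted, IdentityOp op) {
        return granted.contains(requiredFor(op));
    }

    public static void main(String[] args) {
        // A role that may add identities but not drop them.
        Set<Permission> identityManager = EnumSet.of(Permission.CREATE_ROLE);
        System.out.println(isAllowed(identityManager, IdentityOp.ADD_IDENTITY));
        System.out.println(isAllowed(identityManager, IdentityOp.DROP_IDENTITY));
    }
}
```

With this shape, a role that manages identities needs neither superuser status nor a dedicated resource type.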

On Sat, Jul 22, 2023 at 7:43 AM Abe Ratnofsky  wrote:

> +1 (nb)
>
> On Jul 21, 2023, at 3:03 PM, Jon Meredith  wrote:
>
> +1
>
> On Fri, Jul 21, 2023 at 2:33 PM Blake Eggleston 
> wrote:
>
>> +1
>>
>> On Jul 21, 2023, at 9:57 AM, Jyothsna Konisa 
>> wrote:
>>
>> Hi Everyone!
>>
>> I would like to start a vote thread for CEP-34.
>>
>> Proposal:
>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-34%3A+mTLS+based+client+and+internode+authenticators
>> JIRA   :
>> https://issues.apache.org/jira/browse/CASSANDRA-18554
>> Draft Implementation : https://github.com/apache/cassandra/pull/2372
>> Discussion :
>> https://lists.apache.org/thread/pnfg65r76rbbs70hwhsz94ds6yo2042f
>>
>> The vote will be open for 72 hours. A vote passes if there are at least 3
>> binding +1s and no binding vetoes.
>>
>> Thanks,
>> Jyothsna Konisa.
>>
>>
>>
>


Re: CASSANDRA-18554 - mTLS based client and internode authenticators

2023-07-11 Thread Yuki Morishita
> folks - I think we’ve achieved lazy consensus here. Please continue with
> feedback on the jira.

Hi Dinesh,

As Jeremiah commented on JIRA, shouldn't we have a vote in the ML?

For future reference, in my opinion, adding new CQL syntax should have
a CEP, as it is not something we can easily change once defined.

On Wed, Jul 12, 2023 at 7:19 AM Derek Chen-Becker 
wrote:

> EC - eventual consensus?
>
> On Tue, Jul 11, 2023 at 4:03 PM Dinesh Joshi  wrote:
>
>> folks - I think we’ve achieved lazy consensus here. Please continue with
>> feedback on the jira.
>>
>> Thanks,
>>
>> Dinesh
>>
>>
>> On Jul 7, 2023, at 12:23 PM, Jyothsna Konisa 
>> wrote:
>>
>> 
>> Hi Yuki, Jeremiah & Christopher,
>>
>> Thank you very much for the feedback.
>>
>> Regarding removing superuser check for adding/removing identities, I have
>> relaxed that check and added permissions check instead. With this change
>> only users with appropriate permissions to add/drop identities can perform
>> that action.
>>
>> About extending the `CREATE ROLE` CQL statement, we have a couple of
>> reasons for not doing that. We designed the mTLS authenticator in such a
>> way that a single role can be associated with multiple identities; for
>> example, there can be several identities that are read_only users. Also,
>> having a separate CQL statement for identities makes it more pluggable and
>> independent. If we still think that extending the create role statement
>> would be a convenient feature, we can add it as required in follow-up
>> patches.
>>
>> Christopher, I will be acting upon your feedback regarding having
>> identity in the cassandra.yaml optionally configurable.
>>
>> Thanks,
>> Jyothsna Konisa.
>>
>> On Thu, Jul 6, 2023 at 5:30 PM Dinesh Joshi  wrote:
>>
>>> > On Jun 30, 2023, at 1:09 PM, Jeremiah Jordan 
>>> wrote:
>>> >
>>> > I don’t think users necessarily need to be able to update their own
>>> identities.  I just don’t want to have to use the super user role.  The
>>> super user role has all power over all things in the data base.  I don’t
>>> want to have to give that much power to the person who manages identities,
>>> I just want to give them the power to manage identities.
>>>
>>> Makes sense. I think Jyothsna already pushed an update to the PR to
>>> relax the restriction. Please feel free to take a look at it.
>>>
>>> Dinesh
>>>
>>>
>>>
>>>
>
> --
> +---+
> | Derek Chen-Becker |
> | GPG Key available at https://keybase.io/dchenbecker and   |
> | https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org |
> | Fngrprnt: EB8A 6480 F0A3 C8EB C1E7  7F42 AFC5 AFEE 96E4 6ACC  |
> +---+
>
>


Re: CASSANDRA-18554 - mTLS based client and internode authenticators

2023-06-28 Thread Yuki Morishita
Thinking more about "CREATE ROLE" permission, if we can extend CREATE
ROLE/ALTER ROLE statements, it may look streamlined:

I don't have a good example, but something like:
```
CREATE ROLE dev WITH LOGIN = true AND IDENTITIES = {'spiffe://xxx'};
ALTER ROLE dev ADD IDENTITY 'xxx';
LIST ROLES;
```

This requires a role-to-identities table as well as the current
identity-to-role table, though.
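
As a rough illustration of why both mappings would be needed, here is a minimal in-memory sketch. The class and method names are hypothetical, and overwriting an existing identity on re-add is an assumption made for the sketch (it mirrors primary-key upsert semantics), not a statement about how ADD IDENTITY behaves.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory model; not Cassandra's storage code.
class IdentityIndexSketch {
    // identity -> role: identity is the key, so each identity has exactly one role.
    private final Map<String, String> identityToRole = new HashMap<>();
    // role -> identities: the reverse index needed to list identities per role.
    private final Map<String, Set<String>> roleToIdentities = new HashMap<>();

    // Assumption: re-adding an identity moves it to the new role (upsert).
    void addIdentity(String identity, String role) {
        String previousRole = identityToRole.put(identity, role);
        if (previousRole != null) {
            unlink(previousRole, identity);
        }
        roleToIdentities.computeIfAbsent(role, r -> new TreeSet<>()).add(identity);
    }

    void dropIdentity(String identity) {
        String role = identityToRole.remove(identity);
        if (role != null) {
            unlink(role, identity);
        }
    }

    // Returns the single role for an identity, or null if it is unknown.
    String roleFor(String identity) {
        return identityToRole.get(identity);
    }

    // Returns a read-only view of all identities mapped to the role.
    Set<String> identitiesFor(String role) {
        return Collections.unmodifiableSet(
                roleToIdentities.getOrDefault(role, Collections.emptySet()));
    }

    // Keeps the reverse index in sync and drops empty entries.
    private void unlink(String role, String identity) {
        Set<String> identities = roleToIdentities.get(role);
        identities.remove(identity);
        if (identities.isEmpty()) {
            roleToIdentities.remove(role);
        }
    }

    public static void main(String[] args) {
        IdentityIndexSketch index = new IdentityIndexSketch();
        index.addIdentity("spiffe://xxx/a", "dev");
        index.addIdentity("spiffe://xxx/b", "dev");
        System.out.println(index.identitiesFor("dev"));
        System.out.println(index.roleFor("spiffe://xxx/a"));
    }
}
```

The cost of the second map is that every add/drop has to update both sides, which is the extra table maintenance mentioned above.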

On Thu, Jun 29, 2023 at 12:34 PM Yuki Morishita  wrote:

> Hi Jyothsna,
>
> I think for the *initial* commit, the description looks fine to me.
> I'd like to see/contribute to the future improvement though:
>
> * ADD IDENTITY requires SUPERUSER; this means that a brand new cluster
> needs to start with PasswordAuthenticator/CassandraAuthorizer first, and
> then change to the mTLS one.
> * For this, I'd really like to see Cassandra use password authn and
> authz by default.
> * Cassandra allows a user with the "CREATE ROLE" permission to create
> roles without superuser privilege. Maybe it is natural to allow them to add
> identities also?
>
>
> On Thu, Jun 29, 2023 at 7:35 AM Jyothsna Konisa 
> wrote:
>
>> Hi Yuki,
>>
>> I have added cassandra docs for CQL syntax that we are adding and how to
>> get started with using mTLS authenticators along with the migration plan.
>> Please review it and let me know if it looks good.
>>
>> Thanks,
>> Jyothsna Konisa.
>>
>> On Wed, Jun 21, 2023 at 10:46 AM Jyothsna Konisa 
>> wrote:
>>
>>> Hi Yuki!
>>>
>>> Thanks for the questions.
>>>
>>> Here are the steps for the initial setup.
>>>
>>> 1. Since only super users can add/remove identities from the
>>> `identity_to_role` table, operators should use a superuser role to add
>>> authorized identities to the table. Note that the authenticator is not an
>>> mTLS authenticator yet.
>>> EX: ADD IDENTITY 'spiffe://testdomain.com/testIdentifier/testValue' TO
>>> ROLE 'read_only_user'
>>>
>>> 2. Change authenticator configuration in cassandra.yaml to use mTLS
>>> authenticator
>>> EX:
>>> authenticator:
>>>   class_name: org.apache.cassandra.auth.MutualTlsAuthenticator
>>>   parameters:
>>>     validator_class_name: org.apache.cassandra.auth.SpiffeCertificateValidator
>>> 3. Restart the cluster so that newly configured mTLS authenticator is
>>> used
>>>
>>> What will be the op's first step to set up the roles and identities?
>>> -> Yes, the op should set up roles & identities first.
>>>
>>> Is default cassandra / cassandra superuser login still required to set
>>> up other roles and identities?
>>> -> When transitioning from a password based to mTLS based
>>> authenticators, yes superuser login is required to add identities, as only
>>> super users can add them. However when a cluster is using mTLS based
>>> authenticator, the super user will be associated with some certificate
>>> identity and hence we don't need password based cassandra super user login.
>>>
>>> If initial cassandra super user login is required, does that mean super
>>> users and "cassandra" superuser bypass mTLS check?
>>> -> No, while adding identities to the roles table in step1 the
>>> authenticator will not be an mTLS authenticator. Once the identities are
>>> added and the authenticator is configured, even super users have to go
>>> through an mTLS check during connection.
>>>
>>>
>>> Regarding migration
>>>
>>> I *think* you need to first use
>>> MutualTlsWithPasswordFallbackAuthenticator so the current roles can login
>>> with their password,
>>> and eventually the admin sets up identity and then can switch to mTLS
>>> auth.
>>> Is this the expected way for migration?
>>> -> Yes you can do that or else we can add identities with password based
>>> login and then change the authenticator to be mTLS authenticator.
>>>
>>> I think a thorough documentation for this new feature including new CQL
>>> syntax, setting up and migration would be greatly appreciated.
>>> -> I have added documentation for the authenticators, cqlsh commands in
>>> the Javadocs in the source code. Maybe I will add the setup process &
>>> migration process in the Javadocs, does this sound good?
>>>
>>> Thanks,
>>> Jyothsna Konisa.
>>>
>>> On Tue, Jun 20, 2023 at 11:33 PM Yuki Morishita 
>>> wrote:
>>>
>>>> Hi Jyothsna,
>>>>
>

Re: CASSANDRA-18554 - mTLS based client and internode authenticators

2023-06-28 Thread Yuki Morishita
Hi Jyothsna,

I think for the *initial* commit, the description looks fine to me.
I'd like to see/contribute to the future improvement though:

* ADD IDENTITY requires SUPERUSER; this means that a brand new cluster
needs to start with PasswordAuthenticator/CassandraAuthorizer first, and
then change to the mTLS one.
* For this, I'd really like to see Cassandra use password authn and
authz by default.
* Cassandra allows a user with the "CREATE ROLE" permission to create
roles without superuser privilege. Maybe it is natural to allow them to add
identities also?


On Thu, Jun 29, 2023 at 7:35 AM Jyothsna Konisa 
wrote:

> Hi Yuki,
>
> I have added cassandra docs for CQL syntax that we are adding and how to
> get started with using mTLS authenticators along with the migration plan.
> Please review it and let me know if it looks good.
>
> Thanks,
> Jyothsna Konisa.
>
> On Wed, Jun 21, 2023 at 10:46 AM Jyothsna Konisa 
> wrote:
>
>> Hi Yuki!
>>
>> Thanks for the questions.
>>
>> Here are the steps for the initial setup.
>>
>> 1. Since only super users can add/remove identities from the
>> `identity_to_role` table, operators should use a superuser role to add
>> authorized identities to the table. Note that the authenticator is not an
>> mTLS authenticator yet.
>> EX: ADD IDENTITY 'spiffe://testdomain.com/testIdentifier/testValue' TO
>> ROLE 'read_only_user'
>>
>> 2. Change authenticator configuration in cassandra.yaml to use mTLS
>> authenticator
>> EX:
>> authenticator:
>>   class_name: org.apache.cassandra.auth.MutualTlsAuthenticator
>>   parameters:
>>     validator_class_name: org.apache.cassandra.auth.SpiffeCertificateValidator
>> 3. Restart the cluster so that newly configured mTLS authenticator is used
>>
>> What will be the op's first step to set up the roles and identities?
>> -> Yes, the op should set up roles & identities first.
>>
>> Is default cassandra / cassandra superuser login still required to set up
>> other roles and identities?
>> -> When transitioning from a password based to mTLS based authenticators,
>> yes superuser login is required to add identities, as only super users can
>> add them. However when a cluster is using mTLS based authenticator, the
>> super user will be associated with some certificate identity and hence we
>> don't need password based cassandra super user login.
>>
>> If initial cassandra super user login is required, does that mean super
>> users and "cassandra" superuser bypass mTLS check?
>> -> No, while adding identities to the roles table in step1 the
>> authenticator will not be an mTLS authenticator. Once the identities are
>> added and the authenticator is configured, even super users have to go
>> through an mTLS check during connection.
>>
>>
>> Regarding migration
>>
>> I *think* you need to first use
>> MutualTlsWithPasswordFallbackAuthenticator so the current roles can login
>> with their password,
>> and eventually the admin sets up identity and then can switch to mTLS
>> auth.
>> Is this the expected way for migration?
>> -> Yes you can do that or else we can add identities with password based
>> login and then change the authenticator to be mTLS authenticator.
>>
>> I think a thorough documentation for this new feature including new CQL
>> syntax, setting up and migration would be greatly appreciated.
>> -> I have added documentation for the authenticators, cqlsh commands in
>> the Javadocs in the source code. Maybe I will add the setup process &
>> migration process in the Javadocs, does this sound good?
>>
>> Thanks,
>> Jyothsna Konisa.
>>
>> On Tue, Jun 20, 2023 at 11:33 PM Yuki Morishita 
>> wrote:
>>
>>> Hi Jyothsna,
>>>
>>> Thanks, sorry I have additional questions regarding set up and migration:
>>>
>>> * Initial set up
>>>
>>> Say, you are building the brand new cassandra cluster with
>>>
>>> authenticator:
>>>   class_name: org.apache.cassandra.auth.MutualTlsAuthenticator
>>>   parameters:
>>>     validator_class_name: org.apache.cassandra.auth.SpiffeCertificateValidator
>>>
>>> What will be the op's first step to set up the roles and identities?
>>> Is default cassandra / cassandra super user login still required to set
>>> up other roles and identities?
>>> If initial cassandra super user login is required, does that mean super
>>> users and "cassandra" superuser bypass mTLS check?
>>>
>>>

Re: CASSANDRA-18554 - mTLS based client and internode authenticators

2023-06-21 Thread Yuki Morishita
Hi Jyothsna,

Thanks, sorry I have additional questions regarding set up and migration:

* Initial set up

Say, you are building the brand new cassandra cluster with

authenticator:
  class_name: org.apache.cassandra.auth.MutualTlsAuthenticator
  parameters:
    validator_class_name: org.apache.cassandra.auth.SpiffeCertificateValidator

What will be the op's first step to set up the roles and identities?
Is default cassandra / cassandra super user login still required to set up
other roles and identities?
If initial cassandra super user login is required, does that mean super
users and "cassandra" superuser bypass mTLS check?

* Migration

If you are currently using PasswordAuthenticator and would like to migrate
to mTLS authentication:

I *think* you need to first use MutualTlsWithPasswordFallbackAuthenticator
so the current roles can login with their password,
and eventually the admin sets up identity and then can switch to mTLS auth.
Is this the expected way for migration?

I think a thorough documentation for this new feature including new CQL
syntax, setting up and migration would be greatly appreciated.


On Wed, Jun 21, 2023 at 4:13 AM Jyothsna Konisa 
wrote:

> Hi Yuki,
>
> Sorry I missed answering your other question in the above reply. Regarding
> checking what identities are associated with a given role, one can query
> the table to list the identities for a given role. Also note that
> addition or removal of identities from the table can be performed only by
> the super user. Not even read-write users can perform modifications to
> the table.
>
> Also, if others have no concerns regarding this patch, can we move forward
> with the merge? Or do we need a vote on this one?
>
> Thanks,
> Jyothsna Konisa.
>
>
> On Mon, Jun 19, 2023 at 4:00 PM Jyothsna Konisa 
> wrote:
>
>> Hi Yuki,
>> You are right regarding adding a custom validator. If one wants to
>> implement a CN based validator, they can do that and configure that
>> validator in cassandra.yaml under
>> "authenticator.parameters.validator_class_name".
>>
>> Regarding a role having multiple identities, yes a role can have multiple
>> identities associated with it. For example, there can be several read_only
>> users for a given cluster, so the role `readonly_user` can be associated
>> with multiple identities.
>>
>> Regarding the uniqueness of identity, each identity should be associated
>> with only one role. For example, a single identity can not be both admin
>> user and a read only user.
>>
>> We have ensured this by designing the schema of the new table for
>> storing identity information with identity as the primary key, which
>> guarantees that each identity is unique while the same role can have
>> multiple identities.
>>
>> Thanks,
>> Jyothsna Konisa.
>>
>> On Sun, Jun 18, 2023 at 5:42 PM Yuki Morishita 
>> wrote:
>>
>>> Hi,
>>>
>>> I was discussing with users the other day regarding a similar feature.
>>> They were thinking of implementing a custom Authenticator similar to
>>> what MySQL offers:
>>>
>>> CREATE USER 'jeffrey'@'localhost'
>>>   REQUIRE SUBJECT '/C=SE/ST=Stockholm/L=Stockholm/
>>> O=MySQL demo client certificate/
>>> CN=client/emailAddress=cli...@example.com';
>>>
>>> (
>>> https://dev.mysql.com/doc/refman/8.0/en/create-user.html#create-user-tls
>>> )
>>>
>>> I think they can implement a custom Validator that validates the
>>> identity (for their case, CN) associated with a role using the
>>> certificate's subject, so that's great!
>>>
>>> Regarding new CQL syntax,
>>>
>>> > ADD IDENTITY 'testIdentity' TO ROLE 'testRole';
>>> > DROP IDENTITY 'testIdentity';
>>>
>>> This means a role can have multiple identities, and each identity must
>>> be unique?
>>> How can users check what identities are associated with certain roles?
>>>
>>>
>>> On Sun, Jun 18, 2023 at 12:15 AM Dinesh Joshi  wrote:
>>>
>>>> Folks, any feedback here?
>>>>
>>>> On 6/15/23 12:46, Jyothsna Konisa wrote:
>>>> > Hi Everyone!
>>>> >
>>>> > We are adding the following CQL queries in this patch for adding and
>>>> dropping identities in the new `system_auth.identity_to_role` table.
>>>> >
>>>> > ADD IDENTITY 'testIdentity' TO ROLE 'testRole';
>>>> > DROP IDENTITY 'testIdentity';
>>>> >
>>>> > Please let us know if anyone has any concerns!
>

Re: CASSANDRA-18554 - mTLS based client and internode authenticators

2023-06-18 Thread Yuki Morishita
Hi,

I was discussing with users the other day regarding a similar feature.
They were thinking of implementing a custom Authenticator similar to what
MySQL offers:

CREATE USER 'jeffrey'@'localhost'
  REQUIRE SUBJECT '/C=SE/ST=Stockholm/L=Stockholm/
O=MySQL demo client certificate/
CN=client/emailAddress=cli...@example.com';

(https://dev.mysql.com/doc/refman/8.0/en/create-user.html#create-user-tls)

I think they can implement a custom Validator that validates the identity
(for their case, CN) associated with a role using the certificate's
subject, so that's great!

Regarding new CQL syntax,

> ADD IDENTITY 'testIdentity' TO ROLE 'testRole';
> DROP IDENTITY 'testIdentity';

This means a role can have multiple identities, and each identity must be
unique?
How can users check what identities are associated with certain roles?


On Sun, Jun 18, 2023 at 12:15 AM Dinesh Joshi  wrote:

> Folks, any feedback here?
>
> On 6/15/23 12:46, Jyothsna Konisa wrote:
> > Hi Everyone!
> >
> > We are adding the following CQL queries in this patch for adding and
> dropping identities in the new `system_auth.identity_to_role` table.
> >
> > ADD IDENTITY 'testIdentity' TO ROLE 'testRole';
> > DROP IDENTITY 'testIdentity';
> >
> > Please let us know if anyone has any concerns!
> >
> > Thanks,
> > Jyothsna Konisa.
> >
> >
> > On Sat, Jun 3, 2023 at 7:18 AM Derek Chen-Becker wrote:
> >
> > Sounds great, thanks for the clarification!
> >
> > Cheers,
> >
> > Derek
> >
> > On Sat, Jun 3, 2023 at 12:48 AM Dinesh Joshi wrote:
> >
> >> On Jun 2, 2023, at 9:06 PM, Derek Chen-Becker wrote:
> >>
> >> This certainly looks like a nice addition to the operator's
> >> tools for securing cluster access. Out of curiosity, is there
> >> anything in this work that would *preclude* a different
> >> authentication scheme for internode at some point in the
> >> future? Has there ever been discussion of pluggability similar
> >> to the client protocol?
> >
> > This is a pluggable implementation so it's not mandatory to use
> > it and doesn't preclude one from using a different mechanism in
> > the future. We haven't explicitly discussed pluggability i.e.
> > part of protocol negotiation in the past for internode
> > connections. However, this work also does not preclude us from
> > implementing such changes. If we do add negotiation this could
> > be one of the authentication mechanisms. So it would be
> > complementary.
> >
> >
> >> Also, am I correct in understanding that this would allow for
> >> multiple certificates for the same identity (e.g. distinct
> >> cert per node)? I certainly understand the decision to keep
> >> things simple and have all nodes share identity from the
> >> perspective of operational simplicity, but I also don't want
> >> to get in a situation where a single compromised node would
> >> require an invalidation and redeployment on all nodes in the
> >> cluster.
> >
> > I don't recommend all nodes share the same certificate. Each
> > node in the cluster should obtain a unique certificate with the
> > same SPIFFE. In the event a node is compromised, the operator
> > can revoke that node's certificate without having to redeploy to
> > all nodes in the cluster.
> >
> > thanks,
> >
> > Dinesh
> >
> >
> >
> > --
> > +---+
> > | Derek Chen-Becker |
> > | GPG Key available at https://keybase.io/dchenbecker and   |
> > | https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org |
> > | Fngrprnt: EB8A 6480 F0A3 C8EB C1E7  7F42 AFC5 AFEE 96E4 6ACC  |
> > +---+
> >
>
>
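
The point quoted above about each node holding a unique certificate with the same SPIFFE identity can be sketched roughly as follows. This is only an illustration under assumptions: the class names are hypothetical, certificates are reduced to a serial plus a SPIFFE ID, and revocation is modeled as a simple set of revoked serials rather than any specific mechanism.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model; real validation would work on X.509 certificates.
class PerNodeCertSketch {
    // A certificate reduced to the two fields that matter for this sketch.
    static final class NodeCert {
        final String serial;
        final String spiffeId;

        NodeCert(String serial, String spiffeId) {
            this.serial = serial;
            this.spiffeId = spiffeId;
        }
    }

    private final String expectedSpiffeId;
    private final Set<String> revokedSerials = new HashSet<>();

    PerNodeCertSketch(String expectedSpiffeId) {
        this.expectedSpiffeId = expectedSpiffeId;
    }

    // Revokes one certificate; other certs with the same identity are untouched.
    void revoke(String serial) {
        revokedSerials.add(serial);
    }

    // Accepts a cert when its identity matches and its serial is not revoked.
    boolean accepts(NodeCert cert) {
        return expectedSpiffeId.equals(cert.spiffeId)
                && !revokedSerials.contains(cert.serial);
    }

    public static void main(String[] args) {
        String id = "spiffe://testdomain.com/cluster/node";
        PerNodeCertSketch validator = new PerNodeCertSketch(id);
        NodeCert node1 = new NodeCert("serial-1", id);
        NodeCert node2 = new NodeCert("serial-2", id);

        validator.revoke("serial-1"); // node1 is compromised
        System.out.println(validator.accepts(node1));
        System.out.println(validator.accepts(node2));
    }
}
```

Because the identity is shared but the certificates are not, revoking one node's certificate does not force a redeployment on the rest of the cluster.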


Re: Downgradability

2023-02-20 Thread Yuki Morishita
Hi,

What I wanted to address in my comment in CASSANDRA-8110 (
https://issues.apache.org/jira/browse/CASSANDRA-8110?focusedCommentId=17641705&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17641705)
is to focus on a better upgrade experience.

Upgrading the cluster can be painful for some orgs with mission-critical
Cassandra clusters, where they cannot tolerate reduced availability caused
by the inability to replace a downed node.
They also need to plan for rolling back to the previous state if something
goes wrong along the way.
The change I proposed in CASSANDRA-8110 is to achieve the goal of at least
enabling SSTable streaming during the upgrade by not upgrading the SSTable
version. This lets the cluster easily roll back to the previous
version.
Downgrading SSTables is not the primary focus (though Cassandra needs to
implement a way to write SSTables in older versions, so it is somewhat
related).

I'm preparing a design doc for the change.
Also, if I should create a separate ticket from CASSANDRA-8110 to
clarify the goal of the change, please let me know.


On Tue, Feb 21, 2023 at 5:31 AM Benedict  wrote:

> FWIW I think 8110 is the right approach, even if it isn’t a panacea. We
> will have to eventually also tackle system schema changes (probably not
> hard), and may have to think a little carefully about other things, eg with
> TTLs the format change is only the contract about what values can be
> present, so we have to make sure the data validity checks are consistent
> with the format we write. It isn’t as simple as writing an earlier version
> in this case (unless we permit truncating the TTL, perhaps)
>
> On 20 Feb 2023, at 20:24, Benedict  wrote:
>
>
> 
> In a self-organising community, things that aren’t self-policed naturally
> end up policed in an adhoc manner, and with difficulty. I’m not sure that’s
> the same as arbitrary enforcement. It seems to me the real issue is nobody
> noticed this was agreed and/or forgot and didn’t think about it much.
>
> But, even without any prior agreement, it’s perfectly reasonable to
> request that things do not break compatibility if they do not need to, as
> part of the normal patch integration process.
>
> Issues with 3.1->4.0 aren’t particularly relevant as they predate any
> agreement to do this. But we can and should address the problem of new
> columns in schema tables, as this happens often between versions. I’m not
> sure it has in 4.1 though?
>
> Regarding downgrade versions, surely this should simply be the same as
> upgrade versions we support?
>
>
> On 20 Feb 2023, at 20:02, Jeff Jirsa  wrote:
>
> 
> I'm not even convinced even 8110 addresses this - just writing sstables in
> old versions won't help if we ever add things like new types or new types
> of collections without other control abilities. Claude's other email in
> another thread a few hours ago talks about some of these surprises -
> "Specifically during the 3.1 -> 4.0 changes a column broadcast_port was
> added to system/local.  This means that 3.1 system can not read the table
> as it has no definition for it.  I tried marking the column for deletion in
> the metadata and in the serialization header.  The later got past the
> column not found problem, but I suspect that it just means that data
> columns after broadcast_port shifted and so incorrectly read." - this is a
> harder problem to solve than just versioning sstables and network
> protocols.
>
> Stepping back a bit, we have downgrade ability listed as a goal, but it's
> not (as far as I can tell) universally enforced, nor is it clear at which
> point we will be able to concretely say "this release can be downgraded to
> X".   Until we actually define and agree that this is a real goal with a
> concrete version where downgrade-ability becomes real, it feels like things
> are somewhat arbitrarily enforced, which is probably very frustrating for
> people trying to commit work/tickets.
>
> - Jeff
>
>
>
> On Mon, Feb 20, 2023 at 11:48 AM Dinesh Joshi  wrote:
>
>> I’m a big fan of maintaining backward compatibility. Downgradability
>> implies that we could potentially roll back an upgrade at any time. While I
>> don’t think we need to retain the ability to downgrade in perpetuity it
>> would be a good objective to maintain strict backward compatibility and
>> therefore downgradability until a certain point. This would imply
>> versioning metadata and extending it in such a way that prior version(s)
>> could continue functioning. This can certainly be expensive to implement
>> and might bloat on-disk storage. However, we could always offer an option
>> for the operator to optimize the on-disk structures for the current version
>> then we can rewrite them in the latest version. This optimizes the storage
>> and opens up new functionality. This means new features that can work with
>> old on-disk structures will be available while others that strictly require
>> new versions of the data structures 

Re: Issue when creating a stream session

2022-12-18 Thread Yuki Morishita
I dug through the git history and it looks like CASSANDRA-3569 changed
StreamSession to not register with Gossip.
https://github.com/apache/cassandra/commit/0f2d7d0b9540efa3ea3dfe4f8270c3635afdc63c
(It removed the `register` part, not the `unregister` part though.)

Since StreamSession is no longer registered with nor unregistered from
Gossiper, I'd say the current "implements" code
is dead code and we can remove it safely.
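
To illustrate why an implemented-but-unregistered subscriber is effectively dead code, here is a minimal sketch. The names are hypothetical stand-ins, not the real Gossiper or IEndpointStateChangeSubscriber API: a session that implements the callback but is never registered simply never hears about membership changes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the gossip notifier and a stream session.
class UnregisteredListenerSketch {
    interface MembershipSubscriber {
        void onRemove(String endpoint);
    }

    // Only subscribers that were registered receive notifications.
    static class Notifier {
        private final List<MembershipSubscriber> subscribers = new ArrayList<>();

        void register(MembershipSubscriber subscriber) {
            subscribers.add(subscriber);
        }

        void nodeRemoved(String endpoint) {
            for (MembershipSubscriber subscriber : subscribers) {
                subscriber.onRemove(endpoint);
            }
        }
    }

    // Implements the callback, but that alone does not subscribe it.
    static class Session implements MembershipSubscriber {
        int removals = 0;

        @Override
        public void onRemove(String endpoint) {
            removals++;
        }
    }

    // Returns how many removals a session observes, with or without registering.
    static int removalsSeen(boolean register) {
        Notifier notifier = new Notifier();
        Session session = new Session();
        if (register) {
            notifier.register(session);
        }
        notifier.nodeRemoved("10.0.0.1");
        return session.removals;
    }

    public static void main(String[] args) {
        System.out.println(removalsSeen(false));
        System.out.println(removalsSeen(true));
    }
}
```

This also matches the observation that the project compiles with or without the "implements" clause: the compiler cannot tell whether the callbacks are ever wired up.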


On Sat, Dec 17, 2022 at 4:57 AM David Capwell  wrote:

> This sounds like a bug to me, but would be good to get feedback from
> others who have touched Streaming…
>
> Repair will fail if membership notifies that a participating node was
> removed, so I think it makes sense for Streaming to also follow this
> behavior.
>
> On Dec 14, 2022, at 1:22 PM, Natnael Adere 
> wrote:
>
> To give more context, StreamSession is not listening to Gossip for
> membership changes. Although it implements an interface for listening to
> membership changes, we do not register with Gossip and, therefore, never
> get these changes. This results in the IEndpointStateChangeSubscriber
> interface being dead code. My question is whether or not this is a bug to be
> fixed or code to delete. Currently, streaming does not fail on membership
> changes because of this problem. If Gossip says that a node was removed or
> restarted, should we fail the stream or not?
>
> Thanks,
> Natnael
>
> On Dec 14, 2022, at 1:39 PM, Natnael Adere 
> wrote:
>
> Hello,
>
> I am working on CASSANDRA-17199, and testing for
> this ticket has uncovered some issues with streaming. When creating a
> StreamSession we have an intent to listen, but that never happens. My
> concern is whether we should listen and make sure the interface
> it implements is not dead code, or delete the interface and all of its
> methods. Our style guide requires no dead code so this might be
> intentional, but when testing we see that StreamSession is not listening. If
> the interface is intentional, then I suppose we should make sure that we are
> listening. Any opinions on what to do in this scenario?
>
>
> (PROBLEM) The project compiles in both scenarios:
>
> public class StreamSession implements IEndpointStateChangeSubscriber
>
>
> public class StreamSession //implements IEndpointStateChangeSubscriber
>
>
>
> Thanks,
>
> Natnael
>
>
>
>


[Vote] Remove Windows support from 4.0+

2020-08-09 Thread Yuki Morishita
As per the discussion(*), I propose to remove Windows support from the 4.0
release onward.

Windows scripts are not maintained and we lack Windows test
environments. Windows users can use Docker or cloud environments to
set up Cassandra application development.

If the vote passes, I will create the following tickets to officially
remove Windows support from 4.0:

- Remove Windows scripts and add notice to NEWS.txt
- Update "Getting Started" documents for Windows users (to direct them
to use Docker or cloud)

Regards,
Yuki

--
*: 
https://mail-archives.apache.org/mod_mbox/cassandra-dev/202007.mbox/%3CCAGM0Up_3GoPucCP-U18L1akzBXS1eJoKbui997%3DajcCfKJQdng%40mail.gmail.com%3E

-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Re: [Discussion] Windows support

2020-07-30 Thread Yuki Morishita
Thank you all for your insights!

I will create a new thread to kick off the formal voting process for
removing Windows support starting from 4.0.

Regards,
Yuki

On Fri, Jul 31, 2020 at 10:42 AM Erick Ramirez
 wrote:
>
> >
> > My point is, for educational purposes there are plenty of other ways of
> > running small dev clusters that are probably more realistic for most use
> > cases.
> > I’d be for removing windows support, but I suspect my use case is one of
> > the more minor ones.
> >
>
> Not minor at all. Thanks for that insight, Andy.
>
> I field a lot of questions daily from Windows users and it's a huge drain
> because I mostly work/build/test on Ubuntu. I have a Windows 10 Surface and
> I can say that I waste so much time trying to replicate what mistake users
> made on their PCs so I can help them get past it. But most of the work ends
> up being spent on troubleshooting their PCs and not C* so I'm all for
> dropping it since it causes too much friction with user adoption. Cheers!




[Discussion] Windows support

2020-07-28 Thread Yuki Morishita
Hi,

I'd like to raise my concern about Windows support, as we are getting
closer to 4.0 release.

Since support for JDK11 was added (CASSANDRA-9608), the Windows script to
start Cassandra has been broken.
The fix for the script is posted to
https://issues.apache.org/jira/projects/CASSANDRA/issues/CASSANDRA-14608.

Windows scripts have not been maintained recently, and I don't think we have
any Windows environment in CI for testing.
I don't think it is a good idea to release Apache Cassandra with
broken Windows scripts.

With the latest update of Windows 10, even Windows 10 Home edition
users can use Docker for Windows if they enable WSL2 on their machine.
However, the update is not yet available for everyone, and I believe
many enterprises hold off on upgrading to the latest version. Even if
they do upgrade, they can disable the use of WSL2. Some companies may not
allow installing VirtualBox either.

So, here is what we can do for the 4.0 release:

- Stop supporting Windows. Remove all bat/ps1 scripts from the
source and distribution. Encourage Windows users to use VM/Docker.
- Continue supporting Windows. Set up a Windows test environment. Test
every Windows script for future releases.

Since I have seen enterprises with restricted dev environments (and have
seen people trying to use Cassandra on Windows on StackOverflow), I want to
have Windows scripts ready to be used.
But I'm also fine if we decide to remove all Windows scripts since I
use Docker anyway.

Regards,
Yuki




Re: Cassandra on Google Summer of Code 2017

2017-02-06 Thread Yuki Morishita
Hi,

I co-mentored GSoC last year with Paulo Motta for the Apache Cassandra project.
Here is what we did last year:

* Marked streaming-related issues as
"gsoc2016"(https://issues.apache.org/jira/browse/CASSANDRA-8110?jql=project%20%3D%20CASSANDRA%20AND%0A%20labels%20%3D%20gsoc2016).
These issues were listed on the GSoC website so that students could
submit proposals for them.
* A candidate student contacted us and showed interest. We advised the
student by reviewing the proposal.
* The candidate submitted the proposal, and we evaluated his ability to
accomplish the task (interview included).
* Note that not all candidates are accepted. Google assigns a certain
number of students per organization, and since ASF has tons of
projects, it can be competitive.

* The initial plan was for the mentee to work on CASSANDRA-8928
(https://issues.apache.org/jira/browse/CASSANDRA-8928). However, the
amount of work that we (mentors) found after GSoC started was far more
than we had estimated initially, so we switched the target to a
technically related issue.
* The mentee finished two streaming-related issues
(CASSANDRA-10810 and CASSANDRA-12008) in a timely manner and with the
quality we expected, so he passed the final evaluation.

As an organization, ASF will participate in GSoC as usual this year,
but if the Apache Cassandra project wants to participate, we need to
start marking JIRA issues as "gsoc2017" by 2/9.
Before marking an issue, though, consider whether it can be tackled
within the GSoC period. :)

I'm still not sure if I have time to mentor this year, but I'm willing
to answer any GSoC-related questions.

On Mon, Feb 6, 2017 at 7:38 PM, Nate McCall  wrote:
> Sorry for the top post, but just want to make sure everyone has the details:
> If any committers want to be involved in this, the ASF is filling this
> out as a whole so we don't need to do too much on a project level.
>
> ASF details for mentorship (including GSoC) can be found here:
> http://community.apache.org/guide-to-being-a-mentor.html
>
> On Tue, Feb 7, 2017 at 2:14 PM, Русак Максим  wrote:
>> Hello, developers of Cassandra. I'm really interested in the Cassandra 
>> project (especially in distributed metadata components(like gossip)) and 
>> eager to take part in GSoC, dive deep into Cassandra internals and make a 
>> contribution under mentorship.
>> Currently I'm working on my undergrad diploma in Yandex YT (analog of 
>> Hadoop). My work is related to distributed metadata and anomalous nodes 
>> detection.
>> So I have a few questions:
>> 1. Does Cassandra plan to take part in GSoC this year?
>> 2. Is anyone willing to mentor? (not necessarily relating to distributed
>> metadata)
>> It'll be nice to see a wide range of Cassandra projects and ideas. In this way
>> Cassandra could attract new talent and accelerate development :)
>> (Mentoring organization application deadline is February 9, 16:00 UTC, so it
>> needs to be done as soon as possible)
>>
>> Regards, Maxim Rusak


Re: A proposal to move away from Jira-centric development

2016-08-15 Thread Yuki Morishita
As an active committer, the most important thing for me is to be able
to *look up* design discussions and decisions easily later.

I often look up the git history or CHANGES.txt for changes that I'm
interested in, then look up JIRA by following the JIRA ticket number
written in the comment or text.
If we move to the dev mailing list, I would request that a permalink to
that thread be posted to JIRA, which I think is just one extra step that
isn't necessary if we simply use JIRA.

So, I'm +1 to just posting a JIRA link to the dev list.


On Mon, Aug 15, 2016 at 12:35 PM, Chris Mattmann <mattm...@apache.org> wrote:
> This is a good outward flow of info to the dev list. However, there needs to 
> be
> inward flow too – having the convo on the dev list will be a good start to 
> that.
> I hope to see more inclusivity here.
>
>
>
> On 8/15/16, 10:26 AM, "Aleksey Yeschenko" <alek...@apache.org> wrote:
>
> Well, if you read carefully what Jeremiah and I have just proposed, it 
> wouldn’t be an issue.
>
> The notable major changes would start off on dev@ (think, a summary, a 
> link to the JIRA, and maybe an attached spec doc).
>
> No need to follow the JIRA feed. Watch dev@ for those announcements and
> start watching the individual JIRA tickets if interested.
>
> This creates the least amount of noise: you miss nothing important, and 
> at the same time you won’t be receiving mail from
> dev@ for each individual comment - including those on proposals you don’t 
> care about.
>
> We aren’t doing it currently, but we could, and probably should.
>
> --
> AY
>
> On 15 August 2016 at 18:22:36, Chris Mattmann (mattm...@apache.org) wrote:
>
> Discussion belongs on the dev list. Putting discussion in JIRA, is fine, 
> but realize,
> there is a lot of noise in that signal and people may or may not be 
> watching
> the JIRA list. In fact, I don’t see JIRA sent to the dev list at all so 
> you are basically
> forking the conversation to a high noise list by putting it all in JIRA.
>
>
>
>
>
> On 8/15/16, 10:11 AM, "Aleksey Yeschenko" <alek...@apache.org> wrote:
>
> I too feel like it would be sufficient to announce those major JIRAs on 
> the dev@ list, but keep all discussion itself to JIRA, where it belongs.
>
> You don’t need to follow every ticket this way, just subscribe to dev@ 
> and then start watching the select major JIRAs you care about.
>
> --
> AY
>
> On 15 August 2016 at 18:08:20, Jeremiah D Jordan 
> (jeremiah.jor...@gmail.com) wrote:
>
> I like keeping things in JIRA because then everything is in one place, 
> and it is easy to refer someone to it in the future.
> But I agree that JIRA tickets with a bunch of design discussion and POC’s 
> and such in them can get pretty long and convoluted.
>
> I don’t really like the idea of moving all of that discussion to email
> which makes it harder to point someone to it. Maybe a better idea would
> be to have a “design/POC” JIRA and an “implementation” JIRA. That way we 
> could still keep things in JIRA, but the final decision would be kept “clean”.
>
> Though it would be nice if people would send an email to the dev list 
> when proposing “design” JIRA’s, as not everyone has time to follow every JIRA 
> ever made to see that a new design JIRA was created that they might be 
> interested in participating on.
>
> My 2c.
>
> -Jeremiah
>
>
> > On Aug 15, 2016, at 9:22 AM, Jonathan Ellis <jbel...@gmail.com> wrote:
> >
> > A long time ago, I was a proponent of keeping most development 
> discussions
> > on Jira, where tickets can be self contained and the threadless nature
> > helps keep discussions from getting sidetracked.
> >
> > But Cassandra was a lot smaller then, and as we've grown it has become
> > necessary to separate out the signal (discussions of new features and 
> major
> > changes) from the noise of routine bug reports.
> >
> > I propose that we take advantage of the dev list to perform that
> > separation. Major new features and architectural improvements should be
> > discussed first here, then when consensus on design is achieved, moved 
> to
> > Jira for implementation and review.
> >
> > I think this will also help with the problem when the initial idea 
> proves
> > to be unworkable and gets revised substantially later after much
> > discussion. It can be difficult to figure out what the conclusion was, 
> as
> > review comments start to pile up afterwards. Having that discussion on 
> the
> > list, and summarizing on Jira, would mitigate this.
> >
> > --
> > Jonathan Ellis
> > Project Chair, Apache Cassandra
> > co-founder, http://www.datastax.com
> > @spyced
>
>
>
>
>
>
>



-- 
Yuki Morishita
 t:yukim (http://twitter.com/yukim)


Re: Write latency metric

2016-02-07 Thread Yuki Morishita
Hi Rei,

As you pointed out, the table (column family) metric only tracks memtable
and row cache updates.
An update is done through a Mutation object, which can cover several
column families at once.
The unit of the commit log is also the Mutation, so we may first write
the commit log for several tables and then update each table.
We really can't mix commit log latency into the table metric right now,
so I would say it is the intended design.
Other things you can use are query tracing, to track individual query
performance, and client request metrics, for a coordinator-level metric.

Maybe we can improve things by adding metrics for Keyspace#apply(). We
have a Keyspace-level write metric, but it is just an accumulation of
all table-level write metrics.
Feel free to open a JIRA at
https://issues.apache.org/jira/browse/CASSANDRA for an improvement
request.
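
To illustrate what each metric covers, here is a minimal sketch in Python.
It is a toy model, not Cassandra code: FakeClock, apply_mutation, and the
millisecond costs are all hypothetical, chosen only to show why a timer
around the per-table update cannot see the commit log write, while a timer
around the whole apply step would.

```python
from collections import defaultdict


class FakeClock:
    """Deterministic clock so the example does not depend on real timing."""

    def __init__(self):
        self.now_ms = 0

    def advance(self, ms):
        self.now_ms += ms


def apply_mutation(tables, clock, table_latency, keyspace_latency,
                   commitlog_ms=5, memtable_ms=1):
    """Toy model of applying one mutation that touches several tables.

    commitlog_ms and memtable_ms are made-up costs, not measurements.
    """
    start = clock.now_ms
    # The commit log is written once for the whole mutation, before any table update.
    clock.advance(commitlog_ms)
    for table in tables:
        table_start = clock.now_ms
        clock.advance(memtable_ms)  # memtable (and row cache) update for this table
        # Per-table timer: only sees this table's own update.
        table_latency[table].append(clock.now_ms - table_start)
    # Timer around the whole apply step: commit log plus every table update.
    keyspace_latency.append(clock.now_ms - start)


clock = FakeClock()
table_latency = defaultdict(list)
keyspace_latency = []
apply_mutation(["users", "events"], clock, table_latency, keyspace_latency)
print(dict(table_latency))  # per-table timings exclude the commit log cost
print(keyspace_latency)     # whole-apply timing includes it
```

In this model the commit log cost shows up only in the whole-apply timing,
which is roughly what a separate metric around Keyspace#apply() would add.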


On Fri, Feb 5, 2016 at 6:44 PM, 大平怜 <rei.oda...@gmail.com> wrote:
> Hi,
>
> I noticed that the write latency metric includes only memtable and rowcache
> updates, but not sync to a commitlog.  I am looking at
> ColumnFamilyStore.apply() and KeySpace.apply().  Is my understanding
> correct?  Is this an intended design?  I guess sync to a commitlog is the
> most time consuming operation during a write (especially when batch commit
> is used).
>
> Appreciate any help.
>
>
> Thanks,
> Rei Odaira



-- 
Yuki Morishita
 t:yukim (http://twitter.com/yukim)


Re: After 'git clone', nodetool results in AttributeNotFoundException

2015-11-15 Thread Yuki Morishita
gement.MBeanServerInvocationHandler.invoke(MBeanServerInvocationHandler.java:267)
> at com.sun.proxy.$Proxy7.getNonSystemKeyspaces(Unknown Source)
> at
> org.apache.cassandra.tools.NodeProbe.getNonSystemKeyspaces(NodeProbe.java:808)
> at
> org.apache.cassandra.tools.NodeTool$NodeToolCmd.parseOptionalKeyspace(NodeTool.java:326)
> at
> org.apache.cassandra.tools.NodeTool$NodeToolCmd.parseOptionalKeyspace(NodeTool.java:318)
> at org.apache.cassandra.tools.nodetool.Compact.execute(Compact.java:42)
> at
> org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:245)
> at org.apache.cassandra.tools.NodeTool.main(NodeTool.java:159)
>
>
> Regards,
>
> Michael Edge



-- 
Yuki Morishita
 t:yukim (http://twitter.com/yukim)


Re: [VOTE] Release Apache Cassandra 2.1.11

2015-10-12 Thread Yuki Morishita
I am also worried about the SSTableReaderTest failure in 2.1 and 2.2,
since the stack trace seems to relate to MmappedSegmentedFile.

http://cassci.datastax.com/job/cassandra-2.1_utest/lastCompletedBuild/testReport/org.apache.cassandra.io.sstable/SSTableReaderTest/testLoadingSummaryUsesCorrectPartitioner/
http://cassci.datastax.com/job/cassandra-2.2_utest/lastCompletedBuild/testReport/org.apache.cassandra.io.sstable/SSTableReaderTest/testLoadingSummaryUsesCorrectPartitioner/

On Mon, Oct 12, 2015 at 8:56 PM, Stefania Alborghetti
<stefania.alborghe...@datastax.com> wrote:
> -1
>
> The python driver was upgraded and this requires a fix in cqlsh for COPY
> FROM to work, more details in CASSANDRA-10507
> <https://issues.apache.org/jira/browse/CASSANDRA-10507>.
>
> On Tue, Oct 13, 2015 at 2:42 AM, Josh McKenzie <jmcken...@apache.org> wrote:
>
>> +1
>>
>> On Mon, Oct 12, 2015 at 2:32 PM, Brandon Williams <dri...@gmail.com>
>> wrote:
>>
>> > +1
>> >
>> > On Mon, Oct 12, 2015 at 1:05 PM, Jake Luciani <j...@apache.org> wrote:
>> >
>> > > I propose the following artifacts for release as 2.1.11.
>> > >
>> > > sha1: 4acc3a69d319b0e7e00cbd37b27e988ebfa4df4f
>> > > Git:
>> > >
>> > >
>> >
>> http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/2.1.11-tentative
>> > > Artifacts:
>> > >
>> > >
>> >
>> https://repository.apache.org/content/repositories/orgapachecassandra-1084/org/apache/cassandra/apache-cassandra/2.1.11/
>> > > Staging repository:
>> > >
>> >
>> https://repository.apache.org/content/repositories/orgapachecassandra-1084/
>> > >
>> > > The artifacts as well as the debian package are also available here:
>> > > http://people.apache.org/~jake
>> > >
>> > > The vote will be open for 48 hours (longer if needed).
>> > >
>> > > [1]: http://goo.gl/cfjxJU (CHANGES.txt)
>> > > [2]: http://goo.gl/nOz2X6 (NEWS.txt)
>> > >
>> >
>>
>
>
>
> --
>
>
> Stefania Alborghetti
>
> Apache Cassandra Software Engineer
>
> |+852 6114 9265| stefania.alborghe...@datastax.com



-- 
Yuki Morishita
 t:yukim (http://twitter.com/yukim)


Re: Requiring Java 8 for C* 3.0

2015-05-07 Thread Yuki Morishita
+1

On Thu, May 7, 2015 at 11:13 AM, Jeremiah D Jordan
jerem...@datastax.com wrote:
 With Java 7 being EOL for free versions I am +1 on this.  If you want to 
 stick with 7, you can always keep running 2.1.

 On May 7, 2015, at 11:09 AM, Jonathan Ellis jbel...@gmail.com wrote:

 We discussed requiring Java 8 previously and decided to remain Java
 7-compatible, but at the time we were planning to release 3.0 before Java 7
 EOL.  Now that 8099 and increased emphasis on QA have delayed us past Java
 7 EOL, I think it's worth reopening this discussion.

 If we require 8, then we can use lambdas, LongAdder, StampedLock, Streaming
 collections, default methods, etc.  Not just in 3.0 but over 3.x for the
 next year.

 If we don't, then people can choose whether to deploy on 7 or 8 -- but the
 vast majority will deploy on 8 simply because 7 is no longer supported
 without a premium contract with Oracle.  8 also has a more advanced G1GC
 implementation (see CASSANDRA-7486).

 I think that gaining access to the new features in 8 as we develop 3.x is
 worth losing the ability to run on a platform that will have been EOL for a
 couple months by the time we release.

 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder, http://www.datastax.com
 @spyced




-- 
Yuki Morishita
 t:yukim (http://twitter.com/yukim)


Re: Error opening zip file or JAR manifest missing

2014-11-24 Thread Yuki Morishita
Michael,

Thanks, and done. :)

On Mon, Nov 24, 2014 at 6:37 PM, Michael Shuler mich...@pbandjelly.org wrote:
 On 11/22/2014 05:58 PM, Rajanarayanan Thottuvaikkatumana wrote:

 A huge number of tests are failing with the same error message.
 Reproducing one of such errors.

 testlist:
   [echo] running test bucket 0 tests
  [mkdir] Created dir:
 /Users/RajT/cassandra-source/cassandra-trunk/build/test/cassandra
  [mkdir] Created dir:
 /Users/RajT/cassandra-source/cassandra-trunk/build/test/output
  [junit] WARNING: multiple versions of ant detected in path for junit
  [junit]
 jar:file:/usr/local/Cellar/ant/1.9.4/libexec/lib/ant.jar!/org/apache/tools/ant/Project.class
  [junit]  and
 jar:file:/Users/RajT/cassandra-source/cassandra-trunk/build/lib/jars/ant-1.6.5.jar!/org/apache/tools/ant/Project.class
  [junit] Error occurred during initialization of VM
  [junit] agent library failed to init: instrument
  [junit] objc[1300]: Class JavaLaunchHelper is implemented in both
 /Library/Java/JavaVirtualMachines/jdk1.7.0_67.jdk/Contents/Home/jre/bin/java
 and
 /Library/Java/JavaVirtualMachines/jdk1.7.0_67.jdk/Contents/Home/jre/lib/libinstrument.dylib.
 One of the two will be used. Which one is undefined.
  [junit] Error opening zip file or JAR manifest missing :
 /Users/RajT/cassandra-source/cassandra-trunk/lib/jamm-0.3.0.jar
  [junit] Testsuite: org.apache.cassandra.cache.AutoSavingCacheTest
  [junit] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time
 elapsed: 0 sec
  [junit]
  [junit] Testcase:
 org.apache.cassandra.cache.AutoSavingCacheTest:null: Caused an ERROR
  [junit] Forked Java VM exited abnormally. Please note the time in the
 report does not reflect the time until the VM exit.
  [junit] junit.framework.AssertionFailedError: Forked Java VM exited
 abnormally. Please note the time in the report does not reflect the time
 until the VM exit.
  [junit]at java.lang.Thread.run(Thread.java:745)


 I fixed this last time a new jamm jar was included. The problem is the
 conflict resolution on merge from 2.1 to trunk - value= is overwritten where
 there should be line=.

 Patch attached. Could someone commit this to trunk, pretty please?  :)

 --
 Kind regards,
 Michael




-- 
Yuki Morishita
 t:yukim (http://twitter.com/yukim)


Re: Performance Tickets

2014-04-16 Thread Yuki Morishita
Yes, I have an idea forming up in my mind for 5220, so I'm still on it.

On Wed, Apr 16, 2014 at 6:25 AM, Benedict Elliott Smith
belliottsm...@datastax.com wrote:

   - CASSANDRA-5220: Repair improvements when using vnodes


 That definitely deserves a performance tag. Yuki, are you looking at this
 or should we unassign in case somebody else wants to jump in?


 On 15 April 2014 14:59, Michael Shuler mich...@pbandjelly.org wrote:

 On 04/15/2014 08:28 AM, Benedict Elliott Smith wrote:

 It's only been six months since the last performance drive, and 2.1 is now
 around the corner. But I'm hoping we can push performance even further for
 3.0. With that in mind, I've picked out what I think are the nearest term
 wins to focus on.

 - CASSANDRA-7039: DirectByteBuffer compatible LZ4 methods
 - CASSANDRA-6726: RAR/CRAR off-heap
 - CASSANDRA-6633: Dynamic bloom filter resizing
 - CASSANDRA-6755: Optimise CellName/Composite comparisons for
 NativeCell
 - CASSANDRA-7032: Improve vnode allocation
 - CASSANDRA-6809: Compressed Commit Log
 - CASSANDRA-5663: write batching in native protocol
 - CASSANDRA-5863: In-process (uncompressed) page cache
 - CASSANDRA-7040: Replace read/write stage with per-disk access
 coordination
 - CASSANDRA-6917: enum data type
 - CASSANDRA-6935: Make clustering part of primary key a first order
 component in the storage engine

 I've arranged them in ascending order of my intuitive impression of their
 difficulty. Don't all leap at the last few :)

 Anything I've missed?


   - CASSANDRA-5220: Repair improvements when using vnodes

 Not sure where that might go in your difficulty ordering, but since vnodes
 are default and repair time seems to be a pretty common question/pain, it's
 important for ops and highly relevant to cluster performance. If some of
 the above list might directly affect/help repair performance, let's get
 them tied together in Jira :)

 --
 Kind regards,
 Michael





-- 
Yuki Morishita
 t:yukim (http://twitter.com/yukim)


Re: inserting a row with a map column when using if not exists results in null column for the row

2013-09-25 Thread Yuki Morishita
Sounds like https://issues.apache.org/jira/browse/CASSANDRA-6069

On Wed, Sep 25, 2013 at 8:29 PM, Joe Stein crypt...@gmail.com wrote:
 Hi, was not sure if there is a reason for this or I am doing something
 wrong or is known issue or not but when trying to insert a row with a map
 collection column and using if not exist the map is coming out as null :(
 see below, let me know, thanks!

 cqlsh:rvag> CREATE TABLE users (
         ... id text PRIMARY KEY,
         ... given text,
         ... surname text,
         ... favs map<text, text>   // A map of text keys, and text values
         ... );
 cqlsh:rvag> INSERT INTO users (id, given, surname, favs)
         ... VALUES ('jsmith', 'John', 'Smith', { 'fruit' : 'apple', 'band' : 'Beatles' });
 cqlsh:rvag> select * from users;

  id     | favs                                  | given | surname
 --------+---------------------------------------+-------+---------
  jsmith | {'band': 'Beatles', 'fruit': 'apple'} |  John |   Smith

 (1 rows)

 cqlsh:rvag> truncate users;
 cqlsh:rvag> select * from users;

 (0 rows)

 cqlsh:rvag> INSERT INTO users (id, given, surname, favs)
         ... VALUES ('jsmith', 'John', 'Smith', { 'fruit' : 'apple', 'band' : 'Beatles' }) IF NOT EXISTS;
 cqlsh:rvag> select * from users;

  id     | favs | given | surname
 --------+------+-------+---------
  jsmith | null |  John |   Smith

 (1 rows)

 /***
  Joe Stein
  Founder, Principal Consultant
  Big Data Open Source Security LLC
  http://www.stealth.ly
  Twitter: @allthingshadoop http://www.twitter.com/allthingshadoop
 /



-- 
Yuki Morishita
 t:yukim (http://twitter.com/yukim)


Re: [VOTE] Release Apache Cassandra 1.2.10

2013-09-19 Thread Yuki Morishita
+1

On Thu, Sep 19, 2013 at 8:15 AM, Gary Dusbabek gdusba...@gmail.com wrote:
 +1


 On Thu, Sep 19, 2013 at 3:59 AM, Sylvain Lebresne sylv...@datastax.com wrote:

 The changelog is getting big, I propose the following artifacts for release
 as 1.2.10.

 sha1: 937536363a8a6d86ee32fe5ef90653264e67b6c7
 Git:

 http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/1.2.10-tentative
 Artifacts:

 https://repository.apache.org/content/repositories/orgapachecassandra-078/org/apache/cassandra/apache-cassandra/1.2.10/
 Staging repository:
 https://repository.apache.org/content/repositories/orgapachecassandra-078/

 The artifacts as well as the debian package are also available here:
 http://people.apache.org/~slebresne/

 The vote will be open for 72 hours (longer if needed).

 [1]: http://goo.gl/x1wOFw (CHANGES.txt)
 [2]: http://goo.gl/VOHl5l (NEWS.txt)




-- 
Yuki Morishita
 t:yukim (http://twitter.com/yukim)


Re: [VOTE] Release Apache Cassandra 2.0.1

2013-09-19 Thread Yuki Morishita
I think https://issues.apache.org/jira/browse/CASSANDRA-5661 broke
offline tools like sstableloader.

I am getting the following error:

Exception in thread "main" java.lang.ExceptionInInitializerError
    at org.apache.cassandra.io.util.PoolingSegmentedFile.getSegment(PoolingSegmentedFile.java:36)
    at org.apache.cassandra.io.util.SegmentedFile$SegmentIterator.next(SegmentedFile.java:161)
    at org.apache.cassandra.io.util.SegmentedFile$SegmentIterator.next(SegmentedFile.java:142)
    at org.apache.cassandra.io.sstable.SSTableReader.getPosition(SSTableReader.java:896)
    at org.apache.cassandra.io.sstable.SSTableReader.getPosition(SSTableReader.java:831)
    at org.apache.cassandra.io.sstable.SSTableReader.getPositionsForRanges(SSTableReader.java:743)
    at org.apache.cassandra.io.sstable.SSTableLoader$1.accept(SSTableLoader.java:122)
    at java.io.File.list(File.java:1087)
    at org.apache.cassandra.io.sstable.SSTableLoader.openSSTables(SSTableLoader.java:73)
    at org.apache.cassandra.io.sstable.SSTableLoader.stream(SSTableLoader.java:155)
    at org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:66)
Caused by: java.lang.NullPointerException
    at org.apache.cassandra.config.DatabaseDescriptor.getFileCacheSizeInMB(DatabaseDescriptor.java:1145)
    at org.apache.cassandra.service.FileCacheService.<clinit>(FileCacheService.java:41)
    ... 11 more

It looks like the new file_cache_size_in_mb setting does not have a default value.
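
As a rough illustration of the failure mode and one way to guard against
it, here is a small Python sketch. It is not the Cassandra code or the
actual fix: Config, get_file_cache_size_in_mb, and the fallback value are
hypothetical, and None stands in for a setting that was never assigned
(for example because an offline tool did not load a full configuration).

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical fallback used only for this illustration; not Cassandra's real default.
ASSUMED_DEFAULT_FILE_CACHE_MB = 512


@dataclass
class Config:
    # None models a setting that has no default and was never set.
    file_cache_size_in_mb: Optional[int] = None


def get_file_cache_size_in_mb(conf: Optional[Config]) -> int:
    """Return the configured size, falling back when the config or value is missing."""
    if conf is None or conf.file_cache_size_in_mb is None:
        return ASSUMED_DEFAULT_FILE_CACHE_MB
    return conf.file_cache_size_in_mb


print(get_file_cache_size_in_mb(None))                              # no config loaded
print(get_file_cache_size_in_mb(Config()))                          # setting left unset
print(get_file_cache_size_in_mb(Config(file_cache_size_in_mb=64)))  # explicit value wins
```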

On Thu, Sep 19, 2013 at 12:34 PM, Jonathan Ellis jbel...@gmail.com wrote:
 Does that mean we need to rebuild the 2.0.1 artifacts?

 On Thu, Sep 19, 2013 at 12:18 PM, Pavel Yaskevich pove...@gmail.com wrote:
 I want to bump disruptor_thrift_server version to 3.0.1 today before C* 
 2.0.1, otherwise +1.

 Sent from my iPhone

 On Sep 19, 2013, at 8:52 AM, Gary Dusbabek gdusba...@gmail.com wrote:

 +1


 On Thu, Sep 19, 2013 at 7:23 AM, Sylvain Lebresne
 sylv...@datastax.com wrote:

 We have quite a bunch of bug fixed and new stuffs on the 2.0 branch that we
 should get to the users. So I propose the following artifacts for release
 as
 2.0.1.

 sha1: 72c50bd7505c4d27838134882ef2c2d7d555f7be
 Git:

 http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/2.0.1-tentative
 Artifacts:

 https://repository.apache.org/content/repositories/orgapachecassandra-079/org/apache/cassandra/apache-cassandra/2.0.1/
 Staging repository:
 https://repository.apache.org/content/repositories/orgapachecassandra-079/

 The artifacts as well as the debian package are also available here:
 http://people.apache.org/~slebresne/

 The vote will be open for 72 hours (longer if needed).

 [1]: http://goo.gl/URvgVt (CHANGES.txt)
 [2]: http://goo.gl/EXLlJy (NEWS.txt)




 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder, http://www.datastax.com
 @spyced



-- 
Yuki Morishita
 t:yukim (http://twitter.com/yukim)


Re: Time for a release candidate?

2013-08-05 Thread Yuki Morishita
+1.
Also, I think it is time to create the cassandra-2.0 branch.

On Sat, Aug 3, 2013 at 2:48 PM, Jonathan Ellis jbel...@gmail.com wrote:
 We've cleaned out everything tagged for rc1, anything else we need to
 do before rolling it up for a vote?

 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder, http://www.datastax.com
 @spyced



-- 
Yuki Morishita
 t:yukim (http://twitter.com/yukim)


Re: [VOTE] Release Apache Cassandra 2.0.0-beta2

2013-07-22 Thread Yuki Morishita
+1

On Mon, Jul 22, 2013 at 11:42 AM, Jonathan Ellis jbel...@gmail.com wrote:
 Yes, we need to split that 1.2.7 section into Merged from 1.2 for b1 and b2.

 On Mon, Jul 22, 2013 at 11:39 AM, Andrew Cobley a.e.cob...@dundee.ac.uk 
 wrote:
 Just a minor point,   isn't 5768 rolled into beta 2 (it's also rolled into 
 1.2.7)

 https://issues.apache.org/jira/browse/CASSANDRA-5768

 Andy

 On 22 Jul 2013, at 17:23, Sylvain Lebresne sylv...@datastax.com wrote:

 A healthy amount of bugs have been found since beta1, so let's release a new
 beta to shake out the remaining ones before a RC. I thus propose the
 following
 artifacts for release as 2.0.0-beta2

 sha1: e0eacd28183beb6f2b7c995b4cde4e85b7b30e4b
 Git:
 http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/2.0.0-beta2-tentative
 Artifacts:
 https://repository.apache.org/content/repositories/orgapachecassandra-004/org/apache/cassandra/apache-cassandra/2.0.0-beta2/
 Staging repository:
 https://repository.apache.org/content/repositories/orgapachecassandra-004/

 The artifacts as well as the debian package are also available here:
 http://people.apache.org/~slebresne/

 The vote will be open for 72 hours (longer if needed).

 [1]: http://goo.gl/wBmSts (CHANGES.txt)
 [2]: http://goo.gl/FZ37wh (NEWS.txt)





 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder, http://www.datastax.com
 @spyced



-- 
Yuki Morishita
 t:yukim (http://twitter.com/yukim)


Re: Major compaction does not seems to free the disk space a lot if wide rows are used.

2013-05-16 Thread Yuki Morishita
You are right about the behavior of Cassandra compaction.
It checks whether the key exists in other SSTable files that are not in
the compaction set.

I think https://issues.apache.org/jira/browse/CASSANDRA-4671 would
help if you upgrade to the latest 1.2,
but in your version, I think the workaround is to stop writes so that
no new SSTables are flushed, then compact.
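
The check described above can be sketched in Python as follows. This is a
simplified model, not the Cassandra implementation: SSTable and
can_purge_tombstones are hypothetical names, each SSTable is reduced to a
set of row keys, and gc_grace is ignored.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class SSTable:
    name: str
    keys: Set[str] = field(default_factory=set)


def can_purge_tombstones(row_key: str, compacting: List[SSTable],
                         all_sstables: List[SSTable]) -> bool:
    """Return True only if no SSTable outside the compaction set holds row_key."""
    compacting_names = {s.name for s in compacting}
    return not any(
        row_key in s.keys
        for s in all_sstables
        if s.name not in compacting_names
    )


a = SSTable("a", {"wide-row"})
b = SSTable("b", {"wide-row"})
# Flushed while the long-running compaction of a and b is still in progress.
c = SSTable("c", {"wide-row"})

print(can_purge_tombstones("wide-row", [a, b], [a, b, c]))  # writes continue
print(can_purge_tombstones("wide-row", [a, b], [a, b]))     # writes stopped
```

With writes still arriving, the freshly flushed SSTable shares the row key
and blocks the purge; with writes stopped, nothing outside the compaction
set holds the key, so the tombstones can go.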

On Thu, May 16, 2013 at 3:07 AM, Boris Yen yulin...@gmail.com wrote:
 Hi All,

 Sorry for the wide distribution.

 Our cassandra is running on 1.0.10. Recently, we are facing a weird
 situation. We have a column family containing wide rows (each row might
 have a few million of columns). We delete the columns on a daily basis and
 we also run major compaction on it everyday to free up disk space (the
 gc_grace is set to 600 seconds).

 However, every time we run the major compaction, only 1 or 2GB disk space
 is freed. We tried to delete most of the data before running compaction,
 however, the result is pretty much the same.

 So, we tried to check the source code. It seems that the column tombstones
 could only be purged when the row key is not in other sstables. I know the
 major compaction should include all sstables, however, in our use case,
 columns get inserted rapidly. This will make the cassandra flush the
 memtables to disk and create new sstables. The newly created sstables will
 have the same keys as the sstables that are being compacted (the compaction
 will take 2 or 3 hours to finish). My question is that will these newly
 created sstables be the cause of why most of the column-tombstone not being
 purged?

 p.s. We also did some other tests. We inserted data to the same CF with the
 same wide-row pattern and deleted most of the data. This time we stopped
 all the writes to cassandra and did the compaction. The disk usage
 decreased dramatically.

 Any suggestions, or is this a known issue?

 Thanks and Regards,
 Boris



-- 
Yuki Morishita
 t:yukim (http://twitter.com/yukim)


Re: [VOTE] Release Apache Cassandra 1.2.5 (strike 3)

2013-05-15 Thread Yuki Morishita
+1

On Wed, May 15, 2013 at 10:06 AM, Jason Brown jasedbr...@gmail.com wrote:
 +1


 On Wed, May 15, 2013 at 7:59 AM, Jonathan Ellis jbel...@gmail.com wrote:

 +1

 On Wed, May 15, 2013 at 9:16 AM, Sylvain Lebresne sylv...@datastax.com
 wrote:
  Hopefully third times the charm. The #4860 fix has been committed so I
  propose
  the following artifacts for release as 1.2.5.
 
  sha1: 7d4380d661e7bbf3ec075b069cf2c22e9b87375f
  Git:
 
 http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/1.2.5-tentative
  Artifacts:
 
 https://repository.apache.org/content/repositories/orgapachecassandra-025/org/apache/cassandra/apache-cassandra/1.2.5/
  Staging repository:
 
 https://repository.apache.org/content/repositories/orgapachecassandra-025/
 
  The artifacts as well as the debian package are also available here:
  http://people.apache.org/~slebresne/
 
  The vote will be open for 72 hours (longer if needed).
 
  [1]: http://goo.gl/RC5VV (CHANGES.txt)
  [2]: http://goo.gl/h38f4 (NEWS.txt)



 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder, http://www.datastax.com
 @spyced




-- 
Yuki Morishita
 t:yukim (http://twitter.com/yukim)


Re: [VOTE] Release Apache Cassandra 1.2.0

2012-12-29 Thread Yuki Morishita
+1

On Saturday, December 29, 2012, Jason Brown wrote:

 +1
 On Dec 29, 2012 9:31 AM, Vijay vijay2...@gmail.com wrote:

  +1
 
  Regards,
  /VJ
 
 
  On Sat, Dec 29, 2012 at 4:40 AM, Sylvain Lebresne sylv...@datastax.com wrote:
 
   After a quiet 2nd release candidate, it is time to release the final
  1.2.0.
   I thus propose the following artifacts for release as 1.2.0.
  
   sha1: 69337a43670f71ae1fc55e23d6a9031230423900
   Git:
  
  
 
 http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/1.2.0-tentative
   Artifacts:
  
  
 
 https://repository.apache.org/content/repositories/orgapachecassandra-081/org/apache/cassandra/apache-cassandra/1.2.0/
   Staging repository:
  
 
 https://repository.apache.org/content/repositories/orgapachecassandra-081/
  
   The artifacts as well as the debian package are also available here:
   http://people.apache.org/~slebresne/
  
   The vote will be open for 72 hours (longer if needed).
  
   [1]: http://goo.gl/zLqf9 (CHANGES.txt)
   [2]: http://goo.gl/aKgkY (NEWS.txt)
  
 



-- 
Yuki Morishita
 t:yukim (http://twitter.com/yukim)


Re: [VOTE] Release Apache Cassandra 1.2.0-rc1

2012-12-11 Thread Yuki Morishita
+1 

yuki


On Tuesday, December 11, 2012 at 7:27 AM, Jonathan Ellis wrote:

 +1
 On Dec 11, 2012 5:19 AM, Sylvain Lebresne sylv...@datastax.com wrote:
 
  We've now fixed all remaining blocking problems on the 1.2.0 branch since
  beta3, and if we want to release the final before the end of the year it's
  time
  to get serious, so I propose the following artifacts for release as
  1.2.0-rc1.
  
  sha1: d791e0b3fa5a615f2df200ecd40f82dc9a5874d6
  Git:
  
  http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/1.2.0-rc1-tentative
  Artifacts:
  
  https://repository.apache.org/content/repositories/orgapachecassandra-136/org/apache/cassandra/apache-cassandra/1.2.0-rc1/
  Staging repository:
  https://repository.apache.org/content/repositories/orgapachecassandra-136/
  
  The artifacts as well as the debian package are also available here:
  http://people.apache.org/~slebresne/
  
  The vote will be open for 72 hours (longer if needed).
  
  [1]: http://goo.gl/8RW6q (CHANGES.txt)
  [2]: http://goo.gl/ltVIw (NEWS.txt)