Re: Revisit Cassandra EOL Policy

2016-01-07 Thread Maciek Sakrejda
Anuj, do you have a link to the versioning policy? The tick-tock versioning
blog post [1] says that EOL happens after two major versions come out, but
I can't find this stated more formally anywhere. I'm interested in how long
a given version will receive patches for security issues or critical data
loss bugs (i.e., the policy of the Apache project itself, distinct from any
support that may be available through Datastax). The Postgres project has a
great write-up of their policy [2].

And for what it's worth, we are starting to use Cassandra and do have
automation around it. I don't have strong feelings about what the
versioning policy should look like, but having clear expectations about
what happens if there's a critical bug (i.e., can we expect a patch or do
we need to upgrade major versions?) is very useful.

[1]: http://www.planetcassandra.org/blog/cassandra-2-2-3-0-and-beyond/
[2]: http://www.postgresql.org/support/versioning/


Re: [RELEASE] Apache Cassandra 3.1 released

2015-12-10 Thread Maciek Sakrejda
Thanks, Josh and Paulo--that's much clearer.


Re: [RELEASE] Apache Cassandra 3.1 released

2015-12-09 Thread Maciek Sakrejda
I'm still confused, even after reading the blog post twice (and reading the
linked Intel post). I understand what you are doing conceptually, but I'm
having a hard time mapping that to actual planned release numbers.

> The 3.0.2 will only contain bugfixes, while 3.2 will introduce new
> features.

Will 3.2 contain the bugfixes that are in 3.0.2 as well? Is 3.x.y just
3.0.x plus new stuff, where most of the time y is 0 unless there's a
really serious issue that needs fixing?


Re: UnknownColumnFamily exception / schema inconsistencies

2015-11-18 Thread Maciek Sakrejda
Just wanted to follow up and say thanks: I went through this process (as
per Robert's suggestion, with the node stopped and no refresh) on an
affected cluster and was able to resolve the issue.


Re: UnknownColumnFamily exception / schema inconsistencies

2015-11-13 Thread Maciek Sakrejda
Any advice on how to proceed here? Sebastian seems to have guessed
correctly at the underlying issue, but I'm still not sure how to resolve
this given what I see in the data directory and the catalogs.

On Wed, Nov 11, 2015 at 12:15 PM, Maciek Sakrejda <mac...@heroku.com> wrote:

> On Wed, Nov 11, 2015 at 9:55 AM, Sebastian Estevez <
> sebastian.este...@datastax.com> wrote:
>
>>> Stupid question, but how do I find the problem table? The error message
>>> complains about a keyspace (by uuid); I haven't seen errors relating to a
>>> specific table. I've poked around in the data directory, but I'm not sure
>>> what I'm looking for.
>>
>>
>> Is the message complaining about a *keyspace* or about *a table (cfid)*?
>> Your original message was complaining about a table:
>>
>>> at=IncomingTcpConnection.run UnknownColumnFamilyException reading from
>>> socket; closing org.apache.cassandra.db.UnknownColumnFamilyException:
>>> Couldn't find *cfId=3ecce750-84d3-11e5-bdd9-dd7717dcdbd5*
>>
>>
> Sorry, you're absolutely right--it's the table from this error message. I
> confused myself. But now I was able to find it:
>
> cursors-3ecce75084d311e5bdd9dd7717dcdbd5
> cursors-3ed23e8084d311e583b30fc0205655f5
>
> The second uuid is the one that shows up via the schema_columnfamilies
> query, but on two of the nodes, the directory with the *other* uuid exists.
> Can I just rename the directory on these two nodes? Or how should I proceed?
>


Re: UnknownColumnFamily exception / schema inconsistencies

2015-11-13 Thread Maciek Sakrejda
On Fri, Nov 13, 2015 at 9:56 AM, Sebastian Estevez <
sebastian.este...@datastax.com> wrote:

> I think you're just missing the steps in *Bold*:

Thanks, but I wasn't clear on what to do if the "new" directory does not
exist at all on some of the nodes (only the old). Can I just rename the
"old" to the "new" or is there more to it?


Re: UnknownColumnFamily exception / schema inconsistencies

2015-11-11 Thread Maciek Sakrejda
On Tue, Nov 10, 2015 at 3:20 PM, Sebastian Estevez <
sebastian.este...@datastax.com> wrote:

> #1 The cause of this problem is a CREATE TABLE statement collision. Do not
> generate tables dynamically from multiple clients, even with IF NOT EXISTS.
> First thing you need to do is fix your code so that this does not happen.
> Just create your tables manually from cqlsh allowing time for the schema to
> settle.
>
> #2 Here's the fix:
>
> 1) Change your code to not automatically re-create tables (even with IF NOT
> EXISTS).
>
> 2) Run a rolling restart to ensure schema matches across nodes. Run nodetool
> describecluster around your cluster. Check that there is only one schema
> version.

Thanks, that seems to have resolved the schema version inconsistency
(though I'm still getting the original error).

> ON EACH NODE:
>
> 3) Check your filesystem and see if you have two directories for the table in
> question in the data directory.

Stupid question, but how do I find the problem table? The error message
complains about a keyspace (by uuid); I haven't seen errors relating to a
specific table. I've poked around in the data directory, but I'm not sure
what I'm looking for.


Re: UnknownColumnFamily exception / schema inconsistencies

2015-11-11 Thread Maciek Sakrejda
On Wed, Nov 11, 2015 at 9:55 AM, Sebastian Estevez <
sebastian.este...@datastax.com> wrote:

>> Stupid question, but how do I find the problem table? The error message
>> complains about a keyspace (by uuid); I haven't seen errors relating to a
>> specific table. I've poked around in the data directory, but I'm not sure
>> what I'm looking for.
>
>
> Is the message complaining about a *keyspace* or about *a table (cfid)*?
> Your original message was complaining about a table:
>
>> at=IncomingTcpConnection.run UnknownColumnFamilyException reading from
>> socket; closing org.apache.cassandra.db.UnknownColumnFamilyException:
>> Couldn't find *cfId=3ecce750-84d3-11e5-bdd9-dd7717dcdbd5*
>
>
Sorry, you're absolutely right--it's the table from this error message. I
confused myself. But now I was able to find it:

cursors-3ecce75084d311e5bdd9dd7717dcdbd5
cursors-3ed23e8084d311e583b30fc0205655f5

The second uuid is the one that shows up via the schema_columnfamilies
query, but on two of the nodes, the directory with the *other* uuid exists.
Can I just rename the directory on these two nodes? Or how should I proceed?
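
For anyone else trying to locate the directory for a given cfId: judging by the two directory names above, the suffix is the cfId with the dashes removed. Here is a minimal sketch of that lookup. It assumes a `<data_dir>/<keyspace>/<table>-<hex id>` layout, and the function names are made up for illustration; it only finds directories and does not rename or modify anything.

```python
import os
import uuid


def cfid_to_dir_suffix(cf_id):
    """Return the cfId as 32 hex characters with the dashes removed."""
    return uuid.UUID(cf_id).hex


def find_table_dirs(data_dir, cf_id):
    """Return table directories under data_dir whose name ends with the cfId.

    Assumes the layout <data_dir>/<keyspace>/<table>-<hex id>.
    """
    suffix = "-" + cfid_to_dir_suffix(cf_id)
    matches = []
    for keyspace in sorted(os.listdir(data_dir)):
        ks_path = os.path.join(data_dir, keyspace)
        if not os.path.isdir(ks_path):
            continue
        for table_dir in sorted(os.listdir(ks_path)):
            if table_dir.endswith(suffix):
                matches.append(os.path.join(ks_path, table_dir))
    return matches
```

Running `find_table_dirs` on each node with the cfId from the error message shows which nodes still have the directory for that id.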


UnknownColumnFamily exception / schema inconsistencies

2015-11-10 Thread Maciek Sakrejda
Hello,

I've been having some strange issues with one of our test clusters
(4-day-old, 3-node, 2.1.10 cluster on AWS). I saw a number of messages like
the following:

[] 10 Nov 20:21:00.406 * pri=WARN  t=MessagingService-Incoming-/
192.168.168.202 at=IncomingTcpConnection.run UnknownColumnFamilyException
reading from socket; closing
org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find
cfId=3ecce750-84d3-11e5-bdd9-dd7717dcdbd5

A colleague suggested I run repair, but that failed with:

[2015-11-10 20:06:54,329] Nothing to repair for keyspace 'eventPipesState'
[2015-11-10 20:06:54,348] Starting repair command #1, repairing 768 ranges
for keyspace dbs8okvd7jcurj (parallelism=SEQUENTIAL, full=true)
[2015-11-10 20:06:55,599] Repair command #1 finished
[2015-11-10 20:06:55,610] Starting repair command #2, repairing 487 ranges
for keyspace context (parallelism=SEQUENTIAL, full=true)
[2015-11-10 20:11:21,213] Lost notification. You should check server log
for repair status of keyspace context
[2015-11-10 20:11:21,288] Lost notification. You should check server log
for repair status of keyspace context
Exception occurred during clean-up.
java.lang.reflect.UndeclaredThrowableException
error: JMX connection closed. You should check server log for repair status
of keyspace context(Subsequent keyspaces are not going to be repaired).
-- StackTrace --
java.io.IOException: JMX connection closed. You should check server log for
repair status of keyspace context(Subsequent keyspaces are not going to be
repaired).

I searched for other cases of similar issues, and found some posts (e.g.,
http://stackoverflow.com/questions/22783577/org-apache-cassandra-db-unknowncolumnfamilyexception-couldnt-find-cfid
), but nothing that seemed directly relevant. Still, I tried `nodetool
describecluster` and all the nodes showed up as being on the same schema
version.
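
In case it helps anyone script the same check, here is a minimal sketch. It assumes the `nodetool describecluster` output lists each schema version on its own line as `<uuid>: [<node addresses>]`; the function names are hypothetical, and the sample text in the usage note is illustrative rather than copied from this cluster.

```python
import re

# One schema version per line, e.g. "    <uuid>: [10.0.0.1, 10.0.0.2]"
# (assumed output format).
SCHEMA_LINE = re.compile(r"^\s*([0-9a-fA-F-]{36}):\s*\[(.*)\]\s*$")


def schema_versions(describecluster_output):
    """Map each schema version uuid to the list of nodes reporting it."""
    versions = {}
    for line in describecluster_output.splitlines():
        match = SCHEMA_LINE.match(line)
        if match:
            hosts = [h.strip() for h in match.group(2).split(",") if h.strip()]
            versions[match.group(1)] = hosts
    return versions


def schema_agrees(describecluster_output):
    """True when exactly one schema version is reported."""
    return len(schema_versions(describecluster_output)) == 1
```

For example, feeding it text containing the single line `86afa796-d883-3932-aa73-6b017cef0d19: [192.168.168.201, 192.168.168.202]` under `Schema versions:` makes `schema_agrees` return `True`; a second uuid line would make it return `False`.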

The server log did not include any more info. I asked about this on IRC and
got the suggestion to run `nodetool resetlocalschema`. I ran that, and it
completed (`nodetool describecluster` now shows this node as having a
different schema version from the other two nodes), but I still get the
original error in the server logs, and now also:

[] 10 Nov 22:51:10.466 * pri=ERROR t=Thrift:12
at=CustomTThreadPoolServer.run Error occurred during processing of message.
java.lang.IllegalArgumentException: Unknown keyspace/cf pair
(system_auth.credentials)

Further `nodetool repair`s on the same node do complete, but only seem to
process the `system` keyspace (and don't do anything with it):

[2015-11-10 22:38:07,415] Nothing to repair for keyspace 'system'

I also tried running `nodetool repair` from another node in the cluster,
but that just seems to hang:

[2015-11-10 22:53:11,830] Starting repair command #7, repairing 768 ranges
for keyspace dbs8okvd7jcurj (parallelism=SEQUENTIAL, full=true)
[2015-11-10 22:53:12,943] Repair command #7 finished
[2015-11-10 22:53:12,958] Starting repair command #8, repairing 534 ranges
for keyspace context (parallelism=SEQUENTIAL, full=true)

How can I restore this cluster? And ideally, how can I figure out what went
wrong here in the first place?


Re: UnknownColumnFamily exception / schema inconsistencies

2015-11-10 Thread Maciek Sakrejda
Oh and for what it's worth, I've also looked through the logs for this
node, and the oldest error in the logs seems to be:

[] 06 Nov 22:10:53.260 * pri=ERROR t=Thrift:16
at=CustomTThreadPoolServer.run Error occurred during processing of message.
java.lang.RuntimeException: java.util.concurrent.ExecutionException:
java.lang.RuntimeException:
org.apache.cassandra.exceptions.ConfigurationException: Column family ID
mismatch (found 3ed23e80-84d3-11e5-83b3-0fc0205655f5; expected
3ecce750-84d3-11e5-bdd9-dd7717dcdbd5)

Then the logs show a compaction, and then the UnknownColumnFamilyException
starts occurring.
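
To pull the two ids out of that message so they can be compared against the data directories, here is a minimal sketch. The function name is hypothetical, and the pattern is based only on the message text above, so it may not match other Cassandra versions.

```python
import re

# Pattern taken from the "Column family ID mismatch" text in the log above.
MISMATCH = re.compile(
    r"Column family ID mismatch "
    r"\(found ([0-9a-f-]{36}); expected ([0-9a-f-]{36})\)"
)


def parse_cf_id_mismatch(log_text):
    """Return {'found': ..., 'expected': ...} or None if no mismatch is logged."""
    # Collapse line wraps so a message split across lines still matches.
    flat = " ".join(log_text.split())
    match = MISMATCH.search(flat)
    if match is None:
        return None
    return {"found": match.group(1), "expected": match.group(2)}
```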


Re: Incremental repair from the get go

2015-11-02 Thread Maciek Sakrejda
Following up on this older question: as per the docs, one *should* still do
full repair periodically (the docs say weekly), right? And run incremental
more often to fill in?


Re: What is your backup strategy for Cassandra?

2015-09-18 Thread Maciek Sakrejda
On Thu, Sep 17, 2015 at 7:46 PM, Marc Tamsky wrote:

> This seems like an apt time to quote [1]:
>
> > Remember that you get 1 point for making a backup and 10,000 points for
> restoring one.
>
> Restoring from backups is my goal.
>
> The commonly recommended tools (tablesnap, cassandra_snapshotter) all seem
> to leave the restore operation as a pretty complicated exercise for the
> operator.
>
> Do any include a working way to restore, on a different host, all of node
> X's data from backups to the correct directories, such that the restored
> files are in the proper places and the node restart method [2] "just works"?
>

As someone getting started with Cassandra, I'm very much interested in this
as well. For the most part, folks seem to rely on replication and node
replacement to recover from failures, and perhaps this is a testament to
how well this works, but as long as we're hauling out aphorisms, "RAID is
not a backup" seems to (partially) apply here too.

I'd love to hear more about how the community does restores, too. This
isn't complaining about shoddy tooling: this is trying to understand--and
hopefully, in time, improve--the status quo re: disaster recovery. E.g.,
given that tableslurp operates on a single table at a time, do people
normally just restore single tables? Is that used when there's filesystem
or disk corruption? Bugs? Other issues? Looking forward to learning more.

Thanks,
Maciek


Re: Replacing dead node and cassandra.replace_address

2015-09-08 Thread Maciek Sakrejda
On Tue, Sep 8, 2015 at 11:14 AM, sai krishnam raju potturi <
pskraj...@gmail.com> wrote:

> Once the new node is bootstrapped, you could remove replacement_address
> from the env.sh file
>
Thanks, but how do I know when bootstrapping is completed?


Replacing dead node and cassandra.replace_address

2015-09-08 Thread Maciek Sakrejda
According to the docs [1], when replacing a Cassandra node, I should start
the replacement with cassandra.replace_address specified. Does that just
become part of the replacement node's startup configuration? Can I (or do I
have to) stop specifying it at some point? Does this affect subsequent node
restarts (whether intentional or due to a crash)?

I'm running Cassandra 2.1.

Thanks,
Maciek

[1]:
http://docs.datastax.com/en/cassandra/2.1/cassandra/operations/opsReplaceNode.html


tablesnap / tableslurp usage pointers?

2015-09-04 Thread Maciek Sakrejda
Hi,

I'm trying to use tablesnap [1] for disaster recovery backups, and while my
uploads seem to be working fine, I can't figure out how to run the
associated tableslurp tool for restores. If I pass the full S3 path to the
individual table to tableslurp, it will restore that table, but if I try to
pass the path to, e.g., the full keyspace, I get:

LookupError: Cannot find anything to restore from
my-bucket:my-prefix:/my-path

Based on the source [2], it seems to be only looking for `-listdir.json`
files in the same directory, but my directories in S3 only have
`-listdir.json` files for other *files*, not directories. I am running
tablesnap with the `--recursive` option.

Any ideas?

Thanks,
Maciek

[1]: https://github.com/JeremyGrosser/tablesnap
[2]:
https://github.com/JeremyGrosser/tablesnap/blob/master/tableslurp#L122-L124