Re: Issues, understanding how CQL works

2020-04-21 Thread Voytek Jarnot
As I learned the hard way (and as has already been implied), design your
tables to support your queries.

We have, for example, 9 tables storing the same data, because users wish to
query in different ways. Could be several more tables (if one was being a
purist), but indexes get us the rest of the way there.
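
On Marc's question 2 below specifically: the set of partition keys can
usually be enumerated without dragging row data through the coordinator.
A sketch against the posted schema (still a full-cluster scan, so run it
with paging enabled, as cqlsh does by default):

```
-- DISTINCT is only valid on partition key columns; this returns one row
-- per partition instead of reading every clustered row.
SELECT DISTINCT signalid, monthyear FROM tagdata.central;
```

That gives the (signalid, monthyear) pairs needed to build the restricted
queries, without "SELECT *" over 260 GB.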

On Tue, Apr 21, 2020 at 8:20 AM Marc Richter  wrote:

> Hi everyone,
>
> I'm very new to Cassandra. I have, however, some experience with SQL.
>
> I need to extract some information from a Cassandra database that has
> the following table definition:
>
> CREATE TABLE tagdata.central (
> signalid int,
> monthyear int,
> fromtime bigint,
> totime bigint,
> avg decimal,
> insertdate bigint,
> max decimal,
> min decimal,
> readings text,
> PRIMARY KEY (( signalid, monthyear ), fromtime, totime)
> )
>
> The database is already around 260 GB in size.
> I now need to know the most recent entry in it; the correct
> column to check is "insertdate".
>
> In SQL I would do something like this:
>
> SELECT insertdate FROM tagdata.central
> ORDER BY insertdate DESC LIMIT 1;
>
> In CQL, however, I just can't get it to work.
>
> What I have tried already is this:
>
> SELECT insertdate FROM "tagdata.central"
> ORDER BY insertdate DESC LIMIT 1;
>
> But this gives me an error:
> ERROR: ORDER BY is only supported when the partition key is restricted
> by an EQ or an IN.
>
> So, after some trial and error and a lot of Googling, I learned that I
> must restrict all columns of the PRIMARY KEY from left to right in my
> query. Thus, this is the "best" I can get to work:
>
>
> SELECT
> *
> FROM
> "tagdata.central"
> WHERE
> "signalid" = 4002
> AND "monthyear" = 201908
> ORDER BY
> "fromtime" DESC
> LIMIT 10;
>
>
> The "monthyear" values I crafted like a fool, incrementing the date
> one month at a time until no results could be found anymore.
> The "signalid" I grabbed from one of the unrestricted "SELECT * FROM" -
> query results. But these can't be as easily guessed as the "monthyear"
> values could.
>
> This is where I'm stuck!
>
> 1. This does not really feel like the ideal way to go. I think there is
> something more mature in modern IT systems. Can anyone tell me a
> better way to get this information?
>
> 2. I need a way to learn all values that are in the "monthyear" and
> "signalid" columns in order to be able to craft that query.
> How can I achieve that in a reasonable way? As I said: the DB is around
> 260 GB, which makes it next to impossible to just "have a look" at
> the output of "SELECT *".
>
> Thanks for your help!
>
> Best regards,
> Marc Richter
>
>
>
>


Re: How to elect a normal node to a seed node

2020-02-12 Thread Voytek Jarnot
>This means that from the client driver perspective when I define the
contact points I can specify any node in the cluster as contact point and
not necessary a seed node?

Correct.



On Wed, Feb 12, 2020 at 11:48 AM Sergio  wrote:

> So if
> 1) I stop a Cassandra node that doesn't have itself in its seeds IP
> list
> 2) I change the cassandra.yaml of this node and I add it to the seed list
> 3) I restart the node
>
> It will work completely fine and this is not even necessary.
>
> This means that from the client driver perspective when I define the
> contact points I can specify any node in the cluster as contact point and
> not necessary a seed node?
>
> Best,
>
> Sergio
>
>
> On Wed, Feb 12, 2020, 9:08 AM Arvinder Dhillon 
> wrote:
>
>> I believe seed nodes are not special nodes; you just choose a
>> few nodes from the cluster that help to bootstrap new joining nodes. You can
>> change cassandra.yaml to make any other node a seed node. There's nothing
>> like a promotion.
>>
>> -Arvinder
>>
>> On Wed, Feb 12, 2020, 8:37 AM Sergio  wrote:
>>
>>> Hi guys!
>>>
>>> Is there a way to promote a not seed node to a seed node?
>>>
>>> If yes, how do you do it?
>>>
>>> Thanks!
>>>
>>


Re: sstableloader: How much does it actually need?

2020-02-06 Thread Voytek Jarnot
Been thinking about it, and I can't really see how with 4 nodes and RF=3,
any 2 nodes would *not* have all the data; but am more than willing to
learn.
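
For what it's worth, the "any 2 of 4 nodes hold everything" intuition checks
out combinatorially, assuming each token range really is replicated on
exactly 3 distinct nodes (single rack, no transient replication):

```python
from itertools import combinations

nodes = {1, 2, 3, 4}
# With RF=3 on 4 nodes, every token range lives on some 3-node subset.
replica_sets = [set(rs) for rs in combinations(nodes, 3)]

# A replica set could only dodge a given pair of nodes by fitting entirely
# into the remaining 2 nodes -- impossible, since replica sets have 3 members.
pairs_cover_everything = all(
    set(pair) & rs for pair in combinations(nodes, 2) for rs in replica_sets
)
print(pairs_cover_everything)  # prints True
```

Of course this says nothing about the replicas being *in sync*; dropped
writes or unrepaired nodes would still leave the 2-node copy incomplete.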

On the other thing: that's an attractive option, but in our case, the
target cluster will likely come into use before the source-cluster data is
available to load. Seemed to me the safest approach was sstableloader.

Thanks

On Wed, Feb 5, 2020 at 6:56 PM Erick Ramirez  wrote:

> Unfortunately, there isn't a guarantee that 2 nodes alone will have the
> full copy of data. I'd rather not say "it depends". 
>
> TIP: If the nodes in the target cluster have identical tokens allocated,
> you can just do a straight copy of the sstables node-for-node then do nodetool
> refresh. If the target cluster is already built and you can't assign the
> same tokens then sstableloader is your only option. Cheers!
>
> P.S. No need to apologise for asking questions. That's what we're all here
> for. Just keep them coming. 
>
>>


sstableloader: How much does it actually need?

2020-02-05 Thread Voytek Jarnot
Scenario: Cassandra 3.11.x, 4 nodes, RF=3; moving to identically-sized
cluster via snapshots and sstableloader.

As far as I can tell, in the topology given above, any 2 nodes contain all
of the data. In terms of migrating this cluster, would there be any
downsides or risks with snapshotting and loading (sstableloader) only 2 of
the nodes rather than all 4?

Apologies for the spate of hypotheticals lately, this project is making
life interesting.

Thanks,
Voytek Jarnot


Re: [EXTERNAL] Re: sstableloader & num_tokens change

2020-01-27 Thread Voytek Jarnot
Odd. Have you seen this behavior? I ran a test last week, loaded snapshots
from 4 nodes to 4 nodes (RF 3 on both ends) and did not notice a spike.
That's not to say that it didn't happen, but I think I'd have noticed as I
was loading approx 250GB x 4 (although sequentially rather than 4x
sstableloader in parallel).

Also, thanks to everyone for confirming no issue with num_tokens and
sstableloader; appreciate it.


On Mon, Jan 27, 2020 at 9:02 AM Durity, Sean R 
wrote:

> I would suggest to be aware of potential data size expansion. If you load
> (for example) three copies of the data into a new cluster (because the RF
> of the origin cluster is 3), it will also get written to the RF of the new
> cluster (3 more times). So, you could see data expansion of 9x the original
> data size (or, origin RF * target RF), until compaction can run.
>
>
>
>
>
> Sean Durity – Staff Systems Engineer, Cassandra
>
>
>
> *From:* Erick Ramirez 
> *Sent:* Friday, January 24, 2020 11:03 PM
> *To:* user@cassandra.apache.org
> *Subject:* [EXTERNAL] Re: sstableloader & num_tokens change
>
>
>
>
>
> If I may just loop this back to the question at hand:
>
> I'm curious if there are any gotchas with using sstableloader to restore
> snapshots taken from 256-token nodes into a cluster with 32-token (or your
> preferred number of tokens) nodes (otherwise same # of nodes and same RF).
>
>
>
> No, there isn't. It will work as designed so you're good to go. Cheers!
>
>
>
>
>
>
>


Re: sstableloader & num_tokens change

2020-01-24 Thread Voytek Jarnot
If I may just loop this back to the question at hand:

I'm curious if there are any gotchas with using sstableloader to restore
snapshots taken from 256-token nodes into a cluster with 32-token (or your
preferred number of tokens) nodes (otherwise same # of nodes and same RF).

On Fri, Jan 24, 2020 at 11:15 AM Sergio  wrote:

> https://docs.datastax.com/en/dsbulk/doc/dsbulk/reference/dsbulkLoad.html
>
> Just skimming through the docs
>
> I see examples by loading from CSV / JSON
>
> Maybe there is some other command or doc page that I am missing
>
>
>
>
> On Fri, Jan 24, 2020, 9:10 AM Nitan Kainth  wrote:
>
>> DSBulk works the same as sstableloader.
>>
>>
>> Regards,
>>
>> Nitan
>>
>> Cell: 510 449 9629
>>
>> On Jan 24, 2020, at 10:40 AM, Sergio  wrote:
>>
>> 
>> I was wondering if that improvement for token allocation would work even
>> with just one rack. It should but I am not sure.
>>
>> Does Dsbulk support migration cluster to cluster without CSV or JSON
>> export?
>>
>> Thanks and Regards
>>
>> On Fri, Jan 24, 2020, 8:34 AM Nitan Kainth  wrote:
>>
>>> Instead of sstableloader consider dsbulk by datastax.
>>>
>>> On Fri, Jan 24, 2020 at 10:20 AM Reid Pinchback <
>>> rpinchb...@tripadvisor.com> wrote:
>>>
>>>> Jon Haddad has previously made the case for num_tokens=4.  His
>>>> Accelerate 2019 talk is available at:
>>>>
>>>>
>>>>
>>>> https://www.youtube.com/watch?v=swL7bCnolkU
>>>>
>>>>
>>>>
>>>> You might want to check that out.  Also I think the amount of effort
>>>> you put into evening out the token distribution increases as vnode count
>>>> shrinks.  The caveats are explored at:
>>>>
>>>>
>>>>
>>>>
>>>> https://thelastpickle.com/blog/2019/02/21/set-up-a-cluster-with-even-token-distribution.html
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> *From: *Voytek Jarnot 
>>>> *Reply-To: *"user@cassandra.apache.org" 
>>>> *Date: *Friday, January 24, 2020 at 10:39 AM
>>>> *To: *"user@cassandra.apache.org" 
>>>> *Subject: *sstableloader & num_tokens change
>>>>
>>>>
>>>>
>>>> *Message from External Sender*
>>>>
>>>> Running 3.11.x, 4 nodes RF=3, default 256 tokens; moving to a different
>>>> 4 node RF=3 cluster.
>>>>
>>>>
>>>>
>>>> I've read that 256 is not an optimal default num_tokens value, and that
>>>> 32 is likely a better option.
>>>>
>>>>
>>>>
>>>> We have the "opportunity" to switch, as we're migrating environments
>>>> and will likely be using sstableloader to do so. I'm curious if there are
>>>> any gotchas with using sstableloader to restore snapshots taken from
>>>> 256-token nodes into a cluster with 32-token nodes (otherwise same # of
>>>> nodes and same RF).
>>>>
>>>>
>>>>
>>>> Thanks in advance.
>>>>
>>>


Re: sstableloader & num_tokens change

2020-01-24 Thread Voytek Jarnot
Why? Seems to me that the old Cassandra -> CSV/JSON and CSV/JSON -> new
Cassandra are unnecessary steps in my case.

On Fri, Jan 24, 2020 at 10:34 AM Nitan Kainth  wrote:

> Instead of sstableloader consider dsbulk by datastax.
>
> On Fri, Jan 24, 2020 at 10:20 AM Reid Pinchback <
> rpinchb...@tripadvisor.com> wrote:
>
>> Jon Haddad has previously made the case for num_tokens=4.  His Accelerate
>> 2019 talk is available at:
>>
>>
>>
>> https://www.youtube.com/watch?v=swL7bCnolkU
>>
>>
>>
>> You might want to check that out.  Also I think the amount of effort you
>> put into evening out the token distribution increases as vnode count
>> shrinks.  The caveats are explored at:
>>
>>
>>
>>
>> https://thelastpickle.com/blog/2019/02/21/set-up-a-cluster-with-even-token-distribution.html
>>
>>
>>
>>
>>
>> *From: *Voytek Jarnot 
>> *Reply-To: *"user@cassandra.apache.org" 
>> *Date: *Friday, January 24, 2020 at 10:39 AM
>> *To: *"user@cassandra.apache.org" 
>> *Subject: *sstableloader & num_tokens change
>>
>>
>>
>> *Message from External Sender*
>>
>> Running 3.11.x, 4 nodes RF=3, default 256 tokens; moving to a different 4
>> node RF=3 cluster.
>>
>>
>>
>> I've read that 256 is not an optimal default num_tokens value, and that
>> 32 is likely a better option.
>>
>>
>>
>> We have the "opportunity" to switch, as we're migrating environments and
>> will likely be using sstableloader to do so. I'm curious if there are any
>> gotchas with using sstableloader to restore snapshots taken from 256-token
>> nodes into a cluster with 32-token nodes (otherwise same # of nodes and
>> same RF).
>>
>>
>>
>> Thanks in advance.
>>
>


sstableloader & num_tokens change

2020-01-24 Thread Voytek Jarnot
Running 3.11.x, 4 nodes RF=3, default 256 tokens; moving to a different 4
node RF=3 cluster.

I've read that 256 is not an optimal default num_tokens value, and that 32
is likely a better option.

We have the "opportunity" to switch, as we're migrating environments and
will likely be using sstableloader to do so. I'm curious if there are any
gotchas with using sstableloader to restore snapshots taken from 256-token
nodes into a cluster with 32-token nodes (otherwise same # of nodes and
same RF).

Thanks in advance.


Log output when Cassandra is "up"?

2020-01-08 Thread Voytek Jarnot
I need to know when Cassandra has finished initializing and is up & running.

Had some scripts which were looking through system.log for "No gossip
backlog; proceeding", but that turns out not to be 100% reliable.

Is looking for "Starting listening for CQL clients" considered definitive?
I.e., it is always output on success, and never on failure?

Thanks
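
For concreteness, the kind of check in question (message text as seen in
3.11; demonstrated here against a fabricated log line rather than a live
node, and the log path would normally be /var/log/cassandra/system.log):

```shell
# is_up: does the given log file contain the CQL-listen line?
is_up() { grep -q "Starting listening for CQL clients" "$1"; }

# Demo against a fabricated log line:
tmp=$(mktemp)
printf 'INFO  [main] Server.java - Starting listening for CQL clients on /0.0.0.0:9042\n' > "$tmp"
if is_up "$tmp"; then STATUS=up; else STATUS=down; fi
echo "$STATUS"
rm -f "$tmp"
```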


Re: nodetool rebuild on non-empty nodes?

2019-10-16 Thread Voytek Jarnot
Apologies for the bump, but I'm wondering if anyone has any thoughts on the
question below - specifically about running nodetool rebuild on a
destination that has data that does not exist in the source

Thanks.

On Wed, Sep 11, 2019 at 2:41 PM Voytek Jarnot 
wrote:

> Pardon the convoluted scenario, but we face some pretty ridiculous
> infrastructure restrictions.
>
> datacenter DC1: nodes containing many years of data written before
> 2019-09-01 (for example)
>
> datacenter DC2: nodes containing data written after 2019-09-01
>
> The idea is that these are independent clusters. We now connect them into
> a multi-DC cluster, and alter our keyspace to replicate to both DCs.
>
> What is the effect of running `nodetool rebuild -- DC1` on nodes in DC2? I
> know we'll get that historical DC1 data, but my concern is about the new
> data that had been written to the DC2 datacenter. Would the rebuild end up
> dropping our post 2019-09-01 data?
>
> Thanks,
> Voytek Jarnot
>


nodetool rebuild on non-empty nodes?

2019-09-11 Thread Voytek Jarnot
Pardon the convoluted scenario, but we face some pretty ridiculous
infrastructure restrictions.

datacenter DC1: nodes containing many years of data written before
2019-09-01 (for example)

datacenter DC2: nodes containing data written after 2019-09-01

The idea is that these are independent clusters. We now connect them into a
multi-DC cluster, and alter our keyspace to replicate to both DCs.

What is the effect of running `nodetool rebuild -- DC1` on nodes in DC2? I
know we'll get that historical DC1 data, but my concern is about the new
data that had been written to the DC2 datacenter. Would the rebuild end up
dropping our post 2019-09-01 data?

Thanks,
Voytek Jarnot


Re: Differing snitches in different datacenters

2019-07-31 Thread Voytek Jarnot
Thanks Paul. Yes - finding a definitive answer is where I'm failing as
well. I think we're probably going to try it and see what happens, but
that's a bit worrisome.

On Mon, Jul 29, 2019 at 3:35 PM Paul Chandler  wrote:

> Hi Voytek,
>
> I looked into this a little while ago, and couldn’t really find a
> definitive answer. We ended up keeping the GossipingPropertyFileSnitch in
> our GCP Datacenter, the only downside that I could see is that you have to
> manually specify the rack and DC. But doing it that way does allow you to
> create a multi vendor cluster if you wished in the future.
>
> I would also be interested if anyone has the definitive answer on this.
>
> Thanks
>
> Paul
> www.redshots.com
>
> On 29 Jul 2019, at 17:06, Voytek Jarnot  wrote:
>
> Just a quick bump - hoping someone can shed some light on whether running
> different snitches in different datacenters is a terrible idea or no. It'd
> be fairly temporary, once the new DC is stood up and nodes are rebuilt, the
> old DC will be decommissioned.
>
> On Thu, Jul 25, 2019 at 12:36 PM Voytek Jarnot 
> wrote:
>
>> Quick and hopefully easy question for the list. Background is existing
>> cluster (1 DC) will be migrated to AWS-hosted cluster via standing up a
>> second datacenter, existing cluster will be subsequently decommissioned.
>>
>> We currently use GossipingPropertyFileSnitch and are thinking about using
>> Ec2MultiRegionSnitch in the new AWS DC - that'd position us nicely if in
>> the future we want to run a multi-DC cluster in AWS. My question is: are
>> there any issues with one DC using GossipingPropertyFileSnitch and the
>> other using Ec2MultiRegionSnitch? This setup would be temporary, existing
>> until the new DC nodes have rebuilt and the old DC is decommissioned.
>>
>> Thanks,
>> Voytek Jarnot
>>
>
>


Re: Differing snitches in different datacenters

2019-07-29 Thread Voytek Jarnot
Just a quick bump - hoping someone can shed some light on whether running
different snitches in different datacenters is a terrible idea or no. It'd
be fairly temporary, once the new DC is stood up and nodes are rebuilt, the
old DC will be decommissioned.

On Thu, Jul 25, 2019 at 12:36 PM Voytek Jarnot 
wrote:

> Quick and hopefully easy question for the list. Background is existing
> cluster (1 DC) will be migrated to AWS-hosted cluster via standing up a
> second datacenter, existing cluster will be subsequently decommissioned.
>
> We currently use GossipingPropertyFileSnitch and are thinking about using
> Ec2MultiRegionSnitch in the new AWS DC - that'd position us nicely if in
> the future we want to run a multi-DC cluster in AWS. My question is: are
> there any issues with one DC using GossipingPropertyFileSnitch and the
> other using Ec2MultiRegionSnitch? This setup would be temporary, existing
> until the new DC nodes have rebuilt and the old DC is decommissioned.
>
> Thanks,
> Voytek Jarnot
>


Differing snitches in different datacenters

2019-07-25 Thread Voytek Jarnot
Quick and hopefully easy question for the list. Background is existing
cluster (1 DC) will be migrated to AWS-hosted cluster via standing up a
second datacenter, existing cluster will be subsequently decommissioned.

We currently use GossipingPropertyFileSnitch and are thinking about using
Ec2MultiRegionSnitch in the new AWS DC - that'd position us nicely if in
the future we want to run a multi-DC cluster in AWS. My question is: are
there any issues with one DC using GossipingPropertyFileSnitch and the
other using Ec2MultiRegionSnitch? This setup would be temporary, existing
until the new DC nodes have rebuilt and the old DC is decommissioned.

Thanks,
Voytek Jarnot


Re: Splitting 2-datacenter cluster into two clusters

2019-07-11 Thread Voytek Jarnot
Thank you for the very in-depth reply. Thinking more about it, I think in
my case I'm safe with keeping the cluster name.  It actually took a ton of
firewall work to get these DCs talking to each other in the first place, so
I'm not too concerned about undoing that and having future accidental
contact.

On Thu, Jul 11, 2019 at 10:23 AM Jeff Jirsa  wrote:

> Let's talk about the challenges, then talk about the strategy we'll use to
> do this.
>
> The logic cassandra uses to identify the rest of its cluster comes down
> to ~3 things:
> - Cluster name (in yaml and system.local)
> - Seeds in your seed provider (probably a list of IPs in the yaml)
> - The known peers in system.peers, which it will use to connect / discover
> on cassandra restarts (also used by clients to connect to the rest of the
> cluster, so it's important it's accurate)
>
> The cluster name is meant to be mostly immutable - we don't provide an
> obvious mechanism to change it, really, because if you change the yaml and
> restart, the database should (last I checked) fail to startup when the yaml
> doesn't match the data in system.local.
>
> Cassandra keeps a list of all of the other hosts it knows about in
> system.peers, and will attempt to reconnect to them as long as it's in
> system.peers or gossip.
>
> Most approaches to do this ignore the cluster name and rely on firewalls
> to separate the two DCs, then nodetool assassinate to get the IPs from
> gossip, but ultimately the two clusters have the same name, and if they
> EVER get reminded of the old IP, they'll re-join each other, and you'll be
> unhappy. For that reason, we probably want to change the cluster name in
> one side or the other to make sure we protect ourself.
>
> Strategy wise, I'd pick one cluster that can take downtime. It won't take
> much, but it'll be more than zero, then approach it with something like the
> following:
>
> - Firewall off the two clusters so they cant talk.
> - Figure out which cluster will keep the name (we'll call it 'old'), and
> one which will change the name ('new')
> - Push a yaml to old that removes all seeds in the 'new' dc
> - Push a yaml to new that removes all seeds in 'old' dc, and has a 'new'
> cluster name in it
> - Alter the schema in old to remove the 'new' dc in replication settings
> - Alter the schema in new to remove the 'old' dc in replication settings
> - In the 'new' hosts, change the cluster name in every single instance of
> system.local ( update system.local set cluster_name='new' where
> key='local'; ) and flush (nodetool flush) on every host
> - Restart the new hosts, they'll come up with a new cluster name, and at
> this point if the firewall is turned off, both clusters will TRY to talk to
> each other, but the different cluster names will prevent it.
> - At that point, you can nodetool removenode / nodetool assassinate the
> 'old' IPs in 'new' and the 'new' IPs in 'old'
> - Finally, check system.peers for any stray leftovers - there have been
> times when system.peers leaked data. Clean up anything that's wrong.
> - Then remove the firewall rules
>
> Obviously, you want to try this on a test cluster first.
>
>
>
> On Thu, Jul 11, 2019 at 8:03 AM Voytek Jarnot 
> wrote:
>
>> My google-fu is failing me this morning. I'm looking for any tips on
>> splitting a 2 DC cluster into two separate clusters. I see a lot of docs
>> about decomissioning a datacenter, but not much in the way of disconnecting
>> datacenters into individual clusters, but keeping each one as-is data-wise
>> (aside from replication factor, of course).
>>
>> Our setup is simple: two DCs (dc1 and dc2), two seed nodes (both in dc1;
>> yes, I know not the recommended config), one keyspace (besides the system
>> ones) replicated in both DCs. I'm trying to end up with two clusters with 1
>> DC in each.
>>
>> Would appreciate any input.
>>
>
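
Condensing the CQL portion of Jeff's steps for the 'new' side (keyspace and
names are placeholders; and, as he says, try it on a test cluster first):

```
-- after the firewall split, via cqlsh against the 'new' DC:
ALTER KEYSPACE my_keyspace
  WITH replication = {'class': 'NetworkTopologyStrategy', 'new_dc': 3};

-- on every 'new' node, followed by `nodetool flush` and a restart:
UPDATE system.local SET cluster_name = 'new_cluster' WHERE key = 'local';
```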


Re: Splitting 2-datacenter cluster into two clusters

2019-07-11 Thread Voytek Jarnot
Premature send, apologies.

At minimum, I see the following needing to happen:

dc2:
update cluster name in system.local
cassandra.yaml in dc2:
cluster_name: change to new cluster name
seeds: change to point at a couple of local nodes

system_auth, system_distributed, system_traces, and my keyspace replication
factors need altering

I guess the main issues I see are related to the timing of things. Seems
like I need to make sure the DCs are disconnected into separate clusters
before doing an ALTER KEYSPACE (since the ALTER will differ between dc1 and
dc2). That, and getting the nodes in each current DC to "forget" about the
other DC.


On Thu, Jul 11, 2019 at 10:03 AM Voytek Jarnot 
wrote:

> My google-fu is failing me this morning. I'm looking for any tips on
> splitting a 2 DC cluster into two separate clusters. I see a lot of docs
> about decomissioning a datacenter, but not much in the way of disconnecting
> datacenters into individual clusters, but keeping each one as-is data-wise
> (aside from replication factor, of course).
>
> Our setup is simple: two DCs (dc1 and dc2), two seed nodes (both in dc1;
> yes, I know not the recommended config), one keyspace (besides the system
> ones) replicated in both DCs. I'm trying to end up with two clusters with 1
> DC in each.
>
> Would appreciate any input.
>


Splitting 2-datacenter cluster into two clusters

2019-07-11 Thread Voytek Jarnot
My google-fu is failing me this morning. I'm looking for any tips on
splitting a 2 DC cluster into two separate clusters. I see a lot of docs
about decomissioning a datacenter, but not much in the way of disconnecting
datacenters into individual clusters, but keeping each one as-is data-wise
(aside from replication factor, of course).

Our setup is simple: two DCs (dc1 and dc2), two seed nodes (both in dc1;
yes, I know not the recommended config), one keyspace (besides the system
ones) replicated in both DCs. I'm trying to end up with two clusters with 1
DC in each.

Would appreciate any input.


Ec2MultiRegionSnitch difficulties (3.11.2)

2019-06-27 Thread Voytek Jarnot
Curious if anyone could shed some light on this. Trying to set up a 4-node,
one DC (for now, same region, same AZ, same VPC, etc) cluster in AWS.

All nodes have the following config (everything else basically standard):
cassandra.yaml:
  listen_address: NODE?_PRIVATE_IP
  seeds: "NODE1_ELASTIC_IP"
  endpoint_snitch: Ec2MultiRegionSnitch
cassandra-rackdc.properties:
  empty except prefer_local=true

I've tried setting
  broadcast_address: NODE?_ELASTIC_IP
But that didn't help - and it seems redundant, as it appears that that's
what the Ec2MultiRegionSnitch does anyway.

Node 1 starts up fine, because it's the seed. No other nodes will start,
reporting:
"Exception (java.lang.RuntimeException) encountered during startup: Unable
to gossip with any seeds"

Adding iptables rules to the nodes to translate outgoing packets with
destination of NODE?_ELASTIC_IP to destination of NODE?_PRIVATE_IP solves
the issue, but that seems like a hack.
(For Example: iptables -t nat -A OUTPUT -p tcp -d ELASTIC_IP -j DNAT
--to-destination PRIVATE_IP)

Not sure if I'm missing a config item, or something in AWS is blocking me,
or if 3.11.2 has an issue.

Thanks,
Voytek Jarnot


Re: Expanding from 1 to 2 datacenters

2019-06-26 Thread Voytek Jarnot
Expanding on number 4 below:

I'm not quite sure what the easiest course of action might be. Currently
the 4 existing nodes listen on private IPs and seeds are set to internal
DNS names. Is setting broadcast_address to their public IPs and restarting
a viable solution?

I'm trying to minimize changes to the existing DC, if that means Amazon
charges me for transferring data via public IPs, so be it.


On Wed, Jun 26, 2019, 10:19 AM Voytek Jarnot 
wrote:

> I started a higher-level thread years ago about moving a cluster by
> expanding from 1 to 2 datacenters, replicating over, then decommissioning
> the original DC. Corporate plans being what they are, we're finally getting
> into this; I'm largely following the writeup here:
> https://docs.datastax.com/en/archived/cassandra/3.0/cassandra/operations/opsAddDCToCluster.html
>  ,
> but have a few more-specific questions:
>
> current setup: 1 DC, 4 nodes, RF=3, 1 keyspace
> new DC will be 4 nodes as well, RF=3
>
> 1) We currently have 2 seed nodes, I'd like to confirm that the correct
> course is to make 1-2 (let's say 2) of the new-DC nodes seeds as well, and
> update all nodes in both DCs to point at all 4 seeds before I get into
> altering the keyspace.
>
> 2) Prior to altering the replication on my keyspace to include the new DC,
> I do not need/want to create the keyspace in the new DC, correct?
>
> 3) The datastax docs mention the auto_bootstrap=false setting, but don't
> go into much detail - I'm leaning toward setting it to false on all the new
> nodes, sound reasonable?
>
> 4) One of the three environments in which this'll happen is slightly more
> complicated due to the existing DC living in AWS, whereas the new DC will
> be in a different AZ. Do I need to get into switching
> from GossipingPropertyFileSnitch to Ec2MultiRegionSnitch? If so, could
> someone shed a bit of light on that process, and the associated changes
> needed for listen_address and broadcast_address?
>
> Thanks for getting this far,
> Voytek Jarnot
>


Expanding from 1 to 2 datacenters

2019-06-26 Thread Voytek Jarnot
I started a higher-level thread years ago about moving a cluster by
expanding from 1 to 2 datacenters, replicating over, then decommissioning
the original DC. Corporate plans being what they are, we're finally getting
into this; I'm largely following the writeup here:
https://docs.datastax.com/en/archived/cassandra/3.0/cassandra/operations/opsAddDCToCluster.html
,
but have a few more-specific questions:

current setup: 1 DC, 4 nodes, RF=3, 1 keyspace
new DC will be 4 nodes as well, RF=3

1) We currently have 2 seed nodes, I'd like to confirm that the correct
course is to make 1-2 (let's say 2) of the new-DC nodes seeds as well, and
update all nodes in both DCs to point at all 4 seeds before I get into
altering the keyspace.

2) Prior to altering the replication on my keyspace to include the new DC,
I do not need/want to create the keyspace in the new DC, correct?

3) The datastax docs mention the auto_bootstrap=false setting, but don't go
into much detail - I'm leaning toward setting it to false on all the new
nodes, sound reasonable?

4) One of the three environments in which this'll happen is slightly more
complicated due to the existing DC living in AWS, whereas the new DC will
be in a different AZ. Do I need to get into switching
from GossipingPropertyFileSnitch to Ec2MultiRegionSnitch? If so, could
someone shed a bit of light on that process, and the associated changes
needed for listen_address and broadcast_address?
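
For reference, the cassandra.yaml knobs the four questions above touch, for
one new-DC node (all values are placeholders, and note the seed list really
lives under seed_provider parameters in the actual file):

```yaml
cluster_name: 'existing_cluster_name'   # must match the old DC
seeds: "OLD_SEED_1,OLD_SEED_2,NEW_SEED_1,NEW_SEED_2"
listen_address: NODE_PRIVATE_IP
broadcast_address: NODE_PUBLIC_IP       # only if traffic must cross networks
endpoint_snitch: Ec2MultiRegionSnitch   # or keep GossipingPropertyFileSnitch
auto_bootstrap: false                   # stream data via nodetool rebuild instead
```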

Thanks for getting this far,
Voytek Jarnot


Re: high latency on one node after replacement

2018-03-27 Thread Voytek Jarnot
Have you ruled out EBS snapshot initialization issues (
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-initialize.html)?

On Tue, Mar 27, 2018 at 2:24 PM, Mike Torra  wrote:

> Hi There -
>
> I have noticed an issue where I consistently see high p999 read latency on
> a node for a few hours after replacing the node. Before replacing the node,
> the p999 read latency is ~30ms, but after it increases to 1-5s. I am
> running C* 3.11.2 in EC2.
>
> I am testing out using EBS snapshots of the /data disk as a backup, so
> that I can replace nodes without having to fully bootstrap the replacement.
> This seems to work ok, except for the latency issue. Some things I have
> noticed:
>
> - `nodetool netstats` doesn't show any 'Completed' Large Messages, only
> 'Dropped', while this is going on. There are only a few of these.
> - the logs show warnings like this:
>
> WARN  [PERIODIC-COMMIT-LOG-SYNCER] 2018-03-27 18:57:15,655
> NoSpamLogger.java:94 - Out of 84 commit log syncs over the past 297.28s
> with average duration of 235.88ms, 86 have exceeded the configured commit
> interval by an average of 113.66ms
>   and I can see some slow queries in debug.log, but I can't figure out
> what is causing it
> - gc seems normal
>
> Could this have something to do with starting the node with the EBS
> snapshot of the /data directory? My first thought was that this is related
> to the EBS volumes, but it seems too consistent to be actually caused by
> that. The problem is consistent across multiple replacements, and multiple
> EC2 regions.
>
> I appreciate any suggestions!
>
> - Mike
>


Migrating a cluster

2017-05-01 Thread Voytek Jarnot
Have a scenario where it's necessary to migrate a cluster to a different
set of hardware with minimal downtime. Setup is:

Current cluster: 4 nodes, RF 3
New cluster: 6 nodes, RF 3

My initial inclination is to follow this writeup on setting up the 6 new
nodes as a new DC:
https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsAddDCToCluster.html

Basically, set up new DC, nodetool rebuild on new nodes to instruct
Cassandra to migrate data, change client to hit new DC, kill original DC.

First question - is this the recommended way to migrate an in-use cluster
to new hardware?

Secondly, on the assumption that it is: That link gives the impression that
DC-aware clients will not hit the "remote" DC - is that the case for the
Java driver? We don't currently explicitly set PoolingOptions
ConnectionsPerHost for HostDistance.REMOTE to 0 - seems like that would be
an important thing to do?

Thank you.


Re: upgrade to Cassandra 3.0.12

2017-04-04 Thread Voytek Jarnot
Multiple versions of python can coexist; the cqlsh shell script will
attempt to execute via a python2.7 executable if it finds one.

On Tue, Apr 4, 2017 at 9:49 AM, Jacob Shadix  wrote:

> I've recently upgraded to 3.0.12 and unable to run CQLSH.
> No appropriate python interpreter found.
>
> The current python version installed is 2.6.6. I realize I need to upgrade
> to 2.7.12 at least, but I also cannot remove the 2.6.6 version. Are there
> any recommendations for installing a newer version of python alongside the
> older release?
>
> -- Jacob Shadix
>


Re: nodetool tablestats reporting local read count of 0, incorrectly

2017-04-03 Thread Voytek Jarnot
Continuing to grasp at straws...

Is it possible that indexing is modifying the read path such that the
tablestats/tablehistograms output is no longer trustworthy?  I notice more
realistic "local read count" numbers on tables which do not utilize SASI.

Would greatly appreciate any thoughts,
Thanks.

On Mon, Apr 3, 2017 at 9:56 AM, Voytek Jarnot <voytek.jar...@gmail.com>
wrote:

> Further info - tablehistograms reports zeros for all percentiles for Read
> Latency; tablestats also reports really low numbers for Bloom filter usage
> (3-4 KiB, depending on node, whereas I'd expect orders of magnitude more
> given other - less accessed - tables in this keyspace).  This is the most
> written-to and read-from table in the keyspace, seems to keep up with
> tracking of writes, but not reads.
>
> Full repair on this table is the only thing I can think of; but that's a
> guess and doesn't get me any closer to understanding what has happened.
>
> On Fri, Mar 31, 2017 at 11:11 PM, Voytek Jarnot <voytek.jar...@gmail.com>
> wrote:
>
>> Cassandra 3.9
>>
>> Have a keyspace with 5 tables, one of which is exhibiting rather poor
>> read performance. In starting an attempt to get to the bottom of the
>> issues, I noticed that, when running nodetool tablestats against the
>> keyspace, that particular table reports "Local read count: 0" on all nodes
>> - which is incorrect.
>>
>> It tallies "Local write count", presumably correctly, as at least it's
>> not 0. Other tables in the keyspace do not exhibit this behavior, as they
>> provide non-zero numbers for both read and write values.
>>
>> Is this perhaps indicative of a deeper issue with this particular table?
>>
>> Thank you.
>>
>
>


Re: nodetool tablestats reporting local read count of 0, incorrectly

2017-04-03 Thread Voytek Jarnot
Further info - tablehistograms reports zeros for all percentiles for Read
Latency; tablestats also reports really low numbers for Bloom filter usage
(3-4 KiB, depending on node, whereas I'd expect orders of magnitude more
given other - less accessed - tables in this keyspace).  This is the most
written-to and read-from table in the keyspace; it seems to keep up with
tracking of writes, but not reads.

Full repair on this table is the only thing I can think of; but that's a
guess and doesn't get me any closer to understanding what has happened.

On Fri, Mar 31, 2017 at 11:11 PM, Voytek Jarnot <voytek.jar...@gmail.com>
wrote:

> Cassandra 3.9
>
> Have a keyspace with 5 tables, one of which is exhibiting rather poor read
> performance. In starting an attempt to get to the bottom of the issues, I
> noticed that, when running nodetool tablestats against the keyspace, that
> particular table reports "Local read count: 0" on all nodes - which is
> incorrect.
>
> It tallies "Local write count", presumably correctly, as at least it's not
> 0. Other tables in the keyspace do not exhibit this behavior, as they
> provide non-zero numbers for both read and write values.
>
> Is this perhaps indicative of a deeper issue with this particular table?
>
> Thank you.
>


nodetool tablestats reporting local read count of 0, incorrectly

2017-03-31 Thread Voytek Jarnot
Cassandra 3.9

Have a keyspace with 5 tables, one of which is exhibiting rather poor read
performance. In starting an attempt to get to the bottom of the issues, I
noticed that, when running nodetool tablestats against the keyspace, that
particular table reports "Local read count: 0" on all nodes - which is
incorrect.

It tallies "Local write count", presumably correctly, as at least it's not
0. Other tables in the keyspace do not exhibit this behavior, as they
provide non-zero numbers for both read and write values.

Is this perhaps indicative of a deeper issue with this particular table?

Thank you.


Tracing output regarding number of sstables hit and am I chasing my tail

2017-03-31 Thread Voytek Jarnot
Was about to optimize some queries, given tracing output, but then saw
CASSANDRA-13120 (https://issues.apache.org/jira/browse/CASSANDRA-13120) and
am now wondering whether there's anything to be gained.

We have a table with a (quite simplified/sanitized) structure as such:

created_week_year int,
created_week int,
created_date date,
lots of value columns,
primary key((created_week_year, created_week), created_date, two others)

So, fundamentally time-series data, partitioned by calendar week.

If, for example, a user executes a query covering a 30-day timespan, we
split and parallelize the query by partition, so for a query from
2017-03-01 to 2017-03-31, we'll execute multiple queries as such:

select * from tab where created_week_year=2017 and created_week=13 and
created_date <= '2017-03-31' and created_date >= '2017-03-01';

select * from tab where created_week_year=2017 and created_week=12 and
created_date <= '2017-03-31' and created_date >= '2017-03-01'

and so forth, one query for every week partition.

Notice that we do not bother to narrow the created_date params to match the
created_week begin/end dates - the only thing that changes from query to
query is the created_week_year and created_week.

Performance isn't great in the current setup, and I was thinking a valid
optimization would be to change things such that in addition to specifying
a unique created_week_year and created_week parameter, we also calculate
the start-of-week date and end-of-week date client-side, as such:

select * from tab where created_week_year=2017 and created_week=13 and
created_date <= '2017-03-31' and created_date >= '2017-03-27'.

I did some tracing in cqlsh, and it does seem like this would help, fewer
sstables are merged in when the date ranges are more-specifically
restricted. However, CASSANDRA-13120 seems to indicate that that is simply
false tracing output, and there are no gains to be had from this
optimization.

Really looking for confirmation one way or another on this. The client-side
change to re-calculate the date range for each week partition is not all
that significant, but if there's nothing to be gained, then we don't need
to waste our time doing it.

Thank you.
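For what it's worth, the client-side narrowing is cheap to compute. A minimal sketch (assumptions: the function name is illustrative, and created_week is derived from ISO week numbering with Monday-start weeks — adjust if the partitioning uses a different week convention):

```python
from datetime import date, timedelta

def week_partition_queries(start, end):
    """Build one SELECT per ISO-week partition covering [start, end],
    narrowing the created_date bounds to each week's actual span instead
    of repeating the full range in every query."""
    queries = []
    # Rewind to the Monday that opens the first week in the range.
    cursor = start - timedelta(days=start.weekday())
    while cursor <= end:
        week_year, week_num, _ = cursor.isocalendar()
        week_end = cursor + timedelta(days=6)
        # Clamp this week's bounds to the overall requested range.
        lo, hi = max(cursor, start), min(week_end, end)
        queries.append(
            "SELECT * FROM tab WHERE created_week_year=%d AND created_week=%d"
            " AND created_date >= '%s' AND created_date <= '%s'"
            % (week_year, week_num, lo.isoformat(), hi.isoformat())
        )
        cursor = week_end + timedelta(days=1)
    return queries
```

Running this for the 2017-03-01 to 2017-03-31 example above yields five queries, the last one restricted to the week-13 partition with created_date >= '2017-03-27'.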


Re: Very odd & inconsistent results from SASI query

2017-03-20 Thread Voytek Jarnot
Apologies for the stream-of-consciousness replies, but are the dropped
message stats output by tpstats an accumulation since the node came up, or
are there processes which clear and/or time-out the info?

On Mon, Mar 20, 2017 at 3:18 PM, Voytek Jarnot <voytek.jar...@gmail.com>
wrote:

> No dropped messages in tpstats on any of the nodes.
>
> On Mon, Mar 20, 2017 at 3:11 PM, Voytek Jarnot <voytek.jar...@gmail.com>
> wrote:
>
>> Appreciate the reply, Kurt.
>>
>> I sanitized it out of the traces, but all trace outputs listed the same
>> node for all three queries (1 working, 2 not working). Read repair chance
>> set to 0.0 as recommended when using TWCS.
>>
>> I'll check tpstats - in this environment, load is not an issue, but
>> network issues may be.
>>
>> On Mon, Mar 20, 2017 at 2:42 PM, kurt greaves <k...@instaclustr.com>
>> wrote:
>>
> >>> As secondary indexes are stored individually on each node, what you're
> >>> suggesting sounds exactly like a consistency issue. The fact that you read
> >>> 0 cells on one query implies the node that got the query did not have any
> >>> data for the row. The reason you would sometimes see different behaviours
> >>> is likely because of read repairs. The fact that the repair fixes the
> >>> issue pretty much guarantees it's a consistency issue.
>>>
>>> You should check for dropped mutations in tpstats/logs and if they are
>>> occurring try and stop that from happening (probably load related). You
>>> could also try performing reads and writes at LOCAL_QUORUM for stronger
>>> consistency, however note this has a performance/latency impact.
>>>
>>>
>>>
>>
>


Re: Very odd & inconsistent results from SASI query

2017-03-20 Thread Voytek Jarnot
No dropped messages in tpstats on any of the nodes.

On Mon, Mar 20, 2017 at 3:11 PM, Voytek Jarnot <voytek.jar...@gmail.com>
wrote:

> Appreciate the reply, Kurt.
>
> I sanitized it out of the traces, but all trace outputs listed the same
> node for all three queries (1 working, 2 not working). Read repair chance
> set to 0.0 as recommended when using TWCS.
>
> I'll check tpstats - in this environment, load is not an issue, but
> network issues may be.
>
> On Mon, Mar 20, 2017 at 2:42 PM, kurt greaves <k...@instaclustr.com>
> wrote:
>
> >> As secondary indexes are stored individually on each node, what you're
> >> suggesting sounds exactly like a consistency issue. The fact that you read
> >> 0 cells on one query implies the node that got the query did not have any
> >> data for the row. The reason you would sometimes see different behaviours
> >> is likely because of read repairs. The fact that the repair fixes the
> >> issue pretty much guarantees it's a consistency issue.
>>
>> You should check for dropped mutations in tpstats/logs and if they are
>> occurring try and stop that from happening (probably load related). You
>> could also try performing reads and writes at LOCAL_QUORUM for stronger
>> consistency, however note this has a performance/latency impact.
>>
>>
>>
>


Re: Very odd & inconsistent results from SASI query

2017-03-20 Thread Voytek Jarnot
Appreciate the reply, Kurt.

I sanitized it out of the traces, but all trace outputs listed the same
node for all three queries (1 working, 2 not working). Read repair chance
set to 0.0 as recommended when using TWCS.

I'll check tpstats - in this environment, load is not an issue, but network
issues may be.

On Mon, Mar 20, 2017 at 2:42 PM, kurt greaves  wrote:

> As secondary indexes are stored individually on each node, what you're
> suggesting sounds exactly like a consistency issue. The fact that you read
> 0 cells on one query implies the node that got the query did not have any
> data for the row. The reason you would sometimes see different behaviours
> is likely because of read repairs. The fact that the repair fixes the
> issue pretty much guarantees it's a consistency issue.
>
> You should check for dropped mutations in tpstats/logs and if they are
> occurring try and stop that from happening (probably load related). You
> could also try performing reads and writes at LOCAL_QUORUM for stronger
> consistency, however note this has a performance/latency impact.
>
>
>


Re: Very odd & inconsistent results from SASI query

2017-03-17 Thread Voytek Jarnot
A wrinkle further confounds the issue: running a repair on the node which
was servicing the queries has cleared things up and all the queries now
work.

That doesn't make a whole lot of sense to me - my assumption was that a
repair shouldn't have fixed it.

On Fri, Mar 17, 2017 at 12:03 PM, Voytek Jarnot <voytek.jar...@gmail.com>
wrote:

> Cassandra 3.9, 4 nodes, rf=3
>
> Hi folks, we're seeing 0 results returned from queries that (a) should return
> results, and (b) will return results with minor tweaks.
>
> I've attached the sanitized trace outputs for the following 3 queries (pk1
> and pk2 are partition keys, ck1 is clustering key, val1 is SASI indexed
> non-key column):
>
> Q1: SELECT * FROM t1 WHERE pk1 = 2017 AND pk2 = 11  AND  ck1 >=
> '2017-03-16' AND ck1 <= '2017-03-17'  AND val1 LIKE 'abcdefgh%'  LIMIT 1001
> ALLOW FILTERING;
> Q1 works - it returns a list of records, one of which has
> val1='abcdefghijklmn'.
>
> Q2: SELECT * FROM t1 WHERE pk1 = 2017 AND pk2 = 11  AND  ck1 >=
> '2017-03-16' AND ck1 <= '2017-03-17'  AND val1 LIKE 'abcdefghi%'  LIMIT
> 1001 ALLOW FILTERING;
> Q2 does not work - 0 results returned. Only difference to Q1 is one
> additional character provided in LIKE comparison.
>
> Q3: SELECT * FROM t1 WHERE pk1 = 2017 AND pk2 = 11  AND  ck1 >=
> '2017-03-16' AND ck2 <= '2017-03-17'  AND val1 = 'abcdefghijklmn'  LIMIT
> 1001 ALLOW FILTERING;
> Q3 does not work - 0 results returned.
>
> As I've written above, the data set *does* include a record with
> val1='abcdefghijklmn'.
>
> Confounding the issue is that this behavior is inconsistent.  For
> different values of val1, I'll have scenarios where Q3 works, but Q1 and Q2
> do not. Now, that particular behavior I could explain with index/like
> problems, but it is Q3 that sometimes does not work and that's a simple
> equality comparison (although still using the index).
>
> Further confounding the issue is that if my testers run these same queries
> with the same parameters tomorrow, they're likely to work correctly.
>
> Only thing I've been able to glean from tracing execution is that the
> queries that work follow "Executing read..." with "Executing single
> partition query on t1" and so forth,  whereas the queries that don't work
> simply follow "Executing read..." with "Read 0 live and 0 tombstone cells"
> with no actual work seemingly done. But that's not helping me narrow this
> down much.
>
> Thanks for your time - appreciate any help.
>


Very odd & inconsistent results from SASI query

2017-03-17 Thread Voytek Jarnot
Cassandra 3.9, 4 nodes, rf=3

Hi folks, we're seeing 0 results returned from queries that (a) should return
results, and (b) will return results with minor tweaks.

I've attached the sanitized trace outputs for the following 3 queries (pk1
and pk2 are partition keys, ck1 is clustering key, val1 is SASI indexed
non-key column):

Q1: SELECT * FROM t1 WHERE pk1 = 2017 AND pk2 = 11  AND  ck1 >=
'2017-03-16' AND ck1 <= '2017-03-17'  AND val1 LIKE 'abcdefgh%'  LIMIT 1001
ALLOW FILTERING;
Q1 works - it returns a list of records, one of which has
val1='abcdefghijklmn'.

Q2: SELECT * FROM t1 WHERE pk1 = 2017 AND pk2 = 11  AND  ck1 >=
'2017-03-16' AND ck1 <= '2017-03-17'  AND val1 LIKE 'abcdefghi%'  LIMIT
1001 ALLOW FILTERING;
Q2 does not work - 0 results returned. Only difference to Q1 is one
additional character provided in LIKE comparison.

Q3: SELECT * FROM t1 WHERE pk1 = 2017 AND pk2 = 11  AND  ck1 >=
'2017-03-16' AND ck2 <= '2017-03-17'  AND val1 = 'abcdefghijklmn'  LIMIT
1001 ALLOW FILTERING;
Q3 does not work - 0 results returned.

As I've written above, the data set *does* include a record with
val1='abcdefghijklmn'.

Confounding the issue is that this behavior is inconsistent.  For different
values of val1, I'll have scenarios where Q3 works, but Q1 and Q2 do not.
Now, that particular behavior I could explain with index/like problems, but
it is Q3 that sometimes does not work and that's a simple equality
comparison (although still using the index).

Further confounding the issue is that if my testers run these same queries
with the same parameters tomorrow, they're likely to work correctly.

Only thing I've been able to glean from tracing execution is that the
queries that work follow "Executing read..." with "Executing single
partition query on t1" and so forth,  whereas the queries that don't work
simply follow "Executing read..." with "Read 0 live and 0 tombstone cells"
with no actual work seemingly done. But that's not helping me narrow this
down much.

Thanks for your time - appreciate any help.
Results found query (which includes the record where val1='abcdefghijklmn'):

 Parsing SELECT * FROM t1 WHERE pk1 = 2017 AND pk2 = 11 AND ck1 >= '2017-03-16' AND ck1 <= '2017-03-17' AND val1 LIKE 'abcdefgh%' LIMIT 1001 ALLOW FILTERING; [Native-Transport-Requests-1]
 Preparing statement [Native-Transport-Requests-1]
 Index mean cardinalities are idx_my_idx:-9223372036854775808. Scanning with idx_my_idx. [Native-Transport-Requests-1]
 Computing ranges to query [Native-Transport-Requests-1]
 Submitting range requests on 1 ranges with a concurrency of 1 (-1.08086395E16 rows per range expected) [Native-Transport-Requests-1]
 Submitted 1 concurrent range requests [Native-Transport-Requests-1]
 Executing read on keyspace.t1 using index idx_my_idx [ReadStage-2]
 Executing single-partition query on t1 [ReadStage-2]
 Acquiring sstable references [ReadStage-2]
 Key cache hit for sstable 2223 [ReadStage-2]
 Skipped 34/35 non-slice-intersecting sstables, included 1 due to tombstones [ReadStage-2]
 Key cache hit for sstable 2221 [ReadStage-2]
 Merged data from memtables and 2 sstables [ReadStage-2]
 Read 1 live and 0 tombstone cells [ReadStage-2]


Re: Queries execution time

2017-01-12 Thread Voytek Jarnot
We use QueryLogger, which is baked into the DataStax Java driver; it gives
you basic query execution times (and bind params) in your logs, and can be
tweaked using log levels.

On Thu, Jan 12, 2017 at 12:31 PM, Jonathan Haddad  wrote:

> You're likely to benefit a lot more if you log query times from your
> application, as you can customize the metadata that you add around logging
> to increase its relevancy.
>
> On Thu, Jan 12, 2017 at 10:24 AM Benjamin Roth 
> wrote:
>
>> Hi Salvatore,
>>
>> 1. Cassandra offers tons of metrics through JMX to monitor performance on
>> keyspace and CF level
>> 2. There is a config option to log slow queries; unfortunately JIRA is
>> currently down, so I can't find the ticket with more details
>>
>> 2017-01-12 19:21 GMT+01:00 D. Salvatore :
>>
>> Hi,
>> Does anyone know if there is a way to record in a log file the queries
>> total or partial execution time? I am interested in something similar to
>> the tracing option but on file.
>>
>> Thanks
>> Best Regards
>> Salvatore
>>
>>
>>
>>
>> --
>> Benjamin Roth
>> Prokurist
>>
>> Jaumo GmbH · www.jaumo.com
>> Wehrstraße 46 · 73035 Göppingen · Germany
>> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
>> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>>
>
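A minimal application-side sketch of the approach suggested above — wrap the query-executing call and log elapsed time plus whatever metadata you care about. All names here are illustrative, and the `execute` function is a stand-in, not the DataStax QueryLogger itself:

```python
import functools
import logging
import time

log = logging.getLogger("query.timing")

def timed_query(fn):
    """Wrap a query-executing function and log elapsed time plus the
    statement and bind params, even when the call raises."""
    @functools.wraps(fn)
    def wrapper(statement, *params):
        start = time.perf_counter()
        try:
            return fn(statement, *params)
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            log.info("query=%r params=%r took=%.2fms",
                     statement, params, elapsed_ms)
    return wrapper

@timed_query
def execute(statement, *params):
    # Stand-in for a real session.execute(...) driver call.
    return []
```

Because the logging happens in the application, you can attach request IDs, user context, or anything else — the customization advantage Jonathan mentions.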


Re: Odd behavior when querying using two SASI indexes

2017-01-05 Thread Voytek Jarnot
Opened https://issues.apache.org/jira/browse/CASSANDRA-13105 because it
does seem like this should work.

On Wed, Jan 4, 2017 at 5:53 PM, Voytek Jarnot <voytek.jar...@gmail.com>
wrote:

> Seeing queries return 0 rows incorrectly, running 3.9
>
> Setup:
>
> create table test1(id1 text PRIMARY KEY, val1 text, val2 text);
>
> create custom index test1_idx_val1 on test1(val1) using
> 'org.apache.cassandra.index.sasi.SASIIndex';
> create custom index test1_idx_val2 on test1(val2) using
> 'org.apache.cassandra.index.sasi.SASIIndex';
>
> insert into test1(id1, val1, val2) values ('1', '1val1', '1val2');
> insert into test1(id1, val1, val2) values ('2', '~~', '2val2');
>
> Queries:
>
> (1) select * from test1 where val1 = '~~';
> (2) select * from test1 where val1 < '~~' allow filtering;
> (3) select * from test1 where val2 = '1val2';
> (4) select * from test1 where val1 < '~~' and val2 = '1val2' allow
> filtering;
>
> 1, 2, and 3 all work correctly.  4 does not work.  2, 3, and 4 should
> return the same row (id1='1'); 2 and 3 do, 4 returns 0 rows.
>
> Weird, because you'd think that if 2 works fine, then 4 ought to as well.
>
> Anyone else run into this?  Not sure if I'm breaking a where-clause rule,
> or if I'm running into a bug.
>
> Thanks.
>


Odd behavior when querying using two SASI indexes

2017-01-04 Thread Voytek Jarnot
Seeing queries return 0 rows incorrectly, running 3.9

Setup:

create table test1(id1 text PRIMARY KEY, val1 text, val2 text);

create custom index test1_idx_val1 on test1(val1) using
'org.apache.cassandra.index.sasi.SASIIndex';
create custom index test1_idx_val2 on test1(val2) using
'org.apache.cassandra.index.sasi.SASIIndex';

insert into test1(id1, val1, val2) values ('1', '1val1', '1val2');
insert into test1(id1, val1, val2) values ('2', '~~', '2val2');

Queries:

(1) select * from test1 where val1 = '~~';
(2) select * from test1 where val1 < '~~' allow filtering;
(3) select * from test1 where val2 = '1val2';
(4) select * from test1 where val1 < '~~' and val2 = '1val2' allow
filtering;

1, 2, and 3 all work correctly.  4 does not work.  2, 3, and 4 should
return the same row (id1='1'); 2 and 3 do, 4 returns 0 rows.

Weird, because you'd think that if 2 works fine, then 4 ought to as well.

Anyone else run into this?  Not sure if I'm breaking a where-clause rule,
or if I'm running into a bug.

Thanks.


Re: Read efficiency question

2016-12-30 Thread Voytek Jarnot
Thank you Janne.  Yes, these are random-access (scatter) reads - I've
decided on option 1; having also considered (as you wrote) that it will
never make sense to look at ranges of key3.

On Fri, Dec 30, 2016 at 3:40 AM, Janne Jalkanen <janne.jalka...@ecyrd.com>
wrote:

> In practice, the performance you’re getting is likely to be impacted by
> your reading patterns.  If you do a lot of sequential reads where key1 and
> key2 stay the same, and only key3 varies, then you may be getting better
> peformance out of the second option due to hitting the row and disk caches
> more often. If you are doing a lot of scatter reads, then you’re likely to
> get better performance out of the first option, because the reads will be
> distributed more evenly to multiple nodes.  It also depends on how large
> rows you’re planning to use, as this will directly impact things like
> compaction which has an overall impact of the entire cluster speed.  For
> just a few values of key3, I doubt there would be much difference in
> performance, but if key3 has a cardinality of say, a million, you might be
> better off with option 1.
>
> As always the advice is - benchmark your intended use case - put a few
> hundred gigs of mock data to a cluster, trigger compactions and do perf
> tests for different kinds of read/write loads. :-)
>
> (Though if I didn’t know what my read pattern would be, I’d probably go
> for option 1 purely on a gut feeling if I was sure I would never need range
> queries on key3; shorter rows *usually* are a bit better for performance,
> compaction, etc.  Really wide rows can sometimes be a headache
> operationally.)
>
> May you have energy and success!
> /Janne
>
>
>
> On 28 Dec 2016, at 16:44, Manoj Khangaonkar <khangaon...@gmail.com> wrote:
>
> In the first case, the partitioning is based on key1,key2,key3.
>
> In the second case, partitioning is based on key1 , key2. Additionally you
> have a clustered key key3. This means within a partition you can do range
> queries on key3 efficiently. That is the difference.
>
> regards
>
> On Tue, Dec 27, 2016 at 7:42 AM, Voytek Jarnot <voytek.jar...@gmail.com>
> wrote:
>
>> Wondering if there's a difference when querying by primary key between
>> the two definitions below:
>>
>> primary key ((key1, key2, key3))
>> primary key ((key1, key2), key3)
>>
>> In terms of read speed/efficiency... I don't have much of a reason
>> otherwise to prefer one setup over the other, so would prefer the most
>> efficient for querying.
>>
>> Thanks.
>>
>
>
>
> --
> http://khangaonkar.blogspot.com/
>
>
>


Re: Insert with both TTL and timestamp behavior

2016-12-28 Thread Voytek Jarnot
>It's not clear to me why for your use case you would want to manipulate
the timestamps as you're loading the records unless you're concerned about
conflicting writes getting applied in the correct order.

Simple use-case: want to load historical data, want to use TWCS, want to
use TTL.

Scenario:
Importing data using standard write path (inserts)
Using timestamp to give TWCS something to work with (import records contain
a created-on timestamp from which I populate "using timestamp")
Need records to expire according to TTL
Don't want to calculate TTL for every insert individually (obviously what I
want and what I get differ)
I'm importing in chrono order, so TWCS should be able to keep things from
getting out of hand.

>I think in general timestamp manipulation is *caveat utilitor*.

Yeah; although I'd probably choose stronger words. TWCS (and perhaps DTCS?)
appears to treat writetimes as timestamps; the rest of Cassandra appears to
treat them as integers.


On Wed, Dec 28, 2016 at 2:50 PM, Eric Stevens <migh...@gmail.com> wrote:

> The purpose of timestamps is to guarantee out-of-order conflicting writes
> are resolved as last-write-wins.  Cassandra doesn't really expect you to be
> writing timestamps with wide variations from record to record.  Indeed, if
> you're doing this, it'll violate some of the assumptions in places such as
> time windowed / date tiered compaction.  It's possible to dodge those
> landmines but it would be hard to know if you got it wrong.
>
> I think in general timestamp manipulation is *caveat utilitor*.  It's not
> clear to me why for your use case you would want to manipulate the
> timestamps as you're loading the records unless you're concerned about
> conflicting writes getting applied in the correct order.
>
> Probably worth a footnote in the documentation indicating that if you're
> doing both USING TTL and WITH TIMESTAMP that those don't relate to each
> other.  At rest TTL'd records get written with an expiration timestamp, not
> a delta from the writetime.
>
> On Wed, Dec 28, 2016 at 9:38 AM Voytek Jarnot <voytek.jar...@gmail.com>
> wrote:
>
>> It appears that, when inserting with "using ttl [foo] and timestamp
>> [bar]", the TTL does not take the provided timestamp into account.
>>
>> In other words, the TTL starts at insert time, not at the time specified
>> by the timestamp.
>>
>> Similarly, if inserting with just "using timestamp [bar]" and relying on
>> the table's default_time_to_live property, the timestamp is again ignored
>> in terms of TTL expiration.
>>
>> Seems like a bug to me, but I'm guessing this is intended behavior?
>>
>> Use-case is importing data (some of it historical) and setting the
>> timestamp manually (based on a timestamp within the data itself). Anyone
>> familiar with any work-arounds that don't rely on calculating a TTL
>> client-side for each record?
>>
>


Insert with both TTL and timestamp behavior

2016-12-28 Thread Voytek Jarnot
It appears that, when inserting with "using ttl [foo] and timestamp
[bar]", the TTL does not take the provided timestamp into account.

In other words, the TTL starts at insert time, not at the time specified by
the timestamp.

Similarly, if inserting with just "using timestamp [bar]" and relying on
the table's default_time_to_live property, the timestamp is again ignored
in terms of TTL expiration.

Seems like a bug to me, but I'm guessing this is intended behavior?

Use-case is importing data (some of it historical) and setting the
timestamp manually (based on a timestamp within the data itself). Anyone
familiar with any work-arounds that don't rely on calculating a TTL
client-side for each record?
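For reference, the client-side workaround amounts to shrinking the TTL by the age the record has already accrued before inserting. A minimal sketch (the 180-day lifetime, function names, and epoch-seconds convention are assumptions for illustration):

```python
import time

# Intended lifetime measured from the record's own timestamp (assumption).
FULL_TTL_S = 180 * 24 * 3600

def adjusted_ttl(record_ts_s, now_s=None, full_ttl_s=FULL_TTL_S):
    """TTL counts down from insert time, not from USING TIMESTAMP, so
    subtract the age the historical record has already accrued.  Returns
    None when the record is already past its intended expiry (skip it)."""
    now_s = time.time() if now_s is None else now_s
    remaining = full_ttl_s - int(now_s - record_ts_s)
    return remaining if remaining > 0 else None
```

The insert then becomes `... USING TTL <remaining> AND TIMESTAMP <record timestamp in microseconds>`, with already-expired records skipped entirely.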


Re: Read efficiency question

2016-12-27 Thread Voytek Jarnot
Thank you Oskar.  I think you may be missing the double parentheses in the
first example - the difference is between a partition key of (key1, key2,
key3) and one of (key1, key2).  With that in mind, I believe your answer
would be that the first example is more efficient?

Is this essentially a case of the coordinator node being able to exactly
pinpoint a row (first example) vs the coordinator node pinpointing the
partition and letting the partition-owning node refine down to the right
row using the clustering key (key3 in the second example)?

On Tue, Dec 27, 2016 at 10:06 AM, Oskar Kjellin <oskar.kjel...@gmail.com>
wrote:

> The second one will be the most efficient.
> How much depends on how unique key1 is.
>
> In the first case everything for the same key1 will be on the same
> partition.  If it's not unique at all that will be very bad.
> In the second case the combo of key1 and key2 will decide what partition.
>
> If you don't ever have to find all key2 for a given key1 I don't see any
> reason to do case 1
>
>
> > On 27 Dec 2016, at 16:42, Voytek Jarnot <voytek.jar...@gmail.com> wrote:
> >
> > Wondering if there's a difference when querying by primary key between
> the two definitions below:
> >
> > primary key ((key1, key2, key3))
> > primary key ((key1, key2), key3)
> >
> > In terms of read speed/efficiency... I don't have much of a reason
> otherwise to prefer one setup over the other, so would prefer the most
> efficient for querying.
> >
> > Thanks.
>


Read efficiency question

2016-12-27 Thread Voytek Jarnot
Wondering if there's a difference when querying by primary key between the
two definitions below:

primary key ((key1, key2, key3))
primary key ((key1, key2), key3)

In terms of read speed/efficiency... I don't have much of a reason
otherwise to prefer one setup over the other, so would prefer the most
efficient for querying.

Thanks.
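To make the trade-off discussed in the replies concrete, here's a toy in-memory model of the two layouts (purely illustrative — dicts standing in for partitions, not actual Cassandra behavior): option 1 can only be read by supplying the full triple, while option 2 keeps key3 sorted within a partition and therefore supports range slices on it.

```python
import bisect
from collections import defaultdict

# Option 1: PRIMARY KEY ((key1, key2, key3)) -- the whole triple is the
# partition key, so a read must supply all three values.
option1 = {}

# Option 2: PRIMARY KEY ((key1, key2), key3) -- (key1, key2) picks the
# partition and key3 is a clustering column kept sorted inside it.
option2 = defaultdict(list)

def insert(k1, k2, k3, row):
    option1[(k1, k2, k3)] = row
    bisect.insort(option2[(k1, k2)], (k3, row))

def range_query_option2(k1, k2, k3_lo, k3_hi):
    """Slice the clustering column within one partition -- the kind of
    query only option 2 supports."""
    part = option2[(k1, k2)]
    keys = [k3 for k3, _ in part]
    lo = bisect.bisect_left(keys, k3_lo)
    hi = bisect.bisect_right(keys, k3_hi)
    return [row for _, row in part[lo:hi]]
```

If you will only ever look up exact (key1, key2, key3) triples, option 1 spreads rows more evenly and the coordinator can pinpoint a single row; option 2 only earns its keep when key3 range queries are needed.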


Re: Not timing out some queries (Java driver)

2016-12-21 Thread Voytek Jarnot
cassandra.yaml has various timeouts such as read_request_timeout,
range_request_timeout, write_request_timeout, etc.  The driver does as well
(via Cluster -> Configuration -> SocketOptions -> setReadTimeoutMillis).

Not sure if you can (or would want to) set them to "forever", but it's a
starting point.

On Wed, Dec 21, 2016 at 7:10 PM, Ali Akhtar  wrote:

> I have some queries which need to be processed in a consistent manner. I'm
> setting the consistently level = ALL option on these queries.
>
> However, I've noticed that sometimes these queries fail because of a
> timeout (2 seconds).
>
> In my use case, for certain queries, I want them to never time out and
> block until they have been acknowledged by all nodes.
>
> Is that possible thru the Datastax Java driver, or another way?
>


Re: Why does Cassandra recommends Oracle JVM instead of OpenJDK?

2016-12-21 Thread Voytek Jarnot
Reading that article, the only conclusion I can reach (unless I'm
misreading) is that all the stuff that was never free is still not free -
the change is that Oracle may actually be interested in the fact that some
are using non-free products for free.

Pretty much a non-story, it seems like.

On Tue, Dec 20, 2016 at 11:55 PM, Kant Kodali  wrote:

> Looking at this http://www.theregister.co.uk/2016/12/16/oracle_
> targets_java_users_non_compliance/?mt=1481919461669 I don't know why
> Cassandra recommends Oracle JVM?
>
> JVM is a great piece of software but I would like to stay away from Oracle
> as much as possible. Oracle is just horrible the way they are dealing with
> Java in General.
>
>
>


Re: Choosing a compaction strategy (TWCS)

2016-12-21 Thread Voytek Jarnot
Just want to bump this thread if possible... having trouble ferreting out
the specifics of TWCS configuration, and Google's not being particularly
helpful.

If tombstone compactions are disabled by default in TWCS, does one enable
them by setting values for tombstone_compaction_interval and
tombstone_threshold?  Or was I off - is there more to it?
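A rough sketch of what setting those subproperties looks like (hypothetical keyspace and table name; the 80% ratio and 24-hour interval are the example values Jeff mentions downthread):

```sql
-- Hypothetical table. Enables single-sstable tombstone compactions in TWCS:
-- an sstable at least 86400s old that is >= 80% tombstones becomes eligible.
ALTER TABLE ks.events
  WITH compaction = {
    'class': 'TimeWindowCompactionStrategy',
    'compaction_window_unit': 'DAYS',
    'compaction_window_size': '7',
    'tombstone_threshold': '0.8',
    'tombstone_compaction_interval': '86400',
    'unchecked_tombstone_compaction': 'true'  -- optional: skip the overlap pre-check
  };
```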



On Sat, Dec 17, 2016 at 11:08 AM, Voytek Jarnot <voytek.jar...@gmail.com>
wrote:

> Thanks again.
>
> I swear I'd look this up instead, but my google-fu is failing me
> completely ... That said, I presume that they're enabled by setting values
> for tombstone_compaction_interval and tombstone_threshold?  Or is there
> more to it?
>
> On Fri, Dec 16, 2016 at 10:41 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com>
> wrote:
>
>> With the caveat that tombstone compactions are disabled by default in
>> TWCS (and DTCS)
>>
>> --
>> Jeff Jirsa
>>
>>
>> On Dec 16, 2016, at 8:34 PM, Voytek Jarnot <voytek.jar...@gmail.com>
>> wrote:
>>
>> Gotcha.  "never compacted" has an implicit asterisk referencing
>> tombstone_compaction_interval and tombstone_threshold, sounds like.  More
>> of a "never compacted" via strategy selection, but eligible for
>> tombstone-triggered compaction.
>>
>> On Fri, Dec 16, 2016 at 10:07 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com>
>> wrote:
>>
>>> Tombstone compaction subproperties can handle tombstone removal for you
>>> (you’ll set a ratio of tombstones worth compacting away – for example, 80%,
>>> and set an interval to prevent continuous compaction – for example, 24
>>> hours, and then anytime there’s no other work to do, if there’s an sstable
>>> over 24 hours old that’s at least 80% tombstones, it’ll compact it in a
>>> single sstable compaction).
>>>
>>>
>>>
>>> -  Jeff
>>>
>>>
>>>
>>> *From: *Voytek Jarnot <voytek.jar...@gmail.com>
>>> *Reply-To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
>>> *Date: *Friday, December 16, 2016 at 7:34 PM
>>>
>>> *To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
>>> *Subject: *Re: Choosing a compaction strategy (TWCS)
>>>
>>>
>>>
>>> Thanks again, Jeff.
>>>
>>>
>>>
>>> Thinking about this some more, I'm wondering if I'm overthinking or if
>>> there's a potential issue:
>>>
>>>
>>>
>>> If my compaction_window_size is 7 (DAYS), and I've got TTLs of 7 days on
>>> some (relatively small percentage) of my records - am I going to be leaving
>>> tombstones around all over the place?  My noob-read on this is that TWCS
>>> will not compact tables comprised of records older than 7 days (
>>> https://docs.datastax.com/en/cassandra/3.x/cassandra/dml/dmlHowDataMaintain.html#dmlHowDataMaintain__twcs),
>>> but Cassandra will not evict my tombstones until 7 days + consideration for
>>> gc_grace_seconds have passed ... resulting in no tombstone removal (?).
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Fri, Dec 16, 2016 at 1:17 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com>
>>> wrote:
>>>
>>> The issue is that your partitions will likely be in 2 sstables instead
>>> of “theoretically” 1. In practice, they’re probably going to bleed into 2
>>> anyway (memTable flush to sstable isn’t going to happen exactly when the
>>> window expires, so it’ll bleed a bit anyway), so I bet no meaningful impact.
>>>
>>>
>>>
>>> -  Jeff
>>>
>>>
>>>
>>> *From: *Voytek Jarnot <voytek.jar...@gmail.com>
>>> *Reply-To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
>>> *Date: *Friday, December 16, 2016 at 11:12 AM
>>>
>>>
>>> *To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
>>> *Subject: *Re: Choosing a compaction strategy (TWCS)
>>>
>>>
>>>
>>> Thank you Jeff - always nice to hear straight from the source.
>>>
>>>
>

Re: Choosing a compaction strategy (TWCS)

2016-12-17 Thread Voytek Jarnot
Thanks again.

I swear I'd look this up instead, but my google-fu is failing me completely
... That said, I presume that they're enabled by setting values for
tombstone_compaction_interval
and tombstone_threshold?  Or is there more to it?

On Fri, Dec 16, 2016 at 10:41 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com>
wrote:

> With the caveat that tombstone compactions are disabled by default in TWCS
> (and DTCS)
>
> --
> Jeff Jirsa
>
>
> On Dec 16, 2016, at 8:34 PM, Voytek Jarnot <voytek.jar...@gmail.com>
> wrote:
>
> Gotcha.  "never compacted" has an implicit asterisk referencing
> tombstone_compaction_interval and tombstone_threshold, sounds like.  More
> of a "never compacted" via strategy selection, but eligible for
> tombstone-triggered compaction.
>
> On Fri, Dec 16, 2016 at 10:07 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com>
> wrote:
>
>> Tombstone compaction subproperties can handle tombstone removal for you
>> (you’ll set a ratio of tombstones worth compacting away – for example, 80%,
>> and set an interval to prevent continuous compaction – for example, 24
>> hours, and then anytime there’s no other work to do, if there’s an sstable
>> over 24 hours old that’s at least 80% tombstones, it’ll compact it in a
>> single sstable compaction).
>>
>>
>>
>> -  Jeff
>>
>>
>>
>> *From: *Voytek Jarnot <voytek.jar...@gmail.com>
>> *Reply-To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
>> *Date: *Friday, December 16, 2016 at 7:34 PM
>>
>> *To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
>> *Subject: *Re: Choosing a compaction strategy (TWCS)
>>
>>
>>
>> Thanks again, Jeff.
>>
>>
>>
>> Thinking about this some more, I'm wondering if I'm overthinking or if
>> there's a potential issue:
>>
>>
>>
>> If my compaction_window_size is 7 (DAYS), and I've got TTLs of 7 days on
>> some (relatively small percentage) of my records - am I going to be leaving
>> tombstones around all over the place?  My noob-read on this is that TWCS
>> will not compact tables comprised of records older than 7 days (
>> https://docs.datastax.com/en/cassandra/3.x/cassandra/dml/dmlHowDataMaintain.html#dmlHowDataMaintain__twcs),
>> but Cassandra will not evict my tombstones until 7 days + consideration for
>> gc_grace_seconds have passed ... resulting in no tombstone removal (?).
>>
>>
>>
>>
>>
>>
>>
>> On Fri, Dec 16, 2016 at 1:17 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com>
>> wrote:
>>
>> The issue is that your partitions will likely be in 2 sstables instead of
>> “theoretically” 1. In practice, they’re probably going to bleed into 2
>> anyway (memTable flush to sstable isn’t going to happen exactly when the
>> window expires, so it’ll bleed a bit anyway), so I bet no meaningful impact.
>>
>>
>>
>> -  Jeff
>>
>>
>>
>> *From: *Voytek Jarnot <voytek.jar...@gmail.com>
>> *Reply-To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
>> *Date: *Friday, December 16, 2016 at 11:12 AM
>>
>>
>> *To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
>> *Subject: *Re: Choosing a compaction strategy (TWCS)
>>
>>
>>
>> Thank you Jeff - always nice to hear straight from the source.
>>
>>
>>
>> Any issues you can see with 3 (my calendar-week bucket not aligning with
>> the arbitrary 7-day window)? Or am I confused (I'd put money on this
>> option, but I've been wrong once or twice before)?
>>
>>
>>
>> On Fri, Dec 16, 2016 at 12:50 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com>
>> wrote:
>>
>> I skipped over the more important question  - loading data in. Two
>> options:
>>
>> 1)   Load data in order through the normal writepath and use “USING
>> TIMESTAMP” to set the timestamp, or
>>
>> 2)   Use CQLSSTableWriter and “USING TIMESTAMP” to create sstables,
>> then sstableloader them into the cluster.
>>
>>
>>
>> Either way, try not to mix writes of old data and new data in the
>> “normal” write path  at th

Re: Choosing a compaction strategy (TWCS)

2016-12-16 Thread Voytek Jarnot
Gotcha.  "never compacted" has an implicit asterisk referencing
tombstone_compaction_interval and tombstone_threshold, sounds like.  More
of a "never compacted" via strategy selection, but eligible for
tombstone-triggered compaction.

On Fri, Dec 16, 2016 at 10:07 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com>
wrote:

> Tombstone compaction subproperties can handle tombstone removal for you
> (you’ll set a ratio of tombstones worth compacting away – for example, 80%,
> and set an interval to prevent continuous compaction – for example, 24
> hours, and then anytime there’s no other work to do, if there’s an sstable
> over 24 hours old that’s at least 80% tombstones, it’ll compact it in a
> single sstable compaction).
>
>
>
> -  Jeff
>
>
>
> *From: *Voytek Jarnot <voytek.jar...@gmail.com>
> *Reply-To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
> *Date: *Friday, December 16, 2016 at 7:34 PM
>
> *To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
> *Subject: *Re: Choosing a compaction strategy (TWCS)
>
>
>
> Thanks again, Jeff.
>
>
>
> Thinking about this some more, I'm wondering if I'm overthinking or if
> there's a potential issue:
>
>
>
> If my compaction_window_size is 7 (DAYS), and I've got TTLs of 7 days on
> some (relatively small percentage) of my records - am I going to be leaving
> tombstones around all over the place?  My noob-read on this is that TWCS
> will not compact tables comprised of records older than 7 days (
> https://docs.datastax.com/en/cassandra/3.x/cassandra/dml/dmlHowDataMaintain.html#dmlHowDataMaintain__twcs),
> but Cassandra will not evict my tombstones until 7 days + consideration for
> gc_grace_seconds have passed ... resulting in no tombstone removal (?).
>
>
>
>
>
>
>
> On Fri, Dec 16, 2016 at 1:17 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com>
> wrote:
>
> The issue is that your partitions will likely be in 2 sstables instead of
> “theoretically” 1. In practice, they’re probably going to bleed into 2
> anyway (memTable flush to sstable isn’t going to happen exactly when the
> window expires, so it’ll bleed a bit anyway), so I bet no meaningful impact.
>
>
>
> -  Jeff
>
>
>
> *From: *Voytek Jarnot <voytek.jar...@gmail.com>
> *Reply-To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
> *Date: *Friday, December 16, 2016 at 11:12 AM
>
>
> *To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
> *Subject: *Re: Choosing a compaction strategy (TWCS)
>
>
>
> Thank you Jeff - always nice to hear straight from the source.
>
>
>
> Any issues you can see with 3 (my calendar-week bucket not aligning with
> the arbitrary 7-day window)? Or am I confused (I'd put money on this
> option, but I've been wrong once or twice before)?
>
>
>
> On Fri, Dec 16, 2016 at 12:50 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com>
> wrote:
>
> I skipped over the more important question  - loading data in. Two options:
>
> 1)   Load data in order through the normal writepath and use “USING
> TIMESTAMP” to set the timestamp, or
>
> 2)   Use CQLSSTableWriter and “USING TIMESTAMP” to create sstables,
> then sstableloader them into the cluster.
>
>
>
> Either way, try not to mix writes of old data and new data in the “normal”
> write path  at the same time, even if you write “USING TIMESTAMP”, because
> it’ll get mixed in the memTable, and flushed into the same sstable – it
> won’t kill you, but if you can avoid it, avoid it.
>
>
>
> -  Jeff
>
>
>
>
>
> *From: *Jeff Jirsa <jeff.ji...@crowdstrike.com>
> *Date: *Friday, December 16, 2016 at 10:47 AM
> *To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
> *Subject: *Re: Choosing a compaction strategy (TWCS)
>
>
>
> With a 10 year retention, just ignore the target sstable count (I should
> remove that guidance, to be honest), and go for a 1 week window to match
> your partition size. 520 sstables on disk isn’t going to hurt you as long
> as you’re not reading from all of them, and with a partition-per-week the
> bloom filter is going to make things nice and easy for you.
>
>
>
> -  Jeff
>
>
>
>
>
> *From: *Voytek Jarno

Re: Choosing a compaction strategy (TWCS)

2016-12-16 Thread Voytek Jarnot
Thanks again, Jeff.

Thinking about this some more, I'm wondering if I'm overthinking or if
there's a potential issue:

If my compaction_window_size is 7 (DAYS), and I've got TTLs of 7 days on
some (relatively small percentage) of my records - am I going to be leaving
tombstones around all over the place?  My noob-read on this is that TWCS
will not compact tables comprised of records older than 7 days (
https://docs.datastax.com/en/cassandra/3.x/cassandra/dml/dmlHowDataMaintain.html#dmlHowDataMaintain__twcs),
but Cassandra will not evict my tombstones until 7 days + consideration for
gc_grace_seconds have passed ... resulting in no tombstone removal (?).



On Fri, Dec 16, 2016 at 1:17 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com>
wrote:

> The issue is that your partitions will likely be in 2 sstables instead of
> “theoretically” 1. In practice, they’re probably going to bleed into 2
> anyway (memTable flush to sstable isn’t going to happen exactly when the
> window expires, so it’ll bleed a bit anyway), so I bet no meaningful impact.
>
>
>
> -      Jeff
>
>
>
> *From: *Voytek Jarnot <voytek.jar...@gmail.com>
> *Reply-To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
> *Date: *Friday, December 16, 2016 at 11:12 AM
>
> *To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
> *Subject: *Re: Choosing a compaction strategy (TWCS)
>
>
>
> Thank you Jeff - always nice to hear straight from the source.
>
>
>
> Any issues you can see with 3 (my calendar-week bucket not aligning with
> the arbitrary 7-day window)? Or am I confused (I'd put money on this
> option, but I've been wrong once or twice before)?
>
>
>
> On Fri, Dec 16, 2016 at 12:50 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com>
> wrote:
>
> I skipped over the more important question  - loading data in. Two options:
>
> 1)   Load data in order through the normal writepath and use “USING
> TIMESTAMP” to set the timestamp, or
>
> 2)   Use CQLSSTableWriter and “USING TIMESTAMP” to create sstables,
> then sstableloader them into the cluster.
>
>
>
> Either way, try not to mix writes of old data and new data in the “normal”
> write path  at the same time, even if you write “USING TIMESTAMP”, because
> it’ll get mixed in the memTable, and flushed into the same sstable – it
> won’t kill you, but if you can avoid it, avoid it.
>
>
>
> -  Jeff
>
>
>
>
>
> *From: *Jeff Jirsa <jeff.ji...@crowdstrike.com>
> *Date: *Friday, December 16, 2016 at 10:47 AM
> *To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
> *Subject: *Re: Choosing a compaction strategy (TWCS)
>
>
>
> With a 10 year retention, just ignore the target sstable count (I should
> remove that guidance, to be honest), and go for a 1 week window to match
> your partition size. 520 sstables on disk isn’t going to hurt you as long
> as you’re not reading from all of them, and with a partition-per-week the
> bloom filter is going to make things nice and easy for you.
>
>
>
> -  Jeff
>
>
>
>
>
> *From: *Voytek Jarnot <voytek.jar...@gmail.com>
> *Reply-To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
> *Date: *Friday, December 16, 2016 at 10:37 AM
> *To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
> *Subject: *Choosing a compaction strategy (TWCS)
>
>
>
> Scenario:
>
> Converting an Oracle table to Cassandra, one Oracle table to 4 Cassandra
> tables, basically time-series - think log or auditing.  Retention is 10
> years, but greater than 95% of reads will occur on data written within the
> last year. 7 day TTL used on a small percentage of the records, majority do
> not use TTL. Other than the aforementioned TTL, and the 10-year purge, no
> updates or deletes are done.
>
>
>
> Seems like TWCS is the right choice, but I have a few questions/concerns:
>
>
>
> 1) I'll be bulk loading a few years of existing data upon deployment - any
> issues with that?  I assume using "with timestamp" when inserting this data
> will be mandatory if I choose TWCS?
>
>
>
> 2) I read here (https://github.com/jeffjirsa/twcs/)
> that "You should target fewer than 50 buckets per table based on your TTL."
> That's going to be a tough goal with a 10 year retention ... can anyone
> speak to how important this target really is?
>
>
>
> 3) If I'm bucketing my data with week/year (i.e., partition on year, week
> - so today would be in 2016, 50), it seems like a natural fit for
> compaction_window_size would be 7 days, but I'm thinking my calendar-based
> weeks will never align with TWCS 7-day-period weeks anyway - am I missing
> something there?
>
>
>
> I'd appreciate any other thoughts on compaction and/or twcs.
>
>
>
> Thanks
>
>
>


Re: Choosing a compaction strategy (TWCS)

2016-12-16 Thread Voytek Jarnot
Thank you Jeff - always nice to hear straight from the source.

Any issues you can see with 3 (my calendar-week bucket not aligning with
the arbitrary 7-day window)? Or am I confused (I'd put money on this
option, but I've been wrong once or twice before)?

On Fri, Dec 16, 2016 at 12:50 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com>
wrote:

> I skipped over the more important question  - loading data in. Two options:
>
> 1)   Load data in order through the normal writepath and use “USING
> TIMESTAMP” to set the timestamp, or
>
> 2)   Use CQLSSTableWriter and “USING TIMESTAMP” to create sstables,
> then sstableloader them into the cluster.
>
>
>
> Either way, try not to mix writes of old data and new data in the “normal”
> write path  at the same time, even if you write “USING TIMESTAMP”, because
> it’ll get mixed in the memTable, and flushed into the same sstable – it
> won’t kill you, but if you can avoid it, avoid it.
>
>
>
> -  Jeff
>
>
>
>
>
> *From: *Jeff Jirsa <jeff.ji...@crowdstrike.com>
> *Date: *Friday, December 16, 2016 at 10:47 AM
> *To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
> *Subject: *Re: Choosing a compaction strategy (TWCS)
>
>
>
> With a 10 year retention, just ignore the target sstable count (I should
> remove that guidance, to be honest), and go for a 1 week window to match
> your partition size. 520 sstables on disk isn’t going to hurt you as long
> as you’re not reading from all of them, and with a partition-per-week the
> bloom filter is going to make things nice and easy for you.
>
>
>
> -  Jeff
>
>
>
>
>
> *From: *Voytek Jarnot <voytek.jar...@gmail.com>
> *Reply-To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
> *Date: *Friday, December 16, 2016 at 10:37 AM
> *To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
> *Subject: *Choosing a compaction strategy (TWCS)
>
>
>
> Scenario:
>
> Converting an Oracle table to Cassandra, one Oracle table to 4 Cassandra
> tables, basically time-series - think log or auditing.  Retention is 10
> years, but greater than 95% of reads will occur on data written within the
> last year. 7 day TTL used on a small percentage of the records, majority do
> not use TTL. Other than the aforementioned TTL, and the 10-year purge, no
> updates or deletes are done.
>
>
>
> Seems like TWCS is the right choice, but I have a few questions/concerns:
>
>
>
> 1) I'll be bulk loading a few years of existing data upon deployment - any
> issues with that?  I assume using "with timestamp" when inserting this data
> will be mandatory if I choose TWCS?
>
>
>
> 2) I read here (https://github.com/jeffjirsa/twcs/)
> that "You should target fewer than 50 buckets per table based on your TTL."
> That's going to be a tough goal with a 10 year retention ... can anyone
> speak to how important this target really is?
>
>
>
> 3) If I'm bucketing my data with week/year (i.e., partition on year, week
> - so today would be in 2016, 50), it seems like a natural fit for
> compaction_window_size would be 7 days, but I'm thinking my calendar-based
> weeks will never align with TWCS 7-day-period weeks anyway - am I missing
> something there?
>
>
>
> I'd appreciate any other thoughts on compaction and/or twcs.
>
>
>
> Thanks
>


Choosing a compaction strategy (TWCS)

2016-12-16 Thread Voytek Jarnot
Scenario:
Converting an Oracle table to Cassandra, one Oracle table to 4 Cassandra
tables, basically time-series - think log or auditing.  Retention is 10
years, but greater than 95% of reads will occur on data written within the
last year. 7 day TTL used on a small percentage of the records, majority do
not use TTL. Other than the aforementioned TTL, and the 10-year purge, no
updates or deletes are done.

Seems like TWCS is the right choice, but I have a few questions/concerns:

1) I'll be bulk loading a few years of existing data upon deployment - any
issues with that?  I assume using "with timestamp" when inserting this data
will be mandatory if I choose TWCS?

2) I read here (https://github.com/jeffjirsa/twcs/) that "You should target
fewer than 50 buckets per table based on your TTL." That's going to be a
tough goal with a 10 year retention ... can anyone speak to how important
this target really is?

3) If I'm bucketing my data with week/year (i.e., partition on year, week -
so today would be in 2016, 50), it seems like a natural fit for
compaction_window_size would be 7 days, but I'm thinking my calendar-based
weeks will never align with TWCS 7-day-period weeks anyway - am I missing
something there?
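For reference, the bucketing and backloading described above might look roughly like this in CQL (hypothetical table, columns, and values; the USING TIMESTAMP value is just an illustrative microseconds-since-epoch write time):

```sql
-- Hypothetical table: one partition per calendar week, TWCS 7-day windows.
CREATE TABLE audit.events (
  year       int,
  week       int,
  event_time timestamp,
  payload    text,
  PRIMARY KEY ((year, week), event_time)
) WITH compaction = {
    'class': 'TimeWindowCompactionStrategy',
    'compaction_window_unit': 'DAYS',
    'compaction_window_size': '7'
  };

-- Bulk-loading a historical row with its original write time so TWCS
-- buckets it into the correct window rather than "now":
INSERT INTO audit.events (year, week, event_time, payload)
VALUES (2014, 10, '2014-03-05 12:00:00+0000', 'migrated row')
USING TIMESTAMP 1394020800000000;
```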

I'd appreciate any other thoughts on compaction and/or twcs.

Thanks


Preventing data showing up in Cassandra logs

2016-12-09 Thread Voytek Jarnot
I'm happy with INFO level logging from Cassandra in principle, but am
wondering if there's any option to prevent Cassandra from exposing data in
the logs (without necessarily changing log levels)?

SSTableIndex.open logs minTerm, maxTerm, minKey, and maxKey which expose
data, as does the "Writing large partition" WARNing (exposes the partition
key).  There are probably others.  The large partition warning would
probably be mostly useless without logging the partition key, but - in any
case - there are usage scenarios where data in logs is prohibited.

Thanks,
Voytek Jarnot


Re: Batch size warnings

2016-12-09 Thread Voytek Jarnot
Right you are, thank you Cody.

Wondering if I may reach out again to the list and ask a similar question
in a more specific way:

Scenario: Cassandra 3.x, small cluster (<10 nodes), 1 DC

Is a batch warn threshold of 50kb and average batch sizes in the 40kb
range a recipe for regret?  Should we be considering a solution such as the
one Cody elucidated earlier in the thread, or am I over-worrying the issue?
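For context, the pattern in question is a logged batch across the denormalized query tables, sketched here with hypothetical tables; batches above batch_size_warn_threshold_in_kb (cassandra.yaml) log a WARN, and those above batch_size_fail_threshold_in_kb are rejected outright:

```sql
-- Hypothetical sketch: one logged batch so an event lands in every
-- query table or in none of them (atomic, but not isolated).
BEGIN BATCH
  INSERT INTO ks.events_by_id   (id, ts, data)
    VALUES (uuid(), toTimestamp(now()), 'payload');
  INSERT INTO ks.events_by_user (user, ts, data)
    VALUES ('user1', toTimestamp(now()), 'payload');
APPLY BATCH;
```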


On Wed, Dec 7, 2016 at 4:08 PM, Cody Yancey <yan...@uber.com> wrote:

> There is a disconnect between write.3 and write.4, but it can only affect
> performance, not consistency. The presence or absence of a row's txnUUID in
> the IncompleteTransactions table is the ultimate source of truth, and rows
> whose txnUUID are not null will be checked against that truth in the read
> path.
>
> And yes, it is a good point, failures with this model will accumulate and
> degrade performance if you never clear out old failed transactions. The
> tables we have that use this generally use TTLs so we don't really care as
> long as irrecoverable transaction failures are very rare.
>
> Thanks,
> Cody
>
> On Wed, Dec 7, 2016 at 1:56 PM, Voytek Jarnot <voytek.jar...@gmail.com>
> wrote:
>
>> Appreciate the long writeup Cody.
>>
>> Yeah, we're good with temporary inconsistency (thankfully) as well.  I'm
>> going to try to ride the batch train and hope it doesn't derail - our load
>> is fairly static (or, more precisely, increase in load is fairly slow and
>> can be projected).
>>
>> Enjoyed your two-phase commit text.  Presumably one would also have some
>> cleanup implementation that culls any failed updates (write.5) which could
>> be identified in read.3 / read.4?  Still a disconnect possible between
>> write.3 and write.4, but there's always something...
>>
>> We're insert-only (well, with some deletes via TTL, but anyway), so
>> that's somewhat tempting, but I'd rather not prematurely optimize.  Unless,
>> of course, anyone's got experience such that "batches over XXkb are
>> definitely going to be a problem".
>>
>> Appreciate everyone's time.
>> --Voytek Jarnot
>>
>> On Wed, Dec 7, 2016 at 11:31 AM, Cody Yancey <yan...@uber.com> wrote:
>>
>>> Hi Voytek,
>>> I think the way you are using it is definitely the canonical way.
>>> Unfortunately, as you learned, there are some gotchas. We tried
>>> substantially increasing the batch size and it worked for a while, until we
>>> reached new scale, and we increased it again, and so forth. It works, but
>>> soon you start getting write timeouts, lots of them. And the thing about
>>> multi-partition batch statements is that they offer atomicity, but not
>>> isolation. This means your database can temporarily be in an inconsistent
>>> state while writes are propagating to the various machines.
>>>
>>> For our use case, we could deal with temporary inconsistency, as long as
>>> it was for a strictly bounded period of time, on the order of a few
>>> seconds. Unfortunately, as with all things eventually consistent, it
>>> degrades to "totally inconsistent" when your database is under heavy load
>>> and the time-bounds expand beyond what the application can handle. When a
>>> batch write times out, it often still succeeds (eventually) but your tables
>>> can be inconsistent for minutes, even while nodetool status shows all
>>> nodes up and normal.
>>>
>>> But there is another way, that requires us to take a page from our RDBMS
>>> ancestors' book: multi-phase commit.
>>>
>>> Similar to logged batch writes, multi-phase commit patterns typically
>>> entail some write amplification cost for the benefit of stronger
>>> consistency guarantees across isolatable units (in Cassandra's case,
>>> *partitions*). However, multi-phase commit offers stronger guarantees
>>> than batch writes, and ALL of the additional write load is completely
>>> distributed as per your load-balancing policy, whereas batch writes all go
>>> through one coordinator node, then get written in their entirety to the
>>> batch log on two or three nodes, and then get dispersed in a distributed
>>> fashion from there.
>>>
>>> A typical two-phase commit pattern looks like this:
>>>
>>> The Write Path
>>>
>>>1. The client code chooses a random UUID.
>>>2. The client writes the UUID into the IncompleteTransactions table,
>>>which only has one column, the transactionUUID.
>>>3. The client makes all of the inserts involved in t

Re: Batch size warnings

2016-12-07 Thread Voytek Jarnot
Appreciate the long writeup Cody.

Yeah, we're good with temporary inconsistency (thankfully) as well.  I'm
going to try to ride the batch train and hope it doesn't derail - our load
is fairly static (or, more precisely, increase in load is fairly slow and
can be projected).

Enjoyed your two-phase commit text.  Presumably one would also have some
cleanup implementation that culls any failed updates (write.5) which could
be identified in read.3 / read.4?  Still a disconnect possible between
write.3 and write.4, but there's always something...

We're insert-only (well, with some deletes via TTL, but anyway), so that's
somewhat tempting, but I'd rather not prematurely optimize.  Unless, of
course, anyone's got experience such that "batches over XXkb are definitely
going to be a problem".

Appreciate everyone's time.
--Voytek Jarnot

On Wed, Dec 7, 2016 at 11:31 AM, Cody Yancey <yan...@uber.com> wrote:

> Hi Voytek,
> I think the way you are using it is definitely the canonical way.
> Unfortunately, as you learned, there are some gotchas. We tried
> substantially increasing the batch size and it worked for a while, until we
> reached new scale, and we increased it again, and so forth. It works, but
> soon you start getting write timeouts, lots of them. And the thing about
> multi-partition batch statements is that they offer atomicity, but not
> isolation. This means your database can temporarily be in an inconsistent
> state while writes are propagating to the various machines.
>
> For our use case, we could deal with temporary inconsistency, as long as
> it was for a strictly bounded period of time, on the order of a few
> seconds. Unfortunately, as with all things eventually consistent, it
> degrades to "totally inconsistent" when your database is under heavy load
> and the time-bounds expand beyond what the application can handle. When a
> batch write times out, it often still succeeds (eventually) but your tables
> can be inconsistent for minutes, even while nodetool status shows all
> nodes up and normal.
>
> But there is another way, that requires us to take a page from our RDBMS
> ancestors' book: multi-phase commit.
>
> Similar to logged batch writes, multi-phase commit patterns typically
> entail some write amplification cost for the benefit of stronger
> consistency guarantees across isolatable units (in Cassandra's case,
> *partitions*). However, multi-phase commit offers stronger guarantees
> than batch writes, and ALL of the additional write load is completely
> distributed as per your load-balancing policy, whereas batch writes all go
> through one coordinator node, then get written in their entirety to the
> batch log on two or three nodes, and then get dispersed in a distributed
> fashion from there.
>
> A typical two-phase commit pattern looks like this:
>
> The Write Path
>
>1. The client code chooses a random UUID.
>2. The client writes the UUID into the IncompleteTransactions table,
>which only has one column, the transactionUUID.
>3. The client makes all of the inserts involved in the transaction, IN
>PARALLEL, with the transactionUUID duplicated in every inserted row.
>4. The client deletes the UUID from IncompleteTransactions table.
>5. The client makes parallel updates to all of the rows it inserted,
>IN PARALLEL, setting the transactionUUID to null.
>
> The Read Path
>
>1. The client reads some rows from a partition. If this particular
>client request can handle extraneous rows, you are done. If not, read on to
>step #2.
>2. The client gathers the set of unique transactionUUIDs. In the main
>case, they've all been deleted by step #5 in the Write Path. If not, go to
>#3.
>3. For remaining transactionUUIDs (which should be a very small
>number), query the IncompleteTransactions table.
>4. The client code culls rows where the transactionUUID existed in the
>IncompleteTransactions table.
>
> This is just an example, one that is reasonably performant for
> ledger-style non-updated inserts. For transactions involving updates to
> possibly existing data, more effort is required, generally the client needs
> to be smart enough to merge updates based on a timestamp, with a periodic
> batch job that cleans out obsolete inserts. If it feels like reinventing
> the wheel, that's because it is. But it just might be the quickest path to
> what you need.
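The write path described above maps onto CQL roughly as follows (a sketch only; schema, names, and the UUID are hypothetical, and steps 3 and 5 run once per inserted row, in parallel, with the same client-chosen UUID from step 1):

```sql
-- Hypothetical schema for the two-phase pattern described above.
CREATE TABLE ks.incomplete_transactions (
  txn_uuid uuid PRIMARY KEY
);

CREATE TABLE ks.ledger (
  pk       text,
  ts       timestamp,
  txn_uuid uuid,
  data     text,
  PRIMARY KEY (pk, ts)
);

-- Step 2: record the transaction as in-flight.
INSERT INTO ks.incomplete_transactions (txn_uuid)
  VALUES (9b1deb4d-3b7d-4bad-9bdd-2b0d7b3dcb6d);
-- Step 3: insert the data rows, tagged with the transaction UUID.
INSERT INTO ks.ledger (pk, ts, txn_uuid, data)
  VALUES ('p1', '2016-12-07 00:00:00+0000',
          9b1deb4d-3b7d-4bad-9bdd-2b0d7b3dcb6d, 'row data');
-- Step 4: mark the transaction complete.
DELETE FROM ks.incomplete_transactions
  WHERE txn_uuid = 9b1deb4d-3b7d-4bad-9bdd-2b0d7b3dcb6d;
-- Step 5: clear the tag so future reads skip the lookup.
UPDATE ks.ledger SET txn_uuid = null
  WHERE pk = 'p1' AND ts = '2016-12-07 00:00:00+0000';
```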
>
> Thanks,
> Cody
>
> On Wed, Dec 7, 2016 at 10:15 AM, Edward Capriolo <edlinuxg...@gmail.com>
> wrote:
>
>> I have been circling around a thought process over batches. Now that
>> Cassandra has aggregating functions, it might be possible to write a type of
>> record that has an END_OF_BATCH type marker

Re: Batch size warnings

2016-12-07 Thread Voytek Jarnot
Been about a month since I gave up on it, but it was very much related to
the stuff you're dealing with ... Basically Cassandra just tripping over
its own feet streaming MVs.

On Dec 7, 2016 10:45 AM, "Benjamin Roth" <benjamin.r...@jaumo.com> wrote:

> I meant the MV thing
>
> Am 07.12.2016 17:27 schrieb "Voytek Jarnot" <voytek.jar...@gmail.com>:
>
>> Sure, about which part?
>>
>> default batch size warning is 5kb
>> I've increased it to 30kb, and will need to increase to 40kb (8x default
>> setting) to avoid WARN log messages about batch sizes.  I do realize it's
>> just a WARNing, but may as well avoid those if I can configure it out.
>> That said, having to increase it so substantially (and we're only dealing
>> with 5 tables) is making me wonder if I'm not taking the correct approach
>> in terms of using batches to guarantee atomicity.
>>
>> On Wed, Dec 7, 2016 at 10:13 AM, Benjamin Roth <benjamin.r...@jaumo.com>
>> wrote:
>>
>>> Could you please be more specific?
>>>
>>> Am 07.12.2016 17:10 schrieb "Voytek Jarnot" <voytek.jar...@gmail.com>:
>>>
>>>> Should've mentioned - running 3.9.  Also - please do not recommend MVs:
>>>> I tried, they're broken, we punted.
>>>>
>>>> On Wed, Dec 7, 2016 at 10:06 AM, Voytek Jarnot <voytek.jar...@gmail.com
>>>> > wrote:
>>>>
>>>>> The low default value for batch_size_warn_threshold_in_kb is making
>>>>> me wonder if I'm perhaps approaching the problem of atomicity in a
>>>>> non-ideal fashion.
>>>>>
>>>>> With one data set duplicated/denormalized into 5 tables to support
>>>>> queries, we use batches to ensure inserts make it to all or 0 tables.  
>>>>> This
>>>>> works fine, but I've had to bump the warn threshold and fail threshold
>>>>> substantially (8x higher for the warn threshold).  This - in turn - makes
>>>>> me wonder, with a default setting so low, if I'm not solving this problem
>>>>> in the canonical/standard way.
>>>>>
>>>>> Mostly just looking for confirmation that we're not unintentionally
>>>>> doing something weird...
>>>>>
>>>>
>>>>
>>


Re: Batch size warnings

2016-12-07 Thread Voytek Jarnot
Sure, about which part?

default batch size warning is 5kb
I've increased it to 30kb, and will need to increase to 40kb (8x default
setting) to avoid WARN log messages about batch sizes.  I do realize it's
just a WARNing, but may as well avoid those if I can configure it out.
That said, having to increase it so substantially (and we're only dealing
with 5 tables) is making me wonder if I'm not taking the correct approach
in terms of using batches to guarantee atomicity.
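For reference, both thresholds under discussion are set in cassandra.yaml; the values below mirror the 8x warn bump described above (the fail value is illustrative, not a recommendation):

```yaml
# cassandra.yaml -- batch size thresholds, in KiB per batch.
batch_size_warn_threshold_in_kb: 40   # default is 5; raised 8x as described
batch_size_fail_threshold_in_kb: 400  # batches above this are rejected outright
```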

On Wed, Dec 7, 2016 at 10:13 AM, Benjamin Roth <benjamin.r...@jaumo.com>
wrote:

> Could you please be more specific?
>
> Am 07.12.2016 17:10 schrieb "Voytek Jarnot" <voytek.jar...@gmail.com>:
>
>> Should've mentioned - running 3.9.  Also - please do not recommend MVs: I
>> tried, they're broken, we punted.
>>
>> On Wed, Dec 7, 2016 at 10:06 AM, Voytek Jarnot <voytek.jar...@gmail.com>
>> wrote:
>>
>>> The low default value for batch_size_warn_threshold_in_kb is making me
>>> wonder if I'm perhaps approaching the problem of atomicity in a non-ideal
>>> fashion.
>>>
>>> With one data set duplicated/denormalized into 5 tables to support
>>> queries, we use batches to ensure inserts make it to all or 0 tables.  This
>>> works fine, but I've had to bump the warn threshold and fail threshold
>>> substantially (8x higher for the warn threshold).  This - in turn - makes
>>> me wonder, with a default setting so low, if I'm not solving this problem
>>> in the canonical/standard way.
>>>
>>> Mostly just looking for confirmation that we're not unintentionally
>>> doing something weird...
>>>
>>
>>


Batch size warnings

2016-12-07 Thread Voytek Jarnot
The low default value for batch_size_warn_threshold_in_kb is making me
wonder if I'm perhaps approaching the problem of atomicity in a non-ideal
fashion.

With one data set duplicated/denormalized into 5 tables to support queries,
we use batches to ensure inserts make it to all or 0 tables.  This works
fine, but I've had to bump the warn threshold and fail threshold
substantially (8x higher for the warn threshold).  This - in turn - makes
me wonder, with a default setting so low, if I'm not solving this problem
in the canonical/standard way.

Mostly just looking for confirmation that we're not unintentionally doing
something weird...
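The multi-table atomic insert described above is typically expressed as a single logged batch. A minimal sketch that assembles one follows (table and column names are hypothetical, and a real client would use the driver's BatchStatement with prepared statements rather than string assembly):

```python
# Assemble a logged batch that inserts the same row into several
# denormalized query tables, so the writes apply atomically (all or none).
# Table and column names are hypothetical placeholders.
def build_logged_batch(tables, columns):
    cols = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    inserts = [f"  INSERT INTO {t} ({cols}) VALUES ({placeholders});"
               for t in tables]
    return "BEGIN BATCH\n" + "\n".join(inserts) + "\nAPPLY BATCH;"
```

Note that each table duplicated this way adds its full row size to the batch, which is why the warn threshold is crossed quickly with 5 tables.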


Re: Batch size warnings

2016-12-07 Thread Voytek Jarnot
Should've mentioned - running 3.9.  Also - please do not recommend MVs: I
tried, they're broken, we punted.

On Wed, Dec 7, 2016 at 10:06 AM, Voytek Jarnot <voytek.jar...@gmail.com>
wrote:

> The low default value for batch_size_warn_threshold_in_kb is making me
> wonder if I'm perhaps approaching the problem of atomicity in a non-ideal
> fashion.
>
> With one data set duplicated/denormalized into 5 tables to support
> queries, we use batches to ensure inserts make it to all or 0 tables.  This
> works fine, but I've had to bump the warn threshold and fail threshold
> substantially (8x higher for the warn threshold).  This - in turn - makes
> me wonder, with a default setting so low, if I'm not solving this problem
> in the canonical/standard way.
>
> Mostly just looking for confirmation that we're not unintentionally doing
> something weird...
>


Re: SASI index creation assertion error

2016-11-05 Thread Voytek Jarnot
Indeed.  I did throw a comment on 11990 - not sure if that triggers emails
to those participants, but was hoping someone would take a look.

On Sat, Nov 5, 2016 at 2:26 AM, DuyHai Doan <doanduy...@gmail.com> wrote:

> So from code review, the error message you get from the log is coming from
> CASSANDRA-11990:
> https://github.com/ifesdjeen/cassandra/commit/dc4ae57f452e19adbe5a6a2c85f8a4b5a24d4103#diff-eae81aa3b81f9b1e07b109c446447a50R357
>
> Now, it's just the consequence of the problem (throwing an assertion
> error); we have to dig further to understand why we fall into this
> situation
>
> On Sat, Nov 5, 2016 at 5:15 AM, Jonathan Haddad <j...@jonhaddad.com> wrote:
>
>> Can you file a Jira for this? Would be good to make sure 3.10 doesn't get
>> released with this bug.
>> On Fri, Nov 4, 2016 at 6:11 PM Voytek Jarnot <voytek.jar...@gmail.com>
>> wrote:
>>
>>> Thought I'd follow-up to myself, in case anyone else comes across this
>>> problem.  I found a reasonably easy test case to reproduce the problem:
>>>
>>> This works in 3.9, but doesn't work in 3.10-snapshot:
>>>
>>> CREATE KEYSPACE vjtest WITH replication = {'class': 'SimpleStrategy',
>>> 'replication_factor': '1'};
>>> use vjtest ;
>>> create table tester(id1 text, id2 text, id3 text, val1 text, primary
>>> key((id1, id2), id3));
>>> create custom index tester_idx_val1 on tester(val1) using
>>> 'org.apache.cassandra.index.sasi.SASIIndex';
>>> insert into tester(id1,id2,id3, val1) values ('1-1','1-2','1-3','asdf');
>>> insert into tester(id1,id2,id3, val1) values ('1-1','1-2','2-3','asdf');
>>> insert into tester(id1,id2,id3, val1) values ('1-1','1-2','3-3','asdf');
>>> insert into tester(id1,id2,id3, val1) values ('1-1','1-2','4-3','asdf');
>>> insert into tester(id1,id2,id3, val1) values ('1-1','1-2','5-3','asdf');
>>> insert into tester(id1,id2,id3, val1) values ('1-1','1-2','6-3','asdf');
>>> insert into tester(id1,id2,id3, val1) values ('1-1','1-2','7-3','asdf');
>>> insert into tester(id1,id2,id3, val1) values ('1-1','1-2','8-3','asdf');
>>> insert into tester(id1,id2,id3, val1) values ('1-1','1-2','9-3','asdf');
>>>
>>> That's it - when Cassandra tries to flush, all hell breaks loose (well,
>>> maybe not, but an unhandled error gets logged).  Also, the index doesn't
>>> actually work subsequently.
>>>
>>> On Fri, Nov 4, 2016 at 3:58 PM, Voytek Jarnot <voytek.jar...@gmail.com>
>>> wrote:
>>>
>>> Wondering if anyone has encountered the same...
>>>
>>> Full story and stacktraces below, short version is that creating a SASI
>>> index fails for me when running a 3.10-SNAPSHOT build. One caveat: creating
>>> the index on an empty table doesn't fail; however, soon after I start
>>> pumping data into the table similar problems occur.
>>>
>>> I created CASSANDRA-12877 for this, but am beginning to suspect it might
>>> be related to CASSANDRA-11990.  The thing that's throwing me is that I
>>> can't seem to duplicate this with a simple test table.
>>>
>>> Background:
>>>
>>> Ended up building/loading a 3.10-SNAPSHOT to try to get past
>>> CASSANDRA-11670, CASSANDRA-12223, and CASSANDRA-12689.
>>>
>>> 1) built/installed 3.10-SNAPSHOT from git branch cassandra-3.X
>>> 2) created keyspace (SimpleStrategy, RF 1)
>>> 3) created table: (simplified version below, many more valX columns
>>> present)
>>>
>>> CREATE TABLE test_table (
>>> id1 text,
>>> id2 text,
>>> id3 date,
>>> id4 timestamp,
>>> id5 text,
>>> val1 text,
>>> val2 text,
>>> val3 text,
>>> task_id text,
>>> val4 text,
>>> val5 text,
>>> PRIMARY KEY ((id1, id2), id3, id4, id5)
>>> ) WITH CLUSTERING ORDER BY (id3 DESC, id4 DESC, id5 ASC)
>>>
>>> 4) created materialized view:
>>>
>>> CREATE MATERIALIZED VIEW test_table_by_task_id AS
>>> SELECT *
>>> FROM test_table
>>> WHERE id1 IS NOT NULL AND id2 IS NOT NULL AND id3 IS NOT NULL AND
>>> id4 IS NOT NULL AND id5 IS NOT NULL AND task_id IS NOT NULL
>>> PRIMARY KEY (task_id, id3, id4, id1, id2, id5)
>>> WITH CLUSTERING ORDER BY (id3 DESC, id4 DESC, id1 ASC, id2 ASC, id5
>>> ASC)
>>>
>>> 5) inserted 27 million "rows" (i.e., unique values for id5)
>>> 6) create index attempt

Re: SASI index creation assertion error

2016-11-05 Thread Voytek Jarnot
Yep, already done: https://issues.apache.org/jira/browse/CASSANDRA-12877

On Fri, Nov 4, 2016 at 11:15 PM, Jonathan Haddad <j...@jonhaddad.com> wrote:

> Can you file a Jira for this? Would be good to make sure 3.10 doesn't get
> released with this bug.
> On Fri, Nov 4, 2016 at 6:11 PM Voytek Jarnot <voytek.jar...@gmail.com>
> wrote:
>
>> Thought I'd follow-up to myself, in case anyone else comes across this
>> problem.  I found a reasonably easy test case to reproduce the problem:
>>
>> This works in 3.9, but doesn't work in 3.10-snapshot:
>>
>> CREATE KEYSPACE vjtest WITH replication = {'class': 'SimpleStrategy',
>> 'replication_factor': '1'};
>> use vjtest ;
>> create table tester(id1 text, id2 text, id3 text, val1 text, primary
>> key((id1, id2), id3));
>> create custom index tester_idx_val1 on tester(val1) using
>> 'org.apache.cassandra.index.sasi.SASIIndex';
>> insert into tester(id1,id2,id3, val1) values ('1-1','1-2','1-3','asdf');
>> insert into tester(id1,id2,id3, val1) values ('1-1','1-2','2-3','asdf');
>> insert into tester(id1,id2,id3, val1) values ('1-1','1-2','3-3','asdf');
>> insert into tester(id1,id2,id3, val1) values ('1-1','1-2','4-3','asdf');
>> insert into tester(id1,id2,id3, val1) values ('1-1','1-2','5-3','asdf');
>> insert into tester(id1,id2,id3, val1) values ('1-1','1-2','6-3','asdf');
>> insert into tester(id1,id2,id3, val1) values ('1-1','1-2','7-3','asdf');
>> insert into tester(id1,id2,id3, val1) values ('1-1','1-2','8-3','asdf');
>> insert into tester(id1,id2,id3, val1) values ('1-1','1-2','9-3','asdf');
>>
>> That's it - when Cassandra tries to flush, all hell breaks loose (well,
>> maybe not, but an unhandled error gets logged).  Also, the index doesn't
>> actually work subsequently.
>>
>> On Fri, Nov 4, 2016 at 3:58 PM, Voytek Jarnot <voytek.jar...@gmail.com>
>> wrote:
>>
>> Wondering if anyone has encountered the same...
>>
>> Full story and stacktraces below, short version is that creating a SASI
>> index fails for me when running a 3.10-SNAPSHOT build. One caveat: creating
>> the index on an empty table doesn't fail; however, soon after I start
>> pumping data into the table similar problems occur.
>>
>> I created CASSANDRA-12877 for this, but am beginning to suspect it might
>> be related to CASSANDRA-11990.  The thing that's throwing me is that I
>> can't seem to duplicate this with a simple test table.
>>
>> Background:
>>
>> Ended up building/loading a 3.10-SNAPSHOT to try to get past
>> CASSANDRA-11670, CASSANDRA-12223, and CASSANDRA-12689.
>>
>> 1) built/installed 3.10-SNAPSHOT from git branch cassandra-3.X
>> 2) created keyspace (SimpleStrategy, RF 1)
>> 3) created table: (simplified version below, many more valX columns
>> present)
>>
>> CREATE TABLE test_table (
>> id1 text,
>> id2 text,
>> id3 date,
>> id4 timestamp,
>> id5 text,
>> val1 text,
>> val2 text,
>> val3 text,
>> task_id text,
>> val4 text,
>> val5 text,
>> PRIMARY KEY ((id1, id2), id3, id4, id5)
>> ) WITH CLUSTERING ORDER BY (id3 DESC, id4 DESC, id5 ASC)
>>
>> 4) created materialized view:
>>
>> CREATE MATERIALIZED VIEW test_table_by_task_id AS
>> SELECT *
>> FROM test_table
>> WHERE id1 IS NOT NULL AND id2 IS NOT NULL AND id3 IS NOT NULL AND id4
>> IS NOT NULL AND id5 IS NOT NULL AND task_id IS NOT NULL
>> PRIMARY KEY (task_id, id3, id4, id1, id2, id5)
>> WITH CLUSTERING ORDER BY (id3 DESC, id4 DESC, id1 ASC, id2 ASC, id5
>> ASC)
>>
>> 5) inserted 27 million "rows" (i.e., unique values for id5)
>> 6) create index attempt
>>
>> create custom index idx_test_table_val5 on test_table(val5) using
>> 'org.apache.cassandra.index.sasi.SASIIndex';
>>
>> 7) no error in cqlsh, but system.log shows many of the following:
>>
>> INFO  [SASI-General:1] 2016-11-04 13:46:47,578
>> PerSSTableIndexWriter.java:277 - Flushed index segment
>> /mydir/cassandra/apache-cassandra-3.10-SNAPSHOT/data/
>> data/mykeyspace/test_table-133dd090a2b411e6b1bf6df2a1af06
>> f0/mc-149-big-SI_idx_test_table_val5.db_0, took 869 ms.
>> ERROR [SASI-General:1] 2016-11-04 13:46:47,584 CassandraDaemon.java:229 -
>> Exception in thread Thread[SASI-General:1,5,main]
>> java.lang.AssertionError: cannot have more than 8 overflow collisions per
>> leaf, but had: 12
>> at org.apache.cassandra.index.sasi.disk.
>> AbstractTokenTreeBuilder$Leaf.createOverflowEntry(

Re: SASI index creation assertion error

2016-11-04 Thread Voytek Jarnot
Thought I'd follow-up to myself, in case anyone else comes across this
problem.  I found a reasonably easy test case to reproduce the problem:

This works in 3.9, but doesn't work in 3.10-snapshot:

CREATE KEYSPACE vjtest WITH replication = {'class': 'SimpleStrategy',
'replication_factor': '1'};
use vjtest ;
create table tester(id1 text, id2 text, id3 text, val1 text, primary
key((id1, id2), id3));
create custom index tester_idx_val1 on tester(val1) using
'org.apache.cassandra.index.sasi.SASIIndex';
insert into tester(id1,id2,id3, val1) values ('1-1','1-2','1-3','asdf');
insert into tester(id1,id2,id3, val1) values ('1-1','1-2','2-3','asdf');
insert into tester(id1,id2,id3, val1) values ('1-1','1-2','3-3','asdf');
insert into tester(id1,id2,id3, val1) values ('1-1','1-2','4-3','asdf');
insert into tester(id1,id2,id3, val1) values ('1-1','1-2','5-3','asdf');
insert into tester(id1,id2,id3, val1) values ('1-1','1-2','6-3','asdf');
insert into tester(id1,id2,id3, val1) values ('1-1','1-2','7-3','asdf');
insert into tester(id1,id2,id3, val1) values ('1-1','1-2','8-3','asdf');
insert into tester(id1,id2,id3, val1) values ('1-1','1-2','9-3','asdf');

That's it - when Cassandra tries to flush, all hell breaks loose (well,
maybe not, but an unhandled error gets logged).  Also, the index doesn't
actually work subsequently.

On Fri, Nov 4, 2016 at 3:58 PM, Voytek Jarnot <voytek.jar...@gmail.com>
wrote:

> Wondering if anyone has encountered the same...
>
> Full story and stacktraces below, short version is that creating a SASI
> index fails for me when running a 3.10-SNAPSHOT build. One caveat: creating
> the index on an empty table doesn't fail; however, soon after I start
> pumping data into the table similar problems occur.
>
> I created CASSANDRA-12877 for this, but am beginning to suspect it might
> be related to CASSANDRA-11990.  The thing that's throwing me is that I
> can't seem to duplicate this with a simple test table.
>
> Background:
>
> Ended up building/loading a 3.10-SNAPSHOT to try to get past
> CASSANDRA-11670, CASSANDRA-12223, and CASSANDRA-12689.
>
> 1) built/installed 3.10-SNAPSHOT from git branch cassandra-3.X
> 2) created keyspace (SimpleStrategy, RF 1)
> 3) created table: (simplified version below, many more valX columns
> present)
>
> CREATE TABLE test_table (
> id1 text,
> id2 text,
> id3 date,
> id4 timestamp,
> id5 text,
> val1 text,
> val2 text,
> val3 text,
> task_id text,
> val4 text,
> val5 text,
> PRIMARY KEY ((id1, id2), id3, id4, id5)
> ) WITH CLUSTERING ORDER BY (id3 DESC, id4 DESC, id5 ASC)
>
> 4) created materialized view:
>
> CREATE MATERIALIZED VIEW test_table_by_task_id AS
> SELECT *
> FROM test_table
> WHERE id1 IS NOT NULL AND id2 IS NOT NULL AND id3 IS NOT NULL AND id4
> IS NOT NULL AND id5 IS NOT NULL AND task_id IS NOT NULL
> PRIMARY KEY (task_id, id3, id4, id1, id2, id5)
> WITH CLUSTERING ORDER BY (id3 DESC, id4 DESC, id1 ASC, id2 ASC, id5
> ASC)
>
> 5) inserted 27 million "rows" (i.e., unique values for id5)
> 6) create index attempt
>
> create custom index idx_test_table_val5 on test_table(val5) using
> 'org.apache.cassandra.index.sasi.SASIIndex';
>
> 7) no error in cqlsh, but system.log shows many of the following:
>
> INFO  [SASI-General:1] 2016-11-04 13:46:47,578
> PerSSTableIndexWriter.java:277 - Flushed index segment
> /mydir/cassandra/apache-cassandra-3.10-SNAPSHOT/data/
> data/mykeyspace/test_table-133dd090a2b411e6b1bf6df2a1af06
> f0/mc-149-big-SI_idx_test_table_val5.db_0, took 869 ms.
> ERROR [SASI-General:1] 2016-11-04 13:46:47,584 CassandraDaemon.java:229 -
> Exception in thread Thread[SASI-General:1,5,main]
> java.lang.AssertionError: cannot have more than 8 overflow collisions per
> leaf, but had: 12
> at org.apache.cassandra.index.sasi.disk.AbstractTokenTreeBuilder$Leaf.
> createOverflowEntry(AbstractTokenTreeBuilder.java:357)
> ~[apache-cassandra-3.10-SNAPSHOT.jar:3.10-SNAPSHOT]
> at org.apache.cassandra.index.sasi.disk.AbstractTokenTreeBuilder$Leaf.
> createEntry(AbstractTokenTreeBuilder.java:346) ~[apache-cassandra-3.10-
> SNAPSHOT.jar:3.10-SNAPSHOT]
> at org.apache.cassandra.index.sasi.disk.DynamicTokenTreeBuilder$
> DynamicLeaf.serializeData(DynamicTokenTreeBuilder.java:180)
> ~[apache-cassandra-3.10-SNAPSHOT.jar:3.10-SNAPSHOT]
> at org.apache.cassandra.index.sasi.disk.AbstractTokenTreeBuilder$Leaf.
> serialize(AbstractTokenTreeBuilder.java:306) ~[apache-cassandra-3.10-
> SNAPSHOT.jar:3.10-SNAPSHOT]
> at org.apache.cassandra.index.sasi.disk.AbstractTokenTreeBuilder.
> write(AbstractTokenTreeBuilder.java:90) ~[apache-cassandra-3.10-
> SNAPSHOT.jar:3.10-SNAPSHOT]
> at org.apache.cassandra.index.sasi.disk.OnDi

SASI index creation assertion error

2016-11-04 Thread Voytek Jarnot
Wondering if anyone has encountered the same...

Full story and stacktraces below, short version is that creating a SASI
index fails for me when running a 3.10-SNAPSHOT build. One caveat: creating
the index on an empty table doesn't fail; however, soon after I start
pumping data into the table similar problems occur.

I created CASSANDRA-12877 for this, but am beginning to suspect it might be
related to CASSANDRA-11990.  The thing that's throwing me is that I can't
seem to duplicate this with a simple test table.

Background:

Ended up building/loading a 3.10-SNAPSHOT to try to get past
CASSANDRA-11670, CASSANDRA-12223, and CASSANDRA-12689.

1) built/installed 3.10-SNAPSHOT from git branch cassandra-3.X
2) created keyspace (SimpleStrategy, RF 1)
3) created table: (simplified version below, many more valX columns present)

CREATE TABLE test_table (
id1 text,
id2 text,
id3 date,
id4 timestamp,
id5 text,
val1 text,
val2 text,
val3 text,
task_id text,
val4 text,
val5 text,
PRIMARY KEY ((id1, id2), id3, id4, id5)
) WITH CLUSTERING ORDER BY (id3 DESC, id4 DESC, id5 ASC)

4) created materialized view:

CREATE MATERIALIZED VIEW test_table_by_task_id AS
SELECT *
FROM test_table
WHERE id1 IS NOT NULL AND id2 IS NOT NULL AND id3 IS NOT NULL AND id4
IS NOT NULL AND id5 IS NOT NULL AND task_id IS NOT NULL
PRIMARY KEY (task_id, id3, id4, id1, id2, id5)
WITH CLUSTERING ORDER BY (id3 DESC, id4 DESC, id1 ASC, id2 ASC, id5 ASC)

5) inserted 27 million "rows" (i.e., unique values for id5)
6) create index attempt

create custom index idx_test_table_val5 on test_table(val5) using
'org.apache.cassandra.index.sasi.SASIIndex';

7) no error in cqlsh, but system.log shows many of the following:

INFO  [SASI-General:1] 2016-11-04 13:46:47,578
PerSSTableIndexWriter.java:277 - Flushed index segment
/mydir/cassandra/apache-cassandra-3.10-SNAPSHOT/data/data/mykeyspace/test_table-133dd090a2b411e6b1bf6df2a1af06f0/mc-149-big-SI_idx_test_table_val5.db_0,
took 869 ms.
ERROR [SASI-General:1] 2016-11-04 13:46:47,584 CassandraDaemon.java:229 -
Exception in thread Thread[SASI-General:1,5,main]
java.lang.AssertionError: cannot have more than 8 overflow collisions per
leaf, but had: 12
at
org.apache.cassandra.index.sasi.disk.AbstractTokenTreeBuilder$Leaf.createOverflowEntry(AbstractTokenTreeBuilder.java:357)
~[apache-cassandra-3.10-SNAPSHOT.jar:3.10-SNAPSHOT]
at
org.apache.cassandra.index.sasi.disk.AbstractTokenTreeBuilder$Leaf.createEntry(AbstractTokenTreeBuilder.java:346)
~[apache-cassandra-3.10-SNAPSHOT.jar:3.10-SNAPSHOT]
at
org.apache.cassandra.index.sasi.disk.DynamicTokenTreeBuilder$DynamicLeaf.serializeData(DynamicTokenTreeBuilder.java:180)
~[apache-cassandra-3.10-SNAPSHOT.jar:3.10-SNAPSHOT]
at
org.apache.cassandra.index.sasi.disk.AbstractTokenTreeBuilder$Leaf.serialize(AbstractTokenTreeBuilder.java:306)
~[apache-cassandra-3.10-SNAPSHOT.jar:3.10-SNAPSHOT]
at
org.apache.cassandra.index.sasi.disk.AbstractTokenTreeBuilder.write(AbstractTokenTreeBuilder.java:90)
~[apache-cassandra-3.10-SNAPSHOT.jar:3.10-SNAPSHOT]
at
org.apache.cassandra.index.sasi.disk.OnDiskIndexBuilder$MutableDataBlock.flushAndClear(OnDiskIndexBuilder.java:629)
~[apache-cassandra-3.10-SNAPSHOT.jar:3.10-SNAPSHOT]
at
org.apache.cassandra.index.sasi.disk.OnDiskIndexBuilder$MutableLevel.flush(OnDiskIndexBuilder.java:446)
~[apache-cassandra-3.10-SNAPSHOT.jar:3.10-SNAPSHOT]
at
org.apache.cassandra.index.sasi.disk.OnDiskIndexBuilder$MutableLevel.add(OnDiskIndexBuilder.java:433)
~[apache-cassandra-3.10-SNAPSHOT.jar:3.10-SNAPSHOT]
at
org.apache.cassandra.index.sasi.disk.OnDiskIndexBuilder.addTerm(OnDiskIndexBuilder.java:207)
~[apache-cassandra-3.10-SNAPSHOT.jar:3.10-SNAPSHOT]
at
org.apache.cassandra.index.sasi.disk.OnDiskIndexBuilder.finish(OnDiskIndexBuilder.java:293)
~[apache-cassandra-3.10-SNAPSHOT.jar:3.10-SNAPSHOT]
at
org.apache.cassandra.index.sasi.disk.OnDiskIndexBuilder.finish(OnDiskIndexBuilder.java:258)
~[apache-cassandra-3.10-SNAPSHOT.jar:3.10-SNAPSHOT]
at
org.apache.cassandra.index.sasi.disk.OnDiskIndexBuilder.finish(OnDiskIndexBuilder.java:241)
~[apache-cassandra-3.10-SNAPSHOT.jar:3.10-SNAPSHOT]
at
org.apache.cassandra.index.sasi.disk.PerSSTableIndexWriter$Index.lambda$scheduleSegmentFlush$0(PerSSTableIndexWriter.java:267)
~[apache-cassandra-3.10-SNAPSHOT.jar:3.10-SNAPSHOT]
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
~[na:1.8.0_101]
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
~[na:1.8.0_101]
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
[na:1.8.0_101]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_101]

As well as some of these:

ERROR [CompactionExecutor:3] 2016-11-04 13:49:13,142 DataTracker.java:168 -
Can't open index file at

Re: Mix and match SASI with standard indexes not allowed?

2016-10-29 Thread Voytek Jarnot
Right.  I don't mind that SASI is not implemented on collections and OR is
not supported... well, I do, but that's neither here nor there.

The surprise was that including both a SASI and a non-SASI in the same
where clause causes a runtime exception.

On Sat, Oct 29, 2016 at 3:13 AM, DuyHai Doan <doanduy...@gmail.com> wrote:

> There was a plan to replace the native 2nd index by SASI. There are a
> couple of JIRA on SASI to support collection & OR clause but all of them
> are blocked by some internal refactoring. I don't know the current status
> of it. Pavel is quite idle recently
>
> On Sat, Oct 29, 2016 at 3:25 AM, Voytek Jarnot <voytek.jar...@gmail.com>
> wrote:
>
>> Scenario (running 3.9, by the way):
>>
>> CREATE TABLE atc_test1.idxtest (
>> pk text PRIMARY KEY,
>> col1 text,
>> col2 text);
>> CREATE CUSTOM INDEX idx2 ON atc_test1.idxtest (col2) USING
>> 'org.apache.cassandra.index.sasi.SASIIndex';
>> CREATE INDEX idx1 ON atc_test1.idxtest (col1);
>>
>> Queries:
>>
>> Works: select * from idxtest where col1='asdf';
>> Works: select * from idxtest where col2='asdf';
>>
>> Does not work: select * from idxtest where col1='asdf' and col2='asdf'
>> allow filtering;
>>
>> Cassandra logs the following:
>> java.lang.ClassCastException:
>> org.apache.cassandra.index.internal.composites.RegularColumnIndex cannot
>> be cast to org.apache.cassandra.index.sasi.SASIIndex
>>
>> I'm guessing the takeaway is that once you go SASI you're committed to
>> only SASI?  Unfortunately - in our case - the non-SASI index is in place
>> because one can't do a SASI index on a map column.
>>
>> Questions:
>>
>> Anyone else encounter this?  Any workarounds you're willing to share?
>>
>> Seems like throwing a runtime exception is not a terrific way of handling
>> this... makes me wonder if there's something amiss with my particular
>> instance/config or if this is simply the way it is...
>>
>
>


Mix and match SASI with standard indexes not allowed?

2016-10-28 Thread Voytek Jarnot
Scenario (running 3.9, by the way):

CREATE TABLE atc_test1.idxtest (
pk text PRIMARY KEY,
col1 text,
col2 text);
CREATE CUSTOM INDEX idx2 ON atc_test1.idxtest (col2) USING
'org.apache.cassandra.index.sasi.SASIIndex';
CREATE INDEX idx1 ON atc_test1.idxtest (col1);

Queries:

Works: select * from idxtest where col1='asdf';
Works: select * from idxtest where col2='asdf';

Does not work: select * from idxtest where col1='asdf' and col2='asdf'
allow filtering;

Cassandra logs the following:
java.lang.ClassCastException:
org.apache.cassandra.index.internal.composites.RegularColumnIndex cannot be
cast to org.apache.cassandra.index.sasi.SASIIndex

I'm guessing the takeaway is that once you go SASI you're committed to only
SASI?  Unfortunately - in our case - the non-SASI index is in place because
one can't do a SASI index on a map column.

Questions:

Anyone else encounter this?  Any workarounds you're willing to share?

Seems like throwing a runtime exception is not a terrific way of handling
this... makes me wonder if there's something amiss with my particular
instance/config or if this is simply the way it is...
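Until mixing index types works, one workaround (a sketch, assuming the extra filtering volume is acceptable) is to restrict the server-side query to the SASI-indexed column only and apply the remaining predicate in the client:

```python
# Work around the SASI/native-index mixing error: query only the
# SASI-indexed column server-side (e.g. SELECT * FROM idxtest WHERE
# col2 = 'asdf'), then filter the natively-indexed column client-side.
# Rows are modeled as dicts standing in for driver result rows; column
# names match the idxtest example above.
def filter_client_side(sasi_rows, col1_value):
    return [r for r in sasi_rows if r["col1"] == col1_value]
```

This trades network and client CPU for correctness, so it only makes sense when the SASI predicate alone is reasonably selective.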