Re: sasi index question (read timeout on many selects)
Btw:

> They break incremental repair if you use CDC:
> https://issues.apache.org/jira/browse/CASSANDRA-12888

Not only when using CDC! You shouldn't use incremental repairs with MVs. Never (right now).

2017-02-16 17:42 GMT+01:00 Jonathan Haddad:

> My advice to avoid them is based on the issues that have been filed in
> Jira. Benjamin Roth is one of the only people talking about his MV usage,
> and has filed a few JIRAs discussing their problems when bootstrapping new
> nodes, as well as issues repairing.
>
> https://issues.apache.org/jira/browse/CASSANDRA-12730?jql=project%20%3D%20CASSANDRA%20and%20reporter%20%3D%20brstgt%20and%20text%20~%20%22materialized%22
>
> They also can't be altered:
> https://issues.apache.org/jira/browse/CASSANDRA-9736
>
> They may be less performant than managing the data yourself:
> https://issues.apache.org/jira/browse/CASSANDRA-10295,
> https://issues.apache.org/jira/browse/CASSANDRA-10307
>
> They're not as flexible as your own tables:
> https://issues.apache.org/jira/browse/CASSANDRA-9928,
> https://issues.apache.org/jira/browse/CASSANDRA-11194,
> https://issues.apache.org/jira/browse/CASSANDRA-12463
>
> They break incremental repair if you use CDC:
> https://issues.apache.org/jira/browse/CASSANDRA-12888
>
> I don't know why DataStax advises using them. Perhaps ask them?
>
> Jon
>
> On Thu, Feb 16, 2017 at 7:57 AM Micha wrote:
>
>> On 16.02.2017 16:33, Jonathan Haddad wrote:
>> >
>> > Regarding MVs, do not use the ones that shipped with 3.x. They're not
>> > ready for production. Manage it yourself by using a second table and
>> > inserting a second record there.
>>
>> Out of interest... there is a slight discrepancy between the advice not
>> to use mv and the docu about the feature on the datastax side. Or do I
>> have to use another cassandra version (instead of 3.9)?

--
Benjamin Roth
Prokurist
Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
Re: sasi index question (read timeout on many selects)
My advice to avoid them is based on the issues that have been filed in Jira. Benjamin Roth is one of the only people talking about his MV usage, and has filed a few JIRAs discussing their problems when bootstrapping new nodes, as well as issues repairing.

https://issues.apache.org/jira/browse/CASSANDRA-12730?jql=project%20%3D%20CASSANDRA%20and%20reporter%20%3D%20brstgt%20and%20text%20~%20%22materialized%22

They also can't be altered:
https://issues.apache.org/jira/browse/CASSANDRA-9736

They may be less performant than managing the data yourself:
https://issues.apache.org/jira/browse/CASSANDRA-10295,
https://issues.apache.org/jira/browse/CASSANDRA-10307

They're not as flexible as your own tables:
https://issues.apache.org/jira/browse/CASSANDRA-9928,
https://issues.apache.org/jira/browse/CASSANDRA-11194,
https://issues.apache.org/jira/browse/CASSANDRA-12463

They break incremental repair if you use CDC:
https://issues.apache.org/jira/browse/CASSANDRA-12888

I don't know why DataStax advises using them. Perhaps ask them?

Jon

On Thu, Feb 16, 2017 at 7:57 AM Micha wrote:

> On 16.02.2017 16:33, Jonathan Haddad wrote:
> >
> > Regarding MVs, do not use the ones that shipped with 3.x. They're not
> > ready for production. Manage it yourself by using a second table and
> > inserting a second record there.
>
> Out of interest... there is a slight discrepancy between the advice not
> to use mv and the docu about the feature on the datastax side. Or do I
> have to use another cassandra version (instead of 3.9)?
Re: sasi index question (read timeout on many selects)
On 16.02.2017 16:33, Jonathan Haddad wrote:
>
> Regarding MVs, do not use the ones that shipped with 3.x. They're not
> ready for production. Manage it yourself by using a second table and
> inserting a second record there.

Out of interest... there is a slight discrepancy between the advice not to use mv and the docu about the feature on the datastax side. Or do I have to use another cassandra version (instead of 3.9)?
Re: sasi index question (read timeout on many selects)
On 16.02.2017 16:33, Jonathan Haddad wrote:
> I agree w/ DuyHai regarding the index. The use case described here is a
> terrible one for SASI indexes.
>
> Regarding MVs, do not use the ones that shipped with 3.x. They're not
> ready for production. Manage it yourself by using a second table and
> inserting a second record there.

yes, thanks for pointing this out.

Michael
Re: sasi index question (read timeout on many selects)
I agree w/ DuyHai regarding the index. The use case described here is a terrible one for SASI indexes.

Regarding MVs, do not use the ones that shipped with 3.x. They're not ready for production. Manage it yourself by using a second table and inserting a second record there.

On Thu, Feb 16, 2017 at 7:06 AM DuyHai Doan wrote:

> Using an MV with id as the partition key is your best bet right now. SASI
> will be too expensive for this simple use case
>
> On Thu, Feb 16, 2017 at 3:21 PM, Micha wrote:
>
>> it's like having a table (sha256 blob primary key, id timeuuid, data1
>> text, ., )
>>
>> So both, sha256 and id are unique.
>> I would like to query *either* with sha256 *or* with id.
>>
>> I thought this can be done with a sasi index, but it has to be done with
>> a second table (manual way) or with a mv with id as partition key.
>>
>> On 16.02.2017 15:11, Benjamin Roth wrote:
>> > No matter what has to be indexed here, the preferable way is most
>> > probably denormalization instead of another index.
>>
>> it's rather manually inserting the data with another partition key or
>> making a mv with the other key.
Re: sasi index question (read timeout on many selects)
Using an MV with id as the partition key is your best bet right now. SASI will be too expensive for this simple use case

On Thu, Feb 16, 2017 at 3:21 PM, Micha wrote:

> it's like having a table (sha256 blob primary key, id timeuuid, data1
> text, ., )
>
> So both, sha256 and id are unique.
> I would like to query *either* with sha256 *or* with id.
>
> I thought this can be done with a sasi index, but it has to be done with
> a second table (manual way) or with a mv with id as partition key.
>
> On 16.02.2017 15:11, Benjamin Roth wrote:
> > No matter what has to be indexed here, the preferable way is most
> > probably denormalization instead of another index.
>
> it's rather manually inserting the data with another partition key or
> making a mv with the other key.
Re: sasi index question (read timeout on many selects)
it's like having a table (sha256 blob primary key, id timeuuid, data1 text, ., )

So both, sha256 and id are unique.
I would like to query *either* with sha256 *or* with id.

I thought this can be done with a sasi index, but it has to be done with a second table (manual way) or with a mv with id as partition key.

On 16.02.2017 15:11, Benjamin Roth wrote:
> No matter what has to be indexed here, the preferable way is most
> probably denormalization instead of another index.

it's rather manually inserting the data with another partition key or making a mv with the other key.
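A minimal sketch of the "second table (manual way)" option mentioned in this message, written in Python only to hold the CQL text and pair each statement with its parameters. The table names `records_by_sha256` and `records_by_id` and the helper `dual_write` are hypothetical; the columns `sha256`, `id`, and `data1` follow the table described above, and the elided columns are left out. It is an illustration under those assumptions, not a tested production schema.

```python
import hashlib
import uuid

# Hypothetical table names; columns follow the table described in the thread.
CREATE_BY_SHA256 = """
CREATE TABLE IF NOT EXISTS records_by_sha256 (
    sha256 blob PRIMARY KEY,
    id timeuuid,
    data1 text
)"""

# Second, denormalized copy of the same data, keyed by id instead of sha256.
CREATE_BY_ID = """
CREATE TABLE IF NOT EXISTS records_by_id (
    id timeuuid PRIMARY KEY,
    sha256 blob,
    data1 text
)"""

INSERT_BY_SHA256 = (
    "INSERT INTO records_by_sha256 (sha256, id, data1) VALUES (?, ?, ?)"
)
INSERT_BY_ID = (
    "INSERT INTO records_by_id (id, sha256, data1) VALUES (?, ?, ?)"
)


def dual_write(payload: bytes, data1: str):
    """Return the two (statement, parameters) pairs for one logical record.

    The application has to send both inserts itself; that is the price of
    not letting a materialized view maintain the second copy.
    """
    sha256 = hashlib.sha256(payload).digest()
    record_id = uuid.uuid1()  # version-1 UUID, the kind a timeuuid column holds
    return [
        (INSERT_BY_SHA256, (sha256, record_id, data1)),
        (INSERT_BY_ID, (record_id, sha256, data1)),
    ]
```

With this layout, a lookup by `sha256` reads `records_by_sha256` and a lookup by `id` reads `records_by_id`, so both queries include a partition key. Keeping the two copies consistent when one of the writes fails is left to the application.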
Re: sasi index question (read timeout on many selects)
[image: Inline image 1]

On Thu, Feb 16, 2017 at 3:08 PM, Micha wrote:

> On 16.02.2017 14:30, DuyHai Doan wrote:
> > Why indexing BLOB data ? It does not make any sense
>
> My partition key is a secure hash sum, I don't index a blob.
Re: sasi index question (read timeout on many selects)
No matter what has to be indexed here, the preferable way is most probably denormalization instead of another index.

2017-02-16 15:09 GMT+01:00 DuyHai Doan:

> [image: Inline image 1]
>
> On Thu, Feb 16, 2017 at 3:08 PM, Micha wrote:
>
>> On 16.02.2017 14:30, DuyHai Doan wrote:
>> > Why indexing BLOB data ? It does not make any sense
>>
>> My partition key is a secure hash sum, I don't index a blob.

--
Benjamin Roth
Re: sasi index question (read timeout on many selects)
On 16.02.2017 14:30, DuyHai Doan wrote:
> Why indexing BLOB data ? It does not make any sense

My partition key is a secure hash sum, I don't index a blob.
Re: sasi index question (read timeout on many selects)
Why indexing BLOB data ? It does not make any sense

"I thought sasi index is globally held, in contrast to the normal secondary index.." --> Who said that ? It's just wrong

On Thu, Feb 16, 2017 at 1:50 PM, Micha wrote:

> Hi,
>
> my table has (among others) three columns, which are unique blobs.
> So I made the first column the partition key and created two sasi
> indices for the two other columns.
>
> After inserting ca 90m records I'm not able to query a bunch of rows
> (sending 1 selects to the cluster) using only a sasi index. After a
> few seconds I get timeouts.
>
> I have read the documents about the sasi index but I don't get why this
> happens. Is this because I don't include the partition key in the query?
>
> I thought sasi index is globally held, in contrast to the normal
> secondary index..
>
> thanks for helping,
> Michael
sasi index question (read timeout on many selects)
Hi,

my table has (among others) three columns, which are unique blobs. So I made the first column the partition key and created two sasi indices for the two other columns.

After inserting ca 90m records I'm not able to query a bunch of rows (sending 1 selects to the cluster) using only a sasi index. After a few seconds I get timeouts.

I have read the documents about the sasi index but I don't get why this happens. Is this because I don't include the partition key in the query?

I thought sasi index is globally held, in contrast to the normal secondary index..

thanks for helping,
Michael
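A toy model may help with the question about the partition key. It assumes that a SASI index, like a regular secondary index, only covers the rows stored on the node that holds it, so a query without a partition key cannot be routed to one node. The cluster size, the hash-based `owner` function, and the `Node` class are all hypothetical simplifications (replication factor 1, no real token ring); the sketch only counts how many nodes each kind of query has to touch.

```python
import hashlib

NUM_NODES = 6  # hypothetical cluster size, replication factor 1 for simplicity


def owner(partition_key: bytes) -> int:
    """Map a partition key to the single node that owns it (toy partitioner)."""
    digest = hashlib.md5(partition_key).digest()
    return int.from_bytes(digest[:8], "big") % NUM_NODES


class Node:
    def __init__(self):
        self.rows = {}         # partition key -> row
        self.local_index = {}  # indexed value -> partition key, this node only

    def write(self, key, row):
        self.rows[key] = row
        self.local_index[row["id"]] = key


def query_by_partition_key(nodes, key):
    """Routed straight to the owning node: one node contacted."""
    node = nodes[owner(key)]
    return node.rows.get(key), 1


def query_by_index(nodes, indexed_value):
    """No partition key given, so every node's local index is consulted."""
    contacted = 0
    result = None
    for node in nodes:
        contacted += 1
        key = node.local_index.get(indexed_value)
        if key is not None:
            result = node.rows[key]
    return result, contacted
```

In this model a lookup by partition key always touches one node, while a lookup through the index touches all `NUM_NODES` of them even though the value is unique. Many concurrent selects of the second kind multiply that fan-out, which is consistent with the timeouts described here, although the real read path is more involved than this sketch.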