secondary index on static column

2017-01-27 Thread Micha
Hi, I'm quite new to Cassandra and although there is much info on the net, sometimes I cannot find the solution to a problem. In this case, I have a secondary index on a static column and I don't understand the answer I get from my select. A cut-down version of the table is: create table demo (id

Re: secondary index on static column

2017-02-02 Thread Micha
> Maybe your dataset is incorrect (try on a new table) or you hit a bug. > Best, > Romain > On Friday, 27 January 2017 9:44, Micha wrote: > Hi, I'm quite new to cassandra and although there is much info on the net,

ByteOrdered partitioner when using sha-1 as partition key

2017-02-11 Thread Micha
Hi, my table has a SHA-1 sum as partition key. Would the ByteOrdered partitioner be a better choice than the Murmur3Partitioner in this case, since the keys are quite random? cheers, Michael
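Under the ByteOrderedPartitioner a row's token is its raw key bytes, so placement follows the byte order of the keys. The sketch below is a rough check, not a benchmark: it hashes 30,000 made-up keys with SHA-1 and counts how many land in three near-equal slices of the first-byte space, standing in for a hypothetical three-node ring. Because SHA-1 output is close to uniform, the slices come out roughly even, so balance for this one table is not the issue; the partitioner is a cluster-wide setting, though, and the replies in this thread point to BOP's other drawbacks and to checksum collisions.

```python
import hashlib


def first_byte_buckets(n_keys, n_ranges=3):
    """Count SHA-1 keys per contiguous first-byte range.

    Hypothetical model: each range of the first byte stands in for the
    token range one node would own under a byte-ordered partitioner.
    """
    # Boundaries over 0..256, e.g. [0, 85, 171, 256] for three ranges.
    bounds = [round(i * 256 / n_ranges) for i in range(n_ranges + 1)]
    counts = [0] * n_ranges
    for i in range(n_keys):
        first = hashlib.sha1(str(i).encode()).digest()[0]  # leading key byte
        for r in range(n_ranges):
            if bounds[r] <= first < bounds[r + 1]:
                counts[r] += 1
                break
    return counts
```

Calling `first_byte_buckets(30000)` should give three counts near 10,000 each.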

Re: ByteOrdered partitioner when using sha-1 as partition key

2017-02-11 Thread Micha
AM Edward Capriolo <edlinuxg...@gmail.com> wrote: > Probably best to avoid BOP even if you are already hashing keys yourself. What do you do when checksums collide? It is possible, right? > On Saturday, February 11, 2017, Micha

Re: ByteOrdered partitioner when using sha-1 as partition key

2017-02-12 Thread Micha
On 11.02.2017 at 23:56, Jonathan Haddad wrote: > The time it takes to calculate the hash is so insignificant that it doesn't even remotely come close to justifying all the drawbacks. Yes, most tasks (at least for me) are not CPU-bound but I/O- and network-bound. > You can, of course, benchmar

sasi index question (read timeout on many selects)

2017-02-16 Thread Micha
Hi, my table has (among others) three columns, which are unique blobs. So I made the first column the partition key and created two SASI indices for the other two columns. After inserting ca. 90M records I'm not able to query a bunch of rows (sending 1 selects to the cluster) using only a sas

Re: sasi index question (read timeout on many selects)

2017-02-16 Thread Micha
On 16.02.2017 14:30, DuyHai Doan wrote: > Why index BLOB data? It does not make any sense. My partition key is a secure hash sum; I don't index a blob.

Re: sasi index question (read timeout on many selects)

2017-02-16 Thread Micha
It's like having a table (sha256 blob PRIMARY KEY, id timeuuid, data1 text, ...). So both sha256 and id are unique. I would like to query *either* with sha256 *or* with id. I thought this could be done with a SASI index, but it has to be done with a second table (the manual way) or with an MV with
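The "manual way" here means keeping a second table keyed by the other unique column and writing every row to both. A minimal in-memory model, assuming two hypothetical tables `records_by_hash` (keyed by `sha256`) and `records_by_id` (keyed by `id`); plain dicts stand in for the tables, so this only illustrates the write-twice, read-by-either-key pattern and is not driver code.

```python
import hashlib
import uuid


class DualKeyStore:
    """Models two hypothetical tables holding the same rows:
    records_by_hash (sha256 PRIMARY KEY, id, data1) and
    records_by_id   (id PRIMARY KEY, sha256, data1).
    """

    def __init__(self):
        self.by_hash = {}  # stands in for records_by_hash
        self.by_id = {}    # stands in for records_by_id

    def insert(self, sha256, rec_id, data1):
        # The application writes the row to both tables. Against a real
        # cluster the two INSERTs could be sent as one logged batch so
        # that both writes eventually apply.
        row = {"sha256": sha256, "id": rec_id, "data1": data1}
        self.by_hash[sha256] = row
        self.by_id[rec_id] = row

    def get_by_hash(self, sha256):
        return self.by_hash.get(sha256)

    def get_by_id(self, rec_id):
        return self.by_id.get(rec_id)
```

Each lookup then reads exactly one partition of one table instead of going through an index.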

Re: sasi index question (read timeout on many selects)

2017-02-16 Thread Micha
On 16.02.2017 16:33, Jonathan Haddad wrote: > I agree w/ DuyHai regarding the index. The use case described here is a terrible one for SASI indexes. > Regarding MVs, do not use the ones that shipped with 3.x. They're not ready for production. Manage it yourself by using a second table a

Re: sasi index question (read timeout on many selects)

2017-02-16 Thread Micha
On 16.02.2017 16:33, Jonathan Haddad wrote: > Regarding MVs, do not use the ones that shipped with 3.x. They're not ready for production. Manage it yourself by using a second table and inserting a second record there. Out of interest... there is a slight discrepancy between the advic

recovering from failed repair , cassandra 3.10

2017-05-31 Thread Micha
Hi, after a failed repair on a three-node cluster, all nodes were down. It cannot start, since it finds a mismatch in an mc_txn_anticompactionafterrepair log file: "got ADD " "expected "ADD:..." The two log files are different: one has "ADD, ADD; REMOVE, REMOVE, COMMIT", the other is missing an "

Re: recovering from failed repair , cassandra 3.10

2017-05-31 Thread Micha
/cassandra/data/KEYSPACE/TABLE-8e40c6b0f4fa11e6a7912b3358087dc0/mc-4241-big,1495845618000,8][2443235315] REMOVE:[/data/1/cassandra/data/KEYSPACE/TABLE-8e40c6b0f4fa11e6a7912b3358087dc0/mc-4249-big,1495856254000,8][681858089] COMMIT:[,0,0][2613697770] On 31.05.2017 11:10, Oleksandr Shulgin wrote: > On We
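When two transaction log files disagree, it can help to list exactly which records differ. A small diagnostic sketch, assuming each record has the shape visible in the quoted log, `TYPE:[path,timestamp,count][checksum]`; it treats record type plus path as a record's identity and ignores the other fields. It only reports differences; it does not decide which file is correct or touch any SSTables. The paths in the usage example are hypothetical.

```python
import re
from collections import Counter

# Record shape as seen in the quoted log: TYPE:[path,timestamp,count][checksum]
RECORD = re.compile(r"([A-Z]+):\[([^,\]]*),(\d+),(\d+)\]\[(\d+)\]")


def parse_txn_log(text):
    """Return (type, path) for every record found in a txn log's text."""
    return [(m.group(1), m.group(2)) for m in RECORD.finditer(text)]


def diff_txn_logs(text_a, text_b):
    """Return (records only in a, records only in b) as Counters."""
    a = Counter(parse_txn_log(text_a))
    b = Counter(parse_txn_log(text_b))
    return a - b, b - a
```

For two logs that differ by one REMOVE record, `diff_txn_logs` returns that record in the first Counter and an empty second Counter.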

jbod disk usage unequal

2017-06-29 Thread Micha
Hi, I use a JBOD setup (2 × 1TB) and the distribution is a little bit unequal on my three nodes: 270MB and 540MB, 150 and 580, 290 and 500. SSTable size varies between 2GB and 130GB. Is it possible to move SSTables from one disk to another to balance the disk usage? Otherwise, is a RAID-0 setup the
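To put a number on "a little bit unequal", the sketch below takes the per-node figures from the post (units as given there; the node labels are hypothetical) and computes the share of each node's data that sits on its fuller disk, where 0.5 would be perfectly even. With SSTables ranging up to 130GB, a single large file can plausibly account for much of the gap.

```python
# Per-node usage of the two JBOD disks, as reported in the post.
usage = {"node1": (270, 540), "node2": (150, 580), "node3": (290, 500)}


def fuller_disk_share(a, b):
    """Fraction of a node's data on its fuller disk (0.5 means even)."""
    return max(a, b) / (a + b)


shares = {node: round(fuller_disk_share(*disks), 3) for node, disks in usage.items()}
# node1: 0.667, node2: 0.795, node3: 0.633
```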

Re: jbod disk usage unequal

2017-07-04 Thread Micha
Thanks for answering. On 03.07.2017 20:01, Jeff Jirsa wrote: > Is there a reason you feel it's required, other than being bothered by the fact that they're not equal? Just out of interest. I'm not sure if it would spread the I/O better between the disks if the files are spread more even

error 1300 from csv export

2017-07-10 Thread Micha
Hi, I got some errors from a csv export of a table. They are of the form: "Error for (number-1, number-2): ReadFailure Error from server: code=1300 ... " At the end "Exported 650 ranges out of 658 total, some records might be missing" Is there a way to start the export only for the failed rang

Re: error 1300 from csv export

2017-07-10 Thread Micha
Sorry for the noise, somehow I overlooked the COPY options BEGINTOKEN and ENDTOKEN. Michael On 10.07.2017 13:11, Micha wrote: > Hi, > I got some errors from a csv export of a table. > They are of the form: > "Error for (number-1, number-2): ReadFailure Error from
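With many failed ranges, retrying them by hand gets tedious. A sketch that scans the export output for messages of the form `Error for (number-1, number-2): ...` and builds one cqlsh `COPY ... TO` command per range with BEGINTOKEN and ENDTOKEN. It assumes the two numbers in each message can be passed straight through as the begin and end tokens; the table name and output file prefix are hypothetical, and the exact option syntax (the values are quoted here) should be checked against `HELP COPY` in your cqlsh version.

```python
import re

# Matches the token pair in "Error for (number-1, number-2): ReadFailure ..."
FAILED_RANGE = re.compile(r"Error for \((-?\d+), (-?\d+)\)")


def retry_commands(export_log, table, out_prefix):
    """Build one cqlsh COPY TO command per failed token range in export_log."""
    cmds = []
    for i, m in enumerate(FAILED_RANGE.finditer(export_log)):
        begin, end = m.group(1), m.group(2)
        cmds.append(
            f"COPY {table} TO '{out_prefix}_{i}.csv' "
            f"WITH BEGINTOKEN = '{begin}' AND ENDTOKEN = '{end}';"
        )
    return cmds
```

Writing each range to its own file keeps a second failure from overwriting a partial export of another range.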

adding nodes to a cluster and changing rf

2017-07-13 Thread Micha
Hi, I want to extend my cluster (C* 3.9) from three nodes with RF 2 to seven nodes with RF 3. Is there a preferred way to do this? For example: setting "auto_bootstrap: true" and bootstrapping one new node at a time? Setting "auto_bootstrap: false", starting all new nodes at once and then "
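A back-of-the-envelope check on the target layout, assuming a single data center and evenly balanced token ownership: with N nodes and replication factor RF, each node stores about RF/N of the data set. This says nothing about which bootstrap procedure is preferable; it only sizes the end state.

```python
from fractions import Fraction


def share_per_node(rf, nodes):
    """Fraction of the full data set each node stores, assuming even ownership."""
    return Fraction(rf, nodes)


before = share_per_node(2, 3)  # 2/3 of the data on every node today
after = share_per_node(3, 7)   # 3/7 of the data on every node afterwards
```

Each node's share drops from about 67% to about 43%, while the total data stored across the cluster grows by a factor of 3/2 because every row gains a third replica. Raising the RF does not by itself copy data to the new replicas; a repair is what fills them in.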

secondary index use case

2017-07-20 Thread Micha
Hi, even after reading a lot about secondary index usage I'm not sure if I have the right use case for it. My table will contain about 150,000,000 records (about 2KB of data each). There are two UUIDs used to identify a row. One UUID is unique for each row, the other UUID is something like a groupi
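Rough sizing can inform this choice. A back-of-the-envelope sketch using the numbers in the post, assuming "about 2KB" means 2 KiB per row: the full rows come to roughly 286 GiB per replica, while a hypothetical second lookup table storing only the grouping UUID and the row UUID (two 16-byte values per row) is under 5 GiB per replica. Compression, per-cell overhead and index files are ignored, and query patterns and cardinality, which usually decide whether a secondary index fits, are not modeled at all.

```python
RECORDS = 150_000_000
ROW_BYTES = 2 * 1024  # assumption: "about 2KB" read as 2 KiB per row
UUID_BYTES = 16       # a UUID is 128 bits


def payload_gib(records, bytes_per_record, rf=1):
    """Raw payload in GiB; ignores compression, per-cell overhead and index files."""
    return records * bytes_per_record * rf / 2**30


full_rows = payload_gib(RECORDS, ROW_BYTES)            # about 286.1 GiB per replica
lookup_rows = payload_gib(RECORDS, 2 * UUID_BYTES)     # about 4.5 GiB per replica
```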

UndeclaredThrowableException, C* 3.11

2017-08-02 Thread Micha
Hi, has anyone experienced this? I added a fourth node to my cluster; after the bootstrap I changed RF from 2 to 3 and ran nodetool repair on the new node. A few hours later the repair command exited with an UndeclaredThrowableException and the node was down. In the logs I don't see a reason

Re: UndeclaredThrowableException, C* 3.11

2017-08-02 Thread Micha
ok, thanks, so I'll just start it again... On 02.08.2017 11:51, kurt greaves wrote: > If the repair command failed, repair also failed. Regarding % repaired, no it's unlikely you will see 100% repaired after a single repair. Maybe after a few consecutive repairs with no data load you might g

rebuild constantly fails, 3.11

2017-08-08 Thread Micha
Hi, it seems I'm not able to add a 3-node DC to a 3-node DC. After starting the rebuild on a new node, nodetool netstats shows it will receive 1200 files from node-1 and 5000 from node-2. The stream from node-1 completes, but the stream from node-2 always fails after sending ca. 4000 files. Afte

Re: rebuild constantly fails, 3.11

2017-08-08 Thread Micha
No, I have left it at the default value of 24 hours. I've read about adjusting phi_convict_threshold, but I haven't done this yet as the network is stable. Maybe I'll set it to 10. On 08.08.2017 15:24, ZAIDI, ASAD A wrote: > Is there any chance you've set streaming_socket_timeout_in_ms parameter s
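Cassandra's `*_in_ms` settings take milliseconds, so the 24-hour value mentioned above is 86,400,000 for `streaming_socket_timeout_in_ms`. A trivial conversion helper (the name is hypothetical) is enough to sanity-check such values before editing `cassandra.yaml`; note that `phi_convict_threshold` is a unitless number, not a duration.

```python
def hours_to_ms(hours):
    """Convert hours to milliseconds, the unit used by *_in_ms settings."""
    return hours * 60 * 60 * 1000


STREAMING_SOCKET_TIMEOUT_IN_MS = hours_to_ms(24)  # 86400000, i.e. 24 hours
```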

Re: rebuild constantly fails, 3.11

2017-08-08 Thread Micha
t detailing the failure error so we can have a better idea of the nature of the failure. Adjusting phi_convict_threshold may yet be another shot in the dark when we don't know what is causing the failure and the network is supposedly stable. ~Asad -----Original M

effect of partition size

2017-12-11 Thread Micha
Hi, what are the effects of large partitions? I have a few tables with partition sizes as follows: 95% 24000, 98% 42000, 99% 85000, Max 82000. So, should I redesign the schema to make this max smaller, or doesn't it matter much, since 99% of the partitions are <= 85000? Thanks for answerin

Re: effect of partition size

2017-12-11 Thread Micha
ok, thanks for the answer. So the better approach here is to adjust the table schema to keep the partition size at around 100MB max. This means using a partition key with multiple parts and issuing several selects instead of one when querying the data (which may increase parallelism). Michael
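A sketch of the multi-part partition key idea, assuming a hypothetical bucketed key `PRIMARY KEY ((base_key, bucket), clustering_col)`: each logical partition is split across `n` buckets chosen by hashing the clustering value, and a full read becomes one select per bucket. `buckets_needed` picks `n` from the largest partition size and the ~100MB target (read here as 100 MiB), assuming rows spread evenly over buckets. `zlib.crc32` is used instead of Python's `hash()` because string hashing in Python is randomized per process, and the bucket must be stable across writers and readers.

```python
import math
import zlib


def buckets_needed(max_partition_bytes, target_bytes=100 * 1024**2):
    """Smallest bucket count that brings the largest partition under target_bytes."""
    return max(1, math.ceil(max_partition_bytes / target_bytes))


def bucket_for(clustering_key, n_buckets):
    """Stable bucket for a row, derived from its clustering value (hypothetical scheme)."""
    return zlib.crc32(clustering_key.encode()) % n_buckets


def partition_keys(base_key, n_buckets):
    """Every (base_key, bucket) partition a reader must query for one logical partition."""
    return [(base_key, b) for b in range(n_buckets)]
```

For a 250 MiB logical partition this gives three buckets, so a reader issues three selects that can run in parallel.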