copy to stdout fails in cqlsh

2017-01-31 Thread Micha
Hi,

With cqlsh running on one of the cluster machines I get the following
error when issuing:

use my_keyspace;
copy demo to stdout;

Error:
() got an unexpected keyword argument 'encoding'

Seems like a python driver issue.

However, if I start cqlsh in debug mode, the export works without errors.
Does debug mode switch in different libraries?
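
To rule out a mix of cqlsh's bundled driver and a system-installed one,
I check it roughly like this (a sketch; CQLSH_NO_BUNDLED is, as far as I
can tell, an environment variable cqlsh honours to skip its bundled
driver zip, and the hostname is the one from the session below):

# which cassandra-driver does the system Python see?
python -c "import cassandra; print(cassandra.__version__); print(cassandra.__file__)"

# force cqlsh to use the system driver instead of the bundled one
CQLSH_NO_BUNDLED=true cqlsh --debug cassandra-dev01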


On my local machine I compiled the newest master-branch Python driver.
When connecting to the Cassandra cluster and issuing the copy command,
it fails with the same error as above, this time in both normal and debug mode.
The debug output of the copy command is:


home:~/python-driver-master$ cqlsh --debug cassandra-dev01


Using CQL driver: 
Using connect timeout: 5 seconds
Using 'utf-8' encoding
Using ssl: False
Connected to TestCluster at cassandra-dev01:9042.
[cqlsh 5.0.1 | Cassandra 3.9 | CQL spec 3.4.2 | Native protocol v4]
Use HELP for help.
cqlsh> USE spielplatz_5;
cqlsh:spielplatz_5> copy demo3  to STDOUT  ;
Detected 8 core(s)
Using 7 child processes

Starting copy of spielplatz_5.demo3 with columns [id, added, dest, id2, source].
Closing parent cluster sockets
Closing parent cluster sockets
Closing parent cluster sockets
Closing parent cluster sockets
Closing parent cluster sockets
Closing parent cluster sockets
Closing parent cluster sockets
Created connection to ('127.0.0.1',) with page size 1000 and timeout 10 seconds per page
Closing queues...
Created connection to ('127.0.0.1',) with page size 1000 and timeout 10 seconds per page
Closing queues...
Created connection to ('192.168.178.1',) with page size 1000 and timeout 10 seconds per page
Created connection to ('192.168.178.1',) with page size 1000 and timeout 10 seconds per page
Created connection to ('192.168.178.1',) with page size 1000 and timeout 10 seconds per page
Created connection to ('192.168.178.1',) with page size 1000 and timeout 10 seconds per page
Created connection to ('192.168.178.1',) with page size 1000 and timeout 10 seconds per page
Created connection to ('127.0.0.1',) with page size 1000 and timeout 10 seconds per page
Closing queues...
Created connection to ('127.0.0.1',) with page size 1000 and timeout 10 seconds per page
Process ExportProcess-4:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
Closing queues...
    self.run()
  File "/usr/lib/python2.7/dist-packages/cqlshlib/copyutil.py", line 1521, in run
    self.inner_run()
  File "/usr/lib/python2.7/dist-packages/cqlshlib/copyutil.py", line 1544, in inner_run
    self.start_request(token_range, info)
  File "/usr/lib/python2.7/dist-packages/cqlshlib/copyutil.py", line 1573, in start_request
    metadata = session.cluster.metadata.keyspaces[self.ks].tables[self.table]
KeyError: 'spielplatz_5'
Created connection to ('127.0.0.1',) with page size 1000 and timeout 10 seconds per page
Closing queues...
Created connection to ('127.0.0.1',) with page size 1000 and timeout 10 seconds per page
Closing queues...
Created connection to ('127.0.0.1',) with page size 1000 and timeout 10 seconds per page
Closing queues...
Process ExportProcess-1:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/dist-packages/cqlshlib/copyutil.py", line 1521, in run
    self.inner_run()
  File "/usr/lib/python2.7/dist-packages/cqlshlib/copyutil.py", line 1544, in inner_run
    self.start_request(token_range, info)
  File "/usr/lib/python2.7/dist-packages/cqlshlib/copyutil.py", line 1573, in start_request
    metadata = session.cluster.metadata.keyspaces[self.ks].tables[self.table]
KeyError: 'spielplatz_5'
Process ExportProcess-7:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/dist-packages/cqlshlib/copyutil.py", line 1521, in run
    self.inner_run()
  File "/usr/lib/python2.7/dist-packages/cqlshlib/copyutil.py", line 1544, in inner_run
    self.start_request(token_range, info)
  File "/usr/lib/python2.7/dist-packages/cqlshlib/copyutil.py", line 1573, in start_request
    metadata = session.cluster.metadata.keyspaces[self.ks].tables[self.table]
KeyError: 'spielplatz_5'
Child process 14957 died with exit code 1
Child process 14961 died with exit code 1
Child process 14968 died with exit code 1
Exported 5 ranges out of 513 total ranges, some records might be missing
Processed: 0 rows; Rate:   0 rows/s; Avg. rate:   0 rows/s
0 rows exported to 0 files in 0.311 seconds.




Any hints?


cheers,
 Michael



Re: unexpected select result on secondary index on static column

2017-01-30 Thread Micha
I have restarted the three-node cluster with new directories for data
and commitlog and ran the test again.

This time the result set size is 62 rows for the select.
If I execute the select repeatedly, it jumps between 62 and 65 rows.

After inserting a second row I get 129 rows back when I select using
the id2 of the new row.


I have no idea what the cause is, especially since it works for you.

One thing: the Cassandra startup log says I should use the newest Oracle
JDK release instead of OpenJDK. Could this be causing trouble?


cheers,
 Michael



unexpected select result on secondary index on static column

2017-01-30 Thread Micha
Hi,

I have a secondary index on a static column and I don't
understand the answer I get from my select.

Maybe someone who understands the inner workings of secondary indexes
can give me a hint on this (Cassandra 3.9).

A cut down version of the table is:

create table demo (id text, id2 bigint static, added timestamp, source text static, dest text, primary key (id, added));

create index on demo (id2);

id and id2 match one to one.

I make one insert:
insert into demo (id, id2, added, source, dest) values ('id1', 22, '2017-01-28', 'src1', 'dst1');


The "select from demo;" gives the expected answer of the one inserted row.

But "select from demo where id2=22" gives 70 rows as result (all the same).

Why? I have read
https://www.datastax.com/dev/blog/cassandra-native-secondary-index-deep-dive

but I don't get it...

thanks for helping
 Michael



Re: unexpected select result on secondary index on static column

2017-01-30 Thread Micha
Hi,

Forget my last mail. On the single-node cluster it works.

I can try it on the three-node cluster with a keyspace with a replication
factor of 1 and see what happens.
I left most of the default Cassandra config untouched, except for the
storage directories and IP addresses.

Cheers,
 Michael







On 30.01.2017 10:54, Benjamin Lerer wrote:
> Hi Michael,
> 
> It sounds like a bug but I could not reproduce it on 3.9 or on the current
> 3.11 branch (which will become 3.10).
> Now, that does not mean that there is no problem. Something might be
> different in your environment.
> Do you still see the problem when you start from a clean environment with
> a single node?
> 
> Benjamin


Re: unexpected select result on secondary index on static column

2017-01-30 Thread Micha
Hi,

My cluster is quite new, with three (Debian jessie) nodes and only some
test tables with a few rows of data in them.

I just started a fresh one-node cluster on another machine, created the
table, then the secondary index on the static column, and inserted one
row of data.

create table demo (id text, added timestamp, dest text, id2 bigint static, source text static, primary key (id, added));

create index id2_index on demo (id2);

insert into demo (id, added, dest, id2, source) values ('id-1', '2017-01-30', 'dest-1', 22, 'source');

select * from demo gives one row
select * from demo where id2 = 22 gives 194 rows(!), all the same.

The only difference is that the replication factor is 1 instead of 2
this time.
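
For reference, the keyspace definitions being compared differ only in
the replication factor; roughly (keyspace names here are placeholders):

cqlsh> create keyspace demo_rf1 with replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
cqlsh> create keyspace demo_rf2 with replication = {'class': 'SimpleStrategy', 'replication_factor': 2};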

If you need more info or logs, I'm happy to provide them.

Cheers,
 Michael






On 30.01.2017 10:54, Benjamin Lerer wrote:
> Hi Michael,
> 
> It sounds like a bug but I could not reproduce it on 3.9 or on the
> current 3.11 branch (which will become 3.10).
> Now, that does not mean that there is no problem. Something might be
> different in your environment.
> Do you still see the problem when you start from a clean environment
> with a single node?
> 
> Benjamin


Re: unexpected select result on secondary index on static column

2017-01-30 Thread Micha
OK, thanks, that was good!

You created the keyspace with replication factor 3. If I do this, it
works on my cluster too!

If I try this in a new keyspace with replication factor 2 I get the same
result as before, or nearly: this time it is 58 rows.

I can reproduce this:
3-node cluster and replication factor 3 -> it works
3-node cluster and replication factor 2 (or 1) -> wrong result.


Are there restrictions on the replication factor? Does it matter that
the index is stored locally (as I have read)?

I did use cqlsh, and I also get the same result when using the Java API.
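
One more thing I can try (just a sketch, I have not drawn any conclusions
from it yet): vary the consistency level in cqlsh and trace the query, to
see whether the duplicate rows depend on which replicas are asked:

cqlsh> CONSISTENCY ALL;
cqlsh> TRACING ON;
cqlsh> select * from demo where id2 = 22;
cqlsh> CONSISTENCY ONE;
cqlsh> select * from demo where id2 = 22;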

Thanks for helping,
 Michael


On 30.01.2017 16:50, Benjamin Lerer wrote:
> So far I have not had any success in trying to reproduce the problem.
> I created a 3 node cluster using 3.9 with ccm and used CQLSH to reproduce
> the problem but I only got back one row:
> 
> cqlsh> create KEYSPACE test WITH replication = {'class': 'SimpleStrategy',
> 'replication_factor': 3};
> cqlsh> use test;
> cqlsh:test> create table demo (id text, id2 bigint static, added timestamp,
> source
> ... text static, dest text, primary key (id, added));
> cqlsh:test> create index on demo (id2);
> cqlsh:test> insert into demo (id, id2, added, source, dest) values ('id1',
> 22,
> ... '2017-01-28', 'src1', 'dst1');
> cqlsh:test> select * from demo where id2=22;
> 
>  id  | added                    | id2 | source | dest
> -----+--------------------------+-----+--------+------
>  id1 | 2017-01-27 23:00:00.00+ |  22 |   src1 | dst1
> 
> (1 rows)
> cqlsh:test>
> 
> Did you use CQLSH to reproduce the problem or another client?
> 
> 


UndeclaredThrowableException, C* 3.11

2017-08-01 Thread Micha
Hi,


I added a fourth node to my cluster; after the bootstrap I changed the RF
from 2 to 3 and ran nodetool repair on the new node.

A few hours later the repair command exited with an
UndeclaredThrowableException and the node was down.

In the logs I don't see a reason for the exception or the shutdown.
How can I know whether the repair was successful?

There are messages with the repair id and "Session with... is complete"
and "All sessions completed" and "Sync completed using session..."


Is this an indication of a completed repair?

The output of nodetool info shows "Percent Repaired: 26.18%". Should
this be 100% after a completed repair?
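
In case it matters, this is roughly how I am checking the logs (a sketch;
the log path is the one used by the Debian package, adjust as needed):

$ grep -i 'repair' /var/log/cassandra/system.log | grep -iE 'fail|error|exception'
$ grep -E 'Sync completed|All sessions completed' /var/log/cassandra/system.log | tail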


Thanks,
 Michael





Re: rebuild constantly fails, 3.11

2017-08-11 Thread Micha
The nodes have 32G RAM; there are no other processes running.

Thanks for the info about G1GC.
I used nodetool bootstrap resume to finish the bootstrap, then added
another two nodes.
This worked, but in munin I saw a constantly rising memory consumption
during streaming, while the other nodes used a constant amount of
memory. Nothing was killed this time, since the streaming took much
less time to complete.
I think I'll switch to CMS on the problem node and compare the memory
usage and the nodetool info output.
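
For that comparison, the CMS settings I plan to put into conf/jvm.options
on that node would look roughly like this (heap and new size taken from
the suggestion quoted below; a sketch, not the full file, and the G1
lines would be commented out):

-Xms8G
-Xmx8G
-Xmn2G
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled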



 Michael

Am 11.08.2017 um 17:55 schrieb kurt greaves:
> How much memory do these machines have? Typically we've found that G1
> isn't worth it until you get to around 24G heaps, and even then it's
> not really better than CMS. You could try CMS with an 8G heap and 2G new
> size.
> 
> However, as the OOM is only happening on one node, have you ensured there
> are no extra processes running on that node that could be consuming
> extra memory? Note that the OOM killer will kill the process with the
> highest OOM score, which generally corresponds to the process using the
> most memory, but that is not necessarily the problem process.
> 
> Also could you run nodetool info on the problem node and 1 other and
> dump the output in a gist? It would be interesting to see if there is a
> significant difference in off-heap.
> 




Re: rebuild constantly fails, 3.11

2017-08-11 Thread Micha
It's an OOM issue; the kernel kills the Cassandra process.
The config was off-heap buffers and a 20G Java heap; I changed this to
heap buffers and a 16G Java heap. I added a new node yesterday which got
streams from 4 other nodes. They all succeeded, except on the one node
which failed before. This time, again, the db was killed by the kernel.
At the moment I don't know the reason, since the nodes are identical.

To me it seems G1GC is not able to free the memory fast enough.
The settings were MaxGCPauseMillis=600, ParallelGCThreads=10 and
ConcGCThreads=10, which may be too high since the node has only 8 cores.
I changed this to ParallelGCThreads=8 and ConcGCThreads=2, as mentioned
in the comments of jvm.options.
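
So the relevant part of conf/jvm.options now looks roughly like this
(a sketch of the change, not a full listing):

-XX:+UseG1GC
-XX:MaxGCPauseMillis=600
# previously: -XX:ParallelGCThreads=10 and -XX:ConcGCThreads=10
-XX:ParallelGCThreads=8
-XX:ConcGCThreads=2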

Since the bootstrap of the fifth node did not complete I will start it
again and check if the memory is still decreasing over time.



 Michael



On 11.08.2017 01:25, Jeff Jirsa wrote:
> 
> 
> On 2017-08-08 01:00 (-0700), Micha <mich...@fantasymail.de> wrote: 
>> Hi,
>>
>> it seems I'm not able to add a 3 node DC to a 3 node DC. After
>> starting the rebuild on a new node, nodetool netstats shows it will
>> receive 1200 files from node-1 and 5000 from node-2. The stream from
>> node-1 completes but the stream from node-2 always fails, after sending
>> ca. 4000 files.
>>
>> After restarting the rebuild it again starts to send the 5000 files.
>> The whole cluster is connected via one switch only, no firewall in
>> between, and the network shows no errors.
>> The machines have 8 cores, 32GB RAM and two 1TB disks as RAID 0.
>> The logs show no errors. The size of the data is ca. 1TB.
> 
> Is there anything in `dmesg` ?  System logs? Nothing? Is node2 running? Is 
> node3 running? 
> 



possible race in copy from csv, cassandra 3.9

2017-07-13 Thread Micha
I use "copy from" to import a bunch of csv files, each with 100 rows
(exported by "copy to" from another table)

Sometimes the COPY FROM just hangs after importing 9995000 lines, doing
nothing and waiting forever.

Could this be a race in the COPY code? I use NUMPROCESSES=6; in bash,
"ps" shows 8 started cqlsh.py processes. It looks to me as if all of the
processes are waiting for the signal that the import is done, but they
are all blocked, so they wait forever.

Checking random records of the imported file indicates that it was
completely imported.

The logs show no errors.

Any suggestions?

Thanks,
 Michael








Re: sstabledump expects jna 5.1.0

2017-07-18 Thread Micha

OK, if I uninstall the package libjna-jni, which also removes
libjna-java, the error is gone and sstabledump works.

On startup Cassandra's system log still contains "JNA mlockall
successful", which means that JNA is working for Cassandra.
I thought this library was necessary for using JNA?
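
For anyone hitting the same thing, the conflict can be seen by comparing
the system JNA package with the jar shipped with Cassandra (paths are
for the Debian package; adjust as needed):

$ dpkg -l | grep -i jna
$ ls /usr/share/cassandra/lib/ | grep -i jna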



 Michael



On 18.07.2017 13:36, Stefan Podkowinski wrote:
> I haven't been able to reproduce this on Ubuntu or CentOS. Which OS do
> you use? Did you install a pre-built package or a tarball?
> 
> On 18.07.2017 11:43, Micha wrote:
>> Hello,
>>
>> when calling sstabledump from cassandra 3.11 I get the error:
>>
>>
>> "There is an incompatible JNA native library installed on this system
>> Expected: 5.1.0
>> Found: 4.0.0"
>>
>> Maybe I overlooked something, but after searching I found the newest
>> JNA release to be 4.4, with 4.5 as the upcoming version.
>>
>> My java version is 1.8.0_131, build 25.131-b11
>>
>> Setting jna.nosys=true works however.
>>
>> So, where does the required version 5.1 come from?
>>
>> thanks,
>>  Michael
>>
>>



sstabledump expects jna 5.1.0

2017-07-18 Thread Micha
Hello,

When calling sstabledump from Cassandra 3.11 I get the error:


"There is an incompatible JNA native library installed on this system
Expected: 5.1.0
Found: 4.0.0"

Maybe I overlooked something, but after searching I found the newest
JNA release to be 4.4, with 4.5 as the upcoming version.

My Java version is 1.8.0_131 (build 25.131-b11).

Setting jna.nosys=true works, however.

So, where does the required version 5.1 come from?

thanks,
 Michael





rebuild constantly fails, 3.11

2017-08-08 Thread Micha
Hi,

It seems I'm not able to add a 3 node DC to a 3 node DC. After
starting the rebuild on a new node, nodetool netstats shows it will
receive 1200 files from node-1 and 5000 from node-2. The stream from
node-1 completes but the stream from node-2 always fails, after sending
ca. 4000 files.

After restarting the rebuild it again starts to send the 5000 files.
The whole cluster is connected via one switch only, no firewall in
between, and the network shows no errors.
The machines have 8 cores, 32GB RAM and two 1TB disks as RAID 0.
The logs show no errors. The size of the data is ca. 1TB.
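
For completeness, the rebuild on each new node is started roughly like
this (the DC name is a placeholder):

$ nodetool rebuild -- name-of-existing-dc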


Any help is really welcome,

cheers
 Michael






The error is:

Cassandra has shutdown.
error: null
-- StackTrace --
java.io.EOFException
at java.io.DataInputStream.readByte(DataInputStream.java:267)
at sun.rmi.transport.StreamRemoteCall.executeCall(StreamRemoteCall.java:222)
at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:161)
at com.sun.jmx.remote.internal.PRef.invoke(Unknown Source)
at javax.management.remote.rmi.RMIConnectionImpl_Stub.invoke(Unknown Source)
at javax.management.remote.rmi.RMIConnector$RemoteMBeanServerConnection.invoke(RMIConnector.java:1020)
at javax.management.MBeanServerInvocationHandler.invoke(MBeanServerInvocationHandler.java:298)
at com.sun.proxy.$Proxy7.rebuild(Unknown Source)
at org.apache.cassandra.tools.NodeProbe.rebuild(NodeProbe.java:1190)
at org.apache.cassandra.tools.nodetool.Rebuild.execute(Rebuild.java:58)
at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:254)
at org.apache.cassandra.tools.NodeTool.main(NodeTool.java:168)




exception during repair (3.11)

2017-11-13 Thread Micha
Hi,

I get the following exception during repair. After some of these are
thrown, the Cassandra node shuts down. Now I don't know how to get
things working again.

A few days ago there were errors due to a column value being much too big.
This is fixed, but could that be the cause here: corrupt data from an
insert with an oversized column?
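
Since the failure is in hint deserialization (org.apache.cassandra.hints
in the trace below), one idea I am considering, assuming the corrupt data
really is confined to a hint file, is to drop the stored hints and let a
repair re-sync the data:

$ nodetool truncatehints        # discard hints stored on this node
$ nodetool repair my_keyspace   # then repair (keyspace name is a placeholder)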



Cheers
 Michael

C* 3.11,  7 node cluster


Exception:

ERROR [MessagingService-Incoming-/192.168.0.3] 2017-11-13 09:27:42,263 CassandraDaemon.java:228 - Exception in thread Thread[MessagingService-Incoming-/192.168.0.3,5,main]
java.io.IOError: java.io.EOFException: EOF after 261977 bytes out of 316099
at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer$1.computeNext(UnfilteredRowIteratorSerializer.java:227) ~[apache-cassandra-3.11.1.jar:3.11.1]
at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer$1.computeNext(UnfilteredRowIteratorSerializer.java:215) ~[apache-cassandra-3.11.1.jar:3.11.1]
at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) ~[apache-cassandra-3.11.1.jar:3.11.1]
at org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.deserialize30(PartitionUpdate.java:848) ~[apache-cassandra-3.11.1.jar:3.11.1]
at org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.deserialize(PartitionUpdate.java:809) ~[apache-cassandra-3.11.1.jar:3.11.1]
at org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:415) ~[apache-cassandra-3.11.1.jar:3.11.1]
at org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:434) ~[apache-cassandra-3.11.1.jar:3.11.1]
at org.apache.cassandra.hints.Hint$Serializer.deserialize(Hint.java:148) ~[apache-cassandra-3.11.1.jar:3.11.1]
at org.apache.cassandra.hints.HintMessage$Serializer.deserialize(HintMessage.java:123) ~[apache-cassandra-3.11.1.jar:3.11.1]
at org.apache.cassandra.hints.HintMessage$Serializer.deserialize(HintMessage.java:82) ~[apache-cassandra-3.11.1.jar:3.11.1]
at org.apache.cassandra.net.MessageIn.read(MessageIn.java:123) ~[apache-cassandra-3.11.1.jar:3.11.1]
at org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:192) ~[apache-cassandra-3.11.1.jar:3.11.1]
at org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:180) ~[apache-cassandra-3.11.1.jar:3.11.1]
at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:94) ~[apache-cassandra-3.11.1.jar:3.11.1]
Caused by: java.io.EOFException: EOF after 261977 bytes out of 316099
at org.apache.cassandra.io.util.RebufferingInputStream.readFully(RebufferingInputStream.java:68) ~[apache-cassandra-3.11.1.jar:3.11.1]
at org.apache.cassandra.io.util.RebufferingInputStream.readFully(RebufferingInputStream.java:60) ~[apache-cassandra-3.11.1.jar:3.11.1]
at org.apache.cassandra.io.util.TrackedDataInputPlus.readFully(TrackedDataInputPlus.java:93) ~[apache-cassandra-3.11.1.jar:3.11.1]
at org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:402) ~[apache-cassandra-3.11.1.jar:3.11.1]
at org.apache.cassandra.db.marshal.AbstractType.readValue(AbstractType.java:446) ~[apache-cassandra-3.11.1.jar:3.11.1]
at org.apache.cassandra.db.rows.Cell$Serializer.deserialize(Cell.java:245) ~[apache-cassandra-3.11.1.jar:3.11.1]
at org.apache.cassandra.db.rows.UnfilteredSerializer.readSimpleColumn(UnfilteredSerializer.java:644) ~[apache-cassandra-3.11.1.jar:3.11.1]
at org.apache.cassandra.db.rows.UnfilteredSerializer.lambda$deserializeRowBody$1(UnfilteredSerializer.java:609) ~[apache-cassandra-3.11.1.jar:3.11.1]
at org.apache.cassandra.utils.btree.BTree.applyForwards(BTree.java:1242) ~[apache-cassandra-3.11.1.jar:3.11.1]
at org.apache.cassandra.utils.btree.BTree.apply(BTree.java:1197) ~[apache-cassandra-3.11.1.jar:3.11.1]
at org.apache.cassandra.db.Columns.apply(Columns.java:377) ~[apache-cassandra-3.11.1.jar:3.11.1]
at org.apache.cassandra.db.rows.UnfilteredSerializer.deserializeRowBody(UnfilteredSerializer.java:605) ~[apache-cassandra-3.11.1.jar:3.11.1]
at org.apache.cassandra.db.rows.UnfilteredSerializer.deserializeOne(UnfilteredSerializer.java:480) ~[apache-cassandra-3.11.1.jar:3.11.1]
at org.apache.cassandra.db.rows.UnfilteredSerializer.deserialize(UnfilteredSerializer.java:436) ~[apache-cassandra-3.11.1.jar:3.11.1]
at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer$1.computeNext(UnfilteredRowIteratorSerializer.java:222) ~[apache-cassandra-3.11.1.jar:3.11.1]
... 13 common frames omitted







some repair failed, 3.11

2017-12-15 Thread Micha
Hi,

After 135 minutes a "nodetool repair -pr .." failed with "Some repair failed".

Is there a way to find out what exactly failed?

The log states "repair command #1 finished in ..."

What should I do now? Just start it again and hope that it finishes
successfully this time?
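
One thing I could try next (a sketch, I have not run it yet): repeat the
repair with tracing enabled and look at the recorded events:

$ nodetool repair -pr -tr my_keyspace     # keyspace name is a placeholder
cqlsh> select * from system_traces.events limit 100;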

thanks
 Michael





how to fix constantly getting out of memory (3.11)

2017-12-12 Thread Micha
Hi,

I have seven nodes running Debian stretch with C* 3.11, each with a 2TB
disk (500G free) and 32G RAM.
I have a keyspace with seven tables. At the moment the cluster doesn't
work reliably at all. Every morning at least 2 nodes have shut down due
to out of memory, and the repair afterwards fails with "some repair failed".
I use G1 with a 16G heap on six nodes and CMS with an 8G heap on one node
to see the difference. In munin it's easy to see a constantly rising
memory consumption. There are no other services running. I cannot tell
what is not releasing the memory.
Some tables have some big rows (as mentioned in my last mail to the
list). Can this be a source of the memory consumption?

How do you track this down? Is there memory which doesn't get released
and accumulates over time? I have not yet debugged such GC/memory issues.
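
What I have as a starting point so far (commands only, nothing
conclusive yet; the keyspace name and pid are placeholders):

$ nodetool info                      # heap used vs. off-heap memory used
$ nodetool tablestats my_keyspace    # off-heap per table (memtables, bloom filters, index summaries)
$ jstat -gcutil <cassandra-pid> 5000 # GC utilisation, sampled every 5 seconds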

cheers
 Michael

