Re: Unbalanced cluster

2017-07-10 Thread Nate McCall
You wouldn't have a build file lying around for that, would you?

On Tue, Jul 11, 2017 at 3:23 PM, Nate McCall  wrote:

> On Tue, Jul 11, 2017 at 3:20 AM, Avi Kivity  wrote:
>
>>
>>
>>
>> [1] https://github.com/avikivity/shardsim
>>
>
> Avi, that's super handy - thanks for posting.
>



-- 
-
Nate McCall
Wellington, NZ
@zznate

CTO
Apache Cassandra Consulting
http://www.thelastpickle.com


Re: index_interval

2017-07-10 Thread Jeff Jirsa


On 2017-07-10 15:09 (-0700), Fay Hou [Storage Service] wrote: 
> By default:
> 
> AND max_index_interval = 2048
> AND memtable_flush_period_in_ms = 0
> AND min_index_interval = 128
> 
> "Cassandra maintains index offsets per partition to speed up the lookup
> process in the case of key cache misses (see cassandra read path overview).
> By default it samples a subset of keys, somewhat similar to a skip list.
> The sampling interval is configurable with min_index_interval and
> max_index_interval CQL schema attributes (see describe table). For
> relatively large blobs like HTML pages we seem to get better read latencies
> by lowering the sampling interval from 128 min / 2048 max to 64 min / 512
> max. For large tables like parsoid HTML with ~500G load per node this
> change adds a modest ~25mb off-heap memory."
> 
> I wonder if anyone has experience tuning max_index_interval and
> min_index_interval to improve read speed.

It's usually more efficient to try to tune the key cache, and hope you never 
have to hit the partition index at all. Do you have reason to believe you're 
spending an inordinate amount of IO scanning the partition index? Do you know 
what your key cache hit rate is? 
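
A minimal sketch of the arithmetic behind that last question, assuming you have
read the key cache hit and request counters off your node (for example from the
key cache line of nodetool info). The function name and the sample counters are
made up for illustration:

```python
def key_cache_hit_rate(hits: int, requests: int) -> float:
    """Fraction of key cache lookups that were answered from the cache."""
    if requests == 0:
        # No lookups yet, so there is nothing to measure.
        return 0.0
    return hits / requests


# Hypothetical counters copied by hand from a node.
rate = key_cache_hit_rate(hits=9_200, requests=10_000)
print(f"key cache hit rate: {rate:.1%}")
print(f"reads falling through to the partition index: {1 - rate:.1%}")
```

Only the reads that miss the key cache go on to the partition index, so the
lower this rate is, the more the index interval settings matter.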


-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: Unbalanced cluster

2017-07-10 Thread Nate McCall
On Tue, Jul 11, 2017 at 3:20 AM, Avi Kivity  wrote:

>
>
>
> [1] https://github.com/avikivity/shardsim
>

Avi, that's super handy - thanks for posting.


Re: Unbalanced cluster

2017-07-10 Thread kurt greaves
The reason for the default of 256 vnodes is that with that many tokens,
random token selection is enough to balance each node's token allocation
almost evenly. With fewer tokens, some nodes end up far more unbalanced, as
Avi has shown. In 3.0 there is a new token allocation algorithm; however, it
must be configured before adding a node, and it only really works well if
your RF equals the number of racks or you use only one rack. Have a look
around for the allocate_tokens_for_keyspace option for more details.
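
A rough way to see this effect, as a sketch rather than Cassandra's actual
allocation code: place random tokens on a ring and compare the busiest node's
share with the average share. It assumes a 64-bit ring and looks only at
primary ranges (no replication); max_overcommit and its parameters are
hypothetical names.

```python
import random


def max_overcommit(nodes: int, vnodes: int, seed: int = 0) -> float:
    """Busiest node's share of the ring divided by the average share."""
    rng = random.Random(seed)
    ring_size = 2 ** 64
    # Every node picks `vnodes` tokens uniformly at random.
    tokens = sorted(
        (rng.randrange(ring_size), node)
        for node in range(nodes)
        for _ in range(vnodes)
    )
    owned = [0] * nodes
    # A token owns the range from the previous token up to itself;
    # the first token wraps around from the last one.
    previous = tokens[-1][0] - ring_size
    for token, node in tokens:
        owned[node] += token - previous
        previous = token
    return max(owned) / (ring_size / nodes)


if __name__ == "__main__":
    for vnodes in (4, 32, 256):
        worst = max(max_overcommit(33, vnodes, seed) for seed in range(5))
        print(f"{vnodes:>3} vnodes: worst overcommit over 5 runs = {worst:.2f}")
```

A value of 1.0 means perfectly even ownership. The exact numbers depend on the
seeds, but the spread should shrink as the vnode count grows.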


index_interval

2017-07-10 Thread Fay Hou [Storage Service]
By default:

AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128

"Cassandra maintains index offsets per partition to speed up the lookup
process in the case of key cache misses (see cassandra read path overview).
By default it samples a subset of keys, somewhat similar to a skip list.
The sampling interval is configurable with min_index_interval and
max_index_interval CQL schema attributes (see describe table). For
relatively large blobs like HTML pages we seem to get better read latencies
by lowering the sampling interval from 128 min / 2048 max to 64 min / 512
max. For large tables like parsoid HTML with ~500G load per node this
change adds a modest ~25mb off-heap memory."

I wonder if anyone has experience tuning max_index_interval and
min_index_interval to improve read speed.

Thanks,
Fay


Re: private interface for interdc messaging

2017-07-10 Thread Anuj Wadehra
Hi,
I am not sure why you would want to connect clients on the public interface.
Are you making db calls from clients outside the DC?
Also, I'm not sure why you expect two DCs to communicate over private networks
unless they are two logical DCs within the same physical DC.
Generally, you configure a multi-DC setup in yaml as follows:
- Use GossipingPropertyFileSnitch and set prefer_local to true in
  cassandra-rackdc.properties. This ensures that node-to-node
  communication within a DC happens on the private interface.
- Set rpc_address to the private IP so that clients connect to the private
  interface.
- Set listen_address to the private IP. Cassandra communicates with nodes in
  the local DC using this address.
- Set broadcast_address to the public IP. Cassandra communicates with nodes in
  other DCs using this address.
- Set listen_on_broadcast_address to true.
Thanks
Anuj

 
 
On Fri, 7 Jul 2017 at 22:58, CPC wrote:
Hi,
We are building 2 datacenters where each machine has one public interface (for
native client connections) and one private interface (for internode
communication). What we noticed is that nodes in one datacenter try to
communicate with nodes in the other DC over their public interfaces. I mean:
DC1 Node1 public interface -> DC2 Node1 private interface
But what we prefer is:
DC1 Node1 private interface -> DC2 Node1 private interface

Is there any configuration so that a node makes inter-DC connections over its
private network?
Thank you...


Re: Data Model Suggestion Required

2017-07-10 Thread Jeff Jirsa


On 2017-07-10 07:13 (-0700), Siddharth Prakash Singh  wrote: 
> I am planning to build a user activity timeline. Users on our system
> generate different kinds of activity, for example searching for a product,
> calling our sales team, marking a favourite, etc.
> Now I would like to generate a timeline based on these activities. The
> timeline could cover all events, or be filtered on a specific set of events,
> on a time interval, or on a specific set of events within a time interval.
> Composite column keys look like a viable solution.
> 
> Any other thoughts here?
> 

You probably want to take advantage of multiple/compound clustering keys, at 
least one of which being a timeuuid to give yourself ordering, and one giving 
you a 'type' of event. 

CREATE TABLE whatever (
    product_id uuid,
    event_type text,
    event_id timeuuid,
    event_action text,
    event_data text,
    PRIMARY KEY (product_id, event_id, event_type, event_action, event_data)
);

This will let you do "SELECT * FROM whatever WHERE product_id=?" and get all of 
the events, sorted by time, then by type, then you can have another unique 
"action", and finally a data field where you can shove your blob of whatever it 
is.  This would let you do time slices by specifying "event_id >= X and 
event_id < Y", but you'd need (want) to filter event_type client side.

Alternatively, PRIMARY KEY(product_id, event_type, event_id, event_action, 
event_data) would let you do event_type=X and event_id >= Y and event_id < Z, 
which is all events of a given type within a slice.

"product_id" may not be the natural partition key; feel free to use a compound 
partition key as well (maybe "PRIMARY KEY((product_id, office_id), event_type, 
event_id, event_action, event_data)" to make a partition-per-office, as a silly 
example).
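
To see why the order of the clustering columns decides which slices are cheap,
here is a toy model in plain Python (not driver code): a partition is a list
kept sorted by clustering key, and a query is efficient when it maps to one
contiguous range of that list. Integers stand in for timeuuids, and all rows
and names below are invented for illustration.

```python
from bisect import bisect_left

# Rows of one partition, sorted by clustering key (event_type, event_id),
# followed by a payload column. This mirrors the second PRIMARY KEY layout.
rows = sorted([
    ("call", 105, "sales"),
    ("favourite", 101, "item-7"),
    ("search", 100, "shoes"),
    ("search", 103, "boots"),
    ("search", 110, "hats"),
])


def slice_by_type_and_time(rows, event_type, start, end):
    """Rows of one type with start <= event_id < end, as one contiguous range."""
    lo = bisect_left(rows, (event_type, start))
    hi = bisect_left(rows, (event_type, end))
    return rows[lo:hi]


for row in slice_by_type_and_time(rows, "search", 100, 110):
    print(row)
```

With event_id first in the key instead, rows of different types would be
interleaved by time, so a time slice is contiguous but a single type is not,
which is why the type filter ends up client side in that layout.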






Re: Cassandra crashed with OOM, and the system.log and debug.log doesn't match.

2017-07-10 Thread Jeff Jirsa


On 2017-07-10 02:07 (-0700), 张强  wrote: 
> Hi experts, I have a single Cassandra 3.11.0 node working with kairosdb (a
> time series database). After running for 4 days with a stable workload, the
> database client started to get "request errors", but there were not many
> error or warning messages in the Cassandra log file. The client started to
> receive error messages at about 7-7 21:03:00, and kairosdb kept retrying
> after that time, but there weren't many logs in the Cassandra log file.
> I noticed the abnormal status at about 7-8 16:00:00, then ran a
> "nodetool tablestats" command to get some information. The command returned
> an error, and around that time the Cassandra process started to crash and
> generated a dump file.
> After C* shut down, I looked at the logs to see what happened, and I found
> something strange inside them.
> 
> 1. In the system.log, there are two lines showing that nothing was logged
> between 2017-07-07 21:03:50 and 2017-07-08 16:07:33. I think that is a pretty
> long period without any logs, and the gc.log file has many entries showing
> long GC pauses, which should have been logged in system.log.
> INFO  [ReadStage-1] 2017-07-07 21:03:50,824 NoSpamLogger.java:91 - Maximum
> memory usage reached (512.000MiB), cannot allocate chunk of 1.000MiB

Failing to allocate during read stage is a good indication that you're out of 
memory - either the heap is too small, or it's a direct memory allocation 
failure, or something, but that log line probably shouldn't be at INFO, because 
it seems like it's probably hiding a larger problem. 

> WARN  [PERIODIC-COMMIT-LOG-SYNCER] 2017-07-08 16:07:33,347
> NoSpamLogger.java:94 - Out of 1 commit log syncs over the past 0.00s with
> average duration of 60367.73ms, 1 have exceeded the configured commit
> interval by an average of 50367.73ms

It's taking a full minute to sync your commit log to disk. This is either an 
indication that your disk is broken, or that your JVM is pausing for GC. 

> 
> 2. In the system.log, there is an entry showing a very long GC, after which
> C* started to shut down.
> WARN  [ScheduledTasks:1] 2017-07-08 16:07:46,846 NoSpamLogger.java:94 -
> Some operations timed out, details available at debug level (debug.log)
> WARN  [Service Thread] 2017-07-08 16:10:36,114 GCInspector.java:282 -
> ConcurrentMarkSweep GC in 688850ms.  CMS Old Gen: 2114938312 -> 469583832;
> Par Eden Space: 837584 -> 305319752; Par Survivor Space: 41943040 ->
> 25784008
> ..
> ERROR [Thrift:22] 2017-07-08 16:10:56,322 CassandraDaemon.java:228 -
> Exception in thread Thread[Thrift:22,5,main]
> java.lang.OutOfMemoryError: Java heap space

You ran out of heap. We try to clean up and kill things when this happens, but 
by definition, the JVM is in an undefined state, and we may not be able to shut 
things down properly. 

> 
> 3. In the debug.log, the last INFO level log is at 2017-07-07 14:43:59, the
> log is:
> INFO  [IndexSummaryManager:1] 2017-07-07 14:43:59,967
> IndexSummaryRedistribution.java:75 - Redistributing index summaries
> After that, there are DEBUG level logs until 2017-07-07 21:11:34, but no
> more INFO level or other level logs in that log file, while there are still
> many logs in the system.log after 2017-07-07 14:43:59. Why don't these
> two log files match?
> 
> My hardware is a 4-core CPU and 12G RAM, and I'm using Windows Server 2012
> R2.

That's a bit thin - depending on data model and data volume, you may be able to 
construct a read that fills up your 3G heap, and causes you to OOM with a 
single read. How much data is involved?  What does 'nodetool tablestats' look 
like, and finally, how many reads/second are you doing on this workload?
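
The 3G figure is consistent with the default heap sizing rule described in the
comments of cassandra-env.sh, max(min(1/2 ram, 1024MB), min(1/4 ram, 8GB)). A
small sketch of that rule, assuming it applies unchanged to this 3.11.0 Windows
install (the function name is hypothetical, so check the env script shipped
with your version):

```python
def default_max_heap_mb(system_memory_mb: int) -> int:
    """Default max heap: max(min(1/2 RAM, 1024 MB), min(1/4 RAM, 8192 MB))."""
    half = system_memory_mb // 2
    quarter = system_memory_mb // 4
    return max(min(half, 1024), min(quarter, 8192))


# The machine in this thread has 12G of RAM.
print(default_max_heap_mb(12 * 1024), "MB")
```

For 12G of RAM the quarter-of-RAM term wins and gives 3072 MB, so a single
large read has only about 3G to work with unless the heap is set explicitly.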





Re: Unbalanced cluster

2017-07-10 Thread Avi Kivity
32 tokens is too few for 33 nodes. I have a sharding simulator [1] and 
it shows



$ ./shardsim --vnodes 32 --nodes 33 --shards 1
33 nodes, 32 vnodes, 1 shards
maximum node overcommit:  1.42642
maximum shard overcommit: 1.426417


So 40% overcommit over the average. Since some nodes can be 
undercommitted, this easily explains the 2X difference (40% overcommit + 
30% undercommit = 2X).
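
The 2X figure is a ratio between the heaviest and the lightest node, not a sum.
A tiny sketch of that arithmetic, where the 30% undercommit is the assumed
figure from the paragraph above and load_ratio is a made-up helper name:

```python
def load_ratio(overcommit: float, undercommit: float) -> float:
    """Heaviest node's load divided by the lightest node's load.

    Both arguments are fractions relative to the average load: 0.40 means
    40% above average, 0.30 means 30% below average.
    """
    return (1.0 + overcommit) / (1.0 - undercommit)


print(round(load_ratio(0.40, 0.30), 2))
```

That is 1.4 times the average divided by 0.7 times the average, which comes
out at 2.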



Newer versions of Cassandra have better token selection and will suffer 
less from this.




[1] https://github.com/avikivity/shardsim



Re: Cassandra crashed with OOM, and the system.log and debug.log doesn't match.

2017-07-10 Thread 张强
Thanks for your reply!
There are 3 column families, created by kairosdb; one column
family takes almost all the workload.
I didn't tune the heap size, so by default it'll be 3GB.
I have monitored the CPU and memory usage: CPU usage is about 30% on
average, and available memory is about 1.5G on average, so memory
usage is about 87% on average.
My workload is generated by test programs; it's stable and periodic. Before
the test programs received error messages, there were no signs of high CPU
usage or changes in memory usage. That confuses me.



Data Model Suggestion Required

2017-07-10 Thread Siddharth Prakash Singh
I am planning to build a user activity timeline. Users on our system
generate different kinds of activity, for example searching for a product,
calling our sales team, marking a favourite, etc.
Now I would like to generate a timeline based on these activities. The
timeline could cover all events, or be filtered on a specific set of events,
on a time interval, or on a specific set of events within a time interval.
Composite column keys look like a viable solution.

Any other thoughts here?

Regards
Siddharth


Re: private interface for interdc messaging

2017-07-10 Thread Nitan Kainth
Did your network team set up a route?
Can you run traceroute and check?



Unbalanced cluster

2017-07-10 Thread Loic Lambiel
Hi,

One of our clusters is becoming somewhat unbalanced, at least on some of the
nodes:

(output edited to remove unnecessary information)
--  Address        Load     Tokens  Owns (effective)  Rack
UN  192.168.1.22   2.99 TB  32      10.6%             RACK1
UN  192.168.1.23   3.35 TB  32      11.7%             RACK1
UN  192.168.1.20   3.22 TB  32      11.3%             RACK1
UN  192.168.1.21   3.21 TB  32      11.2%             RACK1
UN  192.168.1.18   2.87 TB  32      10.3%             RACK1
UN  192.168.1.19   3.49 TB  32      12.0%             RACK1
UN  192.168.1.16   5.32 TB  32      12.9%             RACK1
UN  192.168.1.17   3.77 TB  32      12.0%             RACK1
UN  192.168.1.26   4.46 TB  32      11.2%             RACK1
UN  192.168.1.24   3.24 TB  32      11.4%             RACK1
UN  192.168.1.25   3.31 TB  32      11.2%             RACK1
UN  192.168.1.134  2.75 TB  18      7.2%              RACK1
UN  192.168.1.135  2.52 TB  18      6.0%              RACK1
UN  192.168.1.132  1.85 TB  18      6.8%              RACK1
UN  192.168.1.133  2.41 TB  18      5.7%              RACK1
UN  192.168.1.130  2.95 TB  18      7.1%              RACK1
UN  192.168.1.131  2.82 TB  18      6.7%              RACK1
UN  192.168.1.128  3.04 TB  18      7.1%              RACK1
UN  192.168.1.129  2.47 TB  18      7.2%              RACK1
UN  192.168.1.14   5.63 TB  32      13.4%             RACK1
UN  192.168.1.15   2.95 TB  32      10.4%             RACK1
UN  192.168.1.12   3.83 TB  32      12.4%             RACK1
UN  192.168.1.13   2.71 TB  32      9.5%              RACK1
UN  192.168.1.10   3.51 TB  32      11.9%             RACK1
UN  192.168.1.11   2.96 TB  32      10.3%             RACK1
UN  192.168.1.126  2.48 TB  18      6.7%              RACK1
UN  192.168.1.127  2.23 TB  18      5.5%              RACK1
UN  192.168.1.124  2.05 TB  18      5.5%              RACK1
UN  192.168.1.125  2.33 TB  18      5.8%              RACK1
UN  192.168.1.122  1.99 TB  18      5.1%              RACK1
UN  192.168.1.123  2.44 TB  18      5.7%              RACK1
UN  192.168.1.120  3.58 TB  28      11.4%             RACK1
UN  192.168.1.121  2.33 TB  18      6.8%              RACK1

Notice that node 192.168.1.14 owns 13.4% / 5.63 TB while node
192.168.1.13 owns only 9.5% / 2.71 TB, so the former holds about twice the
load of the latter. They both have 32 tokens.

The cluster is running:

* Cassandra 2.1.16 (initially bootstrapped running 2.1.2, with vnodes
enabled)
* RF=3 with single DC and single rack. LCS as the compaction strategy,
JBOD storage
* Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
* Node cleanup performed on all nodes

Almost all of the cluster load comes from a single CF:

CREATE TABLE blobstore.block (
    inode uuid,
    version timeuuid,
    block bigint,
    offset bigint,
    chunksize int,
    payload blob,
    PRIMARY KEY ((inode, version, block), offset)
) WITH CLUSTERING ORDER BY (offset ASC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
    AND comment = ''
    AND compaction = {'tombstone_threshold': '0.1',
        'tombstone_compaction_interval': '60',
        'unchecked_tombstone_compaction': 'false',
        'class': 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'}
    AND compression = {'sstable_compression':
        'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 172000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99.0PERCENTILE';

The payload column is almost the same size in each record.

I understand that an unbalanced cluster may be the result of a bad
primary key, which I believe isn't the case here.

Any clue as to what could be the cause? How can I rebalance it without
decommissioning any nodes?

My understanding is that nodetool move may only be used when not using
the vnodes feature.

Any help appreciated, thanks!


Loic Lambiel




Re: error 1300 from csv export

2017-07-10 Thread Micha
Sorry for the noise, I somehow overlooked the COPY options BEGINTOKEN and
ENDTOKEN.

 Michael
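
For anyone landing here with the same question, a sketch of how those two
options could be used to retry only the failed ranges. It assumes the error
lines look like the ones quoted in this thread, that the two numbers are the
begin and end tokens of the failed range, and that your cqlsh accepts
BEGINTOKEN and ENDTOKEN as quoted strings. The table name, file names, and
helper names are placeholders.

```python
import re

# Matches the "(number-1, number-2)" part of a cqlsh COPY error line.
# Only purely numeric bounds are handled here.
ERROR_RE = re.compile(r"Error for \((-?\d+), (-?\d+)\)")


def failed_ranges(log_text: str):
    """Return (begin_token, end_token) pairs found in COPY error output."""
    return [(int(a), int(b)) for a, b in ERROR_RE.findall(log_text)]


def copy_commands(table: str, ranges, out_prefix: str = "retry"):
    """Build one COPY ... TO statement per failed range."""
    return [
        f"COPY {table} TO '{out_prefix}_{i}.csv' "
        f"WITH BEGINTOKEN='{begin}' AND ENDTOKEN='{end}';"
        for i, (begin, end) in enumerate(ranges)
    ]


# Hypothetical error output and table name.
sample = "Error for (-100, 200): ReadFailure  Error from server: code=1300 ..."
for statement in copy_commands("my_keyspace.my_table", failed_ranges(sample)):
    print(statement)
```

Before merging the retry files with the first export, it is worth checking
whether the range bounds are inclusive in your cqlsh version, so that no rows
are missed or duplicated at the edges.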






error 1300 from csv export

2017-07-10 Thread Micha
Hi,

I got some errors from a csv export of a table.
They are of the form:
"Error for (number-1, number-2): ReadFailure  Error from server:
code=1300 ... "

At the end "Exported 650 ranges out of 658 total, some records might be
missing"


Is there a way to rerun the export for only the failed ranges?

Thanks,
 Michael






Re: Cassandra crashed with OOM, and the system.log and debug.log doesn't match.

2017-07-10 Thread Varun Barala
Hi,


*How many column families are there? What is the heap size?*

You can turn off logging for StatusLogger.java and GC to reduce heap usage.

Can you also monitor CPU usage and memory usage? IMO, in your case memory
is the bottleneck.

Thanks!!

On Mon, Jul 10, 2017 at 5:07 PM, 张强  wrote:

> Hi experts, I have a single Cassandra 3.11.0 node working with kairosdb (a
> time series database). After running for 4 days with a stable workload, the
> database client started to get "request errors", but there were not many
> error or warning messages in the Cassandra log file. The client started to
> receive error messages at about 7-7 21:03:00, and kairosdb kept retrying
> after that time, but there weren't many logs in the Cassandra log file.
> I noticed the abnormal status at about 7-8 16:00:00, then ran a
> "nodetool tablestats" command to get some information. The command returned
> an error, and around that time the Cassandra process started to crash and
> generated a dump file.
> After C* shut down, I looked at the logs to see what happened, and I found
> something strange inside them.
>
> 1. In the system.log, there are two lines showing that nothing was logged
> between 2017-07-07 21:03:50 and 2017-07-08 16:07:33. I think that is a pretty
> long period without any logs, and the gc.log file has many entries showing
> long GC pauses, which should have been logged in system.log.
> INFO  [ReadStage-1] 2017-07-07 21:03:50,824 NoSpamLogger.java:91 - Maximum
> memory usage reached (512.000MiB), cannot allocate chunk of 1.000MiB
> WARN  [PERIODIC-COMMIT-LOG-SYNCER] 2017-07-08 16:07:33,347
> NoSpamLogger.java:94 - Out of 1 commit log syncs over the past 0.00s with
> average duration of 60367.73ms, 1 have exceeded the configured commit
> interval by an average of 50367.73ms
>
> 2. In the system.log, there is an entry showing a very long GC, after which
> C* started to shut down.
> WARN  [ScheduledTasks:1] 2017-07-08 16:07:46,846 NoSpamLogger.java:94 -
> Some operations timed out, details available at debug level (debug.log)
> WARN  [Service Thread] 2017-07-08 16:10:36,114 GCInspector.java:282 -
> ConcurrentMarkSweep GC in 688850ms.  CMS Old Gen: 2114938312 -> 469583832;
> Par Eden Space: 837584 -> 305319752; Par Survivor Space: 41943040 ->
> 25784008
> ..
> ERROR [Thrift:22] 2017-07-08 16:10:56,322 CassandraDaemon.java:228 -
> Exception in thread Thread[Thrift:22,5,main]
> java.lang.OutOfMemoryError: Java heap space
>
> 3. In the debug.log, the last INFO level log is at 2017-07-07 14:43:59,
> the log is:
> INFO  [IndexSummaryManager:1] 2017-07-07 14:43:59,967
> IndexSummaryRedistribution.java:75 - Redistributing index summaries
> After that, there are DEBUG level logs until 2017-07-07 21:11:34, but no
> more INFO level or other level logs in that log file, while there are still
> many logs in the system.log after 2017-07-07 14:43:59. Why don't these
> two log files match?
>
> My hardware is a 4-core CPU and 12G RAM, and I'm using Windows Server 2012
> R2.
> Do you know what's going on here? And is there anything I can do to
> prevent that situation?
>
>
>
>


Re: private interface for interdc messaging

2017-07-10 Thread CPC
Hi,

Setting the broadcast address does not resolve the problem. We still see
inter-DC traffic like this:

dc1 public => dc2 private

What we want is:

dc1 private => dc2 private

Any idea? Also, could it be related to CASSANDRA-12673
(https://issues.apache.org/jira/browse/CASSANDRA-12673)?

On 7 July 2017 at 21:51, CPC  wrote:

> Thank you Nitan.
>
>
> On Jul 7, 2017 8:59 PM, "Nitan Kainth"  wrote:
>
> Yes. Because that's the ip used for internode communication
>
> Sent from my iPhone
>
> On Jul 7, 2017, at 10:52 AM, CPC  wrote:
>
> Hi Nitan,
>
> Do you mean setting broadcast_address to private network would suffice?
>
> On 7 July 2017 at 20:45, Nitan Kainth  wrote:
>
>> You can set the broadcast address to the IP on which nodes will
>> communicate with each other. Your network team can set up a routing table
>> from source to target.
>> We had a similar setup in one of my previous projects, where we
>> segregated the network between application and C* node communication.
>>
>> > On Jul 7, 2017, at 10:28 AM, CPC  wrote:
>> >
>> > Hi,
>> >
>> > We are building 2 datacenters where each machine has one public interface
>> > (for native client connections) and one private interface (for internode
>> > communication). What we noticed is that nodes in one datacenter try to
>> > communicate with nodes in the other DC over their public interfaces.
>> > I mean:
>> > DC1 Node1 public interface -> DC2 Node1 private interface
>> > But what we prefer is:
>> > DC1 Node1 private interface -> DC2 Node1 private interface
>> >
>> > Is there any configuration so that a node makes inter-DC connections over
>> > its private network?
>> >
>> > Thank you...
>>
>>
>>
>>
>
>


Re: Understanding of cassandra metrics

2017-07-10 Thread Павел Сапежко
So, Range latency (or Scan in the case of the coordinator) appears only when
we are reading multiple partition keys in the same query?

On Fri, 7 Jul 2017 at 20:29, Chris Lohfink wrote:

> The coordinator read/scan (Scan is just different naming for Range, so it is
> the coordinator's view of RangeLatency) is the latency from the coordinator's
> perspective, so it includes network latency between replicas and such. It
> was actually added for speculative retry (which is why there is no
> CoordinatorWriteLatency). Only the CoordinatorReadLatency is used for it,
> however.
>
> The Read/RangeLatency metrics are for local reads, basically just how long
> it takes to read from disk and merge with sstables.
>
> The View* metrics are only relevant to materialized views. There actually
> is a partition lock for updates, which ViewLockAcquireTime gives
> visibility into. Also, reads are sometimes required for updating
> materialized views, which ViewReadTime tracks. For more details I'd
> recommend
> https://opencredo.com/everything-need-know-cassandra-materialized-views/
>
> Chris
>
> On Fri, Jul 7, 2017 at 9:42 AM, ZAIDI, ASAD A  wrote:
>
>> What exactly does CoordinatorScanLatency mean, for example?
>>
>> CoordinatorScanLatency is a timer metric that presents the coordinator range
>> scan latency for the table.
>>
>> Is it latency on full table scan or maybe range scan by clustering key?
>>
>> It is a range scan. The clustering key is only used to store
>> data in sorted fashion; the partition key, along with the chosen partitioner,
>> helps in range scans of data.
>>
>> Can anybody write into partition while locked?
>>
>> Writes are atomic; it depends on your chosen consistency
>> level whether writes will fail or succeed.
>>
>>
>>
>> *From:* Павел Сапежко [mailto:amelius0...@gmail.com]
>> *Sent:* Friday, July 07, 2017 8:23 AM
>> *To:* user@cassandra.apache.org
>> *Subject:* Re: Understanding of cassandra metrics
>>
>>
>>
>> Do you really think that I don't read the docs? Do you have enough
>> information in the documentation? I think not. What exactly does
>> CoordinatorScanLatency mean, for example? Is it latency on a full table
>> scan or maybe a range scan by clustering key? What exactly does
>> ViewLockAcquireTime mean? What is a "partition lock"? Can anybody write
>> into a partition while it is locked? Etc.
>>
>> Fri, 7 Jul 2017 at 13:01, Ivan Iliev wrote:
>>
>> The first result on Google returns:
>>
>>
>>
>> http://cassandra.apache.org/doc/latest/operating/metrics.html
>> 
>>
>>
>>
>> On Fri, Jul 7, 2017 at 12:16 PM, Павел Сапежко 
>> wrote:
>>
>> Hello, I have several questions about Cassandra metrics. What exactly do
>> the following metrics mean:
>>
>>- CoordinatorReadLatency
>>- CoordinatorScanLatency
>>- ReadLatency
>>- RangeLatency
>>- ViewLockAcquireTime
>>- ViewReadTime
>>
>> --
>>
>> Best regards,
>>
>> Павел Сапежко
>>
>> skype: p.sapezhko
>>
>>
>>
>> --
>>
>> Best regards,
>>
>> Павел Сапежко
>>
>> skype: p.sapezhko
>>
>
> --

Best regards,

Павел Сапежко

skype: p.sapezhko
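
Editor's note: the difference between the local metrics (ReadLatency/RangeLatency) and the coordinator metrics described above can be illustrated with a toy model. This is a deliberately simplified sketch, not how Cassandra computes its metrics: the function name and all numbers are hypothetical, and it assumes the coordinator completes a read once a required number of replicas (set by the consistency level) have responded, with each response costing a network round trip plus that replica's local read time.

```python
def coordinator_latency_ms(local_read_ms, network_rtt_ms, required_responses):
    """Toy model of a coordinator-side read latency.

    Each replica's response time is its network round trip plus its local
    read time; the coordinator finishes when `required_responses` replicas
    have answered, i.e. at the k-th fastest response.
    """
    if len(local_read_ms) != len(network_rtt_ms):
        raise ValueError("need one round-trip time per replica")
    if not 1 <= required_responses <= len(local_read_ms):
        raise ValueError("required_responses out of range")
    totals = sorted(rtt + local
                    for rtt, local in zip(network_rtt_ms, local_read_ms))
    return totals[required_responses - 1]


# Made-up numbers: two replicas in the local DC, one in a remote DC.
local = [2.0, 3.0, 10.0]   # what each replica would record as its local read
rtt = [0.5, 1.0, 40.0]     # coordinator <-> replica round trips

print(coordinator_latency_ms(local, rtt, 2))  # two responses required
print(coordinator_latency_ms(local, rtt, 3))  # all three required
```

In this model the local numbers stay the same whichever replicas are waited on, while the coordinator-side number changes sharply once the remote replica has to answer, which is why the two families of metrics can diverge.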


Re: UDF for sorting

2017-07-10 Thread Eduardo Alonso
I have read HowToContribute and created the ticket CASSANDRA-13682
with a PR and the "Patch Available" flag.


Eduardo Alonso
Vía de las dos Castillas, 33, Ática 4, 3ª Planta
28224 Pozuelo de Alarcón, Madrid
Tel: +34 91 828 6473 // www.stratio.com // @stratiobd

2017-07-06 14:43 GMT+02:00 Eduardo Alonso :

> Hi Jeff:
>
> Do you mean something like this?
>
> This is so basic (with no code change) that I have skipped the JIRA
> ticket creation.
>
> Could you please review?
> Thank you
>
> Eduardo Alonso
> Vía de las dos Castillas, 33, Ática 4, 3ª Planta
> 28224 Pozuelo de Alarcón, Madrid
> Tel: +34 91 828 6473 // www.stratio.com // @stratiobd
>
> 2017-07-05 23:49 GMT+02:00 Jeff Jirsa :
>
>>
>>
>> On 2017-07-03 16:19 (-0700), Justin Cameron 
>> wrote:
>> > While you can't do this with Cassandra, you can get the functionality you
>> > want with the cassandra-lucene-plugin (
>> > https://github.com/Stratio/cassandra-lucene-index/blob/branch-3.0.10/doc/documentation.rst#searching
>> > ).
>> >
>> > Keep in mind that as with any secondary index there are performance-related
>> > limitations:
>> > https://github.com/Stratio/cassandra-lucene-index/blob/branch-3.0.10/doc/documentation.rst#performance-tips
>>
>>
>> We just added a "Plugins" section to the docs (
>> https://github.com/apache/cassandra/blob/trunk/doc/source/plugins/index.rst
>> ). It would be nice if someone would add the
>> Cassandra-Lucene-Index plugin there.
>>
>>
>>
>>
>>
>>
>