Inconsistent Reads after Restoring Snapshot

2016-04-25 Thread Anuj Wadehra
Hi,
We are running 2.0.14. We use RF=3 and read/write at QUORUM. Moreover, we don't use 
incremental backups. As per the documentation at 
https://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_backup_snapshot_restore_t.html
 , if I need to restore a snapshot on a SINGLE node in a cluster, I would run 
repair at the end. But while the repair is going on, reads may get inconsistent.

Consider the following scenario:

10 AM - Daily snapshot taken of node A and moved to the backup location.
11 AM - A record is inserted such that nodes A and B insert the record, but there 
is a mutation drop on node C.
1 PM  - Node A crashes and data is restored from the latest 10 AM snapshot. Now, 
only node B has the record.
Now, my question is:
Until the repair is completed on node A, a read at QUORUM may return an inconsistent 
result depending on the nodes from which data is read. If data is read from node A 
and node C, nothing is returned; if data is read from node A and node B, the 
record is returned. This is a vital point which is not highlighted anywhere.
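
To illustrate, a rough cqlsh sketch (the keyspace, table and key names are 
hypothetical; which two replicas answer depends on the coordinator and snitch):

    CONSISTENCY QUORUM;                      -- RF=3, so 2 of 3 replicas must respond
    SELECT * FROM ks.tbl WHERE key = 'x';
    -- replicas {A, B} or {B, C} answer -> the 11 AM record is returned
    -- replicas {A, C} answer           -> no row (A was restored from the 10 AM
    --                                     snapshot, C dropped the mutation)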

Please confirm my understanding. If my understanding is right, how do I make sure 
that my reads are not inconsistent while a node is being repaired after restoring 
a snapshot?
I think auto-bootstrapping the node without joining the ring until the repair is 
completed is an alternative option (see the sketch below). But snapshots save a lot 
of streaming as compared to bootstrap.
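
A rough sketch of that alternative (flag and command names as in the standard 
startup options; whether repair behaves as expected with join_ring=false on 2.0.x 
should be verified):

    # restore the snapshot files into the data directory, then start the node
    # without joining the ring so it does not serve reads
    cassandra -Dcassandra.join_ring=false
    # repair the node while it is not participating in reads
    nodetool repair
    # once repair completes, join the ring
    nodetool join
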
Will incremental backups guarantee that 
Thanks
Anuj


Data repairing on one node questionably affects data on others

2016-04-25 Thread ssiv...@gmail.com

Hi All,

I have a cluster of 7 nodes, completely balanced (each node owns ~500GB of 
data).
I have one keyspace and one table, with three replicas. Then I failed one 
node's disk, replaced it with a new one and started repairing.
During that process I noticed that two additional nodes started 
receiving data, and at the end of the repair three nodes had twice 
as much data as at the beginning.
I'm curious, is this normal behavior for Cassandra? Why did three nodes, 
not only one, receive data during the repair? Maybe it's because 
of clock skew?


Thanks!

--
best regards,
Sergey



MX4J with Cassandra 3.5 always empty response

2016-04-25 Thread Nico Haller
I just added the mx4j-tools jar (version 3.0.2) to my lib folder and also
enabled remote JMX access without authentication (using a firewall to protect
access).
During startup I can see the two following log statements:

HttpAdaptor version 3.0.2 started on port 8081
mx4j successfuly loaded

So, it seems like everything is working fine. In netstat I can also see
that it is listening on that port, but when I try to open the web interface
(http://myhost:8081/), I just get an empty response (HTTP 200, but
content-length 0).
I tried several different paths (e.g. /server, /serverbydomain, /mbean); all
return 200 with an empty response.
There are no further logs related to mx4j.

Has anybody seen such behavior before? Did I miss any mandatory
configuration?
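
For reference, the MX4J-related settings in the stock cassandra-env.sh look roughly 
like this (a sketch only; the exact variable names and defaults should be verified 
against your 3.5 install):

    # cassandra-env.sh
    MX4J_ADDRESS="-Dmx4jaddress=0.0.0.0"   # interface the HttpAdaptor binds to
    MX4J_PORT="-Dmx4jport=8081"            # port for the MX4J web interface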

Thanks


Re: Unable to reliably count keys on a thrift CF

2016-04-25 Thread Anuj Wadehra
Hi Carlos,
Please check whether this JIRA: 
https://issues.apache.org/jira/browse/CASSANDRA-11467 fixes your problem.
We had been facing a row count issue with a thrift CF / compact storage and this 
fixed it.
The above is fixed in the latest 2.1.14. It's a two-line fix, so you can also prepare 
a custom jar and check if that works.
Thanks
Anuj
 
  On Thu, 21 Apr, 2016 at 9:29 PM, Carlos Alonso wrote:   
Hi guys.
I've been struggling for the last few days to find a reliable and stable way to 
count keys in a thrift column family.
My idea is to basically iterate the whole ring using the token function, as 
documented here: 
https://docs.datastax.com/en/cql/3.1/cql/cql_using/paging_c.html in batches of 
1 records
The only corner case is that if there were more than 1 record in a single 
partition (not the case here, but the program should still handle it), it explores 
the partition in depth by getting all records for that particular token (see 
below). In the end, all keys are saved into a hash to guarantee uniqueness. The 
count of unique keys is always different (and random; sometimes more keys, 
sometimes fewer are retrieved) and, of course, I'm sure no activity is going on 
in that cf.
I'm running Cassandra 2.1.11 with MurMur3 partitioner. RF=3 and CL=QUORUM
the column family structure is
CREATE TABLE tbl (
    key blob,
    column1 ascii,
    value blob,
    PRIMARY KEY (key, column1)
)
and I'm running the following script
connection = open_cql_connection
results = connection.execute("SELECT token(key), key FROM tbl LIMIT 1")

keys_hash = {}   # Hash to save the keys to guarantee uniqueness
last_token = nil
token = nil

while results != nil
  results.each do |row|
    keys_hash[row['key']] = true
    token = row['token(key)']
  end
  if token == last_token
    # same token as the previous page: explore this partition in depth
    results = connection.execute("SELECT token(key), key FROM tbl WHERE token(key) = #{token}")
  else
    results = connection.execute("SELECT token(key), key FROM tbl WHERE token(key) >= #{token} LIMIT 1")
  end
  last_token = token
end

puts keys_hash.keys.count
What am I missing?
Thanks!
Carlos Alonso | Software Engineer | @calonso
  


Re: StatusLogger is logging too many information

2016-04-25 Thread Anuj Wadehra
Hi,
You can set the property gc_warn_threshold_in_ms in cassandra.yaml. For example, if 
your application is OK with a 2000 ms pause, you can set the value to 2000 so that 
only GC pauses greater than 2000 ms will lead to a GC warning and status log.
Please refer to 
https://issues.apache.org/jira/plugins/servlet/mobile#issue/CASSANDRA-8907 for 
details.
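
A minimal cassandra.yaml sketch of that setting (value in milliseconds; 2000 here 
just mirrors the example above):

    gc_warn_threshold_in_ms: 2000   # only warn/log for GC pauses longer than 2000 ms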

Thanks
Anuj
 
  On Mon, 25 Apr, 2016 at 3:20 PM, jason zhao yang wrote:
Hi,
Currently StatusLogger will log info when there are dropped messages or GC longer 
than 200 ms.
In my use case, there are about 1000 tables. The StatusLogger is logging too 
much information for each table.
I wonder, is there a way to reduce this logging? For example, only print the thread 
pool information.
Thanks.


Re: Changing snitch from PropertyFile to Gossip

2016-04-25 Thread Carlos Rolo
I just run it to be sure. Sometimes mistakes happen and it's a way to make
sure.
On 25/04/2016 10:19, "Alain RODRIGUEZ" wrote:

> Hi Carlos,
>
> Why running a repair there if the topology did not change? Is it as a best
> practice, just in case, or is there a specific reason?
>
> C*heers,
> ---
> Alain Rodriguez - al...@thelastpickle.com
> France
>
> The Last Pickle - Apache Cassandra Consulting
> http://www.thelastpickle.com
>
> 2016-04-24 15:44 GMT+02:00 Carlos Rolo :
>
>> As long as the topology doesn't change, yes. Repair once you finish.
>> On 24/04/2016 13:23, "AJ" wrote:
>>
>>> Is it possible to do this without down time i.e. run in mixed mode while
>>> doing a rolling upgrade?



Re: Basic query in setting up secure inter-dc cluster

2016-04-25 Thread Ajay Garg
Hi Everyone.

Kindly reply with "yes" or "no": is it possible to set up
encryption only between a particular pair of nodes?
Or is it an "all or none" feature, where encryption is present between
EVERY pair of nodes or in NO pair of nodes?
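
For reference, a minimal sketch of the cassandra.yaml block involved (option names 
as in the DataStax docs linked further down; keystore paths and passwords are 
placeholders):

    server_encryption_options:
        internode_encryption: dc        # all | none | dc | rack
        keystore: conf/.keystore
        keystore_password: cassandra
        truststore: conf/.truststore
        truststore_password: cassandra
        # require_client_auth: false

As the procedure quoted below also lists, the available values are all, none, dc 
and rack, which suggests the granularity is per data center or per rack rather than 
per individual node pair.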


Thanks and Regards,
Ajay

On Mon, Apr 18, 2016 at 9:55 AM, Ajay Garg  wrote:

> Also, wondering what is the difference between "all" and "dc" in
> "internode_encryption".
> Perhaps my answer lies in this?
>
> On Mon, Apr 18, 2016 at 9:51 AM, Ajay Garg  wrote:
>
>> Ok, trying to wake up this thread again.
>>
>> I went through the following links ::
>>
>>
>> https://docs.datastax.com/en/cassandra/1.2/cassandra/security/secureSSLNodeToNode_t.html
>>
>> https://docs.datastax.com/en/cassandra/1.2/cassandra/security/secureSSLCertificates_t.html
>>
>>
>> and I am wondering *if it is possible to setup secure
>> inter-communication only between some nodes*.
>>
>> In particular, if I have a 2*2 cluster, is it possible to setup secure
>> communication ONLY between the nodes of DC2?
>> Once it works well, we would then setup secure-communication everywhere.
>>
>> We want this because DC2 is the backup centre, while DC1 is the
>> primary centre connected directly to the application server. We don't want
>> to screw things up if something goes bad in DC1.
>>
>>
>> Will be grateful for pointers.
>>
>>
>> Thanks and Regards,
>> Ajay
>>
>> On Sun, Jan 17, 2016 at 9:09 PM, Ajay Garg 
>> wrote:
>>
>>> Hi All.
>>>
>>> A gentle query-reminder.
>>>
>>> I will be grateful if I could be given a brief technical overview, as to
>>> how secure-communication occurs between two nodes in a cluster.
>>>
>>> Please note that I wish for some information on "how it works under
>>> the hood", and NOT "how to set it up".
>>>
>>>
>>>
>>> Thanks and Regards,
>>> Ajay
>>>
>>> On Wed, Jan 6, 2016 at 4:16 PM, Ajay Garg 
>>> wrote:
>>>
 Thanks everyone for the reply.

 I actually have a fair number of questions, but it would be nice if someone
 could please tell me the flow (implementation-wise) of how node-to-node
 encryption works in a cluster.

 Let's say node1 from DC1 wishes to talk securely to node2 from DC2
 (with *"require_client_auth: false"*).
 I presume it would be like below (please correct me if I am wrong):

 a)
 node1 tries to connect to node2, using the certificate *as defined on
 node1* in cassandra.yaml.

 b)
 node2 will confirm if the certificate being offered by node1 is in the
 truststore *as defined on node2* in cassandra.yaml.
 if it is, secure-communication is allowed.


 Is my thinking right?
 I

 On Wed, Jan 6, 2016 at 1:55 PM, Neha Dave 
 wrote:

> Hi Ajay,
> Have a look here :
> https://docs.datastax.com/en/cassandra/1.2/cassandra/security/secureSSLNodeToNode_t.html
>
> You can configure for DC level Security:
>
> Procedure
>
> On each node under server_encryption_options:
>
>- Enable internode_encryption.
>The available options are:
>   - all
>   - none
>   - dc: Cassandra encrypts the traffic between the data centers.
>   - rack: Cassandra encrypts the traffic between the racks.
>
> regards
>
> Neha
>
>
>
> On Wed, Jan 6, 2016 at 12:48 PM, Singh, Abhijeet <
> absi...@informatica.com> wrote:
>
>> Security is a very wide concept. What exactly do you want to achieve ?
>>
>>
>>
>> *From:* Ajay Garg [mailto:ajaygargn...@gmail.com]
>> *Sent:* Wednesday, January 06, 2016 11:27 AM
>> *To:* user@cassandra.apache.org
>> *Subject:* Basic query in setting up secure inter-dc cluster
>>
>>
>>
>> Hi All.
>>
>> We have a 2*2 cluster deployed, but no security as of now.
>>
>> As a first stage, we wish to implement inter-dc security.
>>
>> Is it possible to enable security one machine at a time?
>>
>> For example, let's say the machines are DC1M1, DC1M2, DC2M1, DC2M2.
>>
>> If I make the changes JUST IN DC2M2 and restart it, will the traffic
>> between DC1M1/DC1M2 and DC2M2 be secure? Or security will kick in ONLY
>> AFTER the changes are made in all the 4 machines?
>>
>> Asking here, because I don't want to screw up a live cluster due to
>> my lack of experience.
>>
>> Looking forward to some pointers.
>>
>>
>> --
>>
>> Regards,
>> Ajay
>>
>
>


 --
 Regards,
 Ajay

>>>
>>>
>>>
>>> --
>>> Regards,
>>> Ajay
>>>
>>
>>
>>
>> --
>> Regards,
>> Ajay
>>
>
>
>
> --
> Regards,
> Ajay
>



-- 
Regards,
Ajay


StatusLogger is logging too many information

2016-04-25 Thread jason zhao yang
Hi,

Currently StatusLogger will log info when there are dropped messages or GC
more than 200 ms.

In my use case, there are about 1000 tables. The StatusLogger is logging
too much information for each table.

I wonder, is there a way to reduce this logging? For example, only print the
thread pool information.

Thanks.


Re: Changing snitch from PropertyFile to Gossip

2016-04-25 Thread Alain RODRIGUEZ
Hi Carlos,

Why running a repair there if the topology did not change? Is it as a best
practice, just in case, or is there a specific reason?

C*heers,
---
Alain Rodriguez - al...@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2016-04-24 15:44 GMT+02:00 Carlos Rolo :

> As long as the topology doesn't change, yes. Repair once you finish.
> On 24/04/2016 13:23, "AJ" wrote:
>
>> Is it possible to do this without down time i.e. run in mixed mode while
>> doing a rolling upgrade?


Re: Upgrading to SSD

2016-04-25 Thread Alain RODRIGUEZ
Hi Anuj,


> You could do the following instead to minimize server downtime:
>
> 1. rsync while the server is running
> 2. rsync again to get any new files
> 3. shut server down
> 4. rsync for the 3rd time
>
> 5. change directory in yaml and start back up
>

+1

Here are some more details about that process and a script doing most of
the job:
thelastpickle.com/blog/2016/02/25/removing-a-disk-mapping-from-cassandra.html
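
A rough shell sketch of those five steps (paths, mount points and service commands 
are placeholders; adjust to your own layout):

    # 1-2. rsync while Cassandra is still running (run it twice to pick up new files)
    rsync -a /var/lib/cassandra/data/ /mnt/ssd/cassandra/data/
    rsync -a /var/lib/cassandra/data/ /mnt/ssd/cassandra/data/
    # 3. stop Cassandra (drain first so memtables and the commitlog are flushed)
    nodetool drain && sudo service cassandra stop
    # 4. final rsync to catch anything written since the previous pass
    rsync -a /var/lib/cassandra/data/ /mnt/ssd/cassandra/data/
    # 5. point cassandra.yaml at the new directories, then start Cassandra again
    sudo service cassandra start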

Hope it will be useful to you

C*heers,
---
Alain Rodriguez - al...@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2016-04-23 21:47 GMT+02:00 Jonathan Haddad :

> You could do the following instead to minimize server downtime:
>
> 1. rsync while the server is running
> 2. rsync again to get any new files
> 3. shut server down
> 4. rsync for the 3rd time
> 5. change directory in yaml and start back up
>
>
>
> On Sat, Apr 23, 2016 at 12:23 PM Clint Martin <
> clintlmar...@coolfiretechnologies.com> wrote:
>
>> As long as you shut down the node before you start copying and moving
>> stuff around it shouldn't matter if you take backups or snapshots or
>> whatever.
>>
>> When you add the filesystem for the SSD, will you be removing the existing
>> filesystem? Or will you be able to keep both filesystems mounted at the
>> same time for the migration?  If you can keep them both at the same time,
>> then an off-system backup isn't strictly necessary.  Just change your data
>> dir config in your yaml, copy the data and commitlog from the old dir to the
>> new SSD and restart the node.
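>>
>> A minimal cassandra.yaml sketch of that change (the new paths are placeholders):
>>
>>     data_file_directories:
>>         - /mnt/ssd/cassandra/data
>>     commitlog_directory: /mnt/ssd/cassandra/commitlog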
>>
>> If you can't keep both filesystems mounted concurrently then a remote
>> system is necessary to copy the data to. But the steps and procedure are
>> the same.
>>
>> Running repair before you do the migration isn't strictly necessary. Not
>> a bad idea if you don't mind spending the time. Definitely run repair after
>> you restart the node, especially if you take longer than the hint interval
>> to perform the work.
>>
>> As for your filesystems, there is really nothing special to worry about.
>> Depending on which filesystem you use there are recommendations for tuning
>> and configuration that you should probably follow.  (Datastax's
>> recommendations as well as AL tobey's tuning guide are great resources.
>> https://tobert.github.io/pages/als-cassandra-21-tuning-guide.html )
>>
>> Clint
>> On Apr 23, 2016 3:05 PM, "Anuj Wadehra"  wrote:
>>
>> Hi
>>
>> We have a 3 node cluster of 2.0.14. We use Read/Write Quorum and RF is 3.
>> We want to move data and commitlog directory from a SATA HDD to SSD. We
>> have planned to do a rolling upgrade.
>>
>> We plan to run repair -pr on all nodes  to sync data upfront and then
>> execute following steps on each server one by one:
>>
>> 1. Take backup of data/commitlog directory to external server.
>> 2. Change mount points so that Cassandra data/commitlog directory now
>> points to SSD.
>> 3. Copy files from external backup to SSD.
>> 4. Start Cassandra.
>> 5. Run full repair on the node before starting step 1 on next node.
>>
>> Questions:
>> 1. Is copying the commitlog and data directory good enough, or should we go
>> for taking a snapshot of each node and restoring data from that snapshot?
>>
>> 2. Any precautions we need to take while moving data to new SSD? File
>> system format of two disks etc.
>>
>> 3. Should we drain data before taking backup? We are also restoring
>> commitlog directory from backup.
>>
>> 4. I have added repair to sync full data upfront and a repair after
>> restoring data on each node. Sounds safe and logical?
>>
>> 5. Any problems you see with mentioned approach? Any better approach?
>>
>> Thanks
>> Anuj
>>
>>
>>
>>


Re: Debugging out of memory

2016-04-25 Thread Alain RODRIGUEZ
Hi,

This one is old, do you still need help there? Sorry we missed it.


   1. What Cassandra version do you use?
   2. What does "nodetool tpstats" show you. Any dropped or pending message?
   3. Is your error a full heap memory issue or native one?
   4. What configurations did you change from default and you think might
   be related to this (memtable size, GC config, bloom filters...)?


the cluster starts flapping between being down and up


When using AWS, even more so with small instances, make sure to set
phi_convict_threshold to about 12, just in case. This will prevent nodes
from flapping that much.
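
A minimal cassandra.yaml sketch of that change (the shipped default is 8; 12 is the 
value suggested above):

    phi_convict_threshold: 12   # make the failure detector more tolerant of EC2 jitter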

C*heers,
---
Alain Rodriguez - al...@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2016-04-12 19:53 GMT+02:00 Bo Finnerup Madsen :

> Hi,
>
> We have an application that reads data from a set of external sources and
> loads them into our cassandra cluster. The load goes OK for some time
> (~24h) and then some servers in the cluster start flapping between being
> down and up, and finally they go out of memory.
> The cluster consists of 5 m4.xlarge machines with 16gb memory, cassandra
> has an 8gb heap. All machines have a high load while data is being written,
> with a load between 6 and 20.
>
> I have tried sifting through the information available from nodetool, but
> I am unable to find anything helping me determine what is causing the oom.
> I am quite new to cassandra, so I might very well overlook the obvious. So
> any pointers on how to proceed with identifying the problem will be much
> appreciated :)
>
> In the following I have included information from 10.61.70.110 when it was
> flapping.
>
> Status for ddp keyspace(only keyspace containing any real data):
>
> -
> Datacenter: datacenter1
> ===
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address   Load   Tokens   Owns (effective)  Host ID
> Rack
> UN  10.61.70.108  65.97 GB   256  59,1%
> de79a554-9296-4575-8b79-2089f92069cd  rack1
> UN  10.61.70.110  58.95 GB   256  63,3%
> 310460f6-b7ce-45a7-be63-a7dd409f6b17  rack1
> UN  10.61.70.72   58.17 GB   256  60,3%
> 44fd4f8e-18cd-4487-8174-3a22fb9ed24f  rack1
> UN  10.61.70.107  58.69 GB   256  58,5%
> f8118fc2-e340-45db-a06e-a5842107d6c8  rack1
> UN  10.61.70.64   68 GB  256  58,7%
> 84bee9fe-2adc-48aa-915c-f43d972f5a2f  rack1
>
> -
>
>
> Snippet from system.log:
>
> -
> INFO  [ScheduledTasks:1] 2016-04-12 17:35:35,547 MessagingService.java:980
> - MUTATION messages were dropped in last 5000 ms: 5776 for internal timeout
> and 0 for cross node timeout
> INFO  [ScheduledTasks:1] 2016-04-12 17:35:35,547 StatusLogger.java:52 -
> Pool NameActive   Pending  Completed   Blocked  All
> Time Blocked
> INFO  [ScheduledTasks:1] 2016-04-12 17:35:35,551 StatusLogger.java:56 -
> MutationStage32   488170517826061870 0
>   0
> INFO  [ScheduledTasks:1] 2016-04-12 17:35:35,552 StatusLogger.java:56 -
> ViewMutationStage 0 0  0 0
> 0
> INFO  [ScheduledTasks:1] 2016-04-12 17:35:35,552 StatusLogger.java:56 -
> ReadStage 0 03266887 0
> 0
> INFO  [ScheduledTasks:1] 2016-04-12 17:35:35,553 StatusLogger.java:56 -
> RequestResponseStage  0 0  389429305 0
> 0
> INFO  [ScheduledTasks:1] 2016-04-12 17:35:35,553 StatusLogger.java:56 -
> ReadRepairStage   0 0 322804 0
> 0
> INFO  [ScheduledTasks:1] 2016-04-12 17:35:35,554 StatusLogger.java:56 -
> CounterMutationStage  0 0  0 0
> 0
> INFO  [ScheduledTasks:1] 2016-04-12 17:35:35,554 StatusLogger.java:56 -
> MiscStage 0 0  0 0
> 0
> INFO  [ScheduledTasks:1] 2016-04-12 17:35:35,554 StatusLogger.java:56 -
> CompactionExecutor454  31305 0
> 0
> INFO  [ScheduledTasks:1] 2016-04-12 17:35:35,555 StatusLogger.java:56 -
> MemtableReclaimMemory 0 0   3310 0
> 0
> INFO  [ScheduledTasks:1] 2016-04-12 17:35:35,555 StatusLogger.java:56 -
> PendingRangeCalculator0 0 10 0
> 0
> INFO  [ScheduledTasks:1] 2016-04-12 17:35:35,555 StatusLogger.java:56 -
> GossipStage   0 0 338170 0
> 0
> INFO  [ScheduledTasks:1] 2016-04-12 17:35:35,555 StatusLogger.java:56 -
>