Snapshot verification

2017-10-30 Thread Pradeep Chhetri
Hi,

We are taking daily snapshots to back up our Cassandra data and then
use those backups to restore into a different environment. I would like to
verify that the data is consistent and that all the data present at the time
the snapshot was taken was actually restored.

Currently I just count the number of rows in each table. I was wondering if
there is any built-in way to accomplish this.

Thank you.
Pradeep
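[There is no built-in snapshot-verification command; the per-table row-count comparison Pradeep describes can at least be scripted. A minimal sketch under stated assumptions (cqlsh on the PATH on a host that can reach both clusters; hosts and table names are placeholders). Note that SELECT count(*) can time out on large tables:]

```shell
# Sketch: compare per-table row counts between the source cluster and the
# restored one. The awk pattern extracts the bare numeric count line from
# cqlsh's tabular output.

count_rows() {   # count_rows <host> <keyspace.table>
  cqlsh "$1" -e "SELECT count(*) FROM $2;" | awk '/^ *[0-9]+ *$/ { print $1 }'
}

verify_table() { # verify_table <source-host> <restore-host> <keyspace.table>
  src=$(count_rows "$1" "$3")
  dst=$(count_rows "$2" "$3")
  if [ "$src" = "$dst" ]; then
    echo "$3 OK ($src rows)"
  else
    echo "$3 MISMATCH (source=$src restored=$dst)"
  fi
}

# Example usage (hosts and tables are placeholders):
# for t in my_ks.users my_ks.events; do verify_table 10.0.0.1 10.0.1.1 "$t"; done
```

[Row counts only catch missing data, not corrupted values; a stronger check would compare per-partition checksums, but that is not built in either.]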


How do I connect to Cassandra on Amazon EC2 via a Java Application

2017-10-30 Thread Lutaya Shafiq Holmes
I have installed Cassandra on EC2 using Bitnami, and I would like to
connect to the Cassandra database from a Java application on AWS.

How do I do that?

Thanks in Advance

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org
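[No answer appears in this digest, but as a rough starting point: Cassandra's native protocol listens on port 9042 by default, so the EC2 security group must allow that port from the application host, and cassandra.yaml must advertise an address the client can reach (rpc_address / broadcast_rpc_address). A hedged connectivity check before wiring up the Java driver (the IP is a placeholder):]

```shell
# Verify the CQL port is reachable from the app host (replace the example IP).
nc -zv 203.0.113.10 9042

# If cqlsh is available, confirm the node actually answers CQL queries.
cqlsh 203.0.113.10 9042 -e "SELECT release_version FROM system.local;"
```

[If both succeed, pointing the DataStax Java driver at the same address as a contact point should work.]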



Re: Need help with incremental repair

2017-10-30 Thread Blake Eggleston
Ah cool, I didn't realize Reaper did that.

On October 30, 2017 at 1:29:26 PM, Paulo Motta (pauloricard...@gmail.com) wrote:

> This is also the case for full repairs, if I'm not mistaken. Assuming I'm not 
> missing something here, that should mean that he shouldn't need to mark 
> sstables as unrepaired? 

That's right, but he mentioned that he is using Reaper, which, if I'm not
mistaken, uses subrange repair, and subrange repair doesn't do anticompaction.
So in that case he should probably mark the data as unrepaired when no
longer using incremental repair.

2017-10-31 3:52 GMT+11:00 Blake Eggleston : 
>> Once you run incremental repair, your data is permanently marked as 
>> repaired 
> 
> This is also the case for full repairs, if I'm not mistaken. I'll admit I'm 
> not as familiar with the quirks of repair in 2.2, but prior to 
> 4.0/CASSANDRA-9143, any global repair ends with an anticompaction that marks 
> sstables as repaired. Looking at the RepairRunnable class, this does seem to 
> be the case. Assuming I'm not missing something here, that should mean that 
> he shouldn't need to mark sstables as unrepaired? 




Re: Need help with incremental repair

2017-10-30 Thread Paulo Motta
> This is also the case for full repairs, if I'm not mistaken. Assuming I'm not 
> missing something here, that should mean that he shouldn't need to mark 
> sstables as unrepaired?

That's right, but he mentioned that he is using Reaper, which, if I'm not
mistaken, uses subrange repair, and subrange repair doesn't do anticompaction.
So in that case he should probably mark the data as unrepaired when no
longer using incremental repair.

2017-10-31 3:52 GMT+11:00 Blake Eggleston :
>> Once you run incremental repair, your data is permanently marked as
>> repaired
>
> This is also the case for full repairs, if I'm not mistaken. I'll admit I'm
> not as familiar with the quirks of repair in 2.2, but prior to
> 4.0/CASSANDRA-9143, any global repair ends with an anticompaction that marks
> sstables as repaired. Looking at the RepairRunnable class, this does seem to
> be the case. Assuming I'm not missing something here, that should mean that
> he shouldn't need to mark sstables as unrepaired?




Re: Data sync between 2 clusters in single DC

2017-10-30 Thread suraj pasuparthy
Yes, it should be possible. You will need to configure your keyspaces
accordingly to create replicas on each Cassandra cluster.

Thanks
Suraj

On Mon, Oct 30, 2017 at 1:11 PM Rahul Neelakantan  wrote:

> Why wouldn't you set it up as a single cluster that spans 2 DCs?
>
> On Mon, Oct 30, 2017 at 4:09 PM, Vincent Lee  wrote:
>
>> For high availability in a single DC region, I would like to install one
>> Cassandra cluster in one AZ and a second cluster in a different AZ.
>> The data between them needs to be synchronized. Is this possible?
>>
>> Note that this is for a single DC (region).
>> Currently I am using GossipingPropertyFileSnitch.
>>
>> I look forward to your input.
>>
>
> --
Suraj Pasuparthy


Re: Data sync between 2 clusters in single DC

2017-10-30 Thread Rahul Neelakantan
Why wouldn't you set it up as a single cluster that spans 2 DCs?

On Mon, Oct 30, 2017 at 4:09 PM, Vincent Lee  wrote:

> For high availability in a single DC region, I would like to install one
> Cassandra cluster in one AZ and a second cluster in a different AZ.
> The data between them needs to be synchronized. Is this possible?
>
> Note that this is for a single DC (region).
> Currently I am using GossipingPropertyFileSnitch.
>
> I look forward to your input.
>


Data sync between 2 clusters in single DC

2017-10-30 Thread Vincent Lee
For high availability in a single DC region, I would like to install one
Cassandra cluster in one AZ and a second cluster in a different AZ.
The data between them needs to be synchronized. Is this possible?

Note that this is for a single DC (region).
Currently I am using GossipingPropertyFileSnitch.

I look forward to your input.
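[Suraj's and Rahul's replies above converge on the usual approach: run a single cluster in which each AZ is a logical data center (GossipingPropertyFileSnitch reads the DC/rack from cassandra-rackdc.properties on each node), and let NetworkTopologyStrategy place replicas in both. A sketch, with hypothetical keyspace and DC names:]

```shell
# One logical DC per AZ; the keyspace and DC names ('az_east', 'az_west')
# are illustrative and must match the dc= values in cassandra-rackdc.properties.
cqlsh -e "ALTER KEYSPACE my_ks WITH replication = {
  'class': 'NetworkTopologyStrategy',
  'az_east': 3,
  'az_west': 3
};"
```

[Two genuinely separate clusters, by contrast, have no built-in synchronization; the application would have to write to both.]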


Re: Would User Defined Type(UDT) nested in a LIST collections column type give good read performance

2017-10-30 Thread Bill Walters
Hi DuyHai,

Thank you for providing your feedback to our question.
Just to elaborate on the 2 factors that you have provided above.

1) Collection cardinality, e.g. the number of elements in the collection. A
maximum of 64,000 elements can be stored.

2) The size of each element in the collection. The bigger the element (the UDT
in our case), the more memory it will require on the coordinator side for
decoding/deserialization. Each UDT shouldn't exceed 64 KB in size.



Thank You,
Bill Walters.

On Mon, Oct 30, 2017 at 3:52 AM, DuyHai Doan  wrote:

> Hello Bill
>
> First, if you don't care about insertion order it's better to use a set
> rather than a list. The list implementation requires a read before write for
> some operations.
>
> Second, the read performance of the collection itself depends on 2 factors
> :
>
> 1) collection cardinality e.g. the number of elements in the collection
>
> 2) the size of each element in the collection. The bigger the element (UDT
> in your case), the more memory it will require on the coordinator side for
> decoding / deserialization
>
> If you manage to keep both numbers reasonable it should be fine
>
>
>
> Le 30 oct. 2017 07:33, "Bill Walters"  a écrit :
>
> Hi Everyone,
>
>
> We need some help in deciding whether to use a User Defined Type (UDT)
> nested in LIST collection columns in our table.
> In a couple of months, we are planning to roll out a new solution that
> will incorporate a read-heavy use case.
> We have one big table which will hold around 250 million records, with 2
> LIST-type columns holding UDT elements (UDT nested in LIST).
>
> Below is our cluster setup that we are planning.
>
> *Cassandra version:* DSE 5.0.7
> *No of Data centers:* 2 (AWS East and AWS West regions)
> *No of Nodes:* 12 nodes (6 nodes in AWS East and 6 nodes in AWS West)
> *Replication Factor:* 3 in each data center.
> *Read Consistency Level:* Local_Quorum
> *Table Compaction Strategy:* LeveledCompactionStrategy
> *Use Case:* Read Heavy
>
> Table Schema:
>
> CREATE TYPE account (
> acct_system_id text,
> acct_id text,
> acct_sec_cust_id text,
> attributes frozen<map<text, text>>);
>
> CREATE TYPE login (
> login_source_id text,
> login_id text,
> attributes frozen<map<text, text>>);
>
>
> CREATE TABLE consumers_id (
> unique_consumer_id text PRIMARY KEY,
> *accounts list<frozen<account>>*,
> details map<text, text>,
> dob text,
> background text,
> *logins list<frozen<login>>*,
> p_id text);
>
>
> Currently, we are running performance tests, but we are not entirely
> confident that reads will yield good performance. Since UDTs are frozen and
> stored as blobs, will there be any impediment when the coordinator converts
> them after a read?
>
> If anyone has implemented a similar use-case, please let us know your
> suggestions.
>
> Thank You,
> Bill Walters.
>
>
>
>
>
>
>
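[DuyHai's set-versus-list point in the quoted reply can be illustrated against the schema above (the keyspace name is hypothetical; set<frozen<udt>> is valid CQL):]

```shell
# A set variant of the two collection columns avoids the read-before-write
# that some list operations (e.g. deleting or setting by index) require,
# at the cost of losing insertion order.
cqlsh -e "
CREATE TABLE my_ks.consumers_id_v2 (
  unique_consumer_id text PRIMARY KEY,
  accounts set<frozen<account>>,
  logins set<frozen<login>>
);"
```

[This only helps if insertion order and duplicates don't matter, as DuyHai notes.]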


Re: Cassandra Compaction Metrics - CompletedTasks vs TotalCompactionCompleted

2017-10-30 Thread Lucas Benevides
Kurt,

I appreciate your answer, but I don't believe CompletedTasks counts
"validation compactions". Those are compactions that occur as part of repair
operations. I am running tests on a 10-node cluster in the same physical
rack with the Cassandra Stress Tool, and I didn't run any repair commands. The
tables only last for seven hours, so it is not reasonable that tens of
thousands of these validation compactions would occur per node.

I looked at the code, and the CompletedTasks counter seems to be
populated by a method from the class
java.util.concurrent.ThreadPoolExecutor.
So I really don't know what it counts, but it is surely not the number of
completed compaction tasks.

Thank you
Lucas Benevides



2017-10-30 8:05 GMT-02:00 kurt greaves :

> I believe (I may be wrong) that CompletedTasks counts validation compactions
> while TotalCompactionsCompleted does not. Considering that a lot of validation
> compactions can be created by every repair, that might explain the difference.
> I'm not sure why they are named that way or work the way they do. There
> appears to be no documentation around this in the code (what a surprise),
> and it looks like it was last touched in CASSANDRA-4009, which also has no
> useful info.
>
> On 27 October 2017 at 13:48, Lucas Benevides wrote:
>
>> Dear community,
>>
>> I am studying the behaviour of the Cassandra
>> TimeWindowCompactionStrategy. To do so I am watching some metrics. Two of
>> these metrics are important: Compaction.CompletedTasks, a gauge, and
>> TotalCompactionsCompleted, a Meter.
>>
>> According to the documentation (http://cassandra.apache.org/doc/latest/operating/metrics.html#table-metrics):
>> Completed Tasks = Number of completed compactions since server [re]start.
>> TotalCompactionsCompleted = Throughput of completed compactions since
>> server [re]start.
>>
>> As I understand it, TotalCompactionsCompleted, being a Meter object, has a
>> counter which I assumed would be numerically close to the CompletedTasks
>> gauge. But they are very different, with CompletedTasks being much
>> higher than TotalCompactionsCompleted.
>>
>> According to the code on GitHub (class metrics.CompactionMetrics.java):
>> Completed Tasks - Number of completed compactions since server [re]start
>> TotalCompactionsCompleted - Total number of compactions since server
>> [re]start
>>
>> Can you help me by explaining the difference between these two metrics? They
>> seem to have very distinct values, with CompletedTasks being around 1000
>> times the value of the counter in TotalCompactionsCompleted.
>>
>> Thanks in Advance,
>> Lucas Benevides
>>
>>
>


Anticompaction

2017-10-30 Thread Vlad
Hi,
I ran a repair, and then I saw that anticompaction started on all nodes. Does
that mean that all the data is already repaired? Actually, I increased the RF,
so can I already use the database?
Thanks.



Re: Need help with incremental repair

2017-10-30 Thread Blake Eggleston
> Once you run incremental repair, your data is permanently marked as repaired

This is also the case for full repairs, if I'm not mistaken. I'll admit I'm not 
as familiar with the quirks of repair in 2.2, but prior to 4.0/CASSANDRA-9143, 
any global repair ends with an anticompaction that marks sstables as repaired. 
Looking at the RepairRunnable class, this does seem to be the case. Assuming 
I'm not missing something here, that should mean that he shouldn't need to mark 
sstables as unrepaired?


Re: Need help with incremental repair

2017-10-30 Thread kurt greaves
Yes, mark them as unrepaired first. You can get sstablerepairedset from
source if you need to (just make sure you get the correct branch/tag).
It's just a shell script, so as long as you have C* installed in a
default/canonical location it should work.
https://github.com/apache/cassandra/blob/trunk/tools/bin/sstablerepairedset
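[For completeness, a hedged sketch of the marking-unrepaired procedure (data paths, keyspace, and table names are examples; sstablerepairedset rewrites sstable metadata, so the node should be down while it runs):]

```shell
# Mark every sstable of one table as unrepaired; repeat on each node in turn.
nodetool drain                 # flush memtables and stop accepting writes
sudo service cassandra stop

# Collect the table's data files and pass the list to the tool.
find /var/lib/cassandra/data/my_ks/my_table-*/ -iname "*Data.db" > sstables.txt
sstablerepairedset --really-set --is-unrepaired -f sstables.txt

sudo service cassandra start
```

[After restart, the sstables re-enter the unrepaired pool and full repairs treat them normally.]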