Downside to running multiple nodetool repairs at the same time?

2017-04-20 Thread eugene miretsky
In Cassandra 3.0 the default nodetool repair behaviour is incremental and
parallel.
Is there a downside to triggering repair from multiple nodes at the same
time?

Basically, instead of scheduling a cron job on one node to run repair, I
want to schedule the job on every node (this way, I don't have to worry
about repair if that one node goes down). Alternatively, I could build a
smarter solution for HA repair jobs, but that seems like overkill.
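
What I have in mind is roughly the staggered-cron pattern below — a minimal
sketch, assuming nodetool is on the PATH and each node is assigned a
different weekday (paths and the day assignment are placeholders):

  # node 1: repair this node's primary ranges every Monday at 02:00
  0 2 * * 1  nodetool repair -pr > /var/log/cassandra/repair-cron.log 2>&1

With -pr each node repairs only the ranges it owns as primary, so a staggered
schedule across all nodes covers the whole ring without the same range being
repaired from several nodes at once.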


Re: Why are automatic anti-entropy repairs required when hinted hand-off is enabled?

2017-04-20 Thread eugene miretsky
Thanks Jayesh,

Watched all of those.

Still not sure I fully get the theory behind it.

Aside from the 2 failure cases I mentioned earlier, the only other way
data can become inconsistent is an error when replicating the data in the
background. Does Cassandra have a retry policy for internal replication? Is
there a setting to change it?





On Thu, Apr 6, 2017 at 10:54 PM, Thakrar, Jayesh <
jthak...@conversantmedia.com> wrote:

> I had asked a similar/related question - on how to carry out repair, etc
> and got some useful pointers.
>
> I would highly recommend the YouTube video or the SlideShare link below
> (both are for the same presentation).
>
>
> https://www.youtube.com/watch?v=1Sz_K8UID6E
>
> http://www.slideshare.net/DataStax/real-world-repairs-vinay-chella-netflix-cassandra-summit-2016
>
> https://www.pythian.com/blog/effective-anti-entropy-repair-cassandra/
>
> https://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsRepair.html
>
> https://www.datastax.com/dev/blog/repair-in-cassandra
>
> From: eugene miretsky 
> Date: Thursday, April 6, 2017 at 3:35 PM
> To: 
> Subject: Why are automatic anti-entropy repairs required when hinted
> hand-off is enabled?
>
>
>
> Hi,
>
>
>
> As I see it, if hinted handoff is enabled, the only time data can be
> inconsistent is when:
>
>1. A node is down for longer than the max_hint_window
>    2. The coordinator node crashes before all the hints have been replayed
>
> Why is it still recommended to perform frequent automatic repairs, as well
> as enable read repair? Can't I just run a repair after one of the nodes has
> been down? The only problem I see with this approach is a longer repair job
> (instead of small incremental repairs). But other than that, are there any
> other issues/corner cases?
>
>
>
> Cheers,
>
> Eugene
>


Re: Migrating to LCS : Disk Size recommendation clashes

2017-04-20 Thread Mark Rose
Hi Amit,

The size recommendations are based on balancing CPU and the amount of data
stored on a node. LCS requires less disk space but generally requires much
more CPU to keep up with compaction for the same amount of data, which is
why the size recommendation is smaller. There is nothing wrong with
attaching a larger disk, of course. The sizes are recommendations to start
with when you have nothing else to go by. If your cluster is light on
writes, you may be able to store much larger amounts of data than the
suggested sizes and have no problem keeping up with LCS compaction. If your
cluster is heavy on writes, you may find you can only store a small
fraction of the data per node that you could store with STCS. You will have
to benchmark for your use case.
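
For reference, a minimal sketch of switching an existing table to LCS for
such a benchmark (the keyspace/table names are placeholders, and the
sstable_size_in_mb option is optional — 160 is its default):

  ALTER TABLE my_keyspace.my_table
    WITH compaction = {'class': 'LeveledCompactionStrategy', 'sstable_size_in_mb': 160};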

The 10 TB number is from a theoretical situation where LCS would result in
reading a maximum of 7 SSTables to serve a read -- if LCS compaction can
keep up.

Cheers,
Mark

On Thu, Apr 13, 2017 at 8:23 AM, Amit Singh F 
wrote:

> Hi All,
>
>
>
> We are in the process of migrating from STCS to LCS and were just doing
> some reading online. Below is the excerpt from the DataStax recommendation
> on data size:
>
>
>
> Doc link : https://docs.datastax.com/en/landing_page/doc/landing_page/planning/planningHardware.html
>
>
>
>
>
> There is also one more recommendation hinting that disk size can be
> limited to 10 TB (worst case). Below is the excerpt:
>
>
>
> Doc link : http://www.datastax.com/dev/blog/leveled-compaction-in-apache-cassandra
>
>
>
>
>
> So are there any restrictions/scenarios due to which 600 GB is the
> preferred size for LCS?
>
>
>
> Thanks & Regards
>
> Amit Singh
>
>
>


Re: Multiple nodes decommission

2017-04-20 Thread Vlad
> There's a system property (actually 2)
Which ones?

On Wednesday, April 19, 2017 9:17 AM, Jeff Jirsa  wrote:
 

 

On 2017-04-12 11:30 (-0700), Vlad  wrote: 
> Interesting, there is no such explicit warning for v.3 
> https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsAddNodeToCluster.html
> It says:
>    - Start the bootstrap node.
>    - Verify that the node is fully bootstrapped and all other nodes are up (UN).
> 
> Does it mean that we should start them one by one? Could somebody from the
> developers clarify this issue?

You should treat range movements (bootstrap/decom/etc) in 3.0 the same way you 
treated 2.0/2.1/2.2 - there's nothing special (as far as I know) to make it any 
more safe than 2.x was.

The warnings and restrictions are because simultaneous range movements PROBABLY 
violate your assumed consistency guarantees if you're using vnodes. If you're 
using single token, this can be avoided. 

If you really know what you're doing, you can tell cassandra to let you do 
simultaneous range movements anyway. There's a system property (actually 2) 
that will let you tell cassandra you know the tradeoffs, and then you can 
bootstrap/decom/etc more than one node at a time. Generally, it's one of those 
things where if you have to ask about it, you probably should just stick to the 
default one-at-a-time guidelines (which isn't meant to sound condescending, but 
it's an area where you can definitely violate consistency and maybe even lose 
data if you're not sure).

- Jeff
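
For reference, a minimal sketch of how one of these is passed — this is the
consistent range movement flag documented for bootstrapping several nodes at
once; the exact set of properties for your version should be confirmed
against its NEWS.txt/docs:

  # in cassandra-env.sh (or on the command line) before starting the joining nodes
  JVM_OPTS="$JVM_OPTS -Dcassandra.consistent.rangemovement=false"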


   

Re: Drop tables takes too long

2017-04-20 Thread Bohdan Tantsiura
Thanks Carlos,

In each keyspace we also have 11 MVs.

It is impossible to reduce the number of tables now. Long GC pauses take
about one minute. But why does it take so much time, and how can that be
fixed?

Each node in the cluster has 128 GB RAM, so resources are not constrained for now.

Thanks

2017-04-20 13:18 GMT+03:00 Carlos Rolo :

> You have 4800 tables in total? That is a lot of tables. Plus MVs, or are
> the MVs already included in the 60*80 count?
>
> I would recommend reducing the table count. Another thing is that you need
> to check your log files for GC pauses, and how long those pauses take.
>
> You also might need to increase the node count if you're resource
> constrained.
>
> Regards,
>
> Carlos Juzarte Rolo
> Cassandra Consultant / Datastax Certified Architect / Cassandra MVP
>
> Pythian - Love your data
>
> rolo@pythian | Twitter: @cjrolo | Skype: cjr2k3 | Linkedin:
> linkedin.com/in/carlosjuzarterolo
> Mobile: +351 918 918 100
> www.pythian.com
>
> On Thu, Apr 20, 2017 at 11:10 AM, Bohdan Tantsiura 
> wrote:
>
>> Hi,
>>
>> We are using Cassandra 3.10 in a 10-node cluster with replication factor 3.
>> MAX_HEAP_SIZE=64GB on all nodes, G1 GC is used. We have about 60 keyspaces
>> with about 80 tables in each keyspace. We had to delete three tables and
>> two materialized views from each keyspace. It began to take more and more
>> time for each subsequent keyspace (for some keyspaces it took about 30
>> minutes) and then failed with "Cannot achieve consistency level ALL". After
>> restarting, the same thing happened again. It seems that Cassandra hangs on
>> GC. How can that be solved?
>>
>> Thanks
>>
>
>


Re: [Cassandra 3.0.9] In Memory table

2017-04-20 Thread Jacob Shadix
No, in-memory tables are only available in DataStax Enterprise 4.0+.
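
One partial substitute in Apache Cassandra is the row cache, which keeps hot
partitions in memory but remains a cache on top of the on-disk table rather
than a true in-memory table. A minimal sketch, assuming row_cache_size_in_mb
has been set above zero in cassandra.yaml (keyspace/table names are
placeholders):

  ALTER TABLE my_keyspace.hot_table
    WITH caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'};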

-- Jacob Shadix

On Thu, Apr 20, 2017 at 3:00 AM, Abhishek Kumar Maheshwari <
abhishek.maheshw...@timesinternet.in> wrote:

> Hi All,
>
>
>
> As the DataStax Cassandra version provides an in-memory table option, can
> we achieve the same thing in Apache Cassandra?
>
>
>
> http://docs.datastax.com/en/archived/datastax_enterprise/4.6/datastax_enterprise/inMemory.html
>
>
>
>
>
>
>
>
>
> Thanks & Regards,
> Abhishek Kumar Maheshwari
> +91- 805591 (Mobile)
>
> Times Internet Ltd. | A Times of India Group Company
>
> FC - 6, Sector 16A, Film City,  Noida,  U.P. 201301 | INDIA
>
> P Please do not print this email unless it is absolutely necessary.
> Spread environmental awareness.
>
>
> Attend LEAP Edtech , India's largest EdTech
> Summit focused on forging partnerships between different ecosystem players.
> Register with Discount code LPTBS  to avail 50%
> discount on event tickets.
>


Re: Drop tables takes too long

2017-04-20 Thread Carlos Rolo
You have 4800 tables in total? That is a lot of tables. Plus MVs, or are
the MVs already included in the 60*80 count?

I would recommend reducing the table count. Another thing is that you need
to check your log files for GC pauses, and how long those pauses take.
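
(A quick way to spot and measure them — a sketch assuming the default log
location, which may differ on your install:)

  # GCInspector logs each notable GC pause together with its duration
  grep GCInspector /var/log/cassandra/system.log | tail -20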

You also might need to increase the node count if you're resource
constrained.

Regards,

Carlos Juzarte Rolo
Cassandra Consultant / Datastax Certified Architect / Cassandra MVP

Pythian - Love your data

rolo@pythian | Twitter: @cjrolo | Skype: cjr2k3 | Linkedin:
linkedin.com/in/carlosjuzarterolo
Mobile: +351 918 918 100
www.pythian.com

On Thu, Apr 20, 2017 at 11:10 AM, Bohdan Tantsiura 
wrote:

> Hi,
>
> We are using Cassandra 3.10 in a 10-node cluster with replication factor 3.
> MAX_HEAP_SIZE=64GB on all nodes, G1 GC is used. We have about 60 keyspaces
> with about 80 tables in each keyspace. We had to delete three tables and
> two materialized views from each keyspace. It began to take more and more
> time for each subsequent keyspace (for some keyspaces it took about 30
> minutes) and then failed with "Cannot achieve consistency level ALL". After
> restarting, the same thing happened again. It seems that Cassandra hangs on
> GC. How can that be solved?
>
> Thanks
>


Drop tables takes too long

2017-04-20 Thread Bohdan Tantsiura
Hi,

We are using Cassandra 3.10 in a 10-node cluster with replication factor 3.
MAX_HEAP_SIZE=64GB on all nodes, G1 GC is used. We have about 60 keyspaces
with about 80 tables in each keyspace. We had to delete three tables and
two materialized views from each keyspace. It began to take more and more
time for each subsequent keyspace (for some keyspaces it took about 30
minutes) and then failed with "Cannot achieve consistency level ALL". After
restarting, the same thing happened again. It seems that Cassandra hangs on
GC. How can that be solved?

Thanks


Re: Streaming errors during bootstrap

2017-04-20 Thread kurt greaves
Did this error persist? What was the expected outcome? Did you drop this CF
and now expect it to no longer exist?
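
One other thing worth ruling out for a "CF ... was dropped during streaming"
error is schema disagreement between nodes; a quick check (the exact output
format varies by version):

  # every node should appear under a single entry in "Schema versions"
  nodetool describecluster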

On 12 April 2017 at 01:26, Jai Bheemsen Rao Dhanwada 
wrote:

> Hello,
>
> I am seeing streaming errors while adding new nodes (in the same DC) to the
> cluster.
>
> ERROR [STREAM-IN-/x.x.x.x] 2017-04-11 23:09:29,318 StreamSession.java:512
> - [Stream #a8d56c70-1f0b-11e7-921e-61bb8bdc19bb] Streaming error occurred
> java.io.IOException: CF 465ed8d0-086c-11e6-9744-2900b5a9ab11 was dropped during streaming
> at org.apache.cassandra.streaming.compress.CompressedStreamReader.read(CompressedStreamReader.java:77) ~[apache-cassandra-2.1.16.jar:2.1.16]
> at org.apache.cassandra.streaming.messages.IncomingFileMessage$1.deserialize(IncomingFileMessage.java:48) ~[apache-cassandra-2.1.16.jar:2.1.16]
> at org.apache.cassandra.streaming.messages.IncomingFileMessage$1.deserialize(IncomingFileMessage.java:38) ~[apache-cassandra-2.1.16.jar:2.1.16]
> at org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:56) ~[apache-cassandra-2.1.16.jar:2.1.16]
> at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:276) ~[apache-cassandra-2.1.16.jar:2.1.16]
> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_101]
>
> The CF 465ed8d0-086c-11e6-9744-2900b5a9ab11 is actually present and all
> the nodes are in sync. I am sure there are no network connectivity issues.
> Not sure why this error is happening.
>
> I tried to run repair/scrub on the CF with metadata
> 465ed8d0-086c-11e6-9744-2900b5a9ab11, but it didn't help.
>
> Any idea what else to look for in this case?
>
> Thanks in advance.
>
>


Re: Query on Data Modelling of a specific usecase

2017-04-20 Thread Naresh Yadav
Hi Jon,

Thanks for your guidance.

In the above-mentioned table I can have different scale depending on the report.

One report may have 1 rows.
A second report may have half a million rows.
A third report may have 1 million rows.
A fourth report may have 10 million rows.

As this is time-series data, that was the main reason for modelling it in
Cassandra. We preferred a separate table for each report as there is no use
case for querying across reports, and lighter reports will also work faster.
I can plan to reduce the number of tables drastically by combining lighter
reports into one table at the application level.

Could you suggest an optimal table design for the mentioned queries, keeping
in mind one table at a scale of 10 million to 1 billion rows?

Thanks,
Naresh Yadav

On Wed, Apr 19, 2017 at 9:26 PM, Jon Haddad 
wrote:

> How much data do you plan to store in each table?
>
> I’ll be honest, this doesn’t sound like a Cassandra use case at first
> glance.  1 table per report x 1000 is going to be a bad time.  Odds are
> with different queries, you’ll need multiple views, so let's call that a
> handful of tables per report.  Sounds to me like you need CSV (for small
> reports) or Parquet + a file system (for large ones).
>
> Jon
>
>
> On Apr 18, 2017, at 11:34 PM, Naresh Yadav  wrote:
>
> Looking for a Cassandra expert's recommendation on the above use case;
> please reply.
>
> On Mon, Apr 17, 2017 at 7:37 PM, Naresh Yadav 
> wrote:
>
>> Hi all,
>>
>> This is my existing table configured on apache-cassandra-3.0.9:
>>
>> CREATE TABLE report_id1 (
>>    mc_id text,
>>    tag_id text,
>>    e_date timestamp,
>>    value text,
>>    PRIMARY KEY ((mc_id, tag_id), e_date)
>> );
>>
>> I create a table dynamically for each report from the application. I need
>> to support up to 1000 reports, which means 1000 such tables.
>> Unique mc_id values will be in the range of 5 to 100 in a report.
>> For a mc_id there will be unique tag_id values in the range of 100 to 1
>> million in a report.
>> For a (mc_id, tag_id) there will be unique e_date values in the range of
>> 10 to 5000.
>>
>> Current queries to answer:
>> 1) SELECT * FROM report_id1 WHERE mc_id='x' AND tag_id IN ('a','b','c') AND
>> e_date='16Apr2017 23:59:59';
>> 2) SELECT * FROM report_id1 WHERE mc_id='x' AND tag_id IN ('a','b','c') AND
>> e_date >='01Apr2017 00:00:00' AND e_date <='16Apr2017 23:59:59';
>>
>> 3) SELECT * FROM report_id1 WHERE mc_id='x' AND e_date='16Apr2017
>> 23:59:59';
>>    With the current design this works with ALLOW FILTERING only.
>> 4) SELECT * FROM report_id1 WHERE mc_id='x' AND e_date >='01Apr2017
>> 00:00:00' AND e_date <='16Apr2017 23:59:59';
>>    With the current design this works with ALLOW FILTERING only.
>>
>> Looking for a better design for this case, keeping in mind the dynamic
>> tables use case and the queries listed.
>>
>> Thanks in advance,
>> Naresh
>>
>>
>
>
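
A minimal sketch of the query-per-table approach for queries 3 and 4: keep
the existing table for queries 1 and 2, and add a second table partitioned by
mc_id alone. The table name below is a placeholder, and with up to a million
tag_ids per mc_id a real design would also need a time bucket in the
partition key to keep partitions a manageable size:

CREATE TABLE report_id1_by_mc (
    mc_id text,
    e_date timestamp,
    tag_id text,
    value text,
    PRIMARY KEY ((mc_id), e_date, tag_id)
);

-- queries 3 and 4 then become plain partition slices, no ALLOW FILTERING:
SELECT * FROM report_id1_by_mc
WHERE mc_id = 'x'
  AND e_date >= '2017-04-01 00:00:00' AND e_date <= '2017-04-16 23:59:59';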


[Cassandra 3.0.9] In Memory table

2017-04-20 Thread Abhishek Kumar Maheshwari
Hi All,

As the DataStax Cassandra version provides an in-memory table option, can we
achieve the same thing in Apache Cassandra?

http://docs.datastax.com/en/archived/datastax_enterprise/4.6/datastax_enterprise/inMemory.html




Thanks & Regards,
Abhishek Kumar Maheshwari
+91- 805591 (Mobile)
Times Internet Ltd. | A Times of India Group Company
FC - 6, Sector 16A, Film City,  Noida,  U.P. 201301 | INDIA
P Please do not print this email unless it is absolutely necessary. Spread 
environmental awareness.

Attend LEAP Edtech, India's largest EdTech Summit 
focused on forging partnerships between different ecosystem players. Register 
with Discount code LPTBS  to avail 50% discount on event 
tickets.