Re: Change STCS to TWCS

2017-12-28 Thread Jeff Jirsa
It’s going to cause a lot of compactions - this is especially true with STCS, 
where many of your sstables (especially the big ones) will overlap and be 
compacted together.

Monitor free space (and stop compactions as needed), free memory (bloom filters 
will take a big chunk of it as they are rebuilt during compaction), and of course 
CPU and I/O - compaction touches just about everything.

You can test the operational impact by changing the strategy on just one 
instance via JMX - the compaction strategy can be set there as a JSON string, 
and it won’t change the cluster-wide schema (or persist through a restart).
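A sketch of the JSON string involved - the window settings and the exact MBean
object name below are assumptions to verify against your version's
`ColumnFamilyStoreMBean`, not a definitive recipe:

```python
import json

# Build the compaction-parameters JSON string described above. The window
# settings are placeholders -- size them so the table ends up with a few
# dozen windows over its full TTL.
twcs_params = {
    "class": "TimeWindowCompactionStrategy",
    "compaction_window_unit": "DAYS",
    "compaction_window_size": "1",
}
twcs_json = json.dumps(twcs_params)

# With a JMX client (e.g. jmxterm) you would then set this string on the
# table's MBean; a hypothetical object name for a 2.2-era node:
#   org.apache.cassandra.db:type=ColumnFamilies,keyspace=<ks>,columnfamily=<table>
print(twcs_json)
```

Because the change is made through JMX rather than CQL, it applies only to that
one node and is lost on restart, which is exactly what makes it a safe canary.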


-- 
Jeff Jirsa


> On Dec 28, 2017, at 11:40 PM, "wxn...@zjqunshuo.com" wrote:
> 
> Hi All,
> My production cluster is running 2.2.8. It is used to store time series data 
> with only insertions with TTLs - no updates or deletions. From the mailing 
> list, it seems TWCS is more suitable than STCS for my use case. I'm thinking 
> about changing STCS to TWCS in production. I have read the 
> guide (http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html) someone has 
> posted.
> 
> The cluster info:
> UN  XX.XX.44.149   939.23 GB  256  25.8%  9180b7c9-fa0b-4bbe-bf62-64a599c01e58  rack1
> UN  XX.XX.106.218  995.4 GB   256  26.0%  e24d13e2-96cb-4e8c-9d94-22498ad67c85  rack1
> UN  XX.XX.42.113   905.85 GB  256  23.8%  385ad28c-0f3f-415f-9e0a-7fe8bef97e17  rack1
> UN  XX.XX.41.165   859.85 GB  256  23.1%  46f37f06-9c45-492d-bd25-6fef7f926e38  rack1
> UN  XX.XX.106.210  1.15 TB    256  26.8%  a31b6088-0cb2-40b4-ac22-aec718dbd035  rack1
> UN  XX.XX.104.41   900.21 GB  256  23.6%  db08f0d7-d71f-400a-85a6-1f637fa839ee  rack1
> UN  XX.XX.41.95    960.89 GB  256  26.3%  cf80924b-885f-42fb-b8f8-f9e1946ec30a  rack1
> UN  XX.XX.103.239  919.14 GB  256  24.7%  c3f883a8-3643-46a1-ac7a-ea1b1046b400  rack1
> 
> I plan to use "ALTER TABLE" to switch from STCS to TWCS in production. My 
> concerns are:
> 1. Does the switch have a big impact on cluster performance?
> 2. To ensure a smooth switch, what should I pay attention to?
> 
> Best Regards,
> -Simon


Change STCS to TWCS

2017-12-28 Thread wxn...@zjqunshuo.com
Hi All,
My production cluster is running 2.2.8. It is used to store time series data 
with only insertions with TTLs - no updates or deletions. From the mailing list, 
it seems TWCS is more suitable than STCS for my use case. I'm thinking about 
changing STCS to TWCS in production. I have read the 
guide (http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html) someone has 
posted.

The cluster info:
UN XX.XX.44.149 939.23 GB 256 25.8% 9180b7c9-fa0b-4bbe-bf62-64a599c01e58 rack1 
UN XX.XX.106.218 995.4 GB 256 26.0% e24d13e2-96cb-4e8c-9d94-22498ad67c85 rack1 
UN XX.XX.42.113 905.85 GB 256 23.8% 385ad28c-0f3f-415f-9e0a-7fe8bef97e17 rack1 
UN XX.XX.41.165 859.85 GB 256 23.1% 46f37f06-9c45-492d-bd25-6fef7f926e38 rack1 
UN XX.XX.106.210 1.15 TB 256 26.8% a31b6088-0cb2-40b4-ac22-aec718dbd035 rack1 
UN XX.XX.104.41 900.21 GB 256 23.6% db08f0d7-d71f-400a-85a6-1f637fa839ee rack1 
UN XX.XX.41.95 960.89 GB 256 26.3% cf80924b-885f-42fb-b8f8-f9e1946ec30a rack1 
UN XX.XX.103.239 919.14 GB 256 24.7% c3f883a8-3643-46a1-ac7a-ea1b1046b400 rack1

I plan to use "ALTER TABLE" to switch from STCS to TWCS in production. My 
concerns are:
1. Does the switch have a big impact on cluster performance?
2. To ensure a smooth switch, what should I pay attention to?
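The cluster-wide switch is a single DDL statement. The keyspace, table, and
window settings below are hypothetical placeholders - size the window so each
table ends up with a few dozen windows over its TTL:

```cql
ALTER TABLE my_keyspace.sensor_data
  WITH compaction = {
    'class': 'TimeWindowCompactionStrategy',
    'compaction_window_unit': 'DAYS',
    'compaction_window_size': '1'
  };
```

Note that this schema change triggers recompaction on every node at once,
which is why testing on a single node first is commonly recommended.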

Best Regards,
-Simon


How to get page id without transmitting data to client

2017-12-28 Thread Eunsu Kim
Hello everybody,

I am using the datastax Java driver (3.3.0).

When querying large amounts of data, we set the fetch size (1) and transmit 
the data to the browser on a page-by-page basis.

I am wondering if I can get the page id without receiving the actual rows from 
Cassandra on my server.

I only need the first 100 rows out of 100,000, but I want the next page to be 
the 11th.

If you have a good idea, please share it.

Thank you.
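For context: the driver's paging state is an opaque token returned with each
page, so there is no supported way to compute page N's token without fetching
the pages before it. A minimal sketch of passing the token through a browser
round-trip (shown in Python rather than Java purely for illustration; the two
helper names are hypothetical, and the driver calls are only sketched in
comments since they need a live cluster):

```python
import binascii

def encode_paging_state(paging_state):
    """Hex-encode the driver's opaque paging-state bytes for a URL or JSON."""
    return binascii.hexlify(paging_state).decode("ascii")

def decode_paging_state(token):
    """Decode the hex token back into the bytes the driver expects."""
    return binascii.unhexlify(token.encode("ascii"))

# Sketched driver usage (not executed here):
#   result = session.execute(statement, paging_state=decode_paging_state(token))
#   rows = result.current_rows                    # only this page's rows
#   next_token = encode_paging_state(result.paging_state)

# Round-trip check on arbitrary bytes:
sample = b"\x00\x01\xff"
assert decode_paging_state(encode_paging_state(sample)) == sample
```

The token must be treated as opaque: it is not a row offset, so "skip to page
11" still means walking pages 1 through 10 on the server side.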
-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



RE: [EXTERNAL] Re: Reg:- Data modelling For E-Commerce Pattern data modelling for Search

2017-12-28 Thread Durity, Sean R
DataStax Enterprise (paid license) embeds Solr search with Cassandra, if 
you don’t want to move the data to another cluster for indexing/searching. 
As with Cassandra modeling, you will need to understand the exact search 
queries in order to build the Solr schema to support them.

The basic answer, though, is that Cassandra itself is not built for handling 
search queries like the ones you mention.


Sean Durity

From: Bradford Stephens [mailto:bradfordsteph...@gmail.com]
Sent: Friday, December 08, 2017 11:31 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Reg:- Data modelling For E-Commerce Pattern data 
modelling for Search

Hi -- you want to use Elasticsearch with a Cassandra store for the blob data.

On Thu, Dec 7, 2017 at 7:39 PM, @Nandan@ 
<nandanpriyadarshi...@gmail.com> wrote:
Hi Peoples,

Currently, around 60-70% of websites worldwide run e-commerce workloads, in 
which we have to store huge amounts of data and query it with partial search, 
text match, full-text search, and so on.

So the questions below come to mind:
1) Is Cassandra the right choice for data models that require complex search 
patterns like the ones Amazon or eBay use?
2) If we use denormalized data modeling, will it be effective?

Please clarify this.

Thanks and Best regards,
Nandan Priyadarshi



--
Bradford Stephens
roboticprofit.com
Data for Driving Revenue



The information in this Internet Email is confidential and may be legally 
privileged. It is intended solely for the addressee. Access to this Email by 
anyone else is unauthorized. If you are not the intended recipient, any 
disclosure, copying, distribution or any action taken or omitted to be taken in 
reliance on it, is prohibited and may be unlawful. When addressed to our 
clients any opinions or advice contained in this Email are subject to the terms 
and conditions expressed in any applicable governing The Home Depot terms of 
business or client engagement letter. The Home Depot disclaims all 
responsibility and liability for the accuracy and content of this attachment 
and for any damages or losses arising from any inaccuracies, errors, viruses, 
e.g., worms, trojan horses, etc., or other items of a destructive nature, which 
may be contained in this attachment and shall not be liable for direct, 
indirect, consequential or special damages in connection with this e-mail 
message or its attachment.


Re: [EXTERNAL] Add nodes change

2017-12-28 Thread Jeff Jirsa



> On Dec 28, 2017, at 11:09 AM, Durity, Sean R wrote:
> 
> --> See inline
> 
> Hello All,
> 
> We are going to add 2 new nodes to our production cluster; there are 2 
> questions we would like some advice on:
> 
> 1. In the current production env, the Cassandra version is 3.0.4; is it OK if 
> we use 3.0.15 for the new nodes?
> 
> --> I would not do this. Streaming between versions usually doesn't work. 
> Either upgrade the existing node before adding new ones OR do the upgrade 
> after adding 3.0.4 nodes.


Especially across 3.0.14, where we changed the internode messaging protocol. 
You’ll want to upgrade first and add new nodes second.


> 
> 2. The current RF is 1, and we would like to change it to 2. How does this 
> change affect our production data? Do we need to do any data migration?
> 
> --> Once you make the RF change, you will need to run repair on all nodes to 
> have the existing data moved to the proper replicas. (New writes will be 
> fine.) Until that is complete, query results may not be "correct." Depending 
> on how critical the data and application are, it might be better to schedule 
> an outage until the repairs are complete.

Or change the application to read at consistency level ALL until the repair is 
done


> 
> Thanks a lot
> Shijie
> 




RE: [EXTERNAL] Add nodes change

2017-12-28 Thread Durity, Sean R
--> See inline

Hello All,

We are going to add 2 new nodes to our production cluster; there are 2 questions 
we would like some advice on:

1. In the current production env, the Cassandra version is 3.0.4; is it OK if we 
use 3.0.15 for the new nodes?

--> I would not do this. Streaming between versions usually doesn't work. 
Either upgrade the existing node before adding new ones OR do the upgrade after 
adding 3.0.4 nodes.

2. The current RF is 1, and we would like to change it to 2. How does this 
change affect our production data? Do we need to do any data migration?

--> Once you make the RF change, you will need to run repair on all nodes to 
have the existing data moved to the proper replicas. (New writes will be fine.) 
Until that is complete, query results may not be "correct." Depending on how 
critical the data and application are, it might be better to schedule an outage 
until the repairs are complete.
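The RF change itself is one statement. The keyspace name is a placeholder, and 
SimpleStrategy is assumed for a single-DC cluster (use NetworkTopologyStrategy 
with per-DC counts if you run multiple datacenters):

```cql
ALTER KEYSPACE my_keyspace
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 2};
```

The statement only updates metadata; it is the subsequent `nodetool repair`, run 
on every node one at a time, that actually copies existing data to the new 
replicas.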

Thanks a lot
Shijie




RE: [EXTERNAL] Lots of simultaneous connections?

2017-12-28 Thread Durity, Sean R
Have you determined if a specific query is the one getting timed out? It is 
possible that the query/data model does not scale well, especially if you are 
trying to do something like a full table scan.

It is also possible that your OS settings limit the number of connections 
to the host. Do you see any TIME_WAIT connections in netstat? I would agree that 
5,000 connections per host seems on the high side. Each one requires resources, 
like memory, so reducing the connection count is a good idea.


Sean Durity

-Original Message-
From: Max Campos [mailto:mc_cassan...@core43.com]
Sent: Thursday, December 14, 2017 3:18 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Lots of simultaneous connections?

Hi -

We’re finally putting our new application under load, and we’re starting to get 
this error message from the Python driver when under heavy load:

('Unable to connect to any servers', {'x.y.z.205': 
OperationTimedOut('errors=None, last_host=None',), 'x.y.z.204': 
OperationTimedOut('errors=None, last_host=None',), 'x.y.z.206': 
OperationTimedOut('errors=None, last_host=None',)}) (22.7s)

Our cluster is running 3.0.6, has 3 nodes and we use RF=3, CL=QUORUM 
reads/writes.  We have a few thousand machines which are each making 1-10 
connections to C* at once, but each of these connections only reads/writes a 
few records, waits several minutes, and then writes a few records — so while 
netstat reports ~5K connections per node, they’re generally idle.  Peak 
read/sec today was ~1500 per node, peak writes/sec was ~300 per node.  
Read/write latencies peaked at 2.5ms.

Some questions:
1) Is anyone else out there making this many simultaneous connections?  Any 
idea what a reasonable number of connections is, what is too many, etc?

2) Any thoughts on which JMX metrics I should look at to better understand what 
exactly is exploding?  Is there a “number of active connections” metric?  We 
currently look at:
- client reads/writes per sec
- read/write latency
- compaction tasks
- repair tasks
- disk used by node
- disk used by table
- avg partition size per table

3) Any other advice?

I think I’ll try doing an explicit disconnect during the waiting period of our 
application’s execution; so as to get the C* connection count down.  Hopefully 
that will solve the timeout problem.

Thanks for your help.

- Max


RE: [EXTERNAL] Bring 2 nodes down

2017-12-28 Thread Durity, Sean R
Decommission the two nodes, one at a time (assumes you have enough disk space 
on the remaining hosts). That will move the data to the remaining nodes and 
keep RF=3. Then fix the host. Then add the hosts back into the cluster, one at 
a time. This is easier with vnodes. Finally, run clean-up on the 6 nodes that 
stayed up to recover disk space.
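The sequence above as nodetool commands - a sketch, not a runbook; let each step 
finish completely before starting the next:

```
# On each of the two nodes to be taken down, one at a time:
nodetool decommission      # streams that node's data to the remaining replicas

# Fix the host, then start Cassandra on each repaired node again, one at a
# time, so it bootstraps back into the ring (auto_bootstrap defaults to true).

# Finally, on each of the 6 nodes that stayed up:
nodetool cleanup           # drop data that no longer belongs to the node
```

Decommission rather than a plain shutdown is what preserves three live replicas 
(and therefore QUORUM availability) throughout the maintenance window.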


Sean Durity
lord of the (C*) rings (Staff Systems Engineer – Cassandra)
MTC 2250
#cassandra - for the latest news and updates

From: Alaa Zubaidi (PDF) [mailto:alaa.zuba...@pdf.com]
Sent: Thursday, December 14, 2017 2:00 PM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Bring 2 nodes down

Hi,
I have a cluster of 8 nodes: 4 physical machines with 2 VMs each.
RF=3, and we have a read/write QUORUM consistency requirement.

One of the machines needs to be down for an hour or two to fix a local disk.
What is the best way to do that without losing data?

Regards
-- Alaa

This message may contain confidential and privileged information. If it has 
been sent to you in error, please reply to advise the sender of the error and 
then immediately permanently delete it and all attachments to it from your 
systems. If you are not the intended recipient, do not read, copy, disclose or 
otherwise use this message or any attachments to it. The sender disclaims any 
liability for such unauthorized use. PLEASE NOTE that all incoming e-mails sent 
to PDF e-mail accounts will be archived and may be scanned by us and/or by 
external service providers to detect and prevent threats to our systems, 
investigate illegal or inappropriate behavior, and/or eliminate unsolicited 
promotional e-mails (“spam”). If you have any concerns about this process, 
please contact us at legal.departm...@pdf.com.


