Re: Disabling auto snapshots

2015-05-21 Thread Mark Reddy
To disable auto snapshots, set the property auto_snapshot: false in your
cassandra.yaml file.
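For anyone searching later, the change is a one-line edit. A minimal sketch (the file path and surrounding settings here are assumptions, not from this thread — edit the cassandra.yaml your node actually loads, then restart the node):

```shell
# Illustrative only: flip auto_snapshot in a copy of cassandra.yaml.
# The path and file contents are made up for this sketch.
conf="$(mktemp -d)/cassandra.yaml"
cat > "$conf" <<'EOF'
cluster_name: 'Test Cluster'
auto_snapshot: true
EOF
# Disable the automatic snapshot taken on TRUNCATE/DROP.
sed -i 's/^auto_snapshot:.*/auto_snapshot: false/' "$conf"
grep '^auto_snapshot' "$conf"
```

Note that auto_snapshot is a node-level yaml setting, so it applies to every table on that node and takes effect only after a restart.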

Mark

On 21 May 2015 at 08:30, Ali Akhtar ali.rac...@gmail.com wrote:

 Is there a config setting where automatic snapshots can be disabled? I
 have a use case where a table is truncated quite often, and would like
 not to have snapshots. I can't find anything on Google.


 Thanks.



Re: Disabling auto snapshots

2015-05-21 Thread Ali Akhtar
Thanks!

On Thu, May 21, 2015 at 12:34 PM, Mark Reddy mark.l.re...@gmail.com wrote:

 To disable auto snapshots, set the property auto_snapshot: false in your
 cassandra.yaml file.

 Mark

 On 21 May 2015 at 08:30, Ali Akhtar ali.rac...@gmail.com wrote:

 Is there a config setting where automatic snapshots can be disabled? I
 have a use case where a table is truncated quite often, and would like
 not to have snapshots. I can't find anything on Google.

 Thanks.





Re: amd64 binaries gone from Debian apt repository (20x series)?

2015-05-21 Thread Andrej Pregl
They seem to be in place now. Thanks!

On Wed, May 20, 2015 at 5:46 PM, Michael Shuler mich...@pbandjelly.org
wrote:

 On 05/20/2015 03:19 AM, Andrej Pregl wrote:

 So I was trying to install Cassandra via the Debian repository at
 http://www.apache.org/dist/cassandra/debian and for some reason the 20x
 series doesn't have the amd64 binaries.
 They were previously available there, so I'm wondering why they are
 gone?


 This should be corrected now - I asked the devs to resync apache-mirror.
 Thanks!

 $ apt-cache policy cassandra
 cassandra:
   Installed: (none)
   Candidate: 2.0.15
   Version table:
      2.0.15 0
         500 http://www.apache.org/dist/cassandra/debian/ 20x/main amd64
         Packages
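For reference, the repository line that points apt at the 20x series discussed above looks like the following. The "20x" suite matches the URL in this thread, but check the current Cassandra download docs before copying it; the sketch writes to a temp file so it needs no root.

```shell
# Hedged sketch of the sources.list entry for the 20x series.
# On a real system this line would live in
# /etc/apt/sources.list.d/cassandra.list, followed by:
#   apt-get update && apt-get install cassandra
list="$(mktemp)"
echo 'deb http://www.apache.org/dist/cassandra/debian 20x main' > "$list"
cat "$list"
```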



Disabling auto snapshots

2015-05-21 Thread Ali Akhtar
Is there a config setting where automatic snapshots can be disabled? I have
a use case where a table is truncated quite often, and would like not to
have snapshots. I can't find anything on Google.

Thanks.


Re: Multiple cassandra instances per physical node

2015-05-21 Thread Carlos Rolo
Hi,

I also advise against multiple instances on the same hardware. If you have
really big boxes, why not virtualize?

Another option is to experiment with CCM, although it has some limitations
(e.g. JNA is disabled).

If you follow up on this I would love to hear how it went.
On 21/05/2015 19:33, Dan Kinder dkin...@turnitin.com wrote:

 Hi, I'd just like some clarity and advice regarding running multiple
 cassandra instances on a single large machine (big JBOD array, plenty of
 CPU/RAM).

 First, I am aware this was not Cassandra's original design, and doing
 this seems to unreasonably go against the commodity-hardware intentions
 of Cassandra's design. In general it seems to be recommended against (at
 least as far as I've heard from @Rob Coli and others).

 However, maybe the term "commodity" is changing... my hardware/ops team
 argues that due to cooling, power, and other datacenter costs, having
 slightly larger nodes (>=32G RAM, >=24 CPU, >=8 disks JBOD) is actually a
 better price point. Now, I am not a hardware guy, so if this is not
 actually true I'd love to hear why; otherwise I pretty much need to take
 them at their word.

 Now, Cassandra features seem to have improved such that JBOD works fairly
 well, but especially with memory/GC this seems to be reaching its limit.
 One Cassandra instance can only scale up so much.

 So my question is: suppose I take a 12-disk JBOD and run 2 Cassandra
 nodes (each with 5 data disks, 1 commit log disk) and either give each
 its own container & IP or change the listen ports. Will this work? What
 are the risks? Will/should Cassandra support this better in the future?
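The "own container & IP" variant of the question above can be pictured as per-instance config trees. Everything in this sketch — names, addresses, ports, paths — is hypothetical; a real setup would also need distinct JMX ports (cassandra-env.sh), saved_caches directories, and matching seed lists.

```shell
# Sketch only: generate cassandra.yaml overrides for two Cassandra
# instances sharing one physical host, each with its own IP and disks.
base="$(mktemp -d)"
for i in 1 2; do
  mkdir -p "$base/node$i"
  cat > "$base/node$i/cassandra.yaml" <<EOF
cluster_name: 'shared_host'
listen_address: 10.0.0.$i
rpc_address: 10.0.0.$i
data_file_directories:
    - /data/node$i/disk1
    - /data/node$i/disk2
commitlog_directory: /data/node$i/commitlog
EOF
done
grep listen_address "$base"/node*/cassandra.yaml
```

Each instance would then be started with its own config directory (e.g. via CASSANDRA_CONF), so nothing but the hardware is shared.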



Re: Disabling auto snapshots

2015-05-21 Thread Ken Hancock
Is there any method to disable this programmatically on a table-by-table
basis?

I'm running into an issue regarding drop table which I'll post in a
separate thread.

On Thu, May 21, 2015 at 3:34 AM, Mark Reddy mark.l.re...@gmail.com wrote:

 To disable auto snapshots, set the property auto_snapshot: false in your
 cassandra.yaml file.

 Mark

 On 21 May 2015 at 08:30, Ali Akhtar ali.rac...@gmail.com wrote:

 Is there a config setting where automatic snapshots can be disabled? I
 have a use case where a table is truncated quite often, and would like
 not to have snapshots. I can't find anything on Google.

 Thanks.





Re: Does Cassandra support delayed cross-datacenter replication?

2015-05-21 Thread Jonathan Haddad
No.

On Thu, May 21, 2015 at 7:07 AM Eax Melanhovich m...@eax.me wrote:

 Say I would like to have a replica cluster whose state is the state of the
 real cluster 12 hours ago. Does Cassandra support such a feature?

 --
 Best regards,
 Eax Melanhovich
 http://eax.me/



Re: Multiple cassandra instances per physical node

2015-05-21 Thread Sebastian Estevez
JBOD -- just a bunch of disks, no raid.

All the best,


Sebastián Estévez

Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com
http://www.datastax.com/

DataStax is the fastest, most scalable distributed database technology,
delivering Apache Cassandra to the world's most innovative enterprises.
DataStax is built to be agile, always-on, and predictably scalable to any
size. With more than 500 customers in 45 countries, DataStax is the
database technology and transactional backbone of choice for the world's
most innovative companies such as Netflix, Adobe, Intuit, and eBay.

On Thu, May 21, 2015 at 4:00 PM, James Rothering jrother...@codojo.me
wrote:

 Hmmm ... Not familiar with JBOD. Is that just RAID-0?

 Also ... wrt  the container talk, is that a Docker container you're
 talking about?



 On Thu, May 21, 2015 at 12:48 PM, Jonathan Haddad j...@jonhaddad.com
 wrote:

 If you run it in a container with dedicated IPs it'll work just fine.
 Just be sure you aren't using the same machine to replicate its own data.

 On Thu, May 21, 2015 at 12:43 PM Manoj Khangaonkar khangaon...@gmail.com
 wrote:

 +1.

 I agree we need to be able to run multiple server instances on one
 physical machine. This is especially necessary in development and test
 environments where one is experimenting and needs a cluster, but does
 not have access to multiple physical machines.

 If you google, you can find a few blogs that talk about how to do this.

 But it is less than ideal. We need to be able to do it by changing ports
 in cassandra.yaml (the way it is easily done with Hadoop, Apache Kafka,
 Redis, and many other distributed systems).

 regards

 On Thu, May 21, 2015 at 10:32 AM, Dan Kinder dkin...@turnitin.com
 wrote:

 Hi, I'd just like some clarity and advice regarding running multiple
 cassandra instances on a single large machine (big JBOD array, plenty of
 CPU/RAM).

 First, I am aware this was not Cassandra's original design, and doing
 this seems to unreasonably go against the commodity-hardware intentions
 of Cassandra's design. In general it seems to be recommended against (at
 least as far as I've heard from @Rob Coli and others).

 However, maybe the term "commodity" is changing... my hardware/ops team
 argues that due to cooling, power, and other datacenter costs, having
 slightly larger nodes (>=32G RAM, >=24 CPU, >=8 disks JBOD) is actually
 a better price point. Now, I am not a hardware guy, so if this is not
 actually true I'd love to hear why; otherwise I pretty much need to take
 them at their word.

 Now, Cassandra features seem to have improved such that JBOD works
 fairly well, but especially with memory/GC this seems to be reaching its
 limit. One Cassandra instance can only scale up so much.

 So my question is: suppose I take a 12-disk JBOD and run 2 Cassandra
 nodes (each with 5 data disks, 1 commit log disk) and either give each
 its own container & IP or change the listen ports. Will this work? What
 are the risks? Will/should Cassandra support this better in the future?




 --
 http://khangaonkar.blogspot.com/





Re: Multiple cassandra instances per physical node

2015-05-21 Thread Dan Kinder
@James Rothering yeah, I was thinking of "container" in a broad sense:
either full virtual machines, Docker containers, straight LXC, or whatever
else would allow the Cassandra nodes to have their own IPs and bind to
default ports.

@Jonathan Haddad thanks for the blog post. To ensure the same host does not
replicate its own data, would I basically need the nodes on a single host
to be labeled as one rack? (Assuming I use vnodes.)
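The one-rack-per-host idea can be sketched with GossipingPropertyFileSnitch: every instance on the same physical host advertises the same rack, so NetworkTopologyStrategy prefers placing replicas of the same range on other racks, i.e. other hosts. The DC and rack names below are placeholders.

```shell
# Sketch: identical cassandra-rackdc.properties for two instances that
# share one physical host, so they count as one rack for replication.
conf="$(mktemp -d)"
for i in 1 2; do
  cat > "$conf/node$i-cassandra-rackdc.properties" <<'EOF'
dc=dc1
rack=physical-host-A
EOF
done
cat "$conf/node1-cassandra-rackdc.properties"
```

Rack-aware placement is best-effort (it depends on having at least as many racks as the replication factor), so it is worth verifying actual placement with nodetool getendpoints after setup.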

On Thu, May 21, 2015 at 1:02 PM, Sebastian Estevez 
sebastian.este...@datastax.com wrote:

 JBOD -- just a bunch of disks, no raid.

 All the best,



 On Thu, May 21, 2015 at 4:00 PM, James Rothering jrother...@codojo.me
 wrote:

 Hmmm ... Not familiar with JBOD. Is that just RAID-0?

 Also ... wrt the container talk, is that a Docker container you're
 talking about?



 On Thu, May 21, 2015 at 12:48 PM, Jonathan Haddad j...@jonhaddad.com
 wrote:

 If you run it in a container with dedicated IPs it'll work just fine.
 Just be sure you aren't using the same machine to replicate its own data.

 On Thu, May 21, 2015 at 12:43 PM Manoj Khangaonkar khangaon...@gmail.com
 wrote:

 +1.

 I agree we need to be able to run multiple server instances on one
 physical machine. This is especially necessary in development and test
 environments where one is experimenting and needs a cluster, but does
 not have access to multiple physical machines.

 If you google, you can find a few blogs that talk about how to do this.

 But it is less than ideal. We need to be able to do it by changing ports
 in cassandra.yaml (the way it is easily done with Hadoop, Apache Kafka,
 Redis, and many other distributed systems).

 regards

 On Thu, May 21, 2015 at 10:32 AM, Dan Kinder dkin...@turnitin.com
 wrote:

 Hi, I'd just like some clarity and advice regarding running multiple
 cassandra instances on a single large machine (big JBOD array, plenty of
 CPU/RAM).

 First, I am aware this was not Cassandra's original design, and doing
 this seems to unreasonably go against the commodity-hardware intentions
 of Cassandra's design. In general it seems to be recommended against (at
 least as far as I've heard from @Rob Coli and others).

 However, maybe the term "commodity" is changing... my hardware/ops team
 argues that due to cooling, power, and other datacenter costs, having
 slightly larger nodes (>=32G RAM, >=24 CPU, >=8 disks JBOD) is actually
 a better price point. Now, I am not a hardware guy, so if this is not
 actually true I'd love to hear why; otherwise I pretty much need to take
 them at their word.

 Now, Cassandra features seem to have improved such that JBOD works
 fairly well, but especially with memory/GC this seems to be reaching its
 limit. One Cassandra instance can only scale up so much.

 So my question is: suppose I take a 12-disk JBOD and run 2 Cassandra
 nodes (each with 5 data disks, 1 commit log disk) and either give each
 its own container & IP or change the listen ports. Will this work? What
 are the risks? Will/should Cassandra support this better in the future?




 --
 http://khangaonkar.blogspot.com/






-- 
Dan Kinder
Senior Software Engineer
Turnitin – www.turnitin.com
dkin...@turnitin.com


Re: Multiple cassandra instances per physical node

2015-05-21 Thread Jonathan Haddad
You could use Docker, but it's not required. You could use LXC if you
wanted.

Shameless self promo:
http://rustyrazorblade.com/2013/08/advanced-devops-with-vagrant-and-lxc/


On Thu, May 21, 2015 at 1:00 PM James Rothering jrother...@codojo.me
wrote:

 Hmmm ... Not familiar with JBOD. Is that just RAID-0?

 Also ... wrt the container talk, is that a Docker container you're
 talking about?



 On Thu, May 21, 2015 at 12:48 PM, Jonathan Haddad j...@jonhaddad.com
 wrote:

 If you run it in a container with dedicated IPs it'll work just fine.
 Just be sure you aren't using the same machine to replicate its own data.

 On Thu, May 21, 2015 at 12:43 PM Manoj Khangaonkar khangaon...@gmail.com
 wrote:

 +1.

 I agree we need to be able to run multiple server instances on one
 physical machine. This is especially necessary in development and test
 environments where one is experimenting and needs a cluster, but does
 not have access to multiple physical machines.

 If you google, you can find a few blogs that talk about how to do this.

 But it is less than ideal. We need to be able to do it by changing ports
 in cassandra.yaml (the way it is easily done with Hadoop, Apache Kafka,
 Redis, and many other distributed systems).

 regards

 On Thu, May 21, 2015 at 10:32 AM, Dan Kinder dkin...@turnitin.com
 wrote:

 Hi, I'd just like some clarity and advice regarding running multiple
 cassandra instances on a single large machine (big JBOD array, plenty of
 CPU/RAM).

 First, I am aware this was not Cassandra's original design, and doing
 this seems to unreasonably go against the commodity-hardware intentions
 of Cassandra's design. In general it seems to be recommended against (at
 least as far as I've heard from @Rob Coli and others).

 However, maybe the term "commodity" is changing... my hardware/ops team
 argues that due to cooling, power, and other datacenter costs, having
 slightly larger nodes (>=32G RAM, >=24 CPU, >=8 disks JBOD) is actually
 a better price point. Now, I am not a hardware guy, so if this is not
 actually true I'd love to hear why; otherwise I pretty much need to take
 them at their word.

 Now, Cassandra features seem to have improved such that JBOD works
 fairly well, but especially with memory/GC this seems to be reaching its
 limit. One Cassandra instance can only scale up so much.

 So my question is: suppose I take a 12-disk JBOD and run 2 Cassandra
 nodes (each with 5 data disks, 1 commit log disk) and either give each
 its own container & IP or change the listen ports. Will this work? What
 are the risks? Will/should Cassandra support this better in the future?




 --
 http://khangaonkar.blogspot.com/





Re: Multiple cassandra instances per physical node

2015-05-21 Thread Jiří Horký
Hi,
we operate multiple instances (of possibly different versions) of
Cassandra on rather thick nodes. The only problem we have encountered so
far was sharing the same physical data disk among multiple instances - it
proved not to be the best idea. Sharing commitlog disks has caused no
trouble so far. Other than that, it works without any problems. We manage
the instances with a set of helper scripts (which change the env
variables, so nodetool and such operate on the right instance) and puppet
templates.

Jiri Horky
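A made-up miniature of that helper-script approach: a wrapper maps an instance name to its JMX endpoint so "nt node2 status" always hits the right process. The hosts and instance names are invented; 7199 is Cassandra's default JMX port.

```shell
# Hypothetical per-host wrapper around nodetool for multiple instances.
nt() {
  local instance="$1"; shift
  local host
  case "$instance" in
    node1) host=10.0.0.1 ;;
    node2) host=10.0.0.2 ;;
    *) echo "unknown instance: $instance" >&2; return 1 ;;
  esac
  # Echoed here for illustration; drop the echo to really run nodetool.
  echo nodetool -h "$host" -p 7199 "$@"
}
nt node2 status
```

Puppet (or any config management) can then template one such wrapper per host from the instance list, which is essentially what the scripts described above do.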

On Thu, May 21, 2015 at 11:06 PM, Dan Kinder dkin...@turnitin.com wrote:

 @James Rothering yeah, I was thinking of "container" in a broad sense:
 either full virtual machines, Docker containers, straight LXC, or
 whatever else would allow the Cassandra nodes to have their own IPs and
 bind to default ports.

 @Jonathan Haddad thanks for the blog post. To ensure the same host does
 not replicate its own data, would I basically need the nodes on a single
 host to be labeled as one rack? (Assuming I use vnodes.)

 On Thu, May 21, 2015 at 1:02 PM, Sebastian Estevez 
 sebastian.este...@datastax.com wrote:

 JBOD -- just a bunch of disks, no raid.

 All the best,



 On Thu, May 21, 2015 at 4:00 PM, James Rothering jrother...@codojo.me
 wrote:

 Hmmm ... Not familiar with JBOD. Is that just RAID-0?

 Also ... wrt the container talk, is that a Docker container you're
 talking about?



 On Thu, May 21, 2015 at 12:48 PM, Jonathan Haddad j...@jonhaddad.com
 wrote:

 If you run it in a container with dedicated IPs it'll work just fine.
 Just be sure you aren't using the same machine to replicate its own data.

 On Thu, May 21, 2015 at 12:43 PM Manoj Khangaonkar khangaon...@gmail.com
 wrote:

 +1.

 I agree we need to be able to run multiple server instances on one
 physical machine. This is especially necessary in development and test
 environments where one is experimenting and needs a cluster, but does
 not have access to multiple physical machines.

 If you google, you can find a few blogs that talk about how to do this.

 But it is less than ideal. We need to be able to do it by changing ports
 in cassandra.yaml (the way it is easily done with Hadoop, Apache Kafka,
 Redis, and many other distributed systems).

 regards

 On Thu, May 21, 2015 at 10:32 AM, Dan Kinder dkin...@turnitin.com
 wrote:

 Hi, I'd just like some clarity and advice regarding running multiple
 cassandra instances on a single large machine (big JBOD array, plenty of
 CPU/RAM).

 First, I am aware this was not Cassandra's original design, and doing
 this seems to unreasonably go against the commodity-hardware intentions
 of Cassandra's design. In general it seems to be recommended against (at
 least as far as I've heard from @Rob Coli and others).

 However, maybe the term "commodity" is changing... my hardware/ops team
 argues that due to cooling, power, and other datacenter costs, having
 slightly larger nodes (>=32G RAM, >=24 CPU, >=8 disks JBOD) is actually
 a better price point. Now, I am not a hardware guy, so if this is not
 actually true I'd love to hear why; otherwise I pretty much need to take
 them at their word.

 Now, Cassandra features seem to have improved such that JBOD works
 fairly well, but especially with memory/GC this seems to be reaching its
 limit. One Cassandra instance can only scale up so much.

 So my question is: suppose I take a 12-disk JBOD and run 2 Cassandra
 nodes (each with 5 data disks, 1 commit log disk) and either give each
 its own container & IP or change the listen ports. Will this work? What
 are the risks? Will/should Cassandra support this better in the future?




 --
 http://khangaonkar.blogspot.com/






 --
 Dan Kinder
 Senior Software Engineer
 Turnitin – www.turnitin.com
 dkin...@turnitin.com



Re: Multiple cassandra instances per physical node

2015-05-21 Thread James Rothering
Hmmm ... Not familiar with JBOD. Is that just RAID-0?

Also ... wrt the container talk, is that a Docker container you're talking
about?



On Thu, May 21, 2015 at 12:48 PM, Jonathan Haddad j...@jonhaddad.com wrote:

 If you run it in a container with dedicated IPs it'll work just fine.
 Just be sure you aren't using the same machine to replicate its own data.

 On Thu, May 21, 2015 at 12:43 PM Manoj Khangaonkar khangaon...@gmail.com
 wrote:

 +1.

 I agree we need to be able to run multiple server instances on one
 physical machine. This is especially necessary in development and test
 environments where one is experimenting and needs a cluster, but does
 not have access to multiple physical machines.

 If you google, you can find a few blogs that talk about how to do this.

 But it is less than ideal. We need to be able to do it by changing ports
 in cassandra.yaml (the way it is easily done with Hadoop, Apache Kafka,
 Redis, and many other distributed systems).

 regards

 On Thu, May 21, 2015 at 10:32 AM, Dan Kinder dkin...@turnitin.com
 wrote:

 Hi, I'd just like some clarity and advice regarding running multiple
 cassandra instances on a single large machine (big JBOD array, plenty of
 CPU/RAM).

 First, I am aware this was not Cassandra's original design, and doing
 this seems to unreasonably go against the commodity-hardware intentions
 of Cassandra's design. In general it seems to be recommended against (at
 least as far as I've heard from @Rob Coli and others).

 However, maybe the term "commodity" is changing... my hardware/ops team
 argues that due to cooling, power, and other datacenter costs, having
 slightly larger nodes (>=32G RAM, >=24 CPU, >=8 disks JBOD) is actually
 a better price point. Now, I am not a hardware guy, so if this is not
 actually true I'd love to hear why; otherwise I pretty much need to take
 them at their word.

 Now, Cassandra features seem to have improved such that JBOD works
 fairly well, but especially with memory/GC this seems to be reaching its
 limit. One Cassandra instance can only scale up so much.

 So my question is: suppose I take a 12-disk JBOD and run 2 Cassandra
 nodes (each with 5 data disks, 1 commit log disk) and either give each
 its own container & IP or change the listen ports. Will this work? What
 are the risks? Will/should Cassandra support this better in the future?




 --
 http://khangaonkar.blogspot.com/




Re: Performance penalty of multiple UPDATEs of non-pk columns

2015-05-21 Thread Sebastian Estevez
Counters differ significantly between 2.0 and 2.1 (
https://issues.apache.org/jira/browse/CASSANDRA-6405 among others). But in
both scenarios, you will pay more for counter reconciles and compactions
vs. regular updates.

The final counter performance fix will come with CASSANDRA-6506.

For details read Aleksey's post -
http://www.datastax.com/dev/blog/whats-new-in-cassandra-2-1-a-better-implementation-of-counters

All the best,


Sebastián Estévez

Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com

On Thu, May 21, 2015 at 5:48 AM, Jens Rantil jens.ran...@tink.se wrote:

 Artur,

 That's not entirely true. Writes to Cassandra are first written to a
 memtable (in-memory table) which is periodically flushed to disk. If
 multiple writes come in before the flush, then only a single record
 will be written to the disk/sstable. If you have writes that aren't
 coming within the same flush, they will get removed when you are
 compacting, just like you say.

 Unfortunately I can't answer this regarding Counters as I haven't worked
 with them.

 Hope this helped at least.

 Cheers,
 Jens

 On Fri, May 15, 2015 at 11:16 AM, Artur Siekielski a...@vhex.net wrote:

 I've seen some discussions about the topic on the list recently, but I
 would like to get more clear answers.

 Given the table:

 CREATE TABLE t1 (
 f1 text,
 f2 text,
 f3 text,
 PRIMARY KEY(f1, f2)
 );

 and assuming I will execute UPDATE of f3 multiple times (say, 1000) for
 the same key values k1, k2 and different values of 'newval':

 UPDATE t1 SET f3=newval WHERE f1=k1 AND f2=k2;

 How will the performance of selecting the current 'f3' value be affected?

 SELECT f3 FROM t1 WHERE f1=k1 AND f2=k2;

 It looks like all the previous values are preserved until compaction, but
 does executing the SELECT read all the values (O(n), n - number of
 updates) or only the current one (O(1))?


 How does the situation look for counter types?




 --
 Jens Rantil
 Backend engineer
 Tink AB

 Email: jens.ran...@tink.se
 Phone: +46 708 84 18 32
 Web: www.tink.se

 Facebook https://www.facebook.com/#!/tink.se Linkedin
 http://www.linkedin.com/company/2735919 Twitter https://twitter.com/tink



Re: Multiple cassandra instances per physical node

2015-05-21 Thread Jonathan Haddad
Yep, that would be one way to handle it.

On Thu, May 21, 2015 at 2:07 PM Dan Kinder dkin...@turnitin.com wrote:

 @James Rothering yeah, I was thinking of "container" in a broad sense:
 either full virtual machines, Docker containers, straight LXC, or
 whatever else would allow the Cassandra nodes to have their own IPs and
 bind to default ports.

 @Jonathan Haddad thanks for the blog post. To ensure the same host does
 not replicate its own data, would I basically need the nodes on a single
 host to be labeled as one rack? (Assuming I use vnodes.)

 On Thu, May 21, 2015 at 1:02 PM, Sebastian Estevez 
 sebastian.este...@datastax.com wrote:

 JBOD -- just a bunch of disks, no raid.

 All the best,



 On Thu, May 21, 2015 at 4:00 PM, James Rothering jrother...@codojo.me
 wrote:

 Hmmm ... Not familiar with JBOD. Is that just RAID-0?

 Also ... wrt the container talk, is that a Docker container you're
 talking about?



 On Thu, May 21, 2015 at 12:48 PM, Jonathan Haddad j...@jonhaddad.com
 wrote:

 If you run it in a container with dedicated IPs it'll work just fine.
 Just be sure you aren't using the same machine to replicate its own data.

 On Thu, May 21, 2015 at 12:43 PM Manoj Khangaonkar khangaon...@gmail.com
 wrote:

 +1.

 I agree we need to be able to run multiple server instances on one
 physical machine. This is especially necessary in development and test
 environments where one is experimenting and needs a cluster, but does
 not have access to multiple physical machines.

 If you google, you can find a few blogs that talk about how to do this.

 But it is less than ideal. We need to be able to do it by changing ports
 in cassandra.yaml (the way it is easily done with Hadoop, Apache Kafka,
 Redis, and many other distributed systems).

 regards

 On Thu, May 21, 2015 at 10:32 AM, Dan Kinder dkin...@turnitin.com
 wrote:

 Hi, I'd just like some clarity and advice regarding running multiple
 cassandra instances on a single large machine (big JBOD array, plenty of
 CPU/RAM).

 First, I am aware this was not Cassandra's original design, and doing
 this seems to unreasonably go against the commodity-hardware intentions
 of Cassandra's design. In general it seems to be recommended against (at
 least as far as I've heard from @Rob Coli and others).

 However, maybe the term "commodity" is changing... my hardware/ops team
 argues that due to cooling, power, and other datacenter costs, having
 slightly larger nodes (>=32G RAM, >=24 CPU, >=8 disks JBOD) is actually
 a better price point. Now, I am not a hardware guy, so if this is not
 actually true I'd love to hear why; otherwise I pretty much need to take
 them at their word.

 Now, Cassandra features seem to have improved such that JBOD works
 fairly well, but especially with memory/GC this seems to be reaching its
 limit. One Cassandra instance can only scale up so much.

 So my question is: suppose I take a 12-disk JBOD and run 2 Cassandra
 nodes (each with 5 data disks, 1 commit log disk) and either give each
 its own container & IP or change the listen ports. Will this work? What
 are the risks? Will/should Cassandra support this better in the future?




 --
 http://khangaonkar.blogspot.com/






 --
 Dan Kinder
 Senior Software Engineer
 Turnitin – www.turnitin.com
 dkin...@turnitin.com



Re: Performance penalty of multiple UPDATEs of non-pk columns

2015-05-21 Thread Jens Rantil
Artur,

That's not entirely true. Writes to Cassandra are first written to a
memtable (in-memory table) which is periodically flushed to disk. If
multiple writes come in before the flush, then only a single record
will be written to the disk/sstable. If you have writes that aren't coming
within the same flush, they will get removed when you are compacting, just
like you say.
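The reconciliation being described - only the newest cell for a key survives a read or a compaction - can be mimicked with a tiny sort-by-timestamp sketch. This is not Cassandra code, just a model of the rule:

```shell
# Model: several versions ("cells") of the same column may exist until
# compaction, but the read path returns only the cell with the highest
# write timestamp, so repeated overwrites never change which value wins.
cells='k1 1 v-old
k1 5 v-mid
k1 9 v-newest
k1 2 v-older'
winner=$(printf '%s\n' "$cells" | sort -k2,2nr | awk 'NR==1 {print $3}')
echo "$winner"
```

What the original question asks - whether the SELECT is O(n) in updates - comes down to how many SSTables still hold versions of that row; the reconciliation itself always yields the single newest cell.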

Unfortunately I can't answer this regarding Counters as I haven't worked
with them.

Hope this helped at least.

Cheers,
Jens

On Fri, May 15, 2015 at 11:16 AM, Artur Siekielski a...@vhex.net wrote:

 I've seen some discussions about the topic on the list recently, but I
 would like to get more clear answers.

 Given the table:

 CREATE TABLE t1 (
 f1 text,
 f2 text,
 f3 text,
 PRIMARY KEY(f1, f2)
 );

 and assuming I will execute UPDATE of f3 multiple times (say, 1000) for
 the same key values k1, k2 and different values of 'newval':

 UPDATE t1 SET f3=newval WHERE f1=k1 AND f2=k2;

 How will the performance of selecting the current 'f3' value be affected?

 SELECT f3 FROM t1 WHERE f1=k1 AND f2=k2;

 It looks like all the previous values are preserved until compaction, but
 does executing the SELECT read all the values (O(n), n - number of
 updates) or only the current one (O(1))?


 How does the situation look for counter types?




-- 
Jens Rantil
Backend engineer
Tink AB

Email: jens.ran...@tink.se
Phone: +46 708 84 18 32
Web: www.tink.se

Facebook https://www.facebook.com/#!/tink.se Linkedin
http://www.linkedin.com/company/2735919 Twitter https://twitter.com/tink


Re: Drop/Create table with same CF Name

2015-05-21 Thread Mark Reddy
Yes, it's a known issue. For more information on the topic see this support
post from DataStax:

https://support.datastax.com/hc/en-us/articles/204226339-How-to-drop-and-recreate-a-table-in-Cassandra-versions-older-than-2-1

Mark

On 21 May 2015 at 15:31, Ken Hancock ken.hanc...@schange.com wrote:


 We've been running into the reused key cache issue (CASSANDRA-5202) with
 dropping and recreating the same table in Cassandra 1.2.18 so we've been
 testing with key caches disabled which does not seem to solve the issue.
 In the latest logs it seems that old SSTable metadata gets read after the
 tables have been deleted by the previous drop, eventually causing an
 exception and shutting down the Thrift interface.

 At this point is it a known issue that one CANNOT reuse a table name prior
 to Cassandra 2.1 ?



Drop/Create table with same CF Name

2015-05-21 Thread Ken Hancock
We've been running into the reused key cache issue (CASSANDRA-5202) with
dropping and recreating the same table in Cassandra 1.2.18 so we've been
testing with key caches disabled which does not seem to solve the issue.
In the latest logs it seems that old SSTable metadata gets read after the
tables have been deleted by the previous drop, eventually causing an
exception and shutting down the Thrift interface.

At this point is it a known issue that one CANNOT reuse a table name prior
to Cassandra 2.1 ?


Does Cassandra support delayed cross-datacenter replication?

2015-05-21 Thread Eax Melanhovich
Say I would like to have a replica cluster whose state is the state of the
real cluster as of 12 hours ago. Does Cassandra support such a feature?

-- 
Best regards,
Eax Melanhovich
http://eax.me/


Re: Nodetool on 2.1.5

2015-05-21 Thread Jason Wee
Yeah, you can confirm it in the log with a line such as the one below.
WARN  [main] 2015-05-22 11:23:25,584 CassandraDaemon.java:81 - JMX is not
enabled to receive remote connections. Please see cassandra-env.sh for more
info.

We are running C* with IPv6; cqlsh works superbly, but nodetool does not on
a link-local address:
$ nodetool -h fe80::224:1ff:fed7:82ea cfstats system.hints;
nodetool: Failed to connect to 'fe80::224:1ff:fed7:82ea:7199' -
ConnectException: 'Connection refused'.



On Fri, May 22, 2015 at 12:39 AM, Yuki Morishita mor.y...@gmail.com wrote:

 For security reasons, Cassandra changed JMX to listen on localhost only
 since version 2.0.14/2.1.4. From NEWS.txt:

 "The default JMX config now listens to localhost only. You must enable
 the other JMX flags in cassandra-env.sh manually."

 On Thu, May 21, 2015 at 11:05 AM, Walsh, Stephen
 stephen.wa...@aspect.com wrote:
  Just wondering if anyone else is seeing this issue on the nodetool after
  installing 2.1.5
 
 
 
 
 
  This works
 
  nodetool -h 127.0.0.1 cfstats keyspace.table
 
 
 
  This works
 
  nodetool -h localhost cfstats keyspace.table
 
 
 
  This works
 
  nodetool cfstats keyspace.table
 
 
 
  This doesn’t work
 
  nodetool -h 192.168.1.10 cfstats keyspace.table
 
  nodetool: Failed to connect to '192.168.1.10:7199' - ConnectException:
  'Connection refused'.
 
 
 
  Where 192.168.1.10 is the machine IP,
 
  All firewalls are disabled and it worked fine on version 2.0.13
 
 
 
  This has happened on both of our upgraded clusters.
 
  Also no longer able to view the “CF: Total MemTable Size” & “flushes
  pending” in Ops Center 5.1.1, related issue?
 
 
 
  This email (including any attachments) is proprietary to Aspect Software,
  Inc. and may contain information that is confidential. If you have
 received
  this message in error, please do not read, copy or forward this message.
  Please notify the sender immediately, delete it from your system and
 destroy
  any copies. You may not further disclose or distribute this email or its
  attachments.



 --
 Yuki Morishita
  t:yukim (http://twitter.com/yukim)



Re: Multiple cassandra instances per physical node

2015-05-21 Thread Jonathan Haddad
If you run them in containers with dedicated IPs it'll work just fine. Just
be sure you aren't using the same machine to replicate its own data.
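One way to express that last constraint (a sketch under my own assumptions; the dc/rack names are made up): with GossipingPropertyFileSnitch, NetworkTopologyStrategy spreads replicas across racks, so giving both instances on one physical host the same rack name keeps two replicas of the same range off that host.

```
# cassandra-rackdc.properties for BOTH instances on physical host A
# (hypothetical names; adjust dc to your topology)
dc=DC1
rack=host-a
```

Instances on a different physical host would get a different rack value (e.g. rack=host-b), so the snitch treats each host as a single failure domain.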

On Thu, May 21, 2015 at 12:43 PM Manoj Khangaonkar khangaon...@gmail.com
wrote:

 +1.

 I agree we need to be able to run multiple server instances on one
 physical machine. This is especially necessary in development and test
 environments where one is experimenting and needs a cluster but does not
 have access to multiple physical machines.

 If you google, you can find a few blogs that explain how to do this.

 But it is less than ideal. We should be able to do it just by changing
 ports in cassandra.yaml (the way it is done easily with Hadoop, Apache
 Kafka, Redis, and many other distributed systems).


 regards



 On Thu, May 21, 2015 at 10:32 AM, Dan Kinder dkin...@turnitin.com wrote:

 Hi, I'd just like some clarity and advice regarding running multiple
 cassandra instances on a single large machine (big JBOD array, plenty of
 CPU/RAM).

 First, I am aware this was not Cassandra's original design, and doing
 this seems to unreasonably go against the commodity hardware intentions
 of Cassandra's design. In general it seems to be recommended against (at
 least as far as I've heard from @Rob Coli and others).

 However maybe this term "commodity" is changing... my hardware/ops team
 argues that due to cooling, power, and other datacenter costs, having
 slightly larger nodes (≥32G RAM, ≥24 CPU, ≥8 disks JBOD) is actually a
 better price point. Now, I am not a hardware guy, so if this is not
 actually true I'd love to hear why; otherwise I pretty much need to take
 them at their word.

 Now, Cassandra's features seem to have improved such that JBOD works
 fairly well, but especially with memory/GC this seems to be reaching its
 limit. One Cassandra instance can only scale up so much.

 So my question is: suppose I take a 12-disk JBOD and run 2 Cassandra
 nodes (each with 5 data disks, 1 commit log disk) and either give each its
 own container & IP or change the listen ports. Will this work? What are the
 risks? Will/should Cassandra support this better in the future?




 --
 http://khangaonkar.blogspot.com/



Re: Multiple cassandra instances per physical node

2015-05-21 Thread Manoj Khangaonkar
+1.

I agree we need to be able to run multiple server instances on one physical
machine. This is especially necessary in development and test environments
where one is experimenting and needs a cluster but does not have access to
multiple physical machines.

If you google, you can find a few blogs that explain how to do this.

But it is less than ideal. We should be able to do it just by changing ports
in cassandra.yaml (the way it is done easily with Hadoop, Apache Kafka,
Redis, and many other distributed systems).
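For what it's worth, a sketch of the per-instance overrides involved (the key names are standard cassandra.yaml settings, but every address, port, and path below is a made-up example):

```yaml
# Second instance's cassandra.yaml overrides; the first instance keeps
# the defaults. All values are illustrative.
listen_address: 192.168.1.11        # or reuse one address and change ports:
storage_port: 7010                  # default 7000
ssl_storage_port: 7011              # default 7001
native_transport_port: 9043         # default 9042
rpc_port: 9161                      # default 9160
data_file_directories:
    - /data2/cassandra/data
commitlog_directory: /data2/cassandra/commitlog
saved_caches_directory: /data2/cassandra/saved_caches
```

Note the JMX port lives in cassandra-env.sh (JMX_PORT), not cassandra.yaml, and also needs a distinct value per instance.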


regards



On Thu, May 21, 2015 at 10:32 AM, Dan Kinder dkin...@turnitin.com wrote:

 Hi, I'd just like some clarity and advice regarding running multiple
 cassandra instances on a single large machine (big JBOD array, plenty of
 CPU/RAM).

 First, I am aware this was not Cassandra's original design, and doing this
 seems to unreasonably go against the commodity hardware intentions of
 Cassandra's design. In general it seems to be recommended against (at least
 as far as I've heard from @Rob Coli and others).

 However maybe this term "commodity" is changing... my hardware/ops team
 argues that due to cooling, power, and other datacenter costs, having
 slightly larger nodes (≥32G RAM, ≥24 CPU, ≥8 disks JBOD) is actually a
 better price point. Now, I am not a hardware guy, so if this is not
 actually true I'd love to hear why; otherwise I pretty much need to take
 them at their word.

 Now, Cassandra's features seem to have improved such that JBOD works
 fairly well, but especially with memory/GC this seems to be reaching its
 limit. One Cassandra instance can only scale up so much.

 So my question is: suppose I take a 12-disk JBOD and run 2 Cassandra nodes
 (each with 5 data disks, 1 commit log disk) and either give each its own
 container & IP or change the listen ports. Will this work? What are the
 risks? Will/should Cassandra support this better in the future?




-- 
http://khangaonkar.blogspot.com/


Nodetool on 2.1.5

2015-05-21 Thread Walsh, Stephen
Just wondering if anyone else is seeing this issue on the nodetool after 
installing 2.1.5


This works
nodetool -h 127.0.0.1 cfstats keyspace.table

This works
nodetool -h localhost cfstats keyspace.table

This works
nodetool cfstats keyspace.table

This doesn't work
nodetool -h 192.168.1.10 cfstats keyspace.table
nodetool: Failed to connect to '192.168.1.10:7199' - ConnectException: 
'Connection refused'.

Where 192.168.1.10 is the machine IP,
All firewalls are disabled and it worked fine on version 2.0.13

This has happened on both of our upgraded clusters.
Also no longer able to view the "CF: Total MemTable Size" & "flushes pending"
in Ops Center 5.1.1, related issue?



Re: Drop/Create table with same CF Name

2015-05-21 Thread Ken Hancock
Thanks Mark (though that article doesn't appear publicly accessible for
others).

Truncate would have been the tool of choice; however, my understanding is
that truncate fails unless all nodes are up and running, which makes it a
non-workable choice since we can't determine when failures will occur.

Ken
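For readers without access to that article: one workaround commonly suggested for pre-2.1 clusters (my assumption about its content, and the table below is hypothetical) is to never reuse a dropped name, e.g. by versioning it:

```sql
-- Instead of DROP TABLE events; CREATE TABLE events (...);
-- recreate under a fresh name and repoint the application:
CREATE TABLE events_v2 (
    id   uuid PRIMARY KEY,
    data text
);
DROP TABLE events;   -- once nothing references the old name
```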


On Thu, May 21, 2015 at 11:00 AM, Mark Reddy mark.l.re...@gmail.com wrote:

 Yes, it's a known issue. For more information on the topic see this
 support post from DataStax:


 https://support.datastax.com/hc/en-us/articles/204226339-How-to-drop-and-recreate-a-table-in-Cassandra-versions-older-than-2-1

 Mark

 On 21 May 2015 at 15:31, Ken Hancock ken.hanc...@schange.com wrote:


 We've been running into the reused key cache issue (CASSANDRA-5202) with
 dropping and recreating the same table in Cassandra 1.2.18 so we've been
 testing with key caches disabled which does not seem to solve the issue.
 In the latest logs it seems that old SSTables metadata gets read after the
 tables have been deleted by the previous drop, eventually causing an
 exception and the Thrift interface shut down.

 At this point is it a known issue that one CANNOT reuse a table name
 prior to Cassandra 2.1 ?





Re: Nodetool on 2.1.5

2015-05-21 Thread Yuki Morishita
For security reasons, Cassandra changed JMX to listen on localhost only
since version 2.0.14/2.1.4. From NEWS.txt:

"The default JMX config now listens to localhost only. You must enable
the other JMX flags in cassandra-env.sh manually."
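To re-enable remote JMX (so nodetool -h <machine IP> works again), the switch lives in cassandra-env.sh. A sketch, assuming the stock 2.1.5 script; exact lines vary by version, and the password-file path is an example:

```sh
# In cassandra-env.sh:
LOCAL_JMX=no    # default is yes, which binds JMX to localhost only

# With LOCAL_JMX=no the stock script turns JMX authentication on,
# so either point it at a password file ...
JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.password.file=/etc/cassandra/jmxremote.password"
# ... or, on a trusted network only, disable authentication:
# JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.authenticate=false"
```

Restart the node after the change; keep in mind an unauthenticated JMX port exposes nodetool-level control of the cluster.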

On Thu, May 21, 2015 at 11:05 AM, Walsh, Stephen
stephen.wa...@aspect.com wrote:
 Just wondering if anyone else is seeing this issue on the nodetool after
 installing 2.1.5





 This works

 nodetool -h 127.0.0.1 cfstats keyspace.table



 This works

 nodetool -h localhost cfstats keyspace.table



 This works

 nodetool cfstats keyspace.table



 This doesn’t work

 nodetool -h 192.168.1.10 cfstats keyspace.table

  nodetool: Failed to connect to '192.168.1.10:7199' - ConnectException:
 'Connection refused'.



 Where 192.168.1.10 is the machine IP,

 All firewalls are disabled and it worked fine on version 2.0.13



 This has happened on both of our upgraded clusters.

  Also no longer able to view the “CF: Total MemTable Size” & “flushes
  pending” in Ops Center 5.1.1, related issue?






-- 
Yuki Morishita
 t:yukim (http://twitter.com/yukim)


Multiple cassandra instances per physical node

2015-05-21 Thread Dan Kinder
Hi, I'd just like some clarity and advice regarding running multiple
cassandra instances on a single large machine (big JBOD array, plenty of
CPU/RAM).

First, I am aware this was not Cassandra's original design, and doing this
seems to unreasonably go against the commodity hardware intentions of
Cassandra's design. In general it seems to be recommended against (at least
as far as I've heard from @Rob Coli and others).

However maybe this term "commodity" is changing... my hardware/ops team
argues that due to cooling, power, and other datacenter costs, having
slightly larger nodes (≥32G RAM, ≥24 CPU, ≥8 disks JBOD) is actually a
better price point. Now, I am not a hardware guy, so if this is not
actually true I'd love to hear why; otherwise I pretty much need to take
them at their word.

Now, Cassandra's features seem to have improved such that JBOD works fairly
well, but especially with memory/GC this seems to be reaching its limit.
One Cassandra instance can only scale up so much.

So my question is: suppose I take a 12-disk JBOD and run 2 Cassandra nodes
(each with 5 data disks, 1 commit log disk) and either give each its own
container & IP or change the listen ports. Will this work? What are the
risks? Will/should Cassandra support this better in the future?