Re: Pattern to store maps of maps...
Everything is possible with Thrift, provided that you manage everything manually client-side. Having coded an implementation of Achilles (an object mapper) over Hector, it was really painful to manage collections in Thrift. Now, to stick to the topic: if you want to nest collections inside collections, it will be possible in C* 2.1 with user-defined types: http://www.datastax.com/dev/blog/cql-in-2-1

On Sat, Jun 14, 2014 at 7:01 AM, Johan Edstrom seij...@gmail.com wrote:
Well, to throw fire on the debate, that was actually really simple in Thrift.

On Jun 13, 2014, at 10:50 PM, Kevin Burton bur...@spinn3r.com wrote:
I could see just saying screw it and storing a serialized JSON object that gets read back in automatically as a map. That wouldn't be too painful, just not super pretty in terms of representing the data in Cassandra.

On Fri, Jun 13, 2014 at 8:45 PM, Jack Krupansky j...@basetechnology.com wrote:
The first question is how you need to access this data. Do you need to directly access "bar" from a SELECT? Do you need to access "foo" as... what, a Java Map? That said, you can always flatten a map of maps by simply concatenating the keys, such as {"foo_bar": "hello"}, and then you can select 'foo_bar' (a sketch follows at the end of this message). Ditto for additional levels. And if you want each of the intermediate levels, pick a serialization format such as JSON or BSON in addition to the flattened leaf values. Is there anything in your use case(s) this doesn't cover?

-- Jack Krupansky

From: Kevin Burton
Sent: Friday, June 13, 2014 8:17 PM
To: user@cassandra.apache.org
Subject: Pattern to store maps of maps...

So the Cassandra map support in CQL is nice, but it's got me wanting deeper nesting. For example: { foo: { bar: hello } } ... but that's not possible with CQL. Of course, one solution is something like Avro, where you store your entire record as a blob. I guess that's not TOO bad, but it means all my data is somewhat opaque to cqlsh. What are my options here? What are you doing to work around this problem?
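A minimal sketch of the flattened-keys approach, assuming a two-level structure; the table and column names are my own, and note that CQL in C* 2.0 cannot select a single map entry, so the client reads the map back and picks out the leaf key itself:

CREATE TABLE objects (
    id text PRIMARY KEY,
    attrs map<text, text>  -- flattened "outer_inner" keys live here
);

-- store { foo: { bar: "hello" } } under the flattened key 'foo_bar'
UPDATE objects SET attrs['foo_bar'] = 'hello' WHERE id = 'k1';

-- the whole map comes back; look up 'foo_bar' client-side
SELECT attrs FROM objects WHERE id = 'k1';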
incremental backups
Is it OK to delete files from the backups directory (hardlinks) once I have copied them over remotely? Any cautions to take?

Thanks,
Kumar
RE: Backup Cassandra to
Despite storing a replica on the backup node, what guarantees that the backup node has all the data, unless you make consistency a priority over the availability of your cluster? I can think of another approach: design your cluster with a topology such that your workload is split into two virtual datacenters, and keep replicas in both datacenters. During reads you can use LOCAL_QUORUM, and during writes you can use EACH_QUORUM. With this setup you can treat the second virtual datacenter as a backup (a sketch follows at the end of this thread). Sorry, there are no tapes involved here.

Thanks,
Kumar

From: maria.cama...@nsn.com
To: user@cassandra.apache.org
Subject: RE: Backup Cassandra to
Date: Fri, 13 Jun 2014 10:04:49 +0000

Thanks a lot for your responses. Maria.

From: ext Jabbar Azam [mailto:aja...@gmail.com]
Sent: Thursday, June 12, 2014 10:09 PM
To: user@cassandra.apache.org
Cc: Jack Krupansky
Subject: Re: Backup Cassandra to

Yes, I never thought of that.

Thanks
Jabbar Azam

On 12 June 2014 19:45, Jeremy Jongsma jer...@barchart.com wrote:
That will not necessarily scale, and I wouldn't recommend it: your backup node will need as much disk space as an entire replica of the cluster data. For a cluster with a couple of nodes that may be OK; for dozens of nodes, probably not. You also lose the ability to restore individual nodes, since the only way to replace a dead node is with a full repair.

On Thu, Jun 12, 2014 at 1:38 PM, Jabbar Azam aja...@gmail.com wrote:
There is another way. You create a Cassandra node in its own datacentre; any changes going to the main cluster will then be replicated to this node, and you can back up from it. In the event of a disaster, the data on the main cluster is wiped and then replayed from this individual node, and from there replicated back to the main cluster. This will also work when the main cluster increases or decreases in size.

Thanks
Jabbar Azam

On 12 June 2014 18:27, Andrew redmu...@gmail.com wrote:
There isn't a lot of "actual documentation" on the act of backing up, but I did research into it for my own company, and unfortunately you're not going to have a setup similar to Oracle's. There are reasons for this, however. If you have more than one replica of the data, each node in the cluster will likely be holding its own unique set of data, so you would need to back up the ENTIRE set of nodes in order to get an accurate snapshot. Likewise, you would need to restore to a cluster of the same size (and then run refresh to tell Cassandra to reload the tables from disk). Copying the snapshots is easy: it's just a bunch of files in your data directory, and it's even smaller if you use incremental snapshots. I'll admit I'm no expert on tape drives, but I'd imagine it's as easy as copying the snapshots to the drive (or whatever the equivalent tape-drive operation is). What you (and I, admittedly) would really like to see is a way to back up all the logical *data* and then simply replay it. This is possible in Oracle because it's typically restricted to a single instance (plus maybe one or two standbys) that don't "share" any data. What you could do, in theory, is literally select all the data in the entire cluster, dump it to a file, and later re-load it; but depending on the size of your data, this could take hours, days, or even weeks to complete. This is probably not a great solution, but hey, maybe it will work for you.
Netflix (thankfully) has posted a lot of their operational observations, including their utility Priam. In their documentation, they include some overviews of what they use: https://github.com/Netflix/Priam/wiki/Backups

Hope this helps!

Andrew

On June 12, 2014 at 6:18:57 AM, Jack Krupansky (j...@basetechnology.com) wrote:
The doc for backing up, and restoring, Cassandra is here: http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_backup_restore_c.html
That doesn't tell you how to move the "snapshot" to or from tape, but a snapshot is the starting point for backing up Cassandra.

-- Jack Krupansky

From: Camacho, Maria (NSN - FI/Espoo)
Sent: Thursday, June 12, 2014 4:57 AM
To: user@cassandra.apache.org
Subject: Backup Cassandra to

Hi there,

I'm trying to find information/instructions about backing up and restoring a Cassandra DB to and from a tape unit. I was hoping someone in this forum could help me with this, since I could not find anything useful on Google :(

Thanks in advance,
Maria
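A sketch of the two-virtual-datacenter layout Kumar describes above; the keyspace name, datacenter names, table, and replication counts are my own examples, not anything prescribed in the thread:

CREATE KEYSPACE app_data WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'live_dc': 3,     -- serves application traffic
    'backup_dc': 1    -- the "backup" virtual datacenter
};
CREATE TABLE app_data.events (id text PRIMARY KEY);

-- cqlsh sets consistency per session; drivers set it per query
CONSISTENCY EACH_QUORUM;   -- writes must reach a quorum in every DC
INSERT INTO app_data.events (id) VALUES ('e1');
CONSISTENCY LOCAL_QUORUM;  -- reads are served by the local DC only
SELECT * FROM app_data.events WHERE id = 'e1';

And for the tape question itself, the snapshot cycle from the docs Jack and Andrew link reduces to a few commands per node. This is only a sketch: the keyspace name, snapshot tag, and staging path for the tape job are all assumptions:

nodetool snapshot -t nightly app_data
# hardlinks appear under <data_dir>/<keyspace>/<table>/snapshots/nightly/
tar czf /staging/$(hostname)-nightly.tar.gz \
    /var/lib/cassandra/data/app_data/*/snapshots/nightly
# hand the archive to the tape job, then release the hardlinks
nodetool clearsnapshot -t nightly app_data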
Re: Pattern to store maps of maps...
Wow, that's the right answer. Now I'm super excited for C* 2.1 :) Yeah, that would work perfectly; having custom types would solve my problem exactly. Now the issue is whether I wait for the next version or just push through with this one...

On Sat, Jun 14, 2014 at 2:24 AM, DuyHai Doan doanduy...@gmail.com wrote:
Now, to stick to the topic: if you want to nest collections inside collections, it will be possible in C* 2.1 with user-defined types: http://www.datastax.com/dev/blog/cql-in-2-1

--
Founder/CEO Spinn3r.com
Location: San Francisco, CA
Skype: burtonator
blog: http://burtonator.wordpress.com
... or check out my Google+ profile: https://plus.google.com/102718274791889610666/posts

War is peace. Freedom is slavery. Ignorance is strength. Corporations are people.
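For reference, a minimal sketch of what that looks like with the 2.1 syntax from the linked post; the type, table, and column names are my own, and nested collections must be marked frozen:

CREATE TYPE inner_map (
    bar text
);

CREATE TABLE docs (
    id text PRIMARY KEY,
    foo frozen<inner_map>,                     -- user-defined type
    attrs map<text, frozen<map<text, text>>>   -- collection nested in a collection
);

INSERT INTO docs (id, foo) VALUES ('k1', {bar: 'hello'});
UPDATE docs SET attrs['foo'] = {'bar': 'hello'} WHERE id = 'k1';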
Cassandra 2.0.8 MemoryMeter goes crazy
Hi everyone,

this week we upgraded one of our systems from Cassandra 1.2.16 to 2.0.8. All 3 nodes were upgraded, and the SSTables are upgraded. Unfortunately, we are now experiencing that Cassandra starts to hang every 10 hours or so. Every time it hangs, we can see the MemoryMeter being very active, both in tpstats and in the system.log:

INFO [MemoryMeter:1] 2014-06-14 19:24:09,488 Memtable.java (line 481) CFS(Keyspace='MDS', ColumnFamily='ResponsePortal') liveRatio is 64.0 (just-counted was 64.0). calculation took 0ms for 0 cells

This line is logged hundreds of times per second (!) while Cassandra is hanging, and the CPU is 100% busy. Interestingly, this is only logged for this particular column family. This CF is used as a queue and contains only a few entries (data files are about 4 KB, only ~100 keys, usually 1-2 active, 98-99 tombstones).

Table: ResponsePortal
  SSTable count: 1
  Space used (live), bytes: 4863
  Space used (total), bytes: 4863
  SSTable Compression Ratio: 0.9545454545454546
  Number of keys (estimate): 128
  Memtable cell count: 0
  Memtable data size, bytes: 0
  Memtable switch count: 1
  Local read count: 0
  Local read latency: 0.000 ms
  Local write count: 5
  Local write latency: 0.000 ms
  Pending tasks: 0
  Bloom filter false positives: 0
  Bloom filter false ratio: 0.0
  Bloom filter space used, bytes: 176
  Compacted partition minimum bytes: 43
  Compacted partition maximum bytes: 50
  Compacted partition mean bytes: 50
  Average live cells per slice (last five minutes): 0.0
  Average tombstones per slice (last five minutes): 0.0

Table: ResponsePortal
  SSTable count: 1
  Space used (live), bytes: 4765
  Space used (total), bytes: 5777
  SSTable Compression Ratio: 0.75
  Number of keys (estimate): 128
  Memtable cell count: 0
  Memtable data size, bytes: 0
  Memtable switch count: 12
  Local read count: 0
  Local read latency: 0.000 ms
  Local write count: 1096
  Local write latency: 0.000 ms
  Pending tasks: 0
  Bloom filter false positives: 0
  Bloom filter false ratio: 0.0
  Bloom filter space used, bytes: 16
  Compacted partition minimum bytes: 43
  Compacted partition maximum bytes: 50
  Compacted partition mean bytes: 50
  Average live cells per slice (last five minutes): 0.0
  Average tombstones per slice (last five minutes): 0.0

Has anyone ever seen this, or does anyone have an idea what could be wrong? It seems that 2.0 doesn't handle this column family as well as 1.2 did. Any hints on what could be wrong are greatly appreciated :-)

Cheers,
Christian
Re: incremental backups
You should delete the backup files once you have copied them off; otherwise they will start to use disk space as the live SSTables diverge from the snapshots/incrementals.

-psanford

On Sat, Jun 14, 2014 at 10:17 AM, S C as...@outlook.com wrote:
Is it OK to delete files from the backups directory (hardlinks) once I have copied them over remotely? Any cautions to take?

Thanks,
Kumar
RE: incremental backups
I am thinking of running rm on the .db files once the backup is complete. Any special cases to be careful about?

-Kumar

Date: Sat, 14 Jun 2014 13:13:10 -0700
Subject: Re: incremental backups
From: psanf...@retailnext.net
To: user@cassandra.apache.org

You should delete the backup files once you have copied them off; otherwise they will start to use disk space as the live SSTables diverge from the snapshots/incrementals.

-psanford

On Sat, Jun 14, 2014 at 10:17 AM, S C as...@outlook.com wrote:
Is it OK to delete files from the backups directory (hardlinks) once I have copied them over remotely? Any cautions to take?

Thanks,
Kumar
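A sketch of the copy-then-delete loop, assuming the default data directory layout and a reachable rsync target named backuphost (both are assumptions; adjust to your setup):

#!/bin/bash
# copy each table's incremental backups off-node, then remove the
# hardlinks; the live sstables in the parent directory are untouched
for d in /var/lib/cassandra/data/*/*/backups; do
    rsync -aR "$d" backuphost:/offsite/ && rm -f "$d"/*
done
# caveat: files flushed between the rsync and the rm would be lost;
# a stricter job would delete only the exact files it copied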
CQL IN query with 2i index
Hi there,

I was wondering if there is a good reason for SELECT queries on secondary indexes not to support any WHERE operator other than the equality operator, or if it's just a missing feature in CQL.

Thanks,
Tommaso
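To make the restriction concrete, this is roughly what you see in C* 2.0; the table and index names are my own:

CREATE TABLE users (
    id uuid PRIMARY KEY,
    country text
);
CREATE INDEX users_country_idx ON users (country);

SELECT * FROM users WHERE country = 'FI';           -- equality: supported
SELECT * FROM users WHERE country IN ('FI', 'SE');  -- rejected
SELECT * FROM users WHERE country > 'F';            -- rejected as well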