Re: Pattern to store maps of maps...

2014-06-14 Thread DuyHai Doan
Everything is possible with Thrift, provided that you manage everything
manually client side. Having coded an implementation of Achilles (an object
mapper) over Hector, I can say it was really painful to manage collections in Thrift.

 Now to stick to the topic: if you want to nest collections inside
collections, it will be possible in C* 2.1 with user-defined types:
http://www.datastax.com/dev/blog/cql-in-2-1
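
For illustration, here is a rough sketch of what that nesting could look like
in 2.1 CQL, based on the blog post above (the type/table names are made up,
and by the time 2.1 shipped, UDTs and collections nested inside collections
had to be declared frozen):

-- made-up example of Kevin's { foo: { bar: hello } }
CREATE TYPE inner_map (
    bar text
);

CREATE TABLE nested_udt (
    id uuid PRIMARY KEY,
    foo frozen<inner_map>
);

-- or nest a frozen map directly
CREATE TABLE nested_map (
    id uuid PRIMARY KEY,
    data map<text, frozen<map<text, text>>>
);

INSERT INTO nested_map (id, data)
VALUES (62c36092-82a1-3a00-93d1-46196ee77204, { 'foo': { 'bar': 'hello' } });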


On Sat, Jun 14, 2014 at 7:01 AM, Johan Edstrom seij...@gmail.com wrote:

 Well to throw fire on the debate, that was actually really simple in
 Thrift.

 On Jun 13, 2014, at 10:50 PM, Kevin Burton bur...@spinn3r.com wrote:

  I could see just saying screw it and storing a serialized json object
 that gets read back in automatically as a map.  That wouldn't be too
 painful but just not super pretty in terms of representing the data in
 cassandra.
 
 
  On Fri, Jun 13, 2014 at 8:45 PM, Jack Krupansky j...@basetechnology.com
 wrote:
  The first question is how you need to access this data. Do you need to
 directly access “bar” from a SELECT? Do you need to access “foo” as... what
 – Java Map, or what?
 
  That said, you can always flatten a map of maps by simply concatenating
 the keys, such as {“foo_bar”: “hello”} and then you can select ‘foo_bar’.
 Ditto for additional levels. And if you want each of the intermediate
 levels, pick a serialization format such as JSON or BSON in addition to the
 flattened leaf values. Is there anything in your use case(s) that doesn’t cover?
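
 As a made-up sketch of that flattening idea in CQL, one option is to store
 the concatenated key path as a clustering column, so the leaf value can be
 selected directly (the table and column names below are hypothetical):

 -- hypothetical table: flattened key paths as a clustering column
 CREATE TABLE flat_docs (
     id uuid,
     path text,   -- concatenated keys, e.g. 'foo_bar'
     value text,
     PRIMARY KEY (id, path)
 );

 INSERT INTO flat_docs (id, path, value)
 VALUES (62c36092-82a1-3a00-93d1-46196ee77204, 'foo_bar', 'hello');

 SELECT value FROM flat_docs
 WHERE id = 62c36092-82a1-3a00-93d1-46196ee77204 AND path = 'foo_bar';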
 
  -- Jack Krupansky
 
  From: Kevin Burton
  Sent: Friday, June 13, 2014 8:17 PM
  To: user@cassandra.apache.org
  Subject: Pattern to store maps of maps...
 
  So the cassandra map support in CQL is nice but it's got me wanting
 deeper nesting.
 
  For example { foo: { bar: hello } }
 
  … but that's not possible with CQL.
 
  Of course… one solution is something like avro, and then store your
 entire record as a blob.
 
  I guess that's not TOO bad but that means all my data is somewhat opaque
 to cqlsh.
 
  What are my options here?  What are you guys doing to work around this
 problem?
 
  --
 
  Founder/CEO Spinn3r.com
  Location: San Francisco, CA
  Skype: burtonator
  blog: http://burtonator.wordpress.com
  … or check out my Google+ profile
 
  War is peace. Freedom is slavery. Ignorance is strength. Corporations
 are people.
 
 
 
  --
 
  Founder/CEO Spinn3r.com
  Location: San Francisco, CA
  Skype: burtonator
  blog: http://burtonator.wordpress.com
  … or check out my Google+ profile
 
  War is peace. Freedom is slavery. Ignorance is strength. Corporations
 are people.
 




incremental backups

2014-06-14 Thread S C
Is it OK to delete files from the backups directory (hard links) once I have
them copied over remotely? Any cautions to take?

Thanks,
Kumar

RE: Backup Cassandra to

2014-06-14 Thread S C
Despite storing a replica on the backup node, what is the guarantee that the
backup node has all the data? Only if you make consistency a priority over
the availability of your cluster.

I can think of another approach. You can design your cluster with a topology
such that your workload is split into two virtual datacenters and you keep
replicas in both datacenters. During reads you can use LOCAL_QUORUM and
during writes you can use EACH_QUORUM. With this setup you can treat the
second virtual datacenter as a backup. Sorry, there are no tapes involved
here.
Thanks,
Kumar

From: maria.cama...@nsn.com
To: user@cassandra.apache.org
Subject: RE: Backup Cassandra to
Date: Fri, 13 Jun 2014 10:04:49 +

Thanks a lot for your responses.

Maria.

From: ext Jabbar Azam [mailto:aja...@gmail.com]
Sent: Thursday, June 12, 2014 10:09 PM
To: user@cassandra.apache.org
Cc: Jack Krupansky
Subject: Re: Backup Cassandra to

Yes, I never thought of that.

Thanks

Jabbar Azam

On 12 June 2014 19:45, Jeremy Jongsma jer...@barchart.com wrote:

That will not necessarily scale, and I wouldn't recommend it - your backup
node will need as much disk space as an entire replica of the cluster data.
For a cluster with a couple of nodes that may be OK; for dozens of nodes,
probably not. You also lose the ability to restore individual nodes - the
only way to replace a dead node is with a full repair.

On Thu, Jun 12, 2014 at 1:38 PM, Jabbar Azam aja...@gmail.com wrote:

There is another way. You create a Cassandra node in its own datacentre;
any changes going to the main cluster will then be replicated to this node,
and you can back up from this node. In the event of a disaster, the data in
both datacentres is wiped, the backup is replayed onto the individual node,
and the data is then replicated back to the main cluster.

This will also work for the case when the main cluster increases or
decreases in size.

Thanks

Jabbar Azam

On 12 June 2014 18:27, Andrew redmu...@gmail.com wrote:

There isn’t a lot of “actual documentation” on the act of backing up, but I
did research into it for my own company, and unfortunately you’re not going
to have a setup similar to Oracle’s. There are reasons for this, however.

If you have more than one replica of the data, each node in the cluster will
likely be holding its own unique set of data. So you would need to back up
the ENTIRE set of nodes in order to get an accurate snapshot. Likewise, you
would need to restore to a cluster of the same size (and then run refresh to
tell Cassandra to reload the tables from disk).

Copying the snapshots is easy: it’s just a bunch of files in your data
directory. It’s even smaller if you use incremental snapshots. I’ll admit
I’m no expert on tape drives, but I’d imagine it’s as easy as copying the
snapshots to the drive (or whatever the equivalent tape drive operation is).

What you (and I, admittedly) would really like to see is a way to back up
all the logical *data* and then simply replay it. This is possible on Oracle
because it’s typically restricted to a single instance (plus maybe one or
two standbys) that doesn’t “share” data. What you could do, in theory, is
select all the data in the entire cluster, dump it to a file, and simply
re-load it later - but this could take hours, days, or even weeks to
complete, depending on the size of your data. This is probably not a great
solution, but hey, maybe it will work for you.
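
For what it's worth, the closest built-in approximation of that dump-and-replay
idea is cqlsh's COPY command, which is only sensible for small tables (the
keyspace, table and file names below are made up):

-- in cqlsh: dump a table's logical data to CSV, then replay it later
COPY my_keyspace.my_table TO '/backups/my_table.csv';
COPY my_keyspace.my_table FROM '/backups/my_table.csv';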


 


Netflix (thankfully) has posted a lot of their operational observations and
whatnot, including their utility Priam. In their documentation they include
some overviews of what they use: https://github.com/Netflix/Priam/wiki/Backups

Hope this helps!

Andrew

On June 12, 2014 at 6:18:57 AM, Jack Krupansky (j...@basetechnology.com) wrote:

The doc for backing up – and restoring – Cassandra is here:
http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_backup_restore_c.html

That doesn’t tell you how to move the “snapshot” to or from tape, but a
snapshot is the starting point for backing up Cassandra.

-- Jack Krupansky
From: Camacho, Maria (NSN - FI/Espoo)
Sent: Thursday, June 12, 2014 4:57 AM
To: user@cassandra.apache.org
Subject: Backup Cassandra to

Hi there,

I'm trying to find information/instructions about backing up and restoring a
Cassandra DB to and from a tape unit.

I was hoping someone in this forum could help me with this since I could not
find anything useful in Google :(

Thanks in advance,
Maria

Re: Pattern to store maps of maps...

2014-06-14 Thread Kevin Burton
Wow.. that's the right answer.  Now I'm super excited for C* 2.1 :) ..

Yeah.. that would work perfectly. Having custom types would perfectly
solve my problem.

Now the issue is whether I wait for the next version or just push through
with this version…


On Sat, Jun 14, 2014 at 2:24 AM, DuyHai Doan doanduy...@gmail.com wrote:

 Everything is possible with Thrift, provided that you manage everything
 manually client side. Having coded an implementation of Achilles (an object
 mapper) over Hector, I can say it was really painful to manage collections in Thrift.

  Now to stick to the topic: if you want to nest collections inside
 collections, it will be possible in C* 2.1 with user-defined types:
 http://www.datastax.com/dev/blog/cql-in-2-1


 On Sat, Jun 14, 2014 at 7:01 AM, Johan Edstrom seij...@gmail.com wrote:

 Well to throw fire on the debate, that was actually really simple in
 Thrift.

 On Jun 13, 2014, at 10:50 PM, Kevin Burton bur...@spinn3r.com wrote:

  I could see just saying screw it and storing a serialized json object
 that gets read back in automatically as a map.  That wouldn't be too
 painful but just not super pretty in terms of representing the data in
 cassandra.
 
 
  On Fri, Jun 13, 2014 at 8:45 PM, Jack Krupansky 
 j...@basetechnology.com wrote:
  The first question is how you need to access this data. Do you need to
 directly access “bar” from a SELECT? Do you need to access “foo” as... what
 – Java Map, or what?
 
  That said, you can always flatten a map of maps by simply concatenating
 the keys, such as {“foo_bar”: “hello”} and then you can select ‘foo_bar’.
 Ditto for additional levels. And if you want each of the intermediate
 levels, pick a serialization format such as JSON or BSON in addition to the
 flattened leaf values. Is there anything in your use case(s) that doesn’t cover?
 
  -- Jack Krupansky
 
  From: Kevin Burton
  Sent: Friday, June 13, 2014 8:17 PM
  To: user@cassandra.apache.org
  Subject: Pattern to store maps of maps...
 
  So the cassandra map support in CQL is nice but it's got me wanting
 deeper nesting.
 
  For example { foo: { bar: hello } }
 
  … but that's not possible with CQL.
 
  Of course… one solution is something like avro, and then store your
 entire record as a blob.
 
  I guess that's not TOO bad but that means all my data is somewhat
 opaque to cqlsh.
 
  What are my options here?  What are you guys doing to work around this
 problem?
 
  --
 
  Founder/CEO Spinn3r.com
  Location: San Francisco, CA
  Skype: burtonator
  blog: http://burtonator.wordpress.com
  … or check out my Google+ profile
 
  War is peace. Freedom is slavery. Ignorance is strength. Corporations
 are people.
 
 
 
  --
 
  Founder/CEO Spinn3r.com
  Location: San Francisco, CA
  Skype: burtonator
  blog: http://burtonator.wordpress.com
  … or check out my Google+ profile
 
  War is peace. Freedom is slavery. Ignorance is strength. Corporations
 are people.
 





-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
Skype: *burtonator*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
https://plus.google.com/102718274791889610666/posts
http://spinn3r.com
War is peace. Freedom is slavery. Ignorance is strength. Corporations are
people.


Cassandra 2.0.8 MemoryMeter goes crazy

2014-06-14 Thread horschi
Hi everyone,

this week we upgraded one of our systems from Cassandra 1.2.16 to 2.0.8.
All 3 nodes were upgraded, and the SSTables were upgraded as well.

Unfortunately, we are now seeing Cassandra start to hang every 10 hours or
so.

Every time it hangs, we can see the MemoryMeter being very active, both in
tpstats and in the system.log:

 INFO [MemoryMeter:1] 2014-06-14 19:24:09,488 Memtable.java (line 481)
CFS(Keyspace='MDS', ColumnFamily='ResponsePortal') liveRatio is 64.0
(just-counted was 64.0).  calculation took 0ms for 0 cells

This line is logged hundreds of times per second (!) while Cassandra is
down, and the CPU is 100% busy.

Interestingly, this is only logged for this particular column family. This
CF is used as a queue and only contains a few entries (data files are about
4 KB, only ~100 keys, usually 1-2 live and 98-99 tombstones).

Table: ResponsePortal
SSTable count: 1
Space used (live), bytes: 4863
Space used (total), bytes: 4863
SSTable Compression Ratio: 0.9545454545454546
Number of keys (estimate): 128
Memtable cell count: 0
Memtable data size, bytes: 0
Memtable switch count: 1
Local read count: 0
Local read latency: 0.000 ms
Local write count: 5
Local write latency: 0.000 ms
Pending tasks: 0
Bloom filter false positives: 0
Bloom filter false ratio: 0.0
Bloom filter space used, bytes: 176
Compacted partition minimum bytes: 43
Compacted partition maximum bytes: 50
Compacted partition mean bytes: 50
Average live cells per slice (last five minutes): 0.0
Average tombstones per slice (last five minutes): 0.0


Table: ResponsePortal
SSTable count: 1
Space used (live), bytes: 4765
Space used (total), bytes: 5777
SSTable Compression Ratio: 0.75
Number of keys (estimate): 128
Memtable cell count: 0
Memtable data size, bytes: 0
Memtable switch count: 12
Local read count: 0
Local read latency: 0.000 ms
Local write count: 1096
Local write latency: 0.000 ms
Pending tasks: 0
Bloom filter false positives: 0
Bloom filter false ratio: 0.0
Bloom filter space used, bytes: 16
Compacted partition minimum bytes: 43
Compacted partition maximum bytes: 50
Compacted partition mean bytes: 50
Average live cells per slice (last five minutes): 0.0
Average tombstones per slice (last five minutes): 0.0


Has anyone ever seen this or has an idea what could be wrong? It seems that
2.0 cannot handle this column family as well as 1.2 could.

Any hints on what could be wrong are greatly appreciated :-)

Cheers,
Christian


Re: incremental backups

2014-06-14 Thread Peter Sanford
You should delete the backup files once you have copied them off. Otherwise
they will start to use disk space as the live SSTables diverge from the
snapshots/incrementals.

-psanford


On Sat, Jun 14, 2014 at 10:17 AM, S C as...@outlook.com wrote:

 Is it OK to delete files from the backups directory (hard links) once I have
 them copied over remotely? Any cautions to take?

 Thanks,
 Kumar



RE: incremental backups

2014-06-14 Thread S C
I am thinking of running rm on the file.db hard links once the backup copy is
complete. Any special cases to be careful about?

-Kumar
Date: Sat, 14 Jun 2014 13:13:10 -0700
Subject: Re: incremental backups
From: psanf...@retailnext.net
To: user@cassandra.apache.org

You should delete the backup files once you have copied them off. Otherwise 
they will start to use disk space as the live SSTables diverge from the 
snapshots/incrementals.
-psanford


On Sat, Jun 14, 2014 at 10:17 AM, S C as...@outlook.com wrote:




Is it OK to delete files from the backups directory (hard links) once I have
them copied over remotely? Any cautions to take?

Thanks,
Kumar

  

CQL IN query with 2i index

2014-06-14 Thread tommaso barbugli
Hi there,
I was wondering if there is a good reason why SELECT queries on secondary
indexes do not support any WHERE operator other than equality, or if it's
just a missing feature in CQL.
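
For anyone following along, a small made-up example of the behaviour in
question (as of 2.0, only equality is accepted on a secondary-index column;
IN on such a column is rejected):

-- hypothetical table with a secondary index
CREATE TABLE users (
    id uuid PRIMARY KEY,
    country text
);
CREATE INDEX users_country_idx ON users (country);

-- supported: equality on the indexed column
SELECT * FROM users WHERE country = 'IT';

-- not supported: IN on an indexed (non-primary-key) column
-- SELECT * FROM users WHERE country IN ('IT', 'FR');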

Thanks,
Tommaso