I have a simple set of tables that can be grouped as follows:
1. Regular values, no deletes, no overwrites, write heavy, TTLs to manage size (see the sketch after this list)
2. Regular values, no deletes, some overwrites, read heavy (10 to 1), TTLs to manage size
3. Counter values, no deletes, update heavy
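For groups 1 and 2, a minimal sketch of the TTL approach with the python-driver (keyspace, table, and TTL value are illustrative, not our actual schema):

import time
import uuid
from cassandra.cluster import Cluster

cluster = Cluster(['127.0.0.1'])
session = cluster.connect('my_keyspace')  # hypothetical keyspace

# Every insert carries a TTL, so rows expire on their own and the
# table stays bounded in size without any deletes.
insert = session.prepare(
    "INSERT INTO events (id, ts, payload) VALUES (?, ?, ?) USING TTL 86400")
session.execute(insert, (uuid.uuid4(), int(time.time()), 'some payload'))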
I would like to +1 Jan.
We are using C* 2.0 and have just gone into production directly supporting
the latest revision of www.nytimes.com.
I avoid new features unless I really need them; we are prepared to read
code and make fixes ourselves if necessary, but so far it has not been.
Best regards,
What are you getting from 2.0 if you aren't using the new
features? Why not stick with 1.2.x?
cheers,
Bruce
On Thu, Jan 9, 2014 at 12:37 PM, Laing, Michael
michael.la...@nytimes.com wrote:
I would like to +1 Jan.
We are using C* 2.0 and have just gone into production directly
supporting
It's a feature:
In the stock cassandra.yaml file for 2.0.3 see:
# When executing a scan, within or across a partition, we need to keep the
# tombstones seen in memory so we can return them to the coordinator, which
# will use them to make sure other replicas also know about the deleted rows.
2.0.3: system tables have a 1-hour memtable_flush_period, which I have
observed to trigger compaction on the 4-hour mark. Going by memory, though...
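If you want the same behavior on your own tables, I believe the relevant knob is the memtable_flush_period_in_ms table property, e.g. (keyspace/table hypothetical; 3600000 ms is one hour):

from cassandra.cluster import Cluster

cluster = Cluster(['127.0.0.1'])
session = cluster.connect()

# Ask Cassandra to flush this table's memtable every hour.
session.execute(
    "ALTER TABLE my_keyspace.my_table "
    "WITH memtable_flush_period_in_ms = 3600000")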
-ml
On Tue, Dec 10, 2013 at 10:31 AM, Andre Sprenger
andre.spren...@getanet.de wrote:
As far as I know there is nothing hard coded in Cassandra that
You could shard your rows like the following.
You would need over 100 shards, possibly... so testing is in order :)
Michael
-- put this in a file and run using 'cqlsh -f file'
DROP KEYSPACE robert_test;
CREATE KEYSPACE robert_test WITH replication = {
'class': 'SimpleStrategy',
'replication_factor' : 1
};
My experience is that you must upgrade to 2.0.3 ASAP to fix this.
Michael
On Mon, Dec 9, 2013 at 6:39 PM, David Laube d...@stormpath.com wrote:
Hi All,
We are running Cassandra 2.0.2 and have recently stumbled upon an issue
with nodetool repair. Upon running nodetool repair on each of the
We use the python-driver and have contributed some to its development.
I have been careful to not push too fast on features until we need them.
For example, we have just started using prepared statements - working well
BTW.
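A minimal sketch of a prepared statement, plus the async pattern we plan to adopt next (table name hypothetical):

import uuid
from cassandra.cluster import Cluster

cluster = Cluster(['127.0.0.1'])
session = cluster.connect('my_keyspace')  # hypothetical keyspace

# Prepared once (parsed and cached server-side), then reused cheaply.
insert = session.prepare("INSERT INTO messages (id, body) VALUES (?, ?)")

# Synchronous use.
session.execute(insert, (uuid.uuid4(), 'hello'))

# Async use: execute_async returns a future immediately, so many
# requests can be in flight at once.
futures = [session.execute_async(insert, (uuid.uuid4(), 'msg %d' % i))
           for i in range(100)]
for f in futures:
    f.result()  # blocks; raises if that write failed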
Next we will employ futures and start to exploit the async nature of the new driver.
...and thread pooling in python-driver? For now, I would avoid the object
mapper cqlengine, just because of my deadlines.
—
Sent from Mailbox https://www.dropbox.com/mailbox for iPhone
On Tue, Nov 26, 2013 at 1:52 PM, Laing, Michael michael.la...@nytimes.com
wrote:
We use the python-driver
That's not a problem we have faced yet.
On Tue, Nov 26, 2013 at 2:46 PM, Kumar Ranjan winnerd...@gmail.com wrote:
How do you insert huge amounts of data?
—
Sent from Mailbox https://www.dropbox.com/mailbox for iPhone
On Tue, Nov 26, 2013 at 2:31 PM, Laing, Michael michael.la...@nytimes.com
This approach is similar to Janne's.
But I used a shard as an example to make rows more even, and just converted
each IP to an int, as sketched below.
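For the IP-to-int part, a sketch using the Python 3 standard library (the shard count is arbitrary):

import ipaddress

def ip_shard(ip, nshards=100):
    # Convert the IP to an int, then take it modulo the shard count.
    n = int(ipaddress.ip_address(ip))
    return '%02d' % (n % nshards)

print(ip_shard('192.168.1.10'))  # '86'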
-- put this in a file and run using 'cqlsh -f file'
DROP KEYSPACE jacob_test;
CREATE KEYSPACE jacob_test WITH replication = {
'class': 'SimpleStrategy',
'replication_factor' : 1
};
You could try this:
CREATE TABLE user_activity (shard text, user text, ts timeuuid, primary key
(shard, ts));
select user, ts from user_activity where shard in ('00', '01', ...) order
by ts desc;
Grab each user and ts the first time you see that user, as in the sketch below.
Use as many shards as you think you need.
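A sketch of that client-side pass with the python-driver (shard count and keyspace are illustrative):

from cassandra.cluster import Cluster

cluster = Cluster(['127.0.0.1'])
session = cluster.connect('my_keyspace')  # hypothetical keyspace

shards = ["'%02d'" % i for i in range(4)]  # use as many as you need
rows = session.execute(
    "SELECT user, ts FROM user_activity "
    "WHERE shard IN (%s) ORDER BY ts DESC" % ','.join(shards))

latest = {}
for row in rows:
    # Rows arrive in descending timeuuid order, so the first time we
    # see a user is that user's most recent activity.
    if row.user not in latest:
        latest[row.user] = row.ts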
try this:
CREATE COLUMNFAMILY post (
    KEY uuid,
    author uuid,
    blog timeuuid, -- sortable
    name text,
    data text,
    PRIMARY KEY (KEY, blog)
);
CREATE INDEX ON post (author);
SELECT * FROM post
WHERE
blog = 4d6b5fc5-487b-11e3-a6f4-406c8f1838fa
AND blog =
Here are a couple of ideas:
1. You can rotate tables and truncate to avoid deleting (sketched below).
2. You can shard your tables (partition key) to mitigate hotspots.
3. You can use a column key to store rows in timeuuid sequence.
create table recent_updates_00 (shard text, uuid timeuuid, message text,
primary key (shard, uuid));
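A sketch of idea 1 with the python-driver (two rotating tables, names hypothetical):

from datetime import datetime
from cassandra.cluster import Cluster

cluster = Cluster(['127.0.0.1'])
session = cluster.connect('my_keyspace')  # hypothetical keyspace

# Alternate between two tables day by day; truncating the idle one
# drops old data in bulk, with no per-row deletes and no tombstones.
day = datetime.utcnow().toordinal()
active = 'recent_updates_%02d' % (day % 2)
idle = 'recent_updates_%02d' % ((day + 1) % 2)
session.execute('TRUNCATE %s' % idle)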
http://www.datastax.com/documentation/cql/3.1/webhelp/index.html#cql/cql_reference/select_r.html
On Wed, Oct 23, 2013 at 6:50 AM, Alex N lot...@gmail.com wrote:
Thanks!
I can't find it in the documentation...
2013/10/23 Cyril Scetbon cyril.scet...@free.fr
Hi,
Now you can ask for the
Check your security groups to be sure you have appropriate access.
If in a VPC, check both inbound and outbound rules; if using network ACLs,
check those as well.
On Wed, Sep 25, 2013 at 3:41 PM, Skye Book skye.b...@gmail.com wrote:
Hi all,
I have a three node cluster using the EC2 Multi-Region Snitch currently
operating
script above:
select_stmt = "select * from time_series where userid = 'XYZ'"
This would return many hundreds of thousands of columns. I need to go through
them in time-series order using ranges [pagination queries].
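One way to page in time-series order with the python-driver: resume each query after the last clustering value seen. A sketch (column names 'ts' and 'value' are assumptions):

from cassandra.cluster import Cluster

cluster = Cluster(['127.0.0.1'])
session = cluster.connect('my_keyspace')  # hypothetical keyspace

page_size = 1000
last_ts = None
while True:
    if last_ts is None:
        cql = ("SELECT ts, value FROM time_series "
               "WHERE userid = 'XYZ' LIMIT %d" % page_size)
    else:
        cql = ("SELECT ts, value FROM time_series "
               "WHERE userid = 'XYZ' AND ts > %s LIMIT %d"
               % (last_ts, page_size))
    rows = list(session.execute(cql))
    if not rows:
        break
    for row in rows:
        pass  # process each row here
    last_ts = rows[-1].ts  # resume after the last column seen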
On Wed, Sep 11, 2013 at 7:06 AM, Laing, Michael michael.la...@nytimes.com
wrote:
On Wed, Sep 11, 2013 at 6:13 AM, Laing, Michael
michael.la...@nytimes.com wrote:
Then you can do this. I handle millions of entries this way and it works
well if you are mostly interested in recent activity.
If you need to span all activity, then you can use a separate table
A way to do this would be to express the JSON structure as (path, value)
tuples and then use a map<json, json> to store them.
For example, your JSON above can be expressed as shown below, where the path
is a list of keys/indices and the value is a scalar.
You could also concatenate the path
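To make the (path, value) idea concrete, a small plain-Python sketch (the function name is mine):

def flatten(obj, path=()):
    # Yield (path, scalar) tuples from nested JSON-like data.
    if isinstance(obj, dict):
        for k, v in obj.items():
            for item in flatten(v, path + (k,)):
                yield item
    elif isinstance(obj, list):
        for i, v in enumerate(obj):
            for item in flatten(v, path + (i,)):
                yield item
    else:
        yield (path, obj)

# e.g. {'a': {'b': [1, 2]}} -> (('a', 'b', 0), 1), (('a', 'b', 1), 2)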
You could try this. C* doesn't do it all for you, but it will efficiently
get you the right data.
-ml
-- put this in a file and run using 'cqlsh -f file'
DROP KEYSPACE latest;
CREATE KEYSPACE latest WITH replication = {
'class': 'SimpleStrategy',
'replication_factor' : 1
};
USE latest;
# (u'XYZ', u'1000', u'203', u'Col-Name-4')
# (u'XYZ', u'1001', u'201', u'Col-Name-2')
On Tue, Sep 10, 2013 at 6:32 PM, Laing, Michael
michael.la...@nytimes.com wrote:
You could try this. C* doesn't do it all for you, but it will efficiently
get you the right data.
-ml
-- put this in a file and run using
I have seen something similar.
Of course, correlation is not causation...
Like you, I was doing testing with heavy writes.
I was using a Python client to drive the writes, using the cql module, which
is Thrift-based.
The correlation I eventually tracked down was that whichever node my python
client(s)
Please let me know if the second approach is fine.
Regards,
Dawood
On Wed, Sep 4, 2013 at 2:47 AM, Laing, Michael
michael.la...@nytimes.com wrote:
I use the technique described in my previous message to handle millions
of messages and their versions.
Actually, I use timeuuids instead
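The shape I mean, as a sketch (names hypothetical, not our production schema):

import uuid
from cassandra.cluster import Cluster

cluster = Cluster(['127.0.0.1'])
session = cluster.connect('my_keyspace')  # hypothetical keyspace

# Versions cluster under the message id; the timeuuid both orders
# them and records when each version was written.
session.execute("""CREATE TABLE IF NOT EXISTS message_versions (
    id text, version timeuuid, body text,
    PRIMARY KEY (id, version))
    WITH CLUSTERING ORDER BY (version DESC)""")
session.execute(
    "INSERT INTO message_versions (id, version, body) VALUES (%s, %s, %s)",
    ('msg-1', uuid.uuid1(), 'first draft'))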
You could try this. -ml
-- put this in a file and run using 'cqlsh -f file'
DROP KEYSPACE carl_test;
CREATE KEYSPACE carl_test WITH replication = {
'class': 'SimpleStrategy',
'replication_factor' : 1
};
USE carl_test;
CREATE TABLE carl_table (
app text,
name text,
ts int,
Try the following. -ml
-- put this in a file and run using 'cqlsh -f file'
DROP KEYSPACE latest;
CREATE KEYSPACE latest WITH replication = {
'class': 'SimpleStrategy',
'replication_factor' : 1
};
USE latest;
CREATE TABLE file (
parentid text, -- row_key, same for each version
id
to 9 replicas of each version.
We journal them all and use them for reporting latencies in our processing
pipelines, as well as for replay when we need to recover application state.
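The core arithmetic for the latency reporting, as a sketch (not our actual pipeline code): a timeuuid encodes its creation time, so latency falls out of comparing it with now.

import time
import uuid

# uuid1 timestamps count 100 ns ticks since 1582-10-15; this constant
# is the tick offset of the Unix epoch.
UUID_EPOCH_OFFSET = 0x01b21dd213814000

def timeuuid_to_unix(u):
    return (u.time - UUID_EPOCH_OFFSET) / 1e7

v = uuid.uuid1()  # stands in for a timeuuid read back from C*
latency = time.time() - timeuuid_to_unix(v)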
Regards,
Michael
On Tue, Sep 3, 2013 at 3:15 PM, Laing, Michael michael.la...@nytimes.com wrote:
try the following