You could just expand the size of your EBS volume and extend the file
system. No data is lost - assuming you are running Linux.
On Monday, October 17, 2016, Seth Edwards wrote:
> We're running 2.0.16. We're migrating to a new data model but we've had an
> unexpected increase in write traffic tha
You could also follow this related issue:
https://issues.apache.org/jira/browse/CASSANDRA-8844
On Wed, May 25, 2016 at 12:04 PM, Aaditya Vadnere wrote:
> Thanks Eric and Mark, we were thinking along similar lines. But we already
> need Cassandra for regular database purpose, so instead of having
> SELECT id, workflow FROM sam WHERE dept='blah';
>
> And in Spark with Python:
> SELECT distinct id, dept, workflow FROM samd WHERE dept='blah';
>
>
> Best,
> Rajesh R
>
>
> --
> *From:* Laing, Michael [michael.la...@n
Try converting that int from decimal to hex and inserting dashes in the
appropriate spots - or go the other way.
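A minimal sketch of that conversion using Python's standard `uuid` module, assuming the integer in question is the 128-bit value of a UUID. The function names and the sample value are illustrative and do not come from the thread.

```python
import uuid


def int_to_uuid(value: int) -> uuid.UUID:
    """Render a 128-bit integer as hex with dashes in the 8-4-4-4-12 layout."""
    return uuid.UUID(int=value)


def uuid_to_int(text: str) -> int:
    """Go the other way: parse the dashed hex form back into an integer."""
    return uuid.UUID(text).int


if __name__ == "__main__":
    sample = 0x0123456789ABCDEF0123456789ABCDEF  # arbitrary illustrative value
    print(sample, "->", int_to_uuid(sample))
```

Comparing both columns in the same representation makes it easier to see whether two rows really hold the same identifier.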
Also, you are looking at different rows, based upon your selection
criteria...
ml
On Tue, May 24, 2016 at 6:23 AM, Rajesh Radhakrishnan <
rajesh.radhakrish...@phe.gov.uk> wrote:
> Hi
You could take a look at, or follow:
https://issues.apache.org/jira/browse/CASSANDRA-8844
On Sun, Apr 24, 2016 at 10:51 AM, Alexander Orr wrote:
> Hi,
>
> I'm wondering if someone could help me, I'd like to use cassandra to store
> data and publish this on downstream to another database (kdb if a
fyi the list of reserved keywords is at:
https://cassandra.apache.org/doc/cql3/CQL.html#appendixA
ml
On Wed, Mar 30, 2016 at 9:41 AM, Jean Carlo
wrote:
> Yes we did some reads and writes, the problem is that adding double quotes
> force us to modify our code to change and insert like that
>
>
Note that in C* 3.02 the second query is invalid:
cqlsh> Select * from communication.user_contact_list where user_id =
98f50f00-b6d5-11e5-afec-6003089bf572 and is_favorite = true order
by contact_name asc;
InvalidRequest: code=2200 [Invalid query] message="PRIMARY KEY column
"is_favorite" cannot
To add to what Jonathan and Jack have said...
To get high levels of performance with the python driver you should:
- prepare your statements once (recent drivers default to Token Aware -
and will correctly apply it if the statement is prepared).
- execute asynchronously (up to ~150 futur
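The message is cut off here, but the idea of keeping a bounded number of requests in flight can be sketched with the standard library alone. This is a stand-in, not driver code: `run_bounded` and `submit` are hypothetical names, and the futures are `concurrent.futures.Future` objects. With the python driver one would presumably pass something built on `session.execute_async` and register the release through the driver's own callback mechanism instead.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

MAX_IN_FLIGHT = 150  # the "~150" figure mentioned in the message above


def run_bounded(submit, items, max_in_flight=MAX_IN_FLIGHT):
    """Submit every item while never having more than max_in_flight unfinished futures.

    `submit` is any callable that takes one item and returns a
    concurrent.futures.Future (an assumption made for this sketch).
    """
    slots = threading.BoundedSemaphore(max_in_flight)
    futures = []
    for item in items:
        slots.acquire()  # blocks while max_in_flight requests are still pending
        try:
            future = submit(item)
        except Exception:
            slots.release()
            raise
        # Free the slot as soon as the request completes, successfully or not.
        future.add_done_callback(lambda _f: slots.release())
        futures.append(future)
    return [f.result() for f in futures]  # results in submission order


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=8) as pool:
        print(run_bounded(lambda n: pool.submit(pow, n, 2), range(10), max_in_flight=4))
```

The semaphore is the only throttle: the producer loop stalls when the limit is reached and resumes as completions release slots.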
why don't you just try it?
On Tue, Dec 15, 2015 at 6:30 PM, Will Zhang
wrote:
> Hi all,
>
> I originally raised this on SO, but not really getting any answer there,
> thought I give it a try here.
>
>
> Just thinking about this so please correct my understanding if any of this
> isn't right.
>
>
You don't have any syntax in your application anywhere such as:
UPDATE data SET field5 = field5 + [ 1,2,3 ] WHERE field1=...;
Just a quick idempotency check :)
On Wed, Nov 25, 2015 at 9:16 AM, Jack Krupansky
wrote:
> Is the data corrupted exactly the same way on all three nodes and in both
> d
t, Nov 21, 2015 at 8:52 AM, Laing, Michael
wrote:
> All these pain we need to take because the column names have special
>>> character like " ' _- ( ) '' ¬ " etc.
>>>
>>
> Hmm. I tried:
>
> cqlsh:test> create table quoted_
>
> All these pain we need to take because the column names have special
>> character like " ' _- ( ) '' ¬ " etc.
>>
>
Hmm. I tried:
cqlsh:test> create table quoted_col_name ( pk int primary key, "'_-()""¬"
int);
cqlsh:test> select * from quoted_col_name;
 pk | '_-()"¬
+-
(0 row
So you are reading the row before writing as you say you have the timestamp.
If you really need CAS for the write *and* the timestamp you read is in the
future (by local reckoning), why not delay that write until the future
arrives and forget about explicitly setting the timestamp?
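A rough sketch of that "wait until the future arrives" idea, assuming the timestamp that was read is in microseconds since the Unix epoch (the usual unit for Cassandra write timestamps). The helper names are hypothetical.

```python
import time
from typing import Optional


def seconds_until(target_micros: int, now_micros: Optional[int] = None) -> float:
    """Seconds to wait until the local clock reaches target_micros (0.0 if already past)."""
    if now_micros is None:
        now_micros = time.time_ns() // 1_000
    return max(0.0, (target_micros - now_micros) / 1_000_000)


def wait_for(target_micros: int) -> None:
    """Block until the local clock has passed the timestamp that was read."""
    delay = seconds_until(target_micros)
    if delay > 0:
        time.sleep(delay)
```

After `wait_for(read_timestamp)` returns, the write can go out without an explicit timestamp, since the server-assigned one should then be later than the value read, clock skew permitting.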
Backtracking o
http://www.tutorialspoint.com/java/util/uuid_timestamp.htm
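The linked page covers Java's `UUID.timestamp()`, which reports 100-nanosecond intervals counted from 1582-10-15 rather than from the Unix epoch. The same arithmetic is sketched below in Python for illustration; the constant is the standard offset between the two epochs, and the function name is made up for this example.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Number of 100-nanosecond intervals between the UUID epoch (1582-10-15)
# and the Unix epoch (1970-01-01).
UUID_EPOCH_OFFSET = 0x01B21DD213814000

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def timeuuid_to_datetime(value: uuid.UUID) -> datetime:
    """Return the UTC time embedded in a version 1 (time-based) UUID."""
    if value.version != 1:
        raise ValueError("only version 1 UUIDs carry a timestamp")
    ticks_since_unix_epoch = value.time - UUID_EPOCH_OFFSET  # 100 ns units
    return UNIX_EPOCH + timedelta(microseconds=ticks_since_unix_epoch // 10)
```

In Java the equivalent would be to subtract the same offset from `uuid.timestamp()` and divide by 10,000 to get milliseconds.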
On Mon, Nov 16, 2015 at 7:38 AM, Marlon Patrick
wrote:
> Hi Donfeng,
>
> I'm interested in converting an already generated timeuuid to a timestamp,
> similar to Cassandra's dateOf function, but in Java code. Your
> suggestion is for
Dynamic schema changes are generally a bad idea, especially if they are
rapid.
You should rethink your approach.
On Fri, Nov 13, 2015 at 7:20 AM, Rajesh Radhakrishnan <
rajesh.radhakrish...@phe.gov.uk> wrote:
>
> Thank you Carlos for looking.
> But when I ran the nodetool describecluster.
> It
Are the clocks synchronized across the cluster - probably, but I thought I
would ask :)
On Wed, Oct 21, 2015 at 3:35 AM, Brice Figureau <
brice+cassan...@daysofwonder.com> wrote:
> Hi,
>
> On 20/10/2015 19:48, Carlos Alonso wrote:
> > I think also having the output of cfhistograms could help. I'd
Remember that the system keyspace uses LocalStrategy: each node has its own
set of system tables. -ml
On Wed, Oct 14, 2015 at 9:17 AM, Tom van den Berge <
tom.vandenbe...@gmail.com> wrote:
> Hi Carlos,
>
> I'm using 2.1.6. The mysterious node is not in the peers table. Any other
> ideas?
> One of
What client are you using?
Official java and python clients should not have a LB between them and the
C* nodes AFAIK.
Why aren't you using 2.1.9?
Have you checked for schema agreement amongst all nodes?
ml
On Wed, Sep 30, 2015 at 11:22 AM, Walsh, Stephen
wrote:
> More information,
>
>
>
> I’
Maybe compaction not keeping up - since you are hitting so many sstables?
Read heavy... are you using LCS?
Plenty of resources... tune to increase memtable size?
On Sat, Sep 26, 2015 at 9:19 AM, Eric Stevens wrote:
> Since you have most of your reads hitting 5-8 SSTables, it's probably
> relat
Wiser heads may have to chime in then :)
On Wed, Sep 9, 2015 at 3:07 PM, Eric Plowe wrote:
> So I set speculative_retry to NONE and I encountered the situation about
> 30 minutes ago.
>
>
>
> On Wednesday, September 9, 2015, Laing, Michael
> wrote:
>
>>
.
>
> On Wednesday, September 9, 2015, Laing, Michael
> wrote:
>
>> "alter table test.test_root WITH speculative_retry = '0.0PERCENTILE';"
>>
>> seemed to work for me with C* version 2.1.7
>>
>> On Wed, Sep 9, 2015 at 10:11 AM, Eric Plowe w
report back my findings.
>>
>> Thank you, Michael.
>>
>>
>> On Wednesday, September 9, 2015, Laing, Michael <
>> michael.la...@nytimes.com> wrote:
>>
>>> Perhaps a variation on
>>> https://issues.apache.org/jira/
ptember 9, 2015, Laing, Michael
> wrote:
>
>> What are your read repair settings?
>>
>> On Tue, Sep 8, 2015 at 9:28 PM, Eric Plowe wrote:
>>
>>> To further expand. We have two data centers, Miami and Dallas. Dallas is
>>> our disaster recovery data c
What are your read repair settings?
On Tue, Sep 8, 2015 at 9:28 PM, Eric Plowe wrote:
> To further expand. We have two data centers, Miami and Dallas. Dallas is
> our disaster recovery data center. The cluster has 12 nodes, 6 in Miami and
> 6 in Dallas. The servers in Miami only read/write to Mi
Denormalize your data to support the query, e.g.:
CREATE TABLE name_by_cust_id (cust_id int, name text, PRIMARY KEY (cust_id));
SELECT name FROM name_by_cust_id WHERE cust_id = 3;
For additional queries, similarly denormalize.
Refer to https://academy.datastax.com/courses for free online courses
covering this t
: Sun, 6 Sep 2015 13:10:14 +0100
>> Subject: Re: Is Cassandra really Strong consistency?
>> From: ibrahimsaba...@gmail.com
>> To: user@cassandra.apache.org
>>
>>
>> Do you mean Cassandra does synchronize the clock across all the cluster,
>> if yes how it doe
I think I saw this before.
Clocks must be synchronized.
On Sun, Sep 6, 2015 at 7:28 AM, ibrahim El-sanosi
wrote:
> Hi folks,
>
> Assume we have 4-nodes cluster N1, N2, N3, and N4 and replication factor
> is 3. When write CL =ALL and read CL=ONE:
>
> Client c1 sends W1 = [k1,V1] to N1 (a coordi
2 is more correct.
On Fri, Aug 21, 2015 at 11:48 AM, ibrahim El-sanosi <
ibrahimsaba...@gmail.com> wrote:
> Dear folks,
>
>
> I have doubt on how Cassandra performs a write request; I have two
> scenarios, please read them and ensure which one is correct?
>
>
> Assume we have cluster consists of
https://academy.datastax.com/courses/ds201-cassandra-core-concepts/internal-architecture-replication
On Fri, Aug 21, 2015 at 11:53 AM, Laing, Michael
wrote:
> 2 is more correct.
>
> On Fri, Aug 21, 2015 at 11:48 AM, ibrahim El-sanosi <
> ibrahimsaba...@gmail.com> wrot
Possibly you have snapshots? If so, use nodetool to clear them.
On Wed, Aug 19, 2015 at 4:54 PM, Analia Lorenzatto <
analialorenza...@gmail.com> wrote:
> Hello guys,
>
> I have a cassandra cluster 2.1 comprised of 4 nodes.
>
> I removed a lot of data in a Column Family, then I ran manually a
> co
No - it immediately removes the sstables on all nodes.
On Mon, Apr 27, 2015 at 7:53 AM, Ali Akhtar wrote:
> Wouldn't truncating the table create tombstones?
>
> On Mon, Apr 27, 2015 at 11:55 AM, Peer, Oded wrote:
>
>> I recommend truncating the table instead of dropping it since you don’t
>> n
Executor.java:1142)
>
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>
> at java.lang.Thread.run(Thread.java:745)
>
> Caused by: java.lang.RuntimeException: java.io.FileNotFoundException:
> /var/lib/cassandra/d
n the situation, for what I
> read we need to start doing this also.
>
>
>
> https://wiki.apache.org/cassandra/Operations#Frequency_of_nodetool_repair
>
>
>
>
>
> *From:* Laing, Michael [mailto:michael.la...@nytimes.com]
> *Sent:* 21 April 2015 16:26
> *To:
If you never delete except by ttl, and always write with the same ttl (or
monotonically increasing), you can set gc_grace_seconds to 0.
That's what we do. There have been discussions on the list over the last
few years re this topic.
ml
On Tue, Apr 21, 2015 at 11:14 AM, Walsh, Stephen
wrote:
>
rtfm - truncate creates snapshots by default; they must be cleared on all
nodes to recover *disk space* as requested by the OP.
On Thu, Apr 9, 2015 at 10:17 AM, Anuj Wadehra
wrote:
> You can try doing it from cassandra cli. Set consistency level to All and
> then truncate.
>
> Anuj Wadehra
>
> Se
nodetool clearsnapshot
On Thursday, April 9, 2015, Eduardo Cusa
wrote:
> Hi Guys, I truncated a column family that has a size of 31 gb, and the
> disk space was not released
>
> what else do i have to do?
>
> Regards
> Eduardo
>
>
We use Alain's solution as well to make major operational revisions.
We have a "red team" and a "blue team" in each AWS region, so we just add
and drop datacenters to get where we want to be.
Pretty simple.
ml
On Tue, Mar 31, 2015 at 8:16 AM, Alain RODRIGUEZ wrote:
> People keep asking me if w
Actually I am in the middle of setting up the same sort of thing for
PostgreSQL using psycopg2 and pyev.
I'll be using Cassandra and PostgreSQL in an IoT experiment as the backend
for swarms of MQTT brokers at something in the 10-100M client range.
ml
On Fri, Mar 27, 2015 at 4:59 PM,
I use callback chaining with the python driver and can confirm that it is
very fast.
You can "chain the chains" together to perform sequential processing. I do
this when retrieving "metadata" and then the referenced "payload" for
example, when the metadata has been inverted and the payload is larg
Perhaps you should learn more about Cassandra before you ask such questions.
It's easy if you just look at the readily accessible docs.
ml
On Sat, Feb 14, 2015 at 6:05 PM, Raj N wrote:
> I don't think that solves my problem. The question really is why can't we
> use ranges for both time colum
Use token-awareness so you don't have as much coordinator overhead.
ml
On Mon, Feb 9, 2015 at 5:32 AM, Marcelo Valle (BLOOMBERG/ LONDON) <
mvallemil...@bloomberg.net> wrote:
> AFAIK, if you were using RF 3 in a 3 node cluster, all your nodes had
> all your data.
> When the number of nodes sta
Since our workload is spread globally, we spread our nodes across AWS
regions as well: 2 nodes per zone, 6 nodes per region (datacenter) (RF 3),
12 nodes total (except during upgrade migrations). We autodeploy into VPCs.
If a region goes "bad" we can route all traffic to another and bring up a
thir
http://datastax.github.io/python-driver/api/cassandra.html
On Wed, Dec 17, 2014 at 9:27 AM, nitin padalia
wrote:
>
> Thanks! Philip/Ryan,
> Ryan I am using single Datacenter.
> Philip could you point some link where we could see those enums.
> -Nitin
> On Dec 17, 2014 7:14 PM, "Philip Thompson"
On a mac this works (different sed, use an actual newline):
"
nodetool info -T | grep ^Token | awk '{ print $3 }' | tr \\n , | sed -e 's/,$/\
/'
"
Otherwise the last token will have an 'n' appended which you may not notice.
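For comparison, the same extraction can be sketched in Python, which sidesteps the sed portability issue. It mirrors the pipeline: keep lines starting with `Token`, take the third whitespace-separated field, and join with commas. The sample text in the usage part is a made-up stand-in for `nodetool info -T` output, not captured from a real node.

```python
def join_tokens(nodetool_output: str) -> str:
    """Join the third field of every line that starts with 'Token', comma-separated."""
    tokens = []
    for line in nodetool_output.splitlines():
        if line.startswith("Token"):
            fields = line.split()
            if len(fields) >= 3:
                tokens.append(fields[2])
    return ",".join(tokens)  # no trailing comma, so nothing to strip afterwards


if __name__ == "__main__":
    sample = (
        "ID      : 00000000-0000-0000-0000-000000000000\n"  # hypothetical output
        "Token   : -9100000000000000001\n"
        "Token   : 42\n"
    )
    print(join_tokens(sample))
```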
On Fri, Dec 5, 2014 at 4:34 PM, Robert Coli wrote:
> On Wed, Dec 3, 2
able as the OP suggested.
>
> On Mon Dec 01 2014 at 7:18:51 AM Laing, Michael
> wrote:
>
>> Since the session tokens are random, perhaps computing a shard from each
>> one and using it as the partition key would be a good idea.
>>
>> I would also use uuid v1 to ge
Since the session tokens are random, perhaps computing a shard from each
one and using it as the partition key would be a good idea.
I would also use uuid v1 to get ordering.
With such a small amount of data, only a few shards would be needed.
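One way to compute such a shard, as a sketch: hash the session token and reduce it modulo a small shard count. The shard count of 8 and the choice of MD5 are assumptions for illustration; any stable hash would serve.

```python
import hashlib

NUM_SHARDS = 8  # assumed value; "only a few shards" per the message above


def shard_for(session_token: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a session token to a stable shard number in [0, num_shards)."""
    digest = hashlib.md5(session_token.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_shards
```

The shard would then serve as the partition key, with a v1 (time-based) UUID as the clustering column to get the ordering mentioned above. Reading everything back means querying each of the few shards.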
On Mon, Dec 1, 2014 at 10:08 AM, Phil Wise
wrote:
>> - So we see ~ 3000 flush being enqueued.
>>> >> - This happens so suddenly that even boosting the number of flush
>>> writers
>>> >> to 20 does not suffice. I don't even see "all time blocked" numbers
>>> for it
>>> >> before C*
Since no one else has stepped in...
We have run clusters with ridiculously small nodes - I have a production
cluster in AWS with 4GB nodes each with 1 CPU and disk-based instance
storage. It works fine but you can see those little puppies struggle...
And I ran into problems such as you observe...
Is there a reason why updating a counter for this information will not work
for you?
On Monday, September 1, 2014, eduardo.cusa <
eduardo.c...@usmediaconsulting.com> wrote:
> yes, is the same table, my mistake.
>
>
> On Mon, Sep 1, 2014 at 6:35 PM, Laing, Michael [via [hid
Is table track_user equivalent to table userpixel?
On Monday, September 1, 2014, Eduardo Cusa <
eduardo.c...@usmediaconsulting.com> wrote:
> Hi All. I Have a Cluster in Amazon with the following settings:
>
> * 2 Nodes M3.Large
> * Cassandra 2.0.7
> * Default installation on Ubuntu
>
> And I have o
ging” rather than the exercise in
> futility of doing a massive number of deletes and updates in place?
>
> -- Jack Krupansky
>
> *From:* Laing, Michael
> *Sent:* Monday, September 1, 2014 9:33 AM
> *To:* user@cassandra.apache.org
> *Subject:* Re: Help with select IN quer
y criteria how should I construct my schema? One
> thought has occurred to me is make three tables with each item
> asset_id , event_time, timeuuid as primary keys and depending on type
> of query choose the table to do query upon. That seems like a waste of
> resources (disk, cp
Oh it must be late - I missed the fact that you didn't want to specify
asset_id. The above queries will still work but you have to use 'allow
filtering' - generally not a good idea. I'll look again in the morning.
On Sun, Aug 31, 2014 at 9:41 PM, Laing, Michael
wrote:
> this data and reference interesting data points via the timestamp field.
> The timestamp field is my bridge between Sal and nosql world.
>
> Subodh
> On Aug 31, 2014 5:33 PM, "Laing, Michael"
> wrote:
>
>> Are event_time and timestamp essentially repre
Are event_time and timestamp essentially representing the same datetime?
On Sunday, August 31, 2014, Subodh Nijsure wrote:
> I have following database schema
>
> CREATE TABLE sensor_info_table (
> asset_id text,
> event_time timestamp,
> "timestamp" timeuuid,
> sensor_reading map,
> se
'm struggling to find documentation on the CQL to physical
> layout that isn't a trivial example, especially are around multiget use
> cases. Do you have any pointers to blogs or tutorials you've found
> helpful?
>
> Thanks,
> Todd
>
>
> On Sunday, August 31, 2014,
Actually I think you do want to use scopeId, scopeType as the partition key
(and drop row caching until you upgrade to 2.1 where "rows" are in fact
rows and not partitions):
CREATE TABLE IF NOT EXISTS Graph_Marked_Nodes
(
scopeId uuid, scopeType varchar, nodeId uuid, nodeType varchar,
timestam
I don't think there is an easy "answer" to this...
A possible approach, based upon the implied dimensions of the problem,
would be to maintain a bloom filter over "words" for each user as a
partition key with the user as clustering key. Then a single query would
efficiently yield the list of users
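The message is truncated, so how the filter would be laid out in the table is not fully recoverable. As background only, here is a minimal Bloom filter over words in Python. The sizes (256 bits, 3 hashes) and the class name are arbitrary choices for this sketch.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits: int = 256, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # the bit array, stored as one Python integer

    def _positions(self, word: str):
        # Derive num_hashes bit positions by salting the word with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{word}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, word: str) -> None:
        for pos in self._positions(word):
            self.bits |= 1 << pos

    def might_contain(self, word: str) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(word))
```

A word that was added always tests positive; a word that was never added usually tests negative, which is what makes the structure useful as a cheap pre-filter.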
I saw this awhile back:
> With requests possibly coming in from either US region, we need to make
> sure that the replication of data happens within an acceptable time
> threshold. This led us to perform an experiment where we wrote 1 million
> records in one region of a multi-region cluster. We th
You may also want to use tuples for the clustering columns:
> The tuple notation may also be used for IN clauses on CLUSTERING COLUMNS:
>
> SELECT * FROM posts WHERE userid='john doe' AND (blog_title, posted_at) IN
> (('John''s Blog', '2012-01-01'), ('Extreme Chess', '2014-06-01'))
>
>
> from https:
Except then you have to merge results if you want them ordered.
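If the per-partition queries are dispatched from the client, the merge step can be sketched with the standard library, assuming each individual result set already arrives sorted by the clustering column. `merge_ordered` is a hypothetical helper name.

```python
import heapq


def merge_ordered(result_sets, key=None):
    """Merge several individually sorted result sets into one sorted list."""
    return list(heapq.merge(*result_sets, key=key))


if __name__ == "__main__":
    by_partition = [[1, 4, 7], [2, 5], [3, 6]]  # illustrative data only
    print(merge_ordered(by_partition))
```

`heapq.merge` does not re-sort its inputs; it only interleaves them, so unsorted inputs produce unsorted output.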
On Fri, Jul 25, 2014 at 2:15 PM, Kevin Burton wrote:
> Ah.. ok. Nice. That should work. Parallel dispatch on the client would
> work too.. using async.
>
>
> On Fri, Jul 25, 2014 at 1:37 PM, Laing, Michael >
We use IN (keeping the number down). The coordinator does parallel dispatch
AND applies ORDER BY to the aggregate results, which we would otherwise
have to do ourselves. Anyway, worth it for us.
ml
On Fri, Jul 25, 2014 at 1:24 PM, Kevin Burton wrote:
> Perhaps the best strategy is to have th
The cql you provided is invalid. You probably meant something like:
CREATE TABLE foo (
    rowkey text,
    family text,
    qualifier text,
    version int,
    value blob,
    PRIMARY KEY ((rowkey, family, qualifier), version))
WITH CLUSTERING ORDER BY (version DESC);
And with python use future.has_more_pages and
future.start_fetching_next_page().
On Tue, Jun 24, 2014 at 1:20 AM, DuyHai Doan wrote:
> With the Java Driver, set the fetchSize and use ResultSet.iterator
> Le 24 juin 2014 01:04, "ziju feng" a écrit :
>
> Hi All,
>>
>> I have a wide row table th
However my extensive benchmarking this week of the python driver from
master shows a performance *decrease* when using 'token_aware'.
This is on a 12-node, 2-datacenter, RF-3 cluster in AWS.
Also why do the work the coordinator will do for you: send all the queries,
wait for everything to come back
as opposed to how i'm doing it
>> now, in python.
>>
>> On Tue, Jun 17, 2014 at 9:46 PM, Laing, Michael
>> wrote:
>> > If you can arrange to index your rows by:
>> >
>> > (, )
>> >
>> > Then you can select ranges as you
If you can arrange to index your rows by:
(, )
Then you can select ranges as you wish.
This works because is the "partition key", arrived at by
hash (really it's a hash key), whereas is the "clustering
key" (really it is a range key) which is kept in sorted order both in
memory and on disk.
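A toy model of that layout, as a sketch rather than a description of Cassandra's actual storage engine: a dictionary stands in for hash lookup of the partition key, and a sorted key list per partition stands in for the clustering order. Class and method names are invented for the example.

```python
import bisect


class PartitionedTable:
    """Toy model: hash lookup by partition key, sorted order by clustering key."""

    def __init__(self):
        self._partitions = {}  # partition key -> (sorted clustering keys, rows)

    def insert(self, partition_key, clustering_key, value):
        keys, rows = self._partitions.setdefault(partition_key, ([], {}))
        if clustering_key not in rows:
            bisect.insort(keys, clustering_key)  # keep clustering keys sorted
        rows[clustering_key] = value  # same key again overwrites (upsert)

    def select_range(self, partition_key, start, end):
        """Rows of one partition with start <= clustering key <= end, in order."""
        keys, rows = self._partitions.get(partition_key, ([], {}))
        lo = bisect.bisect_left(keys, start)
        hi = bisect.bisect_right(keys, end)
        return [(k, rows[k]) for k in keys[lo:hi]]
```

A range can only be selected inside one partition, which is why the equality restriction goes on the partition key and the range restriction on the clustering key.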
I
Just to add 2 more cents... :)
The CQL3 protocol is asynchronous. This can provide a substantial
throughput increase, according to my benchmarking, when one uses
non-blocking techniques.
It is also peer-to-peer. Hence the server can generate events to send to
the client, e.g. schema changes - in
Just an FYI, my benchmarking of the new python driver, which uses the
asynchronous CQL native transport, indicates that one can largely overcome
client-to-node latency effects if you employ a suitable level of
concurrency and non-blocking techniques.
Of course response size and other factors come
Perhaps if you described both the schema and the query in more detail, we
could help... e.g. did the query have an IN clause with 2 keys? Or is
the key compound? More detail will help.
On Tue, Jun 10, 2014 at 7:15 PM, Jeremy Jongsma wrote:
> I didn't explain clearly - I'm not requesting 200
>
> Thanks a lot!
>
> Best regards,
> Marcelo.
>
>
> 2014-06-04 22:28 GMT-03:00 Laing, Michael :
>
> BTW you might want to put a LIMIT clause on your SELECT for testing. -ml
>>
>>
>> On Wed, Jun 4, 2014 at 6:04 PM, Laing, Michael > > wrote:
>>
select * from test_paging where token(id) > token(0);
ml
On Fri, Jun 6, 2014 at 1:47 AM, Jonathan Haddad wrote:
> Sorry, the datastax docs are actually a bit better:
> http://www.datastax.com/documentation/cql/3.0/cql/cql_using/paging_c.html
>
> Jon
>
>
> On Thu, Jun 5, 2014 at 10:46 PM, J
BTW you might want to put a LIMIT clause on your SELECT for testing. -ml
On Wed, Jun 4, 2014 at 6:04 PM, Laing, Michael
wrote:
> Marcelo,
>
> Here is a link to the preview of the python fast copy program:
>
> https://gist.github.com/michaelplaing/37d89c8f5f09ae779e47
>
>
lbacks going at once so it is fun
to watch.
On my regional cluster of small nodes in AWS I got about 3000 rows per
second transferred after things warmed up a bit - each row about 6kb.
ml
On Wed, Jun 4, 2014 at 11:49 AM, Laing, Michael
wrote:
> OK Marcelo, I'll work on it today. -ml
>
osts. Both servers have SSD and 64 GB RAM, I
> could use the script as a benchmark for you if you want. Besides, we have
> some bigger clusters, I could run on them just to test the speed if this is
> going to help.
>
> Regards
> Marcelo.
>
>
> 2014-06-03 11:40 GMT-03
I would first check to see if there was a time synchronization issue among
nodes that triggered and/or perpetuated the event.
ml
On Wed, Jun 4, 2014 at 3:12 AM, Arup Chakrabarti wrote:
> Hello. We had some major latency problems yesterday with our 5 node
> cassandra cluster. Wanted to get some
Hi Marcelo,
I could create a fast copy program by repurposing some python apps that I
am using for benchmarking the python driver - do you still need this?
With high levels of concurrency and multiple subprocess workers, based on
my current actual benchmarks, I think I can get well over 1,000 row
Upgrade to 2.0.7 fixed this for me.
You can also try 'nodetool resetlocalschema' on disagreeing nodes. This
worked temporarily for me in 2.0.6.
ml
On Mon, May 12, 2014 at 3:31 PM, Gaurav Sehgal wrote:
> We have recently started seeing a lot of Schema Disagreement errors. We
> are using Cassan
Your understanding is incorrect - the easiest way to see that is to try it.
On Tue, Apr 22, 2014 at 12:00 PM, Sebastian Schmidt wrote:
> From my understanding, this would delete all entries with the given s.
> Meaning, if I have inserted (sa, p1, o1, c1) and (sa, p2, o2, c2),
> executing this:
>
Referring to the original post, I think the confusion is what is a "row" in
this context:
So as far as I understand, the s column is now the *row* key
...
Since I have multiple different p, o, c combinations per s, deleting the whole
> *row* identified by s is no option
The s column is in fact
I've never noticed that setting tombstone_threshold has any effect...
at least in 2.0.6.
What gets written to the log?
On Fri, Apr 11, 2014 at 3:31 PM, DuyHai Doan wrote:
> I was wondering, to remove the tombstones from Sstables created by LCS,
> why don't we just set the tombstone_thresh
At the cost of really quite a lot of compaction, you can temporarily switch
to SizeTiered, and when that is completely done (check each node), switch
back to Leveled.
It's like doing the laundry twice :)
I've done this on CFs that were about 5GB but I don't see why it wouldn't
work on larger ones
I have played with this quite a bit and recommend you set gc_grace_seconds
to 0 and use 'nodetool compact [keyspace] [cfname]' on your table.
A caveat I have is that we use C* 2.0.6 - but the space we expect to
recover is in fact recovered.
Actually, since we never delete explicitly (just ttl) we
e run routinely as part of regular cluster maintenance
> operations.
>
>
>
> If RF=2, ReadConsistency is ONE and data failed to get replicated to the
> second node, then during a read might the app incorrectly return “missing
> data”?
>
>
>
> It seems to me that the
Since you are using LeveledCompactionStrategy there is no major/minor
compaction - just compaction.
Leveled compaction does more work - your logs don't look unreasonable to me
- the real question is whether your nodes can keep up w the IO. SSDs work
best.
BTW if you never delete and only ttl your
3, it's easy to pull just the new tables
> out via aws-cli tools (s3 sync), to your remote, non-aws server, and not
> incur the overhead of routinely backing up the entire dataset. For a non
> trivial database, this matters quite a bit.
>
>
> On Fri, Mar 28, 2014 at 1:21 PM, La
As I tried to say, EBS snapshots require much care or you get corruption
such as you have encountered.
Does Cassandra quiesce the file system after a snapshot using fsfreeze or
xfs_freeze? Somehow I doubt it...
On Fri, Mar 28, 2014 at 4:17 PM, Jonathan Haddad wrote:
> I have a nagging memory o
In your step 4, be sure you create a consistent EBS snapshot. You may have
pieces of your sstables that have not actually been flushed all the way to
EBS.
See https://github.com/alestic/ec2-consistent-snapshot
ml
On Fri, Mar 28, 2014 at 3:21 PM, Russ Lavoie wrote:
> Thank you for your quick r
guys?
>> I have already tried reducing the number of rpc threads. Also tried
>> reducing the linux kernel overcommit.
>>
>>
>> On Sat, Mar 22, 2014 at 5:44 PM, Laing, Michael <
>> michael.la...@nytimes.com> wrote:
>>
>>> I ran into the same p
I ran into the same problem some time ago.
Upgrading to Cassandra 2, jdk 1.7, and default parameters fixed it.
I think the jdk change was the key for my similarly small memory cluster.
ml
On Sat, Mar 22, 2014 at 1:36 PM, prem yadav wrote:
> Michael, no memory constraints. System memory is 4
Of course what you really want is this:
create table x(
id text,
timestamp timeuuid,
flag boolean,
// other fields
primary key (flag, id, timestamp)
)
Whoops now there are only 2 partition keys! Not good if you have any
reasonable number of rows...
Faced with a situation like this (alt
Your second query is invalid:
*Bad Request: Partition KEY part key cannot be restricted by IN relation
(only the last part of the partition key can)*
ml
On Mon, Mar 17, 2014 at 6:56 AM, Tupshin Harper wrote:
> It's the difference between reading from only the partitions that you are
> interes
A possible workaround - not a fix - might be to install libev so the libev
event loop is used.
See http://datastax.github.io/python-driver/installation.html
Also be sure you are running the latest version: 1.0.2 I believe.
Your ';' is outside of your 'str' - actually shouldn't be a problem tho.
*If* you do not need to do range queries on your 'timestam' (ts) column -
*and* if you can change your schema (big if...), then you could move
'timestam' into the partition key like this (using your notation):
PK((key String , timestam int), column1 string, col2 string) , list1 , list
2, list 3 .
These are my personal opinions, reflecting both my long experience w
database systems, and my newness to Cassandra...
[tl;dr]
The Cassandra contributors, having made its history, tend to describe it in
terms of implementation rather than action. And its implementation has a
history, all relativel
ied to identify individual rows in a
> partition. Without clustering columns, one partition is one row. So, it’s a
> matter of whether you want your rows to be in the same partition or
> distributed.
>
> -- Jack Krupansky
>
> *From:* Laing, Michael
> *Sent:* Thursday, March
RIMARY KEY ((key1, key2)), any examples would be welcome if you have the
> time.
>
> Kind regards,
>
> Dave
>
>
> On Thu, Mar 13, 2014 at 2:56 PM, Laing, Michael > wrote:
>
>> Create your table like this and it will work:
>>
>> CREATE TABLE test.do
rently testing with 2.0.2 which got
>> dragged in by the cassandra unit library I'm using for testing [1] I will
>> try to fix my build dependencies and retry, thx.
>>
>> /Dave
>>
>> [1] https://github.com/jsevellec/cassandra-unit
>>
>>
>>
I have no problem doing this w 2.0.5 - what version of C* are you using? Or
maybe I don't understand your data model... attach 'creates' if you don't
mind.
ml
On Thu, Mar 13, 2014 at 9:24 AM, David Savage wrote:
> Hi Peter,
>
> Thanks for the help, unfortunately I'm not sure that's the problem,
go uses 'zig-zag' encoding, perhaps that is the difference?
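For reference, zig-zag encoding maps signed integers to unsigned ones so that values of small magnitude stay small (0, -1, 1, -2 become 0, 1, 2, 3). The sketch below assumes 64-bit signed inputs and is written in Python for illustration. Which library is meant in the message above is cut off, so this is not taken from any particular driver.

```python
def zigzag_encode(n: int) -> int:
    """Map a signed 64-bit integer to an unsigned one: 0, -1, 1, -2 -> 0, 1, 2, 3."""
    return (n << 1) ^ (n >> 63)


def zigzag_decode(z: int) -> int:
    """Inverse of zigzag_encode."""
    return (z >> 1) ^ -(z & 1)
```

A reader that expects plain two's complement will misinterpret zig-zag encoded bytes, which is the kind of mismatch that can look like a bit-shift problem.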
On Wed, Feb 26, 2014 at 6:52 AM, Peter Lin wrote:
>
> You may need to bit shift if that is the case
>
> Sent from my iPhone
>
> > On Feb 26, 2014, at 2:53 AM, Ben Hood <0x6e6...@gmail.com> wrote:
> >
> > Hey Colin,
> >
> >> On Tue, Feb