I have a simple blob class over the top of this which handles input and
output streaming so reads/writes are only one column at a time.
Thank you for the tips. I think I will do the same; for now, I've
developed a simple version which stores the entire file in one column, but
I've already
I think very high uptime, and very low data loss is achievable in
Cassandra, but, for new users there are TONS of gotchas. You really
have to know what you're doing, and I doubt that many people acquire
that knowledge without making a lot of mistakes.
I see above that most people are talking
Hi Aaron. Reverted back to 4-32. Did the flush but it did not trigger
any minor compaction. Ran compact by hand, and it picked only two
sstables.
Here's the ls before:
http://pastebin.com/xDtvVZvA
And this is the ls after:
http://pastebin.com/DcpbGvK6
Any suggestions?
On Thu, 23-06-2011 at
Hi -
I'd like to understand more about how the key is hashed into a token to determine
on which node the data is stored - called decorating in Cassandra-speak.
Can anyone share any documentation on this or describe this in more detail?
Yes, I could look at the code, but I was hoping to be able
A compaction will be triggered when the min number of same-sized SSTable files
are found. So what's actually the purpose of the max part of the
threshold?
On Jun 23, 2011, at 12:55 AM, aaron morton wrote:
Setting them to 2 and 2 means compaction can only ever compact 2 files at
a time, so
On 06/23/11 09:43, David Boxenhorn wrote:
I think very high uptime, and very low data loss is achievable in
Cassandra, but, for new users there are TONS of gotchas. You really
have to know what you're doing, and I doubt that many people acquire
that knowledge without making a lot of mistakes.
As Jonathan said earlier, you are hitting
https://issues.apache.org/jira/browse/CASSANDRA-2765
This will be fixed in 0.8.1 that is currently under a vote and should be
released soon (let's say beginning of next week, maybe sooner).
--
Sylvain
2011/6/23 Héctor Izquierdo Seliva
On Thu, Jun 23, 2011 at 10:23 AM, Jonathan Colby
jonathan.co...@gmail.com wrote:
A compaction will be triggered when the min number of same-sized SSTable files
are found. So what's actually the purpose of the max part of the
threshold?
It says, if there is more than max number of same sized
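To make the min/max question above concrete, here is a rough sketch of how a pair of thresholds could drive file selection. It is an illustration only, not Cassandra's actual compaction code: the class name, the method names and the "within 1.5x of the bucket average" grouping rule are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch, not Cassandra's implementation.
final class CompactionSketch {
    // Groups file sizes into buckets of "similar" size. The 1.5x rule is an
    // assumption chosen for this example.
    static List<List<Long>> bucket(List<Long> sizes) {
        List<Long> sorted = new ArrayList<>(sizes);
        Collections.sort(sorted);
        List<List<Long>> buckets = new ArrayList<>();
        for (long size : sorted) {
            List<Long> last = buckets.isEmpty() ? null : buckets.get(buckets.size() - 1);
            if (last != null && size <= 1.5 * average(last)) {
                last.add(size);
            } else {
                List<Long> fresh = new ArrayList<>();
                fresh.add(size);
                buckets.add(fresh);
            }
        }
        return buckets;
    }

    static double average(List<Long> xs) {
        long sum = 0;
        for (long x : xs) sum += x;
        return (double) sum / xs.size();
    }

    // A bucket becomes a compaction candidate once it holds at least `min`
    // files; at most `max` files from it are compacted in one pass.
    static List<List<Long>> pickCompactions(List<Long> sizes, int min, int max) {
        List<List<Long>> picked = new ArrayList<>();
        for (List<Long> b : bucket(sizes)) {
            if (b.size() >= min) {
                picked.add(new ArrayList<>(b.subList(0, Math.min(b.size(), max))));
            }
        }
        return picked;
    }

    public static void main(String[] args) {
        List<Long> sizes = Arrays.asList(10L, 11L, 12L, 10L, 11L, 1000L);
        System.out.println("4/32 -> " + pickCompactions(sizes, 4, 32));
        System.out.println("2/2  -> " + pickCompactions(sizes, 2, 2));
    }
}
```

In this sketch min decides when a bucket is worth compacting at all, and max only caps how many files one pass may merge, which is why 2 and 2 can never merge more than two files at a time.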
How can the get_range_slices() function return sorted keys?
BR
hey,
I have got my EC2 multi-DC setup working across AZs, but in the same region, us-east.
Now I am trying to deploy Cassandra over multiple regions, that is EC2 us-west,
Singapore and us-east. I have edited the config file as per
Sasha's reply below.
Though when I run nodetool in each DC, I only see the nodes from
are you able to open a connection from one of the nodes to a node on
the other side? us-east to us-west? could your problem be as simple
as connectivity and/or security group configuration?
On Thu, Jun 23, 2011 at 1:51 PM, pankaj soni pankajsoni0...@gmail.com wrote:
hey,
I have got my ec2
1. Is it feasible to run directly against a Cassandra data directory
restored from an EBS snapshot? (as opposed to nodetool snapshots restored
from an EBS snapshot).
Assuming EBS is not buggy, including honoring write barriers, including
the linux guest kernel etc, then yes. EBS snapshots of a
Les,
Cassandra is a good system, but it has not reached version 1.0 yet, nor has
HBase etc. It is cutting edge technology and therefore in practice you are
unlikely to achieve five nines immediately - even if in theory with perfect
planning, perfect administration and so on, this should be
Domonic,
Thank you for your answer. I enjoy how in your day to day work you are
concerned with who has the monster. It must be fun to read your
production logs (User[Shelly] received vampire).
I looked into Cages and this does seem interesting. I need to do more
reading to have a better take.
No, the nodes within each DC are able to discover each other, but
across the DCs it's not happening.
I have double-checked the config parameters, both those required in the Amazon settings
and in cassandra.yaml, before posting the query here.
Has anybody got their nodes talking to each other across regions
AJ,
Thanks for your input. I don't fully follow though how this would work with
a bank scenario. Could you explain in more detail?
Thanks.
Trevor
On Wed, Jun 22, 2011 at 6:34 PM, AJ a...@dude.podzone.net wrote:
I think Sasha's idea is worth studying more. Here is a supporting read
On Thu, Jun 23, 2011 at 5:04 AM, Peter Schuller peter.schul...@infidyne.com
wrote:
1. Is it feasible to run directly against a Cassandra data directory
restored from an EBS snapshot? (as opposed to nodetool snapshots restored
from an EBS snapshot).
Assuming EBS is not buggy, including
EBS volume atomicity is good. We've had tons of experience since EBS came
out almost 4 years ago, to back all kinds of things, including large DBs.
And thanks a lot for coming forward with production experience. That
is always useful with these things.
--
/ Peter Schuller
I've been doing EBS snapshots for mysql for some time now, and was using a
similar pattern as Josep (XFS with freeze, snap, unfreeze), with the extra
complication that I was actually using 8 EBS's in RAID-0 (and the extra
extra complication that I had to lock the MyISAM tables... glad to be moving
On Thu, Jun 23, 2011 at 7:30 AM, Peter Schuller peter.schul...@infidyne.com
wrote:
EBS volume atomicity is good. We've had tons of experience since EBS came
out almost 4 years ago, to back all kinds of things, including large
DBs.
One important thing to have in mind though, is that EBS
On Thu, Jun 23, 2011 at 8:02 AM, William Oberman
ober...@civicscience.com wrote:
I've been doing EBS snapshots for mysql for some time now, and was using a
similar pattern as Josep (XFS with freeze, snap, unfreeze), with the extra
complication that I was actually using 8 EBS's in RAID-0 (and
If taking an atomic snapshot of the device on which a file system is
located, assuming the file system is designed to be crash
consistent, it *has* to result in a consistent snapshot. Anything else
would directly violate the claim that the file system is crash
consistent, making the
A snippet from the wikipedia page on XFS for example:
http://en.wikipedia.org/wiki/XFS
...
Snapshots
XFS does not provide direct support for snapshots, as it expects the
snapshot process to be implemented by the volume manager. Taking a snapshot
of an XFS filesystem involves freezing I/O
that with storage, there are *lots* of urban legends and people making
strange claims. In this case it is wrong for fundamental reasons
independent of kernel implementation details.
Also, note that it is not specific to log based file systems. Even
old file systems that predate journaling or
How should one go about creating a data model from RDBMS ER into Big Table
Data model? For example, an RDBMS has many indexes required for queries, and I think
this is the most important aspect when designing the data model in Big Table.
I was initially planning to denormalize into one CF and use
you can create the inverted index in the same CF ... just means you
would have potentially lots more rows ...
do you have a use-case or hypothetical you can share? if not ... here's one.
http://code.google.com/p/oauth-php it has an RDBMS suggested
model
On Thu, Jun 23, 2011 at 12:43 PM, mcasandra mohitanch...@gmail.com wrote:
How should one go about creating a data model from RDBMS ER into Big Table
Data model? For example, an RDBMS has many indexes required for queries, and I think
this is the most important aspect when designing the data model in Big
Here are some useful links that will help you learn how to create a data model:
WTF is a SuperColumn? An Intro to the Cassandra Data Model -
http://arin.me/blog/wtf-is-a-supercolumn-cassandra-data-model
A Java Implementation of the Blog Data Model created by Arin
Sarkissian in his blog post WTF
On 06/22/2011 10:03 PM, Edward Capriolo wrote:
I have not read the original thread concerning the problem you mentioned.
One way to avoid OOM is large amounts of RAM :) On a more serious note most
OOM's are caused by setting caches or memtables too large. If the OOM was
caused by a software
On 06/22/2011 07:12 PM, Les Hazlewood wrote:
Telling me to read the mailing lists and follow the issue tracker and use
monitoring software is all great and fine - and I do all of these things
today already - but this is a philosophical recommendation that does not
actually address my question.
Since upgrading to 0.7.6-2 I'm seeing the following exception in our
server logs:
ERROR [MutationStage:1184874] 2011-06-22 23:59:43,867
AbstractCassandraDaemon.java (line 114) Fatal exception in thread
Thread[MutationStage:1184874,5,main]
java.lang.UnsupportedOperationException: Index manager
It's really not supported in <= 0.7.6; it can cause index corruption
IF the row delete timestamp is higher than the column update's.
This is fixed for 0.7.7 in https://issues.apache.org/jira/browse/CASSANDRA-2773
On Thu, Jun 23, 2011 at 12:46 PM, Jim Ancona j...@anconafamily.com wrote:
Since
Exactly right - I wrote BlobInputStream and BlobOutputStream classes to
go with a Blob class.
I use 1MB for the block size, but I haven't done any performance
testing. I went small to favor low memory and bandwidth footprints.
I'd, of course, be very interested in any performance tests.
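As a rough illustration of the block-at-a-time approach described above, the sketch below splits a stream into fixed-size chunks so only one chunk is in memory per write. It is not the poster's Blob class: the class and method names are made up, and a TreeMap stands in for a Cassandra row (chunk index as column name, chunk bytes as column value), so no client API is assumed.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical sketch: the map stands in for one row of chunk columns.
final class BlobChunker {
    static final int BLOCK_SIZE = 1024 * 1024; // 1MB, as in the message above

    // Reads the stream one block at a time; each block would be one column write.
    static SortedMap<Integer, byte[]> writeChunks(InputStream in, int blockSize) {
        SortedMap<Integer, byte[]> row = new TreeMap<>();
        byte[] buf = new byte[blockSize];
        int index = 0;
        try {
            while (true) {
                int filled = 0;
                // Fill the buffer completely unless the stream ends first.
                while (filled < blockSize) {
                    int n = in.read(buf, filled, blockSize - filled);
                    if (n < 0) break;
                    filled += n;
                }
                if (filled == 0) break;
                row.put(index++, Arrays.copyOf(buf, filled));
                if (filled < blockSize) break; // a short chunk means end of stream
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return row;
    }

    // Reassembles the blob by concatenating the chunks in column order.
    static byte[] readAll(SortedMap<Integer, byte[]> row) {
        int total = 0;
        for (byte[] c : row.values()) total += c.length;
        byte[] out = new byte[total];
        int pos = 0;
        for (byte[] c : row.values()) {
            System.arraycopy(c, 0, out, pos, c.length);
            pos += c.length;
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] data = new byte[2_500_000];
        for (int i = 0; i < data.length; i++) data[i] = (byte) i;
        SortedMap<Integer, byte[]> row = writeChunks(new ByteArrayInputStream(data), BLOCK_SIZE);
        System.out.println(row.size() + " chunks, round trip ok: "
                + Arrays.equals(data, readAll(row)));
    }
}
```

A smaller block size lowers the memory held per request at the cost of more columns per blob; the right trade-off would need the performance testing mentioned above.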
Great stuff Chris - thanks so much for the feedback!
Les
In the spirit of your re-formulated questions:
- Read-before-write is a Cassandra anti-pattern, avoid it if at all
possible.
This leads me to believe that Cassandra may not be a good idea for a
primary OLTP data store. For example only create a user object if email
foo is not already in
The issue has the fix version as 0.8.2, not 0.7.7. Is that incorrect?
Cheers,
Les
On 06/23/2011 01:56 PM, Les Hazlewood wrote:
Is there a roadmap or time to 1.0? Even a ballpark time (e.g. next year 3rd
quarter, end of year, etc) would be great as it would help me understand
where it may lie in relation to my production rollout.
The C* devs are rather strongly inclined
As an additional concrete detail to Edward's response, 'result
pinning' can provide some performance improvements depending on
topology and workload. See the conf file comments for details:
https://github.com/apache/cassandra/blob/cassandra-0.8.0/conf/cassandra.yaml#L308-315
I would also advise
Is there any reason this fix can't be back-ported to 0.7?
Jim
On Thu, Jun 23, 2011 at 3:00 PM, Jonathan Ellis jbel...@gmail.com wrote:
Sorry, 0.8.2 is correct.
On Thu, Jun 23, 2011 at 1:36 PM, Les Hazlewood l...@katasoft.com wrote:
The issue has the fix version as 0.8.2, not 0.7.7. Is that
Various places in the code call IPartitioner.decorateKey() which returns a
DecoratedKey<T> which contains both the original key and the Token<T>.
The RandomPartitioner uses MD5 to hash the key ByteBuffer and create a BigInteger.
OPP converts the key into a UTF-8 encoded String.
Using the token to find
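The two key-to-token conversions mentioned above can be sketched in plain Java. This is an illustration under assumptions, not the partitioner source: the class and method names are hypothetical, and taking the absolute value of the signed digest is an assumption about how the BigInteger is made non-negative.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch of key -> token conversion; names are not Cassandra's.
final class TokenSketch {
    // RandomPartitioner style: MD5 the key bytes and read the digest as a
    // BigInteger. abs() keeps the token non-negative (assumption).
    static BigInteger md5Token(byte[] key) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            return new BigInteger(md5.digest(key)).abs();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Order-preserving style: the token is the key decoded as a UTF-8 string,
    // so token order follows key order.
    static String orderPreservingToken(byte[] key) {
        return new String(key, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] key = "user:42".getBytes(StandardCharsets.UTF_8);
        System.out.println("md5 token: " + md5Token(key));
        System.out.println("opp token: " + orderPreservingToken(key));
    }
}
```

The hashed token spreads keys evenly around the ring but loses key ordering, while the string token keeps ordering and leaves balancing to the operator.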
Missed that in the history, cheers.
A
-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com
On 23 Jun 2011, at 20:26, Sylvain Lebresne wrote:
As Jonathan said earlier, you are hitting
https://issues.apache.org/jira/browse/CASSANDRA-2765
This
Not sure what your question is.
Does this help ? http://wiki.apache.org/cassandra/FAQ#range_rp
Cheers
-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com
On 23 Jun 2011, at 21:59, karim abbouh wrote:
How can the get_range_slices() function
EC2Snitch doesn't currently support multi-Regions in Amazon.
Tickets to track:
https://issues.apache.org/jira/browse/CASSANDRA-2452
https://issues.apache.org/jira/browse/CASSANDRA-2491
Let us know if/how you get the OpenVPN connection to work across Regions.
On Thu, Jun 23, 2011 at 6:29 AM,
we use a combination of Vyatta OpenVPN on the nodes that are EC2 and
nodes that aren't EC2; works a treat.
On Thu, Jun 23, 2011 at 10:23 PM, Sameer Farooqui
cassandral...@gmail.com wrote:
EC2Snitch doesn't currently support multi-Regions in Amazon.
Tickets to track:
Hi Dominic,
Thanks so much for providing this information. I was unaware of Cages and
this looks like it could be used effectively for certain things.
This is because Cassandra uses the timestamps of columns that have been
written during reconciliation to determine which should be persisted
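A minimal sketch of timestamp-based reconciliation as described above, assuming plain last-write-wins on the column timestamp. The Column type and the reconcile method are hypothetical names for this example, and the tie-breaking rule is an arbitrary choice, not a statement about Cassandra's behaviour.

```java
// Hypothetical sketch of last-write-wins reconciliation.
final class Reconcile {
    // Illustrative column type; not Cassandra's own class.
    static final class Column {
        final String name;
        final String value;
        final long timestamp;

        Column(String name, String value, long timestamp) {
            this.name = name;
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    // Keeps the version with the higher timestamp. On a tie the first
    // argument wins, which is an arbitrary choice for this sketch.
    static Column reconcile(Column a, Column b) {
        return b.timestamp > a.timestamp ? b : a;
    }

    public static void main(String[] args) {
        Column older = new Column("email", "old@example.com", 100L);
        Column newer = new Column("email", "new@example.com", 200L);
        // Arrival order does not matter, only the timestamps do.
        System.out.println(reconcile(older, newer).value);
        System.out.println(reconcile(newer, older).value);
    }
}
```

The consequence for clients is that a write carrying an older timestamp is discarded even if it arrives last, so clocks that supply the timestamps need to be kept in sync.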
I've reopened the issue. On our 0.7.6-2 cluster, system.log is filling
with repeated instances of the UnsupportedOperationException. When
we've attempted to restart a node, the restart fails with the same
exception. Luckily we found this as part of our pre-deploy testing of
0.7.6, not in
Hey all,
We're running a slightly patched version of 0.7.3 on a cluster of 5 nodes.
I've been noticing a number of messages in our logs which look like this
(after a node goes down and comes back up, usually just due to a GC):
2011-06-23 14:46:35,381 INFO [HintedHandoff:1]
On Thu, Jun 23, 2011 at 2:05 PM, Les Hazlewood l...@katasoft.com wrote:
Hi Dominic,
Thanks so much for providing this information. I was unaware of Cages and
this looks like it could be used effectively for certain things.
This is because Cassandra uses the timestamps of columns that have
On Thu, Jun 23, 2011 at 2:55 PM, Jeffrey Wang jw...@palantir.com wrote:
Hey all,
We’re running a slightly patched version of 0.7.3 on a cluster of 5 nodes.
I’ve been noticing a number of messages in our logs which look like this
(after a node goes “down” and comes back up, usually just due
No, it's always been off. No hints are being delivered ever, but the
HintedHandoffManager still does some stuff when nodes come back online.
-Jeffrey
-Original Message-
From: Ryan King [mailto:r...@twitter.com]
Sent: Thursday, June 23, 2011 3:00 PM
To: user@cassandra.apache.org
(1) you should upgrade to a stable release instead of a frankenbuild
(2) hh_enabled just controls whether it creates new hints. it will
still attempt to deliver existing ones. this is essentially free if
no such hints exist.
tldr: it's harmless.
On Thu, Jun 23, 2011 at 4:55 PM, Jeffrey Wang
Thanks Jonathan,
I found more cases where I wrongly use getInt() directly.
it caused some random data errors: nio.BufferUnderflow. but I did rewind()
these buffers before reading them. then the most probable cause is that some
other threads are sharing the same buffers.
then this prompts me to
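The getInt() problem described above comes from relative reads moving the buffer's position, which matters when a buffer is shared. The sketch below shows two standard java.nio.ByteBuffer ways to read without disturbing the shared buffer; the helper names are made up for this example, and whether either fits the poster's code is an assumption.

```java
import java.nio.ByteBuffer;

// Sketch: reading an int without changing a shared buffer's position.
final class BufferReads {
    // Absolute read: getInt(int index) leaves position and limit untouched.
    static int peekInt(ByteBuffer buf) {
        return buf.getInt(buf.position());
    }

    // duplicate() shares the bytes but has its own position and limit,
    // so the relative read only advances the private view.
    static int readIntOnCopy(ByteBuffer buf) {
        return buf.duplicate().getInt();
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(4);
        buf.putInt(42);
        buf.flip();
        System.out.println(peekInt(buf) + " remaining=" + buf.remaining());
        System.out.println(readIntOnCopy(buf) + " remaining=" + buf.remaining());
        // A relative read on the shared buffer consumes it; a second
        // relative getInt() here would throw BufferUnderflowException.
        System.out.println(buf.getInt() + " remaining=" + buf.remaining());
    }
}
```

Neither helper makes concurrent mutation of the underlying bytes safe; they only stop readers from interfering with each other's position.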
Thanks for the pointer Ryan!
Regards,
Les
On 6/23/2011 7:37 AM, Trevor Smith wrote:
AJ,
Thanks for your input. I don't fully follow though how this would work
with a bank scenario. Could you explain in more detail?
Thanks.
Trevor
I don't know yet. I'll be researching that. My working procedure is to
figure out a way to handle
On 6/22/2011 11:43 PM, Sasha Dolgy wrote:
maybe you want to spend a few minutes reading about Haystack over at
facebook to give you some ideas...
https://www.facebook.com/note.php?note_id=76191543919
Not saying what they've done is the right way... just sayin'
Thanks for the tip Sasha; will
http://github.com/twissandra/twissandra
On Thu, Jun 23, 2011 at 6:51 PM, Santiago Basulto
santiago.basu...@gmail.com wrote:
Hello people.
I've been looking for Twissandra at GitHub, but it's not there:
http://github.com/ericflo/twissandra
Do you know where I can get a copy of the code?
A little addendum
Key := Your data to identify a row
Token := Index on the ring calculated from Key. The calculation is
defined by the partitioner.
You can look up responsible nodes (endpoints) for a specific key with
the JMX getNaturalEndpoints interface.
maki
2011/6/24 aaron morton
57 matches