A lot depends on your definition of frequently.
Also, when a column is updated in the memtable, the previous column is replaced,
so when the memtable is flushed to disk as an SSTable only one copy of the
column is stored. If you have a situation where a lot of columns are
overwritten setting a h
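
The overwrite behaviour described above can be pictured with a small in-memory model. This is only a sketch, not Cassandra's implementation: the class MemtableSketch, its methods, and the simple "newer or equal timestamp wins" rule are assumptions made for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

public class MemtableSketch {
    /** A column value plus the write timestamp used to pick the winning version. */
    public static final class Column {
        public final String value;
        public final long timestamp;

        public Column(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    // Column name -> latest column. Hypothetical structure for illustration only.
    private final TreeMap<String, Column> columns = new TreeMap<>();

    /** Replaces the stored column unless the existing one has a newer timestamp. */
    public void write(String name, String value, long timestamp) {
        Column existing = columns.get(name);
        // Assumption: ties go to the later call; this is a simplification.
        if (existing == null || timestamp >= existing.timestamp) {
            columns.put(name, new Column(value, timestamp));
        }
    }

    /** Returns a snapshot with exactly one copy per column and empties the memtable. */
    public Map<String, Column> flush() {
        Map<String, Column> snapshot = new TreeMap<>(columns);
        columns.clear();
        return snapshot;
    }

    public static void main(String[] args) {
        MemtableSketch memtable = new MemtableSketch();
        memtable.write("name", "Alex", 1L);
        memtable.write("name", "Alexander", 2L); // overwrites the first write
        System.out.println(memtable.flush().size());
    }
}
```

However many times a column is overwritten before the flush, the snapshot holds a single entry for it.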
Here's a good writeup on how fightmymonster.com does it...
http://ria101.wordpress.com/category/nosql-databases/locking/
--
Dan Washusen
Make big files fly
visit digitalpigeon.com
On Saturday, 9 April 2011 at 11:53 AM, Alex Araujo wrote:
On 4/8/11 5:46 PM, Drew Kutcharian wrote:
> > I'm interes
We also have a ticket open at
https://issues.apache.org/jira/browse/CASSANDRA-2399
We have observed in production the impact of streaming data to new nodes being
added. Even with our entire dataset in page cache in one of our
clusters, our 99th percentiles go from 20ms to >1 second on s
My brain just started working. The streaming for the move may need to be
throttled, but once the file has been received the bloom filters, row indexes
and secondary indexes are built. That will also take some effort. Do you have
any secondary indexes?
If you are doing a move again could you tr
On 4/8/11 5:46 PM, Drew Kutcharian wrote:
I'm interested in this too, but I don't think this can be done with Cassandra
alone. Cassandra doesn't support transactions. I think hector can retry
operations, but I'm not sure about the atomicity of the whole thing.
On Apr 8, 2011, at 1:26 PM, Ale
thank you, I get it now.
On Fri, Apr 8, 2011 at 7:15 PM, Jonathan Ellis wrote:
> No, I'm suggesting you have a Tokyo keyspace that gets replicated as
> {Tokyo: 2, NYC:1}, a London keyspace that gets replicated to {London:
> 2, NYC: 1}, for example.
>
> On Fri, Apr 8, 2011 at 5:59 PM, Patrick Juli
What are the key things to monitor while running a stress test? There are tons
of details in nodetool tpstats/netstats/cfstats. What in particular should I
be looking at?
Also, I've been looking at iostat, and await goes really high, but cfstats
shows low latency in microseconds. Is latency in cfstats c
I use version 0.7.4, and I've done something like this:
1. I ssh'd into my Eucalyptus account and requested 1 instance, which got a
public IP and an internal IP.
2. I scp'd the tarball apache-cassandra-0.7.4-bin.tar.gz to the root of
my instance, unzipped it and created directories
3. I run /bin/cassandra -f, everything
No, I'm suggesting you have a Tokyo keyspace that gets replicated as
{Tokyo: 2, NYC:1}, a London keyspace that gets replicated to {London:
2, NYC: 1}, for example.
On Fri, Apr 8, 2011 at 5:59 PM, Patrick Julien wrote:
> I'm familiar with this material. I hadn't thought of it from this
> angle bu
Does that mean that with this configuration I must use only a UUID for the column value?
I really don't understand how it works.
I changed my code a little:
UUID timeUUID = DaoHelper.getTimeUUID();
HColumn column = HFactory.createColumn("name", "Alex",
StringSerializer.get(), StringSerializer.get());
Mutator mut
I am starting a stress test using hector on 6 nodes, with 4GB heap and 12
cores. In the hector README this is what I got by default:
create keyspace StressKeyspace
with replication_factor = 3
and placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy';
use StressKeyspace;
drop col
I'm familiar with this material. I hadn't thought of it from this
angle but I believe what you're suggesting is that the different data
centers would hold a different properties file for node discovery
instead of using auto-discovery.
So Tokyo, and others, would have a configuration that makes it
I'm interested in this too, but I don't think this can be done with Cassandra
alone. Cassandra doesn't support transactions. I think hector can retry
operations, but I'm not sure about the atomicity of the whole thing.
On Apr 8, 2011, at 1:26 PM, Alex Araujo wrote:
> Hi, I was wondering if th
I think the problem is this then:
mutator.addInsertion(timeUUID, columnFamilyName, column);
I'm not sure what you're doing here, but you're using your timeUUID as
the row key, not the column name. I don't see you actually assigning
the column name so I don't know what you're putting in
I tried to use the method getUniqueTimeUUIDinMillis from
https://github.com/rantav/hector/blob/master/core/src/main/java/me/prettyprint/cassandra/utils/TimeUUIDUtils.java
but I still get the same result "InvalidRequestException(why:TimeUUID should be
16 or 0 bytes (3))";
On 9 April 2011 at 01:32, Олександр Сил
On Fri, Apr 8, 2011 at 12:17 PM, Patrick Julien wrote:
> The problem is this: we would like the historical data from Tokyo to
> stay in Tokyo and only be replicated to New York. The one in London
> to be in London and only be replicated to New York and so on for all
> data centers.
>
> Is this cu
Thanks for trying to help me, but I still get the error message
InvalidRequestException(why:TimeUUID should be 16 or 0 bytes (3))
The code UUID timeUUID = getTimeUUID(); doesn't solve my problem.
On 9 April 2011 at 01:16, Ed Anuff wrote:
> Oops, I should have been more clear. You have this code:
Oops, I should have been more clear. You have this code:
UUID timeUUID = getTimeUUID().randomUUID();
what you need is this code:
UUID timeUUID = getTimeUUID();
What I meant by not understanding the error message was that I thought
the TimeUUIDType gave a different error message than the one yo
I think this is what you're looking for
http://wiki.apache.org/cassandra/FAQ#working_with_timeuuid_in_java
2011/4/8 Олександр Силка:
>
> Then how can I generate a correct time UUID key in Java?
>
> On 8 April 2011 at 22:58, Ed Anuff wrote:
>>
>> Hmm, if you're really doing this, you're not gettin
Then how can I generate a correct time UUID key in Java?
On 8 April 2011 at 22:58, Ed Anuff wrote:
> Hmm, if you're really doing this, you're not getting a time uuid:
>
> UUID timeUUID = getTimeUUID().randomUUID();
>
> That call to randomUUID() is invoking the static randomUUID() method
> in java
Well, the amazon paper is good at describing the nature of the
problem, but to solve it you'll probably want to use zookeeper. The
paper is useful in understanding exactly what you need to lock on and
what you don't while updating the index, so you can avoid slowing
things down any more than is ne
Thanks for the suggestions Ed. Your blog post is quite helpful in deciding
on and implementing CF inverted indexes.
Our data definitely leans towards an external CF: it has high cardinality (1000s
for one column, millions for another), multiple columns need to be indexed,
and it needs sorted order.
Hope that a
A few lines of Java in a partitioning or rack aware strategy might be able to
achieve this.
--Joe
--
Typed with big fingers on a small keyboard.
On Apr 8, 2011, at 13:17, Patrick Julien wrote:
> We have a pilot project running where all our historical data
> worldwide would be stored using
dynamic_snitch seems to calculate host scores to figure out the
latency of each node.
What are the details of this calculation:
1. What is the mechanism to determine latency?
2. Does it store the calculated scores and use the historical figures
to come up with the latest scores ? (You can't just
Hi, I was wondering if there are any patterns/best practices for
creating atomic units of work when dealing with several column families
and their inverted indices.
For example, if I have Users and Groups column families and did
something like:
Users.insert( user_id, columns )
UserGroupTimel
Hmm, if you're really doing this, you're not getting a time uuid:
UUID timeUUID = getTimeUUID().randomUUID();
That call to randomUUID() is invoking the static randomUUID() method
in java.util.UUID which is generating a non-time random uuid. I'm not
sure why you're getting that error message tho
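
To make the distinction above concrete, here is a minimal sketch using only java.util.UUID. The class TimeUuidCheck and its helper names are made up for illustration, and the literal UUID is just a hand-written example value with the version field set to 1. It does not use hector or explain the specific error message; it only shows that randomUUID() yields a version 4 UUID, while a time-based UUID is version 1, and that either kind serializes to 16 bytes.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

public class TimeUuidCheck {
    /** True only for version 1 (time-based) UUIDs. */
    public static boolean isTimeBased(UUID uuid) {
        return uuid.version() == 1;
    }

    /** Serializes a UUID to its 16-byte big-endian form. */
    public static byte[] toBytes(UUID uuid) {
        return ByteBuffer.allocate(16)
                .putLong(uuid.getMostSignificantBits())
                .putLong(uuid.getLeastSignificantBits())
                .array();
    }

    public static void main(String[] args) {
        // Static factory in java.util.UUID: always a random (version 4) UUID.
        UUID random = UUID.randomUUID();
        // Hand-written example value whose version nibble is 1.
        UUID timeBased = UUID.fromString("c5f0a1e0-6221-11e0-8000-000000000000");

        System.out.println("random is time-based: " + isTimeBased(random));
        System.out.println("example is time-based: " + isTimeBased(timeBased));
        System.out.println("serialized length: " + toBytes(timeBased).length);
    }
}
```

A check like isTimeBased() before building the mutation is a cheap way to confirm which kind of UUID the code is really producing.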
Hi everyone,
I have a column family called site, sorted
by org.apache.cassandra.db.marshal.TimeUUIDType.
When I try to save some data using hector, I get the following
message: InvalidRequestException(why:TimeUUID should be 16 or 0 bytes (3)).
My Cassandra version is 0.7.0.
Here are snippets of my code:
public sta
If you're just indexing on a single column value and the values have
low cardinality in, say, the 10s, I'd have a wide row for each
cardinal value that contained the set of keys for rows that contained
that value. For higher levels of cardinality or if you're indexing on
multiple columns, there
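
The wide-row layout described above (one row per indexed value, holding the keys of the rows that contain that value) can be modelled in memory as follows. This is a sketch of the data layout only; WideRowIndexSketch and its methods are hypothetical names and make no Cassandra or hector calls.

```java
import java.util.Collections;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

public class WideRowIndexSketch {
    // Indexed value (the index row key) -> sorted set of data row keys (the columns).
    private final Map<String, SortedSet<String>> index = new TreeMap<>();

    /** Records that the data row with rowKey currently holds this value. */
    public void add(String value, String rowKey) {
        index.computeIfAbsent(value, v -> new TreeSet<>()).add(rowKey);
    }

    /** Removes the entry, e.g. when the data row's value changes or the row is deleted. */
    public void remove(String value, String rowKey) {
        SortedSet<String> keys = index.get(value);
        if (keys != null) {
            keys.remove(rowKey);
            if (keys.isEmpty()) {
                index.remove(value);
            }
        }
    }

    /** Returns the keys of all data rows holding this value, in sorted order. */
    public SortedSet<String> rowKeysFor(String value) {
        SortedSet<String> keys = index.get(value);
        return keys == null ? Collections.emptySortedSet() : keys;
    }

    public static void main(String[] args) {
        WideRowIndexSketch byCountry = new WideRowIndexSketch();
        byCountry.add("JP", "user42");
        byCountry.add("JP", "user7");
        byCountry.add("UK", "user13");
        System.out.println(byCountry.rowKeysFor("JP"));
    }
}
```

With only tens of distinct values, the index stays at tens of rows, each of which can grow wide.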
On Fri, Apr 8, 2011 at 4:48 AM, Sasha Dolgy wrote:
> hi all,
>
> is there a way to select random columns from a key?
>
> --
> Sasha Dolgy
> sasha.do...@gmail.com
>
getRangeSlice with random column start key.
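
The idea of starting a slice at a random column name can be illustrated with a sorted map standing in for the columns of one row. This is a sketch under assumptions: RandomColumnSketch and firstColumnFrom are hypothetical names, the wrap-around to the first column is a design choice added here, and no Cassandra client API is called.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.ThreadLocalRandom;

public class RandomColumnSketch {
    /**
     * Returns the first column whose name is >= start, wrapping around to the
     * first column when start sorts after every column name.
     */
    public static Map.Entry<String, String> firstColumnFrom(
            NavigableMap<String, String> row, String start) {
        if (row.isEmpty()) {
            return null;
        }
        Map.Entry<String, String> entry = row.ceilingEntry(start);
        return entry != null ? entry : row.firstEntry();
    }

    public static void main(String[] args) {
        NavigableMap<String, String> row = new TreeMap<>();
        row.put("apple", "1");
        row.put("mango", "2");
        row.put("pear", "3");

        // Random start key: a single random lowercase letter.
        char c = (char) ('a' + ThreadLocalRandom.current().nextInt(26));
        System.out.println(firstColumnFrom(row, String.valueOf(c)));
    }
}
```

Note that this picks columns with probability proportional to the gap in front of each name, so the selection is only uniform if the column names are evenly spread.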
in yaml:
# Set to true to make new [non-seed] nodes automatically migrate data
# to themselves from the pre-existing nodes in the cluster.
Why only non-seed nodes? What if seed nodes need to bootstrap?
--
View this message in context:
http://cassandra-user-incubator-apache-org.3065146.n2.nabble
We have a pilot project running where all our historical data
worldwide would be stored using cassandra. So far, we have been
successful at getting the write and read throughput we need, in fact,
coming in at over 27% above our needed capacity and well beyond what we
were able to achieve with mysql, v
Sadly, repair isn't very resilient to errors and has failed. There are a
few tickets open to improve this and repair in general, but right now,
if any problem occurs during repair, it will fail (and nodetool
repair won't return, so you could just ctrl-c).
Provided you're on a recent enough cassandra
I am trying to decide whether to use secondary indexes or use an inverted
index column family for a use case. Is there any suggested ballpark range
for low cardinality for which secondary indexes are suitable?
Meaning at what range should using a secondary index be ruled in or out:
cardinality of
It seems there are a few unserializable rows on my cluster. I'm trying to run
a repair on the nodes, but it also seems that the replica nodes have unreadable
or unserializable rows. The problem is, I cannot determine if the repair is
still going on, or if it was interrupted because of these err
On 04/05/2011 03:04 PM, Chris Burroughs wrote:
> I have gc logs if anyone is interested.
This is from a node with standard io, jna enabled, but limits were not
set for mlockall to succeed. One can see -/+ buffers/cache free
shrinking and the C* pid's RSS growing.
Includes several days of:
gc l
hi all,
is there a way to select random columns from a key?
--
Sasha Dolgy
sasha.do...@gmail.com
36 matches