On Sun, Apr 29, 2012 at 4:32 PM, Maxim Potekhin wrote:
> Looking at your example, as I think you understand, you forgo indexes by
> combining two conditions in one query, thinking along the lines of what is
> often done in an RDBMS. A scan is expected in this case, and there is no
> magic to avoid it.
On Thu, Nov 17, 2011 at 1:08 PM, Maciej Miklas wrote:
> A) Skinny rows
> - row key contains login name - this is the main search criteria
> - login data is replicated - each possible login is stored as a single row
> which contains all user data - 10 logins for a single customer create 10
> rows, whe
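The "skinny rows" model quoted above can be sketched as a small helper; the names (`denormalize`, the field names) are my own illustration, not from the thread:

```python
# Hedged sketch of the "skinny rows" model: each login alias becomes its
# own row key, and the full user record is duplicated under every alias,
# so a lookup by any login is a single-row read with no secondary index.

def denormalize(logins, user_data):
    """Return one row per login alias, each holding the full user record."""
    return {login: dict(user_data) for login in logins}

rows = denormalize(
    ["alice", "alice@example.com", "a.smith"],
    {"customer_id": "42", "name": "Alice Smith"},
)
# 3 logins -> 3 rows, each a complete copy of the user data
assert len(rows) == 3
assert rows["a.smith"]["customer_id"] == "42"
```

The write path pays for the duplication; the read path gets a direct key lookup in exchange.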
On Wed, Nov 2, 2011 at 7:26 PM, David Jeske wrote:
> - make sure the summarizer does try to do it's job for a batch of counters
> until they are fully replicated and 'static' (no new increments will appear)
>
Apologies. That should read: make sure the summarizer *doesn't* try to do its
job for a batch of counters until they are fully replicated and 'static'.
I understand what you are thinking, Daniel, but this approach has at least
one big wrinkle: you would be introducing dependencies between compaction
and replication.
The 'unique' idempotent records are required for Cassandra to read-repair
properly. Therefore, if a compaction (or even a memtable f
You are answering your own question here. If you are running at 80% of
network bandwidth, you are saturating your network.
AFAIK, most distributed databases are running on gigabit, not 100 Mbit. I
recommend you upgrade your switch (and NICs if necessary). Gigabit is
insanely cheap now. In the extrem
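A back-of-envelope check of the bandwidth point above; the 80% figure comes from the message, the rest is illustrative arithmetic:

```python
# Sustained throughput at a given utilization of a link, in MB/s.
# Shows the headroom a gigabit link gives over 100 Mbit Fast Ethernet.

def usable_mbytes_per_sec(link_mbits, utilization=1.0):
    return link_mbits * utilization / 8  # bits -> bytes

fast_ethernet = usable_mbytes_per_sec(100, 0.80)   # ~10 MB/s at 80% load
gigabit       = usable_mbytes_per_sec(1000, 0.80)  # ~100 MB/s at the same load
assert gigabit == 10 * fast_ethernet
```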
If your summary data is frequently accessed, you will probably be best off
storing the two sets of data separately (either in separate column families
or with different key-prefixes). This will give you the greatest
cache-locality for your summary data, which you say is popular. If your
summary dat
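The key-prefix idea above can be sketched in a few lines; the `s:`/`r:` prefixes and function names are invented here for illustration:

```python
# Hedged sketch of separating popular summary data from bulky raw data by
# key prefix, so summaries cluster together on disk and in cache rather
# than being diluted by rarely-read raw records.

def summary_key(user_id):
    return f"s:{user_id}"

def raw_key(user_id, record_id):
    return f"r:{user_id}:{record_id}"

assert summary_key("42") == "s:42"
assert raw_key("42", "7") == "r:42:7"
# All "s:" keys sort adjacently, giving the summary data the
# cache-locality the message describes.
```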
On Wed, Oct 26, 2011 at 7:35 PM, Ben Gambley wrote:
> Our requirement is to store per user, many unique results (which is
> basically an attempt at some questions ..) so I had thought of having the
> userid as the row key and the result id as columns.
>
> The keys for the result ids are maintaine
On Tue, Oct 25, 2011 at 5:23 AM, Alexandru Sicoe wrote:
> At the moment I am partitioning the data in Cassandra in 75 CFs
You might consider not using so many column families. I am not a Cassandra
expert, but from what I've seen floated around, there is currently a unique
memtable, commit log,
>
>
> 2) If a single key, would adding a file/block/record-level encryption to
>> Cassandra solve this problem? If not, why not? Is there something
>> special about your encryption methods?
>>
>
> There is nothing special about our encryption methods but will never be
> able to encrypt or decrypt
On Tue, Oct 18, 2011 at 12:14 AM, Matthias Pfau wrote:
> we want to sort completely on the client-side (where the data is
> encrypted). But that requires an "insert at offset X" operation. We would
> always use CL QUORUM and client side synchronisation.
>
You can do "insert at offset X"... just
On Mon, Oct 17, 2011 at 2:39 AM, Matthias Pfau wrote:
> We would be very happy if cassandra would give us an option to maintain the
> sort order on our own (application logic). That is why it would be
> interesting to hear from any of the developers if it would be easily
> possible to add such a
Logically, whether you use Cassandra or not, there is some "physics" of
sorted-order structures which you should understand, and which dictates what
is possible.
In order to keep data sorted, a database needs to be able to see the proper
sort order of the data "all the time", not just at insertion or query
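The "physics" argument above can be made concrete with the standard library's `bisect`; the example is mine, not from the thread:

```python
# To insert into a sorted structure, the store must compare the new key
# against existing keys at write time -- it cannot honor a client-side
# "insert at offset X" without seeing the full current ordering.
import bisect

keys = [10, 20, 30, 40]
bisect.insort(keys, 25)          # position derived from key comparisons...
assert keys == [10, 20, 25, 30, 40]
# ...whereas keys.insert(2, 25) at a fixed offset is only correct if the
# client already knows the entire current ordering -- which, under
# concurrent writers, it generally cannot.
```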
After writing my message, I recognized a scenario you might be referring to,
Kevin.
If I understand correctly, you're not referring to set-membership in the
general sense, where one could add and remove entries. General
set-membership, in the context of eventual-consistency, requires timestamps.
Th
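The point above, that general set-membership with both adds and removes needs timestamps under eventual consistency, can be sketched as a last-write-wins set. This class is my illustration, not code from the thread:

```python
# Minimal last-write-wins set: each replica records the latest add and
# remove timestamp per element, so replicas converge to the same answer
# regardless of the order in which operations arrive.

class LWWSet:
    def __init__(self):
        self.adds = {}     # element -> timestamp of latest add
        self.removes = {}  # element -> timestamp of latest remove

    def add(self, elem, ts):
        self.adds[elem] = max(ts, self.adds.get(elem, ts))

    def remove(self, elem, ts):
        self.removes[elem] = max(ts, self.removes.get(elem, ts))

    def __contains__(self, elem):
        return self.adds.get(elem, -1) > self.removes.get(elem, -1)

s = LWWSet()
s.add("x", ts=1)
s.remove("x", ts=2)
s.add("x", ts=3)   # even delivered out of order on another replica,
assert "x" in s    # the timestamps decide which operation wins
```

Without the timestamps, a replica that saw the operations in a different order would disagree about membership forever.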
On Sat, Sep 3, 2011 at 8:26 PM, Kevin Burton wrote:
> The point is that replication in Cassandra only needs timestamps to handle
> out of order writes … for values that are idempotent, this isn't necessary.
> The order doesn't matter.
>
I believe this is a misunderstanding of how idempotency a
Thanks for all the great answers last week about Cassandra. I have an
additional question about Cassandra and columns/supercolumns. I had naively
assumed that columns and super-columns map to an internal row-key (like how
in Bigtable the indexed map is row/column-key/timestamp to data), but some
pe
> My point still applies though. Caching HFIle blocks on a single node
>> vs individual "dataums" on N nodes may not be more efficient. Thus
>> terms like "Slower" and "Less Efficient" could be very misleading.
>>
>
I seem to have missed this the first time around. Next time I correct the
summary I
This is my second attempt at a summary of Cassandra vs HBase consistency and
performance for an HBase-acceptable workload. These subtleties are tricky,
yet it's helpful for the community to understand them. I'm not trying to
state my own facts (or opinions) but merely summa
On Mon, Nov 22, 2010 at 2:44 PM, David Jeske wrote:
> On Mon, Nov 22, 2010 at 2:39 PM, Edward Capriolo wrote:
>
>> Return messages such as "your data was written to at least 1 node but
>> not enough to make your write-consistency count". Do not help the
>> si
On Mon, Nov 22, 2010 at 2:39 PM, Edward Capriolo wrote:
> Return messages such as "your data was written to at least 1 node but
> not enough to make your write-consistency count". Do not help the
> situation. As the client that writes the data would be aware of the
> inconsistency, but the other c
On Mon, Nov 22, 2010 at 11:52 AM, Todd Lipcon wrote:
> Not quite. The replica synchronization code is pretty messy, but basically
> it will take the longest replica that may have been synced, not a quorum.
>
> i.e the guarantee is that "if you successfully sync() data, it will be
> present after
On Mon, Nov 22, 2010 at 1:26 PM, Edward Capriolo wrote:
> For cassandra all writes must be transmitted to all replicas.
>
I thought that was only true if you set the number of replicas required for
the write to the same as the number of replicas.
Further, we've established in this thread that ev
>
> 2) Cassandra has a less efficient memory footprint for data pinned in
> memory (or cached). With 3 replicas on Cassandra, each element of data
> pinned in memory is kept in memory on 3 servers, whereas in HBase only
> region servers keep the data in memory, so there is only one copy of
> each data e
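The arithmetic behind the quoted claim is simple to write down; note a later reply in the thread contests whether this actually makes Cassandra less efficient overall, and the numbers here are purely illustrative:

```python
# With replication factor 3 and every replica caching the data it serves,
# a hot working set costs roughly 3x the cluster RAM that a single-copy
# cache (the HBase region-server model described) would.

hot_set_gb = 100
cassandra_cached_gb = hot_set_gb * 3   # every replica pins its own copy
hbase_cached_gb = hot_set_gb * 1       # only the serving region server
assert cassandra_cached_gb == 3 * hbase_cached_gb
```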
I already noticed a mistake in my own facts...
On Mon, Nov 22, 2010 at 10:01 AM, David Jeske wrote:
> *4) Cassandra (N3/W3/R1) takes longer to allow data to become writable
> again in the face of a node-failure than HBase/HDFS.* Cassandra must
> repair the keyrange to bring N from
I haven't used either Cassandra or HBase, so please don't take any part of
this message as me attempting to state facts about either system. However,
I'm very familiar with data-storage design details, and I've worked
extensively optimizing applications running on MySQL, Oracle, BerkeleyDB
(including