Hello Jay
Your query is : select * from keyspaceuser.company_testusers where
lastname = 'lau' LIMIT 1
Why do you think that the slowness is due to vnodes and not your query
asking for 10 000 results ?
On Fri, Sep 19, 2014 at 3:33 AM, Jay Patel pateljay3...@gmail.com wrote:
Hi there,
We
Keep in mind that secondary indexes in Cassandra are not there to improve
performance, or even really to be used in a serious user-facing manner.
Build and maintain your own view of the data; it'll be much faster.
On Thu, Sep 18, 2014 at 6:33 PM, Jay Patel pateljay3...@gmail.com wrote:
Hi there,
We
Agreed. We only use secondary indexes for column families that are relatively
small (~5k rows). For anything larger, we store the data into a wide row (but
this depends on your data model)
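A minimal sketch of the "build your own view" idea from this thread, modeled in plain Python rather than against a real cluster. Every write to the base table also updates a lookup table keyed by the queried column, so a read by last name touches one entry instead of scanning an index on every node. The names `UserStore`, `users` and `by_lastname` are hypothetical and only illustrate the pattern.

```python
from collections import defaultdict


class UserStore:
    """Toy model of a base table plus a manually maintained lookup table."""

    def __init__(self):
        self.users = {}                      # user_id -> row (the base table)
        self.by_lastname = defaultdict(set)  # lastname -> user_ids (the lookup table)

    def upsert(self, user_id, lastname, email):
        old = self.users.get(user_id)
        if old is not None and old["lastname"] != lastname:
            # The indexed value changed: drop the stale lookup entry first.
            self.by_lastname[old["lastname"]].discard(user_id)
        self.users[user_id] = {"lastname": lastname, "email": email}
        self.by_lastname[lastname].add(user_id)

    def find_by_lastname(self, lastname, limit=None):
        # .get() avoids creating empty entries for names that were never written.
        ids = sorted(self.by_lastname.get(lastname, ()))
        if limit is not None:
            ids = ids[:limit]
        return [self.users[i] for i in ids]
```

The cost is that the application owns consistency between the two tables, which is why the sketch removes the old lookup entry whenever the last name changes.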
-Original Message-
From: jonathan.had...@gmail.com [mailto:jonathan.had...@gmail.com] On Behalf
Hey all,
I tried googling around to get an idea about what was new (and potentially
cool) in the newest release of cassandra - 2.1.0.
But all that I've been able to find so far is this kind of general
statement about the new features.
Hello Tim
From this blog (http://www.datastax.com/dev/blog/whats-new-in-cassandra-2-1)
you should find the pointers to other big topics of 2.1
On Fri, Sep 19, 2014 at 3:33 PM, Tim Dunphy bluethu...@gmail.com wrote:
Hey all,
I tried googling around to get an idea about what was new (and
Thanks I'll check that out! Really appreciate that!
On Fri, Sep 19, 2014 at 10:07 AM, DuyHai Doan doanduy...@gmail.com wrote:
Hello Tim
From this blog (
http://www.datastax.com/dev/blog/whats-new-in-cassandra-2-1) you should
find the pointers to other big topics of 2.1
On Fri, Sep 19,
I am trying to use wide rows concept in my data modelling design for
Cassandra. We are using Cassandra 2.0.6.
CREATE TABLE test_data (
test_id int,
client_name text,
record_data text,
creation_date timestamp,
last_modified_date timestamp,
PRIMARY KEY
Hello,
Yes, this is a wide row table design. The first col is your Partition
Key. The remaining 2 cols are clustering cols. You will receive ordered
result sets based on client_name, record_data when running that query.
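As a rough illustration of the layout described above, here is a toy Python model, not Cassandra code. It assumes the primary key is test_id as the partition key followed by client_name and record_data as clustering columns, as this reply states. The class name `WideRowTable` is hypothetical.

```python
import bisect
from collections import defaultdict


class WideRowTable:
    """Toy model: one partition per test_id, rows kept sorted by clustering key."""

    def __init__(self):
        # test_id -> list of (client_name, record_data) tuples in sorted order
        self.partitions = defaultdict(list)

    def insert(self, test_id, client_name, record_data):
        key = (client_name, record_data)
        rows = self.partitions[test_id]
        pos = bisect.bisect_left(rows, key)
        # A row with the same full primary key overwrites instead of duplicating.
        if pos == len(rows) or rows[pos] != key:
            rows.insert(pos, key)

    def select(self, test_id):
        # Rows come back already ordered by client_name, then record_data.
        return list(self.partitions.get(test_id, []))
```

A partition becomes "wide" when many (client_name, record_data) pairs share the same test_id, which is why the answer depends on cardinality.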
Jonathan
Jonathan Lacefield
Solution
Does my above table fall under the category of wide rows in Cassandra or
not? -- It depends on the cardinality. For each distinct test_id, how
many combinations of client_name/record_data do you have?
By the way, why did you put record_data in the primary key?
In your table partition
@DuyHai - I have put that because of this condition -
In this table, we can have multiple record_data for same client_name.
It can be multiple combinations of client_name and record_data for each
distinct test_id.
On Fri, Sep 19, 2014 at 8:48 AM, DuyHai Doan doanduy...@gmail.com wrote:
Does
Ahh yes, sorry, I read too fast, missed it.
On Fri, Sep 19, 2014 at 5:54 PM, Check Peck comptechge...@gmail.com wrote:
@DuyHai - I have put that because of this condition -
In this table, we can have multiple record_data for same client_name.
It can be multiple combinations of client_name
Jon's advice is definitely still true, but in 2.1 there is
https://issues.apache.org/jira/browse/CASSANDRA-1337, which parallelizes
the fetching of ranges.
On Fri, Sep 19, 2014 at 6:57 AM, Parag Patel ppa...@clearpoolgroup.com
wrote:
Agreed. We only use secondary indexes for column families
Thanks folks for all your input! Yes, I totally agree that we need to have
a custom column family for indexing. However, we're trying to upgrade our
existing cluster from non-vnode to vnode, and queries using secondary
indexes, which performed well without vnodes, now break badly.
Btw, there is no
Hi Kevin, if you are using the latest version of opscenter, then even the
community (= free) edition can do a rolling restart of your cluster. It's
pretty convenient.
We’re using ansible so I’d like something that integrates with that…
On Tue, Sep 16, 2014 at 11:09 AM, Duncan Sands
This is great feedback…
I think it could actually be even easier than this…
You could have an ansible (or whatever cluster management system you’re
using) role for just seeds.
Then you would serially restart all seeds one at a time. You would need to
run ‘nodetool status’ and make sure the
We run DSE 3.1.3 and use only Cassandra in our prod cluster.
Which release do I need to be on right away? If I need to
upgrade to DSE 4.5.c* 2.0.7, I need to take 3 paths to get there. I see a lot
of improvements for Solr/Hadoop features in DSE 4.0 and above.
Can I upgrade to
Depending on the consistency level you query at (ONE or QUORUM), you might be
able to restart one rack at a time (or AZ, or whatever you've got), assuming
your snitch is set up right.
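The reasoning behind the rack-at-a-time suggestion can be sketched as a small check. This assumes a rack-aware snitch that places at most `replicas_per_rack` replicas of any range in a single rack (for example one replica per rack with a replication factor of 3 across 3 racks). The function names are hypothetical; the quorum formula (replication factor divided by two, rounded down, plus one) is the standard one.

```python
def replicas_required(consistency_level, replication_factor):
    """Number of replicas that must respond for a request to succeed."""
    if consistency_level == "ONE":
        return 1
    if consistency_level == "QUORUM":
        return replication_factor // 2 + 1
    raise ValueError("unsupported consistency level: %s" % consistency_level)


def can_restart_rack(consistency_level, replication_factor, replicas_per_rack=1):
    """True if requests still succeed while one whole rack is down.

    Assumption: no rack holds more than replicas_per_rack replicas of a range.
    """
    alive = replication_factor - replicas_per_rack
    return alive >= replicas_required(consistency_level, replication_factor)
```

For example, `can_restart_rack("QUORUM", 3)` is true because two of three replicas stay up, while `can_restart_rack("QUORUM", 2)` is false.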
On Sep 19, 2014, at 11:30 AM, Kevin Burton bur...@spinn3r.com wrote:
This is great feedback…
I think it could actually be even easier
On Fri, Sep 19, 2014 at 1:26 PM, Kevin Burton bur...@spinn3r.com wrote:
We’re using ansible so I’d like something that integrates with that…
I'm not familiar with Ansible, so I don't know if it's useful, but
OpsCenter has a REST api you can use to do anything you can do from the
UI. For
Hey all,
I'm attempting to upgrade from cassandra 2.0.10 to version 2.1.0.
However when launching the new version I'm running into the following:
[root@beta-new:/etc/alternatives/cassandrahome] #./bin/cassandra -f
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in
java.lang.NoSuchMethodError -- Seems like there is inconsistency with
your jar dependencies
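One way to look for such an inconsistency is to check whether the same library appears on the classpath in more than one version. The sketch below is a generic helper, not part of Cassandra; it assumes jar files follow the common `name-version.jar` naming pattern, and the function name `find_conflicting_jars` is hypothetical.

```python
import re
from collections import defaultdict

# Assumes "name-version.jar", where the version starts with a digit.
JAR_PATTERN = re.compile(r"^(?P<name>.+?)-(?P<version>\d[\w.\-]*)\.jar$")


def find_conflicting_jars(filenames):
    """Return {library_name: [versions]} for libraries present in several versions."""
    versions = defaultdict(set)
    for filename in filenames:
        match = JAR_PATTERN.match(filename)
        if match:
            versions[match.group("name")].add(match.group("version"))
    return {name: sorted(found) for name, found in versions.items() if len(found) > 1}
```

To scan a real installation, pass it the result of `os.listdir()` on the directory that holds the jars; leftovers from the previous version in that directory would show up as conflicts.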
On Fri, Sep 19, 2014 at 11:05 PM, Tim Dunphy bluethu...@gmail.com wrote:
Hey all,
I'm attempting to upgrade from cassandra 2.0.10 to version 2.1.0.
However when launching the new version I'm
On Fri, Sep 19, 2014 at 12:41 PM, Jay Patel pateljay3...@gmail.com wrote:
Btw, there is no data in the table. Table is empty. Query is fired on the
empty table.
This is actually the worst case for secondary index lookups.
From the tracing output, I don't understand why it's doing multiple
If we have hundreds of CQL clients (for C* 2.0.9), should we increase
native_transport_max_threads in cassandra.yaml from the default (128) to the
number of clients? If we don't do that, I presume requests will queue up,
resulting in higher latency. What's a reasonable max value for
It will merge requests to neighboring ranges when the same node is a
replica for both of them. Without vnodes, this usually results in all
ranges for a node being merged. With vnodes, merging still happens, but
not all ranges can be merged. --
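The merging described above can be pictured with a toy model. This is a simplification and not Cassandra's actual implementation: each token range is tagged with the single node chosen to serve it, and neighboring ranges are merged only when they are contiguous and served by the same node. The function name and the token values are hypothetical.

```python
def merge_neighboring_ranges(ranges):
    """Merge contiguous ranges served by the same node.

    ranges: list of (start_token, end_token, node), sorted by start_token.
    Returns the list of range requests the coordinator would send in this model.
    """
    merged = []
    for start, end, node in ranges:
        if merged and merged[-1][2] == node and merged[-1][1] == start:
            # Same node and contiguous tokens: extend the previous request.
            merged[-1] = (merged[-1][0], end, node)
        else:
            merged.append((start, end, node))
    return merged


# Without vnodes, a node's ranges sit next to each other on the ring.
no_vnodes = [(0, 10, "A"), (10, 20, "A"), (20, 30, "B"), (30, 40, "B")]
# With vnodes, small ranges from different nodes are interleaved.
with_vnodes = [(0, 10, "A"), (10, 20, "B"), (20, 30, "A"), (30, 40, "B")]
```

In this model the first layout collapses to two requests while the interleaved layout stays at four, which matches the point that merging still happens with vnodes but cannot cover every range.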
But does it imply that with vnodes, there is
On Fri, Sep 19, 2014 at 4:19 PM, DuyHai Doan doanduy...@gmail.com wrote:
But does it imply that with vnodes, there is actually extra work to
do for scanning indices?
Yes.
If yes, is this extra load I/O bound or CPU bound?
It doesn't necessarily change what the query is
My company is using an RDBMS for storing time-series data. This application
was developed before Cassandra and NoSQL. I'd like to move to C*, but ...
The application supports data coming from multiple models of devices.
Because there is enough variability in the data, the main table to hold the
On Fri, Sep 19, 2014 at 2:19 PM, DuyHai Doan doanduy...@gmail.com wrote:
But does it imply that with vnodes, there is actually extra work to
do for scanning indices?
Vnodes are just nodes, so they have all the
problems-associated-with-many-nodes one would get with 256x as many nodes.
Thanks Tyler for the details. I'm still trying to understand what you
described.
Just to simplify my question what I don't understand:
When the coordinator fires an indexed scan request to node 192.168.51.22, why
doesn't it ask that node to check all of its (at least primary) ranges for
the queried
I have a question about the steps listed in this article for addressing
CASSANDRA-4411 https://issues.apache.org/jira/browse/CASSANDRA-4411 in an
upgrade from a version = 1.1.3 or to a version = 1.1.5 when using leveled
compaction: http://www.datastax.com/docs/1.1/install/upgrading#upgrade-steps
Thanks Robert for your input, but that sounds a little crazy to me. The
physical node is still the same, so why can't it just do one indexed scan for
all the contiguous or non-contiguous token ranges (vnodes) held by that
physical node? I doubt that it needs to respect token order for some
reason hence
On Fri, Sep 19, 2014 at 4:53 PM, Jay Patel pateljay3...@gmail.com wrote:
When the coordinator fires an indexed scan request to node 192.168.51.22, why
doesn't it ask that node to check all of its (at least primary) ranges for
the queried data at once? Also, internally that node should be able to
Thanks Tyler for the clarification. I've opened a ticket, CASSANDRA-7982
https://issues.apache.org/jira/browse/CASSANDRA-7982. For now, I've
assigned it to myself and put you as a reviewer. Please change the assignment
as you prefer.
Assume that we now batch the requests and send only one request to the
replica:
Most of the C* success stories are for greenfield applications.
Migrating from one database to another database is a lot of work. C* offers no
magical path.
If you only have a few tables and minor RDBMS feature dependencies, it can be
done.
Make sure your users and QA people are cooperative
Kevin: The serial approach would take a LONG time for large clusters.
If you have sixty nodes, it could take an hour to do a rolling restart.
1) In Cassandra land, an hour is nothing. There's people doing repairs that
practically never finish - as soon as one finishes after a week, they have
I'll be blunt. The reason to use the latest 2.0 or soon 2.1 is because
Apple has committed 20 patches that make Cassandra
operationally useful. Apple is the QA lab for Cassandra.
Their conference talk was very exciting. I hope a video of that
gets posted in October.
Thanks, James Briggs.
--
Start by asking how you intend to query the data. That should drive the data
model.
Is there existing app client code or an app layer that is already using the
current schema, or are you intending to rewrite that as well?
FWIW, you could place the numeric columns in a numeric map collection,