Re: Best way to do a multi_get using CQL

Marcelo Elias Del Valle Thu, 19 Jun 2014 19:38:18 -0700

This is interesting, I didn't know that!
It might make sense, then, to use SELECT with = plus async queries plus a
token-aware driver. I will try to change my code.
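
For reference, a minimal sketch of what that change might look like with the
Python driver, assuming the entity_lookup table quoted below. The function
name lookup_entity_ids is made up for illustration, and the session is
assumed to come from a Cluster configured with a token-aware load balancing
policy. As far as I understand, token-aware routing needs the routing key,
which the driver can work out for prepared statements, so the query is
prepared here instead of being built with string formatting:

```python
def lookup_entity_ids(session, identifiers):
    """Run one single-partition SELECT per (name, value) pair and merge the rows.

    `session` is assumed to be a connected driver session (hypothetical setup);
    `identifiers` maps each name to a list of values, as in the code quoted below.
    """
    # Prepared once, so each bound query carries its own partition key.
    prepared = session.prepare(
        "SELECT entity_id FROM entity_lookup WHERE name = ? AND value = ?"
    )

    # Fire every query before waiting on any of them.
    futures = [
        session.execute_async(prepared, (name, value))
        for name, values in identifiers.items()
        for value in values
    ]

    # Collect results; result() blocks until that query has completed.
    entity_ids = set()
    for future in futures:
        for row in future.result():
            entity_ids.add(row.entity_id)
    return entity_ids
```

Each (name, value) pair is a full partition key of entity_lookup, so every
query here touches exactly one partition instead of one IN query spanning
several.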


But would it be a "recommended solution" for these cases? Any other options?

I still wonder whether this is the right use case for Cassandra: looking up
random keys in a huge cluster. After all, the number of connections to
Cassandra will still be huge, right? Wouldn't that be a problem?
Or does the driver reuse the connection when you use async?

[]s


2014-06-19 22:16 GMT-03:00 Jonathan Haddad <j...@jonhaddad.com>:

> If you use async and your driver is token aware, it will go to the
> proper node, rather than requiring the coordinator to do so.
>
> Realistically you're going to have a connection open to every server
> anyways.  It's the difference between you querying for the data
> directly and using a coordinator as a proxy.  It's faster to just ask
> the node with the data.
>
> On Thu, Jun 19, 2014 at 6:11 PM, Marcelo Elias Del Valle
> <marc...@s1mbi0se.com.br> wrote:
> > But wouldn't using async queries be even worse than using SELECT IN?
> > The justification in the docs is that I could query many nodes, but I
> > would still do that.
> >
> > Today, I use both async queries AND SELECT IN:
> >
> > SELECT_ENTITY_LOOKUP = ("SELECT entity_id FROM " + ENTITY_LOOKUP +
> >                         " WHERE name=%s and value in(%s)")
> >
> > futures = []  # (readable query text, response future) pairs
> > for name, values in identifiers.items():
> >    # one %s placeholder per value in the IN list
> >    query = self.SELECT_ENTITY_LOOKUP % ('%s', ','.join(['%s']*len(values)))
> >    args = [name] + values
> >    query_msg = query % tuple(args)
> >    futures.append((query_msg, self.session.execute_async(query, args)))
> >
> > entity_ids = set()
> > for query_msg, future in futures:
> >    try:
> >       rows = future.result(timeout=100000)
> >       for row in rows:
> >          entity_ids.add(row.entity_id)
> >    except Exception:
> >       logging.error("Query '%s' returned ERROR", query_msg)
> >       raise
> >
> > Using async with only SELECT = would mean that instead of 1 async query
> > (example: in (0, 1, 2)), I would issue several, one for each value of the
> > "values" array above.
> > In my head, this would mean more connections to Cassandra and the same
> > amount of work, right? What would be the advantage?
> >
> > []s
> >
> >
> >
> >
> > 2014-06-19 22:01 GMT-03:00 Jonathan Haddad <j...@jonhaddad.com>:
> >
> >> Your other option is to fire off async queries.  It's pretty
> >> straightforward w/ the java or python drivers.
> >>
> >> On Thu, Jun 19, 2014 at 5:56 PM, Marcelo Elias Del Valle
> >> <marc...@s1mbi0se.com.br> wrote:
> >> > I was taking a look at Cassandra anti-patterns list:
> >> >
> >> >
> >> >
> >> > http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architecturePlanningAntiPatterns_c.html
> >> >
> >> > Among them is
> >> >
> >> > SELECT ... IN or index lookups
> >> >
> >> > "SELECT ... IN and index lookups (formerly secondary indexes) should be
> >> > avoided except for specific scenarios. See When not to use IN in SELECT
> >> > and When not to use an index in Indexing in CQL for Cassandra 2.0"
> >> >
> >> > And looking at the SELECT doc, I saw:
> >> >
> >> > When not to use IN
> >> >
> >> > "The recommendations about when not to use an index apply to using IN in
> >> > the WHERE clause. Under most conditions, using IN in the WHERE clause is
> >> > not recommended. Using IN can degrade performance because usually many
> >> > nodes must be queried. For example, in a single, local data center cluster
> >> > having 30 nodes, a replication factor of 3, and a consistency level of
> >> > LOCAL_QUORUM, a single key query goes out to two nodes, but if the query
> >> > uses the IN condition, the number of nodes being queried are most likely
> >> > even higher, up to 20 nodes depending on where the keys fall in the token
> >> > range."
> >> >
> >> > In my system, I have a column family called "entity_lookup":
> >> >
> >> > CREATE KEYSPACE IF NOT EXISTS Identification1
> >> >   WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy',
> >> >   'DC1' : 3 };
> >> > USE Identification1;
> >> >
> >> > CREATE TABLE IF NOT EXISTS entity_lookup (
> >> >   name varchar,
> >> >   value varchar,
> >> >   entity_id uuid,
> >> >   PRIMARY KEY ((name, value), entity_id));
> >> >
> >> > And I use the following select to query it:
> >> >
> >> > SELECT entity_id FROM entity_lookup WHERE name=%s and value in(%s)
> >> >
> >> > Is this an anti-pattern?
> >> >
> >> > If not using SELECT IN, which other way would you recommend for lookups
> >> > like that? I have several values I would like to search for in Cassandra,
> >> > and they might not be in the same partition, as above.
> >> >
> >> > Is Cassandra the wrong tool for lookups like that?
> >> >
> >> > Best regards,
> >> > Marcelo Valle.
> >> >
> >>
> >>
> >>
> >> --
> >> Jon Haddad
> >> http://www.rustyrazorblade.com
> >> skype: rustyrazorblade
> >
> >
>
>
>
> --
> Jon Haddad
> http://www.rustyrazorblade.com
> skype: rustyrazorblade
>
