This is interesting, I didn't know that! It might make sense, then, to use SELECT with = + async + a token-aware driver. I will try to change my code.
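Here is a minimal sketch of that change. It assumes the DataStax Python driver (cassandra-driver) and the entity_lookup table defined further down the thread. Each value gets its own single-partition SELECT ... WHERE name = ? AND value = ?, and every query is sent with execute_async before any result is read. The names lookup_entity_ids, connect and SELECT_ONE are hypothetical, and so is the identifiers shape (a dict of name to list of values, mirroring the quoted code). The query is prepared because, as I understand the Python driver, TokenAwarePolicy can only route a statement whose routing key it knows. Bound prepared statements carry that key; plain %s query strings do not, so they fall back to the child policy.

```python
import logging

# Hypothetical single-partition version of SELECT_ENTITY_LOOKUP:
# (name, value) is the full partition key of entity_lookup, so each
# query reads exactly one partition.
SELECT_ONE = "SELECT entity_id FROM entity_lookup WHERE name = ? AND value = ?"


def lookup_entity_ids(session, identifiers):
    """Fan out one query per (name, value) pair and merge the entity ids.

    `session` is a cassandra.cluster.Session (or anything with `prepare` and
    `execute_async`); `identifiers` maps a name to a list of values.
    """
    # Bound prepared statements carry a routing key, which is what lets
    # TokenAwarePolicy send each request straight to a replica.
    statement = session.prepare(SELECT_ONE)

    # Send every request before waiting on any of them.
    futures = []
    for name, values in identifiers.items():
        for value in values:
            key = (name, value)
            futures.append((key, session.execute_async(statement, key)))

    # Collect results; any failed lookup is logged and re-raised.
    entity_ids = set()
    for key, future in futures:
        try:
            for row in future.result():
                entity_ids.add(row.entity_id)
        except Exception:
            logging.exception("Lookup for %r failed", key)
            raise
    return entity_ids


def connect(contact_points, keyspace, local_dc):
    """Hypothetical helper: a session whose requests are routed token-aware."""
    # Imported here so lookup_entity_ids above has no hard dependency on the driver.
    from cassandra.cluster import Cluster
    from cassandra.policies import DCAwareRoundRobinPolicy, TokenAwarePolicy

    cluster = Cluster(
        contact_points,
        load_balancing_policy=TokenAwarePolicy(
            DCAwareRoundRobinPolicy(local_dc=local_dc)
        ),
    )
    return cluster, cluster.connect(keyspace)
```

Usage would be roughly `cluster, session = connect(["127.0.0.1"], "identification1", "DC1")` followed by `lookup_entity_ids(session, identifiers)`. The contact point and data center name are placeholders. On the connection question: the driver keeps a pool of connections per host and multiplexes many in-flight requests over each one. Firing more async queries therefore does not mean opening more connections.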
But would it be a "recommended solution" for these cases? Are there other options?
I still wonder whether this is the right use case for Cassandra: looking up random keys in a huge cluster. After all, the number of connections to Cassandra will still be huge, right? Wouldn't that be a problem? Or does the driver reuse connections when you use async?

[]s

2014-06-19 22:16 GMT-03:00 Jonathan Haddad <j...@jonhaddad.com>:

> If you use async and your driver is token aware, it will go to the
> proper node, rather than requiring the coordinator to do so.
>
> Realistically you're going to have a connection open to every server
> anyway. It's the difference between you querying for the data
> directly and using a coordinator as a proxy. It's faster to just ask
> the node with the data.
>
> On Thu, Jun 19, 2014 at 6:11 PM, Marcelo Elias Del Valle
> <marc...@s1mbi0se.com.br> wrote:
> > But wouldn't using async queries be even worse than using SELECT IN?
> > The justification in the docs is that IN could query many nodes, but
> > with async I would still be doing that.
> >
> > Today, I use both async queries AND SELECT IN:
> >
> > SELECT_ENTITY_LOOKUP = "SELECT entity_id FROM " + ENTITY_LOOKUP + " WHERE name=%s and value in(%s)"
> >
> > futures = []
> > entity_ids = set()
> > for name, values in identifiers.items():
> >     query = self.SELECT_ENTITY_LOOKUP % ('%s', ','.join(['%s']*len(values)))
> >     args = [name] + values
> >     query_msg = query % tuple(args)
> >     futures.append((query_msg, self.session.execute_async(query, args)))
> >
> > for query_msg, future in futures:
> >     try:
> >         rows = future.result(timeout=100000)
> >         for row in rows:
> >             entity_ids.add(row.entity_id)
> >     except Exception:
> >         logging.error("Query '%s' returned ERROR " % (query_msg))
> >         raise
> >
> > Using async with just SELECT = would mean that instead of 1 async query
> > (for example, in (0, 1, 2)), I would run several, one for each value of
> > the "values" array above.
> > In my head, this would mean more connections to Cassandra and the same
> > amount of work, right? What would be the advantage?
> >
> > []s
> >
> > 2014-06-19 22:01 GMT-03:00 Jonathan Haddad <j...@jonhaddad.com>:
> >
> >> Your other option is to fire off async queries. It's pretty
> >> straightforward w/ the Java or Python drivers.
> >>
> >> On Thu, Jun 19, 2014 at 5:56 PM, Marcelo Elias Del Valle
> >> <marc...@s1mbi0se.com.br> wrote:
> >> > I was taking a look at the Cassandra anti-patterns list:
> >> >
> >> > http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architecturePlanningAntiPatterns_c.html
> >> >
> >> > Among them is:
> >> >
> >> > "SELECT ... IN or index lookups
> >> >
> >> > SELECT ... IN and index lookups (formerly secondary indexes) should be
> >> > avoided except for specific scenarios. See When not to use IN in SELECT
> >> > and When not to use an index in Indexing in CQL for Cassandra 2.0."
> >> >
> >> > And looking at the SELECT doc, I saw:
> >> >
> >> > "When not to use IN
> >> >
> >> > The recommendations about when not to use an index apply to using IN in
> >> > the WHERE clause. Under most conditions, using IN in the WHERE clause is
> >> > not recommended. Using IN can degrade performance because usually many
> >> > nodes must be queried. For example, in a single, local data center
> >> > cluster having 30 nodes, a replication factor of 3, and a consistency
> >> > level of LOCAL_QUORUM, a single key query goes out to two nodes, but if
> >> > the query uses the IN condition, the number of nodes being queried are
> >> > most likely even higher, up to 20 nodes depending on where the keys
> >> > fall in the token range."
> >> >
> >> > In my system, I have a column family called "entity_lookup":
> >> >
> >> > CREATE KEYSPACE IF NOT EXISTS Identification1
> >> >   WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy',
> >> >   'DC1' : 3 };
> >> > USE Identification1;
> >> >
> >> > CREATE TABLE IF NOT EXISTS entity_lookup (
> >> >   name varchar,
> >> >   value varchar,
> >> >   entity_id uuid,
> >> >   PRIMARY KEY ((name, value), entity_id));
> >> >
> >> > And I use the following select to query it:
> >> >
> >> > SELECT entity_id FROM entity_lookup WHERE name=%s and value in(%s)
> >> >
> >> > Is this an anti-pattern?
> >> >
> >> > If not using SELECT IN, which other way would you recommend for lookups
> >> > like that? I have several values I would like to search for in Cassandra,
> >> > and they might not be in the same partition, as above.
> >> >
> >> > Is Cassandra the wrong tool for lookups like that?
> >> >
> >> > Best regards,
> >> > Marcelo Valle.
> >>
> >> --
> >> Jon Haddad
> >> http://www.rustyrazorblade.com
> >> skype: rustyrazorblade
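The documentation excerpt quoted in the thread gives 2 nodes for a single key and up to 20 for IN on a 30-node, RF 3, LOCAL_QUORUM cluster. Below is a toy model of that arithmetic built on simplified assumptions: one token per node, evenly spaced made-up tokens, SimpleStrategy-style placement on the next RF nodes clockwise, no vnodes or racks, and a coordinator that always reads from the first quorum of replicas. It does not reproduce how Cassandra hashes keys (Murmur3) or how it picks replicas (the dynamic snitch). The function names are hypothetical.

```python
import bisect


def quorum(rf):
    """Replicas needed for (LOCAL_)QUORUM: a majority of the replication factor."""
    return rf // 2 + 1


def replicas(token, ring, rf):
    """Indexes of the rf nodes holding `token` in a toy, single-token-per-node ring.

    `ring` is a sorted list of node tokens. The first node whose token is >= the
    key's token owns it (wrapping around the ring), and the next rf - 1 nodes
    clockwise hold the other copies (SimpleStrategy-style placement, assumed).
    """
    start = bisect.bisect_left(ring, token) % len(ring)
    return [(start + i) % len(ring) for i in range(rf)]


def nodes_contacted(key_tokens, ring, rf):
    """Distinct nodes read when every key is served by a quorum of its replicas.

    Simplification: always takes the first quorum(rf) replicas of each key; a
    real coordinator picks replicas by latency, so real counts can vary.
    """
    touched = set()
    for token in key_tokens:
        touched.update(replicas(token, ring, rf)[: quorum(rf)])
    return len(touched)
```

For example, with `ring = list(range(0, 3000, 100))` (30 nodes), `nodes_contacted([50], ring, 3)` is 2. Ten keys spread one every 300 tokens (`[50 + 300 * i for i in range(10)]`) touch 20 distinct nodes, while ten keys clustered next to each other touch only 2. Roughly the same replica reads happen whether those keys go in one IN query or in ten token-aware async queries. As Jon points out, what changes is which node coordinates them.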