What Happened To Alternate Storage And Rocksandra?

2021-03-12 Thread Gareth Collins
Hi,

I remember a couple of years ago there was some noise about Rocksandra
(Cassandra using rocksdb for storage) and opening up Cassandra to alternate
storage mechanisms.

I haven't seen anything about it for a while now though. The last commit to
Rocksandra on github was in Nov 2019. The associated JIRA items
(CASSANDRA-13474 and CASSANDRA-13476) haven't had any activity since 2019
either.

I was wondering whether anyone knew anything about it. Was it decided that
this wasn't a good idea after all (the alleged performance differences
weren't worth it...or were exaggerated)? Or is it just that it still may be
a good idea, but there are no resources available to make this happen (e.g.
perhaps the original sponsor moved onto other things)?

I ask because I was looking at RocksDB/Kafka Streams for another project
(which may replace some functionality which currently uses Cassandra)...and
was wondering if there could be some important info about RocksDB I may be
missing.

thanks in advance,
Gareth Collins


Re: Stumped By Cassandra delays

2018-07-22 Thread Gareth Collins
Hi Shalom,

Thanks very much for the response!

We only use batches within a single Cassandra partition, to improve
performance. Batches are NEVER used across Cassandra partitions in this app.
And if you look at the trace messages I showed, there is only one statement
per batch anyway.

In fact, what I see in the trace is that the responses to the writes may be
held up by the reads. Here is a more complete example, which is consistent
across nodes. We are using DataStax client 3.1.2. Note that all the
requests appear to be processed on nio-worker-5, which suggests that
this may all be on one connection
(even though I can see two connections to each C* server from each client):



*2018-07-20 05:32:43,185 [luster1-nio-worker-5] [  ] [
 ] [] ( core.QueryLogger.SLOW) DEBUG   -
[cluster1] [/10.123.4.52:9042] Query too slow, took 9322 ms: [2 bound
values] select a, b, c, d from  where token(a)>? and
token(a)<=?; << slow read
2018-07-20 05:32:43,185 [luster1-nio-worker-5] [  ] [
 ] [] ( core.QueryLogger.SLOW) DEBUG   -
[cluster1] [/10.123.4.52:9042] Query too slow, took 5950 ms: [1
statements, 6 bound values] BEGIN BATCH INSERT INTO  (a, b, c,
d, e) VALUES (?, ?, ?, ?, ?) using ttl ?; APPLY BATCH; << write response
received immediately after the read
2018-07-20 05:32:43,185 [luster1-nio-worker-5] [  ] [
 ] [] ( core.QueryLogger.SLOW) DEBUG   -
[cluster1] [/10.123.4.52:9042] Query too slow, took 511 ms: [1
statements, 6 bound values] BEGIN BATCH INSERT INTO  (a, b, c,
d, e) VALUES (?, ?, ?, ?, ?) using ttl ?; APPLY BATCH; << write response
received immediately after the read*
2018-07-20 05:32:43,607 [luster1-nio-worker-5] [  ] [
 ] [] (   core.QueryLogger.NORMAL) DEBUG   -
[cluster1] [/10.123.4.52:9042] Query completed normally, took 33 ms: [2
bound values] select CustomerID, ds_, data_, AudienceList from
data.customer_b01be157931bcbfa32b7f240a638129d where token(CustomerID)>?
and token(CustomerID)<=?; << normal read
2018-07-20 05:32:45,938 [luster1-nio-worker-5] [  ] [
 ] [] ( core.QueryLogger.SLOW) DEBUG   -
[cluster1] [/10.123.4.52:9042] Query too slow, took 1701 ms: [2 bound
values] select a, b, c, d from  where token(a)>? and
token(a)<=?; << slow read
2018-07-20 05:32:46,257 [luster1-nio-worker-5] [  ] [
 ] [] (   core.QueryLogger.NORMAL) DEBUG   -
[cluster1] [/10.123.4.52:9042] Query completed normally, took 0 ms: [1
statements, 6 bound values] BEGIN BATCH INSERT INTO  (a, b, c,
d, e) VALUES (?, ?, ?, ?, ?) using ttl ?; APPLY BATCH; << normal write – no
overlap with the read
2018-07-20 05:32:46,336 [luster1-nio-worker-5] [  ] [
 ] [] (   core.QueryLogger.NORMAL) DEBUG   -
[cluster1] [/10.123.4.52:9042] Query completed normally, took 30 ms: [2
bound values] select a, b, c, d from  where token(a)>? and
token(a)<=?; << normal read

*2018-07-20 05:32:48,622 [luster1-nio-worker-5] [  ] [
 ] [] ( core.QueryLogger.SLOW) DEBUG   -
[cluster1] [/10.123.4.52:9042] Query too slow, took 1626 ms: [2 bound
values] select a, b, c, d from  where token(a)>? and
token(a)<=?; << slow read
2018-07-20 05:32:48,622 [luster1-nio-worker-5] [  ] [
 ] [] ( core.QueryLogger.SLOW) DEBUG   -
[cluster1] [/10.123.4.52:9042] Query too slow, took 425 ms: [1
statements, 6 bound values] BEGIN BATCH INSERT INTO  (a, b, c,
d, e) VALUES (?, ?, ?, ?, ?) using ttl ?; APPLY BATCH; << write appears
immediately after the read*

I would suspect some sort of bug on the client holding up the
thread...but I don't know why I would only have a problem on one C* node at
any one time (the clients process reads and writes to other nodes at the
same time without delays).
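
To check the overlap claim, the request windows can be reconstructed from the log entries above. This is a minimal sketch, assuming the logged timestamp marks when the response was received, so that the start time is the timestamp minus the "took N ms" value. The helper names `request_window` and `overlaps` are made up for illustration, and the entry strings are abbreviated copies of the lines above.

```python
import re
from datetime import datetime, timedelta

# Matches the completion timestamp and the "took N ms" duration of one entry.
ENTRY = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}).*?took (\d+) ms", re.DOTALL
)


def request_window(entry):
    """Return (start, end) for one entry, assuming the timestamp marks completion."""
    match = ENTRY.search(entry)
    if match is None:
        raise ValueError("not a QueryLogger entry")
    end = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
    return end - timedelta(milliseconds=int(match.group(2))), end


def overlaps(a, b):
    """True when two (start, end) windows share any time."""
    return a[0] < b[1] and b[0] < a[1]


slow_read = request_window("2018-07-20 05:32:43,185 ... Query too slow, took 9322 ms: select")
slow_write = request_window("2018-07-20 05:32:43,185 ... Query too slow, took 5950 ms: BEGIN BATCH")
later_read = request_window("2018-07-20 05:32:45,938 ... Query too slow, took 1701 ms: select")
normal_write = request_window("2018-07-20 05:32:46,257 ... Query completed normally, took 0 ms: BEGIN BATCH")

print(overlaps(slow_read, slow_write))     # the 5950 ms write was issued while the read was in flight
print(overlaps(later_read, normal_write))  # the 0 ms write did not overlap the 1701 ms read
```

Under that assumption, the slow writes all fall inside the window of a slow read on the same connection, while the fast write does not.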

thanks in advance,
Gareth


On Sun, Jul 22, 2018 at 4:12 AM, shalom sagges wrote:

> Hi Gareth,
>
> If you're using batches for multiple partitions, this may be the root
> cause you've been looking for.
>
> https://inoio.de/blog/2016/01/13/cassandra-to-batch-or-not-to-batch/
>
> If batches are optimally used and only one node is misbehaving, check if
> NTP on the node is properly synced.
>
> Hope this helps!
>
>
> On Sat, Jul 21, 2018 at 9:31 PM, Gareth Collins <
> gareth.o.coll...@gmail.com> wrote:
>
>> Hello,
>>
>> We are running Cassandra 2.1.14 in AWS, with c5.4xlarge machines
>> (initially these were m4.xlarge) for our cassandra servers and
>> m4.xlarge for our application servers. On 

Stumped By Cassandra delays

2018-07-21 Thread Gareth Collins
C* server
upgrade and we still had problems, but I could always try again).

Any ideas/suggestions are greatly appreciated.

thanks in advance,
Gareth Collins

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: Performance Of IN Queries On Wide Rows

2018-02-21 Thread Gareth Collins
Thanks for the response!

I could understand that being the case if the Cassandra cluster is not
loaded. Splitting the work across multiple nodes would obviously make
the query faster.

But if this was just a single node, shouldn't one IN query be faster
than multiple due to the fact that, if I understand correctly,
Cassandra should need to do less work?
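
The trade-off being discussed can be put into a toy latency model. This is only a sketch under loud assumptions: every key costs the same fixed read time, each request pays a fixed overhead, and separate queries spread evenly over the replicas. The function names and all numbers are hypothetical, not measurements from any cluster.

```python
import math


def in_query_latency_ms(num_keys, per_key_ms, overhead_ms):
    """One IN query: a single request, with every key read in turn by one replica."""
    return overhead_ms + num_keys * per_key_ms


def separate_queries_latency_ms(num_keys, per_key_ms, overhead_ms, replicas):
    """Separate async queries: each pays the overhead, but replicas work concurrently."""
    keys_per_replica = math.ceil(num_keys / replicas)
    return keys_per_replica * (overhead_ms + per_key_ms)


# Hypothetical figures: 30 keys, 2 ms per key, 1 ms of per-request overhead.
print(in_query_latency_ms(30, 2.0, 1.0))             # single IN query
print(separate_queries_latency_ms(30, 2.0, 1.0, 3))  # spread over 3 replicas
print(separate_queries_latency_ms(30, 2.0, 1.0, 1))  # everything on one node
```

In this model the IN query wins only when a single node does all the work, and the separate queries win as soon as several replicas can share them. Real behaviour also depends on load, caching, and the driver.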

thanks in advance,
Gareth

On Wed, Feb 21, 2018 at 7:27 AM, Rahul Singh
<rahul.xavier.si...@gmail.com> wrote:
> That depends on the driver you use but separate queries asynchronously
> around the cluster would be faster.
>
>
> --
> Rahul Singh
> rahul.si...@anant.us
>
> Anant Corporation
>
> On Feb 20, 2018, 6:48 PM -0500, Eric Stevens <migh...@gmail.com>, wrote:
>
> Someone can correct me if I'm wrong, but I believe if you do a large IN() on
> a single partition's cluster keys, all the reads are going to be served from
> a single replica.  Compared to many concurrent individual equal statements
> you can get the performance gain of leaning on several replicas for
> parallelism.
>
> On Tue, Feb 20, 2018 at 11:43 AM Gareth Collins <gareth.o.coll...@gmail.com>
> wrote:
>>
>> Hello,
>>
>> When querying large wide rows for multiple specific values is it
>> better to do separate queries for each value...or do it with one query
>> and an "IN"? I am using Cassandra 2.1.14
>>
>> I am asking because I had changed my app to use 'IN' queries and it
>> **appears** to be slower rather than faster. I had assumed that the
>> "IN" query should be faster...as I assumed it only needs to go down
>> the read path once (i.e. row cache -> memtable -> key cache -> bloom
>> filter -> index summary -> index -> compaction -> sstable) rather than
>> once for each entry? Or are there some additional caveats that I
>> should be aware of for 'IN' query performance (e.g. ordering of 'IN'
>> query entries, closeness of 'IN' query values in the SSTable etc.)?
>>
>> thanks in advance,
>> Gareth Collins
>>
>




Performance Of IN Queries On Wide Rows

2018-02-20 Thread Gareth Collins
Hello,

When querying large wide rows for multiple specific values is it
better to do separate queries for each value...or do it with one query
and an "IN"? I am using Cassandra 2.1.14

I am asking because I had changed my app to use 'IN' queries and it
**appears** to be slower rather than faster. I had assumed that the
"IN" query should be faster...as I assumed it only needs to go down
the read path once (i.e. row cache -> memtable -> key cache -> bloom
filter -> index summary -> index -> compaction -> sstable) rather than
once for each entry? Or are there some additional caveats that I
should be aware of for 'IN' query performance (e.g. ordering of 'IN'
query entries, closeness of 'IN' query values in the SSTable etc.)?

thanks in advance,
Gareth Collins




Weird Bootstrapping Issue

2017-05-01 Thread Gareth Collins
Hi,

We are running Cassandra 2.1.14 on an IBM AIX cluster using IBM Java 7
(1.7.1.64). I am having problems adding new nodes to the cluster, and I am
seeing the exceptions below. It appears that the new node is
getting stuck trying to send the magic number on the first streaming
socket...whilst the receiving node never receives it and times out
after 10 seconds.
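
The receiving side of that handshake can be pictured as a blocking read of a 4-byte integer with a timeout. This is a minimal sketch, not Cassandra's code: `read_magic` and `demo` are hypothetical names, the magic value is an assumption for illustration, and the timeout is shortened from 10 seconds so the demo finishes quickly.

```python
import socket
import struct

PROTOCOL_MAGIC = 0xCA552DFA  # assumed value, used here only for illustration


def read_magic(conn, timeout_s):
    """Read a 4-byte big-endian int, failing if it does not arrive within timeout_s."""
    conn.settimeout(timeout_s)
    data = b""
    while len(data) < 4:
        chunk = conn.recv(4 - len(data))
        if not chunk:
            raise EOFError("peer closed the connection before sending the header")
        data += chunk
    return struct.unpack(">I", data)[0]


def demo():
    """Return (header_received, second_read_timed_out) for a local socket pair."""
    sender, receiver = socket.socketpair()
    try:
        sender.sendall(struct.pack(">I", PROTOCOL_MAGIC))
        header_received = read_magic(receiver, timeout_s=0.1) == PROTOCOL_MAGIC
        try:
            # The sender stays silent this time, so the receiver gives up.
            read_magic(receiver, timeout_s=0.1)
            timed_out = False
        except socket.timeout:
            timed_out = True
        return header_received, timed_out
    finally:
        sender.close()
        receiver.close()


print(demo())
```

If the sender never manages to write the header, the reader sees only the timeout, which is the shape of the SocketTimeoutException in the existing node's log.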

New Node:

INFO  [StreamConnectionEstablisher:1] 2017-04-28 17:39:20,196
StreamSession.java:220 - [Stream
#22c10290-2c5b-11e7-a33c-8f9ab3a4bd92] Starting streaming to /1.2.3.4

INFO  [StreamConnectionEstablisher:2] 2017-04-28 17:39:20,197
StreamSession.java:220 - [Stream
#22c10290-2c5b-11e7-a33c-8f9ab3a4bd92] Starting streaming to /5.6.7.8

INFO  [StreamConnectionEstablisher:1] 2017-04-28 17:39:20,209
StreamCoordinator.java:209 - [Stream
#22c10290-2c5b-11e7-a33c-8f9ab3a4bd92, ID#0] Beginning stream session
with /1.2.3.4

INFO  [STREAM-IN-/1.2.3.4] 2017-04-28 17:39:20,276
StreamResultFuture.java:166 - [Stream
#22c10290-2c5b-11e7-a33c-8f9ab3a4bd92 ID#0] Prepare completed.
Receiving 2 files(43103 bytes), sending 0 files(0 bytes)

INFO  [StreamReceiveTask:2] 2017-04-28 17:39:20,410
StreamResultFuture.java:180 - [Stream
#22c10290-2c5b-11e7-a33c-8f9ab3a4bd92] Session with /1.2.3.4 is
complete

ERROR [StreamConnectionEstablisher:2] 2017-04-28 17:39:30,207
StreamSession.java:505 - [Stream
#22c10290-2c5b-11e7-a33c-8f9ab3a4bd92] Streaming error occurred

java.nio.channels.AsynchronousCloseException: null
    at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:224) ~[na:1.7.0]
    at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:538) ~[na:1.7.0]
    at org.apache.cassandra.io.util.DataOutputStreamAndChannel.write(DataOutputStreamAndChannel.java:48) ~[apache-cassandra-2.1.14.jar:2.1.14]
    at org.apache.cassandra.streaming.ConnectionHandler$MessageHandler.sendInitMessage(ConnectionHandler.java:191) ~[apache-cassandra-2.1.14.jar:2.1.14]
    at org.apache.cassandra.streaming.ConnectionHandler.initiate(ConnectionHandler.java:81) ~[apache-cassandra-2.1.14.jar:2.1.14]
    at org.apache.cassandra.streaming.StreamSession.start(StreamSession.java:223) ~[apache-cassandra-2.1.14.jar:2.1.14]
    at org.apache.cassandra.streaming.StreamCoordinator$StreamSessionConnector.run(StreamCoordinator.java:208) [apache-cassandra-2.1.14.jar:2.1.14]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1157) [na:1.7.0]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:627) [na:1.7.0]
    at java.lang.Thread.run(Thread.java:809) [na:1.7.0]

INFO  [StreamConnectionEstablisher:2] 2017-04-28 17:39:30,208
StreamResultFuture.java:180 - [Stream
#22c10290-2c5b-11e7-a33c-8f9ab3a4bd92] Session with /5.6.7.8 is
complete

WARN  [StreamConnectionEstablisher:2] 2017-04-28 17:39:30,211
StreamResultFuture.java:207 - [Stream
#22c10290-2c5b-11e7-a33c-8f9ab3a4bd92] Stream failed

INFO  [StreamConnectionEstablisher:2] 2017-04-28 17:39:30,212
StreamCoordinator.java:209 - [Stream
#22c10290-2c5b-11e7-a33c-8f9ab3a4bd92, ID#0] Beginning stream session
with /5.6.7.8

ERROR [main] 2017-04-28 17:39:30,213 CassandraDaemon.java:581 -
Exception encountered during startup

java.lang.RuntimeException: Error during boostrap: Stream failed
    at org.apache.cassandra.dht.BootStrapper.bootstrap(BootStrapper.java:86) ~[apache-cassandra-2.1.14.jar:2.1.14]
    at org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1166) ~[apache-cassandra-2.1.14.jar:2.1.14]


Existing node:

DEBUG [ACCEPT-/5.6.7.8] 2017-04-28 17:39:29,914
MessagingService.java:1014 - Error reading the socket
Socket[addr=/9.0.1.2,port=55848,localport=7000]

java.net.SocketTimeoutException: null
    at sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:242) ~[na:1.7.0]
    at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:116) ~[na:1.7.0]
    at java.io.DataInputStream.readFully(DataInputStream.java:207) ~[na:1.7.0]
    at java.io.DataInputStream.readInt(DataInputStream.java:399) ~[na:1.7.0]
    at org.apache.cassandra.net.MessagingService$SocketThread.run(MessagingService.java:988) ~[apache-cassandra-2.1.14.jar:2.1.14]

TRACE [MessagingService-Incoming-/9.0.1.2] 2017-04-28 17:39:29,989
IncomingTcpConnection.java:92 - eof reading from socket; closing

java.io.EOFException: null
    at java.io.DataInputStream.readFully(DataInputStream.java:209) ~[na:1.7.0]
    at java.io.DataInputStream.readInt(DataInputStream.java:399) ~[na:1.7.0]
    at org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:171) ~[apache-cassandra-2.1.14.jar:2.1.14]
    at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:88) ~[apache-cassandra-2.1.14.jar:2.1.14]

TRACE [MessagingService-Incoming-/9.0.1.2] 2017-04-28 17:39:29,990

Cassandra Memory Question

2014-03-09 Thread Gareth Collins
Hello,

I have a question about CQL memory usage. I am currently using 1.2.9.

If I have a Cassandra table like this (created using Astyanax API):

CREATE TABLE table_name (
  key text,
  column1 text,
  value blob,
  PRIMARY KEY (key, column1)
) WITH COMPACT STORAGE;

and I run a query like this:

select key from table_name;

Will Cassandra extract the key from each row as it goes...or will it
read all the rows first (i.e. requiring the whole table in memory)
and then filter out the keys?

I ask because I am researching an OOM on our Cassandra system. I
believe there must be a query select * from table_name (each value
blob is very large - I see the value blobs in the Cassandra hprof),
which would explain the OOM. However I am told the query is select
key from table_name. If it needs to read the whole table into memory
anyway, this would explain the OOM (BTW - I know that this type of
query is usually a bad idea without some type of paging).
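
The paging mentioned above can be sketched on the client side as follows. This is a minimal illustration, assuming a hypothetical `fetch_page(start_after, limit)` callable that returns rows in key order; here it is backed by an in-memory list, whereas a real scan would page by `token(key)` rather than by the key itself. It shows only the client-side pattern and says nothing about what the server holds in memory.

```python
def iter_keys(fetch_page, page_size):
    """Yield keys page by page so that only one page of rows is held at a time."""
    last_key = None
    while True:
        page = fetch_page(last_key, page_size)
        if not page:
            return
        for key, _value in page:
            yield key
        last_key = page[-1][0]


# In-memory stand-in for the table, sorted by key (hypothetical data).
TABLE = [("key%03d" % i, b"x" * 16) for i in range(10)]


def fetch_page_from_memory(start_after, limit):
    """Return up to `limit` rows whose key sorts after `start_after`."""
    rows = [row for row in TABLE if start_after is None or row[0] > start_after]
    return rows[:limit]


keys = list(iter_keys(fetch_page_from_memory, page_size=3))
print(len(keys))
```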

As a supplementary question, is there any way to actually trace the
CQL query text? I turned on the tracing described here:

http://www.datastax.com/dev/blog/advanced-request-tracing-in-cassandra-1-2

Whilst I found the bad query (I was able to match it to the thread
name from the OOM Exception), the trace did not appear to be storing
the original query text. The only CQL text I saw in the trace was from
those queries done from cqlsh.

thanks in advance,
Gareth


Re: Secondary Indexes On Partitioned Time Series Data Question

2013-08-02 Thread Gareth Collins
OK, thanks for the information.

Gareth

On Thu, Aug 1, 2013 at 3:53 PM, Robert Coli rc...@eventbrite.com wrote:
> On Thu, Aug 1, 2013 at 12:49 PM, Gareth Collins gareth.o.coll...@gmail.com
> wrote:
>
>> Would this be correct? Just making sure I understand how to best use
>> secondary indexes in Cassandra with time series data.
>
> In general unless you ABSOLUTELY NEED the one unique feature of built-in
> Secondary Indexes (atomic update of base row and index) you should just use
> a normal column family for secondary index cases.
>
> =Rob


Secondary Indexes On Partitioned Time Series Data Question

2013-08-01 Thread Gareth Collins
Hello,

Say I have time series data for a table like this:

CREATE TABLE mytimeseries (
pk_part1  text,
partition bigint,  -- e.g. partition per day or per hour
pk_part2  text,  -- this is part of the partition key so I can split write load
message_id  timeuuid,
secondary_key1  text,
secondary_key2   text,
-- ...
-- more columns
-- ...
PRIMARY KEY ((pk_part1, partition, pk_part2), message_id));

Most of the time I will need to do queries with
pk_part1/partition/pk_part2/message_id range. So this is what I
optimize for.

Sometimes, however, I will need to do queries with
pk_part1/partition/message_id range and some combination of
secondary_key1 (95% of the time there is a one-to-one relationship
with pk_part1) or secondary_key2 (for each secondary_key2 there will
be many pk_part2 values).

In this time series scenario, to efficiently make use of
secondary_key1/secondary_key2 as Cassandra secondary indexes for these
queries I assume that secondary_key1/secondary_key_2 would really need
to be composites combined into one column (in SQL I would create
multi-column indexes)? i.e.:

secondary_key_1 - pk_part1 + partition_key + real_secondary_key_1
secondary_key_2 - pk_part2 + partition_key + real_secondary_key_2

Would this be correct? Just making sure I understand how to best use
secondary indexes in Cassandra with time series data.
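
The composite idea can be modelled in memory to see what a lookup would need to supply. This is a sketch only: `ManualIndex` stands in for a separate index table keyed by the combined columns, the class and function names are hypothetical, and the message ids are plain integers rather than timeuuids.

```python
from collections import defaultdict


def composite_index_key(pk_part1, partition, secondary_key):
    """Combine the columns every lookup supplies into a single index key."""
    return (pk_part1, partition, secondary_key)


class ManualIndex:
    """In-memory stand-in for an index table: composite key -> (pk_part2, message_id)."""

    def __init__(self):
        self._entries = defaultdict(list)

    def add(self, pk_part1, partition, secondary_key, pk_part2, message_id):
        key = composite_index_key(pk_part1, partition, secondary_key)
        self._entries[key].append((pk_part2, message_id))

    def lookup(self, pk_part1, partition, secondary_key):
        key = composite_index_key(pk_part1, partition, secondary_key)
        return sorted(self._entries.get(key, []))


index = ManualIndex()
index.add("p1", 20130801, "s1", "a", 2)
index.add("p1", 20130801, "s1", "b", 1)
index.add("p1", 20130802, "s1", "a", 3)
print(index.lookup("p1", 20130801, "s1"))
```

Each lookup returns the pk_part2 values and message ids needed to read the matching rows from the main table, restricted to a single time partition.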

thanks in advance,
Gareth


Re: Coprosessors/Triggers in C*

2013-06-13 Thread Gareth Collins
Edward, Michal,

Thanks very much for the answers. I hadn't really thought before about how
Cassandra would implement the TTL feature. I had foolishly assumed that it
would be like a delete (which I would eventually be able to trigger on to
execute another action) but it makes sense how it is really implemented.

I will need to find another way outside of Cassandra to implement my "do
something if not deleted before TTL" requirement (ugh).
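
One way to do this outside Cassandra is to keep the deadlines in the application. This is a minimal sketch, assuming a single process that polls periodically; `DeadlineTracker` and its methods are hypothetical names, and time is passed in explicitly so the behaviour is easy to follow.

```python
import heapq


class DeadlineTracker:
    """Tracks items that need an action if they are not cancelled before their deadline."""

    def __init__(self):
        self._heap = []   # (deadline, item_id), ordered by deadline
        self._live = {}   # item_id -> deadline currently in force

    def register(self, item_id, now, ttl_seconds):
        deadline = now + ttl_seconds
        self._live[item_id] = deadline
        heapq.heappush(self._heap, (deadline, item_id))

    def cancel(self, item_id):
        """Call when the item is deleted before its TTL."""
        self._live.pop(item_id, None)

    def pop_expired(self, now):
        """Return the ids whose deadline has passed and that were never cancelled."""
        expired = []
        while self._heap and self._heap[0][0] <= now:
            deadline, item_id = heapq.heappop(self._heap)
            # Skip entries that were cancelled or re-registered with a new deadline.
            if self._live.get(item_id) == deadline:
                del self._live[item_id]
                expired.append(item_id)
        return expired


tracker = DeadlineTracker()
tracker.register("a", now=0, ttl_seconds=10)
tracker.register("b", now=0, ttl_seconds=5)
tracker.cancel("b")                 # "b" was deleted in time
print(tracker.pop_expired(now=10))  # only "a" still needs its action
```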

Anyway, thanks again for the clarification.

Gareth



On Thu, Jun 13, 2013 at 2:19 AM, Michal Michalski mich...@opera.com wrote:

 I understood it as a "run trigger when column gets deleted due to TTL", so
 - as you said - it doesn't sound like something that can be done.

 Gareth, TTL'd columns in Cassandra are not really removed after TTL - they
 are just ignored from that time (so they're not returned by queries), but
 they still exist as long as they're not tombstoned and then removed after
 grace period. Cassandra doesn't know about the exact moment they become
 outdated due to TTL. It could be doable to do something when they get
 converted to tombstone, but I don't think it's the use case you're looking
 for.

 M.


  I do not understand what feature you are suggesting. Columns can already
 have a ttl. Are you speaking of a ttl column that could delete something
 besides itself?


  That does not sound easy because a ttl column is dormant until read or
 compacted.

 On Tuesday, June 11, 2013, Gareth Collins gareth.o.coll...@gmail.com
 wrote:

 Hello Edward,
 I am curious - What about triggering on a TTL timeout delete (something I

 am most interested in doing - perhaps it doesn't make sense?)? Would you
 say that is something the user should implement themselves? Would you see
 intravert being able to do something with this at some later point
 (somehow?)?

 thanks,
 Gareth
 On Tue, Jun 11, 2013 at 2:34 PM, Edward Capriolo edlinuxg...@gmail.com

 wrote:


 This is arguably something you should do yourself. I have been

 investigating integrating vertx and cassandra together for a while to
 accomplish this type of work, mainly to move processing close to data and
 eliminate large batches that can be computed from a single map of data.



  https://github.com/zznate/intravert-ug/wiki/Service-Processor-for-trigger-like-functionality


 On Tue, Jun 11, 2013 at 5:06 AM, Tanya Malik sonichedg...@gmail.com

 wrote:


 Thanks Romain.

 On Tue, Jun 11, 2013 at 1:44 AM, Romain HARDOUIN 

 romain.hardo...@urssaf.fr wrote:


 Not yet but Cassandra 2.0 will provide experimental triggers:
 https://issues.apache.org/jira/browse/CASSANDRA-1311


 Tanya Malik sonichedg...@gmail.com a écrit sur 11/06/2013 04:12:44
 :

  De : Tanya Malik sonichedg...@gmail.com
 A : user@cassandra.apache.org,
 Date : 11/06/2013 04:13
 Objet : Coprosessors/Triggers in C*

 Hi,

 Does C* support something like co-processor functionality/triggers

 to

 run client-supplied code in the address space of the server?









Re: Coprosessors/Triggers in C*

2013-06-11 Thread Gareth Collins
Hello Edward,

I am curious - What about triggering on a TTL timeout delete (something I
am most interested in doing - perhaps it doesn't make sense?)? Would you
say that is something the user should implement themselves? Would you see
intravert being able to do something with this at some later point
(somehow?)?

thanks,
Gareth

On Tue, Jun 11, 2013 at 2:34 PM, Edward Capriolo edlinuxg...@gmail.com wrote:

 This is arguably something you should do yourself. I have been
 investigating integrating vertx and cassandra together for a while to
 accomplish this type of work, mainly to move processing close to data and
 eliminate large batches that can be computed from a single map of data.


 https://github.com/zznate/intravert-ug/wiki/Service-Processor-for-trigger-like-functionality


 On Tue, Jun 11, 2013 at 5:06 AM, Tanya Malik sonichedg...@gmail.com wrote:

 Thanks Romain.


 On Tue, Jun 11, 2013 at 1:44 AM, Romain HARDOUIN 
 romain.hardo...@urssaf.fr wrote:

 Not yet but Cassandra 2.0 will provide experimental triggers:
 https://issues.apache.org/jira/browse/CASSANDRA-1311


 Tanya Malik sonichedg...@gmail.com a écrit sur 11/06/2013 04:12:44 :

  De : Tanya Malik sonichedg...@gmail.com
  A : user@cassandra.apache.org,
  Date : 11/06/2013 04:13
  Objet : Coprosessors/Triggers in C*
 
  Hi,
 
  Does C* support something like co-processor functionality/triggers to
  run client-supplied code in the address space of the server?






Re: Hector vs Astyanax dependency issue

2013-05-26 Thread Gareth Collins
Hi Renato,

Are you sure that you don't have two copies of guava in your classpath? I
don't have this problem (I was using both Hector and Astyanax for a while
- now transitioned completely to Astyanax).

Probably the most problematic part of using the datastax or astyanax
clients is that they both depend on the cassandra-all jar which by
default brings in a massive number of dependencies. It took me a good
couple of days to figure out what was really required (especially since I
work in OSGi - I had to OSGi all the non-OSGi dependencies, ugh).

Gareth


On Fri, May 24, 2013 at 7:02 PM, Renato Marroquín Mogrovejo 
renatoj.marroq...@gmail.com wrote:

 Hi all,

 I am using Astyanax and Hector client within an application but right now
 I am hitting a dependency issue [1] related to Guava version being used by
 Hector and Astyanax which makes Maven headache. I have taken it out as
 exclusions within my poms but I still get the dependency issue.
 Do you guys think you could help me out with this one?
 Thanks in advance!


 Renato M.

 [1] https://github.com/Netflix/astyanax/issues/204




Re: CQL3 And ReversedTypes Question

2013-04-15 Thread Gareth Collins
Added:

https://issues.apache.org/jira/browse/CASSANDRA-5472

thanks,
Gareth


On Sun, Apr 14, 2013 at 2:33 PM, aaron morton aa...@thelastpickle.com wrote:

 Bad Request: Type error:
 org.apache.cassandra.cql3.statements.Selection$SimpleSelector@1e7318 cannot
 be passed as argument 0 of function dateof of type timeuuid

 Is there something I am missing here or should I open a new ticket?

 Yes please.

 Cheers

 -
 Aaron Morton
 Freelance Cassandra Consultant
 New Zealand

 @aaronmorton
 http://www.thelastpickle.com

 On 13/04/2013, at 4:40 PM, Gareth Collins gareth.o.coll...@gmail.com
 wrote:

 OK, trying out 1.2.4. The previous issue seems to be fine, but I am
 experiencing a new one:

 cqlsh:location> create table test_y (message_id timeuuid, name text,
 PRIMARY KEY (name,message_id));
 cqlsh:location> insert into test_y (message_id,name) VALUES (now(),'foo');
 cqlsh:location> insert into test_y (message_id,name) VALUES (now(),'foo');
 cqlsh:location> insert into test_y (message_id,name) VALUES (now(),'foo');
 cqlsh:location> insert into test_y (message_id,name) VALUES (now(),'foo');
 cqlsh:location> select dateOf(message_id) from test_y;

  dateOf(message_id)
 --
  2013-04-13 00:33:42-0400
  2013-04-13 00:33:43-0400
  2013-04-13 00:33:43-0400
  2013-04-13 00:33:44-0400

 cqlsh:location> create table test_x (message_id timeuuid, name text,
 PRIMARY KEY (name,message_id)) WITH CLUSTERING ORDER BY (message_id DESC);
 cqlsh:location> insert into test_x (message_id,name) VALUES (now(),'foo');
 cqlsh:location> insert into test_x (message_id,name) VALUES (now(),'foo');
 cqlsh:location> insert into test_x (message_id,name) VALUES (now(),'foo');
 cqlsh:location> insert into test_x (message_id,name) VALUES (now(),'foo');
 cqlsh:location> insert into test_x (message_id,name) VALUES (now(),'foo');
 cqlsh:location> select dateOf(message_id) from test_x;
 Bad Request: Type error:
 org.apache.cassandra.cql3.statements.Selection$SimpleSelector@1e7318 cannot
 be passed as argument 0 of function dateof of type timeuuid

 Is there something I am missing here or should I open a new ticket?

 thanks in advance,
 Gareth


 On Tue, Mar 26, 2013 at 3:30 PM, Gareth Collins 
 gareth.o.coll...@gmail.com wrote:

 Added:

 https://issues.apache.org/jira/browse/CASSANDRA-5386

 Thanks very much for the quick answer!

 regards,
 Gareth

 On Tue, Mar 26, 2013 at 3:55 AM, Sylvain Lebresne sylv...@datastax.com
 wrote:
  You aren't missing anything obvious. That's a bug really. Would you mind
  opening a ticket on https://issues.apache.org/jira/browse/CASSANDRA?
 
  --
  Sylvain
 
 
  On Tue, Mar 26, 2013 at 2:48 AM, Gareth Collins 
 gareth.o.coll...@gmail.com
  wrote:
 
  Hi,
 
  I created a table with the following structure in cqlsh (Cassandra
  1.2.3 - cql 3):
 
  CREATE TABLE mytable ( column1 text,
column2 text,
messageId timeuuid,
message blob,
PRIMARY KEY ((column1, column2), messageId));
 
  I can quite happily add values to this table. e.g:
 
  insert into client_queue (column1,column2,messageId,message) VALUES
  ('string1','string2',now(),'ABCCDCC123');
 
  Yet if I decide I want to set the clustering order on messageId DESC:
 
  CREATE TABLE mytable ( column1 text,
column2 text,
messageId timeuuid,
message blob,
PRIMARY KEY ((column1, column2), messageId)) WITH CLUSTERING
  ORDER BY (messageId DESC);
 
  and try to do an insert:
 
  insert into client_queue2 (column1,column2,messageId,message) VALUES
  ('string1','string2',now(),'ABCCDCC123');
 
  I get the following error:
 
  Bad Request: Type error: cannot assign result of function now (type
  timeuuid) to messageid (type
 
 
 'org.apache.cassandra.db.marshal.ReversedType(org.apache.cassandra.db.marshal.TimeUUIDType)')
 
  I am sure I am missing something obvious here, but I don't understand.
  Why am I getting an error? What do I need
  to do to be able to add an entry to this table?
 
  thanks in advance,
  Gareth
 
 






Anyway To Query Just The Partition Key?

2013-04-13 Thread Gareth Collins
Hello,

If I have a cql3 table like this (I don't have a table with this data -
this is just for example):

create table (
surname text,
city text,
country text,
event_id timeuuid,
data text,
PRIMARY KEY ((surname, city, country),event_id));

there is no way of (easily) getting the set (or a subset) of partition
keys, is there (i.e. surname/city/country)? If I want easy access to do
queries to get a subset of the partition keys, I have to create another
table?

I am assuming yes but just making sure I am not missing something obvious
here.

thanks in advance,
Gareth


Re: Anyway To Query Just The Partition Key?

2013-04-13 Thread Gareth Collins
Thank you for the answer.

My apologies. I should have been clearer with my question.

Say, for example, I have 1000 partition keys and 10,000 rows per partition
key. I am trying to avoid bringing back 10 million rows to find the 1000
partition keys. I assume I cannot avoid bringing back the 10 million rows
(or at least an order of magnitude more than 1000 rows) without having
another table?

thanks,
Gareth


On Sat, Apr 13, 2013 at 4:13 AM, Jabbar Azam aja...@gmail.com wrote:

 With your example you can do an equality search with surname and city and
 then use in with country

 Eg.  Select * from yourtable where surname=blah and city=blah blah and
 country in (country1, country2)

 Hope that helps

 Jabbar Azam
 On 13 Apr 2013 07:06, Gareth Collins gareth.o.coll...@gmail.com wrote:

 Hello,

 If I have a cql3 table like this (I don't have a table with this data -
 this is just for example):

 create table (
 surname text,
 city text,
 country text,
 event_id timeuuid,
 data text,
 PRIMARY KEY ((surname, city, country),event_id));

 there is no way of (easily) getting the set (or a subset) of partition
 keys, is there (i.e. surname/city/country)? If I want easy access to do
 queries to get a subset of the partition keys, I have to create another
 table?

 I am assuming yes but just making sure I am not missing something obvious
 here.

 thanks in advance,
 Gareth




Re: Anyway To Query Just The Partition Key?

2013-04-13 Thread Gareth Collins
Edward,

Thanks for the response. This is what I thought. The only reason why I am
doing it like this is that I don't know these partition keys in advance
(otherwise I would design this differently). So when I need to insert data,
it looks like I need to insert to both the data table and the table
containing the partition keys. Good thing writes in Cassandra are
idempotent...:)
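
The double write described above can be modelled in memory to show why repeating it is harmless. This is a sketch, not driver code: `Store` is a hypothetical stand-in for the data table plus a second table that holds only the partition keys.

```python
class Store:
    """In-memory stand-in for a data table and a table of known partition keys."""

    def __init__(self):
        self.data = {}               # (partition_key, event_id) -> payload
        self.partition_keys = set()  # every partition key seen so far

    def insert(self, surname, city, country, event_id, payload):
        partition_key = (surname, city, country)
        # Both writes are plain overwrites, so retrying or repeating them
        # leaves the same state behind.
        self.partition_keys.add(partition_key)
        self.data[(partition_key, event_id)] = payload


store = Store()
store.insert("Smith", "Boston", "US", 1, "first")
store.insert("Smith", "Boston", "US", 1, "first")   # retry of the same write
store.insert("Smith", "Boston", "US", 2, "second")  # new event, same partition
print(sorted(store.partition_keys))
```

Listing the known partition keys then reads only the small key table instead of scanning every data row.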

thanks again,
Gareth


On Sat, Apr 13, 2013 at 7:26 AM, Edward Capriolo edlinuxg...@gmail.com wrote:

 You can 'list' or 'select *' the column family and you get them in a
 pseudo random order. When you say subset it implies you might want a
 specific range which is something this schema can not do.




 On Sat, Apr 13, 2013 at 2:05 AM, Gareth Collins 
 gareth.o.coll...@gmail.com wrote:

 Hello,

 If I have a cql3 table like this (I don't have a table with this data -
 this is just for example):

 create table (
 surname text,
 city text,
 country text,
 event_id timeuuid,
 data text,
 PRIMARY KEY ((surname, city, country),event_id));

 there is no way of (easily) getting the set (or a subset) of partition
 keys, is there (i.e. surname/city/country)? If I want easy access to do
 queries to get a subset of the partition keys, I have to create another
 table?

 I am assuming yes but just making sure I am not missing something obvious
 here.

 thanks in advance,
 Gareth





CQL3 And Map Literals

2013-03-28 Thread Gareth Collins
Hello,

I have been playing with map literals in CQL3 queries. I see that
single-quotes work:

{'foo':'bar'}

but double-quotes do not:

{"foo":"bar"}

I am curious. Was there a specific reason why it was decided to use
single-quotes?
I ask because double-quotes would make this valid json.
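
For context, CQL uses single quotes for string literals and reserves double quotes for case-sensitive identifiers, so a JSON object has to be re-quoted before it can serve as a map literal. This is a minimal client-side sketch, assuming a map of text to text; the function names are hypothetical.

```python
import json


def cql_map_literal(values):
    """Render a dict of str -> str as a CQL map literal with single-quoted strings."""

    def quote(text):
        # A single quote inside a CQL string is escaped by doubling it.
        return "'" + text.replace("'", "''") + "'"

    pairs = ", ".join(quote(k) + ": " + quote(v) for k, v in values.items())
    return "{" + pairs + "}"


def json_object_to_cql_map(text):
    """Parse a JSON object and re-quote it as a CQL map literal."""
    return cql_map_literal(json.loads(text))


print(json_object_to_cql_map('{"foo":"bar"}'))
```

Binding the map as a prepared-statement parameter avoids the quoting question altogether, where the client supports it.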

thanks in advance,
Gareth


Re: CQL3 And ReversedTypes Question

2013-03-26 Thread Gareth Collins
Added:

https://issues.apache.org/jira/browse/CASSANDRA-5386

Thanks very much for the quick answer!

regards,
Gareth

On Tue, Mar 26, 2013 at 3:55 AM, Sylvain Lebresne sylv...@datastax.com wrote:
 You aren't missing anything obvious. That's a bug really. Would you mind
 opening a ticket on https://issues.apache.org/jira/browse/CASSANDRA?

 --
 Sylvain


 On Tue, Mar 26, 2013 at 2:48 AM, Gareth Collins gareth.o.coll...@gmail.com
 wrote:

 Hi,

 I created a table with the following structure in cqlsh (Cassandra
 1.2.3 - cql 3):

 CREATE TABLE mytable ( column1 text,
   column2 text,
   messageId timeuuid,
   message blob,
   PRIMARY KEY ((column1, column2), messageId));

 I can quite happily add values to this table. e.g:

 insert into client_queue (column1,column2,messageId,message) VALUES
 ('string1','string2',now(),'ABCCDCC123');

 Yet if I decide I want to set the clustering order on messageId DESC:

 CREATE TABLE mytable ( column1 text,
   column2 text,
   messageId timeuuid,
   message blob,
   PRIMARY KEY ((column1, column2), messageId)) WITH CLUSTERING
 ORDER BY (messageId DESC);

 and try to do an insert:

 insert into client_queue2 (column1,column2,messageId,message) VALUES
 ('string1','string2',now(),'ABCCDCC123');

 I get the following error:

 Bad Request: Type error: cannot assign result of function now (type
 timeuuid) to messageid (type

 'org.apache.cassandra.db.marshal.ReversedType(org.apache.cassandra.db.marshal.TimeUUIDType)')

 I am sure I am missing something obvious here, but I don't understand.
 Why am I getting an error? What do I need
 to do to be able to add an entry to this table?

 thanks in advance,
 Gareth




Returning A Generated Id From An Insert

2013-03-26 Thread Gareth Collins
Hi,

I have a question about whether I can do something in Cassandra similar to
what I can do in SQL.

In SQL (e.g. SQL Server), if I have a generated primary key, I can get
the generated primary key
back as a result for the insert statement.

Is it possible to do something similar with CQL (e.g. could I somehow be
returned the timeuuid generated by now()?). It would certainly make my
client code cleaner if this were possible (it is a nice-to-have).
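
A common alternative is to generate the id on the client, so that it is already known when the insert is sent. This is a sketch under assumptions: `my_table` and its columns are hypothetical, the statement is only built and never executed, and `uuid.uuid1()` produces a version 1 (time-based) UUID, which is the kind of value a timeuuid column holds.

```python
import uuid


def new_message_id():
    """Create a version 1 (time-based) UUID on the client."""
    return uuid.uuid1()


def build_insert(column1, column2, message_id):
    """Return a parameterised statement and its values; nothing is executed here."""
    statement = "INSERT INTO my_table (column1, column2, message_id) VALUES (?, ?, ?)"
    return statement, (column1, column2, message_id)


message_id = new_message_id()
statement, params = build_insert("string1", "string2", message_id)
print(message_id.version)  # the caller keeps message_id for later use
```

The trade-off is that the timestamp then comes from the client's clock rather than from the server handling the insert.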

thanks in advance,
Gareth


CQL3 And ReversedTypes Question

2013-03-25 Thread Gareth Collins
Hi,

I created a table with the following structure in cqlsh (Cassandra
1.2.3 - cql 3):

CREATE TABLE client_queue ( column1 text,
  column2 text,
  messageId timeuuid,
  message blob,
  PRIMARY KEY ((column1, column2), messageId));

I can quite happily add values to this table. e.g:

insert into client_queue (column1,column2,messageId,message) VALUES
('string1','string2',now(),'ABCCDCC123');

Yet if I decide I want to set the clustering order on messageId DESC:

CREATE TABLE client_queue2 ( column1 text,
  column2 text,
  messageId timeuuid,
  message blob,
  PRIMARY KEY ((column1, column2), messageId)) WITH CLUSTERING
ORDER BY (messageId DESC);

and try to do an insert:

insert into client_queue2 (column1,column2,messageId,message) VALUES
('string1','string2',now(),'ABCCDCC123');

I get the following error:

Bad Request: Type error: cannot assign result of function now (type
timeuuid) to messageid (type
'org.apache.cassandra.db.marshal.ReversedType(org.apache.cassandra.db.marshal.TimeUUIDType)')

I am sure I am missing something obvious here, but I don't understand.
Why am I getting an error? What do I need
to do to be able to add an entry to this table?

thanks in advance,
Gareth