somebody interested in hacking some very simple php client

2011-01-04 Thread nicolas lattuada

Yesterday I made it real quick; maybe it can help someone.

Here it is:

http://pastebin.com/bAyWMfXD

Hope it helps.

Nicolas
  

Re: Bootstrapping taking long

2011-01-04 Thread shimi
In my experience most of the time it takes for a node to join the cluster is
the anticompaction on the other nodes. The streaming part is very fast.
Check the other nodes' logs to see if there is any node doing anticompaction.
I don't remember how much data I had in the cluster when I needed to
add/remove nodes. I do remember that it took a few hours.

The node will join the ring only when it finishes the bootstrap.

Shimi


On Tue, Jan 4, 2011 at 12:28 PM, Ran Tavory ran...@gmail.com wrote:

 I asked the same question on the IRC but no luck there, everyone's asleep
 ;)...

 Using 0.6.6 I'm adding a new node to the cluster.
 It starts out fine but then gets stuck on the bootstrapping state for too
 long. More than an hour and still counting.

 $ bin/nodetool -p 9004 -h localhost streams
 Mode: Bootstrapping
 Not sending any streams.
 Not receiving any streams.


 It seemed to have streamed data from other nodes and indeed the load is
 non-zero but I'm not clear what's keeping it right now from finishing.

 $ bin/nodetool -p 9004 -h localhost info
 51042355038140769519506191114765231716
 Load : 22.49 GB
 Generation No: 1294133781
 Uptime (seconds) : 1795
 Heap Memory (MB) : 315.31 / 6117.00


 nodetool ring does not list this new node in the ring, although nodetool
 can happily talk to the new node, it's just not listing itself as a member
 of the ring. This is expected when the node is still bootstrapping, so the
 question is still how long might the bootstrap take and whether it is stuck.

 The data isn't huge so I find it hard to believe that streaming or anti
 compaction are the bottlenecks. I have ~20G on each node and the new node
 already has just about that so it seems that all data had already been
 streamed to it successfully, or at least most of the data... So what is it
 waiting for now? (same question, rephrased... ;)

 I tried:
 1. Restarting the new node. No good. All logs seem normal but at the end
 the node is still in bootstrap mode.
 2. As someone suggested I increased the rpc timeout from 10k to 30k
 (RpcTimeoutInMillis) but that didn't seem to help. I did this only on the
 new node. Should I have done that on all (old) nodes as well? Or maybe only
 on the ones that were supposed to stream data to that node.
 3. Logging level at DEBUG now but nothing interesting going on except
 for occasional messages such as [1] or [2]

 So the question is: what's keeping the new node from finishing the
 bootstrap and how can I check its status?
 Thanks

 [1] DEBUG [Timer-1] 2011-01-04 05:21:24,402 LoadDisseminator.java (line 36)
 Disseminating load info ...
 [2] DEBUG [RMI TCP Connection(22)-192.168.252.88] 2011-01-04 05:12:48,033
 StorageService.java (line 1189) computing ranges for
 28356863910078205288614550619314017621,
 56713727820156410577229101238628035242,
  85070591730234615865843651857942052863,
 113427455640312821154458202477256070484,
 141784319550391026443072753096570088105,
 170141183460469231731687303715884105727

 --
 /Ran




Re: Bootstrapping taking long

2011-01-04 Thread Ran Tavory
Thanks Shimi, so indeed anticompaction was run on one of the other nodes
from the same DC but to my understanding it has already ended. A few hours
ago...
I saw plenty of log messages such as [1] which ended a couple of hours ago, and
I've seen the new node streaming and accepting the data from the node which
performed the anticompaction and so far it was normal so it seemed that data
is at its right place. But now the new node seems sort of stuck. None of the
other nodes is anticompacting right now or has been anticompacting since
then.
The new node's CPU is close to zero, its iostats are almost zero so I can't
find another bottleneck that would keep it hanging.

On the IRC someone suggested I'd maybe retry to join this node,
e.g. decommission and rejoin it again. I'll try it now...


[1]
 INFO [COMPACTION-POOL:1] 2011-01-04 04:04:09,721 CompactionManager.java
(line 338) AntiCompacting
[org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvAds-6449-Data.db')]
 INFO [COMPACTION-POOL:1] 2011-01-04 04:34:18,683 CompactionManager.java
(line 338) AntiCompacting
[org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3874-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3873-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3876-Data.db')]
 INFO [COMPACTION-POOL:1] 2011-01-04 04:34:19,132 CompactionManager.java
(line 338) AntiCompacting
[org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-951-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-976-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-978-Data.db')]
 INFO [COMPACTION-POOL:1] 2011-01-04 04:34:26,486 CompactionManager.java
(line 338) AntiCompacting
[org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvAds-6449-Data.db')]

On Tue, Jan 4, 2011 at 12:45 PM, shimi shim...@gmail.com wrote:

 In my experience most of the time it takes for a node to join the cluster
 is the anticompaction on the other nodes. The streaming part is very fast.
 Check the other nodes logs to see if there is any node doing
 anticompaction.
 I don't remember how much data I had in the cluster when I needed to
 add/remove nodes. I do remember that it took a few hours.

 The node will join the ring only when it will finish the bootstrap.

 Shimi


 On Tue, Jan 4, 2011 at 12:28 PM, Ran Tavory ran...@gmail.com wrote:

 I asked the same question on the IRC but no luck there, everyone's asleep
 ;)...

 Using 0.6.6 I'm adding a new node to the cluster.
 It starts out fine but then gets stuck on the bootstrapping state for too
 long. More than an hour and still counting.

 $ bin/nodetool -p 9004 -h localhost streams
 Mode: Bootstrapping
 Not sending any streams.
 Not receiving any streams.


 It seemed to have streamed data from other nodes and indeed the load is
 non-zero but I'm not clear what's keeping it right now from finishing.

 $ bin/nodetool -p 9004 -h localhost info
 51042355038140769519506191114765231716
 Load : 22.49 GB
 Generation No: 1294133781
 Uptime (seconds) : 1795
 Heap Memory (MB) : 315.31 / 6117.00


 nodetool ring does not list this new node in the ring, although nodetool
 can happily talk to the new node, it's just not listing itself as a member
 of the ring. This is expected when the node is still bootstrapping, so the
 question is still how long might the bootstrap take and whether is it stuck.

 The data ins't huge so I find it hard to believe that streaming or anti
 compaction are the bottlenecks. I have ~20G on each node and the new node
 already has just about that so it seems that all data had already been
 streamed to it successfully, or at least most of the data... So what is it
 waiting for now? (same question, rephrased... ;)

 I tried:
 1. Restarting the new node. No good. All logs seem normal but at the end
 the node is still in bootstrap mode.
 2. As someone suggested I increased the rpc timeout from 10k to 30k
 (RpcTimeoutInMillis) but that didn't seem to help. I did this only on the
 new node. Should I have done that on all (old) nodes as well? Or maybe only
 on the ones that were supposed to stream data to that node.
 3. Logging level at DEBUG now but nothing interesting going on except
 for occasional messages such as [1] or [2]

 So the question is: what's keeping the new node from finishing the
 bootstrap and how can I check its status?
 Thanks

 [1] DEBUG [Timer-1] 2011-01-04 05:21:24,402 LoadDisseminator.java (line
 36) Disseminating load info ...
 [2] DEBUG [RMI TCP Connection(22)-192.168.252.88] 2011-01-04 05:12:48,033
 StorageService.java (line 1189) computing ranges for
 28356863910078205288614550619314017621,
 56713727820156410577229101238628035242,
  

Re: Reclaim deleted rows space

2011-01-04 Thread Peter Schuller
 This is what I thought. I was wishing there might be another way to reclaim
 the space.

Be sure you really need this first :) Normally you just let it happen in the bg.

 The problem is that the more data you have the more time it will take to
 Cassandra to response.

Relative to what though? There are definitely important side-effects
of having very large data sets, and part of that involves compactions,
but in a normal steady state type of system you should never be in the
position to wait for a major compaction to run. Compactions are
something that is intended to run every now and then in the
background. It will result in variations in disk space within certain
bounds, which is expected.

Certainly the situation can be improved and the current disk space
utilization situation is not perfect, but the above suggests to me
that you're trying to do something that is not really intended to be
done.

 Reclaim space of deleted rows in the biggest SSTable requires Major
 compaction. This compaction can be triggered by adding x2 data (or x4 data
 in the default configuration) to the system or by executing it manually
 using JMX.

You can indeed choose to trigger major compactions by e.g. cron jobs.
But just be aware that if you're operating under conditions where you
are close to disk space running out, you have other concerns too -
such as periodic repair operations also needing disk space.
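
Shimi mentioned triggering it over JMX; as a concrete illustration (not from this
thread), here is a minimal Java sketch that connects over JMX and asks a node for a
major compaction. The MBean name and the forceTableCompaction operation are
assumptions about a 0.6-era StorageService MBean, and port 8080 is just the common
0.6 default, so verify both with jconsole against your build first:

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class ForceMajorCompaction {
    public static void main(String[] args) throws Exception {
        String host = args.length > 0 ? args[0] : "localhost";
        // 8080 was the usual JMX port for 0.6-era nodes; adjust to your setup.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + host + ":8080/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            // ASSUMPTION: MBean name and operation as exposed by 0.6's
            // StorageService; neither is confirmed by this thread.
            ObjectName ss = new ObjectName(
                    "org.apache.cassandra.service:type=StorageService");
            mbs.invoke(ss, "forceTableCompaction", new Object[0], new String[0]);
            System.out.println("Requested major compaction on " + host);
        } finally {
            connector.close();
        }
    }
}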

Also; suppose you're overwriting lots of data (or replacing by
deleting and adding other data). It is not necessarily true that you
need 4x the space relative to what you otherwise do just because of
the compaction threshold.

Keep in mind that compactions already need extra space anyway. If
you're *not* overwriting or adding data, a compaction of a single CF
is expected to need up to twice the amount of space that it occupies.
If you're doing more overwrites and deletions though, as you point out
you will have more dead data at any given point in time. But on the
other hand, the peak disk space usage during compactions is lower. So
the actual peak disk space usage (which is what matters since you must
have this much disk space) is actually helped by the
deletions/overwrites too.
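
To put illustrative numbers on that (not from this thread): if a CF occupies
100 GB and none of it is dead data, a major compaction can temporarily need up to
roughly another 100 GB while the new file is written, for a peak near 200 GB. If
instead 40% of those 100 GB are overwritten or deleted data, the compaction
output is only about 60 GB, so the peak is nearer 160 GB, even though you were
carrying 40 GB of garbage beforehand.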

Further, suppose you trigger major compactions more often. That means
each compaction will have a higher relative spike of disk usage
because less data has had time to be overwritten or removed.

So in a sense, it's like the disk space demand is being moved between
the category of "dead data retained for longer than necessary" and
"peak disk usage during compaction".

Also keep in mind that the *low* peak of disk space usage is not
subject to any fragmentation concerns. Depending on the size of your
data compared to e.g. column names, that disk space usage might be
significantly lower than what you would get with an in-place updating
database. There are lots of trade-offs :)

You say you have to wait for deletions, though, which sounds like
you're doing something unusual. Are you doing stuff like deleting lots
of data in bulk from one CF, only to then write data to *another* CF?
Such that you're actually having to wait for disk space to be freed to
make room for data somewhere else?

 In case of a system that deletes data regularly, which needs to serve
 customers all day and the time it takes should be in ms, this is a problem.

Not in general. I am afraid there may be some misunderstanding here.
Unless disk space is a problem for you (i.e., you're running out of
space), there is no need to wait for compactions. And certainly
whether you can serve traffic 24/7 at low-ms latencies is an important
consideration, and does become complex when disk I/O is involved, but
it is not about disk *space*. If you have important performance
requirements, make sure you can service the read load at all given
your data set size. If you're running out of disk, I presume your
data is big. See
http://wiki.apache.org/cassandra/LargeDataSetConsiderations

Perhaps if you can describe your situation in more detail?

 It appears to me that in order to use Cassandra you must have a process that
 will trigger major compaction on the nodes once in X amount of time.

For some cases this will be beneficial, but not always. It's been
further improved for 0.7 too w.r.t. tombstone handling in non-major
compactions (I don't have the JIRA ticket number handy). It's
certainly not a hard requirement and would only ever be relevant if
you're operating nodes that are significantly full.

 One case where you would do that is when you don't (or hardly) delete data.

Or just in most cases where you don't push disk space concerns.

 Another one is when your upper limit of time it should take to response is
 very high so major compaction will not hurt you.

To be really clear: Compaction is a background operation. It is never
the case that reads or writes somehow wait for compaction to
complete.

-- 
/ Peter Schuller


Re: Bootstrapping taking long

2011-01-04 Thread Ran Tavory
Running nodetool decommission didn't help. Actually the node refused to
decommission itself (b/c it wasn't part of the ring). So I simply stopped
the process, deleted all the data directories and started it again. It
worked in the sense that the node bootstrapped again but, as before, after it
had finished moving the data nothing happened for a long time (I'm still
waiting, but nothing seems to be happening).

Any hints on how to analyze a stuck bootstrapping node?
thanks

On Tue, Jan 4, 2011 at 1:51 PM, Ran Tavory ran...@gmail.com wrote:

 Thanks Shimi, so indeed anticompaction was run on one of the other nodes
 from the same DC but to my understanding it has already ended. A few hour
 ago...
 I plenty of log messages such as [1] which ended a couple of hours ago, and
 I've seen the new node streaming and accepting the data from the node which
 performed the anticompaction and so far it was normal so it seemed that data
 is at its right place. But now the new node seems sort of stuck. None of the
 other nodes is anticompacting right now or had been anticompacting since
 then.
 The new node's CPU is close to zero, it's iostats are almost zero so I
 can't find another bottleneck that would keep it hanging.

 On the IRC someone suggested I'd maybe retry to join this node,
 e.g. decommission and rejoin it again. I'll try it now...


 [1]
  INFO [COMPACTION-POOL:1] 2011-01-04 04:04:09,721 CompactionManager.java
 (line 338) AntiCompacting
 [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvAds-6449-Data.db')]
  INFO [COMPACTION-POOL:1] 2011-01-04 04:34:18,683 CompactionManager.java
 (line 338) AntiCompacting
 [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3874-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3873-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3876-Data.db')]
  INFO [COMPACTION-POOL:1] 2011-01-04 04:34:19,132 CompactionManager.java
 (line 338) AntiCompacting
 [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-951-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-976-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-978-Data.db')]
  INFO [COMPACTION-POOL:1] 2011-01-04 04:34:26,486 CompactionManager.java
 (line 338) AntiCompacting
 [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvAds-6449-Data.db')]

 On Tue, Jan 4, 2011 at 12:45 PM, shimi shim...@gmail.com wrote:

 In my experience most of the time it takes for a node to join the cluster
 is the anticompaction on the other nodes. The streaming part is very fast.
 Check the other nodes logs to see if there is any node doing
 anticompaction.
 I don't remember how much data I had in the cluster when I needed to
 add/remove nodes. I do remember that it took a few hours.

 The node will join the ring only when it will finish the bootstrap.

 Shimi


 On Tue, Jan 4, 2011 at 12:28 PM, Ran Tavory ran...@gmail.com wrote:

 I asked the same question on the IRC but no luck there, everyone's asleep
 ;)...

 Using 0.6.6 I'm adding a new node to the cluster.
 It starts out fine but then gets stuck on the bootstrapping state for too
 long. More than an hour and still counting.

 $ bin/nodetool -p 9004 -h localhost streams
 Mode: Bootstrapping
 Not sending any streams.
 Not receiving any streams.


 It seemed to have streamed data from other nodes and indeed the load is
 non-zero but I'm not clear what's keeping it right now from finishing.

 $ bin/nodetool -p 9004 -h localhost info
 51042355038140769519506191114765231716
 Load : 22.49 GB
 Generation No: 1294133781
 Uptime (seconds) : 1795
 Heap Memory (MB) : 315.31 / 6117.00


 nodetool ring does not list this new node in the ring, although nodetool
 can happily talk to the new node, it's just not listing itself as a member
 of the ring. This is expected when the node is still bootstrapping, so the
 question is still how long might the bootstrap take and whether is it stuck.

 The data ins't huge so I find it hard to believe that streaming or anti
 compaction are the bottlenecks. I have ~20G on each node and the new node
 already has just about that so it seems that all data had already been
 streamed to it successfully, or at least most of the data... So what is it
 waiting for now? (same question, rephrased... ;)

 I tried:
 1. Restarting the new node. No good. All logs seem normal but at the end
 the node is still in bootstrap mode.
 2. As someone suggested I increased the rpc timeout from 10k to 30k
 (RpcTimeoutInMillis) but that didn't seem to help. I did this only on the
 new node. Should I have done that on all (old) nodes as well? Or maybe only
 on the ones that were supposed to stream data to that node.
 3. 

Re: Reclaim deleted rows space

2011-01-04 Thread shimi
I think I didn't make myself clear.
I don't have a problem with disk space. I have a problem with the data
size.
I have a simple CRUD application. Most of the requests are reads, but there
are updates/deletes, and as time passes the number of deleted rows becomes big
enough to free some disk space (a matter of days, not hours).
Since not all of the data can fit in RAM (and I have a lot of RAM) the rest
is served from disk. Since disk is slow I want to reduce as much as possible
the number of requests that go to the disk. The more requests go to the disk,
the longer the disk wait time gets and the more time it takes to return a response.

Bottom line is that I want to reduce the number of requests that go to
disk. Since there is enough data that is no longer valid I can do it by
reclaiming the space. The only way to do that is by running a major compaction.
I can wait and let Cassandra do it for me, but then the data size will get
even bigger and the response time will be worse. I can do it manually, but I
prefer it to happen in the background with less impact on the system.

Shimi


On Tue, Jan 4, 2011 at 2:33 PM, Peter Schuller
peter.schul...@infidyne.comwrote:

  This is what I thought. I was wishing there might be another way to
 reclaim
  the space.

 Be sure you really need this first :) Normally you just let it happen in
 the bg.

  The problem is that the more data you have the more time it will take to
  Cassandra to response.

 Relative to what though? There are definitely important side-effects
 of having very large data sets, and part of that involves compactions,
 but in a normal steady state type of system you should never be in the
 position to wait for a major compaction to run. Compactions are
 something that is intended to run every now and then in the
 background. It will result in variations in disk space within certain
 bounds, which is expected.

 Certainly the situation can be improved and the current disk space
 utilization situation is not perfect, but the above suggests to me
 that you're trying to do something that is not really intended to be
 done.

  Reclaim space of deleted rows in the biggest SSTable requires Major
  compaction. This compaction can be triggered by adding x2 data (or x4
 data
  in the default configuration) to the system or by executing it manually
  using JMX.

 You can indeed choose to trigger major compactions by e.g. cron jobs.
 But just be aware that if you're operating under conditions where you
 are close to disk space running out, you have other concerns too -
 such as periodic repair operations also needing disk space.

 Also; suppose you're overwriting lots of data (or replacing by
 deleting and adding other data). It is not necessarily true that you
 need 4x the space relative to what you otherwise do just because of
 the compaction threshold.

 Keep in mind that compactions already need extra space anyway. If
 you're *not* overwriting or adding data, a compaction of a single CF
 is expected to need up to twice the amount of space that it occupies.
 If you're doing more overwrites and deletions though, as you point out
 you will have more dead data at any given point in time. But on the
 other hand, the peak disk space usage during compactions is lower. So
 the actual peak disk space usage (which is what matters since you must
 have this much disk space) is actually helped by the
 deletions/overwrites too.

 Further, suppose you trigger major compactions more often. That means
 each compaction will have a higher relative spike of disk usage
 because less data has had time to be overwritten or removed.

 So in a sense, it's like the disk space demands is being moved between
 the category of dead data retained for longer than necessary and
 peak disk usage during compaction.

 Also keep in mind that the *low* peak of disk space usage is not
 subject to any fragmentation concerns. Depending on the size of your
 data compared to e.g. column names, that disk space usage might be
 significantly lower than what you would get with an in-place updating
 database. There are lots of trade-offs :)

 You say you have to wait for deletions though which sounds like
 you're doing something unusual. Are you doing stuff like deleting lots
 of data in bulk from one CF, only to then write data to *another* CF?
 Such that you're actually having to wait for disk space to be freed to
 make room for data somewhere else?

  In case of a system that deletes data regularly, which needs to serve
  customers all day and the time it takes should be in ms, this is a
 problem.

 Not in general. I am afraid there may be some misunderstanding here.
 Unless disk space is a problem for you (i.e., you're running out of
 space), there is no need to wait for compactions. And certainly
 whether you can serve traffic 24/7 at low-ms latencies is an important
 consideration, and does become complex when disk I/O is involved, but
 it is not about disk *space*. If you have important performance
 

Reading data problems during bootstrap. [pycassa 0.7.0 rc4]

2011-01-04 Thread Mateusz Korniak
hi ! 
As a Cassandra newbie, I am trying to convert my single-node cluster to a cluster 
with two nodes with RF=2.

I have a one-node cluster, RF=1, all data accessible:
nodetool -h 192.168.3.8  ring
Address         Status State   Load            Owns    Token
192.168.3.8 Up Normal  1.59 GB 100.00% 
150705614882854895881815284349323762700

  Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
Replication Factor: 1


If I switch to RF=2 here, my cluster will refuse updates and reads. So I 
bootstrap a 2nd node (192.168.3.4):

$ nodetool -h 192.168.3.8  ring
Address         Status State   Load            Owns    Token
   
150705614882854895881815284349323762700
192.168.3.4 Up Joining 120.94 KB   49.80%  
65291865063116528976449321853009633660
192.168.3.8 Up Normal  1.59 GB 50.20%  
150705614882854895881815284349323762700

question 1: Why, at that point, am I unable to read data for part of the keys? 

Even when the node is up:

nodetool -h 192.168.3.8  ring
Address         Status State   Load            Owns    Token
   
150705614882854895881815284349323762700
192.168.3.4 Up Normal  721.37 MB   49.80%  
65291865063116528976449321853009633660
192.168.3.8 Up Normal  1.59 GB 50.20%  
150705614882854895881815284349323762700

I am still unable to read part of my original set of data with 
ConsistencyLevel.ONE  (NotFoundException in pycassa) :/

question 2: Why is that? And what should I do to have a cluster with the full 
data?

Next I planned to do:
update keyspace with replication_factor=2;
repair both nodes,
and at this point have a fully working 2-node cluster with RF=2.

question 3: Is this the proper approach or is there a better one?


question 4: I hoped that during the above operations I would be able to _read_ 
the whole dataset as it was at the beginning in the one-node cluster. Is that possible?


Thanks in advance for any answers, regards,
-- 
Mateusz Korniak


Re: Bootstrapping taking long

2011-01-04 Thread Jake Luciani
In 0.6, locate the node doing anti-compaction and look in the streams
subdirectory in the keyspace data dir to monitor the anti-compaction
progress (it puts new SSTables for the bootstrapping node in there).
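
As an illustration (not from this thread), a minimal Java sketch that polls such
a streams subdirectory; the default path below is only an assumption based on the
data directory appearing in this thread's logs, so point it at your own keyspace
data dir:

import java.io.File;

public class WatchStreams {
    public static void main(String[] args) throws InterruptedException {
        // Path is illustrative, taken from the data dir seen in this thread's logs.
        File streams = new File(args.length > 0 ? args[0]
                : "/outbrain/cassandra/data/outbrain_kvdb/streams");
        while (true) {
            File[] files = streams.listFiles();
            if (files == null || files.length == 0) {
                System.out.println("no pending stream files (no anti-compaction output waiting)");
            } else {
                long bytes = 0;
                for (File f : files) {
                    bytes += f.length();
                }
                System.out.println(files.length + " file(s), " + bytes + " bytes waiting to stream");
            }
            Thread.sleep(10000); // poll every 10 seconds
        }
    }
}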

On Tue, Jan 4, 2011 at 8:01 AM, Ran Tavory ran...@gmail.com wrote:

 Running nodetool decommission didn't help. Actually the node refused to
 decommission itself (b/c it wasn't part of the ring). So I simply stopped
 the process, deleted all the data directories and started it again. It
 worked in the sense of the node bootstrapped again but as before, after it
 had finished moving the data nothing happened for a long time (I'm still
 waiting, but nothing seems to be happening).

 Any hints how to analyze a stuck bootstrapping node??
 thanks

 On Tue, Jan 4, 2011 at 1:51 PM, Ran Tavory ran...@gmail.com wrote:

 Thanks Shimi, so indeed anticompaction was run on one of the other nodes
 from the same DC but to my understanding it has already ended. A few hour
 ago...
 I plenty of log messages such as [1] which ended a couple of hours ago,
 and I've seen the new node streaming and accepting the data from the node
 which performed the anticompaction and so far it was normal so it seemed
 that data is at its right place. But now the new node seems sort of stuck.
 None of the other nodes is anticompacting right now or had been
 anticompacting since then.
 The new node's CPU is close to zero, it's iostats are almost zero so I
 can't find another bottleneck that would keep it hanging.

 On the IRC someone suggested I'd maybe retry to join this node,
 e.g. decommission and rejoin it again. I'll try it now...


 [1]
  INFO [COMPACTION-POOL:1] 2011-01-04 04:04:09,721 CompactionManager.java
 (line 338) AntiCompacting
 [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvAds-6449-Data.db')]
  INFO [COMPACTION-POOL:1] 2011-01-04 04:34:18,683 CompactionManager.java
 (line 338) AntiCompacting
 [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3874-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3873-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3876-Data.db')]
  INFO [COMPACTION-POOL:1] 2011-01-04 04:34:19,132 CompactionManager.java
 (line 338) AntiCompacting
 [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-951-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-976-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-978-Data.db')]
  INFO [COMPACTION-POOL:1] 2011-01-04 04:34:26,486 CompactionManager.java
 (line 338) AntiCompacting
 [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvAds-6449-Data.db')]

 On Tue, Jan 4, 2011 at 12:45 PM, shimi shim...@gmail.com wrote:

 In my experience most of the time it takes for a node to join the cluster
 is the anticompaction on the other nodes. The streaming part is very fast.
 Check the other nodes logs to see if there is any node doing
 anticompaction.
 I don't remember how much data I had in the cluster when I needed to
 add/remove nodes. I do remember that it took a few hours.

 The node will join the ring only when it will finish the bootstrap.

 Shimi


 On Tue, Jan 4, 2011 at 12:28 PM, Ran Tavory ran...@gmail.com wrote:

 I asked the same question on the IRC but no luck there, everyone's
 asleep ;)...

 Using 0.6.6 I'm adding a new node to the cluster.
 It starts out fine but then gets stuck on the bootstrapping state for
 too long. More than an hour and still counting.

 $ bin/nodetool -p 9004 -h localhost streams
 Mode: Bootstrapping
 Not sending any streams.
 Not receiving any streams.


 It seemed to have streamed data from other nodes and indeed the load is
 non-zero but I'm not clear what's keeping it right now from finishing.

 $ bin/nodetool -p 9004 -h localhost info
 51042355038140769519506191114765231716
 Load : 22.49 GB
 Generation No: 1294133781
 Uptime (seconds) : 1795
 Heap Memory (MB) : 315.31 / 6117.00


 nodetool ring does not list this new node in the ring, although nodetool
 can happily talk to the new node, it's just not listing itself as a member
 of the ring. This is expected when the node is still bootstrapping, so the
 question is still how long might the bootstrap take and whether is it 
 stuck.

 The data ins't huge so I find it hard to believe that streaming or anti
 compaction are the bottlenecks. I have ~20G on each node and the new node
 already has just about that so it seems that all data had already been
 streamed to it successfully, or at least most of the data... So what is it
 waiting for now? (same question, rephrased... ;)

 I tried:
 1. Restarting the new node. No good. All logs seem normal but at the end
 the node is still in bootstrap mode.
 2. 

Looking for London-based users of Cassandra

2011-01-04 Thread Dave Gardner
I am looking for London-based users of Cassandra who would be interested in
giving a short talk on _how_ they make use of Cassandra, hopefully including
details such as data layout, types of query, load, etc. This is for the
Cassandra London user group -- this month we are planning to have more than
one talk based around use cases.

If anyone is interested then please get in contact.

http://www.meetup.com/Cassandra-London/calendar/15490565/


Dave


Re: Bootstrapping taking long

2011-01-04 Thread shimi
You will have something new to talk about in your talk tomorrow :)

You said that the anti compaction was only on a single node? I think that
your new node should get data from at least two other nodes (depending on
the replication factor). Maybe the problem is not in the new node.
In an old version (I think prior to 0.6.3) there was a case of stuck bootstrap
that required restarting the new node and the nodes which were supposed to
stream data to it. As far as I remember this case was resolved. I haven't
seen this problem since then.

Shimi

On Tue, Jan 4, 2011 at 3:01 PM, Ran Tavory ran...@gmail.com wrote:

 Running nodetool decommission didn't help. Actually the node refused to
 decommission itself (b/c it wasn't part of the ring). So I simply stopped
 the process, deleted all the data directories and started it again. It
 worked in the sense of the node bootstrapped again but as before, after it
 had finished moving the data nothing happened for a long time (I'm still
 waiting, but nothing seems to be happening).

 Any hints how to analyze a stuck bootstrapping node??
 thanks

 On Tue, Jan 4, 2011 at 1:51 PM, Ran Tavory ran...@gmail.com wrote:

 Thanks Shimi, so indeed anticompaction was run on one of the other nodes
 from the same DC but to my understanding it has already ended. A few hour
 ago...
 I plenty of log messages such as [1] which ended a couple of hours ago,
 and I've seen the new node streaming and accepting the data from the node
 which performed the anticompaction and so far it was normal so it seemed
 that data is at its right place. But now the new node seems sort of stuck.
 None of the other nodes is anticompacting right now or had been
 anticompacting since then.
 The new node's CPU is close to zero, it's iostats are almost zero so I
 can't find another bottleneck that would keep it hanging.

 On the IRC someone suggested I'd maybe retry to join this node,
 e.g. decommission and rejoin it again. I'll try it now...


 [1]
  INFO [COMPACTION-POOL:1] 2011-01-04 04:04:09,721 CompactionManager.java
 (line 338) AntiCompacting
 [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvAds-6449-Data.db')]
  INFO [COMPACTION-POOL:1] 2011-01-04 04:34:18,683 CompactionManager.java
 (line 338) AntiCompacting
 [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3874-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3873-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3876-Data.db')]
  INFO [COMPACTION-POOL:1] 2011-01-04 04:34:19,132 CompactionManager.java
 (line 338) AntiCompacting
 [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-951-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-976-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-978-Data.db')]
  INFO [COMPACTION-POOL:1] 2011-01-04 04:34:26,486 CompactionManager.java
 (line 338) AntiCompacting
 [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvAds-6449-Data.db')]

 On Tue, Jan 4, 2011 at 12:45 PM, shimi shim...@gmail.com wrote:

 In my experience most of the time it takes for a node to join the cluster
 is the anticompaction on the other nodes. The streaming part is very fast.
 Check the other nodes logs to see if there is any node doing
 anticompaction.
 I don't remember how much data I had in the cluster when I needed to
 add/remove nodes. I do remember that it took a few hours.

 The node will join the ring only when it will finish the bootstrap.

 Shimi


 On Tue, Jan 4, 2011 at 12:28 PM, Ran Tavory ran...@gmail.com wrote:

 I asked the same question on the IRC but no luck there, everyone's
 asleep ;)...

 Using 0.6.6 I'm adding a new node to the cluster.
 It starts out fine but then gets stuck on the bootstrapping state for
 too long. More than an hour and still counting.

 $ bin/nodetool -p 9004 -h localhost streams
 Mode: Bootstrapping
 Not sending any streams.
 Not receiving any streams.


 It seemed to have streamed data from other nodes and indeed the load is
 non-zero but I'm not clear what's keeping it right now from finishing.

 $ bin/nodetool -p 9004 -h localhost info
 51042355038140769519506191114765231716
 Load : 22.49 GB
 Generation No: 1294133781
 Uptime (seconds) : 1795
 Heap Memory (MB) : 315.31 / 6117.00


 nodetool ring does not list this new node in the ring, although nodetool
 can happily talk to the new node, it's just not listing itself as a member
 of the ring. This is expected when the node is still bootstrapping, so the
 question is still how long might the bootstrap take and whether is it 
 stuck.

 The data ins't huge so I find it hard to believe that streaming or anti
 compaction are the bottlenecks. I have ~20G on each node 

Re: drop column family bug

2011-01-04 Thread Jonathan Ellis
Data files are explained on the page I linked.

Snapshots must be deleted manually.

2011/1/4 陶敏 taomin...@taobao.com

  Hello!

 I would like to ask a question: at what time will data files and snapshot files 
 be deleted? After I run compact, they still remain.

 please help me. Thank you.



 Sincerely,



 Salute





  Re: drop column family bug

 Jonathan Ellis jbellis at gmail.com
 2011-01-04 14:08:03 GMT

 It's normal for the sstables to remain temporarily. See discussion of the
 compaction marker at http://wiki.apache.org/cassandra/MemtableSSTable.



 It's also normal for a snapshot to be taken before drop commands.



 2011/1/4 陶敏 taomin.tw at taobao.com



   Hello!

  When I use cassandra 0.7 rc4 to delete a column family, the data file still
  exists. It also appears in the snapshot data backup. Is this a bug, or are my
  version or parameters wrong? Please help me. Thank you.

 

  Sincerely,

 

 Salute








-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


Re: Converting a TimeUUID to a long (timestamp) and vice-versa

2011-01-04 Thread Roshan Dawrani
Hello Victor,

It is actually not that I need the 2 UUIDs to be exactly the same - they need to
be the same timestamp-wise.

So, what I need is to extract the timestamp portion from a time UUID (say,
U1) and then later in the cycle, use the same long timestamp value to
re-create a UUID (say, U2) that is equivalent of the previous one in terms
of its timestamp portion - i.e., I should be able to give this U2 and filter
the data from a column family - and it should be same as if I had used the
original UUID U1.

Does it make any more sense than before? Any way I can do that?

rgds,
Roshan

On Tue, Jan 4, 2011 at 11:46 PM, Victor Kabdebon
victor.kabde...@gmail.comwrote:

 Hello Roshan,

 Well it is normal to do not be able to get the exact same UUID from a
 timestamp, it is its purpose.
 When you create an UUID you have in fact two information : random 64 bits
 number - 64 bits timestamp. You put that together and you have your uuid.
 .
 So unless you save your random number two UUID for the same milli( or
 micro) second are different.

 Best regards,
 Victor K.
 http://www.voxnucleus.fr

 2011/1/4 Roshan Dawrani roshandawr...@gmail.com

 Hi,
 I am having a little difficulty converting a time UUID to its timestamp
 equivalent and back. Can someone please help?

 Here is what I am trying. Is it not the right way to do it?

 ===
 UUID someUUID = TimeUUIDUtils.getUniqueTimeUUIDinMillis();

 long time = someUUID.timestamp(); /* convery from UUID to a long
 timestamp */
 UUID otherUUID = TimeUUIDUtils.getTimeUUID(time); /* do the
 reverse and get back the UUID from timestamp */

 System.out.println(someUUID); /* someUUID and otherUUID should be
 same, but are different */
 System.out.println(otherUUID);
 ===

 --
 Roshan
 Blog: http://roshandawrani.wordpress.com/
 Twitter: @roshandawrani http://twitter.com/roshandawrani
 Skype: roshandawrani





-- 
Roshan
Blog: http://roshandawrani.wordpress.com/
Twitter: @roshandawrani http://twitter.com/roshandawrani
Skype: roshandawrani


Re: Converting a TimeUUID to a long (timestamp) and vice-versa

2011-01-04 Thread Patricio Echagüe
In the Hector framework, take a look at TimeUUIDUtils.java

You can create a UUID using   TimeUUIDUtils.getTimeUUID(long time); or
TimeUUIDUtils.getTimeUUID(ClockResolution clock)

and later on, TimeUUIDUtils.getTimeFromUUID(..) or just UUID.timestamp();

There are some examples in TimeUUIDUtilsTest.java

Let me know if it helps.
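
To make the round trip concrete, here is a minimal sketch using the two helpers
named above plus plain java.util.UUID. The epoch-offset constant is the standard
gap between the UUID epoch (1582-10-15) and the Unix epoch; the exact Hector
package and method signatures should be checked against your Hector version:

import java.util.UUID;
import me.prettyprint.cassandra.utils.TimeUUIDUtils;

public class TimeUuidRoundTrip {
    // 100-ns intervals between the UUID epoch (1582-10-15) and the Unix epoch (1970-01-01).
    private static final long UUID_EPOCH_OFFSET = 0x01B21DD213814000L;

    public static void main(String[] args) {
        UUID u1 = TimeUUIDUtils.getUniqueTimeUUIDinMillis();

        // UUID.timestamp() is in 100-ns units since 1582, not millis since 1970,
        // which is why feeding it straight back into getTimeUUID(long) goes wrong.
        long millis = (u1.timestamp() - UUID_EPOCH_OFFSET) / 10000;

        // Rebuild a UUID carrying the same time component. u2 will not equal u1
        // (different clock-seq/node bits), but its timestamp portion matches,
        // which is what matters when slicing a TimeUUID-compared column family.
        UUID u2 = TimeUUIDUtils.getTimeUUID(millis);

        System.out.println(u1 + " -> " + millis + " ms -> " + u2);
    }
}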



On Tue, Jan 4, 2011 at 10:27 AM, Roshan Dawrani roshandawr...@gmail.comwrote:

 Hello Victor,

 It is actually not that I need the 2 UUIDs to be exactly same - they need
 to be same timestamp wise.

 So, what I need is to extract the timestamp portion from a time UUID (say,
 U1) and then later in the cycle, use the same long timestamp value to
 re-create a UUID (say, U2) that is equivalent of the previous one in terms
 of its timestamp portion - i.e., I should be able to give this U2 and filter
 the data from a column family - and it should be same as if I had used the
 original UUID U1.

 Does it make any more sense than before? Any way I can do that?

 rgds,
 Roshan


 On Tue, Jan 4, 2011 at 11:46 PM, Victor Kabdebon 
 victor.kabde...@gmail.com wrote:

 Hello Roshan,

 Well it is normal to do not be able to get the exact same UUID from a
 timestamp, it is its purpose.
 When you create an UUID you have in fact two information : random 64 bits
 number - 64 bits timestamp. You put that together and you have your uuid.
 .
 So unless you save your random number two UUID for the same milli( or
 micro) second are different.

 Best regards,
 Victor K.
 http://www.voxnucleus.fr

 2011/1/4 Roshan Dawrani roshandawr...@gmail.com

 Hi,
 I am having a little difficulty converting a time UUID to its timestamp
 equivalent and back. Can someone please help?

 Here is what I am trying. Is it not the right way to do it?

 ===
 UUID someUUID = TimeUUIDUtils.getUniqueTimeUUIDinMillis();

 long time = someUUID.timestamp(); /* convery from UUID to a long
 timestamp */
 UUID otherUUID = TimeUUIDUtils.getTimeUUID(time); /* do the
 reverse and get back the UUID from timestamp */

 System.out.println(someUUID); /* someUUID and otherUUID should be
 same, but are different */
 System.out.println(otherUUID);
 ===

 --
 Roshan
 Blog: http://roshandawrani.wordpress.com/
 Twitter: @roshandawrani http://twitter.com/roshandawrani
 Skype: roshandawrani





 --
 Roshan
 Blog: http://roshandawrani.wordpress.com/
 Twitter: @roshandawrani http://twitter.com/roshandawrani
 Skype: roshandawrani




-- 
Patricio.-


Re: Hector version

2011-01-04 Thread Nate McCall
0.6.0-19 still relies on the local lib directory and execution targets
in the pom file for tracking Cassandra versions. This is fixed in the
0.6.0 branch, but has not been released, although it is stable (we
could probably stand to do a release here and certainly will with
0.6.9 of Cassandra).

As for migrating from 0.6.0-16, this thread provides some high level
details (as well as additional maven explanation):
http://groups.google.com/group/hector-users/browse_thread/thread/ada58caca0174858/e8dd164ff10cc649?lnk=gstq=release+0.6#

For future reference, hector-specific questions might be better
directed towards hector-us...@googlegroups.com as you will probably
get a more direct reply quicker.


On Tue, Jan 4, 2011 at 12:18 PM, Hugo Zwaal h...@unitedgames.com wrote:
 Hi,

 I'm also using Hector on Cassandra 0.6.8, but could not get Hector 0.6.0-19
 to work in Maven. It refers to a nonexistent
 or.apache.cassandra/cassandra/0.6.5 package. I suspect this should be
 org.apache.cassandra/apache-cassandra/0.6.5. I also noticed there exists a
 0.6.0-20 version. My questions are:

 1) Wouldn't it be better to use 0.6.0-20 over 0.6.0-19, or is the latter one
 preferred (and why)?
 2) Is there some documentation on migrating from 0.6.0-16? I noticed that
 several things changed in a backward-incompatible way.

 Thanks, Hugo.

 On 12/31/2010 8:52 AM, Ran Tavory wrote:

 Use 0.6.0-19

 On Friday, December 31, 2010, Zhidong She zhidong@gmail.com  wrote:

 Hi guys,

 We are trying Cassandra 0.6.8, and could you please kindly tell me which
 Hector Java client is suitable for 0.6.8?
 The Hector 0.7.0 says it's for Cassandra 0.7.X, and shall we use Hector
 0.6.0?

 Thanks,
 Br
 Zhidong





Re: Bootstrapping taking long

2011-01-04 Thread Ran Tavory
Thanks Jake, but unfortunately the streams directory is empty so I don't
think that any of the nodes is anti-compacting data right now or has been in
the past 5 hours.
It seems that all the data was already transferred to the joining host but
the joining node, after having received the data, would still remain in
bootstrapping mode and not join the cluster. I'm not sure that *all* data
was transferred (perhaps other nodes need to transfer more data) but nothing
is actually happening so I assume all has been moved.
Perhaps it's a configuration error on my part. Should I use
AutoBootstrap=true? Anything else I should look out for in the
configuration file or something else?


On Tue, Jan 4, 2011 at 4:08 PM, Jake Luciani jak...@gmail.com wrote:

 In 0.6, locate the node doing anti-compaction and look in the streams
 subdirectory in the keyspace data dir to monitor the anti-compaction
 progress (it puts new SSTables for bootstrapping node in there)


 On Tue, Jan 4, 2011 at 8:01 AM, Ran Tavory ran...@gmail.com wrote:

 Running nodetool decommission didn't help. Actually the node refused to
 decommission itself (b/c it wasn't part of the ring). So I simply stopped
 the process, deleted all the data directories and started it again. It
 worked in the sense of the node bootstrapped again but as before, after it
 had finished moving the data nothing happened for a long time (I'm still
 waiting, but nothing seems to be happening).

 Any hints how to analyze a stuck bootstrapping node??
 thanks

 On Tue, Jan 4, 2011 at 1:51 PM, Ran Tavory ran...@gmail.com wrote:

 Thanks Shimi, so indeed anticompaction was run on one of the other nodes
 from the same DC but to my understanding it has already ended. A few hour
 ago...
 I plenty of log messages such as [1] which ended a couple of hours ago,
 and I've seen the new node streaming and accepting the data from the node
 which performed the anticompaction and so far it was normal so it seemed
 that data is at its right place. But now the new node seems sort of stuck.
 None of the other nodes is anticompacting right now or had been
 anticompacting since then.
 The new node's CPU is close to zero, it's iostats are almost zero so I
 can't find another bottleneck that would keep it hanging.

 On the IRC someone suggested I'd maybe retry to join this node,
 e.g. decommission and rejoin it again. I'll try it now...


 [1]
  INFO [COMPACTION-POOL:1] 2011-01-04 04:04:09,721 CompactionManager.java
 (line 338) AntiCompacting
 [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvAds-6449-Data.db')]
  INFO [COMPACTION-POOL:1] 2011-01-04 04:34:18,683 CompactionManager.java
 (line 338) AntiCompacting
 [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3874-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3873-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3876-Data.db')]
  INFO [COMPACTION-POOL:1] 2011-01-04 04:34:19,132 CompactionManager.java
 (line 338) AntiCompacting
 [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-951-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-976-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-978-Data.db')]
  INFO [COMPACTION-POOL:1] 2011-01-04 04:34:26,486 CompactionManager.java
 (line 338) AntiCompacting
 [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvAds-6449-Data.db')]

 On Tue, Jan 4, 2011 at 12:45 PM, shimi shim...@gmail.com wrote:

 In my experience most of the time it takes for a node to join the
 cluster is the anticompaction on the other nodes. The streaming part is 
 very
 fast.
 Check the other nodes logs to see if there is any node doing
 anticompaction.
 I don't remember how much data I had in the cluster when I needed to
 add/remove nodes. I do remember that it took a few hours.

 The node will join the ring only when it will finish the bootstrap.

 Shimi


 On Tue, Jan 4, 2011 at 12:28 PM, Ran Tavory ran...@gmail.com wrote:

 I asked the same question on the IRC but no luck there, everyone's
 asleep ;)...

 Using 0.6.6 I'm adding a new node to the cluster.
 It starts out fine but then gets stuck on the bootstrapping state for
 too long. More than an hour and still counting.

 $ bin/nodetool -p 9004 -h localhost streams
 Mode: Bootstrapping
 Not sending any streams.
 Not receiving any streams.


 It seemed to have streamed data from other nodes and indeed the load is
 non-zero but I'm not clear what's keeping it right now from finishing.

 $ bin/nodetool -p 9004 -h localhost info
 51042355038140769519506191114765231716
 Load : 22.49 GB
 Generation No: 1294133781
 Uptime (seconds) : 1795
 Heap Memory (MB) : 315.31 / 6117.00


 nodetool 

Re: Cassandra disk usage and failure recovery

2011-01-04 Thread Peter Schuller
 That is correct.  In 0.6, an anticompaction was performed and a temporary
 SSTable was written out to disk, then streamed to the recipient.  The way
 this is now done in 0.7 requires no extra disk space on the source node.

Great. So that should at least mean that running out of diskspace
should always be solvable in terms of the cluster by adding new nodes in
between other pre-existing nodes. That is provided that the internal
node issues (allowing compaction to take place so space can actually
be freed) are solved in some way. And assuming the compaction redesign
happens at some point, that should be a minor issue because it should
be easy to avoid the possibility of getting to the point of not
fitting a single (size limited) sstable.

-- 
/ Peter Schuller


Re: Reclaim deleted rows space

2011-01-04 Thread Peter Schuller
 I don't have a problem with disk space. I have a problem with the data
 size.

[snip]

 Bottom line is that I want to reduce the number of requests that goes to
 disk. Since there is enough data that is no longer valid I can do it by
 reclaiming the space. The only way to do it is by running Major compaction.
 I can wait and let Cassandra do it for me but then the data size will get
 even bigger and the response time will be worst. I can do it manually but I
 prefer it to happen in the background with less impact on the system

Ok - that makes perfect sense then. Sorry for misunderstanding :)

So essentially, for workloads that are teetering on the edge of cache
warmness and are subject to significant overwrites or removals, it may
be beneficial to perform much more aggressive background compaction
even though it might waste lots of CPU, to keep the in-memory working
set down.

There was talk (I think in the compaction redesign ticket) about
potentially improving the use of bloom filters such that obsolete data
in sstables could be eliminated from the read set without
necessitating actual compaction; that might help address cases like
these too.

I don't think there's a pre-existing silver bullet in a current
release; you probably have to live with the need for
greater-than-theoretically-optimal memory requirements to keep the
working set in memory.

-- 
/ Peter Schuller


Re: Insert LongType with ruby

2011-01-04 Thread vicent roca daniel
I'm getting more consistent results using Time.stamp instead of Time

From: https://github.com/fauna/cassandra/blob/master/lib/cassandra/long.rb

when NilClass, Time
# Time.stamp is 52 bytes, so we have 12 bytes of entropy left over
int = ((bytes || Time).stamp << 12) + rand(2**12)

I'll keep looking at this.
Thanks! :)


On Mon, Jan 3, 2011 at 10:24 PM, vicent roca daniel sap...@gmail.comwrote:

 The problem I think I have is that I think I'm not storing the correct
 value.
 If I do this (for example):

 app.insert(:NumData, 'device1-cpu', { Time.now + 1 minut => 10.to_s })
 app.insert(:NumData, 'device1-cpu', { Time.now + 1 minu => 10.to_s })
 app.insert(:NumData, 'device1-cpu', { Time.now + 1 minu => 10.to_s })
 app.insert(:NumData, 'device1-cpu', { Time.now + 1 minu => 10.to_s })

 and I do a get query with :start= first_Time.now and :finish= the second
 Time, I should get two columns, but I'm getting none.
 I suspect that the column name is not a valid Time.

 ¿That make sense?
 I'm really new, so please, understand me if I did something crazy :)


 On Mon, Jan 3, 2011 at 10:17 PM, Ryan King r...@twitter.com wrote:

 On Mon, Jan 3, 2011 at 1:15 PM, vicent roca daniel sap...@gmail.com
 wrote:
  hi,
  no I'n not getting any exception.

 Then what problem are you seeing?

 -ryan

  The value gets inserted withou problem.
  If I try to convert to string I get:
  Cassandra::Comparable::TypeError: Expected 2011-01-03 22:14:40 +0100
 to
  cast to a Cassandra::Long (invalid bytecount)
  from
 
 /Users/armandolalala/.rvm/gems/ruby-1.9.2-p0/gems/cassandra-0.9.0/lib/cassandra/long.rb:20:in
  `initialize'
  from
 
 /Users/armandolalala/.rvm/gems/ruby-1.9.2-p0/gems/cassandra-0.9.0/lib/cassandra/0.6/columns.rb:10:in
  `new'
  from
 
 /Users/armandolalala/.rvm/gems/ruby-1.9.2-p0/gems/cassandra-0.9.0/lib/cassandra/0.6/columns.rb:10:in
  `_standard_insert_mutation'
  from
 
 /Users/armandolalala/.rvm/gems/ruby-1.9.2-p0/gems/cassandra-0.9.0/lib/cassandra/cassandra.rb:125:in
  `block in insert'
  from
 
 /Users/armandolalala/.rvm/gems/ruby-1.9.2-p0/gems/cassandra-0.9.0/lib/cassandra/cassandra.rb:125:in
  `each'
  from
 
 /Users/armandolalala/.rvm/gems/ruby-1.9.2-p0/gems/cassandra-0.9.0/lib/cassandra/cassandra.rb:125:in
  `collect'
  from
 
 /Users/armandolalala/.rvm/gems/ruby-1.9.2-p0/gems/cassandra-0.9.0/lib/cassandra/cassandra.rb:125:in
  `insert'
  from (irb):6
  from /Users/armandolalala/.rvm/rubies/ruby-1.9.2-p0/bin/irb:17:in
 `main'
 
  On Mon, Jan 3, 2011 at 10:06 PM, Ryan King r...@twitter.com wrote:
 
  On Mon, Jan 3, 2011 at 12:56 PM, vicent roca daniel sap...@gmail.com
  wrote:
   Hi again!
   code:
   require 'rubygems'
   require 'cassandra'
   app = Cassandra.new('AOM', servers = 127.0.0.1:9160)
    app.insert(:NumData, 'device1-cpu', { Time.now => 10.to_s })
 
  I'm going to assume you're getting an exception here? I think you need
  to convert the time to a string.
 
   
    storage-conf.xml:
    <Keyspace Name="AOM">
    <ColumnFamily CompareWith="LongType" Name="NumericArchive" />
   /..
  
   Thanks!!
 
  -ryan
 
 





Re: Insert LongType with ruby

2011-01-04 Thread Ryan King
On Tue, Jan 4, 2011 at 12:50 PM, vicent roca daniel sap...@gmail.com wrote:
 I'm getting more consistent results using Time.stamp instead of Time
 From: https://github.com/fauna/cassandra/blob/master/lib/cassandra/long.rb

Yeah, you were probably overwriting values then.

-ryan


Re: Insert LongType with ruby

2011-01-04 Thread vicent roca daniel
I don't know.
Looking at the table with the cli I see these results:

Using app.insert(:Numers, 'device1-cpu', {Time.now => i.to_s }) :

=> (column=5300944406187227576, value=3, timestamp=1294175880417061)
=> (column=5300944406181604704, value=2, timestamp=1294175880415584)
=> (column=5300944406071978530, value=1, timestamp=1294175880413584)


Using app.insert(:Numers, 'device1-cpu', {Time.stamp => i.to_s }) :

=> (column=1294176156967820, value=3, timestamp=1294176156967851)
=> (column=1294176156966904, value=2, timestamp=1294176156966949)
=> (column=1294176156957286, value=1, timestamp=1294176156965795)

Which I think makes more sense, since these column names are timestamps.
I'll keep working on this.
Thanks for your help ryan :)



On Tue, Jan 4, 2011 at 10:18 PM, Ryan King r...@twitter.com wrote:

 On Tue, Jan 4, 2011 at 12:50 PM, vicent roca daniel sap...@gmail.com
 wrote:
  I'm getting more consistent results using Time.stamp instead of Time
  From:
 https://github.com/fauna/cassandra/blob/master/lib/cassandra/long.rb

 Yeah, you were probably overwriting values then.

 -ryan



Re: Bootstrapping taking long

2011-01-04 Thread Ran Tavory
I'm still at a loss. I haven't been able to resolve this. I tried
adding another node at a different location on the ring but this node
too remains stuck in the bootstrapping state for many hours without
any of the other nodes being busy with anticompaction or anything
else. I don't know what's keeping it from finishing the bootstrap: no
CPU, no IO, files were already streamed, so what is it waiting for?
I read the release notes of 0.6.7 and 0.6.8 and there didn't seem to
be anything addressing a similar issue so I figured there was no point
in upgrading. But let me know if you think there is.
Or any other advice...

On Tuesday, January 4, 2011, Ran Tavory ran...@gmail.com wrote:
 Thanks Jake, but unfortunately the streams directory is empty so I don't 
 think that any of the nodes is anti-compacting data right now or had been in 
 the past 5 hours. It seems that all the data was already transferred to the 
 joining host but the joining node, after having received the data would still 
 remain in bootstrapping mode and not join the cluster. I'm not sure that 
 *all* data was transferred (perhaps other nodes need to transfer more data) 
 but nothing is actually happening so I assume all has been moved.
 Perhaps it's a configuration error on my part. Should I use 
 AutoBootstrap=true? Is there anything else I should look out for in the configuration 
 file, or somewhere else?


 On Tue, Jan 4, 2011 at 4:08 PM, Jake Luciani jak...@gmail.com wrote:

 In 0.6, locate the node doing anti-compaction and look in the streams 
 subdirectory in the keyspace data dir to monitor the anti-compaction progress 
 (it puts new SSTables for bootstrapping node in there)


 On Tue, Jan 4, 2011 at 8:01 AM, Ran Tavory ran...@gmail.com wrote:


 Running nodetool decommission didn't help. Actually the node refused to 
 decommission itself (b/c it wasn't part of the ring). So I simply stopped the 
 process, deleted all the data directories and started it again. It worked in 
 the sense of the node bootstrapped again but as before, after it had finished 
 moving the data nothing happened for a long time (I'm still waiting, but 
 nothing seems to be happening).




 Any hints how to analyze a stuck bootstrapping node??thanks
 On Tue, Jan 4, 2011 at 1:51 PM, Ran Tavory ran...@gmail.com wrote:
 Thanks Shimi, so indeed anticompaction was run on one of the other nodes from 
 the same DC but to my understanding it has already ended. A few hour ago...



 I plenty of log messages such as [1] which ended a couple of hours ago, and 
 I've seen the new node streaming and accepting the data from the node which 
 performed the anticompaction and so far it was normal so it seemed that data 
 is at its right place. But now the new node seems sort of stuck. None of the 
 other nodes is anticompacting right now or had been anticompacting since then.




 The new node's CPU is close to zero, it's iostats are almost zero so I can't 
 find another bottleneck that would keep it hanging.
 On the IRC someone suggested I'd maybe retry to join this node, 
 e.g. decommission and rejoin it again. I'll try it now...






 [1] INFO [COMPACTION-POOL:1] 2011-01-04 04:04:09,721 CompactionManager.java 
 (line 338) AntiCompacting 
 [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvAds-6449-Data.db')]




  INFO [COMPACTION-POOL:1] 2011-01-04 04:34:18,683 CompactionManager.java 
 (line 338) AntiCompacting 
 [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3874-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3873-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3876-Data.db')]




  INFO [COMPACTION-POOL:1] 2011-01-04 04:34:19,132 CompactionManager.java 
 (line 338) AntiCompacting 
 [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-951-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-976-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-978-Data.db')]




  INFO [COMPACTION-POOL:1] 2011-01-04 04:34:26,486 CompactionManager.java 
 (line 338) AntiCompacting 
 [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvAds-6449-Data.db')]





 On Tue, Jan 4, 2011 at 12:45 PM, shimi shim...@gmail.com wrote:





 In my experience most of the time it takes for a node to join the cluster is 
 the anticompaction on the other nodes. The streaming part is very fast.
 Check the other nodes logs to see if there is any node doing anticompaction.I 
 don't remember how much data I had in the cluster when I needed to add/remove 
 nodes. I do remember that it took a few hours.






 The node will join the ring only when it will finish the bootstrap.
 --
 /Ran



-- 
/Ran


Cassandra LongType data insertion problem

2011-01-04 Thread Jaydeep Chovatia
Hi,

I have configured Cassandra Column Family (standard CF) of LongType. If I try 
to insert data (using batch_mutate) in this Column Family then it shows me 
the following error: "A long is exactly 8 bytes". I have tried assigning a column 
name of 8 bytes, 7 bytes, etc., but it shows the same error.

Please find my sample program details:
Platform: Linux
Language: C++, Cassandra Thrift interface

Column c1;
c1.name = 12345678;
c1.value = SString(len).AsPtr();
c1.timestamp = curTime;
columns.push_back(c1);

Any help on this would be appreciated.

Thank you,
Jaydeep


Deletion via SliceRange

2011-01-04 Thread mike dooley
any idea when Deletion via SliceRanges will be supported?

 [java] Caused by: InvalidRequestException(why:Deletion does not yet 
support SliceRange predicates.)
 [java] at 
org.apache.cassandra.thrift.Cassandra$batch_mutate_result.read(Cassandra.java:16477)

thanks,
-mike

Re: Bootstrapping taking long

2011-01-04 Thread Nate McCall
Does the new node have itself in the list of seeds per chance? This
could cause some issues if so.

On Tue, Jan 4, 2011 at 4:10 PM, Ran Tavory ran...@gmail.com wrote:
 I'm still at lost.   I haven't been able to resolve this. I tried
 adding another node at a different location on the ring but this node
 too remains stuck in the bootstrapping state for many hours without
 any of the other nodes being busy with anti compaction or anything
 else. I don't know what's keeping it from finishing the bootstrap,no
 CPU, no io, files were already streamed so what is it waiting for?
 I read the release notes of 0.6.7 and 0.6.8 and there didn't seem to
 be anything addressing a similar issue so I figured there was no point
 in upgrading. But let me know if you think there is.
 Or any other advice...

 On Tuesday, January 4, 2011, Ran Tavory ran...@gmail.com wrote:
 Thanks Jake, but unfortunately the streams directory is empty so I don't 
 think that any of the nodes is anti-compacting data right now or had been in 
 the past 5 hours. It seems that all the data was already transferred to the 
 joining host but the joining node, after having received the data would 
 still remain in bootstrapping mode and not join the cluster. I'm not sure 
 that *all* data was transferred (perhaps other nodes need to transfer more 
 data) but nothing is actually happening so I assume all has been moved.
 Perhaps it's a configuration error from my part. Should I use I use 
 AutoBootstrap=true ? Anything else I should look out for in the 
 configuration file or something else?


 On Tue, Jan 4, 2011 at 4:08 PM, Jake Luciani jak...@gmail.com wrote:

 In 0.6, locate the node doing anti-compaction and look in the streams 
 subdirectory in the keyspace data dir to monitor the anti-compaction 
 progress (it puts new SSTables for bootstrapping node in there)


 On Tue, Jan 4, 2011 at 8:01 AM, Ran Tavory ran...@gmail.com wrote:


 Running nodetool decommission didn't help. Actually the node refused to 
 decommission itself (b/c it wasn't part of the ring). So I simply stopped 
 the process, deleted all the data directories and started it again. It 
 worked in the sense of the node bootstrapped again but as before, after it 
 had finished moving the data nothing happened for a long time (I'm still 
 waiting, but nothing seems to be happening).




 Any hints how to analyze a stuck bootstrapping node??thanks
 On Tue, Jan 4, 2011 at 1:51 PM, Ran Tavory ran...@gmail.com wrote:
 Thanks Shimi, so indeed anticompaction was run on one of the other nodes 
 from the same DC but to my understanding it has already ended. A few hour 
 ago...



 I plenty of log messages such as [1] which ended a couple of hours ago, and 
 I've seen the new node streaming and accepting the data from the node which 
 performed the anticompaction and so far it was normal so it seemed that data 
 is at its right place. But now the new node seems sort of stuck. None of the 
 other nodes is anticompacting right now or had been anticompacting since 
 then.




 The new node's CPU is close to zero, it's iostats are almost zero so I can't 
 find another bottleneck that would keep it hanging.
 On the IRC someone suggested I'd maybe retry to join this node, 
 e.g. decommission and rejoin it again. I'll try it now...






 [1] INFO [COMPACTION-POOL:1] 2011-01-04 04:04:09,721 CompactionManager.java 
 (line 338) AntiCompacting 
 [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvAds-6449-Data.db')]




  INFO [COMPACTION-POOL:1] 2011-01-04 04:34:18,683 CompactionManager.java 
 (line 338) AntiCompacting 
 [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3874-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3873-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3876-Data.db')]




  INFO [COMPACTION-POOL:1] 2011-01-04 04:34:19,132 CompactionManager.java 
 (line 338) AntiCompacting 
 [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-951-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-976-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-978-Data.db')]




  INFO [COMPACTION-POOL:1] 2011-01-04 04:34:26,486 CompactionManager.java 
 (line 338) AntiCompacting 
 [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvAds-6449-Data.db')]





 On Tue, Jan 4, 2011 at 12:45 PM, shimi shim...@gmail.com wrote:





 In my experience most of the time it takes for a node to join the cluster is 
 the anticompaction on the other nodes. The streaming part is very fast.
 Check the other nodes logs to see if there is any node doing 
 anticompaction.I don't remember how much data I had in the cluster when I 
 needed to add/remove 

Re: Deletion via SliceRange

2011-01-04 Thread Jonathan Ellis
It's not on anyone's short list, that I know of.

https://issues.apache.org/jira/browse/CASSANDRA-494

On Tue, Jan 4, 2011 at 5:18 PM, mike dooley doo...@apple.com wrote:
 any idea when Deletion via SliceRanges will be supported?

     [java] Caused by: InvalidRequestException(why:Deletion does not yet 
 support SliceRange predicates.)
     [java]     at 
 org.apache.cassandra.thrift.Cassandra$batch_mutate_result.read(Cassandra.java:16477)

 thanks,
 -mike



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


anyone using Cassandra as an analytics/data warehouse?

2011-01-04 Thread Dave Viner
Does anyone use Cassandra to power an analytics or data warehouse
implementation?

As a concrete example, one could imagine Cassandra storing data for
something that reports on page-views on a website.  The basic notions might
be simple (url as row-key and columns as timeuuids of viewers).  But, how
would one store things like ip-geolocation to set of pages viewed?  Or
hour-of-day to pages viewed?

Also, how would one do a query like
- tell me how many page views occurred between 12/01/2010 and 12/31/2010?
- tell me how many page views occurred between 12/01/2010 and 12/31/2010
from the US?
- tell me how many page views occurred between 12/01/2010 and 12/31/2010
from the US in the 9th hour of the day (in gmt)?

Time slicing and dimension slicing seems like it might be very challenging
(especially since the windows of time would not be known in advance).

Thanks
Dave Viner


Re: anyone using Cassandra as an analytics/data warehouse?

2011-01-04 Thread Peter Harrison
Okay, here are two ways to handle this; both are quite different from each
other.


A)

This approach does not depend on counters. You simply have a Column Family
with the row key being the Unix time divided by 60x60 and a column key of...
pretty much anything unique. Then have another process look at the current
row every hour to actually compile the numbers, and store the count in the
same Column Family. This will solve the first and third use cases, as it is
just a matter of looking at the right rows. The second case will require a
similar index, but one which includes a country code to be appended to the
row key.

The downside here is that you are storing lots of data on individual
requests and retaining it. If you don't want the detailed data you might add
a second process to purge the detail every hour.

B)

There is a counter feature added to the latest versions of Cassandra. I
have not used them, but they should be able to be used to achieve the same
effect without a second process cleaning up every hour. Also means it is
more of a real time system so you can see how many requests in the hour you
are currently in.



Basically you have to design your approach based on the query you will be
doing. Don't get too hung up on traditional data structures and queries as
they have little relationship to a Cassandra approach.
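
To make option A a bit more concrete, a rough Java sketch of the hour-bucket row
key and the hourly rollup could look like the following. The class, column family
and column names are made up for illustration; a plain map stands in for the
column family, and the real writes would of course go through your Cassandra
client of choice.

import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.UUID;

// Sketch of option A: one row per hour bucket, one column per page view,
// plus a "total" column written back by the hourly rollup process.
public class HourBucketSketch {

    // row key = first second of the hour the event falls into;
    // for the per-country case, append a country code, e.g. bucket + ":US"
    static long hourBucket(long epochSeconds) {
        return (epochSeconds / 3600) * 3600;
    }

    // rowKey -> (columnName -> value); stands in for the column family
    static final Map<Long, SortedMap<String, String>> cf =
            new HashMap<Long, SortedMap<String, String>>();

    static void recordPageView(long epochSeconds, String uniqueId) {
        long bucket = hourBucket(epochSeconds);
        SortedMap<String, String> row = cf.get(bucket);
        if (row == null) {
            row = new TreeMap<String, String>();
            cf.put(bucket, row);
        }
        row.put(uniqueId, "1");
    }

    // the "other process" that compiles the numbers for one hour
    static void rollUp(long bucket) {
        SortedMap<String, String> row = cf.get(bucket);
        if (row == null) return;
        int views = 0;
        for (String column : row.keySet()) {
            if (!column.equals("total")) views++;
        }
        row.put("total", String.valueOf(views));
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis() / 1000;
        recordPageView(now, UUID.randomUUID().toString());
        recordPageView(now, UUID.randomUUID().toString());
        rollUp(hourBucket(now));
        System.out.println(cf.get(hourBucket(now)).get("total")); // 2
    }
}

Answering the example queries then becomes a matter of reading the "total"
column for each hour bucket in the requested range (and for each
country-suffixed bucket in the US case).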


On Wed, Jan 5, 2011 at 2:34 PM, Dave Viner davevi...@gmail.com wrote:

 Does anyone use Cassandra to power an analytics or data warehouse
 implementation?

 As a concrete example, one could imagine Cassandra storing data for
 something that reports on page-views on a website.  The basic notions might
 be simple (url as row-key and columns as timeuuids of viewers).  But, how
 would one store things like ip-geolocation to set of pages viewed?  Or
 hour-of-day to pages viewed?

 Also, how would one do a query like
 - tell me how many page views occurred between 12/01/2010 and 12/31/2010?
 - tell me how many page views occurred between 12/01/2010 and 12/31/2010
 from the US?
 - tell me how many page views occurred between 12/01/2010 and 12/31/2010
 from the US in the 9th hour of the day (in gmt)?

 Time slicing and dimension slicing seems like it might be very challenging
 (especially since the windows of time would not be known in advance).

 Thanks
 Dave Viner



Re: Converting a TimeUUID to a long (timestamp) and vice-versa

2011-01-04 Thread Roshan Dawrani
If I use com.eaio.uuid.UUID directly, then I am able to do what I need
(attached a Java program for the same), but unfortunately I need to deal
with java.util.UUID in my application and I don't have its equivalent
com.eaio.uuid.UUID at the point where I need the timestamp value.

Any suggestion on how I can achieve the equivalent using Hector library's
TimeUUIDUtils?

On Wed, Jan 5, 2011 at 7:21 AM, Roshan Dawrani roshandawr...@gmail.comwrote:

 Hi Victor / Patricio,

 I have been using Hector library's TimeUUIDUtils. I also just looked at
 TimeUUIDUtilsTest also but didn't find anything similar being tested there.

 Here is what I am trying and it's not working - I am creating a Time UUID,
 extracting its timestamp value and with that I create another Time UUID and
 I am expecting both time UUIDs to have the same timestamp() value - am I
 doing / expecting something wrong here?:

 ===
 import java.util.UUID;
 import me.prettyprint.cassandra.utils.TimeUUIDUtils;

 public class TryHector {
 public static void main(String[] args) throws Exception {
 UUID someUUID = TimeUUIDUtils.getUniqueTimeUUIDinMillis();
 long timestamp1 = someUUID.timestamp();

 UUID otherUUID = TimeUUIDUtils.getTimeUUID(timestamp1);
 long timestamp2 = otherUUID.timestamp();

 System.out.println(timestamp1);
 System.out.println(timestamp2);
 }
 }
 ===

 I have to create the timestamp() equivalent of my time UUIDs so I can send
 it to my UI client, for which it will be simpler to compare long timestamp
 than comparing UUIDs. Then for the long timestamp chosen by the client, I
 need to re-create the equivalent time UUID and go and filter the data from
 Cassandra database.


 --
 Roshan
 Blog: http://roshandawrani.wordpress.com/
 Twitter: @roshandawrani http://twitter.com/roshandawrani
 Skype: roshandawrani

 On Wed, Jan 5, 2011 at 1:32 AM, Victor Kabdebon victor.kabde...@gmail.com
  wrote:

 Hi Roshan,

 Sorry, I misunderstood your problem. It is weird that it doesn't work; it
 works for me...
 As Patricio pointed out use hector standard way of creating TimeUUID and
 tell us if it still doesn't work.
 Maybe you can paste here some of the code you use to query your columns
 too.

 Victor K.
 http://www.voxnucleus.fr

 2011/1/4 Patricio Echagüe patric...@gmail.com

 In Hector framework, take a look at TimeUUIDUtils.java

 You can create a UUID using   TimeUUIDUtils.getTimeUUID(long time); or
 TimeUUIDUtils.getTimeUUID(ClockResolution clock)

 and later on, TimeUUIDUtils.getTimeFromUUID(..) or just UUID.timestamp();

 There are some example in TimeUUIDUtilsTest.java

 Let me know if it helps.




 On Tue, Jan 4, 2011 at 10:27 AM, Roshan Dawrani roshandawr...@gmail.com
  wrote:

 Hello Victor,

 It is actually not that I need the 2 UUIDs to be exactly same - they
 need to be same timestamp wise.

 So, what I need is to extract the timestamp portion from a time UUID
 (say, U1) and then later in the cycle, use the same long timestamp value to
 re-create a UUID (say, U2) that is equivalent of the previous one in terms
 of its timestamp portion - i.e., I should be able to give this U2 and 
 filter
 the data from a column family - and it should be same as if I had used the
 original UUID U1.

 Does it make any more sense than before? Any way I can do that?

 rgds,
 Roshan


 On Tue, Jan 4, 2011 at 11:46 PM, Victor Kabdebon 
 victor.kabde...@gmail.com wrote:

 Hello Roshan,

 Well, it is normal not to be able to get the exact same UUID back from a
 timestamp; that is by design.
 When you create a UUID you have in fact two pieces of information: a random 64
 bit number and a 64 bit timestamp. You put them together and you have your
 uuid.
 So unless you save your random number, two UUIDs for the same milli (or
 micro) second are different.

 Best regards,
 Victor K.
 http://www.voxnucleus.fr

 2011/1/4 Roshan Dawrani roshandawr...@gmail.com

 Hi,
 I am having a little difficulty converting a time UUID to its
 timestamp equivalent and back. Can someone please help?

 Here is what I am trying. Is it not the right way to do it?

 ===
 UUID someUUID = TimeUUIDUtils.getUniqueTimeUUIDinMillis();

 long time = someUUID.timestamp(); /* convery from UUID to a
 long timestamp */
 UUID otherUUID = TimeUUIDUtils.getTimeUUID(time); /* do the
 reverse and get back the UUID from timestamp */

 System.out.println(someUUID); /* someUUID and otherUUID should
 be same, but are different */
 System.out.println(otherUUID);
 ===

 --
 Roshan
 Blog: http://roshandawrani.wordpress.com/
 Twitter: @roshandawrani http://twitter.com/roshandawrani
 Skype: roshandawrani





 --
 Roshan
 Blog: http://roshandawrani.wordpress.com/
 Twitter: @roshandawrani 

Re: anyone using Cassandra as an analytics/data warehouse?

2011-01-04 Thread Dave Viner
Hi Peter,

Thanks.  These are great ideas.  One comment tho.  I'm actually not as
worried about the performance of logging into the system; I'm more
speculating/imagining about the querying out of the system.

Most traditional data warehouses have a cube or a star schema or something
similar.  I'm trying to imagine how one might use Cassandra in situations
where that sort of design has historically been applied.

But, I want to make sure I understand your suggestion A.

Is it something like this?

a Column Family with the row key being the Unix time divided by 60x60 and a
column key of... pretty much anything unique
LogCF[hour-day-in-epoch-seconds][timeuuid] = 1
where 'hour-day-in-epoch-seconds' is something like the first second of the
given hour of the day, so 01/04/2011 19:00:00 (in epoch
seconds: 1294167600); 'timeuuid' is a TimeUUID from cassandra, and '1' is
the value of the entry.

Then look at the current row every hour to actually compile the numbers,
and store the count in the same Column Family
LogCF[hour-day-in-epoch-seconds][total] = x
where 'x' is the sum of the number of timeuuid columns in the row?


Is that what you're envisioning in Option A?

Thanks
Dave Viner



On Tue, Jan 4, 2011 at 6:38 PM, Peter Harrison cheetah...@gmail.com wrote:

 Okay, here is two ways to handle this, both are quite different from each
 other.


 A)

 This approach does not depend on counters. You simply have a Column Family
 with the row key being the Unix time divided by 60x60 and a column key of...
 pretty much anything unique. Then have another process look at the current
 row every hour to actually compile the numbers, and store the count in the
 same Column Family. This will solve the first and third use cases, as it is
 just a matter of looking at the right rows. The second case will require a
 similar index, but one which includes a country code to be appended to the
 row key.

 The downside here is that you are storing lots of data on individual
 requests and retaining it. If you don't want the detailed data you might add
 a second process to purge the detail every hour.

 B)

 There is a counter feature added to the latest versions of Cassandra. I
 have not used them, but they should be able to be used to achieve the same
 effect without a second process cleaning up every hour. Also means it is
 more of a real time system so you can see how many requests in the hour you
 are currently in.



 Basically you have to design your approach based on the query you will be
 doing. Don't get too hung up on traditional data structures and queries as
 they have little relationship to a Cassandra approach.



 On Wed, Jan 5, 2011 at 2:34 PM, Dave Viner davevi...@gmail.com wrote:

 Does anyone use Cassandra to power an analytics or data warehouse
 implementation?

 As a concrete example, one could imagine Cassandra storing data for
 something that reports on page-views on a website.  The basic notions might
 be simple (url as row-key and columns as timeuuids of viewers).  But, how
 would one store things like ip-geolocation to set of pages viewed?  Or
 hour-of-day to pages viewed?

 Also, how would one do a query like
 - tell me how many page views occurred between 12/01/2010 and
 12/31/2010?
 - tell me how many page views occurred between 12/01/2010 and 12/31/2010
 from the US?
 - tell me how many page views occurred between 12/01/2010 and 12/31/2010
 from the US in the 9th hour of the day (in gmt)?

 Time slicing and dimension slicing seems like it might be very challenging
 (especially since the windows of time would not be known in advance).

 Thanks
 Dave Viner





Re: anyone using Cassandra as an analytics/data warehouse?

2011-01-04 Thread Jake Luciani
Some relevant information here:
https://www.cloudkick.com/blog/2010/mar/02/4_months_with_cassandra/

On Tue, Jan 4, 2011 at 10:09 PM, Dave Viner davevi...@gmail.com wrote:

 Hi Peter,

 Thanks.  These are great ideas.  One comment tho.  I'm actually not as
 worried about the logging into the system performance and more
 speculating/imagining the querying out of the system.

 Most traditional data warehouses have a cube or a star schema or something
 similar.  I'm trying to imagine how one might use Cassandra in situations
 where that sort of design has historically been applied.

 But, I want to make sure I understand your suggestion A.

 Is it something like this?

 a Column Family with the row key being the Unix time divided by 60x60 and
 a column key of... pretty much anything unique
 LogCF[hour-day-in-epoch-seconds][timeuuid] = 1
 where 'hour-day-in-epoch-seconds' is something like the first second of the
 given hour of the day, so 01/04/2011 19:00:00 (in epoch
 seconds: 1294167600); 'timeuuid' is a TimeUUID from cassandra, and '1' is
 the value of the entry.

 Then look at the current row every hour to actually compile the numbers,
 and store the count in the same Column Family
 LogCF[hour-day-in-epoch-seconds][total] = x
 where 'x' is the sum of the number of timeuuid columns in the row?


 Is that what you're envisioning in Option A?

 Thanks
 Dave Viner



 On Tue, Jan 4, 2011 at 6:38 PM, Peter Harrison cheetah...@gmail.comwrote:

 Okay, here is two ways to handle this, both are quite different from each
 other.


 A)

 This approach does not depend on counters. You simply have a Column Family
 with the row key being the Unix time divided by 60x60 and a column key of...
 pretty much anything unique. Then have another process look at the current
 row every hour to actually compile the numbers, and store the count in the
 same Column Family. This will solve the first and third use cases, as it is
 just a matter of looking at the right rows. The second case will require a
 similar index, but one which includes a country code to be appended to the
 row key.

 The downside here is that you are storing lots of data on individual
 requests and retaining it. If you don't want the detailed data you might add
 a second process to purge the detail every hour.

 B)

 There is a counter feature added to the latest versions of Cassandra. I
 have not used them, but they should be able to be used to achieve the same
 effect without a second process cleaning up every hour. Also means it is
 more of a real time system so you can see how many requests in the hour you
 are currently in.



 Basically you have to design your approach based on the query you will be
 doing. Don't get too hung up on traditional data structures and queries as
 they have little relationship to a Cassandra approach.



 On Wed, Jan 5, 2011 at 2:34 PM, Dave Viner davevi...@gmail.com wrote:

 Does anyone use Cassandra to power an analytics or data warehouse
 implementation?

 As a concrete example, one could imagine Cassandra storing data for
 something that reports on page-views on a website.  The basic notions might
 be simple (url as row-key and columns as timeuuids of viewers).  But, how
 would one store things like ip-geolocation to set of pages viewed?  Or
 hour-of-day to pages viewed?

 Also, how would one do a query like
 - tell me how many page views occurred between 12/01/2010 and
 12/31/2010?
 - tell me how many page views occurred between 12/01/2010 and 12/31/2010
 from the US?
 - tell me how many page views occurred between 12/01/2010 and 12/31/2010
 from the US in the 9th hour of the day (in gmt)?

 Time slicing and dimension slicing seems like it might be very
 challenging (especially since the windows of time would not be known in
 advance).

 Thanks
 Dave Viner






Re: Converting a TimeUUID to a long (timestamp) and vice-versa

2011-01-04 Thread Roshan Dawrani
Ok, found the solution - finally! - by applying the opposite of what
createTime() does in TimeUUIDUtils. Ideally I would have preferred this
solution to come from the Hector API, so I didn't have to be tied to the private
createTime() implementation.


import java.util.UUID;
import me.prettyprint.cassandra.utils.TimeUUIDUtils;

public class TryHector {
    public static void main(String[] args) throws Exception {
        // 100-ns intervals between the UUID epoch (1582-10-15) and the Unix epoch
        final long NUM_100NS_INTERVALS_SINCE_UUID_EPOCH = 0x01b21dd213814000L;

        UUID u1 = TimeUUIDUtils.getUniqueTimeUUIDinMillis();
        final long t1 = u1.timestamp();

        // invert createTime(): 100-ns units since UUID epoch -> epoch milliseconds
        long tmp = (t1 - NUM_100NS_INTERVALS_SINCE_UUID_EPOCH) / 10000;

        UUID u2 = TimeUUIDUtils.getTimeUUID(tmp);
        long t2 = u2.timestamp();

        System.out.println(u2.equals(u1));
        System.out.println(t2 == t1);
    }
}
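
For anyone who wants the same round trip without depending on Hector internals,
a minimal sketch using only java.util.UUID is below. The class and method names
are made up for illustration, and only the timestamp portion survives the round
trip (the clock-sequence/node bits are simply zeroed):

import java.util.UUID;

// Convert a version-1 (time) UUID to epoch milliseconds and back using only
// java.util.UUID. Only timestamp() round-trips; clock-seq/node are not kept.
public class TimeUuidMillis {

    // 100-ns intervals between the UUID epoch (1582-10-15) and the Unix epoch
    private static final long UUID_EPOCH_OFFSET = 0x01b21dd213814000L;

    static long toEpochMillis(UUID timeUuid) {
        return (timeUuid.timestamp() - UUID_EPOCH_OFFSET) / 10000;
    }

    static UUID fromEpochMillis(long epochMillis) {
        long ts = epochMillis * 10000 + UUID_EPOCH_OFFSET;      // back to 100-ns units
        long timeLow = ts & 0xffffffffL;
        long timeMid = (ts >>> 32) & 0xffffL;
        long timeHi  = (ts >>> 48) & 0x0fffL;
        long msb = (timeLow << 32) | (timeMid << 16) | 0x1000L | timeHi; // version 1
        return new UUID(msb, 0x8000000000000000L);               // IETF variant, rest zero
    }

    public static void main(String[] args) {
        long millis = System.currentTimeMillis();
        UUID u = fromEpochMillis(millis);
        System.out.println(toEpochMillis(u) == millis);          // true
    }
}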
 


On Wed, Jan 5, 2011 at 8:15 AM, Roshan Dawrani roshandawr...@gmail.comwrote:

 If I use *com.eaio.uuid.UUID* directly, then I am able to do what I need
 (attached a Java program for the same), but unfortunately I need to deal
 with *java.util.UUID *in my application and I don't have its equivalent
 com.eaio.uuid.UUID at the point where I need the timestamp value.

 Any suggestion on how I can achieve the equivalent using Hector library's
 TimeUUIDUtils?


 On Wed, Jan 5, 2011 at 7:21 AM, Roshan Dawrani roshandawr...@gmail.comwrote:

 Hi Victor / Patricio,

 I have been using Hector library's TimeUUIDUtils. I also just looked at
 TimeUUIDUtilsTest also but didn't find anything similar being tested there.

 Here is what I am trying and it's not working - I am creating a Time UUID,
 extracting its timestamp value and with that I create another Time UUID and
 I am expecting both time UUIDs to have the same timestamp() value - am I
 doing / expecting something wrong here?:

 ===
 import java.util.UUID;
 import me.prettyprint.cassandra.utils.TimeUUIDUtils;

 public class TryHector {
 public static void main(String[] args) throws Exception {
 UUID someUUID = TimeUUIDUtils.getUniqueTimeUUIDinMillis();
 long timestamp1 = someUUID.timestamp();

 UUID otherUUID = TimeUUIDUtils.getTimeUUID(timestamp1);
 long timestamp2 = otherUUID.timestamp();

 System.out.println(timestamp1);
 System.out.println(timestamp2);
 }
 }
 ===

 I have to create the timestamp() equivalent of my time UUIDs so I can send
 it to my UI client, for which it will be simpler to compare long timestamp
 than comparing UUIDs. Then for the long timestamp chosen by the client, I
 need to re-create the equivalent time UUID and go and filter the data from
 Cassandra database.


 --
 Roshan
 Blog: http://roshandawrani.wordpress.com/
 Twitter: @roshandawrani http://twitter.com/roshandawrani
 Skype: roshandawrani

 On Wed, Jan 5, 2011 at 1:32 AM, Victor Kabdebon 
 victor.kabde...@gmail.com wrote:

 Hi Roshan,

 Sorry I misunderstood your problem.It is weird that it doesn't work, it
 works for me...
 As Patricio pointed out use hector standard way of creating TimeUUID
 and tell us if it still doesn't work.
 Maybe you can paste here some of the code you use to query your columns
 too.

 Victor K.
 http://www.voxnucleus.fr

 2011/1/4 Patricio Echagüe patric...@gmail.com

 In Hector framework, take a look at TimeUUIDUtils.java

 You can create a UUID using   TimeUUIDUtils.getTimeUUID(long time); or
 TimeUUIDUtils.getTimeUUID(ClockResolution clock)

 and later on, TimeUUIDUtils.getTimeFromUUID(..) or just
 UUID.timestamp();

 There are some example in TimeUUIDUtilsTest.java

 Let me know if it helps.




 On Tue, Jan 4, 2011 at 10:27 AM, Roshan Dawrani 
 roshandawr...@gmail.com wrote:

 Hello Victor,

 It is actually not that I need the 2 UUIDs to be exactly same - they
 need to be same timestamp wise.

 So, what I need is to extract the timestamp portion from a time UUID
 (say, U1) and then later in the cycle, use the same long timestamp value 
 to
 re-create a UUID (say, U2) that is equivalent of the previous one in terms
 of its timestamp portion - i.e., I should be able to give this U2 and 
 filter
 the data from a column family - and it should be same as if I had used the
 original UUID U1.

 Does it make any more sense than before? Any way I can do that?

 rgds,
 Roshan


 On Tue, Jan 4, 2011 at 11:46 PM, Victor Kabdebon 
 victor.kabde...@gmail.com wrote:

 Hello Roshan,

 Well it is normal to do not be able to get the exact same UUID from a
 timestamp, it is its purpose.
 When you create an UUID you have in fact two information : random 64
 bits number - 64 bits timestamp. You put that together and you have your
 uuid.
 .
 So unless you save your random number two UUID for the same milli( or
 micro) second are different.

 Best regards,
 

Re: Cassandra LongType data insertion problem

2011-01-04 Thread Tyler Hobbs
Here's an example:

int64_t my_long = 12345678;
char chars[8];
for(int i = 0; i < 8; ++i) {
    chars[i] = my_long & 0xff;
    my_long = my_long >> 1;
}

std::string str_long(chars, 8);

Column c1;
c1.name = str_long;
// etc ...

Basically, Thrift expects a string which is a big-endian binary
representation of a long. When you create the std::string, you have to
specify the length of the char[] so that it doesn't terminate the string on
a 0x00 byte.

The approach is similar for integers and UUIDs.
- Tyler
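
For what it's worth, a Java client can get the same big-endian packing from
ByteBuffer, whose default byte order is big-endian; a small sketch (illustrative
only, not code from this thread):

import java.nio.ByteBuffer;

// Encode a long as the 8-byte, big-endian value that LongType expects.
public class LongColumnName {
    static byte[] encode(long value) {
        return ByteBuffer.allocate(8).putLong(value).array();
    }

    public static void main(String[] args) {
        System.out.println(encode(12345678L).length); // 8 bytes, as the error message demands
    }
}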

On Tue, Jan 4, 2011 at 4:32 PM, Jaydeep Chovatia 
jaydeep.chova...@openwave.com wrote:

  Hi,



 I have configured Cassandra Column Family (standard CF) of LongType. If I
 try to insert data (using batch_mutate) in this Column Family then it
 shows me the following error: “A long is exactly 8 bytes”. I have tried
 assigning a column name of 8 bytes, 7 bytes, etc., but it shows the same error.



 Please find my sample program details:

 Platform: Linux

 Language: C++, Cassandra Thrift interface



 Column c1;

 c1.name = 12345678;

 c1.value = SString(len).AsPtr();

 c1.timestamp = curTime;

 columns.push_back(c1);



 Any help on this would be appreciated.



 Thank you,

 Jaydeep



Re: Bootstrapping taking long

2011-01-04 Thread Ran Tavory
The new node does not see itself as part of the ring, it sees all others but
itself, so from that perspective the view is consistent.
The only problem is that the node never finishes to bootstrap. It stays in
this state for hours (It's been 20 hours now...)

$ bin/nodetool -p 9004 -h localhost streams
 Mode: Bootstrapping
 Not sending any streams.
 Not receiving any streams.


On Wed, Jan 5, 2011 at 1:20 AM, Nate McCall n...@riptano.com wrote:

 Does the new node have itself in the list of seeds per chance? This
 could cause some issues if so.

 On Tue, Jan 4, 2011 at 4:10 PM, Ran Tavory ran...@gmail.com wrote:
  I'm still at lost.   I haven't been able to resolve this. I tried
  adding another node at a different location on the ring but this node
  too remains stuck in the bootstrapping state for many hours without
  any of the other nodes being busy with anti compaction or anything
  else. I don't know what's keeping it from finishing the bootstrap,no
  CPU, no io, files were already streamed so what is it waiting for?
  I read the release notes of 0.6.7 and 0.6.8 and there didn't seem to
  be anything addressing a similar issue so I figured there was no point
  in upgrading. But let me know if you think there is.
  Or any other advice...
 
  On Tuesday, January 4, 2011, Ran Tavory ran...@gmail.com wrote:
  Thanks Jake, but unfortunately the streams directory is empty so I don't
 think that any of the nodes is anti-compacting data right now or had been in
 the past 5 hours. It seems that all the data was already transferred to the
 joining host but the joining node, after having received the data would
 still remain in bootstrapping mode and not join the cluster. I'm not sure
 that *all* data was transferred (perhaps other nodes need to transfer more
 data) but nothing is actually happening so I assume all has been moved.
  Perhaps it's a configuration error from my part. Should I use I use
 AutoBootstrap=true ? Anything else I should look out for in the
 configuration file or something else?
 
 
  On Tue, Jan 4, 2011 at 4:08 PM, Jake Luciani jak...@gmail.com wrote:
 
  In 0.6, locate the node doing anti-compaction and look in the streams
 subdirectory in the keyspace data dir to monitor the anti-compaction
 progress (it puts new SSTables for bootstrapping node in there)
 
 
  On Tue, Jan 4, 2011 at 8:01 AM, Ran Tavory ran...@gmail.com wrote:
 
 
  Running nodetool decommission didn't help. Actually the node refused to
 decommission itself (b/c it wasn't part of the ring). So I simply stopped
 the process, deleted all the data directories and started it again. It
 worked in the sense of the node bootstrapped again but as before, after it
 had finished moving the data nothing happened for a long time (I'm still
 waiting, but nothing seems to be happening).
 
 
 
 
  Any hints how to analyze a stuck bootstrapping node??thanks
  On Tue, Jan 4, 2011 at 1:51 PM, Ran Tavory ran...@gmail.com wrote:
  Thanks Shimi, so indeed anticompaction was run on one of the other nodes
 from the same DC but to my understanding it has already ended. A few hour
 ago...
 
 
 
  I plenty of log messages such as [1] which ended a couple of hours ago,
 and I've seen the new node streaming and accepting the data from the node
 which performed the anticompaction and so far it was normal so it seemed
 that data is at its right place. But now the new node seems sort of stuck.
 None of the other nodes is anticompacting right now or had been
 anticompacting since then.
 
 
 
 
  The new node's CPU is close to zero, it's iostats are almost zero so I
 can't find another bottleneck that would keep it hanging.
  On the IRC someone suggested I'd maybe retry to join this node,
 e.g. decommission and rejoin it again. I'll try it now...
 
 
 
 
 
 
  [1] INFO [COMPACTION-POOL:1] 2011-01-04 04:04:09,721
 CompactionManager.java (line 338) AntiCompacting
 [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvAds-6449-Data.db')]
 
 
 
 
   INFO [COMPACTION-POOL:1] 2011-01-04 04:34:18,683 CompactionManager.java
 (line 338) AntiCompacting
 [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3874-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3873-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3876-Data.db')]
 
 
 
 
   INFO [COMPACTION-POOL:1] 2011-01-04 04:34:19,132 CompactionManager.java
 (line 338) AntiCompacting
 [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-951-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-976-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-978-Data.db')]
 
 
 
 
   INFO [COMPACTION-POOL:1] 2011-01-04 04:34:26,486 CompactionManager.java
 (line 338) AntiCompacting
 

Re: Cassandra LongType data insertion problem

2011-01-04 Thread Tyler Hobbs
Oops, I made one typo there. It should be:

my_long = my_long >> 8;

That is, shift by a byte, not a bit.
- Tyler

On Tue, Jan 4, 2011 at 10:50 PM, Tyler Hobbs ty...@riptano.com wrote:

 Here's an example:

 int64_t my_long = 12345678;
 char chars[8];
 for(int i = 0; i < 8; ++i) {
     chars[i] = my_long & 0xff;
     my_long = my_long >> 1;
 }

 std::string str_long(chars, 8);

 Column c1;
 c1.name = str_long;
 // etc ...

 Basically, Thrift expects a string which is a big-endian binary
 representation of a long. When you create the std::string, you have to
 specify the length of the char[] so that it doesn't terminate the string on
 a 0x00 byte.

 The approach is similar for integers and UUIDs.
 - Tyler


 On Tue, Jan 4, 2011 at 4:32 PM, Jaydeep Chovatia 
 jaydeep.chova...@openwave.com wrote:

  Hi,



 I have configured Cassandra Column Family (standard CF) of LongType. If I
 try to insert data (using batch_mutate) in this Column Family then it
 shows me the following error: “A long is exactly 8 bytes”. I have tried
 assigning a column name of 8 bytes, 7 bytes, etc., but it shows the same error.



 Please find my sample program details:

 Platform: Linux

 Language: C++, Cassandra Thrift interface



 Column c1;

 c1.name = 12345678;

 c1.value = SString(len).AsPtr();

 c1.timestamp = curTime;

 columns.push_back(c1);



 Any help on this would be appreciated.



 Thank you,

 Jaydeep