Re: Any tools like phpMyAdmin to see data stored in Cassandra ?

2012-01-30 Thread R. Verlangen
You might run it from a VM?

2012/1/30 Ertio Lew ertio...@gmail.com



 On Mon, Jan 30, 2012 at 7:16 AM, Frisch, Michael 
 michael.fri...@nuance.com wrote:

  OpsCenter?

  http://www.datastax.com/products/opscenter

  - Mike


  I have tried Sebastien's phpMyAdmin For Cassandra
 (https://github.com/sebgiroux/Cassandra-Cluster-Admin) to
 see the data stored in Cassandra in the same manner as phpMyAdmin allows.
 But since it makes assumptions about the datatypes of the column
 name/column value and doesn't allow configuring the datatype that data should
 be read as on a per-CF basis, I couldn't make the best use of it.

  Are there any other similar tools out there that can do the job better?


 Thanks, that's a great product but unfortunately doesn't work with
 Windows. Any tools for Windows?




Re: two dimensional slicing

2012-01-30 Thread Bryce Allen
On Sun, 29 Jan 2012 23:26:52 +1300
aaron morton aa...@thelastpickle.com wrote:
  and compare them, but at this point I need to focus on one to get
  things working, so I'm trying to make a best initial guess.
 I would go for RP then; BOP may look like less work to start with but
 it *will* bite you later. If you use an increasing version number as
 a key you will get a hot spot. Get it working with RP and Standard
 CFs, accept the extra lookups, and then see where you are
 performance / complexity wise. Cassandra can be pretty fast.
The keys are (random uuid)-(version), because there are many lists and
they already have a random id associated with them. Some of the lists
will be much larger than others, but with the random prefix the large
lists will be evenly distributed across the cluster. This is pretty
much the same as having some rows that are bigger than others with RP.
There is a small amount of other data that has non-random keys and
would require an artificial MD5(key) prefix, but it's (at least
currently) an insignificant subset of the total data. I do appreciate
the warning though - if things change and we end up with a lot of keys
that aren't naturally random, I can see how it would be a pain to
manage.

The reason I'm concerned about one more query (especially when it can't
be done in parallel), is that the overarching structure is actually a
tree, and the data payload under a name will often be a pointer to
another list. Each query required in the list lookup will be repeated
at each level.

Anyway I don't want to turn this into a BOP vs RP thread. I'm really
interested in the underlying modeling issues, and how it plays out
using different partitioning is instructive. I'm willing to use BOP _if_
it has real concrete advantages, because it seems very unlikely to
cause balance/hotspot issues for our application. That being said all
other things being equal (or almost equal), I would use RP, and
actually our latest design uses RP...

 I still don't really understand the problem, but I think you have
 many lists of names and when each list is updated you consider it a
 version. 
 
 You then want to answer a query such as get all the names between
 foo and bar that were written to between version 100 and 200. Can
 this query be re-written as get all the names between foo and
 bar that existed at version 200 and were created on or after version
 100?
There are two queries we need to answer - one is get the first N names
from version V of list L starting at name n0 (chunked listing). The
second is get name n from list L version V (or determine that it
doesn't exist). In many cases the list is too big to re-write on every
update, so storing deltas instead of the whole thing becomes
attractive. There can be additions, deletes, updates, and renames
(which can be modeled as deletion + addition). A background process
creates complete lists from the deltas at certain versions (compacts),
to prevent having to replay the entire history.
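For concreteness, the delta-replay idea above can be sketched like this (plain Python as an illustration, not Cassandra client code; the operation names and data shapes are assumptions for clarity):

```python
# Illustrative sketch: reconstructing a list version by replaying deltas on
# top of the most recent compacted snapshot. A rename is modeled as a
# deletion plus an addition, as described above.

def apply_deltas(compacted, deltas):
    """Replay an ordered sequence of delta dicts onto a compacted snapshot.

    Each delta maps name -> (operation, data), where operation is one of
    "create", "update", or "delete".
    """
    state = dict(compacted)
    for delta in deltas:
        for name, (op, data) in delta.items():
            if op == "delete":
                state.pop(name, None)
            else:  # "create" and "update" both just set the value
                state[name] = data
    return state

base = {"alice": "ptr1", "bob": "ptr2"}
deltas = [
    {"carol": ("create", "ptr3")},
    {"bob": ("delete", None), "alice": ("update", "ptr9")},
]
print(apply_deltas(base, deltas))  # {'alice': 'ptr9', 'carol': 'ptr3'}
```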

The queries are actually done based on timestamp, not on a specific
version (e.g. what was the state of the list at time T). The passed
timestamp won't in general correspond to the time of an update.

With this model, fetching a chunk of a list version requires pulling
the range of names from the most recent complete/compacted list less
than or equal to the desired version, and fetching the relevant deltas
between that and the desired version. Fetching the relevant deltas is
where it gets complicated.

We've gone through many iterations - this is our latest model (very
much still subject to change):

CF: List
row key: list id (random uuid)
columns: latest version and unversioned meta data about the list

CF: ListVersionIndex
row key: (list id)
columns: ts - version, compact?

CF: ListCompact
row key: (list id)-(version)
columns: name - associated data

CF: ListDelta
row key: (list id)-(version)
columns: name - operation (create, delete, update) + associated data
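Given that schema, the read planning for a single version can be sketched against an in-memory stand-in for ListVersionIndex (illustrative only; the function and field names here are made up, and row/column layouts are simplified):

```python
# Sketch of the read path: find the most recent compacted version at or
# before the requested version, then collect the delta versions between it
# and the requested version. This models ListVersionIndex as a sorted list
# of (version, is_compact) tuples; it is not driver code.
import bisect

def plan_reads(version_index, requested):
    """Return (base_compact_version, [delta_versions]) needed to build
    the requested version."""
    versions = [v for v, _ in version_index]
    i = bisect.bisect_right(versions, requested) - 1
    if i < 0:
        raise KeyError("no version at or before %r" % requested)
    # walk back to the nearest compacted version
    j = i
    while j >= 0 and not version_index[j][1]:
        j -= 1
    if j < 0:
        raise KeyError("no compacted base found")
    return versions[j], versions[j + 1 : i + 1]

index = [(20, True), (21, False), (22, False), (30, True), (31, False)]
print(plan_reads(index, 31))  # (30, [31])
print(plan_reads(index, 22))  # (20, [21, 22])
```

The base version maps to one ListCompact row, and each delta version to one ListDelta row, which is where the multiget comes in.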

With BOP and timestamp versions, ListVersionIndex isn't necessary - a
row range scan can be done to get the latest compact list, and then
another to get all the deltas since compaction, all with an appropriate
column offset and limit. Timestamp versions make cleaning up partial
updates more complicated, though, since the version numbers aren't
known.

With RP, the idea is to query many versions in ListVersionIndex starting
at the desired version going backward, hoping that it will hit a
compact version. We could also maintain a separate CompactVersion
index, and accept another query.

In any case I think this model demonstrates a key point about two
dimensional range queries - RP really only requires one extra query on
an index to get the row range, and then replaces the BOP row range
query with a multiget. A multiget can be done in parallel (correct me
if I'm wrong?), so it seems reasonable that in some cases it could
actually be faster than the row range query (but still at the cost of
the extra RTT).

recovering from network partition

2012-01-30 Thread Thorsten von Eicken
I'm trying to work through various failure modes to figure out the
proper operating procedure and proper client coding practices. I'm a
little unclear about what happens when a network partition gets
repaired. Take the following scenario:
 - cluster with 5 nodes: A thru E; RF = 3; read CL = 1; write CL = 1
 - network partition divides A-C off from D-E
 - operation continues on both sides, obviously some data is unavailable
from D-E
 - hinted handoffs accumulate

Now the network partition is repaired. The question I have is what is
the sequencing of events, in particular between processing HH and
forwarding read requests across the former partition. I'm hoping that
there is a time period to process HH *before* nodes forward requests.
E.g. it would be really good for A not to forward read requests to D
until D is done with HH processing. Otherwise, clients of A may see a
discontinuity where data that was available during the partition goes
away and then comes back.

Is there a manual or wiki section that discusses some of this and I just
missed it?



Re: two dimensional slicing

2012-01-30 Thread aaron morton
(not trolling) but do you have any ideas on how ? 

The token produced by the partitioner is used as the key in the distributed 
hash table so we can map keys to nodes and evenly distribute load.  If the 
range of tokens for the DHT is infinite, it's difficult to evenly map them to a 
finite set of nodes. 

So…

If you know that the number of DHT keys (and so row keys) is finite then it is 
easier to use the BOP. 

Or if you know that the row keys are something like a time series you could use 
the sort of approach used with Horizontal Partitioning in a RDBMS and run a 
sliding window of nodes. Every month drop the oldest partition / node off the 
end and add a new one for the next month. 
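The sliding-window idea can be sketched as month-bucketed row keys (the key format below is an assumption, just to show the shape; dropping the oldest bucket corresponds to retiring the partition that serves it):

```python
# Sketch of time-bucketed row keys for a sliding monthly window. The
# "<series>:<YYYYMM>" key format is illustrative, not an established
# convention.
from datetime import date

def month_bucket_key(series_id, d):
    return "%s:%04d%02d" % (series_id, d.year, d.month)

def window_keys(series_id, year, month, window):
    """Row keys for the most recent `window` months ending at (year, month)."""
    keys = []
    for _ in range(window):
        keys.append(month_bucket_key(series_id, date(year, month, 1)))
        month -= 1
        if month == 0:
            year, month = year - 1, 12
    return list(reversed(keys))

print(window_keys("sensor42", 2012, 1, 3))
# ['sensor42:201111', 'sensor42:201112', 'sensor42:201201']
```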

Just some thoughts.
A

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 30/01/2012, at 7:19 PM, Terje Marthinussen wrote:

 
 
 On Sun, Jan 29, 2012 at 7:26 PM, aaron morton aa...@thelastpickle.com wrote:
 and compare them, but at this point I need to focus on one to get
 things working, so I'm trying to make a best initial guess.
 I would go for RP then; BOP may look like less work to start with but it 
 *will* bite you later. If you use an increasing version number as a key you 
 will get a hot spot. Get it working with RP and Standard CFs, accept the 
 extra lookups, and then see where you are performance / complexity wise. 
 Cassandra can be pretty fast.
 
 Of course, there is no guarantee that it will bite you.
 
 Whatever data hotspot you may get may very well be minor vs. the advantage of 
 slicing continuous blocks of data on a single server vs. random bits and 
 pieces all over the place.
 
 For instance, there are many large data repositories out there of analytic 
 data which only get a few queries per hour. BOP will most likely pose no 
 performance problem at all for many of these; indeed, it may be much faster 
 than the alternatives.
 
 BOP is very useful and powerful for many things and saves a fair chunk of 
 development time vs. the alternatives when you can use it.
 
 If we really want everybody to stop using it, we should change Cassandra so 
 it by default can provide the same function in some other way, without adding 
 days and maybe weeks of development and extra complexity to your project.
  
 Terje
 
 



Re: recovering from network partition

2012-01-30 Thread aaron morton
If you are working at CL ONE you are accepting that *any* value for a key+col 
combination stored on a replica for a row is a valid response, and that 
includes no value.

After the nodes have detected the others are UP they will start their HH in a 
staggered fashion, and will rate limit themselves to avoid overwhelming the 
node. It may take some time to complete. 
  
  Otherwise, clients of A may see a
 discontinuity where data that was available during the partition see it
 go away and then come back.
If you are concerned about reads being consistent, then use CL QUORUM.

If you are reading at CL ONE (in 1.0*) the read will go to one replica 90% of 
the time, and you will only get the result from that one replica. That may be 
any value the key+col has been set to, including no value. 

The other 10% of the time Read Repair will kick in (this is the configured 
value for read_repair_chance in 1.0; you can change this value). The purpose of 
RR is to make it so that the next time a read happens the data is consistent. 
So when reading at CL ONE the read will go to all nodes; you will get a 
response from one and only one of them. In the background the responses from 
the others will be checked and consistency repaired. 

If you were working at a higher CL, the responses from CL-many nodes are 
checked as part of the read request, synchronously with the read, and you get 
a consistent result from those nodes. RR may still run in the background, and 
CL nodes may be fewer than RF nodes.
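The reasoning behind preferring QUORUM follows the usual replica-overlap rule, which is easy to sketch (R and W below are the replica counts consulted on read and acknowledged on write; this is the standard property, stated generically rather than anything Cassandra-specific):

```python
# The overlap rule: a read is guaranteed to see the latest acknowledged
# write when the read set and write set must share at least one replica,
# i.e. R + W > RF.

def reads_are_consistent(r, w, rf):
    return r + w > rf

RF = 3
quorum = RF // 2 + 1  # 2 when RF = 3
print(reads_are_consistent(quorum, quorum, RF))  # True: QUORUM/QUORUM overlaps
print(reads_are_consistent(1, 1, RF))            # False: ONE/ONE may miss writes
print(reads_are_consistent(1, RF, RF))           # True: read ONE after write ALL
```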

Cheers
 
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 31/01/2012, at 6:51 AM, Thorsten von Eicken wrote:

 I'm trying to work through various failure modes to figure out the
 proper operating procedure and proper client coding practices. I'm a
 little unclear about what happens when a network partition gets
 repaired. Take the following scenario:
 - cluster with 5 nodes: A thru E; RF = 3; read CL = 1; write CL = 1
 - network partition divides A-C off from D-E
 - operation continues on both sides, obviously some data is unavailable
 from D-E
 - hinted handoffs accumulate
 
 Now the network partition is repaired. The question I have is what is
 the sequencing of events, in particular between processing HH and
 forwarding read requests across the former partition. I'm hoping that
 there is a time period to process HH *before* nodes forward requests.
 E.g. it would be really good for A not to forward read requests to D
 until D is done with HH processing. Otherwise, clients of A may see a
 discontinuity where data that was available during the partition goes
 away and then comes back.
 
 Is there a manual or wiki section that discusses some of this and I just
 missed it?
 



Re: How much has Cassandra improved from 0.8.6 to 1.0+?

2012-01-30 Thread Jake Luciani
Well, as they say, "Lies, damned lies, and statistics." Here is an alternate
comparison you can review:
http://www.cubrid.org/blog/dev-platform/nosql-benchmarking/

YCSB is a known and agreed-upon benchmark. The benchmark you link includes
no source code to reproduce with, and as the author mentions, "For Cassandra
this was single node cluster, for Mongo simply one server with no
replication. Cluster tests were run for functionality."

-Jake

On Mon, Jan 30, 2012 at 1:56 PM, Kevin klawso...@gmail.com wrote:

 I’m currently using 0.8.6 and want to know how much (performance wise)
 Cassandra has improved, specifically read performance. This benchmark
 (http://amesar.wordpress.com/2011/10/19/mongodb-vs-cassandra-benchmarks/)
 illustrates my concerns. I don’t know whether it was a fair comparison
 (especially since the conductor did not perform any tweaks or optimizations
 beforehand), but from all the resources I’ve read it seems that Cassandra
 still has quite a way to go before matching the read performance of MongoDB
 and some of the other NoSQL alternatives.

 Is this still true, and if so, how far down the line can we expect to see
 work on this specific area?




-- 
http://twitter.com/tjake


Re: two dimensional slicing

2012-01-30 Thread Bryce Allen
On Mon, 30 Jan 2012 11:14:37 -0600
Bryce Allen bal...@ci.uchicago.edu wrote:
 With RP, the idea is to query many versions in ListVersionIndex
 starting at the desired version going backward, hoping that it will
 hit a compact version. We could also maintain a separate
 CompactVersion index, and accept another query.
Actually a better way to handle this is to store the latest compacted
version with each delta version in the index. When doing compaction, all
the deltas between it and the next compaction (or end) are updated to
point at the new compaction. E.g.:

ts0:  20;20  <- compacted version
ts1:  21;20
ts2:  22;20
...
ts9:  29;20
ts10: 30;20
ts11: 31;20

compaction is done on version 30:

...
ts9:  29;20
ts10: 30;30  <- new compacted version
ts11: 31;30
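A toy model of that index-update step (plain Python; the pair-per-entry shape is illustrative only, standing in for the ListVersionIndex columns):

```python
# Each index entry records [version, latest_compacted_version]. After a
# compaction is written at `new_compact_version`, every entry at or after
# it is repointed so readers land on the new base.

def compact_at(index, new_compact_version):
    """index: ordered list of [version, latest_compacted] pairs (mutated)."""
    for entry in index:
        if entry[0] >= new_compact_version:
            entry[1] = new_compact_version
    return index

index = [[20, 20], [21, 20], [22, 20], [29, 20], [30, 20], [31, 20]]
compact_at(index, 30)
print(index)
# [[20, 20], [21, 20], [22, 20], [29, 20], [30, 30], [31, 30]]
```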

Perhaps compaction is a bad term because it already has a meaning in
Cassandra, but I can't think of a better name at the moment.

-Bryce




Re: how stable is 1.0 these days?

2012-01-30 Thread Jim Newsham


Could you also elaborate for creating/dropping column families?  We're 
currently working on moving to 1.0 and using dynamically created tables, 
so I'm very interested in what issues we might encounter.


So far the only thing I've encountered (with 1.0.7 + hector 1.0-2) is 
that dropping a cf may sometimes fail with UnavailableException.  I 
think this happens when the cf is busy being compacted.  When I 
sleep/retry within a loop it eventually succeeds.
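For what it's worth, that sleep/retry loop looks roughly like this (generic sketch; the exception class and the flaky drop operation below are placeholders for whatever the client library actually raises, e.g. Hector/Thrift's own UnavailableException):

```python
# Generic sleep/retry wrapper for an operation that can fail transiently
# while a CF is busy being compacted.
import time

class UnavailableException(Exception):
    """Stand-in for the client library's exception."""

def retry(operation, attempts=10, delay=1.0):
    for i in range(attempts):
        try:
            return operation()
        except UnavailableException:
            if i == attempts - 1:
                raise  # out of attempts; re-raise to the caller
            time.sleep(delay)  # give the compaction time to finish

# Simulated drop_column_family call that fails twice, then succeeds.
calls = {"n": 0}
def flaky_drop_cf():
    calls["n"] += 1
    if calls["n"] < 3:
        raise UnavailableException("cf busy compacting")
    return "dropped"

print(retry(flaky_drop_cf, delay=0.01))  # dropped
```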


Thanks,
Jim

On 1/26/2012 7:32 AM, Pierre-Yves Ritschard wrote:

Can you elaborate on the composite type instabilities? Is this
specific to Hector, as Radim's posts suggest?
These one-liner answers are quite stressful :)

On Thu, Jan 26, 2012 at 1:28 PM, Carlo Pirescarlopi...@gmail.com  wrote:

If you need to use composite types and create/drop column families on the
fly, you must be prepared for instabilities.





Re: SSTable compaction issue in our system

2012-01-30 Thread Roshan Pradeep
Thanks Aaron for the perfect explanation. Decided to go with automatic
compaction. Thanks again.

On Wed, Jan 25, 2012 at 11:19 AM, aaron morton aa...@thelastpickle.comwrote:

 The issue with major / manual compaction is that it creates one file.
 One big old file.

 That one file will not be compacted unless there are
 (min_compaction_threshold - 1) other files of a similar size. So tombstones
 and overwrites in that file may not be purged for a long time.

 If you go down the manual compaction path you need to keep doing it.

 If you feel you need to do it, do it; otherwise let automatic compaction do
 its thing.
 Cheers


 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 25/01/2012, at 12:47 PM, Roshan wrote:

 Thanks for the reply. Is the major compaction not recommended for Cassandra
 1.0.6?

 --
 View this message in context:
 http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Different-size-of-SSTable-are-remain-in-the-system-without-compact-tp7218239p7222322.html
 Sent from the cassandra-u...@incubator.apache.org mailing list archive at
 Nabble.com.





WARN [Memtable] live ratio

2012-01-30 Thread Roshan
Hi All

From time to time I see this warning below in the Cassandra logs:
WARN  [Memtable] setting live ratio to minimum of 1.0 instead of
0.21084217381985554

I'm not sure of the exact cause or how to eliminate it.
Any help is appreciated. Thanks.

--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/WARN-Memtable-live-ratio-tp7238582p7238582.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: WARN [Memtable] live ratio

2012-01-30 Thread Mohit Anchlia
I have the same experience and am wondering what's causing this. One thing I
noticed is that this happens when the server is idle for some time and then
load starts going high; that's when I start to see these messages.

On Mon, Jan 30, 2012 at 4:54 PM, Roshan codeva...@gmail.com wrote:
 Hi All

 Time to time I am seen this below warning in Cassandra logs.
 WARN  [Memtable] setting live ratio to minimum of 1.0 instead of
 0.21084217381985554

 I'm not sure of the exact cause or how to eliminate it.
 Any help is appreciated. Thanks.

 --
 View this message in context: 
 http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/WARN-Memtable-live-ratio-tp7238582p7238582.html
 Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
 Nabble.com.


Re: WARN [Memtable] live ratio

2012-01-30 Thread Roshan
Exactly, I am also getting this when the server moves from idle to high load.
Maybe the Cassandra experts can help us.

--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/WARN-Memtable-live-ratio-tp7238582p7238603.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: Any tools like phpMyAdmin to see data stored in Cassandra ?

2012-01-30 Thread fid
I think his development environment is Windows.

On Mon, Jan 30, 2012 at 7:29 PM, R. Verlangen ro...@us2.nl wrote:

 You might run it from a VM?


 2012/1/30 Ertio Lew ertio...@gmail.com



 On Mon, Jan 30, 2012 at 7:16 AM, Frisch, Michael 
 michael.fri...@nuance.com wrote:

  OpsCenter?

  http://www.datastax.com/products/opscenter

  - Mike


   I have tried Sebastien's phpMyAdmin For Cassandra
  (https://github.com/sebgiroux/Cassandra-Cluster-Admin) to
  see the data stored in Cassandra in the same manner as phpMyAdmin allows.
  But since it makes assumptions about the datatypes of the column
  name/column value and doesn't allow configuring the datatype that data should
  be read as on a per-CF basis, I couldn't make the best use of it.

   Are there any other similar tools out there that can do the job better?


  Thanks, that's a great product but unfortunately doesn't work with
  Windows. Any tools for Windows?





-- 
Best Regards

Bob Bao
baohan...@gmail.com


Re: how stable is 1.0 these days?

2012-01-30 Thread Ben Coverston
I'm not sure what Carlo is referring to, but generally if you have done
thousands of migrations, you can end up in a situation where the migrations
take a long time to replay, and there are some race conditions that can be
problematic when thousands of migrations need to be replayed while a node is
bootstrapped. If you get into this situation it can be fixed by copying
migrations from a known good schema to the node that you are trying to
bootstrap.

Generally I would advise against frequent schema updates. Unlike rows in
column families the schema itself is designed to be relatively static.

On Mon, Jan 30, 2012 at 2:14 PM, Jim Newsham jnews...@referentia.comwrote:


 Could you also elaborate for creating/dropping column families?  We're
 currently working on moving to 1.0 and using dynamically created tables, so
 I'm very interested in what issues we might encounter.

 So far the only thing I've encountered (with 1.0.7 + hector 1.0-2) is that
 dropping a cf may sometimes fail with UnavailableException.  I think this
 happens when the cf is busy being compacted.  When I sleep/retry within a
 loop it eventually succeeds.

 Thanks,
 Jim


 On 1/26/2012 7:32 AM, Pierre-Yves Ritschard wrote:

 Can you elaborate on the composite type instabilities? Is this
 specific to Hector, as Radim's posts suggest?
 These one-liner answers are quite stressful :)

 On Thu, Jan 26, 2012 at 1:28 PM, Carlo Pirescarlopi...@gmail.com
  wrote:

 If you need to use composite types and create/drop column families on the
 fly, you must be prepared for instabilities.





-- 
Ben Coverston
DataStax -- The Apache Cassandra Company


Re: Any tools like phpMyAdmin to see data stored in Cassandra ?

2012-01-30 Thread Brandon Williams
On Sun, Jan 29, 2012 at 11:52 PM, Ertio Lew ertio...@gmail.com wrote:

 On Mon, Jan 30, 2012 at 7:16 AM, Frisch, Michael michael.fri...@nuance.com
 wrote:

 OpsCenter?

 http://www.datastax.com/products/opscenter


 Thanks, that's a great product but unfortunately doesn't work with windows.

Now it does: http://www.datastax.com/products/opscenter/platforms

-Brandon