from:"Edward Capriolo"

Re: Cluster per Application vs. Multi-Application Clusters

2012-08-22 Thread Edward Capriolo

If you are starting out small, one logical/physical cluster is probably the best and only approach. Long term this is very case-by-case dependent, but I generally believe Cluster per Application is the best approach, although I consider it Cluster per QoS. For our use cases I find that two

Re: Automating nodetool repair

2012-08-28 Thread Edward Capriolo

You can consider adding -pr when iterating through all your hosts like this. -pr means primary range, and will do less duplicated work. On Mon, Aug 27, 2012 at 8:05 PM, Aaron Turner synfina...@gmail.com wrote: I use cron. On one box I just do: for n in node1 node2 node3 node4 ; do
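The -pr suggestion can be sketched as a small driver script. This is only a minimal sketch: the host names node1..node4 come from the quoted loop, while the helper names `repair_commands` and `run_repairs` are hypothetical, and it assumes `nodetool` is on the PATH of the box running it.

```python
import subprocess


def repair_commands(hosts, primary_range=True):
    """Build one `nodetool repair` command per host.

    With -pr each node repairs only its primary range, so looping over
    every host covers the ring once instead of repeating replica work.
    """
    commands = []
    for host in hosts:
        cmd = ["nodetool", "-h", host, "repair"]
        if primary_range:
            cmd.append("-pr")
        commands.append(cmd)
    return commands


def run_repairs(hosts, dry_run=True):
    """Run the repairs one host at a time; dry_run only prints the commands."""
    for cmd in repair_commands(hosts):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_repairs(["node1", "node2", "node3", "node4"])
```

Running the hosts sequentially rather than in parallel is a deliberate choice in this sketch, so only one node is repairing at a time.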

Re: Advantage of pre-defining column metadata

2012-08-28 Thread Edward Capriolo

Setting the metadata will set the validation. If you insert into a column that is supposed to hold only INT values, Cassandra will reject non-INT data at insert time. Also, the comparator cannot be changed; you only get one chance to set the column sorting. On Tue, Aug 28, 2012 at 3:34 PM, A J

Re: performance is drastically degraded after 0.7.8 -- 1.0.11 upgrade

2012-08-30 Thread Edward Capriolo

If you move from 0.7.x to 0.8.x or 1.0.x you have to rebuild sstables as soon as possible. If you have large bloom filters you can hit a bug where the bloom filters will not work properly. On Thu, Aug 30, 2012 at 9:44 AM, Илья Шипицин chipits...@gmail.com wrote: we are running somewhat queue-like

Re: Helenos - web based gui tool

2012-09-07 Thread Edward Capriolo

You might want to change the name. There is a node.js driver for cassandra with the same name. I am not sure which one of you got to the name first. On Thu, Sep 6, 2012 at 8:00 PM, aaron morton aa...@thelastpickle.com wrote: Thanks Tomek, Feel free to add it to

Re: cassandra performance looking great...

2012-09-07 Thread Edward Capriolo

Try to get Cassandra running the TPH-C benchmarks and beat oracle :) On Fri, Sep 7, 2012 at 10:01 AM, Hiller, Dean dean.hil...@nrel.gov wrote: So we wrote 1,000,000 rows into cassandra and ran a simple S-SQL(Scalable SQL) query of PARTITIONS n(:partition) SELECT n FROM TABLE as n WHERE

Re: JVM 7, Cass 1.1.1 and G1 garbage collector

2012-09-15 Thread Edward Capriolo

Generally tuning the garbage collector is a waste of time. Just follow someone else's recommendation and use that. The problem with tuning is that workloads change then you have to tune again and again. New garbage collectors come out and you have to tune again and again. Someone at your company

Re: JVM 7, Cass 1.1.1 and G1 garbage collector

2012-09-24 Thread Edward Capriolo

Haha Ok. It is not a total waste, but practically your time is better spent in other places. The problem is just about everything is a moving target, schema, request rate, hardware. Generally tuning nudges a couple variables in one direction or the other and you see some decent returns. But each

Re: any ways to have compaction use less disk space?

2012-09-24 Thread Edward Capriolo

If you are using ext3 there is a hard limit on the number of files in a directory of 32K. ext4 has a much higher limit (can't remember exactly). So true that having many files is not a problem for the file system, though your VFS cache could be less efficient since you would have a higher

Re: 1000's of column families

2012-09-27 Thread Edward Capriolo

Hector also offers support for 'Virtual Keyspaces' which you might want to look at. On Thu, Sep 27, 2012 at 1:10 PM, Aaron Turner synfina...@gmail.com wrote: On Thu, Sep 27, 2012 at 3:11 PM, Hiller, Dean dean.hil...@nrel.gov wrote: We have 1000's of different building devices and we stream

Re: Ball is rolling on High Performance Cassandra Cookbook second edition

2012-10-01 Thread Edward Capriolo

I could use some help as I do not have extensive experience with these two combinations. Contact me if you have any other ideas as well. Edward On Tue, Jun 26, 2012 at 5:25 PM, Edward Capriolo edlinuxg...@gmail.com wrote: Hello all, It has not been very long since the first book was published

Re: MBean cassandra.db.CompactionManager TotalBytesCompacted counts backwards

2012-10-07 Thread Edward Capriolo

I have not looked at this JMX object in a while; however, the compaction manager can support multiple threads. Also it moves from 0 to the file size each time it has to compact a set of files. That is more useful for showing current progress rather than lifetime history. On Fri, Oct 5, 2012 at 7:27 PM,

Re: how to avoid range ghosts?

2012-10-07 Thread Edward Capriolo

Read this: http://wiki.apache.org/cassandra/FAQ#range_ghosts Then say this to yourself: http://cn1.kaboodle.com/img/b/0/0/196/4/C1xHoQAAAZZL9w/ghostbusters-logo-i-aint-afraid-of-no-ghost-pinback-button-1.25-pin-badge.jpg?v=1320511953000 On Sun, Oct 7, 2012 at 4:15 AM, Satoshi Yamada

Re: can I have a mix of 32 and 64 bit machines in a cluster?

2012-10-09 Thread Edward Capriolo

Java abstracts you from all these problems. One thing to look out for is JVM options do not work across all JVMs. For example if you try to enable https://wikis.oracle.com/display/HotSpotInternals/CompressedOops on a 32bit machine the JVM fails to start. On Tue, Oct 9, 2012 at 1:45 PM, Brian

Re: unexpected behaviour on seed nodes when using -Dcassandra.replace_token

2012-10-19 Thread Edward Capriolo

Yes. That would be a good jira if it is not already listed. If a node is a seed node, the autobootstrap and replace_token settings should trigger a fatal non-start because you're giving c* conflicting directions. Edward On Fri, Oct 19, 2012 at 8:49 AM, Thomas van Neerijnen t...@bossastudios.com wrote:

Java 7 support?

2012-10-24 Thread Edward Capriolo

We have been using cassandra and java7 for months. No problems. A key concept of java is portable binaries. There are sometimes wrinkles with upgrades. If you hit one undo the upgrade and restart. On Tuesday, October 23, 2012, Eric Evans eev...@acunu.com wrote: On Tue, Oct 16, 2012 at 7:54 PM,

Re: Keeping the record straight for Cassandra Benchmarks...

2012-10-25 Thread Edward Capriolo

Yet another benchmark with 100,000,000 rows on EC2 machines probably less powerful than my laptop. The benchmark might as well have run 4 vmware instances on the same desktop. On Thu, Oct 25, 2012 at 7:40 AM, Brian O'Neill b...@alumni.brown.edu wrote: People probably saw...

Re: Java 7 support?

2012-10-25 Thread Edward Capriolo

...@gmail.com wrote: Are you using openJDK or Oracle JDK? I know java7 should be based on openJDK since 7, but still not sure. On 25 October 2012 05:42, Edward Capriolo edlinuxg...@gmail.com wrote: We have been using cassandra and java7 for months. No problems. A key concept of java is portable

Large results and network round trips

2012-10-25 Thread Edward Capriolo

Hello all, Currently we implement wide rows for most of our entities. For example: user { event1=x event2=y event3=z ... } Normally the entries are bounded to be less than 256 columns and most columns are small in size, say 30 bytes. Because of the blind-write nature of Cassandra it is possible

Re: Large results and network round trips

2012-10-25 Thread Edward Capriolo

can look at the Netflix client as it makes the co-ordinator node same as the node which holds that data. This will reduce one hop. On Thu, Oct 25, 2012 at 9:04 AM, Edward Capriolo edlinuxg...@gmail.com wrote: Hello all, Currently we implement wide rows for most of our entities. For example

Re: disable compaction node-wide

2012-10-27 Thread Edward Capriolo

If you are using size-tiered, set minCompactionThreshold to 0 and maxCompactionThreshold to 0. You can probably also use this https://issues.apache.org/jira/browse/CASSANDRA-2130 But if you do not compact, the number of sstables gets high and then read performance can suffer. On Sat, Oct 27,

Getting all schema in 1.2.0-beta-1

2012-11-03 Thread Edward Capriolo

Using 1.2.0-beta1. I am noticing that there is no longer a single way to get all the schema. It seems like non-compact storage can be seen with show schema, but other tables are not visible. Is this by design, bug, or operator error? http://pastebin.com/PdSDsdTz

How does Cassandra optimize this query?

2012-11-04 Thread Edward Capriolo

If we create a column family: CREATE TABLE videos ( videoid uuid, videoname varchar, username varchar, description varchar, tags varchar, upload_date timestamp, PRIMARY KEY (videoid,videoname) ); The CLI views this column like so: create column family videos with column_type =

Re: How does Cassandra optimize this query?

2012-11-05 Thread Edward Capriolo

I see. It is fairly misleading because it is a query that does not work at scale. This syntax is only helpful if you have fewer than a few thousand rows in Cassandra. On Mon, Nov 5, 2012 at 12:24 PM, Sylvain Lebresne sylv...@datastax.com wrote: On Mon, Nov 5, 2012 at 4:12 PM, Edward Capriolo

Re: triggers(newbie)

2012-11-05 Thread Edward Capriolo

There are no built-in triggers. Someone has written an aspect-oriented piece to do triggers outside of the project. http://brianoneill.blogspot.com/2012/03/cassandra-triggers-for-indexing-and.html On Mon, Nov 5, 2012 at 12:30 PM, davuk...@veleri.hr wrote: Hello! I was wondering if someone

Re: How does Cassandra optimize this query?

2012-11-05 Thread Edward Capriolo

sylv...@datastax.com wrote: On Mon, Nov 5, 2012 at 6:55 PM, Edward Capriolo edlinuxg...@gmail.com wrote: I see. It is fairly misleading because it is a query that does not work at scale. This syntax is only helpful if you have fewer than a few thousand rows in Cassandra. Just for the sake

Re: Multiple keyspaces vs Multiple CFs

2012-11-08 Thread Edward Capriolo

it is better to have one keyspace unless you need to replicate the keyspaces differently. The main reason for this is that changing keyspaces requires an RPC operation. Having 10 keyspaces would mean having 10 connection pools. On Thu, Nov 8, 2012 at 4:59 PM, sankalp kohli kohlisank...@gmail.com

Re: Multiple keyspaces vs Multiple CFs

2012-11-08 Thread Edward Capriolo

a keyspace aware connection pool. Edward On Thu, Nov 8, 2012 at 5:36 PM, sankalp kohli kohlisank...@gmail.com wrote: Which connection pool are you talking about? On Thu, Nov 8, 2012 at 2:19 PM, Edward Capriolo edlinuxg...@gmail.com wrote: it is better to have one keyspace unless you need

Re: Multiple keyspaces vs Multiple CFs

2012-11-08 Thread Edward Capriolo

MessageService has to other nodes. Then there will be incoming connections via thrift from clients. How are they affected by multiple keyspaces? On Thu, Nov 8, 2012 at 3:14 PM, Edward Capriolo edlinuxg...@gmail.com wrote: Any connection pool. Imagine if you have 10 column families in 10

Re: Multiple keyspaces vs Multiple CFs

2012-11-08 Thread Edward Capriolo

is from the thrift part. I use hector. In hector, I can create multiple keyspace objects for each keyspace and use them when I want to talk to that keyspace. Why will it need to do a round trip to the server for each switch. On Thu, Nov 8, 2012 at 3:28 PM, Edward Capriolo edlinuxg...@gmail.com

Re: Retrieve Multiple CFs from Range Slice

2012-11-09 Thread Edward Capriolo

HBase is different in this regard. A table is comprised of multiple column families, and they can be scanned at once. However, last time I checked, scanning a table with two column families is still two seeks across three different column families. A similar thing can be accomplished in cassandra

Re: leveled compaction and tombstoned data

2012-11-10 Thread Edward Capriolo

No it does not exist. Rob and I might start a donation page and give the money to whoever is willing to code it. If someone would write a tool that would split an sstable into 4 smaller sstables (even an offline command line tool) I would paypal them a hundo. On Sat, Nov 10, 2012 at 1:10 PM,

Re: [BETA RELEASE] Apache Cassandra 1.2.0-beta2 released

2012-11-10 Thread Edward Capriolo

Just a note for all. The default partitioner is no longer RandomPartitioner. It is now murmur, and the token range starts in negative numbers. So you don't choose tokens like your father taught you anymore. On Friday, November 9, 2012, Sylvain Lebresne sylv...@datastax.com wrote: The Cassandra
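The change in token arithmetic can be sketched as follows. This is a minimal sketch, assuming evenly spaced single tokens per node and the commonly documented ranges (0 to 2**127 for RandomPartitioner, -2**63 to 2**63 - 1 for Murmur3Partitioner); the function names are hypothetical.

```python
def random_partitioner_tokens(node_count):
    """Evenly spaced initial tokens for RandomPartitioner (range 0 .. 2**127)."""
    return [i * (2 ** 127) // node_count for i in range(node_count)]


def murmur3_tokens(node_count):
    """Evenly spaced initial tokens for Murmur3Partitioner.

    The range is -2**63 .. 2**63 - 1, so the first token is negative.
    """
    return [-(2 ** 63) + i * (2 ** 64) // node_count for i in range(node_count)]


if __name__ == "__main__":
    for token in murmur3_tokens(4):
        print(token)
```

For a four-node ring the Murmur3 tokens come out as -2**63, -2**62, 0 and 2**62, which is why token lists copied from a RandomPartitioner cluster no longer apply.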

Re: CREATE COLUMNFAMILY

2012-11-11 Thread Edward Capriolo

If you supply metadata cassandra can use it for several things. 1) It validates data on insertion 2) Helps display the information in human readable formats in tools like the CLI and sstabletojson 3) If you add a built-in secondary index the type information is needed, strings sort differently

Re: removing SSTABLEs

2012-11-11 Thread Edward Capriolo

If you shut down c* and remove an sstable (and its associated data, index, bloom filter, etc. files) it is safe. I would delete any saved caches as well. It is safe in the sense that Cassandra will start up with no issues, but you could be missing some data. On Sun, Nov 11, 2012 at 11:09 PM,

Re: removing SSTABLEs

2012-11-12 Thread Edward Capriolo

Because you did a major compaction that table is larger than all the rest. So it will never go away until you have 3 other tables about that size or you run major compaction again. You should vote on the ticket: https://issues.apache.org/jira/browse/CASSANDRA-4766 On Mon, Nov 12, 2012 at 11:51
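Why the big table waits for "3 other tables about that size" can be illustrated with a toy model of size-tiered bucketing. This is a rough sketch, not Cassandra's code: the 0.5x to 1.5x bucket bounds and the minimum of 4 sstables per compaction are assumptions about the defaults, and the function names are hypothetical.

```python
def size_tiered_buckets(sizes, bucket_low=0.5, bucket_high=1.5):
    """Group sstable sizes into buckets of similar size.

    A size joins the first bucket whose average it is within
    bucket_low..bucket_high of; otherwise it starts a new bucket.
    """
    buckets = []
    for size in sorted(sizes):
        for bucket in buckets:
            average = sum(bucket) / len(bucket)
            if average * bucket_low <= size <= average * bucket_high:
                bucket.append(size)
                break
        else:
            buckets.append([size])
    return buckets


def compaction_candidates(sizes, min_threshold=4):
    """Buckets that hold enough similar-sized sstables to be compacted."""
    return [b for b in size_tiered_buckets(sizes) if len(b) >= min_threshold]


if __name__ == "__main__":
    # One huge sstable left by a major compaction plus four small ones.
    print(compaction_candidates([1000, 10, 11, 9, 10]))
```

In this model the 1000-unit sstable sits alone in its bucket and is only picked up once three more sstables of roughly that size exist.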

Re: unable to read saved rowcache from disk

2012-11-13 Thread Edward Capriolo

Yes, the row cache could be incorrect, so on startup Cassandra verifies the saved row cache by re-reading. It takes a long time, so do not save a big row cache. On Tuesday, November 13, 2012, Manu Zhang owenzhang1...@gmail.com wrote: I have a rowcache provided by SerializingCacheProvider. The data

Re: Read during digest mismatch

2012-11-13 Thread Edward Capriolo

I think the code base does not benefit from having too many different read code paths. Logically what you're suggesting is reasonable, but you have to consider the case of one being slow to respond. Then what? On Tuesday, November 13, 2012, Manu Zhang owenzhang1...@gmail.com wrote: If consistency

Re: unable to read saved rowcache from disk

2012-11-13 Thread Edward Capriolo

is not big. On Wed, Nov 14, 2012 at 10:38 AM, Edward Capriolo edlinuxg...@gmail.com wrote: Yes, the row cache could be incorrect, so on startup Cassandra verifies the saved row cache by re-reading. It takes a long time, so do not save a big row cache. On Tuesday, November 13, 2012, Manu Zhang

Re: Offsets and Range Queries

2012-11-15 Thread Edward Capriolo

There are several reasons. First there is no absolute offset. The rows are sorted by the data. If someone inserts new data between your query and this query the rows have changed. Unless you are doing select queries inside a transaction with repeatable read and your database supports this, the query
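The problem with offsets can be shown on a plain sorted list. This is a minimal in-memory sketch, not a Cassandra client call: `page_after` is a hypothetical helper, and the list of keys stands in for rows kept in sorted order.

```python
import bisect


def page_after(sorted_keys, last_seen, page_size):
    """Return up to page_size keys strictly greater than last_seen.

    last_seen=None starts from the beginning. Resuming from a key rather
    than a numeric offset stays correct when rows are inserted between calls.
    """
    start = 0 if last_seen is None else bisect.bisect_right(sorted_keys, last_seen)
    return sorted_keys[start:start + page_size]


keys = ["b", "d", "f", "h"]
first_page = page_after(keys, None, 2)        # first two rows
bisect.insort(keys, "a")                      # someone inserts a row in between
by_offset = keys[2:4]                         # offset paging repeats a row
by_key = page_after(keys, first_page[-1], 2)  # key paging resumes after the last row seen

if __name__ == "__main__":
    print(first_page, by_offset, by_key)
```

Paging by the last key seen is the usual substitute for an offset, since the position of a row is only meaningful relative to the data around it.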

Re: unable to read saved rowcache from disk

2012-11-15 Thread Edward Capriolo

. (if the key is Long, could be more than 1M rows) Thanks. -Wei From: Edward Capriolo edlinuxg...@gmail.com To: user@cassandra.apache.org Sent: Tuesday, November 13, 2012 11:13 PM Subject: Re: unable to read saved rowcache from disk http://wiki.apache.org

Re: Question regarding the need to run nodetool repair

2012-11-15 Thread Edward Capriolo

On Thursday, November 15, 2012, Dwight Smith dwight.sm...@genesyslab.com wrote: I have a 4 node cluster, version 1.1.2, replication factor of 4, read/write consistency of 3, level compaction. Several questions. 1) Should nodetool repair be run regularly to assure it has completed before

Re: Admin for cassandra?

2012-11-15 Thread Edward Capriolo

We should build an eclipse plugin named Eclipsandra or something. On Thu, Nov 15, 2012 at 9:45 PM, Wz1975 wz1...@yahoo.com wrote: Cqlsh is probably the closest you will get. Or pay big bucks to hire someone to develop one for you:) Thanks. -Wei Sent from my Samsung smartphone on ATT

Re: Collections, query for contains?

2012-11-19 Thread Edward Capriolo

This was my first question after I got the inserts working. Hive has udfs like array contains. It also has lateral view syntax that is similar to transposed. On Monday, November 19, 2012, Timmy Turner timm.t...@gmail.com wrote: Is there no option to query for the contents of a collection?

Re: SchemaDisagreementException

2012-11-19 Thread Edward Capriolo

Even if you made the calls through cql you would have the same issue since cql uses thrift. 1.2.0 is supposed to be nicer with concurrent modifications. On Monday, November 19, 2012, Everton Lima peitin.inu...@gmail.com wrote: I was using cassandra direct because it has more performance than

Re: SchemaDisagreementException

2012-11-19 Thread Edward Capriolo

http://www.acunu.com/2/post/2011/12/cql-benchmarking.html Last I checked, thrift still had an edge over cql due to string serialization and de serialization. Might be even more dramatic for later columns. Not that client speed matters much overall in cassandra's speed, but CQL client does more.

Re: Query regarding SSTable timestamps and counts

2012-11-20 Thread Edward Capriolo

On Tue, Nov 20, 2012 at 5:23 PM, aaron morton aa...@thelastpickle.com wrote: My understanding of the compaction process was that since data files keep continuously merging we should not have data files with very old last modified timestamps It is perfectly OK to have very old SSTables. But

Re: Other problem in update

2012-11-27 Thread Edward Capriolo

I am just taking a stab at this one. UUID's interact with system time and maybe your real time os is doing something funky there. The other option, which seems more likely, is that your unit tests are not cleaning up their data directory and there is some corrupt data in there. On Tue, Nov 27,

Re: Java high-level client

2012-11-27 Thread Edward Capriolo

Hector does not require an outdated version of thrift; you are likely using an outdated version of hector. Here is the long and short of it: If the thrift API changes then hector can have compatibility issues. This happens from time to time. The main methods like get() and insert() have

Re: counters + replication = awful performance?

2012-11-27 Thread Edward Capriolo

The difference between replication factor = 1 and replication factor > 1 is significant. Also it sounds like your cluster is 2 nodes, so going from RF=1 to RF=2 means double the load on both nodes. You may want to experiment with the very dangerous column family attribute: - replicate_on_write:
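The "double the load" remark is simple arithmetic, sketched here under the assumption that writes are spread evenly and every replica performs each write; the function name and the 100-insert figure are illustrative only.

```python
def replica_writes_per_node(client_writes, replication_factor, node_count):
    """Average number of replica writes each node performs.

    Every client write is applied on replication_factor nodes, so the
    cluster does client_writes * replication_factor writes in total.
    """
    if replication_factor > node_count:
        raise ValueError("replication factor cannot exceed the node count")
    return client_writes * replication_factor / node_count


if __name__ == "__main__":
    print(replica_writes_per_node(100, 1, 2))  # RF=1 on two nodes
    print(replica_writes_per_node(100, 2, 2))  # RF=2 on two nodes
```

On a two-node cluster, 100 client inserts cost each node about 50 writes at RF=1 and 100 writes at RF=2, before counting the extra read that counter increments involve.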

Re: selective replication of keyspaces

2012-11-27 Thread Edward Capriolo

You can do something like this: Divide your nodes up into 4 datacenters art1,art2,art3,core [default@unknown] create keyspace art1 placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy' and strategy_options=[{art1:2,core:2}]; [default@unknown] create keyspace art2

Re: counters + replication = awful performance?

2012-11-27 Thread Edward Capriolo

'replicate_on_write: false' fixes the performance issue in our tests. How dangerous is it? What exactly could go wrong? On 12-11-27 01:44 PM, Edward Capriolo wrote: The difference between replication factor = 1 and replication factor > 1 is significant. Also it sounds like your cluster is 2 nodes, so

Re: counters + replication = awful performance?

2012-11-27 Thread Edward Capriolo

performance by simply writing to two separate clusters rather than using single cluster with replicate=2. Which is kind of stupid :) I think something's fishy with counters and replication. Edward Capriolo wrote I misspoke really. It is not dangerous; you just have to understand what it means

Re: counters + replication = awful performance?

2012-11-27 Thread Edward Capriolo

By the way the other issues you are seeing with replicate on write at false could be because you did not repair. You should do that when changing rf. On Tuesday, November 27, 2012, Edward Capriolo edlinuxg...@gmail.com wrote: Cassandra's counters read on increment. Additionally

Re: counters + replication = awful performance?

2012-11-27 Thread Edward Capriolo

in parallel rather than rely on Cassandra replication. And yes, Rainbird was the inspiration for what we are trying to do here :) Edward Capriolo wrote Cassandra's counters read on increment. Additionally they are distributed so there can be multiple reads on increment. If they are not fast enough

Re: selective replication of keyspaces

2012-11-27 Thread Edward Capriolo

it couldn't be done. When I run the command I get the error syntax error at position 21: missing EOF at 'placement_strategy' that is probably because I still need to set the correct properties in the conf files On November 27, 2012 at 5:41 PM Edward Capriolo edlinuxg...@gmail.com wrote

Re: Generic questions over Cassandra 1.1/1.2

2012-11-27 Thread Edward Capriolo

@Bill Are you saying that now cassandra is less schema less ? :) Compact storage is the schemaless of old. On Tuesday, November 27, 2012, Bill de hÓra b...@dehora.net wrote: I'm not sure I always understand what people mean by schema less exactly and I'm curious. For 'schema less', given

Re: counters + replication = awful performance?

2012-11-28 Thread Edward Capriolo

, 2012 at 3:21 PM, Edward Capriolo edlinuxg...@gmail.com wrote: I misspoke really. It is not dangerous; you just have to understand what it means. This jira discusses it. https://issues.apache.org/jira/browse/CASSANDRA-3868 Per Sylvain on the referenced ticket : I don't disagree about

Re: counters + replication = awful performance?

2012-11-28 Thread Edward Capriolo

with Cassandra replication (possibly as simple as me misconfiguring something) -- it shouldn't be three times faster to write to two separate nodes in parallel as compared to writing to 2-node Cassandra cluster with replication=2. Edward Capriolo wrote Say you are doing 100 inserts rf1 on two

Re: Java high-level client

2012-11-28 Thread Edward Capriolo

Astyanax is a hector fork. You can see many of the hector authors' comments still in the astyanax code. There is some nice stuff in there but (IMHO) I do not see the fork as necessary. It has split up the community a bit, as there are now 3 high level Java clients. I would advise following Josh's

Re: Rename cluster

2012-11-29 Thread Edward Capriolo

Since the cluster name is only cosmetic people do not often change it. I would not do this in a production cluster for sure. On Thu, Nov 29, 2012 at 2:56 PM, Wei Zhu wz1...@yahoo.com wrote: Hi, I am trying to rename a cluster by following the instruction on Wiki: Cassandra says ClusterName

Re: Row caching + Wide row column family == almost crashed?

2012-12-02 Thread Edward Capriolo

Row cache has to store the entire row. It is a very bad option for wide rows. On Sunday, December 2, 2012, Mike mthero...@yahoo.com wrote: Hello, We recently hit an issue within our Cassandra based application. We have a relatively new Column Family with some very wide rows (10's of

Re: What is substituting keys_cached column family argument

2012-12-06 Thread Edward Capriolo

Rob, Have you played with this? I have many CFs, some big, some small, some using large caches, some using small ones, some that take many requests, some that take a few. Over time I have cooked up a strategy for how to share the cache love, even though it may not be the best solution to the

Re: Freeing up disk space on Cassandra 1.1.5 with Size-Tiered compaction.

2012-12-06 Thread Edward Capriolo

http://wiki.apache.org/cassandra/LargeDataSetConsiderations On Thu, Dec 6, 2012 at 9:53 AM, Poziombka, Wade L wade.l.poziom...@intel.com wrote: “Having so much data on each node is a potential bad day.” Is this discussed somewhere on the Cassandra documentation (limits,

Re: Virtual Nodes, lots of physical nodes and potentially increasing outage count?

2012-12-07 Thread Edward Capriolo

Good point. Hadoop sprays its blocks around randomly. Thus if replication factor nodes are down some blocks are not found. The larger the cluster, the higher the chance nodes are down. To deal with this, increase rf once the cluster gets to be very large. On Wednesday, December 5, 2012, Eric Parusel

Re: Virtual Nodes, lots of physical nodes and potentially increasing outage count?

2012-12-10 Thread Edward Capriolo

Assuming you need to work with quorum in a non-vnode scenario, that means that if 2 nodes in a row in the ring are down some number of quorum operations will fail with UnavailableException (TimeoutException right after the failures). This is because for a given range of tokens quorum will be
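The adjacent-nodes case can be sketched with a toy ring. This is a simplified model under stated assumptions: one token per node, replicas placed on the next nodes clockwise (as with a simple placement strategy), and RF=3 in the example; the function names are hypothetical.

```python
def quorum(replication_factor):
    """Number of replicas that must respond for a quorum operation."""
    return replication_factor // 2 + 1


def unavailable_ranges(node_count, replication_factor, down_nodes):
    """Primary ranges that cannot reach quorum with the given nodes down.

    Range i is assumed to be replicated on nodes i, i+1, ... clockwise.
    """
    needed = quorum(replication_factor)
    failed = []
    for primary in range(node_count):
        replicas = [(primary + k) % node_count for k in range(replication_factor)]
        live = sum(1 for node in replicas if node not in down_nodes)
        if live < needed:
            failed.append(primary)
    return failed


if __name__ == "__main__":
    print(unavailable_ranges(6, 3, {2, 3}))  # two neighbours down
    print(unavailable_ranges(6, 3, {0, 3}))  # two non-neighbours down
```

With six nodes and RF=3, losing neighbours 2 and 3 leaves two ranges without quorum, while losing nodes 0 and 3 leaves every range with two live replicas.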

Re: Why Secondary indexes is so slowly by my test?

2012-12-13 Thread Edward Capriolo

Until the change that makes secondary indexes not read before write is in a release and stabilized, you should follow Ed Anuff's blog and do your indexing yourself with composites. On Thursday, December 13, 2012, aaron morton aa...@thelastpickle.com wrote: The IndexClause for the get_indexed_slices takes a

Re: Datastax C*ollege Credit Webinar Series : Create your first Java App w/ Cassandra

2012-12-13 Thread Edward Capriolo

It should be good stuff. Brian eats this stuff for lunch. On Wednesday, December 12, 2012, Brian O'Neill b...@alumni.brown.edu wrote: FWIW -- I'm presenting tomorrow for the Datastax C*ollege Credit Webinar Series:

Re: Help on MMap of SSTables

2012-12-13 Thread Edward Capriolo

This issue has to be looked at from a micro and macro level. On the micro level the best way is workload specific. On the macro level this mostly boils down to data and memory size. Compactions are going to churn cache; this is unavoidable. IMHO solid state makes the micro optimization meaningless in the

Re: Why Secondary indexes is so slowly by my test?

2012-12-13 Thread Edward Capriolo

Here is a good start. http://www.anuff.com/2011/02/indexing-in-cassandra.html On Thu, Dec 13, 2012 at 11:35 AM, Alain RODRIGUEZ arodr...@gmail.com wrote: Hi Edward, can you share the link to this blog ? Alain 2012/12/13 Edward Capriolo edlinuxg...@gmail.com Ed Anuff's

Re: Read operations resulting in a write?

2012-12-17 Thread Edward Capriolo

Is there a way to turn this on and off through configuration? I am not necessarily sure I would want this feature. Also it is confusing if these writes show up in JMX and look like user generated write operations. On Mon, Dec 17, 2012 at 10:01 AM, Mike mthero...@yahoo.com wrote: Thank you

Re: rpc_timeout exception while inserting

2012-12-18 Thread Edward Capriolo

CQL2 and CQL3 indexes are not compatible. I guess if CQL2 is able to detect that the table was defined in CQL3 it probably should not allow it. Backwards compatibility is something the storage engines and interfaces have to account for. At least they should prevent you from hurting yourself. But do not

Re: Monitoring the number of client connections

2012-12-19 Thread Edward Capriolo

In the TCP MIB for SNMP (Simple Network Management Protocol) this information is available http://www.simpleweb.org/ietf/mibs/mibSynHiLite.php?category=IETF&module=TCP-MIB On Wed, Dec 19, 2012 at 12:22 AM, Michael Kjellman mkjell...@barracuda.com wrote: netstat + cron is your friend at this

Re: thrift client can't add a column back after it was deleted with cassandra-cli?

2012-12-21 Thread Edward Capriolo

The CLI uses microsecond precision; your client might be using something else, and inserts with lower timestamps are dropped. On Friday, December 21, 2012, Qiaobing Xie qiaobing@gmail.com wrote: Hi, I am developing a thrift client that inserts and removes columns from a column-family
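The effect of mixed timestamp precision can be reproduced with a toy last-write-wins cell. This is an illustrative model only: the `Cell` class and helper names are hypothetical, the epoch value is arbitrary, and it assumes the client in question stamps writes in milliseconds.

```python
def micros(epoch_seconds):
    """Microseconds since the epoch, the precision the CLI is described as using."""
    return epoch_seconds * 1_000_000


def millis(epoch_seconds):
    """Milliseconds since the epoch, assumed here for the hand-written client."""
    return epoch_seconds * 1_000


class Cell:
    """Toy last-write-wins column: only a strictly higher timestamp replaces it."""

    def __init__(self):
        self.value = None
        self.timestamp = -1

    def write(self, value, timestamp):
        if timestamp > self.timestamp:
            self.value, self.timestamp = value, timestamp
            return True
        return False

    def delete(self, timestamp):
        # A delete is modelled as a write of None (a tombstone).
        return self.write(None, timestamp)


cell = Cell()
t0 = 1_356_048_000                           # arbitrary fixed epoch seconds
cell.write("v1", micros(t0))                 # insert from the CLI
cell.delete(micros(t0 + 1))                  # delete from the CLI
accepted = cell.write("v2", millis(t0 + 60)) # later in wall-clock time, smaller number

if __name__ == "__main__":
    print(accepted, cell.value)
```

The millisecond write happens a minute later on the clock but carries a timestamp a thousand times smaller, so the tombstone still wins; switching the client to microseconds makes the re-insert visible.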

Re: Correct way to design a cassandra database

2012-12-21 Thread Edward Capriolo

You could store the order as the first part of a composite string, say first picture as A and second as B. To insert one between, call it AA. If you shuffle a lot the strings could get really long. Might be better to store the order in a separate column. Neither solution mentioned deals with
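The A, B, AA idea relies on plain string ordering, sketched below. This is a minimal sketch with a hypothetical `key_after` helper; it only appends a character, so it is one possible scheme rather than a general solution, and it shows how the keys grow with repeated inserts.

```python
def key_after(low, high):
    """Return an ordering key that sorts between low and high.

    The key is built by appending "A" to low, which works whenever that
    candidate still sorts before high; otherwise the scheme has no room.
    """
    candidate = low + "A"
    if not (low < candidate < high):
        raise ValueError("no room between keys with this simple scheme")
    return candidate


pictures = {"A": "first.jpg", "B": "second.jpg"}
pictures[key_after("A", "B")] = "between.jpg"   # stored under "AA"
pictures[key_after("AA", "B")] = "later.jpg"    # stored under "AAA"
ordered = [pictures[key] for key in sorted(pictures)]

if __name__ == "__main__":
    print(ordered)
```

Each insert into the same gap adds a character to the key, which is the "strings could get really long" concern; a separate order column avoids the growth at the cost of rewriting positions.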

Re: State of Cassandra and Java 7

2012-12-23 Thread Edward Capriolo

What versions are supported is kinda up to you; for example, earlier versions of the JDK now have bugs. I have a version of java, 1.6.0_23 I believe, that will not even start with the latest cassandra releases. Likewise people suggest not running the newest ones (1.7.0) because they have not tested it.

Re: how to create a keyspace in CQL3

2012-12-23 Thread Edward Capriolo

Unfortunately one of the first commands everyone needs to use to work with cassandra changes very often. You can use cqlsh help create_keyspace; But sometimes even the documentation is not in line. Using this permutation of goodness: cqlsh 2.3.0 | Cassandra 1.2.0-beta2-SNAPSHOT | CQL

Re: Force data to a specific node

2013-01-02 Thread Edward Capriolo

There is a crazy, very bad, don't do it way to do this. You can set RF=1 and hack the LocalPartitioner (because the local partitioner has been made not to do this). Then the node you connect to and write to is the node the data will get stored on. It's like memcache do-it-yourself style sharding.

Re: RandomPartitioner to Murmur3Partitioner

2013-01-03 Thread Edward Capriolo

By the way 10% faster does not necessarily mean 10% more requests. https://issues.apache.org/jira/browse/CASSANDRA-2975 https://issues.apache.org/jira/browse/CASSANDRA-3772 Also if you follow the tickets My tests show that Murmur3Partitioner actually is worse than MD5 with high cardinality

Re: Error after 1.2.0 upgrade

2013-01-03 Thread Edward Capriolo

Just a shot in the dark, but I would try setting -Xss higher than the default. It's probably like 180, but I can't even start at that level, bumped it up to 256 for JDK 7. On Thu, Jan 3, 2013 at 12:02 PM, Michael Kjellman mkjell...@barracuda.com wrote: :) yes, I'm crazy The assertion appears to

Re: Error after 1.2.0 upgrade

2013-01-03 Thread Edward Capriolo

been fixed in 1.1.7 ?? From: Edward Capriolo edlinuxg...@gmail.com Reply-To: user@cassandra.apache.org user@cassandra.apache.org Date: Thursday, January 3, 2013 11:57 AM To: user@cassandra.apache.org user@cassandra.apache.org Subject: Re: Error after 1.2.0 upgrade There is a bug

Re: Specifying initial token in 1.2 fails

2013-01-04 Thread Edward Capriolo

Yes. They were really just introduced and if you are ready to hitch your wagon to every new feature you put yourself in considerable risk. With any piece of software not just Cassandra. On Fri, Jan 4, 2013 at 11:59 AM, Alain RODRIGUEZ arodr...@gmail.com wrote: But I don't really get the point

Re: help turning compaction..hours of run to get 0% compaction....

2013-01-07 Thread Edward Capriolo

There is some point where you simply need more machines. On Mon, Jan 7, 2013 at 5:02 PM, Michael Kjellman mkjell...@barracuda.com wrote: Right, I guess I'm saying that you should try loading your data with leveled compaction and see how your compaction load is. Your work load sounds like

Re: about validity of recipe A node join using external data copy methods

2013-01-08 Thread Edward Capriolo

at 7:27 AM, DE VITO Dominique dominique.dev...@thalesgroup.com wrote: Hi, Edward Capriolo described in his Cassandra book a faster way [1] to start new nodes if the cluster size doubles, from N to 2 *N. It's about splitting in 2 parts each token range taken in charge, after the split

Re: about validity of recipe A node join using external data copy methods

2013-01-08 Thread Edward Capriolo

to do it this way anymore I guess it's true in v1.2. Is it true also in v1.1 ? Thanks. Dominique *From:* Edward Capriolo [mailto:edlinuxg...@gmail.com] *Sent:* Tuesday, January 8, 2013 16:01 *To:* user@cassandra.apache.org *Subject:* Re: about validity of recipe A node join using

Re: Wide rows in CQL 3

2013-01-09 Thread Edward Capriolo

I ask myself this every day. CQL3 is a new way to do things, including wide rows with collections. There is no upgrade path. You adopt CQL3's sparse tables as soon as you start creating column families from CQL. There is not much backwards compatibility. CQL3 can query compact tables, but you may

Re: Wide rows in CQL 3

2013-01-09 Thread Edward Capriolo

By no upgrade path I mean to say if I have a table with compact storage I cannot upgrade it to sparse storage. If I have an existing COMPACT table and I want to add a Map to it, I cannot. This is what I mean by no upgrade path. Column families that mix static and dynamic columns are pretty

Re: Wide rows in CQL 3

2013-01-09 Thread Edward Capriolo

, that do not bother me anyway. 4 are these sparse columns also taking memtable space? These questions would give me serious pause about using sparse tables. On Wednesday, January 9, 2013, Edward Capriolo edlinuxg...@gmail.com wrote: By no upgrade path I mean to say if I have a table with compact storage

Re: Starting Cassandra

2013-01-10 Thread Edward Capriolo

I think 1.6.0_24 is too low and 1.7.0 is too high. Try a more recent 1.6. I just had problems with 1.6.0_23 see here: https://issues.apache.org/jira/browse/CASSANDRA-4944 On Thu, Jan 10, 2013 at 10:29 AM, Sloot, Hans-Peter hans-peter.sl...@atos.net wrote: I have 4 vm's with 1024M memory. 1

Re: trying to use row_cache (b/c we have hot rows) but nodetool info says zero requests

2013-01-16 Thread Edward Capriolo

You have to change the column family cache info from keys_only to otherwise the cache will not be on for this cf. On Wednesday, January 16, 2013, Brian Tarbox tar...@cabotresearch.com wrote: We have quite wide rows and do a lot of concentrated processing on each row...so I thought I'd try the

Re: Starting Cassandra

2013-01-16 Thread Edward Capriolo

I think at this point cassandra startup scripts should reject versions, since cassandra won't even start with many jvms at this point. On Tuesday, January 15, 2013, Michael Kjellman mkjell...@barracuda.com wrote: Do yourself a favor and get a copy of the Oracle 7 JDK (now with more security

Re: Cassandra Consistency problem with NTP

2013-01-17 Thread Edward Capriolo

If you have 40ms NTP drift something is VERY VERY wrong. You should have a local NTP server on the same subnet, do not try to use one on the moon. On Thu, Jan 17, 2013 at 4:42 AM, Sylvain Lebresne sylv...@datastax.com wrote: So what I want is, Cassandra provide some information for client, to

Re: Cassandra Performance Benchmarking.

2013-01-17 Thread Edward Capriolo

Wow, you managed to do a load test through the cassandra-cli. There should be a merit badge for that. You should use the built-in stress tool or YCSB. The CLI has to do much more string conversion than a normal client would and it is not built for performance. You will definitely get better

Re: Key-hash based node selection

2013-01-19 Thread Edward Capriolo

You cannot be /mostly/ consistent on read, just like you cannot be half-pregnant or half transactional. You either are or you are not. If you do not have enough nodes for a QUORUM the read fails. Thus you never get stale reads, you only get failed reads. The dynamic snitch makes reads sticky at READ.ONE.
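To make the "failed reads, not stale reads" point concrete, here is a minimal sketch of the arithmetic. It is not Cassandra code: the function names and the UnavailableError class are made up for illustration, and the only fact assumed is that a QUORUM is a strict majority of the replicas for a key.

```python
class UnavailableError(Exception):
    """Hypothetical error: too few live replicas to satisfy the request."""


def quorum(replication_factor: int) -> int:
    """A strict majority of the replicas for a key."""
    return replication_factor // 2 + 1


def quorum_read(replication_factor: int, live_replicas: int) -> str:
    """Either enough replicas answer, or the read fails outright."""
    needed = quorum(replication_factor)
    if live_replicas < needed:
        # No partial answer is returned: the read is refused.
        raise UnavailableError(
            f"need {needed} replicas, only {live_replicas} alive"
        )
    return "ok"
```

With a replication factor of 3, a QUORUM read needs 2 live replicas; with only 1 alive the sketch raises instead of returning possibly stale data.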

Re: Is this how to read the output of nodetool cfhistograms?

2013-01-22 Thread Edward Capriolo

This was described in good detail here: http://thelastpickle.com/2011/04/28/Forces-of-Write-and-Read/ On Tue, Jan 22, 2013 at 9:41 AM, Brian Tarbox tar...@cabotresearch.com wrote: Thank you! Since this is a very non-standard way to display data it might be worth a better explanation in the

Re: Large commit log reasons

2013-01-23 Thread Edward Capriolo

By default Cassandra uses 1/3rd of the heap size for memtable storage. If you make your memtables smaller they should flush faster and your commit logs should not grow large. Large commit logs are not a problem; some use cases that write to some Column Families more than others can make the commit log

Re: Large commit log reasons

2013-01-23 Thread Edward Capriolo

1. The commit log is only read on startup. W: If writes are unflushed then the commit logs need to be replayed. 2. Shrink the memtable settings, but you don't want to do this. 3. Commit log size is not directly related to sstable size. E.g. if you write the same row a billion times the commit log

Re: Issue when deleting Cassandra rowKeys.

2013-01-26 Thread Edward Capriolo

Make sure the timestamp on your delete is greater than the timestamp of the data. On Sat, Jan 26, 2013 at 1:33 PM, Kasun Weranga kas...@wso2.com wrote: Hi all, When I delete some rowkeys programmatically I can see two rowkeys remains in the column family. I think it is due to tombstones. Is there a way
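The timestamp point can be sketched as a tiny last-write-wins check. This is an illustration only, not how Cassandra is implemented: the function name is invented, timestamps are plain integers, and the tie rule (an equal timestamp lets the delete win) is an assumption made for the sketch.

```python
from typing import Optional


def is_visible(write_ts: int, delete_ts: Optional[int]) -> bool:
    """Return True if the written data survives reconciliation.

    write_ts  -- timestamp the client attached to the insert
    delete_ts -- timestamp the client attached to the delete, if any
    """
    if delete_ts is None:
        return True
    # Assumption for this sketch: on equal timestamps the delete wins.
    return write_ts > delete_ts
```

For example, is_visible(100, 50) is True: a delete stamped earlier than the data does not hide it, which is the symptom of rows that "remain" after a delete.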

Re: Denormalization

2013-01-27 Thread Edward Capriolo

One technique is to build a tool on the client side that takes the event and produces N mutations. In c* writes are cheap, so essentially you re-write everything on all changes. On Sun, Jan 27, 2013 at 4:03 PM, Fredrik Stigbäck fredrik.l.stigb...@sitevision.se wrote: Hi. Since denormalized data
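A rough sketch of the "one event, N mutations" idea follows. Everything in it is hypothetical: the column family names ("users", "group_members"), the event shape, and the Mutation tuple are invented for illustration, and no real client library is used. The point is only that a single change is expanded into a full re-write of every denormalized copy.

```python
from typing import Dict, List, Tuple

# (column_family, row_key, columns) -- a made-up stand-in for a client mutation
Mutation = Tuple[str, str, Dict[str, str]]


def mutations_for_rename(user_id: str, new_name: str,
                         group_ids: List[str]) -> List[Mutation]:
    """Expand one 'user renamed' event into a write per denormalized copy."""
    # The canonical row for the user.
    mutations: List[Mutation] = [("users", user_id, {"name": new_name})]
    # One extra write per group row that embeds the user's name.
    for group_id in group_ids:
        mutations.append(("group_members", group_id, {user_id: new_name}))
    return mutations
```

A user who belongs to two groups yields three mutations, which the client would then send as a batch or in parallel.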

Re: Denormalization

2013-01-27 Thread Edward Capriolo

LOVE the performance of our ACL checks. Ps. 30,000 writes in cassandra is not cheap when done from one server ;) but in general parallelized writes are very fast for like 500. Later, Dean From: Edward Capriolo edlinuxg...@gmail.com Reply-To: user
