support for nulls in composite lost in CQL3

2013-11-19 Thread Hiller, Dean
We have wide rows whose column names are composites of integer.byte array, where some of our columns are {empty}.byte array (i.e., the first part of the composite key is empty, as in a 0-length string or 0-length integer; NOT 0, but basically null). This has worked great when we look up all the entries with a

Re: Wide rows (time series data) and ORM

2013-10-23 Thread Hiller, Dean
PlayOrm supports different types of wide rows like embedded list in the object, etc. etc. There is a list of nosql patterns mixed with playorm patterns on this page http://buffalosw.com/wiki/patterns-page/ From: Les Hartzman lhartz...@gmail.com Reply-To:

Re: Wide rows (time series data) and ORM

2013-10-23 Thread Hiller, Dean
data) and ORM Thanks Dean. I'll check that page out. Les On Wed, Oct 23, 2013 at 7:52 AM, Hiller, Dean dean.hil...@nrel.gov wrote: PlayOrm supports different types of wide rows like embedded list in the object, etc. etc. There is a list of nosql patterns mixed

Re: Online shop with Cassandra

2013-10-09 Thread Hiller, Dean
Read the paper Building on Quicksand especially the section where he describes what they do at Amazon…the apology model…ie. Allow overbooking and apologize but limit overbooking….That is one way to go and stay scalable. You may want to analyze the percentage change that overbooking can be as

Re: How many Column Families can Cassandra handle?

2013-09-26 Thread Hiller, Dean
600 is probably doable but each CF takes up memory……PlayOrm goes with a strategy that can virtualize CF's into one CF allowing less memory usage….we have 80,000 virtual CF's in cassandra through playorm….you can copy playorm's pattern if desired. But 600 is probably doable but high. 10,000 is

is this correct, thrift unportable to CQL3….

2013-09-24 Thread Hiller, Dean
Many applications in thrift use the wide row with composite column name and as an example, let's say golf score for instance and we end up with golf score : pk like so: null : pk56, null : pk45, 89 : pk90, 89 : pk87, 90 : pk101, 95 : pk17. Notice that there are some who do not have a golf score(zero

composite with null prefix in CQL3(porting from thrift)

2013-09-23 Thread Hiller, Dean
I ran into this same issue on this stackoverflow post… http://stackoverflow.com/questions/18963248/how-can-i-have-null-column-value-for-a-composite-key-column-in-cql3 Does anyone know how to have the same composite column name pattern that enables wide rows with a null value? Ie. We had some

Re: Reverse compaction on 1.1.11?

2013-09-19 Thread Hiller, Dean
Can you describe what you mean by reverse compaction? I mean once you put a row together and blow away sstables that contained it before, you can't possibly know how to split it since that information is gone. Perhaps you want the simple sstable2json script in the bin directory so you can inspect

cassandra just gone..no heap dump, no log info

2013-09-18 Thread Hiller, Dean
Anyone know how to debug cassandra processes just exiting? There is no info in the cassandra logs and there is no heap dump file(which in the past has shown up in /opt/cassandra/bin directory for me). This occurs when running a map/reduce job that put severe load on the system. The logs look

Re: cassandra just gone..no heap dump, no log info

2013-09-18 Thread Hiller, Dean
when it is desperately low on memory. Have a look in either your syslog output or the output of dmesg cheers On Wed, Sep 18, 2013 at 10:21 PM, Hiller, Dean dean.hil...@nrel.gov wrote: Anyone know how to debug cassandra processes just exiting? There is no info

Re: cassandra just gone..no heap dump, no log info

2013-09-18 Thread Hiller, Dean
output or the output of dmesg cheers On Wed, Sep 18, 2013 at 10:21 PM, Hiller, Dean dean.hil...@nrel.gov wrote: Anyone know how to debug cassandra processes just exiting? There is no info in the cassandra logs and there is no heap dump file

Revisit with another spin: is there any type of table existing on all nodes?

2013-09-18 Thread Hiller, Dean
, 2013 at 12:29 PM, Hiller, Dean dean.hil...@nrel.gov wrote: Actually, I have been on a few projects where something like that is useful. Gemfire(a grid memory cache) had that feature which we used at another company. On every project I encounter, there is usually one

Re: cassandra just gone..no heap dump, no log info

2013-09-18 Thread Hiller, Dean
: A random guess - possibly an OOM (Out of Memory) where Linux will kill a process to recover memory when it is desperately low on memory. Have a look in either your syslog output or the output of dmesg cheers On Wed, Sep 18, 2013 at 10:21 PM, Hiller, Dean dean.hil...@nrel.gov

Re: What is the ideal value for sstable_size_in_mb when using LeveledCompactionStrategy ?

2013-09-18 Thread Hiller, Dean
1. Always in cassandra up your file descriptor limits on linux and even in 0.7 that was the recommendation so cassandra could open tons of files 2. We use 50M for our LCS with no performance issues. We had it 10M on our previous with no issues but a huge amount of files of course with our

Re: What is the ideal value for sstable_size_in_mb when using LeveledCompactionStrategy ?

2013-09-18 Thread Hiller, Dean
at 3:15 PM, Hiller, Dean dean.hil...@nrel.gov wrote: 1. Always in cassandra up your file descriptor limits on linux and even in 0.7 that was the recommendation so cassandra could open tons of files 2. We use 50M for our LCS with no performance issues. We had it 10M

hadoop 12 T recommendation vs. cassandra 1T recommendation

2013-09-18 Thread Hiller, Dean
This article looks like it came out just one month ago or not even http://blog.cloudera.com/blog/2013/08/how-to-select-the-right-hardware-for-your-new-hadoop-cluster/ And recommends 12-24 1-4TB disks in a JBOD configuration. I know hadoop is used a lot in analytics but can also be used in some

Re: questions related to the SSTable file

2013-09-17 Thread Hiller, Dean
You may want to be careful as column 1 could be stored in both files until compaction as well when column 1 has encountered changes and cassandra returns the latest column 1 version but two sstables contain column 1. (At least that is the way I understand it). Later, Dean From: Takenori Sato

Re: questions related to the SSTable file

2013-09-17 Thread Hiller, Dean
Netflix created file streaming in astyanax into cassandra specifically because writing too big a column cell is a bad thing. The limit is really dependent on use case….do you have servers writing 1000's of 200Meg files at the same time….if so, astyanax streaming may be a better way to go there

is there any type of table existing on all nodes(slow to up date, fast to read in map/reduce)?

2013-09-13 Thread Hiller, Dean
I was just wondering if cassandra had any special CF that every row exists on every node for smaller tables that we would want to leverage in map/reduce. The table row count is less than 500k and we are ok with slow updates to the table, but this would make M/R blazingly fast since for every

Re: is there any type of table existing on all nodes(slow to up date, fast to read in map/reduce)?

2013-09-13 Thread Hiller, Dean
node DCs if you really wanted it. On Fri, Sep 13, 2013 at 12:29 PM, Hiller, Dean dean.hil...@nrel.gov wrote: Actually, I have been on a few projects where something like that is useful. Gemfire(a grid memory cache) had that feature which we used at another company

Re: map/reduce performance time and sstable reader….

2013-09-03 Thread Hiller, Dean
We are considering creating our own InputFormat for hadoop and running the tasktrackers on every 3rd node(ie. RF=3) such that we cover all ranges. Our M/R overhead appears to be 13 days vs. 12.5 hours on just reading SSTables directly on our current data set. I personally don't think parsing

map/reduce performance time and sstable reader….

2013-08-30 Thread Hiller, Dean
Has anyone done performance tests on sstable reading vs. M/R? I did a quick test on reading all SSTables in a LCS column family on 23 tables and took the average time it took sstable2json(to /dev/null to make it faster) which was 7 seconds per table. (reading to stdout took 16 seconds per

is there a SSTableInput for Map/Reduce instead of ColumnFamily?

2013-08-30 Thread Hiller, Dean
is there a SSTableInput for Map/Reduce instead of ColumnFamily (which uses thrift)? We are not worried about repeated reads since we are idempotent but would rather have the direct speed (even if we had to read from a snapshot, it would be fine). (We would most likely run our M/R on 4 nodes

Re: node dead after restart

2013-08-22 Thread Hiller, Dean
Isn't this the log file from 10.0.0.146??? And this 10.0.0.146 sees that 10.0.0.111 is up, then sees it dead and in the log we can see it bind with this line INFO 12:16:23,108 Binding thrift service to ip-10-0-0-146.ec2.internal/10.0.0.146:9160 What is the log file look

Re: Secondary Index Question

2013-08-21 Thread Hiller, Dean
Yup, there are other types of indexing like that in PlayOrm which do it differently so all nodes are not hit so it works better for instance if you are partitioning your data and you query into just a single partition so it doesn't put load on all the nodes. (of course, you have to have a

Re: Secondary Index Question

2013-08-21 Thread Hiller, Dean
Message- From: Hiller, Dean [mailto:dean.hil...@nrel.gov] Sent: 21 August 2013 07:36 To: user@cassandra.apache.org Subject: Re: Secondary Index Question Yup, there are other types of indexing like that in PlayOrm which do it differently so all nodes are not hit so it works better for instance

Re: Secondary Index Question

2013-08-21 Thread Hiller, Dean
results ? -Original Message- From: Hiller, Dean [mailto:dean.hil...@nrel.gov] Sent: 21 August 2013 07:36 To: user@cassandra.apache.org Subject: Re: Secondary Index Question Yup, there are other types of indexing like that in PlayOrm which do it differently so all nodes are not hit so

Re: make cassandra-cli use 7197 for JMX instead?

2013-07-29 Thread Hiller, Dean
Ugh, how did I miss that one, it was in the cassandra-cli --help….never mind. Dean On 7/29/13 11:24 AM, Hiller, Dean dean.hil...@nrel.gov wrote: I start nodetool with Cassandra-cli -p 9158 but it gives warnings about not displaying all information because my JMX port is on 7197 instead of 7199

hadoop/cassandra integration using CL_ONE...

2013-07-26 Thread Hiller, Dean
Is it possible to use CL_ONE with hadoop/cassandra when doing an M/R job? And more importantly is there a way to configure that such that if my RF=3, that it only reads from 1 of the nodes in that 3. We have 12 nodes and ideally we would for example hope M/R runs on a2, a9, a5, a12 which happen

Re: cassandra 1.2.6 - Start key's token sorts after end token

2013-07-23 Thread Hiller, Dean
Out of curiosity, what version of hadoop are you using with cassandra? I think we are trying 0.20.2 if I remember(I have to ask my guy working on it to be sure). I do remember him saying the cassandra maven dependency was odd in that it is in the older version and not a newer hadoop version.

Re: About column family

2013-07-23 Thread Hiller, Dean
We use PlayOrm to have 60,000 VIRTUAL column families such that the performance is just fine ;). You may want to try something like that. Dean From: Robert Coli rc...@eventbrite.com Reply-To: user@cassandra.apache.org

Re: Are Writes disk-bound rather than CPU-bound?

2013-07-23 Thread Hiller, Dean
Out of curiosity, isn't what is really happening this: as writes keep coming in, memory fills up causing flushes to the commit log disk of the whole memtable. In a bursting scenario, writes are thus limited only by memory and cpu in short bursting cases that tend to fit in memory. In a

Re: cassandra 1.2.6 - Start key's token sorts after end token

2013-07-23 Thread Hiller, Dean
inserting data, even stopping Cassandra, cleaning my entire data folder and then starting it again. I am also really curious to know if there is anyone else having these problems or if it is just me... Best regards, Marcelo. 2013/7/23 Hiller, Dean dean.hil...@nrel.gov

Re: cassandra 1.2.6 - Start key's token sorts after end token

2013-07-23 Thread Hiller, Dean
Oh, and in the past 0.20.x has been pretty stable by the way…they finally switched their numbering scheme thank god. Dean On 7/23/13 2:13 PM, Hiller, Dean dean.hil...@nrel.gov wrote: Perhaps try 0.20.2 as 1. The maven pom files have cassandra depending on 0.20.2 2. The 0.20.2 default

Re: temporarily running a cassandra side by side in production

2013-07-12 Thread Hiller, Dean
- Aaron Morton Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 11/07/2013, at 11:37 AM, Hiller, Dean dean.hil...@nrel.gov wrote: We have a 12 node production cluster and a 4 node QA cluster. We are starting to think we are going to try

temporarily running a cassandra side by side in production

2013-07-10 Thread Hiller, Dean
We have a 12 node production cluster and a 4 node QA cluster. We are starting to think we are going to try to run a side by side cassandra instance in production while we map/reduce from one cassandra into the new instance. We are intending to do something like this Modify all ports in

playORM version 1.6 released

2013-07-08 Thread Hiller, Dean
Another new release is up in maven repos… - Astyanax is upgraded to 1.56.42 - Hbase support is almost done(barring those few test cases) - And following issues are fixed: Thanks to snazy and hsn10 :) https://github.com/deanhiller/playorm/issues/80 https://github.com/deanhiller/playorm/issues/81

column sort order and reversed sort performance question

2013-07-03 Thread Hiller, Dean
We loaded 5 million columns into a single row and when accessing the first 30k and last 30k columns we saw no performance difference. We tried just loading 2 rows from the beginning and end and saw no performance difference. I am sure reverse sort is there for a reason though. In what

Re: 10,000s of column families/keyspaces

2013-07-01 Thread Hiller, Dean
We use playorm to do 80,000 virtual column families(a playorm feature though the pattern could be copied). We did find out later and we are working on this now that we wanted to map 80,000 virtual CF's into 10 real CF's so leveled compaction can run more in parallel though or else we get stuck

Re: 10,000s of column families/keyspaces

2013-07-01 Thread Hiller, Dean
Oh and if you are using STCS, I don't think the below is an issue at all since that can run in parallel if needed already. Dean On 7/1/13 10:24 AM, Hiller, Dean dean.hil...@nrel.gov wrote: We use playorm to do 80,000 virtual column families(a playorm feature though the pattern could be copied

Re: How to do a CAS UPDATE on single column CF?

2013-07-01 Thread Hiller, Dean
What does CAS stand for? And is that the row locking feature like hbase's setAndReadWinner that you give the previous val and next val and your next val is returned if you won otherwise the current result is returned and you know some other node won? Thanks, Dean On 7/1/13 12:09 PM, Blair Zajac

Re: NREL has released open source Databus on github for time series data

2013-06-25 Thread Hiller, Dean
along with the time series data ? I had a quick look at the links and could not see anything. Cheers Aaron - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 22/06/2013, at 2:51 AM, Hiller, Dean dean.hil

Re: Updated sstable size for LCS, ran upgradesstables, file sizes didn't change

2013-06-24 Thread Hiller, Dean
We would be very very interested in your results. We currently run 10M but have heard of 256M sizes as well. Please let us know what you find out. Thanks, Dean From: Andrew Bialecki andrew.biale...@gmail.com Reply-To:

AssertionError: Unknown keyspace?

2013-06-24 Thread Hiller, Dean
I haven't seen this error in a long time. We just received the below error in production when rebuilding a node…any ideas on how to get around this? We had rebuilt 3 other nodes already I think(we have been swapping hardware) ERROR 06:32:21,474 Exception in thread Thread[ReadStage:1,5,main]

Re: AssertionError: Unknown keyspace?

2013-06-24 Thread Hiller, Dean
and auto bootstrap is true according to this log DEBUG 06:53:03,411 setting auto_bootstrap to true OR better yet, if someone can point me to the code on where bootstrap is decided so I can see why it decides not to bootstrap? Thanks, Dean On 6/24/13 6:42 AM, Hiller, Dean dean.hil...@nrel.gov wrote

Re: AssertionError: Unknown keyspace?

2013-06-24 Thread Hiller, Dean
the unknown keyspace errors :( but it is bootstrapping now) and I assume I can add node B back once all the data is in there. Thanks, Dean On 6/24/13 6:55 AM, Hiller, Dean dean.hil...@nrel.gov wrote: Ah, so digging deeper, it is not bootstrapping. How do I force the node to bootstrap

quick question on seed nodes configuration

2013-06-24 Thread Hiller, Dean
For ease of use, we actually had a single cassandra.yaml deployed to every machine and a script that swapped out the token and listen address. I had seed nodes ip1,ip2,ip3 as the seeds but what I didn't realize was then that these nodes had themselves as seeds. I am assuming that should never

Re: AssertionError: Unknown keyspace?

2013-06-24 Thread Hiller, Dean
...@eventbrite.com To: user@cassandra.apache.org Sent: Monday, June 24, 2013 10:34 AM Subject: Re: AssertionError: Unknown keyspace? On Mon, Jun 24, 2013 at 6:04 AM, Hiller, Dean dean.hil...@nrel.gov wrote: Oh shoot, this is a seed

Re: sorting columns by time

2013-06-24 Thread Hiller, Dean
Send the naming scheme you desire. Is long time since epoch ok? Or a composite name of time since epoch + (something else) Dean From: Bill Hastings bllhasti...@gmail.com Reply-To: user@cassandra.apache.org

NREL has released open source Databus on github for time series data

2013-06-21 Thread Hiller, Dean
NREL has released their open source databus. They spin it as energy data (and a system for campus energy/building energy) but it is very general right now and probably will stay pretty general. More information can be found here http://www.nrel.gov/analysis/databus/ The source code can be

Re: Unit Testing Cassandra

2013-06-19 Thread Hiller, Dean
For unit testing, we actually use PlayOrm which has an in-memory version of nosql so we just write unit tests against our code which uses the in-memory version but that is only if you are in java. Later, Dean From: Shahab Yunus shahab.yu...@gmail.com Reply-To:

Re: Large number of files for Leveled Compaction

2013-06-17 Thread Hiller, Dean
My bet is 5MB is the low end since many people go with the default. We upped it to 10MB as at that time no one knew of what size was a good size to be and the default was only 5MB. Dean From: Franc Carter franc.car...@sirca.org.au Reply-To:

Re: headed to cassandra conference next week in San Fran?

2013-06-10 Thread Hiller, Dean
are using cassandra for and how it's working for you. I'm a software engineer at Quantcast and we're just beginning to use cassandra. So far it's been great, but there's still a lot to learn in this space. See you at the conference, hopefully! Faraaz On Fri, Jun 07, 2013 at 01:15:08PM -0700, Hiller

headed to cassandra conference next week in San Fran?

2013-06-07 Thread Hiller, Dean
I would not mind meeting people there. My cell is 303-517-8902, best to text me probably or just email me at d...@alvazan.com. Later, Dean

Re: Consistency level for multi-datacenter setup

2013-06-03 Thread Hiller, Dean
What happens when you use CL=TWO? Dean From: srmore comom...@gmail.com Reply-To: user@cassandra.apache.org Date: Monday, June 3, 2013 2:09 PM To:

Re: Consistency level for multi-datacenter setup

2013-06-03 Thread Hiller, Dean
Also, we had to put a fix into cassandra so it removed slow nodes from the list of nodes to read from. With that fix our QUORUM (not local quorum) started working again and would easily take the other DC nodes out of the list of reading from for you as well. I need to circle back to with my

Re: Consistency level for multi-datacenter setup

2013-06-03 Thread Hiller, Dean
PM, Hiller, Dean dean.hil...@nrel.gov wrote: Also, we had to put a fix into cassandra so it removed slow nodes from the list of nodes to read from. With that fix our QUORUM (not local quorum) started working again and would easily take the other DC nodes out

Re: Bulk loading into CQL3 Composite Columns

2013-05-31 Thread Hiller, Dean
Another option is not having it part of the primary key and using PlayOrm to query but to succeed and scale, you would need to also use PlayOrm partitions and then you can query in the partition and sort stuff. Dean From: Daniel Morton dan...@djmorton.com Reply-To:

Re: Is there anyone who implemented time range partitions with column families?

2013-05-29 Thread Hiller, Dean
Nope, partitioning is done per CF in PlayOrm. Dean From: cem cayiro...@gmail.com Reply-To: user@cassandra.apache.org Date: Wednesday, May 29, 2013 10:01 AM To:

Re: Is there anyone who implemented time range partitions with column families?

2013-05-29 Thread Hiller, Dean
such that we could be running 10 compactions in parallel. QUESTION: I am assuming 10 compactions should be enough to put enough load on the disk/cpu/ram etc. etc. or do you think I should go with 100CF's. 98% of our data is all in this one CF. Thanks, Dean On 5/29/13 10:06 AM, Hiller, Dean dean.hil

random thoughts for MUCH faster key lookup in cassandra

2013-05-29 Thread Hiller, Dean
We recently ran into too much data in one CF because LCS can't really run in parallel on one CF in a single tier which got me thinking, why doesn't the CF directory have 100 or 1000 directories 0-999 and cassandra hash the key to which directory it would go in and then put it in one of the

Re: random thoughts for MUCH faster key lookup in cassandra

2013-05-29 Thread Hiller, Dean
2013 17:49, Hiller, Dean dean.hil...@nrel.gov wrote: We recently ran into too much data in one CF because LCS can't really run in parallel on one CF in a single tier which got me thinking, why doesn't the CF directory have 100 or 1000 directories 0-999 and cassandra hash

Re: how to handle join properly in this case

2013-05-29 Thread Hiller, Dean
. Thanks! On Tue, May 28, 2013 at 11:39 AM, Hiller, Dean dean.hil...@nrel.gov wrote: Another option is joins on partitions to keep the number of stuff needing to join relatively small. PlayOrm actually supports joins of partition 1 of table A with partition X of table B. You then just keep

weird token ownerships

2013-05-28 Thread Hiller, Dean
I was assuming my node a1 would always own token 0, but we just added 5 of 6 more nodes and a1 no longer owns that token range. I have a few questions on the table at the bottom 1. Is this supposed to happen where host a1 no longer owns token range 0(but that is in his cassandra.yaml file),

Re: weird token ownerships

2013-05-28 Thread Hiller, Dean
and the data exists on nodes a2, a3, and a4 but not on a1. You can see us inserting node a7 between a1 and a2, and inserting node a8 between node a2 and a3, etc. etc. Thanks, Dean On 5/28/13 8:46 AM, Hiller, Dean dean.hil...@nrel.gov wrote: I was assuming my node a1 would always own token 0

Re: how to handle join properly in this case

2013-05-28 Thread Hiller, Dean
Another option is joins on partitions to keep the number of stuff needing to join relatively small. PlayOrm actually supports joins of partition 1 of table A with partition X of table B. You then just keep the number of rows in each partition at less than millions and you can filter with the

Re: data clean up problem

2013-05-28 Thread Hiller, Dean
"Don't do any delete" != "need to free the disk space after retention period", which you have in both your emails. My understanding is TTL is an expiry and just like tombstones will only be really deleted upon a compaction(ie. You do have deletes via TTL from the sound of it). If you have TTL of 1

Re: data clean up problem

2013-05-28 Thread Hiller, Dean
You said compaction can't keep up. Are you manually running compaction all the time or just letting cassandra kick off compactions when needed? Is compaction always 100% running or are you saying your disk is growing faster than you like and would like compactions to be always 100% running?

Re: data clean up problem

2013-05-28 Thread Hiller, Dean
Also, how many nodes are you running? From: cem cayiro...@gmail.com Reply-To: user@cassandra.apache.org Date: Tuesday, May 28, 2013 1:17 PM To:

Re: data clean up problem

2013-05-28 Thread Hiller, Dean
. Cem. On Tue, May 28, 2013 at 9:37 PM, Hiller, Dean dean.hil...@nrel.gov wrote: Also, how many nodes are you running? From: cem cayiro...@gmail.com Reply-To: user

Re: data clean up problem

2013-05-28 Thread Hiller, Dean
- From: Hiller, Dean [mailto:dean.hil...@nrel.gov] Sent: Tuesday, May 28, 2013 1:10 PM To: user@cassandra.apache.org Subject: Re: data clean up problem How much disk used on each node? We run the suggested 300G per node as above that compactions can have trouble keeping up. Ps. We run compactions

Re: data clean up problem

2013-05-28 Thread Hiller, Dean
Oh and yes, astyanax uses client side response latency and cassandra does the same as a client of the other nodes. Dean On 5/28/13 2:23 PM, Hiller, Dean dean.hil...@nrel.gov wrote: Actually, we did a huge investigation into this on astyanax and cassandra. Astyanax if I remember worked

Re: Using CQL to insert a column to a row dynamically

2013-05-27 Thread Hiller, Dean
Wide rows, dynamic columns are still possible in CQL3. There are some links here http://comments.gmane.org/gmane.comp.db.cassandra.user/30321 Also, there are other advantages to noSQL, not just schemaless aspect such as that it can accept tons of writes and you can scale the writes(you can't

Re: exception causes streaming to hang forever

2013-05-24 Thread Hiller, Dean
: What kind of error does the other end of streaming(/10.10.42.36) say? On Wed, May 22, 2013 at 5:19 PM, Hiller, Dean dean.hil...@nrel.gov wrote: We had 3 nodes roll on good and the next 2, we see a remote node with this exception every time we start over and bootstrap the node ERROR [Streaming

found the issue on bootstrap streaming hang

2013-05-24 Thread Hiller, Dean
For anyone else that might be interested, when the stream hangs, there is no exceptions around that time frame as to what exactly happened and why it hung(there is an exception just not informative at all). We did find other exceptions that we thought were unrelated though days before. We

Re: exception causes streaming to hang forever

2013-05-24 Thread Hiller, Dean
, May 24, 2013 at 6:56 AM, Hiller, Dean dean.hil...@nrel.gov wrote: The exception on that node was just this ERROR [Thread-6056] 2013-05-22 14:47:59,416 CassandraDaemon.java (line 132) Exception in thread Thread[Thread-6056,5,main] java.lang.IndexOutOfBoundsException

changing ips on node replacement

2013-05-24 Thread Hiller, Dean
I seem to remember problems with ghost nodes, etc. and I seem to remember if you are replacing a node and you don’t use the same ip, this can cause issues. Is this correct? We would like the new node to keep the same token, and the same host name but are wondering if we can change the ip

Re: High performance disk io

2013-05-22 Thread Hiller, Dean
Well, if you just want to lower your I/O util %, you could always just add more nodes to the cluster ;). Dean From: Igor i...@4friends.od.ua Reply-To: user@cassandra.apache.org

Re: High performance disk io

2013-05-22 Thread Hiller, Dean
If you are only running repair on one node, should it not skip that node? So there should be no performance hit except when doing CL_ALL of course. We had to make a change to cassandra or slow nodes did impact us previously. Dean From: Wei Zhu wz1...@yahoo.com

exception causes streaming to hang forever

2013-05-22 Thread Hiller, Dean
We had 3 nodes roll on good and the next 2, we see a remote node with this exception every time we start over and bootstrap the node ERROR [Streaming to /10.10.42.36:2] 2013-05-22 14:47:59,404 CassandraDaemon.java (line 132) Exception in thread Thread[Streaming to /10.10.42.36:2,5,main]

bootstrapping a new node...

2013-05-21 Thread Hiller, Dean
We are using 1.2.2 cassandra and have rolled on 3 additional nodes to our 6 node cluster(totalling 9 so far). We are trying to roll on node 10 but during the streaming a compaction kicked off which seemed very odd to us. nodetool netstats still reported tons of files that were not

Re: any way to get the #writes/second, reads per second

2013-05-14 Thread Hiller, Dean
://www.tomas.cat/blog/en/monitoring-cassandra-relevant-data-should-be-watched-and-how-send-it-graphite 2013/5/13 Hiller, Dean dean.hil...@nrel.gov We are running a pretty consistent load on our cluster and added a new node to a 6 node cluster Friday(QA worked great, but production

Re: (unofficial) Community Poll for Production Operators : Repair

2013-05-14 Thread Hiller, Dean
We had to roll out a fix in cassandra as a slow node was slowing down our clients of cassandra in 1.2.2 for some reason. Every time we had a slow node, we found out fast as performance degraded. We tested this in QA and had the same issue. This means a repair made that node slow which made

Re: (better info)any way to get the #writes/second, reads per second

2013-05-14 Thread Hiller, Dean
in the data distribution ? Did it settle down ? Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 14/05/2013, at 5:06 AM, Hiller, Dean dean.hil...@nrel.gov wrote: Ah, okay iostat -x NEEDS

any way to get the #writes/second, reads per second

2013-05-13 Thread Hiller, Dean
We are running a pretty consistent load on our cluster and added a new node to a 6 node cluster Friday(QA worked great, but production not so much). One mistake that was made was starting up the new node, then disabling the firewall :( which allowed nodes to discover it BEFORE the node

(better info)any way to get the #writes/second, reads per second

2013-05-13 Thread Hiller, Dean
nodetool compactionstats Any reason why cassandra might be reading a lot from the data disks(not the commit log disk) more than usual? Thanks, Dean On 5/13/13 10:46 AM, Hiller, Dean dean.hil...@nrel.gov wrote: We are running a pretty consistent load on our cluster and added a new node to a 6 node cluster

Re: Replica info

2013-05-08 Thread Hiller, Dean
nodetool describering {keyspace} From: Kanwar Sangha kan...@mavenir.com Reply-To: user@cassandra.apache.org Date: Wednesday, May 8, 2013 3:00 PM To:

Re: CQL3 Data Model Question

2013-05-07 Thread Hiller, Dean
We use PlayOrm to do 60,000 different streams which are all time series and use the virtual column families of PlayOrm so they are all in one column family. We then partition by time as well. I don't believe that we really have any hotspots from what I can tell. Dean From: Keith Wright

Re: CQL3 Data Model Question

2013-05-07 Thread Hiller, Dean
, event_id UUID, app_id INT, event_time TIMESTAMP, user_id INT, … PRIMARY KEY (hour, event_time, event_id) ) WITH CLUSTERING ORDER BY (event_time desc); Is this what others are doing? On 5/7/13 4:18 PM, Hiller, Dean dean.hil...@nrel.gov wrote: We use PlayOrm to do 60,000

Re: hector or astyanax

2013-05-06 Thread Hiller, Dean
I was under the impression that it is multiple requests using a single connection in PARALLEL, not serial, as they have request ids and the responses do as well, so you can send a request while a previous request has no response just yet. I think you do get a big speed advantage from the asynchronous

Re: hector or astyanax

2013-05-06 Thread Hiller, Dean
without actual benchmarks to back them up. I do completely agree that Async interfaces have their place and have certain advantages over multi-threading models, but it's just another tool to be used when appropriate. Just my .02. :) On Mon, May 6, 2013 at 5:08 AM, Hiller, Dean dean.hil

Re: multitenant support with key spaces

2013-05-06 Thread Hiller, Dean
Another option may be virtual column families with PlayOrm. We currently do around 60,000 column families to store data from 60,000 different sensors that keep feeding us information. Dean On 5/6/13 11:18 AM, Robert Coli rc...@eventbrite.com wrote: On Sun, May 5, 2013 at 11:37 PM, Darren

lastest PlayOrm released for cassandra and mongodb

2013-04-26 Thread Hiller, Dean
PlayOrm now supports mongodb and cassandra with a query language that is portable across both systems as well. https://github.com/deanhiller/playorm Later, Dean

Re: Is Cassandra oversized for this kind of use case?

2013-04-26 Thread Hiller, Dean
Well, it depends more on what you will do with the data. I know I was on a Sybase (RDBMS) with 1 billion rows, but it was getting close to not being able to handle more (constraints had to be turned off and all sorts of optimizations done and expert consultants brought in and everything). BUT

Re: Is Cassandra oversized for this kind of use case?

2013-04-26 Thread Hiller, Dean
Nodes? Can I virtualize these two Nodes? Thx a lot for your assistance. Marc 2013/4/26 Hiller, Dean dean.hil...@nrel.gov Well, it depends more on what you will do with the data. I know I was on a Sybase (RDBMS) with 1 billion rows, but it was getting close

compaction throughput rate not even close to 16MB

2013-04-24 Thread Hiller, Dean
I was wondering about the compactionthroughput. I never see ours get even close to 16MB and I thought this is supposed to throttle compaction, right? Ours is constantly less than 3MB/sec from looking at our logs or do I have this totally wrong? How can I see the real throughput so that I can

Re: compaction throughput rate not even close to 16MB

2013-04-24 Thread Hiller, Dean
. In those cases it may actually throttle something. On Wed, Apr 24, 2013 at 3:54 PM, Hiller, Dean dean.hil...@nrel.gov wrote: I was wondering about the compactionthroughput. I never see ours get even close to 16MB and I thought this is supposed to throttle compaction

Re: Prepared Statement - cache duration (CQL3 - Cassandra 1.2.4)

2013-04-23 Thread Hiller, Dean
@cassandra.apache.org user@cassandra.apache.org Subject: Re: Prepared Statement - cache duration (CQL3 - Cassandra 1.2.4) On Tue, Apr 23, 2013 at 6:02 PM, Hiller, Dean dean.hil...@nrel.gov wrote: Out of curiosity, why did cassandra choose

Re: move data from Cassandra 1.1.6 to 1.2.4

2013-04-23 Thread Hiller, Dean
We went from 1.1.4 to 1.2.2. In QA a rolling restart failed, but in both production and QA, bringing down the whole cluster, upgrading every node, and then bringing it back up worked fine. We left ours at RandomPartitioner and had LCS as well. We did not convert to vnodes at all. Don't know if it

Re: move data from Cassandra 1.1.6 to 1.2.4

2013-04-23 Thread Hiller, Dean
: Hiller, Dean dean.hil...@nrel.gov To: user@cassandra.apache.org; Wei Zhu wz1...@yahoo.com Sent: Tuesday, April 23, 2013 11:17 AM Subject: Re: move data from
