Great, that worked, thanks for your time.

On Thu, Mar 10, 2011 at 4:57 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
> Drop the index, then restart once more. It shouldn't try to rebuild
> the index after that.
>
> On Thu, Mar 10, 2011 at 3:36 PM, Matt Kennedy <stinkym...@gmail.com> wrote:
> > Sorry, I wasn't clear on the timeline of events. I started the index
> > build and then posted this message to the list. Once I read the links
> > you posted, I did expect the cluster to crash, but I let it run until
> > it blew up anyway, since I didn't really know how to stop the index
> > build.
> >
> > Which is sort of where I'm still stuck: I don't want to corrupt that
> > column family by issuing an "update column family" that has a smaller
> > set of indexes while the index build is going on without some
> > encouragement from the list that doing that won't wreck the column
> > family. Is there a safe way to tell an index build to stop after the
> > cluster starts up from a crash due to the index build?
> >
> > Thanks,
> > Matt
> >
> > On Thu, Mar 10, 2011 at 1:40 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
> >>
> >> If you read the bugs I linked, you would see that this is expected
> >> behavior with 0.7.3 once you get more data than you can index
> >> in-memory.
> >>
> >> You should wait for the next Hudson build (which will include 2295)
> >> and use that. Or, create your indexes before adding the data.
> >>
> >> On Thu, Mar 10, 2011 at 12:26 PM, Matt Kennedy <stinkym...@gmail.com> wrote:
> >> > Well it looks like the index creation job crashed the cluster. All
> >> > of the nodes were down having dumped out .hprof files. I brought
> >> > the cluster back up and when I do "describe keyspace ks" it looks
> >> > like the index build process has started over again. Is it safe to
> >> > attempt to stop that by running an "update column family" command
> >> > with fewer indexes defined? Or is there a better way to safely
> >> > terminate this index creation process that I assume will crash the
> >> > cluster again eventually?
> >> >
> >> > Would creating the indexes one at a time help? Or will the same
> >> > problem occur once I get to a certain number of indexes on the
> >> > column family?
> >> >
> >> > Thanks,
> >> > Matt
> >> >
> >> > On Wed, Mar 9, 2011 at 8:40 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
> >> >>
> >> >> https://issues.apache.org/jira/browse/CASSANDRA-2294
> >> >> https://issues.apache.org/jira/browse/CASSANDRA-2295
> >> >>
> >> >> On Wed, Mar 9, 2011 at 5:47 PM, Matt Kennedy <stinkym...@gmail.com> wrote:
> >> >> > I'm trying to gain some insight into what happens with a cluster
> >> >> > when indexes are being built, or when CFs with indexed columns
> >> >> > are being written to.
> >> >> >
> >> >> > Over the past couple of days we've been doing some loads into a
> >> >> > CF with 29 indexed columns. Eventually, the nodes just got
> >> >> > overwhelmed and the client (Hector) started getting timeouts. We
> >> >> > were using a MapReduce job to load an HDFS file into Cassandra,
> >> >> > though we had limited the load job to one task per node. My
> >> >> > confusion comes from how difficult it was to know that the nodes
> >> >> > were becoming overwhelmed. The ring consistently reported that
> >> >> > all nodes were up and it did not appear that there were pending
> >> >> > operations under tpstats. I also monitor this cluster with
> >> >> > Ganglia, and at no point did any of the machine loads appear
> >> >> > very high at all, yet our job kept failing with Hector reporting
> >> >> > timeouts.
> >> >> >
> >> >> > Today we decided to leave index creation until the end, and just
> >> >> > load the data using the same Hector code. We bumped up the
> >> >> > Hadoop concurrency to two concurrent tasks per node, and
> >> >> > everything went fine, as expected; we've done much larger loads
> >> >> > than this using Hadoop and as long as you don't shoot for too
> >> >> > much concurrency, Cassandra can deal with it. So now we have the
> >> >> > data in the column family and I updated the column family
> >> >> > metadata in the CLI to enable the 29 indexes. As soon as I do
> >> >> > that, the ring starts reporting that nodes are down
> >> >> > intermittently, and HintedHandoffs are starting to accumulate
> >> >> > under tpstats. Ganglia is reporting very low overall load, so
> >> >> > I'm wondering why it's taking so long for cli and nodetool
> >> >> > commands to return.
> >> >> >
> >> >> > I'm just trying to get a better handle on what kind of actions
> >> >> > have a serious impact on cluster availability and to know the
> >> >> > right places to look to try to get ahead of those conditions.
> >> >> >
> >> >> > Thanks for any insight you can provide,
> >> >> > Matt
> >> >>
> >> >> --
> >> >> Jonathan Ellis
> >> >> Project Chair, Apache Cassandra
> >> >> co-founder of DataStax, the source for professional Cassandra support
> >> >> http://www.datastax.com