Re: Understanding index builds (updated: crashed cluster)

Matt Kennedy Thu, 10 Mar 2011 10:27:14 -0800

Well, it looks like the index creation job crashed the cluster.  All of the
nodes were down, having dumped .hprof files.  I brought the cluster back
up, and when I run "describe keyspace ks" it looks like the index build
process has started over again.  Is it safe to attempt to stop it by
running an "update column family" command with fewer indexes defined?  Or is
there a better way to safely terminate this index creation process, which I
assume will crash the cluster again eventually?


Would creating the indexes one at a time help? Or will the same problem
occur once I get to a certain number of indexes on the column family?

Thanks,
Matt

On Wed, Mar 9, 2011 at 8:40 PM, Jonathan Ellis <jbel...@gmail.com> wrote:

> https://issues.apache.org/jira/browse/CASSANDRA-2294
> https://issues.apache.org/jira/browse/CASSANDRA-2295
>
> On Wed, Mar 9, 2011 at 5:47 PM, Matt Kennedy <stinkym...@gmail.com> wrote:
> > I'm trying to gain some insight into what happens with a cluster when
> > indexes are being built, or when CFs with indexed columns are being
> > written to.
> >
> > Over the past couple of days we've been doing some loads into a CF with
> > 29 indexed columns.  Eventually, the nodes just got overwhelmed and the
> > client (Hector) started getting timeouts.  We were using a MapReduce job
> > to load an HDFS file into Cassandra, though we had limited the load job
> > to one task per node.  My confusion comes from how difficult it was to
> > know that the nodes were becoming overwhelmed.  The ring consistently
> > reported that all nodes were up, and it did not appear that there were
> > pending operations under tpstats.  I also monitor this cluster with
> > Ganglia, and at no point did any of the machine loads appear very high,
> > yet our job kept failing with Hector reporting timeouts.
> >
> > Today we decided to leave index creation until the end and just load the
> > data using the same Hector code.  We bumped up the Hadoop concurrency to
> > two concurrent tasks per node, and everything went fine, as expected;
> > we've done much larger loads than this using Hadoop, and as long as you
> > don't shoot for too much concurrency, Cassandra can deal with it.  So now
> > we have the data in the column family, and I updated the column family
> > metadata in the CLI to enable the 29 indexes.  As soon as I do that, the
> > ring starts reporting that nodes are down intermittently, and
> > HintedHandoffs start to accumulate under tpstats.  Ganglia is reporting
> > very low overall load, so I'm wondering why it's taking so long for CLI
> > and nodetool commands to return.
> >
> > I'm just trying to get a better handle on what kinds of actions have a
> > serious impact on cluster availability and to know the right places to
> > look to get ahead of those conditions.
> >
> > Thanks for any insight you can provide,
> > Matt
> >
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
>
