Re: Commit log + Data directory on same partition (software raid)

2012-08-11 Thread Thibaut Britz
Unfortunately SSD drives are not an option at the moment. I have to use 2
regular HDs. Has anyone tried the above scenario?

Thanks,
Thibaut


On Fri, Aug 10, 2012 at 3:30 PM, Radim Kolar  wrote:

>
>  I was thinking about putting both the commit log and the data directory
>> on a software raid partition spanning over the two disks. Would this
>> increase the general read performance? In theory I could get twice the read
>> performance, but I don't know how the commit log will influence the read
>> performance on both disks?
>>
> ZFS + an SSD cache is best. Get FreeBSD 8.3 and install Cassandra from ports.
>
>


Commit log + Data directory on same partition (software raid)

2012-08-10 Thread Thibaut Britz
Hi,

Have any of you had experience with software RAID (RAID 1,
mirroring 2 disks)?

Our workload is rather read-based at the moment (the commit log directory
only grows by 128MB every 2-3 minutes), while the second HD is under high
load due to the read requests to our Cassandra cluster.

I was thinking about putting both the commit log and the data directory on
a software RAID partition spanning the two disks. Would this increase
the general read performance? In theory I could get twice the read
performance, but I don't know how the commit log writes will influence the
read performance on both disks.
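
(For reference, a minimal sketch of the setup I have in mind; the device
names /dev/sdb, /dev/sdc and the mount point /var/lib/cassandra are
assumptions, not the actual layout:)

# mirror the two disks with Linux software RAID (RAID 1)
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb /dev/sdc
mkfs.ext4 /dev/md0
mount /dev/md0 /var/lib/cassandra

# cassandra.yaml: point both directories at the mirrored volume
data_file_directories:
    - /var/lib/cassandra/data
commitlog_directory: /var/lib/cassandra/commitlog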

Thanks,
Thibaut


Re: cassandra 0.8.7 + hector 0.8.3: All Quorum reads result in writes?

2012-04-11 Thread Thibaut Britz
I will fully disable read repair for slice requests (we can handle those on
the application side) until we upgrade to 1.0.8.

Thanks,
Thibaut


On Wed, Apr 11, 2012 at 7:04 PM, Jeremy Hanna wrote:

> I backported this to 0.8.4 and it didn't fix the problem we were seeing
> (as I outlined in my parallel post) but if it fixes it for you, then
> beautiful.  Just wanted to let you know our experience with similar
> symptoms.
>
> On Apr 11, 2012, at 11:56 AM, Thibaut Britz wrote:
>
> > Fixed in  https://issues.apache.org/jira/browse/CASSANDRA-3843
> >
> >
> >
> > On Wed, Apr 11, 2012 at 5:58 PM, Thibaut Britz <
> thibaut.br...@trendiction.com> wrote:
> > We have read repair disabled (0.0).
> >
> > Even if this would be the case, this also doesn't explain why the writes
> are executed again and again when going over the same range again and again.
> >
> > The keyspace is new; it doesn't contain any tombstones and only 1
> > keys.
> >
> >
> >
> > On Wed, Apr 11, 2012 at 5:52 PM, R. Verlangen  wrote:
> > Are you sure this isn't read-repair?
> http://wiki.apache.org/cassandra/ReadRepair
> >
> >
> > 2012/4/11 Thibaut Britz 
> > Also executing the same multiget rangeslice query over the same range
> again will trigger the same writes again and again.
> >
> > On Wed, Apr 11, 2012 at 5:41 PM, Thibaut Britz <
> thibaut.br...@trendiction.com> wrote:
> > Hi,
> >
> > I just diagnosed this strange behavior:
> >
> > When I fetch a rangeslice through hector and set the consistency level
> to quorum, according to cfstats (and also to the output files on the hd),
> cassandra seems to execute a write request for each read I execute. The
> write count in cfstats is increased when I execute the rangeslice function
> over the same range again and again (without saving anything at all).
> >
> > If I set the consistency level to ONE, no writes are executed.
> >
> > How can I disable this? Why are the records rewritten each time, even
> though I don't want them to be rewritten?
> >
> > Thanks,
> > Thibaut.
> >
> >
> > Code:
> > Keyspace ks = getConnection(cluster, consistencylevel);
> >
> > RangeSlicesQuery<String, String, V> rangeSlicesQuery =
> >     HFactory.createRangeSlicesQuery(ks, StringSerializer.get(),
> >         StringSerializer.get(), s);
> >
> > rangeSlicesQuery.setColumnFamily(columnFamily);
> > rangeSlicesQuery.setColumnNames(column);
> >
> > rangeSlicesQuery.setKeys(start, end);
> > rangeSlicesQuery.setRowCount(maxrows);
> >
> > QueryResult<OrderedRows<String, String, V>> result =
> >     rangeSlicesQuery.execute();
> > return result.get();
> >
> >
> >
> >
> >
> >
> >
> > --
> > With kind regards,
> >
> > Robin Verlangen
> > www.robinverlangen.nl
> >
> >
> >
>
>


Re: cassandra 0.8.7 + hector 0.8.3: All Quorum reads result in writes?

2012-04-11 Thread Thibaut Britz
Fixed in  https://issues.apache.org/jira/browse/CASSANDRA-3843



On Wed, Apr 11, 2012 at 5:58 PM, Thibaut Britz <
thibaut.br...@trendiction.com> wrote:

> We have read repair disabled (0.0).
>
> Even if that were the case, it wouldn't explain why the writes are
> executed again and again when going over the same range repeatedly.
>
> The keyspace is new; it doesn't contain any tombstones and only 1
> keys.
>
>
>
> On Wed, Apr 11, 2012 at 5:52 PM, R. Verlangen  wrote:
>
>> Are you sure this isn't read-repair?
>> http://wiki.apache.org/cassandra/ReadRepair
>>
>>
>> 2012/4/11 Thibaut Britz 
>>
>>> Also executing the same multiget rangeslice query over the same range
>>> again will trigger the same writes again and again.
>>>
>>> On Wed, Apr 11, 2012 at 5:41 PM, Thibaut Britz <
>>> thibaut.br...@trendiction.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I just diagnosed this strange behavior:
>>>>
>>>> When I fetch a rangeslice through hector and set the consistency level
>>>> to quorum, according to cfstats (and also to the output files on the hd),
>>>> cassandra seems to execute a write request for each read I execute. The
>>>> write count in cfstats is increased when I execute the rangeslice function
>>>> over the same range again and again (without saving anything at all).
>>>>
>>>> If I set the consistency level to ONE, no writes are executed.
>>>>
>>>> How can I disable this? Why are the records rewritten each time, even
>>>> though I don't want them to be rewritten?
>>>>
>>>> Thanks,
>>>> Thibaut.
>>>>
>>>>
>>>> Code:
>>>> Keyspace ks = getConnection(cluster, consistencylevel);
>>>>
>>>> RangeSlicesQuery<String, String, V> rangeSlicesQuery =
>>>>     HFactory.createRangeSlicesQuery(ks, StringSerializer.get(),
>>>>         StringSerializer.get(), s);
>>>>
>>>> rangeSlicesQuery.setColumnFamily(columnFamily);
>>>> rangeSlicesQuery.setColumnNames(column);
>>>>
>>>> rangeSlicesQuery.setKeys(start, end);
>>>> rangeSlicesQuery.setRowCount(maxrows);
>>>>
>>>> QueryResult<OrderedRows<String, String, V>> result =
>>>>     rangeSlicesQuery.execute();
>>>> return result.get();
>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>> --
>> With kind regards,
>>
>> Robin Verlangen
>> www.robinverlangen.nl
>>
>>
>


Re: cassandra 0.8.7 + hector 0.8.3: All Quorum reads result in writes?

2012-04-11 Thread Thibaut Britz
We have read repair disabled (0.0).

Even if that were the case, it wouldn't explain why the writes are
executed again and again when going over the same range repeatedly.

The keyspace is new; it doesn't contain any tombstones and only 1 keys.



On Wed, Apr 11, 2012 at 5:52 PM, R. Verlangen  wrote:

> Are you sure this isn't read-repair?
> http://wiki.apache.org/cassandra/ReadRepair
>
>
> 2012/4/11 Thibaut Britz 
>
>> Also executing the same multiget rangeslice query over the same range
>> again will trigger the same writes again and again.
>>
>> On Wed, Apr 11, 2012 at 5:41 PM, Thibaut Britz <
>> thibaut.br...@trendiction.com> wrote:
>>
>>> Hi,
>>>
>>> I just diagnosed this strange behavior:
>>>
>>> When I fetch a rangeslice through hector and set the consistency level
>>> to quorum, according to cfstats (and also to the output files on the hd),
>>> cassandra seems to execute a write request for each read I execute. The
>>> write count in cfstats is increased when I execute the rangeslice function
>>> over the same range again and again (without saving anything at all).
>>>
>>> If I set the consistency level to ONE, no writes are executed.
>>>
>>> How can I disable this? Why are the records rewritten each time, even
>>> though I don't want them to be rewritten?
>>>
>>> Thanks,
>>> Thibaut.
>>>
>>>
>>> Code:
>>> Keyspace ks = getConnection(cluster, consistencylevel);
>>>
>>> RangeSlicesQuery<String, String, V> rangeSlicesQuery =
>>>     HFactory.createRangeSlicesQuery(ks, StringSerializer.get(),
>>>         StringSerializer.get(), s);
>>>
>>> rangeSlicesQuery.setColumnFamily(columnFamily);
>>> rangeSlicesQuery.setColumnNames(column);
>>>
>>> rangeSlicesQuery.setKeys(start, end);
>>> rangeSlicesQuery.setRowCount(maxrows);
>>>
>>> QueryResult<OrderedRows<String, String, V>> result =
>>>     rangeSlicesQuery.execute();
>>> return result.get();
>>>
>>>
>>>
>>>
>>
>
>
> --
> With kind regards,
>
> Robin Verlangen
> www.robinverlangen.nl
>
>


Re: cassandra 0.8.7 + hector 0.8.3: All Quorum reads result in writes?

2012-04-11 Thread Thibaut Britz
Also executing the same multiget rangeslice query over the same range again
will trigger the same writes again and again.

On Wed, Apr 11, 2012 at 5:41 PM, Thibaut Britz <
thibaut.br...@trendiction.com> wrote:

> Hi,
>
> I just diagnosed this strange behavior:
>
> When I fetch a rangeslice through hector and set the consistency level to
> quorum, according to cfstats (and also to the output files on the hd),
> cassandra seems to execute a write request for each read I execute. The
> write count in cfstats is increased when I execute the rangeslice function
> over the same range again and again (without saving anything at all).
>
> If I set the consistency level to ONE, no writes are executed.
>
> How can I disable this? Why are the records rewritten each time, even
> though I don't want them to be rewritten?
>
> Thanks,
> Thibaut.
>
>
> Code:
> Keyspace ks = getConnection(cluster, consistencylevel);
>
> RangeSlicesQuery<String, String, V> rangeSlicesQuery =
>     HFactory.createRangeSlicesQuery(ks, StringSerializer.get(),
>         StringSerializer.get(), s);
>
> rangeSlicesQuery.setColumnFamily(columnFamily);
> rangeSlicesQuery.setColumnNames(column);
>
> rangeSlicesQuery.setKeys(start, end);
> rangeSlicesQuery.setRowCount(maxrows);
>
> QueryResult<OrderedRows<String, String, V>> result =
>     rangeSlicesQuery.execute();
> return result.get();
>
>
>
>


cassandra 0.8.7 + hector 0.8.3: All Quorum reads result in writes?

2012-04-11 Thread Thibaut Britz
Hi,

I just diagnosed this strange behavior:

When I fetch a rangeslice through hector and set the consistency level to
quorum, according to cfstats (and also to the output files on the hd),
cassandra seems to execute a write request for each read I execute. The
write count in cfstats is increased when I execute the rangeslice function
over the same range again and again (without saving anything at all).
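
(For reference, this is how I watch that counter; a rough sketch:)

# run before and after executing the range slice at QUORUM
nodetool -h localhost cfstats | grep -E 'Column Family|Write Count'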

If I set the consistency level to ONE, no writes are executed.

How can I disable this? Why are the records rewritten each time, even
though I don't want them to be rewritten?

Thanks,
Thibaut.


Code:
Keyspace ks = getConnection(cluster, consistencylevel);

RangeSlicesQuery<String, String, V> rangeSlicesQuery =
    HFactory.createRangeSlicesQuery(ks, StringSerializer.get(),
        StringSerializer.get(), s);

rangeSlicesQuery.setColumnFamily(columnFamily);
rangeSlicesQuery.setColumnNames(column);

rangeSlicesQuery.setKeys(start, end);
rangeSlicesQuery.setRowCount(maxrows);

QueryResult<OrderedRows<String, String, V>> result =
    rangeSlicesQuery.execute();
return result.get();


Re: how stable is 1.0 these days?

2012-03-05 Thread Thibaut Britz
Thanks for the feedback. I will certainly execute scrub after the update.
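
(Roughly, per node; a sketch, assuming scrubbing all keyspaces is acceptable:)

# after upgrading the binaries and restarting the node
nodetool -h localhost scrub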


On Mon, Mar 5, 2012 at 11:55 AM, Viktor Jevdokimov wrote:

> 1.0.7 is very stable, weeks in high-load production environment without
> any exception, 1.0.8 should be even more stable, check changes.txt for what
> was fixed.
>
>
> 2012/3/2 Marcus Eriksson 
>
>> beware of https://issues.apache.org/jira/browse/CASSANDRA-3820 though if
>> you have many keys per node
>>
>> other than that, yep, it seems solid
>>
>> /Marcus
>>
>>
>> On Wed, Feb 29, 2012 at 6:20 PM, Thibaut Britz <
>> thibaut.br...@trendiction.com> wrote:
>>
>>> Thanks!
>>>
>>> We will test it on our test cluster in the coming weeks and hopefully
>>> put it into production on our 200 node main cluster. :)
>>>
>>> Thibaut
>>>
>>> On Wed, Feb 29, 2012 at 5:52 PM, Edward Capriolo 
>>> wrote:
>>>
>>>> On Wed, Feb 29, 2012 at 10:35 AM, Thibaut Britz
>>>>  wrote:
>>>> > Any more feedback on larger deployments of 1.0.*?
>>>> >
>>>> > We are eager to try out the new features in production, but don't
>>>> want to
>>>> > run into bugs as on former 0.7 and 0.8 versions.
>>>> >
>>>> > Thanks,
>>>> > Thibaut
>>>> >
>>>> >
>>>> >
>>>> > On Tue, Jan 31, 2012 at 6:59 AM, Ben Coverston <
>>>> ben.covers...@datastax.com>
>>>> > wrote:
>>>> >>
>>>> >> I'm not sure what Carlo is referring to, but generally if you have
>>>> done,
>>>> >> thousands of migrations you can end up in a situation where the
>>>> migrations
>>>> >> take a long time to replay, and there are some race conditions that
>>>> can be
>>>> >> problematic in the case where there are thousands of migrations that
>>>> may
>>>> >> need to be replayed while a node is bootstrapped. If you get into
>>>> this
>>>> >> situation it can be fixed by copying migrations from a known good
>>>> schema to
>>>> >> the node that you are trying to bootstrap.
>>>> >>
>>>> >> Generally I would advise against frequent schema updates. Unlike
>>>> rows in
>>>> >> column families the schema itself is designed to be relatively
>>>> static.
>>>> >>
>>>> >> On Mon, Jan 30, 2012 at 2:14 PM, Jim Newsham <
>>>> jnews...@referentia.com>
>>>> >> wrote:
>>>> >>>
>>>> >>>
>>>> >>> Could you also elaborate for creating/dropping column families?
>>>>  We're
>>>> >>> currently working on moving to 1.0 and using dynamically created
>>>> tables, so
>>>> >>> I'm very interested in what issues we might encounter.
>>>> >>>
>>>> >>> So far the only thing I've encountered (with 1.0.7 + hector 1.0-2)
>>>> is
>>>> >>> that dropping a cf may sometimes fail with UnavailableException.  I
>>>> think
>>>> >>> this happens when the cf is busy being compacted.  When I
>>>> sleep/retry within
>>>> >>> a loop it eventually succeeds.
>>>> >>>
>>>> >>> Thanks,
>>>> >>> Jim
>>>> >>>
>>>> >>>
>>>> >>> On 1/26/2012 7:32 AM, Pierre-Yves Ritschard wrote:
>>>> >>>>
>>>> >>>> Can you elaborate on the composite types instabilities? Is this
>>>> >>>> specific to Hector, as Radim's posts suggest?
>>>> >>>> These one-liner answers are quite stressful :)
>>>> >>>>
>>>> >>>> On Thu, Jan 26, 2012 at 1:28 PM, Carlo Pires
>>>> >>>>  wrote:
>>>> >>>>>
>>>> >>>>> If you need to use composite types and create/drop column
>>>> families on
>>>> >>>>> the
>>>> >>>>> fly, you must be prepared for instabilities.
>>>> >>>>>
>>>> >>>
>>>> >>
>>>> >>
>>>> >>
>>>> >> --
>>>> >> Ben Coverston
>>>> >> DataStax -- The Apache Cassandra Company
>>>> >>
>>>> >
>>>>
>>>> I would call 1.0.7 rock fricken solid. Incredibly stable. It has been
>>>> that way since I updated to 0.8.8, really. TBs of data, billions of
>>>> requests a day, and thanks to JAMM, memtable type auto-tuning, and
>>>> other enhancements I rarely, if ever, find a node in a state where it
>>>> requires a restart. My clusters are beast-ing.
>>>>
>>>> There always are bugs in software, but coming from a guy who ran
>>>> Cassandra 0.6.1, administration on my Cassandra cluster is like a
>>>> vacation now.
>>>>
>>>
>>>
>>
>


Re: how stable is 1.0 these days?

2012-02-29 Thread Thibaut Britz
Thanks!

We will test it on our test cluster in the coming weeks and hopefully put
it into production on our 200 node main cluster. :)

Thibaut

On Wed, Feb 29, 2012 at 5:52 PM, Edward Capriolo wrote:

> On Wed, Feb 29, 2012 at 10:35 AM, Thibaut Britz
>  wrote:
> > Any more feedback on larger deployments of 1.0.*?
> >
> > We are eager to try out the new features in production, but don't want to
> > run into bugs as on former 0.7 and 0.8 versions.
> >
> > Thanks,
> > Thibaut
> >
> >
> >
> > On Tue, Jan 31, 2012 at 6:59 AM, Ben Coverston <
> ben.covers...@datastax.com>
> > wrote:
> >>
> >> I'm not sure what Carlo is referring to, but generally if you have done,
> >> thousands of migrations you can end up in a situation where the
> migrations
> >> take a long time to replay, and there are some race conditions that can
> be
> >> problematic in the case where there are thousands of migrations that may
> >> need to be replayed while a node is bootstrapped. If you get into this
> >> situation it can be fixed by copying migrations from a known good
> schema to
> >> the node that you are trying to bootstrap.
> >>
> >> Generally I would advise against frequent schema updates. Unlike rows in
> >> column families the schema itself is designed to be relatively static.
> >>
> >> On Mon, Jan 30, 2012 at 2:14 PM, Jim Newsham 
> >> wrote:
> >>>
> >>>
> >>> Could you also elaborate for creating/dropping column families?  We're
> >>> currently working on moving to 1.0 and using dynamically created
> tables, so
> >>> I'm very interested in what issues we might encounter.
> >>>
> >>> So far the only thing I've encountered (with 1.0.7 + hector 1.0-2) is
> >>> that dropping a cf may sometimes fail with UnavailableException.  I
> think
> >>> this happens when the cf is busy being compacted.  When I sleep/retry
> within
> >>> a loop it eventually succeeds.
> >>>
> >>> Thanks,
> >>> Jim
> >>>
> >>>
> >>> On 1/26/2012 7:32 AM, Pierre-Yves Ritschard wrote:
> >>>>
> >>>> Can you elaborate on the composite types instabilities? Is this
> >>>> specific to Hector, as Radim's posts suggest?
> >>>> These one-liner answers are quite stressful :)
> >>>>
> >>>> On Thu, Jan 26, 2012 at 1:28 PM, Carlo Pires
> >>>>  wrote:
> >>>>>
> >>>>> If you need to use composite types and create/drop column families on
> >>>>> the
> >>>>> fly, you must be prepared for instabilities.
> >>>>>
> >>>
> >>
> >>
> >>
> >> --
> >> Ben Coverston
> >> DataStax -- The Apache Cassandra Company
> >>
> >
>
> I would call 1.0.7 rock fricken solid. Incredibly stable. It has been
> that way since I updated to 0.8.8, really. TBs of data, billions of
> requests a day, and thanks to JAMM, memtable type auto-tuning, and
> other enhancements I rarely, if ever, find a node in a state where it
> requires a restart. My clusters are beast-ing.
>
> There always are bugs in software, but coming from a guy who ran
> Cassandra 0.6.1, administration on my Cassandra cluster is like a
> vacation now.
>


Re: how stable is 1.0 these days?

2012-02-29 Thread Thibaut Britz
Any more feedback on larger deployments of 1.0.*?

We are eager to try out the new features in production, but don't want to
run into bugs as with former 0.7 and 0.8 versions.

Thanks,
Thibaut



On Tue, Jan 31, 2012 at 6:59 AM, Ben Coverston
wrote:

> I'm not sure what Carlo is referring to, but generally if you have done,
> thousands of migrations you can end up in a situation where the migrations
> take a long time to replay, and there are some race conditions that can be
> problematic in the case where there are thousands of migrations that may
> need to be replayed while a node is bootstrapped. If you get into this
> situation it can be fixed by copying migrations from a known good schema to
> the node that you are trying to bootstrap.
>
> Generally I would advise against frequent schema updates. Unlike rows in
> column families the schema itself is designed to be relatively static.
>
> On Mon, Jan 30, 2012 at 2:14 PM, Jim Newsham wrote:
>
>>
>> Could you also elaborate for creating/dropping column families?  We're
>> currently working on moving to 1.0 and using dynamically created tables, so
>> I'm very interested in what issues we might encounter.
>>
>> So far the only thing I've encountered (with 1.0.7 + hector 1.0-2) is
>> that dropping a cf may sometimes fail with UnavailableException.  I think
>> this happens when the cf is busy being compacted.  When I sleep/retry
>> within a loop it eventually succeeds.
>>
>> Thanks,
>> Jim
>>
>>
>> On 1/26/2012 7:32 AM, Pierre-Yves Ritschard wrote:
>>
>>> Can you elaborate on the composite types instabilities? Is this
>>> specific to Hector, as Radim's posts suggest?
>>> These one-liner answers are quite stressful :)
>>>
>>> On Thu, Jan 26, 2012 at 1:28 PM, Carlo Pires
>>>  wrote:
>>>
 If you need to use composite types and create/drop column families on
 the
 fly, you must be prepared for instabilities.


>>
>
>
> --
> Ben Coverston
> DataStax -- The Apache Cassandra Company
>
>


Re: Accessing expired data

2012-01-03 Thread Thibaut Britz
Thanks Sylvain!

That's exactly what I needed to know.



On Mon, Jan 2, 2012 at 12:49 PM, Sylvain Lebresne wrote:

> On Mon, Jan 2, 2012 at 11:51 AM, Thibaut Britz
>  wrote:
> > Hi,
> >
> > due to a misconfiguration on our side, some parts of our data got saved
> with
> > a wrong expiration date, which expired just recently.
> >
> > How can I recover the data?
> > Is it sufficient to copy over a backup of the tables into the table
> > directory and iterate over the table (e.g. Read.ALL). Does cassandra
> return
> > expired data in this case?
>
> It won't, unless you trick the nodes by setting their clocks in the
> past. But that is
> not something I would recommend you to do (unless you do that in some
> specific
> test cluster for that purpose only).
>
> > Or will they be silently dropped? Will the
> > sstable2json output expired data?
>
> It will (if used on sstable that contains the data obviously). In the
> sstable2json
> output, expiring columns should look like:
>  [ column_name, column_value, column_timestamp, "e", column_ttl,
> local_expiration_time ]
> where column_ttl is the ttl you've set on the column and
> local_expiration_time is a timestamp
> of when that data will expire on the node (it's a timestamp in
> milliseconds).
>
> Using this is probably the simplest way to recover from that. A fairly
> simple option could
> be to filter that output by changing the local_expiration_time to
> whatever you want and
> use that as input for json2sstable.
>
> --
> Sylvain
>
> >
> >
> > Thanks,
> > Thibaut
> >
> >
>
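
For reference, a rough sketch of that recovery path (keyspace, column family
and file names are placeholders; the edit step is whatever rewrites the
local_expiration_time field):

# dump the sstable that still contains the expired columns
bin/sstable2json /path/to/data/MyKeyspace/MyCF-hc-123-Data.db > dump.json
# edit dump.json: raise local_expiration_time on the affected columns
# then build a new sstable from the edited dump
bin/json2sstable -K MyKeyspace -c MyCF dump.json MyCF-hc-124-Data.db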


Accessing expired data

2012-01-02 Thread Thibaut Britz
Hi,

due to a misconfiguration on our side, some parts of our data got saved
with a wrong expiration date, which expired just recently.

How can I recover the data?
Is it sufficient to copy over a backup of the tables into the table
directory and iterate over the table (e.g. Read.ALL)? Does Cassandra return
expired data in this case? Or will it be silently dropped? Will the
sstable2json output expired data?


Thanks,
Thibaut


Re: 0.8.1: JVM Crash Segmentation Fault

2011-11-02 Thread Thibaut Britz
Hi,

We use this type of crash as an indicator that the node might have some
hardware errors.

Did you check the RAM (e.g. with memtest86)?


Thibaut



On Wed, Nov 2, 2011 at 2:03 PM, Jahangir Mohammed
wrote:

> Hello All,
>
> JVM is crashing on the cassandra nodes. Re-start doesn't help for long.
>
> Ring information:
> $ bin/nodetool -h A ring;
> Address         DC          Rack    Status State   Load            Owns    Token
>
>  127605887595351923798765477786913079297
> A   DC1 RAC1Up Normal  83.65 GB25.00%  0
> BDC2 RAC1Down   Normal  170.09 GB   0.00%   1
> C   DC1 RAC1Up Normal  94.6 GB 25.00%
>  42535295865117307932921825928971026432
> DDC2 RAC1Up Normal  87 GB   0.00%
> 42535295865117307932921825928971026433
> E   DC1 RAC1Up Normal  98.05 GB25.00%
>  85070591730234615865843651857942052864
> FDC2 RAC1Up Normal  95.55 GB0.00%
> 85070591730234615865843651857942052865
> G   DC1 RAC1Up Normal  111.22 GB   25.00%
>  127605887595351923798765477786913079296
> HDC2 RAC1Up Normal  42.05 GB0.00%
> 127605887595351923798765477786913079297
>
> Details:
> 10GB Heap space.
> Memory on each node = 98 GB
> Disk space on each node = 400 GB
>
> JVM crashes with segmentation faults. Have to do frequent restarts of the
> nodes.
> Space on B is 170 GB and it is getting CPU bound on restart, but hasn't been
> added to the ring for almost 7 hours now.
>
> Java version:
>  java -version
> java version "1.6.0_24"
> Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
> Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)
>
> JVM Crash Error log:
>
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x2abc7ec41fbc, pid=14232, tid=1104185664
> #
> # JRE version: 6.0_24-b07
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (19.1-b02 mixed mode
> linux-amd64 compressed oops)
> # Problematic frame:
> # V  [libjvm.so+0x30ffbc]
> #
> # If you would like to submit a bug report, please visit:
> #   http://java.sun.com/webapps/bugreport/crash.jsp
> #
>
> ---  T H R E A D  ---
>
> Current thread (0x4d374000):  GCTaskThread [stack:
> 0x,0x] [id=14243]
>
> siginfo:si_signo=SIGSEGV: si_errno=0, si_code=1 (SEGV_MAPERR),
> si_addr=0x0010
>
> Registers:
>
>
> Any ideas/suggestions? Any preferred JVM version? There is nothing in
> cassandra logs to identify what's going on.
>
> Thanks,
> Jahangir.
>


Re: [RELEASE] Apache Cassandra 1.0 released

2011-10-18 Thread Thibaut Britz
Great news!

Especially the improved read performance and compactions are great!

Thanks,
Thibaut


On Tue, Oct 18, 2011 at 2:11 PM, Jonathan Ellis  wrote:
> Thanks for the help, everyone!  This is a great milestone for Cassandra.
>
> On Tue, Oct 18, 2011 at 7:01 AM, Sylvain Lebresne  
> wrote:
>> The Cassandra team is very pleased to announce the release of Apache 
>> Cassandra
>> version 1.0.0. Cassandra 1.0.0 is a new major release that builds upon the
>> awesomeness of previous versions and adds numerous improvements[1,2], amongst
>> which:
>>  - Compression of on-disk data files (SSTables), with checksummed blocks to
>>    protect against bitrot[4].
>>  - Improvements to memory management through off-heap caches, arena
>>    allocation and automatic self-tuning, for less GC pauses and more
>>    predictable performances[5].
>>  - Better disk-space management: better control of the space taken by commit
>>    logs and immediate deletion of obsolete data files.
>>  - New optional leveled compaction strategy with more predictable performance
>>    and fixed sstable size[6].
>>  - Improved hinted handoffs, leading to less need for read repair for
>>    better read performances.
>>  - Lots of improvements to performance[7], CQL, repair, easier operation,
>>    etc[8]...
>>
>> And as is the rule for some time now, rolling upgrades from previous versions
>> are supported, so there is nothing stopping you from getting all those
>> goodies right now!
>>
>> Both source and binary distributions of Cassandra 1.0.0 can be downloaded at:
>>
>>  http://cassandra.apache.org/download/
>>
>> Or you can use the debian package available from the project APT 
>> repository[3]
>> (you will need to use the 10x series).
>>
>> The download page also links to the CQL drivers that, from this release on,
>> are
>> maintained out of tree[9].
>>
>>
>> That's all folks!
>>
>> [1]: http://goo.gl/t3qpw (CHANGES.txt)
>> [2]: http://goo.gl/6t0qN (NEWS.txt)
>> [3]: http://wiki.apache.org/cassandra/DebianPackaging
>> [4]: http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-compression
>> [5]: 
>> http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-improved-memory-and-disk-space-management
>> [6]: http://www.datastax.com/dev/blog/leveled-compaction-in-apache-cassandra
>> [7]: http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-performance
>> [8]: 
>> http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-windows-service-new-cql-clients-and-more
>> [9]: http://acunu.com/blogs/eric-evans/cassandra-drivers-released/
>>
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
>
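
(The leveled compaction strategy mentioned above is opt-in per column family;
a minimal cassandra-cli sketch, with keyspace and column family names as
placeholders:)

use MyKeyspace;
update column family MyCF with
  compaction_strategy='org.apache.cassandra.db.compaction.LeveledCompactionStrategy';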


Re: Cluster key distribution wrong after upgrading to 0.8.4

2011-08-22 Thread Thibaut Britz
Hi,

Thanks for explaining. As I understand it, each node now only displays
its local view of the data it contains, and not the global view
anymore.

One more question:
Why do the nodes at the end of the ring only show the % load from 2
nodes and not from 3?
We are always writing with quorum, so there should also be data on the
adjacent nodes. Or are the quorum writes not working as expected (only
writing to 2 nodes instead of 3) at the beginning and end of the
cluster?

Thanks,
Thibaut


On Mon, Aug 22, 2011 at 12:01 AM, aaron morton  wrote:
> I'm not sure what the fix is.
>
> When using an order preserving partitioner it's up to you to ensure the ring 
> is correctly balanced.
>
> Say you have the following setup…
>
> node : token
> 1 : a
> 2 : h
> 3 : p
>
> If keys are always 1 character we can say each node owns roughly 33% of the
> ring, because we know there are only 26 possible keys.
>
> With the RP we know how many possible keys there are: the output of the md5
> calculation is a 128-bit integer. So we can say what fraction of the total
> each range is.
>
> If in the example above keys are of any length, how many values exist between 
> a and h ?
>
> Cheers
>
> -
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 22/08/2011, at 3:33 AM, Thibaut Britz wrote:
>
>> Hi,
>>
>> I will wait until this is fixed before I upgrade, just to be sure.
>>
>> Shall I open a new ticket for this issue?
>>
>> Thanks,
>> Thibaut
>>
>> On Sun, Aug 21, 2011 at 11:57 AM, aaron morton  
>> wrote:
>>> This looks like an artifact of the way ownership is calculated for the OOP.
>>> See 
>>> https://github.com/apache/cassandra/blob/cassandra-0.8.4/src/java/org/apache/cassandra/dht/OrderPreservingPartitioner.java#L177
>>>  it
>>> was changed in this ticket
>>> https://issues.apache.org/jira/browse/CASSANDRA-2800
>>> The change applied in CASSANDRA-2800 was not applied to the
>>> AbstractByteOrderPartitioner. Looks like it should have been. I'll chase
>>> that up.
>>>
>>> When each node calculates the ownership for the token ranges (for OOP and
>>> BOP) it's based on the number of keys the node has in that range. As there
>>> is no way for the OOP to understand the range of values the keys may take.
>>> If you look at the 192 node it's showing ownership most with 192, 191 and
>>> 190 - so i'm assuming RF3 and 192 also has data from the ranges owned by 191
>>> and 190.
>>> IMHO you can ignore this.
>>> You can use the load and the number of keys estimate from cfstats to get
>>> an idea of what's happening.
>>> Hope that helps.
>>> -
>>> Aaron Morton
>>> Freelance Cassandra Developer
>>> @aaronmorton
>>> http://www.thelastpickle.com
>>> On 19/08/2011, at 9:42 PM, Thibaut Britz wrote:
>>>
>>> Hi,
>>>
>>> we were using apache-cassandra-2011-06-28_08-04-46.jar so far in
>>> production and wanted to upgrade to 0.8.4.
>>>
>>> Our cluster was well balanced and we only saved keys with a lowercase
>>> md5 prefix (order-preserving partitioner).
>>> Each node owned 20% of the tokens, which was also displayed on each
>>> node in nodetool -h localhost ring.
>>>
>>> After upgrading, our well balanced cluster shows completely wrong
>>> percentage on who owns which keys:
>>>
>>> *.*.*.190:
>>> Address         DC          Rack        Status State   Load
>>> Owns    Token
>>>
>>>        
>>> *.*.*.190   datacenter1 rack1       Up     Normal  87.95 GB
>>> 34.57%  2a
>>> *.*.*.191   datacenter1 rack1       Up     Normal  84.3 GB
>>> 0.02%   55
>>> *.*.*.192   datacenter1 rack1       Up     Normal  79.46 GB
>>> 0.02%   80
>>> *.*.*.194   datacenter1 rack1       Up     Normal  68.16 GB
>>> 0.02%   aa
>>> *.*.*.196   datacenter1 rack1       Up     Normal  79.9 GB
>>> 65.36%  
>>>
>>> *.*.*.191:
>>> Address         DC          Rack        Status State   Load
>>> Owns    Token
>>>
>>>        
>>> *.*.*.190   datacenter1 rack1       Up     Normal  87.95 GB
>>> 36.46%  2a
>>> *.*.*.191   datacenter1 rack1       Up     Normal  84.3 GB
>>> 26.02%  55
>>> *.*.*.192   datacenter1 rack1       Up     Normal  79.46 GB
>>> 0.02%   80
>
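
For reference, the RandomPartitioner ownership calculation Aaron describes
can be computed directly from the tokens; a minimal Java sketch (the example
tokens are made up):

import java.math.BigInteger;

public class RpOwnership {
    public static void main(String[] args) {
        // RandomPartitioner tokens live in [0, 2^127)
        BigInteger ring = BigInteger.valueOf(2).pow(127);
        BigInteger previous = BigInteger.ZERO; // previous node's token
        BigInteger mine = new BigInteger("42535295865117307932921825928971026432");
        // fraction owned = ((mine - previous) mod 2^127) / 2^127
        BigInteger range = mine.subtract(previous).mod(ring);
        double owns = range.doubleValue() / ring.doubleValue();
        System.out.printf("owns %.2f%% of the ring%n", owns * 100); // 25.00% here
    }
}

With an order-preserving partitioner there is no such fixed key space, which
is why ownership can only be estimated from the keys each node actually holds.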

Re: Cluster key distribution wrong after upgrading to 0.8.4

2011-08-21 Thread Thibaut Britz
Hi,

I will wait until this is fixed before I upgrade, just to be sure.

Shall I open a new ticket for this issue?

Thanks,
Thibaut

On Sun, Aug 21, 2011 at 11:57 AM, aaron morton  wrote:
> This looks like an artifact of the way ownership is calculated for the OOP.
> See https://github.com/apache/cassandra/blob/cassandra-0.8.4/src/java/org/apache/cassandra/dht/OrderPreservingPartitioner.java#L177 it
> was changed in this ticket
> https://issues.apache.org/jira/browse/CASSANDRA-2800
> The change applied in CASSANDRA-2800 was not applied to the
> AbstractByteOrderPartitioner. Looks like it should have been. I'll chase
> that up.
>
> When each node calculates the ownership for the token ranges (for OOP and
> BOP) it's based on the number of keys the node has in that range. As there
> is no way for the OOP to understand the range of values the keys may take.
> If you look at the 192 node it's showing ownership most with 192, 191 and
> 190 - so i'm assuming RF3 and 192 also has data from the ranges owned by 191
> and 190.
> IMHO you can ignore this.
> You can use the load and the number of keys estimate from cfstats to get an
> idea of what's happening.
> Hope that helps.
> -
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
> On 19/08/2011, at 9:42 PM, Thibaut Britz wrote:
>
> Hi,
>
> we were using apache-cassandra-2011-06-28_08-04-46.jar so far in
> production and wanted to upgrade to 0.8.4.
>
> Our cluster was well balanced and we only saved keys with a lowercase
> md5 prefix (order-preserving partitioner).
> Each node owned 20% of the tokens, which was also displayed on each
> node in nodetool -h localhost ring.
>
> After upgrading, our well balanced cluster shows completely wrong
> percentage on who owns which keys:
>
> *.*.*.190:
> Address DC  Rack    Status State   Load
> Owns    Token
>
>    
> *.*.*.190   datacenter1 rack1   Up Normal  87.95 GB
> 34.57%  2a
> *.*.*.191   datacenter1 rack1   Up Normal  84.3 GB
> 0.02%   55
> *.*.*.192   datacenter1 rack1   Up Normal  79.46 GB
> 0.02%   80
> *.*.*.194   datacenter1 rack1   Up Normal  68.16 GB
> 0.02%   aa
> *.*.*.196   datacenter1 rack1   Up Normal  79.9 GB
> 65.36%  
>
> *.*.*.191:
> Address DC  Rack    Status State   Load
> Owns    Token
>
>    
> *.*.*.190   datacenter1 rack1   Up Normal  87.95 GB
> 36.46%  2a
> *.*.*.191   datacenter1 rack1   Up Normal  84.3 GB
> 26.02%  55
> *.*.*.192   datacenter1 rack1   Up Normal  79.46 GB
> 0.02%   80
> *.*.*.194   datacenter1 rack1   Up Normal  68.16 GB
> 0.02%   aa
> *.*.*.196   datacenter1 rack1   Up Normal  79.9 GB
> 37.48%  
>
> *.*.*.192:
> Address DC  Rack    Status State   Load
> Owns    Token
>
>    
> *.*.*.190   datacenter1 rack1   Up Normal  87.95 GB
> 38.16%  2a
> *.*.*.191   datacenter1 rack1   Up Normal  84.3 GB
> 27.61%  55
> *.*.*.192   datacenter1 rack1   Up Normal  79.46 GB
> 34.17%  80
> *.*.*.194   datacenter1 rack1   Up Normal  68.16 GB
> 0.02%   aa
> *.*.*.196   datacenter1 rack1   Up Normal  79.9 GB
> 0.02%   
>
> *.*.*.194:
> Address DC  Rack    Status State   Load
> Owns    Token
>
>    
> *.*.*.190   datacenter1 rack1   Up Normal  87.95 GB
> 0.03%   2a
> *.*.*.191   datacenter1 rack1   Up Normal  84.3 GB
> 31.43%  55
> *.*.*.192   datacenter1 rack1   Up Normal  79.46 GB
> 39.69%  80
> *.*.*.194   datacenter1 rack1   Up Normal  68.16 GB
> 28.82%  aa
> *.*.*.196   datacenter1 rack1   Up Normal  79.9 GB
> 0.03%   
>
> *.*.*.196:
> Address DC  Rack    Status State   Load
> Owns    Token
>
>    
> *.*.*.190   datacenter1 rack1   Up Normal  87.95 GB
> 0.02%   2a
> *.*.*.191   datacenter1 rack1   Up Normal  84.3 GB
> 0.02%   55
> *.*.*.192   datacenter1 rack1   Up Normal  79.46 GB
> 0.02%   80
> *.*.*.194   datacenter1 rack1   Up Normal  68.16 GB
> 27.52%  aa
> *.*.*.196   datacenter1 rack1   Up Normal  79.9 GB
> 72.42%  
>
>
> Interestingly, each server shows something completely different.
>
> Removing the locationInfo files didn't help.
> -Dcassandra.load_ring_state=false didn't help as well.
>
> Our cassandra.yaml is at http://pastebin.com/pCVCt3RM
>
> Any idea what might cause this? Is it safe to suspect that
> operating under this distribution will cause severe data loss? Or can
> I safely ignore this?
>
> Thanks,
> Thibaut
>
>


Cluster key distribution wrong after upgrading to 0.8.4

2011-08-19 Thread Thibaut Britz
Hi,

we were using apache-cassandra-2011-06-28_08-04-46.jar so far in
production and wanted to upgrade to 0.8.4.

Our cluster was well balanced and we only saved keys with a lowercase
md5 prefix (order-preserving partitioner).
Each node owned 20% of the tokens, which was also displayed on each
node in nodetool -h localhost ring.

After upgrading, our well balanced cluster shows completely wrong
percentage on who owns which keys:

*.*.*.190:
Address         DC          Rack        Status State   Load            Owns    Token


*.*.*.190   datacenter1 rack1   Up Normal  87.95 GB
34.57%  2a
*.*.*.191   datacenter1 rack1   Up Normal  84.3 GB
0.02%   55
*.*.*.192   datacenter1 rack1   Up Normal  79.46 GB
0.02%   80
*.*.*.194   datacenter1 rack1   Up Normal  68.16 GB
0.02%   aa
*.*.*.196   datacenter1 rack1   Up Normal  79.9 GB
65.36%  

*.*.*.191:
Address         DC          Rack        Status State   Load            Owns    Token


*.*.*.190   datacenter1 rack1   Up Normal  87.95 GB
36.46%  2a
*.*.*.191   datacenter1 rack1   Up Normal  84.3 GB
26.02%  55
*.*.*.192   datacenter1 rack1   Up Normal  79.46 GB
0.02%   80
*.*.*.194   datacenter1 rack1   Up Normal  68.16 GB
0.02%   aa
*.*.*.196   datacenter1 rack1   Up Normal  79.9 GB
37.48%  

*.*.*.192:
Address         DC          Rack        Status State   Load            Owns    Token


*.*.*.190   datacenter1 rack1   Up Normal  87.95 GB
38.16%  2a
*.*.*.191   datacenter1 rack1   Up Normal  84.3 GB
27.61%  55
*.*.*.192   datacenter1 rack1   Up Normal  79.46 GB
34.17%  80
*.*.*.194   datacenter1 rack1   Up Normal  68.16 GB
0.02%   aa
*.*.*.196   datacenter1 rack1   Up Normal  79.9 GB
0.02%   

*.*.*.194:
Address         DC          Rack        Status State   Load            Owns    Token


*.*.*.190   datacenter1 rack1   Up Normal  87.95 GB
0.03%   2a
*.*.*.191   datacenter1 rack1   Up Normal  84.3 GB
31.43%  55
*.*.*.192   datacenter1 rack1   Up Normal  79.46 GB
39.69%  80
*.*.*.194   datacenter1 rack1   Up Normal  68.16 GB
28.82%  aa
*.*.*.196   datacenter1 rack1   Up Normal  79.9 GB
0.03%   

*.*.*.196:
Address         DC          Rack        Status State   Load            Owns    Token


*.*.*.190   datacenter1 rack1   Up Normal  87.95 GB
0.02%   2a
*.*.*.191   datacenter1 rack1   Up Normal  84.3 GB
0.02%   55
*.*.*.192   datacenter1 rack1   Up Normal  79.46 GB
0.02%   80
*.*.*.194   datacenter1 rack1   Up Normal  68.16 GB
27.52%  aa
*.*.*.196   datacenter1 rack1   Up Normal  79.9 GB
72.42%  


Interestingly, each server shows something completely different.

Removing the locationInfo files didn't help.
-Dcassandra.load_ring_state=false didn't help as well.

Our cassandra.yaml is at http://pastebin.com/pCVCt3RM

Any idea what might cause this? Is it safe to suspect that
operating under this distribution will cause severe data loss? Or can
I safely ignore this?

Thanks,
Thibaut


Re: Native heap leaks?

2011-05-05 Thread Thibaut Britz
Reading this, I tried it again (this time on a freshly formatted node due to
an HD failure).

Before crashing, my data dir was only 7.7M big. Using mmap_index_only
(mlockall was successful) on a 64-bit machine.
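
(For reference, the setting in question, as it appears in a 0.7-style
cassandra.yaml:)

disk_access_mode: mmap_index_only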

Anything else I could try to get this to work? PS: all the other nodes (and
this node) run fine (even with a few 100 gigs of data loaded) when I remove
the jna.jar, with nodetool -h localhost info showing more than 800M of free
heap (of a total of 3256M).

Thanks,
Thibaut

On Thu, May 5, 2011 at 2:49 PM, Chris Burroughs
wrote:

> On 2011-05-05 06:30, Hannes Schmidt wrote:
> > This was my first thought, too. We switched to mmap_index_only and
> > didn't see any change in behavior. Looking at the smaps file attached
> > to my original post, one can see that the mmapped index files take up
> > only a minuscule part of RSS.
>
> I have not looked into smaps before. But it actually seems odd that the
> mmapped Index files are taking up so *little memory*.  Are they only a
> few KB on disk?  Is this a snapshot taken shortly after the process
> started, or before the OOM killer is presumably about to come along?  How
> long does it take to go from 1.1 G to 2.1 G resident?  Either way, it
> would be worthwhile to set one node to standard io to make sure it's
> really not mmap causing the problem.
>
> Anyway, assuming it's not mmap, here are the other similar threads on
> the topic.  Unfortunately none of them claim an obvious solution:
>
> http://www.mail-archive.com/user@cassandra.apache.org/msg09279.html
> http://www.mail-archive.com/user@cassandra.apache.org/msg08063.html
> http://www.mail-archive.com/user@cassandra.apache.org/msg12036.html
> http://mail.openjdk.java.net/pipermail/hotspot-dev/2011-April/004091.html
>


Re: OOM on heavy write load

2011-04-28 Thread Thibaut Britz
Could this be related as well to
https://issues.apache.org/jira/browse/CASSANDRA-2463?

Thibaut


On Wed, Apr 27, 2011 at 11:35 PM, Aaron Morton wrote:

> I'm a bit confused by the two different cases you described, so cannot
> comment specifically on your case.
>
> In general if Cassandra is slowing down take a look at the thread pool
> stats, using nodetool tpstats to see where it is backing up, and take a look
> at the logs to check for excessive GC. If node stats shows the read or
> mutation stage backing up, check the iostats.
>
> Hope that helps.
> Aaron
>
> On 28/04/2011, at 12:32 AM, Nikolay Kоvshov  wrote:
>
> > I have set quite low memory consumption (see my configuration in first
> message) and give Cassandra 2.7 Gb of memory.
> > I cache 1M of 64-bytes keys + 64 Mb memtables. I believe overhead can't
> be 500% or so ?
> >
> > memtable operations in millions = default 0.3
> >
> > I see now very strange behaviour
> >
> > If I fill Cassandra with, say, 100 million of 64B key + 64B data and
> after that I start inserting 64B key + 64 KB data, compaction queue
> immediately grows to hundreds and cassandra extremely slows down (it makes
> around 30-50 operations/sec), then starts to give timeout errors.
> >
> > But if I insert 64B key + 64 KB data from the very beginning, cassandra
> works fine and makes around 300 operations/sec even when the database
> exceeds 2-3 TB
> >
> > The behaviour is quite complex and i cannot predict the effect of my
> actions. What consequences I will have if I turn off compaction ? Can i read
> somewhere about what is compaction and why it so heavily depends on what and
> in which order i write into cassandra ?
> >
> > 26.04.2011, 00:08, "Shu Zhang" :
> >> the way I measure actual memtable row sizes is this
> >>
> >> write X rows into a cassandra node
> >> trigger GC
> >> record heap usage
> >> trigger compaction and GC
> >> record heap savings and divide by X for actual cassandra memtable row
> size in memory
> >>
> >> Similar process to measure per-key/per-row cache sizes for your data. To
> understand your memtable row overhead size, you can do the above exercise
> with very different data sizes.
> >>
> >> Also, you probably know this, but when setting your memory usage ceiling
> or heap size, make sure to leave a few hundred MBs for GC.
> >> 
> >> From: Shu Zhang [szh...@mediosystems.com]
> >> Sent: Monday, April 25, 2011 12:55 PM
> >> To: user@cassandra.apache.org
> >> Subject: RE: OOM on heavy write load
> >>
> >> How large are your rows? binary_memtable_throughput_in_
> >> mb only tracks size of data, but there is an overhead associated with
> each row on the order of magnitude of a few KBs. If your row data sizes are
> really small then the overhead dominates the memory usage and
> binary_memtable_throughput_in_
> >> mb end up not limiting your memory usage the way you'd expect. It's a
> good idea to specify memtable_operations_in_millions in that case. If you're
> not sure how big your data is compared to memtable overhead, it's a good
> idea to specify both parameters to effectively put in a memory ceiling no
> matter which dominates your actual memory usage.
> >>
> >> It could also be that your key cache is too big, you should measure your
> key sizes and make sure you have enough memory to cache 1m keys (along with
> your memtables). Finally if you have multiple keyspaces (for multiple
> applications) on your cluster, they all share the total available heap, so
> you have to account for that.
> >>
> >> There's no measure in cassandra to guard against OOM, you must configure
> nodes such that the max memory usage on each node, that is max size all your
> caches and memtables can potentially grow to, is less than your heap size.
> >> 
> >> From: Nikolay Kоvshov [nkovs...@yandex.ru]
> >> Sent: Monday, April 25, 2011 5:21 AM
> >> To: user@cassandra.apache.org
> >> Subject: Re: OOM on heavy write load
> >>
> >> I assume if I turn off swap it will just die earlier, no ? What is the
> mechanism of dying ?
> >>
> >> From the link you provided
> >>
> >> # Row cache is too large, or is caching large rows
> >> my row_cache is 0
> >>
> >> # The memtable sizes are too large for the amount of heap allocated to
> the JVM
> >> Is my memtable size too large ? I have made it less to surely fit the
> "magical formula"
> >>
> >> Trying to analyze heap dumps gives me the following:
> >>
> >> In one case diagram has 3 Memtables about 64 Mb each + 72 Mb "Thread" +
> 700 Mb "Unreachable objects"
> >>
> >> suspected threats:
> >> 7 instances of "org.apache.cassandra.db.Memtable", loaded by
> "sun.misc.Launcher$AppClassLoader @ 0x7f29f4992d68" occupy 456,292,912
> (48.36%) bytes.
> >> 25,211 instances of "org.apache.cassandra.io.sstable.SSTableReader",
> loaded by "sun.misc.Launcher$AppClassLoader @ 0x7f29f4992d68" occupy
> 294,908,984 (31.26%) byte
> >> 72 instances of "java.lang.Thread", loaded by ""
> occ
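
For reference, Shu Zhang's measurement recipe above can be approximated from
the shell; a rough sketch (assumptions: $CASSANDRA_PID holds the server's pid
and MyKeyspace is the keyspace under test):

# after writing X rows: force a full GC and record heap usage
jmap -histo:live $CASSANDRA_PID > before.txt
# flush the memtables, then force another GC
nodetool -h localhost flush MyKeyspace
jmap -histo:live $CASSANDRA_PID > after.txt
# (heap before - heap after) / X ~ per-row memtable footprint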

Re: SSTable Corruption

2011-03-24 Thread Thibaut Britz
Just accidentally hard-reset a node running 0.7.2 (with some patches from
0.7.3) and had the same problem.

I'm a little hesitant to upgrade to 0.7.4.

Can I always delete the Statistics.db file without any data loss?

Thibaut


On Thu, Mar 24, 2011 at 1:37 AM, Brandon Williams  wrote:

> On Wed, Mar 23, 2011 at 6:52 PM, Erik Onnen  wrote:
>
>> Thanks, so is it the "[Index.db, Statistics.db, Data.db, Filter.db];
>> skipped" that indicates it's in Statistics? Basically I need a way to
>> know if the same is true of all the other tables showing this issue.
>
>
> It's the at
> org.apache.cassandra.utils.EstimatedHistogram$EstimatedHistogramSerializer.deserialize(EstimatedHistogram.java:207)
> that clued me in.
>
> -Brandon
>


Re: Upgrade to a different version?

2011-03-17 Thread Thibaut Britz
Hi Paul,

It's more of a scientific mining app. We crawl websites and extract
information from these websites for our clients. For us, it doesn't really
matter if one cassandra node replies after 1 second or a few ms, as long as
the throughput over time stays high. And so far, this seems to be the case.

If you are using Hector, be sure to use the latest Hector version. There
were a few bugs related to error handling in earlier versions (e.g.
threads hanging forever waiting for an answer). I occasionally see timeouts,
but we then just move to another node and retry.
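
(Roughly what our client setup looks like; a sketch, with host names and the
retry interval made up:)

CassandraHostConfigurator conf =
    new CassandraHostConfigurator("node1:9160,node2:9160");
conf.setRetryDownedHosts(true);             // keep probing hosts that went down
conf.setRetryDownedHostsDelayInSeconds(10);
Cluster cluster = HFactory.getOrCreateCluster("MyCluster", conf);
// on a timeout we catch the HectorException, pick another node and retry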

Thibaut


On Thu, Mar 17, 2011 at 6:53 PM, Paul Pak  wrote:

> On 3/17/2011 1:06 PM, Thibaut Britz wrote:
> > If it helps you to sleep better,
> >
> > we use cassandra  (0.7.2 with the flush fix) in production on > 100
> > servers.
> >
> > Thibaut
> >
>
> Thanks Thibaut, believe it or not, it does. :)
>
> Is your use case a typical web app or something like a scientific/data
> mining app?  I ask because I'm wondering how you have managed to deal
> with the stop-the-world garbage collection issues that seem to hit most
> clusters that have significant load and cause application timeouts.
> Have you found that cassandra scales in read/write capacity reasonably
> well as you add nodes?
>
> Also, you may also want to backport these fixes at a minimum?
>
>  * reduce memory use during streaming of multiple sstables (CASSANDRA-2301)
>  * update memtable_throughput to be a long (CASSANDRA-2158)
>
>
>
>


Re: Upgrade to a different version?

2011-03-17 Thread Thibaut Britz
As for the version,

we will wait a few more days, and if nothing really bad shows up, move to
0.7.4.


On Thu, Mar 17, 2011 at 10:40 PM, Thibaut Britz <
thibaut.br...@trendiction.com> wrote:

> Hi Paul,
>
> It's more of a scientific mining app. We crawl websites and extract
> information from these websites for our clients. For us, it doesn't really
> matter if one cassandra node replies after 1 second or a few ms, as long as
> the throughput over time stays high. And so far, this seems to be the case.
>
> If you are using Hector, be sure to use the latest Hector version. There
> were a few bugs related to error handling in earlier versions (e.g.
> threads hanging forever waiting for an answer). I occasionally see timeouts,
> but we then just move to another node and retry.
>
> Thibaut
>
>
>
> On Thu, Mar 17, 2011 at 6:53 PM, Paul Pak  wrote:
>
>> On 3/17/2011 1:06 PM, Thibaut Britz wrote:
>> > If it helps you to sleep better,
>> >
>> > we use cassandra  (0.7.2 with the flush fix) in production on > 100
>> > servers.
>> >
>> > Thibaut
>> >
>>
>> Thanks Thibaut, believe it or not, it does. :)
>>
>> Is your use case a typical web app or something like a scientific/data
>> mining app?  I ask because I'm wondering how you have managed to deal
>> with the stop-the-world garbage collection issues that seem to hit most
>> clusters that have significant load and cause application timeouts.
>> Have you found that cassandra scales in read/write capacity reasonably
>> well as you add nodes?
>>
>> Also, you may also want to backport these fixes at a minimum?
>>
>>  * reduce memory use during streaming of multiple sstables
>> (CASSANDRA-2301)
>>  * update memtable_throughput to be a long (CASSANDRA-2158)
>>
>>
>>
>>
>


Re: Upgrade to a different version?

2011-03-17 Thread Thibaut Britz
If it helps you sleep better:

we use Cassandra (0.7.2 with the flush fix) in production on > 100 servers.

Thibaut

On Thu, Mar 17, 2011 at 5:58 PM, Paul Pak  wrote:

> I'm at a crossroads right now.  We built an application around .7 and
> the features in .7, so going back to .6 wasn't an option for us.  Now,
> we are in the middle of setting up dual mysql and cassandra support so
> that we can "fallback" to mysql if Cassandra can't handle the workload
> properly.  It's a stupid amount of extra work, but I think it's
> unavoidable for us given the state of things with .7.  It also gives us
> the benefit of seeing the true benefit of Cassandra over mysql in our
> particular application and make a decision from there.
>
> Paul
>
> On 3/16/2011 9:03 PM, Joshua Partogi wrote:
> > So did you downgraded it back to 0.6.x series?
> >
>
>


Re: Fill disks more than 50%

2011-02-24 Thread Thibaut Britz
Hi,

How would you use rsync instead of repair in case of a node failure?

Rsync all files from the data directories of the adjacent nodes
(which are part of the quorum group) and then run a compaction which
will remove all the unneeded keys?
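
(A rough sketch of what I have in mind; host and path names are made up, and
I assume the compaction in question is nodetool cleanup, i.e. the one that
drops keys the node does not own:)

# copy the keyspace's sstables from an adjacent replica
rsync -av nodeB:/var/lib/cassandra/data/MyKeyspace/ /var/lib/cassandra/data/MyKeyspace/
# start the node, then throw away keys outside its ranges
nodetool -h localhost cleanup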

Thanks,
Thibaut


On Thu, Feb 24, 2011 at 4:22 AM, Edward Capriolo  wrote:
> On Wed, Feb 23, 2011 at 9:39 PM, Terje Marthinussen
>  wrote:
>> Hi,
>> Given that you have always-increasing key values (timestamps) and never
>> delete and hardly ever overwrite data.
>> If you want to minimize work on rebalancing and statically assign (new)
>> token ranges to new nodes as you add them so they always get the latest
>> data
>> Lets say you add a new node each year to handle next years data.
>> In a scenario like this, could you with 0.7 be able to safely fill disks
>> significantly more than 50% and still manage things like repair/recovery of
>> faulty nodes?
>>
>> Regards,
>> Terje
>
> Since all your data for a day/month/year would sit on the same server.
> Meaning all your servers with old data would be idle and your servers
> with current data would be very busy. This is probably not a good way
> to go.
>
> There is a ticket open for 0.8 for efficient node moves/joins. It is
> already a lot better in 0.7. Pretend you did not see this (you can
> join nodes using rsync if you know some tricks) if you are really
> afraid of joins, which you really should not be.
>
> As for the 50% statement: in a worst-case scenario a major compaction
> will require double the disk size of your column family. So if you
> have more than 1 column family you do NOT need 50% overhead.
>


Re: Benchmarking Cassandra with YCSB

2011-02-15 Thread Thibaut Britz
Cassandra is very CPU hungry so you might be hitting a CPU bottleneck.
What's your CPU usage during these tests?


On Tue, Feb 15, 2011 at 8:45 PM, Markus Klems  wrote:
> Hi there,
>
> we are currently benchmarking a Cassandra 0.6.5 cluster with 3
> High-Mem Quadruple Extra Large EC2 nodes
> (http://aws.amazon.com/ec2/#instance) using Yahoo's YCSB tool
> (replication factor is 3, random partitioner). We assigned 32 GB RAM
> to the JVM and left 32 GB RAM for the Ubuntu Linux filesystem buffer.
> We also set the user count to a very large number via ulimit -u
> 99.
>
> Our goal is to achieve max throughput by increasing YCSB's threadcount
> parameter (i.e. the number of parallel benchmarking client threads).
> However, this only improves Cassandra throughput for low numbers
> of threads. If we move to higher threadcounts, throughput does not
> increase and even decreases. Do you have any idea why this is
> happening and possibly suggestions how to scale throughput to much
> higher numbers? Why is throughput hitting a wall, anyways? And where
> does the latency/throughput tradeoff come from?
>
> Here is our YCSB configuration:
> recordcount=30
> operationcount=100
> workload=com.yahoo.ycsb.workloads.CoreWorkload
> readallfields=true
> readproportion=0.5
> updateproportion=0.5
> scanproportion=0
> insertproportion=0
> threadcount= 500
> target = 1
> hosts=EC2-1,EC2-2,EC2-3
> requestdistribution=uniform
>
> These are typical results for threadcount=1:
> Loading workload...
> Starting test.
>  0 sec: 0 operations;
>  10 sec: 11733 operations; 1168.28 current ops/sec; [UPDATE
> AverageLatency(ms)=0.64] [READ AverageLatency(ms)=1.03]
>  20 sec: 24246 operations; 1251.68 current ops/sec; [UPDATE
> AverageLatency(ms)=0.48] [READ AverageLatency(ms)=1.11]
>
> These are typical results for threadcount=10:
> 10 sec: 30428 operations; 3029.77 current ops/sec; [UPDATE
> AverageLatency(ms)=2.11] [READ AverageLatency(ms)=4.32]
>  20 sec: 60838 operations; 3041.91 current ops/sec; [UPDATE
> AverageLatency(ms)=2.15] [READ AverageLatency(ms)=4.37]
>
> These are typical results for threadcount=100:
> 10 sec: 29070 operations; 2895.42 current ops/sec; [UPDATE
> AverageLatency(ms)=20.53] [READ AverageLatency(ms)=44.91]
>  20 sec: 53621 operations; 2455.84 current ops/sec; [UPDATE
> AverageLatency(ms)=23.11] [READ AverageLatency(ms)=55.39]
>
> These are typical results for threadcount=500:
> 10 sec: 30655 operations; 3053.59 current ops/sec; [UPDATE
> AverageLatency(ms)=72.71] [READ AverageLatency(ms)=187.19]
>  20 sec: 68846 operations; 3814.14 current ops/sec; [UPDATE
> AverageLatency(ms)=65.36] [READ AverageLatency(ms)=191.75]
>
> We never measured more than ~6000 ops/sec. Are there ways to tune
> Cassandra that we are not aware of? We made some modification to the
> Cassandra 0.6.5 core for experimental reasons, so it's not easy to
> switch to 0.7x or 0.8x. However, if this might solve the scaling
> issues, we might consider to port our modifications to a newer
> Cassandra version...
>
> Thanks,
>
> Markus Klems
>
> Karlsruhe Institute of Technology, Germany
>


Re: What if write consistency level cannot be met?

2011-02-15 Thread Thibaut Britz
Your write will fail. But if the write has reached at least one node,
it will eventually reach all the other nodes as well. So it won't
roll back.
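
(A minimal Hector sketch of what that looks like client-side; names are
placeholders:)

ConfigurableConsistencyLevel ccl = new ConfigurableConsistencyLevel();
ccl.setDefaultWriteConsistencyLevel(HConsistencyLevel.ALL);
Keyspace ks = HFactory.createKeyspace("MyKeyspace", cluster, ccl);
Mutator<String> mutator = HFactory.createMutator(ks, StringSerializer.get());
try {
    mutator.insert("key1", "MyCF", HFactory.createStringColumn("col", "val"));
} catch (HUnavailableException e) {
    // the write failed at ALL, but any replica that did receive it keeps
    // the value and will propagate it; there is no rollback
}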


On Tue, Feb 15, 2011 at 7:38 PM, A J  wrote:
> Say I set the write consistency level to ALL and all but one node are down.
> What happens to writes? Does it roll back from the live node before
> returning failure to the client?
> Thanks.


Re: [0.7.1] more exceptions: Illegal mode

2011-02-07 Thread Thibaut Britz
I think this is related to a faulty disk.


On Mon, Feb 7, 2011 at 3:35 PM, Patrik Modesto  wrote:
> INFO 15:30:49,647 Compacted to
> /www/foo/cassandra/data/foo/Url-tmp-f-767-Data.db.  4,199,999,762 to
> 4,162,579,242 (~99% of original) bytes for 379,179 keys.  Time:
> 137,149ms.
> ERROR 15:30:49,699 Fatal exception in thread 
> Thread[CompactionExecutor:1,1,main]
> java.lang.RuntimeException: java.lang.IllegalArgumentException:
> Illegal mode "w" must be one of "r", "rw", "rws", or "rwd"
>        at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
>        at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>        at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>        at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>        at java.lang.Thread.run(Thread.java:662)
> Caused by: java.lang.IllegalArgumentException: Illegal mode "w" must
> be one of "r", "rw", "rws", or "rwd"
>        at java.io.RandomAccessFile.<init>(RandomAccessFile.java:197)
>        at 
> org.apache.cassandra.io.util.BufferedRandomAccessFile.<init>(BufferedRandomAccessFile.java:116)
>        at 
> org.apache.cassandra.io.sstable.CacheWriter.saveCache(CacheWriter.java:48)
>        at 
> org.apache.cassandra.db.CompactionManager$9.runMayThrow(CompactionManager.java:746)
>        at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>        ... 6 more
> ERROR 15:30:49,699 Fatal exception in thread 
> Thread[CompactionExecutor:1,1,main]
> java.lang.RuntimeException: java.lang.IllegalArgumentException:
> Illegal mode "w" must be one of "r", "rw", "rws", or "rwd"
>        at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
>        at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>        at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>        at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>        at java.lang.Thread.run(Thread.java:662)
> Caused by: java.lang.IllegalArgumentException: Illegal mode "w" must
> be one of "r", "rw", "rws", or "rwd"
>        at java.io.RandomAccessFile.<init>(RandomAccessFile.java:197)
>        at 
> org.apache.cassandra.io.util.BufferedRandomAccessFile.<init>(BufferedRandomAccessFile.java:116)
>        at 
> org.apache.cassandra.io.sstable.CacheWriter.saveCache(CacheWriter.java:48)
>        at 
> org.apache.cassandra.db.CompactionManager$9.runMayThrow(CompactionManager.java:746)
>        at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>        ... 6 more
>
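
Worth noting: java.io.RandomAccessFile rejects mode "w" by itself, before any
I/O happens, so the trace above points at the CacheWriter.saveCache code path
rather than at the disk. A minimal reproduction (editorial sketch):

    import java.io.File;
    import java.io.RandomAccessFile;

    public class ModeDemo {
        public static void main(String[] args) throws Exception {
            // "r", "rw", "rws" and "rwd" are the only modes the JDK accepts
            new RandomAccessFile(new File("/tmp/ok"), "rw").close();
            // mode "w" throws IllegalArgumentException before touching the disk
            new RandomAccessFile(new File("/tmp/bad"), "w");
        }
    }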


Re: Bootstrapping taking long

2011-01-05 Thread Thibaut Britz
Had the same problem a while ago. Upgrading solved the problem (Don't know
if you have to redeploy your cluster though)

http://www.mail-archive.com/user@cassandra.apache.org/msg07106.html


On Wed, Jan 5, 2011 at 4:29 PM, Ran Tavory  wrote:

> @Thibaut wrong email? Or how's "Avoid dropping messages off the client
> request path" (CASSANDRA-1676) related to the bootstrap questions I had?
>
>
> On Wed, Jan 5, 2011 at 5:23 PM, Thibaut Britz <
> thibaut.br...@trendiction.com> wrote:
>
>> https://issues.apache.org/jira/browse/CASSANDRA-1676
>>
>> you have to use at least 0.6.7
>>
>>
>>
>> On Wed, Jan 5, 2011 at 4:19 PM, Edward Capriolo wrote:
>>
>>> On Wed, Jan 5, 2011 at 10:05 AM, Ran Tavory  wrote:
>>> > In storage-conf I see this comment [1] from which I understand that the
>>> > recommended way to bootstrap a new node is to set AutoBootstrap=true
>>> and
>>> > remove itself from the seeds list.
>>> > Moreover, I did try to set AutoBootstrap=true and have the node in its
>>> own
>>> > seeds list, but it would not bootstrap. I don't recall the exact
>>> message but
>>> > it was something like "I found myself in the seeds list therefore I'm
>>> not
>>> > going to bootstrap even though AutoBootstrap is true".
>>> >
>>> > [1]
>>> >   
>>> >   false
>>> > On Wed, Jan 5, 2011 at 4:58 PM, David Boxenhorn 
>>> wrote:
>>> >>
>>> >> If "seed list should be the same across the cluster" that means that
>>> nodes
>>> >> *should* have themselves as a seed. If that doesn't work for Ran, then
>>> that
>>> >> is the first problem, no?
>>> >>
>>> >>
>>> >> On Wed, Jan 5, 2011 at 3:56 PM, Jake Luciani 
>>> wrote:
>>> >>>
>>> >>> Well your ring issues don't make sense to me, seed list should be the
>>> >>> same across the cluster.
>>> I'm just thinking of other things to try, non-bootstrapped nodes
>>> should
>>> >>> join the ring instantly but reads will fail if you aren't using
>>> quorum.
>>> >>>
>>> >>> On Wed, Jan 5, 2011 at 8:51 AM, Ran Tavory  wrote:
>>> >>>>
>>> >>>> I haven't tried repair.  Should I?
>>> >>>>
>>> >>>> On Jan 5, 2011 3:48 PM, "Jake Luciani"  wrote:
>>> >>>> > Have you tried not bootstrapping but setting the token and
>>> manually
>>> >>>> > calling
>>> >>>> > repair?
>>> >>>> >
>>> >>>> > On Wed, Jan 5, 2011 at 7:07 AM, Ran Tavory 
>>> wrote:
>>> >>>> >
>>> >>>> >> My conclusion is lame: I tried this on several hosts and saw the
>>> same
>>> >>>> >> behavior, the only way I was able to join new nodes was to first
>>> >>>> >> start them
>>> >>>> >> when they are *not in* their own seeds list and after they
>>> >>>> >> finish transferring the data, then restart them with themselves
>>> *in*
>>> >>>> >> their
>>> >>>> >> own seeds list. After doing that the node would join the ring.
>>> >>>> >> This is either my misunderstanding or a bug, but the only place I
>>> >>>> >> found it
>>> >>>> >> documented stated that the new node should not be in its own
>>> seeds
>>> >>>> >> list.
>>> >>>> >> Version 0.6.6.
>>> >>>> >>
>>> >>>> >> On Wed, Jan 5, 2011 at 10:35 AM, David Boxenhorn
>>> >>>> >> wrote:
>>> >>>> >>
>>> >>>> >>> My nodes all have themselves in their list of seeds - always did
>>> -
>>> >>>> >>> and
>>> >>>> >>> everything works. (You may ask why I did this. I don't know, I
>>> must
>>> >>>> >>> have
>>> >>>> >>> copied it from an example somewhere.)
>>> >>>> >>>
>>> >>>> >>> On Wed, Jan 5, 2011 at 9:42 AM, Ran Tavory 
>>> wrote:
>>

Re: Bootstrapping taking long

2011-01-05 Thread Thibaut Britz
https://issues.apache.org/jira/browse/CASSANDRA-1676

you have to use at least 0.6.7


On Wed, Jan 5, 2011 at 4:19 PM, Edward Capriolo wrote:

> On Wed, Jan 5, 2011 at 10:05 AM, Ran Tavory  wrote:
> > In storage-conf I see this comment [1] from which I understand that the
> > recommended way to bootstrap a new node is to set AutoBootstrap=true and
> > remove itself from the seeds list.
> > Moreover, I did try to set AutoBootstrap=true and have the node in its
> own
> > seeds list, but it would not bootstrap. I don't recall the exact message
> but
> > it was something like "I found myself in the seeds list therefore I'm not
> > going to bootstrap even though AutoBootstrap is true".
> >
> > [1]
> >   <AutoBootstrap>false</AutoBootstrap>
> > On Wed, Jan 5, 2011 at 4:58 PM, David Boxenhorn 
> wrote:
> >>
> >> If "seed list should be the same across the cluster" that means that
> nodes
> >> *should* have themselves as a seed. If that doesn't work for Ran, then
> that
> >> is the first problem, no?
> >>
> >>
> >> On Wed, Jan 5, 2011 at 3:56 PM, Jake Luciani  wrote:
> >>>
> >>> Well your ring issues don't make sense to me, seed list should be the
> >>> same across the cluster.
> >>> Well your ring issues don't make sense to me, seed list should be the
> >>> join the ring instantly but reads will fail if you aren't using quorum.
> >>>
> >>> On Wed, Jan 5, 2011 at 8:51 AM, Ran Tavory  wrote:
> 
>  I haven't tried repair.  Should I?
> 
>  On Jan 5, 2011 3:48 PM, "Jake Luciani"  wrote:
>  > Have you tried not bootstrapping but setting the token and manually
>  > calling
>  > repair?
>  >
>  > On Wed, Jan 5, 2011 at 7:07 AM, Ran Tavory 
> wrote:
>  >
>  >> My conclusion is lame: I tried this on several hosts and saw the
> same
>  >> behavior, the only way I was able to join new nodes was to first
>  >> start them
>  >> when they are *not in* their own seeds list and after they
>  >> finish transferring the data, then restart them with themselves
> *in*
>  >> their
>  >> own seeds list. After doing that the node would join the ring.
>  >> This is either my misunderstanding or a bug, but the only place I
>  >> found it
>  >> documented stated that the new node should not be in its own seeds
>  >> list.
>  >> Version 0.6.6.
>  >>
>  >> On Wed, Jan 5, 2011 at 10:35 AM, David Boxenhorn
>  >> wrote:
>  >>
>  >>> My nodes all have themselves in their list of seeds - always did -
>  >>> and
>  >>> everything works. (You may ask why I did this. I don't know, I
> must
>  >>> have
>  >>> copied it from an example somewhere.)
>  >>>
>  >>> On Wed, Jan 5, 2011 at 9:42 AM, Ran Tavory 
> wrote:
>  >>>
>   I was able to make the node join the ring but I'm confused.
>   What I did is, first when adding the node, this node was not in
> the
>   seeds
>   list of itself. AFAIK this is how it's supposed to be. So it was
>   able to
>   transfer all data to itself from other nodes but then it stayed
> in
>   the
>   bootstrapping state.
>   So what I did (and I don't know why it works), is add this node
> to
>   the
>   seeds list in its own storage-conf.xml file. Then restart the
>   server and
>   then I finally see it in the ring...
>   If I had added the node to the seeds list of itself when first
>   joining
>   it, it would not join the ring but if I do it in two phases it
> did
>   work.
>   So it's either my misunderstanding or a bug...
>  
>  
>   On Wed, Jan 5, 2011 at 7:14 AM, Ran Tavory 
>   wrote:
>  
>  > The new node does not see itself as part of the ring, it sees
> all
>  > others
>  > but itself, so from that perspective the view is consistent.
>  > The only problem is that the node never finishes to bootstrap.
> It
>  > stays
>  > in this state for hours (It's been 20 hours now...)
>  >
>  >
>  > $ bin/nodetool -p 9004 -h localhost streams
>  >> Mode: Bootstrapping
>  >> Not sending any streams.
>  >> Not receiving any streams.
>  >
>  >
>  > On Wed, Jan 5, 2011 at 1:20 AM, Nate McCall 
>  > wrote:
>  >
>  >> Does the new node have itself in the list of seeds per chance?
>  >> This
>  >> could cause some issues if so.
>  >>
>  >> On Tue, Jan 4, 2011 at 4:10 PM, Ran Tavory 
>  >> wrote:
>  >> > I'm still at lost. I haven't been able to resolve this. I
> tried
>  >> > adding another node at a different location on the ring but
>  >> > this node
>  >> > too remains stuck in the bootstrapping state for many hours
>  >> > without
>  >> > any of the other nodes being busy with anti compaction or
>  >
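
Putting the thread together, the setup it converges on for a joining 0.6 node
is AutoBootstrap on with only the existing ring members as seeds; stuck
bootstraps like the above were addressed by CASSANDRA-1676 in 0.6.7. An
editorial sketch of the relevant storage-conf.xml fragment (addresses
illustrative):

    <AutoBootstrap>true</AutoBootstrap>
    <Seeds>
        <!-- existing ring members only; per the docs cited above, the
             joining node should not list itself here -->
        <Seed>192.168.1.10</Seed>
        <Seed>192.168.1.11</Seed>
    </Seeds>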

Re: Quorum: killing 1 out of 3 servers kills the cluster (?)

2010-12-09 Thread Thibaut Britz
Hi,

The UnavailableExceptions will be thrown because a quorum at
replication factor 2 needs both replicas to be alive (a quorum at
replication factor 3 also needs 2 live nodes).

The data won't be automatically redistributed to other nodes.

Thibaut


On Thu, Dec 9, 2010 at 4:40 PM, Timo Nentwig  wrote:
> Hi!
>
> I've 3 servers running (0.7rc1) with a replication_factor of 2 and use quorum 
> for writes. But when I shut down one of them UnavailableExceptions are 
> thrown. Why is that? Isn't that the point of quorum and a fault-tolerant DB 
> that it continues with the remaining 2 nodes and redistributes the data to 
> the broken one as soon as it's up again?
>
> What may I be doing wrong?
>
> thx
> tcn
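
The arithmetic behind this, as an editorial sketch: Cassandra computes a
quorum as floor(RF/2) + 1, so RF=2 yields a quorum of 2 and leaves no room
for a down replica, while RF=3 tolerates one.

    public class QuorumMath {
        // quorum size for a given replication factor
        static int quorum(int replicationFactor) {
            return replicationFactor / 2 + 1;
        }

        public static void main(String[] args) {
            System.out.println(quorum(2)); // 2 -> both replicas must be up
            System.out.println(quorum(3)); // 2 -> one replica may be down
        }
    }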


Re: Question about consistency level & data propagation & eventually consistent

2010-11-11 Thread Thibaut Britz
Hi,

thanks for all the informative answers.

Since writing is much faster than reading, I assume it's faster to
write the data to 3 replicas and read from 1 instead of writing to 2 and
reading from at least 2 (especially if I execute the read operation
multiple times on the same key). I could then easily double my read
performance.
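
For reference, this write-ALL/read-ONE split can be configured per keyspace in
later Hector versions; a minimal sketch against the 0.7/0.8-era Hector classes
(assumed here; the 0.6-series API differed):

    import me.prettyprint.cassandra.model.ConfigurableConsistencyLevel;
    import me.prettyprint.hector.api.Cluster;
    import me.prettyprint.hector.api.HConsistencyLevel;
    import me.prettyprint.hector.api.Keyspace;
    import me.prettyprint.hector.api.factory.HFactory;

    public class WriteAllReadOne {
        public static void main(String[] args) {
            Cluster cluster = HFactory.getOrCreateCluster("Test Cluster", "localhost:9160");
            ConfigurableConsistencyLevel ccl = new ConfigurableConsistencyLevel();
            ccl.setDefaultWriteConsistencyLevel(HConsistencyLevel.ALL); // write to all replicas
            ccl.setDefaultReadConsistencyLevel(HConsistencyLevel.ONE);  // read from one replica
            Keyspace keyspace = HFactory.createKeyspace("Keyspace1", cluster, ccl);
            // mutators and queries created from this keyspace now use ALL/ONE
        }
    }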

I would then like to do the following: Always write to all nodes which are
marked as Up. Then read from one node. If one node goes down (hardware
failure/cassandra down) I would run the repair tool and fix the node, which
shouldn't happen very often. I can also deal with very small inconsistencies.

- What consistency level would I have to choose? All will fail if one node is
down, Quorum will only write to the quorum. I would need something that will
write to all nodes which are marked as UP.
- If I choose Quorum, what will happen to the remaining writes if the node
is marked as UP? Will they always be executed or can they be dropped (e.g. a
node doing compaction while the write happens)?
- To bring a node back into the system, I would run the repair command on the
node. Is there a way to do an offline repair (so I make sure that my
application won't read from this node)? I guess changing the port temporarily
will not work, since cassandra will advertise the node to my client through
the other nodes?

Thanks,
Thibaut






On Wed, Nov 10, 2010 at 5:52 PM, Jonathan Ellis  wrote:

> On Wed, Nov 10, 2010 at 8:54 AM, Thibaut Britz
>  wrote:
> > Assuming I'm reading and writing with consistency level 1 (one), read
> repair
> > turned off, I have a few questions about data propagation.
> > Data is being stored at consistency level 3.
> > 1) If all nodes are up:
> >  - Will all writes eventually reach all nodes (of the 3 nodes)?
>
> Yes.
>
> >  - What will be the maximal time until the last write reaches the last
> node
>
> Situation-dependent.  The important thing is that if you are writing
> at CL.ALL, it will be before the write is acked to the client.
>
> > 2) If one or two nodes are down
> > - As I understood it, one node will buffer the writes for the remaining
> > nodes.
>
> Yes: _after_ the failure detector recognizes them as down. This will
> take several seconds.
>
> > - If the nodes go up again: When will these writes be propagated
>
> When FD recognizes them as back up.
>
> > The best way would then be to run nodetool repair after the two nodes
> will
> > be available again. Is there a way to make the node not accept any
> > connections during that time until it has finished repairing? (e.g. throw
> the
> > UnavailableException)
>
> No.  The way to prevent stale reads is to use an appropriate
> consistencylevel, not error-prone heuristics.  (For instance: what if
> the replica with the most recent data were itself down when the first
> node recovered and initiated repair?)
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
>


Question about consistency level & data propagation & eventually consistent

2010-11-10 Thread Thibaut Britz
Hi,

Assuming I'm reading and writing with consistency level 1 (one), read repair
turned off, I have a few questions about data propagation.
Data is being stored at consistency level 3.

I'm not interested in the deletes. I can live with older data (or data that
has been deleted and will reappear), but I need to know how long it will
take until the data will be available at the other nodes, since I have
turned read repair off.

1) If all nodes are up:
 - Will all writes eventually reach all nodes (of the 3 nodes)?
 - What will be the maximal time until the last write reaches the last node
(of the 3 nodes)? (e.g. assume one of the nodes is doing compaction at that
time)

2) If one or two nodes are down
- As I understood it, one node will buffer the writes for the remaining
nodes.
- If the nodes go up again: When will these writes be propagated? At
compaction? What will be the maximal time until the writes reach the 2
nodes? Will these writes be propagated at all?

In case of 2:

The best way would then be to run nodetool repair after the two nodes
become available again. Is there a way to make the node not accept any
connections during that time until it has finished repairing? (e.g. throw the
UnavailableException)


Thanks,
Thibaut


Re: New nodes won't bootstrap on .66

2010-11-08 Thread Thibaut Britz
I also had multiple keyspaces defined (> 20). All nodes were 64-bit, no
mixtures.


On Mon, Nov 8, 2010 at 8:23 PM, Dimitry Lvovsky wrote:

> We didn't solve it unfortunately and ended up regenerating the entire
> cluster.  But, if it helps anyone in the future, we too had multiple
> keyspaces when we encountered the problem.
>
>
>
> On Mon, Nov 8, 2010 at 5:47 PM, Marc Canaleta  wrote:
>
>> I have just solved the problem by removing the second keyspace (manually
>> moving its column families to the first). So it seems the problem appears
>> when having multiple keyspaces.
>>
>> 2010/11/8 Thibaut Britz 
>>
>> Hi,
>>>
>>> No, I didn't solve the problem. I reinitialized the cluster and manually
>>> gave each node a token before adding data. There are a few messages in
>>> multiple threads related to this, so I suspect it's very common and I hope
>>> it's gone with 0.7.
>>>
>>> Thibaut
>>>
>>>
>>>
>>>
>>>
>>> On Sun, Nov 7, 2010 at 6:57 PM, Marc Canaleta wrote:
>>>
>>>> Hi,
>>>>
>>>> Did you solve this problem? I'm having the same problem. I'm trying to
>>>> bootstrap a third node in a 0.66 cluster. It has two keyspaces: Keyspace1
>>>> and KeyspaceLogs, both with replication factor 2.
>>>>
>>>> It starts bootstrapping, receives some streams but it keeps waiting for
>>>> streams. I enabled the debug mode. These lines may be useful:
>>>>
>>>> DEBUG [main] 2010-11-07 17:39:50,052 BootStrapper.java (line 70)
>>>> Beginning bootstrap process
>>>> DEBUG [main] 2010-11-07 17:39:50,082 StorageService.java (line 160)
>>>> Added /10.204.93.16/Keyspace1 as a bootstrap source
>>>> ...
>>>> DEBUG [main] 2010-11-07 17:39:50,090 StorageService.java (line 160)
>>>> Added /10.204.93.16/KeyspaceLogs as a bootstrap source
>>>> ... (streaming messages)
>>>> DEBUG [Thread-56] 2010-11-07 17:45:51,706 StorageService.java (line 171)
>>>> Removed /10.204.93.16/Keyspace1 as a bootstrap source; remaining is [/
>>>> 10.204.93.16]
>>>> ...
>>>> (and never ends).
>>>>
>>>> It seems it is waiting for  [/10.204.93.16] when it should be waiting
>>>> for /10.204.93.16/KeyspaceLogs.
>>>>
>>>> The third node is 64 bits, while the two existing nodes are 32 bits. Can
>>>> this be a problem?
>>>>
>>>> Thank you.
>>>>
>>>>
>>>> 2010/10/28 Dimitry Lvovsky 
>>>>
>>>>> Maybe your <StoragePort>7000</StoragePort> is being blocked by
>>>>> iptables or some firewall or maybe you have it bound (tag <ListenAddress>)
>>>>> to localhost instead of an ip address.
>>>>>
>>>>> Hope this helps,
>>>>> Dimitry.
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Oct 28, 2010 at 5:35 PM, Thibaut Britz <
>>>>> thibaut.br...@trendiction.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I have the same problem with 0.6.5
>>>>>>
>>>>>> New nodes will hang forever in bootstrap mode (no streams are being
>>>>>> opened) and the receiver thread just waits for data forever:
>>>>>>
>>>>>>
>>>>>>  INFO [Thread-53] 2010-10-27 20:33:37,399 SSTableReader.java (line
>>>>>> 120) Sampling index for /hd2/cassandra/data/table_xyz/
>>>>>> table_xyz-3-Data.db
>>>>>>  INFO [Thread-53] 2010-10-27 20:33:37,444 StreamCompletionHandler.java
>>>>>> (line 64) Streaming added 
>>>>>> /hd2/cassandra/data/table_xyz/table_xyz-3-Data.db
>>>>>>
>>>>>> Stack trace:
>>>>>>
>>>>>> "pool-1-thread-53" prio=10 tid=0x412f2800 nid=0x215c runnable
>>>>>> [0x7fd7cf217000]
>>>>>>java.lang.Thread.State: RUNNABLE
>>>>>> at java.net.SocketInputStream.socketRead0(Native Method)
>>>>>> at java.net.SocketInputStream.read(SocketInputStream.java:129)
>>>>>> at
>>>>>> java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
>>>>>> at
>>>>>> java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
>>>>>> at
>

Re: New nodes won't bootstrap on .66

2010-11-08 Thread Thibaut Britz
Hi,

No I didn't solve the problem. I reinitialized the cluster and gave each
node manually a token before adding data. There are a few messages in
multiple threads related to this, so I suspect it's very common and I hope
it's gone with 0.7.

Thibaut




On Sun, Nov 7, 2010 at 6:57 PM, Marc Canaleta  wrote:

> Hi,
>
> Did you solve this problem? I'm having the same problem. I'm trying to
> bootstrap a third node in a 0.66 cluster. It has two keyspaces: Keyspace1
> and KeyspaceLogs, both with replication factor 2.
>
> It starts bootstrapping, receives some streams but it keeps waiting for
> streams. I enabled the debug mode. These lines may be useful:
>
> DEBUG [main] 2010-11-07 17:39:50,052 BootStrapper.java (line 70) Beginning
> bootstrap process
> DEBUG [main] 2010-11-07 17:39:50,082 StorageService.java (line 160) Added /
> 10.204.93.16/Keyspace1 as a bootstrap source
> ...
> DEBUG [main] 2010-11-07 17:39:50,090 StorageService.java (line 160) Added /
> 10.204.93.16/KeyspaceLogs as a bootstrap source
> ... (streaming messages)
> DEBUG [Thread-56] 2010-11-07 17:45:51,706 StorageService.java (line 171)
> Removed /10.204.93.16/Keyspace1 as a bootstrap source; remaining is [/
> 10.204.93.16]
> ...
> (and never ends).
>
> It seems it is waiting for  [/10.204.93.16] when it should be waiting for
> /10.204.93.16/KeyspaceLogs.
>
> The third node is 64 bits, while the two existing nodes are 32 bits. Can
> this be a problem?
>
> Thank you.
>
>
> 2010/10/28 Dimitry Lvovsky 
>
> Maybe your <StoragePort>7000</StoragePort> is being blocked by iptables
>> or some firewall or maybe you have it bound (tag <ListenAddress>) to
>> localhost instead of an ip address.
>>
>> Hope this helps,
>> Dimitry.
>>
>>
>>
>> On Thu, Oct 28, 2010 at 5:35 PM, Thibaut Britz <
>> thibaut.br...@trendiction.com> wrote:
>>
>>> Hi,
>>>
>>> I have the same problem with 0.6.5
>>>
>>> New nodes will hang forever in bootstrap mode (no streams are being
>>> opened) and the receiver thread just waits for data forever:
>>>
>>>
>>>  INFO [Thread-53] 2010-10-27 20:33:37,399 SSTableReader.java (line 120)
>>> Sampling index for /hd2/cassandra/data/table_xyz/
>>> table_xyz-3-Data.db
>>>  INFO [Thread-53] 2010-10-27 20:33:37,444 StreamCompletionHandler.java
>>> (line 64) Streaming added /hd2/cassandra/data/table_xyz/table_xyz-3-Data.db
>>>
>>> Stack trace:
>>>
>>> "pool-1-thread-53" prio=10 tid=0x412f2800 nid=0x215c runnable
>>> [0x7fd7cf217000]
>>>java.lang.Thread.State: RUNNABLE
>>> at java.net.SocketInputStream.socketRead0(Native Method)
>>> at java.net.SocketInputStream.read(SocketInputStream.java:129)
>>> at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
>>> at
>>> java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
>>> at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
>>> - locked <0x7fd7e77e0520> (a java.io.BufferedInputStream)
>>> at
>>> org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:126)
>>> at
>>> org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
>>> at
>>> org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:314)
>>> at
>>> org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:262)
>>> at
>>> org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:192)
>>> at
>>> org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:1154)
>>> at
>>> org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:167)
>>> at
>>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>> at
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>> at java.lang.Thread.run(Thread.java:662)
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Thu, Oct 28, 2010 at 12:44 PM, aaron morton 
>>> wrote:
>>>
>>>> The best approach is to manually select the tokens, see the Load
>>>> Balancing section http://wiki.apache.org/cassandra/Operations Also
>>>>
>>>> Are there any log messages in the existing nodes or the new one 
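
Dimitry's StoragePort/ListenAddress suggestion earlier in the thread is quick
to rule out from another machine in the ring; a minimal reachability sketch
(the address is a placeholder; 7000 is the default StoragePort):

    import java.net.InetSocketAddress;
    import java.net.Socket;

    public class StoragePortCheck {
        public static void main(String[] args) throws Exception {
            Socket s = new Socket();
            // replace with the joining node's ListenAddress
            s.connect(new InetSocketAddress("192.0.2.1", 7000), 2000);
            System.out.println("storage port reachable");
            s.close();
        }
    }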

Re: New nodes won't bootstrap on .66

2010-10-28 Thread Thibaut Britz
Hi,

I have the same problem with 0.6.5

New nodes will hang forever in bootstrap mode (no streams are being opened)
and the receiver thread just waits for data forever:


 INFO [Thread-53] 2010-10-27 20:33:37,399 SSTableReader.java (line 120)
Sampling index for /hd2/cassandra/data/table_xyz/
table_xyz-3-Data.db
 INFO [Thread-53] 2010-10-27 20:33:37,444 StreamCompletionHandler.java (line
64) Streaming added /hd2/cassandra/data/table_xyz/table_xyz-3-Data.db

Stack trace:

"pool-1-thread-53" prio=10 tid=0x412f2800 nid=0x215c runnable
[0x7fd7cf217000]
   java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:129)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
- locked <0x7fd7e77e0520> (a java.io.BufferedInputStream)
at
org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:126)
at
org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
at
org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:314)
at
org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:262)
at
org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:192)
at
org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:1154)
at
org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:167)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)












On Thu, Oct 28, 2010 at 12:44 PM, aaron morton wrote:

> The best approach is to manually select the tokens, see the Load Balancing
> section http://wiki.apache.org/cassandra/Operations Also
>
> Are there any log messages in the existing nodes or the new one which
> mention each other?
>
> Is this a production system? Is it still running ?
>
> Sorry there is not a lot to go on, it sounds like you've done the right
> thing. I'm assuming things like the Cluster Name, seed list and port numbers
> are set correctly as the new node got some data.
>
> You'll need to dig through the logs a bit more to see that the boot
> strapping started and what was the last message it logged.
>
> Good Luck.
> Aaron
>
> On 27 Oct 2010, at 22:40, Dimitry Lvovsky wrote:
>
> Hi Aaron,
> Thanks for your reply.
>
> We still haven't solved this unfortunately.
>
> How did you start the bootstrap for the .18 node ?
>
>
> Standard way: we set "AutoBootstrap" to true and added all the servers from
> the working ring as seeds.
>
>
>> Was it the .18 or the .17 node you tried to add
>
>
> We first tried adding .17, it streamed for a while, took on 50GB of load,
> stopped streaming but then didn't enter into the ring.  We left it for a few
> days to see if it would come in, but no luck.  After that we did
>  decommission and  removeToken ( in that order) operations.
> Since we couldn't get .17 in we tried again with .18.  Before doing so we
> increased the RpcTimeoutInMillis from 1000, to 1 having read that this
> may cause the problem of nodes not entering into the ring.   It's been going
> since friday and still, like .17, won't come into the ring.
>
> Does it have a token in the config or did you use nodetool move to set it
>
> No, we didn't manually set the token in the config; rather we were relying
> on the token being assigned during bootstrap by the RandomPartitioner.
>
> Again thanks for the help.
>
> Dimitry.
>
>
>
> On Tue, Oct 26, 2010 at 10:14 PM, Aaron Morton wrote:
>
>> Dimitry, Did you get anywhere with this ?
>>
>> Was it the .18 or the .17 node you tried to add ? How did you start the
>> bootstrap for the .18 node ? Does it have a token in the config or did you
>> use nodetool move to set it?
>>
>> I had a quick look at the code. AFAIK the message about removing the fat
>> client is logged when the node does not have a record of the token the other
>> node has.
>>
>> Aaron
>>
>> On 26 Oct 2010, at 10:42 PM, Dimitry Lvovsky 
>> wrote:
>>
>> Hi All,
>> We recently upgraded from .65 to .66 after which we tried adding a new
>> node to our cluster. We left it bootstrapping and after 3 days, it still
>> refused to join the ring. The strange thing is that nodetool info shows 50GB
> of load and nodetool ring shows that it sees the rest of the ring, which it is
>> not part of. We tried the process again with another server -- again the
>> same thing as before:
>>
>>
>> //from machine 192.168.218
>>
>>
>> /opt/cassandra/bin/nodetool -h localhost -p 8999 info
>> 131373516047318302934572185119435768941
>> Load : 52.85 GB
>> Generation No : 12877

Re: Cluster load balancing?

2010-10-28 Thread Thibaut Britz
Yes. I even tried just starting one node only, and then bootstrapping
another node. (However, at the beginning a few days ago, the cluster was
unstable and unresponsive and I had to restart the cluster. Maybe something
went wrong back then.)

Anyway, I will export all the data and reimport it with the new
RandomPartitioner, which I should have used from the beginning.

Thanks,
Thibaut
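
For the reimport, evenly spaced initial tokens are the usual starting point
with RandomPartitioner, whose tokens range over [0, 2^127). A minimal
editorial sketch of the calculation:

    import java.math.BigInteger;

    public class TokenGen {
        public static void main(String[] args) {
            BigInteger range = BigInteger.valueOf(2).pow(127);
            int nodes = 7; // cluster size from this thread
            for (int i = 0; i < nodes; i++) {
                // token_i = i * 2^127 / nodes, evenly splitting the ring
                BigInteger token = range.multiply(BigInteger.valueOf(i))
                                        .divide(BigInteger.valueOf(nodes));
                System.out.println("node " + i + ": " + token);
            }
        }
    }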






On Thu, Oct 28, 2010 at 12:35 AM, Tyler Hobbs  wrote:

> Not sure if this is the cause, but do all of your nodes have the same seed
> list?  Did you bring up the seeds first?
>
> - Tyler
>
>
> On Wed, Oct 27, 2010 at 1:46 PM, Thibaut Britz <
> thibaut.br...@trendiction.com> wrote:
>
>> Depending on the range I choose, manually choosing a token will also fail
>> (the node never exits bootstrap; streams doesn't list any open streams).
>>
>>
>>  INFO [Thread-53] 2010-10-27 20:33:37,399 SSTableReader.java (line 120)
>> Sampling index for /hd2/cassandra/data/table_xyz/table_xyz-3-Data.db
>>  INFO [Thread-53] 2010-10-27 20:33:37,444 StreamCompletionHandler.java
>> (line 64) Streaming added /hd2/cassandra/data/table_xyz/table_xyz-3-Data.db
>>
>> Stack trace:
>>
>> "pool-1-thread-53" prio=10 tid=0x412f2800 nid=0x215c runnable
>> [0x7fd7cf217000]
>>java.lang.Thread.State: RUNNABLE
>> at java.net.SocketInputStream.socketRead0(Native Method)
>> at java.net.SocketInputStream.read(SocketInputStream.java:129)
>> at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
>> at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
>> at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
>> - locked <0x7fd7e77e0520> (a java.io.BufferedInputStream)
>> at
>> org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:126)
>> at
>> org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
>> at
>> org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:314)
>> at
>> org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:262)
>> at
>> org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:192)
>> at
>> org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:1154)
>> at
>> org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:167)
>> at
>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>> at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>> at java.lang.Thread.run(Thread.java:662)
>>
>>
>>
>>
>>
>>
>> On Wed, Oct 27, 2010 at 8:27 PM, Thibaut Britz <
>> thibaut.br...@trendiction.com> wrote:
>>
>>> Hello Tyler,
>>>
>>> thanks for the quick answer. That's true, I should have noticed.
>>>
>>> I also tried kicking out one node, clearing all directories and then
>>> restarting it with the bootstrap option. It received a few files, but just
>>> sat there in bootstrapping mode (streams always printed bootstrapping
>>> without any files open), forever (> 15 minutes). I stopped the application
>>> so it couldn't be load related, and also tried with a fresh cluster restart.
>>> What could cause this?
>>>
>>> (This should have the advantage of cassandra choosing a token in my range
>>> which splits the range evenly?)
>>>
>>>
>>> Thanks,
>>> Thibaut
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Wed, Oct 27, 2010 at 7:40 PM, Tyler Hobbs  wrote:
>>>
>>>> With OrderPreservingPartitioner, you have to keep the ring balanced
>>>> manually.
>>>> This is why people frequently suggest that you use RandomPartitioner
>>>> unless
>>>> you absolutely have to do otherwise.  With OPP, keys are *not* evenly
>>>> distributed
>>>> around the ring.
>>>>
>>>> Apparently you have lots of keys that are between ~'t' and 'x', so start
>>>> bunching
>>>> your tokens there.
>>>>
>>>> - Tyler
>>>>
>>>>
>>>> On Wed, Oct 27, 2010 at 12:00 PM, Thibaut Britz <
>>>> thibaut.br...@trendiction.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I h

Re: Cluster load balancing?

2010-10-27 Thread Thibaut Britz
Depending on the range I choose, manually choosing a token will also fail
(the node never exits bootstrap; streams doesn't list any open streams).


 INFO [Thread-53] 2010-10-27 20:33:37,399 SSTableReader.java (line 120)
Sampling index for /hd2/cassandra/data/table_xyz/table_xyz-3-Data.db
 INFO [Thread-53] 2010-10-27 20:33:37,444 StreamCompletionHandler.java (line
64) Streaming added /hd2/cassandra/data/table_xyz/table_xyz-3-Data.db

Stack trace:

"pool-1-thread-53" prio=10 tid=0x412f2800 nid=0x215c runnable
[0x7fd7cf217000]
   java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:129)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
- locked <0x7fd7e77e0520> (a java.io.BufferedInputStream)
at
org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:126)
at
org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
at
org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:314)
at
org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:262)
at
org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:192)
at
org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:1154)
at
org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:167)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)





On Wed, Oct 27, 2010 at 8:27 PM, Thibaut Britz <
thibaut.br...@trendiction.com> wrote:

> Hello Tyler,
>
> thanks for the quick answer. That's true, I should have noticed.
>
> I also tried kicking out one node, clearing all directories and then
> restarting it with the bootstrap option. It received a few files, but just
> sat there in bootstrapping mode (streams always printed bootstrapping
> without any files open), forever (> 15 minutes). I stopped the application
> so it couldn't be load related, and also tried with a fresh cluster restart.
> What could cause this?
>
> (This should have the advantage of cassandra choosing a token in my range
> which splits the range evenly?)
>
>
> Thanks,
> Thibaut
>
>
>
>
>
>
>
>
> On Wed, Oct 27, 2010 at 7:40 PM, Tyler Hobbs  wrote:
>
>> With OrderPreservingPartitioner, you have to keep the ring balanced
>> manually.
>> This is why people frequently suggest that you use RandomPartitioner
>> unless
>> you absolutely have to do otherwise.  With OPP, keys are *not* evenly
>> distributed
>> around the ring.
>>
>> Apparently you have lots of keys that are between ~'t' and 'x', so start
>> bunching
>> your tokens there.
>>
>> - Tyler
>>
>>
>> On Wed, Oct 27, 2010 at 12:00 PM, Thibaut Britz <
>> thibaut.br...@trendiction.com> wrote:
>>
>>> Hi,
>>>
>>> I have a little java hector test application running which writes and
>>> reads data to my little cassandra cluster (7 nodes).
>>>
>>> The data doesn't get load balanced at all:
>>>
>>> 192.168.1.12 Up 178.32 MB  8S6VvT7oKNcQTso3   |<--|
>>> 192.168.1.14 Up 30.12 MB   9tybk3nB6JCtqQU1   |   ^
>>> 192.168.1.15 Up 11.96 MB   RZVG3NC3ksqjEmYE   v   |
>>> 192.168.1.16 Up 668.7 KB   aTV6W12YxxMI31Z8   |   ^
>>> 192.168.1.10 Up 22.86 GB   u5iaQxEfyUSwnPn1   v   |
>>> 192.168.1.13 Up 22.5 GB    vZlWeU8b6LBeAcAY   |   ^
>>> 192.168.1.11 Up 22.27 GB   xrmaUS6nnrYFSk8e   |-->|
>>>
>>> What could be the issue? I couldn't find anything in the FAQ related to
>>> this
>>>
>>> Will data (writes) always be added to the server I connect to? If so, why
>>> will the replicas then always be stored on the same 2 other machines?
>>>
>>> (Tested with
>>> org.apache.cassandra.dht.OrderPreservingPartitioner
>>> on 0.6.5 and replication level 3)
>>>
>>> Thanks,
>>> Thibaut
>>>
>>>
>>>
>>>
>>
>


Re: Cluster load balancing?

2010-10-27 Thread Thibaut Britz
Hello Tyler,

thanks for the quick answer. That's true, I should have noticed.

I also tried kicking out one node, clearing all directories and then
restarting it with the bootstrap option. It received a few files, but just
sat there in bootstrapping mode (streams always printed bootstrapping
without any files open), forever (> 15 minutes). I stopped the application
so it couldn't be load related, and also tried with a fresh cluster restart.
What could cause this?

(This should have the advantage of cassandra choosing a token in my range
which splits the range evenly?)


Thanks,
Thibaut







On Wed, Oct 27, 2010 at 7:40 PM, Tyler Hobbs  wrote:

> With OrderPreservingPartitioner, you have to keep the ring balanced
> manually.
> This is why people frequently suggest that you use RandomPartitioner unless
> you absolutely have to do otherwise.  With OPP, keys are *not* evenly
> distributed
> around the ring.
>
> Apparently you have lots of keys that are between ~'t' and 'x', so start
> bunching
> your tokens there.
>
> - Tyler
>
>
> On Wed, Oct 27, 2010 at 12:00 PM, Thibaut Britz <
> thibaut.br...@trendiction.com> wrote:
>
>> Hi,
>>
>> I have a little java hector test application running which writes and
>> reads data to my little cassandra cluster (7 nodes).
>>
>> The data doesn't get load balanced at all:
>>
>> 192.168.1.12 Up 178.32 MB  8S6VvT7oKNcQTso3   |<--|
>> 192.168.1.14 Up 30.12 MB   9tybk3nB6JCtqQU1   |   ^
>> 192.168.1.15 Up 11.96 MB   RZVG3NC3ksqjEmYE   v   |
>> 192.168.1.16 Up 668.7 KB   aTV6W12YxxMI31Z8   |   ^
>> 192.168.1.10 Up 22.86 GB   u5iaQxEfyUSwnPn1   v   |
>> 192.168.1.13 Up 22.5 GB    vZlWeU8b6LBeAcAY   |   ^
>> 192.168.1.11 Up 22.27 GB   xrmaUS6nnrYFSk8e   |-->|
>>
>> What could be the issue? I couldn't find anything in the FAQ related to
>> this
>>
>> Will data (writes) always be added to the server I connect to? If so, why
>> will the replicas then always be stored on the same 2 other machines?
>>
>> (Tested with
>> org.apache.cassandra.dht.OrderPreservingPartitioner
>> on 0.6.5 and replication level 3)
>>
>> Thanks,
>> Thibaut
>>
>>
>>
>>
>


Re: Job Opportunity in Europe (Nosql, hadoop, crawling)

2010-08-25 Thread Thibaut Britz
Hello Raddi,

If you are interested, please send us your resume to the email address
mentioned at the blog post.

Whether we can provide you the environment to work on this depends only on
your qualifications and skills.

Thanks,
Thibaut


On Fri, Aug 20, 2010 at 4:55 AM, sharanabasava raddi wrote:

> Hi,
> I have worked on Cassandra, Thrift and Hector for 3 months, and I
> have written code for loading and fetching data from Cassandra on a single
> node. I want to work in this domain. Will you provide me the environment to
> work on this?
>
>
> On Wed, Aug 18, 2010 at 8:29 PM, Thibaut Britz  wrote:
>
>> Hi,
>>
>> We are searching for at least 3 more developers in the fields of search &
>> automatic content/site extraction, crawling, duplicate content, news/spam
>> detection.
>>
>> We do content fetching and aggregation (news, message boards, blogs, ...)
>> for market research institutes, media analytics companies, etc...
>>
>> We are still relatively small (mostly Harvard, ETH Zurich, and TU Munich
>> graduates), so you are still able to actively shape our company.
>>
>> You will work on interesting problems/features/products involving large
>> datasets and write production code for large scale application tools like
>> apache hadoop, cassandra, hbase, zookeeper running on over 50 servers, and
>> processing terabytes of data each day.
>>
>>
>> If you are from abroad, and want to experience a different culture for a
>> few months/years (some even stayed here their entire life ;)) in a small
>> french/german/english speaking country, why not join?
>>
>> We can also offer internships for a few months (probably 6).
>>
>> There are more details at our blog, or just contact me for further details
>> under my email thibaut.br...@trendiction.com:
>>
>> http://blog.trendiction.com/tag/jobs
>>
>> Thanks,
>> Thibaut
>>
>
>


Job Opportunity in Europe (Nosql, hadoop, crawling)

2010-08-18 Thread Thibaut Britz
Hi,

We are searching for at least 3 more developers in the fields of search &
automatic content/site extraction, crawling, duplicate content, news/spam
detection.

We do content fetching and aggregation (news, message boards, blogs, ...) for
market research institutes, media analytics companies, etc...

We are still relatively small (mostly Harvard, ETH Zurich, and TU Munich
graduates), so you are still able to actively shape our company.

You will work on interesting problems/features/products involving large
datasets and write production code for large scale application tools like
apache hadoop, cassandra, hbase, zookeeper running on over 50 servers, and
processing terabytes of data each day.


If you are from abroad, and want to experience a different culture for a few
months/years (some even stayed here their entire life ;)) in a small
french/german/english speaking country, why not join?

We can also offer internships for a few months (probably 6).

There are more details at our blog, or just contact me for further details
under my email thibaut.br...@trendiction.com:

http://blog.trendiction.com/tag/jobs

Thanks,
Thibaut