Re: Why isn't there a separate JVM per table?
I agree with Jon. The actor-based model would be the logical approach to getting more "efficient." Until then, fault tolerance has to be built into the driver: contact another node if a request fails mid-operation, then reconcile the commitlog later. I've seen many people combine an external queue to deal with the GC issues, adding yet another layer of asynchronicity. (If it's not a word, it is now.) Even in systems like SQL Server there are internal queues that get locked up under memory, storage, or CPU pressure. It's not a GC pause, but it may as well be. Even with all the tweaking, the only way to get beyond this is distributed, asynchronous systems that are self-healing.

--
Rahul Singh
rahul.si...@anant.us
Anant Corporation

On Feb 23, 2018, 4:34 AM -0500, Brian Hess wrote:
> Something folks haven't raised, but would be another impediment here, is that in Cassandra if you submit a batch (logged or unlogged) for two tables in the same keyspace with the same partition, then Cassandra collapses them into the same Mutation and the two INSERTs are processed atomically. There are a few (maybe more than a few) things that take advantage of this fact.
>
> If you move each table to its own JVM then you cannot really achieve this atomicity. So, at most, you would want to consider a JVM per keyspace (or consider touching a lot of code, or changing a pretty fundamental/deep contract in Cassandra).
>
> Brian

-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org
Re: Why isn't there a separate JVM per table?
Something folks haven't raised, but would be another impediment here, is that in Cassandra if you submit a batch (logged or unlogged) for two tables in the same keyspace with the same partition, then Cassandra collapses them into the same Mutation and the two INSERTs are processed atomically. There are a few (maybe more than a few) things that take advantage of this fact.

If you move each table to its own JVM then you cannot really achieve this atomicity. So, at most, you would want to consider a JVM per keyspace (or consider touching a lot of code, or changing a pretty fundamental/deep contract in Cassandra).

Brian

Sent from my iPhone

> On Feb 22, 2018, at 7:10 PM, J. D. Jordan wrote:
>
> I would be careful with anything per-table for memory sizing. We used to have many caches and things that could be tuned per table, but they have all since changed to being per node, as it was a real PITA to get them right. Having to do per-table heap/GC/memtable/cache tuning just sounds like a usability nightmare.
>
> -Jeremiah
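Brian's same-partition collapse behavior can be illustrated with a toy model in plain Java. This is not Cassandra's actual Mutation code; `BatchCollapseSketch`, `RowUpdate`, and the string partition keys are invented for illustration. The point is that two INSERTs targeting different tables but the same (keyspace, partition key) group under one key, so they can be applied as a single unit:

```java
import java.util.*;

// Toy sketch: batch entries keyed by (keyspace, partition key) collapse
// into one "mutation", regardless of which table in the keyspace they hit.
public class BatchCollapseSketch {

    record RowUpdate(String table, String column, String value) {}

    // Group batch entries by keyspace + partition key, mimicking how a
    // coordinator would merge them into one mutation per partition.
    static Map<String, List<RowUpdate>> collapse(
            List<Map.Entry<String, RowUpdate>> batch) {
        Map<String, List<RowUpdate>> mutations = new LinkedHashMap<>();
        for (var entry : batch) {
            mutations.computeIfAbsent(entry.getKey(), k -> new ArrayList<>())
                     .add(entry.getValue());
        }
        return mutations;
    }

    public static void main(String[] args) {
        // Two INSERTs to two different tables, same keyspace ("ks") and
        // same partition key ("user42"): they collapse into ONE mutation.
        var batch = List.of(
            Map.entry("ks/user42", new RowUpdate("users", "name", "Ada")),
            Map.entry("ks/user42", new RowUpdate("logins", "last", "2018-02-22")));
        System.out.println(collapse(batch).size()); // prints 1
    }
}
```

With one JVM per table, the two updates would live in separate heaps and the single-mutation grouping (and its atomicity) would be lost, which is Brian's objection.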
Re: Why isn't there a separate JVM per table?
There's an incredible amount of work that would need to be done in order to make any of this happen. Basically a full rewrite of the entire codebase. Years of effort. The codebase would have to move to a shared-nothing, actor- and message-based communication mechanism before any of this is possible. Fun in theory, but considering that removing singletons has been a multi-year, many-failure effort, I suspect we might need 10 years to refactor Cassandra to use multiple JVMs. By then maybe we'll have a pauseless / low-pause collector and it won't matter.

On Thu, Feb 22, 2018 at 3:59 PM kurt greaves wrote:
> Compaction in its own JVM makes sense. At the table level I'm not so sure. There have to be some serious overheads from running that many JVMs. [...]
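The shared-nothing, message-based structure described above can be sketched in a few lines of plain Java. This is an illustrative toy, not a proposal for Cassandra's internals; `CounterActor` is a made-up name. Each "actor" owns its state and serializes all access through a single-threaded mailbox, so callers communicate only by messages and never touch shared state:

```java
import java.util.concurrent.*;

// Minimal shared-nothing actor sketch: state is owned by one actor and
// mutated only on its own single thread, so no locks are needed.
public class ActorSketch {

    static class CounterActor {
        private long count = 0;                       // owned, never shared
        private final ExecutorService mailbox =
            Executors.newSingleThreadExecutor();      // one thread = serialized access

        void send(long delta) {                       // fire-and-forget message
            mailbox.execute(() -> count += delta);
        }

        long ask() throws Exception {                 // request/response message
            return mailbox.submit(() -> count).get();
        }

        void shutdown() { mailbox.shutdown(); }
    }

    public static void main(String[] args) throws Exception {
        CounterActor actor = new CounterActor();
        for (int i = 0; i < 1000; i++) actor.send(1); // any thread may send safely
        System.out.println(actor.ask());              // prints 1000
        actor.shutdown();
    }
}
```

Restructuring compaction, flush, and gossip around mailboxes like this is what would let them move into separate JVMs later, since the messages could cross a process boundary instead of a queue.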
Re: Why isn't there a separate JVM per table?
I would be careful with anything per-table for memory sizing. We used to have many caches and things that could be tuned per table, but they have all since changed to being per node, as it was a real PITA to get them right. Having to do per-table heap/GC/memtable/cache tuning just sounds like a usability nightmare.

-Jeremiah

> On Feb 22, 2018, at 6:59 PM, kurt greaves wrote:
>
> If we did it at the table level we would inevitably have to make each individual table incredibly tunable, which would be a bit tedious IMO. There's no way for us to smartly decide how much heap/memtable space/etc. each table should use (not without some decent AI, anyway).
Re: Why isn't there a separate JVM per table?
> ... compaction on its own jvm was also something I was thinking about, but
> then I realized even more JVM sharding could be done at the table level.

Compaction in its own JVM makes sense. At the table level I'm not so sure. There have to be some serious overheads from running that many JVMs. Keyspace might be reasonable, purely to isolate bad tables, but for the most part I'd think isolating every table isn't that beneficial and is pretty complicated. In most cases people just fix their modelling so that they don't generate large amounts of GC, and hopefully test enough so they know how it will behave in production.

If we did it at the table level we would inevitably have to make each individual table incredibly tunable, which would be a bit tedious IMO. There's no way for us to smartly decide how much heap/memtable space/etc. each table should use (not without some decent AI, anyway).
Re: Why isn't there a separate JVM per table?
Agree that any first efforts around compaction should go to profiling. Probably some low-hanging fruit there.

On Fri, Feb 23, 2018 at 11:55 AM, Jeff Jirsa wrote:
> Bloom filters are off-heap.
>
> To be honest, there may come a time when it makes sense to move compaction into its own JVM, but it would be FAR less effort to just profile what exists now and fix the problems.
Re: Why isn't there a separate JVM per table?
Alternative: JVM per vnode.

On Thu, Feb 22, 2018 at 4:52 PM, Carl Mueller wrote:
> Bloom filters... nevermind
Re: Why isn't there a separate JVM per table?
Bloom filters are off-heap.

To be honest, there may come a time when it makes sense to move compaction into its own JVM, but it would be FAR less effort to just profile what exists now and fix the problems.

On Thu, Feb 22, 2018 at 2:52 PM, Carl Mueller wrote:
> Bloom filters... nevermind
>
> On Thu, Feb 22, 2018 at 4:48 PM, Carl Mueller wrote:
>> Is the current reason for a large starting heap due to the memtable?
Re: Why isn't there a separate JVM per table?
Bloom filters... nevermind

On Thu, Feb 22, 2018 at 4:48 PM, Carl Mueller wrote:
> Is the current reason for a large starting heap due to the memtable?
Re: Why isn't there a separate JVM per table?
Is the current reason for a large starting heap due to the memtable?

On Thu, Feb 22, 2018 at 4:44 PM, Carl Mueller wrote:
> ... compaction on its own jvm was also something I was thinking about,
> but then I realized even more JVM sharding could be done at the table level.
Re: Why isn't there a separate JVM per table?
... compaction on its own jvm was also something I was thinking about, but then I realized even more JVM sharding could be done at the table level.

On Thu, Feb 22, 2018 at 4:09 PM, Jon Haddad wrote:
> Yeah, I'm in the compaction-in-its-own-JVM camp, in an ideal world where we're isolating crazy GC-churning parts of the DB. It would mean reworking how tasks are created and removal of all shared state in favor of messaging + a smarter manager, which IMO would be a good idea regardless. [...]
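The liveness check raised earlier in the thread (a separate gossip process recognizing when the storage process has died, versus when it is merely stuck in a long stop-the-world GC) could look roughly like this. A hedged sketch: the class, the enum, and the threshold are hypothetical, and the two inputs would come from something like `ProcessHandle.isAlive()` plus the age of a heartbeat the storage JVM periodically writes:

```java
// Sketch: classify storage-JVM health from process liveness + heartbeat age.
// A live process that has gone silent is likely in a STW GC, not dead.
public class StorageLiveness {

    enum State { HEALTHY, SUSPECT_GC, DEAD }

    static State classify(boolean processAlive, long msSinceHeartbeat,
                          long heartbeatTimeoutMs) {
        if (!processAlive) return State.DEAD;          // process gone: truly dead
        if (msSinceHeartbeat > heartbeatTimeoutMs)
            return State.SUSPECT_GC;                   // alive but silent: likely STW GC
        return State.HEALTHY;
    }

    public static void main(String[] args) {
        System.out.println(classify(true, 50, 1000));    // prints HEALTHY
        System.out.println(classify(true, 5000, 1000));  // prints SUSPECT_GC
        System.out.println(classify(false, 50, 1000));   // prints DEAD
    }
}
```

The subtlety is the SUSPECT_GC case: the gossip process should keep advertising the node as up there, which is exactly the false-positive "node marked dead during GC" problem this split is meant to avoid.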
Re: Why isn't there a separate JVM per table?
Yeah, I'm in the compaction-in-its-own-JVM camp, in an ideal world where we're
isolating the crazy GC-churning parts of the DB. It would mean reworking how
tasks are created and removing all shared state in favor of messaging + a
smarter manager, which imo would be a good idea regardless.

It might be a better use of time (especially for 4.0) to do some GC
performance profiling and cut down on the allocations, since that doesn't
involve a massive effort.

I've been meaning to do a little benchmarking and profiling for a while now,
and it seems like a few others have the same inclination as well; maybe now is
a good time to coordinate that. A nice perf bump for 4.0 would be very
rewarding.

Jon

> On Feb 22, 2018, at 2:00 PM, Nate McCall wrote:
>
> I've heard a couple of folks pontificate on compaction in its own
> process as well, given it has such a high impact on GC. Not sure about
> the value of individual tables. Interesting idea though.
>
> On Fri, Feb 23, 2018 at 10:45 AM, Gary Dusbabek wrote:
>> I've given it some thought in the past. In the end, I usually talk myself
>> out of it because I think it increases the surface area for failure. That
>> is, managing N processes is more difficult than managing one process. But
>> if the additional failure modes are addressed, there are some interesting
>> possibilities.
>>
>> For example, having gossip in its own process would decrease the odds that
>> a node is marked dead because STW GC is happening in the storage JVM. On
>> the flip side, you'd need checks to make sure that the gossip process can
>> recognize when the storage process has died vs. just running a long GC.
>>
>> I don't know that I'd go so far as to have separate processes for
>> keyspaces, etc.
>>
>> There is probably some interesting work that could be done to support the
>> orgs who run multiple cassandra instances on the same node (multiple
>> gossipers in that case is at least a little wasteful).
>>
>> I've also played around with using domain sockets for IPC inside of
>> cassandra. I never ran a proper benchmark, but there were some throughput
>> advantages to this approach.
>>
>> Cheers,
>>
>> Gary.
>>
>> On Thu, Feb 22, 2018 at 8:39 PM, Carl Mueller wrote:
>>
>>> GC pauses may have been improved in newer releases, since we are on
>>> 2.1.x, but I was wondering why cassandra uses one jvm for all tables
>>> and keyspaces, intermingling the heap for on-JVM objects.
>>>
>>> ... so why doesn't cassandra spin off a jvm per table, so each jvm can
>>> be tuned per table, GC tuned per table, and GC impacts in one table
>>> don't affect other tables? It would probably increase the number of
>>> endpoints if we avoid having an overarching query router.
Re: Why isn't there a separate JVM per table?
I've heard a couple of folks pontificate on compaction in its own process as
well, given it has such a high impact on GC. Not sure about the value of
individual tables. Interesting idea though.

On Fri, Feb 23, 2018 at 10:45 AM, Gary Dusbabek wrote:
> I've given it some thought in the past. In the end, I usually talk myself
> out of it because I think it increases the surface area for failure. That
> is, managing N processes is more difficult than managing one process. But
> if the additional failure modes are addressed, there are some interesting
> possibilities.
>
> For example, having gossip in its own process would decrease the odds that
> a node is marked dead because STW GC is happening in the storage JVM. On
> the flip side, you'd need checks to make sure that the gossip process can
> recognize when the storage process has died vs. just running a long GC.
>
> I don't know that I'd go so far as to have separate processes for
> keyspaces, etc.
>
> There is probably some interesting work that could be done to support the
> orgs who run multiple cassandra instances on the same node (multiple
> gossipers in that case is at least a little wasteful).
>
> I've also played around with using domain sockets for IPC inside of
> cassandra. I never ran a proper benchmark, but there were some throughput
> advantages to this approach.
>
> Cheers,
>
> Gary.
>
> On Thu, Feb 22, 2018 at 8:39 PM, Carl Mueller wrote:
>
>> GC pauses may have been improved in newer releases, since we are on
>> 2.1.x, but I was wondering why cassandra uses one jvm for all tables
>> and keyspaces, intermingling the heap for on-JVM objects.
>>
>> ... so why doesn't cassandra spin off a jvm per table, so each jvm can
>> be tuned per table, GC tuned per table, and GC impacts in one table
>> don't affect other tables? It would probably increase the number of
>> endpoints if we avoid having an overarching query router.
Re: Why isn't there a separate JVM per table?
I've given it some thought in the past. In the end, I usually talk myself out
of it because I think it increases the surface area for failure. That is,
managing N processes is more difficult than managing one process. But if the
additional failure modes are addressed, there are some interesting
possibilities.

For example, having gossip in its own process would decrease the odds that a
node is marked dead because STW GC is happening in the storage JVM. On the
flip side, you'd need checks to make sure that the gossip process can
recognize when the storage process has died vs. just running a long GC.

I don't know that I'd go so far as to have separate processes for keyspaces,
etc.

There is probably some interesting work that could be done to support the orgs
who run multiple cassandra instances on the same node (multiple gossipers in
that case is at least a little wasteful).

I've also played around with using domain sockets for IPC inside of cassandra.
I never ran a proper benchmark, but there were some throughput advantages to
this approach.

Cheers,

Gary.

On Thu, Feb 22, 2018 at 8:39 PM, Carl Mueller wrote:

> GC pauses may have been improved in newer releases, since we are on 2.1.x,
> but I was wondering why cassandra uses one jvm for all tables and
> keyspaces, intermingling the heap for on-JVM objects.
>
> ... so why doesn't cassandra spin off a jvm per table, so each jvm can be
> tuned per table, GC tuned per table, and GC impacts in one table don't
> affect other tables? It would probably increase the number of endpoints
> if we avoid having an overarching query router.
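To make Gary's domain-socket remark concrete: since Java 16, `SocketChannel`/`ServerSocketChannel` support `StandardProtocolFamily.UNIX` with `UnixDomainSocketAddress`, so two co-located processes can exchange messages without going through the TCP stack. The echo sketch below is illustrative only (a single process playing both sides over a temp-file socket), not anything Cassandra actually ships, and assumes a Unix-like OS and JDK 16+:

```java
import java.net.StandardProtocolFamily;
import java.net.UnixDomainSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class DomainSocketEcho {

    /** Send one small message over a Unix domain socket and return the echoed reply. */
    static String roundTrip(String msg) throws Exception {
        Path sock = Files.createTempDirectory("ipc-demo").resolve("demo.sock");
        UnixDomainSocketAddress addr = UnixDomainSocketAddress.of(sock);
        try (ServerSocketChannel server = ServerSocketChannel.open(StandardProtocolFamily.UNIX)) {
            server.bind(addr);   // creates the socket file on disk

            // "Server" side: stands in for what a storage process might expose.
            Thread echo = new Thread(() -> {
                try (SocketChannel peer = server.accept()) {
                    ByteBuffer buf = ByteBuffer.allocate(256);
                    peer.read(buf);
                    buf.flip();
                    peer.write(buf);  // echo the request back
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            });
            echo.start();

            // "Client" side: stands in for a sibling process (e.g. gossip) asking for status.
            try (SocketChannel client = SocketChannel.open(StandardProtocolFamily.UNIX)) {
                client.connect(addr);
                client.write(ByteBuffer.wrap(msg.getBytes(StandardCharsets.UTF_8)));
                ByteBuffer reply = ByteBuffer.allocate(256);
                client.read(reply);
                reply.flip();
                echo.join();
                return StandardCharsets.UTF_8.decode(reply).toString();
            }
        } finally {
            Files.deleteIfExists(sock);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip("PING"));
    }
}
```

For a one-shot 4-byte message the single read/write pair is enough; a real IPC protocol would of course need framing and partial-read handling on top of this.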
Re: Why isn't there a separate JVM per table?
It's an interesting idea. I'd wonder how much overhead you'd end up with from
message parsing, and whether it would negate any potential GC wins. Rick
Branson had played around a bunch with running storage nodes and doubling
down on the old "fat client" model. If you had 1k tables (yes, it barely
works, but we don't explicitly prevent it) you can't really run that many
JVM processes on a single box.

> On Feb 22, 2018, at 12:39 PM, Carl Mueller wrote:
>
> GC pauses may have been improved in newer releases, since we are on 2.1.x,
> but I was wondering why cassandra uses one jvm for all tables and
> keyspaces, intermingling the heap for on-JVM objects.
>
> ... so why doesn't cassandra spin off a jvm per table, so each jvm can be
> tuned per table, GC tuned per table, and GC impacts in one table don't
> affect other tables? It would probably increase the number of endpoints
> if we avoid having an overarching query router.
Why isn't there a separate JVM per table?
GC pauses may have been improved in newer releases, since we are on 2.1.x, but
I was wondering why cassandra uses one jvm for all tables and keyspaces,
intermingling the heap for on-JVM objects.

... so why doesn't cassandra spin off a jvm per table, so each jvm can be
tuned per table, GC tuned per table, and GC impacts in one table don't affect
other tables? It would probably increase the number of endpoints if we avoid
having an overarching query router.
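To show what the per-table tuning in the original question could look like if it existed: each table's process would get its own heap size and collector flags on its command line. This launcher is entirely hypothetical; the `cassandra.table` system property and the `org.example.TableStorageDaemon` main class are invented for illustration, and nothing in Cassandra works this way:

```java
import java.util.List;
import java.util.Map;

/** Hypothetical launcher sketching per-table JVM tuning via separate processes. */
public class PerTableJvmLauncher {

    /** Build a java command line with table-specific heap and GC flags. */
    static List<String> command(String table, Map<String, String> tuning) {
        return List.of(
            "java",
            "-Xms" + tuning.getOrDefault("heap", "512m"),
            "-Xmx" + tuning.getOrDefault("heap", "512m"),
            "-XX:+Use" + tuning.getOrDefault("gc", "G1") + "GC",
            "-Dcassandra.table=" + table,       // hypothetical system property
            "org.example.TableStorageDaemon"    // hypothetical per-table daemon class
        );
    }

    public static void main(String[] args) {
        // A write-heavy table gets a bigger heap than the default:
        List<String> cmd = command("ks1.events", Map.of("heap", "4g", "gc", "G1"));
        System.out.println(String.join(" ", cmd));
        // Actually launching would just be:
        // new ProcessBuilder(cmd).inheritIO().start();  // not run in this sketch
    }
}
```

The sketch also makes the thread's cost argument visible: every entry in this list is per-process state the operator now has to get right N times, which is exactly the tuning burden the replies above object to.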