[jira] [Updated] (KAFKA-9148) Consider forking RocksDB for Streams

2019-12-16 Thread Sophie Blee-Goldman (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sophie Blee-Goldman updated KAFKA-9148:
---
Description: 
We recently upgraded our RocksDB dependency to 5.18 for its memory-management 
abilities (namely the WriteBufferManager, see KAFKA-8215). Unfortunately, 
someone from Flink recently discovered a ~8% [performance 
regression|https://github.com/facebook/rocksdb/issues/5774] that exists in all 
versions 5.18+ (up through the current newest version, 6.2.2). Flink was able 
to react to this by downgrading to 5.17 and [picking the 
WriteBufferManager|https://github.com/dataArtisans/frocksdb/pull/4] into their 
fork (fRocksDB).

Due to this and other reasons enumerated below, we should consider also forking 
our own RocksDB for Streams.

Pros:
 * We can avoid passing sudden breaking changes on to our users, such as the 
removal of methods with no deprecation period (see discussion on KAFKA-8897)
 * We can pick whichever version has the best performance for our needs, and 
cherry-pick any new features, metrics, etc. that we need rather than being 
forced to upgrade (breaking user code, introducing regressions, etc.)
 * Support for some architectures does not exist in all RocksDB versions, 
making Streams completely unusable for some users until we can upgrade the 
rocksdb dependency to one that supports their specific case. It's worth noting 
that we've only had [one user|https://issues.apache.org/jira/browse/KAFKA-9225] 
hit this so far (that we know of), and some workarounds have been discussed on 
the ticket.
 * The Java API seems to be a very low priority to the rocksdb folks.
 ** They leave out critical functionality, features, and configuration options 
that have been in the C++ API for a very long time
 ** Those that do make it over often have random gaps in the API such as 
setters but no getters (see [rocksdb PR 
#5186|https://github.com/facebook/rocksdb/pull/5186])
 ** Others are poorly designed and require too many trips across the JNI, 
making otherwise incredibly useful features prohibitively expensive.
 *** [Custom 
Comparator|https://github.com/facebook/rocksdb/issues/538#issuecomment-83145980]:
 a custom comparator could significantly improve the performance of session 
windows. This is trivial to do, but given the high performance cost of crossing 
the JNI, it is currently only practical to use a C++ comparator
 *** [Prefix Seek|https://github.com/facebook/rocksdb/issues/6004]: not 
currently used by Streams but a commonly requested feature, and may also allow 
improved range queries
 ** Even when an external contributor develops a solution for poorly performing 
Java functionality and helpfully tries to contribute their patch back to 
rocksdb, it gets ignored by the rocksdb people ([rocksdb PR 
#2283|https://github.com/facebook/rocksdb/pull/2283])
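
To make the custom comparator point above a bit more concrete, here is a minimal sketch in plain Java with no RocksDB calls at all. Everything in it is an assumption made for illustration: the class name, the key layout (record key, then 8-byte end timestamp, then 8-byte start timestamp), and the choice of a "start time first" ordering. The ticket does not say which ordering would actually help session windows. The sketch only shows how a layout-aware comparator can order the same keys differently from a plain bytewise one. Each such comparison would have to cross the JNI if it were written in Java.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Comparator;

class SessionKeyOrderSketch {
    // Hypothetical layout for this sketch: [record key][8-byte end ms][8-byte start ms].
    static final int TS_BYTES = 2 * Long.BYTES;

    static byte[] encode(String key, long startMs, long endMs) {
        byte[] k = key.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(k.length + TS_BYTES)
                .put(k).putLong(endMs).putLong(startMs).array();
    }

    // Unsigned lexicographic comparison of the first aLen bytes of a and bLen bytes of b.
    static int compareUnsigned(byte[] a, int aLen, byte[] b, int bLen) {
        int n = Math.min(aLen, bLen);
        for (int i = 0; i < n; i++) {
            int c = Integer.compare(a[i] & 0xff, b[i] & 0xff);
            if (c != 0) return c;
        }
        return Integer.compare(aLen, bLen);
    }

    // Orders whole keys byte by byte, with no knowledge of the layout.
    static final Comparator<byte[]> BYTEWISE =
            (a, b) -> compareUnsigned(a, a.length, b, b.length);

    // Layout-aware ordering (assumed): record key, then start time, then end time.
    static final Comparator<byte[]> START_FIRST = (a, b) -> {
        int aKeyLen = a.length - TS_BYTES;
        int bKeyLen = b.length - TS_BYTES;
        int c = compareUnsigned(a, aKeyLen, b, bKeyLen);
        if (c != 0) return c;
        long aEnd = ByteBuffer.wrap(a).getLong(aKeyLen);
        long bEnd = ByteBuffer.wrap(b).getLong(bKeyLen);
        long aStart = ByteBuffer.wrap(a).getLong(aKeyLen + Long.BYTES);
        long bStart = ByteBuffer.wrap(b).getLong(bKeyLen + Long.BYTES);
        c = Long.compare(aStart, bStart);
        return c != 0 ? c : Long.compare(aEnd, bEnd);
    };

    public static void main(String[] args) {
        byte[] longSession = encode("user", 10L, 100L);  // starts early, ends late
        byte[] shortSession = encode("user", 50L, 60L);  // starts later, ends earlier
        // Bytewise order follows the end timestamp, so the long session sorts last.
        System.out.println(BYTEWISE.compare(longSession, shortSession) > 0);    // prints true
        // The layout-aware order follows the start timestamp instead.
        System.out.println(START_FIRST.compare(longSession, shortSession) < 0); // prints true
    }
}
```

The two comparators disagree on the same pair of keys. That is the kind of control a custom comparator gives, and the kind of per-comparison work that becomes expensive when it has to cross the JNI.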

Cons:
 * More work (not to be trivialized, the truth is we don't and can't know how 
much extra work this will ultimately be)

Given that we rarely upgrade the RocksDB dependency, use only some fraction of 
its features, and would need or want to make only minimal changes ourselves, it 
seems like we could actually get away with very little extra work by forking 
rocksdb. Note that as of this writing the frocksdb repo has only needed to open 
5 PRs on top of the actual rocksdb (two of them trivial). Of course, the level 
of effort (LOE) to maintain this will only grow over time, so we should think 
carefully about whether and when to start taking on this potential burden.

 


[jira] [Updated] (KAFKA-9148) Consider forking RocksDB for Streams

2019-12-05 Thread Sophie Blee-Goldman (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sophie Blee-Goldman updated KAFKA-9148:
---
Description: 
We recently upgraded our RocksDB dependency to 5.18 for its memory-management 
abilities (namely the WriteBufferManager, see KAFKA-8215). Unfortunately, 
someone from Flink recently discovered a ~8% [performance 
regression|https://github.com/facebook/rocksdb/issues/5774] that exists in all 
versions 5.18+ (up through the current newest version, 6.2.2). Flink was able 
to react to this by downgrading to 5.17 and [picking the 
WriteBufferManager|https://github.com/dataArtisans/frocksdb/pull/4] into their 
fork (fRocksDB).

Due to this and other reasons enumerated below, we should consider also forking 
our own RocksDB for Streams.

Pros:
 * We can avoid passing sudden breaking changes on to our users, such as the 
removal of methods with no deprecation period (see discussion on KAFKA-8897)
 * We can pick whichever version has the best performance for our needs, and 
pick over any new features, metrics, etc that we need to use rather than being 
forced to upgrade (and breaking user code, introducing regression, etc)
 * Support for some architectures does not exist in all RocksDB versions, 
making Streams completely unusable for some users until we can upgrade the 
rocksdb dependency to one that supports their specific case
 * The Java API seems to be a very low priority to the rocksdb folks.
 ** They leave out critical functionality, features, and configuration options 
that have been in the c++ API for a very long time
 ** Those that do make it over often have random gaps in the API such as 
setters but no getters (see [rocksdb PR 
#5186|https://github.com/facebook/rocksdb/pull/5186])
 ** Others are poorly designed and require too many trips across the JNI, 
making otherwise incredibly useful features prohibitively expensive.
 *** [Custom 
comparator|https://github.com/facebook/rocksdb/issues/538#issuecomment-83145980]: 
a custom comparator could significantly improve the performance of session 
windows. This is trivial to do, but given the high performance cost of crossing 
the JNI, it is currently only practical to use a C++ comparator
 *** [Prefix Seek|https://github.com/facebook/rocksdb/issues/6004]: not 
currently used by Streams but a commonly requested feature, and may also allow 
improved range queries
 ** Even when an external contributor develops a solution for poorly performing 
Java functionality and helpfully tries to contribute their patch back to 
rocksdb, it gets ignored by the rocksdb people ([rocksdb PR 
#2283|https://github.com/facebook/rocksdb/pull/2283])

Cons:
 * More work (not to be trivialized, the truth is we don't and can't know how 
much extra work this will ultimately be)

Given that we rarely upgrade the Rocks dependency, use only some fraction of 
its features, and would need or want to make only minimal changes ourselves, it 
seems like we could actually get away with very little extra work by forking 
rocksdb. Note that as of this writing the frocksdb repo has only needed to open 
5 PRs on top of the actual rocksdb (two of them trivial). Of course, the LOE to 
maintain this will only grow over time, so we should think carefully about 
whether and when to start taking on this potential burden.

 


[jira] [Updated] (KAFKA-9148) Consider forking RocksDB for Streams

2019-12-05 Thread Sophie Blee-Goldman (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sophie Blee-Goldman updated KAFKA-9148:
---
Description: 
We recently upgraded our RocksDB dependency to 5.18 for its memory-management 
abilities (namely the WriteBufferManager, see KAFKA-8215). Unfortunately, 
someone from Flink recently discovered a ~8% [performance 
regression|https://github.com/facebook/rocksdb/issues/5774] that exists in all 
versions 5.18+ (up through the current newest version, 6.2.2). Flink was able 
to react to this by downgrading to 5.17 and [picking the 
WriteBufferManager|https://github.com/dataArtisans/frocksdb/pull/4] into their 
fork (fRocksDB).

Due to this and other reasons enumerated below, we should consider also forking 
our own RocksDB for Streams.

Pros:
 * We can avoid passing sudden breaking changes on to our users, such as the 
removal of methods with no deprecation period (see discussion on KAFKA-8897)
 * We can pick whichever version has the best performance for our needs, and 
pick over any new features, metrics, etc that we need to use rather than being 
forced to upgrade (and breaking user code, introducing regression, etc)
 * Support for some architectures does not exist in all RocksDB versions, 
making Streams completely unusable for some users until we can upgrade the 
rocksdb dependency to one that supports their specific case
 * The Java API seems to be a very low priority to the rocksdb folks.
 ** They leave out critical functionality, features, and configuration options 
that have been in the c++ API for a very long time
 ** Those that do make it over often have random gaps in the API such as 
setters but no getters (see [rocksdb PR 
#5186|https://github.com/facebook/rocksdb/pull/5186])
 ** Others are poorly designed and require too many trips across the JNI, 
making otherwise incredibly useful features prohibitively expensive.
 *** [Custom 
comparator|https://github.com/facebook/rocksdb/issues/538#issuecomment-83145980]: 
a custom comparator could significantly improve the performance of session 
windows. This is trivial to do, but given the high performance cost of crossing 
the JNI, it is currently only practical to use a C++ comparator
 *** [Prefix Seek|https://github.com/facebook/rocksdb/issues/6004]: not 
currently used by Streams but a commonly requested feature, and may also allow 
improved range queries
 ** Even when an external contributor develops a solution for poorly performing 
Java functionality and helpfully tries to contribute their patch back to 
rocksdb, it gets ignored by the rocksdb people ([rocksdb PR 
#2283|https://github.com/facebook/rocksdb/pull/2283])

Cons:
 * More work (not to be trivialized, the truth is we don't and can't know how 
much extra work this will ultimately be)

Given that we rarely upgrade the Rocks dependency, use only some fraction of 
its features, and would need or want to make only minimal changes ourselves, it 
seems like we could actually get away with very little extra work by forking 
rocksdb. Note that as of this writing the frocksdb repo has only needed to open 
5 PRs on top of the actual rocksdb (two of them trivial). Of course, the LOE to 
maintain this will only grow over time, so we should think carefully about 
whether and when to start taking on this potential burden.

 


[jira] [Updated] (KAFKA-9148) Consider forking RocksDB for Streams

2019-11-05 Thread Sophie Blee-Goldman (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sophie Blee-Goldman updated KAFKA-9148:
---
Description: 
We recently upgraded our RocksDB dependency to 5.18 for its memory-management 
abilities (namely the WriteBufferManager, see KAFKA-8215). Unfortunately, 
someone from Flink recently discovered a ~8% [performance 
regression|https://github.com/facebook/rocksdb/issues/5774] that exists in all 
versions 5.18+ (up through the current newest version, 6.2.2). Flink was able 
to react to this by downgrading to 5.17 and [picking the 
WriteBufferManager|https://github.com/dataArtisans/frocksdb/pull/4] into their 
fork (fRocksDB).

Due to this and other reasons enumerated below, we should consider also forking 
our own RocksDB for Streams.

Pros:
 * We can avoid passing sudden breaking changes on to our users, such as the 
removal of methods with no deprecation period (see discussion on KAFKA-8897)
 * We can pick whichever version has the best performance for our needs, and 
pick over any new features, metrics, etc that we need to use rather than being 
forced to upgrade (and breaking user code, introducing regression, etc)
 * The Java API seems to be a very low priority to the rocksdb folks.
 ** They leave out critical functionality, features, and configuration options 
that have been in the c++ API for a very long time
 ** Those that do make it over often have random gaps in the API such as 
setters but no getters (see [rocksdb PR 
#5186|https://github.com/facebook/rocksdb/pull/5186])
 ** Others are poorly designed and require too many trips across the JNI, 
making otherwise incredibly useful features prohibitively expensive.
 *** [Custom 
comparator|https://github.com/facebook/rocksdb/issues/538#issuecomment-83145980]: 
a custom comparator could significantly improve the performance of session 
windows. This is trivial to do, but given the high performance cost of crossing 
the JNI, it is currently only practical to use a C++ comparator
 *** [Prefix Seek|https://github.com/facebook/rocksdb/issues/6004]: not 
currently used by Streams but a commonly requested feature, and may also allow 
improved range queries
 ** Even when an external contributor develops a solution for poorly performing 
Java functionality and helpfully tries to contribute their patch back to 
rocksdb, it gets ignored by the rocksdb people ([rocksdb PR 
#2283|https://github.com/facebook/rocksdb/pull/2283])

Cons:
 * more work

Given that we rarely upgrade the Rocks dependency, use only some fraction of 
its features, and would need or want to make only minimal changes ourselves, it 
seems like we could actually get away with very little extra work by forking 
rocksdb. Note that as of this writing the frocksdb repo has only needed to open 
5 PRs on top of the actual rocksdb (two of them trivial).

 


[jira] [Updated] (KAFKA-9148) Consider forking RocksDB for Streams

2019-11-05 Thread Sophie Blee-Goldman (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sophie Blee-Goldman updated KAFKA-9148:
---
Description: 
We recently upgraded our RocksDB dependency to 5.18 for its memory-management 
abilities (namely the WriteBufferManager, see KAFKA-8215). Unfortunately, 
someone from Flink recently discovered a ~8% performance regression that exists 
in all versions 5.18+ (up through the current newest version, 6.2.2). Flink was 
able to react to this by downgrading to 5.17 and [picking the 
WriteBufferManager|https://github.com/dataArtisans/frocksdb/pull/4] into their 
fork (fRocksDB).

Due to this and other reasons enumerated below, we should consider also forking 
our own RocksDB for Streams.

Pros:
 * We can avoid passing sudden breaking changes on to our users, such as the 
removal of methods with no deprecation period (see discussion on KAFKA-8897)
 * We can pick whichever version has the best performance for our needs, and 
pick over any new features, metrics, etc that we need to use rather than being 
forced to upgrade (and breaking user code, introducing regression, etc)
 * The Java API seems to be a very low priority to the rocksdb folks.
 ** They leave out critical functionality, features, and configuration options 
that have been in the c++ API for a very long time
 ** Those that do make it over often have random gaps in the API such as 
setters but no getters (see [rocksdb PR 
#5186|https://github.com/facebook/rocksdb/pull/5186])
 ** Others are poorly designed and require too many trips across the JNI, 
making otherwise incredibly useful features prohibitively expensive.
 *** [Custom 
comparator|https://github.com/facebook/rocksdb/issues/538#issuecomment-83145980]: 
a custom comparator could significantly improve the performance of session 
windows. This is trivial to do, but given the high performance cost of crossing 
the JNI, it is currently only practical to use a C++ comparator
 *** [Prefix Seek|https://github.com/facebook/rocksdb/issues/6004]: not 
currently used by Streams but a commonly requested feature, and may also allow 
improved range queries
 ** Even when an external contributor develops a solution for poorly performing 
Java functionality and helpfully tries to contribute their patch back to 
rocksdb, it gets ignored by the rocksdb people ([rocksdb PR 
#2283|https://github.com/facebook/rocksdb/pull/2283])

Cons:
 * more work

Given that we rarely upgrade the Rocks dependency, use only some fraction of 
its features, and would need or want to make only minimal changes ourselves, it 
seems like we could actually get away with very little extra work by forking 
rocksdb. Note that as of this writing the frocksdb repo has only needed to open 
5 PRs on top of the actual rocksdb (two of them trivial).

 

[jira] [Updated] (KAFKA-9148) Consider forking RocksDB for Streams

2019-11-05 Thread Sophie Blee-Goldman (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sophie Blee-Goldman updated KAFKA-9148:
---
Description: 
We recently upgraded our RocksDB dependency to 5.18 for its memory-management 
abilities (namely the WriteBufferManager, see KAFKA-8215). Unfortunately, 
someone from Flink recently discovered a ~8% performance regression that exists 
in all versions 5.18+ (up through the current newest version, 6.2.2). Flink was 
able to react to this by downgrading to 5.17 and [picking the 
WriteBufferManager|https://github.com/dataArtisans/frocksdb/pull/4] into their 
fork (fRocksDB).

Due to this and other reasons enumerated below, we should consider also forking 
our own RocksDB for Streams.

Pros:
 * We can avoid passing sudden breaking changes on to our users, such as the 
removal of methods with no deprecation period (see discussion on KAFKA-8897)
 * We can pick whichever version has the best performance for our needs, and 
pick over any new features, metrics, etc that we need to use rather than being 
forced to upgrade (and breaking user code, introducing regression, etc)
 * The Java API seems to be a very low priority to the rocksdb folks.
 ** They leave out critical functionality, features, and configuration options 
that have been in the c++ API for a very long time
 ** Those that do make it over often have random gaps in the API such as 
setters but no getters (see [rocksdb PR 
#5186|https://github.com/facebook/rocksdb/pull/5186])
 ** Others are poorly designed and require too many trips across the JNI, 
making otherwise incredibly useful features prohibitively expensive.
 *** [Custom 
comparator|https://github.com/facebook/rocksdb/issues/538#issuecomment-83145980]: 
a custom comparator could significantly improve the performance of session 
windows. This is trivial to do, but given the high performance cost of crossing 
the JNI, it is currently only practical to use a C++ comparator
 *** [Prefix Seek|https://github.com/facebook/rocksdb/issues/6004]: not 
currently used by Streams but a commonly requested feature, and may also allow 
improved range queries
 ** Even when an external contributor develops a solution for poorly performing 
Java functionality and helpfully tries to contribute their patch back to 
rocksdb, it gets ignored by the rocksdb people ([rocksdb PR 
#2283|https://github.com/facebook/rocksdb/pull/2283])

Cons:
 * more work

Given that we rarely upgrade the Rocks dependency, use only some fraction of 
its features, and would need or want to make only minimal changes ourselves, it 
seems like we could actually get away with very little extra work by forking 
rocksdb. 

 


[jira] [Updated] (KAFKA-9148) Consider forking RocksDB for Streams

2019-11-05 Thread Sophie Blee-Goldman (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sophie Blee-Goldman updated KAFKA-9148:
---
Description: 
We recently upgraded our RocksDB dependency to 5.18 for its memory-management 
abilities (namely WriteBufferManager, see KAFKA-8215). Unfortunately, someone 
recently discovered a ~8% performance regression that exists in all versions 
5.18+ (latest being 6.2.2 as of now). Flink was able to react to this by 
downgrading to 5.17 and [picking the 
WriteBufferManager|https://github.com/dataArtisans/frocksdb/pull/4] into their 
fork (fRocksDB).

Due to this and other reasons enumerated below, we should consider also forking 
our own RocksDB for Streams.

Pros:
 * We can avoid passing sudden breaking changes on to our users, such as the 
removal of methods with no deprecation period (see discussion on KAFKA-8897)
 * We can pick whichever version has the best performance for our needs, and 
pick over any new features, metrics, etc that we need to use rather than being 
forced to upgrade (and breaking user code, introducing regression, etc)
 * The Java API seems to be a very low priority to the rocksdb folks.
 ** They leave out critical functionality, features, and configuration options 
that have been in the c++ API for a very long time
 ** Those that do make it over often have random gaps in the API such as 
setters but no getters (see [rocksdb PR 
#5186|https://github.com/facebook/rocksdb/pull/5186])
 ** Others are poorly designed and require too many trips across the JNI, 
making otherwise incredibly useful features prohibitively expensive.
 *** [Custom 
comparator|https://github.com/facebook/rocksdb/issues/538#issuecomment-83145980]: 
a custom comparator could significantly improve the performance of session 
windows. This is trivial to do, but given the high performance cost of crossing 
the JNI, it is currently only practical to use a c++ comparator
 *** [Prefix Seek|https://github.com/facebook/rocksdb/issues/6004]: not 
currently used by Streams but a commonly requested feature, and may also allow 
improved range queries
 ** Even when an external contributor develops a solution for poorly performing 
Java functionality and helpfully tries to contribute their patch back to 
rocksdb, it gets ignored by the rocksdb people ([rocksdb PR 
#2283|https://github.com/facebook/rocksdb/pull/2283])

Cons:
 * more work

 

Given that we rarely upgrade the Rocks dependency, use only some fraction of 
its features, and would need or want to make only minimal changes ourselves, it 
seems like we could actually get away with very little extra work by forking 
rocksdb. 

 

  was:
We recently upgraded our RocksDB dependency to 5.18 for its memory-management 
abilities (WriteBufferManager -- KAFKA-8215). Unfortunately, someone recently 
discovered a ~8% performance regression that exists in all versions 5.18+ 
(latest being 6.2.2 as of now). Flink was able to react to this by downgrading 
to 5.17 and [picking the 
WriteBufferManager|https://github.com/dataArtisans/frocksdb/pull/4] to their 
fork (fRocksDB).

Due to this and other reasons enumerated below, we should consider also forking 
our own RocksDB for Streams.

Pros:
 * We can avoid passing sudden breaking changes on to our users, such as the 
removal of methods with no deprecation period (see discussion on KAFKA-8897)
 * We can pick whichever version has the best performance for our needs, and 
cherry-pick any new features, metrics, etc. that we need rather than being 
forced to upgrade (breaking user code, introducing regressions, etc.)
 * The Java API seems to be a very low priority to the rocksdb folks.
 ** They leave out critical functionality, features, and configuration options 
that have been in the c++ API for a very long time
 ** Those that do make it over often have random gaps in the API such as 
setters but no getters (see [rocksdb PR 
#5186|https://github.com/facebook/rocksdb/pull/5186])
 ** Others are poorly designed and require too many trips across the JNI, 
making otherwise incredibly useful features prohibitively expensive.
 *** [Custom 
comparator|https://github.com/facebook/rocksdb/issues/538#issuecomment-83145980]: 
a custom comparator could significantly improve the performance of session 
windows. This is trivial to do, but given the high performance cost of crossing 
the JNI, it is currently only practical to use a c++ comparator
 *** [Prefix Seek|https://github.com/facebook/rocksdb/issues/6004]: not 
currently used by Streams but a commonly requested feature, and may also allow 
improved range queries
 ** Even when an external contributor develops a solution for poorly performing 
Java functionality and helpfully tries to contribute their patch back to 
rocksdb, it gets ignored by the rocksdb people ([rocksdb PR 
#2283|https://github.com/facebook/rocksdb/pull/2283])

Cons:
 * more work

 

Given that we rarely upgrade the Rocks dependency, use only some fraction of 
its features, and would need or want to make only 

[jira] [Updated] (KAFKA-9148) Consider forking RocksDB for Streams

2019-11-05 Thread Sophie Blee-Goldman (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sophie Blee-Goldman updated KAFKA-9148:
---
Description: 
We recently upgraded our RocksDB dependency to 5.18 for its memory-management 
abilities (namely the WriteBufferManager, see KAFKA-8215). Unfortunately, 
someone recently discovered a ~8% performance regression that exists in all 
versions 5.18+ (up through the current newest version, 6.2.2). Flink was able 
to react to this by downgrading to 5.17 and [picking the 
WriteBufferManager|https://github.com/dataArtisans/frocksdb/pull/4] to their 
fork (fRocksDB).

Due to this and other reasons enumerated below, we should consider also forking 
our own RocksDB for Streams.

Pros:
 * We can avoid passing sudden breaking changes on to our users, such as the 
removal of methods with no deprecation period (see discussion on KAFKA-8897)
 * We can pick whichever version has the best performance for our needs, and 
cherry-pick any new features, metrics, etc. that we need rather than being 
forced to upgrade (breaking user code, introducing regressions, etc.)
 * The Java API seems to be a very low priority to the rocksdb folks.
 ** They leave out critical functionality, features, and configuration options 
that have been in the c++ API for a very long time
 ** Those that do make it over often have random gaps in the API such as 
setters but no getters (see [rocksdb PR 
#5186|https://github.com/facebook/rocksdb/pull/5186])
 ** Others are poorly designed and require too many trips across the JNI, 
making otherwise incredibly useful features prohibitively expensive.
 *** [Custom 
comparator|https://github.com/facebook/rocksdb/issues/538#issuecomment-83145980]: 
a custom comparator could significantly improve the performance of session 
windows. This is trivial to do, but given the high performance cost of crossing 
the JNI, it is currently only practical to use a c++ comparator
 *** [Prefix Seek|https://github.com/facebook/rocksdb/issues/6004]: not 
currently used by Streams but a commonly requested feature, and may also allow 
improved range queries
 ** Even when an external contributor develops a solution for poorly performing 
Java functionality and helpfully tries to contribute their patch back to 
rocksdb, it gets ignored by the rocksdb people ([rocksdb PR 
#2283|https://github.com/facebook/rocksdb/pull/2283])

Cons:
 * more work

 

Given that we rarely upgrade the Rocks dependency, use only some fraction of 
its features, and would need or want to make only minimal changes ourselves, it 
seems like we could actually get away with very little extra work by forking 
rocksdb. 

 

  was:
We recently upgraded our RocksDB dependency to 5.18 for its memory-management 
abilities (namely the WriteBufferManager, see KAFKA-8215). Unfortunately, 
someone recently discovered a ~8% performance regression that exists in all 
versions 5.18+ (latest being 6.2.2 as of now). Flink was able to react to this 
by downgrading to 5.17 and [picking the 
WriteBufferManager|https://github.com/dataArtisans/frocksdb/pull/4] to their 
fork (fRocksDB).

Due to this and other reasons enumerated below, we should consider also forking 
our own RocksDB for Streams.

Pros:
 * We can avoid passing sudden breaking changes on to our users, such as the 
removal of methods with no deprecation period (see discussion on KAFKA-8897)
 * We can pick whichever version has the best performance for our needs, and 
cherry-pick any new features, metrics, etc. that we need rather than being 
forced to upgrade (breaking user code, introducing regressions, etc.)
 * The Java API seems to be a very low priority to the rocksdb folks.
 ** They leave out critical functionality, features, and configuration options 
that have been in the c++ API for a very long time
 ** Those that do make it over often have random gaps in the API such as 
setters but no getters (see [rocksdb PR 
#5186|https://github.com/facebook/rocksdb/pull/5186])
 ** Others are poorly designed and require too many trips across the JNI, 
making otherwise incredibly useful features prohibitively expensive.
 *** [Custom 
comparator|https://github.com/facebook/rocksdb/issues/538#issuecomment-83145980]: 
a custom comparator could significantly improve the performance of session 
windows. This is trivial to do, but given the high performance cost of crossing 
the JNI, it is currently only practical to use a c++ comparator
 *** [Prefix Seek|https://github.com/facebook/rocksdb/issues/6004]: not 
currently used by Streams but a commonly requested feature, and may also allow 
improved range queries
 ** Even when an external contributor develops a solution for poorly performing 
Java functionality and helpfully tries to contribute their patch back to 
rocksdb, it gets ignored by the rocksdb people ([rocksdb PR 
#2283|https://github.com/facebook/rocksdb/pull/2283])

Cons:
 * more work

 

Given that we rarely upgrade the Rocks dependency, use only some fraction of 
its features, and would 

[jira] [Updated] (KAFKA-9148) Consider forking RocksDB for Streams

2019-11-05 Thread Sophie Blee-Goldman (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sophie Blee-Goldman updated KAFKA-9148:
---
Description: 
We recently upgraded our RocksDB dependency to 5.18 for its memory-management 
abilities (namely the WriteBufferManager, see KAFKA-8215). Unfortunately, 
someone recently discovered a ~8% performance regression that exists in all 
versions 5.18+ (latest being 6.2.2 as of now). Flink was able to react to this 
by downgrading to 5.17 and [picking the 
WriteBufferManager|https://github.com/dataArtisans/frocksdb/pull/4] to their 
fork (fRocksDB).

Due to this and other reasons enumerated below, we should consider also forking 
our own RocksDB for Streams.

Pros:
 * We can avoid passing sudden breaking changes on to our users, such as the 
removal of methods with no deprecation period (see discussion on KAFKA-8897)
 * We can pick whichever version has the best performance for our needs, and 
cherry-pick any new features, metrics, etc. that we need rather than being 
forced to upgrade (breaking user code, introducing regressions, etc.)
 * The Java API seems to be a very low priority to the rocksdb folks.
 ** They leave out critical functionality, features, and configuration options 
that have been in the c++ API for a very long time
 ** Those that do make it over often have random gaps in the API such as 
setters but no getters (see [rocksdb PR 
#5186|https://github.com/facebook/rocksdb/pull/5186])
 ** Others are poorly designed and require too many trips across the JNI, 
making otherwise incredibly useful features prohibitively expensive.
 *** [Custom 
comparator|https://github.com/facebook/rocksdb/issues/538#issuecomment-83145980]: 
a custom comparator could significantly improve the performance of session 
windows. This is trivial to do, but given the high performance cost of crossing 
the JNI, it is currently only practical to use a c++ comparator
 *** [Prefix Seek|https://github.com/facebook/rocksdb/issues/6004]: not 
currently used by Streams but a commonly requested feature, and may also allow 
improved range queries
 ** Even when an external contributor develops a solution for poorly performing 
Java functionality and helpfully tries to contribute their patch back to 
rocksdb, it gets ignored by the rocksdb people ([rocksdb PR 
#2283|https://github.com/facebook/rocksdb/pull/2283])

Cons:
 * more work

 

Given that we rarely upgrade the Rocks dependency, use only some fraction of 
its features, and would need or want to make only minimal changes ourselves, it 
seems like we could actually get away with very little extra work by forking 
rocksdb. 

 

  was:
We recently upgraded our RocksDB dependency to 5.18 for its memory-management 
abilities (namely WriteBufferManager, see KAFKA-8215). Unfortunately, someone 
recently discovered a ~8% performance regression that exists in all versions 
5.18+ (latest being 6.2.2 as of now). Flink was able to react to this by 
downgrading to 5.17 and [picking the 
WriteBufferManager|https://github.com/dataArtisans/frocksdb/pull/4] to their 
fork (fRocksDB).

Due to this and other reasons enumerated below, we should consider also forking 
our own RocksDB for Streams.

Pros:
 * We can avoid passing sudden breaking changes on to our users, such as the 
removal of methods with no deprecation period (see discussion on KAFKA-8897)
 * We can pick whichever version has the best performance for our needs, and 
cherry-pick any new features, metrics, etc. that we need rather than being 
forced to upgrade (breaking user code, introducing regressions, etc.)
 * The Java API seems to be a very low priority to the rocksdb folks.
 ** They leave out critical functionality, features, and configuration options 
that have been in the c++ API for a very long time
 ** Those that do make it over often have random gaps in the API such as 
setters but no getters (see [rocksdb PR 
#5186|https://github.com/facebook/rocksdb/pull/5186])
 ** Others are poorly designed and require too many trips across the JNI, 
making otherwise incredibly useful features prohibitively expensive.
 *** [Custom 
comparator|https://github.com/facebook/rocksdb/issues/538#issuecomment-83145980]: 
a custom comparator could significantly improve the performance of session 
windows. This is trivial to do, but given the high performance cost of crossing 
the JNI, it is currently only practical to use a c++ comparator
 *** [Prefix Seek|https://github.com/facebook/rocksdb/issues/6004]: not 
currently used by Streams but a commonly requested feature, and may also allow 
improved range queries
 ** Even when an external contributor develops a solution for poorly performing 
Java functionality and helpfully tries to contribute their patch back to 
rocksdb, it gets ignored by the rocksdb people ([rocksdb PR 
#2283|https://github.com/facebook/rocksdb/pull/2283])

Cons:
 * more work

 

Given that we rarely upgrade the Rocks dependency, use only some fraction of 
its features, and would need or want to 

[jira] [Updated] (KAFKA-9148) Consider forking RocksDB for Streams

2019-11-05 Thread Sophie Blee-Goldman (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sophie Blee-Goldman updated KAFKA-9148:
---
Description: 
We recently upgraded our RocksDB dependency to 5.18 for its memory-management 
abilities (namely the WriteBufferManager, see KAFKA-8215). Unfortunately, 
someone from Flink recently discovered a ~8% performance regression that exists 
in all versions 5.18+ (up through the current newest version, 6.2.2). Flink was 
able to react to this by downgrading to 5.17 and [picking the 
WriteBufferManager|https://github.com/dataArtisans/frocksdb/pull/4] to their 
fork (fRocksDB).

Due to this and other reasons enumerated below, we should consider also forking 
our own RocksDB for Streams.

Pros:
 * We can avoid passing sudden breaking changes on to our users, such as the 
removal of methods with no deprecation period (see discussion on KAFKA-8897)
 * We can pick whichever version has the best performance for our needs, and 
cherry-pick any new features, metrics, etc. that we need rather than being 
forced to upgrade (breaking user code, introducing regressions, etc.)
 * The Java API seems to be a very low priority to the rocksdb folks.
 ** They leave out critical functionality, features, and configuration options 
that have been in the c++ API for a very long time
 ** Those that do make it over often have random gaps in the API such as 
setters but no getters (see [rocksdb PR 
#5186|https://github.com/facebook/rocksdb/pull/5186])
 ** Others are poorly designed and require too many trips across the JNI, 
making otherwise incredibly useful features prohibitively expensive.
 *** [Custom 
comparator|https://github.com/facebook/rocksdb/issues/538#issuecomment-83145980]: 
a custom comparator could significantly improve the performance of session 
windows. This is trivial to do, but given the high performance cost of crossing 
the JNI, it is currently only practical to use a c++ comparator
 *** [Prefix Seek|https://github.com/facebook/rocksdb/issues/6004]: not 
currently used by Streams but a commonly requested feature, and may also allow 
improved range queries
 ** Even when an external contributor develops a solution for poorly performing 
Java functionality and helpfully tries to contribute their patch back to 
rocksdb, it gets ignored by the rocksdb people ([rocksdb PR 
#2283|https://github.com/facebook/rocksdb/pull/2283])

Cons:
 * more work

 

Given that we rarely upgrade the Rocks dependency, use only some fraction of 
its features, and would need or want to make only minimal changes ourselves, it 
seems like we could actually get away with very little extra work by forking 
rocksdb. 

 

  was:
We recently upgraded our RocksDB dependency to 5.18 for its memory-management 
abilities (namely the WriteBufferManager, see KAFKA-8215). Unfortunately, 
someone recently discovered a ~8% performance regression that exists in all 
versions 5.18+ (up through the current newest version, 6.2.2). Flink was able 
to react to this by downgrading to 5.17 and [picking the 
WriteBufferManager|https://github.com/dataArtisans/frocksdb/pull/4] to their 
fork (fRocksDB).

Due to this and other reasons enumerated below, we should consider also forking 
our own RocksDB for Streams.

Pros:
 * We can avoid passing sudden breaking changes on to our users, such as the 
removal of methods with no deprecation period (see discussion on KAFKA-8897)
 * We can pick whichever version has the best performance for our needs, and 
cherry-pick any new features, metrics, etc. that we need rather than being 
forced to upgrade (breaking user code, introducing regressions, etc.)
 * The Java API seems to be a very low priority to the rocksdb folks.
 ** They leave out critical functionality, features, and configuration options 
that have been in the c++ API for a very long time
 ** Those that do make it over often have random gaps in the API such as 
setters but no getters (see [rocksdb PR 
#5186|https://github.com/facebook/rocksdb/pull/5186])
 ** Others are poorly designed and require too many trips across the JNI, 
making otherwise incredibly useful features prohibitively expensive.
 *** [Custom 
comparator|https://github.com/facebook/rocksdb/issues/538#issuecomment-83145980]: 
a custom comparator could significantly improve the performance of session 
windows. This is trivial to do, but given the high performance cost of crossing 
the JNI, it is currently only practical to use a c++ comparator
 *** [Prefix Seek|https://github.com/facebook/rocksdb/issues/6004]: not 
currently used by Streams but a commonly requested feature, and may also allow 
improved range queries
 ** Even when an external contributor develops a solution for poorly performing 
Java functionality and helpfully tries to contribute their patch back to 
rocksdb, it gets ignored by the rocksdb people ([rocksdb PR 
#2283|https://github.com/facebook/rocksdb/pull/2283])

Cons:
 * more work

 

Given that we rarely upgrade the Rocks dependency, use only some fraction 

[jira] [Updated] (KAFKA-9148) Consider forking RocksDB for Streams

2019-11-05 Thread Sophie Blee-Goldman (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sophie Blee-Goldman updated KAFKA-9148:
---
Description: 
We recently upgraded our RocksDB dependency to 5.18 for its memory-management 
abilities (WriteBufferManager -- KAFKA-8215). Unfortunately, someone recently 
discovered a ~8% performance regression that exists in all versions 5.18+ 
(latest being 6.2.2 as of now). Flink was able to react to this by downgrading 
to 5.17 and [picking the 
WriteBufferManager|https://github.com/dataArtisans/frocksdb/pull/4] to their 
fork (fRocksDB).

Due to this and other reasons enumerated below, we should consider also forking 
our own RocksDB for Streams.

Pros:
 * We can avoid passing sudden breaking changes on to our users, such as the 
removal of methods with no deprecation period (see discussion on KAFKA-8897)
 * We can pick whichever version has the best performance for our needs, and 
cherry-pick any new features, metrics, etc. that we need rather than being 
forced to upgrade (breaking user code, introducing regressions, etc.)
 * The Java API seems to be a very low priority to the rocksdb folks.
 ** They leave out critical functionality, features, and configuration options 
that have been in the c++ API for a very long time
 ** Those that do make it over often have random gaps in the API such as 
setters but no getters (see [rocksdb PR 
#5186|https://github.com/facebook/rocksdb/pull/5186])
 ** Others are poorly designed and require too many trips across the JNI, 
making otherwise incredibly useful features prohibitively expensive.
 *** [Custom 
comparator|https://github.com/facebook/rocksdb/issues/538#issuecomment-83145980]: 
a custom comparator could significantly improve the performance of session 
windows. This is trivial to do, but given the high performance cost of crossing 
the JNI, it is currently only practical to use a c++ comparator
 *** [Prefix Seek|https://github.com/facebook/rocksdb/issues/6004]: not 
currently used by Streams but a commonly requested feature, and may also allow 
improved range queries
 ** Even when an external contributor develops a solution for poorly performing 
Java functionality and helpfully tries to contribute their patch back to 
rocksdb, it gets ignored by the rocksdb people ([rocksdb PR 
#2283|https://github.com/facebook/rocksdb/pull/2283])

Cons:
 * more work

 

Given that we rarely upgrade the Rocks dependency, use only some fraction of 
its features, and would need or want to make only minimal changes ourselves, it 
seems like we could actually get away with very little extra work by forking 
rocksdb. 

 

  was:
We recently upgraded our RocksDB dependency to 5.18 for its memory-management 
abilities (WriteBufferManager -- KAFKA-8215). Unfortunately, someone recently 
discovered a ~8% performance regression that exists in all versions 5.18+ 
(latest being 6.2.2 as of now). Flink was able to react to this by downgrading 
to 5.17 and picking the WriteBufferManager to their fork, FRocksDB.

Due to this and other reasons enumerated below, we should consider also forking 
our own RocksDB for Streams.

 

Pros:
 * We can avoid passing sudden breaking changes on to our users, such as the 
removal of methods with no deprecation period (see discussion on KAFKA-8897)
 * We can pick whichever version has the best performance for our needs, and 
cherry-pick any new features, metrics, etc. that we need rather than being 
forced to upgrade (breaking user code, introducing regressions, etc.)
 * The Java API seems to be a very low priority to the rocksdb folks.
 ** They leave out critical functionality, features, and configuration options 
that have been in the c++ API for a very long time
 ** Those that do make it over often have random gaps in the API such as 
setters but no getters (see [rocksdb PR 
#5186|https://github.com/facebook/rocksdb/pull/5186])
 ** Others are poorly designed and require too many trips across the JNI, 
making otherwise incredibly useful features prohibitively expensive.
 *** [Custom 
comparator|https://github.com/facebook/rocksdb/issues/538#issuecomment-83145980]: 
a custom comparator could significantly improve the performance of session 
windows. This is trivial to do, but given the high performance cost of crossing 
the JNI, it is currently only practical to use a c++ comparator
 *** [Prefix Seek|https://github.com/facebook/rocksdb/issues/6004]: not 
currently used by Streams but a commonly requested feature, and may also allow 
improved range queries
 ** Even when an external contributor develops a solution for poorly performing 
Java functionality and helpfully tries to contribute their patch back to 
rocksdb, it gets ignored by the rocksdb people ([rocksdb PR 
#2283|https://github.com/facebook/rocksdb/pull/2283])

Cons:
 * more work

 

Given that we rarely upgrade the Rocks dependency, use only some fraction of 
its features, and would need or want to make only minimal changes ourselves, it 
seems like we could actually get 

[jira] [Updated] (KAFKA-9148) Consider forking RocksDB for Streams

2019-11-05 Thread Sophie Blee-Goldman (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sophie Blee-Goldman updated KAFKA-9148:
---
Description: 
We recently upgraded our RocksDB dependency to 5.18 for its memory-management 
abilities (WriteBufferManager -- KAFKA-8215). Unfortunately, someone recently 
discovered a ~8% performance regression that exists in all versions 5.18+ 
(latest being 6.2.2 as of now). Flink was able to react to this by downgrading 
to 5.17 and picking the WriteBufferManager to their fork, FRocksDB.

Due to this and other reasons enumerated below, we should consider also forking 
our own RocksDB for Streams.

 

Pros:
 * We can avoid passing sudden breaking changes on to our users, such as the 
removal of methods with no deprecation period (see discussion on KAFKA-8897)
 * We can pick whichever version has the best performance for our needs, and 
cherry-pick any new features, metrics, etc. that we need rather than being 
forced to upgrade (breaking user code, introducing regressions, etc.)
 * The Java API seems to be a very low priority to the rocksdb folks.
 ** They leave out critical functionality, features, and configuration options 
that have been in the c++ API for a very long time
 ** Those that do make it over often have random gaps in the API such as 
setters but no getters (see [rocksdb PR 
#5186|https://github.com/facebook/rocksdb/pull/5186])
 ** Others are poorly designed and require too many trips across the JNI, 
making otherwise incredibly useful features prohibitively expensive.
 *** [Custom 
comparator|https://github.com/facebook/rocksdb/issues/538#issuecomment-83145980]: 
a custom comparator could significantly improve the performance of session 
windows. This is trivial to do, but given the high performance cost of crossing 
the JNI, it is currently only practical to use a c++ comparator
 *** [Prefix Seek|https://github.com/facebook/rocksdb/issues/6004]: not 
currently used by Streams but a commonly requested feature, and may also allow 
improved range queries
 ** Even when an external contributor develops a solution for poorly performing 
Java functionality and helpfully tries to contribute their patch back to 
rocksdb, it gets ignored by the rocksdb people ([rocksdb PR 
#2283|https://github.com/facebook/rocksdb/pull/2283])

Cons:
 * more work

 

Given that we rarely upgrade the Rocks dependency, use only some fraction of 
its features, and would need or want to make only minimal changes ourselves, it 
seems like we could actually get away with very little extra work by forking 
rocksdb. 

 

  was:
We recently upgraded our RocksDB dependency to 5.18 for its memory-management 
abilities (WriteBufferManager -- KAFKA-8215). Unfortunately, someone recently 
discovered a ~8% performance regression that exists in all versions 5.18+ 
(latest being 6.2.2 as of now). Flink was able to react to this by downgrading 
to 5.17 and picking the WriteBufferManager to their fork, FRocksDB.

Due to this and other reasons enumerated below, we should consider also forking 
our own RocksDB for Streams.

 

Pros:
 * We can avoid passing sudden breaking changes on to our users, such as the 
removal of methods with no deprecation period (see discussion on KAFKA-8897)
 * We can pick whichever version has the best performance for our needs, and 
cherry-pick any new features, metrics, etc. that we need rather than being 
forced to upgrade (breaking user code, introducing regressions, etc.)
 * The Java API seems to be a very low priority to the rocksdb folks.
 ** They leave out critical functionality, features, and configuration options 
that have been in the c++ API for a very long time
 ** Those that do make it over often have random gaps in the API such as 
setters but no getters (see [rocksdb PR 
#5186|https://github.com/facebook/rocksdb/pull/5186])
 ** Others are poorly designed and require too many trips across the JNI, 
making otherwise incredibly useful features prohibitively expensive.
 *** [Custom 
comparator|https://github.com/facebook/rocksdb/issues/538#issuecomment-83145980]: 
a custom comparator could significantly improve the performance of session 
windows
 *** [Prefix Seek|https://github.com/facebook/rocksdb/issues/6004]: not 
currently used by Streams but a commonly requested feature, and may also allow 
improved range queries
 ** Even when an external contributor develops a solution for poorly performing 
Java functionality and helpfully tries to contribute their patch back to 
rocksdb, it gets ignored by the rocksdb people ([rocksdb PR 
#2283|https://github.com/facebook/rocksdb/pull/2283])

Cons:
 * more work

 

Given that we rarely upgrade the Rocks dependency, use only some fraction of 
its features, and would need or want to make only minimal changes ourselves, it 
seems like we could actually get away with very little extra work by forking 
rocksdb. 

 


> Consider forking RocksDB for Streams 
> -
>
> Key: KAFKA-9148
>

[jira] [Updated] (KAFKA-9148) Consider forking RocksDB for Streams

2019-11-05 Thread Sophie Blee-Goldman (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sophie Blee-Goldman updated KAFKA-9148:
---
Description: 
We recently upgraded our RocksDB dependency to 5.18 for its memory-management 
abilities (WriteBufferManager -- KAFKA-8215). Unfortunately, someone recently 
discovered a ~8% performance regression that exists in all versions 5.18+ 
(latest being 6.2.2 as of now). Flink was able to react to this by downgrading 
to 5.17 and picking the WriteBufferManager to their fork, FRocksDB.

Due to this and other reasons enumerated below, we should consider also forking 
our own RocksDB for Streams.

 

Pros:
 * We can avoid passing sudden breaking changes on to our users, such as the 
removal of methods with no deprecation period (see discussion on KAFKA-8897)
 * We can pick whichever version has the best performance for our needs, and 
cherry-pick any new features, metrics, etc. that we need rather than being 
forced to upgrade (breaking user code, introducing regressions, etc.)
 * The Java API seems to be a very low priority to the rocksdb folks.
 ** They leave out critical functionality, features, and configuration options 
that have been in the c++ API for a very long time
 ** Those that do make it over often have random gaps in the API such as 
setters but no getters (see [rocksdb PR 
#5186|https://github.com/facebook/rocksdb/pull/5186])
 ** Others are poorly designed and require too many trips across the JNI, 
making otherwise incredibly useful features prohibitively expensive.
 *** [Custom 
comparator|https://github.com/facebook/rocksdb/issues/538#issuecomment-83145980]: 
a custom comparator could significantly improve the performance of session 
windows
 *** [Prefix Seek|https://github.com/facebook/rocksdb/issues/6004]: not 
currently used by Streams but a commonly requested feature, and may also allow 
improved range queries
 ** Even when an external contributor develops a solution for poorly performing 
Java functionality and helpfully tries to contribute their patch back to 
rocksdb, it gets ignored by the rocksdb people ([rocksdb PR 
#2283|https://github.com/facebook/rocksdb/pull/2283])

Cons:
 * more work

 

Given that we rarely upgrade the Rocks dependency, use only some fraction of 
its features, and would need or want to make only minimal changes ourselves, it 
seems like we could actually get away with very little extra work by forking 
rocksdb. 

 

  was:
We recently upgraded our RocksDB dependency to 5.18 for its memory-management 
abilities (WriteBufferManager -- KAFKA-8215). Unfortunately, someone recently 
discovered a ~8% performance regression that exists in all versions 5.18+ 
(latest being 6.2.2 as of now). Flink was able to react to this by downgrading 
to 5.17 and picking the WriteBufferManager to their fork, FRocksDB.

Due to this and other reasons enumerated below, we should consider also forking 
our own RocksDB for Streams.

 

Pros:
 * We can avoid passing sudden breaking changes on to our users, such as the 
removal of methods with no deprecation period (see discussion on KAFKA-8897)
 * We can pick whichever version has the best performance for our needs, and 
cherry-pick any new features, metrics, etc. that we need rather than being 
forced to upgrade (breaking user code, introducing regressions, etc.)
 * The Java API seems to be a very low priority to the rocksdb folks.
 ** They leave out critical functionality, features, and configuration options 
that have been in the c++ API for a very long time
 ** Those that do make it over often have random gaps in the API such as 
setters but no getters (see [rocksdb PR 
#5186|https://github.com/facebook/rocksdb/pull/5186])
 ** Others are poorly designed and require too many trips across the JNI, 
making otherwise incredibly useful features prohibitively expensive.
 *** [Custom 
comparator|https://github.com/facebook/rocksdb/issues/538#issuecomment-83145980]: 
a custom comparator could significantly improve the performance of session 
windows
 *** [Prefix Seek|https://github.com/facebook/rocksdb/issues/6004]: not 
currently used by Streams but a commonly requested feature, and may also allow 
improved range queries
 ** Even when an external contributor develops a solution for poorly performing 
Java functionality and helpfully tries to contribute their patch back to 
rocksdb, it gets ignored by the rocksdb people ([rocksdb PR 
#2283|https://github.com/facebook/rocksdb/pull/2283])


Cons:
 * more work

 

Given that we rarely upgrade the Rocks dependency, use only some fraction of 
its features, and would need or want to make only minimal changes ourselves, it 
seems like we could actually get away with very little extra work by forking 
rocksdb. 

 


> Consider forking RocksDB for Streams 
> -
>
> Key: KAFKA-9148
> URL: https://issues.apache.org/jira/browse/KAFKA-9148
> Project: