Re: [jira] [Commented] (CASSANDRA-6746) Reads have a slow ramp up in speed

2014-02-27 Thread Benedict Elliott Smith
I've been investigating this, but a bit slowly, as I was looking in the wrong
place. I haven't yet confirmed it, but I have a strong suspicion of the
problem. Could you confirm the total physical memory on the nodes? If 8GB
or less, and you are looking at this today, try applying the not-yet-committed
patches in CASSANDRA-6692 (atomic btree improvements) and setting
memtable_cleanup_threshold to 0.2. I suspect that will fix it, although if so
it doesn't entirely explain the behaviour. If you let me know which cluster
you're using, I can test in the same environment to make sure I'm comparing
like for like as well.
On 27 Feb 2014 20:58, Ryan McGuire (JIRA) j...@apache.org wrote:


 [
 https://issues.apache.org/jira/browse/CASSANDRA-6746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13915038#comment-13915038]

 Ryan McGuire commented on CASSANDRA-6746:
 -

 I tried re-running this with a 'nodetool flush' on each node after it's
 done with the write. It looked the same as above. I'm running a test with a
 5 minute wait between the write and read to see if that causes a change.

  Reads have a slow ramp up in speed
  --
 
  Key: CASSANDRA-6746
  URL:
 https://issues.apache.org/jira/browse/CASSANDRA-6746
  Project: Cassandra
   Issue Type: Bug
   Components: Core
 Reporter: Ryan McGuire
 Assignee: Benedict
   Labels: performance
  Fix For: 2.1 beta2
 
  Attachments: 2.1_vs_2.0_read.png
 
 
  On a physical four node cluster I am doing a big write and then a big
 read. The read takes a long time to ramp up to respectable speeds.
  !2.1_vs_2.0_read.png!
  [See data here|
 http://ryanmcguire.info/ds/graph/graph.html?stats=stats.2.1_vs_2.0_vs_1.2.retry1.json&metric=interval_op_rate&operation=stress-read&smoothing=1
 ]



 --
 This message was sent by Atlassian JIRA
 (v6.1.5#6160)



Performance Tickets

2014-04-15 Thread Benedict Elliott Smith
It's only been six months since the last performance drive, and 2.1 is now
around the corner. But I'm hoping we can push performance even further for
3.0. With that in mind, I've picked out what I think are the nearest term
wins to focus on.

   - CASSANDRA-7039: DirectByteBuffer compatible LZ4 methods
   - CASSANDRA-6726: RAR/CRAR off-heap
   - CASSANDRA-6633: Dynamic bloom filter resizing
   - CASSANDRA-6755: Optimise CellName/Composite comparisons for NativeCell
   - CASSANDRA-7032: Improve vnode allocation
   - CASSANDRA-6809: Compressed Commit Log
   - CASSANDRA-5663: write batching in native protocol
   - CASSANDRA-5863: In-process (uncompressed) page cache
   - CASSANDRA-7040: Replace read/write stage with per-disk access
   coordination
   - CASSANDRA-6917: enum data type
   - CASSANDRA-6935: Make clustering part of primary key a first order
   component in the storage engine

I've arranged them in ascending order of my intuitive impression of their
difficulty. Don't all leap at the last few :)

Anything I've missed?


Re: Performance Tickets

2014-04-15 Thread Benedict Elliott Smith
I thought I'd spammed the change list enough for one day, but since you
mention it... :-)


On 15 April 2014 23:40, Jonathan Ellis jbel...@gmail.com wrote:

 And here's the Grand List of all performance tickets:

 https://issues.apache.org/jira/issues/?jql=labels%20%3D%20performance%20and%20project%20%3D%20CASSANDRA%20AND%20status!%3Dresolved

 Benedict, you might want to un-assign from yourself anything you're
 not working on in the near future in case anyone else wants to grab
 one.

 On Tue, Apr 15, 2014 at 3:28 PM, Benedict Elliott Smith
 belliottsm...@datastax.com wrote:
  It's only been six months since the last performance drive, and 2.1 is
 now
  around the corner. But I'm hoping we can push performance even further
 for
  3.0. With that in mind, I've picked out what I think are the nearest term
  wins to focus on.
 
 - CASSANDRA-7039: DirectByteBuffer compatible LZ4 methods
 - CASSANDRA-6726: RAR/CRAR off-heap
 - CASSANDRA-6633: Dynamic bloom filter resizing
 - CASSANDRA-6755: Optimise CellName/Composite comparisons for
 NativeCell
 - CASSANDRA-7032: Improve vnode allocation
 - CASSANDRA-6809: Compressed Commit Log
 - CASSANDRA-5663: write batching in native protocol
 - CASSANDRA-5863: In-process (uncompressed) page cache
 - CASSANDRA-7040: Replace read/write stage with per-disk access
 coordination
 - CASSANDRA-6917: enum data type
 - CASSANDRA-6935: Make clustering part of primary key a first order
 component in the storage engine
 
  I've arranged them in ascending order of my intuitive impression of their
  difficulty. Don't all leap at the last few :)
 
  Anything I've missed?



 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder, http://www.datastax.com
 @spyced



Re: Performance Tickets

2014-04-16 Thread Benedict Elliott Smith

   - CASSANDRA-5220: Repair improvements when using vnodes


That definitely deserves a performance tag. Yuki, are you looking at this
or should we unassign in case somebody else wants to jump in?


On 15 April 2014 14:59, Michael Shuler mich...@pbandjelly.org wrote:

 On 04/15/2014 08:28 AM, Benedict Elliott Smith wrote:

 It's only been six months since the last performance drive, and 2.1 is now
 around the corner. But I'm hoping we can push performance even further for
 3.0. With that in mind, I've picked out what I think are the nearest term
 wins to focus on.

 - CASSANDRA-7039: DirectByteBuffer compatible LZ4 methods
 - CASSANDRA-6726: RAR/CRAR off-heap
 - CASSANDRA-6633: Dynamic bloom filter resizing
 - CASSANDRA-6755: Optimise CellName/Composite comparisons for
 NativeCell
 - CASSANDRA-7032: Improve vnode allocation
 - CASSANDRA-6809: Compressed Commit Log
 - CASSANDRA-5663: write batching in native protocol
 - CASSANDRA-5863: In-process (uncompressed) page cache
 - CASSANDRA-7040: Replace read/write stage with per-disk access
 coordination
 - CASSANDRA-6917: enum data type
 - CASSANDRA-6935: Make clustering part of primary key a first order

 component in the storage engine

 I've arranged them in ascending order of my intuitive impression of their
 difficulty. Don't all leap at the last few :)

 Anything I've missed?


   - CASSANDRA-5220: Repair improvements when using vnodes

 Not sure where that might go in your difficulty ordering, but since vnodes
 are default and repair time seems to be a pretty common question/pain, it's
 important for ops and highly relevant to cluster performance. If some of
 the above list might directly affect/help repair performance, let's get
 them tied together in Jira :)

 --
 Kind regards,
 Michael




Re: CQL unit tests vs dtests

2014-05-20 Thread Benedict Elliott Smith
+1 unit tests
On 21 May 2014 02:36, Jake Luciani jak...@gmail.com wrote:

 I think having cql unit tests is certainly a good idea.  It doesn't replace
 dtests but makes it easier to have better coverage locally.


 On Tue, May 20, 2014 at 7:10 PM, Tyler Hobbs ty...@datastax.com wrote:

  Sylvain and I have been having a discussion about testing CQL in unit
 tests
  vs dtests.  I'd like to hear if there are any other opinions on the
 topic.
 
  We currently only test CQL queries through dtests.  I'd like to start
  adding unit tests that exercise CQL where it makes sense.  To me, dtests
  make sense when:
  - Multiple nodes are needed
  - Nodes need to be shutdown, replaced, etc
  - We specifically want end-to-end testing
 
  When we don't need those, I'd like to use unit tests because:
  - They're typically quicker to run (especially with an IDE)
  - Unit tests tend to be run earlier and more often than dtests
  - There are fewer moving parts to break (no ccm or dtest machinery)
  - It's easier to use a debugger
 
  But Sylvain makes some good points about keeping all CQL tests in the
  dtests:
  - All of the related tests are in one place
  - Python tends to be more concise and easier to read and write
 (especially
  for tests)
  - dtests are always fully end-to-end
 
  I agree that Python can be nicer to work with, but Java hasn't been too
 bad
  in my experience[1].  And we do need end-to-end tests, just not on every
  test case.
 
  Does anybody else have an opinion on starting to use unit tests for some
  CQL testing vs keeping everything in dtests?
 
  [1]
 
 
 https://github.com/thobbs/cassandra/blob/CASSANDRA-6875-2.0/test/unit/org/apache/cassandra/cql3/MultiColumnRelationTest.java
  --
  Tyler Hobbs
  DataStax http://datastax.com/
 



 --
 http://twitter.com/tjake
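
For a concrete picture of the unit-level CQL tests discussed in this thread,
here is a minimal sketch. It assumes a harness along the lines of the
CQLTester base class that went into the Cassandra test tree around this time;
the method names (createTable, execute, assertRows, row) follow that harness
but should be treated as illustrative rather than authoritative:

    import org.junit.Test;

    import org.apache.cassandra.cql3.CQLTester;

    public class InsertSelectCqlTest extends CQLTester
    {
        @Test
        public void testInsertThenSelect() throws Throwable
        {
            // createTable substitutes a fresh, test-scoped table name for %s
            createTable("CREATE TABLE %s (k int PRIMARY KEY, v int)");

            execute("INSERT INTO %s (k, v) VALUES (?, ?)", 0, 42);

            // runs entirely in-process: no ccm cluster, easy to step through in an IDE
            assertRows(execute("SELECT v FROM %s WHERE k = ?", 0),
                       row(42));
        }
    }

Multi-node behaviour (bootstrap, replacement, end-to-end paths) would of
course still need a dtest, as noted above.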



Re: CMS GC / fragmentation / memtables etc

2014-05-21 Thread Benedict Elliott Smith
Graham,

This is largely fixed in 2.1 with the introduction of partially off-heap
memtables - the slabs reside off-heap, so do not cause any GC issues.

As it happens, the changes would also permit us to recycle on-heap slabs
reasonably easily as well, so feel free to file a ticket for that, although
it won't be backported to 2.0.


On 21 May 2014 00:57, graham sanderson gra...@vast.com wrote:

 So I’ve been tinkering a bit with CMS config because we are still seeing
 fairly frequent full compacting GC due to fragmentation/promotion failure

 As mentioned below, we are usually too fragmented to promote new in-flight
 memtables.

 This is likely caused by sudden write spikes (which we do have), though
 actually the problems don’t generally happen at that time of our largest
 write spikes (though any write spikes likely cause spill of both new
 memtables along with many other new objects of unknown size into the
 tenured gen, so they cause fragmentation if not immediate GC issue). We
 have lots of things going on in this multi-tenant cluster (GC pauses are of
 course extra bad, since they cause spike in hinted-handoff on other nodes
 which were already busy etc…)

 Anyway, considering possibilities:

 0) Try and make our application behavior more steady state - this is
 probably possible, but there are lots of other things (e.g. compaction,
 opscenter, repair etc.) which are both tunable and generally throttle-able
 to think about too.
 1) Play with tweaking PLAB configs to see if we can ease fragmentation
 (I’d be curious what the “crud” is in particular that is getting spilled -
 presumably it is larger objects since it affects the binary tree of large
 objects)
 2) Given the above, if we can guarantee even > 24 hours without full GC, I
 don’t think we’d mind running a regular rolling re-start on the servers
 during off hours (note usually the GCs don’t have a visible impact, but
 when they hit multiple machines at once they can)
 3) Zing is seriously an option, if it would save us large amounts of
 tuning, and constant worry about the “next” thing tweaking the allocation
 patterns - does anyone have any experience with Zing & Cassandra?
 4) Given that we expect periodic bursts of writes,
 memtable_total_space_in_mb is bounded, we are not actually short of memory
 (it just gets fragmented), I’m wondering if anyone has played with pinning
 (up to or initially?) that many 1MB chunks of memory via SlabAllocator and
 re-using… It will get promoted once, and then these 1M chunks won’t be part
 of the subsequent promotion hassle… it will probably also allow more crud
 to die in eden under write load since we aren’t allocating these large
 chunks in eden at the same time. Anyway, I had a little look at the code,
 and the life cycles of memtables is not trivial, but was considering
 attempting a patch to play with… anyone have any thoughts?

 Basically in summary, the Slab allocator helps by allocating and freeing
 lots of objects at the same time, however any time slabs are allocated
 under load, we end up promoting them with whatever other live stuff in eden
 is still there. If we only do this once and reuse the slabs, we are likely
 to minimize our promotion problem later (at least for these large objects)

 On May 16, 2014, at 9:37 PM, graham sanderson gra...@vast.com wrote:

  Excellent - thank you…
 
  On May 16, 2014, at 7:08 AM, Samuel CARRIERE samuel.carri...@urssaf.fr
 wrote:
 
  Hi,
  This is arena allocation of memtables. See here for more info:
  http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-performance
 
 
 
 
  From: graham sanderson gra...@vast.com
  To: dev@cassandra.apache.org,
  Date: 16/05/2014 14:03
  Subject: Things that are about 1M big
 
 
 
  So just throwing this out there for those for whom this might ring a
 bell.
 
  I'm debugging some CMS memory fragmentation issues on 2.0.5 - and
  interestingly enough most of the objects giving us promotion failures
 are
  of size 131074 (dwords) - GC logging obviously doesn't say what those
 are,
  but I'd wager money they are either 1M big byte arrays, or less likely
  256k entry object arrays backing large maps
 
  So not strictly critical to solving my problem, but I was wondering if
  anyone can think of any heap allocated C* objects which are (with no
  significant changes to standard cassandra config) allocated in 1M
 chunks.
  (It would save me scouring the code, or a 9 gig heap dump if I need to
  figure it out!)
 
  Thanks,
 
  Graham
 




Re: [jira] [Assigned] (CASSANDRA-7120) Bad paging state returned for prepared statements for last page

2014-05-21 Thread Benedict Elliott Smith
We need to add 7245 to that list. I'll try to get to it tomorrow.


On 21 May 2014 17:40, Tyler Hobbs (JIRA) j...@apache.org wrote:


  [
 https://issues.apache.org/jira/browse/CASSANDRA-7120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]

 Tyler Hobbs reassigned CASSANDRA-7120:
 --

 Assignee: Tyler Hobbs  (was: Sylvain Lebresne)

  Bad paging state returned for prepared statements for last page
  ---
 
  Key: CASSANDRA-7120
  URL:
 https://issues.apache.org/jira/browse/CASSANDRA-7120
  Project: Cassandra
   Issue Type: Bug
   Components: Core
 Reporter: Tyler Hobbs
 Assignee: Tyler Hobbs
  Fix For: 2.1 rc1
 
 
  When executing a paged query with a prepared statement, a non-null
 paging state is sometimes being returned for the final page, causing an
 endless paging loop.
  Specifically, this is the schema being used:
  {noformat}
  CREATE KEYSPACE test3rf WITH replication = {'class':
 'SimpleStrategy', 'replication_factor': '3'};
  USE test3rf;
  CREATE TABLE test3rf.test (
  k int PRIMARY KEY,
  v int
  )
  {noformat}
  The inserts are like so:
  {noformat}
  INSERT INTO test3rf.test (k, v) VALUES (?, 0)
  {noformat}
  With values from [0, 99] used for k.
  The query is {{SELECT * FROM test3rf.test}} with a fetch size of 3.
  The final page returns the row with k=3, and the paging state is
 {{000400420004000176007fa2}}.  This matches the paging state from
 three pages earlier.  When executing this with a non-prepared statement, no
 paging state is returned for this page.
  This problem doesn't happen with the 2.0 branch.



 --
 This message was sent by Atlassian JIRA
 (v6.2#6252)



Re: CQL unit tests vs dtests

2014-05-22 Thread Benedict Elliott Smith
I would be for defining the CQL tests in a way that permits them to be run as
both dtests and unit tests. But since we're on Python for dtests, that could
be troublesome.


On 22 May 2014 17:03, Jeremiah D Jordan jerem...@datastax.com wrote:

 The only thing I worry about here is that the unit tests don't come into
 the system the same way user queries will.  So we still need the system
 level dtests.  So I don't think all CQL tests should be unit tests, but I
 am all for there being unit level CQL tests.

 On May 22, 2014, at 10:58 AM, Sylvain Lebresne sylv...@datastax.com
 wrote:

  On Wed, May 21, 2014 at 10:46 PM, Jonathan Ellis jbel...@gmail.com
 wrote:
 
  I do think that CQL tests in general make more sense as unit tests,
  but I'm not so anal that I'm going to insist on rewriting existing
  ones.  But in theory, if I had an infinite army of interns, sure. I'd
  have one of them do that. :)
 
  But in the real world, compared to saying we don't have any cql unit
  tests, so we should always write them as dtests to be consistent I
  think having mixed unit + dtests is the lesser of evils.
 
 
  Fair enough. I'll try to make CQL tests as unit tests from now on as much
   as possible (though I can't promise there won't be a few misfires in the short term). Let's hope
  you
  find your infinite intern army someday.
 
  --
  Sylvain




Re: CMS GC / fragmentation / memtables etc

2014-06-15 Thread Benedict Elliott Smith
Hi Graham,

Unfortunately the problem is more difficult than you might think.
References to the buffers can persist in-flight to clients long after the
memtable is discarded, so you would be introducing a subtle corruption risk
for data returned to clients. Unfortunately the current implementation in
2.1 won't solve this problem for on-heap buffers without introducing a
performance penalty (by copying data from the buffers on read, as we
currently do for off-heap data), so I don't expect this change will be
introduced until zero-copy offheap memtables are introduced, which have
been shelved for the moment.


On 15 Jun 2014 10:53, graham sanderson gra...@vast.com wrote:

 Hi Benedict,

 So I had a look at the code, and as you say it looked pretty easy to
 recycle on heap slabs… there is already RACE_ALLOCATED which keeps a
 strongly referenced pool, however I was thinking in this case of just
 WeakReferences.

 In terms of on heap slabs, it seemed to me that recycling the oldest slab
 you have is probably the best heuristic, since it is less likely to be in
 eden (of course re-using in eden is no worse than worst case today),
 however since the problem tends to be promotion failure of slabs due to
 fragmentation of old gen, recycling one that is already there is even
 better - better still if it has been compacted somewhere pretty stable. I
 think this heuristic would also work well for G1, though I believe the
 recommendation is still not to use that with cassandra.

 So for implementation of that I was thinking of using a
 ConcurrentSkipListMap, from a Long representing the allocation order of the
 Region to a weak reference to the Region (just regular 1M sized ones)…
 allocators can pull oldest and discard cleared references (might need a
 scrubber if the map got too big and we were only checking the first entry).
  Beyond that I don’t think there is any need for a configurable-length
 collection of strongly referenced reusable slabs.
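
As an illustration of the recycling pool sketched above (a weak-reference map
keyed so the oldest region is reused first), here is a rough, self-contained
Java sketch. The Region type and all method names are stand-ins for this
discussion, not Cassandra's actual SlabAllocator API:

    import java.lang.ref.WeakReference;
    import java.util.Map;
    import java.util.concurrent.ConcurrentSkipListMap;
    import java.util.concurrent.atomic.AtomicLong;

    final class RegionPool
    {
        // stand-in for a 1MB slab region
        static final class Region { final byte[] data = new byte[1 << 20]; }

        // ordered by an id assigned when a region is released (a proxy for allocation
        // order), so pollFirstEntry() hands back the oldest surviving region
        private final ConcurrentSkipListMap<Long, WeakReference<Region>> reusable = new ConcurrentSkipListMap<>();
        private final AtomicLong ids = new AtomicLong();

        // called when a memtable is discarded: offer its regions for reuse
        void release(Region region)
        {
            reusable.put(ids.incrementAndGet(), new WeakReference<>(region));
        }

        // called by allocators: prefer the oldest surviving region (most likely already
        // promoted/compacted in the old generation), discarding cleared references as we go
        Region acquire()
        {
            Map.Entry<Long, WeakReference<Region>> oldest;
            while ((oldest = reusable.pollFirstEntry()) != null)
            {
                Region region = oldest.getValue().get();
                if (region != null)
                    return region;
            }
            return new Region();   // nothing reusable left: allocate a fresh slab
        }
    }
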

 Question 1:

 This is easy enough to implement, and probably should just be turned on by
 an orthogonal setting… I guess on heap slab is the current default, so this
 feature will be useful

 Question 2:

 Something similar could be done for off heap slabs… this would seem more
 like it would want a size limit on the number of re-usable slabs… strong
  references with explicit clean() is probably better than using
 weak-references and letting PhantomReference cleaner on DirectByteBuffer do
 the cleaning later.

 Let me know any thoughts and I’ll open an issue (probably 2 - one for on
 heap one for off)… let me know whether you’d like me to assign the first to
 you or me (I couldn’t work on it before next week)

 Thanks,

 Graham.

 On May 21, 2014, at 2:20 AM, Benedict Elliott Smith 
 belliottsm...@datastax.com wrote:

  Graham,
 
  This is largely fixed in 2.1 with the introduction of partially off-heap
  memtables - the slabs reside off-heap, so do not cause any GC issues.
 
  As it happens the changes would also permit us to recycle on-heap slabs
   reasonably easily as well, so feel free to file a ticket for that,
 although
  it won't be back ported to 2.0.
 
 
  On 21 May 2014 00:57, graham sanderson gra...@vast.com wrote:
 
   So I’ve been tinkering a bit with CMS config because we are still seeing
   fairly frequent full compacting GC due to fragmentation/promotion
 failure
 
  As mentioned below, we are usually too fragmented to promote new
 in-flight
  memtables.
 
  This is likely caused by sudden write spikes (which we do have), though
  actually the problems don’t generally happen at that time of our largest
  write spikes (though any write spikes likely cause spill of both new
  memtables along with many other new objects of unknown size into the
  tenured gen, so they cause fragmentation if not immediate GC issue). We
  have lots of things going on in this multi-tenant cluster (GC pauses
 are of
  course extra bad, since they cause spike in hinted-handoff on other
 nodes
  which were already busy etc…)
 
  Anyway, considering possibilities:
 
  0) Try and make our application behavior more steady state - this is
  probably possible, but there are lots of other things (e.g. compaction,
  opscenter, repair etc.) which are both tunable and generally
 throttle-able
  to think about too.
  1) Play with tweaking PLAB configs to see if we can ease fragmentation
  (I’d be curious what the “crud” is in particular that is getting
 spilled -
  presumably it is larger objects since it affects the binary tree of
 large
  objects)
   2) Given the above, if we can guarantee even > 24 hours without full
 GC, I
  don’t think we’d mind running a regular rolling re-start on the servers
  during off hours (note usually the GCs don’t have a visible impact, but
  when they hit multiple machines at once they can)
  3) Zing is seriously an option, if it would save us large amounts of
  tuning, and constant worry about the “next” thing tweaking the
 allocation
  patterns - does anyone

Re: CMS GC / fragmentation / memtables etc

2014-06-15 Thread Benedict Elliott Smith
The current implementation is slower for a single memtable-only read
(dependent on workload, but similar ball park to on heap), but can store
substantially more per memtable under offheap_objects (again dependent on
workload, but as much as 8x in extreme cases) which means more queries are
answerable from memtable, and write amplification is reduced accordingly,
improving write throughput.
On 15 Jun 2014 13:32, graham sanderson gra...@vast.com wrote:

 Hi Benedict,

 Ah, my mistake, I had assumed that since the memory backing the off heap
 ByteBuffer Regions was freed on discard, that the Regions would be
 immediately recyclable in the on heap case.

 I guess I’ll just configure for one of the off-heap variants when 2.1
 comes out…

 Any idea of the performance of offheap_buffers and offheap_objects
 (ignoring GC) compared to heap_buffers? I assume offheap_objects must have
 some benefits but presumably at the cost of fragmentation of the native
 heap… obviously I’ll play with it when it comes out…  right now we are
 testing something else, so I don’t have a good environment to try 2.1 - and
 that is always kind of meaningless anyways, since we have many different
 apps using cassandra with different usage patterns, and it is hard to mimic
 production load on all of them at the same time in beta

 Thanks anyway for your detailed explanations,

 Graham

 On Jun 15, 2014, at 1:11 PM, Benedict Elliott Smith 
 belliottsm...@datastax.com wrote:

  Hi Graham,
 
  Unfortunately the problem is more difficult than you might think.
  References to the buffers can persist in-flight to clients long after the
  memtable is discarded, so you would be introducing a subtle corruption
 risk
  for data returned to clients. Unfortunately the current implementation in
  2.1 won't solve this problem for on-heap buffers without introducing a
  performance penalty (by copying data from the buffers on read, as we
  currently do for off-heap data), so I don't expect this change will be
  introduced until zero-copy offheap memtables are introduced, which have
  been shelved for the moment.
 
 
  On 15 Jun 2014 10:53, graham sanderson gra...@vast.com wrote:
 
  Hi Benedict,
 
  So I had a look at the code, and as you say it looked pretty easy to
  recycle on heap slabs… there is already RACE_ALLOCATED which keeps a
  strongly referenced pool, however I was thinking in this case of just
  WeakReferences.
 
  In terms of on heap slabs, it seemed to me that recycling the oldest
 slab
  you have is probably the best heuristic, since it is less likely to be
 in
  eden (of course re-using in eden is no worse than worst case today),
  however since the problem tends to be promotion failure of slabs due to
  fragmentation of old gen, recycling one that is already there is even
  better - better still if it has been compacted somewhere pretty stable.
 I
  think this heuristic would also work well for G1, though I believe the
  recommendation is still not to use that with cassandra.
 
  So for implementation of that I was thinking of using a
  ConcurrentSkipListMap, from a Long representing the allocation order of
 the
  Region to a weak reference to the Region (just regular 1M sized ones)…
  allocators can pull oldest and discard cleared references (might need a
  scrubber if the map got too big and we were only checking the first
 entry).
   Beyond that I don’t think there is any need for a configurable-length
  collection of strongly referenced reusable slabs.
 
  Question 1:
 
  This is easy enough to implement, and probably should just be turned on
 by
  an orthogonal setting… I guess on heap slab is the current default, so
 this
  feature will be useful
 
  Question 2:
 
  Something similar could be done for off heap slabs… this would seem more
  like it would want a size limit on the number of re-usable slabs… strong
   references with explicit clean() is probably better than using
  weak-references and letting PhantomReference cleaner on
 DirectByteBuffer do
  the cleaning later.
 
  Let me know any thoughts and I’ll open an issue (probably 2 - one for on
  heap one for off)… let me know whether you’d like me to assign the
 first to
  you or me (I couldn’t work on it before next week)
 
  Thanks,
 
  Graham.
 
  On May 21, 2014, at 2:20 AM, Benedict Elliott Smith 
  belliottsm...@datastax.com wrote:
 
  Graham,
 
  This is largely fixed in 2.1 with the introduction of partially
 off-heap
  memtables - the slabs reside off-heap, so do not cause any GC issues.
 
  As it happens the changes would also permit us to recycle on-heap slabs
   reasonably easily as well, so feel free to file a ticket for that,
  although
  it won't be back ported to 2.0.
 
 
  On 21 May 2014 00:57, graham sanderson gra...@vast.com wrote:
 
   So I’ve been tinkering a bit with CMS config because we are still
  seeing
   fairly frequent full compacting GC due to fragmentation/promotion
  failure
 
  As mentioned below, we are usually too fragmented to promote new

Re: CMS GC / fragmentation / memtables etc

2014-06-17 Thread Benedict Elliott Smith
The discussion about virtual methods was around a number of considerations,
as there was some work to refactor the hierarchy and introduce more virtual
methods for a single allocator's cell hierarchies. The allocator also does
not determine the total set of possible cell implementations, as in-flight
data always exists as an on-heap buffer for this version of Cassandra, so
your supposition is a bit optimistic I'm afraid. This may change in a
future version, but depends on the zero-copy patch that has, as I
mentioned, been shelved for the moment.

1) offheap_objects refers to the fact that we serialize the whole object
into an offheap region of memory, except for a single object reference to
it for accessing this data; they are still (currently) allocated as regions
of offheap memory, so there should not be fragmentation of the native heap
2) Yes, although obviously they are slightly more experimental so will not
be the default memtable type when 2.1 first drops
3) I doubt there would be much difference in the current implementation for
offheap memtables (possibly for row cache) as we allocate large (1M)
regions of native memory to save space and fragmentation (native
allocations typically have ~16 byte overhead which is significant at the
cell level). Currently we support jemalloc as an alternative allocator, but
I would benchmark this allocator in your environment, as it has very high
deallocation costs in some environments.

Tuning these variables will, as with everything, be highly workload
dependent. A rule of thumb for offheap_objects: heap: 100 bytes per
partition, 20 bytes per cell; offheap: 8 (timestamp) +  8 (overhead) +
{clustering+name+data size} per cell

On a memtable flush you will see information printed about occupancy of
each limit, so you will be able to tune accordingly if your data is
consistent.
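
As a worked example of that rule of thumb (the per-partition and per-cell
constants come from the paragraph above; the workload figures below are
invented purely for illustration), a back-of-the-envelope estimate might look
like:

    public final class MemtableEstimate
    {
        public static void main(String[] args)
        {
            long partitions = 1_000_000;        // assumed workload
            long cellsPerPartition = 10;        // assumed workload
            long clusteringNameData = 50;       // assumed avg clustering+name+data bytes per cell

            long cells = partitions * cellsPerPartition;

            // heap: ~100 bytes per partition + ~20 bytes per cell
            long heapBytes = partitions * 100 + cells * 20;

            // offheap: 8 (timestamp) + 8 (overhead) + clustering+name+data, per cell
            long offheapBytes = cells * (8 + 8 + clusteringNameData);

            System.out.printf("heap ~%d MB, offheap ~%d MB%n",
                              heapBytes >> 20, offheapBytes >> 20);
        }
    }

For these made-up figures that comes to roughly 286 MB on heap and 629 MB off
heap, i.e. the numbers you would weigh against memtable_heap_space_in_mb and
memtable_offheap_space_in_mb.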


On Sun, Jun 15, 2014 at 11:52 PM, graham sanderson gra...@vast.com wrote:

 Hi Benedict thanks once again for your explanations (hopefully these are
 my last questions!)… I just read
 https://issues.apache.org/jira/browse/CASSANDRA-6694 really quickly, so
 didn’t follow all the discussions…. one thing to note that was argued in
 the middle was about the cost of virtual methods with multiple
  implementations. c2 certainly has a fast path for 1, and N known
  implementations where 1 < N < something small (with slow path, possible
 de/re-compile on optimistic assumption failure, or actually when new
 classes appear). AFAIK this is also tracked per call site, and since the
 choice of allocator is static on startup, these calls are actually
 monomorphic and inline-able (note I don’t work on the compilers, nor have I
 looked closely at the code, but this is my understanding). Anyway I don’t
 think this affected your final implementation, but should be how it works
 for hot code.

 The questions:

 1) Is it fair to say offheap_objects is similar to offheap_buffers, but
 with the added benefit of being able to store (large numbers of) individual
 objects which weren’t kept in the Regions on heap either, because they
 needed to be tracked as actual objects?

 2) That offheap_objects is probably the way to go.

 Is/there or will there be some suggestions on tuning

 memtable_heap_space_in_mb;
 memtable_offheap_space_in_mb;

 compared to memtable_total_space_in_mb today (note
 memtable_total_space_in_mb is actually what is still in cassandra.yaml)

 3) I saw somewhere some talk of using a different malloc implementation
 for less native heap fragmentation, but can’t find it now - do you have a
 recommendation to use the non-standard (have to install it yourself) one.

 Thanks,

 Graham.

 On Jun 15, 2014, at 2:04 PM, Benedict Elliott Smith 
 belliottsm...@datastax.com wrote:

 The current implementation is slower for a single memtable-only read
 (dependent on workload, but similar ball park to on heap), but can store
 substantially more per memtable under offheap_objects (again dependent on
 workload, but as much as 8x in extreme cases) which means more queries are
 answerable from memtable, and write amplification is reduced accordingly,
 improving write throughput.
 On 15 Jun 2014 13:32, graham sanderson gra...@vast.com wrote:

 Hi Benedict,

 Ah, my mistake, I had assumed that since the memory backing the off heap
 ByteBuffer Regions was freed on discard, that the Regions would be
 immediately recyclable in the on heap case.

 I guess I’ll just configure for one of the off-heap variants when 2.1
 comes out…

 Any idea of the performance of offheap_buffers and offheap_objects
 (ignoring GC) compared to heap_buffers? I assume offheap_objects must have
 some benefits but presumably at the cost of fragmentation of the native
 heap… obviously I’ll play with it when it comes out…  right now we are
 testing something else, so I don’t have a good environment to try 2.1 - and
 that is always kind of meaningless anyways, since we have many different
 apps using cassandra with different usage patterns, and it is hard

Re: 2.1 rc3?

2014-07-02 Thread Benedict Elliott Smith
Pretty sure we got this head of the hydra. Question is if any more will
spring up in its place.


On Wed, Jul 2, 2014 at 7:28 PM, Jonathan Ellis jbel...@gmail.com wrote:

 https://issues.apache.org/jira/browse/CASSANDRA-7465 is a pretty big
 one, I'd like to get some more testing with the fix before rolling
 -final.  thoughts?

 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder, http://www.datastax.com
 @spyced



Re: 2.1 rc3?

2014-07-07 Thread Benedict Elliott Smith
I think we should get 7403 in there (fixversion was missing)


On Mon, Jul 7, 2014 at 6:15 AM, Jonathan Ellis jbel...@gmail.com wrote:

 Committed 7254, 7496, 7472.  Think we're ready to roll.

 (I'm okay with fixing 7484 in 2.1.1.)

 On Sat, Jul 5, 2014 at 6:40 PM, Michael Kjellman
 mkjell...@internalcircle.com wrote:
  Also putting my 2 cents in for more testing/another release.
 
  On Jul 5, 2014, at 4:31 PM, Jason Brown jasedbr...@gmail.com wrote:
 
  +1 on more testing. TBH, I was a little scared when I found #7465 as it
 was
  rather easy to uncover.
 
 
  On Wed, Jul 2, 2014 at 11:32 AM, Benedict Elliott Smith 
  belliottsm...@datastax.com wrote:
 
  Pretty sure we got this head of the hydra. Question is if any more will
  spring up in its place.
 
 
  On Wed, Jul 2, 2014 at 7:28 PM, Jonathan Ellis jbel...@gmail.com
 wrote:
 
  https://issues.apache.org/jira/browse/CASSANDRA-7465 is a pretty big
  one, I'd like to get some more testing with the fix before rolling
  -final.  thoughts?
 
  --
  Jonathan Ellis
  Project Chair, Apache Cassandra
  co-founder, http://www.datastax.com
  @spyced
 



 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder, http://www.datastax.com
 @spyced



Re: Hinted Handoff/Atomic Updates SnapTree replacement with BTree

2014-07-14 Thread Benedict Elliott Smith
This discussion is probably better had on JIRA, but I favour an
asynchronous approach permitting only one modifying thread access to the
structure at a time, with each competing modification simply chaining its
to-be-merged state to a pending-list, which is repaired at read time if it
isn't merged before then. i.e. each writer attempts to take an exclusive
lock on modification, and if it succeeds it merges its own state and any
other pending state it sees on entry; if it fails to take the lock it
appends the unmerged modification to a simple list, and returns success.

I plan to address this, but probably not until after 3.0, so if you're in a
rush it may be worth exploring one of these alternative approaches. The
simplest, least modifying solution, is probably to use unsafe to acquire
the object monitor if we fail the first cas (but continue to execute the
same codepath), so that we simply gate the amount of contention we can
experience without incurring any extra costs in the common case.
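
To make the shape of that asynchronous approach concrete, here is a rough,
generic Java sketch: one writer at a time applies modifications, contended
writers just append to a pending list and report success, and readers (or the
next successful writer) fold the pending entries in. All names and types here
are illustrative, not the actual AtomicSortedColumns/Holder code:

    import java.util.concurrent.ConcurrentLinkedQueue;
    import java.util.concurrent.locks.ReentrantLock;
    import java.util.function.Consumer;

    final class PendingMergeHolder<T>
    {
        private final ReentrantLock modifying = new ReentrantLock();
        private final ConcurrentLinkedQueue<T> pending = new ConcurrentLinkedQueue<>();
        private final Consumer<T> merge;    // applies one modification to the underlying structure

        PendingMergeHolder(Consumer<T> merge) { this.merge = merge; }

        // writers: merge if we win the lock, otherwise chain to the pending list and return
        void add(T modification)
        {
            if (modifying.tryLock())
            {
                try
                {
                    merge.accept(modification);   // our own state
                    drainPending();               // plus anything queued by contending writers
                }
                finally
                {
                    modifying.unlock();
                }
            }
            else
            {
                pending.add(modification);        // deferred; repaired later by a reader or writer
            }
        }

        // readers: repair by folding in anything still pending before using the structure
        // (a reader that cannot take the lock would instead merge the pending entries
        // into its own view; omitted here for brevity)
        void repairForRead()
        {
            if (pending.isEmpty())
                return;
            if (modifying.tryLock())
            {
                try { drainPending(); }
                finally { modifying.unlock(); }
            }
        }

        private void drainPending()
        {
            T m;
            while ((m = pending.poll()) != null)
                merge.accept(m);
        }
    }
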


On Mon, Jul 14, 2014 at 8:10 AM, graham sanderson gra...@vast.com wrote:

 Thanks - yeah I figured we’d want to use the original rowmutation
 partition key as part of the hint partition key such that at least hints
 for the same row come in the right order (it isn’t actually clear how much
 that matters).

 All that said, I delved a little deeper into
 AtomicSortedColumns.addAllWithSizeDelta, and got some new numbers (note I
 updated my test to actually call this method rather than simulate it -
 since I wanted to tweak it - and even though I am still skipping some
 copying of byte buffers, the memory overhead for the hinting is
 significant).

 The major wildcard problem however is that addAllWithSizeDelta is
 effectively a spin lock with a fair bit of memory allocation in it…

  It seems to me that the alternative patch would be to extend the
 “Holder” CAS state machine to approximately track concurrency, and fall
 back down to a monitor under contention… I made such a patch on 2.0.x (I’ll
 clean it up some and open an issue), and the results look pretty good:

 This is the same sort of data as before, except for every thread_count,
 element_count, element_size, partition_count tuple, I run with the original
 version of addAllWithSizeDelta followed by the modified version.

 Note the raw data size is now definitely a bit too small (just the size of
  the UUID & the byte[] value), however relative numbers are interesting.

 There are three levels of code:

 “fast” - the original spin loop… thread coordination wise, it just reads
 an AtomicReference, copies and updates into a new state and attempts to CAS
 it back
 “medium” - the same as fast, but inside a try/finally so that it can track
 concurrency in an atomic integer instance (not available in fast path)
 “slow” - the same as medium, but protected by the AtomicSortedColumns
 instance’s monitor

  the a/b number pairs below for these levels are (a) the number of method
  invocations that included that level, and (b) the number of loop
 iterations in that level… If you look closely you’ll see the mix of levels
 changing based on how much contention there is (note up/down are
 approximate counts for such state changes for an individual
 AtomicSortedColumns instance (aka partition in my test)

 Here is the worst case scenario:

 [junit] Threads = 100 elements = 10 (of size 64) partitions = 1
 [junit]   Duration = 1323ms maxConcurrency = 100
 [junit]   GC for PS Scavenge: 94 ms for 28 collections
 [junit]   Approx allocation = 9183MB vs 7MB; ratio to raw data size =
 1203.740583
 [junit]   fast 10/1542949 medium 0/0 slow 0/0 up 0 down 0
 [junit]
 [junit]  -
 [junit]   Duration = 1521ms maxConcurrency = 100
 [junit]   GC for PS Scavenge: 15 ms for 1 collections
 [junit]   Approx allocation = 550MB vs 7MB; ratio to raw data size =
 72.21061
 [junit]   fast 220/233 medium 3/3 slow 99779/99780 up 1 down 1

 Note that the original code did approximately 15 loop iterations (spins)
 per call - and ended up allocating about 9gig for 10 64 byte values

 The new code pretty much reverted to an object monitor in this case, and
 only allocated about 550meg.

 (In this case the new code was slower, but this generally seems to be
 pretty much within the margin of error - I need a longer test to determine
 that - there is really no reason it should be slower (other than I have
 some debug counters in there)

 Here’s another one picked at random with a mix of different levels (note
 that the original code is slower because it does a bunch of memory
 allocating spins - 190622 iterations for 100 inserts)

 [junit] Threads = 100 elements = 10 (of size 256) partitions = 16
 [junit]   Duration = 216ms maxConcurrency = 99
 [junit]   GC for PS Scavenge: 25 ms for 2 collections
 [junit]   Approx allocation = 772MB vs 25MB; ratio to raw data size =
 29.778822352941177
 [junit]   fast 10/190622 medium 0/0 

Re: [VOTE] Release Apache Cassandra 2.1.0-rc6

2014-08-13 Thread Benedict Elliott Smith
I'd prefer to patch CASSANDRA-7743 first.


On Wed, Aug 13, 2014 at 10:47 PM, Brandon Williams dri...@gmail.com wrote:

 +1
 On Aug 13, 2014 7:33 AM, Sylvain Lebresne sylv...@datastax.com wrote:

  I propose the following artifacts for release as 2.1.0-rc6. As it is
 just
  a
  RC, we'll keep the vote to a short 24h.
 
  sha1: 397c0b7c099cc1790c865d9dac7bd46b6194eddf
  Git:
 
 
 http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/2.1.0-rc6-tentative
  Artifacts:
 
 
 https://repository.apache.org/content/repositories/orgapachecassandra-1025/org/apache/cassandra/apache-cassandra/2.1.0-rc6/
  Staging repository:
 
 https://repository.apache.org/content/repositories/orgapachecassandra-1025/
 
  The artifacts as well as the debian package are also available here:
  http://people.apache.org/~slebresne/
 
  The vote will be open for 24 hours (longer if needed).
 
  [1]: http://goo.gl/VUS8bD (CHANGES.txt)
  [2]: http://goo.gl/cA3LMH (NEWS.txt)
 



Re: Assigning tickets

2014-09-05 Thread Benedict Elliott Smith
Done


On Fri, Sep 5, 2014 at 4:34 PM, Jay Patel pateljay3...@gmail.com wrote:

 Hi Folks,

  Seems like I'm not able to assign these tix to myself. Can anyone help
  assign them to me? They are all actually related but I opened them
  separately just to track.

 CASSANDRA-7882 https://issues.apache.org/jira/browse/CASSANDRA-7882
 CASSANDRA-7883 https://issues.apache.org/jira/browse/CASSANDRA-7883
 CASSANDRA-7884 https://issues.apache.org/jira/browse/CASSANDRA-7884

  We're doing tests with a high # of tables in a cluster & would like to have
  these options added, possibly in 2.1.1. We've already implemented them in the
  local env & are doing some perf tests on them. Thoughts are welcome!

 Thanks,
 Jay



Re: [contrib] Idea/ Reduced I/O and CPU cost for GET ops

2014-09-07 Thread Benedict Elliott Smith
Hi Mark,

This specific heuristic is unlikely to be applied, as (if I've understood
it correctly) it has a very narrow window of utility to only those updates
that hit *exactly* the same clustering columns (cql rows) *and* data
columns, and is not trivial to maintain (either cpu- or memory-wise).
However two variants on this heuristic are already being considered for
inclusion as part of the new sstable format we're introducing in
CASSANDRA-7447 https://issues.apache.org/jira/browse/CASSANDRA-7447,
which is an extension of the the heuristic that is already applied at the
whole sstable level.

(1) Per partition, we will store the maximum timestamp (whether or not this
sits in the hash index / key cache, or in the clustering column index part,
is an open question)
 - this permits us to stop looking at files once we have a complete set of
the data we expect to return, i.e. all selected fields for the complete set
of selected rows

(2) Per clustering row, we may store enough information to construct the
max timestamp for the row
 - this permits us to stop looking at data pages if we know we have all
selected fields for a given row only
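
For illustration, the kind of read loop that heuristic (1) enables might look
like the following sketch; the interfaces are stand-ins for whatever the
sstable reader and result-collation code actually expose, not real Cassandra
types:

    import java.util.Comparator;
    import java.util.List;

    final class MaxTimestampSkip
    {
        interface SSTableReader
        {
            long maxTimestampFor(byte[] partitionKey);          // per-partition max timestamp (heuristic 1)
            void collect(byte[] partitionKey, ResultBuilder into);
        }

        interface ResultBuilder
        {
            boolean isComplete();                // all selected rows/fields resolved
            long oldestResolvedTimestamp();      // min timestamp among the resolved cells
        }

        static void read(byte[] key, List<SSTableReader> sstables, ResultBuilder result)
        {
            // visit sstables in descending order of their max timestamp for this partition
            sstables.sort(Comparator.comparingLong(
                    (SSTableReader s) -> s.maxTimestampFor(key)).reversed());

            for (SSTableReader sstable : sstables)
            {
                // once every selected field is resolved, and no remaining sstable can hold
                // anything newer than what we already have, we can stop reading files
                if (result.isComplete()
                    && sstable.maxTimestampFor(key) < result.oldestResolvedTimestamp())
                    break;

                sstable.collect(key, result);
            }
        }
    }
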




On Sun, Sep 7, 2014 at 11:30 PM, Mark Papadakis markuspapada...@icloud.com
wrote:

 Greetings,

 This heuristic helps us eliminate unnecessary I/O in certain workloads and
 datasets, often by many orders of magnitude. This is a description of the
 problems we faced and how we dealt with them — I am pretty certain this can
 be easily implemented on C*, albeit will likely require a new SSTable
 format that can support the semantics described below.

 # Example
 One of our services, a price comparison service, has many millions of
 products in our datastore, and we access over 100+ rows on a single page
 request (almost all of them in 2-3 MultiGets - those are executed in
 parallel in our datastore implementation). This is fine, and it rarely
 takes more than 100ms to get back responses from all those requests.

 Because we, in practice, need to update all key=value rows multiple times
 a day (merchants tend to update their price every few hours for some odd
 reason), it means that a key’s columns exist in multiple(almost always all)
 SSTables of a CF, and so, we almost always have to merge the final value
 for each key from all those many SSTables, as opposed to only need to
 access a single SSTable to do that.

 In fact, for most CFs of this service, we need to merge most of their
 SSTables to get the final CF, because of that same reason (rows update very
 frequently, as opposed to say, a ‘users’ CF where you typically only set
 the row once on creation and very infrequently after that ).
 Surely, there must have been ways to exploit this pattern and these access and
 update semantics (there are).

 Our SSTables are partitioned into chunks. One of those chunks is the index
 chunk which holds distinctKeyId:u64 => offset:u32, sorted by distinctKeyId.
 We have a map which allows us to map distinctKeyId:u64 => data chunk
 region (skip list), so that this offset is an offset into the respective
 data chunk region — this is so that we won’t have to store 64bit offsets
 there, and that saves us 4 bytes / row (every ~4GB we track another data
 chunk region so in practice this is a constant operation).

 # How we are mitigating IO access and merge costs
 Anyway, when enabled, with the additional cost of 64bits for each entry in
 the index chunk, instead of keyId:u64 => (offset:u32), we now use
 keyId:u64 => (offset:u32, latestTs:u32, signature:u32).

 For each CF we are about to serialise to an SSTable, we identify the
 latest timestamp of all columns (milliseconds, we need the unix timestamp).
 Depending on the configuration we also do either of:
 1. A digest of all column names.  Currently, we use CRC32. When we build
 the SSTable index chunk, we store distinctKeyId:u64 =>
 (dataChunkSegmentOffset:u32, latestTimestamp:u32, digest:u32)
 2. Compute a mask based on the first 31 distinct column names encountered
 so far. Here is some pseudocode:

 Dictionary sstableDistinctColumnNames;
 uint32_t mask = 0;

 for (const auto it : cf->columns)
 {
     const auto name = it.name;

     if (sstableDistinctColumnNames.IsSet(name))
         mask |= (1 << sstableDistinctColumnNames[name]);
     else if (sstableDistinctColumnNames.Size() == 31)
     {
         // Cannot track this column, so we won't be able to do
         // much about this row
         mask = 0;
     }
     else
     {
         mask |= (1 << sstableDistinctColumnNames.Size());
         sstableDistinctColumnNames.Set(name,
             sstableDistinctColumnNames.Size());
     }
 }

 and so, we again store distinctKeyId:u64 => (dataChunkSegmentOffset:u32,
 latestTimestamp:u32, map:u32).
 We also store sstableDistinctColumnNames in the SSTable header (each
 SSTable has a header chunk where we store KV records).

 Each method comes with pros and cons. Though they probably make sense and
 you get where this is going 

Re: [VOTE] Release Apache Cassandra 2.1.0

2014-09-08 Thread Benedict Elliott Smith
Fair enough (and yes, it did)

On Mon, Sep 8, 2014 at 2:40 PM, Sylvain Lebresne sylv...@datastax.com
wrote:

 On Sep 7, 2014 4:41 PM, Benedict Elliott Smith 
 belliottsm...@datastax.com
 wrote:
 
   I've just committed 7519, which would be nice (but not essential) to
 include
  in 2.1.0, since it has breaking changes to the stress API

 Let's leave it for 2.1.1, we're past the 'nice to have' for 2.1.0.

 
  Also, not sure if this is just me missing something obvious, and is
  probably minor to fix, but ant fails to compile on
 
 org.apache.cassandra.hadoop.cql3.LimitedLocalNodeFirstLocalBalancingPolicy
 

  I suspect an 'ant realclean' should fix it.

 
  On Sun, Sep 7, 2014 at 9:27 PM, Jonathan Ellis jbel...@gmail.com
 wrote:
 
   +1
  
  
   On Sun, Sep 7, 2014 at 9:23 AM, Sylvain Lebresne sylv...@datastax.com
 
   wrote:
  
We have no outstanding tickets open and tests are in the green, so I
propose
the following artifacts for release as 2.1.0.
   
sha1: c6a2c65a75adea9a62896269da98dd036c8e57f3
Git:
   
   
  

 http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/2.1.0-tentative
Artifacts:
   
   
  

 https://repository.apache.org/content/repositories/orgapachecassandra-1031/org/apache/cassandra/apache-cassandra/2.1.0/
Staging repository:
   
  
 https://repository.apache.org/content/repositories/orgapachecassandra-1031/
   
The artifacts as well as the debian package are also available here:
http://people.apache.org/~slebresne
   
The vote will be open for 72 hours (longer if needed).
   
[1]: http://goo.gl/zfCTyc (CHANGES.txt)
[2]: http://goo.gl/uAoTTC (NEWS.txt)
   
  
  
  
   --
   Jonathan Ellis
   Project Chair, Apache Cassandra
   co-founder, http://www.datastax.com
   @spyced
  



Re: Conditional Update Code?

2015-02-06 Thread Benedict Elliott Smith
It's quite possible support could be added to evaluate a UDF as part of the
condition check. The code you're looking for are implementors of
CASRequest.appliesTo(), in CQL3CasRequest and
CassandraServer.ThriftCASRequest

It seems like https://issues.apache.org/jira/browse/CASSANDRA-8488 would
offer the basic functionality, except that it is expected to require ALLOW
FILTERING, which is unlikely to be permitted for a CAS operation, since the
implication is that the work is too expensive for normal use. Such a
constraint is probably not necessary if a clustering prefix is provided,
though (i.e. a full CQL row key).
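
For a rough idea of what such a condition check involves (and how a "contains"
style condition could slot in alongside equality), here is a heavily simplified
sketch; the types below are stand-ins for illustration only, not the actual
CASRequest/CQL3CasRequest interfaces:

    import java.nio.ByteBuffer;
    import java.util.List;
    import java.util.Map;

    final class ConditionSketch
    {
        enum Op { EQ, CONTAINS }

        static final class Condition
        {
            final String column;
            final Op op;
            final ByteBuffer value;

            Condition(String column, Op op, ByteBuffer value)
            {
                this.column = column;
                this.op = op;
                this.value = value;
            }
        }

        // 'current' is the row as read under the serial (paxos) read:
        // column name -> value(s); a collection column may hold several values
        static boolean appliesTo(List<Condition> conditions, Map<String, List<ByteBuffer>> current)
        {
            for (Condition c : conditions)
            {
                List<ByteBuffer> values = current.get(c.column);
                switch (c.op)
                {
                    case EQ:        // today's behaviour: a single value must match exactly
                        if (values == null || values.size() != 1 || !values.get(0).equals(c.value))
                            return false;
                        break;
                    case CONTAINS:  // the kind of extension asked about: element membership
                        if (values == null || !values.contains(c.value))
                            return false;
                        break;
                }
            }
            return true;   // every condition holds, so the conditional update applies
        }
    }
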

On Fri, Feb 6, 2015 at 2:38 PM, Brian O'Neill b...@alumni.brown.edu wrote:


 All,

  I'm just looking for a little direction…

 Anyone know where I can find the code that checks the condition in a
 conditional update?
  We'd love to have more expressive conditions, beyond just equality.  (e.g.
 column contains? value)

 I wanted to see how hard this would be to contribute.
 Is such a JIRA already open?

 -brian

 ---
 Brian O'Neill
 Chief Technology Officer
 Health Market Science, a LexisNexis Company
  215.588.6024 Mobile • @boneill42 http://www.twitter.com/boneill42


 This information transmitted in this email message is for the intended
 recipient only and may contain confidential and/or privileged material. If
 you received this email in error and are not the intended recipient, or the
 person responsible to deliver it to the intended recipient, please contact
 the sender at the email above and delete this email and any attachments and
 destroy any copies thereof. Any review, retransmission, dissemination,
 copying or other use of, or taking any action in reliance upon, this
 information by persons or entities other than the intended recipient is
 strictly prohibited.






Re: [discuss] Modernization of Cassandra build system

2015-03-31 Thread Benedict Elliott Smith
I think the problem is everyone currently contributing is comfortable with
ant, and as much as it is imperfect, it isn't clear maven is going to be
better. Having the requisite maven functionality linked under the hood
doesn't seem particularly preferable to the inverse. The status quo has the
bonus of zero upheaval for the project and its contributors, though, so it
would have to be a very clear win to justify the change in my opinion.


On Tue, Mar 31, 2015 at 10:24 PM, Łukasz Dywicki l...@code-house.org
wrote:

 Hey Tyler,
  Thank you very much for coming back. I had already lost faith that I would get
  a reply. :-) I am fine with code relocations. Moving constants into one place
 where they cause no circular dependencies is cool, I’m all for doing such
 thing.

 Currently Cassandra uses ant for doing some of maven functionalities (such
 deploying POM.xml into repositories with dependency information), it uses
 also maven type of artifact repositories. This can be easily flipped. Maven
 can call ant tasks for these parts which can not be made with existing
 maven plugins. Here is simplest example:
  http://docs.codehaus.org/display/MAVENUSER/Antrun+Plugin - you can see
  an ant task definition embedded in a maven pom.xml.

  Most of these things can be done at this moment via maven plugins:
  apache-rat-plugin:
  http://mvnrepository.com/artifact/org.apache.rat/apache-rat-plugin/0.11
  maven-thrift-plugin:
  http://mvnrepository.com/artifact/org.apache.thrift.tools/maven-thrift-plugin/0.1.11
  antlr4-maven-plugin:
  http://mvnrepository.com/artifact/org.antlr/antlr4-maven-plugin/4.5 or
  antlr3-maven-plugin:
  http://mvnrepository.com/artifact/org.antlr/antlr3-maven-plugin/3.5.2
  maven-gpg-plugin:
  http://mvnrepository.com/artifact/org.apache.maven.plugins/maven-gpg-plugin/1.6
  maven-cobertura-plugin: http://mojo.codehaus.org/cobertura-maven-plugin/ (but
  these days jacoco with java agent instrumentation performs better)
  .. and so on

  I already made some evaluation of the impact and it is big. The code has to be
  separated into different source roots. It's not easy even to keep the
  current artifact structure: cassandra-all, cassandra-thrift and clientutil
  (because of cyclic dependencies). What I can do is prepare these src roots
  with the dependencies declared for them and push that to my cassandra
  fork, so you will be able to verify it and continue with relocations if
  you like the new build. Creating new modules (source roots) with maven is
  simple, so you could possibly extract more than these 3 predefined
  artifacts/package roots.
 Just let me know if you are interested.

 Kind regards,
 Lukasz


   Message from Tyler Hobbs ty...@datastax.com on 31 Mar 2015, at 21:57:
 
  Hi Łukasz,
 
  I'm not very familiar with the build system, but I'll try to respond.
 
  The Serializer dependencies on org.apache.cassandra.transport are almost
  certainly uses of Server.CURRENT_VERSION and Server.VERSION_3.  These are
  constants that represent the native protocol version in use, which
 affects
  how certain types are serialized.  These constants could easily be moved.
 
  The o.a.c.marshal dependency in MapSerializer is on AbstractType, but
 could
  easily be replaced with java.util.Comparator.
 
  In any case, I'm not necessarily opposed to improving the build system to
  make these errors more apparent.  Would your proposal still allow us to
  build with ant (and just change the way those artifacts are built)?
 
   On Tue, Mar 24, 2015 at 7:58 PM, Łukasz Dywicki l...@code-house.org wrote:
 
   Dear cassandra committers and development process followers,
   I would like to bring up an important topic: the build process of
 cassandra. I
   am an external user from a community point of view, however I have been walking
   around various projects close to cassandra over the past year or even more.
  What is worrying me a lot is how cassandra is publishing artifacts and
 how
   many problems are reported due to that.
 
   First of all - I want to note that I am not a born enemy of Ant itself. I
  never used it. I am also aware of problems with custom builds made with
  Maven, however I don’t really want to discuss any particular
 replacement,
  yet I want to note that Cassandra JIRA project contains about 116 issues
   related somehow to maven (http://bit.ly/1GRoXl5,
  project=CASSANDRA, text ~ maven). Depending on the point of view it might
 be
  a lot or a 

Re: [discuss] Modernization of Cassandra build system

2015-04-02 Thread Benedict Elliott Smith
There are three distinct problems you raise: code structure, documentation,
and build system.

The build system, as far as I can tell, is a matter of personal preference.
I personally dislike the few interactions I've had with maven, but
gratefully my interactions with build system innards have been fairly
limited. I mostly just use them. Unless a concrete and significant benefit
is delivered by maven, though, it just doesn't seem worth the upheaval to
me. If you can make the argument that it actually improves the project in a
way that justifies the upheaval, it will certainly be considered, but so
far no justification has been made.

The documentation problem is common to many projects, though: out-of-codebase
documentation gets stale very rapidly. When we say to read the code, we mean
read the code and its inline documentation - the quality of
this documentation has itself generally been substandard, but has been
improving significantly over the past year or so, and we are endeavouring
to improve with every change. In the meantime, there are videos from a
recent bootcamp we've run for both internal and external contributors
http://www.datastax.com/dev/blog/deep-into-cassandra-internals.

The code structure would be great to modularise, but the reality is that it
is not currently modular. There are no good clear dividing lines for much
of the project. The problem with refactoring the entire codebase to create
separate projects is that it is a significant undertaking that makes
maintenance of the project across versions significantly more costly. This
create a net drag on all productivity in the project. Such a major change
requires strong consensus, and strong evidence justifying it. So the
question is: would this create more new work than it loses? The evidence
isn't there that it would. It might, but I personally guess that it would
not, judging by the results of our other attempts to drive up contributions
to the project. Perhaps we can have a wider dialogue about the endeavour,
though, and see if a consensus can in fact be built.



On Thu, Apr 2, 2015 at 9:31 AM, Pierre Devops pierredev...@gmail.com
wrote:

 Hi all,

 Not a cassandra contributor here, but I'm working on the cassandra sources
 too.

  This big cassandra source root caused me trouble too: firstly, it was not
  easy to import into an IDE - try to import the cassandra sources in
  netbeans, it's a headache.

  It would be great if we had more small modules/projects in separate POMs. It
  would be easier to work on a small part of the project, and as a
  consequence, I'm sure you would have more external contributions to this
  project.

  I know cassandra devs are used to the ant build model, but it's like a thread I
  opened about updated and more complete documentation of the sstable
  structures. I got the answer that it was not needed to understand how to use
  Cassandra, and that the only way to learn about it is to rtfcode. Because
  people working on cassandra already know what the sstable structures are, it's
  not considered necessary to provide up-to-date documentation.
  So it will take me a very long time to read and understand all the
  serialization code in cassandra to understand the sstable structure before
  I can work on the code. Up-to-date documentation about internals would have
  given me the knowledge I need to contribute much more quickly.

 Here we have the same problem, we have a complex non modular build system,
 and core cassandra dev are used to it, so it's not needed to make something
 more flexible, even if it could facilite external contribution.



 2015-03-31 23:42 GMT+02:00 Benedict Elliott Smith 
 belliottsm...@datastax.com:

  I think the problem is everyone currently contributing is comfortable
 with
  ant, and as much as it is imperfect, it isn't clear maven is going to be
  better. Having the requisite maven functionality linked under the hood
  doesn't seem particularly preferable to the inverse. The status quo has
 the
  bonus of zero upheaval for the project and its contributors, though, so
 it
  would have to be a very clear win to justify the change in my opinion.
 
 
  On Tue, Mar 31, 2015 at 10:24 PM, Łukasz Dywicki l...@code-house.org
  wrote:
 
   Hey Tyler,
   Thank you very much for coming back. I already lost faith that I would
   get a reply. :-) I am fine with code relocations. Moving constants into one
   place where they cause no circular dependencies is cool, I’m all for doing
   such a thing.
  
   Currently Cassandra uses ant for doing some of the maven functionality
   (such as deploying POM.xml into repositories with dependency information); it
   also uses maven-type artifact repositories. This can be easily flipped. Maven
   can call ant tasks for those parts which cannot be made with existing
   maven plugins. Here is the simplest example:
   http://docs.codehaus.org/display/MAVENUSER/Antrun+Plugin - you can see
   an ant task definition embedded in a maven pom.xml.
  
   Most

Re: [discuss] Modernization of Cassandra build system

2015-04-13 Thread Benedict Elliott Smith
 multiple build modules, placeholders
 and
  so on, not because it’s handsome to do such.
 
  Modernization of the build and internal dependencies is not something which
  brings a huge benefit in the first run because now your frontend is CQL,
  however it gives a real boost when it comes to community donations, tool
  development, or even debugging. Sadly, keeping the current Ant build is a
  silent agreement to keep the mess internally and the rickety architecture of
  the project. Ant was already a legacy tool when Cassandra was launched. The
  longer you stay with it, the more trouble you will get with it over time.
 
  Kind regards,
  Lukasz
 
 
  Wiadomość napisana przez Robert Stupp sn...@snazy.de w dniu 2 kwi
  2015, o godz. 14:51:
 
  TL;DR - Benedict is right.
 
  IMO Maven is a nice, straight-forward tool if you know what you’re
 doing
  and start on a _new_ project.
  But Maven easily becomes a pita if you want to do something that’s not
  supported out-of-the-box.
  I bet that Maven would just not work for C* source tree with all the
  little nice features that C*’s build.xml offers (just look at the
 scripted
  stuff in build.xml).
 
  Eventually gradle would be an option; I proposed to switch to gradle
  several months ago. Same story (although gradle is better than Maven ;)
 ).
  But… you need to know that build.xml is not just used to build the code
  and artifacts. It is also used in CI, ccm, cstar-perf and a some other
  custom systems that exist and just work. So - if we would exchange ant
 with
  something else, it would force a lot of effort to change several tools
 and
  systems. And there must be a guarantee that everything works like it did
  before.
 
  Regarding IDEs: i’m using IDEA every day and it works like a charm with
  C*. Eclipse is ”supported natively” by ”ant generate-eclipse-files”.
 TBH I
  don’t know NetBeans.
 
  As Benedict pointed out, the code has improved and still improves a lot
  - in structure, in inline-doc, in nomenclature and whatever else. As
 soon
  as we can get rid of Thrift in the tree, there’s another big
 opportunity to
  cleanup more stuff.
 
  TBH I don’t think that (beside the tools) there would be a need to
  generate multiple artifacts for C* daemon - you can do ”separation of
  concerns” (via packages) even with discipline and then measure it.
  IMO The only artifact worth to extract out of C* tree, and useful for a
  (limited) set of 3rd party code, is something like
  ”cassandra-jmx-interfaces.jar”
 
  Robert
 
  Am 02.04.2015 um 11:30 schrieb Benedict Elliott Smith 
  belliottsm...@datastax.com:
 
  There are three distinct problems you raise: code structure,
  documentation,
  and build system.
 
  The build system, as far as I can tell, is a matter of personal
  preference.
  I personally dislike the few interactions I've had with maven, but
  gratefully my interactions with build system innards have been fairly
  limited. I mostly just use them. Unless a concrete and significant
  benefit
  is delivered by maven, though, it just doesn't seem worth the upheaval
  to
  me. If you can make the argument that it actually improves the project
  in a
  way that justifies the upheaval, it will certainly be considered, but
 so
  far no justification has been made.
 
  The documentation problem is common to many projects, though: out of
  codebase documentation gets stale very rapidly. When we say to read
 the
  code we mean read the code and its inline documentation - the
  quality of
  this documentation has itself generally been substandard, but has been
  improving significantly over the past year or so, and we are
  endeavouring
  to improve with every change. In the meantime, there are videos from a
  recent bootcamp we've run for both internal and external contributors
  http://www.datastax.com/dev/blog/deep-into-cassandra-internals.
 
  The code structure would be great to modularise, but the reality is
  that it
  is not currently modular. There are no good clear dividing lines for
  much
  of the project. The problem with refactoring the entire codebase to
  create
  separate projects is that it is a significant undertaking that makes
  maintenance of the project across versions significantly more costly.
  This
   creates a net drag on all productivity in the project. Such a major
  change
  requires strong consensus, and strong evidence justifying it. So the
  question is: would this create more new work than it loses? The
 evidence
  isn't there that it would. It might, but I personally guess that it
  would
  not, judging by the results of our other attempts to drive up
  contributions
  to the project. Perhaps we can have a wider dialogue about the
  endeavour,
  though, and see if a consensus can in fact be built.
 
 
 
  On Thu, Apr 2, 2015 at 9:31 AM, Pierre Devops pierredev...@gmail.com
 
  wrote:
 
  Hi all,
 
  Not a cassandra contributor here, but I'm working on the cassandra
  sources
  too.
 
  This big cassandra source root caused me trouble too

Re: Network transfer to one node twice as others

2015-04-22 Thread Benedict Elliott Smith
If you're connecting via thrift, all your traffic is most likely being
routed to just one node, which then communicates with the other nodes for
you.
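
For comparison, a minimal sketch of what token-aware routing over the native protocol looks like (assuming the DataStax Java driver 2.x; the contact point and keyspace names are placeholders). With this policy each request is coordinated by a replica for its partition, so coordinator work is spread across the cluster rather than funnelled through the one node a thrift client connects to:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;
import com.datastax.driver.core.policies.TokenAwarePolicy;

public class TokenAwareClient {
    public static void main(String[] args) {
        // Token awareness sends each request to a node that owns the data,
        // instead of routing everything through a single proxy/coordinator.
        Cluster cluster = Cluster.builder()
                .addContactPoint("10.0.0.1")  // placeholder seed node
                .withLoadBalancingPolicy(
                        new TokenAwarePolicy(new DCAwareRoundRobinPolicy()))
                .build();
        Session session = cluster.connect("my_keyspace");  // placeholder keyspace
        session.execute("SELECT release_version FROM system.local");
        cluster.close();
    }
}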

On Wed, Apr 22, 2015 at 6:11 AM, Anishek Agarwal anis...@gmail.com wrote:

 Forwarding it here, someone with Cassandra internals knowledge can help maybe.

 Additionally, I observe the same behavior for reads too, where the network read
 from one node is twice that of the other two.


 -- Forwarded message --
 From: Anishek Agarwal anis...@gmail.com
 Date: Tue, Apr 21, 2015 at 5:15 PM
 Subject: Network transfer to one node twice as others
 To: u...@cassandra.apache.org u...@cassandra.apache.org


 Hello,

 We are using cassandra 2.0.14 and have a cluster of 3 nodes. I have a
 writer test (written in java) that runs 50 threads to populate data to a
 single table in a single keyspace.

 When I look at iftop I see that the amount of network transfer
 happening on two nodes is the same, but on one of the nodes it is almost twice
 that of the other two. Any reason that would be the case?

 Thanks
 Anishek



Re: DateTieredCompactionStrategy and static columns

2015-05-01 Thread Benedict Elliott Smith
It also doesn't solve the atomicity problem, which is its own challenge. We
would probably need to merge the memtables for the entire keyspace/node,
and split them out into their own sstables on flush. Or introduce mutual
exclusion at the partition key level for the node.

On Fri, May 1, 2015 at 3:01 PM, Jonathan Ellis jbel...@gmail.com wrote:

 I'm down for adding JOIN support within a partition, eventually.  I can see
 a lot of stuff I'd rather prioritize higher in the short term though.

 On Fri, May 1, 2015 at 8:44 AM, Jonathan Haddad j...@jonhaddad.com wrote:

  I think what Benedict has described feels very much like a very
 specialized
  version of the following:
 
  1. Updates to different tables in a batch become atomic if the node is a
  replica for the partition
  2. Supporting Inner joins if the partition key is the same in both
 tables.
 
  I'd rather see join support personally :)
 
  Jon
 
  On Fri, May 1, 2015 at 6:38 AM graham sanderson gra...@vast.com wrote:
 
   I 100% agree with Benedict, but just to be clear about my use case
  
    1) We have state of, let's say, real estate listings
   2) We get field level deltas for them
    3) Previously we would store the base state and all the deltas in a partition
   and roll them up from the beginning of time (this was a prototype and
  silly
   since there was no expiration strategy)
   4) Preferred plan is to keep current state in a static map (i.e. one
  delta
   field only updates one cell) - we are MVCC but in the common case the
   latest version will be what we want
   5) However we require history, so we’d use the partition to keep TTL
   deltas going backwards from the now state - this seems like a common
   pattern people would want. Note also that sometimes we might need to
  apply
   reverse deltas if C* is ahead of our SOLR indexes
  
   The static columns and the regular columns ARE completely different in
   behavior/lifecycle, so I’d definitely vote for them being treated as
  such.
  
  
On May 1, 2015, at 7:27 AM, Benedict Elliott Smith 
   belliottsm...@datastax.com wrote:
   
   
How would it be different from creating an actual real extra table
   instead?
   
   
There's nothing that warrants making the codebase more complex to
accomplish something it already does.
   
   
As far as I was aware, the only point of static columns was to
 support
   the
thrift ability to mutate and read them in the same expression, with
atomicity and isolation. As to whether or not it is more complex, I'm
  not
at all convinced that it would be. We have had a lot of unexpected
   special
casing added to ensure they behave correctly (e.g. paging is broken),
  and
have complicated the comparison/slice logic to accommodate them, so
  that
   it
is harder to reason about (and to optimise). They also have very
   different
compaction characteristics, so the complexity on the user is
 increased
without their necessarily realising it. All told, it introduces a lot
   more
subtlety of behaviour than there would be with a separate set of
   sstables,
or perhaps a separate file attached to each sstable.
   
Of course, we've already implemented it as a specialisation of the
slice/comparator, I think because it seemed like the least frictional
   path
to do so, but that doesn't mean it is the least complex. It does mean
   it's
the least work (assuming we're now on top of the bugs), which is its
  own
virtue.
   
There are some advantages to having them managed separately, and
   advantages
to having them combined. Combined, for small partitions, they can be
  read
in the same seek. However for large partitions this is no longer
 true,
   and
we may behave much worse by polluting the page cache with lots of
   unwanted
data that is adjacent to the static columns. If they were managed
separately, the page cache would be populated mostly with other
 static
columns, which may be more likely of use. We could quite easily have
 a
static column cache, also, and completely avoid merging them. Or at
   least
we could easily read them with collectTimeOrderedData instead of
collectAllData semantics.
   
All told, it certainly isn't a terrible idea, and shouldn't be
  dismissed
   so
readily. Personally I think in the long run whether or not we manage
   static
columns together with non-static columns is dependent on if we intend
  to
add tiered static columns (i.e., if each level of clustering
  component
can have columns associated with it). If we do, we should definitely
  keep
it all inline. If not, it probably permits a lot better behaviour to
separate them, since it's easier to reason about and improve their
   distinct
characteristics.
   
   
On Fri, May 1, 2015 at 1:24 AM, graham sanderson gra...@vast.com
   wrote:
   
Well you lose the atomicity and isolation, but in this case that is
probably fine

Re: DateTieredCompactionStrategy and static columns

2015-05-02 Thread Benedict Elliott Smith
There I was referring to making operations across multiple (logical) tables
atomic and isolated, as opposed to splitting static and non-static at flush
(which is not particularly tricky)

On Fri, May 1, 2015 at 5:03 PM, graham sanderson gra...@vast.com wrote:

 Naively (I may be missing something) it seems much easier to flush a
 single memtable to more than one sstable on disk (static and non-static) and
 then allow for separate compaction of those

  On May 1, 2015, at 9:06 AM, Benedict Elliott Smith 
 belliottsm...@datastax.com wrote:
 
  It also doesn't solve the atomicity problem, which is its own challenge.
 We
  would probably need to merge the memtables for the entire keyspace/node,
  and split them out into their own sstables on flush. Or introduce mutual
  exclusion at the partition key level for the node.
 
  On Fri, May 1, 2015 at 3:01 PM, Jonathan Ellis jbel...@gmail.com
 wrote:
 
  I'm down for adding JOIN support within a partition, eventually.  I can
 see
  a lot of stuff I'd rather prioritize higher in the short term though.
 
  On Fri, May 1, 2015 at 8:44 AM, Jonathan Haddad j...@jonhaddad.com
 wrote:
 
  I think what Benedict has described feels very much like a very
  specialized
  version of the following:
 
  1. Updates to different tables in a batch become atomic if the node is
 a
  replica for the partition
  2. Supporting Inner joins if the partition key is the same in both
  tables.
 
  I'd rather see join support personally :)
 
  Jon
 
  On Fri, May 1, 2015 at 6:38 AM graham sanderson gra...@vast.com
 wrote:
 
  I 100% agree with Benedict, but just to be clear about my use case
 
   1) We have state of, let's say, real estate listings
  2) We get field level deltas for them
   3) Previously we would store the base state and all the deltas in a
 partition
  and roll them up from the beginning of time (this was a prototype and
  silly
  since there was no expiration strategy)
  4) Preferred plan is to keep current state in a static map (i.e. one
  delta
  field only updates one cell) - we are MVCC but in the common case the
  latest version will be what we want
  5) However we require history, so we’d use the partition to keep TTL
  deltas going backwards from the now state - this seems like a common
  pattern people would want. Note also that sometimes we might need to
  apply
  reverse deltas if C* is ahead of our SOLR indexes
 
  The static columns and the regular columns ARE completely different in
  behavior/lifecycle, so I’d definitely vote for them being treated as
  such.
 
 
  On May 1, 2015, at 7:27 AM, Benedict Elliott Smith 
  belliottsm...@datastax.com wrote:
 
 
  How would it be different from creating an actual real extra table
  instead?
 
 
  There's nothing that warrants making the codebase more complex to
  accomplish something it already does.
 
 
  As far as I was aware, the only point of static columns was to
  support
  the
  thrift ability to mutate and read them in the same expression, with
  atomicity and isolation. As to whether or not it is more complex, I'm
  not
  at all convinced that it would be. We have had a lot of unexpected
  special
  casing added to ensure they behave correctly (e.g. paging is broken),
  and
  have complicated the comparison/slice logic to accommodate them, so
  that
  it
  is harder to reason about (and to optimise). They also have very
  different
  compaction characteristics, so the complexity on the user is
  increased
  without their necessarily realising it. All told, it introduces a lot
  more
  subtlety of behaviour than there would be with a separate set of
  sstables,
  or perhaps a separate file attached to each sstable.
 
  Of course, we've already implemented it as a specialisation of the
  slice/comparator, I think because it seemed like the least frictional
  path
  to do so, but that doesn't mean it is the least complex. It does mean
  it's
  the least work (assuming we're now on top of the bugs), which is its
  own
  virtue.
 
  There are some advantages to having them managed separately, and
  advantages
  to having them combined. Combined, for small partitions, they can be
  read
  in the same seek. However for large partitions this is no longer
  true,
  and
  we may behave much worse by polluting the page cache with lots of
  unwanted
  data that is adjacent to the static columns. If they were managed
  separately, the page cache would be populated mostly with other
  static
  columns, which may be more likely of use. We could quite easily have
  a
  static column cache, also, and completely avoid merging them. Or at
  least
  we could easily read them with collectTimeOrderedData instead of
  collectAllData semantics.
 
  All told, it certainly isn't a terrible idea, and shouldn't be
  dismissed
  so
  readily. Personally I think in the long run whether or not we manage
  static
  columns together with non-static columns is dependent on if we intend
  to
  add tiered static columns (i.e., if each level of clustering

Re: DateTieredCompactionStrategy and static columns

2015-05-01 Thread Benedict Elliott Smith

 How would it be different from creating an actual real extra table instead?


There's nothing that warrants making the codebase more complex to
 accomplish something it already does.


As far as I was aware, the only point of static columns was to support the
thrift ability to mutate and read them in the same expression, with
atomicity and isolation. As to whether or not it is more complex, I'm not
at all convinced that it would be. We have had a lot of unexpected special
casing added to ensure they behave correctly (e.g. paging is broken), and
have complicated the comparison/slice logic to accommodate them, so that it
is harder to reason about (and to optimise). They also have very different
compaction characteristics, so the complexity on the user is increased
without their necessarily realising it. All told, it introduces a lot more
subtlety of behaviour than there would be with a separate set of sstables,
or perhaps a separate file attached to each sstable.

Of course, we've already implemented it as a specialisation of the
slice/comparator, I think because it seemed like the least frictional path
to do so, but that doesn't mean it is the least complex. It does mean it's
the least work (assuming we're now on top of the bugs), which is its own
virtue.

There are some advantages to having them managed separately, and advantages
to having them combined. Combined, for small partitions, they can be read
in the same seek. However for large partitions this is no longer true, and
we may behave much worse by polluting the page cache with lots of unwanted
data that is adjacent to the static columns. If they were managed
separately, the page cache would be populated mostly with other static
columns, which may be more likely of use. We could quite easily have a
static column cache, also, and completely avoid merging them. Or at least
we could easily read them with collectTimeOrderedData instead of
collectAllData semantics.

All told, it certainly isn't a terrible idea, and shouldn't be dismissed so
readily. Personally I think in the long run whether or not we manage static
columns together with non-static columns is dependent on if we intend to
add tiered static columns (i.e., if each level of clustering component
can have columns associated with it). If we do, we should definitely keep
it all inline. If not, it probably permits a lot better behaviour to
separate them, since it's easier to reason about and improve their distinct
characteristics.


On Fri, May 1, 2015 at 1:24 AM, graham sanderson gra...@vast.com wrote:

 Well you lose the atomicity and isolation, but in this case that is
 probably fine

 That said, in every interaction I’ve had with static columns, they seem to
 be an odd duck (e.g. adding or complicating range slices), perhaps worthy
 of their own code path and sstables. Just food for thought.

  On Apr 30, 2015, at 7:13 PM, Jonathan Haddad j...@jonhaddad.com wrote:
 
  If you want it in a separate sstable, just use a separate table.  There's
  nothing that warrants making the codebase more complex to accomplish
  something it already does.
 
  On Thu, Apr 30, 2015 at 5:07 PM graham sanderson gra...@vast.com
 wrote:
 
  Anyone here have an opinion; how realistic would it be to have a
 separate
  memtable/sstable for static columns?
 
  Begin forwarded message:
 
  *From: *Jonathan Haddad j...@jonhaddad.com
  *Subject: **Re: DateTieredCompactionStrategy and static columns*
  *Date: *April 30, 2015 at 3:55:46 PM CDT
  *To: *u...@cassandra.apache.org
  *Reply-To: *u...@cassandra.apache.org
 
 
  I suspect this will kill the benefit of DTCS, but haven't tested it to
 be
  100% here.
 
  The benefit of DTCS is that sstables are selected for compaction based
 on
  the age of the data, not their size.  When you mix TTL'ed data and non
   TTL'ed data, you end up screwing with the "drop the entire SSTable"
  optimization.  I don't believe this is any different just because you're
  mixing in static columns.  What I think will happen is you'll end up
 with
  an sstable that's almost entirely TTL'ed with a few static columns that
  will never get compacted or dropped.  Pretty much the worst scenario I
 can
  think of.
 
 
 
  On Thu, Apr 30, 2015 at 11:21 AM graham sanderson gra...@vast.com
 wrote:
 
  I have a potential use case I haven’t had a chance to prototype yet,
  which would normally be a good candidate for DTCS (i.e. data delivered
 in
  order and a fixed TTL), however with every write we’d also be updating
 some
  static cells (namely a few key/values in a static maptext.text CQL
  column). There could also be explicit deletes of keys in the static
 map,
  though that’s not 100% necessary.
 
   Since those columns don’t have TTL, without reading thru the code
   and/or trying it, I have no idea what effect this has on DTCS (perhaps it
   needs to use separate sstables for static columns). Has anyone tried this?
  If not I eventually will and will report back.
 
 




Staging Branches

2015-05-07 Thread Benedict Elliott Smith
A good practice as a committer applying a patch is to build and run the
unit tests before updating the main repository, but to do this for every
branch is infeasible and impacts local productivity. Alternatively,
uploading the result to your development tree and waiting a few hours for
CI to validate it is likely to result in a painful cycle of race-to-merge
conflicts, rebasing and waiting again for the tests to run.

So I would like to propose a new strategy: staging branches.

Every major branch would have a parallel branch:

cassandra-2.0 -> cassandra-2.0_staging
cassandra-2.1 -> cassandra-2.1_staging
trunk -> trunk_staging

On commit, the idea would be to perform the normal merge process on the
_staging branches only. CI would then run on every single git ref, and as
these passed we would fast forward the main branch to the latest validated
staging git ref. If one of them breaks, we go and edit the _staging branch
in place to correct the problem, and let CI run again.

So, a commit would look something like:

patch -> cassandra-2.0_staging -> cassandra-2.1_staging -> trunk_staging

wait for CI, see 2.0, 2.1 are fine but trunk is failing, so

git rebase -i trunk_staging ref~1
fix the problem
git rebase --continue

wait for CI; all clear

git checkout cassandra-2.0; git merge cassandra-2.0_staging
git checkout cassandra-2.1; git merge cassandra-2.1_staging
git checkout trunk; git merge trunk_staging

This does introduce some extra steps to the merge process, and we will have
branches we edit the history of, but the amount of edited history will be
limited, and this will remain isolated from the main branches. I'm not sure
how averse to this people are. An alternative policy might be to enforce
that we merge locally and push to our development branches then await CI
approval before merging. We might only require this to be repeated if there
was a new merge conflict on final commit that could not automatically be
resolved (although auto-merge can break stuff too).

Thoughts? It seems if we want an always releasable set of branches, we
need something along these lines. I certainly break tests by mistake, or
the build itself, with alarming regularity. Fixing with merges leaves a
confusing git history, and leaves the build broken for everyone else in the
meantime, so patches applied after, and development branches based on top,
aren't sure if they broke anything themselves.


Re: Staging Branches

2015-05-07 Thread Benedict Elliott Smith

 wouldn't you need to force push?


git push --force-with-lease

This works essentially like CAS; if the remote repositories are not the
same as the one you have modified, it will fail. You then fetch and repair
your local version and try again.

So what does this buy us?


This buys us a clean development process. We bought into always
releasable. It's already a tall order; if we start weakening the
constraints before we even get started, I am unconvinced we will
successfully deliver. A monthly release cycle requires *strict* processes,
not *almost* strict, or strict*ish*.

Something that could also help make a more streamlined process: if actual
commits were constructed on development branches ready for commit, with a
proper commit message and CHANGES.txt updated. Even more ideally: with git
rerere data for merging up to each of the branches. If we had that, and
each of the branches had been tested in CI, we would be much closer than we
are currently, as the risk-at-commit is minimized.

On Thu, May 7, 2015 at 2:48 PM, Jake Luciani jak...@gmail.com wrote:

 git rebase -i trunk_staging ref~1
 fix the problem
 git rebase --continue

 In this situation, if there was an untested follow on commit wouldn't
 you need to force push?

 On Thu, May 7, 2015 at 9:28 AM, Benedict Elliott Smith
 belliottsm...@datastax.com wrote:
 
  If we do it, we'll end up in weird situations which will be annoying for
  everyone
 
 
  Such as? I'm not disputing, but if we're to assess the relative
  strengths/weaknesses, we need to have specifics to discuss.
 
  If we do go with this suggestion, we will most likely want to enable a
  shared git rerere cache, so that rebasing is not painful when there are
  future commits.
 
  If instead we go with repairing commits, we cannot have a queue of
  things to merge up to. Say you have a string of commits waiting for
  approval C1 to C4; you made C1, and it broke something. You introduce C5
 to
  fix it, but the tests are still broken. Did you not really fix it? Or
  perhaps one of C2 to C4 are to blame, but which? And have you
 accidentally
  broken *them* with your commit? Who knows. Either way, we definitely
 cannot
  fast forward. At the very best we can hope that the new merge did not
  conflict or mess up the other people's C2 to C4 commits, and they have to
  now merge on top. But what if another merge comes in, C6, in the
 meantime;
  and C2 really did also break the tests in some way; how do we determine
 C2
  was to blame, and not C6, or C3 or C4? What do the committers for each of
  these do? We end up in a lengthy tussle, and aren't able to commit any of
  these to the mainline until all of them are resolved. Really we have to
  prevent any merges to the staging repository until the mistakes are
 fixed.
   Since our races in this scenario are the length of time taken for cassci
   to vet them, these problems are much more likely than the current race to
  commit.
 
  In the scheme I propose, in this scenario, the person who broke the build
  rebases everyone's branches to his now fixed commit, and the next broken
  commit gets blamed, and all other commits being merged in on top can go
 in
  smoothly. The only pain point I can think of is the multi-branch rebase,
  but this is solved by git rerere.
 
  I agree running tests is painful, but at least for the build, this should
  be the responsibility of the committer to build before merging
 
 
  Why make the distinction if we're going to have staging commits? It's a
 bit
   of a waste of time to run three ant real-clean && ant tasks, and
 increases
  the race window for merging (which is painful whether or not involves a
   rebase), and it is not a *typical* occurrence ("alarming" is subjective)
 
  On Thu, May 7, 2015 at 2:12 PM, Sylvain Lebresne sylv...@datastax.com
  wrote:
 
   If one of them breaks, we go and edit the _staging branch in place to
  correct
   the problem, and let CI run again.
 
  I would strongly advise against *in place* edits. If we do it, we'll
 end up
  in
  weird situations which will be annoying for everyone. Editing commits
 that
  have
  been shared is almost always a bad idea and that's especially true for
  branch
  that will have some amount of concurrency like those staging branches.
 
  Even if such problems are rare, better to avoid them in the first place
 by
  simply
  commit new fixup commits as we currently do. Granted this give you a
  slightly
  less clean history but to the best of my knowledge, this hasn't been a
 pain
  point so far.
 
   wait for CI; all clear
  
   git checkout cassandra-2.0; git merge cassandra-2.0_staging
   git checkout cassandra-2.1; git merge cassandra-2.1_staging
   git checkout trunk; git merge trunk_staging
  
   This does introduce some extra steps to the merge process
 
  If we do this, we should really automate that last part (have the CI
  environment merge the staging branch to the non-staging ones on
 success).
 
   It seems if we want an always releasable set

Re: Staging Branches

2015-05-07 Thread Benedict Elliott Smith

 I would argue that we must *at least* do the following for now.


If we get this right, the extra staging branches can certainly wait to be
assessed until later.

IMO, any patch should have a branch in CI for each affected mainline
branch, and should have the commit completely wired up (CHANGES.txt, commit
message, the works), so that it can be merged straight in. If it conflicts
significantly, it can be bumped back to the author/reviewer to refresh.

On Thu, May 7, 2015 at 4:16 PM, Aleksey Yeschenko alek...@apache.org
wrote:

 I would argue that we must *at least* do the following for now.

 If your patch is 2.1-based, you need to create a private git branch for
 that and a merged trunk branch ( and -trunk). And you don’t push
 anything until cassci validates all of those three branches, first.

 An issue without a link to cassci for both of those branches passing
 doesn’t qualify as done to me.

 That alone will be enough to catch most merge-related regressions.

 Going with staging branches would also prevent any issues from concurrent
 pushes, but given the opposition, I’m fine with dropping that requirement,
 for now.

 --
 AY

 On May 7, 2015 at 18:04:20, Josh McKenzie (josh.mcken...@datastax.com)
 wrote:

 
  Merging is *hard*. Especially 2.1 -> 3.0, with many breaking API changes
  (this is before 8099, which is going to make a *world* of hurt, and will
  stick around for a year). It is *very* easy to break things, with even
 the
  utmost care.


 While I agree re:merging, I'm not convinced the proportion of commits that
 will benefit from a staging branch testing pipeline is high enough to
 justify the time and complexity overhead to (what I expect are) the vast
 majority of commits that are smaller, incremental changes that won't
 benefit from this.

 On Thu, May 7, 2015 at 9:56 AM, Ariel Weisberg 
 ariel.weisb...@datastax.com
 wrote:

  Hi,
 
  Sorry didn't mean to blame or come off snarky. I just think it is important not
  to #include our release process from somewhere else. We don't have to do
  anything unless it is necessary to meet some requirement of what we are
  trying to do.
 
  So the phrase "Trunk is always releasable" definitely has some wiggle
 room
  because you have to define what your release process is.
 
  If your requirement is that at any time you be able to tag trunk and ship
  it within minutes then yes staging branches help solve that problem.
 
  The reality is that the release process always takes low single digit
 days
  because you branch trunk, then wait for longer running automated tests to
  run against that branch. If there happens to be a failure you may have to
  update the branch, but you have bounded how much brokenness sits between
 you
  and release already. We also don't have a requirement to be able to ship
  nigh immediately.
 
  We can balance the cost of extra steps and process against the cost of
  having to delay some releases some of the time by a few days and pick
  whichever is more important. We are still reducing the amount of time it
  takes to get a working release. Reduced enough that we should be able to
  ship every month without difficulty. I have been on a team roughly our
 size
  that shipped every three weeks without having staging branches. Trunk
 broke
  infrequently enough it wasn't an issue and when it did break it wasn't
 hard
  to address. The real pain point was flapping tests and the diffusion of
  responsibility that prevented them from getting fixed.
 
  If I were trying to sell staging branches I would work the angle that I
  want to be able to bisect trunk without coming across broken revisions.
  Then balance the value of that with the cost of the process.
 
  Ariel
 
  On Thu, May 7, 2015 at 10:41 AM, Benedict Elliott Smith 
  belliottsm...@datastax.com wrote:
 
   It's a bit unfair to characterize Aleksey as subscribing to a cargo
 cult.
   *We* agreed to define the new release process as keeping trunk always
   releasable.
  
   Your own words that catalyzed this: If we release off trunk it is
 pretty
   much necessary for trunk to be in a releasable state all the time
  
   It is possible we have been imprecise in our discussions, and people
 have
   agreed to different things. But it does seem to me we agreed to the
   position Aleksey is taking, and he is not blindly following some other
   process that is not ours.
  
   On Thu, May 7, 2015 at 3:25 PM, Ariel Weisberg 
   ariel.weisb...@datastax.com
   wrote:
  
Hi,
   
Whoah. Our process is our own. We don't have to subscribe to any
 cargo
   cult
book buying seminar giving process.
   
And whatever we do we can iterate and change until it works for us
 and
solves the problems we want solved.
   
Ariel
   
On Thu, May 7, 2015 at 10:13 AM, Aleksey Yeschenko 
 alek...@apache.org
  
wrote:
   
 Strictly speaking, the train schedule does demand that trunk, and
 all
 other branches, must be releasable at all times, whether you like

Re: Staging Branches

2015-05-07 Thread Benedict Elliott Smith
It's odd, because I honestly think this release process will be easier,
since the stricter we make it the smoother it can become. It requires well
formed commits from everyone, and lets the committers asynchronously
confirm their work, and for it to never be in question *who* needs to fix
something, nor what the effect of their fixing it will be. It means we can,
as Ariel said, perform a bisect and honestly know its result is accurate.
Small commits don't need to worry about fast-forwarding; in fact, nobody
does. It can either be automated, or we can fast forward at a time that
suits us. In which case the process is *the same* as it is currently.

I have no interest in making the commit process harder.


On Thu, May 7, 2015 at 3:59 PM, Jake Luciani jak...@gmail.com wrote:

 Ok let's focus then on the idea that trunk is releasable.  Releasable
 to me doesn't mean it can't contain a bad merge.

 It means it doesn't contain some untested and unstable feature.  We
 can always release from trunk and we still have a release process.

 The idea that trunk must contain, the first time it hits the branch,
 releasable code is way overboard

 On Thu, May 7, 2015 at 10:50 AM, Benedict Elliott Smith
 belliottsm...@datastax.com wrote:
 
  This breaks your model of applying every commit ref by ref.
 
 
  How? The rebase only affects commits after the real branch, so it still
  cleanly fast forwards?
 
  Merging is *hard*. Especially 2.1 -> 3.0, with many breaking API changes
  (this is before 8099, which is going to make a *world* of hurt, and will
  stick around for a year). It is *very* easy to break things, with even
 the
  utmost care.
 
  On Thu, May 7, 2015 at 3:46 PM, Jake Luciani jak...@gmail.com wrote:
 
  You then fetch and repair
  your local version and try again.
 
  This breaks your model of applying every commit ref by ref.
 
  I'm all for trying to avoid extra work/stability but we already have
  added a layer of testing every change before commit.  I'm not going to
  accept we need to also add a layer of testing before every merge.
 
 
 
 
  On Thu, May 7, 2015 at 10:36 AM, Benedict Elliott Smith
  belliottsm...@datastax.com wrote:
  
   wouldn't you need to force push?
  
  
   git push --force-with-lease
  
   This works essentially like CAS; if the remote repositories are not
 the
   same as the one you have modified, it will fail. You then fetch and
  repair
   your local version and try again.
  
   So what does this buy us?
  
  
   This buys us a clean development process. We bought into always
   releasable. It's already a tall order; if we start weakening the
   constraints before we even get started, I am unconvinced we will
   successfully deliver. A monthly release cycle requires *strict*
  processes,
   not *almost* strict, or strict*ish*.
  
   Something that could also help make a more streamlined process: if
 actual
   commits were constructed on development branches ready for commit,
 with a
   proper commit message and CHANGES.txt updated. Even more ideally: with
  git
   rerere data for merging up to each of the branches. If we had that,
 and
   each of the branches had been tested in CI, we would be much closer
 than
  we
   are currently, as the risk-at-commit is minimized.
  
   On Thu, May 7, 2015 at 2:48 PM, Jake Luciani jak...@gmail.com
 wrote:
  
   git rebase -i trunk_staging ref~1
   fix the problem
   git rebase --continue
  
   In this situation, if there was an untested follow on commit wouldn't
   you need to force push?
  
   On Thu, May 7, 2015 at 9:28 AM, Benedict Elliott Smith
   belliottsm...@datastax.com wrote:
   
If we do it, we'll end up in weird situations which will be
 annoying
  for
everyone
   
   
Such as? I'm not disputing, but if we're to assess the relative
strengths/weaknesses, we need to have specifics to discuss.
   
If we do go with this suggestion, we will most likely want to
 enable a
shared git rerere cache, so that rebasing is not painful when there
  are
future commits.
   
If instead we go with repairing commits, we cannot have a
 queue of
things to merge up to. Say you have a string of commits waiting for
approval C1 to C4; you made C1, and it broke something. You
 introduce
  C5
   to
fix it, but the tests are still broken. Did you not really fix it?
 Or
perhaps one of C2 to C4 are to blame, but which? And have you
   accidentally
broken *them* with your commit? Who knows. Either way, we
 definitely
   cannot
fast forward. At the very best we can hope that the new merge did
 not
conflict or mess up the other people's C2 to C4 commits, and they
  have to
now merge on top. But what if another merge comes in, C6, in the
   meantime;
and C2 really did also break the tests in some way; how do we
  determine
   C2
was to blame, and not C6, or C3 or C4? What do the committers for
  each of
these do? We end up in a lengthy tussle, and aren't able to commit
  any

Re: Staging Branches

2015-05-07 Thread Benedict Elliott Smith
It's a bit unfair to characterize Aleksey as subscribing to a cargo cult.
*We* agreed to define the new release process as keeping trunk always
releasable.

Your own words that catalyzed this: "If we release off trunk it is pretty
much necessary for trunk to be in a releasable state all the time"

It is possible we have been imprecise in our discussions, and people have
agreed to different things. But it does seem to me we agreed to the
position Aleksey is taking, and he is not blindly following some other
process that is not ours.

On Thu, May 7, 2015 at 3:25 PM, Ariel Weisberg ariel.weisb...@datastax.com
wrote:

 Hi,

 Whoah. Our process is our own. We don't have to subscribe to any cargo cult
 book buying seminar giving process.

 And whatever we do we can iterate and change until it works for us and
 solves the problems we want solved.

 Ariel

 On Thu, May 7, 2015 at 10:13 AM, Aleksey Yeschenko alek...@apache.org
 wrote:

  Strictly speaking, the train schedule does demand that trunk, and all
  other branches, must be releasable at all times, whether you like it or
 not
  (for the record - I *don’t* like it, but here we are).
 
  This, and other annoying things, is what we subscribed to with the tick-tock
  vs. supported branches experiment.
 
   We still need to run CI before we release. So what does this buy us?
 
  Ideally (eventually?) we won’t have to run CI, including duration tests,
  before we release, because we’ll never merge anything that hadn’t passed
  the full suite, including duration tests.
 
  That said, perhaps it’s too much change at once. We still have missing
  pieces of infrastructure, and TE is busy with what’s already back-logged.
  So let’s revisit this proposal in a few months, closer to 3.1 or 3.2,
 maybe?
 
  --
  AY
 
  On May 7, 2015 at 16:56:07, Ariel Weisberg (ariel.weisb...@datastax.com)
  wrote:
 
  Hi,
 
  I don't think this is necessary. If you merge with trunk, test, and
 someone
  gets in a head of you just merge up and push to trunk anyways. Most of
 the
  time the changes the other person made will be unrelated and they will
  compose fine. If you actually conflict then yeah you test again but this
  doesn't happen often.
 
  The goal isn't to have trunk passing every single time; it's to have it
 pass
  almost all the time, so the test history means something, and when it fails
  it fails because it's broken by the latest merge.
 
  At this size I don't see the need for a staging branch to prevent trunk
  from ever breaking. There is a size where it would be helpful I just
 don't
  think we are there yet.
 
  Ariel
 
  On Thu, May 7, 2015 at 5:05 AM, Benedict Elliott Smith 
  belliottsm...@datastax.com wrote:
 
   A good practice as a committer applying a patch is to build and run the
   unit tests before updating the main repository, but to do this for
 every
   branch is infeasible and impacts local productivity. Alternatively,
   uploading the result to your development tree and waiting a few hours
 for
   CI to validate it is likely to result in a painful cycle of
 race-to-merge
   conflicts, rebasing and waiting again for the tests to run.
  
   So I would like to propose a new strategy: staging branches.
  
   Every major branch would have a parallel branch:
  
   cassandra-2.0 -> cassandra-2.0_staging
   cassandra-2.1 -> cassandra-2.1_staging
   trunk -> trunk_staging
  
   On commit, the idea would be to perform the normal merge process on the
   _staging branches only. CI would then run on every single git ref, and
 as
   these passed we would fast forward the main branch to the latest
  validated
   staging git ref. If one of them breaks, we go and edit the _staging
  branch
   in place to correct the problem, and let CI run again.
  
   So, a commit would look something like:
  
   patch -> cassandra-2.0_staging -> cassandra-2.1_staging ->
 trunk_staging
  
   wait for CI, see 2.0, 2.1 are fine but trunk is failing, so
  
   git rebase -i trunk_staging ref~1
   fix the problem
   git rebase --continue
  
   wait for CI; all clear
  
   git checkout cassandra-2.0; git merge cassandra-2.0_staging
   git checkout cassandra-2.1; git merge cassandra-2.1_staging
   git checkout trunk; git merge trunk_staging
  
   This does introduce some extra steps to the merge process, and we will
  have
   branches we edit the history of, but the amount of edited history will
 be
   limited, and this will remain isolated from the main branches. I'm not
  sure
   how averse to this people are. An alternative policy might be to
 enforce
   that we merge locally and push to our development branches then await
 CI
   approval before merging. We might only require this to be repeated if
  there
   was a new merge conflict on final commit that could not automatically
 be
   resolved (although auto-merge can break stuff too).
  
   Thoughts? It seems if we want an always releasable set of branches,
 we
   need something along these lines. I certainly break tests by mistake,
 or
   the build

Re: Staging Branches

2015-05-07 Thread Benedict Elliott Smith

 This breaks your model of applying every commit ref by ref.


How? The rebase only affects commits after the real branch, so it still
cleanly fast forwards?

Merging is *hard*. Especially 2.1 -> 3.0, with many breaking API changes
(this is before 8099, which is going to make a *world* of hurt, and will
stick around for a year). It is *very* easy to break things, with even the
utmost care.

On Thu, May 7, 2015 at 3:46 PM, Jake Luciani jak...@gmail.com wrote:

 You then fetch and repair
 your local version and try again.

 This breaks your model of applying every commit ref by ref.

 I'm all for trying to avoid extra work/stability but we already have
 added a layer of testing every change before commit.  I'm not going to
 accept we need to also add a layer of testing before every merge.




 On Thu, May 7, 2015 at 10:36 AM, Benedict Elliott Smith
 belliottsm...@datastax.com wrote:
 
  wouldn't you need to force push?
 
 
  git push --force-with-lease
 
  This works essentially like CAS; if the remote repositories are not the
  same as the one you have modified, it will fail. You then fetch and
 repair
  your local version and try again.
 
  So what does this buy us?
 
 
  This buys us a clean development process. We bought into always
  releasable. It's already a tall order; if we start weakening the
  constraints before we even get started, I am unconvinced we will
  successfully deliver. A monthly release cycle requires *strict*
 processes,
  not *almost* strict, or strict*ish*.
 
  Something that could also help make a more streamlined process: if actual
  commits were constructed on development branches ready for commit, with a
  proper commit message and CHANGES.txt updated. Even more ideally: with
 git
  rerere data for merging up to each of the branches. If we had that, and
  each of the branches had been tested in CI, we would be much closer than
 we
  are currently, as the risk-at-commit is minimized.
 
  On Thu, May 7, 2015 at 2:48 PM, Jake Luciani jak...@gmail.com wrote:
 
  git rebase -i trunk_staging ref~1
  fix the problem
  git rebase --continue
 
  In this situation, if there was an untested follow on commit wouldn't
  you need to force push?
 
  On Thu, May 7, 2015 at 9:28 AM, Benedict Elliott Smith
  belliottsm...@datastax.com wrote:
  
   If we do it, we'll end up in weird situations which will be annoying
 for
   everyone
  
  
   Such as? I'm not disputing, but if we're to assess the relative
   strengths/weaknesses, we need to have specifics to discuss.
  
   If we do go with this suggestion, we will most likely want to enable a
   shared git rerere cache, so that rebasing is not painful when there
 are
   future commits.
  
   If instead we go with repairing commits, we cannot have a queue of
   things to merge up to. Say you have a string of commits waiting for
   approval C1 to C4; you made C1, and it broke something. You introduce
 C5
  to
   fix it, but the tests are still broken. Did you not really fix it? Or
   perhaps one of C2 to C4 are to blame, but which? And have you
  accidentally
   broken *them* with your commit? Who knows. Either way, we definitely
  cannot
   fast forward. At the very best we can hope that the new merge did not
   conflict or mess up the other people's C2 to C4 commits, and they
 have to
   now merge on top. But what if another merge comes in, C6, in the
  meantime;
   and C2 really did also break the tests in some way; how do we
 determine
  C2
   was to blame, and not C6, or C3 or C4? What do the committers for
 each of
   these do? We end up in a lengthy tussle, and aren't able to commit
 any of
   these to the mainline until all of them are resolved. Really we have
 to
   prevent any merges to the staging repository until the mistakes are
  fixed.
   Since our races in these scenario are the length of time taken for
 cassci
   to vet them, these problems are much more likely than current race to
   commit.
  
   In the scheme I propose, in this scenario, the person who broke the
 build
   rebases everyone's branches to his now fixed commit, and the next
 broken
   commit gets blamed, and all other commits being merged in on top can
 go
  in
   smoothly. The only pain point I can think of is the multi-branch
 rebase,
   but this is solved by git rerere.
  
   I agree running tests is painful, but at least for the build, this
 should
   be the responsibility of the committer to build before merging
  
  
   Why make the distinction if we're going to have staging commits? It's
 a
  bit
   of a waste of time to run three ant real-clean && ant tasks, and
  increases
   the race window for merging (which is painful whether or not involves
 a
   rebase), and it is not a *typical* occurrence (alarming is
 subjective)
  
   On Thu, May 7, 2015 at 2:12 PM, Sylvain Lebresne 
 sylv...@datastax.com
   wrote:
  
If one of them breaks, we go and edit the _staging branch in place
 to
   correct
the problem, and let CI run again.
  
   I would strongly

Re: Requiring Java 8 for C* 3.0

2015-05-07 Thread Benedict Elliott Smith
I have no position on this, but I would like to issue a word of caution to
everyone excited to use the new JDK8 features in development to please
discuss their use widely beforehand, and to consider them carefully. Many
of them are not generally useful to us (e.g. LongAdder), and may have
unexpected behaviours (e.g. hidden parallelization in streams).
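
To make the caution concrete, a minimal sketch (assuming JDK 8) of two of these: a parallel stream silently runs its work on the shared ForkJoinPool common pool, and LongAdder trades cheap contended increments for a sum() that is not an atomic snapshot while writers are active:

import java.util.concurrent.atomic.LongAdder;
import java.util.stream.LongStream;

public class Jdk8Caveats {
    public static void main(String[] args) {
        // Hidden parallelization: the only textual difference is .parallel(),
        // but the pipeline now executes on ForkJoinPool.commonPool(), competing
        // with anything else in the process that uses that pool.
        long serial = LongStream.rangeClosed(1, 1_000_000).map(i -> i * 2).sum();
        long parallel = LongStream.rangeClosed(1, 1_000_000).parallel().map(i -> i * 2).sum();
        System.out.println(serial == parallel);  // same answer, very different execution

        // LongAdder: increments scale well under contention, but sum() walks the
        // internal cells and is only an approximation under concurrent updates.
        LongAdder counter = new LongAdder();
        counter.increment();
        counter.add(41);
        System.out.println(counter.sum());  // 42
    }
}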

On Thu, May 7, 2015 at 5:16 PM, Yuki Morishita mor.y...@gmail.com wrote:

 +1

 On Thu, May 7, 2015 at 11:13 AM, Jeremiah D Jordan
 jerem...@datastax.com wrote:
  With Java 7 being EOL for free versions I am +1 on this.  If you want to
 stick with 7, you can always keep running 2.1.
 
  On May 7, 2015, at 11:09 AM, Jonathan Ellis jbel...@gmail.com wrote:
 
  We discussed requiring Java 8 previously and decided to remain Java
  7-compatible, but at the time we were planning to release 3.0 before
 Java 7
  EOL.  Now that 8099 and increased emphasis on QA have delayed us past
 Java
  7 EOL, I think it's worth reopening this discussion.
 
  If we require 8, then we can use lambdas, LongAdder, StampedLock,
 Streaming
  collections, default methods, etc.  Not just in 3.0 but over 3.x for the
  next year.
 
  If we don't, then people can choose whether to deploy on 7 or 8 -- but
 the
  vast majority will deploy on 8 simply because 7 is no longer supported
  without a premium contract with Oracle.  8 also has a more advanced G1GC
  implementation (see CASSANDRA-7486).
 
  I think that gaining access to the new features in 8 as we develop 3.x
 is
  worth losing the ability to run on a platform that will have been EOL
 for a
  couple months by the time we release.
 
  --
  Jonathan Ellis
  Project Chair, Apache Cassandra
  co-founder, http://www.datastax.com
  @spyced
 



 --
 Yuki Morishita
  t:yukim (http://twitter.com/yukim)



Re: [discuss] Modernization of Cassandra build system

2015-04-13 Thread Benedict Elliott Smith
” by ”ant generate-eclipse-files”. TBH I
 don’t know NetBeans.
 
  As Benedict pointed out, the code has improved and still improves a lot
 - in structure, in inline-doc, in nomenclature and whatever else. As soon
 as we can get rid of Thrift in the tree, there’s another big opportunity to
 cleanup more stuff.
 
  TBH I don’t think that (beside the tools) there would be a need to
 generate multiple artifacts for C* daemon - you can do ”separation of
 concerns” (via packages) even with discipline and then measure it.
  IMO The only artifact worth to extract out of C* tree, and useful for a
 (limited) set of 3rd party code, is something like
 ”cassandra-jmx-interfaces.jar”
 
  Robert
 
  Am 02.04.2015 um 11:30 schrieb Benedict Elliott Smith 
 belliottsm...@datastax.com:
 
  There are three distinct problems you raise: code structure,
 documentation,
  and build system.
 
  The build system, as far as I can tell, is a matter of personal
 preference.
  I personally dislike the few interactions I've had with maven, but
  gratefully my interactions with build system innards have been fairly
  limited. I mostly just use them. Unless a concrete and significant
 benefit
  is delivered by maven, though, it just doesn't seem worth the upheaval
 to
  me. If you can make the argument that it actually improves the project
 in a
  way that justifies the upheaval, it will certainly be considered, but so
  far no justification has been made.
 
  The documentation problem is common to many projects, though: out of
  codebase documentation gets stale very rapidly. When we say to read the
  code we mean read the code and its inline documentation - the
 quality of
  this documentation has itself generally been substandard, but has been
  improving significantly over the past year or so, and we are
 endeavouring
  to improve with every change. In the meantime, there are videos from a
  recent bootcamp we've run for both internal and external contributors
  http://www.datastax.com/dev/blog/deep-into-cassandra-internals.
 
  The code structure would be great to modularise, but the reality is
 that it
  is not currently modular. There are no good clear dividing lines for
 much
  of the project. The problem with refactoring the entire codebase to
 create
  separate projects is that it is a significant undertaking that makes
  maintenance of the project across versions significantly more costly.
 This
  creates a net drag on all productivity in the project. Such a major
 change
  requires strong consensus, and strong evidence justifying it. So the
  question is: would this create more new work than it loses? The evidence
  isn't there that it would. It might, but I personally guess that it
 would
  not, judging by the results of our other attempts to drive up
 contributions
  to the project. Perhaps we can have a wider dialogue about the
 endeavour,
  though, and see if a consensus can in fact be built.
 
 
 
  On Thu, Apr 2, 2015 at 9:31 AM, Pierre Devops pierredev...@gmail.com
  wrote:
 
  Hi all,
 
  Not a cassandra contributor here, but I'm working on the cassandra
 sources
  too.
 
  This big cassandra source root caused me trouble too, firstly it was
 not
  easy to import in an IDE, try to import cassandra sources in netbeans,
 it's
  a headache.
 
  It would be great if we had more small modules/projects in separate
 POM. It
  will be much easier to work on small parts of the project, and as a
  consequence, I'm sure you will have more external contributions to this
  project.
 
  I know cassandra devs are used to the ant build model, but it's like a
  thread I opened about updated and more complete documentation of the
  sstable structures. The answer I got was that it is not needed in order to
  use Cassandra, and that the only way to learn about it is to rtfcode.
  Because people working on cassandra already know how the sstable structures
  are laid out, there is no need to provide up-to-date documentation.
  So it will take me a very long time to read and understand all the
  serialization code in cassandra to understand the sstable structure before
  I can work on the code. Up-to-date documentation about the internals would
  have given me the knowledge I need to contribute much more quickly.
 
  Here we have the same problem: we have a complex, non-modular build system,
  and the core cassandra devs are used to it, so there is no need to make
  something more flexible, even if it could facilitate external contributions.
 
 
 
  2015-03-31 23:42 GMT+02:00 Benedict Elliott Smith 
  belliottsm...@datastax.com:
 
  I think the problem is everyone currently contributing is comfortable
  with
  ant, and as much as it is imperfect, it isn't clear maven is going to
 be
  better. Having the requisite maven functionality linked under the hood
  doesn't seem particularly preferable to the inverse. The status quo
 has
  the
  bonus of zero upheaval for the project and its contributors, though,
 so
  it
  would have to be a very clear win to justify the change in my

Re: March 2015 QA retrospective

2015-04-14 Thread Benedict Elliott Smith
.

 The testing would not
  be quick and easy, so I am unlikely to volunteer to patch quick fixes in
  the new world order.


 I think this gets into how we load balance bug fixes. There is a clear
 benefit to routing the bug to the person who will know how to fix and test
 it. I have never seen bugs as something you volunteer for. They typically
 belong somewhere and if it is with you then so be it.


  because the overhead eats too much into the work you're
  actually responsible for.


 We need to make sure that bug fixing isn't seen that way. I think it's
 important to make sure bugs find their way home. The work you're actually
 responsible for is not done, so you can't claim that bug fixes are eating
 into it. It already done been ate.

 We shouldn't prioritize new work over past work that was never finished.
 With monthly releases and breaking things down into much smaller chunks it
 means you have the option to let new work slip to accommodate without
 moving tasks between people.

 Ariel



 On Fri, Apr 10, 2015 at 7:07 PM, Benedict Elliott Smith 
 belliottsm...@datastax.com wrote:

  
   CASSANDRA-8459 https://issues.apache.org/jira/browse/CASSANDRA-8459
   autocompaction
   on reads can prevent memtable space reclaimation
  
   Can you link a ticket to CASSANDRA-9012 and characterize in a way we
 can
   try and implement how to make sufficiently large partitions, over
   sufficiently large periods of time?
 
  Maybe also enumerate the other permutations where this matters like
   secondary indexes and the access patterns (scans).
  
 
  Does this really qualify for its own ticket? This should just be one of
  many configurations for stress' part in the new tests. We should perhaps
  have an aggregation ticket where we ensure we enumerate the configuration
  data points we've met that need to be covered. But, IMO at least, a
  methodical exhaustive approach should be undertaken separately, and only
 be
  corroborated against such a list to ensure it was done sufficiently well.
 
 
  
   CASSANDRA-8619 https://issues.apache.org/jira/browse/CASSANDRA-8619
 -
   using
   CQLSSTableWriter gives ConcurrentModificationException
  
   OK. I don't think the original fix meets our new definition of done
 since
   there was insufficient coverage, and in this case no regression test. To
 be
   done you would have to either implement the coverage or file a JIRA to
  add
   it.
  
   Can you file a ticket with as much detail as you can on what the test
   might look like and link it to CASSANDRA-9012?
  
  
  Well, the goal posts have shifted a smidgen since then :)
 
  I've already filed CASSANDRA-9163 and CASSANDRA-9164 (the former I have
  linked to CASSANDRA-9012). These problems would trivially be caught by
 any
  kind of randomized long testing of these utilities, basically.
 
  This does raise an interesting, but probably not significant downside to
  the new approach: I fixed this ticket because somebody mentioned to me
 that
  it was hurting them, and I saw a quick and easy fix. The testing would
 not
  be quick and easy, so I am unlikely to volunteer to patch quick fixes in
  the new world order. This will certainly lead to higher quality bug
 fixes,
  but it may lead to fewer of them, and fewer instances of volunteer work
 to
  help people out, because the overhead eats too much into the work you're
  actually responsible for. This may lead to bug fixing being seen as much
  more of a chore than it already can be. I don't say this to discourage
 the
  new approach; it is just a thought that occurs to me off the back of this
  specific discussion.
 
 
  CASSANDRA-8668 https://issues.apache.org/jira/browse/CASSANDRA-8668 We
   don't enforce offheap memory constraints; regression introduced by 7882
  
   We need to note somewhere that the kitchen sink test needs to insert
  large
   columns. How would it detect that the constraint was violated?
 
 
  It would fall over with an OOM
 
 
   I am starting to think we need a google doc for kitchen sink test wish
   listing and design discussion rather than scattering bits about it in
  JIRA.
  
 
   Agreed.
 
 
 
   CASSANDRA-8719 https://issues.apache.org/jira/browse/CASSANDRA-8719
   Using
   thrift HSHA with offheap_objects appears to corrupt data
  
   Can you file a ticket for having the kitchen sink tests be configurable
  to
   run against all client access paths? Linked to 9012 for now?
  
 
  This only requires unit testing or dtests to be run this way. However for
  the kitchen sink tests this is just another dimension in the
 configuration
  state space, which IMO should be addressed as a whole methodically.
 Perhaps
  we should file a central JIRA, or the Google doc you suggested, for
  tracking all of these data points?
 
 
   CASSANDRA-8726 https://issues.apache.org/jira/browse/CASSANDRA-8726
   throw
   OOM in Memory if we fail to allocate OOM
  
   Can you create a ticket for this? I think that testing each allocation
 is
   not realistic

Re: March 2015 QA retrospective

2015-04-10 Thread Benedict Elliott Smith

 CASSANDRA-8459 https://issues.apache.org/jira/browse/CASSANDRA-8459
 autocompaction
 on reads can prevent memtable space reclaimation

 Can you link a ticket to CASSANDRA-9012 and characterize in a way we can
 try and implement how to make sufficiently large partitions, over
 sufficiently large periods of time?

Maybe also enumerate the other permutations where this matters like
 secondary indexes and the access patterns (scans).


Does this really qualify for its own ticket? This should just be one of
many configurations for stress' part in the new tests. We should perhaps
have an aggregation ticket where we ensure we enumerate the configuration
data points we've met that need to be covered. But, IMO at least, a
methodical exhaustive approach should be undertaken separately, and only be
corroborated against such a list to ensure it was done sufficiently well.



 CASSANDRA-8619 https://issues.apache.org/jira/browse/CASSANDRA-8619 -
 using
 CQLSSTableWriter gives ConcurrentModificationException

 OK. I don't think the original fix meets our new definition of done since
 there was insufficient coverage, and in this case no regression test. To be
 done you would have to either implement the coverage or file a JIRA to add
 it.

 Can you file a ticket with as much detail as you can on what the test
 might look like and link it to CASSANDRA-9012?


Well, the goal posts have shifted a smidgen since then :)

I've already filed CASSANDRA-9163 and CASSANDRA-9164 (the former I have
linked to CASSANDRA-9012). These problems would trivially be caught by any
kind of randomized long testing of these utilities, basically.
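
To make that concrete, a minimal sketch of the kind of randomized, long-running
exercise I mean is below. The schema, output directory and sizes are invented
purely for illustration, and the builder/addRow calls are the CQLSSTableWriter
public API as I recall it, so treat the details as approximate rather than a
finished test:

    import java.io.File;
    import java.util.concurrent.ThreadLocalRandom;

    import org.apache.cassandra.io.sstable.CQLSSTableWriter;

    public class CQLSSTableWriterFuzz
    {
        public static void main(String[] args) throws Exception
        {
            // Hypothetical schema and output directory, purely for illustration
            String schema = "CREATE TABLE ks.t (k int PRIMARY KEY, v text)";
            String insert = "INSERT INTO ks.t (k, v) VALUES (?, ?)";
            CQLSSTableWriter writer = CQLSSTableWriter.builder()
                                                      .inDirectory(new File("/tmp/ks/t"))
                                                      .forTable(schema)
                                                      .using(insert)
                                                      .build();
            ThreadLocalRandom random = ThreadLocalRandom.current();
            // Many iterations with randomised value sizes, so the buffering and
            // flush paths are exercised repeatedly rather than once on a happy path
            for (int i = 0; i < 100_000; i++)
            {
                char[] payload = new char[random.nextInt(1, 8 * 1024)];
                writer.addRow(random.nextInt(), new String(payload));
            }
            writer.close();
        }
    }

Run for long enough, with row counts and value sizes varied between runs, this
is the sort of harness that turns a quick fix into a regression test almost for
free.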

This does raise an interesting, but probably not significant downside to
the new approach: I fixed this ticket because somebody mentioned to me that
it was hurting them, and I saw a quick and easy fix. The testing would not
be quick and easy, so I am unlikely to volunteer to patch quick fixes in
the new world order. This will certainly lead to higher quality bug fixes,
but it may lead to fewer of them, and fewer instances of volunteer work to
help people out, because the overhead eats too much into the work you're
actually responsible for. This may lead to bug fixing being seen as much
more of a chore than it already can be. I don't say this to discourage the
new approach; it is just a thought that occurs to me off the back of this
specific discussion.


CASSANDRA-8668 https://issues.apache.org/jira/browse/CASSANDRA-8668 We
 don't enforce offheap memory constraints; regression introduced by 7882

 We need to note somewhere that the kitchen sink test needs to insert large
 columns. How would it detect that the constraint was violated?


It would fall over with an OOM


 I am starting to think we need a google doc for kitchen sink test wish
 listing and design discussion rather than scattering bits about it in JIRA.


 Agreed.



 CASSANDRA-8719 https://issues.apache.org/jira/browse/CASSANDRA-8719
 Using
 thrift HSHA with offheap_objects appears to corrupt data

 Can you file a ticket for having the kitchen sink tests be configurable to
 run against all client access paths? Linked to 9012 for now?


This only requires unit testing or dtests to be run this way. However for
the kitchen sink tests this is just another dimension in the configuration
state space, which IMO should be addressed as a whole methodically. Perhaps
we should file a central JIRA, or the Google doc you suggested, for
tracking all of these data points?


 CASSANDRA-8726 https://issues.apache.org/jira/browse/CASSANDRA-8726
 throw
 OOM in Memory if we fail to allocate OOM

 Can you create a ticket for this? I think that testing each allocation is
 not realistic in the sense that they don't fail in isolation. The JVM
 itself can ruin our day in OOM conditions as well. There is also heap OOM
 vs native memory OOM. It's worth some thought as to what the best bang for
 the buck testing strategy is going to be.


That's a bit of a different scope to the original problem, since in those
instances the VM explicitly throws an OOM. We can fault injection test both
of these scenarios, though, and I've already filed CASSANDRA-9165 for this.
I have commented on the ticket so that these scenarios are amongst those
explicitly considered when we address it, but I expect the scope of that
ticket to be very broad, and probably introduce its own entire class of
subtickets.
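
To illustrate both halves of that (the original 8726 behaviour and the
fault-injection angle), here is a minimal, self-contained sketch. The names are
hypothetical and are not Cassandra's actual Memory or allocator classes; it
simply shows an allocation wrapper that surfaces an OutOfMemoryError, and a
caller injecting a failing allocator to check that it does:

    // Hypothetical names, for illustration only; the real Memory/allocator
    // code differs in detail.
    interface NativeAllocator
    {
        long malloc(long bytes); // returns 0 on failure, like C malloc
    }

    final class CheckedMemory
    {
        private final NativeAllocator allocator;

        CheckedMemory(NativeAllocator allocator)
        {
            this.allocator = allocator;
        }

        long allocate(long bytes)
        {
            long address = allocator.malloc(bytes);
            if (address == 0)
                // surface the failure explicitly rather than handing back a bad address
                throw new OutOfMemoryError("native allocation of " + bytes + " bytes failed");
            return address;
        }
    }

    class NativeOomFaultInjectionExample
    {
        public static void main(String[] args)
        {
            // Inject an allocator that always fails and check the error surfaces
            CheckedMemory memory = new CheckedMemory(bytes -> 0);
            try
            {
                memory.allocate(1024);
                System.out.println("BUG: expected OutOfMemoryError");
            }
            catch (OutOfMemoryError expected)
            {
                System.out.println("OK: " + expected.getMessage());
            }
        }
    }

Hiding the allocation behind something a test can swap out is also one way the
broader fault-injection testing discussed for CASSANDRA-9165 could be driven
without trying to exhaust real memory in CI.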


 Thanks,
 Ariel

 On Fri, Apr 10, 2015 at 8:04 AM, Benedict Elliott Smith 
 belliottsm...@datastax.com wrote:

  TL;DR: Kitchen sink (aggressive randomised stress with subsystem
  correctness) tests; commitlog/memtable isolated correctness stress
 testing;
  improved tool/utility testing; internal structural changes to prevent
  occurrence (delivered); fault injection testing. Filed #916[1-5]
 
  https://issues.apache.org/jira/browse/CASSANDRA-7704 Benedict
  FileNotFoundException during STREAM-OUT triggers 100% CPU usage Streaming
 
  This particular class of bug should be near

Re: Discussion: reviewing larger tickets

2015-07-08 Thread Benedict Elliott Smith
I've started leaning towards a hybrid approach:

I put everything I want to say, including some code changes and sometimes
complex argumentation, into comments on the branch. I differentiate these into
two categories:

   1. Literal comments, to remain for posterity - typically things I agree
   with, but for which it wasn't immediately clear I would at the outset; and
   2. Queries/suggestions for the author, to be removed once they're
   resolved

Then on JIRA, I make sure to raise explicitly for outside input, and for
non-code-literate readers for posterity, any more major decisions that need
consideration / discussion.

Ideally, comments of type 2 would be replaced by summary comments of type
1, also for posterity. You can never have too many comments (so long as
they're explanatory, not just restating the code, obviously)

I think this probably leads to better JIRA and comments, as we:

   1. Avoid the higgledy-piggledy JIRA messes that can be very hard to
   unpick, consciously limiting discussion to high level decisions about
   approach, or unexpected complexities, etc. The things readers of JIRA care
   about.
   2. Keep decisions about low level minutiae commented directly where they
   matter, to influence future authors without reference to JIRA


On Wed, Jul 8, 2015 at 8:21 PM, Josh McKenzie jmcken...@apache.org wrote:

 As some of you might have noticed, Tyler and I tossed around a couple of
 thoughts yesterday regarding the best way to perform larger reviews on
 JIRA.

 I've been leaning towards the approach Benedict's been taking lately
 w/putting comments inline on a branch for the initial author to inspect as
 that provides immediate locality for a reviewer to write down their
 thoughts and the same for the initial developer to ingest them. One
 downside to that approach is that the extra barrier to entry makes it more
 of a 1-on-1 conversation rather than an open discussion via JIRA comments.
 Also, if one deletes branches from github we then lose our discussion
 history on the review process which is a big problem for digging into why
 certain decisions were made or revised during the process.

 On the competing side, monster comments like this
 
 https://issues.apache.org/jira/browse/CASSANDRA-6477?focusedCommentId=14617221page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14617221
 
 (which
 is one of multiple to come) are burdensome to create and map into a JIRA
 comment and, in my experience, also a burden to map back into the code-base
 as a developer. Details are lost in translation; I'm comfortable labeling
 this a sub-optimal method of communication.

 So what to do?

 --
 Joshua McKenzie



Re: Discussion: reviewing larger tickets

2015-07-08 Thread Benedict Elliott Smith
(git history navigation is also much more powerful in the IDE, in my
experience - can quickly scoot through many prior versions to see what the
context of prior authors was)

On Wed, Jul 8, 2015 at 9:15 PM, Benedict Elliott Smith 
belliottsm...@datastax.com wrote:

 Except that it would lack code navigation. So it would be alt-tabbing,
 then clicking through the clunky interface to find the file I want, and the
 location, which can be very cumbersome.



 On Wed, Jul 8, 2015 at 9:13 PM, Josh McKenzie josh.mcken...@datastax.com
 wrote:

 
  If you navigate in an IDE how do you know if you are commenting on code
  that has changed or not?

 I end up in the diff view and alt-tabbing over to the code view to look
 for
 details to navigate. In retrospect, working with a github diff would just
 be tabbing between a browser and IDE vs. an IDE diff and the IDE.

 On Wed, Jul 8, 2015 at 4:02 PM, Ariel Weisberg ar...@weisberg.ws wrote:

  Hi,
 
  If you navigate in an IDE how do you know if you are commenting on code
  that has changed or not?
 
  My workflow is usually to look at the diff and have it open in an IDE
  separately, but maybe I am failing hard at tools.
 
  Ariel
   On Jul 8, 2015, at 4:00 PM, Josh McKenzie josh.mcken...@datastax.com
 
  wrote:
  
   The ability to navigate a patch in an IDE and add comments while
  exploring
   is not something the github PR interface can provide; I expect I at
 least
   would end up having to use multiple tools to perform a review given
 the
  PR
   approach.
  
   On Wed, Jul 8, 2015 at 3:50 PM, Jake Luciani jak...@gmail.com
 wrote:
  
   putting comments inline on a branch for the initial author to inspect
  
   I agree and I think we can support this by using github pull requests
  for
   review.
  
   Pull requests live forever even if the source branch is removed. See
   https://github.com/apache/cassandra/pull/4
   They also allow for comments to be updated over time as new fixes are
   pushed to the branch.
  
   Once review is done we can just close them without committing and
 just
   commit the usual way
  
   Linking to the PR in JIRA for reference.
  
  
   On Wed, Jul 8, 2015 at 3:21 PM, Josh McKenzie jmcken...@apache.org
   wrote:
  
   As some of you might have noticed, Tyler and I tossed around a
 couple
  of
   thoughts yesterday regarding the best way to perform larger reviews
 on
   JIRA.
  
   I've been leaning towards the approach Benedict's been taking lately
   w/putting comments inline on a branch for the initial author to
 inspect
   as
   that provides immediate locality for a reviewer to write down their
   thoughts and the same for the initial developer to ingest them. One
   downside to that approach is that the extra barrier to entry makes
 it
   more
   of a 1-on-1 conversation rather than an open discussion via JIRA
   comments.
   Also, if one deletes branches from github we then lose our
 discussion
   history on the review process which is a big problem for digging
 into
  why
   certain decisions were made or revised during the process.
  
   On the competing side, monster comments like this
   
  
  
 
 https://issues.apache.org/jira/browse/CASSANDRA-6477?focusedCommentId=14617221page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14617221
  
   (which
   is one of multiple to come) are burdensome to create and map into a
  JIRA
   comment and, in my experience, also a burden to map back into the
   code-base
   as a developer. Details are lost in translation; I'm comfortable
  labeling
   this a sub-optimal method of communication.
  
   So what to do?
  
   --
   Joshua McKenzie
  
  
  
  
   --
   http://twitter.com/tjake
  
  
  
  
   --
   Joshua McKenzie
   DataStax -- The Apache Cassandra Company
 
 


 --
 Joshua McKenzie
 DataStax -- The Apache Cassandra Company





Re: Discussion: reviewing larger tickets

2015-07-08 Thread Benedict Elliott Smith
Except that it would lack code navigation. So it would be alt-tabbing, then
clicking through the clunky interface to find the file I want, and the
location, which can be very cumbersome.



On Wed, Jul 8, 2015 at 9:13 PM, Josh McKenzie josh.mcken...@datastax.com
wrote:

 
  If you navigate in an IDE how do you know if you are commenting on code
  that has changed or not?

 I end up in the diff view and alt-tabbing over to the code view to look for
 details to navigate. In retrospect, working with a github diff would just
 be tabbing between a browser and IDE vs. an IDE diff and the IDE.

 On Wed, Jul 8, 2015 at 4:02 PM, Ariel Weisberg ar...@weisberg.ws wrote:

  Hi,
 
  If you navigate in an IDE how do you know if you are commenting on code
  that has changed or not?
 
  My workflow is usually to look at the diff and have it open in an IDE
  separately, but maybe I am failing hard at tools.
 
  Ariel
   On Jul 8, 2015, at 4:00 PM, Josh McKenzie josh.mcken...@datastax.com
  wrote:
  
   The ability to navigate a patch in an IDE and add comments while
  exploring
   is not something the github PR interface can provide; I expect I at
 least
   would end up having to use multiple tools to perform a review given the
  PR
   approach.
  
   On Wed, Jul 8, 2015 at 3:50 PM, Jake Luciani jak...@gmail.com wrote:
  
   putting comments inline on a branch for the initial author to inspect
  
   I agree and I think we can support this by using github pull requests
  for
   review.
  
   Pull requests live forever even if the source branch is removed. See
   https://github.com/apache/cassandra/pull/4
   They also allow for comments to be updated over time as new fixes are
   pushed to the branch.
  
   Once review is done we can just close them without committing and just
   commit the usual way
  
   Linking to the PR in JIRA for reference.
  
  
   On Wed, Jul 8, 2015 at 3:21 PM, Josh McKenzie jmcken...@apache.org
   wrote:
  
   As some of you might have noticed, Tyler and I tossed around a couple
  of
   thoughts yesterday regarding the best way to perform larger reviews
 on
   JIRA.
  
   I've been leaning towards the approach Benedict's been taking lately
   w/putting comments inline on a branch for the initial author to
 inspect
   as
   that provides immediate locality for a reviewer to write down their
   thoughts and the same for the initial developer to ingest them. One
   downside to that approach is that the extra barrier to entry makes it
   more
   of a 1-on-1 conversation rather than an open discussion via JIRA
   comments.
   Also, if one deletes branches from github we then lose our discussion
   history on the review process which is a big problem for digging into
  why
   certain decisions were made or revised during the process.
  
   On the competing side, monster comments like this
   
  
  
 
 https://issues.apache.org/jira/browse/CASSANDRA-6477?focusedCommentId=14617221page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14617221
  
   (which
   is one of multiple to come) are burdensome to create and map into a
  JIRA
   comment and, in my experience, also a burden to map back into the
   code-base
   as a developer. Details are lost in translation; I'm comfortable
  labeling
   this a sub-optimal method of communication.
  
   So what to do?
  
   --
   Joshua McKenzie
  
  
  
  
   --
   http://twitter.com/tjake
  
  
  
  
   --
   Joshua McKenzie
   DataStax -- The Apache Cassandra Company
 
 


 --
 Joshua McKenzie
 DataStax -- The Apache Cassandra Company



Re: Discussion: reviewing larger tickets

2015-07-09 Thread Benedict Elliott Smith
While that approach optimises for people paying close attention to the JIRA
firehose, it is less optimal for people trying to figure out after the fact
what is going on wrt certain tickets. Some of the more complex tickets I
cannot make heads or tails of even when I was one of the main participants.

It also doesn't promote translating these discussions into code comments
for the permanent record. From my POV, though, I guess I can stick to my
current approach and just cut/paste the results into JIRA if we want every
nuance replicated there.

On Thu, Jul 9, 2015 at 12:19 PM, Sam Tunnicliffe s...@beobal.com wrote:

 I'm +1 with Sylvain here; keeping the discussions open, accessible to all
 and persistent seems more valuable than reducing the friction for
 contributors & reviewers.

 Personally, my workflow involves following the JIRA firehose, so I tend to
 be aware (at least to some degree) of both major & minor comments, a
 lot of which I would surely miss were they to move GH. I also agree with
 the point that what seems minor to one viewer may raise red flags with
 another.

 That said, I often have offline conversations (from both the
 reviewer/contributor perspective) around minor-ish things like naming, nits
 and so forth. At the moment these are a) not recorded & b) marginally more
 difficult to do asynchronously. So I think in future I may personally start
 using a GH branch for such remarks, though I don't think that should become
 a mandated part of The Process.

 Sam

 On Thu, Jul 9, 2015 at 11:47 AM, Sylvain Lebresne sylv...@datastax.com
 wrote:

   One downside to that approach is that the extra barrier to entry makes
 it
   more of a 1-on-1 conversation rather than an open discussion via JIRA
   comments.
 
  Yes, and I really worry about that. And I (see the I, that's a totally
  personal opinion) value keeping discussions as open as possible much more
  than
  additional convenience for making and processing review comments. I'll
  admit
  however that the current use of JIRA comments for reviews has never
 burdened
  me
  all that much so I don't see all that much convenience to be gained by
  changing
  that process (but then again, I'm happy using vim for my development, so
  your
  mileage probably varies).
 
  Typically, if we talk of work-flow, I personally read JIRA updates fairly
  religiously, which allows me to keep vaguely up to date on what's going
 on
  with
  reviews even for tickets I'm a priori not involved with. I consider it a
  good,
  healthy thing. If we move some of the review material outside of JIRA, I
  strongly suspect this won't be the case anymore due to the burden of
 having
  to
  check multiple places.
 
  Anyway, I worry a bit that changing for what I perceive as relatively
 minor
  convenience will make us lose more than we get. Just my 2 cents however
  really.
 
  --
  Sylvain
 
  On Wed, Jul 8, 2015 at 11:21 PM, Michael Shuler mich...@pbandjelly.org
  wrote:
 
   When we set up autojobs for the dev branches, I did some digging around
   the jenkins / githubPR integration, similar to what spark is doing. I'd
  be
   completely on board with working through that setup, if it helps this
   workflow.
  
   Michael
  
  
   On 07/08/2015 03:02 PM, Carl Yeksigian wrote:
  
   Spark has been using the GitHub PRs successfully; they have an
  additional
   mailing list which contains updates from GitHub (
   http://mail-archives.apache.org/mod_mbox/spark-reviews/), and they
 also
   have their PRs linked to JIRA so that going from the ticket to the PR
 is
   easily done.
  
   If we are going to start using GitHub PRs to conduct reviews, we
 should
   follow similar contribution guidelines to what Spark has (
  
  
 
 https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-PullRequest
   
  https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark
   ),
   and have Infra set up the same hooks for our repo. We can also hook up
   cassci to do the same jobs as the AmplabJenkins performs currently.
  
  
   On Wed, Jul 8, 2015 at 3:21 PM, Josh McKenzie jmcken...@apache.org
   wrote:
  
As some of you might have noticed, Tyler and I tossed around a couple
  of
   thoughts yesterday regarding the best way to perform larger reviews
 on
   JIRA.
  
   I've been leaning towards the approach Benedict's been taking lately
   w/putting comments inline on a branch for the initial author to
 inspect
   as
   that provides immediate locality for a reviewer to write down their
   thoughts and the same for the initial developer to ingest them. One
   downside to that approach is that the extra barrier to entry makes it
   more
   of a 1-on-1 conversation rather than an open discussion via JIRA
   comments.
   Also, if one deletes branches from github we then lose our discussion
   history on the review process which is a big problem for digging into
  why
   certain decisions were made or revised during the process.
  
   On the 

Re: NewBie Question

2016-06-15 Thread Benedict Elliott Smith
For newcomers that (
https://github.com/apache/cassandra/blob/cassandra-3.0.0/guide_8099.md) is
probably a bad document to point them to, as it will no doubt confuse them
- the naming, behaviour and format descriptions are all now partially
incorrect.

It was, by its own admission, intended only for those who already knew the
2.2 codebase intimately so they could understand the 8099 patch.  It should
really be edited heavily so that those who didn't live through 8099 might
now derive value from it.

It's a real shame that, despite this document living in-tree, even the
class names are out of date - and were before it was even committed.  So,
as much as CASSANDRA-8700 is a fantastic step forwards, it looks likely to
be insufficient by itself, and the project may need to come up with a
strategy to encourage maintenance of the docs.






On 15 June 2016 at 17:55, Michael Kjellman 
wrote:

> This was forwarded to me yesterday... a helpful first step
> https://github.com/apache/cassandra/blob/cassandra-3.0.0/guide_8099.md
>
> > On Jun 15, 2016, at 9:54 AM, Jonathan Haddad  wrote:
> >
> > Maybe some brave soul will document the 3.0 on disk format as part of
> > https://issues.apache.org/jira/browse/CASSANDRA-8700.
> >
> > On Wed, Jun 15, 2016 at 7:02 AM Christopher Bradford <
> bradfor...@gmail.com>
> > wrote:
> >
> >> Consider taking a look at Aaron Morton's dive into the C* 3.0 storage
> >> engine.
> >>
> >>
> >>
> http://thelastpickle.com/blog/2016/03/04/introductiont-to-the-apache-cassandra-3-storage-engine.html
> >>
> >> On Wed, Jun 15, 2016 at 9:38 AM Jim Witschey  >
> >> wrote:
> >>
>  http://wiki.apache.org/cassandra/ArchitectureSSTable
> >>>
> >>> Be aware that this page hasn't been updated since 2013, so it doesn't
> >>> reflect any changes to the SSTable format since then, including the
> >>> new storage engine introduced in 3.0 (see CASSANDRA-8099).
> >>>
> >>> That said, I believe the linked Apache wiki page is the best
> >>> documentation for the format. Unfortunately, if you want a better or
> >>> more current understanding, you'll have to read the code and read some
> >>> SSTables.
> >>>
> >>
>
>


Re: COMPACT STORAGE in 4.0?

2016-04-11 Thread Benedict Elliott Smith
Compact storage should really have been named "not wasteful storage" - now
everything is "not wasteful storage" so it's void of meaning. This is true
without constraint. You do not need to limit yourself to a single non-PK
column; you can have many and it will remain as or more efficient than
"compact storage"

On Mon, 11 Apr 2016 at 15:04, Jack Krupansky 
wrote:

> My understanding is Thrift is being removed from Cassandra in 4.0, but will
> COMPACT STORAGE be removed as well? Clearly the two are related, but
> COMPACT STORAGE had a performance advantage in addition to Thrift
> compatibility, so its status is ambiguous.
>
> I recall vague chatter, but no explicit deprecation notice or 4.0 plan for
> removal of COMPACT STORAGE. Actually, I don't even see a deprecation notice
> for Thrift itself in CHANGES.txt.
>
> Will a table with only a single non-PK column automatically be implemented
> at a comparable level of efficiency compared to the old/current Compact
> STORAGE? That will still leave the question of how to migrate a non-Thrift
> COMPACT STORAGE table (i.e., used for performance by a CQL-oriented
> developer rather than Thrift compatibility per se) to pure CQL.
>
> -- Jack Krupansky
>


Re: COMPACT STORAGE in 4.0?

2016-04-11 Thread Benedict Elliott Smith
As Jeremiah indicates, it's 3.0+ only. The docs should definitely reflect
this

On Mon, 11 Apr 2016 at 16:21, Jack Krupansky <jack.krupan...@gmail.com>
wrote:

> Thanks, Benedict. Is this only true as of 3.x (new storage engine), or was
> the equivalent efficiency also true with 2.x?
>
> It would be good to have an explicit statement on this efficiency question
> in the spec/doc since the spec currently does say: "The option also
> *provides
> a slightly more compact layout of data on disk* but at the price of
> diminished flexibility and extensibility for the table." So, if that
> "slightly
> more compact layout of data on disk" benefit is no longer true, away with
> it. See:
> https://cassandra.apache.org/doc/cql3/CQL-3.0.html
>
> And I would recommend that your own statement be added there instead.
>
> -- Jack Krupansky
>
> On Mon, Apr 11, 2016 at 5:03 PM, Jeremiah Jordan <jerem...@datastax.com>
> wrote:
>
> > As I understand it "COMPACT STORAGE" only has meaning in the CQL parser
> > for backwards compatibility as of 3.0. The on disk storage is not
> affected
> > by its usage.
> >
> > > On Apr 11, 2016, at 3:33 PM, Benedict Elliott Smith <
> bened...@apache.org>
> > wrote:
> > >
> > > Compact storage should really have been named "not wasteful storage" -
> > now
> > > everything is "not wasteful storage" so it's void of meaning. This is
> > true
> > > without constraint. You do not need to limit yourself to a single
> non-PK
> > > column; you can have many and it will remain as or more efficient than
> > > "compact storage"
> > >
> > > On Mon, 11 Apr 2016 at 15:04, Jack Krupansky <jack.krupan...@gmail.com
> >
> > > wrote:
> > >
> > >> My understanding is Thrift is being removed from Cassandra in 4.0, but
> > will
> > >> COMPACT STORAGE be removed as well? Clearly the two are related, but
> > >> COMPACT STORAGE had a performance advantage in addition to Thrift
> > >> compatibility, so its status is ambiguous.
> > >>
> > >> I recall vague chatter, but no explicit deprecation notice or 4.0 plan
> > for
> > >> removal of COMPACT STORAGE. Actually, I don't even see a deprecation
> > notice
> > >> for Thrift itself in CHANGES.txt.
> > >>
> > >> Will a table with only a single non-PK column automatically be
> > implemented
> > >> at a comparable level of efficiency compared to the old/current
> Compact
> > >> STORAGE? That will still leave the question of how to migrate a
> > non-Thrift
> > >> COMPACT STORAGE table (i.e., used for performance by a CQL-oriented
> > >> developer rather than Thrift compatibility per se) to pure CQL.
> > >>
> > >> -- Jack Krupansky
> > >>
> >
>


[VOTE] Release Apache Cassandra 3.8

2016-07-28 Thread Benedict Elliott Smith
I think -1 lacks a little clarity when responding to a block of prose with
multiple clauses, suggestions and no single proposition requiring a yes/no
answer.

As fun as it is to type -1.


On Thursday, 28 July 2016, Jake Luciani > wrote:

> -1
>
> On Thu, Jul 28, 2016 at 2:19 PM, Aleksey Yeschenko 
> wrote:
>
> > Let me sum up my thoughts so far.
> >
> > Some of the most important goals of tick-tock were 1) predictable,
> regular
> > releases with manageable changesets and
> > 2) individual releases that are more stable than in our previous process.
> >
> > Now, we’ve already slipped a few times. Most recently with 3.6, and now
> > with 3.8. If we push 3.9 as well, then the delay
> > will cascade: 3.10, 3.11, and 3.12 will all be late according to the
> > original plan.
> >
> > In other words, if we delay 3.9, then 6 out of 12 tick-tock releases will
> > be off-schedule.
> >
> > Worse, so will be 3.0.9, 3.0.10, 3.0.11, and 3.0.12.
> >
> > Now, #12236 is indeed an issue, but it really is a minor annoyance, and
> > goes away quickly after upgrading. And let’s not
> > kid ourselves that just by fixing #12236 alone 3.8 will somehow become a
> > stable release. No amount of passive aggressive
> > remarks is going to change that, either. So the choices as I see them
> > were: a) release 3.8 with a known minor annoyance now,
> > so that we can at least save 3.9 to 3.12 schedule or b) delay 3.9-3.12
> and
> > 3.0.9-3.0.12 by a month, each, without that minor annoyance,
> > but ultimately have just as stable/unstable 3.8. The pragmatic choice in
> > my opinion is clearly (a): we at least maintain some regularity that way.
> >
> > That said, after having though about it more, I realised that it’s the
> odd
> > 3.9, not the even 3.8 that’s already late, that I really care about.
> > So here are the two options I suggest - and I’m fine with either:
> >
> > 1. Release 3.8 as is now. It’s an even preview release that can live fine
> > with one minor annoyance on upgrade. Have 3.9 released on schedule.
> > Since the vote technically passed, we can just do it, now.
> >
> > 2. Wait until #12236 is in, and release 3.8 then, doesn’t matter when.
> > Have 3.9+ released on schedule. Even though the delta between 3.8 and 3.9
> > would
> > be tiny, it’s still IMO less confusing than skipping a whole version, and
> > a lot more preferable than failing the schedule for 4 upcoming 3.x and
> > 3.0.x releases.
> >
> > 3.9, after all, *does* have a month of bugfix only stabilisation changes
> > in it. So does 3.0.9. The sooner we can get those into people’s hands,
> the
> > better.
> > 3.8 is ultimately unimportant. Even if we release 3.8 and 3.9 on the same
> > date, it’s not a huge deal.
> >
> >
> > P.S. I feel like 1 week freeze is insufficient given a monthly cadence.
> If
> > we are to keep the monthly cycle, we should probably extend the freeze to
> > two weeks,
> > so that we have time to fix problems uncovered by regular and, more
> > importantly, upgrade tests.
> >
> > --
> > AY
> >
> > On 27 July 2016 at 22:04:31, Michael Shuler (mshu...@apache.org) wrote:
> >
> > I apologize for messing this vote up.
> >
> > So, what should happen now? Drop RESULT from the subject and continue
> > discussion of alternatives and voting?
> >
> > --
> > Kind regards,
> > Michael
> >
> > On 07/27/2016 06:33 AM, Aleksey Yeschenko wrote:
> > > The difference is that those -1s were based on new information
> > > discovered after the vote was started, while this one wasn’t.
> > >
> > > In addition to that, the discussion was still ongoing, and a decision
> > > on the alternative has not been made. As such closing the vote was
> > > definitely premature.
> > >
> > > FWIW I intended to swap my +1 with a -1, but was not given a chance
> > > to do so. As for what alternative I prefer, I’m not sure yet.
> > >
> > > -- AY
> > >
> > > On 27 July 2016 at 09:59:50, Sylvain Lebresne (sylv...@datastax.com)
> > > wrote:
> > >
> > > On Wed, Jul 27, 2016 at 12:42 AM, Aleksey Yeschenko
> > >  wrote:
> > >
> > >> Sorry, but I’m counting 3 binding +1s and 1 binding -1 (2, if you
> > >> interpret Jonathan’s emails as such).
> > >>
> > >> Thus, if you were to do close the vote now, the vote is passing
> > >> with the binding majority, and the required minimum # of +1s
> > >> gained.
> > >>
> > >> I also don’t see the PMC consensus on ‘August 3.8 release target’.
> > >>
> > >>
> > >> As such, the vote is now reopened for further discussion, and to
> > >> allow PMC to change their votes if they feel like it (I, for one,
> > >> have just returned, and need to reevaluate 12236 in light of new
> > >> comments).
> > >>
> > >
> > > It has been my understanding that we took a more human approach to
> > > release decisions than strictly and blindly adhering to the Apache
> > > written voting rules. There have been many votes that have been
> > > re-rolled even though they 

Re: A proposal to move away from Jira-centric development

2016-08-16 Thread Benedict Elliott Smith
As with many difficult problems, it is easier to point them out than to
suggest improvements.  Anyway, I wasn't proposing we change the mechanisms
of communication, just excusing my simplification of (my view of) the
problem to avoid ending up in a quagmire on that topic.  This is a great
example of email's inadequacies, as this innocuous (to me) little textual
act resulted instead in a *different* quagmire, while the first potential
quagmire is still in play!

Email is a minefield, and textual interactions can be exhausting.  So
people tap out without fully expressing themselves, to retain their life
and sanity.



On 16 August 2016 at 20:49, Eric Evans <john.eric.ev...@gmail.com> wrote:

> On Tue, Aug 16, 2016 at 2:38 PM, Benedict Elliott Smith
> <bened...@apache.org> wrote:
> > I think all complex, nuanced and especially emotive topics are
> challenging
> > to discuss over textual media, due to things like the attention span of
> > your readers, the difficulties in structuring your text, and especially
> the
> > hoops that have to be jumped through to minimise the potential for
> > misinterpretation, as well as correcting the inevitable
> misinterpretations
> > that happen anyway.
>
> Fair enough, I suppose, but some of these things are also difficult
> face to face.  Most people who collaborate over the Internet with
> people from different backgrounds in different timezones, etc, learn
> to adjust accordingly.  And, the asynchronicity of email is often a
> feature in this regard, giving people the opportunity to more
> carefully consider what they've read, and to be more deliberate in
> their response.
>
> I guess what I should have asked is, if not email, then how?
>
>
> --
> Eric Evans
> john.eric.ev...@gmail.com
>


Re: A proposal to move away from Jira-centric development

2016-08-16 Thread Benedict Elliott Smith
It was very much not my intention to imply this.  I thought I had crafted the
email carefully to not imply that at all.  This topic is complex, and fully
exploring all of the detail would be onerous over email.  DataStax, in my
opinion, consciously tries to be a good citizen.  However there are
emergent properties of a system with this imbalance that are not conscious,
and are suboptimal, and it is not unreasonable for the Apache apparatus to
try to "rectify" the imbalance.  I personally support that *in principle*,
but I think they're not going about it brilliantly right now.  I also doubt
the success of any such endeavour, given how difficult the problem is.

I do, however, think the project could improve how welcoming it is.  Both
in the areas Jon mentions, but also in how much effort is put into
mentoring newcomers and responding to technical questions.  The project is
far from *unwelcoming*, but mentoring is (very) costly, and when success at
your dayjob is measured in the code you contribute, this clearly takes
priority.

I don't know how to change that - again, as far as conscious actions are
concerned, I have personally witnessed DataStax try to put more effort into
this, as well as trying to drum up new external contributors through
bootcamps.  But these efforts have had limited success.

On 16 August 2016 at 19:04, Dave Brosius <dbros...@mebigfatguy.com> wrote:

> While I agree with this generally, it's misleading.
>
> It comes across like Datastax is dictating and excluding others from
> participating, or perhaps discouraging others or whatever.
>
> The truth is, whenever someone comes along who is independent, and
> interested in developing Apache Cassandra, they are welcomed, and do
> participate, and do develop, and soon after become Datastax employees.
> Not always of course, but a common pattern. It only makes sense for
> Datastax to hire people who are interested in and capable of developing
> Apache Cassandra. But the truth is a whole lot less sinister than the
> inference.
>
> --dave
> [not associated with Datastax]
>
>
>
> On 2016-08-16 13:47, Benedict Elliott Smith wrote:
>
>> This is a much more useful focusing of the discussion, in my opinion.  It
>> seemed to me that city hall was focusing on a very narrow definition of
>> project health.
>>
>> I would be the first to say the project needs to improve here, but doing so
>> will be challenging;  I'm not sure anyone really knows how to go about it.
>> Which is why we end up in these local minima of discussions about the
>> minutiae of JIRA replication.
>>
>> What this project really needs, and the board is chomping at the bit
>> about,
>> is diversity.  The fact is, right now DataStax does 95% of the substantive
>> development on the project, and so they make all the decisions.  As such,
>> their internal community outweighs the Apache community.  I will emphasise
>> clearly for my ex-colleagues, I'm not making any value judgement about
>> this, just clarifying the crux of the discussion that everyone seems to be
>> dancing around.
>>
>> The question is, what can be done about it?  The project needs a lot of
>> new
>> highly productive and independent contributors who are capable of
>> meaningfully shaping project direction.  The problem is we don't know how
>> to achieve that.
>>
>>
>>
>> On 16 August 2016 at 17:24, Dennis E. Hamilton <dennis.hamil...@acm.org>
>> wrote:
>>
>>
>>>
>>> > -Original Message-
>>> > From: Eric Stevens [mailto:migh...@gmail.com]
>>> > Sent: Tuesday, August 16, 2016 06:10
>>> > To: dev@cassandra.apache.org
>>> > Subject: Re: A proposal to move away from Jira-centric development
>>> >
>>> > I agree with Benedict that we really shouldn't be getting into a
>>> > legalese
>>> > debate on this subject, however "it didn't happen" has been brought up
>>> > as a
>>> > hammer in this conversation multiple times, and I think it's important
>>> > that
>>> > we put it to rest.  It's pretty clear cut that projects are free to
>>> > disregard this advice.  "It didn't happen" is a motto, not a rule.
>>> [orcmid]
>>>
>>> <http://community.apache.org/apache-way/apache-project-maturity-model.html>
>>>
>>> Please read them all, but especially the sections on Community, Consensus
>>> Building, and Independence.
>>>
>>> Apache projects are expected to govern themselves, PMCs are responsible
>>> for it, and PMC Chairs (officers of the foundation) are 

Re: A proposal to move away from Jira-centric development

2016-08-16 Thread Benedict Elliott Smith
I think all complex, nuanced and especially emotive topics are challenging
to discuss over textual media, due to things like the attention span of
your readers, the difficulties in structuring your text, and especially the
hoops that have to be jumped through to minimise the potential for
misinterpretation, as well as correcting the inevitable misinterpretations
that happen anyway.  But that's a major side track we shouldn't deviate
down.

On 16 August 2016 at 20:28, Eric Evans <john.eric.ev...@gmail.com> wrote:

> On Tue, Aug 16, 2016 at 1:34 PM, Benedict Elliott Smith
> <bened...@apache.org> wrote:
> > This topic is complex, and fully exploring all of the detail would be
> onerous over email.
>
> Out of curiosity, why; What makes this topic so difficult to discuss over
> email?
>
> > DataStax, in my opinion, consciously tries to be a good citizen.
> However there
> > are emergent properties of a system with this imbalance that are not
> conscious,
> > and are suboptimal, and it is not unreasonable for the Apache apparatus
> to
> > try to "rectify" the imbalance.  I personally support that *in
> principle*, but I think
> > they're not going about it brilliantly right now.  I also doubt the
> success of any
> > such endeavour, given how difficult the problem is.
>
> This.  A good first step in my opinion would be for us all to simply
> recognize this.  An imbalance of this nature is not good for the
> project, full stop.  No malice needs to be attributed, no effigies
> burned, and it shouldn't be viewed as squaring up against those we
> know and respect who are employed by Datastax.
>
>
> --
> Eric Evans
> john.eric.ev...@gmail.com
>


Re: A proposal to move away from Jira-centric development

2016-08-16 Thread Benedict Elliott Smith
This is a much more useful focusing of the discussion, in my opinion.  It
seemed to me that city hall was focusing on a very narrow definition of
project health.

I would be the first to say the project needs to improve here, but doing so
will be challenging;  I'm not sure anyone really knows how to go about it.
Which is why we end up in these local minima of discussions about the
minutiae of JIRA replication.

What this project really needs, and the board is chomping at the bit about,
is diversity.  The fact is, right now DataStax does 95% of the substantive
development on the project, and so they make all the decisions.  As such,
their internal community outweighs the Apache community.  I will emphasise
clearly for my ex-colleagues, I'm not making any value judgement about
this, just clarifying the crux of the discussion that everyone seems to be
dancing around.

The question is, what can be done about it?  The project needs a lot of new
highly productive and independent contributors who are capable of
meaningfully shaping project direction.  The problem is we don't know how
to achieve that.



On 16 August 2016 at 17:24, Dennis E. Hamilton 
wrote:

>
>
> > -Original Message-
> > From: Eric Stevens [mailto:migh...@gmail.com]
> > Sent: Tuesday, August 16, 2016 06:10
> > To: dev@cassandra.apache.org
> > Subject: Re: A proposal to move away from Jira-centric development
> >
> > I agree with Benedict that we really shouldn't be getting into a
> > legalese
> > debate on this subject, however "it didn't happen" has been brought up
> > as a
> > hammer in this conversation multiple times, and I think it's important
> > that
> > we put it to rest.  It's pretty clear cut that projects are free to
> > disregard this advice.  "It didn't happen" is a motto, not a rule.
> [orcmid]
>
>  >
>
> Please read them all, but especially the sections on Community, Consensus
> Building, and Independence.
>
> Apache projects are expected to govern themselves, PMCs are responsible
> for it, and PMC Chairs (officers of the foundation) are accountable to the
> Board on how the project is striving toward maturity.
>
> On occasion, deviations are so notable that there is objection.  It is not
> that folks run around policing the projects.  But there are occasions where
> there are concerns that a project has gone astray.
>
> One maturity factor that might not be emphasized enough is
> Sustainability.  It is about the transparency of project conduct, the
> inclusiveness of operation and visibility, and the ways that growth and
> turnover are accommodated.  Since we are looking at mottos, "community over
> code" comes to mind.
>
> Project freedom is a bit like the freedom to drive at 100mph on an
> arterial highway.  Occasionally, the infraction becomes worthy of
> attention and even a road block and spike strips.
>
> While individual preferences are being discussed here, and I agree it is
> more pertinent than top-posting versus bottom-posting, what is lacking is a
> broad discussion on community.  Not incumbents and the karma-privileged,
> but the overall community and how one sustains a thriving project that
> strives for maturity.
>
> Folks who are concerned about managing the mail stream and choosing what
> matters to them might want to discuss ways of operating lists in support of
> those concerns.  There are positions here and not enough questions about
> what might be workable inside of the practices and policies that are the
> waters Apache projects swim in.
>
>  - Dennis
>
> >
> > Per ASF newbie FAQ referenced by someone else earlier [1]:
> >
> > > The section on ASF Mottos is especially useful as a reminder of the
> > way
> > things are in most ASF projects. This section includes such gems as:
> > > * Put community before code.
> > > * Let they that do the work make the decisions.
> > > * If it didn't happen on a mailing list, it didn't happen.
> > > * Don't feed the trolls.
> >
> > This is presented as a general guideline and not a hard rule, and as
> > Benedict points out even this is preceded by a guideline suggesting that
> > developers are free to seek alternatives.
> [orcmid]
>
> The alternatives must fit within the overall principles, however.  Not
> deviate from or weaken them.  This is not an opening for arbitrary conduct.
>
> If a major exception is required, it is up to the project to deliberate on
> the matter, agree on the desired exception and its justification, and take
> it to an appropriate venue for ratification.
>
> (It is useful to keep in mind that exceptions are not precedents for
> others to cherry-pick.)
>
> It is also the case that the PMC and, indeed the Chair (although consensus
> is always better), can set policies for the project.  They must be explicit
> and documented and available to all.
>
> It would be really great to stop fighting city hall and, instead, start an
> inquiry into how 

Re: A proposal to move away from Jira-centric development

2016-08-16 Thread Benedict Elliott Smith
Unfortunately when rulebooks are consulted to shape this kind of
discussion, their ambiguity begins to show.  What does it mean for
something "to happen" on a mailing list?  It must be a loose
interpretation, because clearly many things do not "happen" on the mailing
list, such as all of the code development and commits to the codebase, as
well as an infinitude of micro decisions made by the implementor.  These
things clearly happen though.

It's also worth pointing out the *prior* rule, which presumably takes
precedence: "Let they that do the work make the decisions."  By this rule
perhaps we shouldn't even discuss on the mailing list as we may be
encroaching on their right to decide.

Now, this is all further clouded by the fact that we're quoting the Newbie
FAQs.  In other places different snappy phrases are used:

"Everything -- but *everything*-- inside the Apache world occurs *or is
reflected* in email"  (emphasis mine)
"If it isn't in my email, it didn't happen."

These are from the more official sounding "committer guide" and both
indicate commits@ receiving all JIRA comments means those comments "happen"
- although I don't know who the speaker in the second quote is, so perhaps
it has to end up in a very specific inbox.

Anyway, the point is: *let's not get into legalistic discussions when we
don't even have legalese.*

These rules are referred to as "mottos," "codes," "FAQs" - they are
guidelines, so should be interpreted with generosity.

But even if they are not, it seems the suggestion of noncompliance is a
stretch.  So let's just try to agree what the best policy is.


On 16 August 2016 at 11:44, James Carman  wrote:

> While all of these things are true, it's irrelevant.  The ASF has a clear
> policy on this (the "it didn't happen" policy).  Discussions and decisions
> about the project must be done on the mailing lists.  You may disagree with
> the policy (as many have before you) and feel free to take it up with the
> powers that be, but until that policy changes, it's what we have to adhere
> to.  The reason they chose mailing lists (IIUC) is that they are somewhat
> of a "least common denominator."
>
> I would suggest, instead of sending an email to the dev@ list saying "hey
> folks, go to JIRA and look at stuff", that we do the opposite.  Let's have
> the discussion on the mailing lists and in JIRA, we would link to the email
> threads for any supporting documentation about the ticket.
>
> On Mon, Aug 15, 2016 at 9:04 PM Eric Stevens  wrote:
>
> > There are a few strengths of discussion on the ticketing system over
> > mailing lists.  Mailing lists were fundamentally designed in the 1970's
> and
> > early 1980's, and the state of the art from a user experience perspective
> > has barely advanced since then.
> >
> > * Mailing lists tend to end up with fragmented threads for large
> > discussions, subject changes, conversation restarts, topic forks, and
> > simple etiquette errors - all of which can make it very difficult to
> locate
> > the entire discussion related to a feature.   There is no single source
> > that an interested party can study thoroughly to understand the entire
> > conversation, rather it's more of a scavenger hunt with no way to be
> > certain you've covered all the territory.  8844 for example would have
> > ended up being numerous parallel threads as people forked the
> conversation
> > to have side discussions or offer alternatives, there's no way such a
> > ticket would ever have simply been a single massive email thread with no
> > forks.
> >
> > * Mailing lists don't allow for selective subscription.  If I find a
> ticket
> > interesting, I can watch the ticket and follow along. Conversely and more
> > importantly if I find it uninteresting I don't have to wade through that
> > discussion as it progresses.  If I think I want to follow all tickets,
> that
> > should be possible too.  Likewise if I want to watch tickets that involve
> > certain components, certain milestones, certain labels, or even certain
> > contributors, I can create a subscription for such, and get emails
> > accordingly.  I can also subscribe to RSS feeds and add them to my news
> > reader if I prefer that approach better.  A tremendous amount of control
> is
> > given to the user over what they want to see, and how they want to see
> it.
> >
> > * The concern that Chris voiced about having to open a web browser to
> > participate is actually not true unless Apache's Jira install is not well
> > configured.  If you reply to an email notification from Jira it should
> > appear as a comment on the ticket.  It shouldn't exclude anyone (even
> those
> > who want to participate but somehow can't be motivated to create an
> account
> > in the ticketing system, but who _could_ be bothered to figure out the
> > arcane mailing list subscription incantation).
> >
> > * Permalinking conversations is an important capability.  It's possible
> > with a 

Re: A proposal to move away from Jira-centric development

2016-08-15 Thread Benedict Elliott Smith
By this definition the Cassandra project is already compliant? There's a
commits@ mailing list that behaves just as you describe.

I'd personally like to see some reform with how these things work, but
mostly because commits@ is rarely going to be subscribed to by anybody who
isn't working full time on the project, as it's painfully noisy. I would
hate for dev@ to become similarly noisy though.

On Monday, 15 August 2016, Dave Lester  wrote:

> For all Apache projects, mailing lists are the source of truth. See: "If
> it didn't happen on a mailing list, it didn't happen."
> https://community.apache.org/newbiefaq.html#is-there-a-code-of-conduct-for-apache-projects
>
> In response to Jason’s question, here are two things I’ve seen work well
> in the Apache Mesos community:
>
> 1. I’d suggest setting up an iss...@cassandra.apache.org 
> mailing list which posts all changes to JIRA tickets (comments, issue
> reassignments, status changes). This could be subscribed to like any other
> mailing list, and while this list would be high volume it increases
> transparency of what’s happening across the project.
>
> For Apache Mesos, we have a issues@mesos list:
> https://lists.apache.org/list.html?iss...@mesos.apache.org <
> https://lists.apache.org/list.html?iss...@mesos.apache.org> for this
> purpose. It can be hugely valuable for keeping tabs on what’s happening in
> the project. If there’s interest in creating this for Cassandra, here’s a
> link to the original INFRA ticket as a reference:
> https://issues.apache.org/jira/browse/INFRA-7971 <
> https://issues.apache.org/jira/browse/INFRA-7971>
>
> 2. Apache Mesos has formalized process of design documents / feature
> development, to encourage community discussion prior to being committed —
> this discussion takes place on the mailing list and often has less to do
> with the merits of a particular patch as much as it does on an overall
> design, its relationship to dependencies, its usage, or larger issues about
> the direction of a feature. These discussions belong on the mailing list.
>
> To keep these discussions / design documents connected to JIRA we attach
> links to JIRA issues. For example:
> https://cwiki.apache.org/confluence/display/MESOS/Design+docs+--+Shared+Links.
> The design doc approach is more of a
> formalization of what Jonathan originally proposed.
>
> Dave
>
> > On Aug 15, 2016, at 11:34 AM, Jason Brown  > wrote:
> >
> > Chris,
> >
> > Can you give a few examples of other healthy Apache projects which you
> feel
> > would be good example? Note: I'm not trying to bait the conversation, but
> > am genuinely interested in what other successful projects do.
> >
> > Thanks
> >
> > Jason
> >
> > On Monday, August 15, 2016, Chris Mattmann  > wrote:
> >
> >> s/dev list followers//
> >>
> >> That’s (one of) the disconnect(s). It’s not *you the emboldened,
> powerful
> >> PMC*
> >> and then everyone else.
> >>
> >>
> >> On 8/15/16, 11:25 AM, "Jeremy Hanna"  
> >> > wrote:
> >>
> >>Regarding high level linking, if I’m in irc or slack or hipchat or a
> >> mailing list thread, it’s easy to reference a Jira ID and chat programs
> can
> >> link to it and bots can bring up various details.  I don’t think a hash
> id
> >> for a mailing list is as simple or memorable.
> >>
> >>A feature of a mailing list thread is that it can go in different
> >> directions easily.  The burden is that it will be harder to follow in
> the
> >> future if you’re trying to sort out implementation details.  So for high
> >> level discussion, the mailing list is great.  When getting down to the
> >> actual work and discussion about that focused work, that’s where a tool
> >> like Jira comes in.  Then it is reference-able in the changes.txt and
> other
> >> things.
> >>
> >>I think the approach proposed by Jonathan is a nice way to keep dev
> >> list followers informed but keeping ticket details focused.
> >>
> >>> On Aug 15, 2016, at 1:12 PM, Chris Mattmann  
> >> > wrote:
> >>>
> >>> How is it harder to point someone to mail?
> >>>
> >>> Have you seen lists.apache.org?
> >>>
> >>> Specifically:
> >>> https://lists.apache.org/list.html?dev@cassandra.apache.org
> >>>
> >>>
> >>>
> >>> On 8/15/16, 10:08 AM, "Jeremiah D Jordan"  
> >> > wrote:
> >>>
> >>>   I like keeping things in JIRA because then everything is in one
> >> place, and it is easy to refer someone to it in the future.
> >>>   But I agree that JIRA tickets with a bunch of design discussion
> >> and POC’s and such in them can get pretty long and convoluted.
> 

Re: Proposal - 3.5.1

2016-09-15 Thread Benedict Elliott Smith
It's worth noting more clearly that 3.5 is an arbitrary point in time.  All
3.X releases < 3.6 are affected.

If we backport to 3.5, it seems like 3.1 and 3.3 should get the same
treatment.  I do recall commitments to backport critical fixes, but exactly
what the bar is was never well defined.

I also cannot see how there would be any added confusion.


On 15 September 2016 at 18:31, Dave Lester  wrote:

> How would cutting a 3.5.1 release possibly confuse users of the software?
> It would be easy to document the change and to send release notes.
>
> Given the bug’s critical nature and that it's a minor fix, I’m +1
> (non-binding) to a new release.
>
> Dave
>
> > On Sep 15, 2016, at 7:18 AM, Jeremiah D Jordan <
> jeremiah.jor...@gmail.com> wrote:
> >
> > I’m with Jeff on this, 3.7 (bug fixes on 3.6) has already been released
> with the fix.  Since the fix applies cleanly anyone is free to put it on
> top of 3.5 on their own if they like, but I see no reason to put out a
> 3.5.1 right now and confuse people further.
> >
> > -Jeremiah
> >
> >
> >> On Sep 15, 2016, at 9:07 AM, Jonathan Haddad  wrote:
> >>
> >> As a follow-up, I suppose I'm only advocating for a fix to the odd
> >> releases.  Sadly, Tick Tock versioning is misleading.
> >>
> >> If tick tock were to continue (and I'm very much against how it currently
> >> works) the whole even-features odd-fixes thing needs to stop ASAP; all it
> >> does is confuse people.
> >>
> >> The follow up to 3.4 (3.5) should have been 3.4.1, following semver, so
> >> people know it's bug fixes only to 3.4.
> >>
> >> Jon
> >>
> >> On Wed, Sep 14, 2016 at 10:37 PM Jonathan Haddad 
> wrote:
> >>
> >>> In this particular case, I'd say adding a bug fix release for every
> >>> version that's affected would be the right thing.  The issue is so
> easily
> >>> reproducible and will likely result in massive data loss for anyone on
> 3.X
> >>> WHERE X < 6 and uses the "date" type.
> >>>
> >>> This is how easy it is to reproduce:
> >>>
> >>> 1. Start Cassandra 3.5
> >>> 2. create KEYSPACE test WITH replication = {'class': 'SimpleStrategy',
> >>> 'replication_factor': 1};
> >>> 3. use test;
> >>> 4. create table fail (id int primary key, d date);
> >>> 5. delete d from fail where id = 1;
> >>> 6. Stop Cassandra
> >>> 7. Start Cassandra
> >>>
> >>> You will get this, and startup will fail:
> >>>
> >>> ERROR 05:32:09 Exiting due to error while processing commit log during
> >>> initialization.
> >>> org.apache.cassandra.db.commitlog.CommitLogReplayer$
> CommitLogReplayException:
> >>> Unexpected error deserializing mutation; saved to
> >>> /var/folders/0l/g2p6cnyd5kx_1wkl83nd3y4rgn/T/
> mutation6313332720566971713dat.
> >>> This may be caused by replaying a mutation against a table with the
> same
> >>> name but incompatible schema.  Exception follows:
> >>> org.apache.cassandra.serializers.MarshalException: Expected 4 byte
> long for
> >>> date (0)
> >>>
> >>> I mean.. come on.  It's an easy fix.  It cleanly merges against 3.5
> (and
> >>> probably the other releases) and requires very little investment from
> >>> anyone.
> >>>
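(For reference, the quoted steps condensed into a minimal shell sketch; this assumes a
local single-node 3.5 install with cqlsh on the PATH, and the keyspace and table names
simply mirror the steps above:)

    # with Cassandra 3.5 running, create the schema and issue the cell delete
    cqlsh -e "CREATE KEYSPACE test WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};"
    cqlsh -e "CREATE TABLE test.fail (id int PRIMARY KEY, d date);"
    cqlsh -e "DELETE d FROM test.fail WHERE id = 1;"
    # stop and start the node; commit log replay then fails with the
    # MarshalException quoted above ("Expected 4 byte long for date")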
> >>>
> >>> On Wed, Sep 14, 2016 at 9:40 PM Jeff Jirsa  >
> >>> wrote:
> >>>
>  We did 3.1.1 and 3.2.1, so there’s SOME precedent for emergency fixes,
>  but we certainly didn’t/won’t go back and cut new releases from every
>  branch for every critical bug in future releases, so I think we need
> to
>  draw the line somewhere. If it’s fixed in 3.7 and 3.0.x (x >= 6), it
> seems
>  like you’ve got options (either stay on the tick and go up to 3.7, or
> bail
>  down to 3.0.x)
> 
>  Perhaps, though, this highlights the fact that tick/tock may not be
> the
>  best option long term. We’ve tried it for a year, perhaps we should
> instead
>  discuss whether or not it should continue, or if there’s another
> process
>  that gives us a better way to get useful patches into versions people
> are
>  willing to run in production.
> 
> 
> 
>  On 9/14/16, 8:55 PM, "Jonathan Haddad"  wrote:
> 
> > Common sense is what prevents someone from upgrading to yet another
> > completely unknown version with new features which have probably
> broken
> > even more stuff that nobody is aware of.  The folks I'm helping right now
> > deployed 3.5 when they got started because http://cassandra.apache.org
> > suggests
> > it's acceptable for production.  It turns out using 4 of the built in
> > datatypes of the database result in the server being unable to
> restart
> > without clearing out the commit logs and running a repair.  That

Re: Proposal - 3.5.1

2016-09-15 Thread Benedict Elliott Smith
Yes, agreed. I'm advocating a different cadence, not a random cadence.

On Thursday, 15 September 2016, Tyler Hobbs <ty...@datastax.com> wrote:

> On Thu, Sep 15, 2016 at 2:22 PM, Benedict Elliott Smith <
> bened...@apache.org <javascript:;>
> > wrote:
>
> > Feature releases don't have to be on the same cadence as bug fixes.
> They're
> > naturally different beasts.
> >
>
> With the exception of critical bug fixes (which can warrant an immediate
> release), I think keeping a regular cadence makes us less likely to slip
> and fall behind on releases.
>
>
> >
> > Why not stick with monthly feature releases, but mark every third (or
> > sixth) as a supported release that gets quarterly updates for 2-3
> quarters?
> >
>
> That's also a good idea.
>
> --
> Tyler Hobbs
> DataStax <http://datastax.com/>
>


Re: Proposal - 3.5.1

2016-09-15 Thread Benedict Elliott Smith
I agree tick-tock is a failure.  But for two reasons IMO:

1) Ultimately, the users are the real testers and it takes a while for a
release to percolate into the wild for feedback.  The reality is that a
release doesn't have its tires properly kicked for at least three months
after it's cut.  So if we are to have any tocks, they should be completely
unwed from the ticks, and should probably happen on a ~3M cadence to keep
the labour down but the utility up (and there should probably still be more
than one tock per tick)

2) The resources promised for improved process never materialised.  We didn't
even reach parity with the 2.1 release until very recently, i.e. no
failing u/dtests.


On 15 September 2016 at 19:08, Jeff Jirsa 
wrote:

> I know we’ve got a lot of folks following the dev list without a lot of
> background, so let’s make sure we get some context here so everyone can be
> on the same page.
>
> Going to preface this wall of text by saying I’m +1 on a 3.5.1 (and 3.3.1,
> etc) if it’s done AFTER 3.9 (I think we need to get 3.9 out first before
> the RE manpower is spent on backporting fixes, even critical fixes, because
> 3.9 has multiple critical fixes for people running 3.7).
>
> Now some background:
>
> For many years, Cassandra used to have a dev process that kept 3 active
> branches - “bleeding edge”, a “stable”, and an “old stable” branch, where
> developers would be committing ALL new contributions to the bleeding edge,
> non-api-breaking changes to stable, and bugfixes only to old stable. While
> the api changed and major features were added, that bleeding edge would
> just be ‘trunk’, and it’d get cut into a major version when it was ready to
> ship. We saw that with 2.2 / 2.1 / 2.0 (and before that, 2.1 / 2.0 / 1.2,
> and before that 2.0 / 1.2 / 1.1 ). When that bleeding edge got released as
> a major x.y.0, the third, oldest, most stable branch went EOL, and new
> features would go into trunk for the next major version.
>
> There were two big negatives observed with this:
>
> The first big negative is that if multiple major new features were in
> flight, releases were prone to delay. Nobody wants to break an API on a
> x.y.1 release, and nobody wants to add a new feature to a x.y.2 release, so
> the project would delay the x.y releases if major features were close, and
> then there’d be pressure to slip them in before they were fully tested, or
> cut features to avoid delaying the release. This pressure was observed to
> be bad for the project – it forced technical compromises.
>
> The second downside that was observed was that nobody would try to run the
> new versions when they launched, because they were buggy because they were
> filled with new features. 2.2, for example, introduced RBAC, commitlog
> compression, and user defined functions – major features that needed to be
> tested. Unfortunately, because there were few real-world testers, there
> were still major bugs being found for months – the first production-ready
> version of 2.2 is probably in the 2.2.5 or 2.2.6 range.
>
> For version 3, we moved to an alternate release, modeled on Intel’s
> tick/tock https://en.wikipedia.org/wiki/Tick-Tock_model
>
> The intention was to allow new features into 3.even releases (3.0, 3.2,
> 3.4, 3.6, and so on), with bugfixes in 3.odd releases (3.1, … ). The hope
> was to allow more frequent releases to address the first big negative
> (flood of new features that blocked releases), while also helping to
> address the second – with fewer major features in a release, they better
> get more/better test coverage.
>
> In the tick/tock model, anyone running 3.odd (like 3.5) should be looking
> for bugfixes in 3.7. It’s certainly true that 3.5 is horribly broken (as is
> 3.3, and 3.4, etc), but with this release model, the bugfix SHOULD BE in
> 3.7. As I mentioned previously, we have precedent for backporting critical
> fixes, but we don’t have a well defined bar (that I see) for what’s
> critical enough for a backport.
>
> Jon is noting (and what many of us who run Cassandra in production have
> really known for a very long time) is that nobody wants to run 3.newest
> (even or odd), because 3.newest is likely broken (because it’s a complex
> distributed database, and testing is hard, and it takes time and complex
> workloads to find bugs). In the tick/tock model, because new features went
> into 3.6, there are new features that may not be adequately
> tested/validated in 3.7 a user of 3.5 doesn’t want, and isn’t willing to
> accept the risk.
>
> The bottom line here is that tick/tock is probably a well intentioned but
> failed attempt to bring stability to Cassandra’s releases. The problems
> tick/tock was meant to solve are real problems, but tick/tock doesn’t seem
> to be addressing them – new features invalidate old testing, which makes it
> difficult/impossible for real users to sit on the 3.odd versions.
>
> We’re due for cutting 3.9 and 3.0.9, and we have limited 

Re: Moderation

2016-11-06 Thread Benedict Elliott Smith
It was nothing but an expression of my belief that Chris' excuses for his
inappropriate behaviour were wholly inadequate.

This is not an isolated incident; it is a pattern of behaviour, and excuses
do not cut it.  Anything less than a wholesale acceptance of
inappropriateness, retraction, and commitment not to repeat this kind of
behaviour in future is insufficient.

Unlike you, I have no power, only words to express my disappointment in you
both.


On 6 November 2016 at 17:26, Jim Jagielski <j...@jagunet.com> wrote:

> If this is your attempt to accept Chris' explanation, even if
> you don't agree with it, then you have not quite succeeded.
>
> If instead, this is your attempt to continue to heap fuel on
> a fire, and be just as aggressive as you paint others to be, then
> you have done quite well.
>
> I don't expect that others will be spending their time replying
> to your messages anymore, at least on list.
>
> > On Nov 6, 2016, at 11:19 AM, Benedict Elliott Smith <bened...@apache.org>
> wrote:
> >
> > In summary: you claim to be someone with years of experience at the
> forefront of an organisation that conducts all of its business primarily
> over email.  In that time you have not learned to express yourself over
> email in a manner that is not incendiary to those reading it, nor offensive
> to the intended recipient.
> >
> > That sounds to me like you are openly disclaiming your suitability for
> the position of responsibility you currently hold.
> >
>
>


Re: Review of Cassandra actions

2016-11-06 Thread Benedict Elliott Smith
Jim,

I would love it if you could take the time to explain how you arrived at a
diagnosis of trolling.

Aleksey made a well written, cogent and on-topic criticism of your ongoing
behaviour, as well as a reasoned rebuttal of your absurd claim that your
power is inherent to *you*, not your position (I don't think many people
know who you are, only what you are).

It was explicitly the topic of discussion, and there is mounting evidence
of your misbehaviour.  This is the very definition of discussion, not
trolling.

Much like your "chess" comment, this appears to be an attempt to shut down
substantive discussion of your unsuitability for the role of board member.



On 6 November 2016 at 13:01, Jim Jagielski  wrote:

> Sorry that people took the reply as pompous... You are certainly
> within your rights to take it anyway you want. It was not
> meant that way.
>
> In the same vein, I am within my rights to take responses
> in the way I want, which I took as simple trolling. And
> with trolls, as with thermonuclear war, the only "winning"
> move is not to play.
>
> > On Nov 5, 2016, at 9:25 PM, Jeff Jirsa  wrote:
> >
> > I hope the other 7 members of the board take note of this response,
> > and other similar reactions on dev@ today.
> >
> > When Datastax violated trademark, they acknowledged it and worked to
> > correct it. To their credit, they tried to do the right thing.
> > When the PMC failed to enforce problems, we acknowledged it and worked
> > to correct it. We aren't perfect, but we're trying.
> >
> > When a few members the board openly violate the code of conduct, being
> > condescending and disrespectful under the auspices of "enforcing the
> > rules" and "protecting the community", they're breaking the rules,
> > damaging the community, and nobody seems willing to acknowledge it or
> > work to correct it. It's not isolated, I'll link examples if it's
> > useful.
> >
> > In a time when we're all trying to do the right thing to protect the
> > project and the community, it's unfortunate that high ranking, long
> > time members within the ASF actively work to undermine trust and
> > community while flaunting the code of conduct, which requires
> > friendliness, empathy, and professionalism, and the rest of the board
> > is silent on the matter.
> >
> >
> >
> >
> >> On Nov 5, 2016, at 4:08 PM, Dave Brosius  wrote:
> >>
> >> I take this response (a second time) as a pompous way to trivialize the
> >> responses of others, to the point of treating their points as meaningless to
> >> you. So either explain what this means, or accept the fact that you are, as
> >> Chris is, exactly what people are claiming you to be: obnoxious bullies more
> >> interested in throwing your weight around and causing havoc, destroying a
> >> community, rather than actually being motivated by improving the ASF.
> >>
> >>
> >>> On 11/05/2016 06:16 PM, Jim Jagielski wrote:
> >>> How about a nice game of chess?
> >>>
>  On Nov 5, 2016, at 1:15 PM, Aleksey Yeschenko 
> wrote:
> 
>  I’m sorry, but this statement is so at odds with common sense that I
> have to call it out.
> 
>  Of course your position grants your voice extra power. A lot of extra
> power,
>  like it or not (I have a feeling you quite like it, though).
> 
>  In an ideal world, that power would entail corresponding duties:
>  care and consideration in your actions at least.
>  Instead, you are being hotheaded, impulsive, antagonising, and
> immature.
> 
>  In what possible universe dropping that hammer threat from the ’20%
> off” email thread,
>  then following up with a Game of Thrones youtube clip is alright?
> 
>  That kind of behaviour is inappropriate for a board member. Frankly,
> it wouldn’t be
>  appropriate for a greeter at Walmart. If you don’t see this, we do
> indeed have bigger
>  problems.
> 
>  --
>  AY
> 
>  On 5 November 2016 at 14:57:13, Jim Jagielski (j...@jagunet.com)
> wrote:
> 
> >> But I love the ability of VP's and Board to simply pretend their
> positions carried no weight.
> >>
> > I would submit that whatever "weight" someone's position may
> > carry, it is due to *who* they are, and not *what* they are.
> >
> > If we have people here in the ASF or in PMCs which really think
> > that titles manner in discussions like this, when one is NOT
> > speaking ex cathedra, then we have bigger problems. :)
> >>>
> >>
>
>


Re: Review of Cassandra actions

2016-11-06 Thread Benedict Elliott Smith
You've cherry picked, as usual.

"In what possible universe dropping that hammer threat from the ’20% off”
email thread,
then following up with a Game of Thrones youtube clip is alright?"

"In an ideal world, that power would entail corresponding duties:
care and consideration in your actions at least."

"That kind of behaviour is inappropriate for a board member... If you don’t
see this, we do indeed have bigger
problems."

You seem to suffer from double standards, in the wrong direction.  Far more
offensive language from a board member is completely justifiable by nothing
but frustration <https://twitter.com/jimjag/status/794616571079626753>.
From somebody wronged by a board member, however, an expression of their
experience with far less incendiary language is completely inexcusable, and
obviates the rest of a message.


On 6 November 2016 at 17:33, Jim Jagielski <j...@jagunet.com> wrote:

> "well written, cogent and on-topic" ... "reasoned rebuttal"
>
> You keep on using those words. I don't think they mean
> what you think they do. Some data points:
>
>   o " A lot of extra power, like it or not (I have a feeling you quite
> like it, though)."
>   o "you are being hotheaded, impulsive, antagonising, and immature."
>   o "in what possible universe"
>   o "Frankly, it wouldn’t be appropriate for a greeter at Walmart"
>
> So if the above warrants what you consider well-written, cogent,
> on-topic and reasoned, then I fear that any further discussion
> is really worthless.
>
> o+o
>
> > On Nov 6, 2016, at 11:24 AM, Benedict Elliott Smith <bened...@apache.org>
> wrote:
> >
> > Jim,
> >
> > I would love it if you could take the time to explain how you arrived at a
> diagnosis of trolling.
> >
> > Aleksey made a well written, cogent and on-topic criticism of your
> ongoing behaviour, as well as a reasoned rebuttal of your absurd claim that
> your power is inherent to you, not your position (I don't think many people
> know who you are, only what you are).
> >
> > It was explicitly the topic of discussion, and there is mounting
> evidence of your misbehaviour.  This is the very definition of discussion,
> not trolling.
> >
> > Much like your "chess" comment, this appears to be an attempt to shut
> down substantive discussion of your unsuitability for the role of board
> member.
> >
>
>


Re: Moderation

2016-11-06 Thread Benedict Elliott Smith
In summary: you claim to be someone with years of experience at the
forefront of an organisation that conducts all of its business primarily
over email.  In that time you have not learned to express yourself over
email in a manner that is not incendiary to those reading it, nor offensive
to the intended recipient.

That sounds to me like you are openly disclaiming your suitability for the
position of responsibility you currently hold.

On 6 November 2016 at 14:53, Chris Mattmann  wrote:

> For the record, your breakdown of the email trying to decipher what I
> meant is not
> correct. It’s not your fault, but email doesn’t convey tone, nor do you
> know what I am
> thinking or what I was trying to say. In fact, I was actually saying the
> PMC wasn’t doing its job,
> because (as I stated to you months ago), you (and many other community
> members of
> Cassandra) *should* have a binding vote. It wasn’t discrediting to you to
> point out that
> you don’t have the PMC or committer credentials; it was an example trying
> to point out
> that you *should* have them. And that you clearly care about the project
> as I believe you
> have developed a book on the subject of Apache Cassandra a while back IIRC
> which in Tika,
> Nutch, OODT, and a number of other projects would have earned you the
> ability to have a
> direct say in those Apache projects. And a lot of others.
>
> It’s these systematic fracturing of the community under the guise of a
> single vendor who
> has stated that they care about Cassandra (note the omission of Apache),
> but by demonstration
> has shown they either don’t understand, or don’t care about the Apache
> part of the equation.
> That’s what caused me to become frustrated when the following sequence of
> events
> happened:
>
> 1. After the Board meeting Mark Thomas one of our Directors took point on
> engaging
> the Apache Cassandra PMC with some of the concerns brought up over the
> past 6
> months and the role I was filling there became a back seat for me.
> 2. I saw over the past few days on a Twitter feed retweeted by an ASF
> member that
> Kelly Sommers (whom I have never met in person and do not know previously)
> was asking
> questions and stating negative things about the ASF that I believed could
> be much better
> understood by bringing them here to the ASF mailing lists for Apache
> Cassandra. I suggested
> on Twitter that she bring her concerns to the Apache lists and told her
> which email address
> to send it to. Some of the same people that eventually came onto the
> thread were people
> who were communicating with her on Twitter – this was disappointing as
> they could have
> done the same thing, and suggested Kelly come to the lists, Apache
> Cassandra PMC or not.
> 3. After 12 hours I checked back with Kelly and the Twitter dialogue had
> continued with several
> ASF members and even some Board members getting involved. Again, I asked
> Kelly why talk
> there, and why not just talk to the email list which is the canonical home
> for Apache Cassandra?
> She told me she sent the mail the prior night.
> 4. So of course I checked (after having already guessed it was stuck in
> moderation) and yes it
> was. What ensued was both frustration on my part and also email
> conversation that was heated
> on both sides. I felt swiped on by a few emails where I had good
> intentions but I felt we were
> wasting time debating whether we *should* moderate something through –
> which to me was
> a clear answer (yes). Where I failed there was to recognize that the real
> answer was that the Apache
> Cassandra PMC did not have enough moderators and the people I was mostly
> going back and forth
> with were not the moderators of the mailing lists.
> 5. One positive thing that came from #4 was that at least there are more
> moderators now. I’m not sure
> the reason for the lack of geographically diverse moderators, but it’s
> definitely something the PMC should
> check from time to time. Not pointing fingers, simply identifying
> responsibility.
>
> In my emails I used the words “shi*t” and “f’ing”. I didn’t direct either
> of these words at anyone in particular.
> I used them as color in expressing my frustration. It happens from time to
> time. Sorry.
>
> The Cassandra MVP comment was also not a diss on you as much as it was me
> saying – ideally – I would hope that
> the Apache Cassandra MVP people promote the concept of their community
> leaders becoming “ASF members”,
> and that Cassandra MVPs are great – but secondary – to the
> responsibilities of the PMC to move towards ensuring
> its community understands the Apache Way.
>
> Russell and I have never met in person so he does not really know me and
> nor I him. So he doesn’t know some of
> these nuances that people would normally know having met each other in
> mediums besides email or electronically.
> Many of you do not know me either. I will conclude with saying that I
> realize many of the people here for 

Re: DataStax role in Cassandra and the ASF

2016-11-05 Thread Benedict Elliott Smith
Thanks Jeff, that was very well put.

I would quibble on one point, though: the ship has never sailed on topics
of community.  How the board acts towards the PMC and companies in the
community matters a great deal for continuing relations, as well as for
other projects.

The question is: did the board members all behave in a manner that the PMC
felt was reasonable and impartial?  What I saw suggested they did not; if
the PMC members agree, then perhaps the discussions should be made public*
so the community can decide their view.  Because if it is the case, that is
a *serious* problem the ASF needs to address.  The board must be held to an
even higher standard than the PMCs it governs.

*as the board appears to have just invested you with the authority to do,
if Jim is to be believed.

On 5 November 2016 at 15:33, Jeff Jirsa  wrote:

> I'm going to attempt to give the most complete answer I can without
> posting comments that were said with the expectation of privacy - it's not
> my place to violate that expectation. Some things discussed here are things
> I wouldn't typically mention in public (notably the topic of trademark
> compliance), but since it has already been mentioned by others and posted
> in the minutes, I'm going to be as open and compete as I can for the sake
> of the community.
>
> For the record and for context, I'm a member of the PMC, voted into the
> PMC fairly recently, but neither a Datastax employee nor customer.
>
> The ASF has very strict guidelines in the way they expect projects to be
> run. Some of these guidelines are hard legal requirements (protecting brand
> trademarks), some are designed to protect the health of the project
> (ensuring diverse contributors, lack of control by a single corporate
> entity).
>
> For a very long time, the most active committers and PMC members were
> Datastax employees - as full time sponsored contributors, they drove the
> vast majority of features. In addition to sponsoring the full time
> contributors, Datastax also actively tried to grow the community - for
> their business to grow, they need adoption of Apache Cassandra, so they
> spent a lot of time and money actively trying to find more contributors and
> creating opportunities for people to learn about Cassandra.
>
> Unfortunately, two unrelated problems arose.
>
> First, apparently, the frustration of folks like Lucasz, and decisions like not
> wanting to have in-tree drivers, are misinterpreted (in my opinion) as
> inappropriate control. Additionally, the Apache Way calls for decisions to
> be made In public, where a record exists. Some (many?) decisions were
> happening in places like IRC (real time collaboration among full time
> developers) which, while not hidden or private, wasn't logged (it is now)
> and wasn't necessarily obvious to casual observers. While I'll respond to
> Lucasz's email directly in a moment (I find many parts of it incorrect),
> the APPEARANCE for people only barely familiar with the project is that
> Datastax was likely inappropriately controlling the project, a violation of
> ASF guidelines.
>
> Second, some of what Datastax perceived as well intentioned community
> building occasionally violated trademark guidelines. I suspect the most
> likely cause is that marketing materials were written by marketing folks
> who don't understand trademark law. This isn't subjective. The active
> members of the PMC (which, at the time, were primarily Datastax employees)
> ARE responsible for policing trademark and MUST (unambiguously) correct
> misuse - that didn't happen as often as it should have. My opinion is that
> it didn't happen because the PMC was heads down on code and focusing on the
> database, not the marketing, but that's not an acceptable answer.
>
> The combination of these two factors causes the ASF to become involved.
> Apache Cassandra isn't alone here - other big data platforms of various
> shapes are also having similar interactions with the ASF, likely for
> similar reasons. There has been (and will continue to be) communication to
> ensure that ASF trademarks are respected and that Datastax doesn't exert
> undue control over the project. That communication was not a one time
> message - it was back and forth communication for quite some time at the
> PMC level.
>
> Factual objective background out of the way, I'll switch to opinion and
> speculation.
>
> Because this isn't an isolated case (ASF has to deal with multiple
> projects having similar issues) and everyone involved has strong opinions
> that they're acting in the best interest of the project, I SUSPECT that
> frustration runs high, tempers are short, and occasionally things are said
> that shouldn't be said - some of which one may classify as "prematurely
> inflammatory". This serve[s|d] to drive a wedge between two groups that
> nominally have the same goal - a strong Apache Cassandra project.
>
> Ultimately, Datastax has an obligation to their investors to make money
> and 

Re: DataStax role in Cassandra and the ASF

2016-11-05 Thread Benedict Elliott Smith
Whether or not the actions should have been "FIRST" taken in private, this
is now a retrospective where we provide oversight for the overseers.

I reiterate again that all discussions and actions undertaken should be
made public.  *This community* has just been charged with judging if the
board acted appropriately.  You have not.  We cannot make that judgement
without the facts.




On 5 November 2016 at 13:30, Mark Struberg <strub...@yahoo.de.invalid>
wrote:

> Having a bit of insight into how the board operates (being PMC-chair for 2 other
> TLPs) I can ensure you that the board did handle this very cleanly!
>
> A few things really should FIRST get handled in private. This is the same
> regardless of whether it's about board oversight or you as a PMC.
>
> An example is e.g. when we detect trademark violations. Or if ASF hosted
> pages make unfair advertisement for ONE of the involved contributors. In
> such cases the PMC (or board if the PMC doesn't act by itself) first tries
> to solve those issues _without_ breaking porcelain! Which means the
> respective person or company will get contacted in private and not
> immediately get hit by public shaming and blaming. In most cases it's just
> an oversight and too eager marketing people who don't understand the
> impact. Usually the problems quickly get resolved without anyone losing
> face.
>
>
> Oh, talking about the 'impact' and some people wondering why the ASF board
> is so pissed?
> Well, the point is that in extremis the whole §501(c)(3) (not-for-profit)
> status is at risk! Meaning if we allow a single vendor to create an unfair
> business benefit, then this might be interpreted as a profit making
> mechanism by the federal tax office...
> This is one of the huge differences to some other OSS projects which are
> basically owned by one company or where companies simply can buy a seat in
> the board.
>
>
> LieGrue,
> strub
>
> PS: I strongly believe that the technical people at DataStax really tried
> to do their best but got out-maneuvered by their marketing and sales
> people. The current step was just part of a clean separation btw a company
> and their OSS contributions. It was legally necessary and also important
> for the overall Cassandra community!
>
>
> PPS: DataStax did a lot for Cassandra, but the public perception nowadays
> seems to be that DataStax donated Cassandra to the ASF. This is not true.
> It was created and contributed by Facebook
> https://wiki.apache.org/incubator/Cassandra many years before DataStax was
> even founded
>
>
>
> On Saturday, 5 November 2016, 13:12, Benedict Elliott Smith <
> bened...@apache.org> wrote:
> >
> >I would hope the board would engage with criticism substantively, and
> that "long emails" to boards@ would be responded to on their merit,
> without a grassroots effort to apply pressure.
> >
> >
> >In lieu of that, it is very hard for the community to "speak with one
> voice" because we do not know what actions the board has undertaken.  This
> is at odds with "The Apache Way" core tenet of Openness.
> >
> >
> >The actions I have seen on the public fora by both Chris and Mark make me
> doubt the actions in private were reasonable.
> >
> >
> >
> >I reiterate that the board should make all of its discussions about
> DataStax, particularly those with the PMC-private list, public.  Otherwise
> the community cannot perform the function you ask.
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >On 5 November 2016 at 03:08, Ross Gardler <ross.gard...@microsoft.com>
> wrote:
> >
> >[In the mail below I try not to cast judgement, I do not know enough of
> the background to have an opinion on this specific situation. My comments
> are in response to the question “Where are the board's guidelines then, or
> do they make it up as they go?”.]
> >>
> >>The boards guidelines are the Apache Way. This is a fluid thing that
> adapts to individual project needs but has a few common pillars in all
> projects, e.g. PMC is responsible for community health and PMC members are
> expected to act as individuals in the interest of the community. The board
> is empowered, by the ASF membership (individuals with merit) to take any
> action necessary to ensure a PMC is carrying out its duty.
> >>
> >>If a PMC is being ineffective then the board only has blunt instruments
> to work with. Their actions appear to cut deep because they have no scalpel
> with which to work. The scalpel should be in the hands of the PMC, but by
> definition if the board intervenes the PMC is failing to use the scalpel.
> >>
> >&

Re: DataStax role in Cassandra and the ASF

2016-11-05 Thread Benedict Elliott Smith
How am I misunderstanding you? "not in public" == "private"

The ASF trumpets openness, and you are now apparently campaigning for the
opposite.  All I am demanding is that these "not public" actions be made
"open" and public, inline with ASF ideals.

Ross indicated *this (Cassandra) community* needed to judge if the board
acted appropriately.  Several members of the community, myself included,
believe from the information we have that they may not have done.  If the
board cannot be open about its actions, what are we as a community - whose
views the ASF claims to value - to infer?



On 5 November 2016 at 14:29, Mark Struberg <strub...@yahoo.de.invalid>
wrote:

> You don't understand what I tried to say, it seems: those actions HAVE been
> extensively discussed with both DataStax representatives and the Cassandra
> PMC for a LONG time. Just not in public. So this is nothing which just
> boiled up in the last month - this really got pointed out amicably by the
> board for a LONG time before they _finally_ took action!
>
>
> LieGrue,
> strub
>
>
> On Saturday, 5 November 2016, 14:42, Benedict Elliott Smith <
> bened...@apache.org> wrote:
> >Whether or not the actions should have been "FIRST" taken in private,
> this is now a retrospective where we provide oversight for the overseers.
> >
> >
> >
> >I reiterate again that all discussions and actions undertaken should be
> made public.  This community has just been charged with judging if the
> board acted appropriately.  You have not.  We cannot make that judgement
> without the facts.
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >On 5 November 2016 at 13:30, Mark Struberg <strub...@yahoo.de.invalid>
> wrote:
> >
> >Having a bit of insight into how the board operates (being PMC-chair for 2 other
> TLPs) I can ensure you that the board did handle this very cleanly!
> >>
> >>A few things really should FIRST get handled in private. This is the
> same regardless of whether it's about board oversight or you as a PMC.
> >>
> >>An example is e.g. when we detect trademark violations. Or if ASF hosted
> pages make unfair advertisement for ONE of the involved contributors. In
> such cases the PMC (or board if the PMC doesn't act by itself) first tries
> to solve those issues _without_ breaking porcelain! Which means the
> respective person or company will get contacted in private and not
> immediately get hit by public shaming and blaming. In most cases it's just
> an oversight and too eager marketing people who don't understand the
> impact. Usually the problems quickly get resolved without anyone losing
> face.
> >>
> >>
> >>Oh, talking about the 'impact' and some people wondering why the ASF
> board is so pissed?
> >>Well, the point is that in extremis the whole §501(c)(3) (not-for-profit)
> status is at risk! Meaning if we allow a single vendor to create an unfair
> business benefit, then this might be interpreted as a profit making
> mechanism by the federal tax office...
> >>This is one of the huge differences to some other OSS projects which are
> basically owned by one company or where companies simply can buy a seat in
> the board.
> >>
> >>
> >>LieGrue,
> >>strub
> >>
> >>PS: I strongly believe that the technical people at DataStax really
> tried to do their best but got out-maneuvered by their marketing and sales
> people. The current step was just part of a clean separation btw a company
> and their OSS contributions. It was legally necessary and also important
> for the overall Cassandra community!
> >>
> >>
> >>PPS: DataStax did a lot for Cassandra, but the public perception
> nowadays seems to be that DataStax donated Cassandra to the ASF. This is
> not true. It was created and contributed by Facebook
> >>https://wiki.apache.org/incubator/Cassandra many years before DataStax
> was even founded
> >>
> >>
> >>
> >>On Saturday, 5 November 2016, 13:12, Benedict Elliott Smith <
> bened...@apache.org> wrote:
> >>>
> >>>I would hope the board would engage with criticism substantively, and
> that "long emails" to boards@ would be responded to on their merit,
> without a grassroots effort to apply pressure.
> >>>
> >>>
> >>>In lieu of that, it is very hard for the community to "speak with one
> voice" because we do not know what actions the board has undertaken.  This
> is at odds with "The Apache Way" core tenet of Openness.
> >>>
> >>>
> >&

DataStax role in Cassandra and the ASF

2016-11-05 Thread Benedict Elliott Smith
I would hope the board would engage with criticism substantively, and that
"long emails" to boards@ would be responded to on their merit, without a
grassroots effort to apply pressure.

In lieu of that, it is very hard for the community to "speak with one
voice" because
we do not know what actions the board has undertaken.  This is at odds with
"The Apache Way" core tenet of Openness.

The actions I have seen on the public fora by both Chris and Mark make me
doubt the actions in private were reasonable.

I reiterate that the board should make all of its discussions about
DataStax, particularly those with the PMC-private list, public.  Otherwise
the community cannot perform the function you ask.




On 5 November 2016 at 03:08, Ross Gardler <ross.gard...@microsoft.com>
wrote:

> [In the mail below I try not to cast judgement, I do not know enough of
> the background to have an opinion on this specific situation. My comments
> are in response to the question “Where are the board's guidelines then, or
> do they make it up as they go?”.]
>
> The boards guidelines are the Apache Way. This is a fluid thing that
> adapts to individual project needs but has a few common pillars in all
> projects, e.g. PMC is responsible for community health and PMC members are
> expected to act as individuals in the interest of the community. The board
> is empowered, by the ASF membership (individuals with merit) to take any
> action necessary to ensure a PMC is carrying out its duty.
>
> If a PMC is being ineffective then the board only has blunt instruments to
> work with. Their actions appear to cut deep because they have no scalpel
> with which to work. The scalpel should be in the hands of the PMC, but by
> definition if the board intervenes the PMC is failing to use the scalpel.
>
> So how do we identify appropriate action? Well I can tell you that any
> action of the board will result in more dissatisfied PMC members than
> satisfied ones. This is because, by definition, if the board are acting it
> is because the PMC is failing in its duty to build a vendor neutral and
> healthy community. The measure is whether the broader community feel that
> the board are acting in their best interests – including those who have not
> been given the privilege of merit (yes, PMC membership and committership is
> a privilege not a right).
>
> This is not to say the board are incapable of making a mistake. They are 9
> humans after all. However, I can assure you (based on painful experience)
> that getting 9 humans to agree to use a blunt instrument that will make a
> mess in the short term is extremely hard. That’s why we have a board of 9
> rather than 5 (or any other smaller number) it minimizes the chances of
> error. It’s also why the board is usually slower to move than one might
> expect.
>
> However, should the board make a mistake the correct action is to get the
> community as a whole to express their concern. Demonstrate that the
> community, as a whole, feels that the board acted inappropriately. Don’t
> waste time with long emails to board@. The people here trust in the
> process and the board. We don’t know what’s been happening inside your
> project, we don’t pass judgement. To make us care you must have your
> community speak with one voice. Demonstrate that you have consensus around
> your opinions. Then, and only then, will the membership - the people who
> vote for the board and hold them accountable – accept your argument that
> the board have acted inappropriately.
>
> Ross
>
> From: Benedict Elliott Smith [mailto:bened...@apache.org]
> Sent: Friday, November 4, 2016 7:08 PM
> To: dev@cassandra.apache.org
> Cc: Apache Board <bo...@apache.org>; Łukasz Dywicki <l...@code-house.org>;
> Chris Mattmann <mattm...@apache.org>; Kelly Sommers <
> kell.somm...@gmail.com>; Jim Jagielski <j...@jagunet.com>
> Subject: Re: DataStax role in Cassandra and the ASF
>
> Where are the board's guidelines then, or do they make it up as they go?
> Flame wars are a risk of every public forum and discussion, and doing
> everything in public is one of the tenets of the ASF.
>
> Jim Jagielski stated to me on twitter that a bare minimum of discussions
> happen in private, and did not list this as one of the exceptions, despite
> it being the context. His statement was inline with the link I provided,
> and he is a board member.  So ostensibly a board member agrees, at least in
> principle.
>
> Regardless, the issue in question is if the board was sufficiently hostile
> to DataStax for them to rationally and reasonably feel the correct course
> of action was to mitigate their business risk exposure to the ASF board. It
> seems to me that may well be the case, but we can

Re: Broader community involvement in 4.0 (WAS Re: Rough roadmap for 4.0)

2016-11-05 Thread Benedict Elliott Smith
Hi Ed,

I would like to try and clear up what I perceive to be some
misunderstandings.

Aleksey is relating that for *complex* tickets there are desperately few
people with the expertise necessary to review them.  In some cases it can
amount to several weeks' work, possibly requiring multiple people, which is
a huge investment.  EPaxos is an example where its complexity likely needs
multiple highly qualified reviewers.

Simpler tickets on the other hand languish due to poor incentives - they
aren't sexy for volunteers, and aren't important for the corporately
sponsored contributors, who also have finite resources.  Nobody *wants* to
do them.

This does contribute to an emergent lack of diversity in the pool of
contributors, but it doesn't discount Aleksey's point.  We need to find a
way forward that handles both of these concerns.

Sponsored contributors have invested time into efforts to expand the
committer pool before, though they have universally failed.  Efforts like
the "low hanging fruit squad" seem like a good idea that might payoff, with
the only risk being the cloud hanging over the project right now.  I think
constructive engagement with potential sponsors is probably the way forward.

(As an aside, the policy on test coverage was historically very poor
indeed, but is I believe much stronger today - try not to judge current
behaviours on those of the past)


On 5 November 2016 at 00:05, Edward Capriolo  wrote:

> "I’m sure users running Cassandra in production would prefer actual proper
> reviews to non-review +1s."
>
> Again, you are implying that only you can do a proper job.
>
> Let's be specific here: You and I are working on this one:
>
> https://issues.apache.org/jira/browse/CASSANDRA-10825
>
> Now, Ariel reported there was no/low code coverage. I went looking at the
> code and found a problem.
>
> If someone were to merge this: I would have more incentive to look for
> other things, then I might find more bugs and improvements. If this process
> keeps going, I would naturally get exposed to more of the code. Finally,
> maybe (I don't know, in 10 or 20 years) I could become one of these
> specialists.
>
> Let's peel this situation apart:
>
> https://issues.apache.org/jira/browse/CASSANDRA-10825
>
> "If you grep test/src and cassandra-dtest you will find that the string
> OverloadedException doesn't appear anywhere."
>
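(A minimal sketch of the coverage check being quoted, assuming it is run from the root
of a Cassandra source checkout with cassandra-dtest checked out alongside; the paths
are the ones named in the comment and may differ slightly by checkout layout:)

    # search the in-tree tests and the dtest suite for any reference to the exception;
    # no matches means neither exercises the OverloadedException path
    grep -r "OverloadedException" test/src cassandra-dtest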
> Now let me flip this situation around:
>
> "I'm sure the users running Cassandra in production would prefer proper
> coding practices like writing unit and integration tests to rubber-stamp
> merges"
>
> When the shoe is on the other foot it does not feel so nice.
>
>
>
>
>
>
>
>
>
>
> On Fri, Nov 4, 2016 at 7:08 PM, Aleksey Yeschenko 
> wrote:
>
> > Dunno. A sneaky correctness or data corruption bug. A performance
> > regression. Or something that can take a node/cluster down.
> >
> > Of course no process is bullet-proof. The purpose of review is to
> minimise
> > the odds of such a thing happening.
> >
> > I’m sure users running Cassandra in production would prefer actual proper
> > reviews to non-review +1s.
> >
> > --
> > AY
> >
> > On 4 November 2016 at 23:03:23, Edward Capriolo (edlinuxg...@gmail.com)
> > wrote:
> >
> > I feel that is really standing up on a soap box. What would be the worst
> > thing that happens here
>


Re: DataStax role in Cassandra and the ASF

2016-11-04 Thread Benedict Elliott Smith
This discussion is bundling up two issues:

1) Did DataStax have an outsized role on the project which needed to be
offset, preferably with increased participation?

2) Did the Board behave reasonably in trying to fix it?

As far as I can tell the answers are 1) Yes, 2) No

Can the board please now unequivocally answer if they followed protocol and
kept all discussions around company involvement to public mailing lists?

https://www.apache.org/dev/pmc.html#mailing-list-private

I'm certain they did not, and they cannot as a result claim to be upholding
ASF process and ideals.  Similarly to how Mark Thomas recently attempted to
misapply ASF policies, when policing user mailing list discussions.

I originally supported the ASF efforts to improve the project. I have since
lost all faith in the board.



On Saturday, 5 November 2016, Chris Mattmann  wrote:

> Thank you for sending this. I am not going to reply in depth now, but will
> do so to Kelly and
> others over the weekend, but this is *precisely* the reason that I have
> been so emphatic
> about trying to get the PMC to see the road they have already gone done
> and the ship that
> has already set sail.
>
> Those not familiar with Lucene and its vote to merge Lucene/Solr may want
> to Google the
> Apache archives around 2010 and see some of the effects of Individual
> organizations and
> vendors driving supposedly vendor neutral Apache projects. It’s not even
> conjecture at this
> point in Cassandra. The Board has acted as Greg referred to else-thread,
> and we asked Jonathan & the
> PMC to find a new chair (rotation is healthy yes, but we also need the
> chair to be the eyes
> and ears of the Board and we asked for a change there). Mark Thomas from
> the Apache Board
> also has a set of actions that he is working with the PMC having to do
> with trademarks and
> other items to move towards more independent governance.
>
> Your experience that you cite below Lukasz is precisely one I found in
> Lucene/Solr, Hadoop,
> Maven, and other projects. Sometimes the ship has been righted – for
> example in all of these
> projects they have moved towards much more independent governance,
> welcoming to contributors,
> and shared community for the project. However, in other cases (see
> IBATIS), it didn’t work out, for
> various reasons including community issues, but also misunderstandings as
> to the way that the
> ASF works. I know my own experience of being an unpaid, occasional
> contributor to some open
> source projects has put me at a disadvantage even in some ASF projects
> driven by a single vendor.
> I’ve also been paid to work on open source (at the ASF and elsewhere) and
> in doing so, been on the
> other side of the code. That’s why, in ASF projects and my own work in
> particular, I strive to
> remain neutral and to address these types of issues by being welcoming,
> lowering the bar to committership
> and the PMC, and moving “contributors” towards having a vote/shared governance of
> the project at the ASF.
>
> Thanks for sending this email and your insights are welcome below. The
> Apache Board should hear this
> too so I am CC’ing them.
>
> Cheers,
> Chris
>
>
>
>
>
> On 11/4/16, 5:03 PM, "Łukasz Dywicki" >
> wrote:
>
> Good evening,
> I feel a bit called to the table by both Kelly and Chris. The thing is,
> I don’t know either of you personally, nor have any relationship with you. I’m not
> even an ASF member. My tweet was simply a reaction to Kelly's complaints about
> the ASF punishing DataStax. Kelly's timeline also contained statements such as
> "forming a long term strategy to grow diversity around”, which reminded me of
> my attempts to collaborate on the Cassandra and Tinkerpop projects to grow such
> diversity. I collected message links and quotes and put them into a gist which
> can be read by anyone:
> https://gist.github.com/splatch/aebe4ad4d127922642bee0dc9a8b1ec1
>
> I don’t want to bring these topics back now and discuss the technical
> stuff all over again. I have in the past refused (or voted against)
> some change proposals in other Apache projects I am involved in. I was on the
> other ("bad guy") side multiple times. I simply collected the public records of
> interactions with DataStax staff that I was aware of, simply because of my personal
> involvement. It shows how some ideas - and the cassandra mailing list doesn’t get
> many of these from external contributors - get put aside with very
> little will, or even none, to pull other people's work in. This is a
> blocking point for anyone coming from outside who wants to get involved in the
> project and help it grow. If someone's changes require moves in the project
> core or its public APIs, that person will require support from project
> members to get this done. If such help is not given, no outside
> change will ever be completed and no one will invest time in doing something
> more than fixing typos or common programmer errors which we all do from
> 

Re: DataStax role in Cassandra and the ASF

2016-11-04 Thread Benedict Elliott Smith
Where are the board's guidelines then, or do they make it up as they go?
Flame wars are a risk of every public forum and discussion, and doing
everything in public is one of the tenets of the ASF.

Jim Jagielski stated to me on twitter that a bare minimum of
discussions happen in private, and did not list this as one of the
exceptions, despite it being the context. His statement was inline with the
link I provided, and he is a board member.  So ostensibly a board member
agrees, at least in principle.

Regardless, the issue in question is if the board was sufficiently hostile
to DataStax for them to rationally and reasonably feel the correct course
of action was to mitigate their business risk exposure to the ASF board. It
seems to me that may well be the case, but we cannot know for sure because
the board was doing it behind closed doors despite members of the
board suggesting this isn't how things work.

Given this inconsistency, and the fact that Mark Thomas (a board member)
apparently hadn't even read the ASF guidelines before wantonly enforcing
them, and the composure of Chris, as pointed out by Russel, I think it is
reasonable to doubt the board's credibility entirely.

So, I'm asking for clarity.  Preferably, a complete publication of the
discussions that happened in private on the topic.







On Saturday, 5 November 2016, Tom Barber <tom.bar...@meteorite.bi> wrote:

> You know you've linked to a PMC page, when the board isn't a PMC? For
> example, board member a, thinks project X isn't doing things correctly and
> their first course of action is to post notes on a public development
> mailing list? You'd have arguments and flame wars left right and centre.
>
> Having watched the discussion unfolding, whilst some discussion clearly
> went on on a private mailing list, the details pertinent to the PMC  were
> made available and I believe they were CC'd pretty regularly.
>
> I won't answer directly for the board for #2, but I suspect the answer
> would be, Cassandra has been through the incubation phase, so the PMC
> should understand how the project should be run, its not the boards job to
> fix it directly. Did the board act unreasonably? I don't think so. Did some
> heated discussions take place? Undoubtedly.
>
>
>
> On Sat, Nov 5, 2016 at 12:28 AM, Benedict Elliott Smith <
> bened...@apache.org> wrote:
>
> > This discussion is bundling up two issues:
> >
> > 1) Did DataStax have an outsized role on the project which needed to be
> > offset, preferably with increased participation?
> >
> > 2) Did the Board behave reasonably in trying to fix it?
> >
> > As far as I can tell the answers are 1) Yes, 2) No
> >
> > Can the board please now unequivocally answer if they followed protocol
> > and kept all discussions around company involvement to public mailing
> lists?
> >
> > https://www.apache.org/dev/pmc.html#mailing-list-private
> >
> > I'm certain they did not, and they cannot as a result claim to be
> > upholding ASF process and ideals.  Similarly to how Mark Thomas recently
> > attempted to misapply ASF policies, when policing user mailing
> > list discussions.
> >
> > I originally supported the ASF efforts to improve the project. I have
> > since lost all faith in the board.
> >
> >
> >
> >> On Saturday, 5 November 2016, Chris Mattmann <mattm...@apache.org> wrote:
> >
> >> Thank you for sending this. I am not going to reply in depth now, but
> >> will do so to Kelly and
> >> others over the weekend, but this is *precisely* the reason that I have
> >> been so emphatic
> >> about trying to get the PMC to see the road they have already gone done
> >> and the ship that
> >> has already set sail.
> >>
> >> Those not familiar with Lucene and its vote to merge Lucene/Solr may
> want
> >> to Google the
> >> Apache archives around 2010 and see some of the effects of Individual
> >> organizations and
> >> vendors driving supposedly vendor neutral Apache projects. It’s not even
> >> conjecture at this
> >> point in Cassandra. The Board has acted as Greg referred to else-thread,
> >> and we asked Jonathan & the
> >> PMC to find a new chair (rotation is healthy yes, but we also need the
> >> chair to be the eyes
> >> and ears of the Board and we asked for a change there). Mark Thomas from
> >> the Apache Board
> >> also has a set of actions that he is working with the PMC having to do
> >> with trademarks and
> >> other items to move towards more independent governance.
> >>
> >> Yo

Re: Review of Cassandra actions

2016-11-07 Thread Benedict Elliott Smith
Hi Mark,

Thanks, that was a calm and diplomatic email.

recognise where they might need to apologise


I will start the ball rolling here, as I have not always been generous in
my interpretations of others' actions, and have certainly contributed to
escalation.

But I wonder if you would also help get the ball rolling; your reasonable
tone gives me hope that you can.  The topic for me has been: can board
members recognise publicly where they have misstepped.  Doing so provides
assurances to the whole ASF community that the board can be trusted.

https://www.mail-archive.com/user@cassandra.apache.org/msg48692.html

In this email chain not long ago, you attempted to apply a misreading of
the ASF guidelines to non-ASF individuals.  When I pointed this out, you
went silent.  In that chain, as now, I had a righteous indignation that no
doubt inflamed the topic, and could have resolved the issue with more
diplomacy.  I'm also sure you had excellent intentions.

Nevertheless, you did misstep as a board member by quite badly misapplying
the guidelines.  With no public recognition of this, I was left with an
impression of unaccountability; I don't know how others responded.

I think it would be fantastic if board members, as people in positions of
authority, lead by example and began recognising where their public
behaviour has missed the mark.  Perhaps that would promote those in less
lofty positions to begin doing the same, and greater trust all round.




On 6 November 2016 at 21:42, Mark Thomas  wrote:

> For the sake of clarity I am a member of the ASF board but I am not
> speaking on behalf of the board in this email.
>
> On 06/11/2016 01:25, Jeff Jirsa wrote:
> > I hope the other 7 members of the board take note of this response,
> > and other similar reactions on dev@ today.
>
> I can't speak for all seven other board members but I can say that I am
> monitoring this thread and the related threads (although I haven't
> looked at Twitter where a lot of this seems to have originated). It is
> apparent to me that a number of the other directors are monitoring these
> threads too.
>
> > When Datastax violated trademark, they acknowledged it and worked to
> > correct it. To their credit, they tried to do the right thing.
> > When the PMC failed to enforce problems, we acknowledged it and worked
> > to correct it. We aren't perfect, but we're trying.
>
> I think you are being a little hard on the PMC there. There was scope
> for both parties to do better in a number of areas.
>
> I do agree that things in the PMC have improved and are heading in the
> right direction (with some more work still to do), as I hope I made
> clear in the summary section of the review e-mail I wrote (privately) to
> the PMC a few weeks ago.
>
> > When a few members the board openly violate the code of conduct, being
> > condescending and disrespectful under the auspices of "enforcing the
> > rules" and "protecting the community", they're breaking the rules,
> > damaging the community, and nobody seems willing to acknowledge it or
> > work to correct it. It's not isolated, I'll link examples if it's
> > useful.
>
> I take it you mean "nobody on the board seems willing...". Again, I
> can't speak for the other board members but let me try and explain my
> own thinking.
>
> A number of posts from a variety of authors on this topic in recent days
> have fallen short of the standard expected on an Apache list. Trying to
> correct that without causing the situation to escalate is hard. The last
> thing I want to do is add fuel to the fire. I've started to draft a
> couple of emails at various points over the weekend only to find by the
> time I'm happy(ish) with the draft, the thread has moved on and I need
> to start again.
>
> Alongside this I had hoped that things would have slowed down enough
> over the weekend to give everyone time to reflect, recognise where they
> might need to apologise and aim to start this coming week on a more
> positive footing. There have been signs of this which I take to be
> encouraging. Moving forward I'd encourage everyone to pause and review
> what they have just written with the Code of Conduct in mind before
> pressing send.
>
> > In a time when we're all trying to do the right thing to protect the
> > project and the community, it's unfortunate that high ranking, long
> > time members within the ASF actively work to undermine trust and
> > community while flaunting the code of conduct, which requires
> > friendliness, empathy, and professionalism, and the rest of the board
> > is silent on the matter.
>
> Your calm responses and efforts to inform the community are appreciated.
> It is not an easy task and kudos to you for taking it on.
>
> As as been said several times in recent days, board members are rarely
> speaking on behalf of the board (i.e. representing the agreed position
> of the board). It is unusual enough that when we do we'll make it
> explicit. One of the reasons for 

Re: Use of posix_fadvise

2016-10-18 Thread Benedict Elliott Smith
... and continuing in the fashion of behaviours one might like to disabuse
people of, no code link is provided.



On 18 October 2016 at 16:28, Michael Kjellman 
wrote:

> We use posix_fadvise in a bunch of places, and in stereotypical Cassandra
> fashion no comments were provided.
>
> There is a check the OS is Linux (okay, a start) but it turns out the
> behavior of providing a length of 0 to posix_fadvise changed in some 2.6
> kernels. We don't check the kernel version -- or even note it.
>
> What is the *expected* outcome of our use of posix_fadvise -- not what
> does it do or not do today -- but what problem was it added to solve and
> what's the expected behavior regardless of kernel versions.
>
> best,
> kjellman
>
> Sent from my iPhone


Re: Use of posix_fadvise

2016-10-18 Thread Benedict Elliott Smith
This is what JIRA is for.  It seems to date back to CASSANDRA-1470, where
the default became immediately evicting newly compacted files.

This results in cold reads for *hot* data after compaction, so
CASSANDRA-6916 permitted evicting the *old* data instead, while
guaranteeing >= the same amount of eviction.

Whether or not the original issue of cold compaction data was a pain point,
I cannot attest, but I was assured (by whom, I do not recall) that it was.
In its present form it is at least not harmful.  It was (and is) not a
no-op:

http://riptano.github.io/cassandra_performance/graph_v3/graph.html?stats=stats.6916v3-preempive-open-compact.mixed.2.json=op_rate=mixed=1_aggregates=true=0=545.6=0=114638.7
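
For anyone wanting to experiment with the kernel caveat raised below: the underlying
call is just posix_fadvise(fd, offset, len, POSIX_FADV_DONTNEED). Here is a standalone
sketch, using JNA directly rather than Cassandra's CLibrary wrapper (the class and
method names are my own, purely illustrative), that sidesteps the pre-2.6.6 len == 0
ambiguity by always passing an explicit length:

import com.sun.jna.Native;

// Illustrative only: a direct JNA binding to libc, not Cassandra's CLibrary.
public final class FadviseSketch
{
    static
    {
        Native.register("c"); // bind the native method below to libc
    }

    private static final int POSIX_FADV_DONTNEED = 4; // value on Linux

    // int posix_fadvise(int fd, off_t offset, off_t len, int advice)
    private static native int posix_fadvise(int fd, long offset, long len, int advice);

    // Ask the kernel to drop [offset, offset + length) of an open file from the
    // page cache.  Passing an explicit length avoids the pre-2.6.6 behaviour
    // where len == 0 meant "zero bytes" rather than "to the end of the file".
    public static int dropFromPageCache(int fd, long offset, long length)
    {
        if (fd < 0 || length <= 0)
            return 0; // nothing to do, or no usable file descriptor
        return posix_fadvise(fd, offset, length, POSIX_FADV_DONTNEED);
    }
}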




On 18 October 2016 at 17:42, Michael Kjellman 
wrote:

> Yeah, it has been there for years -- that being said most of the community
> is just catching up to 2.1 and 3.0 now where the usage did appear to change
> over 2.0-- and I'm more trying to figure out what the intent was in the
> various usages all over the codebase and make sure it's actually doing
> that. Maybe even add some comments about that intent. :)
>
> In 2.1 I saw that we were doing this to get the file descriptor in some
> cases (which obviously will return the wrong file descriptor so most likely
> would have made this even more of a potential no-op than it already was?):
>
> public static int getfd(String path)
> {
> RandomAccessFile file = null;
> try
> {
> file = new RandomAccessFile(path, "r");
> return getfd(file.getFD());
> }
> catch (Throwable t)
> {
> JVMStabilityInspector.inspectThrowable(t);
> // ignore
> return -1;
> }
> finally
> {
> try
> {
> if (file != null)
> file.close();
> }
> catch (Throwable t)
> {
> // ignore
> }
> }
> }
>
>
> On Oct 18, 2016, at 9:34 AM, Jake Luciani wrote:
>
> Although given we have an in process page cache[1] now this may not be
> needed anymore?
> This is only for the data file though.  I think its been years? since we
> showed it helped so perhaps someone should show if this is still
> working/helping in the real world.
>
> [1] https://issues.apache.org/jira/browse/CASSANDRA-5863
>
>
> On Tue, Oct 18, 2016 at 11:59 AM, Michael Kjellman <
> mkjell...@internalcircle.com> wrote:
>
> Specifically regarding the behavior in different kernels, from `man
> posix_fadvise`: "In kernels before 2.6.6, if len was specified as 0, then
> this was interpreted literally as "zero bytes", rather than as meaning "all
> bytes through to the end of the file"."
>
> On Oct 18, 2016, at 8:57 AM, Michael Kjellman <mkjell...@internalcircle.com> wrote:
>
> Right, so in SSTableReader#GlobalTidy$tidy it does:
> // don't ideally want to dropPageCache for the file until all instances
> have been released
> CLibrary.trySkipCache(desc.filenameFor(Component.DATA), 0, 0);
> CLibrary.trySkipCache(desc.filenameFor(Component.PRIMARY_INDEX), 0, 0);
>
> It seems to me every time the reference is released on a new sstable we
> would immediately tidy() it and then call posix_fadvise with
> POSIX_FADV_DONTNEED with an offset of 0 and a length of 0 (which I'm
> thinking is doing so in respect to the API behavior in modern Linux kernel
> builds?). Am I reading things correctly here? Sorta hard as there are many
> different code paths the reference could have tidy() called.
>
> Why would we want to drop the segment we just write from the page cache --
> wouldn't that most likely be the most hot data, and even if it turned out
> not to be wouldn't it be better in this case to have kernel be smart at
> what it's best at?
>
> best,
> kjellman
>
> On Oct 18, 2016, at 8:50 AM, Jake Luciani wrote:
>
> The main point is to avoid keeping things in the page cache that are no
> longer needed like compacted data that has been early opened elsewhere.
>
> On Oct 18, 2016 11:29 AM, "Michael Kjellman"
> wrote:
>
> We use posix_fadvise in a bunch of places, and in stereotypical Cassandra
> fashion no comments were provided.
>
> There is a check the OS is Linux (okay, a start) but it turns out the
> behavior of providing a length of 0 to posix_fadvise changed in some 2.6
> kernels. We don't check the kernel version -- or even note it.
>
> What is the *expected* outcome of our use of posix_fadvise -- not what
> does it do or not do today -- but what problem was it added to solve and
> what's the expected behavior regardless of kernel versions.
>
> best,
> kjellman
>
> Sent from my iPhone
>
>
>
>
>

Re: Use of posix_fadvise

2016-10-18 Thread Benedict Elliott Smith
I'm not certain this is the best way to go about encouraging people to help
you, or generally encourage participation in the project.  You have seemed
to lash out at the project (and in this case me specifically) in a fairly
antagonistic manner a multitude of times in just a couple of hours.

Your original question, on zero, predates anything I know about.  JIRA is
your best bet, and provides historical context that is never going to live
in comments.  I did not imply that the comments were adequate, only that *this
is where you should probably look to answer your question.  *Comment policy
and norms have changed a lot throughout Cassandra's history, and you're
asking about a time that predates the current level of maturity, but JIRA
has always been (AFAIK) the main source of historical context.  I attempted
to provide some links into this to save you from the "billion" (handful) of
tickets.

I don't have time for another flamewar, so I will leave out trying to
assist you in future.




On 18 October 2016 at 18:28, Michael Kjellman <mkjell...@internalcircle.com>
wrote:

> Sorry, No. Always document your assumptions. I shouldn't need to git blame
> a thousand commits and read thru a billion tickets to maybe understand why
> something was done. Clearly thru the conversations on this topic I've had
> on IRC and the responses so far on this email thread it's not/still not
> obvious.
>
> best,
> kjellman
>
> On Oct 18, 2016, at 10:07 AM, Benedict Elliott Smith <bened...@apache.org
> <mailto:bened...@apache.org>> wrote:
>
> This is what JIRA is for.
>
>


Re: Moderation

2016-11-04 Thread Benedict Elliott Smith
Wow, that was quite the aggressive email. The thing is, it very much looks
like the only reason you care about this delay is because Kellabyte is
making the ASF board look bad on twitter.  If it weren't the case, it seems
unlikely such a "slow" 12hr response would receive board notice, let alone
ire.

I think the board forgets that all of these functions are fulfilled by
volunteers (whoever the moderators are - I genuinely haven't a clue).
Expecting volunteers to jump to it, because the board is looking bad, seems
like a pretty clear *abuse* of process.

On 4 November 2016 at 16:44, Chris Mattmann  wrote:

> So seriously, we're going to send now 4 emails talking about what a user
> of Apache Cassandra and possible community member could have done right or
> better or sooner, or that there is no time limit to moderating shit when it
> could have been as simple as literally sending a confirmation email to
> moderate it through? This is the definition of process over community. And
> it's the definition (wrongly so) of why people think it's "Apache" that
> induces the processes that make shit hard, and not the community itself.
> Seriously this is a joke. So what if she didn't do it right the first time.
> You think potentially moderating her mail through and then sending a kind
> email suggesting she look at the instructions for how to subscribe, which
> oh someone may not have found easy to do or simply not understood that
> simply sending an email to the list wouldn't have made it go through the
> first time? Is it that hard to figure out? Really?
>
> This is the definition of making things hard and not making them easy or
> friendly. And this is also exactly what enables people to sound off on
> Twitter about a project, and loses the conversation that could have been
> had on Apache mailing lists. Kelly has been tweeting for days. I saw her
> tweets retweeted by someone in my feed, and yesterday asked her kindly to
> bring her conversation to the list. 12 hours later it's still in
> moderation, and we are arguing whether to f'ing moderate it through. Wow.
> Great job.
>
> On 2016-11-04 09:37 (-0700), Edward Capriolo 
> wrote:
> > Is the message in moderation because
> > 1) it was sent by someone not registered with the list
> > 2) some other reason (anti-spam etc)
> >
> > If it is is case 1: Isn't the correct process to inform and encourage
> > someone list properly?
> > If it is case 2: Is there an expected ETA for list moderation events?
> > (probably not)
> >
> > I see twitter mentioned. We know that sometimes news and sentiment in
> > social media move fast and cause reactions on incorrect/unvetted
> > information.
> >
> >
> > On Fri, Nov 4, 2016 at 11:58 AM, Mattmann, Chris A (3010) <
> > chris.a.mattm...@jpl.nasa.gov> wrote:
> >
> > > Hmm. Not excessive but you have a situation where someone is tweeting
> > > thinking her message didn't go through and conversation is happening
> there
> > > when that same conversation could be had on list. If you are ok with
> that
> > > continuing to happen then great but I am not. Can someone please
> moderate
> > > the message through?
> > >
> > > Sent from my iPhone
> > >
> > > > On Nov 4, 2016, at 8:54 AM, Mark Thomas  wrote:
> > > >
> > > >> On 04/11/2016 15:47, Chris Mattmann wrote:
> > > >> Hi Folks,
> > > >>
> > > >> Kelly Sommers sent a message to dev@cassandra and I'm trying to
> figure
> > > out if it's in moderation.
> > > >>
> > > >> Can the moderators speak up?
> > > >
> > > > Using my infra karma, I checked the mail server. That message is
> waiting
> > > > for moderator approval. It has been in moderation for 12 hours which
> > > > doesn't strike me as at all excessive.
> > > >
> > > > Mark
> > > >
> > >
> >
>


Re: Moderation

2016-11-04 Thread Benedict Elliott Smith
I'm not a PMC, and given everything that's happened recently, I no longer
have any desire to be.

I have nothing against improving the moderator situation.  What I have, and
continue to have, is something against the way you and the board go about
things.

On 4 November 2016 at 16:57, Chris Mattmann <mattm...@apache.org> wrote:

>
>
> On 2016-11-04 09:51 (-0700), Benedict Elliott Smith <bened...@apache.org>
> wrote:
> > Wow, that was quite the aggressive email. The thing is, it very much
> looks
> > like the only reason you care about this delay is because Kellabyte is
> > making the ASF board look bad on twitter.  If it weren't the case, it
> seems
> > unlikely such a "slow" 12hr response would receive board notice, let
> alone
> > ire.
> >
> > I think the board forgets that all of these functions are fulfilled by
> > volunteers (whoever the moderators are - I genuinely haven't a clue).
> > Expecting volunteers to jump to it, because the board is looking bad,
> seems
> > like a pretty clear *abuse* of process.
> >
>
>  She is welcome to denigrate the Apache Board. In fact, if you go back and
> read the Tweets she was originally doing so to DataStax. That said, the
> whole premise is that this is a conversation happening on Twitter where
> potentially knowledge could be gained about *Apache* Cassandra. You know,
> the project here at the ASF? And not somewhere else? Yet again, here we are
> at the 6th email, and the 2 second task to moderate a message through that
> could enable a conversation to be had on the *Apache* lists rather than
> Twitter still remains not being had here.
>
> I have been subscribed to dev@cassandra for months. This is not a high
> volume list. AT ALL. Yet you act like it's volunteer time that's preventing
> moderating a message through in 12 hours. Instead of asking the real
> question - are there enough moderators for the list in different timezones
> that can appropriately ensure that conversation happens on the list? Is
> that your goal? Are you on the Apache Cassandra PMC? Do you think it's
> healthy to send emails trying to talk shit instead of simply moderating
> messages through that could ground the conversation here at the ASF?
>
> Clearly per your snark and email you are pleased with Kelly "making the
> board look bad" [sic] on Twitter. Why not increase the visibility of making
> the board look bad and do so here on the official list for the project? Or
> is Twitter the official list now? Go ahead, I'll wait.
>
>
>
>
> > On 4 November 2016 at 16:44, Chris Mattmann <mattm...@apache.org> wrote:
> >
> > > So seriously, we're going to send now 4 emails talking about what a
> user
> > > of Apache Cassandra and possible community member could have done
> right or
> > > better or sooner, or that there is no time limit to moderating shit
> when it
> > > could have been as simple as literally sending a confirmation email to
> > > moderate it through? This is the definition of process over community.
> And
> > > it's the definition (wrongly so) of why people think it's "Apache" that
> > > induces the processes that make shit hard, and not the community
> itself.
> > > Seriously this is a joke. So what if she didn't do it right the first
> time.
> > > You think potentially moderating her mail through and then sending a
> kind
> > > email suggesting she look at the instructions for how to subscribe,
> which
> > > oh someone may not have found easy to do or simply not understood that
> > > simply sending an email to the list wouldn't have made it go through
> the
> > > first time? Is it that hard to figure out? Really?
> > >
> > > This is the definition of making things hard and not making them easy
> or
> > > friendly. And this is also exactly what enables people to sound off on
> > > Twitter about a project, and loses the conversation that could have
> been
> > > had on Apache mailing lists. Kelly has been tweeting for days. I saw
> her
> > > tweets retweeted by someone in my feed, and yesterday asked her kindly
> to
> > > bring her conversation to the list. 12 hours later it's still in
> > > moderation, and we are arguing whether to f'ing moderate it through.
> Wow.
> > > Great job.
> > >
> > > On 2016-11-04 09:37 (-0700), Edward Capriolo <edlinuxg...@gmail.com>
> > > wrote:
> > > > Is the message in moderation because
> > > > 1) it was sent by someone not registered with the list
> > > > 2) some other reason (anti-spam etc)
> > > >
>

Re: Proposal to retroactively mark materialized views experimental

2017-10-03 Thread Benedict Elliott Smith
This link is a helpful segue to another problem with MVs and defaults - the 
default behavioural unsafety we opt for by not writing through the batch log, 
opening far more windows for data inconsistency than the algorithm otherwise 
permits.  Without a way to detect or repair these inconsistencies this seems 
cavalier as a default, and it's even more pressing to change in my opinion, 
however we resolve the experimental status (I am in favour of marking them 
experimental also, ftr)

As the originator of the broad-strokes algorithm, I'd more generally like to 
offer my 2c that we do not sufficiently understand the algorithm's qualities to 
recommend it in production, or perhaps at all.  We had agreed over IRC that we 
would model/simulate the algorithm under a multiplicity of cluster scenarios 
(by which I mean something more complete than dtests) before we released, but 
this unfortunately never materialised.  

The documentation at present also doesn't highlight the problems we *know* can 
occur, such as with availability-loss, and the fuzzy-application of consistency 
levels with MVs.

Certainly it may be said that it was harder for application maintainers to achieve 
this functionality themselves, but I do think we have a duty of care to fully 
understand and explain the new tools we provide, as it may be that we offer 
fewer guarantees than many users might be able to readily achieve, and they 
don't know this.  Even for those users we can offer better guarantees, it is in 
my opinion fundamentally a problem to offer a tool we do not fully understand 
or fully explain/caveat the behaviour of.


On 3 Oct 2017, at 15:02, Jake Luciani  wrote:

>> 
>> The remaining issues are:
>> 
>> * There's no way to determine if a view is out of sync with the base table.
>> * If you do determine that a view is out of sync, the only way to fix it
>> is to drop and rebuild the view.
>> * There are liveness issues with updates being reflected in the view.
>> 
> 
> I just want to mention that manual de-normalization has all the same issues
> as the list of above.  If you write to multiple tables with batch logs when
> do you know the data is consistent?
> In fact, manual de-normalization is worse because you can't manually handle
> updates to existing data due to the lack of synchronization on read before
> write.
> 
> I think a lot of you have lost sight on what MV was intended for, as a way
> to keep developers from manually maintaining a consistent view of data
> across tables.
> There is still the fundamental problem of managing multiple views of data
> even if you remove the MV feature, you just make it someone else's problem.
> 
> I'll re-post this blog from back when MVs first came out to hopefully clear
> questions up on the goals of MV.
> 
> https://www.datastax.com/dev/blog/understanding-materialized-views
> 
> -Jake
> 
> 
> On Tue, Oct 3, 2017 at 2:50 PM, Aleksey Yeshchenko 
> wrote:
> 
>> Indeed. Paulo and Zhao did a lot of good work to make the situation less
>> bad. You did some as well. Even I retouched small parts of it - metadata
>> related. I’m sorry if I came off as disrespectful - I didn’t mean to. I’ve
>> seen and I appreciate every commit that went into it.
>> 
>> It is however my opinion that we started at a very low point, for a
>> variety of reasons, and climbing out of that initial poor state, to the
>> level that power users start having trust in MVs and overcome the initial
>> deservedly poor impression, will probably take even more work. And when/if
>> we get there, maybe we won’t need the switch anymore.
>> 
>> —
>> AY
>> 
>> On 3 October 2017 at 17:00:31, Sylvain Lebresne (sylv...@datastax.com)
>> wrote:
>> 
>> You're giving little credit to the hard work that people have put into
>> getting MV in a usable state.
>> 
> 
> 
> 
> -- 
> http://twitter.com/tjake


-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Re: Proposal to retroactively mark materialized views experimental

2017-10-03 Thread Benedict Elliott Smith
So, I'm of the opinion there's a difference between users misusing a well 
understood feature whose shortcomings are widely discussed in the community, 
and providing a feature we don't fully understand, have not fully documented 
the caveats of, let alone discovered all the problems with nor had that 
knowledge percolate fully into the wider community.

I also think there's a huge difference between users shooting themselves in the 
foot, and us shooting them in the foot.  

There's a degree of trust - undeserved - that goes with being a database.  
People assume you're smarter than them, and that it Just Works.  Given this, 
and that squandering this trust as a bad thing, I personally believe it is 
better to offer the feature as experimental until we iron out all of the 
problems, fully understand it, and have a wider community knowledge base around 
it.

We can still encourage users that can tolerate problems to use it, but we won't 
be giving any false assurances to those that don't.  Doesn't that seem like a 
win-win?



> On 3 Oct 2017, at 21:07, Jeremiah D Jordan <jeremiah.jor...@gmail.com> wrote:
> 
> So for some perspective here, how do users who do not get the guarantees of 
> MV’s implement this on their own?  They used logged batches.
> 
> Pseudo CQL here, but you should get the picture:
> 
> If they don’t ever update data, they do it like so, and it is pretty safe:
> BEGIN BATCH
> INSERT tablea blah
> INSERT tableb blahview
> END BATCH
> 
> If they do update data, they likely do it like so, and get it wrong in the 
> face of concurrency:
> SELECT * from tablea WHERE blah;
> 
> BEGIN BATCH
> INSERT tablea blah
> INSERT tableb blahview
> DELETE tableb oldblahview
> END BATCH
> 
> A sophisticated user that understands the concurrency issues may well try to 
> implement it like so:
> 
> SELECT key, col1, col2 FROM tablea WHERE key=blah;
> 
> BEGIN BATCH
> UPDATE tablea col1=new1, col2=new2 WHERE key=blah IF col1=old1 and col2=old2
> UPDATE tableb viewc1=new2, viewc2=blah WHERE key=new1
> DELETE tableb WHERE key=old1
> END BATCH
> 
> And it wouldn’t work because you can only use LWT in a BATCH if all updates 
> have the same partition key value, and the whole point of a view most of the 
> time is that it doesn't (and there are other issues with this, like most 
> likely needing to use uuid’s or something else to distinguish between 
> concurrent updates, that are not realized until it is too late).
> 
> A user who does not dig in and understand how MV’s work, most likely also 
> does not dig in to understand the trade offs and draw backs of logged batches 
> to multiple tables across different partition keys.  Or even necessarily of 
> read before writes, and concurrent updates and the races inherent in them.  I 
> would guess that using MV’s, even as they are today is *safer* for these 
> users than rolling their own.  I have seen these patterns implemented by 
> people many times, including the “broken in the face of concurrency” version. 
>  So lets please not try to argue that a casual user that does not dig in to 
> the specifics of feature A is going dig in and understand the specifics of 
> any other features.  So yes, I would prefer my bank to use MV’s as they are 
> today over rolling their own, and getting it even more wrong.
> 
> Now, even given all that, if we want to warn users of the pit falls of using 
> MV’s, then lets do that.  But lets keep some perspective on how things 
> actually get used.
> 
> -Jeremiah
> 
>> On Oct 3, 2017, at 8:12 PM, Benedict Elliott Smith <_...@belliottsmith.com> 
>> wrote:
>> 
>> While many users may apparently be using MVs successfully, the problem is 
>> how few (if any) know what guarantees they are getting.  Since we aren’t 
>> even absolutely certain ourselves, it cannot be many.  Most of the 
>> shortcomings we are aware of are complicated, concern failure scenarios and 
>> aren’t fully explained; i.e. if you’re lucky they’ll never be a problem, but 
>> some users must surely be bitten, and they won’t have had fair warning.  The 
>> same goes for as-yet undiscovered edge cases.
>> 
>> It is my humble opinion that averting problems like this for just a handful 
>> of users, that cannot readily tolerate corruption, offsets any inconvenience 
>> we might cause to those who can.
>> 
>> For the record, while it’s true that detecting inconsistencies is as much of 
>> a problem for user-rolled solutions, it’s worth remembering that the 
>> inconsistencies themselves are not equally likely:
>> 
>> In cases where C* is not the database of record, it is quite easy to provide 
>> very good consistency guarantees when rolling your own
>> Conv

Re: Proposal to retroactively mark materialized views experimental

2017-10-03 Thread Benedict Elliott Smith
While many users may apparently be using MVs successfully, the problem is how 
few (if any) know what guarantees they are getting.  Since we aren’t even 
absolutely certain ourselves, it cannot be many.  Most of the shortcomings we 
are aware of are complicated, concern failure scenarios and aren’t fully 
explained; i.e. if you’re lucky they’ll never be a problem, but some users must 
surely be bitten, and they won’t have had fair warning.  The same goes for 
as-yet undiscovered edge cases.

It is my humble opinion that averting problems like this for just a handful of 
users, that cannot readily tolerate corruption, offsets any inconvenience we 
might cause to those who can.

For the record, while it’s true that detecting inconsistencies is as much of a 
problem for user-rolled solutions, it’s worth remembering that the 
inconsistencies themselves are not equally likely:

In cases where C* is not the database of record, it is quite easy to provide 
very good consistency guarantees when rolling your own
Conversely, a global-CAS with synchronous QUORUM updates that are retried until 
success, while much slower, also doesn’t easily suffer these consistency 
problems, and is the naive approach a user might take if C* were the database 
of record

Given our approach isn’t uniformly superior, I think we should be very cautious 
about how it is made available until we’re very confident in it, and we and the 
community fully understand it.


> On 3 Oct 2017, at 18:51, kurt greaves  wrote:
> 
> Lots of users are already using MV's, believe it or not in some cases quite
> effectively and also on older versions which were still exposed to a lot of
> the bugs that cause inconsistencies. 3.11.1 has come a long way since then
> and I think with a bit more documentation around the current issues marking
> MV's as experimental is unnecessary and likely annoying for current users.
> On that note we've already had complaints about changing defaults and
> behaviours willy nilly across majors and minors, I can't see this helping
> our cause. Sure, you can make it "seamless" from an upgrade perspective,
> but that doesn't account for every single way operators do things. I'm sure
> someone will express surprise when they run up a new cluster or datacenter
> for testing with default config and find out that they have to enable MV's.
> Meanwhile they've been using them the whole time and haven't had any major
> issues because they didn't touch the edge cases.
> 
> I'd like to point out that introducing "experimental" features sets a
> precedent for future releases, and will likely result in using the
> "experimental" tag to push out features that are not ready (again). In fact
> we already routinely say >=3 isn't production ready yet, so why don't we
> just mark 3+ as "experimental" as well? I don't think experimental is the
> right approach for a database. The better solution, as I said, is more
> verification and testing during the release process (by users!). A lot of
> other projects take this approach, and it certainly makes sense. It could
> also be coupled with beta releases, so people can start getting
> verification of their new features at an earlier date. Granted this is
> similar to experimental features, but applied to the whole release rather
> than just individual features.
> 
> * There's no way to determine if a view is out of sync with the base table.
>> 
> As already pointed out by Jake, this is still true when you don't use
> MV's. We should document this. I think it's entirely fair to say that
> users *should
> not *expect this to be done for them. There is also no way for a user to
> determine they have inconsistencies short of their own verification. And
> also a lot of the synchronisation problems have been resolved, undoubtedly
> there are more unknowns out there but what MV's have is still better than
> managing your own.
> 
>> * If you do determine that a view is out of sync, the only way to fix it
>> is to drop and rebuild the view.
>> 
> This is undoubtedly a problem, but also no worse than managing your own
> views. Also at least there is still a way to fix your view. It certainly
> shouldn't be as common in 3.11.1/3.0.15, and we have enough insight now to
> be able to tell when out of sync will actually occur, so we can document
> those cases.
> 
>> * There are liveness issues with updates being reflected in the view.
> 
> What specific issues are you referring to here? The only one I'm aware of
> is deletion of unselected columns in the view affecting out of order
> updates. If we deem this a major problem we can document it or at least put
> a restriction in place until it's fixed in CASSANDRA-13826
> 
> ​
> 
> In this case, 'out of sync' means 'you lost data', since the current design
>> + repair should keep things eventually consistent right?
> 
> I'd like Zhao or Paulo to confirm here but I believe the only way you 

Re: duration based config settings

2017-12-04 Thread Benedict Elliott Smith
I'm strongly in favour of it; it's always bugged me, and I hadn't realised
it might be contentious to implement.

I'd be in favour of never retiring the _ms format though - it's almost
free, there's no backward compatibility problems, and it's fairly intuitive
so long as we're consistent about it.

The only sticking point I can personally see is that there might be a
desire to roll this up into a standardisation effort, as we have some
duration properties (e.g. key_cache_save_period) which do not have a unit
suffix.  If we want to standardise on blah_blah_ms, and blah_blah we might
want to moonlight those, migrating them to some new blah_blahish_ms and
blah_blahish.
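
To make that concrete, here's a minimal sketch (the class name and behaviour are
assumptions of mine, not an existing part of the codebase) of parsing suffixed
durations like "3h" or "30m" down to milliseconds, while still accepting a bare
number for the legacy _ms style:

import java.util.concurrent.TimeUnit;

// Hypothetical helper, not an existing class: parses "10s", "30m", "3h" or "2d"
// into milliseconds, and treats a bare number as milliseconds so legacy *_ms
// values keep working.  Only single-character suffixes are handled here.
public final class DurationSpec
{
    public static long toMillis(String value)
    {
        String v = value.trim().toLowerCase();
        if (v.matches("\\d+"))                       // bare number: legacy millisecond value
            return Long.parseLong(v);

        long amount = Long.parseLong(v.substring(0, v.length() - 1));
        switch (v.charAt(v.length() - 1))
        {
            case 's': return TimeUnit.SECONDS.toMillis(amount);
            case 'm': return TimeUnit.MINUTES.toMillis(amount);
            case 'h': return TimeUnit.HOURS.toMillis(amount);
            case 'd': return TimeUnit.DAYS.toMillis(amount);
            default:  throw new IllegalArgumentException("Unrecognised duration: " + value);
        }
    }
}

A real version would also want an explicit "ms" suffix and validation of negative
values; the parsing itself is trivial, so the interesting work is the migration plan
discussed below.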

On 5 December 2017 at 00:24, Jon Haddad  wrote:

> A ways back I opened CASSANDRA-13976 out of sheer annoyance, to change
> the hint time to be in minutes instead of ms.  Millisecond based resolution
> is a bit absurd for things like hints.  I figured minutes would be better,
> but after some back and forth realized durations (3h, 30m, etc) would be a
> lot easier to work with, and would probably be appropriate across the board.
>
> I’ve dealt with quite a few clusters in the last year, and I’ve seen a
> handful of fat fingered config files, or non-standard times that make me
> bust out a calculator to be sure I’ve got things sorted out right, hence
> the original issue.
>
> Jeff Jirsa suggested migrating to duration types would result in migration
> pain, and I’m not disagreeing with him.  I think we if were to move to
> duration types, we’d want something like the following:
>
> 1. add a blah_blah for every blah_blah_ms setting which accepts a duration
> 2. convert every setting to use blah_blah
> 3. if blah_blah_ms is present, use that for blah_blah and set the duration
> to ms
> 4. internally everything converts to ms
> 5. make all node tool commands use durations
> 6. for every setting that’s switched to blah_blah, leave a note that says the
> setting it’s replacing
> 7. put a warning when people use blah_blah_ms and suggest the conversion
> to the new config field
> 8. *sometime* in the future remove _ms.  Maybe as far as a year or two
> down the line.
>
> This seems to me like a significant change and I wanted to get some more
> opinions on the topic before pushing forward.  Thoughts?
>
> Jon
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
>
>


Re: Getting partition min/max timestamp

2018-01-14 Thread Benedict Elliott Smith
We already store the minimum timestamp in the EncodingStats of each
partition, to support more efficient encoding of atom timestamps.  This
just isn't exposed beyond UnfilteredRowIterator, though it probably could
be.

Storing the max alongside would still require justification, though its
cost would actually be fairly nominal (probably only a few bytes; it
depends on how far apart min/max are).

I'm not sure (IMO) that even a fairly nominal cost could be justified
unless there were widespread benefit though, which I'm not sure this would
provide.  Maintaining a patched variant of your own that stores this
probably wouldn't be too hard, though.

In the meantime, exposing and utilising the minimum timestamp from
EncodingStats is probably a good place to start to explore the viability of
the approach.
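
For what it's worth, the kind of per-partition tracker described in the original
mail is small; a purely illustrative sketch (hypothetical names, not part of
Cassandra's API) of maintaining the bounds safely under concurrent writes:

import java.util.concurrent.atomic.AtomicLong;

// Hypothetical per-partition bookkeeping, for illustration only.
public final class TimestampBounds
{
    private final AtomicLong min = new AtomicLong(Long.MAX_VALUE);
    private final AtomicLong max = new AtomicLong(Long.MIN_VALUE);

    // Called once for every mutation applied to the partition.
    public void update(long writeTimestamp)
    {
        min.accumulateAndGet(writeTimestamp, Math::min);
        max.accumulateAndGet(writeTimestamp, Math::max);
    }

    public long minTimestamp() { return min.get(); }
    public long maxTimestamp() { return max.get(); }
}

As noted elsewhere in the thread, this says nothing about deletions or TTLs, which
is where the approach really needs justification.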

On 14 January 2018 at 15:34, Jeremiah Jordan  wrote:

> Don’t forget about deleted and missing data. The bane of all on replica
> aggregation optimization’s.
>
> > On Jan 14, 2018, at 12:07 AM, Jeff Jirsa  wrote:
> >
> >
> > You’re right it’s not stored in metadata now. Adding this to metadata
> isn’t hard, it’s just hard to do it right where it’s useful to people with
> other data models (besides yours) so it can make it upstream (if that’s
> your goal). In particular the worst possible case is a table with no
> clustering key and a single non-partition key column. In that case storing
> these extra two long time stamps may be 2-3x more storage than without,
> which would be a huge regression, so you’d have to have a way to turn that
> feature off.
> >
> >
> > Worth mentioning that there are ways to do this without altering
> Cassandra -  consider using static columns that represent the min timestamp
> and max timestamp. Create them both as ints or longs and write them on all
> inserts/updates (as part of a batch, if needed). The only thing you’ll have
> to do is find a way for “min timestamp” to work - you can set the min time
> stamp column with an explicit  “using timestamp” timestamp = 2^31-NOW, so
> that future writes won’t overwrite those values. That gives you a first
> write win behavior for that column, which gives you an effective min
> timestamp for the partition as a whole.
> >
> > --
> > Jeff Jirsa
> >
> >
> >> On Jan 13, 2018, at 4:58 AM, Arthur Kushka  wrote:
> >>
> >> Hi folks,
> >>
> >> Currently, I working on custom CQL operator that should return the max
> >> timestamp for some partition.
> >>
> >> I don't think that scanning of partition for that kind of data is a nice
> >> idea. Instead of it, I thinking about adding a metadata to the
> partition. I
> >> want to store minTimestamp and maxTimestamp for every partition as it
> >> already done in Memtable`s. That timestamps will be updated on each
> >> mutation operation, that is quite cheap in comparison to full scan.
> >>
> >> I quite new to Cassandra codebase and want to get some critics and
> ideas,
> >> maybe that kind of data already stored somewhere or you have better
> ideas.
> >> Is my assumption right?
> >>
> >> Best,
> >> Artur
> >
> > -
> > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> > For additional commands, e-mail: dev-h...@cassandra.apache.org
> >
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
>
>


Re: Getting partition min/max timestamp

2018-01-14 Thread Benedict Elliott Smith
(Obviously, not to detract from the points that Jon and Jeremiah make, i.e.
that if TTLs or tombstones are involved the metadata we have, or can add,
is going to be worthless in most cases anyway)

On 14 January 2018 at 16:11, Benedict Elliott Smith <bened...@apache.org>
wrote:

> We already store the minimum timestamp in the EncodingStats of each
> partition, to support more efficient encoding of atom timestamps.  This
> just isn't exposed beyond UnfilteredRowIterator, though it probably could
> be.
>
> Storing the max alongside would still require justification, though its
> cost would actually be fairly nominal (probably only a few bytes; it
> depends on how far apart min/max are).
>
> I'm not sure (IMO) that even a fairly nominal cost could be justified
> unless there were widespread benefit though, which I'm not sure this would
> provide.  Maintaining a patched variant of your own that stores this
> probably wouldn't be too hard, though.
>
> In the meantime, exposing and utilising the minimum timestamp from
> EncodingStats is probably a good place to start to explore the viability of
> the approach.
>
> On 14 January 2018 at 15:34, Jeremiah Jordan <jerem...@datastax.com>
> wrote:
>
>> Don’t forget about deleted and missing data. The bane of all on replica
>> aggregation optimization’s.
>>
>> > On Jan 14, 2018, at 12:07 AM, Jeff Jirsa <jji...@gmail.com> wrote:
>> >
>> >
>> > You’re right it’s not stored in metadata now. Adding this to metadata
>> isn’t hard, it’s just hard to do it right where it’s useful to people with
>> other data models (besides yours) so it can make it upstream (if that’s
>> your goal). In particular the worst possible case is a table with no
>> clustering key and a single non-partition key column. In that case storing
>> these extra two long time stamps may be 2-3x more storage than without,
>> which would be a huge regression, so you’d have to have a way to turn that
>> feature off.
>> >
>> >
>> > Worth mentioning that there are ways to do this without altering
>> Cassandra -  consider using static columns that represent the min timestamp
>> and max timestamp. Create them both as ints or longs and write them on all
>> inserts/updates (as part of a batch, if needed). The only thing you’ll have
>> to do is find a way for “min timestamp” to work - you can set the min time
>> stamp column with an explicit  “using timestamp” timestamp = 2^31-NOW, so
>> that future writes won’t overwrite those values. That gives you a first
>> write win behavior for that column, which gives you an effective min
>> timestamp for the partition as a whole.
>> >
>> > --
>> > Jeff Jirsa
>> >
>> >
>> >> On Jan 13, 2018, at 4:58 AM, Arthur Kushka <arhel...@gmail.com> wrote:
>> >>
>> >> Hi folks,
>> >>
>> >> Currently, I working on custom CQL operator that should return the max
>> >> timestamp for some partition.
>> >>
>> >> I don't think that scanning of partition for that kind of data is a
>> nice
>> >> idea. Instead of it, I thinking about adding a metadata to the
>> partition. I
>> >> want to store minTimestamp and maxTimestamp for every partition as it
>> >> already done in Memtable`s. That timestamps will be updated on each
>> >> mutation operation, that is quite cheap in comparison to full scan.
>> >>
>> >> I quite new to Cassandra codebase and want to get some critics and
>> ideas,
>> >> maybe that kind of data already stored somewhere or you have better
>> ideas.
>> >> Is my assumption right?
>> >>
>> >> Best,
>> >> Artur
>> >
>> > -
>> > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
>> > For additional commands, e-mail: dev-h...@cassandra.apache.org
>> >
>>
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
>> For additional commands, e-mail: dev-h...@cassandra.apache.org
>>
>>
>


Re: Getting partition min/max timestamp

2018-01-14 Thread Benedict Elliott Smith
It's a long time since I looked at the code, but I'm pretty sure that
comment is explaining why we translate *no* timestamp to *epoch*, to save
space when serializing the encoding stats.  Not stipulating that the data
may be inaccurate.

However, being such a long time since I looked, I forgot we still only
apparently store these stats per sstable.  It's not actually immediately
clear to me if storing them per partition would help tremendously (wrt
compression, as this data was intended) given you would expect a great deal
of correlation between partitions.  But they would also be extremely cheap
to persist per partition, so only a modestly positive impact on compression
would be needed to justify (or permit) them.

I don't think this use case would probably drive development, but if you
were to write the patch and demonstrate it approximately maintained present
data sizes, it's likely such a patch would be accepted.
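
To put a rough number on "extremely cheap", here is a sketch of the persisted size
if the pair were stored as deltas, assuming a standard unsigned-vint style encoding
(7 payload bits per byte); the helper names here are mine, not Cassandra's:

// Illustration only: the persisted cost of a per-partition (min, max) timestamp
// pair if stored as deltas with an unsigned-vint encoding.
public final class BoundsEncodingSketch
{
    // Number of bytes a non-negative value occupies as an unsigned vint.
    static int unsignedVIntSize(long value)
    {
        int size = 1;
        while ((value >>>= 7) != 0)
            size++;
        return size;
    }

    // Assumes epoch <= minTimestamp <= maxTimestamp; min is stored relative to a
    // shared epoch (e.g. the sstable-level minimum) and max relative to min.
    static int encodedSize(long epoch, long minTimestamp, long maxTimestamp)
    {
        return unsignedVIntSize(minTimestamp - epoch)
             + unsignedVIntSize(maxTimestamp - minTimestamp);
    }
}

For a partition whose writes span seconds to minutes of microsecond timestamps, the
max-minus-min delta is only a handful of bytes, which is the sense in which the
overhead is nominal.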

On 14 January 2018 at 20:33, arhel...@gmail.com <arhel...@gmail.com> wrote:

> First of all, thx for all the ideas.
>
> Benedict Elliott Smith, in code comments I found a note that the data in
> EncodingStats can be wrong, so I'm not sure it's a good idea to use it for
> accurate results. As I understand it, incorrect data is not a problem for its
> current use case, but it is for mine. Currently, I have added fields to
> every AtomicBTreePartition. I update those fields in the addAllWithSizeDelta
> call, but I now realise I should also think about the case of data
> removal.
>
> I currently don't really care about TTLs, but it is a case I should think
> about, thanks.
>
> Jeremiah Jordan, thanks for the note, but I don't really get what you mean
> by replica aggregation optimizations. Can you please explain it for me?
>
> On 2018-01-14 17:16, Benedict Elliott Smith <bened...@apache.org> wrote:
> > (Obviously, not to detract from the points that Jon and Jeremiah make,
> i.e.
> > that if TTLs or tombstones are involved the metadata we have, or can add,
> > is going to be worthless in most cases anyway)
> >
> > On 14 January 2018 at 16:11, Benedict Elliott Smith <bened...@apache.org
> >
> > wrote:
> >
> > > We already store the minimum timestamp in the EncodingStats of each
> > > partition, to support more efficient encoding of atom timestamps.  This
> > > just isn't exposed beyond UnfilteredRowIterator, though it probably
> could
> > > be.
> > >
> > > Storing the max alongside would still require justification, though its
> > > cost would actually be fairly nominal (probably only a few bytes; it
> > > depends on how far apart min/max are).
> > >
> > > I'm not sure (IMO) that even a fairly nominal cost could be justified
> > > unless there were widespread benefit though, which I'm not sure this
> would
> > > provide.  Maintaining a patched variant of your own that stores this
> > > probably wouldn't be too hard, though.
> > >
> > > In the meantime, exposing and utilising the minimum timestamp from
> > > EncodingStats is probably a good place to start to explore the
> viability of
> > > the approach.
> > >
> > > On 14 January 2018 at 15:34, Jeremiah Jordan <jerem...@datastax.com>
> > > wrote:
> > >
> > >> Don’t forget about deleted and missing data. The bane of all on
> replica
> > >> aggregation optimization’s.
> > >>
> > >> > On Jan 14, 2018, at 12:07 AM, Jeff Jirsa <jji...@gmail.com> wrote:
> > >> >
> > >> >
> > >> > You’re right it’s not stored in metadata now. Adding this to
> metadata
> > >> isn’t hard, it’s just hard to do it right where it’s useful to people
> with
> > >> other data models (besides yours) so it can make it upstream (if
> that’s
> > >> your goal). In particular the worst possible case is a table with no
> > >> clustering key and a single non-partition key column. In that case
> storing
> > >> these extra two long time stamps may be 2-3x more storage than
> without,
> > >> which would be a huge regression, so you’d have to have a way to turn
> that
> > >> feature off.
> > >> >
> > >> >
> > >> > Worth mentioning that there are ways to do this without altering
> > >> Cassandra -  consider using static columns that represent the min
> timestamp
> > >> and max timestamp. Create them both as ints or longs and write them
> on all
> > >> inserts/updates (as part of a batch, if needed). The only thing
> you’ll have
> > >> to do is find a way for “min timestam

Re: Coordinator Write Metrics per CF

2018-02-13 Thread Benedict Elliott Smith
For the record, I'm not certain there *is* a great deal of value in this.

The read latency metrics are expected to vary a great deal, since the
entire IO subsystem is involved.

Writes, however, go straight to a memtable.  They only block on IO if we
exceed our commit log flush bandwidth for an extended period of time.  We
already have a metric for tracking this: CommitLog.WaitingOnCommit.

I'm not saying there won't be any latency distribution, but it is unlikely
to be terribly interesting in very many situations.  I can't off the top of
my head think of a good reason to consult this metric, that couldn't better
be answered elsewhere.
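
For completeness, the metric itself is cheap to add; a hedged sketch of what a
per-table timer might look like using the Dropwizard/Codahale Metrics library the
project already depends on (the metric name and where it would be recorded are
assumptions of mine, not the eventual CASSANDRA-14232 patch):

import com.codahale.metrics.MetricRegistry;
import com.codahale.metrics.Timer;

// A hedged sketch only; names and wiring are illustrative.
public final class CoordinatorWriteLatencySketch
{
    private final Timer coordinatorWriteLatency;

    public CoordinatorWriteLatencySketch(MetricRegistry registry, String keyspace, String table)
    {
        this.coordinatorWriteLatency =
            registry.timer(MetricRegistry.name("Table", keyspace, table, "CoordinatorWriteLatency"));
    }

    // The coordinator write path would start this before dispatching to replicas
    // and stop the returned context once the consistency level has been satisfied.
    public Timer.Context begin()
    {
        return coordinatorWriteLatency.time();
    }
}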



On 13 February 2018 at 19:18, Sumanth Pasupuleti <
spasupul...@netflix.com.invalid> wrote:

> Thanks Kurt and Chris for your valuable inputs. Created
> https://issues.apache.org/jira/browse/CASSANDRA-14232; I shall start
> working on this.
>
> Thanks,
> Sumanth
>
> On Mon, Feb 12, 2018 at 7:43 AM, Chris Lohfink  wrote:
>
> > It would be good to have it. Its not that its not there because its
> > difficult or anything. I think its more that the read latency metric was
> > needed for speculative retry so it was added but the write side wasn't
> > needed for that feature so wasn't added at same time. It would be very
> > useful in determining the table that the coordinator writes are slow to.
> >
> > Chris
> >
> > > On Feb 11, 2018, at 10:33 PM, kurt greaves 
> wrote:
> > >
> > > I've tried to search around for a reason for this in the past and
> haven't
> > > found one. I don't think it's a bad idea. Would be a helpful metric to
> > > diagnose internode networking issues - although I'll note that the read
> > > metric will also achieve this assuming you have enough reads to get
> some
> > > useful information out of it.
> > >
> > > On 9 February 2018 at 17:43, Sumanth Pasupuleti <
> > > sumanth.pasupuleti...@gmail.com> wrote:
> > >
> > >> There is an existing CoordinatorReadLatency table metric. I am looking
> > to
> > >> add CoordinatorWriteLatency table metric - however, before I attempt a
> > shot
> > >> at it, would like to know if anyone has context around why we
> currently
> > do
> > >> not have such metric (while we have the read metric) - if someone has
> > >> already attempted and realized its a bad idea for some reason.
> > >>
> > >> Thanks,
> > >> Sumanth
> > >>
> >
> >
> > -
> > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> > For additional commands, e-mail: dev-h...@cassandra.apache.org
> >
> >
>


Re: Use of OpOrder in memtable

2018-02-13 Thread Benedict Elliott Smith
If you look closely, there can be multiple memtables extant at once.  While
all "new" writes are routed to the latest memtable, there may still be
writes that have begun but not yet completed.  The memtable cannot be
flushed until any stragglers have completed, and some stragglers *may* still
need to be routed to their designated memtable (if they had only just begun
when the flush triggered).  It helps avoid these race conditions on either
side of the equation.
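
To illustrate the shape of that coordination (this is emphatically not the real
OpOrder API, just a much-simplified analogue with invented names): each write pins
the memtable "group" it was routed to, the switch installs a fresh group for new
writes, and the flusher waits for stragglers against the old group to drain.

import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.LockSupport;

// Simplified analogue of the write/flush coordination, for illustration only.
final class MemtableSwitchSketch
{
    static final class Group
    {
        final StringBuilder memtable = new StringBuilder(); // stand-in for a real memtable
        final AtomicInteger inFlight = new AtomicInteger();
        volatile boolean flushing;
    }

    private final AtomicReference<Group> current = new AtomicReference<>(new Group());

    void write(String mutation)
    {
        while (true)
        {
            Group group = current.get();
            group.inFlight.incrementAndGet();           // begin the operation against this group
            if (group.flushing)
            {
                group.inFlight.decrementAndGet();       // lost a race with the switch; retry on the new group
                continue;
            }
            try
            {
                synchronized (group.memtable) { group.memtable.append(mutation).append('\n'); }
            }
            finally
            {
                group.inFlight.decrementAndGet();       // complete the operation
            }
            return;
        }
    }

    Group switchAndAwaitQuiesce()
    {
        Group old = current.getAndSet(new Group());     // new writes now go to a fresh group
        old.flushing = true;                            // no new operations may begin against 'old'
        while (old.inFlight.get() > 0)                  // wait for stragglers that began before the switch
            LockSupport.parkNanos(1_000);
        return old;                                     // now safe to flush
    }
}

The real OpOrder generalises the idea with ordered groups of operations and barriers
the flusher can wait on, without blocking new writes.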

On 13 February 2018 at 22:09, Tyagi, Preetika 
wrote:

> Hi all,
>
> I'm trying to understand the behavior of memtable when writes/flush
> operations are going on in parallel.
>
> In my understanding, once a memtable is full it is queued for flushing and
> a new memtable is created for ongoing write operations.
> However, I was looking at the code and it looks like the OpOrder class is
> used (don't understand all details) to ensure the synchronization between
> producers (writes) and consumers (batch flushes).
> So I am a bit confused about when exactly it is needed. There will always
> be only one latest memtable for write operations and all old memtables are
> flushed so where this producer/consumer interaction on the same memtable is
> needed?
>
> Thanks,
> Preetika
>
>


Re: Coordinator Write Metrics per CF

2018-02-13 Thread Benedict Elliott Smith
Sorry, I guess I'm tired.  I thought this was discussing local write
latency.

I'm surprised we have that and not coordinator write latency.

Please do ignore me, I'm not sure why I got involved!

On 13 February 2018 at 21:48, Benedict Elliott Smith <bened...@apache.org>
wrote:

> For the record, I'm not certain there *is* a great deal of value in this.
>
> The read latency metrics are expected to vary a great deal, since the
> entire IO subsystem is involved.
>
> Writes, however, go straight to a memtable.  They only block on IO if we
> exceed our commit log flush bandwidth for an extended period of time.  We
> already have a metric for tracking this: CommitLog.WaitingOnCommit.
>
> I'm not saying there won't be any latency distribution, but it is unlikely
> to be terribly interesting in very many situations.  I can't off the top of
> my head think of a good reason to consult this metric, that couldn't better
> be answered elsewhere.
>
>
>
> On 13 February 2018 at 19:18, Sumanth Pasupuleti <spasupul...@netflix.com.
> invalid> wrote:
>
>> Thanks Kurt and Chris for your valuable inputs. Created
>> https://issues.apache.org/jira/browse/CASSANDRA-14232; I shall start
>> working on this.
>>
>> Thanks,
>> Sumanth
>>
>> On Mon, Feb 12, 2018 at 7:43 AM, Chris Lohfink <clohf...@apple.com>
>> wrote:
>>
>> > It would be good to have it. It's not that it's not there because it's
>> > difficult or anything. I think it's more that the read latency metric was
>> > needed for speculative retry, so it was added, but the write side wasn't
>> > needed for that feature so wasn't added at the same time. It would be very
>> > useful in determining the table that the coordinator writes are slow to.
>> >
>> > Chris
>> >
>> > > On Feb 11, 2018, at 10:33 PM, kurt greaves <k...@instaclustr.com>
>> wrote:
>> > >
>> > > I've tried to search around for a reason for this in the past and
>> haven't
>> > > found one. I don't think it's a bad idea. Would be a helpful metric to
>> > > diagnose internode networking issues - although I'll note that the
>> read
>> > > metric will also achieve this assuming you have enough reads to get
>> some
>> > > useful information out of it.
>> > >
>> > > On 9 February 2018 at 17:43, Sumanth Pasupuleti <
>> > > sumanth.pasupuleti...@gmail.com> wrote:
>> > >
>> > >> There is an existing CoordinatorReadLatency table metric. I am
>> looking
>> > to
>> > >> add CoordinatorWriteLatency table metric - however, before I attempt
>> a
>> > shot
>> > >> at it, would like to know if anyone has context around why we
>> currently
>> > do
>> > >> not have such metric (while we have the read metric) - if someone has
>> > >> already attempted and realized its a bad idea for some reason.
>> > >>
>> > >> Thanks,
>> > >> Sumanth
>> > >>
>> >
>> >
>> > -
>> > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
>> > For additional commands, e-mail: dev-h...@cassandra.apache.org
>> >
>> >
>>
>
>
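For anyone who wants to poke at the existing table-level metric while this is being discussed, here is a hedged sketch of reading CoordinatorReadLatency over JMX. The keyspace ("ks"), table ("t"), localhost address and default JMX port 7199 are placeholders, and the ObjectName pattern is the usual org.apache.cassandra.metrics layout for table metrics -- adjust if your version exposes it differently. A CoordinatorWriteLatency metric, if added, would presumably be addressed the same way.

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class ReadCoordinatorLatency
{
    public static void main(String[] args) throws Exception
    {
        // Placeholder host/port; 7199 is the conventional default JMX port.
        JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://127.0.0.1:7199/jmxrmi");
        JMXConnector jmxc = JMXConnectorFactory.connect(url);
        try
        {
            MBeanServerConnection mbs = jmxc.getMBeanServerConnection();
            // Placeholder keyspace/table; the metric name is the one discussed in this thread.
            ObjectName timer = new ObjectName(
                "org.apache.cassandra.metrics:type=Table,keyspace=ks,scope=t,name=CoordinatorReadLatency");
            System.out.println("CoordinatorReadLatency count: " + mbs.getAttribute(timer, "Count"));
        }
        finally
        {
            jmxc.close();
        }
    }
}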


Re: Use of OpOrder in memtable

2018-02-13 Thread Benedict Elliott Smith
Right.  Although IIRC there is another OpOrder that coordinates the
migration of readers from memtables to the sstables that replace them (and
the reclamation of any off-heap memory associated with the memtable)

On 13 February 2018 at 22:59, Tyagi, Preetika <preetika.ty...@intel.com>
wrote:

> Ah I see. That makes sense.
> And it doesn't have to anything with the read requests going on in
> parallel with write requests, right?
> I mean we read the data from memtable depending on whatever has been
> written into memtable so far and return it to the client (of course
> including SSTable read and timestamp comparison etc.)
>
> -Original Message-
> From: Benedict Elliott Smith [mailto:bened...@apache.org]
> Sent: Tuesday, February 13, 2018 2:25 PM
> To: dev@cassandra.apache.org
> Subject: Re: Use of OpOrder in memtable
>
> If you look closely, there can be multiple memtables extant at once.
> While all "new" writes are routed to the latest memtable, there may still
> be writes that have begun but not yet completed.  The memtable cannot be
> flushed until any stragglers have completed, and some stragglers *may*
> still need to be routed to their designated memtable (if they had only just
> begun when the flush triggered).  It helps avoid these race conditions on
> either side of the equation.
>
> On 13 February 2018 at 22:09, Tyagi, Preetika <preetika.ty...@intel.com>
> wrote:
>
> > Hi all,
> >
> > I'm trying to understand the behavior of memtable when writes/flush
> > operations are going on in parallel.
> >
> > In my understanding, once a memtable is full it is queued for flushing
> > and a new memtable is created for ongoing write operations.
> > However, I was looking at the code and it looks like the OpOrder class
> > is used (don't understand all details) to ensure the synchronization
> > between producers (writes) and consumers (batch flushes).
> > So I am a bit confused about when exactly it is needed. There will
> > always be only one latest memtable for write operations and all old
> > memtables are flushed so where this producer/consumer interaction on
> > the same memtable is needed?
> >
> > Thanks,
> > Preetika
> >
> >
>


Re: GitHub PR ticket spam

2018-07-30 Thread Benedict Elliott Smith
I agree this is a mess.  I think we have previously taken the view that JIRA 
should be the permanent record of discussion, and that as such the git 
conversation should be duplicated there.

However, I think it would be better for JIRA to get a summary of important 
discussions, by one of the participants, not an illegible duplication of a page 
of code diffs attached to every one-line nit comment.


> On 30 Jul 2018, at 09:27, Stefan Podkowinski  wrote:
> 
> Looks like we had some active PRs recently to discuss code changes in
> detail on GitHub, which I think is something we agreed is perfectly
> fine, in addition to the usual Jira ticket.
> 
> What bugs me a bit is that for some reasons any comments on the PR would
> be posted to the Jira ticket as well. I'm not sure what would be the
> exact reason for this, I guess it's because the PR is linked in the
> ticket? I find this a bit annoying while subscribed to commits@,
> especially since we created pr@ for these kind of messages. Also I don't
> really see any value in mirroring all github comments to the ticket.
> #14556 is a good example how you could end up with tons of unformatted
> code in the ticket that will also mess up search in jira. Does anyone
> think this is really useful, or can we stop linking the PR in the future
> (at least for highly active PRs)?
> 
> 
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
> 


-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Re: GitHub PR ticket spam

2018-08-06 Thread Benedict Elliott Smith
Also +1

It might perhaps be nice to (just once) have ASF Bot comment that a GitHub 
discussion has been replicated to the worklog? In the Arrow example, for 
instance, it isn’t immediately obvious that there are any worklog comments to 
look at.

Or perhaps we should require committers to summarise in the comments.  For most 
tickets, perhaps just stating ‘nits from GitHub comments’.  But for any complex 
tickets, summarising the conclusions of any unexpected logical or structural 
discussion would be really helpful for posterity.  This has always been true, 
but especially so now, given how hard the replicated comments are to parse.


> On 6 Aug 2018, at 17:18, Jeremiah D Jordan  wrote:
> 
> Oh nice.  I like the idea of keeping it but moving it to the worklog tab.  +1 
> on that from me.
> 
>> On Aug 6, 2018, at 5:34 AM, Stefan Podkowinski  wrote:
>> 
>> +1 for worklog option
>> 
>> Here's an example ticket from Arrow, where they seem to be using the
>> same approach:
>> https://issues.apache.org/jira/browse/ARROW-2583
>> 
>> 
>> 
>> On 05.08.2018 09:56, Mick Semb Wever wrote:
 I find this a bit annoying while subscribed to commits@,
 especially since we created pr@ for these kind of messages. Also I don't
 really see any value in mirroring all github comments to the ticket.
>>> 
>>> 
>>> I agree with you Stefan. It makes the jira tickets quite painful to read. 
>>> And I tend to make comments on the commits rather than the PRs so to avoid 
>>> spamming back to the jira ticket.
>>> 
>>> But the linking to the PR is invaluable. And I can see Ariel's point about 
>>> a chronological historical archive.
>>> 
>>> 
 Ponies would be for this to be mirrored to a tab 
 separate from comments in JIRA.
>>> 
>>> 
>>> Ariel, that would be the the "worklog" option.
>>> https://reference.apache.org/pmc/github
>>> 
>>> 
>>> If this works for you, and others, I can open a INFRA to switch to worklog.
>>> wdyt?
>>> 
>>> 
>>> Mick.
>>> 
>>> -
>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org 
>>> 
>>> For additional commands, e-mail: dev-h...@cassandra.apache.org 
>>> 
>>> 
>> 
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org 
>> 
>> For additional commands, e-mail: dev-h...@cassandra.apache.org 
>> 


-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Re: Side Car New Repo vs not

2018-08-23 Thread Benedict Elliott Smith
+1 also for separate repo

> On 24 Aug 2018, at 01:11, Jeff Jirsa  wrote:
> 
> +1 for separate repo
> 
> 
> -- 
> Jeff Jirsa
> 
> 
>> On Aug 23, 2018, at 1:00 PM, sankalp kohli  wrote:
>> 
>> Separate repo is in a majority so far. Please reply to this thread with
>> your responses.
>> 
>> On Tue, Aug 21, 2018 at 4:34 PM Rahul Singh 
>> wrote:
>> 
>>> +1 for separate repo. Especially on git. Maybe make it a submodule.
>>> 
>>> Rahul
>>> On Aug 21, 2018, 3:33 PM -0500, Stefan Podkowinski ,
>>> wrote:
 I'm also currently -1 on the in-tree option.
 
 Additionally to what Aleksey mentioned, I also don't see how we could
 make this work with the current build and release process. Our scripts
 [0] for creating releases (tarballs and native packages), would need
 significant work to add support for an independent side-car. Our ant
 based build process is also not a great start for adding new tasks, let
 alone integrating other tool chains for web components for a potential
>>> UI.
 
 [0] https://git-wip-us.apache.org/repos/asf?p=cassandra-builds.git
 
 
> On 21.08.18 19:20, Aleksey Yeshchenko wrote:
> Sure, allow me to elaborate - at least a little bit. But before I do,
>>> just let me note that this wasn’t a veto -1, just a shorthand for “I don’t
>>> like this option”.
> 
> It would be nice to have sidecar and C* version and release cycles
>>> fully decoupled. I know it *can* be done when in-tree, but the way we vote
>>> on releases with tags off current branches would have to change somehow.
>>> Probably painfully. It would be nice to be able to easily enforce freezes,
>>> like the upcoming one, on the whole C* repo, while allowing feature
>>> development on the sidecar. It would be nice to not have sidecar commits in
>>> emails from commits@ mailing list. It would be nice to not have C* CI
>>> trigger necessarily on sidecar commits. Groups of people working on the two
>>> repos will mostly be different too, so what’s the point in sharing the repo?
> 
> Having an extra repo with its own set of branches is cheap and easy -
>>> we already do that with dtests. I like cleanly separated things when
>>> coupling is avoidable. As such I would prefer the sidecar to live in a
>>> separate new repo, while still being part of the C* project.
> 
> —
> AY
> 
> On 21 August 2018 at 17:06:39, sankalp kohli (kohlisank...@gmail.com)
>>> wrote:
> 
> Hi Aleksey,
> Can you please elaborate on the reasons for your -1? This
> way we can make progress towards any one approach.
> Thanks,
> Sankalp
> 
> On Tue, Aug 21, 2018 at 8:39 AM Aleksey Yeshchenko 
> wrote:
> 
>> FWIW I’m strongly -1 on in-tree approach, and would much prefer a
>>> separate
>> repo, dtest-style.
>> 
>> —
>> AY
>> 
>> On 21 August 2018 at 16:36:02, Jeremiah D Jordan (
>> jeremiah.jor...@gmail.com) wrote:
>> 
>> I think the following is a very big plus of it being in tree:
 * Faster iteration speed in general. For example when we need to
>>> add a
 new
 JMX endpoint that the sidecar needs, or change something from
>>> JMX to a
 virtual table (e.g. for repair, or monitoring) we can do all
>>> changes
 including tests as one commit within the main repository and
>>> don't
>> have
 to
 commit to main repo, sidecar repo,
>> 
>> I also don’t see a reason why the sidecar being in tree means it
>>> would not
>> work in a mixed version cluster. The nodes themselves must work in a
>>> mixed
>> version cluster during a rolling upgrade, I would expect any
>>> management
>> side car to operate in the same manor, in tree or not.
>> 
>> This tool will be pretty tightly coupled with the server, and as
>>> someone
>> with experience developing such tightly coupled tools, it is *much*
>>> easier
>> to make sure you don’t accidentally break them if they are in tree.
>>> How
>> many times has someone updated some JMX interface, updated nodetool,
>>> and
>> then moved on? Breaking all the external tools not in tree, without
>> realizing it. The above point about being able to modify interfaces
>>> and the
>> side car in the same commit is huge in terms of making sure someone
>>> doesn’t
>> inadvertently break the side car while fixing something else.
>> 
>> -Jeremiah
>> 
>> 
>>> On Aug 21, 2018, at 10:28 AM, Jonathan Haddad 
>> wrote:
>>> 
>>> Strongly agree with Blake. In my mind supporting multiple versions
>>> is
>>> mandatory. As I've stated before, we already do it with Reaper, I'd
>>> consider it a major misstep if we couldn't support multiple with
>>> the
>>> project - provided admin tool. It's the same reason dtests are
>>> separate
>> -
>>> they work with multiple versions.
>>> 
>>> The number of repos does not affect distribution - 

Re: Testing 4.0 Post-Freeze

2018-07-09 Thread Benedict Elliott Smith
+1.



> On 9 Jul 2018, at 20:17, Mick Semb Wever  wrote:
> 
> 
>> We have done all this for previous releases and we know it has not worked
>> well. So how giving it one more try is going to help here. Can someone
>> outline what will change for 4.0 which will make it more successful?
> 
> 
> I (again) agree with you Sankalp :-)
> 
> Why not try something new? 
> It's easier to discuss these things more genuinely after trying it out.
> 
> One of the differences in the branching approaches: to feature-freeze on a 
> 4.0 branch or on trunk; is who it is that has to then merge and work with 
> multiple branches. 
> 
> Where that small but additional effort is placed I think becomes a signal to 
> what the community values most: new features or stability. 
> 
> I think most folk would vote for stability, so why not give this approach a 
> go and to learn from it.
> It also creates an incentive to make the feature-freeze period as short as 
> possible, moving us towards an eventual goal of not needing to feature-freeze 
> at all. 
> 
> regards,
> Mick
> 
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
> 


-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Re: Testing 4.0 Post-Freeze

2018-07-10 Thread Benedict Elliott Smith
It’s not like this is an irrevocable change.  

If we encounter a scenario that seems to question its validity, or its general 
applicability, it can be raised on the mailing list and we can revisit the 
decision, surely?  I can think of at least one way to weaken the rules in such 
a scenario, without frustrating the goal, but why complicate things when we 
can’t even yet imagine a situation to need it - besides discouraging a new 
contributor who, let’s be honest, was going to have their patch languish for a 
few months while somebody found time to review it anyway.  At least this way we 
can give them a decent excuse!



> On 10 Jul 2018, at 10:43, Jonathan Haddad  wrote:
> 
> I guess I look at the idea of changing the branching strategy as a
> means of blocking work as a very odd way of solving a human problem.
> Having a majority of votes temporarily block potentially good work
> might be a good thing, and it might not matter at all.  It might also
> frustrate some folks who have been around for a really long time.
> 
> I'm also making the assumption that I don't know every plausible
> reason someone might need / want to merge into trunk and considering
> that there's valid cases for it that we'll be blocking.
> 
> With regard to "the process has been broken for years" I've already
> said my bit on what I considered to different now, nobody has
> responded to that yet.  I think I've said all I need to on this, it's
> probably just noise now, so I won't respond any further on this
> thread.  I don't anticipate having a personal need to commit to a
> future 5.0 release before 4.0 is out, so it won't impact me
> personally.
> 
> On Tue, Jul 10, 2018 at 10:32 AM Benedict Elliott Smith
>  wrote:
>> 
>> That’s a peculiar way of looking at it.
>> 
>> Committer status is not an absolute right to autonomy over the codebase.  
>> It’s an embodiment of trust that you will follow the community's prevailing 
>> rules around commit, and that you’re competent to do so.
>> 
>> If the community wants to change those rules you’re trusted to follow, how 
>> does this modify your right, or the trust emplaced in you?
>> 
>> 
>> 
>> 
>> 
>>> On 10 Jul 2018, at 10:18, Jonathan Haddad  wrote:
>>> 
>>> I guess I look at the initial voting in of committers as the process
>>> by which people are trusted to merge things in.  This proposed process
>>> revokes that trust. If Jonathan Ellis or Dave Brosius (arbitrarily
>>> picked) wants to merge a new feature into trunk during the freeze, now
>>> they're not allowed?  That's absurd.  People have already met the bar
>>> and have been voted in by merit, they should not have their privilege
>>> revoked.
>>> On Tue, Jul 10, 2018 at 10:14 AM Ben Bromhead  wrote:
>>>> 
>>>> Well put Mick
>>>> 
>>>> +1
>>>> 
>>>> On Tue, Jul 10, 2018 at 1:06 PM Aleksey Yeshchenko 
>>>> wrote:
>>>> 
>>>>> +1 from me too.
>>>>> 
>>>>> —
>>>>> AY
>>>>> 
>>>>> On 10 July 2018 at 04:17:26, Mick Semb Wever (m...@apache.org) wrote:
>>>>> 
>>>>> 
>>>>>> We have done all this for previous releases and we know it has not
>>>>> worked
>>>>>> well. So how giving it one more try is going to help here. Can someone
>>>>>> outline what will change for 4.0 which will make it more successful?
>>>>> 
>>>>> 
>>>>> I (again) agree with you Sankalp :-)
>>>>> 
>>>>> Why not try something new?
>>>>> It's easier to discuss these things more genuinely after trying it out.
>>>>> 
>>>>> One of the differences in the branching approaches: to feature-freeze on a
>>>>> 4.0 branch or on trunk; is who it is that has to then merge and work with
>>>>> multiple branches.
>>>>> 
>>>>> Where that small but additional effort is placed I think becomes a signal
>>>>> to what the community values most: new features or stability.
>>>>> 
>>>>> I think most folk would vote for stability, so why not give this approach
>>>>> a go and to learn from it.
>>>>> It also creates an incentive to make the feature-freeze period as short as
>>>>> possible, moving us towards an eventual goal of not needing to
>>>>> feature-freeze at all.
>>>>> 
>>>>> regards,
>>>>> Mick
>>>>> 
>>>>

Re: Implicit Casts for Arithmetic Operators

2018-10-12 Thread Benedict Elliott Smith
(2.65) returns numeric 4
>> SELECT round(2.65::double precision) returns double 4
>> 
>> SELECT 2.65 * 1 returns double 2.65
>> SELECT 2.65 * 1::bigint returns numeric 2.65
>> SELECT 2.65 * 1.0 returns numeric 2.650
>> SELECT 2.65 * 1.0::double precision returns double 2.65
>> 
>> SELECT round(2.65) * 1 returns numeric 3
>> SELECT round(2.65) * round(1) returns double 3
>> 
>> So as we're going to have silly values in any case, why pretend something
>> else? Also, exact calculations are slow if we crunch large amount of
>> numbers. I guess I slightly deviated towards Postgres' implemention in this
>> case, but I wish it wasn't used as a benchmark in this case. And most
>> importantly, I would definitely want the exact same type returned each time
>> I do a calculation.
>> 
>>  - Micke
>> 
>> On Fri, Oct 12, 2018 at 4:29 PM Benedict Elliott Smith 
>> wrote:
>> 
>>> As far as I can tell we reached a relatively strong consensus that we
>>> should implement lossless casts by default?  Does anyone have anything more
>>> to add?
>>> 
>>> Looking at the emails, everyone who participated and expressed a
>>> preference was in favour of the “Postgres approach” of upcasting to decimal
>>> for mixed float/int operands?
>>> 
>>> I’d like to get a clear-cut decision on this, so we know what we’re doing
>>> for 4.0.  Then hopefully we can move on to a collective decision on Ariel’s
>>> concerns about overflow, which I think are also pressing - particularly for
>>> tinyint and smallint.  This does also impact implicit casts for mixed
>>> integer type operations, but an approach for these will probably fall out
>>> of any decision on overflow.
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>>> On 3 Oct 2018, at 11:38, Murukesh Mohanan 
>>> wrote:
>>>> 
>>>> I think you're conflating two things here. There's the loss resulting
>>> from
>>>> using some operators, and loss involved in casting. Dividing an integer
>>> by
>>>> another integer to obtain an integer result can result in loss, but
>>> there's
>>>> no implicit casting there and no loss due to casting.  Casting an integer
>>>> to a float can also result in loss. So dividing an integer by a float,
>>> for
>>>> example, with an implicit cast has an additional avenue for loss: the
>>>> implicit cast for the operands so that they're of the same type. I
>>> believe
>>>> this discussion so far has been about the latter, not the loss from the
>>>> operations themselves.
>>>> 
>>>> On Wed, 3 Oct 2018 at 18:35 Benjamin Lerer 
>>>> wrote:
>>>> 
>>>>> Hi,
>>>>> 
>>>>> I would like to try to clarify things a bit to help people to understand
>>>>> the true complexity of the problem.
>>>>> 
>>>>> The *float *and *double *types are inexact numeric types. Not only at
>>> the
>>>>> operation level.
>>>>> 
>>>>> If you insert 676543.21 in a *float* column and then read it, you will
>>>>> realize that the value has been truncated to 676543.2.
>>>>> 
>>>>> If you want accuracy the only way is to avoid those inexact types.
>>>>> Using *decimals
>>>>> *during operations will mitigate the problem but will not remove it.
>>>>> 
>>>>> 
>>>>> I do not recall PostgreSQL behaving has described. If I am not mistaken
>>> in
>>>>> PostgreSQL *SELECT 3/2* will return *1*. Which is similar to what MS SQL
>>>>> server and Oracle do. So all thoses databases will lose precision if you
>>>>> are not carefull.
>>>>> 
>>>>> If you truly need precision you can have it by using exact numeric types
>>>>> for your data types. Of course it has a cost on performance, memory and
>>>>> disk usage.
>>>>> 
>>>>> The advantage of the current approach is that it give you the choice.
>>> It is
>>>>> up to you to decide what you need for your application. It is also in
>>> line
>>>>> with the way CQL behave everywhere else.
>>>>> 
>>>> --
>>>> 
>>>> Muru
>>> 
>>> 
>>> -
>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
>>> For additional commands, e-mail: dev-h...@cassandra.apache.org
>>> 
>>> 
> 
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
> 


-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Re: Implicit Casts for Arithmetic Operators

2018-10-12 Thread Benedict Elliott Smith
I think we do, implicitly, support precision and scale - only dynamically.  The 
precision and scale are defined by the value on insertion, i.e. those necessary 
to represent it exactly.  During arithmetic operations we currently truncate to 
decimal128, but we can (and probably should) change this.   Ideally, we would 
support explicit precision/scale in the declared type, but our current 
behaviour is not inconsistent with introducing this later.
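As a plain-Java illustration of what "truncate to decimal128" means in practice (this is just java.math behaviour, not the Cassandra code path): values keep whatever precision they were written with, but an operation carried out with MathContext.DECIMAL128 rounds its result to 34 significant digits.

import java.math.BigDecimal;
import java.math.MathContext;

public class Decimal128Demo
{
    public static void main(String[] args)
    {
        BigDecimal a = new BigDecimal("1.000000000000000000000000000000000001"); // 37 significant digits
        BigDecimal b = new BigDecimal("3");

        System.out.println(a.precision());                        // 37: stored exactly, as inserted
        System.out.println(a.add(b, MathContext.DECIMAL128));     // result rounded to 34 digits: the trailing 1 is lost
        System.out.println(a.divide(b, MathContext.DECIMAL128));  // 0.3333... limited to 34 digits
    }
}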

FTR, I wasn’t suggesting the spec required the most approximate type, but that 
the most consistent rule to describe this behaviour is that the approximate 
type always wins.  Somebody earlier justified this by the fact that one operand 
is already truncated to this level of approximation, so why would you want more 
accuracy in the result type?

I would be comfortable with either, fwiw, and they are both consistent with the 
spec.  It’s great if we can have a consistent idea behind why we do things 
though, so it seems at least worth briefly discussing this extra weirdness.





> On 12 Oct 2018, at 18:10, Ariel Weisberg  wrote:
> 
> Hi,
> 
> From reading the spec. Precision is always implementation defined. The spec 
> specifies scale in several cases, but never precision for any type or 
> operation (addition/subtraction, multiplication, division).
> 
> So we don't implement anything remotely approaching precision and scale in 
> CQL when it comes to numbers I think? So we aren't going to follow the spec 
> for scale. We are already pretty far down that road so I would leave it 
> alone. 
> 
> I don't think the spec is asking for the most approximate type. It's just 
> saying the result is approximate, and the precision is implementation 
> defined. We could return either float or double. I think if one of the 
> operands is a double we should return a double because clearly the schema 
> thought a double was required to represent that number. I would also be in 
> favor of returning a double all the time so that people can expect a 
> consistent type from expressions involving approximate numbers.
> 
> I am a big fan of widening for arithmetic expressions in a database to avoid 
> having to error on overflow. You can go to the trouble of only widening the 
> minimum amount, but I think it's simpler if we always widen to bigint and 
> double. This would be something the spec allows.
> 
> Definitely if we can make overflow not occur we should and the spec allows 
> that. We should also not return different types for the same operand types 
> just to work around overflow if we detect we need more precision.
> 
> Ariel
> On Fri, Oct 12, 2018, at 12:45 PM, Benedict Elliott Smith wrote:
>> If it’s in the SQL spec, I’m fairly convinced.  Thanks for digging this 
>> out (and Mike for getting some empirical examples).
>> 
>> We still have to decide on the approximate data type to return; right 
>> now, we have float+bigint=double, but float+int=float.  I think this is 
>> fairly inconsistent, and either the approximate type should always win, 
>> or we should always upgrade to double for mixed operands.
>> 
>> The quoted spec also suggests that decimal+float=float, and decimal
>> +double=double, whereas we currently have decimal+float=decimal, and 
>> decimal+double=decimal
>> 
>> If we’re going to go with an approximate operand implying an approximate 
>> result, I think we should do it consistently (and consistent with the 
>> SQL92 spec), and have the type of the approximate operand always be the 
>> return type.
>> 
>> This would still leave a decision for float+double, though.  The most 
>> consistent behaviour with that stated above would be to always take the 
>> most approximate type to return (i.e. float), but this would seem to me 
>> to be fairly unexpected for the user.
>> 
>> 
>>> On 12 Oct 2018, at 17:23, Ariel Weisberg  wrote:
>>> 
>>> Hi,
>>> 
>>> I agree with what's been said about expectations regarding expressions 
>>> involving floating point numbers. I think that if one of the inputs is 
>>> approximate then the result should be approximate.
>>> 
>>> One thing we could look at for inspiration is the SQL spec. Not to follow 
>>> dogmatically necessarily.
>>> 
>>> From the SQL 92 spec regarding assignment 
>>> http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt section 4.6:
>>> "
>>>Values of the data types NUMERIC, DECIMAL, INTEGER, SMALLINT,
>>>FLOAT, REAL, and DOUBLE PRECISION are numbers and are all mutually
>>>comparable and mutually assignable. If an assignment would result
>>>in a loss of the most significan

Re: CASSANDRA-13241 lower default chunk_length_in_kb

2018-10-19 Thread Benedict Elliott Smith
The change of default property doesn’t seem to violate the freeze?  The 
predominant phrase used in that thread was 'feature freeze'.  A lot of people 
are now interpreting it more broadly, so perhaps we need to revisit, but that’s 
probably a separate discussion?

The current default is really bad for most users, so I’m +1 changing it.  
Especially as the last time this topic was raised was (iirc) around the 3.0 
freeze.  We decided not to change anything for similar reasons, and haven't 
revisited it since.
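For anyone following along: the value under discussion is the per-table compression option, so it can already be lowered on individual tables today irrespective of the shipped default. A minimal example (keyspace and table names are placeholders):

ALTER TABLE ks.tbl WITH compression = {'class': 'LZ4Compressor', 'chunk_length_in_kb': 16};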


> On 19 Oct 2018, at 09:25, Jeff Jirsa  wrote:
> 
> Agree with Sylvain (and I think Benedict) - there’s no compelling reason to 
> violate the freeze here. We’ve had the wrong default for years - add a note 
> to the docs that we’ll be changing it in the future, but let’s not violate 
> the freeze now.
> 
> -- 
> Jeff Jirsa
> 
> 
>> On Oct 19, 2018, at 10:06 AM, Sylvain Lebresne  wrote:
>> 
>> Fwiw, as much as I agree this is a change worth doing in general, I am
>> -0 for 4.0. Both the "compact sequencing" and the change of default really.
>> We're closing on 2 months within the freeze, and for me a freeze does include
>> not changing defaults, because changing a default ideally implies a decent
>> amount of analysis/benchmarking of the consequences of that change[1] and that
>> doesn't enter in my definition of a freeze.
>> 
>> [1]: to be extra clear, I'm not saying we've always done this, far from it.
>> But I hope we can all agree we were wrong to no do it when we didn't and
>> should strive to improve, not repeat past mistakes.
>> --
>> Sylvain
>> 
>> 
>>> On Thu, Oct 18, 2018 at 8:55 PM Ariel Weisberg  wrote:
>>> 
>>> Hi,
>>> 
>>> For those who were asking about the performance impact of block size on
>>> compression I wrote a microbenchmark.
>>> 
>>> https://pastebin.com/RHDNLGdC
>>> 
>>>    [java] Benchmark                                                Mode  Cnt          Score          Error  Units
>>>    [java] CompactIntegerSequenceBench.benchCompressLZ4Fast16k    thrpt   15  331190055.685 ±  8079758.044  ops/s
>>>    [java] CompactIntegerSequenceBench.benchCompressLZ4Fast32k    thrpt   15  353024925.655 ±  7980400.003  ops/s
>>>    [java] CompactIntegerSequenceBench.benchCompressLZ4Fast64k    thrpt   15  365664477.654 ± 10083336.038  ops/s
>>>    [java] CompactIntegerSequenceBench.benchCompressLZ4Fast8k     thrpt   15  305518114.172 ± 11043705.883  ops/s
>>>    [java] CompactIntegerSequenceBench.benchDecompressLZ4Fast16k  thrpt   15  688369529.911 ± 25620873.933  ops/s
>>>    [java] CompactIntegerSequenceBench.benchDecompressLZ4Fast32k  thrpt   15  703635848.895 ±  5296941.704  ops/s
>>>    [java] CompactIntegerSequenceBench.benchDecompressLZ4Fast64k  thrpt   15  695537044.676 ± 17400763.731  ops/s
>>>    [java] CompactIntegerSequenceBench.benchDecompressLZ4Fast8k   thrpt   15  727725713.128 ±  4252436.331  ops/s
>>> 
>>> To summarize, compression is 8.5% slower and decompression is 1% faster.
>>> This is measuring the impact on compression/decompression not the huge
>>> impact that would occur if we decompressed data we don't need less often.
>>> 
>>> I didn't test decompression of Snappy and LZ4 high, but I did test
>>> compression.
>>> 
>>> Snappy:
>>>    [java] CompactIntegerSequenceBench.benchCompressSnappy16k  thrpt    2  196574766.116  ops/s
>>>    [java] CompactIntegerSequenceBench.benchCompressSnappy32k  thrpt    2  198538643.844  ops/s
>>>    [java] CompactIntegerSequenceBench.benchCompressSnappy64k  thrpt    2  194600497.613  ops/s
>>>    [java] CompactIntegerSequenceBench.benchCompressSnappy8k   thrpt    2  186040175.059  ops/s
>>> 
>>> LZ4 high compressor:
>>>    [java] CompactIntegerSequenceBench.bench16k  thrpt    2  20822947.578  ops/s
>>>    [java] CompactIntegerSequenceBench.bench32k  thrpt    2  12037342.253  ops/s
>>>    [java] CompactIntegerSequenceBench.bench64k  thrpt    2   6782534.469  ops/s
>>>    [java] CompactIntegerSequenceBench.bench8k   thrpt    2  32254619.594  ops/s
>>> 
>>> LZ4 high is the one instance where block size mattered a lot. It's a bit
>>> suspicious really when you look at the ratio of performance to block size
>>> being close to 1:1. I couldn't spot a bug in the benchmark though.
>>> 
>>

Re: CASSANDRA-13241 lower default chunk_length_in_kb

2018-10-19 Thread Benedict Elliott Smith
Shall we move this discussion to a separate thread?  I agree it needs to be 
had, but this will definitely derail this discussion.

To respond only to the relevant portion for this thread:

> changing years-old defaults

I don’t see how age is relevant?  This isn’t some ‘battle hardened’ feature 
we’re changing - most users don’t even know to change this parameter, so we 
can’t claim its length of existence works in its favour.

The project had fewer resources to be as thorough when this ticket landed, so 
we can’t even claim we’re overturning careful work.  This default was defined 
in 2011 with no performance comparisons with other possible sizes, or 
justification for the selection made on ticket (CASSANDRA-47 - yes, they once 
went down to two digits!).  

That’s not to say this wasn’t a fine default - it was.  In this case, age has 
actively worked against it.  Since 2011, SSDs have become the norm, and most 
servers have more memory than we are presently able to utilise effectively.

This is a no brainer, and doesn’t have any impact on testing.  Tests run with 
64KiB are just as valid as those run with 16KiB.  Performance tests should 
anyway compare like-to-like, so this is completely testing neutral AFAICT.


> On 19 Oct 2018, at 15:16, Joshua McKenzie  wrote:
> 
> At the risk of hijacking this thread, when are we going to transition from
> "no new features, change whatever else you want including refactoring and
> changing years-old defaults" to "ok, we think we have something that's
> stable, time to start testing"?
> 
> Right now, if the community starts aggressively testing 4.0 with all the
> changes still in flight, there's likely going to be a lot of wasted effort.
> I think the root of the disconnect was that when we discussed "freeze" on
> the mailing list, it was in the context of getting everyone engaged in
> testing 4.0.



Re: CASSANDRA-13241 lower default chunk_length_in_kb

2018-10-18 Thread Benedict Elliott Smith
FWIW, I’m not -0, just think that long after the freeze date a change like this 
needs a strong mandate from the community.  I think the change is a good one.





> On 17 Oct 2018, at 22:09, Ariel Weisberg  wrote:
> 
> Hi,
> 
> It's really not appreciably slower compared to the decompression we are going 
> to do which is going to take several microseconds. Decompression is also 
> going to be faster because we are going to do less unnecessary decompression 
> and the decompression itself may be faster since it may fit in a higher level 
> cache better. I ran a microbenchmark comparing them.
> 
> https://issues.apache.org/jira/browse/CASSANDRA-13241?focusedCommentId=16653988&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16653988
> 
> Fetching a long from memory:   56 nanoseconds
> Compact integer sequence   :   80 nanoseconds
> Summing integer sequence   :  165 nanoseconds
> 
> Currently we have one +1 from Kurt to change the representation and possibly 
> a -0 from Benedict. That's not really enough to make an exception to the code 
> freeze. If you want it to happen (or not) you need to speak up otherwise only 
> the default will change.
> 
> Regards,
> Ariel
> 
> On Wed, Oct 17, 2018, at 6:40 AM, kurt greaves wrote:
>> I think if we're going to drop it to 16k, we should invest in the compact
>> sequencing as well. Just lowering it to 16k will have potentially a painful
>> impact on anyone running low memory nodes, but if we can do it without the
>> memory impact I don't think there's any reason to wait another major
>> version to implement it.
>> 
>> Having said that, we should probably benchmark the two representations
>> Ariel has come up with.
>> 
>> On Wed, 17 Oct 2018 at 20:17, Alain RODRIGUEZ  wrote:
>> 
>>> +1
>>> 
>>> I would guess a lot of C* clusters/tables have this option set to the
>>> default value, and not many of them are having the need for reading so big
>>> chunks of data.
>>> I believe this will greatly limit disk overreads for a fair amount (a big
>>> majority?) of new users. It seems fair enough to change this default value,
>>> I also think 4.0 is a nice place to do this.
>>> 
>>> Thanks for taking care of this Ariel and for making sure there is a
>>> consensus here as well,
>>> 
>>> C*heers,
>>> ---
>>> Alain Rodriguez - al...@thelastpickle.com
>>> France / Spain
>>> 
>>> The Last Pickle - Apache Cassandra Consulting
>>> http://www.thelastpickle.com
>>> 
>>> Le sam. 13 oct. 2018 à 08:52, Ariel Weisberg  a écrit :
>>> 
 Hi,
 
 This would only impact new tables, existing tables would get their
 chunk_length_in_kb from the existing schema. It's something we record in
>>> a
 system table.
 
 I have an implementation of a compact integer sequence that only requires
 37% of the memory required today. So we could do this with only slightly
 more than doubling the memory used. I'll post that to the JIRA soon.
 
 Ariel
 
 On Fri, Oct 12, 2018, at 1:56 AM, Jeff Jirsa wrote:
> 
> 
> I think 16k is a better default, but it should only affect new tables.
> Whoever changes it, please make sure you think about the upgrade path.
> 
> 
>> On Oct 12, 2018, at 2:31 AM, Ben Bromhead 
>>> wrote:
>> 
>> This is something that's bugged me for ages, tbh the performance gain
 for
>> most use cases far outweighs the increase in memory usage and I would
 even
>> be in favor of changing the default now, optimizing the storage cost
 later
>> (if it's found to be worth it).
>> 
>> For some anecdotal evidence:
>> 4kb is usually what we end setting it to, 16kb feels more reasonable
 given
>> the memory impact, but what would be the point if practically, most
 folks
>> set it to 4kb anyway?
>> 
>> Note that chunk_length will largely be dependent on your read sizes,
 but 4k
>> is the floor for most physical devices in terms of ones block size.
>> 
>> +1 for making this change in 4.0 given the small size and the large
>> improvement to new users experience (as long as we are explicit in
>>> the
>> documentation about memory consumption).
>> 
>> 
>>> On Thu, Oct 11, 2018 at 7:11 PM Ariel Weisberg 
 wrote:
>>> 
>>> Hi,
>>> 
>>> This is regarding
 https://issues.apache.org/jira/browse/CASSANDRA-13241
>>> 
>>> This ticket has languished for a while. IMO it's too late in 4.0 to
>>> implement a more memory efficient representation for compressed
>>> chunk
>>> offsets. However I don't think we should put out another release
>>> with
 the
>>> current 64k default as it's pretty unreasonable.
>>> 
>>> I propose that we lower the value to 16kb. 4k might never be the
 correct
>>> default anyways as there is a cost to compression and 16k will still
 be a
>>> large improvement.
>>> 
>>> Benedict and Jon 

Re: Implicit Casts for Arithmetic Operators

2018-10-12 Thread Benedict Elliott Smith
As far as I can tell we reached a relatively strong consensus that we should 
implement lossless casts by default?  Does anyone have anything more to add?

Looking at the emails, everyone who participated and expressed a preference was 
in favour of the “Postgres approach” of upcasting to decimal for mixed 
float/int operands?

I’d like to get a clear-cut decision on this, so we know what we’re doing for 
4.0.  Then hopefully we can move on to a collective decision on Ariel’s 
concerns about overflow, which I think are also pressing - particularly for 
tinyint and smallint.  This does also impact implicit casts for mixed integer 
type operations, but an approach for these will probably fall out of any 
decision on overflow.
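To make the "lossless upcast" behaviour concrete, here is a small plain-Java sketch (illustrative only, not the CQL execution engine; the specific values are made up): widening both operands of a mixed int/float operation to a decimal preserves everything each operand actually holds, whereas performing the arithmetic in float space can silently drop integer precision.

import java.math.BigDecimal;

public class LosslessUpcastDemo
{
    public static void main(String[] args)
    {
        int i = 16_777_217;          // 2^24 + 1: not exactly representable as a float
        float f = 0.5f;

        float lossy = i + f;         // the int is converted to float (losing its low bit), and the 0.5 is then rounded away
        BigDecimal lossless = new BigDecimal(i).add(new BigDecimal(Float.toString(f)));

        System.out.println(lossy);      // 1.6777216E7  (both the +1 and the 0.5 have vanished)
        System.out.println(lossless);   // 16777217.5
    }
}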






> On 3 Oct 2018, at 11:38, Murukesh Mohanan  wrote:
> 
> I think you're conflating two things here. There's the loss resulting from
> using some operators, and loss involved in casting. Dividing an integer by
> another integer to obtain an integer result can result in loss, but there's
> no implicit casting there and no loss due to casting.  Casting an integer
> to a float can also result in loss. So dividing an integer by a float, for
> example, with an implicit cast has an additional avenue for loss: the
> implicit cast for the operands so that they're of the same type. I believe
> this discussion so far has been about the latter, not the loss from the
> operations themselves.
> 
> On Wed, 3 Oct 2018 at 18:35 Benjamin Lerer 
> wrote:
> 
>> Hi,
>> 
>> I would like to try to clarify things a bit to help people to understand
>> the true complexity of the problem.
>> 
>> The *float *and *double *types are inexact numeric types. Not only at the
>> operation level.
>> 
>> If you insert 676543.21 in a *float* column and then read it, you will
>> realize that the value has been truncated to 676543.2.
>> 
>> If you want accuracy the only way is to avoid those inexact types.
>> Using *decimals
>> *during operations will mitigate the problem but will not remove it.
>> 
>> 
>> I do not recall PostgreSQL behaving has described. If I am not mistaken in
>> PostgreSQL *SELECT 3/2* will return *1*. Which is similar to what MS SQL
>> server and Oracle do. So all thoses databases will lose precision if you
>> are not carefull.
>> 
>> If you truly need precision you can have it by using exact numeric types
>> for your data types. Of course it has a cost on performance, memory and
>> disk usage.
>> 
>> The advantage of the current approach is that it give you the choice. It is
>> up to you to decide what you need for your application. It is also in line
>> with the way CQL behave everywhere else.
>> 
> -- 
> 
> Muru


-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Re: CASSANDRA-13241 lower default chunk_length_in_kb

2018-10-24 Thread Benedict Elliott Smith
If you undertake sufficiently many low risk things, some will bite you, I think 
everyone understands that.  It’s still valuable to factor a risk assessment 
into the equation, I think?

Either way, somebody asked who didn’t have the context to easily answer, so I 
did my best to offer them that information so they could make an informed 
decision.  I’m not campaigning for its inclusion, just trying to facilitate a 
collective decision.






> On 24 Oct 2018, at 16:27, Joshua McKenzie  wrote:
> 
> | The risk from such a patch is very low
> If I had a nickel for every time I've heard that... ;)
> 
> I'm neutral on the default change, -.5 (i.e. don't agree with it but won't
> die on that hill) on the data structure change post-freeze. We put this in,
> and that's a slippery slope as I'm sure we can find numerous other
> seemingly low-risk trivial optimizations and rewrites that cumulatively
> would make a "feature-freeze" effectively meaningless as a tool to start
> stabilizing the contents of the release.
> 
> In isolation many changes look innocuous. In the context of an organically
> grown open-source code-base that's this old, I've learned that it pays to
> be very, very cautious.
> 
> On Tue, Oct 23, 2018 at 3:33 PM Jeff Jirsa  wrote:
> 
>> My objection (-0.5) is based on freeze not in code complexity
>> 
>> 
>> 
>> --
>> Jeff Jirsa
>> 
>> 
>>> On Oct 23, 2018, at 8:59 AM, Benedict Elliott Smith 
>> wrote:
>>> 
>>> To discuss the concerns about the patch for a more efficient
>> representation:
>>> 
>>> The risk from such a patch is very low.  It’s a very simple in-memory
>> data structure, that we can introduce thorough fuzz tests for.  The reason
>> to exclude it would be for reasons of wanting to begin strictly enforcing
>> the freeze only.  This is a good enough reason in my book, which is why I’m
>> neutral on its addition.  I just wanted to provide some context for
>> everyone else's voting intention.
>>> 
>>> 
>>>> On 23 Oct 2018, at 16:51, Ariel Weisberg  wrote:
>>>> 
>>>> Hi,
>>>> 
>>>> I just asked Jeff. He is -0 and -0.5 respectively.
>>>> 
>>>> Ariel
>>>> 
>>>>> On Tue, Oct 23, 2018, at 11:50 AM, Benedict Elliott Smith wrote:
>>>>> I’m +1 change of default.  I think Jeff was -1 on that though.
>>>>> 
>>>>> 
>>>>>> On 23 Oct 2018, at 16:46, Ariel Weisberg  wrote:
>>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> To summarize who we have heard from so far
>>>>>> 
>>>>>> WRT to changing just the default:
>>>>>> 
>>>>>> +1:
>>>>>> Jon Haddadd
>>>>>> Ben Bromhead
>>>>>> Alain Rodriguez
>>>>>> Sankalp Kohli (not explicit)
>>>>>> 
>>>>>> -0:
>>>>>> Sylvaine Lebresne
>>>>>> Jeff Jirsa
>>>>>> 
>>>>>> Not sure:
>>>>>> Kurt Greaves
>>>>>> Joshua Mckenzie
>>>>>> Benedict Elliot Smith
>>>>>> 
>>>>>> WRT to change the representation:
>>>>>> 
>>>>>> +1:
>>>>>> There are only conditional +1s at this point
>>>>>> 
>>>>>> -0:
>>>>>> Sylvaine Lebresne
>>>>>> 
>>>>>> -.5:
>>>>>> Jeff Jirsa
>>>>>> 
>>>>>> This (
>> https://github.com/aweisberg/cassandra/commit/a9ae85daa3ede092b9a1cf84879fb1a9f25b9dce)
>> is a rough cut of the change for the representation. It needs better
>> naming, unit tests, javadoc etc. but it does implement the change.
>>>>>> 
>>>>>> Ariel
>>>>>>> On Fri, Oct 19, 2018, at 3:42 PM, Jonathan Haddad wrote:
>>>>>>> Sorry, to be clear - I'm +1 on changing the configuration default,
>> but I
>>>>>>> think changing the compression in memory representations warrants
>> further
>>>>>>> discussion and investigation before making a case for or against it
>> yet.
>>>>>>> An optimization that reduces in memory cost by over 50% sounds
>> pretty good
>>>>>>> and we never were really explicit that those sort of optimizations
>> would be
>>>>>>> excluded after our feature freeze.  I don't think they should

Re: CASSANDRA-13241 lower default chunk_length_in_kb

2018-10-23 Thread Benedict Elliott Smith
I’m +1 change of default.  I think Jeff was -1 on that though.


> On 23 Oct 2018, at 16:46, Ariel Weisberg  wrote:
> 
> Hi,
> 
> To summarize who we have heard from so far
> 
> WRT to changing just the default:
> 
> +1:
> Jon Haddadd
> Ben Bromhead
> Alain Rodriguez
> Sankalp Kohli (not explicit)
> 
> -0:
> Sylvaine Lebresne 
> Jeff Jirsa
> 
> Not sure:
> Kurt Greaves
> Joshua Mckenzie
> Benedict Elliot Smith
> 
> WRT to change the representation:
> 
> +1:
> There are only conditional +1s at this point
> 
> -0:
> Sylvaine Lebresne
> 
> -.5:
> Jeff Jirsa
> 
> This 
> (https://github.com/aweisberg/cassandra/commit/a9ae85daa3ede092b9a1cf84879fb1a9f25b9dce)
>  is a rough cut of the change for the representation. It needs better naming, 
> unit tests, javadoc etc. but it does implement the change.
> 
> Ariel
> On Fri, Oct 19, 2018, at 3:42 PM, Jonathan Haddad wrote:
>> Sorry, to be clear - I'm +1 on changing the configuration default, but I
>> think changing the compression in memory representations warrants further
>> discussion and investigation before making a case for or against it yet.
>> An optimization that reduces in memory cost by over 50% sounds pretty good
>> and we never were really explicit that those sort of optimizations would be
>> excluded after our feature freeze.  I don't think they should necessarily
>> be excluded at this time, but it depends on the size and risk of the patch.
>> 
>> On Sat, Oct 20, 2018 at 8:38 AM Jonathan Haddad  wrote:
>> 
>>> I think we should try to do the right thing for the most people that we
>>> can.  The number of folks impacted by 64KB is huge.  I've worked on a lot
>>> of clusters created by a lot of different teams, going from brand new to
>>> pretty damn knowledgeable.  I can't think of a single time over the last 2
>>> years that I've seen a cluster use non-default settings for compression.
>>> With only a handful of exceptions, I've lowered the chunk size considerably
>>> (usually to 4 or 8K) and the impact has always been very noticeable,
>>> frequently resulting in hardware reduction and cost savings.  Of all the
>>> poorly chosen defaults we have, this is one of the biggest offenders that I
>>> see.  There's a good reason ScyllaDB  claims they're so much faster than
>>> Cassandra - we ship a DB that performs poorly for 90+% of teams because we
>>> ship for a specific use case, not a general one (time series on memory
>>> constrained boxes being the specific use case)
>>> 
>>> This doesn't impact existing tables, just new ones.  More and more teams
>>> are using Cassandra as a general purpose database, we should acknowledge
>>> that adjusting our defaults accordingly.  Yes, we use a little bit more
>>> memory on new tables if we just change this setting, and what we get out of
>>> it is a massive performance win.
>>> 
>>> I'm +1 on the change as well.
>>> 
>>> 
>>> 
>>> On Sat, Oct 20, 2018 at 4:21 AM Sankalp Kohli 
>>> wrote:
>>> 
 (We should definitely harden the definition for freeze in a separate
 thread)
 
 My thinking is that this is the best time to do this change as we have
 not even cut alpha or beta. All the people involved in the test will
 definitely be testing it again when we have these releases.
 
> On Oct 19, 2018, at 8:00 AM, Michael Shuler 
 wrote:
> 
>> On 10/19/18 9:16 AM, Joshua McKenzie wrote:
>> 
>> At the risk of hijacking this thread, when are we going to transition
 from
>> "no new features, change whatever else you want including refactoring
 and
>> changing years-old defaults" to "ok, we think we have something that's
>> stable, time to start testing"?
> 
> Creating a cassandra-4.0 branch would allow trunk to, for instance, get
> a default config value change commit and get more testing. We might
> forget again, from what I understand of Benedict's last comment :)
> 
> --
> Michael
> 
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
> 
 
 -
 To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
 For additional commands, e-mail: dev-h...@cassandra.apache.org
 
 
>>> 
>>> --
>>> Jon Haddad
>>> http://www.rustyrazorblade.com
>>> twitter: rustyrazorblade
>>> 
>> 
>> 
>> -- 
>> Jon Haddad
>> http://www.rustyrazorblade.com
>> twitter: rustyrazorblade
> 
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
> 


-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Re: CASSANDRA-13241 lower default chunk_length_in_kb

2018-10-23 Thread Benedict Elliott Smith
To discuss the concerns about the patch for a more efficient representation:

The risk from such a patch is very low.  It’s a very simple in-memory data 
structure, that we can introduce thorough fuzz tests for.  The reason to 
exclude it would be for reasons of wanting to begin strictly enforcing the 
freeze only.  This is a good enough reason in my book, which is why I’m neutral 
on its addition.  I just wanted to provide some context for everyone else's 
voting intention.
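For readers without the JIRA context: the structure being discussed stores the monotonically increasing compressed-chunk offsets with fewer bits per entry than a plain long[]. A toy sketch of that general idea follows; it is explicitly not Ariel's patch, and the block size and 32-bit delta width are invented purely for illustration.

public class CompactOffsetsSketch
{
    private static final int BLOCK = 16;
    private final long[] bases;   // one full 64-bit base per BLOCK offsets
    private final int[] deltas;   // 4 bytes per offset instead of 8

    public CompactOffsetsSketch(long[] offsets)
    {
        bases = new long[(offsets.length + BLOCK - 1) / BLOCK];
        deltas = new int[offsets.length];
        for (int i = 0; i < offsets.length; i++)
        {
            if (i % BLOCK == 0)
                bases[i / BLOCK] = offsets[i];
            long delta = offsets[i] - bases[i / BLOCK];
            if (delta < 0 || delta > Integer.MAX_VALUE)
                throw new IllegalArgumentException("offset delta does not fit this toy encoding");
            deltas[i] = (int) delta;
        }
    }

    public long get(int i)
    {
        return bases[i / BLOCK] + deltas[i];
    }

    public static void main(String[] args)
    {
        long[] offsets = new long[64];
        for (int i = 1; i < offsets.length; i++)
            offsets[i] = offsets[i - 1] + 16 * 1024;            // e.g. 16KiB compressed chunks
        CompactOffsetsSketch s = new CompactOffsetsSketch(offsets);
        System.out.println(s.get(0) + " " + s.get(37) + " " + s.get(63));  // 0 606208 1032192
    }
}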


> On 23 Oct 2018, at 16:51, Ariel Weisberg  wrote:
> 
> Hi,
> 
> I just asked Jeff. He is -0 and -0.5 respectively.
> 
> Ariel
> 
> On Tue, Oct 23, 2018, at 11:50 AM, Benedict Elliott Smith wrote:
>> I’m +1 change of default.  I think Jeff was -1 on that though.
>> 
>> 
>>> On 23 Oct 2018, at 16:46, Ariel Weisberg  wrote:
>>> 
>>> Hi,
>>> 
>>> To summarize who we have heard from so far
>>> 
>>> WRT to changing just the default:
>>> 
>>> +1:
>>> Jon Haddadd
>>> Ben Bromhead
>>> Alain Rodriguez
>>> Sankalp Kohli (not explicit)
>>> 
>>> -0:
>>> Sylvaine Lebresne 
>>> Jeff Jirsa
>>> 
>>> Not sure:
>>> Kurt Greaves
>>> Joshua Mckenzie
>>> Benedict Elliot Smith
>>> 
>>> WRT to change the representation:
>>> 
>>> +1:
>>> There are only conditional +1s at this point
>>> 
>>> -0:
>>> Sylvaine Lebresne
>>> 
>>> -.5:
>>> Jeff Jirsa
>>> 
>>> This 
>>> (https://github.com/aweisberg/cassandra/commit/a9ae85daa3ede092b9a1cf84879fb1a9f25b9dce)
>>>  is a rough cut of the change for the representation. It needs better 
>>> naming, unit tests, javadoc etc. but it does implement the change.
>>> 
>>> Ariel
>>> On Fri, Oct 19, 2018, at 3:42 PM, Jonathan Haddad wrote:
>>>> Sorry, to be clear - I'm +1 on changing the configuration default, but I
>>>> think changing the compression in memory representations warrants further
>>>> discussion and investigation before making a case for or against it yet.
>>>> An optimization that reduces in memory cost by over 50% sounds pretty good
>>>> and we never were really explicit that those sort of optimizations would be
>>>> excluded after our feature freeze.  I don't think they should necessarily
>>>> be excluded at this time, but it depends on the size and risk of the patch.
>>>> 
>>>> On Sat, Oct 20, 2018 at 8:38 AM Jonathan Haddad  wrote:
>>>> 
>>>>> I think we should try to do the right thing for the most people that we
>>>>> can.  The number of folks impacted by 64KB is huge.  I've worked on a lot
>>>>> of clusters created by a lot of different teams, going from brand new to
>>>>> pretty damn knowledgeable.  I can't think of a single time over the last 2
>>>>> years that I've seen a cluster use non-default settings for compression.
>>>>> With only a handful of exceptions, I've lowered the chunk size 
>>>>> considerably
>>>>> (usually to 4 or 8K) and the impact has always been very noticeable,
>>>>> frequently resulting in hardware reduction and cost savings.  Of all the
>>>>> poorly chosen defaults we have, this is one of the biggest offenders that 
>>>>> I
>>>>> see.  There's a good reason ScyllaDB  claims they're so much faster than
>>>>> Cassandra - we ship a DB that performs poorly for 90+% of teams because we
>>>>> ship for a specific use case, not a general one (time series on memory
>>>>> constrained boxes being the specific use case)
>>>>> 
>>>>> This doesn't impact existing tables, just new ones.  More and more teams
>>>>> are using Cassandra as a general purpose database, we should acknowledge
>>>>> that adjusting our defaults accordingly.  Yes, we use a little bit more
>>>>> memory on new tables if we just change this setting, and what we get out 
>>>>> of
>>>>> it is a massive performance win.
>>>>> 
>>>>> I'm +1 on the change as well.
>>>>> 
>>>>> 
>>>>> 
>>>>> On Sat, Oct 20, 2018 at 4:21 AM Sankalp Kohli 
>>>>> wrote:
>>>>> 
>>>>>> (We should definitely harden the definition for freeze in a separate
>>>>>> thread)
>>>>>> 
>>>>>> My thinking is

Re: Proposed changes to CircleCI testing workflow

2018-10-26 Thread Benedict Elliott Smith
Just want to say +1, and thanks for taking the time to do this.  This all 
sounds great.

(And completely supersedes what I hoped to achieve with CASSANDRA-14648)


> On 26 Oct 2018, at 15:49, Stefan Podkowinski  wrote:
> 
> I'd like to give you a quick update on the work that has been done
> lately on running tests using CircleCI. Please let me know if you have
> any objections or don't think this is going into the right direction, or
> have any other feedback!
> 
> We've been using CircleCI for a while now and results are used on
> constant basis for new patches. Not only by committers, but also by
> casual contributors to run unit tests. Looks like people find the
> service valuable and we should keep using it. Therefor I'd like to make
> some improvements that will make it easier to add new tests and to
> continue making CircleCI an option for all contributors, both on paid
> and free plans.
> 
> The general idea of the changes implemented in #14806, is to consolidate
> the existing config to make it more modular and have smaller jobs that
> can be scheduled ad-hoc by the developer, instead of running a few big
> jobs on every commit. Reorganizing and breaking up the existing config
> was done using the new 2.1 config features. Starting jobs on request,
> instead of automatically, is done using the manual approval feature,
> i.e. you now have to click on that job in the workflow page in order to
> start it. I'd like to see us having smaller, more specialized groups of
> tests that we can run more selectively during development, while still
> being able to run bigger tests before committing, or firing up all of
> them during testing and releasing. Other example of smaller jobs would
> be testing coverage (#14788) or cqlsh tests (#14298). But also
> individual jobs for different ant targets, like burn, stress or benchmarks.
> 
> We'd now also be able to run tests using different docker images and
> different JDKs. I've already updated the used image to also include Java
> 11 and added unit and dtest jobs to the config for that. It's now really
> easy to run tests on Java 11, although these won't pass yet. It seems to
> be important to me to have this kind of flexibility, given the
> increasingly diverse ecosystem of Java distributions. We can also add
> jobs for packaging and doing smoke tests by installing and starting
> packages on different docker images (Redhat, Debian, Ubuntu,..) at a
> later point.
> 
> As for the paid vs free plans issue, I'd also like us to discuss how we
> can make tests faster and less resource intensive in general. As a
> desired consequence, we'd be able to move away from multi-node dtests,
> to something that can be run using the free plan. I'm looking forward to
> see if #14821 can get us into that direction. Ideally we can add these
> tests into a job that can be completed on the free plan and encourage
> contributors to add new tests there, instead of having to write a dtest,
> which they won't be able to run on CircleCI without a paid plan.
> 
> What's changing for you as a CircleCI user?
> * All tests, except unit tests, will need to be started manually and
> will not run on every commit (this can be further discussed and changed
> anytime if needed)
> * Updating the config.yml file now requires using the CircleCI cli tool
> and should not be done directly (see #14806 for technical details)
> * High resource settings can be enabled using a script/patch, either run
> manually or as a commit hook (again, see ticket for details)
> * Both free and paid plan users now have more tests to run
> 
> As already mentioned, please let me know if you have any thoughts on
> this, or if you think this is going into the wrong direction.
> 
> Thanks.
> 
> 
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
> 


-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Re: Implicit Casts for Arithmetic Operators

2018-11-16 Thread Benedict Elliott Smith
So, this thread somewhat petered out.

There are still a number of unresolved issues, but to make progress I wonder if 
it would first be helpful to have a vote on ensuring we are ANSI SQL 92 
compliant for our arithmetic?  This seems like a sensible baseline, since we 
will hopefully minimise surprise to operators this way.

If people largely agree, I will call a vote, and we can pick up a couple of 
more focused discussions afterwards on how we interpret the leeway it gives.


> On 12 Oct 2018, at 18:10, Ariel Weisberg  wrote:
> 
> Hi,
> 
> From reading the spec, precision is always implementation defined. The spec 
> specifies scale in several cases, but never precision for any type or 
> operation (addition/subtraction, multiplication, division).
> 
> So we don't implement anything remotely approaching precision and scale in
> CQL when it comes to numbers, I think? So we aren't going to follow the spec
> for scale. We are already pretty far down that road, so I would leave it
> alone.
> 
> I don't think the spec is asking for the most approximate type. It's just 
> saying the result is approximate, and the precision is implementation 
> defined. We could return either float or double. I think if one of the 
> operands is a double we should return a double because clearly the schema 
> thought a double was required to represent that number. I would also be in 
> favor of returning a double all the time so that people can expect a 
> consistent type from expressions involving approximate numbers.
> 
> I am a big fan of widening for arithmetic expressions in a database to avoid 
> having to error on overflow. You can go to the trouble of only widening the 
> minimum amount, but I think it's simpler if we always widen to bigint and 
> double. This would be something the spec allows.
> 
> Definitely if we can make overflow not occur we should and the spec allows 
> that. We should also not return different types for the same operand types 
> just to work around overflow if we detect we need more precision.
> 
> Ariel
> On Fri, Oct 12, 2018, at 12:45 PM, Benedict Elliott Smith wrote:
>> If it’s in the SQL spec, I’m fairly convinced.  Thanks for digging this 
>> out (and Mike for getting some empirical examples).
>> 
>> We still have to decide on the approximate data type to return; right 
>> now, we have float+bigint=double, but float+int=float.  I think this is 
>> fairly inconsistent, and either the approximate type should always win, 
>> or we should always upgrade to double for mixed operands.
>> 
>> The quoted spec also suggests that decimal+float=float, and
>> decimal+double=double, whereas we currently have decimal+float=decimal, and
>> decimal+double=decimal.
>> 
>> If we’re going to go with an approximate operand implying an approximate 
>> result, I think we should do it consistently (and consistent with the 
>> SQL92 spec), and have the type of the approximate operand always be the 
>> return type.
>> 
>> This would still leave a decision for float+double, though.  The most 
>> consistent behaviour with that stated above would be to always take the 
>> most approximate type to return (i.e. float), but this would seem to me 
>> to be fairly unexpected for the user.
>> 
>> 
>>> On 12 Oct 2018, at 17:23, Ariel Weisberg  wrote:
>>> 
>>> Hi,
>>> 
>>> I agree with what's been said about expectations regarding expressions 
>>> involving floating point numbers. I think that if one of the inputs is 
>>> approximate then the result should be approximate.
>>> 
>>> One thing we could look at for inspiration is the SQL spec. Not to follow 
>>> dogmatically necessarily.
>>> 
>>> From the SQL 92 spec regarding assignment 
>>> http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt section 4.6:
>>> "
>>>Values of the data types NUMERIC, DECIMAL, INTEGER, SMALLINT,
>>>FLOAT, REAL, and DOUBLE PRECISION are numbers and are all mutually
>>>comparable and mutually assignable. If an assignment would result
>>>in a loss of the most significant digits, an exception condition
>>>is raised. If least significant digits are lost, implementation-
>>>defined rounding or truncating occurs with no exception condition
>>>being raised. The rules for arithmetic are generally governed by
>>>    Subclause 6.12, "numeric value expression".
>>> "
>>> 
>>> Section 6.12 numeric value expressions:
>>> "
>>>1) If the data type of both operands of a dyadic arithmetic opera-
>>>   

Re: Implicit Casts for Arithmetic Operators

2018-10-02 Thread Benedict Elliott Smith
This (overflow) is an excellent point, but this also affects aggregations which 
were introduced a long time ago.  They already inherit Java semantics for all 
of the relevant types (silent wrap around).  We probably want to be consistent, 
meaning either changing aggregations (which incurs a cost for changing API) or 
continuing the java semantics here.
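
As a quick sketch of what those inherited semantics look like (plain Java,
nothing Cassandra-specific assumed; the literals are only illustrative):

    // Overflow.java -- int/long arithmetic wraps around silently on overflow
    public class Overflow {
        public static void main(String[] args) {
            int i = Integer.MAX_VALUE;
            System.out.println(i + 1);               // -2147483648, no error raised
            long l = Long.MAX_VALUE;
            System.out.println(l + 1L);              // -9223372036854775808
            // The checked alternative throws instead of wrapping:
            System.out.println(Math.addExact(i, 1)); // ArithmeticException: integer overflow
        }
    }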

This is why having these discussions explicitly in the community before a 
release is so critical, in my view.  It’s very easy for these semantic changes 
to go unnoticed on a JIRA, and then ossify.


> On 2 Oct 2018, at 15:48, Ariel Weisberg  wrote:
> 
> Hi,
> 
> I think we should decide based on what is least surprising as you mention, 
> but isn't overridden by some other concern.
> 
> It seems to me the priorities are
> 
> * Correctness
> * Performance
> * User visible complexity
> * Developer visible complexity
> 
> Defaulting to silent implicit data loss is not ideal from a correctness 
> standpoint.
> 
> Doing something better like using wider types doesn't seem like a performance 
> issue.
> 
> From a user standpoint doing something less lossy doesn't look more complex 
> as long as it's consistent, documented, and doesn't change from version to 
> version.
> 
> There is some developer complexity, but this is a public API and we only get 
> one shot at this. 
> 
> I wonder about how overflow is handled as well. In VoltDB I think we threw on 
> overflow and tended to just do widening conversions to make that less common. 
> We didn't imitate another database (as far as I know); we just went with what
> was least likely to silently corrupt data.
> https://github.com/VoltDB/voltdb/blob/master/src/ee/common/NValue.hpp#L2213 
> <https://github.com/VoltDB/voltdb/blob/master/src/ee/common/NValue.hpp#L2213>
> https://github.com/VoltDB/voltdb/blob/master/src/ee/common/NValue.hpp#L3764 
> <https://github.com/VoltDB/voltdb/blob/master/src/ee/common/NValue.hpp#L3764>
> 
> Ariel
> 
> On Tue, Oct 2, 2018, at 7:30 AM, Benedict Elliott Smith wrote:
>> CASSANDRA-11935 introduced arithmetic operators, and alongside these 
>> came implicit casts for their operands.  There is a semantic decision to 
>> be made, and I think the project would do well to explicitly raise this 
>> kind of question for wider input before release, since the project is 
>> bound by them forever more.
>> 
>> In this case, the choice is between lossy and lossless casts for 
>> operations involving integers and floating point numbers.  In essence, 
>> should:
>> 
>> (1) float + int = float, double + bigint = double; or
>> (2) float + int = double, double + bigint = decimal; or
>> (3) float + int = decimal, double + bigint = decimal
>> 
>> Option 1 performs a lossy implicit cast from int -> float, or bigint -> 
>> double.  Simply casting between these types changes the value.  This is 
>> what MS SQL Server does.
>> Options 2 and 3 cast without loss of precision, and 3 (or thereabouts) 
>> is what PostgreSQL does.
>> 
>> The question I’m interested in is not just which is the right decision, 
>> but how the right decision should be arrived at.  My view is that we 
>> should primarily aim for least surprise to the user, but I’m keen to 
>> hear from others.
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org 
>> <mailto:dev-unsubscr...@cassandra.apache.org>
>> For additional commands, e-mail: dev-h...@cassandra.apache.org 
>> <mailto:dev-h...@cassandra.apache.org>
>> 
> 
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org 
> <mailto:dev-unsubscr...@cassandra.apache.org>
> For additional commands, e-mail: dev-h...@cassandra.apache.org 
> <mailto:dev-h...@cassandra.apache.org>


Re: Implicit Casts for Arithmetic Operators

2018-10-02 Thread Benedict Elliott Smith
I agree, in broad strokes at least.  Interested to hear others’ positions.



> On 2 Oct 2018, at 16:44, Ariel Weisberg  wrote:
> 
> Hi,
> 
> I think overflow and the role of widening conversions are pretty linked so 
> I'll continue to inject that into this discussion. Also overflow is much 
> worse since most applications won't be impacted by a loss of precision when 
> an expression involves an int and float, but will care quite a bit if they 
> get some nonsense wrapped number in an integer only expression.
> 
> For VoltDB, in practice we didn't run into issues with applications failing to
> make progress due to exceptions on real data, thanks to the widening conversions.
> The ranges of double and long are pretty big and that hides wrap
> around/infinity. 
> 
> I think the proposal of having all operations return a decimal is attractive 
> in that these expressions always result in a consistent type. Two pain points 
> might be whether client languages have decimal support and whether there is a 
> performance issue? The nice thing about always returning decimal is we can 
> sidestep the issue of overflow.
> 
> I would start with seeing if that's acceptable, and if it isn't then look at 
> other approaches like returning a variety of types, such as having int + int 
> return a bigint or int + float return a double.
> 
> If we take an approach that allows overflow the ideal end state IMO would be 
> to get all users to run Cassandra in a way that overflow results in an error 
> even in the context of aggregation. The road to get there is tricky, but 
> maybe start by having it as an opt-in tunable in cassandra.yaml. I don't know 
> how/when we could ever change that as a default and it's unfortunate having 
> an option like this that 99% won't know they should flip.
> 
> It seems like having the default throw on overflow is not as bad as it sounds 
> if you do the widening conversions since most people won't run into them. The 
> change in the column types of results sets actually sounds worse if we want 
> to also improve aggregrations. Many applications won't notice if the client 
> library abstracts that away, but I think there are still cases where people 
> would notice the type changing.
> 
> Ariel
> 
>> On Tue, Oct 2, 2018, at 11:09 AM, Benedict Elliott Smith wrote:
>> This (overflow) is an excellent point, but this also affects 
>> aggregations which were introduced a long time ago.  They already 
>> inherit Java semantics for all of the relevant types (silent wrap 
>> around).  We probably want to be consistent, meaning either changing 
>> aggregations (which incurs a cost for changing API) or continuing the 
>> java semantics here.
>> 
>> This is why having these discussions explicitly in the community before 
>> a release is so critical, in my view.  It’s very easy for these semantic 
>> changes to go unnoticed on a JIRA, and then ossify.
>> 
>> 
>>> On 2 Oct 2018, at 15:48, Ariel Weisberg  wrote:
>>> 
>>> Hi,
>>> 
>>> I think we should decide based on what is least surprising as you mention, 
>>> but isn't overridden by some other concern.
>>> 
>>> It seems to me the priorities are
>>> 
>>> * Correctness
>>> * Performance
>>> * User visible complexity
>>> * Developer visible complexity
>>> 
>>> Defaulting to silent implicit data loss is not ideal from a correctness 
>>> standpoint.
>>> 
>>> Doing something better like using wider types doesn't seem like a 
>>> performance issue.
>>> 
>>> From a user standpoint doing something less lossy doesn't look more complex 
>>> as long as it's consistent, documented, and doesn't change from version 
>>> to version.
>>> 
>>> There is some developer complexity, but this is a public API and we only 
>>> get one shot at this. 
>>> 
>>> I wonder about how overflow is handled as well. In VoltDB I think we threw 
>>> on overflow and tended to just do widening conversions to make that less 
>>> common. We didn't imitate another database (as far as I know); we just went 
>>> with what was least likely to silently corrupt data.
>>> https://github.com/VoltDB/voltdb/blob/master/src/ee/common/NValue.hpp#L2213 
>>> <https://github.com/VoltDB/voltdb/blob/master/src/ee/common/NValue.hpp#L2213>
>>> https://github.com/VoltDB/voltdb/blob/master/src/ee/common/NValue.hpp#L3764 
>>> <https://github.com/VoltDB/voltdb/blob/master/src/ee/common/NValue.hpp#L3764>
>>> 
>>> Ariel
>>> 
>>>> On Tue, Oct 2, 2018, at 7:30 AM, Benedict

Implicit Casts for Arithmetic Operators

2018-10-02 Thread Benedict Elliott Smith
CASSANDRA-11935 introduced arithmetic operators, and alongside these came 
implicit casts for their operands.  There is a semantic decision to be made, 
and I think the project would do well to explicitly raise this kind of question 
for wider input before release, since the project is bound by them forever more.

In this case, the choice is between lossy and lossless casts for operations 
involving integers and floating point numbers.  In essence, should:

(1) float + int = float, double + bigint = double; or
(2) float + int = double, double + bigint = decimal; or
(3) float + int = decimal, double + bigint = decimal

Option 1 performs a lossy implicit cast from int -> float, or bigint -> double. 
 Simply casting between these types changes the value.  This is what MS SQL 
Server does.
Options 2 and 3 cast without loss of precision, and 3 (or thereabouts) is what 
PostgreSQL does.
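
To make the loss concrete, a small plain-Java sketch of the underlying
conversions (the literals are only illustrative, not taken from the ticket):

    // LossyCasts.java -- why option 1's implicit casts can change the value
    public class LossyCasts {
        public static void main(String[] args) {
            int i = 16_777_217;                // 2^24 + 1, exactly representable as an int
            float f = i;                       // int -> float, the option 1 style of cast
            System.out.println(f);             // 1.6777216E7 -- the value has changed
            long l = 9_007_199_254_740_993L;   // 2^53 + 1
            double d = l;                      // long -> double
            System.out.println(d);             // 9.007199254740992E15 -- changed again
            // An arbitrary-precision target, as in options 2 and 3, keeps the value:
            System.out.println(new java.math.BigDecimal(l)); // 9007199254740993
        }
    }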

The question I’m interested in is not just which is the right decision, but how 
the right decision should be arrived at.  My view is that we should primarily 
aim for least surprise to the user, but I’m keen to hear from others.
-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org


