Re: Flaky tablet_history_gc-itest

2016-11-28 Thread Alexey Serbin
Mike,

Thank you for clarifying on this and opening the JIRA item which tracks the
issue.

I'm not an expert on exactly-once semantics, but it seems David provided
more information on KUDU-1761.

Indeed, when using the auto background flush the client library does not
provide a means to guarantee the order of operation completion on the
server side, unless the number of batches is limited to 1 (by default it's
2).  From the client perspective, the client sends the operations
sequentially, but due to the background flushing they might be sent to the
server in parallel and completing at the server side even in reverse
order.  So, if the exactly-once semantics does not play here, we should
address this in a different way.


Best regards,

Alexey

On Thu, Nov 24, 2016 at 7:32 AM, Mike Percy  wrote:

> I investigated this further today and I believe this has to do with
> concurrent client flushes interleaving, resulting in out-of-order writes on
> the server side.
>
> More details in https://issues.apache.org/jira/browse/KUDU-1761
>
> David / Alexey, based on your knowledge of exactly-once semantics and the
> client flush path does my explanation make sense? I'll dig into it further
> but based on my hazy memory I think we allow 2 outstanding flushes at a
> time and do not enforce ordering with our exactly-once mechanism, only
> idempotence of individual operations (IIRC we do not do flow control if a
> client sequence number is skipped).
>
> Mike
>
> On Tue, Nov 22, 2016 at 5:10 PM, Mike Percy  wrote:
>
> > Alexey says that KuduSession::Flush() already blocks on all outstanding
> > async flushes, and apparently that's true, so either there is a
> non-obvious
> > bug there or it's caused by something else.
> >
> > Mike
> >
> > On Tue, Nov 22, 2016 at 4:46 PM, Mike Percy  wrote:
> >
> >> Hi Alexey,
> >> Thank you very much for investigating this test and providing an easy
> >> repro case. I just got around to spending some time on this due to
> David's
> >> read-your-writes (RYW) patches adding to this test. I think we should
> try
> >> to make this test reliable to be sure RYW doesn't have anomalies we
> aren't
> >> aware of. I applied your (second) patch mentioned earlier in this thread
> >> and was able to quickly reproduce the issues you described.
> >>
> >> I suspect that the problem stems from the fact that in the C++ client,
> >> session->Flush() is a no-op if an async flush is already in-progress. At
> >> least, it looks to me like those are the current semantics. If I remove
> the
> >> session->FlushAsync(null) from your patch then the test passes for me.
> >>
> >> The problem with a non-blocking Flush() while an async flush is
> >> outstanding that is that tablet_history_gc-itest relies on the following
> >> series of operations in order to ensure it knows what the consistent
> set of
> >> values should be for a particular timestamp:
> >>
> >>   FlushSessionOrDie(session);
> >> }
> >> SaveSnapshot(std::move(snapshot), clock_->Now());
> >>
> >> So it waits until the data is flushed to the server and then captures
> the
> >> current server timestamp as a key for its record of expected data
> values at
> >> that timestamp. The above appears in all of the write code paths in the
> >> test. If FlushSessionOrDie() is an async no-op, and the client has not
> >> truly flushed outstanding data at the current time stamp, then the
> >> "SaveSnapshot()" code path will think the latest writes are included in
> the
> >> snapshot for the current time, but in actuality the old value will be
> >> stored for that timestamp on the server.
> >>
> >> This matches with the observed behavior based on running the (patched)
> >> test with --vmodule=tablet_history_gc-itest=2 and seeing the following
> >> in the log:
> >>
> >> I1122 15:42:48.355686 15303 tablet_history_gc-itest.cc:166] Saving
> >> snapshot at ts = P: 10 usec, L: 1 (40961)
> >> I1122 15:42:48.355768 15303 tablet_history_gc-itest.cc:380] Starting
> >> round 1
> >> ...
> >> I1122 15:42:48.355917 15303 tablet_history_gc-itest.cc:463] Updating
> row
> >> to { 511, 1535296252, 1535296251, NOT_DELETED }
> >> I1122 15:42:48.355926 15303 tablet_history_gc-itest.cc:463] Updating
> row
> >> to { 511, 1708088659, 1708088658, NOT_DELETED }
> >> ...
> >> I1122 15:42:48.975769 15303 tablet_history_gc-itest.cc:166] Saving
> >> snapshot at ts = P: 10 usec, L: 514 (409600514)
> >> I1122 15:42:48.975796 15303 tablet_history_gc-itest.cc:496] Updated 512
> >> rows
> >> I1122 15:42:48.975807 15303 tablet_history_gc-itest.cc:380] Starting
> >> round 2
> >> ...
> >> I1122 15:42:54.547406 15303 tablet_history_gc-itest.cc:213] Round 26:
> >> Verifying snapshot scan for timestamp P: 10 usec, L: 1559
> >> (409601559)
> >> ../../src/kudu/integration-tests/tablet_history_gc-itest.cc:242:
> Failure
> >> Value of: int_val
> >>   Actual: 1535296252
> >> Expected: snap_iter->second.int_val
> >> Which is: 1708088659
> >> at row key 5

Re: checkstyle on console during precommit builds

2016-11-28 Thread Dan Burkert
Thanks, Jordan!

On Mon, Nov 28, 2016 at 10:17 AM, Jordan Birdsell 
wrote:

> This has gone in, you will need to rebase any patches if you want to avoid
> the extra large build output.
>
> On Fri, Nov 25, 2016, 4:25 PM Jordan Birdsell 
> wrote:
>
> > This appears to have done it https://gerrit.cloudera.org/#/c/5225/.
> >
> > Used a SuppressionFilter module in the style config and that seemed to
> > work.
> >
> > On Fri, Nov 25, 2016 at 12:02 PM Mike Percy  wrote:
> >
> > I am familiar with checkstyle on Maven and could potentially take a look
> at
> > this, but it won't be before Tuesday.
> >
> > Mike
> >
> > On Wed, Nov 23, 2016 at 8:08 PM, Dan Burkert  wrote:
> >
> > > Nevermind, that didn't fix it.
> > >
> > > On Wed, Nov 23, 2016 at 11:59 AM, Dan Burkert 
> wrote:
> > >
> > > > Still double checking, but this seems to have done the trick:
> > > > https://gerrit.cloudera.org/#/c/5206/.  Not sure why, probably never
> > > will.
> > > >
> > > > - Dan
> > > >
> > > > On Wed, Nov 23, 2016 at 11:50 AM, Todd Lipcon 
> > wrote:
> > > >
> > > >> On Wed, Nov 23, 2016 at 11:13 AM, Dan Burkert 
> > wrote:
> > > >>
> > > >>> I'm looking into this a bit.  It does get disabled on generated
> > sources
> > > >>> when I run it locally, and there is a line in the pom.xml
> > > >>> 
> > calling
> > > >>> that out.  I haven't been able to pinpoint what is different on the
> > > build
> > > >>> slaves yet.  The version of checkstyle appears to match.
> > > >>>
> > > >>
> > > >> hrm, that's odd... are we pinning the maven plugin version? I doubt
> > JVM
> > > >> version makes a difference, but perhaps...
> > > >>
> > > >> When you run locally using the same invocation as
> > > >> build-support/jenkins/build-and-test.sh it does the same?
> > > >>
> > > >>
> > > >>>
> > > >>> - Dan
> > > >>>
> > > >>> On Wed, Nov 23, 2016 at 10:45 AM, William Berkeley <
> > > >>> wdberke...@cloudera.com> wrote:
> > > >>>
> > >  Sorry. Will get to it asap. Away from home right now. Should be
> able
> > > to
> > >  disable on generated source.
> > > 
> > >  On Wed, Nov 23, 2016 at 12:59 Todd Lipcon 
> > wrote:
> > > 
> > >  > Ping?
> > >  >
> > >  > On Sun, Nov 20, 2016 at 7:53 PM, Todd Lipcon  >
> > >  wrote:
> > >  >
> > >  > Hey folks,
> > >  >
> > >  > It seems after we enabled checkstyle on java builds, there's
> now a
> > >  ton of
> > >  > spew in the log for precommit builds, eg:
> > >  > http://104.196.14.100/job/kudu-gerrit/4670/BUILD_TYPE=TSAN/c
> > >  onsoleText
> > >  >
> > >  > the build log is now 12MB, about 11MB of which appears to be
> > 52,045
> > >  > checkstyle warnings. Of those, 51962 appear to be in generated
> > code
> > > :)
> > >  >
> > >  > So, a few questions:
> > >  > 1) can we get the checkstyle output to go to a separate file
> > instead
> > >  of
> > >  > the console?
> > >  > 2) can we disable checkstyle on generated sources?
> > >  >
> > >  > -Todd
> > >  > --
> > >  > Todd Lipcon
> > >  > Software Engineer, Cloudera
> > >  >
> > >  >
> > >  >
> > >  >
> > >  > --
> > >  > Todd Lipcon
> > >  > Software Engineer, Cloudera
> > >  >
> > > 
> > > >>>
> > > >>>
> > > >>
> > > >>
> > > >> --
> > > >> Todd Lipcon
> > > >> Software Engineer, Cloudera
> > > >>
> > > >
> > > >
> > >
> >
> >
>


Re: checkstyle on console during precommit builds

2016-11-28 Thread Jordan Birdsell
This has gone in, you will need to rebase any patches if you want to avoid
the extra large build output.

On Fri, Nov 25, 2016, 4:25 PM Jordan Birdsell 
wrote:

> This appears to have done it https://gerrit.cloudera.org/#/c/5225/.
>
> Used a SuppressionFilter module in the style config and that seemed to
> work.
>
> On Fri, Nov 25, 2016 at 12:02 PM Mike Percy  wrote:
>
> I am familiar with checkstyle on Maven and could potentially take a look at
> this, but it won't be before Tuesday.
>
> Mike
>
> On Wed, Nov 23, 2016 at 8:08 PM, Dan Burkert  wrote:
>
> > Nevermind, that didn't fix it.
> >
> > On Wed, Nov 23, 2016 at 11:59 AM, Dan Burkert  wrote:
> >
> > > Still double checking, but this seems to have done the trick:
> > > https://gerrit.cloudera.org/#/c/5206/.  Not sure why, probably never
> > will.
> > >
> > > - Dan
> > >
> > > On Wed, Nov 23, 2016 at 11:50 AM, Todd Lipcon 
> wrote:
> > >
> > >> On Wed, Nov 23, 2016 at 11:13 AM, Dan Burkert 
> wrote:
> > >>
> > >>> I'm looking into this a bit.  It does get disabled on generated
> sources
> > >>> when I run it locally, and there is a line in the pom.xml
> > >>> 
> calling
> > >>> that out.  I haven't been able to pinpoint what is different on the
> > build
> > >>> slaves yet.  The version of checkstyle appears to match.
> > >>>
> > >>
> > >> hrm, that's odd... are we pinning the maven plugin version? I doubt
> JVM
> > >> version makes a difference, but perhaps...
> > >>
> > >> When you run locally using the same invocation as
> > >> build-support/jenkins/build-and-test.sh it does the same?
> > >>
> > >>
> > >>>
> > >>> - Dan
> > >>>
> > >>> On Wed, Nov 23, 2016 at 10:45 AM, William Berkeley <
> > >>> wdberke...@cloudera.com> wrote:
> > >>>
> >  Sorry. Will get to it asap. Away from home right now. Should be able
> > to
> >  disable on generated source.
> > 
> >  On Wed, Nov 23, 2016 at 12:59 Todd Lipcon 
> wrote:
> > 
> >  > Ping?
> >  >
> >  > On Sun, Nov 20, 2016 at 7:53 PM, Todd Lipcon 
> >  wrote:
> >  >
> >  > Hey folks,
> >  >
> >  > It seems after we enabled checkstyle on java builds, there's now a
> >  ton of
> >  > spew in the log for precommit builds, eg:
> >  > http://104.196.14.100/job/kudu-gerrit/4670/BUILD_TYPE=TSAN/c
> >  onsoleText
> >  >
> >  > the build log is now 12MB, about 11MB of which appears to be
> 52,045
> >  > checkstyle warnings. Of those, 51962 appear to be in generated
> code
> > :)
> >  >
> >  > So, a few questions:
> >  > 1) can we get the checkstyle output to go to a separate file
> instead
> >  of
> >  > the console?
> >  > 2) can we disable checkstyle on generated sources?
> >  >
> >  > -Todd
> >  > --
> >  > Todd Lipcon
> >  > Software Engineer, Cloudera
> >  >
> >  >
> >  >
> >  >
> >  > --
> >  > Todd Lipcon
> >  > Software Engineer, Cloudera
> >  >
> > 
> > >>>
> > >>>
> > >>
> > >>
> > >> --
> > >> Todd Lipcon
> > >> Software Engineer, Cloudera
> > >>
> > >
> > >
> >
>
>