Flaky tablet_history_gc-itest

2016-09-26 Thread Todd Lipcon
This test has gotten flaky with a concerning failure mode (seeing "wrong"
results, not just a timeout or something):

http://dist-test.cloudera.org:8080/test_drilldown?test_name=tablet_history_gc-itest

It seems like it got flaky starting with Alexey's
commit bc14b2f9d775c9f27f2e2be36d4b03080977e8fa which switched it to use
AUTO_FLUSH_BACKGROUND. So perhaps the bug is actually a client bug and not
anything to do with GC.

Alexey, do you have time to take a look, and perhaps consult with Mike if
you think it's actually a server-side bug?

-Todd

-- 
Todd Lipcon
Software Engineer, Cloudera


Design doc for election storm mitigation

2016-09-26 Thread Todd Lipcon
Hi all,

I put a link to this in a JIRA comment, but figured I'd send a note to dev@
as well since it's easy to miss JIRA comments on issues you aren't watching.

Here's a document which covers an election storm issue that we've been
seeing in some of the more heavily-loaded test clusters at Cloudera, and
particularly badly in one where we are testing DWH-like workloads (TPC-DS,
TPC-H):

https://docs.google.com/document/d/1066W63e2YUTNnecmfRwgAHghBPnL1Pte_gJYAaZ_Bjo/edit

I've seen some users on the mailing list and Slack complaining of issues
which might be attributed to this as well, so I think it's important to
make some improvements in this area sooner rather than later.

The design document contains some info on how to reproduce and measure the
issue, as well as a list of ideas which could help fix the problem. I see
it more as a "roadmap of incremental improvements" rather than a "we must
complete 100% of these items". Perhaps if we just tackle the top items (in
terms of bang-for-buck) the problem will be sufficiently addressed that we
don't need to do the more difficult items.

Please take a look and feel free to leave comments/suggestions.
-Todd
-- 
Todd Lipcon
Software Engineer, Cloudera


Re: Kudu Build: Unsupported options error

2016-09-26 Thread Valencia Serrao


Hi Dan,

Thanks for the quick response.

As you suggested, I will disable TSAN support in build-definitions.sh and
let you know how it goes.

Meanwhile, please let me know whether it's advisable to hold off on the Kudu
port until Adar lands the thirdparty reorganisation.

Regards,
Valencia

From:    Dan Burkert
To:      kudu-dev
Cc:      Manish Patil/Austin/Contr/IBM@IBMUS, Sudarshan Jagadale/Austin/Contr/IBM@IBMUS,
         Nishidha Panpaliya/Austin/Contr/IBM@IBMUS, Valencia Serrao/Austin/Contr/IBM@IBMUS
Date:    09/23/2016 10:40 PM
Subject: Re: Kudu Build: Unsupported options error

Valencia,

If these are just issues with building libstdc++ in thirdparty, then the
effect on Kudu should be negligible.  libstdc++ is *only* used (linked to)
when building with thread sanitizer (TSAN) support.  Right now we
unconditionally build thirdparty libraries for TSAN when compiling on Linux
(including libstdc++), but you could easily opt out by changing the
conditionals here:
https://github.com/apache/kudu/blob/master/thirdparty/build-thirdparty.sh#L86

It should also be noted that in the next few days Adar will be landing a
big reorganization of the thirdparty build that will make thirdparty
libraries build for TSAN only when requested.  It's also going to switch out
libstdc++ for libc++ in thirdparty (still only used for TSAN).

- Dan

On Fri, Sep 23, 2016 at 2:47 AM, Valencia Serrao wrote:

  Hi All,

  I am building Kudu on ppc64le. However, while building Kudu's thirdparty
  "libstdc++-v3", I have encountered the following errors:
  Error1: clang-3.8: error: unsupported option '-print-multi-os-directory'
  Error2: clang-3.8: error: unknown argument: '-mlong-double-64'

  I couldn't find any documentation describing the equivalent options to
  use on ppc64le. To get around Error1, I removed the
  -print-multi-os-directory option and continued with the build.

  The build does progress with this workaround; however, I need to know:
  1. What is the impact of removing "-print-multi-os-directory"? Will it
  significantly affect the functionality for Impala?
  2. Can I take the same approach for Error2? What will be the impact on
  functionality?
  3. If removing the above two options has a significant impact, could
  you share the documentation or equivalent options to use on ppc64le?

  Any pointers on this issue will be helpful.

  Regards,
  Valencia