Re: [iovisor-dev] minutes: IO Visor TSC/Dev Meeting

2017-05-17 Thread Alexei Starovoitov via iovisor-dev
On Wed, May 17, 2017 at 2:14 PM, Brenden Blanco via iovisor-dev wrote:
> Hi All,
>
> Thanks for attending the call today, here are the notes.

thanks a lot for taking and publishing these notes!

> CFP for Linux Plumbers
> Sept 13-15, Los Angeles
> Tracing microconference accepted
>  (Alexei, Brendan, Josef leading topics)
>  http://wiki.linuxplumbersconf.org/2017:tracing
> Last year was successful, looking forward to more topics
>  (tbd: find previous etherpad)

https://etherpad.openstack.org/p/LPC2016_Tracing
a bit terse, but if you were there last year, it's a good memory refresh :)


Re: [iovisor-dev] Three audiences

2017-05-17 Thread Suchakra via iovisor-dev
Hmm, OK, I'll course-correct myself and add one more cent to the discussion.

So, as per my understanding of Brendan's thought process, I think
there is greater value in inserting BPF/bcc at the snap and collectd
stage (so that we basically outsource trace management to these early
event-collection interfaces). Alternatively, we could have a
BCC-specific uniform interface that provides this trace management and
control (at the risk of sounding like a complete tracing system). I'll
let the experts give their opinion on which is more congenial here. A
uniform interface would allow better control over what is collected,
would give other event-collection systems such as snap an interface to
plug into, and may eventually simplify implementing custom plugins for
all such systems.
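
To make that concrete, here is a rough sketch (using bcc's Python
bindings) of what such a uniform interface could look like. Collector
and StdoutSink are invented names purely for illustration; a
snap/collectd/fluentd backend would just be another sink implementing
the same emit():

# Rough sketch only: a thin "uniform collection interface" on top of bcc
# with pluggable sinks. None of these class names exist in bcc today.
from bcc import BPF

bpf_text = """
#include <uapi/linux/ptrace.h>

struct event_t {
    u32 pid;
    u64 ts_ns;
};
BPF_PERF_OUTPUT(events);

int trace_sync(struct pt_regs *ctx) {
    struct event_t ev = {};
    ev.pid   = bpf_get_current_pid_tgid() >> 32;
    ev.ts_ns = bpf_ktime_get_ns();
    events.perf_submit(ctx, &ev, sizeof(ev));
    return 0;
}
"""

class StdoutSink(object):
    """Trivial sink; other backends would implement the same emit()."""
    def emit(self, event):
        print("pid=%d ts=%d" % (event.pid, event.ts_ns))

class Collector(object):
    """The 'uniform interface': owns the BPF program, fans events out."""
    def __init__(self, sinks):
        self.sinks = sinks
        self.bpf = BPF(text=bpf_text)
        self.bpf.attach_kprobe(event=self.bpf.get_syscall_fnname("sync"),
                               fn_name="trace_sync")
        self.bpf["events"].open_perf_buffer(self._on_event)

    def _on_event(self, cpu, data, size):
        event = self.bpf["events"].event(data)
        for sink in self.sinks:
            sink.emit(event)

    def run(self):
        while True:
            self.bpf.perf_buffer_poll()

if __name__ == "__main__":
    Collector([StdoutSink()]).run()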

> Scope by Weaveworks:
>   * HTTP Stats plugin :
> https://github.com/weaveworks-plugins/scope-http-statistics
>   * tcptracer-bpf : https://github.com/weaveworks/tcptracer-bpf
> [Kinvolk's effort]
>
> Vector
>   * There was some discussion that died down :
> https://github.com/Netflix/vector/issues/74
>
> Graylog
> InfluxDB - time series event collection
> Prometheus
> Fluentd
> Grafana - time series visualizations
> Sysdig
> LTTng - if they agree to support events collected from eBPF. It
> already allows perf contexts
> (http://lttng.org/docs/v2.9/#doc-adding-context)
> Trace Compass (visualize stored CTF/XML/JSON or "whatever format"
> traces. We can define our own views in XML that we may package with
> eBPF tools - stuff which I am working on in my free time)
> HoneyComb
> Catapult (Chrome Trace Viewer)

So, in view of the above, out of this list the only relevant additions
I gave are fluentd and LTTng. We just need to be sure that the
collected event data format is compact enough that we don't lose events
as they are delivered. I have mostly worked with the post-mortem
analysis use case (C2), and the primary concern there is minimizing
event loss, since the incoming trace data is huge and the analysis
needs as many events as possible for a better picture.

--
Suchakra


[iovisor-dev] minutes: IO Visor TSC/Dev Meeting

2017-05-17 Thread Brenden Blanco via iovisor-dev
Hi All,

Thanks for attending the call today, here are the notes.

==

CFP for Linux Plumbers
Sept 13-15, Los Angeles
Tracing microconference accepted
 (Alexei, Brendan, Josef leading topics)
 http://wiki.linuxplumbersconf.org/2017:tracing
Last year was successful, looking forward to more topics
 (tbd: find previous etherpad)

==
BCC has grown, albeit with some maturity/sore spots

We have a well-established core group, but probably need to expand our
feedback loop to incorporate those who may not be speaking up. Think users
of the tools rather than developers.

How do we get broader user-group feedback besides the mailing list and
GitHub issues?

Brendan:
#1 feedback he has heard: simplify the syntax, still compared to dtrace
  (sketch below)
 awareness via conferences, etc.
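
(For a sense of what the "simplify syntax" feedback is about: counting
a single event with bcc today takes an embedded C program plus Python
glue, where DTrace users expect a one-liner. A minimal, hedged sketch
follows.)

# counting sync() calls with bcc today: embedded C + Python glue.
# the DTrace one-liner this gets compared against is roughly:
#   dtrace -n 'syscall::sync:entry { @ = count(); }'
from bcc import BPF
from time import sleep

b = BPF(text="""
BPF_HASH(counts, u32, u64);

int do_count(void *ctx) {
    u32 key = 0;
    counts.increment(key);
    return 0;
}
""")
b.attach_kprobe(event=b.get_syscall_fnname("sync"), fn_name="do_count")

sleep(5)
total = sum(v.value for v in b["counts"].values())
print("sync() calls in 5s: %d" % total)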

Alexei:
 internally, writing programs is hard, struggling with verifier
  - working on llvm improvements
 many kernel hackers @fb have hacked up bcc tools for their own use case

Brenden:
 reach larger user group => collectd integration?
  command-line users are the smaller subset of our potential audience
 Intel snapd
 collectd
 others?

Going forward:
It's a big library, needs shrink/split
 both in terms of memory usage and disk size
 split out clang/llvm
 split out loader
AI: get some proposals into bcc/issues, either in documentation form
 to start with, or code
 split header files
 split libraries
 start thinking about external tool integration

==
Alexei:
has started thinking about program chaining (generically)

mistake: tail call solved a need, but the helper/map model makes it
 inflexible (see the sketch at the end of this section)

prog chaining should be a single native instruction
 (not done until now because "too hard to verify")
now should be possible to introduce and verify cleanly
direct, indirect, and tail call support
 (stack preserving)
3 new bpf instructions
introduce global variables
verifier changes started
llvm support
 (looks like normal c)
configurable stack upper bound of function
loader would need to parse elf file
main() function concept, still callable via tail call

So far, the above is perhaps 10% complete, so stay tuned.
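
(For context on the "helper/map model" point: today chaining goes
through the bpf_tail_call() helper and a BPF_MAP_TYPE_PROG_ARRAY that
user space fills with program fds, so the verifier never sees the call
target directly and the caller's stack is not preserved. A minimal
bcc-flavored sketch of the current model, with invented function
names, follows.)

from bcc import BPF
from ctypes import c_int

text = """
#include <uapi/linux/ptrace.h>

BPF_PROG_ARRAY(jmp_table, 8);

int handle_stage2(struct pt_regs *ctx) {
    /* second stage of the chain would do its work here */
    return 0;
}

int handle_entry(struct pt_regs *ctx) {
    /* bcc rewrites this into bpf_tail_call(ctx, &jmp_table, 2);
       on success, control transfers and never returns here */
    jmp_table.call(ctx, 2);
    return 0;   /* reached only if slot 2 is empty */
}
"""

b = BPF(text=text)
b.attach_kprobe(event=b.get_syscall_fnname("sync"), fn_name="handle_entry")

# user space glues the chain together through the map, by program fd
stage2 = b.load_func("handle_stage2", BPF.KPROBE)
b.get_table("jmp_table")[c_int(2)] = c_int(stage2.fd)
# (a real tool would now poll or sleep; omitted for brevity)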

==
Attendees:
Alexei Starovoitov
Bilal Anwer
Jesper Brouer
Martin Lau
Mauricio Vasquez
Brenden Blanco
Andy Gospodarek
Nic Viljoen
Daniel Borkmann
Suchakra Sharma
Brendan Gregg
John Fastabend
Teng Qin
Jakub Kicinski
Marco Leogrande


[iovisor-dev] Three audiences

2017-05-17 Thread Brendan Gregg via iovisor-dev
As discussed on the call, here's who I think are the three audiences for
bcc/BPF tracing:

A) Kernel hackers (~100): who would like the language to be easier, better
errors, docs, etc.
B) Sysadmins/operators/senior devs (~5k): who will use the existing bcc
tools (and edit some).
C) Everyone else (>50k): via dashboards/GUIs.

I'm pretty happy with where the toolset is for audience B. There are a few
more tools I'd like to add (TCP internals and buffering, disk I/O
internals), but a lot can be done now.

Audience A might be served by something like ply, or partially served by
Sasha's multi-tools.

But we haven't spoken much about audience C, which will end up the biggest.
And that will include most of the developers at Netflix (I think only a few
of us are going to ssh onto instances and run the bcc tools, even though we
install them by default now).

I did file this for Intel snap:
https://github.com/intelsdi-x/snap/issues/800 . Sounds like they may need
our help. We should also file something similar for collectd... What else?

Brendan