Justin Lebar wrote:
It sounds to me like people want both
1) Easier access to aggregated data so they can build their own
dashboards roughly comparable in features to the current dashboards.
I doubt people actually want to build their own dashboards. I suspect this is
mainly a need arising from deficiencies in the current dashboard.
2) Easier access to raw databases so that people can build up more
complex analyses, either by exporting the raw data from the db, or by
analyzing it in the db.
That is, I don't think we can or should export JSON with all the data
in our databases. That is a lot of data.
From the concrete examples I've seen so far, people want basic
aggregations. My FE at http://people.mozilla.org/~tglek/dashboard/ works
on aggregated histogram JSONs. It seems completely reasonable to
aggregate all of the other info + simple_measurement fields as well (this
is on my TODO). That would cover all of the other concrete use-cases
mentioned (Flash versions, hardware stats).
I think we can be more aggressive still. We could also allow filtering
certain histograms by one of the highly variable info fields (e.g. tab
animations vs gfx hardware, specific chromehangs vs something useful,
etc.) without unreasonable overhead.
I like my aggregated-JSON approach because it's cheap on server CPU, and
as long as one partitions the JSON carefully, it stays compact enough that
gzip encoding makes it fast to download. This should also make it easy to
fork the dashboards, contribute, etc.
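To make the aggregated-JSON idea concrete, here is a minimal sketch of the aggregation step. The per-submission record shape, the histogram name, and the output layout are all my assumptions for illustration, not the actual telemetry schema:

```python
import gzip
import json
from collections import defaultdict

def aggregate_histograms(submissions):
    """Sum per-submission histogram bucket counts into one JSON-able dict.

    Each submission is assumed to look like
    {"histograms": {"NAME": {"bucket": count, ...}, ...}} -- a guess at
    the shape, not the real payload format.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for sub in submissions:
        for name, buckets in sub.get("histograms", {}).items():
            for bucket, count in buckets.items():
                totals[name][bucket] += count
    return {name: dict(buckets) for name, buckets in totals.items()}

# Two fake submissions; "GC_MS" is an invented histogram name.
subs = [
    {"histograms": {"GC_MS": {"0": 5, "10": 2}}},
    {"histograms": {"GC_MS": {"0": 1, "20": 3}}},
]
aggregated = aggregate_histograms(subs)

# A partitioned, gzip-compressed JSON blob like this can be served as a
# static file, which is what keeps the server-CPU cost near zero.
blob = gzip.compress(json.dumps(aggregated).encode("utf-8"))
```

The point of partitioning (one file per channel/version/histogram, say) is that each dashboard view only downloads the small blob it needs, and gzip does the rest.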
I hope to feed more data into my frontend by end of today and will aim
for a live-ish dashboard by end of next week.
For advanced use-cases, we can stick with hadoop querying.
==Help wanted==
If anyone knows a dev who is equally good at stats and programming, let me
know. I think we have a lot of useful data, and we can handle some
visualizations of that data, but a person skilled at extracting signal
from noisy sources could help us squeeze the most use out of our data.
I spend too much time on management to make quick progress; I wrote the
prototype to prove to myself that the JSON schema is feasible.
If someone wants to help with aggregations, I can hook you up with raw
JSON dumps from hadoop. For everything else, the code is on
github (https://github.com/tarasglek/telemetry-frontend).
Help wanted: UX improvements such as easier-to-use selectors,
incremental search, and switching to superior charting such as flotcharts.org.
On Thu, Feb 28, 2013 at 12:08 PM, Benjamin Smedberg
benja...@smedbergs.us wrote:
On 2/28/2013 10:59 AM, Benoit Jacob wrote:
Because the raw crash files do not include new metadata fields, this has
led to weird engineering practices like shoving interesting metadata into
the freeform app notes field, and then parsing that data back out later.
I'm worried about perpetuating this kind of behavior, which is hard on the
database and leads to very arcane queries in many cases.
I don't agree with the notion that freeform fields are bad. Freeform plain
text is an amazing file format. It allows adding any kind of data without
administrative overhead and is still easy to parse (provided the data was
formatted with easy parsing in mind).
The obvious disadvantage is that it is much more difficult to
machine-process. For example, elasticsearch can't index on it (at least not
without lots of custom parsing), and in general you can't ask tools like
hbase or elasticsearch to filter on it without a user-defined function.
(Regexes might work for some kinds of text processing.)
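As a concrete illustration of the "shove metadata into app notes, parse it back out" pattern being discussed, here is a hypothetical sketch; the note format, field names, and hex-value convention are invented for the example, not taken from the actual crash-stats data:

```python
import re

# Invented example of a freeform app-notes string carrying gfx metadata.
APP_NOTES = "AdapterVendorID: 0x10de, AdapterDeviceID: 0x0a65, WebGL+"

def parse_app_notes(notes):
    """Pull 'Key: 0xHEX' pairs back out of freeform text with a regex.

    This is exactly the kind of custom parsing that an indexer like
    elasticsearch can't do for you out of the box.
    """
    return dict(re.findall(r"(\w+):\s*(0x[0-9a-fA-F]+)", notes))

fields = parse_app_notes(APP_NOTES)
```

Note how fragile this is: any field whose value doesn't match the expected pattern (like the bare "WebGL+" token) is silently dropped, which is part of why parsing structured data out of freeform text leads to arcane queries.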
But if one considers it a bad thing that people use it, then one should
address the issues that are causing people to use it. As you mention, raw
crash files may not include newer metadata fields. So maybe that can be
fixed by making it easier or even automatable to include new fields in raw
crash files?
Yes, that is all filed. We can't automatically include new fields, because we
don't know whether they are supposed to be public or private, but we should
soon be able to have a dynamically updatable list.
Note that if mcmanus is correct, we're going to be dealing with 1M fields
per day here. That's a lot more than the 250k from crash-stats, especially
because the payload is bigger. I believe that the flat files from
crash-stats are a really useful kludge because we couldn't figure out a
better way to expose the raw data. But that kludge will start to fall over
pretty quickly, and perhaps we should just expose a better way to do queries
using the databases, which are surprisingly good at doing these kinds of
queries efficiently.
--BDS
___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform