[Wikidata-bugs] [Maniphest] [Commented On] T117732: Create a Graphite instance in the Analytics cluster

2015-11-09 Thread Ottomata
Ottomata added a comment.

I'm not very opinionated here, but I think an analytics graphite instance will 
be very useful for other things than just this ticket, especially if these 
metrics are backed up by data in HDFS, and graphite is used as a visualization 
and rollup tool.


TASK DETAIL
  https://phabricator.wikimedia.org/T117732

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Ottomata
Cc: Lydia_Pintscher, fgiunchedi, Christopher, JanZerebecki, Nuria, Ottomata, 
Aklapper, Addshore, StudiesWorld, Wikidata-bugs, aude, Mbch331



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T117732: Create a Graphite instance in the Analytics cluster

2015-11-09 Thread Addshore
Addshore added a comment.

> As far as I understand Graphite, using it as a source for the backup means 
> that we are losing data after the retention cutoff. So the source for the 
> backup needs to be somewhere before things are written into Graphite?


Yes, but we can simply set retention to 100 years and leave it for future us to 
worry about.
At a guess, we won't still be using Graphite in 100 years, at least not in this 
version.




[Wikidata-bugs] [Maniphest] [Commented On] T117732: Create a Graphite instance in the Analytics cluster

2015-11-09 Thread Christopher
Christopher added a comment.

I am not sure why this is considered to be "a simple use case" since as 
mentioned in https://phabricator.wikimedia.org/T117735 there are at least two 
different requirements.  Content metrics require long term (non-decaying) 
storage, operational metrics do not.

Whisper (Graphite's database) is not robust and has a fixed size.  Even the 
documentation says it is not "disk space efficient".   Of course, if we assume 
that the need is only to record a small number of data points with a low 
resolution, none of this matters.
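
To put rough numbers on the fixed-size point, here is a back-of-the-envelope 
sketch. It assumes Whisper's documented file layout (a 16-byte header, 12 bytes 
per archive descriptor, 12 bytes per data point) and a hypothetical retention of 
one point per day for 100 years. The function names and the retention string are 
illustrative and do not come from this thread.

```python
# Sizes assumed from Whisper's documented on-disk layout.
METADATA_SIZE = 16      # aggregation type, max retention, xFilesFactor, archive count
ARCHIVE_INFO_SIZE = 12  # offset, seconds per point, number of points
POINT_SIZE = 12         # 4-byte timestamp + 8-byte double value

# Time units accepted in retention strings (a year counted as 365 days).
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800, "y": 31536000}


def parse_duration(text):
    """Convert a string such as '1d' or '100y' into seconds."""
    return int(text[:-1]) * UNIT_SECONDS[text[-1]]


def whisper_file_size(retentions):
    """Estimate the size in bytes of one Whisper file for a retention string."""
    archives = [part.strip() for part in retentions.split(",")]
    size = METADATA_SIZE + ARCHIVE_INFO_SIZE * len(archives)
    for archive in archives:
        precision, duration = archive.split(":")
        points = parse_duration(duration) // parse_duration(precision)
        size += points * POINT_SIZE
    return size


if __name__ == "__main__":
    print(whisper_file_size("1d:100y"), "bytes per metric")
```

Under these assumptions a daily metric kept for 100 years needs about 438 kB, 
allocated up front whether or not any points have been written.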

The added complexity of introducing backups, HDFS, etc. to the Graphite 
proposition does not seem "simple".  Also, the puppet module would still need 
to be reconfigured/modified for long term retention, as @Addshore tried to do, 
but this does not solve the archiving problem.   There has to be a built-in way 
to preserve and "snapshot" the database, or else it could be a real pain to 
restore.  And, in the interim period from snapshot to restoration, all 
measurements would be lost, unless it were on a cluster.

As far as I know, Cassandra can also run on a single instance; it does not need 
a cluster.




[Wikidata-bugs] [Maniphest] [Commented On] T117732: Create a Graphite instance in the Analytics cluster

2015-11-09 Thread Addshore
Addshore added a comment.

As said above, whatever the solution, we will want to take backups in some form 
or another.
Taking backups of Graphite is trivial, and adding data back if something goes 
wrong is trivial.

> But see my previous post on how to do backups from Graphite in a way that 
> sufficiently works around that.


Yes, this would work. And we would need to do a similar thing for any backup 
solution, no matter what we are backing up.
I mean, as a really crude second storage method we could simply write 
everything we send to Graphite to a text file; then re-adding it in case 
something goes wrong would be as simple as replaying the file.
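
A minimal sketch of that crude text-file approach, assuming each point is logged 
as one line in the form "metric value timestamp". The function names, the log 
path and the `send` callback are hypothetical; `send` stands in for whatever 
actually delivers a line to Graphite.

```python
import time


def record_point(log_path, metric, value, timestamp=None):
    """Append one data point to the log file and return the line written."""
    if timestamp is None:
        timestamp = int(time.time())
    line = f"{metric} {value} {timestamp}\n"
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(line)
    return line


def replay(log_path, send):
    """Pass every logged line to `send` again; return how many were replayed."""
    count = 0
    with open(log_path, encoding="utf-8") as log:
        for line in log:
            if line.strip():  # skip blank lines
                send(line)
                count += 1
    return count
```

The caller would send the line returned by `record_point` to Graphite as usual, 
so the file stays an append-only copy of everything that was sent.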




[Wikidata-bugs] [Maniphest] [Commented On] T117732: Create a Graphite instance in the Analytics cluster

2015-11-09 Thread JanZerebecki
JanZerebecki added a comment.

I agree with https://phabricator.wikimedia.org/T117732#1793386. (Though I have 
no clear opinion on whether it should be a different instance than the one used 
for operations metrics.)
@Addshore Having the primary storage of these metrics in HDFS, which I assume 
has the backup question solved, sounds like another way to address my concern 
from above. Which one should we use for the Wikidata related metrics that are 
not generated in Hadoop?




[Wikidata-bugs] [Maniphest] [Commented On] T117732: Create a Graphite instance in the Analytics cluster

2015-11-09 Thread Christopher
Christopher added a comment.

If not HBase, what about Cassandra?  This is already puppetized.  At least you 
will be using a storage solution that is designed for HDFS.




[Wikidata-bugs] [Maniphest] [Commented On] T117732: Create a Graphite instance in the Analytics cluster

2015-11-09 Thread Addshore
Addshore added a comment.

> And when this setting changes without us noticing until the backup is 
> overwritten, it overwrites something that is supposed to be append only


Quote from http://graphite.readthedocs.org/en/latest/config-carbon.html :

> This retention is set at the time the first metric is sent.


This means that if the configuration changes, retention will remain the same for 
existing metrics, and thus the data will remain.




[Wikidata-bugs] [Maniphest] [Commented On] T117732: Create a Graphite instance in the Analytics cluster

2015-11-09 Thread JanZerebecki
JanZerebecki added a comment.

> I really don't know why we are all expecting graphite to unexpectedly lose 
> our data?


I really don't know why we are all expecting this winter to unexpectedly get 
colder than the summer?
That German public long-distance train operators don't expect the winter 
doesn't mean you should make the same error.
It would surprise me if something designed to never keep data permanently, and 
to always delete it after a certain time, were good for permanent long-term 
storage. Notice how there is no indefinite retention setting.
But see my previous post on how to do backups from Graphite in a way that 
sufficiently works around that.




[Wikidata-bugs] [Maniphest] [Commented On] T117732: Create a Graphite instance in the Analytics cluster

2015-11-09 Thread JanZerebecki
JanZerebecki added a comment.

In https://phabricator.wikimedia.org/T117732#1794036, @Addshore wrote:

> > As far as I understand Graphite, using it as a source for the backup means 
> > that we are losing data after the retention cutoff. So the source for the 
> > backup needs to be somewhere before things are written into Graphite?
>
>
> Yes, but we can simply set retention to 100 years and leave it for future us 
> to worry about.


And when this setting changes without us noticing until the backup is 
overwritten, it overwrites something that is supposed to be append-only. It 
seems simpler to write to Graphite and something else in addition. But if you 
really want to back up from Graphite, I guess you could keep infinite backup 
copies to prevent this. Then one also needs to make sure to run the backup each 
time after we write new data to Graphite.




[Wikidata-bugs] [Maniphest] [Commented On] T117732: Create a Graphite instance in the Analytics cluster

2015-11-09 Thread Addshore
Addshore added a comment.

> Content metrics require long term (non-decaying) storage, operational metrics 
> do not.


Both cases can be covered by configuration.

> Of course, if we assume that the need is only to record a small number of 
> data points with a low resolution, none of this matters.


That is my current assumption, backed by having a limited number of things to 
record (an incredibly small amount compared with what is on the current 
Graphite instance).

> The added complexity of introducing backups and HDFS


Well, we need not add HDFS. Backups can simply call the API and dump a TSV, 
which I guess could easily be stored in HDFS, or somewhere else. Or just a cron 
job backing up the actual Graphite database. This could even just live on Labs?
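
A rough sketch of the "call the API and dump a TSV" idea, assuming Graphite's 
render API is queried with `format=json`, which returns a list of series, each 
with a `target` and `datapoints` given as `[value, timestamp]` pairs. The base 
URL, metric name, output path and function names are all hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def render_url(base_url, target, since="-1y"):
    """Build a render API URL that asks for JSON output."""
    query = urlencode({"target": target, "from": since, "format": "json"})
    return f"{base_url}/render?{query}"


def series_to_tsv(series):
    """Turn decoded render-API JSON into 'target<TAB>timestamp<TAB>value' lines."""
    lines = []
    for entry in series:
        for value, timestamp in entry["datapoints"]:
            if value is None:  # empty slots are reported as null
                continue
            lines.append(f"{entry['target']}\t{timestamp}\t{value}")
    return "\n".join(lines) + ("\n" if lines else "")


def dump_metric(base_url, target, out_path, since="-1y"):
    """Fetch one target from Graphite and write it to a TSV file."""
    with urlopen(render_url(base_url, target, since)) as response:
        series = json.load(response)
    with open(out_path, "w", encoding="utf-8") as out:
        out.write(series_to_tsv(series))
```

A cron job could call `dump_metric` daily or weekly. Note that such an export 
only sees what Graphite still retains at the time it runs.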

I really don't know why we are all expecting Graphite to unexpectedly lose our 
data? If you configure it to keep the data for 100 years / 25 years / whatever, 
it will. I see no reports of parts of Graphite's databases vanishing when not 
already configured to do so.




[Wikidata-bugs] [Maniphest] [Commented On] T117732: Create a Graphite instance in the Analytics cluster

2015-11-09 Thread Ottomata
Ottomata added a comment.

Cassandra is a complicated clustered solution; Graphite will just require a 
simple single instance running and taking in events somewhere.




[Wikidata-bugs] [Maniphest] [Commented On] T117732: Create a Graphite instance in the Analytics cluster

2015-11-09 Thread fgiunchedi
fgiunchedi added a comment.

@addshore to clarify: more than functionality, I was pointing out guarantees 
about the data stored. If the metrics are also being archived to HDFS, for 
example, so that it is possible to dump/load them into Graphite, then IMO 
that's acceptable.
Re: analytics Graphite instance, I think there's value in a single shared 
instance for ease of use, even though Grafana, for example, supports mixed 
dashboards, so it is possible to collate multiple Graphite sources.




[Wikidata-bugs] [Maniphest] [Commented On] T117732: Create a Graphite instance in the Analytics cluster

2015-11-09 Thread Christopher
Christopher added a comment.

If you are going to use HDFS, why not just use HBase instead of Graphite?




[Wikidata-bugs] [Maniphest] [Commented On] T117732: Create a Graphite instance in the Analytics cluster

2015-11-09 Thread Nuria
Nuria added a comment.

I second @ottomata




[Wikidata-bugs] [Maniphest] [Commented On] T117732: Create a Graphite instance in the Analytics cluster

2015-11-09 Thread Addshore
Addshore added a comment.

> I think there's value in a single shared instance for ease of use


Well, this was also my initial thought. Until Joe said:

> any opsen if given the choice between a 10 minutes downtime of a monitoring 
> tool and dropping old data will choose the latter


@JanZerebecki I imagine that wherever we put the data, we will back it up.
Right now they are mainly stored in SQL tables... which I back up / export.
If they were stored in Graphite I would likely run a daily / weekly export too.
In fact, even if they were primarily in HDFS I would still want them backed up 
elsewhere.

> Which one should we use for the Wikidata related metrics that are not 
> generated in Hadoop?


Well, as it stands basically no metrics are generated in hadoop.

We simply want to store time series data / numbers and timestamps. Have them 
easily writeable, accessible, backed up, integrated into other solutions. 
Graphite + a backup script / weekly / daily export does all of this.

If months down the line we decide on a potentially better solution that the 
analytics team may also like, migrating to it should be trivial.




[Wikidata-bugs] [Maniphest] [Commented On] T117732: Create a Graphite instance in the Analytics cluster

2015-11-09 Thread JanZerebecki
JanZerebecki added a comment.

As far as I understand Graphite, using it as a source for the backup means that 
we are losing data after the retention cutoff. So the source for the backup 
needs to be somewhere before things are written into Graphite?




[Wikidata-bugs] [Maniphest] [Commented On] T117732: Create a Graphite instance in the Analytics cluster

2015-11-09 Thread Addshore
Addshore added a comment.

Per http://graphite.readthedocs.org/en/latest/overview.html Graphite does 
exactly what we need here.
It "**Stores numeric time-series data**" and "**Renders graphs of this data on 
demand**" (as well as providing an API for data access).

We want to say that the value of X at point in time Y was Z, which is again 
exactly what Graphite accepts. I mean, the Graphite plaintext protocol is simply 
"<metric path> <metric value> <metric timestamp>".
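
A minimal sketch of sending one such data point, assuming carbon's plaintext 
listener is reachable on its default port 2003 and expects one 
"metric value timestamp" line per point. The host name, metric name and function 
names are hypothetical.

```python
import socket
import time


def format_point(metric, value, timestamp=None):
    """Format one data point as a plaintext-protocol line."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{metric} {value} {int(timestamp)}\n"


def send_point(host, metric, value, timestamp=None, port=2003):
    """Open a TCP connection to carbon and send a single data point."""
    message = format_point(metric, value, timestamp).encode("ascii")
    with socket.create_connection((host, port), timeout=10) as conn:
        conn.sendall(message)


if __name__ == "__main__":
    # "The value of X at point in time Y was Z."
    print(format_point("wikidata.items.total", 42, 1447027200), end="")
```

Calling `send_point("graphite.example", "wikidata.items.total", 42)` would 
record the value 42 at the current time, provided a carbon instance is listening 
at that hypothetical host.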



Of course if people feel this is really unlikely to happen then we can just 
close this and write our own small solution.

