Re: [tor-bugs] #28799 [Metrics/Website]: Use readr's read_csv() to speed up drawing graphs

2019-05-16 Tor Bug Tracker & Wiki
#28799: Use readr's read_csv() to speed up drawing graphs
-----------------------------+------------------------------
 Reporter:  karsten          |         Owner:  metrics-team
     Type:  enhancement      |        Status:  closed
 Priority:  Low              |     Milestone:
Component:  Metrics/Website  |       Version:
 Severity:  Normal           |    Resolution:  fixed
 Keywords:                   | Actual Points:
Parent ID:                   |        Points:
 Reviewer:                   |       Sponsor:
-----------------------------+------------------------------
Changes (by karsten):

 * status:  assigned => closed
 * resolution:   => fixed


Comment:

 We already [https://gitweb.torproject.org/metrics-web.git/commit/?id=a94a3844644041f7c1f6e0a4451e19ce12cae9e8
 switched] to readr for all remaining graphs in January. Time to close this
 ticket.

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #28799 [Metrics/Website]: Use readr's read_csv() to speed up drawing graphs

2018-12-19 Tor Bug Tracker & Wiki
#28799: Use readr's read_csv() to speed up drawing graphs
-----------------------------+------------------------------
 Reporter:  karsten          |         Owner:  metrics-team
     Type:  enhancement      |        Status:  assigned
 Priority:  Low              |     Milestone:
Component:  Metrics/Website  |       Version:
 Severity:  Normal           |    Resolution:
 Keywords:                   | Actual Points:
Parent ID:                   |        Points:
 Reviewer:                   |       Sponsor:
-----------------------------+------------------------------
Changes (by karsten):

 * owner:  karsten => metrics-team
 * status:  accepted => assigned



Re: [tor-bugs] #28799 [Metrics/Website]: Use readr's read_csv() to speed up drawing graphs (was: Use R.cache to speed up drawing graphs)

2018-12-19 Tor Bug Tracker & Wiki
#28799: Use readr's read_csv() to speed up drawing graphs
-----------------------------+------------------------------
 Reporter:  karsten          |         Owner:  karsten
     Type:  enhancement      |        Status:  accepted
 Priority:  Low              |     Milestone:
Component:  Metrics/Website  |       Version:
 Severity:  Normal           |    Resolution:
 Keywords:                   | Actual Points:
Parent ID:                   |        Points:
 Reviewer:                   |       Sponsor:
-----------------------------+------------------------------
Changes (by karsten):

 * status:  merge_ready => accepted
 * priority:  Medium => Low


Old description:

> Let's use R.cache to speed up drawing graphs. I already prepared a patch
> that I'm going to post here as soon as I have a ticket number. From the
> commit message:
>
> Over two years ago, in commit 1f90b72 from October 2016, we made our user
> graphs faster by no longer reading the large .csv file on demand. Instead
> we read it once as part of the daily update, saved it to disk as an .RData
> file using R's save() function, and loaded it back into memory using R's
> load() function when drawing a graph.
>
> This approach worked okay. It just had two disadvantages:
>
>  1. We had to write a small amount of R code for each graph type, which
> is why we only did it for graphs with large .csv files.
>  2. Running these small R scripts as part of the daily update made it
> harder to move away from Ant towards a Java-only execution model.
>
> The new approach implemented in this commit uses R.cache, which caches
> data for use by concurrent Rserve clients. The first time we read a .csv
> file we save it to the cache, and all subsequent times we just load it
> back from the cache. We're using the file name and last-modified time as
> the cache key to avoid using stale data. We're also clearing the cache
> on startup to avoid running out of disk space.
>
> One somewhat unwanted side effect is that drawing the first graph from a
> new .csv file may take a few more seconds as compared to drawing
> subsequent graphs. This seems acceptable, though.
>
> Requires installing the R.cache package from CRAN, which is available on
> Debian as r-cran-r.cache.

New description:

 Let's use R.cache to speed up drawing graphs. I already prepared a patch
 that I'm going to post here as soon as I have a ticket number. From the
 commit message:

 Over two years ago, in commit 1f90b72 from October 2016, we made our user
 graphs faster by no longer reading the large .csv file on demand. Instead
 we read it once as part of the daily update, saved it to disk as an .RData
 file using R's save() function, and loaded it back into memory using R's
 load() function when drawing a graph.

 This approach worked okay. It just had two disadvantages:

  1. We had to write a small amount of R code for each graph type, which is
 why we only did it for graphs with large .csv files.
  2. Running these small R scripts as part of the daily update made it
 harder to move away from Ant towards a Java-only execution model.

 The new approach implemented in this commit uses R.cache, which caches
 data for use by concurrent Rserve clients. The first time we read a .csv
 file we save it to the cache, and all subsequent times we just load it
 back from the cache. We're using the file name and last-modified time as
 the cache key to avoid using stale data. We're also clearing the cache
 on startup to avoid running out of disk space.
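 To make the keying scheme in the paragraph above concrete, here is a minimal, language-neutral sketch written in Python rather than R. It is not R.cache's API and not the code from the patch: the class name `CsvCache`, the pickle files on disk, and the write-then-rename step (one way to stay safe with concurrent clients) are all illustrative assumptions. It only shows how keying on file name plus last-modified time avoids stale data, and how clearing on startup bounds disk usage.

```python
import csv
import hashlib
import os
import pickle
import shutil
import tempfile


class CsvCache:
    """Disk-backed cache of parsed .csv files (hypothetical, not R.cache's API).

    Entries are keyed by the file's absolute path plus its last-modified
    time, so rewriting a .csv file automatically bypasses its old entry.
    """

    def __init__(self, cache_dir):
        self.cache_dir = cache_dir
        # Clear the whole cache on startup so old entries cannot fill the disk.
        # Only point this at a directory used exclusively for the cache.
        shutil.rmtree(cache_dir, ignore_errors=True)
        os.makedirs(cache_dir)

    def _key(self, path):
        # File name + last-modified time (in nanoseconds) form the cache key.
        mtime_ns = os.stat(path).st_mtime_ns
        raw = f"{os.path.abspath(path)}|{mtime_ns}".encode("utf-8")
        return hashlib.sha256(raw).hexdigest()

    def read_csv(self, path):
        entry = os.path.join(self.cache_dir, self._key(path) + ".pickle")
        if os.path.exists(entry):
            # Cache hit: load the rows parsed during an earlier call.
            with open(entry, "rb") as f:
                return pickle.load(f)
        # Cache miss: parse the file (the slower first read) ...
        with open(path, newline="") as f:
            rows = list(csv.DictReader(f))
        # ... and store the result via a temporary file plus an atomic rename,
        # so a concurrent reader never sees a half-written entry.
        fd, tmp = tempfile.mkstemp(dir=self.cache_dir)
        with os.fdopen(fd, "wb") as f:
            pickle.dump(rows, f)
        os.replace(tmp, entry)
        return rows
```

 Because freshness is encoded in the key rather than checked on every hit, a rewritten file simply misses the cache; its stale entry stays on disk until the next startup clears the directory, which is the trade-off the commit message describes.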

 One somewhat unwanted side effect is that drawing the first graph from a
 new .csv file may take a few more seconds as compared to drawing
 subsequent graphs. This seems acceptable, though.

 Requires installing the R.cache package from CRAN, which is available on
 Debian as r-cran-r.cache.

 '''Edit: Turns out that we don't want R.cache but readr's read_csv()
 instead. See comments below.'''

--

Comment:

 Thanks for looking! Merged with a small
 [https://gitweb.torproject.org/user/karsten/metrics-web.git/commit/?h=task-28799-2&id=fecafc07b99798946308bbb3615c15bb0ce6a30f
 tweak], and deployed.

 Setting back to accepted, so that the remaining graphs can be switched
 once we have gathered some more experience with this new approach. That
 could easily happen in 2019.
