I tested your skin, and I observed that even with tables disabled, the statistics.html page takes 2 minutes to generate on a Raspberry Pi 3B+, with a MySQL database containing data from 2005 up to today.
As far as I know, the Seasons skin limits the history to the current year, with an aggregation period of 1 day for all parameters. The statistics.html page of your skin looks for "all-time" values, and if I am not wrong, it issues several separate queries: tables with an aggregation interval of 1 day for all parameters, and diagrams with various aggregation intervals depending on the parameter. This implies that several all-time queries are performed for the same parameter. Why not first choose a given aggregation interval, do one query per parameter, and use the result for both the charts and the table?

That said, don't forget that the time needed to produce all-time charts and tables will in any case increase over time, since the database size will also grow. The final result will also depend on the archive interval configured by the user. I suspect that with an archive interval of less than 5 minutes (I know that some users have a 1-minute interval), the amount of data to process and the time needed to perform the all-time queries, even if they are not run at every archive interval, will be a problem.

On Monday, June 13, 2022 at 02:06:02 UTC+2, [email protected] wrote:

> Thank you very much for the feedback!
>
> *but is the skin generating all the tables (even historical months and
> yearly table that are static) at each archive interval?*
> No, it does not generate everything at each run. The historical
> month/year pages follow the same generation logic as the NOAA reports
> (the SummaryByMonth and SummaryByYear reports).
>
> I found out that even if I only disable the table on the all-time stats
> page, generation time goes down by ~50%. So I think a possible
> solution/workaround would be to work with reasonable stale_age values
> for the templates. E.g. the all-time stats page does not need to be
> generated at each run; once or twice per day should be enough for most
> cases, I guess:
>
> [CheetahGenerator]
>     [[ToDate]]
>         [[[statistics]]]
>             template = statistics.html.tmpl
>             stale_age = 43200  # 12 hours, i.e. at most twice a day
>
> I guess it simply takes this time? But on the other hand: why is the
> image generator so fast? The same user who reports the 180s says that
> the Seasons report takes less than 1 sec for the same data.
>
> [email protected] wrote on Sunday, June 12, 2022 at 18:29:48 UTC+2:
>
>> Hi,
>> Your skin is very nice and complete. I like it!
>> I haven't looked at all parts of your code in detail yet, but is the
>> skin generating all the tables (even the historical monthly and yearly
>> tables that are static) at each archive interval?
>> If that is the case, one alternative would be to store the table data
>> in text files, much the same way as the NOAA reports and/or the monthly
>> and yearly HTML pages are generated with your skin. In that case, text
>> files containing past monthly or yearly values that will not change
>> would not be recalculated and regenerated anymore.
>> Your JavaScript code could pick up the right text file(s) to populate
>> the tables.
>> Database queries (I have no experience with SQLite, but with MySQL) on
>> a Raspberry Pi can take time, and more and more time as the database
>> grows...
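To illustrate the text-file idea above: a minimal sketch of such a cache, written as plain Python outside of weewx. The write_period_json() helper, the file layout, and the carbon_values rows are illustrative assumptions, not part of the skin or of the weewx API.

import json
import os
from datetime import date


def write_period_json(out_dir, year, month, carbon_values):
    """Write the table rows for one month to a JSON file.

    A month that is already over can no longer change, so its file is
    written once and then reused on every later run; only the current
    month is rewritten each time.
    """
    path = os.path.join(out_dir, "table-%04d-%02d.json" % (year, month))

    today = date.today()
    month_is_closed = (year, month) < (today.year, today.month)

    if month_is_closed and os.path.exists(path):
        return path  # static past data, nothing to recalculate

    os.makedirs(out_dir, exist_ok=True)
    with open(path, "w") as f:
        json.dump(carbon_values, f)
    return path

The JavaScript that renders the tables would then fetch the JSON file for the requested month instead of reading the rows out of the re-rendered template.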
>> On Thursday, June 9, 2022 at 18:00:37 UTC+2, [email protected] wrote:
>>
>>> Hi guys,
>>>
>>> The group description says: "…It is a forum for exchanging architectural
>>> ideas, better ways of doing things...", so here I am to ask how to make
>>> things better :)
>>>
>>> I recently made my first steps in Python and the weewx ecosystem and
>>> built a skin for weewx, see here
>>> https://groups.google.com/g/weewx-user/c/sDK4kzO4vBw or here:
>>> https://github.com/Daveiano/weewx-wdc.
>>>
>>> What's bothering me is the long time it takes to generate the report.
>>> On my Raspberry Pi 4B with 2GB, it takes about ~50s with data back to
>>> March 2021. Another user reports about ~180s from a (presumably older
>>> model) Raspberry Pi (data from mid-2020).
>>>
>>> My skin includes data tables to show observations. If I hide these
>>> tables, the generation time goes down to ~10s on my Pi (a reduction of
>>> more than 75%). So the main problem seems to be these tables. I
>>> implemented them using the series tag. The output is then processed
>>> and rendered via JS.
>>>
>>> In the Cheetah template (I substituted some variables with values for
>>> better understanding; the original code is here:
>>> https://github.com/Daveiano/weewx-wdc/blob/1.x/skins/weewx-wdc/includes/data-table-tile.inc
>>> ):
>>>
>>> var tableRows = $get_table_rows(
>>>     ['outTemp', 'barometer', 'rain', 'humidity', ...],
>>>     $span($hour_delta=some_hour_delta, $week_delta=some_week_delta),
>>>     'year'
>>> );
>>>
>>> Search list extension (
>>> https://github.com/Daveiano/weewx-wdc/blob/1.x/bin/user/table_util.py#L72
>>> ):
>>>
>>> from datetime import datetime
>>>
>>> def get_table_rows(self, obs, period, precision):
>>>     """
>>>     Returns table values for use in carbon data table.
>>>
>>>     Args:
>>>         obs (list): $DisplayOptions.get("table_tile_..")
>>>         period (obj): Period to use, e.g. $year, $month, $span
>>>         precision (string): Day, week, month, year, alltime
>>>
>>>     Returns:
>>>         list: Carbon data table rows.
>>>     """
>>>     carbon_values = []
>>>
>>>     for observation in obs:
>>>         if getattr(period, observation).has_data:
>>>             series = getattr(period, observation).series(
>>>                 aggregate_type=some_aggregate_type,
>>>                 aggregate_interval=some_aggregate_interval,
>>>                 time_series='start',
>>>                 time_unit='unix_epoch'
>>>             ).round(some_rounding)
>>>
>>>             for start, data in zip(series.start, series.data):
>>>                 cs_time = datetime.fromtimestamp(start.raw)
>>>                 # The current series item by time.
>>>                 cs_item = list(filter(
>>>                     lambda x: (x['time'] == cs_time.isoformat()),
>>>                     carbon_values
>>>                 ))
>>>
>>>                 if len(cs_item) == 0:
>>>                     carbon_values.append({
>>>                         "time": cs_time.isoformat(),
>>>                         observation: data.raw,
>>>                         'id': start.raw
>>>                     })
>>>                 else:
>>>                     cs_item = cs_item[0]
>>>                     cs_item_index = carbon_values.index(cs_item)
>>>                     cs_item[observation] = data.raw if data.raw is not None else "-"
>>>                     carbon_values[cs_item_index] = cs_item
>>>
>>>     # Sort per time.
>>>     carbon_values.sort(
>>>         key=lambda item: datetime.fromisoformat(item['time'])
>>>     )
>>>
>>>     return carbon_values
>>>
>>> Aggregate intervals are calculated via:
>>>
>>> if precision == 'day':
>>>     return 900 * 8  # 2 hours
>>>
>>> if precision == 'week':
>>>     return 900 * 24  # 6 hours
>>>
>>> if precision == 'month':
>>>     return 900 * 48  # 12 hours
>>>
>>> if precision == 'year' or precision == 'alltime':
>>>     return 3600 * 24  # 1 day
>>>
>>> You see what I did there: I call the series method on every observation
>>> and then add the values to an array for use in JS. Something like this
>>> is returned:
>>>
>>> [
>>>     {time: '2022-06-08T02:40:00', outTemp: 13.6, id: 1654648800, outHumidity: 92.8, barometer: 1010.5, …},
>>>     {time: '2022-06-08T04:40:00', outTemp: 13.4, id: 1654656000, outHumidity: 88.6, barometer: 1010.5, …},
>>>     {...},
>>>     ...
>>> ]
>>>
>>> I am guessing these multiple series calls result in a long generation
>>> time because the DB gets hit multiple times? I am using the SQLite DB;
>>> would using MySQL make any difference? I'm not a database expert, but I
>>> think it should?
>>>
>>> I would very much appreciate any suggestions to make this more
>>> performant; perhaps I missed some smart weewx utility helper?
>>>
>>> Just for reference: https://github.com/Daveiano/weewx-wdc/issues/14
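One remark on the get_table_rows() code quoted above, independent of the database question: the list(filter(...)) lookup scans carbon_values once per series point, so the merge is quadratic in the number of rows. Below is a minimal sketch of the same merge keyed by the raw epoch instead; merge_series() and the plain (epoch, value) pairs are stand-ins for the weewx series objects, not the skin's real API.

from datetime import datetime


def merge_series(series_per_obs):
    """Merge per-observation series into one row per timestamp.

    series_per_obs maps observation name -> [(epoch, value), ...].
    Rows are kept in a dict keyed by epoch, so finding an existing
    row is O(1) instead of a linear scan of the row list.
    """
    rows_by_epoch = {}

    for observation, points in series_per_obs.items():
        for epoch, value in points:
            row = rows_by_epoch.setdefault(epoch, {
                "time": datetime.fromtimestamp(epoch).isoformat(),
                "id": epoch,
            })
            row[observation] = value if value is not None else "-"

    # Sorting the integer keys replaces the final sort that re-parses
    # the ISO strings with datetime.fromisoformat().
    return [rows_by_epoch[epoch] for epoch in sorted(rows_by_epoch)]


# Example with two observations sharing the same timestamps:
print(merge_series({
    "outTemp": [(1654648800, 13.6), (1654656000, 13.4)],
    "barometer": [(1654648800, 1010.5), (1654656000, 1010.5)],
}))

This does not reduce the number of series() calls (still one query per observation), but it removes the quadratic lookup from the merge step.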
