@Tom I changed the behaviour according to your list and the generation time has decreased by 50%.
I added the boundary option to all $span calls. I assume the $week, $month
and $year tags automatically use the daily summaries if the aggregation
time is a multiple of a day, since there is no boundary option for them?
Using only one query for both tables and diagrams would give another boost
of ~15%. I will keep that in mind for later; it requires a refactoring of
the JavaScript part, for which I currently do not have the time.

David Bätge wrote on Wednesday, June 15, 2022 at 03:02:45 UTC+2:

> *I tested your skin, and I observed that even with tables disabled, the
> "statistics.html" page takes 2 minutes to generate, on a Raspberry Pi
> 3B+, with a MySQL database containing data from 2005 up to today.*
> Thank you for taking the time and sharing this info!
>
> *As far as I know, the Season skin is limiting the history to the current
> year, with an aggregation period of 1 day for all parameters.*
> Yes, it seems so; I was not aware of that.
>
> *[...] Why not first choosing a given aggregation time, doing one query
> for each parameter, and using it for both charts and table?*
> This is a really good idea that I also had in the meantime. There are
> many "duplicate" queries, which could be merged into one. It collides
> with another idea to make the aggregation configurable by the user,
> individually for the graphs and the tables. I will see which direction I
> take with that. Thanks for the good suggestions!
>
> *[...] but if they involve all time, then you really want to make sure
> [...]*
> Well, I didn't know that! My all-time queries only meet the first
> condition. I will update this and report back! Many thanks!
>
> Tom Keffer wrote on Wednesday, June 15, 2022 at 01:09:34 UTC+2:
>
>> I'm not exactly sure what queries you are doing, but if they involve
>> all time, then you really want to make sure that you don't have to
>> search the whole database and, instead, use the daily summaries. To do
>> this:
>>
>> 1. The aggregation time must be a multiple of a day; and
>> 2. The start time must be either on a midnight boundary or the first
>>    timestamp in the database; and
>> 3. The end time must either be on a midnight boundary or the last
>>    timestamp in the database.
>>
>> If any one of these is not true, then the main archive table must be
>> used, and the query time will be much longer.
>>
>> Also, in general, SQLite is much faster than MySQL for these kinds of
>> queries. See the Wiki article *SQLite vs MySQL
>> <https://github.com/weewx/weewx/wiki/SQLite-vs-MySQL>*.
>>
>> On Thu, Jun 9, 2022 at 9:00 AM David Bätge <[email protected]> wrote:
>>
>>> Hi guys,
>>>
>>> The group description says: "…It is a forum for exchanging
>>> architectural ideas, better ways of doing things...", so here I am to
>>> ask how to make things better :)
>>>
>>> I recently made my first steps in Python and the weewx ecosystem and
>>> built a skin for weewx, see here:
>>> https://groups.google.com/g/weewx-user/c/sDK4kzO4vBw or here:
>>> https://github.com/Daveiano/weewx-wdc.
>>>
>>> What's bothering me is the long time it takes to generate the report.
>>> On my Raspberry Pi 4B with 2GB, it takes about ~50s with data back to
>>> March 2021. Another user reports about ~180s on a (suspectedly older
>>> model) Raspberry Pi, with data from mid-2020.
>>>
>>> My skin includes data tables to show observations. If I hide these
>>> tables, the generation time goes down to ~10s on my Pi (a reduction of
>>> more than 75%). So the main problem seems to be these tables. I
>>> implemented them using the series tag. The output is then processed
>>> and rendered via JS.
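In code, Tom's three conditions above amount to the following check (a
standalone sketch; the function and its argument names are illustrative
and not part of the weewx API):

```python
from datetime import datetime

DAY = 24 * 3600

def can_use_daily_summaries(start_ts, stop_ts, aggregate_interval,
                            first_ts, last_ts):
    """Return True if a query meets all three conditions and can
    therefore be answered from the daily summaries instead of the
    main archive table.

    start_ts/stop_ts: query span (unix epoch, local time assumed)
    aggregate_interval: aggregation time in seconds
    first_ts/last_ts: first and last timestamps in the archive table
    """
    def on_midnight(ts):
        t = datetime.fromtimestamp(ts)
        return (t.hour, t.minute, t.second) == (0, 0, 0)

    # 1. The aggregation time must be a multiple of a day.
    if aggregate_interval % DAY != 0:
        return False
    # 2. The start must be on a midnight boundary, or be the first
    #    timestamp in the database.
    if not (on_midnight(start_ts) or start_ts == first_ts):
        return False
    # 3. The end must be on a midnight boundary, or be the last
    #    timestamp in the database.
    if not (on_midnight(stop_ts) or stop_ts == last_ts):
        return False
    return True
```

An all-time query, for example, passes conditions 2 and 3 automatically
(its span is exactly first-to-last timestamp), so only the aggregation
interval needs care.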
>>>
>>> In the Cheetah template (I substituted some variables with literal
>>> values for better understanding; the original code is here:
>>> https://github.com/Daveiano/weewx-wdc/blob/1.x/skins/weewx-wdc/includes/data-table-tile.inc
>>> ):
>>>
>>> var tableRows = $get_table_rows(
>>>     ['outTemp', 'barometer', 'rain', 'humidity', ...],
>>>     $span($hour_delta=some_hour_delta, $week_delta=some_week_delta),
>>>     'year'
>>> );
>>>
>>> Search List extension (
>>> https://github.com/Daveiano/weewx-wdc/blob/1.x/bin/user/table_util.py#L72
>>> ):
>>>
>>> def get_table_rows(self, obs, period, precision):
>>>     """
>>>     Returns table values for use in a carbon data table.
>>>
>>>     Args:
>>>         obs (list): $DisplayOptions.get("table_tile_..")
>>>         period (obj): Period to use, e.g. $year, $month, $span
>>>         precision (string): Day, week, month, year, alltime
>>>
>>>     Returns:
>>>         list: Carbon data table rows.
>>>     """
>>>     carbon_values = []
>>>
>>>     for observation in obs:
>>>         if getattr(period, observation).has_data:
>>>             series = getattr(period, observation).series(
>>>                 aggregate_type=some_aggregate_type,
>>>                 aggregate_interval=some_aggregate_interval,
>>>                 time_series='start',
>>>                 time_unit='unix_epoch'
>>>             ).round(some_rounding)
>>>
>>>             for start, data in zip(series.start, series.data):
>>>                 cs_time = datetime.fromtimestamp(start.raw)
>>>                 # The current series item, looked up by time.
>>>                 cs_item = list(filter(
>>>                     lambda x: x['time'] == cs_time.isoformat(),
>>>                     carbon_values
>>>                 ))
>>>
>>>                 if len(cs_item) == 0:
>>>                     carbon_values.append({
>>>                         'time': cs_time.isoformat(),
>>>                         observation: data.raw,
>>>                         'id': start.raw
>>>                     })
>>>                 else:
>>>                     cs_item = cs_item[0]
>>>                     cs_item_index = carbon_values.index(cs_item)
>>>                     cs_item[observation] = data.raw if data.raw is not None else "-"
>>>                     carbon_values[cs_item_index] = cs_item
>>>
>>>     # Sort by time.
>>>     carbon_values.sort(
>>>         key=lambda item: datetime.fromisoformat(item['time'])
>>>     )
>>>
>>>     return carbon_values
>>>
>>> The aggregate intervals are calculated via:
>>>
>>> if precision == 'day':
>>>     return 900 * 8    # 2 hours
>>>
>>> if precision == 'week':
>>>     return 900 * 24   # 6 hours
>>>
>>> if precision == 'month':
>>>     return 900 * 48   # 12 hours
>>>
>>> if precision == 'year' or precision == 'alltime':
>>>     return 3600 * 24  # 1 day
>>>
>>> You see what I did there: I call the series method on every
>>> observation and then add the values to an array for use in JS.
>>> Something like this is returned:
>>>
>>> [
>>>     {time: '2022-06-08T02:40:00', outTemp: 13.6, id: 1654648800,
>>>      outHumidity: 92.8, barometer: 1010.5, …},
>>>     {time: '2022-06-08T04:40:00', outTemp: 13.4, id: 1654656000,
>>>      outHumidity: 88.6, barometer: 1010.5, …},
>>>     {...},
>>>     ...
>>> ]
>>>
>>> I am guessing these multiple series calls result in a long generation
>>> time because the DB gets hit multiple times? I am using the SQLite DB;
>>> would using MySQL make any difference? I'm not a database expert, but
>>> I think it should?
>>>
>>> I would very much appreciate any suggestions to make this more
>>> performant; perhaps I missed some smart weewx utility helper?
>>>
>>> Just for reference: https://github.com/Daveiano/weewx-wdc/issues/14
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "weewx-development" group.
>>> To unsubscribe from this group and stop receiving emails from it,
>>> send an email to [email protected].
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/weewx-development/8296590d-ba9c-453d-a22d-71a8d7baad59n%40googlegroups.com.
>>>
>>
--
You received this message because you are subscribed to the Google Groups
"weewx-development" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/weewx-development/b4e121da-3231-4b6d-b61d-565690e73b39n%40googlegroups.com.
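A footnote on the merge idea discussed in the thread: the per-point
list+filter lookup in get_table_rows is quadratic in the number of rows,
while accumulating rows in a dict keyed by the epoch is linear and needs
only one sort at the end. A minimal standalone sketch (the function name
and the (epoch, value) input shape are assumptions, not the weewx series
API):

```python
from datetime import datetime

def merge_series(series_by_obs):
    """Merge per-observation series into one row per timestamp.

    series_by_obs maps an observation name (e.g. 'outTemp') to a list
    of (epoch_start, value) pairs, as one series call per observation
    would produce. A dict keyed by the epoch replaces the O(n^2)
    list+filter lookup with an O(1) lookup per point.
    """
    rows = {}  # epoch -> row dict
    for observation, points in series_by_obs.items():
        for start, value in points:
            # Create the row on first sight of this timestamp,
            # otherwise reuse it.
            row = rows.setdefault(start, {
                'time': datetime.fromtimestamp(start).isoformat(),
                'id': start,
            })
            row[observation] = value if value is not None else "-"
    # Sorting the epoch keys sorts the rows chronologically.
    return [rows[k] for k in sorted(rows)]
```

Unlike the original, this also normalizes None to "-" on the first
occurrence of a timestamp, not just on later merges.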
