Re: [weewx-user] 'wee_database --calc-missing' long to execute

William Garber Mon, 20 Feb 2023 06:51:38 -0800

This is a year of data.  It works much better if you process it in small 
chunks.  *I think this is a bug.*
I changed the schema, so I had to run --rebuild-daily first.
The following script behaves radically differently depending on the step 
size (4 days works well).
If the step is too large (too many days at once), it predictably gets stuck 
at 6000 records.
Small chunks are effective, but you have to press "y" to continue for every 
4-day chunk.
<pre><code>
#!/bin/bash
rm -f end-date.log
# next line helps a lot
wee_database --rebuild-daily
for (( past=-392; past <= 0; past=past+4 ))
# for (( past=-301; past <= 0; past++ ))
do
    past1=$(( past ))
    past2=$(( past + 4 ))
    DATE1=$(date -d "today $past1 days" +%F)
    DATE2=$(date -d "today $past2 days" +%F)
    echo
    echo "processing DATE=$DATE1"
    # wee_database --calc-missing --date="$DATE1"
    wee_database --calc-missing --from="$DATE1" --to="$DATE2"
    res=$?
    echo "res=$res"
    [[ $res -eq 0 ]] || exit 1
    echo "$DATE2" >> end-date.log
done
# the final one runs MUCH FASTER and fixes any glitches
wee_database --calc-missing 
# eee eof
</code></pre>
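
The chunking in the script above can also be pulled into a small function, which makes the step size easy to change and stops the last chunk from running past today. This is only a sketch: `chunk_ranges` and `ranges.txt` are made-up names, not part of weewx, and GNU `date` is assumed, as in the script above.

```shell
#!/bin/bash
# chunk_ranges is a hypothetical helper (not part of weewx): it prints one
# "FROM TO" pair of dates per chunk, covering the last DAYS_BACK days.
# Usage: chunk_ranges DAYS_BACK STEP
chunk_ranges() {
    local days_back=$1 step=$2 past end
    past=$(( 0 - days_back ))
    while [ "$past" -lt 0 ]; do
        end=$(( past + step ))
        # Clamp the last chunk so it never runs past today.
        if [ "$end" -gt 0 ]; then end=0; fi
        # GNU date is assumed here, as in the original script.
        printf '%s %s\n' "$(date -d "today $past days" +%F)" \
                         "$(date -d "today $end days" +%F)"
        past=$(( past + step ))
    done
}

# Usage sketch (needs wee_database on the PATH, so it is left commented out).
# The loop reads the ranges on file descriptor 3 so that stdin stays free
# for the "y" prompt:
# chunk_ranges 392 4 > ranges.txt
# while read -r from to <&3; do
#     wee_database --calc-missing --from="$from" --to="$to" || exit 1
# done 3< ranges.txt
```

If wee_database reads its confirmation from standard input (an assumption, not verified here), putting `echo y |` in front of the command inside the loop may remove the need to press "y" for every chunk.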


On Monday, February 20, 2023 at 4:46:29 AM UTC-8 William Garber wrote:

> I am having the same problem.  I have about 130,000 different datetimes 
> (records) in weewx.sdb.  I also tried moving it to a ramdisk.  Still 
> extremely slow.  Any help please?  Should I run calc-missing on individual 
> timeslices like one week periods?  The data covers one year of measurements.
>
> On Monday, March 14, 2022 at 7:50:36 PM UTC-7 graha...@gmail.com wrote:
>
>> i reduced the --calc-missing time from 7 *hours* to 7 *minutes* by such 
>> a simple trick that i kick myself for not seeing it earlier - i moved the 
>> database to ramdisk and symlinked it under archive, ran wee_database, then 
>> moved the database back. chalk this one under 'handy tips'
>>
>> On Thursday, 10 March 2022 at 11:09:25 am UTC+11 graha...@gmail.com 
>> wrote:
>>
>>> for gw1000 dataset, i had been using top and iotop previously to 
>>> ascertain that cpu, memory and i/o usage were extremely low, and ps -efl 
>>> showed it was spending its time waiting on interrupt. i would usually just 
>>> conclude it was slow disk and spending all its time waiting on i/o 
>>> completion *except* this is only for the smaller gw1000 dataset not the 
>>> larger vp2 dataset. it is something to do with the different nature of the 
>>> data (perhaps something as simple as different missing data being calc’ed)
>>>
>>> to give an idea of the magnitude of the difference, using built-in shell 
>>> time to run command:
>>>
>>>
>>> *dataset and option*     *recs*    *real /sec*  *user /sec*  *sys /sec*  *Idle /%*
>>> vp2 --rebuild-daily      505,336   165          148          2           9
>>> vp2 --calc-missing       505,336   571          525          18          5
>>> gw1000 --rebuild-daily   162,882   86           81           1           5
>>> gw1000 --calc-missing    162,882   23,758       301          13          99
>>>
>> [snip] 
>>
>
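
The ramdisk trick quoted above (move the database to a ramdisk, symlink it under archive, run wee_database, move it back) might be wrapped up roughly like this. It is a sketch only: `run_on_ramdisk` is a made-up helper, not part of weewx, and the paths in the usage line are assumptions that depend on the install. weewx should presumably be stopped while the database is moved.

```shell
#!/bin/bash
# run_on_ramdisk is a hypothetical helper (not part of weewx).
# Usage: run_on_ramdisk DB_FILE RAM_DIR COMMAND [ARGS...]
run_on_ramdisk() {
    local db=$1 ramdir=$2 rc=0
    shift 2
    local tmp
    tmp="$ramdir/$(basename "$db")"
    cp -p "$db" "$tmp" || return 1      # working copy on the ramdisk
    mv "$db" "$db.bak" || return 1      # keep the original as a backup
    ln -s "$tmp" "$db" || return 1      # symlink in the original location
    "$@" || rc=$?                       # run the slow command against the copy
    rm "$db"                            # remove the symlink again
    if [ "$rc" -eq 0 ]; then
        # Success: move the updated database back, then drop the backup.
        mv "$tmp" "$db" && rm "$db.bak" || rc=1
    else
        # Failure: restore the untouched original and discard the copy.
        mv "$db.bak" "$db"
        rm -f "$tmp"
    fi
    return "$rc"
}

# Usage sketch (both paths are assumptions; adjust them to your install):
# run_on_ramdisk /var/lib/weewx/weewx.sdb /dev/shm wee_database --calc-missing
```

Keeping the original as a `.bak` file until the command succeeds means a crash or power loss costs at most the ramdisk copy, not the archive itself.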

