for the gw1000 dataset, i had previously used top and iotop to ascertain that 
cpu, memory and i/o usage were extremely low, and ps -efl showed the process 
spending its time waiting on an interrupt. i would usually just conclude it 
was a slow disk and the process was waiting on i/o completion *except* this 
happens only for the smaller gw1000 dataset, not the larger vp2 dataset. it 
must be something to do with the different nature of the data (perhaps 
something as simple as different missing fields being calc’ed)
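
for anyone wanting to reproduce the measurement, a minimal sketch that samples 
the running process from python (assumes `pip install psutil`; passing the pid 
on the command line is just one way of finding the process):

    # sample cpu %, scheduler state and cumulative i/o of a running process
    import sys
    import psutil

    proc = psutil.Process(int(sys.argv[1]))    # pid of the wee_database run
    for _ in range(10):
        cpu = proc.cpu_percent(interval=1.0)   # % of one core over 1 s
        status = proc.status()                 # e.g. 'sleeping', 'disk-sleep'
        io = proc.io_counters()                # cumulative read/write bytes
        print(f"cpu={cpu:5.1f}%  status={status}  "
              f"read={io.read_bytes}  write={io.write_bytes}")

a status stuck at 'disk-sleep' with climbing i/o counters would confirm i/o 
wait; 'sleeping' with flat counters would point elsewhere.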

to give an idea of the magnitude of the difference, using the built-in shell 
time to run each command:

dataset  command          recs     real/sec  user/sec  sys/sec  idle/%
vp2      --rebuild-daily  505,336       165       148        2       9
vp2      --calc-missing   505,336       571       525       18       5
gw1000   --rebuild-daily  162,882        86        81        1       5
gw1000   --calc-missing   162,882    23,758       301       13      99
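
(for reference, the same figures can be collected from python instead of the 
shell; the wee_database invocation below is an assumption, substitute whatever 
you are timing:)

    # time a child command and report real/user/sys plus idle %
    import resource
    import subprocess
    import time

    cmd = ["wee_database", "--calc-missing"]   # hypothetical invocation
    t0 = time.perf_counter()
    subprocess.run(cmd, check=True)
    real = time.perf_counter() - t0

    # cpu time accumulated by waited-for child processes
    ru = resource.getrusage(resource.RUSAGE_CHILDREN)
    user, sys_ = ru.ru_utime, ru.ru_stime
    print(f"real={real:.0f}s  user={user:.0f}s  sys={sys_:.0f}s  "
          f"idle={100.0 * (real - user - sys_) / real:.0f}%")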

as it stands right now, to migrate production to the split environment i will 
have to:
  * take a database snapshot and build the equivalent temp gw1000.sdb before 
migrating
  * run --calc-missing offline on the temp gw1000.sdb (7 hours !!)
  * dump the were-missing values in the temp gw1000.sdb into a file (see the 
sketch after this list)
  * when the dumped data is available, stop the production system, split the 
databases, and load the dumped were-missing values into gw1000.sdb
  * run --calc-missing on only the interval between the dump and now ← 
hopefully not long, as gw1000 data is being lost in the meantime!
  * start the new production system on the split databases
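
the dump/load steps, collapsed here into a direct db-to-db copy rather than 
going via an intermediate file, might look something like this minimal sketch 
(assumes the standard weewx `archive` table; the paths and the epoch window 
are placeholders to adapt):

    # copy an interval of archive rows from the temp db into the split db
    import sqlite3

    TEMP_DB = "/tmp/gw1000-temp.sdb"           # placeholder path
    PROD_DB = "/var/lib/weewx/gw1000.sdb"      # placeholder path
    START, STOP = 1600000000, 1646780400       # placeholder epoch bounds

    src = sqlite3.connect(TEMP_DB)
    dst = sqlite3.connect(PROD_DB)

    cols = [r[1] for r in src.execute("PRAGMA table_info(archive)")]
    col_list = ",".join(cols)
    rows = src.execute(
        f"SELECT {col_list} FROM archive WHERE dateTime BETWEEN ? AND ?",
        (START, STOP))
    # INSERT OR REPLACE so re-running after a partial load is safe
    dst.executemany(
        f"INSERT OR REPLACE INTO archive ({col_list}) "
        f"VALUES ({','.join('?' * len(cols))})", rows)
    dst.commit()
    src.close()
    dst.close()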

does anyone have insight into the origin of the wait-for-interrupt plaguing my 
gw1000 dataset migration? perhaps some wxxtypes in do_calculations() have a 
realtime delay built in? perhaps the yield in genBatchRecords() is not 
context-switching to another thread effectively (an internal python issue)? 
has anyone seen such behaviour elsewhere?
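
one way to test both theories is to see exactly where the interpreter is 
parked while --calc-missing runs. a minimal sketch using the stdlib 
faulthandler (the placement is an assumption; you would patch it into your 
local wee_database copy or a sitecustomize.py on the python path):

    # on-demand stack dump of a stuck python process
    import faulthandler
    import signal

    # after this, `kill -USR1 <pid>` prints every thread's python stack
    # to stderr, showing whether we are in a sleep, a db call, or the
    # generator machinery around genBatchRecords()
    faulthandler.register(signal.SIGUSR1, all_threads=True)

if the dumps repeatedly land in a time.sleep() or a select/poll inside one of 
the xtypes, that would point at a built-in delay rather than i/o.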

cheers
⊣ Graham Eddy ⊢

> On 9 Mar 2022, at 4:16 pm, vince <vinceska...@gmail.com> wrote:
> 
> Well I'd still try either splitting it up into pieces, or running it and 
> measuring its resource usage in another shell. If it's not out of ram and 
> you're not pegging the CPU, then waiting for i/o is the only thing left I'd 
> guess. Run htop or the like to see.
> 
> On Tuesday, March 8, 2022 at 7:03:37 PM UTC-8 graha...@gmail.com wrote:
> it’s an 8GB RPi 4B and RAM is abundant, though using a μSD card for the 
> filesystem.
> the interesting thing is that there are two datasets, one large and one 
> small, and the large one is quick but the small one is orders of magnitude 
> slower. the larger dataset (vp2-originated) gobbles >95% cpu but the smaller 
> dataset (gw1000-originated) <0.3%. it must be something to do with the nature 
> of the data, not the quantity.
> ⊣ Graham Eddy ⊢
