Thank you for looking at the raw data and coming up with some solutions to the issue. Whereas I would like to retain as much data as possible, I also know that by doing so it will retain bad data which can affect things down the road. I am ok with taking a loss and removing data from 2016-03-01 to 2019-03-15 if that will fix the inconsistencies in the database. How might I go about doing that?
On Monday, November 3, 2025 at 10:21:54 PM UTC-6 vince wrote: > I'm sorry this will be a long reply. Your db is very odd and has a lot of > very unusual content. > > You have all kinds of varying 'interval' values before 2019-03-15 so I > can't guess what a reasonable amount of rain in an interval period might be > before that date. Typically an interval period is very consistent with > well-known values (see > https://www.weewx.com/docs/5.2/hardware/vantage/?h=interval#vantage_archive_interval > > for details). Yours is all over the map and before that date it's pretty > messy. The last odd intervals appear on 2019-03-15. > > To illustrate, this command looks at anything after 2019-03-01 as an > example and reports the archive period interval and human-friendly > yyyy-mm-dd date to show you what I'm seeing. > > echo "select interval,datetime(dateTime,'unixepoch','localtime') from > archive where interval!=5 and > datetime(dateTime,'unixepoch','localtime')>'2019-03-01;" | sqlite3 > weewx_20251103_1915.sdb | awk '{print $1}' | uniq > > As an example - 2016-07-24 has 38 different archive intervals (wow) > ranging from 2 to 60. That's so odd I don't know what to suggest there. > > So you have a few choices in this very unusual case: > > - 1. delete all data before the dateTime when the db intervals looked > odd, meaning any data from 2016-03-01 to 2019-0315 would be deleted > - 2. or keep it all and just fix up rain and rainRate for anyplace > there's stuff in there that you think needs fixing up > - 3. or totally null out rain and/or rainRate for all records from > 2016-03-01 to 2019-0315 when the intervals were odd > - 4. or just clean up rainRate and see if that's good enough for you > > Given how wildly unusual your db is, I really can't suggest which option > above to do. Sorry. > > That said - assuming you want to do option-2 above and keep all your data > and fix up only the overly high rain or rainRate data, here's how you can > do it... > > # 1. Save a list of dates have high rainRate(s) or high rain for an > archive period - you'll need this later > echo "select datetime(dateTime,'unixepoch','localtime') from archive where > rainRate>10;" | sqlite3 weewx_20251103_1915.sdb | awk '{print $1}' | sort > |uniq > > echo "select datetime(dateTime,'unixepoch','localtime') from archive where > rain>1;" | sqlite3 weewx_20251103_1915.sdb | awk '{print $1}' | sort |uniq > > Note - I picked 10" for rainRate as you asked, and 1" for rain because it > seemed more reasonable based on what is in your db. You might even lower > the rainRate threshold to a lower number. > > # 2. To clear the archive up you basically change "select something from > archive where...." to "update archive set something=....." in your command > update archive set rain=NULL where rain>1 > update archive set rainRate=NULL where rainRate>10 > > Unfortunately I didn't have luck updating both at once, but running two > commands is very quick so I didn't dig in google for more sqlite3 magic > syntax to 'or' (not 'and') the two things together. > > # 3. then to fix up the archive_day_rain and archive_day_rainRate summary > tables you would rebuild-daily for just the dates you saved above... > weectl database rebuild-daily > [[--date=YYYY-mm-dd] | [--from=YYYY-mm-dd] [--to=YYYY-mm-dd]] > [--config=FILENAME] [--binding=BINDING-NAME] > [--dry-run] [-y] > > Given the odd intervals for many of the affected dates, I personally would > manually rebuild-daily for each of the dates with data you need to clean > up. I come up with the following list of days with rainRate > 10 "or" rain > > 1 for whatever archive period you had set then. Given the odd and very > varying archive periods, you might want to look day-by-day at each date > before 2019-03-16 and consider if the totals there are ok for you. I > really don't know what rebuild-daily is going to do with such a varying > interval even within one calendar day. > > (dates with odd intervals) > 2016-07-24 > 2016-08-12 > 2016-08-28 > 2017-09-15 > 2017-09-18 > 2017-09-22 > 2017-09-23 > 2017-09-26 > 2017-09-27 > 2018-03-17 > > (dates with reasonable consistent intervals) > 2019-10-11 > 2020-02-17 > 2020-05-17 > 2020-07-11 > 2021-06-24 > 2021-08-30 > 2022-02-22 > 2022-07-08 > 2023-08-10 > > # 4. you'll want to delete the NOAA files for any months and years with > data you altered. Belchertown has its own NOAA tables directory in > 'addition' to the one weewx typically generates if you have the Standard or > Seasons skin enabled. Be sure to delete those files in both places. > Weewx will regenerate its when it starts up. I typically let it start up > then just copy the Weewx NOAA files into the Belchertown NOAA tree to make > it start up quicker. > > Sorry this was so long - but your db is so unusual I needed to provide too > much info probably.... > > On Monday, November 3, 2025 at 4:59:58 PM UTC-8 S Phillips wrote: > >> So just for clarification, the version that I ran the queries on was the >> backup version prior to setting the bad values to NULL. I reverted back to >> this version based on your comment on 11/2 @ 21:21 CST where you mentioned >> the following: >> >> >> *"rainRate not rainrate in your image FWIW* >> *I would redo your queries off your working copy after you think it >> changed things."* >> >> Since this was a backup version without the NULL values, what would I >> need to run in order to set the incorrect values to NULL where rain > 5 and >> rainRate > 10? >> >> On Monday, November 3, 2025 at 4:22:02 PM UTC-6 vince wrote: >> >>> I mean yes you still have what appears to bad data in it >>> >>> And yes, you will need to null out bad rainRate and possibly bad rain >>> (accumulation in an archive period) fields for those records, and >>> rebuild-daily for those days (there are options to say which dates) or you >>> can rebuild all dates. In my experiencing rebuilding just the dates you >>> altered is much faster, but either way works. >>> >>> And yes everything can be scripted if you are so inclined. >>> >>> On Monday, November 3, 2025 at 1:09:43 PM UTC-8 S Phillips wrote: >>> >>>> "Your DB is messed up", so what do you mean? Is it corrupt or is there >>>> just a lot of bad data? >>>> I am aware of that there is bad data, hence the reasoning of the >>>> original post. >>>> >>>> When you say "fix that, rebuild-daily" are you referring to setting all >>>> the values for "rainRate" over the value of 5 to NULL? >>>> I assume that I would need to manually do a rebuild-daily for each date >>>> individually or can that be scripted with all the dates of the bad values? >>>> >>>> On Monday, November 3, 2025 at 2:36:39 PM UTC-6 vince wrote: >>>> >>>>> So your db is messed up. Fix that, rebuild-daily for the affected >>>>> dates. You should be ok then. >>>>> >>>>> You might need to run the query a few times or specify more than 10 >>>>> days to get all the bad days identified. Perhaps remove the limit 10 to >>>>> see >>>>> if you have a very lot of bad records in there… >>>>> >>>>> On Monday, November 3, 2025 at 12:16:16 PM UTC-8 S Phillips wrote: >>>>> >>>>>> Stopped Weewx >>>>>> sudo systemctl stop weewx >>>>>> >>>>>> Made a copy of the existing DB >>>>>> sudo cp /var/lib/weewx/weewx.sdb weewx_20251103_1338_bad.sdb.bak >>>>>> >>>>>> Copied the old version prior to values being changed to NULL >>>>>> sudo cp /home/<username>/Documents/weewx_20251102_1851.sdb >>>>>> /var/lib/weewx/weewx.sdb >>>>>> >>>>>> Started WeeWX to maintain current data while old data is being >>>>>> modified >>>>>> sudo systemctl start weewx >>>>>> sudo systemctl status weewx >>>>>> >>>>>> I copied the copy of "weewx_20251102_1851.sdb" file back down to my >>>>>> Macbook using Filezilla to get a "fresh start". Then looked at Vince's >>>>>> latest comment at 13:38 CST and ran the SQL queries. >>>>>> >>>>>> The first query results the following: >>>>>> >>>>>> SELECT datetime(dateTime,'unixepoch','localtime'), dateTime, max from >>>>>> archive_day_rainRate where max>2 ORDER BY max DESC LIMIT 10; >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> *datetime(dateTime,'unixepoch','localtime') dateTime max 2017-09-18 >>>>>> 00:00:00 1505710800 84.6236220472441 2017-09-22 00:00:00 1506056400 >>>>>> 84.6047244094488 2017-09-23 00:00:00 1506142800 84.6047244094488 >>>>>> 2017-09-26 >>>>>> 00:00:00 1506402000 84.6047244094488 2017-09-27 00:00:00 1506488400 >>>>>> 84.6047244094488 2019-10-11 00:00:00 1570770000 82.29 2020-05-17 >>>>>> 00:00:00 >>>>>> 1589691600 82.29 2021-06-24 00:00:00 1624510800 82.29 2022-07-08 >>>>>> 00:00:00 >>>>>> 1657256400 82.29 2023-08-10 00:00:00 1691643600 82.29* >>>>>> >>>>>> Second query: >>>>>> >>>>>> SELECT datetime(dateTime,'unixepoch','localtime'), dateTime, sum from >>>>>> archive_day_rain where sum>2 ORDER BY sum DESC LIMIT 10; >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> *datetime(dateTime,'unixepoch','localtime') dateTime sum 2016-08-12 >>>>>> 00:00:00 1470978000 14.18 2016-07-24 00:00:00 1469336400 12.37 >>>>>> 2016-10-06 >>>>>> 00:00:00 1475730000 4.96 2018-03-17 00:00:00 1521262800 4.90157480314961 >>>>>> 2016-06-01 00:00:00 1464757200 4.43 2016-08-15 00:00:00 1471237200 4.41 >>>>>> 2016-07-25 00:00:00 1469422800 3.69 2016-08-28 00:00:00 1472360400 3.35 >>>>>> 2016-09-08 00:00:00 1473310800 2.88 2016-07-14 00:00:00 1468472400 2.87* >>>>>> >>>>>> Third query: >>>>>> >>>>>> SELECT datetime(dateTime,'unixepoch','localtime'), dateTime, rainRate >>>>>> from archive where rainRate>2 ORDER BY rainRate DESC LIMIT 10; >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> *datetime(dateTime,'unixepoch','localtime') dateTime rainRate >>>>>> 2017-09-18 04:50:00 1505728200 84.6236220472441 2017-09-18 07:45:00 >>>>>> 1505738700 84.6236220472441 2017-09-22 05:15:00 1506075300 >>>>>> 84.6047244094488 >>>>>> 2017-09-23 10:20:00 1506180000 84.6047244094488 2017-09-26 17:00:00 >>>>>> 1506463200 84.6047244094488 2017-09-27 04:40:00 1506505200 >>>>>> 84.6047244094488 >>>>>> 2019-10-11 16:55:00 1570830900 82.29 2020-05-17 12:05:00 1589735100 >>>>>> 82.29 >>>>>> 2021-06-24 21:25:00 1624587900 82.29 2022-07-08 08:05:00 1657285500 >>>>>> 82.29* >>>>>> >>>>>> On Monday, November 3, 2025 at 1:38:22 PM UTC-6 vince wrote: >>>>>> >>>>>>> Lets go back to square one. What does the database show ? If >>>>>>> that's still not correct, nothing related to graphs or html output >>>>>>> matters. >>>>>>> >>>>>>> Belchertown is unusual... >>>>>>> >>>>>>> - It also has its 'own' NOAA output directory in 'addition to' >>>>>>> the normal one weewx skins generate, so if you're going to do things >>>>>>> like >>>>>>> clearing out previously generated NOAA files for month(s) or >>>>>>> year(s), make >>>>>>> sure to get them in all locations under /var/www/html or wherever >>>>>>> the web >>>>>>> docroot is set to. >>>>>>> - It does a lot of sqlite queries under the hood to generate its >>>>>>> data that winds up in the html >>>>>>> - those alltime table entries come from db queries in >>>>>>> belchertown.py around line 780 or so if you wanted to see it in the >>>>>>> extension python code >>>>>>> >>>>>>> We need to see db queries of the rain-related archive and summary >>>>>>> tables.... >>>>>>> >>>>>>> # highest 10 summary table days where rainRate > 2 sorted highest to >>>>>>> lowest >>>>>>> echo "SELECT datetime(dateTime,'unixepoch','localtime'), dateTime, >>>>>>> max from archive_day_rainRate where max>2 ORDER BY max DESC LIMIT 10;" >>>>>>> | >>>>>>> sqlite3 mydbname.sdb >>>>>>> >>>>>>> # highest 10 summary table days where rain for the day > 2 sorted >>>>>>> highest to lowest >>>>>>> echo "SELECT datetime(dateTime,'unixepoch','localtime'), dateTime, >>>>>>> sum from archive_day_rain where sum>2 ORDER BY sum DESC LIMIT 10;" | >>>>>>> sqlite3 mydbname.sdb >>>>>>> >>>>>>> # highest 10 archive table records where rainRate > 2 sorted highest >>>>>>> to lowest >>>>>>> echo "SELECT datetime(dateTime,'unixepoch','localtime'), dateTime, >>>>>>> rainRate from archive where rainRate>2 ORDER BY rainRate DESC LIMIT >>>>>>> 10;" | >>>>>>> sqlite3 mydbname.sdb >>>>>>> >>>>>>> For the original poster.... >>>>>>> >>>>>>> - be sure to work off a 'copy' of your current database, >>>>>>> just-in-case.... >>>>>>> - please use the commandline on your pi for this - just >>>>>>> substitute in the filename of your temporary copy of the db >>>>>>> - if you don't have the sqlite3 utility on your pi, you can >>>>>>> install it via "sudo apt install sqlite3" >>>>>>> - I used '2' above which is a good number for my location since >>>>>>> we don't get much/heavy rain. Feel free to use whatever works for >>>>>>> you >>>>>>> there. >>>>>>> >>>>>>> The offer still stands for me to verify your db is ok if you can >>>>>>> make your db available someplace for download.... >>>>>>> >>>>>>> On Monday, November 3, 2025 at 10:22:45 AM UTC-8 Jeff A. D. wrote: >>>>>>> >>>>>>>> All affected reports, including NOAA Climatological Summaries and >>>>>>>> such, will also need to be deleted and rebuilt, as Tom says. Also >>>>>>>> note >>>>>>>> that if all you did was NULL the rain data for each archive period >>>>>>>> that >>>>>>>> showed rain, and not for the entire period (day, month, etc.) that had >>>>>>>> the >>>>>>>> bad data, your reports will still show 0 (instead of N/A) for the day. >>>>>>>> >>>>>>>> On Monday, November 3, 2025 at 7:44:13 AM UTC-7 Tom Keffer wrote: >>>>>>>> >>>>>>>>> Plot images are renewed only as often as their aggregation >>>>>>>>> interval. You may just be looking at your old data. Delete all the >>>>>>>>> old >>>>>>>>> images and try again. >>>>>>>>> >>>>>>>>> On Sun, Nov 2, 2025 at 6:47 PM S Phillips <[email protected]> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> After reviewing the comments to the orginal post and my follow-up >>>>>>>>>> comment, I performed the following tasks: >>>>>>>>>> >>>>>>>>>> I made a copy of the current DB to my home directory with: >>>>>>>>>> *sudo cp /var/lib/weewx/weewx.sdb >>>>>>>>>> /home/<username>/Documents/weewx_20251028_1731.sdb* >>>>>>>>>> >>>>>>>>>> I then copied the file to my Macbook via SFTP using Filezilla. I >>>>>>>>>> opened the .sdb file in DB Browser for SQLite, then ran the >>>>>>>>>> following >>>>>>>>>> command: >>>>>>>>>> *UPDATE archive SET rainRate=NULL and rain=NULL WHERE (rainRate > >>>>>>>>>> 5);* >>>>>>>>>> >>>>>>>>>> It returned the following: >>>>>>>>>> >>>>>>>>>> *Execution finished without errors.* >>>>>>>>>> *Result: query executed successfully. Took 63ms, 83 rows affected* >>>>>>>>>> *At line 1:* *UPDATE archive SET rainRate=NULL and rain=NULL >>>>>>>>>> WHERE (rainRate > 5);* >>>>>>>>>> >>>>>>>>>> After that was finished I performed a "Write Changes" from the DB >>>>>>>>>> Browser for SQLite and saved the file with the new timestamp name. >>>>>>>>>> Next I >>>>>>>>>> copied the file back to my home directory on the WeeWX VM via SFTP >>>>>>>>>> in >>>>>>>>>> FileZilla. I then stopped the DB using: >>>>>>>>>> *sudo systemctl stop weewx* >>>>>>>>>> >>>>>>>>>> Then I copied the latest sdb from the /var/lib directory as a >>>>>>>>>> backup. >>>>>>>>>> *sudo cp /var/lib/weewx/weewx.sdb weewx_20251102_1907.sdb.bak* >>>>>>>>>> >>>>>>>>>> Once that was done, I copied the edited sbd back to the /var/lib >>>>>>>>>> directory using the following: >>>>>>>>>> *sudo cp /home/<username>/Documents/weewx_20251102_1851.sdb >>>>>>>>>> /var/lib/weewx/weewx.sd <http://weewx.sd>**b* >>>>>>>>>> >>>>>>>>>> I then dropped the daily and rebuilt it using the following: >>>>>>>>>> >>>>>>>>>> *sudo weectl database drop-daily* *sudo weectl database >>>>>>>>>> rebuild-daily* >>>>>>>>>> >>>>>>>>>> After that was complete, I started WeeWX back up using >>>>>>>>>> *sudo systemctl start weewx* >>>>>>>>>> >>>>>>>>>> After it did an upload to the webserver, I checked the records >>>>>>>>>> page and the bad values are still listed. When I look for any >>>>>>>>>> rainRate >>>>>>>>>> values over 4.9, it returns one result. Thoughts? >>>>>>>>>> >>>>>>>>>> [image: Screenshot 2025-11-02 at 20.40.40.png] >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> [image: Bad Records.png] >>>>>>>>>> >>>>>>>>>> On Friday, October 31, 2025 at 11:03:09 AM UTC-5 vince wrote: >>>>>>>>>> >>>>>>>>>>> I can see either answer in this case. Agree with Mark about >>>>>>>>>>> NULL vs. zero. Tom's words in the wiki recommend NULL (link >>>>>>>>>>> <https://github.com/weewx/weewx/wiki/Cleaning-up-old-bad-data>). >>>>>>>>>>> >>>>>>>>>>> On Friday, October 31, 2025 at 3:31:16 AM UTC-7 Mark Jenks wrote: >>>>>>>>>>> >>>>>>>>>>>> That is exactly what I've done in the past. Just find the bad >>>>>>>>>>>> data and NULL it out. NULL says no data, 0 says no rain. >>>>>>>>>>>> There is a >>>>>>>>>>>> difference. >>>>>>>>>>>> >>>>>>>>>>>> There is no good reason to edit it to try and figure out what >>>>>>>>>>>> it was, unless there was some huge event that you failed to >>>>>>>>>>>> capture >>>>>>>>>>>> accurately. >>>>>>>>>>>> >>>>>>>>>>>> On Thursday, October 30, 2025 at 12:54:31 PM UTC-5 Jeff A. D. >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> For the sake of accuracy, I think the quickest and easiest way >>>>>>>>>>>>> would be to just go through and select all the dates with >>>>>>>>>>>>> questionable data >>>>>>>>>>>>> in the database and set all the rain and rain rate data to null, >>>>>>>>>>>>> rather >>>>>>>>>>>>> than zero, and then rebuild dailies. That should tell you you >>>>>>>>>>>>> have no data >>>>>>>>>>>>> for those times, rather than indicating no rain. (It should show >>>>>>>>>>>>> "N/A", >>>>>>>>>>>>> rather than 0, for those dates on the Climatological Summary.) >>>>>>>>>>>>> >>>>>>>>>>>>> On Thursday, October 30, 2025 at 9:44:57 AM UTC-6 vince wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> I would just identify which 5-minute archive periods have bad >>>>>>>>>>>>>> data, then zero out the rain and rainRate fields out for those >>>>>>>>>>>>>> 5-minute >>>>>>>>>>>>>> period records. That would be close enough for me. You seem to >>>>>>>>>>>>>> have >>>>>>>>>>>>>> something far more complicated in mind, so best of luck. >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Thursday, October 30, 2025 at 4:18:31 AM UTC-7 S Phillips >>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> So the data which I need to focus on is the rain data that >>>>>>>>>>>>>>> is held in the archive table and once I can determine the bad >>>>>>>>>>>>>>> values I can >>>>>>>>>>>>>>> then rebuilt the daily which should correct the issue. Since I >>>>>>>>>>>>>>> live so >>>>>>>>>>>>>>> close to where the official readings are kept (~1.5 miles) I >>>>>>>>>>>>>>> can use that >>>>>>>>>>>>>>> data as a reference. I know that there will be variation but >>>>>>>>>>>>>>> extremes >>>>>>>>>>>>>>> differences should be easy to spot. For example, here is July >>>>>>>>>>>>>>> 2016 from >>>>>>>>>>>>>>> NOAA and my PWS where you can see the extreme variations. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> [image: Combined 2016-07 copy.png] >>>>>>>>>>>>>>> On Wednesday, October 29, 2025 at 7:52:27 PM UTC-5 vince >>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Forgot to answer your question - if you rebuilt-daily then >>>>>>>>>>>>>>>> your bad data is in the archive table (which is used to >>>>>>>>>>>>>>>> generate the >>>>>>>>>>>>>>>> summary table) >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Expecting your local rain total in an extreme event to >>>>>>>>>>>>>>>> match anybody else is a bad idea. Microclimates can have >>>>>>>>>>>>>>>> different answers >>>>>>>>>>>>>>>> across the street from the other station, let alone from one >>>>>>>>>>>>>>>> miles away. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> You certainly can fix up the rainRate item in your archive >>>>>>>>>>>>>>>> table, or at least zero it out, but I would suspect your rain >>>>>>>>>>>>>>>> field (rain >>>>>>>>>>>>>>>> in that usually 5 minute period) likely needs similar cleanup. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> -- >>>>>>>>>> You received this message because you are subscribed to the >>>>>>>>>> Google Groups "weewx-user" group. >>>>>>>>>> To unsubscribe from this group and stop receiving emails from it, >>>>>>>>>> send an email to [email protected]. >>>>>>>>>> >>>>>>>>> To view this discussion visit >>>>>>>>>> https://groups.google.com/d/msgid/weewx-user/3f27c2db-4d1b-4fe3-86b1-f34c9420fe20n%40googlegroups.com >>>>>>>>>> >>>>>>>>>> <https://groups.google.com/d/msgid/weewx-user/3f27c2db-4d1b-4fe3-86b1-f34c9420fe20n%40googlegroups.com?utm_medium=email&utm_source=footer> >>>>>>>>>> . >>>>>>>>>> >>>>>>>>> -- You received this message because you are subscribed to the Google Groups "weewx-user" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To view this discussion visit https://groups.google.com/d/msgid/weewx-user/fc678ac2-540c-4d0f-a5e2-2a58bc54467dn%40googlegroups.com.
