[weewx-user] Re: SQLite schema report

vince Sun, 28 Feb 2021 15:00:45 -0800

If you want to talk about reengineering things under the hood, that might 
be better discussed in the weewx-development group, and I'm sure Tom and 
the usual others will chime in a lot there item-by-item.


But yes, the operative word there is 'summary' in the table names.   Weewx 
keeps the summary tables up to date for (I'm guessing) performance reasons. 
  There might be a little ancient history as well based on the original 
wview-compatibility goals regarding the naming conventions and the like.

My over-simplified understanding of the various summary tables is they are 
running totals of the measurements for the day and how many measurements 
they are, with the min/max and time of min/max recorded, as well as a 
weighted sum.   Things that are directional in nature are a bit more 
complicated I guess, but I never worried it basically so I don't know there.

One resource you might look at is http://weewx.com/docs/customizing.htm 
which has some architectural things in it.

On Sunday, February 28, 2021 at 2:47:42 PM UTC-8 [email protected] 
wrote:

> Side note:
> re-reading Vince's post, i had the process backwards - detailed event 
> records in Archive table; summary/special info in separate tables... 
>
> On Sunday, February 28, 2021 at 2:44:18 PM UTC-8 David Prellwitz wrote:
>
>> Thanks for the quick reply!  I apologize for any negative connotations 
>> (not intended), i guess my perceptions and assumptions are off-base! it's 
>> just in all my professional DB roles (25yrs +) i've had to use visual 
>> layout tools to understand existing data creation, utilization, ranges, 
>> applications and operational impacts. These visual tools allowed us to use 
>> as much third-normal form reduction steps as possible to simplify data 
>> architecture to the lowest possible level. My misjudgement of WeeWx layout 
>> and structure is just that, my misjudgement. 
>>
>> Perhaps i am over-thinking things as i try to get basic things done. 
>> Habits of a long life in IT Management/DB Architecture/Programming. I just 
>> think i may be some ideas and observations that may improve WeeWx but, I 
>> can drop these distractions if it's an issue. 
>>
>> Now, the issue I had was the lack of understanding the role of the 
>> additional tables - they didn't seem to have any relationship to the 
>> archive table; other than Epoch. If i'm reading your statement correctly, 
>> all the other tables are the detail records for each of the elements in the 
>> archive table - just related by element type ("Name"?) and epoch timestamp? 
>> So if i went looking for "Wind Gust Direction" instances, I'd look in the 
>> Archive table for "WindGustDir" and find an entry (which represents what, a 
>> summary, an average, ?),  and one or more "Archive_day_windGustDir" 
>> entries, depending on the frequency of that element's recording? (Assuming 
>> archived wind-gust data is recorded for one-minute intervals, but 
>> "Archive_day_windGustDir" are recorded at 3second intervals for 
>> Wind-Gust-Dir table.)  Knowledge at that level of detail should be known 
>> somewhere, right? Perhaps somewhere in GitHub?
>>
>> How is this useful? It appears to me (IMHO) that a good understanding of 
>> the data flow and mapping could help reduce event process time; reduce data 
>> storage needs; increase functionality; improve simplicity and make it a 
>> better product. Looking at the architecture of WeeWx, the process diagram 
>> shows a single thread for each event (i.e., UDP packet arrival); followed 
>> by event processing; followed by data-recording; followed by report 
>> generation. All of these appear to be serial. If the front end was to be 
>> multi-threaded to allow multiple device/station packet reception and placed 
>> in a triggered repository that the rest of WeeWx processed, wouldn't WeeWx 
>> be able to process multiple stations data without having multiple instances 
>> of WeeWx (and multiple SQL db's). A quick, multi-threaded front-end could 
>> be prioritized to allow for sub-second processing. Leaving more "leisured" 
>> approach for the rest of Weewx to process and create reports. 
>>
>> I'm a bit rusty in data-mapping (did it from 1975 through 1998) so i'm 
>> sure any post-grad would be sharper at this then I am.  Perhaps having some 
>> Data Science Professor create a post-doc project that would help (assuming 
>> i'm not too far off base here)!  Being retired, i now have time on my hands 
>> to assist if anyone thinks that would be useful.
>>
>> On Saturday, February 27, 2021 at 5:43:09 PM UTC-8 vince wrote:
>>
>>> I'd suggest your commentary is more than a little bit unfair and 
>>> inaccurate, and I'll leave it at that.
>>>
>>> Weewx by default uses an underlying sqlite3 database and puts its 
>>> readings into an 'archive' table that has a large number of fields for the 
>>> actual weather measurements.
>>>
>>> CREATE TABLE archive (`dateTime` INTEGER NOT NULL UNIQUE PRIMARY KEY, 
>>> `usUnits` INTEGER NOT NULL, `interval` INTEGER NOT NULL, `altimeter` REAL, 
>>> `appTemp` REAL, `appTemp1` REAL, `barometer` REAL, `batteryStatus1` REAL, 
>>> `batteryStatus2` REAL, `batteryStatus3` REAL, `batteryStatus4` REAL, 
>>> `batteryStatus5` REAL, `batteryStatus6` REAL, `batteryStatus7` REAL, 
>>> `batteryStatus8` REAL, `cloudbase` REAL, `co` REAL, `co2` REAL, 
>>> `consBatteryVoltage` REAL, `dewpoint` REAL, `dewpoint1` REAL, `ET` REAL, 
>>> `extraHumid1` REAL, `extraHumid2` REAL, `extraHumid3` REAL, `extraHumid4` 
>>> REAL, `extraHumid5` REAL, `extraHumid6` REAL, `extraHumid7` REAL, 
>>> `extraHumid8` REAL, `extraTemp1` REAL, `extraTemp2` REAL, `extraTemp3` 
>>> REAL, `extraTemp4` REAL, `extraTemp5` REAL, `extraTemp6` REAL, `extraTemp7` 
>>> REAL, `extraTemp8` REAL, `forecast` REAL, `hail` REAL, `hailBatteryStatus` 
>>> REAL, `hailRate` REAL, `heatindex` REAL, `heatindex1` REAL, `heatingTemp` 
>>> REAL, `heatingVoltage` REAL, `humidex` REAL, `humidex1` REAL, `inDewpoint` 
>>> REAL, `inHumidity` REAL, `inTemp` REAL, `inTempBatteryStatus` REAL, 
>>> `leafTemp1` REAL, `leafTemp2` REAL, `leafWet1` REAL, `leafWet2` REAL, 
>>> `lightning_distance` REAL, `lightning_disturber_count` REAL, 
>>> `lightning_energy` REAL, `lightning_noise_count` REAL, 
>>> `lightning_strike_count` REAL, `luminosity` REAL, `maxSolarRad` REAL, `nh3` 
>>> REAL, `no2` REAL, `noise` REAL, `o3` REAL, `outHumidity` REAL, `outTemp` 
>>> REAL, `outTempBatteryStatus` REAL, `pb` REAL, `pm10_0` REAL, `pm1_0` REAL, 
>>> `pm2_5` REAL, `pressure` REAL, `radiation` REAL, `rain` REAL, 
>>> `rainBatteryStatus` REAL, `rainRate` REAL, `referenceVoltage` REAL, 
>>> `rxCheckPercent` REAL, `signal1` REAL, `signal2` REAL, `signal3` REAL, 
>>> `signal4` REAL, `signal5` REAL, `signal6` REAL, `signal7` REAL, `signal8` 
>>> REAL, `snow` REAL, `snowBatteryStatus` REAL, `snowDepth` REAL, 
>>> `snowMoisture` REAL, `snowRate` REAL, `so2` REAL, `soilMoist1` REAL, 
>>> `soilMoist2` REAL, `soilMoist3` REAL, `soilMoist4` REAL, `soilTemp1` REAL, 
>>> `soilTemp2` REAL, `soilTemp3` REAL, `soilTemp4` REAL, `supplyVoltage` REAL, 
>>> `txBatteryStatus` REAL, `UV` REAL, `uvBatteryStatus` REAL, 
>>> `windBatteryStatus` REAL, `windchill` REAL, `windDir` REAL, `windGust` 
>>> REAL, `windGustDir` REAL, `windrun` REAL, `windSpeed` REAL);
>>>
>>> If you look at the beginning of the line, you'll see the unique key is 
>>> the dateTime (seconds since the unix epoch).
>>>
>>> There are also a number of additional sqlite3 tables created, one per 
>>> item above, with the dateTime for the record, the max/min for that 
>>> measurement, when the max/min occurred, and also for things that are 
>>> accumulated a running total.
>>>
>>> CREATE TABLE archive_day_inTemp (dateTime INTEGER NOT NULL UNIQUE 
>>> PRIMARY KEY, min REAL, mintime INTEGER, max REAL, maxtime INTEGER, sum 
>>> REAL, count INTEGER, wsum REAL, sumtime INTEGER);
>>>
>>> These tables are not connected in any way at the database level, they're 
>>> individual tables.  They're also using the dateTime of the record as the 
>>> primary key for the records in the table.
>>>
>>> Weewx handles the heavy lifting for keeping the summary tables up to 
>>> date from the data in the archive table.   When you see people who ask how 
>>> to clear bad data from their (archive table in the) database, they are 
>>> generally told to 'drop daily' (which deletes the summary tables) and 
>>> 'rebuild daily' (which regenerates the summary tables from the 
>>> hand-modified archive table).
>>>
>>> That help any ? 
>>>
>>> I guess I'm kinda speechless re: using a drawing tool to visualize a 
>>> database nor really what you're trying to do.   Perhaps the tool you want 
>>> to use is google and going through the sqlite3 documentation online to 
>>> better understand how to reverse engineer and examine a sqlite3 database, 
>>> but I guess that's a bit harsh.  I'll assume that you're overthinking 
>>> things a bit and looking for complexity where it doesn't exist.
>>>
>>>
>>>    - archive table has all the 'meat'
>>>    - summary archive_whatever tables have the pre-computed summaries of 
>>>    max/min/sums of each element for later use
>>>    
>>>
>>>
>>>

-- 
You received this message because you are subscribed to the Google Groups 
"weewx-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/weewx-user/5b409fc8-0ad6-47d6-8f30-a7e16e62bc2dn%40googlegroups.com.

[weewx-user] Re: SQLite schema report

Reply via email to