Re: [weewx-user] ;:; SQLite schema report

David Prellwitz Tue, 02 Mar 2021 21:15:01 -0800

Thank you both for the replies! I've dived into the end of this pool 
without comprehending its depth. 
This all started after spending more than a few frustrating days trying to 
find the documents for, or figure out a proper set of, field maps to cover 
three apparent subsets of WeatherFlow data into the Weewx (UDP capture from 
station; Lightning data import; API get from WF) plus slightly different 
field names from some of those CSV headers.  As i looked over each of the 
CSV header entries, each header seemed to have several anomalies (names, 
abbreviations). Then i couldn't find the unit values for each as compared 
to what Weewx was looking for.  I started looking for a definitive source 
for field data-type/unit default values. Then i started looking for what 
each of the field elements actually represented, eg - link wind elements 
(wind gusts, wind lulls, etc), lightning elements (energy, avg distance), 
precipitation elements, and other elements.   I started to map all the 
published fields from WF, Weewx, GitHub, Google Groups, etc to see if i 
could document each use to a valid Weewx field name. So, i found several 
lightning elements mapped to fields with "XXX", "YYY", signal7, signal8, so 
on and so forth. I started looking for a Tempest station data mapping info 
similar to other station data (Acurite, CC3000, etc) found in the Hardware 
section, and tried to find measurement unit values (i.e., m/s vice 
meters-per-second; etc) and defaults for each field element. Not found. So 
i thought i'd try to map all of WF's elements, with measurement values, 
with weewx fields - as many as i could find. But, to do that, i need to 
know what each element represents - on both sides.


So that's were i sit - trying to use the tools i know to document WF/Weewx 
data fields. Can you provide some useful links and/or point me in the right 
direction to gather such info? I'll try to gather what i can and get it 
back to you if it doesn't exist,
thanks!
On Sunday, February 28, 2021 at 5:31:21 PM UTC-8 [email protected] wrote:

> Yes, the other tables are "daily summaries." They are strictly an 
> optimization. There's nothing in them that cannot be extracted from the 
> main archive table and, indeed, they can be rebuilt from the latter using 
> the tool wee_database.
>
> WeeWX is multi-threaded, but not in the way you describe. Yes, there is a 
> single "main" thread that blocks, waiting for data, then process it, then 
> stores it in the database. A separate thread handles reporting. Still other 
> threads upload through a RESTful facility.
>
> On the surface, multi-threading data acquisition (or, alternatively, using 
> an async interface) sounds attractive. However, if the goal is to acquire 
> data simultaneously from several sources, that can as easily be 
> accomplished by launching multiple instances of WeeWX. If the goal is to 
> intertwine the various sources and have them all stored in the same table 
> in the same database, that is far more complicated. The simple architecture 
> of WeeWX would be lost. 
>
> However, if you have some concrete, specific, ideas on how to service more 
> than one device simultaneously, then I'm all ears. I would really love to 
> take advantage of the new "async" interfaces in modern versions of Python 
> (unfortunately, the 3rd party libraries needed to use them have not caught 
> up. Specifically, async versions of pyusb and pyserial). But, we are way 
> beyond general ideas. What's needed is a working framework, even if it's 
> merely a prototype.
>
> Through the years, these ideas have come up from time to time. Various 
> alternative architectures have been considered. However, simplicity has 
> always won out. I think the rich ecosystem of 3rd party extensions that 
> surrounds WeeWX has shown that to be a good choice. 
>
> -tk
>
> On Sun, Feb 28, 2021 at 2:44 PM David Prellwitz <[email protected]> 
> wrote:
>
>> Thanks for the quick reply!  I apologize for any negative connotations 
>> (not intended), i guess my perceptions and assumptions are off-base! it's 
>> just in all my professional DB roles (25yrs +) i've had to use visual 
>> layout tools to understand existing data creation, utilization, ranges, 
>> applications and operational impacts. These visual tools allowed us to use 
>> as much third-normal form reduction steps as possible to simplify data 
>> architecture to the lowest possible level. My misjudgement of WeeWx layout 
>> and structure is just that, my misjudgement. 
>>
>> Perhaps i am over-thinking things as i try to get basic things done. 
>> Habits of a long life in IT Management/DB Architecture/Programming. I just 
>> think i may be some ideas and observations that may improve WeeWx but, I 
>> can drop these distractions if it's an issue. 
>>
>> Now, the issue I had was the lack of understanding the role of the 
>> additional tables - they didn't seem to have any relationship to the 
>> archive table; other than Epoch. If i'm reading your statement correctly, 
>> all the other tables are the detail records for each of the elements in the 
>> archive table - just related by element type ("Name"?) and epoch timestamp? 
>> So if i went looking for "Wind Gust Direction" instances, I'd look in the 
>> Archive table for "WindGustDir" and find an entry (which represents what, a 
>> summary, an average, ?),  and one or more "Archive_day_windGustDir" 
>> entries, depending on the frequency of that element's recording? (Assuming 
>> archived wind-gust data is recorded for one-minute intervals, but 
>> "Archive_day_windGustDir" are recorded at 3second intervals for 
>> Wind-Gust-Dir table.)  Knowledge at that level of detail should be known 
>> somewhere, right? Perhaps somewhere in GitHub?
>>
>> How is this useful? It appears to me (IMHO) that a good understanding of 
>> the data flow and mapping could help reduce event process time; reduce data 
>> storage needs; increase functionality; improve simplicity and make it a 
>> better product. Looking at the architecture of WeeWx, the process diagram 
>> shows a single thread for each event (i.e., UDP packet arrival); followed 
>> by event processing; followed by data-recording; followed by report 
>> generation. All of these appear to be serial. If the front end was to be 
>> multi-threaded to allow multiple device/station packet reception and placed 
>> in a triggered repository that the rest of WeeWx processed, wouldn't WeeWx 
>> be able to process multiple stations data without having multiple instances 
>> of WeeWx (and multiple SQL db's). A quick, multi-threaded front-end could 
>> be prioritized to allow for sub-second processing. Leaving more "leisured" 
>> approach for the rest of Weewx to process and create reports. 
>>
>> I'm a bit rusty in data-mapping (did it from 1975 through 1998) so i'm 
>> sure any post-grad would be sharper at this then I am.  Perhaps having some 
>> Data Science Professor create a post-doc project that would help (assuming 
>> i'm not too far off base here)!  Being retired, i now have time on my hands 
>> to assist if anyone thinks that would be useful.
>>
>> On Saturday, February 27, 2021 at 5:43:09 PM UTC-8 vince wrote:
>>
>>> I'd suggest your commentary is more than a little bit unfair and 
>>> inaccurate, and I'll leave it at that.
>>>
>>> Weewx by default uses an underlying sqlite3 database and puts its 
>>> readings into an 'archive' table that has a large number of fields for the 
>>> actual weather measurements.
>>>
>>> CREATE TABLE archive (`dateTime` INTEGER NOT NULL UNIQUE PRIMARY KEY, 
>>> `usUnits` INTEGER NOT NULL, `interval` INTEGER NOT NULL, `altimeter` REAL, 
>>> `appTemp` REAL, `appTemp1` REAL, `barometer` REAL, `batteryStatus1` REAL, 
>>> `batteryStatus2` REAL, `batteryStatus3` REAL, `batteryStatus4` REAL, 
>>> `batteryStatus5` REAL, `batteryStatus6` REAL, `batteryStatus7` REAL, 
>>> `batteryStatus8` REAL, `cloudbase` REAL, `co` REAL, `co2` REAL, 
>>> `consBatteryVoltage` REAL, `dewpoint` REAL, `dewpoint1` REAL, `ET` REAL, 
>>> `extraHumid1` REAL, `extraHumid2` REAL, `extraHumid3` REAL, `extraHumid4` 
>>> REAL, `extraHumid5` REAL, `extraHumid6` REAL, `extraHumid7` REAL, 
>>> `extraHumid8` REAL, `extraTemp1` REAL, `extraTemp2` REAL, `extraTemp3` 
>>> REAL, `extraTemp4` REAL, `extraTemp5` REAL, `extraTemp6` REAL, `extraTemp7` 
>>> REAL, `extraTemp8` REAL, `forecast` REAL, `hail` REAL, `hailBatteryStatus` 
>>> REAL, `hailRate` REAL, `heatindex` REAL, `heatindex1` REAL, `heatingTemp` 
>>> REAL, `heatingVoltage` REAL, `humidex` REAL, `humidex1` REAL, `inDewpoint` 
>>> REAL, `inHumidity` REAL, `inTemp` REAL, `inTempBatteryStatus` REAL, 
>>> `leafTemp1` REAL, `leafTemp2` REAL, `leafWet1` REAL, `leafWet2` REAL, 
>>> `lightning_distance` REAL, `lightning_disturber_count` REAL, 
>>> `lightning_energy` REAL, `lightning_noise_count` REAL, 
>>> `lightning_strike_count` REAL, `luminosity` REAL, `maxSolarRad` REAL, `nh3` 
>>> REAL, `no2` REAL, `noise` REAL, `o3` REAL, `outHumidity` REAL, `outTemp` 
>>> REAL, `outTempBatteryStatus` REAL, `pb` REAL, `pm10_0` REAL, `pm1_0` REAL, 
>>> `pm2_5` REAL, `pressure` REAL, `radiation` REAL, `rain` REAL, 
>>> `rainBatteryStatus` REAL, `rainRate` REAL, `referenceVoltage` REAL, 
>>> `rxCheckPercent` REAL, `signal1` REAL, `signal2` REAL, `signal3` REAL, 
>>> `signal4` REAL, `signal5` REAL, `signal6` REAL, `signal7` REAL, `signal8` 
>>> REAL, `snow` REAL, `snowBatteryStatus` REAL, `snowDepth` REAL, 
>>> `snowMoisture` REAL, `snowRate` REAL, `so2` REAL, `soilMoist1` REAL, 
>>> `soilMoist2` REAL, `soilMoist3` REAL, `soilMoist4` REAL, `soilTemp1` REAL, 
>>> `soilTemp2` REAL, `soilTemp3` REAL, `soilTemp4` REAL, `supplyVoltage` REAL, 
>>> `txBatteryStatus` REAL, `UV` REAL, `uvBatteryStatus` REAL, 
>>> `windBatteryStatus` REAL, `windchill` REAL, `windDir` REAL, `windGust` 
>>> REAL, `windGustDir` REAL, `windrun` REAL, `windSpeed` REAL);
>>>
>>> If you look at the beginning of the line, you'll see the unique key is 
>>> the dateTime (seconds since the unix epoch).
>>>
>>> There are also a number of additional sqlite3 tables created, one per 
>>> item above, with the dateTime for the record, the max/min for that 
>>> measurement, when the max/min occurred, and also for things that are 
>>> accumulated a running total.
>>>
>>> CREATE TABLE archive_day_inTemp (dateTime INTEGER NOT NULL UNIQUE 
>>> PRIMARY KEY, min REAL, mintime INTEGER, max REAL, maxtime INTEGER, sum 
>>> REAL, count INTEGER, wsum REAL, sumtime INTEGER);
>>>
>>> These tables are not connected in any way at the database level, they're 
>>> individual tables.  They're also using the dateTime of the record as the 
>>> primary key for the records in the table.
>>>
>>> Weewx handles the heavy lifting for keeping the summary tables up to 
>>> date from the data in the archive table.   When you see people who ask how 
>>> to clear bad data from their (archive table in the) database, they are 
>>> generally told to 'drop daily' (which deletes the summary tables) and 
>>> 'rebuild daily' (which regenerates the summary tables from the 
>>> hand-modified archive table).
>>>
>>> That help any ? 
>>>
>>> I guess I'm kinda speechless re: using a drawing tool to visualize a 
>>> database nor really what you're trying to do.   Perhaps the tool you want 
>>> to use is google and going through the sqlite3 documentation online to 
>>> better understand how to reverse engineer and examine a sqlite3 database, 
>>> but I guess that's a bit harsh.  I'll assume that you're overthinking 
>>> things a bit and looking for complexity where it doesn't exist.
>>>
>>>
>>>    - archive table has all the 'meat'
>>>    - summary archive_whatever tables have the pre-computed summaries of 
>>>    max/min/sums of each element for later use
>>>    
>>>
>>>
>>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "weewx-user" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/weewx-user/59c1ba34-84c8-42f7-8b01-249f9b25b350n%40googlegroups.com
>>  
>> <https://groups.google.com/d/msgid/weewx-user/59c1ba34-84c8-42f7-8b01-249f9b25b350n%40googlegroups.com?utm_medium=email&utm_source=footer>
>> .
>>
>

-- 
You received this message because you are subscribed to the Google Groups 
"weewx-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/weewx-user/eaef23d0-6b68-4db6-80f8-65abafe33f43n%40googlegroups.com.

Re: [weewx-user] ;:; SQLite schema report

Reply via email to