Here's an updated query that does better checking and it can delete the
rows or update the temperatures. Test it out on a copy of your database
first. This works for sqlite. Don't know about others.
Walt
On Sunday, October 3, 2021 at 6:48:23 AM UTC-5 [email protected] wrote:
> Exactly. Is there a way to consider only cases such as +0, -10, +0 (i.e.
> spikes) and not those like +0, -6. -8 ?
> Meanwhile I thank you for your precious help.
> Andrea
>
> Il giorno giovedì 30 settembre 2021 alle 23:03:35 UTC+2 [email protected]
> ha scritto:
>
>> Just a reminder, this delete query will delete all situations where the
>> criteria is met. Some of these situations are legit, temps can change
>> rapidly. You want to make sure it's going to delete only the records you
>> want before running it. There are probably some enhancements that can be
>> made to the query to find more obvious spikes. Something like +0, -10, +0
>> might be a spike, whereas +0, -6. -8 is not, the temp is just falling fast.
>>
>> On Thursday, September 30, 2021 at 1:11:05 PM UTC-5 [email protected]
>> wrote:
>>
>>> Well, make sure you back up your database before you try to delete
>>> anything in case this goes wrong. I would
>>>
>>> 1. Stop weewx
>>> 2. Make a copy of the database
>>> 3. Execute the delete SQL
>>> 4. Run the original query again to see if things are OK
>>> 5. Rebuild weewx dailies
>>> 6. Start weewx
>>>
>>> Hopefully I'm not forgetting anything.
>>>
>>> I'm attaching SQL to delete the spiked records. You will have to change
>>> the database name in the .open statement to the db you want to change. You
>>> may need to run sqlite3 with sudo in order to delete records.
>>>
>>> I thought a better solution would be to update the spiked temp with an
>>> average of the readings surrounding the spike, but I'm having trouble
>>> coming up with the SQL to do that.
>>>
>>> Walt
>>> On Thursday, September 30, 2021 at 9:41:17 AM UTC-5 [email protected]
>>> wrote:
>>>
>>>> Thank you Walt, this script works fine! in fact it highlighted that
>>>> spike in my graph:
>>>>
>>>> DT datetime temp temp_prior
>>>> temp_next variance d_back d_forw
>>>> ------------------- ---------- ---------------- ----------------
>>>> ---------------- ---------- ------------------ ----------------
>>>> 2021-09-30 04:15:00 1632968100 19.2970370370371 19.3363567649282
>>>> 14.5003779289494 2.38 0.0393197278911508 4.79665910808766
>>>>
>>>> (I added two columns d_back = |t1-t2| and t_forw = |t1-t3|)
>>>> Now, I need to delete the wrong record: in this case, it is t3 (d_forw
>>>> = 4.8). How can I automatically find it and then remove? Unfortunately I
>>>> don't know the sql language and I can't complete the script myself.
>>>>
>>>> Thank you very much,
>>>> Andrea
>>>> Il giorno giovedì 30 settembre 2021 alle 15:27:37 UTC+2
>>>> [email protected] ha scritto:
>>>>
>>>>> Here's a query using the "join a table to itself" method if you don't
>>>>> have the LAG function. It prints observations that are outside the
>>>>> average
>>>>> of the surrounding observations by 2+ degrees. I get about 30
>>>>> observations
>>>>> this year and I think it's legit. Although I don't get spikes in my
>>>>> data,
>>>>> I do use a 15 minute interval and my station is closer to the ground than
>>>>> it should be. A lot can happen in 15 minutes and my temp graph is always
>>>>> kinda wiggly.
>>>>>
>>>>> Caveat Emptor on the attached query. You could modify it to delete or
>>>>> update the offending rows instead of printing them out.
>>>>>
>>>>> Walt
>>>>>
>>>>>
>>>>> On Thursday, September 30, 2021 at 7:52:41 AM UTC-5 [email protected]
>>>>> wrote:
>>>>>
>>>>>> Yes, it's possible, but it's not a simple SELECT statement. For
>>>>>> example, this query will return all rows where the temperature
>>>>>> difference
>>>>>> between adjacent rows is greater than 1 degree:
>>>>>>
>>>>>>
>>>>>> *SELECT g.* FROM (SELECT dateTime,
>>>>>> datetime(dateTime,'unixepoch','localtime'), LAG(outTemp) OVER (ORDER BY
>>>>>> dateTime) AS prev_outTemp, outTemp FROM archive) AS g WHERE
>>>>>> ABS(g.outTemp-g.prev_outTemp) > 1;*
>>>>>>
>>>>>> For my own database, this returns
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> *1192323300|2007-10-13 17:55:00|64.099998|63.01192382100|2007-10-14
>>>>>> 10:15:00|51.900002|53.01192756800|2007-10-18
>>>>>> 18:20:00|44.700001|46.7000011192757100|2007-10-18
>>>>>> 18:25:00|46.700001|49.51192757400|2007-10-18 18:30:00|49.5|47.400002*
>>>>>> ...
>>>>>>
>>>>>> You would then need to arrange to delete those specific timestamps.
>>>>>>
>>>>>> The select statement will only work with modern versions of sqlite,
>>>>>> which have the LAG() function.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Sep 30, 2021 at 3:35 AM [email protected] <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi there,
>>>>>>> I need to remove some temperature spikes from my weewx.sdb database.
>>>>>>> I know this has been already asked many times, but my case is
>>>>>>> different. I
>>>>>>> need to delete spikes in which the values are climatically acceptable:
>>>>>>> so,
>>>>>>> it is not sufficient to give (temperature is intended in °C)
>>>>>>>
>>>>>>> echo "delete from archive where outTemp < -6;" | sudo sqlite3
>>>>>>> /home/weewx/archive/weewx.sdb
>>>>>>>
>>>>>>> for solving the problem. In my case, the spike was about 16°C when
>>>>>>> temperature was about 20°C (see the graph below)
>>>>>>> [image: daytempdew.png]
>>>>>>> What I need to do is something like: DELETE if Δt = 300 AND |ΔT| > 5,
>>>>>>> where
>>>>>>> Δt is time difference (in seconds, I chose 300 because my archive
>>>>>>> interval is 5 minutes) and ΔT is the temperature difference in that
>>>>>>> time
>>>>>>> bin.
>>>>>>> In other words I am thinking about using a differential approach. It
>>>>>>> seems to me that this can be the only solution, is this possible with
>>>>>>> sqlite3?
>>>>>>> Thank you,
>>>>>>> Andrea
>>>>>>>
>>>>>>> --
>>>>>>> You received this message because you are subscribed to the Google
>>>>>>> Groups "weewx-user" group.
>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>> send an email to [email protected].
>>>>>>> To view this discussion on the web visit
>>>>>>> https://groups.google.com/d/msgid/weewx-user/32298c8c-6d93-4a80-8b9e-29fdcc181c8an%40googlegroups.com
>>>>>>>
>>>>>>> <https://groups.google.com/d/msgid/weewx-user/32298c8c-6d93-4a80-8b9e-29fdcc181c8an%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>>> .
>>>>>>>
>>>>>>
--
You received this message because you are subscribed to the Google Groups
"weewx-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/weewx-user/a3b5661f-58d6-4c17-8ce2-4c55ca620eb1n%40googlegroups.com.
-- find anomalies in temperature observations
-- optionally alter the temperature or
-- delete the entire observation record
--
-- see comments below for:
-- naming the database to operate on,
-- setting your archive interval,
-- refining the search,
-- what to do with the results (print, alter, or delete)
.mode columns
.headers on
.changes on
-- name the database to query/alter
.open c:/users/wfs91/documents/weather/weewx2.sdb
create temp table result1 as
with t0 as (
select t1.datetime
, t2.outtemp as temp_prior
, t1.outtemp as temp
, t3.outtemp as temp_next
, round(((t2.outtemp + t3.outtemp) / 2), 2) as temp_avg
, round(abs(t1.outtemp - ((t2.outtemp + t3.outtemp) / 2)), 2) as
temp_variance
from archive as t1
-- set your archive interval in the next two lines; my interval is 15 minutes
but many people use 5
, archive as t2 on t2.datetime = t1.datetime - (15 * 60)
, archive as t3 on t3.datetime = t1.datetime + (15 * 60)
-- uncomment the following line and change dates to limit the search over time
-- where date(t1.datetime, 'unixepoch', 'localtime') between '2000-01-01' and
'2030-12-31'
)
select datetime(datetime, 'unixepoch', 'localtime') as DT
, *
from t0
-- change the values in this where clause to refine the search
-- here we select temps that:
-- vary a certain amount from the average of the adjacent observations
-- are either higher or lower than both of the adjacent observations
-- depart a certain amount from adjacent observations
-- (a small change at either end might indicate legit observations)
where temp_variance >= 2.0
and
((temp > temp_prior and temp > temp_next) or (temp < temp_prior and temp
< temp_next))
and
abs(temp_prior - temp) >= 1.0
and
abs(temp_next - temp) >= 1.0
;
-- print out the rows
select * from result1
;
-- to delete the rows found, uncomment the statement below
-- NB: this deletes the entire row, all observations (temp, humidity, wind,
etc) are lost
--delete from archive where datetime in (select datetime from result1)
;
-- to update the temperature with the average of the adjacent observations
-- uncomment the next two lines
--update archive as ta set outtemp = (select temp_avg from result1 where
datetime == ta.datetime)
--where datetime in (select datetime from result1)
;