Just a quick update to this topic from my testing: I whipped up a quick Python 
script to create the partition tables for me, which went smoothly enough, and 
created a table LIST-partitioned on station. Once populated with my current 
data, this table proved marginally faster than the unpartitioned table, 
especially on the initial SELECT (~4 seconds vs ~6 seconds, speeding up to 
around 2.4 seconds on subsequent queries). Of course, it is entirely possible 
that performance will remain better than with the unpartitioned table as the 
row count grows. 
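
For what it’s worth, the script amounted to little more than string-building the DDL. A minimal sketch of the idea (the table name, columns, and station codes here are placeholders, not my actual schema):

```python
def partition_ddl(parent, stations):
    """Return DDL statements: a parent table LIST-partitioned on station,
    plus one partition per known station."""
    ddl = [
        f"CREATE TABLE {parent} ("
        "station text NOT NULL, "
        "channel text NOT NULL, "
        "datetime timestamptz NOT NULL, "
        "value double precision"
        ") PARTITION BY LIST (station);"
    ]
    for station in stations:
        # One partition per station, named after the station code.
        ddl.append(
            f"CREATE TABLE {parent}_{station.lower()} "
            f"PARTITION OF {parent} FOR VALUES IN ('{station}');"
        )
    return ddl

# Example with made-up station codes:
statements = partition_ddl("readings", ["AUGU", "CLES"])
```

The script then just executed each statement in turn against the database.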

Then I tried partitioning by station, with the station tables sub-partitioned 
by channel, on the logic that most queries (especially the ones that need to be 
fast) are only interested in a single channel on a single station. This made a 
HUGE improvement (relatively speaking). Initial query time dropped to ~2.5 
seconds, with subsequent queries coming in at closer to 1 second!
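
Extending the same generation approach to the two-level layout is straightforward: each station partition is itself declared PARTITION BY LIST (channel), with one leaf table per channel. Again a sketch with placeholder names:

```python
def subpartition_ddl(parent, station, channels):
    """DDL for one station partition that is itself LIST-partitioned by
    channel, plus one leaf partition per channel."""
    name = f"{parent}_{station.lower()}"
    ddl = [
        # The station partition is declared as partitioned, not a plain table.
        f"CREATE TABLE {name} PARTITION OF {parent} "
        f"FOR VALUES IN ('{station}') PARTITION BY LIST (channel);"
    ]
    for ch in channels:
        ddl.append(
            f"CREATE TABLE {name}_{ch.lower()} PARTITION OF {name} "
            f"FOR VALUES IN ('{ch}');"
        )
    return ddl

# Example with made-up station/channel codes:
station_ddl = subpartition_ddl("readings", "AUGU", ["BHZ", "BHN"])
```

With this layout, a query filtered on one station and one channel should be pruned down to a single leaf table, which I assume is where the speedup comes from.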

I’ll have to think about the maintenance side - what happens if a new 
station/channel comes online that I don’t have a partition for? I’m thinking 
I’ll try to catch the error in my Python code when the INSERT of such a record 
fails, create the relevant table(s), then retry the INSERT, but I’ll have to 
investigate more to figure out whether this is an option (what sort of error 
do I get?), or whether there is a better one.
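
The control flow I have in mind is something like the following. This uses a fake in-memory “database” just to show the catch/create/retry shape without a live connection; with psycopg2 I believe the exception to catch would be errors.CheckViolation, since Postgres reports “no partition of relation ... found for row” under SQLSTATE 23514, but that’s exactly the part I still need to verify:

```python
class NoPartitionError(Exception):
    """Stands in for the missing-partition error the driver would raise."""

# Placeholder set of (station, channel) pairs that already have partitions.
existing_partitions = {("AUGU", "BHZ")}

def insert_row(station, channel, value):
    """Fake INSERT: fails if no matching leaf partition exists."""
    if (station, channel) not in existing_partitions:
        raise NoPartitionError(f"no partition for ({station}, {channel})")
    return "inserted"  # value would actually be stored here

def create_partition(station, channel):
    """Here the real code would issue CREATE TABLE ... PARTITION OF DDL."""
    existing_partitions.add((station, channel))

def safe_insert(station, channel, value):
    """Insert; on a missing-partition error, create it and retry once."""
    try:
        return insert_row(station, channel, value)
    except NoPartitionError:
        create_partition(station, channel)
        return insert_row(station, channel, value)
```

One retry should suffice, since the only recoverable failure here is the missing partition, and creating it can’t un-create it.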

I guess time will tell if this is a sustainable/good schema, but at least for 
the 1,171,575,191 rows I currently have, this gives much better performance 
than the non-partitioned table, and presumably will continue to do so into the 
future.

Of course, if anyone else has any suggestions beyond simple partitioning, I’m 
all ears! Like I said, this is far outside my experience in terms of sheer 
data size (I will be talking to the TimescaleDB people tomorrow)!

Thanks again!
---
Israel Brewster
Software Engineer
Alaska Volcano Observatory 
Geophysical Institute - UAF 
2156 Koyukuk Drive 
Fairbanks AK 99775-7320
Work: 907-474-5172
cell:  907-328-9145

> On Oct 4, 2021, at 12:46 PM, Ron <ronljohnso...@gmail.com> wrote:
> 
> On 10/4/21 12:36 PM, Israel Brewster wrote:
> [snip]
>> Indeed. Table per station as opposed to partitioning? The *most* I can 
>> reasonably envision needing is to query two stations, i.e. I could see 
>> potentially wanting to compare station a to some “baseline” station b. In 
>> general, though, the stations are independent, and it seems unlikely that we 
>> will need any multi-station queries. Perhaps query one station, then a 
>> second query for a second to display graphs for both side-by-side to look 
>> for correlations or something, but nothing like that has been suggested at 
>> the moment.
>> 
> 
> Postgresql partitions are tables.  What if you partition by station (or range 
> of stations)?
> 
> -- 
> Angular momentum makes the world go 'round.
