>Don't you have duplicate information within your UTC, location and local_time data ? Maybe you can just attach a timezone to each location...

Yes, there is duplicate information, but dealing with time zones is a PITA, and the easiest way to solve the myriad of problems I have is to store the local time as it was at that instant.
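For illustration, the table looks roughly like this (the names and the value column are just placeholders, not the exact schema):

    CREATE TABLE metrics (
        location_id integer      NOT NULL,
        utc_time    timestamptz  NOT NULL,  -- the canonical instant
        local_time  timestamp    NOT NULL,  -- wall-clock time at the location at that instant
        value       double precision        -- placeholder for the rest of the metric data
    );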
>With smaller partitions for hot data, it should be quite fast to move them one by one to the old data. I have no experience with the trigger based partitioning of Postgres (we handle partitions logically at the application level), so I'm not sure how difficult this approach is. I suppose that you'll need a function that moves data from hot to old partitions and that fixes the triggers accordingly.

This I think would be a good approach, but it does involve a heavy rewrite of the app.
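To make sure I understand the hash/modulo idea and the "function that moves data from hot to old partitions", here is a rough, untested sketch of what I imagine it would look like (the names, the bucket count, and the plain modulo instead of pghashlib are all my own assumptions):

    -- Hot data split into 8 buckets by location id (plain modulo as a stand-in for pghashlib).
    CREATE TABLE metrics_hot_0 (
        CHECK (location_id % 8 = 0)
    ) INHERITS (metrics);
    -- ... metrics_hot_1 through metrics_hot_7 defined the same way.

    -- Move everything older than a cutoff from one hot bucket into the "old" parent table,
    -- assuming metrics_old has an insert trigger that routes rows to its monthly children.
    CREATE OR REPLACE FUNCTION move_hot_bucket(bucket int, cutoff timestamptz)
    RETURNS void LANGUAGE plpgsql AS $$
    BEGIN
        EXECUTE format(
            'WITH moved_rows AS (
                 DELETE FROM metrics_hot_%s WHERE utc_time < $1 RETURNING *
             )
             INSERT INTO metrics_old SELECT * FROM moved_rows', bucket)
        USING cutoff;
    END $$;

    -- e.g. SELECT move_hot_bucket(0, now() - interval '7 days');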
On Mon, Feb 9, 2015 at 11:14 PM, Marc Mamin <m.ma...@intershop.de> wrote:
>
> >I have two partitioning questions I am hoping somebody can help me with.
> >
> >I have a fairly busy metric(ish) table. It gets a few million records per day, the data is transactional for a while but then settles down and is used for analytical purposes later.
> >
> >When a metric is reported both the UTC time and the local time are stored along with the other data belonging to the metric.
>
> Don't you have duplicate information within your UTC, location and local_time data ?
> Maybe you can just attach a timezone to each location...
>
> >I want to partition this table to both make it faster to query and also to spread out the writes. Ideally the partitions would be based on the UTC timestamp and the sending location. For example
> >
> >metrics_location_XXXXX_2015_01_01
> >
> >First problem with this approach is that there could be tens of thousands of locations, so this is going to result in hundreds of thousands of tables. I know there are no upper limits to how many tables there are, but I am thinking this might really get me into trouble later.
>
> With only a few million rows per day, weekly or even monthly partitions without regard to locations should be sufficient for older data.
> It should be possible to partition your hot data differently. But instead of using one partition per location, you may use a hash/modulo approach to keep the number of partitions at a reasonable count, if required at all (this can be helpful: https://github.com/markokr/pghashlib). Here I would avoid including time information except for the boundary between old and hot tables. And depending on the pattern and performance requirements of your analytic queries this may be sufficient (i.e. don't partition on the time at all).
> With smaller partitions for hot data, it should be quite fast to move them one by one to the old data. I have no experience with the trigger based partitioning of Postgres (we handle partitions logically at the application level), so I'm not sure how difficult this approach is. I suppose that you'll need a function that moves data from hot to old partitions and that fixes the triggers accordingly.
>
> >Second and possibly more vexing problem is that often the local time is queried. Ideally I would like to put three constraints on the child tables: location id, UTC timestamp and the local time, but obviously the local timestamps would overlap with other locations in the same timezone. Even if I were to only partition by UTC, the local timestamps would overlap between tables.
> >
> >So the questions in a nutshell are:
> >
> >1. Should I be worried about having possibly hundreds of thousands of shards?
> >2. Is PG smart enough to handle overlapping constraints on tables and limit its querying to only those tables that have the correct time constraint?
>
> If you partition on the UTC time only, you don't have overlapping. When querying on the local time, the planner will consider all partitions, but an additional index or constraint on this column should be sufficient as long as your partition count remains small.
>
> regards,
> Marc Mamin
>
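PS: if I follow the last suggestion above (partition the old data on UTC only and just index local_time), I take it a child table would look something like this (again a made-up, untested sketch):

    -- Monthly "old" partition constrained on UTC only, so constraint exclusion can prune on utc_time;
    -- queries on local_time are served by an ordinary index rather than a partition constraint.
    CREATE TABLE metrics_2015_01 (
        CHECK (utc_time >= '2015-01-01 00:00:00+00' AND utc_time < '2015-02-01 00:00:00+00')
    ) INHERITS (metrics);
    CREATE INDEX ON metrics_2015_01 (local_time);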