2014-09-21 17:00 GMT+02:00 Mart Kelder <[email protected]>: > Hi Pavel (and others), > > Op zondag 21 september 2014 15:35:46 schreef u: > > 2014-09-21 14:30 GMT+02:00 Mart Kelder <[email protected]>: > > > Hi Pavel (and others), > > > > > > Pavel Stehule wrote: > > > > Hi > > > > I tried to solve following task: > > > > > > > > I have a table > > > > > > > > start, reason, km > > > > ============= > > > > > > > > 2014-01-01 08:00:00, private, 10 > > > > 2014-01-01 09:00:00, commerc, 20 > > > > 2014-01-01 10:00:00, commerc, 20 > > > > 2014-01-01 11:00:00, private, 8 > > > > > > > > and I would reduce these rows to > > > > > > > > 2014-01-01 08:00:00, private, 10 > > > > 2014-01-01 09:00:00, commerc, 20 + 20 = 40 > > > > 2014-01-01 11:00:00, private, 8 > > > > > > > > It is relative hard to it now with SQL only. But we can simplify this > > > > task > > > > > > > > with window function that returns number of change in some column. > Then > > > > this task can be solved by > > > > > > > > select min(start), min(reason), sum(km) > > > > from (select start, reason, km, change_number(reason) over (order > by > > > > start)) > > > > > > > > group by change_number; > > > > > > What about > > > > > > select srk.reason, min(srk.start), sum(srk.km) > > > from start_reason_km srk > > > group by srk.reason, (select max(start) from start_reason_km other > WHERE > > > other.start < srk.start and other.reason != srk.reason); > > > > This query is Cartesian product, so for some large data it is > significantly > > slower then window function (required only sorts without joins) > > I think we have the same queryplan in mind (with only one scan). As far as > I > know, SQL is a language where you define the result you want to get, and > let > the server find a way how to find the data. I think windowed function also > say > how the server needs to get the information. >
What I know It is not true now. Any subselect enforce individual scan of source relation. Postgres has no any special support for selfjoin. > > The real challenge is of course of finding heuristics to remove the > additional > join. In this particular case, I can tell how to remove the inner join from > the subquery: > * the where-clause of the self-join contains other.start < srk.start. From > that we can conclude that if the table is (or can be) sorted on "start", we > have seen the data before the subquery is executed > * because we only need an aggregate, we need to store the intermediate > "max" > for each "reason". And then add the result to the stream. > > Recently, I had a simular problem with a table containing a timestamp, a > state > and a object where the state belongs to. A object remains in a state until > there is a more recent tuple in the table. I needed basically to query all > the > previous state for each tuple, but preverably without the additional join. > > > My motivation was a) to implement described task without Cartesian > product. > Good reason (if you consider the queryplan and not the query). > > yes. There is probably big space for improvements in more directions. For example - we have application, where is often used pattern SELECT FROM A JOIN (SELECT someagg() FROM A) .. ON Sometimes these queries are slow due terrible low estimation. It is one example of more Pavel > > b) introduce some tool for this kind of problems. I seen more times a > > request .. reduce a time series, and a window function "change_number" > (or > > maybe "consistent_series_number") can be good candidate. > > I also need to note that there is a lot of difference in complexity > between the > possible solutions of this problem. Where a new window function can > probably > be very easy implemented, the optimizer changes descripted above are > complex > and not easy to implement. > > I also want to note that I am not really against new window functions, I > only > want to point out that a more generic solution also might be possible. > > Regards, > > Mart > > > -- > Sent via pgsql-hackers mailing list ([email protected]) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers >
