Hi,

Please don't top-post. If you're not responding to parts of the e-mail, then don't quote it.
On Fri, Sep 06, 2019 at 12:50:33PM +0200, Esteban Zimanyi wrote:
Dear Tom,

Many thanks for your quick reply. Indeed, both solutions you proposed can be combined to solve all the problems. However, changes in the code are needed. Let me first elaborate on the solution that combines stakind/staop, and then on adding a new kind identifier.

To understand the setting, let me explain a little more about the different kinds of temporal types. As explained in my previous email, these are types whose values are composed of elements v@t, where v is a PostgreSQL/PostGIS type (float or geometry) and t is a TimestampTz. There are four kinds of temporal types, depending on their duration:

* Instant: Values of the form v@t. These are used, for example, to represent car accidents, as in Point(0 0)@2000-01-01 08:30
* InstantSet: A set of values {v1@t1, ..., vn@tn} where the values between the points are unknown. These are used, for example, to represent check-ins in FourSquare or RFID readings.
* Sequence: A sequence of values [v1@t1, ..., vn@tn] where the values between two successive instants vi@ti and vj@tj are (linearly) interpolated. These are used, for example, to represent GPS tracks.
* SequenceSet: A set of sequences {s1, ..., sn} where there is a temporal gap between them. These are used, for example, to represent GPS tracks where the signal was lost during a time period.
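To make the difference between InstantSet and Sequence concrete, here is a minimal Python sketch of the linear interpolation a Sequence implies. It is only an illustration: the function name value_at and the list-of-pairs representation are assumptions made for this example and say nothing about how the extension actually stores or evaluates its types. It also assumes float values and strictly increasing timestamps.

```python
from bisect import bisect_right
from datetime import datetime, timezone


def value_at(sequence, t):
    """Linearly interpolate a Sequence [(v1, t1), ..., (vn, tn)] at time t.

    Hypothetical helper: assumes float values and strictly increasing
    timestamps. Returns None when t lies outside the sequence.
    """
    times = [ti for _, ti in sequence]
    if not sequence or t < times[0] or t > times[-1]:
        return None
    # Index of the last instant whose timestamp is <= t.
    i = bisect_right(times, t) - 1
    v0, t0 = sequence[i]
    if t == t0:
        return v0
    v1, t1 = sequence[i + 1]
    # Dividing two timedeltas yields a plain float in [0, 1).
    frac = (t - t0) / (t1 - t0)
    return v0 + (v1 - v0) * frac


def ts(hour, minute=0):
    """Build a UTC timestamp on 2000-01-01 for the example below."""
    return datetime(2000, 1, 1, hour, minute, tzinfo=timezone.utc)


# A two-instant Sequence: 10.0 at 08:00 and 20.0 at 09:00.
track = [(10.0, ts(8)), (20.0, ts(9))]
print(value_at(track, ts(8, 30)))
```

For an InstantSet the same lookup would return a value only when t equals one of the stored timestamps, since nothing is known between the points.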
So these are 4 different data types (or classes of data types) that you introduce in your extension? Or is that just a conceptual view and it's stored in some other way (e.g. normalized in some way)?
To compute the selectivity of temporal types, we assume that the time and space dimensions are independent, and thus we can reuse all the existing analyze and selectivity infrastructure in PostgreSQL/PostGIS. For the various durations this amounts to:

* Instant: Use the functions in analyze.c and selfuncs.c independently for the value and time dimensions.
* InstantSet: Use the functions in array_typanalyze.c and array_selfuncs.c independently for the value and time dimensions.
* Sequence and SequenceSet: To simplify, we do not take the gaps into account, and thus use the functions in rangetypes_typanalyze.c and rangetypes_selfuncs.c independently for the value and time dimensions.
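As a rough illustration of what the independence assumption buys, the following Python sketch estimates each dimension from its own equi-depth histogram and multiplies the two results. The function names, the histogram bounds, and the use of plain numbers for timestamps are all assumptions made for this example; the interpolation inside a bin is only loosely modeled on what a histogram-based inequality estimate does, and is not the code in selfuncs.c.

```python
from bisect import bisect_right


def hist_fraction_below(bounds, const):
    """Estimate the fraction of values < const from equi-depth histogram bounds.

    bounds is a sorted list of bin boundaries; every bin is assumed to hold
    the same share of the rows, and values are assumed to be spread evenly
    inside a bin.
    """
    nbins = len(bounds) - 1
    if const <= bounds[0]:
        return 0.0
    if const >= bounds[-1]:
        return 1.0
    i = bisect_right(bounds, const) - 1
    lo, hi = bounds[i], bounds[i + 1]
    within = (const - lo) / (hi - lo) if hi > lo else 0.5
    return (i + within) / nbins


def combined_selectivity(sel_value, sel_time):
    """Under the independence assumption the selectivities simply multiply."""
    return sel_value * sel_time


# Made-up histograms: one for the value dimension, one for the time dimension.
value_bounds = [0, 10, 20, 30, 40]
time_bounds = [0, 100, 200]  # timestamps reduced to plain numbers here

sel_v = hist_fraction_below(value_bounds, 25)
sel_t = hist_fraction_below(time_bounds, 50)
print(sel_v, sel_t, combined_selectivity(sel_v, sel_t))
```

If the two dimensions are in fact correlated, the product can be far off, which is the usual price of this simplification.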
OK.
However, this requires that the analyze and selectivity functions in all the above files satisfy the following:

* Set the staop when computing statistics. For example, in rangetypes_typanalyze.c the staop is set for STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM but not for STATISTIC_KIND_BOUNDS_HISTOGRAM.
* Always call get_attstatsslot with the operator Oid, not with InvalidOid. For example, of the 17 times this function is called in selfuncs.c, only two pass an operator. This also requires passing the operator as an additional parameter to several functions. For example, the operator should be passed to the function ineq_histogram_selectivity in selfuncs.c.
* Export several top-level functions which are currently static. For example, var_eq_const, ineq_histogram_selectivity, eqjoinsel_inner and several others in the file selfuncs.c should be exported.

That would solve all the problems except for STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM, since in this case the staop will always be Float8LessOperator, independently of whether we are computing lengths of value ranges or of tstzranges. This could be solved by using a different stakind for the value and time dimensions.
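The slot-matching problem can be sketched with a toy model in Python. It mimics the matching rule of get_attstatsslot as I understand it (a slot matches when its kind equals the requested kind and the requested operator is either InvalidOid or equal to the slot's staop). Everything else is an assumption for illustration: the dictionaries, the "dim" label, and all numeric constants are made up, and KIND_TIME_LENGTH_HIST is a hypothetical new stakind, not an existing one.

```python
INVALID_OID = 0

# Illustrative constants only; not taken from the PostgreSQL catalogs.
KIND_BOUNDS_HIST = 7
KIND_LENGTH_HIST = 6
KIND_TIME_LENGTH_HIST = 101  # hypothetical extra stakind for the time dimension
FLOAT8_LT = 1001             # stand-in for Float8LessOperator
TSTZ_LT = 1002               # stand-in for the timestamptz "<" operator


def get_slot(slots, reqkind, reqop=INVALID_OID):
    """Return the first slot of the requested kind whose staop matches.

    INVALID_OID acts as a wildcard for the operator.
    """
    for slot in slots:
        if slot["stakind"] == reqkind and reqop in (INVALID_OID, slot["staop"]):
            return slot
    return None


# Bounds histograms: the operator differs per dimension, so staop is enough.
bounds_slots = [
    {"stakind": KIND_BOUNDS_HIST, "staop": FLOAT8_LT, "dim": "value"},
    {"stakind": KIND_BOUNDS_HIST, "staop": TSTZ_LT, "dim": "time"},
]

# Length histograms: both dimensions compare float8 lengths, so the pair
# (stakind, staop) is identical and the time slot is unreachable.
length_slots = [
    {"stakind": KIND_LENGTH_HIST, "staop": FLOAT8_LT, "dim": "value"},
    {"stakind": KIND_LENGTH_HIST, "staop": FLOAT8_LT, "dim": "time"},
]

# With a separate stakind for the time dimension the lookup is unambiguous.
split_slots = [
    {"stakind": KIND_LENGTH_HIST, "staop": FLOAT8_LT, "dim": "value"},
    {"stakind": KIND_TIME_LENGTH_HIST, "staop": FLOAT8_LT, "dim": "time"},
]

print(get_slot(bounds_slots, KIND_BOUNDS_HIST, TSTZ_LT)["dim"])
print(get_slot(length_slots, KIND_LENGTH_HIST, FLOAT8_LT)["dim"])
print(get_slot(split_slots, KIND_TIME_LENGTH_HIST, FLOAT8_LT)["dim"])
```

The first lookup also shows why callers passing InvalidOid are a problem in this setting: the wildcard would always return whichever slot of that kind happens to come first.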
I don't think we're strongly against changing the code to allow this, as long as it does not break existing extensions/code (unnecessarily).
If you want I can prepare a PR in order to understand the implications of these changes. Please let me know.
I think having an actual patch to look at would be helpful.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services