Thanks for the review! Will respond further after reviewing your suggested patches; this is a quick response just to the contents of the email.
On Mon, Nov 21, 2016 at 1:05 AM, Thomas Munro <thomas.mu...@enterprisedb.com> wrote: > Both ways have attracted criticism: the first involves touching > basically every core function that might eventually parse, plan or > execute a query to make it accept a Tsrcache and pass that on, and the > second involves a bunch of Rube Goldberg machine-like > callback/variable/parameter code. Just to quantify "touching basically every core function that might...", the number of functions which have a change in signature (adding one parameter) is seven. No more; no less. > I spent some time investigating whether a third way would be viable: > use ParamListInfo's setupParser hook and add an analogous one for the > executor, so that there is no registry equivalent to Tsrcache, but > also no dealing with param slots or (in plpgsql's case) new kinds of > variables. Instead, there would just be two callbacks: one for asking > the tuplestore provider for metadata (statistics, TupleDesc) at > planning time, and another for asking for the tuplestore by name at > execution time. One problem with this approach is that it requires > using the SPI_*_with_paramlist interfaces, not the more commonly used > arg-based versions, and requires using a ParamListInfo when you > otherwise have no reason to because you have no parameters. Also, > dealing with callbacks to register your tuplestore supplier is a > little clunky. More seriously, requiring providers of those callbacks > to write code that directly manipulates ParserState and EState and > calls addRangeTableEntryXXX seems like a modularity violation -- > especially for PLs that are less integrated with core Postgres code > than plpgsql. I got this more or less working, but didn't like it > much and didn't think it would pass muster. ok > After going through that experience, I now agree with Kevin: an > interface where a new SPI interface lets PLs push a named tuplestore > into the SPI connection to make it available to SQL seems like the > simplest and tidiest way. I do have some feedback and suggestions > though: > > 1. Naming: Using tuplestores in AfterTriggersData make perfect sense > to me but I don't think it follows that the exact thing we should > expose to the executor to allow these to be scanned is a > TuplestoreScan. We have other nodes that essentially scan a > tuplestore like WorkTableScan and Material but they have names that > tell you what they are for. This is for scanning data that has either > been conjured up or passed on by SPI client code and exposed to SQL > queries. How about SpiRelationScan, SpiDataScan, SpiRelVarScan, ....? I think an SPI centered approach is the wrong way to go. I feel that an SPI *interface* will be very useful, and probably save thousands of lines of fragile code which would otherwise be blurring the lines of the layering, but I feel there there should be a lower-level interface, and the SPI interface should use that to provide a convenience layer. In particular, I suspect that some uses of these named tuplestore relations will initially use SPI for convenience of development, but may later move some of that code to dealing with parse trees, for performance. Ideally, the execution plan would be identical whether or not SPI was involved, so naming implying the involvement of SPI would be misleading. NamedTuplestoreScan might be an improvement over just TuplestoreScan. > Also, Tsrcache is strangely named: it's not exactly a cache, it's > more of a registry. When I used the word "cache" here, I was thinking more of this English language definition: a : a hiding place especially for concealing and preserving provisions or implements b : a secure place of storage The intent being to emphasize that there is not one public "registry" of such objects, but context-specific collections where references are tucked away when they become available for later use in the only the appropriate context. Eventually, when these are used for some of the less "eager" timings of materialized view maintenance, they may be set aside for relatively extended periods (i.e., minutes or maybe even hours) before being used. Neither "registry" nor "cache" seems quite right; maybe someone can think of a word with more accurate semantics. > 2. Scope of changes: If we're going to touch functions all over the > source tree to get the Tsrcache where it needs to be for parsing and > execution, then I wonder if we should consider thinking a bit more > about where this is going. What if we had a thing called a > QueryEnvironment, and we passed a pointer to that into to all those > places, and it could contain the named tuplestore registry? I agree. I had a version building on the Tsrcache approach which differentiated between three levels of generality: Ephemeral Relations, a subclass of that call Named Ephemeral Relations, and a subclass of that called Named Tuplestore Relations. That experiment seems to have been lost somewhere along the way, but I think it was fairly easy to draw those lines in the Tsrcache version to support other types of lightweight relations. I didn't have the concept of a QueryEnvironment in that; I would be interested in hearing more about how you see that working. > See attached patches: > > * spi-relation-v1.patch, which provides: (1) an SpiRelationScan > executor node, (2) the SPI interface required to feed data to it, (3) > a new QueryEnvironment struct which is used to convey SpiRelation into > the right bits of the planner and executor; this patch is based on > fragments extracted from the -noapi and -tsr patches, and modified as > described above > > * spi-relation-plpgsql-v1.patch, to teach plpgsql how to expose the > new and old tables to SQL via the above > > * spi-relation-plpython-v1.patch, ditto for plpython; this patch makes > the OLD TABLE and NEW TABLE automatically available to any query you > run via plpy.execute, and is intended to show that the code required > to make this work for each PL is small, in support of Kevin's earlier > argument (a more featureful patch might would presumably also expose > the contents of the tuplestores directly to Python code, and let > Python code create arbitrary new tuplestores and expose those to SQL > queries) Right, I think we should assume that there will be other ways people want to use parts of what is done here, including building tuplestores through other means and referencing them in queries. As I said, I think we should make that very easy to do through SPI while providing a low-level mechanism for those building parse trees directly in C code. If we do this right, we might eventually want to collapse CTEs or some other existing types of ephemeral relations into this framework. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (firstname.lastname@example.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers