>
> > Are you thinking something like having a COPY command that provides
> > results in such a way that they could be referenced in a FROM clause
> > (perhaps a COPY that defines a cursor…)?
>
> That would also be nice, but what I was thinking of was that some
> highly restricted subset of cases of SQL in general could lend
> themselves to levels of optimization that would be impractical in
> other contexts.
>

If COPY (or a syntactical equivalent) can return a result set, then the
whole of SQL is available to filter and aggregate the results and we don't
have to invent new syntax, or endure the confusion when a COPY-WHEN syntax
behaves subtly differently from a similar FROM-WHERE.
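
For concreteness, the two shapes being compared might look roughly like this
(purely hypothetical syntax on both sides, just to illustrate the contrast;
the WHEN keyword and the target table name "foo" are placeholders, and the
COPY INLINE form is the one sketched further down):

COPY foo FROM '/path/to/foo.txt' WITH (FORMAT csv)
    WHEN (d >= '2018-11-01');    -- hypothetical filter bolted onto COPY itself

SELECT *
FROM ( COPY INLINE '/path/to/foo.txt' FORMAT CSV )
     AS x(a integer, b numeric, c text, d date, e json)
WHERE x.d >= '2018-11-01';       -- ordinary WHERE once COPY is a row source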

Also, what would we be saving computationally? The whole file (or program
output) has to be consumed no matter what, the columns have to be parsed no
matter what. At least some of the columns have to be converted to their
assigned datatypes just to know whether or not to filter the row, but we
might be able to push that logic inside the COPY. I'm thinking of something like
this:

SELECT x.a, sum(x.b)
FROM ( COPY INLINE '/path/to/foo.txt' FORMAT CSV )
     AS x(a integer, b numeric, c text, d date, e json)
WHERE x.d >= '2018-11-01'
GROUP BY x.a


In this case, there is the *opportunity* to see the following optimizations:
- columns c and e are never referenced, and need never be turned into a
datum (though we might do so just to confirm that they conform to the data
type)
- if column d is converted first, we can filter on it and avoid converting
columns a and b for the rows that get filtered out
- whatever optimizations we can infer from knowing that the two surviving
columns will go directly into an aggregate
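
For comparison, here is a minimal sketch of what the same workload looks like
today (the staging-table name is just a placeholder): every column of every
row is parsed, converted, and materialized before the filter and the aggregate
ever run, which is exactly the work the opportunities above could skip.

-- load everything first, no matter which columns or rows we actually need
CREATE TEMP TABLE foo_staging (a integer, b numeric, c text, d date, e json);
COPY foo_staging FROM '/path/to/foo.txt' WITH (FORMAT csv);

-- only now can we filter and aggregate
SELECT a, sum(b)
FROM foo_staging
WHERE d >= '2018-11-01'
GROUP BY a;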

If we go this route, we can train the planner to notice other optimizations
and add those mechanisms at that time, and then existing code gets faster.

If we go the COPY-WHEN route, then we have to make up new syntax for every
possible future optimization.
