I wish to define custom aggregate functions in Tcl, but this capability is
currently not exposed through the Tcl interface. Thus I am thinking about
how best to add it. Here's a first crack at a design proposal:

Extend the [db function] command to accept an -aggregate switch that makes
the new function be an aggregate function. Otherwise, the function defaults
to being a scalar function.

When an aggregate Tcl function is called from SQL, it is invoked in two
phases: step and final. The step phase receives the input data, and the
final phase produces the result.

During step, the function is invoked once for each row (in the group). For
the first row (in the group), the first argument to the function will be
empty string, and subsequent arguments are the SQL values from the row
being processed. For each additional row (in the group), the first argument
is the Tcl value returned by the prior invocation, and subsequent arguments
are as above.

During final, the function is invoked one last time (at the end of each
group). Its sole argument is the return value of the last step invocation,
or empty string if there were no rows. The return value of the Tcl function
is used as the return value of the SQL function.

If there were no rows and GROUP BY was used, the function is not invoked at
all, because there were no groups.

The Tcl function can tell whether it is in the step or final phase by how
many arguments it receives. If it receives multiple, it is in step. If it
receives only one, it is in final.

Depending on how it is written, the Tcl function may be able to tell if
it's being called for the first time (within the group) by checking if its
first argument is empty string. If non-empty, it definitely has been called
before (within the group). If empty, it is most likely on the first row
and, if in step, may need to initialize.

However, if the aggregate function wishes to disregard some of its input
data, it may choose to return empty string to discard any state data
arising from the current and prior rows (in the group). This will result in
empty string being the first argument to the next invocation. This mode of
operation is ill-advised but not illegal, though maybe some valid use cases
may exist, provided that ORDER BY is being used.

It is an error to pass an aggregate Tcl function zero arguments.

Here's an example that will return the entire Fossil global_config table as
a key/value dict:

db function list -deterministic -aggregate {
    apply {{state args} {
        concat $state $args
    }
}

db onecolumn {
    SELECT list(name, value)
      FROM global_config
     ORDER BY name
}

As for implementation, I think the existing code can be leveraged to a
great extent. Adapt the existing tclSqlFunc() to be the backend to new
tclSqlFuncScalar(), tclSqlFuncStep(), and tclSqlFuncFinal() functions, and
adjust the DB_FUNCTION code to recognize -aggregate and pass different
function pointers accordingly. Use sqlite3_aggregate_context() to hold the
Tcl_Obj * resulting from each invocation, then let it be the first argument
to the next invocation.

I would be happy to implement this myself, since this is functionality I
will be needing soon. Once I'm satisfied with it and have test suite
updates, what's the recommended method for sharing my patch?

Question: does xFinal() get called if an error occurs during (or between)
calling xStep()? Are errors even possible? I'm curious if there's any way
to leak the Tcl_Obj pointed to by the aggregate context.
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to