On Tue, Dec 3, 2024 at 4:42 PM Tom Lane <t...@sss.pgh.pa.us> wrote:
> Michel Pelletier <pelletier.mic...@gmail.com> writes: > > Here's a WIP patch for a pgexpanded example in src/test/modules. > > I didn't look at your patch yet, but in the meantime here's an update > that takes the next step towards what I promised. > Awesome! I made a support function for my set_element(matrix, i, j, value) function and it works great, but I do have a couple questions about how to move forward with some of my other methods. For reference, here's the set_element test function, and a trace of function calls, after implementing the support function for set_element, the expand_matrix function is called twice, first by nvals(), and later by the first set_element, but not the subsequent sets, so the true clause of the PLPGSQL_RWOPT_INPLACE case in plpgsql_param_eval_var_check works great and as expected: postgres=# create or replace function test_se(graph matrix) returns matrix language plpgsql as $$ declare nvals bigint; begin graph = wait(graph); nvals = nvals(graph); raise notice 'nvals: %', nvals; graph = set_element(graph, 4, 2, 42); graph = set_element(graph, 4, 3, 43); graph = set_element(graph, 4, 4, 44); return graph; end; $$; CREATE FUNCTION postgres=# select nvals(test_se('int32'::matrix)); DEBUG: new_matrix DEBUG: matrix_get_flat_size DEBUG: flatten_matrix DEBUG: matrix_wait DEBUG: DatumGetMatrix DEBUG: expand_matrix <- wait expands with support function DEBUG: new_matrix DEBUG: matrix_nvals DEBUG: DatumGetMatrix DEBUG: matrix_get_flat_size DEBUG: flatten_matrix DEBUG: expand_matrix <- nvals() reexpands DEBUG: new_matrix DEBUG: context_callback_matrix_free NOTICE: nvals: 0 DEBUG: scalar_int32 DEBUG: new_scalar DEBUG: matrix_set_element DEBUG: DatumGetMatrix < set_element does not reexpand, yay! DEBUG: DatumGetScalar DEBUG: context_callback_scalar_free DEBUG: scalar_int32 DEBUG: new_scalar DEBUG: matrix_set_element DEBUG: DatumGetMatrix DEBUG: DatumGetScalar DEBUG: context_callback_scalar_free DEBUG: scalar_int32 DEBUG: new_scalar DEBUG: matrix_set_element DEBUG: DatumGetMatrix DEBUG: DatumGetScalar DEBUG: context_callback_scalar_free DEBUG: matrix_nvals DEBUG: DatumGetMatrix DEBUG: context_callback_matrix_free DEBUG: context_callback_matrix_free ┌───────┐ │ nvals │ ├───────┤ │ 3 │ └───────┘ My question is about nvals = nvals(graph) in that function above, the object is flattened and then rexpanded, even after the object was expanded and returned by wait() [1] into a r/w pointer. set_element honors the expanded object from wait(), but nvals does not. It seems like I want to be able to pass the argument as a RO pointer, but I'm not sure how to trigger the "else" clause in the PLPGSQL_RWOPT_INPLACE case. I can see how the support function triggers the if, but I don't see a similar way to trigger the else. I'm almost certainly missing something. [1](Due to the asynchronous nature of the GraphBLAS API, there is a GrB_wait() function to wait until an object is "complete" computationally, but since this object was just expanded, there is no pending work, so the wait() is essentially a noop but a handy way for me to return a r/w pointer for subsequent operations). set_element and wait are the simple case where there is only one reference to the r/w object in the argument list. As we've discussed I have many functions that can possibly have multiple references to the same object. The core operation of the library is matrix multiplication, and all other complex functions follow a very similar pattern, so I'll just focus on that one here: CREATE FUNCTION mxm( a matrix, b matrix, op semiring default null, inout c matrix default null, mask matrix default null, accum binaryop default null, descr descriptor default null ) RETURNS matrix The order of arguments mostly follows the order in the C API. a and b are the left and right matrix operands and the matrix product is the return value. If c is not null, then it is a pre-created return value which may contain partial results already from some previous operations, otherwise mxm creates a new matrix of the correct dimensions and returns that. I think the inout is meaningless as it doesn't seem to change anything, but I'm using it as a visual indication in code that c can be the return value if it's not null. Here's an example of doing Triangle Counting using Burkhardt's method [2], where a, b, and the mask are all the same adjacency matrix (here the Newman/karate graph. The 'plus_pair' semiring is optimized for structural counting, and the descriptor 's' tells suitesparse to only use the structure of the mask and not to consider the values): [2] https://doi.org/10.1177/1473871616666393 CREATE OR REPLACE FUNCTION public.tcount_burkhardt(graph matrix) RETURNS bigint LANGUAGE plpgsql AS $$ begin graph = wait(graph); graph = mxm(graph, graph, 'plus_pair_int32', mask=>graph, descr=>'s'); return reduce_scalar(graph) / 6; end; $$; postgres=# select tcount_burkhardt(graph) from karateg; DEBUG: matrix_wait DEBUG: DatumGetMatrix DEBUG: expand_matrix <- wait expands and returns r/w pointer with support function DEBUG: new_matrix DEBUG: matrix_mxm <- mxm starts here DEBUG: DatumGetMatrix DEBUG: matrix_get_flat_size DEBUG: flatten_matrix DEBUG: expand_matrix <- expanding left operand again DEBUG: new_matrix DEBUG: DatumGetMatrix DEBUG: matrix_get_flat_size DEBUG: flatten_matrix DEBUG: expand_matrix <- expanding right operand again DEBUG: new_matrix DEBUG: DatumGetSemiring DEBUG: expand_semiring DEBUG: new_semiring DEBUG: new_matrix DEBUG: DatumGetMatrix DEBUG: matrix_get_flat_size DEBUG: flatten_matrix DEBUG: expand_matrix <- expanding mask argument again DEBUG: new_matrix DEBUG: DatumGetDescriptor DEBUG: expand_descriptor DEBUG: new_descriptor DEBUG: context_callback_matrix_free DEBUG: context_callback_descriptor_free DEBUG: context_callback_matrix_free DEBUG: context_callback_semiring_free DEBUG: context_callback_matrix_free DEBUG: context_callback_matrix_free DEBUG: matrix_reduce_scalar DEBUG: DatumGetMatrix DEBUG: matrix_get_flat_size DEBUG: flatten_matrix DEBUG: expand_matrix <- reduce also re-expands matrix DEBUG: new_matrix DEBUG: new_scalar DEBUG: scalar_div_int32 DEBUG: DatumGetScalar DEBUG: new_scalar DEBUG: cast_scalar_int64 DEBUG: DatumGetScalar DEBUG: context_callback_scalar_free DEBUG: context_callback_scalar_free DEBUG: context_callback_matrix_free DEBUG: context_callback_matrix_free ┌──────────────────┐ │ tcount_burkhardt │ ├──────────────────┤ │ 45 │ └──────────────────┘ mxm calls expand_matrix three times for each of the three arguments. Ideally I'd like the already expanded rw pointer from wait() to be honored by mxm so that it doesn't re-expand the object three times but, like set_element, not at all. Hope that question makes sense, still going on the main theory that I'm not understanding the support function, and maybe the c argument thing being optional throws a wrench in the plan, and I'm happy to try and find a workaround for that. Maybe always requiring the result to be pre-constructed and then making c required and reassigning back to the same input argument is the right approach? Some good news I always like seeing is that 45 is the right answer. Very close to having optimal sparse linear algebra in Postgres! Thanks for your help! I'll move onto your comments on the test module next. -Michel