I've been slowly reading through the tsearch code and refactoring things as I go. Here's a patch of what I've done this far. It has grown larger than I intended, but it has helped me a lot to understand the code.
I hope I didn't break anything; at least it passes regression tests. If there's some changes that others don't agree with or feel uncomfortable doing for 8.3, let me know and I can adjust the patch. The usage of the QueryItem struct was very confusing. It was used for both operators and operands. For operators, "val" was a single character casted to a int4, marking the operator type. For operands, val was the CRC-32 of the value. Other fields were used only either for operands or for operators. The biggest change in the patch is that I broke the QueryItem struct into QueryOperator and QueryOperand. Type was really the only common field between them. QueryItem still exists, and is used in the TSQuery struct as before, but it's now a union of the two. Many other changes fell from that, like separation of pushval_asis function into pushValue, pushOperator and pushStop. Other changes include: - Moved some structs that were for internal use only from header files to the right .c-files. - Moved tsvector parser to a new tsvector_parser.c file. Parser code was about half of the size of tsvector.c, it's also used from tsquery.c, and it has some data structures of its own, so it seems better to separate it. Cleaned up the API so that TSVectorParserState is not accessed from outside tsvector_parser.c. - Separated enumerations (#defines, really) used for QueryItem.type field and as return codes from gettoken_query. It was just accidental code sharing. - Removed ParseQueryNode struct used internally by makepol and friends. push*-functions now construct QueryItems directly. - Changed int4 variables to just ints for variables like "i" or "array size", where the storage-size was not significant. - Probably a lot of other stuff I can't remember right now I have a few more things in mind I'm planning to look into: - I'm not convinced that there's enough sanity checking in the input functions. I think you can send queries into the server using the binary protocol that you couldn't produce otherwise. For example, queries with multiple VAL nodes, with no operator to connect them. Does that wreak havoc in any of the functions? And the left-field of QueryOperator is an int16, what happens if you have query that exceeds that? And parse_query always produces trees that are in prefix notation, so the left-field is really redundant, but using tsqueryrecv, you could inject queries that are not in prefix notation; is there anything in the code that depends on that? - There's many internal intermediate representations of a query: TSQuery, a QTNode-tree, NODE-tree (in tsquery_cleanup.c), prefix notation stack of QueryItems (in parser), infix-tree. Could we remove some of these? - There's a lot of recursive functions. Are they all using check_stack_depth? - More commenting and documenting and little refactorings -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Description: GNU Zip compressed data
---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster