On Wed, Jan 23, 2013 at 7:29 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Heikki Linnakangas <hlinnakan...@vmware.com> writes:
> > On 23.01.2013 09:36, Alexander Korotkov wrote:
> >> On Wed, Jan 23, 2013 at 6:08 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> >>> The biggest problem is that I really don't care for the idea of
> >>> contrib/pg_trgm being this cozy with the innards of regex_t.
>
> >> The only option I see now is to provide a method like "export_cnfa"
> >> which would export corresponding CNFA in fixed format.
>
> > Yeah, I think that makes sense. The transformation code in trgm_regexp.c
> > would probably be more readable too, if it didn't have to deal with the
> > regex guts representation of the CNFA. Also, once you have intermediate
> > representation of the original CNFA, you could do some of the
> > transformation work on that representation, before building the
> > "transformed graph" containing trigrams. You could eliminate any
> > non-alphanumeric characters, joining states connected by arcs with
> > non-alphanumeric characters, for example.
>
> It's not just the CNFA though; the other big API problem is with mapping
> colors back to characters. Right now, that not only knows way too much
> about a part of the regex internals we have ambitions to change soon,
> but it also requires pg_wchar2mb_with_len() and lowerstr(), neither of
> which should be known to the regex library IMO. So I'm not sure how we
> divvy that up sanely. To be clear: I'm not going to insist that we have
> to have a clean API factorization before we commit this at all. But it
> worries me if we don't even know how we could get to that, because we
> are going to need it eventually.
>

Now, we probably don't have enough time before 9.3 to solve the API
problem :(. It's likely we have to choose between committing to 9.3 without
a clean API factorization and postponing it to 9.4.

------
With best regards,
Alexander Korotkov.