Please add this to the next commitfest. https://commitfest.postgresql.org/action/commitfest_view?id=22
Cheers,
David.

On Sun, Apr 20, 2014 at 01:06:43AM +0200, Mohammad Alhashash wrote:
> Hi,
>
> Currently, the unaccent extension only allows replacing one source
> character with one or more target characters. In Arabic, Hebrew and
> possibly other languages, diacritics are standalone characters that
> are added to normal letters. To use the unaccent dictionary for
> these languages, we need to allow empty targets so that diacritics
> are removed instead of replaced.
>
> The attached patch modifies unaccent.c so that the dictionary parser
> uses a zero-length target when the line has no target.
>
> Best Regards,
>
> Mohammad Alhashash
>
> diff --git a/contrib/unaccent/unaccent.c b/contrib/unaccent/unaccent.c
> old mode 100644
> new mode 100755
> index a337df6..4e72829
> --- a/contrib/unaccent/unaccent.c
> +++ b/contrib/unaccent/unaccent.c
> @@ -58,7 +58,9 @@ placeChar(TrieChar *node, unsigned char *str, int lenstr, char *replaceTo, int r
>          {
>              curnode->replacelen = replacelen;
>              curnode->replaceTo = palloc(replacelen);
> -            memcpy(curnode->replaceTo, replaceTo, replacelen);
> +            /* palloc(0) returns a valid address, not NULL */
> +            if (replaceTo) /* memcpy() is undefined for NULL pointers*/
> +                memcpy(curnode->replaceTo, replaceTo, replacelen);
>          }
>      }
>      else
> @@ -105,10 +107,10 @@ initTrie(char *filename)
>              while ((line = tsearch_readline(&trst)) != NULL)
>              {
>                  /*
> -                 * The format of each line must be "src trg" where src and trg
> +                 * The format of each line must be "src [trg]" where src and trg
>                   * are sequences of one or more non-whitespace characters,
>                   * separated by whitespace.  Whitespace at start or end of
> -                 * line is ignored.
> +                 * line is ignored. If no trg added, a zero-length string is used.
>                   */
>                  int         state;
>                  char       *ptr;
> @@ -160,6 +162,13 @@ initTrie(char *filename)
>                      }
>                  }
>
> +                /* if no trg (loop stops at state 1 or 2), use zero-length target */
> +                if (state == 1 || state == 2)
> +                {
> +                    trglen = 0;
> +                    state = 5;
> +                }
> +
>                  if (state >= 3)
>                      rootTrie = placeChar(rootTrie,
>                                           (unsigned char *) src, srclen,
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

--
David Fetter <da...@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
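
For anyone following the proposed "src [trg]" rule without the full source at hand, here is a minimal standalone sketch of the line format described in the patch. It is not the code in unaccent.c: the names RuleSpan, is_blank and parse_rule are hypothetical, it uses no PostgreSQL APIs, and it treats only space, tab, CR and LF as whitespace, which is an assumption. A line with a source but no target is accepted and reported with a zero-length target.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type for illustration; not part of unaccent.c. */
typedef struct
{
    const char *src;    /* start of the source token */
    size_t      srclen; /* length of the source token in bytes */
    const char *trg;    /* start of the target token (may be empty) */
    size_t      trglen; /* 0 when the line has no target */
} RuleSpan;

/* Assumption: only these four bytes count as whitespace. */
static int
is_blank(char c)
{
    return c == ' ' || c == '\t' || c == '\n' || c == '\r';
}

/*
 * Parse one rule line of the form "src [trg]".
 * Returns 1 for a valid rule, 0 for an empty line or a line with
 * more than two tokens. A missing trg yields trglen == 0.
 */
static int
parse_rule(const char *line, RuleSpan *out)
{
    const char *p = line;

    assert(line != NULL && out != NULL);

    while (is_blank(*p))            /* skip leading whitespace */
        p++;
    out->src = p;
    while (*p && !is_blank(*p))     /* consume src */
        p++;
    out->srclen = (size_t) (p - out->src);
    if (out->srclen == 0)
        return 0;                   /* empty line: no rule */

    while (is_blank(*p))            /* skip separator */
        p++;
    out->trg = p;
    while (*p && !is_blank(*p))     /* consume optional trg */
        p++;
    out->trglen = (size_t) (p - out->trg);

    while (is_blank(*p))            /* skip trailing whitespace */
        p++;
    return *p == '\0';              /* anything left is a third token */
}
```

With this sketch, a line holding only the two UTF-8 bytes of an Arabic fatha (U+064E) parses as a rule with srclen 2 and trglen 0, which is the "remove the diacritic" case the patch is after.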