Please add this to the next commitfest.

https://commitfest.postgresql.org/action/commitfest_view?id=22

Cheers,
David.
On Sun, Apr 20, 2014 at 01:06:43AM +0200, Mohammad Alhashash wrote:
> Hi,
> 
> Currently, the unaccent extension only allows replacing one source
> character with one or more target characters. In Arabic, Hebrew and
> possibly other languages, diacritics are standalone characters that
> are added to normal letters. To use the unaccent dictionary for
> these languages, we need to allow empty targets so that diacritics
> are removed instead of replaced.
> 
> The attached patch modifies unaccent.c so that the dictionary parser uses
> a zero-length target when the line has no target.
> 
> Best Regards,
> 
> Mohammad Alhashash
> 

> diff --git a/contrib/unaccent/unaccent.c b/contrib/unaccent/unaccent.c
> old mode 100644
> new mode 100755
> index a337df6..4e72829
> --- a/contrib/unaccent/unaccent.c
> +++ b/contrib/unaccent/unaccent.c
> @@ -58,7 +58,9 @@ placeChar(TrieChar *node, unsigned char *str, int lenstr, char *replaceTo, int r
>               {
>                       curnode->replacelen = replacelen;
>                       curnode->replaceTo = palloc(replacelen);
> -                     memcpy(curnode->replaceTo, replaceTo, replacelen);
> +                     /* palloc(0) returns a valid address, not NULL */
> +                     if (replaceTo) /* memcpy() is undefined for NULL pointers*/
> +                             memcpy(curnode->replaceTo, replaceTo, replacelen);
>               }
>       }
>       else
> @@ -105,10 +107,10 @@ initTrie(char *filename)
>                       while ((line = tsearch_readline(&trst)) != NULL)
>                       {
>                               /*
> -                              * The format of each line must be "src trg" where src and trg
> +                              * The format of each line must be "src [trg]" where src and trg
>                                * are sequences of one or more non-whitespace characters,
>                                * separated by whitespace.  Whitespace at start or end of
> -                              * line is ignored.
> +                              * line is ignored. If no trg added, a zero-length string is used.
>                                */
>                               int                     state;
>                               char       *ptr;
> @@ -160,6 +162,13 @@ initTrie(char *filename)
>                                       }
>                               }
>  
> +                             /* if no trg (loop stops at state 1 or 2), use zero-length target */
> +                             if (state == 1 || state == 2)
> +                             {
> +                                     trglen = 0;
> +                                     state = 5;
> +                             }
> +                             
>                               if (state >= 3)
>                                       rootTrie = placeChar(rootTrie,
>                                                                               (unsigned char *) src, srclen,
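
The "src [trg]" rule described in the patched comment can be sketched in
isolation. This is only a minimal illustration under stated assumptions, not
the code in unaccent.c: parse_rule_line() and RuleFields are hypothetical
names, and the sketch splits the line with byte-wise isspace() rather than
working inside the existing state-machine loop that the patch modifies.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical container for the two fields of one rule line. */
typedef struct
{
	const char *src;		/* start of source field inside the line */
	size_t		srclen;		/* length of source field in bytes */
	const char *trg;		/* start of target field (may be empty) */
	size_t		trglen;		/* 0 when the line has no target */
} RuleFields;

/*
 * Hypothetical helper: split a line of the form "src [trg]".
 * Returns 1 for a usable rule, 0 for a blank line or a line with
 * more than two fields.  Assumption: whitespace is detected per byte.
 */
int
parse_rule_line(const char *line, RuleFields *out)
{
	const char *p = line;

	assert(line != NULL && out != NULL);

	/* whitespace at the start of the line is ignored */
	while (*p && isspace((unsigned char) *p))
		p++;
	if (*p == '\0')
		return 0;			/* no src at all */

	out->src = p;
	while (*p && !isspace((unsigned char) *p))
		p++;
	out->srclen = (size_t) (p - out->src);

	while (*p && isspace((unsigned char) *p))
		p++;

	/* a missing trg yields a zero-length target instead of an error */
	out->trg = p;
	while (*p && !isspace((unsigned char) *p))
		p++;
	out->trglen = (size_t) (p - out->trg);

	/* whitespace at the end is ignored; anything else is a third field */
	while (*p && isspace((unsigned char) *p))
		p++;
	return *p == '\0';
}
```

With this sketch, a line holding only a source field is accepted with
trglen == 0, which corresponds to the case the patch maps from state 1 or 2
to a zero-length target.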

> 
> -- 
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers


-- 
David Fetter <da...@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

