Re: [HACKERS] FTS Configuration option

2016-10-13 Thread Artur Zakirov

On 13.10.2016 11:54, Emre Hasegeli wrote:
>> Maybe it is also better to use -> instead of AND? AND would have a different
>> behaviour. I could create the following configuration:
>>
>> => ALTER TEXT SEARCH CONFIGURATION multi_conf
>> ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
>> word, hword, hword_part
>> WITH (german_ispell AND english_ispell) OR simple;
>>
>> which will return both german_ispell and english_ispell results. But
>> I'm not sure that this is a good solution.
>
> I see your use case for AND.  It might indeed be useful.  AND suits it well.
>
> Maybe THEN can be the keyword instead of -> for passing the results to
> subsequent dictionaries.  They are all reserved keywords.  I guess it
> wouldn't be a problem to use them.

I agree with THEN. It is better than using ->, I think. I suppose it
wouldn't be a problem either. I think it is necessary to fix gram.y and
implement the logic for OR, AND and THEN.

>> Of course, if this syntax is implemented, the old syntax with commas
>> should also be maintained.
>
> Yes, we definitely should.  The comma can be interpreted as either one of
> the keywords, depending on the left-hand side dictionary.
>
> I would be glad to review, if you develop this feature.

Then I will develop it :). But I suppose I can do it a few days or weeks
later, because I have other tasks with higher priority.

BTW, I already implemented the USING option a few weeks ago:
https://github.com/select-artur/postgres/tree/join_tsconfig . But of
course it is not useful now.


--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] FTS Configuration option

2016-10-13 Thread Emre Hasegeli
> With such a syntax we also don't need the TSL_FILTER flag for a lexeme.
> Currently the unaccent extension sets this flag to pass a lexeme to
> the next dictionary. This flag is used by the text-search parser. It
> looks like a hard-coded solution. The user can't change this behaviour.

Exactly.

> Maybe it is also better to use -> instead of AND? AND would have a different
> behaviour. I could create the following configuration:
>
> => ALTER TEXT SEARCH CONFIGURATION multi_conf
> ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
> word, hword, hword_part
> WITH (german_ispell AND english_ispell) OR simple;
>
> which will return both german_ispell and english_ispell results. But
> I'm not sure that this is a good solution.

I see your use case for AND.  It might indeed be useful.  AND suits it well.

Maybe THEN can be the keyword instead of -> for passing the results to
subsequent dictionaries.  They are all reserved keywords.  I guess it
wouldn't be a problem to use them.

> Of course, if this syntax is implemented, the old syntax with commas
> should also be maintained.

Yes, we definitely should.  The comma can be interpreted as either one of
the keywords, depending on the left-hand side dictionary.
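One possible reading of that rule, sketched in Python: a comma after a
filtering dictionary (one that hands its output on, as unaccent does today)
becomes THEN, and any other comma becomes OR. The function name, the
`filtering` set and the output format are assumptions made for illustration
only, and operator precedence in mixed expressions is left open here.

```python
from typing import Iterable, List, Set


def desugar_commas(dictionaries: List[str], filtering: Iterable[str]) -> str:
    """Rewrite a legacy comma-separated dictionary list with explicit keywords.

    Assumption: the keyword chosen for each comma depends only on whether the
    dictionary on its left-hand side is a filtering dictionary.
    """
    filtering_set: Set[str] = set(filtering)
    parts: List[str] = []
    for position, name in enumerate(dictionaries):
        parts.append(name)
        is_last = position == len(dictionaries) - 1
        if not is_last:
            # Filtering dictionaries pass their result on; others fall back.
            parts.append("THEN" if name in filtering_set else "OR")
    return " ".join(parts)


print(desugar_commas(["unaccent", "turkish_stem"], {"unaccent"}))
print(desugar_commas(["german_ispell", "english_ispell", "simple"], {"unaccent"}))
```

With this reading, existing configurations would keep their meaning without
any change to how they are written.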

I would be glad to review, if you develop this feature.




Re: [HACKERS] FTS Configuration option

2016-10-12 Thread Artur Zakirov
Thank you for sharing your thoughts!

2016-10-12 15:08 GMT+03:00 Emre Hasegeli:
> However then the stemmer doesn't do a good job on those words, because
> the changed characters are important for the language.  What I really
> needed was something like this:
>
>> ALTER TEXT SEARCH CONFIGURATION turkish
>> ALTER MAPPING FOR asciiword, asciihword, hword_asciipart, word, hword, 
>> hword_part
>> WITH (fix_mistyped_characters AND (turkish_hunspell OR turkish_stem) AND 
>> unaccent);

Your syntax looks more flexible and prettier than the JOIN option. As
I understand it, there are three steps here. At each step a dictionary
returns a lexeme and passes it to the next dictionary. If a dictionary
returns NULL, the processing stops.
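To make that reading concrete, here is a minimal Python sketch of the three
steps, with AND read as "pass the result on" and OR as "first dictionary that
answers". The dictionaries are toy stand-ins with made-up word lists, and
`run_chain` and `either` are hypothetical helpers, not anything in PostgreSQL.

```python
from typing import Callable, List, Optional

# A dictionary maps a word to lexemes, or returns None (NULL) if it gives up.
Dictionary = Callable[[str], Optional[List[str]]]


def run_chain(token: str, steps: List[Dictionary]) -> Optional[List[str]]:
    """Feed the output of each step into the next; NULL stops processing."""
    current = [token]
    for step in steps:
        produced: List[str] = []
        for word in current:
            lexemes = step(word)
            if lexemes is None:
                return None  # a NULL result interrupts the whole chain
            produced.extend(lexemes)
        current = produced
    return current


def either(first: Dictionary, second: Dictionary) -> Dictionary:
    """Model of OR: use the first dictionary, fall back to the second."""
    def lookup(word: str) -> Optional[List[str]]:
        result = first(word)
        return result if result is not None else second(word)
    return lookup


# Toy stand-ins (assumed behaviour, illustrative data only).
def fix_mistyped_characters(word: str) -> Optional[List[str]]:
    return [{"cicekler": "çiçekler"}.get(word, word)]


def turkish_hunspell(word: str) -> Optional[List[str]]:
    return {"çiçekler": ["çiçek"]}.get(word)


def turkish_stem(word: str) -> Optional[List[str]]:
    return [word[:-3]] if word.endswith(("ler", "lar")) else [word]


def unaccent(word: str) -> Optional[List[str]]:
    return [word.translate(str.maketrans("çğıöü", "cgiou"))]


steps = [fix_mistyped_characters, either(turkish_hunspell, turkish_stem), unaccent]
print(run_chain("cicekler", steps))
```

Here the stemming step sees the corrected spelling, and the accents are
removed only at the very end.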

With such a syntax we also don't need the TSL_FILTER flag for a lexeme.
Currently the unaccent extension sets this flag to pass a lexeme to
the next dictionary. This flag is used by the text-search parser. It
looks like a hard-coded solution. The user can't change this behaviour.

Maybe it is also better to use -> instead of AND? AND would have a different
behaviour. I could create the following configuration:

=> ALTER TEXT SEARCH CONFIGURATION multi_conf
ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
word, hword, hword_part
WITH (german_ispell AND english_ispell) OR simple;

which will return both german_ispell and english_ispell results. But
I'm not sure that this is a good solution.

Of course, if this syntax is implemented, the old syntax with commas
should also be maintained.

-- 
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company




Re: [HACKERS] FTS Configuration option

2016-10-12 Thread Emre Hasegeli
> => ALTER TEXT SEARCH CONFIGURATION multi_conf
> ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
> word, hword, hword_part
> WITH german_ispell (JOIN), english_ispell, simple;

I have had something like this in mind since I dealt with FTS for a
Turkish real estate listing application.  Being able to pipe the output of
some dictionaries is a nice feature we have had since 9.0, but it is not
always sufficient.  I think it is wrong to decide this on a per-dictionary
basis.  Something slightly more complicated, to connect dictionaries
in parallel or in series with each other, might be more useful.

My problem was related to the special characters in Turkish (ç, ğ, ı,
ö, ü).  It is very common to just type the similar-looking 7-bit characters
(c, g, i, o, u) instead of those.  The unaccent extension changes them as
desired, and passes the altered words to the subsequent dictionary,
when the configuration is changed like this:

> ALTER TEXT SEARCH CONFIGURATION turkish
> ALTER MAPPING FOR word, hword, hword_part
> WITH unaccent, turkish_stem;

However then the stemmer doesn't do a good job on those words, because
the changed characters are important for the language.  What I really
needed was something like this:

> ALTER TEXT SEARCH CONFIGURATION turkish
> ALTER MAPPING FOR asciiword, asciihword, hword_asciipart, word, hword, 
> hword_part
> WITH (fix_mistyped_characters AND (turkish_hunspell OR turkish_stem) AND 
> unaccent);




[HACKERS] FTS Configuration option

2016-10-10 Thread Artur Zakirov
Hello hackers,

Sometimes there is a need to collect lexemes from several dictionaries,
for example when a column contains text in several languages.

Let's say there is a new option, JOIN. This option will allow the parser
to append the lexemes from the current dictionary and then go to the next
dictionary to get more lexemes:

=> CREATE TEXT SEARCH CONFIGURATION multi_conf (COPY=simple);
=> ALTER TEXT SEARCH CONFIGURATION multi_conf
ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
word, hword, hword_part
WITH german_ispell (JOIN), english_ispell, simple;

There are the following cases:
- Lexeme found in german_ispell, not found in english_ispell.
Return the lexeme from german_ispell.
- Lexeme not found in german_ispell, found in english_ispell.
Return the lexeme from english_ispell.
- Lexeme not found in either dictionary. Return the lexeme from simple.
- Lexemes found in both dictionaries. Return the lexemes from both.
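The four cases can be sketched in Python as follows. This is only a model of
the intended result, not of how the parser would implement JOIN; the
dictionaries are toy stand-ins and their word lists are invented for the
example.

```python
from typing import Callable, Dict, List, Optional

# A dictionary returns lexemes for a known token, or None for an unknown one.
Dictionary = Callable[[str], Optional[List[str]]]


def make_dictionary(entries: Dict[str, List[str]]) -> Dictionary:
    """Build a toy dictionary from a fixed word list (illustrative only)."""
    def lookup(token: str) -> Optional[List[str]]:
        return entries.get(token.lower())
    return lookup


german_ispell = make_dictionary({"häuser": ["haus"], "tags": ["tags"]})
english_ispell = make_dictionary({"houses": ["house"], "tags": ["tag"]})


def simple(token: str) -> List[str]:
    """Stand-in for the simple dictionary: lower-case and accept anything."""
    return [token.lower()]


def lexize_multi(token: str) -> List[str]:
    """Collect lexemes from both ispell dictionaries, else fall back to simple."""
    joined: List[str] = []
    for dictionary in (german_ispell, english_ispell):
        lexemes = dictionary(token)
        if lexemes is not None:
            joined.extend(lexemes)
    return joined if joined else simple(token)


for word in ("Häuser", "houses", "Postgres", "tags"):
    print(word, lexize_multi(word))
```

Each of the four sample words exercises one of the cases listed above, in
the same order.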

Could such an option be useful to the community? The name of the option is
debatable.

Thank you!

-- 
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company



