Hitoshi,

there is no problem writing an n-gram dictionary for tsearch2! The problem
is how to define a word boundary.

Oleg

On Sat, 26 May 2007, Hitoshi Harada wrote:

FYI, Tatsuo uses tsearch2 for indexing Japanese documents. But I agree,
an n-gram index would be more universal for Asian languages.
Yeah, I know, but the tsearch2 sample for Japanese requires external
morphological analysis libraries to separate words. That is powerful, but I
need a more "lightweight" approach. Also, when you search non-document
data (such as titles, names, or patterns in a genome), that approach is
not very useful.

As I mentioned, GIN is also powerful for searching array data types, so I
really hope it will store this additional information.

Anyway, thanks a lot for all the information. I will try to read it.

Regards,

Hitoshi Harada

-----Original Message-----
From: Oleg Bartunov [mailto:[EMAIL PROTECTED]
Sent: Saturday, May 26, 2007 10:12 PM
To: Hitoshi Harada
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Why not keeping positions in GIN?

On Fri, 25 May 2007, Hitoshi Harada wrote:

Hi,

I have been walking through the GIN AM (access method) source code these
days, and found that it has only posting lists, with no positions
associated with them.

The reason I was doing that is that I am trying to implement an n-gram text
search index on GIN myself. As you know, Japanese is not like English or
other European languages. If you index Japanese (or other unsegmented)
text by n-gram, each entry should carry positions as well as the posting
list, because you must know whether the n-grams split from the query are
adjacent to each other in the data. To know this, the positions must be
there.
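
A minimal sketch of the point above, written in Python rather than GIN's C
and using a plain in-memory dictionary. It assumes character bigrams; the
function names (ngrams, build_index, search_without_positions,
search_with_positions) and the sample strings are hypothetical and are not
part of GIN or tsearch2. It only illustrates why posting lists alone can
return false matches.

```python
from collections import defaultdict


def ngrams(text, n=2):
    """Yield (offset, gram) for every character n-gram in text."""
    for i in range(len(text) - n + 1):
        yield i, text[i:i + n]


def build_index(docs, n=2):
    """Build gram -> {doc_id: [positions]}, i.e. posting lists with positions."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, gram in ngrams(text, n):
            index[gram][doc_id].append(pos)
    return index


def search_without_positions(index, query, n=2):
    """Return docs that contain every query gram, wherever it occurs."""
    grams = [g for _, g in ngrams(query, n)]
    if not grams:
        return set()
    result = None
    for g in grams:
        ids = set(index.get(g, {}))
        result = ids if result is None else result & ids
    return result


def search_with_positions(index, query, n=2):
    """Return docs where the query grams occur at consecutive offsets."""
    grams = list(ngrams(query, n))
    if not grams:
        return set()
    hits = set()
    for doc_id in search_without_positions(index, query, n):
        _, first_gram = grams[0]
        for start in index[first_gram][doc_id]:
            # Every later gram must sit at the matching offset from start.
            if all(start + off in index[g][doc_id] for off, g in grams[1:]):
                hits.add(doc_id)
                break
    return hits


# Hypothetical sample data: short genome-like strings.
docs = {1: "GATTACA", 2: "TTACAT"}
index = build_index(docs)
loose = search_without_positions(index, "ATTA")
exact = search_with_positions(index, "ATTA")
```

Document 2 contains the bigrams AT, TT and TA but never the string ATTA,
so only the positional check rejects it. Without stored positions the same
result would need a recheck against the original text.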

FYI, Tatsuo uses tsearch2 for indexing Japanese documents. But I agree,
an n-gram index would be more universal for Asian languages.


It's not only about Japanese. When you search for a "phrase" in English
text, the same logic is needed. I haven't looked into tsearch2, but is
there any problem there? Also, in some cases an int-array inverted index
needs the entry positions as well, I guess. Storing positions with posting
lists is "general" enough for GIN, isn't it?

Is there any future plan around it?

Yes, we do have plans. See our todo,
http://www.sai.msu.su/~megera/wiki/todo
You may also read FTSBOOK, http://www.sai.msu.su/~megera/postgres/fts/doc
and slides from PGCon2007,
http://www.sai.msu.su/~megera/postgres/talks/fts-pgcon2007.pdf


Regards,

Hitoshi Harada



---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

              http://archives.postgresql.org


        Regards,
                Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
