Re: [HACKERS] english parser in text search: support for multiple words in the same position

2011-01-06 Thread Sushant Sinha
Do not know if this mail got lost in between or no one noticed it! On Thu, 2010-12-23 at 11:05 +0530, Sushant Sinha wrote: Just a reminder that this patch is discussing how to break URLs, emails, etc. into their components. On Mon, Oct 4, 2010 at 3:54 AM, Tom Lane t...@sss.pgh.pa.us wrote:

Re: [HACKERS] english parser in text search: support for multiple words in the same position

2010-12-22 Thread Sushant Sinha
Just a reminder that this patch is discussing how to break URLs, emails, etc. into their components. On Mon, Oct 4, 2010 at 3:54 AM, Tom Lane t...@sss.pgh.pa.us wrote: [ sorry for not responding on this sooner, it's been hectic the last couple weeks ] Sushant Sinha sushant...@gmail.com writes:

Re: [HACKERS] english parser in text search: support for multiple words in the same position

2010-10-03 Thread Tom Lane
[ sorry for not responding on this sooner, it's been hectic the last couple weeks ] Sushant Sinha sushant...@gmail.com writes: I looked at this patch a bit. I'm fairly unhappy that it seems to be inventing a brand new mechanism to do something the ts parser can already do. Why didn't you

Re: [HACKERS] english parser in text search: support for multiple words in the same position

2010-09-29 Thread Robert Haas
On Wed, Sep 29, 2010 at 1:29 AM, Sushant Sinha sushant...@gmail.com wrote: Any updates on this? On Tue, Sep 21, 2010 at 10:47 PM, Sushant Sinha sushant...@gmail.com wrote: I looked at this patch a bit. I'm fairly unhappy that it seems to be inventing a brand new mechanism to do

Re: [HACKERS] english parser in text search: support for multiple words in the same position

2010-09-28 Thread Sushant Sinha
Any updates on this? On Tue, Sep 21, 2010 at 10:47 PM, Sushant Sinha sushant...@gmail.com wrote: I looked at this patch a bit. I'm fairly unhappy that it seems to be inventing a brand new mechanism to do something the ts parser can already do. Why didn't you code the url-part mechanism

Re: [HACKERS] english parser in text search: support for multiple words in the same position

2010-09-21 Thread Sushant Sinha
I looked at this patch a bit. I'm fairly unhappy that it seems to be inventing a brand new mechanism to do something the ts parser can already do. Why didn't you code the url-part mechanism using the existing support for compound words? I am not familiar with compound word implementation

Re: [HACKERS] english parser in text search: support for multiple words in the same position

2010-09-19 Thread Tom Lane
Sushant Sinha sushant...@gmail.com writes: For the headline generation to work properly, email/file/url/host need to become skip tokens. Updating the patch with that change. I looked at this patch a bit. I'm fairly unhappy that it seems to be inventing a brand new mechanism to do something the

Re: [HACKERS] english parser in text search: support for multiple words in the same position

2010-09-08 Thread Sushant Sinha
For the headline generation to work properly, email/file/url/host need to become skip tokens. Updating the patch with that change. -Sushant. On Sat, 2010-09-04 at 13:25 +0530, Sushant Sinha wrote: Updating the patch with emitting parttoken and registering it with snowball config. -Sushant.

Re: [HACKERS] english parser in text search: support for multiple words in the same position

2010-09-04 Thread Sushant Sinha
Updating the patch with emitting parttoken and registering it with snowball config. -Sushant. On Fri, 2010-09-03 at 09:44 -0400, Robert Haas wrote: On Wed, Sep 1, 2010 at 2:42 AM, Sushant Sinha sushant...@gmail.com wrote: I have attached a patch that emits parts of a host token, a url token,

Re: [HACKERS] english parser in text search: support for multiple words in the same position

2010-09-03 Thread Robert Haas
On Wed, Sep 1, 2010 at 2:42 AM, Sushant Sinha sushant...@gmail.com wrote: I have attached a patch that emits parts of a host token, a url token, an email token and a file token. Further, it makes sure that a host/url/email/file token and the first part-token are at the same position in

Re: [HACKERS] english parser in text search: support for multiple words in the same position

2010-09-01 Thread Sushant Sinha
I have attached a patch that emits parts of a host token, a url token, an email token and a file token. Further, it makes sure that a host/url/email/file token and the first part-token are at the same position in tsvector. The two major changes are: 1. Tokenization changes: The patch exploits

Re: [HACKERS] english parser in text search: support for multiple words in the same position

2010-08-02 Thread Markus Wanner
Hi, On 08/01/2010 08:04 PM, Sushant Sinha wrote: 1. We do not have separate tokens wikipedia and org 2. If we have the two tokens we should have them at adjacent position so that a phrase search for wikipedia org should work. This would needlessly increase the number of tokens. Instead you'd

Re: [HACKERS] english parser in text search: support for multiple words in the same position

2010-08-02 Thread Sushant Sinha
On 08/01/2010 08:04 PM, Sushant Sinha wrote: 1. We do not have separate tokens wikipedia and org 2. If we have the two tokens we should have them at adjacent position so that a phrase search for wikipedia org should work. This would needlessly increase the number of tokens. Instead you'd

Re: [HACKERS] english parser in text search: support for multiple words in the same position

2010-08-02 Thread Markus Wanner
Hi, On 08/02/2010 03:12 PM, Sushant Sinha wrote: The current text parser already returns url and url_path. That already increases the number of unique tokens. Well, I think I simply turned that off to be able to search for plain words. It still works for complete URLs, those are just treated

Re: [HACKERS] english parser in text search: support for multiple words in the same position

2010-08-02 Thread Robert Haas
On Mon, Aug 2, 2010 at 9:12 AM, Sushant Sinha sushant...@gmail.com wrote: The current text parser already returns url and url_path. That already increases the number of unique tokens. I am only asking for adding of normal english words as well so that if someone types only wikipedia he gets a

Re: [HACKERS] english parser in text search: support for multiple words in the same position

2010-08-02 Thread Sushant Sinha
On Mon, 2010-08-02 at 09:32 -0400, Robert Haas wrote: On Mon, Aug 2, 2010 at 9:12 AM, Sushant Sinha sushant...@gmail.com wrote: The current text parser already returns url and url_path. That already increases the number of unique tokens. I am only asking for adding of normal english words

Re: [HACKERS] english parser in text search: support for multiple words in the same position

2010-08-02 Thread Tom Lane
Sushant Sinha sushant...@gmail.com writes: This would needlessly increase the number of tokens. Instead you'd better make it work like compound word support, having just wikipedia and org as tokens. The current text parser already returns url and url_path. That already increases the number

Re: [HACKERS] english parser in text search: support for multiple words in the same position

2010-08-02 Thread Kevin Grittner
Sushant Sinha sushant...@gmail.com wrote: Yes, that's what I am planning to do. I just wanted to see if anyone can help me in estimating whether this is doable in the current parser or I need to write a new one. If possible, then some idea on how to go about implementing? The current tsearch

Re: [HACKERS] english parser in text search: support for multiple words in the same position

2010-08-02 Thread Robert Haas
On Mon, Aug 2, 2010 at 10:21 AM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Sushant Sinha sushant...@gmail.com wrote: Yes, that's what I am planning to do. I just wanted to see if anyone can help me in estimating whether this is doable in the current parser or I need to write a new one.

[HACKERS] english parser in text search: support for multiple words in the same position

2010-08-01 Thread Sushant Sinha
Currently the english parser in text search does not support multiple words in the same position. Consider the word wikipedia.org. The text search would return a single token, wikipedia.org. However, if someone searches for wikipedia org, there will not be a match. There are two problems here: 1.
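
The mismatch described in this first message, and the idea in the subject line of placing more than one word at the same position, can be illustrated with a minimal Python sketch. This is not the PostgreSQL parser or the patch discussed in the thread; the function names (`tokenize_whole`, `tokenize_with_parts`, `matches_all`) and the splitting rule are hypothetical and chosen only for illustration. It assumes, as later messages in the thread suggest, that the whole token and its first part share a position and that the remaining parts follow at adjacent positions.

```python
import re


def tokenize_whole(text):
    """Toy tokenizer: one token per whitespace-separated word, positions from 1."""
    return [(tok.lower(), pos) for pos, tok in enumerate(text.split(), start=1)]


def tokenize_with_parts(text):
    """Toy tokenizer that also emits the parts of compound tokens.

    Assumption for illustration: the whole token and its first part share
    a position; the remaining parts take the following positions.
    """
    out = []
    pos = 0
    for tok in text.split():
        tok = tok.lower()
        pos += 1
        parts = [p for p in re.split(r"[^a-z0-9]+", tok) if p]
        out.append((tok, pos))
        if len(parts) > 1:
            for offset, part in enumerate(parts):
                out.append((part, pos + offset))
            pos += len(parts) - 1
    return out


def matches_all(index, query):
    """True if every query word appears somewhere in the token index."""
    indexed = {tok for tok, _ in index}
    return all(word.lower() in indexed for word in query.split())


doc = "see wikipedia.org today"
print(matches_all(tokenize_whole(doc), "wikipedia org"))       # no match on the whole token alone
print(matches_all(tokenize_with_parts(doc), "wikipedia org"))  # parts are searchable
print(tokenize_with_parts(doc))
```

In this sketch the whole token stays searchable, so a query for the full host name still matches, while the extra part tokens let a plain-word query find the document as well.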