Re: [rt-users] Pg FTS query works, RT search returns 0 results ( was: '.' as delimiter/boundary breaks domain name searches)

2015-09-03 Thread Jeff Blaine
>> * An RT 4.2.12 search for the same string returns 0 results[2].
>>   I'm 98% certain the ticket *is* indexed though as other queries
>>   return it[2].

...

> Is the ticket in question a merged ticket?  You had a mail from the
> same timeframe which dealt with merged ticket FTS, which is a known
> bug: https://issues.bestpractical.com/Ticket/Display.html?id=9370
> Sadly, there's not good solution to that bug at this time.

I'm losing what little is left of my mind, apparently. I looked last
week at the problem ticket and could swear it had no merging in its
history. Looking today, I see it *does*.

> perl -MRT=-init -le 
> '$t=RT::Tickets->new(RT->SystemUser);$t->FromSQL("@ARGV");print 
> $t->BuildSelectQuery' \
> "Content LIKE 'f...@domain.com'"

Handy! Used this to generate the SQL. Pasted the SQL into psql for kicks
and got the unfortunately now-expected 0 results. Filed this snippet
away for later.

Thanks Alex.

Jeff


Re: [rt-users] Pg FTS query works, RT search returns 0 results ( was: '.' as delimiter/boundary breaks domain name searches)

2015-09-01 Thread Alex Vandiver
On Mon, Aug 31, 2015 at 09:39:19AM -0400, Jeff Blaine wrote:
> I'm reviving this one time in case anyone has further ideas.
> 
> * PostgreSQL 8.4.20 (RHEL 6.6) with FTS does the right
>   thing when parsing an email address[1]
> 
> * An RT 4.2.12 search for the same string returns 0 results[2].
>   I'm 98% certain the ticket *is* indexed though as other queries
>   return it[2].
> 
> * The rt-fulltext-indexer script runs every 10 minutes and has
>   no errors.

Is the ticket in question a merged ticket?  You had a mail from the
same timeframe which dealt with merged ticket FTS, which is a known
bug: https://issues.bestpractical.com/Ticket/Display.html?id=9370
Sadly, there's not good solution to that bug at this time.

Otherwise, seeing the query RT is generating, and stripping clauses
out of it until it matches, is likely your best bet.  You can see the
SQL generated from a TicketSQL query by running:

perl -MRT=-init -le 
'$t=RT::Tickets->new(RT->SystemUser);$t->FromSQL("@ARGV");print 
$t->BuildSelectQuery' \
"Content LIKE 'f...@domain.com'"

  - Alex


Re: [rt-users] Pg FTS query works, RT search returns 0 results ( was: '.' as delimiter/boundary breaks domain name searches)

2015-08-31 Thread Bill Cole

On 31 Aug 2015, at 9:39, Jeff Blaine wrote:


I'm reviving this one time in case anyone has further ideas.

* PostgreSQL 8.4.20 (RHEL 6.6) with FTS does the right
thing when parsing an email address[1]


I question that.

I know that with Pg 9.0 the FTS indexer does arguably "right" things in 
parsing email addresses, hostnames, and IP addresses BUT that behavior 
effectively hides the octets inside an IP, the host and domain name 
elements in a hostname, and the local and domain parts of an email 
address. For our business (customized hosting & related services) this 
made the indexing do more harm than good. We have adequate capacity (and 
suitable config) on our DB server to handle most of the searches we need 
to do in reasonable time without the indexing, so we disabled it.




[rt-users] Pg FTS query works, RT search returns 0 results ( was: '.' as delimiter/boundary breaks domain name searches)

2015-08-31 Thread Jeff Blaine
I'm reviving this one time in case anyone has further ideas.

* PostgreSQL 8.4.20 (RHEL 6.6) with FTS does the right
  thing when parsing an email address[1]

* An RT 4.2.12 search for the same string returns 0 results[2].
  I'm 98% certain the ticket *is* indexed though as other queries
  return it[2].

* The rt-fulltext-indexer script runs every 10 minutes and has
  no errors.

Jeff

Footnote 1:

rt4=# SELECT alias, description, token FROM ts_debug('f...@domain.com');
 alias |  description  | token
---+---+
 email | Email address | f...@domain.com
(1 row)

rt4=# SELECT alias, description, token FROM ts_debug('');
 alias |  description  | token
---+---+
 blank | Space symbols | <
 email | Email address | f...@domain.com
 blank | Space symbols | >
(3 rows)

rt4=#

Footnote 2:

Content LIKE 'f...@domain.com'

0 results

Content LIKE 'domain.com'

0 results

Content LIKE 'domain'

0 results

Content LIKE 'another unique string in the ticket I am trying to hit'

1 result

On 10/2/2014 11:08 AM, k...@rice.edu wrote:
> On Thu, Oct 02, 2014 at 10:56:56AM -0400, Kevin Falcone wrote:
>> On Wed, Oct 01, 2014 at 03:50:43PM -0400, Jeff Blaine wrote:
>>> [ Similar, but unrelated to my other message from 10 minutes ago. ]
>>>
>>> It appears any '.' is interpreted as a word boundary with
>>> Pg full-text indexing turned on.
>>>
>>> Is that known to be true, or am I wrong?
>>>
>>> This breaks searches for FQDNs names in ticket contents.
>>>
>>> Searching for 'foobar' will hit foobar.org
>>>
>>> Searching for 'foobar.org' will not hit 'foobar.org'
>>
>> What FTS will match/return is dictated by your database and its
>> configuration.
>>
>> Have you reviewed the Postgres full text search documentation for your
>> release of Pg?
>>
>> http://www.postgresql.org/docs/8.4/static/textsearch.html
>>
>> -kevin
> 
> 
> Wow! PostgreSQL 8.4, 4 major releases back! I cannot be certain that I
> am recalling this correctly, but the default parser in older versions
> of PostgreSQL did have that behavior. I do not know when they made the
> change to fix it. What do you get when you run:
> 
> rt3=# select plainto_tsquery('rice.edu');
>  plainto_tsquery 
> -
>  'rice.edu'
> (1 row)
> 
> I seem to recall that in the older version when I saw this issue, it
> returned:
> 
>  plainto_tsquery
> -
>  'rice' & 'edu'
> 
> You may be able to make a custom config for your text search using
> the definitions from the current release. I just ended up searching
> for 'rice' instead of 'rice.edu', for example.
> 
> Regards,
> Ken
> 

-- 
Jeff Blaine
kickflop.net
PGP/GnuPG Key ID: 0x0C8EDD02