Re: Problem with full text search on PDFs

Alexander Klimetschek Tue, 13 Jul 2010 10:50:53 -0700

On Tue, Jul 13, 2010 at 15:53, Julia Dain <[email protected]> wrote:
> I have got a problem with Jackrabbit 2.1.0 and full text search on PDFs.
>
> I have created a repository containing several plain text and PDF documents
> using the Java APIs. I am able to use the Java API to perform full text
> search on the text documents, but not the PDFs.
>
> When I use the CLI to the standalone server to execute this query
>
> [/] > xpathquery "//element(*, nt:file)[jcr:contains(jcr:content,
> \'*Typographical*\')]"
>
> the result is 11 file nodes, correctly. But with the Java API and code:
>
> String sql = "SELECT * FROM [nt:resource] AS resource WHERE
> CONTAINS(resource.*, '%Typographical%')";
> Query query = queryManager.createQuery(sql, Query.JCR_SQL2);
>
> the result is no nodes returned.  Thanks for any help on this.


The XPath query from above would look like this in JCR-SQL2, I think:

SELECT * FROM [nt:file] WHERE CONTAINS(., 'Typographical')

The % is wrong here, for both jcr:contains() in XPath and CONTAINS()
in JCR-SQL2; it only applies to jcr:like/LIKE. CONTAINS does a
full-text search anyway, so wildcards are mostly not needed, and the
search string is the same for all query languages. If you need a
wildcard, it would be "*" (as in your XPath example).

See also earlier messages in this list, for example:
http://jackrabbit.markmail.org/thread/qbguxb3wn2in4sew

Side note: I don't know exactly how to specify the node-scoped full
text index in JCR-SQL2, so the query might instead be one of these:

SELECT * FROM [nt:file] AS file WHERE CONTAINS(file.*, 'Typographical')
SELECT * FROM [nt:file] AS file WHERE CONTAINS(file, 'Typographical')
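To make the Java side concrete, here is a minimal sketch of building the
first variant above as a statement string. The class and method names
(FullTextQuery, build, quote) are hypothetical, and doubling single quotes
inside the literal is an assumption about how the search string should be
escaped. The resulting string would then be passed to
queryManager.createQuery(sql, Query.JCR_SQL2) as in the original code.

```java
final class FullTextQuery {
    private FullTextQuery() {}

    /** Wraps a search expression in single quotes, doubling embedded quotes (assumed escaping). */
    public static String quote(String searchExpression) {
        return "'" + searchExpression.replace("'", "''") + "'";
    }

    /**
     * Builds a node-scoped full-text statement of the form
     * SELECT * FROM [type] AS sel WHERE CONTAINS(sel.*, 'expr').
     * Hypothetical helper; no '%' wildcards are added, use '*' in the expression if needed.
     */
    public static String build(String nodeType, String selector, String searchExpression) {
        return "SELECT * FROM [" + nodeType + "] AS " + selector
                + " WHERE CONTAINS(" + selector + ".*, " + quote(searchExpression) + ")";
    }

    public static void main(String[] args) {
        // Plain term: full-text search needs no wildcard here.
        System.out.println(build("nt:file", "file", "Typographical"));
        // Prefix match: '*' is the wildcard, not '%'.
        System.out.println(build("nt:file", "file", "Typo*"));
    }
}
```

Whether the match is found on nt:file or only on the jcr:content child
still depends on how the repository indexes the PDF content, so this only
addresses the statement syntax.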

Regards,
Alex

-- 
Alexander Klimetschek
[email protected]
