Hi, I’m using Jackrabbit in a production environment with, for the moment, about 350,000 documents, mostly PDFs. I don’t understand why it takes a few minutes to execute a search query over so few documents… Starting next month, I’ll need to handle a flow of one million documents a month, so I’d better improve the performance now…
I’m using SQL2 to query the repository, and my query is really simple: select * from [nt:resource], with a limit of 20. Since there is no “count(*)” functionality in Jackrabbit, I have to run the query a second time without the limit to get the total size (roughly as in the sketch at the end of this mail). Oddly, the second query seems to take less time than the first, maybe because of the caching mechanism. There is obviously nothing to optimize in the query itself, so either something is wrong with my configuration or with the way I store and index the documents. I can’t imagine Jackrabbit is not capable of handling so few documents. Of course, I’ve been looking into this for quite some time, and here are a few pieces of information that may help solve the problem:

- I’m storing at most 200 resource nodes (files) per folder, so my paths look like /ATTACHMENT/2014/01/01/{someFolderUUID}/{theFile}. The reason for this is that the database size was growing dramatically when I stored thousands of files in the same folder…
- I’m adding MIX_REFERENCEABLE and MIX_VERSIONABLE when I store my documents (see the storing sketch below), and the backend opens/closes a session after each operation. For the moment, though, not many people are using the system.
- Not indexing the content (disabling the Tika parsers) doesn’t seem to change the performance.
- I’m using a PostgreSQL database and a LocalFileStore (see the repository.xml excerpt below).
- It’s not my own code that takes the time but Jackrabbit itself (I can see the query execution time in Jackrabbit’s logging).

Do you have any idea why it is so slow, or any lead on this?

Thanks!

Cédric
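
P.S. In case the concrete code helps: here is roughly how I run the query and count the results. A minimal sketch; the session is assumed to already be open, and error handling is omitted.

    import javax.jcr.Session;
    import javax.jcr.query.Query;
    import javax.jcr.query.QueryManager;
    import javax.jcr.query.QueryResult;
    import javax.jcr.query.RowIterator;

    // First query: fetch one page of 20 results.
    QueryManager qm = session.getWorkspace().getQueryManager();
    Query query = qm.createQuery("SELECT * FROM [nt:resource]", Query.JCR_SQL2);
    query.setLimit(20);
    QueryResult page = query.execute();

    // Second query: same statement without the limit, only to count.
    Query countQuery = qm.createQuery("SELECT * FROM [nt:resource]", Query.JCR_SQL2);
    RowIterator rows = countQuery.execute().getRows();
    long total = rows.getSize(); // -1 if the size is unknown; then I have to iterate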
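
The storing code looks more or less like this (folder resolution, versioning check-in and error handling omitted; the helper name storeDocument is mine):

    import java.io.InputStream;
    import javax.jcr.Node;
    import javax.jcr.Session;
    import org.apache.jackrabbit.JcrConstants;

    void storeDocument(Session session, Node folder, String fileName, InputStream data)
            throws Exception {
        // nt:file node with the usual jcr:content/nt:resource child.
        Node file = folder.addNode(fileName, JcrConstants.NT_FILE);
        Node content = file.addNode(JcrConstants.JCR_CONTENT, JcrConstants.NT_RESOURCE);
        content.addMixin(JcrConstants.MIX_REFERENCEABLE);
        content.addMixin(JcrConstants.MIX_VERSIONABLE);
        content.setProperty(JcrConstants.JCR_MIMETYPE, "application/pdf");
        content.setProperty(JcrConstants.JCR_DATA,
                session.getValueFactory().createBinary(data));
        session.save();
    }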
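
And the persistence part of my repository.xml looks something like this (paraphrased from memory, values anonymized):

    <PersistenceManager class="org.apache.jackrabbit.core.persistence.pool.PostgreSQLPersistenceManager">
      <param name="url" value="jdbc:postgresql://localhost:5432/jackrabbit"/>
      <param name="schema" value="postgresql"/>
      <param name="schemaObjectPrefix" value="ws_"/>
    </PersistenceManager>

    <DataStore class="org.apache.jackrabbit.core.data.FileDataStore">
      <param name="path" value="${rep.home}/datastore"/>
      <param name="minRecordLength" value="1024"/>
    </DataStore>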