Hi, I’m using Jackrabbit in a production environment with, for the moment, about 350,000 documents, mostly PDFs. I don’t understand why it takes a few minutes to execute a search query over so few documents… Starting next month, I’ll need to handle a flow of one million documents a month, so I’d better improve the performance now…
I’m using SQL2 to query the repository, and my query is really simple: select * from [nt:resource], with a limit of 20. Since there is no “count(*)” functionality in Jackrabbit, I have to run the query a second time without the limit to get the total size (roughly as in the sketch at the end of this mail). Oddly, the second query seems to take less time than the first, maybe because of the caching mechanism. There is obviously nothing to optimize in the query itself, so either something is wrong with my configuration or with the way I store and index the documents. I can’t imagine Jackrabbit is not capable of handling so few documents. Of course, I’ve been looking into this for quite some time, and here are a few pieces of information that may help solve the problem:

- I’m storing at most 200 resource nodes (files) per folder, so my paths look like /ATTACHMENT/2014/01/01/{someFolderUUID}/{theFile}. The reason for this is that the database size was growing dramatically when I stored thousands of files in the same folder…
- I’m adding MIX_REFERENCEABLE and MIX_VERSIONABLE when I store my documents (see the storing sketch below), and the backend opens/closes a session after each operation. For the moment, though, not many people are using the system.
- Not indexing the content (disabling the Tika parsers) doesn’t seem to change the performance.
- I’m using a PostgreSQL database and a LocalFileStore (see the repository.xml excerpt below).
- It’s not my own code that takes the time but Jackrabbit itself (I can see the query execution time in Jackrabbit’s logging).

Do you have any idea why it is so slow, or any lead on this?

Thanks!

Cédric
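
P.S. In case the concrete code helps: here is roughly how I run the query and count the results. A minimal sketch; the session is assumed to already be open, and error handling is omitted.

    import javax.jcr.Session;
    import javax.jcr.query.Query;
    import javax.jcr.query.QueryManager;
    import javax.jcr.query.QueryResult;
    import javax.jcr.query.RowIterator;

    // First query: fetch one page of 20 results.
    QueryManager qm = session.getWorkspace().getQueryManager();
    Query query = qm.createQuery("SELECT * FROM [nt:resource]", Query.JCR_SQL2);
    query.setLimit(20);
    QueryResult page = query.execute();

    // Second query: same statement without the limit, only to count.
    Query countQuery = qm.createQuery("SELECT * FROM [nt:resource]", Query.JCR_SQL2);
    RowIterator rows = countQuery.execute().getRows();
    long total = rows.getSize(); // -1 if the size is unknown; then I have to iterate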
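
The storing code looks more or less like this (folder resolution, versioning check-in and error handling omitted; the helper name storeDocument is mine):

    import java.io.InputStream;
    import javax.jcr.Node;
    import javax.jcr.Session;
    import org.apache.jackrabbit.JcrConstants;

    void storeDocument(Session session, Node folder, String fileName, InputStream data)
            throws Exception {
        // nt:file node with the usual jcr:content/nt:resource child.
        Node file = folder.addNode(fileName, JcrConstants.NT_FILE);
        Node content = file.addNode(JcrConstants.JCR_CONTENT, JcrConstants.NT_RESOURCE);
        content.addMixin(JcrConstants.MIX_REFERENCEABLE);
        content.addMixin(JcrConstants.MIX_VERSIONABLE);
        content.setProperty(JcrConstants.JCR_MIMETYPE, "application/pdf");
        content.setProperty(JcrConstants.JCR_DATA,
                session.getValueFactory().createBinary(data));
        session.save();
    }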
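
And the persistence part of my repository.xml looks something like this (paraphrased from memory, values anonymized):

    <PersistenceManager class="org.apache.jackrabbit.core.persistence.pool.PostgreSQLPersistenceManager">
      <param name="url" value="jdbc:postgresql://localhost:5432/jackrabbit"/>
      <param name="schema" value="postgresql"/>
      <param name="schemaObjectPrefix" value="ws_"/>
    </PersistenceManager>

    <DataStore class="org.apache.jackrabbit.core.data.FileDataStore">
      <param name="path" value="${rep.home}/datastore"/>
      <param name="minRecordLength" value="1024"/>
    </DataStore>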