Re: Query Performance and Optimization

2007-03-19 Thread Marcel Reutegger
Christoph Kiehl wrote: Marcel Reutegger wrote: Wouldn't it make sense to rewrite all @foo:bar!='john' queries to not(@foo:bar!='john') by default instead of creating a MatchAllQuery? do you mean rewrite: @foo:bar!='john' to not(@foo:bar='john') ? Yes, of course. My mistake. Do you th

Re: Query Performance and Optimization

2007-03-16 Thread Christoph Kiehl
Marcel Reutegger wrote: Wouldn't it make sense to rewrite all @foo:bar!='john' queries to not(@foo:bar!='john') by default instead of creating a MatchAllQuery? do you mean rewrite: @foo:bar!='john' to not(@foo:bar='john') ? Yes, of course. My mistake. Do you think that's an option? C

Re: Query Performance and Optimization

2007-03-16 Thread Marcel Reutegger
Christoph Kiehl wrote: As I understand it, in DescendantSelfAxisQuery.DescendantSelfAxisScorer the contextHits are used to filter the subHits result to only include nodes of the given context. The context is something like /foo/bar//*, which means all descendants of /foo/bar. Is that right? yes,

Re: Query Performance and Optimization

2007-03-14 Thread David Johnson
Both of these proposals sound great - particularly the additional caching in DescendantSelfAxisQuery. I think this would address the scenario that I suggested additional indexing earlier in this thread. As I mentioned, in my query test set DescendantSelfAxisQuery.DescendantSelfAxisScorer.next()

Re: Query Performance and Optimization

2007-03-14 Thread Christoph Kiehl
Marcel Reutegger wrote: Christoph Kiehl wrote: I've created a jira issue: http://issues.apache.org/jira/browse/JCR-791 Are you working on this issue? Or should I try to implement something? I just started working on it ;) Great news ;) Now that you are working on implementing this cache o

Re: Query Performance and Optimization

2007-03-14 Thread Marcel Reutegger
Hi Christoph, Christoph Kiehl wrote: I've created a jira issue: http://issues.apache.org/jira/browse/JCR-791 Are you working on this issue? Or should I try to implement something? I just started working on it ;) Actually it's /foo/[EMAIL PROTECTED]:bar!='john'] ah, yes. that makes sense.

Re: Query Performance and Optimization

2007-03-14 Thread Christoph Kiehl
Marcel Reutegger wrote: Christoph Kiehl wrote: Christoph Kiehl wrote: I was digging a bit into Jackrabbit today and found another place where some caching did provide a substantial performance gain to queries which check one attribute for more than one value (like /foo/[EMAIL PROTECTED]:bar=

Re: Query Performance and Optimization

2007-03-14 Thread Marcel Reutegger
Christoph Kiehl wrote: Christoph Kiehl wrote: I was digging a bit into Jackrabbit today and found another place where some caching did provide a substantial performance gain to queries which check one attribute for more than one value (like /foo/[EMAIL PROTECTED]:bar='john' or foo:bar='doe'])

Re: Query Performance and Optimization

2007-03-14 Thread Marcel Reutegger
David Johnson wrote: I can see how my suggested optimization could severely impact some use cases. Nevertheless, "our use case" :-) is mostly querying a stable hierarchy structure - i.e., we rarely, if ever, would move a tree with even 1000s of sub-nodes (famous last words). And we use t

Re: Query Performance and Optimization

2007-03-14 Thread Christoph Kiehl
David Johnson wrote: Do you have a patch for the file, I would love to check it out and run it against my query suite. For the eager I uploaded a quick and dirty patch for the calculateDocFilter() caching: http://download.yousendit.com/C4FC14DA01183678 I'll of course provide a complete patc

Re: Query Performance and Optimization

2007-03-13 Thread David Johnson
Do you have a patch for the file, I would love to check it out and run it against my query suite. -Dave On 3/13/07, Christoph Kiehl <[EMAIL PROTECTED]> wrote: Christoph Kiehl wrote: > I was digging a bit into Jackrabbit today and found another place where > some caching did provide a substant

Re: Query Performance and Optimization

2007-03-13 Thread Christoph Kiehl
Christoph Kiehl wrote: I was digging a bit into Jackrabbit today and found another place where some caching did provide a substantial performance gain to queries which check one attribute for more than one value (like /foo/[EMAIL PROTECTED]:bar='john' or foo:bar='doe']). The BitSet in calculat
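The caching idea in this message can be sketched roughly as follows. This is a minimal illustration and not the patch discussed in the thread: the class name DocFilterCache, keying the cache by property name, and the compute function are all assumptions made for the example. The premise is only that a query testing one attribute against several values can reuse a single BitSet instead of recalculating it per value.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: memoize an expensive per-property BitSet.
// None of these names come from Jackrabbit itself.
public class DocFilterCache {
    private final Map<String, BitSet> cache = new HashMap<>();
    private final Function<String, BitSet> compute;
    private int computations = 0;

    public DocFilterCache(Function<String, BitSet> compute) {
        this.compute = compute;
    }

    /** Returns the cached BitSet for a property, computing it on first use only. */
    public BitSet get(String propertyName) {
        BitSet cached = cache.get(propertyName);
        if (cached == null) {
            cached = compute.apply(propertyName);
            computations++;
            cache.put(propertyName, cached);
        }
        return cached;
    }

    /** Number of times the expensive computation actually ran. */
    public int computations() {
        return computations;
    }
}
```

A real cache of this kind would also have to be discarded whenever the underlying index changes, which this sketch does not attempt.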

Re: Query Performance and Optimization

2007-03-13 Thread Christoph Kiehl
David Johnson wrote: Out of the Jackrabbit code, DescendantSelfAxisQuery.DescendantSelfAxisScorer.next() is now taking the most time while executing my query suite - taking 68% of the time, within it, calls to DescendantSelfAxisQuery.DescendantSelfAxisScorer.calculateSubHits() taking the majorit

Re: Query Performance and Optimization

2007-03-13 Thread David Johnson
DescendantSelfAxisQuery is now taking the most time in the profiling that I have recently done. From my earlier post: Out of the Jackrabbit code, DescendantSelfAxisQuery.DescendantSelfAxisScorer.next() is now taking the most time while executing my query suite - taking 68% of the time, within

Re: Query Performance and Optimization

2007-03-13 Thread Marcel Reutegger
well, the problem with that approach is the following: assume you have a tree of nodes under /a, let's say 10 million nodes. then a user renames /a to /b. the index would have to re-index 10 million nodes. this operation is currently very efficient and takes just a couple of milliseconds, beca

Re: Query Performance and Optimization

2007-03-12 Thread Michael Neale
Yeah I would +1 to that, it's something I do fairly often (there is often a lot of info in a path that is relevant to a query - given that we have gone ahead and nicely partitioned our content !). On 3/13/07, David Johnson <[EMAIL PROTECTED]> wrote: As another example, for each node, perhaps eve

Re: Query Performance and Optimization

2007-03-12 Thread David Johnson
As another example, for each node, perhaps every potential parent path could be added to the index - as an example a node at /a/b/c/d/e/f/g would have index entries: path1: /a path2: /a/b path3: /a/b/c path4: /a/b/c/d path5: /a/b/c/d/e path6: /a/b/c/d/e/f so queries for specific sub-paths - e.g.
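The ancestor-path entries proposed in this message could be generated along these lines. This is only a sketch of the idea as described: the class name AncestorPaths and the method of() are hypothetical, and nothing here reflects how Jackrabbit actually builds its index.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists the ancestor paths that would each get an index entry.
public class AncestorPaths {
    /** Returns every proper ancestor of an absolute path, shallowest first, excluding the root. */
    public static List<String> of(String path) {
        List<String> result = new ArrayList<>();
        // Skip the leading '/', then cut the path at each later '/'.
        int slash = path.indexOf('/', 1);
        while (slash != -1) {
            result.add(path.substring(0, slash));
            slash = path.indexOf('/', slash + 1);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> paths = of("/a/b/c/d/e/f/g");
        for (int i = 0; i < paths.size(); i++) {
            System.out.println("path" + (i + 1) + ": " + paths.get(i));
        }
    }
}
```

For /a/b/c/d/e/f/g this yields the six entries listed in the message, /a through /a/b/c/d/e/f. Every entry embeds the names of the ancestors, so renaming or moving an ancestor would invalidate the entries of all nodes below it.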

Re: Query Performance and Optimization

2007-03-12 Thread David Johnson
Done: https://issues.apache.org/jira/browse/JCR-787 I did file it as a bug - as it really is an incorrect implementation (i.e., missing implementation) of equals and hashcode. -Dave On 3/12/07, Jukka Zitting <[EMAIL PROTECTED]> wrote: Hi, On 3/10/07, David Johnson <[EMAIL PROTECTED]> wrote:

Re: Query Performance and Optimization

2007-03-12 Thread Marcel Reutegger
David Johnson wrote: I think I was again focusing on range queries and giving Lucene some way of filtering out subsets of the document set, so that the whole document set wouldn't have to be walked. For the date range query the from and to dates would most likely share some set of most significa
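The quoted message is cut off in this listing, so the following is a guess at the idea rather than a statement of it: assuming dates are stored as sortable strings such as yyyyMMdd, the two bounds of a range share a leading prefix that could be used to narrow the candidate set before the range is checked. The class and method names are hypothetical.

```java
// Hypothetical sketch: find the leading characters shared by both range bounds.
public class DatePrefix {
    /** Returns the longest common prefix of two sortable date strings. */
    public static String commonPrefix(String from, String to) {
        int limit = Math.min(from.length(), to.length());
        int i = 0;
        while (i < limit && from.charAt(i) == to.charAt(i)) {
            i++;
        }
        return from.substring(0, i);
    }
}
```

With the bounds 20070301 and 20070319 the shared prefix is 200703, so only terms starting with that prefix could fall inside the range.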

Re: Query Performance and Optimization

2007-03-12 Thread Jukka Zitting
Hi, On 3/10/07, David Johnson <[EMAIL PROTECTED]> wrote: Will making an associated JIRA issue speed the inclusion of the change? From my understanding it is fixing a real bug. I'm currently not planning to cut any more releases from the 1.2 branch, as I'd like to focus on releasing 1.3 from sv

Re: Query Performance and Optimization

2007-03-11 Thread David Johnson
On 3/9/07, David Johnson <[EMAIL PROTECTED]> wrote: -- snip -- yes, this should ensure that caching in lucene is used wherever possible. > Even > though there might be bugs that prevent this. Just like this one: > > http://svn.apache.org/viewvc?view=rev&revision=506908 > > which prevented the

Re: Query Performance and Optimization

2007-03-09 Thread David Johnson
Will making an associated JIRA issue speed the inclusion of the change? From my understanding it is fixing a real bug. I can create an issue if that will bring it into a release faster. -Dave On 3/9/07, Jukka Zitting <[EMAIL PROTECTED]> wrote: Hi, On 3/9/07, David Johnson <[EMAIL PROTECTED

Re: Query Performance and Optimization

2007-03-09 Thread Jukka Zitting
Hi, On 3/9/07, David Johnson <[EMAIL PROTECTED]> wrote: > Even though there might be bugs that prevent this. Just like this one: > > http://svn.apache.org/viewvc?view=rev&revision=506908 > > which prevented the re-use of SharedFieldSortComparator even if nothing > changed between two query execu

Re: Query Performance and Optimization

2007-03-09 Thread David Johnson
-- snip -- yes, this should ensure that caching in lucene is used wherever possible. Even though there might be bugs that prevent this. Just like this one: http://svn.apache.org/viewvc?view=rev&revision=506908 which prevented the re-use of SharedFieldSortComparator even if nothing changed betw

Re: Query Performance and Optimization

2007-03-09 Thread Marcel Reutegger
David Johnson wrote: In my last tests, I think I have done this - through parameters in the repository.xml file and recreating the entire repository. Nevertheless, I did not see that significant of a speed change in query response. Perhaps I wasn't using a small enough resultFetchSize (128)?

Re: Query Performance and Optimization

2007-03-07 Thread David Johnson
On 3/6/07, Marcel Reutegger <[EMAIL PROTECTED]> wrote: Hi David, David Johnson wrote: > Yes, I am using Jackrabbit 1.2.x and I am not seeing that dramatic of a > difference between 1.1.x and the 1.2.x, although I have not done a direct > comparison between the two with the same query suite. pl

Re: Query Performance and Optimization

2007-03-06 Thread Marcel Reutegger
Hi David, David Johnson wrote: Yes, I am using Jackrabbit 1.2.x and I am not seeing that dramatic of a difference between 1.1.x and the 1.2.x, although I have not done a direct comparison between the two with the same query suite. please note that you have to change the configuration to get th

Re: Query Performance and Optimization

2007-03-02 Thread David Johnson
Hi Jukka, Thanks for the reply. Yes, I am using Jackrabbit 1.2.x and I am not seeing that dramatic of a difference between 1.1.x and the 1.2.x, although I have not done a direct comparison between the two with the same query suite. It looks like adding ordering and/or large range queries can si

Re: Query Performance and Optimization

2007-03-02 Thread Jukka Zitting
Hi, On 2/28/07, David Johnson <[EMAIL PROTECTED]> wrote: "select * from Column where jcr:path like 'Gossip/ColumnName/Columns/%' and status <> 'hidden' order by publishDate desc" takes 500 ms to execute - this is just the execution time, I am not actually using or accessing the NodeIterator.

Re: Query Performance and Optimization

2007-03-01 Thread David Johnson
Any pointers and thoughts from the developers who have worked on the LuceneQueryBuilder would be much appreciated. As an idea, I was thinking of running the Query AST through an optimization before it is passed to the query builder. Perhaps in org.apache.jackrabbit.core.query.lucene.QueryImpl.e

Re: Query Performance and Optimization

2007-03-01 Thread Christoph Kiehl
David Johnson wrote: Digging into the internals of Jackrabbit, we have noticed that there is an implementation of RangeQuery that essentially walks the results if the # of query terms is greater than what Lucene can handle. Reading the Lucene documentation, it looks like Filters are the recomme