Re: [Neo4j] Lucene result sorting

Mattias Persson Wed, 20 Oct 2010 01:43:34 -0700

2010/10/20 Balazs E. Pataki <pat...@dsd.sztaki.hu>

> Yes, the idea is to overcome the limitations of Lucene sorting. The
> current solution I use is to get the IndexHits from LuceneIndex and then
> sort the neo4j Nodes by their properties. But this requires loading all
> nodes in the hit list. Rather than doing this, sorting the Lucene
> Documents, which need to be loaded anyway, and then only "converting"
> the necessary Documents to Nodes seems more efficient to me.
>


Absolutely, I agree
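
A minimal sketch of what such a user-provided sorter could look like. This
is untested against the real index code: the type parameter D stands in for
a Lucene Document so the snippet runs without Lucene on the classpath, the
interface is a generic variant of the SortingIterator proposed in the quoted
mail below, and ComparatorSortingIterator is a made-up name.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;

// Generic variant of the SortingIterator proposed in this thread.
// D is a hypothetical stand-in for org.apache.lucene.document.Document.
interface SortingIterator<D> extends Iterator<D> {
    void setIterator(Iterator<D> hitIterator);
    int length();
}

// Hypothetical implementation: buffers every hit, sorts the buffer with a
// caller-supplied Comparator, then replays the hits in sorted order.
class ComparatorSortingIterator<D> implements SortingIterator<D> {
    private final Comparator<? super D> order;
    private List<D> sorted = new ArrayList<>();
    private Iterator<D> delegate = sorted.iterator();

    ComparatorSortingIterator(Comparator<? super D> order) {
        this.order = order;
    }

    @Override
    public void setIterator(Iterator<D> hitIterator) {
        // Drain the underlying hit iterator completely before sorting.
        List<D> buffer = new ArrayList<>();
        hitIterator.forEachRemaining(buffer::add);
        buffer.sort(order);
        sorted = buffer;
        delegate = sorted.iterator();
    }

    @Override
    public int length() {
        // Number of buffered hits; a filtering variant could report fewer
        // than the underlying iterator delivered.
        return sorted.size();
    }

    @Override
    public boolean hasNext() {
        return delegate.hasNext();
    }

    @Override
    public D next() {
        return delegate.next();
    }
}
```

Note that this approach holds every Document of the hit list in memory at
once, so it trades memory for not having to load the Nodes before sorting.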


> ---
> balazs
>
> On 10/20/10 10:09 AM, Mattias Persson wrote:
> > 2010/10/20 Balazs E. Pataki<pat...@dsd.sztaki.hu>
> >
> >> Hi Andrés,
> >>
> >> I just quickly read through the code and have an idea for an additional
> >> sorting solution via QueryContext: a user provided sorter, which is
> >> invoked right after lucene search has been executed, but before the
> >> lucene results are turned into neo4j Nodes. This would give developers
> >> the option to sort lucene Documents according to their fields using
> >> whatever sorting method they want. The code below is just theoretical, I
> >> haven't tried it yet, but would require just minimal additions to the
> >> current LuceneIndex#search() method and to QueryContext.
> >>
> >> This is the current search() method in  LuceneIndex.java:
> >>
> >> private SearchResult search( IndexSearcherRef searcher, Query query,
> >>      QueryContext additionalParametersOrNull )
> >> {
> >>    try
> >>    {
> >>      searcher.incRef();
> >>      Sort sorting = additionalParametersOrNull != null ?
> >>          additionalParametersOrNull.sorting : null;
> >>      Hits hits = new Hits( searcher.getSearcher(), query, null, sorting );
> >>      return new SearchResult( new HitsIterator( hits ), hits.length() );
> >>    }
> >>    catch ( IOException e )
> >>    {
> >>      throw new RuntimeException( "Unable to query " + this + " with "
> >>                    + query, e );
> >>    }
> >> }
> >>
> >> I would add two things: a SortingIterator interface and a
> >> sortingIterator field to QueryContext:
> >>
> >> public interface SortingIterator extends Iterator<Document>{
> >>    public void setIterator(Iterator<Document>  hitIterator);
> >>    public int length();
> >> }
> >>
> >> public class QueryContext
> >> {
> >>    final Object queryOrQueryObject;
> >>    Sort sorting;
> >>    SortingIterator sortingIterator;
> >>    Operator defaultOperator;
> >>    boolean tradeCorrectnessForSpeed;
> >>    ...
> >> }
> >>
> >> And would add these to search(): if "sorting" is available, it would be
> >> passed as usual to the constructor of Hits(), and then if a
> >> "sortingIterator" is set in the QueryContext we could pass it the
> >> original HitsIterator (via setIterator()) and access the results via the
> >> "sortingIterator" rather than the HitsIterator directly:
> >>
> >>
> >> private SearchResult search( IndexSearcherRef searcher, Query query,
> >>      QueryContext additionalParametersOrNull )
> >> {
> >>    try
> >>    {
> >>      searcher.incRef();
> >>      Sort sorting = additionalParametersOrNull != null ?
> >>          additionalParametersOrNull.sorting : null;
> >>      SortingIterator sortingIterator = additionalParametersOrNull != null ?
> >>          additionalParametersOrNull.sortingIterator : null;
> >>      Hits hits = new Hits( searcher.getSearcher(), query, null, sorting );
> >>      Iterator<Document>  hitIterator =  new HitsIterator( hits );
> >>      int hitLength = hits.length();
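> >>      if (sortingIterator != null) {
> >>        // setIterator() returns void, so hand over the hits first and
> >>        // then use the sorter itself as the result iterator
> >>        sortingIterator.setIterator(hitIterator);
> >>        hitIterator = sortingIterator;
> >>        hitLength = sortingIterator.length();
> >>      }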
> >>      return new SearchResult( hitIterator, hitLength );
> >>    }
> >>    catch ( IOException e )
> >>    {
> >>      throw new RuntimeException( "Unable to query " + this + " with "
> >>                    + query, e );
> >>    }
> >> }
> >>
> >> The SortingIterator could fetch the lucene Documents via the iterator
> >> passed in setIterator(), possibly fetching the Documents and sorting
> >> using whatever method it wants based on whichever fields are available
> >> in the Documents. It would then provide the sorted result back via next().
> >> This intermediate iterator could also do other things with the result,
> >> e.g. remove some Documents (that's why it provides its own length()
> >> for the result list).
> >>
> >> Do you think such a solution is feasible?
> >>
> >
> > Sure, something like that could be implemented. The point would be to be
> > able to control the order yourself, I assume? Because I'm guessing it'd be
> > hard to implement something that would be more efficient than the internal
> > Sort stuff in Lucene, however the options are quite limited there... you can
> > just specify which keys to sort on and the order is always the natural
> > lexical order, I think.
> >
> > Good input.
> >
> >
> >>
> >> Regards,
> >> ---
> >> balazs
> >>
> >>
> >> On 10/20/10 8:24 AM, Balazs E. Pataki wrote:
> >>> Hi Andrés,
> >>>
> >>> Thanks for the answer, looks cool :-)
> >>>
> >>> I'll give it a try immediately!
> >>>
> >>> Regards,
> >>> ---
> >>> balazs
> >>>
> >>>
> >>>
> >>> On 10/19/10 8:37 PM, Mattias Persson wrote:
> >>>> 2010/10/19 Andres Taylor<andres.tay...@neotechnology.com>
> >>>>
> >>>>> Hi Balazs,
> >>>>>
> >>>>> We've been working on a new lucene-index module just these last days. The
> >>>>> new index module allows sorting, through the QueryContext-class. You can
> >>>>> look in svn <https://svn.neo4j.org/components/lucene-index/trunk/>, if you
> >>>>> are so inclined, or wait for the next milestone release (Thursday).
> >>>>>
> >>>>
> >>>> Exactly, an example could be:
> >>>>
> >>>>       Index<Node>    myNodeIndex = ...
> >>>>       for ( Node hit : myNodeIndex.query(
> >>>>          new QueryContext( "name:Balazs" ).sort( "name" ) ) ) {
> >>>>              System.out.println( hit.getProperty( "name" ) );
> >>>>       }
> >>>>
> >>>>
> >>>>> HTH,
> >>>>>
> >>>>> Andrés
> >>>>>
> >>>>> On Tue, Oct 19, 2010 at 5:41 PM, Balazs E. Pataki <pat...@dsd.sztaki.hu>
> >>>>> wrote:
> >>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> Is it possible to get sorted results from LuceneIndex#query()?
> >>>>>>
> >>>>>> It would be really helpful if results were sorted at "lucene time"
> >>>>>> according to one or more indexed fields rather than loading the actual
> >>>>>> neo4j nodes and then iterating over them for sorting.
> >>>>>>
> >>>>>> Currently, it seems that sorting is not supported by LuceneIndex, but
> >>>>>> are there plans regarding this?
> >>>>>>
> >>>>>> Thanks for any hints,
> >>>>>> ---
> >>>>>> balazs
> >>>>>> _______________________________________________
> >>>>>> Neo4j mailing list
> >>>>>> User@lists.neo4j.org
> >>>>>> https://lists.neo4j.org/mailman/listinfo/user
> >>>>>>
> >>>>>
> >>>>
> >>>>
> >>>>
> >>
> >
> >
> >
>



-- 
Mattias Persson, [matt...@neotechnology.com]
Hacker, Neo Technology
www.neotechnology.com
