Hi Matthew. This is very helpful. I have something currently working
with the hits object, but I was a bit concerned that I might be
duplicating some existing functionality. I appreciate what you
found out about the original method and also the implementation approach
for BitSet.
I am also just getting into sorting and filtering myself after only a
short while with PyLucene.
I am also curious about the best way of bringing data together from remote
sources. I will have separate indexes on various servers but also need a
central set of indexes to consolidate this information. I
guess this is a likely scenario for other folks as well. I was thinking
a simple web service might be the best way to get the data to the server for
indexing, but most sources have 10K to 100K records each, so these will be
significant files and chew up a lot of bandwidth. I don't know if there is a
better way yet. Lucene is interesting software.
Many thanks for your help.
Regards,
David
Matthew O'Connor wrote:
David,
I was able to find the Java PageFilter source. It seems
that the Java code snippet you quoted came from this
article:
http://www.sys-con.com/read/37296.htm
The article provides code samples linked from the bottom of
the article:
http://res.sys-con.com/story/37296/Walls0712.zip
The Java PageFilter code is pretty short:
import java.io.IOException;
import java.util.BitSet;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.Filter;

public class PageFilter extends Filter {
    private int start;
    private int end;

    public PageFilter(int pageNum, int pageSize) {
        start = pageNum * pageSize;
        end = (pageNum+1) * pageSize;
    }

    public BitSet bits(IndexReader reader) throws IOException {
        BitSet result = new BitSet(reader.maxDoc());
        for(int i=start; (i<end) && (i<result.size()); i++) {
            result.set(i);
        }
        return result;
    }
}
You can implement this in PyLucene like this:
import PyLucene

class PageFilter(object):
    def __init__(self, page_num=0, size=10):
        self.start = page_num * size
        self.end = self.start + size

    def bits(self, reader):
        results = PyLucene.BitSet(reader.maxDoc())
        for i in xrange(self.start, min(self.end, results.size())):
            results.set(i)
        return results
Then you can do what you originally tried:
hits = searcher.search(query, PageFilter(1, 20))
HOWEVER, the PageFilter code in Java doesn't work right and
neither does the PageFilter code in Python. As far as I can
tell this is because the article's author made a mistake.
There's a comment on the article that shows Java Lucene
users haven't been able to get the PageFilter example to
work either:
http://www.sys-con.com/read/37296_f.htm
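I'm not very experienced with Filters in Lucene (I just
started with Lucene, via PyLucene, a few weeks ago).
However, reading the Java Lucene documentation, it appears
that the author's strategy isn't going to work right. The bits
a Filter returns are indexed by document number within the
index, not by a hit's position in the ranked results, so setting
bits start through end-1 only lets through matching documents
whose numbers happen to fall in that range. You
can read the Java Lucene docs yourself: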
http://lucene.apache.org/java/docs/api/org/apache/lucene/search/Filter.html
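The mismatch can be seen without a real index. The following is a small pure-Python sketch, not PyLucene code: the set standing in for the filter's BitSet, the function names, and the list of matching document numbers are all made up for illustration. It assumes only that bit i of a filter corresponds to document number i in the index.

```python
def page_filter_bits(page_num, size, max_doc):
    """Mimic PageFilter.bits(): the set of document numbers the filter allows."""
    start = page_num * size
    end = min(start + size, max_doc)
    return set(range(start, end))


def filtered_search(ranked_doc_ids, allowed):
    """Mimic a filtered search: keep ranked hits whose doc number is allowed."""
    return [doc_id for doc_id in ranked_doc_ids if doc_id in allowed]


# Hypothetical index of 100 documents; the query matches these document
# numbers, listed best score first.
ranked_hits = [57, 3, 88, 21, 64, 9, 35, 70, 12, 46]

# What we want for page 1 with 3 hits per page: the 4th to 6th hits.
wanted = ranked_hits[3:6]

# What the filter gives: matching documents numbered 3, 4 or 5.
got = filtered_search(ranked_hits, page_filter_bits(1, 3, 100))

print(wanted)  # [21, 64, 9]
print(got)     # [3]
```

The filter keeps document 3 only because its number falls in the range 3 to 5, which has nothing to do with where it ranks among the hits.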
You can read about extending Java Lucene objects from
PyLucene in the README, here:
http://svn.osafoundation.org/pylucene/trunk/README
Look for the section called "'Extending' Java classes from
Python". And you can see an example here:
http://svn.osafoundation.org/pylucene/trunk/samples/LuceneInAction/lia/extsearch/filters/SpecialsFilter.py
FWIW, the logic for pagination is sufficiently simple that
I'd probably just apply it directly to the hits object. You
should be able to figure that out from the examples above.
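A minimal sketch of that direct approach might look like the following. The names page_bounds, page_of_hits and FakeHits are hypothetical; the only thing assumed about the hits object is that it offers length() and doc(i), as Hits does in Java Lucene. FakeHits is a stand-in so the example runs without an index.

```python
def page_bounds(total, page_num=0, size=10):
    """Return (start, end) for a zero-based page, clamped to total."""
    start = min(max(page_num, 0) * size, total)
    end = min(start + size, total)
    return start, end


def page_of_hits(hits, page_num=0, size=10):
    """Return the documents on one page of a Hits-like object."""
    start, end = page_bounds(hits.length(), page_num, size)
    return [hits.doc(i) for i in range(start, end)]


class FakeHits(object):
    """Stand-in for a Hits object (hypothetical, for illustration only)."""

    def __init__(self, docs):
        self._docs = list(docs)

    def length(self):
        return len(self._docs)

    def doc(self, i):
        return self._docs[i]


hits = FakeHits("doc%d" % n for n in range(45))
print(page_of_hits(hits, 1, 20)[:2])   # ['doc20', 'doc21']
print(len(page_of_hits(hits, 2, 20)))  # 5, the last page is short
print(page_of_hits(hits, 5, 20))       # [], past the end
```

With a real search, the Hits returned by searcher.search(query) would be passed in place of FakeHits.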
Hope that helps.
-matthew
David Pratt [EMAIL PROTECTED] said:
Hi. I've read a few things about paging functionality for the searcher.
I have already rolled my own in the meantime for batching and paging but
am still wondering if this functionality already exists somewhere that I am
just unaware of. I am providing a start position and calculating an end
position for xrange based on hits.length() to keep the end position
within the range of results. In one place, I read:
Hits hits = searcher.search(query, new PageFilter(1,20));
In another, this version:
hits = searcher.search(query, 0, 10);
I could not locate PageFilter in the Java docs, and the second
version throws an exception.
Regards,
David
_______________________________________________
pylucene-dev mailing list
[email protected]
http://lists.osafoundation.org/mailman/listinfo/pylucene-dev