If there are no filters, then LatLonDocValuesField is going to be asked to
sort all of your docs, which is obviously going to take a while. Can you
simply add a filter? Like a distance filter using LatLonPoint?
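In code, the suggestion would look roughly like this (a sketch, not from the thread; it assumes a field named "location" indexed with both LatLonPoint and LatLonDocValuesField on Lucene 6.x):

  // filter first with the point field, then sort only the survivors
  double lat = 40.7128, lon = -74.0060;
  Query filter = LatLonPoint.newDistanceQuery("location", lat, lon, 50.0); // 50m
  Sort byDistance = new Sort(LatLonDocValuesField.newDistanceSort("location", lat, lon));
  TopDocs hits = searcher.search(filter, 5, byDistance);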
On Thu, Jun 29, 2017 at 11:49 AM sc wrote:
> Hi,
>
>I have similar requirement o
Hi,
I have a similar requirement of searching for points within a radius of 50m.
I loaded 100M lat/lon points, indexed/searched with LatLonDocValuesField. I am
testing it on my MacBook Pro.
I have used all Directory (RAM/FS/MMap) types, but it takes 3-4 secs to
search/sort and return 5 points within a radius
Nice!
On Tue, Jun 13, 2017 at 11:12 PM Tom Hirschfeld
wrote:
> Hey All,
>
> I was able to solve my problem a few weeks ago and wanted to update you
> all. The root issue was with the caching mechanism in
> "makedistancevaluesource" method in the lucene spatial module, it appears
> that documents
Hey All,
I was able to solve my problem a few weeks ago and wanted to update you
all. The root issue was with the caching mechanism in
"makedistancevaluesource" method in the lucene spatial module, it appears
that documents were being pulled into the cache and not expired. To address
this issue, w
I know I'm late to this thread, but I saw this and specifically "reverse
geocoding" and it caught my attention. I recently did this on a public
project with Solr, which you may find of interest:
https://github.com/cga-harvard/hhypermap-bop/tree/master/enrich/solr-geo-admin
I'm super pleased with t
Hi,
Are you sure that the term index is the problem? Even with huge indexes you
never need 65 gigs of heap! That's impossible.
Are you sure that your problem is not something else?:
- too large heap? Heaps greater than 31 gigs are bad by default. Lucene needs
only a little heap, although you have lar
That sounds like a fun amount of terms!
Note that Lucene does not load all terms into memory; only the "prefix
trie", stored as an FST (
http://blog.mikemccandless.com/2010/12/using-finite-state-transducers-in.html),
mapping term prefixes to on-disk blocks of terms. FSTs are very compact
data str
Is upgrading to Lucene 6 and using points rather than terms an option?
Points typically have lower memory usage (see GeoPoint which is based on
terms vs LatLonPoint which is based on points at
http://people.apache.org/~mikemccand/geobench.html#reader-heap).
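For reference, indexing with the point-based field is a line or two per document (a sketch; the field name is illustrative):

  Document doc = new Document();
  doc.add(new LatLonPoint("location", 40.7128, -74.0060));
  // add a LatLonDocValuesField on the same data if you also need distance sorting
  doc.add(new LatLonDocValuesField("location", 40.7128, -74.0060));
  writer.addDocument(doc);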
On Thu, May 18, 2017 at 02:35, Tom Hirschf
> Sent: Monday 1st May 2017 12:33
> To: java-user@lucene.apache.org; solr-user
> Subject: RE: Term no longer matches if PositionLengthAttr is set to two
>
> Hello again, apologies for cross-posting and having to get back to this
> unsolved problem.
>
> Initially i thought this
Hello again, apologies for cross-posting and having to get back to this
unsolved problem.
Initially I thought this was a problem with, or in, Lucene. Maybe not, so
is this a problem in Solr? Has anyone here seen this problem before?
Many thanks,
Markus
-Original message-
> Fr
Hi,
I guess you are working with default techproducts.
Can you try using the terms request handler:
query.setRequestHandler("/terms")
Ahmet
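Putting that together in SolrJ would look roughly like this (a sketch, assuming the techproducts example and its "name" field):

  SolrQuery query = new SolrQuery();
  query.setRequestHandler("/terms");
  query.setTerms(true);
  query.addTermsField("name");
  query.setTermsLimit(10);
  QueryResponse response = client.query("techproducts", query);
  for (TermsResponse.Term t : response.getTermsResponse().getTerms("name")) {
    System.out.println(t.getTerm() + " -> " + t.getFrequency());
  }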
On Friday, January 6, 2017 1:19 AM, huda barakat
wrote:
Thank you for fast reply, I add the query in the code but still not working:
import java.util.List;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrR
Hi,
I think you are missing the main query parameter: q=*:*
By the way, you may get more responses on the solr-user mailing list.
Ahmet
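The fix is a one-liner (a sketch; any valid query string works in place of the match-all):

  query.setQuery("*:*"); // the main query parameter the request was missing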
On Wednesday, January 4, 2017 4:59 PM, huda barakat
wrote:
Please help me with this:
I have this code which return term frequency from techproducts example:
This is the error I get; it is the same:
Exception in thread "main" java.lang.NullPointerException
at solr_test.solr.SolrJTermsApplication.main(SolrJTermsApplication.java:30)
I know the object is null, but I don't know why it is null.
when I change the query to this:
SolrQuery query = new SolrQue
The exception line does not match the code you pasted, but do make
sure your object is actually not null before accessing its methods.
On Thu, Nov 24, 2016 at 5:42 PM, huda barakat
wrote:
> I'm using SOLRJ to find term frequency for each term in a field, I wrote
> this code but it is not working:
>
>
On Sun, Dec 27, 2015 at 1:31 AM, Ishan Chattopadhyaya
wrote:
> I'm trying: DimensionalRangeQuery.new1DIntRange(fname, 1, true, 1, true);
Yes, that is the best way!
Remember that dimensional values are trunk only (to be Lucene 6.0,
hopefully soonish), and index file format is free to change on t
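For readers on released versions: the dimensional-values work shipped in Lucene 6.0 as point fields, so the equivalent one-dimensional int range is (field name as in the question, illustrative):

  Query q = IntPoint.newRangeQuery("fname", 1, 1); // bounds are inclusive
  // or: Query q = IntPoint.newExactQuery("fname", 1);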
My Solr Deep Dive e-book has a whole chapter on the Solr term vector search
component, which is based on the Lucene term vector support.
It won't help you directly for Java coding, but the examples may help
illustrate what this feature can do.
See:
http://www.lulu.com/us/en/shop/jack-krupansk
On Tue, Apr 2, 2013 at 12:45 PM, andi rexha wrote:
> Hi Adrien,
> Thank you very much for the reply.
>
> I have two other small questions about this:
> 1) Is "final int freq = docsAndPositions.freq();" the same with
> "iterator.totalTermFreq()" ? In my tests it returns the same result and from
>
u...@gmail.com
> Date: Tue, 2 Apr 2013 12:05:12 +0200
> Subject: Re: Term vector Lucene 4.2
> To: java-user@lucene.apache.org
>
> Hi Andi,
>
> Here is how you could retrieve positions from your document:
>
> Terms termVector = indexReader.getTermVector(docId, fieldN
Hi Andi,
Here is how you could retrieve positions from your document:
Terms termVector = indexReader.getTermVector(docId, fieldName);
TermsEnum iterator = termVector.iterator(null);
BytesRef ref;
DocsAndPositionsEnum docsAndPositions = null;
while ((ref = iterator.next()) != null) {
  docsAndPositions = iterator.docsAndPositions(null, docsAndPositions);
  docsAndPositions.nextDoc(); // term vectors hold a single (pseudo) document
  for (int i = 0; i < docsAndPositions.freq(); i++)
    System.out.println(ref.utf8ToString() + " at position " + docsAndPositions.nextPosition());
}
Thanks Simon!
On 29.10.2012 г. 21:38, Simon Willnauer wrote:
you should call currDocsAndPositions.nextPosition() before you call
currDocsAndPositions.getPayload(); payloads are per position, so you
need to advance the position first!
simon
On Mon, Oct 29, 2012 at 6:44 PM, Ivan Vasilev wrote:
Hi G
you should call currDocsAndPositions.nextPosition() before you call
currDocsAndPositions.getPayload(); payloads are per position, so you
need to advance the position first!
simon
On Mon, Oct 29, 2012 at 6:44 PM, Ivan Vasilev wrote:
> Hi Guys,
>
> I use the following code to index documents and set Pa
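The fixed loop would look roughly like this (a sketch on the Lucene 4.0-era API; names illustrative):

  DocsAndPositionsEnum dpe = termsEnum.docsAndPositions(null, null);
  while (dpe.nextDoc() != DocIdSetIterator.NO_MORE_DOCS) {
    for (int i = 0; i < dpe.freq(); i++) {
      int pos = dpe.nextPosition();        // advance the position first
      BytesRef payload = dpe.getPayload(); // may be null at this position
    }
  }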
http://www.gossamer-threads.com/lists/lucene/java-user/86299 looks relevant.
--
Ian.
On Tue, Jun 7, 2011 at 10:05 AM, G.Long wrote:
> Hi :)
>
> In my index, there are documents like :
>
> doc { question: 1, response: 1, word: excellent }
> doc { question: 1, response: 1, word: great }
> doc { q
The performance impact should only be at indexing time, unless you
actually retrieve the vectors for some number of hits at search time.
Mike
On Tue, Nov 30, 2010 at 2:28 PM, Maricris Villareal wrote:
> Hi,
>
> Could someone tell me the effect (if any) of having term vectors set to
> WITH_POSITI
sary in other API calls.
BTW, that environment is Java 1.6.0_12 on 64-bit SUSE Linux with 32G of RAM and
using MMapDirectory.
Thanks.
-John
-Original Message-
From: Nader, John P [mailto:john.na...@cengage.com]
Sent: Thursday, July 29, 2010 5:49 PM
To: java-user@lucene.apache.org
Sub
: > My other question is whether there are planned performance
: > enhancements to address this loss of performance?
:
: These APIs are very different in the next major release (4.0) of
: Lucene, so except for problems spotted by users like you, there's not
: much more dev happening against them
the added synchronization. I don't
think it is waiting on locks, but rather the memory flushing and loading that goes on.
-John
-Original Message-
From: Michael McCandless [mailto:luc...@mikemccandless.com]
Sent: Thursday, July 29, 2010 5:55 AM
To: java-user@lucene.apache.org
Subject: Re:
On Wed, Jul 28, 2010 at 2:39 PM, Nader, John P wrote:
> We recently upgraded from lucene 2.4.0 to lucene 3.0.2. Our load testing
> revealed a serious performance drop specific to traversing the list of terms
> and their associated documents for a given indexed field. Our code looks
> somethin
Well, counting frequency isn't the best approach. For instance, if a field
has 1,000 terms and 10 occurrences of your target, is that a better match
than a field with 10 terms and 5 occurrences of your target?
This kind of thing is already taken into account with Lucene scoring; you
might want to
Hi Erik,
Thanks for the reply. What I want to do is to identify key terms and key
phrases of a document according to their number of occurrences in the
document. Output should be the highest-frequency words and (two- or three-
word) phrases. For this purpose, can I use Lucene?
Thanks
Manjula
On Th
Terms are relatively easy, see TermFreqVector in the JavaDocs.
Phrases aren't as easy, before you go there, though, what is the
high-level problem you're trying to solve? Possibly this is an XY problem
(see http://people.apache.org/~hossman/#xyproblem).
Best
Erick
On Thu, May 6, 2010 at 6:39 AM,
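Reading the vector back is straightforward (a sketch on the 3.x-era API; assumes the field was indexed with TermVector.YES and is named "content"):

  TermFreqVector tfv = reader.getTermFreqVector(docId, "content");
  String[] terms = tfv.getTerms();
  int[] freqs = tfv.getTermFrequencies();
  // terms come back sorted alphabetically; sort by freqs[i] to rank them
  for (int i = 0; i < terms.length; i++) {
    System.out.println(terms[i] + " occurs " + freqs[i] + " times");
  }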
Sent: Monday, April 26, 2010 10:55 AM
To: java-user@lucene.apache.org
Subject: Re: Term offsets for highlighting
Stephen Greene wrote:
> Hi Koji,
>
> Thank you. I implemented a solution based on the
FieldTermStackTest.java
> and if I do a search like "iron ore" it matches iron or o
Stephen Greene wrote:
Hi Koji,
Thank you. I implemented a solution based on the FieldTermStackTest.java
and if I do a search like "iron ore" it matches iron or ore. The same is
true if I specify iron AND ore.
The termSetMap[0].value[0] = ore and termSetMap[0].value[1] = iron.
What am I missing
tIndexReader(), pintDocId,
fieldName);
-Original Message-
From: Koji Sekiguchi [mailto:k...@r.email.ne.jp]
Sent: Saturday, April 24, 2010 5:18 AM
To: java-user@lucene.apache.org
Subject: Re: Term offsets for highlighting
Hi Steve,
> is there a way to access a TermVector containin
Hi Steve,
> is there a way to access a TermVector containing only matched terms,
> or is my previous approach still the
So you want to access FieldTermStack, I understand.
The way to access it, I wrote in my previous mail:
You cannot access FieldTermStack from FVH, but I think you
can create i
Sent: Monday, April 19, 2010 9:02 PM
To: java-user@lucene.apache.org
Subject: Re: Term offsets for highlighting
Stephen Greene wrote:
> Hi Koji,
>
> An additional question. Is it possible to access the FieldTermStack from
> the FastVectorHighlighter after it has been populated with matching
Stephen Greene wrote:
Hi Koji,
An additional question. Is it possible to access the FieldTermStack from
the FastVectorHighlighter after it has been populated with matching
terms from the field?
I think this would provide an ideal solution for this problem, as
ultimately I am only concerned
positional offsets to have
highlighting tags applied to them in a separate process.
Thank you for your insight,
Steve
-Original Message-
From: Koji Sekiguchi [mailto:k...@r.email.ne.jp]
Sent: Sunday, April 18, 2010 10:42 AM
To: java-user@lucene.apache.org
Subject: Re: Term offsets for
Subject: Re: Term offsets for highlighting
Stephen Greene wrote:
> Hi Koji,
>
> Thank you for your reply. I did try the QueryScorer without success, but
> I was using Lucene 2.4.x
>
Hi Steve,
I thought you were using 2.9 or later because you mentioned
FastVectorHighlighter in you
Stephen Greene wrote:
Hi Koji,
Thank you for your reply. I did try the QueryScorer without success, but
I was using Lucene 2.4.x
Hi Steve,
I thought you were using 2.9 or later because you mentioned
FastVectorHighlighter in your previous mail (FVH was first
introduced in 2.9). If I remembe
From: Koji Sekiguchi [mailto:k...@r.email.ne.jp]
Sent: Friday, April 16, 2010 9:49 PM
To: java-user@lucene.apache.org
Subject: Re: Term offsets for highlighting
Stephen Greene wrote:
> Hello,
>
>
>
> I am trying to determine begin and end offsets for terms and phrases
> matching a query.
>
> Is there a way usin
Stephen Greene wrote:
Hello,
I am trying to determine begin and end offsets for terms and phrases
matching a query.
Is there a way using either the highlighter or fast vector highlighter
in contrib?
I have already attempted extending the highlighter which would match
terms but would not
What are the associated Analyzers for your Gene and Token?
Because if they're NOT something akin to KeywordAnalyzer, you
have a problem. Specifically, most of the "regular" tokenizers will
break this stream up into three separate terms,
"brain", "natriuetic", and "peptide". If that's the case, the
I'm not going to go into too much code level detail, however I'd index
the phrases using tri-gram shingles, and as uni-grams. I think
this'll give you the results you're looking for. You'll be able to
quickly recall the count of a given phrase aka tri-gram such as
"blue_shorts_burough"
On Fri, J
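The shingle idea would look roughly like this at analysis time (a sketch; the analyzer and field name are illustrative, and the min/max-shingle constructor and separator setter are from later 3.x releases, so check your version):

  TokenStream ts = analyzer.tokenStream("abstract", new StringReader(text));
  ShingleFilter shingles = new ShingleFilter(ts, 3, 3); // tri-gram shingles
  shingles.setOutputUnigrams(true);  // also emit the single words
  shingles.setTokenSeparator("_");   // yields tokens like "blue_shorts_burough"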
@All : Elaborating the problem
The phrase is being indexed as a single token ...
I have a Gene tag in the xml document which is like
<Gene>brain natriuretic peptide</Gene>
This phrase is present in the abstract text for the given document .
Code is as :
doc.add(new Field("Gene", geneName, Field.Store.YES
When do you detect that they are phrases? During indexing or during search?
On Jan 8, 2010, at 5:16 AM, hrishim wrote:
>
> Hi .
> I have phrases like brain natriuretic peptide indexed as a single token
> using Lucene.
> When I calculate the term frequency for the same the count is 0 since the
On a quick read, your statements are contradictory
<<>>
<<>>
Either "brain natriuretic peptide" is a single token/term or it's not
Are you sure you're not confusing indexing and storing? What
analyzer are you using at index time?
Erick
On Fri, Jan 8, 2010 at 5:16 AM, hrishim wrote:
Issue a PhraseQuery and count how many hits came back? Is that too
slow? If so, you could detect all phrases during indexing and add
them as tokens to the index?
Mike
On Fri, Jan 8, 2010 at 5:16 AM, hrishim wrote:
>
> Hi .
> I have phrases like brain natriuretic peptide indexed as a single tok
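Concretely, the count-by-query route looks like this (a sketch; field name illustrative, and it assumes the phrase words survive your analyzer):

  PhraseQuery pq = new PhraseQuery();
  pq.add(new Term("abstract", "brain"));
  pq.add(new Term("abstract", "natriuretic"));
  pq.add(new Term("abstract", "peptide"));
  int docCount = searcher.search(pq, 1).totalHits; // docs containing the phrase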
On Fri, Nov 13, 2009 at 4:21 PM, Max Lynch wrote:
> Well already, without doing any boosting, documents matching more of the
> > terms
> > in your query will score higher. If you really want to make this effect
> > more
> > pronounced, yes, you can boost the more important query terms higher.
>
Well already, without doing any boosting, documents matching more of the
> terms
> in your query will score higher. If you really want to make this effect
> more
> pronounced, yes, you can boost the more important query terms higher.
>
> -jake
>
But there isn't a way to determine exactly what bo
On Fri, Nov 13, 2009 at 4:02 PM, Max Lynch wrote:
> > > Now, I would like to know exactly what term was found. For example, if
> a
> > > result comes back from the query above, how do I know whether John
> Smith
> > > was
> > > found, or both John Smith and his company, or just John Smith
> > Ma
> > Now, I would like to know exactly what term was found. For example, if a
> > result comes back from the query above, how do I know whether John Smith
> > was
> > found, or both John Smith and his company, or just John Smith
> Manufacturing
> > was found?
>
>
> In general, this is actually very
On Fri, Nov 13, 2009 at 3:35 PM, Max Lynch wrote:
> > query: "San Francisco" "California" +("John Smith" "John Smith
> > Manufacturing")
> >
> > Here the San Fran and CA clauses are optional, and the ("John Smith" OR
> > "John Smith Manufacturing") is required.
> >
>
> Thanks Jake, that works nic
> query: "San Francisco" "California" +("John Smith" "John Smith
> Manufacturing")
>
> Here the San Fran and CA clauses are optional, and the ("John Smith" OR
> "John Smith Manufacturing") is required.
>
Thanks Jake, that works nicely.
Now, I would like to know exactly what term was found. For e
Did I do that wrong? I always mess up the AND/OR human-readable form
of this - it's clearer when you use +/- unary operators instead:
query: "San Francisco" "California" +("John Smith" "John Smith
Manufacturing")
Here the San Fran and CA clauses are optional, and the ("John Smith" OR
"John Smith
> You want a query like
>
> ("San Francisco" OR "California") AND ("John Smith" OR "John Smith
> Manufacturing")
>
Won't this require San Francisco or California to be present? I do not
require them to be, I only require "John Smith" OR "John Smith
Manufacturing", but I want to get a bigger scor
Hi Max,
You want a query like
("San Francisco" OR "California") AND ("John Smith" OR "John Smith
Manufacturing")
essentially? You can give Lucene exactly this query and it will require
that
either "John Smith" or "John Smith Manufacturing" be present, but will score
results which have these
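Built programmatically, that structure is (a sketch on the 2.9/3.x-era API; phrase() is a hypothetical helper that builds a PhraseQuery over your field):

  BooleanQuery names = new BooleanQuery();
  names.add(phrase("John Smith"), BooleanClause.Occur.SHOULD);
  names.add(phrase("John Smith Manufacturing"), BooleanClause.Occur.SHOULD);

  BooleanQuery query = new BooleanQuery();
  query.add(phrase("San Francisco"), BooleanClause.Occur.SHOULD); // optional, boosts score
  query.add(phrase("California"), BooleanClause.Occur.SHOULD);    // optional, boosts score
  query.add(names, BooleanClause.Occur.MUST);                     // required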
I would just throw your doc into a MemoryIndex (lives in contrib/
memory, I think; it only holds one doc), get the Vector and do what
you need to do. So you would kind of be doing indexing, but not really.
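A minimal sketch of that (the analyzer choice is yours):

  MemoryIndex index = new MemoryIndex();
  index.addField("content", "some text goes here", new StandardAnalyzer());
  IndexReader reader = index.createSearcher().getIndexReader();
  // reader now exposes the usual term/frequency APIs for this single doc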
On Aug 13, 2009, at 8:43 AM, joe_coder wrote:
Grant, thanks for responding.
My i
For example, I am able to do
Analyzer analyzer = new StandardAnalyzer(); // or any other analyzer
TokenStream ts = analyzer.tokenStream("myfield",
    new StringReader("some text goes here"));
Token t = ts.next();
while (t != null) {
  System.out.println("token: " + t);
  t = ts.next();
}
Grant, thanks for responding.
My issue is that I am not planning to use Lucene (as I don't need any
search capability, at least not yet). All I have is a text document and I need to
extract keywords and their frequency (which could be a simple split on
space and tracking the count). But I realize th
On Aug 13, 2009, at 7:40 AM, joe_coder wrote:
I was wondering if there is any way to directly use the Lucene API to extract
terms from a given string. My requirement is that I have a text document for
which I need a term frequency vector (after stemming, removing stopwords
and synonyms che
Christian,
if you haven't done so, you might find Luke
(http://www.getopt.org/luke/) very helpful to see what has been
indexed and how.
simon
On Thu, Aug 13, 2009 at 6:10 AM, Christian
Bongiorno wrote:
> turns out the index is being built with lower-case terms which is why we
> aren't getting hits
turns out the index is being built with lower-case terms which is why we
aren't getting hits the way we expect. When I change my search terms to
lower I see more of what I expect.
Gonna keep working on this and post updates.
On Wed, Aug 12, 2009 at 12:46 PM, Christian Bongiorno <
christ...@bongio
You have a bunch of log statements in there, what are they printing out?
Also, IndexSearcher.explain() is your friend for understanding why a
doc matched the way it did.
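For example (a sketch; query and docId are whatever you searched with and the internal doc id of the hit):

  Explanation explanation = searcher.explain(query, docId);
  System.out.println(explanation.toString());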
On Aug 12, 2009, at 3:46 PM, Christian Bongiorno wrote:
I have a situation where I have a series of terms queries as par
From: "Grant Ingersoll"
To:
Sent: Tuesday, June 30, 2009 9:48 PM
Subject: Re: Term Frequency vector consumes memory
In Lucene, a Term Vector is a specific thing that is stored on disk
when creating a Document and Field. It is optional and off by
default. It is separate from being able to get th
er to load term vector. I want to switch off
this feature? Is that possible without re-indexing?
Regards
Ganesh
- Original Message -
From: "Grant Ingersoll"
To:
Sent: Tuesday, June 30, 2009 9:48 PM
Subject: Re: Term Frequency vector consumes memory
> In Lucene, a Term Ve
In Lucene, a Term Vector is a specific thing that is stored on disk
when creating a Document and Field. It is optional and off by
default. It is separate from being able to get the term frequencies
for all the docs in a specific field. The former is decided at
indexing time and there is
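Enabling it at indexing time looks like this (a sketch on the 2.x/3.x-era API):

  doc.add(new Field("body", text, Field.Store.NO,
                    Field.Index.ANALYZED, Field.TermVector.YES));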
For all the docs, and in fact, I think it might be the document frequency.
Basically I need to be able to do a query and get a list of terms with how
many documents in the result set contain that term. I'm not so worried about
how often the term appears in each document.
Thanks
Rob
On Thu, May 21
This is often requested, but Lucene doesn't make it easy. I'd love
for someone to come up and build this feature :)
Do you need term freqs for just the top N that were collected? Or for
all docs that matched the query?
Mike
On Thu, May 21, 2009 at 6:34 AM, Robert Young wrote:
> Hi,
> I would
OK I opened https://issues.apache.org/jira/browse/LUCENE-1586 to track
this. Thanks deminix!
Mike
Ah yes. I'd be happy with the ability to monitor it for now. Assuming it
is too involved to remove the limitation.
For all practical purposes we should only be using, worst case, 10% of the
term space today. That happens to make it risky enough that it needs an eye
kept on it, as this will be o
On Sat, Apr 4, 2009 at 11:57 AM, deminix wrote:
> Yea. That is all that matters anyway right, is the limit at the segment
> level?
Well... the problem is when merges kick off.
You could have N segments that each are below the limit, but when a
merge runs the merged segment would try to exceed t
On Sat, Apr 4, 2009 at 11:52 AM, deminix wrote:
> My crude regex'ing of the code has me thinking it is only term vectors that
> are limited to 32 bits, since they allocate arrays. Otherwise it seems
> good. Does that sound right?
Not quite... SegmentTermEnum.seek takes "int p". TermInfosReader
Yea. That is all that matters anyway right, is the limit at the segment
level?
On Sat, Apr 4, 2009 at 8:44 AM, Michael McCandless <
luc...@mikemccandless.com> wrote:
> On Sat, Apr 4, 2009 at 10:25 AM, deminix wrote:
>
> > AFAIK there isn't an api that returns the current number of terms,
> cor
My crude regex'ing of the code has me thinking it is only term vectors that
are limited to 32 bits, since they allocate arrays. Otherwise it seems
good. Does that sound right?
On Sat, Apr 4, 2009 at 7:25 AM, deminix wrote:
> Thanks for the clarification.
>
> I'm partitioning the document spac
On Sat, Apr 4, 2009 at 10:25 AM, deminix wrote:
> AFAIK there isn't an api that returns the current number of terms, correct?
Alas, no. This limitation has been talked about before... maybe we
should add it.
But: it's not actually simple to compute, at the MultiSegmentReader
level. Each Segme
Thanks for the clarification.
I'm partitioning the document space, so I'm not really concerned about the
fact documents are ints. Some fields have very unique value spaces though
(and many values per document), and they don't align to the same way the
documents are partitioned so may have a very
Correct, and, not that I know of.
Mike
On Sat, Apr 4, 2009 at 7:55 AM, Murat Yakici
wrote:
>
> I assume the total number of documents that you can index is also limited
> by Java max int. Is this correct? Is there any way to index documents
> beyond this number in a single index?
>
> Murat
>
>
>
I assume the total number of documents that you can index is also limited
by Java max int. Is this correct? Is there any way to index documents
beyond this number in a single index?
Murat
> I tentatively think you are correct: the file format itself does not
> impose this limitation.
>
> But in
I tentatively think you are correct: the file format itself does not
impose this limitation.
But in a least a couple places internally, Lucene uses a java int to
hold the term number, which is actually a limit of 2,147,483,648
terms. I'll update fileformats.html for 2.9.
Mike
On Sat, Apr 4, 200
In contrib/analysis there are also some TokenFilters that provide
examples of using Payloads. See the
org.apache.lucene.analysis.payloads package: http://lucene.apache.org/java/2_4_1/api/contrib-analyzers/org/apache/lucene/analysis/payloads/package-summary.html
-Grant
On Mar 24, 2009, at 4
Seid Mohammed wrote:
ok, but I need to know how to proceed with it.
I mean how to include it in my application.
many thanks
Seid M
You may want to look at the following articles:
http://lucene.jugem.jp/?eid=133
http://lucene.jugem.jp/?eid=134
articles are in Japanese, but you can ignore the text. :)
Pro
ok, but I need to know how to proceed with it.
I mean how to include it in my application.
many thanks
Seid M
On 3/24/09, Koji Sekiguchi wrote:
> Seid Mohammed wrote:
>> Hi All
>> I want my Lucene to index documents and make some terms have a higher
>> boost value.
>> so, if I index the document "
Seid Mohammed wrote:
Hi All
I want my Lucene to index documents and make some terms have a higher
boost value.
So, if I index the document "The quick fox jumps over the lazy dog",
I want the terms fox and dog to have a greater boost value.
How can I do that
Thanks a lot
seid M
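One possible route, not spelled out in the thread, is per-term payloads (a sketch on the 2.9-era API; input like "fox|2.0 dog|2.0" carries the boost, and your Similarity's scorePayload must decode it, e.g. via PayloadHelper.decodeFloat, since the default implementation just returns 1):

  // analysis chain: strip the "|2.0" suffix and store it as a float payload
  TokenStream ts = new DelimitedPayloadTokenFilter(
      new WhitespaceTokenizer(reader), '|', new FloatEncoder());

  // query side: fold the payload into the score
  Query q = new PayloadTermQuery(new Term("body", "fox"), new MaxPayloadFunction());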
How about
On Feb 25, 2009, at 2:52 PM, Tim Williams wrote:
Is there a syntax to set the term position in a query built with
queryparser? For example, I would like something like:
PhraseQuery q = new PhraseQuery();
q.add(t1, 0);
q.add(t2, 0);
q.setSlop(0);
As I understand it, the slop defaults to 0, bu
On Sun, Feb 15, 2009 at 10:50 AM, Joel Halbert wrote:
> When constructing a query, using a series of terms e.g.
>
> Term1=X, Term2=Y etc...
>
> does it make sense, like in sql, to place to most restrictive term query
> first?
>
> i.e. if I know that the query will be mainly constrained by the valu
: The easiest way to change the tf calculation would be overriding
: tf in your own implementation of Similarity, like it's done in
: SweetSpotSimilarity. But the average term frequency of the
: document is missing. Is there a simple way to get or calculate this
: number?
there was quite a bit of discus
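For the first part, the override itself is small (a sketch; the log curve is just one example of flattening repeats, where DefaultSimilarity uses sqrt(freq)):

  public class FlatterTfSimilarity extends DefaultSimilarity {
    @Override
    public float tf(float freq) {
      return freq > 0 ? 1.0f + (float) Math.log(freq) : 0.0f;
    }
  }
  // install with searcher.setSimilarity(new FlatterTfSimilarity());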
Mark,
This is exactly what I want and It worked perfectly. Thanks!
I'll post my highlighter to JIRA in a few days (hopefully).
It uses term offsets with positions (WITH_POSITIONS_OFFSETS)
to support PhraseQuery.
Thanks again,
Koji
Mark Miller wrote:
Okay, Koji, hopefully I'll have more luck
Okay, Koji, hopefully I'll have more luck suggesting this this time.
Have you tried http://issues.apache.org/jira/browse/LUCENE-1448 yet? I am
not sure if it's in an applicable state, but I hope that covers your issue.
On Fri, Jan 16, 2009 at 7:15 PM, Koji Sekiguchi wrote:
> Hello,
>
> I'm writi
: References:
:
: <1998.130.159.185.12.1232021837.squir...@webmail.cis.strath.ac.uk>
: Date: Thu, 15 Jan 2009 04:49:49 -0800 (PST)
: Subject: Term Frequency and IndexSearcher
http://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists
When starting a new discussion
Hi Paul,
I am tempted to suggest the following ( I am assuming here that the
document and the particular fields are TFVed when indexing):
For every doc in the result set:
- get the doc id
- using the doc id, get the TermFreqVector of this document from the
index reader (tfv=ireader.getTermFr
Tim,
Op Wednesday 19 November 2008 02:32:40 schreef Tim Sturge:
...
> >>
> >> This is less than 2x slower than the dedicated bitset and more
> >> than 50x faster than the range boolean query.
> >>
> >> Mike, Paul, I'm happy to contribute this (ugly but working) code
> >> if there is interest. Let
> With "Allow Filter as clause to BooleanQuery":
> https://issues.apache.org/jira/browse/LUCENE-1345
> one could even skip the ConstantScoreQuery with this.
> Unfortunately 1345 is unfinished for now.
>
That would be interesting; I'd like to see how much performance improves.
>> startup: 2811
Op Wednesday 19 November 2008 00:43:56 schreef Tim Sturge:
> I've finished a query time implementation of a column stride filter,
> which implements DocIdSetIterator. This just builds the filter at
> process start and uses it for each subsequent query. The index itself
> is unchanged.
>
> The resul
I've finished a query time implementation of a column stride filter, which
implements DocIdSetIterator. This just builds the filter at process start
and uses it for each subsequent query. The index itself is unchanged.
The results are very impressive. Here are the results on a 45M document
index:
Just to followup... I opened these three issues:
https://issues.apache.org/jira/browse/LUCENE-1441 (fixed in 2.9)
https://issues.apache.org/jira/browse/LUCENE-1442 (fixed in 2.9)
https://issues.apache.org/jira/browse/LUCENE-1448 (still iterating)
Mike
Christian Reuschling wrote:
Hi Guy
Paul Elschot wrote:
Op Tuesday 11 November 2008 21:55:45 schreef Michael McCandless:
Also, one nice optimization we could do with the "term number column-
stride array" is do bit packing (borrowing from the PFOR code)
dynamically.
Ie since we know there are X unique terms in this segment, whe
Op Tuesday 11 November 2008 21:55:45 schreef Michael McCandless:
> Also, one nice optimization we could do with the "term number column-
> stride array" is do bit packing (borrowing from the PFOR code)
> dynamically.
>
> Ie since we know there are X unique terms in this segment, when
> populating t
Also, one nice optimization we could do with the "term number column-
stride array" is do bit packing (borrowing from the PFOR code)
dynamically.
Ie since we know there are X unique terms in this segment, when
populating the array that maps docID to term number we could use
exactly the r
Op Tuesday 11 November 2008 11:29:27 schreef Michael McCandless:
>
> The other part of your proposal was to somehow "number" term text
> such that term range comparisons can be implemented as fast int
> comparisons.
...
>
>http://fontoura.org/papers/paramsearch.pdf
>
> However that'd be quite a bit