Hello,
I want to score span queries based on the simple presence or absence of a hit
(I'm not interested in Tf or Idf here), with a possible boost on specific
spans. I've already extended DefaultSimilarity to deal with single terms.
From looking at the code it seems that I want to override
T
On Mon, Feb 13, 2012 at 6:39 AM, Alan Woodward
wrote:
> Hello,
>
> (I'm not interested in Tf or Idf here)
> I've already extended DefaultSimilarity
In this case, then extending DefaultSimilarity/TFIDFSimilarity is not
the best approach.
> Or should the SimScorer methods on TFIDFSimilarity be unf
On 13 Feb 2012, at 12:16, Robert Muir wrote:
> On Mon, Feb 13, 2012 at 6:39 AM, Alan Woodward
> wrote:
>> Hello,
>>
>> (I'm not interested in Tf or Idf here)
>> I've already extended DefaultSimilarity
>
> In this case, then extending DefaultSimilarity/TFIDFSimilarity is not
> the best approach
You might want to post this on sites such as odesk.com, rentacoder.com,
guru.com, freelancer.com
On Mon, Feb 13, 2012 at 9:31 AM, SearchTech wrote:
> I am currently working on a search engine based on lucene and have some
> issues because java is not my regular programming language, which ma
For 2.x and 3.x you can simply use this code:
Directory dir=FSDirectory.open(new File("./testindex"));
IndexReader reader=IndexReader.open(dir);
List urls=new ArrayList(reader.numDocs());
for(int i=0;i wrote:
> Hi there,
>
> I am currently working on a search engine based on lucen
Hi,
Is there a way to improve query performance when using a leading * as a
wildcard on a path property?
I have hundreds of queries to run on a Lucene index (~250 MB). Executing
those queries without the leading * is about 5x faster than with the
leading *. My problem is that I sometimes need
You could possibly tokenize the value both forwards and in reverse, for
example:
123456 and 654321
You can then convert a query for *56 to 65* and this will increase
performance.
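The rewrite described above can be sketched in plain Java, without any Lucene classes. This is only an illustration under assumptions: the class and method names (ReversedWildcard, reverse, toReversedPrefixPattern, matchOriginals) are made up for the example, and the startsWith check merely stands in for a prefix query against a field that stores the reversed values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Lucene: shows how a leading-wildcard
// pattern maps onto a prefix pattern over reversed terms.
class ReversedWildcard {

    // Reverse a term so that a suffix of the original becomes a prefix.
    public static String reverse(String term) {
        return new StringBuilder(term).reverse().toString();
    }

    // Rewrite a pattern such as "*56" into "65*", to be run against the
    // field that holds the reversed values.
    public static String toReversedPrefixPattern(String pattern) {
        if (!pattern.startsWith("*") || pattern.length() < 2) {
            throw new IllegalArgumentException("expected a leading wildcard: " + pattern);
        }
        return reverse(pattern.substring(1)) + "*";
    }

    // Stand-in for a prefix query: keep the values whose reversed form
    // starts with the reversed suffix.
    public static List<String> matchOriginals(List<String> values, String pattern) {
        String rewritten = toReversedPrefixPattern(pattern);
        String prefix = rewritten.substring(0, rewritten.length() - 1);
        List<String> hits = new ArrayList<>();
        for (String value : values) {
            if (reverse(value).startsWith(prefix)) {
                hits.add(value);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(reverse("123456"));               // 654321
        System.out.println(toReversedPrefixPattern("*56"));  // 65*
        System.out.println(matchOriginals(Arrays.asList("123456", "999956", "565600"), "*56"));
    }
}
```

In a real index the reversed value would go into a second field at indexing time, so the expensive leading-wildcard scan is replaced by a prefix lookup on that field.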
-----Original Message-----
From: G.Long [mailto:jde...@gmail.com]
Sent: 13 February 2012 16:39
To: java-user@lucene.
I think you can solve this with the tokenizers in the
org.apache.lucene.analysis.path package (in lucene-analyzers.jar)
In your case, looks like ReversePathHierarchyTokenizer might be what
you want, though you will need to upgrade to at least 3.2 to get it.
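To make the idea concrete, here is a plain-Java sketch of the kind of tokens a reverse path hierarchy produces: the full path plus every trailing sub-path. It is not the Lucene tokenizer itself. The class and method names (PathSuffixes, suffixes) are hypothetical, and the exact output of ReversePathHierarchyTokenizer should be checked against its Javadoc.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; it does not use or reproduce the Lucene class.
class PathSuffixes {

    // Emit the whole path, then every suffix that starts right after a delimiter.
    public static List<String> suffixes(String path, char delimiter) {
        List<String> tokens = new ArrayList<>();
        tokens.add(path);
        for (int i = 0; i < path.length(); i++) {
            if (path.charAt(i) == delimiter && i + 1 < path.length()) {
                tokens.add(path.substring(i + 1));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints [/a/b/c, a/b/c, b/c, c]
        System.out.println(suffixes("/a/b/c", '/'));
    }
}
```

With suffixes like these indexed as terms, a search for everything ending in "b/c" becomes an exact term match instead of a leading-wildcard scan.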
On Mon, Feb 13, 2012 at 11:38 AM, G.Lon
Hi,
> As for trunk 4.x, I can't find the isDeleted(int) method. Could anyone tell me
> why this method was removed?
See MIGRATE.txt... Hint: AtomicReader.getLiveDocs()
Uwe
Thank you for the tips,
Is there an analyzer which uses this tokenizer? If not, do you know any
tutorial which explains how to implement a custom analyzer? I didn't find
any.
Regards.
On 13/02/2012 17:46, Robert Muir wrote:
I think you can solve this with the tokenizers in the
org.apache.
IndexWriter doesn't require refreshing... just keep it open forever.
It'll run its own merges when needed (see the MergePolicy/Scheduler).
Just call .commit() when you want changes to be durable (survive
OS/JVM crash, power loss, etc.).
Mike McCandless
http://blog.mikemccandless.com
On Mon, Fe
Hello,
Quick poll for those who have an opinion about what index size monitoring
should report in terms of the number of documents in the index.
Poll: http://blog.sematext.com/2012/02/13/poll-solr-index-size-monitoring/
For example, imagine that in some 5-minute time period (say 10:00 AM to 10: