date:20120213

Overriding SloppySimScorer

2012-02-13 Thread Alan Woodward

Hello, I want to score span queries based on the simple presence or absence of a hit (I'm not interested in Tf or Idf here), with a possible boost on specific spans. I've already extended DefaultSimilarity to deal with single terms. From looking at the code it seems that I want to override T

Re: Overriding SloppySimScorer

2012-02-13 Thread Robert Muir

On Mon, Feb 13, 2012 at 6:39 AM, Alan Woodward wrote: > Hello, > > (I'm not interested in Tf or Idf here) > I've already extended DefaultSimilarity In this case, then extending DefaultSimilarity/TFIDFSimilarity is not the best approach. > Or should the SimScorer methods on TFIDFSimilarity be unf

Re: Overriding SloppySimScorer

2012-02-13 Thread Alan Woodward

On 13 Feb 2012, at 12:16, Robert Muir wrote: > On Mon, Feb 13, 2012 at 6:39 AM, Alan Woodward > wrote: >> Hello, >> >> (I'm not interested in Tf or Idf here) >> I've already extended DefaultSimilarity > > In this case, then extending DefaultSimilarity/TFIDFSimilarity is not > the best approach

Re: Paid Job: Looking for a developer to create a small java application to extract url's from .fdt files

2012-02-13 Thread Shashi Kant

You might want to post this on sites such as odesk.com, rentacoder.com, guru.com, freelancer.com On Mon, Feb 13, 2012 at 9:31 AM, SearchTech wrote: > am currently working on a search engine based on lucene and have some > issues because java is not my regular programming language, which ma

Re: Paid Job: Looking for a developer to create a small java application to extract url's from .fdt files

2012-02-13 Thread Li Li

For 2.x and 3.x you can simply use this code: Directory dir=FSDirectory.open(new File("./testindex")); IndexReader reader=IndexReader.open(dir); List urls=new ArrayList(reader.numDocs()); for(int i=0;i wrote: > Hi there, > > I am currently working on a search engine based on lucen

query performance with leading *

2012-02-13 Thread G.Long

Hi, Is there a way to improve query performance when using a leading * as a wildcard on a path property? I have hundreds of queries to run on a Lucene index (~250 MB). Executing those queries without the leading * is about 5x faster than with the leading *. My problem is that I sometimes need

RE: query performance with leading *

2012-02-13 Thread Austin, Carl

You could possibly tokenize the value both forwards and in reverse, for example: 123456 and 654321 You can then convert a query for *56 to 65* and this will increase performance. -Original Message- From: G.Long [mailto:jde...@gmail.com] Sent: 13 February 2012 16:39 To: java-user@lucene.
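The reversal trick described above can be sketched in plain Java, with no Lucene dependency. This is only an illustration of the idea: the class name `ReversedWildcard` and both method names are hypothetical, and it assumes each value is indexed a second time, reversed, in a separate field that leading-wildcard queries are redirected to.

```java
// Hypothetical helper, not part of Lucene: illustrates the reverse-and-rewrite idea only.
final class ReversedWildcard {
    private ReversedWildcard() {}

    /** Returns the value with its characters in reverse order, e.g. for a second "reversed" field. */
    static String reverse(String value) {
        return new StringBuilder(value).reverse().toString();
    }

    /**
     * Rewrites a pattern whose only wildcard is a single leading '*' (such as "*56")
     * into a trailing-wildcard pattern for the reversed field (such as "65*").
     */
    static String toReversedFieldPattern(String pattern) {
        boolean leadingStarOnly = pattern.length() > 1
                && pattern.charAt(0) == '*'
                && pattern.indexOf('*', 1) < 0
                && pattern.indexOf('?') < 0;
        if (!leadingStarOnly) {
            throw new IllegalArgumentException("not a leading-wildcard pattern: " + pattern);
        }
        return reverse(pattern.substring(1)) + "*";
    }

    public static void main(String[] args) {
        System.out.println(reverse("123456"));             // value stored in the reversed field
        System.out.println(toReversedFieldPattern("*56")); // pattern to run against that field
    }
}
```

The rewritten pattern is a prefix query, which an inverted index can usually answer without scanning every term. The cost is a second indexed field per value.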

Re: query performance with leading *

2012-02-13 Thread Robert Muir

I think you can solve this with the tokenizers in the org.apache.lucene.analysis.path package (in lucene-analyzers.jar) In your case, looks like ReversePathHierarchyTokenizer might be what you want, though you will need to upgrade to at least 3.2 to get it. On Mon, Feb 13, 2012 at 11:38 AM, G.Lon
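As a rough illustration of why a reverse path hierarchy helps here, the sketch below generates every trailing sub-path of a path in plain Java. It is not the Lucene tokenizer and makes no claim about its exact output; the class `PathSuffixes` and its method are hypothetical. The assumption is that if each trailing sub-path is indexed as its own term, a query for "anything ending in b/c" becomes an exact term lookup instead of a leading-wildcard scan.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: lists the trailing sub-paths of a path.
final class PathSuffixes {
    private PathSuffixes() {}

    /** Returns the full path followed by each non-empty suffix that starts after a delimiter. */
    static List<String> suffixes(String path, char delimiter) {
        List<String> out = new ArrayList<>();
        out.add(path);
        int idx = path.indexOf(delimiter);
        while (idx >= 0) {
            String rest = path.substring(idx + 1);
            if (!rest.isEmpty()) {
                out.add(rest);
            }
            idx = path.indexOf(delimiter, idx + 1);
        }
        return out;
    }

    public static void main(String[] args) {
        // Each printed line would be indexed as a separate term in this sketch.
        for (String s : suffixes("a/b/c", '/')) {
            System.out.println(s);
        }
    }
}
```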

RE: Paid Job: Looking for a developer to create a small java application to extract url's from .fdt files

2012-02-13 Thread Uwe Schindler

Hi, > as for Trunk 4.x, I can't find the isDeleted(int) method. any one could tell me > why this method is removed? See MIGRATE.txt... Hint: AtomicReader.getLiveDocs() Uwe - To unsubscribe, e-mail: java-user-unsubscr...@lucene

Re: query performance with leading *

2012-02-13 Thread G.Long

Thank you for the tips. Is there an analyzer which uses this tokenizer? If not, do you know any tutorial which explains how to implement a custom analyzer? I didn't find any. Regards. On 13/02/2012 17:46, Robert Muir wrote: I think you can solve this with the tokenizers in the org.apache.

Re: When to refresh writer?

2012-02-13 Thread Michael McCandless

IndexWriter doesn't require refreshing... just keep it open forever. It'll run its own merges when needed (see the MergePolicy/Scheduler). Just call .commit() when you want changes to be durable (survive OS/JVM crash, power loss, etc.). Mike McCandless http://blog.mikemccandless.com On Mon, Fe

Poll: how to report # of docs in index over time

2012-02-13 Thread Otis Gospodnetic

Hello, Quick poll for those who have an opinion about what index size monitoring should report in terms of the number of documents in the index. Poll: http://blog.sematext.com/2012/02/13/poll-solr-index-size-monitoring/ For example, imagine that in some 5-minute time period (say 10:00 AM to 10:

