[ANNOUNCE] : Lucene Server

2004-09-23 Thread Cocula Remi
I am glad to introduce a new project on SourceForge that is related to Lucene. Lucene Server is a Java server application for easily creating and managing Jakarta Lucene indexes. It is designed to help you integrate Lucene in distributed environments. The first release, 0.1, is available for

Strange search results with wildcard - Bug?

2004-09-23 Thread Ulrich Mayring
Hi all, first, here's how to reproduce the problem: Go to http://www.denic.de/en/special/index.jsp and enter obscure service in the search field. You'll get 132 hits. Now enter obscure service* - and you only get 1 hit. The above website is running Lucene 1.3rc3, but I was able to reproduce

Re: Strange search results with wildcard - Bug?

2004-09-23 Thread Morus Walter
Ulrich Mayring writes: Hi all, first, here's how to reproduce the problem: Go to http://www.denic.de/en/special/index.jsp and enter obscure service in the search field. You'll get 132 hits. Now enter obscure service* - and you only get 1 hit. The above website is running Lucene

Re: Strange search results with wildcard - Bug?

2004-09-23 Thread Ulrich Mayring
Morus Walter wrote: Your number/handle samples look ok to me if the default operator is AND. But it's OR ;-) Using AND explicitly I get different results and using OR explicitly I get the same results as documented. Note that wildcard expressions are not analyzed so if service is stemmed to

Re: Strange search results with wildcard - Bug?

2004-09-23 Thread Morus Walter
Ulrich Mayring writes: Will do, thank you very much. However, how do I get at the analyzed form of my terms? Instantiate the analyzer, create a token stream feeding your input, loop over the tokens, output the results. Morus
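The loop Morus describes, and the earlier remark that wildcard terms are not analyzed, can be illustrated without Lucene on the classpath. This is only a sketch: `AnalyzedFormSketch`, its `analyze` method, and the strip-one-trailing-"e" rule are hypothetical stand-ins chosen for illustration (loosely echoing the `handl` form quoted later in the thread). They are not Lucene's API or any real stemmer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class AnalyzedFormSketch {
    // Hypothetical stand-in for an analyzer: lowercases, splits on whitespace,
    // and strips one trailing "e" as a crude imitation of stemming.
    static List<String> analyze(String input) {
        List<String> tokens = new ArrayList<>();
        for (String raw : input.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            if (raw.isEmpty()) {
                continue; // empty input yields one empty piece; skip it
            }
            boolean strip = raw.length() > 1 && raw.endsWith("e");
            tokens.add(strip ? raw.substring(0, raw.length() - 1) : raw);
        }
        return tokens;
    }

    // A trailing-wildcard term is compared as a raw prefix against the
    // indexed (already analyzed) term; the prefix itself is NOT analyzed.
    static boolean prefixMatches(String wildcardTerm, String indexedTerm) {
        if (!wildcardTerm.endsWith("*")) {
            throw new IllegalArgumentException("expected a trailing *");
        }
        String prefix = wildcardTerm.substring(0, wildcardTerm.length() - 1);
        return indexedTerm.startsWith(prefix);
    }

    public static void main(String[] args) {
        // Loop over the tokens and output the analyzed forms.
        for (String term : analyze("obscure service")) {
            System.out.println(term);
        }
        String indexed = analyze("service").get(0);
        System.out.println(prefixMatches("service*", indexed)); // unanalyzed prefix
        System.out.println(prefixMatches("servic*", indexed));  // prefix of the analyzed form
    }
}
```

Under this toy rule the unanalyzed prefix is longer than the indexed term, so it cannot match. That is one way a wildcard query can return fewer hits than the same query without the wildcard.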

MultiSearcher + Sort

2004-09-23 Thread Karthik N S
Guys, apologies. Am I doing something wrong, or is there a bug with Lucene on Linux when using MultiSearcher with Sort? Please, somebody reply ASAP. Tested with both Lucene-1.4-final.jar and Lucene-1.4.1.jar: hits = multiSearcher.search(query,sortField); Exception raised on Linux only

Clustering lucene's results

2004-09-23 Thread Dawid Weiss
Dear all, I saw a post about an attempt to integrate Carrot2 with Lucene. It was a while ago, so I'm curious if any outcome has been achieved. Anyway, as the project coordinator I can offer my help with such integration; if you're looking for some ready-to-use code then there is a clustering

Re: problem with get/setBoost of document fields

2004-09-23 Thread Bastian Grimm [Eastbeam GmbH]
hmm ok, but how will i be able to set different boosts on fields if this value is not stored?! i don't really understand why i can set a boost factor and it is not stored and used. what i want to do is to weight my searchable index fields (type: Field.UnStored) with different factors for

Re: problem with get/setBoost of document fields

2004-09-23 Thread Erik Hatcher
The boost is not thrown away, but rather combined with the length normalization factor during indexing. So while your actual boost value is not stored directly in the index, it is taken into consideration for scoring appropriately. Erik On Sep 23, 2004, at 8:17 AM, Bastian Grimm
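As a rough illustration of Erik's point, the sketch below multiplies the boosts into a length normalization factor of 1/sqrt(number of terms), which is how the default similarity is commonly described. The class and method names are hypothetical. The lossy single-byte encoding that Lucene applies to the stored norm is deliberately left out, so real index values will be coarser than these.

```java
final class FieldNormSketch {
    // Length normalization as commonly described for the default similarity:
    // fields with fewer terms get a larger factor.
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // The boosts are folded into the same number, which is why the original
    // boost value cannot be read back from the index afterwards.
    static float norm(float docBoost, float fieldBoost, int numTerms) {
        return docBoost * fieldBoost * lengthNorm(numTerms);
    }

    public static void main(String[] args) {
        // Same field length, different field boosts.
        System.out.println(norm(1.0f, 1.0f, 4));
        System.out.println(norm(1.0f, 2.0f, 4));
    }
}
```

Because only the product is kept, a field boosted by 2.0 with four terms ends up with the same norm as an unboosted field with one term.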

RE: Clustering lucene's results

2004-09-23 Thread William W
Hi Dawid, I would like to use Carrot2 with lucene. Do you have examples ? Thanks a lot, William. From: Dawid Weiss [EMAIL PROTECTED] Reply-To: Lucene Users List [EMAIL PROTECTED] To: [EMAIL PROTECTED] Subject: Clustering lucene's results Date: Thu, 23 Sep 2004 13:36:03 +0200 Dear all, I saw a

Re: Clustering lucene's results

2004-09-23 Thread Dawid Weiss
Hi William, No, I don't have examples because I never used Lucene directly. If you provide me with a sample index and an API that executes a query on this index (I need document titles, summaries, or snippets and an anchor (identifier), can be an URL). Send me such a snippet and I'll try to

Re: problem with get/setBoost of document fields

2004-09-23 Thread Bastian Grimm [Eastbeam GmbH]
thanks for your reply, Erik. so i am right that it's not possible to change the boost without reindexing all files? that's not good... or is it ok to only change the boosts and optimize the index for the changes to take effect? if not, will i be able to boost those fields in the searcher?

RE: MultiSearcher + Sort

2004-09-23 Thread Wermus Fernando
Karthik, I have a kind of similar problem. Test the following: when you create a field, don't use Field(String), instead use Field(String, int) where int is a constant for the field's type. Maybe this could help. -Original Message- From: Karthik N S [mailto:[EMAIL PROTECTED]

RE: Questions related to closing the searcher

2004-09-23 Thread Aviran
The best way is to use IndexReader's getCurrentVersion() method to check whether the index has changed. If it has, just get a new Searcher: http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/IndexReader.html#getCurrentVersion(java.lang.String) Aviran -Original Message-
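The check-then-reopen pattern Aviran suggests might be sketched as below. To keep the example runnable without Lucene, the version lookup and the searcher factory are passed in as plain functions. `ReopenOnChangeSketch` and its members are hypothetical names; in real code the version function would presumably wrap `IndexReader.getCurrentVersion(...)`, which can throw `IOException`, and the factory would construct the searcher.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

final class ReopenOnChangeSketch<S> {
    private final LongSupplier currentVersion; // assumed to report the index version
    private final Supplier<S> openSearcher;    // assumed to open a fresh searcher
    private long openedAtVersion;
    private S searcher;

    ReopenOnChangeSketch(LongSupplier currentVersion, Supplier<S> openSearcher) {
        this.currentVersion = currentVersion;
        this.openSearcher = openSearcher;
    }

    // Returns the cached searcher, reopening only when the version has moved.
    // Closing the previous searcher is intentionally left out: when that is
    // safe depends on whether other threads are still using it.
    synchronized S get() {
        long version = currentVersion.getAsLong();
        if (searcher == null || version != openedAtVersion) {
            searcher = openSearcher.get();
            openedAtVersion = version;
        }
        return searcher;
    }
}
```

Callers would ask the holder for a searcher before each query instead of keeping their own reference, so a changed index is picked up on the next search.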

Re: Strange search results with wildcard - Bug?

2004-09-23 Thread Ulrich Mayring
Erik Hatcher wrote: Look at AnalysisDemo referred to here: http://wiki.apache.org/jakarta-lucene/AnalysisParalysis Keep in mind that phrase queries do not support wildcards - they are analyzed and any wildcard characters are likely stripped and cause tokens to split. Ok, I did all that and

Re: problem with get/setBoost of document fields

2004-09-23 Thread Doug Cutting
You can change field boosts without re-indexing. http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/IndexReader.html#setNorm(int,%20java.lang.String,%20byte) Doug Bastian Grimm [Eastbeam GmbH] wrote: thanks for your reply, Erik. so i am right that it's not possible to change the

Re: Strange search results with wildcard - Bug?

2004-09-23 Thread Ulrich Mayring
Ulrich Mayring wrote: If the user searches for 007001 handle, the MultiFieldQueryParser, which searches in the fields title and contents, changes that query to: (title:007001 +title:handl) (contents:007001 +contents:handl) Ok, I cleared this up, there was some invisible magic going on in the

Re: Clustering lucene's results

2004-09-23 Thread Andrzej Bialecki
Dawid Weiss wrote: Hi William, No, I don't have examples because I never used Lucene directly. If you provide me with a sample index and an API that executes a query on this index (I need document titles, summaries, or snippets and an anchor (identifier), can be an URL). Hi Dawid :-) I believe

Re: demo HTML parser question

2004-09-23 Thread roy-lucene-user
Hi Fred, We were originally attempting to use the demo HTML parser (Lucene 1.2), but as you know, it's for a demo. I think it's threaded to optimize on time, to allow the calling thread to grab the title or top message even though it's not done parsing the entire HTML document. That's just a

compiling 1.4 source

2004-09-23 Thread roy-lucene-user
Hi guys, So we started upgrading to 1.4 and we need to add some of our own custom code. After compiling with ant, I noticed that the 1.4 ant script builds a jar called lucene-1.5-rc1-dev.jar, not lucene-1.4-final.jar. I'm pretty sure I did not download the wrong source. Is this just a wrong

Re: compiling 1.4 source

2004-09-23 Thread Erik Hatcher
If you obtained the 1.4.1 source distribution, then you're fine and it's simply an issue with the properties. We keep the properties set to the _next_ version of Lucene (or as a beta/rc version label) to prevent the CVS HEAD codebase from building as a release label when it is very likely not

Re: Clustering lucene's results

2004-09-23 Thread Dawid Weiss
Hi Andrzej :) Yep, ok, I'll take a look at it after I come back from abroad (next week). I just wanted to save myself some time and have some already-written code that fetches the information we need for clustering; you know what I mean, I'm sure. But I'll start from scratch when I get back. D.

Power Point Processing

2004-09-23 Thread Zhang, Lisheng
Hi, Does anyone know a good tool for converting MS PowerPoint files (*.ppt) into plain text so we can use Lucene to index them? I looked at jakarta/POI and only see that Word and Excel documents can be processed. Some JavaDoc pages mention ppt, but the status is not clear to me. Thanks very much for

Re: Clustering lucene's results

2004-09-23 Thread William W
Hi Dawid, The demos (under /src/demo) are very good. They have the basic usage scenario. Thanks Andrzej. William. Dawid Weiss wrote: Hi William, No, I don't have examples because I never used Lucene directly. If you provide me with a sample index and an API that executes a query on this index

Re: demo HTML parser question

2004-09-23 Thread Doug Cutting
[EMAIL PROTECTED] wrote: We were originally attempting to use the demo HTML parser (Lucene 1.2), but as you know, it's for a demo. I think it's threaded to optimize on time, to allow the calling thread to grab the title or top message even though it's not done parsing the entire HTML document.

Re: Clustering lucene's results

2004-09-23 Thread Dawid Weiss
yeah... I know there have to be demos... I tried to be lazy, you know :) Anyway, as I told Andrzej -- I'll take a look at it (and with pleasure) after I come back. I don't think the delay will matter much. And if it does, ask Andrzej -- he has excellent experience with both projects -- he's

Document contents split among different Fields

2004-09-23 Thread Greg Langmead
I am working on extending Lucene to support documents with special islands of an XML language, and I want to index the islands differently from the text. My current plan is to break the document's contents into two Fields, one with all the text and one with all the special islands, and use a

Re: Document contents split among different Fields

2004-09-23 Thread Doug Cutting
Greg Langmead wrote: Am I right in saying that the design of Token's support for highlighting really only supports having the entire document stored as one monolithic contents Field? No, I don't think so. Has anyone tackled indexing multiple content Fields before that could shed some light? Do you

RE: Document contents split among different Fields

2004-09-23 Thread Greg Langmead
Doug Cutting wrote: Do you need highlights from all fields? If so, then you can use: TextFragment[] getBestTextFragments(TokenStream, ...); with a TokenStream for each field, then select the highest scoring fragments across all fields. Would that work for you? Thanks for the reply.
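The selection step Doug suggests (collect fragments per field, then keep the highest scoring ones overall) could look roughly like the sketch below. The `Fragment` class here is a hypothetical stand-in carrying only a field name, text, and score. It is not the highlighter's `TextFragment`, and the per-field lists are assumed to come from one `getBestTextFragments` call per field.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class BestFragmentsSketch {
    // Hypothetical stand-in for a scored highlight fragment.
    static final class Fragment {
        final String field;
        final String text;
        final float score;

        Fragment(String field, String text, float score) {
            this.field = field;
            this.text = text;
            this.score = score;
        }
    }

    // Merges the per-field results and keeps the 'max' highest scoring
    // fragments, best first. The input lists are not modified.
    static List<Fragment> topAcrossFields(List<List<Fragment>> perField, int max) {
        List<Fragment> all = new ArrayList<>();
        for (List<Fragment> fragments : perField) {
            all.addAll(fragments);
        }
        all.sort(Comparator.comparingDouble((Fragment f) -> f.score).reversed());
        return new ArrayList<>(all.subList(0, Math.min(max, all.size())));
    }
}
```

One caveat with this approach: scores are only comparable across fields if each field's fragments were scored against the same query in the same way.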