Re: lucene cutomized indexing

2004-07-21 Thread John Wang
Hi Eric and Grant: Thanks for the replies and this is certainly encouraging. As suggested, I will post further such discussions to the dev list. Thanks -John On Tue, 20 Jul 2004 15:37:35 -0400, Grant Ingersoll [EMAIL PROTECTED] wrote: It seems to me the answer to this is not necessarily

Re: speeding up lucene search

2004-07-21 Thread John Wang
In general, yes. By splitting up a large index into smaller indices, you are linearizing the search time. Furthermore, that allows you to make your search distributable. -John On Wed, 21 Jul 2004 13:00:28 +1000, Anson Lau [EMAIL PROTECTED] wrote: Hello guys, What are some general techniques
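
A minimal sketch of the "split and distribute" idea above, in plain Java with no Lucene dependency. The names Shard, Hit and searchAll are hypothetical stand-ins: each Shard represents one small index, the shards are searched in parallel, and the per-shard hits are merged by score. It assumes scores from different shards are comparable, which may not hold for independently built indexes; Lucene's own MultiSearcher class is probably the better starting point in practice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ShardedSearch {
    /** A scored hit; docId and score are illustrative fields, not Lucene's Hits API. */
    static final class Hit {
        final String docId;
        final double score;
        Hit(String docId, double score) { this.docId = docId; this.score = score; }
    }

    /** Hypothetical stand-in for one small index. */
    interface Shard {
        List<Hit> search(String query);
    }

    /** Searches every shard in parallel and returns the topN hits, best score first. */
    static List<Hit> searchAll(List<Shard> shards, String query, int topN) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, shards.size()));
        try {
            // Submit one search task per shard.
            List<Future<List<Hit>>> futures = new ArrayList<>();
            for (Shard shard : shards) {
                Callable<List<Hit>> task = () -> shard.search(query);
                futures.add(pool.submit(task));
            }
            // Collect the partial results, then merge by descending score.
            List<Hit> merged = new ArrayList<>();
            for (Future<List<Hit>> future : futures) {
                merged.addAll(future.get());
            }
            merged.sort((a, b) -> Double.compare(b.score, a.score));
            return new ArrayList<>(merged.subList(0, Math.min(topN, merged.size())));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("search interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("shard search failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Each Shard could just as well live on another machine; only the merge step needs to see all partial results.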

Can I retrieve token offsets from Hits?

2004-07-21 Thread Stepan Mik
Hi, Is it possible to retrieve token offsets (Token.startOffset(), Token.endOffset()) later, when a document is found and returned in the hit collection? I need these values for highlighting. I've already looked at the Highlighter in the sandbox, but it actually re-analyzes the original document's field.

Re: Can I retrieve token offsets from Hits?

2004-07-21 Thread Erik Hatcher
On Jul 21, 2004, at 6:59 AM, Stepan Mik wrote: Is it possible to retrieve token offsets (Token.startOffset(), Token.endOffset()) later, when a document is found and returned in the hit collection? No, offsets are not stored in the index. In fact, the only place they are currently used is with the

RE: Lucene vs. MySQL Full-Text

2004-07-21 Thread Anson Lau
Depending on what MySQL full-text search supports, you will probably lose some of the advanced things you get for free from Lucene, such as proximity search, wildcard search, search term and search field boosting, scoring of the documents, etc. After all, it depends on what you need to do. In our dev

Re: Lucene vs. MySQL Full-Text

2004-07-21 Thread Erik Hatcher
Interestingly (and ironically) enough, the project I'm currently working on requires full-text searching of Word and PDF resumes. SQL Server is already the required database as well, so we are leveraging the full-text indexing capabilities it has. There is a special trick to drop a BLOB into

Extracting Lucene onto Tomcat

2004-07-21 Thread Ian McDonnell
Are the package information and import paths ready to deploy on a Tomcat server? I tried extracting Lucene on the server, but when I compile files, it just throws numerous no class definition errors and errors relating to the package. Ian

RE: speeding up lucene search

2004-07-21 Thread Anson Lau
Has anyone tried splitting up an index into smaller chunks, without putting the different indices on a different physical disk/box? What sort of performance gain do you get from it? Anson -Original Message- From: John Wang [mailto:[EMAIL PROTECTED] Sent: Wednesday, July 21, 2004

Re: Extracting Lucene onto Tomcat

2004-07-21 Thread Erik Hatcher
On Jul 21, 2004, at 8:10 AM, Ian McDonnell wrote: Are the package information and import paths ready to deploy on a Tomcat server? I tried extracting Lucene on the server, but when I compile files, it just throws numerous no class definition errors and errors relating to the package. Huh? Lucene

RE: Sorting on tokenized fields

2004-07-21 Thread Aviran
You can create a new field which contains the full untokenized string and use it as a sort field. -Original Message- From: Florian Sauvin [mailto:[EMAIL PROTECTED] Sent: Tuesday, July 20, 2004 20:13 PM To: Lucene Users List Subject: Sorting on tokenized fields I see in the Javadoc that

Sort: 1.4-rc3 vs. 1.4-final

2004-07-21 Thread Greg Gershman
When rc3 came out, I modified the classes used for Sorting to, in addition to Integer, Float and String-based sort keys, use Long values. All I did was add extra statements in 2 classes (SortField and FieldSortedHitQueue) that made a special case for longs, and created a LongSortedHitQueue

RE: Sort: 1.4-rc3 vs. 1.4-final

2004-07-21 Thread Aviran
Since I had to implement sorting in Lucene 1.2, I had to write my own sorting using something similar to a Lucene contribution called SortField. Yesterday I did some tests, trying to use Lucene 1.4 Sort objects, and I realized that my old implementation works 40% faster than Lucene's

Re: Extracting Lucene onto Tomcat

2004-07-21 Thread Ian McDonnell
Well, when I extracted it, it created the org/apache/lucene directories in the public_html directory. When I try to compile any of the source it just throws numerous errors. I've got the classpath set to web-inf/classes. Have I extracted it to the wrong directory? --- Erik Hatcher [EMAIL

Weighting database fields

2004-07-21 Thread John Patterson
Hi, What is the best way to get Lucene to assign weightings to certain fields from a database? For example, the 'name' field should be weighted higher than the 'description' field. Thanks, John. - To unsubscribe, e-mail:

Re: Extracting Lucene onto Tomcat

2004-07-21 Thread Ian McDonnell
Also, another silly question: do I need to set up a war on the server? --- Ian McDonnell [EMAIL PROTECTED] wrote: Well, when I extracted it, it created the org/apache/lucene directories in the public_html directory. When I try to compile any of the source it just throws numerous errors. I've got

Re: Extracting Lucene onto Tomcat

2004-07-21 Thread Zilverline info
Hi Ian, Depending on what you want to do, you could also follow the installation instructions on http://www.zilverline.org. They describe how to install zilverline, but the same goes for the lucene war. Hope this helps, Michael Franken Ian McDonnell wrote: Also, another silly question: do I

Re: Extracting Lucene onto Tomcat

2004-07-21 Thread Ian McDonnell
I was looking at your instructions there, but couldn't really figure out what you mean. Can I manually add the extracted directories onto the Tomcat server? If so, what should my root directory be? Say for example the extracted directories org/apache/lucene/ Should I have that as

Re: Weighting database fields

2004-07-21 Thread Erik Hatcher
On Jul 21, 2004, at 10:09 AM, Anson Lau wrote: Apply boost factor to fields when you do a lucene search. Or... set the boost on the Field during indexing. Erik Anson -Original Message- From: John Patterson [mailto:[EMAIL PROTECTED] Sent: Thursday, July 22, 2004 12:07 AM To: [EMAIL

Re: Extracting Lucene onto Tomcat

2004-07-21 Thread Erik Hatcher
There is no need to extract Lucene's JAR file. Your questions indicate that you have some Tomcat and Java web application learning to do and this forum is not the most appropriate place to ask. Lucene includes a web application demo that you could try deploying by following the steps here:

Re: Extracting Lucene onto Tomcat

2004-07-21 Thread Zilverline info
Hi Ian, You don't extract war files, or jar files. To deploy a web application that comes as a war file, you just have to drop it into the web server/servlet engine. So just: copy lucene.war tomcatserver/webapps. That's it. I advise you to read some of the documentation on the Tomcat website on

Re: Extracting Lucene onto Tomcat

2004-07-21 Thread Ian McDonnell
No, sorry, I didn't mean that I was trying to extract the jars at all. I meant the extraction of the original Lucene source bundle. I have been developing in Java for going on 5 years now, but am relatively new to web apps. I have some experience in Tomcat from days as an undergrad and do

Re: Weighting database fields

2004-07-21 Thread John Patterson
Thanks, that was what I was after! - Original Message - From: Erik Hatcher [EMAIL PROTECTED] To: Lucene Users List [EMAIL PROTECTED] Sent: Wednesday, July 21, 2004 9:52 PM Subject: Re: Weighting database fields On Jul 21, 2004, at 10:09 AM, Anson Lau wrote: Apply boost factor to

RE: Weighting database fields

2004-07-21 Thread Anson Lau
Erik, Is there any benefit to setting the boost during indexing rather than setting it during query? I usually set it when doing a query because you can change the boost values easily without having to re-index. Thanks, Anson -Original Message- From: Erik Hatcher [mailto:[EMAIL PROTECTED]

Re: Extracting Lucene onto Tomcat

2004-07-21 Thread Erik Hatcher
On Jul 21, 2004, at 11:19 AM, Ian McDonnell wrote: No, sorry, I didn't mean that I was trying to extract the jars at all. I meant the extraction of the original Lucene source bundle. I have been developing in Java for going on 5 years now, but am relatively new to web apps. I have some experience

Re: Weighting database fields

2004-07-21 Thread Erik Hatcher
On Jul 21, 2004, at 11:40 AM, Anson Lau wrote: Is there any benefit to setting the boost during indexing rather than setting it during query? It allows setting each document differently. For example, TheServerSide is using field-level boosts at index time to control ordering by date, such that newer

Re: Use of Convertes or Parser

2004-07-21 Thread Otis Gospodnetic
Lucene cannot parse those document formats that you mentioned. You need 3rd party parsers to do that. For example, POI will parse Excel and MS Word docs, PDFBox will parse PDF. Otis --- Natarajan.T [EMAIL PROTECTED] wrote: Hi Guys, I have a small query, ie. Lucene 1.4 APIs directly

RE: Sort: 1.4-rc3 vs. 1.4-final

2004-07-21 Thread Greg Gershman
I've done a bit more snooping around; it seems that in FieldSortedHitQueue.getCachedComparator (line 153), calls to look up a stored comparator in the cache always return null. This occurs even for the built-in sort types (I tested it on integers and my code for longs). The comparators don't even

RE: Sort: 1.4-rc3 vs. 1.4-final

2004-07-21 Thread Greg Gershman
I switched the Comparators and FieldCache classes to use java.util.HashMap instead of java.util.WeakHashMap, and got the performance boost I was looking for (test index of 100K documents; initial search took 991 ms, all subsequent searches took 90 ms. Before, I was seeing initial query of ~1sec,

RE: Sort: 1.4-rc3 vs. 1.4-final

2004-07-21 Thread Aviran
I think I found the problem: FieldCacheImpl uses WeakHashMap to store the cached objects, but since there is no other reference to this cache it is getting released. Switching to HashMap solves it. The only problem is that I don't see anywhere where the cached object will get released if you open a

RE: Sort: 1.4-rc3 vs. 1.4-final

2004-07-21 Thread Aviran
I just saw this post; I guess we both came to the same conclusion. The only problem is that the cached object never gets released, and a new one will get created every time you open a new IndexReader. Aviran -Original Message- From: Greg Gershman [mailto:[EMAIL PROTECTED] Sent:

Re: Sort: 1.4-rc3 vs. 1.4-final

2004-07-21 Thread Doug Cutting
The key in the WeakHashMap should be the IndexReader, not the Entry. I think this should become a two-level cache, a WeakHashMap of HashMaps, the WeakHashMap keyed by IndexReader, the HashMap keyed by Entry. I think the Entry class can also be changed to not include an IndexReader field.
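
A minimal sketch of the two-level cache described above, written in plain modern Java rather than against the Lucene 1.4 sources. The class name TwoLevelCache and its get method are hypothetical; R stands in for IndexReader and K for the Entry key. This is not the patch that was later posted, only an illustration of the structure: a WeakHashMap keyed by reader, whose values are ordinary HashMaps keyed by entry.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Supplier;

/** Hypothetical two-level cache: weak outer level per reader, strong inner level per entry. */
class TwoLevelCache<R, K, V> {
    // Outer map: the reader is held only weakly, so once nothing else
    // references a reader, its whole inner map becomes collectable.
    private final Map<R, Map<K, V>> byReader = new WeakHashMap<>();

    /** Returns the cached value for (reader, entry), computing it on first use. */
    synchronized V get(R reader, K entry, Supplier<V> loader) {
        // Inner map: a plain HashMap, so values stay cached for as long
        // as the reader itself is alive.
        Map<K, V> inner = byReader.computeIfAbsent(reader, r -> new HashMap<>());
        // Keys and values stored here must not hold a strong reference
        // back to the reader, or the weak outer key could never be cleared.
        return inner.computeIfAbsent(entry, e -> loader.get());
    }
}
```

The comment on the inner map is the reason the Entry key should not carry a reader field of its own: a strong reference from the cached data back to the reader would keep the outer key alive indefinitely.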

RE: Sort: 1.4-rc3 vs. 1.4-final

2004-07-21 Thread Aviran
I will post a patch soon. Aviran -Original Message- From: Doug Cutting [mailto:[EMAIL PROTECTED] Sent: Wednesday, July 21, 2004 13:56 PM To: Lucene Users List Subject: Re: Sort: 1.4-rc3 vs. 1.4-final The key in the WeakHashMap should be the IndexReader, not the Entry. I think this

Slightly off topic, I need to have luke use my Analyzer

2004-07-21 Thread Rob Jose
Sorry for the slightly off topic post, but I have a need to use luke with my Analyzer. Has anyone done this? I have added a jar file to my classpath, but that didn't help. Thanks in advance Rob

RE: Slightly off topic, I need to have luke use my Analyzer

2004-07-21 Thread Chellappa, Kannan
Worked for me. I added my jar to the classpath and my analyzer appeared in the analyzers list in the search tab as well as in the analyzers list in the plugins tab. I am using Luke v 0.5 (2004-05-25) Kannan -Original Message- From: Rob Jose [mailto:[EMAIL PROTECTED] Sent: Wednesday,

RE: Slightly off topic, I need to have luke use my Analyzer

2004-07-21 Thread Chellappa, Kannan
Sorry, typo in the version date in my previous mail -- I meant Luke v 0.5 (2004-06-25) -Original Message- From: Chellappa, Kannan Sent: Wednesday, July 21, 2004 12:16 PM To: Lucene Users List Subject: RE: Slightly off topic, I need to have luke use my Analyzer Worked for me. I added

Re: Weighting database fields

2004-07-21 Thread Ernesto De Santis
Hi Erik On Jul 21, 2004, at 11:40 AM, Anson Lau wrote: Is there any benefit to set the boost during indexing rather than set it during query? It allows setting each document differently. For example, TheServerSide is using field-level boosts at index time to control ordering by date,

Re: Weighting database fields

2004-07-21 Thread Doug Cutting
Ernesto De Santis wrote: If some field has a boost value set at index time, and at search time the query has another boost value for this field, what happens? Which value is used for the boost? The two boosts are both multiplied into the score. Doug
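
As a rough illustration of the answer above, assuming a base score that stands in for everything else Lucene computes: the class BoostDemo and method boostedScore are hypothetical and not part of Lucene, and real scoring involves further factors. The sketch only shows that neither boost overrides the other; both act as multipliers.

```java
class BoostDemo {
    /**
     * Illustrative only: baseScore stands in for the rest of the score,
     * and both boosts are applied as multiplicative factors.
     */
    static float boostedScore(float baseScore, float indexBoost, float queryBoost) {
        return baseScore * indexBoost * queryBoost;
    }

    public static void main(String[] args) {
        // A field boosted by 2.0 at index time and by 3.0 in the query
        // ends up with six times the unboosted contribution.
        System.out.println(boostedScore(0.5f, 2.0f, 3.0f)); // prints 3.0
    }
}
```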

Re: Syntax of Query

2004-07-21 Thread Hetan Shah
Guys/Gals, Does anyone have any pointers for this kind of query? Thanks. Need some help with creating a query. Here is the scenario: Field 1: Field 2: Field 3: MultiSelect 1 :

RE: Use of Convertes or Parser

2004-07-21 Thread Natarajan.T
Ok Thanks. -Original Message- From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] Sent: Wednesday, July 21, 2004 9:33 PM To: Lucene Users List Subject: Re: Use of Convertes or Parser Lucene cannot parse those document formats that you mentioned. You need 3rd party parsers to do that. For

RE: Extracting Lucene onto Tomcat

2004-07-21 Thread Karthik N S
Hi, Just copy the lucene.war file into the Tomcat webapps directory, and then start Tomcat. In the browser, type http://localhost:8080/luceneweb and it will serve you the pages. But first you have to index your directory for the web module to serve you the searchable hits. I think there