Re: Lucene : avoiding locking
I'm working on a similar project... Make sure that only one call to the index method is occurring at a time. Synchronizing that method should do it.

--- Luke Shannon [EMAIL PROTECTED] wrote:

Hi All;

I have hit a snag in my Lucene integration and don't know what to do. My company has a content management product. Each time someone changes the directory structure, or a file within it, that portion of the site needs to be re-indexed so the changes are reflected in future searches (indexing must happen at run time). I have written an Indexer class with a static Index() method. The idea is to call the method every time something changes and the index needs to be re-examined. I am hoping the logic put in by Doug Cutting surrounding the UID will make indexing efficient enough to be called this frequently.

This class worked great when I tested it on my own little site (about 2,000 files). But when I drop the functionality into the QA environment I get a locking error. I can't access the stack trace; all I can get at is a log file the application writes to. Here is the section my class wrote. It was right in the middle of indexing when the lock issue hit. I don't know if the problem is in my code or something in the existing application.

Error message:

ENTER|SearchEventProcessor.visit(ContentNodeDeleteEvent)
|INFO|INDEXING INFO: Start Indexing new content.
|INFO|INDEXING INFO: Index Folder Did Not Exist. Start Creation Of New Index
|INFO|INDEXING INFO: Beginning incremental update comparisons
    [the line above repeats many times]
|INFO|INDEXING ERROR: Unable to index new content Lock obtain timed out:
Lock@/usr/tomcat/jakarta-tomcat-5.0.19/temp/lucene-398fbd170a5457d05e2f4d43210f7fe8-write.lock
|ENTER|UpdateCacheEventProcessor.visit(ContentNodeDeleteEvent)

Here is my code. You will recognize it as pretty much the IndexHTML class from the Lucene demo written by Doug Cutting. I have put in a ton of comments in an attempt to understand what is going on. Any help would be appreciated.

Luke

package com.fbhm.bolt.search;

/*
 * Created on Nov 11, 2004
 *
 * This class will create a single index for the Content
 * Management System (CMS). It contains logic to ensure
 * indexing is done intelligently. Based on IndexHTML.java
 * from the demo folder that ships with Lucene.
 */
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.Date;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermEnum;
import org.pdfbox.searchengine.lucene.LucenePDFDocument;
import org.apache.lucene.demo.HTMLDocument;

import com.alaia.common.debug.Trace;
import com.alaia.common.util.AppProperties;

/**
 * @author lshannon
 * Description:
 * This class is used to index a content folder. It contains logic to
 * ensure only new documents, or documents modified since the last
 * index pass, are indexed.
 * Based on code written by Doug Cutting in the IndexHTML class found in
 * the Lucene demo.
 */
public class Indexer {
    // true during the deletion pass; this is when the index already exists
    private static boolean deleting = false;
    // object to read existing indexes
    private static IndexReader reader;
    // object to write to the index folder
    private static IndexWriter writer;
    // used to iterate the UID terms in the existing index
    private static TermEnum uidIter;

    /*
     * This static method does all the work; the end result is an
     * up-to-date index folder.
     */
    public static void Index() {
        // we will assume to start that the index has to be created
        boolean create = true;
        //set
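The synchronization advice at the top of this thread can be sketched as follows. This is a minimal stand-in (the counter replaces the real indexing work, which would open an IndexWriter and walk the content tree): a static synchronized method locks on the Class object, so concurrent callers queue up instead of colliding on the write lock.

```java
// Minimal sketch of the advice above: making the static index method
// synchronized means at most one thread can be re-indexing at a time.
// The counter is a stand-in for the real indexing work.
public class SynchronizedIndexer {
    private static int passes = 0;

    // Locks on SynchronizedIndexer.class, so concurrent callers queue up.
    public static synchronized void index() {
        // ... open IndexWriter, add new/changed documents, close writer ...
        passes++;
    }

    public static int getPasses() {
        return passes;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread[] workers = new Thread[8];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < 1000; j++) index();
            });
            workers[i].start();
        }
        for (Thread w : workers) w.join();
        // Without "synchronized", the unguarded increment could lose
        // updates; with it, the count is always exactly 8000.
        System.out.println(getPasses());
    }
}
```

Since only one IndexWriter may hold the write lock at a time, serializing all entry points into indexing is usually enough to make the "Lock obtain timed out" error disappear in a single-JVM deployment.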
Re: lucene file locking question
Disabling locking is only recommended for read-only indexes that aren't being modified. I think there is a comment in the code that a good example of this is an index you read off of a CD-ROM.

--- John Wang [EMAIL PROTECTED] wrote:

Hi folks:

My application builds a super-index around the Lucene index, e.g. it stores some additional information outside of Lucene. I am using my own locking outside of the Lucene index via the FileLock object in the JDK 1.4 nio package. My code does the following:

FileLock lock = null;
try {
    lock = myLockFileChannel.lock();
    // index into Lucene
    // index additional information
} finally {
    try {
        // commit the Lucene index by closing the IndexWriter instance
    } finally {
        if (lock != null) {
            lock.release();
        }
    }
}

Now here is the weird thing. Say I terminate the process in the middle of indexing and run the program again: I get a "Lock obtain timed out" exception, but as long as I delete the stale lock file, the index remains uncorrupted. However, if I turn the Lucene file lock off, since I have a lock outside it anyway (by doing:

static {
    System.setProperty("disableLuceneLocks", "true");
}

) and do the same thing, I instead get an unrecoverably corrupted index. Does the Lucene lock really guarantee index integrity under this kind of abuse, or am I just getting lucky? If so, can someone shine some light on how?

Thanks in advance
-John

---
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
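John's pseudocode above, fleshed out into a runnable skeleton using the JDK 1.4 java.nio API. The lock file name is a placeholder and the indexing/commit steps are stubbed out as comments:

```java
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;

// Runnable skeleton of John's outer-lock pattern. "outer.lock" is a
// placeholder name; the indexing and commit steps are stubbed out.
public class OuterLockDemo {
    public static void main(String[] args) throws Exception {
        RandomAccessFile raf = new RandomAccessFile("outer.lock", "rw");
        FileChannel channel = raf.getChannel();
        FileLock lock = null;
        try {
            lock = channel.lock(); // blocks until no other process holds it
            // ... index into Lucene ...
            // ... index the additional side information ...
        } finally {
            try {
                // ... commit by closing the IndexWriter instance ...
            } finally {
                if (lock != null) {
                    lock.release(); // always released, even on failure
                }
                channel.close();
            }
        }
        System.out.println("lock released");
    }
}
```

Note that, per the discussion in this thread, an outer FileLock only serializes writers; it does not by itself keep the index consistent if the process is killed mid-write, which is why disabling Lucene's own locking led to corruption.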
Re: Locking issue
Whoops! Looks like my attachment didn't make it through. I'm re-attaching my simple test app. Thanks.

--- Erik Hatcher [EMAIL PROTECTED] wrote:

On Nov 10, 2004, at 5:48 PM, [EMAIL PROTECTED] wrote:

Hi, with the information provided, I have no idea what the issue may be. Is there some information that I should post that will help determine why Lucene is giving me this error?

You mentioned posting code - though I don't recall getting an attachment. If you could post it as a Bugzilla issue with your code attached, it would be preserved outside of our mailboxes. If the code is self-contained enough for me to try it, I will at some point in the near future.

Erik
Re: Locking issue
Yes, I tried that too and it worked. The issue is that our Operations folks plan to install this on a pretty busy box, and I was hoping that Lucene wouldn't cause issues if it only had a small slice of the CPU. Guess I'll tell them to buy a bigger box! Unless you have any other ideas. I'm running some tests with a larger timeout to see if that helps.

--- Erik Hatcher [EMAIL PROTECTED] wrote:

I just added a Thread.sleep(1000) in the writer thread and it has run for quite some time, and is still running as I send this.

Erik

On Nov 10, 2004, at 8:02 PM, [EMAIL PROTECTED] wrote:

I added it to Bugzilla like you suggested: http://issues.apache.org/bugzilla/show_bug.cgi?id=32171 Let me know if you see any way to get around this issue.

[earlier messages in the thread quoted above]
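The timeout being tuned in this thread comes from Lucene polling for its lock file and giving up after a deadline. A simplified pure-Java model of that behavior (an illustration, not Lucene's actual implementation) shows why a longer timeout helps on a busy box: the waiting writer just needs the current holder to finish before the deadline expires.

```java
import java.io.File;
import java.io.IOException;

// Simplified model of a polling lock with a timeout, the mechanism
// behind "Lock obtain timed out". This is an illustration, not
// Lucene's actual implementation.
public class PollingLock {
    private final File lockFile;

    public PollingLock(File lockFile) {
        this.lockFile = lockFile;
    }

    /** Poll until the lock file can be created, or give up at the deadline. */
    public boolean obtain(long timeoutMillis) throws IOException, InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (!lockFile.createNewFile()) {   // atomic create-if-absent
            if (System.currentTimeMillis() >= deadline) {
                return false;                 // a real lock would throw here
            }
            Thread.sleep(100);                // back off before retrying
        }
        return true;
    }

    public void release() {
        lockFile.delete();
    }

    public static void main(String[] args) throws Exception {
        File f = new File("demo-write.lock");
        f.delete(); // start clean
        PollingLock lock = new PollingLock(f);
        System.out.println(lock.obtain(500));  // true: nobody holds the lock
        System.out.println(lock.obtain(300));  // false: still held, times out
        lock.release();
    }
}
```

On a starved CPU the holder may not finish within a short deadline, so raising the timeout trades a little latency for fewer spurious lock failures, which matches what the larger-timeout tests above are probing.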
Re: Search scalability
Does it take 800MB of RAM to load that index into a RAMDirectory? Or are only some of the files loaded into RAM?

--- Otis Gospodnetic [EMAIL PROTECTED] wrote:

Hello, 100 parallel searches going against a single index on a single disk means a lot of disk seeks all happening at once. One simple way of working around this is to load your FSDirectory into a RAMDirectory. This should be faster (could you report your observations/comparisons?). You can also try using ramfs if you are using Linux.

Otis

--- Ravi [EMAIL PROTECTED] wrote:

We have one large index for a document repository of 800,000 documents. The size of the index is 800MB. When we do searches against the index, it takes 300-500ms for a single search. We wanted to test scalability and tried 100 parallel searches against the index with the same query; the average response time was 13 seconds. We used a simple IndexSearcher, and the same searcher object was shared by all the searches. I'm sure people have had success in configuring Lucene for better scalability. Can somebody share their approach?

Thanks
Ravi.
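On the 800MB question: a RAMDirectory built from an on-disk index copies each index file into buffers on the heap, so the memory needed is roughly the total size of the index files (about 800MB here) plus per-buffer overhead. A quick plain-Java way to estimate it (the directory path is hypothetical):

```java
import java.io.File;

// Rough heap estimate for loading an index into a RAMDirectory:
// the copy includes every file in the index directory, so summing
// the file sizes approximates the memory required.
public class IndexRamEstimate {
    public static long totalBytes(File indexDir) {
        long total = 0;
        File[] files = indexDir.listFiles();
        if (files != null) {
            for (File f : files) {
                if (f.isFile()) {
                    total += f.length();
                }
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical index location; substitute your own path.
        long bytes = totalBytes(new File("/path/to/index"));
        System.out.println((bytes / (1024 * 1024)) + " MB needed for RAMDirectory");
    }
}
```

If the heap allows it, Lucene 1.4's RAMDirectory constructor that takes a Directory performs this copy directly from an FSDirectory; otherwise ramfs, as Otis suggests, gets a similar effect without JVM heap pressure.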
Re: Lucene1.4.1 + OutOf Memory
There is a memory leak in the sorting code of Lucene 1.4.1; 1.4.2 has the fix!

--- Karthik N S [EMAIL PROTECTED] wrote:

Hi Guys, Apologies..

History:

1st setup: 4 subindexes + MultiSearcher + search on content field only, for 2000 hits = Exception [Too many files open]

2nd setup: 40 merged indexes [1000 subindexes each] + MultiSearcher / ParallelSearcher + search on content field only, for 2 hits = Exception [Out of memory]

System config [same for both setups]:
AMD processor [high-end, single]
RAM 1GB
O/S Linux (jantoo type)
Appserver Tomcat 5.05
JDK [IBM Blackdown-1.4.1-01 (== JDK 1.4.1)]

The index contains 15 fields. Search is done on only 1 field, and 11 corresponding fields are retrieved; 3 fields are for debug details. I switched from the 1st setup to the 2nd. Can somebody suggest why this is happening?

Thx in advance

WITH WARM REGARDS
HAVE A NICE DAY
[N.S.KARTHIK]