Re: Lucene : avoiding locking

2004-11-11 Thread yahootintin-lucene
I'm working on a similar project...
Make sure that only one call to the index method is occurring at
a time.  Synchronizing that method should do it.
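A minimal sketch of that suggestion (class and field names are hypothetical, not from Luke's code): declaring the static method `synchronized` makes all callers queue on the class object, so only one thread can be indexing, and therefore only one write lock can be held, at a time.

```java
public class SafeIndexer {
    // Counts threads currently inside index(); if the synchronization
    // works, this never exceeds 1.
    public static int active = 0;
    public static int maxActive = 0;

    // "synchronized" on a static method locks SafeIndexer.class,
    // so concurrent callers run one after another instead of overlapping.
    public static synchronized void index() {
        active++;
        if (active > maxActive) {
            maxActive = active;
        }
        try {
            Thread.sleep(20); // stand-in for the real re-index work
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        active--;
    }
}
```

Spawning several threads against index() and checking maxActive afterwards confirms the calls were serialized.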

--- Luke Shannon [EMAIL PROTECTED] wrote:

 Hi All;
 
 I have hit a snag in my Lucene integration and don't know what to do.
 
 My company has a content management product. Each time someone changes
 the directory structure, or a file within it, that portion of the site
 needs to be re-indexed so the changes are reflected in future searches
 (indexing must happen at run time).
 
 I have written an Indexer class with a static Index() method. The idea
 is to call the method every time something changes and the index needs
 to be re-examined. I am hoping the logic put in by Doug Cutting
 surrounding the UID will make indexing efficient enough to be called
 so frequently.
 
 This class works great when I tested it on my own little site (I have
 about 2000 files). But when I drop the functionality into the QA
 environment I get a locking error.
 
 I can't access the stack trace; all I can get at is a log file the
 application writes to. Here is the section my class wrote. It was
 right in the middle of indexing when the lock issue hit.
 
 I don't know if the problem is in my code or something in the existing
 application.
 
 Error Message:
 ENTER|SearchEventProcessor.visit(ContentNodeDeleteEvent)
 |INFO|INDEXING INFO: Start Indexing new content.
 |INFO|INDEXING INFO: Index Folder Did Not Exist. Start Creation Of New Index
 |INFO|INDEXING INFO: Beginnging Incremental update comparisions
 [previous line repeated 17 times]
 |INFO|INDEXING ERROR: Unable to index new content Lock obtain timed out:
 Lock@/usr/tomcat/jakarta-tomcat-5.0.19/temp/lucene-398fbd170a5457d05e2f4d43210f7fe8-write.lock
 
 |ENTER|UpdateCacheEventProcessor.visit(ContentNodeDeleteEvent)
 
 Here is my code. You will recognize it as pretty much the IndexHTML
 class from the Lucene demo written by Doug Cutting. I have put in a
 ton of comments in an attempt to understand what is going on.
 
 Any help would be appreciated.
 
 Luke
 
 package com.fbhm.bolt.search;
 
 /*
  * Created on Nov 11, 2004
  *
  * This class will create a single index for the Content
  * Management System (CMS). It contains logic to ensure
  * indexing is done intelligently. Based on IndexHTML.java
  * from the demo folder that ships with Lucene.
  */
 
 import java.io.File;
 import java.io.IOException;
 import java.util.Arrays;
 import java.util.Date;
 
 import org.apache.lucene.analysis.standard.StandardAnalyzer;
 import org.apache.lucene.demo.HTMLDocument;
 import org.apache.lucene.document.Document;
 import org.apache.lucene.index.IndexReader;
 import org.apache.lucene.index.IndexWriter;
 import org.apache.lucene.index.Term;
 import org.apache.lucene.index.TermEnum;
 import org.pdfbox.searchengine.lucene.LucenePDFDocument;
 
 import com.alaia.common.debug.Trace;
 import com.alaia.common.util.AppProperties;
 
 /**
  * @author lshannon
  * Description: This class is used to index a content folder. It
  * contains logic to ensure that only new documents, or documents
  * modified since the last index pass, are indexed.
  * Based on code written by Doug Cutting in the IndexHTML class
  * found in the Lucene demo.
  */
 public class Indexer {
   //true during the deletion pass, which runs when the index already exists
   private static boolean deleting = false;
 
   //reads the existing index
   private static IndexReader reader;
 
   //writes to the index folder
   private static IndexWriter writer;
 
   //iterates over the UID terms of the existing index
   private static TermEnum uidIter;
 
   /*
    * This static method does all the work; the end result is an
    * up-to-date index folder.
    */
   public static void Index() {
     //we will assume to start that the index has been created
     boolean create = true;
     //set 

Re: lucene file locking question

2004-11-11 Thread yahootintin-lucene
Disabling locking is only recommended for read-only indexes that
aren't being modified.  I think there is a comment in the code
about a good example of this being an index you read off of a
CD-ROM.

--- John Wang [EMAIL PROTECTED] wrote:

 Hi folks:
 
 My application builds a super-index around the Lucene index,
 e.g. it stores some additional information outside of Lucene.
 
 I am using my own locking outside of the Lucene index via the
 FileLock object in the JDK 1.4 nio package.
 My code does the following:
 
 FileLock lock = null;
 try {
     lock = myLockFileChannel.lock();
 
     // index into Lucene
 
     // index additional information
 
 } finally {
     try {
         // commit the Lucene index by closing the IndexWriter instance
     } finally {
         if (lock != null) {
             lock.release();
         }
     }
 }
 
 
 Now here is the weird thing: say I terminate the process in the
 middle of indexing and then run the program again. I get a Lock
 obtain timed out exception, but as long as I delete the stale
 lock file, the index remains uncorrupted.
 
 However, if I turn the Lucene file lock off, since I have a lock
 outside it anyway (by doing:
 
 static {
     System.setProperty("disableLuceneLocks", "true");
 }
 
 ) and do the same thing, I instead get an unrecoverably
 corrupted index.
 
 Does the Lucene lock really guarantee index integrity under this
 kind of abuse, or am I just getting lucky?
 If so, can someone shine some light on how?
 
 Thanks in advance
 
 -John
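The try/finally pattern John describes can be sketched as a self-contained example (the class and the runLocked helper are hypothetical; the real code would index into Lucene where the comments are):

```java
import java.io.File;
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;

public class ExternalLock {
    // Runs "work" while holding an exclusive OS-level lock on lockFile.
    // Returns true if the work completed under the lock.
    public static boolean runLocked(File lockFile, Runnable work)
            throws Exception {
        RandomAccessFile raf = new RandomAccessFile(lockFile, "rw");
        FileChannel channel = raf.getChannel();
        FileLock lock = null;
        try {
            lock = channel.lock(); // blocks until the lock is granted
            work.run();            // e.g. index into Lucene, then extra data
            return true;
        } finally {
            try {
                // the real code would close the IndexWriter here to commit
            } finally {
                if (lock != null) {
                    lock.release();
                }
                channel.close();
                raf.close();
            }
        }
    }
}
```

Note that FileLock guards against other processes, not against a kill in mid-write, which is the corruption scenario John hits.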
 

-
 To unsubscribe, e-mail:
 [EMAIL PROTECTED]
 For additional commands, e-mail:
 [EMAIL PROTECTED]
 
 





Re: Locking issue

2004-11-10 Thread yahootintin-lucene
Whoops!  Looks like my attachment didn't make it through.  I'm
re-attaching my simple test app.

Thanks.

--- Erik Hatcher [EMAIL PROTECTED] wrote:

 On Nov 10, 2004, at 5:48 PM, [EMAIL PROTECTED]
 wrote:
 Hi,
 
 With the information provided, I have no idea what the issue
 may be.
 
 Is there some information that I should post that will help
 determine why Lucene is giving me this error?
 
 You mentioned posting code - though I don't recall getting an
 attachment.  If you could post it as a Bugzilla issue with your
 code attached, it would be preserved outside of our mailboxes.
 If the code is self-contained enough for me to try it, I will
 at some point in the near future.
 
   Erik
 
 


Re: Locking issue

2004-11-10 Thread yahootintin-lucene
Yes, I tried that too and it worked.  The issue is that our
Operations folks plan to install this on a pretty busy box and I
was hoping that Lucene wouldn't cause issues if it only had a
small slice of the CPU.

Guess I'll tell them to buy a bigger box!  Unless you have any
other ideas.  I'm running some tests with a larger timeout to
see if that helps.

--- Erik Hatcher [EMAIL PROTECTED] wrote:

 I just added a Thread.sleep(1000) in the writer thread and it
 has run 
 for quite some time, and is still running as I send this.
 
   Erik
 
 On Nov 10, 2004, at 8:02 PM, [EMAIL PROTECTED]
 wrote:
 
  I added it to Bugzilla like you suggested:
  http://issues.apache.org/bugzilla/show_bug.cgi?id=32171
 
 
  Let me know if you see any way to get around this issue.
 



Re: Search scalability

2004-11-10 Thread yahootintin-lucene
Does it take 800MB of RAM to load that index into a
RAMDirectory?  Or are only some of the files loaded into RAM?

--- Otis Gospodnetic [EMAIL PROTECTED] wrote:

 Hello,
 
 100 parallel searches going against a single index on a single
 disk means a lot of disk seeks all happening at once.  One
 simple way of working around this is to load your FSDirectory
 into a RAMDirectory.  This should be faster (could you report
 your observations/comparisons?).  You can also try using ramfs
 if you are using Linux.
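A sketch of that suggestion against the Lucene 1.4 API (the index path and factory class are placeholders): the RAMDirectory(Directory) constructor copies every file of the on-disk index into memory, so expect heap usage roughly equal to the index size, and share the resulting searcher across threads.

```java
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.RAMDirectory;

public class RamSearcherFactory {
    // Copies the on-disk index into memory once at startup; the
    // returned searcher is then shared by all request threads.
    public static IndexSearcher open(String indexPath) throws Exception {
        Directory fsDir = FSDirectory.getDirectory(indexPath, false);
        Directory ramDir = new RAMDirectory(fsDir); // reads all index files into RAM
        return new IndexSearcher(ramDir);
    }
}
```

This answers the question above: yes, an 800MB index costs roughly 800MB of heap when loaded this way.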
 
 Otis
 
 --- Ravi [EMAIL PROTECTED] wrote:
 
 We have one large index for a document repository of 800,000
 documents.  The size of the index is 800MB.  When we do
 searches against the index, it takes 300-500ms for a single
 search.  We wanted to test scalability and tried 100 parallel
 searches against the index with the same query, and the
 average response time was 13 seconds.  We used a simple
 IndexSearcher; the same searcher object was shared by all the
 searches.  I'm sure people have had success in configuring
 Lucene for better scalability.  Can somebody share their
 approach?
 
 Thanks,
 Ravi.
  
 




Re: Lucene1.4.1 + OutOf Memory

2004-11-09 Thread yahootintin-lucene
There is a memory leak in the sorting code of Lucene 1.4.1. 
1.4.2 has the fix!

--- Karthik N S [EMAIL PROTECTED] wrote:

 
 Hi Guys,
 
 Apologies...
 
 History:
 
 1st type: 4 subindexes + MultiSearcher + search on the content
 field only, for 2000 hits
 =
 Exception: [Too many files open]
 
 2nd type: 40 merged indexes [1000 subindexes each] +
 MultiSearcher/ParallelSearcher + search on the content field
 only, for 2 hits
 =
 Exception: [Out of memory]
 
 System config [same for both types]:
 
 AMD processor [high-end, single]
 RAM: 1GB
 OS: Linux (jantoo type)
 App server: Tomcat 5.05
 JDK: [IBM Blackdown-1.4.1-01 (== JDK 1.4.1)]
 
 The index contains 15 fields.
 Search is done on only 1 field.
 We retrieve 11 corresponding fields; 3 fields are for debug
 details.
 
 We switched from the 1st type to the 2nd type.
 
 Can somebody suggest why this is happening?
 
 Thx in advance
 
 
 
 
   WITH WARM REGARDS
   HAVE A NICE DAY
   [ N.S.KARTHIK]
 
 
 
 
