Move from RAMDirectory to FSDirectory causing problem sometimes

2008-07-08 Thread Paul Taylor
Hi, I have been using a RAMDirectory for indexing without any problem, 
but I then moved to a file-based directory to reduce memory usage. This 
has been working fine on Windows, OS X, and my version of Linux (Red Hat), 
but it is failing on another version of Linux (Arch Linux) with 'Too many 
open files', even though only 32 documents are being indexed; I can index 
thousands without a problem. The Lucene FAQ mentions this error, but I am 
not dealing with the filesystem directly myself. Below is my code for 
creating an index. Is it OK, or is there some kind of close that I am 
missing?


Thanks for any help, Paul

public synchronized void reindex()
{
    MainWindow.logger.info("Reindex start:" + new Date());
    TableModel tableModel = table.getModel();
    try
    {
        // Recreating the RAMDirectory used too much memory
        // directory = new RAMDirectory();
        directory = FSDirectory.getDirectory(
            Platform.getPlatformLicenseFolder() + "/" + TAG_BROWSER_INDEX);

        IndexWriter writer = new IndexWriter(directory, analyzer, true);

        // Iterate through all rows
        for (int row = 0; row < tableModel.getRowCount(); row++)
        {
            // For each row make a new document
            Document document = createDocument(row);
            writer.addDocument(document);
        }
        writer.optimize();
        writer.close();
    }
    catch (Exception e)
    {
        throw new RuntimeException("Problem indexing Data:" + e.getMessage());
    }
}




'deletable' indexing files are not deleted on RHEL5

2008-07-08 Thread Zhou Lin Dai

Hi,

I'm using Lucene on a RHEL5 box. The index folder is growing extremely
large (more than 20 GB) with a lot of 'deletable' index files, and it runs
out of disk space. I have to clear the entire folder and start indexing
from scratch. The code ran fine before I moved it onto RHEL5. Does that
matter? Can anyone give some suggestions on how to solve this issue?

Thanks in advance.

Best Regards,

Frank Dai (Dai Zhoulin 戴周林)
Lotus Connections - Dogear Development, WPLC
China Development Lab, IBM Shanghai
TEL:(86-21)60928189
Internet ID: [EMAIL PROTECTED]
Addr: 4F, No 78, Lane 887, Zu Chong Zhi Road, Zhang Jiang High Tech Park,
201203, Shanghai, China
My Blog: http://www.daizhoulin.com/wordpress

Re: 'deletable' indexing files are not deleted on RHEL5

2008-07-08 Thread Michael McCandless


What do you mean by "deletable" indexing files?

Moving to RHEL5 should have no effect (vs. other platforms) on how much
disk space is used.

However, Lucene's disk usage can be surprising. While merging segments it
will temporarily require free space equal to the size of the resulting
merged segment. For a large merge this can be a sizable percentage of your
total index size.


Mike




Re: Move from RAMDirectory to FSDirectory causing problem sometimes

2008-07-08 Thread Michael McCandless


Technically you should call directory.close() as well, but missing that
will not lead to too many open files.

How often is that RuntimeException being thrown? EG if a single document
is frequently hitting an exception during analysis, your code doesn't
close the IndexWriter in that situation. It's better to use a try/finally
and close the IndexWriter in the finally clause, to cover that case.

Are you sure nothing else is using up file descriptors? EG the
createDocument call does not open any files?


Mike
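
[Editor's note: a minimal sketch of the try/finally pattern Mike describes,
reusing the names from Paul's snippet (directory, analyzer, tableModel,
createDocument). It is an illustration, not Paul's actual fix:

    IndexWriter writer = new IndexWriter(directory, analyzer, true);
    try
    {
        for (int row = 0; row < tableModel.getRowCount(); row++)
        {
            writer.addDocument(createDocument(row));
        }
        writer.optimize();
    }
    finally
    {
        // Runs even if addDocument() or optimize() throws, so the
        // writer's file handles are always released
        writer.close();
    }
]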




Re: Move from RAMDirectory to FSDirectory causing problem sometimes

2008-07-08 Thread Paul Taylor


The RuntimeException is occurring all the time; I'm waiting for some more 
information from the user. Since the post I've added directory.close() 
too. I thought this would cause a problem when I call IndexSearcher with 
the directory as a parameter, but it seems to still work; the 
documentation is not very clear on this point. I see your point about the 
try/finally, I'll make that change.


There are many other parts of the code that use file descriptors, but the 
problem never occurred before moving to an FSDirectory.


Thanks, Paul

Here's an example of my search code. Is this OK?

public boolean recNoColumnMatchesSearch(Integer columnId, Integer recNo, String search)
{
    try
    {
        IndexSearcher is = new IndexSearcher(directory);

        // Build a query based on the field, search string and standard analyzer
        QueryParser parser = new QueryParser(String.valueOf(columnId) + INDEXED, analyzer);

        Query query = parser.parse(search);
        MainWindow.logger.finer("Parsed Search Query Is" + query.toString()
            + "of type:" + query.getClass());

        // Create a filter to restrict the search to one row
        Filter filter = new QueryFilter(new TermQuery(
            new Term(ROW_NUMBER, String.valueOf(recNo))));

        // Run the search
        Hits hits = is.search(query, filter);
        Iterator i = hits.iterator();
        if (i.hasNext())
        {
            return true;
        }
    }
    catch (ParseException pe)
    {
        // Problem with syntax; rather than throwing an exception and
        // stopping everything, just log and return false
        MainWindow.logger.warning("Search Query invalid:" + pe.getMessage());
        return false;
    }
    catch (IOException e)
    {
        MainWindow.logger.warning("DataIndexer. Unable to perform recNo match search:"
            + search + ":" + e);
    }
    return false;
}




Re: Move from RAMDirectory to FSDirectory causing problem sometimes

2008-07-08 Thread Michael McCandless


Hmmm, you should not close the directory if you are then going to use it
to instantiate a searcher.

Your code never closes the searcher? I think that is most likely the
source of your file descriptor leaks.


Mike
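
[Editor's note: a minimal sketch of the fix Mike is pointing at, again
reusing Paul's names (directory, query, filter). Closing the searcher in a
finally block releases its underlying reader's file descriptors even when
the search throws:

    IndexSearcher is = new IndexSearcher(directory);
    try
    {
        Hits hits = is.search(query, filter);
        return hits.length() > 0;
    }
    finally
    {
        // Releases the index files the searcher's reader holds open
        is.close();
    }
]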




Re: Move from RAMDirectory to FSDirectory causing problem sometimes

2008-07-08 Thread Michael McCandless


Also, if possible, you should share the IndexSearcher across multiple
searches (ie, don't open/close a new one per search). Opening an
IndexSearcher can be a resource-intensive operation, so you'll see better
throughput if you share. (Though in your particular situation it may not
matter.)


Mike
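
[Editor's note: one common way to share is to cache the searcher in a
field and hand it out lazily. The helper names below (getSearcher,
invalidateSearcher) are hypothetical; this is a sketch, not code from the
thread:

    private IndexSearcher searcher;  // shared by all searches

    private synchronized IndexSearcher getSearcher() throws IOException
    {
        if (searcher == null)
        {
            searcher = new IndexSearcher(directory);
        }
        return searcher;
    }

    // Call after reindex() so the next search opens a fresh view
    private synchronized void invalidateSearcher() throws IOException
    {
        if (searcher != null)
        {
            searcher.close();
            searcher = null;
        }
    }
]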


Re: Move from RAMDirectory to FSDirectory causing problem sometimes

2008-07-08 Thread Paul Taylor

Michael McCandless wrote:

> Hmmm, you should not close the directory if you are then going to use
> it to instantiate a searcher.

How come it works?

> Your code never closes the searcher? I think that is most likely the
> source of your file descriptor leaks.

OK, fixed.

Paul




Re: 'deletable' indexing files are not deleted on RHEL5

2008-07-08 Thread Erick Erickson
Assuming your indexing completes, after the whole thing is
done and the process terminates, what is the size of
your index?

Is it possible that your old box had lots more disk space
and you just never noticed the (perhaps temporary) disk
space usage?

Best
Erick



Re: How to make documents clustering and topic classification with lucene

2008-07-08 Thread Glen Newton
Use Carrot2:
 http://project.carrot2.org/

For Lucene + Carrot2:
 http://project.carrot2.org/faq.html#lucene-integration

-glen

2008/7/7 Ariel <[EMAIL PROTECTED]>:
> Hi everybody:
> Does anyone have an idea how to do document clustering and topic
> classification using Lucene? Is there any way to do this?
> Please, I need help.
> Thanks, everybody.
> Ariel
>






Readers synchronization

2008-07-08 Thread Eric Diaz
According to SVN history, this will be available in the next version:

LUCENE-1044: IndexWriter with autoCommit=true now commits (such
that a reader can see the changes) far less often than it used to.
Previously, every flush was also a commit.  You can always force a
commit by calling IndexWriter.commit().  Furthermore, in 3.0,
autoCommit will be hardwired to false (IndexWriter constructors
that take an autoCommit argument have been deprecated) (Mike
McCandless)

Does this mean that I won't need to reopen all the readers in order to see the 
index changes?

Thanks


  




Re: Move from RAMDirectory to FSDirectory causing problem sometimes

2008-07-08 Thread Michael McCandless


It works because Lucene doesn't currently check for it, and because
closing an FSDirectory does not actually make it unusable. In fact it also
doesn't catch a double-close call.

But it may cause subtle problems, because FSDirectory has this invariant:
only a single instance of FSDirectory exists per canonical directory in
the filesystem. This allows code to synchronize on that instance and be
sure no other code in the same JVM is also working in that canonical
directory.

When you close an FSDirectory but keep using it, you can get to a point
where this invariant is broken. That said, besides IndexModifier (which is
now deprecated), I can't find anything that would actually break when this
invariant is violated.

Still, I think we should put protection in to catch double-closing and
prevent using a closed directory. I'll open an issue.


Mike
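
[Editor's note: to make the invariant concrete, an illustrative sketch
assuming Lucene 2.x behavior, where FSDirectory.getDirectory() caches one
instance per canonical path; the path here is made up:

    FSDirectory d1 = FSDirectory.getDirectory("/path/to/index");
    FSDirectory d2 = FSDirectory.getDirectory("/path/to/index");

    // Same canonical path yields the same instance, which is what lets
    // code synchronize on the directory object
    System.out.println(d1 == d2);  // true

    d1.close();
    // d2 still "works" after this, but the single-instance invariant
    // Mike describes is now at risk
]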




Re: Readers synchronization

2008-07-08 Thread Michael McCandless


No, that's not changed. You must still reopen an IndexReader to see
changes to the index. An IndexReader always searches a point-in-time
snapshot of the index.

LUCENE-1044 does mean that you should call IndexWriter.commit() (or close
the writer) to ensure all changes you've made become visible to the
reader.


Mike
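
[Editor's note: a minimal sketch of the resulting pattern, assuming a
Lucene version that has IndexWriter.commit() (per the LUCENE-1044 note
above) and IndexReader.reopen():

    writer.addDocument(doc);
    writer.commit();  // make the changes visible to newly (re)opened readers

    IndexReader newReader = reader.reopen();  // cheap if little has changed
    if (newReader != reader)
    {
        reader.close();      // release the old point-in-time snapshot
        reader = newReader;  // searches now see the committed changes
    }
]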




Re: Move from RAMDirectory to FSDirectory causing problem sometimes

2008-07-08 Thread Michael McCandless


OK I opened:

https://issues.apache.org/jira/browse/LUCENE-1331

Mike




Re: Readers synchronization

2008-07-08 Thread Eric Diaz
Besides the warm-up that the FAQ section suggests (used in Solr), is there
another technique or solution for keeping an IndexReader/IndexSearcher
with an updated view of an index under a concurrent scenario (web app)?

Thanks




Re: Readers synchronization

2008-07-08 Thread Michael McCandless


No other techniques that I know of...

But there is ongoing discussion/work towards making reopening a reader
much less costly. EG repopulating the field cache after reopen is a costly
operation now, but this issue:

    https://issues.apache.org/jira/browse/LUCENE-1231

would make that cost proportional to the number & size of the segments
changed since you last reopened.

There have also been discussions on creating an IndexReader implementation
that can directly search the RAM buffer in IndexWriter, which should give
very fast turnaround in searching just-indexed documents, but that is
quite a ways off...


Mike




Re: Readers synchronization

2008-07-08 Thread Eric Diaz
Is there any plan to change this behavior, so that by default a reader
will see the current index?

Thanks in advance




Re: Readers synchronization

2008-07-08 Thread Michael McCandless


Not that I know of.

Mike




How to handle frequent updates.

2008-07-08 Thread miztaken

Hi there,
I know Lucene is for indexing and not for frequent updates and deletes,
but I have been using Lucene to store my matrix as documents. Since my
algorithm can change the values of the matrix, I am updating them. For
this I have to close and reopen the IndexReader, and in addition the
reader cannot see the documents held in the RAM directory or buffer inside
the IndexWriter, i.e. documents the IndexWriter holds in memory because of
the other parameters I have set. So eventually I have to optimize or flush
the writer and reopen the reader to get accurate results.

Is there some workaround for this type of job?
Can anyone suggest any other open source API?

Thank you,
miztaken
-- 
View this message in context: 
http://www.nabble.com/How-to-handle-frequent-updates.-tp18347238p18347238.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
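
[Editor's note: for illustration, a sketch of the update-then-refresh
cycle described above. The "id" field and cellId variable are assumptions
about how the matrix cells are keyed, and it assumes a Lucene version with
IndexWriter.updateDocument(), IndexWriter.flush() and IndexReader.reopen():

    // Replace the document for one matrix cell (delete + add in one call)
    writer.updateDocument(new Term("id", cellId), newDoc);
    writer.flush();  // push the buffered documents out of the writer's RAM

    // Reopen so searches see the flushed documents
    IndexReader newReader = reader.reopen();
    if (newReader != reader)
    {
        reader.close();
        reader = newReader;
    }
]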





boolean query or

2008-07-08 Thread Cam Bazz
Hello,

Is it possible to make a boolean query where a word matches either fieldA
or fieldB?

In other words, I would like to search for a word in two fields: if the
word appears in fieldA or fieldB, then it is a hit.

Best,
-C.B.
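
[Editor's note: a minimal sketch of one common way to express this (not an
answer from the thread), assuming the field names are literally "fieldA"
and "fieldB". Two SHOULD clauses in a BooleanQuery make the query match if
the term appears in either field:

    BooleanQuery query = new BooleanQuery();
    query.add(new TermQuery(new Term("fieldA", word)), BooleanClause.Occur.SHOULD);
    query.add(new TermQuery(new Term("fieldB", word)), BooleanClause.Occur.SHOULD);

For parsed query strings, MultiFieldQueryParser can achieve the same
effect across several fields.]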