Re: Problem while indexing
Amit, I don't know exactly what your problem is, but I'm using a configuration not too different from yours with no problems - so at least you know it's possible. I have an index of about 125MB which I use on various machines, including an old Windows 98/SE 400MHz notebook. I use the default mergeFactor (10, I think) and do a daily merge (the daily addition represents about 200 documents added to a total of over 58,000). Each document (XML format) has about 15 fields of various types. I'm using release 1.3 dev 1.

At one point I too had a problem of too many open files - it turned out that I wasn't closing the IndexReader. Fixed that, and the number of open files usually stays below 500 (without Lucene, there are typically about 300-400 open files just for the system).

HTH,
Terry

- Original Message -
From: Amit Kapur [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Sent: Thursday, April 03, 2003 12:13 AM
Subject: Problem while indexing

Hi all,

I am facing the problems mentioned below while indexing. If anyone has any help to offer, I would be obliged.

    couldn't rename segments.new to segments
    F:\Program Files\OmniDocs Server\ftstest\_3cf.fnm (Too many open files)

I am trying to index documents using Lucene, generating about 30 MB of index (optimized), which may be raised to about 100 MB or more (but that would be on a high-end server machine).

Description of current case:
#---Each Document has four fields (one Text field and 3 other Keyword fields).
#---The analyzer is based on a StopFilter and a PorterStemFilter.
#---I am using a Compaq PIII, 128 MB RAM, 650 MHz.
#---mergeFactor is set to 25, and I am optimizing the index after adding about 20 documents.
#---Using Lucene release 1.2

Problem faced: After adding about 4000 documents, generating an index of 30 MB, I got an error saying "couldn't rename segments.new to segments", after which the IndexReader or the IndexWriter for the current index could not be opened.

Then I changed a couple of settings:
#---mergeFactor=20 and optimize was called after every 10 documents.
#---Using Lucene release 1.3

Problem faced: After adding about 1500 documents, generating an index of 10 MB, I got an error saying "F:\Program Files\OmniDocs Server\ftstest\_3cf.fnm (Too many open files)", after which the IndexWriter for the current index could not be opened.

Now my requirement calls for a much, much larger index, and I am at the point where these errors are coming unpredictably. Please, if anyone could guide me on this ASAP. Thanx in advance.

Regards,
Amit

PS: I have already read articles in the mail archive, http://www.mail-archive.com/[EMAIL PROTECTED]/msg02815.html.

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
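Terry's diagnosis above - an unclosed IndexReader leaking OS file handles - is the usual cause of "Too many open files". A minimal sketch of the safe pattern against the Lucene 1.x API; the class name and index path are illustrative, not from the original posts:

```java
import org.apache.lucene.index.IndexReader;

public class ReaderHygiene {
    // Open, use, and always close the reader. Each open IndexReader
    // holds file handles for every segment file in the index, so
    // leaked readers accumulate handles until the OS limit is hit.
    public static int countDocs(String indexPath) throws Exception {
        IndexReader reader = IndexReader.open(indexPath);
        try {
            return reader.numDocs();
        } finally {
            reader.close(); // release the segment file handles
        }
    }
}
```

A high mergeFactor (25, as above) also multiplies the number of segment files held open at once, which is likely why dropping back toward the default of 10 helps on a machine with a low file-handle limit.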
RE: Indexing Growth
Would there be any abnormal effects if, after adding a document, you called optimize()? I am still seeing large growth from setting a field. When I set a field I:

1. Get the document.
2. Remove the field.
3. Write the document to the index.
4. Get the document again.
5. Add the new field object.
6. Write the document to the index.
7. Call optimize.

From writing out my steps, it looks like I should write a set method instead of treating set as removeField() plus addField(). I thought combining those two would equal set, which it does, but it seems horribly inefficient. In any case, would the above cause the index to grow from, say, 10.5 MB to 31 MB?

Is there an efficient way to implement a set? For example, if there was a field/value pair of book/hamlet, but now we wanted to set book = none? Please keep in mind there could be multiple fields named book, so it is not simply a matter of removing the field book and then re-adding it. Anyhow, let me know your thoughts.

Thanks,
Rob

-----Original Message-----
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]
Sent: Wednesday, April 02, 2003 11:35 AM
To: Lucene Users List
Subject: RE: Indexing Growth

Funny how this is the outcome of 90% of the problems people have with software - their own mistakes :)

Regarding reindexing - no need for any explicit calls. When you add a document to the index, it is indexed right away. You will have to detect the index change (methods for that are there) and re-open the IndexSearcher in order to see newly added/indexed documents.

Otis

--- Rob Outar [EMAIL PROTECTED] wrote:

I found the freakin problem; I am going to kill my co-worker when he gets in. He was removing a field and adding the same field back for each document in the index, in a piece of code I did not notice until now. He is so dead. I commented out that piece of code, queried to my heart's content, and the index has not changed. Heck, the tool is like super fast now. One last concern is about the re-indexing thing: when does that occur? optimize()? I am curious what method would cause a reindex. I want to thank all of you for your help, it was truly appreciated!

Thanks,
Rob
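The seven steps above fetch and rewrite the document twice. One way to collapse them into a single delete-and-re-add pass, sketched against the Lucene 1.x API - the "uid" field name and the surrounding class are assumptions for illustration, not part of the original posts:

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;

public class SetField {
    // "Set" a field in one pass: rebuild the document with the new
    // value, delete the old copy by its unique id, and add the new
    // copy once. The "uid" field is assumed to uniquely identify
    // each document in this index.
    public static void setField(String indexPath, String uid,
                                String name, String value) throws Exception {
        Document updated = new Document();
        // ... copy every field except `name` from the old document,
        // then add the replacement value exactly once:
        updated.add(Field.Keyword(name, value));

        IndexReader reader = IndexReader.open(indexPath);
        reader.delete(new Term("uid", uid)); // remove the stale copy
        reader.close();                      // close before opening the writer

        IndexWriter writer = new IndexWriter(indexPath, new StandardAnalyzer(), false);
        writer.addDocument(updated);
        writer.close(); // no optimize() here; batch that separately
    }
}
```

This writes the document once instead of twice and defers optimize() to a batch boundary, which avoids most of the transient index growth.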
Re: Problem while indexing
Thanx Terry Well today again I have made a few changes in the archietecture of my component where am using Lucene, and changed the way I am using the IndexReader and as u said made sure that all readers are closed, mergefactor is back to default (10). The test run is on and its working pretty well for now, have managed to have about 1000 documents, index of about 10MB, n counting :) .. hope this time things are better, well thanx for your word, it made me feel that this has been rightly done before and can be done even now. I appreciate the way you replied. Thanx would get back to you later ... Cheers!! Amit - Original Message - From: Terry Steichen To: Lucene Users Group Sent: Thursday, April 03, 2003 7:38 PM Subject: Re: Problem while indexing Amit, I don't exactly know what your problem is, but I'm using a configuration not too different from yours with no problems - so at least you know it's possible. I have an index of about 125MB which I use on various machines, including an old Windows98/SE 400MHz notebook. I used the default MergeFactor (10, I think) and do a daily merge (the daily addition represents about 200 documents added to a total of over 58,000). Each document (XML format) has about 15 fields of various types. I'm using release 1.3 dev 1. At one point I too had a problem of too many open files - turned out that I wasn't closing the IndexReader. Fixed that, and the number of open files usually stays below 500 (without Lucene, there are typically about 300-400 open files just for the system). 
HTH, Terry - Original Message - From: Amit Kapur [EMAIL PROTECTED] To: [EMAIL PROTECTED] Cc: [EMAIL PROTECTED] Sent: Thursday, April 03, 2003 12:13 AM Subject: Problem while indexing hi all I m facing problems like mentioned below while indexing, If anyone has any help to offer i would to obliged couldn't rename segments.new to segments F:\Program Files\OmniDocs Server\ftstest\_3cf.fnm (Too many open files) I am trying to index documents using Lucene generating about 30 MB of index (Optimized) which can be raised to about 100 MB or More ( but that would be on a high end server machine). Description of Current Case: #---Each Document has four fields (One Text field, and 3 other Keyword Fields). #---The analyzer is based on a StopFilter and a PorterStemFilter. #---I am using a Compaq PIII, 128 MB RAM, 650 MHz. #---mergeFactor is set to 25, and I am optimizing the index after adding about 20 Documents. #---Using Lucene Release 1.2 Problem Faced After adding about 4000 Documents generating an index of 30 MB, I initially got an error saying, couldn't rename segments.new to segments after which the IndexReader or the IndexWriter to the current index couldnot be opened. Then I changed a couple of settings, #---mergeFactor=20 and Optimize was called after ever 10 documents. #---Using Lucene Release 1.3 Problem Faced After adding about 1500 Documents generating an index of 10 MB, I initially got an error saying, F:\Program Files\OmniDocs Server\ftstest\_3cf.fnm (Too many open files) after which the IndexWriter to the current index couldnot be opened. Now my requirement needs to have a much much larger index (practically) and I am actually at the point where these errors are coming unpredictably. Please if anyone could guide me on this ASAP. Thanx in advance Regards Amit PS: I have already read articles in the mail archieve http://www.mail-archive.com/[EMAIL PROTECTED]/msg02815.html. 
RE: Indexing Growth
I took out the optimize() after the write, and the index is growing, but at like a 1KB rate - now there are tons of 1KB files. I assume an optimize() would fix this? What is a good rule of thumb for calling optimize()? Will Lucene ever invoke an optimize() on its own?

Thanks,
Rob Outar
OneSAF AI -- SAIC Software\Data Engineer
321-235-7660
[EMAIL PROTECTED]

-----Original Message-----
From: Rob Outar [mailto:[EMAIL PROTECTED]
Sent: Thursday, April 03, 2003 10:53 AM
To: Lucene Users List
Subject: RE: Indexing Growth
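On the rule-of-thumb question above: Lucene never calls optimize() on its own; it only merges segments as mergeFactor dictates, which is what produces the many small files. A common policy is to add in batches and optimize once per batch. A tiny sketch of such a gate - the threshold and class name are illustrative assumptions:

```java
public class OptimizePolicy {
    // optimize() rewrites the whole index into one segment, so it is
    // expensive; gate it on a document count instead of calling it
    // after every add.
    public static boolean shouldOptimize(int docsAdded, int batchSize) {
        return docsAdded > 0 && docsAdded % batchSize == 0;
    }
}
```

With batchSize = 1000, documents 1000, 2000, and so on would trigger an optimize; the small per-segment files accumulated in between are collapsed into a single segment each time.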
Re: about increment update
Thank you, Otis. Yes, the reader should be closed, but that isn't the cause of this exception - the errors happen before the file is deleted.

Kerr.

close(): Closes files associated with this index. Also saves any new deletions to disk. No other methods should be called after this has been called.

- Original Message -
From: Otis Gospodnetic [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Thursday, April 03, 2003 12:14 PM
Subject: Re: about increment update

Maybe this is missing?
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/IndexReader.html#close()

Otis

--- kerr [EMAIL PROTECTED] wrote:

Hello everyone. Here I try to incrementally update the index, following the idea of deleting each modified file first and re-adding it. Here is the source. When I execute it, the index directory gets a write.lock file created at the line reader.delete(i);, and I catch a java.io.IOException with message: Index locked for write. After that, when I execute the line IndexWriter writer = new IndexWriter("index", new StandardAnalyzer(), false); I catch a java.io.IOException with message: Index locked for write. If I delete the write.lock file, the error happens again. Can anyone help? Thanks.

Kerr.

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import java.io.File;
import java.util.Date;

public class UpdateIndexFiles {
  public static void main(String[] args) {
    try {
      Date start = new Date();
      Directory directory = FSDirectory.getDirectory("index", false);
      IndexReader reader = IndexReader.open(directory);
      System.out.println(reader.isLocked(directory));
      //reader.unlock(directory);
      IndexWriter writer = new IndexWriter("index", new StandardAnalyzer(), false);
      String base = "";
      if (args.length == 0) {
        base = "D:\\Tomcat\\webapps\\ROOT\\test";
      } else {
        base = args[0];
      }
      removeModifiedFiles(reader);
      updateIndexDocs(reader, writer, new File(base));
      writer.optimize();
      writer.close();
      Date end = new Date();
      System.out.print(end.getTime() - start.getTime());
      System.out.println(" total milliseconds");
    } catch (Exception e) {
      System.out.println(" caught a " + e.getClass() + "\n with message: " + e.getMessage());
      e.printStackTrace();
    }
  }

  public static void removeModifiedFiles(IndexReader reader) throws Exception {
    Document adoc;
    String path;
    File aFile;
    for (int i = 0; i < reader.numDocs(); i++) {
      adoc = reader.document(i);
      path = adoc.get("path");
      aFile = new File(path);
      if (reader.lastModified(path) < aFile.lastModified()) {
        System.out.println(reader.isLocked(path));
        reader.delete(i);
      }
    }
  }

  public static void updateIndexDocs(IndexReader reader, IndexWriter writer, File file) throws Exception {
    if (file.isDirectory()) {
      String[] files = file.list();
      for (int i = 0; i < files.length; i++)
        updateIndexDocs(reader, writer, new File(file, files[i]));
    } else {
      if (!reader.indexExists(file)) {
        System.out.println("adding " + file);
        writer.addDocument(FileDocument.Document(file));
      } else {}
    }
  }
}
Re: JSP files
Additionally - you can use a crawler to crawl your site, then index the resulting files. Lucene comes with a crawler called LARM, but the current make file doesn't build it properly. I ended up using a different crawler called Sphinx: http://www-2.cs.cmu.edu/~rcm/websphinx/

Pinky, you don't want to index the JSP directly, as you would be missing the content inserted by the server when the pages are accessed. Typically, indexing dynamic pages is problematic since the content will change frequently... That being said, the java.net library provides classes for retrieving the content of a URL as an input stream. You can write a class to traverse your site, downloading the URLs and indexing them. It will be slower, of course, than reading HTML from disk files.

-Tom

--- Pinky Iyer [EMAIL PROTECTED] wrote:

Hi all! Is there any separate parser for JSP files? Any option other than modifying the IndexHTML.java class is appreciated. I already tried modifying that class; HTML parsing is fine, but JSP parsing yields all the JSP tags as well in the summary...

Thanks!
Pinky
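Tom's suggestion of indexing the server-rendered output rather than the raw JSP can be sketched with the standard library. The class name and the helper are illustrative; only java.net.URL and the java.io stream classes are real API here:

```java
import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;

public class PageFetcher {
    // Read an entire stream into a String; works for any InputStream,
    // including one obtained from URL.openStream().
    public static String readAll(InputStream in) throws Exception {
        BufferedReader r = new BufferedReader(new InputStreamReader(in));
        StringBuffer sb = new StringBuffer();
        int c;
        while ((c = r.read()) != -1) {
            sb.append((char) c);
        }
        return sb.toString();
    }

    // Fetch the HTML the server renders for a page; the result can then
    // be stripped of tags and handed to an IndexWriter like static HTML.
    public static String fetch(String pageUrl) throws Exception {
        return readAll(new URL(pageUrl).openStream());
    }
}
```

Fetching each page over HTTP means the JSP is executed by the container first, so the index sees the same content a visitor would, at the cost of one request per page.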
Querying Question
Hi all, I am a little fuzzy on complex querying using AND, OR, etc. For example, I have the following name/value pairs:

file 1 = name = checkpoint, value = filename_1
file 2 = name = checkpoint, value = filename_2
file 3 = name = checkpoint, value = filename_3
file 4 = name = checkpoint, value = filename_4

I ran the following query:

name:"checkpoint" AND value:"filenane_1"

Instead of getting back file 1, I got back all four files. Then, after trying different things, I did:

+(name:"checkpoint") AND +(value:"filenane_1")

and it then returned file 1. Our project queries solely on name/value pairs, and we need the ability to query using AND, OR, NOT, etc. What is the correct syntax for such queries? The code I use is:

QueryParser p = new QueryParser("", new RepositoryIndexAnalyzer());
this.query = p.parse(query.toLowerCase());
Hits hits = this.searcher.search(this.query);

Thanks as always,
Rob
RE: Querying Question
That query.toLowerCase() lowercased your query to become:

name:"checkpoint" and value:"filenane_1"

The keyword AND must be uppercase when the query parser gets hold of it. If your RepositoryIndexAnalyzer lowercases its tokens, you don't need to do query.toLowerCase(). If it doesn't lowercase its tokens, you may want to modify it so that it does.

Eric

-----Original Message-----
From: Rob Outar [mailto:[EMAIL PROTECTED]
Sent: Thursday, April 03, 2003 5:11 PM
To: Lucene Users List
Subject: Querying Question
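The failure mode Eric describes is easy to see in isolation: lowercasing the whole query string also lowercases the boolean operator, and QueryParser only recognizes AND in uppercase, so "and" becomes an ordinary search term and the two clauses fall back to the default OR-like behavior. A plain-Java illustration (class name assumed):

```java
public class LowercaseTrap {
    // Lowercasing the entire query string destroys the AND operator.
    // Lowercase only the terms (or let the analyzer do it) instead.
    public static String naiveLowercase(String query) {
        return query.toLowerCase();
    }
}
```

For example, naiveLowercase("name:checkpoint AND value:filename_1") yields "name:checkpoint and value:filename_1", in which "and" is just another token rather than an operator.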
RE: Querying Question
Here is RepositoryIndexAnalyzer:

/**
 * Creates a TokenStream which tokenizes all the text in the provided Reader.
 * Default implementation forwards to tokenStream(Reader) for compatibility
 * with older versions. Override to allow the Analyzer to choose a strategy
 * based on document and/or field.
 * @param field is the name of the field
 * @param reader is the data
 * @return a token stream
 * @build 10
 */
public TokenStream tokenStream(String field, final Reader reader) {
    // do not tokenize any field
    TokenStream t = new CharTokenizer(reader) {
        protected boolean isTokenChar(char c) {
            return true;
        }
    };
    // case insensitive search
    t = new LowerCaseFilter(t);
    return t;
}

But earlier, when I did a query, case became an issue. I am not sure why, as the analyzer should have lowercased the token, but it did not.

Thanks,
Rob

-----Original Message-----
From: Eric Isakson [mailto:[EMAIL PROTECTED]
Sent: Thursday, April 03, 2003 5:23 PM
To: Lucene Users List
Subject: RE: Querying Question
RE: Querying Question
You should not tokenize the file name; instead you should use

doc.add(new Field(name, value, true, true, false));

or

doc.add(Field.Keyword(name, value));

Aviran

-----Original Message-----
From: Rob Outar [mailto:[EMAIL PROTECTED]
Sent: Thursday, April 03, 2003 5:27 PM
To: Lucene Users List
Subject: RE: Querying Question

I use the following type of Field:

doc.add(new Field(name, value, true, true, true));

Thanks,
Rob

-----Original Message-----
From: Aviran Mordo [mailto:[EMAIL PROTECTED]
Sent: Thursday, April 03, 2003 5:19 PM
To: 'Lucene Users List'
Subject: RE: Querying Question

Did you index the value field as a keyword?

Aviran
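The distinction Aviran is drawing, in Lucene 1.x terms: Field.Text runs the value through the analyzer, while Field.Keyword stores and indexes the value as a single untokenized term, which is what exact name/value matching wants. A sketch of a keyword-only document (class name and fields are illustrative):

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

public class KeywordDoc {
    // Build a document whose name/value pair is indexed as a keyword:
    // stored, indexed, NOT tokenized. The whole value is one term, so
    // a query like value:"filename_1" matches exactly or not at all.
    public static Document build(String name, String value) {
        Document doc = new Document();
        doc.add(Field.Keyword(name, value));
        return doc;
    }
}
```

Field.Keyword(name, value) is equivalent to new Field(name, value, true, true, false), where the final false disables tokenization.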
Re: about increment update
Try this:

1. Open reader.
2. removeModifiedFiles(reader)
3. reader.close()
4. Open writer.
5. updateIndexDocs()
6. writer.close()

i.e. don't have both the reader and the writer open at the same time.

BTW, I suspect you might be removing index entries only for files that have been modified, but adding all files - another "index keeps growing" problem! Could be wrong.

-- Ian.
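Ian's sequencing applied to kerr's updater might look like the following control-flow sketch (Lucene 1.x API; the elided bodies correspond to kerr's removeModifiedFiles and updateIndexDocs logic):

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;

public class UpdateSequence {
    public static void update(String indexPath) throws Exception {
        // 1. Open the reader alone and delete stale documents.
        //    close() flushes the deletions to disk and releases the
        //    write lock that was blocking the IndexWriter before.
        IndexReader reader = IndexReader.open(indexPath);
        // ... reader.delete(i) for each modified document ...
        reader.close();

        // 2. Only now open the writer and add the fresh documents.
        IndexWriter writer = new IndexWriter(indexPath, new StandardAnalyzer(), false);
        // ... writer.addDocument(...) for each changed file ...
        writer.optimize();
        writer.close();
    }
}
```

Because a deleting reader and a writer both take the index write lock, interleaving them (as the original code did) produces exactly the "Index locked for write" IOException kerr reported.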