Re: problems with search on Russian content
Hi, I took a look at Andrey Grishin's Russian character problem and found something strange happening while we tried to debug it. It seems he has avoided the usual problem of querying with a different encoding than was indexed, as he can dump out correctly encoded Russian at all points in his application. Are the strings for terms treated differently than the text stored in text fields? The reason I ask is that his Russian words are correct in the stored text fields, but show up faulty in a terms() dump. If he had a character encoding problem in his application, the fields should show up faulty as well, I think. Even stranger is that I use Lucene 1.2 successfully for UTF-8, ISO-8859-1, ISO-8859-5 and ISO-8859-7. Why is this problem showing up in Russian (Cp1251) and not in the other encodings? Strangeness number two is this theory: if the Russian word ,!,_,U was skewed to, say, 0d66539qw upon indexing, and the problem was just a consistent encoding problem, wouldn't a query with ,!,_,U be skewed to 0d66539qw as well, and be found anyway? mvh karl øie

Begin forwarded message: From: Andrey Grishin [EMAIL PROTECTED] Date: Thu Nov 21, 2002 15:13:33 Europe/Oslo To: Karl Oie [EMAIL PROTECTED] Subject: Re: How to include strange characters?? yes, you are right - there are no russian words in the returned terms :((( I've just executed the following --

IndexReader r = IndexReader.open("C:\\j\\jakarta-tomcat-4.1.12\\index\\ukrenergo");
TermEnum e = r.terms();
while (e.next()) {
    Term term = e.term();
    System.out.println("term : " + term.text());
}

-- and got no russian words in the result. There are some strange terms returned instead of Russian: term : 0d4xvp70w term : 0d66539qw term : 0d67les2o term : 0d6eqgic0 etc. So, I think we've got a problem. This is great :)), thank you... but how to fix it? - Original Message - From: Karl Øie [EMAIL PROTECTED] To: Andrey Grishin [EMAIL PROTECTED] Sent: Thursday, November 21, 2002 3:56 PM Subject: Re: How to include strange characters??
another thing to check is whether IndexReader.terms() actually contains your term. mvh karl øie On Thursday, Nov 21, 2002, at 14:31 Europe/Oslo, Andrey Grishin wrote: Karl, I have the same problem with Lucene search within Russian content. I tried all your advice, but Lucene still can't find anything :( I indexed the content using the Cp1251 charset

text = new String(text.getBytes("Cp1251"));
doc.add(Field.Text(CONTENT_FIELD, text));

and I am searching using the same charset

String txt = ",!,_,U";
txt = new String(txt.getBytes("Cp1251"));
PrefixQuery query = new PrefixQuery(new Term(PortalHTMLDocument.CONTENT_FIELD, txt));
hits = searcher.search(query);

and Lucene can't find anything. Also I checked for the DecodeInterceptor in my server.xml - there isn't any. I tried UTF-8/16 - and got the same result. If I list all the index's content by iterating over an IndexReader, I can see that my Russian content is stored in the index... Can you please help me? Do you have any more ideas about what else can be done here to fix this problem? I will appreciate any help. Thanks, Andrey. P.S. I am using Lucene 1.2, Tomcat 4.1.12, JDK 1.4.1 on Win2000 AS -- To unsubscribe, e-mail: mailto:[EMAIL PROTECTED] For additional commands, e-mail: mailto:[EMAIL PROTECTED]
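The `new String(text.getBytes("Cp1251"))` idiom quoted in this thread is worth a closer look: `getBytes("Cp1251")` encodes the string to Cp1251 bytes, but the no-arg `String` constructor then decodes those bytes with the *platform default* charset. Unless the default happens to be Cp1251 too, Cyrillic text is silently mangled. A minimal self-contained sketch of that round trip (class and method names are mine, and the second charset is passed explicitly here only to make the JVM's implicit default-charset decode visible):

```java
import java.nio.charset.Charset;

public class CharsetRoundTrip {
    // Encode to Cp1251 bytes, then decode them as if `defaultCharset`
    // were the platform default -- which is what the no-arg String
    // constructor does implicitly.
    public static String roundTrip(String text, String defaultCharset) {
        byte[] cp1251 = text.getBytes(Charset.forName("Cp1251"));
        return new String(cp1251, Charset.forName(defaultCharset));
    }

    public static void main(String[] args) {
        String russian = "\u043f\u043e\u0438\u0441\u043a"; // Cyrillic for "search"
        // Decoding with the same charset round-trips correctly:
        System.out.println(russian.equals(roundTrip(russian, "Cp1251")));     // true
        // Decoding with a different default charset mangles the text:
        System.out.println(russian.equals(roundTrip(russian, "ISO-8859-1"))); // false
    }
}
```

If the terms look corrupted while the stored fields look fine, a mismatch like the second case, somewhere between analysis and querying, is the kind of thing to rule out first.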
Re: problems with search on Russian content
Sorry, my bad! Didn't read this informative post :-) mvh karl øie On Thursday, Nov 21, 2002, at 16:35 Europe/Oslo, Otis Gospodnetic wrote: Look at the CHANGES.txt document in CVS - there is some new stuff in the org.apache.lucene.analysis.ru package that you will want to use. Get Lucene from the nightly build... Otis --- Andrey Grishin [EMAIL PROTECTED] wrote: Hi All, I have a problem with searching on Russian content using Lucene 1.2. I indexed the content using the Cp1251 charset

text = new String(text.getBytes("Cp1251"));
doc.add(Field.Text(CONTENT_FIELD, text));

and I am searching using the same charset

String txt = "·Œƒ";
txt = new String(txt.getBytes("Cp1251"));
PrefixQuery query = new PrefixQuery(new Term(PortalHTMLDocument.CONTENT_FIELD, txt));
hits = searcher.search(query);

or

Analyzer analyzer = new StandardAnalyzer();
String txt = "·Œƒ“≈";
txt = new String(txt.getBytes("Cp1251"));
Query query = QueryParser.parse(txt, PortalHTMLDocument.CONTENT_FIELD, analyzer);
hits = searcher.search(query);

and Lucene can't find anything. Also I checked for the DecodeInterceptor in my server.xml - there isn't any. I tried UTF-8/16 - and got the same result. Also, if I list all the index's content by iterating over an IndexReader, I can see that my Russian content is stored in the index... Can you please help me? Do you have any more ideas about what else can be done here to fix this problem? I will appreciate any help. Thanks, Andrey. P.S. I am using Lucene 1.2, Tomcat 4.1.12, JDK 1.4.1 on Win2000 AS
PDF parser
What's the best parser available to extract text from PDF documents? Expecting a reply ASAP. Thanks in advance, Thomas Chacko
Re: PDF parser
There are different parsers available - every parser has its own advantages and disadvantages. I use a combination of PDFBox http://www.pdfbox.org/ and Etymon PJ http://www.etymon.com/pjc/, because their APIs are very simple. Both of them parse PDF into a format of their own and provide interfaces to get at the PDF document's contents. Other developers on this list prefer JPedal http://www.jpedal.org/, which parses PDF into XML and provides an XML tree with the PDF document's contents, but the documentation isn't very detailed. Micha -Original Message- From: Thomas Chacko [mailto:[EMAIL PROTECTED]] Sent: Friday, 22 November 2002 15:26 To: Lucene Users List Subject: PDF parser What's the best parser available to extract text from PDF documents? Expecting a reply ASAP. Thanks in advance, Thomas Chacko
How does delete work?
Hello all, I used the delete(Term) method, then I looked at the index files; only one file changed: _1tx.del. I found references to it still in some of the index files, so my question is: how does Lucene handle deletes? Thanks, Rob
Re: How does delete work?
It just marks the record as deleted. The record isn't actually removed until the index is optimized. Scott Rob Outar wrote: Hello all, I used the delete(Term) method, then I looked at the index files; only one file changed: _1tx.del. I found references to it still in some of the index files, so my question is: how does Lucene handle deletes? Thanks, Rob -- Brain: Pinky, are you pondering what I'm pondering? Pinky: I think so, Brain, but calling it a pu-pu platter? Huh, what were they thinking?
Updating documents
I have something odd going on. I have code that updates documents in the index, so I have to delete the document and then re-add it. When I re-add the document I immediately do a search on the newly added field, which fails. However, if I rerun the query a second time it works?? I have the Searcher class as an attribute of my search class; does it not see the new changes? It seems that once it is reinitialized with the changed index it is able to search on the newly added field?? Let me know if anyone has encountered this. Thanks, Rob
RE: Updating documents
There is a reloading issue but I do not think lastModified is it:

static long lastModified(Directory directory) - Returns the time the index in this directory was last modified.
static long lastModified(File directory) - Returns the time the index in the named directory was last modified.
static long lastModified(String directory) - Returns the time the index in the named directory was last modified.

Do I need to create a new instance of IndexSearcher each time I search? Thanks, Rob -Original Message- From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]] Sent: Friday, November 22, 2002 12:20 PM To: Lucene Users List Subject: Re: Updating documents Don't you have to make use of the lastModified method (I think in IndexSearcher) to 'reload' your instance of IndexSearcher? I'm pulling this from some old, not very fresh memory. Otis --- Rob Outar [EMAIL PROTECTED] wrote: I have something odd going on. I have code that updates documents in the index, so I have to delete the document and then re-add it. When I re-add the document I immediately do a search on the newly added field, which fails. However, if I rerun the query a second time it works?? I have the Searcher class as an attribute of my search class; does it not see the new changes? Let me know if anyone has encountered this. Thanks, Rob
Re: Updating documents
Btw. I have posted the code for this before, so you can find it in the list archives. Otis --- Scott Ganyo [EMAIL PROTECTED] wrote: Not each time you search, but if you've modified the index since you opened the searcher, you need to create a new searcher to get the changes. Scott Rob Outar wrote: There is a reloading issue but I do not think lastModified is it:

static long lastModified(Directory directory) - Returns the time the index in this directory was last modified.
static long lastModified(File directory) - Returns the time the index in the named directory was last modified.
static long lastModified(String directory) - Returns the time the index in the named directory was last modified.

Do I need to create a new instance of IndexSearcher each time I search? Thanks, Rob
Re: How does delete work?
This is via mergeFactor? --- Doug Cutting [EMAIL PROTECTED] wrote: The data is actually removed the next time its segment is merged. Optimizing forces it to happen, but it will also eventually happen as more documents are added to the index, without optimization. Scott Ganyo wrote: It just marks the record as deleted. The record isn't actually removed until the index is optimized. Scott Rob Outar wrote: Hello all, I used the delete(Term) method, then I looked at the index files; only one file changed: _1tx.del. I found references to it still in some of the index files, so my question is: how does Lucene handle deletes? Thanks, Rob
Re: Updating documents
A deletion is only visible in other IndexReader instances created after the IndexReader where you made the deletion is closed. So if you're searching using a different IndexReader, you need to re-open it after the deleting IndexReader is closed. The lastModified method helps you figure out when this is required. The standard idiom is to cache the lastModified date returned when a reader is opened, then check it against the current value before each search. When it is different, re-open. Note: If you have many searching threads, it is most efficient for them to share an IndexReader. But if one thread closes the reader while others are still searching it, then those searches may crash. So, when re-opening the index, don't immediately close the old one. Rather, just let the garbage collector close its open files. The only problem with this approach is that, if your index changes more frequently than the garbage collector collects old indexes, then you can run out of file handles. Hmm. It would probably make things simpler if an IndexReader cached its lastModified date when it was opened, so that applications don't have to do this themselves to find out whether an IndexReader is out-of-date... Doug Rob Outar wrote: There is a reloading issue but I do not think lastModified is it:

static long lastModified(Directory directory) - Returns the time the index in this directory was last modified.
static long lastModified(File directory) - Returns the time the index in the named directory was last modified.
static long lastModified(String directory) - Returns the time the index in the named directory was last modified.

Do I need to create a new instance of IndexSearcher each time I search? Thanks, Rob
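The idiom Doug describes (remember the index version seen at open time, hand out a fresh searcher when the version moves on, and never close the old one out from under concurrent searches) can be sketched generically in plain Java. The class, the `version`/`factory` names, and the generic stand-in for the searcher are all mine, not Lucene API; in Lucene 1.2 terms, the version supplier would be `IndexReader.lastModified(directory)` and the factory would be `new IndexSearcher(directory)`:

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hands out a searcher, replacing it whenever the observed index
// version changes. S stands in for IndexSearcher.
public class SearcherHolder<S> {
    private final LongSupplier version; // e.g. IndexReader.lastModified(dir)
    private final Supplier<S> factory;  // e.g. () -> new IndexSearcher(dir)
    private long openedVersion;
    private S current;

    public SearcherHolder(LongSupplier version, Supplier<S> factory) {
        this.version = version;
        this.factory = factory;
        this.openedVersion = version.getAsLong();
        this.current = factory.get();
    }

    // Called before each search. Note the stale searcher is NOT closed
    // here: per Doug's note, let the garbage collector release its files
    // so threads still searching the old reader don't crash.
    public synchronized S get() {
        long now = version.getAsLong();
        if (now != openedVersion) {
            openedVersion = now;
            current = factory.get();
        }
        return current;
    }
}
```

As Doug warns, relying on the garbage collector to close old readers can exhaust file handles if the index changes faster than collection happens.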
Re: How does delete work?
Merging happens constantly as documents are added. Each document is initially added in its own segment, and pushed onto the segment stack. Whenever there are mergeFactor segments on the top of the stack that are the same size, these are merged together into a new single segment that replaces them. So, if mergeFactor is 10, and you've added 122 documents, the stack will have five segments, as follows:

document 121
document 120
documents 110-119
documents 100-109
documents 0-100

The next merge will happen after document 129 is added, when a new segment will replace the segments for document 120 through document 129 with a new single segment. It's actually a little more complicated than that, since (among other reasons) after documents are deleted a segment's size will no longer be exactly a power of the mergeFactor. Doug Otis Gospodnetic wrote: This is via mergeFactor? --- Doug Cutting wrote: The data is actually removed the next time its segment is merged. Optimizing forces it to happen, but it will also eventually happen as more documents are added to the index, without optimization. Scott Ganyo wrote: It just marks the record as deleted. The record isn't actually removed until the index is optimized. Scott Rob Outar wrote: Hello all, I used the delete(Term) method, then I looked at the index files; only one file changed: _1tx.del. I found references to it still in some of the index files, so my question is: how does Lucene handle deletes? Thanks, Rob
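The merge rule Doug describes can be simulated in a few lines of plain Java. This is a sketch of the rule as stated in the thread, not Lucene's actual code; names are mine:

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class SegmentStack {
    // Each added document becomes a 1-doc segment; whenever mergeFactor
    // equal-sized segments sit on top of the stack, merge them into one.
    public static Deque<Integer> simulate(int numDocs, int mergeFactor) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (int i = 0; i < numDocs; i++) {
            stack.push(1);
            // Merging can cascade: a fresh merge may complete another run.
            while (topRunLength(stack) == mergeFactor) {
                int size = stack.peek();
                for (int j = 0; j < mergeFactor; j++) stack.pop();
                stack.push(size * mergeFactor);
            }
        }
        return stack;
    }

    // Number of equal-sized segments at the top of the stack.
    private static int topRunLength(Deque<Integer> stack) {
        if (stack.isEmpty()) return 0;
        int top = stack.peek(), count = 0;
        for (int s : stack) {
            if (s != top) break;
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // 122 docs, mergeFactor 10 -> Doug's five segments, top first:
        System.out.println(simulate(122, 10)); // [1, 1, 10, 10, 100]
    }
}
```

Running it for 122 documents reproduces the five-segment stack from the example: two single-document segments on top of two ten-document segments on top of one hundred-document segment.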
large index - slow optimize()
Hello, I am building an index with a few million documents, and every X documents added to the index I call optimize() on the IndexWriter. I have noticed that as the index grows this call takes more and more time, even though the number of new segments that need to be merged is the same between every optimize() call. I suspect this is normal and not a bug, but is there no way around it? Do you know which part takes longer and longer as the index grows? Thanks, Otis
Re: How does delete work?
I see, so every mergeFactor documents they are combined into a single new segment in the index, and only when optimize() is called do those multiple segments get merged into a single segment. In your example below that would mean that optimize() was called after document 100 was added, hence a single segment with documents 0-100. Is this right? Thanks, Otis --- Doug Cutting [EMAIL PROTECTED] wrote: Merging happens constantly as documents are added. Each document is initially added in its own segment, and pushed onto the segment stack. Whenever there are mergeFactor segments on the top of the stack that are the same size, these are merged together into a new single segment that replaces them. So, if mergeFactor is 10, and you've added 122 documents, the stack will have five segments, as follows:

document 121
document 120
documents 110-119
documents 100-109
documents 0-100

The next merge will happen after document 129 is added, when a new segment will replace the segments for document 120 through document 129 with a new single segment. It's actually a little more complicated than that, since (among other reasons) after documents are deleted a segment's size will no longer be exactly a power of the mergeFactor. Doug
RE: large index - slow optimize()
Note - this is not fact, this is what I think I know about how it works. My working assumption has been that it's just a matter of disk speed, since during optimize the entire index is copied into new files, and then at the end the old one is removed. So the more GB you have to copy, the longer it takes. This is also the reason that you need double the size of your index available on the drive in order to perform an optimize, correct? Or does this only apply when you are merging indexes? Dan -Original Message- From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]] Sent: Friday, November 22, 2002 12:52 PM To: [EMAIL PROTECTED] Subject: large index - slow optimize() Hello, I am building an index with a few million documents, and every X documents added to the index I call optimize() on the IndexWriter. I have noticed that as the index grows this call takes more and more time, even though the number of new segments that need to be merged is the same between every optimize() call. I suspect this is normal and not a bug, but is there no way around it? Do you know which part takes longer and longer as the index grows? Thanks, Otis
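If Dan's copy-everything assumption holds, the slowdown follows directly from arithmetic: optimizing every X documents while indexing N of them copies roughly X + 2X + ... + N, i.e. on the order of N²/(2X) document-copies in total. A back-of-envelope sketch (my arithmetic under that assumption, not measured numbers):

```java
public class OptimizeCost {
    // Total documents copied if optimize() rewrites the whole index
    // and is called once every `x` added documents while indexing `n`.
    public static long totalCopies(long n, long x) {
        long total = 0;
        for (long i = x; i <= n; i += x) {
            total += i; // the i-th optimize copies an index of ~i docs
        }
        return total;
    }

    public static void main(String[] args) {
        // 1M docs, optimize every 100k: 100k + 200k + ... + 1M
        System.out.println(totalCopies(1_000_000, 100_000)); // 5500000
    }
}
```

Each optimize() call copies a strictly larger index than the last, which matches the observation that the calls take more and more time even though the number of fresh segments is constant.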
Re: has this exception been seen before
I am getting this problem as well, but have not been able to pinpoint the cause. A tip for those who are doing a complete re-index: you can save a lot of time by creating a new index and then merging the old files into the new index. One disadvantage here is that you may have to re-point your app to the new index. I find that the bug prevents the old index from being deleted on Win2K.
Readability score?
Hello, This is slightly off topic but... Does anyone have a handy library to compute a readability score? Something like the Flesch Reading Ease score, cf. http://thibs.menloschool.org/~djwong/docs/wordReadabilityformulas.html Would you like to share? :-) Thanks. R.
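For reference, the Flesch Reading Ease score itself is simple arithmetic once you have sentence, word, and syllable counts: 206.835 − 1.015 × (words/sentences) − 84.6 × (syllables/words). The counting (especially syllables) is the hard part that a library would help with; this sketch assumes you already have the counts:

```java
public class Flesch {
    // Flesch Reading Ease:
    //   206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)
    // Higher scores mean easier text (90-100 is roughly 5th-grade level).
    public static double score(int sentences, int words, int syllables) {
        return 206.835
                - 1.015 * ((double) words / sentences)
                - 84.6 * ((double) syllables / words);
    }

    public static void main(String[] args) {
        // A 1-sentence, 10-word, 13-syllable sample scores about 86.7.
        System.out.println(score(1, 10, 13));
    }
}
```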
Re: How does delete work?
No, in my example optimize() was never called. The merge rule operates recursively. So, after 99 documents had been added, the segment stack contained nine segments with ten documents and nine with one document. When the hundredth document was added, the nine one-document segments were popped off the stack and merged into a single segment that was pushed onto the stack. So then the top of the stack had ten segments each containing ten documents, i.e., mergeFactor segments of the same size, and these ten segments were then merged into a single segment of 100 documents. So adding the 100th document triggered two merges. (One error in my previous example: the 100 document segment actually contains documents 0-99, not 0-100.) A corollary of this is: when mergeFactor is 10 and no deletions have been made, the segments correspond to the digits in the decimal representation of the number of documents in the index. So, an 85 document index has eight segments with 10 documents and five segments with one document. (This is somewhat of a simplification, as Lucene automatically merges single-document segments before ever writing them to disk, as an optimization.) It is most beneficial to call IndexWriter.optimize() only when you know you won't be adding documents to an index for a while, but will be searching it a lot. Calling optimize() periodically while indexing mostly just slows things down. Doug Otis Gospodnetic wrote: I see, so every mergeFactor documents they are combined into a single new segment in the index, and only when optimize() is called do those multiple segments get merged into a single segment. In your example below that would mean that optimize() was called after document 100 was added, hence a single segment with documents 0-100. Is this right? Thanks, Otis
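Doug's digits corollary can be written down directly: with mergeFactor 10 and no deletions, the segment sizes are read off the decimal digits of the document count. A sketch of the stated rule (my code, not Lucene's):

```java
import java.util.ArrayList;
import java.util.List;

public class SegmentsFromDigits {
    // Segment sizes (largest first) for an index of `numDocs` documents,
    // per the corollary: each decimal digit d at position 10^k
    // contributes d segments of 10^k documents.
    public static List<Integer> segments(int numDocs) {
        List<Integer> sizes = new ArrayList<>();
        int power = 1;
        while (numDocs > 0) {
            int digit = numDocs % 10;
            for (int i = 0; i < digit; i++) sizes.add(0, power);
            numDocs /= 10;
            power *= 10;
        }
        return sizes;
    }

    public static void main(String[] args) {
        // 85 docs -> eight 10-doc segments and five 1-doc segments:
        System.out.println(segments(85)); // [10, 10, 10, 10, 10, 10, 10, 10, 1, 1, 1, 1, 1]
    }
}
```

This also reproduces the 122-document example: one 100-doc segment, two 10-doc segments, two 1-doc segments.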
Question on having IndexReader and IndexWriter simultaneously
Hi, According to my experimentation, I am unable to create an IndexWriter while any IndexReader/Searcher is open on the same index. Since I have all search threads share one IndexReader, each time I need to create an IndexWriter I have to wait until all searches are done so that I can close the IndexReader. Only then am I able to create an IndexWriter. Does this concurrency problem really exist? One problem I have now is starvation of the modification threads. Thanks. -- Herman
Date Range - I've searched FAQs and mail list archive..... no help..... Really
Part of my problem seems to be that the RangeQuery object isn't acting as it should per the FAQ and other mail list entries. I'm using Lucene 1.2. I have a field in my index called DATE. I'd like to do a date range search on it. I am using strings in the format yyyyMMdd. I have the following dates in my index: 20021105 20021126 20021113 20021115 20021103 20021125. When I use the following code to search, I get an exception. *NOTE: I'm using the MultiFieldQueryParser because in some cases I check other fields; I've simplified this one to demonstrate (and run my tests isolated from other factors):

IndexSearcher searcher = new IndexSearcher(myindex);
SimpleAnalyzer analyzer = new SimpleAnalyzer();
String[] fields = new String[1];
fields[0] = "DATE";
String buff = "( DATE:[20021101 - 20021131] )";
Query query = MultiFieldQueryParser.parse(buff, fields, analyzer);
searcher.search(query);

I get the following error: java.lang.IllegalArgumentException: At least one term must be non-null. If buff = "( DATE:20021101 - 20021131 )", as well as if buff = "( DATE:(20021101 - 20021131 ))", I simply get no results. I have added the date to the document both by Field.Text("DATE", dateStr); and by Field.Keyword("DATE", dateStr);. I have also tried to build the queries up by creating objects. One of the things I notice is that if I use the RangeQuery object there are no spaces on either side of the -. The documents which I created have the following fields: TITLE, DESCRIPTION and DATE. If I search on TITLE or DESCRIPTION or a combination of both, I get results just fine. Am I doing something stupid, or is this a bug? It seems, based on what I read, that the example above where buff = "( DATE:[20021101 - 20021131] )" is correct and should work. I published the complete source in an earlier posting called Problem with Range. It also contains a stack trace of the error. Thanks in advance, Michael
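One thing worth checking in the setup above: SimpleAnalyzer, as far as I recall, tokenizes on letters only, so an all-digit term like 20021101 produces no token at all, which would explain the "At least one term must be non-null" error from the parsed range. A plain-Java approximation of that letters-only tokenization (my code, not Lucene's actual analyzer):

```java
import java.util.ArrayList;
import java.util.List;

public class LetterTokens {
    // Approximates a letters-only, lowercasing tokenizer: tokens are
    // maximal runs of letters. Digits are not letters, so a date like
    // "20021101" yields no tokens at all.
    public static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (Character.isLetter(c)) {
                cur.append(Character.toLowerCase(c));
            } else if (cur.length() > 0) {
                out.add(cur.toString());
                cur.setLength(0);
            }
        }
        if (cur.length() > 0) out.add(cur.toString());
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("DATE:[20021101 - 20021131]")); // [date]
        System.out.println(tokens("20021101"));                   // []
    }
}
```

If the analyzer swallows the date terms, the parser is handed null terms for the range; building the range query with Term objects directly (bypassing analysis), or using an analyzer that keeps digits, would sidestep that.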
Re: Question on having IndexReader and IndexWriter simultaneously
Sounds like a problem outside Lucene. Can you create a self-contained class that demonstrates the problem? If you cannot, it probably is not a Lucene problem. Otis --- Herman Chen [EMAIL PROTECTED] wrote: Hi, According to my experimentation, I am unable to create an IndexWriter while any IndexReader/Searcher is open on the same index. Since I have all search threads share one IndexReader, each time I need to create an IndexWriter I have to wait until all searches are done so that I can close the IndexReader. Only then am I able to create an IndexWriter. Does this concurrency problem really exist? One problem I have now is starvation of the modification threads. Thanks. -- Herman