RE: PDFBox Issue
What version of the log4j jar are you using?

-----Original Message-----
From: Don Vaillancourt [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, June 29, 2004 8:06 AM
To: Lucene Users List
Subject: PDFBox Issue

Hi all,

I know that this is a Lucene list but wanted to know if any of you have gotten this error before using PDFBox? I've gotten the latest version of PDFBox and it is giving me the following error:

java.lang.VerifyError: (class: org/apache/log4j/LogManager, method: <clinit> signature: ()V) Incompatible argument to function
    at org.apache.log4j.Logger.getLogger(Logger.java:94)
    at org.pdfbox.pdfparser.PDFParser.<clinit>(PDFParser.java:57)
    at org.pdfbox.searchengine.lucene.LucenePDFDocument.addContent(LucenePDFDocument.java:197)
    at org.pdfbox.searchengine.lucene.LucenePDFDocument.getDocument(LucenePDFDocument.java:118)
    at Index.indexFile(Index.java:287)
    at Index.indexDirectory(Index.java:265)
    at Index.update(Index.java:63)
    at Lucene.main(Lucene.java:26)
Exception in thread "main"

I am using all the jar files that came with PDFBox. Anyone run into this problem? I am using the following line of code:

    Document doc = LucenePDFDocument.getDocument(f);

Thanks

Don Vaillancourt
Director of Software Development
WEB IMPACT INC.
416-815-2000 ext. 245
email: [EMAIL PROTECTED]
web: http://www.web-impact.com

This email message is intended only for the addressee(s) and contains information that may be confidential and/or copyright. If you are not the intended recipient please notify the sender by reply email and immediately delete this email. Use, disclosure or reproduction of this email by anyone other than the intended recipient(s) is strictly prohibited. No representation is made that this email or any attachments are free of viruses. Virus scanning is recommended and is the responsibility of the recipient.

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
Re: http AND halt
What Analyzer is being used? If it is removing stop words, what is the stop word list?

    Erik

On Aug 17, 2004, at 1:56 AM, Leos Literak wrote:

> One user reported, that if he searches http AND halt, the search fails. This can be found in logs:
>
> java.lang.ArrayIndexOutOfBoundsException: -1
>     at java.util.Vector.elementAt(Vector.java:434)
>     at org.apache.lucene.queryParser.QueryParser.addClause(QueryParser.java:181)
>     at org.apache.lucene.queryParser.QueryParser.Query(QueryParser.java:493)
>     at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:108)
>
> Leos
Re: PDFBox Issue
Wow, this is an old message.

I managed to get my code to work by using the previous version of PDFBox. I had used the version of log4j that had come with PDFBox. Someone had mentioned recompiling log4j, but I couldn't get the project to import the source into Eclipse, so I gave up. But things work great with the version of PDFBox that I compiled with, so I am fine with that.

As for the version of log4j, I could not tell you. As I said above, it came with PDFBox, so I'm guessing that PDFBox had probably not been tested with the version of log4j it was being distributed with.

Paul Smith wrote:

> What version of the log4j jar are you using?
> [...]

--
Don Vaillancourt
Director of Software Development
WEB IMPACT INC.
AnalyZer HELP Please
Hey Guys,

Apologies. Some small help needed.

When I run the analyzers for the word New Year (with quotes) on Lucene1-4 final.jar on Win 2K, why is the SimpleAnalyzer splitting it into 2 words? Or am I missing something here?

Analyzing New Year

org.apache.lucene.analysis.WhitespaceAnalyzer:
    [] [New] [+] [Year] []
org.apache.lucene.analysis.SimpleAnalyzer:
    [new] [year]
org.apache.lucene.analysis.StopAnalyzer:
    [new] [year]
org.apache.lucene.analysis.standard.StandardAnalyzer:
    [new] [year]
com.controlnet.indexing.analyzers.GrammerAnalyzer:
    [year]

WITH WARM REGARDS
HAVE A NICE DAY
[ N.S.KARTHIK]
Re: PDFBox Issue
PDFBox comes with log4j version 1.2.5 (according to MANIFEST.MF in the jar file); I believe that 1.2.8 is the latest. I will make sure that the next version of PDFBox includes the latest log4j version, which I assume is what everybody would like to use.

But, looking at the error message below, it appears that you might have an older log4j in your classpath. Logger.getLogger( Class ) is available in 1.2.5 and 1.2.8.

Ben

On Tue, 17 Aug 2004, Don Vaillancourt wrote:

> Wow, this is an old message. I managed to get my code to work by using the previous version of PDFBox.
> [...]
> java.lang.VerifyError: (class: org/apache/log4j/LogManager, method: <clinit> signature: ()V) Incompatible argument to function
>     at org.apache.log4j.Logger.getLogger(Logger.java:94)
>     at org.pdfbox.pdfparser.PDFParser.<clinit>(PDFParser.java:57)
> [...]
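Ben's suspicion that an older log4j is being picked up from somewhere else on the classpath can be checked directly by asking the JVM where it loaded the class from. The following is a minimal sketch, not something posted in the thread: the class name WhichJar and its helper methods are made up for illustration, and it only uses standard JDK calls. It loads the class without initializing it, so it should still report a location even if the static initializer is what fails.

```java
import java.security.CodeSource;

final class WhichJar {
    /** Returns where the given class was loaded from, or a note if the JVM does not report it. */
    static String locationOf(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return "(no code source reported, e.g. a JDK class)";
        }
        return source.getLocation().toString();
    }

    /** Looks the class up by name, without running its static initializer. */
    static String locationOf(String className) {
        try {
            return locationOf(Class.forName(className, false, WhichJar.class.getClassLoader()));
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not loadable: " + e + ")";
        }
    }

    public static void main(String[] args) {
        // The two log4j classes named in the stack trace.
        System.out.println(locationOf("org.apache.log4j.Logger"));
        System.out.println(locationOf("org.apache.log4j.LogManager"));
    }
}
```

Run with the same classpath as the indexing program. If the two classes are reported from different jars, or from a jar other than the one shipped with PDFBox, that would point to the mixed-version situation Ben describes.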
Re: AnalyZer HELP Please
This is what analyzers do. I don't know of any analyzer that deals with quotes in the way you're requesting, by keeping the contents together as a complete token. You'll have to write your own variant that does this.

QueryParser, however, uses quotes to denote a phrase query, and will query for the words together. Perhaps this is sufficient for your needs?

    Erik

On Aug 17, 2004, at 8:40 AM, Karthik N S wrote:

> When I Run the Analyzer's for the word New Year (with Quotes) on Lucene1-4 final.jar on win 2k O/s Why is the SimpleAnalyzer splitting it into 2 words ??? or am i missing something in here..
> [...]
Re: PDFBox Issue
Anything is possible. In a couple of weeks I may be upgrading my code to use Lucene 1.4 and I will make an attempt to use the latest version of PDFBox.

You may be right about log4j being somewhere else in the classpath, but being a jar for Jakarta, I couldn't think of any apps on my desktop that might use it. I'm doing a search now and ColdFusionMX is the only app I can think of, but I'm pretty sure it didn't come with log4j.jar. Well, I'll have to experiment a little.

Thanks

Ben Litchfield wrote:

> PDFBox comes with log4j version 1.2.5(according to MANIFEST.MF in jar file), I believe that 1.2.8 is the latest.
> [...]
> But, by looking at the below error message it appears that you might have an older log4j in your classpath
> [...]

--
Don Vaillancourt
Director of Software Development
WEB IMPACT INC.
RE: AnalyZer HELP Please
Hi Erik,

Apologies. What I meant to say was that a word such as "New Year" (quotes meaning \") passed to QueryParser.parse(word, contents, analyzer) should return me hits for the full word, but it did not.

So I did a quick run of the analyzer process and found that it was splitting the word:

    New Year = [New] [Year]

Am I doing something wrong here?

Thx in advance.
Karthik

-----Original Message-----
From: Erik Hatcher [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, August 17, 2004 6:18 PM
To: Lucene Users List
Subject: Re: AnalyZer HELP Please

This is what analyzers do. [...]
Re: AnalyZer HELP Please
Karthik,

What you would want to do with the split tokens (New and Year) is then create a PhraseQuery containing a Term object for each token. This should do what you want.

As Erik said, QueryParser would have done this internally, but only if you actually sent in the quotes: not just New Year, but \"New Year\".

Patrick

On Tue, 17 Aug 2004 18:53:01 +0530, Karthik N S [EMAIL PROTECTED] wrote:

> So when I did a quick run on Analyzer process and found that it was splitting the Word New Year = [New] [Year]
> Am I doing some thing wrong in here
> [...]
Re: AnalyZer HELP Please
Further on this, Karthik: you really need to understand what you indexed. For example, take a document that has "New Year" in it and follow it through your indexing process. See what your analyzer actually indexed at indexing time. If "new" and "year" are side-by-side tokens emitted from that process, then querying for "New Year" through QueryParser should find a match.

You can easily put together a 10-line JUnit test case using RAMDirectory and your favorite Analyzer to see how this works. I highly recommend you do this in order to isolate the situation even further.

    Erik

On Aug 17, 2004, at 9:25 AM, Patrick Burleson wrote:

> What you would want to do with the split tokens ( New and Year ) is then create a PhraseQuery containing a Term object for each token.
> [...]
Re: AnalyZer HELP Please
On Aug 17, 2004, at 9:23 AM, Karthik N S wrote:

> So when I did a quick run on Analyzer process and found that it was splitting the Word New Year = [New] [Year]
> Am I doing some thing wrong in here

No... this is what this analyzer does. QueryParser does the same thing. The difference is that QueryParser knows it was wrapped in quotes, so it takes each of those terms [New] and [Year] and makes a zero-slop PhraseQuery out of them.

Have another look at this stuff: http://wiki.apache.org/jakarta-lucene/AnalysisParalysis

    Erik
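To make the "zero-slop" idea concrete, here is a small plain-Java sketch of what Erik describes: both the document and the quoted query are split into tokens, and the phrase matches only if those tokens sit at consecutive positions. This is a toy model written for illustration, not Lucene code; PhraseSketch, tokens and matchesPhrase are invented names, and the tokenizer (lower-case, split on non-letters) is only assumed to resemble what SimpleAnalyzer printed earlier in the thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class PhraseSketch {
    /** Toy analysis: lower-case the text and split on anything that is not a letter. */
    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    /** True if the analyzed phrase occurs in the analyzed document at consecutive positions. */
    static boolean matchesPhrase(String document, String phrase) {
        List<String> doc = tokens(document);
        List<String> wanted = tokens(phrase);
        if (wanted.isEmpty()) {
            return false;
        }
        for (int start = 0; start + wanted.size() <= doc.size(); start++) {
            if (doc.subList(start, start + wanted.size()).equals(wanted)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(tokens("\"New Year\""));                                 // [new, year]
        System.out.println(matchesPhrase("Happy New Year to all", "\"New Year\"")); // true
        System.out.println(matchesPhrase("A new fiscal year", "\"New Year\""));     // false
    }
}
```

The quoted query is still split into two tokens, exactly as Karthik observed; what the quotes change is that the two tokens must be adjacent in the document.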
RE: AnalyZer HELP Please
Hi Patrick,

I did as Erik suggested in his mail and searched for the complete word \"New Year\", but the QueryParser still returns me hits for Year only. [The analyzer I use has 555 English stop words, with "new" present among them.]

That's when I checked with the analyzers to verify. If you look at the list of analyzer output, GrammerAnalyzer is the one that has the 555 English STOPWORDS.

Do you think this is a bug in my code?

Thx
Karthik

-----Original Message-----
From: Patrick Burleson [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, August 17, 2004 6:55 PM
To: Lucene Users List
Subject: Re: AnalyZer HELP Please

Karthik, What you would want to do with the split tokens ( New and Year ) is then create a PhraseQuery containing a Term object for each token. [...]
Re: AnalyZer HELP Please
On Aug 17, 2004, at 9:47 AM, Karthik N S wrote:

> I did as Erik replied in his mail , and searched for the complete word \"New Year\" , but the QueryParser Still returns me hit for Year Only.
> [ The Analyzer I use has 555 English Stop words with new present in it ]

No wonder!

> That's when I checked up with Analyzer's to verify, If u look at the list Analyzer's o/p GrammerAnalyzer is the one that has 555 English STOPWORDS.
> Do u think this is the bug in my Code.

Whether this is a bug or not is really for your users to determine :) But it is absolutely the expected behavior. QueryParser analyzes the expression too. Even if you somehow changed QueryParser, if you never indexed the word "new" then you certainly cannot expect to search on it and find it.

    Erik
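The effect Erik describes can be reproduced with a toy model in plain Java. The sketch below is not Lucene code and does not reproduce GrammerAnalyzer: StopWordSketch, analyze and hits are invented names, and the three-word stop list is a stand-in for the 555-word list mentioned in the thread (the only thing taken from the thread is that "new" is on it). It shows that when the same stop list is applied to both the document and the query, the query "New Year" shrinks to the single token [year].

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class StopWordSketch {
    // Hypothetical stop list: the thread only says the real one has 555 entries, "new" among them.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("a", "the", "new"));

    /** Toy analysis: lower-case, split on non-letters, then drop stop words. */
    static List<String> analyze(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!t.isEmpty() && !STOP_WORDS.contains(t)) {
                out.add(t);
            }
        }
        return out;
    }

    /** Same analysis on both sides: the surviving query tokens must appear in order in the document. */
    static boolean hits(String document, String query) {
        List<String> wanted = analyze(query);
        return !wanted.isEmpty() && Collections.indexOfSubList(analyze(document), wanted) >= 0;
    }

    public static void main(String[] args) {
        System.out.println(analyze("\"New Year\""));                     // [year]
        System.out.println(hits("Last year was fine", "\"New Year\""));  // true
    }
}
```

In this model a document that never mentions "new" still matches, which is the "hit for Year only" behavior Karthik reported. Whether that is acceptable depends on the stop list, as Erik says.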
RE: Restoring a corrupt index
Wallen,

Which hex editor have you used? I am also facing a similar problem. I tried to use KHexEdit and it doesn't seem to help. I am attaching my segments file with this email. I think only the segment with name _ung is a valid one; I wanted to delete the remaining ones, but couldn't. Can you help?

-George

--- [EMAIL PROTECTED] wrote:

> I fixed my own problem, but hope this might help someone else in the future: I went into my segments file (with a hex editor), deleted the record for _cu0v and changed the length 0x20 to be 0x1f, and it seems I have most of my index back! Maybe a developer could elaborate on this?
RE: Restoring a corrupt index
I think attachments are filtered. This is what I see when I open in the hex editor.

:0000 00 04 e0 af 00 00 00 02 05 5f 36 75 6e 67 00 04   ..à¯....._6ung..
:0010 1e fb 05 5f 36 75 6e 69 00 00 00 01 00 00 00 00   .û._6uni........
:0020 00 00 c1 b4                                       ..Á´

-George
RE: Restoring a corrupt index
http://www.ultraedit.com/ is the best!

However, I cannot imagine how another hex editor wouldn't work.

-----Original Message-----
From: Honey George [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, August 17, 2004 10:35 AM
To: Lucene Users List
Subject: RE: Restoring a corrupt index

Wallen, Which hex editor have you used. I am also facing a similar problem. [...]
RE: Restoring a corrupt index
Change 02 to be 01 and delete the bytes that represent the one record that is bad. It was easier to see what a record was in my file because I had about 30 _files.

-----Original Message-----
From: Honey George [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, August 17, 2004 10:39 AM
To: Lucene Users List
Subject: RE: Restoring a corrupt index

I think attachments are filtered. This is what I see when I open in the hex editor. [...]
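The edit described above can also be scripted instead of done by hand. The sketch below is an illustration only and rests on an assumption about the layout that nobody in the thread confirms: reading George's 36 bytes as a 4-byte counter, a 4-byte segment count, then per segment a one-byte name length, the name and a 4-byte document count, followed by 8 trailing bytes that are copied unchanged. SegmentsSketch, hex and dropSegment are invented names. It drops the _6uni record because George believes the other segment is the valid one. Try this only on a copy of the index.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class SegmentsSketch {
    /** The 36 bytes George posted from his segments file. */
    static final byte[] SAMPLE = hex(
            "00 04 e0 af 00 00 00 02 05 5f 36 75 6e 67 00 04 "
          + "1e fb 05 5f 36 75 6e 69 00 00 00 01 00 00 00 00 "
          + "00 00 c1 b4");

    /** Parses whitespace-separated hex pairs into bytes. */
    static byte[] hex(String text) {
        String digits = text.replaceAll("\\s+", "");
        byte[] out = new byte[digits.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(digits.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    /** Returns a copy of the file contents without the named record, with the count adjusted. */
    static byte[] dropSegment(byte[] segments, String badName) {
        ByteBuffer in = ByteBuffer.wrap(segments);            // big-endian by default
        ByteBuffer out = ByteBuffer.allocate(segments.length);
        int counter = in.getInt();
        if (counter < 0) {
            throw new IllegalArgumentException("layout differs from the one assumed here");
        }
        out.putInt(counter);
        int count = in.getInt();
        int countPosition = out.position();
        out.putInt(0);                                        // placeholder, patched below
        int kept = 0;
        for (int i = 0; i < count; i++) {
            byte[] name = new byte[in.get() & 0xff];          // assumes short ASCII names
            in.get(name);
            int docCount = in.getInt();
            if (!new String(name, StandardCharsets.US_ASCII).equals(badName)) {
                out.put((byte) name.length).put(name).putInt(docCount);
                kept++;
            }
        }
        out.put(in);                                          // trailing bytes, copied as-is
        out.putInt(countPosition, kept);
        byte[] result = new byte[out.position()];
        out.flip();
        out.get(result);
        return result;
    }

    public static void main(String[] args) {
        StringBuilder line = new StringBuilder();
        for (byte b : dropSegment(SAMPLE, "_6uni")) {
            line.append(String.format("%02x ", b));
        }
        System.out.println(line.toString().trim());
    }
}
```

Under that assumed layout the result is 26 bytes long: the count byte 02 becomes 01 and the ten bytes 05 5f 36 75 6e 69 00 00 00 01 are gone, which is the manual change suggested above.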
RE: AnalyZer HELP Please
Hi Guys,

Apologies. Correct me if I am wrong.

During the indexing process, if the Analyzer has the word 'new' in its STOPWORD array, this word is prevented from being indexed. Then a search would not return me a hit on New Year, since the word 'new' is in the STOPWORD array [even if the phrase is surrounded by \"].

With regards,
Karthik

-----Original Message-----
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Tuesday, August 17, 2004 7:35 PM
To: Lucene Users List
Subject: Re: AnalyZer HELP Please

On Aug 17, 2004, at 9:47 AM, Karthik N S wrote:
> I did as Erik replied in his mail, and searched for the complete word \"New Year\", but the QueryParser still returns me hits for Year only. [The Analyzer I use has 555 English stop words, with new present in it]

No wonder!

> That's when I checked with the Analyzers to verify. If u look at the list of Analyzers' o/p, GrammerAnalyzer is the one that has 555 English STOPWORDS. Do u think this is a bug in my code?

Whether this is a bug or not is really for your users to determine :) But it is absolutely the expected behavior. QueryParser analyzes the expression too. Even if you somehow changed QueryParser, if you never indexed the word new then you certainly cannot expect to search on it and find it.

Erik
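A small sketch of why the phrase collapses to Year alone: if the same stop-word filtering runs over both the indexed text and the query, 'new' never reaches the index and is also dropped from the query. This is plain Java, not Lucene's Analyzer API; the class StopWordDemo and its three-word stop list are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical stand-in for an analyzer with a stop list; not Lucene code. */
final class StopWordDemo {

    // Assumed stop list for the demo; the real one in the thread has 555 words.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("new", "the", "a"));

    /** Lowercases, splits on non-word characters, and drops stop words. */
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) terms.add(token);
        }
        return terms;
    }

    public static void main(String[] args) {
        // The same analysis is applied at index time and at query time.
        System.out.println("indexed terms: " + analyze("Happy New Year"));
        System.out.println("query terms:   " + analyze("\"New Year\""));
    }
}
```

Quoting the phrase does not help here, because the quotes only group the terms that survive analysis.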
[OT] Re: Restoring a corrupt index
Hmm, while I agree that UltraEdit is the best on Windows, since they were using KHexEdit, I doubt it's an option for them on Linux (although I do know it runs fine under Wine).

Patrick

On Tue, 17 Aug 2004 10:39:27 -0400, [EMAIL PROTECTED] wrote:
> http://www.ultraedit.com/ is the best! However, I cannot imagine how another hexeditor wouldnt work.
Re: AnalyZer HELP Please
I believe that is correct. So, the word new is never being indexed since it is a stop word.

Patrick

On Tue, 17 Aug 2004 20:26:19 +0530, Karthik N S [EMAIL PROTECTED] wrote:
> Hi Guys Apologies.. Correct me If I am wrong... During Indexing process, if the Analyzer has a word 'new' in the array ' STOPWORD' this word is prevented from indexing or Stopped from indexing. Then during the process of Search would not return me a hit on the word New Year , since the word 'new' is in Array STOPWORD ... [ Even if the Word is surrounded by \"]
Re: Swapping Indexes?
Forward back to list.

---------- Forwarded message ----------
From: Patrick Burleson [EMAIL PROTECTED]
Date: Tue, 17 Aug 2004 11:30:19 -0400
Subject: Re: Swapping Indexes?
To: Stephane James Vaucher [EMAIL PROTECTED]

Stephane,

Thank you for the ideas. I'm implementing idea 1 (I like the idea of leaving the temp index around for recovery), but I have a question regarding your original index. Do you just copy over the temp index and not worry about cleaning up the old index directory?

Right now I have my code deleting the files in the main index directory after telling the search controller to switch to the temp index. But by doing that, I need to manage existing searches and not break them while they are running. I also still run into the open files problem on Windows when trying to delete a file that one of the searchers has open before it's closed.

Thoughts?

Patrick

On Mon, 16 Aug 2004 18:22:20 -0400 (EDT), Stephane James Vaucher [EMAIL PROTECTED] wrote:
> I've tried two options that seem to work:
>
> 1) Have a singleton that is responsible for controlling your searchers. This controller can temporarily redirect your searchers to c:/temp/myindex, allowing you to copy your index to c:/myindex. After that process completes, your controller can tell your searchers to use c:/myindex, allowing you to then erase your temp index.
>
> If you index nightly, you can always *not* erase your tmp dir; your index process will do this automatically if you create your IndexWriter with the overwrite option. This way, you can have a backup index if there is a system failure at some point (like when you copy/move directories).
>
> 2) Use an incremental index. Regularly, I scan my files, see if there are modifications/additions and update my master index: removing from the master index, adding to a temp dir, then merging. I haven't seen any weirdness on Windows with this process.
>
> HTH,
> sv
>
> On Mon, 16 Aug 2004, Patrick Burleson wrote:
>> I've read in the docs about updating an index and its suggestion regarding swapping out indexes with a directory rename. Here's my question: how to do this when searches are running live?
>>
>> Say I have a directory that holds the current valid index: C:\myindex and when I'm running my nightly process to generate the index, it gets temporarily indexed to: C:\temp\myindex
>>
>> How can I very quickly replace C:\myindex with C:\temp\myindex? I can't simply do a rename since C:\myindex will likely have open files. (Gotta love Windows) And I can't delete all files in myindex, again because of the open files issue.
>>
>> Any ideas?
>>
>> Thanks,
>> Patrick
javadoc api
Hello Lucene developers,

A little issue about the Field documentation. In the Field class, the getBoost() method says:

Returns the boost factor for hits on any field of this document.

I think this comment was copied from the Document class and someone forgot to change it.

Bye,
Ernesto.
Re: Swapping Indexes?
On Tue, 17 Aug 2004, Patrick Burleson wrote:
> Stephane, Thank you for the ideas. I'm going about implementing idea 1 (I like the idea of leaving the temp index around for recovery), but I have a question regarding your original index. Do you just copy over the temp index and not worry about cleaning up the old index directory?

Actually, I use an IndexWriter in overwrite mode on the master dir and merge the temp dir. This cleans up the old master.

> Right now I have my code deleting the files in the main index directory after telling the search controller to switch to the temp index. But by doing that, I need to manage existing searches and not break them while they are running. I also still run into the open files problem on Windows when trying to delete a file one of the searchers has open before it's closed.

I used to wait some time (~1 minute) for all searches on the old master to finish after redirecting to the temp dir, then I would switch to the new master.

> Thoughts?

If you apply a lease-like contract with your searchers, where they borrow a reference to a searcher and then hand it back to the manager, you can probably trace your open files.

HTH,
sv
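One way to read the lease-like contract is as reference counting: callers borrow the current searcher, hand it back when done, and a searcher that has been swapped out is closed only when its last lease is returned. The sketch below is an assumption about how that could look, written against a generic resource type rather than Lucene's IndexSearcher; LeaseManager and its method names are hypothetical.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.Consumer;

/** Hypothetical lease manager: tracks who still holds a resource before closing it. */
final class LeaseManager<T> {
    private final Consumer<T> closer;                       // how to close a resource
    private final Map<T, Integer> leases = new IdentityHashMap<>();
    private final Set<T> retired =
            Collections.newSetFromMap(new IdentityHashMap<T, Boolean>());
    private T current;

    LeaseManager(T initial, Consumer<T> closer) {
        this.current = initial;
        this.closer = closer;
    }

    /** Hands out the current resource and records one more outstanding lease on it. */
    synchronized T borrow() {
        leases.merge(current, 1, Integer::sum);
        return current;
    }

    /** Returns a lease; a retired resource is closed once its last lease comes back. */
    synchronized void giveBack(T resource) {
        Integer held = leases.get(resource);
        if (held == null) throw new IllegalStateException("resource was not borrowed");
        if (held > 1) {
            leases.put(resource, held - 1);
            return;
        }
        leases.remove(resource);
        if (retired.remove(resource)) closer.accept(resource);
    }

    /** Makes next the current resource; the old one is closed now or when it is idle. */
    synchronized void swap(T next) {
        T old = current;
        current = next;
        if (leases.containsKey(old)) retired.add(old);
        else closer.accept(old);
    }
}
```

Callers would wrap each search in borrow() and a finally block that calls giveBack(), so the old index files are only released once no running search is using them.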
Re: Swapping Indexes?
On Tue, 17 Aug 2004 13:17:10 -0400 (EDT), Stephane James Vaucher wrote:
> Actually, I use an IndexWriter in overwrite mode on the master dir and merge the temp dir. This cleans up the old master.

I'm a bit of a Lucene newbie here, and I am trying to understand what you mean by "merge the temp dir". Do you copy your existing index to the temp location, then use the overwrite feature of IndexWriter to re-create the master? Then what do you merge? Shouldn't the master index now have everything?

> I used to wait some time (~1 minute) for all searches on the old master to finish after redirecting to the temp dir, then I would switch to the new master.

I'm going to make this a setting, so that tests won't have to wait a whole minute. But I think this is the cleanest solution without having to implement some sort of leasing solution. Our searches should be fast and 1 minute is a long time. They should all be done by then.

Thanks again,
Patrick
Re: Swapping Indexes?
On Tue, 17 Aug 2004, Patrick Burleson wrote:
> On Tue, 17 Aug 2004 13:17:10 -0400 (EDT), Stephane James Vaucher wrote:
>> Actually, I use an IndexWriter in overwrite mode on the master dir and merge the temp dir. This cleans up the old master.
>
> I'm a bit of a Lucene newbie here, and I am trying to understand what you mean by merge the temp dir?

IndexWriter.addIndexes()

> Do you copy your existing index to the temp location, then use the overwrite feature of IndexWriter to re-create the master, then what do you merge? Shouldn't the master index now have everything?

What I mean is the following:
1) create tmp dir
2) redirect searchers to tmp dir
3) wait for everyone to use tmp dir (or other mechanism)
4) open IndexWriter on master dir, erasing it
5) merge tmp directory, using the addIndexes() method
6) redirect searchers to new master dir

>> I used to wait some time (~1 minute) for all searches on the old master to finish after redirecting to the temp dir, then I would switch to the new master.
>
> I'm going to make this a setting, so that tests won't have to wait a whole minute. But I think this is the cleanest solution without having to implement some sort of leasing solution. Our searches should be fast and 1 minute is a long time. They should all be done by then.

I used to reindex all my docs at 5h00 AM. I probably could have waited 10 minutes since I didn't have users; it's all about requirements ;)

> Thanks again, Patrick

sv
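The six steps can be sketched as a small controller with the drain time as a setting, as Patrick suggests. This is only an outline under assumptions: IndexSwap and its Rebuilder hook are hypothetical names, step 1 (building the temp index) is taken to have happened before swap() is called, and the hook is where the IndexWriter and addIndexes() work described in steps 4 and 5 would go. No Lucene calls are shown.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical controller that searchers ask for the directory to open. */
final class IndexSwap {

    /** Hypothetical hook for steps 4 and 5: erase the master and merge the temp index into it. */
    interface Rebuilder {
        void rebuild(String tmpDir, String masterDir);
    }

    private final String masterDir;
    private final long drainMillis;                 // configurable wait, e.g. 60000 in production
    private final AtomicReference<String> activeDir;

    IndexSwap(String masterDir, long drainMillis) {
        this.masterDir = masterDir;
        this.drainMillis = drainMillis;
        this.activeDir = new AtomicReference<>(masterDir);
    }

    /** The directory new searches should use right now. */
    String activeDir() {
        return activeDir.get();
    }

    void swap(String tmpDir, Rebuilder rebuilder) {
        activeDir.set(tmpDir);                      // step 2: redirect searchers to tmp dir
        try {
            Thread.sleep(drainMillis);              // step 3: let searches on the old master finish
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while draining searches", e);
        }
        rebuilder.rebuild(tmpDir, masterDir);       // steps 4 and 5: recreate master from tmp
        activeDir.set(masterDir);                   // step 6: redirect searchers to new master
    }
}
```

If the rebuild hook throws, the active directory stays on the temp index, which matches the idea of keeping it around as a backup.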
Re: javadoc api
Thanks Ernesto, I fixed it.

Otis

--- Ernesto De Santis [EMAIL PROTECTED] wrote:
> Hello Lucene developers,
> A little issue about the Field documentation. In the Field class, the getBoost() method says: Returns the boost factor for hits on any field of this document. I think this comment was copied from the Document class and someone forgot to change it.
> Bye,
> Ernesto.
RE: PDFBox Issue
I actually thought it might have been trying to use the log4j 1.3 'alpha' build (there is no 'alpha' build yet, but notionally the latest HEAD isn't too far from it). There has been a subtle change to log4j in recent months that could have a similar impact.

Cheers,
Paul Smith

-----Original Message-----
From: Ben Litchfield [mailto:[EMAIL PROTECTED]
Sent: Tuesday, August 17, 2004 10:48 PM
To: Lucene Users List
Subject: Re: PDFBox Issue

PDFBox comes with log4j version 1.2.5 (according to MANIFEST.MF in the jar file); I believe that 1.2.8 is the latest. I will make sure that the next version of PDFBox includes the latest log4j version, which I assume is what everybody would like to use.

But, looking at the error message below, it appears that you might have an older log4j in your classpath. Logger.getLogger( Class ) is available in 1.2.5 and 1.2.8.

Ben

On Tue, 17 Aug 2004, Don Vaillancourt wrote:
> Wow, this is an old message. I managed to get my code to work by using the previous version of PDFBox. I had used the version of log4j that had come with PDFBox. Someone had mentioned recompiling log4j, but I couldn't get the project to import the source into Eclipse, so I gave up. But things work great with the version of PDFBox that I compiled with, so I am fine with that.
>
> As for the version of log4j, I could not tell you; as I said above, it came with PDFBox, so I'm guessing that it had probably not been tested with the version of log4j it was being distributed with.
>
> Paul Smith wrote:
>> What version of the log4j jar are you using?
OutOfMemoryError
Hi All,

I am getting an OutOfMemoryError when I deploy my EJB application. To debug the problem, I wrote the following test program:

    public static void main(String[] args) {
        try {
            Query query = getQuery();
            for (int i = 0; i < 1000; i++) {
                search(query);
                if (i % 50 == 0) {
                    System.out.println("Sleep...");
                    Thread.currentThread().sleep(5000);
                    System.out.println("Wake up!");
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static void search(Query query) throws IOException {
        FSDirectory fsDir = null;
        IndexSearcher is = null;
        Hits hits = null;
        try {
            fsDir = FSDirectory.getDirectory("C:\\index", false);
            is = new IndexSearcher(fsDir);
            SortField sortField = new SortField("profile_modify_date", SortField.STRING, true);
            hits = is.search(query, new Sort(sortField));
        } finally {
            if (is != null) {
                try { is.close(); } catch (Exception ex) { }
            }
            if (fsDir != null) {
                try { is.close(); } catch (Exception ex) { }
            }
        }
    }

In the test program, I wrote a loop to keep calling the search method. Every time it enters the search method, I instantiate the IndexSearcher. Before I exit the method, I close the IndexSearcher and FSDirectory. I also made the thread sleep for 5 seconds after every 50 searches, hoping this gives Java some time to do garbage collection. Unfortunately, when I observe the memory usage of my process, it keeps increasing until I get the java.lang.OutOfMemoryError.

Note that I invoke IndexSearcher.search(Query query, Sort sort) to process the search. If I don't specify the Sort field (i.e. using IndexSearcher.search(query)), I don't have this problem, and the memory usage stays at a very static level.

Has anyone experienced a similar problem? Did I do something wrong in the test program? I thought that by closing the IndexSearcher and the FSDirectory, the memory would be released during garbage collection.

Thanks,
Terence
RE: OutOfMemoryError
Sorry, I should have made this clearer in my last email. I have implemented an EJB Session Bean that executes the Lucene search. At the beginning, the session bean works fine and returns the correct search results to me. As more and more search requests are processed, the server ends up with the OutOfMemoryError. If I restart the server, everything works fine again.

Terence
Re: OutOfMemoryError
On Wednesday 18 August 2004 00:30, Terence Lai wrote:
> if (fsDir != null) { try { is.close(); } catch (Exception ex) { } }

You close "is" here again, not fsDir. Also, it's a good idea to never ignore exceptions; you should at least print them out, even if it's just a close() that fails.

Regards
Daniel

-- 
http://www.danielnaber.de
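Both points, closing each resource itself and not swallowing exceptions, can be sketched with a small helper. It assumes the resources can be passed as java.io.Closeable, which may not hold for the Lucene classes of this era without a thin wrapper; QuietCloser is a hypothetical name, not something from the thread or from Lucene.

```java
import java.io.Closeable;
import java.io.IOException;

/** Hypothetical helper: closes several resources and reports, rather than hides, failures. */
final class QuietCloser {

    /**
     * Closes every non-null resource in order. A failing close() is printed and counted,
     * and the remaining resources are still closed.
     *
     * @return the number of close() calls that failed
     */
    static int closeAll(Closeable... resources) {
        int failures = 0;
        for (Closeable resource : resources) {
            if (resource == null) continue;          // nothing was opened, nothing to close
            try {
                resource.close();
            } catch (IOException ex) {
                failures++;
                ex.printStackTrace();                // at least print it, as suggested above
            }
        }
        return failures;
    }
}
```

In the finally block of the test program this would replace the two try/catch pairs with a single call that names the searcher and the directory once each, which makes the copy-and-paste slip harder to repeat.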
RE: Re: OutOfMemoryError
Thanks for pointing this out. Even after I fixed the code to close fsDir and added ex.printStackTrace(System.out), I am still hitting the OutOfMemoryError.

Terence