RE: PDFBox Issue

2004-08-17 Thread Paul Smith
What version of the log4j jar are you using? -Original Message- From: Don Vaillancourt [mailto:[EMAIL PROTECTED] Sent: Tuesday, June 29, 2004 8:06 AM To: Lucene Users List Subject: PDFBox Issue Hi all, I know that this is a Lucene list but wanted to know if any of you have

Re: http AND halt

2004-08-17 Thread Erik Hatcher
What Analyzer is being used? If it is removing stop words, what is the stop word list? Erik On Aug 17, 2004, at 1:56 AM, Leos Literak wrote: One user reported that if he searches for http AND halt, the search fails. This can be found in the logs: java.lang.ArrayIndexOutOfBoundsException: -1

Re: PDFBox Issue

2004-08-17 Thread Don Vaillancourt
Wow, this is an old message. I managed to get my code to work by using the previous version of PDFBox. I had used the version of log4j that had come with PDFBox. Someone had mentioned recompiling log4j, but I couldn't get the project to import the source into Eclipse, so I gave up. But

AnalyZer HELP Please

2004-08-17 Thread Karthik N S
Hey guys, apologies. Some small help needed. When I run the analyzers on the word "New Year" (with quotes) using lucene-1.4-final.jar on Windows 2000, why is the SimpleAnalyzer splitting it into 2 words? Or am I missing something here? Analyzing "New Year"

Re: PDFBox Issue

2004-08-17 Thread Ben Litchfield
PDFBox comes with log4j version 1.2.5 (according to MANIFEST.MF in the jar file); I believe that 1.2.8 is the latest. I will make sure that the next version of PDFBox includes the latest log4j version, which I assume is what everybody would like to use. But, by looking at the below error message it

Re: AnalyZer HELP Please

2004-08-17 Thread Erik Hatcher
This is what analyzers do. I don't know of any analyzer that deals with quotes in the way you're requesting, by keeping the contents together as a complete token. You'll have to write your own variant that does this. QueryParser, however, uses quotes to denote a phrase query, and will query

Re: PDFBox Issue

2004-08-17 Thread Don Vaillancourt
Anything is possible. In a couple of weeks I may be upgrading my code to use Lucene 1.4 and I will make an attempt to use the latest version of PDFBox. You may be right about log4j being somewhere else in the classpath, but being a jar for Jakarta, I couldn't think of any apps on my desktop

RE: AnalyZer HELP Please

2004-08-17 Thread Karthik N S
Hi Erik, apologies. What I meant to say was that a phrase such as "New Year" (with the quotes escaped as \") passed to QueryParser.parse(word, contents, analyzer) should return hits for the full phrase, but it did not. So when I did a quick run on the Analyzer process, I found that it was splitting the

Re: AnalyZer HELP Please

2004-08-17 Thread Patrick Burleson
Karthik, What you would want to do with the split tokens ("New" and "Year") is create a PhraseQuery containing a Term object for each token. This should do what you want. As Erik said, QueryParser would have done this internally, but only if you actually sent in the quotes... not just New Year, but

Re: AnalyZer HELP Please

2004-08-17 Thread Erik Hatcher
Further on this, Karthik, is that you need to really understand what you indexed. For example... take a document that has New Year in it, and follow it through your indexing process. See what your analyzer at indexing time actually indexed. And if new year are side-by-side tokens emitted

Re: AnalyZer HELP Please

2004-08-17 Thread Erik Hatcher
On Aug 17, 2004, at 9:23 AM, Karthik N S wrote: "So when I did a quick run on Analyzer process and found that it was splitting the Word New Year = [New] [Year] Am I doing something wrong in here?" No... this is what this analyzer does. QueryParser does the same thing. The difference

RE: AnalyZer HELP Please

2004-08-17 Thread Karthik N S
Hi Patrick, I did as Erik suggested in his mail and searched for the complete phrase \"New Year\", but the QueryParser still returns hits for Year only. [The Analyzer I use has 555 English stop words, with "new" present among them.] That's when I checked with the analyzers to verify. If you look

Re: AnalyZer HELP Please

2004-08-17 Thread Erik Hatcher
On Aug 17, 2004, at 9:47 AM, Karthik N S wrote: I did as Erik replied in his mail, and searched for the complete phrase \"New Year\", but the QueryParser still returns me hits for Year only. [ The Analyzer I use has 555 English stop words with "new" present in it ] No wonder! That's when I

RE: Restoring a corrupt index

2004-08-17 Thread Honey George
Wallen, Which hex editor did you use? I am also facing a similar problem. I tried to use KHexEdit and it doesn't seem to help. I am attaching my segments file to this email. I think only the segment with name _ung is a valid one; I wanted to delete the remaining ones but couldn't. Can you help?

RE: Restoring a corrupt index

2004-08-17 Thread Honey George
I think attachments are filtered. This is what I see when I open it in the hex editor.
:0000 00 04 e0 af 00 00 00 02 05 5f 36 75 6e 67 00 04 ..à¯._6ung..
:0010 1e fb 05 5f 36 75 6e 69 00 00 00 01 00 00 00 00 .û._6uni
:0020 00 00 c1 b4 ..Á´

RE: Restoring a corrupt index

2004-08-17 Thread wallen
http://www.ultraedit.com/ is the best! However, I cannot imagine how another hex editor wouldn't work. -Original Message- From: Honey George [mailto:[EMAIL PROTECTED] Sent: Tuesday, August 17, 2004 10:35 AM To: Lucene Users List Subject: RE: Restoring a corrupt index Wallen, Which hex

RE: Restoring a corrupt index

2004-08-17 Thread wallen
Change 02 to be 01 and delete the bytes that represent the one record that is bad. It was easier to see what a record was in my file because I had about 30 _files. -Original Message- From: Honey George [mailto:[EMAIL PROTECTED] Sent: Tuesday, August 17, 2004 10:39 AM To: Lucene Users
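
To make the "change 02 to 01" advice easier to follow, here is a rough sketch that decodes the bytes Honey George posted. It assumes the layout the dump appears to follow: a 4-byte counter, a 4-byte segment count, then for each segment a one-byte name length, the name, and a 4-byte number (taken here to be a size or document count). That layout is an assumption read off the dump, not something stated in the thread, and the class and method names are made up for illustration. The trailing 8 bytes are left uninterpreted.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Illustration only: decodes the posted hex dump under an ASSUMED record layout.
class SegmentsDumpSketch {

    // Turn "00 04 e0 af ..." into raw bytes.
    static byte[] hex(String dump) {
        String[] parts = dump.trim().split("\\s+");
        byte[] out = new byte[parts.length];
        for (int i = 0; i < parts.length; i++) {
            out[i] = (byte) Integer.parseInt(parts[i], 16);
        }
        return out;
    }

    // Read counter, segment count, then one (name, size) record per segment.
    static List<String> decode(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        List<String> lines = new ArrayList<>();
        lines.add("counter=" + in.readInt());
        int count = in.readInt();            // the "02" that wallen suggests changing to "01"
        lines.add("segments=" + count);
        for (int i = 0; i < count; i++) {
            int len = in.readUnsignedByte(); // assumes names shorter than 128 bytes
            byte[] name = new byte[len];
            in.readFully(name);
            lines.add(new String(name, StandardCharsets.US_ASCII) + " size=" + in.readInt());
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        String dump = "00 04 e0 af 00 00 00 02 05 5f 36 75 6e 67 00 04 "
                    + "1e fb 05 5f 36 75 6e 69 00 00 00 01 00 00 00 00 "
                    + "00 00 c1 b4";
        for (String line : decode(hex(dump))) {
            System.out.println(line);
        }
    }
}
```

Read this way, the file lists two records, _6ung and _6uni, so removing the bad one means deleting its length byte, name, and 4-byte number, and lowering the count to match. Work on a copy of the segments file.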

RE: AnalyZer HELP Please

2004-08-17 Thread Karthik N S
Hi guys, apologies. Correct me if I am wrong. During the indexing process, if the Analyzer has the word 'new' in its 'STOPWORD' array, this word is prevented from being indexed. Then a search would not return me a hit on the word New

[OT] Re: Restoring a corrupt index

2004-08-17 Thread Patrick Burleson
Hmm, while I agree that UltraEdit is the best on Windows, since they were using KHexEdit, I doubt it's an option for them on Linux (although I do know it runs fine under Wine). Patrick On Tue, 17 Aug 2004 10:39:27 -0400, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: http://www.ultraedit.com/ is

Re: AnalyZer HELP Please

2004-08-17 Thread Patrick Burleson
I believe that is correct. So, the word new is never being indexed since it is a stop word. Patrick On Tue, 17 Aug 2004 20:26:19 +0530, Karthik N S [EMAIL PROTECTED] wrote: Hi Guys Apologies.. Correct me If I am wrong... During Indexing process, if the Analyzer has a word
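
The effect discussed in this thread can be reproduced without Lucene. The sketch below is a plain-Java stand-in for an analyzer chain (split on non-letters, lowercase, drop stop words); it is not Lucene's SimpleAnalyzer or StopFilter, and the three-word stop list is a hypothetical substitute for Karthik's 555-word list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Illustration only: a plain-Java stand-in for an analyzer, not Lucene's classes.
class StopWordSketch {

    // Hypothetical stop list; the one in the thread has 555 entries, including "new".
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("new", "the", "a"));

    // Split on anything that is not a letter, lowercase, then drop stop words.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.split("[^\\p{L}]+")) {
            if (raw.isEmpty()) {
                continue; // a leading separator produces an empty first piece
            }
            String token = raw.toLowerCase(Locale.ROOT);
            if (!STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The quotes act as separators, and "new" is removed as a stop word.
        System.out.println(analyze("\"New Year\""));
        System.out.println(analyze("Happy New Year"));
    }
}
```

With "new" in the stop list, only "year" survives from "New Year", both when indexing and when the query text is analyzed, which matches the hits for Year only that Karthik reports.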

Re: Swapping Indexes?

2004-08-17 Thread Patrick Burleson
Forward back to list. -- Forwarded message -- From: Patrick Burleson [EMAIL PROTECTED] Date: Tue, 17 Aug 2004 11:30:19 -0400 Subject: Re: Swapping Indexes? To: Stephane James Vaucher [EMAIL PROTECTED] Stephane, Thank you for the ideas. I'm going about implementing idea 1 (I like

javadoc api

2004-08-17 Thread Ernesto De Santis
Hello Lucene developers, A little issue about the Field documentation. In the Field class, the getBoost() method says: Returns the boost factor for hits on any field of this document. I think this comment was copied from the Document class and someone forgot to change it. Bye Ernesto. --- Outgoing mail is

Re: Swapping Indexes?

2004-08-17 Thread Stephane James Vaucher
On Tue, 17 Aug 2004, Patrick Burleson wrote: Forward back to list. -- Forwarded message -- From: Patrick Burleson [EMAIL PROTECTED] Date: Tue, 17 Aug 2004 11:30:19 -0400 Subject: Re: Swapping Indexes? To: Stephane James Vaucher [EMAIL PROTECTED] Stephane, Thank you

Re: Swapping Indexes?

2004-08-17 Thread Patrick Burleson
On Tue, 17 Aug 2004 13:17:10 -0400 (EDT), Stephane James Vaucher wrote: Actually, I use an IndexWriter in overwrite mode on the master dir and merge the temp dir. This cleans up the old master. I'm a bit of a Lucene newbie here, and I am trying to understand what you mean by "merge the temp dir". Do

Re: Swapping Indexes?

2004-08-17 Thread Stephane James Vaucher
On Tue, 17 Aug 2004, Patrick Burleson wrote: On Tue, 17 Aug 2004 13:17:10 -0400 (EDT), Stephane James Vaucher wrote: Actually, I use an IndexWriter in overwrite mode on the master dir and merge the temp dir. This cleans up the old master. I'm a bit of a Lucene newbie here, and I am trying

Re: javadoc api

2004-08-17 Thread Otis Gospodnetic
Thanks Ernesto, I fixed it. Otis --- Ernesto De Santis [EMAIL PROTECTED] wrote: Hello Lucene developers, A little issue about the Field documentation. In the Field class, the getBoost() method says: Returns the boost factor for hits on any field of this document. I think that this comment

RE: PDFBox Issue

2004-08-17 Thread Paul Smith
I actually thought it might have been trying to use the log4j 1.3 'alpha' build (there is no 'alpha' build yet, but notionally the latest HEAD isn't too far from it). There has been a subtle change to log4j in recent months that could have a similar impact. Cheers, Paul Smith -Original

OutOfMemoryError

2004-08-17 Thread Terence Lai
Hi All, I am getting an OutOfMemoryError when I deploy my EJB application. To debug the problem, I wrote the following test program: public static void main(String[] args) { try { Query query = getQuery(); for (int i=0; i < 1000; i++) {

RE: OutOfMemoryError

2004-08-17 Thread Terence Lai
Sorry. I should have made it clearer in my last email. I have implemented an EJB Session Bean executing the Lucene search. At the beginning, the session bean works fine. It returns the correct search results to me. As more and more search requests are processed, the server ends up having

Re: OutOfMemoryError

2004-08-17 Thread Daniel Naber
On Wednesday 18 August 2004 00:30, Terence Lai wrote: if (fsDir != null) { try { is.close(); } catch (Exception ex) { } } You close "is" here again, not fsDir. Also, it's a good idea to never ignore exceptions; you should at least print them
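
Daniel's two points (close each resource itself, and do not swallow exceptions) can be sketched roughly as follows. The Resource class is a hypothetical stand-in for the searcher ("is") and the directory ("fsDir") in Terence's code; it uses java.io.Closeable for brevity, which is an assumption of this sketch rather than how the Lucene classes in the thread are declared.

```java
import java.io.Closeable;
import java.io.IOException;

// Illustration only: stand-in resources, not Lucene's IndexSearcher or Directory.
class CloseBothSketch {

    // Hypothetical resource that remembers whether close() was called.
    static class Resource implements Closeable {
        final String name;
        boolean closed;

        Resource(String name) {
            this.name = name;
        }

        @Override
        public void close() throws IOException {
            closed = true;
        }
    }

    // Close one resource and report, rather than ignore, any failure.
    static void close(Closeable resource) {
        if (resource == null) {
            return;
        }
        try {
            resource.close();
        } catch (IOException ex) {
            ex.printStackTrace(System.out);
        }
    }

    // Each variable gets its own close call, so fsDir is not skipped by mistake.
    static void searchOnce(Resource fsDir, Resource is) {
        try {
            // ... run the query here ...
        } finally {
            close(is);
            close(fsDir);
        }
    }

    public static void main(String[] args) {
        Resource fsDir = new Resource("fsDir");
        Resource is = new Resource("is");
        searchOnce(fsDir, is);
        System.out.println(is.name + " closed: " + is.closed);
        System.out.println(fsDir.name + " closed: " + fsDir.closed);
    }
}
```

As the follow-up message notes, fixing the close calls did not by itself remove the OutOfMemoryError, so this only addresses the cleanup bug Daniel pointed out.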

RE: Re: OutOfMemoryError

2004-08-17 Thread Terence Lai
Thanks for pointing this out. Even after I fixed the code to close fsDir and added ex.printStackTrace(System.out), I am still hitting the OutOfMemoryError. Terence On Wednesday 18 August 2004 00:30, Terence Lai wrote: if (fsDir != null) { try { is.close();