RE: PDFBox Issue

2004-08-17 Thread Paul Smith
What version of the log4j jar are you using? 

 -Original Message-
 From: Don Vaillancourt [mailto:[EMAIL PROTECTED]
 Sent: Tuesday, June 29, 2004 8:06 AM
 To: Lucene Users List
 Subject: PDFBox Issue
 
 Hi all,
 
 I know that this is a Lucene list but wanted to know if any of you have
 gotten this error before using PDFBox?
 
 I've gotten the latest version of PDFBox and it is giving me the following
 error:
 
 java.lang.VerifyError: (class: org/apache/log4j/LogManager, method: <clinit> signature: ()V) Incompatible argument to function
 at org.apache.log4j.Logger.getLogger(Logger.java:94)
 at org.pdfbox.pdfparser.PDFParser.<clinit>(PDFParser.java:57)
 at org.pdfbox.searchengine.lucene.LucenePDFDocument.addContent(LucenePDFDocument.java:197)
 at org.pdfbox.searchengine.lucene.LucenePDFDocument.getDocument(LucenePDFDocument.java:118)
 at Index.indexFile(Index.java:287)
 at Index.indexDirectory(Index.java:265)
 at Index.update(Index.java:63)
 at Lucene.main(Lucene.java:26)
 Exception in thread "main"
 
 I am using all the jar files that came with PDFBox.
 
 Has anyone run into this problem?  I am using the following line of code:
 
 Document doc = LucenePDFDocument.getDocument(f);
 
 Thanks
 
 
 Don Vaillancourt
 Director of Software Development
 
 WEB IMPACT INC.
 416-815-2000 ext. 245
 email: [EMAIL PROTECTED]
 web: http://www.web-impact.com
 
 
 
 
 This email message is intended only for the addressee(s)
 and contains information that may be confidential and/or
 copyright.  If you are not the intended recipient please
 notify the sender by reply email and immediately delete
 this email. Use, disclosure or reproduction of this email
 by anyone other than the intended recipient(s) is strictly
 prohibited. No representation is made that this email or
 any attachments are free of viruses. Virus scanning is
 recommended and is the responsibility of the recipient.
 
 
 
 
 
 
 
 
 
 



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: http AND halt

2004-08-17 Thread Erik Hatcher
What Analyzer is being used?  If it is removing stop words, what is the  
stop word list?

Erik
On Aug 17, 2004, at 1:56 AM, Leos Literak wrote:
One user reported that if he searches for http AND halt,
the search fails. This can be found in the logs:
java.lang.ArrayIndexOutOfBoundsException: -1
at java.util.Vector.elementAt(Vector.java:434)
at org.apache.lucene.queryParser.QueryParser.addClause(QueryParser.java:181)
at org.apache.lucene.queryParser.QueryParser.Query(QueryParser.java:493)
at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:108)

Leos


Re: PDFBox Issue

2004-08-17 Thread Don Vaillancourt




Wow, this is an old message.

I managed to get my code to work by using the previous version of
PDFBox. I had used the version of log4j that had come with PDFBox.

Someone had mentioned recompiling log4j, but I couldn't get Eclipse
to import the source, so I gave up. But things work great
with the version of PDFBox that I compiled with, so I am fine with that.

As for the version of log4j, I could not tell you. As I said above, it
came with PDFBox, so I'm guessing that PDFBox had probably not been tested
with the version of log4j it was being distributed with.

Paul Smith wrote:

  What version of the log4j jar are you using? 

  
  
-- 

Don Vaillancourt
Director of Software Development


WEB IMPACT INC.
phone: 416-815-2000 ext. 245
fax: 416-815-2001
email: [EMAIL PROTECTED]
web: http://www.web-impact.com





AnalyZer HELP Please

2004-08-17 Thread Karthik N S

Hey guys,

Apologies, some small help needed.

When I run the analyzers on the phrase "New Year" (with quotes) using
Lucene 1.4 final on Windows 2000,
why does SimpleAnalyzer split it into two words?

Or am I missing something here?



Analyzing  New  Year 
org.apache.lucene.analysis.WhitespaceAnalyzer:

[] [New] [+] [Year] [] 

org.apache.lucene.analysis.SimpleAnalyzer:

[new] [year] 

org.apache.lucene.analysis.StopAnalyzer:

[new] [year] 

org.apache.lucene.analysis.standard.StandardAnalyzer:

[new] [year] 

com.controlnet.indexing.analyzers.GrammerAnalyzer:

[year] 





  WITH WARM REGARDS 
  HAVE A NICE DAY 
  [ N.S.KARTHIK] 
 






Re: PDFBox Issue

2004-08-17 Thread Ben Litchfield

PDFBox comes with log4j version 1.2.5 (according to MANIFEST.MF in the jar
file); I believe that 1.2.8 is the latest.  I will make sure that the next
version of PDFBox includes the latest log4j version, which I assume is
what everybody would like to use.

But looking at the error message from the original post, it appears that you
might have an older log4j in your classpath.

Logger.getLogger( Class ) is available in both 1.2.5 and 1.2.8.
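
One way to check the "older log4j in your classpath" theory is to ask the class loader for every copy of the class file it can see. The following is only a minimal sketch that uses nothing but the standard library; the class name ClasspathCopies and its methods are made up for illustration, and it only sees what the application class loader and its parents expose, so copies loaded by other class loaders (for example inside an application server) would not show up.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of log4j, PDFBox or Lucene.
final class ClasspathCopies {

    /** Converts a class name such as a.b.C into the resource path a/b/C.class. */
    static String resourceName(String className) {
        return className.replace('.', '/') + ".class";
    }

    /** Lists every location from which the class loader can see the named class file. */
    static List<String> locate(String className) {
        ClassLoader loader = ClasspathCopies.class.getClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        List<String> found = new ArrayList<>();
        try {
            for (URL url : Collections.list(loader.getResources(resourceName(className)))) {
                found.add(url.toString());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return found;
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "org.apache.log4j.LogManager";
        List<String> copies = locate(name);
        if (copies.isEmpty()) {
            System.out.println(name + " is not visible on the classpath");
        }
        for (String copy : copies) {
            System.out.println(copy);
        }
    }
}
```

If this prints more than one location when run with the same classpath as the indexing program, the first one listed is normally the copy that gets loaded, and the others are candidates for removal.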


Ben





Re: AnalyZer HELP Please

2004-08-17 Thread Erik Hatcher
This is what analyzers do.  I don't know of any analyzer that deals 
with quotes in the way you're requesting, by keeping the contents 
together as a complete token.  You'll have to write your own variant 
that does this.

QueryParser, however, uses quotes to denote a phrase query, and will 
query for the words together.  Perhaps this is sufficient for your 
needs?
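
To make the splitting concrete, here is a small stand-in for what SimpleAnalyzer is described as doing in this thread: break the text at every non-letter and lowercase what is left. It is a sketch in plain Java, not Lucene code, and the class name LetterTokens is invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative stand-in only; this is not Lucene's SimpleAnalyzer.
final class LetterTokens {

    /** Splits text at every non-letter and lowercases each run of letters. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                // A quote, space or other non-letter ends the current token.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The quote characters are non-letters, so they vanish like the space does.
        System.out.println(tokenize("\"New Year\""));
    }
}
```

The quotes never reach the token stream, which is why keeping a quoted phrase together has to happen at the query level rather than in the analyzer.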

Erik


Re: PDFBox Issue

2004-08-17 Thread Don Vaillancourt




Anything is possible.

In a couple of weeks I may be upgrading my code to use Lucene 1.4, and I
will make an attempt to use the latest version of PDFBox then.
You may be right about log4j being somewhere else in the classpath, but
since it is a Jakarta jar, I couldn't think of any apps on my desktop
that might use it.

I'm doing a search now, and ColdFusion MX is the only app I can think of,
but I'm pretty sure it didn't come with log4j.jar. Well, I'll have to
experiment a little.

Thanks


RE: AnalyZer HELP Please

2004-08-17 Thread Karthik N S
Hi Erik,

Apologies...

What I meant to say was that a phrase such as "New Year" (quotes escaped as \")
passed to QueryParser.parse(word, contents, analyzer) should return hits
for the full phrase,
but it did not.

So I did a quick run of the analyzers and
found that they were splitting the phrase:

  New Year  =  [New]  [Year]

Am I doing something wrong here?

Thx in advance.
Karthik




Re: AnalyZer HELP Please

2004-08-17 Thread Patrick Burleson
Karthik,

What you would want to do with the split tokens ("New" and "Year")
is create a PhraseQuery containing a Term object for each token.
This should do what you want. As Erik said, QueryParser would have
done this internally, but only if you actually sent in the quotes: not
just New Year, but \"New Year\".
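
What a phrase query over those two tokens asks for can be illustrated without Lucene: the tokens must appear next to each other, in the same order, in the analyzed document. The sketch below is plain Java with an invented class name (ExactPhrase); it works on token lists rather than on a real index, so treat it as a picture of the idea and not as the PhraseQuery implementation.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of exact (no gaps allowed) phrase matching.
final class ExactPhrase {

    /** True if the phrase tokens occur in the document consecutively and in order. */
    static boolean contains(List<String> document, List<String> phrase) {
        if (phrase.isEmpty()) {
            return false;
        }
        for (int start = 0; start + phrase.size() <= document.size(); start++) {
            if (document.subList(start, start + phrase.size()).equals(phrase)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> phrase = Arrays.asList("new", "year");
        System.out.println(contains(Arrays.asList("happy", "new", "year"), phrase));
        System.out.println(contains(Arrays.asList("new", "fiscal", "year"), phrase));
    }
}
```

The second document contains both words but not side by side, so it does not count as a phrase match even though a plain AND of the two terms would find it.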

Patrick




Re: AnalyZer HELP Please

2004-08-17 Thread Erik Hatcher
Further to this, Karthik: you really need to understand what
you indexed.  For example, take a document that has "New Year" in it
and follow it through your indexing process.  See what your analyzer
actually indexed at indexing time.  If "new" and "year" are side-by-side
tokens emitted from that process, then querying for "New Year" through
QueryParser should find a match.

You can easily put together a 10-line JUnit test case using
RAMDirectory and your favorite Analyzer to see how this works.  I
highly recommend you do this in order to isolate the situation even
further.

Erik


Re: AnalyZer HELP Please

2004-08-17 Thread Erik Hatcher
On Aug 17, 2004, at 9:23 AM, Karthik N S wrote:
 So when I  did a quick run on Analyzer process and
 found that it was splitting the Word
  New Year  =  [New]  [Year]
 Am I doing some thing wrong in here

No... this is what this analyzer does.  QueryParser does the same
thing.  The difference is that QueryParser knows it was wrapped in
quotes, so it takes each of those terms, [New] and [Year], and makes a
zero-slop PhraseQuery out of them.

Have another look at this stuff:
http://wiki.apache.org/jakarta-lucene/AnalysisParalysis

Erik


RE: AnalyZer HELP Please

2004-08-17 Thread Karthik N S
Hi Patrick,

I did as Erik suggested in his mail
and searched for the complete phrase \"New Year\",
but QueryParser still returns hits for "Year" only.

[ The analyzer I use has 555 English stop words, with "new" among them. ]

That's when I checked with the analyzers to verify.
If you look at the list of analyzer output,
GrammerAnalyzer is the one that has the 555 English stop words.

Do you think this is a bug in my code?

Thx
Karthik






Re: AnalyZer HELP Please

2004-08-17 Thread Erik Hatcher
On Aug 17, 2004, at 9:47 AM, Karthik N S wrote:
 I did as Erik  replied in his mail ,
 and  searched for the complete word   \New Year\  ,
 but the QueryParser Still returns me hit for Year  Only.
 [ The Analyzer I use has 555 English Stop words  with  new present in it ]

No wonder!

 That's when I checked up with Analyzer's to verify,
 If u look at the list  Analyzer's  o/p
 GrammerAnalyzer is the one that has 555 English STOPWORDS.
 Do u think this is the bug in my Code.

Whether this is a bug or not is really for your users to determine :)
But it is absolutely the expected behavior.  QueryParser analyzes the
expression too.  Even if you somehow changed QueryParser, if you never
indexed the word "new" then you certainly cannot expect to search on it
and find it.
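
The effect of a stop list that contains "new" can be shown with a few lines of plain Java. This is a hedged sketch, not the GrammerAnalyzer from the thread (whose source is not shown here): the class name StopWordEffect, the one-word stop list, and the split-on-non-letters rule are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration; the real analyzer and its 555-word list are not reproduced.
final class StopWordEffect {

    /** Lowercases, splits at non-letters, then drops every token found in stopWords. */
    static List<String> analyze(String text, Set<String> stopWords) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!token.isEmpty() && !stopWords.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        Set<String> stopWords = new HashSet<>(Arrays.asList("new"));
        // The same analysis runs at index time and at query time.
        System.out.println(analyze("Happy New Year", stopWords));
        System.out.println(analyze("\"New Year\"", stopWords));
    }
}
```

Because the same analysis is applied to the indexed text and to the query, the word "new" is gone on both sides, and the quoted phrase degenerates into a search for "year" alone.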

Erik


RE: Restoring a corrupt index

2004-08-17 Thread Honey George
Wallen,
Which hex editor did you use? I am also facing a
similar problem. I tried to use KHexEdit and it
doesn't seem to help. I am attaching my segments file
to this email. I think only the segment with the name
_ung is a valid one; I wanted to delete the
remaining ones but couldn't. Can you help?

-George



 --- [EMAIL PROTECTED] wrote: 
 I fixed my own problem, but hope this might help
 someone else in the future:
 
 I went into my segments file (with a hex editor), deleted the record for
 _cu0v and changed the length 0x20 to be 0x1f, and it seems I have most of my
 index back!
 
 Maybe a developer could elaborate on this?
 






RE: Restoring a corrupt index

2004-08-17 Thread Honey George
I think attachments are filtered. This is what I see
when I open in the hex editor.

:0000 00 04 e0 af 00 00 00 02 05 5f 36 75 6e 67 00 04  ..à¯....._6ung..
:0010 1e fb 05 5f 36 75 6e 69 00 00 00 01 00 00 00 00  .û._6uni........
:0020 00 00 c1 b4                                      ..Á´

-George








RE: Restoring a corrupt index

2004-08-17 Thread wallen
http://www.ultraedit.com/ is the best!

However, I cannot imagine why another hex editor wouldn't work.

-Original Message-
From: Honey George [mailto:[EMAIL PROTECTED]
Sent: Tuesday, August 17, 2004 10:35 AM
To: Lucene Users List
Subject: RE: Restoring a corrupt index


Wallen,
Which hex editor have you used. I am also facing a
similar problem. I tried to use KHexEdit and it
doesn't seem to help. I am attaching with this email
my segments file. I think only the segment with name
_ung is a valid one, I wanted to delete the
remaining..but couldn't. Can you help?

-George



 --- [EMAIL PROTECTED] wrote: 
 I fixed my own problem, but hope this might help
 someone else in the future:
 
 I went into my segments file (with a hex editor),
 deleted the record for
 _cu0v and changed the length 0x20 to be 0x1f, and it
 seems I have most of my
 index back!
 
 Maybe a developer could elaborate on this?
 








RE: Restoring a corrupt index

2004-08-17 Thread wallen
Change 02 to be 01 and delete the bytes that represent the one record that
is bad.  It was easier to see what a record was in my file because I had
about 30 _files.
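
The same edit can be sketched in code rather than done by hand in a hex editor. This is only an illustration, assuming the layout that the hex dump in this thread suggests: a 4-byte counter, a 4-byte segment count, then one record per segment (a 1-byte name length, the name bytes, and a 4-byte size), followed by trailing bytes that are copied unchanged. That layout is inferred from the dump, not taken from the Lucene file-format documentation, and the class and method names are hypothetical. Work on a copy of the segments file.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class SegmentsRepair {

    /** Returns a copy of the segments bytes with the record named badName removed. */
    static byte[] dropSegment(byte[] segments, String badName) {
        ByteBuffer in = ByteBuffer.wrap(segments);            // big-endian by default
        ByteBuffer out = ByteBuffer.allocate(segments.length);

        out.putInt(in.getInt());        // assumed: 4-byte counter, copied as-is
        int count = in.getInt();        // assumed: 4-byte number of segment records
        int countPos = out.position();
        out.putInt(0);                  // placeholder, patched once we know how many we kept

        int kept = 0;
        for (int i = 0; i < count; i++) {
            int nameLen = in.get() & 0xFF;   // assumed: 1-byte length (short names only)
            byte[] name = new byte[nameLen];
            in.get(name);
            int size = in.getInt();          // assumed: 4-byte size field of the record
            if (!new String(name, StandardCharsets.US_ASCII).equals(badName)) {
                out.put((byte) nameLen).put(name).putInt(size);
                kept++;
            }
        }
        out.put(in);                    // trailing bytes are copied unchanged
        out.putInt(countPos, kept);     // absolute write, does not move the position

        byte[] result = new byte[out.position()];
        out.flip();
        out.get(result);
        return result;
    }
}
```

Applied to the 36 bytes George posted, dropping the record for _6uni would change the count from 02 to 01 and shorten the file by the 10 bytes of that record.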

-Original Message-
From: Honey George [mailto:[EMAIL PROTECTED]
Sent: Tuesday, August 17, 2004 10:39 AM
To: Lucene Users List
Subject: RE: Restoring a corrupt index


I think attachments are filtered. This is what I see
when I open in the hex editor.

:0000 00 04 e0 af 00 00 00 02 05 5f 36 75 6e 67 00 04  ..à¯....._6ung..
:0010 1e fb 05 5f 36 75 6e 69 00 00 00 01 00 00 00 00  .û._6uni........
:0020 00 00 c1 b4                                      ..Á´

-George









RE: AnalyZer HELP Please

2004-08-17 Thread Karthik N S
Hi Guys

Apologies..

   Correct me if I am wrong...

   During indexing, if the Analyzer has the word 'new' in its STOPWORD array,
   that word is prevented from being indexed.

   Then a search would not return a hit on 'new' for the phrase "New Year",
   since the word 'new' is in the STOPWORD array
   [even if the phrase is surrounded by \"].



With regards
Karthik



-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Tuesday, August 17, 2004 7:35 PM
To: Lucene Users List
Subject: Re: AnalyZer HELP Please


On Aug 17, 2004, at 9:47 AM, Karthik N S wrote:
 I did as Erik  replied in his mail ,
 and  searched for the complete word   \New Year\  ,
 but the QueryParser Still returns me hit for Year  Only.

 [ The Analyzer I use has 555 English Stop words  with  new present
 in it ]

No wonder!

 That's when I checked up with Analyzer's to verify,
 If u look at the list  Analyzer's  o/p
 GrammerAnalyzer is the one that has 555 English STOPWORDS.

 Do u think this is the bug in my Code.

Whether this is a bug or not is really for your users to determine :)
But it is absolutely the expected behavior.  QueryParser analyzes the
expression too.  Even if you somehow changed QueryParser, if you never
indexed the word "new" then you certainly cannot expect to search on it
and find it.

Erik





[OT] Re: Restoring a corrupt index

2004-08-17 Thread Patrick Burleson
Hmm, while I agree that UltraEdit is the best on Windows, since they
were using KHexEdit, I doubt it's an option for them on Linux
(although I do know it runs fine under Wine).

Patrick

On Tue, 17 Aug 2004 10:39:27 -0400, [EMAIL PROTECTED]
[EMAIL PROTECTED] wrote:
 http://www.ultraedit.com/ is the best!
 
 However, I cannot imagine how another hexeditor wouldnt work.
 
 
 
 



Re: AnalyZer HELP Please

2004-08-17 Thread Patrick Burleson
I believe that is correct. So, the word "new" is never being indexed
since it is a stop word.
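
A toy model may make the behavior easier to see. The sketch below is plain Java, not the Lucene API: `StopWordDemo` and `analyze` are hypothetical names, and the two-word stop list stands in for the 555-word list mentioned in the thread. It only illustrates the point made above, that the same stop-word filtering is applied to the indexed text and to the query text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class StopWordDemo {
    // Stand-in stop list; the analyzer discussed in the thread has 555 entries.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("new", "the"));

    /** Lowercases the text, splits on anything that is not a letter or digit, drops stop words. */
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty() && !STOP_WORDS.contains(t)) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyze("Happy New Year"));   // index time: prints [happy, year]
        System.out.println(analyze("\"New Year\""));     // query time: prints [year]
    }
}
```

Under this model the quoted query is reduced to the single term "year" before any matching happens, which is consistent with hits being returned for Year only.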

Patrick

On Tue, 17 Aug 2004 20:26:19 +0530, Karthik N S
[EMAIL PROTECTED] wrote:
 Hi Guys
 
 Apologies..
 
 Correct me if I am wrong...
 
 During indexing, if the Analyzer has the word 'new' in its STOPWORD array,
 that word is prevented from being indexed.
 
 Then a search would not return a hit on 'new' for the phrase "New Year",
 since the word 'new' is in the STOPWORD array
 [even if the phrase is surrounded by \"].
 
 With regards
 Karthik
 
 
 
 
 



Re: Swapping Indexes?

2004-08-17 Thread Patrick Burleson
Forward back to list.


-- Forwarded message --
From: Patrick Burleson [EMAIL PROTECTED]
Date: Tue, 17 Aug 2004 11:30:19 -0400
Subject: Re: Swapping Indexes?
To: Stephane James Vaucher [EMAIL PROTECTED]

Stephane,

Thank you for the ideas. I'm going about implementing idea 1 (I like the
idea of leaving the temp index around for recovery), but I have a
question regarding your original index. Do you just copy over the
temp index and not worry about cleaning up the old index directory?

Right now I have my code deleting the files in the main index
directory after telling the search controller to switch to the temp
index. But by doing that, I need to manage existing searches and not
break them while they are running. I also still run into the open
files problem on Windows when trying to delete a file one of the
searchers has open before it's closed.

Thoughts?

Patrick




On Mon, 16 Aug 2004 18:22:20 -0400 (EDT), Stephane James Vaucher
[EMAIL PROTECTED] wrote:
 I've tried two options that seem to work:

 1) Have a singleton that is responsible for controlling your searchers.
 This controller can temporarily redirect your searchers to
 c:/temp/myindex, allowing you to copy your index to c:/myindex. After that
 process completes, your controller can tell your searchers to use
 c:/myindex, allowing you to then erase your temp index.

 If you index nightly, you can always *not* erase your tmp dir, your index
 process will do this automatically if you create your IndexWriter with
 the overwrite option. This way, you can have a backup index if there is
 a system failure at some point (like when you copy/move directories).

 2) Use an incremental index. Regularly, I scan my files, see if there are
 modification/additions and update my master index. Removing from the
 master index, adding to a temp dir, then merging. I haven't seen any
 weirdness on windows with this process.

 HTH,
 sv



 On Mon, 16 Aug 2004, Patrick Burleson wrote:

  I've read in the docs about updating an index and its suggestion
  reguarding swapping out indexes with a directory rename.
 
  Here's my question, how to do this when searches are running live?
 
  Say I have a directory that holds the current valid index:
 
  C:\myindex
 
  and when I'm running my nightly process to generate the index, it gets
  temporarily indexed to:
 
  C:\temp\myindex
 
  How can I very quickly replace C:\myindex with C:\temp\myindex?
 
  I can't simply do a rename since C:\myindex will likely have open
  files. (Gotta love windows)
 
  And I can't delete all files in myindex, again because of the open files issue.
 
  Any ideas?
 
  Thanks,
  Patrick
 



javadoc api

2004-08-17 Thread Ernesto De Santis
Hello Lucene developers

A little issue about the Field documentation.

In the Field class, the Javadoc for the getBoost() method says:

Returns the boost factor for hits on any field of this document.

I think this comment was copied from the Document class and never changed.

Bye
Ernesto.





Re: Swapping Indexes?

2004-08-17 Thread Stephane James Vaucher
On Tue, 17 Aug 2004, Patrick Burleson wrote:

 Forward back to list.
 
 
 -- Forwarded message --
 From: Patrick Burleson [EMAIL PROTECTED]
 Date: Tue, 17 Aug 2004 11:30:19 -0400
 Subject: Re: Swapping Indexes?
 To: Stephane James Vaucher [EMAIL PROTECTED]
 
 Stephane,
 
 Thank you for the ideas. I'm going about implementing idea 1 (I like the
 idea of leaving the temp index around for recovery), but I have a
 question regarding your original index. Do you just copy over the
 temp index and not worry about cleaning up the old index directory?

Actually, I use an IndexWriter in overwrite mode on the master dir and 
merge the temp dir. This cleans up the old master.
 
 Right now I have my code deleting the files in the main index
 directory after telling the search controller to switch to the temp
 index. But by doing that, I need to manage existing searches and not
 break them while they are running. I also still run into the open
 files problem on Windows when trying to delete a file one of the
 searchers has open before it's closed.

I used to wait some time (~1 minute) for all searches on the old master to 
finish after redirecting to the temp dir, then I would switch to the new 
master. 

 Thoughts?

If you apply a lease-like contract with your searchers where they 
borrow a reference to a searcher and then hand it back to the manager, 
you can probably trace your open files.
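
A lease-like contract of this kind could look roughly like the sketch below. It is plain Java with a hypothetical `LeasedSearcher` wrapper: callers borrow with `acquire()` and hand back with `release()`, and the close action (which in a real system would close the underlying searcher and its files) only runs once the wrapper has been retired and the last lease is returned. None of these names come from Lucene.

```java
class LeasedSearcher {
    private final Runnable closeAction;   // hypothetical hook: closes the real searcher
    private int leases = 0;
    private boolean retired = false;
    private boolean closed = false;

    LeasedSearcher(Runnable closeAction) {
        this.closeAction = closeAction;
    }

    /** Borrow the searcher. Returns false once it has been retired. */
    synchronized boolean acquire() {
        if (retired) {
            return false;
        }
        leases++;
        return true;
    }

    /** Hand the searcher back; closes it if it was retired and this was the last lease. */
    synchronized void release() {
        if (leases <= 0) {
            throw new IllegalStateException("release() without matching acquire()");
        }
        leases--;
        closeIfIdle();
    }

    /** Stop handing out new leases; closes right away if nobody holds one. */
    synchronized void retire() {
        retired = true;
        closeIfIdle();
    }

    synchronized boolean isClosed() {
        return closed;
    }

    private void closeIfIdle() {
        if (retired && leases == 0 && !closed) {
            closed = true;
            closeAction.run();
        }
    }
}
```

With this in place the manager never has to guess how long to wait: files of the old index can be deleted as soon as `isClosed()` reports true.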

HTH,
sv
 
 Patrick
 
 
 
 
 
 
 



Re: Swapping Indexes?

2004-08-17 Thread Patrick Burleson
On Tue, 17 Aug 2004 13:17:10 -0400 (EDT), Stephane James Vaucher 
 
 Actually, I use a IndexWriter in overwrite mode on the master dir and
 merge the temp dir. This cleans up the old master.
 

I'm a bit of a Lucene newbie here, and I am trying to understand what
you mean by "merge the temp dir". Do you copy your existing index to
the temp location, then use the overwrite feature of IndexWriter to
re-create the master? Then what do you merge? Shouldn't the master
index now have everything?

 
 I used to way some time (~1 minute) for all searches on the old master to
 finish after redirecting to the temp dir, then I would switch to the new
 master.
 

I'm going to make this a setting, so that tests won't have to wait a
whole minute. But I think this is the cleanest solution without having
to implement some sort of leasing solution. Our searches should be
fast and 1 minute is a long time. They should all be done by then.

Thanks again,
Patrick




Re: Swapping Indexes?

2004-08-17 Thread Stephane James Vaucher
On Tue, 17 Aug 2004, Patrick Burleson wrote:

 On Tue, 17 Aug 2004 13:17:10 -0400 (EDT), Stephane James Vaucher 
  
  Actually, I use a IndexWriter in overwrite mode on the master dir and
  merge the temp dir. This cleans up the old master.
  
 
 I'm a bit of a Lucene newbie here, and I am trying to understand what
 you mean by merge the temp dir? 

IndexWriter.addIndexes()

 Do you copy your exiting Index to
 the temp location, then use the overwrite feature of IndexWriter to
 re-create the master, then what do you merge? Shouldn't the master
 index now have everything?

What I mean is the following:

1) create tmp dir
2) redirect searchers to tmp dir
3) wait for everyone to use tmp dir (or use some other mechanism)
4) open an IndexWriter on the master dir, erasing it
5) merge tmp directory, using the addIndexes() method
6) redirect searchers to new master dir
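
The ordering of steps 2 to 6 can be sketched as a small controller. This is plain Java under stated assumptions: `IndexSwitch` is a hypothetical class, the drain delay is the configurable wait discussed earlier in the thread, and the `rebuildMaster` callback is only a placeholder for steps 4 and 5 (opening the IndexWriter on the master dir and merging the tmp dir with addIndexes()). The Lucene calls themselves are deliberately left out.

```java
import java.util.concurrent.atomic.AtomicReference;

class IndexSwitch {
    private final String masterDir;
    private final String tmpDir;
    private final long drainMillis;                 // how long to let old searches finish
    private final AtomicReference<String> activeDir;

    IndexSwitch(String masterDir, String tmpDir, long drainMillis) {
        this.masterDir = masterDir;
        this.tmpDir = tmpDir;
        this.drainMillis = drainMillis;
        this.activeDir = new AtomicReference<>(masterDir);
    }

    /** Searchers ask the controller which directory to open. */
    String activeDir() {
        return activeDir.get();
    }

    /** Runs steps 2 to 6; rebuildMaster stands in for steps 4 and 5. */
    void swap(Runnable rebuildMaster) {
        activeDir.set(tmpDir);                      // step 2: redirect searchers to tmp dir
        try {
            Thread.sleep(drainMillis);              // step 3: wait for searches on the old master
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            activeDir.set(masterDir);               // master is untouched, so point back to it
            throw new IllegalStateException("interrupted while draining searchers", e);
        }
        rebuildMaster.run();                        // steps 4-5: recreate master, merge tmp into it
        activeDir.set(masterDir);                   // step 6: redirect searchers to the new master
    }
}
```

Making the delay a constructor argument means a test can pass 0 while production uses something closer to the one minute mentioned above.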
 
  
  I used to way some time (~1 minute) for all searches on the old master to
  finish after redirecting to the temp dir, then I would switch to the new
  master.
  
 
 I'm going to make this a setting, so that test won't have to wait a
 whole minute. But I think this is the cleanest solution without having
 to implement some sort of leaseing solution. Our searches should be
 fast and 1 minute is a long time. They should all be done by then.
 
I used to reindex all my docs at 5:00 AM. I probably could have waited 10 
minutes since I didn't have users at that hour; it's all about requirements ;)

 Thanks again,
 Patrick
 

sv





Re: javadoc api

2004-08-17 Thread Otis Gospodnetic
Thanks Ernesto, I fixed it.

Otis

--- Ernesto De Santis [EMAIL PROTECTED] wrote:

 Hello Lucene developers
 
 A litle issue about a Field documentation.
 
 In Field class on getBoost() method it says:
 
 Returns the boost factor for hits on any field of this document.
 
 I think that this comment are copied from Document class and forgot
 change
 it.
 
 Bye
 Ernesto.
 
 



RE: PDFBox Issue

2004-08-17 Thread Paul Smith
I actually thought it might have been trying to use the log4j 1.3 'alpha'
build (there is no 'alpha' build yet, but notionally the latest HEAD isn't
too far from it).  There has been a subtle change to log4j in recent months
that could have a similar impact.
Cheers,

Paul Smith
 -Original Message-
 From: Ben Litchfield [mailto:[EMAIL PROTECTED]
 Sent: Tuesday, August 17, 2004 10:48 PM
 To: Lucene Users List
 Subject: Re: PDFBox Issue
 
 
 PDFBox comes with log4j version 1.2.5 (according to MANIFEST.MF in the jar
 file); I believe that 1.2.8 is the latest.  I will make sure that the next
 version of PDFBox includes the latest log4j version, which I assume is
 what everybody would like to use.
 
 But, looking at the error message below, it appears that you might have
 an older log4j in your classpath.
 
 Logger.getLogger( Class ) is available in 1.2.5 and 1.2.8.
 
 
 Ben
 
 
 On Tue, 17 Aug 2004, Don Vaillancourt wrote:
 
  Wow, this is an old message.
 
  I managed to get my code to work by using the previous version of
  PDFBox.  I had used the version of log4j that had come with PDFBox.
 
  Someone had mentioned recompiling log4j, but I couldn't get the project
  to import the source into Eclipse, so I gave up.  But things work great
  with the version of PDFBox that I compiled with so I am fine with that.
 
  As for the version of log4j, I could not tell you, as I said above it
  came with PDFBox, so I'm guessing that it had probably not been tested
  with the version of log4j it was being distributed with.
 
  Paul Smith wrote:
 
  What version of the log4j jar are you using?
  
  
  
 
 
  --
  *Don Vaillancourt
  Director of Software Development
  *
  *WEB IMPACT INC.*
  phone: 416-815-2000 ext. 245
  fax: 416-815-2001
  email: [EMAIL PROTECTED] mailto:[EMAIL PROTECTED]
  web: http://www.web-impact.com
 
 
 



OutOfMemoryError

2004-08-17 Thread Terence Lai
Hi All,

I am getting an OutOfMemoryError when I deploy my EJB application. To debug the 
problem, I wrote the following test program:

public static void main(String[] args) {
    try {
        Query query = getQuery();

        for (int i = 0; i < 1000; i++) {
            search(query);

            if (i % 50 == 0) {
                System.out.println("Sleep...");
                Thread.currentThread().sleep(5000);
                System.out.println("Wake up!");
            }
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}

private static void search(Query query) throws IOException {
    FSDirectory fsDir = null;
    IndexSearcher is = null;
    Hits hits = null;

    try {
        fsDir = FSDirectory.getDirectory("C:\\index", false);
        is = new IndexSearcher(fsDir);
        SortField sortField = new SortField("profile_modify_date",
                SortField.STRING, true);

        hits = is.search(query, new Sort(sortField));
    } finally {
        if (is != null) {
            try {
                is.close();
            } catch (Exception ex) {
            }
        }

        if (fsDir != null) {
            try {
                is.close();
            } catch (Exception ex) {
            }
        }
    }
}

In the test program, I wrote a loop that keeps calling the search method. Every time it 
enters the search method, I instantiate the IndexSearcher. Before I exit the 
method, I close the IndexSearcher and FSDirectory. I also made the thread sleep for 5 
seconds after every 50 searches, hoping this would give Java some time to do 
garbage collection. Unfortunately, when I observe the memory usage of my process, 
it keeps increasing until I get the java.lang.OutOfMemoryError.

Note that I invoke IndexSearcher.search(Query query, Sort sort) to process the 
search. If I don't specify the Sort field (i.e. using IndexSearcher.search(query)), I 
don't have this problem, and the memory usage stays at a very steady level.

Has anyone experienced a similar problem? Did I do something wrong in the test 
program? I thought that by closing the IndexSearcher and the FSDirectory, the memory would 
be released during garbage collection.

Thanks,
Terence







RE: OutOfMemoryError

2004-08-17 Thread Terence Lai
Sorry. I should have made it clearer in my last email. I have implemented an EJB session 
bean that executes the Lucene search. At the beginning, the session bean works fine: 
it returns the correct search results to me. As more and more search requests are 
processed, the server ends up with the OutOfMemoryError. If I restart the server, 
everything works fine again.

Terence

 
 
 
 



Re: OutOfMemoryError

2004-08-17 Thread Daniel Naber
On Wednesday 18 August 2004 00:30, Terence Lai wrote:

   if (fsDir != null) {
     try {
       is.close();
     } catch (Exception ex) {
     }
   }

You close "is" here again, not fsDir. Also, it's a good idea to never ignore 
exceptions; you should at least print them out, even if it's just a 
close() that fails.
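
Both points can be illustrated with a small helper. This is a sketch in plain Java using stand-in objects that implement java.io.Closeable; it does not assume that the Lucene classes in the test program implement that interface, and `CloseQuietlyDemo` and `closeAll` are hypothetical names. Each resource is closed through its own reference, and a failure in one close is printed rather than swallowed and does not stop the next resource from being closed.

```java
import java.io.Closeable;
import java.io.IOException;

class CloseQuietlyDemo {

    /** Closes every non-null resource, prints any failure, and returns how many closes failed. */
    static int closeAll(Closeable... resources) {
        int failures = 0;
        for (Closeable resource : resources) {
            if (resource == null) {
                continue;
            }
            try {
                resource.close();       // close the resource itself, not a neighbouring variable
            } catch (IOException ex) {
                failures++;
                ex.printStackTrace();   // at least report it, even for a failed close()
            }
        }
        return failures;
    }
}
```

In the finally block of the test program this would replace the two copy-pasted try/catch blocks, so the slip of closing the searcher twice cannot happen.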

Regards
 Daniel

-- 
http://www.danielnaber.de


RE: Re: OutOfMemoryError

2004-08-17 Thread Terence Lai
Thanks for pointing this out. Even after I fixed the code to close the fsDir and added 
ex.printStackTrace(System.out), I am still hitting the OutOfMemoryError.

Terence

 On Wednesday 18 August 2004 00:30, Terence Lai wrote:
   if (fsDir != null) {
     try {
       is.close();
     } catch (Exception ex) {
     }
   }
 
 You close is here again, not fsDir. Also, it's a good idea to never ignore 
 exceptions, you should at least print them out, even if it's just a 
 close() that fails.
 
 Regards
  Daniel
 
 -- 
 http://www.danielnaber.de
 



