date:20040820

What would be the best way?  Use Lucene outside of EJB.  It's quite 
silly to make this choice purely because of a policy when the technical 
details show that it is unwise.

You're going to navigate Hits through a session bean?  And as you said, 
the EJB spec says not to use file I/O from EJBs.  That is a good 
recommendation if you are distributing your system across servers and 
replication is occurring: if another call to a session bean ends up on a 
different server, then the file handle is lost.

I violate the spec in my JavaDevWithAnt project and have one mode where 
I have a stateless session bean returning search results 
(http://www.ehatchersolutions.com/JavaDevWithAnt), but I definitely do 
not recommend it.  It works when you are in a single-server 
environment.

In summary, EJB and Lucene are not a good mix; don't force it just to 
be buzzword compliant.

Erik
On Aug 20, 2004, at 4:32 AM, Rupinder Singh Mazara wrote:
Hi all,

Purely due to a policy decision, we would like to host our Lucene search
application in a J2EE container, preferably by means of an EJB.
Since access to java.io is restricted by the EJB specification, what would
be the best way to design the application?
I have taken a look at [EMAIL PROTECTED], but it relies on MBeans rather
than a session bean.
Does anyone have pointers or samples that can be looked at?



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


RE: lucene and ejb applications

2004-08-20 Thread Rupinder Singh Mazara

Hi Erik,

Thanks for the warning and the code. Let me rephrase the question.

I have an index generated by Lucene, and I need the search capability to be
highly available. Which solution would be best?

Currently I have two scenarios in mind:
  a) Set up an RMI-based app that, on start-up, initializes an IndexSearcher
     object and waits for invocations of a method like Vector executeQuery(Query).

  b) Create a web-based app (JSP/servlet or Struts) that initializes the
     IndexSearcher object, stores it in the ServletContext on initialization,
     and has all requests invoke Hits search(Query q).

With scenario a) I have more control over updates, inserts, and deletes,
whereas scenario b) has higher availability.

I want to create and store the IndexSearcher object during initialization to
save on multiple opens and reads. Once updates are ready, a signal can be sent
to block further searches while the updates are integrated into the existing
index.
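A minimal sketch of the "block searches while updates are integrated" idea from scenario b), assuming one shared searcher per application guarded by a read/write lock: searches share the read lock, while an update takes the write lock, waits for running searches to finish, and then reopens the searcher. The type parameter `S` stands in for Lucene's `IndexSearcher` (not on the classpath here), and the `opener` supplier is a hypothetical hook. It uses `java.util.concurrent`, so it needs Java 5 or later.

```java
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Function;
import java.util.function.Supplier;

/** One shared searcher for the whole application (e.g. kept in the ServletContext). */
class SharedSearcherHolder<S> {
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private final Supplier<S> opener; // hypothetical: opens a fresh searcher on the index
    private S searcher;

    SharedSearcherHolder(Supplier<S> opener) {
        this.opener = opener;
        this.searcher = opener.get(); // open once at start-up, not per request
    }

    /** Any number of searches may run at once; they wait only while an update holds the write lock. */
    <R> R search(Function<S, R> query) {
        lock.readLock().lock();
        try {
            return query.apply(searcher);
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Blocks new searches, waits for running ones, applies the changes, then reopens the searcher. */
    void update(Runnable applyIndexChanges) {
        lock.writeLock().lock();
        try {
            applyIndexChanges.run();
            searcher = opener.get(); // pick up the modified index
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

With a real IndexSearcher you would also close the old searcher after swapping in the new one.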



-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: 20 August 2004 11:13
To: Lucene Users List
Subject: Re: lucene and ejb applications



pdf search

Hi,

I am a newbie to Lucene.

I have downloaded the zip file. Now how can I give my own list of words to Lucene?
In the demo I saw that Lucene automatically creates an index when we run the Java
program, but I want to give my own search words. How is that possible?


regards
Santosh kumar
SoftPro Systems
Hyderabad


The harder you train in peace, the lesser you bleed in war

---SOFTPRO DISCLAIMER--



Information contained in this E-MAIL and any attachments are

confidential being  proprietary to SOFTPRO SYSTEMS  is 'privileged'

and 'confidential'.



If you are not an intended or authorised recipient of this E-MAIL or

have received it in error, You are notified that any use, copying or

dissemination  of the information contained in this E-MAIL in any

manner whatsoever is strictly prohibited. Please delete it immediately

and notify the sender by E-MAIL.



In such a case reading, reproducing, printing or further dissemination

of this E-MAIL is strictly prohibited and may be unlawful.



SOFTPRO SYSYTEMS does not REPRESENT or WARRANT that an attachment

hereto is free from computer viruses or other defects.



The opinions expressed in this E-MAIL and any ATTACHEMENTS may be

those of the author and are not necessarily those of SOFTPRO SYSTEMS.

Fw: pdf search

How can I search through PDFs?
- Original Message - 
From: Santosh 
To: Lucene Users List 
Sent: Friday, August 20, 2004 5:59 PM
Subject: pdf search


Re: Fw: pdf search

2004-08-20 Thread Ben Litchfield



In order to search through a PDF document, the text must first be extracted
from it.  There are several libraries that do that, including PDFBox
(http://www.pdfbox.org).  After you have the text from the PDF document, you
just add it to the Lucene index like any other text document.  You should
go through the intro tutorial to understand how to index and search text
using Lucene.

Ben




RE: pdf search

2004-08-20 Thread David Townsend

Hi Santosh,

Lucene doesn't search PDFs per se.  To make anything searchable you first have to
extract the content and then put it into Lucene in a form it understands (i.e.
Document objects).  So in order to search your PDFs, you first need to extract the
information from them using something like PDFBox.  Your battle plan should be:
forget Lucene for a while and get the raw data out of all the items you want to
search.  Then look at the Lucene articles about creating simple searchable indices.
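David's two-step plan, sketched in plain Java under stated assumptions: `TextExtractor` is a hypothetical interface you would back with a PDF library such as PDFBox (it is not PDFBox's API), and each "document" is modelled as a field-name-to-value map mirroring the shape of a Lucene Document. The field names "path" and "contents" are chosen for illustration. The final step of handing these documents to Lucene's IndexWriter is deliberately left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical hook for a text extractor such as PDFBox; not a real library interface. */
interface TextExtractor {
    String extractText(String path) throws Exception;
}

class ExtractThenIndex {
    /** Step 1 ("forget Lucene for a while"): pull raw text out of every file, skipping failures. */
    static Map<String, String> extractAll(List<String> paths, TextExtractor extractor) {
        Map<String, String> textByPath = new LinkedHashMap<>(); // keeps input order
        for (String path : paths) {
            try {
                textByPath.put(path, extractor.extractText(path));
            } catch (Exception e) {
                System.err.println("skipping " + path + ": " + e);
            }
        }
        return textByPath;
    }

    /** Step 2: turn each text into a field map, the same shape a Lucene Document has. */
    static List<Map<String, String>> toDocuments(Map<String, String> textByPath) {
        List<Map<String, String>> docs = new ArrayList<>();
        for (Map.Entry<String, String> e : textByPath.entrySet()) {
            Map<String, String> doc = new LinkedHashMap<>();
            doc.put("path", e.getKey());       // where to link the search result
            doc.put("contents", e.getValue()); // the text to be searched
            docs.add(doc);
        }
        return docs;
    }
}
```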

DT

If we didn't train to fight, who'd fight the wars? :)

-Original Message-
From: Santosh [mailto:[EMAIL PROTECTED]
Sent: 20 August 2004 13:30
To: Lucene Users List
Subject: Fw: pdf search



pdfboxhelp

Hi,

I have downloaded the PDFBox zip, but I am unsure where to start. How can I
check it with the demo? I don't see any help document in this download. Please help me.


regards
Santosh kumar
SoftPro Systems
Hyderabad



Re: pdfboxhelp

What are your intentions with PDFBox?

Do you want to use it to index PDF files?


Don Vaillancourt
Director of Software Development

WEB IMPACT INC.
phone: 416-815-2000 ext. 245
fax: 416-815-2001
email: [EMAIL PROTECTED]
web: http://www.web-impact.com

This email message is intended only for the addressee(s)
and contains information that may be confidential and/or
copyright. If you are not the intended recipient please
notify the sender by reply email and immediately delete
this email. Use, disclosure or reproduction of this email
by anyone other than the intended recipient(s) is strictly
prohibited. No representation is made that this email or
any attachments are free of viruses. Virus scanning is
recommended and is the responsibility of the recipient.


RE: pdf search

2004-08-20 Thread Karthik N S

Hi,

What is it that you intend to search, and what are these "own search words"?

First, explain your requirement to the forum properly to get the intended
results.



with regards
Karthik


RE: lucene and ejb applications

2004-08-20 Thread Otis Gospodnetic

Option b) sounds simpler and sufficient to me.  I don't see why you
would need to involve RMI for something as simple as this.  I use
something similar to your option b) for some indices behind
http://www.simpy.com/.  I don't store the IndexSearcher in the servlet
context, though; I just have some logic like this:


/**
 * Returns an instance of {@link IndexDescriptor} for the given
 * <code>indexID</code>, which must represent an absolute file
 * path to the index directory.
 * <p/>
 * This method caches {@link IndexDescriptor}s in an LRU Map and
 * first tries to retrieve them from there.
 * <p/>
 * If the specified index has been changed since the last time
 * it was used, its {@link Searcher} is reloaded.
 *
 * @param indexID the full path to the index directory
 * @return an instance of {@link IndexDescriptor}
 * @throws SearcherException if the given index cannot be accessed
 */
IndexDescriptor getUserSearcherIndexDescriptor(String indexID)
    throws SearcherException
{
    File indexDir = validateIndex(indexID);
    IndexDescriptor indexDescriptor =
        getIndexDescriptorFromCache(indexDir);

    try
    {
        // if this is a known index
        if (indexDescriptor != null)
        {
            // if the index has changed since this Searcher was
            // created, make a new Searcher
            long currentVersion =
                IndexReader.getCurrentVersion(indexDir);
            if (currentVersion > indexDescriptor.lastKnownVersion)
            {
                indexDescriptor.lastKnownVersion = currentVersion;
                indexDescriptor.searcher = new LuceneUserSearcher(indexDir);
            }
        }
        // if this is a new index
        else
        {
            indexDescriptor = new IndexDescriptor();
            indexDescriptor.indexDir = indexDir;
            indexDescriptor.lastKnownVersion =
                IndexReader.getCurrentVersion(indexDir);
            indexDescriptor.searcher = new LuceneUserSearcher(indexDir);
        }
        return cacheIndexDescriptor(indexDescriptor);
    }
    catch (IOException e)
    {
        throw new SearcherException("Cannot open index: " + indexDir, e);
    }
}

IndexDescriptor is a simple struct-like class.
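The caching pattern Otis describes (an LRU map of descriptors, reopening a searcher only when the index version has moved on) can be sketched without Lucene on the classpath. This is an illustration, not Otis's code: `versionOf` is a hypothetical hook where a call like `IndexReader.getCurrentVersion(indexDir)` would go, `opener` stands in for constructing the searcher, and the LRU bound comes from a `LinkedHashMap` in access order.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.ToLongFunction;

/** Struct-like holder, analogous to the IndexDescriptor mentioned above. */
class Descriptor<S> {
    long lastKnownVersion;
    S searcher;
}

/** LRU cache of searchers keyed by index path; reopens only when the index version grows. */
class VersionedSearcherCache<S> {
    private final Map<String, Descriptor<S>> lru;
    private final ToLongFunction<String> versionOf; // hypothetical: current version of an index
    private final Function<String, S> opener;       // hypothetical: opens a searcher on an index

    VersionedSearcherCache(int maxEntries, ToLongFunction<String> versionOf, Function<String, S> opener) {
        // accessOrder = true: iteration runs from least- to most-recently used
        this.lru = new LinkedHashMap<String, Descriptor<S>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Descriptor<S>> eldest) {
                return size() > maxEntries; // evict the least recently used index
            }
        };
        this.versionOf = versionOf;
        this.opener = opener;
    }

    synchronized S get(String indexPath) {
        long current = versionOf.applyAsLong(indexPath);
        Descriptor<S> d = lru.get(indexPath); // also marks the entry as recently used
        if (d == null) {                      // a new index: open and cache it
            d = new Descriptor<>();
            d.lastKnownVersion = current;
            d.searcher = opener.apply(indexPath);
            lru.put(indexPath, d);
        } else if (current > d.lastKnownVersion) { // a known index that has changed
            d.lastKnownVersion = current;
            d.searcher = opener.apply(indexPath);
        }
        return d.searcher;
    }

    synchronized int cachedCount() {
        return lru.size();
    }
}
```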


Otis



Re: Debian build problem with 1.4.1

2004-08-20 Thread Otis Gospodnetic

Hello Jeff,

I don't have Debian to try this out, and this is going to be a stupid
question and suggestion, but where/how is the CLASSPATH set?  Are any
of those commands actually using Lucene's build.xml?

I'm asking because it looks like your compiler is not finding the Reader
and IOException classes, both of which are in the java.io package, which
I see imported in StandardTokenizer.java as 'import java.io.*;'.

Otis

--- Jeff Breidenbach [EMAIL PROTECTED] wrote:

 
 Hi all,
 
 I am the Debian package maintainer for Lucene, and I'm having build
 problems with 1.4.1. We are very close to a major Debian release (code
 named 'sarge'), and the window for changes is very small. Can someone
 please help me in the next day or two? Otherwise Debian stable will ship
 Lucene 1.4-final for the next couple of years. It looks to me like the
 problem is in javacc-generated code, and it's not obvious to me what to do.
 
 For debian sarge or sid users out there who want to reproduce the
 build problem, download the lucene 1.4.1 source tarball, then:
 
   apt-get install devscripts
   apt-get source liblucene-java
   cd lucene-1.4
   uupdate -v 1.4.1 ../lucene-1.4.1-src.tar.gz 
   cd ../lucene-1.4.1
   debuild -us -uc
 
 Cheers,
 Jeff
 
 =
 
 
 compile-core:
 [mkdir] Created dir: /tmp/lucene/lucene-1.4.1/build/classes/java
 [javac] Compiling 160 source files to /tmp/lucene/lucene-1.4.1/build/classes/java
 [javac] /tmp/lucene/lucene-1.4.1/src/java/org/apache/lucene/analysis/standard/StandardTokenizer.java:15: cannot resolve symbol
 [javac] symbol  : class Reader
 [javac] location: class org.apache.lucene.analysis.standard.StandardTokenizer
 [javac]   public StandardTokenizer(Reader reader) {
 [javac] ^
 [javac] /tmp/lucene/lucene-1.4.1/src/java/org/apache/lucene/analysis/standard/StandardTokenizer.java:24: cannot resolve symbol
 [javac] symbol  : class IOException
 [javac] location: class org.apache.lucene.analysis.standard.StandardTokenizer
 [javac]   final public org.apache.lucene.analysis.Token next() throws ParseException, IOException {
 [javac] ^
 [javac] /tmp/lucene/lucene-1.4.1/src/java/org/apache/lucene/analysis/standard/StandardTokenizer.java:15: recursive constructor invocation
 [javac]   public StandardTokenizer(Reader reader) {
 [javac] ^
 [javac] 3 errors
 

Re: pdf search

Hi Karthik,

I have a website with some items, each containing HTML and PDF documents. I
have to store keywords for each item. Whenever a user enters a search word
that matches any keyword in an item's list, the search should show the link
to that item.
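Santosh's requirement is essentially an inverted index from keyword to item. The toy sketch below shows only that lookup structure in plain Java; it is not Lucene, where one way to model the same thing would be one Document per item carrying its link and a keywords field. The item links are made up, and case-insensitive, trimmed matching in `normalize` is an assumption made for illustration.

```java
import java.util.*;

/** Toy inverted index: keyword -> links of the items tagged with that keyword. */
class KeywordIndex {
    private final Map<String, Set<String>> linksByKeyword = new HashMap<>();

    /** Store the keyword list for one item, identified by its link. */
    void addItem(String itemLink, Collection<String> keywords) {
        for (String k : keywords) {
            linksByKeyword.computeIfAbsent(normalize(k), x -> new TreeSet<>()).add(itemLink);
        }
    }

    /** Links of every item whose keyword list contains the search word (empty if none). */
    Set<String> search(String word) {
        return Collections.unmodifiableSet(
            linksByKeyword.getOrDefault(normalize(word), Collections.emptySet()));
    }

    // Assumed matching rule: ignore case and surrounding whitespace
    private static String normalize(String s) {
        return s.trim().toLowerCase(Locale.ROOT);
    }
}
```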


- Original Message -
From: Karthik N S [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Friday, August 20, 2004 6:56 PM
Subject: RE: pdf search



Re: pdfboxhelp

Exactly, that is what I need.
  - Original Message - 
  From: Don Vaillancourt 
  To: Lucene Users List 
  Sent: Friday, August 20, 2004 6:39 PM
  Subject: Re: pdfboxhelp


Re: Lucene Search Applet

2004-08-20 Thread Simon mcIlwaine

I'm a new Lucene user and I'm not too familiar with applets either, but I've
been doing a bit of testing on Java applet security, and if I'm correct in
saying that applets can read anything below their codebase, then my problem
is not a security restriction. The error is java.lang.NoClassDefFoundError,
and the classpath is set, since I have it working in a Swing app. Does anyone
actually have Lucene working in an applet? Can it be done? Please help.

Thanks

Simon

- Original Message - 

From: Terry Steichen [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Wednesday, August 18, 2004 4:17 PM
Subject: Re: Lucene Search Applet


I suspect it has to do with the security restrictions of the applet, 'cause
it doesn't appear to be finding your Lucene jar file.  Also, regarding the
lock files, I believe you can disable the locking stuff just for purposes
like yours (read-only index).

Regards,

Terry
  - Original Message - 
  From: Simon mcIlwaine
  To: Lucene Users List
  Sent: Wednesday, August 18, 2004 11:03 AM
  Subject: Lucene Search Applet


  I'm developing a CD-ROM-based Lucene search that will search HTML pages on
the CD-ROM, using an applet as the UI. I know that there's a problem with lock
files and also security restrictions on applets, so I am using the
RAMDirectory. I have it working in a Swing application; however, when I put it
into an applet it's giving me problems. It compiles, but when I go to run the
applet I get the error below. Can anyone help? Thanks in advance.
  Simon

  Error:

  java.lang.NoClassDefFoundError: org/apache/lucene/store/Directory
      at java.lang.Class.getDeclaredConstructors0(Native Method)
      at java.lang.Class.privateGetDeclaredConstructors(Class.java:1610)
      at java.lang.Class.getConstructor0(Class.java:1922)
      at java.lang.Class.newInstance0(Class.java:278)
      at java.lang.Class.newInstance(Class.java:261)
      at sun.applet.AppletPanel.createApplet(AppletPanel.java:617)
      at sun.applet.AppletPanel.runloader(AppletPanel.java:546)
      at sun.applet.AppletPanel.run(AppletPanel.java:298)
      at java.lang.Thread.run(Thread.java:534)

  Code:

  import org.apache.lucene.document.Document;
  import org.apache.lucene.index.Term;
  import org.apache.lucene.search.Hits;
  import org.apache.lucene.search.IndexSearcher;
  import org.apache.lucene.search.Query;
  import org.apache.lucene.search.TermQuery;
  import org.apache.lucene.store.Directory;
  import org.apache.lucene.store.RAMDirectory;

  import java.awt.*;
  import java.awt.event.*;
  import javax.swing.*;

  public class MemorialApp2 extends JApplet implements ActionListener {

    JLabel prompt;
    JTextField input;
    JButton search;
    JPanel panel;

    // Location of the index on disk.
    String indexDir = "C:/Java/lucene/index-list";

    private static RAMDirectory idx;

    public void init() {
      Container cp = getContentPane();
      panel = new JPanel();
      panel.setLayout(new FlowLayout(FlowLayout.CENTER, 4, 4));
      prompt = new JLabel("Keyword search:");
      input = new JTextField("", 20);
      search = new JButton("Search");
      search.addActionListener(this);
      panel.add(prompt);
      panel.add(input);
      panel.add(search);
      cp.add(panel);
    }

    public void actionPerformed(ActionEvent e) {
      if (e.getSource() == search) {
        String surname = input.getText();
        try {
          findSurname(indexDir, surname);
        } catch (Exception ex) {
          System.err.println(ex);
        }
      }
    }

    public static void findSurname(String indexDir, String surname)
        throws Exception {
      // Copy the on-disk index into memory with RAMDirectory.
      idx = new RAMDirectory(indexDir);
      IndexSearcher searcher = new IndexSearcher(idx);
      try {
        // Exact-term search on the "surname" field.
        Query query = new TermQuery(new Term("surname", surname));
        Hits hits = searcher.search(query);
        for (int i = 0; i < hits.length(); i++) {
          Document doc = hits.doc(i);
          System.out.println("Surname: " + doc.get("surname"));
        }
      } finally {
        searcher.close();
      }
    }
  }



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: pdfboxhelp

Here is the super simple code required.

import java.io.File;
import org.apache.lucene.document.Document;
import org.pdfbox.searchengine.lucene.*;

File pdfFile = new File("/path/to/the/file.pdf");

// Below returns a parsed PDF file in a Lucene Document object.
Document doc = LucenePDFDocument.getDocument(pdfFile);
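Since the question was where to start, here is a hedged sketch of the file-gathering half of an indexing run. PdfFinder and findPdfs are hypothetical names, and only standard java.io is used. Each File it returns would be passed to LucenePDFDocument.getDocument as in the snippet above, and the resulting Document added to an IndexWriter; those calls are left out so the sketch compiles without the PDFBox and Lucene jars.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical helper: recursively collects *.pdf files under a directory,
 * ready to be handed one by one to LucenePDFDocument.getDocument(file).
 */
final class PdfFinder {
    private PdfFinder() {}

    static List<File> findPdfs(File root) {
        List<File> result = new ArrayList<File>();
        collect(root, result);
        return result;
    }

    private static void collect(File f, List<File> out) {
        if (f.isDirectory()) {
            File[] children = f.listFiles();
            if (children == null) {
                return; // unreadable directory: skip it
            }
            for (File child : children) {
                collect(child, out);
            }
        } else if (f.getName().toLowerCase(Locale.ROOT).endsWith(".pdf")) {
            out.add(f); // matches .pdf and .PDF alike
        }
    }
}
```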

Santosh wrote:

Exactly, the same is required for me.
- Original Message -
From: Don Vaillancourt
To: Lucene Users List
Sent: Friday, August 20, 2004 6:39 PM
Subject: Re: pdfboxhelp

What are your intentions with PDFBox?

You want to use it to index PDF files?

Santosh wrote:

hi,

I have downloaded the PDFBox zip, but I am unsure where to start. How can I try the demo? I don't see any help document in this download. Please help me.

regards
Santosh kumar
SoftPro Systems
Hyderabad

"The harder you train in peace, the lesser you bleed in war"

---SOFTPRO DISCLAIMER--

Information contained in this E-MAIL and any attachments are
confidential being proprietary to SOFTPRO SYSTEMS is 'privileged'
and 'confidential'.

In such a case reading, reproducing, printing or further dissemination
of this E-MAIL is strictly prohibited and may be unlawful.

SOFTPRO SYSTEMS does not REPRESENT or WARRANT that an attachment
hereto is free from computer viruses or other defects.

The opinions expressed in this E-MAIL and any ATTACHMENTS may be
those of the author and are not necessarily those of SOFTPRO SYSTEMS.

--
Don Vaillancourt
Director of Software Development

WEB IMPACT INC.
phone: 416-815-2000 ext. 245
fax: 416-815-2001
email: [EMAIL PROTECTED]
web: http://www.web-impact.com


Re: pdfboxhelp

  - Original Message - 
  From: Don Vaillancourt 
  To: Lucene Users List 
  Sent: Friday, August 20, 2004 7:37 PM
  Subject: Re: pdfboxhelp


continuous index updates

2004-08-20 Thread Crump, Michael

Hello,

I am currently working on a server app that will require the ability to
make index additions/deletions at any time.  I want to cache/reuse index
searchers and readers.  I know that once an index has changed, only newly
opened readers will see the changes.  Creating a new reader to see the
changes and caching it will be no problem.  My problem is that since
this is a multithreaded app, other threads may be using the old readers,
making it difficult to know when to close them.  I assume that a reader
must be closed to free the associated resources.  I was thinking about
using some kind of reference-counted reader that would keep track of its
references and only truly close when there were no references left.

Am I making this too difficult?

Is there a better way?

I assume others have had to do this using Lucene; do you have any
recommendations?

Regards,

Michael
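The reference-counting idea can be sketched with the standard library alone. RefCounted is a hypothetical wrapper around any Closeable; an IndexReader or IndexSearcher would be wrapped through a small adapter that forwards to its close() method (an assumption, not shown). The count starts at 1 for the cache's own reference; each search thread calls incRef() before use and decRef() in a finally block, and the underlying resource is closed only when the count reaches zero.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical reference-counting wrapper for a shared Closeable resource.
 * The resource is closed only after every holder of a reference releases it.
 */
final class RefCounted<T extends Closeable> {
    private final T resource;
    // Starts at 1: the reference owned by the cache that created this wrapper.
    private final AtomicInteger refCount = new AtomicInteger(1);

    RefCounted(T resource) {
        this.resource = resource;
    }

    /** Takes a reference before use; fails if the resource is already closed. */
    T incRef() {
        while (true) {
            int current = refCount.get();
            if (current <= 0) {
                throw new IllegalStateException("resource already closed");
            }
            // Increment only if nobody dropped the count to zero in the meantime.
            if (refCount.compareAndSet(current, current + 1)) {
                return resource;
            }
        }
    }

    /** Releases a reference; the last release closes the resource. */
    void decRef() {
        int remaining = refCount.decrementAndGet();
        if (remaining == 0) {
            try {
                resource.close();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        } else if (remaining < 0) {
            throw new IllegalStateException("decRef() called more often than incRef()");
        }
    }

    int getRefCount() {
        return refCount.get();
    }
}
```

When the index changes, the cache would publish a new RefCounted for the fresh reader and call decRef() once on the old one to drop its own reference; the old reader then closes after the last in-flight search finishes. A thread that picks up the old wrapper just as it closes gets IllegalStateException from incRef() and can retry with the current one.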

Re: pdfboxhelp

Did I leave you speechless!? :-)


Re: pdfboxhelp

I am sorry, that mail was sent accidentally.
  - Original Message - 
  From: Don Vaillancourt 
  To: Lucene Users List 
  Sent: Friday, August 20, 2004 8:02 PM
  Subject: Re: pdfboxhelp

  Did I leave you speechless!?  :-)


Indexing and Searching Database Values in Lucene Search Engine

2004-08-20 Thread sivalingam T

  
How can I index and search database values using the Lucene search engine?

By

T.Sivalingam.

Sivalingam T

Indexing and Searching Database in Lucene

2004-08-20 Thread sivalingam T

  Hi

  Can we index and search a database with the Lucene search engine?
  If anybody knows how, please reply.


With Warm Regards,
Sivalingam.T

Sai Eswar Innovations (P) Ltd,
Chennai-92

RE: Indexing and Searching Database in Lucene

2004-08-20 Thread Aviran

You need to create a Lucene index from the database: just index the
columns and records you want to search. It is also useful to have a
field in Lucene that contains the database's primary key, so you can
retrieve the actual record from the database.

Aviran
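The recipe above (index the searchable columns, keep the primary key with each entry, then fetch the real record by key) can be illustrated with a toy in-memory inverted index. RowIndex is a hypothetical class standing in for a Lucene index; it is not the Lucene API. In Lucene each row would become a Document with the text columns in indexed fields and the primary key in a stored field.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/**
 * Toy inverted index standing in for Lucene: maps each term to the primary
 * keys of the database rows that contain it.
 */
final class RowIndex {
    private final Map<String, Set<Long>> postings = new HashMap<>();

    /** Index one database row: its primary key plus its text columns. */
    void addRow(long primaryKey, String... columns) {
        for (String column : columns) {
            if (column == null) {
                continue; // SQL NULL: nothing to index
            }
            for (String token : column.toLowerCase(Locale.ROOT).split("\\W+")) {
                if (!token.isEmpty()) {
                    postings.computeIfAbsent(token, t -> new TreeSet<>()).add(primaryKey);
                }
            }
        }
    }

    /** Primary keys of rows containing the term, in ascending order. */
    List<Long> search(String term) {
        Set<Long> keys = postings.get(term.toLowerCase(Locale.ROOT));
        if (keys == null) {
            return Collections.emptyList();
        }
        return new ArrayList<Long>(keys);
    }
}
```

The keys returned by search() would then drive an ordinary lookup such as SELECT * FROM people WHERE id = ? (the table and column names are hypothetical).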


Re: Debian build problem with 1.4.1



Hi Otis,

I'm asking, because it looks like your compiler is not finding Reader
and IOException classes, both of which are in java.io.* package, which
I see imported in StandardTokenizer.java as 'import java.io.*;'.


In my copy of StandardTokenizer.java, there is no 'import java.io.*;'
(and in fact this is a change from lucene-1.4-final). Since this file
is apparently generated from JavaCC, I'm not sure what to do.  I'm
happy to supply a login to a Debian computer if someone is interested
in helping debug.

Are any of those commands actually using Lucene's build.xml?

Yes, they are just a wrapper around calling ant. The build.xml
file has very minimal Debian-specific modifications.

Cheers,
Jeff


Re: Indexing and Searching Database in Lucene





Funny thing is, I was thinking of doing something like this just today.
This is especially useful when you run a lot of queries using the
LIKE operator; Lucene would increase search performance a great deal.

This email message is intended only for the addressee(s)
and contains information that may be confidential and/or
copyright. If you are not the intended recipient please
notify the sender by reply email and immediately delete
this email. Use, disclosure or reproduction of this email
by anyone other than the intended recipient(s) is strictly
prohibited. No representation is made that this email or
any attachments are free of viruses. Virus scanning is
recommended and is the responsibility of the recipient.





Re: lucene and ejb applications

On Aug 20, 2004, at 7:54 AM, Rupinder Singh Mazara wrote:
hi erik
 thanks for the warning and the code.
 Let me rephrase the question:
 I have an index generated by Lucene, and I need the search capability
 to be highly available. What solutions would be the most optimal?
I'm guessing from your descriptions that you want a search server that 
multiple applications can access.  Correct?  Is that what you mean by 
high availability?

Take a look at Nutch for examples of doing this kind of thing.  And 
also...

 Currently I have two scenarios in mind:
  a) set up an RMI-based app that on start-up initializes an IndexSearcher
 object and waits for invocation of a method like Vector executeQuery(Query)
Lucene has built-in RMI capability, so you don't need to recreate this 
yourself.  Look at RemoteSearchable (and the test cases that use it).

  b) create a web-based app (JSP/servlet or Struts) that initializes the
 IndexSearcher object and stores it in the servletContext on initialization,
 and all requests invoke Hits search(Query q)
This is ok, but you have the same issues with servlet context 
(application scope or even session scope) with distributed 
applications.  IndexSearcher, at the very least, should be transient 
and lazy initialized, perhaps nested under a controller object of your 
making.

  With scenario a) I can have more control over updates, inserts, and
 deletes, whereas scenario b) has higher availability.
I disagree with your analysis of those scenarios.  Neither has more or 
less control or availability than the other.

 I want to create and store the IndexSearcher object during initialization
 to save on multiple opens and reads. Once updates are ready, a signal can be
 sent to block further searches while the updates are integrated into the
 existing index.
It is a good thing to keep an IndexSearcher instance around for big 
indexes to save on that I/O, I completely agree.  A simple 
IndexSearcher-encapsulating Java object which lazy initializes and 
keeps IndexSearcher as a transient would be quite sufficient, I think.  
Store that object wherever you like - application scope seems to be 
appropriate for your web application scenario.

Erik
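The "simple IndexSearcher-encapsulating Java object" described above might look roughly like this. To keep the sketch compilable without Lucene, the searcher is a generic T produced by a factory; LazySearcherHolder is a hypothetical name. The instance field is transient, so it is not serialized with the holder (for example during session replication) and is recreated lazily on first use, using double-checked locking on a volatile field.

```java
import java.io.Serializable;
import java.util.function.Supplier;

/**
 * Sketch of a holder that lazily creates and caches an expensive object
 * (e.g. an IndexSearcher) and never serializes it.
 */
final class LazySearcherHolder<T> implements Serializable {
    private static final long serialVersionUID = 1L;

    // Hypothetical factory, e.g. one that opens an IndexSearcher and wraps any
    // IOException. It must itself be Serializable if the holder is serialized.
    private final Supplier<T> factory;

    // transient: dropped on serialization, rebuilt by get() afterwards.
    private transient volatile T instance;

    LazySearcherHolder(Supplier<T> factory) {
        this.factory = factory;
    }

    T get() {
        T result = instance;
        if (result == null) {
            synchronized (this) {
                result = instance;
                if (result == null) {
                    result = factory.get();
                    instance = result;
                }
            }
        }
        return result;
    }
}
```

In the web application scenario the holder itself could live in application scope, as suggested above.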

Re: Debian build problem with 1.4.1

On Aug 20, 2004, at 11:12 AM, Jeff Breidenbach wrote:
Hi Otis,
I'm asking, because it looks like your compiler is not finding Reader
and IOException classes, both of which are in java.io.* package, which
I see imported in StandardTokenizer.java as 'import java.io.*;'.

In my copy of StandardTokenizer.java, there is no 'import java.io.*;'
(and in fact this is a change from lucene-1.4-final).
I don't understand this.  StandardTokenizer.java hasn't changed since  
last year.

% cvs log StandardTokenizer.java
  ...

revision 1.3
date: 2003/12/22 22:12:24;  author: cutting;  state: Exp;  lines: +6 -6
Fix StandardTokenizer's handling of CJK characters.

revision 1.2
date: 2003/10/01 16:39:26;  author: ehatcher;  state: Exp;  lines: +7 -4
oops, forgot to check in JavaCC generated files

revision 1.1
date: 2003/09/11 01:51:33;  author: ehatcher;  state: Exp;
PR 19468, but not exactly as it was done in the provided patches.   
JavaCC is no longer required to build Lucene, but can be run optionally
 
=

And I have import java.io.* at the top.
Since this file is apparently generated from JavaCC, I'm not sure what to do.
You can regenerate StandardTokenizer by running:
ant javacc
(you'll need JavaCC installed, of course, and this is the reason we  
check in the generated files in order to save the hassle for others)

It seems something is fishy with the copy of the code you have.
Erik

Re: lucene and ejb applications

2004-08-20 Thread Praveen Peddi

In fact, we do exactly the same thing. A session bean method called search()
delegates to a POJO SearchService. We lazily load the IndexSearcher, cache it in
memory, and invalidate that object when someone else modifies the index. This
trick works wonderfully for us. Searching has become faster since we started
caching the searcher.

Praveen
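The caching-and-invalidation scheme described above can be sketched as follows. VersionedCache is hypothetical, and so is the version source: it is assumed to return a number that changes whenever the index is modified, and how you obtain it depends on your application. The cache also supports explicit invalidation by the code that modifies the index. It does not close the discarded instance; that is left to whatever release policy you choose.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/**
 * Sketch: lazily creates a value and re-creates it when the (hypothetical)
 * index version changes or when invalidate() is called.
 */
final class VersionedCache<T> {
    private final LongSupplier versionSource;
    private final Supplier<T> factory;
    private T cached;
    private long cachedVersion;

    VersionedCache(LongSupplier versionSource, Supplier<T> factory) {
        this.versionSource = versionSource;
        this.factory = factory;
    }

    synchronized T get() {
        long current = versionSource.getAsLong();
        if (cached == null || current != cachedVersion) {
            cached = factory.get(); // the old instance is simply dropped here
            cachedVersion = current;
        }
        return cached;
    }

    /** Called by code that has just modified the index. */
    synchronized void invalidate() {
        cached = null;
    }
}
```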
- Original Message - 
From: Erik Hatcher [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Friday, August 20, 2004 12:02 PM
Subject: Re: lucene and ejb applications


Re: Debian build problem with 1.4.1


I don't understand this.  StandardTokenizer.java hasn't changed since  
last year.

I have packaged Lucene such that 'ant javacc' is called at package
build time. I now see the problem - 'import java.io.*;' has been
removed from StandardTokenizer.jj in Lucene 1.4.1.  When I put that
line back in, things build fine.

Now that I know what the problem is, I'll go ahead and patch the Debian 
package. Please make sure the Lucene codebase gets fixed as well.

Cheers,
Jeff


memory leak in lucene?

2004-08-20 Thread iouli . golovatyi


When running queries against Lucene I run into a memory problem: it looks
like memory is not given back after the query has been executed.

I use ParallelMultiSearcher and call its close method after the results are
displayed.

hits=null; // Hits class
if (ms!=null) ms.close(); //ParallelMultiSearcher

This doesn't help; the memory is not freed. On queries like No* memory
consumption grows by about 20-70 MB per query.
Imagine what happens with my web server...

I also tried from the command line and got a similar result.

Am I doing something wrong or missing something?

Please help. I use 1.4.1 on a Linux box.
Joel
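Before concluding there is a leak, it can help to separate memory that is actually retained from memory that simply has not been collected yet. The rough probe below uses only java.lang.Runtime and reports used heap after requesting garbage collection; System.gc() is only a hint, so the numbers are approximate. HeapProbe is a hypothetical name.

```java
/**
 * Rough diagnostic, not a profiler: used heap after requesting GC.
 */
final class HeapProbe {
    private HeapProbe() {}

    static long usedHeapAfterGc() {
        Runtime rt = Runtime.getRuntime();
        // System.gc() is only a request; calling it a few times makes a
        // more settled reading somewhat more likely.
        for (int i = 0; i < 3; i++) {
            System.gc();
        }
        return rt.totalMemory() - rt.freeMemory();
    }

    static double toMegabytes(long bytes) {
        return bytes / (1024.0 * 1024.0);
    }
}
```

Call usedHeapAfterGc() before the first query and again after every few queries: a value that keeps climbing after GC points to retained objects, while one that falls back suggests the growth was only garbage awaiting collection.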






Re: Debian build problem with 1.4.1


Ok, Lucene 1.4.1 has been uploaded to Debian. Hopefully it will have
enough time to percolate before the sarge release.

Now that that is taken care of, I'm curious about the status of gcj
compilation. Packaging Lucene as a native library might be useful for
projects such as PyLucene, and it is also advantageous for license
reasons i.e. avoiding the non-free JVM dependency. What's the current
gcj compilation recipe? The best I could find on Google (below) seems
a little bit stale.

http://www.mail-archive.com/[EMAIL PROTECTED]/msg04131.html

Cheers,
Jeff




Re: Debian build problem with 1.4.1

On Aug 20, 2004, at 12:36 PM, Jeff Breidenbach wrote:

I don't understand this.  StandardTokenizer.java hasn't changed since
last year.
I have packaged Lucene such that 'ant javacc' is called at package
build time. I now see the problem - 'import java.io.*;' has been
removed from StandardTokenizer.jj in Lucene 1.4.1.  When I put that
line back in, things build fine.
Now that I know what the problem is, I'll go ahead and patch the Debian
package. Please make sure the Lucene codebase gets fixed as well.
The codebase has been fixed, as of a couple of weeks ago :)
Erik

RE: continuous index updates

2004-08-20 Thread Crump, Michael

So the finalizer on the underlying reader closes file handles?

-Original Message-
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] 
Sent: Friday, August 20, 2004 2:41 PM
To: Lucene Users List
Subject: Re: continuous index updates

I just create a new IndexSearcher, leave the old IndexSearcher alone,
and JVM's garbage collection cleans it up.

Otis
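This approach amounts to publishing a new searcher and dropping the reference to the old one. A sketch with a generic T in place of IndexSearcher (SwappableSearcher is a hypothetical name): threads already using the old instance keep their reference, and the old object becomes eligible for garbage collection once none remain. Nothing is closed explicitly here, so when file handles are released depends on garbage collection and on the library, which is exactly the question Michael raises.

```java
import java.util.concurrent.atomic.AtomicReference;

/**
 * Sketch: atomically swap in a fresh instance after index updates and let
 * the old one be garbage-collected once no thread still uses it.
 */
final class SwappableSearcher<T> {
    private final AtomicReference<T> current;

    SwappableSearcher(T initial) {
        this.current = new AtomicReference<>(initial);
    }

    /** Each request should call this once and use the returned instance throughout. */
    T acquire() {
        return current.get();
    }

    /** Publishes a fresh instance; returns the replaced one (not closed here). */
    T refresh(T fresh) {
        return current.getAndSet(fresh);
    }
}
```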

--- Crump, Michael [EMAIL PROTECTED] wrote:

 Hello,

 I am currently working on a server app that will require the ability
 to make index additions/deletions at any time.  I want to cache/reuse
 index searchers and readers.  I know that once an index has changed,
 only newly opened readers will see the changes.  Creating a new reader
 to see the changes and caching it will be no problem.  My problem is
 that since this is a multithreaded app, other threads may be using the
 old readers, making it difficult to know when to close them.  I assume
 that a reader must be closed to free the associated resources.  I was
 thinking about using some kind of reference-counted reader that would
 keep track of its references and only truly close when there were no
 references left.

 Am I making this too difficult?

 Is there a better way?

 I assume others have had to do this using Lucene, do you have any
 recommendations?

 Regards,

 Michael
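Michael's reference-counting idea can be sketched in plain Java. This is a minimal sketch under stated assumptions, not Lucene API: the hypothetical `RefCountedResource` wraps any `java.io.Closeable`, whereas Lucene 1.4's `IndexSearcher` has a `close()` method but does not implement `Closeable`, so a small adapter would be needed. The cache that opens a searcher owns one reference; each search thread calls `acquire()` before searching and `release()` in a `finally` block afterwards, and the resource is closed only when the last reference goes away.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical reference-counted holder for a shared, closeable resource
// (for example an adapter around an IndexSearcher).
final class RefCountedResource<T extends Closeable> {
    private final T resource;
    private int refCount = 1;      // the initial reference belongs to the cache that opened it
    private boolean closed = false;

    RefCountedResource(T resource) {
        this.resource = resource;
    }

    // Call before using the resource; every acquire() needs a matching release().
    synchronized T acquire() {
        if (closed) {
            throw new IllegalStateException("resource already closed");
        }
        refCount++;
        return resource;
    }

    // Drops one reference and closes the resource when the count reaches zero.
    void release() {
        boolean closeNow;
        synchronized (this) {
            if (refCount <= 0) {
                throw new IllegalStateException("release() called too many times");
            }
            refCount--;
            closeNow = (refCount == 0);
            if (closeNow) {
                closed = true;
            }
        }
        if (closeNow) {
            try {
                resource.close();   // done outside the lock so other threads are not blocked
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    synchronized int refCount() {
        return refCount;
    }
}
```

When the index changes, the cache would open a new searcher, swap it in, and call `release()` on the old wrapper. The swap and each thread's `acquire()` should happen under the same lock (for example a synchronized getter on the cache); otherwise a thread could fetch the old wrapper just after its count reached zero and get an `IllegalStateException`. Unlike waiting for garbage collection, this closes the old file handles as soon as the last search using them finishes.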


Re: NegativeArraySizeException when creating a new IndexSearcher

2004-08-20 Thread Doug Cutting

Looks to me like you're using an older version of Lucene on your Linux
box.  The code is backward-compatible (Lucene 1.4 will read old indexes),
but Lucene 1.3 cannot read indexes created by Lucene 1.4, and it will
fail in the way you describe.

Doug
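One way to check Doug's diagnosis is to ask the JVM where it actually loaded the Lucene classes from. The helper below is a hypothetical sketch that uses only standard reflection (`Class.getProtectionDomain().getCodeSource()` and `Package.getImplementationVersion()`); the version string comes from the jar manifest and may be missing, in which case the jar path is the more useful clue.

```java
import java.security.CodeSource;

// Hypothetical diagnostic: reports where a class was loaded from and,
// if the jar manifest declares one, its Implementation-Version.
final class ClassOrigin {
    private ClassOrigin() {}

    static String describe(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        // Classes from the bootstrap class loader (e.g. java.lang.String) have no code source.
        String location = (source == null || source.getLocation() == null)
                ? "bootstrap/unknown"
                : source.getLocation().toString();
        Package pkg = type.getPackage();
        String version = (pkg == null || pkg.getImplementationVersion() == null)
                ? "not declared"
                : pkg.getImplementationVersion();
        return type.getName() + " loaded from " + location
                + " (Implementation-Version: " + version + ")";
    }
}
```

Logging `ClassOrigin.describe(org.apache.lucene.search.IndexSearcher.class)` from inside the web application on the Linux box would show whether an older Lucene jar elsewhere on the container's classpath is being picked up instead of lucene-1.4-rc2.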
Sven wrote:
Hi!
I have a problem porting a Lucene-based knowledge base from Windows to Linux.
On Windows it works fine, whereas on Linux I get a NegativeArraySizeException
when I try to initialise a new IndexSearcher to search the index. Deleting
and rebuilding the index didn't help. I checked permissions, file path and
lock_dir, but as far as I can tell they all seem fine. As I couldn't find
anyone else with the same problem, I guess I've overlooked something, but
I've run out of ideas. I use lucene-1.4-rc2 and Tomcat 5.0.18. Can someone
please help me with this, or does anyone have an idea?
Kind regards,
Sven
java.lang.NegativeArraySizeException
	at org.apache.lucene.index.TermInfosReader.readIndex(TermInfosReader.java:106)
	at org.apache.lucene.index.TermInfosReader.<init>(TermInfosReader.java:82)
	at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:141)
	at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:120)
	at org.apache.lucene.index.IndexReader$1.doBody(IndexReader.java:118)
	at org.apache.lucene.store.Lock$With.run(Lock.java:148)
	at org.apache.lucene.index.IndexReader.open(IndexReader.java:111)
	at org.apache.lucene.index.IndexReader.open(IndexReader.java:99)
	at org.apache.lucene.search.IndexSearcher.<init>(IndexSearcher.java:75)
	at com.sykon.knowledgebase.action.ListQueryResultAction.act(ListQueryResultAction.java:134)
	at org.apache.cocoon.components.treeprocessor.sitemap.ActTypeNode.invoke(ActTypeNode.java:159)
	at org.apache.cocoon.components.treeprocessor.sitemap.ActionSetNode.call(ActionSetNode.java:121)
	at org.apache.cocoon.components.treeprocessor.sitemap.ActSetNode.invoke(ActSetNode.java:98)
	at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:84)
	at org.apache.cocoon.components.treeprocessor.sitemap.PreparableMatchNode.invoke(PreparableMatchNode.java:165)
	at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:107)
	at org.apache.cocoon.components.treeprocessor.sitemap.PipelineNode.invoke(PipelineNode.java:162)
	at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:107)
	at org.apache.cocoon.components.treeprocessor.sitemap.PipelinesNode.invoke(PipelinesNode.java:136)
	at org.apache.cocoon.components.treeprocessor.TreeProcessor.process(TreeProcessor.java:371)
	at org.apache.cocoon.components.treeprocessor.TreeProcessor.process(TreeProcessor.java:312)
	at org.apache.cocoon.components.treeprocessor.sitemap.MountNode.invoke(MountNode.java:133)
	at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:84)
	at org.apache.cocoon.components.treeprocessor.sitemap.PreparableMatchNode.invoke(PreparableMatchNode.java:165)
	at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:107)
	at org.apache.cocoon.components.treeprocessor.sitemap.PipelineNode.invoke(PipelineNode.java:162)
	at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:107)
	at org.apache.cocoon.components.treeprocessor.sitemap.PipelinesNode.invoke(PipelinesNode.java:136)
	at org.apache.cocoon.components.treeprocessor.TreeProcessor.process(TreeProcessor.java:371)
	at org.apache.cocoon.components.treeprocessor.TreeProcessor.process(TreeProcessor.java:312)
	at org.apache.cocoon.Cocoon.process(Cocoon.java:656)
	at org.apache.cocoon.servlet.CocoonServlet.service(CocoonServlet.java:1112)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:856)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:284)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:204)
	at org.apache.catalina.core.ApplicationDispatcher.invoke(ApplicationDispatcher.java:742)
	at org.apache.catalina.core.ApplicationDispatcher.processRequest(ApplicationDispatcher.java:506)
	at org.apache.catalina.core.ApplicationDispatcher.doForward(ApplicationDispatcher.java:443)
	at org.apache.catalina.core.ApplicationDispatcher.forward(ApplicationDispatcher.java:359)
	at org.apache.jasper.runtime.PageContextImpl.doForward(PageContextImpl.java:712)
	at org.apache.jasper.runtime.PageContextImpl.forward(PageContextImpl.java:682)
	at org.apache.jsp.knowlegebase.controller_jsp._jspService(controller_jsp.java:844)
	at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:133)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:856)
	at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:311)
	at

Lucene with English and Spanish Best Practice?

2004-08-20 Thread Chad Small

Hello,

I'm interested in feedback from anyone who has worked through implementing
internationalized (I18N) search with Lucene or has ideas for this requirement.
Currently, we're using Lucene with English only and are looking to add Spanish
to the mix (with maybe more languages to follow).

This is our current IndexWriter setup utilizing the PerFieldAnalyzerWrapper:

   PerFieldAnalyzerWrapper analyzer = new PerFieldAnalyzerWrapper(new StandardAnalyzer());
   analyzer.addAnalyzer(FIELD_TITLE_STARTS_WITH, new WhitespaceAnalyzer());
   analyzer.addAnalyzer(FIELD_CATEGORY, new WhitespaceAnalyzer());
   IndexWriter writer = new IndexWriter(indexDir, analyzer, create);

Would people suggest we switch this over to Snowball so there are English and Spanish 
Analyzers and IndexWriters?  Something like this:

PerFieldAnalyzerWrapper analyzerEnglish = new PerFieldAnalyzerWrapper(new SnowballAnalyzer("English"));
analyzerEnglish.addAnalyzer(FIELD_TITLE_STARTS_WITH, new WhitespaceAnalyzer());
analyzerEnglish.addAnalyzer(FIELD_CATEGORY, new WhitespaceAnalyzer());
// One directory per language (indexDirEnglish/indexDirSpanish are hypothetical names):
// two IndexWriters cannot hold the write lock on the same directory at the same time.
IndexWriter writerEnglish = new IndexWriter(indexDirEnglish, analyzerEnglish, create);

PerFieldAnalyzerWrapper analyzerSpanish = new PerFieldAnalyzerWrapper(new SnowballAnalyzer("Spanish"));
analyzerSpanish.addAnalyzer(FIELD_TITLE_STARTS_WITH, new WhitespaceAnalyzer());
analyzerSpanish.addAnalyzer(FIELD_CATEGORY, new WhitespaceAnalyzer());
IndexWriter writerSpanish = new IndexWriter(indexDirSpanish, analyzerSpanish, create);


Are multiple indexes, or mirrors of each index, then usually created for every language?
We currently have 4 indexes that are all English.  Would we then create 4 more that
are Spanish?  Then at search time we would determine the language and which set of
indexes to search against, English or Spanish.

Or another approach could be to add a Spanish field to the existing 4 indexes since 
most of the indexes have only one field that will be translated from English to 
Spanish.


thanks a bunch,
chad.
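Chad's second option (one index, with an extra Spanish field) mostly comes down to a field-naming convention plus one analyzer per language field. The sketch below is hypothetical: the `_en`/`_es` suffix scheme, the class name and the English fallback are assumptions, and only the Snowball stemmer names "English" and "Spanish" correspond to what `SnowballAnalyzer` expects.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical naming scheme for "one index, one field per language":
// a translated field "title" becomes "title_en", "title_es", and so on.
final class LanguageFields {
    // Maps ISO 639 language codes to Snowball stemmer names.
    private static final Map<String, String> SNOWBALL_STEMMERS = new HashMap<>();
    static {
        SNOWBALL_STEMMERS.put("en", "English");
        SNOWBALL_STEMMERS.put("es", "Spanish");
    }

    private LanguageFields() {}

    // Field that holds the text for the given language, e.g. "title_es".
    static String fieldFor(String baseField, Locale locale) {
        return baseField + "_" + locale.getLanguage();
    }

    // Snowball stemmer name for the language; unknown languages fall back to English.
    static String stemmerFor(Locale locale) {
        String name = SNOWBALL_STEMMERS.get(locale.getLanguage());
        return name != null ? name : "English";
    }
}
```

At index time each suffixed field would be registered on the `PerFieldAnalyzerWrapper`, e.g. `analyzer.addAnalyzer(LanguageFields.fieldFor(FIELD_TITLE, locale), new SnowballAnalyzer(LanguageFields.stemmerFor(locale)))`, where `FIELD_TITLE` is a hypothetical constant. At search time the query would target the field for the user's language and be parsed with the same analyzer, so stemming is applied identically on both sides.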



Re: Debian build problem with 1.4.1

2004-08-20 Thread Doug Cutting

I can successfully use gcc 3.4.0 with Lucene as follows:
ant jar jar-demo
gcj -O3 build/lucene-1.5-rc1-dev.jar build/lucene-demos-1.5-rc1-dev.jar \
  -o indexer --main=org.apache.lucene.demo.IndexHTML

./indexer -create docs
It runs pretty snappy too!  However, I don't know if there's much mileage
in packaging Lucene as a native library.  It's easy enough for folks to
compile Lucene this way, and applications built this way are pretty
small.  The big thing to install is libgcj.

Doug
Jeff Breidenbach wrote:
Ok, Lucene 1.4.1 has been uploaded to Debian. Hopefully it will have
enough time to percolate before the sarge release.
Now that that is taken care of, I'm curious about the status of gcj
compilation. Packaging Lucene as a native library might be useful for
projects such as PyLucene, and it is also advantageous for licensing
reasons, i.e. avoiding the non-free JVM dependency. What's the current
gcj compilation recipe? The best I could find on Google (below) seems
a little bit stale.
http://www.mail-archive.com/[EMAIL PROTECTED]/msg04131.html
Cheers,
Jeff


speeding up queries (MySQL faster)

2004-08-20 Thread Yonik Seeley

Hi,

I'm trying to figure out how to speed up queries to a large index.
I'm currently getting 133 req/sec, which isn't bad, but isn't close
to MySQL, which is getting 500 req/sec on the same hardware with the
same set of documents.

Setup info and stats:
- 4.3M documents, 12 keyword fields per document, 11 unindexed fields per document
- Lucene index size on disk = 1.3G
- Hardware: dual Opteron w/ 16GB memory, running a 64-bit JVM (Sun 1.5 beta)
- Lucene version 1.4.1
- Hitting a multithreaded server w/ 10 clients at once
- This is a read-only index... no updating is done
- Single IndexSearcher that is reused for all requests

Q1) While hitting it with multiple queries at once, Lucene is pegged
at 50% CPU usage (meaning it is only using 1 out of 2 CPUs on
average).  I took a thread dump, and all of the Lucene threads except
one are blocked on reading a file (see trace below).  I could create
two index readers, but that seems like it might be a waste, and like
fixing a symptom instead of the root problem.  Would multiple
IndexSearchers or IndexReaders share internal caches?  Is there a way
to cache more info at a higher level such that it would get rid of
this bottleneck?  The JVM isn't taking up much space (125M or so), and
I have 16GB to work with!  The OS (Linux) is obviously caching the
index file, but that doesn't get rid of the synchronization issues and
the overhead of re-reading.
How is caching in Lucene configured?  Does it internally use
FieldCache, or do I have to use that somehow myself?
 
tcpConnection-8080-72 daemon prio=1 tid=0x002b24412490 nid=0x34a4 waiting for monitor entry [0x45aba000..0x45abb2d0]
	at org.apache.lucene.index.CompoundFileReader$CSInputStream.readInternal(CompoundFileReader.java:215)
	- waiting to lock <0x002ae153fa00> (a org.apache.lucene.store.FSInputStream)
	at org.apache.lucene.store.InputStream.refill(InputStream.java:158)
	at org.apache.lucene.store.InputStream.readByte(InputStream.java:43)
	at org.apache.lucene.store.InputStream.readVInt(InputStream.java:83)
	at org.apache.lucene.index.SegmentTermDocs.skipTo(SegmentTermDocs.java:176)
	at org.apache.lucene.search.TermScorer.skipTo(TermScorer.java:88)
	at org.apache.lucene.search.ConjunctionScorer.doNext(ConjunctionScorer.java:53)
	at org.apache.lucene.search.ConjunctionScorer.next(ConjunctionScorer.java:48)
	at org.apache.lucene.search.Scorer.score(Scorer.java:37)
	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:92)
	at org.apache.lucene.search.Hits.getMoreDocs(Hits.java:64)
	at org.apache.lucene.search.Hits.<init>(Hits.java:43)
	at org.apache.lucene.search.Searcher.search(Searcher.java:33)
	at org.apache.lucene.search.Searcher.search(Searcher.java:27)
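In the trace, each blocked thread is waiting on the lock of one shared `FSInputStream`, which is why a second reader is tempting. If that route is taken, spreading requests across a few independently opened searchers is simple. The sketch below is hypothetical and generic over the pooled type; it assumes that each separately opened searcher gets its own file handles, so threads using different searchers do not contend for the same stream lock.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical pool that hands out N pre-opened instances in round-robin order.
final class RoundRobinPool<T> {
    private final List<T> instances;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinPool(int size, Supplier<T> opener) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        List<T> list = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            list.add(opener.get()); // each call opens an independent instance
        }
        this.instances = Collections.unmodifiableList(list);
    }

    // Returns the next instance; safe to call from many threads at once.
    T next() {
        // floorMod keeps the index non-negative even after the counter overflows.
        int i = Math.floorMod(next.getAndIncrement(), instances.size());
        return instances.get(i);
    }

    int size() {
        return instances.size();
    }
}
```

With Lucene, the opener could wrap `new IndexSearcher(indexPath)` in a try/catch that rethrows the `IOException` as an `UncheckedIOException`, because `Supplier` cannot throw checked exceptions (`indexPath` is a placeholder). The trade-off is that each searcher keeps its own open files and its own in-memory data, such as the loaded term index.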


Even using only 1 CPU, though, MySQL is faster.  Here is what the
queries look like:

field1:4 AND field2:188453 AND field3:1

field1:4 done alone selects around 4.2M records
field2:188453 done alone selects around 1.6M records
field3:1 done alone selects around 1K records
The whole query normally selects fewer than 50 records.  Only the
first 10 are returned (or whatever range the client selects).

The fields are all keywords checked for exact matches (no full-text
search is done).  Is there anything I can do to speed these queries
up, or is the structure just better suited to MySQL (and not an
inverted index)?

How is a query like this carried out?
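The stack trace above hints at the answer: `ConjunctionScorer` repeatedly calls `skipTo()` on each term's scorer. The idea can be modeled on in-memory sorted arrays of document numbers, one per term: advance every list to the largest current candidate, and record a hit when all lists land on the same document. This is a simplified, hypothetical model, not Lucene's code; binary search stands in for Lucene's on-disk skip data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Simplified, hypothetical model of an AND query over per-term postings lists.
// Each array holds the ascending document numbers that contain one term.
final class Conjunction {
    private Conjunction() {}

    // Index of the first entry at or after 'from' whose doc number is >= target
    // (binary search stands in for Lucene's on-disk skip data).
    static int skipTo(int[] postings, int from, int target) {
        int lo = from, hi = postings.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (postings[mid] < target) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    static List<Integer> intersect(int[]... lists) {
        List<Integer> hits = new ArrayList<>();
        if (lists.length == 0) {
            return hits;
        }
        // Visiting the shortest (rarest-term) list first only affects speed, not results.
        int[][] sorted = lists.clone();
        Arrays.sort(sorted, Comparator.comparingInt(list -> list.length));
        int[] pos = new int[sorted.length];
        int target = 0; // document numbers are non-negative
        while (true) {
            boolean allMatch = true;
            for (int k = 0; k < sorted.length; k++) {
                pos[k] = skipTo(sorted[k], pos[k], target);
                if (pos[k] == sorted[k].length) {
                    return hits; // one list is exhausted: no further matches possible
                }
                int doc = sorted[k][pos[k]];
                if (doc > target) {
                    target = doc; // leapfrog: every list must catch up to this doc
                    allMatch = false;
                }
            }
            if (allMatch) {
                hits.add(target); // every list is positioned on the same document
                target++;
            }
        }
    }
}
```

Because every step jumps ahead to the current maximum, the number of rounds is bounded by the rarest term (field3:1 in the example) rather than by the 4.2M documents matching field1:4, although each jump through a long on-disk list still costs some reads.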

Any help would be greatly appreciated.  There's not a lot of info on
searching (much more on updating).  I'm looking forward to Lucene in
Action!  Too bad it's not out till October.

-Yonik




Custom filter

2004-08-20 Thread roy-lucene-user

Hi guys!

I was hoping someone here could help me out with a custom filter.

We have an index of emails and do some searches on the text of an email message and 
also searches based on the email addresses in a To, From or CC.

Since we also search on several email addresses at once, we created a custom filter for
searches on an array of fields for an array of values.  [code included below]

The problem we're having is that creating a query string like so:
Message:viagra AND (From:(email1 OR email2) OR To:(email1 OR email2) OR CC:(email1 OR email2))
would return results, but our filter combined with a query string of Message:viagra
sometimes wouldn't.

One thing I noticed is that when the results do return with the filter, the email has
the format of [EMAIL PROTECTED], but the one that doesn't has something like [EMAIL PROTECTED]

Also it might have something to do with the storage of the From or To or CC.  We don't
parse out the email addresses before storing them.  So sometimes the value of a
From/To/CC field might be [EMAIL PROTECTED] or local [EMAIL PROTECTED] or even
[EMAIL PROTECTED].  Could the angle brackets be throwing off my filter?

I also wouldn't mind any suggestions for improving this filter.

Here is the bits method from our custom filter:
-
final public BitSet bits( IndexReader reader ) throws IOException {
    BitSet bits = new BitSet( reader.maxDoc() );

    // mark every document that contains any (field, value) pair
    for ( int x = 0; x < fields.length; x++ ) {
        for ( int y = 0; y < values.length; y++ ) {
            TermDocs termDocs = reader.termDocs( new Term( fields[x], values[y] ) );
            try {
                while ( termDocs.next() ) {
                    bits.set( termDocs.doc() );
                }
            }
            finally {
                termDocs.close();
            }
        }
    }
    return bits;
}
-

Thanks in advance,

Roy.
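Because the filter looks up exact `Term`s, a document only matches if the stored field value is identical to the value passed in, so unparsed header values (a display name followed by an address in angle brackets) could explain the misses described above. One hedged fix is to normalize addresses the same way at index time and when building the filter's `values` array. The sketch assumes one address per field value and untokenized (keyword) fields; the class name is hypothetical, and lower-casing is a design choice, since the local part of an address is technically case-sensitive.

```java
import java.util.Locale;

// Hypothetical normalizer for raw From/To/CC header values.
final class EmailAddresses {
    private EmailAddresses() {}

    // Extracts the bare address from values like "Roy <Roy@Example.com>"
    // and lower-cases it, so index-time and filter-time terms agree.
    static String normalize(String raw) {
        String s = raw.trim();
        int open = s.lastIndexOf('<');
        int close = s.lastIndexOf('>');
        if (open >= 0 && close > open) {
            s = s.substring(open + 1, close);
        }
        return s.trim().toLowerCase(Locale.ROOT);
    }
}
```

A header with several recipients would first need to be split into individual addresses, each added as its own value of the field.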


Re: Custom filter

Have you considered using the built-in QueryFilter for this?   Why 
isn't it sufficient for your needs?

Erik
On Aug 20, 2004, at 6:32 PM, [EMAIL PROTECTED] wrote:
Hi guys!
I was hoping someone here could help me out with a custom filter.
We have an index of emails and do some searches on the text of an 
email message and also searches based on the email addresses in a To, 
From or CC.

Since we also search on several email addresses at once, we created a custom
filter for searches on an array of fields for an array of values.
[code included below]

The problem we're having is that creating a query string like so:
Message:viagra AND (From:(email1 OR email2) OR To:(email1 OR email2) OR CC:(email1 OR email2))
would return results, but our filter combined with a query string of
Message:viagra sometimes wouldn't.

One thing I noticed is that when the results do return with the
filter, the email has the format of [EMAIL PROTECTED], but the
one that doesn't has something like [EMAIL PROTECTED]

Also it might have something to do with the storage of the From or To
or CC.  We don't parse out the email addresses before storing them.
So sometimes the value of a From/To/CC field might be
[EMAIL PROTECTED] or local [EMAIL PROTECTED] or even
[EMAIL PROTECTED].  Could the angle brackets be throwing off my filter?

I also wouldn't mind any suggestions for improving this filter.
Here is the bits method from our custom filter:
-
final public BitSet bits( IndexReader reader ) throws IOException {
    BitSet bits = new BitSet( reader.maxDoc() );
    for ( int x = 0; x < fields.length; x++ ) {
        for ( int y = 0; y < values.length; y++ ) {
            TermDocs termDocs = reader.termDocs( new Term( fields[x], values[y] ) );
            try {
                while ( termDocs.next() ) {
                    bits.set( termDocs.doc() );
                }
            }
            finally {
                termDocs.close();
            }
        }
    }
    return bits;
}
-

Thanks in advance,
Roy.

Re: Debian build problem with 1.4.1


It's easy enough for folks to compile Lucene this way

I'm having trouble; the warnings and error messages are appended below. This
is for Lucene 1.4.1. One of the few Debian-specific changes was to name the
jarball 1.4 instead of the default 1.5-rc1-dev designation in
build.xml.

rode:~ gcj --version
gcj (GCC) 3.3.4 (Debian 1:3.3.4-9)

rode:~ gcj build/lucene-1.4.jar build/lucene-demos-1.4.jar -o indexer \
   --main=org.apache.lucene.demo.IndexHTML  /tmp/log.txt

 and applications built this way are pretty small.  The big thing to
 install is libgcj.

I'm potentially interested in C applications calling a gcj-compiled Lucene
native library, but that would be in the distant future, if at all. Right
now, just compiling a working Lucene app with gcj would be pretty cool.

Cheers,
Jeff



=


org/apache/lucene/analysis/de/WordlistLoader.java: In class `org.apache.lucene.analysis.de.WordlistLoader':
org/apache/lucene/analysis/de/WordlistLoader.java: In method `org.apache.lucene.analysis.de.WordlistLoader.getWordSet(java.io.File)':
org/apache/lucene/analysis/de/WordlistLoader.java:47: warning: exception handler inside code that is being protected
CompoundFileReader.java: In class `org.apache.lucene.index.CompoundFileReader$CSInputStream':
CompoundFileReader.java: In method `org.apache.lucene.index.CompoundFileReader$CSInputStream.readInternal(byte[],int,int)':
CompoundFileReader.java:215: warning: exception handler inside code that is being protected
org/apache/lucene/index/CompoundFileReader.java: In class `org.apache.lucene.index.CompoundFileReader':
org/apache/lucene/index/CompoundFileReader.java: In constructor `(org.apache.lucene.store.Directory,java.lang.String)':
org/apache/lucene/index/CompoundFileReader.java:51: warning: exception handler inside code that is being protected
org/apache/lucene/index/CompoundFileWriter.java: In class `org.apache.lucene.index.CompoundFileWriter':
org/apache/lucene/index/CompoundFileWriter.java: In method `org.apache.lucene.index.CompoundFileWriter.close()':
org/apache/lucene/index/CompoundFileWriter.java:127: warning: exception handler inside code that is being protected
org/apache/lucene/index/CompoundFileWriter.java: In method `org.apache.lucene.index.CompoundFileWriter.copyFile(org.apache.lucene.index.CompoundFileWriter$FileEntry,org.apache.lucene.store.OutputStream,byte[])':
org/apache/lucene/index/CompoundFileWriter.java:194: warning: exception handler inside code that is being protected
org/apache/lucene/index/DocumentWriter.java: In class `org.apache.lucene.index.DocumentWriter':
org/apache/lucene/index/DocumentWriter.java: In method `org.apache.lucene.index.DocumentWriter.addDocument(java.lang.String,org.apache.lucene.document.Document)':
org/apache/lucene/index/DocumentWriter.java:60: warning: exception handler inside code that is being protected
org/apache/lucene/index/DocumentWriter.java: In method `org.apache.lucene.index.DocumentWriter.invertDocument(org.apache.lucene.document.Document)':
org/apache/lucene/index/DocumentWriter.java:117: warning: exception handler inside code that is being protected
org/apache/lucene/index/DocumentWriter.java: In method `org.apache.lucene.index.DocumentWriter.writePostings(org.apache.lucene.index.Posting[],java.lang.String)':
org/apache/lucene/index/DocumentWriter.java:250: warning: exception handler inside code that is being protected
org/apache/lucene/index/DocumentWriter.java: In method `org.apache.lucene.index.DocumentWriter.writeNorms(org.apache.lucene.document.Document,java.lang.String)':
org/apache/lucene/index/DocumentWriter.java:320: warning: exception handler inside code that is being protected
org/apache/lucene/index/FieldInfos.java: In class `org.apache.lucene.index.FieldInfos':
org/apache/lucene/index/FieldInfos.java: In constructor `(org.apache.lucene.store.Directory,java.lang.String)':
org/apache/lucene/index/FieldInfos.java:36: warning: exception handler inside code that is being protected
org/apache/lucene/index/FieldInfos.java: In method `org.apache.lucene.index.FieldInfos.write(org.apache.lucene.store.Directory,java.lang.String)':
org/apache/lucene/index/FieldInfos.java:172: warning: exception handler inside code that is being protected
org/apache/lucene/index/IndexReader.java: In class `org.apache.lucene.index.IndexReader':
org/apache/lucene/index/IndexReader.java: In method `org.apache.lucene.index.IndexReader.open(org.apache.lucene.store.Directory,boolean)':
org/apache/lucene/index/IndexReader.java:110: warning: exception handler inside code that is being protected
org/apache/lucene/index/IndexReader.java: In method `org.apache.lucene.index.IndexReader.delete(org.apache.lucene.index.Term)':
org/apache/lucene/index/IndexReader.java:449: warning: exception handler inside code that is being protected
org/apache/lucene/index/IndexReader.java: In method `org.apache.lucene.index.IndexReader.commit()':
org/apache/lucene/index/IndexReader.java:480: warning: exception handler

Re: Custom filter