Query Analyzer

2005-02-07 Thread Ravi
How do I set the analyzer when I build the query in my code instead of
using a query parser?

Thanks in advance
Ravi. 



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



RE: Query Analyzer

2005-02-07 Thread Ravi
That worked. Thanks a lot. 

-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED] 
Sent: Monday, February 07, 2005 11:39 AM
To: Lucene Users List
Subject: Re: Query Analyzer


On Feb 7, 2005, at 11:29 AM, Ravi wrote:

 How do I set the analyzer when I build the query in my code instead of
 using a query parser ?

You don't.  All terms you use for any Query subclasses you instantiate 
must match exactly the terms in the index.  If you need an analyzer to 
do this then you're responsible for doing it yourself, just as 
QueryParser does underneath.  I do this myself in my current 
application like this:

  private Query createPhraseQuery(String fieldName, String string,
                                  boolean lowercase) {
    RossettiAnalyzer analyzer = new RossettiAnalyzer(lowercase);
    TokenStream stream = analyzer.tokenStream(fieldName,
                                              new StringReader(string));

    PhraseQuery pq = new PhraseQuery();
    Token token;
    try {
      while ((token = stream.next()) != null) {
        pq.add(new Term(fieldName, token.termText()));
      }
    } catch (IOException ignored) {
      // ignore - shouldn't get an IOException on a StringReader
    }

    if (pq.getTerms().length == 1) {
      // optimize single term phrase to TermQuery
      return new TermQuery(pq.getTerms()[0]);
    }

    return pq;
  }

Hope that helps.

Erik





IndexSearcher close

2005-02-01 Thread Ravi
Is there a way to check if an IndexSearcher is closed? 


Thanks in advance,
Ravi.






RE: How to get document count?

2005-02-01 Thread Ravi
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/IndexWriter.html#docCount()

You can try this.

-Original Message-
From: Luke Shannon [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, February 01, 2005 11:33 AM
To: Lucene Users List
Subject: Re: How to get document count?

Not sure if the API provides a method for this, but you could use Luke:

http://www.getopt.org/luke/

It gives you a count and lets you step through each Doc looking at their
fields.

- Original Message - 
From: Jim Lynch [EMAIL PROTECTED]
To: Lucene Users List lucene-user@jakarta.apache.org
Sent: Tuesday, February 01, 2005 11:28 AM
Subject: How to get document count?


 I've indexed a large set of documents and think that something may have
 gone wrong somewhere in the middle.  Is there a way I can display the
 count of documents in the index?

 Thanks,
 Jim.




Stopwords in phrases

2004-12-21 Thread Ravi
I want to be able to use stopwords in exact phrase searches. I have
looked at Nutch and used the same approach (replace common words with
n-grams; see net.nutch.analysis.CommonGrams).

So if "to", "be", "or" and "not" are stop words, then for the string
"to be or not to be" the analyzer produces the following tokens:

[to-be, to-be-or, to-be-or-not, to-be-or-not-to, to-be-or-not-to-be,
be-or, be-or-not, be-or-not-to, be-or-not-to-be, or-not, or-not-to,
or-not-to-be, not-to, not-to-be, to-be]

This is exactly what I wanted from the analyzer during indexing, but
I'm having a problem with the search. When I search on "not to be", the
analyzer converts my search into

  content:"not-to not-to-be to-be"

because it produces the tokens not-to, not-to-be and to-be. I get 0
results for this, as there is no token "not-to not-to-be to-be" in the
index.

I want just not-to-be from the analyzer during the search, so that when
I search on "not to be" I get the document that has not-to-be as a
token.

How can I use the same analyzer to get different results in indexing
and searching?

Thanks in advance,
Ravi. 
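
One possible way to think about the question above, sketched in plain Java rather than as a real Lucene Analyzer: the index side emits every gram, while the query side emits only the single longest gram for the phrase. This assumes the whole phrase consists of stop words; the class and method names are hypothetical, and wiring this into two analyzer modes is left open.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch only: a plain-Java model of the gram tokens, not a Lucene Analyzer. */
final class StopwordGrams {

    /** Index side: every run of two or more consecutive words, joined by '-'. */
    static List<String> allGrams(List<String> words) {
        List<String> grams = new ArrayList<>();
        for (int start = 0; start < words.size(); start++) {
            StringBuilder gram = new StringBuilder(words.get(start));
            for (int end = start + 1; end < words.size(); end++) {
                gram.append('-').append(words.get(end));
                grams.add(gram.toString());
            }
        }
        return grams;
    }

    /** Query side: only the one gram that covers the whole phrase. */
    static String longestGram(List<String> words) {
        return String.join("-", words);
    }

    public static void main(String[] args) {
        System.out.println(allGrams(Arrays.asList("to", "be", "or", "not", "to", "be")));
        System.out.println(longestGram(Arrays.asList("not", "to", "be")));
    }
}
```

With this split, a search for the phrase would look up one term (not-to-be), which is among the tokens the index side produced.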




RE: Stopwords in phrases

2004-12-21 Thread Ravi

 Are you also using the position increment of 0 for the gram tokens
 like Nutch does?

Yes.

I don't think considering only gram tokens will work for me because
Nutch uses only bi-grams. It can only have one gram per token. In my
case I have more than one and even if I get only the grams, I still will
have the same problem. 

Ravi.






No of docs using IndexSearcher

2004-12-10 Thread Ravi
 How do I get the number of docs in an index if I just have access to a
searcher on that index?

Thanks in advance
Ravi.




RE: No of docs using IndexSearcher

2004-12-10 Thread Ravi
I'm fairly new to Lucene.  The main reason why I didn't use the
IndexReader constructor for the searcher is that we organize the indexes
as different partitions depending on the document's date, and during
searching I instantiate a MultiSearcher object on these partitions
depending on the from-date and to-date of the search. I was getting a
runtime exception during search if an index did not have any
documents. That's why I was looking for some method on the searcher
object that gives me the number of documents.

Thanks
Ravi


-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] 
Sent: Friday, December 10, 2004 3:25 PM
To: Lucene Users List
Subject: Re: No of docs using IndexSearcher

If your index is open, shouldn't there be an instance of IndexReader
already there?


Ravi said the following on 12/10/2004 3:13 PM:

I already have a field with a constant value in my index. How about
using IndexSearcher.docFreq(new Term(field,value))? Then I don't have to
instantiate IndexReader.

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] 
Sent: Friday, December 10, 2004 2:59 PM
To: Lucene Users List
Subject: Re: No of docs using IndexSearcher

numDocs()

http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/IndexReader.html#numDocs()



Ravi said the following on 12/10/2004 2:42 PM:

 How do I get the number of docs in an index if I just have access to a
 searcher on that index?

 Thanks in advance
 Ravi.




MultiSearcher close

2004-12-10 Thread Ravi
 If I close a MultiSearcher, does it close all the associated searchers
too? I was getting a bad file descriptor error when I closed the
MultiSearcher object and opened it again for another search without
reinstantiating the underlying searchers.

Thanks in advance,
Ravi



RE: Retrieving all docs in the index

2004-12-09 Thread Ravi
I'm sorry, I don't think I articulated my question well. We use a date
filter to sort the search results. This works fine when the user provides
some search criteria. But if he gives empty search criteria, we need
to return all the documents in the index in the given date range, sorted
by date. So I was looking for a query that returns all documents in
the index, to which I can then apply the date filter.


-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED] 
Sent: Thursday, December 09, 2004 1:55 PM
To: Lucene Users List
Subject: Re: Retrieving all docs in the index

On Dec 9, 2004, at 1:35 PM, Ravi wrote:
  Is there any other way to extract all documents from an index apart 
 from adding an additional field with the same value to all documents 
 and then doing a term query on that field with the common value?

Of course.  Have a look at the IndexReader API.

Erik





RE: Retrieving all docs in the index

2004-12-09 Thread Ravi
Thanks Paul. I think I'll go with the first approach (adding a new
field).  

-Original Message-
From: Paul Elschot [mailto:[EMAIL PROTECTED] 
Sent: Thursday, December 09, 2004 3:49 PM
To: [EMAIL PROTECTED]
Subject: Re: Retrieving all docs in the index

On Thursday 09 December 2004 21:18, Ravi wrote:
 That was exactly my original question. I was wondering if there are 
 alternatives to this approach.

In case you need only a few of the top ranking documents, and the
documents are to be sorted by date anyway, you might consider searching
each of the dates separately, in sorted order, until you have enough
results.

In that way there is no need to use a field with some constant value.
Nonetheless, I can recommend having a special field containing all the
field names for a document. As all docs normally contain a primary key,
the name of the primary key field can serve as the constant value.

Regards,
Paul Elschot
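
The date-by-date idea above could look roughly like the following. This is a plain-Java sketch of the control flow only: searchOneDate is a hypothetical stand-in for whatever per-date lookup the application uses (for example a term query on a date field), and the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

final class DateOrderedSearch {

    /**
     * Visits the dates in the given (already sorted) order and stops as soon
     * as 'wanted' results have been collected.
     * searchOneDate is a hypothetical stand-in for a per-date index lookup.
     */
    static List<String> collect(List<String> sortedDates,
                                Function<String, List<String>> searchOneDate,
                                int wanted) {
        List<String> results = new ArrayList<>();
        for (String date : sortedDates) {
            if (results.size() >= wanted) {
                break; // enough results: later dates are never searched
            }
            for (String docId : searchOneDate.apply(date)) {
                if (results.size() >= wanted) {
                    break;
                }
                results.add(docId);
            }
        }
        return results;
    }
}
```

The point of the design is that dates after the cut-off are never queried at all, so no constant-valued field is needed to enumerate the whole index.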

 
 -Original Message-
 From: Aviran [mailto:[EMAIL PROTECTED]
 Sent: Thursday, December 09, 2004 2:08 PM
 To: 'Lucene Users List'
 Subject: RE: Retrieving all docs in the index
 
 In this case you'll have to add another field with a fixed value to 
 all the documents and query on that field
 
 
 Aviran
 http://www.aviransplace.com
 



RE: Index delete failing

2004-12-08 Thread Ravi
I got this working. I had to close all index searchers and the writer on
the index, set them to null and call System.gc() before the delete
process. I think Windows still thinks the writer and searchers are
pointing to the index directory even after you close them.

Ravi
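
When a delete fails like this, it can help to see exactly which files could not be removed. Below is a minimal plain-Java sketch, assuming every searcher and writer on the index has already been closed; it does not release any handles itself, and the class and method names are hypothetical.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

final class IndexDirCleanup {

    /**
     * Deletes a directory tree bottom-up and returns the files that could
     * not be removed (for example because a handle on them is still open).
     */
    static List<File> deleteTree(File root) {
        List<File> failed = new ArrayList<>();
        File[] children = root.listFiles(); // null when root is not a directory
        if (children != null) {
            for (File child : children) {
                failed.addAll(deleteTree(child));
            }
        }
        if (root.exists() && !root.delete()) {
            failed.add(root);
        }
        return failed;
    }
}
```

An empty return value means the whole directory is gone; otherwise the list names the files that are still held.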

-Original Message-
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] 
Sent: Monday, December 06, 2004 4:48 PM
To: Lucene Users List
Subject: Re: Index delete failing

This smells like a Windows issue.  It is possible that something in your
JVM is still holding onto the index directory (for example,
FSDirectory), and Winblows is not letting you remove the directory.  I
bet this will work if you exit the JVM and run java.io.File.delete()
without calling Lucene.  Sorry, my Windows + Lucene experience is
limited.

Otis

--- Ravi [EMAIL PROTECTED] wrote:

  Hi
  We need to delete a Lucene index from our application using
 java.io.File.delete(). We are closing the IndexWriter and even all the
 index searchers on that folder. But a call to delete returns false.
 There is no lock on the index directory. Interesting thing is that the
 deletable and segments files are getting removed. But the rest of the
 .cfs are not. Has somebody had a similar problem?
 
 Thanks in advance,
 Ravi.
 



Index delete failing

2004-12-06 Thread Ravi
 Hi
 We need to delete a Lucene index from our application using
java.io.File.delete(). We are closing the IndexWriter and even all the
index searchers on that folder. But a call to delete returns false.
There is no lock on the index directory. Interesting thing is that the
deletable and segments files are getting removed. But the rest of the
.cfs are not. Has somebody had a similar problem?

Thanks in advance,
Ravi. 




RE: Index delete failing

2004-12-06 Thread Ravi
Yep, it works if I exit the JVM and run file.delete() from a different
class without using Lucene.

-Original Message-
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] 
Sent: Monday, December 06, 2004 4:48 PM
To: Lucene Users List
Subject: Re: Index delete failing

This smells like a Windows issue.  It is possible that something in your
JVM is still holding onto the index directory (for example,
FSDirectory), and Winblows is not letting you remove the directory.  I
bet this will work if you exit the JVM and run java.io.File.delete()
without calling Lucene.  Sorry, my Windows + Lucene experience is
limited.

Otis




mergeFactor maxMergeDocs

2004-11-18 Thread Ravi
How do you set mergeFactor on an IndexWriter object? I tried the way it
was mentioned in this article
(http://www.onjava.com/pub/a/onjava/2003/03/05/lucene.html):

writer.mergeFactor = 1000;

This did not work for me. I then tried setting the
org.apache.lucene.mergeFactor property, and that worked: Lucene
created a new segment for every 1000 documents. Then I wanted to test
maxMergeDocs using setProperty, and set its value to 5000. With the
default maxMergeDocs, when I have 10,000 documents, Lucene merges the 10
segments into one large segment. I want to set maxMergeDocs to 5000 so
that Lucene does not do that merge when it has 10,000 documents. But that
did not work. Can somebody explain how to set these properties?

Thanks in advance,
Ravi.   

-Original Message-
From: Luke Shannon [mailto:[EMAIL PROTECTED] 
Sent: Thursday, November 18, 2004 2:38 PM
To: Lucene Users List; [EMAIL PROTECTED]
Subject: Re: version documents

Thank you for the suggestion.

I ended up biting the bullet and re-working my indexing logic. Luckily
the system itself knows what the current version of a document is
(otherwise it won't know which one to display to the user) for any given
folder.

I was able to get a static method I could call passing in a folder name.
The method returns the file name of the current version for that folder.

Each time I am doing an incremental update if I find that a document
from a folder hasn't changed I make sure it is the current version
before moving on. If it isn't I remove it from the index.

Then when I am creating a new index or adding files to an existing one,
for each file, I have to check the file I am adding to ensure it is the
current version for the folder before adding it.

As you can imagine this slows down indexing (creating a new index or
updating an existing one) but it ensures content from an old version
will never show up in a query.

Luke

- Original Message -
From: Yonik Seeley [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]; Justin Swanhart [EMAIL PROTECTED]
Sent: Thursday, November 18, 2004 1:32 PM
Subject: Re: version documents


 This won't fully work.  You still need to delete the
 original out of the lucene index to avoid it showing
 up in searches.

 Example:
 myfile v1:  I want a cat
 myfile v2:  I want a dog

 If you change cat to dog in myfile, and then do a
 search for cat, you will *only* get v1 and hence the
 sort on version doesn't help.

 -Yonik


 --- Justin Swanhart [EMAIL PROTECTED] wrote:
  Split the filename into basefilename and version
  and make each a keyword.
 
  Sort your query by version descending, and only use
  the first
  basefile you encounter.





Index copy

2004-11-17 Thread Ravi
What's the best way to copy an index from one directory to another? I
tried opening an IndexWriter at the new location and used addIndexes to
read from the old index, but that was very slow.

Thanks in advance,
Ravi.  






mergeFactor

2004-11-17 Thread Ravi
Can somebody explain the difference between the parameters minMergeDocs
and mergeFactor in IndexWriter? When I read the documentation, it looks
like both of them represent the number of documents to be held in the
buffer before they are merged into a new segment.

Thanks in advance,
Ravi.  






RE: Index copy

2004-11-17 Thread Ravi
 
Thanks. I was looking for an OS-independent way of copying. I can
probably use the BufferedInputStream and BufferedOutputStream classes to
copy the index to a different location.
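
A minimal sketch of that buffered-stream copy, using only the JDK. It assumes the index directory is flat (no subdirectories) and that nothing is writing to the index while the copy runs; the class and method names are hypothetical.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

final class IndexCopy {

    /** Copies every regular file in sourceDir into targetDir; returns the count. */
    static int copyFlatDirectory(File sourceDir, File targetDir) throws IOException {
        if (!targetDir.isDirectory() && !targetDir.mkdirs()) {
            throw new IOException("cannot create " + targetDir);
        }
        File[] files = sourceDir.listFiles();
        if (files == null) {
            throw new IOException("not a readable directory: " + sourceDir);
        }
        int copied = 0;
        byte[] buffer = new byte[8192];
        for (File source : files) {
            if (!source.isFile()) {
                continue; // assumption: the index directory has no subdirectories
            }
            File target = new File(targetDir, source.getName());
            try (InputStream in = new BufferedInputStream(new FileInputStream(source));
                 OutputStream out = new BufferedOutputStream(new FileOutputStream(target))) {
                int n;
                while ((n = in.read(buffer)) != -1) {
                    out.write(buffer, 0, n);
                }
            }
            copied++;
        }
        return copied;
    }
}
```

Because it touches only java.io, the same code runs unchanged on Windows and Linux.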
 
-Original Message-
From: Justin Swanhart [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, November 17, 2004 2:35 PM
To: Lucene Users List
Subject: Re: Index copy

You could lock your index for writes, then copy the files using operating
system copy commands.

Another way would be to lock your index, make a filesystem snapshot,
then unlock your index.  You can then safely copy the snapshot without
interrupting further index operations.

On Wed, 17 Nov 2004 11:25:48 -0500, Ravi [EMAIL PROTECTED] wrote:
 What's the best way to copy an index from one directory to another? I
 tried opening an IndexWriter at the new location and used addIndexes
 to read from the old index. But that was very slow.
 
 Thanks in advance,
 Ravi.
 



RAM, FS Directory: Problem during merge

2004-11-16 Thread Ravi Rao
All,

Lucene 1.4 final.

I have an index that has to be updated frequently.  A search may
happen at any time.  I implemented this by indexing into a
RAMDirectory and then merging with an FSDirectory at regular
intervals (or sometimes when a search is requested).  This seems to
work quite well.

On Linux, I have started seeing the following exception thrown.

java.io.IOException: read past EOF
  at org.apache.lucene.index.CompoundFileReader$CSInputStream.readInternal(CompoundFileReader.java:220)
  at org.apache.lucene.store.InputStream.readBytes(InputStream.java:61)
  at org.apache.lucene.index.SegmentReader.norms(SegmentReader.java:357)
  at org.apache.lucene.index.SegmentReader.norms(SegmentReader.java:324)
  at org.apache.lucene.index.SegmentMerger.mergeNorms(SegmentMerger.java:422)
  at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:94)
  at org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java:487)
  at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:366)
  at org.apache.lucene.index.IndexWriter.addIndexes(IndexWriter.java:381)

From reading the sources, the only way I can see this happening is if
the RAMDirectory is corrupted in some way.  Has anyone seen this
before?  I don't yet have access to the full logs so I don't have much
more information.

Many thanks,
-- 
Ravi/





Number of documents to be optimized

2004-11-12 Thread Ravi
How do I know the number of documents to be optimized (if I have one
large index, the number of documents that are in other segments) at any
time?

Thanks in advance,
Ravi. 






RE: Search scalability

2004-11-11 Thread Ravi
Thanks a lot. I'll use RAMDirectory and post my results.  

-Original Message-
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] 
Sent: Thursday, November 11, 2004 9:09 AM
To: Lucene Users List
Subject: Re: Search scalability

If you load it explicitly, then all 800 MB will make it into RAM.
It's easy to try, the API for this is super simple.

Otis

--- [EMAIL PROTECTED] wrote:

 Does it take 800MB of RAM to load that index into a RAMDirectory?  Or 
 are only some of the files loaded into RAM?
 
 --- Otis Gospodnetic [EMAIL PROTECTED] wrote:
 
  Hello,
  
  100 parallel searches going against a single index on a single disk
  means a lot of disk seeks all happening at once.  One simple way of
  working around this is to load your FSDirectory into RAMDirectory.
  This should be faster (could you report your
  observations/comparisons?).  You can also try using ramfs if you are
  using Linux.
  
  Otis
  
  --- Ravi [EMAIL PROTECTED] wrote:
  
   We have one large index for a document repository of 800,000
   documents. The size of the index is 800MB. When we do searches
   against the index, it takes 300-500ms for a single search. We wanted
   to test the scalability and tried 100 parallel searches against the
   index with the same query and the average response time was 13
   seconds. We used a simple IndexSearcher. Same searcher object was
   shared by all the searches. I'm sure people have success in
   configuring lucene for better scalability. Can somebody share their
   approach?
   
   Thanks
   Ravi.
   
  
 



Merging multiple indexes

2004-11-10 Thread Ravi
What's the simplest way to merge 2 or more indexes into one large
index?

Thanks in advance,
Ravi.  






Search scalability

2004-11-10 Thread Ravi
 We have one large index for a document repository of 800,000 documents.
The size of the index is 800MB. When we run searches against the index,
a single search takes 300-500ms. We wanted to test the scalability and
tried 100 parallel searches against the index with the same query; the
average response time was 13 seconds. We used a simple IndexSearcher,
and the same searcher object was shared by all the searches. I'm sure
people have had success configuring Lucene for better scalability. Can
somebody share their approach?

Thanks 
Ravi. 
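
For anyone wanting to repeat this kind of measurement, here is a small plain-Java harness sketch. It is not the code used for the numbers above: the Runnable is a hypothetical stand-in for a call such as searcher.search(query), and the class and method names are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelTiming {

    /** Runs 'task' once on each of 'threads' threads; returns the mean latency in ms. */
    static double averageMillis(final Runnable task, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Long>> jobs = new ArrayList<>();
            for (int i = 0; i < threads; i++) {
                jobs.add(() -> {
                    long start = System.nanoTime();
                    task.run(); // hypothetical stand-in for one search call
                    return System.nanoTime() - start;
                });
            }
            long totalNanos = 0;
            for (Future<Long> done : pool.invokeAll(jobs)) {
                totalNanos += done.get();
            }
            return totalNanos / 1_000_000.0 / threads;
        } finally {
            pool.shutdown();
        }
    }
}
```

Calling it with 1 thread and then with 100 gives the single-search and parallel averages to compare.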




Searching against index in memory

2004-10-28 Thread Ravi
If I have a document set of 10,000 docs and my merge factor is 1000,
Lucene creates a new segment for every 1000 documents. By the time Lucene
has indexed 4500 documents, the index will have 4000 documents on the
disk, and the index for the remaining 500 documents is held in memory.
How can I search against this index at the same time from a different
JVM? I can access the 4000 docs on the disk. But what about those in
memory on the indexing box? Is there a way to do this?

Thanks
Ravi. 





Stopwords in Exact phrase

2004-10-27 Thread Ravi
 Is there a way to include stopwords in an exact phrase search? For
example, when I search on "Melbourne IT", Lucene only searches for
Melbourne, ignoring IT.

Thanks,
Ravi. 





weights on multi index searches

2004-10-27 Thread Ravi
Can I give weights to different indexes when I search against multiple
indexes? The final score of a document should be a linear combination of
the weights on each index and the individual score for that index. Is
this possible in Lucene?
 
 
Thanks
Ravi. 
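
To make the requested scoring concrete, the linear combination can be written out as below. This is a plain-Java sketch of the arithmetic only; it does not use or claim any Lucene API for weighting searchers, the class and method names are hypothetical, and it assumes a document that is missing from an index is given a score of 0 for that index.

```java
final class WeightedScore {

    /** Final score = sum over all indexes of weights[i] * scores[i]. */
    static double combine(double[] weights, double[] scores) {
        if (weights.length != scores.length) {
            throw new IllegalArgumentException("one weight per index is required");
        }
        double total = 0.0;
        for (int i = 0; i < weights.length; i++) {
            total += weights[i] * scores[i];
        }
        return total;
    }

    public static void main(String[] args) {
        // Two indexes: weight 0.5 with score 2.0, weight 0.25 with score 4.0.
        System.out.println(combine(new double[] {0.5, 0.25}, new double[] {2.0, 4.0}));
    }
}
```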


Changes to QueryParser.jj: Status?

2004-03-24 Thread Ravi Rao
Dear All,

Some time ago there was a discussion on modifying the definitions of
tokens in QueryParser so that the character '-' (dash), and others,
will be treated as part of a word.

Can someone please tell me the status of that discussion?  Will these
changes actually be reflected in the code...soon?

Thanks,
-- 
Ravi/

PS: The title of the thread in the previous discussion was
'Problem with search results'

Ravi(ndra) Rao
AlterPoint Inc.






Can field names contain the space character?

2004-01-26 Thread Ravi Rao
Dear lucene-user,

Can field names contain the space character?  In other words can I
index documents which include a field name containing the space
character?

Here is a program that creates an index by adding one document.  This
document has a (text) field named 'software version'.  The program
successfully searches the index for a document containing this field.
However, I'm not very happy with the solution because it will be
cumbersome to set up general queries this way.  Is there a better way?

Many thanks,
-- 
Ravi/

import org.apache.lucene.document.Field;
import org.apache.lucene.document.Document;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.Hits;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.IndexReader;

/**
 * Can a field name contain the space character ie ' '.
 *
 * The output of this program is
 * <PRE>
 * Adding document to index
 * Documents in index = 1
 * Query = '+software version:8.6 +(contents:brown contents:fox)'
 * Number of hits = 1
 * Value of 'software version' = 8.6
 * </PRE>
 *
 * To compile:
 * javac -classpath $LUCENE_HOME/lucene-1.2.jar SpaceInField.java
 *
 * and to run:
 * java -classpath $LUCENE_HOME/lucene-1.2.jar:. SpaceInField
 */
public class SpaceInField
{
  public static void main(String[] args) throws Exception
  {
    IndexWriter writer = new IndexWriter("lucene-index", new StandardAnalyzer(), true);
    Document doc = new Document();

    System.out.println("Adding document to index");
    doc.add(Field.Text("category", "central texas"));
    doc.add(Field.Text("contents", "The quick brown fox jumps over the lazy dog."));
    // Field contains a space character ie ' '.
    doc.add(Field.Text("software version", "8.6"));

    writer.addDocument(doc);
    writer.optimize();
    writer.close();

    IndexReader reader = IndexReader.open("lucene-index");
    System.out.println("Documents in index = " + reader.numDocs());
    reader.close();

    // This works
    // Query q = new QueryParser("software version", new StandardAnalyzer()).parse("8.6");

    // and this.
    // Query q = new TermQuery(new Term("software version", "8.6"));

    // This does not,
    // Query q = new QueryParser("contents", new StandardAnalyzer()).parse("SoftWare Version:8.6");

    // neither does this
    // Query q = new QueryParser("contents", new StandardAnalyzer()).parse("software\\ version:8.6");

    // This block seems to be a general solution if we want to build
    // arbitrary queries.  Is there a better way?
    BooleanQuery bq = new BooleanQuery();
    {
      QueryParser qp = new QueryParser("contents", new StandardAnalyzer());
      Query q;
      q = new TermQuery(new Term("software version", "8.6"));
      bq.add(q, true, false);
      q = qp.parse("brown fox");
      bq.add(q, true, false);
      System.out.println("Query = '" + bq.toString() + "'");
    }

    IndexSearcher searcher = new IndexSearcher("lucene-index");
    Hits h = searcher.search(bq);
    System.out.println("Number of hits = " + h.length());
    if (h.length() > 0)
      {
        Document d = h.doc(0);
        System.out.println("Value of 'software version' = " + d.get("software version"));
      }
    searcher.close();
  }
}




how to search .pdf, .doc, .jsp files using LUCENE ?

2003-02-20 Thread Naraharasetty Ravi Kumar
Hi All

I have used Lucene's demo web application. I am developing a website
for which I use JSP, servlets and Java files. I need to implement a
search engine in my site, and for that I am using Lucene. I implemented
the search using Lucene and so can search .html and .txt files, but how
do I search .pdf, .doc, .jsp etc. files with Lucene?

Basically, I implemented the search on .html and .txt files by creating
the index with the command prompt instructions below:

C:\DarrenWebsite\Lucene>java org.apache.lucene.demo.IndexHTML -create -index Index ..\html
adding ../html/AboutUs/aboutus.htm
adding ../html/AboutUs/milestones.htm
adding ../html/AboutUs/ourmethodology.htm
adding ../html/AboutUs/ourmission.htm
adding ../html/AboutUs/ourpeople.htm
adding ../html/AboutUs/ourwork.htm
adding ../html/Careers/careerpath.htm
adding ../html/Careers/careers.htm
adding ../html/Careers/opportunities.htm
adding ../html/Clients/clients.htm
adding ../html/ContactUs/contactus.htm
adding ../html/Home/legaldisclaim.htm
adding ../html/Home/sitemap.htm
adding ../html/Images/Menu/menu.htm
adding ../html/Partners/partners.htm
adding ../html/Products/edynamo.htm
adding ../html/Products/packaged_edapters.htm
adding ../html/Products/products.htm
adding ../html/Services/bc.htm
adding ../html/Services/crm.htm
adding ../html/Services/eai.htm
adding ../html/Services/erp.htm
adding ../html/Services/scm.htm
adding ../html/Services/services.htm
adding ../html/a.txt
adding ../html/b.txt
Optimizing index...
2625 total milliseconds




And in my configuration.jsp I have the following entry:
String indexLocation = "/DarrenWebsite/Lucene/Index";

That is all I did to implement search on .html and .txt files. How do I now
implement search on .pdf, .doc, .jsp, and similar files?




Thanks & Regards,
Ravi.






Re: Is Like Search Possible ?

2002-10-06 Thread Ravi Kothiyal

Dear Suneetha,
As far as I know, Lucene is not case sensitive.

Are you also storing the domain as a field in the index? If so, you can refine
your query to toAddress:abc* AND Domain:xyz.com

Otherwise you can refine it using a fuzzy search:
toAddress:abc* AND toAddress:@xyz.com~
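
A minimal sketch of the first suggestion, assuming the index is built by your own code: derive the Domain value from each address at indexing time, and lowercase both values so that matching does not depend on case. The class and method names (AddressFields, normalized, domain) are hypothetical and not part of Lucene; toAddress and Domain are the field names from the query above. The sketch only prepares the field values and does not call any Lucene API.

```java
import java.util.Locale;

/** Hypothetical helper that prepares the values for the toAddress and Domain fields. */
final class AddressFields {
    private AddressFields() {}

    /** Trimmed, lowercased address: the value to store in the toAddress field. */
    static String normalized(String address) {
        return address.trim().toLowerCase(Locale.ROOT);
    }

    /** Part after the last '@', lowercased; empty string if there is no '@'. */
    static String domain(String address) {
        String a = normalized(address);
        int at = a.lastIndexOf('@');
        return at < 0 ? "" : a.substring(at + 1);
    }

    public static void main(String[] args) {
        String address = "Abc.Def@XYZ.com"; // made-up sample address
        System.out.println("toAddress = " + normalized(address));
        System.out.println("Domain    = " + domain(address));
    }
}
```

With both values lowercased at indexing time, the query terms should be lowercased the same way before searching.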

Hope this helps.

Regards

Ravi


- Original Message -
From: Suneetha Rao [EMAIL PROTECTED]
Date: Sat, 05 Oct 2002 13:01:48 +0530
To: Lucene Users List [EMAIL PROTECTED]
Subject: Re: Is Like Search Possible ?


 Dear Ravi,
 Thanks for your help, but my problem is not solved yet. I have indexed
 the field ToAddress.
 I'm able to get results if I search for (toAddress:abc*); it gives me all mail IDs
 starting with abc,
 but I want to search within the domain. How do I do that?
 Also, I've found it does not return any results when I query for
 (toAddress:Abc).
 If Lucene is not case sensitive, why doesn't it give me results?
 
 Regards,
 Suneetha
 
 Ravi Kothiyal wrote:
 
  Dear Suneetha,
  Visit http://jakarta.apache.org/lucene/docs/queryparsersyntax.html
  for the query syntax. That page covers basic HTML search, though. If you
  want to search on an email's ToAddress field, you have to create an index that
  stores the toAddress; only then can you retrieve results for a toAddress search.
 
  Hope this will help you
 
  Regards
  Ravi
 
  - Original Message -
  From: Suneetha Rao [EMAIL PROTECTED]
  Date: Sat, 05 Oct 2002 10:16:02 +0530
  To: Lucene Users List [EMAIL PROTECTED]
  Subject: Is Like Search Possible ?
 
   Hi,
   I've used Lucene and indexed the whole database where I saved the
   mail headers
   and some files where I saved the mail contents. I would like to do a
   search on email
   IDs. I'm using a BooleanQuery to retrieve results, with the
   StandardAnalyzer.
   How do I translate the SQL statement
   SELECT * FROM tableName where TOADDRESS LIKE '%infy%' ;
   I tried the query +(toAddress:infy*) but it does not retrieve
   any results.
   I basically want to retrieve all records that have a toAddress
   like [EMAIL PROTECTED] Is there something wrong with the way I query?
   How should I get the desired results?
   Thanks in advance
  
   Regards,
   Suneetha
  
  




Problem With org.apache.lucene.demo.IndexHTML class on Sun Solaris

2002-10-05 Thread Ravi Kothiyal

Dear Friends,

I am using lucene-1.2. When I try to create an HTML index with

java org.apache.lucene.demo.IndexHTML -create -index /opt/index /webdev

it starts creating the index, but after some time it gives a "memory out of range"
exception and the command quits. Can you please help me with this?

Best Regards

Ravi 




Re: Is Like Search Possible ?

2002-10-05 Thread Ravi Kothiyal


Dear Suneetha,
Visit http://jakarta.apache.org/lucene/docs/queryparsersyntax.html
for the query syntax. That page covers basic HTML search, though. If you want
to search on an email's ToAddress field, you have to create an index that stores the
toAddress; only then can you retrieve results for a toAddress search.

Hope this will help you

Regards
Ravi


- Original Message -
From: Suneetha Rao [EMAIL PROTECTED]
Date: Sat, 05 Oct 2002 10:16:02 +0530
To: Lucene Users List [EMAIL PROTECTED]
Subject: Is Like Search Possible ? 


 Hi,
 I've used Lucene and indexed the whole database where I saved the
 mail headers
 and some files where I saved the mail contents. I would like to do a
 search on email
 IDs. I'm using a BooleanQuery to retrieve results, with the
 StandardAnalyzer.
 How do I translate the SQL statement
 SELECT * FROM tableName where TOADDRESS LIKE '%infy%' ;
 I tried the query +(toAddress:infy*) but it does not retrieve
 any results.
 I basically want to retrieve all records that have a toAddress
 like [EMAIL PROTECTED] Is there something wrong with the way I query?
 How should I get the desired results?
 Thanks in advance
 
 Regards,
 Suneetha
 
 




Re: Xtreeme Newbie, trying to get demos to work

2002-10-05 Thread Ravi Kothiyal

Hi,
Yes, you need a Java server/JSP engine to run the web application demo.
If you are using a JSP/servlet engine, deploy the luceneweb.war demo application
in it.

Then create the index for searching.
For that you have to add lucene-1.2.jar and lucene-demos-1.2.jar to your classpath.

The index can be created with
java org.apache.lucene.demo.IndexHTML -create -index pathtostoreindex pathofyourdocumentstobeindexed

Then modify the index path in the configuration.jsp inside luceneweb.war.

This will make your search engine work.

For more help visit
http://jakarta.apache.org/lucene/docs/gettingstarted.html

Hope this will help you
Regards
Ravi



- Original Message -
From: ARJANG ASSADI [EMAIL PROTECTED]
Date: Tue, 24 Sep 2002 06:46:15 +
To: [EMAIL PROTECTED]
Subject: Xtreeme Newbie, trying to get demos to work


 I am trying to run the demo.
 I couldn't find any docs on running the demo. Does Lucene require Tomcat
 and/or other Java web server technology?
 
 I am new to Java but not to programming or computers. I just don't know what the
 first steps are to get the Lucene demo to work; any hints are greatly
 appreciated.
 
 Also, if you could provide a link to further info, I can look it up
 myself instead of clogging this newsgroup with stupid questions.
 Thank you
 
 On Windows XP, I have lucene-1.2.jar and lucene-demos-1.2.jar sitting in the
 same directory.
 I have tried: java lucene-demos-1.2.jar with no luck.
 
 I have searched the user mailing list archive without any luck either.
 I have searched http://jakarta.apache.org/lucene without any luck either.
 




RE: New User , How to have the search functioning

2002-09-18 Thread Ravi Kothiyal

Hi friends,
I have now created the index with org.apache.lucene.demo.IndexHTML.
The command I used is
java org.apache.lucene.demo.IndexHTML -create -index d:\index %catalina_home%/webapps/textdocs

It created the index. I used this index in configuration.jsp, and the search
is producing results. But the URL returned by the search for a local document is in
the form d:\tomcat\tomcat-jakarta\webapps\textdocs\xyz.htm

I want the URLs of my documents in the form http://mywebsite:8080/textdocs/xyz.htm

How can I achieve this? Please help me with this.
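
One possible approach, sketched under assumptions: keep the stored path as it is and translate it to a URL when results are displayed, by replacing the web server's document root with the site's base URL. The class and method names (ResultUrls, toUrl) are hypothetical; the document root and base URL are taken from the example paths in this message. The sketch uses only standard Java string handling, does not call Lucene, and does not percent-encode special characters in file names.

```java
/** Hypothetical helper: maps a local file path under a document root to a URL. */
final class ResultUrls {
    private ResultUrls() {}

    static String toUrl(String filePath, String docRoot, String baseUrl) {
        // Normalize Windows separators so the two paths compare consistently.
        String path = filePath.replace('\\', '/');
        String root = docRoot.replace('\\', '/');
        if (!root.endsWith("/")) {
            root += "/";
        }
        // Case-insensitive prefix check, since Windows paths ignore case.
        if (!path.regionMatches(true, 0, root, 0, root.length())) {
            throw new IllegalArgumentException("Not under document root: " + filePath);
        }
        String relative = path.substring(root.length());
        String base = baseUrl.endsWith("/") ? baseUrl : baseUrl + "/";
        return base + relative;
    }

    public static void main(String[] args) {
        System.out.println(toUrl(
            "d:\\tomcat\\tomcat-jakarta\\webapps\\textdocs\\xyz.htm",
            "d:\\tomcat\\tomcat-jakarta\\webapps",
            "http://mywebsite:8080"));
    }
}
```

The same translation could instead be applied at indexing time, storing the URL in its own field, if the index is built by custom code rather than the demo class.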

Regards
Ravi



- Original Message -
From: Ravi Kothiyal [EMAIL PROTECTED]
Date: Wed, 18 Sep 2002 00:18:41 -0500
To: [EMAIL PROTECTED]
Subject: RE: New User , How to have the search functioning


 Hi Tim,
 
 Thanks for your help. I had created the index, but it was empty. Later I
 populated it with IndexFiles, adding a directory to the index; the directory's
 location is d:/xyz. When I searched the files using luceneweb, it showed results with
 null as the hyperlink: it returns the URL and summary as null. Later I found that it
 also stores the document path, and I can retrieve that path.
 
 Can you please tell me how to store the document summary and URL along with the
 document in the index? It would be very kind of you.
 
 Regards
 Ravi 
 - Original Message -
 From: Stone, Timothy [EMAIL PROTECTED]
 Date: Tue, 17 Sep 2002 08:43:37 -0400
 To: Lucene Users List [EMAIL PROTECTED]
 Subject: RE: New User , How to have the search functioning
 
 
  Ravi,
  
  If I follow your steps correctly, you dropped the demo in your webapps
  directory, made some config changes and performed a search.
  
  What you don't mention is creating the index, although you allude to possibly
  doing so since you actually pointed to an index in the configuration.jsp.
  
  If you can elaborate (include logs if necessary), I'll be glad to help. The
  demo has a couple of tripping points, but nothing that can't be hurdled
  quickly. Reply directly if desired.
  
  Hoping to help,
  Tim
  
  
   -Original Message-
   From: Ravi Kothiyal [mailto:[EMAIL PROTECTED]]
   Sent: Tuesday, September 17, 2002 01:13
   To: [EMAIL PROTECTED]
   Subject: New User , How to have the search functioning
   Importance: High
   
   
   Hi friends,
   I am new to Lucene. I have installed Lucene 1.2 on Windows NT 4 and am
   using it with jakarta-tomcat-4.1.10. I have created an index
   and configured the Lucene demo application luceneweb on
   Jakarta. I have also modified configuration.jsp to add the
   location of the index.
   
   When I search using luceneweb, I get no results.
   What could be the reason? Do I need to add entries
   to the index? If so, how? Please help
   me out. If anyone has a sample application, it would be
   very helpful to have the code.
   
   Regards
   
   Ravi 




RE: New User , How to have the search functioning

2002-09-17 Thread Ravi Kothiyal

Hi Tim,

Thanks for your help. I had created the index, but it was empty. Later I populated
it with IndexFiles, adding a directory to the index; the directory's location is
d:/xyz. When I searched the files using luceneweb, it showed results with null as the
hyperlink: it returns the URL and summary as null. Later I found that it also stores the
document path, and I can retrieve that path.

Can you please tell me how to store the document summary and URL along with the
document in the index? It would be very kind of you.

Regards
Ravi 
- Original Message -
From: Stone, Timothy [EMAIL PROTECTED]
Date: Tue, 17 Sep 2002 08:43:37 -0400
To: Lucene Users List [EMAIL PROTECTED]
Subject: RE: New User , How to have the search functioning


 Ravi,
 
 If I follow your steps correctly, you dropped the demo in your webapps
 directory, made some config changes and performed a search.
 
 What you don't mention is creating the index, although you allude to possibly
 doing so since you actually pointed to an index in the configuration.jsp.
 
 If you can elaborate (include logs if necessary), I'll be glad to help. The
 demo has a couple of tripping points, but nothing that can't be hurdled
 quickly. Reply directly if desired.
 
 Hoping to help,
 Tim
 
 
  -Original Message-
  From: Ravi Kothiyal [mailto:[EMAIL PROTECTED]]
  Sent: Tuesday, September 17, 2002 01:13
  To: [EMAIL PROTECTED]
  Subject: New User , How to have the search functioning
  Importance: High
  
  
  Hi friends,
  I am new to Lucene. I have installed Lucene 1.2 on Windows NT 4 and am
  using it with jakarta-tomcat-4.1.10. I have created an index
  and configured the Lucene demo application luceneweb on
  Jakarta. I have also modified configuration.jsp to add the
  location of the index.
  
  When I search using luceneweb, I get no results.
  What could be the reason? Do I need to add entries
  to the index? If so, how? Please help
  me out. If anyone has a sample application, it would be
  very helpful to have the code.
  
  Regards
  
  Ravi 




New User , How to have the search functioning

2002-09-16 Thread Ravi Kothiyal

Hi friends,
I am new to Lucene. I have installed Lucene 1.2 on Windows NT 4 and am using it with
jakarta-tomcat-4.1.10. I have created an index and configured the Lucene demo
application luceneweb on Jakarta. I have also modified configuration.jsp to add the
location of the index.

When I search using luceneweb, I get no results. What could be the reason?
Do I need to add entries to the index? If so, how? Please
help me out. If anyone has a sample application, it would be very helpful to have
the code.

Regards

Ravi 