RE: sorting tokenized field

2004-12-10 Thread Aviran
I have suggested a solution for this problem
(http://issues.apache.org/bugzilla/show_bug.cgi?id=30382). You can apply the
patch suggested there and recompile Lucene.


Aviran
http://www.aviransplace.com

-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED] 
Sent: Friday, December 10, 2004 13:53 PM
To: Lucene Users List
Subject: Re: sorting tokenized field



On Dec 10, 2004, at 1:40 PM, Praveen Peddi wrote:
 I read that tokenized fields cannot be sorted. In order to sort a
 tokenized field, either the application has to duplicate the field under a
 different name and not tokenize it, or come up with something else. But
 shouldn't the search engine take care of this? Are there any plans to
 build this functionality into Lucene?

It would be wasteful for Lucene to assume any field you add should be 
available for sorting.

Adding one more line to your indexing code to accommodate your sorting 
needs seems a pretty small price to pay.  Do you have suggestions to 
improve how this works?   Or how it is documented?

Erik
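
A minimal plain-Java sketch of why the extra field is needed (this is a model, not Lucene code; the class SortKeySketch and the names titleTokens and titleSort are hypothetical). A tokenized field holds several terms per document, so there is no single value to order by; the untokenized copy holds exactly one. In Lucene 1.4 terms the copy would presumably be the one extra line Erik mentions, e.g. a second field added with Field.Keyword.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class SortKeySketch {
    /** Hypothetical stand-in for an indexed document. */
    static final class Doc {
        final List<String> titleTokens; // tokenized field: many terms, used for searching
        final String titleSort;         // untokenized copy: one term, used for sorting

        Doc(String title) {
            this.titleTokens = Arrays.asList(title.toLowerCase().split("\\s+"));
            this.titleSort = title.toLowerCase();
        }
    }

    /** Orders documents by the single untokenized value and returns those values. */
    static List<String> sortedTitles(List<String> titles) {
        List<Doc> docs = new ArrayList<>();
        for (String t : titles) {
            docs.add(new Doc(t));
        }
        docs.sort(Comparator.comparing((Doc d) -> d.titleSort));
        List<String> out = new ArrayList<>();
        for (Doc d : docs) {
            out.add(d.titleSort);
        }
        return out;
    }
}
```

Sorting on titleTokens instead would first require picking one of several terms per document, which is the ambiguity the duplicate field avoids.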


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]





RE: finalize delete without optimize

2004-12-09 Thread Aviran
Lucene standard API does not support this kind of operation.

Aviran
http://www.aviransplace.com


-Original Message-
From: John Wang [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, December 08, 2004 17:32 PM
To: [EMAIL PROTECTED]
Subject: Re: finalize delete without optimize


Hi folks:

I sent this out a few days ago without a response. 

Please help.

Thanks in advance

-John


On Mon, 6 Dec 2004 21:15:00 -0800, John Wang [EMAIL PROTECTED] wrote:
 Hi:
 
   Is there a way to finalize deletes, i.e. actually remove the deleted
 documents from the segments and make sure the docIDs are contiguous again?
 
   The only explicit way to do this is by calling
 IndexWriter.optimize(). But this call does a lot more (it also merges all
 the segments), hence it is very expensive. Is there a way to simply
 finalize the deletes without having to merge all the segments?
 
If not, I'd be glad to submit an implementation of this feature if 
 the Lucene devs agree this is useful.
 
 Thanks
 
 -John





RE: IndexWriter.optimize()

2004-12-09 Thread Aviran
Besides merging the segments, optimize also physically removes all the
deleted documents from the index. (When you call delete, Lucene only marks
the documents as deleted; they are physically removed when you call optimize.)

Aviran
http://www.aviransplace.com
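
The mark-then-compact behaviour described above can be modelled in a few lines of plain Java. This is only a sketch under stated assumptions: DeleteSketch is a hypothetical class, not Lucene's implementation, and its maxDoc/numDocs methods merely mirror the names of the corresponding IndexReader methods.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

class DeleteSketch {
    private List<String> docs = new ArrayList<>();
    private BitSet deleted = new BitSet();

    /** Appends a document and returns its doc ID. */
    int add(String doc) {
        docs.add(doc);
        return docs.size() - 1;
    }

    /** Marks a document as deleted; nothing is removed and no ID changes. */
    void delete(int docId) {
        if (docId < 0 || docId >= docs.size()) {
            throw new IndexOutOfBoundsException("no such doc: " + docId);
        }
        deleted.set(docId);
    }

    /** One more than the largest doc ID, counting slots that are only marked. */
    int maxDoc() {
        return docs.size();
    }

    /** Number of documents that are not marked as deleted. */
    int numDocs() {
        return docs.size() - deleted.cardinality();
    }

    /** Rewrites storage without the marked documents, so IDs are contiguous again. */
    void optimize() {
        List<String> live = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            if (!deleted.get(i)) {
                live.add(docs.get(i));
            }
        }
        docs = live;
        deleted = new BitSet();
    }

    String get(int docId) {
        return docs.get(docId);
    }
}
```

After add("a"), add("b"), add("c") and delete(1), maxDoc() is still 3 while numDocs() is 2; only optimize() brings maxDoc() down to 2 and renumbers "c" from ID 2 to ID 1.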

-Original Message-
From: Yura Smolsky [mailto:[EMAIL PROTECTED] 
Sent: Thursday, December 09, 2004 5:55 AM
To: [EMAIL PROTECTED]
Subject: IndexWriter.optimize()


Hello, lucene-user.

I used FSDirectory as storage for the index, and I used the optimize() method
of IndexWriter to optimize the index for faster access.

Now I use DbDirectory (Berkeley DB) as storage. Does it make sense to use the
optimize method on an index stored in this storage?

What does optimize do actually?

Yura Smolsky







RE: Retrieving all docs in the index

2004-12-09 Thread Aviran
In this case you'll have to add another field with a fixed value to all the
documents and query on that field


Aviran
http://www.aviransplace.com

-Original Message-
From: Ravi [mailto:[EMAIL PROTECTED] 
Sent: Thursday, December 09, 2004 14:04 PM
To: Lucene Users List
Subject: RE: Retrieving all docs in the index


I'm sorry, I don't think I articulated my question well. We use a date filter
to sort the search results. This works fine when the user provides some
search criteria. But if he gives empty search criteria, we need to return
all the documents in the index in the given date range, sorted by date. So I
was looking for a query that returns all documents in the index, and then
I want to apply the date filter to it.


-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED] 
Sent: Thursday, December 09, 2004 1:55 PM
To: Lucene Users List
Subject: Re: Retrieving all docs in the index

On Dec 9, 2004, at 1:35 PM, Ravi wrote:
  Is there any other way to extract all documents from an index apart
 from adding an additional field with the same value to all documents 
 and then doing a term query on that field with the common value?

Of course.  Have a look at the IndexReader API.

Erik





RE: lucene transaction and roll back implementation

2004-11-18 Thread Aviran
AFAIK there is no transaction or rollback support in Lucene.

Aviran
http://www.aviransplace.com

-Original Message-
From: John Wang [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, November 17, 2004 20:25 PM
To: [EMAIL PROTECTED]
Subject: lucene transaction and roll back implementation


Hi folks:

How does Lucene implement transactions and rollback? E.g. what happens if the
machine crashes (from a power outage etc.) in the middle of a write, e.g.
during indexWriter.close()? From examining the code, it seems there is a
possibility that such a crash can cause a corrupted index.

(In segmentInfos, new data is written to a temp file, which is then swapped
with the actual file by calling Util.renameFile, but Util.renameFile is not
atomic if we are doing a byte copy.)

Is there an automatic recovery mechanism or a rollback?

What is the general advice for how to handle these situations?

Thanks

-John




RE: best ways of using IndexSearcher

2004-11-17 Thread Aviran
Yes, IndexSearcher is thread safe.

Aviran
http://www.aviransplace.com

-Original Message-
From: Abhay Saswade [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, November 16, 2004 15:16 PM
To: Lucene Users List
Subject: Re: best ways of using IndexSearcher


Hello,
Can I use single instance of IndexSearcher in multiple threads with sorting?
Thanks, Abhay

- Original Message - 
From: Otis Gospodnetic [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Monday, June 28, 2004 8:51 PM
Subject: Re: best ways of using IndexSearcher


 Anson,

 Use a single instance of IndexSearcher and, if you want to always
 'see' even the latest index changes (deletes and adds since you opened
 the IndexSearcher), make sure to re-create the IndexSearcher when you detect
 that the index version has changed (see
 http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/IndexReader.html#getCurrentVersion(org.apache.lucene.store.Directory))

 When you get the new IndexSearcher, leave the old instance alone - let 
 the GC take care of it, and don't call close() on it, in case 
 something in your application is still using that instance.

 This stuff is not really CPU intensive.  Disk I/O tends to be the 
 bottleneck.  If you are working with multiple indices, spread them 
 over multiple disks (not just partitions, real disks), if you can.

 Otis


 --- Anson Lau [EMAIL PROTECTED] wrote:
  Hi Guys,
 
  What's the recommended way of using IndexSearcher? Should
  IndexSearcher be a singleton or pooled?  Would pooling provide a
  more scalable solution by allowing you to decide how many
  IndexSearchers to use based on, say, how many CPUs you have on
  your server?
 
  Thanks,
 
  Anson









RE: Stemming

2004-11-15 Thread Aviran
I don't understand what kind of examples you need. All there is to it is
to use a different analyzer.
Take a look at the Snowball analyzer in Lucene's sandbox.

Aviran
http://www.aviransplace.com

-Original Message-
From: Miguel Angel [mailto:[EMAIL PROTECTED] 
Sent: Monday, November 15, 2004 13:28 PM
To: [EMAIL PROTECTED]
Subject: Stemming


Hi, I am using the Lucene demo. What about stemming? Can anybody provide
examples of using stemming with Lucene? Examples, please.
-- 
Miguel Angel Angeles R.
Asesoria en Conectividad y Servidores
Telf. 97451277




RE: Faster highlighting with TermPositionVectors

2004-11-03 Thread Aviran
Has anyone tried this class?

I tried it but I can't make it work. I indexed a field as
new Field("description", description, true, true, true, true), but when I call
TokenSources.getTokenStream(_indexReader, i, "description") I get a
ClassCastException.

In this class, the line
TermPositionVector tpv = (TermPositionVector) reader.getTermFreqVector(docId, field);
is trying to cast a SegmentTermVector to TermPositionVector.
Is there anything I'm doing wrong? Should I have indexed the field some
other way to store a TermPositionVector?

BTW: I'm using the latest lucene source from CVS.

Thanks,
Aviran

-Original Message-
From: Bruce Ritchie [mailto:[EMAIL PROTECTED] 
Sent: Friday, October 29, 2004 1:15 AM
To: Lucene Users List
Subject: RE: Faster highlighting with TermPositionVectors


Mark,

 Thanks to the recent changes (see CVS) in TermFreqVector
 support we can now make use of term offset information held 
 in the Lucene index rather than incurring the cost of 
 re-analyzing text to highlight it.
 
 I have created a  class ( see
 http://www.inperspective.com/lucene/TokenSources.java ) which 
 handles creating a TokenStream from the TermPositionVector 
 stored in the database which can then be passed to the highlighter.
 This approach is significantly faster than re-parsing the 
 original text.
 If people are happy with this class I'll add it to the 
 Highlighter sandbox but it may sit better elsewhere in the 
 Lucene code base as a more general purpose utility.
 
 BTW as part of putting this together I found that the
 TermFreq code throws a null pointer when indexing fields that 
 produce no tokens (ie empty or all stopwords). Otherwise 
 things work very well.

This is great news! While I won't have the time to test this until probably
mid November I do look forward to the speed improvements as the current
highlighting mechanisms (reparsing the text) was just not performant enough
under heavy loads.


Regards,

Bruce Ritchie
http://www.jivesoftware.com




RE: index files version and lucene 1.4

2004-10-21 Thread Aviran
Lucene 1.4 changed the file format for indexes. You can access an old index
using Lucene 1.4, but you can't access an index that was created using Lucene
1.4 with older versions.
I suggest you rebuild your index using Lucene 1.4.

Aviran
http://aviran.mordos.com

-Original Message-
From: arnaud gaudinat [mailto:[EMAIL PROTECTED] 
Sent: Thursday, October 21, 2004 12:10 PM
To: Lucene Users List
Subject: index files version and lucene 1.4


Hi,
Certainly  a stupid question!
I have just upgraded to 1.4. I can access my 1.3 index files but not my new
1.4 index files. In fact I get no error, but also no hits for the 1.4
index files. Also, I don't know if it's normal, but I now have just 3 files
for my index (.cfs, deletable and segments). However, if I use Luke with the
1.4 index files, it works perfectly.

An idea?

Regards,

Arno.





RE: Null or no analyzer

2004-10-20 Thread Aviran
AFAIK, if the term Election 2004 is enclosed in quotation marks, this should
work fine.

Aviran
http://aviran.mordos.com

-Original Message-
From: Morus Walter [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, October 20, 2004 2:25 AM
To: Lucene Users List
Subject: RE: Null or no analyzer


Aviran writes:
 You can use WhitespaceAnalyzer
 
Can he? If Elections 2004 is one token in the subject field (keyword),
this will fail, since WhitespaceAnalyzer will tokenize that to `Elections'
and `2004'.
So I guess he has to write an identity analyzer himself unless there is one
provided (which doesn't seem to be the case). The only alternatives are not
using query parser or extending query parser for a key word syntax, as far
as I can see.
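
The difference Morus describes can be seen in a small plain-Java sketch. It only models the two tokenization behaviours; TokenizeSketch and its method names are hypothetical, and this is not a Lucene Analyzer implementation.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class TokenizeSketch {
    /** Splits on runs of whitespace, as a whitespace-based analyzer would. */
    static List<String> whitespaceTokens(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    /** Passes the whole text through untouched as a single token. */
    static List<String> identityTokens(String text) {
        return Collections.singletonList(text);
    }

    public static void main(String[] args) {
        System.out.println(whitespaceTokens("Elections 2004")); // prints [Elections, 2004]
        System.out.println(identityTokens("Elections 2004"));   // prints [Elections 2004]
    }
}
```

Only the single token produced by the identity variant can match a subject that was indexed untokenized as one term.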







RE: Spell checker

2004-10-20 Thread Aviran
Here http://issues.apache.org/bugzilla/showattachment.cgi?attach_id=13009

Aviran
http://aviran.mordos.com

-Original Message-
From: Lynn Li [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, October 20, 2004 10:52 AM
To: 'Lucene Users List'
Subject: RE: Spell checker 


Where can I download it? 

Thanks,
Lynn

-Original Message-
From: Nicolas Maisonneuve [mailto:[EMAIL PROTECTED]
Sent: Monday, October 11, 2004 1:26 PM
To: Lucene Users List
Subject: Spell checker 


Hi Lucene users,
I developed a spell checker for Lucene inspired by David Spencer's code.

see the wiki doc: http://wiki.apache.org/jakarta-lucene/SpellChecker

Nicolas Maisonneuve




RE: Null or no analyzer

2004-10-19 Thread Aviran
You can use WhitespaceAnalyzer

Aviran
http://aviran.mordos.com

-Original Message-
From: Rupinder Singh Mazara [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, October 19, 2004 11:23 AM
To: Lucene Users List
Subject: Null or no analyzer


Hi All

  I have a question regarding the selection of Analyzers during query parsing.


  I have three fields in my index: db_id, full_text and subject.
  All three are indexed; however, while indexing I told Lucene to
index db_id and subject but not tokenize them.

  I want to provide a single search box in my application to enable searching
for documents.
  Some queries can look like  motor cross rally  and will get fed to
QueryParser to do the relevant parsing.

  However, if the user enters  John Kerry  subject:Elections 2004  I want to
make sure that no analyzer is used for the subject field. How can that be
done?

  This is because I expect the users to know the subject from a list of
controlled vocabularies, and also I am searching for documents that have the
exact subject. I tried using the PerFieldAnalyzerWrapper, but how do I get
hold of an Analyzer that does nothing but pass the text through to the
Searcher?







RE: how to find field that has any value

2004-10-11 Thread Aviran
You can try to use a range query something like test:[null TO
]
Please note that you might get a TooManyClauses exception if the range
matches too many terms.

The other thing you can do is use the NOT operator. Fill all the empty
fields with a string, let's say empty, and then query for
-test:empty

Aviran

-Original Message-
From: MATL (Mats Lindberg) [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, October 06, 2004 16:27 PM
To: Lucene Users List
Subject: how to find field that has any value


Hello
 
i have a probably simple question for some of you.
 
Since Lucene does not allow a query to start with a wildcard (* or ?), how
would I find all documents that have something in, let's say, field test,
i.e. where that field is not empty?
 
my first thought would be to do something like this.
 
test:(cause the value __ isn't very likely
to be present)
 
 
test:*(would be the correct way, i guess,
but lucene doesn't allow that)
 
 
Does anyone have a better idea?
 
Best regards,
Mats Lindberg






RE: A simple newbee question . How do i exclude a field ?

2004-10-11 Thread Aviran
For the records that don't contain a field you can put a bogus value such as
empty and then you can query on -UD:empty

Aviran
http://aviran.mordos.com


-Original Message-
From: Robinson Raju [mailto:[EMAIL PROTECTED] 
Sent: Saturday, October 09, 2004 10:25 AM
To: Lucene Users List
Subject: A simple newbee question . How do i exclude a field ?


Hi ,
i use lucene to search against a flatted DB table.
I have  a table which contains the following data . 
there are 3 records which contain the code RN , 
27 which contain UD 
and 3266 which contain BLANK.

code      Number of records
-------   -----------------
(blank)   3269
RN        3
UD        27

if my searchString is RN , i get 3 
if my searchString is UD , i get 27 
if my searchString is  , i get 3296 (in this case i bypass queryfilter)

Now, I need to get the number of records which do not contain UD (similar to
a DB query with NOT IN or !=). If the string is -UD, it doesn't work.
Could you tell me how to construct a query string for this?

Regards
Robin




RE: multiple threads

2004-10-04 Thread Aviran
You should not have more than one IndexWriter per index. (You can have
multiple IndexReaders, but only one IndexWriter.)

Aviran


-Original Message-
From: Justin Swanhart [mailto:[EMAIL PROTECTED] 
Sent: Friday, October 01, 2004 19:14 PM
To: [EMAIL PROTECTED]
Subject: multiple threads


As I understand it, if two writers try to access the same index for writing,
then one of the writers should block waiting for a lock until the lock
timeout period expires, and then it will throw a Lock wait timeout
exception.

I have a multithreaded indexing application that writes into one of
multiple indexes depending on a hash value, and I intend to merge all the
hashes when the indexing finishes.  Locking usually works, but sometimes it
doesn't and I get IO exceptions such as the following:

java.io.IOException: Cannot delete _19.fnm
        at org.apache.lucene.store.FSDirectory.deleteFile(FSDirectory.java:198)
        at org.apache.lucene.index.SegmentMerger.createCompoundFile(SegmentMerger.java:157)
        at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:100)
        at org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java:487)
        at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:366)
        at org.apache.lucene.index.IndexWriter.addIndexes(IndexWriter.java:389)
        at org.en.global.indexer.IndexGroup.run(IndexGroup.java:387)


Any idea on why this could be happening?  I am using NFS currently, but the
problem appears on the local filesystem as well.




RE: Sorting on a long string

2004-09-29 Thread Aviran
Currently Lucene can only sort properly on a Keyword field.
I guess your field is tokenized, in which case the sort does not work
properly.

A patch has been suggested to fix this problem (but it has not been applied
yet):

http://issues.apache.org/bugzilla/show_bug.cgi?id=30382

Aviran

-Original Message-
From: Daly, Pete [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, September 28, 2004 15:46 PM
To: Lucene Users List
Subject: Sorting on a long string


I am new to Lucene, and trying to perform a sorted query on a list of
people's names.  Lucene seems unable to properly sort on the name field of my
indexed documents.  If I sort by the other (shorter) fields, it seems to
work fine.  The name sort seems to be close, almost as if the last few
iterations through the sort loop are not being done.  The records are
obviously not in the normally random order, but not fully sorted either.  I
have tried different ways of sorting, including a SortField array/object
with the field cast as a string.

The index I am sorting has about 1.2 million documents.

Are there known limitations in the sorting functionality that I am running
into?  I can provide more details if needed.

Thanks for any help,

-Pete






RE: Hebrew support

2004-09-28 Thread Aviran
As far as I know there is no Analyzer for Hebrew.


Aviran

-Original Message-
From: Alex Kiselevski [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, September 28, 2004 3:12 AM
To: [EMAIL PROTECTED]
Subject: Hebrew support



Hello,
Do you know anything about Hebrew support in Lucene?
Thanks in advance

Alex Kiselevsky
 Speech Technology  Tel:972-9-776-43-46
RD, Amdocs - IsraelMobile: 972-53-63 50 38
mailto:[EMAIL PROTECTED]










RE: Keyword query confusion

2004-09-24 Thread Aviran
The StandardAnalyzer removes the 1 as it is a stop word.
There are two ways you can work around this problem:
1. As you mentioned, create a Query object programmatically.
2. Use WhitespaceAnalyzer instead of StandardAnalyzer.

Aviran

-Original Message-
From: Fred Toth [mailto:[EMAIL PROTECTED] 
Sent: Friday, September 24, 2004 12:27 PM
To: [EMAIL PROTECTED]
Subject: Keyword query confusion


Hi all,

I'm trying to understand what's going on with the query parser and keyword
fields.

I've got a large subset of my documents which are publications. So as to
be able to query these, I've got this in the indexer:

doc.add(Field.Keyword("is_pub", "1"));

However, if I run a query:

is_pub:1

I get no hits. If I find a document by other means and dump the fields, the
is_pub keyword is there, with value of 1.

Now, I've learned that if I change the field to contain the value true
instead of the string 1, this query:

is_pub:true

works just fine.

So, I'm pretty sure I'm running afoul of the analyzer, right? The doc says
specifically that I should add keyword query clauses programmatically, and
I'm guessing that's what's wrong.

But can someone explain this? It sure is useful to be able to test this sort
of thing with the query parser. What is going on with the standard analyzer
that makes true work and 1 not work?

Is there a way around this other than by writing code to create the query?
This also applies to other types of query, like pub_date:2004.

Hoping for enlightenment...

Thanks,

Fred





RE: Questions related to closing the searcher

2004-09-23 Thread Aviran
The best way is to use IndexReader's getCurrentVersion() method to check
whether the index has changed. If it has, just get a new Searcher:
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/IndexReader.html#getCurrentVersion(java.lang.String)

Aviran
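
The check-the-version-then-reopen pattern can be sketched generically in plain Java. Everything here is an assumption made for illustration: ReloadingHolder is a hypothetical helper, the version source stands in for a call such as IndexReader.getCurrentVersion(), and the factory stands in for constructing a new searcher. No Lucene classes are used.

```java
import java.util.function.LongFunction;
import java.util.function.LongSupplier;

class ReloadingHolder<S> {
    private final LongSupplier currentVersion; // reports the index version right now
    private final LongFunction<S> open;        // opens a fresh instance for a version
    private long openedVersion;
    private S instance;

    ReloadingHolder(LongSupplier currentVersion, LongFunction<S> open) {
        this.currentVersion = currentVersion;
        this.open = open;
    }

    /** Returns the cached instance, replacing it only when the version has changed. */
    synchronized S get() {
        long version = currentVersion.getAsLong();
        if (instance == null || version != openedVersion) {
            instance = open.apply(version);
            openedVersion = version;
        }
        return instance;
    }
}
```

Callers ask the holder for the instance before each batch of queries instead of nulling fields and calling System.gc(). The sketch deliberately does not close the instance it replaces, since another thread may still be using it; whether and when to close it is left to the application.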

-Original Message-
From: Edwin Tang [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, September 22, 2004 11:38 AM
To: [EMAIL PROTECTED]
Subject: Fwd: Questions related to closing the searcher


Hello,

In my testing, it seems like if the searcher (in my
case ParallelMultiSearcher) is not closed, the
searcher will not pick up any new data that has been
added to the index since it was opened. I'm wondering
if this is a correct statement.

Assuming the above is true, I went about closing the
searcher with searcher.close(), then setting both the
searcher and QueryParser to null, then did a
System.gc(). The application will sleep for a set
period of time, then resumes to process another batch
of queries against the index. When the application
resumes, the following method is run:

/**
 * Creates a {@link ParallelMultiSearcher} and {@link QueryParser} if they
 * do not already exist.
 *
 * @return  0 if successful or the objects already exist; -1 if failed.
 */
private int getSearcher() {
    Analyzer analyzer;
    IndexSearcher[] searchers;
    int iReturn;
    Vector vector;
    if (logger.isDebugEnabled())
        logger.debug("Entering getSearcher()");
    if (searcher == null || parser == null) {
        analyzer = new CIAnalyzer(utility.sStopWordsFile);
        try {
            // Open one IndexSearcher per enabled index.
            vector = new Vector();
            if (utility.bSearchAMX)
                vector.add(new IndexSearcher(utility.amxIndexDir));
            if (utility.bSearchCOMTEX)
                vector.add(new IndexSearcher(utility.comtexIndexDir));
            if (utility.bSearchDJNW)
                vector.add(new IndexSearcher(utility.djnwIndexDir));
            if (utility.bSearchMoreover)
                vector.add(new IndexSearcher(utility.moreoverIndexDir));
            searchers = (IndexSearcher[]) vector.toArray(
                    new IndexSearcher[vector.size()]);
            searcher = new ParallelMultiSearcher(searchers);
            parser = new QueryParser("body", analyzer);
            iReturn = 0;
        } catch (IOException ioe) {
            logger.error("Error creating searcher", ioe);
            iReturn = -1;
        } catch (Exception e) {
            logger.error("Unexpected error while creating searcher", e);
            iReturn = -1;
        }
    } else
        iReturn = 0;
    if (logger.isDebugEnabled())
        logger.debug("Exiting getSearcher() with " + iReturn);
    return iReturn;
} // End method getSearcher()

This seems to get me around the problem where the
searcher was not picking up new data from the index.
However, I would run out of memory after 8 iterations
of the application processing a batch query, sleeping,
process another batch query, sleep, etc.

I'm probably missing something completely obvious, but
I'm just not seeing it. Can someone please tell me
what I'm doing wrong?

Thanks,
Ed






RE: problem with SortField[] in search method (newbie)

2004-09-15 Thread Aviran
You can only sort on an indexed field. (Moreover, sorting works
properly only on untokenized fields, i.e. keyword fields.)

Aviran

-Original Message-
From: Wermus Fernando [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, September 15, 2004 13:13 PM
To: [EMAIL PROTECTED]
Subject: problem with SortField[] in search method (newbie)


Luceners,
My search looks up the whole entities. My entities are accounts, contacts,
tasks, etc. My searching looks up a group of entity's fields. This works
fine despite, I don't have indexed any entity in a document. But If I sort
by some fields from different entities, I get the following error.
 
field shortName does not appear to be indexed
 
The account's field I have indexed are
 
shortName,number,location,fax,phone,symbol
 
and I order by
 
shortName

without  any order
 
shortName,number,location,fax,phone,symbol
 
it works fine.

I don't understand this behavior: if I don't sort the search results and I
don't have any document indexed, it works fine, but if I add a sort order I get
a RuntimeException, and I can't catch the exception to solve the problem.
The only solution is to index all of the entities' fields once in a
document, but to me that is just a patch.
 
Any idea,  it could help me out.
 
Thanks in advance.






RE: Sort Search Result

2004-08-24 Thread Aviran
Look at SortField:
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/SortField.html

-Original Message-
From: Natarajan.T [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, August 24, 2004 11:35 AM
To: 'Lucene Users List'
Subject: Sort Search Result


FYI,
 
How can I get the search results in Ascending order... (Sort API)
 
Thanks,
Natarajan.
 
 
 






RE: Searching MySql index using lucene

2004-08-24 Thread Aviran
Just read your data from the database and create a Lucene Index for the
columns you want to search

-Original Message-
From: sivalingam T [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, August 24, 2004 9:52 AM
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Subject: Searching MySql index using lucene


 
Hi,
  
1. MySQL creates an index by default. If I want to search this index using
Lucene, how can I do that?

2. How do I create an index on databases using Lucene?

   Please give me suggestions if anybody knows.

   Thanks.


With Warm Regards,
Sivalingam.T

Sai Eswar Innovations (P) Ltd,
Chennai-92






RE: Indexing and Searching Database in Lucene

2004-08-20 Thread Aviran
You need to create a Lucene index from the database.
Just index the columns and the records from the database.
It is also useful to have a field in Lucene that contains the
database's primary key, so you can retrieve the actual record from the
database.

Aviran
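
A minimal plain-Java sketch of the idea, assuming rows have already been read from the database as (primary key, text) pairs. PrimaryKeyIndexSketch is a hypothetical toy inverted index, not Lucene; it only shows why keeping the primary key next to the indexed text lets a hit be mapped back to its database record.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class PrimaryKeyIndexSketch {
    // term -> primary keys of the rows whose text contains that term
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    /** Indexes one row: every lowercased word points back to the row's primary key. */
    void addRow(int primaryKey, String text) {
        for (String word : text.toLowerCase().split("\\W+")) {
            if (word.isEmpty()) {
                continue; // split can yield an empty leading piece
            }
            postings.computeIfAbsent(word, k -> new TreeSet<>()).add(primaryKey);
        }
    }

    /** Returns the primary keys of rows containing the word, or an empty set. */
    Set<Integer> search(String word) {
        return postings.getOrDefault(word.toLowerCase(), Collections.emptySet());
    }
}
```

The keys returned by search() would then be used in an ordinary SELECT ... WHERE id IN (...) to fetch the full records.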

-Original Message-
From: sivalingam T [mailto:[EMAIL PROTECTED] 
Sent: Friday, August 20, 2004 10:55 AM
To: [EMAIL PROTECTED]
Subject: Indexing and Searching Database in Lucene


  Hi

  Can we index and search a database with the Lucene search engine?
  If anybody has done this, please send a reply.


With Warm Regards,
Sivalingam.T

Sai Eswar Innovations (P) Ltd,
Chennai-92






RE: index and search question

2004-08-09 Thread Aviran
yes

-Original Message-
From: Dmitrii PapaGeorgio [mailto:[EMAIL PROTECTED] 
Sent: Monday, August 16, 2004 9:23 AM
To: [EMAIL PROTECTED]
Subject: index and search question


Ok so when I index a file such as below

Document doc = new Document();
doc.Add(Field.Text("contents", new StreamReader(dataDir)));
doc.Add(Field.Keyword("filename", dataDir));

I can do a search as this
+contents:SomeWord  +filename:SomePath

Correct?




RE: Question on number of fields in a document.

2004-08-04 Thread Aviran
You should be fine, no problem with the number of fields

-Original Message-
From: John Z [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, August 04, 2004 12:23 PM
To: [EMAIL PROTECTED]
Subject: Question on number of fields in a document.


Hi
 
I have a question about the number of fields in a document. Is there any
limit to the number of fields you can have in an index?
 
We have around 25-30 fields per document at present: about 6 are keywords,
around 6 are stored but not indexed, and the rest are text fields that are
analyzed and indexed. We are planning on adding around 24 more fields,
mostly keywords.
 
Does anyone see any issues with this? Any impact on search or indexing?
 
Thanks
ZJ







RE: When does IndexReader pick up changes?

2004-07-29 Thread Aviran
An IndexReader sees the index as it was at the moment the reader was opened.
If new documents are added to the index afterwards, you need to open a new
IndexReader in order to pick up the changes.

Aviran

-Original Message-
From: Stephane James Vaucher [mailto:[EMAIL PROTECTED] 
Sent: Thursday, July 29, 2004 0:00 AM
To: Lucene Users List
Subject: Re: When does IndexReader pick up changes?


IIRC, if you use a searcher, changes are picked up right away. With a
reader, I would expect it should react the same way.

Disclaimer: I'm not a Lucene guru, I might be wrong.

Where I'm less sure is with an FSDirectory, as it uses an internal
RAMDirectory. If two separate processes (within the same classloader, FS
with same paths are reused) use different FSDirectories, you might notice
a flushing behaviour.

sv

On 28 Jul 2004 [EMAIL PROTECTED] wrote:

 Hi,

 Does anyone know if the IndexWriter has to be closed for an 
 IndexReader to pick up the changes?

 Thanks.

 --- Lucene Users List [EMAIL PROTECTED] wrote:
  Hi,
 
  If I do this:
 
  - open index writer
  - add document
  - open reader
  - search with reader
  - close reader
  - close writer
 
  Will the reader pick up the document that was added to the index
  since it was opened after the document was added?  Or will it only
  pick up changes that occur after the index writer is closed?
 
  Thanks for the help!





RE: When does IndexReader pick up changes?

2004-07-29 Thread Aviran
AFAIK you don't have to close the writer

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] 
Sent: Thursday, July 29, 2004 11:17 AM
To: [EMAIL PROTECTED]
Subject: RE: When does IndexReader pick up changes?


Yes, I understand that the IndexReader only picks up changes once it is
opened.  I'm just trying to determine whether the IndexWriter first needs to
be closed or if that is not necessary.




RE: rebuild index

2004-07-22 Thread Aviran
Why not build a new index in a different location, add the missing
documents from the old index to the new one at the end, and then delete
the old index?
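
The final step of that approach, putting the new index in place of the old one, can be sketched with plain java.nio file operations. This is only an illustration under stated assumptions: the class name IndexSwap, the ".old" backup suffix and the directory names are hypothetical, both directories are assumed to sit on the same file system, and no reader or writer is assumed to have the live index open during the swap. Building the new index itself is left out.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Lucene: swaps a freshly built index
// directory into place and removes the old one afterwards.
final class IndexSwap {

    // Moves liveDir aside, moves freshDir to liveDir's path, then deletes the old copy.
    static void swap(Path freshDir, Path liveDir) {
        Path backup = liveDir.resolveSibling(liveDir.getFileName() + ".old");
        try {
            deleteRecursively(backup);           // clear leftovers from an earlier run
            if (Files.exists(liveDir)) {
                Files.move(liveDir, backup);     // keep the old index until the new one is in place
            }
            Files.move(freshDir, liveDir);       // a rename when both paths share a file system
            deleteRecursively(backup);           // finally delete the old index
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Deletes a directory tree, children before parents; does nothing if it is absent.
    static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }
}
```

Keeping the old directory until the new one has been moved into place means a failure halfway through still leaves one complete index on disk.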

Aviran

-Original Message-
From: Sergiu Gordea [mailto:[EMAIL PROTECTED] 
Sent: Thursday, July 22, 2004 10:49 AM
To: Lucene Users List
Subject: rebuild index



 Hi all,

 I have a question about reindexing documents with Lucene. We want to
implement the functionality of rebuilding the Lucene index. That means I
want to delete all documents in the index and add newer versions.
All the information I need to reindex is kept in the database, so I have
a Term ID, which is unique.

My problem is that I don't have a deleteall() method in IndexReader, and
I don't have undelete(int) and undelete(Term) methods. I have only the
delete(Term) and undeleteAll() methods that can be used for this action.

I would like to delete all documents (just mark them as deleted), add the
new documents to the index, and create a list of documents that were not
successfully indexed (for various reasons, which may depend on Lucene or
on our code). At the end I would like to restore (mark as undeleted) the
documents in the list and optimize the index, so that the changes are
permanently committed to the index.

 Is this possible without hacking the Lucene code? Any ideas?

 Thanks in advance,

 Sergiu
 

 





RE: Sorting on tokenized fields

2004-07-21 Thread Aviran
You can create a new field which contains the full untokenized string and
use it as a sort field.


-Original Message-
From: Florian Sauvin [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, July 20, 2004 20:13 PM
To: Lucene Users List
Subject: Sorting on tokenized fields


I see in the Javadoc that it is only possible to sort on fields that 
are not tokenized, I have two questions about that:

1) What happens if the field is tokenized, is sorting done anyway, 
using the first term only?

2) Is there a way to do some sorting anyway, by concatenating all the 
tokens into one string?

--

Florian





RE: Sort: 1.4-rc3 vs. 1.4-final

2004-07-21 Thread Aviran
Since I had to implement sorting in Lucene 1.2, I had to write my own sorting
using something similar to a Lucene contribution called SortField.
Yesterday I did some tests, trying to use the Lucene 1.4 Sort objects, and I
realized that my old implementation works 40% faster than Lucene's
implementation. My guess is that you are right and there is a problem with
the cache, although I couldn't find what it is yet.

Aviran

-Original Message-
From: Greg Gershman [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, July 21, 2004 9:22 AM
To: [EMAIL PROTECTED]
Subject: Sort: 1.4-rc3 vs. 1.4-final


When rc3 came out, I modified the classes used for
Sorting to, in addition to Integer, Float and
String-based sort keys, use Long values.  All I did
was add extra statements in 2 classes (SortField and
FieldSortedHitQueue) that made a special case for
longs, and created a LongSortedHitQueue identical to
the IntegerSortedHitQueue, only using longs.  

This worked as expected; Long values converted to
strings and stored in Field.Keyword type fields would
be sorted according to Long order.  The initial query
would take a while, to build the sorted array, but
subsequent queries would take little to no time at
all.

I went back to look at 1.4 final, and noticed the Sort implementation has
changed quite a bit.  I tried the same type of modifications to the existing
source files, but was unable to achieve similar results.
Each subsequent query seems to take a significant
amount of time, as if the sorted array is being
rebuilt each time.  Also, I tried sorting on an
Integer field and got similar results, which leads me
to believe there might be a caching problem somewhere.

Has anyone else seen this in 1.4-final?  Also, I would
like it if Long sorted fields could become a part of
the API; it makes sorting by date a breeze.
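
For readers who store long values (such as dates in milliseconds) as strings in keyword fields, one alternative that needs no modified sort classes is to zero-pad the value so that plain string order matches numeric order. This is a different technique from the Long sort patch described above, shown only as a sketch; the class name SortableLong and its methods are hypothetical, and negative values are deliberately not handled.

```java
import java.util.Locale;

// Hypothetical helper: encodes non-negative longs as fixed-width strings
// whose lexicographic order equals their numeric order.
final class SortableLong {

    // Long.MAX_VALUE has 19 decimal digits, so 19 is a safe fixed width.
    static String encode(long value) {
        if (value < 0) {
            throw new IllegalArgumentException("negative values are not handled in this sketch");
        }
        return String.format(Locale.ROOT, "%019d", value);
    }

    // Reverses encode(); leading zeros are accepted by Long.parseLong.
    static long decode(String encoded) {
        return Long.parseLong(encoded);
    }
}
```

The cost is a longer term per document; the benefit is that an ordinary string sort on the field already gives the numeric order.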

Thanks!

Greg Gershman






RE: Sort: 1.4-rc3 vs. 1.4-final

2004-07-21 Thread Aviran
I think I found the problem.
FieldCacheImpl uses a WeakHashMap to store the cached objects, but since there
is no other reference to this cache it is getting released.
Switching to HashMap solves it.
The only problem is that I don't see anywhere that the cached object will
get released if you open a new IndexReader.

Aviran

-Original Message-
From: Greg Gershman [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, July 21, 2004 13:13 PM
To: Lucene Users List
Subject: RE: Sort: 1.4-rc3 vs. 1.4-final


I've done a bit more snooping around; it seems that in
FieldSortedHitQueue.getCachedComparator (line 153), calls to look up a stored
comparator in the cache always return null.  This occurs even for the
built-in sort types (I tested it on integers and my code for longs).  The
comparators don't even appear to be stored in the HashMap to begin
with.

Any ideas?

Greg

 




RE: Sort: 1.4-rc3 vs. 1.4-final

2004-07-21 Thread Aviran
I just saw this post, I guess we both came to the same conclusion. 
The only problem is that the cached object never gets released, and a new
one will get created every time you open a new IndexReader

Aviran

-Original Message-
From: Greg Gershman [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, July 21, 2004 13:30 PM
To: Lucene Users List
Subject: RE: Sort: 1.4-rc3 vs. 1.4-final


I switched the Comparators and FieldCache classes to
use java.util.HashMap instead of
java.util.WeakHashMap, and got the performance boost I
was looking for (test index of 100K documents; initial
search took 991 ms, all subsequent searches took
90ms.  Before, I was seeing initial query of ~1sec,
subsequent queries between 500 and 700 ms, with
comparator and field lookup table computed each time).

I guess the question is why use a WeakHashMap here as
opposed to a HashMap?

Greg




RE: Sort: 1.4-rc3 vs. 1.4-final

2004-07-21 Thread Aviran
I will post a patch soon

Aviran

-Original Message-
From: Doug Cutting [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, July 21, 2004 13:56 PM
To: Lucene Users List
Subject: Re: Sort: 1.4-rc3 vs. 1.4-final


The key in the WeakHashMap should be the IndexReader, not the Entry.  I 
think this should become a two-level cache, a WeakHashMap of HashMaps, 
the WeakHashMap keyed by IndexReader, the HashMap keyed by Entry.  I 
think the Entry class can also be changed to not include an IndexReader 
field.  Does this make sense?  Would someone like to construct a patch 
and submit it to the developer list?

Doug
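
The two-level layout described in this message can be sketched with the standard collections alone. This is not the actual FieldCacheImpl code or the eventual patch; the class name ReaderKeyedCache and its methods are hypothetical, and the reader is represented by a generic type parameter so the sketch compiles without Lucene.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical two-level cache: the outer WeakHashMap is keyed by the reader,
// the inner HashMap by the cache entry key (for example field name plus type).
final class ReaderKeyedCache<R, K, V> {

    private final Map<R, Map<K, V>> byReader = new WeakHashMap<>();

    // Returns the cached value, or null if nothing is cached for this reader and key.
    synchronized V get(R reader, K key) {
        Map<K, V> inner = byReader.get(reader);
        return inner == null ? null : inner.get(key);
    }

    // Stores a value under the given reader and key, creating the inner map on first use.
    synchronized void put(R reader, K key, V value) {
        byReader.computeIfAbsent(reader, r -> new HashMap<>()).put(key, value);
    }
}
```

As long as the application holds a reference to the reader, its cached values stay reachable; once the reader is no longer referenced anywhere, the whole inner map becomes eligible for collection. This only works if the cached values do not themselves hold a strong reference to the reader.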




Lucene Search has poor cpu utilization on a 4-CPU machine

2004-07-12 Thread Aviran
Hi all,
First let me explain what I found out. I'm running Lucene on a 4-CPU server.
While doing some stress tests I noticed (by taking a full thread dump) that
searching threads are blocked on the method public FieldInfo fieldInfo(int
fieldNumber). This causes significant CPU idle time.
I noticed that the class org.apache.lucene.index.FieldInfos uses the private
class members Vector byNumber and Hashtable byName, both of which are
synchronized objects. By changing the Vector byNumber to an ArrayList
I was able to get a 110% improvement in performance (number of searches per
second).
 
My question is: do the fields byNumber and byName have to be synchronized,
and what can happen if I change them to ArrayList and HashMap, which
are not synchronized? Can this corrupt the index or the integrity of the
results?
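
As general Java background for this question, unsynchronized collections can be read from many threads without locking provided they are fully built before being shared and never modified afterwards. The sketch below shows that pattern; the class FieldTable and its methods are hypothetical and are not the FieldInfos code. Whether FieldInfos is really read-only at search time is an assumption this thread does not settle.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical read-only lookup table: populated once in the constructor,
// then only read. Final fields make the finished collections safely
// visible to every thread that obtains the object afterwards.
final class FieldTable {

    private final List<String> byNumber;
    private final Map<String, Integer> byName;

    FieldTable(List<String> fieldNames) {
        List<String> list = new ArrayList<>(fieldNames);
        Map<String, Integer> map = new HashMap<>();
        for (int i = 0; i < list.size(); i++) {
            map.put(list.get(i), i);
        }
        this.byNumber = Collections.unmodifiableList(list);
        this.byName = Collections.unmodifiableMap(map);
    }

    // Lock-free lookup by field number.
    String fieldName(int fieldNumber) {
        return byNumber.get(fieldNumber);
    }

    // Lock-free lookup by name; returns -1 for an unknown field.
    int fieldNumber(String fieldName) {
        Integer n = byName.get(fieldName);
        return n == null ? -1 : n;
    }
}
```

If the table can still change while searches are running, this pattern does not apply and some form of synchronization is needed again.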

Thanks,
Aviran






RE: How would you delete an entry that was indexed like this

2003-12-05 Thread Aviran
This is kind of a problem. In order to delete documents using terms you need
to have a keyword field which contains a unique value; otherwise you might
end up deleting more than you want.
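
The difference can be shown with a toy model of term-based deletion. This is not Lucene's API: ToyIndex and its methods are hypothetical, the "id" field stands in for a keyword (untokenized) field, and "contents" stands in for a tokenized text field.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Toy model only: each document maps a field name to the set of terms indexed for it.
final class ToyIndex {

    private final Map<Integer, Map<String, Set<String>>> docs = new LinkedHashMap<>();
    private int nextDoc = 0;

    // "id" is kept as one whole term; "contents" is split into lowercase words.
    void add(String id, String description) {
        Map<String, Set<String>> fields = new HashMap<>();
        fields.put("id", Collections.singleton(id));
        String[] words = description.toLowerCase(Locale.ROOT).split("\\s+");
        fields.put("contents", new HashSet<>(Arrays.asList(words)));
        docs.put(nextDoc++, fields);
    }

    // Deletes every document whose field contains the term; returns how many were removed.
    int delete(String field, String term) {
        int before = docs.size();
        docs.values().removeIf(
            fields -> fields.getOrDefault(field, Collections.<String>emptySet()).contains(term));
        return before - docs.size();
    }
}
```

Deleting by a word from the tokenized field removes every document that contains that word, while deleting by the unique id term removes exactly one document.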

-Original Message-
From: Mike Hogan [mailto:[EMAIL PROTECTED] 
Sent: Friday, December 05, 2003 1:06 PM
To: [EMAIL PROTECTED]
Subject: How would you delete an entry that was indexed like this


Hi,

If I index a document like this:

IndexWriter writer = createWriter();
Document document = new Document();
document.add(Field.Text(ID_FIELD_NAME, componentId));
document.add(Field.Text(CONTENTS_FIELD_NAME, componentDescription));
writer.addDocument(document);
writer.optimize();
writer.close();

What code must I execute to later delete the document? (I tried following the
docs and what's done in the code and test cases.  I saw Terms being used to
identify the document to delete, but I am not clear what value to put in the
Term, as I do not know how Terms relate to Fields.)

Many thanks,
Mike.





RE: Query creation

2003-12-04 Thread Aviran
You'll need to apply some kind of filter, or add another field to the index
which contains only the first word. (Yes, you'll need to rebuild the index in
this case.)
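
Computing the value for such an extra field at indexing time could look like the sketch below. The class name FirstWord is hypothetical, and splitting on whitespace and lowercasing are assumptions; the extraction should match whatever the analyzer does to the main field so that queries and stored terms agree.

```java
import java.util.Locale;

// Hypothetical helper: extracts the first whitespace-delimited word of a field value,
// lowercased, to be indexed in a separate untokenized field.
final class FirstWord {

    static String of(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return "";
        }
        // Limit 2 stops splitting after the first word.
        return trimmed.split("\\s+", 2)[0].toLowerCase(Locale.ROOT);
    }
}
```

A one-word query against that extra field then matches only documents whose original field starts with that word.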

-Original Message-
From: Armbrust, Daniel C. [mailto:[EMAIL PROTECTED] 
Sent: Thursday, December 04, 2003 5:49 PM
To: 'Lucene Users List'
Subject: Query creation


Is it possible to create a query that would find a match in a document if
and only if the query (a one word query) matched with the first word in the
field I am searching?

Or do I have to rebuild my indexes, with a field that only contains the
first word?

Thanks, 

Dan




RE: RC2 requires reindexing?

2003-08-29 Thread Aviran Mordo
You can find RC2 in CVS

-Original Message-
From: Jan Agermose [mailto:[EMAIL PROTECTED] 
Sent: Friday, August 29, 2003 6:32 AM
To: Lucene Users List
Subject: Re: RC2 requires reindexing?


OK, after the first posting about RC2 I looked for it, but as I did not
find any RC2 I guessed he was mistaken... but now? What RC2 are you
talking about, and if it's 1.3 RC2, where do I find it and why does the
webpage not mention it (or the download area hold it)?

Jan

- Original Message - 
From: Lukas Zapletal [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Friday, August 29, 2003 12:14 PM
Subject: Re: RC2 requires reindexing?


 It does not need reindexing; it works fine for me.

 On Thu, 28 Aug 2003 11:27:34 -0400, Terry Steichen 
 [EMAIL PROTECTED]
 wrote:

  I just switched to RC2 and found that a number of queries now don't
  work.  (When I switch back to RC1 they work fine.)  Can't seem to figure
  out a pattern regarding those that don't work versus those (the vast
  majority) that still work fine.  I looked in the RC2 source and
  noticed that the dates on IndexWriter and IndexReader and a bunch of
  related modules seem to have been changed.
 
  Is it necessary to reindex (a major task for my stuff) to use RC2?
 
  Regards,
 
  Terry
 



 --
 Lukas Zapletal

 http://www.tanecni-olomouc.cz/lzap   icq: 17569735
 mail: lzap_at_root.czjabber: lzap_at_njs.netlab.cz
 pgp: 715B 5502 4FB3 65E7 266B 927E CE9F 1D04 0EE2 4DB7




RE: Newbie Questions

2003-08-26 Thread Aviran Mordo
1. You need to use MultiFieldQueryParser
2. I think you should use PorterStemFilter instead of fuzzy query
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/analysis/PorterStemFilter.html

-Original Message-
From: Mark Woon [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, August 26, 2003 12:54 AM
To: [EMAIL PROTECTED]
Subject: Newbie Questions


Hi all...

I've been playing with Lucene for a couple days now and I have a couple 
questions I'm hoping some one can help me with.  I've created a Lucene 
index with data from a database that's in several different fields, and 
I want to set up a web page where users can search the index.  Ideally, 
all searches should be as google-like as possible.  In Lucene terms, I 
guess this means the query should be fuzzy.  For example, if someone 
searches for cancer then I'd like to get back all results with any form
of the word cancer in the term (cancerous, breast cancer, etc.).

So far, I seem to be having two problems:

1) How can I search all fields at the same time?  The QueryParser seems 
to only search one specific field.

2) How can I automatically default all searches into fuzzy mode?  I 
don't want my users to have to know that they must add a ~ at the end 
of all their terms.

Thanks,
-Mark







Bug: TermQuery toString - incorrect

2003-07-30 Thread Aviran Mordo
I have a TermQuery object which contains a term that includes a space (two
words). But when I call toString() I get a query that matches an OR
operation.

Example: The Term +Small Business results with a toString method as

+(SocioEconomicInformation:Small Business) 

And the expected result should be

+(SocioEconomicInformation:"Small Business")

The problem is when I try to parse it again I get 

+(SocioEconomicInformation:Small Content:Business) 

Because it does not have the double quotes, the parser tokenizes the term
Small Business into two terms [Small] [Business] instead of one [Small
Business].
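
The quoting behaviour expected here can be sketched as a small formatting helper. This is not Lucene's TermQuery.toString() implementation; the class TermFormat and its method are hypothetical, and escaping of quote characters inside the term text is left out.

```java
// Hypothetical helper: renders field:text, wrapping the text in double quotes
// when it contains whitespace so that a parser reads it back as one term.
final class TermFormat {

    static String render(String field, String text) {
        boolean needsQuotes = text.chars().anyMatch(Character::isWhitespace);
        String shown = needsQuotes ? "\"" + text + "\"" : text;
        return field + ":" + shown;
    }
}
```

With this rule a single-word term is printed unchanged, and a multi-word term round-trips as a quoted phrase instead of being split across the default field.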

I use Lucene 1.3 RC1. 

Aviran



RE: keyword indexing

2003-07-16 Thread Aviran Mordo
If you are searching on a keyword field, you might need to use a TermQuery
in order to get an exact match.

-Original Message-
From: Jan Agermose [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, July 16, 2003 1:04 PM
To: [EMAIL PROTECTED]
Subject: keyword indexing


I'm having some problems with chars in keywords that are not a-z0-9
chars...

If I have a keyword like Det Naturvidenskabelige Fakultet or a name
Jan Agermose - well besides the fact I need to lowercase the keywords
as the querystring is lowercased by lucene, I still cannot get any hits
on the keywords. 

Det Naturvidenskabelige Fakultet - hits = 0
Det* - hits!
Det Naturvidenskabelige Fakultet - hits = 0

I can understand the last one - but shouldn't the first one return hits?
If not, using keywords seems to be limited to keywords composed of
[a-z0-9]+ ??? 

Now I do a string replace on [^a-z0-9]+ /  (removing all the chars)
but this gives the query parser some problems I would think - unless in my
special case where the user is not really free to compose queries on
their own - therefore I can do the same string replace thing on the input
:-D But I would like for the power user to input real queries - and this
leaves me with the problem of parsing queries. I need to do the
string replace only within double quotes... This should be Lucene's
problem, not mine :-D

Am I missing something ??

Jan Agermose






RE: Maybe a stupid question?

2003-07-10 Thread Aviran Mordo
You can add as many fields with the same name as your heart desires to
the same document. This will give you multiple values.

Aviran

-Original Message-
From: Olivier Cochet [mailto:[EMAIL PROTECTED] 
Sent: Thursday, July 10, 2003 10:43 AM
To: Lucene Users List
Subject: Maybe a stupid question?


Hello, I would like to know if it is possible to store more than
one value for a field in a document. I think that it must be possible,
but I don't know how to do it.

Thanks for answering, and for knocking on my head if I asked a stupid question
;-)

olivier cochet from Paris





RE: Results sorted by date instead of score?

2003-07-03 Thread Aviran Mordo
You'll need to sort the results after you have collected them. There is a
project called SortedField in Lucene's contribution or sandbox area (I don't
remember exactly which) which will help you sort by any field.
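
Sorting collected results by a date field can be done with the standard library once the hits are copied into a list. The sketch below assumes the date is stored as a yyyyMMdd string, so that string order equals chronological order; the classes DateSort and Hit are hypothetical and stand in for whatever the application reads out of each matching document.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical post-search sort: orders collected hits by their date field, newest first.
final class DateSort {

    // Minimal stand-in for a search hit; date is assumed to be a yyyyMMdd string.
    static final class Hit {
        final String title;
        final String date;

        Hit(String title, String date) {
            this.title = title;
            this.date = date;
        }
    }

    // Returns a new list sorted by date descending; the input list is left untouched.
    static List<Hit> newestFirst(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparing((Hit h) -> h.date).reversed());
        return sorted;
    }
}
```

This requires collecting all hits before sorting, which is acceptable for modest result sets but can be costly for very large ones.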

-Original Message-
From: Wilton, Reece [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, July 02, 2003 6:01 PM
To: [EMAIL PROTECTED]
Subject: Results sorted by date instead of score?


Search hits come back ordered by score.  How do I get my results sorted
by the date of the article?  I have added the article date as a keyword
field.




RE: Using Lucene in an multiple index/large io scenario

2003-06-30 Thread Aviran Mordo
You'll probably need to optimize the index more often. This will reduce
the number of files Lucene opens. Also, if you can merge several fields
into one, that will also reduce the number of files.

Aviran

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] 
Sent: Monday, June 30, 2003 3:54 PM
To: [EMAIL PROTECTED]
Subject: Using Lucene in an multiple index/large io scenario


Hello,
I am the project manager of the columba.sourceforge.net Java
mail client project, and we integrated Lucene as the search backend half a
year ago. It works for small-scale mail traffic, but with
increasing mail traffic Lucene throws OutOfMemory and
TooManyFilesOpen exceptions. I am now wondering whether Lucene is capable of
doing the job for us (as Otis Gospodnetic suggested) and would
appreciate any help and knowledge you can share on this topic.

I think the problem arises from the following issues:
- Lucene is designed to create an index once in a while, not to
  update an index frequently. We need it to add and delete documents
  very often *and* search the index after potentially every operation.
  Does anyone have experience running Lucene in such an environment, or
  do you think it is impossible?

- Do you have a suggestion on how to use Lucene in such an environment?
  It is not very nice code if you have to create a new
  IndexReader/Writer after every operation.

- We introduced a RAMIndex that is merged into the FileIndex after N
  operations, to reduce the load and to avoid merging documents that are
  removed directly after they are added (with filters on the mailboxes
  that happens very often). Any ideas whether that was wise, or whether
  there is a better solution?

- Does Lucene have problems with many indices in the same virtual
  machine? We have an index for every mail folder and get
  TooManyFilesOpen exceptions when having 10 indices open. Maybe we
  should try to have only a single index that holds all messages?
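
The "buffer N operations, then merge" idea from the third point can be sketched in plain Java as follows. This is only an illustration of the bookkeeping, not Columba's or Lucene's code: PendingIndexOps and its flushLog are hypothetical names, and the flush step merely records what a real implementation would apply to the on-disk index in one batch.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical buffer for index operations, keyed by message id.
final class PendingIndexOps {
    private final int flushThreshold;
    private final Set<String> pendingAdds = new LinkedHashSet<>();
    private final Set<String> pendingDeletes = new LinkedHashSet<>();
    private int opsSinceFlush = 0;

    // Stands in for the on-disk index: records what each flush would apply.
    final List<String> flushLog = new ArrayList<>();

    PendingIndexOps(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    void add(String messageId) {
        pendingAdds.add(messageId);
        countOperation();
    }

    void delete(String messageId) {
        // A message added and deleted within the same batch never reaches
        // the on-disk index; otherwise remember the delete for the flush.
        if (!pendingAdds.remove(messageId)) {
            pendingDeletes.add(messageId);
        }
        countOperation();
    }

    private void countOperation() {
        opsSinceFlush++;
        if (opsSinceFlush >= flushThreshold) {
            flush();
        }
    }

    // Applies deletes before adds, so a delete followed by a re-add of the
    // same id leaves the message in the index.
    void flush() {
        for (String id : pendingDeletes) {
            flushLog.add("delete " + id);
        }
        for (String id : pendingAdds) {
            flushLog.add("add " + id);
        }
        pendingDeletes.clear();
        pendingAdds.clear();
        opsSinceFlush = 0;
    }
}
```

With a threshold of 3, adding m1, deleting m1 and adding m2 results in a single flush that only adds m2. Searches issued between flushes would still have to consult the buffered messages, which this sketch leaves out.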

If you would like to look at the source code to see how we implemented all this, look at
http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/columba/columba/src/mail/core/org/columba/mail/folder/search/LuceneSearchEngine.java?rev=1.7&content-type=text/vnd.viewcvs-markup

It's not nice to just give you the plain code and not the relevant
snippets, but these are more general design issues that I think are
better explained in words than in code.

I would really like to see Lucene integrated in Columba, but I had to
learn that it is no easy task, maybe an impossible one. Based on the
responses I will decide whether we continue to work with Lucene or sadly
have to drop it.

Thanks in advance
Timo Stich
[EMAIL PROTECTED]





RE: date ranges.....

2003-06-27 Thread Aviran Mordo
Use RangeQuery to search on the date field
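
For the case in the quoted question below (events that are running on a given day), one range is not enough: the event must start on or before the day and end on or after it, so two range conditions are combined and both are required. A minimal sketch in plain Java, assuming dates are stored as yyyyMMdd strings in fields named start_date and end_date (taken from the SQL in the question). The sentinel bounds 00000000 and 99999999 are my assumption, and the bracketed `[a TO b]` range syntax should be checked against the QueryParser version in use, since older releases wrote ranges differently.

```java
final class EventOverlap {
    // True when the event is running on 'day'. All arguments are yyyyMMdd
    // strings, so lexicographic comparison matches chronological order.
    static boolean isActiveOn(String startDate, String endDate, String day) {
        return startDate.compareTo(day) <= 0 && endDate.compareTo(day) >= 0;
    }

    // Query text with two required range clauses. Field names and the
    // open-ended sentinel values are assumptions for illustration.
    static String activeOnQuery(String day) {
        return "+start_date:[00000000 TO " + day + "] "
             + "+end_date:[" + day + " TO 99999999]";
    }

    public static void main(String[] args) {
        System.out.println(isActiveOn("20030625", "20030701", "20030627"));
        System.out.println(activeOnQuery("20030627"));
    }
}
```

The same condition can be built programmatically as two range queries added as required clauses of a boolean query.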

-Original Message-
From: host unknown [mailto:[EMAIL PROTECTED] 
Sent: Friday, June 27, 2003 10:39 AM
To: [EMAIL PROTECTED]
Subject: date ranges.


Hi all

Here's my scenario

I'm building a calendaring application and using Lucene (one of many
times I've used it on our site) for the indexing/retrieval mechanism.
The calendar has events. An event consists of: start date, end date,
start time, end time, and descriptive information. Most begin and end
on the same day, but not all of them.

Here's where the problem lies. Let's say an event runs from 20030625
(June 25 2003) until 20030701 and I want to search all events (several
thousand) and know what's happening today (20030627). The results I'm
looking for can be described with this SQL statement: Select * from
events where start_date <= 20030627 and end_date >= 20030627. How do I
write this 'query' with Lucene?

Many thanks,
Dominic




RE: query question in trouble

2003-06-11 Thread Aviran Mordo
"In" is probably a stop word in your analyzer.
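
A toy illustration of what a stop-word filter does to a query, in plain Java. The stop list here is made up for the example; the real list depends on the analyzer you pass to the query parser, so check that analyzer's stop words rather than relying on this one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class StopWordDemo {
    // Illustrative stop list only (an assumption, not any analyzer's list).
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("in", "the", "of"));

    // Lowercases, splits on whitespace and silently drops stop words,
    // which is why a dropped word leaves no trace in the parsed query.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase().split("\\s+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                terms.add(token);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(analyze("in trouble"));
        System.out.println(analyze("an trouble"));
    }
}
```

With this list, "in trouble" is reduced to the single term "trouble", while "an trouble" keeps both words.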

-Original Message-
From: Ryan Clifton [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, June 11, 2003 3:13 PM
To: Lucene Users List
Subject: query question in trouble


Hello,

Upon reviewing the results of some queries recently I noticed that the
query "in trouble" always searches for "trouble".

Is 'in' a keyword that I'm not aware of?  I searched the whole query
syntax page and didn't see it mentioned.  I tried "an trouble" and the
query worked fine.  The query parser appears to be stripping out 'in',
but not doing anything with it.

Here's my log:

**Query: in trouble
2003-06-11 12:08:50,540 DEBUG Searching for: textcontent:trouble (Query.toString())
2003-06-11 12:08:50,569 DEBUG 6582 total matching documents

**Query: an trouble
2003-06-11 12:06:11,275 DEBUG Searching for: textcontent:an trouble (Query.toString())
2003-06-11 12:06:12,342 DEBUG 1 total matching documents

Any ideas?

Thanks.




RE: Sort results by date alone?

2003-05-29 Thread Aviran Mordo
I think I saw a solution for this in the past. Try searching the mailing
list.
Anyway, you can always use the SearchBean, which is in the Lucene sandbox, to
sort by any field.

-Original Message-
From: news [mailto:[EMAIL PROTECTED] On Behalf Of David Weitzman
Sent: Tuesday, May 27, 2003 8:26 PM
To: [EMAIL PROTECTED]
Subject: Sort results by date alone?


I think it's possible, but I'm not sure how Scorers work.  I just want
to place the most recent hits at the front and the oldest ones at the
back (where date is a field in the documents).  Is there a simple way
to do this?

Thanks,

David Weitzman







RE: Wildcard workaround

2003-05-29 Thread Aviran Mordo
You can also index the file names with a leading character. For instance,
file1.exe would be indexed as _file1.exe, and you always add the
leading character to the search term.
So if the user input is *.exe your query should be _*.exe, and if the
user input is fi* you change it to _fi*
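
A minimal sketch of that rewriting in plain Java. The class and method names are made up for illustration, and the small matches() helper only emulates wildcard matching with a regular expression so the effect can be checked without an index.

```java
import java.util.regex.Pattern;

final class LeadingWildcardWorkaround {
    // Marker character prepended at index time and at query time.
    static final char MARKER = '_';

    // Form of the file name that goes into the index.
    static String toIndexedForm(String fileName) {
        return MARKER + fileName;
    }

    // Form of the user's input that goes into the query, so the term
    // never starts with a wildcard.
    static String toQueryTerm(String userInput) {
        return MARKER + userInput;
    }

    // Emulates wildcard matching ('*' = any run, '?' = one character).
    static boolean matches(String queryTerm, String indexedTerm) {
        StringBuilder regex = new StringBuilder();
        for (char c : queryTerm.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.matches(regex.toString(), indexedTerm);
    }

    public static void main(String[] args) {
        String indexed = toIndexedForm("file1.exe");
        System.out.println(indexed + " " + matches(toQueryTerm("*.exe"), indexed));
    }
}
```

Note that the marker has to survive analysis, so the file name field should be indexed without a tokenizer that strips the underscore.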

Aviran

-Original Message-
From: David Warnock [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, May 28, 2003 10:55 AM
To: Lucene Users List
Subject: Re: Wildcard workaround


Andrei,

 I have a file database indexed by content and also by filename. It 
 would be nice if the user could perform a usual search like *.ext.
  
 Anybody tried a workaround for this issue ? ( this is needed only for 
 the name of the file, for the rest of the terms the rules are fine 
 with me)

If the term begins with * then could you expand it into a set of 36 
terms eg a*.ext b*.ext ... z*.ext 0*.ext

No idea how this would compare to the other alternatives for speed. But 
it would be simple to code and would not increase index size.

Of course if filenames can use unicode character sets then you have a 
problem. At that point you would need to do a check of what all the 
first characters are to know what terms to use (i.e. only create a term 
for each character that is used as the first character of a filename).
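
The expansion described above could look roughly like this in plain Java. It assumes file names start with a lowercase ASCII letter or a digit (the 36 characters mentioned); the class name is made up, and the resulting terms would still have to be OR-ed together into one query.

```java
import java.util.ArrayList;
import java.util.List;

final class LeadingWildcardExpansion {
    // Assumed set of possible first characters: a-z followed by 0-9.
    static final String FIRST_CHARS = "abcdefghijklmnopqrstuvwxyz0123456789";

    // Expands a term with a leading '*' into one term per possible first
    // character; any other term is returned unchanged.
    static List<String> expand(String term) {
        List<String> terms = new ArrayList<>();
        if (!term.startsWith("*")) {
            terms.add(term);
            return terms;
        }
        for (char first : FIRST_CHARS.toCharArray()) {
            terms.add(first + term);
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(expand("*.ext"));
    }
}
```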

HTH

Dave
-- 
David Warnock, Sundayta Ltd. http://www.sundayta.com
iDocSys for Document Management. VisibleResults for Fundraising.
Development and Hosting of Web Applications and Sites.






RE: Querying Question

2003-04-03 Thread Aviran Mordo
You should not tokenize the file name. Instead you should use

 doc.add(new Field(name, value, true, true, false));

(the last argument, which controls tokenizing, must be false) or

 doc.add(Field.Keyword(name, value));
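
To see why tokenizing can matter here, a toy comparison in plain Java. It assumes an analyzer that splits on anything that is not a letter; the thread does not show what RepositoryIndexAnalyzer actually does, so this is only one possible behaviour, and the ASCII-only split is a simplification.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class TokenizedVsKeyword {
    // Simplified letter-based tokenizer (assumption): lowercases and
    // splits on every character outside a-z.
    static List<String> letterTokens(String value) {
        List<String> tokens = new ArrayList<>();
        for (String token : value.toLowerCase().split("[^a-z]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // A keyword field keeps the whole value as a single term.
    static List<String> keywordToken(String value) {
        return Collections.singletonList(value);
    }

    public static void main(String[] args) {
        System.out.println(letterTokens("filename_1") + " vs " + keywordToken("filename_1"));
        System.out.println(letterTokens("filename_2") + " vs " + keywordToken("filename_2"));
    }
}
```

Under that assumption filename_1 through filename_4 all index the same term, so a search on the value cannot tell them apart, whereas the keyword form keeps them distinct.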

Aviran

-Original Message-
From: Rob Outar [mailto:[EMAIL PROTECTED] 
Sent: Thursday, April 03, 2003 5:27 PM
To: Lucene Users List
Subject: RE: Querying Question


Use the following type of Field:

   doc.add(new Field(name, value, true, true, true));
   

Thanks,
 
Rob 


-Original Message-
From: Aviran Mordo [mailto:[EMAIL PROTECTED]
Sent: Thursday, April 03, 2003 5:19 PM
To: 'Lucene Users List'
Subject: RE: Querying Question


Did you index the value field as a keyword?

Aviran

-Original Message-
From: Rob Outar [mailto:[EMAIL PROTECTED] 
Sent: Thursday, April 03, 2003 5:11 PM
To: Lucene Users List
Subject: Querying Question
Importance: High


Hi all,

I am a little fuzzy on complex querying using AND, OR, etc. For
example:

I have the following name/value pairs

file 1 = name = checkpoint value = filename_1
file 2 = name = checkpoint value = filename_2
file 3 = name = checkpoint value = filename_3
file 4 = name = checkpoint value = filename_4

I ran the following Query:

name:\"checkpoint\" AND  value:\"filename_1\"

Instead of getting back file 1, I got back all four files?

Then after trying different things I did:

+(name:\"checkpoint\") AND  +(value:\"filename_1\")

it then returned file 1.

Our project queries solely on name/value pairs and we need the ability
to query using AND, OR, NOT, etc.  What is the correct syntax for such
queries?

The code I use is :
 QueryParser p = new QueryParser("",
 new RepositoryIndexAnalyzer());
 this.query = p.parse(query.toLowerCase());
 Hits hits = this.searcher.search(this.query);
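
One thing worth checking in the snippet above: the query parser's boolean operators are written in upper case, so lowercasing the whole query string turns AND into an ordinary word before it is parsed. A minimal sketch in plain Java of lowercasing everything except the operators; the class and method names are made up, and splitting on whitespace is a simplification that collapses runs of spaces.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class QueryCase {
    // Operator keywords that must keep their upper-case spelling.
    private static final Set<String> OPERATORS =
            new HashSet<>(Arrays.asList("AND", "OR", "NOT"));

    // Lowercases every whitespace-separated token except the operators.
    static String lowerCaseKeepingOperators(String query) {
        StringBuilder out = new StringBuilder();
        for (String token : query.trim().split("\\s+")) {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(OPERATORS.contains(token) ? token : token.toLowerCase());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String query = "Name:\"Checkpoint\" AND Value:\"Filename_1\"";
        System.out.println(query.toLowerCase());
        System.out.println(lowerCaseKeepingOperators(query));
    }
}
```

If the field values are lowercased at index time anyway, the analyzer passed to the parser may already take care of term case, in which case the explicit toLowerCase() call is not needed at all.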

Thanks as always,

Rob


