RE: Re : How does Lucene handle phrases containing words that are not indexed?

2002-02-14 Thread Halácsy Péter

Hello,
I think my problem is something similar.

 -Original Message-
 From: Julien Nioche [mailto:[EMAIL PROTECTED]]
 Sent: Wednesday, February 13, 2002 6:09 PM
 To: Lucene Developers List
 Subject: Re : How does Lucene handle phrases containing words 
 that are not indexed?
 

 Since PhraseQueries are used for compound words (e.g. "personal computer") with a
 given slop value (say 3), it could be great not to match things such as "It is not
 personal. My computer hates me...".
 

I'd like to index documents that are described by keywords. One document can have zero 
or more keywords, and a keyword can be related to one or more documents. Assume two 
keywords:
human computer interaction
computer science

If I add these keywords to a document in a single field and someone searches with the 
query "human science", the document will be found, won't it? I could use, say, 16 
distinct fields for the maximum of 16 keywords and translate the query 
keyword:"human science" to keyword1:"human science" OR keyword2:"human science" ... 
OR keyword16:"human science", but I don't like this solution.

peter

--
To unsubscribe, e-mail:   mailto:[EMAIL PROTECTED]
For additional commands, e-mail: mailto:[EMAIL PROTECTED]




RE: cvs commit: jakarta-lucene/src/java/org/apache/lucene/store FSDirectory.java

2002-02-14 Thread Doug Cutting

Thanks for making all these cleanups, Otis!

One comment:

 From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
 Sent: Wednesday, February 13, 2002 5:47 PM
 To: [EMAIL PROTECTED]
 Subject: cvs commit: jakarta-lucene/src/java/org/apache/lucene/store
 FSDirectory.java
 
 [ ... ]
   + * Examples of appropriately formatted queries can be found in the <a
   + * href="http://cvs.apache.org/viewcvs/jakarta-lucene/src/test/org/apache/lucene/queryParser/TestQueryParser.java?rev=1&content-type=text/vnd.viewcvs-markup">test cases</a>.
   + * </p>

The source code is available on the Lucene website as:
  http://jakarta.apache.org/lucene/src/
so this reference can instead be
  http://jakarta.apache.org/lucene/src/test/org/apache/lucene/queryParser/TestQueryParser.java

This is preferable, since the update of this source is coordinated with
updates to the documentation.  So, for example, if someone extends query
syntax they might check in test cases to CVS long before a new release is
made containing these and the query documentation is updated.

Doug





DO NOT REPLY [Bug 6469] New: - Exception parsing

2002-02-14 Thread bugzilla

DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG 
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
http://nagoya.apache.org/bugzilla/show_bug.cgi?id=6469.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND 
INSERTED IN THE BUG DATABASE.

http://nagoya.apache.org/bugzilla/show_bug.cgi?id=6469

Exception parsing 

   Summary: Exception parsing
   Product: Lucene
   Version: CVS Nightly - Specify date in submission
  Platform: Other
OS/Version: Windows NT/2K
Status: NEW
  Severity: Normal
  Priority: Other
 Component: QueryParser
AssignedTo: [EMAIL PROTECTED]
ReportedBy: [EMAIL PROTECTED]





DO NOT REPLY [Bug 6469] - Exception parsing ' this AND menu '

2002-02-14 Thread bugzilla

http://nagoya.apache.org/bugzilla/show_bug.cgi?id=6469

Exception parsing ' this AND menu '

[EMAIL PROTECTED] changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|Exception parsing           |Exception parsing ' this
                   |                            |AND menu '



--- Additional Comments From [EMAIL PROTECTED]  2002-02-14 16:49 ---
An exception is thrown when QueryParser parses the query ' "this" AND "menu" '.

QueryParser.parse("\"this\" AND \"menu\"", "contents", new StopAnalyzer())
causes
java.lang.ArrayIndexOutOfBoundsException: -1 < 0
to be thrown.

Top of the stack
java.util.Vector.elementAt(int)
org.apache.lucene.queryParser.QueryParser.addClause(java.util.Vector, int, int, 
org.apache.lucene.search.Query)
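
The report does not state the cause. One plausible reading, which is an assumption and not a confirmed diagnosis, is that the analyzer drops the first term as a stop word, so no clause exists yet when the AND conjunction tries to mark the previous clause as required. The sketch below reproduces only that failure pattern with a plain java.util.Vector; the class and method names are hypothetical and none of it is Lucene's actual code.

```java
import java.util.Vector;

// Hypothetical stand-in for the parser's clause handling; not Lucene code.
final class AddClauseSketch {
    static final class Clause {
        final String term;
        boolean required;
        Clause(String term) { this.term = term; }
    }

    // Unguarded: on AND, marks the previous clause required without
    // checking that a previous clause exists.
    static void addClauseUnguarded(Vector<Clause> clauses, boolean and, String term) {
        if (and) {
            clauses.elementAt(clauses.size() - 1).required = true; // index -1 if empty
        }
        add(clauses, and, term);
    }

    // Guarded: skips the look-back when there is no previous clause.
    static void addClauseGuarded(Vector<Clause> clauses, boolean and, String term) {
        if (and && !clauses.isEmpty()) {
            clauses.elementAt(clauses.size() - 1).required = true;
        }
        add(clauses, and, term);
    }

    // A null term models a term that the analyzer removed entirely.
    private static void add(Vector<Clause> clauses, boolean and, String term) {
        if (term == null) return;
        Clause c = new Clause(term);
        c.required = and;
        clauses.addElement(c);
    }

    public static void main(String[] args) {
        Vector<Clause> clauses = new Vector<>();
        addClauseGuarded(clauses, false, null);   // first term dropped
        addClauseGuarded(clauses, true, "menu");  // AND with nothing before it
        System.out.println(clauses.size());
    }
}
```

With the unguarded variant the same two calls end in an ArrayIndexOutOfBoundsException raised from Vector.elementAt, which matches the top of the reported stack.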





DO NOT REPLY [Bug 6469] - Exception parsing ' this AND menu '

2002-02-14 Thread bugzilla

http://nagoya.apache.org/bugzilla/show_bug.cgi?id=6469

Exception parsing ' this AND menu '





--- Additional Comments From [EMAIL PROTECTED]  2002-02-14 16:51 ---
Happens in 1.2rc2 and 1.2rc3





RE: Re : How does Lucene handle phrases containing words that are not indexed?

2002-02-14 Thread Doug Cutting

 From: Halácsy Péter [mailto:[EMAIL PROTECTED]]
 
 I'd like to index documents that are described by keywords. 
 One document can have zero or more keywords, and a keyword can 
 be related to one or more documents. Assume two keywords:
 human computer interaction
 computer science
 
 If I add these keywords to a document in a single field and someone 
 searches with the query "human science", the document will be found, 
 won't it? I could use, say, 16 distinct fields for the maximum of 
 16 keywords and translate the query keyword:"human science" 
 to keyword1:"human science" OR keyword2:"human science" ... 
 OR keyword16:"human science", but I don't like this solution.

This sounds like a good case for an untokenized field.

When you index, use something like:

  Document doc = new Document();
  doc.add(Field.Keyword("keyword", "computer science"));
  doc.add(Field.Keyword("keyword", "human computer interaction"));
  ...
  indexWriter.addDocument(doc);

Then you can either add query keywords manually:

  BooleanQuery query =
    (BooleanQuery)queryParser.parse("other terms", analyzer);
  query.add(new TermQuery(new Term("keyword", "computer science")),
            true, false);

or you can integrate this with the query parser by making an analyzer that
constructs terms for the field named keyword using exactly the provided
input:

  public class MyAnalyzer extends Analyzer {
    private Analyzer standard = new StandardAnalyzer();
    public TokenStream tokenStream(String field, final Reader reader) {
      if ("keyword".equals(field)) {
        return new CharTokenizer(reader) {
          protected boolean isTokenChar(char c) { return true; }
        };
      } else {
        return standard.tokenStream(field, reader);
      }
    }
  }

  Analyzer analyzer = new MyAnalyzer();
  Query query = queryParser.parse("keyword:\"computer science\"", analyzer);

I haven't tested the above code, but I hope you get the idea.
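
To make the difference concrete without depending on Lucene itself, here is a minimal plain-Java sketch. It assumes a simplified model in which a field is just a set of terms and a query matches when all of its words are present as terms; the class and method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified model of tokenized versus untokenized keyword fields.
final class KeywordFieldSketch {
    // Tokenized: every word of every keyword becomes its own term.
    static Set<String> tokenizedTerms(List<String> keywords) {
        Set<String> terms = new HashSet<>();
        for (String keyword : keywords) {
            terms.addAll(Arrays.asList(keyword.toLowerCase().split("\\s+")));
        }
        return terms;
    }

    // Untokenized: each keyword is stored as one term, spaces included.
    static Set<String> untokenizedTerms(List<String> keywords) {
        return new HashSet<>(keywords);
    }

    // Word-by-word match against a tokenized field.
    static boolean allWordsPresent(Set<String> terms, String query) {
        return terms.containsAll(Arrays.asList(query.toLowerCase().split("\\s+")));
    }

    public static void main(String[] args) {
        List<String> keywords =
            Arrays.asList("human computer interaction", "computer science");
        System.out.println(allWordsPresent(tokenizedTerms(keywords), "human science"));
        System.out.println(untokenizedTerms(keywords).contains("human science"));
        System.out.println(untokenizedTerms(keywords).contains("computer science"));
    }
}
```

In this model the tokenized field matches "human science" because both words occur somewhere in the field, while the untokenized field matches only a complete keyword such as "computer science".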

Doug






RE: Re : How does Lucene handle phrases containing words that are not indexed?

2002-02-14 Thread Doug Cutting

 From: Julien Nioche [mailto:[EMAIL PROTECTED]]
 
 By the way, I was wondering if there is any Analyzer that uses the following
 constructor
   public Token(String text, int start, int end, String typ) ?

StandardTokenizer uses Token's type field to communicate with
StandardFilter, which does some post-processing.

 Maybe it could be interesting to build an analyzer that recognizes punctuation
 marks and keeps them in the index as Tokens with a given Type (say for example
 "punctuation") ?

Unfortunately token type is not stored in the index.  Adding it could have a
big impact on index size and search performance.

 The advantage is that this information could be used by a
 SloppyPhraseScorer.phraseFreq() method to keep a PhraseQuery from matching
 across a punctuation mark. Since PhraseQueries are used for compound words
 (e.g. "personal computer") with a given slop value (say 3), it could be great
 not to match things such as "It is not personal. My computer hates me...".

On the other hand, you'd miss things like, "He needs a new computer.
Personal computing has advanced since 1970."

Still, constraining matches to be within a sentence can be useful, but
Lucene does not currently support it, and I don't see an easy way to add it.
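
The trade-off can be shown with a small plain-Java sketch. It assumes a much simpler notion of slop than Lucene's (absolute distance between word positions, in either order) and a naive sentence splitter on '.', '!' and '?'; all names are hypothetical and nothing here is part of Lucene.

```java
import java.util.ArrayList;
import java.util.List;

// Toy proximity matcher with an optional same-sentence constraint.
final class SentenceSlopSketch {
    static final class Tok {
        final String text;
        final int pos;
        final int sentence;
        Tok(String text, int pos, int sentence) {
            this.text = text;
            this.pos = pos;
            this.sentence = sentence;
        }
    }

    // Lower-cases words, numbers them, and records which sentence each is in.
    static List<Tok> tokenize(String text) {
        List<Tok> out = new ArrayList<>();
        int pos = 0;
        int sentence = 0;
        for (String raw : text.trim().split("\\s+")) {
            String word = raw.replaceAll("[^A-Za-z0-9]", "").toLowerCase();
            if (!word.isEmpty()) {
                out.add(new Tok(word, pos++, sentence));
            }
            if (raw.matches(".*[.!?]+$")) {
                sentence++; // the next word starts a new sentence
            }
        }
        return out;
    }

    // True if a and b occur within `slop` positions of each other.
    static boolean near(String text, String a, String b, int slop, boolean sameSentence) {
        List<Tok> toks = tokenize(text);
        for (Tok x : toks) {
            if (!x.text.equals(a)) continue;
            for (Tok y : toks) {
                if (!y.text.equals(b)) continue;
                boolean close = Math.abs(x.pos - y.pos) <= slop;
                if (close && (!sameSentence || x.sentence == y.sentence)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String unwanted = "It is not personal. My computer hates me...";
        String wanted = "He needs a new computer. Personal computing has advanced since 1970.";
        System.out.println(near(unwanted, "personal", "computer", 3, false));
        System.out.println(near(unwanted, "personal", "computer", 3, true));
        System.out.println(near(wanted, "personal", "computer", 3, true));
    }
}
```

Under these assumptions the sentence constraint rejects the unwanted match in the first text, and it rejects the arguably relevant second text as well.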

Doug





RE: Indexes in WAR files

2002-02-14 Thread Doug Cutting

 From: Les Hughes [mailto:[EMAIL PROTECTED]]
 
 Reading the servlet spec again it says that calls such as
 servletcontext.getRealPath() will *possibly* return null if the content is
 being served from a war as opposed to the physical path on disk - I'm informed
 that weblogic actually returns the name of the warfile and not the exploded
 location. But you're right, Tomcat works differently.

What kind of URL does weblogic return for
servletContext.getResource(//index/segments)?
Is it a file: URL?

Keeping the index in files and using FSDirectory will be much more
efficient.  If all the major servlet containers support this it would be a
shame not to take advantage of it.  You might look at the result of
getResource and use an FSDirectory if a file: url is returned, and do
something else when it's not.
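
A minimal sketch of that check, using only the JDK. It assumes the caller has already converted the URL returned by getResource into a java.net.URI (for example with URL.toURI()); the class and method names are hypothetical.

```java
import java.io.File;
import java.net.URI;

// Decides whether a resource location can be opened as a plain directory.
final class IndexLocationSketch {
    // Returns the local directory for a file: URI, or null for anything
    // else (for example a jar: URI pointing into an unexploded WAR).
    static File localIndexDir(URI location) {
        if (location == null || !"file".equalsIgnoreCase(location.getScheme())) {
            return null;
        }
        try {
            return new File(location);
        } catch (IllegalArgumentException e) {
            // Not a usable file URI (e.g. it carries a query or fragment).
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(localIndexDir(URI.create("file:/tmp/index")));
        System.out.println(localIndexDir(URI.create("jar:file:/app.war!/index")));
    }
}
```

A non-null result would be handed to FSDirectory; a null result is the signal to fall back to some other strategy.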

 So in order to isolate from different interpretations of the spec, I'm going
 to knock up a WARDirectory that probably will wrap a RAMDirectory (going
 back to the servlet container to getResourceAsStream seems awfully expensive
 to me) as a first go.
 I'll post my efforts in a couple of days.

One technique you might consider is, when the index is not available as a
file, use getResourceAsStream to copy it to a temporary directory in
System.getProperty("java.io.tmpdir"), then use FSDirectory.  Storing the
whole index in a RAMDirectory will make searches really fast, but could also
chew up a lot of memory.  If the index isn't that big anyway, maybe this
isn't an issue.
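
The copy step might look roughly like this, using only the JDK. The list of file names and the `open` function (standing in for something like getResourceAsStream) are assumptions supplied by the caller, and the class name is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Copies index files from streams into a fresh directory under java.io.tmpdir.
final class TempIndexCopySketch {
    static Path copyIndex(List<String> fileNames, Function<String, InputStream> open) {
        try {
            Path tmp = Paths.get(System.getProperty("java.io.tmpdir"));
            Path dir = Files.createTempDirectory(tmp, "lucene-index");
            for (String name : fileNames) {
                try (InputStream in = open.apply(name)) {
                    if (in == null) {
                        throw new FileNotFoundException(name);
                    }
                    Files.copy(in, dir.resolve(name), StandardCopyOption.REPLACE_EXISTING);
                }
            }
            return dir; // a directory that can be opened as ordinary files
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Fake resource loader for demonstration purposes only.
        Path dir = copyIndex(Arrays.asList("segments"),
                name -> new ByteArrayInputStream(new byte[] {1, 2, 3}));
        System.out.println(Files.exists(dir.resolve("segments")));
    }
}
```

The sketch does not clean up the temporary directory; a real deployment would need to decide when to delete or refresh it.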

Doug





RE: Patch for IndexReader

2002-02-14 Thread Britton, Colin

A few days ago I posted a patch to add to IndexReader the ability to
check if an index is locked by passing a string or file object as well
as a directory. I added this so that I could have a cached index reader
that checked if an index was not locked, but modified before reloading
it - part of sharing the index between all users in my webapp. This is a
mod from the jhtml example and works well but to keep it clean I
modified IndexReader.isLocked(name) to take the name of the index not
the directory object for it. Is the patch needed? And if so, will it
make it into 1.2, to save me patching my local copy of Lucene?

Rgds
CB

My cached index code.

IndexReader getReader(String name) throws Exception {
    // look in cache
    CachedIndex index = (CachedIndex) indexCache.get(name);

    try {
        // check up-to-date
        if (index != null
                && (index.modified == IndexReader.lastModified(name)))
            return index.reader; // cache hit
        else {
            if (IndexReader.isLocked(name))
                return index.reader; // cache hit, modified but locked
            else {
                index = new CachedIndex(name); // cache miss, get new
            }
        }
    } catch (Exception e) {
        //System.out.println(" caught a " + e.getClass()
        //    + "\n with message: " + e.getMessage());
        e.printStackTrace();
        return null;
    }
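
For readers who want the caching policy in isolation, here is a self-contained sketch of the same idea. It is not the code from my webapp: the three functions passed to the constructor are stand-ins for the last-modified check, the lock check and opening a reader, and every name in it is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.function.ToLongFunction;

// Caches one reader per index name and reloads only when the index has
// changed and is not currently locked by a writer.
final class ReaderCacheSketch<R> {
    private static final class Entry<R> {
        final R reader;
        final long modified;
        Entry(R reader, long modified) {
            this.reader = reader;
            this.modified = modified;
        }
    }

    private final Map<String, Entry<R>> cache = new HashMap<>();
    private final ToLongFunction<String> lastModified;
    private final Predicate<String> isLocked;
    private final Function<String, R> open;

    ReaderCacheSketch(ToLongFunction<String> lastModified,
                      Predicate<String> isLocked,
                      Function<String, R> open) {
        this.lastModified = lastModified;
        this.isLocked = isLocked;
        this.open = open;
    }

    synchronized R get(String name) {
        Entry<R> cached = cache.get(name);
        long stamp = lastModified.applyAsLong(name);
        if (cached != null && cached.modified == stamp) {
            return cached.reader; // cache hit
        }
        if (cached != null && isLocked.test(name)) {
            return cached.reader; // modified but locked: keep the old reader
        }
        Entry<R> fresh = new Entry<>(open.apply(name), stamp); // cache miss
        cache.put(name, fresh);
        return fresh.reader;
    }

    public static void main(String[] args) {
        ReaderCacheSketch<String> readers =
            new ReaderCacheSketch<>(n -> 1L, n -> false, n -> "reader for " + n);
        System.out.println(readers.get("idx"));
    }
}
```

Unlike the fragment above, this version never consults the lock when nothing is cached yet, so a first request always opens a reader.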





 -Original Message-
 From: Britton, Colin 
 Sent: Friday, February 08, 2002 1:43 PM
 To: [EMAIL PROTECTED]
 Subject: Patch for IndexReader
 
 
 Here is a patch for IndexReader.isLocked() to support file 
 and string in the same way as IndexReader.indexExists()
 
 It is in the body and as an attachment.
 
 Rgds
 CB
 
 Index: IndexReader.java 
 ===
 RCS file: /home/cvspublic/jakarta-lucene/src/java/org/apache/lucene/index/IndexReader.java,v
 retrieving revision 1.6
 diff -u -r1.6 IndexReader.java
 --- IndexReader.java  21 Jan 2002 17:07:23 -  1.6
 +++ IndexReader.java  8 Feb 2002 18:40:03 -
 @@ -269,7 +269,28 @@
 */
  abstract public void close() throws IOException;
  
 -  /**
 + /**
 +   * Returns <code>true</code> iff the index in the named directory is
 +   * currently locked.
 +   * @param directory the directory to check for a lock
 +   * @throws IOException if there is a problem with accessing the index
 +   */
 +   public static boolean isLocked(String directory) throws IOException {
 +    return (new File(directory, "write.lock")).exists();
 +  }
 +  
 + /**
 +   * Returns <code>true</code> iff the index in the named directory is
 +   * currently locked.
 +   * @param directory the directory to check for a lock
 +   * @throws IOException if there is a problem with accessing the index
 +   */
 +  public static boolean isLocked(File directory) throws IOException {
 +    return (new File(directory, "write.lock")).exists();
 +  }
 +
 +
 + /**
 * Returns <code>true</code> iff the index in the named directory is
 * currently locked.
 * @param directory the directory to check for a lock
 
 *CVS exited normally with code 1*
 





Re: Searching multiple fields in one Index of Documents

2002-02-14 Thread Otis Gospodnetic

Folks,

What do you think about including this class in
org.apache.lucene.queryParser?

Let me know, and if you approve I can commit it.

Thanks,
Otis


--- Kelvin Tan [EMAIL PROTECTED] wrote:
 Peter,
 
 As advised, re-released under APL. :) There were some changes to QueryParser
 constructors in rc3, and these are reflected here as well.
 
 FWIW, I've also attached a javascript lib and accompanying HTML which
 constructs a Lucene multi-field query using a HTML form.
 
 Regards,
 Kelvin
 
 - Original Message -
 From: Peter Carlson [EMAIL PROTECTED]
 To: Lucene Users List [EMAIL PROTECTED]
 Sent: Wednesday, February 13, 2002 10:56 PM
 Subject: Re: Searching multiple fields in one Index of Documents
 
 
  This is great Kelvin,
  Sorry I didn't see it before.
  I'll add it to the list of contributions.
 
  --Peter
 
  On 2/13/02 12:43 AM, Kelvin Tan [EMAIL PROTECTED] wrote:
 
   Charles,
  
   See

http://www.mail-archive.com/lucene-user@jakarta.apache.org/msg00176.html
  
   Regards,
   K
  
   - Original Message -
   From: Charles Harvey [EMAIL PROTECTED]
   To: [EMAIL PROTECTED]
   Sent: Tuesday, February 12, 2002 8:39 AM
   Subject: Searching multiple fields in one Index of Documents
  
  
   I have a working installation of Lucene running against indexes created by
   a database query.
   Each Document in the Index contains fifteen or twenty fields. I am
   currently searching only one field (that contains concatenated database
   columns) because I cannot figure out how to search multiple fields. So:
  
   How can I use Lucene to search more than one field in an Index of
   Documents?
  
   eg:
   field CATEGORY is(or contains) 'bar'
   AND
   field BODY contains 'foo'
  
  
  
  
   _
  
   The trouble with the rat-race is that even if you win you're
 still a
   rat.
   --Lily Tomlin
   _
   Charles Harvey
   Developer
   http://www.philly.com
   Wk: 215 789 6057
   Cell: 215 588 0851
  
  

 ATTACHMENT part 2 application/octet-stream
name=MultiFieldQueryParser.java


 ATTACHMENT part 3 application/octet-stream
name=luceneQueryConstructor.js
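
The attached files are not reproduced in this archive. As a rough illustration of the general idea only, and not of Kelvin's class, the sketch below builds query strings in Lucene's field:term syntax, either requiring one term per field or trying one term against several fields. The class and method names are hypothetical, and terms are assumed to need no escaping.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Builds multi-field query strings; the result is meant to be parsed later.
final class FieldQuerySketch {
    // One required term per field, e.g. CATEGORY:bar AND BODY:foo
    static String allOf(Map<String, String> fieldTerms) {
        StringJoiner query = new StringJoiner(" AND ");
        for (Map.Entry<String, String> e : fieldTerms.entrySet()) {
            query.add(e.getKey() + ":" + e.getValue());
        }
        return query.toString();
    }

    // The same term tried against several fields, e.g. (TITLE:foo OR BODY:foo)
    static String anyField(String term, String... fields) {
        StringJoiner query = new StringJoiner(" OR ", "(", ")");
        for (String field : fields) {
            query.add(field + ":" + term);
        }
        return query.toString();
    }

    public static void main(String[] args) {
        Map<String, String> terms = new LinkedHashMap<>();
        terms.put("CATEGORY", "bar");
        terms.put("BODY", "foo");
        System.out.println(allOf(terms));
        System.out.println(anyField("foo", "TITLE", "BODY"));
    }
}
```

The first form corresponds to Charles's example of CATEGORY containing 'bar' and BODY containing 'foo'.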