RE: Re : How does Lucene handle phrases containing words that are not indexed?
Hello,

I think my problem is something similar.

-----Original Message-----
From: Julien Nioche [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, February 13, 2002 6:09 PM
To: Lucene Developers List
Subject: Re : How does Lucene handle phrases containing words that are not indexed?

> PhraseQueries are used for compound words (e.g. "personal computer")
> with a given slop value (say 3), it could be great not to match things
> such as "It is not personal. My computer hates me...".

I'd like to index documents that are described by keywords. One document
can have zero or more keywords, and a keyword can be related to one or
more documents. Assume two keywords:

  human computer interaction
  computer science

If I add these keywords to a document in a single field and someone
searches with the query "human science", the document will be found,
won't it? I could use, say, 16 distinct fields for the maximum of 16
keywords and translate the query keyword:"human science" to
keyword1:"human science" OR keyword2:"human science" ... OR
keyword16:"human science", but I'd prefer to avoid this solution.

peter

--
To unsubscribe, e-mail: mailto:[EMAIL PROTECTED]
For additional commands, e-mail: mailto:[EMAIL PROTECTED]
RE: cvs commit: jakarta-lucene/src/java/org/apache/lucene/store FSDirectory.java
Thanks for making all these cleanups, Otis! One comment:

> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, February 13, 2002 5:47 PM
> To: [EMAIL PROTECTED]
> Subject: cvs commit: jakarta-lucene/src/java/org/apache/lucene/store FSDirectory.java
>
> [ ... ]
> + * Examples of appropriately formatted queries can be found in the <a
> + * href="http://cvs.apache.org/viewcvs/jakarta-lucene/src/test/org/apache/lucene/queryParser/TestQueryParser.java?rev=1&content-type=text/vnd.viewcvs-markup">test cases</a>.
> + * </p>

The source code is available on the Lucene website as:

  http://jakarta.apache.org/lucene/src/

so this reference can instead be:

  http://jakarta.apache.org/lucene/src/test/org/apache/lucene/queryParser/TestQueryParser.java

This is preferable, since updates to this copy of the source are
coordinated with updates to the documentation. Otherwise, someone who
extends the query syntax might, for example, check the new test cases into
CVS long before a release containing them is made and the query
documentation is updated.

Doug
DO NOT REPLY [Bug 6469] New: - Exception parsing
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT http://nagoya.apache.org/bugzilla/show_bug.cgi?id=6469. ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND INSERTED IN THE BUG DATABASE.

http://nagoya.apache.org/bugzilla/show_bug.cgi?id=6469

Exception parsing

           Summary: Exception parsing
           Product: Lucene
           Version: CVS Nightly - Specify date in submission
          Platform: Other
        OS/Version: Windows NT/2K
            Status: NEW
          Severity: Normal
          Priority: Other
         Component: QueryParser
        AssignedTo: [EMAIL PROTECTED]
        ReportedBy: [EMAIL PROTECTED]
DO NOT REPLY [Bug 6469] - Exception parsing '"this" AND "menu"'
http://nagoya.apache.org/bugzilla/show_bug.cgi?id=6469

Exception parsing '"this" AND "menu"'

[EMAIL PROTECTED] changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|Exception parsing           |Exception parsing '"this"
                   |                            |AND "menu"'

--- Additional Comments From [EMAIL PROTECTED] 2002-02-14 16:49 ---
An exception is thrown when QueryParser parses the query '"this" AND "menu"':

  QueryParser.parse("\"this\" AND \"menu\"", "contents", new StopAnalyzer())

causes java.lang.ArrayIndexOutOfBoundsException: -1 < 0 to be thrown.

Top of the stack:

  java.util.Vector.elementAt(int)
  org.apache.lucene.queryParser.QueryParser.addClause(java.util.Vector, int, int, org.apache.lucene.search.Query)
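One plausible reading of the stack trace: "this" is on StopAnalyzer's stop list, so the first quoted term analyzes to nothing and no clause is added for it; when AND "menu" arrives, the parser tries to mark the preceding clause as required and looks it up at index size() - 1, which is -1. The sketch below is a simplified model in plain Java, not Lucene's QueryParser code; the class, constants and Clause type are hypothetical, and it only shows the kind of emptiness guard such a fix would need.

```java
import java.util.ArrayList;
import java.util.List;

class StopWordClauseSketch {
    static final int CONJ_NONE = 0;
    static final int CONJ_AND = 1;

    /** Hypothetical stand-in for a boolean clause: a query label plus a required flag. */
    static final class Clause {
        final String query;
        boolean required;

        Clause(String query, boolean required) {
            this.query = query;
            this.required = required;
        }
    }

    /**
     * Adds q to the clause list. An AND conjunction marks the preceding clause
     * as required, but only if there is one: when the preceding term was removed
     * by the analyzer (e.g. a stop word), nothing was added for it, and
     * clauses.size() - 1 would be -1.
     */
    static void addClause(List<Clause> clauses, int conj, String q) {
        if (conj == CONJ_AND && !clauses.isEmpty()) {
            clauses.get(clauses.size() - 1).required = true;
        }
        if (q != null) { // null models a term that analyzed to no tokens at all
            clauses.add(new Clause(q, conj == CONJ_AND));
        }
    }

    public static void main(String[] args) {
        List<Clause> clauses = new ArrayList<>();
        addClause(clauses, CONJ_NONE, null);   // "this": a stop word, so no query
        addClause(clauses, CONJ_AND, "menu");  // AND "menu": no exception with the guard
        System.out.println(clauses.size());          // 1
        System.out.println(clauses.get(0).required); // true
    }
}
```

Without the isEmpty() check, the lookup would be the equivalent of calling elementAt(-1) on an empty Vector, which throws ArrayIndexOutOfBoundsException.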
DO NOT REPLY [Bug 6469] - Exception parsing '"this" AND "menu"'
http://nagoya.apache.org/bugzilla/show_bug.cgi?id=6469

Exception parsing '"this" AND "menu"'

--- Additional Comments From [EMAIL PROTECTED] 2002-02-14 16:51 ---
Happens in 1.2rc2 and 1.2rc3.
RE: Re : How does Lucene handle phrases containing words that are not indexed?
> From: Halácsy Péter [mailto:[EMAIL PROTECTED]]
>
> I'd like to index documents that are described by keywords. One document
> can have zero or more keywords, and a keyword can be related to one or
> more documents. Assume two keywords:
>
>   human computer interaction
>   computer science
>
> If I add these keywords to a document in a single field and someone
> searches with the query "human science", the document will be found,
> won't it? I could use, say, 16 distinct fields for the maximum of 16
> keywords and translate the query keyword:"human science" to
> keyword1:"human science" OR keyword2:"human science" ... OR
> keyword16:"human science", but I'd prefer to avoid this solution.

This sounds like a good case for an untokenized field. When you index,
use something like:

    Document doc = new Document();
    doc.add(Field.Keyword("keyword", "computer science"));
    doc.add(Field.Keyword("keyword", "human computer interaction"));
    ...
    indexWriter.addDocument(doc);

Then you can either add query keywords manually:

    BooleanQuery query = (BooleanQuery)queryParser.parse("other terms", analyzer);
    // required = true, prohibited = false
    query.add(new TermQuery(new Term("keyword", "computer science")), true, false);

or you can integrate this with the query parser by making an analyzer
that constructs terms for the field named "keyword" using exactly the
provided input:

    import java.io.Reader;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.CharTokenizer;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;

    public class MyAnalyzer extends Analyzer {
      private Analyzer standard = new StandardAnalyzer();

      public TokenStream tokenStream(String field, final Reader reader) {
        if ("keyword".equals(field)) {
          // every character is a token character, so the whole value is one term
          return new CharTokenizer(reader) {
            protected boolean isTokenChar(char c) { return true; }
          };
        } else {
          return standard.tokenStream(field, reader);
        }
      }
    }

    Analyzer analyzer = new MyAnalyzer();
    Query query = queryParser.parse("keyword:\"computer science\"", analyzer);

I haven't tested the above code, but I hope you get the idea.

Doug
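Doug's MyAnalyzer relies on a CharTokenizer whose isTokenChar() always returns true, so an entire field value becomes a single term. The plain-Java model below contrasts that with a letter-based split for Peter's two keywords. It is a sketch under stated assumptions: KeywordFieldSketch is a hypothetical class that imitates only the "runs of accepted characters" splitting rule, not Lucene's CharTokenizer itself, and it represents a field's terms as a simple set.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.IntPredicate;

class KeywordFieldSketch {
    /** Emits maximal runs of characters accepted by isTokenChar, CharTokenizer-style. */
    static List<String> tokenize(String input, IntPredicate isTokenChar) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (isTokenChar.test(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    /** Collects the distinct terms a field would hold for the given values. */
    static Set<String> terms(List<String> values, IntPredicate isTokenChar) {
        Set<String> terms = new LinkedHashSet<>();
        for (String v : values) {
            terms.addAll(tokenize(v, isTokenChar));
        }
        return terms;
    }

    public static void main(String[] args) {
        List<String> keywords = List.of("human computer interaction", "computer science");
        Set<String> tokenized = terms(keywords, Character::isLetter); // word-by-word split
        Set<String> whole = terms(keywords, c -> true);               // one term per keyword
        System.out.println(tokenized); // [human, computer, interaction, science]
        System.out.println(whole);     // [human computer interaction, computer science]
        // Both words occur in the tokenized field, so a query for human and science hits it...
        System.out.println(tokenized.contains("human") && tokenized.contains("science")); // true
        // ...while the untokenized field only matches a complete keyword.
        System.out.println(whole.contains("human science"));    // false
        System.out.println(whole.contains("computer science")); // true
    }
}
```

This is why the untokenized "keyword" field avoids the 16-field workaround: a document can only match a query keyword that equals one of its keywords exactly.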
RE: Re : How does Lucene handle phrases containing words that are not indexed?
> From: Julien Nioche [mailto:[EMAIL PROTECTED]]
>
> By the way, I was wondering if there is any Analyzer that uses the
> following constructor: public Token(String text, int start, int end,
> String typ)?

StandardTokenizer uses Token's type field to communicate with
StandardFilter, which does some post-processing.

> Maybe it could be interesting to build an analyzer that recognizes
> punctuation marks and keeps them in the index as Tokens with a given
> type (for example "punctuation")?

Unfortunately, token type is not stored in the index. Adding it could
have a big impact on index size and search performance.

> The advantage is that this information could be used by a
> SloppyPhraseScorer.phraseFreq() method to avoid matching a PhraseQuery
> across a punctuation mark. Since PhraseQueries are used for compound
> words (e.g. "personal computer") with a given slop value (say 3), it
> would be great not to match things such as "It is not personal. My
> computer hates me...".

On the other hand, you'd miss things like "He needs a new computer.
Personal computing has advanced since 1970." Still, constraining matches
to be within a sentence can be useful, but Lucene does not currently
support it, and I don't see an easy way to add it.

Doug
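Julien's proposal can be illustrated outside Lucene. The plain-Java sketch below (hypothetical class and method names, no Lucene API) performs an in-order sloppy match of two words that can optionally refuse to cross a punctuation token, which is the behaviour he would like SloppyPhraseScorer.phraseFreq() to have. As Doug notes, Lucene does not store token types, so this only models the idea; it also ignores the reordered matches that Lucene's sloppy phrases allow.

```java
import java.util.ArrayList;
import java.util.List;

class PunctuationAwarePhrase {
    /** Hypothetical typed token: a lower-cased word or a single punctuation mark. */
    static final class Tok {
        final String text;
        final boolean punctuation;

        Tok(String text, boolean punctuation) {
            this.text = text;
            this.punctuation = punctuation;
        }
    }

    /** Splits text into word tokens and tokens for the marks . , ; : ! ? */
    static List<Tok> tokenize(String s) {
        List<Tok> out = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (Character.isLetterOrDigit(c)) {
                word.append(Character.toLowerCase(c));
                continue;
            }
            if (word.length() > 0) {
                out.add(new Tok(word.toString(), false));
                word.setLength(0);
            }
            if (".,;:!?".indexOf(c) >= 0) {
                out.add(new Tok(String.valueOf(c), true));
            }
        }
        if (word.length() > 0) {
            out.add(new Tok(word.toString(), false));
        }
        return out;
    }

    /**
     * True if `first` is followed by `second` with at most `slop` words in
     * between. With respectPunctuation, a punctuation token ends the search.
     */
    static boolean matches(List<Tok> toks, String first, String second,
                           int slop, boolean respectPunctuation) {
        for (int i = 0; i < toks.size(); i++) {
            Tok start = toks.get(i);
            if (start.punctuation || !start.text.equals(first)) continue;
            int between = 0;
            for (int j = i + 1; j < toks.size(); j++) {
                Tok t = toks.get(j);
                if (t.punctuation) {
                    if (respectPunctuation) break; // boundary: give up on this start
                    continue;                      // otherwise ignore the mark
                }
                if (t.text.equals(second)) return true;
                if (++between > slop) break;       // words too far apart
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Tok> a = tokenize("It is not personal. My computer hates me...");
        System.out.println(matches(a, "personal", "computer", 3, false)); // true
        System.out.println(matches(a, "personal", "computer", 3, true));  // false
        System.out.println(matches(tokenize("a new personal computer"),
                "personal", "computer", 3, true));                       // true
    }
}
```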
RE: Indexes in WAR files
> From: Les Hughes [mailto:[EMAIL PROTECTED]]
>
> Reading the servlet spec again, it says that calls such as
> ServletContext.getRealPath() will *possibly* return null if the content
> is being served from a WAR, as opposed to the physical path on disk. I'm
> informed that WebLogic actually returns the name of the WAR file and not
> the exploded location. But you're right, Tomcat works differently.

What kind of URL does WebLogic return for
servletContext.getResource("//index/segments")? Is it a file: URL?
Keeping the index in files and using FSDirectory will be much more
efficient. If all the major servlet containers support this, it would be
a shame not to take advantage of it. You might look at the result of
getResource and use an FSDirectory if a file: URL is returned, and do
something else when it's not.

> So, in order to insulate myself from different interpretations of the
> spec, I'm going to knock up a WARDirectory that will probably wrap a
> RAMDirectory (going back to the servlet container to
> getResourceAsStream seems awfully expensive to me) as a first go. I'll
> post my efforts in a couple of days.

One technique you might consider is, when the index is not available as
a file, to use getResourceAsStream to copy it to a temporary directory in
System.getProperty("java.io.tmpdir"), then use FSDirectory. Storing the
whole index in a RAMDirectory will make searches really fast, but could
also chew up a lot of memory. If the index isn't that big anyway, maybe
this isn't an issue.

Doug
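Doug's fallback (use the index in place when getResource returns a file: URL, otherwise copy it out through getResourceAsStream into java.io.tmpdir) might look roughly like the plain-Java sketch below. It is a sketch under assumptions: ResourceOpener is a hypothetical stand-in for ServletContext.getResourceAsStream, the caller is assumed to know the names of the index files, and the returned directory is what one would then hand to FSDirectory.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.net.URISyntaxException;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

class IndexLocator {
    /** Hypothetical hook standing in for ServletContext.getResourceAsStream(path). */
    interface ResourceOpener {
        InputStream open(String path) throws IOException;
    }

    /**
     * If the "segments" resource is a file: URL, return its directory so the
     * index can be opened in place. Otherwise copy each named index file into
     * a fresh directory under java.io.tmpdir and return that directory.
     */
    static File resolveIndexDir(URL segmentsUrl, String resourceDir,
                                String[] fileNames, ResourceOpener opener)
            throws IOException, URISyntaxException {
        if (segmentsUrl != null && "file".equals(segmentsUrl.getProtocol())) {
            return new File(segmentsUrl.toURI()).getParentFile();
        }
        Path tmp = Files.createTempDirectory(
                Paths.get(System.getProperty("java.io.tmpdir")), "index");
        for (String name : fileNames) {
            try (InputStream in = opener.open(resourceDir + "/" + name)) {
                if (in == null) {
                    throw new IOException("missing index file: " + name);
                }
                Files.copy(in, tmp.resolve(name), StandardCopyOption.REPLACE_EXISTING);
            }
        }
        return tmp.toFile();
    }
}
```

The copy costs one pass over the index at startup but keeps memory use low afterwards, which is the trade-off Doug contrasts with wrapping everything in a RAMDirectory. A real version would also want to clean up the temporary directory when the web application is undeployed.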
RE: Patch for IndexReader
A few days ago I posted a patch that lets IndexReader check whether an
index is locked when given a String or a File, as well as a Directory. I
added this so that I could have a cached index reader that, before
reloading an index, checks that it was modified but is not locked; this is
part of sharing the index between all users in my webapp. It is a
modification of the jhtml example and works well, but to keep it clean I
changed IndexReader.isLocked(name) to take the name of the index rather
than its Directory object.

Is the patch needed? And if so, will it make it into 1.2, to save me
patching my local copy of Lucene?

Rgds
CB

My cached index code:

    IndexReader getReader(String name) throws Exception {
      CachedIndex index =                         // look in cache
        (CachedIndex) indexCache.get(name);
      try {
        if (index != null &&                      // check up-to-date
            (index.modified == IndexReader.lastModified(name)))
          return index.reader;                    // cache hit
        else {
          // the null check avoids a NullPointerException when nothing is cached yet
          if (index != null && IndexReader.isLocked(name))
            return index.reader;                  // cache hit, modified but locked
          else {
            index = new CachedIndex(name);        // cache miss, get new
          }
        }
      } catch (Exception e) {
        //System.out.println(" caught a " + e.getClass() + "\n with message: " + e.getMessage());
        e.printStackTrace();
        return null;
      }
      indexCache.put(name, index);                // add to cache
      return index.reader;
    }

-----Original Message-----
From: Britton, Colin
Sent: Friday, February 08, 2002 1:43 PM
To: [EMAIL PROTECTED]
Subject: Patch for IndexReader

Here is a patch for IndexReader.isLocked() to support File and String in
the same way as IndexReader.indexExists(). It is in the body and as an
attachment.

Rgds
CB

Index: IndexReader.java
===================================================================
RCS file: /home/cvspublic/jakarta-lucene/src/java/org/apache/lucene/index/IndexReader.java,v
retrieving revision 1.6
diff -u -r1.6 IndexReader.java
--- IndexReader.java 21 Jan 2002 17:07:23 -0000 1.6
+++ IndexReader.java 8 Feb 2002 18:40:03 -0000
@@ -269,7 +269,28 @@
    */
   abstract public void close() throws IOException;
 
-  /**
+  /**
+   * Returns <code>true</code> iff the index in the named directory is
+   * currently locked.
+   * @param directory the directory to check for a lock
+   * @throws IOException if there is a problem with accessing the index
+   */
+  public static boolean isLocked(String directory) throws IOException {
+    return (new File(directory, "write.lock")).exists();
+  }
+
+  /**
+   * Returns <code>true</code> iff the index in the named directory is
+   * currently locked.
+   * @param directory the directory to check for a lock
+   * @throws IOException if there is a problem with accessing the index
+   */
+  public static boolean isLocked(File directory) throws IOException {
+    return (new File(directory, "write.lock")).exists();
+  }
+
+
+  /**
    * Returns <code>true</code> iff the index in the named directory is
    * currently locked.
    * @param directory the directory to check for a lock

*CVS exited normally with code 1*
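Colin's reload policy (reuse the cached reader if the index is unchanged, and keep using it even after a change while the index is still locked) can be sketched independently of Lucene. In this plain-Java sketch, LockAwareCache and its Source interface are hypothetical: Source stands in for IndexReader.lastModified(), the proposed IndexReader.isLocked(String), and opening a reader. The static isLocked(File) mirrors the write.lock check in the patch.

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;

class LockAwareCache<R> {
    /** Hypothetical hooks standing in for lastModified, isLocked and opening a reader. */
    interface Source<T> {
        long lastModified(String name) throws Exception;
        boolean isLocked(String name) throws Exception;
        T open(String name) throws Exception;
    }

    private static final class Entry<T> {
        final T reader;
        final long modified;

        Entry(T reader, long modified) {
            this.reader = reader;
            this.modified = modified;
        }
    }

    private final Map<String, Entry<R>> cache = new HashMap<>();
    private final Source<R> source;

    LockAwareCache(Source<R> source) {
        this.source = source;
    }

    /** Reuse if unchanged, or if changed but still locked; otherwise open and cache anew. */
    synchronized R get(String name) throws Exception {
        Entry<R> e = cache.get(name);
        long modified = source.lastModified(name);
        if (e != null && (e.modified == modified || source.isLocked(name))) {
            return e.reader; // the stale timestamp is kept, so the next unlocked call reloads
        }
        // Note: the replaced reader is simply dropped here, not closed.
        Entry<R> fresh = new Entry<>(source.open(name), modified);
        cache.put(name, fresh);
        return fresh.reader;
    }

    /** Same test as the patch: an index counts as locked while write.lock exists. */
    static boolean isLocked(File directory) {
        return new File(directory, "write.lock").exists();
    }
}
```

A production version would close a replaced reader once no searches are still using it; the sketch leaves that out to keep the locking policy visible.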
Re: Searching multiple fields in one Index of Documents
Folks,

What do you think about including this class in
org.apache.lucene.queryParser? Let me know, and if you approve I can
commit it.

Thanks,
Otis

--- Kelvin Tan [EMAIL PROTECTED] wrote:

Peter,

As advised, re-released under APL. :) There were some changes to
QueryParser constructors in rc3, and these are reflected here as well.

FWIW, I've also attached a JavaScript lib and accompanying HTML which
construct a Lucene multi-field query using an HTML form.

Regards,
Kelvin

----- Original Message -----
From: Peter Carlson [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Wednesday, February 13, 2002 10:56 PM
Subject: Re: Searching multiple fields in one Index of Documents

This is great Kelvin,
Sorry I didn't see it before. I'll add it to the list of contributions.

--Peter

On 2/13/02 12:43 AM, Kelvin Tan [EMAIL PROTECTED] wrote:

Charles,

See http://www.mail-archive.com/lucene-user@jakarta.apache.org/msg00176.html

Regards,
K

----- Original Message -----
From: Charles Harvey [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Tuesday, February 12, 2002 8:39 AM
Subject: Searching multiple fields in one Index of Documents

I have a working installation of Lucene running against indexes created
by a database query. Each Document in the Index contains fifteen or
twenty fields. I am currently searching only one field (that contains
concatenated database columns) because I cannot figure out how to search
multiple fields.

So: How can I use Lucene to search more than one field in an Index of
Documents? e.g.: field CATEGORY is (or contains) 'bar' AND field BODY
contains 'foo'

____
The trouble with the rat-race is that even if you win you're still a rat.
--Lily Tomlin
____
Charles Harvey
Developer
http://www.philly.com
Wk: 215 789 6057
Cell: 215 588 0851

ATTACHMENT part 2 application/octet-stream name=MultiFieldQueryParser.java
ATTACHMENT part 3 application/octet-stream name=luceneQueryConstructor.js
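Charles's example can already be written in QueryParser's field:term syntax, e.g. CATEGORY:bar AND BODY:foo. The small helper below is a hypothetical sketch (not Kelvin's MultiFieldQueryParser, whose API is not shown in this thread) that builds such query strings in plain Java. It assumes each term is a single word containing no query-syntax characters, since nothing here escapes them.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class FieldQueryBuilder {
    /** Requires every field to match its term, e.g. CATEGORY:bar AND BODY:foo. */
    static String allOf(Map<String, String> fieldTerms) {
        StringJoiner q = new StringJoiner(" AND ");
        fieldTerms.forEach((field, term) -> q.add(field + ":" + term));
        return q.toString();
    }

    /** Looks for one term in any of several fields, e.g. CATEGORY:foo OR BODY:foo. */
    static String anyOf(String term, String... fields) {
        StringJoiner q = new StringJoiner(" OR ");
        for (String field : fields) {
            q.add(field + ":" + term);
        }
        return q.toString();
    }

    public static void main(String[] args) {
        Map<String, String> m = new LinkedHashMap<>(); // keeps clause order stable
        m.put("CATEGORY", "bar");
        m.put("BODY", "foo");
        System.out.println(allOf(m));                         // CATEGORY:bar AND BODY:foo
        System.out.println(anyOf("foo", "CATEGORY", "BODY")); // CATEGORY:foo OR BODY:foo
    }
}
```

The resulting string would then be passed to QueryParser with an analyzer suited to those fields; field names are case-sensitive, so they must match the names used at indexing time.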