Re: index and search question
On Sun, Jun 20, 2004 at 09:46:42AM +, Dmitrii PapaGeorgio wrote:

> Let's say I index documents using this:
>
>   Document doc = new Document();
>   doc.add(Field.Text("file1", (Reader) new InputStreamReader(is)));
>   doc.add(Field.Text("file2", (Reader) new InputStreamReader(is2)));
>
> and want to do a search like this: file1:Word file2:Word2
> Basically, doing a search using multiple segments, file1 and file2, in
> the same query -- how would this be possible?

Just as you wrote. If you use the QueryParser, you can search with "file1:Word file2:Word2", or e.g. "+file1:Word +file2:Word2", etc. Or you can build a boolean query programmatically (if I understood your question).

incze

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
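For the programmatic route mentioned above, a minimal sketch against the Lucene 1.x API might look like the following (the field and term values are made up for illustration; note the term text should be lowercased if the index was built with a lowercasing analyzer):

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.TermQuery;

// Equivalent of "+file1:word +file2:word2": both clauses required.
// In the 1.x API, add(Query, required, prohibited) sets the clause flags.
BooleanQuery bq = new BooleanQuery();
bq.add(new TermQuery(new Term("file1", "word")), true, false);
bq.add(new TermQuery(new Term("file2", "word2")), true, false);
```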
Re: phrase query not working in boolean clause
On Wed, Jun 09, 2004 at 01:41:55PM -0400, Erik Hatcher wrote:

> On Jun 9, 2004, at 12:25 PM, Michael Duval wrote:
>
>> When doing an exact phrase query on the title, the expected results
>> are returned:
>>
>>   +(title:"Mass Asymmetry")   after tokenizing/filtering: +title:"mass asymmetri"
>>
>> returns 20 hits. Example hit: "Mass asymmetry, equation of state, and
>> nuclear multifragmentation". When attempting to confine the results
>> to a particular journal, the query used is:
>>
>>   +(journal:L) +(title:"Mass Asymmetry")   after t/f: +journal:L +title:"mass asymmetri"
>>
>> returns 315 hits!! Example hit 1: "Towards dynamical mass
>> calculations"; example hit 2: "Up down-asymmetric gravitational
>> fields of spinning masses". It would seem that the search engine is
>> treating the phrase +title:"mass asymmetri" as separate term queries
>> on "mass" and "asymmetri". However, this behavior only appears in
>> composite queries, as shown previously. For a sanity check I built
>> the query using both the standard query parser and the Lucene search
>> API (TermQuery, PhraseQuery, BooleanQuery). The results were the same
>> both ways. Is this a well-known limitation of the Lucene search
>> engine? Is there a different means of obtaining the desired results?
>
> Could you work up a JUnit test case example, indexing a couple of
> documents like this into a RAMDirectory, with a testXXX method that
> shows the failure? I cannot really make sense of what you have going
> on with the textual queries, and obviously there is some stemming
> going on too. Show us the code. :)
>
> Erik

This was fixed about two months ago by Mr. Goller. You have to upgrade, if you can.

incze
Re: need info for database based Lucene but not flat file
On Tue, Apr 27, 2004 at 09:15:05AM -0700, Doug Cutting wrote:

> Yukun Song wrote:
>> As is known, Lucene currently uses flat files to store its index
>> information. Does anyone have ideas or resources for combining a
>> database (like MySQL or PostgreSQL) with Lucene, instead of the
>> current flat index file formats?
>
> A few folks have implemented an SQL-based Lucene Directory, but none
> has yet been contributed to Lucene. Hopefully one will be soon. For
> some discussion of this, see messages on SQLDirectory in the mail
> archives:
>
> http://nagoya.apache.org/eyebrowse/SearchList?listId=listName=lucene-user%40jakarta.apache.orgsearchText=SQLDirectorydefaultField=subjectSearch=Search
>
> Doug

Could anybody summarize what the technical pros/cons of a DB-based directory over the flat files would be? (What I see at the moment is that, for some - significant? - performance penalty, you get an index available over the network for multiple Lucene engines -- if I'm right.)

incze
Re: need info for database based Lucene but not flat file
On Tue, Apr 27, 2004 at 02:46:22PM -0700, Doug Cutting wrote:

> Incze Lajos wrote:
>> Could anybody summarize what the technical pros/cons of a DB-based
>> directory over the flat files would be? (What I see at the moment is
>> that, for some - significant? - performance penalty, you get an
>> index available over the network for multiple Lucene engines -- if
>> I'm right.)
>
> http://nagoya.apache.org/eyebrowse/[EMAIL PROTECTED]msgId=1344168
>
> Doug

Thanks.

incze
Re: starts with query functionality
On Fri, Apr 02, 2004 at 10:20:54AM -0600, Chad Small wrote:

> We have a requirement to return documents with a title field that
> starts with a certain letter. Is there a way to do something like
> this? We're using the StandardAnalyzer. Example title fields:
>
>   "This is the title of a document."
>   "And this is a title of a different document."
>
> This query doesn't fulfill the requirement: +(t*) - we just want to
> return the 1st document, which starts with "This", and not the 2nd,
> which has "this" as its 2nd word. Or is it just a matter of creating
> a field in the index called title_starts_with that would look like
> this for the example:
>
>   T
>   A
>
> Now the query +(t) would only get a hit on the 1st document. Or is
> there a better way? thanks, chad.

Basically that's a good solution, but you'd better make an instance of your titles an indexed but untokenized field too, to prevent the tokenizer from splitting your titles into tokens. This way, you can match a starting string as long as you want.

incze
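A sketch of that approach against the Lucene 1.x API (the field name title_starts_with comes from the question; storing the whole lowercased title rather than just the first letter is an assumption, following the untokenized-field advice above):

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.PrefixQuery;
import org.apache.lucene.search.Query;

// At index time: store an untokenized, lowercased copy of the title
// alongside the analyzed "title" field. Field.Keyword is indexed
// but not run through the analyzer.
Document doc = new Document();
doc.add(Field.Text("title", "This is the title of a document."));
doc.add(Field.Keyword("title_starts_with", "this is the title of a document."));

// At search time: only titles that *begin* with "t" match.
Query startsWithT = new PrefixQuery(new Term("title_starts_with", "t"));
```

Because the whole title is stored untokenized, the same field also supports longer prefixes such as "this is".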
Re: Query syntax on Keyword field question
On Tue, Mar 23, 2004 at 08:48:11PM -0600, Chad Small wrote:

> Thank you, Erik and Incze. I now understand the issue, and I'm trying
> to create a KeywordAnalyzer as suggested in your book excerpt, Erik:
>
> http://issues.apache.org/eyebrowse/[EMAIL PROTECTED]msgNo=6727
>
> However, not being all that familiar with the Analyzer framework, I'm
> not sure how to implement the KeywordAnalyzer, even though it might
> be trivial :) Any hints, code, or messages to look at?

Actually, what I've written was not an analyzer but a NotTokenizingTokenizer, as I have a very special analyzer (different needs for different field categories) and this is used in that. (The code is far from any kind of optimization, but you can see the logic.)

---
package hu.emnl.lucene.analyzer;

import java.io.IOException;
import java.io.Reader;

import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.Tokenizer;

public class NotTokenizingTokenizer extends Tokenizer {

    public NotTokenizingTokenizer() {
        super();
    }

    public NotTokenizingTokenizer(Reader input) {
        super(input);
    }

    public Token next() throws IOException {
        Token t = null;
        int c = input.read();
        if (c >= 0) {
            StringBuffer sb = new StringBuffer();
            do {
                sb.append((char) c);
                c = input.read();
            } while (c >= 0);
            t = new Token(new String(sb), 0, sb.length());
        }
        return t;
    }
}
---

incze
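The tokenizer's read loop can be exercised without Lucene at all. Below is a plain-Java sketch of the same read-everything-into-one-token logic (the class and method names here are made up for illustration; note the comparisons must read `>= 0` -- archiving tends to eat the '>'):

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class ReadAllDemo {

    // Same loop as NotTokenizingTokenizer.next(): consume the whole
    // Reader into a single string; return null on an empty stream.
    static String readAll(Reader input) throws IOException {
        String result = null;
        int c = input.read();
        if (c >= 0) {
            StringBuffer sb = new StringBuffer();
            do {
                sb.append((char) c);
                c = input.read();
            } while (c >= 0);
            result = sb.toString();
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // The whole input, whitespace included, becomes one "token".
        System.out.println(readAll(new StringReader("New York City")));
        // An exhausted stream yields null, ending tokenization.
        System.out.println(readAll(new StringReader("")));
    }
}
```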
Re: DocumentWriter, StopFilter should use HashMap... (patch)
> This would no longer compile with the change Kevin proposes. To make
> things back-compatible we must:
>
> 1. Keep but deprecate the StopFilter(Hashtable) constructor;
> 2. Keep but deprecate StopFilter.makeStopTable(String[]);
> 3. Add a new constructor: StopFilter(HashMap);

If you'd use StopFilter(Map), it'd be back-compatible for users passing a Hashtable to the constructor. I'm not sure about older Java versions, but in Java 1.4 Hashtable implements Map. (And, on the other hand, why HashMap and not Map?)

> 4. Add a new method: StopFilter.makeStopMap(String[]);
>
> Does that make sense?
>
> Doug

incze
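The compatibility point is easy to check in plain Java. A small sketch (the class and method names are made up; the method stands in for a hypothetical StopFilter(Map) constructor argument):

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

public class StopMapDemo {

    // Declaring the parameter as Map accepts either concrete class,
    // which is why StopFilter(Map) would stay back-compatible.
    static boolean isStopWord(Map stopTable, String word) {
        return stopTable.containsKey(word);
    }

    public static void main(String[] args) {
        Hashtable legacy = new Hashtable();   // what old callers pass
        legacy.put("the", "the");

        HashMap modern = new HashMap();       // what new callers pass
        modern.put("the", "the");

        // Both calls compile and behave identically, because Hashtable
        // has implemented java.util.Map since Java 1.2.
        System.out.println(isStopWord(legacy, "the"));
        System.out.println(isStopWord(modern, "the"));
        System.out.println(isStopWord(modern, "fox"));
    }
}
```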
Re: QueryParser and escaped characters
On Tue, Jan 27, 2004 at 01:00:11PM -0800, [EMAIL PROTECTED] wrote:

> I'm constructing a query using QueryParser as follows:
>
>   Query query = QueryParser.parse("ariadne\-1", "default", new StandardAnalyzer());
>
> When I print out query.toString(), I get: default:"ariadne 1"
> I'm not sure why my escape of - is not working?
>
> --David Goodstein

It is working. Without escaping, you would specify an "ariadne AND NOT 1" query. It is the StandardAnalyzer which drops the '-'. Anyway, if you've indexed with the StandardAnalyzer, you'll hardly find dashes in your text.

incze
Re: QueryParser and escaped characters
> [...] so if i *don't* escape the -, the standard analyzer *doesn't*
> split at the dash... isn't that the opposite of the expected behavior?
>
> --David

Yes, it is. If you study the standard tokenizer grammar, the dash is allowed in the NUM and ALPHANUM token types, and yours is an ALPHANUM. So, it's OK. When you escape the dash for the QueryParser, you insert a character which breaks the token for the analyzer. If you want to use this analyzer for the token types and patterns it provides (and you badly need the dash in alphanums), don't use the query parser; build the queries with the APIs.

incze
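A sketch of that build-it-with-the-APIs route, for the ariadne-1 example from the earlier message (the field name "default" comes from that message; this assumes the index itself was built with an analyzer that preserves the dash):

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

// Bypasses QueryParser and the analyzer entirely: the term text is
// used verbatim, dash included, with no escaping needed.
Query q = new TermQuery(new Term("default", "ariadne-1"));
```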
Re: lucene maverick
On Mon, Sep 15, 2003 at 12:03:38AM +1000, Mark Brand wrote:

> hi, just wondering if anyone has used lucene with maverick. i am just
> about to kick off a project using these two technologies and wanted
> to get some feedback. thanks, mark

Yes, I used it in a quick-and-dirty app. As a matter of fact, these are totally orthogonal technologies; you can use Lucene with Struts, WebWork, Maverick, or JSF. I don't really understand the point of the question.

incze
Re: Modify the StandardAnalyzer
On Fri, Sep 05, 2003 at 10:23:48PM +, Clas Rydergren wrote:

> Hi, I have been experimenting with Lucene for a few hours, and now
> I'm looking for a solution to this: when using the SimpleAnalyzer for
> indexing text, data like www.hotmail.com seems to be indexed as
> "www", "hotmail" and "com", which means that a search for "hotmail"
> will return a record. This is the behavior I am looking for! However,
> since SimpleAnalyzer does not index numbers by default, I would like
> to use the StandardAnalyzer. But StandardAnalyzer does not split the
> input stream at '.'. Ideally I should probably make my own analyzer,
> but that seems to be a bit complicated to me :(. What is the simplest
> possible modification that I need to make to the Lucene source to
> make the StandardAnalyzer split, for example, web addresses at '.'
> into separately indexed words? Can this be made by modifications to
> StandardTokenizer.jj? How? What is the easiest way of getting such a
> modification into the compiled Lucene? Is there a need for
> recompiling everything? Appreciate all help! regards, clas

You can stack up the two analyzers: first run the simple one, then the standard one on its output.

incze
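One way to avoid editing StandardTokenizer.jj altogether (a sketch, not something from the original thread) is to subclass Lucene 1.x's CharTokenizer, which needs no JavaCC recompilation: accept letters and digits as token characters, so '.' becomes a split point while numbers are still kept:

```java
import java.io.Reader;
import org.apache.lucene.analysis.CharTokenizer;

// Splits "www.hotmail.com" into "www", "hotmail", "com", while still
// producing numeric tokens (unlike SimpleAnalyzer's letters-only
// tokenizer).
public class AlphanumericTokenizer extends CharTokenizer {

    public AlphanumericTokenizer(Reader in) {
        super(in);
    }

    // Any character that is not a letter or digit ends the token.
    protected boolean isTokenChar(char c) {
        return Character.isLetterOrDigit(c);
    }

    // Lowercase tokens so searches are case-insensitive.
    protected char normalize(char c) {
        return Character.toLowerCase(c);
    }
}
```

An Analyzer whose tokenStream() returns this tokenizer would then index web addresses as separate words, as asked.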
Re: Compile lucene
On Fri, Jan 10, 2003 at 11:00:21AM -0800, Oshima, Scott wrote:

> Can anyone send me a link to the lucene mailing list email archives?
> These emails build up fast and I can't store them locally, but they
> are too valuable to delete. thanks.

I use http://www.mail-archive.com. Enter "lucene" into the "Find list" text entry and you'll get two lucene-dev's and a lucene-user.

incze
Re: Can someone please email me a copy of LARM?
On Fri, Dec 27, 2002 at 01:00:45PM +1100, TJ Tee wrote:

> My firewall does not allow me to download via the CVS program. Can
> someone please email me the latest LARM package? Thank you.

Use cvsgrab from sourceforge.net. Very handy.

incze
Re: AW: Lucene and XML
On Tue, Nov 05, 2002 at 03:19:29PM +0100, Richly, Gerhard wrote:

> thanks, i fixed it already. Sorry for this beginner question.

Being a beginner is not an issue; it's just that the topic was not Lucene.

incze