Re: index and search question

2004-06-20 Thread Incze Lajos
On Sun, Jun 20, 2004 at 09:46:42AM +, Dmitrii PapaGeorgio wrote:
 Let's say I index documents using this
 
  Document doc = new Document();
  doc.add(Field.Text("file1", (Reader) new InputStreamReader(is)));
  doc.add(Field.Text("file2", (Reader) new InputStreamReader(is2)));
 
 And want to do a search like this
 
 file1:Word file2:Word2
 
 Basically doing a search using multiple segments, file1 and file2, in the 
 same query. How would this be possible?

Just as you wrote. If you use the QueryParser, you can search with

file1:Word file2:Word2      or, e.g.,
+file1:Word +file2:Word2    etc.

Or you can build a boolean query programmatically (if I understood
your question).
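
To illustrate the difference between the two query forms above, here is a toy model in plain Java. It is only a sketch and does not use the Lucene API: the class FieldClauseDemo, its Clause type and the doc() helper are hypothetical names, and terms are assumed to be lowercased already (as the StandardAnalyzer would do). A clause written with '+' must match; clauses without '+' are optional, but at least one has to match when nothing is required.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of field-scoped query clauses; illustrative only, not the Lucene API. */
final class FieldClauseDemo {

    /** One clause: a field name, a term, and whether it was written with '+'. */
    static final class Clause {
        final String field;
        final String term;
        final boolean required;

        Clause(String field, String term, boolean required) {
            this.field = field;
            this.term = term;
            this.required = required;
        }
    }

    /** A document maps each field name to the set of terms indexed for that field. */
    static boolean matches(Map<String, Set<String>> doc, List<Clause> clauses) {
        boolean anyRequired = false;
        boolean anyHit = false;
        for (Clause c : clauses) {
            Set<String> terms = doc.get(c.field);
            boolean hit = terms != null && terms.contains(c.term);
            if (c.required) {
                anyRequired = true;
                if (!hit) {
                    return false; // a '+' clause that misses rejects the document
                }
            }
            anyHit |= hit;
        }
        // With no '+' clauses at all, at least one optional clause has to hit.
        return anyRequired || anyHit;
    }

    /** Builds a two-field document with the field names used in the question. */
    static Map<String, Set<String>> doc(String[] file1Terms, String[] file2Terms) {
        Map<String, Set<String>> d = new HashMap<>();
        d.put("file1", new HashSet<>(Arrays.asList(file1Terms)));
        d.put("file2", new HashSet<>(Arrays.asList(file2Terms)));
        return d;
    }
}
```

In this model a document whose file1 contains "word" but whose file2 lacks "word2" matches the first form (file1:word file2:word2) and is rejected by the second (+file1:word +file2:word2).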

incze

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: phrase query not working in boolean clause

2004-06-09 Thread Incze Lajos
On Wed, Jun 09, 2004 at 01:41:55PM -0400, Erik Hatcher wrote:
 On Jun 9, 2004, at 12:25 PM, Michael Duval wrote:
 When doing an exact phrase query on the title the expected results are 
 returned:
 
+(title:"Mass Asymmetry")
   after tokenizing/filtering:  +title:"mass asymmetri"
returns 20 Hits
example hit: Mass asymmetry, equation of state, and nuclear 
 multifragmentation
 
 When attempting to confine the results to a particular journal the 
 query used is:
+(journal:L) +(title:"Mass Asymmetry")
   after t/f :  +journal:L +title:"mass asymmetri"
 
returns 315 Hits!!
example hit 1:  Towards dynamical mass calculations
example hit 2:  Up down-asymmetric gravitational fields of spinning 
 masses
 
 It would seem that the search engine is treating
+title:"mass asymmetri"  as +title:mass asymmetri
 
 However, this behavior is only apparent on composite queries as shown 
 previously.
 
 For a sanity check I built the query using both the standard query 
 parser and
 the lucene search api (TermQuery, PhraseQuery, BooleanQuery).   The 
 results
 were the same both ways.
 
 Is this a well known limitation of the lucene search engine?  Is there 
 a different
 means of obtaining the desired results?
 
 Could you work up a JUnit test case example indexing a couple of 
 documents like this into a RAMDirectory and a testXXX method that shows 
 the failure?
 
 I cannot really make sense of what you have going on with the textual 
 queries, and obviously there is some stemming going on too.  Show us the code. :)
 
   Erik

This was fixed about two months ago by Mr. Goller. You have to upgrade,
if you can.

incze




Re: need info for database based Lucene but not flat file

2004-04-27 Thread Incze Lajos
On Tue, Apr 27, 2004 at 09:15:05AM -0700, Doug Cutting wrote:
 Yukun Song wrote:
 As is known, Lucene currently uses flat files to store information for
 indexing. 
 
 Does anyone have ideas or resources for combining a database (like MySQL or
 PostgreSQL) with Lucene instead of the current flat index file formats?
 
 A few folks have implemented an SQL-based Lucene Directory, but none has 
 yet been contributed to Lucene.  Hopefully one will be soon.
 
 For some discussion of this, see messages on SQLDirectory in the mail 
 archives:
 
 http://nagoya.apache.org/eyebrowse/SearchList?listId=listName=lucene-user%40jakarta.apache.orgsearchText=SQLDirectorydefaultField=subjectSearch=Search
 
 Doug

Could anybody summarize the technical pros and cons of a DB-based
directory compared with the flat files? (What I see at the moment is that,
for some - significant? - performance penalty, you get an index available
over the network to multiple Lucene engines, if I'm right.)

incze




Re: need info for database based Lucene but not flat file

2004-04-27 Thread Incze Lajos
On Tue, Apr 27, 2004 at 02:46:22PM -0700, Doug Cutting wrote:
 Incze Lajos wrote:
 Could anybody summarize what would be the technical pros/cons of a DB-based
 directory over the flat files? (What I see at the moment is that for some
 - significant? - performance penalty you'll get an index available over the
 network for multiple lucene engines -- if I'm right.)
 
 http://nagoya.apache.org/eyebrowse/[EMAIL PROTECTED]msgId=1344168
 
 Doug

Thanks.

incze




Re: starts with query functionality

2004-04-02 Thread Incze Lajos
On Fri, Apr 02, 2004 at 10:20:54AM -0600, Chad Small wrote:
 We have a requirement to return documents with a title field that starts with a 
 certain letter.  Is there a way to do something like this?  We're using the 
 StandardAnalyzer
  
 Example title fields:
  
 This is the title of a document.
 And this is a title of a different document.
  
 This query doesn't fulfill the requirement:
 +(t*)  - just want to return the 1st document that starts with This, and not the 
 2nd article that has this as the 2nd word.
  
 Or is it just a matter of creating a field in the index called title_starts_with 
 that would look like this for the example:
  
 T
 A
  
 Now, the query +(t) would only get a hit on the 1st document.
 Or is there a better way?
  
 thanks,
 chad.

Basically that's a good solution, but you would do better to also
store each title as an indexed, untokenized field, so that the
tokenizer does not split your titles into tokens. That way you
can match a starting string as long as you want.
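
To make the difference concrete, here is a small sketch in plain Java. It is not Lucene code: TitlePrefixDemo and its two methods are hypothetical names that only mimic what a prefix query would see on a tokenized field versus on an untokenized one. Lowercasing the whole title is an assumption of the sketch; an untokenized field is not analyzed, so you would have to normalize the case yourself before indexing.

```java
import java.util.Locale;

/** Toy comparison of prefix matching on tokens versus on a whole, untokenized title. */
final class TitlePrefixDemo {

    /** Mimics a tokenized field: true if ANY word of the title starts with the prefix. */
    static boolean anyTokenStartsWith(String title, String prefix) {
        String p = prefix.toLowerCase(Locale.ROOT);
        // Split at every run of characters that are neither letters nor digits.
        for (String token : title.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!token.isEmpty() && token.startsWith(p)) {
                return true;
            }
        }
        return false;
    }

    /** Mimics an indexed, untokenized field: the whole title is a single term. */
    static boolean titleStartsWith(String title, String prefix) {
        return title.toLowerCase(Locale.ROOT).startsWith(prefix.toLowerCase(Locale.ROOT));
    }
}
```

With the two example titles from the question, anyTokenStartsWith(title, "t") accepts both, while titleStartsWith(title, "t") accepts only the first one. The untokenized form also handles longer prefixes such as "this is", which a one-letter title_starts_with field cannot.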

incze




Re: Query syntax on Keyword field question

2004-03-24 Thread Incze Lajos
On Tue, Mar 23, 2004 at 08:48:11PM -0600, Chad Small wrote:
 Thank you, Erik and Incze.  I now understand the issue
 and I'm trying to create a KeywordAnalyzer as suggested
 in your book excerpt, Erik:
  
 http://issues.apache.org/eyebrowse/[EMAIL PROTECTED]msgNo=6727
  
 However, not being all that familiar with the Analyzer framework,
 I'm not sure how to implement the KeywordAnalyzer even though
 it might be trivial :)  Any hints, code, or messages to look at?
  

Actually, what I've written was not an analyzer but a NotTokenizingTokenizer,
as I have a very special analyzer (different needs for different
field categories) and the tokenizer is used inside it. The code is far from
any kind of optimization, but you can see the logic:

---
package hu.emnl.lucene.analyzer;

import java.io.IOException;
import java.io.Reader;

import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.Tokenizer;

/** Emits the entire input as a single token, i.e. does not tokenize at all. */
public class NotTokenizingTokenizer extends Tokenizer {

    public NotTokenizingTokenizer() {
        super();
    }

    public NotTokenizingTokenizer(Reader input) {
        super(input);
    }

    /** Returns the whole input as one token, or null once the input is exhausted. */
    public Token next() throws IOException {
        Token t = null;
        int c = input.read();
        if (c >= 0) {                       // read() returns -1 at end of input
            StringBuffer sb = new StringBuffer();
            do {
                sb.append((char) c);
                c = input.read();
            } while (c >= 0);
            t = new Token(sb.toString(), 0, sb.length());
        }
        return t;
    }
}
---

incze




Re: DocumentWriter, StopFilter should use HashMap... (patch)

2004-03-09 Thread Incze Lajos

 This would no longer compile with the change Kevin proposes.
 
 To make things back-compatible we must:
 
 1. Keep but deprecate StopFilter(Hashtable) constructor;
 2. Keep but deprecate StopFilter.makeStopTable(String[]);
 3. Add a new constructor: StopFilter(HashMap);

If you used StopFilter(Map), it would be backward compatible
for users passing a Hashtable to the constructor. I'm not sure
about older Java versions, but in Java 1.4 Hashtable implements
Map. (And OTOH, why HashMap and not Map?)

 4. Add a new method: StopFilter.makeStopMap(String[]);
 
 Does that make sense?
 
 Doug
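
On the Map versus HashMap point above, here is a minimal sketch in plain Java. StopWordsHolder is a hypothetical stand-in and not Lucene's StopFilter; the makeStopMap helper is written only in the spirit of the method proposed in the thread. It shows that a constructor declared with a Map parameter accepts a legacy Hashtable as well as a HashMap.

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

/** Illustrative only: a hypothetical stand-in, not Lucene's StopFilter. */
final class StopWordsHolder {
    private final Map<String, String> stopTable;

    /** Declaring the parameter as Map accepts Hashtable, HashMap, or any other Map. */
    StopWordsHolder(Map<String, String> stopTable) {
        this.stopTable = stopTable;
    }

    boolean isStopWord(String word) {
        return stopTable.containsKey(word);
    }

    /** Hypothetical helper in the spirit of makeStopTable(String[]). */
    static Map<String, String> makeStopMap(String[] words) {
        Map<String, String> map = new HashMap<>();
        for (String w : words) {
            map.put(w, w);
        }
        return map;
    }

    public static void main(String[] args) {
        // Existing caller code that still builds a Hashtable keeps compiling.
        Hashtable<String, String> legacy = new Hashtable<>();
        legacy.put("the", "the");
        System.out.println(new StopWordsHolder(legacy).isStopWord("the"));

        // New caller code can pass a HashMap through the same constructor.
        Map<String, String> modern = makeStopMap(new String[] {"a", "an"});
        System.out.println(new StopWordsHolder(modern).isStopWord("an"));
    }
}
```

One caveat: widening a parameter type like this keeps callers source compatible, but code already compiled against a Hashtable signature would need recompiling, which may be one reason to keep the deprecated constructor as well.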


incze




Re: QueryParser and escaped characters

2004-01-27 Thread Incze Lajos
On Tue, Jan 27, 2004 at 01:00:11PM -0800, [EMAIL PROTECTED] wrote:
 I'm constructing a query using queryparser as follows:
 
 Query query = QueryParser.parse("ariadne\\-1", "default",
   new StandardAnalyzer());
 
 
 when I print out query.toString(), i get:
 
 default:ariadne 1
 
 I'm not sure why my escape of - is not working?
 
 --David Goodstein
 
It is working. Without the escape you would be specifying an
"ariadne BUT NOT 1" query, because '-' is the prohibit operator. It is
the StandardAnalyzer that drops the '-'. In any case, if you indexed with
the StandardAnalyzer, you will hardly find dashes in your indexed text.

incze




Re: QueryParser and escaped characters

2004-01-27 Thread Incze Lajos
[...]
 so if i *don't* escape the -, the standard
 analyzer *doesn't* split at the dash..isn't that
 opposite the expected behavior?
 
 --David

Yes, it is. If you study the standard tokenizer
grammar, the dash is allowed in the NUM and ALPHANUM
token types, and yours is an ALPHANUM. So, it's OK.

When you escape the dash for the QueryParser, you
insert a character which breaks the token for
the analyzer. If you want to use this analyzer
for the token types and patterns it provides
(and you badly need the dash in alphanums), don't
use the query parser; build the queries through the API.

incze




Re: lucene maverick

2003-09-14 Thread Incze Lajos
On Mon, Sep 15, 2003 at 12:03:38AM +1000, Mark Brand wrote:
 hi
 
 just wondering if anyone has used lucene with maverick. i am just about 
 to kick off a project using these two technologies and wanted to get some 
 feedback.
 
 thanks
 mark
 

Yes, I used it in a quick & dirty app. As a matter of fact, these are
totally orthogonal technologies: you can use Lucene with Struts, WebWork,
Maverick, or JSF. I don't really understand the point of the question.

incze




Re: Modify the StandardAnalyzer

2003-09-05 Thread Incze Lajos
On Fri, Sep 05, 2003 at 10:23:48PM +, Clas Rydergren wrote:
 Hi,
 
 I have been experimenting with Lucene for a few hours, and now I'm looking 
 for a solution to this:
 
 When using the SimpleAnalyzer for indexing text, data like "www.hotmail.com" 
 seems to be indexed as "www", "hotmail" and "com", which means that a search for 
 "hotmail" will return a record. This is the behavior I am looking for! 
 However, since SimpleAnalyzer does not index numbers by default, I would like 
 to use the StandardAnalyzer. But StandardAnalyzer does not split the input 
 stream at ".".
 
 Ideally I should probably make my own analyzer, but that seems to be a bit 
 complicated to me :(. What is the simplest possible modification that I 
 need to make to the Lucene source to make the StandardAnalyzer split, for 
 example, web addresses at "." into separately indexed words?
 
 Can this be made by modifications to the StandardTokenizer.jj? How? What is 
 the easiest way of getting such modification into the compiled Lucene? Is 
 there a need for recompiling everything?
 
 Appreciate all help!
 
 regards
 clas

You can stack up the two analyzers: first run the simple one, then the
standard one on its output.
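
As a rough illustration of the token stream asked for in the question (split at "." like the SimpleAnalyzer, but keep digits), here is a sketch in plain Java. It is not a Lucene analyzer, and LetterOrDigitSplitter is a hypothetical name. It assumes that every character that is neither a letter nor a digit ends a token and that tokens are lowercased; in Lucene the same rule would have to live in a custom tokenizer or in a changed grammar.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy tokenizer: splits at anything that is not a letter or digit, and lowercases. */
final class LetterOrDigitSplitter {

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char ch = text.charAt(i);
            if (Character.isLetterOrDigit(ch)) {
                current.append(Character.toLowerCase(ch));
            } else if (current.length() > 0) {
                // Any other character (".", "-", whitespace, ...) closes the token.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString()); // flush the last token
        }
        return tokens;
    }
}
```

Under these assumptions, tokenize("www.hotmail.com") yields www, hotmail, com, and numbers such as the 42 in "Room 42" are kept as tokens of their own.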

incze




Re: Compile lucene

2003-01-10 Thread Incze Lajos
On Fri, Jan 10, 2003 at 11:00:21AM -0800, Oshima, Scott wrote:
 Anyone can send me a link to the lucene mailing list email archives?  these emails 
build up fast and i can't store them locally, but too valuable to delete.  thanks.

I use http://www.mail-archive.com.

Enter "lucene" into the Find list text entry and you'll get two
lucene-dev's and a lucene-user.


incze





Re: Can someone please email me a copy of LARM?

2002-12-27 Thread Incze Lajos
On Fri, Dec 27, 2002 at 01:00:45PM +1100, TJ Tee wrote:
 My firewall does not allow me to download via CVS program. Can someone
 please email me the latest LARM package? Thank you.

Use cvsgrab from sourceforge.net. Very handy.

incze





Re: AW: Lucene and XML

2002-11-05 Thread Incze Lajos
On Tue, Nov 05, 2002 at 03:19:29PM +0100, Richly, Gerhard wrote:
 thanks, i fixed it already.
 
 Sorry for this beginner question.
 

Being a beginner is not an issue; it is just that the topic was not Lucene.

incze

--
To unsubscribe, e-mail:   mailto:lucene-user-unsubscribe;jakarta.apache.org
For additional commands, e-mail: mailto:lucene-user-help;jakarta.apache.org