multiple collections indexing

2003-03-19 Thread Morus Walter
Hi, we are currently evaluating lucene. The data we'd like to index consists of ~ 80 collections of documents (a few hundred up to 20 documents per collection, ~ 1.5 million documents total; medium document size is in the order of 1 kB). Searches must be possible on any combination of

Re: multiple collections indexing

2003-03-20 Thread Morus Walter
Hi, thanks for all your answers, I think I collect some of the hints and ideas rather than commenting all of them apart. Doug Cutting writes: Morus Walter wrote: Searches must be able on any combination of collections. A typical search includes ~ 40 collections. Now the question

Re: multiple collections indexing

2003-03-20 Thread Morus Walter
Tatu Saloranta writes: On Wednesday 19 March 2003 01:44, Morus Walter wrote: This might still be a feasible thing to do, except if number of collections changes very frequently (as you need to reindex all docs, not just incremental). Well the number is slowly growing. Another

Re: multiple collections indexing

2003-03-21 Thread Morus Walter
Hi, Are lots of different combinations of collections used frequently? Probably not. If only a handful of different subsets of collections are frequently searched, then QueryFilter could be very useful. I did some tests and thought the results might be interesting for others also. I ran

Re: FW: Full French Analyser ?

2003-03-24 Thread Morus Walter
1- Add a stem(String) method and delegate to it the calling of snowball frenchStemmer as : public String stem(String term) { stemmer.setCurrent(term); if (stemmer.stem()) return stemmer.getCurrent(); else return term; } 2- Add to

Re: FW: Full French Analyser ?

2003-03-24 Thread Morus Walter
René Ferréro writes: --- Morus Walter [EMAIL PROTECTED] a écrit : Hmm. Isn't org.apache.lucene.analysis.snowball.SnowballFilter already what you described? Thanks for your information. Where can I get this source code ? I don't find it in the lucene

Wildcard Queries

2003-03-25 Thread Morus Walter
Hi, is it intentional that '?' matches exactly one character within wildcard terms but one or zero characters at the end of wildcard terms? That is: r?? matches r ra rab ... whereas r?b matches rab rbb ... and not rb The AFAIK common definition of '*' and '?' (e.g. in unix glob pattern) is to
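The glob convention the message refers to can be illustrated without Lucene. The following is only a sketch using java.util.regex; the class and method names are made up for illustration, and it does not reproduce how Lucene's WildcardQuery is implemented. It treats '?' as exactly one character and '*' as any run of characters, including none.

```java
import java.util.regex.Pattern;

final class GlobSketch {
    // Translate a glob pattern into a regular expression:
    // '*' becomes ".*" (any run of characters, possibly empty),
    // '?' becomes "."  (exactly one character),
    // every other character is matched literally.
    static Pattern toRegex(String glob) {
        StringBuilder sb = new StringBuilder();
        for (char c : glob.toCharArray()) {
            switch (c) {
                case '*': sb.append(".*"); break;
                case '?': sb.append('.'); break;
                default:  sb.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(sb.toString(), Pattern.DOTALL);
    }

    // True if the whole term matches the glob pattern.
    static boolean matches(String glob, String term) {
        return toRegex(glob).matcher(term).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("r?b", "rab")); // one character in the middle
        System.out.println(matches("r?b", "rb"));  // nothing in the middle
        System.out.println(matches("r??", "rab")); // two trailing characters
        System.out.println(matches("r??", "r"));   // no trailing characters
    }
}
```

Under this definition r?? matches rab but neither r nor ra, which is the behaviour the message expects from '?' at the end of a term as well.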

RE: Very large queries?

2003-03-27 Thread Morus Walter
[EMAIL PROTECTED] writes: Viewed as an information retrieval problem (not the best way to view it, but this is just an initial approach), one could then (1) create a taxonomy of drugs and a taxonomy of conditions, and (2) implement a concept-oriented (taxonomy-oriented) search of the corpus

RE: Indexing Growth

2003-04-02 Thread Morus Walter
Rob Outar writes: After running first query to get all attributes from all files in the given directory, there were 17 files, each file has 5 attributes so 85 queries were run: can you post the java code used for querying? Actually I don't understand, why you have to use one query for each

Lucene Index on NFS Server

2003-07-30 Thread Morus Walter
Hi, I'm currently planning a web application using lucene for search. There will be two web server machines responsible for the application and the searches. Two machines basically to be failsafe, load is not expected to be a problem initially, though this might change over time. So scaling is a

Re: Lucene Index on NFS Server

2003-07-31 Thread Morus Walter
Hi Jan, thanks for your answer. What part of the webserver are you expecting that will fail? The service or the computer? Why would the computer hosting NFS be less likely to fail than your computer hosting the webserver? The computer. Of course you're right with the nfs server. That's one

Re: Lucene Index on NFS Server

2003-08-01 Thread Morus Walter
Doug Cutting writes: Can I have a lucene index on a NFS filesystem without problems (access is readonly)? So long as all access is read-only, there should not be a problem. Keep in mind however that lock files are known to not work correctly over NFS. Hmm. Sorry, I was a bit imprecise

query parser operator precedence and strange result

2003-08-14 Thread Morus Walter
Hi, I'm currently trying to understand how the standard query parser handles operator precedence in a query like a OR b AND c OR d This is output by the toString method as a +b +c d so AND seems to have higher precedence than OR Now if I try to check this and look at a OR ( b AND c ) OR d I see a

RE: Exact Match

2003-10-23 Thread Morus Walter
Wilton, Reece writes: If I use an untokenized field, would fox match this as well? I need to support both exact match searches and searches where one word exists in the field. couldn't you use some start/end word (that never occurs in your texts) as anchors? That is index 'XXX brown fox

Re: Rephrase My Question - How To Search Database With More Than One Pair of Property/Value as Parameters Using Lucene?

2003-11-06 Thread Morus Walter
Caroline Jen writes: I have a sample program that takes care of the search based on one single pair of property and value in the database. For example, visitors of the web site can retrieve all articles written by Elizabeth Castro. creator is the property and Elizabeth Castro is the value

Re: Reopen IndexWriter after delete?

2003-11-12 Thread Morus Walter
Otis Gospodnetic writes: No, it is not safe. You should close the IndexWriter, then delete the document and close IndexReader, and then get a new IndexWriter and continue writing. IIRC lucene takes care that you do so. Locking prevents you from having an open IndexWriter and modifying the

Re: FSDIrectory.create doesn't tolerate subdirectories

2003-12-08 Thread Morus Walter
Erik Hatcher writes: On Sunday, December 7, 2003, at 08:21 PM, Esmond Pitt wrote: I'm not clear whether this is a 'yes' or a 'no'. I think other committers would need to weigh in on it. I'm fine with making a change to check isDirectory as well and not deleting them since Lucene

Query Parser AND / OR

2003-12-09 Thread Morus Walter
Hi, I'm having problems understanding the query parser's handling of AND and OR if there's more than one operator. E.g. a OR b AND c gives the same number of hits as b AND c (only scores are different) and a AND b OR c AND d seems to be equivalent to a AND b AND c AND d which doesn't seem logical

Re: Query Parser AND / OR

2003-12-10 Thread Morus Walter
Hi Dror, thanks for your answer. I'm having problems understanding query parsers handling of AND and OR if there's more than one operator. E.g. a OR b AND c gives the same number of hits as b AND c (only scores are different) This would make sense if all the document that

Re: Lock obtain timed out

2003-12-16 Thread Morus Walter
Hohwiller, Joerg writes: Am I safe disabling the locking??? No. Can anybody tell me where to get documentation about the Locking strategy (I still would like to know why I have that problem) ??? I guess -- but given your input I really have to guess; the source you wanted to attach didn't

Re: best way of reusing IndexSearcher objects

2003-12-19 Thread Morus Walter
Doug Cutting writes: Dror Matalon wrote: There are two issues: 1. Having new searches start using the new index only when it's ready, not in a half baked state, which means that you have to synchronize the switch from the old index to the new one. That's true. If you're doing updates

RE: Query Parser AND / OR

2003-12-28 Thread Morus Walter
Jamie Stallwood wrote: What Morus is saying is right, an expression without parenthesis, when interpreted, assumes terms on either side of an AND clause are compulsory terms, and any terms on either side of an OR clause are optional. However, if you combine AND and OR in an expression, the
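The rule described here can be modelled in a few lines. The sketch below is not the QueryParser code; it is a simplified stand-in (class and method names are invented for illustration, and parentheses, NOT and field prefixes are ignored) that marks a term as required whenever an AND sits directly next to it and leaves it optional otherwise.

```java
import java.util.ArrayList;
import java.util.List;

final class AndOrSketch {
    // Flatten "a OR b AND c" style input into +required / optional clauses.
    // Assumption: tokens are whitespace-separated terms and the operators AND / OR.
    static String flatten(String query) {
        String[] tokens = query.trim().split("\\s+");
        List<String> clauses = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            String t = tokens[i];
            if (t.equals("AND") || t.equals("OR")) {
                continue; // operators only influence their neighbours
            }
            boolean andBefore = i > 0 && tokens[i - 1].equals("AND");
            boolean andAfter = i + 1 < tokens.length && tokens[i + 1].equals("AND");
            // A term touching an AND becomes required, everything else stays optional.
            clauses.add((andBefore || andAfter) ? "+" + t : t);
        }
        return String.join(" ", clauses);
    }

    public static void main(String[] args) {
        System.out.println(flatten("a OR b AND c OR d"));  // a +b +c d
        System.out.println(flatten("a AND b OR c AND d")); // +a +b +c +d
        System.out.println(flatten("a OR b AND c"));       // a +b +c
    }
}
```

With this model a OR b AND c OR d flattens to a +b +c d, the toString output reported in the precedence threads, and a AND b OR c AND d makes all four terms required, which is why it behaves like a AND b AND c AND d.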

RE: Query Parser AND / OR

2003-12-28 Thread Morus Walter
Morus Walter writes: I attached the patch (made against 1.3rc3 but working for 1.3final as well) and a test program. Seems the attachments got stripped... So once again: The patch: ===File lucene/QueryParser.jj.patch=== *** QueryParser.jj.org Mon Dec 22 11:47:30 2003

Re: Query Parser AND / OR

2003-12-30 Thread Morus Walter
Dror Matalon writes: my $.02. Before having patches, I think it's a good idea to agree on what the right solution is. I tried to raise that question in the first place. But there wasn't much response. So I decided to make a concrete suggestion, how to change things. Most of it is obvious

Re: Query Parser AND / OR

2003-12-30 Thread Morus Walter
Hi Dror, thanks for your answer. I really appreciate your comments. Before having patches, I think it's a good idea to agree on what the right solution is. I tried to raise that question in the first place. But there wasn't much response. Might be the time of the year when many

Re: Query Parser AND / OR

2003-12-30 Thread Morus Walter
Hi Dror, I was thinking about this issue, and currently I think that the only way to define this type of queries formally, is to give the default operator its own precedence relative to the precedence of 'OR' and 'AND'. So there are two possibilities: either the default operator has

Re: Philosophy(??) question

2004-01-13 Thread Morus Walter
Scott Smith writes: I have some documents I'm indexing which have multiple languages in them (i.e., some fields in the document are always English; other fields may be other languages). Now, I understand why a query against a certain field must use the same analyzer as was used when that

Re: Ordening documents

2004-01-17 Thread Morus Walter
Peter Keegan writes: What is the returned order for documents with identical scores? have a look at the source of the lessThan method in org.apache.lucene.search.HitQueue: protected final boolean lessThan(Object a, Object b) { ScoreDoc hitA = (ScoreDoc)a; ScoreDoc hitB = (ScoreDoc)b;

RE: Indexing of deep structured XML

2004-01-18 Thread Morus Walter
Goulish, Michael writes: To really preserve the relationships in arbitrarily structured XML, you pretty much need to use a database that directly supports an XML query language like XQuery or XPath. If searching within regions is enough (something e.g. sgrep

QueryParser and stopwords

2004-01-20 Thread Morus Walter
Hi, I'm currently trying to get rid of query parser problems with stopwords (depending on the query, there are ArrayIndexOutOfBoundsExceptions, e.g. for stop AND nonstop where stop is a stopword and nonstop not). While this isn't hard to fix (I'll enter a bug and patch in bugzilla), there's one

Re: Lucene search result no stable

2004-01-21 Thread Morus Walter
Ardor Wei writes: What might be the problem? How to solve it? Any suggestion or idea will be appreciated. The only problem with locking I saw so far is that you have to make sure that the temp dir is the same for all applications. Lucene 1.3 stores its lock in the directory that is defined

Re: Query Term Questions

2004-01-21 Thread Morus Walter
Erik Hatcher writes: TS==I've not been able to get negative boosting to work at all. Maybe there's a problem with my syntax. If, for example, I do a search with green beret^10, it works just fine. But green beret^-2 gives me a ParseException showing a lexical error. Have you

Re: Query madness with NOTs...

2004-01-23 Thread Morus Walter
Otis Gospodnetic writes: Redirecting to lucene-user --- Jim Hargrave [EMAIL PROTECTED] wrote: Can anyone tell me why these two queries would produce different results: +A -B A -(-B) A and +A are not the same thing when you have multiple terms in a query. Hmm. As far

Re: What is the status of Query Parser AND / OR ?

2004-02-11 Thread Morus Walter
Daniel B. Davis writes: There was a lot of correspondence during December about this. Is there any further resolution? There's a patch and I hope it will find its way into the lucene sources. see: http://issues.apache.org/bugzilla/show_bug.cgi?id=25820 Seems I missed the mail about Otis

Re: a search like Google

2004-02-12 Thread Morus Walter
Nicolas Maisonneuve writes: hy, i have a index with the fields : title author content i would make the same search type than Google ( a form with a textfiel). When the user search i love lucene (it's not a phrase query but just the text in the textfield ), i would like search in

Re: Date Range support

2004-02-12 Thread Morus Walter
tom wa writes: From: Erik Hatcher On Jan 29, 2004, at 5:08 AM, tom wa wrote: I'm trying to create an index which can also be searched with date ranges. My first attempt using the Lucene date format ran into trouble after my index grew and I couldn't search over more than a few

Re: Limiting hit count

2004-02-13 Thread Morus Walter
[EMAIL PROTECTED] writes: On Friday 13 February 2004 12:18, Julien Nioche wrote: If you want to limit the set of Documents you're querying, you should consider using Filter objects and send it to the searcher along with your Query. Hm, hard to find information about Filters...I actually

Re: Re:can't delete from an index using IndexReader.delete()

2004-02-20 Thread Morus Walter
Dhruba Borthakur writes: Hi folks, I am using the latest and greatest Lucene jar file and am facing a problem with deleting documents from the index. Browsing the mail archive, I found that the following email (June 2003) listed the exact problem that I am encountering. In short: I am

open files under linux

2004-02-20 Thread Morus Walter
Rasik Pandey writes: As a side note, regarding the Too many open files issue, has anyone noticed that this could be related to the JVM? For instance, I have a coworker who tried to run a number of optimized indexes in a JVM instance and received the Too many open files error. With the

Re: Best Practices for indexing in Web application

2004-03-02 Thread Morus Walter
Michael Steiger writes: I am using an IndexSearcher for querying the index but for deletions I need to use the IndexReader. I now know that I can have Readers and a Writer open concurrently but IndexReader.delete can only be used if no Writer is open. You should be aware that an

Re: java.io.tmpdir as lock dir .... once again

2004-03-02 Thread Morus Walter
Otis Gospodnetic writes: This looks nice. However, what happens if you have two Java processes that work on the same index, and give it different lock directories? They'll mess up the index. Is that different to having two java processes using different java.io.tmpdir? I had that problem

Re: Problem with search results

2004-03-02 Thread Morus Walter
Otis Gospodnetic writes: And if you do not use QueryParser, then things work? If so, then this is likely caused by the fact that your Term contains a 'special' character, '-'. Actually I was going to suggest a fix for '-' within words in the query parser. There was a suggested fix, that

Re: Best Practices for indexing in Web application

2004-03-03 Thread Morus Walter
Michael Steiger writes: Depends on your application, but if you can, it's better to keep IndexSearcher open until the index changes. Otherwise you will have to open all the index files for each search. Good tip. So I have to synchronize (logically) my search routine with any updates

Re: Storing numbers

2004-03-05 Thread Morus Walter
[EMAIL PROTECTED] writes: Hi! I want to store numbers (id) in my index: long id = 1069421083284; doc.add(Field.UnStored(in, String.valueOf(id))); But searching for id:1069421083284 doesn't return any hits. If your field is named 'in' you shouldn't search in 'id'.

Re: Problem with search results

2004-03-06 Thread Morus Walter
Doug Cutting writes: Morus Walter wrote: Now I think this can be fixed in the query parser alone by simply allowing '-' within words. That is change #_TERM_CHAR: ( _TERM_START_CHAR | _ESCAPED_CHAR ) to #_TERM_CHAR: ( _TERM_START_CHAR | _ESCAPED_CHAR | - ) As a result, query

RE: Query syntax on Keyword field question

2004-03-23 Thread Morus Walter
Chad Small writes: Here is my attempt at a KeywordAnalyzer - although is not working? Excuse the length of the message, but wanted to give actual code. With this output: Analzying HW-NCI_TOPICS org.apache.lucene.analysis.WhitespaceAnalyzer: [HW-NCI_TOPICS]

RE: Query syntax on Keyword field question

2004-03-24 Thread Morus Walter
Hi Chad, But I assume this fix won't come out for some time. Is there a way I can get this fix sooner? I'm up against a deadline and would very much like this functionality. Just get Lucene's sources, change the line and recompile. The difficult part is to get a copy of JavaCC 2 (3 won't

RE: Query syntax on Keyword field question

2004-03-24 Thread Morus Walter
Chad Small writes: I'm getting this with 3.2: javacc-check: BUILD FAILED file:D:/applications/lucene-1.3-final/build.xml:97: ## JavaCC not found. JavaCC Home: /applications/javacc-3.2/bin JavaCC JAR:

Re: How to order search results by Field value?

2004-03-25 Thread Morus Walter
Erik Hatcher writes: Why not do the unique sequential number replacement at index time rather than query time? how would you do that? This requires to know the ids that will be added in future. Let's say you start with strings 'a' and 'b'. Later you add a document with 'aa'. How do you know

Re: query

2004-04-22 Thread Morus Walter
Rosen Marinov writes: Short answer: it depends. Questions for you to answer: What field type and analyzer did you use during indexing? What analyzer used with QueryParser? What does the generated Query.toString return? in both cases SimpleAnalyzer QueryParser.parse(\abc\)

RE: multivalue fields

2004-05-15 Thread Morus Walter
Ryan Sonnek writes: using lucene 1.3-final, it appears to only search the first field with that name. here's the code i'm using to construct the index, and I'm using Luke to check that the index is created correctly. Everything looks fine, but my search returns empty. do i have to use a

RE: multivalue fields

2004-05-17 Thread Morus Walter
Alex McManus writes: Maybe your fields are too long so that only part of it gets indexed (look at IndexWriter.maxFieldLength). This is interesting, I've had a look at the JavaDoc and I think I understand. The maximum field length describes the maximum number of unique terms, not the

Re: How to handle range queries over large ranges and avoid Too Many Boolean clauses

2004-05-19 Thread Morus Walter
Claude Devarenne writes: Hi, I have over 60,000 documents in my index which is slightly over a 1 GB in size. The documents range from the late seventies up to now. I have indexed dates as a keyword field using a string because the dates are in MMDD format. When I do range queries

Re: Internal full content store within Lucene

2004-05-19 Thread Morus Walter
Kevin Burton writes: How much interest is there for this? I have to do this for work and will certainly take the extra effort into making this a standard Lucene feature. Sounds interesting. How would you handle deletions? Morus

Re: Tool for analyzing analyzers

2004-05-28 Thread Morus Walter
Hi Mark, I've had this running OK from the command line and in Eclipse on XP. I suspect it might be because you're running a different OS? The Classfinder tries to split the system property java.class.path on the ; character but I forgot different OSes have different separators. Let me
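The separator problem mentioned here has a standard remedy in the JDK: java.io.File.pathSeparator holds the platform's classpath separator (";" on Windows, ":" on Unix-like systems). The sketch below shows the idea; the class and method names are hypothetical, and nothing here claims to be the fix that was actually applied to the tool under discussion.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class ClasspathSketch {
    // Split a classpath string on the given separator, skipping empty entries.
    // Pattern.quote keeps the separator from being read as a regex metacharacter.
    static List<String> entries(String classpath, String separator) {
        List<String> result = new ArrayList<>();
        for (String entry : classpath.split(Pattern.quote(separator))) {
            if (!entry.isEmpty()) {
                result.add(entry);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // File.pathSeparator is chosen by the JVM for the current OS,
        // so the same code works on Windows and on Unix-like systems.
        String classpath = System.getProperty("java.class.path");
        for (String entry : entries(classpath, File.pathSeparator)) {
            System.out.println(entry);
        }
    }
}
```

Passing the separator as a parameter keeps the splitting logic testable on any platform; production code would normally always pass File.pathSeparator.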

Re: ArrayIndexOutOfBoundsException if stopword on left of bool clause w/ StandardAnalyzer

2004-07-15 Thread Morus Walter
Claude Devarenne writes: My question is: should the queryParser catch that there is no term before trying to add a clause when using a StandardAnalyzer? Is this even possible? Should the burden be on the application to either catch the exception or parse the query before handing it

Re: Misbehaving query string

2004-07-20 Thread Morus Walter
Bill Tschumy writes: I would think the following strings passed to the QueryParser should yield the same results: #1: +telescope AND !operate #2: (+telescope) AND (!operate) However the first string seems to give the correct results while the second gives zero hits. Am I

Re: Negative Boost

2004-08-04 Thread Morus Walter
Terry Steichen writes: I can't get negative boosts to work with QueryParser. Is it possible to do so? If you change QueryParser ;-) Morus

Re: Negative Boost

2004-08-05 Thread Morus Walter
Daniel Naber writes: On Wednesday 04 August 2004 13:19, Terry Steichen wrote: I can't get negative boosts to work with QueryParser. Is it possible to do so? Isn't that the same as using a boost < 1, e.g. 0.1? That should be possible. no. a^-1 OR b A boost of -1 means that the score

Re: *term search

2004-09-08 Thread Morus Walter
sergiu gordea writes: Hi all, I want to discuss a little problem, lucene doesn't support *Term like queries. I know that this can bring a lot of results in the memory and therefore it is restricted. That's not the reason for the restriction. That's possible with a* also. The

Re: (n00b) Meaning of Hits.id (int)

2004-09-09 Thread Morus Walter
Peter Pimley writes: My documents are not stored in their original form by lucene, but in a separate database. My lucene docs do however store the primary key, so that I can fetch the original version from the database to show the user (does that sound sane?) yes. I see that the

Re: NGramSpeller contribution -- Re: combining open office spellchecker with Lucene

2004-09-16 Thread Morus Walter
Hi David, Based on this mail I wrote a ngram speller for Lucene. It runs in 2 phases. First you build a fast lookup index as mentioned above. Then to correct a word you do a query in this index based on the ngrams in the misspelled word. Let's see. [1] Source is attached and I'd

RE: QueryParser.parse() and Lucene1.4.1

2004-09-17 Thread Morus Walter
tokens: http://issues.apache.org/bugzilla/show_bug.cgi?id=27491 (Morus Walter via Otis) This change is unlikely to introduce the behaviour you describe, since it affects '-' within words only, not at start. So there is a change for a-b between 1.3 and 1.4 1.3 gives a -b 1.4 gives a b or one

Re: range and content query

2004-09-20 Thread Morus Walter
Chris Fraschetti writes: can someone assist me in building or deny the possibility of combining a range query and a standard query? say for instance i have two fields i'm searching on... one being a field with an epoch date associated with the entry, and the content so how can I make

Re: range and content query

2004-09-20 Thread Morus Walter
Chris Fraschetti writes: I've more or less figured out the query string required to get a range of docs.. say date[0 TO 10] assuming my dates are from 1 to 10 (for the sake of this example) ... my query has results that I don't understand. if i do from 0 TO 10, then I only get results

Re: NGramSpeller contribution -- Re: combining open office spellchecker with Lucene

2004-09-20 Thread Morus Walter
David Spencer writes: could you put the current version of your code on that website as a java Weblog entry updated: http://searchmorph.com/weblog/index.php?id=23 thanks Great suggestion and thanks for that idiom - I should know such things by now. To clarify the issue, it's just

Re: Combining Lucene and database functionality

2004-09-22 Thread Morus Walter
Marco Schmidt writes: I'm trying to find out whether Lucene is an option for a project of mine. I have texts which also have a date and a list of numbers associated with each of them. These numbers are ID values which connect the article to certain categories. So a particular article X

Re: Strange search results with wildcard - Bug?

2004-09-23 Thread Morus Walter
Ulrich Mayring writes: Hi all, first, here's how to reproduce the problem: Go to http://www.denic.de/en/special/index.jsp and enter obscure service in the search field. You'll get 132 hits. Now enter obscure service* - and you only get 1 hit. The above website is running Lucene

Re: Strange search results with wildcard - Bug?

2004-09-23 Thread Morus Walter
Ulrich Mayring writes: Will do, thank you very much. However, how do I get at the analyzed form of my terms? Instantiate the analyzer, create a token stream feeding your input, loop over the tokens, output the results. Morus

Re: Strange search results with wildcard - Bug?

2004-09-24 Thread Morus Walter
Ulrich Mayring writes: Daniel Naber wrote: AND always refers to the terms on both sides, +/- only refers to the term on the right. So a AND b -> +a +b is correct. *slap forehead* - you're right. Wasn't there something about operator precedence way back when ;-) Yes. January. And

Re: How extract a Field.Text(String, String) field to process it with a Stylesheet?

2004-10-15 Thread Morus Walter
Otis Gospodnetic writes: That's likely because you used an Analyzer that stripped the XML (, , etc.) from the original text. If you want to preserve the original text, use an Analyzer that doesn't throw your XML away. You can write your own Analyzer that doesn't discard anything, for

Re: online and offline Directory

2004-09-28 Thread Morus Walter
Ernesto De Santis writes: Hi Aviran Thanks for the response. I forgot important information you need to understand my issue. My process does something like this: The index has contents from different sources, identified by a special field 'source'. Then the index has documents with source: S1 or

Re: list of removed stop words

2004-09-29 Thread Morus Walter
Chris Fraschetti writes: Is there a way to via the parser or the query retrieve a list of the stop words removed by the analyzer? or should i just check my query against .STOPWORDS and do it myself? Query parser does not provide that info. Of course you might consider adding this inside query

Re: Seraching in Keyword Field

2004-09-30 Thread Morus Walter
Bernhard Messer wrote Hi, try that query: MyKeywordField:ABC Why should that help? foo:(bla) and foo:bla create the same query: java -classpath lucene-1.4.1/lucene-1.4.1.jar org.apache.lucene.queryParser.QueryParser 'foo:(bla)' foo:bla java -classpath lucene-1.4.1/lucene-1.4.1.jar

Re: BooleanQuery - Too Many Clases on date range.

2004-10-04 Thread Morus Walter
Chris Fraschetti writes: So i decided to move my epoch date to the 20040608 date which fixed my boolean query problem in regards to my current data size (approx 600,000) but now as soon as I do a query like ... a* I get the boolean error again. Google obviously can handle this

Re: different analyzer all produce the same index?

2004-10-04 Thread Morus Walter
sergiu gordea writes: Daan Hoogland wrote: Hi all, I try to create different indices using different Analyzer-classes. I tried standard, german, russian, and cjk. They all produce exactly the same index file (md5-wise). There are over 280 pages so I expected at least some differences.

Re: WildCardQuery

2004-10-05 Thread Morus Walter
Robinson Raju writes: The way i have done is , if there is a wildcard , Use WildCardQuery , else other. Here searchFields is an array which contains the column names . search string is the value to be searched. if ((searchString.indexOf(IOOSConstants.ASTERISK) > -1)

Re: StopWord elimination pls. HELP

2004-10-18 Thread Morus Walter
Miro Max writes: String cont = rs.getString(x); d.add(Field.Text(cont, cont)); writer.addDocument(d); to get results from a database into lucene index. but when i check println(d) i can see the german stopwords too. how can i eliminate this? Stopwords in an analyzer don't make the

RE: QueryParsing

2004-10-19 Thread Morus Walter
Rupinder Singh Mazara writes: hi erik and everyone else ok i will buy the book ;) but this still does not solve the problem of why String x = \jakarta apache\~100; is being translated as a PhraseQuery FULL_TEXT:jakarta apache~100 is the correct query being formed ? or is

RE: Null or no analyzer

2004-10-20 Thread Morus Walter
Aviran writes: You can use WhitespaceAnalyzer Can he? If Elections 2004 is one token in the subject field (keyword), this will fail, since WhitespaceAnalyzer will tokenize that to `Elections' and `2004'. So I guess he has to write an identity analyzer himself unless there is one provided

Re: Null or no analyzer

2004-10-21 Thread Morus Walter
Erik Hatcher writes: however perhaps it should be. Or perhaps there are other options to solve this recurring dilemma folks have with Field.Keyword indexed fields and QueryParser? I think one could introduce a special syntax in query parser for keyword fields. Query parser wouldn't

Re: Locks and Readers and Writers

2004-10-28 Thread Morus Walter
Christoph Kiehl writes: AFAIK you should never open an IndexWriter and an IndexReader at the same time. You should use only one of them at a time but you may open as many IndexSearchers as you like for searching. You cannot open an IndexSearcher without opening an IndexReader (explicitly

Re: Searching for a phrase that contains quote character

2004-10-29 Thread Morus Walter
Daniel Naber writes: On Thursday 28 October 2004 19:03, Justin Swanhart wrote: Have you tried making a term query by hand and testing to see if it works? Term t = new Term(field, this is a \test\); PhraseQuery pq = new PhraseQuery(t); That's not a proper PhraseQuery, it searches

Re: new version of NewMultiFieldQueryParser

2004-10-29 Thread Morus Walter
Bill Janssen writes: Try to see the behavior if you want to have a single term query just something like: robust .. and print out the query string ... Sure, that works fine. For instance, if you have the three default fields title, authors, and contents, the one-word search robust

Re: Locks and Readers and Writers

2004-11-01 Thread Morus Walter
[EMAIL PROTECTED] writes: Hi Christoph, That's what I thought. But what I'm seeing is this: - open reader for searching (the reader is opening an index on a remote machine (via UNC) which takes a couple seconds) - meanwhile the other service opens an IndexWriter and adds a document (the

Re: jaspq: dashed numerical values tokenized differently

2004-11-02 Thread Morus Walter
Daniel Taurat writes: Hi, I have just another stupid parser question: There seems to be a special handling of the dash sign - different from Lucene 1.2 at least in Lucene 1.4.RC3 StandardAnalyzer. Examples (1.4RC3): A document containing the string dash-test is matched by the following

Re: A TokenFilter to split words and numbers

2004-11-04 Thread Morus Walter
william.sporrong writes: Does it have something to do with the QueryParser guessing what kind of query it is by examining the string and thus presumes that the first string should not be parsed into a PhraseQuery? QueryParser creates a PhraseQuery for words that are tokenized to more than one

Re: stopword AND validword throws exception

2004-11-10 Thread Morus Walter
Sanyi writes: This query works as expected: validword AND stopword (throws out the stopword part and searches for validword) This query seems to crash: stopword AND validword (java.lang.ArrayIndexOutOfBoundsException: -1) Maybe it can't handle the case if it had to remove the very

Re: stopword AND validword throws exception

2004-11-10 Thread Morus Walter
Sanyi writes: Thanx for your replies guys. Now, I was trying to locate the latest patch for this problem group, and the last thread I've read about this is: http://issues.apache.org/bugzilla/show_bug.cgi?id=25820 It ends with an open question from Morus: If you want me to change the

Re: Phrase search for more than 4 words throws exception in QueryParser

2004-11-11 Thread Morus Walter
Sanyi writes: How to perform phrase searches for more than four words? This works well with 1.4.2: aa bb cc dd I pass the query as a command line parameter on XP: \aa bb cc dd\ QueryParser translates it to: text:aa text:bb text:cc text:dd Runs, searches, finds proper matches. This

Re: Searching and indexing from different processes (applications)

2004-11-16 Thread Morus Walter
K Kim writes: I just started to play around with Lucene. I was wondering if searching and indexing can be done simultaneously from different processes (two different processes.) For example, searching is serviced from a web application, while indexing is done periodically from a

Re: WildcardTermEnum skipping terms containing numbers?!

2004-11-18 Thread Morus Walter
Sanyi writes: Enumerating the terms using WildcardTermEnum and an IndexReader seems to be too buggy to use. If there's a bug, it should be tracked down, not worked around... But it looks ok to me: import org.apache.lucene.analysis.standard.StandardAnalyzer; import org.apache.lucene.index.*;

Re: problems search number range

2004-11-18 Thread Morus Walter
[EMAIL PROTECTED] writes: i need to solve this search: number: -10 range: -50 TO 5 i need help.. i dont find anything using google.. If your numbers are in the interval MIN/MAX and MIN < 0 you can shift that to a positive interval 0 ... (MAX-MIN) by subtracting MIN from each number.
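The shift suggested in this reply can be sketched as follows. The zero-padding to a fixed width is an added assumption (the reply only mentions the subtraction); it is needed so that the resulting strings sort lexicographically in numeric order, which is what a term-based range query relies on. Class and method names are made up for illustration.

```java
final class RangeShiftSketch {
    // Map value from [min, max] onto [0, max - min] and left-pad with zeros,
    // so that string comparison agrees with numeric comparison.
    static String encode(long value, long min, int width) {
        long shifted = value - min; // e.g. -10 - (-50) = 40
        if (shifted < 0) {
            throw new IllegalArgumentException("value is below min");
        }
        String digits = Long.toString(shifted);
        if (digits.length() > width) {
            throw new IllegalArgumentException("width too small for value");
        }
        StringBuilder sb = new StringBuilder();
        for (int i = digits.length(); i < width; i++) {
            sb.append('0');
        }
        return sb.append(digits).toString();
    }

    public static void main(String[] args) {
        long min = -50; // assumed lower bound of all indexed numbers
        int width = 4;  // assumed width, large enough for max - min
        String lower = encode(-50, min, width); // 0000
        String value = encode(-10, min, width); // 0040
        String upper = encode(5, min, width);   // 0055
        // The value lies inside the range when compared as plain strings.
        System.out.println(lower.compareTo(value) <= 0 && value.compareTo(upper) <= 0);
    }
}
```

The same encoding has to be applied both when indexing the field and when building the range bounds, otherwise the comparison is meaningless.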

Re: problems search number range

2004-11-18 Thread Morus Walter
[EMAIL PROTECTED] writes: this solution was the first that i tried.. but this does not run correctly.. because: when we try to sort this number in alphanumeric order we obtain that number -0010 is higher than -0001 right. I failed to see that. So you would have to use a complement for

Re: WildcardTermEnum skipping terms containing numbers?!

2004-11-19 Thread Morus Walter
Sanyi writes: If there's a bug, it should be tracked down, not worked around... Sure, but I'm working with 20million records and it takes about 25 hours to re-index, so I'm looking for ways that don't require reindexing. why reindex? My code was: WildcardTermEnum wcenum =

Re: Using multiple analysers within a query

2004-11-22 Thread Morus Walter
Kauler, Leto S writes: Would anyone have any suggestions on how this could be done? I was thinking maybe the QueryParser would have to be changed/extended to accept a separator other than colon :, something like = for example to indicate this clause is not to be tokenised. I suggested

Re: Using multiple analysers within a query

2004-11-22 Thread Morus Walter
Erik Hatcher writes: If your query isn't entered by users, you shouldn't use query parser in most cases anyway. I'd go even further and say in all cases. If you use lucene as a search server you have to provide the query somehow. E.g. we have a PHP application, that sends queries to a

Re: Numeric Range Restrictions: Queries vs Filters

2004-11-23 Thread Morus Walter
Hoss writes: (c) Filtering. Filters in general make a lot of sense to me. They are a way to specify (at query time) that only a certain subset of the index should be considered for results. The Filter class has a very straightforward API that seems very easy to subclass to get the

Re: Help on the Query Parser

2004-11-23 Thread Morus Walter
Terence Lai writes: Looks like the wildcard query disappeared. In fact, I am expecting text:java* developer to be returned. It seems to me that the QueryParser cannot handle the wildcard within a quoted String. That's not just QueryParser. Lucene itself doesn't handle wildcards

Re: hits.length() changes during delete process.

2004-12-06 Thread Morus Walter
David Townsend writes: So the short question is, should the hits object be changing and what is the best way to delete all the results of a search (it's a range query so I can't use delete(Term term)? The hits object caches only part of the hits (initially the first 100 (?)). This cache
