Range queries

2003-01-21 Thread Tatu Saloranta
My apologies if this is a FAQ (which is possible as I am new to Lucene, however, I tried checking the web page for the answer). I read through the Query syntax web page first, and then checked the matching query classes. It seems like query syntax page is missing some details; the one I was

Re: Range queries

2003-01-22 Thread Tatu Saloranta
On Wednesday 22 January 2003 07:49, Erik Hatcher wrote: Unfortunately I don't believe date field range queries work with QueryParser, or at least not human-readable dates. Is that correct? I think it supports date ranges if they are turned into a numeric format, but no human would type that

Re: Range queries

2003-01-22 Thread Tatu Saloranta
On Wednesday 22 January 2003 08:27, Michael Barry wrote: I utilize the earlier version and queries such as this work fine with QueryParser: field:[ 20030120 - 20030125 ] of course the back-end indexer canonicalizes all date fields to MMDD. The front-end search code is responsible for
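
The range in the quoted query works because both bounds are fixed-width digit strings that sort the same way as the dates they stand for. A minimal sketch of that canonicalization step, assuming the 8-digit form in the example is year, month, day (the class and method names are made up for illustration, and this is plain Java, not Lucene API):

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class DateKeys {
    // Formats a date as yyyyMMdd (e.g. 20030120): fixed width, most
    // significant part first, so string order equals date order.
    // Assumption: four-digit years only.
    static String toKey(LocalDate date) {
        return date.format(DateTimeFormatter.BASIC_ISO_DATE);
    }

    // Inclusive range check done purely on strings, which is how a
    // term range compares its bounds.
    static boolean inRange(String key, String lower, String upper) {
        return key.compareTo(lower) >= 0 && key.compareTo(upper) <= 0;
    }

    public static void main(String[] args) {
        String lower = toKey(LocalDate.of(2003, 1, 20));
        String upper = toKey(LocalDate.of(2003, 1, 25));
        System.out.println(inRange(toKey(LocalDate.of(2003, 1, 22)), lower, upper));
        System.out.println(inRange(toKey(LocalDate.of(2003, 2, 1)), lower, upper));
    }
}
```

The front-end would apply the same conversion to whatever date format the user types before building the range.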

Re: Wildchar based search??

2003-02-01 Thread Tatu Saloranta
On Saturday 01 February 2003 00:19, Otis Gospodnetic wrote: 1) to what extent are wildcards supported by lucenes? You can use * and ? the way they usually are used. I think there was one exception; first character of a simple term can not be a wildcard? (this from query syntax page). -+ Tatu

Re: '-' character not interpreted correctly in field names

2003-02-03 Thread Tatu Saloranta
On Monday 03 February 2003 07:19, Terry Steichen wrote: I believe that the tokenizer treats a dash as a token separator. Hence, the only way, as I recall, to eliminate this behavior is to modify QueryParser.jj so it doesn't do this. However, doing this can cause some other problems, like

Re: % of Relevance

2003-02-11 Thread Tatu Saloranta
On Tuesday 11 February 2003 07:48, Nellai wrote: Hi! can anyone tell me how to calculate the % of relevance using Lucene. Lucene's hit score is a normalized float in the range ]0.0, 1.0] (since 0.0 ones are never included). From there it's basic arithmetic (perhaps this could be included in FAQ, even
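
The arithmetic mentioned in the reply can be sketched as below. This assumes only what the message states, namely that the score is a float greater than 0.0 and at most 1.0; the class and method names are hypothetical and nothing here calls Lucene.

```java
public class RelevancePercent {
    // Converts a normalized score (greater than 0.0, at most 1.0) to a
    // whole-number percentage, rounding to the nearest integer.
    static int toPercent(float score) {
        // Written as a negated condition so that NaN is rejected too.
        if (!(score > 0.0f && score <= 1.0f)) {
            throw new IllegalArgumentException("score out of range: " + score);
        }
        return Math.round(score * 100.0f);
    }

    public static void main(String[] args) {
        System.out.println(toPercent(0.5f) + "%");
        System.out.println(toPercent(1.0f) + "%");
    }
}
```

Very small scores round down to 0%; an application that wants to avoid showing that could clamp the result to at least 1.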

Re: OutOfMemoryException while Indexing an XML file

2003-02-14 Thread Tatu Saloranta
On Friday 14 February 2003 07:27, Aaron Galea wrote: I had this problem when using xerces to parse xml documents. The problem I think lies in the Java garbage collector. The way I solved it was to create It's unlikely that GC is the culprit. Current ones are good at purging objects that are

Re: Number range search through Query subclass

2003-02-15 Thread Tatu Saloranta
On Friday 14 February 2003 02:58, Volker Luedeling wrote: Hi, I am writing an application that constructs Lucene searches from XML queries. Each item from the XML is represented by a Query of the corresponding type. I have a problem when I try to search for number ranges, since RangeQuery

Re: IndexWriter addDocument NullPointerException

2003-02-22 Thread Tatu Saloranta
On Friday 21 February 2003 13:22, Günter Kukies wrote: Hello, I don't have any line number. You unfortunately do need to know the line number, if you do get an exception and try to see where it occurs. Another less frequent problem is that you actually get the exception as an object and

Re: AW: How is that possible ?

2003-02-28 Thread Tatu Saloranta
On Friday 28 February 2003 05:15, Alain Lauzon wrote: At 07:16 2003-02-28 +0100, you wrote: May it be, that microsoft is found, because the search is not case sensitive (text) and ct is not found because there the search is case sensitive (Keyword) Did you try +state:CT

Re: Regarding Setup Lucine for my site

2003-03-05 Thread Tatu Saloranta
On Wednesday 05 March 2003 13:35, Leo Galambos wrote: I'm all eyes and I'm a serious grown-up with good manners :) Constructive suggestions for improvement are always welcome. First a disclaimer: I don't mean to sound too negative. I'm genuinely curious about many of the issues you mention.

Re: QueryParser and compound words

2003-03-13 Thread Tatu Saloranta
On Thursday 13 March 2003 00:52, Magnus Johansson wrote: Tatu Saloranta wrote: ... But same happens during indexing; fotbollsmatch should be properly split and stemmed to fotboll and match terms, right? Yes but the word fotbollsmatch was never indexed in this example. Only the word fotboll

Re: multiple collections indexing

2003-03-19 Thread Tatu Saloranta
On Wednesday 19 March 2003 01:44, Morus Walter wrote: ... Searches must be able on any combination of collections. A typical search includes ~ 40 collections. Now the question is, how to implement this in lucene best. Currently I see basically three possibilities: - create a data field

Re: Create my own Analyzer...

2003-03-21 Thread Tatu Saloranta
On Friday 21 March 2003 03:55, Pierre Lacchini wrote: Heya, as u can see, I want to create my own french Analyzer, using the snowball's FrenchStemmer... But i don't really know how to proceed... Does anyone know where I can find a tutorial, or a clear example of How to create an analyzer

Re: org.apache.lucene.demo.IndexHTML - parse JSP files?

2003-03-25 Thread Tatu Saloranta
On Monday 24 March 2003 18:03, Michael Wechner wrote: John Bresnik wrote: anyone know of a quick and easy way to get this demo [org.apache.lucene.demo.IndexHTML] to parse JSP files as well? I used a crawler to create a local [static] version of the site [i.e. they are no longer JSP files

Re: Wildcard searching - Case sensitiv?

2003-03-28 Thread Tatu Saloranta
On Friday 28 March 2003 08:37, [EMAIL PROTECTED] wrote: Ok, thanks Otis, you have to write the terms lowercase when you're searching with wildcards. Or use the set method in QueryParser to ask it to automatically lower case those terms. Patch for that was added before 1.3RC1 (check javadocs or
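
A small illustration of why the case of a wildcard term matters, assuming the index holds lowercased terms (as it would after a lowercasing analyzer) and that the wildcard term itself is compared as typed. The matcher below is a stand-in written for this sketch, not Lucene's own wildcard code, and all names are hypothetical.

```java
import java.util.Locale;
import java.util.regex.Pattern;

public class WildcardCase {
    // Translates a wildcard term (* = any run of characters, ? = exactly
    // one character) into a regular expression; other characters are
    // quoted so they match literally.
    static Pattern toPattern(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                sb.append(".*");
            } else if (c == '?') {
                sb.append('.');
            } else {
                sb.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(sb.toString());
    }

    // Case-sensitive match of a wildcard term against one indexed term.
    static boolean matches(String wildcard, String indexedTerm) {
        return toPattern(wildcard).matcher(indexedTerm).matches();
    }

    public static void main(String[] args) {
        String indexed = "lucene";   // as stored after lowercasing
        String typed = "Luc*";       // as entered by the user
        System.out.println(matches(typed, indexed));
        System.out.println(matches(typed.toLowerCase(Locale.ROOT), indexed));
    }
}
```

Lowercasing the typed term before matching is the manual equivalent of the parser setting mentioned in the reply.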

Re: Alternate Boolean Query Parser?

2003-03-28 Thread Tatu Saloranta
On Friday 28 March 2003 15:48, Shah, Vineel wrote: One of my clients is asking for an old-style boolean query search on my keywords fields. A string might look like this: oracle admin* and java and oracle and (8.1.6 or 8.1.7) and (solaris or unix or linux) There would probably be need

Re: Analyzer Incorrect?

2003-04-04 Thread Tatu Saloranta
On Friday 04 April 2003 05:24, Rob Outar wrote: Hi all, Sorry for the flood of questions this week, clients finally started using the search engine I wrote which uses Lucene. When I first started Yup... that's the root of all evil. :-) (I'm in similar situation, going through user

Re: Wildcard workaround

2003-05-30 Thread Tatu Saloranta
On Wednesday 28 May 2003 05:43, David Medinets wrote: - Original Message - From: Andrei Melis [EMAIL PROTECTED] As far as I have understood, lucene does not allow search queries starting with wildcards. I have a file database indexed by content and also by filename. It would be

Re: Lowercasing wildcards - why?

2003-05-31 Thread Tatu Saloranta
On Friday 30 May 2003 09:55, Leo Galambos wrote: Ah, I got it. THX. In the good old days, the wildcards were used as a fix for missing stemming module. I am not sure if you can combine these two opposite approaches successfully. I see the following drawbacks of your solution. Example:

Re: Weighted Search by Field using MultiFieldQueryParser

2003-06-17 Thread Tatu Saloranta
On Tuesday 17 June 2003 05:43, Kevin L. Cobb wrote: I have an index that has three fields in it. When I do a search using MultiFieldQueryParser, the search applies the same importance (weight) to each of the fields. BUT, what if I want to apply a different weight to each field, i.e. I want to

Re: commercial websites powered by Lucene?

2003-06-25 Thread Tatu Saloranta
On Wednesday 25 June 2003 09:47, Ulrich Mayring wrote: John Takacs wrote: I'd love to try Lucene with the above, but the Lucene install fails because of JavaCC issues. Surprised more people haven't encountered this problem, as the install instructions are out of date. Well, what do you

Re: Multiuser environments

2003-07-14 Thread Tatu Saloranta
On Monday 14 July 2003 08:52, Guilherme Barile wrote: Hi I'm writing a web application which will index files using textmining to extract text and lucene to store it. I do have the following implementation questions: 1) Only one user can write to an index at each time. How are you people

Re: interesting phrase query issue

2003-07-17 Thread Tatu Saloranta
On Thursday 17 July 2003 07:20, greg wrote: I have several document sections that are being indexed via the StandardAnalyzer. One of these documents has the line access, the manager. When searching for the phrase access manager, this document is being returned. I understand why (at least i

Re: 2,147,483,647 max documents?

2003-08-11 Thread Tatu Saloranta
On Monday 11 August 2003 01:07, Kevin A. Burton wrote: Why was an int chosen to represent document handles? Is there a reason for this? Why wasn't a long chosen to represent document handles? 64 bits seems like the obvious choice here except for a potentially bloated datastore (32 extra

Re: Keyword search with space and wildcard

2003-08-30 Thread Tatu Saloranta
On Friday 29 August 2003 10:02, Terry Steichen wrote: I agree. One problem, however, that new (and not-so-new) Lucene users face is a learning curve when they want to get past the simplest and most obvious uses of Lucene. For example, I don't think any of the docs mention the fact that you

Re: Lucene demo ideas?

2003-09-17 Thread Tatu Saloranta
On Wednesday 17 September 2003 07:07, Erik Hatcher wrote: On Wednesday, September 17, 2003, at 08:43 AM, Killeen, Tom wrote: I would suggest XML as well. Again, I'd like to hear more about how you'd do this generically. Tell me what the field names and values would correspond to when

Re: HTML Parsing problems...

2003-09-18 Thread Tatu Saloranta
On Thursday 18 September 2003 14:50, Michael Giles wrote: I know, I know, the HTML Parser in the demo is just that (i.e. a demo), but I also know that it is updated from time to time and performs much better than the other ones that I have tested. Frustratingly, the very first page I tried to

Re: Struts logic iterate

2003-10-06 Thread Tatu Saloranta
On Monday 06 October 2003 08:35, Lars Hammer wrote: ... to iterate the Hits. I thought that Hits was an array of pointers to docs, Actually, Hits contains a Vector (could be an array as well), but is not a Collection itself

Re: Hierarchical document

2003-10-20 Thread Tatu Saloranta
On Monday 20 October 2003 16:41, Erik Hatcher wrote: One more thought related to this subject - once a nice scheme for representing hierarchies within a Lucene index emerges, having XPath as a query language would rock! Has anyone implemented O/R or XPath-like query expressions on top of

Re: positional token info

2003-10-21 Thread Tatu Saloranta
On Tuesday 21 October 2003 17:31, Otis Gospodnetic wrote: It does seem handy to avoid exact phrase matches on phone boy when a stop word is removed though, so patching StopFilter to put in the missing positions seems reasonable to me currently. Any objections to that? So phone boy

Re: inter-term correlation [was Re: Vector Space Model in Lucene?]

2003-11-17 Thread Tatu Saloranta
On Monday 17 November 2003 07:40, Chong, Herb wrote: i don't know what the Java implementation is like but the C++ one is very fast. ... I personally do not have any experience with the BreakIterator in Java. Has anyone used it in any production environment? I'd be very interested to learn

Re: Contributing to Lucene (was RE: inter-term correlation [was Re: Vector Space Model in Lucene?])

2003-11-17 Thread Tatu Saloranta
On Monday 17 November 2003 08:39, Chong, Herb wrote: the core of the search engine has to have certain capabilities, however, because they are next to impossible to add as a layer on top with any efficiency. detecting sentence boundaries outside the core search engine is really hard to do

Re: Dates and others

2003-12-01 Thread Tatu Saloranta
On Monday 01 December 2003 15:13, Dion Almaer wrote: ... Interesting. I implemented an approach which boosted based on the number of months in the past, and after tweaking the boost amounts, it seems to do the job. I do a fresh reindex every night (since the indexing process takes no time at

Re: SearchBlox J2EE Search Component Version 1.1 released

2003-12-03 Thread Tatu Saloranta
On Tuesday 02 December 2003 09:51, Tun Lin wrote: Anyone knows a search engine that supports xml formats? There's no way to generally support xml formats, as xml is just a meta-language. However, building specific search engines using Lucene core it should be reasonably straight-forward to

Re: Index and Field.Text

2003-12-05 Thread Tatu Saloranta
On Friday 05 December 2003 10:45, Doug Cutting wrote: Tatu Saloranta wrote: Also, shouldn't there be at least 3 methods that take Readers; one for Text-like handling, another for UnStored, and last for UnIndexed. How do you store the contents of a Reader? You'd have to double-buffer

Re: Lock obtain timed out

2003-12-16 Thread Tatu Saloranta
On Tuesday 16 December 2003 03:37, Hohwiller, Joerg wrote: Hi there, I have not yet got any response about my problem. While debugging into the depth of lucene (really hard to read deep inside) I discovered that it is possible to disable the Locks using a System property. ... Am I safe

Re: Performance question

2004-01-08 Thread Tatu Saloranta
On Wednesday 07 January 2004 20:48, Dror Matalon wrote: On Wed, Jan 07, 2004 at 07:24:22PM -0700, Scott Smith wrote: ... Thanks for the suggestions. I wonder how much faster I can go if I implement some of those? 25 msecs to insert a document is on the high side, but it depends of course

Re: Vector - LinkedList for performance reasons...

2004-01-21 Thread Tatu Saloranta
On Wednesday 21 January 2004 08:38, Doug Cutting wrote: Francesco Bellomi wrote: I agree that synchronization in Vector is a waste of time if it isn't required, It would be interesting to see if such synchronization actually impairs overall performance significantly. This would be fairly

Re: Caching and paging search results

2004-03-08 Thread Tatu Saloranta
On Monday 08 March 2004 12:34, Erik Hatcher wrote: In the RealWorld... many applications actually just re-run a search and jump to the appropriate page within the hits; searching is generally plenty fast enough to alleviate concerns of caching. However, if you need to cache Hits, you need
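
The "re-run and jump to the page" approach needs only the index window for the requested page. A minimal sketch of that calculation, with hypothetical names and zero-based page numbers assumed; the caller would then read hits from the first index up to, but not including, the second:

```java
public class Paging {
    // Returns {from, to} for the given zero-based page, where 'to' is
    // exclusive and both values are clamped to the number of hits.
    static int[] pageBounds(int totalHits, int page, int pageSize) {
        if (totalHits < 0 || page < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("invalid paging arguments");
        }
        // long arithmetic avoids int overflow for large page numbers
        long from = Math.min((long) page * pageSize, totalHits);
        long to = Math.min(from + pageSize, totalHits);
        return new int[] {(int) from, (int) to};
    }

    public static void main(String[] args) {
        int[] bounds = pageBounds(45, 2, 20);
        System.out.println(bounds[0] + " to " + bounds[1]);
    }
}
```

A page past the end yields an empty window (from equals to) rather than an error, which keeps stale page links harmless.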

Re: Performing exact search with Lucene

2004-04-02 Thread Tatu Saloranta
On Friday 02 April 2004 08:12, Phil brunet wrote: Hi all. I'm migrating a part of an application from Oracle intermedia to Lucene (1.3) to perform full text searches. Congratulations! :-) I'd like to know if there is a way to perform exact queries. By exact query, i mean being able to

Re: Zero hits for queries ending with a number

2004-04-03 Thread Tatu Saloranta
On Saturday 03 April 2004 08:34, [EMAIL PROTECTED] wrote: On Saturday 03 April 2004 17:11, Erik Hatcher wrote: No objections that error messages and such could be made clearer. Patches welcome! Care to submit better error message handling in this case? Or perhaps allow lower-case to? I

Re: Suggestion for Token.java

2004-04-13 Thread Tatu Saloranta
On Tuesday 13 April 2004 15:31, Holger Klawitter wrote: Hi Erik, What is wrong with simply creating a new token that replaces an incoming one for synonyms? I'm just playing devil's advocate here since you can already get the termText() through the public _method_. Well, you're

Re: Bridge with OpenOffice

2004-04-19 Thread Tatu Saloranta
On Monday 19 April 2004 14:01, Mario Ivankovits wrote: Stephane James Vaucher wrote: Anyone try what Joerg suggested here? http://nagoya.apache.org/eyebrowse/[EMAIL PROTECTED] pache.orgmsgNo=6231 Don't know what you would like to do, but if you simply would like to extract text, you could

Re: Field.java - STORED, NOT_STORED, etc...

2004-07-11 Thread Tatu Saloranta
On Sunday 11 July 2004 10:03, Doug Cutting wrote: Doug Cutting wrote: The calls would look like: new Field(name, value, Stored.YES, Indexed.NO, Tokenized.YES); . Actually, while we're at it, Indexed and Tokenized are confounded. A single entry would be better, something like: ... then

Re: pdfbox performance.

2004-07-28 Thread Tatu Saloranta
On Wednesday 28 July 2004 15:44, Paul Smith wrote: The first thing that I would do is wrap the FileInputStream with a BufferedInputStream. You get a significant boost reading in from a buffer, particularly as the size of the file grows. Benchmarking is good; whether there's any
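
The wrapping suggested in the quoted message looks like the sketch below. The class and method names are hypothetical, and the byte-at-a-time loop is only there to show the access pattern where buffering is most likely to matter; whether it helps in a given case is something to measure, as the reply says.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class BufferedRead {
    // Counts the bytes in a file using single-byte reads. With the
    // BufferedInputStream wrapper, most read() calls are served from an
    // in-memory buffer instead of going to the underlying file each time.
    static long countBytes(String path) throws IOException {
        try (InputStream in = new BufferedInputStream(new FileInputStream(path))) {
            long count = 0;
            while (in.read() != -1) {
                count++;
            }
            return count;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java BufferedRead <file>");
            return;
        }
        System.out.println(countBytes(args[0]) + " bytes");
    }
}
```

Code that already reads into a large byte array gains much less from the wrapper, since each call then transfers a whole block anyway.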