Re: Numeric Range Restrictions: Queries vs Filters

2004-11-23 Thread Chris Hostetter
: Done. I deprecated DateField and DateFilter, and added the RangeFilter : class contributed by Chris. : : I did a little code cleanup, Chris, renaming some RangeFilter variables : and correcting typos in the Javadocs. Let me know if everything looks : ok. Wow ... that was fast. Things look

Re: Numeric Range Restrictions: Queries vs Filters

2004-11-23 Thread Chris Hostetter
: Note that I said FilteredQuery, not QueryFilter. Doh .. right sorry, I confused myself by thinking you were still referring to your comments 2004-03-29 comparing DateFilter with RangeQuery wrapped in a QueryFilter. : I debate (with myself) on whether add-ons that can be done with other : code

RE: fetching similar wordlist as given word

2004-11-24 Thread Chris Hostetter
:can I get the similar wordlist as output. so that I can show the end :user in the column --- do you mean foam? :How can I get similar word list in the given content? This is a non trivial problem, because the definition of similar is subject to interpretation. I

Re: similarity matrix - more clear

2004-11-30 Thread Chris Hostetter
: A possible solution would be to initialize in turn each document as a : query, do a search using an IndexSearcher and to take from the search : result the similarity between the query (which is in fact a document) : and all the other documents. This is highly redundant, because the : similarity

Re: GETVALUES +SEARCH

2004-12-01 Thread Chris Hostetter
: Having Document implement Map sounds reasonable to me though. Any : reasons not to do this? : : Not really, except perhaps that a Lucene Document could theoretically : have multiple identical keys... not something that anyone would want to Assuming you want all changes to be backwards

IndexWriter.optimize and memory usage

2004-12-02 Thread Chris Hostetter
I've been running into an interesting situation that I wanted to ask about. I've been doing some testing by building up indexes with code that looks like this... IndexWriter writer = null; try { writer = new IndexWriter(index, new StandardAnalyzer(), true);

Re: Date Range Search throws IndexAccessException

2004-12-03 Thread Chris Hostetter
: I'm assuming that this must have something to do with how the date field : enumerates against the matches with 'by the second' granularity - and : thereby exceeding the maximum number of boolean clauses (please correct me : if I am wrong). I'm not so certain .. if you were really exceeding the

RE: Date Range Search throws IndexAccessException

2004-12-03 Thread Chris Hostetter
: The problem with using a Filter is that I want to be able to merely generate : a text query based on the range information instead of having to modify the : core search module which basically receives text queries. If I understand : correctly, the Filter would actually have to be created and

Re: indexReader close method

2004-12-06 Thread Chris Hostetter
: Do you know why I can't close the IndexReader explicitly under some : circumstances and why, when I do manage to close it I can still call : methods on the reader? 1) I tried to create a test case that demonstrated your bug based on the code outline you provided, and i couldn't (see below).

Re: Problem with indexing/merging indices - documents not indexed.

2004-12-06 Thread Chris Hostetter
: I would appreciate any feedback on my code and whether I'm doing : something in a wrong way, because I'm at a total loss right now : as to why documents are not being indexed at all. I didn't try running your code (because i don't have a DB to test it with) but a quick read gives me a good

Re: Is this a bug or a feature with addIndexes?

2004-12-06 Thread Chris Hostetter
: [EMAIL PROTECTED] tmp]# time java MemoryVsDisk 1 1 10 -r : Docs in the RAM index: 1 : Docs in the FS index: 0 : Total time: 142 ms I looked at the code from the article you mentioned and added the print statements i'm guessing you added for ramWriter/fsWriter.docCount() before and after

Re: Filter !!!

2004-12-06 Thread Chris Hostetter
: Hits hits = indexSearcher.search(searchQuery, filter) // here I want : to pass multiple filter... (DateFilter,QueryFilter) You can write a Filter that takes in multiple filters and ANDs them together (or ORs them, it's not clear what you want) Hits h = s.search(q,new
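The AND-ing idea can be sketched without any Lucene classes, assuming each filter ultimately produces a java.util.BitSet with one bit per document number (the shape Lucene 1.4's Filter.bits(IndexReader) returns). The BitFilter interface, the AndFilterSketch class and the two toy filters below are hypothetical names used only for illustration; they are not Lucene API.

```java
import java.util.BitSet;

final class AndFilterSketch {

    /** Hypothetical stand-in for a filter: returns one bit per document number. */
    interface BitFilter {
        BitSet bits(int maxDoc);
    }

    /** Accepts only documents that every one of the given filters accepts. */
    static BitFilter and(BitFilter... filters) {
        return maxDoc -> {
            BitSet result = new BitSet(maxDoc);
            result.set(0, maxDoc);            // start with every document allowed
            for (BitFilter f : filters) {
                result.and(f.bits(maxDoc));   // clear documents this filter rejects
            }
            return result;
        };
    }

    /** Toy filter (illustration only): accepts document numbers divisible by n. */
    static BitFilter everyNth(int n) {
        return maxDoc -> {
            BitSet b = new BitSet(maxDoc);
            for (int doc = 0; doc < maxDoc; doc += n) {
                b.set(doc);
            }
            return b;
        };
    }

    /** Toy filter (illustration only): accepts document numbers below limit. */
    static BitFilter below(int limit) {
        return maxDoc -> {
            BitSet b = new BitSet(maxDoc);
            b.set(0, Math.min(limit, maxDoc));
            return b;
        };
    }

    public static void main(String[] args) {
        BitSet allowed = and(everyNth(2), below(5)).bits(10);
        System.out.println(allowed);   // prints {0, 2, 4}
    }
}
```

An OR-ing variant would start from an empty BitSet and call or() instead of and().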

Re: Filter !!!

2004-12-07 Thread Chris Hostetter
: Wait there already is a ChainedFilter in the Lucene Sandbox. Boo-Ya! ... I was really surprised I hadn't seen one yet, but that's what I get for assuming everything in the sandbox would be listed on the Lucene Sandbox page. It looks very cool, everything i ever wanted and then some. (the

Re: QueryFilter vs CachingWrapperFilter vs RangeQuery

2004-12-07 Thread Chris Hostetter
: executes the search, i would keep a static reference to SearchIndexer : and then when i want to invalidate the cache, set it to null or create : design of your system. But, yes, you do need to keep a reference to it : for the cache to work properly. If you use a new IndexSearcher : instance

Re: Unexpected TermEnum behavior

2004-12-08 Thread Chris Hostetter
: TermEnum terms = reader.terms(new Term(fieldName, "")); : : I noticed that initially TermEnum is positioned at the first term. In other : words, I don't have to call terms.next() before calling terms.term(). This : is different from the behavior of Iterator, Enumeration and ResultSet whose

RE: Sorting based on calculations at search time

2004-12-10 Thread Chris Hostetter
: I believe you are talking about the boost factor for fields or documents : while searching. That does not apply in my case - maybe I am missing a : point here. : The weight field I was talking about is only for the calculation Otis is suggesting that you set the boost of the document to be your

Re: Incremental Search experiment with Lucene, sort of like the new Google Suggestion page

2004-12-11 Thread Chris Hostetter
: I also realized they're prob not doing searches at all - instead they're : going off a DB of query popularity - I wanted to code up something you are correct, hence the reason cnet banana doesn't appear in the list of suggestions even though it has 41K results, but hossman trophy does (with

Re: Opinions: Using Lucene as a thin database

2004-12-14 Thread Chris Hostetter
: select * from MY_TABLE where MY_NUMERIC_FIELD 80 : : as far as I know you have only the range query so you will have to say : : my_numeric_field:[80 TO ??] : but this would not work in the a/m example or am I missing something? RangeQuery allows you to specify an open ended range -- you can tell the

Re: A question about scoring function in Lucene

2004-12-15 Thread Chris Hostetter
is significantly better than the other results b) document #3 and #4 are both equally relevant to Doug Cutting If I then do a search for Chris Hostetter and get back the following results/scores... 9: 0.9 8: 0.3 7: 0.21 6: 0.21 5: 0.1 ...then I can assume the same

Re: To Sort or not to Sort

2004-12-16 Thread Chris Hostetter
: In my application, users search for messages with Lucene. Typically, : they are more interested in seeing their hits in date-order than in : relevance-order. In reading my ebook copy of Lucene in action (wish : I'd had that a year ago), I find that one of the features added in 1.4 : was the

Re: Customizing termFreq

2004-12-12 Thread Chris Hostetter
: H1:text in H1 font : H2:text in H2 font : content:all the text : : The problem is that query of a type : +(H1:xyz) : is getting scored with the termFreq of xyz in the H1 field whereas I want : it be scored using the termFreq of xyz in the entire document (i.e. : content field) so why

Re: Exception: cannot determine sort type

2004-12-23 Thread Chris Hostetter
: The issue occurs if the first field it accesses parses as a numeric : value and then successive fields are String's. If you are mixing and : I am wondering why this exception might occur when the server/index is : under load. I do realise there are many 'variables in the equation', : so :

RE: analyzer effecting phrases?

2004-12-23 Thread Chris Hostetter
: Therefore I turned back to the standard analyzer and now do some replacing : of the underscores in my ID string to avoid my original problem. This solved maybe i'm missing something, but if you've got a field in your doc that represents an ID, why not create that field as NonTokenized so you

Re: (Offtopic) The unicode name for a character

2004-12-23 Thread Chris Hostetter
: However, I don't think that the names are consistent enough to permit a : generic use of regular expressions. What Daniel is trying to achieve : looks interesting anyway, I'm not sure that that really matters in the long run ... I think the OP was asking if there was a way to get the name in

Re: sorting on a field that can have null values

2004-12-23 Thread Chris Hostetter
: I thought of putting empty strings instead of null values but I think : empty strings are put first in the list while sorting which is the : reverse of what anyone would want. instead of adding a field with a null value, or value of an empty string, why not just leave the field out for

Re: Problems...

2005-01-04 Thread Chris Hostetter
To start with, there has to be more to the search side of things than what you included. this search function is not static, which means it's getting called on an object, which obviously has some internal state (paramOffset, hits, and pathToIndex are a few that jump out at me) what are the

Re: Question about Analyzer and words spelled in different languages

2005-01-06 Thread Chris Hostetter
: Is there any already written analyzer that would take that name : (Sch&auml;ffer or any other name that has entities) so that : Lucene index could searched (once the field has been indexed) for the real : version of the name, which is : : Schäffer : : and the english spelled version of the

Re: multi-threaded thru-put in lucene

2005-01-06 Thread Chris Hostetter
: This is what we found: : : 1 thread, search takes 20 ms. : : 2 threads, search takes 40 ms. : : 5 threads, search takes 100 ms. how big is your index? What are the term frequencies like in your index? how many different queries did you try? what was the structure of your

RE: Lucene Book in UK

2005-01-06 Thread Chris Hostetter
: I ordered mine from Amazon a while back and was notified yesterday that it : shipped. Here was my price: really??? .. those bastards. I ordered two copies for my work on December 10th and they still haven't shipped them. : 1Lucene In Action (In Action) $27.17 1 $27.17 Hmm,

RE: Problems...

2005-01-06 Thread Chris Hostetter
: Hoss, could you tell me what exceptions I'm missing? Thanks! anytime you have a catch block, you should be doing something with that exception. If possible, you can recover from an exception, but no matter what you should log the exception in some way so that you know it happened. Your

Re: Problems...

2005-01-07 Thread Chris Hostetter
: Stored = as-is value stored in the Lucene index : : Tokenized = field is analyzed using the specified Analyzer - the tokens : emitted are indexed : : Indexed = the text (either as-is with keyword fields, or the tokens : from tokenized fields) is made searchable (aka inverted) : : Vectored =

Re: Use a date field for ranking

2005-01-07 Thread Chris Hostetter
: we are currently implementing a search engine for a news site. Our goal : is to have a search result that uses the publish date of the documents : to boost the score of the documents. : have to use something that boosts the scores at _search_ time. 1) There is a way to boost individual Query

Re: Query based stemming

2005-01-07 Thread Chris Hostetter
: Is it possible to enable stem queries on a per-query basis? It doesn't : seem to be possible since the stem tokenizing is done during the : indexing process. Are people basically stuck with having all their : queries stemmed or none at all? : From what I've read, if you want to have a choice,

Re: Use a date field for ranking

2005-01-10 Thread Chris Hostetter
: : have to use something that boosts the scores at _search_ time. : Yes, I know I can boost Query objects, but that is not the same as : boosting the document score by a factor. By boosting query objects I : _add_ values to the score. Let me show you an example: well, sure it is ... you have

Re: How do I unlock?

2005-01-11 Thread Chris Hostetter
: What about a shutdown hook? Interesting idea, at the moment the file is created on disk, the FSDirectory could add a shutdown hook that checked for the existence of the file and if it's still there (implying that the Lock owner failed without releasing the lock) it can forcibly remove it. Of
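A minimal sketch of the shutdown-hook idea using only the JDK, not how FSDirectory actually handles its lock files. The class name LockCleanupSketch, the method cleanupHook and the temp-file name are hypothetical. Note that the JVM runs shutdown hooks on an orderly exit but not when the process is killed outright (for example with SIGKILL), so a hook like this could only reduce stale locks, not rule them out.

```java
import java.io.File;
import java.io.IOException;

final class LockCleanupSketch {

    /** Builds (but does not register) a hook that deletes the lock file if it is still there. */
    static Thread cleanupHook(final File lockFile) {
        return new Thread(() -> {
            // The file still existing at exit implies the owner never released the lock.
            if (lockFile.exists() && !lockFile.delete()) {
                System.err.println("could not remove stale lock: " + lockFile);
            }
        });
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the moment the lock file is created on disk.
        File lock = File.createTempFile("sketch-write", ".lock");
        Runtime.getRuntime().addShutdownHook(cleanupHook(lock));
        System.out.println("lock exists while running: " + lock.exists());
        // When main returns and the JVM exits normally, the hook removes the file.
    }
}
```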

Re: stop words and index size

2005-01-13 Thread Chris Hostetter
: The corpus is the English Wikipedia, and I indexed the title and body of : the articles. I used a list of 525 stop words. : : With stopwords removed the index is 227MB. : With stopwords kept the index is 331MB. That doesn't seem horribly surprising. consider that for every Term in the index,

Re: Best way to find if a document exists, using Reader ...

2005-01-17 Thread Chris Hostetter
: 1) Adding 250K documents took half an hour for lucene. : 2) Deleting and adding same 250K documents took more than 50 minutes. In my : test all 250K objects are new so there is nothing to delete. : : Looks like there is no other way to make it fast. I bet you can find an improvement in the

Re: How to get all field values from a Hits object?

2005-01-17 Thread Chris Hostetter
: is it possible to get all different values for a : Field from a Hits object and how to do this? The wording of your question suggests that the Field you are interested in isn't a field which will have a fairly unique value for every doc (ie: not a title, more likely an author or category

Re: lucene integration with relational database

2005-01-18 Thread Chris Hostetter
: Thanks for your tips. I am trying to get a more thorough understanding : why this would be better. 1) give serious consideration to just putting all of your data in lucene for the purposes of searching. the initial example mentioned employees, and salaries and wanted to search for employees

Re: Why IndexReader.lastModified(index) is depricated?

2005-01-19 Thread Chris Hostetter
: Why IndexReader.lastModified(index) is depricated? Did you read the javadocs? Synchronization of IndexReader and IndexWriter instances is no longer done via time stamps of the segments file since the time resolution depends on the hardware platform. Instead, a version number is

Re: Opening up one large index takes 940M or memory?

2005-01-21 Thread Chris Hostetter
: We have one large index right now... its about 60G ... When I open it : the Java VM used 940M of memory. The VM does nothing else besides open Just out of curiosity, have you tried turning on the verbose gc log, and putting in some thread sleeps after you open the reader, to see if the memory

Re: Reloading an index

2005-01-27 Thread Chris Hostetter
: processes ended. If you're under linux, try running the 'lsof' : command to see if there are any handles to files marked (deleted). : Searcher, the old Searcher is closed and nulled, but I : still see about twice the amount of memory in use well : after the original searcher has been

Re: How do I delete?

2005-02-01 Thread Chris Hostetter
: anywhere. I checked the count coming back from the delete operation and : it is zero. I even tried to delete another unique term with similar : results. First off, are you absolutely certain you are closing the reader? it's not in the code you listed. Second, I'd bet $1 that when your

Re: Parsing The Query: Every document that doesn't have a field containing x

2005-02-04 Thread Chris Hostetter
Another approach... You can make a Filter that is the inverse of the output from another filter, which means you can make a QueryFilter on the search, then wrap it in your inverse Filter. you can't execute a query on a filter without having a Query object, but you can just apply the Filter
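The inversion step can be sketched on a plain java.util.BitSet, assuming the wrapped filter reports its matches as one bit per document number (as Lucene 1.4's Filter.bits(IndexReader) does). InverseFilterSketch and invert are hypothetical names, and this sketch ignores deleted documents, which a real wrapper would probably need to account for.

```java
import java.util.BitSet;

final class InverseFilterSketch {

    /** Returns a new BitSet with every bit in [0, maxDoc) flipped; the input is left untouched. */
    static BitSet invert(BitSet matches, int maxDoc) {
        BitSet inverse = (BitSet) matches.clone();
        inverse.flip(0, maxDoc);   // matched docs become excluded, and vice versa
        return inverse;
    }

    public static void main(String[] args) {
        BitSet matches = new BitSet(5);
        matches.set(1);
        matches.set(3);
        System.out.println(invert(matches, 5));   // prints {0, 2, 4}
    }
}
```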

Re: Starts With x and Ends With x Queries

2005-02-04 Thread Chris Hostetter
: Also keep in mind that QueryParser only allows a trailing asterisk, : creating a PrefixQuery. However, if you use a WildcardQuery directly, : you can use an asterisk as the starting character (at the risk of : performance). On the issue of ends with wildcard queries, I wanted to throw out and

Re: Document numbers and ids

2005-02-06 Thread Chris Hostetter
: care about their content. I only want to know a particular numeric : field from : document (id of document's category). : I also need to know how many docs in category were found, so I can't : index : You should explore the use of IndexReader. Index your documents with : category id

Re: Starts With x and Ends With x Queries

2005-02-06 Thread Chris Hostetter
: book Managing Gigabytes, making *string* queries drastically more : efficient for searching (though also impacting index size). Take the : term cat. It would be indexed with all rotated variations with an : end of word marker added: ... : The query for *at* would be preprocessed and
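Generating the rotated variations of a term is easy to sketch. The example below assumes '$' as the end-of-word marker (the marker character is an assumption, as are the names RotatedTermsSketch and rotations) and covers only the indexing side, not how a wildcard query would be rewritten against the rotations.

```java
import java.util.ArrayList;
import java.util.List;

final class RotatedTermsSketch {

    /** Assumed end-of-word marker; any character that never occurs in a term would do. */
    static final char END_OF_WORD = '$';

    /** Returns every rotation of term + marker, starting with the unrotated form. */
    static List<String> rotations(String term) {
        String marked = term + END_OF_WORD;
        List<String> out = new ArrayList<>();
        for (int i = 0; i < marked.length(); i++) {
            // Move the first i characters to the end.
            out.add(marked.substring(i) + marked.substring(0, i));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(rotations("cat"));   // prints [cat$, at$c, t$ca, $cat]
    }
}
```

A term of n characters yields n + 1 indexed forms, which is where the index-size cost mentioned in the quoted text comes from.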

Re: RangeQuery With Date

2005-02-07 Thread Chris Hostetter
: Your dates need to be stored in lexicographical order for the RangeQuery : to work. : : Index them using this date format: MMDD. : : Also, I'm not sure if the QueryParser can handle range queries with only : one end point. You may need to create this query programmatically. and when
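Why lexicographic order matters can be shown with plain strings: a range over terms compares them as text, so the stored form has to sort the same way the dates do. The sketch below assumes a zero-padded, year-first pattern (yyyyMMdd is used here as an assumption); DateKeySketch and key are hypothetical names, and java.time stands in for whatever date handling the indexing code actually uses.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

final class DateKeySketch {

    /** Year-first, zero-padded pattern so that string order equals date order. */
    private static final DateTimeFormatter KEY_FORMAT = DateTimeFormatter.ofPattern("yyyyMMdd");

    /** Formats a date as the string that would be indexed. */
    static String key(LocalDate date) {
        return date.format(KEY_FORMAT);
    }

    public static void main(String[] args) {
        String earlier = key(LocalDate.of(2004, 12, 31));
        String later = key(LocalDate.of(2005, 2, 7));
        System.out.println(earlier + " sorts before " + later + ": "
                + (earlier.compareTo(later) < 0));
    }
}
```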

Re: Lucene in the Humanities

2005-02-22 Thread Chris Hostetter
: Just curious: it would seem easier to use multiple fields for the : original case and lowercase searching. Is there any particular reason : you analyzed the documents to multiple indexes instead of multiple : fields? : : I considered that approach, however to expose QueryParser I'd have

Re: 1.4.x TermInfosWriter.indexInterval not public static ?

2005-02-25 Thread Chris Hostetter
: Whats the desired pattern of using of TermInfosWriter.indexInterval ? : : There isn't one. It is not a part of the public API. It is an : unsupported internal feature. : It was never public. It used to be static and final, but is now an : instance variable. : The place to put