RE: sorting tokenized field
I have suggested a solution for this problem (http://issues.apache.org/bugzilla/show_bug.cgi?id=30382); you can use the patch suggested there and recompile Lucene.

Aviran
http://www.aviransplace.com

-----Original Message-----
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Friday, December 10, 2004 1:53 PM
To: Lucene Users List
Subject: Re: sorting tokenized field

On Dec 10, 2004, at 1:40 PM, Praveen Peddi wrote:
> I read that tokenized fields cannot be sorted. In order to sort a tokenized field, the application either has to duplicate the field under a different name without tokenizing it, or come up with something else. But shouldn't the search engine take care of this? Are there any plans to build this functionality into Lucene?

It would be wasteful for Lucene to assume any field you add should be available for sorting. Adding one more line to your indexing code to accommodate your sorting needs seems a pretty small price to pay. Do you have suggestions to improve how this works? Or how it is documented?

	Erik

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
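The "one more line" Erik mentions is an untokenized duplicate of the field, which Lucene can then sort on. A minimal sketch against the Lucene 1.4-era API (the field names "title" and "title_sort" are examples, not from the thread):

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

Document doc = new Document();
// Tokenized field, used for searching
doc.add(Field.Text("title", title));
// Untokenized duplicate, used only for sorting
doc.add(Field.Keyword("title_sort", title.toLowerCase()));
```

At search time you would then pass a Sort on "title_sort" to IndexSearcher.search().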
RE: finalize delete without optimize
The standard Lucene API does not support this kind of operation.

Aviran
http://www.aviransplace.com

-----Original Message-----
From: John Wang [mailto:[EMAIL PROTECTED]
Sent: Wednesday, December 08, 2004 5:32 PM
To: [EMAIL PROTECTED]
Subject: Re: finalize delete without optimize

Hi folks:

I sent this out a few days ago without a response. Please help.

Thanks in advance

-John

On Mon, 6 Dec 2004 21:15:00 -0800, John Wang [EMAIL PROTECTED] wrote:
> Hi: Is there a way to finalize deletes, i.e. actually remove them from the segments and make sure the docIDs are contiguous again? The only explicit way to do this is by calling IndexWriter.optimize(). But this call does a lot more (it also merges all the segments), hence is very expensive. Is there a way to simply finalize the deletes without having to merge all the segments? If not, I'd be glad to submit an implementation of this feature if the Lucene devs agree it is useful.
>
> Thanks
>
> -John
RE: IndexWriter.optimize()
Besides merging the segments, optimize() also physically deletes all the deleted documents from the index. (When you call delete, Lucene only marks the documents as deleted; they are physically deleted when you call optimize().)

Aviran
http://www.aviransplace.com

-----Original Message-----
From: Yura Smolsky [mailto:[EMAIL PROTECTED]
Sent: Thursday, December 09, 2004 5:55 AM
To: [EMAIL PROTECTED]
Subject: IndexWriter.optimize()

Hello, lucene-user.

I used FSDirectory as storage for the index, and I used the optimize() method of IndexWriter to optimize the index for faster access. Now I use DbDirectory (Berkeley DB) as storage. Does it make sense to use the optimize method on an index stored in this storage? What does optimize actually do?

Yura Smolsky
RE: Retrieving all docs in the index
In this case you'll have to add another field with a fixed value to all the documents and query on that field.

Aviran
http://www.aviransplace.com

-----Original Message-----
From: Ravi [mailto:[EMAIL PROTECTED]
Sent: Thursday, December 09, 2004 2:04 PM
To: Lucene Users List
Subject: RE: Retrieving all docs in the index

I'm sorry, I don't think I articulated my question well. We use a date filter to sort the search results. This works fine when the user provides some search criteria. But if he gives empty search criteria, we need to return all the documents in the index in the given date range, sorted by date. So I was looking for a query that returns all documents in the index, to which I can then apply the date filter.

-----Original Message-----
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Thursday, December 09, 2004 1:55 PM
To: Lucene Users List
Subject: Re: Retrieving all docs in the index

On Dec 9, 2004, at 1:35 PM, Ravi wrote:
> Is there any other way to extract all documents from an index, apart from adding an additional field with the same value to all documents and then doing a term query on that field with the common value?

Of course. Have a look at the IndexReader API.

	Erik
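For what it's worth, the IndexReader route Erik hints at can be sketched like this (Lucene 1.4-era API; the index path is an example). It visits every non-deleted document without needing a catch-all field:

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;

IndexReader reader = IndexReader.open("/path/to/index");
for (int i = 0; i < reader.maxDoc(); i++) {
    if (reader.isDeleted(i)) continue;   // skip documents marked deleted
    Document doc = reader.document(i);
    // ... apply your own date-range check on a stored date field here
}
reader.close();
```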
RE: lucene transaction and roll back implementation
AFAIK there is no transaction or rollback support in Lucene.

Aviran
http://www.aviransplace.com

-----Original Message-----
From: John Wang [mailto:[EMAIL PROTECTED]
Sent: Wednesday, November 17, 2004 8:25 PM
To: [EMAIL PROTECTED]
Subject: lucene transaction and roll back implementation

Hi folks:

How does Lucene implement transactions and rollback? E.g. what if the machine crashes (from a power outage etc.) in the middle of a write, e.g. indexWriter.close()? From examining the code, it seems there is a possibility such a crash can cause a corrupted index. (In SegmentInfos, new data is written to a temp file and then swapped back to the actual file by doing Util.renameFile, but Util.renameFile is not atomic if we are doing a byte copy.) Is there an automatic recovery mechanism or a rollback? What is the general advice for how to handle these situations?

Thanks

-John
RE: best ways of using IndexSearcher
Yes, IndexSearcher is thread safe.

Aviran
http://www.aviransplace.com

-----Original Message-----
From: Abhay Saswade [mailto:[EMAIL PROTECTED]
Sent: Tuesday, November 16, 2004 3:16 PM
To: Lucene Users List
Subject: Re: best ways of using IndexSearcher

Hello,

Can I use a single instance of IndexSearcher in multiple threads with sorting?

Thanks,
Abhay

----- Original Message -----
From: Otis Gospodnetic [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Monday, June 28, 2004 8:51 PM
Subject: Re: best ways of using IndexSearcher

Anson,

Use a single instance of IndexSearcher and, if you want to always 'see' even the latest index changes (deletes and adds since you opened the IndexSearcher), make sure to re-create the IndexSearcher when you detect that the index version has changed (see http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/IndexReader.html#getCurrentVersion(org.apache.lucene.store.Directory)).

When you get the new IndexSearcher, leave the old instance alone - let the GC take care of it, and don't call close() on it, in case something in your application is still using that instance.

This stuff is not really CPU intensive. Disk I/O tends to be the bottleneck. If you are working with multiple indices, spread them over multiple disks (not just partitions, real disks), if you can.

Otis

--- Anson Lau [EMAIL PROTECTED] wrote:
> Hi Guys,
> What's the recommended way of using IndexSearcher? Should IndexSearcher be a singleton or pooled? Would pooling provide a more scalable solution by allowing you to decide how many IndexSearchers to use based on, say, how many CPUs you have on your server?
> Thanks,
> Anson
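Otis's version check can be sketched as follows (the variable names indexDir, lastVersion and searcher are illustrative, not from the thread):

```java
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;

// Re-create the shared searcher only when the index has actually changed.
long current = IndexReader.getCurrentVersion(indexDir);
if (current != lastVersion) {
    searcher = new IndexSearcher(indexDir);  // old instance is left to the GC, per Otis
    lastVersion = current;
}
```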
RE: Stemming
I don't understand what kind of examples you need. All there is to it is using a different analyzer. Take a look at the Snowball analyzer in Lucene's sandbox.

Aviran
http://www.aviransplace.com

-----Original Message-----
From: Miguel Angel [mailto:[EMAIL PROTECTED]
Sent: Monday, November 15, 2004 1:28 PM
To: [EMAIL PROTECTED]
Subject: Stemming

Hi, I am using the Lucene demo. Does anybody have examples of using stemming with Lucene? Examples please.

--
Miguel Angel Angeles R.
Asesoria en Conectividad y Servidores
Telf. 97451277
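A minimal sketch of swapping in the sandbox Snowball analyzer at indexing time (the stemmer name and index path are examples):

```java
import org.apache.lucene.analysis.snowball.SnowballAnalyzer;
import org.apache.lucene.index.IndexWriter;

// "English" selects the English Snowball stemmer; pass the same
// analyzer to QueryParser at search time so queries get stemmed too.
SnowballAnalyzer analyzer = new SnowballAnalyzer("English");
IndexWriter writer = new IndexWriter("/tmp/index", analyzer, true);
```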
RE: Faster highlighting with TermPositionVectors
Has anyone tried this class? I tried it but I can't make it work. I indexed a field as

new Field("description", description, true, true, true, true);

but when I call

TokenSources.getTokenStream(_indexReader, i, "description");

I get a ClassCastException. In this class the line

TermPositionVector tpv = (TermPositionVector) reader.getTermFreqVector(docId, field);

is trying to cast a SegmentTermVector to a TermPositionVector. Is there anything I'm doing wrong? Should I have indexed the field some other way to store a TermPositionVector?

BTW: I'm using the latest Lucene source from CVS.

Thanks,
Aviran

-----Original Message-----
From: Bruce Ritchie [mailto:[EMAIL PROTECTED]
Sent: Friday, October 29, 2004 1:15 AM
To: Lucene Users List
Subject: RE: Faster highlighting with TermPositionVectors

Mark,

> Thanks to the recent changes (see CVS) in TermFreqVector support we can now make use of term offset information held in the Lucene index rather than incurring the cost of re-analyzing text to highlight it. I have created a class (see http://www.inperspective.com/lucene/TokenSources.java) which handles creating a TokenStream from the TermPositionVector stored in the index, which can then be passed to the highlighter. This approach is significantly faster than re-parsing the original text. If people are happy with this class I'll add it to the Highlighter sandbox, but it may sit better elsewhere in the Lucene code base as a more general-purpose utility. BTW, as part of putting this together I found that the TermFreq code throws a null pointer when indexing fields that produce no tokens (i.e. empty or all stopwords). Otherwise things work very well.

This is great news! While I won't have the time to test this until probably mid November, I do look forward to the speed improvements, as the current highlighting mechanism (reparsing the text) was just not performant enough under heavy loads.
Regards,

Bruce Ritchie
http://www.jivesoftware.com
RE: index files version and lucene 1.4
Lucene 1.4 changed the index file format. You can access an old index using Lucene 1.4, but you can't access an index created with Lucene 1.4 using older versions. I suggest you rebuild your index with Lucene 1.4.

Aviran
http://aviran.mordos.com

-----Original Message-----
From: arnaud gaudinat [mailto:[EMAIL PROTECTED]
Sent: Thursday, October 21, 2004 12:10 PM
To: Lucene Users List
Subject: index files version and lucene 1.4

Hi,

Certainly a stupid question! I have just upgraded to 1.4. I have succeeded in accessing my 1.3 index files but not my new 1.4 index files. In fact I get no error, but no hits for the 1.4 index files. Also, I don't know if it's normal, but now I have just 3 files for my index (.cfs, deletable and segments). However, if I use Luke with the 1.4 index files, it works perfectly.

Any idea?

Regards,

Arno.
RE: Null or no analyzer
AFAIK, if the term "Election 2004" is between quotation marks this should work fine.

Aviran
http://aviran.mordos.com

-----Original Message-----
From: Morus Walter [mailto:[EMAIL PROTECTED]
Sent: Wednesday, October 20, 2004 2:25 AM
To: Lucene Users List
Subject: RE: Null or no analyzer

Aviran writes:
> You can use WhitespaceAnalyzer

Can he? If "Elections 2004" is one token in the subject field (keyword), this will fail, since WhitespaceAnalyzer will tokenize it to `Elections' and `2004'. So I guess he has to write an identity analyzer himself, unless there is one provided (which doesn't seem to be the case). The only alternatives are not using the query parser, or extending the query parser with a keyword syntax, as far as I can see.
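The identity analyzer Morus describes is only a few lines. This sketch (class name illustrative, written against the Lucene 1.4-era TokenStream API) returns the whole field value as a single token:

```java
import java.io.IOException;
import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;

// Pass-through analyzer: the entire input becomes one token, so
// keyword fields survive QueryParser untokenized.
public class IdentityAnalyzer extends Analyzer {
    public TokenStream tokenStream(String fieldName, final Reader reader) {
        return new TokenStream() {
            private boolean done = false;
            public Token next() throws IOException {
                if (done) return null;
                done = true;
                StringBuffer sb = new StringBuffer();
                char[] buf = new char[256];
                int n;
                while ((n = reader.read(buf)) > 0) sb.append(buf, 0, n);
                return new Token(sb.toString(), 0, sb.length());
            }
        };
    }
}
```

Wrapped in a PerFieldAnalyzerWrapper for the subject field, the query parser would then leave the keyword value intact.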
RE: Spell checker
Here http://issues.apache.org/bugzilla/showattachment.cgi?attach_id=13009 Aviran http://aviran.mordos.com -Original Message- From: Lynn Li [mailto:[EMAIL PROTECTED] Sent: Wednesday, October 20, 2004 10:52 AM To: 'Lucene Users List' Subject: RE: Spell checker Where can I download it? Thanks, Lynn -Original Message- From: Nicolas Maisonneuve [mailto:[EMAIL PROTECTED] Sent: Monday, October 11, 2004 1:26 PM To: Lucene Users List Subject: Spell checker hy lucene users i developed a Spell checker for lucene inspired by the David Spencer code see the wiki doc: http://wiki.apache.org/jakarta-lucene/SpellChecker Nicolas Maisonneuve - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
RE: Null or no analyzer
You can use WhiteSpaceAnalyzer Aviran http://aviran.mordos.com -Original Message- From: Rupinder Singh Mazara [mailto:[EMAIL PROTECTED] Sent: Tuesday, October 19, 2004 11:23 AM To: Lucene Users List Subject: Null or no analyzer Hi All I have a question regarding selection of Analyzer's during query parsing i have three field in my index db_id, full_text, subject all three are indexed, however while indexing I specified to lucene to index db_id and subject but not tokenize them I want to give a single search box in my application to enable searching for documents some query can look lile motor cross rally this will get fed to QueryParser to do the relevent parsing however if the user enters Jhon Kerry subject:Elections 2004 I want to make sure that No analyzer is used fro the subject field ? how can that be done. this is because I expect the users to know the subject from a List of controlled vocabularies and also I am searching for documents that have the exact subject I tried using the PerFieldAnalyzerWrapper, but how do I get hold a Analyzer that does nothing but pass the text trough to the Searcher ? - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
RE: how to find field that has any value
You can try to use a range query, something like test:[null TO ]. Please note that you might get a TooManyClauses exception if the range expands to too many terms. The other thing you can do is use the NOT operator: fill all the empty fields with a placeholder string, let's say "empty", and then query for -test:empty.

Aviran

-----Original Message-----
From: MATL (Mats Lindberg) [mailto:[EMAIL PROTECTED]
Sent: Wednesday, October 06, 2004 4:27 PM
To: Lucene Users List
Subject: how to find field that has any value

Hello,

I have a probably simple question for some of you. Since Lucene does not allow a query to start with a wildcard (* or ?), how would I find all documents where, let's say, field "test" has something in that field, i.e. is not empty?

My first thought would be to do something like this:

test:__ (because the value __ isn't very likely to be present)
test:* (would be the correct way, I guess, but Lucene doesn't allow that)

Does anyone have a better idea?

Best regards,
Mats Lindberg
RE: A simple newbee question . How do i exclude a field ?
For the records that don't contain a field you can put a bogus value such as empty and then you can query on -UD:empty Aviran http://aviran.mordos.com -Original Message- From: Robinson Raju [mailto:[EMAIL PROTECTED] Sent: Saturday, October 09, 2004 10:25 AM To: Lucene Users List Subject: A simple newbee question . How do i exclude a field ? Hi , i use lucene to search against a flatted DB table. I have a table which contains the following data . there are 3 records which contain the code RN , 27 which contain UD and 3266 which contain BLANK. codeNumber of records ---- 3269 RN 3 UD 27 if my searchString is RN , i get 3 if my searchString is UD , i get 27 if my searchString is , i get 3296 (in this case i bypass queryfilter) Now , I need to get number of records which do not contain UD . (similar to a DB query of NOT IN or !=). if the string is -UD , it doesnt work. Could you tell me how to construct a string for this ? Regards Robin - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
RE: multiple threads
You should not have more than one IndexWriter open on the same index. (You can have multiple IndexReaders, but only one IndexWriter.)

Aviran

-----Original Message-----
From: Justin Swanhart [mailto:[EMAIL PROTECTED]
Sent: Friday, October 01, 2004 7:14 PM
To: [EMAIL PROTECTED]
Subject: multiple threads

As I understand it, if two writers try to access the same index for writing, then one of the writers should block waiting for a lock until the lock timeout period expires, and then it will return a lock wait timeout exception.

I have a multithreaded indexing application that writes into one of multiple indexes depending on a hash value, and I intend to merge all the hashes when the indexing finishes. Locking usually works, but sometimes it doesn't and I get IO exceptions such as the following:

java.io.IOException: Cannot delete _19.fnm
    at org.apache.lucene.store.FSDirectory.deleteFile(FSDirectory.java:198)
    at org.apache.lucene.index.SegmentMerger.createCompoundFile(SegmentMerger.java:157)
    at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:100)
    at org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java:487)
    at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:366)
    at org.apache.lucene.index.IndexWriter.addIndexes(IndexWriter.java:389)
    at org.en.global.indexer.IndexGroup.run(IndexGroup.java:387)

Any idea why this could be happening? I am using NFS currently, but the problem appears on the local filesystem as well.
RE: Sorting on a long string
Currently Lucene can only sort properly on a Keyword field. I guess your field is tokenized, in which case the sort does not work properly. A patch has been suggested to fix this problem (but has not been applied yet): http://issues.apache.org/bugzilla/show_bug.cgi?id=30382

Aviran

-----Original Message-----
From: Daly, Pete [mailto:[EMAIL PROTECTED]
Sent: Tuesday, September 28, 2004 3:46 PM
To: Lucene Users List
Subject: Sorting on a long string

I am new to Lucene, and trying to perform a sorted query on a list of people's names. Lucene seems unable to properly sort on the name field of my indexed documents. If I sort by the other (shorter) fields, it seems to work fine. The name sort seems to be close, almost like the last few iterations through the sort loop are not being done. The records are obviously not in the normally random order, but not fully sorted either. I have tried different ways of sorting, including a SortField array/object with the field cast as a string.

The index I am sorting has about 1.2 million documents. Are there known limitations in the sorting functionality that I am running into? I can provide more details if needed.

Thanks for any help,

-Pete
RE: Hebrew support
As far as I know there is no analyzer for Hebrew.

Aviran

-----Original Message-----
From: Alex Kiselevski [mailto:[EMAIL PROTECTED]
Sent: Tuesday, September 28, 2004 3:12 AM
To: [EMAIL PROTECTED]
Subject: Hebrew support

Hello,

Do you know anything about Hebrew support in Lucene?

Thanks in advance

Alex Kiselevsky
Speech Technology, R&D, Amdocs - Israel
Tel: 972-9-776-43-46
Mobile: 972-53-63-50-38
RE: Keyword query confusion
The StandardAnalyzer removes the "1" as it treats it as a stop word. There are two ways you can work around this problem:
1. As you mentioned, create the Query object programmatically.
2. Use WhitespaceAnalyzer instead of StandardAnalyzer.

Aviran

-----Original Message-----
From: Fred Toth [mailto:[EMAIL PROTECTED]
Sent: Friday, September 24, 2004 12:27 PM
To: [EMAIL PROTECTED]
Subject: Keyword query confusion

Hi all,

I'm trying to understand what's going on with the query parser and keyword fields. I've got a large subset of my documents which are publications. So as to be able to query these, I've got this in the indexer:

doc.add(Field.Keyword("is_pub", "1"));

However, if I run the query:

is_pub:1

I get no hits. If I find a document by other means and dump the fields, the is_pub keyword is there, with a value of "1". Now, I've learned that if I change the field to contain the value "true" instead of the string "1", this query:

is_pub:true

works just fine. So, I'm pretty sure I'm running afoul of the analyzer, right? The docs say specifically that I should add keyword query clauses programmatically, and I'm guessing that's what's wrong. But can someone explain this? It sure is useful to be able to test this sort of thing with the query parser. What is going on with the standard analyzer that makes "true" work and "1" not work? Is there a way around this other than by writing code to create the query? This also applies to other types of queries, like pub_date:2004.

Hoping for enlightenment...

Thanks,
Fred
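The programmatic route the docs recommend looks roughly like this (Lucene 1.4-era API; userQuery is a stand-in for whatever QueryParser produced from the rest of the search box):

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

// The keyword clause is built directly, so no analyzer ever sees it.
Query isPub = new TermQuery(new Term("is_pub", "1"));
BooleanQuery combined = new BooleanQuery();
combined.add(userQuery, true, false);  // required clause (1.4-era add signature)
combined.add(isPub, true, false);      // required clause
```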
RE: Questions related to closing the searcher
The best way is to use IndexReader's getCurrentVersion() method to check whether the index has changed. If it has, just get a new Searcher.
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/IndexReader.html#getCurrentVersion(java.lang.String)

Aviran

-----Original Message-----
From: Edwin Tang [mailto:[EMAIL PROTECTED]
Sent: Wednesday, September 22, 2004 11:38 AM
To: [EMAIL PROTECTED]
Subject: Fwd: Questions related to closing the searcher

Hello,

In my testing, it seems that if the searcher (in my case ParallelMultiSearcher) is not closed, it will not pick up any new data that has been added to the index since it was opened. I'm wondering if this is a correct statement.

Assuming the above is true, I went about closing the searcher with searcher.close(), then setting both the searcher and QueryParser to null, then did a System.gc(). The application sleeps for a set period of time, then resumes to process another batch of queries against the index. When the application resumes, the following method is run:

/**
 * Creates a {@link ParallelMultiSearcher} and {@link QueryParser} if they
 * do not already exist.
 *
 * @return 0 if successful or the objects already exist; -1 if failed.
 */
private int getSearcher() {
    Analyzer analyzer;
    IndexSearcher[] searchers;
    int iReturn;
    Vector vector;

    if (logger.isDebugEnabled()) logger.debug("Entering getSearcher()");

    if (searcher == null || parser == null) {
        analyzer = new CIAnalyzer(utility.sStopWordsFile);
        try {
            vector = new Vector();
            if (utility.bSearchAMX) vector.add(new IndexSearcher(utility.amxIndexDir));
            if (utility.bSearchCOMTEX) vector.add(new IndexSearcher(utility.comtexIndexDir));
            if (utility.bSearchDJNW) vector.add(new IndexSearcher(utility.djnwIndexDir));
            if (utility.bSearchMoreover) vector.add(new IndexSearcher(utility.moreoverIndexDir));
            searchers = (IndexSearcher[]) vector.toArray(new IndexSearcher[vector.size()]);
            searcher = new ParallelMultiSearcher(searchers);
            parser = new QueryParser("body", analyzer);
            iReturn = 0;
        } catch (IOException ioe) {
            logger.error("Error creating searcher", ioe);
            iReturn = -1;
        } catch (Exception e) {
            logger.error("Unexpected error while creating searcher", e);
            iReturn = -1;
        }
    } else
        iReturn = 0;

    if (logger.isDebugEnabled()) logger.debug("Exiting getSearcher() with " + iReturn);

    return iReturn;
} // End method getSearcher()

This seems to get me around the problem where the searcher was not picking up new data from the index. However, I run out of memory after 8 iterations of the application processing a batch of queries, sleeping, processing another batch, sleeping, etc. I'm probably missing something completely obvious, but I'm just not seeing it. Can someone please tell me what I'm doing wrong?

Thanks,
Ed
RE: problem with SortField[] in search method (newbie)
You can only sort on an indexed field. (Even more than that, it will work properly only on untokenized fields, i.e. Keyword fields.)

Aviran

-----Original Message-----
From: Wermus Fernando [mailto:[EMAIL PROTECTED]
Sent: Wednesday, September 15, 2004 1:13 PM
To: [EMAIL PROTECTED]
Subject: problem with SortField[] in search method (newbie)

Luceners,

My search looks up whole entities. My entities are accounts, contacts, tasks, etc., and the search looks at a group of each entity's fields. This works fine even though not every entity field is indexed in a document. But if I sort by some fields from different entities, I get the following error:

field "shortName" does not appear to be indexed

The account fields I have indexed are shortName, number, location, fax, phone and symbol, and if I search without any sort order on these fields it works fine. I don't understand the behavior: if I don't order the search it works, but if I add an order I get a RuntimeException, and I can't catch the exception to solve the problem. The only solution is to index all the entities' fields in every document, but to me that's a patch.

Any idea? It could help me out. Thanks in advance.
RE: Sort Search Result
Look at SortField:
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/SortField.html

-----Original Message-----
From: Natarajan.T [mailto:[EMAIL PROTECTED]
Sent: Tuesday, August 24, 2004 11:35 AM
To: 'Lucene Users List'
Subject: Sort Search Result

FYI,

How can I get the search results in ascending order? (Sort API)

Thanks,
Natarajan.
RE: Searching MySql index using lucene
Just read your data from the database and create a Lucene Index for the columns you want to search -Original Message- From: sivalingam T [mailto:[EMAIL PROTECTED] Sent: Tuesday, August 24, 2004 9:52 AM To: [EMAIL PROTECTED] Cc: [EMAIL PROTECTED] Subject: Searching MySql index using lucene Hi, 1. MySql defaultly creates an index. if i want to search this index using lucene how i can search. 2. How to create index on databases using lucene. Give me suggestions if any body know. Thanks. With Warm Regards, Sivalingam.T Sai Eswar Innovations (P) Ltd, Chennai-92 - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
RE: Indexing and Searching Database in Lucene
You need to create a Lucene index from the database: just index the columns and the records from the database. It is also useful to have a field in Lucene that contains the database's primary key, so you can retrieve the actual record from the database.

Aviran

-----Original Message-----
From: sivalingam T [mailto:[EMAIL PROTECTED]
Sent: Friday, August 20, 2004 10:55 AM
To: [EMAIL PROTECTED]
Subject: Indexing and Searching Database in Lucene

Hi,

Can we index and search a database with the Lucene search engine? If anybody knows, please send a reply.

With Warm Regards,
Sivalingam.T
Sai Eswar Innovations (P) Ltd, Chennai-92
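A rough sketch of that approach with JDBC (the connection string, table and column names are made up for illustration):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

Connection conn = DriverManager.getConnection("jdbc:mysql://localhost/mydb", "user", "pass");
IndexWriter writer = new IndexWriter("/tmp/db-index", new StandardAnalyzer(), true);
ResultSet rs = conn.createStatement().executeQuery("SELECT id, title, body FROM articles");
while (rs.next()) {
    Document doc = new Document();
    // Primary key: stored and untokenized, so the DB record can be fetched back
    doc.add(Field.Keyword("id", rs.getString("id")));
    doc.add(Field.Text("title", rs.getString("title")));
    doc.add(Field.Text("body", rs.getString("body")));
    writer.addDocument(doc);
}
writer.optimize();
writer.close();
```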
RE: index and search question
Yes.

-----Original Message-----
From: Dmitrii PapaGeorgio [mailto:[EMAIL PROTECTED]
Sent: Monday, August 16, 2004 9:23 AM
To: [EMAIL PROTECTED]
Subject: index and search question

Ok, so when I index a file such as below:

Document doc = new Document();
doc.Add(Field.Text("contents", new StreamReader(dataDir)));
doc.Add(Field.Keyword("filename", dataDir));

I can do a search like this:

+contents:SomeWord +filename:SomePath

Correct?
RE: Question on number of fields in a document.
You should be fine, no problem with the number of fields -Original Message- From: John Z [mailto:[EMAIL PROTECTED] Sent: Wednesday, August 04, 2004 12:23 PM To: [EMAIL PROTECTED] Subject: Question on number of fields in a document. Hi I had a question related to number of fields in a document. Is there any limit to the number of fields you can have in an index. We have around 25-30 fields per document at present, about 6 are keywords, Around 6 stored, but not indexed and rest of them are text, which is analyzed and indexed fields. We are planning on adding around 24 more fields , mostly keywords. Does anyone see any issues with this? Impact to search or index ? Thanks ZJ - Do you Yahoo!? New and Improved Yahoo! Mail - Send 10MB messages! - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
RE: When does IndexReader pick up changes?
An IndexReader only picks up the changes that exist at the moment it is opened. If new documents are added to the index after that, you need to open a new IndexReader in order for it to pick up the changes.

Aviran

-----Original Message-----
From: Stephane James Vaucher [mailto:[EMAIL PROTECTED]
Sent: Thursday, July 29, 2004 0:00 AM
To: Lucene Users List
Subject: Re: When does IndexReader pick up changes?

IIRC, if you use a searcher, changes are picked up right away. With a reader, I would expect it to react the same way. <disclaimer>I'm not a lucene guru, I might be wrong</disclaimer>

Where I'm less sure is with an FSDirectory, as it uses an internal RAMDirectory. If two separate processes (within the same classloader, FSDirectories with the same paths are reused) use different FSDirectories, you might notice a flushing behaviour.

sv

On 28 Jul 2004 [EMAIL PROTECTED] wrote:
> Hi, does anyone know if the IndexWriter has to be closed for an IndexReader to pick up the changes? Thanks.
>
> --- Lucene Users List [EMAIL PROTECTED] wrote:
>> Hi, if I do this:
>> - open index writer
>> - add document
>> - open reader
>> - search with reader
>> - close reader
>> - close writer
>> Will the reader pick up the document that was added to the index since it was opened after the document was added? Or will it only pick up changes that occur after the index writer is closed? Thanks for the help!
RE: When does IndexReader pick up changes?
AFAIK you don't have to close the writer. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Sent: Thursday, July 29, 2004 11:17 AM To: [EMAIL PROTECTED] Subject: RE: When does IndexReader pick up changes? Yes, I understand that the IndexReader only picks up changes once it is opened. I'm just trying to determine whether the IndexWriter first needs to be closed or if that is not necessary.
RE: rebuild index
Why don't you just build a new index in a different location, at the end add the missing documents from the old index to the new one, and then delete the old index? Aviran -Original Message- From: Sergiu Gordea [mailto:[EMAIL PROTECTED] Sent: Thursday, July 22, 2004 10:49 AM To: Lucene Users List Subject: rebuild index Hi all, I have a question related to reindexing of documents with Lucene. We want to implement the functionality of rebuilding the Lucene index. That means I want to delete all documents in the index and add newer versions. All the information I need to reindex is kept in the database, so I have a Term ID, which is unique. My problem is that there is no deleteAll() method in IndexReader, and there are no undelete(int) or undelete(Term) methods; there are only delete(Term) and undeleteAll() methods that can be used for this. I would like to delete all documents (just mark them as deleted), add the new documents to the index, and create a list of documents that were not successfully indexed (for various reasons that may depend on Lucene or on our code). At the end I would like to restore (mark as undeleted) the documents in the list and optimize the index, so that the changes are permanently committed to the index. Is this possible without hacking Lucene code? Any ideas? Thanks in advance, Sergiu
RE: Sorting on tokenized fields
You can create a new field which contains the full untokenized string and use it as a sort field. -Original Message- From: Florian Sauvin [mailto:[EMAIL PROTECTED] Sent: Tuesday, July 20, 2004 20:13 PM To: Lucene Users List Subject: Sorting on tokenized fields I see in the Javadoc that it is only possible to sort on fields that are not tokenized. I have two questions about that: 1) What happens if the field is tokenized, is sorting done anyway, using the first term only? 2) Is there a way to do some sorting anyway, by concatenating all the tokens into one string? -- Florian
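A minimal sketch of this duplicate-field trick, using plain Java maps as stand-ins for Lucene documents (the field names "title" and "title_sort" are illustrative, not from the thread): the analyzed field holds tokens for searching, while the untokenized copy drives sorting.

```java
import java.util.*;

public class SortFieldDemo {
    // A stand-in for a Lucene document: field name -> value(s).
    // "title" is tokenized (a bag of lowercased terms for searching);
    // "title_sort" keeps the full untokenized string, used only for sorting.
    static Map<String, Object> doc(String title) {
        Map<String, Object> d = new HashMap<>();
        d.put("title", Arrays.asList(title.toLowerCase().split("\\s+")));
        d.put("title_sort", title);
        return d;
    }

    // Sort the documents on the untokenized copy and return the titles in order.
    static List<String> sortedTitles(List<Map<String, Object>> docs) {
        docs.sort(Comparator.comparing((Map<String, Object> d) -> (String) d.get("title_sort")));
        List<String> out = new ArrayList<>();
        for (Map<String, Object> d : docs) out.add((String) d.get("title_sort"));
        return out;
    }
}
```

The one extra line at indexing time is the second `put`; everything else about the document is unchanged.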
RE: Sort: 1.4-rc3 vs. 1.4-final
Since I had to implement sorting in Lucene 1.2 I had to write my own sorting, using something similar to a Lucene contribution called SortField. Yesterday I did some tests, trying to use Lucene 1.4 Sort objects, and I realized that my old implementation works 40% faster than Lucene's implementation. My guess is that you are right and there is a problem with the cache, although I couldn't find what it is yet. Aviran -Original Message- From: Greg Gershman [mailto:[EMAIL PROTECTED] Sent: Wednesday, July 21, 2004 9:22 AM To: [EMAIL PROTECTED] Subject: Sort: 1.4-rc3 vs. 1.4-final When rc3 came out, I modified the classes used for sorting to use Long values in addition to Integer, Float and String-based sort keys. All I did was add extra statements in 2 classes (SortField and FieldSortedHitQueue) that made a special case for longs, and created a LongSortedHitQueue identical to the IntegerSortedHitQueue, only using longs. This worked as expected: Long values converted to strings and stored in Field.Keyword-type fields would be sorted according to Long order. The initial query would take a while, to build the sorted array, but subsequent queries would take little to no time at all. I went back to look at 1.4 final, and noticed the Sort implementation has changed quite a bit. I tried the same type of modifications to the existing source files, but was unable to achieve similar results. Each subsequent query seems to take a significant amount of time, as if the sorted array is being rebuilt each time. Also, I tried sorting on an Integer field and got similar results, which leads me to believe there might be a caching problem somewhere. Has anyone else seen this in 1.4-final? Also, I would like it if Long sorted fields could become a part of the API; it makes sorting by date a breeze. Thanks! Greg Gershman
RE: Sort: 1.4-rc3 vs. 1.4-final
I think I found the problem: FieldCacheImpl uses a WeakHashMap to store the cached objects, but since there is no other reference to this cache it gets released. Switching to HashMap solves it. The only problem is that I don't see anywhere that the cached objects would get released if you open a new IndexReader. Aviran -Original Message- From: Greg Gershman [mailto:[EMAIL PROTECTED] Sent: Wednesday, July 21, 2004 13:13 PM To: Lucene Users List Subject: RE: Sort: 1.4-rc3 vs. 1.4-final I've done a bit more snooping around; it seems that in FieldSortedHitQueue.getCachedComparator (line 153), calls to look up a stored comparator in the cache always return null. This occurs even for the built-in sort types (I tested it on integers and my code for longs). The comparators don't even appear to be being stored in the HashMap to begin with. Any ideas? Greg
RE: Sort: 1.4-rc3 vs. 1.4-final
I just saw this post; I guess we both came to the same conclusion. The only problem is that the cached object never gets released, and a new one will get created every time you open a new IndexReader. Aviran -Original Message- From: Greg Gershman [mailto:[EMAIL PROTECTED] Sent: Wednesday, July 21, 2004 13:30 PM To: Lucene Users List Subject: RE: Sort: 1.4-rc3 vs. 1.4-final I switched the Comparators and FieldCache classes to use java.util.HashMap instead of java.util.WeakHashMap, and got the performance boost I was looking for (test index of 100K documents; initial search took 991 ms, all subsequent searches took 90 ms. Before, I was seeing an initial query of ~1 sec and subsequent queries between 500 and 700 ms, with the comparator and field lookup table computed each time). I guess the question is why use a WeakHashMap here as opposed to a HashMap? Greg
RE: Sort: 1.4-rc3 vs. 1.4-final
I will post a patch soon. Aviran -Original Message- From: Doug Cutting [mailto:[EMAIL PROTECTED] Sent: Wednesday, July 21, 2004 13:56 PM To: Lucene Users List Subject: Re: Sort: 1.4-rc3 vs. 1.4-final The key in the WeakHashMap should be the IndexReader, not the Entry. I think this should become a two-level cache: a WeakHashMap of HashMaps, the WeakHashMap keyed by IndexReader, the HashMap keyed by Entry. I think the Entry class can also be changed to not include an IndexReader field. Does this make sense? Would someone like to construct a patch and submit it to the developer list? Doug Aviran wrote: I think I found the problem: FieldCacheImpl uses WeakHashMap to store the cached objects, but since there is no other reference to this cache it is getting released. Switching to HashMap solves it. The only problem is that I don't see anywhere where the cached object will get released if you open a new IndexReader. Aviran
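Doug's two-level layout can be sketched with plain java.util types, using bare Objects as stand-ins for the IndexReader key and a String field name in place of the Entry key (both stand-ins are illustrative): while a reader is strongly referenced its per-field comparators stay cached, and its whole inner map becomes collectable once the reader itself is dropped.

```java
import java.util.*;

public class TwoLevelCache {
    // Outer map: weak-keyed by reader, so closing/discarding a reader lets the
    // GC drop that reader's entire per-field cache. Inner map: a plain HashMap
    // keyed by field, so repeated lookups for a live reader always hit.
    private final Map<Object, HashMap<String, Object>> cache = new WeakHashMap<>();

    public Object get(Object reader, String field) {
        HashMap<String, Object> perReader = cache.get(reader);
        return perReader == null ? null : perReader.get(field);
    }

    public void put(Object reader, String field, Object comparator) {
        cache.computeIfAbsent(reader, r -> new HashMap<>()).put(field, comparator);
    }
}
```

The behavioral difference from the 1.4-final bug is that the weak reference is on the reader (which the application holds anyway), not on the cache entry itself, so entries no longer vanish between queries.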
Lucene Search has poor cpu utilization on a 4-CPU machine
Hi all, First let me explain what I found out. I'm running Lucene on a 4-CPU server. While doing some stress tests I noticed (by doing a full thread dump) that searching threads are blocked on the method: public FieldInfo fieldInfo(int fieldNumber) This causes significant CPU idle time. I noticed that the class org.apache.lucene.index.FieldInfos uses the private class members Vector byNumber and Hashtable byName, both of which are synchronized objects. By changing the Vector byNumber to an ArrayList byNumber I was able to get a 110% improvement in performance (number of searches per second). My question is: do the fields byNumber and byName have to be synchronized, and what can happen if I change them to ArrayList and HashMap, which are not synchronized? Can this corrupt the index or the integrity of the results? Thanks, Aviran
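A sketch of the change being proposed, with illustrative field names. The point is that a list which is fully populated before any concurrent reads begin can safely be an unsynchronized ArrayList, whereas Vector takes a monitor on every get(). Whether FieldInfos really follows that build-once, read-only pattern is exactly the assumption the post is asking the list to confirm.

```java
import java.util.*;

public class FieldLookup {
    // Stand-in for FieldInfos.byNumber: built completely before any searches run.
    // Vector.get() acquires a lock per call; ArrayList.get() does not, which is
    // safe here only because the list is never modified while readers are active.
    static final List<String> byNumber = new ArrayList<>(
        Arrays.asList("contents", "title", "date"));

    static String fieldName(int fieldNumber) {
        return byNumber.get(fieldNumber); // no monitor acquired, unlike Vector.get
    }
}
```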
RE: How would you delete an entry that was indexed like this
This is kind of a problem: in order to delete documents using terms you need to have a keyword field which contains a unique value, otherwise you might end up deleting more than you want. -Original Message- From: Mike Hogan [mailto:[EMAIL PROTECTED] Sent: Friday, December 05, 2003 1:06 PM To: [EMAIL PROTECTED] Subject: How would you delete an entry that was indexed like this Hi, If I index a document like this: IndexWriter writer = createWriter(); Document document = new Document(); document.add(Field.Text(ID_FIELD_NAME, componentId)); document.add(Field.Text(CONTENTS_FIELD_NAME, componentDescription)); writer.addDocument(document); writer.optimize(); writer.close(); What code must I execute to later delete the document? (I tried following the docs and what's done in the code and test cases. I saw Terms being used to identify the document to delete, but I am not clear what value to put in the Term, as I do not know how Terms relate to Fields.) Many thanks, Mike.
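A toy model of delete-by-term, not Lucene's API, just an illustration of the trade-off described above: deleting by a term that several documents share removes them all, while deleting by a unique keyword ID removes exactly one.

```java
import java.util.*;

public class DeleteByTerm {
    // Toy inverted index: each (field, value) term lists the documents
    // containing it. Mirrors the shape of IndexReader.delete(Term), which
    // deletes every document matching the given term.
    static final Map<String, Set<String>> termToDocs = new HashMap<>();

    static void add(String docId, String field, String value) {
        termToDocs.computeIfAbsent(field + ":" + value, k -> new TreeSet<>()).add(docId);
    }

    // Returns how many documents the term deletion removed.
    static int delete(String field, String value) {
        Set<String> hit = termToDocs.remove(field + ":" + value);
        return hit == null ? 0 : hit.size();
    }
}
```

With a shared term like contents:widget two documents disappear at once; with id:1 (a unique keyword) only the intended one does.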
RE: Query creation
You'll need to apply some kind of filter, or add another field to the index which contains only the first word (yes, you'll need to rebuild the index in this case). -Original Message- From: Armbrust, Daniel C. [mailto:[EMAIL PROTECTED] Sent: Thursday, December 04, 2003 5:49 PM To: 'Lucene Users List' Subject: Query creation Is it possible to create a query that would find a match in a document if and only if the query (a one-word query) matched the first word in the field I am searching? Or do I have to rebuild my indexes with a field that only contains the first word? Thanks, Dan
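One hedged way to build that extra field at indexing time; the extraction logic below is illustrative and should mirror whatever the analyzer does to the query word (lowercasing here is an assumption):

```java
public class FirstWord {
    // At indexing time, store firstToken(text) under a separate field
    // (e.g. an illustrative "firstword" keyword field); a TermQuery on that
    // field then matches only documents whose text begins with the query word.
    static String firstToken(String text) {
        String trimmed = text.trim().toLowerCase();
        int space = trimmed.indexOf(' ');
        return space < 0 ? trimmed : trimmed.substring(0, space);
    }
}
```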
RE: RC2 requires reindexing?
You can find RC2 in CVS. -Original Message- From: Jan Agermose [mailto:[EMAIL PROTECTED] Sent: Friday, August 29, 2003 6:32 AM To: Lucene Users List Subject: Re: RC2 requires reindexing? OK, on the first posting about RC2 I looked for it, but as I did not find any RC2 I guessed he was mistaken... but now? What RC2 are you talking about, and if it's 1.3RC2, where do I find it and why does the webpage not mention it (or the download area hold it)? Jan - Original Message - From: Lukas Zapletal [EMAIL PROTECTED] To: Lucene Users List [EMAIL PROTECTED] Sent: Friday, August 29, 2003 12:14 PM Subject: Re: RC2 requires reindexing? It does not need reindexing; it works fine for me. On Thu, 28 Aug 2003 11:27:34 -0400, Terry Steichen [EMAIL PROTECTED] wrote: I just switched to RC2 and found that a number of queries now don't work. (When I switch back to RC1 they work fine.) Can't seem to figure out a pattern regarding those that don't work versus those (the vast majority) that still work fine. I looked in the RC2 source and noticed that the dates on IndexWriter and IndexReader and a bunch of related modules seem to have been changed. Is it necessary to reindex (a major task for my stuff) to use RC2? Regards, Terry -- Lukas Zapletal http://www.tanecni-olomouc.cz/lzap icq: 17569735 mail: lzap_at_root.cz jabber: lzap_at_njs.netlab.cz pgp: 715B 5502 4FB3 65E7 266B 927E CE9F 1D04 0EE2 4DB7
RE: Newbie Questions
1. You need to use MultiFieldQueryParser. 2. I think you should use PorterStemFilter instead of a fuzzy query: http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/analysis/PorterStemFilter.html -Original Message- From: Mark Woon [mailto:[EMAIL PROTECTED] Sent: Tuesday, August 26, 2003 12:54 AM To: [EMAIL PROTECTED] Subject: Newbie Questions Hi all... I've been playing with Lucene for a couple of days now and I have a couple of questions I'm hoping someone can help me with. I've created a Lucene index with data from a database that's in several different fields, and I want to set up a web page where users can search the index. Ideally, all searches should be as Google-like as possible. In Lucene terms, I guess this means the query should be fuzzy. For example, if someone searches for cancer then I'd like to get back all results with any form of the word cancer in them (cancerous, breast cancer, etc.). So far, I seem to be having two problems: 1) How can I search all fields at the same time? The QueryParser seems to only search one specific field. 2) How can I automatically default all searches into fuzzy mode? I don't want my users to have to know that they must add a ~ at the end of all their terms. Thanks, -Mark
Bug: TermQuery toString - incorrect
I have a TermQuery object which contains a term which has a space (two words), but when I do a toString() I get a query that matches as an OR operation. Example: the term "Small Business" results with the toString() method as +(SocioEconomicInformation:Small Business), and the expected result should be +(SocioEconomicInformation:"Small Business"). The problem is that when I try to parse it again I get +(SocioEconomicInformation:Small Content:Business). Because it does not have the double quotes, the parser tokenizes the term "Small Business" into two terms, [Small] and [Business], instead of one, [Small Business]. I use Lucene 1.3 RC1. Aviran
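The fix the report implies can be sketched as: when printing a term whose text contains whitespace, quote it so that reparsing yields a phrase rather than two OR'd terms. The field and term values below are taken from the report; the render helper itself is illustrative, not Lucene's actual toString().

```java
public class TermToString {
    // Render field:text, wrapping the text in double quotes whenever it
    // contains a space, so a round-trip through the query parser keeps
    // the two words together as one phrase.
    static String render(String field, String text) {
        boolean needsQuotes = text.indexOf(' ') >= 0;
        return field + ":" + (needsQuotes ? "\"" + text + "\"" : text);
    }
}
```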
RE: keyword indexing
If you are searching on a keyword field you might need to use TermQuery in order to get an exact match. -Original Message- From: Jan Agermose [mailto:[EMAIL PROTECTED] Sent: Wednesday, July 16, 2003 1:04 PM To: [EMAIL PROTECTED] Subject: keyword indexing I'm having some problems with chars in keywords that are not [a-z0-9] chars. If I have a keyword like "Det Naturvidenskabelige Fakultet" or a name like "Jan Agermose" - well, besides the fact that I need to lowercase the keywords because the query string is lowercased by Lucene, I still cannot get any hits on the keywords: "Det Naturvidenskabelige Fakultet" - hits = 0; Det* - hits!; Det Naturvidenskabelige Fakultet - hits = 0. I can understand the last one, but shouldn't the first one return hits? If not, using keywords seems to be limited to keywords composed of [a-z0-9]+. Now I do a string replace of [^a-z0-9]+ (removing all such chars), but this gives the query parser some problems, I would think - except that in my special case the user is not really free to compose queries on their own, so I can do the same string-replace on the input :-D But I would like the power user to be able to input real queries, and this leaves me with the problem of parsing queries: I would need to do the string replace only within double quotes... This should be Lucene's problem, not mine :-D Am I missing something? Jan Agermose
RE: Maybe a stupid question?
You can add as many fields with the same name as your heart desires to the same document. This will give you multiple values. Aviran -Original Message- From: Olivier Cochet [mailto:[EMAIL PROTECTED] Sent: Thursday, July 10, 2003 10:43 AM To: Lucene Users List Subject: Maybe a stupid question? Hello, I would like to know if it is possible to introduce more than one value for a field in a document. I think it must be possible, but I don't know how to do it. Thanks for answering, and for knocking on my head if I ask stupid questions ;-) olivier cochet from Paris
RE: Results sorted by date instead of score?
You'll need to sort the results after you have collected them. There is a project called SortedField in Lucene's contributions or sandbox (I don't remember exactly which) that will help you sort by any field. -Original Message- From: Wilton, Reece [mailto:[EMAIL PROTECTED] Sent: Wednesday, July 02, 2003 6:01 PM To: [EMAIL PROTECTED] Subject: Results sorted by date instead of score? Search hits come back ordered by score. How do I get my results sorted by the date of the article? I have added the article date as a keyword field.
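A sketch of the sort-after-collecting approach, assuming the article date is stored as a zero-padded yyyymmdd keyword (so plain string comparison agrees with chronological order); the String[] hit representation is illustrative, standing in for a hit plus its stored date field.

```java
import java.util.*;

public class SortHitsByDate {
    // Each hit is {date, title}; resort the collected hits so the newest
    // article comes first, ignoring the original score ordering.
    static List<String[]> newestFirst(List<String[]> hits) {
        List<String[]> out = new ArrayList<>(hits);
        out.sort((a, b) -> b[0].compareTo(a[0])); // descending yyyymmdd
        return out;
    }
}
```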
RE: Using Lucene in an multiple index/large io scenario
You'll probably need to optimize the index more often; this will reduce the number of files Lucene opens. Also, if you can merge several fields into one, that will also reduce the number of files. Aviran -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Sent: Monday, June 30, 2003 3:54 PM To: [EMAIL PROTECTED] Subject: Using Lucene in an multiple index/large io scenario Hello, I am the project manager of the columba.sourceforge.net Java mail-client project, and we integrated Lucene as the search backend half a year ago. It is now working for small-scale mail traffic, but with increasing mail traffic Lucene throws OutOfMemory and TooManyFilesOpen exceptions. I am now wondering if Lucene is capable of doing the job for us (as Otis Gospodnetic suggested) and would appreciate any help and knowledge you can share on this topic. I think the problem arises from the following issues: - Lucene is designed to create an index once in a while, not to update an index frequently. We need it to add and delete documents very often *and* search the index after practically every operation. Has anyone experience running Lucene in such an environment, or do you think it is impossible? - Do you have a suggestion on how to use Lucene in such an environment? It is not very nice code if you have to create a new IndexReader/Writer after every operation. - We introduced a RAM index that is merged into the file index after N operations, to reduce the load and to avoid merging documents that are removed directly after they are added (with filters on the mailboxes that happens very often). Any ideas if that was wise, or is there a better solution? - Does Lucene have problems with many indices in the same virtual machine? We have an index for every mail folder and get TooManyFilesOpen exceptions when having 10 indices open. Maybe we should try to have only a single index that holds all messages?
If you would like to look at source code showing how we implemented all this, see http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/columba/columba/src/mail/core/org/columba/mail/folder/search/LuceneSearchEngine.java?rev=1.7&content-type=text/vnd.viewcvs-markup It's not nice to just give you the plain code and not the relevant snippets, but these are general design issues that I think are better explained in words than in code. I would really like to see Lucene integrated in Columba, but I had to learn that it is no easy task, maybe an impossible one. Based on the responses I will decide if we continue to work with Lucene or sadly have to drop it. Thanks in advance, Timo Stich [EMAIL PROTECTED]
RE: date ranges.....
Use a RangeQuery to search on the date field. -Original Message- From: host unknown [mailto:[EMAIL PROTECTED] Sent: Friday, June 27, 2003 10:39 AM To: [EMAIL PROTECTED] Subject: date ranges Hi all, Here's my scenario: I'm building a calendaring application and using Lucene (one of many times I've used it on our site) for the indexing/retrieval mechanism. The calendar has events. An event consists of: start date, end date, start time, end time, and descriptive information. Most begin and end on the same day, but not all of them, and here's where the problem lies. Let's say an event runs from 20030625 (June 25, 2003) until 20030701, and I want to search all events (several thousand) and know what's happening today (20030627). The results I'm looking for can be described with this SQL statement: SELECT * FROM events WHERE start_date <= 20030627 AND end_date >= 20030627. How do I write this query with Lucene? Many thanks, Dominic
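The reason range queries work here: zero-padded yyyymmdd strings compare lexicographically in date order, so the two SQL comparisons translate into two range checks (in Lucene terms, two RangeQuerys combined in a BooleanQuery). A minimal sketch of the predicate being expressed:

```java
public class DateRange {
    // "Happening on day" means start_date <= day AND end_date >= day.
    // Because the dates are fixed-width yyyymmdd strings, plain string
    // comparison agrees with chronological comparison.
    static boolean happeningOn(String start, String end, String day) {
        return start.compareTo(day) <= 0 && day.compareTo(end) <= 0;
    }
}
```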
RE: query question in trouble
"in" is probably a stop word in your analyzer.

-Original Message-
From: Ryan Clifton [mailto:[EMAIL PROTECTED]
Sent: Wednesday, June 11, 2003 3:13 PM
To: Lucene Users List
Subject: query question in trouble

Hello,

Upon reviewing the results of some queries recently, I noticed that the query "in trouble" always searches for just "trouble". Is "in" a keyword that I'm not aware of? I searched the whole query syntax page and didn't see it mentioned. I tried "an trouble" and the query worked fine. The query parser appears to be stripping out "in" but not doing anything with it. Here's my log:

**Query: in trouble
2003-06-11 12:08:50,540 DEBUG Searching for: textcontent:trouble (Query.toString())
2003-06-11 12:08:50,569 DEBUG 6582 total matching documents

**Query: an trouble
2003-06-11 12:06:11,275 DEBUG Searching for: textcontent:an trouble (Query.toString())
2003-06-11 12:06:12,342 DEBUG 1 total matching documents

Any ideas? Thanks.
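The log is exactly what a stop filter produces: stop words are dropped from the token stream before the query is assembled, so they leave no trace in Query.toString(). A toy sketch, using a small hypothetical stop set that, like the reporter's analyzer, contains "in" but not "an" (real analyzers ship much longer lists):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.StringJoiner;

public class StopFilterSketch {
    // Hypothetical stop set mirroring the behaviour in the log above:
    // "in" is on it, "an" is not.
    static final Set<String> STOP_WORDS =
        new HashSet<>(Arrays.asList("in", "the", "of", "a"));

    // Lowercase, split on whitespace, and silently drop stop words --
    // the same shape as a tokenizer followed by a stop filter.
    static String analyze(String query) {
        StringJoiner kept = new StringJoiner(" ");
        for (String token : query.toLowerCase().split("\\s+")) {
            if (!STOP_WORDS.contains(token)) {
                kept.add(token);
            }
        }
        return kept.toString();
    }

    public static void main(String[] args) {
        System.out.println(analyze("in trouble")); // trouble
        System.out.println(analyze("an trouble")); // an trouble
    }
}
```

If "in" must be searchable, index and query with an analyzer constructed with an empty (or custom) stop-word list.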
RE: Sort results by date alone?
I think I saw a solution for this in the past; try searching the mailing list. In any case, you can always use the SearchBean from the Lucene sandbox to sort by any field.

-Original Message-
From: news [mailto:[EMAIL PROTECTED] On Behalf Of David Weitzman
Sent: Tuesday, May 27, 2003 8:26 PM
To: [EMAIL PROTECTED]
Subject: Sort results by date alone?

I think it's possible, but I'm not sure how Scorers work. I just want to place the most recent hits at the front and the oldest ones at the back (where date is a field in the documents). Is there a simple way to do this?

Thanks,
David Weitzman
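Besides the sandbox SearchBean, a common workaround is to store the date on each document, read it back from the hits, and sort in the application. A sketch assuming the dates are stored as zero-padded yyyymmdd strings, which sort chronologically under plain string comparison:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class SortByDate {
    // Order date strings most-recent first. Zero-padded yyyymmdd strings
    // compare correctly as plain strings, so no date parsing is needed.
    static List<String> newestFirst(List<String> dates) {
        List<String> sorted = new ArrayList<>(dates);
        sorted.sort(Comparator.reverseOrder());
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(newestFirst(
            Arrays.asList("20030101", "20031225", "20020704")));
        // [20031225, 20030101, 20020704]
    }
}
```

In practice you would collect the stored date field from each Hit into such a list (or sort the hit/date pairs together) before display; for large result sets, a field-based sort inside the search engine is cheaper.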
RE: Wildcard workaround
You can also index the file names with a leading character. For instance, the file name file1.exe will be indexed as _file1.exe, and you always add the leading character to the search term. So if the user input is *.exe your query should be _*.exe, and if the user input is fi* you change it to _fi*.

Aviran

-Original Message-
From: David Warnock [mailto:[EMAIL PROTECTED]
Sent: Wednesday, May 28, 2003 10:55 AM
To: Lucene Users List
Subject: Re: Wildcard workaround

Andrei,

I have a file database indexed by content and also by filename. It would be nice if the user could perform a usual search like *.ext. Anybody tried a workaround for this issue? (This is needed only for the name of the file; for the rest of the terms the rules are fine with me.)

If the term begins with * then could you expand it into a set of 36 terms, e.g. a*.ext b*.ext ... z*.ext 0*.ext? No idea how this would compare to the other alternatives for speed, but it would be simple to code and would not increase index size. Of course, if filenames can use Unicode character sets then you have a problem. At that point you would need to check what all the first characters are to know what terms to use (i.e. only create a term for each character that is actually used as the first character of a filename).

HTH
Dave

--
David Warnock, Sundayta Ltd. http://www.sundayta.com
iDocSys for Document Management. VisibleResults for Fundraising.
Development and Hosting of Web Applications and Sites.
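The leading-character trick can be sketched as plain string handling. The sentinel `_` follows Aviran's example; in a real index the filename field must also be kept as a single untokenized term so the prefixed value survives analysis:

```java
public class LeadingCharWildcard {
    static final char SENTINEL = '_';

    // Index time: prefix every filename with the sentinel character.
    static String indexTerm(String filename) {
        return SENTINEL + filename;
    }

    // Query time: prefix the user's pattern the same way. A pattern like
    // "*.exe" becomes "_*.exe", which no longer starts with '*', so the
    // query parser's ban on leading wildcards never applies.
    static String queryTerm(String userPattern) {
        return SENTINEL + userPattern;
    }

    public static void main(String[] args) {
        System.out.println(indexTerm("file1.exe")); // _file1.exe
        System.out.println(queryTerm("*.exe"));     // _*.exe
        System.out.println(queryTerm("fi*"));       // _fi*
    }
}
```

The cost is that the sentinel must be added consistently on both paths; forgetting it on either side makes every filename query miss.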
RE: Querying Question
You should not tokenize the field; instead you should use

doc.add(new Field(name, value, true, true, false));

or

doc.add(Field.Keyword(name, value));

Aviran

-Original Message-
From: Rob Outar [mailto:[EMAIL PROTECTED]
Sent: Thursday, April 03, 2003 5:27 PM
To: Lucene Users List
Subject: RE: Querying Question

Use the following type of Field: doc.add(new Field(name, value, true, true, true));

Thanks,
Rob

-Original Message-
From: Aviran Mordo [mailto:[EMAIL PROTECTED]
Sent: Thursday, April 03, 2003 5:19 PM
To: 'Lucene Users List'
Subject: RE: Querying Question

Did you index the value field as a keyword?

Aviran

-Original Message-
From: Rob Outar [mailto:[EMAIL PROTECTED]
Sent: Thursday, April 03, 2003 5:11 PM
To: Lucene Users List
Subject: Querying Question
Importance: High

Hi all,

I am a little fuzzy on complex querying using AND, OR, etc. For example, I have the following name/value pairs:

file 1: name = checkpoint, value = filename_1
file 2: name = checkpoint, value = filename_2
file 3: name = checkpoint, value = filename_3
file 4: name = checkpoint, value = filename_4

I ran the following query:

name:"checkpoint" AND value:"filename_1"

Instead of getting back file 1, I got back all four files. Then, after trying different things, I did:

+(name:"checkpoint") AND +(value:"filename_1")

and it returned file 1. Our project queries solely on name/value pairs, and we need the ability to query using AND, OR, NOT, etc. What is the correct syntax for such queries?

The code I use is:

QueryParser p = new QueryParser("", new RepositoryIndexAnalyzer());
this.query = p.parse(query.toLowerCase());
Hits hits = this.searcher.search(this.query);

Thanks as always,
Rob
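As Rob found, prefixing every clause with `+` (required) is the most predictable way to express "all pairs must match". A hypothetical helper, not part of Lucene, that assembles such a query string from name/value pairs (the field names mirror the example above):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class NameValueQuery {
    // Build a query string requiring every name/value pair to match.
    // '+' marks a clause as required, which avoids any ambiguity about
    // the parser's default operator; quoting keeps each value one term.
    static String requireAll(Map<String, String> pairs) {
        StringJoiner q = new StringJoiner(" ");
        for (Map.Entry<String, String> e : pairs.entrySet()) {
            q.add("+" + e.getKey() + ":\"" + e.getValue() + "\"");
        }
        return q.toString();
    }

    public static void main(String[] args) {
        Map<String, String> pairs = new LinkedHashMap<>();
        pairs.put("name", "checkpoint");
        pairs.put("value", "filename_1");
        System.out.println(requireAll(pairs));
        // +name:"checkpoint" +value:"filename_1"
    }
}
```

Note that the query-side fix only helps if the fields were indexed untokenized (as keywords), so that checkpoint and filename_1 exist in the index as single terms.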