Re: Patches for RussianAnalyzer
Erik, please look at my second message, sent without the attachment; the patch text is in the message body. Vladimir.

On Mon, 29 Mar 2004 12:06:45 -0500, Erik Hatcher [EMAIL PROTECTED] wrote:

Vladimir, I have just taken a look at your submitted patches. I have no objections to making Cp1251 the default charset used in the no-arg constructor to RussianAnalyzer, but all of your other changes are formatting, along with the addition of some other constructors. Could you please provide a functionality-only diff of your patches, preferably in a single file attached to a Bugzilla issue? Thanks, Erik

On Mar 17, 2004, at 8:25 AM, Vladimir Yuryev wrote:

Dear developers! I am a Lucene user writing to you about RussianAnalyzer. The one problem in working with this Analyzer is its Russian-encoding parameter (as you know, the set of code tables for a single language is always a source of confusion). Eastern European users, and Russian-speaking users generally, mostly work with the windows-1251 encoding, since MS Windows is the most widespread client platform. I propose updating the no-parameter constructor to default to Cp1251. See the attached file RussianAnalyzerPatchs.tgz, containing:

RussianAnalyzer.java.patch
RussianLetterTokenizer.java.patch
RussianLowerCaseFilter.java.patch
RussianStemFilter.java.patch
TestRussianAnalyzer.java.patch

Such an update would remove confusion (for beginners in Lucene, or beginners in Russian) and make it easier to use the Analyzers when switching languages in multilingual search. Regards, Vladimir Yuryev.

- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
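The change Erik agreed to - defaulting the no-arg constructor to Cp1251 - would look roughly like this (a sketch against the Lucene 1.3-era contrib API, where RussianAnalyzer is configured with a char[] charset table and RussianCharsets provides the tables; the constructor bodies here are illustrative, not the submitted patch):

```java
// Sketch only: not the actual patch. RussianAnalyzer in this era takes a
// char[] charset table; RussianCharsets provides CP1251, KOI8, and Unicode tables.
public final class RussianAnalyzer extends Analyzer {
    private char[] charset;

    // Proposed: no-arg constructor defaults to windows-1251, the most
    // widespread encoding on Russian-language client platforms.
    public RussianAnalyzer() {
        this(RussianCharsets.CP1251);
    }

    public RussianAnalyzer(char[] charset) {
        this.charset = charset;
    }

    // tokenStream() and the rest of the class are unchanged.
}
```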
Re: Patches for RussianAnalyzer
On Mar 30, 2004, at 3:38 AM, Vladimir Yuryev wrote:

Erik, please look at my second message, sent without the attachment; the patch text is in the message body. Vladimir.

I don't have that e-mail you refer to. Please use the standard Jakarta Bugzilla issue tracking system, though. You can attach a file to an issue after you create it - e-mail ends up mangling in-line patches. What I'm after is a clean patch that *only* changes the functionality you desire, not the code formatting as well. We can clean up code formatting in another pass if needed - or I can just do that on my end after reviewing the functionality-only patch. Erik
Re: Re: UNIX command-line indexing script?
Charlie, I wrote this in Java. Of course I am ready to share. But I have some problems when indexing large volumes of data; I am still testing. Linto

On Fri, 26 Mar 2004 Charlie Smith wrote:

So, Linto, did you write this in Perl or Java? Would you be willing to part with a copy of the source?

Linto wrote on 3/16/04:

I have written one that will index PDF, DOC, XLS, XML, HTML, TXT and plain-text files. I wrote it based on the demo application, using other open source components: POI by Apache (for Word and Excel) and PDFBox. I modified the client interface also; now it looks like Google. I still have a couple of things to do:

1) At present I'm using the UNIX 'file' command to check whether a file is plain text. This spawns a process and takes more time. The advantage is that on UNIX-based machines the file extension is not important (it uses magic numbers).
2) Information such as index location, directory, URL, etc. should be kept in an XML file, so that it can be dynamic.
3) Categories.

Since the Apache guys provided a good framework, everything was made easy. Thanks, guys! Linto

On Sat, 13 Mar 2004 Charlie Smith wrote:

Has anyone written a simple UNIX command-line indexing script which will read a bunch of different kinds of docs and index them? I'd like to make a cron job out of this so as to be able to come back and read it later during a search. A Perl or Java script would be fine.
Re: Patchs for RussianAnalyzer
Erik, I made BUG # 28050. Vladimir On Tue, 30 Mar 2004 06:19:04 -0500 Erik Hatcher [EMAIL PROTECTED] wrote: On Mar 30, 2004, at 3:38 AM, Vladimir Yuryev wrote: Erik, Look, please second my letter whithout attachment. It has the texts in body letter. Vladimir. I don't have that e-mail you refer to. Please use the standard Jakarta Bugzilla issue tracking system, though. You can place an attachment to an issue after you create it - e-mail ends up mangling in-line patches. What I'm after is a clean patch that *only* changes the functionality you desire, not code formatting also. We can clean up code formatting in another pass if needed - or I can just do that on my end after reviewing the functionality-only patch. Erik - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Lucene optimization with one large index and numerous small indexes.
Esmond Pitt wrote:

Don't want to start a buffer size war, but these have always seemed too small to me. I'd recommend upping both InputStream and OutputStream buffer sizes to at least 4k, as this is the cluster size on most disks these days, and also a common VM page size.

Okay.

Reading and writing in smaller quantities than these is definitely suboptimal.

This is not obvious to me. Can you provide Lucene benchmarks which show this? Modern filesystems have extensive caches, perform read-ahead and delay writes, so file-based system calls do not correspond closely to physical operations. To my thinking, the primary role of file buffering in Lucene is to minimize the overhead of the system call itself, not to minimize physical I/O operations. Once the overhead of the system call is made insignificant, larger buffers offer little measurable improvement.

Also, we cannot increase the size of these blindly. Buffers are the largest source of per-query memory allocation in Lucene, with one (or two, for phrases and spans) allocated for every query term. Folks whose applications perform wildcard queries have encountered out-of-memory exceptions with the current buffer size. Possibly one could implement a term wildcard mechanism which does not require a buffer per term, or perhaps one could allocate small buffers for infrequent terms (the vast majority). If such changes were made, then it might be feasible to bump up the buffer size somewhat. But, back to my first point, one must first show that larger buffers offer significant performance improvements.

Doug
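Doug's memory concern is easy to make concrete with back-of-the-envelope arithmetic (the buffer size and term count below are hypothetical, chosen only to illustrate the scaling):

```java
public class BufferMemoryEstimate {
    // Rough per-query buffer memory: one buffer per expanded query term.
    static long estimate(int bufferBytes, int termCount) {
        return (long) bufferBytes * termCount;
    }

    public static void main(String[] args) {
        // A wildcard query expanding to 50,000 terms with 1 KB buffers
        // (hypothetical numbers) needs ~50 MB of buffer space alone;
        // quadrupling the buffer to 4 KB pushes that to ~200 MB.
        System.out.println(estimate(1024, 50_000));
        System.out.println(estimate(4096, 50_000));
    }
}
```

This is why a blanket 4x buffer increase trades one failure mode (syscall overhead) for another (out-of-memory on term-heavy queries).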
Re: too many files open error
Thanks for the information. I downloaded 1.3-rc2 and put an IndexReader.close() at the end of the search routine. This seems to have cleared up the problems. I also modified the demo source code for results.jsp to keep a reference to the IndexReader so that it can be closed at the end of the search, i.e.:

searcher = new IndexSearcher(ir = IndexReader.open(indexName)); //create an IndexSearcher for our page
...
ir.close();

Erik wrote, 3/27/2004 4:44:28 AM:

On Mar 27, 2004, at 1:28 AM, Charlie Smith wrote:

What would be the URL for the JUnit stuff?

Look in the src/test directory of where you checked out Lucene. All JUnit tests live there and below.

BTW: I was able to build a new Index.class file, with the additional line iw.setUseCompoundFile(true) after extracting the lucene-1.4-rc1-dev.jar. Then reindexed. Guess what - no worky. :(

Maybe you'd care to share some *technical* details to elaborate on no worky?!

Still get the too many files open error on invoking a modified results.jsp (the one that comes with Lucene). The index is created with a call to the IndexWriter.class file. The Index.class file calls IndexWriter, and I modified it to have the setUseCompoundFile(true). Added lines 350 and 442 as suggested.

What Index.class are you talking about? The demo application?

Can I get 1.3-RC2? Could someone point me to the URL for this download please ;)

Use CVS :)

I noticed the following entry in the mail archives: http://www.mail-archive.com/[EMAIL PROTECTED]/msg06118.html along with 139 others that dealt with the too many files open problem. Looks like this is a high-priority problem that might justify a new release in and of itself?

People have been using Lucene for years, managing the file handle issue by setting ulimit and other tricks like optimizing to reduce the number of segments. So it is not as much a problem as it is a known issue that can be managed.

My ulimit is set to unlimited. From what I can tell, it is a stress-test issue that seems to work under 1.3-rc2.
Would anyone understand the differences well enough to know if it will work as well under the next stable release of Lucene?

I'm not up to speed on what the issues with 1.3 final are - I've just started hearing about it. Is there a reproducible example that demonstrates a problem? Erik

John Brown has made his source available. Go to Google and search for docSearcher. He seems quite willing to help where needed.

Use the results.jsp routine that comes with Lucene to test, with the following changes:

snip
<   Analyzer analyzer = new StopAnalyzer();     //construct our usual analyzer
---
>   Analyzer analyzer = new StandardAnalyzer(); //construct our usual analyzer
68,69c54,56
<   query = QueryParser.parse(queryString, "contents", analyzer); //parse the
<   } catch (ParseException e) {                //query and construct the Query
---
>   query = QueryParser.parse(queryString, "body", analyzer);     //parse the
>   //query = query.rewrite(reader);
>   } catch (ParseException e) {                //query and construct the Query
87a75
>   <tr><td><font size=5>Search results for </font><font size=5 color=blue><%=queryString%></td></tr>
108a96,97
>   // cws: 2/25/04 added this to get format href link.
>   RE r = new RE("/path/to/site/root/");
111d99
<   <tr>
114,122c102,131
<   String doctitle = doc.get("title");         //get its title
<   String url = doc.get("url");                //get its url field
<   if ((doctitle == null) || doctitle.equals("")) //use the url if it has no title
<     doctitle = url;                           //then output!
<   %>
<   <td><a href="<%=url%>"><%=doctitle%></a></td>
<   <td><%=doc.get("summary")%></td>
<   </tr>
---
>   String path = doc.get("path");
>   String type = doc.get("type");
>   String title = doc.get("title");
>   // cws: 2/25/04 added this to get format href link.
>   String path_part = r.subst(path, "/");
>   String summary = doc.get("summary");
>   String size = doc.get("size");
>   String date = doc.get("mod_date");
>   // date formatting
>   java.util.Date bd = DateField.stringToDate(date);
>   Calendar nowD = Calendar.getInstance();
>   nowD.setTime(bd);
>   int mon = nowD.get(nowD.MONTH)+1;
>   int year = nowD.get(nowD.YEAR);
>   int day = nowD.get(nowD.DAY_OF_MONTH);
>   date =
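Charlie's fix - closing the IndexReader once the request is served - is easiest to make leak-proof with a finally block, so the file handles are released even when the search throws (a sketch against the Lucene 1.3/1.4 API; the index path is hypothetical):

```java
// Sketch: always release the index file handles, even on error.
IndexReader reader = IndexReader.open("/path/to/index"); // hypothetical path
try {
    Searcher searcher = new IndexSearcher(reader);
    Hits hits = searcher.search(query);
    // ... render the hits ...
} finally {
    reader.close(); // frees file handles; prevents "too many files open"
}
```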
Re: Lucene 1.4 - lobby for final release
Your opinion, of course, on the issue of too many files open not being a bug. I found it to be otherwise. Thanks for the info on popular elections. Being a newbie to this list, I am finding most others on the list a bit more pleasant. But then, you're not up for a popular election, are you? I appreciate all you do to keep us from shooting ourselves in the foot. Thank you. :(

cutting 3/29/2004 11:27:58 AM

Charlie Smith wrote:

I'll vote yes - please release a new version with too many files open fixed.

There is no too many files open bug, except perhaps in your application. It is, however, an easy problem to encounter if you don't close indexes or if you change Lucene's default parameters. It will be considerably harder to make happen in 1.4, to keep so many people from shooting themselves in the foot. Also, releases are not made by popular election. They are made by volunteer developers when deemed appropriate. If you'd like to get more involved in Lucene's development, please contribute constructive efforts to the lucene-dev mailing list.

Maybe default setUseCompoundFile to true on this go-around.

This was discussed at length on the developer mailing list a while back. The change has been made and will be present in 1.4.

Otherwise, how can I get 1.3-RC2? I can't seem to locate it.

The second hit for a Google search on lucene 1.3RC2 reveals: http://www.apachenews.org/archives/000134.html These search engines sure are amazing, aren't they!

Doug
The Filter got called more than one time
Hi, We implemented a Filter that performs filtering based on some internal pricing logic. While testing we discovered that this filter got called several times, not exactly once as the FAQ says, and the number of calls made depended on how big the result set was. I printed out the calling stack and discovered that Hits.doc(n) also calls IndexSearcher.search(Query, Filter) when more docs are needed. I can understand the lazy retrieval as an optimization, but it seems wrong to me to just call the search function again and again. At the least, the filter should not be invoked over and over. The logic in our filter is already a little heavier than usual, and we definitely want to reduce the number of calls to it. Is there any way we can work around this?

Call to Searcher.search():

at com.comergent.reference.appservices.productService.search.query.PricingFilter.bits(PricingFilter.java:244)
at com.comergent.api.appservices.search.query.CmgtFilter.bits(CmgtFilter.java:108)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:132)
at org.apache.lucene.search.Hits.getMoreDocs(Hits.java:93)
at org.apache.lucene.search.Hits.<init>(Hits.java:80)
at org.apache.lucene.search.Searcher.search(Searcher.java:71)

Call to Hits.doc():

at com.comergent.reference.appservices.productService.search.query.PricingFilter.bits(PricingFilter.java:244)
at com.comergent.api.appservices.search.query.CmgtFilter.bits(CmgtFilter.java:108)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:132)
at org.apache.lucene.search.Hits.getMoreDocs(Hits.java:93)
at org.apache.lucene.search.Hits.hitDoc(Hits.java:153)
at org.apache.lucene.search.Hits.doc(Hits.java:118)

Thanks, Ching-pei
Re: The Filter got called more than one time
Use a caching mechanism for your filter, so the bitset is not regenerated. CachingWrapperFilter is your friend :) Erik

On Mar 30, 2004, at 2:28 PM, Ching-Pei Hsing wrote:

Hi, We implemented a Filter that performs filtering based on some internal pricing logic. While testing we discovered that this filter got called several times, not exactly once as the FAQ says, and the number of calls made depended on how big the result set was. I printed out the calling stack and discovered that Hits.doc(n) also calls IndexSearcher.search(Query, Filter) when more docs are needed. I can understand the lazy retrieval as an optimization, but it seems wrong to me to just call the search function again and again. At the least, the filter should not be invoked over and over. The logic in our filter is already a little heavier than usual, and we definitely want to reduce the number of calls to it. Is there any way we can work around this?

Thanks, Ching-pei
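Erik's suggestion looks roughly like this (a sketch; CachingWrapperFilter is the class name in the Lucene 1.4 codebase, and PricingFilter stands in for the poster's custom filter):

```java
// Sketch: wrap the expensive filter once and reuse the wrapper for every
// search. CachingWrapperFilter caches the computed BitSet per IndexReader,
// so Hits' internal re-searches reuse it instead of calling bits() again.
Filter pricingFilter = new PricingFilter();            // expensive custom filter
Filter cachedFilter  = new CachingWrapperFilter(pricingFilter);

Hits hits = searcher.search(query, cachedFilter);      // bits() computed once
Hits more = searcher.search(otherQuery, cachedFilter); // cached for the same reader
```

The key is to keep one wrapper instance around between searches; constructing a new wrapper per query would defeat the cache.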
[patch] MultiSearcher should support getSearchables()
Seems to only make sense to allow a caller to find the searchables a MultiSearcher was created with:

'diff' -uN MultiSearcher.java.bak MultiSearcher.java
--- MultiSearcher.java.bak	2004-03-30 14:57:41.660109642 -0800
+++ MultiSearcher.java	2004-03-30 14:57:46.530330183 -0800
@@ -208,4 +208,8 @@
     return searchables[i].explain(query,doc-starts[i]); // dispatch to searcher
   }
 
+  public Searchable[] getSearchables() {
+    return searchables;
+  }
+
 }

-- 
Please reply using PGP. http://peerfear.org/pubkey.asc
NewsMonster - http://www.newsmonster.org/
Kevin A. Burton, Location - San Francisco, CA, Cell - 415.595.9965
AIM/YIM - sfburtonator, Web - http://peerfear.org/
GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D 8D04 99F1 4412
IRC - freenode.net #infoanarchy | #p2p-hackers | #newsmonster
Near performance question
Based on the nature of our documents, we sometimes experience extremely long response times when executing NEAR operations against a document - sometimes several minutes, even though the operation is restricted to a single document. Our analysis of the code indicates (we think) that it:

1. Looks up each of the terms in the word.dbx file.
2. Intersects the occurrence lists. (So far so good!)
3. Takes each gid found in the occurrence list and: finds its parent, right up to the root of the document (in dom.dbx); traverses the tree depth-first until it finds the node text of interest; and does the expected scan to find out whether the term-distance requirement is satisfied.

We did some timings on our document (Rusticus). It started off taking 1 second per occurrence and grew to 25 seconds. If we changed the dom.dbx buffers, we got significant improvement, but it is still relatively slow (343 occurrences).

QUESTION: It seems to us the occurrences are ordered by gid (and we don't do any updating). Is there a simple way to reuse the tree-level positioning information from the prior occurrence when processing the current occurrence, so that we don't have to start again from the document root?

Thanks, Joe Paulsen
Performance of hit highlighting and finding term positions for a specific document
I'm playing with this package: http://home.clara.net/markharwood/lucene/highlight.htm

Trying to do hit highlighting. This implementation uses another Analyzer pass to find the positions of the result terms. This seems very inefficient, since Lucene already knows the frequency and position of the given terms in the index.

My question is whether it's hard to find a TermPosition for a given term in a given document, rather than across the whole index. IndexReader.termPositions(Term term) is term-specific, not term-and-document-specific.

Also, it seems that after all this time Lucene should have efficient hit highlighting as a standard package. Is there any interest in seeing a contribution in the sandbox for this if it uses the index positions?

Kevin A. Burton
Re: [patch] MultiSearcher should support getSearchables()
On Mar 30, 2004, at 5:59 PM, Kevin A. Burton wrote:

Seems to only make sense to allow a caller to find the searchables a MultiSearcher was created with:

Could you elaborate on why it makes sense? What if the caller changed a Searchable in the array? Would anything bad happen? (I don't know, haven't looked at the code.)
Re: Performance of hit highlighting and finding term positions for a specific document
On Mar 30, 2004, at 7:56 PM, Kevin A. Burton wrote:

Trying to do hit highlighting. This implementation uses another Analyzer to find the positions of the result terms. This seems very inefficient, since Lucene already knows the frequency and position of the given terms in the index.

What if the original analyzer removed stop words, stemmed, and injected synonyms?

Also, it seems that after all this time Lucene should have efficient hit highlighting as a standard package. Is there any interest in seeing a contribution in the sandbox for this if it uses the index positions?

Big +1, regardless of the implementation details. Hit highlighting is so commonly requested that having it available at least in the sandbox, or perhaps even in the core, makes a lot of sense.

Erik
Re: Performance of hit highlighting and finding term positions for a specific document
I agree with you that a highlighting package should be available directly from the Lucene website. For such a much-desired feature, a dependency on a personal web site seems a little odd to me. It would also mean the community maintains this functionality, which would seem appropriate.

cheers, sv

On Tue, 30 Mar 2004, Kevin A. Burton wrote:

I'm playing with this package: http://home.clara.net/markharwood/lucene/highlight.htm

Trying to do hit highlighting. This implementation uses another Analyzer to find the positions of the result terms. This seems very inefficient, since Lucene already knows the frequency and position of the given terms in the index.

My question is whether it's hard to find a TermPosition for a given term in a given document, rather than across the whole index. IndexReader.termPositions(Term term) is term-specific, not term-and-document-specific.

Also, it seems that after all this time Lucene should have efficient hit highlighting as a standard package. Is there any interest in seeing a contribution in the sandbox for this if it uses the index positions?
Re: Performance of hit highlighting and finding term positions for a specific document
Erik Hatcher wrote:

On Mar 30, 2004, at 7:56 PM, Kevin A. Burton wrote:

Trying to do hit highlighting. This implementation uses another Analyzer to find the positions of the result terms. This seems very inefficient, since Lucene already knows the frequency and position of the given terms in the index.

What if the original analyzer removed stop words, stemmed, and injected synonyms?

Just use the same analyzer :) ... I agree it's not the best approach, for this reason and the CPU cost.

Also, it seems that after all this time Lucene should have efficient hit highlighting as a standard package. Is there any interest in seeing a contribution in the sandbox for this if it uses the index positions?

Big +1, regardless of the implementation details. Hit highlighting is so commonly requested that having it available at least in the sandbox, or perhaps even in the core, makes a lot of sense.

Well, if we can make it efficient by using the frequency and positions of terms, we're all set :) ... I just need to figure out how to do this efficiently per document.

Kevin
Re: [patch] MultiSearcher should support getSearchables()
Erik Hatcher wrote:

On Mar 30, 2004, at 5:59 PM, Kevin A. Burton wrote:

Seems to only make sense to allow a caller to find the searchables a MultiSearcher was created with:

Could you elaborate on why it makes sense? What if the caller changed a Searchable in the array? Would anything bad happen? (I don't know, haven't looked at the code.)

Yes... something bad could happen... but doing that would be amazingly stupid. We should probably recommend that the array be treated as read-only.

Kevin
Re: Performance of hit highlighting and finding term positions for a specific document
Kevin A. Burton wrote:

I'm playing with this package: http://home.clara.net/markharwood/lucene/highlight.htm

Trying to do hit highlighting. This implementation uses another Analyzer to find the positions of the result terms. This seems very inefficient, since Lucene already knows the frequency and position of the given terms in the index. My question is whether it's hard to find a TermPosition for a given term in a given document, rather than across the whole index. IndexReader.termPositions(Term term) is term-specific, not term-and-document-specific.

As far as I know, it's not currently possible to get this information from a standard Lucene index.

Also, it seems that after all this time Lucene should have efficient hit highlighting as a standard package. Is there any interest in seeing a contribution in the sandbox for this if it uses the index positions?

I've been meaning to look into good ways to store token offset information to allow for very efficient highlighting, and I believe Mark may also be looking into improving the highlighter via other means, such as temporary RAM indexes. Search the archives for background on some of the ideas we've tossed around ('Dmitry's Term Vector stuff, plus some' and 'Demoting results' come to mind as threads that touch this topic).

Regards, Bruce Ritchie http://www.jivesoftware.com/
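Kevin's narrower question - positions of one term within one document - can be approached with the existing API by seeking the term's position enumeration to that document (a sketch; TermPositions extends TermDocs, whose skipTo advances to the first document at or beyond the target - verify its availability against your Lucene version):

```java
// Sketch: fetch positions of `term` within a single document `docId`.
// termPositions(term) enumerates (doc, freq, positions) for one term;
// skipTo jumps the enumeration to the first document >= docId.
TermPositions tp = reader.termPositions(term);
try {
    if (tp.skipTo(docId) && tp.doc() == docId) {
        int freq = tp.freq();
        for (int i = 0; i < freq; i++) {
            int position = tp.nextPosition(); // token position within the doc
            // map the token position back to text offsets for highlighting
        }
    }
} finally {
    tp.close();
}
```

Note this yields token positions, not character offsets; mapping positions back to the original text is the part Bruce describes as missing from a standard index.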
Re: [patch] MultiSearcher should support getSearchables()
On Mar 30, 2004, at 8:52 PM, Kevin A. Burton wrote:

Erik Hatcher wrote:

Could you elaborate on why it makes sense? What if the caller changed a Searchable in the array? Would anything bad happen? (I don't know, haven't looked at the code.)

Yes... something bad could happen... but doing that would be amazingly stupid. We should probably recommend that the array be treated as read-only.

No question that it'd be unwise to do. The same argument could be made for making everything public and just saying it'd be stupid to misuse it. I'd rather opt on the side of safety. Besides, you haven't provided a use case for why you need to get the searchers back from a MultiSearcher :)

Erik
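Erik's safety concern is the classic argument for a defensive copy: return a clone of the internal array rather than the array itself. A generic, self-contained illustration of the idiom (not the actual MultiSearcher code):

```java
import java.util.Arrays;

// Defensive-copy idiom: callers get a snapshot of the internal array,
// so mutating the returned array cannot corrupt the holder's state.
public class Holder {
    private final String[] items;

    public Holder(String[] items) {
        this.items = items.clone(); // copy in
    }

    public String[] getItems() {
        return items.clone();       // copy out
    }

    public static void main(String[] args) {
        Holder h = new Holder(new String[] {"a", "b"});
        String[] out = h.getItems();
        out[0] = "mutated";         // only the caller's copy changes
        System.out.println(Arrays.toString(h.getItems()));
    }
}
```

Applied to the patch, getSearchables() would return searchables.clone(), keeping the convenience while removing the mutation hazard, at the cost of one small array allocation per call.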