thanks for your mail
Received your mail we will get back to you shortly - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
'Sponsored' links
I am a newbie to Lucene, and this is my first serious posting to Lucene-user. This is to solicit comment on the problem of supplying a sponsored-links capability within Lucene. This capability would not affect which documents are returned by a query, but would cause any 'sponsored' documents present among the results to be displayed before the other documents in the returned list. I have looked over the correspondence in Lucene-user but have not found anything addressing this topic; if I have missed it, please tell me where and when, and ignore the rest of this.

It seems to me that there are three ways to achieve the capability:

1. Preset boost values for 'sponsored' documents, with an implied burden of reindexing when sponsors are modified.
2. Post-qualify documents present in the hit list for their sponsorship status, building a new hit list.
3. Modify the query to search using both the full query as an unsponsored boolean clause with the default boost value and, for each sponsor, a repeat of the full query ANDed with that sponsor with the appropriate boost value.

Are there other strategies I have not considered? Assuming a small list of sponsors (10 or fewer) and low volatility amongst the sponsors (1 change / month or less), which method is best?

I have been pursuing method #1, almost to the exclusion of the others, but have encountered an unknown difficulty in the implementation (separate posting). In particular, while it is clear that #3 is doable, I know nothing about the searching burden added by multiplying the user's query by one plus the count of sponsors.

Regarding #3, if my understanding is right, then given:

    Sponsor names:     s1, s2, s3 ...
    words or phrases:  s1w1, s1w2, ... , s2w1, s2w2, ... , s3w1 ...
    boost values:      s1v, s2v, s3v

and query q as user input, form:

    q or (q and (s1w1 | s1w2 | s1w3 | ...)^s1v)
      or (q and (s2w1 | s2w2 ...)^s2v)
      or (q and (s3w1 ...)^s3v)

Is this correct? Does the search strategy identify any kind of intermediate sublist to speed up searching? (But then it would start to resemble #2.)

Rolling one's own for #2 would run query q and get the HitCollector, then separately run queries for each of:

    s1w1 | s1w2 | s1w3 | ...
    s2w1 | s2w2 ...
    s3w1 ...

and merge each hit collector with the one from query q. (Just AND the bitsets???) Lastly, adjust scores and form a new composite HitCollector.

By this time I have told everyone much more than I know. Stray thought: can HitCollectors be cached at application init?

There are many other questions regarding details of implementation, but their proper place is another communication. Just preparing this document for dissemination has helped greatly. All and any comments are much appreciated.

Thank you all.
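The query expansion described for method #3 can be sketched as plain string building. This is only an illustration under assumptions: the class and method names (SponsoredQueryBuilder, Sponsor, expand) are hypothetical, the output uses the OR / AND / ^boost spelling of Lucene's query syntax in place of the "|" shorthand above, and special characters in the user query or sponsor words are not escaped.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that builds the expanded query string for method #3. */
final class SponsoredQueryBuilder {

    /** One sponsor: its trigger words or phrases and the boost to apply. */
    static final class Sponsor {
        final List<String> words;
        final float boost;

        Sponsor(List<String> words, float boost) {
            this.words = words;
            this.boost = boost;
        }
    }

    /**
     * Returns "(q) OR ((q) AND (w1 OR w2 ...)^boost) OR ..." with one extra
     * clause per sponsor. Sponsors without words add no clause.
     */
    static String expand(String userQuery, List<Sponsor> sponsors) {
        StringBuilder sb = new StringBuilder();
        sb.append('(').append(userQuery).append(')');
        for (Sponsor s : sponsors) {
            if (s.words.isEmpty()) {
                continue;
            }
            List<String> terms = new ArrayList<>();
            for (String w : s.words) {
                // Multi-word entries are quoted so they are treated as phrases.
                terms.add(w.indexOf(' ') >= 0 ? "\"" + w + "\"" : w);
            }
            sb.append(" OR ((").append(userQuery).append(") AND (")
              .append(String.join(" OR ", terms))
              .append(")^").append(s.boost).append(')');
        }
        return sb.toString();
    }
}
```

The resulting string grows by one clause per sponsor, which is the "one plus the count of sponsors" multiplication mentioned above; whether that cost matters in practice is not something this sketch measures.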
Intermediate indexing before final
I am a newbie to Lucene, and have been learning by experiment and from the demos. A problem has arisen in indexing a document after creation, and before indexing in the permanent index. It is being indexed to this small lookaside index in order to determine whether it is sponsored [i.e. contains any word that causes it to be included in one of the 'sponsored' document levels.] (A separate letter deals with the larger issues of sponsorship.) If it is sponsored, then a setBoost for the document will be issued, with a level-dependent value.

The code in question arises from within IndexHTML near:

    doc = new HTMLDocument(file);
    writer.addDocument(doc);

In the case at issue, this code has been changed to:

    doc = new HTMLDocument(file);
    int boost = sponsoredValue(doc);
    doc.setBoost(boost);
    writer.addDocument(doc);

The sponsoredValue method never returns. The exception occurs after a longish delay in Eclipse, about 2-3 seconds. The document used is http://www.w3.org/TR/xquery stored as a local file. The same document indexes correctly when the calls to sponsoredValue and setBoost are removed. HTMLDocument was modified in minor ways. HTMLParser is destined for modification, but is still vanilla. Note that altering RAMDirectory to FSDirectory makes no difference and does not change the behavior.

I greatly appreciate any help, thank you all.

the Document doc:

    url:      Keyword, string
    file:     Unindexed, string
    modified: Keyword, string
    uid:      as in HTMLdemo, string
    contents: Text, reader
    title:    Text, string
    metadata: Text, string

the code:

    private static RAMDirectory ramDir = null;
    private static IndexWriter ramWriter = null;
    private static IndexReader ramReader = null;
    private static IndexSearcher ramSearcher = null;

    public int sponsoredValue(Document doc) {
        .
        .
        .
        ramDir = new RAMDirectory();
        ramWriter = new IndexWriter(ramDir, new StandardAnalyzer(), true);
    +-- ramWriter.addDocument(doc);
    |   ramWriter.close();
    |   ramWriter = null;
    |   ramReader = IndexReader.open(ramDir);
    |   ramSearcher = new IndexSearcher(ramReader);
    |   .
    |   .
    |   .
    | }
    |
the Exception:

    java.io.IOException: Pipe closed
        at java.io.PipedInputStream.receive(Unknown Source)
        at java.io.PipedInputStream.receive(Unknown Source)
        at java.io.PipedOutputStream.write(Unknown Source)
        at sun.nio.cs.StreamEncoder$CharsetSE.writeBytes(Unknown Source)
        at sun.nio.cs.StreamEncoder$CharsetSE.implWrite(Unknown Source)
        at sun.nio.cs.StreamEncoder.write(Unknown Source)
        at sun.nio.cs.StreamEncoder.write(Unknown Source)
        at java.io.OutputStreamWriter.write(Unknown Source)
        at java.io.Writer.write(Unknown Source)
        at org.apache.lucene.demo.html.HTMLParser.addText(HTMLParser.java:141)
        at org.apache.lucene.demo.html.HTMLParser.HTMLDocument(HTMLParser.java:200)
        at org.apache.lucene.demo.html.ParserThread.run(ParserThread.java:69)
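This is not a diagnosis of the exception above, which the posting leaves open. It is one thing that may be worth checking: the field list shows that contents is backed by a reader, and the changed code adds the same Document first to the RAM index and then to the permanent index. A java.io.Reader is single-pass, so whichever consumer reads it second finds nothing left. The sketch below shows only that general property, using a hypothetical helper (SinglePassReaderDemo.drain) and a StringReader in place of the demo's parser output.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

/** Hypothetical demo: a Reader yields its characters only once. */
final class SinglePassReaderDemo {

    /** Reads the reader to the end and returns the number of characters seen. */
    static int drain(Reader reader) {
        char[] buf = new char[1024];
        int total = 0;
        try {
            int n;
            while ((n = reader.read(buf)) != -1) {
                total += n;
            }
        } catch (IOException e) {
            // Wrapped so that callers need no checked-exception handling.
            throw new UncheckedIOException(e);
        }
        return total;
    }
}
```

Draining a StringReader twice returns the full length the first time and zero the second time. If the same holds for the contents field here, building a fresh Document for the permanent index after the lookaside check would sidestep it; that is an untested suggestion.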
Re: 'Sponsored' links
Does the sponsored information have to be in the index? Couldn't you look up the sponsor info in a database (or something else) after getting back your initial results and then re-sort the hit list, moving up the sponsored elements while keeping the rest of the results as they are? If your list of sponsors is truly that small, you could just put them in a file and load the list into memory. Then you don't have to re-index when your sponsorships change, and you have no dependency on Lucene for getting boost values right, etc.

I guess this resembles #2.

[EMAIL PROTECTED] 02/15/04 03:49PM
> It seems to me that there are three ways to achieve the capability:
> 1. Preset boost values for 'sponsored' documents, with an implied burden of reindexing when sponsors are modified.
> 2. Post-qualify documents present in the hit list for their sponsorship status, building a new hit list.
> 3. Modify the query to search using both the full query as an unsponsored boolean clause with the default boost value, and for each sponsor, to repeat the full query ANDed with that sponsor with the appropriate boost value.
> Are there other strategies not considered? Assuming a small list of sponsors (10 or fewer), and low volatility amongst the sponsors (1 change / month or less) which method is best?
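The re-sort suggested in this reply amounts to a stable partition of the hit list. A minimal sketch, assuming each hit can be identified by its url field and that the sponsor list has already been loaded into an in-memory set; the class and method names (SponsoredResorter, promote) are hypothetical, and it ignores the per-level boosts of the original question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical helper: moves sponsored hits to the front of a result list. */
final class SponsoredResorter {

    /**
     * Returns a new list with sponsored URLs first. Within the sponsored
     * group and within the remaining group, the original order is kept.
     */
    static List<String> promote(List<String> hitUrls, Set<String> sponsoredUrls) {
        List<String> sponsored = new ArrayList<>();
        List<String> others = new ArrayList<>();
        for (String url : hitUrls) {
            if (sponsoredUrls.contains(url)) {
                sponsored.add(url);
            } else {
                others.add(url);
            }
        }
        sponsored.addAll(others);
        return sponsored;
    }
}
```

Because the sponsor set lives outside the index, changing a sponsorship only means reloading the set; nothing is re-indexed.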
can't create webapp demo index
Hi,

I'm having trouble creating the index for the webapp demo. I had no trouble creating the index for the non-webapp demo, but I get a NullPointerException when I try it for the webapp. I'm on Windows 2000, and here are my input and the error message that I got:

C:\tomcat\webapps\examples>java org.apache.lucene.demo.IndexHTML -create -index C:\tomcat\webapps\index
caught a class java.lang.NullPointerException
 with message: null

TIA,
Ted
Field Reindex Question
Hi,

I'm thinking of using Lucene in an application that might change the field data without modifying the document. It would be nice to only have to rewrite the field index information, which is much smaller than the information for the document. Would anyone know if this is possible?

Thanks in Advance,
Tim
Re: Field Reindex Question
You must remove and re-add the entire document to perform an update. Such is the (current) nature of Lucene.

Erik

On Feb 15, 2004, at 10:25 PM, Tim Walters wrote:
> Hi,
>
> I'm thinking of using Lucene in an application that might change the field data without modifying the document. It would be nice to only have to rewrite the field index information, which is much smaller than the information for the document. Would anyone know if this is possible?
>
> Thanks in Advance,
> Tim