Re: Hierarchical classified documents
On 25 Nov 2006, at 17:26, Robert Koberg wrote: What I do is add a 'path' field with the XPath to the node. Then you first narrow your search by finding documents with paths like: /node[1]/node[3]* You use a wildcard query? That can turn out to be very expensive if you have thousands and thousands of hierarchies. Or? - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Hierarchical classified documents
karl wettin wrote: On 25 Nov 2006, at 17:26, Robert Koberg wrote: What I do is add a 'path' field with the XPath to the node. Then you first narrow your search by finding documents with paths like: /node[1]/node[3]* You use a wildcard query? That can turn out to be very expensive if you have thousands and thousands of hierarchies. Or? I think it is a valid option to have, at least for my needs. I do it for a website hierarchy. Hopefully, a website will not have a very deep hierarchy :) Basically, it is presented to the user in a right-click context menu on a nav tree. You /could/ use the wildcard query with the hope that the user has drilled down as far as possible. Or you could just search a directory and not go any deeper (by not including the wildcard char -- presented to the user as a search option). best, -Rob
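For a trailing-* search like the one Rob describes, a PrefixQuery is the idiomatic Lucene form and avoids general wildcard expansion. A minimal sketch against the Lucene 2.0 API; the field name "path" matches the thread, but the class and method names here are illustrative, not anything from AnalyzerUtil or the posters' code:

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.PrefixQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class PathQueries {
    // Match the whole subtree rooted at the given path (the "drill deeper" option).
    static Query subtree(String path) {
        return new PrefixQuery(new Term("path", path));
    }

    // Match only the exact directory (the "do not descend" option Rob mentions).
    static Query exact(String path) {
        return new TermQuery(new Term("path", path));
    }

    // Narrow the user's text query by the chosen path filter.
    static Query narrowed(String path, Query userQuery, boolean deep) {
        BooleanQuery bq = new BooleanQuery();
        bq.add(deep ? subtree(path) : exact(path), BooleanClause.Occur.MUST);
        bq.add(userQuery, BooleanClause.Occur.MUST);
        return bq;
    }
}
```

Whether PrefixQuery stays cheap still depends on how many distinct path terms share the prefix, which is Karl's original concern.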
Re: RAMDirectory vs MemoryIndex
I tested this. I use a single static analyzer for all my documents, and the caching analyzer was not working properly. I had to add a method to clear the cache each time a new document was to be indexed, and then it worked as expected. I have never looked into Lucene's inner workings, so I am not sure if what I did is correct. I also had to comment out some code because I merged the memory stuff from trunk with Lucene 2.0. Performance was certainly much better (4 times faster in my very rough testing), but that operation is only a very small part of my processing, so I will keep the original way, without caching the tokens, just to be able to use the unmodified Lucene 2.0. I found a data problem in my tests, but as I was not going to pursue that improvement for now I did not look into it. thanks, javier On 11/23/06, Wolfgang Hoschek <[EMAIL PROTECTED]> wrote: Out of interest, I've checked an implementation of something like this into AnalyzerUtil in SVN trunk:

/**
 * Returns an analyzer wrapper that caches all tokens generated by the
 * underlying child analyzer's token stream, and delivers those cached
 * tokens on subsequent calls to tokenStream(String fieldName, Reader reader).
 *
 * This can help improve performance in the presence of expensive
 * Analyzer / TokenFilter chains.
 *
 * Caveats:
 * 1) Caching only works if the equals() and hashCode() methods are properly
 *    implemented on the Reader passed to tokenStream(String fieldName, Reader reader).
 * 2) Caching the tokens of large Lucene documents can lead to out-of-memory exceptions.
 * 3) The Token instances delivered by the underlying child analyzer must be immutable.
 *
 * @param child the underlying child analyzer
 * @return a new analyzer
 */
public static Analyzer getTokenCachingAnalyzer(final Analyzer child) { ... }

Check it out, and let me know if this is close to what you had in mind. Wolfgang.
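A rough sketch of the caching idea javier describes: wrap the child analyzer, cache the tokens per field, and clear the cache before each new document (his fix for the single-static-analyzer case). The class and method names here (CachingAnalyzer, clearCache) are illustrative, not the actual AnalyzerUtil API; the Lucene 2.0-era TokenStream.next() API is assumed:

```java
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;

public class CachingAnalyzer extends Analyzer {
    private final Analyzer child;
    private final Map cache = new HashMap(); // fieldName -> List of Token

    public CachingAnalyzer(Analyzer child) { this.child = child; }

    // Call before indexing each new document, per javier's fix; otherwise
    // the tokens of the previous document would be replayed.
    public void clearCache() { cache.clear(); }

    public TokenStream tokenStream(String fieldName, Reader reader) {
        List tokens = (List) cache.get(fieldName);
        if (tokens == null) {
            tokens = new ArrayList();
            TokenStream ts = child.tokenStream(fieldName, reader);
            try {
                for (Token t = ts.next(); t != null; t = ts.next()) tokens.add(t);
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
            cache.put(fieldName, tokens);
        }
        // Replay the cached tokens instead of re-analyzing.
        final Iterator it = tokens.iterator();
        return new TokenStream() {
            public Token next() { return it.hasNext() ? (Token) it.next() : null; }
        };
    }
}
```

Note caveat 3 from Wolfgang's javadoc applies here too: this only works if the cached Token instances are not mutated downstream.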
On Nov 22, 2006, at 9:19 AM, Wolfgang Hoschek wrote:
> I've never tried it, but I guess you could write an Analyzer and
> TokenFilter that not only feeds into IndexWriter on
> IndexWriter.addDocument(), but as a sneaky side effect also
> simultaneously saves its tokens into a list so that you could later
> turn that list into another TokenStream to be added to MemoryIndex.
> How much this might help depends on how expensive your analyzer
> chain is. For some examples on how to set up analyzers for chains
> of token streams, see MemoryIndex.keywordTokenStream and class
> AnalyzerUtil in the same package.
>
> Wolfgang.
>
> On Nov 22, 2006, at 4:15 AM, jm wrote:
>
>> checking one last thing, just in case...
>>
>> as I mentioned, I have previously indexed the same document in another
>> index (for another purpose); as I am going to use the same analyzer,
>> would it be possible to avoid analyzing the doc again?
>>
>> I see IndexWriter.addDocument() returns void, so there does not seem to
>> be an easy way to do that, no?
>>
>> thanks
>>
>> On 11/21/06, Wolfgang Hoschek <[EMAIL PROTECTED]> wrote:
>>>
>>> On Nov 21, 2006, at 12:38 PM, jm wrote:
>>>
>>> > Ok, thanks, I'll give MemoryIndex a go, and if that is not good enough
>>> > I will explore the other options then.
>>>
>>> To get started you can use something like this:
>>>
>>> for each document D:
>>>     MemoryIndex index = createMemoryIndex(D, ...)
>>>     for each query Q:
>>>         float score = index.search(Q)
>>>         if (score > 0.0) System.out.println("it's a match");
>>>
>>> private MemoryIndex createMemoryIndex(Document doc, Analyzer analyzer) {
>>>     MemoryIndex index = new MemoryIndex();
>>>     Enumeration iter = doc.fields();
>>>     while (iter.hasMoreElements()) {
>>>         Field field = (Field) iter.nextElement();
>>>         index.addField(field.name(), field.stringValue(), analyzer);
>>>     }
>>>     return index;
>>> }
>>>
>>> > On 11/21/06, Wolfgang Hoschek <[EMAIL PROTECTED]> wrote:
>>> >> On Nov 21, 2006, at 7:43 AM, jm wrote:
>>> >>
>>> >> > Hi,
>>> >> >
>>> >> > I have to decide between using a RAMDirectory and MemoryIndex, but
>>> >> > not sure which approach will work better...
>>> >> >
>>> >> > I have to run many items (tens of thousands) against some queries (100
>>> >> > at most), but I have to do it one item at a time. And I already have
>>> >> > the Lucene Document associated with each item, from a previous
>>> >> > operation I perform.
>>> >> >
>>> >> > From what I read MemoryIndex should be faster, but apparently I cannot
>>> >> > reuse the document I already have, and I have to create a new
>>> >> > MemoryIndex per item.
>>> >>
>>> >> A MemoryIndex object holds one document.
>>> >>
>>> >> > Using the RAMDirectory I can use only one of
>>> >> > them, also one IndexWriter, and create an IndexSearcher and IndexReader
>>> >> > per item, for searching and removing the item each time.
>>> >> >
>>> >> > Any thoug
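For comparison, the RAMDirectory route jm mentions might look roughly like this: one throwaway in-memory index per item, searched against each of the (at most ~100) queries, then discarded. A sketch against the Lucene 2.0 API of the time (Hits was the result type back then); the class and method names are illustrative:

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.store.RAMDirectory;

public class RamDirMatcher {
    static void match(Document doc, Query[] queries, Analyzer analyzer) throws Exception {
        // Build a fresh one-document index entirely in memory.
        RAMDirectory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir, analyzer, true); // true = create
        writer.addDocument(doc);
        writer.close();

        IndexSearcher searcher = new IndexSearcher(dir);
        for (int i = 0; i < queries.length; i++) {
            Hits hits = searcher.search(queries[i]);
            if (hits.length() > 0) System.out.println("match: " + queries[i]);
        }
        searcher.close();
    }
}
```

The per-item IndexWriter/IndexSearcher setup is the overhead that makes MemoryIndex attractive for this matching pattern.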
Re: Error in QueryTermExtractor.getTermsFromBooleanQuery
Nope, not seen that one. Looks like the reference to no such field is in the Java instance data sense, not the Lucene document sense. Class versioning issues somewhere? That method takes a parameter called "prohibited" which is the name of the field reported in the error. Is the word "prohibited" a reserved Java word somewhere now? What JVM are you running on - 1.6? Cheers Mark

Otis Gospodnetic wrote: Hi, I just moved from 1.9.1 to 2.1-dev. One error that seems to happen a lot now is below. I haven't had the chance to investigate yet (note the time), but I thought I'd throw (no pun intended) it out there and see if anyone else has seen this before.

java.lang.NoSuchFieldError: prohibited
    at org.apache.lucene.search.highlight.QueryTermExtractor.getTermsFromBooleanQuery(QueryTermExtractor.java:91)
    at org.apache.lucene.search.highlight.QueryTermExtractor.getTerms(QueryTermExtractor.java:66)
    at org.apache.lucene.search.highlight.QueryTermExtractor.getTerms(QueryTermExtractor.java:59)
    at org.apache.lucene.search.highlight.QueryTermExtractor.getTerms(QueryTermExtractor.java:45)
    at org.apache.lucene.search.highlight.QueryScorer.<init>(QueryScorer.java:48)

The only thing I know so far is that the field I'm passing to the highlighter is actually empty, so there will be nothing to highlight, but it still shouldn't bomb. Here is a snippet from my code:

TokenStream tokenStream = ANALYZER.tokenStream(_textFieldName, new StringReader(text));
highlightText = _highlighter.getBestFragments(tokenStream, text, _maxNumFragmentsRequired, "...");
...

That "text" variable holds the content of the field, and I just happen to know it's empty/blank (I currently don't store anything in that Field). I can't test with a non-empty field right now to check whether that throws QueryTermExtractor off.
Thanks, Otis
How to set query time scoring
I have already set some boosts at index time, and now I want to influence scoring at query time as well, but I have no idea how to do that in Lucene. Does anybody have an idea how to do this? Regards Sajid -- View this message in context: http://www.nabble.com/How-to-set-query-time-scoring-tf2709773.html#a7554766 Sent from the Lucene - Java Users mailing list archive at Nabble.com.
Re: How to set query time scoring
Hi Sajid, As you already boost data at indexing time, you can boost the query at search time. E.g., if you are firing a BooleanQuery combining a PhraseQuery, you might need to boost the PhraseQuery:

PhraseQuery pq = new PhraseQuery();
pq.setBoost(2.0f);

Thanks. Bhavin Pandya - Original Message - From: "Sajid Khan" <[EMAIL PROTECTED]> To: Sent: Monday, November 27, 2006 10:17 AM Subject: How to set query time scoring I have already set some boosts at index time, and now I want to influence scoring at query time as well, but I have no idea how to do that in Lucene. Does anybody have an idea how to do this? Regards Sajid
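Putting Bhavin's suggestion together, a minimal sketch against the Lucene 2.0 API; the field name "contents" and the terms are made up for illustration:

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.PhraseQuery;
import org.apache.lucene.search.TermQuery;

public class QueryBoost {
    static BooleanQuery build() {
        // Exact phrase matches should count more...
        PhraseQuery pq = new PhraseQuery();
        pq.add(new Term("contents", "open"));
        pq.add(new Term("contents", "source"));
        pq.setBoost(2.0f); // query-time boost, multiplied into the score

        // ...than a plain term match.
        TermQuery tq = new TermQuery(new Term("contents", "lucene"));

        BooleanQuery bq = new BooleanQuery();
        bq.add(pq, BooleanClause.Occur.SHOULD);
        bq.add(tq, BooleanClause.Occur.SHOULD);
        return bq;
    }
}
```

Query-time boosts multiply with any index-time boosts in the final score, so the two mechanisms compose.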
Re: Newbie Search Question
Erick Erickson wrote: > > And how are you storing your date? Field.Store.YES? NO? COMPRESSED? > I think here is my problem... I have found this in FileDocument.java: doc.add(new Field("contents", new FileReader(f))); Field.Store.YES is missing, but when I try to add this argument, I get an error message. I'll try to find a solution for the problem, but if you have any tips for me - please let me know :) thanks in advance, sirakov -- View this message in context: http://www.nabble.com/Newbie-Search-Question-tf2667479.html#a7555948 Sent from the Lucene - Java Users mailing list archive at Nabble.com.
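The likely cause of the error: the Reader-based Field constructor in Lucene 2.0 always creates an unstored field, so it simply has no Field.Store parameter to set. One way around it, as a hedged sketch (the helper name makeDoc is illustrative): read the file into a String first and use the String-based constructor, which does accept Field.Store.YES:

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

public class StoredContents {
    static Document makeDoc(File f) throws IOException {
        // Read the whole file into memory so the field can be stored.
        StringBuffer sb = new StringBuffer();
        BufferedReader in = new BufferedReader(new FileReader(f));
        try {
            char[] buf = new char[8192];
            int n;
            while ((n = in.read(buf)) != -1) sb.append(buf, 0, n);
        } finally {
            in.close();
        }
        Document doc = new Document();
        // Stored and tokenized: searchable, and retrievable from hits later.
        doc.add(new Field("contents", sb.toString(),
                          Field.Store.YES, Field.Index.TOKENIZED));
        return doc;
    }
}
```

The trade-off is memory: storing large file contents grows the index, which is why the demo FileDocument leaves "contents" unstored.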
Database searching using Lucene....
Hi, I need some input on database searching using Lucene. Lucene directly supports document searching, but I am unable to find an easy and fast way to do database searching. Which option would be better in terms of implementation, performance and security - stored procedures or the Lucene search engine? If anyone has already done an analysis on this, can you please provide a comparison matrix or benchmarks? Thanks in advance Regards Inderjeet