sitegeist
Did anyone write a neat tool for statistical analysis of hits over time? I need one, and it must be fast. I was thinking of something like this: List timeFrames; class TimeFrame { Date from; Date to; void add(Hits hits) { int score = 10; for (int d = 0; score<0 && d
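For what it's worth, here is a minimal, self-contained sketch of that idea in plain Java: bucket hit timestamps into fixed-width time frames and count hits per frame. All names (HitHistogram, TimeFrame) are illustrative, and it takes raw timestamps rather than a Lucene Hits object so it can stand alone:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the TimeFrame idea from the post: bucket hit timestamps into
// fixed-width frames and count hits per frame. Illustrative names, not an
// existing API.
public class HitHistogram {
    static class TimeFrame {
        final long from; // inclusive, millis
        final long to;   // exclusive, millis
        int count;
        TimeFrame(long from, long to) { this.from = from; this.to = to; }
        boolean contains(long t) { return t >= from && t < to; }
    }

    final List<TimeFrame> frames = new ArrayList<TimeFrame>();

    HitHistogram(long start, long end, long frameWidth) {
        for (long t = start; t < end; t += frameWidth) {
            frames.add(new TimeFrame(t, Math.min(t + frameWidth, end)));
        }
    }

    // O(1) per hit: index directly into the right frame instead of
    // scanning the whole frame list for every hit.
    void add(long timestamp) {
        long start = frames.get(0).from;
        long width = frames.get(0).to - start;
        int idx = (int) ((timestamp - start) / width);
        if (idx >= 0 && idx < frames.size() && frames.get(idx).contains(timestamp)) {
            frames.get(idx).count++;
        }
    }
}
```

The direct-index `add` is what makes it fast: per hit it does one division instead of a scan over all frames.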
Re: Making SpanQuery more efficient
After some more research, it seems that one of the bottlenecks is Spans.next(). Can I drop anything out in order to improve performance? Most of the queries are SpanNearQuery with SpanOrQuery as its clauses. Any help would be much appreciated. Regards, Michael On 5/25/06, Michael Chan <[EMAIL PROTECTED]> wrote: I see. Also, as I'm only interested in the number of results returned and not in the ranking of documents returned, is there any component I can simplify in order to improve search performance? Perhaps Scorer or Similarity? Thanks. Michael On 5/24/06, Chris Hostetter <[EMAIL PROTECTED]> wrote: > > : Unfortunately, I want to have subqueries inside my query (e.g. (t1 AND > : t2) NEAR (t3 OR t4)), and PhraseQuery seems to allow only Terms inside > : it. > > In that case, you aren't just using SpanQuery for the use of slop -- you > are using the Span information, you just don't realize it (that's how all > of the SpanQueries work -- they get the Span information from the sub > queries and propagate it up, which is also why you can't use just any > old Query as a clause in a SpanNearQuery). > > : > > As I use SpanQuery purely for the use of slop, I was wondering how to > : > > make SpanQuery more efficient. Since I don't need any span > : > > information, is there a way to disable the computation for span and > : > > other unneeded overhead? > > > > -Hoss - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
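Since only the number of matches is needed, one option (a sketch against the 1.9-era API, not tested here; Searcher.search(Query, HitCollector) and HitCollector.collect(int, float) are the relevant signatures, and `searcher` and `query` are assumed to be an existing IndexSearcher and Query) is to bypass Hits and its sorting entirely and simply count collected documents:

```java
import org.apache.lucene.search.HitCollector;

// Count matches without building a Hits object: the collector is invoked
// once per matching document, with no sorting, caching, or score
// normalization layered on top of the raw scorer.
final int[] count = new int[1];
searcher.search(query, new HitCollector() {
    public void collect(int doc, float score) {
        count[0]++;
    }
});
System.out.println("matches: " + count[0]);
```

This avoids the Hits machinery but not Spans.next() itself, which still has to enumerate the matching spans.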
Search oddities
It appears that I was confused about the way analyzers work. I assumed that a typical analyzer would just remove hyphens and treat the hyphen as a space. We're just using StandardAnalyzer. When we search (using QueryParser) for the phrase "t-mobile" (including quotes) we're getting results back which only have the word "mobile" in them. I would assume the analyzer would convert this to "t mobile" (again, in quotes) and would only return documents containing that phrase. Oddly, however, if we search for "pay-tv", we only get back documents that actually have the phrase "pay tv" or "pay-tv" - nothing which just has "pay" or "tv". I'm not quite sure why "t-mobile" is behaving differently from "pay-tv". If anyone could point me in the right direction I'd be very grateful! Many thanks, Tim.
best way to get specific results
Hi all, I need to be able to get specific documents out of the returned results without having to retrieve all the other documents. To describe my case: the user is allowed to specify in the query string the page number and the number of results to return. For example, if a query returns 1000 results, the user may be interested only in the results between 500 and 550. The way I implemented it is to run a normal query using IndexSearcher.search(Query) and then get the specified documents out of the Hits object. I am wondering if there is a more efficient way than this. Is using TopDocs better than the Hits object, knowing that some users may need more than 1000 docs back in one query? Thank you for your help, Omar.
Re: Search oddities
On Donnerstag 25 Mai 2006 16:18, [EMAIL PROTECTED] wrote: > When we search (using QueryParser) for the phrase "t-mobile" (including > quotes) t-mobile becomes "t mobile", but "t" is a stopword by default. Why? Maybe the person who added it has a dislike for German Telekom :-) But seriously, you should probably file a bug report. The workaround for now is to use your own stopwords. Regards Daniel -- http://www.danielnaber.de
Re: Search oddities
On May 25, 2006, at 11:01 AM, Daniel Naber wrote: On Donnerstag 25 Mai 2006 16:18, [EMAIL PROTECTED] wrote: When we search (using QueryParser) for the phrase "t-mobile" (including quotes) t-mobile becomes "t mobile", but "t" is a stopword by default. Why? Maybe the person who added it has a dislike for German Telekom :-) But seriously, you should probably file a bug report. Workaround for now is to use your own stopwords. "t" is a stop word because words like "don't" get analyzed into [don] [t]. In the short term, it's not really a bug but just the nature of how it was meant to be. Changing the default stop words in the 1.9/2.0 releases isn't going to happen... but certainly lobbying for this to be more sensible in the future is worth it. Erik
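To make the workaround concrete: StandardAnalyzer can be constructed with your own stop word array (leaving "t" out). The toy analyzer below is a plain-Java stand-in used only to illustrate the effect on "t-mobile"; it is not Lucene code, and the class name is made up:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Toy stand-in for StandardAnalyzer + StopFilter: lowercase, split on
// non-alphanumerics (so the hyphen becomes a token boundary), then drop
// stop words. With "t" in the stop set, "t-mobile" loses its first term.
public class StopWordDemo {
    static List<String> analyze(String text, Set<String> stopWords) {
        List<String> tokens = new ArrayList<String>();
        for (String tok : text.toLowerCase().split("[^a-z0-9]+")) {
            if (tok.length() > 0 && !stopWords.contains(tok)) {
                tokens.add(tok);
            }
        }
        return tokens;
    }
}
```

With the default-style stop set the phrase degenerates to just [mobile], which is why the phrase query matches any document containing "mobile"; with a custom stop set it stays [t, mobile].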
Re: Search oddities
On Donnerstag 25 Mai 2006 17:48, Erik Hatcher wrote: > "t" is a stop word because words like "don't" get analyzed into [don] > [t]. Maybe it should, but it seems it doesn't: don't gets parsed as field:don't using StandardAnalyzer and QueryParser. Hmm, maybe this is because people use different apostrophe characters: don't don´t don`t Only the first (and correct?) one is not split up by StandardAnalyzer. Regards Daniel -- http://www.danielnaber.de
Integrating a J2EE Application into "Generic" Enterprise Search
Hi, I am planning to integrate Lucene into our application. However, I also want to support the general enterprise search market and whatever our customers have installed. Ideally, we would develop: 1. generic search support services: a. index records into logical "document-centric" records with URLs for access; b. triggers to generate updated records in real time; c. a batch indexer for generation of initial data for the search engine. 2. specific search engine support - e.g. Lucene, RetrievalWare, etc. Are there any enterprise search integration standards (e.g. an XML schema)? Any recommendations for the best way to approach this? Thanks, Nick
Re: Question about special characters
My own solution, until I find a better one, is to use FuzzyQuery for every term in the phrase. For example "My work is the worst" ->> My~ work~ is~ the~ worst~ What do you think about this ugly solution? I don't have any more ideas. 2006/5/24, Dan Wiggin <[EMAIL PROTECTED]>: I need some functionality and I don't know how to implement it. The problem is special characters like à, ä, ç or ñ latin characters in the text. Now I use the ISO Latin filter, but the problem is when I want to obtain the most-used terms. These terms are stored without ` ´ ^ or any other "character attribute". For example "plàntïuç" (it isn't a real word) is stored as the term "plantiuc". How can I keep the word "plàntïuç" in the term vector? Thanks for all replies. PS: excuse me if this question is answered somewhere, but I didn't see it.
Re: best way to get specific results
: if a query returns 1000 results, the user is interested only in the : results between 500&550. the way I implemented it is run a normal query : using IndexSercher.search(Query()) and then get the specified documents : out of the hits object. I am wondering if there is a more efficient way : than this, is using TopDocs better than the hits object, knowing that : some users may need more than a 1000 docs back in one query?. Generally speaking, yes, TopDocs (or TopFieldDocs) is better than Hits if you plan on accessing more than the first 100 or so results. Hits will re-execute your search over and over as you ask for higher-numbered results, while with TopDocs your search is executed once, and you are given only the Doc IDs of the first N docs you asked for, with no other processing done behind the scenes (in your case, it sounds like N would be 550, and you'd start accessing the ScoreDoc[] at 500). -Hoss
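A sketch of the TopDocs approach: with the 1.9-era API, IndexSearcher.search(Query, Filter, int) collects only the top start+size documents into a TopDocs, and the requested page is a slice of topDocs.scoreDocs. The clamping arithmetic for that slice, in plain Java (class and method names are illustrative):

```java
// Clamp the requested page [start, start+size) to the number of results
// actually returned, so a short result set never throws
// ArrayIndexOutOfBoundsException when sliced.
public class PageWindow {
    static int[] window(int totalHits, int start, int size) {
        int from = Math.min(Math.max(start, 0), totalHits);
        int to = Math.min(from + size, totalHits);
        return new int[] { from, to };
    }
}
```

In the search itself you would pass start+size (550 here) as the third argument, then iterate scoreDocs from `from` to `to` and call searcher.doc(scoreDocs[i].doc) only for those 50 entries.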
Re: Question about special characters
I think I'm missing something here. The whole point of the ISOLatin1AccentFilter is to replace accented characters with their unaccented equivalents -- it sounds like that's working just fine. If you want the words in the term vector to contain the accents, why don't you stop using that filter? If the problem is that you need to be able to match on both the accented form and the non-accented form, perhaps you should have two fields, or modify the ISOLatin1AccentFilter so it puts both versions of the token in the TokenStream with the same position? : > The problem is special characters like à, ä , ç or ñ latin characters in : > the text. : > Now I use iso latin filter, but the problem is when I want to obtain most : > term used. These term are stored without ` ´ ^ or another "character : > attribute". : > For example "plàntïuç" (it isn't a real word) is stored like the term : > "plantiuc". : > How can I do to have in term vector the word "plàntïuç". -Hoss
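Hoss's second suggestion - emitting both forms at the same position - can be sketched in plain Java like this, using java.text.Normalizer (Java 6+) as a stand-in for the filter's accent-folding table. In a real Lucene TokenFilter the folded token would be emitted with setPositionIncrement(0); the Tok class and method names here are illustrative only:

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;

// Sketch of a dual-form accent filter: for each input term, emit the
// accented original (position increment 1), and, if folding changes it,
// also the folded form with position increment 0 so both index at the
// same position and either spelling matches in a phrase.
public class DualFormDemo {
    static class Tok {
        final String term;
        final int posIncr;
        Tok(String term, int posIncr) { this.term = term; this.posIncr = posIncr; }
    }

    // NFD decomposition splits accented letters into base letter plus
    // combining marks; stripping the marks yields the unaccented form.
    static String fold(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD)
                         .replaceAll("\\p{M}+", "");
    }

    static List<Tok> emit(String term) {
        List<Tok> out = new ArrayList<Tok>();
        out.add(new Tok(term, 1));        // original, advances position
        String folded = fold(term);
        if (!folded.equals(term)) {
            out.add(new Tok(folded, 0));  // folded variant, same position
        }
        return out;
    }
}
```

With this, the term vector keeps "plàntïuç" while queries for "plantiuc" still match, at roughly double the term count for accented tokens.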
Boolean query term match count
Hello, I'm working on a search application and I need to know if it is possible to get the number of terms that actually matched a Boolean query. For example, let's say I have a field test with values aaa bbb ccc d e f, and I constructed a Boolean query like this: test:aaa OR test:bbb OR test:e Is it possible to get the count - 3 - for the terms (aaa, bbb, e) that matched from the test field? If not, would it be possible to modify the BooleanScorer to accomplish this? TIA, Michael
Re: Boolean query term match count
On Thursday 25 May 2006 21:08, Crump, Michael wrote: > Hello, > I'm working on a search application and I need to know if it is possible > to get the number of terms that actually matched a Boolean query. For > example let's say I have field test with values aaa bbb ccc d e f and I > constructed a Boolean query like this: test:aaa OR test:bbb OR test:e > is it possible to get the count - 3 for the terms (aaa, bbb, e) that > matched from the test field. If not would it be possible to modify the > BooleanScorer to accomplish this? Have a look at Similarity and DefaultSimilarity. It looks like you need your own Similarity with a non-constant coord() implementation, and some constant value for the rest of the methods in there. The first argument to coord() is the number of matching clauses for a document for a boolean query. The value returned by coord() will be used in the calculation of the score value for the document. Regards, Paul Elschot
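One concrete way to read this (a sketch of the arithmetic only, not a tested Similarity subclass, and a slight variant of Paul's suggestion): if tf, idf, queryNorm and lengthNorm are all forced to 1, each matching clause contributes exactly 1 to the summed score, so with coord() also fixed the document score equals the matched-clause count. The toy code below mirrors how Lucene combines these factors for a pure OR query; the class name is made up:

```java
// Toy arithmetic mirroring Lucene's score combination for an OR query:
// score = coord * sum of per-clause scores. With every per-clause factor
// (tf, idf, boost, norms) forced to 1, the score exposes the match count.
public class MatchCountDemo {
    static float coord(int overlap, int maxOverlap) { return 1.0f; } // constant on purpose
    static float clauseScore() { return 1.0f; } // tf * idf^2 * boost * norm, all forced to 1

    static float score(int overlap, int maxOverlap) {
        float sum = 0.0f;
        for (int i = 0; i < overlap; i++) {
            sum += clauseScore(); // one unit per matching clause
        }
        return coord(overlap, maxOverlap) * sum;
    }
}
```

So for test:aaa OR test:bbb OR test:e against a document matching aaa, bbb and e, the score would come out as 3.0, which is exactly the count Michael asked for.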
Re: Integrating a J2EE Application into "Generic" Enterprise Search
On 5/25/06, Nicholas Van Weerdenburg <[EMAIL PROTECTED]> wrote: Are there any enterprise search integration standards (e.g. xml schema)? It may or may not be what you are looking for, but there is Solr, a Lucene-based search server with XML/HTTP interfaces. It's primarily meant to be a standalone server (think database), but it is possible to embed. See my sig for the link. -Yonik http://incubator.apache.org/solr Solr, the open-source Lucene search server
Re: Integrating a J2EE Application into "Generic" Enterprise Search
Solr is nice when you can change the existing enterprise applications, extract content and post XML content to the server. But it is definitely still a lot of coding. I would say DBSight is another alternative here. It has a similar architecture to Solr, but it crawls databases via configurable SQL. You only need to plug into any existing database via JDBC, and it can fit any schema. No XML or XSLT effort. Usually within 15 minutes you can have a Google-like search. Chris -- Lucene Search on Any Databases/Applications http://www.dbsight.net On 5/25/06, Yonik Seeley <[EMAIL PROTECTED]> wrote: On 5/25/06, Nicholas Van Weerdenburg <[EMAIL PROTECTED]> wrote: > Are there any enterprise search integration standards (e.g. xml > schema)? It may or may not be what you are looking for, but there is Solr, a Lucene-based search server with XML/HTTP interfaces. It's primarily meant to be a standalone server (think database), but it is possible to embed. See my sig for the link. -Yonik http://incubator.apache.org/solr Solr, the open-source Lucene search server
RE: Integrating a J2EE Application into "Generic" Enterprise Search
Both sound interesting, but what I want is to be able to generate the intermediate XML that most enterprise search servers could use to quickly integrate with them. E.g. customer 1 uses RetrievalWare for enterprise search, customer 2 uses Solr, customer 3 uses yyy. How do I build out my functionality to support RetrievalWare, Solr, and YYY? One thought is to build a directory view that invites crawlers in to auto-index. DBSight sounds interesting, but it doesn't help me with other enterprise search tools, from the sounds of it. I'm also thinking that our database schema is resistant to SQL-oriented queries. Thanks, Nick -Original Message- From: Chris Lu [mailto:[EMAIL PROTECTED] Sent: Thu 5/25/2006 10:15 PM To: java-user@lucene.apache.org Subject: Re: Integrating a J2EE Application into "Generic" Enterprise Search Solr is nice when you can change the existing enterprise applications, extract content and post XML content to the server. But definitely still a lot of coding. I would say DBSight is another alternative here. It has a similar architecture to Solr, but it crawls databases via configurable SQL. Only need to plug into any existing database via JDBC, and it can fit any schema. No XML, XSLT efforts. Usually in 15 minutes you can have a Google-like search. Chris -- Lucene Search on Any Databases/Applications http://www.dbsight.net On 5/25/06, Yonik Seeley <[EMAIL PROTECTED]> wrote: > On 5/25/06, Nicholas Van Weerdenburg <[EMAIL PROTECTED]> wrote: > > Are there any enterprise search integration standards (e.g. xml > > schema)? > > It may or may not be what you are looking for, but there is Solr, a > Lucene-based search server with XML/HTTP interfaces. It's primarily > meant to be a standalone server (think database), but it is possible to > embed. See my sig for the link. > -Yonik > http://incubator.apache.org/solr Solr, the open-source Lucene search server
Search in Jasper Report
Hi All, I'm using Jasper Reports as a reporting tool. In my application a report has 300 pages. How can I use Lucene to search in a .jasper file? Reg. Chandrakant S Chouhan