HOWTO USE SORT on QUERY PARSER :)
Hey Guys, Apologies... Gee, it's so simple the way you have explained it. Thanks a lot. Please correct me if I am wrong:

1) So you are telling me that on the field FIELD_CONTENTS, the relevant hits can be sorted with respect to the field FIELD_DATE [where FIELD_DATE and FIELD_CONTENTS are Lucene field names]...

2) To run the JUnit tests, do I need to download all the files from CVS [will there be a build.xml within the CVS] to run and execute the tests?

With regards,
Karthik

-----Original Message-----
From: Vladimir Yuryev [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, July 14, 2004 12:08 PM
To: Lucene Users List
Subject: Re: HOWTO USE SORT on QUERY PARSER :(

Example:

  query = QueryParser.parse(queryString, FIELD_CONTENTS, analyzer);
  Sort sort = new Sort();
  sort.setSort(FIELD_DATE, true);
  //hits = searcher.search(query, sort);
  hits = multiSearcher.search(query, sort);
  ...

FIELD_DATE is an indexed field.

Regards,
Vladimir

On Wed, 14 Jul 2004 12:02:33 +0530 Karthik N S [EMAIL PROTECTED] wrote:

Hey Guys, Apologies. Before running the build.xml for the JUnit test files, do I need to download all the files present in the search folder from the Lucene CVS test tree in order to get the output results?

With regards,
Karthik

-----Original Message-----
From: Vladimir Yuryev [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, July 14, 2004 11:38 AM
To: Lucene Users List
Subject: Re: HOWTO USE SORT on QUERY PARSER :(

It is a config problem. Run build.xml -- [Run ANT...] -- Run unit tests.

Vladimir.

On Wed, 14 Jul 2004 11:27:25 +0530 Karthik N S [EMAIL PROTECTED] wrote:

Hi Guys, Apologies. I am using the Eclipse 3.0 IDE, so when I run this file within the IDE I am not able to view the output results. [Till now I have no idea how to set up and run the JUnit tests or view their results.] Please give me some tips on this.

With regards,
Karthik

-----Original Message-----
From: Vladimir Yuryev [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, July 14, 2004 11:12 AM
To: Lucene Users List
Subject: Re: HOWTO USE SORT on QUERY PARSER :(

Hi!
From CVS: jakarta-lucene/src/test/org/apache/lucene/search/TestSort.java. Run it as a unit test ( :-( -- :-) ).

Best regards,
Vladimir.

On Tue, 13 Jul 2004 15:31:18 +0530 Karthik N S [EMAIL PROTECTED] wrote:

Hey Guys, Apologies. Can somebody please explain to me, with a simple source example, how to use SORT with the query parser [Lucene 1.4]? [I am confused by the code snippet in the CVS test case.]

With regards,
Karthik

-----Original Message-----
From: Grant Ingersoll [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, July 13, 2004 2:29 AM
To: [EMAIL PROTECTED]
Subject: Re: Could search results give an idea of which field matched

See the explain functionality in the Javadocs and previous threads. You can ask Lucene to explain why it got the results it did for a given hit.

[EMAIL PROTECTED] 07/12/04 04:52PM wrote:

I search the index on multiple fields. Could the search results also tell me which field matched so that the document was selected? From what I can tell, only the document number and a score are returned. Is there a way to also find out which field(s) of the document matched the query?

Sildy

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
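Putting Vladimir's snippet together, a complete sort-by-field search might look like the sketch below. The index path and field names ("contents", "date", "path") are illustrative; it assumes Lucene 1.4, where a field used for sorting must be indexed and un-tokenized, and it needs the Lucene jar on the classpath.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Sort;

public class SortedSearch {
    public static void main(String[] args) throws Exception {
        // Hypothetical index directory.
        IndexSearcher searcher = new IndexSearcher("/path/to/index");

        // Parse the user query against the contents field.
        Query query = QueryParser.parse("apache", "contents", new StandardAnalyzer());

        // Sort hits by the indexed "date" field; true = reverse (newest first).
        Sort sort = new Sort();
        sort.setSort("date", true);

        Hits hits = searcher.search(query, sort);
        for (int i = 0; i < hits.length(); i++) {
            Document doc = hits.doc(i);
            System.out.println(doc.get("date") + "  " + doc.get("path"));
        }
        searcher.close();
    }
}
```

The same Sort object works with MultiSearcher, as in the thread above.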
Re: Why is Field.java final?
On Tuesday 13 July 2004 18:12, Doug Cutting wrote:

John Wang wrote: On the same thought, how about the org.apache.lucene.analysis.Token class? Can we make it non-final?

Sure, if you make a case for why it should be non-final.

How about the ability to provide a writer to termText in order to exchange a word for a synonym without having to create another object? I favor everything which makes the Lucene API less restrictive, thus making more unexpected things possible :-)

Mit freundlichem Gruß / With kind regards,
Holger Klawitter
--
lists at klawitter dot de
Re: Lucene Search has poor cpu utilization on a 4-CPU machine
Doug Cutting wrote:

Aviran wrote: I changed the Lucene 1.4 final source code, and yes, this is the source version I changed.

Note that this patch won't produce a speedup on earlier releases, since there was another multi-thread bottleneck higher up the stack that was only recently removed, revealing this lower-level bottleneck. The other patch was: http://www.mail-archive.com/[EMAIL PROTECTED]/msg07873.html Both are required to see the speedup.

Thanks... Also, is there any reason folks cannot use 1.4 final now?

No... just that I'm trying to be conservative... I'm probably going to look at migrating to 1.4 ASAP, but we're close to a milestone...

Kevin

--
Please reply using PGP. http://peerfear.org/pubkey.asc
NewsMonster - http://www.newsmonster.org/
Kevin A. Burton, Location - San Francisco, CA, Cell - 415.595.9965
AIM/YIM - sfburtonator, Web - http://peerfear.org/
GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D 8D04 99F1 4412
IRC - freenode.net #infoanarchy | #p2p-hackers | #newsmonster
Search Result + Highlighter
Hi Guys,

Some weeks back I reported a problem regarding search on an indexed file using the Highlighter: the Highlighter used to display [Pad] or [0] between words. (The field is of type Field.Text and stores the HTML summary.) [I am using a CustomAnalyzer, which is similar to StandardAnalyzer but with 555 ENGLISH_STOP_WORDS.] If anybody has looked into this matter for a patch, please specify.

With regards,
Karthik

-----Original Message-----
From: Erik Hatcher [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, July 14, 2004 1:06 AM
To: Lucene Users List
Subject: Re: Search Result

Look at the Term Highlighter here: http://jakarta.apache.org/lucene/docs/lucene-sandbox/

On Jul 13, 2004, at 2:32 PM, Hetan Shah wrote:

I think I have not explained my question correctly. What is happening is that when I show the result on a page, the text below the link appears as shown below.

  Test Page for Apache Installation
  http://dev-server.sfbay:8880/docs/sample.htm
  Sample content

  Jakarta Lucene - Lucene Sandbox
  http://dev-server.sfbay:8880/docs/lucene-sandbox/index.html
  [Jakarta Lucene] About Overview Powered by Lucene Who We Are Mailing Lists Resources FAQ (Official) jGuru FAQ Getting Started Query Syntax File Formats Javadoc Contributions Articles, etc. Benchmark

In the first example the search term "sample" occurs at the beginning of the page, so it shows up in the text below the link. In the second example the keyword "sample" shows up later in the document, so it does not appear in the text below the link. What can I do so that the text below the link always contains the piece of the document where the keyword was found?

Thanks in advance.
-H

Hetan Shah wrote:

What I am trying to figure out is: in my search result, which is returned by

  Document doc = hits.doc(i);
  textToShow = doc.get("summary");

the summary field seems to contain only the first few lines of the document. How can I make it contain the piece that matches the query string? Thanks.
-H

Hetan Shah wrote:

David, do you know, in the demo code, how do I override or change this value so that I get to see the appropriate chunk of the document? Would this change make the actual result show the relevant section of the document? Sorry to sound so ignorant; I am very new to the whole search technology, and I am getting to learn a lot from a great, supportive community.

Thanks,
-H

David Spencer wrote:

Hetan Shah wrote:

My search results are only displaying the top portion of the indexed documents, even though the query matches text later in the document. Where should I look to change the code in demo3 of the default 1.3 final distribution? In general, if I want to show the block of the document that matches the query string, which classes should I use?

Sounds like this: http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/IndexWriter.html#DEFAULT_MAX_FIELD_LENGTH

Thanks guys.
-H
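For anyone hitting the same limit: the cap David points to can be raised when building the index. A minimal sketch (the index path is illustrative; it assumes Lucene 1.3/1.4, where maxFieldLength is a public field on IndexWriter, and requires the Lucene jar). Note that raising the cap only makes the later parts of large documents searchable; showing the matched fragment still requires something like the sandbox Highlighter.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;

public class BigFieldIndexer {
    public static void main(String[] args) throws Exception {
        // true = create a new index at this (hypothetical) path.
        IndexWriter writer = new IndexWriter("/path/to/index", new StandardAnalyzer(), true);

        // By default only the first 10,000 terms of a field are indexed
        // (DEFAULT_MAX_FIELD_LENGTH); raise the cap so terms that occur
        // late in a large document still get indexed.
        writer.maxFieldLength = 1000000;

        // ... add documents here ...

        writer.optimize();
        writer.close();
    }
}
```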
RE: Problems indexing Japanese with CJKAnalyzer
Hi all,

Thanks for the help on indexing Japanese documents. I eventually got things working, and here's an update so that other folks might have an easier time in similar situations.

The problem I had was indeed with the encoding, but it was more than just the encoding on the initial creation of the HTMLParser (from the Lucene demo package). In HTMLDocument, doing this:

  InputStreamReader reader = new InputStreamReader(new FileInputStream(f), "SJIS");
  HTMLParser parser = new HTMLParser(reader);

creates the parser and feeds it Unicode from the original Shift-JIS-encoded document, but then when the document contents are fetched using this line:

  Field fld = Field.Text("contents", parser.getReader());

HTMLParser.getReader creates an InputStreamReader and OutputStreamWriter using the default encoding, which in my case was Windows 1252 (essentially Latin-1). That was bad.

In the HTMLParser.jj grammar file, adding an explicit encoding of UTF8 on both the Reader and Writer got things mostly working. The one missing piece was in the options section of the HTMLParser.jj file. The original grammar file generates an input character stream class that treats the input as a stream of 1-byte characters. To have JavaCC generate a stream class that handles double-byte characters, you need the option UNICODE_INPUT=true.

So, there were essentially three changes in two files:

HTMLParser.jj - add UNICODE_INPUT=true to the options section; add explicit UTF8 encoding on Reader and Writer creation in getReader(). As far as I can tell, these changes work fine for all of the languages I need to handle, which are English, French, German, and Japanese.

HTMLDocument - add an explicit encoding of SJIS when creating the Reader used to create the HTMLParser. (For western languages, I use an encoding of ISO8859_1.) And of course, use the right language tokenizer.

--Jon

earlier responses snipped; see the list archive
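Jon's point about the platform default encoding can be demonstrated with nothing but the JDK. A small, self-contained sketch (the class name is made up; "SJIS" and "ISO8859_1" are the same charset names used above): decoding Shift-JIS bytes with an explicit charset recovers the original text, while decoding the same bytes with a Latin-1 default silently corrupts it.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringWriter;
import java.nio.charset.Charset;

public class EncodingRoundTrip {
    static final Charset SJIS = Charset.forName("SJIS");
    static final Charset LATIN1 = Charset.forName("ISO8859_1");

    // Decode bytes through a Reader with an explicit charset, as the
    // patched HTMLParser does, instead of relying on the platform default.
    static String decode(byte[] bytes, Charset cs) {
        try {
            Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), cs);
            StringWriter out = new StringWriter();
            int c;
            while ((c = reader.read()) != -1) {
                out.write(c);
            }
            return out.toString();
        } catch (IOException e) {
            throw new RuntimeException(e); // cannot happen for in-memory streams
        }
    }

    public static void main(String[] args) {
        String original = "\u65e5\u672c\u8a9e"; // "Japanese" in Japanese
        byte[] sjisBytes = original.getBytes(SJIS);
        System.out.println(decode(sjisBytes, SJIS).equals(original));   // true
        System.out.println(decode(sjisBytes, LATIN1).equals(original)); // false
    }
}
```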
Scoring without normalization!
How do I remove document normalization from scoring in Lucene? I just want to stick to TF-IDF. Thanks.
ArrayIndexOutOfBoundsException if stopword on left of bool clause w/ StandardAnalyzer
Hi,

A user mistyped their search terms and entered a query that looked like this:

  the AND title:bla

I am using Lucene 1.4 rc3. My web app, which is using a StandardAnalyzer, got an ArrayIndexOutOfBoundsException (stack trace below). I can reproduce this with the Lucene demo (both the JSP and the command-line util). Since I have the queryParser.parse(queryString) call in a try statement, I am now catching this exception, so it fixes the issue.

My question is: should the QueryParser catch that there is no term before trying to add a clause when using a StandardAnalyzer? Is this even possible? Should the burden be on the application to either catch the exception or parse the query before handing it to the QueryParser?

Claude

Here is the stack trace:

java.lang.ArrayIndexOutOfBoundsException: -1 >= 0
        at java.util.Vector.elementAt(Vector.java:437)
        at org.apache.lucene.queryParser.QueryParser.addClause(QueryParser.java:181)
        at org.apache.lucene.queryParser.QueryParser.Query(QueryParser.java:509)
        at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:108)
        at QueryExec.runQuery(QueryExec.java:245)
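Until the parser guards against this itself, the application-side workaround Claude describes can be sketched like so (the field name and null-on-failure behavior are illustrative; assumes Lucene 1.4's static QueryParser.parse and the Lucene jar on the classpath):

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;

public class SafeParse {
    // Returns null for anything the parser rejects, whether a checked
    // ParseException or a runtime error such as the
    // ArrayIndexOutOfBoundsException in the stack trace above.
    static Query parseSafely(String userInput) {
        try {
            return QueryParser.parse(userInput, "contents", new StandardAnalyzer());
        } catch (Exception e) {
            return null; // caller can show an "invalid query" message
        }
    }
}
```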
RE: Scoring without normalization!
If you don't mind hacking the source: in Hits.java, in the method getMoreDocs():

  // Comment out the following:
  //float scoreNorm = 1.0f;
  //if (length > 0 && scoreDocs[0].score > 1.0f) {
  //  scoreNorm = 1.0f / scoreDocs[0].score;
  //}

  // And just set scoreNorm to 1:
  float scoreNorm = 1.0f;

I don't know if you can do it without going to the source.

Anson

-----Original Message-----
From: Jones G [mailto:[EMAIL PROTECTED]]
Sent: Thursday, July 15, 2004 6:52 AM
To: [EMAIL PROTECTED]
Subject: Scoring without normalization!

How do I remove document normalization from scoring in Lucene? I just want to stick to TF-IDF. Thanks.
Re: RE: Scoring without normalization!
Thanks! Just what I wanted.

On Thu, 15 Jul 2004 Anson Lau wrote:

earlier response snipped; see the previous message
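For reference, the raw per-term weight that remains once the final score normalization is removed can be computed by hand. A minimal sketch: the formulas mirror Lucene's DefaultSimilarity defaults, tf(freq) = sqrt(freq) and idf = ln(numDocs/(docFreq+1)) + 1, but the class and method names here are illustrative, and real Lucene scores also include query and field-norm factors beyond plain TF-IDF.

```java
public class TfIdf {
    // Term frequency factor: tf(freq) = sqrt(freq).
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    // Inverse document frequency: idf = ln(numDocs / (docFreq + 1)) + 1.
    static double idf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    // Raw per-term score: tf * idf, with no normalization applied.
    static double score(int freq, int docFreq, int numDocs) {
        return tf(freq) * idf(docFreq, numDocs);
    }

    public static void main(String[] args) {
        // A term occurring 4 times in a document, present in 9 of 100 docs:
        // sqrt(4) * (ln(100/10) + 1) = 2 * 3.3026 ≈ 6.605
        System.out.println(score(4, 9, 100));
    }
}
```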
Searching against Database
Hello All,

I have got all the answers from this fantastic mailing list. I have another question ;) What is the best way (best practices) to integrate Lucene with a live database, Oracle to be more specific? Any pointers are very much appreciated.

Thanks, guys.
-H
One Field!
I have an index with multiple fields. Right now I am using MultiFieldQueryParser to search the fields. This means that if the same term occurs in multiple fields, it will be weighed accordingly. Is there any way to treat all the fields in question as one field and score the document accordingly without having to reindex. Thanks.