HOWTO USE SORT on QUERY PARSER :)

2004-07-14 Thread Karthik N S
Hey Guys,

Apologies...

Gee, it's so simple the way you have explained it. Thanks a lot.


Please correct me if I am wrong:

1)

So you are telling me that for a query on the field FIELD_CONTENTS, the relevant hits can
be sorted with respect to the field FIELD_DATE
[where FIELD_DATE and FIELD_CONTENTS are Lucene field names]...


2)
  To run the JUnit tests, do I need to download all the files from CVS [will
there be a build.xml within the CVS] in order to run and execute the tests?


with regards
Karthik


-Original Message-
From: Vladimir Yuryev [mailto:[EMAIL PROTECTED]
Sent: Wednesday, July 14, 2004 12:08 PM
To: Lucene Users List
Subject: Re: HOWTO USE SORT on QUERY PARSER :(


example:
query = QueryParser.parse(queryString, FIELD_CONTENTS, analyzer);
Sort sort = new Sort();
sort.setSort(FIELD_DATE, true);
// hits = searcher.search(query, sort);   // single-searcher variant
hits = multiSearcher.search(query, sort);
...
FIELD_DATE is an indexed field.
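
Pulled together, a minimal self-contained version of the same idea against the Lucene 1.4 API might look like this (the index path and the literal field values "contents" and "date" are illustrative assumptions, not part of Vladimir's snippet):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Sort;

public class SortedSearchExample {
  public static void main(String[] args) throws Exception {
    String FIELD_CONTENTS = "contents"; // assumed field names
    String FIELD_DATE = "date";         // must be indexed (one term per document) to sort on it

    IndexSearcher searcher = new IndexSearcher("/path/to/index"); // assumed index location
    Query query = QueryParser.parse("lucene", FIELD_CONTENTS, new StandardAnalyzer());

    Sort sort = new Sort();
    sort.setSort(FIELD_DATE, true); // reverse = true, i.e. newest first

    Hits hits = searcher.search(query, sort);
    for (int i = 0; i < hits.length(); i++) {
      System.out.println(hits.doc(i).get(FIELD_DATE) + "  " + hits.doc(i).get(FIELD_CONTENTS));
    }
    searcher.close();
  }
}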

Regards,
Vladimir

On Wed, 14 Jul 2004 12:02:33 +0530
  Karthik N S [EMAIL PROTECTED] wrote:
Hey
   Guys

Apologies

Before running the build.xml for the JUnit test files, do I need to
download all the files present in the search folder
from the Lucene CVS test tree in order to get the output results?

With regards
Karthik



-Original Message-
From: Vladimir Yuryev [mailto:[EMAIL PROTECTED]
Sent: Wednesday, July 14, 2004 11:38 AM
To: Lucene Users List
Subject: Re: HOWTO USE SORT on QUERY PARSER :(


It is a config problem.
Run build.xml -- [Run Ant...] -- and run the unit tests.
Vladimir.

On Wed, 14 Jul 2004 11:27:25 +0530
  Karthik N S [EMAIL PROTECTED] wrote:
Hi
Guys

Apologies

I am using the Eclipse 3.0 IDE, so when I run this file within the IDE I am
not able to view the output results.
[Until now I have had no idea how to set up and run the JUnit tests or view
the results of the output.]

Please give me some tips on this.

With regards
Karthik

-Original Message-
From: Vladimir Yuryev [mailto:[EMAIL PROTECTED]
Sent: Wednesday, July 14, 2004 11:12 AM
To: Lucene Users List
Subject: Re: HOWTO USE SORT on QUERY PARSER :(


Hi!

From CVS: jakarta-lucene/src/test/org/apache/lucene/search/TestSort.java
Run it as a unit test (   :-(   --   :-)   )

Best regards,
Vladimir.

On Tue, 13 Jul 2004 15:31:18 +0530
  Karthik N S [EMAIL PROTECTED] wrote:
Hey

  Guys

Apologies

Can somebody please explain to me, with a simple source example, how to
use Sort with the query parser [Lucene 1.4]?
  [I am confused by the code snippet in the CVS test case.]



with regards
Karthik

-Original Message-
From: Grant Ingersoll [mailto:[EMAIL PROTECTED]
Sent: Tuesday, July 13, 2004 2:29 AM
To: [EMAIL PROTECTED]
Subject: Re: Could search results give an idea of which field matched


See the explain functionality in the Javadocs and previous threads. You can
ask Lucene to explain why it got the results it did for a given hit.
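
As a rough sketch of how that is used (not from Grant's message; the index path, field name, and query string are illustrative assumptions), Searcher.explain in 1.4 can be called per hit:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Explanation;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

public class ExplainExample {
  public static void main(String[] args) throws Exception {
    IndexSearcher searcher = new IndexSearcher("/path/to/index"); // assumed index location
    Query query = QueryParser.parse("title:apache body:apache", "contents", new StandardAnalyzer());
    Hits hits = searcher.search(query);

    for (int i = 0; i < hits.length(); i++) {
      // The explanation breaks the score down clause by clause,
      // which shows which field's term actually contributed to the match.
      Explanation explanation = searcher.explain(query, hits.id(i));
      System.out.println(explanation.toString());
    }
    searcher.close();
  }
}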

 [EMAIL PROTECTED] 07/12/04 04:52PM 
I search the index on multiple fields. Could the search results also
tell me which field matched so that the document was selected? From what
I can tell, only the document number and a score are returned; is there
a way to also find out which field(s) of the document matched the
query?



Sildy






Re: Why is Field.java final?

2004-07-14 Thread Holger Klawitter

On Tuesday 13 July 2004 18:12, Doug Cutting wrote:
 John Wang wrote:
 On the same thought, how about the org.apache.lucene.analysis.Token class. Can we make it non-final?
 Sure, if you make a case for why it should be non-final.

How about the ability to provide a writer to termText, in order to exchange
a word for a synonym without having to create another object?
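
For illustration (not Holger's code), this is roughly what a synonym filter is forced to do today because a Token's text cannot be changed in place; the synonym Map is a hypothetical stand-in:

import java.io.IOException;
import java.util.Map;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;

public class SimpleSynonymFilter extends TokenFilter {
  private final Map synonyms; // hypothetical word -> replacement map

  public SimpleSynonymFilter(TokenStream input, Map synonyms) {
    super(input);
    this.synonyms = synonyms;
  }

  public Token next() throws IOException {
    Token token = input.next();
    if (token == null) {
      return null;
    }
    String replacement = (String) synonyms.get(token.termText());
    if (replacement == null) {
      return token; // no synonym, pass the original through
    }
    // Because the term text is read-only, we must allocate a brand-new Token
    // just to swap the text -- the allocation Holger would like to avoid.
    Token replaced = new Token(replacement, token.startOffset(), token.endOffset(), token.type());
    replaced.setPositionIncrement(token.getPositionIncrement());
    return replaced;
  }
}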

I favor everything which makes the Lucene API less restrictive,
thus making more unexpected things possible :-)

Mit freundlichem Gruß / With kind regards
Holger Klawitter
--
lists at klawitter dot de





Re: Lucene Search has poor cpu utilization on a 4-CPU machine

2004-07-14 Thread Kevin A. Burton
Doug Cutting wrote:
Aviran wrote:
I changed the Lucene 1.4 final source code and yes this is the source
version I changed.

Note that this patch won't produce a speedup on earlier releases,
since there was another multi-thread bottleneck higher up the stack
that was only recently removed, revealing this lower-level bottleneck.

The other patch was:
http://www.mail-archive.com/[EMAIL PROTECTED]/msg07873.html
Both are required to see the speedup.
Thanks...
Also, is there any reason folks cannot use 1.4 final now?
No... just that I'm trying to be conservative... I'm probably going to 
look at just migrating to 1.4 ASAP but we're close to a milestone...

Kevin
--
Please reply using PGP.
   http://peerfear.org/pubkey.asc
   
   NewsMonster - http://www.newsmonster.org/
   
Kevin A. Burton, Location - San Francisco, CA, Cell - 415.595.9965
  AIM/YIM - sfburtonator,  Web - http://peerfear.org/
GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D 8D04 99F1 4412
 IRC - freenode.net #infoanarchy | #p2p-hackers | #newsmonster



Search Result + Highlighter

2004-07-14 Thread Karthik N S
Hi Guys

  Some weeks back I reported a problem regarding search on an indexed
file using the Highlighter.

  The Highlighter used to display [Pad] or [0] between words (the
field type is Field.Text and stores the HTML summary).

  [I am using a CustomAnalyzer which is similar to the StandardAnalyzer, with
555 ENGLISH_STOP_WORDS.]

  If anybody has looked into this matter for a patch, please
specify.



with regards
Karthik

-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Wednesday, July 14, 2004 1:06 AM
To: Lucene Users List
Subject: Re: Search Result


Look at the Term Highlighter here:

http://jakarta.apache.org/lucene/docs/lucene-sandbox/
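
In rough terms the sandbox highlighter is driven like this (a sketch; the field name "contents", the analyzer, and the stored-text argument are assumptions for illustration):

import java.io.StringReader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.QueryScorer;

public class SnippetExample {
  // Returns up to three fragments of the stored field text around the query terms.
  public static String snippet(Query query, String storedText) throws Exception {
    Analyzer analyzer = new StandardAnalyzer();
    Highlighter highlighter = new Highlighter(new QueryScorer(query));
    TokenStream tokens = analyzer.tokenStream("contents", new StringReader(storedText));
    return highlighter.getBestFragments(tokens, storedText, 3, " ... ");
  }
}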


On Jul 13, 2004, at 2:32 PM, Hetan Shah wrote:

 I think I have not explained my question correctly. What is happening
 is that when I show the result on a page, the text below the link appears
 as shown below.

 Test Page for Apache Installation
 http://dev-server.sfbay:8880/docs/sample.htm
 Sample content

 Jakarta Lucene - Lucene Sandbox
 http://dev-server.sfbay:8880/docs/lucene-sandbox/index.html
 [Jakarta Lucene] About Overview Powered by Lucene Who We Are Mailing
 Lists Resources FAQ (Official) jGuru FAQ Getting Started Query Syntax
 File Formats Javadoc Contributions Articles, etc. Benchmark


 In the first example the search term sample occurs at the beginning
 of the page and so it shows up in the text below the link. In the
 second example the keyword sample shows up somewhere later in the
 document and so it does not show up in the text below the link. What
 can I do so that in all cases the text below the link always contains the
 piece of the document where the keyword is found?

 thanks in advance.

 -H

 Hetan Shah wrote:

 What I am trying to figure out is this: in my search result, which is
 returned by

 Document doc = hits.doc(i);
 String textToShow = doc.get("summary");

 the summary field seems to contain only the first few lines of the
 document. How can I make it contain the piece that matches the
 query string?

 Thanks.
 -H

 Hetan Shah wrote:

 David,

 Do you know how, in the demo code, I can override or change this
 value so that I get to see the appropriate chunk of the document? Would
 this change make the actual result show the relevant section of
 the document?

 Sorry to sound so ignorant, I am very new at the whole search
 technology, getting to learn a lot from a great supportive
 community.

 Thanks,
 -H
 David Spencer wrote:

 Hetan Shah wrote:

 My search results are only displaying the top portion of the
 indexed documents, even when the query matches later parts of
 the document. Where should I look to change the code in demo3 of
 the default 1.3 final distribution? In general, if I want to show the
 block of the document that matches the query string, which classes
 should I use?




 Sounds like this:

 http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/IndexWriter.html#DEFAULT_MAX_FIELD_LENGTH
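
 If that limit is the problem, it can be raised when the index is built. A sketch (assuming the writer is created much as in the demo, with an assumed index path; in 1.3/1.4 the default is 10,000 terms per field):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;

public class BigFieldIndexer {
  public static IndexWriter openWriter(String indexPath) throws Exception {
    IndexWriter writer = new IndexWriter(indexPath, new StandardAnalyzer(), true);
    // Only the first 10,000 terms of each field are indexed by default;
    // raise the limit so matches later in long documents are searchable too.
    writer.maxFieldLength = 100000;
    return writer;
  }
}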


 Thanks guys.
 -H





RE: Problems indexing Japanese with CJKAnalyzer

2004-07-14 Thread Jon Schuster
Hi all,

Thanks for the help on indexing Japanese documents. I eventually got things
working, and here's an update so that other folks might have an easier time
in similar situations.

The problem I had was indeed with the encoding, but it was more than just
the encoding on the initial creation of the HTMLParser (from the Lucene demo
package). In HTMLDocument, doing this:

InputStreamReader reader = new InputStreamReader(new FileInputStream(f), "SJIS");
HTMLParser parser = new HTMLParser(reader);

creates the parser and feeds it Unicode decoded from the original Shift-JIS encoded
document, but then when the document contents are fetched using this line:

Field fld = Field.Text("contents", parser.getReader());

HTMLParser.getReader creates an InputStreamReader and OutputStreamWriter
using the default encoding, which in my case was Windows 1252 (essentially
Latin-1). That was bad.

In the HTMLParser.jj grammar file, adding an explicit encoding of UTF8 on
both the Reader and Writer got things mostly working. The one missing piece
was in the options section of the HTMLParser.jj file. The original grammar
file generates an input character stream class that treats the input as a
stream of 1-byte characters. To have JavaCC generate a stream class that
handles double-byte characters, you need the option UNICODE_INPUT=true.

So, there were essentially three changes in two files:

HTMLParser.jj - add UNICODE_INPUT=true to the options section; add explicit
UTF8 encoding on Reader and Writer creation in getReader(). As far as I
can tell, this change works fine for all of the languages I need to handle,
which are English, French, German, and Japanese.

HTMLDocument - add explicit encoding of SJIS when creating the Reader used
to create the HTMLParser. (For western languages, I use encoding of
ISO8859_1.)

And of course, use the right language tokenizer.
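
Put together, the indexing side might look roughly like this (a sketch rather than Jon's exact code; the sandbox CJKAnalyzer, the index path, and the helper method are assumptions):

import java.io.File;
import java.io.FileInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import org.apache.lucene.analysis.cjk.CJKAnalyzer;
import org.apache.lucene.demo.html.HTMLParser;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

public class JapaneseHtmlIndexer {
  public static void index(String indexPath, File f) throws Exception {
    // Decode the Shift-JIS HTML explicitly; the platform default (e.g. Cp1252) mangles it.
    Reader reader = new InputStreamReader(new FileInputStream(f), "SJIS");
    HTMLParser parser = new HTMLParser(reader);

    Document doc = new Document();
    doc.add(Field.Text("contents", parser.getReader())); // relies on the UTF8 fix in HTMLParser.jj
    doc.add(Field.UnIndexed("path", f.getPath()));

    IndexWriter writer = new IndexWriter(indexPath, new CJKAnalyzer(), true);
    writer.addDocument(doc);
    writer.close();
  }
}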

--Jon

earlier responses snipped; see the list archive




Scoring without normalization!

2004-07-14 Thread Jones G
How do I remove document normalization from scoring in Lucene? I just want to stick to
TF-IDF.

Thanks.

ArrayIndexOutOfBoundsException if stopword on left of bool clause w/ StandardAnalyzer

2004-07-14 Thread Claude Devarenne
Hi,
A user mistyped their search terms and entered a query that looked like this:

the AND title:bla

I am using Lucene 1.4 rc3. My web app, which is using a
StandardAnalyzer, got an ArrayIndexOutOfBoundsException (stack trace
below). I can reproduce this with the Lucene demo (both the JSP and
the command line util).

Since I have the queryParser.parse(queryString) call in a try statement,
I am now catching this exception, which works around the issue.
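
That defensive call looks something like the following sketch (the field name and analyzer mirror this report; catching the ArrayIndexOutOfBoundsException is the workaround described above):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;

public class SafeParse {
  public static Query parseOrNull(String queryString) {
    try {
      return QueryParser.parse(queryString, "contents", new StandardAnalyzer());
    } catch (ParseException e) {
      return null; // genuinely malformed query syntax
    } catch (ArrayIndexOutOfBoundsException e) {
      return null; // stopword-only term on the left of a boolean operator (this report)
    }
  }
}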

My question is: should the queryParser catch that there is no term  
before trying to add a clause when using a StandardAnalyzer?  Is this  
even possible? Should the burden be on the application to either catch  
the exception or parse the query before handing it out to the  
queryParser?

Claude
Here is the stack trace:
java.lang.ArrayIndexOutOfBoundsException: -1 < 0
	at java.util.Vector.elementAt(Vector.java:437)
	at org.apache.lucene.queryParser.QueryParser.addClause(QueryParser.java:181)
	at org.apache.lucene.queryParser.QueryParser.Query(QueryParser.java:509)
	at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:108)
	at QueryExec.runQuery(QueryExec.java:245)

RE: Scoring without normalization!

2004-07-14 Thread Anson Lau
If you don't mind hacking the source:

In Hits.java

In method getMoreDocs()



// Comment out the following
//float scoreNorm = 1.0f;
//if (length > 0 && scoreDocs[0].score > 1.0f) {
//  scoreNorm = 1.0f / scoreDocs[0].score;
//}

// And just set scoreNorm to 1 (keep it a float so the later multiplication is unchanged).
float scoreNorm = 1.0f;


I don't know if you can do it without going to the source.

Anson


-Original Message-
From: Jones G [mailto:[EMAIL PROTECTED] 
Sent: Thursday, July 15, 2004 6:52 AM
To: [EMAIL PROTECTED]
Subject: Scoring without normalization!

How do I remove document normalization from scoring in Lucene? I just want
to stick to TF IDF.

Thanks.





Re: RE: Scoring without normalization!

2004-07-14 Thread Jones G
Thanks! Just what I wanted.

On Thu, 15 Jul 2004 Anson Lau wrote :
If you don't mind hacking the source:

In Hits.java

In method getMoreDocs()



 // Comment out the following
 //float scoreNorm = 1.0f;
 //if (length > 0 && scoreDocs[0].score > 1.0f) {
 //  scoreNorm = 1.0f / scoreDocs[0].score;
 //}

 // And just set scoreNorm to 1 (keep it a float so the later multiplication is unchanged).
 float scoreNorm = 1.0f;


I don't know if you can do it without going to the source.

Anson


-Original Message-
 From: Jones G [mailto:[EMAIL PROTECTED]
Sent: Thursday, July 15, 2004 6:52 AM
To: [EMAIL PROTECTED]
Subject: Scoring without normalization!

How do I remove document normalization from scoring in Lucene? I just want
to stick to TF IDF.

Thanks.





Searching against Database

2004-07-14 Thread Hetan Shah
Hello All,
I have got all the answers from this fantastic mailing list. I have 
another question ;)

What is the best way (best practices) to integrate Lucene with a live
database, Oracle to be more specific? Any pointers are very much
appreciated.

thanks guys.
-H


One Field!

2004-07-14 Thread Jones G
I have an index with multiple fields. Right now I am using MultiFieldQueryParser to
search the fields. This means that if the same term occurs in multiple fields, it will
be weighted accordingly. Is there any way to treat all the fields in question as one
field and score the document accordingly, without having to reindex?

Thanks.
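
For reference, the setup described above looks roughly like this (a sketch; the field names are hypothetical):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.MultiFieldQueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

public class MultiFieldSearch {
  public static Hits search(IndexSearcher searcher, String queryString) throws Exception {
    String[] fields = { "title", "body", "keywords" }; // hypothetical field names
    // Expands the query across each field; a term hitting several fields is scored per field.
    Query query = MultiFieldQueryParser.parse(queryString, fields, new StandardAnalyzer());
    return searcher.search(query);
  }
}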