Re: Searching against Database

2004-07-14 Thread Sergiu Gordea
Hi,
I have a similar problem. I'm working on a web application in which the
users have different permissions.
Not all information stored in the index is public to all users.

The documents in the index are identified by the same ID that the rows
have in the database tables.

I can get the IDs of the documents that the user is allowed to access,
but if there are 1000 of them, what will happen in Lucene?

Is this a valid solution? Can anyone provide a better idea?
Thanks,
Sergiu
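
For the permission question above, one option is to leave the IDs out of the query, run the search unrestricted, and drop the hits the user may not see afterwards. As far as I recall, BooleanQuery caps the number of clauses (1024 by default), so a query with 1000 ID clauses would be close to that limit. The sketch below is plain Java with no Lucene calls; `PermissionFilter` and `filterHits` are hypothetical names, and the hit IDs are assumed to come from the stored ID field of each hit. Whether this is faster than putting the IDs into the query has not been measured here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class PermissionFilter {
    /** Keeps only the hit IDs the user may see, preserving the ranking order. */
    static List<String> filterHits(List<String> rankedHitIds, Set<String> permittedIds) {
        List<String> visible = new ArrayList<>();
        for (String id : rankedHitIds) {
            if (permittedIds.contains(id)) {   // hash lookup per hit
                visible.add(id);
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        // Hypothetical data: IDs of ranked hits and the IDs this user may access.
        List<String> hits = Arrays.asList("17", "4", "99", "23");
        Set<String> permitted = new HashSet<>(Arrays.asList("4", "23", "1000"));
        System.out.println(filterHits(hits, permitted));
    }
}
```

The trade-off is that hits are fetched and then thrown away, so paging has to count visible hits rather than raw hits.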
lingaraju wrote:
Hello
I am also looking for this kind of code, as all the information my web
application displays is stored in a database.
An early response would be very helpful.
Thanks and regards
Raju
- Original Message - 
From: "Hetan Shah" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Thursday, July 15, 2004 5:56 AM
Subject: Searching against Database

 

Hello All,
I have got all the answers from this fantastic mailing list. I have
another question ;)
What is the best way (Best Practices) to integrate Lucene with live
database, Oracle to be more specific. Any pointers are really very much
appreciated.
thanks guys.
-H
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
   




Re: Re: Searching against Database

2004-07-14 Thread Jones G
I don't have any best practices to offer. I have been using Lucene with MySQL for a
year though.

All I do is store a key of some sort in the index
new Field("id", getPK(), true, false, false)  // store=true, index=false, token=false

and then relate that to the database in code.
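
A minimal sketch of the "relate that to the database" step, assuming the keys read back from the index are then looked up with one parameterized JDBC query. The table and column names (`documents`, `id`, `title`) and the class name `HitLookupSql` are made up for illustration; only the placeholder-building part is shown, so it runs without a database.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class HitLookupSql {
    /** Builds "SELECT id, title FROM documents WHERE id IN (?, ?, ...)" for keyCount keys. */
    static String buildLookupSql(int keyCount) {
        if (keyCount < 1) {
            throw new IllegalArgumentException("need at least one key");
        }
        // One "?" placeholder per key, so values are bound rather than concatenated.
        String placeholders = String.join(", ", Collections.nCopies(keyCount, "?"));
        return "SELECT id, title FROM documents WHERE id IN (" + placeholders + ")";
    }

    public static void main(String[] args) {
        // Hypothetical keys, as they would come out of the stored "id" field.
        List<String> keysFromIndex = Arrays.asList("17", "4", "99");
        System.out.println(buildLookupSql(keysFromIndex.size()));
    }
}
```

The resulting string can be passed to `Connection.prepareStatement`, binding each key with `setString` in the same order as the hits.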

For "Live Oracle" databases, you might consider different things.

As I understand it, Oracle lets you use Java in PL/SQL (no experience here). So you might
consider adding code to the triggers to add and delete documents from the index. But
modifying the index is usually not as quick as modifying a database, so you
might want to come up with some sort of compromise here.

Perhaps more experienced users in this list will have better insights.

Hope that helps.


On Thu, 15 Jul 2004 lingaraju wrote :
>Hello
>
>I am also looking for this kind of code, as all the information my web
>application displays is stored in a database.
>An early response would be very helpful.
>
>Thanks and regards
>Raju
>
>- Original Message -
> From: "Hetan Shah" <[EMAIL PROTECTED]>
>To: "Lucene Users List" <[EMAIL PROTECTED]>
>Sent: Thursday, July 15, 2004 5:56 AM
>Subject: Searching against Database
>
>
> > Hello All,
> >
> > I have got all the answers from this fantastic mailing list. I have
> > another question ;)
> >
> > What is the best way (Best Practices) to integrate Lucene with live
> > database, Oracle to be more specific. Any pointers are really very much
> > appreciated.
> >
> > thanks guys.
> > -H

Re: ArrayIndexOutOfBoundsException if stopword on left of bool clause w/ StandardAnalyzer

2004-07-14 Thread Morus Walter
Claude Devarenne writes:
> 
> My question is: should the queryParser catch that there is no term  
> before trying to add a clause when using a StandardAnalyzer?  Is this  
> even possible? Should the burden be on the application to either catch  
> the exception or parse the query before handing it out to the  
> queryParser?
> 
Yes. Yes. No.
There are fixes in bugzilla that would make query parser read that query
as title:bla and simply drop the stop word.

see http://issues.apache.org/bugzilla/show_bug.cgi?id=9110
http://issues.apache.org/bugzilla/show_bug.cgi?id=25820

Morus




Re: Searching against Database

2004-07-14 Thread lingaraju
Hello

I am also looking for this kind of code, as all the information my web
application displays is stored in a database.
An early response would be very helpful.

Thanks and regards
Raju

- Original Message - 
From: "Hetan Shah" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Thursday, July 15, 2004 5:56 AM
Subject: Searching against Database


> Hello All,
>
> I have got all the answers from this fantastic mailing list. I have
> another question ;)
>
> What is the best way (Best Practices) to integrate Lucene with live
> database, Oracle to be more specific. Any pointers are really very much
> appreciated.
>
> thanks guys.
> -H





One Field!

2004-07-14 Thread Jones G
I have an index with multiple fields. Right now I am using MultiFieldQueryParser to
search the fields. This means that if the same term occurs in multiple fields, it will
be weighted accordingly. Is there any way to treat all the fields in question as one
field and score the document accordingly, without having to reindex?

Thanks.

Searching against Database

2004-07-14 Thread Hetan Shah
Hello All,
I have got all the answers from this fantastic mailing list. I have 
another question ;)

What is the best way (Best Practices) to integrate Lucene with live 
database, Oracle to be more specific. Any pointers are really very much 
appreciated.

thanks guys.
-H


Re: RE: Scoring without normalization!

2004-07-14 Thread Jones G
Thanks! Just what I wanted.

On Thu, 15 Jul 2004 Anson Lau wrote :
>If you don't mind hacking the source:
>
>In Hits.java
>
>In method "getMoreDocs()"
>
>
>
> // Comment out the following
> //float scoreNorm = 1.0f;
> //if (length > 0 && scoreDocs[0].score > 1.0f) {
> //  scoreNorm = 1.0f / scoreDocs[0].score;
> //}
>
> // And just set scoreNorm to 1.
> float scoreNorm = 1.0f;
>
>
>I don't know if you can do it without going to the source.
>
>Anson
>
>
>-Original Message-
> From: Jones G [mailto:[EMAIL PROTECTED]
>Sent: Thursday, July 15, 2004 6:52 AM
>To: [EMAIL PROTECTED]
>Subject: Scoring without normalization!
>
>How do I remove document normalization from scoring in Lucene? I just want
>to stick to TF IDF.
>
>Thanks.
>
>


RE: Scoring without normalization!

2004-07-14 Thread Anson Lau
If you don't mind hacking the source:

In Hits.java

In method "getMoreDocs()"



// Comment out the following
//float scoreNorm = 1.0f;
//if (length > 0 && scoreDocs[0].score > 1.0f) {
//  scoreNorm = 1.0f / scoreDocs[0].score;
//}

// And just set scoreNorm to 1.
float scoreNorm = 1.0f;


I don't know if you can do it without going to the source.
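
To show what the commented-out lines do, here is a standalone sketch of the same scaling on a plain float array. It does not touch Lucene. The condition is copied from the snippet above, while the final multiplication is an assumption about how Hits applies scoreNorm to each score. `ScoreNormalization` is a hypothetical name.

```java
final class ScoreNormalization {
    /** Scales scores so the top hit is at most 1.0; input must be sorted descending. */
    static float[] normalize(float[] rawScores) {
        float scoreNorm = 1.0f;
        // Same test as in the snippet: only scale when the best score exceeds 1.0.
        if (rawScores.length > 0 && rawScores[0] > 1.0f) {
            scoreNorm = 1.0f / rawScores[0];
        }
        float[] out = new float[rawScores.length];
        for (int i = 0; i < rawScores.length; i++) {
            out[i] = rawScores[i] * scoreNorm;   // assumed application of the factor
        }
        return out;
    }
}
```

With scoreNorm fixed at 1, the loop leaves every score unchanged, which is the effect of the hack.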

Anson


-Original Message-
From: Jones G [mailto:[EMAIL PROTECTED] 
Sent: Thursday, July 15, 2004 6:52 AM
To: [EMAIL PROTECTED]
Subject: Scoring without normalization!

How do I remove document normalization from scoring in Lucene? I just want
to stick to TF IDF.

Thanks.





ArrayIndexOutOfBoundsException if stopword on left of bool clause w/ StandardAnalyzer

2004-07-14 Thread Claude Devarenne
Hi,
A user mistyped their search terms and entered a query that looked like  
this:

the AND title:bla
I am using Lucene 1.4 rc3. My web app, which is using a
StandardAnalyzer, got an ArrayIndexOutOfBoundsException (stack trace
below). I can reproduce this with the Lucene demo (both the JSP and
the command-line utility).

Since the queryParser.parse(queryString) call is inside a try statement,
I now catch this exception there, which works around the issue.
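
A rough sketch of that kind of guard, written without Lucene so it can run on its own. The `Parser` interface is a hypothetical stand-in for the real parse call, and `SafeQueryParsing` and `tryParse` are made-up names. The idea is to treat both declared parse errors and runtime failures as "no usable query".

```java
import java.util.Optional;

final class SafeQueryParsing {
    /** Stand-in for whatever parses the query; may throw checked or unchecked exceptions. */
    interface Parser<Q> {
        Q parse(String queryString) throws Exception;
    }

    /** Returns the parsed query, or empty if the parser rejected or choked on the input. */
    static <Q> Optional<Q> tryParse(Parser<Q> parser, String queryString) {
        try {
            return Optional.ofNullable(parser.parse(queryString));
        } catch (Exception e) {
            // Covers declared parse errors as well as runtime failures such as
            // ArrayIndexOutOfBoundsException.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // Simulated parser that fails the way the query in this message did.
        Parser<String> failing = q -> {
            throw new ArrayIndexOutOfBoundsException("-1 < 0");
        };
        System.out.println(tryParse(failing, "the AND title:bla").isPresent());
    }
}
```

In the web app, the empty case would map to a "could not understand the query" message rather than an error page.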

My question is: should the queryParser catch that there is no term  
before trying to add a clause when using a StandardAnalyzer?  Is this  
even possible? Should the burden be on the application to either catch  
the exception or parse the query before handing it out to the  
queryParser?

Claude
Here is the stack trace:
java.lang.ArrayIndexOutOfBoundsException: -1 < 0
at java.util.Vector.elementAt(Vector.java:437)
at org.apache.lucene.queryParser.QueryParser.addClause(QueryParser.java:181)
at org.apache.lucene.queryParser.QueryParser.Query(QueryParser.java:509)
at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:108)
at QueryExec.runQuery(QueryExec.java:245)

Scoring without normalization!

2004-07-14 Thread Jones G
How do I remove document normalization from scoring in Lucene? I just want to stick to 
TF IDF.

Thanks.

RE: Problems indexing Japanese with CJKAnalyzer

2004-07-14 Thread Jon Schuster
Hi all,

Thanks for the help on indexing Japanese documents. I eventually got things
working, and here's an update so that other folks might have an easier time
in similar situations.

The problem I had was indeed with the encoding, but it was more than just
the encoding on the initial creation of the HTMLParser (from the Lucene demo
package). In HTMLDocument, doing this:

InputStreamReader reader = new InputStreamReader(
    new FileInputStream(f), "SJIS");
HTMLParser parser = new HTMLParser(reader);

creates the parser and feeds it Unicode decoded from the original Shift-JIS
document, but then when the document contents are fetched using this line:

Field fld = Field.Text("contents", parser.getReader() );

HTMLParser.getReader creates an InputStreamReader and OutputStreamWriter
using the default encoding, which in my case was Windows 1252 (essentially
Latin-1). That was bad.
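
The effect can be reproduced outside Lucene. The sketch below is not the HTMLParser code. It is an illustration, under the assumption that the text passes through a Writer and a Reader that share one charset, of why a Latin-1 style encoding in that pipe loses Japanese text while UTF-8 keeps it. `EncodingRoundTrip` and `roundTrip` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodingRoundTrip {
    /** Writes text to bytes and reads it back, both with the given charset. */
    static String roundTrip(String text, Charset charset) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (Writer writer = new OutputStreamWriter(bytes, charset)) {
                writer.write(text);
            }
            StringBuilder result = new StringBuilder();
            try (Reader reader = new InputStreamReader(
                    new ByteArrayInputStream(bytes.toByteArray()), charset)) {
                int c;
                while ((c = reader.read()) != -1) {
                    result.append((char) c);
                }
            }
            return result.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String japanese = "\u65e5\u672c\u8a9e"; // three kanji, written as escapes
        // UTF-8 can represent the characters, ISO-8859-1 cannot.
        System.out.println(roundTrip(japanese, StandardCharsets.UTF_8).equals(japanese));
        System.out.println(roundTrip(japanese, StandardCharsets.ISO_8859_1).equals(japanese));
    }
}
```

Passing the charset explicitly on both sides, rather than relying on the platform default, is the point of the fix described in this message.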

In the HTMLParser.jj grammar file, adding an explicit encoding of "UTF8" on
both the Reader and Writer got things mostly working. The one missing piece
was in the "options" section of the HTMLParser.jj file. The original grammar
file generates an input character stream class that treats the input as a
stream of 1-byte characters. To have JavaCC generate a stream class that
handles double-byte characters, you need the option UNICODE_INPUT=true.

So, there were essentially three changes in two files:

HTMLParser.jj - add UNICODE_INPUT=true to options section; add explicit
"UTF8" encoding on Reader and Writer creation in getReader(). As far as I
can tell, this change works fine for all of the languages I need to handle,
which are English, French, German, and Japanese.

HTMLDocument - add explicit encoding of "SJIS" when creating the Reader used
to create the HTMLParser. (For western languages, I use encoding of
"ISO8859_1".)

And of course, use the right language tokenizer.

--Jon






Search Result + Highlighter

2004-07-14 Thread Karthik N S
Hi Guys

  Some weeks back I reported a problem regarding "Search on Indexed
file" using the Highlighter.

  The Highlighter displayed "[Pad]" or "[0]" between words (the
field is of type "Field.Text" and stores the HTML summary).

  [I am using a CustomAnalyzer which is similar to StandardAnalyzer, with
555 ENGLISH_STOP_WORDS]

  If anybody has looked into this matter and has a patch, please
let me know.



With regards
Karthik

-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Wednesday, July 14, 2004 1:06 AM
To: Lucene Users List
Subject: Re: Search Result


Look at the Term Highlighter here:

http://jakarta.apache.org/lucene/docs/lucene-sandbox/
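
To make the idea concrete without pulling in the sandbox code: the sketch below is a naive stand-in, not the Term Highlighter. It cuts a window of text around the first case-insensitive match of a keyword and assumes the full document text is available as a string (for example from a stored field). `SnippetExtractor`, `snippet` and `radius` are hypothetical names.

```java
final class SnippetExtractor {
    /** Index of the first case-insensitive occurrence of keyword in text, or -1. */
    static int indexOfIgnoreCase(String text, String keyword) {
        for (int i = 0; i + keyword.length() <= text.length(); i++) {
            if (text.regionMatches(true, i, keyword, 0, keyword.length())) {
                return i;
            }
        }
        return -1;
    }

    /** Returns the match plus up to radius characters on each side, or the start of the text. */
    static String snippet(String text, String keyword, int radius) {
        int hit = indexOfIgnoreCase(text, keyword);
        if (hit < 0) {
            // No match: fall back to the beginning of the text.
            return text.substring(0, Math.min(text.length(), 2 * radius));
        }
        int start = Math.max(0, hit - radius);
        int end = Math.min(text.length(), hit + keyword.length() + radius);
        return text.substring(start, end);
    }
}
```

A real highlighter works on analyzed tokens and scores fragments, so this only illustrates why the summary has to be computed per query instead of being a fixed stored prefix.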


On Jul 13, 2004, at 2:32 PM, Hetan Shah wrote:

> I think I have not explained my question correctly. What is happening
> is when I show the result on a page the text below the link as shown
> below.
>
> Test Page for Apache Installation
> 
> Sample content
>
> Jakarta Lucene - Lucene Sandbox
> 
> [Jakarta Lucene] About Overview Powered by Lucene Who We Are Mailing
> Lists Resources FAQ (Official) jGuru FAQ Getting Started Query Syntax
> File Formats Javadoc Contributions Articles, etc. Benchmark
>
>
> In first example the search criteria "sample" occurs in the beginning
> of the page and so it shows up in the text below the link. In the
> second example the keyword "sample" shows up somewhere later in the
> document and so it does not show up in the text below the link. What
> can I do so that in all cases the text below the link always has the
> piece of the document where the keyword is found?
>
> thanks in advance.
>
> -H
>
> Hetan Shah wrote:
>
>> What I am trying to figure out is. In my search result which is
>> returned by the
>>
>> Document doc = hits.doc(i);
>>  = doc.get("summary");
>>
>> The summary field seems to contain only the first few lines of the
>> document. How can I make it to contain the piece that matches the
>> query string?
>>
>> Thanks.
>> -H
>>
>> Hetan Shah wrote:
>>
>>> David,
>>>
>>> Do you know, in the demo code, how do I override or change this
>>> value so that I get to see the appropriate chuck of document? Would
>>> this change make the actual result to show the relevant section of
>>> the document?
>>>
>>> Sorry to sound so ignorant, I am very new at the whole search
>>> technology, getting to learn a lot from a great supportive
>>> community.
>>>
>>> Thanks,
>>> -H
>>> David Spencer wrote:
>>>
>>>> Hetan Shah wrote:
>>>>
>>>>> My search results are only displaying the top portion of the
>>>>> indexed documents. It does match the query in the later part of
>>>>> the document. Where should I look to change the code in demo3 of
>>>>> default 1.3 final distribution. In general if I want to show the
>>>>> block of document that matches with the query string which classes
>>>>> should I use?
>>>>
>>>> Sounds like this:
>>>>
>>>> http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/IndexWriter.html#DEFAULT_MAX_FIELD_LENGTH
>>>>
>>>>> Thanks guys.
>>>>> -H



Re: Lucene Search has poor cpu utilization on a 4-CPU machine

2004-07-14 Thread Kevin A. Burton
Doug Cutting wrote:

> Aviran wrote:
>
>> I changed the Lucene 1.4 final source code and yes this is the source
>> version I changed.
>
> Note that this patch won't produce a speedup on earlier releases,
> since there was another multi-thread bottleneck higher up the stack
> that was only recently removed, revealing this lower-level bottleneck.
>
> The other patch was:
> http://www.mail-archive.com/[EMAIL PROTECTED]/msg07873.html
> Both are required to see the speedup.

Thanks...

> Also, is there any reason folks cannot use 1.4 final now?

No... just that I'm trying to be conservative... I'm probably going to
look at just migrating to 1.4 ASAP but we're close to a milestone...

Kevin
--
Please reply using PGP.
   http://peerfear.org/pubkey.asc
   
   NewsMonster - http://www.newsmonster.org/
   
Kevin A. Burton, Location - San Francisco, CA, Cell - 415.595.9965
  AIM/YIM - sfburtonator,  Web - http://peerfear.org/
GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D 8D04 99F1 4412
 IRC - freenode.net #infoanarchy | #p2p-hackers | #newsmonster



Re: HOWTO USE SORT on QUERY PARSER :)

2004-07-14 Thread Vladimir Yuryev
Besides, point 1) is independent of point 2).
The test programs give you a concrete example of how the class is
meant to be used, and a ~99.9... guarantee that the class
works.

Regards,
Vladimir.
On Wed, 14 Jul 2004 12:27:12 +0530
 "Karthik N S" <[EMAIL PROTECTED]> wrote:
Hey Guys,
Apologies...
Gee, that's so simple, you have explained it to me. Thanks a lot.
Please correct me if I am wrong:
1)
So you are telling me that for a search on Field type "FIELD_CONTENTS", the relevant
hits can be sorted with respect to Field type "FIELD_DATE"

[where FIELD_DATE and FIELD_CONTENTS are Field types for Lucene]...

2)
To run the JUnit tests, do I need to download all the files from CVS
[will there be a build.xml within CVS] to run and execute the tests?
with regards
Karthik
-Original Message-
From: Vladimir Yuryev [mailto:[EMAIL PROTECTED]
Sent: Wednesday, July 14, 2004 12:08 PM
To: Lucene Users List
Subject: Re: HOWTO USE SORT on QUERY PARSER :(
example:
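// Parse the query string against the contents field.
query = QueryParser.parse(queryString, FIELD_CONTENTS, analyzer);
// Sort on the date field; true requests reverse order.
Sort sort = new Sort();
sort.setSort(FIELD_DATE, true);
// For a single index: hits = searcher.search(query, sort);
hits = multiSearcher.search(query, sort);
...
FIELD_DATE must be an indexed field.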
Regards,
Vladimir
On Wed, 14 Jul 2004 12:02:33 +0530
 "Karthik N S" <[EMAIL PROTECTED]> wrote:
Hey Guys,
Apologies
Before running build.xml for the JUnit test files, do I need to
download all the files present in the "search" folder
from the Lucene CVS tests in order to get the output results?
With regards
Karthik

-Original Message-
From: Vladimir Yuryev [mailto:[EMAIL PROTECTED]
Sent: Wednesday, July 14, 2004 11:38 AM
To: Lucene Users List
Subject: Re: HOWTO USE SORT on QUERY PARSER :(
It is config problem.
Run build.xml --> [Run ANT...]--> Run unit tests.
Vladimir.
On Wed, 14 Jul 2004 11:27:25 +0530
 "Karthik N S" <[EMAIL PROTECTED]> wrote:
Hi
Guys
Apologies
I am using the Eclipse 3.0 IDE, and when I run this file within the IDE, I
am not able to view the output results.
[Until now I have had no idea how to set up and run the JUnit
tests or view their output.]
Please give me some tips on this.
With regards
Karthik
-Original Message-
From: Vladimir Yuryev [mailto:[EMAIL PROTECTED]
Sent: Wednesday, July 14, 2004 11:12 AM
To: Lucene Users List
Subject: Re: HOWTO USE SORT on QUERY PARSER :(
Hi!
From CVS -->
jakarta-lucene/src/test/org/apache/lucene/search/TestSort.java
Run it as  UnitTest  (   :-(   -->   :-))
Best regards,
Vladimir.
On Tue, 13 Jul 2004 15:31:18 +0530
 "Karthik N S" <[EMAIL PROTECTED]> wrote:
Hey
 Guys
Apologies
  Can somebody please explain to me, with a simple source example,
how to use Sort with QueryParser [Lucene 1.4]?
 [I am confused by the code snippet in the CVS test case]

with regards
Karthik
-Original Message-
From: Grant Ingersoll [mailto:[EMAIL PROTECTED]
Sent: Tuesday, July 13, 2004 2:29 AM
To: [EMAIL PROTECTED]
Subject: Re: Could search results give an idea of which field matched
See the explain functionality in the Javadocs and previous threads.
You can
ask Lucene to explain why it got the results it did for a give hit.
[EMAIL PROTECTED] 07/12/04 04:52PM >>>
I search the index on multiple fields. Could the search results also
tell me which field matched so that the document was selected? From
what
I can tell, only the document number and a score are returned, is
there
a way to also find out what was the field(s) of the document matched
the
query?

Sildy




Re: Why is Field.java final?

2004-07-14 Thread Holger Klawitter
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On Tuesday 13 July 2004 18:12, Doug Cutting wrote:
> John Wang wrote:
> >On the same thought, how about the org.apache.lucene.analysis.Token
> > class. Can we make it non-final?
> Sure, if you make a case for why it should be non-final.

How about the ability to provide a writer to termText in order to replace
a word with a synonym without having to create another object?

I favor everything which makes the Lucene API less restrictive,
thus making more unexpected things possible :-)

Mit freundlichem Gruß / With kind regards
Holger Klawitter
- --
lists  klawitter  de
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.2.2 (GNU/Linux)

iD8DBQFA9NvS1Xdt0HKSwgYRAg0IAKCFVclqmhjiD5yugIQenkQnRnELWgCgoaf2
rjrg92P0kWuMAj+wEXpH23Y=
=z3rj
-END PGP SIGNATURE-

