Re: Searching against Database

2004-07-15 Thread lingaraju
Hello

I am also looking for the same kind of code, since all of my web display
information is stored in a database.
An early response would be very helpful.

Thanks and regards
Raju

- Original Message - 
From: Hetan Shah [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Thursday, July 15, 2004 5:56 AM
Subject: Searching against Database


 Hello All,

 I have got all the answers from this fantastic mailing list. I have
 another question ;)

 What is the best way (Best Practices) to integrate Lucene with live
 database, Oracle to be more specific. Any pointers are really very much
 appreciated.

 thanks guys.
 -H


 -
 To unsubscribe, e-mail: [EMAIL PROTECTED]
 For additional commands, e-mail: [EMAIL PROTECTED]







Re: ArrayIndexOutOfBoundsException if stopword on left of bool clause w/ StandardAnalyzer

2004-07-15 Thread Morus Walter
Claude Devarenne writes:
 
 My question is: should the queryParser catch that there is no term  
 before trying to add a clause when using a StandardAnalyzer?  Is this  
 even possible? Should the burden be on the application to either catch  
 the exception or parse the query before handing it out to the  
 queryParser?
 
Yes. Yes. No.
There are fixes in Bugzilla that would make the query parser read that query
as title:bla and simply drop the stop word:

see http://issues.apache.org/bugzilla/show_bug.cgi?id=9110
http://issues.apache.org/bugzilla/show_bug.cgi?id=25820

Morus




Re: Re: Searching against Database

2004-07-15 Thread Jones G
I don't have any best practices to offer, but I have been using Lucene with MySQL for a
year.

All I do is store a key of some sort in the index:
new Field("id", getPK(), true, false, false)

and then relate that to the database in code.
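A minimal sketch of this approach against the Lucene 1.4 API (getPK(), rowText, and the SQL are hypothetical placeholders):

```java
// Index time: store the database primary key alongside the searchable text.
Document doc = new Document();
// stored=true, indexed=false, tokenized=false: retrievable from hits,
// but not itself searchable.
doc.add(new Field("id", getPK(), true, false, false));
doc.add(Field.Text("contents", rowText));
writer.addDocument(doc);

// Search time: map each hit back to its database row.
Hits hits = searcher.search(query);
for (int i = 0; i < hits.length(); i++) {
    String pk = hits.doc(i).get("id");
    // e.g. SELECT ... FROM some_table WHERE id = pk
}
```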

For Live Oracle databases, you might consider different things.

From what I hear, Oracle lets you use Java in PL/SQL (no experience here). So you might
consider adding some code to the triggers to add and delete documents from the index. But
modifying the index is not as quick as modifying a database in most cases, so you
might want to come up with some sort of compromise here.

Perhaps more experienced users in this list will have better insights.

Hope that helps.


On Thu, 15 Jul 2004 lingaraju wrote:
quoted message snipped; see above

Re: Searching against Database

2004-07-15 Thread Sergiu Gordea
Hi again,
I'm thinking of getting the list of IDs from the database and the list of
hits from the Lucene index, and creating a comparator to eliminate the
disallowed hits from the list.

Which solution do you think is better?
Thanks,
Sergiu

Sergiu Gordea wrote:
Hi,
I have a similar problem. I'm working on a web application in which
the users have different permissions.
Not all information stored in the index is public for all users.

The documents in the index are identified by the same ID that the rows
have in the database tables.

I can get the IDs of the documents that are accessible to the user,
but if there are 1000 of them, what will happen in Lucene?

Is this a valid solution? Can anyone provide a better idea?
Thanks,
Sergiu
lingaraju wrote:
quoted message snipped; see above


Search +QueryParser+Score

2004-07-15 Thread Karthik N S

Hey guys,

Apologies, I have a question.

Is there any API available in Lucene 1.4 to set a score threshold of 1.0f or
lower BEFORE running the parsed query, so that the search returns only hits
at that score?

with regards
Karthik






Re: One Field!

2004-07-15 Thread Erik Hatcher
On Jul 14, 2004, at 10:19 PM, Jones G wrote:
I have an index with multiple fields. Right now I am using
MultiFieldQueryParser to search the fields. This means that if the
same term occurs in multiple fields, it will be weighted accordingly.
Is there any way to treat all the fields in question as one field and
score the document accordingly, without having to reindex?
You could change the coord() factor of Similarity in a custom 
implementation - that might do what you want with scoring.

But I prefer having a single queryable field that aggregates everything 
I want searchable, which would require re-indexing in your scenario.

Erik
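A sketch of the aggregate-field idea at index time (Lucene 1.4 API; the field names and the title/body strings are assumptions):

```java
Document doc = new Document();
doc.add(Field.Text("title", title));
doc.add(Field.Text("body", body));
// One catch-all field, indexed and tokenized but not stored, so a
// single-field query covers everything without MultiFieldQueryParser.
doc.add(Field.UnStored("all", title + " " + body));
```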


Re: Searching against Database

2004-07-15 Thread Erik Hatcher
In this situation, you may want to investigate implementing a custom  
Filter which is user-specific and constrains the search space to only  
the rows a specific user is allowed to search.

Erik
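A sketch of such a Filter against the Lucene 1.4 API, assuming the documents carry an indexed, untokenized "id" field (e.g. added with Field.Keyword) and that allowedIds holds the row IDs the user may see:

```java
import java.io.IOException;
import java.util.BitSet;
import java.util.Iterator;
import java.util.Set;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;
import org.apache.lucene.search.Filter;

public class UserFilter extends Filter {
    private final Set allowedIds; // Set of String primary keys

    public UserFilter(Set allowedIds) {
        this.allowedIds = allowedIds;
    }

    // Mark as searchable only the documents whose "id" term is allowed.
    public BitSet bits(IndexReader reader) throws IOException {
        BitSet bits = new BitSet(reader.maxDoc());
        for (Iterator it = allowedIds.iterator(); it.hasNext();) {
            TermDocs td = reader.termDocs(new Term("id", (String) it.next()));
            while (td.next()) {
                bits.set(td.doc());
            }
        }
        return bits;
    }
}
```

The filter is then passed alongside the query, e.g. searcher.search(query, new UserFilter(idsForUser)).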
On Jul 15, 2004, at 3:04 AM, Sergiu Gordea wrote:
quoted message snipped; see above


Re: Search +QueryParser+Score

2004-07-15 Thread Erik Hatcher
Karthik,
I have a really hard time following your questions, otherwise I'd chime 
in on them more often.  Your meaning is not often clear.

In the case of normalizing the score to 1.0 or less - this is precisely 
what Hits does for you.  I'm not sure what you mean by BEFORE doing 
QueryParser - a score is computed based on a query, so it necessarily 
must come after.

Erik
On Jul 15, 2004, at 6:55 AM, Karthik N S wrote:
quoted message snipped; see above


Re: Search +QueryParser+Score

2004-07-15 Thread Erik Hatcher
I don't really understand what QueryParser has to do with your 
question.  If you want only Hits that have a score of 1.0 (keep in mind 
that Hits normalizes scores if they are over 1.0), why not just walk 
all the Hits in order until you get to one that is not 1.0?

Or, use a HitCollector to collect hits (scores are not normalized with a
HitCollector) and bail out when you are done. (Bailing out of a
HitCollector is not as clean as we should make it in Lucene 2.0; we
should add that to the whiteboard.)

Erik
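A sketch of the HitCollector variant (Lucene 1.4 API). Note that collect() is called in document-id order, not score order, so this filters on the raw score rather than truncating a ranked list:

```java
final List matching = new ArrayList();
searcher.search(query, new HitCollector() {
    public void collect(int doc, float score) {
        // Raw (un-normalized) score here; keep only hits at the threshold.
        if (score >= 1.0f) {
            matching.add(new Integer(doc));
        }
    }
});
```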
On Jul 15, 2004, at 7:36 AM, Karthik N S wrote:
quoted message snipped; see above


RE: Searching against Database

2004-07-15 Thread wallen
If you know ahead of time which documents are viewable by a certain user
group, you could add a field, such as group, and when you index the
document, put in it the names of the user groups that are allowed to view that
document. Then your query tool can append, for example, AND
group:developers to the user's query, and you will not have to merge
results.

-Will
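The query-rewriting step Will describes can be sketched as a small helper (the method and field names are hypothetical; the group value would come from the logged-in user's session):

```java
public class GroupQuery {
    // Wrap the user's query in parentheses so the AND binds to all of it,
    // then constrain it to documents indexed with the given group.
    static String restrictToGroup(String userQuery, String group) {
        return "(" + userQuery + ") AND group:" + group;
    }

    public static void main(String[] args) {
        System.out.println(restrictToGroup("oracle integration", "developers"));
        // prints: (oracle integration) AND group:developers
    }
}
```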

-Original Message-
From: Sergiu Gordea [mailto:[EMAIL PROTECTED]
Sent: Thursday, July 15, 2004 2:58 AM
To: Lucene Users List
Subject: Re: Searching against Database


quoted message snipped; see above



Re: Problems indexing Japanese with CJKAnalyzer

2004-07-15 Thread Praveen Peddi
If it's a web application, you have to call request.setCharacterEncoding("UTF-8")
before reading any parameters. Also make sure the HTML page encoding is
specified as UTF-8 in the meta tag. Most web app servers decode the request
parameters in the system's default encoding; if you call the above
method, I think it will solve your problem.

Praveen
- Original Message - 
From: Bruno Tirel [EMAIL PROTECTED]
To: 'Lucene Users List' [EMAIL PROTECTED]
Sent: Thursday, July 15, 2004 6:15 AM
Subject: RE: Problems indexing Japanese with CJKAnalyzer


Hi All,

I am also trying to localize everything for a French application, using UTF-8
encoding. I have already applied what Jon described. I fully confirm his
recommendation of the HTML parser and HTML document changes with the UNICODE
and UTF-8 encoding specification.

In my case, one case is still not functional: using meta-data from an HTML
document, as in the demo3 example. Whether I try to convert to UTF-8 or
ISO-8859-1, it is still not correctly encoded when I check with Luke.
The word Propriété is seen either as Propri?t? with a square, or as
Propriã©tã©.
My local codepage is Cp1252, so it should be viewed as ISO-8859-1. I get the
same result when I use the local FileEncoding parameter.
All the other fields are correctly encoded into UTF-8, tokenized and
successfully searched through a JSP page.

Is anybody already facing this issue? Any help available?
Best regards,

Bruno


-Message d'origine-
De : Jon Schuster [mailto:[EMAIL PROTECTED]
Envoyé : mercredi 14 juillet 2004 22:51
À : 'Lucene Users List'
Objet : RE: Problems indexing Japanese with CJKAnalyzer

Hi all,

Thanks for the help on indexing Japanese documents. I eventually got things
working, and here's an update so that other folks might have an easier time
in similar situations.

The problem I had was indeed with the encoding, but it was more than just
the encoding on the initial creation of the HTMLParser (from the Lucene demo
package). In HTMLDocument, doing this:

InputStreamReader reader = new InputStreamReader( new
FileInputStream(f), "SJIS");
HTMLParser parser = new HTMLParser( reader );

creates the parser and feeds it Unicode from the original Shift-JIS encoding
document, but then when the document contents is fetched using this line:

Field fld = Field.Text("contents", parser.getReader() );

HTMLParser.getReader creates an InputStreamReader and OutputStreamWriter
using the default encoding, which in my case was Windows 1252 (essentially
Latin-1). That was bad.

In the HTMLParser.jj grammar file, adding an explicit encoding of "UTF8" on
both the Reader and Writer got things mostly working. The one missing piece
was in the options section of the HTMLParser.jj file. The original grammar
file generates an input character stream class that treats the input as a
stream of 1-byte characters. To have JavaCC generate a stream class that
handles double-byte characters, you need the option UNICODE_INPUT=true.
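The options change would look something like this in HTMLParser.jj (only the added line is new; the other generated options stay as they are):

```
options {
  // Make JavaCC generate a character stream that reads 16-bit
  // (double-byte) characters instead of a stream of 1-byte characters.
  UNICODE_INPUT = true;
  // ... existing options unchanged
}
```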

So, there were essentially three changes in two files:

HTMLParser.jj - add UNICODE_INPUT=true to the options section; add an explicit
"UTF8" encoding on Reader and Writer creation in getReader(). As far as I
can tell, this change works fine for all of the languages I need to handle,
which are English, French, German, and Japanese.

HTMLDocument - add an explicit encoding of "SJIS" when creating the Reader used
to create the HTMLParser. (For western languages, I use an encoding of
"ISO8859_1".)

And of course, use the right language tokenizer.

--Jon

earlier responses snipped; see the list archive




Re: Searching against Database

2004-07-15 Thread Sergiu Gordea
This is not a solution in my case,
because the permissions of the groups, and the user groups themselves, can be
changed, which would make managing the index a nightmare.

Anyway,
I appreciate the advice; maybe it will be useful for the other guys
who asked this question.

Sergiu
[EMAIL PROTECTED] wrote:
quoted message snipped; see above


Wildcard search with my own analyzer

2004-07-15 Thread Joel Shellman
I wanted to support categories, so I created my own analyzer such that:
Root Category||My Category||Some Other Things
would be split into three terms on ||, and I wanted it to stay
case sensitive.

If I do a search for:
categories:"Root Category"
it works fine. But if I do a search for:
categories:"Root Cate*"
it doesn't find it.
What do I need to do so that wildcard searching will work on this? I am
using the same analyzer for indexing and searching (otherwise the first
search wouldn't work either).

Thank you,
Joel Shellman


RE: Anyone use MultiSearcher class

2004-07-15 Thread Mark Florence
Don, I think I finally understand your problem -- and mine -- with
MultiSearcher. I had tested an implementation of my system using
ParallelMultiSearcher to split a huge index over many computers.
I was very impressed by the results on my test data, but alarmed
after a trial with live data :)

Consider MultiSearcher.search(Query Q). Suppose that Q aggregated
over ALL the Searchables in the MultiSearcher would return 1000
documents. But, the Hits object created by search() will only cache
the first 100 documents. When Hits.doc(101) is called, Hits will
cache 200 documents -- then 400, 800, 1600 and so on. How does Hits
get these extra documents? By calling the MultiSearcher again.

Now consider a MultiSearcher as described above with 2 Searchables.
With respect to Q, Searchable S has 1000 documents, Searchable T
has zero. So to fetch the 101st document, not only is S searched,
but T is too, even though the result of Q applied to T is still zero
and will always be zero. The same thing will happen when fetching
the 201st, 401st and 801st document.

This accounts for my slow performance, and I think yours too. That
your observed degradation is a power of 2 is a clue.

My performance is especially vulnerable because slave Searchables
in the MultiSearcher are Remote -- accessed via RMI.

I guess I have to code smarter around MultiSearcher. One problem
you highlight is that Hits is final -- so it is not possible even to
modify the 100/200/400 cache size logic.

Any ideas from anyone would be much appreciated.
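One possible workaround, sketched against the Lucene 1.4 API: the lower-level TopDocs search fetches a fixed number of results in a single call, avoiding the doubling re-searches that Hits performs (the 1000 cutoff is an assumption):

```java
// One round trip per remote Searchable instead of one per cache miss.
TopDocs top = multiSearcher.search(query, null, 1000);
for (int i = 0; i < top.scoreDocs.length; i++) {
    ScoreDoc sd = top.scoreDocs[i];
    Document doc = multiSearcher.doc(sd.doc); // fetch documents lazily
}
```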

Mark Florence
CTO, AIRS
800-897-7714 x 1703
[EMAIL PROTECTED]




-Original Message-
From: Don Vaillancourt [mailto:[EMAIL PROTECTED]
Sent: Monday, July 12, 2004 12:36 pm
To: Lucene Users List
Subject: Anyone use MultiSearcher class


Hello,

Has anyone used the MultiSearcher class?

I have noticed that searching two indexes using this MultiSearcher class
takes 8 times longer than searching only one index. I could understand if
it took 3 to 4 times longer due to sorting and merging the two search
results, but why 8 times longer?

Is there some optimization that can be done to hasten the search?  Or
should I just write my own MultiSearcher.  The problem though is that there
is no way for me to create my own Hits object (no methods are available and
the class is final).

Anyone have any clue?

Thanks


Don Vaillancourt
Director of Software Development

WEB IMPACT INC.
416-815-2000 ext. 245
email: [EMAIL PROTECTED]
web: http://www.web-impact.com




This email message is intended only for the addressee(s)
and contains information that may be confidential and/or
copyright.  If you are not the intended recipient please
notify the sender by reply email and immediately delete
this email. Use, disclosure or reproduction of this email
by anyone other than the intended recipient(s) is strictly
prohibited. No representation is made that this email or
any attachments are free of viruses. Virus scanning is
recommended and is the responsibility of the recipient.



RE: Searching against Database

2004-07-15 Thread Natarajan.T
Hi,

Do you know how to convert an RTF file to a text file?
Is any API available?

If you have any sample code, please send it to me.

Regards,
Natarajan.

-Original Message-
From: Sergiu Gordea [mailto:[EMAIL PROTECTED] 
Sent: Thursday, July 15, 2004 2:16 PM
To: Lucene Users List
Subject: Re: Searching against Database

quoted message snipped; see above



Re: Wildcard search with my own analyzer

2004-07-15 Thread Erik Hatcher
On Jul 15, 2004, at 10:02 AM, Morus Walter wrote:
Joel Shellman writes:
What do I need to do so that wildcard searching will work on this? I 
am
using the same analyzer for indexing and searching (otherwise the 
first
search wouldn't work either).

Check what query is produced (query.toString(...)).
I guess that the query parser, which seems to be what you are using, does not
support wildcards within double-quotes.
Right... when you use double-quotes, a PhraseQuery is implied, and it 
has no support for wildcards (currently).

Check the AnalysisParalysis page on the wiki for some insight into how 
to go about trouble-shooting things like this.  First is to eliminate 
QueryParser and see if you can make a query through the API that 
matches what you're after.

Erik
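Bypassing QueryParser, the equivalent API query might look like this (Lucene 1.4; it assumes the custom analyzer left "Root Category" as a single untokenized term in the categories field):

```java
Query q = new PrefixQuery(new Term("categories", "Root Cate"));
Hits hits = searcher.search(q);
```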


RE: Searching against Database

2004-07-15 Thread Daniel de Souza Teixeira
See this document!
http://www.jguru.com/faq/view.jsp?EID=1074229

Regards!
-- 
Daniel

quoted message snipped; see above



RE: Problems indexing Japanese with CJKAnalyzer ... Or French with UTF-8 and MetaData

2004-07-15 Thread Bruno Tirel
I don't think I understand your proposal correctly.
As a basis, I am using demo3 with IndexHTML, HTMLDocument and HTMLParser.
Inside the HTML parser, I am calling getMetaTags (which calls addMetaData) and
returns a Properties object. My issue comes from this definition:
Properties are stored in ISO-8859-1 encoding, while all my data encodings,
inside and outside, are UTF-8.
I cannot get UTF-8 values from this Parser.getMetaTags()
through any conversion.
These data are extracted from an HTML page, with UTF-8 encoding declared at
the beginning of the file.
I do not see how to call request.setEncoding("UTF-8") here: I need the parser
itself to have knowledge of the UTF-8 encoding, and that doesn't appear
possible when using a Properties object.

Any feedback?

-Message d'origine-
De : Praveen Peddi [mailto:[EMAIL PROTECTED] 
Envoyé : jeudi 15 juillet 2004 15:12
À : Lucene Users List
Objet : Re: Problems indexing Japanese with CJKAnalyzer

If it's a web application, you have to call request.setEncoding(UTF-8)
before reading any parameters. Also make sure the HTML page encoding is
specified as UTF-8 in the meta tag. Most web app servers decode the request
parameters with the system's default encoding. If you call the above
method, I think it will solve your problem.

Praveen
- Original Message -
From: Bruno Tirel [EMAIL PROTECTED]
To: 'Lucene Users List' [EMAIL PROTECTED]
Sent: Thursday, July 15, 2004 6:15 AM
Subject: RE: Problems indexing Japanese with CJKAnalyzer


Hi All,

I am also trying to localize everything for a French application, using UTF-8
encoding. I have already applied what Jon described. I fully confirm his
recommendation for the HTMLParser and HTMLDocument changes with the UNICODE
and UTF-8 encoding specification.

In my case, one case is still not working: using meta-data from the HTML
document, as in the demo3 example. Whether I try converting to UTF-8 or
ISO-8859-1, it is still not correctly encoded when I check with Luke.
The word Propriété is seen either as Propri?t? with a square, or as
Propriã©tã©.
My local codepage is Cp1252, so it should be viewed as ISO-8859-1. Same result
when I use the local FileEncoding parameter.
All the other fields are correctly encoded in UTF-8, tokenized and
successfully searched through the JSP page.

Is anybody already facing this issue? Any help available?
Best regards,

Bruno


-Message d'origine-
De : Jon Schuster [mailto:[EMAIL PROTECTED]
Envoyé : mercredi 14 juillet 2004 22:51
À : 'Lucene Users List'
Objet : RE: Problems indexing Japanese with CJKAnalyzer

Hi all,

Thanks for the help on indexing Japanese documents. I eventually got things
working, and here's an update so that other folks might have an easier time
in similar situations.

The problem I had was indeed with the encoding, but it was more than just
the encoding on the initial creation of the HTMLParser (from the Lucene demo
package). In HTMLDocument, doing this:

    InputStreamReader reader = new InputStreamReader(new FileInputStream(f), "SJIS");
    HTMLParser parser = new HTMLParser(reader);

creates the parser and feeds it Unicode from the original Shift-JIS encoded
document, but then when the document contents are fetched using this line:

    Field fld = Field.Text("contents", parser.getReader());

HTMLParser.getReader creates an InputStreamReader and OutputStreamWriter
using the default encoding, which in my case was Windows 1252 (essentially
Latin-1). That was bad.

In the HTMLParser.jj grammar file, adding an explicit encoding of UTF8 on
both the Reader and Writer got things mostly working. The one missing piece
was in the options section of the HTMLParser.jj file. The original grammar
file generates an input character stream class that treats the input as a
stream of 1-byte characters. To have JavaCC generate a stream class that
handles double-byte characters, you need the option UNICODE_INPUT=true.

So, there were essentially three changes in two files:

HTMLParser.jj - add UNICODE_INPUT=true to the options section; add an explicit
"UTF8" encoding on Reader and Writer creation in getReader(). As far as I
can tell, this change works fine for all of the languages I need to handle,
which are English, French, German, and Japanese.

HTMLDocument - add an explicit "SJIS" encoding when creating the Reader used
to create the HTMLParser. (For western languages, I use an encoding of
"ISO8859_1".)

And of course, use the right language tokenizer.

--Jon
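The default-encoding failure Jon describes (and the Propriã©tã©-style garbling Bruno reports) can be reproduced with plain JDK calls; here is a minimal sketch, with a hypothetical class name, of UTF-8 bytes being decoded with the wrong charset:

```java
import java.io.UnsupportedEncodingException;

public class MojibakeDemo {
    public static void main(String[] args) throws UnsupportedEncodingException {
        String original = "Propriété";
        // Encode with the charset the HTML page declares...
        byte[] utf8Bytes = original.getBytes("UTF-8");
        // ...but decode with a Latin-1-style platform default:
        // each accented character turns into two junk characters.
        String garbled = new String(utf8Bytes, "ISO-8859-1");
        // Decoding with the matching charset restores the text.
        String restored = new String(utf8Bytes, "UTF-8");
        System.out.println(garbled);
        System.out.println(restored);
    }
}
```

The fix in every case in this thread is the same: name the charset explicitly on each InputStreamReader and OutputStreamWriter instead of relying on the platform default.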

earlier responses snipped; see the list archive

Powered By Lucene image?

2004-07-15 Thread yahootintin . 1247688
Hi,



Are there any powered by Lucene images?  I thought there used to
be some on the site but I can't find them now.  Any help is appreciated!



Thanks.




Re: RE: Scoring without normalization!

2004-07-15 Thread Jones G
Sadly, I am still running into problems.

Explain shows the following after the modification.

Rank: 1 ID: 11285358  Score: 5.5740864E8
5.5740864E8 = product of:
  8.3611296E8 = sum of:
8.3611296E8 = product of:
  6.6889037E9 = weight(title:iron in 1235940), product of:
0.12621856 = queryWeight(title:iron), product of:
  7.0507255 = idf(docFreq=10816)
  0.017901499 = queryNorm
5.2994613E10 = fieldWeight(title:iron in 1235940), product of:
  1.0 = tf(termFreq(title:iron)=1)
  7.0507255 = idf(docFreq=10816)
  7.5161928E9 = fieldNorm(field=title, doc=1235940)
  0.125 = coord(1/8)
2.7106019E-8 = product of:
  1.08424075E-7 = sum of:
5.7318403E-9 = weight(abstract:an in 1235940), product of:
  0.03711049 = queryWeight(abstract:an), product of:
2.073038 = idf(docFreq=1569960)
0.017901499 = queryNorm
  1.5445337E-7 = fieldWeight(abstract:an in 1235940), product of:
1.0 = tf(termFreq(abstract:an)=1)
2.073038 = idf(docFreq=1569960)
7.4505806E-8 = fieldNorm(field=abstract, doc=1235940)
1.0269223E-7 = weight(abstract:iron in 1235940), product of:
  0.111071706 = queryWeight(abstract:iron), product of:
6.2046037 = idf(docFreq=25209)
0.017901499 = queryNorm
  9.24558E-7 = fieldWeight(abstract:iron in 1235940), product of:
2.0 = tf(termFreq(abstract:iron)=4)
6.2046037 = idf(docFreq=25209)
7.4505806E-8 = fieldNorm(field=abstract, doc=1235940)
  0.25 = coord(2/8)
  0.667 = coord(2/3)
Rank: 2 ID: 8157438 Score: 2.7870432E8
2.7870432E8 = product of:
  8.3611296E8 = product of:
6.6889037E9 = weight(title:iron in 159395), product of:
  0.12621856 = queryWeight(title:iron), product of:
7.0507255 = idf(docFreq=10816)
0.017901499 = queryNorm
  5.2994613E10 = fieldWeight(title:iron in 159395), product of:
1.0 = tf(termFreq(title:iron)=1)
7.0507255 = idf(docFreq=10816)
7.5161928E9 = fieldNorm(field=title, doc=159395)
0.125 = coord(1/8)
  0.3334 = coord(1/3)
Rank: 3 ID: 10543103  Score: 2.7870432E8
2.7870432E8 = product of:
  8.3611296E8 = product of:
6.6889037E9 = weight(title:iron in 553967), product of:
  0.12621856 = queryWeight(title:iron), product of:
7.0507255 = idf(docFreq=10816)
0.017901499 = queryNorm
  5.2994613E10 = fieldWeight(title:iron in 553967), product of:
1.0 = tf(termFreq(title:iron)=1)
7.0507255 = idf(docFreq=10816)
7.5161928E9 = fieldNorm(field=title, doc=553967)
0.125 = coord(1/8)
  0.3334 = coord(1/3)
Rank: 4 ID: 8753559 Score: 2.7870432E8
2.7870432E8 = product of:
  8.3611296E8 = product of:
6.6889037E9 = weight(title:iron in 2563152), product of:
  0.12621856 = queryWeight(title:iron), product of:
7.0507255 = idf(docFreq=10816)
0.017901499 = queryNorm
  5.2994613E10 = fieldWeight(title:iron in 2563152), product of:
1.0 = tf(termFreq(title:iron)=1)
7.0507255 = idf(docFreq=10816)
7.5161928E9 = fieldNorm(field=title, doc=2563152)
0.125 = coord(1/8)
  0.3334 = coord(1/3)

I would like to get rid of all normalizations and just have TF and IDF.
What am I missing?


On Thu, 15 Jul 2004 Anson Lau wrote :
If you don't mind hacking the source:

In Hits.java

In method getMoreDocs()



     // Comment out the following
     //float scoreNorm = 1.0f;
     //if (length > 0 && scoreDocs[0].score > 1.0f) {
     //  scoreNorm = 1.0f / scoreDocs[0].score;
     //}

     // And just set scoreNorm to 1.
     float scoreNorm = 1.0f;


I don't know if you can do it without going into the source.

Anson


-Original Message-
 From: Jones G [mailto:[EMAIL PROTECTED]
Sent: Thursday, July 15, 2004 6:52 AM
To: [EMAIL PROTECTED]
Subject: Scoring without normalization!

How do I remove document normalization from scoring in Lucene? I just want
to stick to TF IDF.

Thanks.





Re: Scoring without normalization!

2004-07-15 Thread Doug Cutting
Have you looked at:
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/Similarity.html
in particular, at:
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/Similarity.html#lengthNorm(java.lang.String,%20int)
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/Similarity.html#queryNorm(float)
Doug
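A sketch of what overriding those two methods might look like (class name hypothetical). One caveat: lengthNorm is applied at indexing time and stored in the index as the fieldNorm factor, so the collection has to be re-indexed with this Similarity before the change shows up in explain output; queryNorm takes effect at search time.

```java
import org.apache.lucene.search.DefaultSimilarity;

// Hypothetical Similarity that keeps tf and idf but disables
// both length and query normalization.
public class NoNormSimilarity extends DefaultSimilarity {
    public float lengthNorm(String fieldName, int numTokens) {
        return 1.0f;  // baked into the index as fieldNorm at indexing time
    }
    public float queryNorm(float sumOfSquaredWeights) {
        return 1.0f;  // applied to query weights at search time
    }
}
```

Set the same instance on both sides via setSimilarity: on the IndexWriter before re-indexing and on the IndexSearcher before searching. The scoreNorm in Hits that Anson pointed at is a separate, final rescaling that still caps the top score at 1.0.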
Jones G wrote:
[earlier explain output snipped; see the list archive]


Re: Re: Scoring without normalization!

2004-07-15 Thread Jones G
Thanks. I tried overriding Similarity, returning 1 from lengthNorm and queryNorm, and
calling setSimilarity on the IndexSearcher with it.

Query: 1 Found: 1540632
Rank: 1 ID: 8157438 Score: 0.9994
3.73650457E11 = weight(title:iron in 159395), product of:
  7.0507255 = queryWeight(title:iron), product of:
7.0507255 = idf(docFreq=10816)
1.0 = queryNorm
  5.2994613E10 = fieldWeight(title:iron in 159395), product of:
1.0 = tf(termFreq(title:iron)=1)
7.0507255 = idf(docFreq=10816)
7.5161928E9 = fieldNorm(field=title, doc=159395)

How do I get rid of QueryWeight, fieldWeight, fieldNorm from the scoring?

I tried modifying TermQuery without much luck.


On Thu, 15 Jul 2004 Doug Cutting wrote :
Have you looked at:

http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/Similarity.html

in particular, at:

http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/Similarity.html#lengthNorm(java.lang.String,%20int)
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/Similarity.html#queryNorm(float)

Doug

Jones G wrote:
[earlier explain output snipped; see the list archive]

Token or not Token, PerFieldAnalyzer

2004-07-15 Thread Florian Sauvin
Hello,
When indexing a field, we have the choice of tokenizing it or not. I
have a custom analyzer that contains a tokenizer... does it mean that
if the boolean token is set to false, the analyzer is not applied on
the field content?
Everywhere in the documentation (and it seems logical) you say to use
the same analyzer for indexing and querying... how is this handled on
not tokenized fields?
In my case, I have certain fields on which I want the tokenization and
analysis and everything to happen... but on other fields, I just want to
index the content as it is (no alterations at all) and not analyze it at
query time... is that possible?
--
Florian


Re: Token or not Token, PerFieldAnalyzer

2004-07-15 Thread Doug Cutting
Florian Sauvin wrote:
Everywhere in the documentation (and it seems logical) you say to use
the same analyzer for indexing and querying... how is this handled on
not tokenized fields?
Imperfectly.
The QueryParser knows nothing about the index, so it does not know which 
fields were tokenized and which were not.  Moreover, even the index does 
not know this, since you can freely intermix tokenized and untokenized 
values in a single field.

In my case, I have certain fields on which I want the tokenization and
anlysis and everything to happen... but on other fields, I just want to
index the content as it is (no alterations at all) and not analyze at
query time... is that possible?
It is very possible.  A good way to handle this is to use 
PerFieldAnalyzerWrapper.

Doug
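A minimal sketch of Doug's suggestion (field and analyzer choices are illustrative): wrap a default analyzer, register a near-pass-through analyzer for the fields that should stay intact, and hand the same wrapper to both the IndexWriter and the QueryParser.

```java
import org.apache.lucene.analysis.PerFieldAnalyzerWrapper;
import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;

// "contents" gets full StandardAnalyzer treatment; "partNumber" is only
// split on whitespace, so single-token values pass through unaltered.
PerFieldAnalyzerWrapper analyzer =
    new PerFieldAnalyzerWrapper(new StandardAnalyzer());
analyzer.addAnalyzer("partNumber", new WhitespaceAnalyzer());
```

For fields indexed with Field.Keyword (untokenized), QueryParser still analyzes the query text; building a TermQuery programmatically for such fields avoids any analysis at query time.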


Re: release migration plan

2004-07-15 Thread Doug Cutting
fp235-5 wrote:
I am looking at the code to implement setIndexInterval() in IndexWriter. I'd
like to have your opinion on the best way to do it.
Currently the creation of an instance of TermInfosWriter requires the following
steps:
...
IndexWriter.addDocument(Document)
IndexWriter.addDocument(Document, Analyser)
DocumentWriter.addDocument(String, Document)
DocumentWriter.writePostings(Posting[],String)
TermInfosWriter.<init>
To give a different value to indexInterval in TermInfosWriter, we need to add a
variable holding this value into IndexWriter and DocumentWriter and modify the
constructors for DocumentWriter and TermInfosWriter. (quite heavy changes)
I think this is the best approach.  I would replace other parameters in 
these constructors which can be derived from an IndexWriter with the 
IndexWriter.  That way, if we add more parameters like this, they can 
also be passed in through the IndexWriter.

All of the parameters to the DocumentWriter constructor are fields of 
IndexWriter.  So one can instead simply pass a single parameter, an 
IndexWriter, then access its directory, analyzer, similarity and 
maxFieldLength in the DocumentWriter constructor.  A public 
getDirectory() method would also need to be added to IndexWriter for 
this to work.

Similarly, two of SegmentMerger's constructor parameters, the directory and the
boolean useCompoundFile, could be replaced with an IndexWriter.

In SegmentMerger I would replace the directory parameter with the IndexWriter.
Doug


Re: Searching against Database

2004-07-15 Thread Hetan Shah
Is it possible to search against the columns in a table? If so, are
there any limitations on the number of columns one should target to search
against?

any other suggestions?
Thanks.
-H


Re: Searching against Database

2004-07-15 Thread Peter M Cipollone

- Original Message - 
From: Hetan Shah [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Thursday, July 15, 2004 7:51 PM
Subject: Re: Searching against Database


 Is it possible to search against the column in the table ? If so are
 there any limitations on the # of columns one should target to search
 against?

What you can search against depends entirely on how you index your columns.  I
believe you mentioned that you had data in multiple tables for each record
(or Document in Lucene).  If you map your columns to Lucene Fields, and
make sure that the primary key for each record is stored in the same Lucene
Document object as the columns (Fields), then you should be golden.

Someone earlier pointed out that Oracle allows Java in its stored
procedures, so if you use a single stored procedure to insert a new record,
that same procedure can create a matching Lucene Document and add it to the
index.

For updates, you will need to delete the Lucene document and then add a new
copy of the updated record.  If you have a primary key field that is indexed
and stored in Lucene, you can use

http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/IndexReader.html#delete(org.apache.lucene.index.Term)

to delete the old version.

Pete
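Put together, the delete-then-re-add cycle Pete describes might look like this sketch (the index path, field names, and Field choices are illustrative, not from the thread):

```java
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;

public class RecordIndexer {
    // Re-index one database row: delete any stale copy, then add the new one.
    public static void updateRecord(String pk, String title, String body)
            throws IOException {
        IndexReader reader = IndexReader.open("index");
        reader.delete(new Term("id", pk));   // delete by stored primary key
        reader.close();

        IndexWriter writer = new IndexWriter("index", new StandardAnalyzer(), false);
        Document doc = new Document();
        doc.add(Field.Keyword("id", pk));      // stored + indexed, untokenized
        doc.add(Field.Text("title", title));   // stored + indexed + tokenized
        doc.add(Field.UnStored("body", body)); // indexed + tokenized, not stored
        writer.addDocument(doc);
        writer.close();
    }
}
```

In a stored procedure this would run alongside the SQL update; batching many updates into one IndexReader/IndexWriter session is much cheaper than an open/close cycle per row.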





 any other suggestions?
 Thanks.
 -H






