Re: SEARCH +HITS+LIMIT

2004-12-09 Thread Erik Hatcher
On Dec 9, 2004, at 2:59 AM, Karthik N S wrote:
> Apologies...

Again, no need to apologize for asking questions.  Just ask :)

> One question for the forum [especially Erik]

And no need to address anyone specifically here, though I'm happy to 
help and am always here listening.

> Question
> How to display the contents of the Hits in incremental order?
> [Each time, a re-hit to the Mergerindex with an incremental X value.]
> This would solve the out-of-memory problem caused by prefetching all
> the hits in one go.

It is unwise, as you've seen, to prefetch the documents from Hits.  
Hits is designed for exactly what you're after.  You can get the number 
of hits using hits.length().

My recommendation for paging through search results (and I'm using this 
extensively myself) in a web application is to simply display 25 
results on one page - iterating *only* through those 25 desired.  The 
link (or button) the user presses to go to the next or previous page 
passes the page number and query (search.jsp?query=some+query&page=2).  
Your server-side logic will search again, and start the hits iteration 
for 25 entries at 25 * the value of the page attribute.
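
The paging arithmetic described above can be sketched as follows. This is a minimal illustration, not Lucene code: a plain List stands in for Hits, and the class name PagingSketch, the method pageOf, and the zero-based page number are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Paging arithmetic only; a plain List stands in for Lucene's Hits. */
class PagingSketch {
    static final int PAGE_SIZE = 25;

    /** Returns the results for a zero-based page number, touching only that page. */
    static <T> List<T> pageOf(List<T> hits, int page) {
        List<T> out = new ArrayList<>();
        // Start at 25 * page; negative page numbers are treated as page 0.
        long start = (long) Math.max(0, page) * PAGE_SIZE;
        long end = Math.min(start + PAGE_SIZE, (long) hits.size());
        for (long i = start; i < end; i++) {
            out.add(hits.get((int) i));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < 60; i++) {
            hits.add(i);
        }
        // The third page (page = 2) of 60 results holds the last 10 entries.
        System.out.println(pageOf(hits, 2));
    }
}
```

With real Hits the loop body would fetch only the documents in that range, so the rest of the result set is never loaded.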

Lucene is plenty fast enough to allow this to work without much 
concern.  Since your index is getting somewhat large, though, you 
probably do want to keep a persistent instance of IndexSearcher 
available on the server (perhaps in application scope).  But for a 
first-pass implementation, don't concern yourself with even that - just 
re-instantiate IndexSearcher for every search and see how fast things 
are.  If it's fast enough, leave well enough alone.

Does this all make sense?
Erik
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


lucene in action ebook

2004-12-09 Thread alan . hicks
Does anyone have a definite date for the ebook release? I have been
checking the Manning website since the other day, when this was mentioned.

Thanks,
Alan.





IndexWriter.optimize()

2004-12-09 Thread Yura Smolsky
Hello, lucene-user.

I used FSDirectory as storage for my index, and I used the optimize()
method of IndexWriter to optimize the index for faster access.

Now I use DbDirectory (Berkeley DB) as storage. Does it make sense to
use the optimize() method on an index stored this way?

What does optimize() actually do?

Yura Smolsky







Re: lucene in action ebook

2004-12-09 Thread Erik Hatcher
I have the e-book PDF in my possession.  I have been prodding Manning 
on a daily basis to update the LIA website and get the e-book 
available.  It is ready, and I'm sure that it's just a matter of them 
pushing it out.  There may be some administrative loose ends they are 
tying up before releasing it to the world.  It should be available any 
minute now, really.  :)

Erik
On Dec 9, 2004, at 5:38 AM, [EMAIL PROTECTED] wrote:
Does anyone have a definite date for the ebook release? I have been
checking the manning website since the other day when this was 
mentioned?

Thanks,
Alan.


RE: finalize delete without optimize

2004-12-09 Thread Aviran
Lucene standard API does not support this kind of operation.

Aviran
http://www.aviransplace.com


-Original Message-
From: John Wang [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, December 08, 2004 17:32 PM
To: [EMAIL PROTECTED]
Subject: Re: finalize delete without optimize


Hi folks:

I sent this out a few days ago without a response. 

Please help.

Thanks in advance

-John


On Mon, 6 Dec 2004 21:15:00 -0800, John Wang [EMAIL PROTECTED] wrote:
 Hi:
 
   Is there a way to finalize deletes, i.e. actually remove the deleted 
 documents from the segments and make sure the docIDs are contiguous again?
 
   The only explicit way to do this is by calling 
 IndexWriter.optimize(). But this call does a lot more (it also merges all 
 the segments), and hence is very expensive. Is there a way to simply 
 finalize the deletes without having to merge all the segments?
 
If not, I'd be glad to submit an implementation of this feature if 
 the Lucene devs agree this is useful.
 
 Thanks
 
 -John





RE: IndexWriter.optimize()

2004-12-09 Thread Aviran
Besides merging the segments, optimize() also physically removes all the
deleted documents from the index. (When you call delete(), Lucene only marks
the documents as deleted; they are physically removed when you call optimize().)
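
The difference between marking and physically removing can be pictured with a toy model. This is only a sketch, not Lucene's actual implementation: the class ToySegment and its methods are hypothetical, with maxDoc() and numDocs() named after the IndexReader methods they imitate.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Toy model (not Lucene code): deletes are only marks until a compaction pass. */
class ToySegment {
    private final List<String> docs = new ArrayList<>();
    private final BitSet deleted = new BitSet();

    /** Adds a document and returns its docID. */
    int add(String doc) {
        docs.add(doc);
        return docs.size() - 1;
    }

    /** Marks a document as deleted; its slot and docID stay in place. */
    void delete(int docId) {
        if (docId < 0 || docId >= docs.size()) {
            throw new IndexOutOfBoundsException("no such docID: " + docId);
        }
        deleted.set(docId);
    }

    /** One more than the highest docID in use, counting deleted slots. */
    int maxDoc() {
        return docs.size();
    }

    /** Number of documents that are not marked as deleted. */
    int numDocs() {
        return docs.size() - deleted.cardinality();
    }

    String get(int docId) {
        return docs.get(docId);
    }

    /** Drops the marked documents and renumbers the rest contiguously. */
    void compact() {
        List<String> live = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            if (!deleted.get(i)) {
                live.add(docs.get(i));
            }
        }
        docs.clear();
        docs.addAll(live);
        deleted.clear();
    }
}
```

In this model, deleting one of three documents leaves maxDoc() at 3 and numDocs() at 2; only after compact() do both report 2, and the surviving documents get new, contiguous docIDs.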

Aviran
http://www.aviransplace.com

-Original Message-
From: Yura Smolsky [mailto:[EMAIL PROTECTED] 
Sent: Thursday, December 09, 2004 5:55 AM
To: [EMAIL PROTECTED]
Subject: IndexWriter.optimize()


Hello, lucene-user.

I used FSDirectory as storage for my index, and I used the optimize() method
of IndexWriter to optimize the index for faster access.

Now I use DbDirectory (Berkeley DB) as storage. Does it make sense to use
the optimize() method on an index stored this way?

What does optimize() actually do?

Yura Smolsky







Re: LIMO problems

2004-12-09 Thread Luke Shannon
I use Luke. It is pretty good.

http://www.getopt.org/luke/

Luke
- Original Message - 
From: Daniel Cortes [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Thursday, December 09, 2004 8:32 AM
Subject: LIMO problems


 Hi, I'm trying LIMO (Lucene Index Monitor) and I have a problem.
 It is probably a silly one, but I don't have a solution yet.
 Can someone tell me what structure the limo.properties file has?
 I don't have any example. Thanks.
 If you know of another web application for administering Lucene indexes,
 please tell me.
 Thanks for everything, and excuse my silly questions.





maxDoc()

2004-12-09 Thread Garrett Heaver
Can anyone please explain to me why maxDoc() returns 0 when Luke shows 239,473
documents?

maxDoc() returns the correct number until I delete a document. I have
called optimize() after the delete, but the problem remains.

Strange.

Any ideas greatly appreciated.

Garrett



Re: partial updating of lucene

2004-12-09 Thread Praveen Peddi
But when I am searching, it only searches in the index. Stored fields are 
only used to display the results, not to search. Why would it lose the terms 
in the index when I retrieve the document?

First solution is not possible (I can't create a new document) since I only 
have modified fields.

When I get a document, don't the fields have indexed terms along with them? 
Is there no way to get a full document (along with indexed terms) and clone 
it and add it to the index?

Well, is there any way I can update a document with just one field (because I 
only have data for that one field)?

Praveen
- Original Message - 
From: Justin Swanhart [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Wednesday, December 08, 2004 5:59 PM
Subject: Re: partial updating of lucene


Your unstored fields were not stored in the index; only their terms
were stored.  When you get the document from the index and modify it,
those terms are lost when you add the document again.
You can either simply create a new document and populate all the
fields and add that document to the index, or you can add the unstored
fields to the document retrieved in step 1.
On Wed, 8 Dec 2004 17:53:26 -0500, Praveen Peddi
[EMAIL PROTECTED] wrote:
Hi all,
I have a question about updating a Lucene document. I know that there 
is no API to do that now, so this is what I am doing in order to update 
the document's title field.

1) Get the document from the Lucene index.
2) Remove the field called title and add the same field with a modified 
value.
3) Remove the document (based on one of our fields) using the Reader and 
then close the Reader.
4) Add the document that was obtained in 1 and modified in 2.

I am not sure if this is the right way of doing it, but I am having 
problems searching for that document after updating it. The problem is 
only with the unstored fields.

For example, I search for description:boy, where description is an 
unstored, indexed, tokenized field in the document. I find 1 document. 
Now I update the document's title as described above and 
repeat the same search, description:boy, and now I don't find any 
results. I have not touched the description field at all. I just 
updated the title field.

Is this expected behaviour? If not, is it a bug?
If I change the description field to stored, indexed and tokenized, the 
search works fine before and after updating.

Praveen
**
Praveen Peddi
Sr Software Engg, Context Media, Inc.
email:[EMAIL PROTECTED]
Tel:  401.854.3475
Fax:  401.861.3596
web: http://www.contextmedia.com
**
Context Media- The Leader in Enterprise Content Integration



Re: LIMO problems

2004-12-09 Thread Luke Francl
On Thu, 2004-12-09 at 07:32, Daniel Cortes wrote:
 Hi, I'm trying LIMO (Lucene Index Monitor) and I have a problem.
 It is probably a silly one, but I don't have a solution yet.
 Can someone tell me what structure the limo.properties file has?
 I don't have any example. Thanks.
 If you know of another web application for administering Lucene indexes,
 please tell me.
 Thanks for everything, and excuse my silly questions.

Daniel,

Julien or I will be happy to help you, but I need more information. What
version of LIMO are you using?

In LIMO 0.5.2, Julien added a new feature which allows you to configure
the LIMO web application while it is running through the limo.properties
file.

This file is in the standard Java properties file format:

index name=filesystem location
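
As an illustration of that format, the sketch below parses two entries with java.util.Properties. The index names and paths are made up for the example, and the class LimoPropertiesSketch is not part of LIMO. Note that in the properties format an unescaped space ends the key, so index names without spaces are the simplest choice.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Parses "index name=filesystem location" lines in Java properties format. */
class LimoPropertiesSketch {
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            // Reading from an in-memory string should not fail; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // Hypothetical entries: the names and paths are placeholders.
        String text = "# index name=filesystem location\n"
                + "docs=/var/lucene/docs-index\n"
                + "mail=/var/lucene/mail-index\n";
        Properties props = parse(text);
        for (String name : props.stringPropertyNames()) {
            System.out.println(name + " -> " + props.getProperty(name));
        }
    }
}
```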

However, you shouldn't need to care about this detail, as there is a
method to add indexes from the web application.

If you have any other questions, please don't hesitate to ask.

Regards,
Luke Francl
LIMO developer

P.S.: LIMO 0.5.2 adds a new index file browser that shows you some
interesting details about your index files. Check it out!





Re: partial updating of lucene

2004-12-09 Thread Praveen Peddi
If I store all the fields I am indexing, is it safe to get the document, 
update a field and add it again to the search index? I do not want to lose 
anything, and I want to make sure that the document is the same before and 
after updating (except for the updated fields).

Praveen
- Original Message - 
From: Erik Hatcher [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Thursday, December 09, 2004 10:00 AM
Subject: Re: partial updating of lucene


On Dec 9, 2004, at 9:48 AM, Praveen Peddi wrote:
> But when I am searching, it only searches in the index. Stored fields are 
> only used to display the results, not to search. Why would it lose the 
> terms in the index when I retrieve the document?
>
> First solution is not possible (I can't create a new document) since I 
> only have modified fields.
>
> When I get a document, don't the fields have indexed terms along with 
> them? Is there no way to get a full document (along with indexed terms) 
> and clone it and add it to the index?
>
> Well, is there any way I can update a document with just one field 
> (because I only have data for that one field)?

A Document only carries along its *stored* fields.  Fields that are 
indexed, but not stored, are not retrievable from Document.

Have a look at the tool Luke (Google for luke lucene :) and see how it 
does its Reconstruct and Edit facility.  It is possible, though 
potentially lossy, to reconstruct a document and add it again.

Erik


Re: LIMO problems

2004-12-09 Thread Luke Francl
On Thu, 2004-12-09 at 10:07, Daniel Cortes wrote:
 I have the latest version of LIMO.
 It is running in Tomcat, and I can't add any index; it also doesn't load 
 the index that I created earlier from the console (java 
 org.apache.lucene.demo.IndexFiles ...).
 This is the reason I asked about the structure of limo.properties: 
 the file doesn't exist, and by creating it I could force LIMO to load 
 the location of the index.
 Thanks for your time.

Ah, this probably means that LIMO cannot write to this location. If you
give the user you are running Tomcat as permission to write files to
your webapps/limo.war directory (or whatever it's called, I don't
actually use Tomcat), it should work.

If you don't want to do that for security reasons, simply create the
file and put it there yourself. It should be at the same level as the
index.jsp file.

Regards,
Luke Francl
LIMO developer





Re: partial updating of lucene

2004-12-09 Thread Luke Francl
On Thu, 2004-12-09 at 09:00, Erik Hatcher wrote:

 Have a look at the tool Luke (Google for luke lucene :) and see how 
 it does its Reconstruct and Edit facility.  It is possible, though 
 potentially lossy, to reconstruct a document and add it again.

Or look at LIMO's implementation of that feature, which to my eyes is a
little easier to read (of course that's probably because I wrote it...
;):

http://cvs.sourceforge.net/viewcvs.py/limo/limo/src/net/sourceforge/limo/LimoUtils.java?rev=1.6&view=markup

(check out LimoUtils.reconstructDocument())

However, if you're doing analysis on your text to remove stopwords and
stuff like that, this WILL be lossy. I consider it more of an aid for
debugging than a way to re-index documents, though I suppose it would
work for that as well. However, I believe the process would be highly
resource intensive so I wouldn't recommend it.

The better solution is to add a stored keyword field that stores the
location of your document, and then re-index it from the source.
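
A toy in-memory model can show why re-indexing from the source is safer than re-adding a retrieved document. Everything here is hypothetical and is not Lucene code: ReindexSketch indexes every field but stores only the path and the title, so a document read back from it no longer carries its description terms.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Toy stand-in for an index: every field is indexed, only path and title are stored. */
class ReindexSketch {
    private final Map<String, Set<String>> postings = new HashMap<>();       // "field:term" -> paths
    private final Map<String, Map<String, String>> stored = new HashMap<>(); // path -> stored fields

    /** Indexes all fields of a record, keyed by the location of its source. */
    void add(String path, Map<String, String> record) {
        for (Map.Entry<String, String> field : record.entrySet()) {
            for (String term : field.getValue().toLowerCase().split("\\s+")) {
                postings.computeIfAbsent(field.getKey() + ":" + term, k -> new HashSet<>()).add(path);
            }
        }
        Map<String, String> kept = new LinkedHashMap<>();
        kept.put("path", path);
        kept.put("title", record.get("title"));
        stored.put(path, kept);
    }

    void delete(String path) {
        stored.remove(path);
        for (Set<String> paths : postings.values()) {
            paths.remove(path);
        }
    }

    /** Returns only the stored fields, as a retrieved document would. */
    Map<String, String> document(String path) {
        return stored.get(path);
    }

    boolean matches(String field, String term, String path) {
        Set<String> paths = postings.get(field + ":" + term);
        return paths != null && paths.contains(path);
    }

    /** Updates by going back to the source record, so unstored fields are indexed again. */
    void updateFromSource(String path, Map<String, Map<String, String>> source) {
        delete(path);
        add(path, source.get(path));
    }

    public static void main(String[] args) {
        Map<String, String> record = new HashMap<>();
        record.put("title", "Old title");
        record.put("description", "a boy and his dog");
        Map<String, Map<String, String>> source = new HashMap<>();
        source.put("/docs/1.txt", record);

        ReindexSketch index = new ReindexSketch();
        index.add("/docs/1.txt", record);

        record.put("title", "New title"); // the source record changes
        index.updateFromSource("/docs/1.txt", source);
        System.out.println(index.matches("description", "boy", "/docs/1.txt")); // true
    }
}
```

Deleting the entry and re-adding the map returned by document() would instead drop the description terms, which is the behaviour reported earlier in this thread.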

Regards,
Luke Francl





Re: maxDoc()

2004-12-09 Thread Otis Gospodnetic
Hello Garrett,

Share some code, it will be easier for others to help you that way. 
Obviously, this would be a huge bug if the problem were within Lucene.

Otis

--- Garrett Heaver [EMAIL PROTECTED] wrote:

 Can anyone please explain to me why maxDoc() returns 0 when Luke shows
 239,473 documents?
 
 maxDoc() returns the correct number until I delete a document. I have
 called optimize() after the delete, but the problem remains.
 
 Strange.
 
 Any ideas greatly appreciated.
 
 Garrett
 
 





Re: Retrieving all docs in the index

2004-12-09 Thread Erik Hatcher
On Dec 9, 2004, at 1:35 PM, Ravi wrote:
> Is there any other way to extract all documents from an index apart
> from adding an additional field with the same value to all documents 
> and then doing a term query on that field with the common value?

Of course.  Have a look at the IndexReader API.

Erik


RE: Retrieving all docs in the index

2004-12-09 Thread Ravi
I'm sorry, I don't think I articulated my question well. We use a date
filter to sort the search results. This works fine when the user provides
some search criteria. But if he gives empty search criteria, we need
to return all the documents in the index in the given date range, sorted
by date. So I was looking for a query that returns all documents in
the index, to which I can then apply the date filter.


-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED] 
Sent: Thursday, December 09, 2004 1:55 PM
To: Lucene Users List
Subject: Re: Retrieving all docs in the index

On Dec 9, 2004, at 1:35 PM, Ravi wrote:
  Is there any other way to extract all documents from an index apart 
 from adding an additional field with the same value to all documents 
 and then doing a term query on that field with the common value?

Of course.  Have a look at the IndexReader API.

Erik





RE: Retrieving all docs in the index

2004-12-09 Thread Aviran
In this case you'll have to add another field with a fixed value to all the
documents and query on that field.


Aviran
http://www.aviransplace.com

-Original Message-
From: Ravi [mailto:[EMAIL PROTECTED] 
Sent: Thursday, December 09, 2004 14:04 PM
To: Lucene Users List
Subject: RE: Retrieving all docs in the index


I'm sorry, I don't think I articulated my question well. We use a date filter
to sort the search results. This works fine when the user provides some
search criteria. But if he gives empty search criteria, we need to return
all the documents in the index in the given date range, sorted by date. So I
was looking for a query that returns all documents in the index, to which
I can then apply the date filter.


-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED] 
Sent: Thursday, December 09, 2004 1:55 PM
To: Lucene Users List
Subject: Re: Retrieving all docs in the index

On Dec 9, 2004, at 1:35 PM, Ravi wrote:
  Is there any other way to extract all documents from an index apart
 from adding an additional field with the same value to all documents 
 and then doing a term query on that field with the common value?

Of course.  Have a look at the IndexReader API.

Erik





Re: Retrieving all docs in the index

2004-12-09 Thread Paul Elschot
On Thursday 09 December 2004 21:18, Ravi wrote:
 That was exactly my original question. I was wondering if there are
 alternatives to this approach.  

In case you need only a few of the top ranking documents,
and the documents are to be sorted by date anyway,
you might consider searching each of the dates separately,
in sorted order, until you have enough results.

That way there is no need for a field with some constant
value. Nonetheless, I recommend having a special field
containing all the field names of a document. As all
docs normally contain a primary key, the name of the primary
key field can then serve as the constant value.
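
The date-by-date idea can be sketched without Lucene as follows. The sorted map stands in for one term query per date; the class DateWalkSketch, the yyyyMMdd date strings, and the newest-first order are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Walks the dates of a range in sorted order until enough results are collected. */
class DateWalkSketch {
    /**
     * docsByDate maps a yyyyMMdd date to the documents indexed under that date.
     * Looking up one date here plays the role of running one query for that date.
     */
    static List<String> collect(NavigableMap<String, List<String>> docsByDate,
                                String from, String to, int wanted) {
        List<String> out = new ArrayList<>();
        // Newest date first; both ends of the range are inclusive.
        for (List<String> docs : docsByDate.subMap(from, true, to, true).descendingMap().values()) {
            for (String doc : docs) {
                if (out.size() == wanted) {
                    return out; // enough results, stop without visiting older dates
                }
                out.add(doc);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        NavigableMap<String, List<String>> docsByDate = new TreeMap<>();
        docsByDate.put("20041207", List.of("a"));
        docsByDate.put("20041208", List.of("b", "c"));
        docsByDate.put("20041209", List.of("d"));
        System.out.println(collect(docsByDate, "20041208", "20041209", 2)); // [d, b]
    }
}
```

The walk stops as soon as the requested number of results is reached, so older dates in the range are never searched when the newest ones already fill the page.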

Regards,
Paul Elschot

 



RE: Retrieving all docs in the index

2004-12-09 Thread Ravi
Thanks Paul. I think I'll go with the first approach (adding a new
field).  

-Original Message-
From: Paul Elschot [mailto:[EMAIL PROTECTED] 
Sent: Thursday, December 09, 2004 3:49 PM
To: [EMAIL PROTECTED]
Subject: Re: Retrieving all docs in the index

On Thursday 09 December 2004 21:18, Ravi wrote:
 That was exactly my original question. I was wondering if there are 
 alternatives to this approach.

In case you need only a few of the top ranking documents, and the
documents are to be sorted by date anyway, you might consider searching
each of the dates separately, in sorted order, until you have enough
results.

That way there is no need for a field with some constant value.
Nonetheless, I recommend having a special field containing all the
field names of a document. As all docs normally contain a primary key,
the name of the primary key field can then serve as the constant value.

Regards,
Paul Elschot

 



Coordination value

2004-12-09 Thread Jason Haruska
I would like to adjust the score lucene is returning to use the
coordination component more. For example, I have a BooleanQuery
containing three TermQueries. I would like to adjust the score so that
documents containing all three terms appear first, followed by docs
that contain only two of the terms, followed by documents that contain
only one of the terms.

I understand that the coordination is a component of the overall
document score currently, but I'd like to make it more absolute. I was
wondering if someone on the list has done something similar.

I have implemented a hack that works by adding a function to the
BooleanWeight class but it is very slow. I believe it is inefficient
because it uses the Explanation class to get the coordination value.
There must be an easier way that I'm missing.




RE: Coordination value

2004-12-09 Thread Chuck Williams
There is an easier way.  You should use a custom Similarity, which
allows you to define your own coord() method.  Look at DefaultSimilarity
(which specializes Similarity).

I'd suggest analyzing your scores first with explain() to decide what
you really want to tweak.  Just a guess, but your issue might be that
your idf()'s are dominating the score computation.  I had this problem
and changed the default idf() to take a final square root, since Lucene
squares that contribution (which is one of its few areas that is
generally not considered best practice).  I also boost the base of the
logarithms on both tf and idf to weight those factors lower.
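
A minimal sketch of the two tweaks, written as standalone functions rather than as a Similarity subclass so that it runs without Lucene. The class ScoringSketch and the exponent parameter are hypothetical, and the formulas labelled "assumed default" are a recollection of DefaultSimilarity that should be checked against your Lucene version.

```java
/** Standalone scoring helpers; in Lucene these would be overrides in a custom Similarity. */
class ScoringSketch {
    /** Assumed default coord: the fraction of query terms that matched. */
    static float defaultCoord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    /** Steeper coord: raising the fraction to a power penalizes partial matches harder. */
    static float steepCoord(int overlap, int maxOverlap, double exponent) {
        return (float) Math.pow(overlap / (double) maxOverlap, exponent);
    }

    /** Assumed default idf: log(numDocs / (docFreq + 1)) + 1. */
    static float defaultIdf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    /** Damped idf: the square root, so that squaring it in the score gives the plain idf. */
    static float dampedIdf(int docFreq, int numDocs) {
        return (float) Math.sqrt(defaultIdf(docFreq, numDocs));
    }

    public static void main(String[] args) {
        for (int overlap = 1; overlap <= 3; overlap++) {
            System.out.println(overlap + "/3 terms: default=" + defaultCoord(overlap, 3)
                    + " steep=" + steepCoord(overlap, 3, 4.0));
        }
        System.out.println("idf default=" + defaultIdf(1, 1000)
                + " damped=" + dampedIdf(1, 1000));
    }
}
```

A steeper coord widens the gap between documents matching all, two, or one of the terms, but it does not by itself guarantee strict tiers: a large enough tf or idf contribution can still lift a partial match above a full one, which is why checking explain() output first is worthwhile.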

Good luck,

Chuck

   -Original Message-
   From: Jason Haruska [mailto:[EMAIL PROTECTED]
   Sent: Thursday, December 09, 2004 1:36 PM
   To: Lucene Users List
   Subject: Coordination value
   
   I would like to adjust the score lucene is returning to use the
   coordination component more. For example, I have a BooleanQuery
   containing three TermQueries. I would like to adjust the score so that
   documents containing all three terms appear first, followed by docs
   that contain only two of the terms, followed by documents that contain
   only one of the terms.
   
   I understand that the coordination is a component of the overall
   document score currently, but I'd like to make it more absolute. I was
   wondering if someone on the list has done something similar.
   
   I have implemented a hack that works by adding a function to the
   BooleanWeight class but it is very slow. I believe it is inefficient
   because it uses the Explanation class to get the coordination value.
   There must be an easier way that I'm missing.
   
  



Re: lucene in action ebook

2004-12-09 Thread Kevin A. Burton
Erik Hatcher wrote:
> I have the e-book PDF in my possession. I have been prodding Manning 
> on a daily basis to update the LIA website and get the e-book 
> available. It is ready, and I'm sure that it's just a matter of them 
> pushing it out. There may be some administrative loose ends they are 
> tying up before releasing it to the world. It should be available any 
> minute now, really. :)

Send off a link to the list when it's out...
We're all holding our breath ;)
(seriously)

Kevin
--
Use Rojo (RSS/Atom aggregator).  Visit http://rojo.com. Ask me for an 
invite!  Also see irc.freenode.net #rojo if you want to chat.

Rojo is Hiring! - http://www.rojonetworks.com/JobsAtRojo.html
If you're interested in RSS, Weblogs, Social Networking, etc... then you 
should work for Rojo!  If you recommend someone and we hire them you'll 
get a free iPod!
   
Kevin A. Burton, Location - San Francisco, CA
  AIM/YIM - sfburtonator,  Web - http://peerfear.org/
GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D 8D04 99F1 4412



Permissioning Documents

2004-12-09 Thread Steve Skillcorn, Docuviz Technologies
Hi;
 
I'm currently using Lucene (which I am extremely impressed with BTW) to
index a knowledge base of documents.  One issue I have is that only certain
documents are available to certain users (or groups).  The number of
documents is large, into the 100,000s, and the number of users can be into
the 1000s.  Obviously, the users permissioned to see certain documents can
change regularly, so storing the user IDs in the Lucene document is
undesirable, as a permission change could mean a delete and re-add of
potentially 100s of documents.
 
Does anyone have any guidance as to how I should approach this?
 
Would this be something the Lucene community would be interested in having
committed back if I embark on an optimised development at the index level?
 
Or, in the opinion of the group, should I just brute force the adds/deletes
and be done with it?
 
All advice greatly appreciated.
 
Steve
 
 


Re: lucene in action ebook

2004-12-09 Thread Paul Smith
synchronized(luceneEbook){
luceneEbook.wait();
}
Just waiting for the notifyAll()

Kevin A. Burton wrote:
> Erik Hatcher wrote:
>> I have the e-book PDF in my possession. I have been prodding Manning 
>> on a daily basis to update the LIA website and get the e-book 
>> available. It is ready, and I'm sure that it's just a matter of them 
>> pushing it out. There may be some administrative loose ends they are 
>> tying up before releasing it to the world. It should be available any 
>> minute now, really. :)
>
> Send off a link to the list when it's out...
> We're all holding our breath ;)
> (seriously)
> Kevin
--
*Paul Smith
*Software Architect

*Aconex
* 31 Drummond Street, Carlton, VIC 3053, Australia
*Tel: +61 3 9661 0200  *Fax: +61 3 9654 9946
Email: [EMAIL PROTECTED]  www.aconex.com**