date:20050228

UWTV Program: Google: A Behind-the-Scenes Look

2005-02-28 Thread Chakra Yadavalli

Just came across this interesting webcast. Check it out.
-- Chakra
"Google: A Behind-the-Scenes Look
Search is one of the most important applications used on the internet
and poses some of the most interesting challenges in computer science.
Providing high-quality search requires understanding across a wide
range of computer science disciplines. In this program, Jeff Dean of
Google describes some of these challenges, discusses applications
Google has developed, and highlights systems they've built, including
GFS, a large-scale distributed file system, and MapReduce, a library
for automatic parallelization and distribution of large-scale
computation. He also shares some interesting observations derived from
Google's web data."
http://www.uwtv.org/programs/displayevent.asp?rid=2459
-- 
Visit my weblog: http://www.jroller.com/page/cyblogue

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: 1.4.x TermInfosWriter.indexInterval not public static ?

2005-02-28 Thread Kevin A. Burton

Doug Cutting wrote:
> The default value is probably good for all but folks with very large
> indexes, who may wish to increase the default somewhat.  Also folks
> with smaller indexes and very high query volumes may wish to decrease
> the default.  It's a classic time/memory tradeoff.  Higher values use
> less memory and make searches a bit slower, smaller values use more
> memory and make searches a bit faster.

BTW, can you define "a bit"?
Is "a bit" 5%?  10%?  Benchmarks would be nice but I'm not that picky.
I just want to see what performance hits/benefits I could see by
tweaking the values.

Kevin
--
Use Rojo (RSS/Atom aggregator).  Visit http://rojo.com. Ask me for an 
invite!  Also see irc.freenode.net #rojo if you want to chat.

Rojo is Hiring! - http://www.rojonetworks.com/JobsAtRojo.html
If you're interested in RSS, Weblogs, Social Networking, etc... then you 
should work for Rojo!  If you recommend someone and we hire them you'll 
get a free iPod!
   
Kevin A. Burton, Location - San Francisco, CA
  AIM/YIM - sfburtonator,  Web - http://peerfear.org/
GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D 8D04 99F1 4412


PDF Highlighter Package

2005-02-28 Thread Ben Litchfield


For those of you that support indexing PDF documents, PDFBox now supports
Adobe's PDF Highlight specification
(http://partners.adobe.com/public/developer/en/pdf/HighlightFileFormat.pdf)

PDFBox is now capable of generating an XML document that describes words
in a PDF document to highlight.

An "in action" example can be seen at

http://pavilion.csh.rit.edu:8080/pdfbox/index.html

You can enter any web-accessible PDF and any keywords.  The PDF will open
normally and, after a short pause (this is running on an old, slow server),
will jump to the first selected keyword.

Source code is available in CVS or in tonight's nightly build.

Any comments/suggestions are welcome.

Special thanks to Stephan Lagraulet, who made this possible with code
contributions.

Ben
http://www.pdfbox.org


Re: 1.4.x TermInfosWriter.indexInterval not public static ?

2005-02-28 Thread Doug Cutting

Chris Hostetter wrote:
> 1) If making it mutable requires changes to other classes to propagate
> it, then why is it now an instance variable instead of a static?
> (Presumably making it an instance variable allows subclasses to
> override the value, but if other classes have internal expectations
> of the value, that doesn't seem safe)

It's an instance variable because it can vary from instance to instance.
This value is specified when an index segment is written, and
subsequently read from disk and used when reading that segment.  It's an
instance variable in both the writing and reading code.  The thing
that's lacking is a way to pass in alternate values to the writing code.

The reason that other classes are involved is that the reading and
writing code are in non-public classes.  We don't want to expose the
implementation too much by making these public, but would rather expose
these as getter/setter methods on the relevant public API.

> 2) Should it be configurable through a get/set method, or through a
> system property?
> (which rehashes the instance/global question)

That's indeed the question.  My guess is that a system property would
probably be sufficient for most, but perhaps not for all.  Similarly
with a static setter/getter.  But a getter/setter on IndexWriter would
make everyone happy.

> 3) Is it important that a writer updating an existing index use the same
> value as the writer that initially created the index?  If so, should
> there really be a "preferedIndexInterval" variable which is mutable,
> and a "currentIndexInterval" which is set to the value of the index
> currently being updated?  Such that preferedIndexInterval is used when
> making an index from scratch and currentIndexInterval is used when
> adding segments to a new index?

It's used whenever an index segment is created.  Index segments are
created when documents are added and when index segments are merged to
form larger index segments.  Merging happens frequently while indexing.
Optimization merges all segments.

The value can vary in each segment.
The default value is probably good for all but folks with very large 
indexes, who may wish to increase the default somewhat.  Also folks with 
smaller indexes and very high query volumes may wish to decrease the 
default.  It's a classic time/memory tradeoff.  Higher values use less 
memory and make searches a bit slower, smaller values use more memory 
and make searches a bit faster.
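
To put rough numbers on the tradeoff described above, here is a small back-of-the-envelope sketch. It assumes that every interval-th term is held in memory and that a lookup scans forward over at most interval - 1 on-disk terms after finding the nearest in-memory entry. The class and method names are hypothetical, the 10,000,000-term index is made up, and this is a model of the idea rather than Lucene code.

```java
/** Back-of-the-envelope model of the term index interval tradeoff (not Lucene code). */
public class TermIndexIntervalSketch {

    /** Number of terms kept in memory when every interval-th term is indexed. */
    public static long inMemoryEntries(long totalTerms, int interval) {
        return (totalTerms + interval - 1) / interval; // ceiling division
    }

    /** Worst-case number of on-disk terms scanned after the nearest in-memory entry. */
    public static int worstCaseScan(int interval) {
        return interval - 1;
    }

    public static void main(String[] args) {
        long totalTerms = 10_000_000L; // hypothetical index size
        for (int interval : new int[] {32, 128, 512}) {
            System.out.printf("interval=%d  in-memory entries=%d  worst-case scan=%d terms%n",
                    interval, inMemoryEntries(totalTerms, interval), worstCaseScan(interval));
        }
    }
}
```

Under this model, quadrupling the interval cuts the in-memory entries to roughly a quarter while quadrupling the worst-case scan, which is the time/memory tradeoff in miniature. It says nothing about how large "a bit" is in wall-clock terms; that still needs a benchmark.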

Unless there are objections I will add this as:
  IndexWriter.setTermIndexInterval()
  IndexWriter.getTermIndexInterval()
Both will be marked "Expert".
Further discussion should move to the lucene-dev list.
Doug

Re: Boost doesn't works

2005-02-28 Thread Doug Cutting

Claude Libois wrote:
> The explanation given by the IndexSearcher indicates that the boost of
> my title is 1.0 where it should be 10.0.
> I really don't understand what is wrong.
You're seeing the boost for the query term, not the boost for the 
document's field.  The boost for the field in the document is multiplied 
by its lengthNorm.  This product is displayed in explanations as the 
"fieldNorm".

Doug
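
The product described above can be sketched as follows. This is a minimal illustration, assuming the default length normalization of 1 / sqrt(number of terms in the field); the class and method names are hypothetical, and it ignores both the document-level boost and the precision lost when the norm is stored compactly in the index.

```java
/** Sketch of how a field boost is folded into the fieldNorm (not Lucene code). */
public class FieldNormSketch {

    /** Length normalization, assumed here to be 1 / sqrt(number of terms in the field). */
    public static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    /** The value reported as "fieldNorm": field boost multiplied by lengthNorm. */
    public static float fieldNorm(float fieldBoost, int numTerms) {
        return fieldBoost * lengthNorm(numTerms);
    }

    public static void main(String[] args) {
        int titleTerms = 4; // hypothetical four-word title
        System.out.println("unboosted fieldNorm: " + fieldNorm(1.0f, titleTerms));
        System.out.println("boost 10 fieldNorm:  " + fieldNorm(10.0f, titleTerms));
    }
}
```

So a title boosted by 10 would not show up as "boost=10.0" in an explanation; under these assumptions it appears as a fieldNorm ten times larger than that of an unboosted title of the same length.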

IndexSearch and IndexWriter on 2 CPU's

2005-02-28 Thread Yura Smolsky

Hello.

I have a dual-CPU box running RH Linux. I run two processes on this box:

1. An IndexWriter, which adds new documents to the index constantly,
24/7/365 :)
2. An IndexSearcher, which performs searches against this index.

Sometimes the "writer" begins to merge the index (this is caused by
mergeFactor and the structure of the Lucene index) "inside" the
addDocument method. When a merge begins, my "writer" process takes both
CPUs' time (180-200% in total). Actually, most of the time goes to IO
operations.

When a merge begins, all searches performed by the IndexSearcher on this
computer slow down considerably because all the CPU time goes to the
first process.

How can I "give" the second process more CPU time, or how can I reduce
the IO time of the first process?

Maybe I can tweak something in the index configuration.
I have set
   writer.mergeFactor = 2
   writer.minMergeDocs = 2500
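
One rough way to see how these two settings affect merge IO is to count how many documents get rewritten by merges. The sketch below is a simplified model and an assumption, not Lucene's actual merge code: a new segment of minMergeDocs documents is flushed at a time, and whenever the last mergeFactor segments have the same size they are merged into one. The class name, method names, and the 80,000-document workload are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified model of logarithmic segment merging (not Lucene's actual merge code). */
public class MergeCostSketch {

    /** Total number of documents rewritten by merges while indexing totalDocs documents. */
    public static long docsRewritten(int totalDocs, int mergeFactor, int minMergeDocs) {
        if (mergeFactor < 2 || minMergeDocs < 1) {
            throw new IllegalArgumentException("mergeFactor must be >= 2 and minMergeDocs >= 1");
        }
        List<Integer> segments = new ArrayList<>();
        long rewritten = 0;
        for (int flushed = minMergeDocs; flushed <= totalDocs; flushed += minMergeDocs) {
            segments.add(minMergeDocs); // flush one new segment
            // Cascade: merge while the last mergeFactor segments all have the same size.
            while (segments.size() >= mergeFactor && tailIsUniform(segments, mergeFactor)) {
                int merged = 0;
                for (int i = 0; i < mergeFactor; i++) {
                    merged += segments.remove(segments.size() - 1);
                }
                rewritten += merged; // every document in the merged segment is copied once
                segments.add(merged);
            }
        }
        return rewritten;
    }

    private static boolean tailIsUniform(List<Integer> segments, int mergeFactor) {
        int last = segments.get(segments.size() - 1);
        for (int i = segments.size() - mergeFactor; i < segments.size(); i++) {
            if (segments.get(i) != last) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int totalDocs = 80_000; // hypothetical workload
        System.out.println("mergeFactor=2:  " + docsRewritten(totalDocs, 2, 2500) + " docs rewritten");
        System.out.println("mergeFactor=10: " + docsRewritten(totalDocs, 10, 2500) + " docs rewritten");
    }
}
```

In this model a mergeFactor of 2 rewrites each document several times over as segments cascade upward, while a mergeFactor of 10 rewrites far fewer documents for the same input, at the cost of leaving more segments for the searcher to read. Whether that matches the behavior on a real index would need to be measured.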


Yura Smolsky.




ANN: LuceGene bioinformatics application updated

2005-02-28 Thread Don Gilbert

LuceGene release 1.4  is available now at
http://www.gmod.org/lucegene/
and http://eugenes.org/gmod/lucegene/

LuceGene is an open-source document/object search and retrieval system
specially tuned for bioinformatics text databases and documents.  It is
similar in concept to the commercial SRS package (Sequence Retrieval
System). LuceGene is written in Java, built with the open-source Lucene
package [http://jakarta.apache.org/lucene/]

This release includes an easy to use demonstration. Pop it into a Tomcat
web server and run.

LuceGene adds these bioinformatics methods to Lucene:

 * Indexing adaptors for formats such as XML, PDF Documents,
 Biosequences, Spreadsheets, HTML, and others, with fine tuning by data
 field.

 * Configurations for bio-data include UniProt/Swiss-Prot, Fasta and
 GenBank sequences, BIND protein interactions, BLAST outputs,
 Medline and others.

 * Support for batch-list look-ups and searches by ID, gene names, etc.

 * Web interface with paged results, batch downloads, search
 refinement and search-linking among data libraries.

 * Web Services support with a SOAP interface.

 * Output support for data-field selection and formats such as
 Spreadsheet, XML, HTML, and others.

It can take as little as a few hours engineering time to add new
databank parsing, making it a cost-effective way to use many
bioinformatics data sets.

LuceGene is speedy with big data sets: indexing and searching the
UniProt library of 1.7 million sequences with LuceGene is comparable to
using SRS. Gene Annotation object search and retrieval with LuceGene is
10x to 20x faster than using a Postgres Chado database.

-- Don Gilbert
Genome Informatics Lab
Indiana University, Bloomington IN
http://iubio.bio.indiana.edu/gil/



Re: Fast access to a random page of the search results.

2005-02-28 Thread Erik Hatcher

On Feb 28, 2005, at 10:39 AM, Stanislav Jordanov wrote:
>> What did you do in your private investigation?
> 1. empirical tests with an index of nearly 75,000 docs (I am attaching
> the test source)

Only certain (.txt?) attachments are allowed to come through on the
mailing list.

>> Sorted by descending relevance (the default), or in some other way?
> In some other way - sorted by some column (asc or desc - doesn't
> matter)

Using IndexSearcher.search(query, sort)?

>> If a search is fast enough, as you report, then you can simply start
>> your access to Hits at the appropriate spot.  For the current systems
>> I'm working on, this is the approach I've used - start iterating hits
>> at (pageNumber - 1) * numberOfItemsPerPage.
>>
>> Is that approach insufficient?
> I'm afraid this is not sufficient;
> Either I am doing something wrong,
> or it is not that simple:
> following is a log from my test session;
> It appears that IndexSearcher.search(...) finishes rather fast
> compared to the time it takes to fetch the last document from the Hits
> object.

I assume you are only accessing the documents you wish to display
rather than all of them up to where you need.   Also keep in mind that
accessing a Document is when the document is pulled from the index.  If
you have a large amount of data in a document it will take a
corresponding amount of time to load it.  You may need to restructure
what you store in a document to reduce the load times.  Or perhaps you
need to investigate the (is it in the codebase already?) patch to load
fields lazily upon demand instead.

Erik


The log starts here:
pa
Found 74222 document(s) that matched query 'pa'
Sorting by "sfile_name"
query executed in 16ms
Last doc accessed in 375ms
us
Found 74222 document(s) that matched query 'us'
Sorting by "sfile_name"
query executed in 31ms
Last doc accessed in 219ms
1
Found 74222 document(s) that matched query '1'
Sorting by "sfile_name"
query executed in 15ms
Last doc accessed in 235ms
5
Found 74222 document(s) that matched query '5'
Sorting by "sfile_name"
query executed in 422ms
Last doc accessed in 219ms
6
Found 72759 document(s) that matched query '6'
Sorting by "sfile_name"
query executed in 344ms
Last doc accessed in 250ms

Re: Fast access to a random page of the search results.

2005-02-28 Thread Stanislav Jordanov




> What did you do in your private investigation?
1. empirical tests with an index of nearly 75,000 docs (I am attaching
the test source)
2. reviewing and tracing the source code of Lucene
(I do not claim I have gained a deep understanding of it ;-)

> Sorted by descending relevance (the default), or in some other way?
In some other way - sorted by some column (asc or desc - doesn't matter)

> If a search is fast enough, as you report, then you can simply start
> your access to Hits at the appropriate spot.  For the current systems
> I'm working on, this is the approach I've used - start iterating hits
> at (pageNumber - 1) * numberOfItemsPerPage.
>
> Is that approach insufficient?
I'm afraid this is not sufficient;
Either I am doing something wrong,
or it is not that simple:
following is a log from my test session;
It appears that IndexSearcher.search(...) finishes rather fast
compared to the time it takes to fetch the last document from the Hits
object.
The log starts here:

pa
Found 74222 document(s) that matched query 'pa'
Sorting by "sfile_name"
query executed in 16ms
Last doc accessed in 375ms
us
Found 74222 document(s) that matched query 'us'
Sorting by "sfile_name"
query executed in 31ms
Last doc accessed in 219ms
1
Found 74222 document(s) that matched query '1'
Sorting by "sfile_name"
query executed in 15ms
Last doc accessed in 235ms
5
Found 74222 document(s) that matched query '5'
Sorting by "sfile_name"
query executed in 422ms
Last doc accessed in 219ms
6
Found 72759 document(s) that matched query '6'
Sorting by "sfile_name"
query executed in 344ms
Last doc accessed in 250ms

Re: Fast access to a random page of the search results.

2005-02-28 Thread Erik Hatcher

On Feb 28, 2005, at 6:00 AM, Stanislav Jordanov wrote:
> my private investigation already left me sceptical about the outcome
> of this issue,
> but I've decided to post it as a last resort.

What did you do in your private investigation?

> Suppose I have an index of about 5,000,000 docs
> and I am running single-term queries against it, including queries
> which return say 1,000,000 or even more hits.
>
> The hits are sorted by some column and I am happy with the query
> execution time (i.e. the time spent in the IndexSearcher.search(...)
> method).
> Now comes the problem: it is a product requirement that the client is
> allowed to quickly access (by scrolling) a random page of the result
> set.
> Put in different words the app must quickly (in less than a second)
> respond to requests like: "Give me the results from No 567100 to
> No 567200" (remember the results are sorted thus ordered).

Sorted by descending relevance (the default), or in some other way?
If a search is fast enough, as you report, then you can simply start 
your access to Hits at the appropriate spot.  For the current systems 
I'm working on, this is the approach I've used - start iterating hits 
at (pageNumber - 1) * numberOfItemsPerPage.

Is that approach insufficient?
Erik
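
The offset arithmetic in the approach above can be sketched like this. It is a minimal illustration with hypothetical class and method names; the loop only marks where a call such as hits.doc(i) would go, and it assumes 1-based page numbers.

```java
/** Sketch of the page-window arithmetic for paging through sorted hits. */
public class PageWindowSketch {

    /** Index of the first hit on a 1-based page. */
    public static int firstHit(int pageNumber, int itemsPerPage) {
        return (pageNumber - 1) * itemsPerPage;
    }

    /** Exclusive end index of the page, clamped to the total number of hits. */
    public static int endHit(int pageNumber, int itemsPerPage, int totalHits) {
        return Math.min(firstHit(pageNumber, itemsPerPage) + itemsPerPage, totalHits);
    }

    public static void main(String[] args) {
        int totalHits = 1_000_000; // hypothetical result size
        int itemsPerPage = 100;
        int pageNumber = 5672;

        int start = firstHit(pageNumber, itemsPerPage);
        int end = endHit(pageNumber, itemsPerPage, totalHits);
        System.out.println("Fetching hits " + start + " to " + (end - 1));
        for (int i = start; i < end; i++) {
            // In a real application, only these documents would be loaded here,
            // for example with hits.doc(i); earlier hits are never touched.
        }
    }
}
```

The point of the clamp in endHit is that the last page may be shorter than itemsPerPage, so the loop never asks for a hit beyond the end of the result set.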

> I took a look at Lucene's internals which only left me with the
> suspicion that this is an impossible task.
> Would anyone, please, prove my suspicion wrong?
>
> Regards
> Stanislav


Re: Fast access to a random page of the search results.

2005-02-28 Thread Volodymyr Bychkoviak

Just retrieve documents 567100 to 567200 from the Hits object you got
while searching.

Stanislav Jordanov wrote:
> Guys,
> my private investigation already left me sceptical about the outcome
> of this issue,
> but I've decided to post it as a last resort.
> Perhaps the gurus know the right answer :-)
> Suppose I have an index of about 5,000,000 docs
> and I am running single-term queries against it, including queries
> which return say 1,000,000 or even more hits.
> The hits are sorted by some column and I am happy with the query
> execution time (i.e. the time spent in the IndexSearcher.search(...)
> method).
> Now comes the problem: it is a product requirement that the client is
> allowed to quickly access (by scrolling) a random page of the result
> set.
> Put in different words the app must quickly (in less than a second)
> respond to requests like: "Give me the results from No 567100 to
> No 567200" (remember the results are sorted thus ordered).
> I took a look at Lucene's internals which only left me with the
> suspicion that this is an impossible task.
> Would anyone, please, prove my suspicion wrong?
> Regards
> Stanislav


RE: Search performance with one index vs. many indexes

2005-02-28 Thread Runde, Kevin

Hi All,

Sorry about that; please disregard that last email. I must not be fully
awake yet.

Sorry,
Kevin Runde 


RE: Search performance with one index vs. many indexes

2005-02-28 Thread Runde, Kevin

Follow Up to the article from Friday 

-Original Message-
From: Morus Walter [mailto:[EMAIL PROTECTED] 
Sent: Monday, February 28, 2005 1:30 AM
To: Lucene Users List
Subject: Re: Search performance with one index vs. many indexes

Jochen Franke writes:
> Topic: Search performance with large numbers of indexes vs. one large
> index
> 
> 
> My questions are:
> 
> - Is the size of the "wordlist" the problem?
> - Would we be a lot faster, when we have a smaller number
> of files per index?

sure. 
Look:
Index lookup of a word is O(ln(n)) where n is the number of words.
Index lookup of a word in k indexes having m words is O( k ln(m) )
In the best case all word lists are distinct (purely theoretical),
that is n = k*m or m = n/k
For n = 15 million, k = 800
ln(n) = 16.5
k*ln(n/k) = 7871
In a realistic case, m is much bigger since word lists won't be
distinct.
But it's the linear factor k that bites you.
In the worst case (all words in all indices) you have
k*ln(n) = 13218.8
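
The figures above can be reproduced with a few lines of arithmetic. This is only a sketch of the calculation in the message, with hypothetical class and method names; the values are relative lookup costs in the O(ln n) model, not timings.

```java
/** Reproduces the lookup-cost estimates for one index vs. k indexes. */
public class IndexLookupCostSketch {

    /** Cost of one lookup in a single index holding n words: ln(n). */
    public static double singleIndex(double n) {
        return Math.log(n);
    }

    /** Best case for k indexes with fully distinct word lists: k * ln(n / k). */
    public static double bestCase(double n, int k) {
        return k * Math.log(n / k);
    }

    /** Worst case for k indexes that each contain all n words: k * ln(n). */
    public static double worstCase(double n, int k) {
        return k * Math.log(n);
    }

    public static void main(String[] args) {
        double n = 15_000_000; // words, as in the message
        int k = 800;           // indexes, as in the message
        System.out.printf("single index: %.1f%n", singleIndex(n));
        System.out.printf("best case:    %.1f%n", bestCase(n, k));
        System.out.printf("worst case:   %.1f%n", worstCase(n, k));
    }
}
```

Even the best case is several hundred times the single-index cost, which is the linear factor k at work.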

HTH
Morus


Re: Boost doesn't works

2005-02-28 Thread Morus Walter

Claude Libois writes:
> The explanation given by the IndexSearcher indicates that the boost of
> my title is 1.0 where it should be 10.0.
> I really don't understand what is wrong.

AFAIK you cannot get the boost of a field from the index because it's
not stored as such.
It's factored into the field's length norm or something like that during
indexing. Search the list archives for details.

Morus


Re: Boost doesn't works

2005-02-28 Thread Claude Libois

The explanation given by the IndexSearcher indicates that the boost of my
title is 1.0 where it should be 10.0.
I really don't understand what is wrong.
Claude Libois
[EMAIL PROTECTED]
Technical associate - Unisys

- Original Message - 
From: "Erik Hatcher" <[EMAIL PROTECTED]>
To: "Lucene Users List" 
Sent: Monday, February 28, 2005 11:10 AM
Subject: Re: Boost doesn't works


> Use the IndexSearcher.explain() feature to look at how Lucene is
> calculating the score.
>
> Erik
>
>
> On Feb 28, 2005, at 3:32 AM, Claude Libois wrote:
>
> > I use MultiFieldQueryParser(search only done on summary,title and
> > content)
> > with a FilteredQuery.
> > Claude Libois
> > [EMAIL PROTECTED]
> > Technical associate - Unisys
> >
> > - Original Message -
> > From: "Morus Walter" <[EMAIL PROTECTED]>
> > To: "Lucene Users List" 
> > Sent: Monday, February 28, 2005 9:28 AM
> > Subject: Re: Boost doesn't works
> >
> >
> >> Claude Libois writes:
> >>> Hello. I'm using Lucene for an application and I want to boost the
> >>> title of my documents.
> >>> For that I use the setBoost method that is applied on the title
> >>> field.
> >>> However when I look with luke(1.6) I don't see any boost on this
> >>> field and when I do a search the score doesn't change. What's wrong?
> >>
> >> How do you search?
> >> I guess you cannot see a change unless you combine searches in
> >> different fields, since scores are normalized.
> >>
> >> Morus

Fast access to a random page of the search results.

2005-02-28 Thread Stanislav Jordanov

Guys,
my private investigation already left me sceptical about the outcome of this
issue,
but I've decided to post it as a last resort.
Perhaps the gurus know the right answer :-)

Suppose I have an index of about 5,000,000 docs
and I am running single-term queries against it, including queries which
return say 1,000,000 or even more hits.

The hits are sorted by some column and I am happy with the query execution
time (i.e. the time spent in the IndexSearcher.search(...) method).
Now comes the problem: it is a product requirement that the client is
allowed to quickly access (by scrolling) a random page of the result set.
Put in different words the app must quickly (in less than a second) respond
to requests like: "Give me the results from No 567100 to No 567200"
(remember the results are sorted thus ordered).
I took a look at Lucene's internals which only left me with the suspicion
that this is an impossible task.
Would anyone, please, prove my suspicion wrong?

Regards
Stanislav




Re: Boost doesn't works

2005-02-28 Thread Erik Hatcher

Use the IndexSearcher.explain() feature to look at how Lucene is 
calculating the score.

Erik
On Feb 28, 2005, at 3:32 AM, Claude Libois wrote:
> I use MultiFieldQueryParser (search only done on summary, title and
> content) with a FilteredQuery.
> Claude Libois
> [EMAIL PROTECTED]
> Technical associate - Unisys
>
> - Original Message -
> From: "Morus Walter" <[EMAIL PROTECTED]>
> To: "Lucene Users List"
> Sent: Monday, February 28, 2005 9:28 AM
> Subject: Re: Boost doesn't works
>
>> Claude Libois writes:
>>> Hello. I'm using Lucene for an application and I want to boost the
>>> title of my documents.
>>> For that I use the setBoost method that is applied on the title
>>> field.
>>> However when I look with luke(1.6) I don't see any boost on this
>>> field and when I do a search the score doesn't change. What's wrong?
>>
>> How do you search?
>> I guess you cannot see a change unless you combine searches in
>> different fields, since scores are normalized.
>>
>> Morus

Re: Boost doesn't works

2005-02-28 Thread Claude Libois

I use MultiFieldQueryParser(search only done on summary,title and content)
with a FilteredQuery.
Claude Libois
[EMAIL PROTECTED]
Technical associate - Unisys

- Original Message - 
From: "Morus Walter" <[EMAIL PROTECTED]>
To: "Lucene Users List" 
Sent: Monday, February 28, 2005 9:28 AM
Subject: Re: Boost doesn't works


> Claude Libois writes:
> > Hello. I'm using Lucene for an application and I want to boost the
> > title of my documents.
> > For that I use the setBoost method that is applied on the title field.
> > However when I look with luke(1.6) I don't see any boost on this field
> > and when I do a search the score doesn't change. What's wrong?
>
> How do you search?
> I guess you cannot see a change unless you combine searches in different
> fields, since scores are normalized.
>
> Morus

Re: Boost doesn't works

2005-02-28 Thread Morus Walter

Claude Libois writes:
> Hello. I'm using Lucene for an application and I want to boost the title of
> my documents.
> For that I use the setBoost method that is applied on the title field.
> However when I look with luke(1.6) I don't see any boost on this field and
> when I do a search the score doesn't change. What's wrong?

How do you search?
I guess you cannot see a change unless you combine searches in different 
fields, since scores are normalized.
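
A small sketch of why normalization can hide a boost: assuming the result list rescales scores so that the top hit is at most 1.0 (which is how I understand the Hits class to behave), multiplying every raw score by the same factor leaves the displayed scores unchanged. The class name, method name, and raw scores below are made up for illustration.

```java
/** Sketch of top-score normalization hiding a uniform boost (not Lucene code). */
public class ScoreNormalizationSketch {

    /** Rescales scores (sorted descending) so that the top score is at most 1.0. */
    public static float[] normalize(float[] rawScoresDescending) {
        float norm = 1.0f;
        if (rawScoresDescending.length > 0 && rawScoresDescending[0] > 1.0f) {
            norm = 1.0f / rawScoresDescending[0];
        }
        float[] normalized = new float[rawScoresDescending.length];
        for (int i = 0; i < rawScoresDescending.length; i++) {
            normalized[i] = rawScoresDescending[i] * norm;
        }
        return normalized;
    }

    public static void main(String[] args) {
        float[] unboosted = {2.0f, 1.0f, 0.5f};  // hypothetical raw scores
        float[] boosted = {20.0f, 10.0f, 5.0f};  // same hits, every score times 10
        System.out.println(java.util.Arrays.toString(normalize(unboosted)));
        System.out.println(java.util.Arrays.toString(normalize(boosted)));
    }
}
```

Both lists come out the same after normalization, so a boost only becomes visible when it changes the relative weight of one field against another.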

Morus


Boost doesn't works

2005-02-28 Thread Claude Libois

Hello. I'm using Lucene for an application and I want to boost the title of
my documents.
For that I use the setBoost method, applied to the title field.
However, when I look with Luke (1.6) I don't see any boost on this field, and
when I do a search the score doesn't change. What's wrong?
Here is the code where I set the boost factor.

public Document getDocument() throws TechnicalException {
    Document doc = new Document();
    log.trace(new TraceMessage("will add title,resume,content,date to the Lucene Document"));
    // Content is indexed but not stored
    Field field = Field.UnStored("Content", content);
    doc.add(field);
    doc.add(Field.Text("Summary", summary));
    // Title gets a field-level boost of 10
    field = Field.Text("Title", title);
    field.setBoost(10);
    doc.add(field);
    return doc;
}
Do I have to do something else to activate boosting?

Claude Libois
[EMAIL PROTECTED]
Technical associate - Unisys


