From what you said, I'm thinking of switching to IndexModifier.
Yes, IndexModifier would synchronize add/delete. One should notice the
performance comment in IndexModifier
http://lucene.apache.org/java/docs/api/org/apache/lucene/index/IndexModifier.html
- While you can freely mix calls to
The problem I've had before was that I set my writer to null
right after closing it. That's why I got a lock timeout exception
when I tried to create the writer again. Guess I just need
to close it, and then re-opening it would avoid the
locking problems.
It is valid to nullify the just-closed
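For illustration, a minimal self-contained sketch of the close-then-reopen
pattern discussed above (Lucene 2.0-era API; the RAMDirectory and the "id"
field are my own invention, not from the original thread):

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.RAMDirectory;

public class ReopenWriter {
    public static void main(String[] args) throws Exception {
        RAMDirectory dir = new RAMDirectory();

        IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(), true);
        Document doc = new Document();
        doc.add(new Field("id", "1", Field.Store.YES, Field.Index.UN_TOKENIZED));
        writer.addDocument(doc);
        writer.close();  // releases the write lock
        writer = null;   // nullifying AFTER close() is fine

        // Because close() released the lock, this does not time out:
        writer = new IndexWriter(dir, new StandardAnalyzer(), false);
        writer.close();
        System.out.println("reopened ok");
    }
}
```

The lock timeout in the original report would come from opening a second
writer while the first one was still open, not from the null assignment.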
The lock time out exception is caused by trying to open multiple
IndexWriter objects in parallel - each of the 5 threads is creating its own
IndexWriter object in each invocation of addAndIndex(). This cannot work -
I think that chapter 2.9 of Lucene in Action is essential reading for
fixing this
I've tried changing to one indexing thread
(instead of 5) but still get the same problem. Can't figure out why this
happens.
The program as listed seems to access an existing index - since 'create'
is always false for both 'FSDirectory.getDirectory(,)' and 'new
IndexWriter(,,)'. Perhaps an old
I did clean everything but still getting the same problem. I'm using
lucene
2.0. Do you get the same problem on your machine?
Please try with this code - http://cdoronc.20m.com/tmp/indexingThreads.zip
Regards,
Doron
can't access the file:
http://cdoronc.20m.com/tmp/indexingThreads.zip
Yes, this Web host sometimes behaves strangely when clicking a link from a
mail program. Please try to copy
cdoronc.20m.com/tmp
to the Web browser (e.g. Firefox) and press Enter.
This should show the content of that tmp folder,
Hits hits = searcher.search(qp.Query(queryStr));
I think it should be qp.parse(String query) (rather than qp.Query(String
field))
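A hedged sketch of the corrected call (the "contents" field and the query
text are invented for illustration):

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;

public class ParseSketch {
    public static void main(String[] args) throws Exception {
        // parse(String) turns query text into a Query;
        // QueryParser has no qp.Query(String) method.
        QueryParser qp = new QueryParser("contents", new StandardAnalyzer());
        Query q = qp.parse("apache lucene");
        System.out.println(q);  // contents:apache contents:lucene
        // then: Hits hits = searcher.search(q);
    }
}
```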
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
doc.add(new Field(to,
[EMAIL PROTECTED],
...
PrefixQuery pq = new PrefixQuery(new Term(to,
[EMAIL PROTECTED]));
Perhaps a typo in the query text -
Indexed text: [EMAIL PROTECTED]
Searched text: [EMAIL PROTECTED]
The searched text is not a prefix of the indexed one.
Regards,
Does it matter what order I add the sub-queries to the BooleanQuery Q?
That is, is the execution speed for the search faster (slower) if I do:
Q.add(Q1, BooleanClause.Occur.MUST);
Q.add(Q2, BooleanClause.Occur.MUST);
Q.add(Q3, BooleanClause.Occur.MUST);
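As a self-contained illustration of the construction being asked about
(field and term names invented) - as far as I know, the order in which
required clauses are added does not change which documents match:

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.TermQuery;

public class ClauseOrder {
    public static void main(String[] args) {
        BooleanQuery q = new BooleanQuery();
        q.add(new TermQuery(new Term("body", "q1")), BooleanClause.Occur.MUST);
        q.add(new TermQuery(new Term("body", "q2")), BooleanClause.Occur.MUST);
        q.add(new TermQuery(new Term("body", "q3")), BooleanClause.Occur.MUST);
        // All three clauses are required regardless of add order:
        System.out.println(q.toString("body"));  // +q1 +q2 +q3
    }
}
```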
As
\u002d would add -.
The original request was for _, which is \u005f.
Mark Miller [EMAIL PROTECTED] wrote on 21/07/2006 13:09:28:
| #LETTER: // unicode letters
[
\u0041-\u005a,
\u0061-\u007a,
\u00c0-\u00d6,
\u00d8-\u00f6,
\u00f8-\u00ff,
(Seems 1.9 javadoc could be just a bit more clear on this.)
The following should do the work:
QueryParser qp = new MultiFieldQueryParser(fields, analyzer);
Query q = qp.parse(qtext);
Notice the difference in semantics as explained in the deprecated comment
in 1.9.
Also see the
Just realized that the "some text" part should also be grouped, so checked
that this variation also works:
qtxt = "some text AND ( AUTHOR_NAME:krish OR EMPLOYEE_NAME:krish )";
--- field:some +field:text +(AUTHOR_NAME:krish EMPLOYEE_NAME:krish)
qtxt = "(some text) AND ( AUTHOR_NAME:krish OR
Few comments -
(from first posting in this thread)
The indexing was taking much more than minutes for a 1 MB log file. ...
I would expect to be able to index at least a GB of logs within 1 or 2
minutes.
1-2 minutes per GB would be 30-60 GB/Hour, which for a single machine/jvm
is a lot -
By the resulting query toString(), the boolean query would not work correctly:
qtxt: a foo
[1] Multi Field Query (OR) : (title:a body:a) (title:foo body:foo)
[2] Multi Field Query (AND): +(title:a body:a) +(title:foo body:foo)
[3] Boolean Query : (title:a title:foo) (body:a body:foo)
--
A document per row seems correct to me too.
If search would be by msisdn / messageid - and if, as it seems, these are
keywords, not free text that needs to be analyzed - they both should have
Index.UN_TOKENIZED. Also, since no search is to be done by the line content,
the line should have
This task reminds me more of a count(*) SQL query than a text search query.
Assuming that using a text search engine is a prerequisite, I can think of
two approaches - one based on Lucene scoring as suggested in the question,
and a simpler approach (below).
For the scoring approach - I don't
hu andy [EMAIL PROTECTED] wrote on 28/07/2006 01:28:14:
This code is written in C#. There is a C# version of Lucene 1.9,
which
I am not a C#'er so I might have misunderstood this code; still, here is my
take.
One general comment - the program sent is not self-contained so it's hard
to debug
John john [EMAIL PROTECTED] wrote on 28/07/2006 06:36:19:
Hello,
I tried to add a field like that:
field = new Field("number", "1",
Field.Store.YES, Field.Index.UN_TOKENIZED);
so it should be indexed and not analyzed? My writer is
writer = new IndexWriter(INDEX_DIR, new StandardAnalyzer(),
Doron Cohen/Haifa/[EMAIL PROTECTED] wrote on 28/07/2006 00:18:47:
For the scoring approach - I don't see an easy way to get the
counts from the score of the results, although the TF (term
frequency in candidate docs) is known+used during document
scoring, and although it seems
Hi Russel,
I am also interested in the internals of Lucene's ranking and how one
can/should alter the scoring. For now I was just learning from existing
code of Lucene scorers and Weights. Your question seemed interesting, so I
in fact implemented a quick scorer that would return the raw tf as a
A thought - would you (or the project lead;-) consider limiting the
'wildcard expansion'?
Assuming a query like:
( uni* near(5) science )
I.e. match docs containing any word with prefix "uni" that spans no further
than 5 positions from the word "science". Assume the current lexicon has M
(say 1200) words
See IndexReader methods - terms() and terms(Term) - and Lucene FAQ -
http://lucene.apache.org/java/docs/api/org/apache/lucene/index/IndexReader.html#terms()
http://lucene.apache.org/java/docs/api/org/apache/lucene/index/IndexReader.html#terms(org.apache.lucene.index.Term)
If the 'small classifieds index' is sufficiently small to be re-indexed
every night, I think this would be a simple solution - just set the
document boosts according to these statistics - i.e. boost down more the
docs of classifieds that were shown more yesterday -
Hi Otis,
I think that synchronizing the entire method would be overkill - instead
it would be sufficient to synchronize on a per-field object, so that only
if two requests for the same cold/missing field are racing, one of them
would wait for the other to complete loading that field. I think
[EMAIL PROTECTED] wrote on 09/08/2006 11:22:12:
Assuming field wasn't being used to synchronize on something else,
this would still block *all* IndexReaders/Searchers trying to sort on
that field.
In Solr, it would make the situation worse. If I had my warmed-up
IndexSearcher serving live
[EMAIL PROTECTED] wrote on 09/08/2006 20:32:20:
Heh... interfaces strike again.
Well then since we *know* that no one has their own implementation
(because they would not have been able to register it), we should be
able to safely upgrade the interface to a class (anyone want to supply
a
Hi Deepan, The steps below seem correct, given that all the fields of the
original document are also stored - the javadoc for
indexReader.document(int n) (which I assume is what you are using) says:
Returns the stored fields of the nth Document in this index. - so, only
stored fields would exist
Hi Russel, my apologies for the delayed response. I would rather have all
correspondence on the mailing list, but to keep this mail thread readable I
put the files at http://cdoronc.awardspace.com/TfTermQuery . I hope it
helps you and would be interested in your comments.
Regards,
Doron
Russell M.
On 8/10/06, Doron Cohen [EMAIL PROTECTED] wrote:
Sorting was introduced to Lucene before my time, so I don't know the
reasons behind it. Maybe it was seen as non-optimal or non-core and
so was kept out of the IndexReader.
I admit, it does feel like the level of abstraction that FieldCache
See
http://www.nabble.com/Accessing-%22term-frequency-information%22-for-documents-tf1964461.html#a5390696
- Doron
aslam bari [EMAIL PROTECTED] wrote on 17/08/2006 23:13:27:
Dear All,
I am new to Lucene. I am searching for the word 'circle' in my
indexed document list. It gives me total
I have two applications on a Windows machine. One is the search engine
where the index can be searched.
The second application runs once a day and updates (deletions/additions)
the index.
My question:
The index is already opened (IndexReader) by the first application. Is
there a
Jason Polites [EMAIL PROTECTED] wrote on 27/08/2006 09:36:07:
I would have thought that simultaneous cross-JVM access to an index was
outside the scope of the core Lucene API (although it would be great), but
maybe the file system basis allows for this (?).
Lucene does protect you from
[discussion moved here from dev-list]
Could it be an out-of-mem error?
Can you run it with a debugger, to see what really happens?
JVMs usually create a javacore file, and in case of an out-of-mem also a
heapdump file - these give more info on the problem. In case this file was
not created in
Hits is not really a simple container - it references a certain searcher -
that same searcher that was used to find these hits. When a request for a
result document is made, the Hits object delegates this request to the
searcher. So in order to page through the results using an existing Hits
I believe this should go to solr-user@lucene.apache.org ?
Michael Imbeault [EMAIL PROTECTED] wrote on 05/09/2006
23:26:55:
Old issue (see
http://www.mail-archive.com/solr-user@lucene.apache.org/msg00651.html),
but I'm experiencing the exact same thing on Windows XP, latest Tomcat.
I
? (Don't think about the state of users in the webapp
for a while)
Best Regards.
jacky
- Original Message -
From: Doron Cohen [EMAIL PROTECTED]
To: java-user@lucene.apache.org
Sent: Wednesday, September 06, 2006 2:06 PM
Subject: Re: Keep hits in results
Hits is not really a simple
HODAC, Olivier [EMAIL PROTECTED] wrote on 06/09/2006 03:04:15:
hello,
I am designing an application whose bottleneck is the indexing
process. Indexing objects blocks the user's action. Furthermore, I
need to index a large amount of documents (3 per day) and save
them on the file
WATHELET Thomas [EMAIL PROTECTED] wrote on 23/08/2006
00:49:25:
Is it possible to update fields in an existing index?
If yes, how to proceed?
Unfortunately no.
To update a (document's) field that document must be removed and re-added.
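A minimal sketch of that delete-then-re-add cycle, assuming a Lucene
2.0-era API (all field names here are invented; note the key field must be
indexed so it can be deleted by term):

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.RAMDirectory;

public class UpdateDoc {
    public static void main(String[] args) throws Exception {
        RAMDirectory dir = new RAMDirectory();
        IndexWriter w = new IndexWriter(dir, new StandardAnalyzer(), true);
        Document d = new Document();
        d.add(new Field("id", "42", Field.Store.YES, Field.Index.UN_TOKENIZED));
        d.add(new Field("body", "old text", Field.Store.YES, Field.Index.TOKENIZED));
        w.addDocument(d);
        w.close();

        // delete the old version by its key ...
        IndexReader r = IndexReader.open(dir);
        r.deleteDocuments(new Term("id", "42"));
        r.close();

        // ... then add the new version
        w = new IndexWriter(dir, new StandardAnalyzer(), false);
        Document d2 = new Document();
        d2.add(new Field("id", "42", Field.Store.YES, Field.Index.UN_TOKENIZED));
        d2.add(new Field("body", "new text", Field.Store.YES, Field.Index.TOKENIZED));
        w.addDocument(d2);
        w.optimize();
        w.close();

        r = IndexReader.open(dir);
        System.out.println("docs: " + r.numDocs());  // still one logical doc
        r.close();
    }
}
```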
I think option B cannot work because, due to the MUST operator, it requires
both databasemanagement and accountmanagement to be in the subtype
field.
Option A however should work, once the padding blank spaces are removed
from the field name - notice that while the standard analyzer would trim
I think it is not possible, by only modifying Similarity, to make the total
score only count for documents boosts (which is the original request in
this discussion).
This is because a higher level scorer always sums the scores of its
sub-scorers - is this right...? If so, there are probably two
I found out how to determine the number of documents in which a term
appeared by looking at the Luke code, but how does one determine the
number of times it occurs in each document?
Use TermDocs -
http://lucene.apache.org/java/docs/api/org/apache/lucene/index/TermDocs.html
Something like -
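My reconstruction of what that truncated snippet might look like (not the
original code; the "body" field and the indexed text are invented):

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;
import org.apache.lucene.store.RAMDirectory;

public class TermFreqPerDoc {
    public static void main(String[] args) throws Exception {
        RAMDirectory dir = new RAMDirectory();
        IndexWriter w = new IndexWriter(dir, new StandardAnalyzer(), true);
        Document d = new Document();
        d.add(new Field("body", "lucene search lucene",
                        Field.Store.NO, Field.Index.TOKENIZED));
        w.addDocument(d);
        w.close();

        IndexReader reader = IndexReader.open(dir);
        TermDocs td = reader.termDocs(new Term("body", "lucene"));
        while (td.next()) {
            // freq() = occurrences of the term within this document
            System.out.println("doc " + td.doc() + " freq " + td.freq());
        }
        td.close();
        reader.close();
    }
}
```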
I have been using the Lucene 2.0 distro Index to index my files,
currently
it indexes filepath and contents. I want to index, lastModified()
(Returns
the time that the file denoted by this abstract pathname was last
modified.), and file length, length().
Can someone please show me how to do
Stelios Eliakis [EMAIL PROTECTED] wrote on 23/09/2006 02:39:27:
I want to extract the Best Fragment (passage) from a text file.
When I use the following code I take the first fragment that contains my
query. Nevertheless, the JavaDoc says that the function getBestFragment
returns the best
?
Thanks in advance
Stelios Eliakis
On 9/26/06, Doron Cohen [EMAIL PROTECTED] wrote:
Stelios Eliakis [EMAIL PROTECTED] wrote on 23/09/2006 02:39:27:
I want to extract the Best Fragment (passage) from a text file.
When I use the following code I take the first fragment
QueryParser can do that for you - something like:
QueryParser qp = new QueryParser( "CONTENTS" , new
StandardAnalyzer() );
qp.setDefaultOperator ( Operator.AND );
Query q = qp.parse ( "TOOLS FOR TRAILER" );
Result query should be:
+content:tools +content:trailer
Van Nguyen
SSN actually is a common situation.
Assume you have a (relational) database with a table of products with three
columns:
- SSN, which is also a primary key for that table,
- DESCRIPTION, which has free text (i.e. unformatted text) describing the
product.
- OTHER - additional info.
Also assume
You can store TermVectors with position info, but I don't think this would
be enough for what you are asking, because it is not meant for direct
access to a term by its position, and because TermVectors store tokens,
i.e. the indexed form of the word, which I am not sure is what you need.
It
The problem stems from using the query parser for searching a non-tokenized
field (book).
You can either create a term query for searching in that field, like this:
new TermQuery(new Term("book", "first title"));
Or tokenize the field "book" and keep using QueryParser.
Decision is based on how you
If I understand the question, you do not want to boost in advance a certain
doc, but rather score higher those documents containing the search term
closer to the start of the document.
There is more to define here - for instance, if doc1 has 5 words but doc2
has 1,000,000 words, would you still
I was wondering if anyone knows of an open source
spam filter that I can add to my project to scan
the posts (which are just plain text) for spam?
I am not aware of any (which does not mean there is none), but just wanted
to draw your attention to a related discussion
: The query you want is
: name:[A TO C] name:[G TO K]
: (each clause being SHOULD, or put another way, an implicit OR in
between.)
:
: The problem may be how you analyze the name field... is it tokenized at
all?
: If so, you might be matching on first, last, and middle names, and the
:
I am not sure I understand what you are asking.
I assume you are aware of Lucene Proximity Search - e.g. "jakarta apache"~4
- see http://lucene.apache.org/java/docs/queryparsersyntax.html
Are you asking if it is possible to search for docs in which the gap
between the two words is exactly N, e.g.
I sometimes find it helpful to think of the query parts as applying
'filtering' logic, helping to understand how query components play together
in determining the acceptable set of results (mostly ignoring scoring here,
which would usually sort the candidate results).
Consider a set of 10
Frode Bjerkholt [EMAIL PROTECTED] wrote on 05/10/2006 01:10:43:
My intention is to give different terms in a field different boost
values.
The queries, from a user perspective, will be one fulltext input field.
The following code illustrates this:
Field f1 = new Field("name", "John",
I would guess that one of your assumptions is wrong...
The assumptions to check are:
At indexing:
- lpf.getLuceneFieldName() == "fav_stores"
- pa.getPersonProfileChoice().getChoice() == "Banana Republic"
At search:
- the query is created like this:
new TermQuery(new Term("fav_stores", "Banana
Erick Erickson [EMAIL PROTECTED] wrote on 09/10/2006 13:09:21:
... The kicker is that what we are indexing is
OCR data, some of which is pretty trashy. So you wind up with
interesting
words in your index, things like rtyHrS. So the whole question of
allowing
very specific queries on detailed
A bit of clarification:
A Lucene index is made of multiple segments.
Compound format: stores each segment in a single file - fewer files
created/opened.
Non-compound format: stores each segment in multiple files - more files
created/opened.
Non-compound is likely to be faster for indexing.
Optimizing
Field.Text() was deprecated in Lucene 1.9 and then removed in 2.0.
The book examples were not updated for 2.0 yet.
You should now use Field(String, String, Field.Store, Field.Index).
To have the same behavior as old Field.Text use: Field(name, value,
Field.Store.YES, Field.Index.TOKENIZED).
For
I wonder if this should be in the FAQ entry "How do I get code written for
Lucene 1.4.x to work with Lucene 2.x", or perhaps just adding there a link
to your post here -
http://www.nabble.com/Lucene-in-Action-examples-complie-problem-tf2418478.html#a6743189
Erik Hatcher [EMAIL PROTECTED] wrote on
Nick, could you provide additional info:
(1) Env info - Lucene version, Java version, OS, JVM args (e.g. -XmNNN),
etc...
(2) is this reproducible? By the file sizes there seem to be ~182 indexed
docs when the problem occurs, so, if this is reproducible it would hopefully
not take too long. If
I meant ~182K files ...
Nick, could you provide additional info:
(1) Env info - Lucene version, Java version, OS, JVM args (e.g. -XmNNN),
etc...
(2) is this reproducible? By the file sizes there seem to be ~182 indexed
docs when the problem occurs, so, if this is reproducible it would
I believe this was fixed in http://issues.apache.org/jira/browse/LUCENE-593
- Doron
Björn Ekengren [EMAIL PROTECTED] wrote on 10/10/2006 02:12:23:
Hello, I have found that the spellchecker behaves a bit strange. My
spell indexer class below doesn't work if I use the spellfield
string set in
These times really are not reasonable. But 60K docs do not seem like much for Lucene.
I once indexed ~1M docs of ~20K each, that's ~20GB input collection. The
result index size was ~2.5GB and the search times for a short query 2-3
words free text (or) query was ~300ms for a hot query and ~900ms for a
cold
Scott Smith [EMAIL PROTECTED] wrote on 12/10/2006 14:14:57:
Suppose I want to index 500,000 documents (average document size is
4kBs). Let's assume I create a single index and that the index is
static (I'm not going to add any new documents to it). I would guess
the index would be around
I am far from perfect at this PDF text extraction; however, I noticed
something in your code that you may want to check to clear up the reason
for this failure, see below.
Shivani Sawhney [EMAIL PROTECTED] wrote on 12/10/2006
22:54:07:
Hi All,
I am facing a peculiar problem.
I am trying to
Terry Steichen [EMAIL PROTECTED] wrote on 13/10/2006 08:01:11:
You can just add a field to your indexed docs that always evaluates to a
fixed value. Then you can do queries like: +doc:1 -id:test
Alternatively you can use MatchAllDocsQuery, e.g.
BooleanQuery bq = new BooleanQuery();
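A hedged completion of the truncated snippet - a MatchAllDocsQuery combined
with a prohibited term matches everything except the excluded docs (the
index content below is invented for illustration):

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.RAMDirectory;

public class AllButOne {
    public static void main(String[] args) throws Exception {
        RAMDirectory dir = new RAMDirectory();
        IndexWriter w = new IndexWriter(dir, new StandardAnalyzer(), true);
        String[] ids = { "test", "other1", "other2" };
        for (int i = 0; i < ids.length; i++) {
            Document d = new Document();
            d.add(new Field("id", ids[i], Field.Store.YES, Field.Index.UN_TOKENIZED));
            w.addDocument(d);
        }
        w.close();

        // "all docs" MUST clause, minus the prohibited id:test docs
        BooleanQuery bq = new BooleanQuery();
        bq.add(new MatchAllDocsQuery(), BooleanClause.Occur.MUST);
        bq.add(new TermQuery(new Term("id", "test")), BooleanClause.Occur.MUST_NOT);

        IndexSearcher searcher = new IndexSearcher(dir);
        Hits hits = searcher.search(bq);
        System.out.println("hits: " + hits.length());  // the two non-test docs
        searcher.close();
    }
}
```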
The IndexReader is needed for finding all wildcard matches (by the index
lexicon). It seems you do not want to expand the wild card query by the
index lexicon, but rather with that of the highlighted text (which may not
be indexed at all). I think you have at least two ways to do that:
(1) create
John Gilbert [EMAIL PROTECTED] wrote on 14/10/2006 20:14:43:
I am trying to write an Ejb3Directory. It seems to work for index
writing but not for searching.
I get the EOF exception. I assume this means that either my
OutputStream or InputStream is doing
something wrong. It fails because the
Now pk is the primary key, which I am storing but not indexing:
doc.add(new Field("pk", message.getId().toString(), Field.Store.YES,
Field.Index.NO));
You would need to index it for this to work.
From javadocs for IndexReader.deleteDocuments(Term):
Deletes all documents
Hi Antony, you cannot instruct the query parser to do that. Note that an
application can add both tokenized and un_tokenized data under the same
field name. It is application logic to know that a certain query is
not to be tokenized. In this case you could create your query with:
query =
Otis Gospodnetic [EMAIL PROTECTED] wrote on 16/10/2006 14:32:13:
Hi Ryan,
StandardAnalyzer should already be smart about keeping email
addresses as a single token:
// email addresses
| <EMAIL: <ALPHANUM> (("."|"-"|"_") <ALPHANUM>)* "@" <ALPHANUM>
(("."|"-") <ALPHANUM>)+ >
(this is from StandardAnalyzer.jj)
hi Vasu, how about using ChainedFilter(yourPrefixFilters[],
ChainedFilter.AND)?
vasu shah [EMAIL PROTECTED] wrote on 16/10/2006 17:50:27:
Hi,
I have have multiple fields that I need to search on. All these
fields need to support wildcard search. I am ANDing these search
fields using
See also the relevant FAQ entry and Wiki page:
http://wiki.apache.org/jakarta-lucene/LuceneFAQ#head-06fafb5d19e786a50fb3dfb8821a6af9f37aa831
http://wiki.apache.org/jakarta-lucene/LargeScaleDateRangeProcessing
Steven Parkes [EMAIL PROTECTED] wrote on 17/10/2006 09:12:55:
Lucene takes your date
Not sure if this is the case, but you said searchers, so this might be it -
you can (and should) reuse searchers for multiple/concurrent queries.
IndexSearcher is thread-safe, so no need to have a different searcher for
each query. Keep using this searcher until you decide to open a new
searcher -
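A sketch of the reuse pattern, assuming a hypothetical holder class of my
own (not a Lucene API): all query threads share one IndexSearcher, and a
fresh one is swapped in only when the index changes.

```java
import java.io.IOException;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.RAMDirectory;

public class SearcherHolder {
    private static IndexSearcher current;  // shared; IndexSearcher is thread-safe

    public static synchronized IndexSearcher get() {
        return current;
    }

    // Swap in a fresh searcher only after the index changed. (Simplistic:
    // real code would close 'old' only after in-flight searches finish.)
    public static synchronized void swap(IndexSearcher fresh) throws IOException {
        IndexSearcher old = current;
        current = fresh;
        if (old != null) old.close();
    }

    public static void main(String[] args) throws Exception {
        RAMDirectory dir = new RAMDirectory();
        IndexWriter w = new IndexWriter(dir, new StandardAnalyzer(), true);
        Document d = new Document();
        d.add(new Field("id", "1", Field.Store.YES, Field.Index.UN_TOKENIZED));
        w.addDocument(d);
        w.close();

        swap(new IndexSearcher(dir));
        // Many queries, one searcher - no per-query open/close:
        for (int i = 0; i < 3; i++) {
            Hits hits = get().search(new TermQuery(new Term("id", "1")));
            System.out.println("query " + i + ": " + hits.length() + " hit(s)");
        }
        get().close();
    }
}
```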
Thanks. What I was looking for was: if I do not need to boost
docs, then what will be the difference (a) in query results, (b) in time
for indexing, and (c) in time to run a query and collect results?
There is also some precision loss with index time boosting. Also see the
Score Boosting
I'm indexing logs from a transaction-based application.
...
millions of documents per month; the size of the indices is ~35 gigs per
month
(that's the lower bound). I have no choice but to 'store' each field
values
(as well as indexing/tokenizing them) because I'll need to retrieve them
in
4) Roughly how large is the index file in comparison to the size of the
input files?
It depends on whether you store fields or just index them, plus
there is also a compression (gzip -9 equivalent) option.
As an example - index size numbers I saw: when indexing 1M docs of ~20KB of
very
Perhaps another comment along the same line - I think you would be able to get
more from your system by bounding the number of open searchers to 2:
- old, serving 'old' queries, would be soon closed;
- new, being opened and warmed up, and then serving 'new' queries;
Because... - if I understood
I don't know why the termDocs option did not work for you. Perhaps you did
not (re)open the searcher after the index was populated? Anyhow, here is a
small code snippet that does just this, see if it works for you, then you
can compare it to your code...
void numberOfTermOcc() throws Exception
Hi Eugene,
If the query parser (from some reason) throws a ParseException, and the RMI
layer attempts to marshal/serialize that exception, there would probably be
an issue because although ParseException is serializable (as all
throwables) it has a Token data member, which is not serializable.
Erick Erickson [EMAIL PROTECTED] wrote on 31/10/2006 05:03:18:
I don't remember who wrote this, Chris or Yonik or Otis, but here's the
word
from somebody who actually knows...
index-time field boosts are a way to express things like: this document
title
is worth twice as much as the title of
Might be related to an already resolved issue - see related discussion:
http://www.nabble.com/lucene-web-demo-is-not-working--tf1736444.html#a4718639
Miren [EMAIL PROTECTED] wrote on 31/10/2006 03:57:50:
parse(java.lang.String) in org.apache.lucene.queryParser.QueryParser
cannot be applied to
[EMAIL PROTECTED] wrote on 02/11/2006 06:36:48:
.. the following operation:
given a Query and a Document, return the score
.. I would like a method which returns the score directly.
.. Btw, I do not have an index, I have 1 Document, and 1 Query.
Lucene scoring -
michele.amoretti wrote:
Ok I am trying the MemoryIndex, but when compiling I have the
following error message:
package org.apache.lucene.index.memory does not exist
Is it not included in the lucene .jar?
I currently have the latest lucene binaries.
Yes, this is not part of core Lucene but
This code adds the same query twice to a boolean query:
Query query =
parser.parse(searchString);
bq1.add(query,
BooleanClause.Occur.MUST);
bq1.add(new BooleanClause(query,
spinergywmy [EMAIL PROTECTED] wrote on 03/11/2006 00:40:42:
I have another problem: I do not perform the real search within the
search
feature, according to the way that I have coded it, because for the
second
search, I actually go back to the index directory and search the
entire
spinergywmy [EMAIL PROTECTED] wrote on 08/11/2006 01:56:00:
within my first search result, there is only one record that
contains
the words Java and Tomcat; therefore, there should be only one record
returned
for the 2nd search. And the highlight has now moved from Java to Tomcat.
To my
You did not specify what's wrong - in what way is the code below not
working as you expect?
Two things to check:
(1) search() and refindSearchResult() process the text of the first query
differently. In search() the text is added to multiple fields
(metaField). The way it is done btw would not
Andreas, I could generate the error as you describe.
You can report this bug in http://issues.apache.org/jira/browse/LUCENE
There seem to be a few updates in http://snowball.tartarus.org not
reflected currently in Lucene -
- SnowballProgram.java has this bug fix as you describe
The
I do want to use document boosting... Is that independent from field
boosting? The length normalization on the other hand may not be
necessary.
They go together - see Score Boosting in
http://lucene.apache.org/java/docs/scoring.html
Well it doesn't, since there is no justification of why it is the
way it is. It's like saying: here is a car with 5 wheels... enjoy
driving.
- I think the explanations there would also answer at least some of your
questions.
I hoped it would answer *some* of the questions... (not all)
spinergywmy [EMAIL PROTECTED] wrote:
Hi,
I have asked this question before but maybe the question wasn't clear.
How can I delete a particular indexed document and keep the rest? For
instance, I have indexed document Id, date, user Id and contents; my
question is does that
Karl Koch [EMAIL PROTECTED] wrote:
For the documents Lucene employs
its norm_d_t which is explained as:
norm_d_t : square root of number of tokens in d in the same field as t
Actually (by default) it is:
1 / sqrt(#tokens in d with same field as t)
basically just the square root of the
Lucene RangeQuery would do for the time and numeric reqs.
Mark Mei [EMAIL PROTECTED] wrote:
At the bottom of this email is the sample xml file that we are using
today.
We have about 10 million of these.
We need to know whether Lucene can support the following functionalities.
(1) Each field
Two things I would check:
1) converting pubDate to String during indexing for later
date-range-filtering search results might not work well because,
string-wise, "9" > "100". You could use Lucene's DateTools - there's an
example in TestDateFilter -
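To illustrate the string-ordering point and DateTools (the example values
below are mine, not from the thread):

```java
import org.apache.lucene.document.DateTools;

public class SortableDates {
    public static void main(String[] args) {
        // Plain numeric strings compare lexicographically, so "9" > "100":
        System.out.println("9 > 100 string-wise: " + ("9".compareTo("100") > 0));

        // DateTools renders times as fixed-width strings that DO sort
        // correctly (DateTools uses GMT internally):
        String day0 = DateTools.timeToString(0L, DateTools.Resolution.DAY);
        String day1 = DateTools.timeToString(24L * 3600 * 1000,
                                             DateTools.Resolution.DAY);
        System.out.println(day0 + " < " + day1 + ": "
                           + (day0.compareTo(day1) < 0));
    }
}
```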
There is an example in TestDateFilter
http://svn.apache.org/viewvc/lucene/java/trunk/src/test/org/apache/lucene/search/TestDateFilter.java?view=log
Cam Bazz [EMAIL PROTECTED] wrote:
Hello,
how can I make a query to bring documents between timestamp begin and
timestamp end, given that I have
Mark Miller [EMAIL PROTECTED] wrote on 19/12/2006 09:21:00:
LIA mentioned something about needing to rebuild the
index if you change the Similarity. That does not make
sense to me yet. It would seem you could alternate them.
What does scoring have to do with indexing?
For this part of your
Using term vectors means passing over the terms too many times - i.e.
- loop on terms
- - loop on docs of a term
- - - loop on terms of a doc
Would something like this be better:
do {
System.out.println(tenum.term() + " appears in " + tenum.docFreq() + " docs!");
TermDocs td =
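One possible completion of the truncated loop (my reconstruction, not the
original; it wraps the TermEnum/TermDocs iteration in a self-contained
example with an invented "body" field):

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.TermDocs;
import org.apache.lucene.index.TermEnum;
import org.apache.lucene.store.RAMDirectory;

public class AllTermsAllDocs {
    public static void main(String[] args) throws Exception {
        RAMDirectory dir = new RAMDirectory();
        IndexWriter w = new IndexWriter(dir, new StandardAnalyzer(), true);
        Document d = new Document();
        d.add(new Field("body", "foo bar foo", Field.Store.NO, Field.Index.TOKENIZED));
        w.addDocument(d);
        w.close();

        IndexReader reader = IndexReader.open(dir);
        TermEnum tenum = reader.terms();   // positioned before the first term
        while (tenum.next()) {
            System.out.println(tenum.term() + " appears in "
                               + tenum.docFreq() + " docs!");
            TermDocs td = reader.termDocs(tenum.term());
            while (td.next()) {
                // td.freq() = occurrences of this term in doc td.doc()
                System.out.println("  doc " + td.doc() + ": " + td.freq() + " times");
            }
            td.close();
        }
        tenum.close();
        reader.close();
    }
}
```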
Something like dd if=/path/to/index/foo.cfs of=/dev/null
Be careful not to get the 'of' argument of 'dd' wrong - see
http://en.wikipedia.org/wiki/Dd_(Unix)