I understand that, unlike a relational database, Lucene is flexible in
allowing documents with different sets of fields. My index has documents
with a date and a content field. There are also a few bookkeeping documents
that do not have the date field. Things work well except in one case:
Sort
So you have a UTF-8 encoded text file. But how do you read the file into
Java? The default encoding of Java is likely to be something other than
UTF-8. Make sure you specify the encoding, like:
new InputStreamReader(new FileInputStream(filename), "UTF-8");
On Wed, 9 Feb 2005 22:32:38 -0700, Owen
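As a fuller illustration of the encoding advice above, here is a minimal sketch that reads a whole file as UTF-8 text. The class name Utf8FileReader and the method readAll are made up for this example, and StandardCharsets assumes Java 7 or later; on older JVMs the string "UTF-8" can be passed instead.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, named only for this sketch.
final class Utf8FileReader {

    /** Reads the whole file as text, decoding bytes as UTF-8 instead of the platform default. */
    static String readAll(String filename) throws IOException {
        StringBuilder text = new StringBuilder();
        // try-with-resources closes the reader (and the underlying stream) on exit
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(filename), StandardCharsets.UTF_8))) {
            char[] buffer = new char[4096];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                text.append(buffer, 0, n);
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        // Usage: java Utf8FileReader <filename>
        System.out.println(readAll(args[0]));
    }
}
```

The point is only that the charset is named explicitly at the point where bytes become characters; the resulting String can then be handed to whatever does the indexing.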
For all the parser suggestions, I think there is one important attribute.
Some parsers return data provided that the input HTML is sensible. Some
parsers are designed to be as flexible and tolerant as they can be. If the
input is clean and controlled, the former class is sufficient. Even some regular
I am trying to do some filtering and rearrangement of search results. Two
possibilities that come to mind are iterating through the Hits or making a
custom HitCollector.
All documentation invariably warns about the performance impact of using
HitCollector with a large result set. The scenario that google
Subversion rocks!
I have just set up the Windows svn client TortoiseSVN with my favourite
file manager, Total Commander 6.5. The svn status and commands are readily
integrated with the file manager. Offline diff and revert are two things I
really like about svn.
The conversion to Subversion
I am pleased to announce that MindRetrieve 0.4.0 has been released.
MindRetrieve is a desktop search tool to help users to search and organize
the web they have seen. Download it from http://mindretrieve.berlios.de/.
Every day we read a large amount of information from the world wide web.
The
On Wed, 26 Jan 2005 11:42:52 +, John Haxby [EMAIL PROTECTED] wrote:
My copy of Lucene in Action has finally hit my desk in the UK.
Hopefully the dispatch time quoted by amazon.co.uk will now start to
drop to something more sensible.
It's been interesting watching the price changes. When
What is the best way to give recent documents a boost? Not sorting them by
strict date order but to give them some preference. If document 1 filed
last week has a score of 0.5 and document 2 filed last month has a score
of 0.55, then list document 1 first. But if document 1 has a score of
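One way to picture the kind of preference described above is to multiply the relevance score by a factor that decays with document age. The sketch below is plain arithmetic, not a Lucene API; the class RecencyBoost, the method boostedScore, the weight of 0.5, and the 30-day decay constant are all assumptions chosen for illustration.

```java
// Hypothetical illustration of blending a relevance score with document age.
final class RecencyBoost {

    /**
     * Returns the score scaled by (1 + weight * e^(-ageDays / decayDays)).
     * A brand-new document gets a factor of (1 + weight); a very old one approaches 1.
     */
    static double boostedScore(double score, double ageDays, double weight, double decayDays) {
        return score * (1.0 + weight * Math.exp(-ageDays / decayDays));
    }

    public static void main(String[] args) {
        double weight = 0.5;     // assumed maximum boost of 50% for a new document
        double decayDays = 30.0; // assumed decay constant

        double lastWeek = boostedScore(0.50, 7, weight, decayDays);
        double lastMonth = boostedScore(0.55, 30, weight, decayDays);

        // With these assumed parameters the week-old document ranks first.
        System.out.println("last week ranks first: " + (lastWeek > lastMonth));
    }
}
```

Because the boost is multiplicative and bounded, a recent document with a much lower raw score still ranks below an older, clearly better match; how strong the preference should be depends entirely on the chosen weight and decay.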
I would love to give it a try. Please email me at aurora00 at gmail.com.
Thanks!
Also, what is the opinion on the CJKAnalyzer and ChineseAnalyzer? Some
people actually said the StandardAnalyzer works better. I wonder what
the pros and cons are.
I've written a Chinese Analyzer for Lucene that
I'm trying to build a web search tool that could work for multiple
languages. I understand that Lucene is shipped with StandardAnalyzer plus
German and Russian analyzers, and some more in the sandbox. And that
indexing and searching should use the same analyzer.
Now let's say I have an
Are not optimized indices causing you any problems (e.g. slow searches,
high number of open file handles)? If no, then you don't even need to
optimize until those issues become... issues.
OK, I have changed the process to not do optimize() at all. So far so
good. The number of files hovers
care about indexing
speed.
Otis
--- Paul Elschot [EMAIL PROTECTED] wrote:
On Tuesday 21 December 2004 05:49, aurora wrote:
I'm testing the rebuilding of the index. I add several hundred documents,
optimize and add another few hundred and so on. Right now I have around
7000 files. I observed after
Right now I am incrementally adding about 100 documents to the index a day
and then optimizing after that. I find that optimize essentially rebuilds
the entire index into a single file. So the amount of disk writes is
proportional to the total index size, not to the size of documents
I'm testing the rebuilding of the index. I add several hundred documents,
optimize and add another few hundred and so on. Right now I have around
7000 files. I observed that after the index gets to a certain size, every
time after optimize there are two files of roughly the same size, like below:
Is there a way to auto-generate a uid in Lucene? Even if it is just a way
to query the highest uid and let the application add one to it, that will do.
Thanks.
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail:
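The "highest uid plus one" idea from the question above can be sketched on the application side. This is not a Lucene feature; the class UidGenerator is hypothetical, and how the highest existing uid is obtained and persisted (for example from a side file or a bookkeeping record) is left as an assumption.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical application-side uid generator; Lucene itself is not involved.
final class UidGenerator {

    private final AtomicLong last;

    /** Seeds the generator with the highest uid already in use (assumed to be known to the caller). */
    UidGenerator(long highestExistingUid) {
        this.last = new AtomicLong(highestExistingUid);
    }

    /** Returns the next unused uid; safe to call from several indexing threads. */
    long next() {
        return last.incrementAndGet();
    }

    public static void main(String[] args) {
        UidGenerator uids = new UidGenerator(100); // 100 is an assumed starting point
        System.out.println(uids.next());
        System.out.println(uids.next());
    }
}
```

The generator only guarantees uniqueness within one running process; the seed value has to be stored somewhere durable between runs.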
, 2004, at 1:50 PM, aurora wrote:
Is there a way to auto-generate a uid in Lucene? Even if it is just a way
to query the highest uid and let the application add one to it, that will do.
Thanks.
Besides full text indexing, I need a database that represents a large
dictionary like:
(key1, key2) -> docid
I am deciding between building a home-grown solution and using
Berkeley DB. Then I thought: I am using Lucene anyway, so wouldn't it make
sense to use it as my database too? Just make key1 and
17 matches