Morus Walter wrote:
Owen Densmore writes:
1 - I'm a bit concerned that reasonable stemming (Porter/Snowball)
apparently produces non-word stems .. i.e. not really human readable.
(Example: generate, generates, generated, generating - generat)
Although in typical queries this is not important
How to create index with chinese (in utf-8 encoding ) HTML and search
with Lucene ?
On Jan 21, 2005, at 4:49 AM, Eric Chow wrote:
How to create index with chinese (in utf-8 encoding ) HTML and search
with Lucene ?
Indexing and searching Chinese basically is no different than using
English with Lucene. We covered a bit about it in Lucene in Action:
On Fri, 2005-01-21 at 10:58 +0100, Bertrand VENZAL wrote:
I wondered how Lucene implements the * character. I know that it works,
but when I look at the Query object, it doesn't seem to appear anywhere.
Does someone know how it is implemented?
Take a look at the PrefixQuery and
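
To illustrate the PrefixQuery pointer: as far as I know, QueryParser turns a term ending in `*` into a PrefixQuery, and when the search runs, that query is rewritten into an OR over every indexed term sharing the prefix, which is why no `*` shows up in the Query object. The sketch below imitates only the term-enumeration step over a sorted in-memory term list. The class and method names are hypothetical and no Lucene classes are used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical stand-in for a term dictionary: terms are kept sorted, so all
// terms sharing a prefix sit next to each other.
class PrefixExpansion {

    // Return every term that starts with `prefix`, in sorted order.
    static List<String> expand(SortedSet<String> terms, String prefix) {
        List<String> matches = new ArrayList<>();
        // tailSet begins at the first term >= prefix; stop at the first term
        // that no longer shares the prefix.
        for (String term : terms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break;
            }
            matches.add(term);
        }
        return matches;
    }

    public static void main(String[] args) {
        SortedSet<String> terms = new TreeSet<>(
                List.of("gate", "generat", "generic", "genome", "giant"));
        // A query such as gen* would be searched as an OR over these terms.
        System.out.println(expand(terms, "gen")); // [generat, generic, genome]
    }
}
```

Because each matching term becomes one clause of the rewritten query, a short prefix over a large index can produce a very large query.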
Search not really correct with UTF-8 !!!
The following is the search result that I used the SearchFiles in the
lucene demo.
d:\Downloads\Softwares\Apache\Lucene\lucene-1.4.3\src>java
org.apache.lucene.demo.SearchFiles c:\temp\myindex
Usage: java SearchFiles idnex
Query:
Searching for: g
1 - I'm a bit concerned that reasonable stemming
(Porter/Snowball)
apparently produces non-word stems .. i.e. not
really human readable.
It is possible to derive the human-readable form of a
stemmed term using either re-analysis of indexed
content or TermPositionVector. Either of these
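
A rough sketch of the re-analysis idea: run the same stemmer over the original text, remember which surface forms produced each stem, and display the most frequent one. The suffix stripper here is a deliberately naive stand-in, not Porter or Snowball, all names are hypothetical, and TermPositionVector is not used.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Sketch: recover a readable label for each stem by re-analysing the original
// text and remembering which surface form produced that stem most often.
class StemLabels {

    // Naive suffix stripper, NOT Porter/Snowball; it only keeps the example
    // self-contained. Longer suffixes are tried before shorter ones.
    static String toyStem(String word) {
        for (String suffix : new String[] {"ing", "es", "ed", "e", "s"}) {
            if (word.endsWith(suffix) && word.length() > suffix.length() + 2) {
                return word.substring(0, word.length() - suffix.length());
            }
        }
        return word;
    }

    // Map each stem to its most frequent surface form (ties: alphabetical).
    static Map<String, String> labels(String text) {
        Map<String, Map<String, Integer>> forms = new HashMap<>();
        // \W+ splits on non-ASCII-word characters; fine for English text.
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (token.isEmpty()) {
                continue;
            }
            forms.computeIfAbsent(toyStem(token), k -> new HashMap<>())
                 .merge(token, 1, Integer::sum);
        }
        Map<String, String> best = new HashMap<>();
        for (Map.Entry<String, Map<String, Integer>> e : forms.entrySet()) {
            String label = null;
            int bestCount = 0;
            for (Map.Entry<String, Integer> f : e.getValue().entrySet()) {
                int count = f.getValue();
                if (count > bestCount
                        || (count == bestCount && f.getKey().compareTo(label) < 0)) {
                    label = f.getKey();
                    bestCount = count;
                }
            }
            best.put(e.getKey(), label);
        }
        return best;
    }

    public static void main(String[] args) {
        String text = "We generate reports. It generates, generated and is generating. Generate!";
        System.out.println(labels(text).get("generat")); // generate
    }
}
```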
On Jan 21, 2005, at 11:42, Eric Chow wrote:
Search not really correct with UTF-8 !!!
Lucene works just fine with any flavor of Unicode as long as _your_
application knows how to consistently deal with Unicode as well.
Remember: the world is not just one Big5 pile.
As far as Analyzer goes, you
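
One common source of mangled non-ASCII searches (an assumption about the cause here, not a diagnosis) is decoding bytes with the platform default charset instead of the charset the content was actually written in. Both the indexed files and the query typed at the console have to be decoded correctly. A minimal JDK-only sketch of decoding with an explicit charset; the class and method names are hypothetical:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Sketch: decode bytes with an explicit charset rather than the platform
// default, so the text handed to an analyzer is the intended Unicode.
class ExplicitCharset {

    static String readAll(InputStream in, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Two Chinese characters (U+4E2D U+6587), written as escapes so the
        // source file's own encoding does not matter.
        byte[] utf8 = "\u4e2d\u6587".getBytes(StandardCharsets.UTF_8);
        String right = readAll(new ByteArrayInputStream(utf8), StandardCharsets.UTF_8);
        String wrong = readAll(new ByteArrayInputStream(utf8), StandardCharsets.ISO_8859_1);
        System.out.println(right.length() + " chars vs " + wrong.length() + " chars"); // 2 chars vs 6 chars
    }
}
```

Decoding the same six UTF-8 bytes as ISO-8859-1 yields six unrelated Latin-1 characters, and terms built from those will never match terms built from correctly decoded text.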
OK. But isn't there a limit on the number of BooleanQueries that can be
combined with AND / OR / etc?
Jerry Jalenak
Senior Programmer / Analyst, Web Publishing
LabOne, Inc.
10101 Renner Blvd.
Lenexa, KS 66219
(913) 577-1496
[EMAIL PROTECTED]
-----Original Message-----
From: Erik Hatcher
I want to understand how Lucene uses stemming but can't find any
documentation on the Lucene site. I'll continue to google but hope that
this list can help narrow my search. I have several questions on the
subject currently but hesitate to list them here since finding a good
document on the
Hi Kevin,
Stemming is an optional operation and is done in the analysis step.
Lucene comes with a Porter stemmer and a Filter that you can use in an
Analyzer:
./src/java/org/apache/lucene/analysis/PorterStemFilter.java
./src/java/org/apache/lucene/analysis/PorterStemmer.java
You can find more
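
Because stemming happens inside the Analyzer, the same analysis chain has to run both when indexing and when parsing queries; otherwise a query for `indexing` would not match a document containing `indexed`. The sketch below models such a chain with plain JDK types: a tokenizer followed by a lowercasing step and a toy suffix-stripping step standing in for PorterStemFilter. It is not Lucene's TokenStream API, and the names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

// Sketch of an analysis chain: split text into tokens, then pass each token
// through an ordered list of filters. toyStem is a naive stand-in for
// PorterStemFilter, not the Porter algorithm.
class AnalysisChain {

    static String toyStem(String t) {
        for (String suffix : new String[] {"ing", "ed", "s"}) {
            if (t.endsWith(suffix) && t.length() > suffix.length() + 3) {
                return t.substring(0, t.length() - suffix.length());
            }
        }
        return t;
    }

    // Order matters: lowercase first so the stemmer sees normalized text.
    static final List<UnaryOperator<String>> FILTERS = Arrays.asList(
            t -> t.toLowerCase(Locale.ROOT),
            AnalysisChain::toyStem);

    static List<String> analyze(String text) {
        List<String> out = new ArrayList<>();
        for (String token : text.split("[^\\p{L}\\p{Nd}]+")) {
            if (token.isEmpty()) {
                continue;
            }
            for (UnaryOperator<String> filter : FILTERS) {
                token = filter.apply(token);
            }
            out.add(token);
        }
        return out;
    }

    public static void main(String[] args) {
        // The same chain runs on the document text and on the query text.
        System.out.println(analyze("Indexed documents")); // [index, document]
        System.out.println(analyze("indexing DOCUMENT")); // [index, document]
    }
}
```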
This:
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/BooleanQuery.TooManyClauses.html
?
You can control that limit via
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/BooleanQuery.html#maxClauseCount
Otis
--- Jerry Jalenak [EMAIL PROTECTED] wrote:
OK.
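
The limit applies to the number of clauses in a single BooleanQuery (1024 by default), and queries that expand into many terms, such as prefix or wildcard queries, can hit it too. The hypothetical model below shows the behaviour: adding one clause past the configured maximum throws, loosely mirroring BooleanQuery.TooManyClauses. It uses no Lucene classes; the real limit is changed as described at the maxClauseCount link above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a boolean query that refuses to grow past a clause
// limit, loosely mirroring BooleanQuery.TooManyClauses. No Lucene classes.
class ClauseLimit {

    static class TooManyClauses extends RuntimeException {
        TooManyClauses(int limit) {
            super("more than " + limit + " clauses");
        }
    }

    static class BoolQuery {
        private final int maxClauses;
        private final List<String> clauses = new ArrayList<>();

        BoolQuery(int maxClauses) {
            this.maxClauses = maxClauses;
        }

        void add(String clause) {
            if (clauses.size() >= maxClauses) {
                throw new TooManyClauses(maxClauses);
            }
            clauses.add(clause);
        }

        int size() {
            return clauses.size();
        }
    }

    // Expand termCount hypothetical terms into clauses; true if the query fit.
    static boolean fits(int termCount, int maxClauses) {
        BoolQuery q = new BoolQuery(maxClauses);
        try {
            for (int i = 0; i < termCount; i++) {
                q.add("term" + i);
            }
            return true;
        } catch (TooManyClauses e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(fits(2000, 1024)); // false: exceeds a 1024-clause limit
        System.out.println(fits(2000, 4096)); // true: a raised limit accommodates it
    }
}
```

Raising the limit trades memory and query time for coverage, since every clause is scored separately.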
Hi Ranjan,
It sounds like you should look at and use Nutch:
http://www.nutch.org
Otis
--- Ranjan K. Baisak [EMAIL PROTECTED] wrote:
I am planning to move to Lucene but do not have much
knowledge of it. The search engine which I had
developed is searching some extranet URLs e.g.
Hello all. I'm new to lucene and think about using it in my project.
I have price lists with a dynamic structure containing wares: about 10K price lists
with 500K wares in total. Each price list has about 5 text fields.
I'll do searches on wares. The difficult part is that I'll do searches for all
wares,
Otis,
Thanks for your help. Is nutch a freeware tool?
regards,
Ranjan
--- Otis Gospodnetic [EMAIL PROTECTED]
wrote:
Hi Ranjan,
It sounds like you should look at and use Nutch:
http://www.nutch.org
Otis
--- Ranjan K. Baisak [EMAIL PROTECTED]
wrote:
I am planning to move to
OK, OK ... I'll buy the book. I guess it's about time since I am deeply
and forever in love with Lucene. Might as well take the final plunge.
-----Original Message-----
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]
Sent: Friday, January 21, 2005 9:12 AM
To: Lucene Users List
Subject: Re:
I am a little fuzzy on the thread safety of Lucene, or maybe just Java.
From what I understand, and correct me if I'm wrong, Lucene takes care of
concurrency issues and it is ok to run a query while writing to an index.
My question is, does this still hold true if the reader and writer are
--- Otis Gospodnetic [EMAIL PROTECTED] wrote:
No, you can't add documents to an index once you close the IndexWriter.
You can re-open the IndexWriter and add more documents, of course.
Otis
That's what I expected at first, but:
1- It's a disappointment, because such a 'feature' would have
Hello Ashley,
You can read/search while modifying the index, but you have to ensure
only one thread or only one process is modifying an index at any given
time. Both IndexReader and IndexWriter can be used to modify an index.
The former to delete Documents and the latter to add them. You have
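
The single-modifier rule can be sketched with one lock that every modifying operation takes, while searches run without it. In Lucene 1.4 an update is typically a delete through IndexReader followed by an add through IndexWriter, so that pair has to be serialized. The class below is a hypothetical in-memory stand-in, not Lucene code.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical in-memory stand-in for an index: every modification goes
// through one lock, searches take no lock at all.
class SingleModifier {
    private final Map<String, String> docs = new ConcurrentHashMap<>();
    private final ReentrantLock modifyLock = new ReentrantLock();

    // An update is a delete followed by an add (in Lucene 1.4: IndexReader,
    // then IndexWriter); the lock keeps other modifiers out between the two.
    void update(String id, String text) {
        modifyLock.lock();
        try {
            docs.remove(id);
            docs.put(id, text);
        } finally {
            modifyLock.unlock();
        }
    }

    // Lock-free search. In this toy it can observe a half-finished update;
    // a Lucene IndexReader instead sees the index as it was when opened.
    boolean anyContains(String word) {
        for (String text : docs.values()) {
            if (text.contains(word)) {
                return true;
            }
        }
        return false;
    }

    int size() {
        return docs.size();
    }

    public static void main(String[] args) throws InterruptedException {
        SingleModifier index = new SingleModifier();
        Thread[] writers = new Thread[4];
        for (int i = 0; i < writers.length; i++) {
            final int w = i;
            writers[i] = new Thread(() -> {
                for (int j = 0; j < 100; j++) {
                    index.update("doc" + (w * 100 + j), "text " + j);
                }
            });
            writers[i].start();
        }
        for (Thread t : writers) {
            t.join();
        }
        System.out.println(index.size()); // 400
    }
}
```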
I've written a Chinese Analyzer for Lucene that uses a segmenter written by
Erik Peterson. However, as the author of the segmenter does not want his code
released under the Apache open source license (although his code _is_
opensource), I cannot place my work in the Lucene Sandbox. This is
If you are hosting the code somewhere (e.g. your site, SF, java.net,
etc.), we should link to them from one of the Lucene pages where we
link to related external tools, apps, and such.
Otis
--- Safarnejad, Ali (AFIS) [EMAIL PROTECTED] wrote:
I've written a Chinese Analyzer for Lucene that
I would love to give it a try. Please email me at aurora00 at gmail.com.
Thanks!
Also, what is the opinion on the CJKAnalyzer and ChineseAnalyzer? Some
people actually said the StandardAnalyzer works better. I wonder what the
pros and cons are.
I've written a Chinese Analyzer for Lucene that
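
On the CJKAnalyzer vs ChineseAnalyzer question, the main difference, as the sandbox tokenizer documentation describes it, is the token unit: for a text C1C2C3C4, ChineseTokenizer returns C1, C2, C3, C4, while CJKTokenizer returns the overlapping pairs C1C2, C2C3, C3C4. Single characters keep the term dictionary small but match more loosely; bigrams match more precisely at the cost of a larger index. The JDK-only sketch below shows both schemes; it is an illustration with hypothetical names, not either tokenizer's code.

```java
import java.util.ArrayList;
import java.util.List;

// Two ways to cut a run of Chinese characters into tokens. Assumes every
// character is in the Basic Multilingual Plane (one Java char each).
class CjkTokens {

    // One token per character (the ChineseTokenizer scheme).
    static List<String> unigrams(String run) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < run.length(); i++) {
            out.add(run.substring(i, i + 1));
        }
        return out;
    }

    // Overlapping two-character tokens (the CJKTokenizer scheme); in this
    // sketch a lone character is emitted as-is.
    static List<String> bigrams(String run) {
        List<String> out = new ArrayList<>();
        if (run.length() == 1) {
            out.add(run);
            return out;
        }
        for (int i = 0; i + 2 <= run.length(); i++) {
            out.add(run.substring(i, i + 2));
        }
        return out;
    }

    public static void main(String[] args) {
        String run = "\u4e2d\u6587\u641c\u7d22"; // four Chinese characters
        System.out.println(unigrams(run).size() + " unigrams, "
                + bigrams(run).size() + " bigrams"); // 4 unigrams, 3 bigrams
    }
}
```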
Are you indexing the FOP PDFs differently than other PDF documents?
Can I assume that you are using PDFBox's LucenePDFDocument.getDocument()
method?
Ben
On Fri, 21 Jan 2005, Luke Shannon wrote:
Hello;
Our CMS now allows users to create PDF documents (uses FOP) and then search
them.
I
Also if you can't wait, see page 2 of
http://www.onjava.com/pub/a/onjava/2003/01/15/lucene.html
or the LIA e-book ;)
On Fri, 21 Jan 2005 09:27:42 -0500, Kevin L. Cobb
[EMAIL PROTECTED] wrote:
OK, OK ... I'll buy the book. I guess it's about time since I am deeply
and forever in love with
We have one large index right now... it's about 60 GB. When I open it,
the Java VM uses 940 MB of memory. The VM does nothing else besides open
this index.
Here's the code:
System.out.println("opening...");
long before = System.currentTimeMillis();
Directory dir =
Kevin A. Burton wrote:
We have one large index right now... it's about 60 GB. When I open it,
the Java VM uses 940 MB of memory. The VM does nothing else besides
open this index.
After thinking about it, I guess 1.5% of memory per index really isn't
THAT bad. What would be nice is if there were a way
: We have one large index right now... it's about 60 GB. When I open it,
: the Java VM uses 940 MB of memory. The VM does nothing else besides open
Just out of curiosity, have you tried turning on the verbose gc log, and
putting in some thread sleeps after you open the reader, to see if the
memory
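
Along the lines of the verbose-GC suggestion, a rough JDK-only way to see how much heap an operation really retains is to sample used heap after nudging the collector and pausing. `System.gc()` is only a hint, so the numbers are approximate, and the byte array below is a hypothetical stand-in for opening the index. Running the JVM with `-verbose:gc` gives the more reliable picture.

```java
// Rough heap probe: nudge the collector, pause, then read used heap.
// System.gc() is only a hint, so treat the numbers as approximate.
class HeapProbe {

    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    static long settledUsedHeap(int rounds, long sleepMillis) {
        for (int i = 0; i < rounds; i++) {
            System.gc();
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return usedHeapBytes();
    }

    public static void main(String[] args) {
        long before = settledUsedHeap(3, 100);
        // Hypothetical stand-in for "open the index": allocate 50 MB and keep it.
        byte[] opened = new byte[50 * 1024 * 1024];
        long after = settledUsedHeap(3, 100);
        System.out.println("retained roughly " + (after - before) / (1024 * 1024)
                + " MB (array length " + opened.length + ")");
    }
}
```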
As a log4j developer, I've been toying with the idea of what Lucene
could do for me, maybe as an excuse to play around with Lucene.
I've started creating a LoggingEvent-Document converter, and thinking
through how I'd like this utility to work when I came across a question
I wasn't sure about.
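
A hypothetical shape for such a converter: flatten each event into named string fields, each of which would then become a Lucene Field. `LogEvent` below is a stand-in, not log4j's LoggingEvent, and the field names are assumptions. The timestamp is zero-padded because Lucene compares terms as strings, so padding keeps lexicographic order equal to numeric order for range queries.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical converter shape: flatten a log event into named string fields,
// each of which would later become a Lucene Field.
class LogEventFields {

    // Stand-in for an event; NOT log4j's LoggingEvent.
    static final class LogEvent {
        final long timestamp;
        final String level;
        final String logger;
        final String message;

        LogEvent(long timestamp, String level, String logger, String message) {
            this.timestamp = timestamp;
            this.level = level;
            this.logger = logger;
            this.message = message;
        }
    }

    static Map<String, String> toFields(LogEvent e) {
        Map<String, String> fields = new LinkedHashMap<>();
        // Zero-pad to 19 digits (enough for any non-negative long) so string
        // order matches numeric order in range queries.
        fields.put("timestamp", String.format(Locale.ROOT, "%019d", e.timestamp));
        fields.put("level", e.level);
        fields.put("logger", e.logger);
        fields.put("message", e.message);
        return fields;
    }

    public static void main(String[] args) {
        LogEvent ev = new LogEvent(1106300000000L, "WARN", "com.example.App", "disk almost full");
        System.out.println(toFields(ev));
    }
}
```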
I want that Chinese Analyzer!!
On Fri, 21 Jan 2005 17:36:17 +0100, Safarnejad, Ali (AFIS)
[EMAIL PROTECTED] wrote:
I've written a Chinese Analyzer for Lucene that uses a segmenter written by
Erik Peterson. However, as the author of the segmenter does not want his code
released under the Apache