Hi Jason,
Yes, the documentation does mention escaping, but that's only for special
characters used in queries, right?
I've tried escaping too.
To answer your question, I am sure it is not the HTTP request that is eating it up.
Query query = MultiFieldQueryParser.parse("test/s",
Without looking at the source, my guess is that StandardAnalyzer (and
StandardTokenizer) is the culprit. The StandardAnalyzer grammar (in
StandardTokenizer.jj) is probably defined so x/y parses into two
tokens, x and y. s is a default stopword (see
StopAnalyzer.ENGLISH_STOP_WORDS), so it gets
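A plain-Java sketch of that guess (this is not Lucene code; the split rule and the tiny stopword subset below are simplified assumptions) showing why "test/s" would come out of analysis with only "test" left:

```java
import java.util.*;

public class SlashTokenDemo {
    // A handful of the defaults in StopAnalyzer.ENGLISH_STOP_WORDS (subset only)
    static final Set<String> STOP = new HashSet<>(Arrays.asList("a", "an", "the", "s", "t"));

    // StandardTokenizer treats '/' as a token boundary, so x/y becomes [x, y];
    // the stop filter then drops any token found in the stopword set.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        for (String t : input.toLowerCase().split("/")) {
            if (!t.isEmpty() && !STOP.contains(t)) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("test/s")); // prints [test]
    }
}
```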
Hello,
When I use the code with CJKAnalyzer and search English text
(because the text is mixed with Korean and English),
sometimes the returned String is empty.
Other cases work well.
Is the code analyzer-dependent?
Thanks.
Youngho
--- Test Code (just a copy of the book code) ---
More test results:
If the text contains "... Family ...",
then the query string "family" works OK,
but if the query string is "Family" then the highlighter returns nothing.
Thanks.
Youngho
- Original Message -
From: Youngho Cho [EMAIL PROTECTED]
To: Lucene Users List lucene-user@jakarta.apache.org
sometimes the returned String is empty.
Is the code analyzer-dependent?
When the highlighter.getBestFragments returns nothing
this is because there was no match found for query
terms in the TokenStream supplied.
This is nearly always because of Analyzer issues.
Check the post-analysis tokens
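A plain-Java sketch of the usual failure mode (the lowercasing below is a simplified stand-in for what analyzers such as StandardAnalyzer do; it is an illustration, not the Lucene API): the highlighter compares query terms against post-analysis tokens verbatim, so a raw "Family" term never equals the indexed token "family", while a query that went through analysis itself does match:

```java
public class CaseMismatchDemo {
    // Stand-in for analysis: most analyzers lowercase tokens on the way in.
    static String analyze(String text) {
        return text.toLowerCase();
    }

    // Highlighting only fires when a query term equals a post-analysis token.
    static boolean matches(String indexedText, String queryTerm) {
        return analyze(indexedText).equals(queryTerm);
    }

    public static void main(String[] args) {
        System.out.println(matches("Family", "Family"));          // false: token is "family"
        System.out.println(matches("Family", analyze("Family"))); // true: analyzed query matches
    }
}
```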
https://lucenerar.dev.java.net
LuceneRAR is now working, verified, on two containers: the J2EE 1.4 RI and
Orion. WebSphere testing is underway, with JBoss to follow.
LuceneRAR is a resource adapter for Lucene, allowing J2EE components to
look up an entry in a JNDI tree, using that reference to
Hello all,
perhaps not such a sophisticated question:
I would like to have a very diverse set of documents in one index. Depending
on the content of the text documents, I would like to put parts of the text into
different fields. This means that in searches, when searching a particular
field, some of
Karl,
This is completely fine. You can have documents with different fields
in the same index.
Otis
--- Karl Koch [EMAIL PROTECTED] wrote:
Hello all,
perhaps not such a sophisticated question:
I would like to have a very diverse set of documents in one index.
Depending
on the inside
Nope,
it is very possible. We have an index that holds the search info for
documents, messages in discussion threads, filled in forms etc. etc.
each having their own structure.
cheers,
Aad
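A rough plain-Java model of that idea (maps standing in for Lucene Documents; purely illustrative, not the Lucene API): each document carries only its own fields, and a field-scoped search simply skips documents that lack the field:

```java
import java.util.*;

public class MixedSchemaDemo {
    // Count documents whose given field equals the given value;
    // documents without that field are simply skipped.
    static int fieldHits(List<? extends Map<String, String>> index,
                         String field, String value) {
        int hits = 0;
        for (Map<String, String> doc : index) {
            if (value.equals(doc.get(field))) hits++;
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Map<String, String>> index = new ArrayList<>();

        Map<String, String> message = new HashMap<>(); // a discussion-thread doc
        message.put("author", "karl");
        message.put("body", "discussion thread text");
        index.add(message);

        Map<String, String> form = new HashMap<>();    // a filled-in form doc
        form.put("formId", "F-17");
        form.put("answer", "filled-in form text");
        index.add(form);

        System.out.println(fieldHits(index, "formId", "F-17")); // prints 1
    }
}
```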
Karl Koch wrote:
Hello all,
perhaps not such a sophisticated question:
I would like to have a very
I am in the process of indexing about 1.5 million documents, and have
started down the path of indexing these by month. Each month has between
100,000 and 200,000 documents. From a performance standpoint, is this the
right approach? This allows me to use MultiSearcher (or
I have an index that is frequently updated. When
indexing is completed, an event triggers a new
Searcher to be opened. When the new Searcher is
opened, incoming searches are redirected to the new
Searcher, the old Searcher is closed and nulled, but I
still see about twice the amount of memory in
Jerry Jalenak [EMAIL PROTECTED] writes:
I am in the process of indexing about 1.5 million documents, and have
started down the path of indexing these by month. Each month has between
100,000 and 200,000 documents. From a performance standpoint, is this the
right approach? This allows me to
Make sure that the older searcher is not referenced anywhere else; if it
isn't, the garbage collector should delete it.
Just remember that the garbage collector runs when memory is needed, not
immediately after a reference is set to null.
-Original Message-
From: Greg Gershman
Hi All;
I just want to make sure I have the right idea about boosting.
So if I boost a document (Document A) after I index it (let's say with a boost of
2.0), Lucene will now consider this document relatively more important than
other documents in the index with a boost factor less than 2.0. This boost
Luke,
Boosting is only one of the factors involved in Document/Query scoring.
Assuming that applying your boosts to Document A, or to a single field
of Document A, increases the total score enough, then yes, Document A
may have the highest score. But just because you boost a single
Document and
Thanks Otis.
- Original Message -
From: Otis Gospodnetic [EMAIL PROTECTED]
To: Lucene Users List lucene-user@jakarta.apache.org
Sent: Thursday, January 27, 2005 12:11 PM
Subject: Re: Boosting Questions
Luke,
Boosting is only one of the factors involved in Document/Query scoring.
Hi,
I want to use kXML with Lucene to index XML files. I think it is possible to
dynamically assign Node names as Document fields and Node texts as Text
(after using an Analyser).
I have seen some XML indexing in the Sandbox. Is there anybody here who has
done something with a thin pull parser
That's good to know.
I'm indexing on 11 fields (9 keyword, 2 text). The documents themselves are
between 1K to 2K in size.
Is there a point at which IndexSearcher performance begins to fall off (in
terms of the number of index records)?
Jerry Jalenak
Senior Programmer / Analyst, Web Publishing
LabOne,
Hello Karl,
Grab the source code for Lucene in Action, it's got code that parses
and indexes XML with DOM and SAX. You can see the coverage of that
stuff here:
http://lucenebook.com/search?query=indexing+XML+section%3A7*
I haven't used kXML, but I imagine the LIA code should get you going
Kevin A. Burton wrote:
Is there any way to reduce this footprint? The index is fully
optimized... I'm willing to take a performance hit if necessary. Is
this documented anywhere?
You can increase TermInfosWriter.indexInterval. You'll need to re-write
the .tii file for this to take effect.
Peter Hollas wrote:
Currently we can issue a simple search query and expect a response back
in about 0.2 seconds (~3,000 results) with the Lucene index that we have
built. Lucene gives a much more predictable and faster average query
time than using standard fulltext indexing with mySQL. This
What do I call to get the term frequencies for terms in the Query? I
can't seem to find it in the Javadoc...
Thanks.
Jonathan
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
Jonathan Lasko wrote:
What do I call to get the term frequencies for terms in the Query? I
can't seem to find it in the Javadoc...
Do you mean the # of docs that have a term?
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/IndexReader.html#docFreq(org.apache.lucene.index.Term)
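For reference, document frequency is just the number of documents that contain the term at least once, which is what docFreq(Term) reports. A tiny plain-Java model of the counting (not the Lucene API itself):

```java
import java.util.*;

public class DocFreqDemo {
    // Document frequency: how many documents contain the term at least once,
    // which is what IndexReader.docFreq(Term) reports for a field's term.
    static int docFreq(List<? extends Collection<String>> docs, String term) {
        int n = 0;
        for (Collection<String> doc : docs) {
            if (doc.contains(term)) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        List<Set<String>> docs = new ArrayList<>();
        docs.add(new HashSet<>(Arrays.asList("lucene", "index")));
        docs.add(new HashSet<>(Arrays.asList("lucene", "query")));
        docs.add(new HashSet<>(Arrays.asList("mysql")));
        System.out.println(docFreq(docs, "lucene")); // prints 2
    }
}
```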
Just a quick question: after writing an index and then calling
optimize(), is it normal for the index to expand to about three times
the size before finally compressing?
In our case the optimise grinds the disk, expanding the index into many
files of about 145MB total, before compressing down
Hi,
I am trying to delete a document from Lucene index using:
Term aTerm = new Term( uid, path );
aReader.delete( aTerm );
aReader.close();
If the variable path=xxx/foo.txt then I am able to delete the document.
However, if the path variable has a "-" in the string, the delete method
Hi,
I was searching using google and just found that there was a new
feature called google mini. Initially I thought it was another free
service for small companies. Then I realized that it costs quite some
money ($4,995) for the hardware and software. (I guess the proprietary
software costs a
Hello,
Yes, that is how optimize works - copies all existing index segments
into one unified index segment, thus optimizing it.
see hit #1: http://www.lucenebook.com/search?query=optimize+disk+space
However, three times the space sounds a bit too much, or I made a
mistake in the book. :)
You
Hi,
I agree that Google mini is quite expensive. It might be similar to the
desktop version in quality. Anyone knows google's ratio of index to text? Is
it true that Lucene's index is about 500 times the original text size (not
including image size)? I don't have one installed, so I
Our copy of LIA is in the mail ;)
Yes the final three files are: the .cfs (46.8MB), deletable (4 bytes),
and segments (29 bytes).
--Leto
-Original Message-
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]
Hello,
Yes, that is how optimize works - copies all existing index
This reminds me, has anyone ever discussed something similar:
- rackmount server ( or for coolness factor, that mini mac)
- web i/f for config/control
- of course the server would have the following s/w:
-- web server
-- lucene / nutch
Part of the work here I think is having a decent web i/f to
I think the Google Mini also includes crawling and a server wrapper, so it
is not entirely a 1-to-1 comparison.
Of course, extending Lucene to have those features is not at all
difficult anyway.
-John
On Thu, 27 Jan 2005 16:04:54 -0800 (PST), Xiaohong Yang (Sharon)
[EMAIL PROTECTED] wrote:
Hi,
How did you index the uid field? Field.Keyword? If not, that may be
the problem in that the field was analyzed. For a key field like this,
it needs to be unanalyzed/untokenized.
Erik
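A plain-Java sketch of why an analyzed uid field breaks delete-by-term (the split rule below is a simplified stand-in for a real analyzer, and the path is hypothetical): a Keyword field keeps the whole value as one exact term, while an analyzed field splits it into pieces, so a Term built from the full path matches nothing:

```java
import java.util.*;

public class KeywordFieldDemo {
    // Field.Keyword: the whole value is stored as a single, untokenized term.
    static List<String> keyword(String value) {
        return Collections.singletonList(value);
    }

    // An analyzed field: split on non-alphanumerics, roughly what a typical
    // analyzer would do (simplified stand-in, not StandardAnalyzer itself).
    static List<String> analyzed(String value) {
        return Arrays.asList(value.toLowerCase().split("[^a-z0-9]+"));
    }

    public static void main(String[] args) {
        String path = "reports/summary-2005.txt"; // hypothetical path with a '-'
        // delete-by-term only hits if the exact term exists in the field:
        System.out.println(keyword(path).contains(path));  // true: delete works
        System.out.println(analyzed(path).contains(path)); // false: delete misses
    }
}
```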
On Jan 27, 2005, at 6:21 PM, [EMAIL PROTECTED] wrote:
Hi,
I am trying to delete a document from
Thanks for your reply.
I used QueryParser instead of TermQuery,
and everything works well!
Thanks.
Youngho
- Original Message -
From: mark harwood [EMAIL PROTECTED]
To: lucene-user@jakarta.apache.org
Sent: Thursday, January 27, 2005 7:05 PM
Subject: Re: text highlighting
sometimes the
I've often said that there is a business to be had in packaging up
Lucene (and now Nutch) into a cute little box with user friendly
management software to search your intranet. SearchBlox is already
there (except they don't include the box).
I really hope that an application like
I just ran into a similar issue. When you close an IndexSearcher, it
doesn't necessarily close the underlying IndexReader. It depends
which constructor you used to create the IndexSearcher. See the
constructors javadocs or source for the details.
In my case, we were updating and optimizing the
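A toy plain-Java model of that ownership rule (these classes are illustrative stand-ins, not Lucene's): a searcher only closes a reader it opened itself, so a searcher built from an existing reader leaves it open and the caller must close it:

```java
public class SearcherCloseDemo {
    static class Reader {
        boolean closed = false;
        void close() { closed = true; }
    }

    static class Searcher {
        final Reader reader;
        final boolean ownsReader; // true when the searcher opened the reader itself

        Searcher(Reader reader, boolean ownsReader) {
            this.reader = reader;
            this.ownsReader = ownsReader;
        }

        // Only the searcher that opened the reader is responsible for closing it.
        void close() {
            if (ownsReader) reader.close();
        }
    }

    public static void main(String[] args) {
        Reader shared = new Reader();
        new Searcher(shared, false).close(); // built from an existing reader
        System.out.println(shared.closed);   // false: caller must close it
    }
}
```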
I discuss this with myself a lot inside my head... :)
Seriously, I agree with Erik. I think this is a business opportunity.
How many people are hating me now and going shh? Raise your
hands!
Otis
--- David Spencer [EMAIL PROTECTED] wrote:
This reminds me, has anyone every discussed
Have you tried using the multifile index format? Now I wonder if there
is actually a difference in the disk space consumed by optimize() when you
use the multifile and compound index formats...
Otis
--- Kauler, Leto S [EMAIL PROTECTED] wrote:
Our copy of LIA is in the mail ;)
Yes the final three
500 times the original data? Not true! :)
Otis
--- Xiaohong Yang (Sharon) [EMAIL PROTECTED] wrote:
Hi,
I agree that Google mini is quite expensive. It might be similar to
the desktop version in quality. Anyone knows google's ratio of index
to text? Is it true that Lucene's index is
: processes ended. If you're under linux, try running the 'lsof'
: command to see if there are any handles to files marked (deleted).
: Searcher, the old Searcher is closed and nulled, but I
: still see about twice the amount of memory in use well
: after the original searcher has been
As they say, nothing lasts forever ;)
I like the idea. If a project like this gets going, I think I'd be
interested in helping.
The Google mini looks very well done (they have two demos on the web
page). For $5000, it's probably a very good solution for many
businesses. If the demos are
I think everyone agrees that this would be a very neat application of
opensource technology like Lucene... however (opens drawer, pulls out
devil's advocate hat, places on head)... there are several complexities here
not addressed by Lucene (et al.). Not because Lucene isn't damn fantastic,
Erik,
I am using the keyword field
doc.add(Field.Keyword(uid, pathRelToArea));
anything else I can check on ?
thanks
atul
PS we worked together for Darden project
From: Erik Hatcher [EMAIL PROTECTED]
Date: 2005/01/27 Thu PM 07:46:40 EST
To: Lucene Users List
Could you work up a self-contained RAMDirectory-using example that
demonstrates this issue?
Erik
On Jan 27, 2005, at 9:10 PM, [EMAIL PROTECTED] wrote:
Erik,
I am using the keyword field
doc.add(Field.Keyword(uid, pathRelToArea));
anything else I can check on ?
thanks
atul
PS we
Overall, even if the Google Mini gives a lot of cool features compared to
a bare-bones Lucene project, what good is it with the 50,000 document
limit? It is useless with that limit. That is just their way of trying
to turn it into another cash cow.
Jian
On Thu, 27 Jan 2005 17:45:03 -0800 (PST), Otis
Jason Polites wrote:
I think everyone agrees that this would be a very neat application of
opensource technology like Lucene... however (opens drawer, pulls out
devil's advocate hat, places on head)... there are several complexities
here not addressed by Lucene (et. al). Not because Lucene
Hi
Is it hard to implement a function that displays search result
excerpts similar to Google's?
Is it just string manipulation, or is there some logic behind it? I
like their excerpts.
Thanks
Xiaohong Yang (Sharon) wrote:
Hi,
I agree that Google mini is quite expensive. It might be similar to the desktop version in quality. Anyone knows google's ratio of index to text? Is it true that Lucene's index is about 500 times the original text size (not including image size)? I don't
I think they do a proximity result based on keyword matches. So... If you
search for lucene and the document returned has this word at the very
start and the very end of the document, then you will see the two sentences
(sequences of words) surrounding the two keyword matches, one from the
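A minimal plain-Java sketch of that kind of excerpting (the window size and the "..." formatting are arbitrary choices of this illustration): find the first keyword hit and return a few words of context around it:

```java
import java.util.*;

public class ExcerptDemo {
    // Return a window of words around the first case-insensitive match,
    // with "..." marking truncation on either side.
    static String excerpt(String text, String keyword, int window) {
        String[] words = text.split("\\s+");
        for (int i = 0; i < words.length; i++) {
            if (words[i].equalsIgnoreCase(keyword)) {
                int from = Math.max(0, i - window);
                int to = Math.min(words.length, i + window + 1);
                StringBuilder sb = new StringBuilder(from > 0 ? "... " : "");
                for (int j = from; j < to; j++) {
                    sb.append(words[j]).append(j < to - 1 ? " " : "");
                }
                if (to < words.length) sb.append(" ...");
                return sb.toString();
            }
        }
        return ""; // no match: no excerpt
    }

    public static void main(String[] args) {
        String doc = "Apache Lucene is a fast full-text search library written in Java";
        System.out.println(excerpt(doc, "lucene", 2)); // prints: Apache Lucene is a ...
    }
}
```

A real implementation would merge windows around multiple matches into one snippet, which is roughly what the proximity-based excerpts described above do.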