Maybe you should encode the HTML code ...
Patrick Burleson wrote:
Why oh why did you send this to the tomcat lists?
Don't cross post! Especially when the question doesn't even apply to
one of the lists.
Patrick
On Tue, 7 Sep 2004 16:35:35 -0400, hui liu [EMAIL PROTECTED] wrote:
Hi,
I have such
Which analyzer are you using to index Chinese PDF documents?
I think you should use CJKAnalyzer.
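A minimal indexing sketch with the sandbox CJKAnalyzer (untested; the index path, the "contents" field name, and the text variable are placeholders, and the text has to be pulled out of the PDF first, e.g. with PDFBox):

import org.apache.lucene.analysis.cjk.CJKAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

public class IndexChinese {
    public static void main(String[] args) throws Exception {
        String text = "...";  // text extracted from the PDF beforehand
        // CJKAnalyzer tokenizes CJK text into overlapping bigrams.
        IndexWriter writer = new IndexWriter("/tmp/cjk-index", new CJKAnalyzer(), true);
        Document doc = new Document();
        doc.add(Field.Text("contents", text));
        writer.addDocument(doc);
        writer.optimize();
        writer.close();
    }
}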
- Original Message -
From: [EMAIL PROTECTED] [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Wednesday, September 08, 2004 11:27 AM
Subject: pdf in Chinese
Hi all,
I use PDFBox to parse
Hi,
Can you please advise me of any solution for a Hebrew analyzer?
-Original Message-
From: Chandan Tamrakar [mailto:[EMAIL PROTECTED]
Sent: Wednesday, September 08, 2004 11:15 AM
To: Lucene Users List
Subject: Re: pdf in Chinese
Which analyzer are you using to index Chinese PDF documents?
I
Hi Bill,
-
But even if it didn't, the second
problem is that the query formed would be
+(title:cutting title:lucene) +(author:cutting author:lucene)
That is, if the word Lucene appeared in both the author field and the
title field, the document would match even though cutting appears
nowhere. This clearly isn't what the searcher
Could you create a simple piece of code (using a RAMDirectory) that
demonstrates this issue?
Erik
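A sketch of such a test might look like this (against the 1.4-era API, untested; the field values are made up). The single document contains lucene in both fields but cutting in neither; printing the parsed query and the hit count shows how MultiFieldQueryParser groups the terms and whether the document matches:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.queryParser.MultiFieldQueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.store.RAMDirectory;

public class MultiFieldDemo {
    public static void main(String[] args) throws Exception {
        RAMDirectory dir = new RAMDirectory();
        StandardAnalyzer analyzer = new StandardAnalyzer();

        IndexWriter writer = new IndexWriter(dir, analyzer, true);
        Document doc = new Document();
        doc.add(Field.Text("title", "lucene in action"));   // no "cutting" anywhere
        doc.add(Field.Text("author", "the lucene team"));
        writer.addDocument(doc);
        writer.close();

        IndexSearcher searcher = new IndexSearcher(dir);
        Query query = MultiFieldQueryParser.parse("cutting lucene",
                new String[] { "title", "author" }, analyzer);
        System.out.println("parsed: " + query);              // shows the grouping
        Hits hits = searcher.search(query);
        System.out.println("hits: " + hits.length());
        searcher.close();
    }
}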
On Sep 8, 2004, at 12:35 AM, Minh Kama Yie wrote:
Hi all,
Sorry I should clarify my last point.
The search() would return no hits, but the explain() using the
apparently invalid docId
It is not about the analyzer; I need to read the text from the PDF file first.
- Original Message -
From: Chandan Tamrakar [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Wednesday, September 08, 2004 4:15 PM
Subject: Re: pdf in Chinese
Which analyzer are you using to index
Hi all,
I want to discuss a little problem: Lucene doesn't support *Term-like
queries.
I know that this can bring a lot of results into memory, and therefore
it is restricted.
I think that allowing this kind of search and limiting the number of
returned results would be
a more useful
On Sep 8, 2004, at 6:26 AM, sergiu gordea wrote:
I want to discuss a little problem: Lucene doesn't support *Term-like
queries.
First of all, this is untrue. WildcardQuery itself most definitely
supports wildcards at the beginning.
I would like to use *schreiben.
The dilemma you've encountered
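For reference, the direct construction looks like this (a sketch; the field name "contents" is made up):

import org.apache.lucene.index.Term;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.WildcardQuery;

// QueryParser rejects the leading *, but WildcardQuery itself accepts it.
// Expect it to enumerate every term in the field, since the term index
// can't seek to a prefix.
Query query = new WildcardQuery(new Term("contents", "*schreiben"));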
sergiu gordea writes:
Hi all,
I want to discuss a little problem: Lucene doesn't support *Term-like
queries.
I know that this can bring a lot of results into memory, and therefore
it is restricted.
That's not the reason for the restriction. That's possible with a* also.
The
.. and here is the way to do it:
(See attached file: SUPPOR~1.RAR)
Erik Hatcher
This appears to be more of a PDFBox issue than a Lucene issue; please post
an issue to the PDFBox site.
Also note that, because of certain encodings a PDF writer can use, it
is impossible to extract text from all PDF documents.
Ben
On Wed, 8 Sep 2004, [EMAIL PROTECTED] wrote:
it is not
Bill,
I didn't receive any .java file. Could you send it again?
Thanks.
-Original Message-
From: Bill Janssen [mailto:[EMAIL PROTECTED]
Sent: Tuesday, September 07, 2004, 10:06 p.m.
To: Lucene Users List
CC: Ali Rouhi
Subject: MultiFieldQueryParser seems broken... Fix
Hey Ben,
We've been using a distributed environment with three servers and three
separate indices for the past two years, since the first stable Lucene
release, and it has been great. For the past two months I've
been working on a redesign of our Lucene app, and I've shared my
The class is at the end of the message.
But I think that a better solution is the one suggested by Rene:
http://issues.apache.org/eyebrowse/[EMAIL PROTECTED]msgId=1798116
Wermus Fernando wrote:
Bill,
I didn't receive any .java file. Could you send it again?
Thanks.
-Original Message-
I'm not aware of any Java library that can reliably extract Chinese
text from PDF documents. We're planning on supporting Chinese,
Japanese, and Korean in version 2 of PDFTextStream, but there's no
doubt that it's a huge challenge.
Chas Emerick | [EMAIL PROTECTED]
PDFTextStream: fast PDF
On Wed, 8 Sep 2004, Chas Emerick wrote:
PDFTextStream: fast PDF text extraction for Java applications
http://snowtide.com/home/PDFTextStream/
For those who have not seen it, snowtide.com has done a performance
comparison of several Java PDF text-extraction libraries, including Snowtide's
Would it be cheeky to ask you to post the docs to the group? It would be interesting
to read how you've tackled this.
-Original Message-
From: Nader Henein [mailto:[EMAIL PROTECTED]
Sent: 08 September 2004 13:57
To: Lucene Users List
Subject: Re: Moving from a single server to a cluster
Ben,
Wow, thanks for the plug! :-)
Truthfully, I was worried that our open-source brethren might feel
slighted by the comparison -- that's partially why we wanted to make
sure it was as thorough and transparent as possible so that anyone
could review the results for themselves. I'm glad that
be a pleasure, I just didn't want to lead someone down the wrong path.
Give me a few days and I'll have the new version up.
Nader
We went through the same scenario as yours. We recently made our application
clusterable, and I wrote our own version of a JDBC directory (similar to the
SQLDirectory posted by someone) with our own caching. It was great for
searching, but indexing had become a real bottleneck. So we have decided to
move
I think you might be referring to the XML files you keep in C:\Program
Files\Apache\Tomcat\conf\Catalina\localhost
I have a file with the contents (myapp.xml):
<?xml version='1.0' encoding='utf-8'?>
<Context docBase="C:/work/aggregation/myapp/web" path="/myapp" reloadable="true">
</Context>
-Original
I have to look more carefully, but why isn't the SnowballAnalyzer in the
org.apache.lucene.analysis.snowball package?
I have Lucene 1.4.
I'm writing my own Spanish stemmer.
It is in snowball-1.0.jar.
I sent it to you in a private email.
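For reference, a minimal usage sketch (the index path is hypothetical; the jar must be on the classpath):

import org.apache.lucene.analysis.snowball.SnowballAnalyzer;
import org.apache.lucene.index.IndexWriter;

// "Spanish" selects the Spanish Snowball stemmer.
SnowballAnalyzer analyzer = new SnowballAnalyzer("Spanish");
IndexWriter writer = new IndexWriter("/tmp/index", analyzer, true);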
Bye
Ernesto.
- Original Message -
From: Wermus Fernando [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Wednesday, September 08, 2004 1:12 PM
Subject: where is the SnowBallAnalyzer?
I have to look more carefully, but why the
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/IndexSearcher.html#close()
What is the intent of IndexSearcher.close()?
I want to know how, in a web app, one can stop a search that's in
progress - the use case is that a user is limited to one search at a time, and
when one (expensive)
Hi, I am assisting a professor with an IR course.
We need to provide the students with a full-featured
search engine package, and the professor prefers it
to be powered by Lucene. Since I am new to Lucene,
can anyone give me some information on where
I can get such a package? We also want the
Is it safe to change the compound file format option at any time during the life of an
index?
Can I build an index with it off, then turn it on, and call optimize, and have a
compound-file-formatted index?
And then later, turn it off, call optimize again, and go back the other way?
The
Thanks, David. But it seems that this is downloadable.
Could you please provide me the link for download?
Thank you very much!
Ya
- Original Message -
From: David Spencer [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Wednesday, September 08, 2004 2:43 PM
Subject: Re:
Armbrust, Daniel C. wrote:
Is it safe to change the compound file format option at any time during the life of an
index?
Can I build an index with it off, then turn it on, and call optimize, and have a
compound-file-formatted index?
And then later, turn it off, call optimize again, and go back
Anne Y. Zhang wrote:
Thanks, David. But it seems that this is downloadable.
Could you please provide me the link for download?
Thank you very much!
http://www.nutch.org/release/
Ya
- Original Message -
From: David Spencer [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent:
Hmm, I tried that in Luke - but it doesn't seem to take. When I uncheck the "use
compound file" checkbox and then select optimize, it doesn't change anything.
I guess I should just write some code already :)
Dan
-Original Message-
From: Andrzej Bialecki [mailto:[EMAIL PROTECTED]
Thanks a lot!
Ya
- Original Message -
From: Bernhard Messer [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Wednesday, September 08, 2004 3:38 PM
Subject: Re: Full web search engine package using Lucene
Anne Y. Zhang wrote:
Thanks, David. But it seems that this is
Ahh - two new discoveries:
1. You have to add a document, remove a document, and then call optimize. Then
everything works (nearly as expected) - see the sketch below.
2. The version of Lucene that ships with Luke still has the broken optimize code in it
that didn't clean up after itself - so you need to just download
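Roughly, that add/remove/optimize sequence looks like this in code (a sketch against the 1.4 API, untested; the index path and the "id" field are made up):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;

public class CompoundConvert {
    public static void main(String[] args) throws Exception {
        String path = "/path/to/index";  // hypothetical location

        // 1. Add a dummy document so the index is actually modified.
        IndexWriter writer = new IndexWriter(path, new StandardAnalyzer(), false);
        writer.setUseCompoundFile(true);
        Document dummy = new Document();
        dummy.add(Field.Keyword("id", "dummy"));
        writer.addDocument(dummy);
        writer.close();

        // 2. Remove the dummy document again.
        IndexReader reader = IndexReader.open(path);
        reader.delete(new Term("id", "dummy"));
        reader.close();

        // 3. Optimize: merges all segments, rewriting the index in the
        //    compound format.
        writer = new IndexWriter(path, new StandardAnalyzer(), false);
        writer.setUseCompoundFile(true);
        writer.optimize();
        writer.close();
    }
}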
Dave,
I haven't tried this, but I think this would be messy. Lucene needs to
keep index files open, so that when you pull a Document from Hits, it
can read this stuff from those files. If you close index files, you
are likely to get some NPEs or some such.
I don't think you'll find a ready to
René,
Thanks for your note.
I'd think that if a user specified a query cutting lucene, with an
implicit AND and the default fields title and author, they'd
expect to see a match in which both cutting and lucene appear. That is,
(title:cutting OR author:cutting) AND (title:lucene OR author:lucene)
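In code, one way to build that grouping by hand (a sketch using the 1.4-style BooleanQuery.add(query, required, prohibited); the field and term values come from the example above):

import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.TermQuery;

String[] terms = { "cutting", "lucene" };
String[] fields = { "title", "author" };
BooleanQuery query = new BooleanQuery();
for (int i = 0; i < terms.length; i++) {
    // Within a term's clause, any field may match (OR)...
    BooleanQuery perTerm = new BooleanQuery();
    for (int j = 0; j < fields.length; j++) {
        perTerm.add(new TermQuery(new Term(fields[j], terms[i])), false, false);
    }
    // ...but every term's clause is required (AND).
    query.add(perTerm, true, false);
}
// query.toString() -> +(title:cutting author:cutting) +(title:lucene author:lucene)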
Niraj Alok wrote:
Hi PA,
Thanks for the detail! Since we are using Lucene to store the data as well, I
guess I would not be able to use it.
By the way, I could be wrong, but I think the 35% figure you referenced
in your first e-mail actually does not include any stored fields.
The deal with
I know the index size is very dependent on the content being indexed...
but running on a Unix-based machine without a file-size limit, best-case
scenario... what is the largest number of documents that can be
indexed?
I've seen mentions throughout the list of millions of documents... 8
million, 20
Given adequate hardware, it can. Take a look at nutch.org. Nutch uses
Lucene at its core.
Otis
--- Chris Fraschetti [EMAIL PROTECTED] wrote:
I know the index size is very dependent on the content being indexed...
but running on a Unix-based machine without a file-size limit, best-case
Chris Fraschetti wrote:
I've seen mentions throughout the list of millions of documents... 8
million, 20 million, etc., but can Lucene potentially handle
billions of documents and still search through them efficiently?
Lucene can currently handle up to 2^31 documents in a single index. To
a
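For collections beyond a single index's limit, one option in the 1.4 API was to split the collection across several indexes and search them together with MultiSearcher. A sketch, with hypothetical index paths:

import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MultiSearcher;
import org.apache.lucene.search.Searchable;

// Each part can hold up to 2^31 documents on its own.
Searchable[] parts = {
    new IndexSearcher("/indexes/part1"),
    new IndexSearcher("/indexes/part2"),
    new IndexSearcher("/indexes/part3"),
};
MultiSearcher searcher = new MultiSearcher(parts);  // searches all parts as one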