Hello,
I am interested to hear how people handle locked indexes, for example
when catching an IOException like below.
java.io.IOException: Lock obtain timed out:
Lock@/tmp/lucene-0b978f2c0aa12e8dcdbd5b0df491bfc4-write.lock
at org.apache.lucene.store.Lock.obtain(Lock.java:58)
at
Hi,
Please tell me what is wrong with this query:
author:( +name AND full name~) AND book:( +university)
Alex Kiselevsky
Speech Technology    Tel: 972-9-776-43-46
R&D, Amdocs - Israel    Mobile: 972-53-63 50 38
mailto:[EMAIL PROTECTED]
The information contained in this message
You'll have to give us more information than that...
What is the problem you are seeing? I'll assume that you get no results.
Tell us about the structure of your documents and how you index each field.
Concerning your syntax, if you are using the distributed query parser, you
don't need the +
I use QueryParser,
and I got an exception:
org.apache.lucene.queryParser.ParseException: Encountered ~ at line 1,
column 44.
Was expecting one of:
AND ...
OR ...
NOT ...
+ ...
- ...
( ...
) ...
^ ...
QUOTED ...
TERM ...
SLOP ...
PREFIXTERM ...
From: http://jakarta.apache.org/lucene/docs/queryparsersyntax.html
Fuzzy Searches
Lucene supports fuzzy searches based on the Levenshtein Distance, or
Edit Distance algorithm. To do a fuzzy search use the tilde, ~, symbol
at the end of a Single word Term.
I haven't used fuzzy searches, but it
Hello,
If you use Lucene incorrectly (e.g. 2 IndexWriters writing to the same
index), you will see this error. Lucene has no way of telling whether
the lock file was left over from a previous process, or whether it's a
valid lock file because another process is currently indexing documents
or
I have gone through textmining.org, and I am able to extract the text as a
string. But how can I get it into
Lucene Document format?
- Original Message -
From: Otis Gospodnetic [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Tuesday, August 24, 2004 11:54 PM
Subject: Re:
Hi Jon,
Where do I go to get the attached files?
Many Thanks
Simon
- Original Message -
From: Jon Schuster [EMAIL PROTECTED]
To: 'Lucene Users List' [EMAIL PROTECTED]
Sent: Monday, August 23, 2004 6:25 PM
Subject: RE: Lucene Search Applet
Hi all,
The changes I made to get past
That part you have to do yourself. It is easy, just create a new
Document, create an appropriate Field, give it a name and the string
value you got with textmining.org library, then add the Field to your
Document, and then add the Document to the index with IndexWriter.
Look at one of the
Hi there,
I browsed through the list and tried several different searches, but I could not
find what I'm looking for.
I have an index which is generated by a bot that collects websites. There
are sites like www.domain.de/article/1 and www.domain.de/article/1?page=1
These different URLs have the same
Hi, has anybody heard of a Hebrew Analyzer?
Alex Kiselevsky
Speech Technology    Tel: 972-9-776-43-46
R&D, Amdocs - Israel    Mobile: 972-53-63 50 38
mailto:[EMAIL PROTECTED]
That is correct... fuzzy searches are only on a per-term basis.
If what you meant, though, was a phrase query ("full" near "name"), you
have to add an explicit slop factor, like "full name"~5
Erik
On Aug 25, 2004, at 2:19 AM, Stephane James Vaucher wrote:
From:
Santosh,
Please read the Lucene API documentation.
Once you can extract a string from a Word document using the textmining APIs, try
writing it to a temporary file and indexing that.
If you are able to index PDF and ordinary files, what trouble would you face
indexing a string extracted from Word documents? Please also read
My suggestion was referring to a timestamp that could be obtained via
java.io.File, not something provided by Lucene.
Otis
--- Claes Holmerson [EMAIL PROTECTED] wrote:
Yes, looking at the time of the lock was an idea I had, but I could not
find anything like a time stamp. Am I missing
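As a rough illustration of the java.io.File timestamp idea mentioned above, here
is a minimal sketch that only reports whether a lock file looks old. The class
name, the method names, and the ten-minute threshold are assumptions made for
this example and are not part of Lucene. An old timestamp is only a hint: it
cannot prove that no other process still holds the lock.

```java
import java.io.File;

class StaleLockCheck {
    // Hypothetical threshold: a lock older than ten minutes is treated as suspicious.
    static final long MAX_LOCK_AGE_MILLIS = 10 * 60 * 1000L;

    // Pure age comparison. A lastModified of 0 means "file not found", so never stale.
    static boolean isOlderThan(long lastModifiedMillis, long nowMillis, long maxAgeMillis) {
        if (lastModifiedMillis == 0L) {
            return false;
        }
        return nowMillis - lastModifiedMillis > maxAgeMillis;
    }

    // File.lastModified() returns 0L when the file does not exist or cannot be read.
    static boolean looksStale(File lockFile, long nowMillis, long maxAgeMillis) {
        return isOlderThan(lockFile.lastModified(), nowMillis, maxAgeMillis);
    }

    public static void main(String[] args) {
        // Pass the lock file path as the first argument; the default name is a placeholder.
        File lock = new File(args.length > 0 ? args[0] : "write.lock");
        boolean stale = looksStale(lock, System.currentTimeMillis(), MAX_LOCK_AGE_MILLIS);
        System.out.println(lock.getPath() + (stale ? " looks stale" : " does not look stale"));
    }
}
```

Whether a lock that looks stale may actually be removed is a separate decision
that depends on knowing that no other indexing process is running.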
Hi,
Can anyone tell me why there is no Lucene 1.4 jar in the Maven
repository at http://www.ibiblio.org/maven/lucene/jars/ ? Who makes them
available? It would be very convenient to be able to get the latest
version from there (or anywhere else)
regards,
Michael Franken
I've used Lucene for a long time, but only in the most basic way. I
have a custom analyzer and a slightly hacked query parser, but in
general it's the basic add document/remove document/query documents
cycle.
In my system, I'm indexing a store of external documents, maintaining
an index for
Avi Drissman wrote:
I've used Lucene for a long time, but only in the most basic way. I
have a custom analyzer and a slightly hacked query parser, but in
general it's the basic add document/remove document/query documents
cycle.
In my system, I'm indexing a store of external documents,
Hi Jon,
I modified the three files exactly the way you said, using a separate
declaration and a static initializer block, but for IndexWriter I had to change
4 of the variables because they were final. Then I updated the Lucene JAR
file with the three files in the appropriate directory. But I'm still
Hi, Otis,
Thank you very much. I'll try it.
Best,
Ying
- Original Message -
From: Otis Gospodnetic [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Tuesday, August 24, 2004 5:55 PM
Subject: Re: How to implement KWIC (KeyWord In Context) display
Hello Ying,
Take a
What if all Documents in your index contained some flag field plus an 'add
date' field? Then you could make a query such as flag:1 and sort it
by the 'add date' field, taking only the very first hit as the most
recently added Document.
Otis
--- Avi Drissman [EMAIL PROTECTED] wrote:
I've used Lucene
Avi,
I would prefer the second approach. If you already store the date and time
when the doc was indexed, you could use the following trick to get the
last document added to the index:
IndexReader ir = IndexReader.open("/tmp/testindex");
int maxDoc = ir.maxDoc();
A collection of links to introductory level Lucene articles (including
one in simplified Chinese and one in Turkish) is available on the
Lucene Wiki at:
URL:http://wiki.apache.org/jakarta-lucene/IntroductionToLucene
Steve
Otis Gospodnetic wrote:
That part you have to do yourself. It is easy,
On Aug 25, 2004, at 11:39 AM, Bernhard Messer wrote:
If you already store the date and time when the doc was indexed, you could
use the following trick to get the last document added to the index:
while (--maxDoc > 0) {
Yes, but that's a linear search :(
On Aug 25, 2004, at 11:25 AM, Otis
The more documents match, the slower the search; how long your
particular search would take I cannot tell, though - you should just
test it out and see.
I never needed to use the trick with a flag field in all documents, but
I know others do it.
Otis
--- Avi Drissman [EMAIL PROTECTED] wrote:
[EMAIL PROTECTED] 8/25/2004 11:50:01 AM
On Aug 25, 2004, at 11:39 AM, Bernhard Messer wrote:
If you already store the date and time when the doc was indexed, you could
use the following trick to get the last document added to the index:
while (--maxDoc > 0) {
Yes, but that's a
On Aug 25, 2004, at 11:57 AM, Grant Ingersoll wrote:
You are right, in the worst case, this would be linear,
No, in _all_ cases this would be linear.
I would bet, that on average,
arguably nearly all cases, you would go through very few iterations
before finding the doc you are interested in
Then
Avi,
I may be confused; as I understand it, you said you were interested in
the last document indexed, and Bernhard's code does that. Lucene adds
documents sequentially, so counting backwards from the maxDoc() should
get you the last indexed document pretty quickly. If all documents were
deleted,
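The backwards scan discussed in this thread can be sketched without Lucene as
follows. The boolean array is a stand-in, assumed for this example, for
whatever per-document "is deleted" check the index exposes, and the class and
method names are hypothetical. The sketch shows both sides of the argument: the
loop is linear in the worst case (every document deleted), but it stops at the
first live document it meets when counting down from the top.

```java
class LastLiveDocScan {
    // Returns the highest document number that is not marked deleted,
    // or -1 if every document is deleted or the index is empty.
    static int lastLiveDoc(boolean[] deleted) {
        for (int docId = deleted.length - 1; docId >= 0; docId--) {
            if (!deleted[docId]) {
                return docId; // first live document found from the end
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Stand-in data: five documents, the two most recent ones deleted.
        boolean[] deleted = {false, false, false, true, true};
        System.out.println("last live doc: " + lastLiveDoc(deleted));
    }
}
```

Note that this loop also checks document number 0, and it returns a sentinel
when nothing is left, which covers the "all documents were deleted" case.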
On Aug 25, 2004, at 12:25 PM, Grant Ingersoll wrote:
I may be confused, as I understand it you said you were interested in
the last document indexed,
Yes, I see what you meant. I'm sorry.
That's actually an interesting option. Is getting the timestamp of the
last document indexed a good enough
On Wednesday 25 August 2004 12:21, B. Grimm [Eastbeam GmbH] wrote:
Hi there,
I browsed through the list and tried several different searches, but I could not
find what I'm looking for.
I have an index which is generated by a bot that collects websites. There
are sites like www.domain.de/article/1 and
Hello all,
Is there a way to reduce the indexing time when the indexer is
indexing about 30,000+ files? It takes roughly 6-7 hours to
do this. I am using the IndexHTML class to create the index out of HTML files.
Another issue that I see is every once in a while I get the
I don't think that the demo parser is meant as a production
system component. You can look at Tidy or NekoHtml. They clean up your HTML
and are probably optimised.
sv
On Wed, 25 Aug 2004, Hetan Shah wrote:
Hello all,
Is there a way to reduce the indexing time taken when the indexer is
Do you have any pointers for sample code for them?
Would highly appreciate it.
Thanks.
-H
Stephane James Vaucher wrote:
I don't think that the demo parser is meant as a production
system component. You can look at Tidy or NekoHtml. They clean up your HTML
and are probably optimised.
sv
On Wed,
JGuru explanation:
http://www.jguru.com/faq/view.jsp?EID=1074228
I have no sample code for neko, I think nutch uses it though. For tidy,
you can look at ant in the sandbox:
Hi,
I suspect this is an easy one, but I didn't see a reference in the FAQs,
so I thought I'd ask. I have a file structure like this:
web
- pages
- downloads (pdf docs)
- include
I want to index the HTML in pages and the PDFs in downloads, but not
the HTML in include, so I don't want to
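One way to approach the layout described above is to decide per file, before
anything is handed to the indexer, whether it should be indexed. The sketch
below uses only java.io.File. The directory names come from the question; the
class name, the method names, and the file extensions checked are assumptions
for illustration.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class IndexableFileCollector {
    // Decides from a path relative to the web root whether a file should be indexed.
    // Anything under a directory named "include" is skipped.
    static boolean shouldIndex(String relativePath) {
        String[] parts = relativePath.replace('\\', '/').split("/");
        for (int i = 0; i < parts.length - 1; i++) {
            if (parts[i].equals("include")) {
                return false;
            }
        }
        String name = parts[parts.length - 1].toLowerCase(Locale.ROOT);
        return name.endsWith(".html") || name.endsWith(".htm") || name.endsWith(".pdf");
    }

    // Walks the tree below dir and collects the relative paths that pass shouldIndex.
    static void collect(File dir, String prefix, List<String> out) {
        File[] children = dir.listFiles();
        if (children == null) {
            return; // not a directory, or not readable
        }
        for (File child : children) {
            String rel = prefix.isEmpty() ? child.getName() : prefix + "/" + child.getName();
            if (child.isDirectory()) {
                if (!child.getName().equals("include")) {
                    collect(child, rel, out); // do not descend into include
                }
            } else if (shouldIndex(rel)) {
                out.add(rel);
            }
        }
    }

    public static void main(String[] args) {
        // Pass the web root as the first argument; "web" is the name used in the question.
        List<String> files = new ArrayList<>();
        collect(new File(args.length > 0 ? args[0] : "web"), "", files);
        for (String path : files) {
            System.out.println(path);
        }
    }
}
```

The collected paths would then be passed to whatever code builds the index
entries for HTML and PDF files.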
Hi Hetan,
That's the major problem with non-standardized tags in the HTML documents
you are indexing; they add lag to the indexing process.
If you can, tweak the HTMLParser.jj file within lucene.zip under '/demo/html'
[you need some knowledge of JavaCC for this].
Hetan,
If you are using a corpus with multiple editors, I suggest that you
use a cleaner like tidy as there might be weird stuff appearing in the
html.
sv
On Thu, 26 Aug 2004, Karthik N S wrote:
Hi Hetan,
That's the major problem with non-standardized tags in the HTML documents
you are