Christoph Kiehl wrote:
I'm curious about your strategy to backup indexes based on FSDirectory.
If I do a file based copy I suspect I will get corrupted data because of
concurrent write access.
My current favorite is to create an empty index and use
IndexWriter.addIndexes() to copy the current
Hi,
I tested the implementation. It seems to work with basic PowerPoint
slides. The problem I have is that it doesn't extract special characters
like German umlauts. Has anybody already addressed this problem?
thanks
Bernhard
Magnus Johansson wrote:
There's some code using POI at
We've recently implemented something similar with the backup process
creating a file (much like the lock files during indexing) that the
IndexWriter recognizes (a small tweak) and doesn't attempt to start an indexing
run or a delete while it's there; it wasn't that much work, actually.
Nader
Doug Cutting
I don't think so, you have to forget or close the old one and create a
new instance.
Otis
--- Ravi [EMAIL PROTECTED] wrote:
Is there a way to refresh the IndexSearcher object with the newly
added
documents to the index instead of creating a new object?
Thanks in advance,
Ravi.
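Doug's answer above amounts to a close-and-reopen pattern. A minimal sketch, assuming the Lucene 1.4-era IndexSearcher API (the helper method and path handling are illustrative, not from the thread):

```java
import java.io.IOException;
import org.apache.lucene.search.IndexSearcher;

public class SearcherRefresh {
    // Replaces a stale searcher with a fresh one so newly added documents
    // become visible; a searcher only sees the point-in-time snapshot of
    // the index that existed when it was opened.
    static IndexSearcher refresh(IndexSearcher old, String indexDir)
            throws IOException {
        if (old != null) {
            old.close(); // release the old snapshot
        }
        return new IndexSearcher(indexDir); // sees documents added since
    }
}
```

In practice callers cache the searcher and refresh only when the index has actually changed, to avoid the cost of reopening on every query.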
Good question. I'm not looking at the API now, but I don't recall any
methods that would let you know where Lucene decided to store its
locks. You could peek at the source and follow its logic, though.
Otis
--- [EMAIL PROTECTED] wrote:
Hey guys,
Quick question... is there a way to get the
Hi,
We have indexed a set of web files (JSP, JS, XSLT, Java properties and
HTML) using the Lucene WhitespaceAnalyzer.
The purpose is to allow developers to find where code / functions are used
and defined across a large and disparate
content management repository. Hopefully to aid code
Yan Pujante wrote:
I want to run a very fast search that simply returns the matching
document id. Is there any way to associate the document id returned in
the hit collector to the internal document ID stored in the index ?
Does anybody have an idea how to do that? Ideally you would want to be able
Hi.
I just started to play around with Lucene. I was
wondering if searching and indexing can be done
simultaneously from different processes (two different
processes.) For example, searching is serviced from a
web application, while indexing is done periodically
from a stand-alone application.
It would be nice if the IndexSearcher contained a method that could return
the last modified date of the index folder it was created with.
This would make it easier to know when you need to create a new Searcher.
- Original Message -
From: Otis Gospodnetic [EMAIL PROTECTED]
To: Lucene
Hi, I need to know how you work with PDF files; please describe the process.
Thanks...
--
Miguel Angel Angeles R.
Asesoria en Conectividad y Servidores
Telf. 97451277
-
www.pdfbox.org
Once you get the package installed the code you can use is:
Document doc = LucenePDFDocument.getDocument(file);
writer.addDocument(doc);
This method returns the PDF in Lucene document format.
Luke
- Original Message -
From: Miguel Angel [EMAIL PROTECTED]
To:
K Kim writes:
I just started to play around with Lucene. I was
wondering if searching and indexing can be done
simultaneously from different processes (two different
processes.) For example, searching is serviced from a
web appliation, while indexing is done periodically
from a
This will help:
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/IndexReader.html#getCurrentVersion(org.apache.lucene.store.Directory)
Otis
--- Luke Shannon [EMAIL PROTECTED] wrote:
It would be nice if the IndexSearcher contained a method that could
return
the last modified
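The getCurrentVersion check Otis points at can be wrapped up as a sketch (class and field names and the synchronization policy are assumptions, not from the thread; Lucene 1.4-era API):

```java
import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.Directory;

public class VersionedSearcher {
    private final Directory dir;   // directory the index lives in
    private long version = -1;     // index version the searcher was opened at
    private IndexSearcher searcher;

    public VersionedSearcher(Directory dir) {
        this.dir = dir;
    }

    // Returns a searcher that reflects the latest committed index state,
    // reopening only when the index version number has changed.
    public synchronized IndexSearcher get() throws IOException {
        long current = IndexReader.getCurrentVersion(dir);
        if (searcher == null || current != version) {
            if (searcher != null) {
                searcher.close(); // release the stale snapshot
            }
            searcher = new IndexSearcher(dir);
            version = current;
        }
        return searcher;
    }
}
```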
I have created a tool that could answer your question.
It is called Lucene Server (http://luceneserver.sourceforge.net/).
It is a tool for integrating Lucene in distributed environments (via RMI).
A new release is under development. It will include a paginated search
service using
When you use Google and you type 'amig' in the box, then press ENTER,
sometimes Google shows 'Perhaps you meant amigus'. How can I implement this
feature?
--
Miguel Angel Angeles R.
Asesoria en Conectividad y Servidores
Telf. 97451277
-
Yes it will. Thanks.
- Original Message -
From: Otis Gospodnetic [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Tuesday, November 16, 2004 10:28 AM
Subject: Re: IndexSearcher Refresh
This will help:
You want to use n-grams on your Lucene index, then pick the highest-ranking
score.
For a demo: http://www.searchmorph.com/kat/spell.jsp
or
http://www.searchmorph.com/pub/ngramspeller/NGramSpeller.java for source
code.
TB
http://www.shopbloomfield.com
On Tue, 16 Nov 2004, Miguel Angel wrote:
Try using a TermQuery instead of QueryParser to see if you get the
results you expect. Exact case matters.
Also, when troubleshooting issues with QueryParser, it is helpful to
see what the actual Query returned is - try displaying its toString
output.
Erik
On Nov 16, 2004, at 6:25
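Erik's two suggestions, sketched (the field name and term value are made-up examples):

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class ExactTermCheck {
    public static void main(String[] args) {
        // A TermQuery matches the exact indexed term, bypassing the
        // analysis (and lowercasing) that QueryParser applies, so it is
        // a direct way to verify what is actually stored in the index.
        Query query = new TermQuery(new Term("contents", "Lucene"));
        // Displaying toString() shows the query Lucene will actually run.
        System.out.println(query.toString());
    }
}
```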
Hi,
I am a new user of Lucene. so please point me to
documentation/archives if these issues have been covered before.
I plan to use Lucene in a application with the following (fairly
standard) requirements:
- Index documents that contain a title, author, date and content
- It is fairly common to
All,
Lucene 1.4 final.
I have an index that has to be updated frequently. A search may
happen at any time. I implemented this by indexing into a
RAMDirectory and then merging with an FSDirecotory at regular
intervals (or sometimes when a search is requested). This seems to
work quite well.
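The RAMDirectory-then-merge pattern described above can be sketched like this (Lucene 1.4 APIs; the index path, analyzer choice, and batch handling are illustrative assumptions):

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.RAMDirectory;

public class BufferedIndexer {
    public static void flushBatch(Document[] batch, String indexPath)
            throws Exception {
        // 1. Index the batch into a fast in-memory directory.
        RAMDirectory ram = new RAMDirectory();
        IndexWriter ramWriter =
            new IndexWriter(ram, new StandardAnalyzer(), true);
        for (int i = 0; i < batch.length; i++) {
            ramWriter.addDocument(batch[i]);
        }
        ramWriter.close();

        // 2. Merge the in-memory index into the on-disk index in one
        //    step, keeping the FSDirectory writer open only briefly.
        Directory fsDir = FSDirectory.getDirectory(indexPath, false);
        IndexWriter fsWriter =
            new IndexWriter(fsDir, new StandardAnalyzer(), false);
        fsWriter.addIndexes(new Directory[] { ram });
        fsWriter.close();
    }
}
```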
I am interested in pursuing experienced peoples' understanding as I have half
the queue approach developed already.
I am not following why you don't like the queue approach, Sergiu. From what I
gathered from this board, if you do lots of updates, the opening of the
IndexWriter is very
I do most of these same things and made these relevant design decisions:
1. Use a combination of query expansion to search across multiple
fields and field concatenation to create document fields that combine
separate object fields. I use multiple fields only when it is important
to weight them
I received the error below when I was attempting to overwhelm my system with
incremental update requests.
What is this file it is looking for? I checked the index. It contains:
_4c.del
_4d.cfs
deletable
segments
Where does _4c.fnm come from?
Here is the error:
Unable to create the create
Alternatively, you can use PDFTextStream (http://snowtide.com).
It also has an easy-to-use Lucene API, with code that looks like this:
Document doc = PDFDocumentFactory.buildPDFDocument(pdfFile, config);
indexWriter.addDocument(doc);
One of the nice advantages of this is that the resulting Lucene
Field names are stored in the field info file, with suffix .fnm. - see
http://jakarta.apache.org/lucene/docs/fileformats.html
The .fnm should be inside the .cfs file (cfs files are compound files
that contain all index files described at the above URL). Maybe you
can provide the code that
[EMAIL PROTECTED] wrote:
I am interested in pursuing experienced peoples' understanding as I have half the queue approach developed already.
Well, I think that experienced people developed Lucene :) They offered us
the possibility to use multithreading and concurrent searching.
Of course ..
It consistently breaks when I run more than 10 concurrent incremental
updates.
I can post the code on Bugzilla (hopefully when I get to the site it will be
obvious how I can post things).
Luke
- Original Message -
From: Otis Gospodnetic [EMAIL PROTECTED]
To: Lucene Users List [EMAIL
Hey Folks, I just inherited a deployed Lucene based application that
started throwing the following exception:
org.apache.lucene.search.BooleanQuery$TooManyClauses
at org.apache.lucene.search.BooleanQuery.add(BooleanQuery.java:79)
at
What kind of incremental updates are you doing? We update our index
every 15 minutes with 100 to 200 documents, writing to a 6 GB memory
resident index, and the IndexWriter runs one instance at a time. So what
kind of increments are we talking about? It takes a bit of doing to
The schedule is determined by the users of the system. Basically when the
user(s) change the content (adding/deleting a folder or file, modify a
file's content) through a web based interface a re-index is required of the
content. This could happen 20 times in the span of a few seconds or once in
This is the latest error I have received:
IndexReader out of date and no longer valid for delete, undelete, or setNorm
operations
I need to synchronize this process more carefully. I think this goes back to
the point that during my incremental update I sometimes need to forcefully
clear the lock on
On Tue, 2004-11-16 at 14:57, Luke Shannon wrote:
This is the latest error I have received:
IndexReader out of date and no longer valid for delete, undelete, or setNorm
operations
What you need to do is check the version number of the index to
determine if you need to open a new IndexReader
'Concurrent' and 'updates' in the same sentence sounds like a possible
source of the problem. You have to use a single IndexWriter and it
should not overlap with an IndexReader that is doing deletes.
Otis
--- Luke Shannon [EMAIL PROTECTED] wrote:
It consistently breaks when I run more than 10
That's it: you need to batch your updates. It comes down to whether you need
to give your users search accuracy to the second. Take your database, put an
is_dirty column on the master table of the object you're indexing, run a
scheduled task every x minutes, and have your process read the objects
It doesn't have to be to the second. If things take a few minutes, that's OK.
It looks like the first lock issue I'm hitting in my program is when I try
to delete from the index for the first time. No writer has been created
yet, only the reader, so I am not sure why it thinks it's locked.
-
On Tuesday 16 November 2004 21:35, Joe Krause wrote:
Hey Folks, I just inherited a deployed Lucene based application that
started throwing the following exception:
org.apache.lucene.search.BooleanQuery$TooManyClauses
...
I did some research regarding this error and found out that the default
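The message is cut off, but the usual remedy for TooManyClauses (an assumption here, since the truncated post doesn't say which fix was found) is to raise BooleanQuery's static clause limit, or better, avoid the expansion with a filter as suggested below:

```java
import org.apache.lucene.search.BooleanQuery;

public class ClauseLimit {
    public static void main(String[] args) {
        // TooManyClauses is thrown when a query (often a RangeQuery or a
        // wildcard term expanded by QueryParser) rewrites into more boolean
        // clauses than the static limit allows. Raising the limit trades
        // memory for headroom.
        BooleanQuery.setMaxClauseCount(4096); // the 1.4 default is 1024
        System.out.println(BooleanQuery.getMaxClauseCount());
    }
}
```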
On Tue, 2004-11-16 at 16:32, Paul Elschot wrote:
Once you approach 1000 days, you'll get the same problem again,
so you might want to use a filter for the dates.
See DateFilter and the archives on MMDD.
Can anyone point to a good example of how to use the DateFilter?
Thanks,
Luke
This is what I have been doing with DateFilter
DateFilter dateFilter = new DateFilter(published, lLimitDate,
System.currentTimeMillis());
TopFieldDocs docs = searcher.search(parser.parse(sSearchPhrase), dateFilter,
utility.iMaxResults, new Sort(sortFields));
Ed
--- Luke Francl [EMAIL
Hello,
I have been using DateFilter to limit my search results to a certain date
range. I am now asked to replace this filter with one where my search results
have document IDs greater than a given document ID. This document ID is
assigned during indexing and is a Keyword field.
I've browsed
Thank you, Luke. I decided to branch (use multiple try/catch clauses) so
that I know whether the IndexReader is open or not. Your remark on locking
was helpful for my understanding of Lucene anyway.
- Original Message -
From: "Luke Shannon" [EMAIL PROTECTED]
To: "Lucene
Hi All,
What's the best way to implement displaying the Next and Prev search results
in Lucene?
Thanks,
Ramon
Hello;
I think I have solved my locking issues. I just made it through the set of
test cases that previously resulted in index locking errors. I removed
the method from my code that checks for an index lock and forcefully removes
it after 1 minute. Hopefully it never needs to be put back in.
Very cool Luke. I am not quite there yet. I am half way through implementing
the queue approach, but I have hit walls that are making me sit back and figure
out my strategy. I have a struts/tomcat/ojb/mysql project that can
potentially have a million records and growing over time and
Once the index is merged there is only 1 index - there are no
subindices.
Otis
--- Karthik N S [EMAIL PROTECTED] wrote:
Hi Guys,
Apologies .
Can somebody tell me which API to use to count the number of
subindexes
in a merged index?
Thx in Advance
Well, if the document ID is a number (even if it isn't stored as one) you
could use a range query, or just rebuild your index using that specific field
as a sorted field; but if it is numeric, be aware that using an integer
limits how high your numbers can get.
nader
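One way to make a keyword document ID range-searchable (the padding helper, field name, and ID width are illustrative assumptions) is to zero-pad it at index time so lexicographic order matches numeric order, then query with a Lucene 1.4 RangeQuery:

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.RangeQuery;

public class PaddedIdQuery {
    // Zero-pads a non-negative ID (assumed to fit in 10 digits) so that
    // string comparison of padded forms agrees with numeric comparison,
    // e.g. 99 becomes "0000000099".
    static String pad(long id) {
        String s = String.valueOf(id);
        return "0000000000".substring(s.length()) + s;
    }

    public static void main(String[] args) {
        // Match every document whose "docId" keyword is greater than the
        // given ID; a null upper term means open-ended, false = exclusive.
        Query q = new RangeQuery(new Term("docId", pad(12345)), null, false);
        System.out.println(q.toString());
    }
}
```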
Edwin Tang wrote:
Hello,
I have been
Hi guys,
Apologies.
So a merged index is again a single index [an addition of subindexes...].
In that case, if one of the field types is of type 'Field.Keyword'
which is unique across the subindexes [before merging],
and if I want to count this unique field in a merged index [after
MySQL does offer a basic fulltext search (with MyISAM tables), but it
doesn't really approach the functionality of Lucene, such as pluggable
tokenizers, stemming, etc. I think MS SQL Server has fulltext search
as well, but I have no idea if it's any good.
See