On Wednesday 18 August 2004 22:44, Rob Jose wrote:
Hello
I have indexed several thousand (52 to be exact) text files and I keep
running out of disk space to store the indexes. The size of the documents
I have indexed is around 2.5 GB. The size of the Lucene indexes is around
287 GB. Does
Hi,
Please check for hidden files in the index folder. If
you are using Linux, do something like
ls -al <index folder>
I am also facing a similar problem where the index
size is greater than the data size. In my case there
were some hidden temporary files which Lucene
creates.
That was taking
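For anyone who wants the same check from Java rather than from a shell, here is a minimal sketch. It uses only the standard library (modern Java, not the JDK of 2004), and the class name IndexDirReport and the default path "index" are made up for illustration. File.listFiles() returns hidden files as well, so they are counted too.

```java
import java.io.File;
import java.util.Map;
import java.util.TreeMap;

public class IndexDirReport {

    /** Sums file sizes in bytes per extension; hidden files are included. */
    public static Map<String, Long> sizeByExtension(File dir) {
        Map<String, Long> totals = new TreeMap<>();
        File[] files = dir.listFiles();
        if (files == null) {
            return totals; // not a directory, or it could not be read
        }
        for (File f : files) {
            if (!f.isFile()) {
                continue; // skip subdirectories
            }
            String name = f.getName();
            int dot = name.lastIndexOf('.');
            // Names such as "segments" or ".hidden" are grouped under "(none)".
            String ext = dot > 0 ? name.substring(dot + 1) : "(none)";
            totals.merge(ext, f.length(), Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // "index" is a hypothetical default; pass the real index folder as an argument.
        File dir = new File(args.length > 0 ? args[0] : "index");
        for (Map.Entry<String, Long> e : sizeByExtension(dir).entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue() + " bytes");
        }
    }
}
```

Grouping by extension makes it easy to see whether one file type accounts for most of the space.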
Guys
Are you optimizing the index before closing it?
If not, try it... :}
karthik
-Original Message-
From: Honey George [mailto:[EMAIL PROTECTED]
Sent: Thursday, August 19, 2004 1:00 PM
To: Lucene Users List
Subject: Re: Index Size
Hi,
Please check
Rob,
as Doug and Paul already mentioned, the index size is definitely too big :-(.
What could cause the problem, especially when running on a Windows
platform, is that an IndexReader is open during the whole indexing process.
During indexing, the writer creates temporary segment files which will
be
This is what I did.
There are 2 classes in the lucene source which are not
public and therefore cannot be accessed from outside
the package. The classes are
1. org.apache.lucene.index.SegmentInfos
- collection of segments
2. org.apache.lucene.index.SegmentInfo
- represents a single segment
I
Hi
George
Do you think the same would work for merged indexes?
Please can you suggest a solution?
Karthik
-Original Message-
From: Honey George [mailto:[EMAIL PROTECTED]
Sent: Thursday, August 19, 2004 2:08 PM
To: Lucene Users List
Subject: RE: Restoring a corrupt index
Hi,
I am using the Lucene search engine for my application.
I am able to search through text files and HTML files as specified by Lucene.
Can you please clarify my doubts?
1. Can Lucene search through PDFs and Word documents? If yes, then how?
2. Can Lucene search through a database? If yes, then how?
For PDF you need to extract the text from the PDF files using the PDFBox library, and
for Word documents you can use the Apache POI APIs. There are messages
posted on the Lucene list related to your queries. About databases, I guess
someone must have done it. :)
- Original Message -
From: Santosh
The PDF and WORD stuff has been done too: have a look at
http://www.zilverline.org.
Michael Franken
Chandan Tamrakar wrote:
For PDF you need to extract the text from the PDF files using the PDFBox library, and
for Word documents you can use the Apache POI APIs. There are messages
posted on the Lucene list
I have only recently joined the list and have not gone through any previous mails. If
you have any mails or related code, please forward them to me.
- Original Message -
From: Chandan Tamrakar [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Thursday, August 19, 2004 3:47 PM
Subject: Re:
Hi,
Note that Lucene only provides an API to build a
search engine; you can use it however you want. You
can pass data to the indexer in 2 forms.
1. java.lang.String
2. java.io.Reader
What Lucene receives is one of the two objects above.
Now in the case of non-text documents you need to
extract
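One way to picture the two forms above: whatever the source format, the extraction step ends in a String or a Reader. The sketch below uses only the standard library. The TextExtractor interface and the class names are hypothetical and are not part of Lucene, PDFBox or POI; a PDF or Word implementation would call into those libraries, which is not shown here. UTF-8 is an assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class ExtractorSketch {

    /** Hypothetical interface: one implementation per document format. */
    public interface TextExtractor {
        Reader extract(InputStream in) throws IOException;
    }

    /** Plain text needs no conversion, only character decoding (UTF-8 assumed). */
    public static class PlainTextExtractor implements TextExtractor {
        @Override
        public Reader extract(InputStream in) {
            return new InputStreamReader(in, StandardCharsets.UTF_8);
        }
    }

    /** Drains a Reader into a String, the other form mentioned above. */
    public static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        int n;
        while ((n = reader.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(
                "hello lucene".getBytes(StandardCharsets.UTF_8));
        Reader reader = new PlainTextExtractor().extract(in);
        System.out.println(readAll(reader));
    }
}
```

The indexing code then only ever deals with a Reader or a String and does not need to know which format the text came from.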
If I understand correctly, you have a situation where
you have a large main index and then you create small
indexes and finally merge to the main index. It can
happen that half way through merging, the system
crashed and the index got corrupted. I do not think in
this case you can use my solution.
For PDF you can refer to www.pdfbox.org, and please check the Apache POI project
on the jakarta.apache.org site for indexing MS documents.
- Original Message -
From: Santosh [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Thursday, August 19, 2004 4:09 PM
Subject: Re: searchhelp
JGURU FAQ
http://www.jguru.com/faq/Lucene
OFFICIAL FAQ
http://lucene.sourceforge.net/cgi-bin/faq/faqmanager.cgi
MAIL ARCHIVE
http://www.mail-archive.com/[EMAIL PROTECTED]/
hope this helps.
-Original Message-
From: Santosh [mailto:[EMAIL PROTECTED]
Sent: 19 August 2004 11:25
To: Lucene
Thanks, everybody,
but I didn't get any code or any real help from these links.
Has anybody done this kind of search before? If yes, then please send me the
code, or tell me what code I have to add to my present Lucene
- Original Message -
From: David Townsend [EMAIL PROTECTED]
To: Lucene
As far as I remember, the PDFBox release includes some existing code to
index PDFs with Lucene, based upon the demo created for Lucene 1.3. In
fact, I think the code only works for Lucene 1.3 - something to do with
a change from arrays to vectors in Lucene 1.4. I may be wrong though.
Terence,
Calling close() on IndexSearcher will not release the memory
immediately. It will only release resources (e.g. other Java objects
used by IndexSearcher), and it is up to the JVM's garbage collector to
actually reclaim/release the previously used memory. There are
command-line
Use the life-cycle hooks mentioned in another email
(activate/passivate) and when you detect that the server is about to
unload your class, call close() on IndexSearcher. I haven't used
Lucene in an EJB environment, so I don't know the details,
unfortunately. :(
Your simulation may be too fast
Terence,
2) I have a background process to update the index files. If I keep
the IndexSearcher opened, I am not sure whether it will pick up the
changes from the index updates done in the background process.
This is a frequently asked question. Basically, you have to make use
of
Paul
Thank you for your response. I have appended to the bottom of this message
the field structure that I am using. I hope that this helps. I am using
the StandardAnalyzer. I do not believe that I am changing any default
values, but I have also appended the code that adds the temp index to
Hey George
Thanks for responding. I am using Windows and I don't see any hidden files.
I have a ton of CFS files (1366/1405). I have 22 F# (F1, F2, etc.) files.
I have two FDT files and two FDX files. And three FNM files. Add these
files to the deletable and segments file and that is all of the
Karthik
Thanks for responding. Yes, I optimize right before I close the index
writer. I added this a little while ago to try and get the size down.
Rob
- Original Message -
From: Karthik N S [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Thursday, August 19, 2004 12:59
Bernhard
Thanks for responding. I do have an IndexReader open on the Temp index. I
pass this IndexReader into the addIndexes method on the IndexWriter to add
these files. I did notice that I have a ton of CFS files that I removed and
was still able to read the indexes. Are these the temporary
I did a little more research into my production indexes, and so far the
first index is the only one that has any other files besides the CFS files.
The other indexes that I have seen have just the deletable and segments
files and a whole bunch of cfs files. Very interesting. Also worth noting
is
Luceners,
I have elements (accounts, contacts, tasks, events) in which I have to find
a word (hello, for example) in any field. Which is the best way to do
that with Lucene?
In other words,
I have several elements in which I have to search for a word. I can make one
search and then order the hits to
FYI,
I want to configure indexing according to user-set
values (date, time) with a job scheduler.
How can I use a job scheduler for indexing?
If anyone has good experience with the Quartz Scheduler, please share it with me.
Thanks,
Natarajan.
I thought this was the case. I believe there was a bug in one of the
recent Lucene releases that caused old CFS files not to be removed when
they should be removed. This resulted in your index directory
containing a bunch of old CFS files consuming your disk space.
Try getting a recent nightly
Otis
I am using Lucene 1.3 final. Would it help if I move to Lucene 1.4 final?
Rob
- Original Message -
From: Otis Gospodnetic [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Thursday, August 19, 2004 7:13 AM
Subject: Re: Index Size
I thought this was the case. I
Just go for 1.4.1 and look at the CHANGES.txt file to see if there were
any index format changes. If there were, you'll need to re-index.
Otis
--- Rob Jose [EMAIL PROTECTED] wrote:
Otis
I am using Lucene 1.3 final. Would it help if I move to Lucene 1.4
final?
Rob
- Original Message
Otis
I upgraded to 1.4.1. I deleted all of my old indexes and started from
scratch. I indexed 2 MB worth of text files and my index size is 8 MB.
Would it be better if I stopped using the
IndexWriter.addIndexes(IndexReader) method and instead traverse the
IndexReader on the temp index and use
Have you tried looking at the contents of this small index with Luke, to see what
actually got put into it? Maybe one of your stored fields is being fed something you
didn't expect.
Dan
Dan
Thanks for your response. Yes, I have used Luke to look at the index and
everything looks good.
Rob
- Original Message -
From: Armbrust, Daniel C. [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Thursday, August 19, 2004 9:14 AM
Subject: RE: Index Size
Have you
Stupid question:
Are you sure you have the right number of docs in your index? i.e. you're
not adding the same document twice into or via your tmp index.
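A quick way to rule this out is to log an identifier (for example the file path) for every document handed to the writer and then look for repeats. The sketch below is standard-library Java only. The class name DuplicateCheck and the sample paths are made up, and how the identifiers are collected from the indexing code is left open.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class DuplicateCheck {

    /** Returns every identifier that appears more than once, in first-repeat order. */
    public static Set<String> findDuplicates(List<String> ids) {
        Set<String> seen = new HashSet<>();
        Set<String> dupes = new LinkedHashSet<>();
        for (String id : ids) {
            // Set.add returns false when the element was already present.
            if (!seen.add(id)) {
                dupes.add(id);
            }
        }
        return dupes;
    }

    public static void main(String[] args) {
        // Hypothetical list of paths logged while indexing.
        List<String> indexed = Arrays.asList("a.txt", "b.txt", "a.txt");
        System.out.println("Indexed more than once: " + findDuplicates(indexed));
    }
}
```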
sv
On Thu, 19 Aug 2004, Rob Jose wrote:
Paul
Thank you for your response. I have appended to the bottom of this message
the field
How many fields do you have and what analyzer are you using?
[EMAIL PROTECTED] 8/19/2004 11:54:25 AM
Otis
I upgraded to 1.4.1. I deleted all of my old indexes and started from
scratch. I indexed 2 MB worth of text files and my index size is 8
MB.
Would it be better if I stopped using the
Grant
Thanks for your response. I have fixed this issue. I have indexed 5 MB
worth of text files and I now only use 224 KB. I was getting 80 MB. The
only change I made was to change the way I merge my temp index into my prod
index. My code changed from:
prodWriter.setUseCompoundFile(true);
Hi all,
I am the Debian package maintainer for Lucene, and I'm having build
problems with 1.4.1. We are very close to a major Debian release (code
named 'sarge'), and the window for changes is very small. Can someone
please help me in the next day or two, otherwise Debian stable will ship
Lucene