ZooKeeper, right? Look at how ZooKeeper is used in Solr; ZooKeeper does
exactly what you want, I believe.
Sent from my iPhone
> On Oct 19, 2023, at 3:49 AM, Gopal Sharma wrote:
>
> Hello Team,
>
> I am new to Lucene and want to use Lucene in a distributed system to write
> in an Amazon EFS i
Hi Gopal,
Indeed, for a single Lucene index, only one writer may be open at a time.
Lucene tries to catch you if you mess this up, using file-based locking.
If you really need concurrent indexing, you could have N IndexWriters each
writing into a private Directory, and then periodically use addIn
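A minimal sketch of that pattern, assuming the truncated advice refers to IndexWriter.addIndexes (recent Lucene API; the paths and analyzer are placeholders):
import java.nio.file.Paths;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class MergePrivateIndexes {
    public static void main(String[] args) throws Exception {
        // Each concurrent indexer writes into its own private directory ...
        Directory[] privateDirs = {
            FSDirectory.open(Paths.get("index-node1")),
            FSDirectory.open(Paths.get("index-node2")),
        };
        // ... and one writer periodically folds them into the main index.
        // The private directories must have no open writer at this point.
        try (Directory main = FSDirectory.open(Paths.get("index-main"));
             IndexWriter writer = new IndexWriter(main,
                     new IndexWriterConfig(new StandardAnalyzer()))) {
            writer.addIndexes(privateDirs); // copies segments over; no re-analysis
            writer.commit();
        }
        for (Directory dir : privateDirs) {
            dir.close();
        }
    }
}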
Is the merge frequency the mergeFactor? If so, I'm using the default, which is 10,
> read here https://jackrabbit.apache.org/archive/wiki/JCR/Search_115513504.html
>
> Max segment I don't know, where could I see it?
>
> Bye
>
> -Original Message-
> From: Sh
These are typical symptoms of an index merge.
However, it is hard to predict more without knowing more details. What is
your segment size limit? Have you changed the default merge frequency
or max segments configuration? Would you have an estimate of ratio of
number of segments reaching max limit / to
Another +1. We are also big S3 + Lucene users, and it is very interesting
to see what other people came up with. We have an S3 Lucene directory that
allows immediate read-only use of Lucene indexes stored on S3, with
simultaneous local caching, and a prototype of segment-based index
replication based on the
+1 to sharing code for doing 1) and 3), both of which are tricky!
Safely moving / copying bytes around is a notoriously difficult problem ...
but Lucene's "end to end checksums" and per-segment-file-GUID make this
safer.
I think Lucene's replicator module is a good place for this?
Mike McCandless
PM
> To: java-user
> Subject: Re: lucene index file gets corrupted while creating index with 2
> nodes.
>
> There is no chance anyone will try to change the code for 3.6, so
> raising a JIRA is pointless.
>
> see: http://lucene.472066.n3.nabble.com/Issues-with-locked-indices-
>
There is no chance anyone will try to change the code for 3.6, so
raising a JIRA is pointless.
see:
http://lucene.472066.n3.nabble.com/Issues-with-locked-indices-td4339180.html
Uwe is very knowledgeable in this area, so I'd strongly recommend you
follow his advice.
Best,
Erick
On Tue, Jul 31,
Thank you very much, I'll try it!
On Thu, Sep 29, 2016 at 11:22 PM, Michael McCandless <
luc...@mikemccandless.com> wrote:
> No ... I don't think Luke can recreate the segments file.
>
> I dug around and found the thread I was thinking of:
> http://markmail.org/thread/ayl5q6rgtngeuoyy
>
> Just be
No ... I don't think Luke can recreate the segments file.
I dug around and found the thread I was thinking of:
http://markmail.org/thread/ayl5q6rgtngeuoyy
Just be careful! Make a backup copy of your index first!
Mike McCandless
http://blog.mikemccandless.com
On Thu, Sep 29, 2016 at 11:31 AM,
do you mean `http://www.getopt.org/luke/`?
On Mon, Sep 26, 2016 at 4:58 AM, Michael McCandless <
luc...@mikemccandless.com> wrote:
> It is in theory possible to reconstruct a segments file by ls-ing all
> other index files and manually rebuilding it but it is not an easy
> task and it would have
It is in theory possible to reconstruct a segments file by ls-ing all
other index files and manually rebuilding it but it is not an easy
task and it would have to make some guesses.
I think in the past a user did manage to create such a tool and maybe
posted the results here either on this list or
Sorry to resend.
I'll change IO to local. Is there any way to recover the first index? Right now
it cannot be opened by CheckIndex. We are building an index of 7 billion
webpages, and it takes a lot of time to rebuild.
On Sun, Sep 25, 2016 at 5:31 PM, Ziming Dong
wrote:
> I'll change IO to local. Is there any way to
I'll change IO to local. Is there any way to recover the first index? Right now
it cannot be opened by CheckIndex. We are building an index of 7 billion
webpages, and it takes a lot of time to rebuild.
On Sat, Sep 24, 2016 at 2:54 AM, Michael McCandless <
luc...@mikemccandless.com> wrote:
> The 'sync' option for an NFS c
The 'sync' option for an NFS client just means that every write is
sent immediately across the network. And it really is just a useless
performance loss as long as your app (like Lucene) does the "right
thing" with fsync.
The more important question is why fsync sent to your NFS client and
then to the M
I use the Mac mini on the NFS server side. It seems the 'sync' mount option is
useless; it just slows down the indexing program.
On Fri, Sep 23, 2016 at 4:43 AM, Michael McCandless <
luc...@mikemccandless.com> wrote:
> OK sorry I meant your first index, and it seems to have only one
> (broken) segments file. Ca
OK sorry I meant your first index, and it seems to have only one
(broken) segments file. Can you post the "ls -l" output of that first
index? It looks like the file was (illegally) filled with 0s, or at
least the first 4 bytes were.
Lucene writes this file, fsyncs it, does an atomic rename, and
The second index was recovered by CheckIndex; I don't know what was in the
second index directory before the recovery.
CheckIndex can't read the first index. The index filenames are attached.
I used Lucene 6.0.0 at the beginning, then upgraded to Lucene 6.1.0 to
continue indexing.
On Thu, Sep 22, 2016 at 10:17 PM, Michael M
Do you have 2 separate segments files in that 2nd index?
Which exact Lucene version is this?
Mike McCandless
http://blog.mikemccandless.com
On Thu, Sep 22, 2016 at 7:44 AM, Ziming Dong wrote:
> I used CheckIndex to recover the second index, though I lost many docs,
> but the first index can't
I used CheckIndex to recover the second index, though I lost many docs in the
index, but the first index can't be read by CheckIndex. The error is
java -cp lucene-core-6.1.0.jar -ea:org.apache.lucene...
> org.apache.lucene.index.CheckIndex /Volumes/HPT8_56T/infomall-index/index0
> Opening index @ /Volumes/HPT8_56T
Hmm I'm no longer so sure this is an IW bug: on commit we fsync the
pending_segments_N and then do an atomic rename to segments_N.
Can you describe your IO system? Is it possible it does not implement
fsync or atomic renames correctly?
Also, your 2nd exception indicates the segments_N file was int
Sorry for the slow reply here. Curious that both of these exceptions
are from IW.init. I think this may be a real bug, caused by this:
https://github.com/apache/lucene-solr/commit/981bfba841144d08df1d1a183d39fcd6f195ad56
I'll see if I can make a standalone test case showing this.
If you open th
Somehow you need to get the sorting server-side ... that's really the only
way to do your use case efficiently.
Why can't you sort each request to your N shards, and then do a merge sort
on the client side, to get the top hits?
Mike McCandless
http://blog.mikemccandless.com
On Thu, Jul 7, 2016
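A sketch of that client-side merge sort, assuming each shard already returns its hits sorted by the key; the Hit class and its fields are invented for illustration:
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

public class ShardMerge {
    // One hit with its sort key; the fields are illustrative only.
    static final class Hit {
        final long time; final int shard; final int doc;
        Hit(long time, int shard, int doc) {
            this.time = time; this.shard = shard; this.doc = doc;
        }
    }

    // Cursor over one shard's already-sorted hit list.
    static final class Cursor {
        Hit current; final Iterator<Hit> rest;
        Cursor(Iterator<Hit> it) { rest = it; current = it.next(); }
    }

    // Merge N sorted shard results, keeping only the global top n,
    // so the client never holds all hits in memory at once.
    static List<Hit> mergeTopN(List<List<Hit>> perShard, int n) {
        PriorityQueue<Cursor> pq =
                new PriorityQueue<>(Comparator.comparingLong((Cursor c) -> c.current.time));
        for (List<Hit> hits : perShard) {
            if (!hits.isEmpty()) pq.add(new Cursor(hits.iterator()));
        }
        List<Hit> top = new ArrayList<>(n);
        while (top.size() < n && !pq.isEmpty()) {
            Cursor c = pq.poll();
            top.add(c.current); // smallest remaining time across all shards
            if (c.rest.hasNext()) { c.current = c.rest.next(); pq.add(c); }
        }
        return top;
    }
}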
Any suggestions, please?
On Mon, Jul 4, 2016 at 3:37 PM, Tarun Kumar wrote:
> Hey Michael,
>
> docIds from multiple indices (from multiple machines) need to be
> aggregated, sorted, and the first few thousand need to be queried. These few
> thousand docs can be distributed among multiple machines. Each ma
Hey Michael,
docIds from multiple indices (from multiple machines) need to be
aggregated, sorted, and the first few thousand need to be queried. These few
thousand docs can be distributed among multiple machines. Each machine will
search the docs which are there in their own indices. So, pulling sorting
Why not ask Lucene to do the sort on your time field, instead of pulling
millions of docids to the client and having it sort. You could even do
index-time sorting by time field if you want, which makes early termination
possible (faster sorted searches).
But if even on having Lucene do the sort y
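A hedged sketch of both ideas on a recent Lucene (field names and values are placeholders); setIndexSort gives the index-time sorting that makes early termination possible:
import java.nio.file.Paths;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.SortedNumericDocValuesField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.SortedNumericSortField;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class TimeSortedIndex {
    public static void main(String[] args) throws Exception {
        Sort byTime = new Sort(new SortedNumericSortField("time", SortField.Type.LONG));
        IndexWriterConfig cfg = new IndexWriterConfig(new StandardAnalyzer());
        cfg.setIndexSort(byTime); // segments are written pre-sorted by time
        try (Directory dir = FSDirectory.open(Paths.get("index"))) {
            try (IndexWriter w = new IndexWriter(dir, cfg)) {
                Document doc = new Document();
                doc.add(new SortedNumericDocValuesField("time", 1467900000L));
                doc.add(new StringField("id", "1", Field.Store.YES));
                w.addDocument(doc);
            }
            try (IndexReader r = DirectoryReader.open(dir)) {
                // Server-side sort: Lucene returns the top hits in time order,
                // so the client never pulls millions of docids.
                TopDocs top = new IndexSearcher(r).search(
                        new MatchAllDocsQuery(), 10, byTime);
                System.out.println(top.totalHits);
            }
        }
    }
}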
Thanks for the reply, Michael! In my application, I need to get millions of
documents per search.
The use case is the following: return documents in increasing order of the time
field. The client (caller) can't hold more than a few thousand docs at a time,
so it gets all docIds and the corresponding time field for each doc
Are you maybe trying to load too many documents for each search request?
The IR.document API is designed to be used to load just a few hits, like a
page worth or ~ 10 documents, per search.
Mike McCandless
http://blog.mikemccandless.com
On Tue, Jun 28, 2016 at 7:05 AM, Tarun Kumar wrote:
> I
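For contrast, a minimal sketch of that intended page-at-a-time usage (the helper and field name are hypothetical):
import java.io.IOException;
import org.apache.lucene.document.Document;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;

public class PageLoader {
    // Load stored fields for roughly one page of hits, not for every match.
    static void printPage(IndexSearcher searcher, Query query) throws IOException {
        TopDocs page = searcher.search(query, 10); // one page, ~10 hits
        for (ScoreDoc sd : page.scoreDocs) {
            Document doc = searcher.doc(sd.doc);   // per-hit stored-field load
            System.out.println(doc.get("id"));
        }
    }
}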
Thank you for bringing closure!
Mike McCandless
http://blog.mikemccandless.com
On Mon, Jan 18, 2016 at 6:02 AM, Ralph Soika wrote:
> Hi,
>
> I have a strange problem with lucene in one of my projects.
> My business application adds business objects which are stored in a database
> into a lucen
Hi,
I found the reason for my problem: I used a static class with static member
variables and ran into race conditions.
So it was my fault in implementing my controller bean. Now I have implemented
my writer bean as a singleton EJB. I think everything is fine again now.
Thanks a lot for the project
Found the issue .. It was caused by Solr during replication ..
On 9 June 2015 at 13:41, Umesh Prasad wrote:
> Diagnostic info from the CheckIndex tool
>
> 7 of 85: name=_psc docCount=10184501
>
> codec=Lucene46
>
> compound=false
>
> numFiles=14
>
> size (MB)=5,174.959
>
> d
Diagnostic info from the CheckIndex tool
7 of 85: name=_psc docCount=10184501
codec=Lucene46
compound=false
numFiles=14
size (MB)=5,174.959
diagnostics = {timestamp=1433473052105, os=Linux,
os.version=2.6.32-5-amd64, mergeFactor=10, source=merge,
lucene.version=4.6.1 156086
Lucene itself is not a graph database, but maybe look at
http://neo4j.com which I think can index node properties into a Lucene
index.
For synonyms maybe look at Lucene's unit tests for SynonymFilter?:
https://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_5_0_0/lucene/analysis/common/src/te
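In the spirit of those tests, a sketch of a synonym-aware Analyzer using SynonymGraphFilter from a recent Lucene; the car/automobile mapping is just an example:
import java.io.IOException;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.synonym.SynonymGraphFilter;
import org.apache.lucene.analysis.synonym.SynonymMap;
import org.apache.lucene.util.CharsRef;

public class SynonymAnalyzerDemo {
    public static Analyzer build() throws IOException {
        SynonymMap.Builder builder = new SynonymMap.Builder(true); // dedup entries
        builder.add(new CharsRef("car"), new CharsRef("automobile"), true); // keep original token
        SynonymMap map = builder.build();
        return new Analyzer() {
            @Override
            protected TokenStreamComponents createComponents(String fieldName) {
                Tokenizer source = new StandardTokenizer();
                TokenStream tokens = new SynonymGraphFilter(source, map, true); // ignore case
                return new TokenStreamComponents(source, tokens);
            }
        };
    }
}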
Please take a look at this [1].
Not sure if it is a graph database.
[1] http://lucene.apache.org/core/4_10_3/demo/overview-summary.html
On Tue, Mar 10, 2015 at 2:56 PM, Noora Alalawi
wrote:
> Hello dears,
>
> Please, I want your help.
> I need a simple example for adding and indexing synonyms in Lucene. PLE
Please do help here.
Thank you,
Varun.
On Tuesday, 15 July 2014 2:14 PM, varun sharma
wrote:
I am building my code using Lucene 4.7.1 and Hadoop 2.4.0. Here is what I am
trying to do.
Create Index
1. Build the index in a RAMDirectory based on data stored on HDFS.
2. Once built
Just don't call optimize...
In theory, you could make a custom Directory impl that would split a
single large file (from Lucene's standpoint) into multiple OS files,
but this ... would be a lot of work. It's simpler to just not
optimize.
Mike McCandless
http://blog.mikemccandless.com
On Wed,
In my case, it creates a CFS (compound file) of 10 GB. Can we split that file
while optimizing or writing the index?
Thanks
On Wed, May 14, 2014 at 7:38 PM, Yogesh patel
wrote:
> Thanks for the reply!
>
> Can you please provide sample code for it? I got the idea but I don't
> know how to impleme
Thanks for the reply!
Can you please provide sample code for it? I got the idea but I don't
know how to implement it.
Thanks
On Tue, May 13, 2014 at 7:02 PM, Michael McCandless <
luc...@mikemccandless.com> wrote:
> You can tell the MergePolicy to limit the maximum size of segments it
>
Yogesh patel [yogeshpateldai...@gmail.com] wrote:
> I am using Lucene 3.0.1. I am writing many documents with Lucene's
> IndexWriter. But IndexWriter adds all documents into a file which becomes more
> than 4 GB in my case. So can I distribute the files or partition them?
Normally Lucene does not produce a si
You can tell the MergePolicy to limit the maximum size of segments it
should merge.
Also, you should try to upgrade: 3.0.1 is REALLY old.
Mike McCandless
http://blog.mikemccandless.com
On Tue, May 13, 2014 at 1:58 AM, Yogesh patel
wrote:
> HI
>
> I am using lucene 3.0.1. I am writing many doc
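A short sketch of that knob on a recent Lucene, where TieredMergePolicy.setMaxMergedSegmentMB caps merged segment size (on 3.x the analogue would be LogByteSizeMergePolicy.setMaxMergeMB); the 1 GB value is illustrative:
import java.nio.file.Paths;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.TieredMergePolicy;
import org.apache.lucene.store.FSDirectory;

public class CappedSegments {
    public static void main(String[] args) throws Exception {
        TieredMergePolicy mp = new TieredMergePolicy();
        mp.setMaxMergedSegmentMB(1024); // don't produce merged segments past ~1 GB
        IndexWriterConfig cfg = new IndexWriterConfig(new StandardAnalyzer());
        cfg.setMergePolicy(mp);
        try (IndexWriter writer = new IndexWriter(
                FSDirectory.open(Paths.get("index")), cfg)) {
            // add documents as usual; merges now respect the size cap
        }
    }
}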
Thanks, Erick and Robert.
The reason to use Lucene is that we like the robustness and community support
for Lucene. We have other components using advanced Lucene features, too.
For one of the components, all we need is an inverted list with a few
customized field types. If Lucene can solve it, we pref
// In Lucene 4.x, IndexOptions here is org.apache.lucene.index.FieldInfo.IndexOptions;
// DOCS_ONLY indexes doc IDs only (no term frequencies or positions).
FieldType myType = new FieldType(TextField.TYPE_NOT_STORED);
myType.setIndexOptions(IndexOptions.DOCS_ONLY);
document.add(new Field("title", "some title", myType));
document.add(new Field("body", "some contents", myType));
...
On Sat, Aug 24, 2013 at 3:27 AM, Airway Wong wrote:
> Hi,
>
> To custo
Have you looked at the whole flexible indexing functionality? Here are
a couple of places to start:
http://www.opensourceconnections.com/2013/06/05/build-your-own-lucene-codec/
http://www.slideshare.net/LucidImagination/flexible-indexing-in-lucene-40
I'm still not quite sure why you want to do this,
Thanks for the suggestion.
We plan to build an inverted list for a production system, so there are high
demands on reliability and performance.
Lucene is a highly sophisticated IR library with a lot of features. Usually
it is much easier to trim down features, and Lucene has already started to
support custo
On 08/24/2013 09:27 AM, Airway Wong wrote:
> To customize the inverted list for different format, it seems we have to
> overload many different classes and functions. We are only interested in
> simple inverted index without position/posting information.
>
> Is it possible to customize an inverted
Okay, thanks. I resolved that by removing some more unwanted JARs.
Thanks
Uzair Kamal
- Original Message -
From: "Uzair Kamal"
To: java-user@lucene.apache.org
Sent: Tuesday, June 11, 2013 12:31:03 PM
Subject: [---SPAM---] Re: [---SPAM---] RE: Lucene Index Upgradation
You
(WikitologyPrediction.java:105)
Can you please help.
Thanks
- Original Message -
From: "Uwe Schindler"
To: java-user@lucene.apache.org
Sent: Tuesday, June 11, 2013 11:28:01 AM
Subject: [---SPAM---] RE: Lucene Index Upgradation
Hi,
This is not a Lucene problem or a problem with the
Hi,
This is not a Lucene problem or a problem with the index. It looks like one of
those:
a) You have some outdated Lucene JAR files somewhere in your classpath (e.g.,
a JAR file from Lucene 2.4 mixed with those of Lucene 3.4). You
should clean up your compilation and runtime clas
I'm studying the index format in depth and
writing Java utils to log all of it.
So far I have successfully logged .si, .fnm, .fdx, and .fdt,
but .tim and .tip are too complicated...
Uwe
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
> -Original Message-
> From: Jong Kim [mailto:jong.luc...@gmail.com]
> Sent: Tuesday, October 02, 2012 5:20 PM
> To: java-user@lucene.apache.org
> Subject: Re: Luc
OK, so it sounds like I'm hearing that
(a) Accessing index files over NFS from a "single" physical process on
a single computer is safe and can be made to work.
(b) Accessing index files over NFS from "multiple" processes/machines might
be problematic
(c) In all cases, the performance would be l
John,
Are you indicating that later Lucene releases might have a config setting
that can control the write I/O timeout? If so, do you happen to know where
it is or how to set it? I did some quick Googling, but all I got back was the
write-lock timeout, which is set to one second by default.
Thanks
/Jong
We've been in production with Lucene over NFS for about 4 years now. Though
we've had performance issues related to NFS (similar to those mentioned in
this thread), we've seen only a few reliability issues. Index-writing I/O
timeout exceptions are the primary one. We've addressed these by
impleme
OK, that saves you from concurrency issues, but in my experience it is just
much slower than the local file system, so NFS can still be used, but with
some tradeoff on performance.
My 2 cents,
Tommaso
2012/10/2 Jong Kim
> The setup is I have a home-grown server process that has exclusive access
> to the
> > -Original Message-
> > From: Paul Libbrecht [mailto:p...@hoplahup.net]
> > Sent: Tuesday, October 02, 2012 2:45 PM
> > To: java-user@lucene.apache.org
> > Subject: Re: Lucene index on NFS
> >
> > I doubt NFS is an unreliable file-system.
> > Lucene uses normal
The setup is I have a home-grown server process that has exclusive access
to the index files. All reads and writes are done through this server. No
other process is reading the same index files whether it's local or over
NFS.
/Jong
On Tue, Oct 2, 2012 at 8:56 AM, Ian Lea wrote:
> I agree that rel
My Lucene index is accessed by multiple threads in a single process.
/Jong
On Tue, Oct 2, 2012 at 8:45 AM, Paul Libbrecht wrote:
> I doubt NFS is an unreliable file-system.
> Lucene uses normal random access to files and this has no reason to be
> unreliable unless bad things such as network dr
> Sent: Tuesday, October 02, 2012 2:45 PM
> To: java-user@lucene.apache.org
> Subject: Re: Lucene index on NFS
>
> I doubt NFS is an unreliable file-system.
> Lucene uses normal random access to files and this has no reason to be
> unreliable unless bad things such as network drops
I agree that reliability/corruption is not an issue.
I would also say that performance is likely to suffer, but that's
not certain. A fast disk mounted over NFS can be quicker than a slow
local disk. And how much do you care about performance? Maybe it
would be fast enough over NFS to make t
I doubt NFS is an unreliable file-system.
Lucene uses normal random access to files and this has no reason to be
unreliable unless bad things such as network drops happen (in which case you'd
get direct failures or timeouts rather than corruption). I've seen fairly
large infrastructures being b
Thank you all for the replies.
So it sounds like it is a known fact that performance suffers
rather significantly when the index files are accessed over NFS. But what
about reliability and robustness (which seem even more important)? Isn't
there any increased possibility of intermittent errors
My experience back in the Lucene 1.x days was a slowdown factor of at least
four when writing to NFS and about two when reading from there. I'd discourage
this as much as possible!
(rsync is much more your friend for transporting, and replication à la Solr
should also be considered)
paul
On 2 Oct 2012, at 11
You'll certainly need to factor in the performance of NFS versus local disks.
My experience is that smallish low activity indexes work just fine on
NFS, but large high activity indexes are not so good, particularly if
you have a lot of modifications to the index.
You may want to install a custom
How tolerant is your project of decreased search and indexing performance?
You could probably write a simple test that compares search and write
speeds of local and NFS-mounted indexes and make the decision based on the
results.
On Mon, Oct 1, 2012 at 3:06 PM, Jong Kim wrote:
> Hi,
>
> According
See the javadocs for each part of the Lucene40Codec: each class
details its format.
Mike McCandless
http://blog.mikemccandless.com
On Mon, Oct 1, 2012 at 1:16 AM, Selvakumar wrote:
> Hi Pranab Kumar,
>
> I'm not looking to read the documents through IndexReader.
> I just want to know how do
Hi Pranab Kumar,
I'm not looking to read the documents through IndexReader.
I just want to know how Lucene persists its data in the index.
I just want to learn about the metadata and the meta-objects of the Lucene
index.
On 10/1/2012 10:44 AM, parnab kumar wrote:
Hi,
U
Hi,
Use IndexReader instead. You can loop through the index and
read one document at a time.
Thanks,
Parnab
On Mon, Oct 1, 2012 at 10:33 AM, Selvakumar wrote:
> Hi,
>
> I'm new to Lucene and I am reading the docs on Lucene.
>
>
> I read through the Lucene Index File Format, so to e
Small correction here: you are able to "write" to a 3.x index using
4.0. It's just that the newly created segments will be written using
the Lucene4x codec.
Mike McCandless
http://blog.mikemccandless.com
On Mon, Aug 27, 2012 at 2:35 PM, Stephen Howe wrote:
> Paul,
>
> So long as you are not w
Paul,
So long as you are not writing to the 3.x index, it appears Lucene 4.0 can
read the indexes in a read-only format. See Mike McCandless's blog post
http://blog.mikemccandless.com/2012/07/lucene-400-alpha-at-long-last.html about
the alpha release codecs. As to whether or not that codec allows yo
Technically, you should be able to use both 3.0 and 4.0 indexes in the same
app, but a recent inquiry here indicated some unresolved problem. Here's the
official statement from CHANGES.txt for 4.0-BETA:
" - On upgrading to 4.0, if you do not fully reindex your documents,
Lucene will emulate
ServletContext.getRealPath("/WEB-INF/data/myIndexName");
Lea [mailto:ian@gmail.com]
> Sent: Monday, November 28, 2011 6:11 PM
> To: java-user@lucene.apache.org
> Subject: Re: Lucene index inside of a web app?
>
> Using a static string is fine - it just wasn't clear from your original
post what it
> was.
>
> I usually use a full p
Using a static string is fine - it just wasn't clear from your
original post what it was.
I usually use a full path read from a properties file so that I can
change it without a recompile, have different settings on
test/live/whatever systems, etc. Works for me, but isn't the only way
to do it.
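A minimal sketch of that properties-file approach (the file name and property key are placeholders, using a current Lucene API):
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class IndexLocation {
    public static Directory openIndexDir() throws Exception {
        Properties props = new Properties();
        // The full path comes from config, so test/live systems can differ
        // without a recompile.
        try (InputStream in = Files.newInputStream(Paths.get("app.properties"))) {
            props.load(in);
        }
        Path indexPath = Paths.get(
                props.getProperty("lucene.index.dir", "/var/data/index"));
        return FSDirectory.open(indexPath);
    }
}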
Hi,
Thanks for your response. Yes, LUCENE_INDEX_DIRECTORY is a static string
which contains the file system path of the index (for example, c:\\index).
Is this good practice? If not, what should the full path to an index
look like?
Thanks
On Mon, Nov 28, 2011 at 4:54 AM, Ian Lea wrote:
> W
What is LUCENE_INDEX_DIRECTORY? Some static string in your app?
Lucene knows nothing about your app, JSP, or what app server you are
using. It requires a file system path and it is up to you to provide
that. I always use a full path since I prefer to store indexes
outside the app and it avoids
Hello Uwe,
Thank you for the reply! With your suggestion I looked deeper into my code to
find that the services handing me the data had been updated to set a limit. So
it wasn't a Lucene issue at all.
Sent from my iPhone
On Mar 24, 2011, at 6:21 PM, "Uwe Schindler" wrote:
> Are you sure that
Are you sure that you did not forget to commit your changes? Maybe that's the
reason you see only 32768 documents. There is no such low limit: the number of
documents is limited by Integer.MAX_VALUE, and the number of terms is much higher...
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://ww
You should also make sure that it is Lucene that is taking the time.
You don't say where your data is coming from, but it is often slower to
read the source data than to index it with Lucene.
See also http://wiki.apache.org/lucene-java/ImproveIndexingSpeed
--
Ian.
On Wed, Dec 29, 201
Lucene intermittently takes longer because:
1. It flushes the buffered docs from memory to disk, and
2. It merges smaller index segments to form a larger segment at regular
intervals, as per the IndexWriter settings.
You may have a look at the various IndexWriter params in the javadoc on the
lucene pa
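A hedged sketch of the kind of IndexWriter params meant here (the values are illustrative only):
import java.nio.file.Paths;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.ConcurrentMergeScheduler;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;

public class TunedWriter {
    public static IndexWriter open() throws Exception {
        IndexWriterConfig cfg = new IndexWriterConfig(new StandardAnalyzer());
        cfg.setRAMBufferSizeMB(256); // larger buffer means fewer flushes to disk
        ConcurrentMergeScheduler cms = new ConcurrentMergeScheduler();
        cms.setMaxMergesAndThreads(4, 2); // merges run on background threads
        cfg.setMergeScheduler(cms);
        return new IndexWriter(FSDirectory.open(Paths.get("index")), cfg);
    }
}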
You can do this in trunk right now using the Codec capability. In fact, there
is a text version already, but it is likely to be really slow on anything
significant. You could likely produce something that is faster but still
readable.
On Nov 9, 2010, at 5:46 AM, Paul Libbrecht wrote:
> hello
One major reason is to update a field, or rather to shadow a field.
I have a field named "testField" in index1 and I want to update that field.
When I update, I want only the new value to be reflected, not the value in
the old field.
Now ParallelReader starts from the latest index, i.e. index2, and searches
But why do you feel the need to have a parallel reader that combines result
sets across two indices based on docId?
On Thu, Oct 28, 2010 at 12:17 AM, Nilesh Vijaywargiay <
nilesh.vi...@gmail.com> wrote:
> Pulkit,
> Parallel reader takes the union of all fields for a given id. Thus if I
> want
> t
Pulkit,
Parallel reader takes the union of all fields for a given id. Thus if I want
to add a field or modify a field of a document which has id 2 in index1, I
> need to create a document with id 2 in index2 with the fields I want to
add/modify. Thus parallel reader would treat them as fields of a s
Looks interesting. What is the merit in having a second index in order to
keep the document id the same? Perhaps I have misunderstood. Just want to
understand your motivation here.
On Wed, Oct 20, 2010 at 2:57 PM, Nilesh Vijaywargiay wrote:
> I've written a blog regarding a workaround for updati
http://lucene.apache.org/java/3_0_2/fileformats.html#Inverted%20Indexing
--
Ian.
On Tue, Jul 27, 2010 at 3:22 AM, shravan
wrote:
>
> Hi,
>
> Can anyone clarify the difference between a Lucene index and a database index?
>
> I am just trying to understand how Lucene stores its index, like databases store
If you follow the rules Otis listed, you should never hit index
corruption, unless something is wrong with your hardware.
Or, if you hit an as-yet-undiscovered bug in Lucene ;)
Mike
On Wed, Jan 13, 2010 at 1:11 AM, zhang99 wrote:
>
> what is the longest time you have ever kept an index file without req
What is the longest time you have ever kept an index file without being
required to reindex? I notice even a big open source project like Liferay
suffers from this.
Thanks for the tips.
Hi,
Use the latest version of Lucene, obey Lucene's locks, write with 1
IndexWriter, avoid NFS...
Otis
--
Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch
- Original Message
> From: zhang99
> To: java-user@lucene.apache.org
> Sent: Tue, January 12, 2010 10:41:19 PM
> Subje
The size of your index isn't a very useful number without knowing a
significant amount about the structure of your index. Depending upon what's
stored, what's indexed and what kind of searching you're doing (e.g.
sorting?) it varies. About all we can say is that you'll probably need less
than 100G.
Hi
24 GB RAM for a 100 GB index is likely to be plenty. You don't have a
huge amount of control over what lucene loads in memory, but take a
look at termInfosIndexDivisor in IndexReader. And I believe that
omitting field norms (Field.setOmitNorms) may help too. Googling for
"lucene memory usage"
This is what we have in Lucene in Action 2:
~/lia2$ ff \*Thread\*java
./src/lia/admin/CreateThreadedIndexTask.java
./src/lia/admin/ThreadedIndexWriter.java
Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
-
On Tue, Nov 10, 2009 at 11:43 AM, Jamie Band wrote:
> As a side note, is there any way for Lucene to support simultaneous writes
> to an index?
The indexing process is highly parallelized... just use multiple
threads to add documents to the same IndexWriter.
-Yonik
http://www.lucidimagination.
You might try re-implementing, using ThreadPoolExecutor
http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/ThreadPoolExecutor.html
glen
2009/11/10 Jamie Band :
> Hi There
>
> Our app spends a lot of time waiting for Lucene to finish writing to the
> index. I'd like to minimize this. If y
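A minimal sketch combining the two suggestions above: a thread pool feeding one shared IndexWriter, which is thread-safe (the pool size and batching are illustrative):
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexWriter;

public class ParallelIndexer {
    // IndexWriter is thread-safe: every worker shares the same writer.
    public static void indexAll(IndexWriter writer, List<List<Document>> batches)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (List<Document> batch : batches) {
            pool.submit(() -> {
                for (Document doc : batch) {
                    writer.addDocument(doc);
                }
                return null;
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
        writer.commit(); // one commit after all adds complete
    }
}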
Hi Anshum,
Thanks so much for the quick response! I think that pretty much covers it.
I was worried that having to delete the document and re-add it simply
because a date field has been updated would make my indexing quite slow.
It seems, however, that's not something I'll have to worry about. Than
Hi Marquin,
So you have a field that you want to sort on; well, that's a pretty
straightforward task in Lucene.
Sort sort = new Sort();
sort.setSort("yourField", true/false); // field name is a placeholder; the boolean reverses the sort
http://lucene.apache.org/java/2_3_1/api/org/apache/lucene/search/Sort.html
* It would not really be taxing and frequent updates could b
On Mon, May 11, 2009 at 2:06 PM, Babak Farhang wrote:
> I am not familiar with the details of CFS, but I didn't interpret
> Michael's comment to mean that there is actually any rewriting going
> on here. The problem here appears to be one of translating the
> encrypted/compressed file position to
On Mon, May 11, 2009 at 12:19 AM, Andrzej Bialecki wrote:
>
> Unfortunately, current Lucene IndexWriter implementation uses seek /
> overwrite when writing term info dictionary. This is described in more
> detail here:
>
> https://issues.apache.org/jira/browse/LUCENE-532
>
Thanks for the enlight