Re: Lucene Index Writer in a distributed system

2023-10-19 Thread Cody Amen
Zookeeper, right? Look how Zookeeper is used in Solr, but Zookeeper does exactly what you want, I believe. Sent from my iPhone > On Oct 19, 2023, at 3:49 AM, Gopal Sharma wrote: > > Hello Team, > > I am new to Lucene and want to use Lucene in a distributed system to write > in a Amazon EFS i

Re: Lucene Index Writer in a distributed system

2023-10-19 Thread Michael McCandless
Hi Gopal, Indeed, for a single Lucene index, only one writer may be open at a time. Lucene tries to catch you if you mess this up, using file-based locking. If you really need concurrent indexing, you could have N IndexWriters each writing into a private Directory, and then periodically use addIn

Re: Lucene index directory grows and shrinks

2019-11-04 Thread Erick Erickson
Is merge frequency the mergeFactor? If yes, I'm using the default, which is 10; > read here https://jackrabbit.apache.org/archive/wiki/JCR/Search_115513504.html > > Max segments I don't know; where could I see it? > > Bye > > -Original Message- > From: Sh

Re: Lucene index directory grows and shrinks

2019-11-04 Thread Atri Sharma
These are typical symptoms of an index merge. However, it is hard to say more without more data. What is your segment size limit? Have you changed the default merge frequency or max segments configuration? Would you have an estimate of ratio of number of segments reaching max limit / to

Re: Lucene Index Cloud Replication

2019-07-11 Thread Anton Zenkov
Another +1. We are also big s3 + lucene users and it is very interesting what other people came up with. We have an S3 lucene directory that allows immediate read-only use of lucene indexes stored on s3 with simultaneous local caching and a prototype of segment based index replication based on the

Re: Lucene Index Cloud Replication

2019-07-09 Thread Michael McCandless
+1 to share code for doing 1) and 3) both of which are tricky! Safely moving / copying bytes around is a notoriously difficult problem ... but Lucene's "end to end checksums" and per-segment-file-GUID make this safer. I think Lucene's replicator module is a good place for this? Mike McCandless

RE: lucene index file gets corrupted while creating index with 2 nodes.

2018-07-31 Thread Uwe Schindler
PM > To: java-user > Subject: Re: lucene index file gets corrupted while creating index with 2 > nodes. > > There is no chance anyone will try to change the code for 3.6, so > raising a JIRA is pointless. > > see: http://lucene.472066.n3.nabble.com/Issues-with-locked-indices- >

Re: lucene index file gets corrupted while creating index with 2 nodes.

2018-07-31 Thread Erick Erickson
There is no chance anyone will try to change the code for 3.6, so raising a JIRA is pointless. see: http://lucene.472066.n3.nabble.com/Issues-with-locked-indices-td4339180.html Uwe is very knowledgeable in this area, so I'd strongly recommend you follow his advice. Best, Erick On Tue, Jul 31,

Re: lucene index program won't start after power failure

2016-09-29 Thread Ziming Dong
Thank you very much, I'll try it! On Thu, Sep 29, 2016 at 11:22 PM, Michael McCandless < luc...@mikemccandless.com> wrote: > No ... I don't think Luke can recreate the segments file. > > I dug around and found the thread I was thinking of: > http://markmail.org/thread/ayl5q6rgtngeuoyy > > Just be

Re: lucene index program won't start after power failure

2016-09-29 Thread Michael McCandless
No ... I don't think Luke can recreate the segments file. I dug around and found the thread I was thinking of: http://markmail.org/thread/ayl5q6rgtngeuoyy Just be careful! Make a backup copy of your index first! Mike McCandless http://blog.mikemccandless.com On Thu, Sep 29, 2016 at 11:31 AM,

Re: lucene index program won't start after power failure

2016-09-29 Thread Ziming Dong
do you mean `http://www.getopt.org/luke/`? On Mon, Sep 26, 2016 at 4:58 AM, Michael McCandless < luc...@mikemccandless.com> wrote: > It is in theory possible to reconstruct a segments file by ls-ing all > other index files and manually rebuilding it but it is not an easy > task and it would have

Re: lucene index program won't start after power failure

2016-09-25 Thread Michael McCandless
It is in theory possible to reconstruct a segments file by ls-ing all other index files and manually rebuilding it but it is not an easy task and it would have to make some guesses. I think in the past a user did manage to create such a tool and maybe posted the results here either on this list or

Re: lucene index program won't start after power failure

2016-09-25 Thread Ziming Dong
Sorry to resend. I'll change IO to local. Is there any way to recover the first index? It cannot be opened by CheckIndex now. We are building an index of 7 billion web pages, and it takes a long time to rebuild. On Sun, Sep 25, 2016 at 5:31 PM, Ziming Dong wrote: > I'll change IO to local. Is there anyway to

Re: lucene index program won't start after power failure

2016-09-25 Thread Ziming Dong
I'll change IO to local. Is there any way to recover the first index? Now it can be opened by CheckIndex. We are building an index of 7 billion web pages, and it takes a long time to rebuild. On Sat, Sep 24, 2016 at 2:54 AM, Michael McCandless < luc...@mikemccandless.com> wrote: > The 'sync' option for an NFS c

Re: lucene index program won't start after power failure

2016-09-23 Thread Michael McCandless
The 'sync' option for an NFS client just means that every write is sent immediately across the network. And it really is useless performance loss as long as your app (like Lucene) does the "right thing" with fsync. The more important question is why fsync sent to your NFS client and then to the M

Re: lucene index program won't start after power failure

2016-09-22 Thread Ziming Dong
I use a Mac mini on the NFS server side. The sync mount option seems useless; it just slows down the index program. On Fri, Sep 23, 2016 at 4:43 AM, Michael McCandless < luc...@mikemccandless.com> wrote: > OK sorry I meant your first index, and it seems to have only one > (broken) segments file. Ca

Re: lucene index program won't start after power failure

2016-09-22 Thread Michael McCandless
OK sorry I meant your first index, and it seems to have only one (broken) segments file. Can you post the "ls -l" output of that first index? It looks like the file was (illegally) filled with 0s, or at least the first 4 bytes were. Lucene writes this file, fsyncs it, does an atomic rename, and

Re: lucene index program won't start after power failure

2016-09-22 Thread Ziming Dong
The second index was recovered by CheckIndex; I don't know what was in the second index directory before recovery. CheckIndex can't read the first index. Index filenames are attached. I used Lucene 6.0.0 at the beginning, then upgraded to Lucene 6.1.0 to continue indexing. On Thu, Sep 22, 2016 at 10:17 PM, Michael M

Re: lucene index program won't start after power failure

2016-09-22 Thread Michael McCandless
Do you have 2 separate segments files in that 2nd index? Which exact Lucene version is this? Mike McCandless http://blog.mikemccandless.com On Thu, Sep 22, 2016 at 7:44 AM, Ziming Dong wrote: > I used checkIndex to recover second index though I lost many docs in index, > but first index can't

Re: lucene index program won't start after power failure

2016-09-22 Thread Ziming Dong
I used checkIndex to recover second index though I lost many docs in index, but first index can't be read by checkIndex, error is java -cp lucene-core-6.1.0.jar -ea:org.apache.lucene... > org.apache.lucene.index.CheckIndex /Volumes/HPT8_56T/infomall-index/index0 > Opening index @ /Volumes/HPT8_56T

Re: lucene index program won't start after power failure

2016-09-22 Thread Michael McCandless
Hmm I'm no longer so sure this is an IW bug: on commit we fsync the pending_segments_N and then do an atomic rename to segments_N. Can you describe your IO system? Is it possible it does not implement fsync or atomic renames correctly? Also, your 2nd exception indicates the segments_N file was int
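
The commit sequence described in this message (write a pending file, fsync it, then atomically rename it to its final name) can be illustrated with plain JDK file APIs. This is a minimal sketch, not Lucene's actual implementation: the class name `AtomicCommitSketch`, the method `commit`, and the decimal generation suffix are hypothetical and chosen only for illustration.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration of "write, fsync, atomic rename"; not Lucene code.
class AtomicCommitSketch {

    /** Writes payload to pending_segments_N, fsyncs it, then renames it to segments_N. */
    static Path commit(Path dir, long generation, byte[] payload) throws IOException {
        Path pending = dir.resolve("pending_segments_" + generation);
        Path committed = dir.resolve("segments_" + generation);

        try (FileChannel ch = FileChannel.open(pending,
                StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE)) {
            ByteBuffer buf = ByteBuffer.wrap(payload);
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            // Ask the OS to push file content and metadata to stable storage.
            ch.force(true);
        }

        // Readers see either no segments_N at all or the complete file,
        // provided the file system really implements atomic rename.
        Files.move(pending, committed, StandardCopyOption.ATOMIC_MOVE);
        return committed;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("commit-sketch");
        Path result = commit(dir, 1, "commit point".getBytes(StandardCharsets.UTF_8));
        System.out.println("committed " + result.getFileName());
    }
}
```

If the storage layer acknowledges `force` without persisting the data, or does not rename atomically, a power failure can leave a final-named file with missing or zeroed content, which is the scenario this thread is trying to rule out.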

Re: lucene index program won't start after power failure

2016-09-22 Thread Michael McCandless
Sorry for the slow reply here. Curious that both of these exceptions are from IW.init. I think this may be a real bug, caused by this: https://github.com/apache/lucene-solr/commit/981bfba841144d08df1d1a183d39fcd6f195ad56 I'll see if I can make a standalone test case showing this. If you open th

Re: lucene index reader performance

2016-07-07 Thread Michael McCandless
Somehow you need to get the sorting server-side ... that's really the only way to do your use case efficiently. Why can't you sort each request to your N shards, and then do a merge sort on the client side, to get the top hits? Mike McCandless http://blog.mikemccandless.com On Thu, Jul 7, 2016
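
The client-side merge suggested here can be pictured as a k-way merge: each shard returns its hits already sorted by the time field, and the client repeatedly takes the smallest head element until it has the first few thousand. A minimal JDK-only sketch follows; the `Hit` class, `mergeTopN`, and the field names are hypothetical and not part of Lucene.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration of merging per-shard sorted results on the client.
class ShardMergeSketch {

    /** One search hit: the sort key (time) plus a shard-local document id. */
    static final class Hit {
        final long time;
        final int docId;

        Hit(long time, int docId) {
            this.time = time;
            this.docId = docId;
        }
    }

    /** Merges shard result lists, each sorted by ascending time, and returns the first n hits. */
    static List<Hit> mergeTopN(List<List<Hit>> shardHits, int n) {
        // Each heap entry is a cursor {shardIndex, positionInShard}, ordered by the hit it points at.
        PriorityQueue<int[]> heap = new PriorityQueue<>(
                Comparator.comparingLong((int[] c) -> shardHits.get(c[0]).get(c[1]).time));
        for (int s = 0; s < shardHits.size(); s++) {
            if (!shardHits.get(s).isEmpty()) {
                heap.add(new int[] {s, 0});
            }
        }

        List<Hit> merged = new ArrayList<>();
        while (merged.size() < n && !heap.isEmpty()) {
            int[] cursor = heap.poll();
            List<Hit> hits = shardHits.get(cursor[0]);
            merged.add(hits.get(cursor[1]));
            // Advance the cursor within the same shard, if it has more hits.
            if (cursor[1] + 1 < hits.size()) {
                heap.add(new int[] {cursor[0], cursor[1] + 1});
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<List<Hit>> shards = Arrays.asList(
                Arrays.asList(new Hit(1L, 10), new Hit(4L, 11)),
                Arrays.asList(new Hit(2L, 20), new Hit(3L, 21)));
        for (Hit h : mergeTopN(shards, 3)) {
            System.out.println(h.time + " doc=" + h.docId);
        }
    }
}
```

With this shape each shard only needs to return its own top n hits per request, so the client never holds more than n hits per shard at once.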

Re: lucene index reader performance

2016-07-07 Thread Tarun Kumar
Any suggestions, please? On Mon, Jul 4, 2016 at 3:37 PM, Tarun Kumar wrote: > Hey Michael, > > docIds from multiple indices (from multiple machines) need to be > aggregated, sorted and first few thousand need to be queried. These few > thousand docs can be distributed among multiple machines. Each ma

Re: lucene index reader performance

2016-07-04 Thread Tarun Kumar
Hey Michael, docIds from multiple indices (from multiple machines) need to be aggregated and sorted, and the first few thousand need to be queried. These few thousand docs can be distributed among multiple machines. Each machine will search the docs that are in its own indices. So, pulling sorting

Re: lucene index reader performance

2016-07-04 Thread Michael McCandless
Why not ask Lucene to do the sort on your time field, instead of pulling millions of docids to the client and having it sort? You could even do index-time sorting by time field if you want, which makes early termination possible (faster sorted searches). But if even on having Lucene do the sort y

Re: lucene index reader performance

2016-07-03 Thread Tarun Kumar
Thanks for the reply, Michael! In my application, I need to get millions of documents per search. The use case is as follows: return documents in increasing order of the time field. The client (caller) can't hold more than a few thousand docs at a time, so it gets all docIds and the corresponding time field for each doc

Re: lucene index reader performance

2016-06-28 Thread Michael McCandless
Are you maybe trying to load too many documents for each search request? The IR.document API is designed to be used to load just a few hits, like a page worth or ~ 10 documents, per search. Mike McCandless http://blog.mikemccandless.com On Tue, Jun 28, 2016 at 7:05 AM, Tarun Kumar wrote: > I

Re: Lucene Index is not always created for new documents added in short time

2016-01-26 Thread Michael McCandless
Thank you for bringing closure! Mike McCandless http://blog.mikemccandless.com On Mon, Jan 18, 2016 at 6:02 AM, Ralph Soika wrote: > Hi, > > I have a strange problem with lucene in one of my projects. > My business application adds business objects which are stored in a database > into a lucen

Re: Lucene Index is not always created for new documents added in short time

2016-01-26 Thread Ralph Soika
Hi, I found the reason for my problem: I used a static class with static member variables and ran into race conditions. So it was my mistake in implementing my controller bean. I have now implemented my writer bean as a singleton EJB. I think everything is fine again now. Thanks a lot for the project

Re: Lucene Index Corrupted

2015-06-09 Thread Umesh Prasad
Found the issue .. It was caused by Solr during replication .. On 9 June 2015 at 13:41, Umesh Prasad wrote: > Diagnostic info from checkIndex tool > > 7 of 85: name=_psc docCount=10184501 > > codec=Lucene46 > > compound=false > > numFiles=14 > > size (MB)=5,174.959 > > d

Re: Lucene Index Corrupted

2015-06-09 Thread Umesh Prasad
Diagnostic info from checkIndex tool 7 of 85: name=_psc docCount=10184501 codec=Lucene46 compound=false numFiles=14 size (MB)=5,174.959 diagnostics = {timestamp=1433473052105, os=Linux, os.version=2.6.32-5-amd64, mergeFactor=10, source=merge, lucene.version=4.6.1 156086

Re: Lucene index

2015-03-11 Thread Michael McCandless
Lucene itself is not a graph database, but maybe look at http://neo4j.com which I think can index node properties into a Lucene index. For synonyms maybe look at Lucene's unit tests for SynonymFilter?: https://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_5_0_0/lucene/analysis/common/src/te

Re: Lucene index

2015-03-10 Thread Gimantha Bandara
Please take a look at this[1] Not sure if it is a graph database [1] http://lucene.apache.org/core/4_10_3/demo/overview-summary.html On Tue, Mar 10, 2015 at 2:56 PM, Noora Alalawi wrote: > Hello dears > > Please I want your help. > I need a simple example for add and index synonym in lucene PLE

Re: Lucene index corruption on HDFS

2014-08-20 Thread varun sharma
Please do help here. Thank you , Varun. On Tuesday, 15 July 2014 2:14 PM, varun sharma wrote: I am building my code using Lucene 4.7.1 and Hadoop 2.4.0 . Here is what I am trying to do Create Index 1. Build index in RAMDirectory based on data stored on HDFS . 2. Once built

Re: Lucene: Index Writer to write in multiple file instead make one heavy file

2014-05-15 Thread Michael McCandless
Just don't call optimize... In theory, you could make a custom Directory impl that would split a single large file (from Lucene's standpoint) into multiple OS files, but this ... would be a lot of work. It's simpler to just not optimize. Mike McCandless http://blog.mikemccandless.com On Wed,

Re: Lucene: Index Writer to write in multiple file instead make one heavy file

2014-05-15 Thread Yogesh patel
In my case, it creates a 10 GB CFS (compound file). Can we split that file while optimizing or writing the index? Thanks On Wed, May 14, 2014 at 7:38 PM, Yogesh patel wrote: > Thanks for reply!!! > > Can you please provide me sample code for it? I got the idea but i dont > know how to impleme

Re: Lucene: Index Writer to write in multiple file instead make one heavy file

2014-05-14 Thread Yogesh patel
Thanks for reply!!! Can you please provide me sample code for it? I got the idea but i dont know how to implement it. Thanks On Tue, May 13, 2014 at 7:02 PM, Michael McCandless < luc...@mikemccandless.com> wrote: > You can tell the MergePolicy to limit the maximum size of segments it >

RE: Lucene: Index Writer to write in multiple file instead make one heavy file

2014-05-13 Thread Toke Eskildsen
Yogesh patel [yogeshpateldai...@gmail.com] wrote: > I am using lucene 3.0.1. I am writing many documents with lucene > Indexwriter. But Indexwriter add all documents into file which becomes more > than 4GB in my case. so can i distribute files or partitioned ? Normally Lucene does not produce a si

Re: Lucene: Index Writer to write in multiple file instead make one heavy file

2014-05-13 Thread Michael McCandless
You can tell the MergePolicy to limit the maximum size of segments it should merge. Also, you should try to upgrade: 3.0.1 is REALLY old. Mike McCandless http://blog.mikemccandless.com On Tue, May 13, 2014 at 1:58 AM, Yogesh patel wrote: > HI > > I am using lucene 3.0.1. I am writing many doc

Re: Lucene index customization

2013-08-24 Thread Airway Wong
Thanks, Erick and Robert. The reason to use Lucene is that we like its robustness and community support. We have other components using advanced Lucene features, too. For one of the components, all we need is an inverted list with a few customized field types. If Lucene can solve it, we pref

Re: Lucene index customization

2013-08-24 Thread Robert Muir
FieldType myType = new FieldType(TextField.TYPE_NOT_STORED); myType.setIndexOptions(IndexOptions.DOCS_ONLY); document.add(new Field("title", "some title", myType)); document.add(new Field("body", "some contents", myType)); ... On Sat, Aug 24, 2013 at 3:27 AM, Airway Wong wrote: > Hi, > > To custo

Re: Lucene index customization

2013-08-24 Thread Erick Erickson
Have you looked at the whole flexible indexing functionality? Here's a couple of places to start: http://www.opensourceconnections.com/2013/06/05/build-your-own-lucene-codec/ http://www.slideshare.net/LucidImagination/flexible-indexing-in-lucene-40 I'm still not quite sure why you want to do this,

Re: Lucene index customization

2013-08-24 Thread Airway Wong
Thanks for the suggestion. We plan to build inverted list for a production system, so there is high demand for reliability and performance. Lucene is a highly sophisticated IR lib and has a lot of features. Usually it is much easier to trim down features and Lucene already starts to support custo

Re: Lucene index customization

2013-08-24 Thread Ivan Krišto
On 08/24/2013 09:27 AM, Airway Wong wrote: > To customize the inverted list for different format, it seems we have to > overload many different classes and functions. We are only interested in > simple inverted index without position/posting information. > > Is it possible to customize an inverted

Re: [---SPAM---] Re: [---SPAM---] RE: Lucene Index Upgradation

2013-06-11 Thread Uzair Kamal
Okay, thanks. I resolved that by removing some more unwanted JARs. Thanks Uzair Kamal - Original Message - From: "Uzair Kamal" To: java-user@lucene.apache.org Sent: Tuesday, June 11, 2013 12:31:03 PM Subject: [---SPAM---] Re: [---SPAM---] RE: Lucene Index Upgradation You

Re: [---SPAM---] RE: Lucene Index Upgradation

2013-06-11 Thread Uzair Kamal
(WikitologyPrediction.java:105) Can you please help. Thanks - Original Message - From: "Uwe Schindler" To: java-user@lucene.apache.org Sent: Tuesday, June 11, 2013 11:28:01 AM Subject: [---SPAM---] RE: Lucene Index Upgradation Hi, This is not a Lucene problem or a problem with the

RE: Lucene Index Upgradation

2013-06-10 Thread Uwe Schindler
Hi, This is not a Lucene problem or a problem with the index. It looks like one of these: a) You have some outdated Lucene JAR files somewhere in your classpath (e.g. JAR files from Lucene 2.4 mixed with those of Lucene 3.4). You should clean up your compilation and runtime clas

Re: Lucene Index File Format

2012-11-16 Thread wgggfiy
I'm studying the index format in depth and writing Java utilities to log all of it. So far I have successfully logged .si, .fnm, .fdx, and .fdt, but .tim and .tip are too complicated... -- View this message in context: http://lucene.472066.n3.nabble.com/Lucene-Index-File-Format-tp4011133p4020685.html S

RE: Lucene index on NFS

2012-10-02 Thread Uwe Schindler
Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Jong Kim [mailto:jong.luc...@gmail.com] > Sent: Tuesday, October 02, 2012 5:20 PM > To: java-user@lucene.apache.org > Subject: Re: Luc

Re: Lucene index on NFS

2012-10-02 Thread Jong Kim
OK, so it sounds like I'm hearing that (a) Accessing index files over NFS from a "single" physical process on a single computer is safe and can be made to work. (b) Accessing index files over NFS from "multiple" processes/machines might be problematic (c) In all cases, the performance would be l

Re: Lucene index on NFS

2012-10-02 Thread Jong Kim
John, Are you indicating that later Lucene releases might have a config setting that can control the write I/O timeout? If so, do you happen to know where it is or how to set it? I did quick Googling, but all I get back is the write lock timeout which is set to one second by default. Thanks /Jong

Re: Lucene index on NFS

2012-10-02 Thread Nader, John P
We've been in production on Lucene over NFS for about 4 years now. Though we've had performance issues related to NFS (similar to those mentioned on this thread), we've only seen some reliability issues. Index writing I/O timeout exceptions are the primary issue. We've addressed these by impleme

Re: Lucene index on NFS

2012-10-02 Thread Tommaso Teofili
Ok, that saves you from concurrency issue, but in my experience is just much slower than local file system, so still NFS can be used but with some tradeoff on performance. My 2 cents, Tommaso 2012/10/2 Jong Kim > The setup is I have a home-grown server process that has exclusive access > to the

Re: Lucene index on NFS

2012-10-02 Thread Jong Kim
al Message- > > From: Paul Libbrecht [mailto:p...@hoplahup.net] > > Sent: Tuesday, October 02, 2012 2:45 PM > > To: java-user@lucene.apache.org > > Subject: Re: Lucene index on NFS > > > > I doubt NFS is an unreliable file-system. > > Lucene uses normal

Re: Lucene index on NFS

2012-10-02 Thread Jong Kim
The setup is I have a home-grown server process that has exclusive access to the index files. All reads and writes are done through this server. No other process is reading the same index files whether it's local or over NFS. /Jong On Tue, Oct 2, 2012 at 8:56 AM, Ian Lea wrote: > I agree that rel

Re: Lucene index on NFS

2012-10-02 Thread Jong Kim
My Lucene index is accessed by multiple threads in a single process. /Jong On Tue, Oct 2, 2012 at 8:45 AM, Paul Libbrecht wrote: > I doubt NFS is an unreliable file-system. > Lucene uses normal random access to files and this has no reason to be > unreliable unless bad things such as network dr

RE: Lucene index on NFS

2012-10-02 Thread Uwe Schindler
> Sent: Tuesday, October 02, 2012 2:45 PM > To: java-user@lucene.apache.org > Subject: Re: Lucene index on NFS > > I doubt NFS is an unreliable file-system. > Lucene uses normal random access to files and this has no reason to be > unreliable unless bad things such as network drops

Re: Lucene index on NFS

2012-10-02 Thread Ian Lea
I agree that reliability/corruption is not an issue. I would also put it that performance is likely to suffer, but that's not certain. A fast disk mounted over NFS can be quicker than a slow local disk. And how much do you care about performance? Maybe it would be fast enough over NFS to make t

Re: Lucene index on NFS

2012-10-02 Thread Paul Libbrecht
I doubt NFS is an unreliable file-system. Lucene uses normal random access to files and this has no reason to be unreliable unless bad things such as network drops happen (in which case you'd get direct failures or timeouts rather than corruption). I've seen fairly large infrastructures being b

Re: Lucene index on NFS

2012-10-02 Thread Jong Kim
Thank you all for the replies. So it sounds like it is a known fact that the performance would suffer rather significantly when the index files are accessed over NFS. But how about reliability and robustness (which seems even more important)? Isn't there any increased possibility for intermittent errors

Re: Lucene index on NFS

2012-10-02 Thread Paul Libbrecht
My experience in the Lucene 1.x days was a slowdown factor of at least four when writing to NFS and about two when reading from it. I'd discourage this as much as possible! (rsync is much more your friend for transporting, and replication à la Solr should also be considered.) paul On Oct 2, 2012, at 11

Re: Lucene index on NFS

2012-10-02 Thread Ian Lea
You'll certainly need to factor in the performance of NFS versus local disks. My experience is that smallish low activity indexes work just fine on NFS, but large high activity indexes are not so good, particularly if you have a lot of modifications to the index. You may want to install a custom

Re: Lucene index on NFS

2012-10-01 Thread Vitaly Funstein
How tolerant is your project of decreased search and indexing performance? You could probably write a simple test that compares search and write speeds of local and NFS-mounted indexes and make the decision based on the results. On Mon, Oct 1, 2012 at 3:06 PM, Jong Kim wrote: > Hi, > > According
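
As a very rough first step toward the comparison suggested here, raw sequential write speed can be probed on a local directory and on an NFS mount before building a full Lucene benchmark. The sketch below uses only the JDK and does not exercise Lucene at all; the class `WriteSpeedProbe` and its method are hypothetical, and the numbers it prints are only a coarse proxy for indexing speed.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Hypothetical raw-I/O probe; a proxy only, it does not measure Lucene itself.
class WriteSpeedProbe {

    /** Writes totalBytes in chunkBytes pieces into dir, fsyncs, and returns elapsed nanoseconds. */
    static long timeSequentialWrite(Path dir, int totalBytes, int chunkBytes) throws IOException {
        Path file = Files.createTempFile(dir, "probe", ".bin");
        byte[] chunk = new byte[chunkBytes];
        long start = System.nanoTime();
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.WRITE)) {
            int written = 0;
            while (written < totalBytes) {
                int len = Math.min(chunkBytes, totalBytes - written);
                ByteBuffer buf = ByteBuffer.wrap(chunk, 0, len);
                while (buf.hasRemaining()) {
                    ch.write(buf);
                }
                written += len;
            }
            // Include the cost of flushing to stable storage in the measurement.
            ch.force(true);
        }
        long elapsed = System.nanoTime() - start;
        Files.delete(file);
        return elapsed;
    }

    public static void main(String[] args) throws IOException {
        // Pass one directory per file system to compare, e.g. a local path and an NFS mount.
        for (String dir : args) {
            long nanos = timeSequentialWrite(Paths.get(dir), 64 * 1024 * 1024, 1 << 20);
            System.out.printf("%s: %.1f ms%n", dir, nanos / 1e6);
        }
    }
}
```

A real decision should still rest on indexing and searching representative data on both file systems, since Lucene's access pattern includes random reads that this probe ignores.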

Re: Lucene Index File Format

2012-10-01 Thread Michael McCandless
See the javadocs for each part of the Lucene40Codec: each class details its format. Mike McCandless http://blog.mikemccandless.com On Mon, Oct 1, 2012 at 1:16 AM, Selvakumar wrote: > Hi Pranab Kumar, > > I'm not looking for reading the documents through IndexReader. > I just want to know how do

Re: Lucene Index File Format

2012-09-30 Thread Selvakumar
Hi Pranab Kumar, I'm not looking to read the documents through IndexReader. I just want to know how Lucene persists its data in the index. I just want to learn about the metadata and the meta-objects of a Lucene index. On 10/1/2012 10:44 AM, parnab kumar wrote: Hi, U

Re: Lucene Index File Format

2012-09-30 Thread parnab kumar
Hi, Use IndexReader instead . You can loop through the index and read one document at a time . Thanks, Parnab On Mon, Oct 1, 2012 at 10:33 AM, Selvakumar wrote: > Hi, > > I'm new to Lucene and I reading the docs on Lucene. > > > I read through the Lucene Index File Format, so to e

Re: Lucene Index backward compatibility related question

2012-08-28 Thread Michael McCandless
Small correction here: you are able to "write" to a 3.x index using 4.0. It's just that the newly created segments will be written using the Lucene4x codec. Mike McCandless http://blog.mikemccandless.com On Mon, Aug 27, 2012 at 2:35 PM, Stephen Howe wrote: > Paul, > > So long as you are not w

Re: Lucene Index backward compatibility related question

2012-08-27 Thread Stephen Howe
Paul, So long as you are not writing to the 3.x index, it appears Lucene 4.0 can read the indexes in a read-only format. See Mike McCandless's blog post http://blog.mikemccandless.com/2012/07/lucene-400-alpha-at-long-last.html about the alpha release codecs. As to whether or not that codec allows yo

Re: Lucene Index backward compatibility related question

2012-08-27 Thread Jack Krupansky
Technically, you should be able to use both 3.0 and 4.0 indexes in the same app, but a recent inquiry here indicated some unresolved problem. Here's the official statement from CHANGES.txt for 4.0-BETA: " - On upgrading to 4.0, if you do not fully reindex your documents, Lucene will emulate

Re: Lucene index inside of a web app?

2011-12-05 Thread KARTHIK SHIVAKUMAR
Schindler > > > > H.-H.-Meier-Allee 63, D-28213 Bremen > > > > http://www.thetaphi.de > > > > eMail: u...@thetaphi.de > > > > > > > > > > > > > -Original Message- > > > > > From: Ian Lea [mailto:ian@gmail.com

Re: Lucene index inside of a web app?

2011-12-02 Thread okayndc
; ServletContext.getRealPath("/WEB-INF/data/myIndexName"); > > > > > > - > > > Uwe Schindler > > > H.-H.-Meier-Allee 63, D-28213 Bremen > > > http://www.thetaphi.de > > > eMail: u...@thetaphi.de > > > > > > >

Re: Lucene index inside of a web app?

2011-12-01 Thread KARTHIK SHIVAKUMAR
> > > > > > > -Original Message- > > > From: Ian Lea [mailto:ian....@gmail.com] > > > Sent: Monday, November 28, 2011 6:11 PM > > > To: java-user@lucene.apache.org > > > Subject: Re: Lucene index inside of a web app? > > > >

Re: Lucene index inside of a web app?

2011-11-28 Thread okayndc
men > http://www.thetaphi.de > eMail: u...@thetaphi.de > > > > -Original Message- > > From: Ian Lea [mailto:ian@gmail.com] > > Sent: Monday, November 28, 2011 6:11 PM > > To: java-user@lucene.apache.org > > Subject: Re: Lucene index inside of a

RE: Lucene index inside of a web app?

2011-11-28 Thread Uwe Schindler
Lea [mailto:ian@gmail.com] > Sent: Monday, November 28, 2011 6:11 PM > To: java-user@lucene.apache.org > Subject: Re: Lucene index inside of a web app? > > Using a static string is fine - it just wasn't clear from your original post what it > was. > > I usually use a full p

Re: Lucene index inside of a web app?

2011-11-28 Thread Ian Lea
Using a static string is fine - it just wasn't clear from your original post what it was. I usually use a full path read from a properties file so that I can change it without a recompile, have different settings on test/live/whatever systems, etc. Works for me, but isn't the only way to do it.
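
Reading the index location from a properties file, as described here, might look like the following. This is a minimal sketch using only the JDK; the class `IndexPathConfig` and the property key `lucene.index.dir` are hypothetical names picked for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

// Hypothetical helper: keeps the index path out of the compiled code.
class IndexPathConfig {

    // Assumed property key; any name agreed on per deployment would work.
    static final String KEY = "lucene.index.dir";

    /** Loads a properties file from disk. */
    static Properties load(Path file) throws IOException {
        Properties props = new Properties();
        try (InputStream in = Files.newInputStream(file)) {
            props.load(in);
        }
        return props;
    }

    /** Returns the configured index directory as an absolute, normalized path. */
    static Path resolveIndexPath(Properties props) {
        String configured = props.getProperty(KEY);
        if (configured == null || configured.trim().isEmpty()) {
            throw new IllegalStateException("Missing property: " + KEY);
        }
        return Paths.get(configured.trim()).toAbsolutePath().normalize();
    }

    public static void main(String[] args) throws IOException {
        // Usage: java IndexPathConfig /path/to/app.properties
        Properties props = load(Paths.get(args[0]));
        System.out.println("Index directory: " + resolveIndexPath(props));
    }
}
```

The resulting path can then be handed to whatever opens the index, so test and live systems differ only in their properties file.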

Re: Lucene index inside of a web app?

2011-11-28 Thread okayndc
Hi, Thanks for your response. Yes, LUCENE_INDEX_DIRECTORY is a static string which contains the file system path of the index (for example, c:\\index). Is this good practice? If not, what should the full path to an index look like? Thanks On Mon, Nov 28, 2011 at 4:54 AM, Ian Lea wrote: > W

Re: Lucene index inside of a web app?

2011-11-28 Thread Ian Lea
What is LUCENE_INDEX_DIRECTORY? Some static string in your app? Lucene knows nothing about your app, JSP, or what app server you are using. It requires a file system path and it is up to you to provide that. I always use a full path since I prefer to store indexes outside the app and it avoids

Re: Lucene index limit

2011-03-25 Thread pulkitsinghal
Hello Uwe, Thank you for the reply! With your suggestion I looked deeper into my code to find that the services handing me the data had been updated to set a limit. So it wasn't a lucene issue at all. Sent from my iPhone On Mar 24, 2011, at 6:21 PM, "Uwe Schindler" wrote: > Are you sure that

RE: Lucene index limit

2011-03-24 Thread Uwe Schindler
Are you sure that you did not forget to commit your changes? Maybe that's the reason you see only 32768 documents. There is no such low limit; the number of documents is limited by Integer.MAX_VALUE, and the limit on the number of terms is much higher... - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://ww

Re: Lucene index

2010-12-29 Thread Ian Lea
You should also make sure that it is Lucene that is taking the time. You don't say where your data is coming from, but it is often slower to read the source data than to index it with Lucene. See also http://wiki.apache.org/lucene-java/ImproveIndexingSpeed -- Ian. On Wed, Dec 29, 201

Re: Lucene index

2010-12-29 Thread Anshum
Lucene intermittently takes longer because 1. it flushes the buffered docs from memory to disk and 2. it merges the smaller index segments to form a larger segment at regular intervals, as per the index writer settings. You may have a look at various IndexWriter params in the javadoc on the lucene pa

Re: Lucene index exchange format?

2010-11-09 Thread Grant Ingersoll
You can do this in trunk right now using the Codec capability. In fact, there is a text version already, but it is likely to be really slow on anything significant. You could likely produce something that is faster but still readable. On Nov 9, 2010, at 5:46 AM, Paul Libbrecht wrote: > hello

Re: Lucene index update

2010-10-27 Thread Nilesh Vijaywargiay
One major reason is to update a field, or rather shadow a field. I have a field named "testField" in index1 and I want to update that field. When I update, I want only the new value to be reflected, not the value in the old field. Now ParallelReader starts from the latest index, i.e. index2, and searches

Re: Lucene index update

2010-10-27 Thread Pulkit Singhal
But why do you feel the need to have a parallel reader that combines result sets across two indices based on docId? On Thu, Oct 28, 2010 at 12:17 AM, Nilesh Vijaywargiay < nilesh.vi...@gmail.com> wrote: > Pulkit, > Parallel reader takes the union of all fields for a given id. Thus if I > want > t

Re: Lucene index update

2010-10-27 Thread Nilesh Vijaywargiay
Pulkit, Parallel reader takes the union of all fields for a given id. Thus if I want to add a field or modify a field of a document which has id 2 in index1, I need to create a document with id 2 in index2 with the fields I want to add/modify. Thus parallel reader would treat them as fields of a s

Re: Lucene index update

2010-10-27 Thread Pulkit Singhal
Looks interesting. What is the merit in having a second index in order to keep the document id the same? Perhaps I have misunderstood. Just want to understand your motivation here. On Wed, Oct 20, 2010 at 2:57 PM, Nilesh Vijaywargiay wrote: > I've written a blog regarding a work around for updati

Re: Lucene Index Vs Database Index

2010-07-27 Thread Ian Lea
http://lucene.apache.org/java/3_0_2/fileformats.html#Inverted%20Indexing -- Ian. On Tue, Jul 27, 2010 at 3:22 AM, shravan wrote: > > Hi, > > Can any one clarify me difference between lucene index and database index? > > I am just trying to understand how lucene stores index, like databases store

Re: lucene index file randomly crash and need to reindex

2010-01-13 Thread Michael McCandless
If you follow the rules Otis listed, you should never hit index corruption, unless something is wrong with your hardware. Or, if you hit an as-yet-undiscovered bug in Lucene ;) Mike On Wed, Jan 13, 2010 at 1:11 AM, zhang99 wrote: > > what is the longest time you ever keep index file without req

Re: lucene index file randomly crash and need to reindex

2010-01-12 Thread zhang99
What is the longest time you have ever kept an index file without needing to reindex? I notice even big open-source projects like Liferay suffer from this. Thanks for the tips. -- View this message in context: http://old.nabble.com/lucene-index-file-randomly-crash-and-need-to-reindex-tp27139147p27139613.html Se

Re: lucene index file randomly crash and need to reindex

2010-01-12 Thread Otis Gospodnetic
Hi, Use the latest version of Lucene, obey Lucene's locks, write with 1 IndexWriter, avoid NFS... Otis -- Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch - Original Message > From: zhang99 > To: java-user@lucene.apache.org > Sent: Tue, January 12, 2010 10:41:19 PM > Subje

Re: Lucene Index size v/s available memory

2009-12-23 Thread Erick Erickson
The size of your index isn't a very useful number without knowing a significant amount about the structure of your index. Depending upon what's stored, what's indexed and what kind of searching you're doing (e.g. sorting?) it varies. About all we can say is that you'll probably need less than 100G.

Re: Lucene Index size v/s available memory

2009-12-23 Thread Ian Lea
Hi, 24 GB of RAM for a 100 GB index is likely to be plenty. You don't have a huge amount of control over what Lucene loads in memory, but take a look at termInfosIndexDivisor in IndexReader. And I believe that omitting field norms (Field.setOmitNorms) may help too. Googling for "lucene memory usage"

Re: Lucene index write performance optimization

2009-11-10 Thread Otis Gospodnetic
This is what we have in Lucene in Action 2: ~/lia2$ ff \*Thread\*java ./src/lia/admin/CreateThreadedIndexTask.java ./src/lia/admin/ThreadedIndexWriter.java Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR -

Re: Lucene index write performance optimization

2009-11-10 Thread Yonik Seeley
On Tue, Nov 10, 2009 at 11:43 AM, Jamie Band wrote: > As an aside note, is there any way for Lucene to support simultaneous writes > to an index? The indexing process is highly parallelized... just use multiple threads to add documents to the same IndexWriter. -Yonik http://www.lucidimagination.
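
The pattern suggested above (many threads adding documents through one shared writer) might look roughly like the sketch below. To keep it runnable without the Lucene jar, it uses a hypothetical StubWriter class as a stand-in for the shared writer; StubWriter, indexAll and the pool size are assumptions made for illustration, and none of this is Lucene's own API. With Lucene on the classpath, the stub would be replaced by the single IndexWriter instance.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: one shared writer, many indexing threads.
final class SharedWriterSketch {

    /** Stand-in for a thread-safe writer; NOT Lucene's IndexWriter. */
    static final class StubWriter {
        private final AtomicInteger docCount = new AtomicInteger();

        void addDocument(String doc) {
            docCount.incrementAndGet(); // a real writer would index the document here
        }

        int numDocs() {
            return docCount.get();
        }
    }

    /** Adds every document through ONE writer using a fixed-size thread pool. */
    static int indexAll(List<String> docs, int threads) {
        StubWriter writer = new StubWriter();          // exactly one writer per index
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (String doc : docs) {
            pool.submit(() -> writer.addDocument(doc)); // all tasks share the writer
        }
        pool.shutdown();                                // no new tasks; drain the queue
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("indexing did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while indexing", e);
        }
        return writer.numDocs();
    }

    public static void main(String[] args) {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            docs.add("doc-" + i);
        }
        System.out.println("documents added: " + indexAll(docs, 4));
    }
}
```

The design point is that parallelism comes from the threads, not from opening several writers on the same index: the writer is created once and every task calls it.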

Re: Lucene index write performance optimization

2009-11-10 Thread Glen Newton
You might try re-implementing, using ThreadPoolExecutor http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/ThreadPoolExecutor.html glen 2009/11/10 Jamie Band : > Hi There > > Our app spends a lot of time waiting for Lucene to finish writing to the > index. I'd like to minimize this. If y

Re: Lucene index question

2009-08-22 Thread marquinhocb
Hi Anshum, Thanks so much for the quick response! I think that pretty much covers it. I was worried that having to delete the document and re-add it simply because a date field has been updated would make my indexing quite slow. Seems however that's not something I'll have to worry about. Than

Re: Lucene index question

2009-08-21 Thread Anshum
Hi Marquin, So you have a field that you want to sort on. Well, that's pretty much a straightforward task in Lucene. Sort sort = new Sort(); sort.setSort(, true/false); http://lucene.apache.org/java/2_3_1/api/org/apache/lucene/search/Sort.html * It would not really be taxing and frequent updates could b

Re: Lucene Index Encryption

2009-05-11 Thread Michael McCandless
On Mon, May 11, 2009 at 2:06 PM, Babak Farhang wrote: > I am not familiar with the details of CFS, but I didn't interpret > Michael's comment to mean that there is actually any rewriting going > on here. The problem here appears to be one of translating the > encrypted/compressed file position to

Re: Lucene Index Encryption

2009-05-11 Thread Babak Farhang
On Mon, May 11, 2009 at 12:19 AM, Andrzej Bialecki wrote: > > Unfortunately, current Lucene IndexWriter implementation uses seek / > overwrite when writing term info dictionary. This is described in more > detail here: > > https://issues.apache.org/jira/browse/LUCENE-532 > Thanks for the enlight
