Re: ramdisks

2008-09-05 Thread Toke Eskildsen
On Thu, 2008-09-04 at 17:58 +0200, Cam Bazz wrote: > Anyone using ramdisks for storage? There is RamSan and there is also Fusion-io, but they are kind of expensive. Any other alternatives, I wonder? We've done some comparisons of RAM (Lucene RAMDirectory) vs. Flash-SSD vs. conventional hard drives.

Re: delete/reset the index

2008-09-05 Thread simon litwan
叶双明 wrote: Agree with Michael McCandless! That way, it is handled gracefully. Thanks for your hints, both of you :) I will try what you suggested. simon 2008/9/4 Michael McCandless <[EMAIL PROTECTED]> If you're on Windows, the safest way to do this in general, if there is any possi

Re: How can we know if 2 lucene indexes are same?

2008-09-05 Thread 叶双明
Do you use the index at the slave as a backup for the index at the master? And in case the master breaks down, you can redirect queries to the slave? When you add a Document to the master, do you also add it to the slave? Sorry, I am not clear about what your problem is; can you show more detail about what do you worry abou

Re: ramdisks

2008-09-05 Thread Cam Bazz
> On Thu, 2008-09-04 at 17:58 +0200, Cam Bazz wrote: > > Anyone using ramdisks for storage? There is RamSan and there is also Fusion-io, but they are kind of expensive. Any other alternatives, I wonder? > We've done some comparisons of RAM (Lucene RAMDirectory) vs. Flash-SSD vs. conventional

Re: ramdisks

2008-09-05 Thread Toke Eskildsen
On Fri, 2008-09-05 at 10:33 +0200, Cam Bazz wrote: [RAM vs. Flash-SSD vs. hard drives] > I have done a similar test with RAM vs. disk, and IO was the bottleneck. > Which flash SSD did you try? For disks (as in conventional 10,000/15,000 RPM hard drives), IO is clearly the bottleneck for us also.

Re: How can we know if 2 lucene indexes are same?

2008-09-05 Thread Shalin Shekhar Mangar
Let me try to explain. I have a master where indexing is done. I have multiple slaves for querying. If I commit+optimize on the master and then rsync the index, the data transferred on the network is huge. An alternate way is to commit on master, transfer the delta to the slave and issue an optim

Re: Javadoc wording in IndexWriter.addIndexesNoOptimize()

2008-09-05 Thread Michael McCandless
IndexWriter.{set,get}MaxMergeDocs isn't deprecated, but it is a convenience method for the corresponding calls on the MergePolicy. Sorry, that javadoc is now false -- we decided that check (2nd point in the javadoc) was overly pedantic so it was removed (this was LUCENE-1254), but I forgo

Re: How can we know if 2 lucene indexes are same?

2008-09-05 Thread Michael McCandless
Shalin Shekhar Mangar wrote: Let me try to explain. I have a master where indexing is done. I have multiple slaves for querying. If I commit+optimize on the master and then rsync the index, the data transferred on the network is huge. An alternate way is to commit on master, transfer the

Re: How can we know if 2 lucene indexes are same?

2008-09-05 Thread Jason Rutherglen
In Ocean I had to use a transaction log and execute everything that way, like SQL database replication, then let each node handle its own merging process. Syncing the indexes is used to get a new node up to speed; otherwise it's avoided for the reasons mentioned in the previous email. On Fri, Se

Re: ramdisks

2008-09-05 Thread Toke Eskildsen
On Fri, 2008-09-05 at 11:00 +0200, Toke Eskildsen wrote: > As for Flash-SSDs, we've tried 2 * MTRON 6000 32GB RAID 0, 2 * SanDisk > 5000 32GB RAID 0 and SanDisk something (64GB model) both as single drive > and 4 drives in RAID 0. Update: The "SanDisk something" turned out to be a Samsung MCCOE64

Re: Beginner: Specific indexing

2008-09-05 Thread Raymond Balmès
I understand your point. I did not say it was a Lucene problem, but was rather checking whether my intended design was correct... basically not. Since I thought that I would first break my stream into tokens to do my special filter, I thought I could do it in one step... Interesting if you are not going

Re: How can we know if 2 lucene indexes are same?

2008-09-05 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Fri, Sep 5, 2008 at 6:20 PM, Jason Rutherglen <[EMAIL PROTECTED]> wrote: > In Ocean I had to use a transaction log and execute everything that way, like SQL database replication. Then let each node handle its own merging process. Syncing the indexes is used to get a new node up to speed,

Re: How can we know if 2 lucene indexes are same?

2008-09-05 Thread 叶双明
This is getting more and more complex. Actually, I have a small index system that can configure multiple index servers for querying. In my opinion, because index update operations are synchronized between the different threads that update the index, for indexing new data we can process the data to be indexed at the ma

Re: How can we know if 2 lucene indexes are same?

2008-09-05 Thread Shalin Shekhar Mangar
On Fri, Sep 5, 2008 at 6:03 PM, Michael McCandless < [EMAIL PROTECTED]> wrote: > > Large segment merges will also send huge traffic. You may just want to > send all updates (document adds/deletes) to all slaves directly? It'd be > nice if you could somehow NOT sync the effects of segment merging

Re: Problem with lucene search starting to return 0 hits when a few seconds earlier it was returning hundreds

2008-09-05 Thread Erick Erickson
I've been tracking this list for a year or more, and this is the first I've ever heard of such a thing. Which leads me to wonder what *else* changed besides your index size. Classpath? jar files? Some sysadmin modified your search box? Is the program throwing an exception that you're masking somewh

Re: How can we know if 2 lucene indexes are same?

2008-09-05 Thread 叶双明
Just think about the cost of indexing that many documents on each slave. It may slow down the responses from live slaves. I think there must be something like a search service at the slaves that includes an IndexSearcher or an equivalent object, and indexing that many documents with an IndexWriter, isn't the

Re: lucene ram buffering

2008-09-05 Thread 叶双明
IndexWriter.setRAMBufferSizeMB() determines the amount of RAM that may be used for buffering added documents before they are flushed as a new segment. Is it related to IndexSearcher? IndexSearcher has no setRAMBufferSizeMB() method; does that mean we can't control the amount of RAM that may be used fo

Re: PhraseQuery issues - differences with SpanNearQuery

2008-09-05 Thread Mark Miller
Paul Elschot wrote: On Thursday 04 September 2008 20:39:13 Mark Miller wrote: Sounds like it's more in line with what you are looking for. If I remember correctly, the phrase query factors in the edit distance in scoring, but the SpanNearQuery will just use the combined idf for each of the t

Re: PhraseQuery issues - differences with SpanNearQuery

2008-09-05 Thread Mark Miller
SpanScorer will use the similarity slop factor for each matching span size to adjust the effective frequency. Regards, Paul Elschot You have pointed this out to me before. One day I will remember. Every time I look things over again I miss it, and I couldn't find that email in the archive

Re: Lucene Memory Leak

2008-09-05 Thread Andy33
If I don't keep the IndexSearcher as a Singleton and instead open and close a new one each time, I have a large memory leak (probably due to the large queries I am doing). After watching the memory a while, I still believe I have a small memory leak even when the Directory, Analyzer, and IndexSear

Re: Lucene Memory Leak

2008-09-05 Thread Chris Lu
Are you using RAMDirectory? I am actually also dealing with a memory leak. My case is only particular to RAMDirectory. http://markmail.org/message/dfgcnnjglne3wynp However, this RAMDirectory case is not as simple as setting searcher=null, because I found some reference to RAMDirectory is held by

Re: How can we know if 2 lucene indexes are same?

2008-09-05 Thread Michael McCandless
Shalin Shekhar Mangar wrote: On Fri, Sep 5, 2008 at 6:03 PM, Michael McCandless < [EMAIL PROTECTED]> wrote: Large segment merges will also send huge traffic. You may just want to send all updates (document adds/deletes) to all slaves directly? It'd be nice if you could somehow NOT sync

Re: How can we know if 2 lucene indexes are same?

2008-09-05 Thread Shalin Shekhar Mangar
On Fri, Sep 5, 2008 at 9:52 PM, Michael McCandless < [EMAIL PROTECTED]> wrote: > > Well this is certainly a nice challenging problem :) Yes it is :-) I think this could be a generally useful feature? > > So you're thinking IndexWriter.commit() would take an optional opaque > argument (maybe a S

Re: Lucene Memory Leak

2008-09-05 Thread Andy33
No, I am using FSDirectory. Unfortunately, my indexes are over 2 GB in size and I don't have a server that has that much free memory just for the indexes. If you figure out anything, let me know just in case it helps my case as well. Thanks. chrislusf wrote: > > Are you using RAMDirectory? >

Re: Beginner: Specific indexing

2008-09-05 Thread Chris Hostetter
: Interesting if you are not going to use an analyser... what then ? I'm : thinking of using javacc, because I oversimplified somewhat the 3 field : string structure, so I need a kind of small grammar for that. Well, the specifics of "what else" is in your files is going to be the biggest factor

Re: PhraseQuery issues - differences with SpanNearQuery

2008-09-05 Thread Paul Elschot
On Friday 05 September 2008 16:57:34 Mark Miller wrote: > Paul Elschot wrote: > > On Thursday 04 September 2008 20:39:13 Mark Miller wrote: > >> Sounds like it's more in line with what you are looking for. If I > >> remember correctly, the phrase query factors in the edit distance > >> in scorin

Re: How can we know if 2 lucene indexes are same?

2008-09-05 Thread markharw00d
I think this could be a generally useful feature? +1. I could definitely use a "commitUserData" option for the same reasons. Thinking more on this, we may not need to modify the index format at all for this use-case. This is easily achieved in the current system by adding a dummy document

Re: Beginner: Specific indexing

2008-09-05 Thread Raymond Balmès
I think I'm getting you. But the files I'm going to parse come in many formats: PDF, HTML, Word. They don't have a particular structure; memos, if you will. But the ones I'm interested in will have the triplets I described. Yes, building a TokenFilter as you suggest should do the job. I guess my initi

Somewhat complex scoring/boosting

2008-09-05 Thread Ravindra Sharma
Hi Folks, I have a somewhat complex scoring/boosting requirement. Say I have 3 text fields A, B, C and a numeric field called D. Say my query is "testrank". Scoring should be based on the following: Query matches 1. text fields A, B and C, & Highest value of D (highest boost/rank) 2. A and B, & Highe

Custom scoring example ...

2008-09-05 Thread Ravindra Sharma
I am looking for an example, if anyone has done any custom scoring with Lucene. I need to implement a Query similar to DisjunctionMaxQuery; the only difference would be that it should score based on the sum of the subqueries' scores instead of the max. Any custom scoring example will help. (On one hand,
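
A minimal sketch of the score combination being asked about, written in plain Java rather than against the Lucene API. It assumes the documented DisjunctionMaxQuery behaviour: the document score is the maximum subquery score plus a tie-breaker multiplier times the scores of the other matching subqueries. Under that assumption a multiplier of 1.0 gives the plain sum and 0.0 gives the pure max. The class name DisMaxScoreSketch, the method combine, and the sample scores are hypothetical and exist only for illustration.

```java
import java.util.Arrays;

/** Plain-Java model of dismax-style score combination; this is not Lucene code. */
final class DisMaxScoreSketch {

    /**
     * Combines subquery scores as max + tieBreaker * (sum of the other scores).
     * tieBreaker = 0.0 yields the max; tieBreaker = 1.0 yields the sum.
     */
    static double combine(double[] subScores, double tieBreaker) {
        if (subScores.length == 0) {
            return 0.0; // no matching subquery, no score
        }
        double max = Arrays.stream(subScores).max().getAsDouble();
        double sum = Arrays.stream(subScores).sum();
        return max + tieBreaker * (sum - max);
    }

    public static void main(String[] args) {
        double[] scores = {0.5, 2.0, 1.5}; // hypothetical per-field scores
        System.out.println(combine(scores, 0.0)); // max only
        System.out.println(combine(scores, 1.0)); // sum of all subquery scores
    }
}
```

If that behaviour holds for the Lucene version in use, constructing the DisjunctionMaxQuery with a tie-breaker multiplier of 1.0 may be enough to get sum-based scoring without writing a custom Query.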

Re: Lucene Memory Leak

2008-09-05 Thread N. Hira
I'm not an expert, so please take this with a grain of salt, but if you return the Hits object, you are inadvertently "holding on" to that IndexSearcher, right? According to the FAQ (http://wiki.apache.org/lucene-java/ImproveSearchingSpeed), iterating over all Hits will result in addition

Re: Lucene Memory Leak

2008-09-05 Thread 叶双明
In my opinion, there is no need to close the Directory; keep all Directory and IndexSearcher instances open. return ivIndexSearcher.search(query, sortOrder); (I think) also returns the hits obtained from the IndexSearcher, so it iterates over the first N, no problem. In addition, how much index Directory