On Thu, 2008-09-04 at 17:58 +0200, Cam Bazz wrote:
> Is anyone using ramdisks for storage? There is RamSan and there is also
> Fusion-io, but they are kind of expensive. Any other alternatives, I wonder?
We've done some comparisons of RAM (Lucene RAMDirectory) vs. Flash-SSD
vs. conventional harddrives.
叶双明 wrote:
I agree with Michael McCandless! That way, it is handled gracefully.
Thanks for your hints, both of you :)
I will try it the way you suggested.
simon
2008/9/4 Michael McCandless <[EMAIL PROTECTED]>
If you're on Windows, the safest way to do this in general, if there is any
possi
Do you use the index on the slave as a backup for the index on the master?
And if the master breaks down, can you redirect queries to the slave?
When you add a Document to the master, do you also add it to the slave?
Sorry, I am not clear about what your problem is. Can you give more detail about
what you worry abou
> On Thu, 2008-09-04 at 17:58 +0200, Cam Bazz wrote:
> > Is anyone using ramdisks for storage? There is RamSan and there is also
> > Fusion-io, but they are kind of expensive. Any other alternatives, I wonder?
>
> We've done some comparisons of RAM (Lucene RAMDirectory) vs. Flash-SSD
> vs. conventional
On Fri, 2008-09-05 at 10:33 +0200, Cam Bazz wrote:
[RAM vs. Flash-SSD vs. harddrives]
> I have done similar test with ram vs. disk, and IO was the bottleneck.
> What flash ssd did you try with?
For disks (as in conventional 10,000/15,000 RPM harddrives), IO is
clearly the bottleneck for us also.
Let me try to explain.
I have a master where indexing is done. I have multiple slaves for querying.
If I commit+optimize on the master and then rsync the index, the data
transferred on the network is huge. An alternate way is to commit on master,
transfer the delta to the slave and issue an optim
IndexWriter.{set,get}MaxMergeDocs isn't deprecated, but it is a
convenience method for the corresponding calls on the MergePolicy.
Sorry, that javadoc is now false -- we decided that check (2nd point
in the javadoc) was overly pedantic so it was removed (this was
LUCENE-1254), but I forgo
Shalin Shekhar Mangar wrote:
Let me try to explain.
I have a master where indexing is done. I have multiple slaves for querying.
If I commit+optimize on the master and then rsync the index, the data
transferred on the network is huge. An alternate way is to commit on master,
transfer the
In Ocean I had to use a transaction log and execute everything that
way like SQL database replication. Then let each node handle its own
merging process. Syncing the indexes is used to get a new node up to
speed, otherwise it's avoided for the reasons mentioned in the
previous email.
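
The log-based approach described above can be sketched roughly as follows. This is a minimal illustration in plain Java, not Ocean's or Lucene's actual code: the class names (TransactionLogSketch, Op, Log, Node) are hypothetical, and a Map stands in for each node's index. The master appends every add or delete to a log, and each node replays whatever it has not yet applied, so merged segment files never need to cross the network.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of log-based replication; not Ocean's or Lucene's API. */
public class TransactionLogSketch {

    /** One logged update: an add (text != null) or a delete (text == null). */
    static final class Op {
        final long seq;
        final String docId;
        final String text;

        Op(long seq, String docId, String text) {
            this.seq = seq;
            this.docId = docId;
            this.text = text;
        }
    }

    /** Append-only log kept on the master; sequence numbers start at 1. */
    static final class Log {
        private final List<Op> ops = new ArrayList<>();

        synchronized long append(String docId, String text) {
            long seq = ops.size() + 1;
            ops.add(new Op(seq, docId, text));
            return seq;
        }

        /** Returns every operation with a sequence number greater than afterSeq. */
        synchronized List<Op> since(long afterSeq) {
            return new ArrayList<>(ops.subList((int) afterSeq, ops.size()));
        }
    }

    /** A node that replays the log locally; the Map stands in for its index. */
    static final class Node {
        final Map<String, String> docs = new TreeMap<>();
        long lastApplied = 0;

        void catchUp(Log log) {
            for (Op op : log.since(lastApplied)) {
                if (op.text == null) {
                    docs.remove(op.docId);       // replay a delete
                } else {
                    docs.put(op.docId, op.text); // replay an add or update
                }
                lastApplied = op.seq;
            }
        }
    }

    public static void main(String[] args) {
        Log log = new Log();
        Node a = new Node();
        Node b = new Node();

        log.append("1", "hello");
        log.append("2", "world");
        a.catchUp(log);          // node a applies ops 1 and 2
        log.append("1", null);   // delete document 1
        a.catchUp(log);          // node a applies only op 3
        b.catchUp(log);          // node b replays all three ops at once

        System.out.println("nodes agree: " + a.docs.equals(b.docs));
    }
}
```

A brand-new node could still be seeded by copying an existing index, as the message says, and then replay the log from that point on.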
On Fri, Se
On Fri, 2008-09-05 at 11:00 +0200, Toke Eskildsen wrote:
> As for Flash-SSDs, we've tried 2 * MTRON 6000 32GB RAID 0, 2 * SanDisk
> 5000 32GB RAID 0 and SanDisk something (64GB model) both as single drive
> and 4 drives in RAID 0.
Update:
The "SanDisk something" turned out to be a Samsung MCCOE64
I understand your point. I did not say it was a Lucene problem; I was
rather checking whether my intended design was correct... basically not.
Since I thought that I would first break my stream into tokens to do my special
filter, I thought I could do it in one step...
Interesting if you are not going
On Fri, Sep 5, 2008 at 6:20 PM, Jason Rutherglen
<[EMAIL PROTECTED]> wrote:
> In Ocean I had to use a transaction log and execute everything that
> way like SQL database replication. Then let each node handle its own
> merging process. Syncing the indexes is used to get a new node up to
> speed,
It gets more and more complex. Actually, I have a small index system that can
be configured with multiple index servers for querying.
In my opinion, because index update operations are synchronized between the
different threads that update the index,
for indexing new data: can process the data to be indexed at the ma
On Fri, Sep 5, 2008 at 6:03 PM, Michael McCandless <
[EMAIL PROTECTED]> wrote:
>
> Large segment merges will also send huge traffic. You may just want to
> send all updates (document adds/deletes) to all slaves directly? It'd be
> nice if you could somehow NOT sync the effects of segment merging
I've been tracking this list for a year or more, and this is the
first I've ever heard of such a thing. Which leads me to wonder
what *else* changed besides your index size. Classpath?
jar files? Some sysadmin modified your search box? Is the
program throwing an exception that you're masking somewh
Just think about the cost of indexing that many documents on each
slave. It may slow down the responses from live slaves.
I think there must be something like a search service on the slaves that includes an
IndexSearcher or an equivalent object, and indexing that many documents with an
IndexWriter, isn't the
IndexWriter.setRAMBufferSizeMB() determines the amount of RAM that may be
used for buffering added documents before they are flushed as a new segment.
Is it related to IndexSearcher?
And IndexSearcher has no setRAMBufferSizeMB() method, which means we can't control
the amount of RAM that may be used fo
Paul Elschot wrote:
On Thursday 04 September 2008 20:39:13 Mark Miller wrote:
Sounds like it's more in line with what you are looking for. If I
remember correctly, the phrase query factors in the edit distance in
scoring, but the SpanNearQuery will just use the combined idf for
each of the t
SpanScorer will use the similarity slop factor for each matching
span size to adjust the effective frequency.
Regards,
Paul Elschot
You have pointed this out to me before. One day I will remember.
Every time I look things over again I miss it, and I couldn't find that
email in the archive
If I don't keep the IndexSearcher as a Singleton and instead open and close a
new one each time, I have a large memory leak (probably due to the large
queries I am doing). After watching the memory a while, I still believe I
have a small memory leak even when the Directory, Analyzer, and
IndexSear
Are you using RAMDirectory?
I am also dealing with a memory leak. My case is specific to
RAMDirectory.
http://markmail.org/message/dfgcnnjglne3wynp
However, this RAMDirectory case is not as simple as setting searcher=null,
because I found that some reference to RAMDirectory is held by
Shalin Shekhar Mangar wrote:
On Fri, Sep 5, 2008 at 6:03 PM, Michael McCandless <
[EMAIL PROTECTED]> wrote:
Large segment merges will also send huge traffic. You may just want to
send all updates (document adds/deletes) to all slaves directly? It'd be
nice if you could somehow NOT sync
On Fri, Sep 5, 2008 at 9:52 PM, Michael McCandless <
[EMAIL PROTECTED]> wrote:
>
> Well this is certainly a nice challenging problem :)
Yes it is :-)
I think this could be a generally useful feature?
>
> So you're thinking IndexWriter.commit() would take an optional opaque
> argument (maybe a S
No, I am using FSDirectory. Unfortunately, my indexes are over 2 GB in size
and I don't have a server that has that much free memory just for the
indexes.
If you figure out anything, let me know just in case it helps my case as
well. Thanks.
chrislusf wrote:
>
> Are you using RAMDirectory?
>
: Interesting if you are not going to use an analyser... what then ? I'm
: thinking of using javacc, because I oversimplified somewhat the 3 field
: string structure, so I need a kind of small grammar for that.
Well, the specifics of "what else" is in your files is going to be the
biggest factor
On Friday 05 September 2008 16:57:34 Mark Miller wrote:
> Paul Elschot wrote:
> > On Thursday 04 September 2008 20:39:13 Mark Miller wrote:
> >> Sounds like it's more in line with what you are looking for. If I
> >> remember correctly, the phrase query factors in the edit distance
> >> in scorin
I think this could be a generally useful feature?
+1. I could definitely use a "commitUserData" option for the same reasons.
Thinking more on this, we may not need to modify the index format at all for
this use-case. This is easily achieved in the current system by adding a
dummy document
I think I'm getting you. But the files I'm going to parse have many formats:
PDF, HTML, Word.
They don't have a particular structure; memos, if you will. But the ones I'm
interested in will have the triplets I described.
Yes, building a TokenFilter as you suggest should do the job.
I guess my initi
Hi Folks,
I have a somewhat complex scoring/boosting requirement.
Say I have 3 text fields A, B, C and a numeric field called D.
Say my query is "testrank".
Scoring should be based on the following:
Query matches
1. text fields A, B and C, & Highest value of D (highest boost/rank)
2. A and B, & Highe
I am looking for an example, if anyone has done any custom scoring with
Lucene.
I need to implement a Query similar to DisjunctionMaxQuery; the only
difference would be that it should score based on the sum of the subqueries'
scores instead of the max.
Any custom scoring example will help.
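
For what it's worth, the difference between the two combinations is small enough to show as plain arithmetic. The sketch below does not use Lucene at all; DisMaxVsSum and its methods are hypothetical names, and the sub-scores are made-up numbers. It follows the rule given in the DisjunctionMaxQuery javadoc as I understand it: the best sub-score plus a tie-breaker multiplier times the remaining sub-scores. Under that rule a multiplier of 1.0 gives the plain sum, so it may be worth trying that constructor argument before writing a custom Query.

```java
/** Plain-Java arithmetic sketch; hypothetical names, no Lucene classes involved. */
public class DisMaxVsSum {

    /**
     * DisjunctionMaxQuery-style combination: the maximum sub-score plus
     * tieBreaker times the sum of the other sub-scores.
     * Assumes all sub-scores are non-negative.
     */
    static float disMax(float[] subScores, float tieBreaker) {
        float max = 0f;
        float sum = 0f;
        for (float s : subScores) {
            sum += s;
            max = Math.max(max, s);
        }
        return max + tieBreaker * (sum - max);
    }

    /** The combination asked for in the message: a plain sum of sub-scores. */
    static float sum(float[] subScores) {
        float total = 0f;
        for (float s : subScores) {
            total += s;
        }
        return total;
    }

    public static void main(String[] args) {
        float[] scores = {0.5f, 0.25f, 0.125f}; // made-up sub-query scores

        System.out.println("pure max       : " + disMax(scores, 0.0f));
        System.out.println("tieBreaker 1.0 : " + disMax(scores, 1.0f));
        System.out.println("plain sum      : " + sum(scores));
    }
}
```

With these numbers the pure max is 0.5, and both the tie-breaker 1.0 case and the plain sum come to 0.875.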
(On one hand,
I'm not an expert, so please take this with a grain of salt, but if
you return the Hits object, you are inadvertently "holding on" to
that IndexSearcher, right?
According to the FAQ (http://wiki.apache.org/lucene-java/ImproveSearchingSpeed),
iterating over all Hits will result in
addition
In my opinion, there is no need to close the Directory; keep all Directory and
all IndexSearcher instances open.
return ivIndexSearcher.search(query, sortOrder); (I think) also returns
the hits obtained from the IndexSearcher, so it iterates over the first N; no
problem.
In addition, how many index Directory