Ok nevermind actually - the simultaneous indexing was something done in zoie 1.3,
and was changed in 1.4 to addIndexesNoOptimize() on the RAMDirectory indexes
as soon as they are big enough.
It's still true that you can throw away the RAMDirectory once the disk index
is reopened though.
-jake
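A minimal sketch of the flush-then-discard idea described above, assuming plain Java lists in place of Lucene's RAMDirectory and on-disk index. None of this is Zoie or Lucene code: the class and method names (RamThenDiskSketch, flushThreshold, visibleDocs) are hypothetical, and the "merge" is only a list copy standing in for a call such as addIndexesNoOptimize().

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only: lists stand in for Lucene directories and readers. */
class RamThenDiskSketch {
    // Stands in for the on-disk index.
    private final List<String> diskIndex = new ArrayList<>();
    // What the currently open disk "reader" can see (a point-in-time snapshot).
    private List<String> diskReaderView = new ArrayList<>();
    // Stands in for a small in-memory (RAMDirectory-style) index.
    private List<String> ramIndex = new ArrayList<>();
    // Hypothetical size at which the in-memory index is "big enough" to flush.
    private final int flushThreshold;

    RamThenDiskSketch(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    /** Adds a document to the in-memory index and flushes once it is big enough. */
    void add(String doc) {
        ramIndex.add(doc);
        if (ramIndex.size() >= flushThreshold) {
            flush();
        }
    }

    private void flush() {
        // 1. Merge the in-memory index into the disk index.
        diskIndex.addAll(ramIndex);
        // 2. Reopen the disk reader so it sees the merged documents.
        diskReaderView = new ArrayList<>(diskIndex);
        // 3. Only now discard the in-memory index: nothing becomes invisible.
        ramIndex = new ArrayList<>();
    }

    /** Documents a search would see: the disk reader's view plus the in-memory index. */
    List<String> visibleDocs() {
        List<String> all = new ArrayList<>(diskReaderView);
        all.addAll(ramIndex);
        return all;
    }

    int ramSize() {
        return ramIndex.size();
    }

    int diskReaderSize() {
        return diskReaderView.size();
    }

    public static void main(String[] args) {
        RamThenDiskSketch index = new RamThenDiskSketch(2);
        index.add("doc-a");
        index.add("doc-b"); // reaches the threshold and triggers a flush
        index.add("doc-c"); // stays in the in-memory index for now
        System.out.println(index.visibleDocs());
    }
}
```

The ordering in flush() is the point of the sketch: the in-memory copy is dropped only after the disk reader has been reopened, so every document stays searchable throughout.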
Hey Eric,
One clarification before letting the rest of this discussion sneak over to
the zoie list:
On Sun, Oct 11, 2009 at 1:51 PM, Angel, Eric wrote:
> * Am I wrong to assume that the RAMDir holds the entire index - just as the
> FSDir? Or does RAMDir only hold a portion of the index that ha
>> * I see that there are plans to have Zoie use Lucene 2.9. How long would
>> you say before it's available?
>>
>> Thanks,
>>
>> E
>>
>> -Original Message-
>> From: Jason Rutherglen [mailto:jason.rutherg...@gmail.com]
>> Sent:
E
>
> -Original Message-
> From: Jason Rutherglen [mailto:jason.rutherg...@gmail.com]
> Sent: Sat 10/10/2009 12:16 PM
> To: java-user@lucene.apache.org
> Subject: Re: Realtime & distributed
>
> John,
>
> Actually everyone is entitled to their technical opinion a
ata is lost?
* I see that there are plans to have Zoie use Lucene 2.9. How long would you
say before it's available?
Thanks,
E
-Original Message-
From: Jason Rutherglen [mailto:jason.rutherg...@gmail.com]
Sent: Sat 10/10/2009 12:16 PM
To: java-user@lucene.apache.org
Subject: R
John,
Actually everyone is entitled to their technical opinion and
none of the comments were misleading. Jake and yourself
validated that they are true in your comments. I'm simply trying
to create better technology as is everyone on here. The process
takes time and coordination between many parti
Hi Mike,
Zoie itself doesn't do anything with the distributed
side of things - it just plays nicely with it. Zoie, at its core,
exposes a couple of primary interfaces (well, this is a slightly
simplified form of them) :
interface IndexReaderFactory { List getIndexReaders(); },
Hi Jake,
Zoie looks like a really cool project. I'd like to learn more about
the distributed part of the setup. Any way you could describe that
here or on the wiki?
-Mike
On Thu, Oct 8, 2009 at 9:24 PM, Jake Mannix wrote:
> On Thu, Oct 8, 2009 at 7:00 PM, Angel, Eric wrote:
>
>>
>> Does anyo
My deepest apologies for the spam, everyone. I slipped on my G-mail button :)
On Fri, Oct 9, 2009 at 9:09 PM, Bradford Stephens wrote:
> Hey Eric,
>
> My consulting company specializes in scalable, real-time search with
> distributed Lucene. I'm more than happy to chat, if you'd like! :)
>
> Chee
Hey Eric,
My consulting company specializes in scalable, real-time search with
distributed Lucene. I'm more than happy to chat, if you'd like! :)
Cheers,
Bradford
On Thu, Oct 8, 2009 at 7:00 PM, Angel, Eric wrote:
>
> Does anyone have any recommendations? I've looked at Katta, but it doesn't
>
I can provide some preliminary numbers (we will need to do some detailed
analysis and post it somewhere):
Dataset: medline
starting index: empty.
add only, no update, for 30 min.
maximum indexing load, 1000 docs/sec
Under stress, we take indexing events (add only) and stream into both
systems: Z
The dimensions sound good. It's unclear if you're going to post a
chart again, numbers, or code? There's a LUCENE-1577 Jira issue for
code.
On Fri, Oct 9, 2009 at 12:37 PM, Jake Mannix wrote:
> Jason,
>
> We've been running some perf/load/stress tests lately, but on a suggestion
> from Ted D
Jason,
We've been running some perf/load/stress tests lately, but on a suggestion
from Ted Dunning, I've been trying to come up with a more "realistic" set of
stress tests and indexing rates to see where NRT performs well and where it does
not, instead of just indexing at maximum rate, looping
Jake and John,
It would be interesting and enlightening to see NRT performance
numbers in a variety of configurations. The best way to go about
this is to post benchmarks that others may run in their
environment which can then be tweaked for their unique edge
cases. I wish I had more time to work
Jason:
I would really appreciate it if you would stop making false
statements and spreading misinformation. Everyone is entitled to his/her opinions on
technologies, but deliberately posting misleading and false information on
such a distribution list is just unethical, and you'll end up just discrediting
On Thu, Oct 8, 2009 at 7:56 PM, Jason Rutherglen wrote:
> There is the Zoie system which uses the RAMDir
> solution,
>
Also, to clarify: zoie does not index into a RAMDir and then periodically
merge that down to disk, as for one thing, this has a bad failure mode when the
system crashes, as you
On Thu, Oct 8, 2009 at 7:00 PM, Angel, Eric wrote:
>
> Does anyone have any recommendations? I've looked at Katta, but it doesn't
> seem to support realtime searching. It also uses hdfs, which I've heard can
> be slow. I'm looking to serve 40gb of indexes and support about 1 million
> updates
Jason,
On Thu, Oct 8, 2009 at 7:56 PM, Jason Rutherglen wrote:
> Today near realtime search (with or without SSDs) comes at a
> price, that is reduced indexing speed due to continued in RAM
> merging. People typically hack something together where indexes
> are held in a RAMDir until being flush
Eric,
Katta doesn't require HDFS which would be slow to search on,
though Katta can be used to copy indexes out of HDFS onto local
servers. The best bet is hardware that uses SSDs because merges
and update latency will greatly decrease and there won't be a
synchronous IO issue as there is with har
Does anyone have any recommendations? I've looked at Katta, but it
doesn't seem to support realtime searching. It also uses hdfs, which
I've heard can be slow. I'm looking to serve 40gb of indexes and
support about 1 million updates per day.
Thx