Re: Realtime & distributed

2009-10-11 Thread Jake Mannix
OK, never mind, actually: the simultaneous indexing was something done in zoie 1.3, and it was changed in 1.4 to call addIndexesNoOptimize() on the RAMDirectory indexes as soon as they are big enough. It's still true that you can throw away the RAMDirectory once the disk index is reopened, though. -jake
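
The flow described in this message can be pictured with a small stand-alone sketch. It is only a toy model under stated assumptions: it uses plain Java lists instead of Lucene classes, the class name RamThenDiskSketch and the doc-count threshold are hypothetical, and "big enough" is assumed to mean a simple document count. It is not Zoie's actual code.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for the flow described in the message; no Lucene classes are used.
// Every name here (RamThenDiskSketch, ramThreshold, ...) is hypothetical.
class RamThenDiskSketch {
    private final int ramThreshold;                           // assumed "big enough" cutoff, as a doc count
    private List<String> ramDocs = new ArrayList<>();         // plays the role of a RAMDirectory index
    private final List<String> diskDocs = new ArrayList<>();  // plays the role of the disk index
    private List<String> diskSnapshot = new ArrayList<>();    // what the last reopened disk reader sees

    RamThenDiskSketch(int ramThreshold) {
        this.ramThreshold = ramThreshold;
    }

    void add(String doc) {
        ramDocs.add(doc);
        if (ramDocs.size() >= ramThreshold) {
            flush();
        }
    }

    private void flush() {
        diskDocs.addAll(ramDocs);                  // stand-in for addIndexesNoOptimize()
        diskSnapshot = new ArrayList<>(diskDocs);  // stand-in for reopening the disk reader
        ramDocs = new ArrayList<>();               // the RAM index is thrown away only after the reopen
    }

    // Documents visible to a search: the reopened disk view plus the live RAM buffer.
    List<String> searchableDocs() {
        List<String> all = new ArrayList<>(diskSnapshot);
        all.addAll(ramDocs);
        return all;
    }

    int ramSize() { return ramDocs.size(); }

    int diskSize() { return diskDocs.size(); }
}
```

The ordering inside flush() is the point of the sketch: the RAM buffer is dropped only after the disk view has been refreshed, so no document disappears from search in between.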

Re: Realtime & distributed

2009-10-11 Thread Jake Mannix
Hey Eric, One clarification before letting the rest of this discussion sneak over to the zoie list: On Sun, Oct 11, 2009 at 1:51 PM, Angel, Eric wrote: * Am I wrong to assume that the RAMDir holds the entire index - just as the > FSDir? Or does RAMDir only hold a portion of the index that ha

Re: Realtime & distributed

2009-10-11 Thread John Wang
t;> * I see that there are plans to have Zoie use Lucene 2.9. How long would >> you say before it's available? >> >> Thanks, >> >> E >> >> -Original Message- >> From: Jason Rutherglen [mailto:jason.rutherg...@gmail.com] >> Sent:

Re: Realtime & distributed

2009-10-11 Thread John Wang
E > > -Original Message- > From: Jason Rutherglen [mailto:jason.rutherg...@gmail.com] > Sent: Sat 10/10/2009 12:16 PM > To: java-user@lucene.apache.org > Subject: Re: Realtime & distributed > > John, > > Actually everyone is entitled to their technical opinion a

RE: Realtime & distributed

2009-10-11 Thread Angel, Eric
ata is lost? * I see that there are plans to have Zoie use Lucene 2.9. How long would you say before it's available? Thanks, E -Original Message- From: Jason Rutherglen [mailto:jason.rutherg...@gmail.com] Sent: Sat 10/10/2009 12:16 PM To: java-user@lucene.apache.org Subject: R

Re: Realtime & distributed

2009-10-10 Thread Jason Rutherglen
John, Actually everyone is entitled to their technical opinion and none of the comments were misleading. Jake and yourself validated that they are true in your comments. I'm simply trying to create better technology as is everyone on here. The process takes time and coordination between many parti

Re: Realtime & distributed

2009-10-09 Thread Jake Mannix
Hi Mike, Zoie itself doesn't do anything with the distributed side of things - it just plays nicely with it. Zoie, at its core, exposes a couple of primary interfaces (well, this is a slightly simplified form of them): interface IndexReaderFactory { List getIndexReaders(); },
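
To make the one interface visible in this snippet concrete, here is a minimal sketch of how a caller might consume a reader factory. Only the IndexReaderFactory name and its getIndexReaders() method come from the message; the type parameter R, ToyReader, ToyFactory and FactorySearch are hypothetical illustrations in plain Java, not Zoie or Lucene classes. The message is cut off before any further interface is shown, so none is guessed at here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Shape taken from the message; the type parameter R is an assumption.
interface IndexReaderFactory<R> {
    List<R> getIndexReaders();
}

// Hypothetical toy reader over a fixed set of documents. Not a Lucene IndexReader.
class ToyReader {
    private final List<String> docs;

    ToyReader(String... docs) {
        this.docs = Arrays.asList(docs);
    }

    // Counts documents containing the given term as a substring.
    int countMatches(String term) {
        int n = 0;
        for (String d : docs) {
            if (d.contains(term)) {
                n++;
            }
        }
        return n;
    }
}

// Hypothetical factory that hands out one reader per index part.
class ToyFactory implements IndexReaderFactory<ToyReader> {
    private final List<ToyReader> readers = new ArrayList<>();

    void addReader(ToyReader r) {
        readers.add(r);
    }

    @Override
    public List<ToyReader> getIndexReaders() {
        return new ArrayList<>(readers);  // defensive copy for the caller
    }
}

// Hypothetical caller: searches every reader the factory currently exposes.
class FactorySearch {
    static int countAcross(IndexReaderFactory<ToyReader> factory, String term) {
        int total = 0;
        for (ToyReader r : factory.getIndexReaders()) {
            total += r.countMatches(term);
        }
        return total;
    }
}
```

A caller written this way depends only on the factory, so it does not need to know how many underlying indexes exist or where they live.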

Re: Realtime & distributed

2009-10-09 Thread Michael Masters
Hi Jake, Zoie looks like a really cool project. I'd like to learn more about the distributed part of the setup. Any way you could describe that here or on the wiki? -Mike On Thu, Oct 8, 2009 at 9:24 PM, Jake Mannix wrote: > On Thu, Oct 8, 2009 at 7:00 PM, Angel, Eric wrote: > >> >> Does anyo

Re: Realtime & distributed

2009-10-09 Thread Bradford Stephens
My deepest apologies for the spam, everyone. I slipped on my G-mail button :) On Fri, Oct 9, 2009 at 9:09 PM, Bradford Stephens wrote: > Hey Eric, > > My consulting company specializes in scalable, real-time search with > distributed Lucene. I'm more than happy to chat, if you'd like! :) > > Chee

Re: Realtime & distributed

2009-10-09 Thread Bradford Stephens
Hey Eric, My consulting company specializes in scalable, real-time search with distributed Lucene. I'm more than happy to chat, if you'd like! :) Cheers, Bradford On Thu, Oct 8, 2009 at 7:00 PM, Angel, Eric wrote: > > Does anyone have any recommendations?  I've looked at Katta, but it doesn't >

Re: Realtime & distributed

2009-10-09 Thread John Wang
I can provide some preliminary numbers (we will need to do some detailed analysis and post it somewhere): Dataset: medline. Starting index: empty. Add only, no update, for 30 min. Maximum indexing load, 1000 docs/sec. Under stress, we take indexing events (add only) and stream into both systems: Z

Re: Realtime & distributed

2009-10-09 Thread Jason Rutherglen
The dimensions sound good. It's unclear if you're going to post a chart again, numbers, or code? There's a LUCENE-1577 Jira issue for code. On Fri, Oct 9, 2009 at 12:37 PM, Jake Mannix wrote: > Jason, > >  We've been running some perf/load/stress tests lately, but on a suggestion > > from Ted D

Re: Realtime & distributed

2009-10-09 Thread Jake Mannix
Jason, We've been running some perf/load/stress tests lately, but on a suggestion from Ted Dunning, I've been trying to come up with a more "realistic" set of stress tests and indexing rates to see where NRT performs well and where it does not, instead of just indexing at maximum rate, looping

Re: Realtime & distributed

2009-10-09 Thread Jason Rutherglen
Jake and John, It would be interesting and enlightening to see NRT performance numbers in a variety of configurations. The best way to go about this is to post benchmarks that others may run in their environment which can then be tweaked for their unique edge cases. I wish I had more time to work

Re: Realtime & distributed

2009-10-08 Thread John Wang
Jason: I would really appreciate it if you would stop making false statements and spreading misinformation. Everyone is entitled to his/her opinions on technologies, but deliberately posting misleading and false information on such a distribution is just unethical, and you'll end up just discrediting

Re: Realtime & distributed

2009-10-08 Thread Jake Mannix
On Thu, Oct 8, 2009 at 7:56 PM, Jason Rutherglen wrote: > There is the Zoie system which uses the RAMDir > solution, > Also, to clarify: zoie does not index into a RAMDir and then periodically merge that down to disk, as for one thing, this has a bad failure mode when the system crashes, as you

Re: Realtime & distributed

2009-10-08 Thread Jake Mannix
On Thu, Oct 8, 2009 at 7:00 PM, Angel, Eric wrote: > > Does anyone have any recommendations? I've looked at Katta, but it doesn't > seem to support realtime searching. It also uses hdfs, which I've heard can > be slow. I'm looking to serve 40gb of indexes and support about 1 million > updates

Re: Realtime & distributed

2009-10-08 Thread Jake Mannix
Jason, On Thu, Oct 8, 2009 at 7:56 PM, Jason Rutherglen wrote: > Today near realtime search (with or without SSDs) comes at a > price, that is reduced indexing speed due to continued in RAM > merging. People typically hack something together where indexes > are held in a RAMDir until being flush

Re: Realtime & distributed

2009-10-08 Thread Jason Rutherglen
Eric, Katta doesn't require HDFS which would be slow to search on, though Katta can be used to copy indexes out of HDFS onto local servers. The best bet is hardware that uses SSDs because merges and update latency will greatly decrease and there won't be a synchronous IO issue as there is with har

Realtime & distributed

2009-10-08 Thread Angel, Eric
Does anyone have any recommendations? I've looked at Katta, but it doesn't seem to support realtime searching. It also uses hdfs, which I've heard can be slow. I'm looking to serve 40gb of indexes and support about 1 million updates per day. Thx ---