Re: Realtime Searching..
Hi Jon:

We are running various LinkedIn search systems on Zoie in production.

-John

On Thu, Feb 19, 2009 at 9:11 AM, Jon Baer jonb...@gmail.com wrote:

This part: The part of Zoie that enables real-time searchability is the fact that ZoieSystem contains three IndexDataLoader objects:

* a RAMLuceneIndexDataLoader, which is a simple wrapper around a RAMDirectory,
* a DiskLuceneIndexDataLoader, which can index directly to the FSDirectory (followed by an optimize() call if a specified optimizeDuration has been exceeded) in batches via an intermediary
* BatchedIndexDataLoader, whose primary job is to queue up and batch DataEvents that need to be flushed to disk

Sounds like it might be (or can be) layered into Solr somehow. Has anyone been using this project or testing it?

- Jon
Re: Realtime Searching..
Would it not make more sense to wait for Lucene's IW+IR (IndexWriter + IndexReader) marriage and the other things happening in core Lucene that will make near-real-time search possible?

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message -----
From: John Wang john.w...@gmail.com
To: solr-user@lucene.apache.org
Sent: Wednesday, March 25, 2009 2:34:04 PM
Subject: Re: Realtime Searching..

Hi Jon:

We are running various LinkedIn search systems on Zoie in production.

-John
Re: Realtime Searching..
Michael,

I think you might be interested in Zoie.

zoie: real-time search and indexing system built on Apache Lucene
http://code.google.com/p/zoie/

Zoie is a realtime search project for Lucene by LinkedIn. Basically, I think it is a similar technique to Otis's trick:

In the meantime you can use the trick of one large and less frequently updated core and one small and more frequently updated core + distributed search across them.
Otis

Genta

On Sat, Feb 7, 2009 at 3:02 AM, Michael Austin mausti...@gmail.com wrote:

I need to find a solution for our current social application. It's low traffic now because we are early on. However, I'm expecting to grow and want to be prepared. We have messages of different types that are aggregated into one stream. Each of these message types has very different data, so our main queries have a few unions and many joins. I know that Solr would work great for searching, but we need a realtime system (Twitter-like) to view user updates. I'm not interested in a few minutes' delay; I need something that is fast to update, searchable, and has n columns per record/document. Can Solr do this? What is Ocean?

Thanks
Re: Realtime Searching..
This part: The part of Zoie that enables real-time searchability is the fact that ZoieSystem contains three IndexDataLoader objects:

* a RAMLuceneIndexDataLoader, which is a simple wrapper around a RAMDirectory,
* a DiskLuceneIndexDataLoader, which can index directly to the FSDirectory (followed by an optimize() call if a specified optimizeDuration has been exceeded) in batches via an intermediary
* BatchedIndexDataLoader, whose primary job is to queue up and batch DataEvents that need to be flushed to disk

Sounds like it might be (or can be) layered into Solr somehow. Has anyone been using this project or testing it?

- Jon

On Feb 19, 2009, at 9:44 AM, Genta Kaneyama wrote:

Michael,

I think you might be interested in Zoie.

zoie: real-time search and indexing system built on Apache Lucene
http://code.google.com/p/zoie/

Zoie is a realtime search project for Lucene by LinkedIn. Basically, I think it is a similar technique to Otis's trick:

In the meantime you can use the trick of one large and less frequently updated core and one small and more frequently updated core + distributed search across them.
Otis

Genta
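The three-loader arrangement described above can be illustrated with a toy model. This is only a sketch of the general pattern, not Zoie's actual API: plain dictionaries stand in for the RAMDirectory and FSDirectory indexes, a deque stands in for the batching queue, and the class name TwoTierIndex and the batch_size parameter are hypothetical.

```python
from collections import deque


class TwoTierIndex:
    """Toy model: an in-memory index in front of a batched on-disk index."""

    def __init__(self, batch_size=3):
        self.batch_size = batch_size  # hypothetical flush threshold
        self.ram_index = {}           # stands in for the RAMDirectory tier
        self.disk_index = {}          # stands in for the FSDirectory tier
        self.pending = deque()        # stands in for the queue of batched events

    def consume(self, doc_id, text):
        # New documents become searchable immediately through the RAM tier.
        self.ram_index[doc_id] = text
        self.pending.append((doc_id, text))
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        # Move the queued batch to the disk tier, then drop it from RAM.
        while self.pending:
            doc_id, text = self.pending.popleft()
            self.disk_index[doc_id] = text
            self.ram_index.pop(doc_id, None)

    def search(self, term):
        # Query both tiers; the RAM copy wins when a document is in both.
        merged = {**self.disk_index, **self.ram_index}
        return sorted(
            doc_id for doc_id, text in merged.items()
            if term in text.lower().split()
        )
```

The point of the design is that a search always sees the union of both tiers, so a document is visible as soon as it is consumed, while the expensive disk writes happen in batches.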
Re: Realtime Searching..
Yes, the two are similar. As a matter of fact, Zoie is one of the case studies you'll find in the soon-to-be-published Lucene in Action, 2nd edition. I reviewed this very informative case study a few weeks ago. I think people will like it and will likely end up using Zoie until we get true near-real-time search added to Lucene and then Solr.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message -----
From: Genta Kaneyama pengui...@gmail.com
To: solr-user@lucene.apache.org
Sent: Thursday, February 19, 2009 10:44:37 PM
Subject: Re: Realtime Searching..

Michael,

I think you might be interested in Zoie.

zoie: real-time search and indexing system built on Apache Lucene
http://code.google.com/p/zoie/

Zoie is a realtime search project for Lucene by LinkedIn. Basically, I think it is a similar technique to Otis's trick:

In the meantime you can use the trick of one large and less frequently updated core and one small and more frequently updated core + distributed search across them.
Otis

Genta
Realtime Searching..
I need to find a solution for our current social application. It's low traffic now because we are early on. However, I'm expecting to grow and want to be prepared. We have messages of different types that are aggregated into one stream. Each of these message types has very different data, so our main queries have a few unions and many joins. I know that Solr would work great for searching, but we need a realtime system (Twitter-like) to view user updates. I'm not interested in a few minutes' delay; I need something that is fast to update, searchable, and has n columns per record/document. Can Solr do this? What is Ocean?

Thanks
Re: Realtime Searching..
Michael,

The short answer is that Solr is not there yet, but will be. Expect to see real-time search in Lucene first, then in Solr. We have a case study about real-time search with Lucene in the upcoming Lucene in Action 2, but a more tightly integrated real-time search will be added to Lucene down the road (and then Solr).

In the meantime you can use the trick of one large and less frequently updated core and one small and more frequently updated core + distributed search across them.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

From: Michael Austin mausti...@gmail.com
To: solr-user@lucene.apache.org
Sent: Friday, February 6, 2009 1:02:43 PM
Subject: Realtime Searching..

I need to find a solution for our current social application. It's low traffic now because we are early on. However, I'm expecting to grow and want to be prepared. We have messages of different types that are aggregated into one stream. Each of these message types has very different data, so our main queries have a few unions and many joins. I know that Solr would work great for searching, but we need a realtime system (Twitter-like) to view user updates. I'm not interested in a few minutes' delay; I need something that is fast to update, searchable, and has n columns per record/document. Can Solr do this? What is Ocean?

Thanks
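The two-core trick can be sketched as a single distributed query. Solr's distributed search takes a `shards` request parameter listing the cores to fan out to; everything else here is an assumption for illustration: the host and port, the core names "archive" (large, rarely updated) and "recent" (small, frequently updated), and the helper name distributed_query_url.

```python
from urllib.parse import urlencode


def distributed_query_url(base_url, shards, query, rows=10):
    """Build a Solr select URL that searches several cores at once."""
    params = {
        "q": query,
        "rows": rows,
        # Comma-separated list of cores to query; results are merged by Solr.
        "shards": ",".join(shards),
    }
    # Keep ':' '/' and ',' unescaped so the shard list stays readable.
    return base_url + "/select?" + urlencode(params, safe=":/,")


url = distributed_query_url(
    "http://localhost:8983/solr/recent",      # hypothetical core receiving the request
    ["localhost:8983/solr/archive",           # hypothetical large, rarely updated core
     "localhost:8983/solr/recent"],           # hypothetical small, often updated core
    "hello",
)
```

New documents are added (and committed often) only to the small core, so commits stay cheap; they are moved into the large core on a slower schedule.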
Re: Realtime Searching..
Thanks Otis,

Is it possible to get my hands on this capability in Lucene through patches before it is released to the public? (Sorry to ask.) How close is it in the source code, if I don't care about documentation, packaging, etc.?

So from what it sounds like, this would be a realtime store (with great search) that could be used instead of a database or in conjunction with one? Is it wrong to say it's similar to Google's Bigtable in keeping realtime data in a non-relational way, but with better search?

Thanks

On Fri, Feb 6, 2009 at 1:50 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote:

Michael,

The short answer is that Solr is not there yet, but will be. Expect to see real-time search in Lucene first, then in Solr. We have a case study about real-time search with Lucene in the upcoming Lucene in Action 2, but a more tightly integrated real-time search will be added to Lucene down the road (and then Solr).

In the meantime you can use the trick of one large and less frequently updated core and one small and more frequently updated core + distributed search across them.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
Re: Realtime Searching..
Just to back up and think about whether Solr/Lucene realtime updating is what I want to begin with: would this be something that a Twitter-type system might use to be more scalable and fast? Let's just say that I have a site with as much message traffic as Twitter and I want to be able to update and search fast, in realtime. Would this be the path you would initially send me down? For example, do you know of a system out there that does memcached-type fast caching and lookup but can also look items up with sorting and filtering?

Thanks
Re: Realtime Searching..
Michael,

There is no single system that will provide Twitter-like functionality. You'd have to look into Lucene/Solr for searching, memcached (for example) for caching, maybe a caching layer in front of Solr (e.g. Varnish, Squid, Apache), something to store the data in (e.g. RDBMS, HBase, HDFS, depending on your precise needs), etc.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

From: Michael Austin mausti...@gmail.com
To: solr-user@lucene.apache.org
Sent: Friday, February 6, 2009 3:18:44 PM
Subject: Re: Realtime Searching..

Just to back up and think about whether Solr/Lucene realtime updating is what I want to begin with: would this be something that a Twitter-type system might use to be more scalable and fast? Let's just say that I have a site with as much message traffic as Twitter and I want to be able to update and search fast, in realtime. Would this be the path you would initially send me down? For example, do you know of a system out there that does memcached-type fast caching and lookup but can also look items up with sorting and filtering?

Thanks
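One way to picture a cache sitting in front of the search layer is the cache-aside pattern. The sketch below is only an illustration under stated assumptions: a plain dictionary stands in for memcached, the backend is a hypothetical callable that takes a query string and returns results (in practice, a call to Solr), and the class name CachedSearch and the max_entries limit are made up for the example.

```python
class CachedSearch:
    """Cache-aside wrapper: answer from the cache, fall back to the backend."""

    def __init__(self, backend, max_entries=1000):
        self.backend = backend          # hypothetical callable: query -> results
        self.cache = {}                 # stands in for memcached
        self.max_entries = max_entries  # hypothetical size limit

    def search(self, query):
        if query in self.cache:
            return self.cache[query]
        results = self.backend(query)
        if len(self.cache) >= self.max_entries:
            # Evict the oldest inserted entry (dicts preserve insertion order).
            self.cache.pop(next(iter(self.cache)))
        self.cache[query] = results
        return results

    def invalidate(self, query):
        # Drop a cached answer, for example after new documents are indexed.
        self.cache.pop(query, None)
```

A cache like this trades freshness for speed, so for realtime updates the cached entries would need to be invalidated or given a short lifetime whenever new messages arrive.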