Re: Realtime Searching..

2009-03-25 Thread John Wang
Hi Jon:
We are running various LinkedIn search systems on Zoie in production.

-John


Re: Realtime Searching..

2009-03-25 Thread Otis Gospodnetic

Would it not make more sense to wait for Lucene's IW+IR (IndexWriter +
IndexReader) marriage and the other things happening in core Lucene that will
make near-real-time search possible?

 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch






Re: Realtime Searching..

2009-02-19 Thread Genta Kaneyama
Michael,

I think you might be interested in Zoie.

zoie: real-time search and indexing system built on Apache Lucene
 http://code.google.com/p/zoie/

Zoie is a real-time search project for Lucene by LinkedIn.
Basically, I think it uses a technique similar to Otis's trick:

 In the meantime you can use the trick of one large and less frequently
 updated core and one small and more frequently updated core + distributed
 search across them.

 Otis

Genta





Re: Realtime Searching..

2009-02-19 Thread Jon Baer

This part:

The part of Zoie that enables real-time searchability is the fact that
ZoieSystem contains three IndexDataLoader objects:

* a RAMLuceneIndexDataLoader, which is a simple wrapper around a
  RAMDirectory,
* a DiskLuceneIndexDataLoader, which can index directly to the
  FSDirectory (followed by an optimize() call if a specified
  optimizeDuration has been exceeded) in batches via an intermediary
* BatchedIndexDataLoader, whose primary job is to queue up and
  batch DataEvents that need to be flushed to disk

Sounds like it might be (or can be) layered into Solr somehow. Has
anyone been using this project or testing it?


- Jon
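
The arrangement quoted above (a small in-memory index that is searchable at once, plus batches flushed to a larger on-disk index) can be illustrated with a toy model. This is only a sketch under stated assumptions: it uses neither Lucene nor Zoie, and the names TinyIndex, TwoTierIndex, consume and batch_size are hypothetical, made up for illustration.

```python
from collections import defaultdict


class TinyIndex:
    """Toy inverted index: term -> set of document ids (not Lucene)."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def search(self, term):
        # .get() avoids creating empty entries for unknown terms.
        return set(self.postings.get(term.lower(), set()))


class TwoTierIndex:
    """Hypothetical stand-in for a RAM tier plus a batched 'disk' tier."""

    def __init__(self, batch_size=3):
        self.ram = TinyIndex()    # small, updated on every event
        self.disk = TinyIndex()   # stands in for the on-disk index
        self.pending = []         # events queued for the next flush
        self.batch_size = batch_size

    def consume(self, doc_id, text):
        # New documents are searchable immediately through the RAM tier.
        self.ram.add(doc_id, text)
        self.pending.append((doc_id, text))
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        # Write the whole batch to the "disk" tier, then reset the RAM tier.
        for doc_id, text in self.pending:
            self.disk.add(doc_id, text)
        self.pending.clear()
        self.ram = TinyIndex()

    def search(self, term):
        # A query sees the union of both tiers.
        return self.ram.search(term) | self.disk.search(term)
```

With batch_size=2, a first document is found through the RAM tier only; after the second document arrives the batch is flushed and both are found through the disk tier. A real system would also have to handle deletes, updates and concurrent readers, which this sketch leaves out.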






Re: Realtime Searching..

2009-02-19 Thread Otis Gospodnetic

Yes, the two are similar.  As a matter of fact, Zoie is one of the case studies
you'll find in the soon-to-be-published Lucene in Action, 2nd edition.  I
reviewed this very informative case study a few weeks ago, and I think people
will like it and will likely end up using Zoie until we get true near-real-time
search added to Lucene and then Solr.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



 



Realtime Searching..

2009-02-06 Thread Michael Austin
I need to find a solution for our current social application. Traffic is low
now because we are early on, but I'm expecting growth and want to be prepared
for it.  We have messages of different types that are aggregated into one
stream. Each message type has very different data, so our main queries have a
few unions and many joins.  I know that Solr would work great for searching,
but we need a real-time system (Twitter-like) to view user updates.  A delay
of a few minutes won't do; I need something that updates fast, is searchable,
and supports n columns per record/document. Can Solr do this? And what is
Ocean?

Thanks


Re: Realtime Searching..

2009-02-06 Thread Otis Gospodnetic
Michael,

The short answer is that Solr is not there yet, but will be.  Expect to see 
real-time search in Lucene first, then in Solr.
We have a case study about real-time search with Lucene in the upcoming Lucene 
in Action 2, but a more tightly integrated real-time search will be added to 
Lucene down the road (and then Solr).

In the meantime you can use the trick of one large and less frequently updated
core and one small and more frequently updated core + distributed search across
them.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
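
For readers who want to try the two-core trick, here is a minimal sketch of how such a query might be addressed. It relies on Solr's `shards` request parameter for distributed search; the host, port and the core names "archive" and "recent" are assumptions made up for this example, as is the helper name distributed_query_url.

```python
from urllib.parse import urlencode


def distributed_query_url(base_url, shards, query, rows=10):
    """Build a Solr select URL that fans the query out to every shard."""
    params = {
        "q": query,
        "rows": rows,
        # Comma-separated host:port/path entries, without the http:// scheme.
        "shards": ",".join(shards),
    }
    return base_url + "/select?" + urlencode(params)


# Hypothetical layout: a large, rarely updated "archive" core and a small,
# frequently updated "recent" core on the same host.
url = distributed_query_url(
    "http://localhost:8983/solr/recent",
    ["localhost:8983/solr/archive", "localhost:8983/solr/recent"],
    "text:lucene",
)
print(url)
```

As far as I know, distributed search expects the unique key to be unique across shards, so each document should live in only one of the two cores; moving documents from the small core into the large one on a schedule is left to the application.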







Re: Realtime Searching..

2009-02-06 Thread Michael Austin
Thanks Otis,

Is it possible to get my hands on this capability in Lucene through patches
before it is released to the public? (Sorry to ask.) How close is it in the
source code, if I don't care about documentation, packaging, etc.?  From what
it sounds like, this would be a real-time store (with great search) that could
be used instead of a database, or in conjunction with one? Is it wrong to say
it's similar to Google's Bigtable, in that it keeps real-time data in a
non-relational way, but with better search?

Thanks




Re: Realtime Searching..

2009-02-06 Thread Michael Austin
Just to back up and think about whether Solr/Lucene real-time updating is what
I want to begin with...

Would this be something that a Twitter-type system might use to be more
scalable and fast? Let's say I have a site with as much message traffic as
Twitter and I want to be able to update and search in real time.  Is this the
path you would initially send me down?

For example, do you know of a system out there that does memcached-style fast
caching and lookup but can also look items up with sorting and filtering?

Thanks


Re: Realtime Searching..

2009-02-06 Thread Otis Gospodnetic
Michael,

There is no single system that will provide Twitter-like functionality.  You'd
have to look into Lucene/Solr for searching, memcached (for example) for
caching, maybe a caching layer in front of Solr (e.g. Varnish, Squid, Apache),
something to store the data in (e.g. an RDBMS, HBase, or HDFS, depending on
your precise needs), etc.


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
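
One way to picture the "cache in front of search" part of that stack is a cache-aside wrapper. The sketch below is an assumption-laden illustration, not a recommendation of specific software: a plain dict stands in for memcached, the backend is any callable that takes a query string and returns results, and the names CachedSearch, max_entries and invalidate are hypothetical.

```python
class CachedSearch:
    """Cache-aside wrapper around a search backend (illustrative only)."""

    def __init__(self, backend, max_entries=1000):
        self.backend = backend        # callable: query string -> results
        self.cache = {}               # stands in for memcached
        self.max_entries = max_entries
        self.hits = 0
        self.misses = 0

    def search(self, query):
        if query in self.cache:
            self.hits += 1
            return self.cache[query]
        self.misses += 1
        results = self.backend(query)
        if len(self.cache) >= self.max_entries:
            # Crude eviction: drop the entry that was inserted first.
            self.cache.pop(next(iter(self.cache)))
        self.cache[query] = results
        return results

    def invalidate(self):
        # Call after the index changes so stale results are not served.
        self.cache.clear()
```

The trade-off is the usual one: the more often the index is updated, the more often the cache must be invalidated, which is part of why fast-changing data makes caching in front of search harder.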




