Re: Opinions: Using Lucene as a thin database

2004-12-14 Thread Daniel Naber
On Tuesday 14 December 2004 20:13, Monsur Hossain wrote:

> My concern is that this just shifts the scaling issue to Lucene, and I
> haven't found much info on how to scale Lucene vertically.

You can easily use MultiSearcher to search over several indices. If you 
want the distribution to be more transparent, have a look at Nutch.
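To give a rough idea of what searching several indices at once involves, here is a minimal Java sketch of the merge step. Each sub-index returns its own scored hits, and the caller keeps the overall best N. It uses only the JDK, not Lucene's MultiSearcher API. The `Hit` class and `topN` method are hypothetical, and the sketch assumes that scores from different indices are directly comparable, which real per-index term statistics may not guarantee.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class MergeHits {
    // One hit from one sub-index: which index it came from, a doc id, and a score.
    static final class Hit {
        final int index;
        final String docId;
        final float score;

        Hit(int index, String docId, float score) {
            this.index = index;
            this.docId = docId;
            this.score = score;
        }

        @Override
        public String toString() {
            return index + ":" + docId + "=" + score;
        }
    }

    // Gathers every sub-index's hits and keeps the n best by descending score.
    // Assumes scores from different indices are comparable.
    static List<Hit> topN(List<List<Hit>> perIndex, int n) {
        List<Hit> all = new ArrayList<>();
        for (List<Hit> hits : perIndex) {
            all.addAll(hits);
        }
        all.sort((a, b) -> Float.compare(b.score, a.score));
        return new ArrayList<>(all.subList(0, Math.min(n, all.size())));
    }

    public static void main(String[] args) {
        List<Hit> indexA = Arrays.asList(new Hit(0, "a1", 0.9f), new Hit(0, "a2", 0.4f));
        List<Hit> indexB = Arrays.asList(new Hit(1, "b1", 0.7f), new Hit(1, "b2", 0.6f));
        System.out.println(topN(Arrays.asList(indexA, indexB), 3));
        // prints [0:a1=0.9, 1:b1=0.7, 1:b2=0.6]
    }
}
```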

Regards
 Daniel

-- 
http://www.danielnaber.de

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


RE: Opinions: Using Lucene as a thin database

2004-12-14 Thread Otis Gospodnetic
Well, one could always partition an index, distribute the pieces
horizontally across multiple 'search servers', and use the built-in
RMI-based (RemoteSearchable) and parallel (ParallelMultiSearcher) search
features.  Nutch uses something similar for search scaling.
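One simple way to decide which 'search server' receives which document is to hash a unique document key. The Java sketch below shows that idea using only JDK classes. `HashPartitioner` and its methods are hypothetical, and this is not necessarily how Nutch or Lucene's remote searching assigns documents.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class HashPartitioner {
    // Picks one of `partitions` servers for a document key. floorMod keeps the
    // result in [0, partitions) even when hashCode() is negative.
    static int partitionFor(String docKey, int partitions) {
        return Math.floorMod(docKey.hashCode(), partitions);
    }

    // Splits a batch of document keys into one list per partition, e.g. to
    // decide which server's index each document should be added to.
    static List<List<String>> split(List<String> keys, int partitions) {
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < partitions; i++) {
            buckets.add(new ArrayList<>());
        }
        for (String key : keys) {
            buckets.get(partitionFor(key, partitions)).add(key);
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("doc-1", "doc-2", "doc-3", "doc-4");
        System.out.println(split(keys, 2));
    }
}
```

One design caveat: with plain modulo hashing, changing the number of partitions moves most documents, so adding servers would likely mean re-indexing.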

Otis


--- Monsur Hossain <[EMAIL PROTECTED]> wrote:

> > My concern is that this just shifts the scaling issue to 
> > Lucene, and I haven't found much info on how to scale Lucene 
> > vertically.  
> 
> By "vertically", of course, I meant "horizontally".  Basically scaling
> it across servers as one might do with a relational database.





RE: Opinions: Using Lucene as a thin database

2004-12-14 Thread Otis Gospodnetic
You can see a Flickr-like tag (lookup) system at my Simpy site
(http://www.simpy.com).  It uses Lucene as the backend for lookups, but
still uses an RDBMS as the primary storage.

I find that keeping the RDBMS and the Lucene indices in sync is a bit of a
pain and error-prone, so a _thin_ storage layer with simple requirements
will be okay with just Lucene, while applications with more complex
domain models will quickly run into limitations (a "using the wrong tool
for the job" type of problem).

Otis

--- Monsur Hossain <[EMAIL PROTECTED]> wrote:

> I think this is a great idea, and one that I've been mulling over to
> implement keyword lookups (similar to Flickr.com's tag system).  I
> believe the advantage over a relational database comes from Lucene's
> inverted index, which is highly optimized for this kind of lookup.  
> 
> My concern is that this just shifts the scaling issue to Lucene, and I
> haven't found much info on how to scale Lucene vertically.




RE: Opinions: Using Lucene as a thin database

2004-12-14 Thread Monsur Hossain
I think this is a great idea, and one that I've been mulling over to
implement keyword lookups (similar to Flickr.com's tag system).  I
believe the advantage over a relational database comes from Lucene's
inverted index, which is highly optimized for this kind of lookup.  
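To make the inverted-index point concrete: a lookup for items carrying two tags can be answered by intersecting the two tags' sorted lists of document numbers. This Java sketch shows a merge-style intersection over plain arrays. The class name and data are made up, and Lucene's own conjunction code is more elaborate than this.

```java
import java.util.ArrayList;
import java.util.List;

class PostingIntersect {
    // Intersects two ascending lists of document numbers in O(m + n) time,
    // one way an inverted index can answer "tag:a AND tag:b".
    static List<Integer> intersect(int[] a, int[] b) {
        List<Integer> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < a.length && j < b.length) {
            if (a[i] == b[j]) {
                out.add(a[i]);
                i++;
                j++;
            } else if (a[i] < b[j]) {
                i++; // advance the list that is behind
            } else {
                j++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] taggedJava = {1, 4, 7, 9, 12};
        int[] taggedLucene = {2, 4, 9, 15};
        System.out.println(intersect(taggedJava, taggedLucene)); // [4, 9]
    }
}
```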

My concern is that this just shifts the scaling issue to Lucene, and I
haven't found much info on how to scale Lucene vertically.  




> -Original Message-
> From: Kevin L. Cobb [mailto:[EMAIL PROTECTED] 
> Sent: Tuesday, December 14, 2004 9:40 AM
> To: [EMAIL PROTECTED]
> Subject: Opinions: Using Lucene as a thin database
> 
> 
> I use Lucene as a legitimate search engine which is cool. 
> But, I am also using it as a simple database too. I build an 
> index with a couple of keyword fields that allows me to 
> retrieve values based on exact matches in those fields. This 
> is all I need to do so it works just fine for my needs. I 
> also love the speed. The index is small enough that it is 
> wicked fast. Was wondering if anyone out there was doing the 
> same, or if there are any dissenting opinions on using Lucene
> for this purpose. 





Re: Opinions: Using Lucene as a thin database

2004-12-14 Thread Chris Hostetter
: select * from MY_TABLE where MY_NUMERIC_FIELD > 80
:
: as far as I know you have only the range query so you will have to say
:
: my_numeric_field:[80 TO ??]
: but this would not work for the above-mentioned example, or am I missing something?

RangeQuery allows an open-ended range; you can tell the QueryParser to
leave your range open-ended by using the keyword "null", i.e.:

my_numeric_field:[80 TO null]
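Note that RangeQuery compares terms as strings, so an open-ended range over numbers only behaves numerically if the numbers are indexed in a form whose lexicographic order matches numeric order, typically zero-padded to a fixed width. The Java sketch below illustrates this with a `TreeMap` standing in for the sorted term dictionary. The 10-digit width and the `pad` and `atLeast` helpers are assumptions for illustration, and negative numbers would need a different encoding.

```java
import java.util.Collection;
import java.util.Locale;
import java.util.TreeMap;

class PaddedRange {
    // Fixed width; 10 digits fits every non-negative int (Integer.MAX_VALUE has 10).
    static final int WIDTH = 10;

    // Left-pads with zeros so that string order matches numeric order.
    static String pad(int value) {
        if (value < 0) {
            throw new IllegalArgumentException("this sketch handles non-negative values only");
        }
        return String.format(Locale.ROOT, "%0" + WIDTH + "d", value);
    }

    // Open-ended range "[lower TO null]": every term from pad(lower) upward, inclusive.
    // The TreeMap stands in for a term dictionary sorted by term text.
    static Collection<String> atLeast(TreeMap<String, String> terms, int lower) {
        return terms.tailMap(pad(lower)).values();
    }

    public static void main(String[] args) {
        TreeMap<String, String> terms = new TreeMap<>();
        for (int v : new int[] {5, 80, 95, 100, 1000}) {
            terms.put(pad(v), "doc-" + v);
        }
        System.out.println(atLeast(terms, 80)); // [doc-80, doc-95, doc-100, doc-1000]
        // Without padding, "100" sorts before "80", so it would be missed.
        System.out.println("100".compareTo("80") < 0); // true
    }
}
```

With this encoding, an explicit upper bound such as pad(Integer.MAX_VALUE) also sorts correctly, because it is the largest 10-digit value.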



-Hoss





RE: Opinions: Using Lucene as a thin database

2004-12-14 Thread Monsur Hossain
> My concern is that this just shifts the scaling issue to 
> Lucene, and I haven't found much info on how to scale Lucene 
> vertically.  

By "vertically", of course, I meant "horizontally".  Basically scaling
it across servers as one might do with a relational database.






Re: Opinions: Using Lucene as a thin database

2004-12-14 Thread petite_abeille
On Dec 14, 2004, at 15:40, Kevin L. Cobb wrote:
> Was wondering if anyone out there was doing the same, or if
> there are any dissenting opinions on using Lucene for this purpose.
ZOE [1] [2] takes the same approach and uses Lucene as a relational
engine of sorts.

However, for both practical and ideological reasons, it does not store
any raw data in the Lucene indices themselves but instead uses JDBM [3]
for that purpose.

All things considered, update issues aside, Lucene turns out to be a 
very flexible "thin database".
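The split PA describes, with Lucene holding only what is needed to find records and a separate store (JDBM in ZOE's case) holding the raw data, can be sketched as follows. Two `HashMap`s stand in for the index and the store. The class and method names are hypothetical, and this is not ZOE's code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class IndexPlusStore {
    // Stand-in for the Lucene index: keyword -> ids of matching records.
    private final Map<String, List<String>> index = new HashMap<>();
    // Stand-in for the primary store (JDBM, an RDBMS, ...): id -> raw record.
    private final Map<String, String> store = new HashMap<>();

    // Re-putting an existing id is not handled in this sketch.
    void put(String id, String keyword, String record) {
        store.put(id, record);
        index.computeIfAbsent(keyword, k -> new ArrayList<>()).add(id);
    }

    // The index answers "which ids match?"; the store supplies the data.
    List<String> find(String keyword) {
        List<String> records = new ArrayList<>();
        for (String id : index.getOrDefault(keyword, Collections.<String>emptyList())) {
            String record = store.get(id);
            if (record != null) { // an id the store no longer has is skipped
                records.add(record);
            }
        }
        return records;
    }

    public static void main(String[] args) {
        IndexPlusStore db = new IndexPlusStore();
        db.put("m1", "inbox", "Hello");
        db.put("m2", "inbox", "Re: Hello");
        System.out.println(db.find("inbox")); // [Hello, Re: Hello]
    }
}
```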

Cheers,
PA.
[1] http://zoe.nu/
[2] http://cvs.sourceforge.net/viewcvs.py/zoe/ZOE/Frameworks/SZObject/
[3] http://jdbm.sourceforge.net/



Re: Opinions: Using Lucene as a thin database

2004-12-14 Thread Praveen Peddi
Hmm. So far all our fields are just strings. But I would guess you should be 
able to use Integer.MAX_VALUE or something on the upper bound. Or there 
might be a better way of doing it.

Praveen
- Original Message - 
From: "Akmal Sarhan" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Tuesday, December 14, 2004 10:23 AM
Subject: Re: Opinions: Using Lucene as a thin database


that sounds very interesting but how do you handle queries like
select * from MY_TABLE where MY_NUMERIC_FIELD > 80
as far as I know you have only the range query so you will have to say
my_numeric_field:[80 TO ??]
but this would not work for the above-mentioned example, or am I missing something?
regards
Akmal


Re: Opinions: Using Lucene as a thin database

2004-12-14 Thread Akmal Sarhan
That sounds very interesting, but how do you handle queries like
select * from MY_TABLE where MY_NUMERIC_FIELD > 80

As far as I know you only have the range query, so you would have to say

my_numeric_field:[80 TO ??]
but this would not work for the above-mentioned example, or am I missing something?

regards

Akmal
On Tue, 14.12.2004, Praveen Peddi wrote at 16:07:
> We also use Lucene for a similar purpose, except that we index and store
> quite a few fields. In fact, I also update partial documents as people
> suggested. I store all the indexed fields so I don't have to build the
> whole document again while updating a partial document. The reason we
> do this is speed: I found that Lucene search on a million objects is 4
> to 5 times faster than our Oracle queries (of course, this might be due
> to our pitiful database design :) ). It works great so far. The only
> caveat we had until now was incremental updates, but now I am
> implementing real-time updates so that the data in the Lucene index is
> almost always in sync with the data in the database. So now, our search
> does not go to the database at all.
> 
> Praveen





Re: Opinions: Using Lucene as a thin database

2004-12-14 Thread Nader Henein
How big do you expect it to get, and how often do you expect to update
it? We've been using Lucene for about 1M records (19 fields each) with
incremental updates every 10 minutes. The performance during updates
wasn't wonderful, and it took some seriously intense code to sort that
out. As you mentioned, it comes down to what you need the thin DB for:
Lucene is a wonderful search engine, but if I were looking for a fast and
dirty relational DB, MySQL wins hands down. Put them both together and
you've really got something.
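Nader doesn't describe how his update code works. One common way to make frequent incremental updates cheaper is to buffer changes and apply them in batches, keeping only the latest change per record. The Java sketch below assumes that approach. `UpdateBuffer` and its methods are hypothetical, and applying the drained batch to the index is left to the caller.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class UpdateBuffer {
    // Pending changes keyed by record id. A later change to the same id replaces
    // the earlier one, so each flush touches a changed record only once.
    // A null value marks a deletion.
    private final Map<String, String> pending = new LinkedHashMap<>();

    synchronized void update(String id, String newContent) {
        pending.put(id, newContent);
    }

    synchronized void delete(String id) {
        pending.put(id, null);
    }

    // Returns the accumulated batch (e.g. for a job that runs every 10 minutes
    // and applies it to the index) and starts a fresh, empty buffer.
    synchronized Map<String, String> drain() {
        Map<String, String> batch = new LinkedHashMap<>(pending);
        pending.clear();
        return batch;
    }

    public static void main(String[] args) {
        UpdateBuffer buffer = new UpdateBuffer();
        buffer.update("rec-1", "first draft");
        buffer.update("rec-2", "hello");
        buffer.update("rec-1", "second draft");
        buffer.delete("rec-2");
        System.out.println(buffer.drain()); // {rec-1=second draft, rec-2=null}
    }
}
```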

My 2 cents
Nader Henein
Kevin L. Cobb wrote:
> I use Lucene as a legitimate search engine, which is cool. But I am also
> using it as a simple database too. I build an index with a couple of
> keyword fields that allows me to retrieve values based on exact matches
> in those fields. This is all I need to do, so it works just fine for my
> needs. I also love the speed. The index is small enough that it is
> wicked fast. Was wondering if anyone out there was doing the same, or if
> there are any dissenting opinions on using Lucene for this purpose.




 



Re: Opinions: Using Lucene as a thin database

2004-12-14 Thread Praveen Peddi
We also use Lucene for a similar purpose, except that we index and store
quite a few fields. In fact, I also update partial documents as people
suggested. I store all the indexed fields so I don't have to build the
whole document again while updating a partial document. The reason we
do this is speed: I found that Lucene search on a million objects is 4
to 5 times faster than our Oracle queries (of course, this might be due
to our pitiful database design :) ). It works great so far. The only
caveat we had until now was incremental updates, but now I am
implementing real-time updates so that the data in the Lucene index is
almost always in sync with the data in the database. So now, our search
does not go to the database at all.
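Lucene has no in-place update, so changing one field means deleting the old document and adding a new one. Storing every indexed field, as Praveen does, lets the new document be rebuilt from the old one. The sketch below models that with JDK maps only. Representing a document as a `Map` and the method names are assumptions, not Lucene's API.

```java
import java.util.HashMap;
import java.util.Map;

class PartialUpdate {
    // Stand-in for the index: id -> stored fields of the current document version.
    private final Map<String, Map<String, String>> docs = new HashMap<>();

    void add(String id, Map<String, String> fields) {
        docs.put(id, new HashMap<>(fields));
    }

    // Because every field is stored, the full document can be rebuilt from the
    // old version plus the changed fields; the old version is then replaced
    // (in Lucene terms: delete the old document, add the rebuilt one).
    void updateFields(String id, Map<String, String> changes) {
        Map<String, String> old = docs.get(id);
        if (old == null) {
            throw new IllegalArgumentException("unknown id: " + id);
        }
        Map<String, String> rebuilt = new HashMap<>(old);
        rebuilt.putAll(changes);
        docs.remove(id);       // "delete" the old document
        docs.put(id, rebuilt); // "add" the rebuilt one
    }

    Map<String, String> get(String id) {
        return docs.get(id);
    }

    public static void main(String[] args) {
        PartialUpdate index = new PartialUpdate();
        Map<String, String> fields = new HashMap<>();
        fields.put("title", "Widget");
        fields.put("price", "10");
        index.add("p1", fields);
        Map<String, String> change = new HashMap<>();
        change.put("price", "12");
        index.updateFields("p1", change);
        System.out.println(index.get("p1").get("price")); // 12
    }
}
```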

Praveen
- Original Message - 
From: "Kevin L. Cobb" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Tuesday, December 14, 2004 9:40 AM
Subject: Opinions: Using Lucene as a thin database

I use Lucene as a legitimate search engine which is cool. But, I am also
using it as a simple database too. I build an index with a couple of
keyword fields that allows me to retrieve values based on exact matches
in those fields. This is all I need to do so it works just fine for my
needs. I also love the speed. The index is small enough that it is
wicked fast. Was wondering if anyone out there was doing the same, or if
there are any dissenting opinions on using Lucene for this purpose.






Re: Opinions: Using Lucene as a thin database

2004-12-13 Thread Erik Hatcher
On Dec 14, 2004, at 9:40 AM, Kevin L. Cobb wrote:
> I use Lucene as a legitimate search engine, which is cool. But I am also
> using it as a simple database too. I build an index with a couple of
> keyword fields that allows me to retrieve values based on exact matches
> in those fields. This is all I need to do, so it works just fine for my
> needs. I also love the speed. The index is small enough that it is
> wicked fast. Was wondering if anyone out there was doing the same, or if
> there are any dissenting opinions on using Lucene for this purpose.
I use Lucene as the complete data storage for my blog at 
http://www.blogscene.org/erik - all HTTP requests map to a Lucene query 
(based on the path and optional query parameter).   I've been lame and 
have never put any caching in there.
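Erik doesn't show how paths become queries. A minimal sketch of one possible mapping, assuming one hypothetical field per path level (`category`, `name`) and path segments without query-syntax special characters, builds a query-parser string of required clauses:

```java
import java.util.ArrayList;
import java.util.List;

class PathToQuery {
    // Hypothetical field names, one per path level: /<category>/<name>
    static final String[] FIELDS = {"category", "name"};

    // Maps a request path plus an optional search string to required clauses in
    // query-parser syntax, e.g. "/java/lucene" -> "+category:java +name:lucene".
    // Assumes path segments contain no query-syntax special characters.
    static String toQuery(String path, String search) {
        List<String> clauses = new ArrayList<>();
        int level = 0;
        for (String segment : path.split("/")) {
            if (segment.isEmpty()) {
                continue; // leading "/" or "//"
            }
            if (level >= FIELDS.length) {
                break; // deeper levels are ignored in this sketch
            }
            clauses.add("+" + FIELDS[level] + ":" + segment);
            level++;
        }
        if (search != null && !search.trim().isEmpty()) {
            clauses.add("+(" + search.trim() + ")");
        }
        return String.join(" ", clauses);
    }

    public static void main(String[] args) {
        System.out.println(toQuery("/java/lucene", "range query"));
        // prints +category:java +name:lucene +(range query)
    }
}
```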

I'm about to start a new project that really needs a relational 
database under the covers, but I'm cringing at the headaches involved 
compared to the joys of using Lucene.

Erik


RE: Opinions: Using Lucene as a thin database

2004-12-13 Thread Kevin L. Cobb
I don't need range-type selects, i.e. the only operator I need is
equals: select * from MY_TABLE where MY_NUMERIC_FIELD = 80.

My searchable fields are always of type Keyword. I believe this forces
the match to be exact, so thinking about it in anything other than
"equals" terms would, I believe, be a mistake.

In any case, I believe that using Lucene as a "thin DB" implies that
your database selects are fairly simple and straightforward.
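The difference between an untokenized keyword field and a tokenized one can be sketched without Lucene: a keyword field indexes the whole value as a single term, while an analyzer splits and normalizes text into several terms. The tokenizer below is a crude, hypothetical stand-in for a real analyzer, used only to show why a keyword lookup behaves like SQL `=`.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

class KeywordVsTokenized {
    // Keyword (untokenized) field: the whole value is one term, so only the
    // exact, case-sensitive value matches, like SQL "=".
    static List<String> keywordTerms(String value) {
        return Collections.singletonList(value);
    }

    // Crude, hypothetical stand-in for an analyzer: lowercase, then split on
    // anything that is not a letter or digit.
    static List<String> tokenizedTerms(String value) {
        return Arrays.asList(value.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+"));
    }

    // A term lookup matches when the query term equals one of the indexed terms.
    static boolean matches(List<String> indexedTerms, String queryTerm) {
        return indexedTerms.contains(queryTerm);
    }

    public static void main(String[] args) {
        System.out.println(matches(keywordTerms("SKU-80"), "SKU-80"));   // true
        System.out.println(matches(keywordTerms("SKU-80"), "80"));       // false
        System.out.println(matches(tokenizedTerms("SKU-80"), "80"));     // true
        System.out.println(matches(tokenizedTerms("SKU-80"), "SKU-80")); // false
    }
}
```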

KLCobb

 
 

-Original Message-
From: Akmal Sarhan [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, December 14, 2004 10:24 AM
To: Lucene Users List
Subject: Re: Opinions: Using Lucene as a thin database

that sounds very interesting but how do you handle queries like
select * from MY_TABLE where MY_NUMERIC_FIELD > 80

as far as I know you have only the range query so you will have to say

my_numeric_field:[80 TO ??]
but this would not work for the above-mentioned example, or am I missing something?

regards

Akmal
On Tue, 14.12.2004, Praveen Peddi wrote at 16:07:
> We also use Lucene for a similar purpose, except that we index and store
> quite a few fields. In fact, I also update partial documents as people
> suggested. I store all the indexed fields so I don't have to build the
> whole document again while updating a partial document. The reason we
> do this is speed: I found that Lucene search on a million objects is 4
> to 5 times faster than our Oracle queries (of course, this might be due
> to our pitiful database design :) ). It works great so far. The only
> caveat we had until now was incremental updates, but now I am
> implementing real-time updates so that the data in the Lucene index is
> almost always in sync with the data in the database. So now, our search
> does not go to the database at all.
> 
> Praveen





Opinions: Using Lucene as a thin database

2004-12-13 Thread Kevin L. Cobb
I use Lucene as a legitimate search engine, which is cool. But I am also
using it as a simple database too. I build an index with a couple of
keyword fields that allows me to retrieve values based on exact matches
in those fields. This is all I need to do, so it works just fine for my
needs. I also love the speed. The index is small enough that it is
wicked fast. Was wondering if anyone out there was doing the same, or if
there are any dissenting opinions on using Lucene for this purpose.
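The kind of lookup Kevin describes, with exact matches on a couple of keyword fields returning stored values, can be modeled with an inverted index over (field, value) pairs. The Java sketch below is a hypothetical, in-memory illustration built on JDK collections, not Lucene's API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ThinLookup {
    // Each document is a map of field name to stored value.
    private final List<Map<String, String>> docs = new ArrayList<>();
    // Inverted index over exact (field, value) pairs -> document numbers.
    private final Map<String, List<Integer>> postings = new HashMap<>();

    private static String term(String field, String value) {
        return field + "\u0000" + value; // separator keeps ("ab","c") distinct from ("a","bc")
    }

    void add(Map<String, String> fields) {
        int docNum = docs.size();
        docs.add(new HashMap<>(fields));
        for (Map.Entry<String, String> e : fields.entrySet()) {
            postings.computeIfAbsent(term(e.getKey(), e.getValue()), t -> new ArrayList<>()).add(docNum);
        }
    }

    // Rough equivalent of: select <wanted> from docs where <field> = <value>
    List<String> lookup(String field, String value, String wanted) {
        List<String> out = new ArrayList<>();
        for (int docNum : postings.getOrDefault(term(field, value), Collections.<Integer>emptyList())) {
            out.add(docs.get(docNum).get(wanted));
        }
        return out;
    }

    public static void main(String[] args) {
        ThinLookup db = new ThinLookup();
        Map<String, String> a = new HashMap<>();
        a.put("code", "A1");
        a.put("label", "Alpha");
        db.add(a);
        System.out.println(db.lookup("code", "A1", "label")); // [Alpha]
    }
}
```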