Re: Distributed Fulltext?

2002-02-16 Thread David Axmark
On Fri, 2002-02-15 at 02:44, Alex Aulbach wrote: > Wednesday, from David Axmark: > > > Your other point about exact vs. approximate answers is unclear, I expect > > > that Google's answers are exact for their currently available indexes at any > > > given time. But even if they are approximate, I

Re: Distributed Fulltext?

2002-02-14 Thread Alex Aulbach
Wednesday, from David Axmark: > > Your other point about exact vs. approximate answers is unclear, I expect > > that Google's answers are exact for their currently available indexes at any > > given time. But even if they are approximate, I'd be happy with that too. > > The scoring on a FULLTEXT

Re: Distributed Fulltext?

2002-02-14 Thread Alex Aulbach
Wednesday, from Mike Wexler: > I don't think that would be appropriate. My example is our site (tias.com), which has > lots of antiques and collectibles. One popular category is jewelry. If > somebody does a search for "gold jewelry" and the search engine interprets this > as anything that mentions go

Re: Distributed Fulltext?

2002-02-14 Thread Alex Aulbach
Hi, I will also explain how we made FTS "fast". (Sorry for my bad English.) First some DATA: The table which has to be indexed has ~60 entries. It holds articles, which are on average 3-4 KB each (which says nothing!) with about 300 words each (this number is very important!). Thi

Re: Distributed Fulltext?

2002-02-13 Thread Steven Roussey
oduce huge speed increases depending on what and how you do things. 4. If you do #2 and #3 you'll notice that you can have x (10 for us) number of servers partition the FTS. We don't actually do this, but we could and therefore get 'Distributed Fulltext' -- the title of this th

Re: Distributed Fulltext?

2002-02-13 Thread hooker
> While any speed up with a full table fulltext search would be helpful > and useful, there are instances where the search is intersected with > another column and the problem of search is therefore more complex but > also leads to potential optimizations. > > In our case we rarely do searches th

Re: Distributed Fulltext?

2002-02-13 Thread hooker
> Steve Rapaport wrote: > > Someone correctly pointed out today that it's not Mysql's job > > to be Google, and I agree. But it seems to me that it would be > > fair for mysql to be able to handle searches in under 1 second > > for databases 1 millionth the size of Google. All I want here > > is

Re: Distributed Fulltext?

2002-02-13 Thread Mike Wexler
Brian DeFeyter wrote: > On Wed, 2002-02-13 at 16:39, Mike Wexler wrote: > >> >>Brian DeFeyter wrote: >> >>>I sorta like that idea. I don't know exactly what you can and can't do >>>as far as indexing inside of HEAP tables.. but the index size would >>>likely differ from the written index. Then

Re: Distributed Fulltext?

2002-02-13 Thread Brian DeFeyter
On Wed, 2002-02-13 at 16:39, Mike Wexler wrote: > > > Brian DeFeyter wrote: > > I sorta like that idea. I don't know exactly what you can and can't do > > as far as indexing inside of HEAP tables.. but the index size would > > likely differ from the written index. Then you can expand the idea an

Re: Distributed Fulltext?

2002-02-13 Thread Mike Wexler
Brian DeFeyter wrote: > I sorta like that idea. I don't know exactly what you can and can't do > as far as indexing inside of HEAP tables.. but the index size would > likely differ from the written index. Then you can expand the idea and > use the X/(num slices) on (num slices) boxes technique..

Re: Distributed Fulltext?

2002-02-13 Thread Brian DeFeyter
I sorta like that idea. I don't know exactly what you can and can't do as far as indexing inside of HEAP tables.. but the index size would likely differ from the written index. Then you can expand the idea and use the X/(num slices) on (num slices) boxes technique.. sending the query to each, and

Re: Distributed Fulltext?

2002-02-13 Thread Mike Wexler
Steve Rapaport wrote: > > Someone correctly pointed out today that it's not Mysql's job > to be Google, and I agree. But it seems to me that it would be > fair for mysql to be able to handle searches in under 1 second > for databases 1 millionth the size of Google. All I want here > is a dec

Re: Distributed Fulltext?

2002-02-13 Thread Mike Wexler
My understanding is that part of how google and Altavista get such high speeds is to keep everything in memory. Is it possible to create a HEAP table with a full text index? If so, does the full text index take advantage of being in memory? For example, I would imagine that if you were keeping

Re: Distributed Fulltext?

2002-02-13 Thread Steven Roussey
> [comparisons to Google...] While any speed up with a full table fulltext search would be helpful and useful, there are instances where the search is intersected with another column and the problem of search is therefore more complex but also leads to potential optimizations. In our case we rar

Re: Distributed Fulltext?

2002-02-13 Thread David Axmark
On Tue, 2002-02-12 at 15:38, Steve Rapaport wrote: > David Axmark writes: > > > So the standard answer with Apples and Oranges certainly apply here! > > More like Äpplen och Apelsiner, that is, different but similar. You Swedish > guys should know. Thanks for answering, David, I appreciate th

Re: Distributed Fulltext?

2002-02-13 Thread Steve Rapaport
I said: > > Why is it that Altavista can index terabytes overnight and return > > a fulltext boolean for the WHOLE WEB > > within a second, and Mysql takes so long? On Friday 08 February 2002 08:56, Vincent Stoessel wrote: > Apples and oranges. Yeah, I know. But let's see if we can make some d

Re: Distributed Fulltext?

2002-02-13 Thread Steve Rapaport
Ooops, factual error: > > If, say, Google, can search 2 trillion web pages, averaging say 70k > > bytes each, in 1 second, and Mysql can search 22 million records, with > > an index on 40 bytes each, in 3 seconds (my experience) on a good day, > > what's the order of magnitude difference? Roughl

Re: Distributed Fulltext?

2002-02-13 Thread alec . cawley
> Why is it that Altavista can index terabytes overnight and return > a fulltext boolean for the WHOLE WEB > within a second, and Mysql takes so long? I don't know about Altavista, but if you read up on Google, they do indeed do some sort of spreading of keywords across multiple machines - last

Re: Distributed Fulltext?

2002-02-13 Thread Mike Wexler
Steve Rapaport wrote: > On Friday 08 February 2002 06:14 pm, James Montebello wrote: > >>Distribution is how Google gets its speed. You say clustering won't >>solve the problem, but distributing the indices across many processors >>*is* going to gain you a huge speed increase through sheer pa

Re: Distributed Fulltext?

2002-02-13 Thread Tod Harter
On Thursday 07 February 2002 14:53, Brian DeFeyter wrote: > Has anyone made a suggestion or thought about ways to distribute > databases which focus on fulltext indexes? > > fulltext indexes do a good job of indexing a moderate amount of data, > but when you get a lot of data to be indexed, the qu

Re: Distributed Fulltext?

2002-02-13 Thread Brian Bray
It seems to me like the best solution that could be implemented as-is would be to keep a random int column in your table (with a range of say 1-100) and then have fulltext server 1 pseudo-replicate records with a random number in the range of 1-10, server 2 11-20, server 3 21-30, and so
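
The bucket scheme described in this post can be sketched roughly as follows. This is only an illustration in Python, not code from the thread: the function names, the choice of ten equal-width slices, and the constant `NUM_BUCKETS` are assumptions made for the example.

```python
import random

# Assumption for illustration: rows carry a random int in 1..100 and each
# fulltext server owns one contiguous, equal-width slice of that range.
NUM_BUCKETS = 100


def assign_bucket(rng=random):
    """Pick the random int stored with a row at insert time (1..100)."""
    return rng.randint(1, NUM_BUCKETS)


def bucket_range(server, num_servers=10):
    """Return the inclusive (low, high) bucket range owned by a server."""
    if NUM_BUCKETS % num_servers != 0:
        raise ValueError("num_servers must divide the bucket count evenly")
    if not 1 <= server <= num_servers:
        raise ValueError("server number out of range")
    width = NUM_BUCKETS // num_servers
    low = (server - 1) * width + 1
    return low, low + width - 1


def server_for_bucket(bucket, num_servers=10):
    """Map a bucket value to the server (1..num_servers) that indexes it."""
    if NUM_BUCKETS % num_servers != 0:
        raise ValueError("num_servers must divide the bucket count evenly")
    if not 1 <= bucket <= NUM_BUCKETS:
        raise ValueError("bucket out of range")
    width = NUM_BUCKETS // num_servers
    return (bucket - 1) // width + 1
```

With ten servers, buckets 1-10 land on server 1, 11-20 on server 2, and 21-30 on server 3, matching the ranges given in the post. Because the bucket is random, each server should end up holding roughly the same share of the rows.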

Re: Distributed Fulltext?

2002-02-13 Thread James Montebello
I did this at a previous job, and we split the data up more or less this way (we used a pre-existing item number for the split which was essentially random in relation to the text data), with an aggregator that did the query X ways, each to a separate box holding 1/X of the data. The results from
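
A minimal sketch of such an aggregator, in Python, might look like the following. The snippet above is cut off before it says how the per-box results were combined, so merging by descending relevance score is an assumption here, as are all names (`aggregate`, `make_fake_slice`) and the in-memory stand-in for a real slice server.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def make_fake_slice(rows):
    """Build a stand-in for one slice server from (doc_id, text) rows.

    Hypothetical helper: a real slice would run a fulltext query against
    its own database instead of counting words in memory.
    """
    def search(query):
        words = query.lower().split()
        hits = []
        for doc_id, text in rows:
            tokens = text.lower().split()
            score = sum(tokens.count(word) for word in words)
            if score:
                hits.append((score, doc_id))
        return hits
    return search


def aggregate(query, slices, limit=10):
    """Send the query to every slice in parallel and merge the hits.

    Each slice returns (score, doc_id) pairs; the merged result keeps the
    `limit` highest-scoring hits across all slices.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(slices))) as pool:
        per_slice = list(pool.map(lambda search: search(query), slices))
    all_hits = (hit for hits in per_slice for hit in hits)
    return heapq.nlargest(limit, all_hits, key=lambda hit: hit[0])
```

For example, with one slice holding `(1, "gold jewelry gold")` and another holding `(2, "silver ring")` and `(3, "gold ring")`, `aggregate("gold", slices)` returns document 1 ahead of document 3 and omits document 2. Note that scores are only comparable across slices if every slice scores the same way, which is one reason approximate ranking comes up elsewhere in the thread.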

Re: Distributed Fulltext?

2002-02-12 Thread Steve Rapaport
David Axmark writes: > So the standard answer with Apples and Oranges certainly apply here! More like Äpplen och Apelsiner, that is, different but similar. You Swedish guys should know. Thanks for answering, David, I appreciate the attention from a founder. I also appreciate your point that

Re: Distributed Fulltext?

2002-02-12 Thread David Axmark
On Fri, 2002-02-08 at 11:11, Steve Rapaport wrote: > I said: > > > Why is it that Altavista can index terabytes overnight and return > > > a fulltext boolean for the WHOLE WEB > > > within a second, and Mysql takes so long? > > On Friday 08 February 2002 08:56, Vincent Stoessel wrote: > > > Appl

Re: Distributed Fulltext?

2002-02-12 Thread Steve Rapaport
On Friday 08 February 2002 06:14 pm, James Montebello wrote: > Distribution is how Google gets its speed. You say clustering won't > solve the problem, but distributing the indices across many processors > *is* going to gain you a huge speed increase through sheer parallelism. True, but not

Re: Distributed Fulltext?

2002-02-12 Thread George M. Ellenburg
> Last week on Slashdot there was an article where the CEO of Google mentioned he > uses DRAM (solid state disk arrays) rather than hard drives for the indexes and > arrays because of the magnitude of difference in speed they provide. > > There's your 10^6 difference in speed (or part of it). > >

Re: Distributed Fulltext?

2002-02-12 Thread James Montebello
For the slice servers, you simply assume that if one is lost, you lose X% of the data until it is revived, which is usually not even noticeable to the end user. For the aggregators, we had four behind a load-balancer. In practice, we had nearly zero downtime over a roughly 18-month period. james
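
The "lose X% of the data until it is revived" behaviour can be illustrated with a small Python sketch. It is not taken from the post: the function name, the use of `ConnectionError` to signal a dead slice, and the returned coverage figure are all assumptions for the example.

```python
def search_available_slices(query, slices):
    """Query every slice, skipping any that are down.

    `slices` is a list of callables that take a query and return a list of
    hits, raising ConnectionError when the slice server is unreachable
    (an assumption for this sketch). Returns the combined hits plus the
    fraction of slices that answered.
    """
    hits = []
    failed = 0
    for search in slices:
        try:
            hits.extend(search(query))
        except ConnectionError:
            # A lost slice only hides its share of the data; the rest of
            # the result set is still served.
            failed += 1
    coverage = (len(slices) - failed) / len(slices) if slices else 0.0
    return hits, coverage
```

With four slices and one down, the search still returns the hits from the other three and reports a coverage of 0.75, so the caller can decide whether to mention that results may be incomplete.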

Re: Distributed Fulltext?

2002-02-12 Thread Alex Aulbach
Yesterday, from Brian DeFeyter: > Has anyone made a suggestion or thought about ways to distribute > databases which focus on fulltext indexes? > > fulltext indexes do a good job of indexing a moderate amount of data, > but when you get a lot of data to be indexed, the queries slow down > signifi

Re: Distributed Fulltext?

2002-02-12 Thread Brian DeFeyter
> On Friday 08 February 2002 08:56, Vincent Stoessel wrote: > >> Apples and oranges. > > Yeah, I know. But let's see if we can make some distinctions. > If, say, Google, can search 2 trillion web pages, averaging say 70k > bytes each, in 1 second, and Mysql can search 22 million records, with

Re: Distributed Fulltext?

2002-02-11 Thread Steve Rapaport
Also, I have to ask the question: Why is it that Altavista can index terabytes overnight and return a fulltext boolean for the WHOLE WEB within a second, and Mysql takes so long? On Friday 08 February 2002 11:50, Steve Rapaport wrote: > I second the question. It could also reduce the size

Re: Distributed Fulltext?

2002-02-11 Thread Brian DeFeyter
On Thu, 2002-02-07 at 15:40, Tod Harter wrote: [snip] > Wouldn't be too tough to write a little query routing system if you are using > perl. Use DBD::Proxy on the web server side, and just hack the perl proxy > server so it routes the query to several places and returns a single result > set.

Re: Distributed Fulltext?

2002-02-11 Thread Steve Rapaport
I second the question. It could also reduce the size of the fulltext index and the time taken to update it. -steve > On Thursday 07 February 2002 20:53, Brian wrote: > > Has anyone made a suggestion or thought about ways to distribute > > databases which focus on fulltext indexes? > > > > ful

Distributed Fulltext?

2002-02-11 Thread Brian DeFeyter
Has anyone made a suggestion or thought about ways to distribute databases which focus on fulltext indexes? fulltext indexes do a good job of indexing a moderate amount of data, but when you get a lot of data to be indexed, the queries slow down significantly. I have an example table, with about

Re: Distributed Fulltext?

2002-02-10 Thread George M. Ellenburg
> Last week on Slashdot there was an article where the CEO of Google mentioned he > uses DRAM (solid state disk arrays) rather than hard drives for the indexes and > arrays because of the magnitude of difference in speed they provide. > > There's your 10^6 difference in speed (or part of it). > >

Re: Distributed Fulltext?

2002-02-10 Thread Steve Rapaport
On Friday 08 February 2002 06:14 pm, James Montebello wrote: > Distribution is how Google gets its speed. You say clustering won't > solve the problem, but distributing the indices across many processors > *is* going to gain you a huge speed increase through sheer parallelism. True, but not

Re: Distributed Fulltext?

2002-02-08 Thread James Montebello
For the slice servers, you simply assume that if one is lost, you lose X% of the data until it is revived, which is usually not even noticeable to the end user. For the aggregators, we had four behind a load-balancer. In practice, we had nearly zero downtime over a roughly 18-month period. james

Re: Distributed Fulltext?

2002-02-08 Thread Steve Rapaport
Ooops, factual error: > > If, say, Google, can search 2 trillion web pages, averaging say 70k > > bytes each, in 1 second, and Mysql can search 22 million records, with > > an index on 40 bytes each, in 3 seconds (my experience) on a good day, > > what's the order of magnitude difference? Roughl

Re: Distributed Fulltext?

2002-02-08 Thread Alex Aulbach
Yesterday, from Brian DeFeyter: > Has anyone made a suggestion or thought about ways to distribute > databases which focus on fulltext indexes? > > fulltext indexes do a good job of indexing a moderate amount of data, > but when you get a lot of data to be indexed, the queries slow down > signifi

Re: Distributed Fulltext?

2002-02-08 Thread Steve Rapaport
I said: > > Why is it that Altavista can index terabytes overnight and return > > a fulltext boolean for the WHOLE WEB > > within a second, and Mysql takes so long? On Friday 08 February 2002 08:56, Vincent Stoessel wrote: > Apples and oranges. Yeah, I know. But let's see if we can make some d

Re: Distributed Fulltext?

2002-02-08 Thread alec . cawley
> Why is it that Altavista can index terabytes overnight and return > a fulltext boolean for the WHOLE WEB > within a second, and Mysql takes so long? I don't know about Altavista, but if you read up on Google, they do indeed do some sort of spreading of keywords across multiple machines - last

Re: Distributed Fulltext?

2002-02-07 Thread Steve Rapaport
Also, I have to ask the question: Why is it that Altavista can index terabytes overnight and return a fulltext boolean for the WHOLE WEB within a second, and Mysql takes so long? On Friday 08 February 2002 11:50, Steve Rapaport wrote: > I second the question. It could also reduce the size

Re: Distributed Fulltext?

2002-02-07 Thread Amir Aliabadi
How do you make something like this fault tolerant? The answer is probably what I suspect, two of everything. How does the aggregator handle this, or are these machines in a cluster? We are thinking of how to rebuild our fulltext search. Currently it is in MS SQL 7.0 - MySQL 4.0 seems to blow the

Re: Distributed Fulltext?

2002-02-07 Thread James Montebello
I did this at a previous job, and we split the data up more or less this way (we used a pre-existing item number for the split which was essentially random in relation to the text data), with an aggregator that did the query X ways, each to a separate box holding 1/X of the data. The results from

Re: Distributed Fulltext?

2002-02-07 Thread Brian Bray
It seems to me like the best solution that could be implemented as-is would be to keep a random int column in your table (with a range of say 1-100) and then have fulltext server 1 pseudo-replicate records with a random number in the range of 1-10, server 2 11-20, server 3 21-30, and so

Re: Distributed Fulltext?

2002-02-07 Thread Steve Rapaport
I second the question. It could also reduce the size of the fulltext index and the time taken to update it. -steve > On Thursday 07 February 2002 20:53, Brian wrote: > > Has anyone made a suggestion or thought about ways to distribute > > databases which focus on fulltext indexes? > > > > ful

Re: Distributed Fulltext?

2002-02-07 Thread Brian DeFeyter
On Thu, 2002-02-07 at 15:40, Tod Harter wrote: [snip] > Wouldn't be too tough to write a little query routing system if you are using > perl. Use DBD::Proxy on the web server side, and just hack the perl proxy > server so it routes the query to several places and returns a single result > set.

Re: Distributed Fulltext?

2002-02-07 Thread Tod Harter
On Thursday 07 February 2002 14:53, Brian DeFeyter wrote: > Has anyone made a suggestion or thought about ways to distribute > databases which focus on fulltext indexes? > > fulltext indexes do a good job of indexing a moderate amount of data, > but when you get a lot of data to be indexed, the qu

Distributed Fulltext?

2002-02-07 Thread Brian DeFeyter
Has anyone made a suggestion or thought about ways to distribute databases which focus on fulltext indexes? fulltext indexes do a good job of indexing a moderate amount of data, but when you get a lot of data to be indexed, the queries slow down significantly. I have an example table, with about