Re: How can i make a distribute search on Solr?
> Maybe I got this wrong... but isn't this what mapreduce is meant to deal with? e.g.:
> 1) get the job (a query)
> 2) map it to workers (servers that provide search results from their own indexing)
> 3) wait for the results from all workers that reply within an acceptable timeframe
> 4) comb through the lot of results from all workers, reduce them according to your own biz rules (e.g. remove dupes, sort them by quality / priority... here possibly relying on the original parameters of the query in 1)
> 5) return the reduced results to the frontend.

That seems to be how Sphinx works: http://www.sphinxsearch.com/doc.html#distributed

Of course, the details of this are far over my head for either system, so I don't really know if that's a sensible way of doing things or not.

Ciao,
--
David N. Welton
http://www.welton.it/davidw/
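The five steps quoted above can be sketched as a small scatter-gather routine. This is only an illustration, not how Solr, Nutch, or Sphinx implement it: the function name `scatter_gather`, the worker callables, and the `(doc_id, score)` result shape are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def scatter_gather(query, workers, timeout=1.0, limit=10):
    """Send `query` to every worker, merge whatever comes back in time.

    `workers` is a list of hypothetical callables; each takes the query
    and returns a list of (doc_id, score) tuples from its own index.
    """
    pool = ThreadPoolExecutor(max_workers=max(1, len(workers)))
    # Step 2: map the query to all workers in parallel.
    futures = [pool.submit(worker, query) for worker in workers]
    # Step 3: wait only for workers that reply within the timeout.
    done, _late = wait(futures, timeout=timeout)
    # Do not block on stragglers; their results are simply ignored.
    pool.shutdown(wait=False)

    # Step 4: reduce - drop duplicates (keep the best score per document).
    best = {}
    for future in done:
        if future.exception() is not None:
            continue  # a failed worker contributes nothing
        for doc_id, score in future.result():
            if doc_id not in best or score > best[doc_id]:
                best[doc_id] = score

    # Step 4 (continued): sort by score, highest first; ties by doc_id.
    ranked = sorted(best.items(), key=lambda item: (-item[1], item[0]))
    # Step 5: return the reduced results to the caller.
    return ranked[:limit]
```

One caveat with a merge like this: scores computed against independent indexes are not necessarily comparable, so sorting on the raw score is a simplification.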
Re: How can i make a distribute search on Solr?
On 9/19/07, Norberto Meijome [EMAIL PROTECTED] wrote:
> Maybe I got this wrong...but isn't this what mapreduce is meant to deal with?

Not really... you could force a *lot* of different problems into map-reduce (that's sort of the point... being able to automatically parallelize a lot of different problems). It really isn't the best fit though, and would end up being much slower than a custom job.

Then there is the issue that the way map-reduce is implemented (like hadoop) is also tuned for longer-running batch jobs on huge data (temporary files are used, external sorts, initial input and final output are via files, etc).

Check out the google map-reduce paper - they don't use it for their search side either.

Things are already progressing in the distributed search area: https://issues.apache.org/jira/browse/SOLR-303
Hopefully I'll have time to dig into it more myself in a few weeks.

-Yonik
Re: How can i make a distribute search on Solr?
On Thu, 20 Sep 2007 09:58:17 +0200 David Welton [EMAIL PROTECTED] wrote:
> That seems to be how Sphinx works: http://www.sphinxsearch.com/doc.html#distributed
> Of course, the details of this are far over my head for either system, so I don't really know if that's a sensible way of doing things or not.

Thanks for the pointer. It does seem that it's pretty much what I had in mind... but it doesn't seem to be based on Lucene (which I particularly like, especially for the community...) ...

cheers,
{Beto|Norberto|Numard} Meijome

The freethinking of one age is the common sense of the next.
Matthew Arnold

I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
Re: How can i make a distribute search on Solr?
On Thu, 20 Sep 2007 09:53:46 -0400 Yonik Seeley [EMAIL PROTECTED] wrote:
> On 9/19/07, Norberto Meijome [EMAIL PROTECTED] wrote:
> > Maybe I got this wrong...but isn't this what mapreduce is meant to deal with?
>
> Not really... you could force a *lot* of different problems into map-reduce (that's sort of the point... being able to automatically parallelize a lot of different problems). It really isn't the best fit though, and would end up being much slower than a custom job.

Good point... I wondered whether the whole sorting/whatever wasn't going to make it far slower than something custom. I don't care about mapreduce in particular, but I do care about the effect: n indexers / searchers all fulfilling their part of the overall search results.

> Then there is the issue that the way map-reduce is implemented (like hadoop) is also tuned for longer running batch jobs on huge data (temporary files are used, external sorts, initial input, final output is via files, etc).

I see, didn't know this.

> Check out the google map-reduce paper - they don't use it for their search side either.

Yeah, need to :)

> Things are already progressing in the distributed search area: https://issues.apache.org/jira/browse/SOLR-303
> Hopefully I'll have time to dig into it more myself in a few weeks.

Excellent, thanks.

{Beto|Norberto|Numard} Meijome

He uses statistics as a drunken man uses lamp-posts ... for support rather than illumination.
Andrew Lang (1844-1912)
RE: How can i make a distribute search on Solr?
Thanks for your reply. I need Federated Search. You mean it is not yet supported out of the box. So I have a question: in this situation, what can Collection Distribution be used for?

Jarvis

-----Original Message-----
From: Ryan McKinley [mailto:[EMAIL PROTECTED]
Sent: Wednesday, September 19, 2007 1:47 PM
To: solr-user@lucene.apache.org
Subject: Re: How can i make a distribute search on Solr?

> So it means that distributed search is not a basic component in Solr project.

I think you just need load balancing. Solr is not a load balancer; you need to find something that works for you and configure that elsewhere. Solr works fine without persistent connections, so even simple round robin DNS works fine. Depending on your usage/loads/requirements it may or may not make sense to have your master DB in the mix.

Stu is referring to Federated Search - where each index has some of the data and results are combined before they are returned. This is not yet supported out of the box.

ryan
Re: How can i make a distribute search on Solr?
On Wed, 19 Sep 2007 01:46:53 -0400 Ryan McKinley [EMAIL PROTECTED] wrote:
> Stu is referring to Federated Search - where each index has some of the data and results are combined before they are returned. This is not yet supported out of the box

Maybe this is related. How does this compare to the map-reduce functionality in Nutch/Hadoop?

cheers,
B

{Beto|Norberto|Numard} Meijome

With sufficient thrust, pigs fly just fine. However, this is not necessarily a good idea. It is hard to be sure where they are going to land, and it could be dangerous sitting under them as they fly overhead.
[RFC1925 - section 2, subsection 3]
Re: How can i make a distribute search on Solr?
On 9/19/07, Norberto Meijome [EMAIL PROTECTED] wrote:
> On Wed, 19 Sep 2007 01:46:53 -0400 Ryan McKinley [EMAIL PROTECTED] wrote:
> > Stu is referring to Federated Search - where each index has some of the

It really should be Distributed Search I think (my mistake... I started out calling it Federated). I think Federated search is more about combining search results from different data sources.

> > data and results are combined before they are returned. This is not yet supported out of the box
>
> Maybe this is related. How does this compare to the map-reduce functionality in Nutch/Hadoop ?

map-reduce is more for batch jobs. Nutch only uses map-reduce for parallel indexing, not searching.

-Yonik
RE: How can i make a distribute search on Solr?
Nutch has two ways to make a distributed query: through HDFS (the Hadoop file system), or through the RPC calls in the org.apache.nutch.searcher.DistributedSearch class. But I think neither is good enough.

If we use HDFS to serve users' queries, stability is a problem. We must do all of the crawling, indexing, and querying on HDFS and use mapreduce. Can we trust hadoop all the time? :)

If we use the RPC calls in Nutch, we have to split the index manually. We will receive duplicate results if the same document is indexed on different servers. Data updates and single-server failures are also hard to deal with.

Thanks,
Jarvis

-----Original Message-----
From: Stu Hood [mailto:[EMAIL PROTECTED]
Sent: Wednesday, September 19, 2007 10:37 PM
To: solr-user@lucene.apache.org
Subject: Re: How can i make a distribute search on Solr?

Nutch implements federated search separately from their index generation. My understanding is that MapReduce jobs generate the indexes (Nutch calls them segments) from raw data that has been downloaded, and then make them available to be searched via remote procedure calls. Queries never pass through MapReduce in any shape or form, only the raw data and indexes.

If you take a look at the org.apache.nutch.searcher.DistributedSearch class, specifically the #Client.search method, you can see how they handle the actual federation of results.

Thanks,
Stu

-----Original Message-----
From: Norberto Meijome
Sent: Wednesday, September 19, 2007 10:23am
To: solr-user@lucene.apache.org
Cc: [EMAIL PROTECTED]
Subject: Re: How can i make a distribute search on Solr?

On Wed, 19 Sep 2007 01:46:53 -0400 Ryan McKinley wrote:
> Stu is referring to Federated Search - where each index has some of the data and results are combined before they are returned. This is not yet supported out of the box

Maybe this is related. How does this compare to the map-reduce functionality in Nutch/Hadoop ?

cheers,
B
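One common way to avoid the same document ending up in several indexes, and therefore to avoid duplicate hits at merge time, is to pick the target index from a hash of the document ID instead of splitting by hand. The sketch below only illustrates that idea; `shard_for` and `partition` are hypothetical helper names, not part of Nutch or Solr.

```python
import hashlib


def shard_for(doc_id, num_shards):
    """Map a document ID to exactly one shard number in [0, num_shards)."""
    # A stable hash (unlike Python's built-in hash()) gives the same
    # answer on every machine and every run.
    digest = hashlib.md5(doc_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def partition(doc_ids, num_shards):
    """Group document IDs by the shard that should index them."""
    shards = [[] for _ in range(num_shards)]
    for doc_id in doc_ids:
        shards[shard_for(doc_id, num_shards)].append(doc_id)
    return shards
```

Because every ID maps to a single shard, re-indexing an updated document always goes to the server that already holds it. Changing the number of shards reshuffles most documents, which is a known limitation of plain modulo hashing.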
Re: How can i make a distribute search on Solr?
On Wed, 19 Sep 2007 10:29:54 -0400 Yonik Seeley [EMAIL PROTECTED] wrote:
> > Maybe this is related. How does this compare to the map-reduce functionality in Nutch/Hadoop ?
>
> map-reduce is more for batch jobs. Nutch only uses map-reduce for parallel indexing, not searching.

I see... so in nutch all nodes have all the data indexed?

Thanks,
{Beto|Norberto|Numard} Meijome
...heading to read about nutch/hadoop

Imagination is more important than knowledge.
Albert Einstein, On Science
RE: How can i make a distribute search on Solr?
I think the index data that is generated by map-reduce and stored in HDFS is used for searching in NUTCH-0.9. You can see the code in the org.apache.nutch.searcher.NutchBean class. :)

Jarvis

-----Original Message-----
From: Norberto Meijome [mailto:[EMAIL PROTECTED]
Sent: Thursday, September 20, 2007 9:52 AM
To: solr-user@lucene.apache.org
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: Re: How can i make a distribute search on Solr?

On Wed, 19 Sep 2007 10:29:54 -0400 Yonik Seeley [EMAIL PROTECTED] wrote:
> > Maybe this is related. How does this compare to the map-reduce functionality in Nutch/Hadoop ?
>
> map-reduce is more for batch jobs. Nutch only uses map-reduce for parallel indexing, not searching.

I see... so in nutch all nodes have all the data indexed?

Thanks,
{Beto|Norberto|Numard} Meijome
...heading to read about nutch/hadoop
Re: How can i make a distribute search on Solr?
On Thu, 20 Sep 2007 09:37:51 +0800 Jarvis [EMAIL PROTECTED] wrote:
> If we use the RPC call in nutch .

Hi, I wasn't suggesting using nutch in solr... I'm only a young grasshopper in this league to be suggesting architecture stuff :) but I imagine there's nothing wrong with using what they've built if it addresses solr's needs.

> Manually separate the index is required .

Hmm, I imagine this really depends on the application. In my case, this separation of which docs go where happens at a completely different layer.

> We will receive reduplicate result if there is reduplicate index document on different servers.

Maybe I got this wrong... but isn't this what mapreduce is meant to deal with? e.g.:
1) get the job (a query)
2) map it to workers (servers that provide search results from their own indexing)
3) wait for the results from all workers that reply within an acceptable timeframe
4) comb through the lot of results from all workers, reduce them according to your own biz rules (e.g. remove dupes, sort them by quality / priority... here possibly relying on the original parameters of the query in 1)
5) return the reduced results to the frontend.

> And also the data updating and single server's error is hard to deal with.

This really depends on your infrastructure + design. Having the indexing, searching and providing of results in different layers should make for some interesting design options... If each searcher (or wherever the index resides) is really a small cluster of servers, the issue of data safety / server error is addressed at that point. You can also have repeated data across indexes (again, independent indexes) and that's a more ... randomised :) way of keeping the docs safe... For example, IIRC, googleFS keeps copies of each file on 3 servers or more...

cheers,
B

{Beto|Norberto|Numard} Meijome

He uses statistics as a drunken man uses lamp-posts ... for support rather than illumination.
Andrew Lang (1844-1912)
RE: How can i make a distribute search on Solr?
Hi,

What you describe is done by Hadoop, which supports hardware failure handling, data replication, and more. If we want to implement such a good system ourselves on Solr without HDFS, it would be very, very complex work, I think. :) I just want to know whether there is an existing component that can do distributed search based on Solr.

Thanks,
Jarvis

-----Original Message-----
From: Norberto Meijome [mailto:[EMAIL PROTECTED]
Sent: Thursday, September 20, 2007 10:06 AM
To: solr-user@lucene.apache.org
Cc: [EMAIL PROTECTED]
Subject: Re: How can i make a distribute search on Solr?

On Thu, 20 Sep 2007 09:37:51 +0800 Jarvis [EMAIL PROTECTED] wrote:
> If we use the RPC call in nutch .

Hi, I wasn't suggesting to use nutch in solr...I'm only a young grasshopper in this league to be suggesting architecture stuff :) but i imagine there's nothing wrong with using what they've built if it addresses solr's needs.

> Manually separate the index is required .

hmm i imagine this really depends on the application. In my case, this separation of which docs go where happens @ a completely different layer.

> We will receive reduplicate result if there is reduplicate index document on different servers.

Maybe I got this wrong...but isn't this what mapreduce is meant to deal with? eg,
1) get the job (a query)
2) map it to workers (servers that provide search results from their own indexing)
3) wait for the results from all workers that reply within acceptable timeframe.
4) comb through the lot of results from all workers, reduce them according to your own biz rules (eg, remove dupes, sort them by quality / priority... here possibly relying on the original parameters of the query in 1)
5) return the reduced results to the frontend.

> And also the data updating and single server's error is hard to deal with.

this really depends on your infrastructure + design. Having the indexing, searching and providing of results in different layers should make for some interesting design options... If each searcher (or wherever the index resides) is really a small cluster of servers, the issue of data safety / server error is addressed @ that point. You can also have repeated data across indexes (again, independent indexes) and that's a more ... randomised :) way of keeping the docs safe... For example, IIRC, googleFS keeps copies of each file in 3 servers or more...

cheers,
B
Re: How can i make a distribute search on Solr?
On 19-Sep-07, at 7:21 PM, Jarvis wrote:
> HI, What you say is done by hadoop that support Hardware Failure, Data Replication and some else. If we want to implement such a good system by ourselves without HDFS but Solr, it's a very very complex work I think. :)
> I just want to know whether there is a component existed can do the distributed search based on Solr.

https://issues.apache.org/jira/browse/SOLR-303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

regards,
-Mike
Re: How can i make a distribute search on Solr?
On Thu, 20 Sep 2007 10:02:08 +0800 Jarvis [EMAIL PROTECTED] wrote:
> You can see the code in org.apache.nutch.searcher.NutchBean class . :)

Thanks for the pointer.

{Beto|Norberto|Numard} Meijome

In order to avoid being called a flirt, she always yielded easily.
Charles, Count Talleyrand
Re: How can i make a distribute search on Solr?
On Thu, 20 Sep 2007 10:21:39 +0800 Jarvis [EMAIL PROTECTED] wrote:
> What you say is done by hadoop that support Hardware Failure, Data Replication and some else. If we want to implement such a good system by ourselves without HDFS but Solr, it's a very very complex work I think. :)
> I just want to know whether there is a component existed can do the distributed search based on Solr.

Thanks for the info. At the risk of starting a flame war (which is not my intention :) ), what design reasons / features are there in Solr but not in hadoop/nutch that would make it compelling to use solr instead of h/n?

I know each case is different; the feeling I got from a shortish read into h/n was that it is geared towards web page indexing, crawling, etc. (but possibly I'm missing something...), whereas Solr is, from my point of view, far more flexible. In which case, maybe porting HDFS into Solr to add all these clustering / map-reduce options...

Thanks for your time and insights :)
B

{Beto|Norberto|Numard} Meijome

Windows caters to everyone as though they are idiots. UNIX makes no such assumption. It assumes you know what you are doing, and presents the challenge of figuring it out for yourself if you don't.
Re: How can i make a distribute search on Solr?
Along similar lines: assume that I have 2 indexes on the same box, say at /home/abc/data/index1 and /home/abc/data/index2, and I want results from both indexes when I do a search. How should this be 'optimally' designed? Basically these are different Solr homes, and I want the results to be clearly demarcated as coming from 2 different sources.

-Venkat

On 9/20/07, Norberto Meijome [EMAIL PROTECTED] wrote:
> On Thu, 20 Sep 2007 10:21:39 +0800 Jarvis [EMAIL PROTECTED] wrote:
> > What you say is done by hadoop that support Hardware Failure, Data Replication and some else. If we want to implement such a good system by ourselves without HDFS but Solr, it's a very very complex work I think. :)
> > I just want to know whether there is a component existed can do the distributed search based on Solr.
>
> Thanks for the info. Risking starting up a flame war (which is not my intention :) ), what design reasons / features are there in Solr but not in hadoop/nutch that would make it compelling to use solr instead of h/n ?
>
> I know, each case is different the feeling i got from a shortish read into h/n was that H/N is geared towards webpage indexing, crawling, etc. But possibly i'm missing something... Where Solr is, from my point of view, far more flexible. In which case, maybe porting HDFS into Solr to add all this clustering / map/reduce options...
>
> thanks for your time and insights :)
> B
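For the two-index case, one simple client-side approach is to query each index separately and label every hit with the index it came from. The sketch below assumes each index is reachable through some search function; `search_all`, the labels, and the search callables are hypothetical stand-ins, not Solr APIs.

```python
def search_all(query, sources):
    """Query every source and tag each hit with the source's label.

    `sources` maps a label (e.g. "index1") to a hypothetical callable
    that takes the query and returns a list of hits for that index.
    """
    tagged = []
    for label, search in sources.items():
        for hit in search(query):
            # Keep the origin next to the hit so the frontend can
            # show results grouped or marked by source.
            tagged.append({"source": label, "hit": hit})
    return tagged
```

Keeping the results in per-source order, rather than merging them into one ranking, sidesteps the question of whether scores from two independent indexes can be compared.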
How can i make a distribute search on Solr?
Hi everyone,

I have successfully set up Collection Distribution on two Linux servers (one master with one slave) and synced the index data. How can I make a search request to the master server and receive the response from all slave servers? Or does that have to be controlled manually?

Thanks
Best Regards.
Jarvis
RE: How can i make a distribute search on Solr?
Helpful information. So it means that distributed search is not a basic component in Solr project.

Thanks
Best Regards.
Jarvis

-----Original Message-----
From: Stu Hood [mailto:[EMAIL PROTECTED]
Sent: Wednesday, September 19, 2007 12:55 PM
To: solr-user@lucene.apache.org
Subject: RE: How can i make a distribute search on Solr?

There are two federated/distributed search implementations that are still a few weeks away from maturity:

https://issues.apache.org/jira/browse/SOLR-255
https://issues.apache.org/jira/browse/SOLR-303

Any help in testing them would definitely be appreciated.

BUT, if you decide to roll your own, take a look at the following wiki page for details on the complexity of the task: http://wiki.apache.org/solr/FederatedSearch

Good luck!

Thanks,
Stu

-----Original Message-----
From: 过佳
Sent: Wednesday, September 19, 2007 12:24am
To: solr-user@lucene.apache.org
Subject: How can i make a distribute search on Solr?

Hi everyone, I successfully do the Collection Distribution on two Linux servers - one master with one slave and sync the index data. How can I make a search request to master server and receive the response by all slave servers? OR it should be manually controlled?

Thanks
Best Regards.
Jarvis
Re: How can i make a distribute search on Solr?
> So it means that distributed search is not a basic component in Solr project.

I think you just need load balancing. Solr is not a load balancer; you need to find something that works for you and configure that elsewhere. Solr works fine without persistent connections, so even simple round robin DNS works fine. Depending on your usage/loads/requirements it may or may not make sense to have your master DB in the mix.

Stu is referring to Federated Search - where each index has some of the data and results are combined before they are returned. This is not yet supported out of the box.

ryan
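With Collection Distribution every slave holds a full copy of the index, so any single slave can answer a query and the client (or a balancer in front) only has to pick one. A minimal client-side round-robin picker might look like the sketch below; the `RoundRobin` class and the host names are made up for illustration, and in practice a dedicated load balancer or round robin DNS would play this role.

```python
import itertools


class RoundRobin:
    """Hand out servers from a fixed list in rotating order."""

    def __init__(self, servers):
        servers = list(servers)
        if not servers:
            raise ValueError("at least one server is required")
        # cycle() repeats the list forever: a, b, a, b, ...
        self._cycle = itertools.cycle(servers)

    def next_server(self):
        """Return the server that should receive the next request."""
        return next(self._cycle)


# Hypothetical slave URLs; each one serves the same replicated index.
balancer = RoundRobin(["http://slave1:8983/solr", "http://slave2:8983/solr"])
target = balancer.next_server()
```

This spreads query load but does nothing for an index that is too large for one machine; that is the distributed search case discussed elsewhere in the thread.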