Perfect. Thank you very much.
Andy
--- On Fri, 4/8/11, Pascal Coupet wrote:
> From: Pascal Coupet
> Subject: Re: Very very large scale Solr Deployment = how to do (Expert
> Question)?
> To: solr-user@lucene.apache.org
> Date: Friday, April 8, 2011, 10:20 AM
> I did put
> > of the document in pdf or openoffice
> > format? I'm on Linux so there's no way for me to use MS Word.
>
> Thanks.
>
>
> --- On Fri, 4/8/11, Albert Vila wrote:
>
> > From: Albert Vila
> > Subject: Re: Very very large scale Solr Deployment = how to do (Expert
> > Question)?
Could anyone please post a version of the document in pdf or openoffice format?
I'm on Linux so there's no way for me to use MS Word.
Thanks.
--- On Fri, 4/8/11, Albert Vila wrote:
>
>> From: Albert Vila
>> Subject: Re: Very very large scale Solr Deployment = how to do (Expert
>> Question)?
>> To: solr-user@lucene.apache.org
>> Date: Friday, April 8, 2011, 3:43 AM
>> Ephraim, I still can't
I can't view the document either -- it showed up empty.
Has anyone succeeded in viewing it?
Andy
--- On Fri, 4/8/11, Albert Vila wrote:
> From: Albert Vila
> Subject: Re: Very very large scale Solr Deployment = how to do (Expert
> Question)?
> To: solr-user@lucene.apache.org
You might also want to look at the Heritrix crawler:
http://crawler.archive.org/
I have written three crawlers in the past, all for RSS feeds; it is not easy.
Happy to provide tips and help if you want to go down that route.
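For anyone weighing that route, here is a minimal sketch of the feed-parsing half of an RSS crawler, using only the Python standard library. The sample feed and the function name are illustrative, not from this thread:

```python
# Parse an RSS 2.0 document and pull out the entries you would hand to
# the indexer. A real crawler also needs fetching, politeness delays,
# deduplication, and error handling -- this covers only the parsing step.
import xml.etree.ElementTree as ET

def extract_items(rss_xml):
    """Return (title, link) pairs for every <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    items = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        items.append((title, link))
    return items

sample = """<rss version="2.0"><channel>
  <title>Example feed</title>
  <item><title>First post</title><link>http://example.com/1</link></item>
  <item><title>Second post</title><link>http://example.com/2</link></item>
</channel></rss>"""

print(extract_items(sample))
# -> [('First post', 'http://example.com/1'), ('Second post', 'http://example.com/2')]
```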
François
On Apr 8, 2011, at 1:53 AM, Andrea Campi wrote:
> To: solr-user@lucene.apache.org
> Subject: Re: Very very large scale Solr Deployment = how to do (Expert
> Question)?
>
> Hello Ephraim, hello Lance, hello Walter,
>
> thanks for your replies:
>
> Ephraim, thanks very much for the further detailed explanation. I will
>
On Fri, Apr 8, 2011 at 6:23 AM, Jens Mueller wrote:
> Hello all,
>
> thanks for your generous help.
>
> I think I now know everything: (What I want to do is to build a web
> crawler
> and index the documents found). I will start with the setup as suggested by
>
>
Write a web crawler from scratch
Hello all,
thanks for your generous help.
I think I now know everything: (What I want to do is to build a web crawler
and index the documents found). I will start with the setup as suggested by
Ephraim (Several sharded masters, each with at least one slave for reads and
some aggregators for quer
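That layout maps onto Solr's distributed search: an aggregator node receives the request, forwards it to every shard listed in the shards parameter, and merges the results. A small sketch of building such a request; the host names are hypothetical:

```python
# Build a distributed-search URL for Solr's "shards" parameter. The
# aggregator fans the query out to each listed shard (typically a read
# slave of a sharded master) and merges the per-shard results.
from urllib.parse import urlencode

shards = [
    "shard1.example.com:8983/solr",  # read slave of sharded master 1
    "shard2.example.com:8983/solr",  # read slave of sharded master 2
]

params = urlencode({
    "q": "title:crawler",
    "shards": ",".join(shards),  # tells Solr which shards to query
    "rows": 10,
})
url = "http://aggregator.example.com:8983/solr/select?" + params
print(url)
```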
On Apr 6, 2011, at 10:29 PM, Jens Mueller wrote:
> Walter, thanks for the advice: Well you are right, mentioning google. My
> question was also to understand how such large systems like google/facebook
> are actually working. So my numbers are just theoretical and made up. My
> system will be smal
Sent: Thursday, April 07, 2011 8:30 AM
To: solr-user@lucene.apache.org
Subject: Re: Very very large scale Solr Deployment = how to do (Expert
Question)?
Hello Ephraim, hello Lance, hello Walter,
thanks for your replies:
Ephraim, thanks very much for the further detailed explanation.
Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
- Original Message
> From: Jens Mueller
> To: solr-user@lucene.apache.org
> Sent: Thu, April 7, 2011 1:29:40 AM
> Subject: Re: Very very large scale Solr Deployment = how to do (Expert
> Question)?
Hello Ephraim, hello Lance, hello Walter,
thanks for your replies:
Ephraim, thanks very much for the further detailed explanation. I will try
to setup a demo system in the next few days and use your advice.
LoadBalancers are an important aspect of your design. Can you recommend one
LB specificall
The bigger answer is that you cannot get to this size by just configuring Solr.
You may have to invent a lot of stuff. Like all of Google.
Where did you get these numbers? The proposed query rate is twice as big as
Google's (Feb 2010 estimate, 34K qps).
I work at MarkLogic, and we scale to 100's
I would not use replication. LinkedIn consumer search is a flat system
where one process indexes new entries and does queries simultaneously.
It's a custom Lucene app called Zoie. Their stuff is on GitHub.
I would get documents to indexers via a multicast IP-based queueing
system. This scales ver
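A rough sketch of that multicast idea: a publisher sends each document to a multicast group, and every indexer that joined the group receives its own copy, with no central queue broker in the path. The group address, port, and document shape below are invented for illustration:

```python
# Fire-and-forget document distribution over IP multicast. Each indexer
# joins the group and receives every framed document independently.
import json
import socket
import struct

GROUP, PORT = "224.1.1.1", 5007  # hypothetical multicast group

def frame(doc):
    """Serialize a document dict into a UDP payload."""
    return json.dumps(doc).encode("utf-8")

def unframe(payload):
    return json.loads(payload.decode("utf-8"))

def publish(doc):
    """Send one document to the multicast group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # TTL of 1 keeps the packets on the local network segment.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL,
                    struct.pack("b", 1))
    sock.sendto(frame(doc), (GROUP, PORT))
    sock.close()

# Round-trip the framing locally (no network needed for this check):
doc = {"id": "42", "title": "hello"}
print(unframe(frame(doc)))
# -> {'id': '42', 'title': 'hello'}
```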
And if you have control over machine placement, split them across racks so that
a power outage on one rack does not take out your search cluster.
François
On Apr 5, 2011, at 3:19 AM, Ephraim Ofir wrote:
> I'm not sure about the scale you're aiming for, but you probably want to
> do both shardin
I'm not sure about the scale you're aiming for, but you probably want to
do both sharding and replication. There's no central server which would
be the bottleneck. The guidelines should probably be something like:
1. Split your index into enough shards so it can keep up with the update
rate.
2. Have
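These guidelines reduce to simple arithmetic once per-shard throughput has been measured: shard count comes from the update rate, replica count per shard from the query rate. A back-of-the-envelope sketch; every capacity number here is invented:

```python
# Size a sharded, replicated cluster from measured per-node throughput.
import math

updates_per_sec = 5000        # expected indexing rate (invented)
shard_update_capacity = 500   # updates one shard can absorb (invented)

queries_per_sec = 2000        # expected search rate (invented)
replica_query_capacity = 100  # queries one replica can serve (invented)

shards = math.ceil(updates_per_sec / shard_update_capacity)
# Every distributed query hits all shards, so each shard's replicas
# must jointly cover the full query rate.
replicas_per_shard = math.ceil(queries_per_sec / replica_query_capacity)

print(shards, replicas_per_shard, shards * replicas_per_shard)
# -> 10 20 200
```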