I'm curious how on-the-fly updates are handled as a new shard is added
to an alias. E.g., how does the system know which shard to send an
update to?
On Tue, Apr 17, 2012 at 4:00 PM, Lukáš Vlček lukas.vl...@gmail.com wrote:
Hi,
speaking about ES I think it would be fair to mention that one has to
specify the number of shards upfront when the index is created
AFAIK it cannot. You can only add new shards by creating a new index, and
you will then need to index new data into that new index. Index aliases are
useful mainly for the searching part. So it means that you need to plan for
this when you implement your indexing logic. On the other hand the query
The main point being made is that established NoSQL solutions (e.g.,
Cassandra, HBase, et al.) solved the update problem, among many other
scalability issues, several years ago.
If an update is being performed and it is not known where the record
exists, the update capability of the system is
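This is easy to demonstrate: with naive hash-mod-N routing (a hypothetical scheme chosen for illustration, not Solr's or ES's actual routing), growing the shard count re-routes most existing documents, so an update can no longer find the shard holding the original record. A minimal sketch:

```python
import hashlib

def shard_for(doc_id, num_shards):
    # Hash the document ID and map it onto a shard (hypothetical scheme;
    # real systems use their own hash functions, but the failure mode is the same).
    digest = hashlib.md5(doc_id.encode()).hexdigest()
    return int(digest, 16) % num_shards

ids = [f"doc-{i}" for i in range(10_000)]
before = {d: shard_for(d, 4) for d in ids}
moved = sum(1 for d in ids if before[d] != shard_for(d, 5))  # one shard added
print(f"{moved / len(ids):.0%} of documents now route to a different shard")
```

With 4 shards growing to 5, roughly four out of five documents land on a different shard, which is why you cannot simply bolt a new shard onto an existing hash-partitioned index.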
Hi,
I think Katta integration is nice, but it is not very real-time. What if you
want both?
Perhaps a Katta/SolrCloud integration could make the two frameworks play
together, so that some shards in SolrCloud may be marked as static while
others are real-time. SolrCloud will handle indexing the
for Solr -
http://sematext.com/spm/solr-performance-monitoring/index.html
From: Jason Rutherglen jason.rutherg...@gmail.com
To: solr-user@lucene.apache.org
Sent: Monday, April 16, 2012 8:42 PM
Subject: Re: Options for automagically Scaling Solr (without needing
distributed index/replication) in a Hadoop environment
Hi,
speaking about ES I think it would be fair to mention that one has to
specify the number of shards upfront when the index is created - that is
correct; however, it is possible to give an index one or more aliases, which
basically means that you can add new indices on the fly and give them the
same alias
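The alias pattern Lukáš describes can be modeled in a few lines (a toy sketch, not the actual Elasticsearch API; the class, method names, and write-to-newest rule are illustrative assumptions): searches fan out over every index behind the alias, while new documents go only to the newest index.

```python
class Cluster:
    def __init__(self):
        self.indices = {}   # index name -> list of docs
        self.aliases = {}   # alias name -> list of index names

    def create_index(self, name, alias=None):
        self.indices[name] = []
        if alias:
            self.aliases.setdefault(alias, []).append(name)

    def index_doc(self, alias, doc):
        newest = self.aliases[alias][-1]      # writes target the newest index
        self.indices[newest].append(doc)

    def search(self, alias, term):
        hits = []
        for name in self.aliases[alias]:      # reads fan out over all indices
            hits += [d for d in self.indices[name] if term in d]
        return hits

c = Cluster()
c.create_index("logs-2012-03", alias="logs")
c.index_doc("logs", "march entry")
c.create_index("logs-2012-04", alias="logs")  # new index added on the fly
c.index_doc("logs", "april entry")
print(c.search("logs", "entry"))  # → ['march entry', 'april entry']
```

This is why aliases help the search side but not arbitrary updates: a query sees every index behind the alias, but an update still has to know which concrete index holds the document.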
One of the big weaknesses of Solr Cloud (and ES?) is the lack of the
ability to redistribute shards across servers - meaning, as a single
shard grows too large, splitting the shard while accepting live updates.
How do you plan on elastically adding more servers without this feature?
Cassandra and HBase
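Cassandra's answer (and, in a different form, HBase's region splitting) is to decouple key placement from server count, e.g. via consistent hashing, so that adding a node reassigns only the keys the new node takes over. A bare-bones sketch (a toy with MD5 placement and no virtual nodes or replication, which real systems add):

```python
import bisect
import hashlib

def _h(key):
    # Stable hash onto a large integer ring.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes):
        self._points = sorted((_h(n), n) for n in nodes)

    def node_for(self, key):
        # A key belongs to the first node clockwise from its hash position.
        points = [p for p, _ in self._points]
        i = bisect.bisect(points, _h(key)) % len(self._points)
        return self._points[i][1]

    def add_node(self, node):
        bisect.insort(self._points, (_h(node), node))

ids = [f"doc-{i}" for i in range(10_000)]
ring = Ring([f"node-{i}" for i in range(4)])
owner = {d: ring.node_for(d) for d in ids}
ring.add_node("node-4")
moved = [d for d in ids if ring.node_for(d) != owner[d]]
# Only keys taken over by the new node changed hands; everything else stayed put.
print(f"{len(moved) / len(ids):.0%} of keys moved when one node joined")
```

Contrast this with hash-mod-N partitioning, where changing the shard count remaps most keys: here every moved key is owned by the newly added node, and the rest never move.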
Hi,
This won't give you the performance you need, unless you have enough RAM on the
Solr box to cache the whole index in memory.
Have you tested this yourself?
--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com
On 12. apr. 2012, at
From: Ali S Kureishy safdar.kurei...@gmail.com
To: Otis Gospodnetic otis_gospodne...@yahoo.com
Cc: solr-user@lucene.apache.org solr-user@lucene.apache.org
Sent: Friday, April 13, 2012 7:16 PM
Subject: Re: Options for automagically Scaling Solr (without needing
distributed index/replication) in a Hadoop environment
Hi,
For a web crawl+search like this you will probably need a lot of additional Big
Data crunching, so a Hadoop-based solution is wise.
In addition to the products mentioned, we also now have Amazon's own
CloudSearch (http://aws.amazon.com/cloudsearch/). It's new and not as cool as Solr
(not
Thanks Otis.
I really appreciate the details offered here. This was very helpful
information.
I'm going to go through Solandra and Elastic Search and see if those make
sense. I was also given a suggestion to use SolrCloud on FuseDFS (that's
two recommendations for SolrCloud so far), so I will
You could use SolrCloud (for the automatic scaling) and just mount a
fuse[1] HDFS directory and configure Solr to use that directory for its
data.
[1] https://ccp.cloudera.com/display/CDHDOC/Mountable+HDFS
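In practice the fuse approach amounts to something like the following (a sketch only: the namenode hostname, port, and paths are placeholders, and the authoritative steps are in the Cloudera doc linked above):

```shell
# Mount HDFS via FUSE (namenode host/port are placeholders).
hadoop-fuse-dfs dfs://namenode:8020 /mnt/hdfs

# Then point Solr's data directory at the mount, e.g. in solrconfig.xml:
#   <dataDir>/mnt/hdfs/solr/data</dataDir>
```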
On Thu, 2012-04-12 at 16:04 +0300, Ali S Kureishy wrote:
Hi,
I'm trying to setup a
Thanks Darren.
Actually, I would like the system to be homogeneous - i.e., use Hadoop-based
tools that already provide all the necessary scaling for the Lucene index
(in terms of throughput, latency of writes/reads, etc.). Since SolrCloud adds
its own layer of sharding/replication that is outside
SolrCloud or any other tech-specific replication isn't going to 'just work' with
Hadoop replication. But with some significant custom coding anything should be
possible. Interesting idea.
--- Original Message ---
On 4/12/2012 09:21 AM Ali S Kureishy wrote: Thanks Darren.
Hello Ali,
I'm trying to set up a large-scale *Crawl + Index + Search* infrastructure
using Nutch and Solr/Lucene. The targeted scale is *5 billion web pages*,
crawled + indexed every *4 weeks*, with a search latency of less than 0.5
seconds.
That's fine. Whether it's doable with any tech
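Whatever the stack, the stated targets fix the minimum sustained ingest rate; a quick back-of-the-envelope check:

```python
# Sanity-check the stated targets: 5 billion pages re-crawled and
# re-indexed every 4 weeks implies a sustained ingest rate of roughly
# two thousand pages per second.
pages = 5_000_000_000
window_s = 4 * 7 * 24 * 3600      # 4 weeks in seconds
rate = pages / window_s
print(f"sustained rate: {rate:,.0f} pages/sec")  # → sustained rate: 2,067 pages/sec
```

That is the average; any downtime or re-crawl skew pushes the peak rate higher still.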