Re: Options for automagically Scaling Solr (without needing distributed index/replication) in a Hadoop environment

2012-04-18 Thread Jason Rutherglen
I'm curious how on-the-fly updates are handled as a new shard is added to an alias. E.g., how does the system know which shard to send an update to? On Tue, Apr 17, 2012 at 4:00 PM, Lukáš Vlček lukas.vl...@gmail.com wrote: Hi, speaking about ES I think it would be fair to mention that one has

Re: Options for automagically Scaling Solr (without needing distributed index/replication) in a Hadoop environment

2012-04-18 Thread Lukáš Vlček
AFAIK it cannot. You can only add new shards by creating a new index, and you then need to index new data into that new index. Index aliases are useful mainly for the search side, so you need to plan for this when you implement your indexing logic. On the other hand the query
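
The behaviour described in this message (an alias spans several indices for search, while each write has to name one concrete index) can be modeled with a small in-memory sketch. It does not call Elasticsearch; the class `AliasRegistry` and its methods `add_index`, `index_doc` and `search` are hypothetical names made up for illustration, and the index names are invented.

```python
from collections import defaultdict


class AliasRegistry:
    """Toy model, not the Elasticsearch API: an alias fans searches out
    to many indices, but a write must name one concrete index."""

    def __init__(self):
        self.indices = {}                 # index name -> {doc_id: text}
        self.aliases = defaultdict(list)  # alias -> list of index names

    def add_index(self, name, alias):
        # Create a new, empty index and attach it to the alias.
        self.indices[name] = {}
        self.aliases[alias].append(name)

    def index_doc(self, index_name, doc_id, text):
        # The indexing logic has to choose the target index itself;
        # the alias alone does not say where a write should go.
        if index_name not in self.indices:
            raise KeyError(f"unknown index: {index_name}")
        self.indices[index_name][doc_id] = text

    def search(self, alias, term):
        # A search through the alias visits every index behind it.
        hits = []
        for name in self.aliases[alias]:
            for doc_id, text in self.indices[name].items():
                if term in text:
                    hits.append((name, doc_id))
        return sorted(hits)


reg = AliasRegistry()
reg.add_index("pages_2012_03", alias="pages")
reg.index_doc("pages_2012_03", "a", "solr scaling")
reg.add_index("pages_2012_04", alias="pages")  # capacity added later
reg.index_doc("pages_2012_04", "b", "hadoop scaling")
results = reg.search("pages", "scaling")
```

In this model the searcher only ever sees the alias `pages`, while the indexer carries the burden of knowing which index is current, which is the planning cost the message points out.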

Re: Options for automagically Scaling Solr (without needing distributed index/replication) in a Hadoop environment

2012-04-18 Thread Jason Rutherglen
The main point being made is that established NoSQL solutions (e.g., Cassandra, HBase, et al.) have had the update problem solved for several years, along with many other scalability issues. If an update is being performed and it is not known where the record exists, the update capability of the system is
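
To make the update-routing concern concrete, here is a minimal sketch of naive hash-modulo routing. It is an assumption chosen for illustration, not a description of how Solr, ES, Cassandra or HBase route updates; `shard_for` and `moved_ids` are hypothetical helper names. It shows that when the shard count changes, most document ids map to a different shard, so an update can no longer find the record unless data is moved or the routing scheme accounts for the change.

```python
import hashlib


def shard_for(doc_id: str, num_shards: int) -> int:
    """Map a document id to a shard using a stable hash modulo the shard count."""
    digest = hashlib.md5(doc_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def moved_ids(doc_ids, old_shards: int, new_shards: int):
    """Return the ids whose target shard changes when the shard count changes."""
    return [
        d for d in doc_ids
        if shard_for(d, old_shards) != shard_for(d, new_shards)
    ]


ids = [f"doc{i}" for i in range(1000)]
moved = moved_ids(ids, old_shards=4, new_shards=5)
print(f"{len(moved)} of {len(ids)} ids would route to a different shard")
```

With this scheme, going from 4 to 5 shards reroutes the large majority of ids, which is why systems that support live growth need either shard splitting with data movement or a routing layer that tracks where each record lives.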

Re: Options for automagically Scaling Solr (without needing distributed index/replication) in a Hadoop environment

2012-04-17 Thread Jan Høydahl
Hi, I think Katta integration is nice, but it is not very real-time. What if you want both? Perhaps a Katta/SolrCloud integration could make the two frameworks play together, so that some shards in SolrCloud may be marked as static while others are realtime. SolrCloud will handle indexing the

Re: Options for automagically Scaling Solr (without needing distributed index/replication) in a Hadoop environment

2012-04-17 Thread Otis Gospodnetic
for Solr - http://sematext.com/spm/solr-performance-monitoring/index.html From: Jason Rutherglen jason.rutherg...@gmail.com To: solr-user@lucene.apache.org Sent: Monday, April 16, 2012 8:42 PM Subject: Re: Options for automagically Scaling Solr (without needing

Re: Options for automagically Scaling Solr (without needing distributed index/replication) in a Hadoop environment

2012-04-17 Thread Jason Rutherglen
/index.html From: Jason Rutherglen jason.rutherg...@gmail.com To: solr-user@lucene.apache.org Sent: Monday, April 16, 2012 8:42 PM Subject: Re: Options for automagically Scaling Solr (without needing distributed index/replication) in a Hadoop environment One of big

Re: Options for automagically Scaling Solr (without needing distributed index/replication) in a Hadoop environment

2012-04-17 Thread Lukáš Vlček
Hi, speaking about ES I think it would be fair to mention that one has to specify the number of shards upfront when the index is created - that is correct. However, it is possible to give an index one or more aliases, which basically means that you can add new indices on the fly and give them the same alias

Re: Options for automagically Scaling Solr (without needing distributed index/replication) in a Hadoop environment

2012-04-16 Thread Jason Rutherglen
One of the big weaknesses of Solr Cloud (and ES?) is the lack of the ability to redistribute shards across servers, meaning the ability to split a single shard once it grows too large, while live updates continue. How do you plan on elastically adding more servers without this feature? Cassandra and HBase

Re: Options for automagically Scaling Solr (without needing distributed index/replication) in a Hadoop environment

2012-04-15 Thread Jason Rutherglen
: solr-user@lucene.apache.org solr-user@lucene.apache.org Sent: Friday, April 13, 2012 7:16 PM Subject: Re: Options for automagically Scaling Solr (without needing distributed index/replication) in a Hadoop environment Thanks Otis. I really appreciate the details offered here. This was very

Re: Options for automagically Scaling Solr (without needing distributed index/replication) in a Hadoop environment

2012-04-14 Thread Jan Høydahl
Hi, This won't give you the performance you need, unless you have enough RAM on the Solr box to cache the whole index in memory. Have you tested this yourself? -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com On 12. apr. 2012, at

Re: Options for automagically Scaling Solr (without needing distributed index/replication) in a Hadoop environment

2012-04-14 Thread Otis Gospodnetic
From: Ali S Kureishy safdar.kurei...@gmail.com To: Otis Gospodnetic otis_gospodne...@yahoo.com Cc: solr-user@lucene.apache.org solr-user@lucene.apache.org Sent: Friday, April 13, 2012 7:16 PM Subject: Re: Options for automagically Scaling Solr (without needing distributed index

Re: Options for automagically Scaling Solr (without needing distributed index/replication) in a Hadoop environment

2012-04-14 Thread Lance Norskog
(without needing distributed index/replication) in a Hadoop environment Thanks Otis. I really appreciate the details offered here. This was very helpful information. I'm going to go through Solandra and Elastic Search and see if those make sense. I was also given a suggestion to use SolrCloud

Re: Options for automagically Scaling Solr (without needing distributed index/replication) in a Hadoop environment

2012-04-13 Thread Jan Høydahl
Hi, For a web crawl+search like this you will probably need a lot of additional Big Data crunching, so a Hadoop-based solution is wise. In addition to the products mentioned, we also now have Amazon's own CloudSearch: http://aws.amazon.com/cloudsearch/ It's new, is not as cool as Solr (not

Re: Options for automagically Scaling Solr (without needing distributed index/replication) in a Hadoop environment

2012-04-13 Thread Ali S Kureishy
Thanks Otis. I really appreciate the details offered here. This was very helpful information. I'm going to go through Solandra and Elastic Search and see if those make sense. I was also given a suggestion to use SolrCloud on FuseDFS (that's two recommendations for SolrCloud so far), so I will

Re: Options for automagically Scaling Solr (without needing distributed index/replication) in a Hadoop environment

2012-04-12 Thread Darren Govoni
You could use SolrCloud (for the automatic scaling), mount a FUSE[1] HDFS directory, and configure Solr to use that directory for its data. [1] https://ccp.cloudera.com/display/CDHDOC/Mountable+HDFS On Thu, 2012-04-12 at 16:04 +0300, Ali S Kureishy wrote: Hi, I'm trying to setup a

Re: Options for automagically Scaling Solr (without needing distributed index/replication) in a Hadoop environment

2012-04-12 Thread Ali S Kureishy
Thanks Darren. Actually, I would like the system to be homogeneous - i.e., use Hadoop-based tools that already provide all the necessary scaling for the Lucene index (in terms of throughput, latency of writes/reads etc). Since SolrCloud adds its own layer of sharding/replication that is outside

RE: Re: Options for automagically Scaling Solr (without needing distributed index/replication) in a Hadoop environment

2012-04-12 Thread Darren Govoni
SolrCloud or any other tech-specific replication isn't going to 'just work' with Hadoop replication. But with some significant custom coding anything should be possible. Interesting idea. --- Original Message --- On 4/12/2012 09:21 AM Ali S Kureishy wrote: Thanks Darren.

Re: Options for automagically Scaling Solr (without needing distributed index/replication) in a Hadoop environment

2012-04-12 Thread Otis Gospodnetic
Hello Ali, I'm trying to set up a large-scale *Crawl + Index + Search* infrastructure using Nutch and Solr/Lucene. The targeted scale is *5 Billion web pages*, crawled + indexed every *4 weeks*, with a search latency of less than 0.5 seconds. That's fine. Whether it's doable with any tech