Re: anyone use hadoop+solr?

2010-06-22 Thread Neeb
Hey James, Just wondering if you ever had a chance to try out Hadoop with Solr? I would appreciate any information or directions you could give. I am particularly interested in indexing using a MapReduce job. Cheers, -Ali

Re: solr with hadoop

2010-06-22 Thread Neeb
Hi, We currently have a master-slave setup for Solr with two slave servers. We are using SolrJ (StreamingUpdateSolrServer) to index into the master-slave setup, which takes 6 hours for around 15 million documents. I would like to explore Hadoop, in particular for running the indexing job with a MapReduce approach.
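For a sense of scale, 15 million documents in 6 hours works out to roughly 694 documents per second. The sketch below is only an illustration of how a MapReduce-style indexing job could be laid out, not the approach from any Solr patch: the map phase assigns each document to a reducer by hashing its id, and each reducer groups its documents into batches that could then be sent to an indexer. The function names, the `id` field, and the batch size are all assumptions made for the example, and plain Python stands in for the Hadoop API.

```python
import zlib
from collections import defaultdict


def docs_per_second(total_docs, hours):
    """Average indexing throughput for a job of the given size and duration."""
    return total_docs / (hours * 3600)


def map_phase(docs, num_reducers):
    """Emit (reducer_key, doc) pairs; the key is a stable hash of the doc id.

    Hypothetical stand-in for a Hadoop mapper: crc32 is used so the same id
    always lands on the same reducer, independent of the Python process.
    """
    for doc in docs:
        key = zlib.crc32(doc["id"].encode("utf-8")) % num_reducers
        yield key, doc


def reduce_phase(pairs, batch_size):
    """Group documents per reducer key and split each group into batches.

    In a real job each batch would be handed to an indexer (for example a
    SolrJ client); here the batches are simply returned.
    """
    grouped = defaultdict(list)
    for key, doc in pairs:
        grouped[key].append(doc)
    return {
        key: [group[i:i + batch_size] for i in range(0, len(group), batch_size)]
        for key, group in grouped.items()
    }


if __name__ == "__main__":
    print(round(docs_per_second(15_000_000, 6)), "docs/sec on average")
    sample = [{"id": f"doc-{i}"} for i in range(10)]
    for key, batches in sorted(reduce_phase(map_phase(sample, 2), 3).items()):
        print("reducer", key, "->", [len(b) for b in batches], "docs per batch")
```

Hashing on the document id keeps the assignment deterministic, so re-running the job sends a given document to the same reducer (and, if reducers map to shards, the same shard).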

Re: anyone use hadoop+solr?

2010-06-22 Thread Neeb
Thanks Marc, Well, I have an HBase storage architecture and a Solr master-slave setup with two slave servers. Would this patch work with my setup? Do I need sharding in place? And what tasks would run in the map and reduce phases? I was thinking something like: At Map: read documents as

Re: Filtering near-duplicates using TextProfileSignature

2010-06-09 Thread Neeb
Thanks guys. I will try this with some test documents, fingers crossed. And by the way, I got the minTokenLen parameter from one of the thread replies (from Erik). Cheerz, Ali

Re: Filtering near-duplicates using TextProfileSignature

2010-06-08 Thread Neeb
Hey Andrew, Just wondering if you ever managed to run TextProfileSignature-based deduplication. I would appreciate it if you could send me the code fragment for it from solrconfig. I currently have something like this, but I am not sure if I am doing it right: updateRequestProcessorChain