Build a Java package for the required schema and solrconfig fields and configuration.
Hello Everyone, I have created an autosuggest feature using the Solr suggester. I added a field and a field type in schema.xml and made some changes to the /suggest request handler in solrconfig.xml. Now I need to build a Java package using those configurations, which I can plug into my current Java project. I don't want to use cURL; I need my configuration as a jar or Java package. How can I do this? I don't have much experience with jar packaging either. Any help please... Thanks, Nitin
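A note on the packaging question: the schema.xml and solrconfig.xml changes live inside the Solr server (or in ZooKeeper under SolrCloud), so they cannot themselves be bundled into a jar. What goes into the Java project is a client library, typically SolrJ, plus code that calls the /suggest handler. A minimal sketch, assuming Solr 5.x and a core named mycollection (the core name and prefix are illustrative):

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.impl.HttpSolrClient;
  import org.apache.solr.client.solrj.response.QueryResponse;

  public class SuggestClient {
      public static void main(String[] args) throws Exception {
          HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr/mycollection");
          SolrQuery query = new SolrQuery();
          query.setRequestHandler("/suggest");   // route the request to the suggest handler
          query.set("suggest", true);
          query.set("suggest.q", "sol");         // the user's partial input
          QueryResponse rsp = client.query(query);
          System.out.println(rsp.getResponse()); // raw NamedList containing the suggestions
          client.close();
      }
  }

Packaged with the SolrJ dependency (e.g., via Maven), this class is the jar to plug into the existing project.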
Re: How to get around Solr's spellcheck maxEdit limit of 2?
Ok, but IndexBasedSpellChecker needs a directory where all its indexes are stored to do spell check, and I don't have any idea about IndexBasedSpellChecker. If you send me a sample configuration of it, that will help me. Thanks On Fri, Jan 22, 2016 at 1:45 AM Dyer, James <james.d...@ingramcontent.com> wrote: > But if you really need more than 2 edits, I think IndexBasedSpellChecker > supports it. > > James Dyer > Ingram Content Group > > -Original Message- > From: Erick Erickson [mailto:erickerick...@gmail.com] > Sent: Thursday, January 21, 2016 11:29 AM > To: solr-user > Subject: Re: How to get around Solr's spellcheck maxEdit limit of 2? > > bq: ...is anyway to increase that maxEdit > > IIUC, increasing maxEdit beyond 2 increases the space/time required > unacceptably; that limit is there on purpose, put there by people who > know their stuff. > > Best, > Erick > > On Thu, Jan 21, 2016 at 12:39 AM, Nitin Solanki <nitinml...@gmail.com> > wrote: > > I am using Solr for spell correction. Solr is limited to a maxEdit of 2. > > Is there any way to increase that maxEdit without using phonetic mapping? > > Please, any suggestions > >
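For reference, a minimal IndexBasedSpellChecker sketch adapted from the Solr reference guide; the source field, index directory, and accuracy threshold below are placeholders to adjust:

  <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
    <lst name="spellchecker">
      <str name="name">default</str>
      <str name="classname">solr.IndexBasedSpellChecker</str>
      <str name="field">content</str>
      <str name="spellcheckIndexDir">./spellchecker</str>
      <str name="accuracy">0.7</str>
      <str name="buildOnCommit">true</str>
    </lst>
  </searchComponent>

The spellcheckIndexDir is the directory the checker asks about: a sidecar index built from the named field, separate from the main index.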
How to get around Solr's spellcheck maxEdit limit of 2?
I am using Solr for spell correction. Solr is limited to a maxEdit of 2. Is there any way to increase that maxEdit without using phonetic mapping? Any suggestions, please.
IOException, ConnectionTimeout Error while searching
Hello, I indexed 2 million documents, and after indexing completed I tried searching. It throws an IOException and a connection timeout error:

error: {
  msg: org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: http://192.168.1.25:8983/solr/col_ner_shard1_replica1,
  trace: org.apache.solr.common.SolrException: org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: http://192.168.1.25:8983/solr/col_ner_shard1_replica1
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:337)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:144)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:2006)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:777)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:413)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:204)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
    ...
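When the server is merely slow rather than down, the client-side timeouts can be raised so the request waits instead of failing. A sketch for SolrJ 5.x, using the server URL from the error above (timeout values are illustrative):

  import org.apache.solr.client.solrj.impl.HttpSolrClient;

  public class TimeoutExample {
      public static void main(String[] args) throws Exception {
          HttpSolrClient client = new HttpSolrClient("http://192.168.1.25:8983/solr/col_ner_shard1_replica1");
          client.setConnectionTimeout(10000); // ms allowed to establish the TCP connection
          client.setSoTimeout(60000);         // ms allowed to wait for a response on the socket
          client.close();
      }
  }

This only hides the symptom, though; a 2-million-document index that times out usually points to memory pressure or garbage-collection pauses on the server.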
Re: Make search faster in Solr
Okay davidphilip. On Mon, Aug 10, 2015 at 8:24 PM davidphilip cherian davidphilipcher...@gmail.com wrote: Hi Nitin, 32 shards for 16 million documents is too much. 2 shards should suffice considering your document sizes are moderate. Caches are to be monitored and tuned accordingly. You should study caches a bit here: https://cwiki.apache.org/confluence/display/solr/Query+Settings+in+SolrConfig On Mon, Aug 10, 2015 at 4:34 PM, Nitin Solanki nitinml...@gmail.com wrote: Hi, I have 32 shards and a single replica of each shard across 4 nodes on SolrCloud. I have indexed 16 million documents. Without cache, the total time taken to search a document is 0.2 seconds, and with cache it is 0.04 seconds. I haven't configured the caches; they are set to the defaults in solrconfig.xml. How can I make search faster without cache? Or how can I make search even faster with cache? Which cache is used for searching?
Re: Concurrent Indexing and Searching in Solr.
Hi Erick, Thanks a lot for your help. I will go through MongoDB. On Mon, Aug 10, 2015 at 9:14 PM Erick Erickson erickerick...@gmail.com wrote: bq: I changed <maxWarmingSearchers>2</maxWarmingSearchers> to <maxWarmingSearchers>100</maxWarmingSearchers> and applied simultaneous searching using 100 workers. Do not do this. This has nothing to do with the number of searcher threads. And with your update rate, especially if you continue to insist on adding commit=true to every update request, this will explode your memory requirements. To no good purpose whatsoever. bq: But MongoDB can handle concurrent searching and indexing faster. Because MongoDB is optimized for different kinds of operations. Solr is a ranking, free-text search engine. It's an apples-and-oranges comparison. If MongoDB meets your search needs, you should use it. Best, Erick On Sun, Aug 9, 2015 at 11:04 PM, Nitin Solanki nitinml...@gmail.com wrote: Hi, I am using Solr version 5.2.1. It is fast, I think. But again, I am stuck on concurrent searching and threading. I changed <maxWarmingSearchers>2</maxWarmingSearchers> to <maxWarmingSearchers>100</maxWarmingSearchers> and applied simultaneous searching using 100 workers. It works fast but not up to the mark: it improves search time from 1.5 to 0.5 seconds. But if I run only a single worker then the search time is 0.03 seconds, which is very fast but not achievable with 100 simultaneous workers. As Shawn said - Making 100 concurrent indexing requests at the same time as 100 concurrent queries will overwhelm *any* single Solr server. I got your point. But MongoDB can handle concurrent searching and indexing faster. Then why not Solr? Sorry for this.. On Mon, Aug 10, 2015 at 2:39 AM Shawn Heisey apa...@elyograg.org wrote: On 8/7/2015 1:15 PM, Nitin Solanki wrote: I wrote a python script for indexing and am using urllib and urllib2 to index data via HTTP. There are a number of Solr python clients. Using a client makes your code much easier to write and understand. https://wiki.apache.org/solr/SolPython I have no experience with any of these clients, but I can say that the one encountered most often when Python developers come into the #solr IRC channel is pysolr. Our wiki page says the last update for pysolr happened in December of 2013, but I can see that the last version on their web page is dated 2015-05-26. Making 100 concurrent indexing requests at the same time as 100 concurrent queries will overwhelm *any* single Solr server. In a previous message you said that you have 4 CPU cores. The load you're trying to put on Solr will require at *LEAST* 200 threads. It may be more than that. Any single system is going to have trouble with that. A system with 4 cores will be *very* overloaded. Thanks, Shawn
Re: Make search faster in Solr
Hi davidphilip, Can we make searching fast without caching? On Tue, Aug 11, 2015 at 11:43 AM Nitin Solanki nitinml...@gmail.com wrote: Okay davidphilip. On Mon, Aug 10, 2015 at 8:24 PM davidphilip cherian davidphilipcher...@gmail.com wrote: Hi Nitin, 32 shards for 16 million documents is too much. 2 shards should suffice considering your document sizes are moderate. Caches are to be monitored and tuned accordingly. You should study caches a bit here: https://cwiki.apache.org/confluence/display/solr/Query+Settings+in+SolrConfig On Mon, Aug 10, 2015 at 4:34 PM, Nitin Solanki nitinml...@gmail.com wrote: Hi, I have 32 shards and a single replica of each shard across 4 nodes on SolrCloud. I have indexed 16 million documents. Without cache, the total time taken to search a document is 0.2 seconds, and with cache it is 0.04 seconds. I haven't configured the caches; they are set to the defaults in solrconfig.xml. How can I make search faster without cache? Or how can I make search even faster with cache? Which cache is used for searching?
Re: Concurrent Indexing and Searching in Solr.
Hi, I am using Solr version 5.2.1. It is fast, I think. But again, I am stuck on concurrent searching and threading. I changed <maxWarmingSearchers>2</maxWarmingSearchers> to <maxWarmingSearchers>100</maxWarmingSearchers> and applied simultaneous searching using 100 workers. It works fast but not up to the mark: it improves search time from 1.5 to 0.5 seconds. But if I run only a single worker then the search time is 0.03 seconds, which is very fast but not achievable with 100 simultaneous workers. As Shawn said - Making 100 concurrent indexing requests at the same time as 100 concurrent queries will overwhelm *any* single Solr server. I got your point. But MongoDB can handle concurrent searching and indexing faster. Then why not Solr? Sorry for this.. On Mon, Aug 10, 2015 at 2:39 AM Shawn Heisey apa...@elyograg.org wrote: On 8/7/2015 1:15 PM, Nitin Solanki wrote: I wrote a python script for indexing and am using urllib and urllib2 to index data via HTTP. There are a number of Solr python clients. Using a client makes your code much easier to write and understand. https://wiki.apache.org/solr/SolPython I have no experience with any of these clients, but I can say that the one encountered most often when Python developers come into the #solr IRC channel is pysolr. Our wiki page says the last update for pysolr happened in December of 2013, but I can see that the last version on their web page is dated 2015-05-26. Making 100 concurrent indexing requests at the same time as 100 concurrent queries will overwhelm *any* single Solr server. In a previous message you said that you have 4 CPU cores. The load you're trying to put on Solr will require at *LEAST* 200 threads. It may be more than that. Any single system is going to have trouble with that. A system with 4 cores will be *very* overloaded. Thanks, Shawn
Make search faster in Solr
Hi, I have 32 shards and a single replica of each shard across 4 nodes on SolrCloud. I have indexed 16 million documents. Without cache, the total time taken to search a document is 0.2 seconds, and with cache it is 0.04 seconds. I haven't configured the caches; they are set to the defaults in solrconfig.xml. How can I make search faster without cache? Or how can I make search even faster with cache? Which cache is used for searching?
Is cache enabled by default?
Hi, I have commented out the queryResultCache, filterCache, and documentCache. Still, searching is using a cache; why so? 2) The first time a query is searched it takes time, and afterwards it doesn't because of caching; I know that. But how can I make search always fast, even the first time a query is searched?
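The likely answer to the first question: even with Solr's own caches commented out, the operating system keeps recently read index files in its disk cache, and that alone makes repeat queries much faster, so the speedup does not prove a Solr cache is active. For reference, these are the stock definitions in solrconfig.xml that the question refers to (sizes are the shipped defaults):

  <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="0"/>
  <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
  <documentCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>

Making the first, cold query fast is mostly a matter of leaving enough free RAM for the OS to hold the hot parts of the index.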
Re: Concurrent Indexing and Searching in Solr.
Thanks Erick for your suggestion. I will remove commit=true, use Solr 5.2, and then get back to you again for further help. Thanks. On Sat, Aug 8, 2015 at 4:07 AM Erick Erickson erickerick...@gmail.com wrote: bq: So, what is the minimum number of concurrent threads I should run? I really can't answer that in the abstract, you'll simply have to test. I'd prefer SolrJ to post.jar. If you're not going to SolrJ, I'd imagine that moving from Python to post.jar isn't all that useful. But before you do anything, see what really happens when you remove the commit=true. That's likely way more important than the rest. Best, Erick On Fri, Aug 7, 2015 at 3:15 PM, Nitin Solanki nitinml...@gmail.com wrote: Hi Erick, posting files to Solr via curl = Rather than posting files via curl, which is better, SolrJ or post.jar? I don't use either. I wrote a python script for indexing and am using urllib and urllib2 to index data via HTTP. I don't have any option to use SolrJ right now. How can I do the same thing via post.jar from Python? Any help please. indexing with 100 threads is going to eat up a lot of CPU cycles = So, what is the minimum number of concurrent threads I should run? And I also need concurrent searching; how many threads there? And thanks for the Solr 5.2 pointer, I will go through that. Thanks for the reply. Please help me.. On Fri, Aug 7, 2015 at 11:51 PM Erick Erickson erickerick...@gmail.com wrote: bq: How many limitations does Solr have related to indexing and searching simultaneously? It means, how many simultaneous calls can I make for searching and indexing at once? None a priori. It all depends on the hardware you're throwing at it. Obviously indexing with 100 threads is going to eat up a lot of CPU cycles that can't then be devoted to satisfying queries. You need to strike a balance. Do seriously consider using some other method than posting files to Solr via curl or the like; that's rarely a robust solution for production. As for adding the commit=true, this shouldn't be affecting the index size; I suspect you were misled by something else happening. Really, remove it or you'll beat up your system hugely. As for the soft commit interval, that's totally irrelevant when you're committing every document. But do lengthen it as much as you can. Most of the time when people say real time, it turns out that 10 seconds is OK. Or 60 seconds is OK. You have to check what the _real_ requirement is; it's often not what's stated. bq: I am using Solr 5.0 version. Is 5.0 almost similar to 5.2 regarding indexing and searching data? Did you read the link I provided? With replicas, 5.2 will index almost twice as fast. That means (roughly) half the work on the followers is being done, freeing up cycles for performing queries. Best, Erick On Fri, Aug 7, 2015 at 2:06 PM, Nitin Solanki nitinml...@gmail.com wrote: Hi Erick, You said that the soft commit should be more than 3000 ms. Actually, I need real-time searching and that's why I need a fast soft commit. commit=true = I made commit=true because it reduces my indexed data size from 1.5GB to 500MB on *each shard*. When I did commit=false, my indexed data size was 1.5GB. After changing it to commit=true, the size reduced to 500MB only. I am not getting how that is possible. I am using Solr 5.0 version. Is 5.0 almost similar to 5.2 regarding indexing and searching data? How many limitations does Solr have related to indexing and searching simultaneously? It means, how many simultaneous calls can I make for searching and indexing at once?
On Fri, Aug 7, 2015 at 9:18 PM Erick Erickson erickerick...@gmail.com wrote: Your soft commit time of 3 seconds is quite aggressive; I'd lengthen it to as long as possible. Ugh, looked at your query more closely. Adding commit=true to every update request is horrible performance-wise. Letting your autocommit process handle the commits is the first thing I'd do. Second, I'd try going to SolrJ and batching up documents (I usually start with 1,000) or using the post.jar tool rather than sending them via a raw URL. I agree with Upayavira, 100 concurrent threads is a _lot_. Also, what version of Solr? There was a 2x speedup in Solr 5.2, see: http://lucidworks.com/blog/indexing-performance-solr-5-2-now-twice-fast/ One symptom was that the followers were doing way more work than the leader (BTW, using master/slave when talking SolrCloud is a bit confusing...) which will affect query response rates. Basically, if query response is paramount, you really need to throttle your indexing; there's just a whole lot of work going on here. Best, Erick On Fri, Aug 7, 2015 at 11:23 AM, Upayavira u...@odoko.co.uk wrote: How many CPUs do you have? 100 concurrent
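A minimal SolrJ sketch of the batching Erick recommends: accumulate documents, send them in batches of about 1,000, and never pass commit=true, leaving commits to the autoCommit settings. The collection and field names are illustrative:

  import java.util.ArrayList;
  import java.util.List;
  import org.apache.solr.client.solrj.impl.HttpSolrClient;
  import org.apache.solr.common.SolrInputDocument;

  public class BatchIndexer {
      public static void main(String[] args) throws Exception {
          HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr/test_commit_fast");
          List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
          for (int i = 0; i < 100000; i++) {
              SolrInputDocument doc = new SolrInputDocument();
              doc.addField("id", Integer.toString(i));
              doc.addField("field_name", "value " + i);
              batch.add(doc);
              if (batch.size() == 1000) { // send a full batch in one request
                  client.add(batch);      // no explicit commit; autoCommit handles it
                  batch.clear();
              }
          }
          if (!batch.isEmpty()) client.add(batch); // flush the remainder
          client.close();
      }
  }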
Concurrent Indexing and Searching in Solr.
Hello Everyone, I have indexed 16 million documents in SolrCloud, with 4 nodes and 8 shards with a single replica. I am trying to do concurrent indexing and searching on those indexed documents: 100 concurrent indexing calls along with 100 concurrent searching calls. It *degrades both searching and indexing* performance. Configuration: commitWithin:{softCommit:true}, autoCommit:{maxDocs:-1, maxTime:60000, openSearcher:false}, autoSoftCommit:{maxDocs:-1, maxTime:3000}, indexConfig:{maxBufferedDocs:-1, maxMergeDocs:-1, maxIndexingThreads:8, mergeFactor:-1, ramBufferSizeMB:100.0, writeLockTimeout:-1, lockType:native} AND <maxWarmingSearchers>2</maxWarmingSearchers>. I don't know how master and slave work. Normally, I created 8 shards and indexed documents using: *http://localhost:8983/solr/test_commit_fast/update/json?commit=true -H 'Content-type:application/json' -d '[ JSON_Document ]'* and searched using: *http://localhost:8983/solr/test_commit_fast/select?q=field_name:search_string* Please, any help on making searching and indexing fast concurrently. Thanks. Regards, Nitin
Re: Concurrent Indexing and Searching in Solr.
Hi Erick, posting files to Solr via curl = Rather than posting files via curl, which is better, SolrJ or post.jar? I don't use either. I wrote a python script for indexing and am using urllib and urllib2 to index data via HTTP. I don't have any option to use SolrJ right now. How can I do the same thing via post.jar from Python? Any help please. indexing with 100 threads is going to eat up a lot of CPU cycles = So, what is the minimum number of concurrent threads I should run? And I also need concurrent searching; how many threads there? And thanks for the Solr 5.2 pointer, I will go through that. Thanks for the reply. Please help me.. On Fri, Aug 7, 2015 at 11:51 PM Erick Erickson erickerick...@gmail.com wrote: bq: How many limitations does Solr have related to indexing and searching simultaneously? It means, how many simultaneous calls can I make for searching and indexing at once? None a priori. It all depends on the hardware you're throwing at it. Obviously indexing with 100 threads is going to eat up a lot of CPU cycles that can't then be devoted to satisfying queries. You need to strike a balance. Do seriously consider using some other method than posting files to Solr via curl or the like; that's rarely a robust solution for production. As for adding the commit=true, this shouldn't be affecting the index size; I suspect you were misled by something else happening. Really, remove it or you'll beat up your system hugely. As for the soft commit interval, that's totally irrelevant when you're committing every document. But do lengthen it as much as you can. Most of the time when people say real time, it turns out that 10 seconds is OK. Or 60 seconds is OK. You have to check what the _real_ requirement is; it's often not what's stated. bq: I am using Solr 5.0 version. Is 5.0 almost similar to 5.2 regarding indexing and searching data? Did you read the link I provided? With replicas, 5.2 will index almost twice as fast. That means (roughly) half the work on the followers is being done, freeing up cycles for performing queries. Best, Erick On Fri, Aug 7, 2015 at 2:06 PM, Nitin Solanki nitinml...@gmail.com wrote: Hi Erick, You said that the soft commit should be more than 3000 ms. Actually, I need real-time searching and that's why I need a fast soft commit. commit=true = I made commit=true because it reduces my indexed data size from 1.5GB to 500MB on *each shard*. When I did commit=false, my indexed data size was 1.5GB. After changing it to commit=true, the size reduced to 500MB only. I am not getting how that is possible. I am using Solr 5.0 version. Is 5.0 almost similar to 5.2 regarding indexing and searching data? How many limitations does Solr have related to indexing and searching simultaneously? It means, how many simultaneous calls can I make for searching and indexing at once? On Fri, Aug 7, 2015 at 9:18 PM Erick Erickson erickerick...@gmail.com wrote: Your soft commit time of 3 seconds is quite aggressive; I'd lengthen it to as long as possible. Ugh, looked at your query more closely. Adding commit=true to every update request is horrible performance-wise. Letting your autocommit process handle the commits is the first thing I'd do. Second, I'd try going to SolrJ and batching up documents (I usually start with 1,000) or using the post.jar tool rather than sending them via a raw URL. I agree with Upayavira, 100 concurrent threads is a _lot_. Also, what version of Solr?
There was a 2x speedup in Solr 5.2, see: http://lucidworks.com/blog/indexing-performance-solr-5-2-now-twice-fast/ One symptom was that the followers were doing way more work than the leader (BTW, using master/slave when talking SolrCloud is a bit confusing...) which will affect query response rates. Basically, if query response is paramount, you really need to throttle your indexing; there's just a whole lot of work going on here. Best, Erick On Fri, Aug 7, 2015 at 11:23 AM, Upayavira u...@odoko.co.uk wrote: How many CPUs do you have? 100 concurrent indexing calls seems like rather a lot. You're gonna end up doing a lot of context switching, hence degraded performance. Dunno what others would say, but I'd aim for approx one indexing thread per CPU. Upayavira On Fri, Aug 7, 2015, at 02:58 PM, Nitin Solanki wrote: Hello Everyone, I have indexed 16 million documents in SolrCloud, with 4 nodes and 8 shards with a single replica. I am trying to do concurrent indexing and searching on those indexed documents: 100 concurrent indexing calls along with 100 concurrent searching calls. It *degrades both searching and indexing* performance. Configuration: commitWithin:{softCommit:true}, autoCommit:{maxDocs:-1, maxTime:60000, openSearcher:false
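Upayavira's one-indexing-thread-per-CPU advice can be enforced with a bounded pool: submit as many tasks as you like, but only as many as there are cores run at once. A sketch (the work inside each task is a placeholder):

  import java.util.concurrent.ExecutorService;
  import java.util.concurrent.Executors;

  public class BoundedIndexing {
      public static void main(String[] args) {
          int cores = Runtime.getRuntime().availableProcessors();
          ExecutorService pool = Executors.newFixedThreadPool(cores); // at most one task per CPU runs concurrently
          for (int i = 0; i < 100; i++) {
              final int batchNo = i;
              pool.submit(new Runnable() {
                  public void run() {
                      // send one batch of documents to Solr here
                      System.out.println("indexing batch " + batchNo);
                  }
              });
          }
          pool.shutdown(); // finish queued batches, then stop accepting work
      }
  }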
Re: Concurrent Indexing and Searching in Solr.
Hi Erick, You said that the soft commit should be more than 3000 ms. Actually, I need real-time searching and that's why I need a fast soft commit. commit=true = I made commit=true because it reduces my indexed data size from 1.5GB to 500MB on *each shard*. When I did commit=false, my indexed data size was 1.5GB. After changing it to commit=true, the size reduced to 500MB only. I am not getting how that is possible. I am using Solr 5.0 version. Is 5.0 almost similar to 5.2 regarding indexing and searching data? How many limitations does Solr have related to indexing and searching simultaneously? It means, how many simultaneous calls can I make for searching and indexing at once? On Fri, Aug 7, 2015 at 9:18 PM Erick Erickson erickerick...@gmail.com wrote: Your soft commit time of 3 seconds is quite aggressive; I'd lengthen it to as long as possible. Ugh, looked at your query more closely. Adding commit=true to every update request is horrible performance-wise. Letting your autocommit process handle the commits is the first thing I'd do. Second, I'd try going to SolrJ and batching up documents (I usually start with 1,000) or using the post.jar tool rather than sending them via a raw URL. I agree with Upayavira, 100 concurrent threads is a _lot_. Also, what version of Solr? There was a 2x speedup in Solr 5.2, see: http://lucidworks.com/blog/indexing-performance-solr-5-2-now-twice-fast/ One symptom was that the followers were doing way more work than the leader (BTW, using master/slave when talking SolrCloud is a bit confusing...) which will affect query response rates. Basically, if query response is paramount, you really need to throttle your indexing; there's just a whole lot of work going on here. Best, Erick On Fri, Aug 7, 2015 at 11:23 AM, Upayavira u...@odoko.co.uk wrote: How many CPUs do you have? 100 concurrent indexing calls seems like rather a lot. You're gonna end up doing a lot of context switching, hence degraded performance. Dunno what others would say, but I'd aim for approx one indexing thread per CPU. Upayavira On Fri, Aug 7, 2015, at 02:58 PM, Nitin Solanki wrote: Hello Everyone, I have indexed 16 million documents in SolrCloud, with 4 nodes and 8 shards with a single replica. I am trying to do concurrent indexing and searching on those indexed documents: 100 concurrent indexing calls along with 100 concurrent searching calls. It *degrades both searching and indexing* performance. Configuration: commitWithin:{softCommit:true}, autoCommit:{maxDocs:-1, maxTime:60000, openSearcher:false}, autoSoftCommit:{maxDocs:-1, maxTime:3000}, indexConfig:{maxBufferedDocs:-1, maxMergeDocs:-1, maxIndexingThreads:8, mergeFactor:-1, ramBufferSizeMB:100.0, writeLockTimeout:-1, lockType:native} AND <maxWarmingSearchers>2</maxWarmingSearchers>. I don't know how master and slave work. Normally, I created 8 shards and indexed documents using: *http://localhost:8983/solr/test_commit_fast/update/json?commit=true -H 'Content-type:application/json' -d '[ JSON_Document ]'* and searched using: *http://localhost:8983/solr/test_commit_fast/select?q=field_name:search_string* Please, any help on making searching and indexing fast concurrently. Thanks. Regards, Nitin
Re: Concurrent Indexing and Searching in Solr.
Hi Upayavira, RAM = 28GB, CPU = 4 processors. On Fri, Aug 7, 2015 at 8:53 PM Upayavira u...@odoko.co.uk wrote: How many CPUs do you have? 100 concurrent indexing calls seems like rather a lot. You're gonna end up doing a lot of context switching, hence degraded performance. Dunno what others would say, but I'd aim for approx one indexing thread per CPU. Upayavira On Fri, Aug 7, 2015, at 02:58 PM, Nitin Solanki wrote: Hello Everyone, I have indexed 16 million documents in SolrCloud, with 4 nodes and 8 shards with a single replica. I am trying to do concurrent indexing and searching on those indexed documents: 100 concurrent indexing calls along with 100 concurrent searching calls. It *degrades both searching and indexing* performance. Configuration: commitWithin:{softCommit:true}, autoCommit:{maxDocs:-1, maxTime:60000, openSearcher:false}, autoSoftCommit:{maxDocs:-1, maxTime:3000}, indexConfig:{maxBufferedDocs:-1, maxMergeDocs:-1, maxIndexingThreads:8, mergeFactor:-1, ramBufferSizeMB:100.0, writeLockTimeout:-1, lockType:native} AND <maxWarmingSearchers>2</maxWarmingSearchers>. I don't know how master and slave work. Normally, I created 8 shards and indexed documents using: *http://localhost:8983/solr/test_commit_fast/update/json?commit=true -H 'Content-type:application/json' -d '[ JSON_Document ]'* and searched using: *http://localhost:8983/solr/test_commit_fast/select?q=field_name:search_string* Please, any help on making searching and indexing fast concurrently. Thanks. Regards, Nitin
Hard Commit not working
Hi, I am trying to index documents using SolrCloud. After setting maxTime to 60000 ms for the hard commit, documents are visible instantly as they are added; the commit is not happening after 60000 ms. I have added the Solr log below; please check it. I am not getting what exactly is happening. *CURL to commit documents:* curl http://localhost:8983/solr/test/update/json -H 'Content-type:application/json' -d 'json-here' *Solrconfig.xml:*

<autoCommit>
  <maxDocs>1</maxDocs>
  <maxTime>60000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
<!-- <autoSoftCommit> -->
<!--   <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime> -->
<!-- </autoSoftCommit> -->

*Solr Log:* INFO - 2015-07-30 14:14:12.636; [test shard6 core_node2 test_shard6_replica1] org.apache.solr.update.processor.LogUpdateProcessor; [test_shard6_replica1] webapp=/solr path=/update params={update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=http://100.77.202.145:8983/solr/test_shard2_replica1/&commit_end_point=true&wt=javabin&version=2&expungeDeletes=false} {commit=} 0 26
Re: Hard Commit not working
Hi Edward, I am only sending 1 document at a time for indexing, so why is it committing instantly? I set maxTime to 60000. On Thu, Jul 30, 2015 at 8:26 PM Edward Ribeiro edward.ribe...@gmail.com wrote: Your maxDocs is set to 1. This is the number of pending docs before autocommit is triggered, too. You should set it to a higher value, like 10000, for example. Edward Em 30/07/2015 11:43, Nitin Solanki nitinml...@gmail.com escreveu: Hi, I am trying to index documents using SolrCloud. After setting maxTime to 60000 ms for the hard commit, documents are visible instantly as they are added; the commit is not happening after 60000 ms. I have added the Solr log below; please check it. I am not getting what exactly is happening. *CURL to commit documents:* curl http://localhost:8983/solr/test/update/json -H 'Content-type:application/json' -d 'json-here' *Solrconfig.xml:* <autoCommit> <maxDocs>1</maxDocs> <maxTime>60000</maxTime> <openSearcher>false</openSearcher> </autoCommit> <!-- <autoSoftCommit> --> <!-- <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime> --> <!-- </autoSoftCommit> --> *Solr Log:* INFO - 2015-07-30 14:14:12.636; [test shard6 core_node2 test_shard6_replica1] org.apache.solr.update.processor.LogUpdateProcessor; [test_shard6_replica1] webapp=/solr path=/update params={update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=http://100.77.202.145:8983/solr/test_shard2_replica1/&commit_end_point=true&wt=javabin&version=2&expungeDeletes=false} {commit=} 0 26
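In other words, with <maxDocs>1</maxDocs> every single added document triggers a hard commit, regardless of maxTime. To commit on the 60-second timer only, raise or remove the maxDocs trigger; a sketch:

  <autoCommit>
    <maxDocs>10000</maxDocs> <!-- illustrative; omit the element entirely to trigger on time alone -->
    <maxTime>60000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>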
Solr port went down on remote server
Hi, I have installed Solr on a remote server and started it on port 8983. I have bound my local machine's port 8983 to the remote server's Solr port 8983 using *ssh* (Ubuntu OS). When I request suggestions from Solr on the remote server through local-machine calls, sometimes it gives a response and sometimes it doesn't. I am not able to pin down the problem: is it a remote-server port-binding issue, or did Solr go down? To investigate, I ran a crontab job using the telnet command to check that Solr's port (8983) is reachable. It works fine without throwing any connection-refused error, yet I am still not able to detect the problem. Any help please..
Re: Data indexing is going too slow on a single shard. Why?
Okay. Thanks Shawn.. On Thu, Mar 26, 2015 at 12:25 PM, Shawn Heisey apa...@elyograg.org wrote: On 3/26/2015 12:03 AM, Nitin Solanki wrote: Great, thanks Shawn... As you said - **For 204GB of data per server, I recommend at least 128GB of total RAM, preferably 256GB**. Therefore, if I have 204GB of data on a single server/shard then my preference is 256GB, with which searching will be fast and never slow down. Is that right? Obviously I cannot guarantee it, but I think it's extremely likely that with that much memory, performance will be very good. One other possibility, which is discussed on that wiki page I linked, is that your Java heap is being almost exhausted and large amounts of time are spent in garbage collection. If you increase the heap from 4GB to 5GB and see performance get better, then that would be confirmed. There would be less memory available for caching, but constant garbage collection would be a much greater problem than the disk cache being too small. Thanks, Shawn
Re: Data indexing is going too slow on a single shard. Why?
Great, thanks Shawn... As you said - **For 204GB of data per server, I recommend at least 128GB of total RAM, preferably 256GB**. Therefore, if I have 204GB of data on a single server/shard then my preference is 256GB, with which searching will be fast and never slow down. Is that right? On Wed, Mar 25, 2015 at 9:50 PM, Shawn Heisey apa...@elyograg.org wrote: On 3/25/2015 8:42 AM, Nitin Solanki wrote: Server configuration: 8 CPUs. 32 GB RAM. O.S. - Linux <snip> are running. Java heap set to 4096 MB in Solr. While indexing, <snip> *Currently*, I have 1 shard with 2 replicas using SolrCloud. Data Size: 102G solr/node1/solr/wikingram_shard1_replica2 102G solr/node2/solr/wikingram_shard1_replica1 If both of those are on the same machine, I'm guessing that you're running two Solr instances on that machine, so there's 8GB of RAM used for Java. That means you have about 24 GB of RAM left for caching ... and 200GB of index data to cache. 24GB is not enough to cache 200GB of index. If there is only one Solr instance (leaving 28GB for caching) with 102GB of data on the machine, it still might not be enough. See that SolrPerformanceProblems wiki page I linked in my earlier email. For 102GB of data per server, I recommend at least 64GB of total RAM, preferably 128GB. For 204GB of data per server, I recommend at least 128GB of total RAM, preferably 256GB. Thanks, Shawn
Re: Data indexing is going too slow on a single shard. Why?
Hello, *Updating my question again.* Can anyone please assist me? I am indexing on a single shard and it is taking too much time to index the data; I am indexing around 49GB of data on the single shard. What's wrong? Why is Solr taking so much time to index the data? Earlier I was indexing the same data on 8 shards, and at that time it was fast compared to a single shard. Why so? Any help please.. *HardCommit - 15 sec* *SoftCommit - 10 min.* ii) Searching for a query/term is also taking too much time; any help on this also. On Wed, Mar 25, 2015 at 4:33 PM, Nitin Solanki nitinml...@gmail.com wrote: Hello, Can anyone please assist me? I am indexing on a single shard and it is taking too much time to index the data; I am indexing around 49GB of data on the single shard. What's wrong? Why is Solr taking so much time to index the data? Earlier I was indexing the same data on 8 shards, and at that time it was fast compared to a single shard. Why so? Any help please.. *HardCommit - 15 sec* *SoftCommit - 10 min.* Best, Nitin
Re: Data indexing is going too slow on a single shard. Why?
Hi Shawn, Sorry for all of that. Server configuration: 8 CPUs, 32 GB RAM, O.S. - Linux. *Earlier*, I was using 8 shards without extra replicas (the default is 1) using SolrCloud. On the server, only Solr is running; no other applications are running. The Java heap is set to 4096 MB in Solr. While indexing, Solr (sometimes) eats up the whole RAM; I don't know how much RAM each Solr server takes. Each server holds around 50 GB of indexed data. Actually, I had deleted the previous Solr architecture, so I have no idea how many documents were on each shard, nor the total number of documents. *Currently*, I have 1 shard with 2 replicas using SolrCloud. Data Size: 102G solr/node1/solr/wikingram_shard1_replica2 102G solr/node2/solr/wikingram_shard1_replica1 I am running a python script to index data using the Solr REST API, committing 20000 documents each time. If I missed anything related to Solr, please inform me. Thanks Shawn. Waiting for your reply. On Wed, Mar 25, 2015 at 7:33 PM, Shawn Heisey apa...@elyograg.org wrote: On 3/25/2015 5:03 AM, Nitin Solanki wrote: Can anyone please assist me? I am indexing on a single shard and it is taking too much time to index the data; I am indexing around 49GB of data on the single shard. What's wrong? Why is Solr taking so much time to index the data? Earlier I was indexing the same data on 8 shards, and at that time it was fast compared to a single shard. Why so? Any help please.. There's practically no information to go on here, so about all I can offer is general information in return: http://wiki.apache.org/solr/SolrPerformanceProblems I looked over the previous messages that you have sent the list, and I can find very little of the required information about your index. I see a lot of questions from you, but they did not include the kind of details needed here: How much total RAM is in each Solr server? Are there any other programs on the server with significant RAM requirements? An example of such a program would be a database server. On each server, how much memory is dedicated to the java heap(s) for Solr? I gather from other questions that you are running SolrCloud, can you confirm? On a per-server basis, how much disk space do all the index replicas take? How many documents are on each server? Note that for disk space and number of documents, I am asking you to count every replica, not take the total in the collection and divide it by the number of servers. How are you doing your indexing? For this question, I am asking what program or Solr API is actually sending the data to Solr. Possible answers include the dataimport handler, a SolrJ program, one of the other Solr APIs such as a PHP client, and hand-crafted URLs with an HTTP client. Thanks, Shawn
Data indexing is going too slow on a single shard. Why?
Hello, Can anyone please assist me? I am indexing on a single shard and it is taking too much time to index the data; I am indexing around 49GB of data on the single shard. What's wrong? Why is Solr taking so much time to index the data? Earlier I was indexing the same data on 8 shards, and at that time it was fast compared to a single shard. Why so? Any help please.. *HardCommit - 15 sec* *SoftCommit - 10 min.* Best, Nitin
Read or Capture Solr Logs
Hello, I want to read or capture all the queries which are searched by users. Any help on this?
Set search query logs into Solr
Hello, I want to insert searched queries into the Solr log to track users' input. I googled a lot but didn't find anything. Please help. Your help will be appreciated...
Re: Read or Capture Solr Logs
Hi Markus, Can you please help me with how to do that, using either approach: process the logs, or make a simple SearchComponent implementation that reads SolrQueryRequest? On Tue, Mar 24, 2015 at 4:25 PM, Nitin Solanki nitinml...@gmail.com wrote: Hi Markus, Can you please help me with how to do that, using either approach: process the logs, or make a simple SearchComponent implementation that reads SolrQueryRequest. On Tue, Mar 24, 2015 at 4:17 PM, Markus Jelsma markus.jel...@openindex.io wrote: Hello, you can either process the logs, or make a simple SearchComponent implementation that reads SolrQueryRequest. Markus -Original message- From: Nitin Solanki nitinml...@gmail.com Sent: Tuesday 24th March 2015 11:38 To: solr-user@lucene.apache.org Subject: Read or Capture Solr Logs Hello, I want to read or capture all the queries which are searched by users. Any help on this?
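A skeleton of Markus's second suggestion: a custom SearchComponent that reads the SolrQueryRequest (via rb.req) and logs the query string. Method names follow the Solr 4.x/5.x SearchComponent API; the component still has to be compiled into a jar on Solr's classpath and added to a handler's components list in solrconfig.xml:

  import java.io.IOException;
  import org.apache.solr.handler.component.ResponseBuilder;
  import org.apache.solr.handler.component.SearchComponent;
  import org.slf4j.Logger;
  import org.slf4j.LoggerFactory;

  public class QueryLogComponent extends SearchComponent {
      private static final Logger log = LoggerFactory.getLogger(QueryLogComponent.class);

      @Override
      public void prepare(ResponseBuilder rb) throws IOException {
          // the SolrQueryRequest is available as rb.req; record the user's query
          log.info("user query: {}", rb.req.getParams().get("q"));
      }

      @Override
      public void process(ResponseBuilder rb) throws IOException {
          // nothing to do at process time for pure logging
      }

      @Override
      public String getDescription() {
          return "Logs user queries";
      }

      public String getSource() { // required as an abstract method by some 4.x versions
          return null;
      }
  }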
Re: How to deal with different configurations on different collections?
Thanks Shawn. It is working now as you said. There was no need to switch to an external ZooKeeper; it also works with the embedded ZooKeeper. On Mon, Mar 23, 2015 at 5:42 PM, Shawn Heisey apa...@elyograg.org wrote: On 3/23/2015 4:51 AM, Nitin Solanki wrote: A few days ago, I created a collection (wikingram) in Solr 4.10.4 (SolrCloud) by applying the default configuration from collection1. *sudo /mnt/nitin/Solr/solr_lm/example/scripts/cloud-scripts/zkcli.sh -zkhost localhost:9983 -cmd upconfig -confdir /mnt/nitin/Solr/solr_lm/example/solr/collection1/conf -confname default* Now I want to create another collection (wikingram2) which needs a different configuration. How can I do that? How do I deal with different configurations on different collections? *Scenario:* { wikingram : myconf1, wikingram2 : myconf2 } How do I set up the configurations just like above? The upconfig command that you executed has uploaded a config named default (because of the -confname default parameters). To do what you want, simply repeat the upconfig command with another configuration directory and -confname myconf2, then use that configName when you call the Collections API to create the second collection. I notice you're using the embedded zookeeper. You're going to want to switch to an external zookeeper ensemble with at least three hosts before you go into production. Thanks, Shawn
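For reference, the two steps Shawn describes, written out in the same style as the original command (paths and parameter values are illustrative):

  sudo /mnt/nitin/Solr/solr_lm/example/scripts/cloud-scripts/zkcli.sh -zkhost localhost:9983 -cmd upconfig -confdir /path/to/wikingram2/conf -confname myconf2

and then create the collection against that config via the Collections API:

  http://localhost:8983/solr/admin/collections?action=CREATE&name=wikingram2&numShards=8&replicationFactor=1&collection.configName=myconf2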
How to deal with different configurations on different collections?
Hello, A few days ago, I created a collection (wikingram) in Solr 4.10.4 (SolrCloud) by applying the default configuration from collection1. *sudo /mnt/nitin/Solr/solr_lm/example/scripts/cloud-scripts/zkcli.sh -zkhost localhost:9983 -cmd upconfig -confdir /mnt/nitin/Solr/solr_lm/example/solr/collection1/conf -confname default* Now I want to create another collection (wikingram2) which needs a different configuration. How can I do that? How do I deal with different configurations on different collections? *Scenario:* { wikingram : myconf1, wikingram2 : myconf2 } How do I set up the configurations just like above? I don't have much idea about that.. Any help please?
Re: Whole RAM consumed while Indexing.
Hi Erick, I read the mergeFactor policy for indexing. By default, mergeFactor is 10. As the documentation says: High-value merge factor (e.g., 25): - Pro: generally improves indexing speed - Con: less frequent merges, resulting in a collection with more index files, which may slow searching. Low-value merge factor (e.g., 2): - Pro: smaller number of index files, which speeds up searching. - Con: more segment merges slow down indexing. So, my main purpose is **searching**. Searching must be fast. Therefore, if I set the value **mergeFactor = 2**, then indexing will be slow but searching should be fast, right? Once again, I will state it: I am indexing (total data size - 28GB) 20000 documents at a time, with commits after 15 seconds (hard commit) and 10 minutes (soft commit). Will searching be fast if I set **mergeFactor = 2**, and what should the values be for ramBufferSizeMB, maxBufferedDocs, and maxIndexingThreads? Right now, all values are set to the defaults. On Fri, Mar 20, 2015 at 11:42 AM, Nitin Solanki nitinml...@gmail.com wrote: On Fri, Mar 20, 2015 at 1:35 AM, Erick Erickson erickerick...@gmail.com wrote: That or even hard commit to 60 seconds. It's strictly a matter of how often you want to close old segments and open new ones. On Thu, Mar 19, 2015 at 3:12 AM, Nitin Solanki nitinml...@gmail.com wrote: Hi Erick.. I read your article. Really nice... In it you said that for bulk indexing, set soft commit = 10 mins and hard commit = 15 sec. Is that also okay for my scenario? On Thu, Mar 19, 2015 at 1:53 AM, Erick Erickson erickerick...@gmail.com wrote: bq: As you said, do commits after 60000 seconds No, No, No. I'm NOT saying 60000 seconds! That time is in _milliseconds_ as Shawn said. So setting it to 60000 is every minute. From solrconfig.xml, conveniently located immediately above the autoCommit tag: maxTime - Maximum amount of time in ms that is allowed to pass since a document was added before automatically triggering a new commit. Also, a lot of answers about soft and hard commits are here, as I pointed out before; did you read it? https://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/ Best, Erick On Wed, Mar 18, 2015 at 9:44 AM, Alexandre Rafalovitch arafa...@gmail.com wrote: Probably merged somewhat differently, with some terms indexes repeating between segments. Check the number of segments in the data directory. And do a search for *:* and make sure both do have the same document counts. Also, in all these discussions, you still haven't answered how fast after indexing you want to _search_. Because, if you are not actually searching while committing, you could even index on a completely separate server (e.g. a faster one) and swap (or alias) the index in afterwards. Unless, of course, I missed it; it's a lot of emails in a very short window of time. Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/ On 18 March 2015 at 12:09, Nitin Solanki nitinml...@gmail.com wrote: When I kept my configuration at 300 for soft commit and 3000 for hard commit and indexed some amount of data, I got a total index size of 6GB after completing the indexing. When I changed the configuration to 60000 for soft commit and 60000 for hard commit and indexed the same data, I got a total index size of 5GB after completing the indexing. But the number of documents in both scenarios was the same. I am wondering how that can be possible?
On Wed, Mar 18, 2015 at 9:14 PM, Nitin Solanki nitinml...@gmail.com wrote: Hi Erick, I am just saying; I want to be sure about the difference between commit settings: what happens if I do frequent commits or not? And the reason I am saying that I need to commit things so very quickly is that I have to index 28GB of data, which takes 7-8 hours (with frequent commits). If, as you said, I do commits after 60000 seconds, then it will be more expensive. If I don't encounter the **overlapping searchers warning messages**, then I feel it seems to be okay. Is it? On Wed, Mar 18, 2015 at 8:54 PM, Erick Erickson erickerick...@gmail.com wrote: Don't do it. Really, why do you want to do this? This seems like an XY problem; you haven't explained why you need to commit things so very quickly. I suspect you haven't tried _searching_ while committing at such a rate, and you might as well turn all your top-level caches off in solrconfig.xml since they won't be useful at all. Best, Erick On Wed, Mar 18, 2015 at 6:24 AM, Nitin Solanki nitinml...@gmail.com wrote: Hi, If I do very, very fast indexing (softcommit = 300 and hardcommit
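For reference, the knobs discussed above live in the indexConfig block of solrconfig.xml. A sketch showing where they go, using the values mentioned in the thread (starting points, not recommendations):

  <indexConfig>
    <mergeFactor>10</mergeFactor>           <!-- lower values (e.g. 2) favor search speed over indexing speed -->
    <ramBufferSizeMB>100</ramBufferSizeMB>  <!-- buffered documents are flushed to a new segment at this size -->
    <maxIndexingThreads>8</maxIndexingThreads>
  </indexConfig>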
Re: Whole RAM consumed while Indexing.
On Fri, Mar 20, 2015 at 1:35 AM, Erick Erickson erickerick...@gmail.com wrote: That or even hard commit to 60 seconds. It's strictly a matter of how often you want to close old segments and open new ones. On Thu, Mar 19, 2015 at 3:12 AM, Nitin Solanki nitinml...@gmail.com wrote: Hi Erick.. I read your article. Really nice... In it you said that for bulk indexing, set soft commit = 10 mins and hard commit = 15 sec. Is that also okay for my scenario? On Thu, Mar 19, 2015 at 1:53 AM, Erick Erickson erickerick...@gmail.com wrote: bq: As you said, do commits after 60000 seconds No, No, No. I'm NOT saying 60000 seconds! That time is in _milliseconds_ as Shawn said. So setting it to 60000 is every minute. From solrconfig.xml, conveniently located immediately above the autoCommit tag: maxTime - Maximum amount of time in ms that is allowed to pass since a document was added before automatically triggering a new commit. Also, a lot of answers about soft and hard commits are here, as I pointed out before; did you read it? https://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/ Best, Erick On Wed, Mar 18, 2015 at 9:44 AM, Alexandre Rafalovitch arafa...@gmail.com wrote: Probably merged somewhat differently, with some terms indexes repeating between segments. Check the number of segments in the data directory. And do a search for *:* and make sure both do have the same document counts. Also, in all these discussions, you still haven't answered how fast after indexing you want to _search_. Because, if you are not actually searching while committing, you could even index on a completely separate server (e.g. a faster one) and swap (or alias) the index in afterwards. Unless, of course, I missed it; it's a lot of emails in a very short window of time. Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/ On 18 March 2015 at 12:09, Nitin Solanki nitinml...@gmail.com wrote: When I kept my configuration at 300 for soft commit and 3000 for hard commit and indexed some amount of data, I got a total index size of 6GB after completing the indexing. When I changed the configuration to 60000 for soft commit and 60000 for hard commit and indexed the same data, I got a total index size of 5GB after completing the indexing. But the number of documents in both scenarios was the same. I am wondering how that can be possible? On Wed, Mar 18, 2015 at 9:14 PM, Nitin Solanki nitinml...@gmail.com wrote: Hi Erick, I am just saying; I want to be sure about the difference between commit settings: what happens if I do frequent commits or not? And the reason I am saying that I need to commit things so very quickly is that I have to index 28GB of data, which takes 7-8 hours (with frequent commits). If, as you said, I do commits after 60000 seconds, then it will be more expensive. If I don't encounter the **overlapping searchers warning messages**, then I feel it seems to be okay. Is it? On Wed, Mar 18, 2015 at 8:54 PM, Erick Erickson erickerick...@gmail.com wrote: Don't do it. Really, why do you want to do this? This seems like an XY problem; you haven't explained why you need to commit things so very quickly. I suspect you haven't tried _searching_ while committing at such a rate, and you might as well turn all your top-level caches off in solrconfig.xml since they won't be useful at all.
Best, Erick On Wed, Mar 18, 2015 at 6:24 AM, Nitin Solanki nitinml...@gmail.com wrote: Hi, If I do very, very fast indexing (softcommit = 300 and hardcommit = 3000) vs. slow indexing (softcommit = 60000 and hardcommit = 60000) as you both said, will fast indexing fail to index some data? Any suggestion on this? On Tue, Mar 17, 2015 at 2:29 AM, Ramkumar R. Aiyengar andyetitmo...@gmail.com wrote: Yes, and doing so is painful and takes lots of people and hardware resources to get there for large amounts of data and queries :) As Erick says, work backwards from 60s and first establish how high the commit interval can be to satisfy your use case.. On 16 Mar 2015 16:04, Erick Erickson erickerick...@gmail.com wrote: First start by lengthening your soft and hard commit intervals substantially. Start with 60000 and work backwards, I'd say. Ramkumar has tuned the heck out of his installation to get the commit intervals to be that short ;). I'm betting that you'll see your RAM usage go way down, but that's a guess until you test. Best, Erick On Sun, Mar 15, 2015 at 10:56 PM, Nitin Solanki nitinml...@gmail.com wrote: Hi Erick
Re: Whole RAM consumed while Indexing.
Hi Alexandre, The numbers of segments are different but the document counts are the same. With (soft commit - 300 and hard commit - 6000) = no. of segments - 43, AND with (soft commit - 60000 and hard commit - 60000) = no. of segments - 31. I don't have any idea about segment counts. What are they? How do I deal with this? Any idea? Or is it fine without worrying about segments? Just want to ask: if segment counts are higher, will searching be slow? On Wed, Mar 18, 2015 at 10:14 PM, Alexandre Rafalovitch arafa...@gmail.com wrote: Probably merged somewhat differently, with some terms indexes repeating between segments. Check the number of segments in the data directory. And do a search for *:* and make sure both do have the same document counts. Also, in all these discussions, you still haven't answered how fast after indexing you want to _search_. Because, if you are not actually searching while committing, you could even index on a completely separate server (e.g. a faster one) and swap (or alias) the index in afterwards. Unless, of course, I missed it; it's a lot of emails in a very short window of time. Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/ On 18 March 2015 at 12:09, Nitin Solanki nitinml...@gmail.com wrote: When I kept my configuration at 300 for soft commit and 3000 for hard commit and indexed some amount of data, I got a total index size of 6GB after completing the indexing. When I changed the configuration to 60000 for soft commit and 60000 for hard commit and indexed the same data, I got a total index size of 5GB after completing the indexing. But the number of documents in both scenarios was the same. I am wondering how that can be possible? On Wed, Mar 18, 2015 at 9:14 PM, Nitin Solanki nitinml...@gmail.com wrote: Hi Erick, I am just saying; I want to be sure about the difference between commit settings: what happens if I do frequent commits or not? And the reason I am saying that I need to commit things so very quickly is that I have to index 28GB of data, which takes 7-8 hours (with frequent commits). If, as you said, I do commits after 60000 seconds, then it will be more expensive. If I don't encounter the **overlapping searchers warning messages**, then I feel it seems to be okay. Is it? On Wed, Mar 18, 2015 at 8:54 PM, Erick Erickson erickerick...@gmail.com wrote: Don't do it. Really, why do you want to do this? This seems like an XY problem; you haven't explained why you need to commit things so very quickly. I suspect you haven't tried _searching_ while committing at such a rate, and you might as well turn all your top-level caches off in solrconfig.xml since they won't be useful at all. Best, Erick On Wed, Mar 18, 2015 at 6:24 AM, Nitin Solanki nitinml...@gmail.com wrote: Hi, If I do very, very fast indexing (softcommit = 300 and hardcommit = 3000) vs. slow indexing (softcommit = 60000 and hardcommit = 60000) as you both said, will fast indexing fail to index some data? Any suggestion on this? On Tue, Mar 17, 2015 at 2:29 AM, Ramkumar R. Aiyengar andyetitmo...@gmail.com wrote: Yes, and doing so is painful and takes lots of people and hardware resources to get there for large amounts of data and queries :) As Erick says, work backwards from 60s and first establish how high the commit interval can be to satisfy your use case.. On 16 Mar 2015 16:04, Erick Erickson erickerick...@gmail.com wrote: First start by lengthening your soft and hard commit intervals substantially. Start with 60000 and work backwards, I'd say.
Ramkumar has tuned the heck out of his installation to get the commit intervals to be that short ;). I'm betting that you'll see your RAM usage go way down, but that's a guess until you test. Best, Erick On Sun, Mar 15, 2015 at 10:56 PM, Nitin Solanki nitinml...@gmail.com wrote: Hi Erick, What you are saying is correct. Some **overlapping searchers warning messages** are coming in the logs, and the **numDocs numbers** are changing while documents are being added during indexing. Any help? On Sat, Mar 14, 2015 at 11:24 PM, Erick Erickson erickerick...@gmail.com wrote: First, the soft commit interval is very short. Very, very, very, very short. 300ms is just short of insane unless it's a typo ;). Here's a long background: https://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/ But the short form is that you're opening searchers every 300 ms. The hard commit is better, but every 3 seconds is still far too short IMO. I'd start with soft commits of 60000 and hard commits of 60000 (60 seconds
Re: Whole RAM consumed while Indexing.
Hi Erick.. I read your article. Really nice... In it you said that for bulk indexing, set soft commit = 10 mins and hard commit = 15 sec. Is that also okay for my scenario? On Thu, Mar 19, 2015 at 1:53 AM, Erick Erickson erickerick...@gmail.com wrote: bq: As you said, do commits after 60000 seconds No, No, No. I'm NOT saying 60000 seconds! That time is in _milliseconds_ as Shawn said. So setting it to 60000 is every minute. From solrconfig.xml, conveniently located immediately above the autoCommit tag: maxTime - Maximum amount of time in ms that is allowed to pass since a document was added before automatically triggering a new commit. Also, a lot of answers about soft and hard commits are here, as I pointed out before; did you read it? https://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/ Best, Erick On Wed, Mar 18, 2015 at 9:44 AM, Alexandre Rafalovitch arafa...@gmail.com wrote: Probably merged somewhat differently, with some terms indexes repeating between segments. Check the number of segments in the data directory. And do a search for *:* and make sure both do have the same document counts. Also, in all these discussions, you still haven't answered how fast after indexing you want to _search_. Because, if you are not actually searching while committing, you could even index on a completely separate server (e.g. a faster one) and swap (or alias) the index in afterwards. Unless, of course, I missed it; it's a lot of emails in a very short window of time. Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/ On 18 March 2015 at 12:09, Nitin Solanki nitinml...@gmail.com wrote: When I kept my configuration at 300 for soft commit and 3000 for hard commit and indexed some amount of data, I got a total index size of 6GB after completing the indexing. When I changed the configuration to 60000 for soft commit and 60000 for hard commit and indexed the same data, I got a total index size of 5GB after completing the indexing. But the number of documents in both scenarios was the same. I am wondering how that can be possible? On Wed, Mar 18, 2015 at 9:14 PM, Nitin Solanki nitinml...@gmail.com wrote: Hi Erick, I am just saying; I want to be sure about the difference between commit settings: what happens if I do frequent commits or not? And the reason I am saying that I need to commit things so very quickly is that I have to index 28GB of data, which takes 7-8 hours (with frequent commits). If, as you said, I do commits after 60000 seconds, then it will be more expensive. If I don't encounter the **overlapping searchers warning messages**, then I feel it seems to be okay. Is it? On Wed, Mar 18, 2015 at 8:54 PM, Erick Erickson erickerick...@gmail.com wrote: Don't do it. Really, why do you want to do this? This seems like an XY problem; you haven't explained why you need to commit things so very quickly. I suspect you haven't tried _searching_ while committing at such a rate, and you might as well turn all your top-level caches off in solrconfig.xml since they won't be useful at all. Best, Erick On Wed, Mar 18, 2015 at 6:24 AM, Nitin Solanki nitinml...@gmail.com wrote: Hi, If I do very, very fast indexing (softcommit = 300 and hardcommit = 3000) vs. slow indexing (softcommit = 60000 and hardcommit = 60000) as you both said, will fast indexing fail to index some data? Any suggestion on this? On Tue, Mar 17, 2015 at 2:29 AM, Ramkumar R.
Aiyengar andyetitmo...@gmail.com wrote: Yes, and doing so is painful and takes lots of people and hardware resources to get there for large amounts of data and queries :) As Erick says, work backwards from 60s and first establish how high the commit interval can be to satisfy your use case.. On 16 Mar 2015 16:04, Erick Erickson erickerick...@gmail.com wrote: First start by lengthening your soft and hard commit intervals substantially. Start with 6 and work backwards I'd say. Ramkumar has tuned the heck out of his installation to get the commit intervals to be that short ;). I'm betting that you'll see your RAM usage go way down, but that' s a guess until you test. Best, Erick On Sun, Mar 15, 2015 at 10:56 PM, Nitin Solanki nitinml...@gmail.com wrote: Hi Erick, You are saying correct. Something, **overlapping searchers warning messages** are coming in logs. **numDocs numbers** are changing when documents are adding at the time of indexing. Any help? On Sat, Mar 14, 2015 at 11:24 PM, Erick Erickson erickerick...@gmail.com wrote: First, the soft commit interval is very short. Very, very, very, very short. 300ms is just
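For reference, a minimal sketch of what the bulk-indexing settings mentioned above (10-minute soft commits, 15-second hard commits, from the linked article) would look like in solrconfig.xml. The concrete values are the article's suggestion, not something posted in this thread; tune them to how quickly you need documents visible:

<autoCommit>
  <maxTime>15000</maxTime>           <!-- hard commit every 15 seconds; flushes and rolls the tlog -->
  <openSearcher>false</openSearcher> <!-- do not open a new searcher on hard commit -->
</autoCommit>
<autoSoftCommit>
  <maxTime>600000</maxTime>          <!-- soft commit every 10 minutes; controls search visibility -->
</autoSoftCommit>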
Re: Whole RAM consumed while Indexing.
Hi, If I do very, very fast indexing (softcommit = 300 and hardcommit = 3000) v/s slow indexing (softcommit = 60000 and hardcommit = 60000) as you both said, will fast indexing fail to index some data? Any suggestion on this? On Tue, Mar 17, 2015 at 2:29 AM, Ramkumar R. Aiyengar andyetitmo...@gmail.com wrote: Yes, and doing so is painful and takes lots of people and hardware resources to get there for large amounts of data and queries :) As Erick says, work backwards from 60s and first establish how high the commit interval can be to satisfy your use case.. On 16 Mar 2015 16:04, Erick Erickson erickerick...@gmail.com wrote: First start by lengthening your soft and hard commit intervals substantially. Start with 60000 and work backwards I'd say. Ramkumar has tuned the heck out of his installation to get the commit intervals to be that short ;). I'm betting that you'll see your RAM usage go way down, but that's a guess until you test. Best, Erick
Re: Add replica on shards
Thanks Norgorn. I did the same thing but in a different manner, like - localhost:8983/solr/admin/cores?action=CREATE&name=wikingram_shard4_replica3&collection=wikingram&property.shard=shard4 On Wed, Mar 18, 2015 at 7:20 PM, Norgorn lsunnyd...@mail.ru wrote: U can do the same simply by something like that http://localhost:8983/solr/admin/cores?action=CREATE&collection=wikingram&name=ANY_NAME_HERE&shard=shard1 The main part is shard=shard1; when you create a core with an existing shard (the core name doesn't matter, we use collection_shard1_replica2, but u can do whatever u want), this core becomes a replica and copies data from the leading shard.
Re: Whole RAM consumed while Indexing.
Hi Erick, I am just saying. I want to be sure about the commits difference.. What happens if I do frequent commits or not? And the reason I am saying that I need to commit things so very quickly is that I have to index 28GB of data, which takes 7-8 hours (with frequent commits). As you said, do commits after 60000 seconds; then it will be more expensive. If I don't encounter the **overlapping searchers warning messages** then I feel it seems to be okay. Is it? On Wed, Mar 18, 2015 at 8:54 PM, Erick Erickson erickerick...@gmail.com wrote: Don't do it. Really, why do you want to do this? This seems like an XY problem, you haven't explained why you need to commit things so very quickly. I suspect you haven't tried _searching_ while committing at such a rate, and you might as well turn all your top-level caches off in solrconfig.xml since they won't be useful at all. Best, Erick
Re: Whole RAM consumed while Indexing.
When I kept my configuration at 300 for soft commit and 3000 for hard commit and indexed some amount of data, I got a total index size of 6GB after completing the indexing. When I changed the configuration to 60000 for soft commit and 60000 for hard commit and indexed the same data, I got a total index size of 5GB after completing the indexing. But the number of documents in both scenarios was the same. I am wondering how that can be possible?
Re: Add replica on shards
Any help please... On Wed, Mar 18, 2015 at 12:02 PM, Nitin Solanki nitinml...@gmail.com wrote: Hi, I have created 8 shards on a collection named **wikingram**. At that time I did not create any replicas. Now I want to add a replica on each shard. How can I do that? I tried this - ** sudo curl http://localhost:8983/solr/admin/collections?action=ADDREPLICA&collection=wikingram&shard=shard1&node=localhost:8983_solr ** but it is not working. It throws an error - <response> <lst name="responseHeader"> <int name="status">400</int> <int name="QTime">86</int> </lst> <str name="Operation ADDREPLICA caused exception:">org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: Could not find collection : null</str> <lst name="exception"> <str name="msg">Could not find collection : null</str> <int name="rspCode">400</int> </lst> <lst name="error"> <str name="msg">Could not find collection : null</str> <int name="code">400</int> </lst> </response> Any help on this?
Add replica on shards
Hi, I have created 8 shards on a collection named **wikingram**. At that time I did not create any replicas. Now I want to add a replica on each shard. How can I do that? I tried this - ** sudo curl http://localhost:8983/solr/admin/collections?action=ADDREPLICA&collection=wikingram&shard=shard1&node=localhost:8983_solr ** but it is not working. It throws an error - <response> <lst name="responseHeader"> <int name="status">400</int> <int name="QTime">86</int> </lst> <str name="Operation ADDREPLICA caused exception:">org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: Could not find collection : null</str> <lst name="exception"> <str name="msg">Could not find collection : null</str> <int name="rspCode">400</int> </lst> <lst name="error"> <str name="msg">Could not find collection : null</str> <int name="code">400</int> </lst> </response> Any help on this?
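One likely culprit here (an assumption, since we only see the command as pasted): if the URL is not quoted, the shell treats each & as a command separator, so Solr only ever receives action=ADDREPLICA and the collection parameter is lost, which matches the "Could not find collection : null" error. A quoted version of the same call:

curl 'http://localhost:8983/solr/admin/collections?action=ADDREPLICA&collection=wikingram&shard=shard1&node=localhost:8983_solr'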
Re: Want to modify Solr Source Code
Hi Anshum, The reason I want to edit the source code is that I am using the spell check component in Solr. I have implemented it and it is working fine, but sometimes the suggestion frequency varies. I have explained it here - http://stackoverflow.com/questions/28857915/original-frequency-is-not-matching-with-suggestion-frequency-in-solr. Please check it.. And now I am thinking that if I am able to add hitcount instead of freq in the suggestions, it will serve my purpose. On Tue, Mar 17, 2015 at 1:23 PM, Anshum Gupta ans...@anshumgupta.net wrote: Hi Nitin, Do you intend to browse the code? If you really want to modify the code, can you tell us what exactly it is that you're trying to achieve? Can you clarify how you want to test Solr? If so, do you plan on running the tests that Solr ships with, or do you have your own tests? All said and done, if you don't want to use svn but still want to download the Solr source, you can download it (for Solr 5.0.0) from any of the mirrors listed here: http://www.apache.org/dyn/closer.cgi/lucene/solr/5.0.0 On Tue, Mar 17, 2015 at 12:42 AM, Nitin Solanki nitinml...@gmail.com wrote: Hi Gora, I want to make changes only on my machine, without svn. I want to test the source code. How? Any steps to do so? Please help.. On Tue, Mar 17, 2015 at 1:01 PM, Gora Mohanty g...@mimirtech.com wrote: On 17 March 2015 at 12:22, Nitin Solanki nitinml...@gmail.com wrote: Hi, I want to modify the Solr source code. I don't have any idea where the source code is available. I want to edit the source code. How can I do that? Any help please... Please start with: http://wiki.apache.org/solr/HowToContribute#Contributing_Code_.28Features.2C_Bug_Fixes.2C_Tests.2C_etc29 Regards, Gora -- Anshum Gupta
Want to modify Solr Source Code
Hi, I want to modify the Solr source code. I don't have any idea where the source code is available. I want to edit the source code. How can I do that? Any help please...
Re: Want to modify Solr Source Code
Hi Ramkumar, Sorry, but svn will be cumbersome for me and I don't want to use it right now. I want to do everything on my local machine without using svn. As you said, I downloaded the -src.tgz: I have downloaded solr-4.10.2-src.tar.gz and can now see the source code. Now, how do I configure it, and how do I compile a file if I change it? Any help please.. On Tue, Mar 17, 2015 at 1:21 PM, Ramkumar R. Aiyengar andyetitmo...@gmail.com wrote: Is your concern that you want to be able to modify source code just on your machine, or that you can't for some reason install svn? If it's the former, even if you check out using svn, you can't modify anything outside your machine, as changes can be checked in only by the committers of the project. You will need to raise a JIRA for the changes to go back in, as described by the wiki page. If the latter, try downloading the source code using the downloads section in https://lucene.apache.org/solr and choose the download which ends in -src.tgz; that has the source bundled as a single file. On 17 Mar 2015 07:42, Nitin Solanki nitinml...@gmail.com wrote: Hi Gora, I want to make changes only on my machine, without svn. I want to test the source code. How? Any steps to do so? Please help.. On Tue, Mar 17, 2015 at 1:01 PM, Gora Mohanty g...@mimirtech.com wrote: On 17 March 2015 at 12:22, Nitin Solanki nitinml...@gmail.com wrote: Hi, I want to modify the Solr source code. I don't have any idea where the source code is available. I want to edit the source code. How can I do that? Any help please... Please start with: http://wiki.apache.org/solr/HowToContribute#Contributing_Code_.28Features.2C_Bug_Fixes.2C_Tests.2C_etc29 Regards, Gora
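For anyone in the same spot: with the 4.x source unpacked, the build is Ant-based. A rough sketch of the usual cycle (assuming Ant and Ivy are installed; the exact targets are from memory of the 4.x build and may differ slightly):

cd solr-4.10.2
ant ivy-bootstrap     # one-time: installs Ivy so Ant can fetch dependencies
ant compile           # compiles the Lucene and Solr modules
ant eclipse           # generates Eclipse project files for the whole tree
cd solr
ant example           # rebuilds the example server under solr/example

After a change to a file such as SolrSpellChecker.java, re-running ant example and restarting the example server picks up the change.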
Re: Want to modify Solr Source Code
Hi Gora, I want to make changes only on my machine, without svn. I want to test the source code. How? Any steps to do so? Please help.. On Tue, Mar 17, 2015 at 1:01 PM, Gora Mohanty g...@mimirtech.com wrote: On 17 March 2015 at 12:22, Nitin Solanki nitinml...@gmail.com wrote: Hi, I want to modify the Solr source code. I don't have any idea where the source code is available. I want to edit the source code. How can I do that? Any help please... Please start with: http://wiki.apache.org/solr/HowToContribute#Contributing_Code_.28Features.2C_Bug_Fixes.2C_Tests.2C_etc29 Regards, Gora
Re: Want to modify Solr Source Code
I have already downloaded http://archive.apache.org/dist/lucene/solr/4.10.2/solr-4.10.2.tgz. Now, how do I view or edit the source code of any file? I don't have any idea about it.. Your help is appreciated.. Please guide me step by step.. Thanks again.. On Tue, Mar 17, 2015 at 1:16 PM, Gora Mohanty g...@mimirtech.com wrote: On 17 March 2015 at 13:12, Nitin Solanki nitinml...@gmail.com wrote: Hi Gora, I want to make changes only on my machine, without svn. I want to test the source code. How? Any steps to do so? Please help.. You could still use SVN for a local repository. Else, you can download a tar.gz of a Solr distribution from under the Download link at the top right of http://lucene.apache.org/solr/ Regards, Gora
Re: Want to modify Solr Source Code
Hi Gora, Thanks again. Do you have a link to that Wiki article? Please send it to me. On Tue, Mar 17, 2015 at 1:30 PM, Gora Mohanty g...@mimirtech.com wrote: On 17 March 2015 at 13:21, Nitin Solanki nitinml...@gmail.com wrote: I have already downloaded http://archive.apache.org/dist/lucene/solr/4.10.2/solr-4.10.2.tgz. Now, how do I view or edit the source code of any file? I don't have any idea about it.. Your help is appreciated.. Please guide me step by step.. Thanks again.. You need to learn the basics of putting together a development setup yourself, or from a local mentor. A .tgz is a gzip-compressed tar file that can be unarchived with tar, or most unarchivers. You are probably best off using a Java IDE, such as Eclipse, to edit the source code. The Wiki article covers how to compile the code and run the built-in tests. Regards, Gora
Shards don't seem to give the same suggestions for a term/misspelled term.
Hi everyone, I am stuck on a big issue. First I will explain what I am doing. I am building spell correction using Solr, where I have indexed 21GB of data and used sharding/distributed search. I have created 4 nodes having 8 shards, without any replicas. When I search a term, I get suggestions for it. But the problem is that each shard/server is not able to give me the same suggestion terms for each searched term. *Example* - If I search the term **chare** then I get **care** as a suggestion, but I get the **care** suggestion on only 6 shards and it is missing on 2 shards, by which the total frequency of **care** across the 6 shards is 50. But the actual/original frequency of **care** is 96, and the remaining frequency (46) is missing in the other 2 shards. Using debugQuery=true, I was able to see the frequency that was left out. Now, how do I get each shard to produce the same suggestions? Or is there any other solution? Please help.. Thanks again. Warm Regards, Nitin Solanki
Re: Want to modify Solr Source Code
Hi all, I have configured the Solr source code with Eclipse. Now, I have added a print statement in SolrSpellChecker.java. How do I compile this file? Any help please... On Tue, Mar 17, 2015 at 2:27 PM, Gora Mohanty g...@mimirtech.com wrote: On 17 March 2015 at 13:38, Nitin Solanki nitinml...@gmail.com wrote: Hi Gora, Thanks again. Do you have a link to that Wiki article? Please send it to me. Sent the link in my very first follow-up: http://wiki.apache.org/solr/HowToContribute#Contributing_Code_.28Features.2C_Bug_Fixes.2C_Tests.2C_etc29 Regards, Gora
Re: Want to modify Solr Source Code
Hi all, How do I set breakpoints throughout the Solr code and step through the code? On Tue, Mar 17, 2015 at 6:22 PM, Nitin Solanki nitinml...@gmail.com wrote: Hi all, I have configured the Solr source code with Eclipse. Now, I have added a print statement in SolrSpellChecker.java. How do I compile this file? Any help please... On Tue, Mar 17, 2015 at 2:27 PM, Gora Mohanty g...@mimirtech.com wrote: On 17 March 2015 at 13:38, Nitin Solanki nitinml...@gmail.com wrote: Hi Gora, Thanks again. Do you have a link to that Wiki article? Please send it to me. Sent the link in my very first follow-up: http://wiki.apache.org/solr/HowToContribute#Contributing_Code_.28Features.2C_Bug_Fixes.2C_Tests.2C_etc29 Regards, Gora
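The standard approach (not specific to Solr) is remote debugging: start the Solr JVM with the JDWP agent enabled and attach Eclipse to it. A sketch, assuming the 4.x example server started from solr/example; the port number is arbitrary:

cd solr-4.10.2/solr/example
java -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=5005 -jar start.jar

Then in Eclipse: Run -> Debug Configurations -> Remote Java Application, host localhost, port 5005. Breakpoints set in files like SolrSpellChecker.java will then be hit when a matching request comes in.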
thresholdTokenFrequency changes suggestion frequency..
Hi, I don't understand why the suggestion frequency varies from the original frequency. Example - I have a word = *who* and its original frequency is *100*, but when I ask for a suggestion for it, the suggestion frequency changes to *50*. I think it is happening because of *thresholdTokenFrequency*. When I set the value of thresholdTokenFrequency to *0.1* it gives one frequency for the 'who' suggestion, while setting thresholdTokenFrequency to *0.0001* gives a different frequency. Why? I don't get the logic behind this.. As we know, the suggestion frequency should be the same as the original frequency in the index - *the spellcheck.extendedResults=true parameter provides the frequency of each original term in the index (origFreq) as well as the frequency of each suggestion in the index (frequency).*
maxQueryFrequency v/s thresholdTokenFrequency
Hello Everyone, Can anybody please explain to me the difference between maxQueryFrequency and thresholdTokenFrequency? I found this link - http://wiki.apache.org/solr/SpellCheckComponent#thresholdTokenFrequency but I am unable to understand it.. I am very confused between the two. Your help is appreciated. Warm Regards, Nitin
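Roughly (my reading of the wiki page linked above, so treat this as a hedged summary): maxQueryFrequency caps how common a *query* term may be before the spellchecker considers it correctly spelled and offers no suggestions for it, while thresholdTokenFrequency filters which *index* terms are eligible to appear as suggestions at all. Both can be given as an absolute count or, with a value below 1, as a fraction of documents. A sketch of where they sit in the spellchecker config in solrconfig.xml (the values are illustrative only):

<lst name="spellchecker">
  <str name="name">default</str>
  <str name="field">gram_ci</str>
  <str name="classname">solr.DirectSolrSpellChecker</str>
  <!-- a query term in more than 1% of docs is treated as correctly spelled -->
  <float name="maxQueryFrequency">0.01</float>
  <!-- only terms in at least 0.01% of docs can be offered as suggestions -->
  <float name="thresholdTokenFrequency">0.0001</float>
</lst>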
Re: Whole RAM consumed while Indexing.
Hi Erick, You are right: **overlapping searchers warning messages** are appearing in the logs, and the **numDocs** numbers are changing as documents are added during indexing. Any help? On Sat, Mar 14, 2015 at 11:24 PM, Erick Erickson erickerick...@gmail.com wrote: First, the soft commit interval is very short. Very, very, very, very short. 300ms is just short of insane unless it's a typo ;). Here's a long background: https://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/ But the short form is that you're opening searchers every 300 ms. The hard commit is better, but every 3 seconds is still far too short IMO. I'd start with soft commits of 60000 and hard commits of 60000 (60 seconds), meaning that you're going to have to wait 1 minute for docs to show up unless you explicitly commit. You're throwing away all the caches configured in solrconfig.xml more than 3 times a second, executing autowarming, etc, etc, etc... Changing these to longer intervals might cure the problem, but if not then, as Hoss would say, details matter. I suspect you're also seeing overlapping searchers warning messages in your log, and it's _possible_ that what's happening is that you're just exceeding the max warming searchers and never opening a new searcher with the newly-indexed documents. But that's a total shot in the dark. How are you looking for docs (and not finding them)? Does the numDocs number in the solr admin screen change? Best, Erick On Thu, Mar 12, 2015 at 10:27 PM, Nitin Solanki nitinml...@gmail.com wrote: Hi Alexandre, *Hard Commit* is: <autoCommit> <maxTime>${solr.autoCommit.maxTime:3000}</maxTime> <openSearcher>false</openSearcher> </autoCommit> *Soft Commit* is: <autoSoftCommit> <maxTime>${solr.autoSoftCommit.maxTime:300}</maxTime> </autoSoftCommit> And I am committing 20000 documents each time. Is this a good config for committing? Or am I doing something wrong? On Fri, Mar 13, 2015 at 8:52 AM, Alexandre Rafalovitch arafa...@gmail.com wrote: What's your commit strategy? Explicit commits? Soft commits/hard commits (in solrconfig.xml)? Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/ On 12 March 2015 at 23:19, Nitin Solanki nitinml...@gmail.com wrote: Hello, I have written a python script to index 20000 documents at a time on Solr. I have 28 GB RAM with 8 CPUs. When I started indexing, 15 GB RAM was free. While indexing, all the RAM is consumed but **not** a single document is indexed. Why? And it threw *HTTPError: HTTP Error 503: Service Unavailable* in the python script. I think it is due to heavy load on ZooKeeper, by which all nodes went down, but I am not sure about that. Any help please.. Or is anything else happening? And how do I overcome this issue? Please assist me towards the right path. Thanks.. Warm Regards, Nitin Solanki
Re: Update solr schema.xml in real time for Solr 4.10.1
Hi Zheng, As you said, **there's no physical schema.xml**, but I have one. I am using the sample_techproducts_configs configuration, where I found schema.xml. I manage my schema.xml there, then upload it into ZooKeeper and reload the collection. On 3/14/15, Zheng Lin Edwin Yeo edwinye...@gmail.com wrote: Hi Erick, The real-time update of the schema means we can just do an update using a REST API curl instead of manually editing schema.xml and restarting the Solr server. In Solr 5.0, if Solr is loading the schema from the resource named in 'managedSchemaResourceName' instead of schema.xml, I can just update it via the REST API with curl. For earlier versions of Solr, the default setting is ClassicIndexSchemaFactory, which reads from schema.xml. So besides getting Solr to load the schema from the resource named in 'managedSchemaResourceName' rather than from schema.xml, are there other settings required? Zheng Lin On 12 March 2015 at 23:26, Erick Erickson erickerick...@gmail.com wrote: Actually, I ran across a neat IntelliJ plugin that you could install and directly edit ZK files. And I'm pretty sure there are stand-alone programs that do this, but they are all outside Solr. I'm not sure what real-time update of the schema is for; would you (Zheng) explain further? Collections _must_ be reloaded for schema changes to take effect, so I'm not quite sure what you're referring to. Nitin: The usual process is to have the master config be local, change the local version, then upload it to ZK with the upconfig option in zkCli, then reload your collection. Best, Erick On Thu, Mar 12, 2015 at 6:04 AM, Shawn Heisey apa...@elyograg.org wrote: On 3/12/2015 2:00 AM, Zheng Lin Edwin Yeo wrote: I understand that in Solr 5.0, they provide a REST API to do real-time updates of the schema using curl. However, I could not do that for my earlier version of Solr 4.10.1. I would like to check: is this function available for the earlier version of Solr, and is the curl syntax the same as Solr 5.0? Providing a way to simply edit the config files directly is a potential security issue. We briefly had a way to edit those configs right in the admin UI, but Redhat reported this capability as a security problem, so we removed it. I don't remember whether there is a way to re-enable this functionality. The Schema REST API is available in 4.10. It was also present in 4.9. Currently you can only *add* to the schema, you cannot edit what's already there. Thanks, Shawn
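To make Erick's "upload to ZK with upconfig, then reload" loop concrete, a sketch of the two commands (paths and names are examples; the zkcli.sh location differs between 4.x and 5.x, compare the full zkcli command quoted elsewhere in this archive):

example/scripts/cloud-scripts/zkcli.sh -zkhost localhost:9983 -cmd upconfig -confdir /path/to/local/conf -confname myconf
curl 'http://localhost:8983/solr/admin/collections?action=RELOAD&name=wikingram'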
Re: Update solr schema.xml in real time for Solr 4.10.1
Ok, got it, Zheng... Thanks a lot.. On Sat, Mar 14, 2015 at 1:02 PM, Zheng Lin Edwin Yeo edwinye...@gmail.com wrote: Hi Nitin, What I experienced is that when I create a new collection, there's no physical schema in that collection. But there is a schema.xml in some of the example folders. You can create your own schema.xml in your own collection, but in order to use it, you have to change the schemaFactory class to ClassicIndexSchemaFactory in solrconfig.xml, as by default the schemaFactory class is set to ManagedIndexSchemaFactory in Solr 5.0. Zheng Lin On 14 March 2015 at 15:22, Nitin Solanki nitinml...@gmail.com wrote: Hi Zheng, As you said, **there's no physical schema.xml**, but I have one. I am using the sample_techproducts_configs configuration, where I found schema.xml. I manage my schema.xml there, then upload it into ZooKeeper and reload the collection.
Re: Update solr schema.xml in real time for Solr 4.10.1
Hi Zheng, ***I understand that in Solr 5.0, they provide a REST API to do real-time updates of the schema using curl***. Would you please help me with how to do this? I need to update both schema.xml and solrconfig.xml in Solr 5.0 in SolrCloud. Your help is appreciated.. *Thanks again..* On Thu, Mar 12, 2015 at 1:30 PM, Zheng Lin Edwin Yeo edwinye...@gmail.com wrote: Hi, I understand that in Solr 5.0, they provide a REST API to do real-time updates of the schema using curl. However, I could not do that for my earlier version of Solr 4.10.1. I would like to check: is this function available for the earlier version of Solr, and is the curl syntax the same as Solr 5.0? Regards, Edwin
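For the schema half of the question, a sketch of what the Solr 5.0 Schema API call looks like. This assumes the collection uses the managed schema (ManagedIndexSchemaFactory), and the field definition here is made up purely for illustration; solrconfig.xml changes, by contrast, are usually still done via upconfig plus a collection reload:

curl -X POST -H 'Content-type:application/json' --data-binary '{
  "add-field": { "name":"mynewfield", "type":"string", "indexed":true, "stored":true }
}' http://localhost:8983/solr/wikingram/schema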
Re: Where is schema.xml and solrconfig.xml in solr 5.0.0
Hi Erick.. Would you please help me distinguish between uploading a configuration directory and linking a collection to a configuration set? On Thu, Mar 12, 2015 at 2:01 AM, Nitin Solanki nitinml...@gmail.com wrote: Thanks a lot Erick.. It will be helpful. On Wed, Mar 11, 2015 at 9:27 PM, Erick Erickson erickerick...@gmail.com wrote: The configs are in Zookeeper. So you have to switch your thinking; it's rather confusing at first. When you create a collection, you specify a config set; these are usually in ./server/solr/configsets/data_driven_schema_configs, ./server/solr/configsets/sample_techproducts_configs and the like. The entire conf directory under one of these is copied to Zookeeper (which you can see from the admin screen, Cloud->Tree; in the right-hand side you'll be able to find the config sets you uploaded). But you cannot edit them there directly. You edit them on disk, then push them to Zookeeper, then reload the collection (or restart everything). See the reference guide here: https://cwiki.apache.org/confluence/display/solr/Command+Line+Utilities Best, Erick
Re: Whole RAM consumed while Indexing.
Hi Alexandre, *Hard Commit* is: <autoCommit> <maxTime>${solr.autoCommit.maxTime:3000}</maxTime> <openSearcher>false</openSearcher> </autoCommit> *Soft Commit* is: <autoSoftCommit> <maxTime>${solr.autoSoftCommit.maxTime:300}</maxTime> </autoSoftCommit> And I am committing 20000 documents each time. Is this a good config for committing? Or am I doing something wrong? On Fri, Mar 13, 2015 at 8:52 AM, Alexandre Rafalovitch arafa...@gmail.com wrote: What's your commit strategy? Explicit commits? Soft commits/hard commits (in solrconfig.xml)? Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/ On 12 March 2015 at 23:19, Nitin Solanki nitinml...@gmail.com wrote: Hello, I have written a python script to index 20000 documents at a time on Solr. I have 28 GB RAM with 8 CPUs. When I started indexing, 15 GB RAM was free. While indexing, all the RAM is consumed but **not** a single document is indexed. Why? And it threw *HTTPError: HTTP Error 503: Service Unavailable* in the python script. I think it is due to heavy load on ZooKeeper, by which all nodes went down, but I am not sure about that. Any help please.. Or is anything else happening? And how do I overcome this issue? Please assist me towards the right path. Thanks.. Warm Regards, Nitin Solanki
Re: Where is schema.xml and solrconfig.xml in solr 5.0.0
Thanks Shawn and Erick for explanation... On Thu, Mar 12, 2015 at 9:02 PM, Shawn Heisey apa...@elyograg.org wrote: On 3/12/2015 9:18 AM, Erick Erickson wrote: By and large, I really never use linking. But it's about associating a config set you've _already_ uploaded with a collection. So uploading is pushing the configset from your local machine up to Zookeeper, and linking is using that uploaded, named configuration with an arbitrary collection. But usually you just make this association when creating the collection. The primary use case that I see for linkconfig is in testing upgrades to configurations. So let's say you have a production collection that uses a config that you name fooV1 for foo version 1. You can build a test collection that uses a config named fooV2, work out all the bugs, and then when you're ready to deploy it, you can use linkconfig to link your production collection to fooV2, reload the collection, and you're using the new config. I haven't discussed here how to handle the situation where a reindex is required. One thing you CAN do is run linkconfig for a collection that doesn't exist yet, and then you don't need to include collection.configName when you create the collection, because the link is already present in zookeeper. I personally don't like doing things this way, but I'm pretty sure it works. Thanks, Shawn
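For reference, linking is also a zkcli command. A sketch of the fooV1/fooV2 flow Shawn describes (the names are his examples; the zkcli.sh path depends on your Solr version):

zkcli.sh -zkhost localhost:9983 -cmd linkconfig -collection production_collection -confname fooV2
curl 'http://localhost:8983/solr/admin/collections?action=RELOAD&name=production_collection'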
Whole RAM consumed while Indexing.
Hello, I have written a python script to index 20000 documents at a time on Solr. I have 28 GB RAM with 8 CPUs. When I started indexing, 15 GB RAM was free. While indexing, all the RAM is consumed but **not** a single document is indexed. Why? And it threw *HTTPError: HTTP Error 503: Service Unavailable* in the python script. I think it is due to heavy load on ZooKeeper, by which all nodes went down, but I am not sure about that. Any help please.. Or is anything else happening? And how do I overcome this issue? Please assist me towards the right path. Thanks.. Warm Regards, Nitin Solanki
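Not from this thread, but since the indexing here is driven by an external script: an equivalent batched-indexing client in SolrJ (5.x API; the collection name, field names, and batch size are stand-ins, not from the original post) that sends documents in batches and leaves all commits to the autoCommit/autoSoftCommit settings discussed above:

import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class BulkIndexer {
    public static void main(String[] args) throws Exception {
        // Talk to the cluster through ZooKeeper rather than a single node
        CloudSolrClient client = new CloudSolrClient("localhost:9983");
        client.setDefaultCollection("wikingram");
        List<SolrInputDocument> batch = new ArrayList<>();
        for (int i = 0; i < 1000000; i++) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", Integer.toString(i));
            doc.addField("gram", "document body " + i);
            batch.add(doc);
            if (batch.size() == 20000) { // send in batches, no explicit commit
                client.add(batch);
                batch.clear();
            }
        }
        if (!batch.isEmpty()) client.add(batch); // flush the last partial batch
        client.close(); // visibility is governed by autoSoftCommit
    }
}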
java.nio.channels.CancelledKeyException
Hi, I am indexing documents on Solr 4.10.2. While indexing, I am getting this error in the log - java.nio.channels.CancelledKeyException at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73) at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:77) at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:151) at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1081) at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:404) at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:169) What does it mean? Will it skip the documents currently being indexed? Or is it something else? Please help...
Re: Where is schema.xml and solrconfig.xml in solr 5.0.0
Thanks a lot Erick.. It will be helpful. On Wed, Mar 11, 2015 at 9:27 PM, Erick Erickson erickerick...@gmail.com wrote: The configs are in Zookeeper. So you have to switch your thinking; it's rather confusing at first. When you create a collection, you specify a config set; these are usually in ./server/solr/configsets/data_driven_schema_configs, ./server/solr/configsets/sample_techproducts_configs and the like. The entire conf directory under one of these is copied to Zookeeper (which you can see from the admin screen, Cloud->Tree; in the right-hand side you'll be able to find the config sets you uploaded). But you cannot edit them there directly. You edit them on disk, then push them to Zookeeper, then reload the collection (or restart everything). See the reference guide here: https://cwiki.apache.org/confluence/display/solr/Command+Line+Utilities Best, Erick
Re: Where is schema.xml and solrconfig.xml in solr 5.0.0
Hi Alexandre.. Thanks for responding... When I created a new collection (wikingram) using SolrCloud, it got created under example/cloud/node* (node1, node2, and so on). I used the *schema.xml and solrconfig.xml of sample_techproducts_configs*. Now, the problem is that if I change the configuration in the *solrconfig.xml of sample_techproducts_configs*, the change doesn't reflect on the *wikingram* collection. How do I get configuration changes reflected in the collection? On Wed, Mar 11, 2015 at 5:42 PM, Alexandre Rafalovitch arafa...@gmail.com wrote: Which example are you using? Or how are you creating your collection? If you are using your example, it creates a new directory under example. If you are creating a new collection with -c, it creates a new directory under server/solr. The actual files are a bit deeper than usual to allow for a log folder next to the collection folder. So, for example: example/schemaless/solr/gettingstarted/conf/solrconfig.xml If it's a dynamic schema configuration, you don't actually have schema.xml but managed-schema, as you should be mostly using REST calls to configure it. If you want to see the configuration files before the collection is actually created, they are under server/solr/configsets, though they are not configsets in the Solr sense, as they do get copied when you create your collections (sharing them causes issues). Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/ On 11 March 2015 at 07:50, Nitin Solanki nitinml...@gmail.com wrote: Hello, I have switched from Solr 4.10.2 to Solr 5.0.0. In Solr 4.10.2, schema.xml and solrconfig.xml were in the example/solr/conf/ folder. Where are schema.xml and solrconfig.xml in Solr 5.0.0? I also want to know how to configure them in SolrCloud.
Where is schema.xml and solrconfig.xml in solr 5.0.0
Hello, I have switched from Solr 4.10.2 to Solr 5.0.0. In Solr 4.10.2, schema.xml and solrconfig.xml were in the example/solr/conf/ folder. Where are schema.xml and solrconfig.xml in Solr 5.0.0? I also want to know how to configure them in SolrCloud.
Re: how to change configurations in solrcloud setup
Hi Aman, You can apply configuration changes on SolrCloud by using this command - sudo path_of_solr/solr_folder_name/example/scripts/cloud-scripts/zkcli.sh -zkhost localhost:9983 -cmd upconfig -confdir path_of_solr/solr_folder_name/example/solr/collection1/conf -confname default - and then restart all nodes of the SolrCloud. On Mon, Mar 9, 2015 at 11:43 AM, Aman Tandon amantandon...@gmail.com wrote: Please help. With Regards Aman Tandon On Sat, Mar 7, 2015 at 9:58 PM, Aman Tandon amantandon...@gmail.com wrote: Hi, Please tell me what the best way is to apply configuration changes in SolrCloud, and how to do it. Thanks in advance. With Regards Aman Tandon
Re: Frequency of Suggestion are varying from original Frequency in index
Hi ale42, Yes. I am using the same field (gram_ci) to make the query and also to build suggestions on. Here is the explanation: I have 2 fields - gram and gram_ci - where the gram field is set to stored=true and indexed=true, while the gram_ci field is set to stored=false but indexed=true, and there is a copyField from gram into gram_ci. Both gram and gram_ci use the same fieldType analysis - StandardTokenizerFactory and ShingleFilterFactory for both index and query. The only difference is that gram_ci uses a LowerCaseFilter and gram doesn't. And I am making the query on gram_ci, not on gram. On Mon, Mar 9, 2015 at 3:24 PM, ale42 alexandre.faye...@etu.esisar.grenoble-inp.fr wrote: When you make a query, does it use the same field type as the field that you are using to build suggestions?
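For readers following along, a schema.xml sketch of the setup described above. Only the two fields, the copyField, and the extra LowerCaseFilter on gram_ci come from the message; the shingle parameters and filter ordering are assumptions:

<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="5" outputUnigrams="true"/>
  </analyzer>
</fieldType>
<fieldType name="textSpellCi" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="5" outputUnigrams="true"/>
  </analyzer>
</fieldType>

<field name="gram"    type="textSpell"   indexed="true" stored="true"  multiValued="false"/>
<field name="gram_ci" type="textSpellCi" indexed="true" stored="false" multiValued="false"/>
<copyField source="gram" dest="gram_ci"/>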
Re: Frequency of Suggestion are varying from original Frequency in index
My field uses StandardTokenizerFactory with ShingleFilterFactory. Could that be causing it? On 3/9/15, ale42 alexandre.faye...@etu.esisar.grenoble-inp.fr wrote: So, I think it depends on the field that you are working on?!
Re: Frequency of Suggestion are varying from original Frequency in index
Hi ale42, I am not using /solr.IndexBasedSpellChecker/. I used solr.DirectSolrSpellChecker. Is there any way to solve my issue? On Fri, Mar 6, 2015 at 6:27 PM, ale42 alexandre.faye...@etu.esisar.grenoble-inp.fr wrote: I think these frequencies are not the frequency of the term in the same index: the original frequency represents the number of results that you have in the Lucene index when you query who, while the suggestion frequency is the number of results for this term in the spellcheck dictionary. I guess you're using /solr.IndexBasedSpellChecker/!
Re: Frequency of Suggestion are varying from original Frequency in index
Hi Wang, I am using SolrCloud. Are suggestions not working properly with that? On Fri, Mar 6, 2015 at 2:36 PM, gaohang wang gaohangw...@gmail.com wrote: Do you use SolrCloud? Maybe your suggester does not support distributed mode. 2015-03-04 22:39 GMT+08:00 Nitin Solanki nitinml...@gmail.com: Hi.. I have a term (who) whose original frequency is 191, but when I get a suggestion for who it gives me 90. Why? Example: the *original frequency* comes back like: "spellcheck": { "suggestions": [ "who", { "numFound": 1, "startOffset": 1, "endOffset": 4, "origFreq": 191 } ], "correctlySpelled": false } While in the *suggestion*, it gives: "spellcheck": { "suggestions": [ "whs", { "numFound": 1, "startOffset": 1, "endOffset": 4, "origFreq": 0, "suggestion": [ { "word": "who", "freq": 90 } ] } ], "correctlySpelled": false } Why is that? I am using StandardTokenizerFactory with ShingleFilterFactory in schema.xml..
Original frequency is not matching with suggestion frequency in SOLR
Hello, Sometimes the suggestion frequency varies from the original frequency. The output for *whs is* - *(73)*, which is a suggestion for *who is* - varies from its actual original frequency *(94)*. *Please* check this link for more explanation - http://stackoverflow.com/questions/28857915/original-frequency-is-not-matching-with-suggestion-frequency-in-solr
Frequency of Suggestion are varying from original Frequency in index
Hi.. I have a term (who) whose original frequency is 191, but when I get a suggestion for who it gives me 90. Why? Example: the *original frequency* comes back like: "spellcheck": { "suggestions": [ "who", { "numFound": 1, "startOffset": 1, "endOffset": 4, "origFreq": 191 } ], "correctlySpelled": false } While in the *suggestion*, it gives: "spellcheck": { "suggestions": [ "whs", { "numFound": 1, "startOffset": 1, "endOffset": 4, "origFreq": 0, "suggestion": [ { "word": "who", "freq": 90 } ] } ], "correctlySpelled": false } Why is that? I am using StandardTokenizerFactory with ShingleFilterFactory in schema.xml..
Why frequency of suggestion is different from indexed frequency in Solr?
Hi, The frequency of a suggestion is different from the original frequency in the index. Why? I have applied a StandardTokenizer with ShingleFilterFactory on the field.
Get suggestion for each term in the query
Hi, I want to get a suggestion for each term/word in the query, under these conditions: i) whether the word/term is correct or incorrect; ii) whether the word/term has a high frequency or a low frequency. Whatever the state of the term/word, I need suggestions every time.
Confusion in setting spellcheck.onlyMorePopular to true or false
Hi, "Only return suggestions that result in more hits for the query than the existing query." What does "the existing query" mean in the above sentence for spellcheck.onlyMorePopular? What happens when I set spellcheck.onlyMorePopular to true, and what happens when I set it to false? Is there any difference?
Do Multiprocessing on Solr to search?
Hello, I want to search lakhs of queries/terms concurrently. Is there any technique to do multiprocessing against Solr? Is Solr capable of handling this situation? I wrote code in Python that does multiprocessing, searching lakhs of queries and hitting Solr in parallel, all at once, but it seems that Solr is not able to handle the queries all at once. Any help please?
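A common pattern is to keep the client-side concurrency bounded instead of firing everything at once, so Solr's request threads are not overwhelmed. A sketch in SolrJ (5.x API; the URL, collection, thread count, and sample terms are placeholders, not from the original post):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class ParallelSearch {
    public static void main(String[] args) throws Exception {
        // One thread-safe client shared by all workers
        HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr/wikingram");
        // A bounded pool: at most 16 queries in flight at a time
        ExecutorService pool = Executors.newFixedThreadPool(16);
        String[] terms = { "who", "care", "wind" }; // stand-ins for the real query list
        for (String term : terms) {
            pool.submit(() -> {
                try {
                    long hits = client.query(new SolrQuery(term)).getResults().getNumFound();
                    System.out.println(term + " -> " + hits);
                } catch (Exception e) {
                    e.printStackTrace();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
        client.close();
    }
}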
Solr takes time to start
Hello, Why is Solr taking so much time to start all the nodes/ports?
Re: Collations are not working fine.
Hi Rajesh, What configuration have you set in your schema.xml? On Sat, Feb 14, 2015 at 2:18 AM, Rajesh Hazari rajeshhaz...@gmail.com wrote: Hi Nitin, Can you try with the config below? We have this config and it seems to be working for us.

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">text_general</str>
  <lst name="spellchecker">
    <str name="name">wordbreak</str>
    <str name="classname">solr.WordBreakSolrSpellChecker</str>
    <str name="field">textSpell</str>
    <str name="combineWords">true</str>
    <str name="breakWords">false</str>
    <int name="maxChanges">5</int>
  </lst>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">textSpell</str>
    <str name="classname">solr.IndexBasedSpellChecker</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
    <str name="accuracy">0.75</str>
    <float name="thresholdTokenFrequency">0.01</float>
    <str name="buildOnCommit">true</str>
    <str name="spellcheck.maxResultsForSuggest">5</str>
  </lst>
</searchComponent>

<str name="spellcheck">true</str>
<str name="spellcheck.dictionary">default</str>
<str name="spellcheck.dictionary">wordbreak</str>
<int name="spellcheck.count">5</int>
<str name="spellcheck.alternativeTermCount">15</str>
<str name="spellcheck.collate">true</str>
<str name="spellcheck.onlyMorePopular">false</str>
<str name="spellcheck.extendedResults">true</str>
<str name="spellcheck.maxCollations">100</str>
<str name="spellcheck.collateParam.mm">100%</str>
<str name="spellcheck.collateParam.q.op">AND</str>
<str name="spellcheck.maxCollationTries">1000</str>

*Rajesh.* On Fri, Feb 13, 2015 at 1:01 PM, Dyer, James james.d...@ingramcontent.com wrote: Nitin, Can you post the full spellcheck response when you query: q=gram_ci:gone wthh thes wint&wt=json&indent=true&shards.qt=/spell James Dyer Ingram Content Group -Original Message- From: Nitin Solanki [mailto:nitinml...@gmail.com] Sent: Friday, February 13, 2015 1:05 AM To: solr-user@lucene.apache.org Subject: Re: Collations are not working fine. Hi James Dyer, I did the same as you told me: used WordBreakSolrSpellChecker instead of shingles. But the collations are still not coming or working. For instance, I tried to get the collation "gone with the wind" by searching "gone wthh thes wint" on field=gram_ci but didn't succeed, even though I am getting the suggestions of wtth as *with*, thes as *the*, wint as *wind*. Also, I have documents which contain "gone with the wind" 167 times. I don't know whether I am missing something or not.
Please check my Solr configuration below: *URL:* localhost:8983/solr/wikingram/spell?q=gram_ci:gone wthh thes wint&wt=json&indent=true&shards.qt=/spell *solrconfig.xml:*

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">textSpellCi</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">gram_ci</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <str name="distanceMeasure">internal</str>
    <float name="accuracy">0.5</float>
    <int name="maxEdits">2</int>
    <int name="minPrefix">0</int>
    <int name="maxInspections">5</int>
    <int name="minQueryLength">2</int>
    <float name="maxQueryFrequency">0.9</float>
    <str name="comparatorClass">freq</str>
  </lst>
  <lst name="spellchecker">
    <str name="name">wordbreak</str>
    <str name="classname">solr.WordBreakSolrSpellChecker</str>
    <str name="field">gram</str>
    <str name="combineWords">true</str>
    <str name="breakWords">true</str>
    <int name="maxChanges">5</int>
  </lst>
</searchComponent>

<requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="df">gram_ci</str>
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck">on</str>
    <str name="spellcheck.extendedResults">true</str>
    <str name="spellcheck.count">25</str>
    <str name="spellcheck.onlyMorePopular">true</str>
    <str name="spellcheck.maxResultsForSuggest">1</str>
    <str name="spellcheck.alternativeTermCount">25</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.maxCollations">50</str>
    <str name="spellcheck.maxCollationTries">50</str>
    <str name="spellcheck.collateExtendedResults">true</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>

*schema.xml:*

<field name="gram_ci" type="textSpellCi" indexed="true" stored="true" multiValued="false"/>
<fieldType name="textSpellCi" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
Override freq field with a custom field in Suggestions
Hello, I have a scenario where I want to use my own custom field instead of freq in the suggestions for each term. The custom field will be an integer value, generally different from the term's freq. Is it possible in Solr to use a custom field instead of freq in suggestions? Your help is appreciated. Thanks and Regards, Nitin Solanki.
Re: Sorting on multi-valued field
Hi Peri, You cannot sort on a multi-valued field; multiValued must be set to false for the field to be sortable.

On Tue, Feb 24, 2015 at 8:07 PM, Peri Subrahmanya peri.subrahma...@htcinc.com wrote:

All, Is there a way sorting can work on a multi-valued field, or does multiValued always have to be "false" for it to work? Thanks -Peri
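For illustration, a minimal SolrJ sketch of a sorted query; the field name releaseDate is hypothetical, and the client is assumed to be an HttpSolrClient as in the earlier sketch. If the field were declared multiValued="true", Solr would reject the sort.

// Sorting only works on a single-valued field.
SolrQuery q = new SolrQuery("*:*");
q.setSort("releaseDate", SolrQuery.ORDER.desc);
QueryResponse rsp = client.query(q);
System.out.println(rsp.getResults().getNumFound());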
How to make searching fast in spell checking
Hello all, I have 49 GB of indexed data and I am doing spell checking. I have applied ShingleFilter on both the index and query side, taking 25 suggestions for each word in the query, and I am not using collations. When I search a phrase of 5-6 words (e.g. "barack obama is president of America") it takes 2 to 3 seconds, while searching a single term (e.g. "barack") takes only 0.23 seconds, which is good. Why is phrase checking taking so long? Am I doing something wrong? Any help on this?
CollationKeyFilterFactory stops suggestions and collations
Hello all, I am working on collations. In the Solr docs I found that Unicode collation can make searching fast, but after applying CollationKeyFilterFactory in schema.xml, both suggestions and collations stop coming. Please check the configuration and help me.

*Schema.xml:*

<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.CollationKeyFilterFactory" language="" strength="primary"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.CollationKeyFilterFactory" language="" strength="primary"/>
  </analyzer>
</fieldType>

*Solrconfig.xml:*

<requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="df">gram_ci</str>
    <!-- Solr will use suggestions from both the 'default' spellchecker
         and from the 'wordbreak' spellchecker and combine them.
         collations (re-written queries) can include a combination of
         corrections from both spellcheckers -->
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck">on</str>
    <str name="spellcheck.extendedResults">true</str>
    <str name="spellcheck.count">25</str>
    <str name="spellcheck.onlyMorePopular">true</str>
    <str name="spellcheck.maxResultsForSuggest">10</str>
    <str name="spellcheck.alternativeTermCount">25</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.maxCollations">100</str>
    <str name="spellcheck.maxCollationTries">1000</str>
    <str name="spellcheck.collateExtendedResults">true</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
    <!-- <str>suggest</str> -->
    <!-- <str>query</str> -->
  </arr>
</requestHandler>
Re: CollationKeyFilterFactory stops suggestions and collations
Hi all, I found that Unicode collation needs *lucene-collation-2.9.1.jar*. I am using Solr 4.10.2 and have downloaded lucene-collation-2.9.1.jar. Where do I have to put it, or is it already built into Solr? And if it is already in Solr, why are suggestions and collations not coming? Any help, please?

On Mon, Feb 23, 2015 at 4:43 PM, Nitin Solanki nitinml...@gmail.com wrote:

Hello all, I am working on collations. In the Solr docs I found that Unicode collation can make searching fast, but after applying CollationKeyFilterFactory in schema.xml, both suggestions and collations stop coming. Please check the configuration and help me. [schema.xml and solrconfig.xml quoted in the original message above; snipped]
Used CollationKeyFilterFactory, Seems not to be working
Hi, I have integrated CollationKeyFilterFactory into schema.xml and re-indexed the data:

<filter class="solr.CollationKeyFilterFactory" language="" strength="primary"/>

I need this because I want to build collations fast. Referred link: http://wiki.apache.org/solr/UnicodeCollation But it stops both suggestions and collations. *Why?* I have also tested *CollationKeyFilterFactory* in the Solr admin Analysis page; there, the filter shows some Chinese-looking output. *Please, any help?*
Is Solr best for did you mean functionality just like Google?
Hello, I am in a difficult situation. I want to do spell/query correction functionality on 49 GB of indexed data where I have applied the spellchecker. I want the same as Google's *did you mean*. *Example* - if a user types a question/query that is misspelled or mistyped, I need to give them a "did you mean" suggestion. Is Solr good for this? Warm Regards, Nitin Solanki
Re: Collations are not working fine.
Hi Charles, How did you patch the suggester to get frequency information in the spellcheck response? That sounds very good; I would like to do the same.

On Mon, Feb 16, 2015 at 7:59 PM, Reitzel, Charles charles.reit...@tiaa-cref.org wrote:

I have been working with collations the last couple of days and I kept adding the collation-related parameters until it started working for me. It seems I needed <str name="spellcheck.collateMaxCollectDocs">50</str>. But I am using the Suggester with the WFSTLookupFactory. Also, I needed to patch the suggester to get frequency information in the spellcheck response.

-Original Message-
From: Rajesh Hazari [mailto:rajeshhaz...@gmail.com]
Sent: Friday, February 13, 2015 3:48 PM
To: solr-user@lucene.apache.org
Subject: Re: Collations are not working fine.

[remainder of the quoted thread snipped; Rajesh's configuration and the exchange with James Dyer appear earlier in this thread]
Re: Used CollationKeyFilterFactory, Seems not to be working
Hi Ahmet, language="" means that it applies to any language; you simply define the language as the empty string, which works for most languages.

*Intention:* I am working on spell/query correction, just like Google's "did you mean". Using the spellchecker, I get both suggestions and collations, but the collations are not coming as I expected. The reason is spellcheck.maxCollationTries: if I set spellcheck.maxCollationTries=10 then I get around 10 results, but sometimes the expected collation is not among those 10 collations. So I increased the value to 16000 and the expected results come, but it takes around 15 seconds on 49 GB of indexed data, which is the worst case. Then, in the Solr wiki, I found *unicodeCollation*, which says it builds collations fast. Is it fast? Or am I doing something wrong with collations?

On Mon, Feb 23, 2015 at 9:12 PM, Ahmet Arslan iori...@yahoo.com.invalid wrote:

Hi Nitin, How can you pass an empty value to the language attribute? Is this intentional? What is your intention in using that filter with the suggestion functionality? Ahmet

On Monday, February 23, 2015 5:03 PM, Nitin Solanki nitinml...@gmail.com wrote:

[original message quoted above; snipped]
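As an aside, maxCollationTries can also be overridden per request, so one way to explore the speed/recall trade-off Nitin describes is to vary it at query time rather than editing solrconfig.xml. A minimal SolrJ sketch, with the client and collection assumed to be set up as in the earlier example:

SolrQuery q = new SolrQuery("gram_ci:\"gone wthh thes wint\"");
q.setRequestHandler("/spell");
q.set("spellcheck", "true");
// Each "try" issues an internal verification query, so large values
// trade response time for collation recall.
q.set("spellcheck.maxCollationTries", 100);
q.set("spellcheck.maxCollations", 10);
QueryResponse rsp = client.query(q);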
Error instantiating class: 'org.apache.lucene.collation.CollationKeyFilterFactory'
Hi, I am using the Collation Key Filter. After adding it to schema.xml, Solr throws the error below.

*Schema.xml:*

<field name="gram" type="textSpell" indexed="true" stored="true" required="true" multiValued="false"/>

<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.CollationKeyFilterFactory" language="" strength="primary"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.CollationKeyFilterFactory" language="" strength="primary"/>
  </analyzer>
</fieldType>

*It throws this error:*

Problem accessing /solr/. Reason:

{msg=SolrCore 'collection1' is not available due to init failure: Could not load conf for core collection1: Plugin init failure for [schema.xml] fieldType textSpell: Plugin init failure for [schema.xml] analyzer/filter: Error instantiating class: 'org.apache.lucene.collation.CollationKeyFilterFactory'. Schema file is /configs/myconf/schema.xml,
trace=org.apache.solr.common.SolrException: SolrCore 'collection1' is not available due to init failure: Could not load conf for core collection1: Plugin init failure for [schema.xml] fieldType textSpell: Plugin init failure for [schema.xml] analyzer/filter: Error instantiating class: 'org.apache.lucene.collation.CollationKeyFilterFactory'. Schema file is /configs/myconf/schema.xml
  at org.apache.solr.core.CoreContainer.getCore(CoreContainer.java:745)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:347)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
  at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
  at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
  at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
  at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
  at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
  at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
  at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
  at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
  at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
  at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
  at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
  at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
  at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
  at org.eclipse.jetty.server.Server.handle(Server.java:368)
  at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
  at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
  at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
  at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
  at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
  at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
  at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
  at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
  at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
  at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
  at java.lang.Thread.run(Thread.java:745)
Use multiple collections having different configuration
Hello, I have a scenario where I want to create/use 2 collections in the same Solr (SolrCloud), named collection1 and collection2, on distributed servers. Each collection has multiple shards, and each collection has a different configuration (solrconfig.xml and schema.xml). How can I do this? And if I later want to re-configure one of the collections, how do I do that? As I know, with a single collection having multiple shards, we upload the config like this:

example/scripts/cloud-scripts/zkcli.sh -zkhost localhost:9983 -cmd upconfig -confdir example/solr/collection1/conf -confname default

and then restart all the nodes. With 2 collections in the same Solr, how can I re-configure?
Advantage of using Java programming with Solr over Solr API
Hi, What are the advantages of Java programming with Solr over the Solr API?
Re: Advantage of using Java programming with Solr over Solr API
I mean embedded Solr.

On Fri, Feb 20, 2015 at 7:05 PM, Alexandre Rafalovitch arafa...@gmail.com wrote:

This question makes no sense. Do you mean embedded Solr vs. standalone? Regards, Alex

On 20 Feb 2015 3:30 am, Nitin Solanki nitinml...@gmail.com wrote:

Hi, What are the advantages of Java programming with Solr over the Solr API?
Re: Collations are not working fine.
How can I get only the best collations (those with the most hits) and sort them by hits?

On Wed, Feb 18, 2015 at 3:53 AM, Reitzel, Charles charles.reit...@tiaa-cref.org wrote:

Hi Nitin, I was trying many different options for a couple of different queries. In fact, I have collations working OK now with the Suggester and WFSTLookup. The problem may have been due to a different dictionary and/or lookup implementation and the specific options I was sending.

In general, we're using spellcheck for search suggestions. The Suggester component (vs. the Suggester spellcheck implementation) doesn't handle all of our cases, but we can get things working using the spellcheck interface. What gives us particular trouble are the cases where a term may be valid by itself but also be the start of longer words. The specific terms are acronyms specific to our business, but I'll attempt to show generic examples. E.g. a partial term like "fo" can expand to fox, fog, etc., and a full term like "brown" can also expand to something like brownstone. And, yes, the collation "brownstone fox" is nonsense. But assume, for the sake of argument, that it appears in our documents somewhere.

For a multiple-term query with a spelling error (or a partially typed term), "brown fo", we get collations in order of hits, descending, like "brown fox", "brown fog", "brownstone fox". So far, so good.

For a single-term query, "brown", we get a single suggestion, "brownstone", and no collations. So we don't know to keep the term "brown"! At this point, we need spellcheck.extendedResults=true and to look at the origFreq value in the suggested corrections. Unfortunately, the Suggester (spellcheck dictionary) does not populate the original frequency information, and without this information the SpellCheckComponent cannot format the extended results. However, with a simple change to Suggester.java, it was easy to get the needed frequency information and use it to make a sound decision to keep or drop the input term. But I'd be much obliged if there is a better way to go about it. Configs below.

Thanks, Charlie

<!-- SpellCheck component -->
<searchComponent class="solr.SpellCheckComponent" name="suggestSC">
  <lst name="spellchecker">
    <str name="name">suggestDictionary</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.WFSTLookupFactory</str>
    <str name="field">text_all</str>
    <float name="threshold">0.0001</float>
    <str name="exactMatchFirst">true</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>

<!-- Request Handler -->
<requestHandler name="/tcSuggest" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="title">Search Suggestions (spellcheck)</str>
    <str name="echoParams">explicit</str>
    <str name="wt">json</str>
    <str name="rows">0</str>
    <str name="defType">edismax</str>
    <str name="df">text_all</str>
    <str name="fl">id,name,ticker,entityType,transactionType,accountType</str>
    <str name="spellcheck">true</str>
    <str name="spellcheck.count">5</str>
    <str name="spellcheck.dictionary">suggestDictionary</str>
    <str name="spellcheck.alternativeTermCount">5</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.extendedResults">true</str>
    <str name="spellcheck.maxCollationTries">10</str>
    <str name="spellcheck.maxCollations">5</str>
  </lst>
  <arr name="last-components">
    <str>suggestSC</str>
  </arr>
</requestHandler>

-Original Message-
From: Nitin Solanki [mailto:nitinml...@gmail.com]
Sent: Tuesday, February 17, 2015 3:17 AM
To: solr-user@lucene.apache.org
Subject: Re: Collations are not working fine.

Hi Charles, will you please send the configuration which you tried? It will help to solve my problem.
Have you sorted the collations on hits or on the frequencies of the suggestions? If you did, please assist me.

On Mon, Feb 16, 2015 at 7:59 PM, Reitzel, Charles charles.reit...@tiaa-cref.org wrote:

[Charles's earlier message and the rest of the quoted thread snipped; see the messages above]
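For reference, a minimal SolrJ sketch of the keep-or-drop decision Charles describes, reading origFreq from extended results. It assumes spellcheck.extendedResults=true and a dictionary that actually populates frequencies (per the thread, the stock Suggester does not, which is why Charles patched it); the client and imports are as in the earlier sketch, and the /tcSuggest handler is from Charles's config.

SolrQuery q = new SolrQuery("brown");
q.setRequestHandler("/tcSuggest");
QueryResponse rsp = client.query(q);
SpellCheckResponse sc = rsp.getSpellCheckResponse();
for (SpellCheckResponse.Suggestion s : sc.getSuggestions()) {
    // With extendedResults=true, each suggestion carries the frequency
    // of the original term in the index.
    int origFreq = s.getOriginalFrequency();
    if (origFreq > 0) {
        // Term exists in the index: keep it as-is (e.g. "brown").
        System.out.println("keep " + s.getToken());
    } else {
        // Term is absent from the index: fall back to the alternatives.
        System.out.println("replace " + s.getToken()
                + " with one of " + s.getAlternatives());
    }
}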
Re: Use multiple collections having different configuration
Thanks Shawn.. On Fri, Feb 20, 2015 at 7:53 PM, Shawn Heisey apa...@elyograg.org wrote: On 2/20/2015 4:06 AM, Nitin Solanki wrote: I have scenario where I want to create/use 2 collection into same Solr named as collection1 and collection2. I want to use distributed servers. Each collection has multiple shards. Each collection contains different configurations(solrconfig.xml and schema.xml). How can I do? In between, If I want to re-configure any collection then how to do that? As I know, If we use single collection which having multiple shards then we need to use this upconfig link - * example/scripts/cloud-scripts/zkcli.sh -zkhost localhost:9983 -cmd upconfig -confdir example/solr/collection1/conf -confname default * and restart all the nodes. For 2 collections into same solr. How can I do re-configure? First, upload your two different configurations with zkcli upconfig using two different names. Create your collections with the Collections API, and tell each one to use a different collection.configName. If the collection already exists, use the zkcli linkconfig command, and reload the collection. If you need to change a config, edit the config on disk and re-do the zkcli upconfig. Then reload the collection with the Collections API. Alternately you could upload a whole new config and then link it to the existing collection. The Collections API is not yet exposed in the admin interface, you will need to do those calls yourself. If you're doing this with SolrJ, there are some objects inside CollectionAdminRequest that let you do all the API actions. Thanks, Shawn
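A minimal SolrJ sketch of the Collections API calls Shawn mentions, assuming a recent SolrJ client (7.x or later; in the 4.x/5.x era the same actions go through the CollectionAdminRequest inner classes or plain HTTP). The config names conf1/conf2 are hypothetical and are assumed to have been uploaded with zkcli upconfig first.

import java.util.Collections;
import java.util.Optional;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.request.CollectionAdminRequest;

public class TwoCollectionsDemo {
    public static void main(String[] args) throws Exception {
        CloudSolrClient cloud = new CloudSolrClient.Builder(
                Collections.singletonList("localhost:9983"), Optional.empty()).build();

        // Each collection points at its own uploaded config set.
        CollectionAdminRequest.createCollection("collection1", "conf1", 4, 1).process(cloud);
        CollectionAdminRequest.createCollection("collection2", "conf2", 4, 1).process(cloud);

        // After re-uploading a changed config, reload only the collection that uses it.
        CollectionAdminRequest.reloadCollection("collection2").process(cloud);
        cloud.close();
    }
}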
Auto-correct the phrase/query
Hello, I want to do the same phrase/spell correction as Google. If a user types the query "the dark night", then I need a suggestion like "the dark knight" from Solr. Is there any way to do this?
Re: spellcheck.count v/s spellcheck.alternativeTermCount
I have 48 GB of indexed data. I have set spellcheck.count=1 and spellcheck.alternativeTermCount=10, but I am getting only 1 suggestion in the suggestion block, while suggestions for collations are coming. *PFA* for details.

On Thu, Feb 19, 2015 at 1:50 AM, Dyer, James james.d...@ingramcontent.com wrote:

It will try to give you suggestions up to the number you specify, but if fewer are available it will not give you more.

James Dyer
Ingram Content Group

-Original Message-
From: Nitin Solanki [mailto:nitinml...@gmail.com]
Sent: Tuesday, February 17, 2015 11:40 PM
To: solr-user@lucene.apache.org
Subject: Re: spellcheck.count v/s spellcheck.alternativeTermCount

Thanks James, I tried the same thing: spellcheck.count=10&spellcheck.alternativeTermCount=5. And I got 5 suggestions for both "life" and "hope", not "up to 10 suggestions for hope, but only up to 5 suggestions for life."

On Wed, Feb 18, 2015 at 1:10 AM, Dyer, James james.d...@ingramcontent.com wrote:

Here is an example to illustrate what I mean...
- query: q=text:(life AND hope)&spellcheck.count=10&spellcheck.alternativeTermCount=5
- suppose at least one document in your dictionary field has "life" in it
- also suppose zero documents in your dictionary field have "hope" in them
- the spellchecker will try to return you up to 10 suggestions for "hope", but only up to 5 suggestions for "life"

James Dyer
Ingram Content Group

-Original Message-
From: Nitin Solanki [mailto:nitinml...@gmail.com]
Sent: Tuesday, February 17, 2015 11:35 AM
To: solr-user@lucene.apache.org
Subject: Re: spellcheck.count v/s spellcheck.alternativeTermCount

Hi James, how can you say that count doesn't use the index/dictionary? Then where do the suggestions come from?

On Tue, Feb 17, 2015 at 10:29 PM, Dyer, James james.d...@ingramcontent.com wrote:

See http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.count and the following section for details. Briefly, "count" is the # of suggestions it will return for terms that are *not* in your index/dictionary. "alternativeTermCount" is the # of alternatives you want returned for terms that *are* in your dictionary. You can set them to the same value, unless you want fewer suggestions when the term is in the dictionary.

James Dyer
Ingram Content Group

-Original Message-
From: Nitin Solanki [mailto:nitinml...@gmail.com]
Sent: Tuesday, February 17, 2015 5:27 AM
To: solr-user@lucene.apache.org
Subject: spellcheck.count v/s spellcheck.alternativeTermCount

Hello everyone, I am confused between spellcheck.count and spellcheck.alternativeTermCount in Solr. Any help, in detail?
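A minimal SolrJ sketch of James's example, setting both parameters on a single request; the client setup is as in the earlier sketches, and the field name "text" is taken from his example.

SolrQuery q = new SolrQuery("text:(life AND hope)");
q.setRequestHandler("/spell");
q.set("spellcheck", "true");
// Up to 10 suggestions for terms absent from the dictionary ("hope"),
// up to 5 for terms already present in it ("life").
q.set("spellcheck.count", 10);
q.set("spellcheck.alternativeTermCount", 5);
QueryResponse rsp = client.query(q);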
Re: How to place whole indexed data on cache
Thanks Dominique. Got your view.

On Wed, Feb 18, 2015 at 11:55 PM, Dominique Bejean dominique.bej...@eolya.fr wrote:

Hi, As Shawn said, install enough memory so that all the free memory outside the heap can be used as disk cache. Use at most 40% of the available memory for the heap (the Xmx JVM parameter), and never more than 32 GB. And avoid letting your server swap. For most Linux systems, this is configured using the /etc/sysctl.conf value:

vm.swappiness = 1

This prevents swapping under normal circumstances, but still allows the OS to swap under emergency memory situations. A swappiness of 1 is better than 0, since on some kernel versions a swappiness of 0 can invoke the OOM-killer.

http://askubuntu.com/questions/103915/how-do-i-configure-swappiness
http://unix.stackexchange.com/questions/88693/why-is-swappiness-set-to-60-by-default

Dominique
http://www.eolya.fr/

2015-02-18 14:39 GMT+01:00 Shawn Heisey apa...@elyograg.org:

On 2/18/2015 4:20 AM, Nitin Solanki wrote:

How can I place whole indexed data in cache, so that if I search any query then I will get the response, suggestions, and collations rapidly? And also, how can I see which documents are in the cache and verify it?

Simply install enough extra memory in your machine for the entire index to fit in RAM that is not being used by programs ... and then do NOT allocate that extra memory to any program. The operating system will automatically do the caching for you as part of normal operation, no config required.

https://wiki.apache.org/solr/SolrPerformanceProblems#RAM

Relevant articles referenced by that wiki page:

http://en.wikipedia.org/wiki/Page_cache
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

Thanks, Shawn
Re: Divide 4 Nodes into 100 nodes in Solr Cloud
Hi Yago and Shawn, Sorry, I think you are both talking about shard splitting, but I want node splitting. I have 4 nodes, each with 2 shards. Now I want to go from those 4 nodes to 100 nodes, each having 2 shards. Any ideas?

On Wed, Feb 18, 2015 at 9:25 PM, Shawn Heisey apa...@elyograg.org wrote:

On 2/18/2015 8:17 AM, Nitin Solanki wrote:

I have created 4 nodes having 8 shards. Now, I want to divide those 4 nodes into 100 nodes without any failure or re-indexing the data. Any help please?

I think your only real option within a strict interpretation of your requirements is shard splitting. You will probably have to do it several times, and the resulting core names could get very ugly.

https://cwiki.apache.org/confluence/display/solr/Shards+and+Indexing+Data+in+SolrCloud#ShardsandIndexingDatainSolrCloud-ShardSplitting

Reindexing is a LOT cleaner and is likely to work better. If you build a new collection sharded the way you want across all the new nodes, you can delete the old collection and set up an alias pointing the old name at the new collection; no need to change any applications, as long as they use the collection name rather than the actual core names. The delete and alias might take long enough that there would be a few seconds of downtime, but that's probably all you'd see. Both indexing and queries would work with the alias.

https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-DeleteaCollection
https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-CreateormodifyanAliasforaCollection

Thanks, Shawn
Re: Divide 4 Nodes into 100 nodes in Solr Cloud
Okay, thanks Shawn..

On Thu, Feb 19, 2015 at 7:59 PM, Shawn Heisey apa...@elyograg.org wrote:

On 2/19/2015 4:18 AM, Nitin Solanki wrote:

Sorry, I think you are both talking about shard splitting, but I want node splitting. I have 4 nodes, each with 2 shards. Now I want to go from those 4 nodes to 100 nodes, each having 2 shards. Any ideas?

Node splitting does not exist as a discrete command, but shard splitting is the first step in node splitting. The full procedure would be:

*) Split one or more shards. Wait for that to complete.
*) Do the ADDREPLICA action for some of the new shards to other hosts.
*) Wait for the replication to the new core(s) to complete.
*) Do the DELETEREPLICA action for those shards on the original hosts.
*) Delete the originally-split shard(s) at your leisure.

The overall procedure will be labor intensive and might be prone to error, plus, as already mentioned, the core names might become very convoluted. It is MUCH cleaner to reindex into a new collection.

Thanks, Shawn
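A minimal SolrJ sketch of the sequence Shawn outlines, reusing the CloudSolrClient from the earlier sketch and assuming a recent SolrJ (7.x+). The shard, node, and replica names are hypothetical; in practice each step should be verified against the cluster state before the next one runs, which this sketch omits.

// 1) Split shard1; SolrCloud creates sub-shards shard1_0 and shard1_1.
CollectionAdminRequest.splitShard("collection1").setShardName("shard1").process(cloud);

// 2) Place a replica of a new sub-shard on another node (node name hypothetical).
CollectionAdminRequest.addReplicaToShard("collection1", "shard1_0")
        .setNode("newhost:8983_solr").process(cloud);

// 3) Once replication finishes, drop the replica on the original host
//    (replica name comes from the cluster state; hypothetical here).
CollectionAdminRequest.deleteReplica("collection1", "shard1_0", "core_node3").process(cloud);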
Re: spellcheck.count v/s spellcheck.alternativeTermCount
Hi James, How do I see the suggestions from spellcheck.alternativeTermCount?

On Wed, Feb 18, 2015 at 11:09 AM, Nitin Solanki nitinml...@gmail.com wrote:

Thanks James, I tried the same thing: spellcheck.count=10&spellcheck.alternativeTermCount=5. And I got 5 suggestions for both "life" and "hope", not "up to 10 suggestions for hope, but only up to 5 suggestions for life."

[remainder of the quoted thread snipped; see the earlier messages in this thread]
Why collations are coming even though I set spellcheck.count to zero (0)
Hi everyone, I have set spellcheck.count=0 and spellcheck.alternativeTermCount=0, yet collations still come when I search any misspelled query. Why? I have also set spellcheck.maxCollations=100 and spellcheck.maxCollationTries=100. As I understand it, collations are built from suggestions, so do I have a misunderstanding about collations, or is there some other configuration issue? Any help, please?
How to place whole indexed data on cache
Hi, How can I place the whole of my indexed data in cache, so that when I search any query I get the response, suggestions, and collations rapidly? And also, how can I see which documents are in the cache and verify it?
Re: Divide 4 Nodes into 100 nodes in Solr Cloud
Okay. It will destroy/harm my indexed data, right?

On Wed, Feb 18, 2015 at 9:01 PM, Yago Riveiro yago.rive...@gmail.com wrote:

You can try the SPLIT command.

/Yago Riveiro

On Wed, Feb 18, 2015 at 3:19 PM, Nitin Solanki nitinml...@gmail.com wrote:

Hi, I have created 4 nodes having 8 shards. Now, I want to divide those 4 nodes into 100 nodes without any failure or re-indexing the data. Any help please?
Get nearby suggestions for any phrase search
Hello, I want to retrieve only the top five suggestions for any phrase/query search. How do I do that? For example, if I search ?q=the bark night then I need a suggestion/collation like "the dark knight". How do I get nearby suggestions/terms for the phrase?
Divide 4 Nodes into 100 nodes in Solr Cloud
Hi, I have created 4 nodes having 8 shards. Now, I want to divide those 4 nodes into 100 nodes without any failure or re-indexing the data. Any help please?