Re: best way to contribute solr??
Okay sir, I will mail to solr-user only. I am very thankful to you for all your help. I am a Java developer with good knowledge of Perl, working on Solr; actually, I just started working on Solr, for geospatial search (not using JTS) only. To be very frank, from Mr. Yonik's tutorial I learned about faceting, geospatial (not JTS), indexing, searching and boosting. That's all. What is your suggestion now? Yesterday I subscribed to solr-start as well. And sir, what do you mean by *Create a basic project using that library and latest version of Solr*?

With Regards, Aman Tandon

On Thu, Apr 10, 2014 at 11:14 AM, Alexandre Rafalovitch arafa...@gmail.com wrote:

Hi Aman, Nice of you to want to help. Let's keep the discussion on the user mailing list as opposed to the developer one (most people are on both). What is your skill set? Are you familiar with particular languages? If so, the easiest way to contribute would be the following:
1) Find all the Solr client libraries in the language you are most familiar with (PHP, Java, Perl, Python, etc.).
2) Create a basic project using that library and the latest version of Solr. Maybe use the Solr tutorial as a baseline and show how to do the same steps with the client instead of the command line/curl.
3) Write a blog post about what you learned: whether the library supports the latest Solr well and whether it supports the latest Solr features (e.g. schemaless mode, Near-Real-Time, SolrCloud).
If that does not appeal, give an example of where your skills are strongest and I am sure there is a way for you to contribute.

Regards, Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency

On Thu, Apr 10, 2014 at 12:36 PM, Aman Tandon amantandon...@gmail.com wrote:

Can anybody please explain to me how I should start contributing to Solr? I am a novice here, as well as in this technology, but I am learning Solr day by day. So how should I start?
Thanks Aman Tandon
Tomcat creates a thread for each SOLR core
Hi guys, I need some help. After updating to Solr 4.4, the Tomcat process is consuming about 2 GB of memory, and CPU usage is about 40% for about 10 minutes after the start. However, the bigger problem is that I have about 1000 cores and it seems a thread is created for each core. The process has more than 1000 threads and everything is extremely slow. Creating or unloading a core, even without documents, takes about 20 minutes. Searching is more or less good, but storing also takes a long time. Is there some configuration I missed or got wrong? There aren't many calls; I use 64-bit Tomcat 7, Solr 4.4 and the latest 64-bit Java. The machine has 24 GB of RAM, a CPU with 16 cores, and is running Windows Server 2008 R2. The index is updated every 30 seconds / 10,000 documents. I hadn't checked the number of threads before the update, because I didn't have to; it was working just fine. Any suggestion will be highly appreciated. Thank you in advance. Regards, Atanas
Re: Ranking code
For a better analysis of document ranking, you should query the index with these extra parameters, e.g. whole_query&debug=true&wt=xml. Copy the resulting XML and paste it into http://explain.solr.pl/ ; you can then easily see the ranking analysis in the form of pie charts, showing how much weight is given to each parameter in your Solr config and in the query. On Tue, Apr 8, 2014 at 9:09 PM, Shawn Heisey s...@elyograg.org wrote: On 4/8/2014 3:55 AM, azhar2007 wrote: I'm basically trying to understand how results are ranked. What's the algorithm behind it? If you add a debugQuery parameter to your request, set to true, you will see the score calculation for every document included in the response. This is the default similarity class that Solr uses: http://lucene.apache.org/core/4_7_0/core/org/apache/lucene/search/similarities/DefaultSimilarity.html Thanks, Shawn
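To make the suggestion above concrete, here is a small sketch that builds such a debug request URL. The host, collection name, and query value are placeholders, not taken from the thread; adjust them to your own setup:

```python
from urllib.parse import urlencode

# Placeholder Solr endpoint; substitute your own host and collection.
base = "http://localhost:8983/solr/collection1/select"

params = {
    "q": "title:solr",   # your actual query goes here
    "debug": "true",     # ask Solr to include scoring explanations
    "wt": "xml",         # XML output, as expected by explain.solr.pl
}
url = base + "?" + urlencode(params)
print(url)
```

The response will contain an explain section per matching document, which is what explain.solr.pl visualizes.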
Re: best way to contribute solr??
Great, Solr + Perl + Geospatial. There are two Perl clients for Solr listed on the Wiki: http://wiki.apache.org/solr/IntegratingSolr . Are there any more? If yes, add them to the Wiki (need to ask permission to edit Wiki). Are those two listed clients dead or alive? Do they work with Solr 4.7.1? Can you make them work with Solr 4.7.1 and recent version of Perl? Can you do a small demo that uses Perl client to index some geospatial information and then do a search for it? I strongly suspect you will hit some interesting issues. Find the fix, contribute back to the Perl library maintainer. Or, at least, clearly describe the issue, if you don't yet know enough to contribute the fix. Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency
Re: Tomcat creates a thread for each SOLR core
I suspect this is due to the firstSearcher event defined in solrconfig.xml; you could make some tweaks there, and I hope that will help. We are using the same setup you just mentioned, but we use a separate indexing server and replicate the data to our other two servers, so that indexing won't hurt search performance. Thanks, Aman Tandon
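For reference, the firstSearcher warm-up queries being referred to live in solrconfig.xml in a listener like the following (an illustrative fragment, not taken from the poster's config). Trimming or removing these queries reduces the work each core does on startup:

```xml
<!-- Runs once when a core's very first searcher is opened. -->
<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <!-- Each entry is a warming query executed at startup. -->
    <lst><str name="q">static warming query</str></lst>
  </arr>
</listener>
```

With 1000 cores, even a few warming queries per core add up to substantial startup work, which is why this is a plausible place to look.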
Re: Tomcat creates a thread for each SOLR core
Are you using all those cores at once? If not, there is a recent setting that allows Solr to load cores on demand. If you are using them all, perhaps you need to look into splitting them across different machines (horizontal scaling). What about your caches? How many additional structures have you configured for each core? How much memory have you allocated to the Java process? You are probably running out of memory and thrashing against swap. I am not even sure a Java process can access that much memory in one process. You might be better off running multiple Tomcat/Solr instances on the same machine with different subsets of cores. Regards, Alex. P.S. This is general advice; I don't know the specific issues around that version of Solr/Tomcat. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency
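The on-demand core loading mentioned above is configured per core in solr.xml. A sketch in the Solr 4.x legacy solr.xml format (core names and paths are placeholders):

```xml
<solr persistent="true">
  <!-- transientCacheSize caps how many transient cores stay loaded at once;
       least-recently-used cores beyond this are unloaded automatically. -->
  <cores adminPath="/admin/cores" transientCacheSize="50">
    <!-- loadOnStartup="false" defers loading until the first request;
         transient="true" lets Solr unload the core under memory pressure. -->
    <core name="core0001" instanceDir="core0001"
          loadOnStartup="false" transient="true"/>
    <!-- ...one entry per core... -->
  </cores>
</solr>
```

With 1000 cores this keeps only a bounded subset resident, which should directly address the thread and memory growth described above.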
Re: best way to contribute solr??
Thank you so much, sir :) Can I try in Java as well? Thanks, Aman Tandon
Re: best way to contribute solr??
I will also try to do it in Perl as well; this is going to be something great, I am excited :D Thanks a ton!! Thanks, Aman Tandon
Re: best way to contribute solr??
Sure, you can do it in Java too. The difference is that Solr comes with the Java client SolrJ, which is tested and kept up to date. But there could still be more tutorials. For other languages/clients, there is a lot less information available, especially once you start adding (human) languages into it, e.g. how to process your own language (if non-English). And there are many more ideas on slide 26 of http://www.slideshare.net/arafalov/introduction-to-solr-from-bangkok-meetup , as well as an example of a processing pipeline for Thai. More of these kinds of things would be useful too. Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency
Re: best way to contribute solr??
Thanks sir, I will look into this. Solr and its developers are all helpful and awesome; I am feeling great. Thanks, Aman Tandon
Re: Tomcat creates a thread for each SOLR core
Thanks for the quick responses. I have allocated 1 GB min and 6 GB max memory to Java. The cache settings are the default ones (maybe this is a good point to start). All cores share the same schema and config. I'll try setting the loadOnStartup=*false* and transient=*true* options for each core and see what happens. These are the exceptions from the log files:

SEVERE: Servlet.service() for servlet [default] in context with path [/solrt] threw exception
java.lang.IllegalStateException: Cannot call sendError() after the response has been committed
    at org.apache.catalina.connector.ResponseFacade.sendError(ResponseFacade.java:450)
    at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:695)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:383)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:170)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
    at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:950)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
    at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1040)
    at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:607)
    at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:315)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)

and:

SEVERE: null:ClientAbortException: java.net.SocketException: Software caused connection abort: socket write error
    at org.apache.catalina.connector.OutputBuffer.doFlush(OutputBuffer.java:371)
    at org.apache.catalina.connector.OutputBuffer.flush(OutputBuffer.java:333)
    at org.apache.catalina.connector.CoyoteOutputStream.flush(CoyoteOutputStream.java:101)
    at sun.nio.cs.StreamEncoder.implFlush(Unknown Source)
    at sun.nio.cs.StreamEncoder.flush(Unknown Source)
    at java.io.OutputStreamWriter.flush(Unknown Source)
    at org.apache.solr.util.FastWriter.flush(FastWriter.java:137)
    at org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:648)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:375)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:170)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
    at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:950)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
    at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1040)
    at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:607)
    at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:313)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
Caused by: java.net.SocketException: Software caused connection abort: socket write error
    at java.net.SocketOutputStream.socketWrite0(Native Method)
    at java.net.SocketOutputStream.socketWrite(Unknown Source)
    at java.net.SocketOutputStream.write(Unknown Source)
    at org.apache.coyote.http11.InternalOutputBuffer.realWriteBytes(InternalOutputBuffer.java:215)
    at org.apache.tomcat.util.buf.ByteChunk.flushBuffer(ByteChunk.java:480)
    at org.apache.coyote.http11.InternalOutputBuffer.flush(InternalOutputBuffer.java:119)
    at org.apache.coyote.http11.AbstractHttp11Processor.action(AbstractHttp11Processor.java:799)
    at org.apache.coyote.Response.action(Response.java:174)
    at org.apache.catalina.connector.OutputBuffer.doFlush(OutputBuffer.java:366)
    ... 24 more

On Thu, Apr 10, 2014 at 9:51 AM, Alexandre Rafalovitch
Re: Tomcat creates a thread for each SOLR core
On Thu, Apr 10, 2014 at 2:14 PM, Atanas Atanasov atanaso...@gmail.com wrote: SEVERE: null:ClientAbortException: java.net.SocketException: Software caused connection abort: socket write error Separate issue, but most likely the client closed the browser and the server had nowhere to send the response to, so it complained. This happens if your serving process is too slow. The other one might be the same or might be different. The server sends headers and expects the body to follow. Then, during processing of the body, an error occurs. The server changes its mind and wants to send an error (e.g. HTTP 500 instead of HTTP 200), but it's too late: the headers were already sent out. So it complains to the log file instead. The real question is not this exception, but the internal error that caused the server to change its mind. I would concentrate on speed first and see if these problems go away. Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency
Re: Solr special characters like '(' and ''?
mark. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-special-characters-like-and-tp4129854p4130333.html Sent from the Solr - User mailing list archive at Nabble.com.
Getting huge difference in QTime for terms.lower and terms.prefix
Hi, When I query the terms component with terms.prefix, the QTime is 100 milliseconds, whereas when I give the same query with terms.lower, the QTime is 500 milliseconds. I am using SolrCloud. In both cases I give terms.limit=60 and terms.sort=index.
Query 1 params: terms.fl=field_Name&terms.limit=60&terms.prefix=b&wt=json&terms.sort=index&shard.keys=shard_key — QTime: 100 milliseconds
Query 2 params: terms.fl=field_Name&terms.limit=60&terms.lower=b&wt=json&terms.sort=index&shard.keys=shard_key — QTime: 500 milliseconds
The response gives the same terms in both queries, but the QTime is different. Please let me know why there is a difference in QTime between the two approaches. Thanks, Jilani
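For anyone reproducing the comparison, the two requests can be built like this (a sketch; the host and collection name are placeholders, only the parameters come from the message above):

```python
from urllib.parse import urlencode

# Placeholder endpoint; substitute your own host and collection.
base = "http://localhost:8983/solr/collection1/terms"

common = {
    "terms.fl": "field_Name",
    "terms.limit": 60,
    "terms.sort": "index",
    "wt": "json",
    "shard.keys": "shard_key",
}

# Query 1: enumerate terms that start with the prefix "b".
q1 = base + "?" + urlencode({**common, "terms.prefix": "b"})

# Query 2: enumerate terms starting from "b" onward; terms.lower only
# sets the start of the range (no upper bound is given here).
q2 = base + "?" + urlencode({**common, "terms.lower": "b"})

print(q1)
print(q2)
```

Note the semantic difference: terms.prefix constrains every returned term, while terms.lower is just a starting point in the term dictionary, which may explain why the two apparently equivalent queries cost different amounts of work.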
Re: MapReduceIndexerTool does not respect Lucene version in solrconfig Was: converting 4.7 index to 4.3.1
Thanks for responding, Wolfgang. Changing to LUCENE_43: IndexWriterConfig writerConfig = new IndexWriterConfig(Version.LUCENE_43, null); didn't affect the index format version because, I believe, if the format of the index to merge is already of a higher version (4.1 in this case), it will merge to the same and not to a lower version (4.0). But the format version certainly could be read from the solrconfig, you are right. Dmitry On Wed, Apr 9, 2014 at 11:51 PM, Wolfgang Hoschek whosc...@cloudera.com wrote: There is a current limitation in that the code doesn't actually look into solrconfig.xml for the version. We should fix this, indeed. See https://github.com/apache/lucene-solr/blob/trunk/solr/contrib/map-reduce/src/java/org/apache/solr/hadoop/TreeMergeOutputFormat.java#L100-101 Wolfgang. On Apr 8, 2014, at 11:49 AM, Dmitry Kan solrexp...@gmail.com wrote: Hello, When we instantiate the MapReduceIndexerTool with the collection's conf directory, we expect that the Lucene version is respected and the index gets generated in a format compatible with the defined version. This does not seem to happen, however. Checking with Luke: the expected Lucene index format is Lucene 4.0; the output Lucene index format is Lucene 4.1. Can anybody shed some light on the semantics behind specifying the Lucene version in this context? Does this have something to do with what version of Solr core is used by the morphline library? Thanks, Dmitry -- Forwarded message -- Dear list, We have been generating Solr indices with the solr-hadoop contrib module (SOLR-1301). Our current Solr in use is version 4.3.1. Is there any tool that could do the backward conversion, i.e. 4.7 to 4.3.1? Or is an upgrade the only way to go? -- Dmitry Blog: http://dmitrykan.blogspot.com Twitter: http://twitter.com/dmitrykan
Shared Stored Field
Hello, We have a denormalized index where certain documents point, in essence, to the same content. The relevance of a document depends on the current context. E.g. document A has a different boost factor when we apply filter F1 than when we use filter F2 (or F3, etc). To support this, we denormalize document A with a unique boost field, so that it has a different relevance for each filter it can be found in. The problem is that the documents have a big stored content field that is required for the highlighting snippets. This denormalization grows the index size by a factor of 100 in the worst case. Storing the same big content field many times seems really inefficient. Is there a way to point a group of documents to the same stored content fields? Or is there a different way to influence the relevance depending on the current search context? -- View this message in context: http://lucene.472066.n3.nabble.com/Shared-Stored-Field-tp4130351.html Sent from the Solr - User mailing list archive at Nabble.com.
Fails to index if unique field has special characters
Hi, We are migrating from Solr 4.6 standalone to Solr 4.7 in cloud mode; while reindexing the documents we are getting the following error. This happens when the unique key contains special characters. This was not noticed in version 4.6 standalone mode, so we are not sure if this is a version problem or a cloud issue. An example of the unique key is given below: http://www.mynews.in/Blog/smrity!!**)))!miami_dolphins_vs_dallas_cowboys_live_stream_on_line_nfl_football_free_video_broadcast_B142707.html Exception Stack Trace: ERROR - 2014-04-10 10:51:44.361; org.apache.solr.common.SolrException; java.lang.ArrayIndexOutOfBoundsException: 2 at org.apache.solr.common.cloud.CompositeIdRouter$KeyParser.getHash(CompositeIdRouter.java:296) at org.apache.solr.common.cloud.CompositeIdRouter.sliceHash(CompositeIdRouter.java:58) at org.apache.solr.common.cloud.HashBasedRouter.getTargetSlice(HashBasedRouter.java:33) at org.apache.solr.update.processor.DistributedUpdateProcessor.setupRequest(DistributedUpdateProcessor.java:218) at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:550) at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100) at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:247) at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:174) at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1916) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:780) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:427) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557) at org.eclipse.jetty.server.session.SessionHandle Thanks, Ayush
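[Context for this error, as a hedged aside rather than a confirmed diagnosis: in SolrCloud, CompositeIdRouter treats '!' in a uniqueKey as a shard-key separator, so an id containing several '!' characters decomposes into more parts than a two-level router expects, which matches the ArrayIndexOutOfBoundsException seen above. A tiny illustration (not Solr code, just String.split on a shortened form of the id):]

```java
// Hypothetical illustration of why '!' in a uniqueKey is special in
// SolrCloud: the composite-id router splits the id on '!' to find the
// shard key, and this id yields more parts than a two-level scheme expects.
public class BangSplitDemo {
    public static void main(String[] args) {
        String id = "Blog/smrity!!**)))!miami_dolphins";
        // limit -1 keeps trailing empty strings, like a naive parser would
        String[] parts = id.split("!", -1);
        System.out.println(parts.length); // prints 4: three '!' -> four parts
    }
}
```

Standalone (non-cloud) Solr never routes on the id, which would explain why 4.6 standalone accepted these keys.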
Re: MapReduceIndexerTool does not respect Lucene version in solrconfig Was: converting 4.7 index to 4.3.1
a correction: actually, when I tested the above change, I had so little data that it didn't trigger sub-shard slicing and thus merging of the slices. Still, it looks as if somewhere in the map-reduce contrib code there is a link to which Lucene version to use. Wolfgang, do you happen to know where that other Version.* is specified? On Thu, Apr 10, 2014 at 12:59 PM, Dmitry Kan solrexp...@gmail.com wrote: … -- Dmitry Blog: http://dmitrykan.blogspot.com Twitter: http://twitter.com/dmitrykan
Re: best way to contribute solr??
Aman: Here's another helpful resource: http://wiki.apache.org/solr/HowToContribute It tells you how to get the source code, set up an IDE, etc. for Solr/Lucene. In addition to Alexandre's suggestions, one possibility (but I warn you it can be challenging) is to create unit tests. Part of the build report each night is a coverage report; you can get to the latest build here: https://wiki.apache.org/solr/NightlyBuilds Click on the clover test coverage and pick something; track down what isn't covered (see the clover report link for instance). Warning: You will be completely lost for a while. This is hard stuff, especially when you're just starting out. So choose the simplest thing you can for the first go, to get familiar with the process, if you want to try this. Another place to start is... the user's list. Pick one question a day, research it, and try to provide an answer. Clearly label your responses with the degree of certainty you have. Another caution: sometimes you'll research something and get back to the list to discover it's already been answered, but you'll have gained the knowledge, and it gets better over time. Best, Erick On Thu, Apr 10, 2014 at 12:03 AM, Aman Tandon amantandon...@gmail.com wrote: Thanks sir, I will look into this. Solr and its developers are all helpful and awesome, i am feeling great. Thanks Aman Tandon On Thu, Apr 10, 2014 at 12:29 PM, Alexandre Rafalovitch arafa...@gmail.com wrote: Sure, you can do it in Java too. The difference is that Solr comes with the Java client SolrJ, which is tested and kept up-to-date. But there could still be more tutorials. For other languages/clients, there is a lot less information available. Especially if you start adding (human) languages into it. E.g. how to process your own language (if non-English). And there are many more ideas on Slide 26 of http://www.slideshare.net/arafalov/introduction-to-solr-from-bangkok-meetup . As well as an example of a processing pipeline for Thai.
More of these kinds of things would be useful too. Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency On Thu, Apr 10, 2014 at 1:52 PM, Aman Tandon amantandon...@gmail.com wrote: Thank you so much sir :) Can i try in java as well? Thanks Aman Tandon On Thu, Apr 10, 2014 at 12:15 PM, Alexandre Rafalovitch arafa...@gmail.com wrote: Great, Solr + Perl + Geospatial. There are two Perl clients for Solr listed on the Wiki: http://wiki.apache.org/solr/IntegratingSolr . Are there any more? If yes, add them to the Wiki (you need to ask permission to edit the Wiki). Are those two listed clients dead or alive? Do they work with Solr 4.7.1? Can you make them work with Solr 4.7.1 and a recent version of Perl? Can you do a small demo that uses a Perl client to index some geospatial information and then do a search for it? I strongly suspect you will hit some interesting issues. Find the fix, contribute back to the Perl library maintainer. Or, at least, clearly describe the issue, if you don't yet know enough to contribute the fix. Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency On Thu, Apr 10, 2014 at 1:04 PM, Aman Tandon amantandon...@gmail.com wrote: Okay sir, i will mail to solr-user only. I am feeling so thankful to you for all your help. I am a java developer with a good knowledge of perl, working on solr; actually, i just started working on solr for the geospatial search (not using JTS) only. To be very frank, I learned about faceting from Mr Yonik's tutorial, geospatial (not JTS), indexing, searching and boosting. That's all. What is your suggestion now? And yesterday i subscribed to solr-start as well.
And sir, what do you mean by *Create a basic project using that library and latest version of Solr*? With Regards Aman Tandon On Thu, Apr 10, 2014 at 11:14 AM, Alexandre Rafalovitch arafa...@gmail.com wrote: Hi Aman, Nice of you to want to help. Let's keep the discussion in the user mailing list as opposed to the developer one (most of the people are on both). What is your skill set? Are you familiar with particular languages? If so, the easiest way to contribute would be the following: 1) Find all the Solr client libraries in the language you are most familiar with (PHP, Java, Perl, Python, etc) 2) Create a basic project using that library and the latest version of Solr. Maybe using the Solr tutorial as a baseline and showing how to do the same steps in the client instead of with the command line/curl. 3) Write a blog post about what you learned, whether the library supports the latest Solr well and whether it supports the latest features of Solr (e.g. Schemaless mode, Near-Real-Time, SolrCloud). If that does not appeal, give an example of where your skills are strongest and I am sure there is a way for you to contribute. Regards, Alex.
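[As a loose illustration of the "same steps in the client instead of curl" idea, here is a sketch of the kind of request a client demo for the geospatial case would issue. The endpoint, collection name, and sfield value are assumptions; this only builds the URL string, so it runs without a live Solr.]

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Sketch: build a Solr /select URL with a geofilt filter query, the
// request a Perl/Java client demo would send. "store" as the spatial
// field and "collection1" are illustrative, not from the thread.
public class GeoQueryDemo {
    static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }
    public static void main(String[] args) {
        String url = "http://localhost:8983/solr/collection1/select"
                + "?q=" + enc("*:*")
                + "&fq=" + enc("{!geofilt sfield=store pt=45.15,-93.85 d=5}")
                + "&wt=json";
        System.out.println(url);
    }
}
```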
Re: Tomcat creates a thread for each SOLR core
Trying to fit 1,000 cores in 6G of memory is... interesting. That's a lot of stuff in a small amount of memory. I hope these cores' indexes are tiny. The lazy-loading bit for cores has a price. The first user in will pay the warmup penalty for that core while it loads. This may or may not be noticeable but be aware of it. You may or may not want autowarming in place. You can also specify how many cores are kept in memory at one time, they go into an LRU cache and are aged out after they serve their last outstanding request. BTW, current Java practice seems to be setting Xmx and Xms to the same value, 6G in your case. Good Luck! Erick On Thu, Apr 10, 2014 at 12:14 AM, Atanas Atanasov atanaso...@gmail.com wrote: Thanks for the quick responses, I have allocated 1GB min and 6 GB max memory to Java. The cache settings are the default ones (maybe this is a good point to start). All cores share the same schema and config. I'll try setting the loadOnStartup=*false* transient=*true *options for each core and see what will happen. 
Those are the exceptions from the log files: SEVERE: Servlet.service() for servlet [default] in context with path [/solrt] threw exception java.lang.IllegalStateException: Cannot call sendError() after the response has been committed at org.apache.catalina.connector.ResponseFacade.sendError(ResponseFacade.java:450) at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:695) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:383) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:170) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98) at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:950) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408) at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1040) at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:607) at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:315) at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at java.lang.Thread.run(Unknown Source) AND SEVERE: null:ClientAbortException: java.net.SocketException: Software caused connection abort: socket write error at org.apache.catalina.connector.OutputBuffer.doFlush(OutputBuffer.java:371) at 
org.apache.catalina.connector.OutputBuffer.flush(OutputBuffer.java:333) at org.apache.catalina.connector.CoyoteOutputStream.flush(CoyoteOutputStream.java:101) at sun.nio.cs.StreamEncoder.implFlush(Unknown Source) at sun.nio.cs.StreamEncoder.flush(Unknown Source) at java.io.OutputStreamWriter.flush(Unknown Source) at org.apache.solr.util.FastWriter.flush(FastWriter.java:137) at org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:648) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:375) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:170) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98) at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:950) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408) at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1040) at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:607) at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:313) at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at java.lang.Thread.run(Unknown Source) Caused
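[Erick's Xms=Xmx advice, expressed as a Tomcat config fragment; the file name follows the usual Tomcat convention, and 6g matches the heap size discussed above. Adjust to your box.]

```shell
# $CATALINA_BASE/bin/setenv.sh -- picked up by catalina.sh at startup
export CATALINA_OPTS="$CATALINA_OPTS -Xms6g -Xmx6g"
```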
Re: Getting huge difference in QTime for terms.lower and terms.prefix
Please provide suggestions on what could be the reason for this. Thanks, On Thu, Apr 10, 2014 at 2:54 PM, Jilani Shaik jilani24...@gmail.com wrote: Hi, When I query the terms component with terms.prefix, the QTime for it is 100 milliseconds, whereas the same query with terms.lower gives a QTime of 500 milliseconds. I am using Solr Cloud. In both cases I am giving terms.limit=60 and terms.sort=index. Query1 Params: terms.fl=field_Name&terms.limit=60&terms.prefix=b&wt=json&terms.sort=index&shard.keys=shard_key QTime: 100 milliseconds Query2 Params: terms.fl=field_Name&terms.limit=60&terms.lower=b&wt=json&terms.sort=index&shard.keys=shard_key QTime: 500 milliseconds The response gives the same terms in both queries, but the QTime is different. Please let me know why there is a difference in QTime between the two approaches. Thanks, Jilani
Re: Tomcat creates a thread for each SOLR core
Thanks for the tip, I already set the core properties. Now Tomcat has only 27 threads after startup, which is awesome. Works fine, first search is not noticeably slower than before. I'll put equal values for Xmx and Xms and see if there will be any difference. Regards, Atanas On Thu, Apr 10, 2014 at 5:11 PM, Erick Erickson erickerick...@gmail.com wrote: …
Re: Shared Stored Field
Hmmm, I scanned your question, so maybe I missed something. It sounds like you have a fixed number of filters known at index time, right? So why not index these boosts in separate fields in the document (e.g. f1_boost, f2_boost etc) and use a function query (https://cwiki.apache.org/confluence/display/solr/Function+Queries) at query time to boost by the correct one? Of course I may be way off base here, but BTW, you could use dynamic fields to avoid having to pre-define the maximum number of boost fields, something like this in my example: <dynamicField name="*_boost" type="float" indexed="true" stored="false"/> Best Erick On Thu, Apr 10, 2014 at 4:30 AM, StrW_dev r.j.bamb...@structweb.nl wrote: …
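[To make Erick's sketch concrete; the field and filter names below are illustrative, following his f1_boost example, and assume the schema has a float fieldType named "float":]

```xml
<!-- schema.xml: one dynamic float boost field per filter context,
     e.g. f1_boost, f2_boost, populated at index time -->
<dynamicField name="*_boost" type="float" indexed="true" stored="false"/>
```

A query filtered for context F1 could then multiply relevance by that field, e.g. `defType=edismax&fq=context:F1&boost=field(f1_boost)`, where `boost` is edismax's multiplicative boost parameter.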
Re: best way to contribute solr??
thanks sir, i always smile because people here are always ready to help; i am thankful to all. And yes, i started learning by reading at least 50-60 mails daily to increase my knowledge, and i give my suggestion if i am familiar with the topic; people here correct me as well if i am wrong. I know it will take time, but someday i will contribute as well, and thanks for the setup, it will be quite helpful. In my office i am using solr 4.2 with tomcat; right now i am stuck because i don't know how to integrate solr 4.7 with my tomcat. The problem for me is that i am familiar with the cores architecture of solr 4.2, in which we defined every core name as well as its instanceDir, but not with solr 4.7. Thanks Aman Tandon On Thu, Apr 10, 2014 at 7:31 PM, Erick Erickson erickerick...@gmail.com wrote: …
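[On the solr 4.2-vs-4.7 core setup question raised at the top of this message: starting with Solr 4.4, cores are usually auto-discovered rather than enumerated in solr.xml. A rough before/after, with names and paths illustrative:]

```xml
<!-- Old style (4.2): each core listed explicitly inside solr.xml -->
<cores adminPath="/admin/cores">
  <core name="core0" instanceDir="core0"/>
</cores>
```

With discovery mode (4.4+), you instead drop a core.properties file into each core directory under the Solr home; `name=core0` is a typical minimal content, and the directory name supplies the instance dir.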
Re: Tomcat creates a thread for each SOLR core
I don't expect having equal values to make a noticeable difference, except possibly in some corner cases. Setting them equal is mostly for avoiding surprises... Erick On Thu, Apr 10, 2014 at 7:17 AM, Atanas Atanasov atanaso...@gmail.com wrote: …
Re: Tomcat creates a thread for each SOLR core
Hi Atanas, I have a question: how do you know how many threads Tomcat has?

Thanks
Aman Tandon

On Thu, Apr 10, 2014 at 7:53 PM, Erick Erickson erickerick...@gmail.com wrote:

I don't expect having equal values to make a noticeable difference, except possibly in some corner cases. Setting them equal is mostly for avoiding surprises... Erick

On Thu, Apr 10, 2014 at 7:17 AM, Atanas Atanasov atanaso...@gmail.com wrote:

Thanks for the tip, I already set the core properties. Now Tomcat has only 27 threads after startup, which is awesome. It works fine; the first search is not noticeably slower than before. I'll put equal values for Xmx and Xms and see if there is any difference. Regards, Atanas

On Thu, Apr 10, 2014 at 5:11 PM, Erick Erickson erickerick...@gmail.com wrote:

Trying to fit 1,000 cores in 6G of memory is... interesting. That's a lot of stuff in a small amount of memory. I hope these cores' indexes are tiny. The lazy-loading bit for cores has a price: the first user in will pay the warmup penalty for that core while it loads. This may or may not be noticeable, but be aware of it. You may or may not want autowarming in place. You can also specify how many cores are kept in memory at one time; they go into an LRU cache and are aged out after they serve their last outstanding request. BTW, current Java practice seems to be setting Xmx and Xms to the same value, 6G in your case. Good luck! Erick

On Thu, Apr 10, 2014 at 12:14 AM, Atanas Atanasov atanaso...@gmail.com wrote:

Thanks for the quick responses. I have allocated 1 GB min and 6 GB max memory to Java. The cache settings are the default ones (maybe this is a good point to start). All cores share the same schema and config. I'll try setting the loadOnStartup=false, transient=true options for each core and see what will happen.
Re: Tomcat creates a thread for each SOLR core
Hi, I see the threads of the tomcat7.exe process in the Windows Task Manager.

Regards,
Atanas Atanasov
Re: Tomcat creates a thread for each SOLR core
Okay, I am a CentOS user in the office, Windows at home :D

Thanks
Aman Tandon
Re: Shared Stored Field
Erick Erickson wrote:
So why not index these boosts in separate fields in the document (e.g. f1_boost, f2_boost etc) and use a function query (https://cwiki.apache.org/confluence/display/solr/Function+Queries) at query time to boost by the correct one?

Well, it's basically one multivalued field that can have unlimited values, with multiple values per document (about 8 on average). In that case we would have to add a boost field for each of the values in a document, so in general we would get an unlimited number of dynamic fields in the index. But is it possible to select a different boost field depending on the current filter query?

--
View this message in context: http://lucene.472066.n3.nabble.com/Shared-Stored-Field-tp4130351p4130399.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Shared Stored Field
bq: But is it possible to select a different boost field depending on the current filter query?

Well, you're constructing the URL somewhere, so you can choose the right boost there, can't you?

I don't understand this bit: "Well its basically one multivalued field that can have unlimited values and has multiple per document (on average like 8)". The _values_ aren't at issue; it's just the name of the field. You can have lots of dynamic fields defined in your documents and it's not too expensive. Don't go wild here; when you get up into the hundreds, maybe you should think about it a bit.

I feel I'm missing something; some concrete examples would help a lot.

Best,
Erick
Re: Tomcat creates a thread for each SOLR core
On 4/10/2014 12:40 AM, Atanas Atanasov wrote:

I need some help. After updating to SOLR 4.4, the Tomcat process is consuming about 2 GB of memory, and CPU usage is about 40% for about 10 minutes after the start. However, the bigger problem is that I have about 1000 cores, and it seems that a thread is created for each core. The process has more than 1000 threads and everything is extremely slow. Creating or unloading a core, even without documents, takes about 20 minutes. Searching is more or less good, but storing also takes a lot. Is there some configuration I missed or got wrong? There aren't many calls. I use 64-bit Tomcat 7, SOLR 4.4, and the latest 64-bit Java. The machine has 24 GB of RAM and a CPU with 16 cores, and runs Windows Server 2008 R2. The index is updated every 30 seconds / 10,000 documents. I hadn't checked the number of threads before the update, because I didn't have to; it was working just fine. Any suggestion will be highly appreciated; thank you in advance.

If creating a core takes 20 minutes, that sounds to me like the JVM is doing constant full garbage collections to free up enough memory for basic operation. It could also be explained by temporary work threads having to wait to execute because the servlet container will not allow them to run.

When indexing is happening, each core will set aside some memory for buffering index updates. By default, the value of ramBufferSizeMB is 100. If all your cores are indexing at once, multiply the indexing buffer by 1000 and you'll require 100 GB of heap memory. You'll need to greatly reduce that buffer size. This buffer was 32MB by default in 4.0 and earlier; if you are not setting this value, this change alone might fully explain what you are seeing.

https://issues.apache.org/jira/browse/SOLR-4074

What version did you upgrade from? Solr 4.x is a very different beast than earlier major versions. I believe there may have been some changes made to reduce memory usage in versions after 4.4.0.
The Jetty that comes with Solr is configured to allow 10,000 threads. Most people don't have that many, even on a temporary basis, but bad things happen when the servlet container will not allow Solr to start as many as it requires. I believe the typical default maxThreads value you'll find in a servlet container config is 200.

Erick's right about a 6 GB heap being very small for what you are trying to do. Putting 1000 cores on one machine is something I would never try. If it became a requirement I had to deal with, I wouldn't try it unless the machine had a lot more CPU cores, hundreds of gigabytes of RAM, and a lot of extremely fast disk space. If this worked before the Solr upgrade, I'm amazed. Congratulations to you for fine work!

NB: Oracle Java 7u25 is what you should be using. 7u40 through 7u51 have known bugs that affect Solr/Lucene. These should be fixed in 7u60; a pre-release of that is available now, and it should be generally available in May 2014.

Thanks,
Shawn
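The ramBufferSizeMB setting Shawn refers to is a per-core value in solrconfig.xml's indexConfig section; a minimal sketch of reducing it (the 8 MB figure is purely illustrative, not a recommendation):

```xml
<!-- solrconfig.xml: indexing buffer allocated per core while updates are in
     flight; with ~1000 cores, the post-4.0 default of 100 multiplies badly -->
<indexConfig>
  <ramBufferSizeMB>8</ramBufferSizeMB>
</indexConfig>
```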
Re: Shared Stored Field
Erick Erickson wrote:
Well, you're constructing the URL somewhere, you can choose the right boost there can't you?

Yes, of course! As an example: we have one filter field called FILTER which can have unlimited values across all documents. Each document has on average 8 values set for FILTER (e.g. FILTER [1,2,...,8]). So we could add boost fields for each of these values, for example B_1:1.0, ..., B_7:5.0, and use those at query time. That is your suggestion, correct?

So each document has on average 8 of these dynamic fields, while over the whole index we have an unlimited number of these fields. What would this mean for performance?

--
View this message in context: http://lucene.472066.n3.nabble.com/Shared-Stored-Field-tp4130351p4130411.html
Sent from the Solr - User mailing list archive at Nabble.com.
RE: Were changes made to facetting on multivalued fields recently?
Here are the field definitions for both our old and new index... as you can see, they are identical. We've been using this chain and field type since Solr 1.4 and never had any problem. As for the documents, both indexes are using the same data source. They could be slightly out of sync from time to time, but we tend to index them on a daily basis. Both indexes are also using the same code (indexing through SolrJ) to index their content. The source is a column in MySQL that contains entries such as "4,1" that get stored in a multivalued field after replacing commas by spaces.

OLD (4.6.1):

<fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>
<field name="ad_job_type_id" type="text_ws" indexed="true" stored="true" required="false" multiValued="true"/>

NEW (4.7.1):

<fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>
<field name="ad_job_type_id" type="text_ws" indexed="true" stored="true" required="false" multiValued="true"/>

It looks like the /analysis/field handler is not active in our current setup. I will look into this and perform additional checks later, as we are currently doing a full reindex of our DB. Thanks for your time.

-----Original Message-----
From: Shawn Heisey [mailto:s...@elyograg.org]
Sent: April-09-14 5:23 PM
To: solr-user@lucene.apache.org
Subject: Re: Were changes made to facetting on multivalued fields recently?

On 4/9/2014 2:15 PM, Erick Erickson wrote:
Right, but the response in the doc when you make a request is almost, but not quite totally, unrelated to how facet values are tallied. It's all about what tokens are actually in your index, which you can see in the schema browser...

Supplement to what Erick has told you: SOLR-5512 seems to be related to facets using docValues. The commit for that issue looks like it only touches on that specifically. If you do not have (and never have had) docValues on this field, then SOLR-5512 should not apply. I am reasonably sure that for facets on fields with docValues, your facets would reflect the *stored* information, not the indexed information. Finally, I don't think that docValues work on fieldtypes whose class is solr.TextField, which is the only class that can have an analysis chain that would turn "4 5 1" into three separate tokens. The response that you shared where the value is "4 5 1" looks like there is only one value in the field -- so for that document, it is effectively the same as one that is single-valued.

Bottom line: it looks like either your analysis chain is working differently in the newer version, or you have documents in your newer index that are not in the older one. Can you share the field and fieldType definitions from both versions? Did your luceneMatchVersion change with the upgrade? If you are using DIH to populate your index, can you also share your DIH config?

Thanks,
Shawn
Re: Facet search and growing memory usage
On 4/9/2014 11:53 PM, Toke Eskildsen wrote:
This does not happen with the 'old' method 'facet.method=enum' - memory usage is stable and Solr is unbreakable with my hold-reload test. The memory allocation for enum is both low and independent of the number of unique values in the facets. The trade-off is that it is very slow for medium- to high-cardinality fields.

This is where it is extremely beneficial to have enough RAM to cache your entire index. The term list must be enumerated for every facet request, but if the data is already in the OS disk cache, this is very fast. If the operating system has to read the data off the disk, it will be *very* slow.

If facets are happening on lots of fields and are heavily utilized, facet.method=enum should be used, and there must be plenty of RAM to cache all or most of the index data on the machine. The default method (fc) will create the memory structure that Toke has mentioned for *every* field that gets used for facets. If there are only a few fields used for faceting and they have low cardinality, this is not a problem, and the speedup is usually worth the extra heap memory usage. With 40 facets, that is not supportable.

Thanks,
Shawn
Re: Shared Stored Field
So you're saying that you have B_1 - B_8 in one doc, B_9 - B_16 in another doc, etc.? What's so confusing is that in your first e-mail, you said:

bq: This denormalization grows the index size with a factor 100 in worse case.

Which I took to mean you have at most 100 of these fields.

Please look at the function query page I referenced and try a few things so we can deal with specific questions. You can put the results of a _query_ in a function query, so you could probably just form a sub-query that returns a score that you in turn use to boost the doc.

Best,
Erick
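One way the per-filter boost could be wired up, sketched with the hypothetical field names from this thread (FILTER, B_3): since the client builds the URL, it can pick the boost field that matches the active filter, for example via edismax's multiplicative boost parameter:

```
/solr/collection1/select?q=shoes
    &fq=FILTER:3
    &defType=edismax
    &boost=field(B_3)
```

The boost parameter multiplies each document's score by the value of the named function, so documents carrying a higher B_3 value rank higher whenever the FILTER:3 filter is active.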
Re: Were changes made to facetting on multivalued fields recently?
On 4/10/2014 9:14 AM, Jean-Sebastien Vachon wrote:
The source is a column in MySql that contains entries such as 4,1 that get stored in a Multivalued fields after replacing commas by spaces

Just so you know, there's nothing here that would require the field to be multivalued. WhitespaceTokenizerFactory does not create multiple field values; it creates multiple terms. If you are actually inserting multiple values for the field in SolrJ, then you would need a multivalued field.

What is replacing the commas with spaces? I don't see anything here that would do that. It sounds like that part of your indexing is not working.

Thanks,
Shawn
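A tiny illustration (plain Python, not SolrJ) of the two distinct steps being discussed: the SQL-side comma-to-space replacement produces a single value, and whitespace tokenization then yields multiple terms from that one value.

```python
# One stored value in, multiple indexed terms out: this is why the field
# facets as separate ids even though SolrJ only ever sent a single value.
raw = "4,1"                      # value as it sits in the MySQL column
stored = raw.replace(",", " ")   # what a SQL REPLACE() would emit
terms = stored.split()           # what a whitespace tokenizer produces
print(stored)  # 4 1
print(terms)   # ['4', '1']
```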
Re: Fails to index if unique field has special characters
Hi Ayush, I think the problem is the '!' in your key (the composite-id form is something like IBM!12345). The exclamation mark ('!') is significant here, as it delimits the prefix used to determine which shard to direct the document to. https://cwiki.apache.org/confluence/display/solr/Shards+and+Indexing+Data+in+SolrCloud

On Thursday, April 10, 2014 2:35 PM, Cool Techi cooltec...@outlook.com wrote:

Hi, we are migrating from Solr 4.6 standalone to Solr 4.7 cloud version, and while reindexing the documents we are getting the following error. This happens when the unique key has special characters; it was not noticed in version 4.6 standalone mode, so we are not sure if this is a version problem or a cloud issue. An example of the unique key is given below:

http://www.mynews.in/Blog/smrity!!**)))!miami_dolphins_vs_dallas_cowboys_live_stream_on_line_nfl_football_free_video_broadcast_B142707.html

Exception stack trace:

ERROR - 2014-04-10 10:51:44.361; org.apache.solr.common.SolrException; java.lang.ArrayIndexOutOfBoundsException: 2
	at org.apache.solr.common.cloud.CompositeIdRouter$KeyParser.getHash(CompositeIdRouter.java:296)
	at org.apache.solr.common.cloud.CompositeIdRouter.sliceHash(CompositeIdRouter.java:58)
	at org.apache.solr.common.cloud.HashBasedRouter.getTargetSlice(HashBasedRouter.java:33)
	at org.apache.solr.update.processor.DistributedUpdateProcessor.setupRequest(DistributedUpdateProcessor.java:218)
	at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:550)
	at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
	at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:247)
	at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:174)
	at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
	at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1916)
	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:780)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:427)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217)
	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
	at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
	at org.eclipse.jetty.server.session.SessionHandle

Thanks, Ayush
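A rough illustration (plain Python, not the actual CompositeIdRouter code) of why the extra '!' characters matter: the router splits the id on '!' to find a routing prefix, and a key containing more '!' parts than the one- or two-level composite scheme expects is consistent with the ArrayIndexOutOfBoundsException above.

```python
# Hypothetical sketch: text before '!' is treated as a shard-routing prefix.
def parts(doc_id):
    return doc_id.split("!")

print(parts("IBM!12345"))  # ['IBM', '12345'] -- two parts, routes fine
weird = "http://example.com/smrity!!**)))!page.html"
print(len(parts(weird)))   # 4 -- more parts than the composite-id scheme expects
```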
RE: Facet search and growing memory usage
Shawn Heisey [s...@elyograg.org] wrote:
On 4/9/2014 11:53 PM, Toke Eskildsen wrote:
The memory allocation for enum is both low and independent of the number of unique values in the facets. The trade-off is that it is very slow for medium- to high-cardinality fields.

This is where it is extremely beneficial to have enough RAM to cache your entire index. The term list must be enumerated for every facet request, but if the data is already in the OS disk cache, this is very fast.

Very fast compared to not cached, yes, but still slow compared to fc for high cardinality. The processing overhead per term is a great deal larger for enum. I recently ran some tests with Solr's different faceting methods for 50M+ values, but stopped measuring for enum as it took so much longer than the other methods -- and that was for a fully cached index.

If facets are happening on lots of fields and are heavily utilized, facet.method=enum should be used, and there must be plenty of RAM to cache all or most of the index data on the machine.

I do not understand how the number of facets has any influence on the choice between enum and fc. As Solr (sadly) does not support combined structures for multiple facets, each facet is independent from the others. Shouldn't the choice be made for each individual facet?

- Toke Eskildsen
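Toke's per-facet point maps onto Solr's per-field parameter convention: facet.method can be overridden per field with f.<fieldname>.facet.method, so the enum/fc trade-off can indeed be decided facet by facet (field names here are illustrative):

```
/solr/collection1/select?q=*:*&facet=true
    &facet.field=country&f.country.facet.method=enum
    &facet.field=title&f.title.facet.method=fc
```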
RE: Were changes made to facetting on multivalued fields recently?
The SQL query contains a REPLACE statement that does this.

-----Original Message-----
From: Shawn Heisey [mailto:s...@elyograg.org]
Sent: April-10-14 11:30 AM
To: solr-user@lucene.apache.org
Subject: Re: Were changes made to facetting on multivalued fields recently?

What is replacing the commas with spaces? I don't see anything here that would do that. It sounds like that part of your indexing is not working.
Re: Facet search and growing memory usage
fwiw, facets are much less heap-greedy when counted on docValues-enabled fields; they should not hit UnInvertedField in that case. Try them. On Thu, Apr 10, 2014 at 8:20 PM, Toke Eskildsen t...@statsbiblioteket.dk wrote: Shawn Heisey [s...@elyograg.org] wrote: On 4/9/2014 11:53 PM, Toke Eskildsen wrote: The memory allocation for enum is both low and independent of the number of unique values in the facets. The trade-off is that it is very slow for medium- to high-cardinality fields. This is where it is extremely beneficial to have enough RAM to cache your entire index. The term list must be enumerated for every facet request, but if the data is already in the OS disk cache, this is very fast. Very fast compared to not cached, yes, but still slow compared to fc, for high-cardinality. The processing overhead per term is a great deal larger for enum. I recently ran some tests with Solr's different faceting methods for 50M+ values, but stopped measuring for enum as it took so much longer than the other methods. For a fully cached index. If facets are happening on lots of fields and are heavily utilized, facet.method=enum should be used, and there must be plenty of RAM to cache all or most of the index data on the machine. I do not understand how the number of facets has any influence on the choice between enum and fc. As Solr (sadly) does not support combined structures for multiple facets, each facet is independent from the others. Shouldn't the choice be done for each individual facet? - Toke Eskildsen -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
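Mikhail's docValues suggestion amounts to a schema change. A minimal sketch for a Solr 4.x schema (the field name and type here are illustrative, not from this thread) might look like:

```xml
<!-- Enabling docValues lets facet counting use on-disk column data
     instead of building an UnInvertedField on the Java heap.
     A full reindex is required after this change. -->
<field name="category" type="string" indexed="true" stored="true"
       multiValued="true" docValues="true"/>
```

Note that in Solr 4.x only certain field types (string, trie numerics, and similar) support docValues, so this is a sketch rather than a drop-in fix for an arbitrary field.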
Another japanese analysis problem
My analysis chain includes CJKBigramFilter on both the index and query side. I have outputUnigrams enabled on the index side, but it is disabled on the query side. This has resulted in a problem with phrase queries. This is a subset of my index analysis for the three terms you can see in the ICUNF step, separated by spaces: https://www.dropbox.com/s/9q1x9pdbsjhzocg/bigram-position-problem.png Note that in the CJKBF step, the second unigram is output at position 2, pushing the English terms to 3 and 4. When the customer runs a phrase query (lucene query parser) for the first two terms on this specific field, it doesn't match, because the query analysis doesn't output the unigrams and therefore the positions don't match. I would have expected both unigrams to be at position 1. Is this a bug or expected behavior? Thanks, Shawn
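The asymmetric configuration described above might look roughly like this in the schema. This is a sketch only: the thread mentions an ICU normalization step that is not reproduced here, and the tokenizer choice is an assumption:

```xml
<fieldType name="text_cjk" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- index side emits both bigrams and unigrams -->
    <filter class="solr.CJKBigramFilterFactory" outputUnigrams="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- query side emits bigrams only -->
    <filter class="solr.CJKBigramFilterFactory" outputUnigrams="false"/>
  </analyzer>
</fieldType>
```

The position mismatch in the question arises precisely because the two analyzers emit different numbers of tokens (and therefore different positions) for the same input.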
Re: Were changes made to facetting on multivalued fields recently?
bq: The SQL query contains a Replace statement that does this Well, I suspect that's where the issue is. The facet values being reported include: <int name="4,1">134826</int> which indicates that the incoming text to Solr still has the commas. Solr is seeing the commas and all. You can cure this by using PatternReplaceCharFilterFactory and doing the substitution at index time if you want to. That doesn't clarify why the behavior has changed though, but my supposition is that it has nothing to do with Solr, and something about your SQL statement is different. Best, Erick On Thu, Apr 10, 2014 at 9:33 AM, Jean-Sebastien Vachon jean-sebastien.vac...@wantedanalytics.com wrote: The SQL query contains a Replace statement that does this -Original Message- From: Shawn Heisey [mailto:s...@elyograg.org] Sent: April-10-14 11:30 AM To: solr-user@lucene.apache.org Subject: Re: Were changes made to facetting on multivalued fields recently? On 4/10/2014 9:14 AM, Jean-Sebastien Vachon wrote: Here are the field definitions for both our old and new index... as you can see they are identical. We've been using this chain and field type starting with Solr 1.4 and never had any problem. As for the documents, both indexes are using the same data source. They could be slightly out of sync from time to time but we tend to index them on a daily basis. Both indexes are also using the same code (indexing through SolrJ) to index their content. The source is a column in MySql that contains entries such as "4,1" that get stored in a multivalued field after replacing commas by spaces OLD (4.6.1): <fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100"> <analyzer> <tokenizer class="solr.WhitespaceTokenizerFactory"/> </analyzer> </fieldType> <field name="ad_job_type_id" type="text_ws" indexed="true" stored="true" required="false" multiValued="true" /> Just so you know, there's nothing here that would require the field to be multivalued.
WhitespaceTokenizerFactory does not create multiple field values, it creates multiple terms. If you are actually inserting multiple values for the field in SolrJ, then you would need a multivalued field. What is replacing the commas with spaces? I don't see anything here that would do that. It sounds like that part of your indexing is not working. Thanks, Shawn
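Erick's PatternReplaceCharFilterFactory suggestion would look roughly like this in the schema, reusing the thread's text_ws type. The pattern is an assumption based on the comma-separated "4,1" values described above:

```xml
<fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- rewrite commas to spaces before tokenizing,
         so "4,1" produces the terms "4" and "1" -->
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="," replacement=" "/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>
```

Faceting runs on indexed terms, so with this in place the facet values would come back without commas; a reindex is required for existing documents.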
filter capabilities are limited?
hi, is it possible to compare two fields in a Solr filter? What is the best way to build a filter like x - y = 0, i.e. to get all records where x = y? -- View this message in context: http://lucene.472066.n3.nabble.com/filter-capabilities-are-limited-tp4130458.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: filter capabilities are limited?
Where are the values coming from? You might be able to use the _val_ hook for function queries if they're in fields in the doc. Or if it's a constant you pass in... Let's claim it's just a value in your document. Can't you just form a filter query on it? Details matter; there's not enough info here to say anything definitive. Best, Erick On Thu, Apr 10, 2014 at 10:56 AM, horot roman.she...@gmail.com wrote: hi whether it is possible to compare two variables in the Solr filter? As best to build a filter: x - y = 0, i.e. get all records if x = y -- View this message in context: http://lucene.472066.n3.nabble.com/filter-capabilities-are-limited-tp4130458.html Sent from the Solr - User mailing list archive at Nabble.com.
Relevance/Rank
Hi, I am looking at boosting to see if I can achieve ranking equal to MS SQL Server. I have a query something like fq=(SKU:123-87458) OR Name:123-87458. I need to get the exact match (in this case SKU) first in the results, but I may also want to display Name matches in the list which are not exact matches but where the value can be found somewhere in the Name. In simple terms, for some customers I want to rank SKU as 1 and Name as 2 in the results, and for other customers rank Name as 1 and SKU as 2. Is this possible? I tried boosting but it seems it is for text; correct me if I am wrong in my understanding, and any example will be really appreciated. I am getting confused after going through different sites. Thanks
Re: filter capabilities are limited?
Values come from the Solr doc. I can not get to compare the two fields to get some result. The logic of such a query: x <> '' and y <> '' and x = y. It's something like q=x:* AND y:* AND x:y, but the problem is that the fields can not be compared in direct form. If someone knows how to solve this problem please write examples. -- View this message in context: http://lucene.472066.n3.nabble.com/filter-capabilities-are-limited-tp4130458p4130472.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Relevance/Rank
What kind of field is Name? Assuming it's string, you should be able to boost it. Boosts are not relevant to filters (fq) clauses at all, where were you trying to add the boost? You need to provide significantly more information to get a more helpful answer. You might review: http://wiki.apache.org/solr/UsingMailingLists bq: I am getting confused after going thru different sites. You're in luck, the Solr Reference Guide (https://cwiki.apache.org/confluence/display/solr/Apache+Solr+Reference+Guide) is becoming the collected source of information. Also, here's a sorely-needed set of up-to-date info: http://www.manning.com/grainger/, and Jack Krupansky is publishing an e-book here: http://www.lulu.com/spotlight/JackKrupansky (this is the last link I have, there may be more recent copies). Best, Erick On Thu, Apr 10, 2014 at 11:49 AM, EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions) external.ravi.tamin...@us.bosch.com wrote: Hi I am looking boosting to help if I can achieve the Rank equal to MS SQL Server. I have a query something like fq=(SKU:123-87458) OR Name: 123-87458 I need to get the Exact Match as first in the results, In this case SKU. But also I can change to display Name in the List which is not exact match but match , the value can be find some where in the Name? In Simple I can Rank SKU as 1 and Name as 2 for some customer and some customer Rank Name as 1 and SKU as 2 in the results. Is this Possible , I tried Boosting but that it seems it is for Text, correct me if I am wrong on understanding and any example will be really appreciated. I am getting confused after going thru different sites. Thanks
Re: Relevance/Rank
On 4/10/2014 12:49 PM, EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions) wrote: Hi I am looking boosting to help if I can achieve the Rank equal to MS SQL Server. I have a query something like fq=(SKU:123-87458) OR Name: 123-87458 I need to get the Exact Match as first in the results, In this case SKU. But also I can change to display Name in the List which is not exact match but match , the value can be find some where in the Name? In Simple I can Rank SKU as 1 and Name as 2 for some customer and some customer Rank Name as 1 and SKU as 2 in the results. Is this Possible , I tried Boosting but that it seems it is for Text, correct me if I am wrong on understanding and any example will be really appreciated. I am getting confused after going thru different sites. Your query is being sent with the fq parameter. Filter queries do not affect scoring at all. They are purely for filtering. You would need to move this to the q parameter (query) in order for what's there to affect relevancy ranking. You will very likely want to look over this: https://wiki.apache.org/solr/SolrRelevancyFAQ Thanks, Shawn
Re: filter capabilities are limited?
Uhhhm, did you look at function queries at all? That should work for you. You might want to review: http://wiki.apache.org/solr/UsingMailingLists Best, Erick On Thu, Apr 10, 2014 at 11:51 AM, horot roman.she...@gmail.com wrote: Values come from the Solr doc. I can not get to compare the two fields to get some result. The logic of such a query: x '' and y '' and x = y. it's something like q=x:* AND y:* AND x:y but the problem is that the field can not be compared direct form. If someone knows how to solve this problem please write examples. -- View this message in context: http://lucene.472066.n3.nabble.com/filter-capabilities-are-limited-tp4130458p4130472.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Relevance/Rank
Eric, Below is the query part select?q=*:*&fq={!join%20from=SKU%20to=SKU%20fromIndex=Collection2}(CatalogName:*Products)&fq=(SKU:204-161)%20OR%20(Name:%22204-161%22)&bq=Name:%22204-161%22^2 I am not getting the Name match record first in the list; I am always getting the SKU matching record. Any help is really appreciated. Thanks Ravi -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Thursday, April 10, 2014 3:35 PM To: solr-user@lucene.apache.org Subject: Re: Relevance/Rank What kind of field is Name? Assuming it's string, you should be able to boost it. Boosts are not relevant to filters (fq) clauses at all, where were you trying to add the boost? You need to provide significantly more information to get a more helpful answer. You might review: http://wiki.apache.org/solr/UsingMailingLists bq: I am getting confused after going thru different sites. You're in luck, the Solr Reference Guide (https://cwiki.apache.org/confluence/display/solr/Apache+Solr+Reference+Guide) is becoming the collected source of information. Also, here's a sorely-needed set of up-to-date info: http://www.manning.com/grainger/, and Jack Krupansky is publishing an e-book here: http://www.lulu.com/spotlight/JackKrupansky (this is the last link I have, there may be more recent copies). Best, Erick On Thu, Apr 10, 2014 at 11:49 AM, EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions) external.ravi.tamin...@us.bosch.com wrote: Hi I am looking boosting to help if I can achieve the Rank equal to MS SQL Server. I have a query something like fq=(SKU:123-87458) OR Name: 123-87458 I need to get the Exact Match as first in the results, In this case SKU. But also I can change to display Name in the List which is not exact match but match , the value can be find some where in the Name? In Simple I can Rank SKU as 1 and Name as 2 for some customer and some customer Rank Name as 1 and SKU as 2 in the results.
Is this Possible , I tried Boosting but that it seems it is for Text, correct me if I am wrong on understanding and any example will be really appreciated. I am getting confused after going thru different sites. Thanks
Re: Range query and join, parse exception when parens are added
On 4/8/2014 22:00 GMT Shawn Heisey wrote: On 4/8/2014 1:48 PM, Mark Olsen wrote: Solr version 4.2.1 I'm having an issue using a join query with a range query, but only when the query is wrapped in parens. This query works: {!join from=member_profile_doc_id to=id}language_proficiency_id_number:[30 TO 50] However this query does not (just wrapping with parens): ({!join from=member_profile_doc_id to=id}language_proficiency_id_number:[30 TO 50]) The {!join...} part of that is a localParam. It must precede the entire query. If you want to add parens, here's how you would need to do it: {!join from=member_profile_doc_id to=id}(language_proficiency_id_number:[30 TO 50]) With the left parenthesis where you placed it, the localParam is considered part of the query itself. It becomes incorrect Solr syntax at that point. Thanks, Shawn Shawn, Thank you for the response and I apologize for the delayed reply. If I do the query with the localParam as you show it works, however as my queries become more complex with multiple terms and fields I'm finding it difficult to get the join localParam to work. For example, when I was first playing with the above range query, I was able to get this to work with nested localParams within the parentheses: +({!join from=member_profile_doc_id to=id}language_noun:english ({!join from=member_profile_doc_id to=id}language_proficiency_id_number:30 {!join from=member_profile_doc_id to=id}language_proficiency_id_number:40 {!join from=member_profile_doc_id to=id}language_proficiency_id_number:50)) This would work, but as soon as I used the range query with the square brackets it would fail, hence why I thought it was an issue. As I've been trying queries it seemed to me that I need to have the localParam before each field:term pair to get results. Another example, this first query yields the results that I am expecting. Note the parentheses wrapping each localParam and field:term pair.
In this case there are two documents that both have the same member_profile_doc_id that are matching the two queries (language and certification). +({!join from=member_profile_doc_id to=id}language_noun:english) +({!join from=member_profile_doc_id to=id}certification_authority_id_number:50) If I move the localParams outside and no longer declare them with each field:term pair then no results are returned, example query: {!join from=member_profile_doc_id to=id}(+language_noun:english +certification_authority_id_number:50) Unfortunately the documentation for the join localParams only gives a simple field:term example so I've had to experiment with more complex queries to figure out how to get it to work. `Mark
Solr Admin core status - Index is not Current
Hi there I am using solrcloud (4.3). I am trying to get the status of a core from solr using (localhost:8000/solr/admin/cores?action=STATUS&core=core) and i get the following output <int name="numDocs">100</int> <int name="maxDoc">102</int> <int name="deletedDocs">2</int> <long name="version">20527</long> <int name="segmentCount">20</int> *<bool name="current">false</bool>* What does current mean? A few of the cores are optimized (with segment count 1) and show current = true and the rest show current as false. If i have to make the core current, what should i do? Is it a big alarm if the value is false? -- Best -- C
Re: MapReduceIndexerTool does not respect Lucene version in solrconfig Was: converting 4.7 index to 4.3.1
There’s no such other location in there. BTW, you can disable the mtree merge via --reducers=-2 (or --reducers=0 in old versions). Wolfgang. On Apr 10, 2014, at 3:44 PM, Dmitry Kan solrexp...@gmail.com wrote: a correction: actually when I tested the above change I had so little data, that it didn't trigger sub-shard slicing and thus merging of the slices. Still, looks as if somewhere in the map-reduce contrib code there is a link to what lucene version to use. Wolfgang, do you happen to know where that other Version.* is specified? On Thu, Apr 10, 2014 at 12:59 PM, Dmitry Kan solrexp...@gmail.com wrote: Thanks for responding, Wolfgang. Changing to LUCENE_43: IndexWriterConfig writerConfig = new IndexWriterConfig(Version.LUCENE_43, null); didn't affect the index format version because, I believe, if the format of the index to merge has been of a higher version (4.1 in this case), it will merge to the same and not a lower version (4.0). But the format version certainly could be read from the solrconfig, you are right. Dmitry On Wed, Apr 9, 2014 at 11:51 PM, Wolfgang Hoschek whosc...@cloudera.com wrote: There is a current limitation in that the code doesn't actually look into solrconfig.xml for the version. We should fix this, indeed. See https://github.com/apache/lucene-solr/blob/trunk/solr/contrib/map-reduce/src/java/org/apache/solr/hadoop/TreeMergeOutputFormat.java#L100-101 Wolfgang. On Apr 8, 2014, at 11:49 AM, Dmitry Kan solrexp...@gmail.com wrote: Hello, When we instantiate the MapReduceIndexerTool with the collection's conf directory, we expect that the Lucene version is respected and the index gets generated in a format compatible with the defined version. This does not seem to happen, however. Checking with luke: the expected Lucene index format: Lucene 4.0 the output Lucene index format: Lucene 4.1 Can anybody shed some light onto the semantics behind specifying the Lucene version in this context?
Does this have something to do with what version of solr core is used by the morphline library? Thanks, Dmitry -- Forwarded message -- Dear list, We have been generating solr indices with the solr-hadoop contrib module (SOLR-1301). Our current solr in use is of 4.3.1 version. Is there any tool that could do the backward conversion, i.e. 4.7 -> 4.3.1? Or is the upgrade the only way to go? -- Dmitry Blog: http://dmitrykan.blogspot.com Twitter: http://twitter.com/dmitrykan
Re: filter capabilities are limited?
It sounds like you can make it work with the frange qparser plugin: fq={!frange l=0 u=0}sub(field(a),field(b)) Joel Bernstein Search Engineer at Heliosearch On Thu, Apr 10, 2014 at 3:36 PM, Erick Erickson erickerick...@gmail.comwrote: Uhhhm, did you look at function queries at all? That should work for you. You might want to review: http://wiki.apache.org/solr/UsingMailingLists Best, Erick On Thu, Apr 10, 2014 at 11:51 AM, horot roman.she...@gmail.com wrote: Values come from the Solr doc. I can not get to compare the two fields to get some result. The logic of such a query: x '' and y '' and x = y. it's something like q=x:* AND y:* AND x:y but the problem is that the field can not be compared direct form. If someone knows how to solve this problem please write examples. -- View this message in context: http://lucene.472066.n3.nabble.com/filter-capabilities-are-limited-tp4130458p4130472.html Sent from the Solr - User mailing list archive at Nabble.com.
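Joel's frange filter can be sketched as a full request. The host, collection, and field names a and b are illustrative; both fields must be single-valued and numeric for the function to evaluate per document:

```text
http://localhost:8983/solr/collection1/select
    ?q=*:*
    &fq={!frange l=0 u=0}sub(field(a),field(b))
```

sub(a,b) computes a - b for each document; constraining that value to the range [0, 0] with the lower bound l and upper bound u keeps only documents where a = b.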
Re: Nested documents, block join - re-indexing a single document upon update
On Sun, Mar 16, 2014 at 2:47 PM, danny teichthal dannyt...@gmail.com wrote: To make things short, I would like to use block joins, but to be able to index each document of the block separately. Is it possible? No way. Use query-time {!join}, or denormalize and then use field collapsing. -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
Collections Design
All, What is the best practice or guideline towards considering multiple collections particularly in the solr cloud env? Thanks Srikanth
Re: Range query and join, parse exception when parens are added
Mark: first off, the details matter. Nothing in your first email made it clear that the {!join} query you were referring to was not the entirety of your query param -- which is part of the confusion and was a significant piece of Shawn's answer. Had you posted the *exact* request you were sending (with all params) and the full response you got, the root cause of your problem would have been a lot more obvious to some folks very quickly. As Shawn mentioned, the {!...} syntax involves localparams and invoking a named parser -- normally this syntax is the *first* thing in a query string, and causes the *entire* string to be parsed by that parser. This is why something like this should work fine for you (please confirm if it does not)... q={!join from=member_profile_doc_id to=id}language_proficiency_id_number:[30 TO 50] ...because the {! at the beginning of the param value tells solr to let that parser (in this case join) parse the entire string, using the localparam options specified (from=member_profile_doc_id to=id). The join parser then delegates to the lucene parser for its body (language_proficiency_id_number:[30 TO 50]) However ... when the {! syntax is not the first thing in the param value, what happens is that the default parser (lucene) is used -- the lucene parser has a handy feature that supports looking for the {! syntax nested inside of it ... which means you can do things like this... q=+(+{!prefix f=foo}bar -{!term f=yak}wak) -aaa:zzz ...however there is an important caveat to this: when the lucene parser is looking for the {! syntax to know when you want it to delegate to another parser, how does it know if/when you intended for the input to that nested parser to end?
Specifically, in the example above, how does it know if the input to the prefix parser was meant to be bar or bar -{!term f=yak}wak) -aaa:zzz The answer is that it's conservative and assumes the input to the nested parser stops as soon as it sees something that looks like the end of the current clause: whitespace or an open/close paren for example. Which brings us to your specific example... : +({!join from=member_profile_doc_id to=id}language_noun:english : ({!join from=member_profile_doc_id to=id}language_proficiency_id_number:30 : {!join from=member_profile_doc_id to=id}language_proficiency_id_number:40 : {!join from=member_profile_doc_id to=id}language_proficiency_id_number:50)) ...in this case, when the lucene parser sees the nested {!join...} parsers it has no problem, because its conservative rules about the end of the clauses match up with what you expect given the simple term queries. If you change those individual term queries to a range query however... +({!join from=member_profile_doc_id to=id}language_noun:english {!join from=member_profile_doc_id to=id}language_proficiency_id_number:[30 TO 50]) ...in this case, the lucene parser sees the {!join} syntax and delegates to the join parser, but it assumes only the language_proficiency_id_number:[30 portion of the input is meant for that parser, and hangs on to the TO 50] to parse as additional clauses. The join parser doesn't really care about the input, but when it delegates to another nested instance of the lucene parser, the input language_proficiency_id_number:[30 isn't valid because it's the start of an unfinished range query. Does that make sense so far? As for the solution: when you use the {!foo} syntax, the local param v can be used to specify the main input to the nested parser instead of the usual prefix-ish syntax -- and this scopes the input unambiguously...
+({!join from=member_profile_doc_id to=id v='language_noun:english'} {!join from=member_profile_doc_id to=id v='language_proficiency_id_number:[30 TO 50]'}) FWIW, you can also use param dereferencing if it helps make things easier to read for you (and/or if you need to include nested quotes and don't want to deal with the escaping)... q=+({!join from=$from to=id v=$noun} {!join from=$from to=id v=$prof}) &from=member_profile_doc_id &noun=language_noun:english &prof=language_proficiency_id_str:[thirty three TO fifty] -Hoss http://www.lucidworks.com/
Re: Range query and join, parse exception when parens are added
Chris, Thank you for the detailed explanation, this helps a lot. One of my current hurdles is that my search system is in Java, using Lucene Query objects to construct a BooleanQuery which is then handed to Solr. Since Lucene does not know about the LocalParams it's tricky to get them to play properly when dealing with complex queries. My first solution was to prefix the LocalParams to the field name, which worked fine until I ran the range query. Changing to use the v= field of a LocalParam would work from the query structure perspective, however getting that into a Lucene Query object will be a fun exercise. `Mark - Original Message - From: Chris Hostetter hossman_luc...@fucit.org To: solr-user@lucene.apache.org Sent: Thursday, April 10, 2014 4:33:19 PM Subject: Re: Range query and join, parse exception when parens are added Mark: first off, the details matter. Nothing in your first email made it clear that the {!join} query you were referring to was not the entirety of your query param -- which is part of the confusion and was a significant piece of Shawn's answer. Had you posted the *exact* request you were sending (with all params) and the full response you got, the root cause of your problem would have been a lot more obvious to some folks very quickly. As Shawn mentioned, the {!...} syntax involves localparams and invoking a named parser -- normally this syntax is the *first* thing in a query string, and causes the *entire* string to be parsed by that parser. This is why something like this should work fine for you (please confirm if it does not)... q={!join from=member_profile_doc_id to=id}language_proficiency_id_number:[30 TO 50] ...because the {! at the beginning of the param value tells solr to let that parser (in this case join) parse the entire string, using the localparam options specified (from=member_profile_doc_id to=id). The join parser then delegates to the lucene parser for its body (language_proficiency_id_number:[30 TO 50]) However ... when the {! syntax is not the first thing in the param value, what happens is that the default parser (lucene) is used -- the lucene parser has a handy feature that supports looking for the {! syntax nested inside of it ... which means you can do things like this... q=+(+{!prefix f=foo}bar -{!term f=yak}wak) -aaa:zzz ...however there is an important caveat to this: when the lucene parser is looking for the {! syntax to know when you want it to delegate to another parser, how does it know if/when you intended for the input to that nested parser to end? Specifically, in the example above, how does it know if the input to the prefix parser was meant to be bar or bar -{!term f=yak}wak) -aaa:zzz The answer is that it's conservative and assumes the input to the nested parser stops as soon as it sees something that looks like the end of the current clause: whitespace or an open/close paren for example. Which brings us to your specific example... : +({!join from=member_profile_doc_id to=id}language_noun:english : ({!join from=member_profile_doc_id to=id}language_proficiency_id_number:30 : {!join from=member_profile_doc_id to=id}language_proficiency_id_number:40 : {!join from=member_profile_doc_id to=id}language_proficiency_id_number:50)) ...in this case, when the lucene parser sees the nested {!join...} parsers it has no problem, because its conservative rules about the end of the clauses match up with what you expect given the simple term queries. If you change those individual term queries to a range query however... +({!join from=member_profile_doc_id to=id}language_noun:english {!join from=member_profile_doc_id to=id}language_proficiency_id_number:[30 TO 50]) ...in this case, the lucene parser sees the {!join} syntax and delegates to the join parser, but it assumes only the language_proficiency_id_number:[30 portion of the input is meant for that parser, and hangs on to the TO 50] to parse as additional clauses. The join parser doesn't really care about the input, but when it delegates to another nested instance of the lucene parser, the input language_proficiency_id_number:[30 isn't valid because it's the start of an unfinished range query. Does that make sense so far? As for the solution: when you use the {!foo} syntax, the local param v can be used to specify the main input to the nested parser instead of the usual prefix-ish syntax -- and this scopes the input unambiguously... +({!join from=member_profile_doc_id to=id v='language_noun:english'} {!join from=member_profile_doc_id to=id v='language_proficiency_id_number:[30 TO 50]'}) FWIW, you can also use param dereferencing if it helps make things easier to read for you (and/or if you need to include nested quotes and don't want to deal with the escaping)... q=+({!join from=$from to=id v=$noun} {!join from=$from to=id v=$prof}) &from=member_profile_doc_id &noun=language_noun:english
Re: best way to contribute solr??
any help related to my previous mail update?? On Thu, Apr 10, 2014 at 7:52 PM, Aman Tandon amantandon...@gmail.com wrote: thanks sir, i always smile when people here are always ready to help, i am thankful to all, and yes i started learning by reading daily at least 50-60 mails to increase my knowledge; i give my suggestion if i am familiar with the topic, and people here correct me as well if i am wrong. I know it will take time but someday i will contribute as well, and thanks for the setup, it will be quite helpful. In my office i am using solr 4.2 with tomcat; right now i am stuck because i don't know how to integrate solr 4.7 with my tomcat, because the problem for me is that i am familiar with the cores architecture of solr 4.2 in which we defined every core name as well as instanceDir, but not with solr 4.7. Thanks Aman Tandon On Thu, Apr 10, 2014 at 7:31 PM, Erick Erickson erickerick...@gmail.com wrote: Aman: Here's another helpful resource: http://wiki.apache.org/solr/HowToContribute It tells you how to get the source code, set up an IDE etc. for Solr/Lucene In addition to Alexandre's suggestions, one possibility (but I warn you it can be challenging) is to create unit tests. Part of the nightly build report includes coverage; you can get to the latest build here: https://wiki.apache.org/solr/NightlyBuilds click on clover test coverage and pick something, track down what isn't covered (see the clover report link for instance). Warning: You will be completely lost for a while. This is hard stuff when you're just starting out especially. So choose the simplest thing you can for the first go to get familiar with the process if you want to try this. Another place to start is...the user's list. Pick one question a day, research it and try to provide an answer. Clearly label your responses with the degree of certainty you have.
Another caution: you'll research something and get back to the list to discover it's already been answered sometimes, but you'll have gained the knowledge and it gets better over time. Best, Erick On Thu, Apr 10, 2014 at 12:03 AM, Aman Tandon amantandon...@gmail.com wrote: Thanks sir, I will look into this. Solr and its developers are all helpful and awesome, i am feeling great. Thanks Aman Tandon On Thu, Apr 10, 2014 at 12:29 PM, Alexandre Rafalovitch arafa...@gmail.com wrote: Sure, you can do it in Java too. The difference is that Solr comes with the Java client SolrJ, which is tested and kept up-to-date. But there could still be more tutorials. For other languages/clients, there is a lot less information available. Especially, if you start adding (human) languages into it. E.g. how to process your own language (if non-English). And there are many more ideas on Slide 26 of http://www.slideshare.net/arafalov/introduction-to-solr-from-bangkok-meetup . As well as an example of a processing pipeline for Thai. More of these kinds of things would be useful too. Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency On Thu, Apr 10, 2014 at 1:52 PM, Aman Tandon amantandon...@gmail.com wrote: Thank you so much sir :) Can i try in java as well? Thanks Aman Tandon On Thu, Apr 10, 2014 at 12:15 PM, Alexandre Rafalovitch arafa...@gmail.com wrote: Great, Solr + Perl + Geospatial. There are two Perl clients for Solr listed on the Wiki: http://wiki.apache.org/solr/IntegratingSolr . Are there any more? If yes, add them to the Wiki (need to ask permission to edit Wiki). Are those two listed clients dead or alive? Do they work with Solr 4.7.1? Can you make them work with Solr 4.7.1 and a recent version of Perl? Can you do a small demo that uses a Perl client to index some geospatial information and then do a search for it? I strongly suspect you will hit some interesting issues.
Find the fix, contribute it back to the Perl library maintainer. Or, at least, clearly describe the issue, if you don't yet know enough to contribute the fix. Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency On Thu, Apr 10, 2014 at 1:04 PM, Aman Tandon amantandon...@gmail.com wrote: Okay sir, I will mail to solr-user only. I am feeling so thankful to you for all your help. I am a Java developer with a good knowledge of Perl, working on Solr; actually, I just started working on Solr for geospatial search (not using JTS) only. To be very frank, I learned about faceting from Mr Yonik's tutorial, geospatial (not JTS), indexing, searching and boosting. That's all. What is your suggestion now? And yesterday I subscribed to solr-start as well. And sir, what do you mean by *Create a basic
Re: Relevance/Rank
What Shawn said. q=*:* is a constant-score query, i.e. every match has a score of 1.0. fq clauses don't contribute to the score. The boosts you're specifying have absolutely no effect. Move the fq clause to your main query (q=) to see any effect. Try adding debug=all to your query and look at the explanation of how the score is calculated; I suspect you'll find them all 1.0. Best, Erick On Thu, Apr 10, 2014 at 12:53 PM, EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions) external.ravi.tamin...@us.bosch.com wrote: Erick, Below is the query part: select?q=*:*&fq={!join%20from=SKU%20to=SKU%20fromIndex=Collection2}(CatalogName:*Products)&fq=(SKU:204-161)%20OR%20(Name:%22204-161%22)&bq=Name:%22204-161%22^2 I am not getting the Name-match record first in the list; I am always getting the SKU-matching record. Any help is really appreciated. Thanks Ravi -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Thursday, April 10, 2014 3:35 PM To: solr-user@lucene.apache.org Subject: Re: Relevance/Rank What kind of field is Name? Assuming it's string, you should be able to boost it. Boosts are not relevant to filter (fq) clauses at all; where were you trying to add the boost? You need to provide significantly more information to get a more helpful answer. You might review: http://wiki.apache.org/solr/UsingMailingLists bq: I am getting confused after going thru different sites. You're in luck, the Solr Reference Guide (https://cwiki.apache.org/confluence/display/solr/Apache+Solr+Reference+Guide) is becoming the collected source of information. Also, here's a sorely-needed set of up-to-date info: http://www.manning.com/grainger/, and Jack Krupansky is publishing an e-book here: http://www.lulu.com/spotlight/JackKrupansky (this is the last link I have, there may be more recent copies). 
Best, Erick On Thu, Apr 10, 2014 at 11:49 AM, EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions) external.ravi.tamin...@us.bosch.com wrote: Hi, I am looking at boosting to see if I can achieve ranking equal to MS SQL Server. I have a query something like fq=(SKU:123-87458) OR Name:123-87458 and I need to get the exact match (in this case SKU) first in the results. But I may also want to display Name matches in the list which are not exact matches, where the value can be found somewhere in the Name. Simply put, for some customers I rank SKU as 1 and Name as 2, and for other customers Name as 1 and SKU as 2 in the results. Is this possible? I tried boosting, but it seems it is for text; correct me if I am wrong in my understanding, and any example will be really appreciated. I am getting confused after going through different sites. Thanks
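As a sketch of the rework Erick describes (the join and catalog clauses are carried over from Ravi's query; the exact boosts are assumptions), the scoring clause moves into q= while the non-scoring restriction stays in fq=:

```
select?q=(SKU:204-161)^2 OR Name:"204-161"
  &fq={!join from=SKU to=SKU fromIndex=Collection2}CatalogName:*Products
  &debug=all
```

Swapping the boosts (Name:"204-161"^2 OR SKU:204-161) would flip which match ranks first, which is the per-customer behavior Ravi asked about; debug=all shows the score explanation for each returned document.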
Re: Relevance/Rank
Hi Ravi, For a better analysis of document ranking, query the index with these extra parameters, e.g. whole_query&debug=true&wt=xml. Copy that XML and paste it into http://explain.solr.pl/; you can then easily see the ranking analysis in the form of pie charts, showing how much weight is given to each parameter in your Solr config and in the query. On Fri, Apr 11, 2014 at 5:56 AM, Erick Erickson erickerick...@gmail.com wrote: What Shawn said. q=*:* is a constant-score query, i.e. every match has a score of 1.0. fq clauses don't contribute to the score. The boosts you're specifying have absolutely no effect. Move the fq clause to your main query (q=) to see any effect. Try adding debug=all to your query and look at the explanation of how the score is calculated; I suspect you'll find them all 1.0. Best, Erick On Thu, Apr 10, 2014 at 12:53 PM, EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions) external.ravi.tamin...@us.bosch.com wrote: Erick, Below is the query part: select?q=*:*&fq={!join%20from=SKU%20to=SKU%20fromIndex=Collection2}(CatalogName:*Products)&fq=(SKU:204-161)%20OR%20(Name:%22204-161%22)&bq=Name:%22204-161%22^2 I am not getting the Name-match record first in the list; I am always getting the SKU-matching record. Any help is really appreciated. Thanks Ravi -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Thursday, April 10, 2014 3:35 PM To: solr-user@lucene.apache.org Subject: Re: Relevance/Rank What kind of field is Name? Assuming it's string, you should be able to boost it. Boosts are not relevant to filter (fq) clauses at all; where were you trying to add the boost? You need to provide significantly more information to get a more helpful answer. You might review: http://wiki.apache.org/solr/UsingMailingLists bq: I am getting confused after going thru different sites. 
You're in luck, the Solr Reference Guide ( https://cwiki.apache.org/confluence/display/solr/Apache+Solr+Reference+Guide ) is becoming the collected source of information. Also, here's a sorely-needed set of up-to-date info: http://www.manning.com/grainger/, and Jack Krupansky is publishing an e-book here: http://www.lulu.com/spotlight/JackKrupansky (this is the last link I have, there may be more recent copies). Best, Erick On Thu, Apr 10, 2014 at 11:49 AM, EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions) external.ravi.tamin...@us.bosch.com wrote: Hi I am looking boosting to help if I can achieve the Rank equal to MS SQL Server. I have a query something like fq=(SKU:123-87458) OR Name: 123-87458 I need to get the Exact Match as first in the results, In this case SKU. But also I can change to display Name in the List which is not exact match but match , the value can be find some where in the Name? In Simple I can Rank SKU as 1 and Name as 2 for some customer and some customer Rank Name as 1 and SKU as 2 in the results. Is this Possible , I tried Boosting but that it seems it is for Text, correct me if I am wrong on understanding and any example will be really appreciated. I am getting confused after going thru different sites. Thanks -- With Regards Aman Tandon
Re: Relevance/Rank
Hello Erick, I am confused here: how will the boost not have an effect if he is boosting Name by 2? He is filtering the results and then applying the boost. On Fri, Apr 11, 2014 at 6:12 AM, Aman Tandon amantandon...@gmail.com wrote: Hi Ravi, For a better analysis of document ranking, query the index with these extra parameters, e.g. whole_query&debug=true&wt=xml. Copy that XML and paste it into http://explain.solr.pl/; you can then easily see the ranking analysis in the form of pie charts, showing how much weight is given to each parameter in your Solr config and in the query. On Fri, Apr 11, 2014 at 5:56 AM, Erick Erickson erickerick...@gmail.com wrote: What Shawn said. q=*:* is a constant-score query, i.e. every match has a score of 1.0. fq clauses don't contribute to the score. The boosts you're specifying have absolutely no effect. Move the fq clause to your main query (q=) to see any effect. Try adding debug=all to your query and look at the explanation of how the score is calculated; I suspect you'll find them all 1.0. Best, Erick On Thu, Apr 10, 2014 at 12:53 PM, EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions) external.ravi.tamin...@us.bosch.com wrote: Erick, Below is the query part: select?q=*:*&fq={!join%20from=SKU%20to=SKU%20fromIndex=Collection2}(CatalogName:*Products)&fq=(SKU:204-161)%20OR%20(Name:%22204-161%22)&bq=Name:%22204-161%22^2 I am not getting the Name-match record first in the list; I am always getting the SKU-matching record. Any help is really appreciated. Thanks Ravi -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Thursday, April 10, 2014 3:35 PM To: solr-user@lucene.apache.org Subject: Re: Relevance/Rank What kind of field is Name? Assuming it's string, you should be able to boost it. Boosts are not relevant to filter (fq) clauses at all; where were you trying to add the boost? 
You need to provide significantly more information to get a more helpful answer. You might review: http://wiki.apache.org/solr/UsingMailingLists bq: I am getting confused after going thru different sites. You're in luck, the Solr Reference Guide ( https://cwiki.apache.org/confluence/display/solr/Apache+Solr+Reference+Guide ) is becoming the collected source of information. Also, here's a sorely-needed set of up-to-date info: http://www.manning.com/grainger/, and Jack Krupansky is publishing an e-book here: http://www.lulu.com/spotlight/JackKrupansky (this is the last link I have, there may be more recent copies). Best, Erick On Thu, Apr 10, 2014 at 11:49 AM, EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions) external.ravi.tamin...@us.bosch.com wrote: Hi I am looking boosting to help if I can achieve the Rank equal to MS SQL Server. I have a query something like fq=(SKU:123-87458) OR Name: 123-87458 I need to get the Exact Match as first in the results, In this case SKU. But also I can change to display Name in the List which is not exact match but match , the value can be find some where in the Name? In Simple I can Rank SKU as 1 and Name as 2 for some customer and some customer Rank Name as 1 and SKU as 2 in the results. Is this Possible , I tried Boosting but that it seems it is for Text, correct me if I am wrong on understanding and any example will be really appreciated. I am getting confused after going thru different sites. Thanks -- With Regards Aman Tandon -- With Regards Aman Tandon
svn vs GIT
Hi, I am new here. I have a question: why do we prefer svn over git? -- With Regards Aman Tandon
Re: multiple analyzers for one field
The lack of response to this question makes me think that either there is no good answer, or maybe the question was too obtuse. So I'll give it one more go with some more detail ... My main goal is to implement autocompletion with a mix of words and short phrases, where the words are drawn from the text of largish documents, and the phrases are author names and document titles. I think the best way to accomplish this is to concoct a single field that contains data from these other source fields (as usual with copyField), but with some of the fields treated as keywords (ie with their values inserted as single tokens), and others tokenized. I believe this would be possible at the Lucene level by calling Document.addField () with multiple fields having the same name: some marked as TOKENIZED and others not. I think the tokenized fields would have to share the same analyzer, but that's OK for my case. I can't see how this could be made to happen in Solr without a lot of custom coding though. It seems as if the conversion from Solr fields to Lucene fields is not an easy thing to influence. If anyone has an idea how to achieve the subgoal, or perhaps a different way of getting at the main goal, I'd love to hear about it. So far my only other idea is to write some kind of custom analyzer that treats short texts as keywords and tokenizes longer ones, which is probably what I'll look at if nothing else comes up. Thanks Mike On 4/9/2014 4:16 PM, Michael Sokolov wrote: I think I would like to do something like copyfield from a bunch of fields into a single field, but with different analysis for each source, and I'm pretty sure that's not a thing. Is there some alternate way to accomplish my goal? Which is to have a suggester that suggests words from my full text field and complete phrases drawn from my author and title fields all at the same time. So If I could index author and title using KeyWordAnalyzer, and full text tokenized, that would be the bees knees. -Mike
deleting large amount data from solr cloud
[solr version 4.3.1] Hello, I have a solr cloud (4 nodes - 2 shards) with a fairly large amount of documents (~360G of index per shard). Now, a major portion of the data is not required and I need to delete those documents; I would need to delete around 75% of the data. One solution could be to drop the index completely and re-index, but this is not an option at the moment. We tried to delete the data through a query - say, 1 day's or 1 month's worth of data at a time - but after deleting just 1 month's worth of data, the master node goes out of memory (heap space). Wondering if there is any way to incrementally delete the data without affecting the cluster adversely. Thanks! Vinay
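A minimal sketch of the incremental approach (the date field name and range here are assumptions; adapt them to your schema): delete one bounded slice per request via the XML update handler, then commit, so each merge only has to deal with a small batch of deletions:

```
<delete><query>timestamp:[2013-01-01T00:00:00Z TO 2013-02-01T00:00:00Z]</query></delete>
<commit expungeDeletes="true"/>
```

expungeDeletes asks the commit to merge away segments with deleted documents; many small slices with a commit between them may keep the heap needed per step bounded, rather than one huge delete-by-query followed by one massive merge.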
Re: best way to contribute solr??
Put separate issues into separate emails. That way new people will look at the new thread. As it was, it was out of the conversation flow and got lost. Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency On Fri, Apr 11, 2014 at 7:16 AM, Aman Tandon amantandon...@gmail.com wrote: any help related to my previous mail update?? On Thu, Apr 10, 2014 at 7:52 PM, Aman Tandon amantandon...@gmail.comwrote: thanks sir, i always smile when people here are always ready for help, i am thankful to all, and yes i started learning by reading daily at least 50-60 mails to increase my knowledge gave my suggestion if i am familiar with it, people here correct me as well if i am wrong. I know it will take time but someday i will contribute as well and thanks for setup it will be quite helpful. In my office i am using solr 4.2 with tomcat right now i am stucked because i don't know how to integrate solr 4.7 with my tomcat, because the problem for me is that i am familiar with the cores architecture of solr 4.2 in which we defined the every core name as well as instanceDir but not with solr 4.7. Thanks Aman Tandon On Thu, Apr 10, 2014 at 7:31 PM, Erick Erickson erickerick...@gmail.comwrote: Aman: Here's another helpful resource: http://wiki.apache.org/solr/HowToContribute It tells you how to get the source code, set up an IDE etc. for Solr/Lucene In addition to Alexandre's suggestions, one possibility (but I warn you it can be challenging) is to create unit tests. Part of the build report each night has a coverage, you can get to the latest build here: https://wiki.apache.org/solr/NightlyBuilds click on clover test coverage and pick something, track down what isn't covered (see the clover report link for instance). Warning: You will be completely lost for a while. This is hard stuff when you're just starting out especially. 
So choose the simplest thing you can for the first go to get familiar with the process if you want to try this. Another place to start is...the user's list. Pick one question a day, research it and try to provide an answer. Clearly label your responses with the degree of certainty you have. Another caution: you'll research something and get back to the list to discover its already been answered sometimes but you'll have gained the knowledge and it gets better over time. Best, Erick On Thu, Apr 10, 2014 at 12:03 AM, Aman Tandon amantandon...@gmail.com wrote: Thanks sir, I will look into this. Solr and its developer are all helpful and awesome, i am feeling great. Thanks Aman Tandon On Thu, Apr 10, 2014 at 12:29 PM, Alexandre Rafalovitch arafa...@gmail.comwrote: Sure, you can do it in Java too. The difference is that Solr comes with Java client SolrJ which is tested and kept up-to-date. But there could still be more tutorials. For other languages/clients, there is a lot less information available. Especially, if you start adding (human) languages into it. E.g. how to process your own language (if non-English). And there are many more ideas on Slide 26 of http://www.slideshare.net/arafalov/introduction-to-solr-from-bangkok-meetup . As well as an example of processing pipeline for Thai. More of these kinds of things would be useful too. Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency On Thu, Apr 10, 2014 at 1:52 PM, Aman Tandon amantandon...@gmail.com wrote: Thank you so much sir :) Can i try in java as well? Thanks Aman Tandon On Thu, Apr 10, 2014 at 12:15 PM, Alexandre Rafalovitch arafa...@gmail.comwrote: Great, Solr + Perl + Geospatial. There are two Perl clients for Solr listed on the Wiki: http://wiki.apache.org/solr/IntegratingSolr . Are there any more? If yes, add them to the Wiki (need to ask permission to edit Wiki). Are those two listed clients dead or alive? 
Do they work with Solr 4.7.1? Can you make them work with Solr 4.7.1 and recent version of Perl? Can you do a small demo that uses Perl client to index some geospatial information and then do a search for it? I strongly suspect you will hit some interesting issues. Find the fix, contribute back to the Perl library maintainer. Or, at least, clearly describe the issue, if you don't yet know enough to contribute the fix. Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency On Thu, Apr 10, 2014 at 1:04 PM, Aman Tandon amantandon...@gmail.com wrote: Okay sir i will mail to solr-user only, I am feeling so thankful to you for all you help, i am java developer with a good knowledge of perl, working on
Re: svn vs GIT
You can find the read-only Git mirror of the Lucene+Solr source code here: https://github.com/apache/lucene-solr . The SVN preference is the Apache Foundation's choice and legacy. Most of the developers' workflows are also built around SVN. Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency On Fri, Apr 11, 2014 at 7:48 AM, Aman Tandon amantandon...@gmail.com wrote: Hi, I am new here, i have question in mind that why we are preferring the svn more than git? -- With Regards Aman Tandon
Re: multiple analyzers for one field
It's an interesting question. To start from, the copyField copies the source content, so there is no source-related tokenization description. Only the target's one. So, that approach is not suitable. Regarding the lookups/auto-complete. There has been a bunch of various implementations added recently, but they are not really documented. Things like BlendedInfixSuggester are a bit hard to discover at the moment. So, there might be something there if one digs a lot. The other option is to do the tokenization in the UpdateRequestProcessor chain. You could clone a field, and do some processing so that by the time the content hits solr, it's already pre-tokenized into multi-value field. Then, you could have KeywordTokenizer on your collector field and separate URPs sub-chains for each original fields that go into that. One related hack would be to create a subclass of FieldMutatingUpdateProcessorFactory that wraps an arbitrary tokenizer and splits out tokens as multi-value output. This is a bit hazy, even in my own mind, but hopefully gives you something new to think about. Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency On Fri, Apr 11, 2014 at 8:05 AM, Michael Sokolov msoko...@safaribooksonline.com wrote: The lack of response to this question makes me think that either there is no good answer, or maybe the question was too obtuse. So I'll give it one more go with some more detail ... My main goal is to implement autocompletion with a mix of words and short phrases, where the words are drawn from the text of largish documents, and the phrases are author names and document titles. I think the best way to accomplish this is to concoct a single field that contains data from these other source fields (as usual with copyField), but with some of the fields treated as keywords (ie with their values inserted as single tokens), and others tokenized. 
I believe this would be possible at the Lucene level by calling Document.addField () with multiple fields having the same name: some marked as TOKENIZED and others not. I think the tokenized fields would have to share the same analyzer, but that's OK for my case. I can't see how this could be made to happen in Solr without a lot of custom coding though. It seems as if the conversion from Solr fields to Lucene fields is not an easy thing to influence. If anyone has an idea how to achieve the subgoal, or perhaps a different way of getting at the main goal, I'd love to hear about it. So far my only other idea is to write some kind of custom analyzer that treats short texts as keywords and tokenizes longer ones, which is probably what I'll look at if nothing else comes up. Thanks Mike On 4/9/2014 4:16 PM, Michael Sokolov wrote: I think I would like to do something like copyfield from a bunch of fields into a single field, but with different analysis for each source, and I'm pretty sure that's not a thing. Is there some alternate way to accomplish my goal? Which is to have a suggester that suggests words from my full text field and complete phrases drawn from my author and title fields all at the same time. So If I could index author and title using KeyWordAnalyzer, and full text tokenized, that would be the bees knees. -Mike
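Alexandre's pre-tokenization idea can be sketched in plain Java (the class name, method, and length threshold below are hypothetical, not Solr APIs): inside a real UpdateRequestProcessor's processAdd() you would apply something like this to the cloned source-field values before they reach the KeywordTokenizer-based collector field, so short phrases (authors, titles) survive as single tokens while long text arrives pre-split into words:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of the pre-tokenization an UpdateRequestProcessor could do:
// short values pass through whole (KeywordTokenizer will keep them as one token),
// longer text is split into word tokens before the content ever hits Solr.
public class SuggestFieldPreTokenizer {
    static final int KEYWORD_MAX_WORDS = 4; // assumed threshold; tune for your data

    public static List<String> preTokenize(List<String> sourceValues) {
        List<String> out = new ArrayList<>();
        for (String value : sourceValues) {
            String[] words = value.trim().toLowerCase(Locale.ROOT).split("\\s+");
            if (words.length <= KEYWORD_MAX_WORDS) {
                out.add(String.join(" ", words)); // keep short phrase as one value
            } else {
                for (String w : words) out.add(w); // emit each word as its own value
            }
        }
        return out;
    }
}
```

The output list becomes the multi-valued content of the collector field; a real processor would build it in processAdd() from the cloned author, title, and full-text fields.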
Re: svn vs GIT
thanks sir, in that case i need to know about svn as well. Thanks Aman Tandon On Fri, Apr 11, 2014 at 7:26 AM, Alexandre Rafalovitch arafa...@gmail.comwrote: You can find the read-only Git's version of Lucene+Solr source code here: https://github.com/apache/lucene-solr . The SVN preference is Apache Foundation's choice and legacy. Most of the developers' workflows are also around SVN. Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency On Fri, Apr 11, 2014 at 7:48 AM, Aman Tandon amantandon...@gmail.com wrote: Hi, I am new here, i have question in mind that why we are preferring the svn more than git? -- With Regards Aman Tandon
Re: multiple analyzers for one field
Hi Michael, It IS possible to utilize multiple Analyzers within a single field, but it's not a built-in capability of Solr right now. I wrote something I called a MultiTextField which provides this capability, and you can see the code here: https://github.com/treygrainger/solr-in-action/tree/master/src/main/java/sia/ch14 The general idea is that you can pass in a prefix for each piece of your content and then use that prefix to dynamically select one or more Analyzers for each piece of content. So, for example, you could pass in something like this when indexing your document (for a multiValued field): <field name="someMultiTextField">en|some text</field> <field name="someMultiTextField">es|some more text</field> <field name="someMultiTextField">de,fr|some other text</field> Then, the MultiTextField will parse the prefixes and dynamically grab an Analyzer based upon the prefix. In this case, the first input will be processed using an English Analyzer, the second input will use a Spanish Analyzer, and the third input will use both a German and a French Analyzer, as defined when the field is defined in the schema.xml: <fieldType name="multiText" class="sia.ch14.MultiTextField" sortMissingLast="true" defaultFieldType="text_general" fieldMappings="en:text_english, es:text_spanish, fr:text_french, de:text_german"/> <field name="someMultiTextField" type="multiText" indexed="true" multiValued="true"/> If you want to automagically map separate fields into one of these dynamic analyzer (MultiText) fields with prefixes, you could either pass the text in multiple times from the client to the same field (with different Analyzer prefixes each time, as shown above), OR you could write an Update Request Processor that does this for you. I don't think it is possible to just have the copyField add in prefixes automatically for you, though someone please correct me if I'm wrong. If you implement an Update Request Processor, then inside it you would simply grab the text from each of the relevant fields (i.e. 
author and title fields) and then add that field's value to the named MultiText field with the appropriate Analyzer prefix based upon each field. I made an example Update Request Processor (see the previous github link and look for MultiTextFieldLanguageIdentifierUpdateProcessor) that you could look at as an example of how to supply different analyzer prefixes to different values within a multiValued field, though you would obviously want to throw away all the language detection stuff since it doesn't match your specific use case. All that being said, this solution may end up being overly complicated for your use case, so your idea of creating a custom analyzer to just handle your example might be much less complicated. At any rate, that's the specific answer to your specific question about whether it is possible to utilize multiple Analyzers within a field based upon multiple inputs. All the best, Trey Grainger Co-author, Solr in Action Director of Engineering, Search Analytics @ CareerBuilder On Thu, Apr 10, 2014 at 9:05 PM, Michael Sokolov msoko...@safaribooksonline.com wrote: The lack of response to this question makes me think that either there is no good answer, or maybe the question was too obtuse. So I'll give it one more go with some more detail ... My main goal is to implement autocompletion with a mix of words and short phrases, where the words are drawn from the text of largish documents, and the phrases are author names and document titles. I think the best way to accomplish this is to concoct a single field that contains data from these other source fields (as usual with copyField), but with some of the fields treated as keywords (ie with their values inserted as single tokens), and others tokenized. I believe this would be possible at the Lucene level by calling Document.addField () with multiple fields having the same name: some marked as TOKENIZED and others not. 
I think the tokenized fields would have to share the same analyzer, but that's OK for my case. I can't see how this could be made to happen in Solr without a lot of custom coding though. It seems as if the conversion from Solr fields to Lucene fields is not an easy thing to influence. If anyone has an idea how to achieve the subgoal, or perhaps a different way of getting at the main goal, I'd love to hear about it. So far my only other idea is to write some kind of custom analyzer that treats short texts as keywords and tokenizes longer ones, which is probably what I'll look at if nothing else comes up. Thanks Mike On 4/9/2014 4:16 PM, Michael Sokolov wrote: I think I would like to do something like copyfield from a bunch of fields into a single field, but with different analysis for each source, and I'm pretty sure that's not a thing. Is there some alternate way to accomplish my goal? Which is to have a suggester that suggests
Re: multiple analyzers for one field
Yes, I see - I could essentially do the tokenization myself (or using some Analyzer chain) in an Update Processor. Yes I think that could work. Thanks, Alex! -Mike On 4/10/14 10:09 PM, Alexandre Rafalovitch wrote: It's an interesting question. To start from, the copyField copies the source content, so there is no source-related tokenization description. Only the target's one. So, that approach is not suitable. Regarding the lookups/auto-complete. There has been a bunch of various implementations added recently, but they are not really documented. Things like BlendedInfixSuggester are a bit hard to discover at the moment. So, there might be something there if one digs a lot. The other option is to do the tokenization in the UpdateRequestProcessor chain. You could clone a field, and do some processing so that by the time the content hits solr, it's already pre-tokenized into multi-value field. Then, you could have KeywordTokenizer on your collector field and separate URPs sub-chains for each original fields that go into that. One related hack would be to create a subclass of FieldMutatingUpdateProcessorFactory that wraps an arbitrary tokenizer and splits out tokens as multi-value output. This is a bit hazy, even in my own mind, but hopefully gives you something new to think about. Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency On Fri, Apr 11, 2014 at 8:05 AM, Michael Sokolov msoko...@safaribooksonline.com wrote: The lack of response to this question makes me think that either there is no good answer, or maybe the question was too obtuse. So I'll give it one more go with some more detail ... My main goal is to implement autocompletion with a mix of words and short phrases, where the words are drawn from the text of largish documents, and the phrases are author names and document titles. 
I think the best way to accomplish this is to concoct a single field that contains data from these other source fields (as usual with copyField), but with some of the fields treated as keywords (ie with their values inserted as single tokens), and others tokenized. I believe this would be possible at the Lucene level by calling Document.addField () with multiple fields having the same name: some marked as TOKENIZED and others not. I think the tokenized fields would have to share the same analyzer, but that's OK for my case. I can't see how this could be made to happen in Solr without a lot of custom coding though. It seems as if the conversion from Solr fields to Lucene fields is not an easy thing to influence. If anyone has an idea how to achieve the subgoal, or perhaps a different way of getting at the main goal, I'd love to hear about it. So far my only other idea is to write some kind of custom analyzer that treats short texts as keywords and tokenizes longer ones, which is probably what I'll look at if nothing else comes up. Thanks Mike On 4/9/2014 4:16 PM, Michael Sokolov wrote: I think I would like to do something like copyfield from a bunch of fields into a single field, but with different analysis for each source, and I'm pretty sure that's not a thing. Is there some alternate way to accomplish my goal? Which is to have a suggester that suggests words from my full text field and complete phrases drawn from my author and title fields all at the same time. So If I could index author and title using KeyWordAnalyzer, and full text tokenized, that would be the bees knees. -Mike
Re: multiple analyzers for one field
Thanks for your detailed answer, Trey! I guess it helps to have just written that book :) By the way, I am eager to get it on our platform (safariflow.com -- but I think it hasn't arrived from Manning yet). I had a half-baked idea about using a prefix like that. It did seem like it would be somewhat complicated, but certainly with your example code I'd have a leg up - thanks again. -Mike On 4/10/14 10:42 PM, Trey Grainger wrote: Hi Michael, It IS possible to utilize multiple Analyzers within a single field, but it's not a built-in capability of Solr right now. I wrote something I called a MultiTextField which provides this capability, and you can see the code here: https://github.com/treygrainger/solr-in-action/tree/master/src/main/java/sia/ch14 The general idea is that you can pass in a prefix for each piece of your content and then use that prefix to dynamically select one or more Analyzers for each piece of content. So, for example, you could pass in something like this when indexing your document (for a multiValued field): <field name="someMultiTextField">en|some text</field> <field name="someMultiTextField">es|some more text</field> <field name="someMultiTextField">de,fr|some other text</field> Then, the MultiTextField will parse the prefixes and dynamically grab an Analyzer based upon the prefix. 
In this case, the first input will be processed using an English Analyzer, the second input will use a Spanish Analyzer, and the third input will use both a German and a French Analyzer, as defined when the field is defined in the schema.xml:

<fieldType name="multiText" class="sia.ch14.MultiTextField" sortMissingLast="true" defaultFieldType="text_general" fieldMappings="en:text_english,es:text_spanish,fr:text_french,de:text_german"/>
<field name="someMultiTextField" type="multiText" indexed="true" multiValued="true"/>

If you want to automagically map separate fields into one of these dynamic analyzer (MultiText) fields with prefixes, you could either pass the text in multiple times from the client to the same field (with a different Analyzer prefix each time, as shown above), OR you could write an Update Request Processor that does this for you. I don't think it is possible to just have copyField add in the prefixes automatically for you, though someone please correct me if I'm wrong. If you implement an Update Request Processor, then inside it you would simply grab the text from each of the relevant fields (i.e. the author and title fields) and then add each field's value to the named MultiText field with the appropriate Analyzer prefix for that field. I made an example Update Request Processor (see the previous github link and look for MultiTextFieldLanguageIdentifierUpdateProcessor) that you could look at as an example of how to supply different analyzer prefixes to different values within a multiValued field, though you would obviously want to throw away all the language detection stuff since it doesn't match your specific use case. All that being said, this solution may end up being overly complicated for your use case, so your idea of creating a custom analyzer to just handle your example might be much less complicated.
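The prefix convention Trey describes (e.g. `en|some text`, `de,fr|some other text`) is easy to model. This is only an illustrative sketch of the parsing step, not the actual MultiTextField code; the delimiter characters match the examples above:

```python
def split_analyzer_prefix(raw, delimiter="|", key_separator=","):
    """Split 'de,fr|some other text' into (['de', 'fr'], 'some other text').
    A value without a prefix yields an empty key list (i.e. use the
    field type's default analyzer)."""
    if delimiter in raw:
        prefix, text = raw.split(delimiter, 1)
        return prefix.split(key_separator), text
    return [], raw

print(split_analyzer_prefix("en|some text"))          # (['en'], 'some text')
print(split_analyzer_prefix("de,fr|some other text")) # (['de', 'fr'], 'some other text')
```

Each returned key would then be looked up in the `fieldMappings` attribute to pick the analyzer(s) for that value.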
At any rate, that's the specific answer to your specific question about whether it is possible to utilize multiple Analyzers within a field based upon multiple inputs. All the best, Trey Grainger Co-author, Solr in Action Director of Engineering, Search Analytics @ CareerBuilder On Thu, Apr 10, 2014 at 9:05 PM, Michael Sokolov msoko...@safaribooksonline.com wrote: The lack of response to this question makes me think that either there is no good answer, or maybe the question was too obtuse. So I'll give it one more go with some more detail ... My main goal is to implement autocompletion with a mix of words and short phrases, where the words are drawn from the text of largish documents, and the phrases are author names and document titles. I think the best way to accomplish this is to concoct a single field that contains data from these other source fields (as usual with copyField), but with some of the fields treated as keywords (i.e. with their values inserted as single tokens), and others tokenized. I believe this would be possible at the Lucene level by calling Document.add() with multiple fields having the same name: some marked as TOKENIZED and others not. I think the tokenized fields would have to share the same analyzer, but that's OK for my case. I can't see how this could be made to happen in Solr without a lot of custom coding though. It seems as if the conversion from Solr fields to Lucene fields is not an easy thing to influence. If anyone has an idea how to achieve the subgoal, or perhaps a different way of getting at the main goal, I'd love to hear about it. So far my only other idea is to write some kind of custom analyzer that treats short texts as keywords and tokenizes longer ones, which is probably what I'll look at if nothing else comes up.
Re: Relevance/Rank
Aman: Oops, looked at the wrong part of the query, didn't see the bq clause. You're right of course. Sorry for the misdirection. Erick
Re: deleting large amount data from solr cloud
First, there is no master node, just leaders and replicas. But that's a nit. No real clue why you would be going out of memory. Deleting a document, even by query, should just mark the docs as deleted, a pretty low-cost operation. How much memory are you giving the JVM? Best, Erick On Thu, Apr 10, 2014 at 6:25 PM, Vinay Pothnis poth...@gmail.com wrote: [solr version 4.3.1] Hello, I have a solr cloud (4 nodes - 2 shards) with a fairly large number of documents (~360G of index per shard). Now, a major portion of the data is not required and I need to delete those documents. I would need to delete around 75% of the data. One of the solutions could be to drop the index completely and re-index. But this is not an option at the moment. We tried to delete the data through a query - say 1 day's/1 month's worth of data at a time. But after deleting just 1 month's worth of data, the master node is going out of memory - heap space. Wondering if there is any way to incrementally delete the data without affecting the cluster adversely. Thanks! Vinay
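One common way to keep heap pressure down in a situation like this is to break the delete into many small delete-by-query requests (e.g. one per day), committing between batches, rather than issuing one huge range delete. A hedged sketch that only builds the request bodies - the field name `timestamp_dt` is an assumption about the schema, and each body would be POSTed to `/solr/<collection>/update` followed by a commit:

```python
import json
from datetime import date, timedelta

def daily_delete_queries(field, start, end):
    """Yield one Solr delete-by-query JSON body per day in [start, end)."""
    day = start
    while day < end:
        nxt = day + timedelta(days=1)
        # exclusive upper bound so consecutive days don't overlap
        q = f"{field}:[{day}T00:00:00Z TO {nxt}T00:00:00Z}}"
        yield json.dumps({"delete": {"query": q}})
        day = nxt

for body in daily_delete_queries("timestamp_dt", date(2014, 1, 1), date(2014, 1, 3)):
    print(body)
```

Small batches with intervening commits give the cluster a chance to flush deleted-doc bookkeeping instead of accumulating it all in one request.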
Re: Relevance/Rank
It's fine, Erick. I am guessing that maybe fq=(SKU:204-161) is the issue: this SKU with that value is present in all results, and that's why the Name products are not getting boosted. Ravi: check your results without the filter - do all the results include SKU:204-161? I guess this may help. On Fri, Apr 11, 2014 at 9:22 AM, Erick Erickson erickerick...@gmail.com wrote: Aman: Oops, looked at the wrong part of the query, didn't see the bq clause. You're right of course. Sorry for the misdirection. Erick -- With Regards Aman Tandon
Re: Pushing content to Solr from Nutch
Does your Solr schema match the data output by nutch? It's up to you to create a Solr schema that matches the output of nutch - read up on the nutch doc for that info. Solr doesn't define that info, nutch does. -- Jack Krupansky From: Xavier Morera Sent: Thursday, April 10, 2014 12:58 PM To: solr-user@lucene.apache.org Subject: Pushing content to Solr from Nutch Hi, I have followed several Nutch tutorials - including the main one http://wiki.apache.org/nutch/NutchTutorial - to crawl sites (which works, I can see in the console as the pages get crawled and the directories built with the data) but for the life of me I can't get anything posted to Solr. The Solr console doesn't even squint, therefore Nutch is not sending anything. This is the command that I send over that crawls and in theory should also post:

bin/crawl urls/seed.txt TestCrawl http://localhost:8983/solr 2

But I found that I could also use this one when it is already crawled:

bin/nutch solrindex http://localhost:8983/solr crawl/crawldb crawl/linkdb crawl/segments/*

But no luck. The only thing that called my attention is this message; I read that adding the property below would fix it, but it doesn't:

No IndexWriters activated - check your configuration

This is the property:

<property>
  <name>plugin.includes</name>
  <value>protocol-http|urlfilter-regex|parse-html|index-(basic|anchor)|query-(basic|site|url)|response-(json|xml)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
</property>

Any idea? Apache Nutch 1.8 running Java 1.6 via Cygwin on Windows. -- Xavier Morera email: xav...@familiamorera.com CR: +(506) 8849 8866 US: +1 (305) 600 4919 skype: xmorera
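For what it's worth: in Nutch 1.x releases after 1.6 the index writers are themselves plugins, and "No IndexWriters activated" typically means no indexer plugin is enabled in plugin.includes. A hedged guess at the fix, adding `indexer-solr` to the value quoted in the message above (worth verifying against the Nutch 1.8 documentation for your setup):

```xml
<property>
  <name>plugin.includes</name>
  <value>protocol-http|urlfilter-regex|parse-html|index-(basic|anchor)|indexer-solr|query-(basic|site|url)|response-(json|xml)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
</property>
```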
DataImportHandler - Automatic scheduling of delta imports in Solr in windows 7
I am using *DeltaImportHandler* for indexing data in Solr. Currently I am manually indexing the data into Solr by selecting the full-import or delta-import commands from the Solr Admin screen. I am using Windows 7 and would like to automate the process by specifying a certain time interval for executing the commands, through the Windows Task Scheduler or something similar, e.g. every two minutes it should index data into Solr. From a few sites I came to know that I need to create a *batch file* with some command to run the imports, and that the batch file is run using the *Windows Scheduler*. But there were no examples regarding this. I am not sure what to code in the batch file and how to link it with the scheduler. Can someone provide me the code and the steps to accomplish it? Thanks a lot in advance. -- View this message in context: http://lucene.472066.n3.nabble.com/DataImportHandler-Automatic-scheduling-of-delta-imports-in-Solr-in-windows-7-tp4130565.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: DataImportHandler - Automatic scheduling of delta imports in Solr in windows 7
DataImportHandler is just a URL call. You can see the specific URL you want to call by opening the debugger window in Chrome/Firefox and looking at the network tab. Then, you have a general problem of how to call a URL from the Windows Scheduler. Google brings a lot of results for that, so you should be able to find something you prefer. Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency On Fri, Apr 11, 2014 at 12:02 PM, harshrossi harshro...@gmail.com wrote: I am using *DeltaImportHandler* for indexing data in Solr. Currently I am manually indexing the data into Solr by selecting commands full-import or delta-import from the Solr Admin screen. I am using Windows 7 and would like to automate the process by specifying a certain time interval for executing the commands through windows task scheduler or something. e.g.: like every two minutes it should index data into solr. From few sites I came to know that I need to create a *batch file* with some command to run the imports and the batch file is run using *windows scheduler*. But there were no examples regarding this. I am not sure what to code in the batch file and how to link it with the scheduler. Can someone provide me the code and the steps to accomplish it? Thanks a lot in advance.
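As Alex says, a delta import is just an HTTP GET. A minimal sketch of building that URL - the core name `db` and the extra parameters are illustrative assumptions, not from the thread; adjust them to your setup:

```python
from urllib.parse import urlencode

def dih_url(host, core, command="delta-import", **params):
    """Build a DataImportHandler request URL for the given core and command."""
    query = urlencode({"command": command, **params})
    return f"http://{host}/solr/{core}/dataimport?{query}"

url = dih_url("localhost:8983", "db", clean="false", commit="true")
print(url)
```

A scheduled Windows batch file could then contain a single line such as `curl "<that url>"`, registered in Task Scheduler with a two-minute repeat interval.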