Re: Solr groups not matching with terms in a field
Hi Ahmet,

Thanks, now I understand better; I will not try my use case with grouping. Actually I am interested in the unique terms in a field, i.e. tenant_pool. I get those perfectly with http://www.imagesup.net/?di=614212438580, but I am not able to get the terms after applying a filter, say type:1. That is, I need the unique terms in the tenant_pool field for a type:1 query, and the answer should be P1, L1. Please suggest how I can get this without reading each doc from disk.

On Fri, Jan 16, 2015 at 1:28 PM, Ahmet Arslan iori...@yahoo.com.invalid wrote:

Hi Naresh, I have never grouped on a tokenised field and I am not sure it makes sense to do so. Reading back through the ref guide, it says this about the group.field parameter: "The name of the field by which to group results. The field must be single-valued, and either be indexed or a field type that has a value source and works in a function query, such as ExternalFileField. It must also be a string-based field, such as StrField or TextField." https://cwiki.apache.org/confluence/display/solr/Result+Grouping Therefore, it should be single-valued. P.S. Don't get confused by the TextField type; for example, it can produce a single token when used with the keyword tokenizer.

Ahmet

On Friday, January 16, 2015 4:43 AM, Naresh Yadav nyadav@gmail.com wrote:

Hi Ahmet, If you observe the output, ngroups is 1 and only one group, P1, is returned. But my expectation is that it should return three groups, P1, L1, L2, as my field is tokenized on whitespace. Please correct me if I am wrong.

On 1/15/15, Ahmet Arslan iori...@yahoo.com.invalid wrote:

Hi Naresh, Everything looks correct; what is the problem here? If you want to see more than one document per group, there is a parameter for that, which defaults to 1.

Ahmet

On Thursday, January 15, 2015 9:02 AM, Naresh Yadav nyadav@gmail.com wrote:

Hi all, I had done the following configuration to test the Solr grouping concept.

Solr version: 4.6.1 (tried the latest version, 4.10.3, also)
Schema: http://www.imagesup.net/?di=10142124357616
Solrj code to insert docs: http://www.imagesup.net/?di=10142124381116
Response (groups): http://www.imagesup.net/?di=1114212438351
Response (terms): http://www.imagesup.net/?di=614212438580

Please let me know if I am doing something wrong here.
Re: Solr groups not matching with terms in a field
Hi Naresh,

Yup, the terms component does not respect the q or fq parameters. Luckily, that's easy with the facet component. Example: facet=true&facet.field=tenant_pool&q=type:1

Please see more here: https://cwiki.apache.org/confluence/display/solr/Faceting

happy faceting,
ahmet
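Ahmet's suggestion can be spelled out as a complete request URL. A minimal sketch in plain Java that assembles the parameters (the host, port, and collection name are placeholders, not from the thread):

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Builds the facet request suggested above: facet counts restricted by a query. */
public class FacetQuery {

    static String facetUrl(String host, String collection, String q, String facetField) {
        // Faceting runs after q/fq, so the tenant_pool counts cover only
        // documents matching type:1 - the part the terms component cannot do.
        return "http://" + host + "/solr/" + collection + "/select"
             + "?q=" + URLEncoder.encode(q, StandardCharsets.UTF_8)
             + "&facet=true"
             + "&facet.field=" + URLEncoder.encode(facetField, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(facetUrl("localhost:8983", "mycollection", "type:1", "tenant_pool"));
    }
}
```

The facet response then lists each distinct indexed term of tenant_pool with its count, so the unique-terms-under-a-filter question is answered without reading stored documents.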
Re: OutOfMemoryError for PDF document upload into Solr
Hi Dan,

Neat idea - made a mental note :-) That brings us back to the point that in complex setups you should not do the document pre-processing directly in Solr, but have an import process which can safely crash when processing a 4 GB PDF file.

Cheers, Siegfried Goeschl

On 16.01.15 05:02, Dan Davis wrote:

Why re-write all the document conversion in Java ;) Tika is very slow. A 5 GB PDF is very big. If you have a lot of PDFs like that, try pdftotext in HTML and UTF-8 output mode. The HTML mode captures some metadata that would otherwise be lost. If you need to go faster still, you can also write some stuff linked directly against the poppler library. Before you jump down my throat about Tika being slow - I wrote a PDF indexer that ran at 36 MB/s per core. Different indexer, all C, lots of setjmp/longjmp. But fast...

On Thu, Jan 15, 2015 at 1:54 PM, ganesh.ya...@sungard.com wrote:

Siegfried and Michael, thank you for your replies and help.

-----Original Message-----
From: Siegfried Goeschl [mailto:sgoes...@gmx.at]
Sent: Thursday, January 15, 2015 3:45 AM
To: solr-user@lucene.apache.org
Subject: Re: OutOfMemoryError for PDF document upload into Solr

Hi Ganesh, you can increase the heap size, but parsing a 4 GB PDF document will very likely consume A LOT OF memory - I think you need to check whether that large a PDF can be parsed at all :-)

Cheers, Siegfried Goeschl

On 14.01.15 18:04, Michael Della Bitta wrote:

Yep, you'll have to increase the heap size for your Tomcat container. http://stackoverflow.com/questions/6897476/tomcat-7-how-to-set-initial-heap-size-correctly

Michael Della Bitta, Senior Software Engineer, appinions inc.
On Wed, Jan 14, 2015 at 12:00 PM, ganesh.ya...@sungard.com wrote:

Hello, can someone pass on some hints to get around the following error? Is there a heap size parameter I can set in Tomcat, or in the Solr webapp that gets deployed? I am running the Solr webapp inside Tomcat on my local machine, which has 12 GB of RAM. I have a PDF document of up to 4 GB in size that needs to be loaded into Solr.

Exception in thread "http-apr-8983-exec-6" java.lang.OutOfMemoryError: Java heap space
    at java.util.AbstractCollection.toArray(Unknown Source)
    at java.util.ArrayList.<init>(Unknown Source)
    at org.apache.pdfbox.cos.COSDocument.getObjects(COSDocument.java:518)
    at org.apache.pdfbox.cos.COSDocument.close(COSDocument.java:575)
    at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:254)
    at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1238)
    at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1203)
    at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:111)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
    at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:219)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:246)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1967)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:777)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:170)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
    at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:950)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
    ...
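After raising the container's heap limit as Michael suggests, it is worth confirming that the setting actually reached the JVM; a tiny sketch that prints the effective maximum heap from inside the running process:

```java
/** Prints the JVM's configured maximum heap, to verify an -Xmx change took effect. */
public class HeapCheck {

    // Runtime.maxMemory() reports the largest heap the JVM will attempt to use,
    // which reflects -Xmx (or the platform default if none was set).
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("Max heap (MB): " + maxHeapMb());
    }
}
```

Dropping this into a scratch JSP or a standalone `java` run with the same options as Tomcat shows whether the heap increase is in place before re-trying the 4 GB upload.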
Re: How to select the correct number of Shards in SolrCloud
Sharding a query lets you parallelize the "actually query the index" part of the search. But remember that as soon as you spread the query out more, you also need to bring all 64 result sets back together and consolidate them into a single result set for the end user. At some point, the gain from being able to search the data more quickly is outweighed by the cost of this consolidation activity.

One other point to mention, which we noticed as a by-product of some large-scale sharding we were testing (256 shards, no caches, a whole different kettle of fish!): the resulting query is only as fast as the slowest shard. If you have 64 shards and 8 shards/cores per machine, how many JVMs are you running per machine? If you have a single JVM with 8 cores in it, then remember that as soon as that JVM enters a GC cycle, all 8 of those cores will stall. If a query needs results from 64 cores, and 63 return in 100 ms but the last core is in a GC pause and takes 500 ms, your query will take just over 500 ms.

With respect to sharding, I would never start with a large number of shards (and 64 is reasonably large in Solr terms). You might be able to get away without sharding at all; if that meets your latency requirements, why bother with the complexity of sharding? Use those extra CPUs for processing more QPS instead of making a single query faster.

Lastly, you mentioned you allocated 32 GB to Solr; do you mean to the JVM heap? That's quite a lot of a 64 GB machine; you haven't left much for the page cache. The general rule for Solr is to make the JVM heap as small as you can get away with, to leave the OS page cache (which is needed to cache all the index files) as much memory as possible.

On 16 January 2015 at 05:58, Manohar Sripada manohar...@gmail.com wrote:

Hi All,

My setup is as follows. There are 16 nodes in my SolrCloud and 4 CPU cores on each Solr node VM, each having 64 GB of RAM, out of which I have allocated 32 GB to Solr. I have a collection which contains around 100 million docs, which I created with 64 shards, replication factor 2, and 8 shards per node. Each shard gets around 1.6 million documents. The reason I created 64 shards is that there are 4 CPU cores on each VM; while querying, I can make use of all the CPU cores. On average, Solr QTime is around 500 ms here.

Following up on my other discussion, Erick suggested that I might be over-sharding, so I tried reducing the number of shards to 32 and then 16. To my surprise, it started performing better: QTime came down to 300 ms (for 32 shards) and 100 ms (for 16 shards). I haven't tested with filters and facets here yet, but the simple search queries showed a lot of improvement.

So, how come fewer shards perform better? Is it because there are fewer posting lists to search, or fewer merges happening? And how do I determine the correct number of shards?

Thanks, Manohar
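Daniel's point that "the resulting query is only as fast as the slowest shard" can be made concrete with a toy calculation; the latency numbers below are illustrative, following his 63-fast/1-slow example:

```java
import java.util.Arrays;

/** Toy model of distributed-query latency: the coordinator must wait for every shard. */
public class ShardLatency {

    // A distributed query cannot return until all shards respond,
    // so its latency is the maximum over the shard latencies.
    static long distributedLatencyMs(long[] shardLatenciesMs) {
        return Arrays.stream(shardLatenciesMs).max().orElse(0L);
    }

    public static void main(String[] args) {
        long[] shards = new long[64];
        Arrays.fill(shards, 100);   // 63 shards answer in 100 ms
        shards[63] = 500;           // one shard is stuck in a GC pause
        System.out.println("Query latency: " + distributedLatencyMs(shards) + " ms");
    }
}
```

This is why adding shards can hurt: each extra shard is another chance for one straggler (GC pause, cold cache) to set the latency of the whole query.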
Re: Solr groups not matching with terms in a field
I tried faceting also, but it did not work smoothly for me. The case I mentioned in my email is a dummy one; my actual index has 12 lakh (1.2 million) docs and is 2 GB in size on a single machine. Each tenant_pool field value has 20-30 tokens. Getting all the terms in tenant_pool is fast, in seconds, but when I go the facet route with filter criteria, it is very slow, because it reads the whole field from disk and I am only interested in the terms.
Re: How to select the correct number of Shards in SolrCloud
On 1/15/2015 10:58 PM, Manohar Sripada wrote:

So, how come fewer shards perform better? And how do I determine the correct number of shards?

Daniel has replied with good information. One additional problem I can think of when there are too many shards: if your Solr server is busy enough to have any possibility of simultaneous requests, then you will find that it's NOT a good idea to create enough shards to use all your CPU cores. In that situation, when you do a single query, all your CPU cores will be in use. When multiple queries happen at the same time, they have to share the available CPU resources, slowing them all down. With a smaller number of shards, the additional CPU cores can handle simultaneous queries.

I have an index with nearly 100 million documents. I've divided it into six large cold shards and one very small hot shard. It's not SolrCloud. I put three large shards on each of two servers, and the small shard on one of those two servers. The distributed query normally happens on the server without the small shard. Each server has 8 CPU cores and 64 GB of RAM. Solr requires a 6 GB heap. My median QTime over the last 231836 queries is 25 milliseconds, and my 95th percentile QTime is 376 milliseconds. My query rate is pretty low - I've never seen Solr's statistics for the 15-minute query rate go above single digits per second.

Thanks, Shawn
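Shawn's median and 95th-percentile QTime figures are order statistics over the logged query times. A sketch of how such numbers are computed, using the nearest-rank percentile definition (an assumption; Solr's own statistics may use a different estimator):

```java
import java.util.Arrays;

/** Nearest-rank percentile over a sample of query times in milliseconds. */
public class QTimeStats {

    // p in (0, 100]; the input array is copied so the caller's data is untouched.
    static long percentile(long[] qtimesMs, double p) {
        long[] sorted = qtimesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length); // 1-based rank
        return sorted[Math.max(0, rank - 1)];
    }

    public static void main(String[] args) {
        long[] qtimes = {5, 25, 12, 376, 40, 18, 22, 9, 300, 25}; // made-up sample
        System.out.println("median = " + percentile(qtimes, 50) + " ms");
        System.out.println("p95    = " + percentile(qtimes, 95) + " ms");
    }
}
```

A large gap between the median and the 95th percentile, as in Shawn's 25 ms vs. 376 ms, usually points at occasional stragglers (GC pauses, cold caches) rather than a uniformly slow index.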
Re: Solr groups not matching with terms in a field
Hi,

That's a different problem: speeding up faceting. Faceting is used all over the place and it is fast. I suggest you look for faceting improvements.

Ahmet
Re: Easiest way to embed solr in a desktop application
That's correct; even though it should still be possible to embed Jetty, that could change in the future, and that's why support for pluggable containers is being taken away. If you need to deal with the index at a lower level, there's always Lucene, which you can use as a library instead of Solr. But I am assuming you need to use the search engine at a higher level than that, and hence you ask for Solr. In that case, I urge you to think through whether you really can't run this out of process; maybe this is an XY problem. Keep in mind that Solr can provide higher-level functionality because it controls almost the entirety of the application (which is the philosophical reason behind the removal of the war as well), and that's the reason something like EmbeddedSolrServer will always have caveats.

On 15 Jan 2015 15:09, Robert Krüger krue...@lesspain.de wrote:

I was considering the programmatic Jetty option, but then I read that Solr 5 no longer supports being run with an external servlet container; maybe they still support programmatic Jetty use in some way. At the moment I am using Solr 4.x, so this would work. No idea if this gets messy classloader-wise in any way. I have used exactly the approach you described in the past, i.e. I built a really, really simple Swing dialogue to input queries and display results in a table, but I was just guessing that the built-in UI was far superior; maybe I should just live with mine for the time being.

On Thu, Jan 15, 2015 at 3:56 PM, Erik Hatcher erik.hatc...@gmail.com wrote:

It'd certainly be easiest to just embed Jetty into your application. You don't need to have Jetty as a separate process; you could launch it through its friendly Java API, configured to use solr.war.

If all you needed was to make HTTP(-like) queries to Solr instead of the full admin UI, your application could stick to using EmbeddedSolrServer and also provide a UI that takes in a Solr query string (or builds one up) and then sends it to the embedded Solr and displays the result.

Erik

On Jan 15, 2015, at 9:44 AM, Robert Krüger krue...@lesspain.de wrote:

Hi Andrea,

You are assuming correctly. It is a local, non-distributed index that is only accessed by the containing desktop application. Do you know if there is a way to run the Solr admin UI on top of an embedded instance somehow?

Thanks a lot, Robert

On Thu, Jan 15, 2015 at 3:17 PM, Andrea Gazzarini a.gazzar...@gmail.com wrote:

Hi Robert,

I've used the EmbeddedSolrServer in a scenario like that and I never had problems. I assume you're talking about a standalone application, where the whole index resides locally and you don't need any cluster/cloud/distributed features. I think the usage of EmbeddedSolrServer is discouraged in a (distributed) service scenario, because it is a direct connection to a SolrCore instance... but that is not a problem in the situation you described (as far as I know).

Best, Andrea

On 01/15/2015 03:10 PM, Robert Krüger wrote:

Hi,

I have been using an embedded instance of Solr in my desktop application for a long time, and it works fine. At the time I made that decision (vs. firing up a Solr web application within my Swing application), I got the impression that embedded use was somewhat unsupported and I should expect problems. My first question is: is this still the case now (4 years later) - is embedded Solr discouraged? The one limitation I am running into is that I cannot use the Solr admin UI for debugging purposes (mainly for running queries). Is there any other way to do this, other than no longer using embedded Solr and programmatically firing up a web application (e.g. using Jetty)? Should I do the latter anyway? Any insights/advice greatly appreciated.
Best regards, Robert

--
Robert Krüger
Managing Partner
Lesspain GmbH & Co. KG
www.lesspain-software.com
Re: Solr groups not matching with terms in a field
Thanks Ahmet, my problem is solved. The reason for the slow performance of the facet query was not calling setRows(0); once I did that, it came back in seconds, just like the terms query.
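The setRows(0) fix corresponds to adding rows=0 to the HTTP request, so Solr returns only the facet counts and never fetches stored documents from disk. A sketch in plain Java (the base URL is a placeholder):

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Facet-only request: rows=0 suppresses document retrieval, leaving just facet counts. */
public class FacetOnlyQuery {

    static String facetOnlyUrl(String base, String q, String facetField) {
        // rows=0: no stored documents are read or returned; the response
        // contains only the facet counts (the unique terms and frequencies).
        return base + "/select?q=" + URLEncoder.encode(q, StandardCharsets.UTF_8)
             + "&rows=0&facet=true&facet.field="
             + URLEncoder.encode(facetField, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(facetOnlyUrl("http://localhost:8983/solr/mycollection",
                                        "type:1", "tenant_pool"));
    }
}
```

Without rows=0, Solr also materializes a page of full documents for every request, which is the disk-bound work Naresh was seeing.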
Re: How to select the correct number of Shards in SolrCloud
Thanks Daniel and Shawn for your valuable suggestions.

Daniel: "If you have a query and it needs to get results from 64 cores, if 63 return in 100ms but the last core is in GC pause and takes 500ms, your query will take just over 500ms." There is only a single JVM running per machine. I will get the QTime from each Solr core and check whether this is the root cause.

"Lastly, you mentioned you allocated 32Gb to solr, do you mean to the JVM heap? That's quite a lot of a 64Gb machine, you haven't left much for the page cache." Yes, 32 GB to Solr's JVM heap. I wanted to enable the filter and fieldValue caches, as most of my search queries revolve around filters and facets. I am also planning to use the document cache.

Shawn: "Each server has 8 CPU cores and 64GB of RAM. Solr requires a 6GB heap." Can you please tell me the size of your index, and of the large cold shards? Can you also suggest any tool you use for collecting statistics, such as the QTimes for queries?

Thanks, Manohar
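The heap-sizing advice from Daniel and Shawn comes down to simple arithmetic: whatever RAM the JVM heap does not claim is what the OS page cache can use for index files. A sketch with the numbers from the thread:

```java
/** RAM left for the OS page cache after the JVM heap is carved out (other processes ignored). */
public class MemoryBudget {

    static int pageCacheHeadroomGb(int totalRamGb, int jvmHeapGb) {
        return totalRamGb - jvmHeapGb;
    }

    public static void main(String[] args) {
        // Manohar's setting vs. a Shawn-style small heap, on a 64 GB box:
        System.out.println("32 GB heap leaves " + pageCacheHeadroomGb(64, 32) + " GB for the page cache");
        System.out.println(" 6 GB heap leaves " + pageCacheHeadroomGb(64, 6) + " GB for the page cache");
    }
}
```

With a 2 GB-per-shard index footprint, the smaller heap leaves enough page cache to hold the entire index in memory, which is usually worth more than larger Solr-internal caches.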
Apache Solr quickstart tutorial - error while loading main class SimplePostTool
I am following the Apache Solr quickstart tutorial http://lucene.apache.org/solr/quickstart.html. The tutorial reaches indexing a directory of rich files, which requires running java -Dauto -Drecursive org.apache.solr.util.SimplePostTool docs/. I am getting an error which says: "Could not find or load main class org.apache.solr.util.SimplePostTool", despite following the quickstart tutorial closely. I don't see how to resolve the error and proceed with the tutorial. I would whole-heartedly appreciate any help. Thanks in advance.

Regards, Shubhanshu Gupta
Re: Apache Solr quickstart tutorial - error while loading main class SimplePostTool
Hi Shubhanshu, How about this one? java -classpath dist/solr-core-*jar -Dauto -Drecursive org.apache.solr.util.SimplePostTool docs/ Ahmet On Friday, January 16, 2015 3:13 PM, Shubhanshu Gupta shubhanshu.gupt...@gmail.com wrote: I am following Apache Solr quickstart tutorial http://lucene.apache.org/solr/quickstart.html. The tutorial comes across indexing a directory of rich files which requires implementing java -Dauto -Drecursive org.apache.solr.util.SimplePostTool docs/ . I am getting an error which says: Could not find or load main class org.apache.solr.util.SimplePostTool inspite of following the quickstart tutorial closely. I am not getting how to resolve the error and proceed ahead with the tutorial. I would whole-heartedly appreciate any help. Thanks in advance. Regards, Shubhanshu Gupta LinkedIn https://www.linkedin.com/profile/view?id=310143808snapshotID=0authType=nameauthToken=cDRNtrk=NUS-body-member-namesl=NPU_REG%3Bno_results%3B-1%3Bactivity%3A5903287270026268672%3B | Twitter https://twitter.com/Shubhanshugupta
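One thing worth noting about Ahmet's command (a sketch, not from the thread): the shell, not java, expands the dist/solr-core-*jar glob into the concrete jar name, so the exact version never has to be typed. The demo below uses a stand-in jar (name borrowed from the 4.10.2 release mentioned later in this thread) and echo in place of java to make the expansion visible:

```shell
# The shell expands dist/solr-core-*jar before java ever runs.
mkdir -p /tmp/cpdemo/dist
touch /tmp/cpdemo/dist/solr-core-4.10.2.jar   # stand-in for the real jar
cd /tmp/cpdemo
# echo in place of java just prints the command the shell would actually run:
echo java -classpath dist/solr-core-*jar -Dauto -Drecursive org.apache.solr.util.SimplePostTool docs/
# prints: java -classpath dist/solr-core-4.10.2.jar -Dauto -Drecursive org.apache.solr.util.SimplePostTool docs/
```

One caveat: if several solr-core jars matched, the glob would expand to multiple words and break the -classpath option; java's own class-path wildcard, -classpath "dist/*" (quoted, base name must be exactly *), is the more robust spelling.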
Re: OutOfMemoryError for PDF document upload into Solr
On 16/01/2015 04:02, Dan Davis wrote: Why re-write all the document conversion in Java ;) Tika is very slow. 5 GB PDF is very big. Or you can run Tika in a separate process, or even on a separate machine, wrapped with something to cope if it dies due to some horrible input...we generally avoid document format translation within Solr and do it externally before feeding documents to Solr. Charlie If you have a lot of PDFs like that, try pdftotext in HTML and UTF-8 output mode. The HTML mode captures some meta-data that would otherwise be lost. If you need to go faster still, you can also write some stuff linked directly against the poppler library. Before you jump down my throat about Tika being slow - I wrote a PDF indexer that ran at 36 MB/s per core. Different indexer, all C, lots of setjmp/longjmp. But fast... On Thu, Jan 15, 2015 at 1:54 PM, ganesh.ya...@sungard.com wrote: Siegfried and Michael Thank you for your replies and help. -Original Message- From: Siegfried Goeschl [mailto:sgoes...@gmx.at] Sent: Thursday, January 15, 2015 3:45 AM To: solr-user@lucene.apache.org Subject: Re: OutOfMemoryError for PDF document upload into Solr Hi Ganesh, you can increase the heap size but parsing a 4 GB PDF document will very likely consume A LOT OF memory - I think you need to check if that large PDF can be parsed at all :-) Cheers, Siegfried Goeschl On 14.01.15 18:04, Michael Della Bitta wrote: Yep, you'll have to increase the heap size for your Tomcat container. http://stackoverflow.com/questions/6897476/tomcat-7-how-to-set-initial-heap-size-correctly Michael Della Bitta Senior Software Engineer o: +1 646 532 3062 appinions inc. 
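The pdftotext approach suggested above can be sketched as follows (a sketch, not from the thread: it assumes poppler-utils is installed, and input.pdf/output.html are placeholder names). -htmlmeta emits a simple HTML wrapper that preserves document metadata such as title and author; -enc UTF-8 forces UTF-8 output:

```shell
# Extract PDF text outside Solr with pdftotext (poppler-utils), keeping
# metadata via the HTML wrapper; skips cleanly if the tool or file is missing.
if command -v pdftotext >/dev/null 2>&1 && [ -f input.pdf ]; then
  pdftotext -htmlmeta -enc UTF-8 input.pdf output.html
else
  echo "skipping: pdftotext or input.pdf not available"
fi
```

The extracted HTML can then be posted to Solr by the application itself, which keeps the memory-hungry parsing out of the Solr JVM entirely.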
“The Science of Influence Marketing” 18 East 41st Street New York, NY 10017 t: @appinions https://twitter.com/Appinions | g+: plus.google.com/appinions https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts w: appinions.com http://www.appinions.com/ On Wed, Jan 14, 2015 at 12:00 PM, ganesh.ya...@sungard.com wrote: Hello, Can someone pass on the hints to get around following error? Is there any Heap Size parameter I can set in Tomcat or in Solr webApp that gets deployed in Solr? I am running Solr webapp inside Tomcat on my local machine which has RAM of 12 GB. I have PDF document which is 4 GB max in size that needs to be loaded into Solr Exception in thread http-apr-8983-exec-6 java.lang.OutOfMemoryError: Java heap space at java.util.AbstractCollection.toArray(Unknown Source) at java.util.ArrayList.<init>(Unknown Source) at org.apache.pdfbox.cos.COSDocument.getObjects(COSDocument.java:518) at org.apache.pdfbox.cos.COSDocument.close(COSDocument.java:575) at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:254) at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1238) at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1203) at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:111) at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242) at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242) at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120) at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:219) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:246) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1967) at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:777) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:170) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103) at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:950) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116) at
Re: OutOfMemoryError for PDF document upload into Solr
Tika 1.6 has PDFBox 1.8.4, which has memory issues, eating excessive RAM! Either upgrade to Tika 1.7 (out now) or manually use the PDFBox 1.8.8 dependency. M. On Friday 16 January 2015 15:21:55 Charlie Hull wrote: On 16/01/2015 04:02, Dan Davis wrote: Why re-write all the document conversion in Java ;) Tika is very slow. 5 GB PDF is very big. Or you can run Tika in a separate process, or even on a separate machine, wrapped with something to cope if it dies due to some horrible input...we generally avoid document format translation within Solr and do it externally before feeding documents to Solr. Charlie If you have a lot of PDFs like that, try pdftotext in HTML and UTF-8 output mode. The HTML mode captures some meta-data that would otherwise be lost. If you need to go faster still, you can also write some stuff linked directly against the poppler library. Before you jump down my throat about Tika being slow - I wrote a PDF indexer that ran at 36 MB/s per core. Different indexer, all C, lots of setjmp/longjmp. But fast... On Thu, Jan 15, 2015 at 1:54 PM, ganesh.ya...@sungard.com wrote: Siegfried and Michael Thank you for your replies and help. -Original Message- From: Siegfried Goeschl [mailto:sgoes...@gmx.at] Sent: Thursday, January 15, 2015 3:45 AM To: solr-user@lucene.apache.org Subject: Re: OutOfMemoryError for PDF document upload into Solr Hi Ganesh, you can increase the heap size but parsing a 4 GB PDF document will very likely consume A LOT OF memory - I think you need to check if that large PDF can be parsed at all :-) Cheers, Siegfried Goeschl On 14.01.15 18:04, Michael Della Bitta wrote: Yep, you'll have to increase the heap size for your Tomcat container. http://stackoverflow.com/questions/6897476/tomcat-7-how-to-set-initial-heap-size-correctly Michael Della Bitta Senior Software Engineer o: +1 646 532 3062 appinions inc. 
“The Science of Influence Marketing” 18 East 41st Street New York, NY 10017 t: @appinions https://twitter.com/Appinions | g+: plus.google.com/appinions https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts w: appinions.com http://www.appinions.com/ On Wed, Jan 14, 2015 at 12:00 PM, ganesh.ya...@sungard.com wrote: Hello, Can someone pass on the hints to get around following error? Is there any Heap Size parameter I can set in Tomcat or in Solr webApp that gets deployed in Solr? I am running Solr webapp inside Tomcat on my local machine which has RAM of 12 GB. I have PDF document which is 4 GB max in size that needs to be loaded into Solr Exception in thread http-apr-8983-exec-6 java.lang.OutOfMemoryError: Java heap space at java.util.AbstractCollection.toArray(Unknown Source) at java.util.ArrayList.<init>(Unknown Source) at org.apache.pdfbox.cos.COSDocument.getObjects(COSDocument.java:518) at org.apache.pdfbox.cos.COSDocument.close(COSDocument.java:575) at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:254) at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1238) at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1203) at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:111) at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242) at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242) at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120) at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:219) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:246) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1967) at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:777) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220) at
Re: OutOfMemoryError for PDF document upload into Solr
It would be nice to have a SolrJ-level implementation as well as a command-line implementation of the extraction request handler so that app ingestion code could do the extraction outside of Solr at the app level, and even as a separate process to stream to the app or Solr. That would permit the app to do customization, entity extraction, boiler-plate removal, etc. in app-friendly code, before transport to the Solr server. The extraction request handler is a really cool feature and quite sufficient for a lot of scenarios, but additional architectural flexibility would be a big win. -- Jack Krupansky On Fri, Jan 16, 2015 at 10:21 AM, Charlie Hull char...@flax.co.uk wrote: On 16/01/2015 04:02, Dan Davis wrote: Why re-write all the document conversion in Java ;) Tika is very slow. 5 GB PDF is very big. Or you can run Tika in a separate process, or even on a separate machine, wrapped with something to cope if it dies due to some horrible input...we generally avoid document format translation within Solr and do it externally before feeding documents to Solr. Charlie If you have a lot of PDFs like that, try pdftotext in HTML and UTF-8 output mode. The HTML mode captures some meta-data that would otherwise be lost. If you need to go faster still, you can also write some stuff linked directly against the poppler library. Before you jump down my throat about Tika being slow - I wrote a PDF indexer that ran at 36 MB/s per core. Different indexer, all C, lots of setjmp/longjmp. But fast... On Thu, Jan 15, 2015 at 1:54 PM, ganesh.ya...@sungard.com wrote: Siegfried and Michael Thank you for your replies and help. 
-Original Message- From: Siegfried Goeschl [mailto:sgoes...@gmx.at] Sent: Thursday, January 15, 2015 3:45 AM To: solr-user@lucene.apache.org Subject: Re: OutOfMemoryError for PDF document upload into Solr Hi Ganesh, you can increase the heap size but parsing a 4 GB PDF document will very likely consume A LOT OF memory - I think you need to check if that large PDF can be parsed at all :-) Cheers, Siegfried Goeschl On 14.01.15 18:04, Michael Della Bitta wrote: Yep, you'll have to increase the heap size for your Tomcat container. http://stackoverflow.com/questions/6897476/tomcat-7-how-to-set-initial-heap-size-correctly Michael Della Bitta Senior Software Engineer o: +1 646 532 3062 appinions inc. “The Science of Influence Marketing” 18 East 41st Street New York, NY 10017 t: @appinions https://twitter.com/Appinions | g+: plus.google.com/appinions https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts w: appinions.com http://www.appinions.com/ On Wed, Jan 14, 2015 at 12:00 PM, ganesh.ya...@sungard.com wrote: Hello, Can someone pass on the hints to get around following error? Is there any Heap Size parameter I can set in Tomcat or in Solr webApp that gets deployed in Solr? I am running Solr webapp inside Tomcat on my local machine which has RAM of 12 GB. 
I have PDF document which is 4 GB max in size that needs to be loaded into Solr Exception in thread http-apr-8983-exec-6 java.lang.OutOfMemoryError: Java heap space at java.util.AbstractCollection.toArray(Unknown Source) at java.util.ArrayList.<init>(Unknown Source) at org.apache.pdfbox.cos.COSDocument.getObjects(COSDocument.java:518) at org.apache.pdfbox.cos.COSDocument.close(COSDocument.java:575) at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:254) at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1238) at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1203) at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:111) at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242) at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242) at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120) at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:219) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:246) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1967) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:777) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(
Re: Query ReRanking question
As per Erick's suggestion, reposting my response to the group. Joel and Erick, Thank you very much for helping me out with the ReRanking question a while ago. I have an alternative which seems to be working better for me than ReRanking; can you kindly let me know of any pitfalls you can think of with this approach? Since we value relevancy and recency at the same time, even though the two pull in opposite directions, I thought maybe I can use function queries to adjust the boost as follows boost=max(recip(ms(NOW/HOUR,publish_date),7.889e-10,1,1),scale(query($q),0,1)) What I intended to do here is - if it matched a more recent doc it will take recency into consideration, however if the relevancy is better than the date boost we keep relevancy. What do you guys think? Thanks, Ravi Kiran Bhaskar On Mon, Sep 8, 2014 at 12:35 PM, Ravi Solr ravis...@gmail.com wrote: Joel and Erick, Thank you very much for explaining how the ReRanking works. Now it's a bit more clear. Thanks, Ravi Kiran Bhaskar On Sun, Sep 7, 2014 at 4:45 PM, Joel Bernstein joels...@gmail.com wrote: Oops wrong usage pattern. It should be: 1) Main query is sorted by a field (scores tracked silently in the background). 2) Reranker is reRanking docs based on the score from the main query. Joel Bernstein Search Engineer at Heliosearch On Sun, Sep 7, 2014 at 4:43 PM, Joel Bernstein joels...@gmail.com wrote: Ok, just reviewed the code. The ReRankingQParserPlugin always tracks the scores from the main query. So this explains things. Speaking of explaining things, the ReRankingParserPlugin also works with Lucene's explain. So if you use debugQuery=true we should see that the score from the initial query was combined with the score from the reRankQuery, which should be 1. You have stumbled on an interesting usage pattern which I never considered. But basically what's happening is: 1) Main query is sorted by score. 2) Reranker is reRanking docs based on the score from the main query. 
No worries, Erick, you've taught me a lot over the past couple of years! Joel Bernstein Search Engineer at Heliosearch On Sun, Sep 7, 2014 at 11:37 AM, Erick Erickson erickerick...@gmail.com wrote: Joel: I find that whenever I say something totally wrong publicly, I remember the correction really really well... Thanks for straightening that out! Erick On Sat, Sep 6, 2014 at 12:58 PM, Joel Bernstein joels...@gmail.com wrote: The following query: http://localhost:8080/solr/select?q=malaysian airline crash&rq={!rerank reRankQuery=$rqq reRankDocs=1000}&rqq=*:*&sort=publish_date desc&fl=headline,publish_date,score is doing the following: The main query is sorted by publish_date. Then the results are reranked by *:*, which in theory would have no effect at all. The reRankQuery is only used to re-rank the results. The sort param will always apply to the main query. Joel Bernstein Search Engineer at Heliosearch On Sat, Sep 6, 2014 at 2:33 PM, Ravi Solr ravis...@gmail.com wrote: Erick, Your idea about reversing Joel's suggestion seems to give the best results of all the options I tried...but I can't seem to understand why. I thought the query shown below should give irrelevant results as sorting by date would throw relevancy off...but somehow it's getting relevant results with fair enough reverse chronology. It is as if the sort is applied after the docs are collected and reranked (which is what I wanted). One more thing that baffled me was, if I change reRankDocs from 1000 to 100 the results become irrelevant, which doesn't make sense. So can you kindly explain what's going on in the following query. http://localhost:8080/solr/select?q=malaysian airline crash&rq={!rerank reRankQuery=$rqq reRankDocs=1000}&rqq=*:*&sort=publish_date desc&fl=headline,publish_date,score I love the solr community, so much to learn from so many knowledgeable people. 
Thanks Ravi Kiran Bhaskar On Fri, Sep 5, 2014 at 1:23 PM, Erick Erickson erickerick...@gmail.com wrote: OK, why can't you switch the clauses from Joel's suggestion? Something like: q=Malaysia plane crashrq={!rerank reRankDocs=1000 reRankQuery=$myquery}myquery=*:*sort=date+desc (haven't tried this yet, but you get the idea). Best, Erick On Fri, Sep 5, 2014 at 9:33 AM, Markus Jelsma markus.jel...@openindex.io wrote: Hi - You can already achieve this by boosting on the document's recency. The result set won't be exactly ordered by date but you will get the most relevant and recent documents on top. Markus -Original message- From:Ravi Solr ravis...@gmail.com mailto:ravis...@gmail.com Sent: Friday 5th
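To see what the constants in the boost above actually do (a worked example, not from the thread): Solr's recip(x,m,a,b) computes a / (m*x + b), and here x = ms(NOW/HOUR,publish_date) is the document age in milliseconds, so with m = 7.889e-10 a roughly two-week-old document scores about 0.5:

```shell
# Tabulate recip(x, 7.889e-10, 1, 1) for a few illustrative document ages.
awk 'BEGIN {
  m = 7.889e-10
  for (days = 0; days <= 60; days += 15) {
    x = days * 86400000                      # age in milliseconds
    printf "age %2d days -> recip = %.3f\n", days, 1 / (m * x + 1)
  }
}'
```

Since scale(query($q),0,1) normalizes the relevancy score into [0,1] as well, the max() simply keeps whichever signal is stronger for a given document: freshness or relevancy.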
Re: Apache Solr quickstart tutorial - error while loading main class SimplePostTool
Thanks a lot. It did work. One last favor - can you please explain why the old command didn't work and why this one did? Although, I do know that the command you have given assumes that I did not set the environment through: export CLASSPATH =dist/solr-core-4.10.2.jar . But I had already set the environment, and still there was no effect. Please correct me if I am wrong anywhere. Thanks. LinkedIn https://www.linkedin.com/profile/view?id=310143808snapshotID=0authType=nameauthToken=cDRNtrk=NUS-body-member-namesl=NPU_REG%3Bno_results%3B-1%3Bactivity%3A5903287270026268672%3B | Twitter https://twitter.com/Shubhanshugupta On Fri, Jan 16, 2015 at 7:26 PM, Ahmet Arslan iori...@yahoo.com.invalid wrote: Hi Shubhanshu, How about this one? java -classpath dist/solr-core-*jar -Dauto -Drecursive org.apache.solr.util.SimplePostTool docs/ Ahmet On Friday, January 16, 2015 3:13 PM, Shubhanshu Gupta shubhanshu.gupt...@gmail.com wrote: I am following Apache Solr quickstart tutorial http://lucene.apache.org/solr/quickstart.html. The tutorial comes across indexing a directory of rich files which requires implementing java -Dauto -Drecursive org.apache.solr.util.SimplePostTool docs/ . I am getting an error which says: Could not find or load main class org.apache.solr.util.SimplePostTool inspite of following the quickstart tutorial closely. I am not getting how to resolve the error and proceed ahead with the tutorial. I would whole-heartedly appreciate any help. Thanks in advance. Regards, Shubhanshu Gupta LinkedIn https://www.linkedin.com/profile/view?id=310143808snapshotID=0authType=nameauthToken=cDRNtrk=NUS-body-member-namesl=NPU_REG%3Bno_results%3B-1%3Bactivity%3A5903287270026268672%3B | Twitter https://twitter.com/Shubhanshugupta
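Not an authoritative diagnosis, but one plausible reason the export had no effect, given the command exactly as quoted above: the space before '=' makes bash treat '=dist/...' as an argument to export rather than an assignment, so CLASSPATH is exported empty and the rest is rejected as an invalid identifier. A minimal demo:

```shell
# Why `export CLASSPATH =dist/solr-core-4.10.2.jar` (space before '=') fails:
bash -c 'export CLASSPATH =dist/solr-core-4.10.2.jar' 2>/dev/null \
  && echo "broken export: ok" \
  || echo "broken export: rejected"
# prints: broken export: rejected

# Correct form: no spaces around '='.
CLASSPATH=dist/solr-core-4.10.2.jar
export CLASSPATH
echo "CLASSPATH=$CLASSPATH"
# prints: CLASSPATH=dist/solr-core-4.10.2.jar
```

Also note that an explicit -classpath flag overrides the CLASSPATH environment variable entirely, which is why Ahmet's command works regardless of what the environment holds.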
Solr example for Solr 4.10.2 gives warning about Multiple request handlers with same name
Hello, I'm running Solr 4.10.2 out of the box with the Solr example, i.e. ant example; cd solr/example; java -jar start.jar. At start-up the example logs this warning (in example/logs): WARN - 2015-01-16 12:31:40.895; org.apache.solr.core.RequestHandlers; Multiple requestHandler registered to the same name: /update ignoring: org.apache.solr.handler.UpdateRequestHandler Is this a bug? Is there something wrong with the out-of-the-box example configuration? Tom
Solr numFound > 0 but doc list empty in Solr Cloud setup
I am using the tutorial below for a Solr Cloud setup with 2 shards: http://wiki.apache.org/solr/SolrCloud#Example_A:_Simple_two_shard_cluster I am able to get the default setup working. However, I have a requirement where my index is not in the default location (data/index), and hence when I start the JVM for each shard I run with -Dsolr.data.dir=custom index path . Now when I query I get results with numFound > 0 but the doc list is always empty. I verified that my index does have fields stored and indexed (verified that by loading a single core). Anyone else faced a similar issue or have an idea on what I am missing? Appreciate any help. request: http://localhost:/solr/collection1/select?q=body%3A%22to%22wt=jsonindent=trueshards=http://localhost:/solr/collection1 response: { responseHeader: { status: 0, QTime: 18, params: { shards: http://localhost:/solr/collection1;, indent: true, q: body:\to\, _: 1421390858638, wt: json } }, response: { numFound: 2564, start: 0, maxScore: 0.4523638, docs: [] } }
Re: Query ReRanking question
Ravi: Yep, this is the standard way to have recency influence the rank rather than take over absolute ordering via a sort=date_time or similar. Of course how strongly the rank is influenced is more an art than a science as far as figuring out what actual constants to put in Best, Erick On Fri, Jan 16, 2015 at 8:03 AM, Ravi Solr ravis...@gmail.com wrote: As per Erick's suggestion reposting my response to the group. Joel and Erick Thank you very much for helping me out with the ReRanking question a while ago. I have an alternative which seems to be working better for me than ReRanking, can you kindly let me know of any pitfalls that you guys can think of about the this approach ?? Since we value relevancy recency at the same time even though both are mutually exclusive, i thought maybe I can use the function queries to adjust the boost as follows boost=max(recip(ms(NOW/HOUR,publish_date),7.889e-10,1,1),scale(query($q),0,1)) What I intended to do here is - if it matched a more recent doc it will take recency into consideration, however if the relevancy is better than date boost we keep relevancy. What do you guys think ?? Thanks, Ravi Kiran Bhaskar On Mon, Sep 8, 2014 at 12:35 PM, Ravi Solr ravis...@gmail.com wrote: Joel and Erick, Thank you very much for explaining how the ReRanking works. Now its a bit more clear. Thanks, Ravi Kiran Bhaskar On Sun, Sep 7, 2014 at 4:45 PM, Joel Bernstein joels...@gmail.com wrote: Oops wrong usage pattern. It should be: 1) Main query is sorted by a field (scores tracked silently in the background). 2) Reranker is reRanking docs based on the score from the main query. Joel Bernstein Search Engineer at Heliosearch On Sun, Sep 7, 2014 at 4:43 PM, Joel Bernstein joels...@gmail.com wrote: Ok, just reviewed the code. The ReRankingQParserPlugin always tracks the scores from the main query. So this explains things. Speaking of explaining things, the ReRankingParserPlugin also works with Lucene's explain. 
So if you use debugQuery=true we should see that the score from the initial query was combined with the score from the reRankQuery, which should be 1. You have stumbled on a interesting usage pattern which I never considered. But basically what's happening is: 1) Main query is sorted by score. 2) Reranker is reRanking docs based on the score from the main query. No, worries Erick, you've taught me a lot over the past couple of years! Joel Bernstein Search Engineer at Heliosearch On Sun, Sep 7, 2014 at 11:37 AM, Erick Erickson erickerick...@gmail.com wrote: Joel: I find that whenever I say something totally wrong publicly, I remember the correction really really well... Thanks for straightening that out! Erick On Sat, Sep 6, 2014 at 12:58 PM, Joel Bernstein joels...@gmail.com wrote: This folllowing query: http://localhost:8080/solr/select?q=malaysian airline crashrq={!rerank reRankQuery=$rqq reRankDocs=1000}rqq=*:*sort=publish_date descfl=headline,publish_date,score Is doing the following: The main query is sorted by publish_date. Then the results are reranked by *:*, which in theory would have no effect at all. The reRankQuery only uses the reRankQuery to re-rank the results. The sort param will always apply to the main query. Joel Bernstein Search Engineer at Heliosearch On Sat, Sep 6, 2014 at 2:33 PM, Ravi Solr ravis...@gmail.com wrote: Erick, Your idea about reversing Joel's suggestion seems to give the best results of all the options I tried...but I cant seem to understand why. I thought the query shown below should give irrelevant results as sorting by date would throw relevancy off...but somehow its getting relevant results with fair enough reverse chronology. It is as if the sort is applied after the docs are collected and reranked (which is what I wanted). One more thing that baffled me was, if I change reRankDocs from 1000 to100 the results become irrelevant, which doesnt make sense. So can you kindly explain whats going on in the following query. 
http://localhost:8080/solr/select?q=malaysian airline crashrq={!rerank reRankQuery=$rqq reRankDocs=1000}rqq=*:*sort=publish_date descfl=headline,publish_date,score I love the solr community, so much to learn from so many knowledgeable people. Thanks Ravi Kiran Bhaskar On Fri, Sep 5, 2014 at 1:23 PM, Erick Erickson erickerick...@gmail.com wrote: OK, why can't you switch the clauses from Joel's suggestion? Something like: q=Malaysia plane crashrq={!rerank reRankDocs=1000 reRankQuery=$myquery}myquery=*:*sort=date+desc (haven't tried this yet, but you get the idea). Best, Erick On Fri, Sep 5, 2014 at 9:33 AM, Markus Jelsma
Re: Solr numFound > 0 but doc list empty in Solr Cloud setup
Any chance that you've defined rows=0 in your handler? Or is it possible that you have not set stored=true for any of your fields? Best, Erick On Fri, Jan 16, 2015 at 9:46 AM, Jaikit Savla jaikit.sa...@yahoo.com.invalid wrote: I am using below tutorial for Solr Cloud setup with 2 shards http://wiki.apache.org/solr/SolrCloud#Example_A:_Simple_two_shard_cluster I am able to get the default set up working. However, I have a requirement where my index is not in default location (data/index) and hence when I start jvm for each shard I run with -Dsolr.data.dir=custom index path . Now when I query I get results with numFound 0 but doc list is always empty. I verified that my index does have fields stored and indexed. Anyone else faced similar issue or have an idea on what I am missing ? Verified that by loading single core. Appreciate any help. request: http://localhost:/solr/collection1/select?q=body%3A%22to%22wt=jsonindent=trueshards=http://localhost:/solr/collection1 response: { responseHeader: { status: 0, QTime: 18, params: { shards: http://localhost:/solr/collection1;, indent: true, q: body:\to\, _: 1421390858638, wt: json } }, response: { numFound: 2564, start: 0, maxScore: 0.4523638, docs: [] } }
Re: Solr numFound > 0 but doc list empty in Solr Cloud setup
Verified that all my fields are stored and marked as indexed: <field name="body" type="string" indexed="true" stored="true" multiValued="true" /> -- http://localhost:/solr/collection1/query?q=body%3A%22from%22wt=jsonindent=trueshards=http://localhost:/solr/collection1start=1rows=10shards.info=true { responseHeader: { status: 0, QTime: 19, params: { shards: http://localhost:/solr/collection1;, indent: true, start: 1, q: body:from, shards.info: true, wt: json, rows: 10 } }, shards.info: { http://localhost:/solr/collection1: { numFound: 1717, maxScore: 0.5327856, shardAddress: http://localhost:/solr/collection1;, time: 12 } }, response: { numFound: 1707, start: 1, maxScore: 0.5327856, docs: [ ] } } On Friday, January 16, 2015 9:56 AM, Erick Erickson erickerick...@gmail.com wrote: Any chance that you've defined rows=0 in your handler? Or is it possible that you have not set stored=true for any of your fields? Best, Erick On Fri, Jan 16, 2015 at 9:46 AM, Jaikit Savla jaikit.sa...@yahoo.com.invalid wrote: I am using below tutorial for Solr Cloud setup with 2 shards http://wiki.apache.org/solr/SolrCloud#Example_A:_Simple_two_shard_cluster I am able to get the default set up working. However, I have a requirement where my index is not in default location (data/index) and hence when I start jvm for each shard I run with -Dsolr.data.dir=custom index path . Now when I query I get results with numFound 0 but doc list is always empty. I verified that my index does have fields stored and indexed. Anyone else faced similar issue or have an idea on what I am missing ? Verified that by loading single core. Appreciate any help. request: http://localhost:/solr/collection1/select?q=body%3A%22to%22wt=jsonindent=trueshards=http://localhost:/solr/collection1 response: { responseHeader: { status: 0, QTime: 18, params: { shards: http://localhost:/solr/collection1;, indent: true, q: body:\to\, _: 1421390858638, wt: json } }, response: { numFound: 2564, start: 0, maxScore: 0.4523638, docs: [] } }
Re: Solr numFound > 0 but doc list empty in Solr Cloud setup
As I said earlier, the single-core setup works fine with the same solrconfig.xml and schema.xml:

cd example
java -Djetty.port= -Dsolr.data.dir=/index/path -jar start.jar

I am running Solr 4.10. Do I need to change any other configuration to run in SolrCloud mode?

On Friday, January 16, 2015 11:56 AM, Jaikit Savla jaikit.sa...@yahoo.com.INVALID wrote:

Verified that all my fields are stored and marked as indexed:

<field name="body" type="string" indexed="true" stored="true" multiValued="true" />

Request:
http://localhost:/solr/collection1/query?q=body%3A%22from%22&wt=json&indent=true&shards=http://localhost:/solr/collection1&start=1&rows=10&shards.info=true

Response:
{
  "responseHeader": {
    "status": 0,
    "QTime": 19,
    "params": {
      "shards": "http://localhost:/solr/collection1",
      "indent": "true",
      "start": "1",
      "q": "body:from",
      "shards.info": "true",
      "wt": "json",
      "rows": "10"
    }
  },
  "shards.info": {
    "http://localhost:/solr/collection1": {
      "numFound": 1717,
      "maxScore": 0.5327856,
      "shardAddress": "http://localhost:/solr/collection1",
      "time": 12
    }
  },
  "response": {
    "numFound": 1707,
    "start": 1,
    "maxScore": 0.5327856,
    "docs": []
  }
}

On Friday, January 16, 2015 9:56 AM, Erick Erickson erickerick...@gmail.com wrote:

Any chance that you've defined rows=0 in your handler? Or is it possible that you have not set stored="true" for any of your fields?

Best,
Erick

On Fri, Jan 16, 2015 at 9:46 AM, Jaikit Savla jaikit.sa...@yahoo.com.invalid wrote:

I am using the tutorial below for a SolrCloud setup with 2 shards:
http://wiki.apache.org/solr/SolrCloud#Example_A:_Simple_two_shard_cluster

I am able to get the default setup working. However, I have a requirement where my index is not in the default location (data/index), and hence when I start the JVM for each shard I run with -Dsolr.data.dir=<custom index path>. Now when I query, I get results with numFound > 0 but the doc list is always empty. I verified that my index does have fields stored and indexed (verified by loading it as a single core). Has anyone else faced a similar issue, or have an idea of what I am missing? Appreciate any help.

Request:
http://localhost:/solr/collection1/select?q=body%3A%22to%22&wt=json&indent=true&shards=http://localhost:/solr/collection1

Response:
{
  "responseHeader": {
    "status": 0,
    "QTime": 18,
    "params": {
      "shards": "http://localhost:/solr/collection1",
      "indent": "true",
      "q": "body:\"to\"",
      "_": "1421390858638",
      "wt": "json"
    }
  },
  "response": {
    "numFound": 2564,
    "start": 0,
    "maxScore": 0.4523638,
    "docs": []
  }
}
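A useful way to localize this kind of symptom is to query each shard core directly with distrib=false, which bypasses the distributed search path and returns whatever that one core itself has stored. This is a hedged sketch, not taken from the thread: the port 8983 and the core name collection1 are assumptions, since the actual ports are elided in the messages above.

```shell
# Hit one shard core directly, skipping distributed search entirely.
# If the docs come back populated here but empty through the distributed
# request, the problem is in the cloud/merge path rather than the index.
curl 'http://localhost:8983/solr/collection1/select?q=body:%22to%22&wt=json&indent=true&distrib=false&fl=*'
```

Running this against every shard in turn shows whether each core can serve its own stored fields before the results are merged.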
Re: Solr numFound > 0 but doc list empty in Solr Cloud setup
One more point: in cloud mode, if I submit a request with fl=id, it returns the doc list. But when I add any other field, I get an empty doc list.

Request:
http://localhost:/solr/select?q=domain:ebay&wt=json&shards=http://localhost:/solr/&fl=id&rows=1

Response:
{
  "responseHeader": {
    "status": 0,
    "QTime": 7,
    "params": {
      "fl": "id",
      "shards": "http://localhost:/solr/",
      "q": "domain:ebay",
      "wt": "json",
      "rows": "1"
    }
  },
  "response": {
    "numFound": 17,
    "start": 0,
    "maxScore": 3.8559604,
    "docs": [
      { "id": "d8406557-6cd8-46d9-9a5e-29844387afc4" }
    ]
  }
}

Note: all of the above works in single-core mode.
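Given that fl=id comes back populated while every other field comes back empty, it can help to check what the shard's index segments actually record for a field, independent of the schema file currently deployed. A hedged sketch using the Luke request handler (the port and core name are assumptions; the fl parameter restricts the report to one field):

```shell
# Report index-level flags for the "body" field straight from the segments.
# In the output, the "index" flags show whether the data on disk was
# actually indexed/stored at the time the documents were written.
curl 'http://localhost:8983/solr/collection1/admin/luke?fl=body&wt=json&indent=true'
```

If the segments report the field as not stored, the index was built with an older schema and needs reindexing, no matter what the current schema.xml says.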
Re: Solr example for Solr 4.10.2 gives warning about Multiple request handlers with same name
I've seen the same thing, poked around a bit, and eventually decided to ignore it. I think there may be a ticket related to that saying it's a logging bug (i.e. not a real issue), but I couldn't swear to it. -Mike

On 01/16/2015 12:36 PM, Tom Burton-West wrote:

Hello, I'm running Solr 4.10.2 out of the box with the Solr example, i.e.:

ant example
cd solr/example
java -jar start.jar

At start-up the example writes this message to the log in /example/log:

WARN - 2015-01-16 12:31:40.895; org.apache.solr.core.RequestHandlers; Multiple requestHandler registered to the same name: /update ignoring: org.apache.solr.handler.UpdateRequestHandler

Is this a bug? Is there something wrong with the out-of-the-box example configuration?

Tom
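For reference, this is the shape of configuration that genuinely produces that warning. The fragment below is hypothetical, not the shipped example config: two requestHandler elements bound to the same name in solrconfig.xml, where the later registration replaces the earlier one and Solr logs the WARN about the ignored handler.

```xml
<!-- Hypothetical solrconfig.xml fragment: both entries claim the name
     "/update", so one registration is ignored and the WARN is logged. -->
<requestHandler name="/update" class="solr.UpdateRequestHandler" />
<requestHandler name="/update" class="solr.UpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">mychain</str>
  </lst>
</requestHandler>
```

In the out-of-the-box example the duplicate comes from Solr's own handler registration rather than the user's config, which is why it is generally treated as log noise rather than a misconfiguration.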
Re: Solr numFound > 0 but doc list empty in Solr Cloud setup
Looks like a config issue to me more than anything else. Can you share your solrconfig? You will not be able to attach a file here, but you could share it via pastebin or something similar.

Also, why are you adding the shards=http://localhost:8983/solr/collection1 part to your request? You don't need to do that in most cases.

--
Anshum Gupta
http://about.me/anshumgupta
Re: Solr numFound > 0 but doc list empty in Solr Cloud setup
I followed all the steps listed here:
http://wiki.apache.org/solr/SolrCloud#Example_A:_Simple_two_shard_cluster

I have not updated solrconfig.xml; it is the same as what ships with 4.10. The only thing I added was the list of my fields in example/solr/collection1/conf/schema.xml.

@shards: if I query without that param, it returns the error below.

Request:
http://localhost:/solr/collection1/select?q=*:*

Response:
<response>
  <lst name="responseHeader">
    <int name="status">503</int>
    <int name="QTime">3</int>
    <lst name="params">
      <str name="q">*:*</str>
    </lst>
  </lst>
  <lst name="error">
    <str name="msg">no servers hosting shard:</str>
    <int name="code">503</int>
  </lst>
</response>
Re: Solr numFound > 0 but doc list empty in Solr Cloud setup
Anshum, you are right that the @shards param is not required. One of my shards was down, and once I added shards.tolerant=true the query worked without the shards param. However, the document list is still empty.

Content of solrconfig.xml: http://pastebin.com/CJxD22t1
Solr Cloud Stress Test
Hi, I am a student planning to learn about and test the features and functionality of SolrCloud as one of my projects. I would like to do stress and performance testing of SolrCloud, exercising multiple cloud features, on my local machine (16 GB RAM, 250 GB SSD, 2.2 GHz Intel Core i7). What is the recommended way to get started with it? Thanks. David
Need Debug Direction on Performance Problem
Hi all, We have a single Solr index with 3 fixed fields (one of which is tokenized on whitespace) and the rest dynamic fields (string fields, in the range of 10-20). The current index size is 2 GB with around 12 lakh docs, and the Solr nodes are 4-core, 16 GB RAM Linux machines. Write performance is good. We then tested one read query (a select query applying filter criteria on the tokenized field, reading only the score field, no grouping/faceting) in two setups:

Setup 1: single-node cloud with shards=1, replication=1. In this setup all 12 lakh docs are on the same machine. Our filter query, reading around 10 lakh docs with only the score field, takes 1 minute.

Setup 2: two-node cloud with shards=2, replication=1. In this setup 6 lakh docs are on node1 and 6 lakh on node2. The same filter query, reading around 10 lakh docs with only the score field, takes 114 minutes.

Please guide us on the possible reasons for this performance degradation after sharding the index, and on how we can check where the Solr server is spending its time before returning results.

Thanks,
Naresh
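One way to see where the time goes is Solr's debug output, which breaks query time down per search component on each shard. This is a hedged sketch, not the poster's actual query: the host, port, collection, filter field, and value are all placeholders.

```shell
# debug=timing adds a "debug" section to the response with per-component
# timings (query, facet, highlight, ...) for the prepare and process phases.
# All names below are placeholders for illustration.
curl 'http://localhost:8983/solr/collection1/select?q=*:*&fq=tokenized_field:some_value&fl=score&rows=10&wt=json&indent=true&debug=timing'
```

Comparing the timing sections from the one-shard and two-shard setups (and watching shards.info=true output) shows whether the extra time is spent inside a shard or in the distributed fetch/merge step, which is the usual suspect when a query pulls back on the order of 10 lakh documents across shards.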