Data Import Handler - resource not found - Jetty - Windows 7
Most of my experience is with Solr on Tomcat, but I recently started with Jetty. I am using Solr 4.7.0 on Windows 7. I have configured Solr properly and am able to see the admin UI as well as the velocity browse page. The DataImportHandler screen is also displayed. However, when I do a full import it fails with the following error:

INFO  - 2014-07-25 12:28:35.177; org.apache.solr.core.SolrCore; [collection1] webapp=/solr path=/dataimport params={indent=true&command=status&_=1406271515176&wt=json} status=0 QTime=0
ERROR - 2014-07-25 12:28:35.179; org.apache.solr.common.SolrException; java.io.IOException: Can't find resource 'C:/solr-4.7.0/example/solr/collection1/conf' in classpath or 'C:\solr-4.7.0\example\solr\collection1\conf'
    at org.apache.solr.core.SolrResourceLoader.openResource(SolrResourceLoader.java:342)
    at org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:134)

A few notes: my solrconfig.xml has dataimport configured, and I have used:

  <lib dir="C:/solr-4.7.0/example/solr/collection1/lib" regex="solr-dataimporthandler-4.7.0.jar" />
  <lib dir="C:/solr-4.7.0/example/solr/collection1/lib" regex="solr-dataimporthandler-extras-4.7.0.jar" />
  <lib dir="C:/solr-4.7.0/example/solr/collection1/lib" regex="mysql-connector-java-5.1.18-bin.jar" />

The jars are present at those paths. On the core admin UI I can see the correct dataDir, which is C:\solr-4.7.0\example\solr\collection1\data\

Any help would be appreciated.

Thanks,
Yavar
Re: Shuffle results a little
From what I gather about the reranking query, it would further fine-pick results rather than disperse similarities. Or am I looking at it the wrong way?
Re: Any Solr consultants available??
On 24/07/2014 01:54, Alexandre Rafalovitch wrote:
> On Thu, Jul 24, 2014 at 2:44 AM, Jack Krupansky <j...@basetechnology.com> wrote:
>> All the great Solr guys I know are quite busy.
>
> Sounds like an opportunity for somebody to put together a training
> hacker camp, similar to https://hackerbeach.org/ . Cross-train
> consultants in Solr, immediately increase their value. Do it somewhere
> on the beach or in the mountains, etc. If somebody organizes it, I
> would probably even be interested in teaching the first (newbie) part.
> And the graduation project would be a solr-consultants.com website to
> make it easier to find those same consultants later. :-)
>
> Regards,
>    Alex.
> P.s. The last issue of my newsletter had Solr big ideas. The one above
> was not in it, but it is - I believe - also viable. Contact me if it
> catches your fancy for more detailed brainstorming and notes sharing.
>
> Personal: http://www.outerthoughts.com/ and @arafalov
> Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
> Solr popularizers community: https://www.linkedin.com/groups?gid=6713853

We're definitely interested in the idea of 'growing' more Solr consultants, and eventually committers. Beaches and mountains are good too :) I think the skill shortage is a huge problem for the open source search world.

Charlie

--
Charlie Hull
Flax - Open Source Enterprise Search
tel/fax: +44 (0)8700 118334
mobile: +44 (0)7767 825828
web: www.flax.co.uk
Re: integrating Accumulo with solr
Dear Jack,

Actually I am going to do a benefit-cost analysis for in-house development versus going for Sqrrl support.

Best regards.

On Thu, Jul 24, 2014 at 11:48 PM, Jack Krupansky <j...@basetechnology.com> wrote:

Like I said, you're going to have to be a real, hard-core gunslinger to do that well. Sqrrl uses Lucene directly, BTW: "Full-Text Search: Utilizing open-source Lucene and custom indexing methods, Sqrrl Enterprise users can conduct real-time, full-text search across data in Sqrrl Enterprise." See: http://sqrrl.com/product/search/

Out of curiosity, why are you not using that integrated Lucene support of Sqrrl Enterprise?

-- Jack Krupansky

-----Original Message-----
From: Ali Nazemian
Sent: Thursday, July 24, 2014 3:07 PM
To: solr-user@lucene.apache.org
Subject: Re: integrating Accumulo with solr

Dear Jack,
Thank you. I am aware of DataStax, but I am looking for integrating Accumulo with Solr. This is something like what the Sqrrl guys offer.
Regards.

On Thu, Jul 24, 2014 at 7:27 PM, Jack Krupansky <j...@basetechnology.com> wrote:

If you are not a true hard-core gunslinger who is willing to dive in and integrate the code yourself, you should instead give serious consideration to a product such as DataStax Enterprise that fully integrates and packages a NoSQL database (Cassandra) and Solr for search. The security aspects are still a work in progress, but certainly headed in the right direction. And it has Hadoop and Spark integration as well. See: http://www.datastax.com/what-we-offer/products-services/datastax-enterprise

-- Jack Krupansky

-----Original Message-----
From: Ali Nazemian
Sent: Thursday, July 24, 2014 10:30 AM
To: solr-user@lucene.apache.org
Subject: Re: integrating Accumulo with solr

Thank you very much. Nice idea, but how can Solr and Accumulo be synchronized in this way? I know that Solr can be integrated with HDFS, and Accumulo also works on top of HDFS. So can I use HDFS as the integration point? I mean, set Solr to use HDFS as the source of documents as well as the destination of documents.
Regards.

On Thu, Jul 24, 2014 at 4:33 PM, Joe Gresock <jgres...@gmail.com> wrote:

Ali,

Sounds like a good choice. It's pretty standard to store the primary storage id as a field in Solr, so that you can search the full text in Solr and then retrieve the full document elsewhere. I would recommend creating a document structure in Solr with whatever fields you want indexed (most likely as text_en, etc.), and then store a string field named content_id, which would be the Accumulo row id that you look up with a scan.

One caveat -- Accumulo will be protected at the cell level, but if you need your Solr search results to be protected by complex authorization strings similar to Accumulo's, you will need to write your own QParserPlugin and use post filtering:
http://java.dzone.com/articles/custom-security-filtering-solr

The code you see in that article is written for an earlier version of Solr, but it's not too difficult to adjust it for the latest (we've done so in our project). Once you've implemented this, you would store an authorizations string field in each Solr document, and pass the authorizations that the user has access to in the fq parameter of every query. It's also not too bad to write something that parses the Accumulo authorizations string (like A&B&(C|D|E|F)) and interprets it accordingly in the QParserPlugin. This will give you true row-level security in Solr and Accumulo, and it performs quite well in Solr.

Let me know if you have any other questions.
Joe

On Thu, Jul 24, 2014 at 4:07 AM, Ali Nazemian <alinazem...@gmail.com> wrote:

Dear Joe,
Hi, I am going to store crawled web pages in Accumulo as the main storage part of my project, and I need to give these data to Solr for indexing and user searches. I need to do some social and web analysis on my data, as well as having some security features. Therefore Accumulo is my choice for the database part, and for indexing and search I am going to use Solr. Would you please guide me through that?

On Thu, Jul 24, 2014 at 1:28 AM, Joe Gresock <jgres...@gmail.com> wrote:

We store data in both Solr and Accumulo -- do you have more details about what kind of data and indexing you want? Is there a reason you're thinking of using both databases in particular?

On Wed, Jul 23, 2014 at 5:17 AM, Ali Nazemian <alinazem...@gmail.com> wrote:

Dear All,
Hi, I was wondering, is there anybody out there who has tried to integrate Solr with Accumulo? I was thinking about using Accumulo on top of HDFS and using Solr to index the data inside Accumulo. Do you have any idea how I can do such an integration?
Best regards.

--
A.Nazemian

--
I know what it is to be in need, and I know what it is to have plenty. I have learned the secret of being
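For readers following the post-filter suggestion above, here is a minimal, untested sketch of that approach against the Solr 4.x plugin API. The class names, the userAuths parameter, and the authorization check itself are illustrative assumptions, not code from the linked article:

  import java.io.IOException;

  import org.apache.lucene.search.IndexSearcher;
  import org.apache.lucene.search.Query;
  import org.apache.solr.common.params.SolrParams;
  import org.apache.solr.common.util.NamedList;
  import org.apache.solr.request.SolrQueryRequest;
  import org.apache.solr.search.DelegatingCollector;
  import org.apache.solr.search.ExtendedQueryBase;
  import org.apache.solr.search.PostFilter;
  import org.apache.solr.search.QParser;
  import org.apache.solr.search.QParserPlugin;

  public class AuthQParserPlugin extends QParserPlugin {
    @Override
    public void init(NamedList args) {}

    @Override
    public QParser createParser(String qstr, SolrParams localParams,
                                SolrParams params, SolrQueryRequest req) {
      return new QParser(qstr, localParams, params, req) {
        @Override
        public Query parse() {
          // The caller passes the user's authorizations as a request parameter.
          return new AuthQuery(params.get("userAuths"));
        }
      };
    }
  }

  class AuthQuery extends ExtendedQueryBase implements PostFilter {
    private final String userAuths;

    AuthQuery(String userAuths) {
      this.userAuths = userAuths;
    }

    @Override
    public boolean getCache() {
      return false; // post filters must not be cached
    }

    @Override
    public int getCost() {
      return Math.max(super.getCost(), 100); // cost >= 100 marks it as a post filter
    }

    @Override
    public DelegatingCollector getFilterCollector(IndexSearcher searcher) {
      return new DelegatingCollector() {
        @Override
        public void collect(int doc) throws IOException {
          // Only pass documents down the chain when the user's authorizations
          // satisfy the document's stored visibility expression.
          if (isAuthorized(doc)) {
            super.collect(doc);
          }
        }

        private boolean isAuthorized(int doc) {
          // Placeholder: look up the document's "auths" field (e.g. via doc
          // values) and evaluate an Accumulo-style expression such as
          // A&B&(C|D|E|F) against the comma-separated userAuths.
          return true;
        }
      };
    }
  }

Registered as a query parser named auth in solrconfig.xml, it would be invoked as fq={!auth}&userAuths=A,B,C on every query.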
Re: Any Solr consultants available??
Well, if we do it in England, we could hire out a castle, I bet. :-) I am flexible on my holiday locations. And probably easier to do the first one in English.

We can continue this on direct email, on the LinkedIn group (perfect place probably) and/or on the margins of the Solr Revolution. Target next spring/summer for the week-long event, work backwards from there. Talk to http://www.techstars.com/program/locations/london/ to specifically target the startups, etc.

Regards,
   Alex.

Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853

On Fri, Jul 25, 2014 at 3:17 PM, Charlie Hull <char...@flax.co.uk> wrote:
> On 24/07/2014 01:54, Alexandre Rafalovitch wrote:
>> On Thu, Jul 24, 2014 at 2:44 AM, Jack Krupansky <j...@basetechnology.com> wrote:
>>> All the great Solr guys I know are quite busy.
>>
>> Sounds like an opportunity for somebody to put together a training
>> hacker camp, similar to https://hackerbeach.org/ . Cross-train
>> consultants in Solr, immediately increase their value.
>
> We're definitely interested in the idea of 'growing' more Solr
> consultants, and eventually committers. Beaches and mountains are good
> too :) I think the skill shortage is a huge problem for the open source
> search world.
>
> Charlie
Re: spatial search: find result in bbox OR first result outside bbox
Thanks a lot for your answer David! I'll check that out.
Elisabeth

2014-07-24 20:28 GMT+02:00 david.w.smi...@gmail.com <david.w.smi...@gmail.com>:

Hi Elisabeth,

Sorry for not responding sooner; I forgot.

You're in need of some spatial nearest-neighbor code I wrote, but it isn't open-sourced yet. It works on the RPT grid. Anyway, you should consider doing this in two searches: the first query tries the bbox provided, and if that returns nothing, then issue a second for the closest result within a 1000 km distance. The first query is straightforward, as documented. The second would be close to what you gave in your example, but sort by distance and return rows=1. It will *not* compute the distance to every document, just those within the 1000 km radius plus some internal grid squares *if* you use spatial RPT ("location_rpt" in the example schema). But use LatLonType for optimal sorting performance, not RPT.

With respect to doing this in one search vs. two, that would involve writing a custom request handler. I have a patch to make this easier: https://issues.apache.org/jira/browse/SOLR-5005. If in your case there are absolutely no other filters and it's not a distributed search (no sharding), then you could approach this with a custom query parser that generates and executes one query to know whether it should return that query or the fallback.

Please let me know how this goes.

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley

On Tue, Jul 22, 2014 at 3:12 AM, elisabeth benoit <elisaelisael...@gmail.com> wrote:
> Hello,
>
> I am using Solr 4.2.1 and have the following use case: I should find
> results inside a bbox OR, if there is none, the first result outside
> the bbox within a 1000 km distance. I was wondering what is the best
> way to proceed. I was considering doing a geofilt search from the
> center of my bounding box and post-filtering the results:
>
>   fq={!geofilt sfield=store}&pt=45.15,-93.85&d=1000
>
> From a performance point of view I don't think it's a good solution
> though, since Solr will have to calculate every document's distance,
> then sort. I was wondering if there was another way to do this and
> avoid sending more than one request to Solr.
>
> Thanks,
> Elisabeth
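To make the two-search approach concrete, the requests might look roughly like this (sfield, pt, and d are taken from Elisabeth's example; the rectangle corners are placeholders). First, the bbox-only query, using the LatLonType rectangle range syntax:

  q=*:*&fq=store:[45.0,-94.0 TO 45.3,-93.5]

Then, only if that returns numFound=0, the fallback for the single closest document within 1000 km:

  q=*:*&sfield=store&pt=45.15,-93.85&d=1000&fq={!geofilt}&sort=geodist() asc&rows=1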
Facing issue while implementing connection pooling with solr
I have a requirement where I want to limit the number of concurrent calls to Solr to, say, 50. So I am trying to implement connection pooling in the HTTP client, which is then used in the HttpSolrServer object. Please find the code below:

  HttpClient httpclient = new DefaultHttpClient();
  httpclient.getParams().setParameter(
      HttpClientUtil.PROP_MAX_CONNECTIONS_PER_HOST, 50);
  httpclient.getParams().setParameter(
      HttpClientUtil.PROP_MAX_CONNECTIONS, 50);
  HttpSolrServer httpSolrServer = new HttpSolrServer("<solr url>", httpclient);
  SolrQuery solrQuery = new SolrQuery("*:*");
  for (int i = 0; i < 1; i++) {
      long numFound = httpSolrServer.query(solrQuery).getResults()
              .getNumFound();
      System.out.println(numFound);
  }

I was expecting only 50 connections to be created from my application to Solr, and then probably some slowness until the older connections are freed. However, at every regular interval a new connection is created despite there being waiting connections at the Solr end, and those connections are never used again.

Example output:

  tcp 0 0 192.168.0.241:22       192.168.0.109:54120    ESTABLISHED
  tcp 0 0 :::192.168.0.241:8190  :::192.168.0.109:47382 TIME_WAIT
  tcp 0 0 :::192.168.0.241:8190  :::192.168.0.109:47383 ESTABLISHED
  tcp 0 0 :::192.168.0.241:8190  :::192.168.0.109:47371 TIME_WAIT
  tcp 0 0 :::192.168.0.241:8190  :::192.168.0.109:47381 TIME_WAIT

where 109 is the IP where my application runs and 241 is the IP where Solr runs. In this case :::192.168.0.109:47382 will never be used again, and it is finally terminated by Solr.

Am I going wrong somewhere? Any help will be highly appreciated.
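One likely cause, noted here as a hedged aside rather than a reply from the thread: the PROP_MAX_CONNECTIONS* keys are interpreted by SolrJ's HttpClientUtil when it constructs a client, not by DefaultHttpClient's generic parameter map, so setting them on an already-built DefaultHttpClient does not reconfigure its connection manager. A sketch of the HttpClientUtil route (SolrJ 4.x API; the URL is a placeholder):

  import org.apache.http.client.HttpClient;
  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.impl.HttpClientUtil;
  import org.apache.solr.client.solrj.impl.HttpSolrServer;
  import org.apache.solr.common.params.ModifiableSolrParams;

  public class PooledSolrClient {
    public static void main(String[] args) throws Exception {
      // createClient() installs a pooling connection manager and applies
      // these limits to it.
      ModifiableSolrParams params = new ModifiableSolrParams();
      params.set(HttpClientUtil.PROP_MAX_CONNECTIONS, 50);
      params.set(HttpClientUtil.PROP_MAX_CONNECTIONS_PER_HOST, 50);
      HttpClient httpClient = HttpClientUtil.createClient(params);

      // Placeholder URL.
      HttpSolrServer server =
          new HttpSolrServer("http://localhost:8983/solr/collection1", httpClient);
      SolrQuery query = new SolrQuery("*:*");
      System.out.println(server.query(query).getResults().getNumFound());
    }
  }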
Re: Shuffle results a little
Query ReRanking is built on the RankQuery API. With the RankQuery API you can build and plug in your own ranking algorithms. Here's a blog post describing the RankQuery API:

http://heliosearch.org/solrs-new-rankquery-feature/

Joel Bernstein
Search Engineer at Heliosearch

On Fri, Jul 25, 2014 at 4:11 AM, babenis <babe...@gmail.com> wrote:
> From what I gather about the reranking query, it would further
> fine-pick results rather than disperse similarities. Or am I looking at
> it the wrong way?
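For reference (an editor's sketch, not from the thread): the stock re-ranking built on this API is exposed through the rerank query parser, used along these lines, where the field name and boost query are placeholders:

  q=television&rq={!rerank reRankQuery=$rqq reRankDocs=1000 reRankWeight=3}&rqq=brand:sony

reRankDocs caps how many of the top results get re-scored; documents below that cutoff keep their original order, which is why re-ranking refines the head of the result list rather than shuffling it.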
Re: Any Solr consultants available??
Any or all of the above, and more.

OTOH, how many people are there out there who want to become Solr consultants but aren't already either doing it, or at least in the process of coming up to speed, or maybe just not cut out for it?

But then there are the kids in school. Maybe we need to get more professors interested in Solr (or do they prefer Elasticsearch?!) and assigning projects? And maybe the problem is that a lot of the need is in departments outside of CS (the people with actual data needs), but Solr is just too... difficult... for a lot of non-CS students to casually pick up.

I sense the difficulty is that Solr is too much of a complex toolkit rather than a packaged product. For example, take the recent inquiry about queries for compound and split terms - it's not automatic and OOB in Solr, and there is no obvious and simple solution. Lots of things in Solr are like that.

-- Jack Krupansky

-----Original Message-----
From: Alexandre Rafalovitch
Sent: Friday, July 25, 2014 4:52 AM
To: solr-user
Subject: Re: Any Solr consultants available??

Well, if we do it in England, we could hire out a castle, I bet. :-) I am flexible on my holiday locations. And probably easier to do the first one in English.

We can continue this on direct email, on the LinkedIn group (perfect place probably) and/or on the margins of the Solr Revolution. Target next spring/summer for the week-long event, work backwards from there. Talk to http://www.techstars.com/program/locations/london/ to specifically target the startups, etc.

Regards,
   Alex.

Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853

On Fri, Jul 25, 2014 at 3:17 PM, Charlie Hull <char...@flax.co.uk> wrote:
> On 24/07/2014 01:54, Alexandre Rafalovitch wrote:
>> On Thu, Jul 24, 2014 at 2:44 AM, Jack Krupansky <j...@basetechnology.com> wrote:
>>> All the great Solr guys I know are quite busy.
>>
>> Sounds like an opportunity for somebody to put together a training
>> hacker camp, similar to https://hackerbeach.org/ . Cross-train
>> consultants in Solr, immediately increase their value.
>
> We're definitely interested in the idea of 'growing' more Solr
> consultants, and eventually committers. Beaches and mountains are good
> too :) I think the skill shortage is a huge problem for the open source
> search world.
>
> Charlie
Re: Java heap space error
On 7/24/2014 7:53 AM, Ameya Aware wrote:
> I did not make any other change than this.. rest of the settings are
> default. Do i need to set garbage collection strategy?

The collector chosen and its tuning params can have a massive impact on performance, but it will make no difference at all if you are getting OutOfMemoryError exceptions. That error means the program is trying to allocate more memory than it has been told it can allocate. Changing the garbage collector will not change Java's response when the program wants to allocate too much memory.

The odd location of the commas at the start of this thread makes it hard to understand exactly what numbers you were trying to give, but I think you were saying that you were trying to index 2,00,000 documents and it died after indexing 15000. How big was the Solr index before you started indexing, both in number of documents and in disk space consumed?

How are you doing the indexing? Is it being done with requests to the /update handler, or are you using the dataimport handler to import from somewhere, like a database? Is it a single index, or distributed? Are you running in normal mode or SolrCloud? Can you share your solrconfig.xml file so we can look for possible problems?

I already gave you a wiki URL that lists possible reasons for needing a very large heap, and some things you can do to reduce the requirements.

Thanks,
Shawn
Re: To warm the whole cache of Solr other than the only autowarmcount
On 7/24/2014 8:45 PM, YouPeng Yang wrote:
> To Matt
>   Thank you, your opinion is very valuable. I have checked the source
> code for how the caches warm up. It seems to just put items from the
> old caches into the new caches.
>   I will pull Mark Miller into this discussion. He is one of the
> developers of Solr whom I have contacted.
>
> To Mark Miller
>   Would you please check out what we are discussing in the last two
> posts? I need your help.

Matt is completely right. Any commit can drastically change the Lucene document id numbers, and it would be too expensive to determine which numbers haven't changed. That means Solr must throw away all cache information on commit.

Two of Solr's caches support autowarming. Those caches use queries as keys and results as values. Autowarming works by re-executing the top N queries (keys) from the old cache to obtain fresh Lucene document id numbers (values). The cache code does take *keys* from the old cache for the new cache, but not *values*. I'm very sure about this, as I wrote the current (and not terribly good) LFUCache.

Thanks,
Shawn
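For readers wondering where the "top N" comes from: autowarming is configured per cache in solrconfig.xml through the autowarmCount attribute on the filterCache and queryResultCache entries. An illustrative fragment (the sizes and counts are made-up values, not from this thread):

  <filterCache class="solr.FastLRUCache"
               size="512"
               initialSize="512"
               autowarmCount="128"/>
  <queryResultCache class="solr.LRUCache"
                    size="512"
                    initialSize="512"
                    autowarmCount="64"/>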
Re: Data Import Handler - resource not found - Jetty - Windows 7
On 7/25/2014 1:06 AM, Yavar Husain wrote:
> Most of my experience is with Solr on Tomcat, but I recently started
> with Jetty. I am using Solr 4.7.0 on Windows 7. I have configured Solr
> properly and am able to see the admin UI as well as the velocity browse
> page. The DataImportHandler screen is also displayed. However, when I
> do a full import it fails with the following error:
>
> INFO  - 2014-07-25 12:28:35.177; org.apache.solr.core.SolrCore; [collection1] webapp=/solr path=/dataimport params={indent=true&command=status&_=1406271515176&wt=json} status=0 QTime=0
> ERROR - 2014-07-25 12:28:35.179; org.apache.solr.common.SolrException; java.io.IOException: Can't find resource 'C:/solr-4.7.0/example/solr/collection1/conf' in classpath or 'C:\solr-4.7.0\example\solr\collection1\conf'
>     at org.apache.solr.core.SolrResourceLoader.openResource(SolrResourceLoader.java:342)
>     at org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:134)

In 4.7.0, line 134 of DataImportHandler.java is concerned with locating the config file for the dataimport handler. In the following excerpt from a solrconfig.xml file included with Solr, the config file is db-data-config.xml. What do you have for this in your solrconfig.xml?

  <requestHandler name="/dataimport"
      class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
      <str name="config">db-data-config.xml</str>
    </lst>
  </requestHandler>

Thanks,
Shawn
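For comparison (an editor's illustration, not Yavar's actual file), a minimal db-data-config.xml for a MySQL source looks roughly like this; the driver, URL, credentials, table, and field names are all placeholders:

  <dataConfig>
    <dataSource type="JdbcDataSource"
                driver="com.mysql.jdbc.Driver"
                url="jdbc:mysql://localhost:3306/mydb"
                user="dbuser" password="dbpass"/>
    <document>
      <entity name="item" query="SELECT id, name FROM item">
        <field column="id" name="id"/>
        <field column="name" name="name"/>
      </entity>
    </document>
  </dataConfig>

If the config value in the request handler were blank or missing, the resource loader would end up being asked to open the conf directory itself, which is one plausible reading of the "Can't find resource ...collection1/conf" message above.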
Re: Understanding the Debug explanations for Query Result Scoring/Ranking
Thank you Uwe. Unfortunately, I could not get the explain.solr.pl website to work. I always get an error saying: "Ops. We have internal server error. This event was logged. We will try fix this soon. We are sorry for inconvenience."

At this point, I know that I need some technical background to understand how these numbers are calculated. However, even with that, I am sure that the format of this output is not obvious. I am curious about the documentation of this output format; it seems to be unintelligible. If it is not documented anywhere, can someone point me to the class that produces this output?

Thank you,
O. O.

an6 wrote:
> Hi, to get an idea of the meaning of all these numbers, have a look at
> http://explain.solr.pl. I like this tool, it's great.
> Uwe
Re: Understanding the Debug explanations for Query Result Scoring/Ranking
The format of the XML explain output is not indented or very readable. When I really need to see the explain indented, I use wt=ruby&indent=true. (I don't think the indent parameter is relevant for the explain output, but I use it anyway.)

    Erik

On Jul 25, 2014, at 10:11 AM, O. Olson <olson_...@yahoo.it> wrote:
> Thank you Uwe. Unfortunately, I could not get the explain.solr.pl
> website to work. I always get an error saying: "Ops. We have internal
> server error. This event was logged. We will try fix this soon. We are
> sorry for inconvenience."
>
> At this point, I know that I need some technical background to
> understand how these numbers are calculated. However, even with that, I
> am sure that the format of this output is not obvious. I am curious
> about the documentation of this output format; it seems to be
> unintelligible. If it is not documented anywhere, can someone point me
> to the class that produces this output?
>
> Thank you,
> O. O.
>
> an6 wrote:
>> Hi, to get an idea of the meaning of all these numbers, have a look at
>> http://explain.solr.pl. I like this tool, it's great.
>> Uwe
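Concretely, the kind of request Erik describes would look something like this (host, port, core name, and query are placeholders):

  http://localhost:8983/solr/collection1/select?q=televisions&debug=true&wt=ruby&indent=true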
Re: Java heap space error
On Jul 25, 2014, at 9:13 AM, Shawn Heisey <s...@elyograg.org> wrote:
> On 7/24/2014 7:53 AM, Ameya Aware wrote:
> The odd location of the commas in the start of this thread make it hard
> to understand exactly what numbers you were trying to say

On Jul 24, 2014, at 9:32 AM, Ameya Aware <ameya.aw...@gmail.com> wrote:
> I am in process of indexing around 2,00,000 documents.

1 Lakh (aka Lac) = 10^5 is written as 1,00,000 - so 2,00,000 is 200,000. The notation is used in Bangladesh, India, Myanmar, Nepal, Pakistan, and Sri Lanka, by roughly 1/4 of the world's population.

http://en.wikipedia.org/wiki/Lakh
Re: Slow inserts when using Solr Cloud
I've built and installed the latest snapshot of Solr 4.10 using the same SolrCloud configuration, and that gave me a tenfold increase in throughput, so it certainly looks like SOLR-6136 was the issue causing my slow insert rate/high latency with shard routing and replicas. Thanks for your help.

Timothy Potter wrote:
> Hi Ian,
>
> What's the CPU doing on the leader? Have you tried attaching a profiler
> to the leader while running and seeing if there are any hotspots
> showing? Not sure if this is related, but we recently fixed an issue in
> the area of leader forwarding to replica that used too many CPU cycles
> inefficiently - see SOLR-6136.
>
> Tim
Solr Full Import frozen after indexing a fixed number of records
I have Apache Solr hosted on my Apache Tomcat server, with a SQL Server backend.

Details:

Solr version:
  Solr Specification Version: 3.4.0.2012.01.23.14.08.01
  Solr Implementation Version: 3.4
  Lucene Specification Version: 3.4
  Lucene Implementation Version: 3.4
Tomcat version: Apache Tomcat/6.0.18
OS details: SUSE Linux Enterprise Server 11 (x86_64)

After I run a full import, indexing proceeds successfully but seems to freeze every time after fetching a fixed number of records. What I mean is, after it fetches 10730 records it just freezes and doesn't process any more.

Excerpt from the dataimport status:

  <lst name="statusMessages">
    <str name="Time Elapsed">0:15:31.959</str>
    <str name="Total Requests made to DataSource">0</str>
    <str name="Total Rows Fetched">10730</str>
    <str name="Total Documents Processed">3579</str>
    <str name="Total Documents Skipped">0</str>
    <str name="Full Dump Started">2014-07-25 10:44:39</str>

This seems to happen every time. I checked the Tomcat log; the following is the excerpt from when Solr freezes:

  INFO: Generating record for Unique ID :null attachment Ref:null parent ref :nullexecuted by thread:25
  Jul 25, 2014 10:53:31 AM org.apache.solr.update.processor.LogUpdateProcessor processAdd
  FINE: add AH_12345
  Jul 25, 2014 10:53:31 AM org.apache.solr.handler.dataimport.DocBuilder$EntityRunner runAThread
  INFO: Generating record for Unique ID :null attachment Ref:null parent ref :nullexecuted by thread:26
  Jul 25, 2014 10:53:31 AM org.apache.solr.update.processor.LogUpdateProcessor processAdd
  FINE: add AH_23451
  Jul 25, 2014 10:53:34 AM org.apache.solr.core.SolrCore execute
  INFO: [calls] webapp=/solr path=/dataimport params={} status=0 QTime=0
  Jul 25, 2014 10:53:36 AM org.apache.solr.core.SolrCore execute
  INFO: [calls] webapp=/solr path=/dataimport params={} status=0 QTime=0
  Jul 25, 2014 10:53:38 AM org.apache.solr.core.SolrCore execute
  INFO: [calls] webapp=/solr path=/dataimport params={} status=0 QTime=0

Help appreciated.

Regards,
Aniket
Re: Understanding the Debug explanations for Query Result Scoring/Ranking
Thank you very much Erik. This is exactly what I was looking for. While at the moment I have no clue about these numbers, the ruby formatting makes them much easier to understand.

Thanks to you too, Koji. I'm sorry I did not acknowledge you before; I think Erik's solution is what I was looking for.

O. O.

Erik Hatcher-4 wrote:
> The format of the XML explain output is not indented or very readable.
> When I really need to see the explain indented, I use wt=ruby&indent=true.
> (I don't think the indent parameter is relevant for the explain output,
> but I use it anyway.)
>
>     Erik
java.lang.OutOfMemoryError: Requested array size exceeds VM limit
Hi,

I am in the process of indexing a lot of documents, but after around 9 documents I am getting the error below:

  java.lang.OutOfMemoryError: Requested array size exceeds VM limit

I am passing the following parameters to Solr:

  java -Xms6144m -Xmx6144m -XX:MaxPermSize=512m -Dcom.sun.management.jmxremote
   -XX:+UseParNewGC -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC
   -XX:+CMSIncrementalMode -XX:+CMSParallelRemarkEnabled
   -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=70
   -XX:ConcGCThreads=6 -XX:ParallelGCThreads=6 -jar start.jar

Also, I am auto-committing after 20,000 documents.

I searched on Google for this but could not find any specific answer. Can anybody help with this?

Thanks,
Ameya
Re: java.lang.OutOfMemoryError: Requested array size exceeds VM limit
Would you include the entire stack trace for your OOM message? Are you seeing this on the client or the server side?

Thanks,
Greg

On Jul 25, 2014, at 10:21 AM, Ameya Aware <ameya.aw...@gmail.com> wrote:
> Hi,
>
> I am in the process of indexing a lot of documents, but after around 9
> documents I am getting the error below:
>
>   java.lang.OutOfMemoryError: Requested array size exceeds VM limit
>
> I am passing the following parameters to Solr:
>
>   java -Xms6144m -Xmx6144m -XX:MaxPermSize=512m -Dcom.sun.management.jmxremote
>    -XX:+UseParNewGC -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC
>    -XX:+CMSIncrementalMode -XX:+CMSParallelRemarkEnabled
>    -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=70
>    -XX:ConcGCThreads=6 -XX:ParallelGCThreads=6 -jar start.jar
>
> Also, I am auto-committing after 20,000 documents.
>
> I searched on Google for this but could not find any specific answer.
> Can anybody help with this?
>
> Thanks,
> Ameya
RE: java.lang.OutOfMemoryError: Requested array size exceeds VM limit
You might consider looking at your internal Solr cache configuration (solrconfig.xml). These caches occupy heap space and, from my understanding, do not overflow to disk, so if there is not enough heap memory to support the caches, an OOM error will be thrown.

I also believe these caches live in the old generation, so you might consider decreasing your CMSInitiatingOccupancyFraction to trigger a GC sooner. Based on your description below, every 20,000 documents your caches will be invalidated and rebuilt as part of a commit, so a GC that occurs sooner may help free the memory of the old caches.

Matt

-----Original Message-----
From: Ameya Aware [mailto:ameya.aw...@gmail.com]
Sent: Friday, July 25, 2014 9:22 AM
To: solr-user@lucene.apache.org
Subject: java.lang.OutOfMemoryError: Requested array size exceeds VM limit

Hi,

I am in the process of indexing a lot of documents, but after around 9 documents I am getting the error below:

  java.lang.OutOfMemoryError: Requested array size exceeds VM limit

I am passing the following parameters to Solr:

  java -Xms6144m -Xmx6144m -XX:MaxPermSize=512m -Dcom.sun.management.jmxremote
   -XX:+UseParNewGC -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC
   -XX:+CMSIncrementalMode -XX:+CMSParallelRemarkEnabled
   -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=70
   -XX:ConcGCThreads=6 -XX:ParallelGCThreads=6 -jar start.jar

Also, I am auto-committing after 20,000 documents.

I searched on Google for this but could not find any specific answer. Can anybody help with this?

Thanks,
Ameya
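For example, lowering the threshold from the 70 in Ameya's flags would look like this (50 is an illustrative value, not a recommendation from the thread):

  -XX:CMSInitiatingOccupancyFraction=50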
Re: java.lang.OutOfMemoryError: Requested array size exceeds VM limit
Please find the entire stack trace below:

ERROR - 2014-07-25 13:14:22.202; org.apache.solr.common.SolrException; null:java.lang.RuntimeException: java.lang.OutOfMemoryError: Requested array size exceeds VM limit
    at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:790)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:439)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
    at org.eclipse.jetty.server.Server.handle(Server.java:368)
    at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
    at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
    at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
    at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
    at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:636)
    at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
    at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
    at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
    at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.OutOfMemoryError: Requested array size exceeds VM limit
    at java.util.Arrays.copyOf(Unknown Source)
    at java.lang.AbstractStringBuilder.expandCapacity(Unknown Source)
    at java.lang.AbstractStringBuilder.ensureCapacityInternal(Unknown Source)
    at java.lang.AbstractStringBuilder.append(Unknown Source)
    at java.lang.StringBuilder.append(Unknown Source)
    at org.apache.solr.handler.extraction.SolrContentHandler.characters(SolrContentHandler.java:303)
    at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
    at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)
    at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
    at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
    at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
    at org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)
    at org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)
    at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)
    at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)
    at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:278)
    at org.apache.tika.parser.txt.TXTParser.parse(TXTParser.java:88)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
    at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:219)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:241)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1952)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:774)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
    at
Solr MoreLikeThis returns no match while the source document is in Solr
Hi,

I issued a MoreLikeThis query using the uniquekey of a source document, and I got no match, as below (even though I can select this document fine in Solr):

  <?xml version="1.0" encoding="UTF-8"?>
  <response>
    <lst name="responseHeader">
      <int name="status">0</int>
      <int name="QTime">0</int>
    </lst>
    <result name="match" numFound="0" start="0" maxScore="0.0"/>
    <null name="response"/>
    <lst name="interestingTerms"/>
  </response>

The query is like this:

  http://localhost:8080/solr/dbcollection_1/mlt?q=uniquekey:20320

However, using select instead of MLT, this document did return:

  http://localhost:8080/solr/dbcollection_1/select?q=uniquekey:20320

When I tried another uniquekey with almost the same document content, Solr returned a match and similar jobs:

  http://localhost:8080/solr/dbcollection_1/mlt?q=uniquekey:20321

When I tried another MLT query where there is an empty value on the matching field, no similar jobs were returned, as expected, but the match document itself was nonetheless returned, also as expected.

What could cause an MLT query to return <result name="match" numFound="0" ...> when we can select the document fine?

Thanks!
Daniel
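As an aside (the thread does not show the MLT handler configuration): a request of this kind usually also specifies which fields to build the "more like this" query from, along the lines of the following, where the field name and thresholds are placeholders:

  http://localhost:8080/solr/dbcollection_1/mlt?q=uniquekey:20320&mlt.fl=description&mlt.mintf=1&mlt.mindf=1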
Re: Solr MoreLikeThis returns no match while the source document is in Solr
Hi,

These might help you:
https://issues.apache.org/jira/browse/SOLR-4414
https://issues.apache.org/jira/browse/SOLR-5480
https://issues.apache.org/jira/browse/SOLR-6248

On Fri, Jul 25, 2014 at 11:58 AM, Donglin Chen <daniel.chen@gmail.com> wrote:
> Hi,
>
> I issued a MoreLikeThis query using the uniquekey of a source document,
> and I got no match, as below (even though I can select this document
> fine in Solr).
> [...]
> What could cause an MLT query to return <result name="match"
> numFound="0" ...> when we can select the document fine?
>
> Thanks!
> Daniel

--
Anshum Gupta
http://www.anshumgupta.net
Re: java.lang.OutOfMemoryError: Requested array size exceeds VM limit
Using Tika to extract documents or content is something I don't have experience with, but it looks like your issue is in that process. If you're able to reproduce this issue near the same place every time, maybe you've got a document that has a lot of nested fields in it or otherwise causes the extractor/update processor to do something weird.

Thanks,
Greg

On Jul 25, 2014, at 12:32 PM, Ameya Aware <ameya.aw...@gmail.com> wrote:
> Please find the entire stack trace below:
>
> ERROR - 2014-07-25 13:14:22.202; org.apache.solr.common.SolrException;
> null:java.lang.RuntimeException: java.lang.OutOfMemoryError: Requested
> array size exceeds VM limit
> [... same stack trace as quoted in full above ...]
Re: Solr MoreLikeThis returns no match while the source document is in Solr
Thank you Anshum! The links help.

Daniel

On Fri, Jul 25, 2014 at 3:07 PM, Anshum Gupta <ans...@anshumgupta.net> wrote:
> Hi,
>
> These might help you:
> https://issues.apache.org/jira/browse/SOLR-4414
> https://issues.apache.org/jira/browse/SOLR-5480
> https://issues.apache.org/jira/browse/SOLR-6248
Re: Understanding the Debug explanations for Query Result Scoring/Ranking
The formatting is one thing, but ultimately it is just a giant expression, one for each document. The expression is computing the score, based on your chosen or default similarity algorithm. All the terms in the expression are detailed here:
http://lucene.apache.org/core/4_9_0/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html

Unless you dive into that math (not so bad, really, if you are motivated), the expressions are going to be rather opaque to you. The long floating-point numbers are mostly just the intermediate (and final) calculations of the math described above.

Try constructing a very simple collection of simple, contrived documents, like a short sentence in each, with some common terms, and then try simple queries to see how the expression term values change. Try computing TF, DF, and IDF yourself (just count the terms by hand), and compare to what debug gives you.

-- Jack Krupansky

-----Original Message-----
From: O. Olson
Sent: Thursday, July 24, 2014 6:45 PM
To: solr-user@lucene.apache.org
Subject: Understanding the Debug explanations for Query Result Scoring/Ranking

Hi,

If you add debug=true to the Solr request (and wt=xml if your current output is not XML), you get a node in the resulting XML named "debug". It has a child node called "explain", which holds a list showing why the results are ranked in a particular order. I'm curious whether there is some documentation on understanding these numbers/results. I am new to Solr, so I apologize that I may be using the wrong terms to describe my problem. I am also aware of
http://lucene.apache.org/core/4_9_0/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html
though I have not completely understood it.

My problem is trying to understand something like this:

1.5797625 = (MATCH) sum of:
  0.4717142 = (MATCH) weight(text:televis in 44109) [DefaultSimilarity], result of:
    0.4717142 = score(doc=44109,freq=1.0 = termFreq=1.0), product of:
      0.71447384 = queryWeight, product of:
        7.0424104 = idf(docFreq=896, maxDocs=377553)
        0.10145303 = queryNorm
      0.660226 = fieldWeight in 44109, product of:
        1.0 = tf(freq=1.0), with freq of:
          1.0 = termFreq=1.0
        7.0424104 = idf(docFreq=896, maxDocs=377553)
        0.09375 = fieldNorm(doc=44109)
  1.1080483 = (MATCH) weight(text:tv in 44109) [DefaultSimilarity], result of:
    1.1080483 = score(doc=44109,freq=6.0 = termFreq=6.0), product of:
      0.6996622 = queryWeight, product of:
        6.896415 = idf(docFreq=1037, maxDocs=377553)
        0.10145303 = queryNorm
      1.5836904 = fieldWeight in 44109, product of:
        2.4494898 = tf(freq=6.0), with freq of:
          6.0 = termFreq=6.0
        6.896415 = idf(docFreq=1037, maxDocs=377553)
        0.09375 = fieldNorm(doc=44109)

Note: I searched for "televisions". My search field is a single catch-all field. The edismax parser seems to break up my search term into "televis" and "tv".

Is there some documentation on how to understand these numbers? They do not seem to be properly delimited. At a minimum, I can understand something like:

  1.5797625 = 0.4717142 + 1.1080483
  0.71447384 = 7.0424104 * 0.10145303

But I cannot understand whether something like "0.10145303 = queryNorm" or "0.660226 = fieldWeight in 44109" is used in the calculation anywhere. Also, since there were only two terms (televis and tv), I could use subtraction to find out that 1.1080483 was the start of a new result.

I'd also appreciate it if someone could tell me which class dumps out the above data. If I knew it, I could edit that class to make the output a bit more understandable for me.

Thank you,
O. O.
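To make Jack's pointer concrete, here is the arithmetic behind the televis clause of the debug output quoted above. With DefaultSimilarity, tf = sqrt(freq), fieldWeight = tf * idf * fieldNorm, queryWeight = idf * queryNorm, and each clause score is queryWeight * fieldWeight:

  queryWeight  = 7.0424104 * 0.10145303    = 0.71447384
  fieldWeight  = 1.0 * 7.0424104 * 0.09375 = 0.660226
  clause score = 0.71447384 * 0.660226     = 0.4717142

For the tv clause, tf = sqrt(6.0) = 2.4494898, giving fieldWeight 1.5836904 and a clause score of 1.1080483. The document score is the sum of the clauses: 0.4717142 + 1.1080483 = 1.5797625.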
RE: Java heap space error
Steve Rowe [sar...@gmail.com] wrote:
> 1 Lakh (aka Lac) = 10^5 is written as 1,00,000
> It's used in Bangladesh, India, Myanmar, Nepal, Pakistan, and Sri
> Lanka, by roughly 1/4 of the world's population.

Yet still it causes confusion and distracts from the issue. Let's just stick to metric, okay?

- Toke Eskildsen
SOLR cloud creating multiple copies of the same index
Hi,

We have a SolrCloud instance with 8 nodes and 4 shards. We are starting to see the index size growing huge, and when I looked at the file system, Solr had created several copies of the index. Using the Solr admin UI, however, I can see it is using only one of them. This is what I see in the admin UI:

  Index: /opt/solr/collections/aq-collection/data/index.20140725024044234
  Master (Searching): 1406320016969  Gen: 81553  Size: 58.72 GB

But when I go into the file system, this is how it looks:

  16G  index.20140527220456134
  45G  index.20140630001131038
  4.6G index.20140630090031282
  20G  index.20140703192128959
  1.3G index.20140703200948410
  31G  index.20140708162308859
  52G  index.20140716165801658
  59G  index.20140725024044234
  4K   index.properties
  4K   replication.properties

It is actually pointing only to index.20140725024044234 and using that for searching and indexing. The timestamps on the other indexes are old (about a month or so).

Can someone explain why it created so many copies of the index (we did not create them manually), and how this can be prevented? Our Solr instances are running on Solaris VMs.
Re: SolrCloud extended warmup support
It's a command like this just prior to Jetty startup:

  find -L <solr home dir> -type f -exec cat {} > /dev/null \;

On 7/24/14, 2:11 PM, Toke Eskildsen <t...@statsbiblioteket.dk> wrote:
> Jeff Wartes [jwar...@whitepages.com] wrote:
>> Well, I'm not sure what to say. I've been observing a noticeable
>> latency decrease over the first few thousand queries. How exactly do
>> you get the index files fully cached?
>
> The cp command will (at least on some systems) happily skip copying if
> the destination is /dev/null. One way to ensure caching is to cat all
> the files to /dev/null.
>
> - Toke Eskildsen
Re: SOLR cloud creating multiple copies of the same index
Looks to me like you are, or were, hitting the replication handler's backup function:
http://wiki.apache.org/solr/SolrReplication#HTTP_API

i.e., http://master_host:port/solr/replication?command=backup

You might not have been doing it explicitly; there's some support for a backup being triggered when certain things happen:
http://wiki.apache.org/solr/SolrReplication#Master

On 7/25/14, 1:50 PM, pras.venkatesh <prasann...@outlook.com> wrote:
> Hi,
>
> We have a SolrCloud instance with 8 nodes and 4 shards. We are starting
> to see the index size growing huge, and when I looked at the file
> system, Solr had created several copies of the index. Using the Solr
> admin UI, however, I can see it is using only one of them.
> [...]
> Can someone explain why it created so many copies of the index (we did
> not create them manually), and how this can be prevented? Our Solr
> instances are running on Solaris VMs.
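The "triggered" case Jeff mentions is the backupAfter setting on the master side of the replication handler. An illustrative solrconfig.xml fragment (not taken from the poster's setup):

  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="master">
      <str name="replicateAfter">commit</str>
      <str name="backupAfter">optimize</str>
      <str name="maxNumberOfBackups">2</str>
    </lst>
  </requestHandler>

maxNumberOfBackups limits how many backups are retained on disk.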
Re: Understanding the Debug explanations for Query Result Scoring/Ranking
: Thank you very much Erik. This is exactly what I was looking for. While at
: the moment I have no clue about these numbers, the ruby formatting makes
: them much easier to understand.

Just to be clear: regardless of *which* response writer you use (xml, ruby, json, etc.), the default behavior is to include the score explanation as a single string which uses tabs/newlines to deal with the nesting (this nesting is visible if you view the raw response, no matter what ResponseWriter).

You can, however, add a param indicating that you want the explanation information to be returned as *structured data* instead of a simple string:

https://wiki.apache.org/solr/CommonQueryParameters#debug.explain.structured

If you want to programmatically process the debug info, this is the recommended way to do so.

-Hoss
http://www.lucidworks.com/
Re: Any Solr consultants available??
On Fri, Jul 25, 2014 at 6:59 PM, Jack Krupansky <j...@basetechnology.com> wrote:
> OTOH, how many people are there out there who want to become Solr
> consultants, but aren't already either doing it or at least already in
> the process of coming up to speed or maybe just not cut out for it?

Well, I would target two groups:
*) Startups that just realized they need search
*) People who want to become consultants and want a speed track to that (being "already in the process" can take quite a while)

For startups, I would do a week-long version of what I did with my one-day Solr Masterclass:
*) Bring your own data; we teach a very specific process of development-oriented setup (e.g. start from https://github.com/arafalov/simplest-solr-config/blob/master/simplest-solr/collection1/conf/schema.xml , teach rapid iterations, ways to affect data in Solr such as URPs, custom search components, etc.)
*) Then teach debugging
*) Then SolrCloud
*) Then maybe touch on BigData, as many SaaS startups will hit that problem
*) Then going into production
*) Then send them out with a (paid-for and/or subscription) dedicated discussion group where the mentor would continue answering questions as they bubble up, etc.
*) And more

For consultants:
*) You teach them to understand which problems Solr is good for
*) You teach them how to explain Solr to others
*) Teach them (or build for them) great Solr demos
*) Give them unsolved-but-tractable projects and assist them in making those happen (e.g. build a real Solr-backed solr-consultants website, testing Solr clients with the latest Solr, testing upstream integration, creating Solr feature demos for 3rd-party products that have Solr inside, etc.)
*) Build them environments to quickly test their ideas, skills, etc.
*) Give them tools and tricks to quickly build an online identity around Solr (blogging tips, links to their articles to build SEO, GitHub repos, etc.)
*) Build a network where consultants can pass work to each other based on geography
*) Get preferential deals with commercial Solr component suppliers, so the consultants get things like UI components at a reduced price or with extended trials or whatever
*) Dedicated discussion group
*) If they are in the solr-consultants directory, charge them subscription fees but give them a dedicated discussion group where they can talk but also ask for particular features (e.g. better examples, demo repos, language support, deals, commonly useful components like split/join filters, etc.). Use those as projects to drive the next batch of developers.
*) Reach out to the startup community and offer a discounted/apprenticeship model for access to those newly graduated consultants
*) Possibly provide things like a USA corporation umbrella to bring - say - a Filipino consultant to the USA/UK for 3 months to train and then let them go back home to establish the business
*) And, again, a lot more

And, of course, gamify the whole lot wherever possible to drive the speed of adoption :-) Time is money.

Many of the things above exist for Solr, but they are all over the web, often rotting after initial release due to lack of visibility, etc. Other things are missing documentation, etc. Many of the other things exist (e.g. consultant directories) but they are not Solr-specific. Frankly, many of the things that do exist have terrible search; fixing that alone would be competitive beyond Solr. There is value in building a happy, singing, YCombinator-style path.

Regards,
   Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853