RE: Highlighting externally stored text
Just an update: the change was pretty straightforward (at least for my simple test case); just a few lines in the getBestFragments method seemed to do the trick. -- View this message in context: http://lucene.472066.n3.nabble.com/Highlighting-externally-stored-text-tp4078387p4081748.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Highlighting externally stored text
Hey Bryan, thanks for the response! To make use of the FastVectorHighlighter you need to enable termVectors, termPositions, and termOffsets, correct? That takes a considerable amount of space, but it's good to know, and I may pursue this solution as well. I'm just starting to look at the code now; do you remember how substantial the change was? Are there any other options? -- View this message in context: http://lucene.472066.n3.nabble.com/Highlighting-externally-stored-text-tp4078387p4081719.html
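For reference, enabling the term data the FastVectorHighlighter needs is a schema.xml change along these lines (the field name and type here are placeholders, not from the post):

```xml
<!-- schema.xml: term vectors, positions, and offsets must all be enabled
     for the FastVectorHighlighter; this noticeably grows the index -->
<field name="content" type="text" indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true"/>
```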
Luke's analysis of Trie Dates
I have a TrieDateField dynamic field set up in my schema, pretty standard... In my code I only set one field, "creation_tdt", and I round it to the nearest second before storing it. However, when I analyze it with Luke I get:

tdate IT--OF-- *_tdt (unstored field) 22404 -1 22404 22404 22404 22404 22404 22404 22404 16014 6390 1535 1459 1268 1193 1187 1152 1129 1089 ...

So my question is, where are all these entries coming from? They are not the dates I specified, because they have millis, and my field isn't multivalued, so the term counts don't add up (how could I have more than 22404 terms if I only have 22404 documents?). Why the multiple "1970-01-01T00:00:00Z" entries? Is this somehow related to trie fields and how they are indexed? Thanks!

-- View this message in context: http://lucene.472066.n3.nabble.com/Luke-s-analysis-of-Trie-Dates-tp4078885.html
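For what it's worth, the "round to the nearest second" step described above can be sketched with plain java.time (the method name and sample timestamp are illustrative, not from the post):

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

public class RoundDate {
    // Truncate a timestamp to whole seconds and render it in the
    // ISO-8601 "Z" form Solr expects for date field values.
    static String toSolrDate(long epochMillis) {
        Instant rounded = Instant.ofEpochMilli(epochMillis)
                                 .truncatedTo(ChronoUnit.SECONDS);
        return rounded.toString();
    }

    public static void main(String[] args) {
        System.out.println(toSolrDate(1234567890123L)); // millis dropped
    }
}
```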
Highlighting externally stored text
Does anyone know if issue SOLR-1397 ("It should be possible to highlight external text") is actively being worked on, by chance? It looks like the last update was May 2012. https://issues.apache.org/jira/browse/SOLR-1397

I'm trying to find the best way to highlight search results even though those results are not stored in my index. Has anyone been successful in reusing the Solr highlighting logic on non-stored data? Does anyone know of any other third-party libraries that can do this for me until SOLR-1397 is formally released? Thanks!

-- View this message in context: http://lucene.472066.n3.nabble.com/Highlighting-externally-stored-text-tp4078387.html
Re: Benefits of Solr over Lucene?
So I have had a fair amount of experience using Solr. However, on a separate project we are considering just using Lucene directly, which I have never done. I am trying to avoid finding out late that Lucene doesn't offer what we need and being like "aw snap, it doesn't support geospatial" (or highlighting, or dynamic fields, etc...). I am more curious about core index and search features, and not as much about sharding, cloud features, different client languages, and so on. Any thoughts? Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/Benefits-of-Solr-over-Lucene-tp4039964p4040009.html
Benefits of Solr over Lucene?
I know that Solr web-enables a Lucene index, but I'm trying to figure out what else Solr offers over Lucene. The Solr features list says "Solr uses the Lucene search library and extends it!", but which items on that list are Solr extensions, and what does Lucene itself give you? Also, if I have an index built through Solr, is there a non-HTTP way to search that index? Because SolrJ essentially just makes HTTP requests, correct?

Some features I'm particularly interested in are:
- Geospatial Search
- Highlighting
- Dynamic Fields
- Near Real-Time Indexing
- Multiple Search Indices

Thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/Benefits-of-Solr-over-Lucene-tp4039964.html
Propagating accurate exceptions to the end user
Solr 3.1, using SolrJ.

So I have a GUI that allows folks to search my Solr repository, and I want to show appropriate errors when something bad happens. My problem is that the Solr exceptions are not very pretty and sometimes not very descriptive. For instance, if I enter a bad query, the message on the exception is "Error executing query", and if I do getCause().getMessage() it gives "Bad Request Bad Request request: http://1.2.3.4:1234/solr/". This really doesn't help my user much. Another example: if a master search server serves out a request to a bunch of shards, I just get a Connection Refused error that doesn't specify which connection was refused.

I can't imagine I am the first to run into this and was curious what others do. Do people just try to catch all common exceptions and print those nicely? What about exceptions that you don't test for? How about exceptions that don't really explain the real problem? Thanks!

-- View this message in context: http://lucene.472066.n3.nabble.com/Propogating-an-accurate-exceptions-to-the-end-user-tp3091548p3091548.html
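One generic approach (not Solr-specific; a minimal sketch with hypothetical messages modeled on the ones quoted above) is to walk the getCause() chain and show every distinct message, rather than only the outermost "Error executing query":

```java
import java.util.ArrayList;
import java.util.List;

public class ErrorMessages {
    // Collect each distinct message down the cause chain and join them,
    // so the user sees the underlying reason, not just the wrapper.
    static String describe(Throwable t) {
        List<String> parts = new ArrayList<>();
        for (Throwable cur = t; cur != null; cur = cur.getCause()) {
            String msg = cur.getMessage();
            if (msg != null && !parts.contains(msg)) {
                parts.add(msg);
            }
        }
        return String.join(" <- ", parts);
    }

    public static void main(String[] args) {
        Throwable e = new RuntimeException("Error executing query",
                new RuntimeException("Bad Request request: http://1.2.3.4:1234/solr/"));
        System.out.println(describe(e));
    }
}
```

This still won't name which shard refused a connection if the server never puts that in the exception, but it at least surfaces whatever detail is there.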
Re: Hitting the URI limit, how to get around this?
Yep, that was my issue. And like Ken said, on Tomcat I set maxHttpHeaderSize="65536". -- View this message in context: http://lucene.472066.n3.nabble.com/Hitting-the-URI-limit-how-to-get-around-this-tp3017837p3020774.html
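For anyone finding this later, that setting goes on the HTTP connector in Tomcat's server.xml; a sketch (the port and the other attributes are whatever your install already uses, only maxHttpHeaderSize is the point here):

```xml
<!-- server.xml: raise the request line/header limit from the 8 KB default -->
<Connector port="8080" protocol="HTTP/1.1"
           maxHttpHeaderSize="65536"
           connectionTimeout="20000"
           redirectPort="8443"/>
```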
Re: Hitting the URI limit, how to get around this?
So here's what I'm seeing: I'm running Solr 3.1, with a Java client that executes an HttpGet (I also tried HttpPost) with a large shard list. If I remove a few shards from my current list it returns fine; when I use my full shard list I get "HTTP/1.1 400 Bad Request". If I execute it in Firefox with a few shards removed it returns fine; with the full shard list I get a blank screen back immediately. My URI works at around 7800 characters, but adding one more shard blows it up. Any ideas?

I've tried using SolrJ rather than HttpGet before, but ran into similar issues with even fewer shards. See http://lucene.472066.n3.nabble.com/Long-list-of-shards-breaks-solrj-query-td2748556.html

My shards are added dynamically; every few hours I am adding new shards or cores into the cluster, so I cannot have a shard list in the config files unless I can somehow update them while the system is running.

-- View this message in context: http://lucene.472066.n3.nabble.com/Hitting-the-URI-limit-how-to-get-around-this-tp3017837p3020185.html
Re: Better to have lots of smaller cores or one really big core?
Thanks, Erick, for the response. So my data structure is the same, i.e. they all use the same schema, though I think it makes sense for us to somehow break apart the data, for example by the date it was indexed. I'm just trying to get a feel for how large we should aim to keep those pieces (by day, by week, by month, etc...). So it sounds like we should aim to keep them at a size that one Solr server can host, to avoid serving multiple cores. One question: there is no real difference (other than configuration) between a server hosting its own index and hosting a single core, is there? Thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/Better-to-have-lots-of-smaller-cores-or-one-really-big-core-tp3017973p3019686.html
Better to have lots of smaller cores or one really big core?
I am trying to decide which is the right approach: one big core, or many smaller cores, hosted by a Solr instance. I think there may be trade-offs either way but wanted to see what others do. By small I mean about 5-10 million documents; large may be 50 million.

It seems like small cores are better because:
- If one server can host, say, 70 million documents (before memory issues), we can get really close to that with a bunch of small indexes, vs. only being able to host one 50-million-document index. And when a software update comes out that allows us to host 90 million, we could add a few more small indexes.
- It takes less time to build ten 5-million-document indexes than one 50-million-document index.

It seems like larger cores are better because:
- Each core returns its own result set, so if I want 1000 results and there are 100 cores, the network transfers roughly 10x the documents for that search compared to 10 much larger cores.
- It would prolong my time until I hit URI length limits, since there would be fewer cores in my system.

Any thoughts? Other trade-offs? How do you find what the right size is for you?

-- View this message in context: http://lucene.472066.n3.nabble.com/Better-to-have-lots-of-smaller-cores-or-one-really-big-core-tp3017973p3017973.html
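The network-transfer comparison above can be made concrete with a little arithmetic (this assumes each shard returns up to the full `rows` count of candidates in the first phase, which is the usual distributed-search behavior; the second fetch phase is ignored here):

```java
public class ShardTransfer {
    // In a distributed query, each shard returns up to `rows` candidate
    // entries in the first phase, so transfer scales with shard count.
    static int firstPhaseCandidates(int shards, int rows) {
        return shards * rows;
    }

    public static void main(String[] args) {
        System.out.println(firstPhaseCandidates(100, 1000)); // many small cores
        System.out.println(firstPhaseCandidates(10, 1000));  // few large cores
    }
}
```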
Hitting the URI limit, how to get around this?
I have a master Solr instance that I send my requests to; it hosts no documents, it just farms the request out to a large number of shards. All the other Solr instances, which host the data, contain multiple cores. Therefore my search string looks like:

http://host:port/solr/select?...&shards=nodeA:1234/solr/core01,nodeA:1234/solr/core02,nodeA:1234/solr/core03,...

This shard list is pretty long and has finally hit "the limit". So my question is, how do I best avoid having to build such a long URI? Is there a way to have multiple tiers, where the master server has a list of servers (nodeA:1234,nodeB:1234,...) and each of those nodes queries the cores that it hosts (nodeA hosts core01, core02, core03, ...)? Thanks!

-- View this message in context: http://lucene.472066.n3.nabble.com/Hitting-the-URI-limit-how-to-get-around-this-tp3017837p3017837.html
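One way to keep the shard list out of the URI entirely is to move it into a request handler's defaults in solrconfig.xml, so clients only send the query itself. A sketch (the handler name and shard addresses are illustrative; this also doesn't solve dynamic shard lists without reloading the core):

```xml
<!-- solrconfig.xml: bake the shard list into a search handler default -->
<requestHandler name="/distrib" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="shards">nodeA:1234/solr/core01,nodeA:1234/solr/core02</str>
  </lst>
</requestHandler>
```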
Long list of shards breaks solrj query
So I have a simple class that builds a SolrQuery and sets the "shards" param. I have a really long list of shards, over 250. My search seems to work until I get my shard list up to a certain length. As soon as I add one more shard I get:

org.apache.commons.httpclient.HttpMethodDirector executeWithRetry INFO: I/O exception (java.net.SocketException) caught when processing request: Connection reset by peer: socket write error
org.apache.commons.httpclient.HttpMethodDirector executeWithRetry INFO: Retrying request

My class just looks like this (query execution added back in; the paste had dropped it):

public static void main(String[] args) {
    try {
        SolrServer s = new CommonsHttpSolrServer("http://mynode:8080/solr");
        SolrQuery q = new SolrQuery();
        q.setQuery("test");
        q.setHighlight(true);
        q.setRows(50);
        q.setStart(0);
        q.setParam("shards", "node1:1010/solr/core01,node1:1010/solr/core02,...");
        System.out.println(s.query(q));
    } catch (Exception e) {
        e.printStackTrace();
    }
}

If I execute the same request in a browser it returns fine. One other question I had: even if I set the version to 2.2, the response has version=1. Is that normal? In a browser it returns version=2.2, though.

-- View this message in context: http://lucene.472066.n3.nabble.com/Long-list-of-shards-breaks-solrj-query-tp2748556p2748556.html
Architecture question about solr sharding
I have an issue and I'm wondering if there is an easy way around it with just Solr. I have multiple Solr servers, and a field in my schema is a relative path to a binary file. Each Solr server is responsible for a different subset of data that belongs to a different base path. For example, my directory structure may look like this:

/someDir/Jan/binaryfiles/...
/someDir/Feb/binaryfiles/...
/someDir/Mar/binaryfiles/...
/someDir/Apr/binaryfiles/...

Server1 is responsible for Jan, Server2 for Feb, etc... And a response document may have a field value like "binaryfiles/12345.bin".

How can I tell from my main search server which server returned a result? I cannot put the full path in the index because my path structure might change in the future; using this example, it may go to '/someDir/Jan2011/'. I basically need to find a way to say "Ah! server01 returned this result, so it must be in /someDir/Jan". Thanks!

-- View this message in context: http://lucene.472066.n3.nabble.com/Architecture-question-about-solr-sharding-tp2716417p2716417.html
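A purely client-side sketch of that last idea, assuming you can learn which shard a document came from (the shard addresses and base paths below follow the example in the post; none of this is a Solr feature):

```java
import java.util.HashMap;
import java.util.Map;

public class ShardPathResolver {
    // Map each shard address to the base path its data lives under,
    // then join it with the relative path stored in the index.
    static final Map<String, String> BASE_PATHS = new HashMap<>();
    static {
        BASE_PATHS.put("server1:8983/solr", "/someDir/Jan");
        BASE_PATHS.put("server2:8983/solr", "/someDir/Feb");
    }

    static String fullPath(String shard, String relativePath) {
        return BASE_PATHS.get(shard) + "/" + relativePath;
    }

    public static void main(String[] args) {
        System.out.println(fullPath("server1:8983/solr", "binaryfiles/12345.bin"));
    }
}
```

The open question from the post remains: Solr has to tell you which shard a result came from for this lookup to work, so the map only solves the second half of the problem.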
General questions about distributed solr shards
1) Is there any information on preferred maximum sizes for a single Solr index? I've read some people say 10 million, some say 80 million, etc... Is there any official recommendation, or has anyone experimented with large datasets into the tens of billions?

2) Is there any downside to running multiple Solr shard instances on a single machine rather than one shard instance with a larger index per machine? I would think that having 5 instances with 1/5 of the index each would return results approximately 5 times faster.

3) Say you have a Solr configuration with multiple shards. If you attempt to query while one of the shards is down, you will receive an HTTP 500 on the client due to a connection refused on the server. Is there a way to tell the server to ignore this and return as many results as possible? In other words, if you have 100 shards, it is possible that occasionally a process may die, but I would still like to return results from the active shards.

Thanks

-- View this message in context: http://lucene.472066.n3.nabble.com/General-questions-about-distributed-solr-shards-tp1095117p1095117.html
RE: Re: Disable Solr Response Formatting
Thanks! I was looking for things to change in the solrconfig.xml file, but it's just indent=off. -- View this message in context: http://lucene.472066.n3.nabble.com/Disable-Solr-Response-Formatting-tp933785p933966.html
Re: Disable Solr Response Formatting
Oops, let me try that again... By default my Solr response comes back formatted (indented). Is there a way to tell it to return it unformatted? -- View this message in context: http://lucene.472066.n3.nabble.com/Disable-Solr-Response-Formatting-tp933785p933793.html
Disable Solr Response Formatting
By default my Solr response comes back formatted (indented). Is there a way to tell it to return it unformatted? -- View this message in context: http://lucene.472066.n3.nabble.com/Disable-Solr-Response-Formatting-tp933785p933785.html
Can solr return pretty text as the content?
When I feed pretty text into Solr for indexing from Lucene and search for it, the content is always returned as one long line of text. Is there a way for Solr to return the pretty-formatted text to me? -- View this message in context: http://lucene.472066.n3.nabble.com/Can-solr-return-pretty-text-as-the-content-tp917912p917912.html
Re: Does SOLR provide a java class to perform url-encoding
I was assuming that I needed to leave the special characters in the HTTP GET, but running the Solr admin it looks like it converts them the same way that URLEncoder.encode does. What is the need to preserve special characters?

http://localhost:8983/solr/select?indent=on&version=2.2&q=%22mr.+bill%22+oh+n%3F&fq=&start=0&rows=50&fl=*%2Cscore&qt=standard&wt=standard&explainOther=&hl.fl=

-- View this message in context: http://lucene.472066.n3.nabble.com/Does-SOLR-provide-a-java-class-to-perform-url-encoding-tp842660p843177.html
Re: Does SOLR provide a java class to perform url-encoding
Thanks, Sean, that was exactly what I needed. One question though... how do I correctly retain the Solr-specific characters? I tried adding escape chars, but URLEncoder doesn't seem to care about that. Example:

String s1 = "\"mr. bill\" oh n?";
String s2 = "\\\"mr. bill\\\" oh n\\?";
String encoded1 = URLEncoder.encode(s1, "UTF-8");
String encoded2 = URLEncoder.encode(s2, "UTF-8");
System.out.println(encoded1);
System.out.println(encoded2);

Output:

%22mr.+bill%22+oh+n%3F
%5C%22mr.+bill%5C%22+oh+n%5C%3F

Should I let URLEncoder translate s1, then replace %22 with ", %3F with ?, and so on? Or is there a better way?

-- View this message in context: http://lucene.472066.n3.nabble.com/Does-SOLR-provide-a-java-class-to-perform-url-encoding-tp842660p842744.html
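A minimal sketch of the encode-then-restore idea asked about above, using only the JDK (which characters to restore is an assumption; Solr's query syntax has more operators than these two):

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class SolrQueryEncoder {
    // Percent-encode the whole query, then restore the Solr query
    // operators we want the server to interpret (here: " and ?).
    static String encodeKeepingOperators(String query) throws UnsupportedEncodingException {
        return URLEncoder.encode(query, "UTF-8")
                         .replace("%22", "\"")
                         .replace("%3F", "?");
    }

    public static void main(String[] args) throws Exception {
        System.out.println(encodeKeepingOperators("\"mr. bill\" oh n?"));
    }
}
```

Going the other direction, SolrJ (in later versions, I believe) ships ClientUtils.escapeQueryChars for escaping the operators in user input rather than preserving them.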
Does SOLR provide a java class to perform url-encoding
I would like to leverage whatever Solr provides to properly URL-encode a search string. For example, a user enters:

"mr. bill" oh no

The URL submitted by the admin page is:

http://localhost:8983/solr/select?indent=on&version=2.2&q=%22mr.+bill%22+oh+no&fq=&start=0&rows=10&fl=*%2Cscore&qt=standard&wt=standard&explainOther=&hl.fl=

Since the admin page uses it, I would imagine that this functionality is there, but I'm having some trouble finding it.

-- View this message in context: http://lucene.472066.n3.nabble.com/Does-SOLR-provide-a-java-class-to-perform-url-encoding-tp842660p842660.html