solr admin result page error
Dear list, after loading some documents via DIH which also include URLs, I get this yellow XML error page as a search result from the Solr admin GUI. It says "XML processing error: not well-formed". The code it complains about is:

<arr name="dcurls">
  <str>http://eprints.soton.ac.uk/43350/</str>
  <str>http://dx.doi.org/doi:10.1112/S0024610706023143</str>
  <str>Martinez-Perez, Conchita and Nucinkis, Brita E.A. (2006) Cohomological dimension of Mackey functors for infinite groups. Journal of the London Mathematical Society, 74, (2), 379-396. (doi:10.1112/S0024610706023143 &lt;http://dx.doi.org/10.1112/S002461070602314\u&gt;)</str>
</arr>

See the \u utf8-code in the last line.
1. The loaded data is valid, well-formed, and checked with xmllint. No errors.
2. There is no \u utf8-code in the source data.
3. The data is loaded via DIH without any errors.
4. When opening the source view of the result page with Firefox, there is also no \u utf8-code.
The only idea I have is Solr itself or the result page generation. How to proceed, what else to check? Regards, Bernd
Re: Turn off caching
Besides the FieldCache, there is also a cache for TermInfo. I don't know how to turn it off in either Lucene or Solr. The code in TermInfosReader:

/** Returns the TermInfo for a Term in the set, or null. */
TermInfo get(Term term) throws IOException {
  return get(term, true);
}

/** Returns the TermInfo for a Term in the set, or null. */
private TermInfo get(Term term, boolean useCache) throws IOException

2011/2/11 Stijn Vanhoorelbeke stijn.vanhoorelb...@gmail.com: Hi, You can comment out all sections in solrconfig.xml pointing to a cache. However, there is a cache deep in Lucene - the FieldCache - that can't be commented out. This cache will always jump into the picture. If I need to do such things, I restart the whole tomcat6 server to flush ALL caches.

2011/2/11 Li Li fancye...@gmail.com: Do you mean the queryResultCache? You can comment out the related paragraph in solrconfig.xml; see http://wiki.apache.org/solr/SolrCaching

2011/2/8 Isan Fulia isan.fu...@germinait.com: Hi, My solrconfig file looks like:

<config>
  <updateHandler class="solr.DirectUpdateHandler2" />
  <requestDispatcher handleSelect="true">
    <requestParsers enableRemoteStreaming="false" multipartUploadLimitInKB="2048" />
  </requestDispatcher>
  <requestHandler name="standard" class="solr.StandardRequestHandler" default="true" />
  <requestHandler name="/update" class="solr.XmlUpdateRequestHandler" />
  <requestHandler name="/admin/" class="org.apache.solr.handler.admin.AdminHandlers" />
  <queryResponseWriter name="xslt" class="org.apache.solr.request.XSLTResponseWriter">
  </queryResponseWriter>
  <!-- config for the admin interface -->
  <admin>
    <defaultQuery>*:*</defaultQuery>
  </admin>
</config>

Every time I fire the same query to compare the results for different configurations, the query result time gets reduced because of caching. So I want to turn off the caching or clear the cache before I fire the same query. Does anyone know how to do it? -- Thanks & Regards, Isan Fulia.
Question about QueryParser and StandardAnalyzer
Hi All, I am writing a custom QueryParserPlugin for Solr to fulfill a specific requirement. When I build the query object, I need to feed that object with terms. For that I get the analyzer from the request as

Analyzer analyzer = req.getSchema().getField("TextField").getType().getAnalyzer();

In the schema, TextField has the type "text", which is configured as

<fieldtype name="text" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory" luceneMatchVersion="LUCENE_29"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldtype>

Now when I get the TokenStream

TokenStream ts = analyzer.tokenStream("TextField", new StringReader("ri*"));

it simply removes the wildcards from the term. The analyzer has the same behavior even if I escape the wildcard:

TokenStream ts = analyzer.tokenStream("TextField", new StringReader("ri\\*"));

Please suggest. Regards, Ahsan Iqbal
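The behavior described above is expected: StandardTokenizer treats '*' as punctuation and discards it during tokenization, so a wildcard never survives the analysis chain. A common workaround is to strip the wildcard, analyze only the literal prefix, and re-append the wildcard afterwards. A minimal sketch in Python, with a toy tokenizer standing in for the real StandardTokenizer + LowerCaseFilter chain (all names here are invented for illustration):

```python
import re

def toy_analyze(text):
    # Stand-in for StandardTokenizer + LowerCaseFilter: keeps only
    # alphanumeric runs, lowercased -- '*' is dropped as punctuation.
    return [t.lower() for t in re.findall(r"\w+", text)]

# Analyzing the raw wildcard term loses the '*', as observed above.
assert toy_analyze("ri*") == ["ri"]
assert toy_analyze("ri\\*") == ["ri"]  # escaping doesn't help the tokenizer

def analyze_wildcard(term):
    # Workaround: analyze only the literal prefix, then re-append '*'.
    if term.endswith("*"):
        prefix = toy_analyze(term[:-1])[0]
        return prefix + "*"
    return toy_analyze(term)[0]

assert analyze_wildcard("Ri*") == "ri*"
```

The same idea applies in the Java query parser: detect the wildcard before analysis, run only the literal portion through the analyzer, and build a prefix/wildcard query from the result.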
Re: solr admin result page error
Results so far: I could locate and isolate the document causing trouble. I've checked the document with xmllint again. It is valid, well-formed UTF-8. I've loaded the single document and get the XML error when displaying the search result. This happens through the Solr admin search and also the JSON interface, probably other interfaces as well. The next step is to use a debugger and see what goes wrong. One thing I can already say is that it is the UTF-8 sequence F0 9D 94 90 (U+1D510, Mathematical Fraktur Capital M) which causes the problem. Any already known issues about that? Regards, Bernd
Re: solr admin result page error
It looks like you hit the same issue as I did a while ago: http://www.mail-archive.com/solr-user@lucene.apache.org/msg46510.html

-- Markus Jelsma - CTO - Openindex http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350
RE: Turn off caching
I don't think there is an option to disable the cache in solrconfig.xml in Solr 1.4. You would need to modify the code at the time the SolrIndexSearcher instance is created in SolrCore. Thanks, Jagdish

-----Original Message-----
From: Isan Fulia [mailto:isan.fu...@germinait.com]
Sent: Tuesday, February 08, 2011 5:02 PM
To: solr-user@lucene.apache.org
Subject: Turn off caching
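For what it's worth, the Solr-level caches normally live in the <query> section of solrconfig.xml; when those elements are present they can be commented out, though as noted earlier in the thread, Lucene's internal FieldCache remains either way. The fragment below is illustrative only (sizes are example values, not recommendations):

```xml
<query>
  <!-- Commenting these out disables the Solr-level caches: -->
  <!--
  <filterCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
  <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
  <documentCache class="solr.LRUCache" size="512" initialSize="512"/>
  -->
</query>
```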
Any contribs available for Range field type?
I have a huge need for a new field type. It would be a Poly field, similar to Point or Payload. It would take 2 data elements, and a search would return a hit if the search term fell within the range of the elements. For example, let's say I have a document representing an Employment record. I may want to create a field for years_of_service where it would take the values 1999,2004. Then in a query, q=years_of_service:2001 would be a hit and q=years_of_service:2010 would not. The field would need to take a data type attribute as a parameter. I may need to do integer ranges, float/double ranges, date ranges. I don't see the need now, but heck, maybe even a string range. This would be useful for things like event dates. An event often occurs over several days (or hours), but the query is something like "what events are happening today". If I did q=event_date:NOW (or similar) it should hit all documents where event_date has a range that is inclusive of today. Another example would be a product category document. A specific automobile may have a fixed price, but a category of auto (2010 BMW 3-series for example) would have a price range. I hope you get the point. My question (finally) is: does anyone know of an existing contribution to the public domain that already does this? I'm more of a .Net/C# developer than a Java developer. I know my way around Java, but don't really have the right tools to build/test/etc. So I was hoping to borrow rather than build if I could. Thanks, Ken -- View this message in context: http://lucene.472066.n3.nabble.com/Any-contribs-available-for-Range-field-type-tp2473601p2473601.html Sent from the Solr - User mailing list archive at Nabble.com.
Solr suggestions
Good Morning, I have implemented Solr 1.4.1 in our UAT environment and I get weird suggestions for any misspellings. For instance, when I search for "cabinet award winders" as opposed to "cabinet award winners", I get a suggestion of "cabinet abarc pindeks" (http://nextgen-uat.sdc.vzwcorp.com/search/apachesolr_search/cabinet%20abarc%20pindeks). How can I get more meaningful suggestions? Any help is greatly appreciated. Thanks, Sai Thumuluri
Re: Alternative to Solrj
Sounds like you just described the VelocityResponseWriter. On trunk (or 3.x I believe), try out http://localhost:8983/solr/browse and look at what makes that tick. Erik

On Feb 11, 2011, at 08:40, McGibbney, Lewis John wrote: Hi list, I have been looking at an alternative UI for displaying retrieved results from Solr after a query has been passed. At this point, I am not interested in Solrj, as all I wish to change is the default responseWriter (line 1007 of solrconfig). I've also noticed a snippet of default CSS code included in /conf/xslt/example.xsl and understand that all response writers are located in $SOLR_HOME/src/java/org/apache/solr/request and that the default is XSLTResponseWriter.java. Basically I wish to keep the code for the search UI as simple as possible (ideally write a simple JSP and CSS), however I now find that this configuration is proving slightly more confusing in practice. My thinking is as follows: write my own responseWriter, include within it my CSS template, then specify the responseWriter in solrconfig along with the Java class. Can anyone advise me on this from their own experiences? Thank you, Lewis

Glasgow Caledonian University is a registered Scottish charity, number SC021474 Winner: Times Higher Education's Widening Participation Initiative of the Year 2009 and Herald Society's Education Initiative of the Year 2009. http://www.gcu.ac.uk/newsevents/news/bycategory/theuniversity/1/name,6219,en.html Winner: Times Higher Education's Outstanding Support for Early Career Researchers of the Year 2010, GCU as a lead with Universities Scotland partners. http://www.gcu.ac.uk/newsevents/news/bycategory/theuniversity/1/name,15691,en.html
Re: solr admin result page error
Hi Markus, yes, it looks like the same issue. There is also a \u utf8-code in your dump. So far I have followed it into the XMLResponseWriter. A few steps earlier, the result in a buffer looks good and the utf8-code is correct. Really hard to debug this freaky problem. Have you looked deeper into this and located the bug? It is definitely a bug and has nothing to do with Firefox. Regards, Bernd

--
Bernd Fehling, Dipl.-Inform. (FH)
Universitätsbibliothek Bielefeld
Universitätsstr. 25, 33615 Bielefeld
Tel. +49 521 106-4060, Fax +49 521 106-4052
bernd.fehl...@uni-bielefeld.de
BASE - Bielefeld Academic Search Engine - www.base-search.net
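One plausible explanation for this class of bug: U+1D510 lies outside the Basic Multilingual Plane, so in Java's UTF-16 strings it occupies two char values (a surrogate pair). A response writer that escapes or filters output one UTF-16 code unit at a time can emit a lone surrogate, which is not a legal XML character, hence "not well-formed". A quick sketch of the encoding facts (in Python, purely to illustrate the byte-level arithmetic):

```python
# U+1D510 (MATHEMATICAL FRAKTUR CAPITAL M) is a supplementary character.
ch = "\U0001D510"

# In UTF-8 it is the 4-byte sequence F0 9D 94 90 -- exactly what Bernd saw.
assert ch.encode("utf-8") == b"\xf0\x9d\x94\x90"

# In UTF-16 it needs a surrogate pair: D835 DD10.
utf16 = ch.encode("utf-16-be")
assert utf16 == b"\xd8\x35\xdd\x10"

# A writer that treats each UTF-16 code unit as a standalone character
# would try to emit U+D835 and U+DD10 separately -- neither is a legal
# XML character on its own, which would produce a well-formedness error.
high, low = utf16[:2], utf16[2:]
print(high.hex(), low.hex())
```

Whether this is exactly what the XMLResponseWriter does would need to be confirmed in the debugger, but surrogate-pair mishandling is the usual suspect when only supplementary characters break.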
Re: Any contribs available for Range field type?
You can solve this as is by having a start field and an end field, rather than a range field. Then add a clause like:

+(+start:[* TO target] +end:[target TO *])

Best, Erick
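The containment test behind that clause is simply start <= target <= end. A minimal sketch of the logic, with invented field names matching the years_of_service example from the original question:

```python
# Hypothetical documents using a start/end pair to stand in for one
# "range" field, e.g. years_of_service = (1999, 2004).
docs = [
    {"id": 1, "start": 1999, "end": 2004},
    {"id": 2, "start": 2006, "end": 2008},
]

def range_hit(doc, target):
    # Mirrors +(+start:[* TO target] +end:[target TO *]):
    # start must be <= target AND end must be >= target.
    return doc["start"] <= target <= doc["end"]

hits = [d["id"] for d in docs if range_hit(d, 2001)]
assert hits == [1]                                           # 2001 -> hit
assert [d["id"] for d in docs if range_hit(d, 2010)] == []   # 2010 -> miss
```

Both range endpoints are open-ended in the Lucene clause (`*`), so documents missing a bound would match everything on that side; in practice you would index both fields for every document.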
Re: Solr suggestions
Well, you have to tell us how you're accessing the info and what's in your index. Please include the relevant schema file definitions and the calls you're making to get spelling suggestions. Best, Erick
Re: Any contribs available for Range field type?
True. And that's my temporary solution. But it's ugly code, even uglier queries. I may have several such fields in a single query. A PolyField solution would be so much more elegant and useful. I'm actually shocked more people don't need/want something like it. -- View this message in context: http://lucene.472066.n3.nabble.com/Any-contribs-available-for-Range-field-type-tp2473601p2474055.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Solr suggestions
Please let me know if there is any other information that could help. My request handler config is:

<requestHandler name="edismax" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="echoParams">explicit</str>
  </lst>
</requestHandler>
<!-- Note how you can register the same handler multiple times with different names (and different init parameters) -->
<requestHandler name="partitioned" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="echoParams">explicit</str>
    <float name="tie">0.01</float>
    <str name="qf">body^1.0 title^20.0 ts_vid_9_names^10.0 ts_vid_10_names^10.0 name^3.0 taxonomy_names^2.0 tags_h1^5.0 tags_h2_h3^3.0 tags_h4_h5_h6^2.0 tags_inline^1.0</str>
    <str name="pf">body</str>
    <int name="ps">2</int>
    <str name="mm">3</str>
    <str name="q.alt">*:*</str>
    <!-- example highlighter config, enable per-query with hl=true -->
    <str name="hl">true</str>
    <str name="hl.fl">body</str>
    <int name="hl.snippets">3</int>
    <str name="hl.mergeContiguous">true</str>
    <!-- instructs Solr to return the field itself if no query terms are found -->
    <str name="f.body.hl.alternateField">body</str>
    <str name="f.body.hl.maxAlternateFieldLength">256</str>
    <!-- JS: I wasn't getting good results here... I'm turning off for now because I was getting periods (.) by themselves at the beginning of snippets and don't feel like debugging anymore. Without the regex it is faster too -->
    <str name="spellcheck">false</str>
    <str name="spellcheck.onlyMorePopular">false</str>
    <str name="spellcheck.extendedResults">false</str>
    <str name="spellcheck.count">1</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
    <str>elevator</str>
  </arr>
</requestHandler>

My field definitions are:

<!-- The document id is derived from a site-specific key (hash) and the node ID like: $document-id = $hash . '/node/' . $node->nid; -->
<field name="id" type="string" indexed="true" stored="true" required="true" />
<field name="site" type="string" indexed="false" stored="true" />
<field name="hash" type="string" indexed="true" stored="true" />
<field name="url" type="string" indexed="false" stored="true" />
<field name="title" type="text" indexed="true" stored="true" termVectors="true" omitNorms="true" />
<field name="body" type="text" indexed="true" stored="true" termVectors="true" termPositions="true" termOffsets="true" />
<field name="comments" type="text" indexed="false" stored="true" />
<field name="type" type="string" indexed="true" stored="true" />
<field name="type_name" type="string" indexed="true" stored="true" />
<field name="path" type="string" indexed="false" stored="true" multiValued="false" />
<field name="path_alias" type="text" indexed="true" stored="true" termVectors="true" />
<field name="uid" type="integer" indexed="false" stored="true" />
<field name="name" type="text" indexed="true" stored="true" termVectors="true" />
<field name="sname" type="string" indexed="true" stored="false" />
<field name="sort_name" type="sortString" indexed="true" stored="false" />
<field name="created" type="date" indexed="true" stored="true" />
<field name="changed" type="date" indexed="true" stored="true" />
<field name="comment_count" type="integer" indexed="true" stored="true" />
<field name="tid" type="integer" indexed="true" stored="true" multiValued="true" />
<field name="vid" type="integer" indexed="true" stored="true" multiValued="true" />
<field name="taxonomy_names" type="text" indexed="true" stored="false" termVectors="true" multiValued="true" omitNorms="true" />
<field name="app" type="string" indexed="true" stored="true" multiValued="true" />
<field name="cat" type="string" indexed="true" stored="true" multiValued="true" />
<field name="area" type="string" indexed="true" stored="true" multiValued="true" />
<field name="region" type="string" indexed="true" stored="true" multiValued="true" />
<field name="permalink" type="string" indexed="true" stored="true" />
<field name="categories" type="string" indexed="true" stored="true" multiValued="true" />
<field name="categoriessrch" type="text_lws" indexed="true" stored="false" multiValued="true" />
<field name="tags" type="string" indexed="true" stored="true" multiValued="true" />
<field name="tagssrch" type="text_lws" indexed="true" stored="false" multiValued="true" />
<field name="author" type="string" indexed="true" stored="true" />
<field name="text" type="text" indexed="true" stored="false" multiValued="true" />
<field name="numcomments" type="integer" indexed="true" stored="true" />
<field name="tags_h1" type="text" indexed="true" stored="false" omitNorms="true" />
<field name="tags_h2_h3" type="text" indexed="true" stored="false" omitNorms="true" />
<field name="tags_h4_h5_h6" type="text" indexed="true" stored="false" omitNorms="true" />
<field name="tags_a" type="text" indexed="true" stored="false" omitNorms="true" />
<field name="ts_vid_9_names" type="text" indexed="true" stored="true" OmitNorms="true" multiValued="true" />
<field name="ts_vid_10_names" type="text" indexed="true" stored="true" OmitNorms="true" multiValued="true" />
<field name="tags_inline" type="text" indexed="true" stored="false" omitNorms="true" />
<field name="timestamp" type="date" indexed="true" stored="true" default="NOW" multiValued="false" />
<field name="prefix1" type="prefix_full
Solr design decisions
Hello all, I have just finished the book Solr 1.4 Enterprise Search Server. I now understand most of the basics of Solr and also how we can scale the solution. Our goal is to have a centralized search service for a multitude of apps. Our first application which we want to index is a system in which we must index documents through Solr Cell. These documents are associated with certain clients (companies). Each client can have a multitude of users, and each user can be part of a group of users. We have permissions on each physical document in the system, and we want this to also be present in our enterprise search for the system. I read that we can associate roles and ids with Solr documents in order to show only a subset of search results for a particular user. The question I am asking is this: a best practice in Solr is to batch commit changes. The problem in my case is that if we change a document's permissions (role) and we batch commit, there can be a period where the document in the search results is associated with the old role. What should I do in this case? Should I just commit the change right away? What if this action is done many times by many clients, will the performance still scale even if I do not batch commit my changes? Thanks, Greg
Re: Solr design decisions
Your users will have to accept some latency between changed permissions and those permissions being reflected in the results. The length of that latency is determined by two things:

1. the interval between when you send the change to Solr (i.e. re-index the doc) and issue a commit, AND
2. the time it takes the Solr instance to propagate that change.

Now, for 2, if you have a master/slave setup, the slave's polling interval must pass before it pulls the changes down. Then there's the warmup time that passes between the time the changes are made (master and/or slave) and the time searches use the newly-warmed searcher.

Here's the problem: when a change is committed to an index (we're skipping the master/slave issue for now), any autowarming takes, say, time T. If you commit too frequently (some time less than T), then the *first* autowarm process isn't yet done when the *second* starts. And if you keep committing pathologically quickly, then you start a death spiral. So the batching/not batching is less of a problem than the death spiral. Batch changes are more efficient, but that speedup is probably less noticeable than the propagation delays.

All that said, it's not unreasonable to expect, say, a 5 minute delay between the changes and when they're reflected in new searches, so I'd start with some reasonable number, monitor the warmup times, and reduce the commit interval as appropriate.

NOTE: if you have a master/slave setup and your master isn't used to search, you can control this by the polling interval on the slave and commit more frequently on the master, since it doesn't need to warm searchers.

Finally, there is work being done on NRT (Near Real Time) searching that may be of interest to you; search for NRT in JIRA if you're interested.

Best, Erick
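The death-spiral condition Erick describes can be sketched numerically: with autowarm time T, any commit interval shorter than T means warming searchers overlap and accumulate. The numbers below are made up purely for illustration:

```python
def concurrent_warmers(commit_interval, warmup_time, horizon):
    """Count how many autowarms are still running at time `horizon`,
    given a commit every `commit_interval` seconds where each autowarm
    takes `warmup_time` seconds."""
    commits = range(0, horizon, commit_interval)
    return sum(1 for t in commits if t + warmup_time > horizon)

# Healthy: commits every 300s, warmup takes 60s -> at most one warmer
# is ever in flight.
assert concurrent_warmers(300, 60, 3000) <= 1

# Pathological: commits every 10s with a 60s warmup -> several warmers
# overlap at any instant (the "death spiral" once memory/CPU run out).
assert concurrent_warmers(10, 60, 3000) == 5
```

The takeaway matches the advice above: keep the commit interval comfortably above the observed warmup time.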
Title index to wiki
I think it would be an improvement to the wikis if the link to the title index were at the top of the index page of the wikis :-) I looked on that index page and did not see that link on that page.

Who's got write access to the wiki pages?

Sent from Yahoo! Mail on Android
Re: Title index to wiki
What do you mean? There are two links to the Frontpage on each page.
Re: Solr design decisions
You could commit on a time schedule, like every 5 mins. If there is nothing to commit, it doesn't do anything anyway.

Bill Bell
Sent from mobile
Difference between Solr and Lucidworks distribution
Hello all, I just started watching the webinars from Lucidworks, and they mention their distribution, which has an installer, etc. Are there any other differences? Is it a good idea to use this free distribution? Greg
Re: Difference between Solr and Lucidworks distribution
It is not free for production environments. http://www.lucidimagination.com/lwe/subscriptions-and-pricing
RE: Alternative to Solrj
Hi Erik, this sounds much more like it. I have had a look at the wiki and it sounds like a logical approach to UI customisation. Thank you for this.
Re: Solr design decisions
On Fri, Feb 11, 2011 at 10:47 AM, Bill Bell billnb...@gmail.com wrote: You could commit on a time schedule. Like every 5 mins. If there is nothing to commit it doesn't do anything anyway. It does do something! A new searcher is opened and caches are invalidated, etc. I'd recommend normally using commitWithin instead of explicitly committing or using autocommit. -Yonik http://lucidimagination.com
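Yonik's suggestion can be exercised without any client library, since commitWithin is just an attribute on the XML <add> message. A minimal sketch (the id/title fields below are placeholders for illustration, not from this thread):

```python
# Sketch: build a Solr XML update message that asks the server to
# commit within 5 minutes, instead of issuing explicit <commit/> calls.
import xml.etree.ElementTree as ET

def build_add(docs, commit_within_ms=300000):
    """Return an <add> payload carrying a commitWithin deadline in ms."""
    add = ET.Element("add", commitWithin=str(commit_within_ms))
    for doc in docs:
        d = ET.SubElement(add, "doc")
        for name, value in doc.items():
            f = ET.SubElement(d, "field", name=name)
            f.text = str(value)
    return ET.tostring(add, encoding="unicode")

payload = build_add([{"id": "1", "title": "hello"}])
print(payload)
```

The same deadline can also be passed as a commitWithin request parameter on the update URL, which avoids touching the message body at all.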
Fatal error when posting to Solr
Hi list, Was attempting to check out the VelocityResponseWriter before I progress with customising it for my own usage; I seem to have opened a can of worms when posting documents to Solr. Using the simple post command I get the following output. lewis@lewis-01:~/Downloads/apache-solr-1.4.1/example/exampledocs$ java -jar post.jar *.pdf SimplePostTool: version 1.2 SimplePostTool: WARNING: Make sure your XML documents are encoded in UTF-8, other encodings are not currently supported SimplePostTool: POSTing files to http://localhost:8983/solr/update.. SimplePostTool: POSTing file technical_handbook_2010_domestic_section_0_general.pdf SimplePostTool: FATAL: Solr returned an error: Unexpected_character__code_37_in_prolog_expected___at_rowcol_unknownsource_11 In some projects (e.g. Nutch) I am aware that the distribution does not come with all jars and these are required to be downloaded separately; I know this is not the case with Solr though. I have also successfully committed a host of .pdf files to Solr recently, so I know that this is working fine. Checking my Solr logs, nothing seems to be out of place! Has anyone seen anything similar? Thanks Lewis
Re: Faceting Query
On Thu, Feb 10, 2011 at 12:00 PM, Isha Garg isha.g...@orkash.com wrote: Hi, What is the significance of copyField when used in faceting? Please explain with an example. Not sure what you mean here. Could you provide details? Regards, Gora
Re: Faceting Query
On Thu, Feb 10, 2011 at 12:21 PM, Isha Garg isha.g...@orkash.com wrote: What is the facet.pivot field? Please explain with an example. Does http://wiki.apache.org/solr/SimpleFacetParameters#facet.pivot not help? Regards, Gora
Re: Fatal error when posting to Solr
That's an incorrect way to POST PDF files (though maybe the latest work on post.jar makes it possible, but it would require additional parameters). In order to index PDF files, you'll need to script an iteration over all files and POST them in (or stream them however is most reasonable for your environment) using the techniques described here: http://wiki.apache.org/solr/ExtractingRequestHandler Erik On Feb 11, 2011, at 12:26, McGibbney, Lewis John wrote: Hi list, Was attempting to check out the VelocityResponseWriter before I progress with customising it for my own usage; I seem to have opened a can of worms when posting documents to Solr. Using the simple post command I get the following output. lewis@lewis-01:~/Downloads/apache-solr-1.4.1/example/exampledocs$ java -jar post.jar *.pdf SimplePostTool: version 1.2 SimplePostTool: WARNING: Make sure your XML documents are encoded in UTF-8, other encodings are not currently supported SimplePostTool: POSTing files to http://localhost:8983/solr/update.. SimplePostTool: POSTing file technical_handbook_2010_domestic_section_0_general.pdf SimplePostTool: FATAL: Solr returned an error: Unexpected_character__code_37_in_prolog_expected___at_rowcol_unknownsource_11 In some projects (e.g. Nutch) I am aware that the distribution does not come with all jars and these are required to be downloaded separately; I know this is not the case with Solr though. I have also successfully committed a host of .pdf files to Solr recently, so I know that this is working fine. Checking my Solr logs, nothing seems to be out of place! Has anyone seen anything similar? Thanks Lewis
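The iterate-and-POST loop Erik describes might be sketched as follows. This only builds the /update/extract request for each file; deriving literal.id from the file name is an assumption for illustration, not something from the thread:

```python
# Sketch of the iterate-and-POST approach: walk a directory of PDFs
# and build the ExtractingRequestHandler URL for each one.
import glob
import os
import urllib.parse

SOLR_EXTRACT = "http://localhost:8983/solr/update/extract"

def extract_url(pdf_path, commit=False):
    """Build the /update/extract URL for one PDF file."""
    # Assumption for the sketch: use the bare file name as the doc id.
    doc_id = os.path.splitext(os.path.basename(pdf_path))[0]
    params = {"literal.id": doc_id}
    if commit:
        params["commit"] = "true"
    return SOLR_EXTRACT + "?" + urllib.parse.urlencode(params)

# Each file's bytes would then be POSTed to its URL, e.g. with
# urllib.request, or from the shell: curl <url> -F "myfile=@<file.pdf>"
for pdf in sorted(glob.glob("exampledocs/*.pdf")):
    print(extract_url(pdf))
```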
Re: Monitor the QTime.
On Fri, Feb 11, 2011 at 3:40 AM, Stijn Vanhoorelbeke stijn.vanhoorelb...@gmail.com wrote: Hi, Is it possible to monitor the QTime of the queries? I know I could enable logging - but then all of my requests are logged, making big, nasty logs. I just want to log the QTime periodically, let's say once every minute. Is this possible using Solr, or can this be set up in Tomcat anyway? QTime is, of course, specific to the query, but it is returned in the response XML, so one could run occasional queries to figure it out. Please see http://wiki.apache.org/solr/SearchHandler Regards, Gora
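Following Gora's pointer, pulling QTime out of the response XML is straightforward. A small sketch, run here against a canned sample response rather than a live server:

```python
# Sketch: extract QTime from a Solr XML response. The sample response
# below is a minimal stand-in for what a live server returns.
import xml.etree.ElementTree as ET

SAMPLE_RESPONSE = """<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">17</int>
  </lst>
  <result name="response" numFound="0" start="0"/>
</response>"""

def qtime(response_xml):
    """Return QTime (ms) from a Solr XML response, or None if absent."""
    root = ET.fromstring(response_xml)
    node = root.find("./lst[@name='responseHeader']/int[@name='QTime']")
    return int(node.text) if node is not None else None

print(qtime(SAMPLE_RESPONSE))
```

Run once a minute against a real /select URL, this gives the periodic QTime log Stijn asked for without enabling full request logging.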
solr1.4 replication question
Hi, I am fairly new to Solr and have set up two servers, one as master, the other as a slave. I have a load balancer in front with two different VIPs: one to distribute gets/reads evenly across the master and slave, and another VIP to send posts/updates just to the master. If the master fails, the second VIP automatically directs updates to the slave. But if that happens, is there a way to automatically switch which is master and which is slave, instead of editing solrconfig.xml and restarting the instances? Any recommendations for the best way to set this up? Thanks
help with dismax query
I'm having a problem using the dismax query for the term obsessed with winning: http://localhost:8983/solr/core1/select?q=obsessed+with+winning&fq=code:xyz&shards=localhost:8983/solr/core1,localhost:8983/solr/core2&rows=10&start=0&defType=dismax&qf=title^10+description^4+text^1&debugQuery=true That query yields zero results, but removing the dismax parameters it works fine: http://localhost:8983/solr/core1/select?q=obsessed+with+winning&fq=code:xyz&shards=localhost:8983/solr/core1,localhost:8983/solr/core2&rows=10&start=0&debugQuery=true Even adding mm=2 yields results, but mm=3 does not. There is at least one record which contains the exact phrase 'obsessed with winning' in the title as well as the description multiple times, yet when the defType=dismax option is added, the query yields no results. Am I missing something? Thanks in advance.
Re: help with dismax query
Might "with" be a stop word removed by one of those qf fields? That'd explain why mm=3 doesn't work, I think. Erik On Feb 11, 2011, at 15:43, Tanner Postert wrote: I'm having a problem using the dismax query for the term obsessed with winning: http://localhost:8983/solr/core1/select?q=obsessed+with+winning&fq=code:xyz&shards=localhost:8983/solr/core1,localhost:8983/solr/core2&rows=10&start=0&defType=dismax&qf=title^10+description^4+text^1&debugQuery=true That query yields zero results, but removing the dismax parameters it works fine: http://localhost:8983/solr/core1/select?q=obsessed+with+winning&fq=code:xyz&shards=localhost:8983/solr/core1,localhost:8983/solr/core2&rows=10&start=0&debugQuery=true Even adding mm=2 yields results, but mm=3 does not. There is at least one record which contains the exact phrase 'obsessed with winning' in the title as well as the description multiple times, yet when the defType=dismax option is added, the query yields no results. Am I missing something? Thanks in advance.
help with dismax query
I'm having a problem using the dismax query. For example, for the term obsessed with winning I use: http://localhost:8983/solr/core1/select?q=obsessed+with+winning&fq=code:xyz&shards=localhost:8983/solr/core1,localhost:8983/solr/core2&rows=10&start=0&defType=dismax&qf=title^10+description^4+text^1&debugQuery=true That query yields zero results, but removing the dismax parameters it works fine: http://localhost:8983/solr/core1/select?q=obsessed+with+winning&fq=code:xyz&shards=localhost:8983/solr/core1,localhost:8983/solr/core2&rows=10&start=0&debugQuery=true Even adding mm=2 yields results, but mm=3 does not. Looking at the discussion here: http://www.mail-archive.com/solr-user@lucene.apache.org/msg29682.html I see that sending the qf fields as separate parameters, rather than one, may yield better results, but searching with: http://localhost:8983/solr/core1/select?q=obsessed+with+winning&fq=code:xyz&shards=localhost:8983/solr/core1,localhost:8983/solr/core2&rows=10&start=0&defType=dismax&qf=title^10&qf=description^4&qf=text^1&debugQuery=true yields no results either. There is at least one record which contains the exact phrase 'obsessed with winning' multiple times in the title as well as the description and text (text is just a copy field of title, description and a couple of other fields), yet when the defType=dismax option is added, the query yields no results. Am I missing something? Thanks in advance.
boosting results by a query?
I have an odd need, and want to make sure I am not reinventing a wheel... Similar to the QueryElevationComponent, I need to be able to move documents to the top of a list that match a given query. If there were no sort, then this could be implemented easily with BooleanQuery (i think) but with sort it gets more complicated. Seems like I need: sortSpec.setSort( new Sort( new SortField[] { new SortField( something that only sorts results in the boost query ), new SortField( the regular sort ) })); Is there an existing FieldComparator I should look at? Any other pointers/ideas? Thanks ryan
Re: Solr design decisions
Thanks. If you do two commits in a row, should the second do anything? Are people using it to clear caches? Bill Bell Sent from mobile On Feb 11, 2011, at 9:55 AM, Yonik Seeley yo...@lucidimagination.com wrote: On Fri, Feb 11, 2011 at 10:47 AM, Bill Bell billnb...@gmail.com wrote: You could commit on a time schedule. Like every 5 mins. If there is nothing to commit it doesn't do anything anyway. It does do something! A new searcher is opened and caches are invalidated, etc. I'd recommend normally using commitWithin instead of explicitly committing or using autocommit. -Yonik http://lucidimagination.com
Re: boosting results by a query?
We are currently a Lucene shop; the way we do it is to have these results come from a database table (where they are available in rank order). We want to move to Solr, so what I plan on doing to replicate this functionality is to write a custom request handler that will do the database query and put the results on the top of the search results before the SolrIndexSearcher is invoked. -sujit On Fri, 2011-02-11 at 16:31 -0500, Ryan McKinley wrote: I have an odd need, and want to make sure I am not reinventing a wheel... Similar to the QueryElevationComponent, I need to be able to move documents to the top of a list that match a given query. If there were no sort, then this could be implemented easily with BooleanQuery (i think) but with sort it gets more complicated. Seems like I need: sortSpec.setSort( new Sort( new SortField[] { new SortField( something that only sorts results in the boost query ), new SortField( the regular sort ) })); Is there an existing FieldComparator I should look at? Any other pointers/ideas? Thanks ryan
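The merge Sujit describes, with database-ranked documents promoted ahead of the normal results, can be sketched independently of Solr. The function below is illustrative only, not Solr API:

```python
# Sketch of elevation-style merging: promoted ids (from a database,
# already in rank order) go first; other hits keep their search order.
def elevate(promoted_ids, result_ids):
    """Return result_ids reordered so matching promoted_ids lead."""
    in_results = set(result_ids)
    promoted = [i for i in promoted_ids if i in in_results]
    promoted_set = set(promoted)
    rest = [i for i in result_ids if i not in promoted_set]
    return promoted + rest

print(elevate(["b", "d"], ["a", "b", "c", "d"]))  # -> ['b', 'd', 'a', 'c']
```

In a custom request handler this logic would run over the document ids returned by the search, before the response is written.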
Re: help with dismax query
Looks like that might be the case: if I do a search for 'with' including the dismax parameters, it returns no results, whereas a search for 'obsessed' does return results. Is there any way I can get around this behavior, or do I have something configured wrong? Might "with" be a stop word removed by one of those qf fields? That'd explain why mm=3 doesn't work, I think. Erik
Re: Monitor the QTime.
QTime is, of course, specific to the query, but it is returned in the response XML, so one could run occasional queries to figure it out. Please see http://wiki.apache.org/solr/SearchHandler Regards, Gora Yes, this could be a possibility. But then the Solr cache jumps back into the picture: I cannot simply query the system each minute with the same query, because the result would be served entirely from the internal caches. I could build a list of heavy queries to do so, but I'd love to use a more straightforward method.
Re: help with dismax query
I think I found the answer here: http://www.mail-archive.com/solr-user@lucene.apache.org/msg04433.html I think the title and description fields did not have the stopword filter applied to them, so that was causing the problem. When I took off the qf=title and qf=description fields the search works. I am rebuilding my indexes now. On Fri, Feb 11, 2011 at 3:20 PM, Tanner Postert tanner.post...@gmail.com wrote: Looks like that might be the case: if I do a search for 'with' including the dismax parameters, it returns no results, whereas a search for 'obsessed' does return results. Is there any way I can get around this behavior, or do I have something configured wrong? Might "with" be a stop word removed by one of those qf fields? That'd explain why mm=3 doesn't work, I think. Erik
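For anyone hitting the same problem: the usual fix is to make sure every field listed in qf runs the same stopword filtering at both index and query time. A sketch of what that can look like in schema.xml; the tokenizer and filter choices here are illustrative, not taken from this thread:

```xml
<!-- Sketch (schema.xml): apply identical stopword filtering to every
     field type used in qf, so dismax's mm counting sees the same
     terms in each field. -->
<fieldType name="text_general" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

A reindex is needed after changing the index-time analyzer, as Tanner notes above.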
Re: Monitor the QTime.
2011/2/11 Ryan McKinley ryan...@gmail.com You may want to check the stats via JMX. For example, http://localhost:8983/solr/core/admin/mbeans?stats=true&key=org.apache.solr.handler.StandardRequestHandler shows some basic stats info for the handler. ryan Can you access this URL from a web browser (I tried, but it doesn't work)? Or must this be used from jConsole / a custom-made Java program? Could you please point me to a good guide for implementing this JMX stuff, because I'm a newbie with JMX.
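To the browser question: the mbeans stats URL returns plain XML over HTTP, so a browser or any HTTP client works and no JMX console is required. A sketch of parsing such a response; the sample below only mirrors the general shape of a 1.4-era stats response, and the exact entry names vary by Solr version:

```python
# Sketch: parse handler stats out of an admin/mbeans?stats=true
# response. SAMPLE_STATS is a hand-written stand-in, not real output.
import xml.etree.ElementTree as ET

SAMPLE_STATS = """<response>
  <lst name="solr-mbeans">
    <lst name="QUERYHANDLER">
      <lst name="standard">
        <lst name="stats">
          <long name="requests">42</long>
          <double name="avgTimePerRequest">12.5</double>
        </lst>
      </lst>
    </lst>
  </lst>
</response>"""

def handler_stats(stats_xml):
    """Collect each query handler's stats entries into a dict."""
    root = ET.fromstring(stats_xml)
    out = {}
    for handler in root.findall(".//lst[@name='QUERYHANDLER']/lst"):
        stats = {e.get("name"): e.text
                 for e in handler.findall("./lst[@name='stats']/*")}
        out[handler.get("name")] = stats
    return out

print(handler_stats(SAMPLE_STATS))
```

Fetching the real URL with urllib (or just a browser) and feeding the body to this parser would give the periodic stats Stijn is after.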
more like this
Hi, an MLT query with a q parameter which returns multiple matches, such as q=id:45 id:34 id:54&mlt.fl=filed1&mlt.mindf=1&mlt.mintf=1&mlt=true&fl=id,name seems to return the results of three separate MLT queries, i.e. q=id:45&mlt.fl=filed1&mlt.mindf=1&mlt.mintf=1&mlt=true&fl=id,name + q=id:34&mlt.fl=filed1&mlt.mindf=1&mlt.mintf=1&mlt=true&fl=id,name + q=id:54&mlt.fl=filed1&mlt.mindf=1&mlt.mintf=1&mlt=true&fl=id,name rather than a combined similarity of all three. Is this because field1 is not storing term vectors? How best to achieve a combined-similarity MLT?
Detailed Steps for Scaling Solr
Dear all, I need to construct a site which supports searching over a large index, so I think scaling Solr is required. However, I haven't found a tutorial which helps me do that step by step. I only have two resources as references, but neither gives the exact operations: 1) http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Scaling-Lucene-and-Solr 2) David Smiley, Eric Pugh; Solr 1.4 Enterprise Search Server. If you have experience scaling Solr, could you point me to such tutorials? Thanks so much! LB