Fwd: DIH throws NullPointerException when using dataimporter.functions.escapeSql with parent entities
Hi folks, I am currently migrating our Solr servers from a 4.0.0 nightly build (approx. November 2011, which worked very well) to the newly released 4.0.0 and am running into some issues with the existing DataImportHandler configurations. Maybe you have an idea where I am going wrong here. The following lines are a highly simplified excerpt from one of the problematic imports: <entity name="path" rootEntity="false" query="SELECT p.id, IF(p.name IS NULL, '', p.name) AS name FROM path p GROUP BY p.id"> <entity name="item" rootEntity="true" query="SELECT i.*, CONVERT('${dataimporter.functions.escapeSql(path.name)}' USING utf8) AS path_name FROM items i WHERE i.path_id = ${path.id}"/> </entity> While this configuration worked without any problem for over half a year, after upgrading to 4.0.0-BETA and 4.0.0 the import throws the following stack trace and exits: SEVERE: Exception while processing: path document : null:org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.NullPointerException which is caused by Caused by: java.lang.NullPointerException at org.apache.solr.handler.dataimport.EvaluatorBag$1.evaluate(EvaluatorBag.java:79) In other words: the EvaluatorBag does not seem to resolve the given path.name variable properly and returns null. Does anyone have any idea? Appreciate your input! Regards Dom
Re: Solr 4.0 Master slave configuration in JBOSS 5.1.2
Can you please share some information on setting up Solr 4.0 as a single core? I tried doing it and keep seeing a ClassNotFoundException for KeywordTokenizerFactory on server start-up. I see the jar files being loaded in the logs, but it's unable to find the class. Can you let me know what jars reside in your Solr home lib folder? -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-4-0-Master-slave-configuration-in-JBOSS-5-1-2-tp3988375p4014683.html Sent from the Solr - User mailing list archive at Nabble.com.
Data Writing Performance of Solr 4.0
Hello everyone. I have two questions. I am considering using Solr 4.0 to perform full-text searches on the data output in real time by a Storm cluster (http://storm-project.net/). 1. In particular, I'm concerned whether Solr would be able to keep up with the 2000-message-per-second throughput of the Storm cluster. What kind of throughput would I be able to expect from Solr 4.0, for example on a Xeon 2.5GHz 4-core with HDD? 2. Also, how efficiently would Solr scale with clustering? Any pertinent information would be greatly appreciated. Hideki Higashihara
Re: Even after indexing a mysql table, in solr am not able to retrieve data after querying
On 19 October 2012 12:07, Romita Saha romita.s...@sg.panasonic.com wrote: [...] My data-config file is : <entity name="camera" query="SELECT id FROM camera"> <field column="id" name="id"/> <field column="data" name="data"/> </entity> The related schema.xml file is : <field name="id" type="integer" indexed="true" stored="true" required="true"/> <field name="data" type="string" indexed="true" stored="true" required="true"/> Your data field is required, but you are not SELECTing it from mysql. You probably want query="SELECT id, data FROM camera" Regards, Gora
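Spelled out, Gora's suggestion amounts to an entity like the one below - a sketch using the field names from the original config, not a tested configuration:

```xml
<!-- Sketch: SELECT every column that a <field> mapping refers to -->
<entity name="camera" query="SELECT id, data FROM camera">
  <field column="id" name="id"/>
  <field column="data" name="data"/>
</entity>
```

With both columns in the SELECT, the required data field receives a value and the documents should no longer fail.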
Re: Even after indexing a mysql table, in solr am not able to retrieve data after querying
The status shows that none of your 4 records were indexed: <str name="Total Documents Failed">4</str> On Fri, Oct 19, 2012 at 12:22 PM, Romita Saha romita.s...@sg.panasonic.com wrote: Hi, Even after indexing a mysql table, in solr I am not able to retrieve data after querying. Here is the status after I run http://localhost:8983/solr/db/dataimport <str name="">Indexing completed. Added/Updated: 0 documents. Deleted 0 documents.</str> <str name="Committed">2012-10-19 14:31:28</str> <str name="Total Documents Processed">0</str> <str name="Total Documents Failed">4</str> <str name="Time taken">0:0:0.524</str></lst> <str name="WARNING">This response format is experimental. It is likely to change in the future.</str> </response> My data-config file is : <entity name="camera" query="SELECT id FROM camera"> <field column="id" name="id"/> <field column="data" name="data"/> </entity> The related schema.xml file is : <field name="id" type="integer" indexed="true" stored="true" required="true"/> <field name="data" type="string" indexed="true" stored="true" required="true"/> In my database, id is of Type int(11) and data is of Type varchar(100). I am new to solr. Could anyone please help. Thanks and regards, Romita Saha -- Chandan Tamrakar
Re: Building an enterprise quality search engine using Apache Solr
Hi, your question is not easy to answer. It depends on so many things that there is no standard way to realize an enterprise solution, and the time planning depends on just as much. I can try to give you some brief notes about our solution, but there are some differences in target group and data source. I am technically responsible for the system disco (a research and discovery system) at the university library in Münster. (Excuse me, I don't want to make a promotion tour here, I earn no money with such activities :-)). Ok, in this search engine, based on Lucene, we search about 200 million articles, books, journals and so on. So we have data sources that differ in structure and also in the way of delivery. At the beginning we thought: let's buy a solution in order to avoid more or less own development work. So we bought a commercial search engine, which works on a Lucene core with a proprietary business logic in order to talk to the Lucene core. So far so good - or not so good. At that time I was the only person working on this project, and I needed nearly one and a half years full-time in order to fulfil most features and requirements. And the reason for that long time is not that I had no experience (I hope so); I have worked in this area for nearly 15 years in different companies, always as a developer in J2EE. (That's rare today, because every experienced developer wants to work as a leader or manager - that sounds better, and fewer project leaders are outsourced. Ok, other topic.) And other universities (customers) who realized a comparable search engine in that environment took as long or longer. So I am hopefully... In Germany we say "der Teufel steckt im Detail" (literally: the devil is hidden in the detail), which means you start work and, parallel to that process, the requirements mostly change - sadly, in most cases after development has built the software basis.
For example, we needed a lot of time for the fine tuning of ranking and for realizing a completely automatic mechanism to update data sources. And it is one thing to realize the search in development and run a first developer test; it is a completely different thing to make the system fit for 24/7 service and run a productive system without problems. We spent most of our time on data pre-processing because of the "shit in - shit out" problem. Working on the quality of data is expensive, but you get no appreciation for it, because everybody is occupied with search features. This requirement showed us that it is mostly impossible to avoid own development completely. The next thing is the user interface: not every feature a customer knows from good old database-backed systems is easy to realize in a search engine, because of its more or less flat data structure. So we had to develop one service after the other in order to read additional information - in our case, for example, runtime holdings information from our library. Summarized: if you want to estimate a concrete duration for realizing a complete, productive enterprise search solution, you should talk to some people with similar solutions, think about your own requirements in detail, and then multiply your estimate by 2. Then perhaps you have a realistic estimate. Dirk - my developer logs -- View this message in context: http://lucene.472066.n3.nabble.com/Building-an-enterprise-quality-search-engine-using-Apache-Solr-tp4014557p4014688.html
Re: KeeperException (NodeExists for /overseer): SolrCloud Multiple Collections - is it safe ignore these exceptions?
Thanks Mark! Cheers, Jeeva On Oct 19, 2012, at 8:35 AM, Mark Miller markrmil...@gmail.com wrote: Yes, those exceptions are fine. These are cases where we try to delete the node if it's there, but don't care if it's not there - things like that. In some of these cases, ZooKeeper logs things we can't stop, even though it's expected that sometimes we will try to remove nodes that are not there or create nodes that are already there. - Mark On Thu, Oct 18, 2012 at 9:01 AM, Jeevanandam Madanagopal je...@myjeeva.com wrote: Hello - I am doing a prototype of SolrCloud with multiple collections, where each collection represents country-level data. - searching within a collection represents country level - local search - searching across collections represents global search Attached is a graph image of the SolrCloud structure. For the prototype I'm running an embedded ZooKeeper ensemble (5 replicated ZooKeeper servers). - Searching and indexing in the respective collection works well - Searching across collections works well (for global search) While joining 'Collection2' to the ZooKeeper ensemble I noticed the following KeeperException in the log. Question: is it safe to ignore these exceptions?
Exception Log snippet: Oct 18, 2012 4:54:26 PM org.apache.zookeeper.server.NIOServerCnxn$Factory run INFO: Accepted socket connection from /fe80:0:0:0:0:0:0:1%1:62700 Oct 18, 2012 4:54:26 PM org.apache.zookeeper.server.NIOServerCnxn readConnectRequest INFO: Client attempting to establish new session at /fe80:0:0:0:0:0:0:1%1:62700 Oct 18, 2012 4:54:26 PM org.apache.zookeeper.server.NIOServerCnxn finishSessionInit INFO: Established session 0x13a73521356000a with negotiated timeout 15000 for client /fe80:0:0:0:0:0:0:1%1:62700 Oct 18, 2012 4:54:26 PM org.apache.zookeeper.server.PrepRequestProcessor pRequest INFO: Got user-level KeeperException when processing sessionid:0x13a73521356000a type:create cxid:0x1 zxid:0xfffe txntype:unknown reqpath:n/a Error Path:/overseer Error:KeeperErrorCode = NodeExists for /overseer Oct 18, 2012 4:54:26 PM org.apache.zookeeper.server.PrepRequestProcessor pRequest INFO: Got user-level KeeperException when processing sessionid:0x13a73521356000a type:create cxid:0x2 zxid:0xfffe txntype:unknown reqpath:n/a Error Path:/overseer Error:KeeperErrorCode = NodeExists for /overseer Oct 18, 2012 4:54:26 PM org.apache.zookeeper.server.PrepRequestProcessor pRequest INFO: Got user-level KeeperException when processing sessionid:0x13a73521356000a type:delete cxid:0x4 zxid:0xfffe txntype:unknown reqpath:n/a Error Path:/live_nodes/mac-book-pro.local:7500_solr Error:KeeperErrorCode = NoNode for /live_nodes/mac-book-pro.local:7500_solr Oct 18, 2012 4:54:26 PM org.apache.solr.common.cloud.ZkStateReader$3 process INFO: Updating live nodes Cheers, Jeeva -- - Mark
diversity of search results?
Hello SOLR experts, yesterday in our group we realized that a danger we may need to face is that a search result can include very similar results. Of course, one would expect some form of skimming so that near-duplicates showing almost the same content in a search result would be avoided, but we fear that this is not possible. I was wondering if some technology, plugin, or even research exists that would enable a search result to be partially reordered so that diversity is ensured, at least for the first page of results. I suppose that might be doable by processing the result page and the next (and the five next?) and pushing down some results if they are too similar to previous ones. Hope I am being clear. Paul
Re: Building an enterprise quality search engine using Apache Solr
Hi Alexandre, Yes, it is active. ManifoldCF 1.0.1 was released yesterday :) You can index content from SharePoint 2010 into Solr 4.0.0. The 'end user documentation' and the 'in action' book are the two main resources. http://manifoldcf.apache.org/release/release-1.0.1/en_US/end-user-documentation.html http://www.manning.com/wright/ --- On Fri, 10/19/12, Alexandre Rafalovitch arafa...@gmail.com wrote: From: Alexandre Rafalovitch arafa...@gmail.com Subject: Re: Building an enterprise quality search engine using Apache Solr To: solr-user@lucene.apache.org Date: Friday, October 19, 2012, 7:18 AM This is the first time I hear of this project. Looks interesting, but is it active? The integration FAQ seems to be talking about Solr 1.4, a bit out of date. Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Fri, Oct 19, 2012 at 12:37 AM, Jack Krupansky j...@basetechnology.com wrote: Take a look at Apache ManifoldCF for crawling enterprise repositories such as SharePoint (as well as lighter-weight web crawling and file system crawling). http://manifoldcf.apache.org/en_US/index.html -- Jack Krupansky -Original Message- From: Venky Naganathan Sent: Thursday, October 18, 2012 2:21 PM To: solr-user@lucene.apache.org Subject: Building an enterprise quality search engine using Apache Solr Hello, Can someone please advise me on the below? 1) I am considering building an enterprise search engine that indexes
Re: diversity of search results?
Hi Paul, yes, that's a typical problem in configuring a search engine. A solution depends on your data. Sometimes you can overcome this problem by fine tuning your search engine at the boosting level. That's not easy and always based on trial-and-error tests. Another thing you can do is try to realize a data pre-processing step which compensates for the causes of similar content in certain fields, e.g. in a title field. For example, if you have products with very similar titles and you boost such a field, the result is that you will always find all of those documents in the result list. But if you go on and add some information (perhaps out of other search fields) to this title field, you can perhaps reduce the similarity. (Typical example in my branch: book titles in different volumes; then I add the volume number and the year to the title field.) Perhaps it is also necessary to cope with a pre-processed deduplication. Here you can find an entry point: http://wiki.apache.org/solr/Deduplication Dirk - my developer logs -- View this message in context: http://lucene.472066.n3.nabble.com/diversity-of-search-results-tp4014692p4014696.html
Query related to Solr XML
Hi, I made a Solr XML data source in LucidWorks Enterprise v2.1. When I search in the Solr Admin for text, I am unable to get any results. Could you help me with this? Thanks Regards, Leena Jawale Software Engineer Trainee BFS BU Phone No. - 9762658130 Email - leena.jaw...@lntinfotech.com The contents of this e-mail and any attachment(s) may contain confidential or privileged information for the intended recipient(s). Unintended recipients are prohibited from taking action on the basis of information in this e-mail and using or disseminating the information, and must notify the sender and delete it from their system. LT Infotech will not accept responsibility or liability for the accuracy or completeness of, or the presence of any virus or disabling code in this e-mail
Saravanan Chinnadurai/Actionimages is out of the office.
I will be out of the office starting 18/10/2012 and will not return until 23/10/2012. Please email to itsta...@actionimages.com for any urgent issues. Action Images is a division of Reuters Limited and your data will therefore be protected in accordance with the Reuters Group Privacy / Data Protection notice which is available in the privacy footer at www.reuters.com Registered in England No. 145516 VAT REG: 397000555
Re: Antw: Re: How to retrieve field contents as UTF-8 from Solr-Index with SolrJ
Fetching the same records using a raw HTTP request works fine and the characters are OK. I am actually considering fetching the data in Java via raw HTTP requests + XSLTResponseWriter as a workaround, but I want to try it first the 'native' way with SolrJ. Andreas Jack Krupansky j...@basetechnology.com 18.10.2012 21:36 Have you verified that the data was indexed properly (UTF-8 encoding)? Try a raw HTTP request using the browser or curl and see how that field looks in the resulting XML. -- Jack Krupansky -Original Message- From: Andreas Kahl Sent: Thursday, October 18, 2012 1:10 PM To: j...@basetechnology.com ; solr-user@lucene.apache.org Subject: Antw: Re: How to retrieve field contents as UTF-8 from Solr-Index with SolrJ Jack, Thanks for the hint, but we have already set URIEncoding=UTF-8 on all our Tomcats, too. Regards Andreas Jack Krupansky 18.10.12 17.11 Uhr It may be that your container does not have UTF-8 enabled. For example, with Tomcat make sure your Connector element has URIEncoding=UTF-8. -- Jack Krupansky -Original Message- From: Andreas Kahl Sent: Thursday, October 18, 2012 10:53 AM To: solr-user@lucene.apache.org Subject: How to retrieve field contents as UTF-8 from Solr-Index with SolrJ Hello everyone, we are trying to implement a simple servlet querying a Solr 3.5 index with SolrJ. The query we send is an identifier in order to retrieve a single record. From the result we extract one field to return. This field contains an XML document with characters from several European and Asian alphabets, so we need UTF-8. Now we have the problem that the string returned by marcXml = results.get(0).getFirstValue("marcxml").toString(); is not valid UTF-8, so the resulting XML document is not well formed.
Here is what we do in Java: ModifiableSolrParams params = new ModifiableSolrParams(); params.set("q", query.toString()); params.set("fl", "marcxml"); params.set("rows", 1); try { QueryResponse result = server.query(params, SolrRequest.METHOD.POST); SolrDocumentList results = result.getResults(); if (!results.isEmpty()) { marcXml = results.get(0).getFirstValue("marcxml").toString(); } } catch (Exception ex) { Logger.getLogger(MarcServer.class.getName()).log(Level.SEVERE, null, ex); } Charset.defaultCharset() is UTF-8 on both the querying machine and the Solr server. Also we tried BinaryResponseParser as well as XMLResponseParser when instantiating CommonsHttpSolrServer. Does anyone have a solution to this? Is this related to https://issues.apache.org/jira/browse/SOLR-2034 ? Is there eventually a workaround? Regards Andreas
Re: Solr 4.0.0 - index version and generation not changed after delete by query on master
I wonder if you're getting hit by the browser caching the admin page and serving up the old version? What happens if you try from a different browser or purge the browser cache? Of course you have to refresh the master admin page, there's no automatic update, but I assume you did that. Best Erick On Thu, Oct 18, 2012 at 1:59 PM, Bill Au bill.w...@gmail.com wrote: Just discovered that the replication admin REST API reports the correct index version and generation: http://master_host:port/solr/replication?command=indexversion So is this a bug in the admin UI? Bill On Thu, Oct 18, 2012 at 11:34 AM, Bill Au bill.w...@gmail.com wrote: I just upgraded to Solr 4.0.0. I noticed that after a delete by query, the index version, generation, and size remain unchanged on the master even though the documents have been deleted (num docs changed and those deleted documents no longer show up in query responses). But on the slave the index version, generation, and size are all updated. So I thought the master and slave were out of sync, but in reality that is not true. What's going on here? Bill
Re: Solr 4.0 segment flush times has bigger difference between two machines
I have found that segment flushing is controlled by DocumentsWriterFlushControl, and indexing is implemented by DocumentsWriterPerThread. DocumentsWriterFlushControl has information about the number of docs and the size of the RAM buffer, but this seems to be shared by all DocumentsWriterPerThread instances. Is the RAM limit the sum of all the DocumentsWriterPerThread buffers? 2012/10/19 Jun Wang wangjun...@gmail.com Hi, I have 2 machines for a collection, and it's using DIH to import data. DIH is triggered via a URL request at one machine, let's call it A, and A will forward some of the index to machine B. Recently I have found that segment flushes happen more often on machine B. Here is part of INFOSTREAM.txt. Machine A: DWPT 0 [Thu Oct 18 20:06:20 PDT 2012; Thread-39]: flush postings as segment _4r3 numDocs=71616 DWPT 0 [Thu Oct 18 20:06:21 PDT 2012; Thread-39]: new segment has 0 deleted docs DWPT 0 [Thu Oct 18 20:06:21 PDT 2012; Thread-39]: new segment has no vectors; no norms; no docValues; prox; freqs DWPT 0 [Thu Oct 18 20:06:21 PDT 2012; Thread-39]: flushedFiles=[_4r3_Lucene40_0.prx, _4r3.fdt, _4r3.fdx, _4r3.fnm, _4r3_Lucene40_0.tip, _4r3_Lucene40_0.tim, _4r3_Lucene40_0.frq] DWPT 0 [Thu Oct 18 20:06:21 PDT 2012; Thread-39]: flushed codec=Lucene40 D Machine B -- DWPT 0 [Thu Oct 18 21:41:22 PDT 2012; http-0.0.0.0-8080-3]: flush postings as segment _zi0 numDocs=4302 DWPT 0 [Thu Oct 18 21:41:22 PDT 2012; http-0.0.0.0-8080-3]: new segment has 0 deleted docs DWPT 0 [Thu Oct 18 21:41:22 PDT 2012; http-0.0.0.0-8080-3]: new segment has no vectors; no norms; no docValues; prox; freqs DWPT 0 [Thu Oct 18 21:41:22 PDT 2012; http-0.0.0.0-8080-3]: flushedFiles=[_zi0_Lucene40_0.prx, _zi0.fdx, _zi0_Lucene40_0.tim, _zi0.fdt, _zi0.fnm, _zi0_Lucene40_0.frq, _zi0_Lucene40_0.tip] DWPT 0 [Thu Oct 18 21:41:22 PDT 2012; http-0.0.0.0-8080-3]: flushed codec=Lucene40 D I have found that a flush occurred when the number of docs in RAM reached 7~9000 on machine A, but the number on machine B is very different, almost always 4000.
It seems that every doc in the buffer uses more RAM on machine B than on machine A, which results in more flushes. Does anyone know why this happens? My conf is here: <ramBufferSizeMB>64</ramBufferSizeMB> <maxBufferedDocs>10</maxBufferedDocs> -- from Jun Wang
SimpleTextCodec usage tips?
Hi, could anybody give some direction / suggestions on how to correctly configure and use the SimpleTextCodec? http://lucene.apache.org/core/4_0_0-BETA/core/org/apache/lucene/codecs/simpletext/SimpleTextCodec.html I'd like to do some tests for debugging purposes, but I'm not sure how to enable the pluggable codecs interface. As far as I understand, I have to use the codec factory in the schema.xml, but I didn't understand where to configure and choose the specific codec. Thank you in advance (sorry if this question was posted earlier, I didn't find any post on that), Alfredo Serafini
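For what it's worth, in Solr 4.0 this is typically wired up in two places - a sketch, where the field type name string_simpletext is made up for illustration:

```xml
<!-- solrconfig.xml (sketch): enable schema-driven codec selection -->
<codecFactory class="solr.SchemaCodecFactory"/>

<!-- schema.xml (sketch): the postingsFormat attribute asks for the
     SimpleText postings format for fields of this hypothetical type -->
<fieldType name="string_simpletext" class="solr.StrField"
           postingsFormat="SimpleText"/>
```

Note that SimpleText is meant for debugging and inspection only; it is not a supported format for production indexes.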
Re: Apache Solr Quiz
Thanks for the quiz. It is refreshing. Do you plan on covering other parts of SOLR management, like the various handlers, scoring, plugins, sharding etc.? Dmitry On Wed, Oct 17, 2012 at 7:12 PM, Yulia Crowder yulia.crow...@gmail.com wrote: I love Solr! I have searched for a quiz about Solr and didn't find any on the net. I am pleased to say that I have made a quiz about Solr: http://www.quizmeup.com/quiz/apache-solr-configuration It is built on a free wiki-based quiz site. You can, and are welcome to, improve my questions and add new questions. Hope you find it a useful and enjoyable way to learn about Solr. Comments?
Re: Query related to Solr XML
Leena - It's best to ask LucidWorks related questions at http://support.lucidworks.com rather than in this e-mail list. As for your issue more information is needed in order to assist. Did you start the Solr XML crawler? Does your data source show that there are documents in the index? If you simply press search (with an empty query) do you see documents? (best, again, to respond to these questions at the LucidWorks support site) Erik On Oct 19, 2012, at 05:54 , Leena Jawale wrote: Hi, I made a Solr XML data source in lucidworks enterprise v2.1. When I search in Solr Admin for text. I am unable to get the result. Could you help me in this? Thanks Regards, Leena Jawale Software Engineer Trainee BFS BU Phone No. - 9762658130 Email - leena.jaw...@lntinfotech.com
Re: Query related to Solr XML
Leena, Please ask on the Lucid fora. You'll get better and faster help there. Otis -- Performance Monitoring - http://sematext.com/spm On Oct 19, 2012 5:54 AM, Leena Jawale leena.jaw...@lntinfotech.com wrote: Hi, I made a Solr XML data source in lucidworks enterprise v2.1. When I search in Solr Admin for text. I am unable to get the result. Could you help me in this? Thanks Regards, Leena Jawale Software Engineer Trainee BFS BU Phone No. - 9762658130 Email - leena.jaw...@lntinfotech.com
Easy question ? docs with empty geodata field
Hello, I am looking to get all documents with an empty geolocalisation field. I have not found any way to do it with ['' TO *], geodata being a specific field. Do you have any solution? Thanks, Jul -- View this message in context: http://lucene.472066.n3.nabble.com/Easy-question-docs-with-empty-geodata-field-tp4014751.html
Getting count for Multi-Select Faceting
Hi all, Congrats on the 4.0.0 delivery, it's a pleasure to work with! I have a small problem that I am trying to elegantly resolve: while using multi-select faceting it can happen that a facet is selected which is not part of the facet list (due to the limit, for example). When executing the query I then cannot get the facet's value count, as it is still outside the scope of the limit. For a sample query: http://192.168.160.2:8983/solr/select?fq={!tag=scat}category:Article&facet.field={!ex=scat}category&q=*:*&facet=true&facet.limit=5&facet.mincount=1 I have the following results: <lst name="facet_fields"> <lst name="category"> <int name="Organic Papers">6225</int> <int name="Metal-Organic Papers">3055</int> <int name="Research Papers">236</int> <int name="Inorganic Papers">187</int> <int name="Addenda and Errata">59</int> </lst> </lst> Note that the selected facet (category:Article) is not present within the facet_fields result. I've thought of running 2 facet queries, where one is not tagged, and merging the 2 lists within the UI. Is that the best solution available, or should the facet from the fq be present (as sticky) within the facet list? Cheers, _Stephane
Re: Easy question ? docs with empty geodata field
Sorry, I meant this field called geodata in my schema: <fieldType name="location" class="solr.LatLonType" subFieldSuffix="_coordinate"/> <field name="geodata" type="location" indexed="true" stored="true"/> -- View this message in context: http://lucene.472066.n3.nabble.com/Easy-question-docs-with-empty-geodata-field-tp4014751p4014752.html
Re: Data Writing Performance of Solr 4.0
On Fri, Oct 19, 2012 at 2:50 AM, higashihara_hdk higashihara_...@es-planning.jp wrote: Hello everyone. I have two questions. I am considering using Solr 4.0 to perform full searches on the data output in real-time by a Storm cluster (http://storm-project.net/). 1. In particular, I'm concerned whether Solr would be able to keep up with the 2000-message-per-second throughput of the Storm cluster. What kind of throughput would I be able to expect from Solr 4.0, for example on a Xeon 2.5GHz 4-core with HDD? It depends on the size of the messages and the analysis you will be applying. But without any other info, yes, it's possible depending on your data and how you massage it. 2. Also, how efficiently would Solr scale with clustering? That's a pretty general question. -- - Mark
Re: Easy question ? docs with empty geodata field
Hello, Did you try q=-geodata:[* TO *] ? (Note the '-' (minus).) This reads as: documents without any value for the field named geodata. Also, if you plan to use this intensively, you'd better declare a boolean field telling whether geodata is set and set a value on each doc, because the -field_name:[* TO *] form is an expensive query, especially on large data sets. Regards, -- Tanguy 2012/10/19 darul daru...@gmail.com Sorry, I meant this field called geodata in my schema: <fieldType name="location" class="solr.LatLonType" subFieldSuffix="_coordinate"/> <field name="geodata" type="location" indexed="true" stored="true"/> -- View this message in context: http://lucene.472066.n3.nabble.com/Easy-question-docs-with-empty-geodata-field-tp4014751p4014752.html
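Tanguy's boolean-flag idea could look roughly like this in schema.xml (the field name has_geodata is made up for illustration, and the flag must be populated for every document at index time):

```xml
<!-- Hypothetical flag field: set true/false on each document when indexing -->
<field name="has_geodata" type="boolean" indexed="true" stored="false"/>
```

Queries can then filter with fq=has_geodata:false (or true), which matches a single term instead of evaluating the expensive open-ended range.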
Re: Getting count for Multi-Select Faceting
Did you think of using 'facet.query'? Adding 'facet.query=category:Article' to your URL should return what you expect. Franck Brisbart Le vendredi 19 octobre 2012 à 15:18 +0200, Stephane Gamard a écrit : Hi all, Congrats on the 4.0.0 delivery, it's a pleasure to work with! I have a small problem that I am trying to elegantly resolve: while using multi-select faceting it can happen that a facet is selected which is not part of the facet list (due to the limit, for example). When executing the query I then cannot get the facet's value count, as it is still outside the scope of the limit. For a sample query: http://192.168.160.2:8983/solr/select?fq={!tag=scat}category:Article&facet.field={!ex=scat}category&q=*:*&facet=true&facet.limit=5&facet.mincount=1 I have the following results: <lst name="facet_fields"> <lst name="category"> <int name="Organic Papers">6225</int> <int name="Metal-Organic Papers">3055</int> <int name="Research Papers">236</int> <int name="Inorganic Papers">187</int> <int name="Addenda and Errata">59</int> </lst> </lst> Note that the selected facet (category:Article) is not present within the facet_fields result. I've thought of running 2 facet queries, where one is not tagged, and merging the 2 lists within the UI. Is that the best solution available, or should the facet from the fq be present (as sticky) within the facet list? Cheers, _Stephane
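Combined with the tag exclusion from the original query, the added parameter might look like this (a sketch; whether to reuse the !ex=scat exclusion depends on which count is wanted):

```text
...&facet=true&facet.limit=5&facet.mincount=1
   &facet.field={!ex=scat}category
   &facet.query={!ex=scat}category:Article
```

The facet.query count then appears under facet_queries in the response, independent of the facet.limit applied to the field facet.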
Benchmarking/Performance Testing question
Hi all, I know there have been many posts about this already and I have done my best to read through them but one lingering question remains. When doing performance testing on a Solr instance (under normal production like circumstances, not the ones where commits are happening more frequently than necessary), is there any value in performance testing against a server with caches *disabled* with a profiler hooked up to see where queries in the absence of a cache are spending the most time? The reason I am asking this is to tune things like field types, using tint vs regular int, different precision steps etc. Or maybe sorting is taking a long time and the profiler shows an inordinate amount of time spent there etc. so either we find a different way to solve that particular problem. Perhaps we are faceting on something bad etc. Then we can optimize those to at least not be as slow and then ensure that caching is tuned properly so that cache misses don't yield these expensive spikes. I'm trying to devise a proper performance testing for any new features/config changes and wanted to get some feedback on whether or not this approach makes sense. Of course performance testing against a typical production setup *with* caching will also be done to make sure things behave as expected. Thanks! Amit
Solr-4.0.0 DIH not indexing xml attributes
Hello all, I am having problems indexing xml attributes using the DIH. I have the following xml: <root> <Stuff attr1="some attr" attr2="another attr"> ... </Stuff> </root> I am using the following XPath for my fields: <field column="attr1" xpath="/root/Stuff/@attr1" /> <field column="attr2" xpath="/root/Stuff/@attr2" /> However nothing is getting inserted into my index. I am pretty sure this should work, so I have no idea what is wrong. Can anyone else confirm that this is a problem? Or is it just me? Thanks, Billy
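For comparison, here is a minimal XPathEntityProcessor config sketch that should pick those attributes up. The dataSource type, url, and forEach values are placeholders for whatever the actual import uses - they are not taken from the original post:

```xml
<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8"/>
  <document>
    <!-- forEach must point at the element that carries the attributes -->
    <entity name="stuff"
            processor="XPathEntityProcessor"
            url="/path/to/data.xml"
            forEach="/root/Stuff">
      <field column="attr1" xpath="/root/Stuff/@attr1"/>
      <field column="attr2" xpath="/root/Stuff/@attr2"/>
    </entity>
  </document>
</dataConfig>
```

If the attributes still come back empty with a config shaped like this, that would support the suspicion that it is a 4.0.0 regression rather than a config problem.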
Re: Easy question ? docs with empty geodata field
What about querying on the dynamic lat/long field to see if there are documents that do not have the dynamic _latlon0 or whatever defined? On Fri, Oct 19, 2012 at 8:17 AM, darul daru...@gmail.com wrote: I have already tried but get a nice exception because of this field type : -- View this message in context: http://lucene.472066.n3.nabble.com/Easy-question-docs-with-empty-geodata-field-tp4014751p4014763.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Sorl 4.0: ClassNotFoundException DataImportHandler
Thanks Chris for your reply. I really need some help here. 1) If I put the apache-solr-dataimporthandler-*.jar files in the solr/lib folder, the jar files are loading - I see that in the tomcat logs. But in the end it says 'ClassNotFoundException DataImportHandler'. 2) So if I remove apache-solr-dataimporthandler-*.jar from the solr/lib folder and place them in the tomcat/lib folder, there is no more ClassNotFoundException. But this time it says 'Error Instantiating Request Handler, org.apache.solr.handler.dataimport.DataImportHandler failed to instantiate org.apache.solr.request.SolrRequestHandler'. -- View this message in context: http://lucene.472066.n3.nabble.com/Sorl-4-0-ClassNotFoundException-DataImportHandler-tp4014348p4014770.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Easy question ? docs with empty geodata field
Your idea looks great, but with this schema info: <fieldType name="point" class="solr.PointType" dimension="2" subFieldSuffix="_d"/> <fieldType name="location" class="solr.LatLonType" subFieldSuffix="_coordinate"/> <fieldtype name="geohash" class="solr.GeoHashField"/> ... <field name="geodata" type="location" indexed="true" stored="true"/> <dynamicField name="*_coordinate" type="tdouble" indexed="true" stored="false" /> How can I use it? fq=location_coordinate:[1 TO *] is not working, for instance. -- View this message in context: http://lucene.472066.n3.nabble.com/Easy-question-docs-with-empty-geodata-field-tp4014751p4014779.html Sent from the Solr - User mailing list archive at Nabble.com.
Highlighter isn't highlighting what is matched in query analyzer
Hi, all. The content I'm trying to index contains dollar signs that should be indexed and matched, e.g., $1. I've set up my schema to index the dollar sign, and am able to successfully match it with the query analyzer; searching for $1 matches $1. However, the highlighter doesn't seem to recognize the dollar sign. When I submit a query for $1, the results do contain highlighted results, but the highlights appear like $<em>1</em>; the dollar sign is not highlighted. How can I ensure that the highlighter will highlight the entirety of what is matched in the query analyzer tool? -Ali
[/solr] memory leak prevent tomcat shutdown
Very often when we try to shut down tomcat, we get the following error in catalina.out indicating a solr thread cannot be stopped; tomcat ends up hanging and we have to kill -9, which we think leads to some core corruptions in our production environment. please help ... catalina.out: ... ... Oct 19, 2012 10:17:22 AM org.apache.catalina.loader.WebappClassLoader clearReferencesThreads SEVERE: The web application [/solr] appears to have started a thread named [pool-69-thread-1] but has failed to stop it. This is very likely to create a memory leak. Then I used kill -3 to signal the thread dump; here is what I get (note the thread [pool-69-thread-1] is hanging): 2012-10-19 10:18:39 Full thread dump Java HotSpot(TM) 64-Bit Server VM (20.2-b06 mixed mode): DestroyJavaVM prio=10 tid=0x55b39800 nid=0x7e82 waiting on condition [0x] java.lang.Thread.State: RUNNABLE pool-69-thread-1 prio=10 tid=0x2aaabcb41800 nid=0x19fa waiting on condition [0x4205e000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x0006de699d80 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(Unknown Source) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(Unknown Source) at java.util.concurrent.LinkedBlockingQueue.take(Unknown Source) at java.util.concurrent.ThreadPoolExecutor.getTask(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at java.lang.Thread.run(Unknown Source) JDWP Transport Listener: dt_socket daemon prio=10 tid=0x578aa000 nid=0x19f9 runnable [0x] java.lang.Thread.State: RUNNABLE ... ... -- View this message in context: http://lucene.472066.n3.nabble.com/solr-memory-leak-prevent-tomcat-shutdown-tp4014788.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: [/solr] memory leak prevent tomcat shutdown
by the way, I am running tomcat 6, solr 3.5 on redhat 2.6.18-274.el5 #1 SMP Fri Jul 8 17:36:59 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux -- View this message in context: http://lucene.472066.n3.nabble.com/solr-memory-leak-prevent-tomcat-shutdown-tp4014788p4014792.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Easy question ? docs with empty geodata field
So here is my spec for lat/long (similar to yours, except I explicitly define the sub-field names for clarity): <fieldType name="latLon" class="solr.LatLonType" subFieldSuffix="_latLon"/> <field name="location" type="latLon" indexed="true" stored="true"/> <!-- Could use dynamic fields here but prefer explicitly defining them so it's clear what's going on. The LatLonType looks to be a wrapper around these fields? --> <field name="location_0_latLon" type="tdouble" indexed="true" stored="true"/> <field name="location_1_latLon" type="tdouble" indexed="true" stored="true"/> So then the query would be location_0_latLon:[* TO *]. Looking at your schema, my guess would be: location_0_coordinate:[* TO *] location_1_coordinate:[* TO *] Let me know if that helps Amit On Fri, Oct 19, 2012 at 9:37 AM, darul daru...@gmail.com wrote: Your idea looks great but with this schema info: <fieldType name="point" class="solr.PointType" dimension="2" subFieldSuffix="_d"/> <fieldType name="location" class="solr.LatLonType" subFieldSuffix="_coordinate"/> <fieldtype name="geohash" class="solr.GeoHashField"/> ... <field name="geodata" type="location" indexed="true" stored="true"/> <dynamicField name="*_coordinate" type="tdouble" indexed="true" stored="false" /> How can I use it? fq=location_coordinate:[1 to *] not working by instance -- View this message in context: http://lucene.472066.n3.nabble.com/Easy-question-docs-with-empty-geodata-field-tp4014751p4014779.html Sent from the Solr - User mailing list archive at Nabble.com.
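A small sketch of the filter queries implied above. Note that with the schema quoted in this thread the stored field is named geodata, not location, so the LatLonType sub-fields would presumably be geodata_0_coordinate and geodata_1_coordinate - the field names here are inferred from the schema, not verified against a running instance:

```python
def presence_fq(field, present=True):
    """Build a Solr fq matching docs where `field` is (or is not) set.

    An open range [* TO *] matches any indexed value; negating it
    selects documents where the field is missing/empty.
    """
    fq = f"{field}:[* TO *]"
    return fq if present else f"-{fq}"

print(presence_fq("geodata_0_coordinate"))         # docs that have coordinates
print(presence_fq("geodata_0_coordinate", False))  # docs missing coordinates
```

The negated form answers the original question (documents with an empty geodata field) without touching the location-typed field itself, which is what threw the exception earlier in the thread.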
number and minus operator
I have a document with name ABC 102030 XYZ, and if I search for this document with ABC and -10 then I don't get this document (which is correct behavior), but when I do ABC and -10 I don't get the correct result back. Any explanation around this? -- View this message in context: http://lucene.472066.n3.nabble.com/number-and-minus-operator-tp4014794.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr 4.0.0 - index version and generation not changed after delete by query on master
It's not the browser cache. I have tried reloading the admin page and accessing the admin page from another machine. Both show the older index version and generation. On the slave, replication did kick in and shows the new index version and generation for the slave. But the slave admin page also shows the older index version and generation for the master. If I do a second delete by query on the master, the master index generation reported in the admin UI does go up by one on both the master and slave. But it is still one generation behind. Bill On Fri, Oct 19, 2012 at 7:09 AM, Erick Erickson erickerick...@gmail.com wrote: I wonder if you're getting hit by the browser caching the admin page and serving up the old version? What happens if you try from a different browser or purge the browser cache? Of course you have to refresh the master admin page, there's no automatic update, but I assume you did that. Best Erick On Thu, Oct 18, 2012 at 1:59 PM, Bill Au bill.w...@gmail.com wrote: Just discovered that the replication admin REST API reports the correct index version and generation: http://master_host:port/solr/replication?command=indexversion So is this a bug in the admin UI? Bill On Thu, Oct 18, 2012 at 11:34 AM, Bill Au bill.w...@gmail.com wrote: I just upgraded to Solr 4.0.0. I noticed that after a delete by query, the index version, generation, and size remain unchanged on the master even though the documents have been deleted (num docs changed and those deleted documents no longer show up in query responses). But on the slave the index version, generation, and size are all updated. So I thought the master and slave were out of sync, but in reality that is not true. What's going on here? Bill
Re: Solr 4.0 copyField not applying index analyzers
What exactly is the precise symptom? Give us an example with field names of source and dest and what precise value is in fact being indexed. Is the entire field value being indexed as a single term/string (if the analyzer is not being applied)? Or, what? -- Jack Krupansky -----Original Message----- From: davers Sent: Friday, October 19, 2012 2:51 PM To: solr-user@lucene.apache.org Subject: Solr 4.0 copyField not applying index analyzers I am upgrading from solr 3.6 to solr 4.0 and my copyFields do not seem to be applying the index analyzers. I'm sure there is something I'm missing in my schema.xml. I am also using a DIH but I'm not sure that matters.

<?xml version="1.0" encoding="UTF-8" ?>
<schema name="example" version="1.5">
<fields>
<field name="id" type="string" indexed="true" stored="true"/>
<field name="groupid" type="string" indexed="true" stored="false"/>
<field name="siteid" type="int" indexed="true" stored="false" multiValued="true"/>
<field name="sku" type="textTight" indexed="true" stored="true" multiValued="true"/>
<field name="upc" type="textTight" indexed="true" stored="true" multiValued="true"/>
<field name="productID" type="textTight" indexed="true" stored="true"/>
<field name="manufacturer" type="text" indexed="true" stored="true" />
<field name="productTitle" type="text" indexed="true" stored="true"/>
<field name="categoryId" type="int" indexed="true" stored="false" multiValued="true"/>
<field name="categoryName" type="text" indexed="true" stored="false" multiValued="true"/>
<field name="theme" type="text" indexed="true" stored="false"/>
<field name="description" type="text" indexed="false" stored="false"/>
<field name="weight" type="tfloat" indexed="true" stored="false"/>
<field name="price" type="tfloat" indexed="true" stored="false"/>
<field name="popularity" type="tint" indexed="true" stored="false" default="0"/>
<field name="inStock" type="boolean" indexed="true" stored="false" multiValued="true"/>
<field name="onSale" type="boolean" indexed="true" stored="false"/>
<field name="hasDigiCast" type="boolean" indexed="true" stored="false"/>
<field name="hasDigiVista" type="boolean" indexed="true" stored="false"/>
<field name="isNew" type="boolean" indexed="true" stored="false"/>
<field name="isTopSeller" type="boolean" indexed="true" stored="false"/>
<field name="finish" type="text" indexed="true" stored="true" multiValued="true"/>
<field name="masterFinish" type="text" indexed="true" stored="false" multiValued="true"/>
<field name="series" type="text" indexed="true" stored="false"/>
<field name="searchKeyword" type="text_ws" indexed="true" stored="false" multiValued="true"/>
<field name="discontinued" type="boolean" indexed="true" stored="false" />
<field name="spell" type="textSpell" indexed="true" stored="true" multiValued="true"/>
<field name="_version_" type="long" indexed="true" stored="true"/>
<field name="imageURL" type="string" indexed="false" stored="true" />
<field name="productURL" type="string" indexed="false" stored="true" />
<field name="productID_sort" type="string" indexed="true" stored="true" multiValued="false"/>
<field name="text" type="text" indexed="true" stored="true" multiValued="true"/>
<field name="modifiedDate" type="date" indexed="true" stored="true" multiValued="false" default="NOW"/>
<field name="productAddDate" type="tdate" indexed="true" stored="true" multiValued="false" default="NOW"/>
<field name="textnge" type="autocomplete_edge" indexed="true" stored="true" multiValued="true" />
<field name="textng" type="autocomplete_ngram" indexed="true" stored="true" multiValued="true" omitNorms="true" omitTermFreqAndPositions="true" />
<field name="textphon" type="text_phonetic_do" indexed="true" stored="true" multiValued="true" omitNorms="true" omitTermFreqAndPositions="true" />
<dynamicField name="*_i" type="int" indexed="true" stored="false" multiValued="true"/>
<dynamicField name="*_s" type="string" indexed="true" stored="false" multiValued="true"/>
<dynamicField name="*_l" type="long" indexed="true" stored="false" multiValued="true"/>
<dynamicField name="*_t" type="text" indexed="true" stored="false" multiValued="true"/>
<dynamicField name="*_b" type="boolean" indexed="true" stored="false" multiValued="true"/>
<dynamicField name="*_f" type="float" indexed="true" stored="false" multiValued="true"/>
<dynamicField name="*_d" type="double" indexed="true" stored="false" multiValued="true"/>
<dynamicField name="*_coordinate" type="tdouble" indexed="true" stored="false" />
<dynamicField name="*_dt" type="date" indexed="true" stored="true"/>
<dynamicField name="*_dts" type="date" indexed="true" stored="true" multiValued="true"/>
<dynamicField name="*_ti" type="tint" indexed="true" stored="true"/>
<dynamicField name="*_tl" type="tlong" indexed="true" stored="true"/>
<dynamicField name="*_tf" type="tfloat" indexed="true" stored="true"/>
<dynamicField name="*_td" type="tdouble" indexed="true" stored="true"/>
<dynamicField name="*_tdt" type="tdate" indexed="true" stored="true"/>
<dynamicField name="*_pi" type="pint" indexed="true" stored="true"/>
<dynamicField name="attr_*" type="text" indexed="true" stored="true" multiValued="true"/>
<dynamicField name="random_*" type="random" />
</fields>
<uniqueKey>id</uniqueKey>
<copyField source="productTitle" dest="text"/>
<copyField source="manufacturer" dest="text"/>
<copyField source="description" dest="text"/>
<copyField source="productID" dest="text"/>
<copyField
Re: need help with exact match search
Because you used solr.StandardTokenizerFactory, which will tokenize terms at some delimiters - such as the hyphens that surround your errant 404 case. Try solr.WhitespaceTokenizerFactory or solr.KeywordTokenizerFactory. And maybe rename your field type from text_general_trim to text_exact, since general implies a general text analyzer. Test your field type changes on the Solr Admin Analysis page. -- Jack Krupansky -----Original Message----- From: geeky2 Sent: Friday, October 19, 2012 5:20 PM To: solr-user@lucene.apache.org Subject: need help with exact match search environment: solr 3.5 Hello, i have a query for an exact match that is bringing back one (1) additional record that is NOT an exact match. when i do an exact match search for 404 - i should get back three (3) documents, *but i get back the additional record, with an itemModelNoExactMatchStr of DUS-404-19* can someone help me understand what i am missing or not setting up correctly? response from solr with 4 documents

<?xml version="1.0"?>
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">1</int>
<lst name="params">
<str name="sort">itemModelNoExactMatchStr asc</str>
<str name="fq">itemType:2</str>
<str name="echoParams">all</str>
<str name="qf">itemModelNoExactMatchStr^30.0</str>
<str name="q.alt">*:*</str>
<str name="rows">50</str>
<str name="defType">edismax</str>
<str name="debugQuery">true</str>
<str name="q">itemModelNoExactMatchStr:404</str>
<str name="qt">modelItemNoSearch</str>
<str name="rows">50</str>
<str name="facet">false</str>
</lst>
</lst>
<result name="response" numFound="4" start="0">
<doc>
<arr name="divProductTypeDesc"><str>Kitchen Equipment*</str></arr>
<str name="divProductTypeId">0212020</str>
<str name="id">0212020,0431 ,404 </str>
<str name="itemModelDesc">ELECTRIC GENERAL SLICER WITH VACU BASE</str>
<str name="itemModelNo">404</str>
<str name="itemModelNoExactMatchStr">404 </str>
<int name="itemType">2</int>
<int name="partCnt">13</int>
<arr name="plsBrandDesc"><str>GENERAL</str></arr>
<str name="plsBrandId">0431 </str>
<int name="rankNo">0</int>
</doc>
<doc>
<arr name="divProductTypeDesc"><str>Vacuum, Canister</str></arr>
<str name="divProductTypeId">0642000</str>
<str name="id">0642000,0517 ,404 </str>
<str name="itemModelDesc">HOOVER </str>
<str name="itemModelNo">404</str>
<str name="itemModelNoExactMatchStr">404 </str>
<int name="itemType">2</int>
<int name="partCnt">48</int>
<arr name="plsBrandDesc"><str>HOOVER</str></arr>
<str name="plsBrandId">0517 </str>
<int name="rankNo">0</int>
</doc>
<doc>
<arr name="divProductTypeDesc"><str>Power roller</str></arr>
<str name="divProductTypeId">0733200</str>
<str name="id">0733200,1164 ,404 </str>
<str name="itemModelDesc">POWER PAINTER</str>
<str name="itemModelNo">404</str>
<str name="itemModelNoExactMatchStr">404 </str>
<int name="itemType">2</int>
<int name="partCnt">39</int>
<arr name="plsBrandDesc"><str>WAGNER</str></arr>
<str name="plsBrandId">1164 </str>
<int name="rankNo">0</int>
</doc>
<doc>
<arr name="divProductTypeDesc"><str>Dishwasher^</str></arr>
<str name="divProductTypeId">013</str>
<str name="id">013,0164 ,DUS-404-19 </str>
<str name="itemModelDesc">DISHWASHERS</str>
<str name="itemModelNo">DUS-404-19 </str>
<str name="itemModelNoExactMatchStr">DUS-404-19 </str>
<int name="itemType">2</int>
<int name="partCnt">185</int>
<arr name="plsBrandDesc"><str>CALORIC</str></arr>
<str name="plsBrandId">0164 </str>
<int name="rankNo">0</int>
</doc>
</result>
<lst name="debug">
<str name="rawquerystring">itemModelNoExactMatchStr:404</str>
<str name="querystring">itemModelNoExactMatchStr:404</str>
<str name="parsedquery">+itemModelNoExactMatchStr:404</str>
<str name="parsedquery_toString">+itemModelNoExactMatchStr:404</str>
<lst name="explain">
<str name="0212020,0431 ,404 ">10.053003 = (MATCH) fieldWeight(itemModelNoExactMatchStr:404 in 4745495), product of: 1.0 = tf(termFreq(itemModelNoExactMatchStr:404)=1) 10.053003 = idf(docFreq=971, maxDocs=8304922) 1.0 = fieldNorm(field=itemModelNoExactMatchStr, doc=4745495)</str>
<str name="0642000,0517 ,404 ">10.053003 = (MATCH) fieldWeight(itemModelNoExactMatchStr:404 in 4781972), product of: 1.0 = tf(termFreq(itemModelNoExactMatchStr:404)=1) 10.053003 = idf(docFreq=971, maxDocs=8304922) 1.0 = fieldNorm(field=itemModelNoExactMatchStr, doc=4781972)</str>
<str name="0733200,1164 ,404 ">10.053003 = (MATCH) fieldWeight(itemModelNoExactMatchStr:404 in 8186768), product of: 1.0 = tf(termFreq(itemModelNoExactMatchStr:404)=1) 10.053003 = idf(docFreq=971, maxDocs=8304922) 1.0 = fieldNorm(field=itemModelNoExactMatchStr, doc=8186768)</str>
<str name="013,0164 ,DUS-404-19 ">5.0265017 = (MATCH)
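For reference, a minimal sketch of the exact-match field type Jack suggests. The names are illustrative, and the TrimFilter is an assumption on my part - it mirrors the trailing spaces visible in the itemModelNoExactMatchStr values above:

```xml
<fieldType name="text_exact" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- KeywordTokenizer keeps the whole value as one token, so a
         value like "DUS-404-19" no longer matches a query for "404" -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- strip the trailing spaces seen in the stored values -->
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
</fieldType>
```

As Jack notes, the Solr Admin Analysis page is the quickest way to confirm how a candidate field type tokenizes both the indexed value and the query.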
Re: need help with exact match search
hello jack, thank you very much for the reply - i will re-test and let you know. really appreciate it ;) thx mark -- View this message in context: http://lucene.472066.n3.nabble.com/need-help-with-exact-match-search-tp4014832p4014848.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Transient commit errors during autocommit
Lance, I have seen this error when the Solr process hit the maximum number of file descriptors (because the commit triggered an optimize). Make sure your maxfds is set as high as possible. In my case, 1024 was not nearly sufficient. --Casey On 10/19/12 6:20 PM, Lance Norskog wrote: When a transient error happens during an autocommit, the error does not cause a safe rollback or notify the user there was a problem. Instead, there is a write lock failure and Solr has to be restarted. It runs fine after restart. Is this a known problem? Is it fixable? Is it unit-test-able?
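A quick sketch for checking the open-file limit Casey mentions, run on the machine hosting Solr (the 1024 threshold below simply mirrors the default that proved insufficient in his case):

```python
# Inspect the process file-descriptor limits (Unix only).
import resource

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"file-descriptor limit: soft={soft}, hard={hard}")
if soft <= 1024:
    print("soft limit is at the common default; consider raising it "
          "(ulimit -n, or /etc/security/limits.conf) since an "
          "optimize-triggering commit can open many segment files at once")
```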
Re: Solr-4.0.0 DIH not indexing xml attributes
Do other fields get added? Do these fields have type problems? I.e. is 'attr1' a number and you are adding a string? There is a logging EP that I think shows the data found - I don't know how to use it. Is it possible to post the whole DIH script? - Original Message - | From: Billy Newman newman...@gmail.com | To: solr-user@lucene.apache.org | Sent: Friday, October 19, 2012 9:06:08 AM | Subject: Solr-4.0.0 DIH not indexing xml attributes | | Hello all, | | I am having problems indexing xml attributes using the DIH. | | I have the following xml: | | <root> | <Stuff attr1="some attr" attr2="another attr"> | ... | </Stuff> | </root> | | I am using the following XPath for my fields: | <field column="attr1" xpath="/root/Stuff/@attr1" /> | <field column="attr2" xpath="/root/Stuff/@attr2" /> | | | However nothing is getting inserted into my index. | | I am pretty sure this should work so I have no idea what is wrong. | | Can anyone else confirm that this is a problem? Or is it just me? | | Thanks, | Billy |
Re: Benchmarking/Performance Testing question
Hi Amit, I'm not sure I follow what you are after... Yes, seeing how queries that result in cache misses perform is valuable (esp. if you have a low cache hit rate in production). But figuring out if you chose a bad field type or a bad faceting method doesn't require profiling - you can review configs and logs and such and quickly find performance issues. In production (or dev, really, too) you can use tools like SPM for Solr or NewRelic. SPM will show you a performance breakdown over all Solr SearchComponents used in searches. NewRelic has non-free plans that also let you do on-demand profiling, so you could profile Solr in production, which can be handy. HTH, Otis -- Search Analytics - http://sematext.com/search-analytics/index.html Performance Monitoring - http://sematext.com/spm/index.html On Fri, Oct 19, 2012 at 12:02 PM, Amit Nithian anith...@gmail.com wrote: Hi all, I know there have been many posts about this already and I have done my best to read through them but one lingering question remains. When doing performance testing on a Solr instance (under normal production like circumstances, not the ones where commits are happening more frequently than necessary), is there any value in performance testing against a server with caches *disabled* with a profiler hooked up to see where queries in the absence of a cache are spending the most time? The reason I am asking this is to tune things like field types, using tint vs regular int, different precision steps etc. Or maybe sorting is taking a long time and the profiler shows an inordinate amount of time spent there etc. so either we find a different way to solve that particular problem. Perhaps we are faceting on something bad etc. Then we can optimize those to at least not be as slow and then ensure that caching is tuned properly so that cache misses don't yield these expensive spikes.
I'm trying to devise a proper performance testing for any new features/config changes and wanted to get some feedback on whether or not this approach makes sense. Of course performance testing against a typical production setup *with* caching will also be done to make sure things behave as expected. Thanks! Amit
Re: diversity of search results?
Hi Paul, We've done this for a client in the past via a custom SearchComponent and it worked well. Yes, it involved some post-processing, but on the server, not client. I *think* we saw 10% performance degradation. Otis -- Search Analytics - http://sematext.com/search-analytics/index.html Performance Monitoring - http://sematext.com/spm/index.html On Fri, Oct 19, 2012 at 3:26 AM, Paul Libbrecht p...@hoplahup.net wrote: Hello SOLR expert, yesterday in our group we realized that a danger we may need to face is that a search result includes very similar results. Of course, one would expect skimming so that duplicates that show almost the same results in a search result would be avoided but we fear that this is not possible. I was wondering if some technology, plugin, or even research was existing that would enable a search result to be partially reordered so that diversity is ensured for a first page of results at least. I suppose that might be doable by processing the result page and the next (and the five next?) and pushing down some results if they are too similar to previous ones. Hope I am being clear. Paul
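A toy sketch of the kind of post-processing described above: greedily re-rank a result page, demoting documents that are too similar to ones already kept. The similarity measure here is naive token overlap (Jaccard) purely for illustration; a real SearchComponent would use something sturdier such as shingles, MinHash signatures, or field-level comparisons.

```python
def diversify(docs, sim_threshold=0.8):
    """Re-order a page so near-duplicates sink below diverse results."""
    kept, demoted = [], []
    for doc in docs:
        tokens = set(doc.lower().split())
        too_similar = any(
            len(tokens & set(k.lower().split())) /
            max(1, len(tokens | set(k.lower().split()))) >= sim_threshold
            for k in kept
        )
        (demoted if too_similar else kept).append(doc)
    return kept + demoted  # near-duplicates pushed to the end of the page

page = ["apache solr faceting guide",
        "apache solr faceting guide",   # exact duplicate
        "lucene scoring internals"]
print(diversify(page))  # the duplicate is demoted below the distinct result
```

Running it over the first page (or, as Paul suggests, the first few pages concatenated) and re-slicing gives a diversified first page; this matches the roughly 10% overhead Otis reports, since each kept document is compared against at most a page's worth of predecessors.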
Re: DIH throws NullPointerException when using dataimporter.functions.escapeSql with parent entities
If it worked before and does not work now, I don't think you are doing anything wrong :) Do you have a different version of your JDBC driver? Can you make a unit test with a minimal DIH script and schema? Or, scan through all of the JIRA issues against the DIH from your old Solr capture date. - Original Message - | From: Dominik Siebel m...@dsiebel.de | To: solr-user@lucene.apache.org | Sent: Thursday, October 18, 2012 11:22:54 PM | Subject: Fwd: DIH throws NullPointerException when using dataimporter.functions.escapeSql with parent entities | | Hi folks, | | I am currently migrating our Solr servers from a 4.0.0 nightly build | (approx. November 2011, which worked very well) to the newly released | 4.0.0 and am running into some issues concerning the existing | DataImportHandler configurations. Maybe you have an idea where I am | going wrong here. | | The following lines are a highly simplified excerpt from one of the | problematic imports: | | <entity name="path" rootEntity="false" query="SELECT p.id, IF(p.name | IS NULL, '', p.name) AS name FROM path p GROUP BY p.id"> | | <entity name="item" rootEntity="true" query=" | SELECT | i.*, | | CONVERT('${dataimporter.functions.escapeSql(path.name)}' USING | utf8) AS path_name | FROM items i | WHERE i.path_id = ${path.id}" /> | | </entity> | | While this configuration worked without any problem for over half a | year now, when upgrading to 4.0.0-BETA and 4.0.0 the import throws the | following stacktrace and exits: | | SEVERE: Exception while processing: path document : | null:org.apache.solr.handler.dataimport.DataImportHandlerException: | java.lang.NullPointerException | | which is caused by | | Caused by: java.lang.NullPointerException | at | org.apache.solr.handler.dataimport.EvaluatorBag$1.evaluate(EvaluatorBag.java:79) | | In other words: The EvaluatorBag doesn't seem to resolve the given | path.name variable properly and returns null. | | Does anyone have any idea? | Appreciate your input! | | Regards | Dom |