Text search within facets?
Hello, is it possible to do a text search within facets? Something that will return the words Solr used to gather my results and how many of those results were found. For example, if I have the following field:

<field name="dog" type="string" indexed="true" stored="true"/>

and it has docs that contain something like:

<str name="dog">english bulldog</str>
<str name="dog">french bulldog</str>
<str name="dog">bichon frise</str>

If I search for english bulldog and facet on dog, I will get the following:

<int name="english bulldog">135</int>
<int name="french bulldog">23</int>
<int name="bichon frise">12</int>

But I really want only the ones that contain the words english or bulldog, like:

<int name="english bulldog">135</int>
<int name="french bulldog">23</int>

Thanks for your help! -- View this message in context: http://old.nabble.com/Text-search-within-facets--tp27560090p27560090.html Sent from the Solr - User mailing list archive at Nabble.com.
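[A sketch, not from the thread: Solr 1.4 has no built-in substring filter for facet values, but the facet.prefix parameter restricts the returned constraints to those starting with a given string. Assuming the default /select handler:]

```
http://localhost:8983/solr/select?q=english+bulldog&facet=true&facet.field=dog&facet.prefix=english
```

This is prefix-only matching, so "french bulldog" would not be returned for prefix=english. Since a string field facets on the whole stored value, a common workaround is to copyField into a tokenized field and facet on that instead.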
Re: How to reindex data without restarting server
Hi, thanks! This is very useful :) :) On Fri, Feb 12, 2010 at 7:55 AM, Joe Calderon <calderon@gmail.com> wrote: If you use the core model via solr.xml you can reload a core without having to restart the servlet container: http://wiki.apache.org/solr/CoreAdmin On 02/11/2010 02:40 PM, Emad Mushtaq wrote: Hi, I would like to know if there is a way of reindexing data without restarting the server. Let's say I make a change in the schema file. That would require me to reindex data. Is there a solution to this? -- Muhammad Emad Mushtaq http://www.emadmushtaq.com/
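[For reference, a core reload via the CoreAdmin handler is a plain HTTP call; a sketch, assuming a core named core0 registered in solr.xml:]

```
http://localhost:8983/solr/admin/cores?action=RELOAD&core=core0
```

Note that a reload only picks up configuration changes; a schema change that affects how existing data was indexed still requires reindexing the documents themselves.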
EmbeddedSolrServer vs CommonsHttpSolrServer
Hi all, I am new to Solr/solrj. I correctly started up the example server given in the distribution (apache-solr-1.4.0\example\solr), populated the index with the test data set, and successfully tested with an HTTP query string via browser (e.g. http://localhost:8983/solr/select/?indent=on&q=video&fl=name,id). I am trying to set up solrj clients using both CommonsHttpSolrServer and EmbeddedSolrServer. My examples use a single-core configuration. Here is the method used for CommonsHttpSolrServer initialization:

[code.1]
public SolrServer getCommonsHttpSolrServer() throws IOException,
        ParserConfigurationException, SAXException, SolrServerException {
    String url = "http://localhost:8983/solr";
    CommonsHttpSolrServer server = new CommonsHttpSolrServer(url);
    server.setSoTimeout(1000);  // socket read timeout
    server.setConnectionTimeout(100);
    server.setDefaultMaxConnectionsPerHost(100);
    server.setMaxTotalConnections(100);
    server.setFollowRedirects(false);  // defaults to false
    // allowCompression defaults to false.
    // Server side must support gzip or deflate for this to have any effect.
    server.setAllowCompression(true);
    server.setMaxRetries(1);  // defaults to 0. > 1 not recommended.
    return server;
}

Here is the method used for EmbeddedSolrServer initialization (provided in the wiki section):

[code.2]
public SolrServer getEmbeddedSolrServer() throws IOException,
        ParserConfigurationException, SAXException, SolrServerException {
    System.setProperty("solr.solr.home", "/WORKSPACE/bin/apache-solr-1.4.0/example/solr");
    CoreContainer.Initializer initializer = new CoreContainer.Initializer();
    CoreContainer coreContainer = initializer.initialize();
    EmbeddedSolrServer server = new EmbeddedSolrServer(coreContainer, "");
    return server;
}

Here is the common code used to query the server:

[code.3]
SolrServer server = mintIdxMain.getEmbeddedSolrServer();
//SolrServer server = mintIdxMain.getCommonsHttpSolrServer();
SolrQuery query = new SolrQuery("video");
QueryResponse rsp = server.query(query);
SolrDocumentList docs = rsp.getResults();
System.out.println("Found: " + docs.getNumFound());
System.out.println("Start: " + docs.getStart());
System.out.println("Max Score: " + docs.getMaxScore());

CommonsHttpSolrServer gives correct results whereas EmbeddedSolrServer always gives no results. What's wrong with the initialization and/or the configuration of the EmbeddedSolrServer? CoreContainer.Initializer() seems not to recognize the single core from solrconfig.xml... If I modify [code.2] with the following code, it seems to work. I only added explicit core container registration manually. Is [code.4] the correct way?
[code.4]
public SolrServer getEmbeddedSolrServer() throws IOException,
        ParserConfigurationException, SAXException, SolrServerException {
    System.setProperty("solr.solr.home", "/WORKSPACE/bin/apache-solr-1.4.0/example/solr");
    CoreContainer.Initializer initializer = new CoreContainer.Initializer();
    CoreContainer coreContainer = initializer.initialize();
    /* */
    SolrConfig solrConfig = new SolrConfig("/WORKSPACE/bin/apache-solr-1.4.0/example/solr",
            "solrconfig.xml", null);
    IndexSchema indexSchema = new IndexSchema(solrConfig, "schema.xml", null);
    CoreDescriptor coreDescriptor = new CoreDescriptor(coreContainer, "",
            solrConfig.getResourceLoader().getInstanceDir());
    SolrCore core = new SolrCore(null, "/WORKSPACE/bin/apache-solr-1.4.0/example/solr/data",
            solrConfig, indexSchema, coreDescriptor);
    coreContainer.register("", core, false);
    /* */
    EmbeddedSolrServer server = new EmbeddedSolrServer(coreContainer, "");
    return server;
}

Many thanks in advance for the support and the great work realized with all the Lucene/Solr projects. Dino. --
inconsistency between analysis.jsp and actual search
Hi, I am indexing the name "FC St. Gallen" using the following type:

<fieldType name="prefix_token" class="solr.TextField" positionIncrementGap="1">
  <analyzer type="index">
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="20"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
  </analyzer>
</fieldType>

which, according to analysis.jsp, gets split into: f | fc | s | st | g | ga | gal | gall | galle | gallen. So far so good. Now if I search for "fc st.gallen", according to analysis.jsp it will search for: fc | st | gallen. But when I do a dismax search using the following handler:

<requestHandler name="auto" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <str name="qf">name firstname email^0.5 telefon^0.5 city^0.6 street^0.6</str>
    <str name="fl">id,type,name,firstname,zipcode,city,street,urlizedname</str>
  </lst>
</requestHandler>

I do not get a match.
Looking at the debug of the query I can see that it's actually splitting the query into fc and "st gallen":

<str name="rawquerystring">fc st.gallen</str>
<str name="querystring">fc st.gallen</str>
<str name="parsedquery">+((DisjunctionMaxQuery((telefon:fc^0.5 | firstname:fc | email:fc^0.5 | street:fc^0.6 | city:fc^0.6 | name:fc)) DisjunctionMaxQuery((telefon:"st gallen"^0.5 | firstname:"st gallen" | email:"st gallen"^0.5 | street:"st gallen"^0.6 | city:"st gallen"^0.6 | name:"st gallen")))~2) ()</str>
<str name="parsedquery_toString">+(((telefon:fc^0.5 | firstname:fc | email:fc^0.5 | street:fc^0.6 | city:fc^0.6 | name:fc) (telefon:"st gallen"^0.5 | firstname:"st gallen" | email:"st gallen"^0.5 | street:"st gallen"^0.6 | city:"st gallen"^0.6 | name:"st gallen"))~2) ()</str>

What's going on there? regards, Lukas Kahwe Smith m...@pooteeweet.org
Re: EmbeddedSolrServer vs CommonsHttpSolrServer
I suspect this has something to do with the dataDir setting in the example's solrconfig.xml:

<dataDir>${solr.data.dir:./solr/data}</dataDir>

We use the example's solrconfig.xml as the base for our deployments and always comment this out. The default of having conf and data sitting under the Solr home works well.

----- Original Message ----- From: dcdmailbox-i...@yahoo.it To: solr-user@lucene.apache.org Sent: Friday, 12 February, 2010 8:30:57 AM Subject: EmbeddedSolrServer vs CommonsHttpSolrServer [...]
Local Solr Inconsistent results for radius
Hello, I have a question related to Local Solr. For certain locations (latitude, longitude), the spatial search does not work. Here is the query I try to make, which gives me no results:

q=*&qt=geo&sort=geo_distance asc&lat=33.718151&long=73.060547&radius=450

However if I make the same query with radius=449, it gives me results. Here is the part of my solrconfig.xml containing startTier and endTier:

<updateRequestProcessorChain>
  <processor class="com.pjaol.search.solr.update.LocalUpdateProcessorFactory">
    <str name="latField">latitude</str>   <!-- The field used to store your latitude -->
    <str name="lngField">longitude</str>  <!-- The field used to store your longitude -->
    <int name="startTier">9</int>
    <int name="endTier">17</int>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory"/>
  <processor class="solr.LogUpdateProcessorFactory"/>
</updateRequestProcessorChain>

What do I need to do to fix this problem? -- Muhammad Emad Mushtaq http://www.emadmushtaq.com/
Re: inconsistency between analysis.jsp and actual search
On 12.02.2010, at 11:17, Ahmet Arslan wrote: analysis.jsp does not do actual query parsing. It just shows the produced tokens step by step in the analysis (charfilter, tokenizer, tokenfilter) phase. The admin/analysis.jsp page will show you how your field is processed while indexing and while querying, and whether a particular query matches. [1] [1] http://wiki.apache.org/solr/FAQ#My_search_returns_too_many_.2BAC8_too_little_.2BAC8_unexpected_results.2C_how_to_debug.3F

I see, that's good to know. Maybe even something that should be noted on the analysis.jsp page itself. Anyway, how can I get st.gallen split into two terms at query time?

<fieldType name="prefix_token" class="solr.TextField" positionIncrementGap="1">
  <analyzer type="index">
    ...
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
  </analyzer>
</fieldType>

It seems I should probably use solr.StandardTokenizerFactory anyway, but for this case it wouldn't help either. regards, Lukas Kahwe Smith m...@pooteeweet.org
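[A hedged suggestion, not from the thread: break the token on the dot before tokenization, e.g. with a PatternReplaceCharFilterFactory at the head of the query analyzer, assuming that factory is available in your Solr version. A sketch:]

```xml
<analyzer type="query">
  <!-- Replace "." with a space so "st.gallen" reaches the tokenizer as "st gallen". -->
  <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="\." replacement=" "/>
  <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
```

Note this only changes analysis; the query parser may still turn the two tokens produced from one whitespace-separated "word" into a phrase query, which is a separate issue.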
optimize is taking too much time
Hi, in my Solr I have 1,42,45,223 records taking some 50GB. Now when I am loading a new record and it tries to optimize the docs, it takes too much memory and time. Can anybody please tell me whether we have any property in Solr to get rid of this? Thanks in advance --
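[A hedged option, not from the thread: instead of a full optimize down to one segment, Solr 1.4's update handler accepts a partial optimize via the maxSegments attribute (verify against your version). Posted to /update:]

```xml
<optimize maxSegments="10"/>
```

Merging down to 10 segments instead of 1 rewrites far less of the index, so it is much cheaper in time and I/O. Also keep in mind that an optimize can temporarily need up to roughly twice the index size in free disk space while segments are merged.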
Re: EmbeddedSolrServer vs CommonsHttpSolrServer
Yes, you are right. [code.2] works fine after commenting out the following lines in solrconfig.xml:

<!-- Used to specify an alternate directory to hold all index data
     other than the default ./data under the Solr home.
     If replication is in use, this should match the replication configuration. -->
<!-- <dataDir>${solr.data.dir:./solr/data}</dataDir> -->

Is this different behaviour of EmbeddedSolrServer correct? Or can it be considered a low-priority bug? Thanks for your prompt reply! Dino.

----- Original Message ----- From: Ron Chan rc...@i-tao.com To: solr-user@lucene.apache.org Sent: Friday, 12 February 2010, 11:14:58 Subject: Re: EmbeddedSolrServer vs CommonsHttpSolrServer [...]
Re: EmbeddedSolrServer vs CommonsHttpSolrServer
When using EmbeddedSolrServer, you could simply set the solr.data.dir system property, or launch your process from the same working directory where you are launching the HTTP version of Solr. Either of those should also work to alleviate this issue. Erik

On Feb 12, 2010, at 5:36 AM, dcdmailbox-i...@yahoo.it wrote: [...]
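[Erik's system-property suggestion can be sketched in a couple of lines; the path below is the hypothetical one from this thread, so adjust it to your installation:]

```java
// Point solr.data.dir at the same index directory the HTTP Solr instance
// uses, BEFORE the embedded CoreContainer is initialized. The
// ${solr.data.dir:./solr/data} substitution in solrconfig.xml will then
// resolve to this directory instead of a path relative to the working dir.
public class EmbeddedDataDir {
    public static void main(String[] args) {
        System.setProperty("solr.data.dir",
                "/WORKSPACE/bin/apache-solr-1.4.0/example/solr/data");
        System.out.println(System.getProperty("solr.data.dir"));
    }
}
```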
Re: EmbeddedSolrServer vs CommonsHttpSolrServer
I don't think this is a bug; the default behaviour is for /data to sit under the Solr home. There should be no need to use this parameter unless it is a special case. Not sure why it is like this in the example.

----- Original Message ----- From: dcdmailbox-i...@yahoo.it To: solr-user@lucene.apache.org Sent: Friday, 12 February, 2010 10:36:41 AM Subject: Re: EmbeddedSolrServer vs CommonsHttpSolrServer [...]
Good literature on search basics
Does anyone know good literature (web resources, books, etc.) on the basics of search? I do have the Solr 1.4 and Lucene books but wanted to go into more detail on the basics. Thanks, --
persistent cache
Does Solr use some sort of a persistent cache? I do this 10 times in a loop:

* start solr
* create a core
* execute warmup query
* execute query with sort fields
* stop solr

Executing the query with sort fields takes 5-20 times longer in the first iteration than in the other 9 iterations. For instance I have a query 'hockey' with one date sort field. That takes 768 ms in the first iteration of the loop. In the next 9 iterations the query takes 52 ms. The Solr and Jetty server really stops in each iteration, so the RAM must be emptied. So the only way I can think of why this happens is that there is some persistent cache that survives the Solr restarts. Is this the case? Or why could this be? /Tim
Re: persistent cache
2010/2/12 Tim Terlegård tim.terleg...@gmail.com: [...] So the only way I can think of why this happens is that there is some persistent cache that survives the Solr restarts. Is this the case? Or why could this be?

Solr does not have a persistent cache. That is the operating system's file cache at work. -- Regards, Shalin Shekhar Mangar.
Re: Dismax phrase queries
On Fri, Feb 12, 2010 at 6:06 AM, Jason Rutherglen jason.rutherg...@gmail.com wrote: I'd like to boost an exact phrase match such as q="video poker" over q=video poker. How would I do this using dismax? I tried pre-processing video poker into "video poker" video poker, however that just gets munged by dismax into "video poker video poker"... which is wrong.

Have you tried the pf parameter? -- Regards, Shalin Shekhar Mangar.
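[For reference: the pf (phrase fields) parameter tells the dismax parser to additionally build an implicit phrase query from the whole user input and boost documents where it matches, so no query pre-processing is needed. A sketch, with hypothetical field names:]

```
q=video poker&defType=dismax&qf=name description&pf=name^10 description^5
```

Documents still match on the individual terms via qf, but documents containing the exact phrase "video poker" in name or description score higher.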
Re: spellcheck
I tried to configure spellcheck, but I still have this problem. Config:

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="classname">solr.FileBasedSpellChecker</str>
    <str name="name">file</str>
    <str name="sourceLocation">spellings.txt</str>
    <str name="characterEncoding">UTF-8</str>
    <str name="spellcheckIndexDir">./spellcheckerFile</str>
  </lst>
</searchComponent>

<requestHandler name="/spell" class="solr.SearchHandler" lazy="true">
  <lst name="file">
    <str name="spellcheck.onlyMorePopular">false</str>
    <str name="spellcheck.extendedResults">false</str>
    <str name="spellcheck.count">1</str>
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">file</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>

Maybe I have this result because I work with a dictionary? For the request 'popular' I still get 'populars', but in the dictionary I have both popular and populars! --
Re: Local Solr Inconsistent results for radius
Hi Emad, I had the same issue ( http://old.nabble.com/Spatial---Local-Solr-radius-td26943608.html ); it seems that this happens only on eastern areas of the world. Try inverting the sign of all your longitudes, or translating all your longitudes to the west. Cheers, Mauricio On Fri, Feb 12, 2010 at 7:22 AM, Emad Mushtaq emad.mush...@sigmatec.com.pk wrote: Hello, I have a question related to Local Solr. For certain locations (latitude, longitude), the spatial search does not work. Here is the query I try to make, which gives me no results:

q=*&qt=geo&sort=geo_distance asc&lat=33.718151&long=73.060547&radius=450

However, if I make the same query with radius=449, it gives me results. Here is the part of my solrconfig.xml containing startTier and endTier:

<updateRequestProcessorChain>
  <processor class="com.pjaol.search.solr.update.LocalUpdateProcessorFactory">
    <str name="latField">latitude</str>   <!-- The field used to store your latitude -->
    <str name="lngField">longitude</str>  <!-- The field used to store your longitude -->
    <int name="startTier">9</int>
    <int name="endTier">17</int>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory" />
  <processor class="solr.LogUpdateProcessorFactory" />
</updateRequestProcessorChain>

What do I need to do to fix this problem? -- Muhammad Emad Mushtaq http://www.emadmushtaq.com/
Re: inconsistency between analysis.jsp and actual search
Anyways, so how can I get st.gallen split into two terms at query time? As you mentioned in your first mail, the query st.gallen is already broken into two terms/words, but the query parser constructs a phrase query. There was a discussion about this behaviour earlier: http://www.lucidimagination.com/search/document/d41bc0ef422b9238/understanding_the_query_parser#85db37e69ef29dba
Fwd: indexing: issue with default values
In schema.xml I have fields of int type with a default value, e.g.:

<field name="postal_code" type="int" indexed="true" stored="true" default="0"/>

But when a document has no value for the postal_code field at indexing, I get the following error:

Posting file Immo.xml to http://localhost:8983/solr/update/
HTTP ERROR: 500
java.lang.NumberFormatException: For input string:
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
    at java.lang.Integer.parseInt(Integer.java:470)
    at java.lang.Integer.parseInt(Integer.java:499)
    at org.apache.solr.schema.TrieField.createField(TrieField.java:416)
    at org.apache.solr.schema.SchemaField.createField(SchemaField.java:94)
    at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:246)
    at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60)
    at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:139)
    at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
    at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
    at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
    at org.mortbay.jetty.Server.handle(Server.java:285)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
    at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:202)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
    at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
    at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">4</int></lst>
</response>

any help? thx
Re: persistent cache
2010/2/12 Shalin Shekhar Mangar shalinman...@gmail.com: 2010/2/12 Tim Terlegård tim.terleg...@gmail.com Does Solr use some sort of a persistent cache? Solr does not have a persistent cache. That is the operating system's file cache at work. Aha, that's very interesting and seems to make sense. So is the primary goal of warmup queries to allow the operating system to cache all the files in the data/index directory? Because I think the difference (768ms vs 52ms) is pretty big. I just do one warmup query and get 52 ms response on a 40 million documents index. I think that's pretty nice performance without tinkering with the caches at all. The only tinkering that seems to be needed is this operating system file caching. What's the best way to make sure that my warmup queries have cached all the files? And does a file cache have the complete file in memory? I guess it can get tough to get my 100GB index into the 16GB memory. /Tim
Re: Good literature on search basics
See http://markmail.org/thread/z5sq2jr2a6eayth4 On 12 February 2010 12:14, javaxmlsoapdev vika...@yahoo.com wrote: Does anyone know good literature(web resources, books etc) on basics of search? I do have Solr 1.4 and Lucene books but wanted to go in more details on basics. Thanks, -- View this message in context: http://old.nabble.com/Good-literature-on-search-basics-tp27562021p27562021.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: indexing: issue with default values
When a document has no value, are you still sending a postal_code field in your post to Solr? Seems like you are. Erik On Feb 12, 2010, at 8:12 AM, nabil rabhi wrote: [...]
Re: Dismax phrase queries
Was going to post that I more or less figured it out. Dismax handles this automatically with the ps parameter, which is different than the bs parameter... On Fri, Feb 12, 2010 at 3:48 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: On Fri, Feb 12, 2010 at 6:06 AM, Jason Rutherglen jason.rutherg...@gmail.com wrote: I'd like to boost an exact phrase match such as q=video poker over q=video poker. How would I do this using dismax? I tried pre-processing video poker into, video poker video poker however that just gets munged by dismax into video poker video poker... Which is wrong. Have you tried the pf parameter? -- Regards, Shalin Shekhar Mangar.
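To make the thread's resolution concrete, here is a hedged sketch of the dismax parameters discussed (the field names title and body and the boost values are placeholders, not from the thread): pf re-runs the user query as a phrase against the listed fields so documents matching the exact phrase score higher, and ps is the slop applied to those pf phrase queries (ps=0 rewards only exact adjacency).

```
q=video poker
&defType=dismax
&qf=title body
&pf=title^10 body^2
&ps=0
```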
Re: indexing: issue with default values
yes, sometimes the document has postal_code with no values, i still post it to solr 2010/2/12 Erik Hatcher erik.hatc...@gmail.com When a document has no value, are you still sending a postal_code field in your post to Solr? Seems like you are. Erik [...]
Re: Collating results from multiple indexes
Really? The last time I looked at AIE, I am pretty sure there was Solr core msgs in the logs, so I assumed it used EmbeddedSolr or something. But I may be mistaken. Anyone from Attivio here who can elaborate? Is the join stuff at Lucene level or on top of multiple Solr cores or what? -- Jan Høydahl - search architect Cominvent AS - www.cominvent.com On 11. feb. 2010, at 23.02, Otis Gospodnetic wrote: Minor correction re Attivio - their stuff runs on top of Lucene, not Solr. I *think* they are trying to patent this. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Hadoop ecosystem search :: http://search-hadoop.com/ - Original Message From: Jan Høydahl / Cominvent jan@cominvent.com To: solr-user@lucene.apache.org Sent: Mon, February 8, 2010 3:33:41 PM Subject: Re: Collating results from multiple indexes Hi, There is no JOIN functionality in Solr. The common solution is either to accept the high volume update churn, or to add client side code to build a join layer on top of the two indices. I know that Attivio (www.attivio.com) have built some kind of JOIN functionality on top of Solr in their AIE product, but do not know the details or the actual performance. Why not open a JIRA issue, if there is no such already, to request this as a feature? -- Jan Høydahl - search architect Cominvent AS - www.cominvent.com On 25. jan. 2010, at 22.01, Aaron McKee wrote: Is there any somewhat convenient way to collate/integrate fields from separate indices during result writing, if the indices use the same unique keys? Basically, some sort of cross-index JOIN? As a bit of background, I have a rather heavyweight dataset of every US business (~25m records, an on-disk index footprint of ~30g, and 5-10 hours to fully index on a decent box). Given the size and relatively stability of the dataset, I generally only update this monthly. However, I have separate advertising-related datasets that need to be updated either hourly or daily (e.g. 
today's coupon, click revenue remaining, etc.) . These advertiser feeds reference the same keyspace that I use in the main index, but are otherwise significantly lighter weight. Importing and indexing them discretely only takes a couple minutes. Given that Solr/Lucene doesn't support field updating, without having to drop and re-add an entire document, it doesn't seem practical to integrate this data into the main index (the system would be under a constant state of churn, if we did document re-inserts, and the performance impact would probably be debilitating). It may be nice if this data could participate in filtering (e.g. only show advertisers), but it doesn't need to participate in scoring/ranking. I'm guessing that someone else has had a similar need, at some point? I can have our front-end query the smaller indices separately, using the keys returned by the primary index, but would prefer to avoid the extra sequential roundtrips. I'm hoping to also avoid a coding solution, if only to avoid the maintenance overhead as we drop in new builds of Solr, but that's also feasible. Thank you for your insight, Aaron
Re: indexing: issue with default values
thanx Erik, that was very helpful 2010/2/12 Erik Hatcher erik.hatc...@gmail.com That would be the problem then, I believe. Simply don't post a value to get the default value to work. Erik On Feb 12, 2010, at 10:18 AM, nabil rabhi wrote: yes, sometimes the document has postal_code with no values, i still post it to solr [...]
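To illustrate Erik's fix, a sketch of an add message that simply omits the empty field so the schema default of 0 is applied (the id field and its value here are hypothetical examples):

```xml
<add>
  <doc>
    <field name="id">immo-1</field>
    <!-- No postal_code element at all: Solr fills in default="0" from schema.xml.
         Posting an empty <field name="postal_code"></field> instead is what
         triggers the NumberFormatException in this thread. -->
  </doc>
</add>
```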
Re: persistent cache
One solution is to add the persistent cache with memcache at the application layer. -- Tommy Chheng Programmer and UC Irvine Graduate Student Twitter @tommychheng http://tommy.chheng.com On 2/12/10 5:19 AM, Tim Terlegård wrote: [...]
Re: Text search within facets?
For example, if I have the following field:

<field name="dog" type="string" indexed="true" stored="true"/>

and it has docs that contain something like

<str name="dog">english bulldog</str>
<str name="dog">french bulldog</str>
<str name="dog">bichon frise</str>

If I search for english bulldog and facet on dog, I will get the following:

<int name="english bulldog">135</int>
<int name="french bulldog">23</int>
<int name="bichon frise">12</int>

That's strange. The query english bulldog should return only <str name="dog">english bulldog</str>, since the type of dog is string, which is not tokenized. What is your default search field defined in schema.xml? Can you try q=dog:"english bulldog"&facet=true&facet.field=dog&facet.mincount=1
expire/delete documents
Hi, Is there a way for Solr or Lucene to expire documents based on a field in a document? Let's say that I have a createTime field whose type is date; can I set a policy in schema.xml for Solr to delete the documents older than X days? Thank you
Re: Local Solr Inconsistent results for radius
Hello Mauricio, Do you know why such a problem occurs? Has it to do with certain latitudes/longitudes? If so, why is it happening? Is it a bug in Local Solr? On Fri, Feb 12, 2010 at 5:50 PM, Mauricio Scheffer mauricioschef...@gmail.com wrote: Hi Emad, I had the same issue ( http://old.nabble.com/Spatial---Local-Solr-radius-td26943608.html ), it seems that this happens only on eastern areas of the world. Try inverting the sign of all your longitudes, or translate all your longitudes to the west. Cheers, Mauricio [...] -- Muhammad Emad Mushtaq http://www.emadmushtaq.com/
Re: expire/delete documents
You could easily have a scheduled job that ran delete by query to remove posts older than a certain date... On Fri, Feb 12, 2010 at 13:00, Matthieu Labour matthieu_lab...@yahoo.com wrote: HiIs there a way for solr or lucene to expire documents based on a field in a document. Let's say that I have a createTime field whose type is date, can i set a policy in schema.xml for solr to delete the documents older than X days?Thank you
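A sketch of the delete-by-query such a scheduled job could post to /update, assuming the date field is named createTime as in the question and a hypothetical 30-day retention window (NOW-30DAYS is standard Solr date math):

```xml
<delete><query>createTime:[* TO NOW-30DAYS]</query></delete>
<!-- follow with a <commit/> so the deletes become visible to searchers -->
```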
Re: Deleting spelll checker index
Hi guys, opening this thread again. I need to get around this issue. I have a spellcheck field defined and I am copying two fields, make and model, to this field:

<copyField source="make" dest="spellText"/>
<copyField source="model" dest="spellText"/>

I have buildOnCommit and buildOnOptimize set to true, hence when I index data and try to search for a word accod I get back the suggestion accord, since model is also being copied. I stopped the Solr server and removed the copyField for model; now I only copy make to the spellText field, and started the Solr server. I refreshed the dictionary by issuing the following command: spellcheck.build=true&spellcheck.dictionary=default So I hoped it would rebuild my dictionary, but the strange thing is that it still gives a suggestion for accrd. I have to reindex data again, and then it won't offer me a suggestion, which is the correct behaviour. How can I recreate the dictionary after changing my schema by issuing spellcheck.build=true&spellcheck.dictionary=default? I can't afford to reindex data every time. Any answer ASAP will be appreciated. Thanks darniz darniz wrote: Then i assume the easiest way is to delete the directory itself. darniz hossman wrote: : We are using Index based spell checker. : i was wondering with the help of any url parameters can we delete the spell : check index directory. I don't think so. You might be able to configure two different spellcheck components that point at the same directory -- one that builds off of a real field, and one that builds off of an (empty) text field (using FileBasedSpellChecker) .. then you could trigger a rebuild of an empty spell checking index using the second component. But i've never tried it so i have no idea if it would work. -Hoss -- View this message in context: http://old.nabble.com/Deleting-spelll-checker-index-tp27376823p27567465.html Sent from the Solr - User mailing list archive at Nabble.com.
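For reference, the rebuild request from the message above written out as a full URL (the host, port, query term, and /spell handler name are assumptions based on the examples earlier in this digest):

```
http://localhost:8983/solr/spell?q=accod&spellcheck=true&spellcheck.build=true&spellcheck.dictionary=default
```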
Re: Local Solr Inconsistent results for radius
Yes, it seems to be a bug, at least with the code you and I are using. If you don't need to search across the whole globe, try translating your longitudes as I suggested. On Fri, Feb 12, 2010 at 3:04 PM, Emad Mushtaq emad.mush...@sigmatec.com.pk wrote: [...]
Re: persistent cache
Hi Tim, We generally run about 1600 cache-warming queries to warm up the OS disk cache and the Solr caches when we mount a new index. Do you have/expect phrase queries? If you don't, then you don't need to get any position information into your OS disk cache. Our position information takes about 85% of the total index size (*prx files). So with a 100GB index, your *frq files might only be 15-20GB and you could probably get more than half of that in 16GB of memory. If you have limited memory and a large index, then you need to choose cache warming queries carefully as once the cache is full, further queries will start evicting older data from the cache. The tradeoff is to populate the cache with data that would require the most disk access if the data was not in the cache versus populating the cache based on your best guess of what queries your users will execute. A good overview of the issues is the paper by Baeza-Yates ( http://doi.acm.org/10.1145/1277741.125 The Impact of Caching on Search Engines ) Tom Burton-West Digital Library Production Service University of Michigan Library -- View this message in context: http://old.nabble.com/persistent-cache-tp27562126p27567840.html Sent from the Solr - User mailing list archive at Nabble.com.
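A minimal sketch of warming the OS file cache outside of Solr, assuming a POSIX shell; the index path is a hypothetical example. Reading each index file once pulls its blocks into the page cache, so later Solr reads hit memory instead of disk (per Tom's point, you might skip the *.prx files if you never run phrase queries):

```shell
#!/bin/sh
# Read every file in the index directory once so the OS page cache
# holds its blocks. Pass the index directory as the first argument.
INDEX_DIR="${1:-data/index}"    # hypothetical default path
for f in "$INDEX_DIR"/*; do
    [ -f "$f" ] || continue     # skip if the glob matched nothing
    cat "$f" > /dev/null        # read the file, discard the bytes
done
echo "warmed: $INDEX_DIR"
```

Note the caveat from the thread: the page cache is finite, so with a 100GB index and 16GB of RAM only part of the index can stay resident, and reading everything may evict hotter data.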
Re: Has anyone prepared a general purpose synonyms.txt for search engines
Hi, at openthesaurus.org (or .com) you can find a MySQL version of synonyms; you just have to join it yourself to fit the synonym schema of Solr. On 12.02.2010 at 20:03, Emad Mushtaq wrote: Hi, I was wondering if anyone has prepared a synonyms.txt for general purpose search engines, that can be shared. If not could you refer me to places where such a synonym list or thesaurus can be found. Synonyms for search engines are different from the regular thesaurus. Any help would be highly appreciated. Thanks. -- Muhammad Emad Mushtaq http://www.emadmushtaq.com/ Kind regards, Julian Hille
Re: Has anyone prepared a general purpose synonyms.txt for search engines
Wow, thanks!! You all are awesome! :D :D On Sat, Feb 13, 2010 at 12:32 AM, Julian Hille jul...@netimpact.de wrote: Hi, at openthesaurus.org or .com you can find a MySQL version of synonyms; you just have to join it yourself to fit the synonym schema of Solr. [...] -- Muhammad Emad Mushtaq http://www.emadmushtaq.com/
Re: Has anyone prepared a general purpose synonyms.txt for search engines
Hi, you're welcome. That's something Google came up with some weeks ago :) On 12.02.2010 at 20:42, Emad Mushtaq wrote: Wow, thanks!! You all are awesome! :D :D [...] Kind regards, Julian Hille --- NetImpact KG Altonaer Straße 8 20357 Hamburg Tel: 040 / 6738363 2 Mail: jul...@netimpact.de Managing Director: Tarek Müller
Re: implementing profanity detector
On Thu, Feb 11, 2010 at 10:49 AM, Grant Ingersoll gsing...@apache.org wrote: Otherwise, I'd do it via copy fields. Your first field is your main field and is analyzed as before. Your second field does the profanity detection and simply outputs a single token at the end, safe/unsafe. How long are your documents? The extra copy field is extra work, but in this case it should be fast as you should be able to create a pretty streamlined analyzer chain for the second task. The documents are web page text, so they shouldn't be more than 10-20k generally. Would something like this do the trick?

  @Override
  public boolean incrementToken() throws IOException {
    while (input.incrementToken()) {
      if (profanities.contains(termAtt.termBuffer(), 0, termAtt.termLength())) {
        termAtt.setTermBuffer("y", 0, 1);
        return false;
      }
    }
    termAtt.setTermBuffer("n", 0, 1);
    return false;
  }

mike
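For what it's worth, the core of that scan can be sketched without any Lucene types. The following is a hedged, Lucene-independent illustration of the intended reduce-the-stream-to-one-flag logic (the word list is a harmless stand-in; in a real TokenFilter the flag token would typically be emitted by returning true once, then false on the next call):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ProfanityFlag {
    // Stand-in "profanity" list for illustration only.
    private static final Set<String> PROFANITIES =
            new HashSet<String>(Arrays.asList("darn", "heck"));

    // Consume the whole token stream (here just a list of strings) and
    // reduce it to a single "y"/"n" flag, mirroring the safe/unsafe
    // copyField idea from the thread.
    public static String flag(List<String> tokens) {
        for (String token : tokens) {
            if (PROFANITIES.contains(token)) {
                return "y";  // unsafe: the first hit decides
            }
        }
        return "n";          // safe: no listed word seen
    }

    public static void main(String[] args) {
        System.out.println(flag(Arrays.asList("what", "the", "heck"))); // y
        System.out.println(flag(Arrays.asList("hello", "world")));      // n
    }
}
```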
Re: For caches, any reason to not set initialSize and size to the same value?
On Fri, Feb 12, 2010 at 5:23 PM, Jay Hill jayallenh...@gmail.com wrote: If I've done a lot of research and have a very good idea of where my cache sizes max out, having monitored the stats right before commits, is there any reason why I wouldn't just set the initialSize and size counts to the same values? Is there any reason to set a smaller initialSize if I know reliably where my limit will almost always be? Probably not much... The only savings will be the 8 bytes (on a 64-bit proc) per unused array slot (in the HashMap). Maybe we should consider removing the initialSize param from the example config to reduce the amount of stuff a user needs to think about. -Yonik http://www.lucidimagination.com
reloading sharedlib folder
when using solr.xml, you can specify a sharedlib directory to share among cores, is it possible to reload the classes in this dir without having to restart the servlet container? it would be useful to be able to make changes to those classes on the fly or be able to drop in new plugins
RE: For caches, any reason to not set initialSize and size to the same value?
Funny, Arrays.copy() for HashMap... but something similar... Anyway, I use the same values for initial size and max size, to be safe... and to hit any OOM at startup :) -Original Message- From: Fuad Efendi [mailto:f...@efendi.ca] Sent: February-12-10 6:55 PM To: solr-user@lucene.apache.org; yo...@lucidimagination.com Subject: RE: For caches, any reason to not set initialSize and size to the same value? I always use initial size = max size, just to avoid Arrays.copyOf()... The initial (default) capacity for a HashMap is 16; when that is not enough there's an array copy to a new 32-element array, then to 64, ... - too much wasted space! (same for ConcurrentHashMap) Excuse me if I didn't understand the question... -Fuad http://www.tokenizer.ca -Original Message- From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley Sent: February-12-10 6:30 PM To: solr-user@lucene.apache.org Subject: Re: For caches, any reason to not set initialSize and size to the same value? On Fri, Feb 12, 2010 at 5:23 PM, Jay Hill jayallenh...@gmail.com wrote: If I've done a lot of research and have a very good idea of where my cache sizes max out, having monitored the stats right before commits, is there any reason why I wouldn't just set the initialSize and size counts to the same values? Is there any reason to set a smaller initialSize if I know reliably where my limit will almost always be? Probably not much... The only savings will be the 8 bytes (on a 64-bit proc) per unused array slot (in the HashMap). Maybe we should consider removing the initialSize param from the example config to reduce the amount of stuff a user needs to think about. -Yonik http://www.lucidimagination.com
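Fuad's wasted-copies point can be made concrete with a little arithmetic. This sketch assumes HashMap's documented default load factor of 0.75 and capacity doubling; actual JDK internals may differ in detail:

```java
public class CacheSizing {
    // Count how many times a HashMap-style backing array would be
    // doubled to hold n entries, given an initial capacity and the
    // default 0.75 load factor. Pure arithmetic illustration.
    public static int resizesNeeded(int n, int initialCapacity) {
        int capacity = initialCapacity;
        int resizes = 0;
        while (capacity * 0.75 < n) {
            capacity *= 2;
            resizes++;
        }
        return resizes;
    }

    public static void main(String[] args) {
        // Default 16-slot map growing to hold 512 cached entries:
        System.out.println(resizesNeeded(512, 16));   // 6 array copies
        // Presized at the max cache size: no copies at all.
        System.out.println(resizesNeeded(512, 1024)); // 0
    }
}
```

So with the default initial capacity, a 512-entry cache pays for six intermediate array copies that presizing avoids entirely.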
Re: Deleting spelll checker index
Any update on this? Do you guys want to rephrase my question, if it's not clear? Thanks darniz darniz wrote: Hi Guys, Opening this thread again. I need to get around this issue. I have a spellcheck field defined and I am copying two fields, make and model, to this field:

  <copyField source="make" dest="spellText"/>
  <copyField source="model" dest="spellText"/>

I have buildOnCommit and buildOnOptimize set to true; hence when I index data and try to search for a word "accod" I get back the suggestion "accord", since model is also being copied. I stopped the Solr server and removed the copyField for model; now I only copy make to the spellText field, and started the Solr server. I refreshed the dictionary by issuing the following command: spellcheck.build=true&spellcheck.dictionary=default So I hoped it would rebuild my dictionary, but the strange thing is that it still gives a suggestion for "accrd". I have to reindex data again and then it won't offer me a suggestion, which is the correct behaviour. How can I recreate the dictionary after changing my schema by issuing spellcheck.build=true&spellcheck.dictionary=default? I can't afford to reindex data every time. Any answer ASAP will be appreciated. Thanks darniz darniz wrote: Then I assume the easiest way is to delete the directory itself. darniz hossman wrote: : We are using Index based spell checker. : i was wondering with the help of any url parameters can we delete the spell : check index directory. I don't think so. You might be able to configure two different spellcheck components that point at the same directory -- one that builds off of a real field, and one that builds off of an (empty) text field (using FileBasedSpellChecker) .. then you could trigger a rebuild of an empty spell checking index using the second component. But I've never tried it so I have no idea if it would work.
-Hoss -- View this message in context: http://old.nabble.com/Deleting-spelll-checker-index-tp27376823p27570613.html Sent from the Solr - User mailing list archive at Nabble.com.
migrating from solr 1.3 to 1.4
Hi there, I'm trying to migrate from Solr 1.3 to Solr 1.4 and I have a few issues. Initially my localsolr was throwing a NullPointerException, and I fixed it by changing the type of lat and lng to 'tdouble'. But now I'm not able to update the index. When I try to update the index it throws an error saying - Feb 12, 2010 2:14:11 PM org.apache.solr.update.processor.LogUpdateProcessor finish INFO: {} 0 0 Feb 12, 2010 2:14:11 PM org.apache.solr.common.SolrException log SEVERE: java.lang.NoSuchFieldError: log at com.pjaol.search.solr.update.LocalUpdaterProcessor.processAdd(LocalUpdateProcessorFactory.java:138) I tried searching on the net, but none of the posts regarding this issue are answered. Has anyone come across this issue? Thanks, Sachin.
cannot match on phrase queries
I am seeing this in several of my fields. I have something like "Samsung X150" or "Nokia BH-212". And my query will not match on X150 or BH-212. So, my query is something like +model:(Samsung X150). Through debugQuery, I see that this gets converted to +(model:samsung model:"x 150"). It matches on Samsung, but not X150. A simple query like model:BH-212 simply fails. model:BH212 also fails. The only query that seems to work is model:(BH 212). Here is the schema for that field:

  <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="true"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
      <filter class="solr.WordDelimiterFilterFactory" splitOnCaseChange="1" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="com.lucidimagination.solrworks.analysis.LucidKStemFilterFactory" protected="protwords.txt"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="query_synonyms.txt" ignoreCase="true" expand="true"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
      <filter class="solr.WordDelimiterFilterFactory" splitOnCaseChange="1" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="com.lucidimagination.solrworks.analysis.LucidKStemFilterFactory" protected="protwords.txt"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
  </fieldType>

  <field name="model" type="text" indexed="true" stored="true" omitNorms="true" omitTermFreqAndPositions="true"/>

Any ideas? According to the analyzer, I would expect the phrase BH-212 to match on bh and 212. Or am I missing something? Also, is there any way to tell the parser not to convert X150 into a phrase query? I have some cases where it would be more useful to turn it into +(X 150).
Re: Solr 1.4: Full import FileNotFoundException
: I have noticed that when I run concurrent full-imports using DIH in Solr : 1.4, the index ends up getting corrupted. I see the following in the log I'm fairly confident that concurrent imports won't work -- but it shouldn't corrupt your index -- even if the DIH didn't actively check for this type of situation, the underlying Lucene LockFactory should ensure that one of the imports wins ... you'll need to tell us what kind of filesystem you are using, and show us the relevant settings from your solrconfig (lock type, merge policy, indexDefaults, mainIndex, DIH, etc...) At worst you should get a lock timeout exception. : But I looked at: : http://old.nabble.com/dataimporthandler-and-multiple-delta-import-td19160129.html : : and was under the impression that this issue was fixed in Solr 1.4. ...right, attempting to run two concurrent imports with DIH should cause the second one to abort immediately. -Hoss
Re: Cannot get like exact searching to work
: Can your query consist of more than one word? : : Yes, and I expect it almost always will (the query string is coming : from a search box on a website). ... : Actually it won't. The data I am indexing has extra spaces in front : and is capitalized. I really need to be able to filter it through the : lowercase and trim filter without tokenizing it. ... : The idea is that a phrase match would be boosted over the : normal : token matches and would show up first in the listing. This is starting to smell like an XY Problem... http://people.apache.org/~hossman/#xyproblem ...you mentioned wanting prefix type queries to work, but that seems to be based on your initial approach of using an exact (ie: untokenized) field for your matches -- all of your examples seem to want matching at a word level, not partial words. If your ultimate goal is just that 'exact' matches score higher than documents containing all of the same words in a different order (which should score higher than docs only containing a few of the words), then I think you are just making things harder for yourself than you really need to ... defType=dismax should be able to solve all of your problems -- just specify the field(s) you want to search in the qf and pf params and documents with all the words in a phrase will appear first. -Hoss
Interesting stuff; Solr as a syslog store.
Hey everyone, I don't actually have a question, but I just thought I'd share something really cool that I did with Solr for our company. We run a good amount of servers, well into the several hundreds, and naturally we need a way to centralize all of the system logs. For a while we used a commercial solution to centralize and search our logs, but they wanted to charge us tens of thousands of dollars for just one gigabyte/day more of indexed data. So I said forget it, I'll write my own solution! We already use Solr for some of our other backend searching systems, so I came up with an idea to index all of our logs to Solr. I wrote a daemon in perl that listens on the syslog port, and pointed every single system's syslog to forward to this single server. From there, this daemon will write to a Solr indexing server after parsing them into fields, such as date/time, host, program, pid, text, etc. I then wrote a cool javascript/ajax web front end for Solr searching, and bam. Real time searching of all of our syslogs from a web interface, for no cost! Just thought this would be a neat story to share with you all. I've really grown to love Solr, it's something else! Thanks, -Antonio
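The field-splitting step Antonio describes (date/time, host, program, pid, text) can be sketched in Java. Hedged: his daemon is written in perl and its actual parsing isn't shown, so the pattern below merely illustrates one way to cut up a classic BSD-style syslog line:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SyslogParser {
    // Very loose pattern for a classic BSD-style syslog payload:
    //   "Feb 12 20:03:01 host program[pid]: message"
    // The pid part is optional ("cron: job done" also matches).
    private static final Pattern LINE = Pattern.compile(
        "^(\\w{3} +\\d+ \\d{2}:\\d{2}:\\d{2}) (\\S+) ([^\\[:]+)(?:\\[(\\d+)\\])?: (.*)$");

    // Split one syslog line into the fields the daemon would index:
    // { datetime, host, program, pid (may be null), text }.
    public static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;  // not a recognized syslog line
        }
        return new String[] { m.group(1), m.group(2), m.group(3),
                              m.group(4), m.group(5) };
    }

    public static void main(String[] args) {
        String[] f = parse("Feb 12 20:03:01 web01 sshd[4242]: Accepted publickey for root");
        System.out.println(String.join(" | ", f));
    }
}
```

Each array slot then maps onto one Solr field in the document sent to the indexing server.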
Re: sorting
: <str name="bf">title^1.2 contentEN^0.8 contentIT^0.8 contentDE^0.8</str> : <str name="qf">title^1.2 contentEN^0.8 contentIT^0.8 contentDE^0.8</str> FWIW: I don't think you understand what the bf param is for ... it's not analogous to qf and pf, it's for expressing a list of boost functions -- a function can be a simple field name, but that typically only makes sense if it's numeric. That *may* be causing your problem, if the function parser is attempting to generate the FieldCache for your content fields. : now, solr is complaining about some sorting issues on content* as they "solr is complaining" is really vague... please explain *exactly* what the error message is, where you see it, what the full stack trace looks like if there is one, and what you did to trigger the error (ie: did it happen on startup? did it happen when you executed a query? what was the full URL of the query?) -Hoss
Re: sorting
: that *may* be causing your problem, if the function parser is attempting : to generate the FieldCache for your content fields. Yep ... that's it ... if you use a bare field name as a function, and that field name is not numeric, the result is an OrdFieldSource, which uses the FieldCache. I opened a bug to improve the error message... https://issues.apache.org/jira/browse/SOLR-1771 -Hoss
RE: expire/delete documents
Or since you specifically asked about deleting anything older than X days (in this example I'm assuming x=7)...

  <delete><query>createTime:[* TO NOW-7DAYS]</query></delete>
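For concreteness, a tiny helper that assembles the delete-by-query payload for "older than N days" (the createTime field name comes from the thread; posting the payload to /solr/update and following with a commit is the usual, assumed workflow):

```java
public class DeleteOldDocs {
    // Build the delete-by-query payload for documents whose createTime
    // is older than the given number of days. Everything with a
    // createTime before NOW-<days>DAYS falls inside the open range.
    public static String deleteOlderThan(int days) {
        return "<delete><query>createTime:[* TO NOW-" + days
             + "DAYS]</query></delete>";
    }

    public static void main(String[] args) {
        // POST this body to the update handler, then send <commit/>:
        System.out.println(deleteOlderThan(7));
    }
}
```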
Re: How to reindex data without restarting server
: if you use the core model via solr.xml you can reload a core without having : to restart the servlet container, : http://wiki.apache.org/solr/CoreAdmin For making a schema change, the steps would be:
- create a new_core with the new schema
- reindex all the docs into new_core
- SWAP old_core and new_core so all the old URLs now point at the new core with the new schema.
-Hoss
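The SWAP in the last step is a single CoreAdmin request; as a sketch (host and core names are illustrative; the action/core/other parameters follow the CoreAdmin wiki page linked above):

```java
public class CoreSwap {
    // Build the CoreAdmin URL that swaps two cores, so the live name
    // (old_core) starts serving the freshly reindexed new_core.
    public static String swapUrl(String solrBase, String live, String rebuilt) {
        return solrBase + "/admin/cores?action=SWAP&core=" + live
             + "&other=" + rebuilt;
    }

    public static void main(String[] args) {
        // After reindexing everything into new_core:
        System.out.println(swapUrl("http://localhost:8983/solr", "old_core", "new_core"));
    }
}
```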
Re: Deleting spelll checker index
: Any update on this Patience my friend ... 5 hours after you send an email isn't long enough to wait before asking for "any update on this" -- it's just increasing the volume of mail everyone gets and distracting people from actual bugs/issues. FWIW: this doesn't really seem directly related to the thread you initially started about deleting the spell checker index -- what you're asking about now is rebuilding the spellchecker index... : I stopped the Solr server and removed the copyField for model. Now I only copy : make to the spellText field, and started the Solr server. : I refreshed the dictionary by issuing the following command: : spellcheck.build=true&spellcheck.dictionary=default : So I hoped it would rebuild my dictionary, but the strange thing is that it : still gives a suggestion for "accrd". that's because removing the copyField declaration doesn't change anything about the values that have already been copied to the spellText field -- rebuilding your spellchecker index is just re-reading the same indexed values from that field. : How can I create the dictionary again by changing my schema and issuing the : command : spellcheck.build=true&spellcheck.dictionary=default it's just not possible. a schema change like that doesn't magically undo all of the values that were already copied. -Hoss
Re: cannot match on phrase queries
It appears that omitTermFreqAndPositions is indeed the culprit. I assume it has to do with the fact that the index-time parsing of BH-212 puts multiple terms in the same position. From: Kevin Osborn osbo...@yahoo.com To: Solr solr-user@lucene.apache.org Sent: Fri, February 12, 2010 5:28:08 PM Subject: cannot match on phrase queries I am seeing this in several of my fields. I have something like "Samsung X150" or "Nokia BH-212". And my query will not match on X150 or BH-212. So, my query is something like +model:(Samsung X150). Through debugQuery, I see that this gets converted to +(model:samsung model:"x 150"). It matches on Samsung, but not X150. A simple query like model:BH-212 simply fails. model:BH212 also fails. The only query that seems to work is model:(BH 212). Here is the schema for that field:

  <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="true"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
      <filter class="solr.WordDelimiterFilterFactory" splitOnCaseChange="1" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="com.lucidimagination.solrworks.analysis.LucidKStemFilterFactory" protected="protwords.txt"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="query_synonyms.txt" ignoreCase="true" expand="true"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
      <filter class="solr.WordDelimiterFilterFactory" splitOnCaseChange="1" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="com.lucidimagination.solrworks.analysis.LucidKStemFilterFactory" protected="protwords.txt"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
  </fieldType>

  <field name="model" type="text" indexed="true" stored="true" omitNorms="true" omitTermFreqAndPositions="true"/>

Any ideas? According to the analyzer, I would expect the phrase BH-212 to match on bh and 212. Or am I missing something? Also, is there any way to tell the parser not to convert X150 into a phrase query? I have some cases where it would be more useful to turn it into +(X 150).
Re: Solr 1.4: Full import FileNotFoundException
Concurrent imports are not allowed in DIH, unless you set up multiple DIH instances. On Sat, Feb 13, 2010 at 7:05 AM, Chris Hostetter hossman_luc...@fucit.org wrote: : I have noticed that when I run concurrent full-imports using DIH in Solr : 1.4, the index ends up getting corrupted. I see the following in the log I'm fairly confident that concurrent imports won't work -- but it shouldn't corrupt your index -- even if the DIH didn't actively check for this type of situation, the underlying Lucene LockFactory should ensure that one of the imports wins ... you'll need to tell us what kind of filesystem you are using, and show us the relevant settings from your solrconfig (lock type, merge policy, indexDefaults, mainIndex, DIH, etc...) At worst you should get a lock timeout exception. : But I looked at: : http://old.nabble.com/dataimporthandler-and-multiple-delta-import-td19160129.html : : and was under the impression that this issue was fixed in Solr 1.4. ...right, attempting to run two concurrent imports with DIH should cause the second one to abort immediately. -Hoss -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: Solr 1.4: Full import FileNotFoundException
: concurrent imports are not allowed in DIH, unless you set up multiple DIH instances Right, but that's not the issue -- the question is whether attempting to do so might be causing index corruption (either because of a bug or because of some possibly really odd config we currently know nothing about) : : I have noticed that when I run concurrent full-imports using DIH in Solr : : 1.4, the index ends up getting corrupted. I see the following in the log : : I'm fairly confident that concurrent imports won't work -- but it : shouldn't corrupt your index -- even if the DIH didn't actively check for : this type of situation, the underlying Lucene LockFactory should ensure : that one of the imports wins ... you'll need to tell us what kind of : filesystem you are using, and show us the relevant settings from your : solrconfig (lock type, merge policy, indexDefaults, mainIndex, DIH, : etc...) : : At worst you should get a lock timeout exception. : : : But I looked at: : : http://old.nabble.com/dataimporthandler-and-multiple-delta-import-td19160129.html : : : : and was under the impression that this issue was fixed in Solr 1.4. : : ...right, attempting to run two concurrent imports with DIH should cause : the second one to abort immediately. : : -Hoss : : -- : - : Noble Paul | Systems Architect| AOL | http://aol.com : -Hoss
parsing strings into phrase queries
Right now if I have the query model:(Nokia BH-212V), the parser turns this into +(model:nokia model:"bh 212 v"). The problem is that I might have a model called Nokia BH-212, so this is completely missed. In my case, I would like my query to be +(model:nokia model:bh model:212 model:v). This is my schema for the field:

  <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="true"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
      <filter class="solr.WordDelimiterFilterFactory" splitOnCaseChange="1" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="com.lucidimagination.solrworks.analysis.LucidKStemFilterFactory" protected="protwords.txt"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="query_synonyms.txt" ignoreCase="true" expand="true"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
      <filter class="solr.WordDelimiterFilterFactory" splitOnCaseChange="1" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="com.lucidimagination.solrworks.analysis.LucidKStemFilterFactory" protected="protwords.txt"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
  </fieldType>
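One client-side workaround is to pre-split the model string before it ever reaches the query parser, so each fragment becomes its own term query instead of an auto-generated phrase. A hedged sketch (the split rules below roughly mimic WordDelimiterFilter's word/number parts, but are illustrative, not Solr's actual analysis code):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ModelQueryBuilder {
    // Pull out runs of letters and runs of digits, lowercased, so
    // "Nokia BH-212V" yields nokia, bh, 212, v -- one term each.
    private static final Pattern PART = Pattern.compile("[a-zA-Z]+|[0-9]+");

    public static String build(String field, String input) {
        List<String> parts = new ArrayList<String>();
        Matcher m = PART.matcher(input);
        while (m.find()) {
            parts.add(m.group().toLowerCase());
        }
        StringBuilder q = new StringBuilder();
        for (String p : parts) {
            if (q.length() > 0) q.append(' ');
            q.append(field).append(':').append(p);
        }
        return q.toString();
    }

    public static void main(String[] args) {
        System.out.println(build("model", "Nokia BH-212V"));
        // model:nokia model:bh model:212 model:v
    }
}
```

Because every term is explicit, the parser has nothing left to turn into a phrase, and a document indexed as "Nokia BH-212" can still match on the shared terms.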
Re: Interesting stuff; Solr as a syslog store.
On 13.02.2010 at 03:02, Antonio Lobato wrote: Just thought this would be a neat story to share with you all. I've really grown to love Solr, it's something else! Hi Antonio, Great. Would you also share the source code somewhere? May the Source be with you. Thanks. Olivier