Re: dynamic changes to schema
huh? I think I lost you :) You want to use a multivalued field to list what dynamic fields you have in your document? Also, if you program your application correctly you should be able to restrict your users from doing anything you please (or don't please, in this case). On Tue, Aug 18, 2009 at 11:38 PM, Marco Westermann m...@intersales.de wrote: hi, thanks for the advice, but the problem with dynamic fields is that I cannot restrict how the user names the field in the application. So there isn't a pattern I can use. But I thought about using multivalued fields for the dynamically added fields. Good idea? thanks, Marco Constantijn Visinescu wrote: use a dynamic field? On Tue, Aug 18, 2009 at 5:09 PM, Marco Westermann m...@intersales.de wrote: Hi there, is there a possibility to change the solr schema from PHP dynamically? The web application I want to index at the moment has a feature to add fields to entities, and you can mark these fields as searchable. To realize this with solr, the schema has to change when a searchable field is added or removed. Any suggestions? Thanks a lot, Marco Westermann -- ++ Business-Software aus einer Hand ++ ++ Internet, Warenwirtschaft, Linux, Virtualisierung ++ http://www.intersales.de http://www.eisxen.org http://www.tarantella-partner.de http://www.medisales.de http://www.eisfair.net interSales AG Internet Commerce Subbelrather Str. 247 50825 Köln Tel 02 21 - 27 90 50 Fax 02 21 - 27 90 517 Mail i...@intersales.de Mail m...@intersales.de Web www.intersales.de Handelsregister Köln HR B 30904 Ust.-Id.: DE199672015 Finanzamt Köln-Nord. UstID: nicht vergeben Aufsichtsratsvorsitzender: Michael Morgenstern Vorstand: Andrej Radonic, Peter Zander
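For reference, a dynamic field is declared in schema.xml with a glob pattern, which is why it only helps when user-created field names can be forced to follow a naming convention. A minimal sketch of both options discussed above (field names and types are illustrative, not from the thread):

```xml
<!-- Any field whose name ends in "_t" is accepted and indexed as text;
     the application would have to append the suffix to user-chosen names. -->
<dynamicField name="*_t" type="text" indexed="true" stored="true"/>

<!-- Alternative from the thread: collapse all user-defined values into one
     multivalued field, losing per-field search but needing no schema change. -->
<field name="userContent" type="text" indexed="true" stored="true" multiValued="true"/>
```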
RE: Cannot get solr 1.3.0 to run properly with plesk 9.2.1 on CentOS
DO NOT RELY on your hosting provider. Their automated tools create a complete mess out of "approved for production on CentOS" versions of Lucene, the Servlet API, the java.util.* package, etc.; look at this: Here is my classpath entry when Tomcat starts up java.library.path: /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0/jre/lib/i386/client:/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0/jre/lib/i386:/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0/jre/../lib/i386:/usr/java/packages/lib/i386:/lib:/usr/lib Who is the vendor of this openjdk-1.6.0.0? Who is the vendor of the JVM this JDK runs on? Do you use 'client' when you really need a 'server' environment? Is it HotSpot? Is your platform really i386? As I mentioned in a previous post, such Java installs are a total mess; you may have an incompatible Servlet API loaded by the bootstrap classloader before the Tomcat classes, etc. Install everything from scratch. INFO: Adding 'file:/usr/share/tomcat5/solr/lib/jetty-util-6.1.3.jar' to Solr classloader -----Original Message----- From: Fuad Efendi [mailto:f...@efendi.ca] Sent: August-19-09 1:43 AM To: solr-user@lucene.apache.org Subject: RE: Cannot get solr 1.3.0 to run properly with plesk 9.2.1 on CentOS -Dsolr.solr.home='/some/path' CORRECT: -Dsolr.data.dir=.. It should be in the Java startup parameters; for instance, JAVA_OPTS=-server -Xms32768M -Xmx32768M -Dsolr.data.dir=/some/path inside catalina.sh as a first statement... According to the logs you posted, the mistake is probably in solr.xml, which is the multicore definition; you should post its content here. Java 1.4/5/6 supports nested exceptions. The root cause of your problem: java.lang.NoClassDefFoundError: org.apache.solr.core.Config This exception causes another one: javax.xml.xpath.XPathFactoryConfigurationException: No XPathFactory implementation found for the object model: http://java.sun.com/jaxp/xpath/dom at javax.xml.xpath.XPathFactory.newInstance(Unknown Source) at org.apache.solr.core.Config.<clinit>(Config.java:41) etc. etc. etc. 
NoClassDefFoundError means: the classloader had no problem finding the class definition, but it couldn't define the class — due, for instance, to a dependency on another library and/or on a configuration file such as solr.xml. XPath should be called on a DOM (after Config is properly initialized). It's difficult to explain what is wrong with your mess of config files (you are obviously using dual-core) - you should do the following: 1. Install Tomcat 2. Copy the SOLR war file to the webapps folder 3. Start Tomcat and verify the logs; ensure that you have some clear messages there (SOLR should use the default home? Verify!) 4. Configure SOLR-home with the sample solrconfig.xml and schema.xml, restart, verify ... ... ... Don't go to multicore until you have played enough with the simplest SOLR installation -----Original Message----- From: Aaron Aberg [mailto:aaronab...@gmail.com] Sent: August-19-09 12:28 AM To: solr-user@lucene.apache.org Subject: Re: Cannot get solr 1.3.0 to run properly with plesk 9.2.1 on CentOS Tomcat is running fine. It's solr that is having the issue. I keep seeing people talk about this: -Dsolr.solr.home='/some/path' Should I be putting that somewhere? Or is that already taken care of when I edited the web.xml file in my solr.war file? On Tue, Aug 18, 2009 at 7:29 PM, Fuad Efendi f...@efendi.ca wrote: I forgot to add: the compiler is inside tools.jar in some cases, if I am correct... doesn't matter really... try to access the Tomcat default homepage before trying to use SOLR! The only difference between the JRE and the JDK (from Tomcat's viewpoint) is the absence of the javac compiler for JSPs. But it will complain only if you try to use JSPs (via the admin console). Have you tried to install SOLR on your local box and play with the samples described on the many wiki pages? -----Original Message----- From: Aaron Aberg [mailto:aaronab...@gmail.com] Sent: August-18-09 9:04 PM To: solr-user@lucene.apache.org Subject: Re: Cannot get solr 1.3.0 to run properly with plesk 9.2.1 on CentOS Marco might be right about the JRE thing. 
Here is my classpath entry when Tomcat starts up java.library.path: /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0/jre/lib/i386/client:/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0/jre/lib/i386:/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0/jre/../lib/i386:/usr/java/packages/lib/i386:/lib:/usr/lib

Constantijn, here is my solr home file list with permissions:

-bash-3.2$ ll /usr/share/solr/*
-rw-r--r-- 1 tomcat tomcat 2150 Aug 17 22:51 /usr/share/solr/README.txt

/usr/share/solr/bin:
total 160
-rwxr-xr-x 1 tomcat tomcat 4896 Aug 17 22:51 abc
-rwxr-xr-x 1 tomcat tomcat 4919 Aug 17 22:51 abo
-rwxr-xr-x 1 tomcat tomcat 2915 Aug 17 22:51 backup
-rwxr-xr-x 1 tomcat tomcat 3435 Aug 17 22:51 backupcleaner
-rwxr-xr-x 1 tomcat tomcat 3312 Aug 17 22:51 commit
-rwxr-xr-x 1 tomcat tomcat 3306 Aug 17 22:51 optimize
-rwxr-xr-x 1 tomcat tomcat 3163 Aug 17 22:51 readercycle
-rwxr-xr-x 1 tomcat tomcat 1752
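Putting Fuad's advice together: the system property goes into JAVA_OPTS near the top of catalina.sh, using whichever property applies (-Dsolr.solr.home for the home directory, -Dsolr.data.dir for the data directory). A minimal sketch — the heap sizes and the /usr/share/solr path are illustrative, not the poster's real values:

```shell
# Added near the top of catalina.sh: heap settings plus the solr home property.
# -Xms/-Xmx set the JVM heap; -Dsolr.solr.home tells Solr where its home dir is.
JAVA_OPTS="-server -Xms512m -Xmx512m -Dsolr.solr.home=/usr/share/solr"
export JAVA_OPTS
echo "$JAVA_OPTS"
```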
Re: DataImportHandler ignoring most rows
This line says that the query fetched only 7 rows: <str name="Total Rows Fetched">7</str>. If possible, open a tool and just run the same query and see how many rows are returned. On Wed, Aug 19, 2009 at 3:46 AM, Erik Earle erikea...@yahoo.com wrote: Using: - apache-solr-1.3.0 - java 1.6 - tomcat 6 - sql server 2005 w/ JSQLConnect 4.0 driver I have a group table with 3007 rows. I have confirmed the key is unique with select distinct id from group and it returns 3007. When I re-index using http://host:port/solr/dataimport?command=full-import I only get 7 records indexed. Any insight into what is going on would be really great. A partial response:

<lst name="statusMessages">
  <str name="Total Requests made to DataSource">1</str>
  <str name="Total Rows Fetched">7</str>
  <str name="Total Documents Skipped">0</str>

I have other entities that index all the rows without issue. There are no errors in the logs. I am not using any Transformers (and most of my config is not changed from install). My schema.xml contains:

<uniqueKey>key</uniqueKey>

and field defs (not a full list of fields):

<field name="key" type="string" indexed="true" stored="true" />
<field name="class" type="string" indexed="true" stored="true" required="true" />
<field name="id" type="string" indexed="true" stored="true" />
<field name="description" type="text" indexed="true" stored="true" />
<field name="created" type="date" indexed="true" stored="true" />
<field name="updated" type="date" indexed="true" stored="true" />

data-config.xml:

<dataConfig>
  <!-- jdbc:JSQLConnect://se-eriearle-lt1/database=SocialSite2/user=SocialSite2/logfile=DB_TRACE.log -->
  <dataSource type="JdbcDataSource" driver="com.jnetdirect.jsql.JSQLDriver" url="jdbc:JSQLConnect://se-eriearle-lt1/database=SocialSite2/user=SocialSite2" user="SocialSite2" password="SocialSite2" />
  <document>
    <entity name="Group" pk="key" query="select 'group.'+id as 'key', 'group' as 'class', name, handle, description, created, updated from group order by created asc">
    </entity>
    <entity name="Message" pk="key" query="...redacted...">
    </entity>
  </document>
</dataConfig>

-- - Noble Paul | Principal Engineer | AOL | http://aol.com
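When re-running the query in a tool, as suggested above, one thing worth checking is that GROUP is a reserved word in SQL Server, so some drivers or tools may need the table name quoted. A purely diagnostic sketch, assuming the same table as in the data-config above (this is a thing to rule out, not a confirmed cause):

```sql
-- Row count with the table name bracket-quoted, since GROUP is reserved in T-SQL.
select count(*) from [group];

-- The same key expression the entity query builds, to check it works for every row.
select 'group.' + id as 'key' from [group] order by created asc;
```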
Re: Replication over multi-core solr
On Wed, Aug 19, 2009 at 2:27 AM, vivek sar vivex...@gmail.com wrote: Hi, We use a multi-core setup for Solr, where new cores are added dynamically to solr.xml. Only one core is active at a time. My question is how can replication be done for multi-core - so every core is replicated on the slave? Replication does not handle new core creation. You will have to issue the core creation command to each slave separately. I went over the wiki, http://wiki.apache.org/solr/SolrReplication, and have a few questions related to that: 1) How do we replicate solr.xml, where we have the list of cores? The wiki says "Only files in the 'conf' dir of the solr instance are replicated" - since solr.xml is in the home directory, how do we replicate it? solr.xml cannot be replicated; even if you did, it is not reloaded. 2) solrconfig.xml in the slave takes a static core url: <str name="masterUrl">http://localhost:port/solr/corename/replication</str> Put a placeholder like <str name="masterUrl">http://localhost:port/solr/${solr.core.name}/replication</str> so the core name is automatically replaced. As in our case cores are created dynamically (a new core is created after the active one reaches some capacity), how can we define the master core dynamically for replication? The only way I see is using the fetchIndex command and passing the new core info there - is that right? If so, the slave application would have to write code to poll the master periodically and fire the fetchIndex command, but how would the slave know the master core name - as they are created dynamically on the master? Thanks, -vivek -- - Noble Paul | Principal Engineer | AOL | http://aol.com
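Since replication does not create cores, each slave needs the core created through the CoreAdmin API before it can start polling the master. A sketch of what that command looks like — the host, port, core name, and instanceDir are placeholders, not values from the thread:

```shell
# Run once per slave for every core created on the master (hypothetical names).
curl "http://slave-host:8983/solr/admin/cores?action=CREATE&name=core1&instanceDir=core1"
```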
Re: Is negative boost possible?
: the only way to negative boost is to positively boost the inverse... : : (*:* -field1:value_to_penalize)^10 This will do the job as well, since bq supports pure negative queries (at least in trunk): bq=-field1:value_to_penalize^10 http://wiki.apache.org/solr/SolrRelevancyFAQ#head-76e53db8c5fd31133dc3566318d1aad2bb23e07e hossman wrote: : Use a decimal figure less than 1, e.g. 0.5, to express less importance. but that's still a positive boost ... it still increases the scores of documents that match. the only way to negative boost is to positively boost the inverse... (*:* -field1:value_to_penalize)^10 : I am looking for a way to assign a negative boost to a term in a Solr query. : Our use scenario is that we want to boost matching documents that are : updated recently and penalize those that have not been updated for a long : time. There are other terms in the query that would affect the scores as : well. For example we construct a query similar to this: : : *:* field1:value1^2 field2:value2^2 lastUpdateTime:[NOW/DAY-90DAYS TO *]^5 : lastUpdateTime:[* TO NOW/DAY-365DAYS]^-3 : : I notice it's not possible to simply use a negative boosting factor in the : query. Is there any way to achieve such a result? : : Regards, : Shi Quan He : : -Hoss -- View this message in context: http://www.nabble.com/Is-negative-boost-possible--tp25025775p25039059.html Sent from the Solr - User mailing list archive at Nabble.com.
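Hoss's "boost the inverse" trick is easy to generate programmatically. A small sketch in Python — the helper name is made up for illustration:

```python
def penalize(field: str, value: str, boost: int) -> str:
    """Build a clause that boosts every document EXCEPT those matching
    field:value -- the only way to 'negative boost' in classic Lucene syntax."""
    return "(*:* -%s:%s)^%d" % (field, value, boost)

# Recreates the exact clause from the thread.
print(penalize("field1", "value_to_penalize", 10))  # (*:* -field1:value_to_penalize)^10
```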
Re: Replication over multi-core solr
Hi Vivek, currently we want to add cores dynamically when the active one reaches some capacity; can you give me some hints on how to achieve such functionality? (Just wondering whether you used shell scripting or coded some 100% Java based solution) Thx 2009/8/19 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com: [...] -- Lici
Problems importing HTML content contained within XML document
Hello, I have just started trying out SOLR to index some XML documents that I receive. I am using SOLR 1.3 and its HttpDataSource in conjunction with the XPathEntityProcessor. I am finding the data import really useful so far, but I am having a few problems when I try to import HTML contained within one of the XML tags, BODY. The data import just seems to ignore textContent silently, but it imports everything else. When I do a query through the SOLR admin interface, only the id and author fields are displayed. Any ideas what I am doing wrong? Thanks This is what my dataConfig looks like:

<dataConfig>
  <dataSource type="HttpDataSource" />
  <document>
    <entity name="archive" pk="id" url="http://localhost:9080/data/20090817070752.xml" processor="XPathEntityProcessor" forEach="/document/category" transformer="DateFormatTransformer" stream="true" dataSource="dataSource">
      <field column="id" xpath="/document/category/reference" />
      <field column="textContent" xpath="/document/category/BODY" />
      <field column="author" xpath="/document/category/author" />
    </entity>
  </document>
</dataConfig>

This is how I have specified my schema:

<fields>
  <field name="id" type="string" indexed="true" stored="true" required="true" />
  <field name="author" type="string" indexed="true" stored="true"/>
  <field name="textContent" type="text" indexed="true" stored="true" />
</fields>
<uniqueKey>id</uniqueKey>
<defaultSearchField>id</defaultSearchField>

And this is what my XML document looks like:

<document>
  <category>
    <reference>123456</reference>
    <author>Authori name</author>
    <BODY>
      <P>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Morbi lorem elit, lacinia ac blandit ac, tristique et ante. Phasellus varius varius felis ut vestibulum</P>
      <P>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Morbi lorem elit, lacinia ac blandit ac, tristique et ante. Phasellus varius varius felis ut vestibulum</P>
      <P>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Morbi lorem elit, lacinia ac blandit ac, tristique et ante. Phasellus varius varius felis ut vestibulum</P>
    </BODY>
  </category>
</document>
Re: CorruptIndexException: Unknown format version
It looks like your Solr lucene-core version doesn't match the Lucene version used to generate the index; as Yonik said, it looks like there is a Lucene library conflict. 2009/8/19 Chris Hostetter hossman_luc...@fucit.org: : how can that happen, it is a new index, and it is already corrupt? : : Did anybody else see something like this? "Unknown format version" doesn't mean your index is corrupt .. it means the version of Lucene parsing the index doesn't recognize the index format version ... typically it means you are trying to open an index generated by a newer version of Lucene than the one you are using. -Hoss -- Lici
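One way to look for the conflict described above is to list every lucene-core jar visible to the webapp and compare the versions embedded in their file names. A sketch — the Tomcat path is an assumption, not taken from the thread:

```shell
# List every lucene-core jar under a Tomcat install and print the version
# in each file name; seeing two different versions is a classic cause of
# "Unknown format version". TOMCAT_HOME is an assumed path.
TOMCAT_HOME=${TOMCAT_HOME:-/usr/share/tomcat5}
find "$TOMCAT_HOME" -name 'lucene-core*.jar' 2>/dev/null \
  | sed 's/.*lucene-core-//; s/\.jar$//' \
  | sort -u
```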
Re: Replication over multi-core solr
Licinio, Please open a separate thread - as it's a different issue - and I can respond there. -vivek 2009/8/19 Licinio Fernández Maurelo licinio.fernan...@gmail.com: Hi Vivek, currently we want to add cores dynamically when the active one reaches some capacity; can you give me some hints on how to achieve such functionality? (Just wondering whether you used shell scripting or coded some 100% Java based solution) Thx [...] -- Lici
Adding cores dynamically
Hi there, currently we want to add cores dynamically when the active one reaches some capacity; can anyone give me some hints on how to achieve such functionality? (Just wondering whether you have used shell scripting or coded some 100% Java based solution) Thx -- Lici
Re: Replication over multi-core solr
Ok 2009/8/19 vivek sar vivex...@gmail.com: Licinio, Please open a separate thread - as it's a different issue - and I can respond there. -vivek [...] -- Lici
Re: Spanish Stemmer
Hi, take a look at this:

<!-- Field type for text (with Spanish stemming) -->
<fieldtype name="textTypeWithStemming" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Spanish"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Spanish"/>
  </analyzer>
</fieldtype>

Regards 2009/8/19 Robert Muir rcm...@gmail.com: hi, it looks like you might just have a simple typo: <filter class="solr.SnowballPorterFilterFactory" languange="Spanish"/> if you change it to language="Spanish" it should work. -- Robert Muir rcm...@gmail.com -- Lici
Re: Problems importing HTML content contained within XML document
Hi Venn, I think what is happening when the BODY element is processed by the xpath expression (/document/category/BODY) is that it does not retrieve the text content from the P elements inside the BODY element. The expression will only retrieve text content that is directly a child of the BODY element. I do not know which xpath function(s) the DataImportHandler currently supports to return the text content of a node and all its child nodes. Maybe the expression /document/category/BODY/* will work. Cheers, Martijn 2009/8/19 venn hardy venn.ha...@hotmail.com: Hello, I have just started trying out SOLR to index some XML documents that I receive. I am using SOLR 1.3 and its HttpDataSource in conjunction with the XPathEntityProcessor. I am finding the data import really useful so far, but I am having a few problems when I try to import HTML contained within one of the XML tags, BODY. The data import just seems to ignore textContent silently, but it imports everything else. When I do a query through the SOLR admin interface, only the id and author fields are displayed. Any ideas what I am doing wrong? [...]
Re: Problems importing HTML content contained within XML document
try this <field column="textContent" xpath="/document/category/BODY" faltten="true"/> this should slurp all the tags under BODY On Wed, Aug 19, 2009 at 1:44 PM, venn hardy venn.ha...@hotmail.com wrote: Hello, I have just started trying out SOLR to index some XML documents that I receive. I am using SOLR 1.3 and its HttpDataSource in conjunction with the XPathEntityProcessor. I am finding the data import really useful so far, but I am having a few problems when I try to import HTML contained within one of the XML tags, BODY. The data import just seems to ignore textContent silently, but it imports everything else. When I do a query through the SOLR admin interface, only the id and author fields are displayed. Any ideas what I am doing wrong? [...] -- - Noble Paul | Principal Engineer | AOL | http://aol.com
Re: Problems importing HTML content contained within XML document
sorry, that should read <field column="textContent" xpath="/document/category/BODY" flatten="true"/> 2009/8/19 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com: try this <field column="textContent" xpath="/document/category/BODY" faltten="true"/> this should slurp all the tags under BODY On Wed, Aug 19, 2009 at 1:44 PM, venn hardy venn.ha...@hotmail.com wrote: Hello, I have just started trying out SOLR to index some XML documents that I receive. I am using SOLR 1.3 and its HttpDataSource in conjunction with the XPathEntityProcessor. I am finding the data import really useful so far, but I am having a few problems when I try to import HTML contained within one of the XML tags, BODY. The data import just seems to ignore textContent silently, but it imports everything else. When I do a query through the SOLR admin interface, only the id and author fields are displayed. Any ideas what I am doing wrong? [...] -- - Noble Paul | Principal Engineer | AOL | http://aol.com
Re: Relevant results with DisMaxRequestHandler
Wow, it's as if the 'mm' parameter just appeared for the first time... Yes, I read the doc a few times, but never understood that documents which don't match any of the expressions will not be returned... my apologies, everything seems clearer now thanks to the minimum-number parameter. Thank you, Vincent hossman wrote: : The 'qf' parameter used in dismax seems to work with an 'AND' separator. : I have many more results without dismax. Is there any way to keep the same : amount of documents and process the 'qf'? did you read any of the docs on dismax? http://wiki.apache.org/solr/DisMaxRequestHandler did you look at the mm param? http://wiki.apache.org/solr/DisMaxRequestHandler#mm -Hoss -- View this message in context: http://www.nabble.com/Relevant-results-with-DisMaxRequestHandler-tp24716870p25041314.html Sent from the Solr - User mailing list archive at Nabble.com.
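For anyone hitting the same surprise: with dismax, mm controls how many of the optional query clauses must match, so lowering it restores the OR-like behaviour of the standard handler. A small illustrative sketch of the request parameters — the field names and query values here are made up:

```python
from urllib.parse import urlencode

# mm=1 means a document matching ANY single clause is returned,
# instead of the stricter default that dropped results above.
params = {
    "q": "some query",
    "defType": "dismax",
    "qf": "title^2 body",
    "mm": "1",
}
query_string = urlencode(params)
print(query_string)
```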
Re: JVM Heap utilization Memory leaks with Solr
Fuad, We have around 5 million documents and around 3700 fields. All documents will not have values for all the fields. JRockit is not approved for use within my organization, but thanks for the info anyway. Regards Rahul On Tue, Aug 18, 2009 at 9:41 AM, Funtick f...@efendi.ca wrote: BTW, you should really prefer JRockit which really rocks!!! Mission Control has the necessary tooling; and JRockit produces a _nice_ exception stacktrace (explaining almost everything) even in case of OOM, which the Sun JVM still fails to produce. SolrServlet still catches Throwable: } catch (Throwable e) { SolrException.log(log,e); sendErr(500, SolrException.toStr(e), request, response); } finally { Rahul R wrote: Otis, Thank you for your response. I know there are a few variables here but the difference in memory utilization with and without shards somehow leads me to believe that the leak could be within Solr. I tried using a profiling tool - Yourkit. The trial version was free for 15 days, but I couldn't find anything of significance. Regards Rahul On Tue, Aug 4, 2009 at 7:35 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Hi Rahul, A) There are no known (to me) memory leaks. I think there are too many variables for a person to tell you what exactly is happening, plus you are dealing with the JVM here. :) Try jmap -histo:live PID-HERE | less and see what's using your memory. Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message From: Rahul R rahul.s...@gmail.com To: solr-user@lucene.apache.org Sent: Tuesday, August 4, 2009 1:09:06 AM Subject: JVM Heap utilization Memory leaks with Solr I am trying to track memory utilization with my application that uses Solr. Details of the setup: - 3rd party Software: Solaris 10, Weblogic 10, jdk_150_14, Solr 1.3.0 - Hardware: 12 CPU, 24 GB RAM For testing during PSR I am using a smaller subset of the actual data that I want to work with.
Details of this smaller sub-set: - 5 million records, 4.5 GB index size. Observations during PSR: A) I have allocated 3.2 GB for the JVM(s) that I used. After all users log out and doing a force GC, only 60 % of the heap is reclaimed. As part of the logout process I am invalidating the HttpSession and doing a close() on CoreContainer. From my application's side, I don't believe I am holding on to any resource. I wanted to know if there are known issues surrounding memory leaks with Solr? B) To further test this, I tried deploying with shards. 3.2 GB was allocated to each JVM. All JVMs had 96 % free heap space after start up. I got varying results with this. Case 1: Used 6 weblogic domains. My application was deployed on 1 domain. I split the 5 million index into 5 parts of 1 million each and used them as shards. After multiple users used the system and doing a force GC, around 94 - 96 % of heap was reclaimed in all the JVMs. Case 2: Used 2 weblogic domains. My application was deployed on 1 domain. On the other, I deployed the entire 5 million part index as one shard. After multiple users used the system and doing a force GC, around 76 % of the heap was reclaimed in the shard JVM. And 96 % was reclaimed in the JVM where my application was running. This result further convinces me that my application can be absolved of holding on to memory resources. I am not sure how to interpret these results? For searching, I am using: Without Shards: EmbeddedSolrServer With Shards: CommonsHttpSolrServer In terms of Solr objects this is what differs in my code between normal search and shards search (distributed search). After looking at Case 1, I thought that the CommonsHttpSolrServer was more memory efficient, but Case 2 proved me wrong. Or could there still be memory leaks in my application? Any thoughts, suggestions would be welcome.
Regards Rahul -- View this message in context: http://www.nabble.com/JVM-Heap-utilization---Memory-leaks-with-Solr-tp24802380p25018165.html Sent from the Solr - User mailing list archive at Nabble.com.
Solr-773 (GEO Module) question
Hi, we're glancing at the GEO search module known from the jira issue 773 (http://issues.apache.org/jira/browse/SOLR-773). It seems to us that the issue is still open and not yet included in the nightly builds. Is there a release plan for the nightly builds, and is this module considered core or contrib? Regards, Johan
Re: MultiCore Queries? are they possible
On Tue, Aug 18, 2009 at 5:47 PM, Ninad Raut hbase.user.ni...@gmail.com wrote: Hi, Can we create a join query between two indexes on two cores? Is this possible in Solr? I have an index which stores author profiles and another index which stores content and an author id as a reference. Can I query as: select Content, AuthorName from Core0, Core1 where core0.authorid = core1.authorid and authorid = A123 No, but you can always make two calls and join it yourself. However, Solr supports multi-valued fields, so it is best to de-normalize the data if you need to show both kinds of information in one query. -- Regards, Shalin Shekhar Mangar.
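Shalin's "make two calls and join it yourself" can be sketched in a few lines. Here the two result sets are represented as plain lists of dicts (the field names authorid/AuthorName/Content follow the question; this is an illustrative client-side merge, not a Solr API):

```python
def join_results(author_docs, content_docs, key="authorid"):
    """Merge two Solr result sets on a shared key, like a SQL inner join."""
    # Index the author docs by the join key for O(1) lookup.
    authors_by_id = {doc[key]: doc for doc in author_docs}
    # Attach the author's name to each content doc that has a matching author.
    return [
        {**content, "AuthorName": authors_by_id[content[key]]["AuthorName"]}
        for content in content_docs
        if content[key] in authors_by_id
    ]
```

In practice the two lists would come from one query against each core (e.g. via SolrJ or an HTTP client) before being merged like this.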
Re: Strange error with shards
On Tue, Aug 18, 2009 at 9:01 PM, ahammad ahmed.ham...@gmail.com wrote: HTTP Status 500 - null java.lang.NullPointerException at org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:437) at The way I created this shard was to copy an existing one, erasing all the data files/folders, and modifying my schema/data-config files. So the core settings are pretty much the same. What did you modify in the schema? All the shards should have the same schema. That exception can come if the uniqueKey is missing/null. -- Regards, Shalin Shekhar Mangar.
Re: Passing a Cookie in SolrJ
On Tue, Aug 18, 2009 at 10:18 PM, Ramirez, Paul M (388J) paul.m.rami...@jpl.nasa.gov wrote: Hi All, The project I am working on is using Solr and OpenSSO (Sun's single sign on service). I need to write some sample code for our users that shows them how to query Solr and I would just like to point them to the SolrJ documentation but I can't see an easy way to be able to pass a cookie with the request. The cookie is needed to be able to get through the SSO layer but will just be ignored by Solr. I see that you are using Apache Commons Http Client and with that I would be able to write the cookie if I had access to the HttpMethod being used (GetMethod or PostMethod). However, I can not find an easy way to get access to this with SolrJ and thought I would ask before rewriting a simple example using only an ApacheHttpClient without the SolJ library. Thanks in advance for any pointers you may have. There's no easy way I think. You can extend CommonsHttpSolrServer and override the request method. Copy/paste the code from CommonsHttpSolrServer#request and make the changes. It is not an elegant way but it will work. -- Regards, Shalin Shekhar Mangar.
Re: How to boost fields with many terms against single-term?
On Wed, Aug 19, 2009 at 12:32 AM, Fuad Efendi f...@efendi.ca wrote: I don't want single-term docs such as home to appear in top for simple search for a home; I need home improvement made easy in top... How to implement it at query time? If you always want home improvement made easy on top for home, see if the QueryElevationComponent can help. -- Regards, Shalin Shekhar Mangar.
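For reference, the QueryElevationComponent is driven by an elevate.xml file that pins chosen documents to the top of the results for a given query. A minimal sketch (the document id here is hypothetical, and the component must also be registered in solrconfig.xml):

```xml
<elevate>
  <!-- For the query "home", force this document to the top of the results -->
  <query text="home">
    <doc id="home-improvement-made-easy" />
  </query>
</elevate>
```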
Re: Strange error with shards
Each core has a different database as a datasource, which means that they have different DB structures and fields. That is why the schemas are different. I figured out the cause of this problem. You were right, it was the uniqueKey field. All of my cores have that field set to id but for this new core, it is set to threadID. Changing that to id fixed the problem. Shalin Shekhar Mangar wrote: On Tue, Aug 18, 2009 at 9:01 PM, ahammad ahmed.ham...@gmail.com wrote: HTTP Status 500 - null java.lang.NullPointerException at org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:437) at The way I created this shard was to copy an existing one, erasing all the data files/folders, and modifying my schema/data-config files. So the core settings are pretty much the same. What did you modify in the schema? All the shards should have the same schema. That exception can come if the uniqueKey is missing/null. If all the shards should have the same schema, then what is the point of sharding in the first place? I thought that it was used to combine different cores with different index structures...Right now, every core I have is unique, and every schema is different... -- Regards, Shalin Shekhar Mangar. -- View this message in context: http://www.nabble.com/Strange-error-with-shards-tp25027486p25043859.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Strange error with shards
On Wed, Aug 19, 2009 at 6:44 PM, ahammad ahmed.ham...@gmail.com wrote: Each core has a different database as a datasource, which means that they have different DB structures and fields. That is why the schemas are different. If all the shards should have the same schema, then what is the point of sharding in the first place? I thought that it was used to combine different cores with different index structures...Right now, every core I have is unique, and every schema is different... Index is sharded when it becomes too much for one box to keep the whole index. Distributed Search in Solr can merge these multiple indexes running on different boxes into one result set. It is not meant for combining different cores or different schemas. If many shards have a document with the same uniqueKey value, any one can be returned. Typically, shards have the same schema, with each having a disjoint subset of the complete set of documents. -- Regards, Shalin Shekhar Mangar.
RE: JVM Heap utilization Memory leaks with Solr
Hi Rahul, JRockit could be used at least in a test environment to monitor the JVM (and troubleshoot SOLR; licensed for-free for developers!); they even have an Eclipse plugin now, and it is licensed by Oracle (BEA)... But, of course, in large companies the test environment is in the hands of testers :) But... 3700 fields will create (over time) 3700 arrays, each of size 5,000,000!!! Even if most of the fields are empty for most of the documents... Applicable to non-tokenized single-valued non-boolean fields only, Lucene internals, FieldCache... and it won't be GC-collected after user log-off... prefer a dedicated box for SOLR. -Fuad -Original Message- From: Rahul R [mailto:rahul.s...@gmail.com] Sent: August-19-09 6:19 AM To: solr-user@lucene.apache.org Subject: Re: JVM Heap utilization Memory leaks with Solr Fuad, We have around 5 million documents and around 3700 fields. All documents will not have values for all the fields. JRockit is not approved for use within my organization, but thanks for the info anyway. Regards Rahul On Tue, Aug 18, 2009 at 9:41 AM, Funtick f...@efendi.ca wrote: BTW, you should really prefer JRockit which really rocks!!! Mission Control has the necessary tooling; and JRockit produces a _nice_ exception stacktrace (explaining almost everything) even in case of OOM, which the Sun JVM still fails to produce. SolrServlet still catches Throwable: } catch (Throwable e) { SolrException.log(log,e); sendErr(500, SolrException.toStr(e), request, response); } finally { Rahul R wrote: Otis, Thank you for your response. I know there are a few variables here but the difference in memory utilization with and without shards somehow leads me to believe that the leak could be within Solr. I tried using a profiling tool - Yourkit. The trial version was free for 15 days, but I couldn't find anything of significance. Regards Rahul On Tue, Aug 4, 2009 at 7:35 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Hi Rahul, A) There are no known (to me) memory leaks.
I think there are too many variables for a person to tell you what exactly is happening, plus you are dealing with the JVM here. :) Try jmap -histo:live PID-HERE | less and see what's using your memory. Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message From: Rahul R rahul.s...@gmail.com To: solr-user@lucene.apache.org Sent: Tuesday, August 4, 2009 1:09:06 AM Subject: JVM Heap utilization Memory leaks with Solr I am trying to track memory utilization with my application that uses Solr. Details of the setup: - 3rd party Software: Solaris 10, Weblogic 10, jdk_150_14, Solr 1.3.0 - Hardware: 12 CPU, 24 GB RAM For testing during PSR I am using a smaller subset of the actual data that I want to work with. Details of this smaller sub-set: - 5 million records, 4.5 GB index size. Observations during PSR: A) I have allocated 3.2 GB for the JVM(s) that I used. After all users log out and doing a force GC, only 60 % of the heap is reclaimed. As part of the logout process I am invalidating the HttpSession and doing a close() on CoreContainer. From my application's side, I don't believe I am holding on to any resource. I wanted to know if there are known issues surrounding memory leaks with Solr? B) To further test this, I tried deploying with shards. 3.2 GB was allocated to each JVM. All JVMs had 96 % free heap space after start up. I got varying results with this. Case 1: Used 6 weblogic domains. My application was deployed on 1 domain. I split the 5 million index into 5 parts of 1 million each and used them as shards. After multiple users used the system and doing a force GC, around 94 - 96 % of heap was reclaimed in all the JVMs. Case 2: Used 2 weblogic domains. My application was deployed on 1 domain. On the other, I deployed the entire 5 million part index as one shard.
After multiple users used the system and doing a force GC, around 76 % of the heap was reclaimed in the shard JVM. And 96 % was reclaimed in the JVM where my application was running. This result further convinces me that my application can be absolved of holding on to memory resources. I am not sure how to interpret these results? For searching, I am using: Without Shards: EmbeddedSolrServer With Shards: CommonsHttpSolrServer In terms of Solr objects this is what differs in my code between normal search and shards search (distributed search). After looking at Case 1, I thought that the CommonsHttpSolrServer was more memory efficient, but Case 2 proved me wrong. Or could there still be memory leaks in my application? Any thoughts, suggestions would be welcome. Regards Rahul --
multi words synonyms
Hi, I would like to make "internal medicine" a synonym for "physician" or "doctor", but it is not working properly. Can anyone help me? synonyms.index.txt: internal medicine => physician synonyms.query.txt: physician, internal medicine => physician, doctor In the Analysis tool, I can see clearly that "internal medicine" is converted to "physician" and "doctor" at index and query time, but in an actual query it is not converted (with the debugQuery=true parameter):

<lst name="debug">
  <str name="rawquerystring">internal medicine</str>
  <str name="querystring">internal medicine</str>
  <str name="parsedquery">job:intern job:medicin</str>
  <str name="parsedquery_toString">job:intern job:medicin</str>

It returns:

<doc>
  <float name="score">1.3963256</float>
  <str name="job">874878_INTERNATIONAL CONSULTANTS</str>
</doc>

Here is what I have in schema.xml:

<analyzer type="index">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.index.txt" ignoreCase="true" expand="false"/>
</analyzer>
<analyzer type="query">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.index.txt" ignoreCase="true" expand="false"/>
</analyzer>
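Judging from the pasted schema, one likely culprit is that both analyzers point at synonyms.index.txt with expand="false", so the query-side "physician => physician, doctor" expansion never runs. A hedged sketch of the presumably intended split (note that multi-word synonyms at query time are fragile regardless, because the query parser splits on whitespace before the filter ever sees "internal medicine" as a unit, which is why index-time mapping is usually recommended):

```xml
<analyzer type="index">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <!-- collapse "internal medicine" to "physician" at index time -->
  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.index.txt"
          ignoreCase="true" expand="false"/>
</analyzer>
<analyzer type="query">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <!-- expand "physician" to "physician, doctor" at query time -->
  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.query.txt"
          ignoreCase="true" expand="true"/>
</analyzer>
```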
Shutdown Solr
Does anyone know a graceful way to shutdown Solr? (other than killing the process with Ctrl-C)
Re: Shutdown Solr
it catches the kill signal and shuts down as it should, I guess :) because it writes stuff to the log after pressing ^c 2009/8/19 Miller, Michael P. m.mil...@radium.ncsc.mil Does anyone know a graceful way to shutdown Solr? (other than killing the process with Ctrl-C)
Re: Data Modeling
This is the sort of Solr fundamentals question my book (chapter 2) will help you with. Think about what your user interface is. What are users searching for? That is, what exactly comes back from search results? It's not clear from your description what your search scenario is. ~ David Smiley Author: http://www.packtpub.com/solr-1-4-enterprise-search-server On 8/19/09 10:31 AM, Vladimir Landman v...@northernautoparts.com wrote: Hi, I am trying to create a schema for Solr. Here is a relational model of what our data might look like: Inventory: Sku, Price, Weight. Attributes: AttributeName, AttributeValue. Applications: Id (Auto-Incrementing), Sku, VehicleYear, VehicleMake, VehicleModel, VehicleEngine. There can be multiple Application(s) records. Also, Attributes can also have duplicates. Basically I want to store basic information about our inventory, attributes, and applications. If I didn't have the applications, I would simply have: <field name="id" ... /> <field name="sku" ... /> <field name="price" ... /> <field name="weight" ... /> <!-- Attributes --> <field name="OilPumpVolume" ... /> <field name="FuelType" ... /> Since one part might have 3 or 4 attributes, but 100 applications, I want to try to avoid having 400 records, but maybe that is just what I will have to do. I appreciate any help. -- Vladimir Landman Northern Auto Parts
Re: Solr-773 (GEO Module) question
On Aug 19, 2009, at 6:45 AM, johan.sjob...@findwise.se wrote: Hi, we're glancing at the GEO search module known from the jira issue 773 (http://issues.apache.org/jira/browse/SOLR-773). It seems to us that the issue is still open and not yet included in the nightly builds. correct Is there a release plan for the nightly builds, and is this module considered core or contrib? activity on the nightly builds is winding down as we gear up for the 1.4 release. After 1.4 is out, I expect progress on the geo stuff. It will be in contrib (not core) and will likely be marked experimental for a while. That is, stuff will be added without the expectation that the interfaces will be set in stone. best ryan
RE: Shutdown Solr
catalina.sh stop But SolrServlet catches everything and forgets to implement destroy()! I am absolutely unsure about Ctrl-C and even have many concerns regarding catalina.sh stop... J2EE/JEE does not specify any support for threads other than container-managed ones... I hope SolrServlet closes the Lucene index (and other resources) and everything follows the Servlet specs... but I can't find the dummies' method _destroy()_ in SolrServlet!!! It should gracefully close the Lucene index and other resources. WHY? -Original Message- From: Tobias Brennecke [mailto:t.bu...@gmail.com] Sent: August-19-09 11:39 AM To: solr-user@lucene.apache.org Subject: Re: Shutdown Solr it catches the kill signal and shuts down as it should, I guess :) because it writes stuff to the log after pressing ^c 2009/8/19 Miller, Michael P. m.mil...@radium.ncsc.mil Does anyone know a graceful way to shutdown Solr? (other than killing the process with Ctrl-C)
RE: Shutdown Solr
Most probably Ctrl-C is graceful for Tomcat, and kill -9 too... Tomcat is smart... I prefer /etc/init.d/my_tomcat wrapper around catalina.sh (su tomcat, /var/lock etc...) - ok then, Graceful Shutdown depends on how you started Tomcat.
strange sorting results: each word in field is sorted
I'm trying to sort, but I am not always getting the correct results and I'm not sure where to start tracking down the problem. You can see the problem here (at least until it's fixed!): http://nines.performantsoftware.com/search/saved?user=paulname=poem If you sort by Title/Ascending, you get partially sorted results, but it seems to be using a random word to sort on instead of sorting on the entire title. Page one starts well with: (blank); Adieu; Advertisement; Afterwards; etc. but by page 6 it starts to break down: Elizabeth Barrett Browning; Albert and Elweena; Emerson and Bacon; etc... Errata; Anne Bannerman: Biographical Essay; Aboringines (Estonia); etc... I notice in the above list that there is SOME word that is sorted, just not the first one. (In fact, it seems to be the word that appears greatest in the sort order.) Then at the end, for instance page 336, it sorts some titles with diacritical marks: Roman à Clef; The Forgotten Reaping-Hook: Sex in My Ántonia; Social (Re)Visioning in the Fields of My Ántonia; etc... I'm not sure what info would be useful to help debug. In my schema.xml file, I've clipped what seems to be the relevant part:

<fieldtype name="text_lu" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldtype>
<field name="title" type="text_lu" indexed="true" stored="true" multiValued="true"/>

Thanks, Paul
Re: Shutdown Solr
On Wed, Aug 19, 2009 at 2:43 PM, Fuad Efendif...@efendi.ca wrote: Most probably Ctrl-C is graceful for Tomcat, and kill -9 too... Tomcat is smart... I prefer /etc/init.d/my_tomcat wrapper around catalina.sh (su tomcat, /var/lock etc...) - ok then, Graceful Shutdown depends on how you started Tomcat. *No* application is graceful for kill -9. The whole point of kill -9 is that it's uncatchable. -- http://www.linkedin.com/in/paultomblin
Re: strange sorting results: each word in field is sorted
On Aug 19, 2009, at 2:45 PM, Paul Rosen wrote: You can see the problem here (at least until it's fixed!): http://nines.performantsoftware.com/search/saved?user=paulname=poem Hi Paul - that project looks familiar! :) If you sort by Title/Ascending, you get partially sorted results, but it seems to be using a random word to sort on instead of sorting on the entire title. I'm not sure what info would be useful to help debug. In my schema.xml file, I've clipped what seems to be the relevant part:

<fieldtype name="text_lu" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldtype>
<field name="title" type="text_lu" indexed="true" stored="true" multiValued="true"/>

I'm surprised you're not seeing an exception when trying to sort on title given this configuration. Sorting must be done on single valued indexed fields that have at most a single term indexed per document. I recommend you use copyField to copy title to title_sort, and configure a title_sort field as a string or a field type that analyzes only to a single term (like simply keyword tokenizing + lower case filter). Erik
RE: Shutdown Solr
Thanks... kill should be / can be graceful; kill -9 should kill immediately... no any hang, whole point... http://www.nabble.com/Is-kill--9-safe-or-not--td24866506.html -Original Message- From: ptomb...@gmail.com [mailto:ptomb...@gmail.com] On Behalf Of Paul Tomblin Sent: August-19-09 2:49 PM To: solr-user@lucene.apache.org Subject: Re: Shutdown Solr On Wed, Aug 19, 2009 at 2:43 PM, Fuad Efendif...@efendi.ca wrote: Most probably Ctrl-C is graceful for Tomcat, and kill -9 too... Tomcat is smart... I prefer /etc/init.d/my_tomcat wrapper around catalina.sh (su tomcat, /var/lock etc...) - ok then, Graceful Shutdown depends on how you started Tomcat. *No* application is graceful for kill -9. The whole point of kill -9 is that it's uncatchable. -- http://www.linkedin.com/in/paultomblin
WordDelimiterFilter = MultiPhraseQuery?
My issue is with the use of WordDelimiterFilter and how the QueryParser (Dismax) converts the query into a MultiPhraseQuery. This is on solr 1.3 / lucene 2.4.1. For example: 1. yuma - 3:10 to Yuma 2. yUma - no results For #2 it gets split into y + uma and becomes a MultiPhraseQuery requiring both terms thus no results vs. requiring either one with a preference on both (or a preference on joining the terms or at least an OR query). 1. joker-man - Joker-Man Goes For Gold 2. joKerman - no results 3. jo-kerman - no results 1. prom night - Prom Night 2. PromNight - Prom Night 3. promnight - no results 4. pRomnIght - no results Is there a way to configure this behavior. I need to support all the above use-cases. I have a brute force solution using a copyField and a non-WordDelimiterFilter analyzer (whitespacetoken, lowercase, patternreplace punctuation, edgengram) and basically drop into solrconfig.xml a 2nd field for this (titleNameSubstring2). Those two combined is pretty much what I need, but that costs a memory hit + performance hit whereas some tuning to avoid MultiPhraseQuery would be a better fit. Here are the schema.xml + solrconfig.xml bits that are not working. 
[schema.xml]

<fieldType name="textSubstring" class="solr.TextField" positionIncrementGap="100" omitNorms="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="12"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

[solrconfig.xml]

<requestHandler name="stuff_title" class="solr.SearchHandler">
  <lst name="invariants">
    <str name="defType">dismax</str>
    <str name="echoParams">explicit</str>
    <str name="sort">score desc</str>
    <str name="qf">titleNameSubstring^200.0</str>
    <str name="pf">titleNameSubstring^2.0</str>
    <str name="bf">product(releaseYear,0.1)</str>
    <str name="mm">1</str>
  </lst>
  <lst name="appends">
    <str name="fq">searchable:true</str>
  </lst>
</requestHandler>

Any ideas? -netcam
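One commonly suggested tweak for this symptom is to keep the original token alongside the split parts at query time, so that a lowercased "yUma" can still match "yuma" instead of only the phrase "y uma". A hedged sketch of the query analyzer (note: the preserveOriginal attribute of WordDelimiterFilterFactory was added after the 1.3 release being discussed here, so this assumes a newer Solr; catenateAll="1" additionally produces the joined form so "promNight" can match "promnight"):

```xml
<analyzer type="query">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <!-- preserveOriginal=1 emits "yUma" in addition to the parts "y"/"uma";
       catenateAll=1 also emits the concatenated form ("promnight") -->
  <filter class="solr.WordDelimiterFilterFactory"
          generateWordParts="1" generateNumberParts="1"
          catenateWords="1" catenateNumbers="1"
          catenateAll="1" preserveOriginal="1"/>
  <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
```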
Re: strange sorting results: each word in field is sorted
Erik Hatcher wrote: On Aug 19, 2009, at 2:45 PM, Paul Rosen wrote: You can see the problem here (at least until it's fixed!): http://nines.performantsoftware.com/search/saved?user=paulname=poem Hi Paul - that project looks familiar! :) Hi Erik! I should hope so! And I've gone a year without having to delve into solr much since it has just plain worked. Thanks for the speedy reply. I'm surprised you're not seeing an exception when trying to sort on title given this configuration. Sorting must be done on single valued indexed fields, that have at most a single term indexed per document. I recommend you use copyField to copy title to title_sort and configure a title_sort field as a string or a field type that analyzes only to a single term (like simply keyword tokenizing + lower case filter). Erik I want to double check this (since you probably remember how long it takes to recreate the indexes). I think you're saying to add these two lines, then re-index:

<field name="title_sort" type="string" indexed="true" stored="true"/>
<copyField source="title" dest="title_sort"/>

Now, this is case-sensitive, right? So would this make it case-insensitive?

<fieldtype name="sort_string" class="solr.StrField" sortMissingLast="true">
  <analyzer>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldtype>
<field name="title_sort" type="sort_string" indexed="true" stored="true"/>
<copyField source="title" dest="title_sort"/>

Also, I'm guessing from seeing the current results that this wouldn't collate the characters with diacritical marks correctly. Is there a way to indicate that, for instance, A-grave would sort next to A? And, while I'm on the subject, I have to do the same thing with the Author field, but unfortunately, that is sometimes First Last and sometimes Last, First. Is there any way to sort those by last name, or do I just have to encourage the index people to be more consistent?
I can think of a fairly simple algorithm, but am not sure where to implement it:
- if the word "and" appears, just look at the left side of the field (in other words, sort by the first name that appears).
- if there is a comma, but it is part of ", jr." or some other common suffix like that, ignore it.
- otherwise, if there is no comma, sort by the last word, unless it is jr, sr, III, etc., in which case sort by the word before that.
- otherwise, sort by the first word.
That would get most of the cases. Thanks, Paul
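The steps above can be written down directly; this is an illustrative Python version (the suffix list and the choice to key on the first author's surname in the "and" case are assumptions, not from the thread):

```python
# Generational suffixes that should not be treated as a surname.
SUFFIXES = {"jr", "jr.", "sr", "sr.", "ii", "iii", "iv"}

def author_sort_key(name):
    """Derive a surname sort key from 'First Last' or 'Last, First' forms."""
    # Multiple authors joined by "and": key on the first author listed.
    name = name.lower().split(" and ")[0].strip()
    parts = [p.strip() for p in name.split(",")]
    # A trailing ", jr."-style suffix is not a surname; drop it.
    if len(parts) > 1 and parts[-1] in SUFFIXES:
        parts = parts[:-1]
    if len(parts) > 1:
        # "Last, First" already leads with the surname.
        return parts[0]
    words = parts[0].split()
    # "First Last Jr" / "... III": step back past generational suffixes.
    while len(words) > 1 and words[-1] in SUFFIXES:
        words.pop()
    return words[-1] if words else ""
```

As for where to implement it: since a Solr sort field must hold a single precomputed term, the most natural place is the indexing client, writing the computed key into the dedicated sort field.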
FW: Data Modeling
I hit reply and sent this to just David, but I think it should go to the whole list: Hi David, I want to do 2 kinds of things with Solr, maybe 3 in the future. 1. I want to use it on our website so that a customer can filter down products by different attributes. So suppose we have: Inventory: ABC, 10; DEF, 15. Attributes: ABC, Brand, ACME Brand; ABC, Water Pump Style, Short; DEF, Brand, Engine Builders; DEF, Water Pump Style, Long. Vehicle Applications: ABC, 1999, Toyota, Camry, 3.1L; ABC, 2000, Toyota, Camry, 3.1L; DEF, 1997, Ford, Focus, 2.5L; DEF, 1998, Ford, Focus, 2.5L. I would like to be able to handle two things: 1. Give the person a list of all the unique years. When they pick one, show them all the Makes for that year. When they pick that, show all the Models. Alternatively: 1. Give them a list of makes, then models, then engine, etc... Also, it would be nice if I could give Solr a Part# (Sku) and have it get all the attributes for that sku; alternatively, I'd love to be able to drill down by the attributes such as Brand, Water Pump Style, etc. Please let me know if this email is still not clear... -- Vladimir Landman Northern Auto Parts From: Smiley, David W. [mailto:dsmi...@mitre.org] Sent: 2009-08-19 10:42 AM To: solr; Vladimir Landman Subject: Re: Data Modeling This is the sort of Solr fundamentals question my book (chapter 2) will help you with. Think about what your user interface is. What are users searching for? That is, what exactly comes back from search results? It's not clear from your description what your search scenario is. ~ David Smiley Author: http://www.packtpub.com/solr-1-4-enterprise-search-server On 8/19/09 10:31 AM, Vladimir Landman v...@northernautoparts.com wrote: Hi, I am trying to create a schema for Solr.
Here is a relational model of what our data might look like: Inventory: Sku, Price, Weight. Attributes: AttributeName, AttributeValue. Applications: Id (Auto-Incrementing), Sku, VehicleYear, VehicleMake, VehicleModel, VehicleEngine. There can be multiple Application(s) records. Also, Attributes can also have duplicates. Basically I want to store basic information about our inventory, attributes, and applications. If I didn't have the applications, I would simply have: <field name="id" ... /> <field name="sku" ... /> <field name="price" ... /> <field name="weight" ... /> <!-- Attributes --> <field name="OilPumpVolume" ... /> <field name="FuelType" ... /> Since one part might have 3 or 4 attributes, but 100 applications, I want to try to avoid having 400 records, but maybe that is just what I will have to do. I appreciate any help. -- Vladimir Landman Northern Auto Parts
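One common way to model this is to denormalize to one Solr document per SKU, flattening the applications into multivalued fields. A hedged sketch of what such an add document could look like (field names are invented for illustration and would need matching multiValued="true" definitions in schema.xml):

```xml
<add>
  <doc>
    <field name="sku">ABC</field>
    <field name="price">10</field>
    <field name="attr_brand">ACME Brand</field>
    <field name="attr_waterPumpStyle">Short</field>
    <!-- one value per vehicle application -->
    <field name="applicationYear">1999</field>
    <field name="applicationYear">2000</field>
    <field name="applicationMake">Toyota</field>
    <field name="applicationModel">Camry</field>
    <field name="applicationEngine">3.1L</field>
  </doc>
</add>
```

One caveat of this layout: separate multivalued fields lose the correlation between year, make, and model within one application, so drill-down by full application usually also needs a combined field per application (e.g. a single value like "1999 Toyota Camry 3.1L").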
Re: Adding cores dynamically
Lici, We're doing a similar thing with multi-core - when a core reaches capacity (in our case 200 million records) we start a new core. We are doing this via a web service call (the Create web service), http://wiki.apache.org/solr/CoreAdmin This is all done in Java code - before writing, we check the number of records in the core; if it has reached its capacity we create a new core and then index there. -vivek 2009/8/19 Licinio Fernández Maurelo licinio.fernan...@gmail.com: Hi there, currently we want to add cores dynamically when the active one reaches some capacity. Can anyone give me some hints on how to achieve this functionality? (Just wondering if you have used shell scripting or coded some 100% Java based solution.) Thx -- Lici
Re: strange sorting results: each word in field is sorted
On Aug 19, 2009, at 3:50 PM, Paul Rosen wrote: I'm surprised you're not seeing an exception when trying to sort on title given this configuration. Sorting must be done on single-valued indexed fields that have at most a single term indexed per document. I recommend you use copyField to copy title to title_sort and configure a title_sort field as a string, or a field type that analyzes down to a single term (like simply keyword tokenizing + lower case filter). Erik

I want to double check this (since you probably remember how long it takes to recreate the indexes). I think you're saying to add these two lines, then re-index:

<field name="title_sort" type="string" indexed="true" stored="true"/>
<copyField source="title" dest="title_sort"/>

For the simplest case, yes. You do have to be careful the sort field is not multiValued - and I believe the NINES model allowed for multiple titles. So it might be necessary for your indexing client to specify the single sort field value instead of leveraging copyField.

Now, this is case-sensitive, right? So would this make it case-insensitive?

Yes, the above would be case sensitive.

<fieldtype name="sort_string" class="solr.StrField" sortMissingLast="true">
  <analyzer>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldtype>
<field name="title_sort" type="sort_string" indexed="true" stored="true"/>
<copyField source="title" dest="title_sort"/>

That analyzer definition isn't quite right - you must have at least a tokenizer. The KeywordTokenizer tokenizes the entire string into a single token, though.
In Solr's example schema there is a field type like this:

<fieldType name="alphaOnlySort" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <!-- KeywordTokenizer does no actual tokenizing, so the entire
         input string is preserved as a single token -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- The LowerCase TokenFilter does what you expect, which can be
         when you want your sorting to be case insensitive -->
    <filter class="solr.LowerCaseFilterFactory" />
    <!-- The TrimFilter removes any leading or trailing whitespace -->
    <filter class="solr.TrimFilterFactory" />
    <!-- The PatternReplaceFilter gives you the flexibility to use
         Java Regular expression to replace any sequence of characters
         matching a pattern with an arbitrary replacement string,
         which may include back references to portions of the original
         string matched by the pattern. See the Java Regular Expression
         documentation for more information on pattern and replacement
         string syntax.
         http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/package-summary.html
    -->
    <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z])" replacement="" replace="all" />
  </analyzer>
</fieldType>

Also, I'm guessing from seeing the current results that this wouldn't collate the characters with diacritical marks correctly. Is there a way to indicate that, for instance, A-grave would sort next to A?

Yes, you can incorporate the diacritic normalizing filter into the analyzer definition above: ASCIIFoldingFilter or the ISO Latin1 one.

And, while I'm on the subject, I have to do the same thing with the Author field, but unfortunately, that is sometimes "First Last" and sometimes "Last, First". Is there any way to sort those by last name, or do I just have to encourage the index people to be more consistent?

Good luck with getting consistency in your domain! :) But it certainly makes sense to request that from the data providers, in at least some form that can be turned into the sortable value.
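Client-side, the effect of that analyzer chain plus the diacritic folding Erik mentions can be approximated like this; it is only an illustration of what the filters do (lowercase, fold accents, trim), not a substitute for configuring them in the schema:

```java
import java.text.Normalizer;

// Sketch of the sort-key normalization the analyzer chain above performs:
// fold diacritics (NFD decomposition + strip combining marks), lowercase, trim.
public class SortKey {
    static String sortKey(String s) {
        String folded = Normalizer.normalize(s, Normalizer.Form.NFD)
                .replaceAll("\\p{M}", ""); // drop combining diacritical marks
        return folded.toLowerCase().trim();
    }

    public static void main(String[] args) {
        System.out.println(sortKey("  Àgrave Title ")); // agrave title
    }
}
```

With this kind of key, "Àgrave" and "Agrave" land next to each other in a sorted list, which is the collation behavior Paul is after.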
I can think of a fairly simple algorithm, but am not sure where to implement it:
- if the word "and" appears, just look at the left side of the field (in other words, sort by the first name that appears.)
- if there is a comma, but it is part of ", jr." or some other common suffix like that, ignore it.
- otherwise, if there is no comma, sort by the last word, unless it is jr, sr, III, etc., in which case sort by the word before that.
- otherwise, sort by the first word.

Probably best to implement that in the indexing client code, but simple transformations could be implemented using the PatternReplaceFilter like above. Erik
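The steps above could be sketched as indexing-client code roughly like this; the suffix list and the exact tie-breaking are assumptions, and real name data will have more corner cases:

```java
import java.util.Set;

// Heuristic last-name sort key following the steps described above.
// SUFFIXES and the handling of multi-author strings are illustrative guesses.
public class AuthorSort {
    static final Set<String> SUFFIXES = Set.of("jr", "jr.", "sr", "sr.", "ii", "iii", "iv");

    static String lastNameKey(String author) {
        String a = author.trim();
        int and = a.toLowerCase().indexOf(" and ");
        if (and >= 0) a = a.substring(0, and).trim();    // only the first author listed
        // drop a trailing ", jr." style suffix so its comma is not misread
        String[] parts = a.split(",");
        if (parts.length == 2 && SUFFIXES.contains(parts[1].trim().toLowerCase())) {
            a = parts[0].trim();
        }
        int comma = a.indexOf(',');
        if (comma >= 0) {                                // already "Last, First"
            return a.substring(0, comma).trim().toLowerCase();
        }
        String[] words = a.split("\\s+");                // "First Last [suffix]"
        int i = words.length - 1;
        if (i > 0 && SUFFIXES.contains(words[i].toLowerCase())) i--;
        return words[i].toLowerCase();
    }

    public static void main(String[] args) {
        System.out.println(lastNameKey("John Smith"));      // smith
        System.out.println(lastNameKey("Smith, John"));     // smith
        System.out.println(lastNameKey("John Smith, Jr.")); // smith
    }
}
```

The computed key would go into the single-valued sort field (author_sort) at indexing time, exactly where Erik suggests putting this logic.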
Re: Passing a Cookie in SolrJ
SolrJ uses the Apache Commons HTTP client. This describes the authentication system: http://hc.apache.org/httpclient-3.x/authentication.html http://hc.apache.org/httpclient-3.x/apidocs/org/apache/commons/httpclient/auth/package-frame.html *This has code to use authentication* https://issues.apache.org/jira/browse/SOLR-1238 You might be able to find an OpenSSO implementation for this. Or hack up a simple one. On Wed, Aug 19, 2009 at 5:48 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: On Tue, Aug 18, 2009 at 10:18 PM, Ramirez, Paul M (388J) paul.m.rami...@jpl.nasa.gov wrote: Hi All, The project I am working on is using Solr and OpenSSO (Sun's single sign on service). I need to write some sample code for our users that shows them how to query Solr, and I would just like to point them to the SolrJ documentation, but I can't see an easy way to pass a cookie with the request. The cookie is needed to get through the SSO layer but will just be ignored by Solr. I see that you are using Apache Commons HttpClient, and with that I would be able to write the cookie if I had access to the HttpMethod being used (GetMethod or PostMethod). However, I cannot find an easy way to get access to this with SolrJ and thought I would ask before rewriting a simple example using only an Apache HttpClient without the SolrJ library. Thanks in advance for any pointers you may have. There's no easy way I think. You can extend CommonsHttpSolrServer and override the request method. Copy/paste the code from CommonsHttpSolrServer#request and make the changes. It is not an elegant way but it will work. -- Regards, Shalin Shekhar Mangar. -- Lance Norskog goks...@gmail.com
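For the documentation-example use case, one way to sidestep SolrJ entirely is a raw HTTP request with the cookie header set by hand. A minimal sketch (the cookie name iPlanetDirectoryPro is OpenSSO's default session cookie, and the URL and token value here are placeholders):

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Sketch: attach an SSO cookie to a plain Solr query request without SolrJ.
// The SSO layer reads the cookie; Solr itself ignores it.
public class CookieQuery {
    static HttpURLConnection openWithCookie(String url, String cookie) {
        try {
            HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
            conn.setRequestProperty("Cookie", cookie); // must be set before connecting
            return conn;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        HttpURLConnection c = openWithCookie(
                "http://localhost:8983/solr/select?q=*:*",
                "iPlanetDirectoryPro=placeholder-token");
        // the header is attached to the request before it is sent
        System.out.println(c.getRequestProperty("Cookie"));
    }
}
```

Reading the response body from the connection would then give back the usual Solr XML, to be parsed by whatever means the sample code prefers.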
Re: Shutdown Solr
In production systems I have done a three-stage technique. First, use the container's standard shutdown tool - Tomcat, JBoss and Jetty all have their own. Then sleep for maybe 60 seconds. Then do kill, sleep some more, then 'kill -9'. On Wed, Aug 19, 2009 at 12:21 PM, Fuad Efendi f...@efendi.ca wrote: Thanks... kill should be / can be graceful; kill -9 kills immediately, with no hang at all - that's the whole point... http://www.nabble.com/Is-kill--9-safe-or-not--td24866506.html -Original Message- From: ptomb...@gmail.com [mailto:ptomb...@gmail.com] On Behalf Of Paul Tomblin Sent: August-19-09 2:49 PM To: solr-user@lucene.apache.org Subject: Re: Shutdown Solr On Wed, Aug 19, 2009 at 2:43 PM, Fuad Efendi f...@efendi.ca wrote: Most probably Ctrl-C is graceful for Tomcat, and kill -9 too... Tomcat is smart... I prefer an /etc/init.d/my_tomcat wrapper around catalina.sh (su tomcat, /var/lock etc...) - ok then, graceful shutdown depends on how you started Tomcat. *No* application is graceful for kill -9. The whole point of kill -9 is that it's uncatchable. -- http://www.linkedin.com/in/paultomblin -- Lance Norskog goks...@gmail.com
Re: DataImportHandler ignoring most rows
It usually helps to make a database view of your query, and then load the DIH from that view. There are cases where some query syntaxes are mangled on the way to the DB. 2009/8/18 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com: this comment says that <str name="Total Rows Fetched">7</str> the query fetched only 7 rows. If possible, open a tool and just run the same query and see how many rows are returned. On Wed, Aug 19, 2009 at 3:46 AM, Erik Earle erikea...@yahoo.com wrote: Using: - apache-solr-1.3.0 - java 1.6 - tomcat 6 - sql server 2005 w/ JSQLConnect 4.0 driver I have a group table with 3007 rows. I have confirmed the key is unique with "select distinct id from group" and it returns 3007. When I re-index using http://host:port/solr/dataimport?command=full-import I only get 7 records indexed. Any insight into what is going on would be really great. A partial response:

<lst name="statusMessages">
  <str name="Total Requests made to DataSource">1</str>
  <str name="Total Rows Fetched">7</str>
  <str name="Total Documents Skipped">0</str>

I have other entities that index all the rows without issue. There are no errors in the logs.
I am not using any Transformers (and most of my config is unchanged from the install). My schema.xml contains:

<uniqueKey>key</uniqueKey>

and field defs (not a full list of fields):

<field name="key" type="string" indexed="true" stored="true" required="true" />
<field name="class" type="string" indexed="true" stored="true" required="true" />
<field name="id" type="string" indexed="true" stored="true" />
<field name="description" type="text" indexed="true" stored="true" />
<field name="created" type="date" indexed="true" stored="true" />
<field name="updated" type="date" indexed="true" stored="true" />

data-config.xml:

<dataConfig>
  <!-- jdbc:JSQLConnect://se-eriearle-lt1/database=SocialSite2/user=SocialSite2/logfile=DB_TRACE.log -->
  <dataSource type="JdbcDataSource"
              driver="com.jnetdirect.jsql.JSQLDriver"
              url="jdbc:JSQLConnect://se-eriearle-lt1/database=SocialSite2/user=SocialSite2"
              user="SocialSite2" password="SocialSite2" />
  <document>
    <entity name="Group" pk="key"
            query="select 'group.'+id as 'key', 'group' as 'class', name, handle, description, created, updated from group order by created asc">
    </entity>
    <entity name="Message" pk="key" query="...redacted...">
    </entity>
  </document>
</dataConfig>

-- - Noble Paul | Principal Engineer| AOL | http://aol.com -- Lance Norskog goks...@gmail.com
Re: DataImportHandler ignoring most rows
I switched to the MS driver and now all is well. Must be an incompatibility with the JSQLConnect driver. Sent from my iPhone On Aug 18, 2009, at 11:47 PM, Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com wrote: this comment says that <str name="Total Rows Fetched">7</str> the query fetched only 7 rows. If possible, open a tool and just run the same query and see how many rows are returned. On Wed, Aug 19, 2009 at 3:46 AM, Erik Earle erikea...@yahoo.com wrote: Using: - apache-solr-1.3.0 - java 1.6 - tomcat 6 - sql server 2005 w/ JSQLConnect 4.0 driver I have a group table with 3007 rows. I have confirmed the key is unique with "select distinct id from group" and it returns 3007. When I re-index using http://host:port/solr/dataimport?command=full-import I only get 7 records indexed. Any insight into what is going on would be really great. A partial response:

<lst name="statusMessages">
  <str name="Total Requests made to DataSource">1</str>
  <str name="Total Rows Fetched">7</str>
  <str name="Total Documents Skipped">0</str>

I have other entities that index all the rows without issue. There are no errors in the logs.
I am not using any Transformers (and most of my config is unchanged from the install). My schema.xml contains:

<uniqueKey>key</uniqueKey>

and field defs (not a full list of fields):

<field name="key" type="string" indexed="true" stored="true" required="true" />
<field name="class" type="string" indexed="true" stored="true" required="true" />
<field name="id" type="string" indexed="true" stored="true" />
<field name="description" type="text" indexed="true" stored="true" />
<field name="created" type="date" indexed="true" stored="true" />
<field name="updated" type="date" indexed="true" stored="true" />

data-config.xml:

<dataConfig>
  <!-- jdbc:JSQLConnect://se-eriearle-lt1/database=SocialSite2/user=SocialSite2/logfile=DB_TRACE.log -->
  <dataSource type="JdbcDataSource"
              driver="com.jnetdirect.jsql.JSQLDriver"
              url="jdbc:JSQLConnect://se-eriearle-lt1/database=SocialSite2/user=SocialSite2"
              user="SocialSite2" password="SocialSite2" />
  <document>
    <entity name="Group" pk="key"
            query="select 'group.'+id as 'key', 'group' as 'class', name, handle, description, created, updated from group order by created asc">
    </entity>
    <entity name="Message" pk="key" query="...redacted...">
    </entity>
  </document>
</dataConfig>

-- - Noble Paul | Principal Engineer| AOL | http://aol.com
RE: Data Modeling
It's getting clearer, Vladimir. So fundamentally your users are searching for products (apparently auto parts) and the different attributes would become navigation filters. If this is right, then your initial schema (the first email) is a start, although it's a little ambiguous to interpret because id and sku are overloaded. Your schema would contain a part id, the part's sku, and a field for each attribute you mentioned as well. I recommend using Solr's dynamic fields to define those so that you don't have to explicitly define every attribute you'll ever think of for every part in the schema. The word "application" was totally throwing me, but now I believe you mean to say that this is a vehicle, and an auto part is going to work on multiple vehicles. In Solr, you're going to denormalize this related data by inlining the vehicle information (aka application) into each document, which is an auto part. ... I think you have a couple of approaches on that. Firstly, I observe that when I'm shopping for autos or for auto parts, I am guided through a user interface to pick my precise vehicle. THEN I see related products. This is straightforward -- you would not use Solr; put this information in your database and build an easy app to navigate to a specific vehicle to get the vehicle identifier. You *could* use Solr for this, but it'd be in a separate index/core, or you would have to use multiple document types in your schema (my book has more info on these approaches). So once you have the vehicle identifier, you would look up documents in Solr (aka auto parts) that have this vehicle identifier. It'd be a multi-valued untokenized field, and this would be the only vehicle info needed in your schema. The other approach would be necessary to dynamically filter a list of parts by *partial* vehicle choices, like picking Porsche and 2001 would give you parts that will work on a Boxster and a Carrera made in 2001.
Doing this correctly is tricky for Solr and its non-relational schema because there are multiple vehicle attributes and an auto part is associated with multiple vehicles. I'll advise more if you need to do this, but hopefully you won't need to. It's a bit advanced and complicated. ~ David Smiley Author: http://www.packtpub.com/solr-1-4-enterprise-search-server

From: Vladimir Landman [v...@northernautoparts.com] Sent: Wednesday, August 19, 2009 4:01 PM To: solr-user@lucene.apache.org Subject: FW: Data Modeling

I hit reply and sent this to just David, but I think it should go to the whole list: Hi David, I want to do 2 kinds of things with Solr (maybe 3 in the future).

1. I want to use it on our website so that a customer can filter down products by different attributes. So suppose we have:

Inventory
---------
ABC, 10
DEF, 15

Attributes
----------
ABC, Brand, ACME Brand
ABC, Water Pump Style, Short
DEF, Brand, Engine Builders
DEF, Water Pump Style, Long

Vehicle Applications
--------------------
ABC, 1999, Toyota, Camry, 3.1L
ABC, 2000, Toyota, Camry, 3.1L
DEF, 1997, Ford, Focus, 2.5L
DEF, 1998, Ford, Focus, 2.5L

I would like to be able to handle two things: 1. Give the person a list of all the unique years. When they pick one, show them all the Makes for that year. When they pick that, show all the Models. Alternatively: 1. Give them a list of makes, then models, then engine, etc... Also, it would be nice if I could give Solr a Part# (Sku) and have it get all the attributes for that sku; alternatively, I'd love to be able to drill down by the attributes such as Brand, Water Pump Style, etc. Please let me know if this email is still not clear... -- Vladimir Landman Northern Auto Parts From: Smiley, David W. [mailto:dsmi...@mitre.org] Sent: 2009-08-19 10:42 AM To: solr; Vladimir Landman Subject: Re: Data Modeling This is the sort of Solr fundamentals question my book (chapter 2) will help you with. Think about what your user interface is. What are users searching for?
That is, what exactly comes back from search results? It's not clear from your description what your search scenario is. ~ David Smiley Author: http://www.packtpub.com/solr-1-4-enterprise-search-server On 8/19/09 10:31 AM, Vladimir Landman v...@northernautoparts.com wrote: Hi, I am trying to create a schema for Solr. Here is a relational model of what our data might look like:

Inventory
---------
Sku
Price
Weight

Attributes
----------
AttributeName
AttributeValue

Applications
------------
Id (Auto-Incrementing)
Sku
VehicleYear
VehicleMake
VehicleModel
VehicleEngine

There can be multiple Application(s) records. Also, Attributes can have duplicates. Basically I want to store basic information about our inventory, attributes, and applications. If I didn't have the applications, I would simply have:

<field name="id" ... />
<field name="sku" ... />
<field name="price" ...
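David's denormalization advice above (inlining each vehicle application into the part document as one value of a multi-valued untokenized field) can be sketched like this; the separator character and field layout are assumptions for illustration, not anything Solr prescribes:

```java
import java.util.List;

// Sketch: flatten each vehicle "application" row into one untokenized token
// for a multivalued field on the auto-part document. The "|" separator is an
// illustrative choice; any character not appearing in the data would do.
public class VehicleToken {
    static String token(int year, String make, String model, String engine) {
        return year + "|" + make + "|" + model + "|" + engine;
    }

    public static void main(String[] args) {
        // the multivalued vehicle field for part ABC from the example data
        List<String> vehicleField = List.of(
                token(1999, "Toyota", "Camry", "3.1L"),
                token(2000, "Toyota", "Camry", "3.1L"));
        System.out.println(vehicleField);
    }
}
```

Once the user's app has resolved an exact vehicle, filtering parts is then a single exact-match filter query on that field.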
Re: dynamic changes to schema
Hi, thanks for your answers; I think I have to go more into detail. We are talking about a shop application which has products I want to search for. These products normally have standard attributes like a sku, a name, a price and so on. But the user can add attributes to a product. So for example, if he sells books, he could add the author as an attribute. Let's say he names this field my_author (but he is free to name it as he wants) and he declares, via the configuration, that this field is searchable. So I need a field in Solr for the author. Since I can't force the user to prefix every field with something like my_, dynamic fields don't work, do they? best, Marco Constantijn Visinescu schrieb: huh? I think I lost you :) You want to use a multivalued field to list what dynamic fields you have in your document? Also, if you program your application correctly you should be able to restrict your users from doing anything you please (or don't please in this case). On Tue, Aug 18, 2009 at 11:38 PM, Marco Westermann m...@intersales.de wrote: hi, thanks for the advice, but the problem with dynamic fields is that I cannot restrict what the user calls the field in the application. So there isn't a pattern I can use. But I thought about using multivalued fields for the dynamically added fields. Good idea? thanks, Marco Constantijn Visinescu schrieb: use a dynamic field? On Tue, Aug 18, 2009 at 5:09 PM, Marco Westermann m...@intersales.de wrote: Hi there, is there a possibility to change the Solr schema from PHP dynamically? The web application I want to index at the moment has the feature to add fields to entities, and you can tell these fields that they are searchable. To realize this with Solr, the schema has to change when a searchable field is added or removed.
Any suggestions? Thanks a lot, Marco Westermann -- ++ Business-Software aus einer Hand ++ ++ Internet, Warenwirtschaft, Linux, Virtualisierung ++ http://www.intersales.de http://www.eisxen.org http://www.tarantella-partner.de http://www.medisales.de http://www.eisfair.net interSales AG Internet Commerce Subbelrather Str. 247 50825 Köln Tel 02 21 - 27 90 50 Fax 02 21 - 27 90 517 Mail i...@intersales.de Mail m...@intersales.de Web www.intersales.de Handelsregister Köln HR B 30904 Ust.-Id.: DE199672015 Finanzamt Köln-Nord. UstID: nicht vergeben Aufsichtsratsvorsitzender: Michael Morgenstern Vorstand: Andrej Radonic, Peter Zander
Re: dynamic changes to schema
However, you can have a dynamic * field mapping that catches all field names that aren't already defined - though all of those fields will be the same field type. Erik On Aug 19, 2009, at 5:48 PM, Marco Westermann wrote: Hi, thanks for your answers; I think I have to go more into detail. We are talking about a shop application which has products I want to search for. These products normally have standard attributes like a sku, a name, a price and so on. But the user can add attributes to a product. So for example, if he sells books, he could add the author as an attribute. Let's say he names this field my_author (but he is free to name it as he wants) and he declares, via the configuration, that this field is searchable. So I need a field in Solr for the author. Since I can't force the user to prefix every field with something like my_, dynamic fields don't work, do they? best, Marco Constantijn Visinescu schrieb: huh? I think I lost you :) You want to use a multivalued field to list what dynamic fields you have in your document? Also, if you program your application correctly you should be able to restrict your users from doing anything you please (or don't please in this case). On Tue, Aug 18, 2009 at 11:38 PM, Marco Westermann m...@intersales.de wrote: hi, thanks for the advice, but the problem with dynamic fields is that I cannot restrict what the user calls the field in the application. So there isn't a pattern I can use. But I thought about using multivalued fields for the dynamically added fields. Good idea? thanks, Marco Constantijn Visinescu schrieb: use a dynamic field? On Tue, Aug 18, 2009 at 5:09 PM, Marco Westermann m...@intersales.de wrote: Hi there, is there a possibility to change the Solr schema from PHP dynamically? The web application I want to index at the moment has the feature to add fields to entities, and you can tell these fields that they are searchable.
To realize this with Solr, the schema has to change when a searchable field is added or removed. Any suggestions? Thanks a lot, Marco Westermann
【solr DIH】A problem about solr delta-imports
Hi all, there is a problem when I use solr delta-imports to update the index. I have added the last_modified column to the table. After I use the full-import command to index the database data, the dataimport.properties file contains nothing, and when I use the delta-import command to update the index, Solr lists all the data in the database, not just the latest data. My db-data-config.xml:

<dataConfig>
  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost:3306/funguide" user="root" password="root"/>
  <document name="shopinfo">
    <entity name="shop" pk="shop_id"
            query="select shop_id,title,description,tel,address,longitude,latitude from shop"
            deltaQuery="select shop_id from shop where last_modified > '${dataimporter.last_index_time}'">
      <field column="shop_id" name="id" />
      <field column="title" name="title" />
      <field column="description" name="description" />
      <field column="tel" name="tel" />
      <field column="address" name="address" />
      <field column="longitude" name="longitude" />
      <field column="latitude" name="latitude" />
    </entity>
  </document>
</dataConfig>

Anybody know how to solve the problem? Thanks! enzhao...@gmail.com -- View this message in context: http://www.nabble.com/%E3%80%90solr-DIH%E3%80%91A-problem-about-solr-delta-imports-tp25055788p25055788.html Sent from the Solr - User mailing list archive at Nabble.com.
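For intuition about what the deltaQuery should produce: DIH records the last import time in dataimport.properties and substitutes it for ${dataimporter.last_index_time} before running the SQL. A sketch of that substitution (the timestamp value is a placeholder; if the properties file is empty, as in the problem described above, there is nothing to substitute and the delta cannot be computed):

```java
// Sketch: the string substitution DIH effectively performs on the deltaQuery.
public class DeltaQuery {
    static String substitute(String deltaQuery, String lastIndexTime) {
        return deltaQuery.replace("${dataimporter.last_index_time}", lastIndexTime);
    }

    public static void main(String[] args) {
        System.out.println(substitute(
                "select shop_id from shop where last_modified > '${dataimporter.last_index_time}'",
                "2009-08-20 09:42:00"));
        // select shop_id from shop where last_modified > '2009-08-20 09:42:00'
    }
}
```

The single quotes around the placeholder in the config are what make the substituted timestamp a valid SQL string literal.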
Re: 【solr DIH】A problem about solr delta-imports
Which version of Solr are you using? Solr 1.3 had a bug with this. On Thu, Aug 20, 2009 at 9:42 AM, huenzhao huenz...@126.com wrote: Hi all, there is a problem when I use solr delta-imports to update the index. I have added the last_modified column to the table. After I use the full-import command to index the database data, the dataimport.properties file contains nothing, and when I use the delta-import command to update the index, Solr lists all the data in the database, not just the latest data. My db-data-config.xml:

<dataConfig>
  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost:3306/funguide" user="root" password="root"/>
  <document name="shopinfo">
    <entity name="shop" pk="shop_id"
            query="select shop_id,title,description,tel,address,longitude,latitude from shop"
            deltaQuery="select shop_id from shop where last_modified > '${dataimporter.last_index_time}'">
      <field column="shop_id" name="id" />
      <field column="title" name="title" />
      <field column="description" name="description" />
      <field column="tel" name="tel" />
      <field column="address" name="address" />
      <field column="longitude" name="longitude" />
      <field column="latitude" name="latitude" />
    </entity>
  </document>
</dataConfig>

Anybody know how to solve the problem? Thanks! enzhao...@gmail.com -- View this message in context: http://www.nabble.com/%E3%80%90solr-DIH%E3%80%91A-problem-about-solr-delta-imports-tp25055788p25055788.html Sent from the Solr - User mailing list archive at Nabble.com. -- - Noble Paul | Principal Engineer| AOL | http://aol.com
Re: 【solr DIH】A problem about solr delta-imports
The version is 1.3. After I used the full-import, the tomcat log shows that Solr did not call the SolrWriter class. Do you know the solution for this bug? Noble Paul നോബിള് नोब्ळ्-2 wrote: Which version of Solr are you using? Solr 1.3 had a bug with this. On Thu, Aug 20, 2009 at 9:42 AM, huenzhao huenz...@126.com wrote: Hi all, there is a problem when I use solr delta-imports to update the index. I have added the last_modified column to the table. After I use the full-import command to index the database data, the dataimport.properties file contains nothing, and when I use the delta-import command to update the index, Solr lists all the data in the database, not just the latest data. My db-data-config.xml:

<dataConfig>
  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost:3306/funguide" user="root" password="root"/>
  <document name="shopinfo">
    <entity name="shop" pk="shop_id"
            query="select shop_id,title,description,tel,address,longitude,latitude from shop"
            deltaQuery="select shop_id from shop where last_modified > '${dataimporter.last_index_time}'">
      <field column="shop_id" name="id" />
      <field column="title" name="title" />
      <field column="description" name="description" />
      <field column="tel" name="tel" />
      <field column="address" name="address" />
      <field column="longitude" name="longitude" />
      <field column="latitude" name="latitude" />
    </entity>
  </document>
</dataConfig>

Anybody know how to solve the problem? Thanks! enzhao...@gmail.com -- - Noble Paul | Principal Engineer| AOL | http://aol.com -- View this message in context: http://www.nabble.com/%E3%80%90solr-DIH%E3%80%91A-problem-about-solr-delta-imports-tp25055788p25056379.html Sent from the Solr - User mailing list archive at Nabble.com.