RE: Ensuring stable timestamp ordering
Dennis Gearon [gear...@sbcglobal.net] wrote: how about a timestamp with a GUID appended to the end of it? Since long (8 bytes) is the largest atomic type supported by Java, this would have to be represented as a String (or rather a BytesRef) and would take up 4 + 32 bytes, plus 2 * 4 bytes for the internal BytesRef attributes, plus some extra overhead. That is quite a large memory penalty to ensure unique timestamps.
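The trade-off described above can be sketched in plain Java: a zero-padded millisecond timestamp with a UUID appended stays sortable by time but costs roughly 56 characters per key. The class and method names here are illustrative, not from any Lucene or Solr API.

```java
import java.util.UUID;

public class TimestampKey {
    // Build a sortable key: zero-padded millisecond timestamp plus a UUID
    // as a tie-breaker. %019d pads to the full width of a positive long,
    // so string order matches numeric order of the timestamp prefix.
    static String uniqueKey(long timestampMillis, UUID uuid) {
        return String.format("%019d-%s", timestampMillis, uuid);
    }

    public static void main(String[] args) {
        // Fixed inputs so the result is reproducible.
        String key = uniqueKey(1288700000000L,
                UUID.fromString("123e4567-e89b-12d3-a456-426614174000"));
        System.out.println(key);
        System.out.println(key.length()); // 19 digits + '-' + 36-char UUID
    }
}
```

At 56 characters per key (before BytesRef overhead), the memory cost the post estimates is easy to see.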
using HebMorph
Hi, I'm trying to use HebMorph, a new Hebrew analyzer: http://github.com/itaifrenkel/HebMorph/tree/master/java/ The instructions say: 1. Download the code from here: http://github.com/synhershko/HebMorph/tree/master/java/ 2. Use the hebmorph ant script http://github.com/synhershko/HebMorph/blob/master/java/hebmorph/build.xml to build the hebmorph project. 3. Use the lucene.hebrew ant script http://github.com/synhershko/HebMorph/blob/master/java/lucene.hebrew/build.xml to build the lucene.hebrew project. 4. Copy both jar files to the solr/lib folder. 5. Edit your solr/conf/schema.xml file to use the analyzer you choose to use. I've installed the Solr package under Ubuntu Lucid. I've completed steps 1-3. Where do I put the jar files? How do I make Solr use the analyzer? Thanks
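For step 5, a minimal sketch of what the schema change might look like. The analyzer class name below is a placeholder, not confirmed — check which analyzer classes the HebMorph jars actually provide; the jars themselves go in the lib directory under your Solr home (or wherever a `<lib>` directive in solrconfig.xml points), followed by a restart.

```
<!-- solr/conf/schema.xml; the analyzer class name is a placeholder -->
<fieldType name="text_he" class="solr.TextField">
  <analyzer class="com.example.hebmorph.HebrewAnalyzer"/>
</fieldType>

<field name="body_he" type="text_he" indexed="true" stored="true"/>
```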
Solr MySQL Adding new column to table
Hello Techies, I am new to Solr; I am using it with MySQL. Suppose I have a table called person in MySQL with two columns, name and age, and I have configured MySQL in Solr. Now I have added a new column to the person table called phoneNumber. Is it possible for Solr to recognize the new column dynamically, i.e. without changing the old configuration? Thanks in advance, Nitin -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-MySQL-Adding-new-column-to-table-tp1826759p1826759.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Solr MySQL Adding new column to table
You have to change the old configuration for the newly added field. Or you can use the dynamic fields concept. Go through the link http://wiki.apache.org/solr/SchemaXml -Original Message- From: nitin.vanaku...@gmail.com [via Lucene] Sent: Tuesday, November 2, 2010 4:50am To: sivaprasad sivaprasa...@echidnainc.com Subject: Solr MySQL Adding new column to table -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-MySQL-Adding-new-column-to-table-tp1826759p1826792.html Sent from the Solr - User mailing list archive at Nabble.com.
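The dynamic-fields idea from the SchemaXml page can be sketched like this (the `*_s` suffix pattern is just an example): one catch-all declaration accepts any incoming field whose name matches, so new columns don't each need a schema edit.

```
<!-- schema.xml: any incoming field ending in _s is accepted as a string -->
<dynamicField name="*_s" type="string" indexed="true" stored="true"/>
```

A newly added column such as phoneNumber would then be fed to Solr under a matching name, e.g. phoneNumber_s.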
Re: Phrase Query Problem?
On 11/1/2010 11:14 PM, Ken Stanley wrote: On Mon, Nov 1, 2010 at 10:26 PM, Tod listac...@gmail.com wrote: I have a number of fields I need to do an exact match on. I've defined them as 'string' in my schema.xml. I've noticed that I get back query results that don't have all of the words I'm using to search with. For example: q=(((mykeywords:Compliance+With+Conduct+Standards)OR(mykeywords:All)OR(mykeywords:ALL)))&start=0&indent=true&wt=json Should, with an exact match, return only one entry, but it returns five, some of which don't have any of the fields I've specified. I've tried this both with and without quotes. What could I be doing wrong? Thanks - Tod Tod, Without knowing your exact field definition, my first guess would be your first boolean query; because it is not quoted, what SOLR typically does is to transform that type of query into something like (assuming your uniqueKey is id): (mykeywords:Compliance id:With id:Conduct id:Standards). If you do (mykeywords:"Compliance+With+Conduct+Standards") you might see different (better?) results. Otherwise, append debugQuery=on to your URL and you can see exactly how SOLR is parsing your query. If none of that helps, what is your field definition in your schema.xml? 
- Ken The field definition is:
<field name="mykeywords" type="string" indexed="true" stored="true" multiValued="true"/>
The request:
select?q=(((mykeywords:Compliance+With+Attorney+Conduct+Standards)OR(mykeywords:All)OR(mykeywords:ALL)))&fl=mykeywords&start=0&indent=true&wt=json&debugQuery=on
The response looks like this:
"responseHeader":{
  "status":0,
  "QTime":8,
  "params":{
    "wt":"json",
    "q":"(((mykeywords:Compliance With Attorney Conduct Standards)OR(mykeywords:All)OR(mykeywords:ALL)))",
    "start":"0",
    "indent":"true",
    "fl":"mykeywords",
    "debugQuery":"on"}},
"response":{"numFound":6,"start":0,"docs":[
  {"mykeywords":["Compliance With Attorney Conduct Standards"]},
  {"mykeywords":["Anti-Bribery","Bribes"]},
  {"mykeywords":["Marketing Guidelines","Marketing"]},
  {},
  {"mykeywords":["Anti-Bribery","Due Diligence"]},
  {"mykeywords":["Anti-Bribery","AntiBribery"]}]},
"debug":{
  "rawquerystring":"(((mykeywords:Compliance With Attorney Conduct Standards)OR(mykeywords:All)OR(mykeywords:ALL)))",
  "querystring":"(((mykeywords:Compliance With Attorney Conduct Standards)OR(mykeywords:All)OR(mykeywords:ALL)))",
  "parsedquery":"(mykeywords:Compliance text:attorney text:conduct text:standard) mykeywords:All mykeywords:ALL",
  "parsedquery_toString":"(mykeywords:Compliance text:attorney text:conduct text:standard) mykeywords:All mykeywords:ALL",
  "explain":{ ...
As you mentioned, looking at the parsed query, it's breaking the request up on word boundaries rather than treating the entire phrase as one term. The goal is to return only the very first entry. Any ideas? Thanks - Tod
RE: Solr MySQL Adding new column to table
Hi Sivaprasad, first of all thanks for your kind response. I went through that link. If I use the dynamicField concept, I still need to alter the query in data-config.xml, right? thanks Nitin -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-MySQL-Adding-new-column-to-table-tp1826759p1826865.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Solr MySQL Adding new column to table
Not if you use 'SELECT * FROM person' Ephraim Ofir -Original Message- From: nitin.vanaku...@gmail.com [mailto:nitin.vanaku...@gmail.com] Sent: Tuesday, November 02, 2010 11:19 AM To: solr-user@lucene.apache.org Subject: RE: Solr MySQL Adding new column to table Hi Sivaprasad, first of all thanks for your kind response. i gone through that link, if i use the dynamicField concept,still i need to alter the query in data-config.xml right! thanks Nitin -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-MySQL-Adding-new-column-to-table -tp1826759p1826865.html Sent from the Solr - User mailing list archive at Nabble.com.
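As a sketch, the DataImportHandler entity would then look roughly like this (user/password are placeholders; the JDBC URL matches the one in the log later in this thread). SELECT * picks up newly added columns without editing the query, though each column still needs a matching — possibly dynamic — field in schema.xml to actually be indexed:

```
<dataConfig>
  <dataSource driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/example"
              user="..." password="..."/>
  <document>
    <!-- SELECT * means a new column like phoneNumber flows through
         with no change to this file -->
    <entity name="person" query="SELECT * FROM person"/>
  </document>
</dataConfig>
```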
Dynamically create new core
Hi, I have a requirement to dynamically create new (master) cores. Each core should have a replicated slave core. I am working with Java and using SolrJ as my Solr client. I came across the CoreAdminRequest class and it looks like the way to go. CoreAdminRequest.createCore("NewCore1", "NewCore1", solrServer); creates a new core programmatically. Also, for the newly created core, I want to use an existing solrconfig.xml and modify certain parameters. Can I achieve this using SolrJ? Are there any better approaches for the requirement? Thanks for any pointers,
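Under the hood, CoreAdminRequest talks to the CoreAdmin handler over HTTP, and the CREATE action also accepts config and schema parameters for pointing a new core at existing files. A minimal sketch that only builds the URL — the base URL assumes the usual example port, and actually sending the request (via SolrJ or any HTTP client) is omitted:

```java
import java.net.URLEncoder;

public class CoreAdminUrl {
    // Build the CoreAdmin CREATE call that CoreAdminRequest.createCore
    // issues over HTTP; sending it is left to SolrJ or an HTTP client.
    static String createCoreUrl(String solrBase, String name, String instanceDir) throws Exception {
        return solrBase + "/admin/cores?action=CREATE"
                + "&name=" + URLEncoder.encode(name, "UTF-8")
                + "&instanceDir=" + URLEncoder.encode(instanceDir, "UTF-8");
    }

    public static void main(String[] args) throws Exception {
        System.out.println(createCoreUrl("http://localhost:8983/solr", "NewCore1", "NewCore1"));
    }
}
```

Appending &config=... and &schema=... to point every new core at a shared solrconfig.xml is one route to reusing an existing configuration.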
RE: Solr MySQL Adding new column to table
OK, I have one more issue. I am getting the following exception; can you please elaborate on it?
INFO: Creating a connection for entity person with URL: jdbc:mysql://localhost:3306/example
Nov 2, 2010 3:34:11 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Time taken for getConnection(): 250
Nov 2, 2010 3:34:11 PM org.apache.solr.handler.dataimport.SolrWriter upload
WARNING: Error creating document : SolrInputDocument[{eage=eage(1.0)={28}, ename=ename(1.0)={shree}, eid=eid(1.0)={1}}]
org.apache.solr.common.SolrException: Document is missing uniqueKey field id
    at org.apache.solr.update.UpdateHandler.getIndexedId(UpdateHandler.java:115)
    at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:230)
    at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
    at org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:75)
    at org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:292)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:392)
    at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:242)
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:180)
    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:331)
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:389)
    at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:370)
Nov 2, 2010 3:34:11 PM org.apache.solr.handler.dataimport.SolrWriter upload
WARNING: Error creating document : SolrInputDocument[{eage=eage(1.0)={29}, ename=ename(1.0)={ramesh}, eid=eid(1.0)={2}}]
org.apache.solr.common.SolrException: Document is missing uniqueKey field id
    at org.apache.solr.update.UpdateHandler.getIndexedId(UpdateHandler.java:115)
    ... (same stack trace as above)
Nov 2, 2010 3:34:11 PM org.apache.solr.handler.dataimport.DocBuilder finish
INFO: Import completed successfully
Nov 2, 2010 3:34:11 PM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start commit(optimize=true,waitFlush=false,waitSearcher=true,expungeDeletes=false)
-- View this message in context: http://lucene.472066.n3.nabble.com/Solr-MySQL-Adding-new-column-to-table-tp1826759p1827093.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Solr MySQL Adding new column to table
Your uniqueKey field is defined as id (in schema.xml) and your query doesn't return an id field. Ephraim Ofir -Original Message- From: nitin.vanaku...@gmail.com [mailto:nitin.vanaku...@gmail.com] Sent: Tuesday, November 02, 2010 12:10 PM To: solr-user@lucene.apache.org Subject: RE: Solr MySQL Adding new column to table
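Given that diagnosis, one fix (a sketch; column names taken from the log above) is to alias the table's key column in the DIH query so it arrives under the schema's uniqueKey name. Alternatively, change `<uniqueKey>` in schema.xml to eid and declare that field.

```
<!-- data-config.xml: expose eid under the uniqueKey name "id" -->
<entity name="person" query="SELECT eid AS id, ename, eage FROM person"/>
```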
Re: How to use polish stemmer - Stempel - in schema.xml?
Thank you Bernd! I couldn't make it run though. Here is my problem:
1. There is a file ~/apache-solr-1.4.1/ifaq/lib/stempel-1.0.jar
2. In ~/apache-solr-1.4.1/ifaq/solr/conf/solrconfig.xml there is a directive: <lib path="../lib/stempel-1.0.jar" />
3. In ~/apache-solr-1.4.1/ifaq/solr/conf/schema.xml there is a fieldType:
(...)
<!-- Polish -->
<fieldType name="text_pl" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="org.getopt.stempel.lucene.StempelFilter" />
    <!-- <filter class="org.getopt.solr.analysis.StempelTokenFilterFactory" protected="protwords.txt" /> -->
  </analyzer>
</fieldType>
(...)
4. The jar file is loaded but I got an error:
SEVERE: Could not start SOLR. Check solr/home property
java.lang.NoClassDefFoundError: org/apache/lucene/analysis/TokenFilter
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:634)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
(...)
5. A different class gave me that one:
SEVERE: org.apache.solr.common.SolrException: Error loading class 'org.getopt.solr.analysis.StempelTokenFilterFactory'
    at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:375)
    at org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:390)
(...)
Question is: how to make <fieldType/> and <filter/> work with that Stempel? :) Cheers, Jakub Godawa.
2010/10/29 Bernd Fehling bernd.fehl...@uni-bielefeld.de: Hi Jakub, I have ported the KStemmer for use in the most recent Solr trunk version. My stemmer is located in the lib directory of Solr (solr/lib/KStemmer-2.00.jar) because it belongs to Solr. Write it as a FilterFactory and use it as a Filter like:
<filter class="de.ubbielefeld.solr.analysis.KStemFilterFactory" protected="protwords.txt" />
This is how my fieldType looks:
<fieldType name="text_kstem" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="false" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="de.ubbielefeld.solr.analysis.KStemFilterFactory" protected="protwords.txt" />
    <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="de.ubbielefeld.solr.analysis.KStemFilterFactory" protected="protwords.txt" />
    <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
  </analyzer>
</fieldType>
Regards, Bernd
Am 28.10.2010 14:56, schrieb Jakub Godawa: Hi! There is a Polish stemmer http://www.getopt.org/stempel/ and I have problems connecting it with Solr 1.4.1. Questions: 1. Where EXACTLY do I put the stempel-1.0.jar file? 2. How do I register the file, so I can build a fieldType like: <fieldType name="text_pl" class="solr.TextField"> <analyzer class="org.getopt.solr.analysis.StempelTokenFilterFactory"/> </fieldType> 3. Is that the right approach to make it work? Thanks for verbose explanation, Jakub.
Re: Phrase Query Problem?
That's not the response I get when I try your query, so I suspect something's not quite right with your test... But you could also try putting parentheses around the words, like mykeywords:(Compliance+With+Conduct+Standards) Best Erick On Tue, Nov 2, 2010 at 5:25 AM, Tod listac...@gmail.com wrote: I have a number of fields I need to do an exact match on. I've defined them as 'string' in my schema.xml. (...)
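Since the debug output in the thread shows the unquoted words falling into the default text field, quoting the phrase is what keeps it attached to mykeywords. A small stdlib sketch of building such a request URL (field and parameter names as in the thread; URLEncoder handles the quotes and spaces):

```java
import java.net.URLEncoder;

public class PhraseQueryUrl {
    public static void main(String[] args) throws Exception {
        // The double quotes make the query parser treat the words as one
        // phrase against mykeywords instead of scattering them to the
        // default search field.
        String q = "mykeywords:\"Compliance With Attorney Conduct Standards\" OR mykeywords:All";
        String url = "select?q=" + URLEncoder.encode(q, "UTF-8")
                + "&fl=mykeywords&wt=json&debugQuery=on";
        System.out.println(url);
    }
}
```

Against a string field, which is not tokenized, only the exact phrase value matches.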
how to get TermVectorComponent using xml , vs. SOLR-949
Hi all, This seems like a basic question: what's the best way to get TermVectorComponent results from the Solr XML response? SolrJ does not include TermVectorComponent in its API; the SOLR-949 patch adds this ability, but after 2 years it's still not in the mainline (and doesn't patch cleanly to the current 1.4 head). I'm new to Solr and familiar with SolrJ, but not with the best means for getting/parsing the raw XML. (Typically I find the DTD and write code to parse the DOM using the DTD. In this case I've seen a few examples, but nothing definitive.) Our team would rather use out-of-the-box Solr than manually apply patches and worry about consistency during upgrades... Thanks in advance, will
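Absent SOLR-949, parsing the raw XML with the JDK's DOM parser is one workable route. The fragment below is hand-made in the generic named-list (`<lst>`/`<int>`) shape Solr's XML writer uses; the exact nesting of a real termVectors section is an assumption here and should be checked against an actual response.

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class TermVectorParse {
    // Hand-made sample in Solr's generic named-list XML shape; verify the
    // real nesting of the termVectors section against your own response.
    static final String SAMPLE =
        "<lst name=\"termVectors\">" +
        "  <lst name=\"doc-0\">" +
        "    <lst name=\"includes\">" +
        "      <lst name=\"solr\"><int name=\"tf\">3</int></lst>" +
        "      <lst name=\"lucene\"><int name=\"tf\">1</int></lst>" +
        "    </lst>" +
        "  </lst>" +
        "</lst>";

    // Collect one "term tf=N" line per <int name="tf"> element.
    static String termFreqs(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
        StringBuilder out = new StringBuilder();
        NodeList ints = doc.getElementsByTagName("int");
        for (int i = 0; i < ints.getLength(); i++) {
            Element tf = (Element) ints.item(i);
            Element term = (Element) tf.getParentNode(); // enclosing <lst name="term">
            out.append(term.getAttribute("name")).append(" tf=")
               .append(tf.getTextContent()).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.print(termFreqs(SAMPLE));
    }
}
```

The same traversal works whether the response is fetched by hand or pulled out of SolrJ's raw response.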
Re: Disk usage per-field
Hi, I am currently benchmarking a Solr index with different fields to see the impact on its size, search speed, etc. A feature to find the disk usage per field of the index would be really handy and save me a lot of time. Do we have any updates on this? Has anyone tried writing custom code for it? - Muneeb -- View this message in context: http://lucene.472066.n3.nabble.com/Disk-usage-per-field-tp934765p1827739.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: How to use polish stemmer - Stempel - in schema.xml?
Erick, I've put the jar files like that before. I also added the directive and put the file in instanceDir/lib. What is still a problem is that even though the file is loaded: 2010-11-02 13:20:48 org.apache.solr.core.SolrResourceLoader replaceClassLoader INFO: Adding 'file:/home/jgodawa/apache-solr-1.4.1/ifaq/lib/stempel-1.0.jar' to classloader I am not able to use the FilterFactory... maybe I am attempting it in a wrong way? Cheers, Jakub Godawa. 2010/11/2 Erick Erickson erickerick...@gmail.com: The polish stemmer jar file needs to be findable by Solr; if you copy it to solr_home/lib and restart Solr you should be set. Alternatively, you can add another lib directive to the solrconfig.xml file (there are several examples in that file already). I'm a little confused about not being able to find TokenFilter, is that still a problem? HTH Erick On Tue, Nov 2, 2010 at 8:07 AM, Jakub Godawa jakub.god...@gmail.com wrote: Thank you Bernd! I couldn't make it run though. (...)
Re: How to use polish stemmer - Stempel - in schema.xml?
Hi Jakub, if you unzip your stempel-1.0.jar, do you have the required directory structure and file in there? org/getopt/stempel/lucene/StempelFilter.class Regards, Bernd Am 02.11.2010 13:54, schrieb Jakub Godawa: Erick, I've put the jar files like that before. I also added the directive and put the file in instanceDir/lib. (...)
Re: Problem with phrase matches in Solr
I will. Thanks Darren -Moazzam On Mon, Nov 1, 2010 at 1:15 PM, dar...@ontrenet.com wrote: Take a look at term proximity and phrase queries: http://wiki.apache.org/solr/SolrRelevancyCookbook Hey guys, I have a Solr index where I store information about experts from various fields. The thing is, when I search for channel marketing I get people that have the word channel or marketing in their data. I only want people who have that entire phrase in their bio. (I copy the contents of bio to the default search field, which is text.) How can I make sure that exact phrase matching works while the search is agile enough that partial searches match too (like uni matching university, etc. - this works, but not phrase matching)? I hope I was able to properly explain my problem. If not, please let me know. Thanks in advance, Moazzam
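One way to get both behaviors, sketched with standard Solr 1.4 factories (the field names here are illustrative): keep the tokenized text field for quoted phrase queries like "channel marketing", and copy the bio into an edge n-gram field so prefixes like uni still match university.

```
<fieldType name="text_prefix" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- index "university" as uni, univ, unive, ... so prefixes match -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="15"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
<field name="bio_prefix" type="text_prefix" indexed="true" stored="false"/>
<copyField source="bio" dest="bio_prefix"/>
```

Quoting the phrase in the query (e.g. q=text:"channel marketing") is what turns the whitespace-separated words into a single phrase match.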
Re: How to use polish stemmer - Stempel - in schema.xml?
This is what stempel-1.0.jar consists of after jar -xf:
jgod...@ubuntu:~/apache-solr-1.4.1/ifaq/lib$ ls -R org/
org/: egothor getopt
org/egothor: stemmer
org/egothor/stemmer: Cell.class Diff.class Gener.class MultiTrie2.class Optimizer2.class Reduce.class Row.class TestAll.class TestLoad.class Trie$StrEnum.class Compile.class DiffIt.class Lift.class MultiTrie.class Optimizer.class Reduce$Remap.class Stock.class Test.class Trie.class
org/getopt: stempel
org/getopt/stempel: Benchmark.class lucene Stemmer.class
org/getopt/stempel/lucene: StempelAnalyzer.class StempelFilter.class
jgod...@ubuntu:~/apache-solr-1.4.1/ifaq/lib$ ls -R META-INF/
META-INF/: MANIFEST.MF
jgod...@ubuntu:~/apache-solr-1.4.1/ifaq/lib$ ls -R res
res: tables
res/tables: readme.txt stemmer_1000.out stemmer_100.out stemmer_2000.out stemmer_200.out stemmer_500.out stemmer_700.out

2010/11/2 Bernd Fehling bernd.fehl...@uni-bielefeld.de:
Hi Jakub, if you unzip your stempel-1.0.jar, do you have the required directory structure and file in there? org/getopt/stempel/lucene/StempelFilter.class
Regards, Bernd

Am 02.11.2010 13:54, schrieb Jakub Godawa:
Erick, I've put the jar files like that before. I also added the directive and put the file in instanceDir/lib. What is still a problem is that even though the files are loaded:
2010-11-02 13:20:48 org.apache.solr.core.SolrResourceLoader replaceClassLoader
INFO: Adding 'file:/home/jgodawa/apache-solr-1.4.1/ifaq/lib/stempel-1.0.jar' to classloader
I am not able to use the FilterFactory... maybe I am attempting it in a wrong way?
Cheers, Jakub Godawa.

2010/11/2 Erick Erickson erickerick...@gmail.com:
The Polish stemmer jar file needs to be findable by Solr; if you copy it to solr_home/lib and restart Solr you should be set. Alternatively, you can add another lib directive to the solrconfig.xml file (there are several examples in that file already). I'm a little confused about not being able to find TokenFilter, is that still a problem?
HTH
Erick

On Tue, Nov 2, 2010 at 8:07 AM, Jakub Godawa jakub.god...@gmail.com wrote:
Thank you Bernd! I couldn't make it run though. Here is my problem:
1. There is a file ~/apache-solr-1.4.1/ifaq/lib/stempel-1.0.jar
2. In ~/apache-solr-1.4.1/ifaq/solr/conf/solrconfig.xml there is a directive:
<lib path="../lib/stempel-1.0.jar" />
3. In ~/apache-solr-1.4.1/ifaq/solr/conf/schema.xml there is a fieldType:
(...)
<!-- Polish -->
<fieldType name="text_pl" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="org.getopt.stempel.lucene.StempelFilter" />
    <!-- <filter class="org.getopt.solr.analysis.StempelTokenFilterFactory" protected="protwords.txt" /> -->
  </analyzer>
</fieldType>
(...)
4. The jar file is loaded, but I got an error:
SEVERE: Could not start SOLR. Check solr/home property
java.lang.NoClassDefFoundError: org/apache/lucene/analysis/TokenFilter
 at java.lang.ClassLoader.defineClass1(Native Method)
 at java.lang.ClassLoader.defineClass(ClassLoader.java:634)
 at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
(...)
5. A different class gave me this one:
SEVERE: org.apache.solr.common.SolrException: Error loading class 'org.getopt.solr.analysis.StempelTokenFilterFactory'
 at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:375)
 at org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:390)
(...)
The question is: how do I make <fieldType /> and <filter /> work with Stempel? :)
Cheers, Jakub Godawa.

2010/10/29 Bernd Fehling bernd.fehl...@uni-bielefeld.de:
Hi Jakub, I have ported the KStemmer for use in the most recent Solr trunk version. My stemmer is located in the lib directory of Solr (solr/lib/KStemmer-2.00.jar) because it belongs to Solr.
Re: Phrase Query Problem?
On Tue, Nov 2, 2010 at 8:19 AM, Erick Erickson erickerick...@gmail.comwrote: That's not the response I get when I try your query, so I suspect something's not quite right with your test... But you could also try putting parentheses around the words, like mykeywords:(Compliance+With+Conduct+Standards) Best Erick I agree with Erick, your query string showed quotes, but your parsed query did not. Using quotes, or parenthesis, would pretty much leave your query alone. There is one exception that I've found: if you use a stopword analyzer, any stop words would be converted to ? in the parsed query. So if you absolutely need every single word to match, regardless, you cannot use a field type that uses the stop word analyzer. For example, I have two dynamic field definitions: df_text_* that does the default text transformations (including stop words), and df_text_exact_* that does nothing (field type is string). When I run the query df_text_exact_company_name:Bank of America OR df_text_company_name:Bank of America, the following is shown as my query/parsed query when debugQuery is on: str name=rawquerystring df_text_exact_company_name:Bank of America OR df_text_company_name:Bank of America /str str name=querystring df_text_exact_company_name:Bank of America OR df_text_company_name:Bank of America /str str name=parsedquery df_text_exact_company_name:Bank of America PhraseQuery(df_text_company_name:bank ? america) /str str name=parsedquery_toString df_text_exact_company_name:Bank of America df_text_company_name:bank ? america /str The difference is subtle, but important. If I were to do df_text_company_name:Bank and America, I would still match Bank of America. These are things that you should keep in mind when you are creating fields for your indices. A useful tool for seeing what SOLR does to your query terms is the Analysis tool found in the admin panel. 
You can do an analysis on either a specific field, or by a field type, and you will see a breakdown by Analyzer for either the index, query, or both of any query that you put in. This would definitely be useful when trying to determine why SOLR might return what it does. - Ken
Re: How to use polish stemmer - Stempel - in schema.xml?
So you call org.getopt.solr.analysis.StempelTokenFilterFactory. In this case I would assume a file StempelTokenFilterFactory.class in your directory org/getopt/solr/analysis/, and a class which extends BaseTokenFilterFactory, right?
...
public class StempelTokenFilterFactory extends BaseTokenFilterFactory implements ResourceLoaderAware {
...

Am 02.11.2010 14:20, schrieb Jakub Godawa:
This is what stempel-1.0.jar consists of after jar -xf:
jgod...@ubuntu:~/apache-solr-1.4.1/ifaq/lib$ ls -R org/
org/: egothor getopt
org/egothor: stemmer
org/egothor/stemmer: Cell.class Diff.class Gener.class MultiTrie2.class Optimizer2.class Reduce.class Row.class TestAll.class TestLoad.class Trie$StrEnum.class Compile.class DiffIt.class Lift.class MultiTrie.class Optimizer.class Reduce$Remap.class Stock.class Test.class Trie.class
org/getopt: stempel
org/getopt/stempel: Benchmark.class lucene Stemmer.class
org/getopt/stempel/lucene: StempelAnalyzer.class StempelFilter.class
jgod...@ubuntu:~/apache-solr-1.4.1/ifaq/lib$ ls -R META-INF/
META-INF/: MANIFEST.MF
jgod...@ubuntu:~/apache-solr-1.4.1/ifaq/lib$ ls -R res
res: tables
res/tables: readme.txt stemmer_1000.out stemmer_100.out stemmer_2000.out stemmer_200.out stemmer_500.out stemmer_700.out

2010/11/2 Bernd Fehling bernd.fehl...@uni-bielefeld.de:
Hi Jakub, if you unzip your stempel-1.0.jar, do you have the required directory structure and file in there? org/getopt/stempel/lucene/StempelFilter.class
Regards, Bernd

Am 02.11.2010 13:54, schrieb Jakub Godawa:
Erick, I've put the jar files like that before. I also added the directive and put the file in instanceDir/lib. What is still a problem is that even though the files are loaded:
2010-11-02 13:20:48 org.apache.solr.core.SolrResourceLoader replaceClassLoader
INFO: Adding 'file:/home/jgodawa/apache-solr-1.4.1/ifaq/lib/stempel-1.0.jar' to classloader
I am not able to use the FilterFactory... maybe I am attempting it in a wrong way?
Cheers, Jakub Godawa.
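For readers following this thread: Bernd's point above is that a filter referenced from schema.xml must be a factory class, while stempel-1.0.jar only ships the raw Lucene filter (org.getopt.stempel.lucene.StempelFilter). A minimal, untested sketch of such a wrapper is below; the package and class name mirror the one Jakub tried, the exact StempelFilter constructor arguments must be checked against the stempel sources, and the Solr/Lucene jars must be on the compile classpath.

```java
package org.getopt.solr.analysis;

import org.apache.lucene.analysis.TokenStream;
import org.apache.solr.analysis.BaseTokenFilterFactory;
import org.getopt.stempel.lucene.StempelFilter;

public class StempelTokenFilterFactory extends BaseTokenFilterFactory {
    // Wrap the raw Lucene filter so Solr can instantiate it from schema.xml.
    // NOTE: verify StempelFilter's real constructor signature in the stempel
    // sources; it may require a Stemmer instance and/or a minimum term length.
    public TokenStream create(TokenStream input) {
        return new StempelFilter(input);
    }
}
```

Compiled into its own jar and placed next to stempel-1.0.jar in the Solr lib directory, a class like this could then be referenced from the schema as a filter factory, which is what the commented-out line in Jakub's fieldType attempts.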
Re: How to use polish stemmer - Stempel - in schema.xml?
Sorry, I am not Java programmer at all. I would appreciate more verbose (or step by step) help. 2010/11/2 Bernd Fehling bernd.fehl...@uni-bielefeld.de: So you call org.getopt.solr.analysis.StempelTokenFilterFactory. In this case I would assume a file StempelTokenFilterFactory.class in your directory org/getopt/solr/analysis/. And a class which extends the BaseTokenFilterFactory rigth? ... public class StempelTokenFilterFactory extends BaseTokenFilterFactory implements ResourceLoaderAware { ... Am 02.11.2010 14:20, schrieb Jakub Godawa: This is what stempel-1.0.jar consist of after jar -xf: jgod...@ubuntu:~/apache-solr-1.4.1/ifaq/lib$ ls -R org/ org/: egothor getopt org/egothor: stemmer org/egothor/stemmer: Cell.class Diff.class Gener.class MultiTrie2.class Optimizer2.class Reduce.class Row.class TestAll.class TestLoad.class Trie$StrEnum.class Compile.class DiffIt.class Lift.class MultiTrie.class Optimizer.class Reduce$Remap.class Stock.class Test.class Trie.class org/getopt: stempel org/getopt/stempel: Benchmark.class lucene Stemmer.class org/getopt/stempel/lucene: StempelAnalyzer.class StempelFilter.class jgod...@ubuntu:~/apache-solr-1.4.1/ifaq/lib$ ls -R META-INF/ META-INF/: MANIFEST.MF jgod...@ubuntu:~/apache-solr-1.4.1/ifaq/lib$ ls -R res res: tables res/tables: readme.txt stemmer_1000.out stemmer_100.out stemmer_2000.out stemmer_200.out stemmer_500.out stemmer_700.out 2010/11/2 Bernd Fehling bernd.fehl...@uni-bielefeld.de: Hi Jakub, if you unzip your stempel-1.0.jar do you have the required directory structure and file in there? org/getopt/stempel/lucene/StempelFilter.class Regards, Bernd Am 02.11.2010 13:54, schrieb Jakub Godawa: Erick I've put the jar files like that before. 
Highlighting and maxBooleanClauses limit
By default, the solrconfig.xml has maxBooleanClauses set to 1024, which in my opinion should be more than enough clauses in general. Recently, we have been noticing errors in our Catalina log:
SEVERE: org.apache.lucene.search.BooleanQuery$TooManyClauses: maxClauseCount is set to 2048.
As a temporary (and quick) workaround, we tried to increase maxBooleanClauses to 2048, but are still experiencing problems hitting the limit. The full error (including the query run before the error) is:
INFO: [bizjournals] webapp=/solr path=/select/ params={facet=true&sort=df_date_published+asc&hl=true&version=2.2&facet.field=facet_type&facet.field=facet_author&facet.field=facet_arr_industries&fq=df_date_published:[*+TO+NOW]&hl.requireFieldMatch=true&hl.fragsize=75&facet.mincount=1&indent=on&hl.fl=df_text_content&wt=xml&rows=25&hl.snippets=2&hl.maxAlternateFieldLength=150&start=0&q=(df_text_blog_name:farm+bill)+OR+((df_text_headline:[*+TO+*]+AND+df_date_published:[*+TO+NOW])+AND+((df_text_author:farm+bill)+OR+(df_text_content:farm+bill)+OR+(df_text_headline:farm+bill)+OR+(df_text_blog_name:farm+bill)))&hl.alternateField=df_text_content&hl.usePhraseHighlighter=true} hits=269 status=500 QTime=729
Nov 2, 2010 4:10:09 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.lucene.search.BooleanQuery$TooManyClauses: maxClauseCount is set to 2048 at org.apache.lucene.search.BooleanQuery.add(BooleanQuery.java:153) at org.apache.lucene.search.BooleanQuery.add(BooleanQuery.java:144) at org.apache.lucene.search.MultiTermQuery$ScoringBooleanQueryRewrite.rewrite(MultiTermQuery.java:110) at org.apache.lucene.search.MultiTermQuery.rewrite(MultiTermQuery.java:382) at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extract(WeightedSpanTermExtractor.java:178) at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extract(WeightedSpanTermExtractor.java:111) at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extract(WeightedSpanTermExtractor.java:111) at
org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extract(WeightedSpanTermExtractor.java:111) at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.getWeightedSpanTerms(WeightedSpanTermExtractor.java:414) at org.apache.lucene.search.highlight.QueryScorer.initExtractor(QueryScorer.java:216) at org.apache.lucene.search.highlight.QueryScorer.init(QueryScorer.java:184) at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:226) at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:335) at org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:89) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:852) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588) at 
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489) at java.lang.Thread.run(Thread.java:619) I've noticed in the stack trace that this exception occurs when trying to build the query for the highlighting; I've confirmed this by copying the params and changing hl=true to hl=false. Unfortunately, when using debugQuery=on, I do not see any details on what is going on with the highlighting portion of the query (after artificially increasing the maxBooleanClauses so the query will run). With all of that said, my question(s) to the list are: Is there a way to determine how exactly the highlighter is building its query (i.e., some sort of highlighting debug setting)? Is the behavior of highlighting in SOLR intended to be held to the same restrictions (maxBooleanClauses) as the query
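For reference, the limit discussed in this thread is raised in solrconfig.xml; the value 2048 here simply mirrors the one from the log, and the element lives inside the query section:

```xml
<!-- solrconfig.xml: raise the global BooleanQuery clause limit -->
<query>
  <maxBooleanClauses>2048</maxBooleanClauses>
</query>
```

Because this setting backs a static Lucene variable, the highest value configured across cores is the one that takes effect process-wide.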
Slave replication with custom dataDir
Hey guys, I have 2 instances of Solr running, one as a master, one as a slave. Both have <dataDir>/var/lib/solr/data</dataDir>. The master works fine; the slave dies with a huge set of stack traces. The Solr wiki says that replication must match the dataDir if it's custom, but how do I actually set that?
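For anyone searching the archives with the same question: the data directory is set in solrconfig.xml (the path below is the one from this message):

```xml
<!-- solrconfig.xml: custom index data directory -->
<dataDir>/var/lib/solr/data</dataDir>
```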
Re: Slave replication with custom dataDir
This is a log dump; please be aware that this only appears in my log if I have the following enabled in config:
<dataDir>/var/lib/solr/data</dataDir>
... snip ...
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://10.1.2.196:8080/solr/replication</str>
    <str name="pollInterval">00:00:20</str>
  </lst>
</requestHandler>
Log output:
03/11/2010 1:23:47 AM org.apache.solr.servlet.SolrDispatchFilter init SEVERE: Could not start SOLR. Check solr/home property java.lang.NoClassDefFoundError: org/apache/commons/logging/LogFactory at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.<clinit>(MultiThreadedHttpConnectionManager.java:70) at org.apache.solr.handler.SnapPuller.createHttpClient(SnapPuller.java:110) at org.apache.solr.handler.SnapPuller.init(SnapPuller.java:138) at org.apache.solr.handler.ReplicationHandler.inform(ReplicationHandler.java:775) at org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:486) at org.apache.solr.core.SolrCore.init(SolrCore.java:589) at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:137) at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83) at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:295) at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:422) at org.apache.catalina.core.ApplicationFilterConfig.init(ApplicationFilterConfig.java:115) at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3838) at org.apache.catalina.core.StandardContext.start(StandardContext.java:4488) at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:791) at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:771) at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:526) at org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:637) at
org.apache.catalina.startup.HostConfig.deployDescriptors(HostConfig.java:563) at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:498) at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1277) at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:321) at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:119) at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1053) at org.apache.catalina.core.StandardHost.start(StandardHost.java:722) at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1045) at org.apache.catalina.core.StandardEngine.start(StandardEngine.java:443) at org.apache.catalina.core.StandardService.start(StandardService.java:516) at org.apache.catalina.core.StandardServer.start(StandardServer.java:710) at org.apache.catalina.startup.Catalina.start(Catalina.java:593) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:616) at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:289) at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:414) Caused by: java.lang.ClassNotFoundException: org.apache.commons.logging.LogFactory at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1484) at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1329) ... 
35 more 03/11/2010 1:23:47 AM org.apache.solr.core.QuerySenderListener newSearcher INFO: QuerySenderListener sending requests to searc...@689e8c34 main 03/11/2010 1:23:47 AM org.apache.solr.common.SolrException log SEVERE: java.lang.NoClassDefFoundError: org/apache/commons/logging/LogFactory at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.clinit(MultiThreadedHttpConnectionManager.java:70) at org.apache.solr.handler.SnapPuller.createHttpClient(SnapPuller.java:110) at org.apache.solr.handler.SnapPuller.init(SnapPuller.java:138) at org.apache.solr.handler.ReplicationHandler.inform(ReplicationHandler.java:775) at org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:486) at org.apache.solr.core.SolrCore.init(SolrCore.java:589) at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:137) at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83) at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:295) at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:422) at
Query question
I can't seem to find the right formula for this. I have a need to build a query where one of the fields should boost the score, but not affect the query if there isn't a match. For example, if I have documents with restaurants, name, address, cuisine, description, etc. I want to search on, say, Romantic AND View AND city:Chicago if city is in fact Chicago it should score higher, but if city is not Chicago (or even if it's missing the city field), but matches the other query parameters it should still come back in the results. Is something like this possible? It's kind of like q=(some query) optional boost if field:value. Thanks, Ken -- View this message in context: http://lucene.472066.n3.nabble.com/Query-question-tp1828367p1828367.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Query question
I think you'll find the dismax request handler helpful in general, it supports more flexible query wrangling like that. With the dismax request handler, I think the bq (boost query) parameter will do what you need, eg: bq=city:Chicago^5.0 The ^5.0 is how much boost you want, you can play around with it to see what works well for your use cases. http://wiki.apache.org/solr/DisMaxQParserPlugin kenf_nc wrote: I can't seem to find the right formula for this. I have a need to build a query where one of the fields should boost the score, but not affect the query if there isn't a match. For example, if I have documents with restaurants, name, address, cuisine, description, etc. I want to search on, say, Romantic AND View AND city:Chicago if city is in fact Chicago it should score higher, but if city is not Chicago (or even if it's missing the city field), but matches the other query parameters it should still come back in the results. Is something like this possible? It's kind of like q=(some query) optional boost if field:value. Thanks, Ken
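Putting the pieces of the reply above together, a request along these lines (the field names are illustrative, not from the original schema) keeps city:Chicago out of the required clauses while still boosting documents that match it:

```
q=Romantic View
&defType=dismax
&qf=name description cuisine
&bq=city:Chicago^5.0
```

With dismax, q supplies the required terms searched across the qf fields, while bq only contributes to the score, which is exactly the "optional boost" behavior the question asks for.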
Re: Phrase Query Problem?
On 11/2/2010 9:21 AM, Ken Stanley wrote: On Tue, Nov 2, 2010 at 8:19 AM, Erick Ericksonerickerick...@gmail.comwrote: That's not the response I get when I try your query, so I suspect something's not quite right with your test... But you could also try putting parentheses around the words, like mykeywords:(Compliance+With+Conduct+Standards) Best Erick I agree with Erick, your query string showed quotes, but your parsed query did not. Using quotes, or parenthesis, would pretty much leave your query alone. There is one exception that I've found: if you use a stopword analyzer, any stop words would be converted to ? in the parsed query. So if you absolutely need every single word to match, regardless, you cannot use a field type that uses the stop word analyzer. For example, I have two dynamic field definitions: df_text_* that does the default text transformations (including stop words), and df_text_exact_* that does nothing (field type is string). When I run the query df_text_exact_company_name:Bank of America OR df_text_company_name:Bank of America, the following is shown as my query/parsed query when debugQuery is on: str name=rawquerystring df_text_exact_company_name:Bank of America OR df_text_company_name:Bank of America /str str name=querystring df_text_exact_company_name:Bank of America OR df_text_company_name:Bank of America /str str name=parsedquery df_text_exact_company_name:Bank of America PhraseQuery(df_text_company_name:bank ? america) /str str name=parsedquery_toString df_text_exact_company_name:Bank of America df_text_company_name:bank ? america /str The difference is subtle, but important. If I were to do df_text_company_name:Bank and America, I would still match Bank of America. These are things that you should keep in mind when you are creating fields for your indices. A useful tool for seeing what SOLR does to your query terms is the Analysis tool found in the admin panel. 
You can do an analysis on either a specific field, or by a field type, and you will see a breakdown by Analyzer for either the index, query, or both of any query that you put in. This would definitely be useful when trying to determine why SOLR might return what it does. - Ken What it turned out to be was escaping the spaces. q=(((mykeywords:Compliance+With+Conduct+Standards)OR(mykeywords:All)OR(mykeywords:ALL))) became q=(((mykeywords:Compliance\+With\+Conduct\+Standards)OR(mykeywords:All)OR(mykeywords:ALL))) If I tried q=(((mykeywords:Compliance+With+Conduct+Standards)OR(mykeywords:All)OR(mykeywords:ALL))) ... it didn't work. Once I removed the quotes and escaped spaces it worked as expected. This seems odd since I would have expected the quotes to have triggered a phrase query. Thanks for your help. - Tod
Re: Query question
Do you want something like (Romantic AND View) OR city:Chicago^10? Best Erick On Tue, Nov 2, 2010 at 10:45 AM, kenf_nc ken.fos...@realestate.com wrote: I can't seem to find the right formula for this. I have a need to build a query where one of the fields should boost the score, but not affect the query if there isn't a match. For example, if I have documents with restaurants, name, address, cuisine, description, etc. I want to search on, say, Romantic AND View AND city:Chicago if city is in fact Chicago it should score higher, but if city is not Chicago (or even if it's missing the city field), but matches the other query parameters it should still come back in the results. Is something like this possible? It's kind of like q=(some query) optional boost if field:value. Thanks, Ken -- View this message in context: http://lucene.472066.n3.nabble.com/Query-question-tp1828367p1828367.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Dynamically create new core
To create the core, the folder with the confs must already exist and has to be placed in the proper place (inside the Solr home). Once you run the create-core action, the core will be added to solr.xml and dynamically loaded. -- View this message in context: http://lucene.472066.n3.nabble.com/Dynamically-create-new-core-tp1827097p1828560.html Sent from the Solr - User mailing list archive at Nabble.com.
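A sketch of the create-core call described above, assuming the default example port and core name; the instance directory must already exist under the Solr home with a conf/ folder inside it:

```shell
# CoreAdmin CREATE action: "newcore" is a placeholder name
curl 'http://localhost:8983/solr/admin/cores?action=CREATE&name=newcore&instanceDir=newcore'
```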
Re: Highlighting and maxBooleanClauses limit
(10/11/02 23:14), Ken Stanley wrote: I've noticed in the stack trace that this exception occurs when trying to build the query for the highlighting; I've confirmed this by copying the params and changing hl=true to hl=false. Unfortunately, when using debugQuery=on, I do not see any details on what is going on with the highlighting portion of the query (after artificially increasing maxBooleanClauses so the query will run). With all of that said, my question(s) to the list are: Is there a way to determine how exactly the highlighter is building its query (i.e., some sort of highlighting debug setting)? Basically, I think the highlighter uses the main query but tries to rewrite it before highlighting. Is the behavior of highlighting in SOLR intended to be held to the same restrictions (maxBooleanClauses) as the query parser (even though the highlighting query is built internally)? I think so, because maxBooleanClauses is a static variable. I saw your stack trace and glanced at the highlighter source; my assumption is that the highlighter tried to rewrite (expand) your range queries into a boolean query, even though you set requireFieldMatch to true. Can you try to query without the range query? If the problem goes away, I think it is a highlighter bug. The highlighter should skip the range query when the user sets requireFieldMatch to true, because your range query is on another field. If so, please open a JIRA issue. Koji -- http://www.rondhuit.com/en/
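For anyone hitting this limit: maxBooleanClauses is configured in solrconfig.xml under the query section, and because it is a static Lucene setting it applies JVM-wide (the largest value among loaded cores wins). A sketch of the stock entry:

```xml
<query>
  <!-- Maximum number of clauses allowed in a BooleanQuery; static, so JVM-wide -->
  <maxBooleanClauses>1024</maxBooleanClauses>
</query>
```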
Re: Phrase Query Problem?
Indeed, something doesn't seem right about that; quotes are for phrases, you are right, and I get confused even thinking about what happens when you try to escape spaces like that. I think there's something odd going on with your URI-escaping in general. Here's what the string should actually look like for mykeywords:"Compliance With Conduct Standards", when put into a URI: mykeywords%3A%22Compliance+With+Conduct+Standards%22 You really ought to escape the colon and the double quotes too, to follow the URI spec. If you weren't escaping the double quotes, that could explain your issue. And I seriously don't understand what putting a backslash in the URI accomplishes in this case; it confuses me trying to understand what's going on there, and personally I never like it when I just try random things until something I don't understand works. Tod wrote: On 11/2/2010 9:21 AM, Ken Stanley wrote: On Tue, Nov 2, 2010 at 8:19 AM, Erick Erickson erickerick...@gmail.com wrote: That's not the response I get when I try your query, so I suspect something's not quite right with your test... But you could also try putting parentheses around the words, like mykeywords:(Compliance+With+Conduct+Standards) Best Erick I agree with Erick, your query string showed quotes, but your parsed query did not. Using quotes, or parentheses, would pretty much leave your query alone. There is one exception that I've found: if you use a stopword analyzer, any stop words would be converted to ? in the parsed query. So if you absolutely need every single word to match, regardless, you cannot use a field type that uses the stop word analyzer. For example, I have two dynamic field definitions: df_text_* that does the default text transformations (including stop words), and df_text_exact_* that does nothing (field type is string).
When I run the query df_text_exact_company_name:"Bank of America" OR df_text_company_name:"Bank of America", the following is shown as my query/parsed query when debugQuery is on: <str name="rawquerystring">df_text_exact_company_name:"Bank of America" OR df_text_company_name:"Bank of America"</str> <str name="querystring">df_text_exact_company_name:"Bank of America" OR df_text_company_name:"Bank of America"</str> <str name="parsedquery">df_text_exact_company_name:Bank of America PhraseQuery(df_text_company_name:"bank ? america")</str> <str name="parsedquery_toString">df_text_exact_company_name:Bank of America df_text_company_name:"bank ? america"</str> The difference is subtle, but important. If I were to do df_text_company_name:"Bank and America", I would still match "Bank of America". These are things that you should keep in mind when you are creating fields for your indices. A useful tool for seeing what SOLR does to your query terms is the Analysis tool found in the admin panel. You can do an analysis on either a specific field, or by a field type, and you will see a breakdown by Analyzer for either the index, query, or both of any query that you put in. This would definitely be useful when trying to determine why SOLR might return what it does. - Ken What it turned out to be was escaping the spaces. q=(((mykeywords:Compliance+With+Conduct+Standards)OR(mykeywords:All)OR(mykeywords:ALL))) became q=(((mykeywords:Compliance\+With\+Conduct\+Standards)OR(mykeywords:All)OR(mykeywords:ALL))) If I tried q=(((mykeywords:"Compliance+With+Conduct+Standards")OR(mykeywords:All)OR(mykeywords:ALL))) ... it didn't work. Once I removed the quotes and escaped the spaces it worked as expected. This seems odd since I would have expected the quotes to have triggered a phrase query. Thanks for your help. - Tod
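As an editorial aside, the confusion above usually comes from hand-building the URI. A small illustration (Python is used here purely to demonstrate the encoding; it is not part of the original thread) of what a quoted phrase query should look like once percent-encoded:

```python
from urllib.parse import quote_plus

# A phrase query as the Solr query parser should receive it after decoding.
q = 'mykeywords:"Compliance With Conduct Standards"'

# quote_plus percent-encodes the colon and double quotes and turns spaces
# into '+', which the servlet container decodes back on the Solr side.
encoded = quote_plus(q)
print(encoded)  # mykeywords%3A%22Compliance+With+Conduct+Standards%22
```

Backslash-escaping spaces inside the URI, by contrast, sends literal backslashes to the query parser, which is why the results seemed surprising.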
Re: Query question
Jonathan, Dismax is something I've been meaning to look into, and bq does seem to fit the bill, although I'm worried about this line in the wiki: "TODO: That latter part is deprecated behavior but still works. It can be problematic so avoid it." It still seems to be the closest to what I want, however, so I'll play with it. Erick, that query would return all restaurants in Chicago, whether they matched Romantic View or not. Although the scores should sort relevant results to the top, the results would still contain a lot of things I wasn't interested in. -- View this message in context: http://lucene.472066.n3.nabble.com/Query-question-tp1828367p1828639.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Query question
Don't worry about that line. It just means that one particular kind of 'default' behavior in bq shouldn't be relied upon, if you don't entirely understand that behavior they're saying is deprecated (as I don't either!) anyway, don't worry about it, just supply an explicit boost in your bq. bq isn't going anywhere, it is a stable and well-used part of dismax. kenf_nc wrote: Jonathan, Dismax is something I've been meaning to look into, and bq does seem to fit the bill, although I'm worried about this line in the wiki :TODO: That latter part is deprecated behavior but still works. It can be problematic so avoid it. It still seems to be the closest to what I want however so I'll play with it. Erick, that query would return all restaurants in Chicago, whether they matched Romantic View or not. Although the scores should sort relevant results to the top, the results would still contain a lot of things I wasn't interested in.
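For concreteness, a hedged sketch of the dismax request being discussed (the field names come from Ken's restaurant example; the boost value is arbitrary):

```text
q=Romantic View
&defType=dismax
&qf=name description cuisine
&bq=city:Chicago^10
```

Documents matching Romantic and View still come back regardless of city; those whose city field is Chicago simply score higher via the explicit boost on bq.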
Re: Highlighting and maxBooleanClauses limit
Hmm, I'm not sure it's the highlighter alone. Depending on the query it can also get triggered by the spellcheck component. See below what happens with maxBooleanClauses = 16.

HTTP ERROR: 500 maxClauseCount is set to 16
org.apache.lucene.search.BooleanQuery$TooManyClauses: maxClauseCount is set to 16
    at org.apache.lucene.search.BooleanQuery.add(BooleanQuery.java:153)
    at org.apache.lucene.search.spell.SpellChecker.add(SpellChecker.java:329)
    at org.apache.lucene.search.spell.SpellChecker.suggestSimilar(SpellChecker.java:260)
    at org.apache.solr.spelling.AbstractLuceneSpellChecker.getSuggestions(AbstractLuceneSpellChecker.java:140)
    at org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:140)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
    at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
    at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
    at org.mortbay.jetty.Server.handle(Server.java:285)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
    at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
    at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
    at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)

On Tuesday 02 November 2010 16:26:00 Koji Sekiguchi wrote: (10/11/02 23:14), Ken Stanley wrote: I've noticed in the stack trace that this exception occurs when trying to build the query for the highlighting; I've confirmed this by copying the params and changing hl=true to hl=false. Unfortunately, when using debugQuery=on, I do not see any details on what is going on with the highlighting portion of the query (after artificially increasing the maxBooleanClauses so the query will run). With all of that said, my question(s) to the list are: Is there a way to determine how exactly the highlighter is building its query (i.e., some sort of highlighting debug setting)? Basically I think highlighter uses main query, but try to rewrite it before highlighting. Is the behavior of highlighting in SOLR intended to be held to the same restrictions (maxBooleanClauses) as the query parser (even though the highlighting query is built internally)? I think so because maxBooleanClauses is a static variable. I saw your stack trace and glance at highlighter source, my assumption is - highlighter tried to rewrite (expand) your range queries to boolean query, even if you set requireFieldMatch to true. Can you try to query without the range query? If the problem goes away, I think it is highlighter bug. Highlighter should skip the range query when user set requireFieldMatch to true, because your range query is for another field. If so, please open a jira issue.
Koji -- Markus Jelsma - CTO - Openindex http://www.linkedin.com/in/markus17 050-8536600 / 06-50258350
Re: Highlighting and maxBooleanClauses limit
On Tue, Nov 2, 2010 at 11:26 AM, Koji Sekiguchi k...@r.email.ne.jp wrote: (10/11/02 23:14), Ken Stanley wrote: I've noticed in the stack trace that this exception occurs when trying to build the query for the highlighting; I've confirmed this by copying the params and changing hl=true to hl=false. Unfortunately, when using debugQuery=on, I do not see any details on what is going on with the highlighting portion of the query (after artificially increasing the maxBooleanClauses so the query will run). With all of that said, my question(s) to the list are: Is there a way to determine how exactly the highlighter is building its query (i.e., some sort of highlighting debug setting)? Basically I think highlighter uses main query, but try to rewrite it before highlighting. Is the behavior of highlighting in SOLR intended to be held to the same restrictions (maxBooleanClauses) as the query parser (even though the highlighting query is built internally)? I think so because maxBooleanClauses is a static variable. I saw your stack trace and glance at highlighter source, my assumption is - highlighter tried to rewrite (expand) your range queries to boolean query, even if you set requireFieldMatch to true. Can you try to query without the range query? If the problem goes away, I think it is highlighter bug. Highlighter should skip the range query when user set requireFieldMatch to true, because your range query is for another field. If so, please open a jira issue. Koji -- http://www.rondhuit.com/en/ Koji, that is most excellent. Thank you for pointing out that the range queries were causing the highlighter to exceed the maxBooleanClauses. Once I removed them from my main query (and moved them into separate filter queries), SOLR and highlighting worked as I expected them to work. Per your suggestion, I have opened a JIRA ticket (SOLR-2216) for this problem. 
I am somewhat of a novice at Java, and I have not yet had the pleasure of getting the SOLR sources into my working environment, but I would be more than eager to assist in finding a solution - with maybe some mentoring from a more experienced developer. Anyway, thank you again; I am very excited to have a suitable workaround for the time being. - Ken Stanley
IndexableBinaryStringTools (was FieldCache)
Hi, [...] I tried to use IndexableBinaryStringTools to re-encode my 11-byte array. The size increased to 7 characters (= 14 bytes), which is still a gain of more than 50 percent compared to the UTF-8 encoding. BTW: I found no sample of how to use the IndexableBinaryStringTools class except in the unit tests. IndexableBinaryStringTools will eventually be deprecated and then dropped, in favor of native indexable/searchable binary terms. More work is required before these are possible, though. Well-maintained unit tests are not a bad way to describe functionality... Sure, but there is no unit test for Solr. I assume that the char[] returned from IndexableBinaryStringTools.encode is encoded in UTF-8 again and then stored. At some point the information is lost and cannot be recovered. Can you give an example? This should not happen. It's hard to give example output, because the binary string representation contains unprintable characters. I'll try to explain what I'm doing. My character array returned by IndexableBinaryStringTools.encode looks like the following: char[] encoded = new char[] {0, 8508, 3392, 64, 0, 8, 0, 0}; Then I add it to a SolrInputDocument: SolrInputDocument doc = new SolrInputDocument(); doc.addField("id", new String(encoded)); If I now print the SolrInputDocument using System.out.println(doc), the String representation of the character array is correct. Then I add it to a RAMDirectory: ArrayList<SolrInputDocument> docs = new ArrayList<SolrInputDocument>(); docs.add(doc); solrServer.add(docs); solrServer.commit(); ... and immediately retrieve it as follows: SolrQuery query = new SolrQuery(); query.setQuery("*:*"); QueryResponse rsp = solrServer.query(query); SolrDocumentList docList = rsp.getResults(); System.out.println(docList); Now the string representation of the SolrDocument's ID looks different from that of the SolrInputDocument.
If I do not create a new String in doc.addField, just the string representation of the array address will be added to the SolrInputDocument. BTW: I've tested it with EmbeddedSolrServer and Solr/Lucene trunk. Why has the string representation changed? From the changed string I cannot decode the correct ID. -- Kind regards, Mathias
Re: Possible memory leaks with frequent replication
On Mon, Nov 01, 2010 at 05:42:51PM -0700, Lance Norskog said: You should query against the indexer. I'm impressed that you got 5s replication to work reliably. That's our current solution - I was just wondering if there was anything I was missing. Thanks!
Re: Possible memory leaks with frequent replication
On Tue, Nov 2, 2010 at 12:32 PM, Simon Wistow si...@thegestalt.org wrote: On Mon, Nov 01, 2010 at 05:42:51PM -0700, Lance Norskog said: You should query against the indexer. I'm impressed that you got 5s replication to work reliably. That's our current solution - I was just wondering if there was anything I was missing. You could also try dialing down maxWarmingSearchers to 1 - that should prevent multiple searchers warming at the same time and may be the source of you running out of memory. -Yonik http://www.lucidimagination.com
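The setting Yonik mentions lives in the query section of solrconfig.xml; a sketch:

```xml
<query>
  <!-- Refuse to open another searcher while this many are still warming -->
  <maxWarmingSearchers>1</maxWarmingSearchers>
</query>
```

With the value at 1, a commit or replication that arrives before the previous searcher finishes warming fails fast instead of stacking up warming searchers in memory.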
Re: Possible memory leaks with frequent replication
It's definitely a known 'issue' that you can't replicate (or do any other kind of index change, including a commit) at a faster frequency than your warming queries take to complete, or you'll wind up with something like you've seen. It's in some documentation somewhere I saw, for sure. The advice to 'just query against the master' is kind of odd, because, then... why have a slave at all, if you aren't going to query against it? I guess just for backup purposes. But even with just one Solr, or querying the master, if you commit at a rate such that commits come before the warming queries can complete, you're going to have the same issue. The only answer I know of is: don't commit (or replicate) at a faster rate than it takes your warming to complete. You can reduce your warming queries/operations, or reduce your commit/replicate frequency. It would be interesting/useful if Solr noticed this going on, and gave you some kind of error in the log (or even an exception when started with a certain parameter for testing): "Overlapping warming queries, you're committing too fast" or something. Because it's easy to make this happen without realizing it, and then your Solr does what Simon says: runs out of RAM and/or uses a whole lot of CPU and disk I/O. Lance Norskog wrote: You should query against the indexer. I'm impressed that you got 5s replication to work reliably. On Mon, Nov 1, 2010 at 4:27 PM, Simon Wistow si...@thegestalt.org wrote: We've been trying to get a setup in which a slave replicates from a master every few seconds (ideally every second but currently we have it set at every 5s). Everything seems to work fine until, periodically, the slave just stops responding from what looks like it running out of memory: org.apache.catalina.core.StandardWrapperValve invoke SEVERE: Servlet.service() for servlet jsp threw exception java.lang.OutOfMemoryError: Java heap space (our monitoring seems to confirm this).
Looking around my suspicion is that it takes new Readers longer to warm than the gap between replication and thus they just build up until all memory is consumed (which, I suppose isn't really memory 'leaking' per se, more just resource consumption) That said, we've tried turning off caching on the slave and that didn't help either so it's possible I'm wrong. Is there anything we can do about this? I'm reluctant to increase the heap space since I suspect that will mean that there's just a longer period between failures. Might Zoie help here? Or should we just query against the Master? Thanks, Simon
Solr like for autocomplete field?
I have a city field. Now when a user starts typing in a city textbox I want to return found matches (like Google). So for example, the user types "new", and I will return "new york", "new hampshire", etc. My schema.xml: <field name="city" type="string" indexed="true" stored="true"/> My current url: http://localhost:8983/solr/db/select/?indent=on&facet=true&q=*:*&start=0&rows=25&fl=id&facet.field=city&fq=city:new Basically 2 questions here: 1. is the url I'm using the best practice when implementing autocomplete? What I wanted to do is use the facets for found matches. 2. How can I match PART of the cityname just like the SQL LIKE command, cityname LIKE '%userinput' Thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-like-for-autocomplete-field-tp1829480p1829480.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr like for autocomplete field?
We used the filters talked about at Lucid Imagination for our site, it seems to work pretty well: http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/ Your mileage might vary, but it's a pretty good place to start. Matt On 11/2/2010 1:56 PM, PeterKerk wrote: I have a city field. Now when a user starts typing in a city textbox I want to return found matches (like Google). So for example, user types new, and I will return new york, new hampshire etc. my schema.xml field name=city type=string indexed=true stored=true/ my current url: http://localhost:8983/solr/db/select/?indent=onfacet=trueq=*:*start=0rows=25fl=idfacet.field=cityfq=city:new Basically 2 questions here: 1. is the url Im using the best practice when implementing autocomplete? What I wanted to do, is use the facets for found matches. 2. How can I match PART of the cityname just like the SQL LIKE command, cityname LIKE '%userinput' Thanks!
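The approach in that blog post boils down to an edge n-gram analysis chain at index time, which is what gives you the LIKE 'prefix%' behavior. A hedged sketch of such a field type for schema.xml (the type name and gram sizes are illustrative, not from the original thread):

```xml
<fieldType name="autocomplete" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- indexes "new york" as n, ne, new, "new ", "new y", ... -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25" side="front"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

The query-side analyzer deliberately omits the n-gram filter, so the user's prefix matches the grams produced at index time.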
Querying Solr using dismax, requested field not showing up in debug score boosts
I'm storing a set of products in solr as documents. I'm separating out the name, description, keywords, and product category name into separate fields so that I can boost them independently using the dismax handler. All the fields are stored as text in the same way. I'm passing these four fields in the fl param to the dismax handler, and I'm also specifying them with a boost in the qf field. Not every record (document) has a category name associated with it, but the problem I have is that even when the category name comes back in the query results, I do not see the boost I am applying to that field taking effect in the debug output of the solr query. Does anyone have an idea of why this could be? -- View this message in context: http://lucene.472066.n3.nabble.com/Querying-Solr-using-dismax-requested-field-not-showing-up-in-debug-score-boosts-tp1829456p1829456.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Query question
I... Need... more... coffee. On Tue, Nov 2, 2010 at 11:31 AM, kenf_nc ken.fos...@realestate.com wrote: Jonathan, Dismax is something I've been meaning to look into, and bq does seem to fit the bill, although I'm worried about this line in the wiki :TODO: That latter part is deprecated behavior but still works. It can be problematic so avoid it. It still seems to be the closest to what I want however so I'll play with it. Erick, that query would return all restaurants in Chicago, whether they matched Romantic View or not. Although the scores should sort relevant results to the top, the results would still contain a lot of things I wasn't interested in. -- View this message in context: http://lucene.472066.n3.nabble.com/Query-question-tp1828367p1828639.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr like for autocomplete field?
Also, you might want to consider TermsComponent, see: http://wiki.apache.org/solr/TermsComponent Also, note that there's an autosuggestcomponent, that's recently been committed. Best Erick On Tue, Nov 2, 2010 at 1:56 PM, PeterKerk vettepa...@hotmail.com wrote: I have a city field. Now when a user starts typing in a city textbox I want to return found matches (like Google). So for example, user types new, and I will return new york, new hampshire etc. my schema.xml field name=city type=string indexed=true stored=true/ my current url: http://localhost:8983/solr/db/select/?indent=onfacet=trueq=*:*start=0rows=25fl=idfacet.field=cityfq=city:new Basically 2 questions here: 1. is the url Im using the best practice when implementing autocomplete? What I wanted to do, is use the facets for found matches. 2. How can I match PART of the cityname just like the SQL LIKE command, cityname LIKE '%userinput' Thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-like-for-autocomplete-field-tp1829480p1829480.html Sent from the Solr - User mailing list archive at Nabble.com.
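A hedged example of the TermsComponent request Erick mentions, using the city field from the original post (it assumes a request handler wired to the TermsComponent is registered at /terms; the limit is arbitrary):

```text
http://localhost:8983/solr/terms?terms=true&terms.fl=city&terms.prefix=new&terms.limit=10
```

This returns indexed terms in the city field starting with "new", which is often cheaper than faceting over the whole index for autocomplete.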
Re: Querying Solr using dismax, requested field not showing up in debug score boosts
First, you should show us the query, as well as the debug output, it often helps to have a second set of eyes... Where are you specifying the qf? Under any circumstance it would be helpful to see the definition of the request handler you're using. Because as it stands, the best I can say is that I haven't a clue... Best Erick On Tue, Nov 2, 2010 at 1:51 PM, zakuhn zak.k...@extrabux.com wrote: I'm storing a set of products in solr as ducuments. I'm separating out the name, description, keywords, and product category name into separate fields so that I can boost them independently using the dismax handler. All the fields are stored as text in the same way. I'm passing these four fields in the fl param to the dismax handler, and I'm also specifying them with a boost in the qf field. Not every record (document) has a category name associated with it, but the problem I have is that even when the category name comes back in the query results, I do not see the boost I am applying to that field taking effect in the debug output of the solr query. Does anyone have an idea of why this could be? -- View this message in context: http://lucene.472066.n3.nabble.com/Querying-Solr-using-dismax-requested-field-not-showing-up-in-debug-score-boosts-tp1829456p1829456.html Sent from the Solr - User mailing list archive at Nabble.com.
Updating last_modified field when using DIH
Hello everyone! I would like to ask you a question about DIH and delta-import. I am trying to sync Solr with a PostgreSQL database, and I have a field ent_lastModified of type timestamp without time zone. Here is my xml file:

<dataConfig>
  <dataSource name="jdbc" driver="org.postgresql.Driver" url="jdbc:postgresql://host"
              user="XXX" password="XXX" readOnly="true" autoCommit="false"
              transactionIsolation="TRANSACTION_READ_COMMITTED" holdability="CLOSE_CURSORS_AT_COMMIT"/>
  <document>
    <entity name="myEntity" dataSource="jdbc" pk="id"
            query="SELECT * FROM Entities"
            deltaImportQuery="SELECT ent_id AS id FROM Entities WHERE ent_id=${dataimporter.delta.id}"
            deltaQuery="SELECT ent_id AS id FROM Entities WHERE ent_lastModified &gt; '${dataimporter.last_index_time}'"/>
  </document>
</dataConfig>

Full-import works fine, but when I run a delta-import on the ent_lastModified field, I get the corresponding records, but ent_lastModified stays the same, so if I make another delta-import, the same records are retrieved. I have read all the documentation at http://wiki.apache.org/solr/DataImportHandler but I could not find an update query for the last_modified field, and Solr does not seem to do this automatically. I have also tried to name the field last_modified as in the example, but its value remains unchanged after a delta-import. Can anyone point me in the right direction? Thanks in advance! Juan M.
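As an editorial note on the question above: ${dataimporter.last_index_time} is not read from, or written to, any database column. DIH records the start time of each import itself in conf/dataimport.properties and substitutes that value into the next deltaQuery, so no update query for the database field is needed. A sketch of what that file looks like after an import (the timestamp is illustrative):

```text
#Tue Nov 02 21:05:32 UTC 2010
last_index_time=2010-11-02 21\:05\:32
```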
RE: Stored or indexed?
Thanks for the great info! I appreciate everybody's help in getting started with Solr, hopefully I'll be able to get my stuff working and move on to more difficult questions. :) -Original Message- From: Elizabeth L. Murnane [mailto:emurn...@architexa.com] Sent: Friday, October 29, 2010 12:42 PM To: solr-user@lucene.apache.org Subject: Re: Stored or indexed? Hi Ron, In a nutshell - an indexed field is searchable, and a stored field has its content stored in the index so it is retrievable. Here are some examples that will hopefully give you a feel for how to set the indexed and stored options: indexed="true" stored="true" Use this for information you want to search on and also display in search results - for example, book title or author. indexed="false" stored="true" Use this for fields that you want displayed with search results but that don't need to be searchable - for example, destination URL, file system path, time stamp, or icon image. indexed="true" stored="false" Use this for fields you want to search on but don't need to get their values in search results. Here are some of the common reasons you would want this: Large fields and a database: Storing a field makes your index larger, so set stored to false when possible, especially for big fields. For this case a database is often used, as the previous responder said. Use a separate identifier field to get the field's content from the database. Ordering results: Say you define <field name="bookName" type="text" indexed="true" stored="true"/> that is tokenized and used for searching. If you want to sort results based on book name, you could copy the field into a separate nonretrievable, nontokenized field that can be used just for sorting - <field name="bookSort" type="string" indexed="true" stored="false"/> <copyField source="bookName" dest="bookSort"/> Easier searching: If you define the field <field name="text" type="text" indexed="true" stored="false" multiValued="true"/> you can use it as a catch-all field that contains all of the other text fields.
Since solr looks in a default field when given a text query without field names, you can support this type of general phrase query by making the catch-all the default field. indexed="false" stored="false" Use this when you want to ignore fields. For example, the following will ignore unknown fields that don't match a defined field rather than throwing an error by default. <fieldtype name="ignored" stored="false" indexed="false"/> <dynamicField name="*" type="ignored"/> Elizabeth Murnane emurn...@architexa.com Architexa Lead Developer - www.architexa.com Understand Document Code In Seconds --- On Thu, 10/28/10, Savvas-Andreas Moysidis savvas.andreas.moysi...@googlemail.com wrote: From: Savvas-Andreas Moysidis savvas.andreas.moysi...@googlemail.com Subject: Re: Stored or indexed? To: solr-user@lucene.apache.org Date: Thursday, October 28, 2010, 4:25 AM In our case, we just store a database id and do a secondary db query when displaying the results. This is handy and leads to a more centralised architecture when you need to display properties of a domain object which you don't index/search. On 28 October 2010 05:02, kenf_nc ken.fos...@realestate.com wrote: Interesting wiki link, I hadn't seen that table before. And to answer your specific question about indexed=true, stored=false, this is most often done when you are using analyzers/tokenizers on your field. This field is for search only, you would never retrieve its contents for display. It may in fact be an amalgam of several fields into one 'content' field. You have your display copy stored in another field marked indexed=false, stored=true and optionally compressed. I also have simple string fields set to lowercase so searching is case-insensitive, and have a duplicate field where the string is normal case. The first one is indexed/not stored, the second is stored/not indexed.
-- View this message in context: http://lucene.472066.n3.nabble.com/Stored-or-indexed-tp1782805p1784315.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Query question
Erick, that query would return all restaurants in Chicago, whether they matched Romantic View or not. Although the scores should sort relevant results to the top, the results would still contain a lot of things I wasn't interested in. How about this one? +(city:Chicago^1000 OR (*:* -city:Chicago)) +Romantic +View
Re: Querying Solr using dismax, requested field not showing up in debug score boosts
Ok, here is the query cleaned up a bit, one parameter per line:

q=mattress
q.op=AND
qt=dismax
fl=name,description,group_id,lowest_price,num_child_products,raw_category_string,category_id,parent_category_id,str_brand,grandparent_category_id,grandparent_category_name,parent_category_name,category_name
start=0
rows=25
indent=on
wt=php
version=2.2
mm=35%
ps=0
qs=0
sort=score desc
fq=-parent_id:[* TO *]
fq=-num_child_products:[* TO 1]
fq=-parent_group_id:[* TO *]
qf=keywords^.5 description^1.5 brand^0.7 manufacturer_model^4 name^5 upc^1 isbn^1 raw_category_string^.8 category_id^1 str_brand^1 grandparent_category_name^1 parent_category_name^2 category_name^3
facet=true
facet.limit=150
facet.mincount=1
facet.offset=0
facet.field=str_brand
facet.field=grandparent_category_id

-- View this message in context: http://lucene.472066.n3.nabble.com/Querying-Solr-using-dismax-requested-field-not-showing-up-in-debug-score-boosts-tp1829456p1831414.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Stored or indexed?
IMO, the very, very best way to increase your grasp of all things Solr is to try to answer questions on this list. Folks are pretty gentle about correcting mistaken posts. And I certainly remember any advice I've given that's been corrected <g>. Besides, if you try to answer the things you *do* understand, it leaves more time for the committers to answer *your* questions <g>... Best Erick On Tue, Nov 2, 2010 at 4:39 PM, Olson, Ron rol...@lbpc.com wrote: Thanks for the great info! I appreciate everybody's help in getting started with Solr; hopefully I'll be able to get my stuff working and move on to more difficult questions. :) -----Original Message----- From: Elizabeth L. Murnane [mailto:emurn...@architexa.com] Sent: Friday, October 29, 2010 12:42 PM To: solr-user@lucene.apache.org Subject: Re: Stored or indexed? Hi Ron, In a nutshell - an indexed field is searchable, and a stored field has its content stored in the index so it is retrievable. Here are some examples that will hopefully give you a feel for how to set the indexed and stored options:

indexed="true" stored="true" - Use this for information you want to search on and also display in search results - for example, book title or author.

indexed="false" stored="true" - Use this for fields that you want displayed with search results but that don't need to be searchable - for example, destination URL, file system path, time stamp, or icon image.

indexed="true" stored="false" - Use this for fields you want to search on but don't need to get their values in search results. Here are some of the common reasons you would want this: Large fields and a database: Storing a field makes your index larger, so set stored to false when possible, especially for big fields. For this case a database is often used, as the previous responder said. Use a separate identifier field to get the field's content from the database. Ordering results: Say you define a field <field name="bookName" type="text" indexed="true" stored="true"/> that is tokenized and used for searching.
If you want to sort results based on book name, you could copy the field into a separate non-retrievable, non-tokenized field that can be used just for sorting: <field name="bookSort" type="string" indexed="true" stored="false"/> <copyField source="bookName" dest="bookSort"/> Easier searching: If you define the field <field name="text" type="text" indexed="true" stored="false" multiValued="true"/> you can use it as a catch-all field that contains all of the other text fields. Since Solr looks in a default field when given a text query without field names, you can support this type of general phrase query by making the catch-all the default field.

indexed="false" stored="false" - Use this when you want to ignore fields. For example, the following will ignore unknown fields that don't match a defined field rather than throwing an error by default: <fieldtype name="ignored" stored="false" indexed="false"/> <dynamicField name="*" type="ignored"/>

Elizabeth Murnane emurn...@architexa.com Architexa Lead Developer - www.architexa.com --- On Thu, 10/28/10, Savvas-Andreas Moysidis savvas.andreas.moysi...@googlemail.com wrote: From: Savvas-Andreas Moysidis savvas.andreas.moysi...@googlemail.com Subject: Re: Stored or indexed? To: solr-user@lucene.apache.org Date: Thursday, October 28, 2010, 4:25 AM In our case, we just store a database id and do a secondary db query when displaying the results. This is handy and leads to a more centralised architecture when you need to display properties of a domain object which you don't index/search. On 28 October 2010 05:02, kenf_nc ken.fos...@realestate.com wrote: Interesting wiki link, I hadn't seen that table before. And to answer your specific question about indexed=true, stored=false: this is most often done when you are using analyzers/tokenizers on your field. This field is for search only; you would never retrieve its contents for display. It may in fact be an amalgam of several fields into one 'content' field.
You have your display copy stored in another field marked indexed="false" stored="true" and optionally compressed. I also have simple string fields set to lowercase so searching is case-insensitive, and a duplicate field where the string is normal case: the first one is indexed/not stored, the second is stored/not indexed. -- View this message in context: http://lucene.472066.n3.nabble.com/Stored-or-indexed-tp1782805p1784315.html Sent from the Solr - User mailing list archive at Nabble.com.
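Pulling Elizabeth's four combinations together, a hedged schema.xml sketch (field names here are illustrative, not taken from anyone's actual schema):

```xml
<!-- search + display: title shown in results and searchable -->
<field name="bookName" type="text"   indexed="true"  stored="true"/>
<!-- display only: never queried, just returned with results -->
<field name="thumbUrl" type="string" indexed="false" stored="true"/>
<!-- search only: e.g. an untokenized copy used just for sorting -->
<field name="bookSort" type="string" indexed="true"  stored="false"/>
<copyField source="bookName" dest="bookSort"/>
<!-- neither: swallow unknown fields instead of raising an error -->
<fieldtype name="ignored" stored="false" indexed="false"/>
<dynamicField name="*" type="ignored"/>
```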
Re: Influencing scores on values in multiValue fields
Thanks Mike for your suggestion. It did take me down the correct route. I basically created another multiValued field of type 'string' and boosted that. To get the partial matches to avoid the length normalisation I set the 'text' type multiValued field to omitNorms. The results look as expected so far with this configuration. Cheers -- Imran On Fri, Oct 29, 2010 at 1:09 PM, Michael Sokolov soko...@ifactory.com wrote: How about creating another field for doing exact matches (a string); searching both and boosting the string match? -Mike -----Original Message----- From: Imran [mailto:imranboho...@gmail.com] Sent: Friday, October 29, 2010 6:25 AM To: solr-user@lucene.apache.org Subject: Influencing scores on values in multiValue fields Hi All We've got an index in which we have a multiValued field per document. Assume the multivalued field values in each document to be: Doc1: bar lifters Doc2: truck tires, back drops, bar lifters Doc3: iron bar lifters Doc4: brass bar lifters, iron bar lifters, tire something, truck something, oil, gas Now when we search for 'bar lifters' the expectation (based on the requirements) is that we get results in the order Doc1, Doc2, Doc4, Doc3. Doc1 - since there's an exact match (and only one) for the search terms. Doc2 - since there's an exact match amongst the values. Doc4 - since there's a partial match on the values and the number of matches is greater than in Doc3. Doc3 - since there's a partial match. However, the results come out as Doc1, Doc3, Doc2, Doc4. Looking at the explanation of the result it appears Doc2 is losing to Doc3, and Doc4 is losing to Doc3, based on length normalisation. We can see the reason for that - the field length in doc2 is greater than in doc3, and doc4's is greater than doc3's. However, is there any mechanism by which we can force doc2 to beat doc3 and doc4 to beat doc3 with this structure? We did look at using omitNorms=true, but that messes up the scores for all docs.
The result comes out as Doc4, Doc1, Doc2, Doc3 (where Doc1, Doc2 and Doc3 get the same score). This is because the fieldNorm is not taken into account anymore (as expected), leaving term frequency as the only contributing factor. So trying to avoid length normalisation through omitNorms is not helping. Is there any way we can make an exact match of a value in a multiValued field add to the overall score whilst keeping the length normalisation? Hope that makes sense. Cheers -- Imran
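A sketch of the schema arrangement Imran describes - an extra string field for exact value matches, with norms dropped on the tokenized field. Field and type names are illustrative assumptions, not taken from his actual schema:

```xml
<!-- tokenized field for partial matches; omitNorms avoids the
     length penalty that was demoting Doc2 and Doc4 -->
<field name="products" type="text" indexed="true" stored="false"
       multiValued="true" omitNorms="true"/>
<!-- untokenized copy; a hit here is an exact value match -->
<field name="products_exact" type="string" indexed="true" stored="false"
       multiValued="true"/>
<copyField source="products" dest="products_exact"/>
```

A query such as products:(bar lifters) OR products_exact:"bar lifters"^10 would then let the exact-value match dominate the ranking while partial matches still score on term frequency.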
Re: Query question
My impression was that city:Chicago^10 +Romantic +View would do what you want (with the standard lucene query parser and default operator OR), and I'm not sure about this, but I have a feeling that the version with Boolean operators AND/OR and parens might actually net out to the same thing, since under the hood all the terms have to be translated into optional, required or forbidden: lucene doesn't actually have true binary boolean operators. At least that was the impression I got after some discussion at a recent conference. I may have misunderstood - if so, could someone who knows set me straight? Thanks -Mike On 11/2/2010 5:08 PM, Ahmet Arslan wrote: Erick, that query would return all restaurants in Chicago, whether they matched Romantic View or not. Although the scores should sort relevant results to the top, the results would still contain a lot of things I wasn't interested in. How about this one? +(city:Chicago^1000 OR (*:* -city:Chicago)) +Romantic +View
Re: xpath processing
<?xml version="1.0" encoding="UTF-8"?>
<mods:mods xmlns:mods="http://www.loc.gov/mods/v3"
           xmlns:xlink="http://www.w3.org/1999/xlink"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:schemaLocation="http://www.loc.gov/mods/v3 http://www.loc.gov/standards/mods/v3/mods-3-0.xsd">
  <mods:titleInfo>
    <mods:title>Any place I hang my hat is home</mods:title>
  </mods:titleInfo>
  <mods:titleInfo type="uniform">
    <mods:title>St. Louis woman</mods:title>
    <mods:partName>Any place I hang my hat is home</mods:partName>
  </mods:titleInfo>
  <mods:titleInfo type="alternative">
    <mods:title>Free an' easy that's my style</mods:title>
  </mods:titleInfo>
  <mods:name type="personal">
    <mods:namePart>Arlen, Harold</mods:namePart>
    <mods:namePart type="date">1905-1986</mods:namePart>
    <mods:role>
      <mods:roleTerm authority="marcrelator" type="text">creator</mods:roleTerm>
    </mods:role>
  </mods:name>
  <mods:name type="personal">
    <mods:namePart>Mercer, Johnny</mods:namePart>
    <mods:namePart type="date">1909-</mods:namePart>
  </mods:name>
  <mods:name type="personal">
    <mods:namePart>Davison, R.</mods:namePart>
  </mods:name>
  <mods:name type="personal">
    <mods:namePart>Bontemps, Arna Wendell</mods:namePart>
    <mods:namePart type="date">1902-1973</mods:namePart>
  </mods:name>
  <mods:name type="personal">
    <mods:namePart>Cullen, Countee</mods:namePart>
    <mods:namePart type="date">1903-1946</mods:namePart>
  </mods:name>
  <mods:typeOfResource>notated music</mods:typeOfResource>
  <mods:originInfo>
    <mods:place>
      <mods:placeTerm authority="marccountry" type="code">nyu</mods:placeTerm>
    </mods:place>
    <mods:place>
      <mods:placeTerm type="text">New York</mods:placeTerm>
    </mods:place>
    <mods:publisher>De Sylva, Brown &amp; Henderson, Inc.</mods:publisher>
    <mods:dateIssued>c1946</mods:dateIssued>
    <mods:dateIssued encoding="marc">1946</mods:dateIssued>
    <mods:issuance>monographic</mods:issuance>
    <mods:dateOther type="normalized">1946</mods:dateOther>
    <mods:dateOther type="normalized">1946</mods:dateOther>
  </mods:originInfo>
  <mods:language>
    <mods:languageTerm authority="iso639-2b" type="code">eng</mods:languageTerm>
  </mods:language>
  <mods:physicalDescription>
    <mods:form authority="marcform">print</mods:form>
    <mods:extent>1 vocal score (5 p.) : ill. ; 31 cm.</mods:extent>
  </mods:physicalDescription>
  <mods:note type="statement of responsibility">music by Harold Arlen ; lyrics by Johnny Mercer.</mods:note>
  <mods:note>For voice and piano.</mods:note>
  <mods:note>Includes chord symbols.</mods:note>
  <mods:note>Illustration by R. Davison.</mods:note>
  <mods:note>First line: Free an' easy that's my style.</mods:note>
  <mods:note>Edward Gross presents St. Louis Woman ... Book by Arna Bontemps &amp; Countee Cullen -- Cover.</mods:note>
  <mods:note>Publisher's advertising includes musical incipits.</mods:note>
  <mods:subject authority="lcsh">
    <mods:topic>Motion picture music</mods:topic>
    <mods:topic>Excerpts</mods:topic>
    <mods:topic>Vocal scores with piano</mods:topic>
  </mods:subject>
  <mods:classification authority="lcc">M1 .S8</mods:classification>
  <mods:identifier type="music plate">1403-4 De Sylva, Brown Henderson, Inc.</mods:identifier>
  <mods:location>
    <mods:physicalLocation>Lilly Library, Indiana University Bloomington</mods:physicalLocation>
  </mods:location>
  <mods:recordInfo>
    <mods:recordContentSource authority="marcorg">IUL</mods:recordContentSource>
    <mods:recordCreationDate encoding="marc">990316</mods:recordCreationDate>
    <mods:recordIdentifier>LL-SSM-ALC4888</mods:recordIdentifier>
  </mods:recordInfo>
</mods:mods>

Above is my sample xml.

<dataConfig>
  <dataSource name="myfilereader" type="FileDataSource"/>
  <document>
    <entity name="f" rootEntity="false" dataSource="null"
            processor="FileListEntityProcessor" fileName=".*xml"
            recursive="true" baseDir="C:\test_xml">
      <entity name="x" dataSource="myfilereader"
              processor="XPathEntityProcessor" url="${f.fileAbsolutePath}"
              stream="false" forEach="/mods"
              transformer="DateFormatTransformer,RegexTransformer,TemplateTransformer">
        <field column="id" template="${f.file}"/>
        <field column="collectionKey" template="uw"/>
        <field column="collectionName" template="University of Washington Pacific Northwest Sheet Music Collection"/>
        <field column="fileAbsolutePath" template="${f.fileAbsolutePath}"/>
        <field column="fileName" template="${f.file}"/>
        <field column="fileSize" template="${f.fileSize}"/>
        <field column="fileLastModified" template="${f.fileLastModified}"/>
        <field column="nameNamePart_keyword" xpath="/mods/name/namePart[@type != 'date']"/>
      </entity>
    </entity>
  </document>
</dataConfig>

Above is the data config file. The namePart element in the above xml may or may not have a type attribute. How can I get data from the namePart elements which have no type attribute? xpath="/mods/name/namePart[@type != 'date']" is not working. I don't get any errors, but there is no namePart_keyword in the index. Quoting Ken Stanley doh...@gmail.com:
Re: Ensuring stable timestamp ordering
memory's cheap! (I know processing it is not, though) Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others' mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' EARTH has a Right To Life, otherwise we all die. ----- Original Message ----- From: Toke Eskildsen t...@statsbiblioteket.dk To: solr-user@lucene.apache.org Sent: Mon, November 1, 2010 11:45:34 PM Subject: RE: Ensuring stable timestamp ordering Dennis Gearon [gear...@sbcglobal.net] wrote: how about a timestamp with either a GUID appended on the end of it? Since long (8 bytes) is the largest atomic type supported by Java, this would have to be represented as a String (or rather BytesRef) and would take up 4 + 32 bytes + 2 * 4 bytes from the internal BytesRef-attributes + some extra overhead. That is quite a large memory penalty to ensure unique timestamps.
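A cheaper alternative than a GUID suffix, sketched here as an assumption rather than anything proposed in the thread: pack the millisecond timestamp and a per-JVM sequence number into a single long, which stays sortable and costs only 8 bytes per value. Class and method names are hypothetical, and the 20-bit counter wraps after about one million ids minted within a single millisecond.

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch only: pack a millisecond timestamp (high 44 bits) and a
// per-JVM sequence number (low 20 bits) into one sortable long.
public final class UniqueTimestamp {
    private static final AtomicLong SEQ = new AtomicLong();

    public static long next(long millis) {
        return (millis << 20) | (SEQ.getAndIncrement() & 0xFFFFFL);
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        // Two ids minted in the same millisecond still sort stably.
        System.out.println(next(now) < next(now)); // prints true
    }
}
```

44 bits of milliseconds covers dates well past the year 2500, so the shift does not overflow for realistic clock values.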
Re: using HebMorph
I don't know the paths in the Solr package for Ubuntu. In the Solr Apache release, you go to the example/ directory. The example/solr directory needs a new lib directory, and you copy the jars there. Then run 'java -jar start.jar', still in the example/ directory. Solr should start. Now you need to study example/solr/conf/schema.xml and look at what Analyzers do. Good luck! On Tue, Nov 2, 2010 at 12:37 AM, mark peleus mark.pel...@gmail.com wrote: Hi I'm trying to use HebMorph, a new Hebrew analyzer. http://github.com/itaifrenkel/HebMorph/tree/master/java/ The instructions say: 1. Download the code from here: http://github.com/synhershko/HebMorph/tree/master/java/ 2. Use the hebmorph ant script (http://github.com/synhershko/HebMorph/blob/master/java/hebmorph/build.xml) to build the hebmorph project. 3. Use the lucene.hebrew ant script (http://github.com/synhershko/HebMorph/blob/master/java/lucene.hebrew/build.xml) to build the lucene.hebrew project. 4. Copy both jar files to the solr/lib folder. 5. Edit your solr/conf/schema.xml file to use the analyzer you choose to use. I've installed the Solr package under Ubuntu Lucid. I've completed steps 1-3. Where do I put the jar files? How do I make Solr use the analyzer? Thanks -- Lance Norskog goks...@gmail.com
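For step 5, the schema.xml wiring would look roughly like the sketch below. The analyzer class name is a placeholder, not taken from the thread - check the actual class shipped in the lucene.hebrew jar you built:

```xml
<!-- hypothetical field type backed by the HebMorph analyzer;
     replace com.example.HebrewAnalyzer with the real class name -->
<fieldType name="text_he" class="solr.TextField">
  <analyzer class="com.example.HebrewAnalyzer"/>
</fieldType>
<field name="title_he" type="text_he" indexed="true" stored="true"/>
```

The single-class `<analyzer class="..."/>` form is standard Solr schema syntax for plugging in a whole Lucene Analyzer rather than a tokenizer/filter chain.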
Re: how to get TermVectorComponent using xml , vs. SOLR-949
TVC is in Solr 1.4 onwards. It is configured in example/solr/conf/solrconfig.xml as 'tvrh'. It is not registered under its own URL path, so you have to say solr/select?q=word&qt=tvrh and look at the bottom of the xml. On Tue, Nov 2, 2010 at 5:34 AM, Will Milspec will.mils...@gmail.com wrote: Hi all, This seems a basic question: what's the best way to get TermVectorComponents from the Solr XML response? SolrJ does not include TermVectorComponents in its api; the SOLR-949 patch adds this ability, but after 2 years it's still not in the mainline (and doesn't patch cleanly to the current head 1.4). I'm new to Solr and familiar with SolrJ, but not with the best means for getting/parsing the raw xml. (Typically I find the dtd and write code to parse the dom using the dtd. In this case I've seen a few examples, but nothing definitive.) Our team would rather use the out-of-the-box Solr rather than manually apply patches and worry about consistency during upgrades... Thanks in advance, will -- Lance Norskog goks...@gmail.com
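For reference, the solrconfig.xml wiring Lance mentions looks approximately like this in the Solr 1.4 example config (verify the names against your own config file):

```xml
<searchComponent name="tvComponent"
                 class="org.apache.solr.handler.component.TermVectorComponent"/>
<requestHandler name="tvrh"
                class="org.apache.solr.handler.component.SearchHandler">
  <lst name="defaults">
    <bool name="tv">true</bool>
  </lst>
  <arr name="last-components">
    <str>tvComponent</str>
  </arr>
</requestHandler>
```

Because the handler name has no leading slash, it is selected with the qt parameter (qt=tvrh) rather than by URL path, and the term vectors appear as an extra section at the bottom of the response.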
Re: Disk usage per-field
The Lucene CheckIndex program opens an index and reads many types of data from it. It's easy to start with it and change it to count up the space used by terms and stored data for field X. On Tue, Nov 2, 2010 at 5:51 AM, Muneeb Ali muneeba...@hotmail.com wrote: Hi, I am currently benchmarking a solr index with different fields to see the impact on its size/search speed etc. A feature to find the disk usage per field of the index would be really handy and save me a lot of time. Do we have any updates on this? Has anyone tried writing custom code for it? - Muneeb -- View this message in context: http://lucene.472066.n3.nabble.com/Disk-usage-per-field-tp934765p1827739.html Sent from the Solr - User mailing list archive at Nabble.com. -- Lance Norskog goks...@gmail.com
Re: How to use polish stemmer - Stempel - in schema.xml?
Here's the problem: Solr is a little dumb about these Filter classes, and so you have to make a Factory object for the Stempel Filter. There are a lot of other FilterFactory classes. You would have to just copy one and change the names to Stempel and it might actually work. This will take some Solr programming - perhaps the author can help you? On Tue, Nov 2, 2010 at 7:08 AM, Jakub Godawa jakub.god...@gmail.com wrote: Sorry, I am not a Java programmer at all. I would appreciate more verbose (or step by step) help. 2010/11/2 Bernd Fehling bernd.fehl...@uni-bielefeld.de: So you call org.getopt.solr.analysis.StempelTokenFilterFactory. In this case I would assume a file StempelTokenFilterFactory.class in your directory org/getopt/solr/analysis/. And a class which extends the BaseTokenFilterFactory, right? ... public class StempelTokenFilterFactory extends BaseTokenFilterFactory implements ResourceLoaderAware { ... Am 02.11.2010 14:20, schrieb Jakub Godawa: This is what stempel-1.0.jar consists of after jar -xf: jgod...@ubuntu:~/apache-solr-1.4.1/ifaq/lib$ ls -R org/ org/: egothor getopt org/egothor: stemmer org/egothor/stemmer: Cell.class Diff.class Gener.class MultiTrie2.class Optimizer2.class Reduce.class Row.class TestAll.class TestLoad.class Trie$StrEnum.class Compile.class DiffIt.class Lift.class MultiTrie.class Optimizer.class Reduce$Remap.class Stock.class Test.class Trie.class org/getopt: stempel org/getopt/stempel: Benchmark.class lucene Stemmer.class org/getopt/stempel/lucene: StempelAnalyzer.class StempelFilter.class jgod...@ubuntu:~/apache-solr-1.4.1/ifaq/lib$ ls -R META-INF/ META-INF/: MANIFEST.MF jgod...@ubuntu:~/apache-solr-1.4.1/ifaq/lib$ ls -R res res: tables res/tables: readme.txt stemmer_1000.out stemmer_100.out stemmer_2000.out stemmer_200.out stemmer_500.out stemmer_700.out 2010/11/2 Bernd Fehling bernd.fehl...@uni-bielefeld.de: Hi Jakub, if you unzip your stempel-1.0.jar do you have the required directory structure and file in there?
org/getopt/stempel/lucene/StempelFilter.class Regards, Bernd Am 02.11.2010 13:54, schrieb Jakub Godawa: Erick, I've put the jar files like that before. I also added the directive and put the file in instanceDir/lib. What is still a problem is that even though the files are loaded: 2010-11-02 13:20:48 org.apache.solr.core.SolrResourceLoader replaceClassLoader INFO: Adding 'file:/home/jgodawa/apache-solr-1.4.1/ifaq/lib/stempel-1.0.jar' to classloader I am not able to use the FilterFactory... maybe I am attempting it in a wrong way? Cheers, Jakub Godawa. 2010/11/2 Erick Erickson erickerick...@gmail.com: The polish stemmer jar file needs to be findable by Solr; if you copy it to solr_home/lib and restart solr you should be set. Alternatively, you can add another lib directive to the solrconfig.xml file (there are several examples in that file already). I'm a little confused about not being able to find TokenFilter, is that still a problem? HTH Erick On Tue, Nov 2, 2010 at 8:07 AM, Jakub Godawa jakub.god...@gmail.com wrote: Thank you Bernd! I couldn't make it run though. Here is my problem: 1. There is a file ~/apache-solr-1.4.1/ifaq/lib/stempel-1.0.jar 2. In ~/apache-solr-1.4.1/ifaq/solr/conf/solrconfig.xml there is a directive: <lib path="../lib/stempel-1.0.jar"/> 3. In ~/apache-solr-1.4.1/ifaq/solr/conf/schema.xml there is a fieldType: (...) <!-- Polish --> <fieldType name="text_pl" class="solr.TextField"> <analyzer> <tokenizer class="solr.WhitespaceTokenizerFactory"/> <filter class="solr.LowerCaseFilterFactory"/> <filter class="org.getopt.stempel.lucene.StempelFilter"/> <!-- <filter class="org.getopt.solr.analysis.StempelTokenFilterFactory" protected="protwords.txt"/> --> </analyzer> </fieldType> (...) 4. The jar file is loaded but I got an error: SEVERE: Could not start SOLR.
Check solr/home property java.lang.NoClassDefFoundError: org/apache/lucene/analysis/TokenFilter at java.lang.ClassLoader.defineClass1(Native Method) at java.lang.ClassLoader.defineClass(ClassLoader.java:634) at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) (...) 5. A different class gave me this one: SEVERE: org.apache.solr.common.SolrException: Error loading class 'org.getopt.solr.analysis.StempelTokenFilterFactory' at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:375) at org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:390) (...) Question is: How to make <fieldType/> and <filter/> work with that Stempel? :) Cheers, Jakub Godawa. 2010/10/29 Bernd Fehling bernd.fehl...@uni-bielefeld.de: Hi Jakub, I have ported the KStemmer for use in the most recent Solr trunk version. My stemmer is located in the lib directory of Solr (solr/lib/KStemmer-2.00.jar) because it belongs to Solr. Write it as a FilterFactory and use it as
Re: Possible memory leaks with frequent replication
Isn't that what this code does?

  onDeckSearchers++;
  if (onDeckSearchers < 1) {
    // should never happen... just a sanity check
    log.error(logid + "ERROR!!! onDeckSearchers is " + onDeckSearchers);
    onDeckSearchers = 1;  // reset
  } else if (onDeckSearchers > maxWarmingSearchers) {
    onDeckSearchers--;
    String msg = "Error opening new searcher. exceeded limit of maxWarmingSearchers="
               + maxWarmingSearchers + ", try again later.";
    log.warn(logid + "" + msg);
    // HTTP 503==service unavailable, or 409==Conflict
    throw new SolrException(SolrException.ErrorCode.SERVICE_UNAVAILABLE, msg, true);
  } else if (onDeckSearchers > 1) {
    log.info(logid + "PERFORMANCE WARNING: Overlapping onDeckSearchers=" + onDeckSearchers);
  }

On Tue, Nov 2, 2010 at 10:02 AM, Jonathan Rochkind rochk...@jhu.edu wrote: It's definitely a known 'issue' that you can't replicate (or do any other kind of index change, including a commit) at a faster frequency than your warming queries take to complete, or you'll wind up with something like you've seen. It's in some documentation somewhere I saw, for sure. The advice to 'just query against the master' is kind of odd, because then... why have a slave at all, if you aren't going to query against it? I guess just for backup purposes. But even with just one Solr, or querying the master, if you commit at a rate such that commits come before the warming queries can complete, you're going to have the same issue. The only answer I know of is: don't commit (or replicate) at a faster rate than it takes your warming to complete. You can reduce your warming queries/operations, or reduce your commit/replicate frequency. It would be interesting/useful if Solr noticed this going on and gave you some kind of error in the log (or even an exception when started with a certain parameter for testing): Overlapping warming queries, you're committing too fast or something. Because it's easy to make this happen without realizing it, and then your Solr does what Simon says: runs out of RAM and/or uses a whole lot of CPU and disk io.
Lance Norskog wrote: You should query against the indexer. I'm impressed that you got 5s replication to work reliably. On Mon, Nov 1, 2010 at 4:27 PM, Simon Wistow si...@thegestalt.org wrote: We've been trying to get a setup in which a slave replicates from a master every few seconds (ideally every second but currently we have it set at every 5s). Everything seems to work fine until, periodically, the slave just stops responding from what looks like it running out of memory: org.apache.catalina.core.StandardWrapperValve invoke SEVERE: Servlet.service() for servlet jsp threw exception java.lang.OutOfMemoryError: Java heap space (our monitoring seems to confirm this). Looking around my suspicion is that it takes new Readers longer to warm than the gap between replication and thus they just build up until all memory is consumed (which, I suppose isn't really memory 'leaking' per se, more just resource consumption) That said, we've tried turning off caching on the slave and that didn't help either so it's possible I'm wrong. Is there anything we can do about this? I'm reluctant to increase the heap space since I suspect that will mean that there's just a longer period between failures. Might Zoie help here? Or should we just query against the Master? Thanks, Simon -- Lance Norskog goks...@gmail.com
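The safety valve discussed above is configurable in solrconfig.xml; a sketch with illustrative values (not taken from the posters' configs):

```xml
<!-- cap concurrent warming searchers; commits/replications that would
     exceed the cap fail fast instead of piling readers onto the heap -->
<maxWarmingSearchers>2</maxWarmingSearchers>
<!-- smaller autowarmCount shortens warming so it can finish
     within the replication interval -->
<filterCache class="solr.FastLRUCache" size="512"
             initialSize="512" autowarmCount="64"/>
```

With a 5-second replication poll, the warming work per new searcher has to complete in well under 5 seconds, or the cap will be hit no matter how high it is set.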
Re: Solr like for autocomplete field?
And the SpellingComponent. There's nothing to help you with phrases. On Tue, Nov 2, 2010 at 11:21 AM, Erick Erickson erickerick...@gmail.com wrote: Also, you might want to consider TermsComponent, see: http://wiki.apache.org/solr/TermsComponent Also, note that there's an autosuggest component that's recently been committed. Best Erick On Tue, Nov 2, 2010 at 1:56 PM, PeterKerk vettepa...@hotmail.com wrote: I have a city field. Now when a user starts typing in a city textbox I want to return found matches (like Google). So for example, the user types new, and I will return new york, new hampshire etc. my schema.xml: <field name="city" type="string" indexed="true" stored="true"/> my current url: http://localhost:8983/solr/db/select/?indent=on&facet=true&q=*:*&start=0&rows=25&fl=id&facet.field=city&fq=city:new Basically 2 questions here: 1. is the url I'm using the best practice when implementing autocomplete? What I wanted to do is use the facets for found matches. 2. How can I match PART of the cityname, just like the SQL LIKE command: cityname LIKE '%userinput' Thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-like-for-autocomplete-field-tp1829480p1829480.html Sent from the Solr - User mailing list archive at Nabble.com. -- Lance Norskog goks...@gmail.com
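One concrete way to get prefix-style LIKE behaviour with the facet approach in the question is facet.prefix, which limits facet terms to those starting with the user's input. A sketch (host, core and field names taken from the question; facet.prefix itself is a standard Solr faceting parameter):

```
http://localhost:8983/solr/db/select/?q=*:*&rows=0
    &facet=true&facet.field=city
    &facet.prefix=new&facet.mincount=1&facet.limit=10
```

Note that facet.prefix matches raw indexed terms, so on a string field it behaves like LIKE 'new%' (prefix only); for true infix matching ('%userinput%') you would need an ngram-analyzed copy of the field instead.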
Re: xpath processing
The XPathEP has the option to run a real XSL script at some point in its processing chain. I guess you could make an XSL that pulls your fields out into a simpler XML in the /a/b/c format that the XPath parser supports. On Tue, Nov 2, 2010 at 5:37 PM, pghorp...@ucla.edu wrote: [sample MODS xml and data-config quoted from the original message snipped]