Query term completion via the suggester
Hi,

I am trying to configure the suggester for Solr 3.6 as described at http://wiki.apache.org/solr/Suggester, but the configuration does not work and I cannot figure out what I am doing wrong.

After starting the Solr server I get this exception:

org.apache.solr.common.SolrException: no field name specified in query and no default specified via 'df' param

If I send a query to get a suggestion, http://localhost:8983/solr/suggest?q=comp&df=autocomplete, Solr only returns documents but no suggestions for query completion.

In schema.xml the field is defined as follows:

<field name="autocomplete" type="textSpell" indexed="true" stored="false" multiValued="true" />

The textSpell type is:

<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

The request handler is defined as follows:

<requestHandler name="/suggest" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">a_suggest</str>
    <str name="spellcheck.onlyMorePopular">true</str>
    <str name="spellcheck.count">5</str>
    <str name="spellcheck.collate">true</str>
  </lst>
  <attr name="components">
    <str>suggest</str>
  </attr>
</requestHandler>

The corresponding suggest component:

<searchComponent name="suggest" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">a_suggest</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.FSTLookup</str>
    <str name="field">autocomplete</str>
    <str name="buildOnOptimize">true</str>
    <int name="weightBuckets">100</int>
  </lst>
</searchComponent>

best regards,
Michael
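For reference, the request URL quoted above can be assembled programmatically so that every parameter is cleanly separated by `&`. This is only a minimal Python sketch under assumptions: the host and the `/suggest` handler path are taken from the message, the helper name `build_suggest_url` is made up for illustration, and the snippet only builds the URL without contacting Solr.

```python
from urllib.parse import urlencode


def build_suggest_url(base_url, prefix, default_field=None):
    """Build a request URL for the /suggest handler (hypothetical helper)."""
    params = {"q": prefix}
    if default_field is not None:
        # 'df' is the default-field parameter named in the exception text.
        params["df"] = default_field
    # urlencode joins the parameters with '&' and escapes special characters.
    return base_url.rstrip("/") + "/suggest?" + urlencode(params)


if __name__ == "__main__":
    print(build_suggest_url("http://localhost:8983/solr", "comp", "autocomplete"))
```

Whether passing `df` is what the handler needs depends on the rest of the configuration; the sketch only shows how the query string is formed.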
Re: are stopwords indexed?
Hi Giovanni,

you have entered the stopwords into the stopwords.txt file, right? But in the definition of the field type you are referencing stopwords_FR.txt.

best regards,
Michael

On Mon, 16 Jul 2012 05:38:04 +0200, Giovanni Gherdovich g.gherdov...@gmail.com wrote:

Hi all,

are stopwords from the stopwords.txt config file supposed to be indexed? I would say no, but this is the situation I am observing on my Solr instance:

* I have a bunch of stopwords in stopwords.txt
* my fields are of fieldType "text" from the example schema.xml, i.e. I have

-- -- 8< -- -- 8< -- -- 8< -- -- 8<
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    [...]
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_FR.txt" enablePositionIncrements="true" />
    [...]
  </analyzer>
  <analyzer type="query">
    [...]
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_FR.txt" enablePositionIncrements="true" />
  </analyzer>
</fieldType>
-- -- 8< -- -- 8< -- -- 8< -- -- 8<

* searching for a stopword through Solr always gives zero results
* inspecting the index with LuCLI (http://manpages.ubuntu.com/manpages/natty/man1/lucli.1.html) shows that all stopwords are in my index. Note that I query LuCLI specifying the field, i.e. with myFieldName:and and not just with the stopword "and".

Is this normal? Are stopwords indexed?

Cheers,
Giovanni
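To illustrate the point of the reply: a stop filter can only drop words that appear in the list the analyzer actually reads. The following is a toy model in Python, not Lucene's StopFilter; the function name `remove_stopwords` and the sample word lists are invented for illustration.

```python
def remove_stopwords(tokens, stopwords, ignore_case=True):
    """Drop every token that appears in the given stop-word list."""
    if ignore_case:
        stop = {w.lower() for w in stopwords}
        return [t for t in tokens if t.lower() not in stop]
    stop = set(stopwords)
    return [t for t in tokens if t not in stop]


if __name__ == "__main__":
    tokens = ["cats", "and", "dogs"]
    # The list in use contains "and", so the token is dropped before indexing.
    print(remove_stopwords(tokens, ["and", "the"]))
    # A different list without "and" is in use, so the token stays in the index.
    print(remove_stopwords(tokens, ["et", "le"]))
```

In this model, editing one file while the field type references another leaves the stopwords in the index, which would match the observation made with LuCLI.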
Re: Problem while indexing XML file with special characters represented as &uuml;
Does anybody have an idea?

Solr seems to ignore the DTD definition and therefore does not understand entities like &uuml; or &auml; that are defined in the DTD. Is that the problem? If so, how can I tell Solr to take the DTD definition into account?

On Fri, 06 Jul 2012 10:58:59 +0200, Michael Belenki v...@belenki.name wrote:

Dear community,

I am experiencing a strange problem while trying to import an XML document into Solr via the DataImportHandler. The XML document contains some special characters (e.g. German ü) that are represented as XML entities such as &uuml; or &auml;. There is also a DTD file that defines these entities (<!ENTITY uuml "&#252;">); I tried using the DTD file as well as including the DTD definition in the XML itself. After I start the full-import command, the import process throws an exception as soon as it tries to parse &uuml;: Undeclared general entity "uuml". Did anyone already face such a problem?

best regards,
Michael

[...]
Problem while indexing XML file with special characters represented as &uuml;
Dear community,

I am experiencing a strange problem while trying to import an XML document into Solr via the DataImportHandler. The XML document contains some special characters (e.g. German ü) that are represented as XML entities such as &uuml; or &auml;. There is also a DTD file that defines these entities (<!ENTITY uuml "&#252;">); I tried using the DTD file as well as including the DTD definition in the XML itself. After I start the full-import command, the import process throws an exception as soon as it tries to parse &uuml;: Undeclared general entity "uuml". Did anyone already face such a problem?

best regards,
Michael

My data-config for importing is:

<dataConfig>
  <dataSource type="FileDataSource" encoding="ISO-8859-1" />
  <document>
    <!-- stream should be true since huge xml document is being parsed -->
    <entity name="article" processor="XPathEntityProcessor" stream="true" forEach="/dblp/article" url="documents/dblp.xml">
      <field column="key" xpath="/dblp/article/@key" />
      <field column="title" xpath="/dblp/article/title" />
    </entity>
  </document>
</dataConfig>

The XML file looks e.g. like this:

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE dblp [
<!ENTITY uuml "&#252;"><!-- small u, dieresis or umlaut mark -->
]>
<dblp>
<article key="journals/fm/Riccardi09" mdate="2011-10-27">
<author>Marco Riccardi</author>
<title>Solution of Cubic and Quartic Equations.&uuml;</title>
<pages>117-122</pages>
<year>2009</year>
<volume>17</volume>
<journal>Formalized Mathematics</journal>
<number>1-4</number>
<ee>http://dx.doi.org/10.2478/v10037-009-0012-z</ee>
<url>db/journals/fm/fm17.html#Riccardi09</url>
</article>
</dblp>

The stack trace is:

05.07.2012 17:37:19 org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: {deleteByQuery=*:*,add=[persons/Codd71a, persons/Hall74]} 0 1
05.07.2012 17:37:19 org.apache.solr.common.SolrException log
SCHWERWIEGEND: Full Import failed:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: Parsing failed for xml, url:documents/dblp.xml rows processed in this xml:2 last row in this xml:{title=Common Subexpression Identification in General Algebraic Systems., $forEach=/dblp/article, key=persons/Hall74} Processing Document # 3
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:264)
    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:375)
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:445)
    at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:426)
Caused by: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: Parsing failed for xml, url:documents/dblp.xml rows processed in this xml:2 last row in this xml:{title=Common Subexpression Identification in General Algebraic Systems., $forEach=/dblp/article, key=persons/Hall74} Processing Document # 3
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:621)
    at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:327)
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:225)
    ... 3 more
Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: Parsing failed for xml, url:documents/dblp.xml rows processed in this xml:2 last row in this xml:{title=Common Subexpression Identification in General Algebraic Systems., $forEach=/dblp/article, key=persons/Hall74} Processing Document # 3
    at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72)
    at org.apache.solr.handler.dataimport.XPathEntityProcessor$3.next(XPathEntityProcessor.java:504)
    at org.apache.solr.handler.dataimport.XPathEntityProcessor$3.next(XPathEntityProcessor.java:517)
    at org.apache.solr.handler.dataimport.EntityProcessorBase.getNext(EntityProcessorBase.java:120)
    at org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:225)
    at org.apache.solr.handler.dataimport.XPathEntityProcessor.nextRow(XPathEntityProcessor.java:204)
    at org.apache.solr.handler.dataimport.EntityProcessorWrapper.pullRow(EntityProcessorWrapper.java:330)
    at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:296)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:683)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:619)
    ... 5 more
Caused by: java.lang.RuntimeException: com.ctc.wstx.exc.WstxParsingException: Undeclared general entity "uuml"
 at [row,col {unknown-source}]: [26,42]
    at
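The difference between a declared and an undeclared entity can be reproduced outside Solr. The sketch below uses Python's standard `xml.etree.ElementTree` parser, not the Woodstox parser that appears in the stack trace, so it only illustrates the general XML rule and says nothing about how the DataImportHandler treats DTDs. The sample title "Müller" and the helper name `parse_title` are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Entity declared in the internal DTD subset, as in the sample file above.
WITH_DECLARATION = (
    '<!DOCTYPE dblp [<!ENTITY uuml "&#252;">]>'
    "<dblp><article><title>M&uuml;ller</title></article></dblp>"
)
# Same document, but the parser never sees a declaration for &uuml;.
WITHOUT_DECLARATION = "<dblp><article><title>M&uuml;ller</title></article></dblp>"


def parse_title(xml_text):
    """Return the article title, or None if the parser rejects the document."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        # An undeclared general entity is a well-formedness error.
        return None
    return root.findtext("article/title")


if __name__ == "__main__":
    print(parse_title(WITH_DECLARATION))
    print(parse_title(WITHOUT_DECLARATION))
```

If the same error appears even with the declaration in the file, it may be that the parser in use is configured not to read the DTD; that is an assumption to verify, not something this sketch shows.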