Query term completion via the suggester

2012-07-28 Thread Michael Belenki
Hi,

I am trying to configure the suggester for Solr 3.6 as described at
http://wiki.apache.org/solr/Suggester, but the configuration does not work.
I cannot figure out what I am doing wrong.

After starting the Solr server I get the exception

org.apache.solr.common.SolrException: no field name specified in
query and no default specified via 'df' param.

If I send a query to get a suggestion,

http://localhost:8983/solr/suggest?q=comp&df=autocomplete

Solr returns only documents, but no suggestions for query completion.
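
For reference, the request above can be assembled programmatically so that the parameter separator does not get lost. This is only a minimal Python sketch: the host, port and handler path are taken from the URL above, and the helper name `suggest_url` is made up for illustration.

```python
from urllib.parse import urlencode

# Base URL of the /suggest handler, taken from the request above.
SUGGEST_HANDLER = "http://localhost:8983/solr/suggest"


def suggest_url(prefix, default_field="autocomplete"):
    """Build a suggest request; urlencode joins the parameters with '&'."""
    params = {"q": prefix, "df": default_field}
    return SUGGEST_HANDLER + "?" + urlencode(params)


print(suggest_url("comp"))
```

Using `urlencode` also takes care of escaping prefixes that contain spaces or other reserved characters.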


In schema.xml the field is defined as follows:

<field name="autocomplete" type="textSpell" indexed="true" stored="false" multiValued="true" />

The textSpell type is:

<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>


The request handler is defined as follows:

<requestHandler name="/suggest" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">a_suggest</str>
    <str name="spellcheck.onlyMorePopular">true</str>
    <str name="spellcheck.count">5</str>
    <str name="spellcheck.collate">true</str>
  </lst>

  <attr name="components">
    <str>suggest</str>
  </attr>
</requestHandler>

The corresponding suggest component:

<searchComponent name="suggest" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">a_suggest</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.FSTLookup</str>
    <str name="field">autocomplete</str>
    <str name="buildOnOptimize">true</str>
    <int name="weightBuckets">100</int>
  </lst>
</searchComponent>
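
The handler and the component refer to each other by name, so a quick cross-check of those names can rule out one class of mistakes. The following Python sketch is only an illustration and not part of Solr. It assumes, as in the wiki example, that the component list is given in an `arr` element named `components`; the fragment is a trimmed stand-in for a real solrconfig.xml, and the function name `check_suggest_wiring` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed fragment modelled on the configuration above. The <config> root is
# only there so that the fragment parses as a single document.
SOLRCONFIG_FRAGMENT = """
<config>
  <requestHandler name="/suggest" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="spellcheck">true</str>
      <str name="spellcheck.dictionary">a_suggest</str>
    </lst>
    <arr name="components">
      <str>suggest</str>
    </arr>
  </requestHandler>
  <searchComponent name="suggest" class="solr.SpellCheckComponent">
    <lst name="spellchecker">
      <str name="name">a_suggest</str>
      <str name="field">autocomplete</str>
    </lst>
  </searchComponent>
</config>
"""


def check_suggest_wiring(xml_text, handler_name="/suggest"):
    """Return a list of problems found; an empty list means the names line up."""
    root = ET.fromstring(xml_text)
    handler = next((h for h in root.findall("requestHandler")
                    if h.get("name") == handler_name), None)
    if handler is None:
        return ["no requestHandler named " + handler_name]

    problems = []
    # Component names listed under <arr name="components">.
    listed = [s.text for s in handler.findall("arr[@name='components']/str")]
    defined = {c.get("name") for c in root.findall("searchComponent")}
    if not listed:
        problems.append("handler lists no components in <arr name='components'>")
    for name in listed:
        if name not in defined:
            problems.append("component '%s' is not defined" % name)

    # Dictionary requested by the handler vs. spellchecker names that exist.
    wanted = handler.findtext(
        "lst[@name='defaults']/str[@name='spellcheck.dictionary']")
    spellcheckers = {sc.findtext("str[@name='name']")
                     for comp in root.findall("searchComponent")
                     for sc in comp.findall("lst[@name='spellchecker']")}
    if wanted not in spellcheckers:
        problems.append("dictionary '%s' has no matching spellchecker" % wanted)
    return problems


print(check_suggest_wiring(SOLRCONFIG_FRAGMENT))
```

Running the same check against a real solrconfig.xml would require passing the file contents instead of the fragment.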

best regards,

Michael


Re: are stopwords indexed?

2012-07-16 Thread Michael Belenki
Hi Giovanni,

You entered the stopwords into the stopwords.txt file, right? But the
field type definition references stopwords_FR.txt.
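
A mismatch like this is easy to spot by listing the files the analyzers actually reference. The Python sketch below only illustrates that check: the fragment is a shortened copy of the field type quoted in the question, and the function name `referenced_stopword_files` is made up.

```python
import re

# Shortened copy of the field type from the question (query analyzer omitted).
FIELD_TYPE = """
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords_FR.txt" enablePositionIncrements="true"/>
  </analyzer>
</fieldType>
"""


def referenced_stopword_files(schema_fragment):
    """Collect every file named in a words="..." attribute."""
    return set(re.findall(r'words="([^"]+)"', schema_fragment))


edited_file = "stopwords.txt"  # the file the stopwords were added to
used = referenced_stopword_files(FIELD_TYPE)
if edited_file not in used:
    print(f"{edited_file} is not referenced; the analyzer reads {sorted(used)}")
```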

best regards,

Michael

On Mon, 16 Jul 2012 05:38:04 +0200, Giovanni Gherdovich
<g.gherdov...@gmail.com> wrote:
> Hi all,
>
> are stopwords from the stopwords.txt config file
> supposed to be indexed?
>
> I would say no, but this is the situation I am
> observing on my Solr instance:
>
> * I have a bunch of stopwords in stopwords.txt
> * my fields are of fieldType "text" from the example schema.xml,
>   i.e. I have
>
> -- -- >8 -- -- >8 -- -- >8 -- -- >8
> <fieldType name="text" class="solr.TextField"
>            positionIncrementGap="100">
>   <analyzer type="index">
>     [...]
>     <filter class="solr.StopFilterFactory"
>             ignoreCase="true"
>             words="stopwords_FR.txt"
>             enablePositionIncrements="true"
>     />
>     [...]
>   </analyzer>
>   <analyzer type="query">
>     [...]
>     <filter class="solr.StopFilterFactory"
>             ignoreCase="true"
>             words="stopwords_FR.txt"
>             enablePositionIncrements="true"
>     />
>   </analyzer>
> </fieldType>
> -- -- >8 -- -- >8 -- -- >8 -- -- >8
>
> * searching for a stopword through Solr always gives zero results
> * inspecting the index with LuCLI
>   http://manpages.ubuntu.com/manpages/natty/man1/lucli.1.html
>   shows that all stopwords are in my index. Note that I query
>   LuCLI specifying the field, i.e. with "myFieldName:and"
>   and not just with the stopword "and".
>
> Is this normal?
>
> Are stopwords indexed?
>
> Cheers,
> Giovanni


Re: Problem while indexing XML file with special characters represented uuml

2012-07-09 Thread Michael Belenki
Does anybody have an idea? Solr seems to ignore the DTD definition and
therefore does not understand entities like &uuml; or &auml; that are
defined in the DTD. Is that the problem? If so, how can I tell Solr to take
the DTD definition into account?
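
To illustrate what honouring the declaration would look like, here is a small Python sketch using the standard library parser. It does not say anything about how DataImportHandler configures its own parser; it only shows that a parser which reads the internal DTD subset expands &uuml;, while the same text without the declaration fails in much the same way as the import does. The sample strings are shortened from the document in the original message.

```python
import xml.etree.ElementTree as ET

# Entity declared in the internal DTD subset, as in the sample document.
WITH_DTD = """<!DOCTYPE dblp [
  <!ENTITY uuml "&#252;">
]>
<dblp><title>Solution of Cubic and Quartic Equations.&uuml;</title></dblp>"""

# The same element without any entity declaration.
WITHOUT_DTD = (
    "<dblp><title>Solution of Cubic and Quartic Equations.&uuml;</title></dblp>"
)


def title_of(xml_text):
    """Parse the document and return the text of its <title> element."""
    return ET.fromstring(xml_text).findtext("title")


# With the declaration, the reference is replaced by the declared character.
print(title_of(WITH_DTD))

# Without it, the parser rejects the undefined entity.
try:
    title_of(WITHOUT_DTD)
except ET.ParseError as err:
    print("parse failed:", err)
```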

On Fri, 06 Jul 2012 10:58:59 +0200, Michael Belenki <v...@belenki.name>
wrote:
> Dear community,
>
> I am experiencing a strange problem while trying to import an XML
> document into Solr via the DataImportHandler. The XML document contains
> some special characters (e.g. German ü) that are represented as the XML
> entities &uuml; or &auml;. There is also a DTD file that defines these
> entities (<!ENTITY uuml "&#252;">); I tried using the DTD file as well as
> including the DTD definition in the XML itself. After I start the
> full-import command, the import process throws an exception as soon as
> it tries to parse &uuml;: "Undeclared general entity uuml". Has anyone
> already faced such a problem?
>
> best regards,
>
> Michael
 
 
> My data-config for importing is:
>
> <dataConfig>
>   <dataSource type="FileDataSource" encoding="ISO-8859-1" />
>   <document>
>     <!-- stream should be true since huge xml document is being parsed -->
>     <entity name="article"
>             processor="XPathEntityProcessor"
>             stream="true"
>             forEach="/dblp/article"
>             url="documents/dblp.xml">
>
>       <field column="key" xpath="/dblp/article/@key" />
>       <field column="title" xpath="/dblp/article/title" />
>
>     </entity>
>   </document>
> </dataConfig>
 
> The XML file looks e.g. like this:
>
> <?xml version="1.0" encoding="ISO-8859-1"?>
>
> <!DOCTYPE dblp [
>
> <!ENTITY uuml "&#252;"> <!-- small u, dieresis or umlaut mark -->
> ]>
> <dblp>
>
> <article key="journals/fm/Riccardi09" mdate="2011-10-27">
> <author>Marco Riccardi</author>
> <title>Solution of Cubic and Quartic Equations.&uuml;</title>
> <pages>117-122</pages>
> <year>2009</year>
> <volume>17</volume>
>
> <journal>Formalized Mathematics</journal>
>
> <number>1-4</number>
> <ee>http://dx.doi.org/10.2478/v10037-009-0012-z</ee>
> <url>db/journals/fm/fm17.html#Riccardi09</url>
> </article></dblp>
 
> The stack-trace is:
>
> 05.07.2012 17:37:19 org.apache.solr.update.processor.LogUpdateProcessor finish
> INFO: {deleteByQuery=*:*,add=[persons/Codd71a, persons/Hall74]} 0 1
> 05.07.2012 17:37:19 org.apache.solr.common.SolrException log
> SCHWERWIEGEND: Full Import failed:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: Parsing failed for xml, url:documents/dblp.xml rows processed in this xml:2 last row in this xml:{title=Common Subexpression Identification in General Algebraic Systems., $forEach=/dblp/article, key=persons/Hall74} Processing Document # 3
>     at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:264)
>     at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:375)
>     at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:445)
>     at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:426)
> Caused by: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: Parsing failed for xml, url:documents/dblp.xml rows processed in this xml:2 last row in this xml:{title=Common Subexpression Identification in General Algebraic Systems., $forEach=/dblp/article, key=persons/Hall74} Processing Document # 3
>     at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:621)
>     at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:327)
>     at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:225)
>     ... 3 more
> Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: Parsing failed for xml, url:documents/dblp.xml rows processed in this xml:2 last row in this xml:{title=Common Subexpression Identification in General Algebraic Systems., $forEach=/dblp/article, key=persons/Hall74} Processing Document # 3
>     at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72)
>     at org.apache.solr.handler.dataimport.XPathEntityProcessor$3.next(XPathEntityProcessor.java:504)
>     at org.apache.solr.handler.dataimport.XPathEntityProcessor$3.next(XPathEntityProcessor.java:517)
>     at org.apache.solr.handler.dataimport.EntityProcessorBase.getNext(EntityProcessorBase.java:120)
>     at org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:225)
>     at org.apache.solr.handler.dataimport.XPathEntityProcessor.nextRow(XPathEntityProcessor.java:204)
>     at org.apache.solr.handler.dataimport.EntityProcessorWrapper.pullRow(EntityProcessorWrapper.java:330)
>     at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:296

Problem while indexing XML file with special characters represented uuml

2012-07-06 Thread Michael Belenki
Dear community,

I am experiencing a strange problem while trying to import an XML
document into Solr via the DataImportHandler. The XML document contains
some special characters (e.g. German ü) that are represented as the XML
entities &uuml; or &auml;. There is also a DTD file that defines these
entities (<!ENTITY uuml "&#252;">); I tried using the DTD file as well as
including the DTD definition in the XML itself. After I start the
full-import command, the import process throws an exception as soon as it
tries to parse &uuml;: "Undeclared general entity uuml". Has anyone
already faced such a problem?
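
One possible workaround, which nobody in this thread has confirmed, would be to rewrite the named entities as numeric character references before the import, since numeric references need no DTD. The Python sketch below assumes that the entity names in the file match the HTML entity table shipped with Python; the function name `named_to_numeric` is made up for illustration.

```python
import re
from html.entities import name2codepoint

# The five entities every XML parser knows without a DTD; leave them alone.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}


def named_to_numeric(text):
    """Rewrite &name; as &#NNN; for names found in the HTML entity table."""
    def repl(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # predefined or unknown: keep unchanged
        return "&#%d;" % name2codepoint[name]

    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", repl, text)


print(named_to_numeric("<title>Equations.&uuml; &amp; more</title>"))
```

For a file as large as dblp.xml the function would have to be applied line by line while copying the file, rather than to the whole content at once.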

best regards,

Michael


My data-config for importing is:

<dataConfig>
  <dataSource type="FileDataSource" encoding="ISO-8859-1" />
  <document>
    <!-- stream should be true since huge xml document is being parsed -->
    <entity name="article"
            processor="XPathEntityProcessor"
            stream="true"
            forEach="/dblp/article"
            url="documents/dblp.xml">

      <field column="key" xpath="/dblp/article/@key" />
      <field column="title" xpath="/dblp/article/title" />

    </entity>
  </document>
</dataConfig>
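
To make the intent of this configuration concrete, the Python sketch below produces the rows it asks for: one row per /dblp/article with the columns key and title. It is not how DataImportHandler works internally (the handler streams the file); it only mirrors the forEach and xpath mapping on a small sample without entities. The function name `rows` is made up.

```python
import xml.etree.ElementTree as ET

# Small sample shaped like the dblp.xml excerpt, without any entities.
SAMPLE = """<dblp>
  <article key="journals/fm/Riccardi09" mdate="2011-10-27">
    <author>Marco Riccardi</author>
    <title>Solution of Cubic and Quartic Equations.</title>
  </article>
</dblp>"""


def rows(xml_text):
    """Yield one dict per /dblp/article with the two configured columns."""
    root = ET.fromstring(xml_text)           # the root element is <dblp>
    for article in root.findall("article"):  # forEach="/dblp/article"
        yield {
            "key": article.get("key"),           # xpath="/dblp/article/@key"
            "title": article.findtext("title"),  # xpath="/dblp/article/title"
        }


print(list(rows(SAMPLE)))
```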

The XML file looks e.g. like this:

<?xml version="1.0" encoding="ISO-8859-1"?>

<!DOCTYPE dblp [

<!ENTITY uuml "&#252;"> <!-- small u, dieresis or umlaut mark -->
]>
<dblp>

<article key="journals/fm/Riccardi09" mdate="2011-10-27">
<author>Marco Riccardi</author>
<title>Solution of Cubic and Quartic Equations.&uuml;</title>
<pages>117-122</pages>
<year>2009</year>
<volume>17</volume>

<journal>Formalized Mathematics</journal>

<number>1-4</number>
<ee>http://dx.doi.org/10.2478/v10037-009-0012-z</ee>
<url>db/journals/fm/fm17.html#Riccardi09</url>
</article></dblp>

The stack-trace is:

05.07.2012 17:37:19 org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: {deleteByQuery=*:*,add=[persons/Codd71a, persons/Hall74]} 0 1
05.07.2012 17:37:19 org.apache.solr.common.SolrException log
SCHWERWIEGEND: Full Import failed:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: Parsing failed for xml, url:documents/dblp.xml rows processed in this xml:2 last row in this xml:{title=Common Subexpression Identification in General Algebraic Systems., $forEach=/dblp/article, key=persons/Hall74} Processing Document # 3
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:264)
    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:375)
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:445)
    at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:426)
Caused by: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: Parsing failed for xml, url:documents/dblp.xml rows processed in this xml:2 last row in this xml:{title=Common Subexpression Identification in General Algebraic Systems., $forEach=/dblp/article, key=persons/Hall74} Processing Document # 3
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:621)
    at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:327)
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:225)
    ... 3 more
Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: Parsing failed for xml, url:documents/dblp.xml rows processed in this xml:2 last row in this xml:{title=Common Subexpression Identification in General Algebraic Systems., $forEach=/dblp/article, key=persons/Hall74} Processing Document # 3
    at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72)
    at org.apache.solr.handler.dataimport.XPathEntityProcessor$3.next(XPathEntityProcessor.java:504)
    at org.apache.solr.handler.dataimport.XPathEntityProcessor$3.next(XPathEntityProcessor.java:517)
    at org.apache.solr.handler.dataimport.EntityProcessorBase.getNext(EntityProcessorBase.java:120)
    at org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:225)
    at org.apache.solr.handler.dataimport.XPathEntityProcessor.nextRow(XPathEntityProcessor.java:204)
    at org.apache.solr.handler.dataimport.EntityProcessorWrapper.pullRow(EntityProcessorWrapper.java:330)
    at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:296)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:683)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:619)
    ... 5 more
Caused by: java.lang.RuntimeException: com.ctc.wstx.exc.WstxParsingException: Undeclared general entity "uuml"
 at [row,col {unknown-source}]: [26,42]
at