track unused parts of config, schema
Hi, Our configs and schemas are quite big. Are there any tools, code snippets (in any language), or methodologies that people use for cleaning them up? By methodologies I mean, for example, things to look for that are almost always present and almost never used, so I can look at those first. Thanks, Bryan Rasmussen
getTransformer error
Hi, I am trying to transform the results using XSLT. I store my XSLTs in conf/xslt/ and call them in the query string with the parameters wt=xslt&tr=result.xsl, and I get back an error:

getTransformer fails in getContentType
java.lang.RuntimeException: getTransformer fails in getContentType
...
Caused by: java.io.IOException: Unable to initialize Templates 'result.xsl'
...
Caused by: javax.xml.transform.TransformerConfigurationException: Could not compile stylesheet

I'm supposing it is not an XSLT issue, as I am able to run the transformation via the command line with Xalan. Thanks, Bryan Rasmussen
Re: getTransformer error
OK, I guess it is a stylesheet problem after all, as a basic "hello world" stylesheet works. thanks, Bryan Rasmussen

On Fri, Jun 10, 2011 at 10:12 AM, bryan rasmussen rasmussen.br...@gmail.com wrote: Hi, I am trying to transform the results using XSLT. I store my XSLTs in conf/xslt/ and call them in the query string with the parameters wt=xslt&tr=result.xsl, and I get back an error:

getTransformer fails in getContentType
java.lang.RuntimeException: getTransformer fails in getContentType
...
Caused by: java.io.IOException: Unable to initialize Templates 'result.xsl'
...
Caused by: javax.xml.transform.TransformerConfigurationException: Could not compile stylesheet

I'm supposing it is not an XSLT issue, as I am able to run the transformation via the command line with Xalan. Thanks, Bryan Rasmussen
solr 3.1 java.lang.NoClassDefFoundError org/carrot2/core/ControllerFactory
    <str name="hl.fl">all_text title</str>
    <!-- for this field, we want no fragmenting, just highlighting -->
    <str name="f.name.hl.fragsize">150</str>
  </lst>
  <arr name="last-components">
    <str>clustering</str>
  </arr>
</requestHandler>

with the following command to start Solr:

java -Dsolr.clustering.enabled=true -Dsolr.solr.home=C:\projects\solrexample\solr -jar start.jar

Any idea as to why clusty is not working? Thanks, Bryan Rasmussen
clustering problems on 3.1
I added the following to my configuration:

<lib dir="c:/projects/solrtest/dist/" regex="apache-solr-clustering-.*\.jar" />

<requestHandler name="clusty" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <bool name="clustering">true</bool>
    <str name="clustering.engine">default</str>
    <bool name="clustering.results">true</bool>
    <!-- Fields to cluster on -->
    <str name="carrot.title">title</str>
    <str name="carrot.snippet">all_text</str>
    <str name="hl.fl">all_text title</str>
    <!-- for this field, we want no fragmenting, just highlighting -->
    <str name="f.name.hl.fragsize">150</str>
  </lst>
  <arr name="last-components">
    <str>clustering</str>
  </arr>
</requestHandler>

<searchComponent class="org.apache.solr.handler.clustering.ClusteringComponent" name="clustering">
  <lst name="engine">
    <str name="name">default</str>
    <str name="carrot.algorithm">org.carrot2.clustering.lingo.LingoClusteringAlgorithm</str>
    <!-- Engine-specific parameters -->
    <str name="LingoClusteringAlgorithm.desiredClusterCountBase">20</str>
  </lst>
</searchComponent>

which ended up with the message

java.lang.NoClassDefFoundError: org/carrot2/core/ControllerFactory

Whenever I made a request I got a 404 response back, and this appeared in my console:

SEVERE: REFCOUNT ERROR: unreferenced org.apache.solr.SolrCore@14db38a4 (core1) has a reference count of 1

Any suggestions? Thanks, Bryan Rasmussen
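A NoClassDefFoundError generally means the named class was not on the classpath when it was needed. The lib directive above only picks up the clustering jar from dist/, so one thing worth checking is whether any loaded jar actually contains org.carrot2.core.ControllerFactory. The following is a minimal sketch for that check; the function name and the directories you pass in are assumptions for illustration, not part of Solr.

```python
import zipfile
from pathlib import Path


def jars_containing(class_name, directories):
    """Return the jar files in the given directories that contain class_name."""
    # A class a.b.C is stored inside a jar as the entry a/b/C.class
    entry = class_name.replace(".", "/") + ".class"
    matches = []
    for directory in directories:
        for jar in sorted(Path(directory).glob("*.jar")):
            try:
                with zipfile.ZipFile(jar) as archive:
                    if entry in archive.namelist():
                        matches.append(str(jar))
            except zipfile.BadZipFile:
                # Skip files that are named *.jar but are not valid archives
                continue
    return matches
```

Calling jars_containing("org.carrot2.core.ControllerFactory", [...]) with the directories named in the lib directives shows whether the Carrot2 classes are reachable. If the list comes back empty, the jar that provides them (possibly shipped in a separate contrib library directory, depending on the distribution layout) probably needs its own lib directive.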
Re: Solr vs ElasticSearch
Well, I recently chose it for a personal project, and the deciding factor for me was that it had nice integration with CouchDB. Thanks, Bryan Rasmussen

On Wed, Jun 1, 2011 at 4:33 AM, Mark static.void@gmail.com wrote: I've been hearing more and more about ElasticSearch. Can anyone give me a rough overview of how these two technologies differ? What are the strengths/weaknesses of each? Why would one choose one over the other? Thanks
Re: HTMLStripTransformer will remove the content in XML??
I would expect that it doesn't understand CDATA and treats everything between < and > as a 'tag'. Best Regards, Bryan Rasmussen

On Fri, May 27, 2011 at 9:41 AM, Ellery Leung elleryle...@be-o.com wrote: I have an XML string like this:

<?xml version="1.0" encoding="UTF-8"?><language><intl><![CDATA[hello]]></intl><loc><![CDATA[solr ]]></loc></language>

By using HTMLStripTransformer, I expect to get 'hello,solr'. But actually this transformer removes ALL THE TEXT INSIDE! Did I do something silly, or is it a bug? Thank you
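The guess above can be illustrated with a small sketch. The naive_strip function below is a deliberately simple stand-in that deletes everything from a '<' to the next '>'; it is an assumption made for illustration and not HTMLStripTransformer's actual code. Because a CDATA section starts with '<' and ends with '>', such a stripper swallows the text along with the markup, while an XML parser reads the CDATA content as text.

```python
import re
import xml.etree.ElementTree as ET

SAMPLE = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    "<language><intl><![CDATA[hello]]></intl>"
    "<loc><![CDATA[solr ]]></loc></language>"
)


def naive_strip(markup):
    """Delete everything from a '<' up to the next '>' (illustrative only)."""
    return re.sub(r"<[^>]*>", "", markup)


def text_via_parser(markup):
    """Parse the XML first, so that CDATA sections are read as element text."""
    root = ET.fromstring(markup.encode("utf-8"))
    return ",".join(child.text.strip() for child in root)
```

With the sample string, naive_strip(SAMPLE) returns an empty string, whereas text_via_parser(SAMPLE) returns 'hello,solr'. If the transformer behaves like the naive version, extracting the text with an XML-aware step before stripping would be one way around it.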
Re: problem in setting field attribute in schema.xml
ya...but when i set indexed=false for a particular field, and i search as *:* then it will search all documents, that's true, but what i think is it should not contain the field which i set as indexed=true. for example in a document fields are id, author, title. and for author field i set indexed=false, then author should not be indexed and when i perform search as *:* it should show all documents as

<doc>
  <string name="id">id1</string>
  <string name="title">t1</string>
  <string name="author">a1</string>
</doc>

Well, since I am only a beginner myself, I can only say what my experience is. Given that I have cleared my index, restarted, reindexed with the new schema settings and restarted again (which is probably overdoing it): if the schema I indexed with says indexed=false, stored=true for author and I search for author:a1, then I get 0 results, as I expect. If I search for id:id1, then it shows

<doc>
  <string name="id">id1</string>
  <string name="title">t1</string>
  <string name="author">a1</string>
</doc>

as I expect. Is this what is happening for you? If it is, and you are confused as to why, I can't explain it on a technical level. I assume it comes from design decisions which, I agree, don't seem sensible to me, but which are very probably based on some underlying technical reason that I am not familiar with.

If you want to make sure that you only see id and title in your result, then either set stored=false for author (although I don't know why you would have a field that is neither stored nor indexed) or use the fl parameter on your request to list the fields you want returned. For example, fl=id,title in the query string should mean you just see

<string name="id">id1</string>
<string name="title">t1</string>

and not

<string name="author">a1</string>

Best Regards, Bryan Rasmussen
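The fl suggestion can be sketched as a small URL builder. The base URL below mirrors the localhost addresses used elsewhere in these threads and is an assumption, as is the helper name; q and fl are the standard Solr query parameters.

```python
from urllib.parse import urlencode


def select_url(base, query, fields):
    """Build a select URL that asks Solr to return only the listed fields."""
    params = {"q": query, "fl": ",".join(fields)}
    # urlencode percent-escapes characters such as '*', ':' and ','
    return base + "?" + urlencode(params)


url = select_url("http://localhost:8983/solr/tester/select", "*:*", ["id", "title"])
```

The resulting request matches every document but should return only the id and title fields, regardless of which other fields are stored.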
Re: problem in setting field attribute in schema.xml
From my experience, if it is indexing content that you have told it not to index, that is because you haven't cleared your old indexed content. If you index something using schema version 5, which says indexed=true, and then you change it to indexed=false, you have to delete your old indexed content and reindex using the new schema, with lots of stopping and restarting involved. So: delete the index, restart with the new schema, index the content with the new schema. Best Regards, Bryan Rasmussen

On Thu, May 26, 2011 at 11:24 AM, Romi romijain3...@gmail.com wrote: thanks a lot bryan: it might be a repetition again, but i just want to know WHY it is indexing the field when it is indexed=false. what if stored=true? it is clearly written in the documentation that a field is searchable only if it is indexed=true, which surely makes sense. and my application is not asking for this, i am just experimenting with solr to learn it. i want to clear up my concepts about indexing. Thanks Romi

- Romi
-- View this message in context: http://lucene.472066.n3.nabble.com/problem-in-setting-field-attribute-in-schema-xml-tp2984126p2988066.html Sent from the Solr - User mailing list archive at Nabble.com.
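The "delete index" step can be done through Solr's XML update messages instead of removing files by hand. The sketch below only builds the two message bodies; the helper names are assumptions, and sending them (for example by POSTing them with a text/xml content type to the core's update handler, such as http://localhost:8983/solr/tester/update on a default local setup) is left out.

```python
from xml.sax.saxutils import escape


def delete_by_query(query="*:*"):
    """XML update message asking Solr to delete every document matching query."""
    return "<delete><query>" + escape(query) + "</query></delete>"


def commit():
    """XML update message that makes the preceding changes visible to searches."""
    return "<commit/>"
```

Posting delete_by_query() followed by commit() empties the index; after that, restarting with the new schema and reindexing follows the order described above.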
Re: problem in setting field attribute in schema.xml
Well, I'm probably being overly cautious here, but it's been my experience that if I have a schema that says indexed=true on a field and I change it to indexed=false, I have to delete my index to get rid of everything that was indexed with the old schema, and I have to restart to be able to index with the new schema. A number of times I have changed the indexing rule for a field without following these steps and been surprised when my index did not behave as I expected, and it seems like you are experiencing the same thing. Best Regards, Bryan Rasmussen
Re: problem in setting field attribute in schema.xml
On Thu, May 26, 2011 at 2:10 PM, Romi romijain3...@gmail.com wrote: did u mean when i set indexed=false and store=true, solr does not index the field's value but store its value as it is???

Yes. So you can get back the values of all stored fields even if your search actually only finds matches in indexed fields. It does seem somewhat counter-intuitive. Best Regards, Bryan Rasmussen
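The distinction can be pictured with a toy model. This is only a sketch of the idea, with invented class and attribute names; it is not how Lucene or Solr are implemented. Indexed fields feed a lookup structure used for matching, stored fields are kept alongside for display, and the two are independent.

```python
from collections import defaultdict


class ToyIndex:
    """Toy model of indexed vs. stored fields (illustration only)."""

    def __init__(self, schema):
        # schema maps field name -> {"indexed": bool, "stored": bool}
        self.schema = schema
        self.postings = defaultdict(set)  # (field, value) -> document numbers
        self.stored = []                  # per document: its stored fields only

    def add(self, doc):
        doc_no = len(self.stored)
        for field, value in doc.items():
            if self.schema[field]["indexed"]:
                self.postings[(field, value)].add(doc_no)
        self.stored.append({f: v for f, v in doc.items() if self.schema[f]["stored"]})

    def search(self, field, value):
        # Matching consults only the postings; results show only stored fields
        return [self.stored[n] for n in sorted(self.postings.get((field, value), ()))]


schema = {
    "id": {"indexed": True, "stored": True},
    "title": {"indexed": True, "stored": True},
    "author": {"indexed": False, "stored": True},
}
index = ToyIndex(schema)
index.add({"id": "id1", "title": "t1", "author": "a1"})
```

Here index.search("author", "a1") finds nothing because author is not indexed, while index.search("id", "id1") returns the document including its author value because author is stored.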
Re: problem in setting field attribute in schema.xml
If you never want to see a field's value in the results, set stored=false. Best Regards, Bryan Rasmussen

On Wed, May 25, 2011 at 2:37 PM, Romi romijain3...@gmail.com wrote: In my schema.xml file i made a field attribute indexed=false and stored=true, i.e. i am not indexing this field, but i am still getting values for this field in my search results. why is that, any idea? - Romi
Re: problem in setting field attribute in schema.xml
Surely it indexes the data if you set indexed=true. If you put some data in the field that is unique to that document and then search for it, do you get the document? If not, then it is because the field is not indexed. If you search on another field in the same document and still see the non-indexed field in the result, it is because the non-indexed field is stored. Best Regards, Bryan Rasmussen

On Wed, May 25, 2011 at 3:11 PM, Romi romijain3...@gmail.com wrote: if i do stored=false then it indexes the data but does not show the data in the search result. but in my case i do not want to index the data for a field, and to my surprise even if i set indexed=false for this field, i am still able to get that data through the query *:* but not if i run a filter query as field:value. it's really confusing what solr is doing. - Romi
I only want to return a fields value in certain cases, how is this done
Let us say I have 3 fields that I index: f1, f2, f3. f1 and f2 are copied to f4, and f4 is the default search field. There is a value that is found in f2 and f3. When I am searching in f3, I want to return only f3 and nothing else. When I am searching in f4, I do not want to return f3, and I only want to return f1 if it contains the value that the search matched. Is this doable? Can you show me an example? Thanks, Bryan Rasmussen
I need to improve highlighting
Hi, If I do a search

http://localhost:8983/solr/tester/select/?q=kongeriget&hl=true

then in the <lst name="highlighting"> subtree I get

  <arr name="all_text">
    <str>Aftale mellem <em>kongeriget</em> Danmark og <em>kongeriget</em> Sverige</str>
  </arr>
</lst>

What I need to do is either:

1. Return all of all_text, which should be possible by setting hl.fragsize=0, but I still never get beyond the default for the field (I can go below 100 but not above).
2. Get a count of the number of highlighted instances (preferable), or return each highlighted text in a separate str element, like <str>kongeriget</str><str>kongeriget</str>

thanks, Bryan Rasmussen
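For the second option, one approach that needs no Solr-side change is to count the highlight markers on the client after the response has been parsed. This is a minimal sketch under the assumption that each snippet arrives as a plain string containing literal <em>...</em> markers (the default highlighting tags shown above); the function name is invented for illustration.

```python
import re

EM = re.compile(r"<em>(.*?)</em>", re.DOTALL)


def highlighted_terms(snippet):
    """Return the text of every <em>...</em> span in one highlighting snippet."""
    return EM.findall(snippet)


snippet = "Aftale mellem <em>kongeriget</em> Danmark og <em>kongeriget</em> Sverige"
terms = highlighted_terms(snippet)
```

len(terms) gives the number of highlighted instances in the snippet, and the list itself is the "one entry per highlighted text" form asked for above.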
Re: I need to improve highlighting
Bryan, on Q2 - what about using xpath like 'str/em'?

How do I do that? The highlighting result, at least in the Solr installation I have (3.something), returns the em as escaped markup. Is there an xpath parameter or configuration I can set for highlighting, or a way to change the em elements into actual elements (hl.formatter, maybe)? Thanks, Bryan Rasmussen

On Wed, May 18, 2011 at 2:25 PM, bryan rasmussen rasmussen.br...@gmail.com wrote: Hi, If I do a search

http://localhost:8983/solr/tester/select/?q=kongeriget&hl=true

then in the <lst name="highlighting"> subtree I get

  <arr name="all_text">
    <str>Aftale mellem <em>kongeriget</em> Danmark og <em>kongeriget</em> Sverige</str>
  </arr>
</lst>

What I need to do is either:

1. Return all of all_text, which should be possible by setting hl.fragsize=0, but I still never get beyond the default for the field (I can go below 100 but not above).
2. Get a count of the number of highlighted instances (preferable), or return each highlighted text in a separate str element, like <str>kongeriget</str><str>kongeriget</str>

thanks, Bryan Rasmussen
Re: I need to improve highlighting
Yeah, but you just got me to check again. What I thought was Solr ignoring my hl.fragsize setting and always using the default turned out to be a smaller field that was ranked higher. When I set it to 1000 and saw the same thing as with 100, it was just the off chance that there were only 100 characters to see in the first 10 results. Funny. thanks, Bryan Rasmussen

On Wed, May 18, 2011 at 2:59 PM, Erick Erickson erickerick...@gmail.com wrote: Just checking, but have you tried setting hl.fragsize=<very large number> as suggested here: http://wiki.apache.org/solr/HighlightingParameters#hl.fragsize ? If that's not the problem, please show us the results of attaching &debugQuery=on to the request; that may shed some light on the problem. Best Erick

On Wed, May 18, 2011 at 8:25 AM, bryan rasmussen rasmussen.br...@gmail.com wrote: Hi, If I do a search

http://localhost:8983/solr/tester/select/?q=kongeriget&hl=true

then in the <lst name="highlighting"> subtree I get

  <arr name="all_text">
    <str>Aftale mellem <em>kongeriget</em> Danmark og <em>kongeriget</em> Sverige</str>
  </arr>
</lst>

What I need to do is either:

1. Return all of all_text, which should be possible by setting hl.fragsize=0, but I still never get beyond the default for the field (I can go below 100 but not above).
2. Get a count of the number of highlighted instances (preferable), or return each highlighted text in a separate str element, like <str>kongeriget</str><str>kongeriget</str>

thanks, Bryan Rasmussen
indexing xml attributes?
Hi, As I understand it, the DIH XPathEntityProcessor will not allow me to index attributes, like so:

<field column="ID" xpath="/ARTIKEL/@ID" />

So if I want to index attributes, should I pre-process the documents into the format that Solr indexes normally and place the value of the ID into a field? Thanks, Bryan Rasmussen
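If pre-processing did turn out to be necessary, it could look roughly like the sketch below, which converts one source document into Solr's XML add format and copies the ID attribute into a field. The target field names id and title, and the function name, are assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET


def to_add_message(source_xml):
    """Turn one ARTIKEL document into a Solr <add> message with id and title fields."""
    artikel = ET.fromstring(source_xml)
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    # The attribute value becomes an ordinary field in the add message
    ET.SubElement(doc, "field", name="id").text = artikel.get("ID")
    ET.SubElement(doc, "field", name="title").text = artikel.findtext("DOKTITEL/OVERSKRIFT1")
    return ET.tostring(add, encoding="unicode")


source = (
    '<ARTIKEL ID="MM2010FORORD"><DOKTITEL>'
    "<OVERSKRIFT1>Forord (MomsManual)</OVERSKRIFT1>"
    "</DOKTITEL></ARTIKEL>"
)
message = to_add_message(source)
```

The resulting string can be posted to the update handler like any other add message, so the attribute is indexed without relying on attribute support in the import configuration.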
Re: indexing xml attributes?
Ah, never mind. I had to restart my instance for my changes to the data importer configuration to register. thanks, Bryan Rasmussen

On Tue, May 17, 2011 at 12:19 PM, bryan rasmussen rasmussen.br...@gmail.com wrote: Hi, As I understand it, the DIH XPathEntityProcessor will not allow me to index attributes, like so:

<field column="ID" xpath="/ARTIKEL/@ID" />

So if I want to index attributes, should I pre-process the documents into the format that Solr indexes normally and place the value of the ID into a field? Thanks, Bryan Rasmussen
How much does Solr enterprise server differ from the non Enterprise server?
I am asking specifically because I am wondering whether it is worth my time to read the Enterprise server book, or whether the two have diverged too much. If I read the book, are there any parts of it specifically that won't be relevant? Thanks, Bryan Rasmussen
Re: How much does Solr enterprise server differ from the non Enterprise server?
OK, I just saw the thing about syncing the version numbers. Is there any information on these Solr 3.1 books? Publishers, publication dates, websites for them? Best regards, Bryan Rasmussen

On Thu, May 5, 2011 at 10:57 AM, Jan Høydahl jan@cominvent.com wrote: Hi, Solr IS an enterprise search server. And there is only one edition :) I'd wait a few more weeks until the Solr 3.1 books are available, and then read up on it. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com

On May 5, 2011, at 09:37, bryan rasmussen wrote: I am asking specifically because I am wondering whether it is worth my time to read the Enterprise server book, or whether the two have diverged too much. If I read the book, are there any parts of it specifically that won't be relevant? Thanks, Bryan Rasmussen
testing of stemming
Hi, I have a large number of queries that I want to test stemming on. Is there a free-standing library I can run them against, without the overhead of an HTTP request for each one? Thanks, Bryan Rasmussen
Re: testing of stemming
Maybe not a library, but a command-line tool would be good: something I can write code against or automate via script, to test that when I ask for the word virksomhed in Danish it would also return virksomhederne and other variations. I guess I was hoping for something similar to a WordNet of stems, but at worst I would be fine with checking specifically against my index. I just didn't want to automate the browser to do it, as I figured that would be extra performance-intensive. Thanks, Bryan Rasmussen

On Tue, Apr 19, 2011 at 5:19 PM, Erick Erickson erickerick...@gmail.com wrote: I'm not sure what a free-standing library would look like. Do you want it to check that all the terms in your index are stemmed correctly (or at least as expected)? You have a bunch of queries. How would such a library test them against your corpus? There's not enough information here to give a meaningful answer. Best Erick

On Tue, Apr 19, 2011 at 11:15 AM, bryan rasmussen rasmussen.br...@gmail.com wrote: Hi, I have a large number of queries that I want to test stemming on. Is there a free-standing library I can run them against, without the overhead of an HTTP request for each one? Thanks, Bryan Rasmussen
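The kind of check described here can be scripted without touching the index by grouping word forms according to the stem a stemmer assigns them. The sketch below is a test harness only: toy_danish_stem is a made-up suffix stripper used as a placeholder and is NOT the Danish stemming algorithm that Solr would apply; in practice a real stemmer function would be passed in through the stem argument.

```python
from collections import defaultdict


def toy_danish_stem(word):
    """Toy suffix stripper for illustration only; not a real Danish stemmer."""
    word = word.lower()
    for suffix in ("erne", "ens", "er", "en", "e"):
        # Keep at least three characters so short words are left alone
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def group_by_stem(words, stem=toy_danish_stem):
    """Group words that the given stem function reduces to the same stem."""
    groups = defaultdict(list)
    for word in words:
        groups[stem(word)].append(word)
    return dict(groups)


groups = group_by_stem(["virksomhed", "virksomhederne", "virksomheder", "virksomheden"])
```

Words that end up in the same group would match each other at query time if the same stemmer runs at index and query time, so a list of expected variations can be checked in bulk from a script.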
Re: testing of stemming
That looks like a good starting point. thanks, bryan rasmussen

2011/4/19 François Schiettecatte fschietteca...@gmail.com: I would start here: http://snowball.tartarus.org/ François

On Apr 19, 2011, at 11:15 AM, bryan rasmussen wrote: Hi, I have a large number of queries that I want to test stemming on. Is there a free-standing library I can run them against, without the overhead of an HTTP request for each one? Thanks, Bryan Rasmussen
all searches return 0 hits - what have I done wrong?
Hi, I am starting my Solr instance with the command

java -Dsolr.solr.home=./test1/solr/ -jar start.jar

where I have a solr.xml file:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<solr sharedLib="lib" persistent="true">
  <cores adminPath="/admin/cores">
    <core default="false" instanceDir="tester" name="tester"/>
  </cores>
</solr>

In the folder tester I have configurations adapted from the rss examples.

DataImporter.xml:

<dataConfig>
  <dataSource name="myfilereader" type="FileDataSource"/>
  <document>
    <entity name="jc" rootEntity="false" dataSource="null" processor="FileListEntityProcessor" fileName="^.*\.xml$" recursive="true" baseDir="/projects/solrtest/transformedimport">
      <entity name="x" rootEntity="true" dataSource="myfilereader" processor="XPathEntityProcessor" url="${jc.fileAbsolutePath}" stream="false" forEach="/ARTIKEL" transformer="DateFormatTransformer,TemplateTransformer,RegexTransformer,LogTransformer" logTemplate="processing ${jc.fileAbsolutePath}" logLevel="info">
        <field column="title" xpath="/DOKTITEL/OVERSKRIFT1" />
        <field column="text" xpath="/AKROP/TXT" />
      </entity>
    </entity>
  </document>
</dataConfig>

solrconfig.xml: same as the rss example, only with the elevate components removed.

schema.xml:

<fields>
  <field name="title" type="text" indexed="true" stored="true" />
  <field name="txt" type="text" indexed="true" stored="true" />
  <field name="all_text" type="text" indexed="true" stored="true" multiValued="true" />
  <copyField source="title" dest="all_text" />
  <copyField source="txt" dest="all_text" />
</fields>

I also removed the uniqueKey constraint.

When I go to http://localhost:8983/solr/tester/admin/ I get the admin page. When I run http://localhost:8983/solr/tester/dataimport?command=full-import it says

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">16</int>
  </lst>
  <lst name="initArgs">
    <lst name="defaults">
      <str name="config">dataimporter.xml</str>
    </lst>
  </lst>
  <str name="command">full-import</str>
  <str name="status">idle</str>
  <str name="importResponse"/>
  <lst name="statusMessages"/>
  <str name="WARNING">This response format is experimental. It is likely to change in the future.</str>
</response>

When I look at the log of that, it says a bunch of stuff like:

INFO: processing c:\projects\solrtest\transformed\1.xml
org.apache.solr.common.util.XMLErrorLogger report
WARNING: XmL parser reported xml declaration in null, line 1, column 38: Inconsistent text encoding; declared as utf-8 in xml declaration, application had passed Cp1252

Here is one of the processed documents:

<?xml version="1.0" encoding="utf-8" ?>
<ARTIKEL ID="MM2010ADMINISTRATIONSYDELSER">
  <DOKTITEL>
    <OVERSKRIFT1>Administrationsydelser (MomsManual)</OVERSKRIFT1>
  </DOKTITEL>
  <AKROP>
    <TXT>Administrationsydelser er momspligtige. Dette gælder også når de faktureres koncerninternt, f.eks. fra et moderselskab (holdingselskab) til et datterselskab.</TXT>
    <TXT>Der er fradragsret for moms vedrørende køb af administrationsydelser i samme omfang, som virksomheden kan fratrække momsen af øvrige fællesomkostninger.</TXT>
    <TXT>Hvis administrationsydelser faktureres på tværs af landegrænserne, f.eks. indenfor internationale koncerner, kan der gælde forskellige principper for momsberegningen i de enkelte EU-lande. Hvis en administrationsydelse faktureres fra Danmark til et datterselskab i et andet land, herunder også i andre EU-lande, er det myndighedernes holdning, at der skal faktureres med dansk moms.</TXT>
    <TXT>Hvis en administrationsydelse faktureres mellem et selskab og dets filial/-er, skal faktura altid udstedes uden moms. Handel med ydelser mellem et selskab og dets filial/-er anses ikke for at udgøre momspligtige transaktioner.</TXT>
    <TXTO>Regler</TXTO>
    <TXT><LR IDREF="LBKG2005966.§15" CREATOR="autolink" TARGETTYPE="RELML">§ 15</LR></TXT>
  </AKROP>
</ARTIKEL>

If I search for the text Administrationsydelser

http://localhost:8983/solr/tester/select/?q=Administrationsydelser&version=2.2&start=0&rows=10&indent=on

I get

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
    <lst name="params">
      <str name="indent">on</str>
      <str name="start">0</str>
      <str name="q">Administrationsydelser</str>
      <str name="version">2.2</str>
      <str name="rows">10</str>
    </lst>
  </lst>
  <result name="response" numFound="0" start="0"/>
</response>

There is a segments.gen and a segments_4 file in my index but nothing else. I tried looking with Luke, but it seems not to be compatible with the newest versions of Lucene. The version of Solr is 3.1.0. Thanks, Bryan Rasmussen
Re: all searches return 0 hits - what have I done wrong?
Also, if I check solr/tester/dataimport it responds:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
  </lst>
  <lst name="initArgs">
    <lst name="defaults">
      <str name="config">dataimporter.xml</str>
    </lst>
  </lst>
  <str name="status">idle</str>
  <str name="importResponse"/>
  <lst name="statusMessages">
    <str name="Total Requests made to DataSource">0</str>
    <str name="Total Rows Fetched">1634</str>
    <str name="Total Documents Skipped">0</str>
    <str name="Full Dump Started">2011-04-18 11:55:47</str>
    <str name="">Indexing completed. Added/Updated: 0 documents. Deleted 0 documents.</str>
    <str name="Committed">2011-04-18 11:55:48</str>
    <str name="Optimized">2011-04-18 11:55:48</str>
    <str name="Total Documents Processed">0</str>
    <str name="Time taken">0:0:0.922</str>
  </lst>
  <str name="WARNING">This response format is experimental. It is likely to change in the future.</str>
</response>

On Mon, Apr 18, 2011 at 11:46 AM, bryan rasmussen rasmussen.br...@gmail.com wrote: Hi, I am starting my solr instance with the command java -Dsolr.solr.home=./test1/solr/ -jar start.jar [...]
Re: all searches return 0 hits - what have I done wrong?
Hah, actually I tried with complete xpaths earlier, but they weren't working because I had made a mistake in my forEach. Then I decided that the forEach and the other xpaths were probably being concatenated. However, it is not completely correct yet. If I run

http://localhost:8983/solr/tester/dataimport?command=full-import&debug=true

I get

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">422</int>
  </lst>
  <lst name="initArgs">
    <lst name="defaults">
      <str name="config">dataimporter.xml</str>
    </lst>
  </lst>
  <str name="command">full-import</str>
  <str name="mode">debug</str>
  <arr name="documents">
    <lst><arr name="title"><str>Forord (MomsManual)</str></arr></lst>
    <lst><arr name="title"><str>Abonnementsudgifter (MomsManual)</str></arr></lst>
    <lst><arr name="title"><str>Ab skf (MomsManual)</str></arr></lst>
    <lst><arr name="title"><str>Acontobeløb (MomsManual)</str></arr></lst>
    <lst><arr name="title"><str>Adgang til arrangementer (MomsManual)</str></arr></lst>
    <lst><arr name="title"><str>Administration, fast ejendom (MomsManual)</str></arr></lst>
    <lst><arr name="title"><str>Administrationsfællesskab (MomsManual)</str></arr></lst>
    <lst><arr name="title"><str>Administrationsydelser (MomsManual)</str></arr></lst>
    <lst><arr name="title"><str>Adsl (MomsManual)</str></arr></lst>
    <lst><arr name="title"><str>Advokatomkostninger (MomsManual)</str></arr></lst>
    <lst><arr name="title"><str>Afbestillingsgebyrer (MomsManual)</str></arr></lst>
  </arr>
  <lst name="verbose-output"/>
  <str name="status">idle</str>
  <str name="importResponse">Configuration Re-loaded sucessfully</str>
  <lst name="statusMessages">
    <str name="Total Requests made to DataSource">0</str>
    <str name="Total Rows Fetched">22</str>
    <str name="Total Documents Skipped">0</str>
    <str name="Full Dump Started">2011-04-18 12:26:52</str>
    <str name="">Indexing completed. Added/Updated: 11 documents. Deleted 0 documents.</str>
    <str name="Total Documents Processed">11</str>
    <str name="Time taken">0:0:0.406</str>
  </lst>
  <str name="WARNING">This response format is experimental. It is likely to change in the future.</str>
</response>

So the title fields

<field column="title" xpath="/ARTIKEL/DOKTITEL/OVERSKRIFT1" />

are being added, but not the text fields

<field column="text" xpath="/ARTIKEL/AKROP/TXT" />

The most salient difference between these two is that there will be more than one TXT. I just tried with the parent element, however, and it didn't do anything. But when I do a search for MomsManual, which you can see is in all the title fields, I get

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
    <lst name="params">
      <str name="indent">on</str>
      <str name="start">0</str>
      <str name="q">MomsManual</str>
      <str name="version">2.2</str>
      <str name="rows">10</str>
    </lst>
  </lst>
  <result name="response" numFound="0" start="0"/>
</response>

:( Thanks, Bryan Rasmussen

On Mon, Apr 18, 2011 at 12:23 PM, lboutros boutr...@gmail.com wrote: Did you try with the complete xpath?

<field column="title" xpath="/ARTIKEL/DOKTITEL/OVERSKRIFT1" />
<field column="text" xpath="/ARTIKEL/AKROP/TXT" />

Ludovic. - Jouve France.
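The effect of using complete xpaths can be illustrated outside Solr. The sketch below is not how the XPathEntityProcessor is implemented; it is a small stand-in, with an invented function name, that matches elements by their full path from the document root. A path written relative to the forEach element finds nothing, while the complete path does.

```python
import xml.etree.ElementTree as ET


def texts_at(xml_text, absolute_path):
    """Collect the text of every element whose path from the root equals absolute_path."""
    found = []

    def walk(element, path):
        path = path + "/" + element.tag
        if path == absolute_path and element.text:
            found.append(element.text.strip())
        for child in element:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return found


doc = (
    '<ARTIKEL ID="x"><DOKTITEL><OVERSKRIFT1>Forord (MomsManual)</OVERSKRIFT1></DOKTITEL>'
    "<AKROP><TXT>first</TXT><TXT>second</TXT></AKROP></ARTIKEL>"
)
```

texts_at(doc, "/DOKTITEL/OVERSKRIFT1") returns an empty list, texts_at(doc, "/ARTIKEL/DOKTITEL/OVERSKRIFT1") returns the title, and texts_at(doc, "/ARTIKEL/AKROP/TXT") returns two values, which is why a field fed from that path would need to accept more than one value per document.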
Re: all searches return 0 hits - what have I done wrong?
/>
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
<filter class="solr.PorterStemFilterFactory"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
</fieldtype>
</types>
<fields>
  <field name="title" type="text" indexed="true" stored="true" />
  <field name="txt" type="text" indexed="true" stored="true" />
  <field name="all_text" type="text" indexed="true" stored="true" multiValued="true" />
  <copyField source="title" dest="all_text" />
  <copyField source="txt" dest="all_text" />
</fields>
<defaultSearchField>all_text</defaultSearchField>
<solrQueryParser defaultOperator="AND"/>
</schema>

The protwords.txt and stopwords.txt are also from the rss example. thanks, Bryan Rasmussen

On Mon, Apr 18, 2011 at 12:55 PM, lboutros boutr...@gmail.com wrote: If a document contains multiple 'txt' fields, the field should be marked as 'multiValued':

<field name="txt" type="text" indexed="true" stored="true" multiValued="true"/>

But if I understand correctly, you also tried this?

<field column="text" xpath="/ARTIKEL/AKROP" />

And for your search (MomsManual), could you give us your analyzer from the schema.xml please? Ludovic. - Jouve France.
Re: all searches return 0 hits - what have I done wrong?
Hmm, ok I see the schema was wrong - I was calling the TEXT field txt... also now I am getting results on my title search after another restart and reindex - setting the TXT fields to be multiValued.

Thanks, Bryan Rasmussen

On Mon, Apr 18, 2011 at 1:09 PM, bryan rasmussen rasmussen.br...@gmail.com wrote:

well basically I copied out the RSS example as I figured that would be the closest to what I wanted to do

<?xml version="1.0" encoding="UTF-8" ?>
<schema name="tester" version="1.1">
  <types>
    <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="boolean" class="solr.BoolField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="integer" class="solr.IntField" omitNorms="true"/>
    <fieldType name="long" class="solr.LongField" omitNorms="true"/>
    <fieldType name="float" class="solr.FloatField" omitNorms="true"/>
    <fieldType name="double" class="solr.DoubleField" omitNorms="true"/>
    <fieldType name="sint" class="solr.SortableIntField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="slong" class="solr.SortableLongField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="sfloat" class="solr.SortableFloatField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="sdouble" class="solr.SortableDoubleField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="date" class="solr.DateField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="random" class="solr.RandomSortField" indexed="true" />
    <fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      </analyzer>
    </fieldType>
    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <filter class="solr.PorterStemFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <filter class="solr.PorterStemFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>
    <!-- Less flexible matching, but less false matches. Probably not ideal for product names, but may be good for SKUs. Can insert dashes in the wrong place and still match. -->
    <fieldType name="textTight" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <filter class="solr.EnglishMinimalStemFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>
    <fieldType name="alphaOnlySort" class="solr.TextField" sortMissingLast="true" omitNorms="true">
      <analyzer>
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory" />
        <!-- The TrimFilter removes any leading or trailing whitespace -->
        <filter class="solr.TrimFilterFactory" />
        <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z])" replacement="" replace="all" />
      </analyzer>
    </fieldType>
    <fieldtype name="ignored" stored="false" indexed="false" class="solr.StrField" />
    <fieldtype name="html" stored="true" indexed="true" class="solr.TextField">
      <analyzer type="index">
        <charFilter class="solr.HTMLStripCharFilterFactory"/>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
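The mismatch found in this thread (the import maps an XPath to a column called text, while the schema declares a field called txt) is the kind of thing a small script can flag. What follows is only a sketch under assumptions: the two XML fragments are hypothetical, modelled on the snippets quoted in the thread, the helper name unmatched_columns is made up for illustration, and the check assumes every import column is meant to have a schema field of exactly the same name.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragments modelled on the thread: the import config uses a
# column named "text", but the schema declares the field as "txt".
DATA_CONFIG = """
<entity name="artikel">
  <field column="title" xpath="/ARTIKEL/DOKTITEL/OVERSKRIFT1" />
  <field column="text" xpath="/ARTIKEL/AKROP/TXT" />
</entity>
"""

SCHEMA_FIELDS = """
<fields>
  <field name="title" type="text" indexed="true" stored="true" />
  <field name="txt" type="text" indexed="true" stored="true" />
  <field name="all_text" type="text" indexed="true" stored="true" multiValued="true" />
</fields>
"""


def unmatched_columns(data_config_xml, schema_fields_xml):
    """Return import columns that have no schema field of the same name."""
    config_root = ET.fromstring(data_config_xml)
    schema_root = ET.fromstring(schema_fields_xml)
    columns = [field.get("column") for field in config_root.iter("field")]
    names = {field.get("name") for field in schema_root.iter("field")}
    return [column for column in columns if column not in names]


if __name__ == "__main__":
    # Lists the columns whose names do not appear among the schema fields.
    print(unmatched_columns(DATA_CONFIG, SCHEMA_FIELDS))
```

Renaming the schema field to text (or the column to txt) makes the returned list empty; the sketch does not check multiValued or any other attribute.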
command=full-import not working, indexes 11 documents
Hi, I am using a DataImportHandler to get files from the file system. If I request the URL http://localhost:8983/solr/tester/dataimport?command=full-import it ends up indexing only 11 documents. If I request http://localhost:8983/solr/tester/dataimport?command=full-import&rows=817 (817 being the number of documents I have), they all get indexed. Is there something I might have overlooked in the configuration that would have this effect? Thanks, Bryan Rasmussen
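Query-string parameters have to be separated by an ampersand, which is easy to lose when a URL is typed or pasted by hand. A minimal sketch of building the two requests from this message programmatically, assuming the same host, core name and parameters as quoted above (the helper name dataimport_url is made up for illustration, and nothing here explains why the default run stops at 11 documents):

```python
from urllib.parse import urlencode

# Base URL taken from the message above.
BASE = "http://localhost:8983/solr/tester/dataimport"


def dataimport_url(base, **params):
    """Build a request URL, joining the parameters with '&'."""
    return base + "?" + urlencode(params)


if __name__ == "__main__":
    print(dataimport_url(BASE, command="full-import"))
    print(dataimport_url(BASE, command="full-import", rows=817))
```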
newbie - filter to only show queried field when query is free text
Hi, I want to filter a search result so that it does not return all fields (the default), but I don't know in advance which field my hits will be in. Is there a way to return only the field that matched? This is basically for unstructured document-type data, for example large HTML or DocBook documents. Thanks, Bryan Rasmussen
DataImportHandler - importing XML documents, undeclared general entity - DTD right there
Hi, I am importing a number of XML documents from the filesystem. The DataImportHandler finds them, but returns an undeclared general entity error, even though my DTD is present and findable by other parsers. The DTD declaration is <!DOCTYPE ARTIKEL PUBLIC "-//Thomson Information AS//DTD ARTIKEL//DK" "allartikel.dtd"> and it appears in an XML file that sits in the same folder as the DTD, allartikel.dtd. Thanks, Bryan Rasmussen
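For illustration only, here is how the same class of error shows up in a parser that does not read the external DTD. This sketch uses Python's standard-library ElementTree, not the parser Solr uses, and the thread does not establish whether the DataImportHandler loads external DTDs at all. The entity name foo and its value are hypothetical stand-ins for whatever allartikel.dtd really declares.

```python
import xml.etree.ElementTree as ET

# The DOCTYPE points at an external DTD, which this parser never loads,
# so the entity reference in the body cannot be resolved.
EXTERNAL_ONLY = (
    '<!DOCTYPE ARTIKEL PUBLIC "-//Thomson Information AS//DTD ARTIKEL//DK" '
    '"allartikel.dtd">'
    "<ARTIKEL>&foo;</ARTIKEL>"
)

# The same entity declared in the internal subset is visible to the parser.
INTERNAL_DECL = (
    '<!DOCTYPE ARTIKEL [<!ENTITY foo "bar">]>'
    "<ARTIKEL>&foo;</ARTIKEL>"
)


def parse_text(xml_source):
    """Return the root element's text, or the parse error as a string."""
    try:
        return ET.fromstring(xml_source).text
    except ET.ParseError as err:
        return "error: " + str(err)


if __name__ == "__main__":
    print(parse_text(EXTERNAL_ONLY))  # reports an undefined entity
    print(parse_text(INTERNAL_DECL))  # the entity is expanded
```

The point of the comparison is that a DTD sitting next to the file only helps if the parser in use is configured to fetch and read it.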