Re: recip function error
Thank you very much for your replies. I discovered there was a typo in the function I was given; one of the parentheses was in the wrong spot. It should be this:

  boost=recip(ms(NOW/HOUR,general_modifydate),3.16e-11,0.08,0.05)

And now it works with edismax! Strange... Thanks again, -- View this message in context: http://lucene.472066.n3.nabble.com/recip-function-error-tp4165600p4165713.html Sent from the Solr - User mailing list archive at Nabble.com.
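For anyone puzzling over the magic numbers: in Solr, recip(x,m,a,b) computes a/(m*x+b), and ms(NOW/HOUR,field) is the document age in milliseconds, so 3.16e-11 is roughly 1/(milliseconds per year). A small sketch of the resulting freshness-boost curve (plain arithmetic, no Solr required):

```python
# Sketch of the boost curve produced by
# recip(ms(NOW/HOUR, general_modifydate), 3.16e-11, 0.08, 0.05).
# recip(x, m, a, b) = a / (m*x + b), where x is document age in ms.

MS_PER_YEAR = 365.25 * 24 * 3600 * 1000  # ~3.156e10 ms; 1/MS_PER_YEAR is ~3.17e-11

def recip_boost(age_ms, m=3.16e-11, a=0.08, b=0.05):
    return a / (m * age_ms + b)

# A brand-new document gets the maximum boost, a/b = 1.6; a one-year-old
# document has decayed to roughly 0.076, and the boost keeps shrinking.
print(round(recip_boost(0), 4))
print(round(recip_boost(MS_PER_YEAR), 4))
print(round(recip_boost(10 * MS_PER_YEAR), 4))
```

The decay rate is controlled entirely by m relative to the age units, which is why getting the parentheses around ms(...) right matters: misplacing them changes which value is fed in as x.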
recip function error
Good evening, I'm using Solr 4.0 Final. I tried using this function:

  boost=recip(ms(NOW/HOUR,startdatez,3.16e-11.0,0.08,0.05))

but it fails with this error:

  org.apache.lucene.queryparser.classic.ParseException: Expected ')' at position 29 in 'recip(ms(NOW/HOUR,startdatez,3.16e-11.0,0.08,0.05))'

I applied this patch: https://issues.apache.org/jira/browse/SOLR-3522 then rebuilt and redeployed, AND I get the exact same error. I only copied over the new jars and war file; none of the other libraries seemed to have changed, and the patch is in Solr core, so I figured I was safe. Does anyone know how to fix this? Thanks,
Re: recip function error
Thanks, we're planning on going to 4.10.1 in a few months. I discovered that recip only works with dismax; I use edismax by default. Does anyone know why I can't use recip with edismax? I hope this is fixed in 4.10.1... Thanks,
Why does the q parameter change?
Good afternoon all, I just implemented a phrase search and the parsed query gets changed from "rapid prototyping" to "rapid prototype". I used the Solr analyzer and "prototyping" was unchanged, so I think I ruled out a tokenizer. So can anyone tell me what's going on? Here's the query:

  q=rapid prototyping&defType=edismax&qf=text&pf2=text^40&ps=0

Here's the debugger; as you can see, "prototyping" gets changed to just "prototype". What's causing this and how do I turn it off? Thanks,

  <lst name="debug">
    <lst name="queryBoosting">
      <str name="q">rapid prototyping</str>
      <null name="match"/>
    </lst>
    <str name="rawquerystring">rapid prototyping</str>
    <str name="querystring">rapid prototyping</str>
    <str name="parsedquery">(+((DisjunctionMaxQuery((text:rapid)) DisjunctionMaxQuery((text:prototype)))~2) DisjunctionMaxQuery((text:"rapid prototype"^40.0)))/no_coord</str>
    <str name="parsedquery_toString">+(((text:rapid) (text:prototype))~2) (text:"rapid prototype"^40.0)</str>
    <str name="QParser">ExtendedDismaxQParser</str>
Re: Why does the q parameter change?
Ok, I think I'm on to something. I omitted this parameter, which means it is set to false by default on my text field. I need to set it to true and see what happens:

  autoGeneratePhraseQueries="true"

If I'm reading the wiki right, this parameter, if true, will preserve phrase queries...
Re: Why does the q parameter change?
No, apparently it's the KStemFilter. Should I turn this off at query time? I'll put this in another question...
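To see why the stemmer rewriting the query is usually harmless (and why turning it off only at query time would hurt), here is a toy sketch; the stem() mapping is a stand-in for illustration, not the real KStem algorithm:

```python
# Toy stand-in for a stemmer like KStem: maps inflected forms to a base form.
# (Illustrative only; the real KStemFilterFactory is far more sophisticated.)
STEMS = {"prototyping": "prototype", "prototypes": "prototype"}

def stem(token):
    return STEMS.get(token, token)

def analyze(text):
    # Lowercase + whitespace split + stem, mimicking a simple analysis chain.
    return [stem(t) for t in text.lower().split()]

indexed = analyze("Rapid Prototyping")   # what goes into the index
query = analyze("rapid prototypes")      # what the query becomes
# Because BOTH sides are stemmed, inflected variants still match each other:
print(indexed == query)  # True
# Stemming at index time only would break this: the raw query term
# "prototypes" would never equal the indexed term "prototype".
```

This is why the usual advice is to keep the same stemmer in both the index and query analyzer chains rather than disabling it on one side.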
Best practice for KStemFilter query or index or both?
Good afternoon, Here's my configuration for a text field. I have the same configuration for index and query time. Is this valid? What's the best practice for these: query, index, or both? For synonyms, I've read conflicting reports on when to use them, but I'm currently changing it over to indexing time only. Thanks,

  <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" preserveOriginal="1"/>
      <filter class="solr.StandardTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.KStemFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" preserveOriginal="1"/>
      <filter class="solr.StandardTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.KStemFilterFactory"/>
    </analyzer>
    <analyzer type="select">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" preserveOriginal="1"/>
      <filter class="solr.StandardTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.KStemFilterFactory"/>
    </analyzer>
  </fieldType>
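On the synonym part of the question above, the usual argument for index-time-only expansion can be sketched with toy token sets (this is an illustration of the trade-off, not real Solr analysis; the tv/television pair is a hypothetical synonym entry):

```python
# Toy illustration of index-time vs. query-time synonym expansion.
# Hypothetical synonym set: tv <-> television.
SYNONYMS = {"tv": {"tv", "television"}, "television": {"tv", "television"}}

def expand(tokens):
    out = set()
    for t in tokens:
        out |= SYNONYMS.get(t, {t})
    return out

# Index-time expansion: the document is indexed under every variant,
# so the query can stay a single, unexpanded term.
doc_terms = expand(["television"])   # {'tv', 'television'}
print("tv" in doc_terms)             # a plain query for 'tv' still matches

# Query-time expansion instead turns one query term into several, which
# skews IDF scoring and can defeat exact-match features (as seen in the
# elevation thread below) -- one reason index-time-only is often preferred.
```

The cost of index-time expansion is a larger index and a full reindex whenever synonyms.txt changes, which is the usual counter-argument.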
I need a replacement for the QueryElevation Component
Good morning to one and all, I'm using Solr 4.0 Final and I've been struggling mightily with the elevation component. It is too limited for our needs: it doesn't handle phrases very well, and I need to have more than one doc with the same keyword or phrase. So, I need a better solution. One that allows us to tag the doc with keywords that clearly identify it as a promoted document would be ideal. I tried using an external file field, but that only allows numbers, not strings (please correct me if I'm wrong); EFF would be ideal if there were a way to make it take strings. I also need an easy way to add these tags to specific docs. If possible, I would like to avoid creating a separate elevation core, but it may come down to that... Thank you,
Can the elevation component work with synonyms?
Good morning Solr compatriots, I'm using Solr 4.0 Final and I have synonyms.txt in my schema (only at query time) like so:

  <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" preserveOriginal="1"/>
      <filter class="solr.StandardTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.KStemFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" preserveOriginal="1"/>
      <filter class="solr.StandardTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.KStemFilterFactory"/>
    </analyzer>
    <analyzer type="select">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" preserveOriginal="1"/>
      <filter class="solr.StandardTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.KStemFilterFactory"/>
    </analyzer>
  </fieldType>

However, when I try to call my /elevate handler, the synonyms are factored in but none of the results say [elevated]=true. I'm assuming this is because the elevation must be an exact match and the synonyms are expanding it beyond that, so elevation is thwarted. For example, if I have TV elevated and TV is also in synonyms.txt, then the query gets expanded to text:TV text:television. Is there any way to get the elevation to work correctly with synonyms? BTW, I did find a custom synonym handler that works, but this will require significant changes to the front end and I'm not sure whether it will break if and when we finally upgrade Solr. Here's the custom synonym filter (I had to drop the code in and rebuild solr.war to get it to work): https://github.com/healthonnet/hon-lucene-synonyms
How to build Solr4.0 Final?
Good morning, My company uses Solr 4.0 Final and I need to add some code to it and recompile. However, when I rebuild, all of the jars and the war file say Solr 5.0! I'm using the old build.xml file from 4.0, so I don't know why it's automatically upgrading. How do I force it to build the older version of Solr? Thank you,
Re: How to build Solr4.0 Final?
Ok, I think I figured it out. Somehow my Solr 4.0 Final project was accidentally updated to 5.0. The solr/build.xml was fine; the build.xml file at the top level was pointed at 5.0-SNAPSHOT. I need to pull down the 4.0 source and start from scratch.
How to exclude a mimetype in tika?
Good afternoon, I'm using Solr 4.0 Final. I have movies hidden in zip files that need to be excluded from the index. I can't filter movies on the crawler because then I would have to exclude all zip files. I was told I can have Tika skip the movies, but the details are escaping me at this point. How do I exclude a file type in the Tika configuration? I assume it's something I add in the update/extract handler, but I'm not sure. Thanks,
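I don't have the exact Tika configuration knob at hand, but one workaround is to strip the movie entries out of each zip before handing it to the extract handler. A rough, hypothetical pre-filter sketch (the extension list and file names are made up for the demo):

```python
import io
import zipfile

# Hypothetical extension blacklist -- adjust for the movie types in your repo.
SKIP_EXTENSIONS = (".mp4", ".avi", ".mov", ".wmv")

def strip_movies(zip_bytes):
    """Return new zip bytes with movie entries removed."""
    src = zipfile.ZipFile(io.BytesIO(zip_bytes))
    out_buf = io.BytesIO()
    with zipfile.ZipFile(out_buf, "w") as dst:
        for info in src.infolist():
            if not info.filename.lower().endswith(SKIP_EXTENSIONS):
                dst.writestr(info, src.read(info))
    return out_buf.getvalue()

# Demo: build a small zip with one document and one movie, then filter it.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("report.txt", "hello")
    z.writestr("clip.mp4", b"\x00" * 16)

cleaned = strip_movies(buf.getvalue())
print(zipfile.ZipFile(io.BytesIO(cleaned)).namelist())  # ['report.txt']
```

This keeps the crawler sending zip files as before; only the unwanted members are dropped on the way through.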
Re: Can the solr dataimporthandler consume an atom feed?
Gora! It works now! You are amazing! Thank you so much! I dropped the atom: prefix from the xpath and everything is working. I did have a typo that might have been causing issues too. Thanks again!
Re: Can the solr dataimporthandler consume an atom feed?
The only message I get is: Indexing completed. Added/Updated: 0 documents. Deleted 0 documents. Requests: 1, Skipped: 0. And there are no errors in the log. Here's what the IBM atom feed looks like:

  <?xml version="1.0" encoding="utf-16"?>
  <atom:feed xmlns:atom="http://www.w3.org/2005/Atom" xmlns:wplc="http://www.ibm.com/wplc/atom/1.0" xmlns:age="http://purl.org/atompub/age/1.0" xmlns:snx="http://www.ibm.com/xmlns/prod/sn" xmlns:lconn="http://www.ibm.com/lotus/connections/seedlist/atom/1.0">
    <atom:id>https://[redacted]/files/seedlist/myserver?Action=GetDocuments&amp;Format=ATOM&amp;Locale=en_US&amp;Range=2&amp;Start=0</atom:id>
    <atom:link href="https://[redacted]/files/seedlist/myserver?Action=GetDocuments&amp;Range=2&amp;Start=1000&amp;Format=ATOM&amp;Locale=en_US&amp;State=U0VDT05EXzIwMTQtMDMtMTMgMTY6MjM6NTguODRfMjAxMS0wNi0wNiAwODowNDoxNC42MjJfNmQ1YzQ3MWMtYTM3ZS00ZjlmLWE0OGEtZWZjYjMyZjU2NDgzXzEwMDBfZmFsc2U%3D" rel="next" type="application/atom+xml" title="Next page" />
    <atom:generator xml:lang="en-US" version="1.2" lconn:version="4.0.0.0">Seedlist Service Backend System</atom:generator>
    <atom:category term="ContentSourceType/Files" scheme="com.ibm.wplc.taxonomy://feature_taxonomy" label="Files" />
    <atom:title xml:lang="en-US">Files : 1,000 entries of Seedlist FILES</atom:title>
    <wplc:action do="update" />
    <wplc:fieldInfo id="title" name="Title" type="string" contentSearchable="true" fieldSearchable="true" parametric="false" returnable="true" sortable="false" supportsExactMatch="false" />
    <wplc:fieldInfo id="author" name="Owner's directory id" type="string" contentSearchable="false" fieldSearchable="true" parametric="false" returnable="true" sortable="false" supportsExactMatch="true" />
    <wplc:fieldInfo id="published" name="Created timestamp" type="date" contentSearchable="false" fieldSearchable="false" parametric="true" returnable="true" sortable="false" supportsExactMatch="false" />
    <wplc:fieldInfo id="updated" name="Last modification timestamp (major change only, as indicated in UI)" type="date" contentSearchable="false" fieldSearchable="false" parametric="true" returnable="true" sortable="true" supportsExactMatch="false" />
    <wplc:fieldInfo id="summary" name="Description" type="string" contentSearchable="true" fieldSearchable="true" parametric="false" returnable="true" sortable="false" supportsExactMatch="false" />
    <wplc:fieldInfo id="tag" name="Tag" type="string" contentSearchable="true" fieldSearchable="true" parametric="false" returnable="true" sortable="false" supportsExactMatch="false" />
    <wplc:fieldInfo id="commentCount" name="Number of comments" type="int" contentSearchable="false" fieldSearchable="false" parametric="true" returnable="true" sortable="true" supportsExactMatch="true" />
    <wplc:fieldInfo id="downloadCount" name="Number of downloads" type="int" contentSearchable="false" fieldSearchable="false" parametric="true" returnable="true" sortable="true" supportsExactMatch="true" />
    <wplc:fieldInfo id="recommendCount" name="Number of recommendations" type="int" contentSearchable="false" fieldSearchable="false" parametric="true" returnable="true" sortable="true" supportsExactMatch="true" />
    <wplc:fieldInfo id="fileUpdated" name="Binary file last modification timestamp" type="date" contentSearchable="false" fieldSearchable="false" parametric="true" returnable="true" sortable="true" supportsExactMatch="true" />
    <wplc:fieldInfo id="fileSize" name="Binary file size" type="int" contentSearchable="false" fieldSearchable="false" parametric="true" returnable="true" sortable="false" supportsExactMatch="true" />
    <wplc:fieldInfo id="fileName" name="File name" type="string" contentSearchable="true" fieldSearchable="true" parametric="false" returnable="true" sortable="true" supportsExactMatch="false" />
    <wplc:fieldInfo id="sharedWithUser" name="Shared with user's directory id" type="string" contentSearchable="false" fieldSearchable="true" parametric="false" returnable="true" sortable="false" supportsExactMatch="true" />
    <wplc:fieldInfo id="sharedWithUserName" name="Shared with user's name" type="string" contentSearchable="false" fieldSearchable="false" parametric="false" returnable="true" sortable="false" supportsExactMatch="false" />
    <wplc:fieldInfo id="libraryId" name="The id of library owning the file" type="string" contentSearchable="false" fieldSearchable="true" parametric="false" returnable="true" sortable="false" supportsExactMatch="true" />
    <wplc:fieldInfo id="ORGANISATIONAL_ID" name="The id of the organization the owning user belongs to" type="string" contentSearchable="false" fieldSearchable="true" parametric="false" returnable="true" sortable="false" supportsExactMatch="true" />
    <wplc:fieldInfo id="communityId" name="The id of the community associated to the file" type="string" contentSearchable="false" fieldSearchable="true" parametric="false" returnable="true" sortable="false" supportsExactMatch="true" />
    <wplc:fieldInfo id="containerType" name="The type of the container (library) associated to the file" type="string" contentSearchable="false" fieldSearchable="true" parametric="false" returnable="true" sortable="false" supportsExactMatch="true" />
    <wplc:fieldInfo id="ATOMAPISOURCE" name="Atom API link" type="string"
Re: Can the solr dataimporthandler consume an atom feed?
I confirmed the xpath is correct with a third-party XPath visualizer; /atom:feed/atom:entry parses the XML correctly. Can anyone confirm or deny that the dataimporthandler can handle an atom feed?
Re: Can the solr dataimporthandler consume an atom feed?
Ok, I found one typo: the links need to be this: /atom:feed/atom:entry/atom:link/@href But the import still doesn't work... :( I guess I have to convert the feed over to RSS 2.0.
Can the solr dataimporthandler consume an atom feed?
Good afternoon, I'm using Solr 4.0 Final. I have an IBM atom feed I'm trying to index but it won't work. There are no errors in the log. All the other DIHs I've created consumed RSS 2.0. Does it NOT work with an atom feed? Here's my configuration:

  <?xml version="1.0" encoding="UTF-8" ?>
  <dataConfig>
    <dataSource type="URLDataSource" />
    <document>
      <entity name="C3Files_from_Seedlist" pk="id" url="https://[redacted]" processor="XPathEntityProcessor" forEach="/atom:feed/atom:entry" transformer="DateFormatTransformer,TemplateTransformer">
        <field column="id" xpath="/atom:feed/atom:entry/atom:link@href" />
        <field column="link" xpath="/atom:feed/atom:entry/atom:link@href" />
        <field column="c3filetitle" xpath="/atom:feed/atom:entry/atom:title" />
        <field column="author" xpath="/atom:feed/atom:entry/atom:author" />
        <field column="authoremail" xpath="/atom:feed/atom:entry/atom:author/atom:email" />
        <field column="published" xpath="/atom:feed/atom:entry/atom:published" dateTimeFormat="yyyy-MM-dd" />
        <field column="updated" xpath="/atom:feed/atom:entry/atom:updated" dateTimeFormat="yyyy-MM-dd" />
        <field column="attr_stream_content_type" xpath="/atom:feed/atom:entry/atom:link@type" />
        <field column="index_category" template="ConnectionsFiles" />
      </entity>
    </document>
  </dataConfig>
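One thing worth checking with namespaced feeds is whether the consumer resolves the atom: prefix at all; many XPath engines need an explicit prefix-to-URI mapping. The XPath shape can be sanity-checked outside the DIH with plain Python (the tiny feed below is a stand-in, similar in shape to the IBM seedlist feed):

```python
import xml.etree.ElementTree as ET

# Minimal namespaced atom document for the demo.
FEED = """<atom:feed xmlns:atom="http://www.w3.org/2005/Atom">
  <atom:entry>
    <atom:title>First file</atom:title>
    <atom:link href="https://example.invalid/doc1"/>
  </atom:entry>
</atom:feed>"""

# The atom: prefix only resolves when mapped to the declared namespace URI.
NS = {"atom": "http://www.w3.org/2005/Atom"}
root = ET.fromstring(FEED)

entries = root.findall("atom:entry", NS)
print(len(entries))                                   # 1
print(entries[0].find("atom:title", NS).text)         # First file
print(entries[0].find("atom:link", NS).get("href"))   # https://example.invalid/doc1
```

Without the NS mapping the same findall returns nothing, which mirrors the symptom here: no parse errors, just zero documents.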
Re: SEVERE: org.apache.solr.common.SolrException: no field name specified in query and no default specified via 'df' param
Hi Erick, Let me make sure I understand you: I'm NOT running SolrCloud, so I just have to put the default field in ALL of my solrconfig.xml files and then restart, and that should be it? Thanks for your reply,
Re: SEVERE: org.apache.solr.common.SolrException: no field name specified in query and no default specified via 'df' param
Ok, I updated all of my solrconfig.xml files and I restarted the Tomcat server, AND the errors are still there on 2 out of 10 cores. Am I not reloading correctly? Here's my /browse handler:

  <requestHandler name="/browse" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <str name="wt">velocity</str>
      <str name="v.template">browse</str>
      <str name="v.layout">layout</str>
      <str name="title">Solritas</str>
      <str name="defType">edismax</str>
      <str name="qf">text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4</str>
      <str name="df">text</str>
      <str name="mm">100%</str>
      <str name="q.alt">*:*</str>
      <str name="rows">10</str>
      <str name="fl">*,score</str>
      <str name="mlt.qf">text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4</str>
      <str name="mlt.fl">text,features,name,sku,id,manu,cat</str>
      <int name="mlt.count">3</int>
      <str name="facet">on</str>
      <str name="facet.field">cat</str>
      <str name="facet.field">manu_exact</str>
      <str name="facet.query">ipod</str>
      <str name="facet.query">GB</str>
      <str name="facet.mincount">1</str>
      <str name="facet.pivot">cat,inStock</str>
      <str name="facet.range.other">after</str>
      <str name="facet.range">price</str>
      <int name="f.price.facet.range.start">0</int>
      <int name="f.price.facet.range.end">600</int>
      <int name="f.price.facet.range.gap">50</int>
      <str name="facet.range">popularity</str>
      <int name="f.popularity.facet.range.start">0</int>
      <int name="f.popularity.facet.range.end">10</int>
      <int name="f.popularity.facet.range.gap">3</int>
      <str name="facet.range">manufacturedate_dt</str>
      <str name="f.manufacturedate_dt.facet.range.start">NOW/YEAR-10YEARS</str>
      <str name="f.manufacturedate_dt.facet.range.end">NOW</str>
      <str name="f.manufacturedate_dt.facet.range.gap">+1YEAR</str>
      <str name="f.manufacturedate_dt.facet.range.other">before</str>
      <str name="f.manufacturedate_dt.facet.range.other">after</str>
      <str name="hl">on</str>
      <str name="hl.fl">text features name</str>
      <str name="f.name.hl.fragsize">0</str>
      <str name="f.name.hl.alternateField">name</str>
      <str name="spellcheck">on</str>
      <str name="spellcheck.extendedResults">false</str>
      <str name="spellcheck.count">5</str>
      <str name="spellcheck.alternativeTermCount">2</str>
      <str name="spellcheck.maxResultsForSuggest">5</str>
      <str name="spellcheck.collate">true</str>
      <str name="spellcheck.collateExtendedResults">true</str>
      <str name="spellcheck.maxCollationTries">5</str>
      <str name="spellcheck.maxCollations">3</str>
    </lst>
    <arr name="last-components">
      <str>spellcheck</str>
      <str>manifoldCFSecurity</str>
    </arr>
  </requestHandler>
RegexTransformer and xpath in DataImportHandler
Good afternoon, I have this DIH:

  <?xml version="1.0" encoding="UTF-8" ?>
  <dataConfig>
    <dataSource type="URLDataSource" />
    <document>
      <entity name="blogFeed" pk="id" url="https://redacted/" processor="XPathEntityProcessor" forEach="/rss/channel/item" transformer="DateFormatTransformer,TemplateTransformer,RegexTransformer">
        <field column="id" xpath="/rss/channel/item/id" />
        <field column="link" xpath="/rss/channel/item/link" />
        <field column="blogtitle" xpath="/rss/channel/item/title" />
        <field column="short_blogtitle" xpath="/rss/channel/item/title" />
        <field column="short_blogtitle" regex="^(.{250})([^\.]*\.)(.*)$" replaceWith="$1" sourceColName="blogtitle" />
        <field column="pubdateiso" xpath="/rss/channel/item/pubDate" dateTimeFormat="yyyy-MM-dd" />
        <field column="category" xpath="/rss/channel/item/category" />
        <field column="author" xpath="/rss/channel/item/author" />
        <field column="authoremail" xpath="/rss/channel/item/authoremail" />
        <field column="content" xpath="/rss/channel/item/content" />
        <field column="summary" xpath="/rss/channel/item/summary" />
        <field column="index_category" template="ConnectionsBlogs"/>
      </entity>
    </document>
  </dataConfig>

I can't seem to populate BOTH blogtitle and short_blogtitle with the same xpath; I can only do one or the other. Why can't I put the same xpath in 2 different fields? If I remove the short_blogtitle field with the xpath statement and leave in the regex statement, blogtitle gets populated and short_blogtitle goes to my update.chain (to the auto-complete index), but the field itself is blank in this index. If I leave the DIH as above, then blogtitle doesn't get populated but short_blogtitle does. What am I doing wrong here? Is there a way to populate both? And I CANNOT use copyField here because then the update.chain won't work. Thanks,
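For what it's worth, the regex itself can be checked outside the DIH; in Python's syntax, the replaceWith=$1 of the RegexTransformer becomes \1. This sketch shows what the pattern does: truncate a title to its first 250 characters, but only when a period follows that point.

```python
import re

# The RegexTransformer pattern from the DIH config above; Python uses \1
# in the replacement where the DIH config uses $1.
PATTERN = r"^(.{250})([^\.]*\.)(.*)$"

def shorten(title):
    return re.sub(PATTERN, r"\1", title, flags=re.DOTALL)

long_title = "x" * 260 + ". And then quite a lot more text after the period."
short = shorten(long_title)
print(len(short))               # 250 -- kept only group 1
# Titles with no period after position 250 don't match and pass through:
print(len(shorten("y" * 300)))  # 300 -- unchanged
```

That pass-through case is worth knowing about: a long title with no period after character 250 will land in short_blogtitle at full length.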
SEVERE: org.apache.solr.common.SolrException: no field name specified in query and no default specified via 'df' param
Hi, I'm using Solr 4.0 Final (yes, I know I need to upgrade). I'm getting this error:

  SEVERE: org.apache.solr.common.SolrException: no field name specified in query and no default specified via 'df' param

And I applied this fix: https://issues.apache.org/jira/browse/SOLR-3646 Unfortunately, the error persists. I'm using a multi-shard environment and the error is only happening on one of the shards. I've already updated about half of the other shards with the missing default text in /browse, but the error persists on that one shard. Can anyone tell me how to make the error go away? Thanks,
Is there a way to get Solr to delete an uploaded document after its been indexed?
Hi, My crawler uploads all the documents to Solr for indexing via a tomcat/temp folder. Over time this folder grows so large that I run out of disk space. So, I wrote a bash script to delete the files and put it in the crontab. However, if I delete the docs too soon, they don't get indexed; too late and I run out of disk. I'm still trying to find the right window... So (and this is probably a long shot), I'm wondering if there's anything in Solr that can delete these docs from /temp after they've been indexed... Thank you,
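On the cron side, an age threshold is usually safer than a fixed deletion window: delete only files older than N minutes, so anything still queued for indexing survives. A sketch with GNU find (the directory here is a throwaway demo dir; in production it would be the tomcat/temp path, and the 60-minute threshold is a placeholder to tune against how long the crawler-to-Solr pipeline actually takes):

```shell
#!/bin/sh
# Sketch: age-based cleanup of the crawler's upload temp folder.
# Demo uses a scratch directory; swap in the real tomcat/temp path for cron.
TEMP_DIR=$(mktemp -d)

# Demo setup: one fresh file and one file backdated two hours.
touch "$TEMP_DIR/fresh.doc"
touch -d '2 hours ago' "$TEMP_DIR/stale.doc"

# The actual cleanup: remove regular files not modified in the last 60 minutes.
find "$TEMP_DIR" -type f -mmin +60 -delete

ls "$TEMP_DIR"   # only fresh.doc remains
```

Run from crontab every few minutes, this bounds disk usage without racing recently-uploaded files.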
How to get phrase recipe working?
Good morning, In the Apache Solr 4 Cookbook, p. 112, there is a recipe for setting up phrase searches, like so:

  <fieldType name="text_phrase" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.SnowballPorterFilterFactory" language="English"/>
    </analyzer>
  </fieldType>

I ran a sample query q=text_ph:"a-z index" and it didn't work very well at all. Is there a better way to do phrase searches? I need a specific configuration to follow/use. Thanks,
Re: ODP: How to get phrase recipe working?
Thanks, I'll remove the snowball filter and give it a try. I guess I'm looking for an exact phrase match to start. (Is that the standard phrase search?) Is there something better or more versatile? BTW, great job on the book!
Can I combine standardtokenizer with solr.WordDelimiterFilterFactory?
Good morning, Here's the issue: I have an ID that consists of two letters and a number. The whole user title looks like this: Lastname, Firstname (LA12345). Now, with my current configuration, I can search for LA12345 and find the user. However, when I type in just the number I get zero results. If I put a wildcard in (*12345) I find the correct record. The problem is I changed that user title to use the WordDelimiterFilterFactory and it seems to work. However, I also copy that field into the text field, which just uses the StandardTokenizer, and I lose the ability to search for 12345 without a wildcard. My question is: can (or should) I put the WordDelimiterFilterFactory in with the StandardTokenizer in the text field? Or should I just use one or the other? Thank you,
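The behavior described above can be mimicked outside Solr. This toy function approximates what WordDelimiterFilterFactory with preserveOriginal does to LA12345 (it is an illustration, not the real filter, which has many more options):

```python
import re

def word_delimiter(token, preserve_original=True):
    """Toy approximation of WordDelimiterFilterFactory: split on
    letter/digit transitions, optionally keeping the original token."""
    parts = re.findall(r"[A-Za-z]+|[0-9]+", token)
    out = list(parts)
    if preserve_original and token not in out:
        out.append(token)
    return out

terms = word_delimiter("LA12345")
print(terms)                    # ['LA', '12345', 'LA12345']
# After word-delimiter splitting, a query for the bare number matches:
print("12345" in terms)         # True
# With only StandardTokenizer, the indexed term is just 'LA12345',
# which is why the bare number needed a wildcard:
print("12345" in ["LA12345"])   # False
```

This is also why the copyField target behaves differently: the text field's analyzer never produces the bare-number term in the first place.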
Re: Configuration and specs to index a 1 terabyte (TB) repository
Wow again! Thank you all very much for your insights. We will certainly take all of this under consideration. Erik: I want to upgrade but unfortunately it's not up to me. You're right, we definitely need to do it. And SolrJ sounds interesting; thanks for the suggestions. By the way, is there a Solr upgrade guide out there anywhere? Thanks again!
Configuration and specs to index a 1 terabyte (TB) repository
Good morning, I have a 1 TB repository with approximately 500,000 documents (a number that will probably grow from there) that needs to be indexed. I'm limited to Solr 4.0 Final (we're close to beta release, so I can't upgrade right now) and I can't use SolrCloud because work currently won't allow it for some reason. I found this configuration at this link: http://lucene.472066.n3.nabble.com/Can-Apache-Solr-Handle-TeraByte-Large-Data-td3656484.html#a3657056 He said he was able to index 1 TB on a single server with 40 cores and 128 GB of RAM with 10 shards. Is this my only option? Or is there a better configuration? Is there some formula for calculating server specifications (this much data and this many documents equals this many cores, RAM, hard disk space, etc.)? Thanks,
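There is no exact formula, but a back-of-envelope estimate is a common starting point. Every ratio below is an assumption chosen only to illustrate the arithmetic, not a measured number; the index-to-raw ratio in particular varies enormously with how much of each document is extractable text:

```python
# Back-of-envelope index sizing. Every constant here is an assumption.
raw_tb = 1.0                 # raw repository size
index_ratio = 0.25           # assume the index is ~25% of raw (varies widely)
per_shard_index_gb = 30.0    # assume ~30 GB of index per shard is comfortable
ram_per_shard_gb = 8.0       # assume heap + OS cache headroom per shard

index_gb = raw_tb * 1024 * index_ratio
shards = -(-index_gb // per_shard_index_gb)   # ceiling division
ram_gb = shards * ram_per_shard_gb

print(f"estimated index: {index_gb:.0f} GB")
print(f"suggested shards: {shards:.0f}")
print(f"rough RAM budget: {ram_gb:.0f} GB")
```

The honest answer is that only a test index over a representative sample pins down index_ratio; once that is measured, the rest of the arithmetic follows.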
Re: Configuration and specs to index a 1 terabyte (TB) repository
Wow, thanks for your response. You raise a lot of great questions; I wish I had the answers! We're still trying to get enough resources to finish crawling the repository, so I don't even know what the final size of the index will be. I've thought about excluding the videos and other large files and using a data import handler to just send the metadata, but there are problems no matter where I turn. I'm taking what you said back to the server team for deliberation. Thanks again for your insights.
Re: Configuration and specs to index a 1 terabyte (TB) repository
P.S. Offhand, how do I control how much of the index is held in RAM? Can you point me in the right direction? Thanks,
how to manually update a field in the index without re-crawling?
Good morning, I'm currently using Solr 4.0 Final. I indexed a website and it took over 24 hours to crawl. I just realized I need to rename one of the fields (or add a new one), so I added the new field to the schema. But how do I copy the data over from the old field to the new field without recrawling everything? Is this possible? I was thinking about maybe putting an update chain processor in the /update handler, but I'm not sure that will work. Thanks,
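If the old field is stored, one possibility (hedged: this depends on Solr 4.x atomic updates, which require an updateLog and all fields stored) is to query the old field's values out and send an atomic "set" on the new field, doc by doc. The payload is plain JSON; a sketch of building it (field names here are hypothetical):

```python
import json

# Sketch: build an atomic-update payload that writes a stored value into a
# new field, one entry per document. Field and id values are hypothetical.
def atomic_copy_payload(docs, new_field):
    """docs: iterable of (id, stored_old_value) pairs pulled from a query."""
    return [
        {"id": doc_id, new_field: {"set": value}}
        for doc_id, value in docs
    ]

payload = atomic_copy_payload(
    [("doc-1", "hello"), ("doc-2", "world")], "newname"
)
print(json.dumps(payload))
# Each entry updates only the named field and leaves the rest of the doc
# intact -- which is why atomic updates need the other fields stored.
# The JSON would be POSTed to /update?commit=true with
# Content-Type: application/json.
```

This avoids the 24-hour recrawl entirely, at the cost of paging through the index once to collect the (id, value) pairs.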
Re: Solr 4.0 is stripping XML format from RSS content field
If anyone is interested, I managed to resolve this a long time ago. I used a Data Import Handler instead and it worked beautifully. DIHs are very forgiving: they take whatever XML data is there and inject it into the Solr index. It's a lot faster than crawling too. You use XPath to map the fields to your schema.
QueryElevationComponent results only show up with debug = true
Hi, I'm using solr 4.0 final built around Dec 2012. I was initially told that the QEC didn't work for distributed search, but apparently it was fixed. Anyway, I use the /elevate handler with [elevated] in the field list and I don't get any elevated results: elevated=false in the result block. However, if I turn on debugQuery, the elevated result appears in the debug section under queryBoost. Is this the only way you can get elevated results? Because before (and I can't remember if this was before or after I went to 4.0 Final) I would get the elevated results mixed in with the regular results in the result block; elevated=true was the only way to tell them apart. I also tried forceElevation, enableElevation, and exclusive, but there are still no elevated results in the result block. What am I doing wrong?

query:

http://localhost:8080/solr/Profiles/elevate?q=gangnam+style&fl=*,[elevated]&wt=xml&start=0&rows=100&enableElevation=true&forceElevation=true&df=text&qt=edismax&debugQuery=true

Here's my config:

  <searchComponent name="elevator" class="solr.QueryElevationComponent">
    <str name="queryFieldType">text_general</str>
    <str name="config-file">elevate.xml</str>
  </searchComponent>

  <requestHandler name="/elevate" class="solr.SearchHandler" startup="lazy">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <str name="df">text</str>
    </lst>
    <arr name="last-components">
      <str>elevator</str>
    </arr>
  </requestHandler>

elevate.xml:

  <elevate>
    <query text="gangnam style">
      <doc id="https://server/profiles/html/profileView.do?key=ec6de388-fb7a-4397-9ed5-51760e4333d3" />
    </query>
  </elevate>

-- View this message in context: http://lucene.472066.n3.nabble.com/QueryElevationComponent-results-only-show-up-with-debug-true-tp4087531.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: QueryElevationComponent results only show up with debug = true
Sure, here are the results with debugQuery=true; with debugging off, there are no results. The elevated result appears in the queryBoosting section but not in the result section:

<?xml version="1.0" encoding="utf-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
    <lst name="params">
      <str name="enableElevation">true</str>
      <str name="wt">xml</str>
      <str name="rows">100</str>
      <str name="fl">*,[elevated]</str>
      <str name="df">text</str>
      <str name="debugQuery">true</str>
      <str name="start">0</str>
      <str name="q">gangnam</str>
      <str name="forceElevation">true</str>
      <str name="qt">edismax</str>
    </lst>
  </lst>
  <result name="response" numFound="0" start="0"/>
  <lst name="debug">
    <lst name="queryBoosting">
      <str name="q">gangnam</str>
      <arr name="match">
        <str>https://server/profiles/html/profileView.do?key=ec6de388-fb7a-4397-9ed5-51760e4333d3</str>
      </arr>
    </lst>
    <str name="rawquerystring">gangnam</str>
    <str name="querystring">gangnam</str>
    <str name="parsedquery">(text:gangnam ((id:https://server/profiles/html/profileView.do?key=ec6de388-fb7a-4397-9ed5-51760e4333d3)^0.0))/no_coord</str>
    <str name="parsedquery_toString">text:gangnam ((id:https://server/profiles/html/profileView.do?key=ec6de388-fb7a-4397-9ed5-51760e4333d3)^0.0)</str>
    <lst name="explain"/>
    <str name="QParser">LuceneQParser</str>
    <lst name="timing">
      <double name="time">0.0</double>
      <lst name="prepare">
        <double name="time">0.0</double>
        <lst name="org.apache.solr.handler.component.QueryComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.FacetComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.MoreLikeThisComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.HighlightComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.StatsComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.QueryElevationComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.DebugComponent"><double name="time">0.0</double></lst>
      </lst>
      <lst name="process">
        <double name="time">0.0</double>
        <lst name="org.apache.solr.handler.component.QueryComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.FacetComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.MoreLikeThisComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.HighlightComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.StatsComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.QueryElevationComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.DebugComponent"><double name="time">0.0</double></lst>
      </lst>
    </lst>
  </lst>
</response>

-- View this message in context: http://lucene.472066.n3.nabble.com/QueryElevationComponent-results-only-show-up-with-debug-true-tp4087531p4087554.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: QueryElevationComponent results only show up with debug = true
I can guarantee you that the ID is unique and it exists in that index. -- View this message in context: http://lucene.472066.n3.nabble.com/QueryElevationComponent-results-only-show-up-with-debug-true-tp4087531p4087565.html Sent from the Solr - User mailing list archive at Nabble.com.
Can a data import handler grab all pages of an RSS feed?
Good morning, I have an IBM Portal Atom feed that spans multiple pages. Is there a way to instruct the DIH to grab all available pages? I can put a huge range in, but that can be extremely slow with large amounts of XML data. I'm currently using Solr 4.0 final. Thanks, -- View this message in context: http://lucene.472066.n3.nabble.com/Can-a-data-import-handler-grab-all-pages-of-an-RSS-feed-tp4086635.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: DIH : Unexpected character '=' (code 61); expected a semi-colon after the reference for entity 'st'
I just resolved this same error. The problem was that I had a lot of ampersands (&) that were un-escaped in my XML doc. There was nothing wrong with my DIH; it was the XML doc it was trying to consume. I just used StringEscapeUtils.escapeXml from Apache to resolve it... Another big help was the Eclipse XML validation engine: just add your doc to an existing project, right-click anywhere on the doc, and select Validate from the menu. -- View this message in context: http://lucene.472066.n3.nabble.com/DIH-Unexpected-character-code-61-expected-a-semi-colon-after-the-reference-for-entity-st-tp2816210p4086531.html Sent from the Solr - User mailing list archive at Nabble.com.
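[Editor's sketch: the escaping step fits in a few lines. Python's stdlib stands in here for Apache Commons' StringEscapeUtils.escapeXml; the input string is invented.]

```python
# Escape bare &, <, > before handing a document to the DIH / an XML parser.
from xml.sax.saxutils import escape

raw = "Research & Development <Lab>"
safe = escape(raw)
print(safe)  # Research &amp; Development &lt;Lab&gt;
```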
Indexoutofbounds size: 9 index: 8 with data import handler
Good morning, I'm using solr 4.0 final on tomcat 7.0.34 on linux. I created 3 new data import handlers to consume 3 RSS feeds. They seemed to work perfectly. However, today I'm getting these errors:

10:42:17 SEVERE SolrCore java.lang.IndexOutOfBoundsException: Index: 9, Size: 8
10:42:17 SEVERE SolrDispatchFilter null:java.lang.IndexOutOfBoundsException: Index: 9, Size: 8
10:42:17 SEVERE SolrCore org.apache.solr.common.SolrException: Server at https://search:7443/solr/Communities returned non ok status:500, message:Internal Server Error
10:42:17 SEVERE SolrDispatchFilter null:org.apache.solr.common.SolrException: Server at https://search/solr/Communities returned non ok status:500, message:Internal Server Error

I read that the index is corrupt, so I deleted it and restarted, and then the same errors jumped to the next core with the DIH for the RSS feed. How do I fix this?

Here's my DIH in solrconfig.xml:

  <requestHandler name="/DIHCommunityFeed" class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
      <str name="config">dih-comm-feed.xml</str>
      <str name="update.chain">SemaAC</str>
    </lst>
  </requestHandler>

Here's the DIH config:

  <?xml version="1.0" encoding="UTF-8" ?>
  <dataConfig>
    <dataSource type="URLDataSource" />
    <document>
      <entity name="communitiesFeed" pk="id" url="https://search/C3CommunityFeedDEV/"
              processor="XPathEntityProcessor" forEach="/rss/channel/item"
              transformer="DateFormatTransformer">
        <field column="id" xpath="/rss/channel/item/id" />
        <field column="link" xpath="/rss/channel/item/link" />
        <field column="communitytitle" xpath="/rss/channel/item/title" />
        <field column="pubdateiso" xpath="/rss/channel/item/pubDate" dateTimeFormat="yyyy-MM-dd" />
        <field column="category" xpath="/rss/channel/item/category" />
        <field column="author" xpath="/rss/channel/item/author" />
        <field column="authoremail" xpath="/rss/channel/item/authoremail" />
        <field column="content" xpath="/rss/channel/item/content" />
        <field column="summary" xpath="/rss/channel/item/summary" />
      </entity>
    </document>
  </dataConfig>

Here's a partial of my schema:

  <field name="title" type="text_general" indexed="true" stored="true" multiValued="true"/>
  <field name="subject" type="text_general" indexed="true" stored="true"/>
  <field name="description" type="text_general" indexed="true" stored="true"/>
  <field name="comments" type="text_general" indexed="true" stored="true"/>
  <field name="author" type="text_general" indexed="true" stored="true"/>
  <field name="authoremail" type="text_general" indexed="true" stored="true"/>
  <field name="keywords" type="text_general" indexed="true" stored="true"/>
  <field name="category" type="text_general" indexed="true" stored="true"/>
  <field name="content_type" type="string" indexed="true" stored="true" multiValued="true"/>
  <field name="last_modified" type="date" indexed="true" stored="true"/>
  <field name="links" type="string" indexed="true" stored="true" multiValued="true"/>
  <field name="solr.title" type="string" indexed="true" stored="true" multiValued="false" />
  <field name="communitytitle" type="string" indexed="true" stored="true" multiValued="false" />
  <field name="content" type="string" indexed="true" stored="true" multiValued="true"/>
  <field name="pubdateiso" type="date" dateTimeFormat="yyyy-MM-dd" indexed="true" stored="true" multiValued="true"/>
  <field name="link" type="string" indexed="true" stored="true" multiValued="true"/>
  <field name="summary" type="text_general" indexed="true" stored="true"/>
  <field name="published" type="date" indexed="true" stored="true" multiValued="true" />
  <field name="updated" type="date" indexed="true" stored="true" multiValued="true" />
  <copyField source="link" dest="text"/>
  <copyField source="description" dest="text"/>
  <copyField source="communitytitle" dest="text"/>
  <copyField source="communitytitle" dest="solr.title"/>
  <copyField source="content" dest="text"/>
  <copyField source="author" dest="text"/>
  <copyField source="authoremail" dest="text"/>
  <copyField source="summary" dest="text"/>

-- View this message in context: http://lucene.472066.n3.nabble.com/Indexoutofbounds-size-9-index-8-with-data-import-handler-tp4084812.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Indexoutofbounds size: 9 index: 8 with data import handler
Ok, these errors seem to be caused by passing incorrect parameters in a search query, such as spellcheck=extendedResults=true instead of spellcheck.extendedResults=true. Thankfully, it seems to have nothing to do with the DIH at all. -- View this message in context: http://lucene.472066.n3.nabble.com/Indexoutofbounds-size-9-index-8-with-data-import-handler-tp4084812p4084874.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: How to parse multivalued data into single valued fields?
Ok, I have one index called Communities, built from an RSS feed. Each item in the feed has multiple titles (which are all the same for this feed), so the title needs to be cleaned up before it is put into the Communities index; let's call the field community_title. Then an UpdateProcessorChain needs to fire that takes community_title and puts it into another index, called SolrAC, used for auto-completion suggestions. Does that make sense? -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-parse-multivalued-data-into-single-valued-fields-tp4083108p4083302.html Sent from the Solr - User mailing list archive at Nabble.com.
How to parse multivalued data into single valued fields?
Hi, I'm currently using solr 4.0 final with Manifoldcf v1.3 dev. I have multivalued titles (the names are all the same so far) that must go into a single valued field. Can a transformer do this? Can anyone show me how to do it? And this has to fire off before an update chain takes place. Thanks, -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-parse-multivalued-data-into-single-valued-fields-tp4083108.html Sent from the Solr - User mailing list archive at Nabble.com.
how to improve (keyword) relevance?
Good morning, I'm currently running Solr 4.0 final (multi core) with manifoldcf v1.3 dev on tomcat 7. Early on, I used copyField to put the metadata into the text field to simplify Solr queries (i.e. I only have to query one field now). However, a lot of people are concerned about improving relevance. I found a relevancy solution on page 298 of the Apache Solr 4.0 Cookbook; however, is there a way to modify it so it only uses one field (i.e. the text field)? (Note well: I have multiple cores and the schemas are all somewhat different; if I can't get this to work with one field then I would have to build complex queries for all the other cores, which would vastly overcomplicate the UI. Is there another way?) Here's the requestHandler in question:

  <requestHandler name="/better" class="solr.StandardRequestHandler">
    <lst name="defaults">
      <str name="indent">true</str>
      <str name="q">_query_:"{!edismax qf=$qfQuery mm=$mmQuery pf=$pfQuery bq=$boostQuery v=$mainQuery}"</str>
      <str name="qfQuery">name^10 description</str>
      <str name="mmQuery">1</str>
      <str name="pfQuery">name description</str>
      <str name="boostQuery">_query_:"{!edismax qf=$boostQuerQf mm=100% v=$mainQuery}"^10</str>
    </lst>
  </requestHandler>

-- View this message in context: http://lucene.472066.n3.nabble.com/how-to-improve-keyword-relevance-tp4079462.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: how to improve (keyword) relevance?
Sure, let's say the user types in test pdf; we need the results with all the query words to be near the top of the result set. The query will look like this: /select?q=text%3Atest+pdf&wt=xml How do I ensure that the top results contain all of the query words? How can I boost the first (or second) term when they are both in the same field (i.e. text)? Does this make sense? Please bear with me; I'm still new to the Solr query syntax, so I don't even know if I'm asking the right question. Thanks, -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-improve-keyword-relevance-tp4079462p4079502.html Sent from the Solr - User mailing list archive at Nabble.com.
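[Editor's sketch: the knob that matches this requirement is edismax's mm (minimum should match) — mm=100% only matches documents containing every query term, and pf re-boosts exact phrases. The parameter values below are illustrative, not from the thread; this just builds the request query string.]

```python
# Build an edismax query that requires all terms (mm=100%) and boosts the
# exact phrase (pf). Field and parameter values are illustrative.
from urllib.parse import urlencode

params = {
    "q": "test pdf",
    "defType": "edismax",
    "qf": "text",    # search only the copyField'ed text field
    "mm": "100%",    # every query term must match
    "pf": "text",    # extra boost when the terms appear as a phrase
}
url = "/select?" + urlencode(params)
print(url)
```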
Is there a way to capture div tag by id?
Let's say I have a div with id=myDiv. Is there a way to set up the Solr update/extract handler to capture just that particular div? -- View this message in context: http://lucene.472066.n3.nabble.com/Is-there-a-way-to-capture-div-tag-by-id-tp4073120.html Sent from the Solr - User mailing list archive at Nabble.com.
how do I capture h1 tags?
I'm currently running solr 4.0 final with manifoldcf 1.3 dev on tomcat 7. I need to capture the h1 tags on each web page, as that is the true title, for lack of a better word. I can't seem to get it to work at all. I read the instructions and used the capture parameter and then mapped it to a field named h1 in the schema. Here's my update/extract handler:

  <requestHandler name="/update/extract" class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
    <lst name="defaults">
      <str name="fmap.content">text</str>
      <str name="fmap.title">solr.title</str>
      <str name="fmap.name">solr.name</str>
      <str name="capture">h1</str>
      <str name="fmap.h1">h1</str>
      <str name="description">comments</str>
      <str name="fmap.Last-Modified">last_modified</str>
      <str name="uprefix">attr_</str>
      <str name="lowernames">true</str>
    </lst>
  </requestHandler>

Can anyone tell me what I'm doing wrong? -- View this message in context: http://lucene.472066.n3.nabble.com/how-do-I-capture-h1-tags-tp4072792.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: how do I capture h1 tags?
Ok, I figured it out: you need to add this too:

  <str name="captureAttr">true</str>

-- View this message in context: http://lucene.472066.n3.nabble.com/how-do-I-capture-h1-tags-tp4072792p4072798.html Sent from the Solr - User mailing list archive at Nabble.com.
How to store the document folder path in solr?
Good afternoon, I'm using solr 4.0 final with manifoldcf v1.2dev on tomcat 7.0.34. Today, a user asked a great question: what if I only know the name of the folder that the documents are in? Can I just search on the folder name? Currently, I'm only indexing documents; how do I capture the folder name (or full path) and store it? I have a variety of repositories: web, RSS, Livelink (I can get the folder hierarchy for this). I guess indexing a file share would be straightforward and the path readily available, but I haven't been asked to index those yet. I'll try to run some tests on network file shares... Can anyone point me in the right direction? Thanks, -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-store-the-document-folder-path-in-solr-tp4063581.html Sent from the Solr - User mailing list archive at Nabble.com.
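[Editor's sketch: if the crawler exposes each document's URL, one option is to compute the parent path at index time and store it in its own field. The "folder" field name and the URL below are made up.]

```python
# Derive the folder portion of a document URL so it can be stored in a
# hypothetical "folder" field and searched directly.
import posixpath
from urllib.parse import urlparse

def folder_of(url):
    """Parent directory of the URL's path component."""
    return posixpath.dirname(urlparse(url).path)

print(folder_of("https://server/docs/reports/2013/budget.pdf"))
```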
How to aggregate data in solr 4.0?
Good afternoon, Does anyone know of a good tutorial on how to perform SQL like aggregation in solr queries? Thanks, -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-aggregate-data-in-solr-4-0-tp4063584.html Sent from the Solr - User mailing list archive at Nabble.com.
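[Editor's sketch, not a tutorial: the usual building blocks in Solr 4.0 are faceting (roughly GROUP BY + COUNT) and the StatsComponent (MIN/MAX/SUM/AVG). The field names below are made up; this just assembles the request parameters.]

```python
from urllib.parse import urlencode

# GROUP BY category, COUNT(*) per bucket:
facet_q = urlencode({"q": "*:*", "rows": 0, "facet": "true",
                     "facet.field": "category"})

# MIN/MAX/SUM/AVG over a numeric field, split per category:
stats_q = urlencode({"q": "*:*", "rows": 0, "stats": "true",
                     "stats.field": "price", "stats.facet": "category"})
print(facet_q)
print(stats_q)
```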
relevance when merging results
Hi, I'm currently using Solr 4.0 final on tomcat v7.0.3x. I have 2 cores (let's call them A and B) and I need to combine them as one for the UI. However, we're having trouble deciding how to best merge these two result sets. Currently, I'm using relevancy to do the merge. For example, I search for red in both cores. Core A has a max score of 0.919856 with 87 results; Core B has a max score of 0.6532563 with 30 results. I would like to simply merge numerically, but I don't know if that's valid. If I merge in numerical order, Core B results won't appear until element 25 or later. I initially thought about just taking the top 5 results from each and layering one on top of the other. Is there a best practice out there for merging relevancy? Please advise... Thanks, -- View this message in context: http://lucene.472066.n3.nabble.com/relevance-when-merging-results-tp4059275.html Sent from the Solr - User mailing list archive at Nabble.com.
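[Editor's sketch: one rough heuristic is to scale each core's scores by that core's own max score before interleaving, so both top documents land at 1.0. Scores from different cores and schemas aren't strictly comparable, so treat this as a starting point, not a best practice.]

```python
def merge_by_normalized_score(results_a, results_b):
    """results_* are (doc_id, score) lists, sorted by score descending.
    Normalize each list by its own max, then sort the union."""
    def normalized(results):
        top = max(score for _, score in results)
        return [(doc, score / top) for doc, score in results]
    merged = normalized(results_a) + normalized(results_b)
    return sorted(merged, key=lambda pair: pair[1], reverse=True)

core_a = [("a1", 0.919856), ("a2", 0.50)]    # max score from the thread
core_b = [("b1", 0.6532563), ("b2", 0.60)]   # max score from the thread
print(merge_by_normalized_score(core_a, core_b))
```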
Re: How to configure shards with SSL?
Ok, We figured it out: The cert wasn't in the trusted CA keystore. I know we put it in there earlier; I don't know why it was missing. But we added it in again and everything works as before. Thanks, -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-configure-shards-with-SSL-tp4054735p4055064.html Sent from the Solr - User mailing list archive at Nabble.com.
How to configure shards with SSL?
Good morning everyone, I'm running solr 4.0 Final with ManifoldCF v1.2dev on tomcat 7.0.37. I had shards up and running over HTTP, but when I migrated to SSL it stopped working. First I got an IOException, but then I changed my configuration in solrconfig.xml to this:

  <requestHandler name="/all" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <str name="wt">xml</str>
      <str name="indent">true</str>
      <str name="q.alt">*:*</str>
      <str name="fl">id, solr.title, content, category, link, pubdateiso</str>
      <str name="shards">dev:7443/solr/ProfilesJava/|dev:7443/solr/C3Files/|dev:7443/solr/Blogs/|dev:7443/solr/Communities/|dev:7443/solr/Wikis/|dev:7443/solr/Bedeworks/|dev:7443/solr/Forums/|dev:7443/solr/Web/|dev:7443/solr/Bookmarks/</str>
    </lst>
    <shardHandlerFactory class="HttpShardHandlerFactory">
      <str name="urlScheme">https://</str>
      <int name="socketTimeOut">1000</int>
      <int name="connTimeOut">5000</int>
    </shardHandlerFactory>
  </requestHandler>

And now I'm getting this error:

org.apache.solr.client.solrj.SolrServerException: No live SolrServers available to handle this request

How do I configure shards with SSL? Thanks, -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-configure-shards-with-SSL-tp4054735.html Sent from the Solr - User mailing list archive at Nabble.com.
detailed Error reporting in Solr
Good morning, I'm currently running Solr 4.0 final with tika v1.2 and Manifoldcf v1.2 dev, and I'm battling Tika XML parse errors again. Solr reports this error:

org.apache.solr.common.SolrException: org.apache.tika.exception.TikaException: XML parse error

which is too vague. I had to manually run the link against the Tika app, and I got a much more detailed error:

Caused by: org.xml.sax.SAXParseException; lineNumber: 4; columnNumber: 105; The entity "nbsp" was referenced, but not declared.

So there are old-school non-breaking spaces in the HTML that Tika can't handle. For example:

  <li> Cyber Systems and Technology&nbsp;&rsaquo; /mission/CST/CST.html </li>

My question is twofold: 1) how do I get Solr to report more detailed errors, and 2) how do I get Tika to accept (or ignore) &nbsp;? Thanks, -- View this message in context: http://lucene.472066.n3.nabble.com/detailed-Error-reporting-in-Solr-tp4053821.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: detailed Error reporting in Solr
Ok, one possible fix is to declare the XML equivalent of &nbsp; in the document's internal DTD subset:

  <?xml version="1.0"?>
  <!DOCTYPE some_name [
    <!ENTITY nbsp "&#160;">
  ]>

But how do I add this into the Tika configuration? -- View this message in context: http://lucene.472066.n3.nabble.com/detailed-Error-reporting-in-Solr-tp4053821p4053823.html Sent from the Solr - User mailing list archive at Nabble.com.
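[Editor's sketch, as a sanity check that the prologue fixes the parse: Python's stdlib expat parser stands in for Tika's XML parser here, and the test document is invented. Whether Tika can be fed such a prologue is a separate question.]

```python
# A strict XML parser rejects an undeclared &nbsp;, but resolves it once
# the entity is declared in the internal DTD subset.
import xml.etree.ElementTree as ET

doc = """<?xml version="1.0"?>
<!DOCTYPE html [ <!ENTITY nbsp "&#160;"> ]>
<li>Cyber Systems and Technology&nbsp;</li>"""

li = ET.fromstring(doc)
print(repr(li.text))  # the entity resolves to U+00A0
```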
Re: detailed Error reporting in Solr
Yes, that's it exactly. I crawled a link with these (&nbsp;&rsaquo;) in each list item; Solr couldn't handle it, threw the XML parse error, and the crawler terminated the job. Is this fixable? Or do I have to submit a bug to the Tika folks? Thanks, -- View this message in context: http://lucene.472066.n3.nabble.com/detailed-Error-reporting-in-Solr-tp4053821p4053882.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: query builder for solr UI?
Sorry, the easiest way to describe it is that we specifically want a Google-like experience: if the end user types in a phrase, or quotes, or +, - (for AND, NOT), etc., the UI should be flexible enough to build the correct Solr query syntax. How will edismax help? I tried simplifying queries by using copyField to copy all of the metadata to the text field, so now the only field we have to query is the text field, but I doubt that is going to be a panacea. Does that make sense? Thanks, -- View this message in context: http://lucene.472066.n3.nabble.com/query-builder-for-solr-UI-tp4043481p4043643.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: query builder for solr UI?
Good question. If the user types in a special character like the dash (-), how will I know to treat it as a literal dash or as the NOT operator? The first one needs to be URL-encoded; the second one doesn't, resulting in very different queries. So I apologize for not being clearer: what I'm really after is making it easy for the user to communicate what exactly they are looking for, and encoding their input correctly. That's what I meant by query building. Thanks, -- View this message in context: http://lucene.472066.n3.nabble.com/query-builder-for-solr-UI-tp4043481p4043659.html Sent from the Solr - User mailing list archive at Nabble.com.
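[Editor's sketch of the "literal dash vs. NOT operator" decision: backslash-escape the classic Lucene query-parser metacharacters in raw user input, and only leave them unescaped when the UI knows the user meant an operator.]

```python
import re

# +, -, !, (, ), {, }, [, ], ^, ", ~, *, ?, :, \, /, &&, || are query syntax
# in the classic Lucene query parser.
SPECIAL = re.compile(r'(&&|\|\||[+\-!(){}\[\]^"~*?:\\/])')

def escape_query(user_input):
    """Backslash-escape metacharacters so they are searched literally."""
    return SPECIAL.sub(r"\\\1", user_input)

print(escape_query("rapid-prototyping"))  # rapid\-prototyping
```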
query builder for solr UI?
Good day, Currently we are building a front end for solr (in jquery, html, and css) and I'm struggling with making a query builder that can handle pretty much whatever the end user types into the search box. does something like this already exist in javascript/jquery? Thanks, -- View this message in context: http://lucene.472066.n3.nabble.com/query-builder-for-solr-UI-tp4043481.html Sent from the Solr - User mailing list archive at Nabble.com.
Solr 4.0 is stripping XML format from RSS content field
Hi, I'm running solr 4.0 final with manifoldcf 1.1, and I verified via Fiddler that Manifold is indeed sending the content field from an RSS feed that contains XML data. However, when I query the index, the content field is there with just the data; the XML structure is gone. Does anyone know how to stop Solr from doing this? I'm using Tika, but I don't see it in the update/extract handler. Can anyone point me in the right direction? Thanks, -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-4-0-is-stripping-XML-format-from-RSS-content-field-tp4039809.html Sent from the Solr - User mailing list archive at Nabble.com.
Can you call the elevation component in another requesthandler?
Good day, I got my elevation component working with the /elevate handler. However, I would like to add the elevation component to my main search handler, which is currently /query, so I can have one handler return everything (elevated items with regular search results; i.e. one-stop shopping, so to speak). This is what I tried:

  <requestHandler name="/query" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <str name="wt">xml</str>
      <str name="indent">true</str>
      <str name="df">text</str>
    </lst>
    <arr name="last-components">
      <str>elevator</str>
      <str>manifoldCFSecurity</str>
    </arr>
  </requestHandler>

I also tried it in first-components as well. Is there any way to combine these? Otherwise the UI will have to make separate ajax calls, and we're trying to minimize that. Thanks, -- View this message in context: http://lucene.472066.n3.nabble.com/Can-you-call-the-elevation-component-in-another-requesthandler-tp4039054.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Can you call the elevation component in another requesthandler?
Update: Ok, if I search for gangnam style in the /query handler by itself, elevation works! If I search with gangnam style and/or something else, the elevation component doesn't work but the rest of the query does. Here are the examples:

works:
/query?q=gangnam+style&fl=*,[elevated]&wt=xml&start=0&rows=50&debugQuery=true&dismax=true

elevation fails:
/query?q=gangnam+style+OR+title%3A*White*&fl=*,[elevated]&wt=xml&start=0&rows=50&debugQuery=true&dismax=true

So I guess I have to do separate queries at this point. Is there a way to combine these 2 request handlers? Thanks, -- View this message in context: http://lucene.472066.n3.nabble.com/Can-you-call-the-elevation-component-in-another-requesthandler-tp4039054p4039076.html Sent from the Solr - User mailing list archive at Nabble.com.
Multicore search with ManifoldCF security not working
Good morning, I used this post to search 2 different cores and return one data set: http://stackoverflow.com/questions/2139030/search-multiple-solr-cores-and-return-one-result-set The good news is that it worked! The bad news is that one of the cores is Opentext and the ManifoldCF security check isn't firing! So users could see documents that they aren't supposed to. The Opentext security works if I call the core handler individually; it fails for the merged result. I need to find a way to get the AuthenticatedUserName parameter to the Opentext core. Here's my /query handler for the merged result:

  <requestHandler name="/query" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="q.alt">*:*</str>
      <str name="fl">id, attr_general_name, attr_general_owner, attr_general_creator, attr_general_modifier, attr_general_description, attr_general_creationdate, attr_general_modifydate, solr.title, content, category, link, pubdateiso</str>
      <str name="shards">localhost:8080/solr/opentext/,localhost:8080/solr/Profiles/</str>
    </lst>
    <arr name="last-components">
      <str>manifoldCFSecurity</str>
    </arr>
  </requestHandler>

As you can see, I tried calling manifoldCFSecurity first and it didn't work. I was thinking perhaps I could call the shards directly in the URL and put the AuthenticatedUserName on the Opentext shard, but I'm getting pulled in different directions currently. Can anyone point me in the right direction? Thanks, -- View this message in context: http://lucene.472066.n3.nabble.com/Multicore-search-with-ManifoldCF-security-not-working-tp4036776.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Multicore search with ManifoldCF security not working
I'm sorry, I don't know what you mean. I clicked on the hidden email link, filled out the form and when I hit submit; I got this error: Domain starts with dot Please fix the error and try again. Who exactly am I sending this to and how do I get the form to work? -- View this message in context: http://lucene.472066.n3.nabble.com/Multicore-search-with-ManifoldCF-security-not-working-tp4036776p4036829.html Sent from the Solr - User mailing list archive at Nabble.com.
How to use SolrAjax with multiple cores?
Hi, I need to build a UI that can access multiple cores. And combine them all on an Everything tab. The solrajax example only has 1 core. How do I setup multicore with solrajax? Do I setup 1 manager per core? How much of a performance hit will I take with multiple managers running? Is there a better way to do this? Is there a better UI to use? Can anyone point me in the right direction? Thanks, -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-use-SolrAjax-with-multiple-cores-tp4036840.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Searching for field that contains multiple values
All I had to do was put a wildcard before and after the search term and it would succeed (*Maritime*). Searching multi-valued fields wouldn't work any other way. Like so: http://localhost:8080/solr/Blogs/select?q=title%3A*Maritime*&wt=xml But I'll check out those other suggestions... Thanks, -- View this message in context: http://lucene.472066.n3.nabble.com/Searching-for-field-that-contains-multiple-values-tp4033944p4036854.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: error initializing QueryElevationComponent
In case anyone was wondering, the solution is to HTML-encode the URL. Solr didn't like the &'s; just convert them to &amp; and it works! -- View this message in context: http://lucene.472066.n3.nabble.com/error-initializing-QueryElevationComponent-tp4035194p4036261.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: SolrException: Error loading class 'org.apache.solr.response.transform.EditorialMarkerFactory'
Thanks, that worked. So the documentation needs to be fixed in a few places (the Solr wiki and the default solrconfig.xml in Solr 4.0 final; I didn't check any other versions). I'll either open a new ticket in JIRA to request a fix or reopen the old one... Furthermore, I tried using the ElevatedMarkerFactory and it didn't behave the way I thought it would. This

http://localhost:8080/solr/Lisa/elevate?q=foo+bar&wt=xml&defType=dismax

got me all the doc info but no elevated marker. I ran this

http://localhost:8080/solr/Lisa/elevate?q=foo+bar&fl=[elevated]&wt=xml&defType=dismax

and all I got was response = 1 and elevated = true. I had to run this to get all of the above info:

http://localhost:8080/solr/Lisa/elevate?q=foo+bar&fl=*,[elevated]&wt=xml&defType=dismax

-- View this message in context: http://lucene.472066.n3.nabble.com/SolrException-Error-loading-class-org-apache-solr-response-transform-EditorialMarkerFactory-tp4035203p4035621.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: SolrException: Error loading class 'org.apache.solr.response.transform.EditorialMarkerFactory'
Good morning, I can't seem to figure out how to load this class. Can someone please point me in the right direction? Thank you, -- View this message in context: http://lucene.472066.n3.nabble.com/SolrException-Error-loading-class-org-apache-solr-response-transform-EditorialMarkerFactory-tp4035203p4035330.html Sent from the Solr - User mailing list archive at Nabble.com.
error initializing QueryElevationComponent
Hi, I'm trying to test out the QueryElevationComponent. elevate.xml is referenced in solrconfig.xml and it's in the conf directory; I left the defaults. I added this to elevate.xml:

  <elevate>
    <query text="foo bar">
      <doc id="https://opentextdev/cs/llisapi.dll?func=ll&objID=577575&objAction=download" />
    </query>
  </elevate>

id is a string set up as the uniqueKey. And I get this error:

16:25:48 SEVERE Config Exception during parsing file: elevate.xml:org.xml.sax.SAXParseException; systemId: solrres:/elevate.xml; lineNumber: 28; columnNumber: 77; The reference to entity "objID" must end with the ';' delimiter.
16:25:48 SEVERE SolrCore java.lang.NullPointerException
16:25:48 SEVERE CoreContainer Unable to create core: Lisa
16:25:48 SEVERE CoreContainer null:org.apache.solr.common.SolrException: Error initializing QueryElevationComponent.

What am I doing wrong? -- View this message in context: http://lucene.472066.n3.nabble.com/error-initializing-QueryElevationComponent-tp4035194.html Sent from the Solr - User mailing list archive at Nabble.com.
SolrException: Error loading class 'org.apache.solr.response.transform.EditorialMarkerFactory'
Hi, This is related to my earlier question regarding the elevation component. I tried turning this on:

<!-- If you are using the QueryElevationComponent, you may wish to mark documents that get boosted. The EditorialMarkerFactory will do exactly that: -->
<transformer name="qecBooster" class="org.apache.solr.response.transform.EditorialMarkerFactory" />

but it fails to load this class. I'm using Solr 4.0 final. How do I get this to load? Thanks, -- View this message in context: http://lucene.472066.n3.nabble.com/SolrException-Error-loading-class-org-apache-solr-response-transform-EditorialMarkerFactory-tp4035203.html Sent from the Solr - User mailing list archive at Nabble.com.
Solr multicore aborts with socket timeout exceptions
I'm currently running Solr 4.0 final on Tomcat v7.0.34 with ManifoldCF v1.2 dev running on Jetty. I have Solr multicore set up with 10 cores. (Is this too much?) So I also have at least 10 connectors set up in ManifoldCF (1 per core, 10 JVMs per connection). From the look of it, Solr couldn't handle all the data that ManifoldCF was sending it and the connection would abort with socket timeout exceptions. I tried increasing maxThreads to 200 on Tomcat and it didn't work. In the ManifoldCF throttling section, I decreased the number of JVMs per connection from 10 down to 1, and not only did the crawl speed up significantly, the socket exceptions went away (for the most part). Here's the ticket for this issue: https://issues.apache.org/jira/browse/CONNECTORS-608 My question is this: how do I increase the number of connections on the Solr side so I can run multiple ManifoldCF jobs concurrently without aborts or timeouts? The ManifoldCF team did mention that there was a committer who had socket timeout exceptions in a newer version of Solr and he fixed it by increasing the timeout window. I'm looking for that patch if available. Thanks, -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-multicore-aborts-with-socket-timeout-exceptions-tp4034250.html Sent from the Solr - User mailing list archive at Nabble.com.
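[Editor's note: on the Tomcat side, the relevant knobs live on the HTTP connector in server.xml. A sketch, with illustrative values rather than anything taken from the thread:]

```xml
<!-- server.xml: raise the worker-thread limit, the backlog of queued
     connections, and the socket timeout (in milliseconds) -->
<Connector port="8080" protocol="HTTP/1.1"
           maxThreads="400"
           acceptCount="200"
           connectionTimeout="60000" />
```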
Why do I keep seeing org.apache.solr.core.SolrCore execute in the tomcat logs
I keep seeing these in the Tomcat logs: Jan 17, 2013 3:57:33 PM org.apache.solr.core.SolrCore execute INFO: [Lisa] webapp=/solr path=/admin/logging params={since=1358453312320&wt=json} status=0 QTime=0 I'm just curious: what is getting executed here? I'm not running any queries against this core or using it in any way currently. -- View this message in context: http://lucene.472066.n3.nabble.com/Why-do-I-keep-seeing-org-apache-solr-core-SolrCore-execute-in-the-tomcat-logs-tp4034353.html Sent from the Solr - User mailing list archive at Nabble.com.
Tutorial for Solr query language, dismax and edismax?
Does anyone have a great tutorial for learning the Solr query language, dismax, and edismax? I've searched endlessly but I haven't been able to locate one that is comprehensive enough and has a lot of examples (that actually work!). I also tried to use wildcards, logical operators, and a phrase search, and it either didn't work or didn't behave the way I thought it would. For example, I tried to search a multivalued field solr.title and a content field that contains their phone number (and a lot of other data). So, from the Solr admin query page, in the q field I tried lots of variations of this: solr.title:*Costa, Julie* AND content:tel= And I either got 0 results or ALL the results. solr.title would only work if I put in solr.title:*Costa* but not anything longer than that, even though there are plenty of Costa, J's (John, Julie, Julia, Jerry, etc). I should be able to do a phrase search out of the box, shouldn't I? I also read on one site that only edismax can use logical operators, but I couldn't get that to work either. Can anyone point me in the right direction? I'm currently using Solr 4.0 Final with ManifoldCF v1.2 dev Thank you, -- View this message in context: http://lucene.472066.n3.nabble.com/Tutorial-for-Solr-query-language-dismax-and-edismax-tp4033465.html Sent from the Solr - User mailing list archive at Nabble.com.
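[Editor's note: in standard Lucene/Solr syntax a phrase is double-quoted; `*Costa, Julie*` is wildcard syntax, not phrase syntax. A minimal sketch of building such a query string (field names are taken from the post and the term is simplified; this only demonstrates quoting and URL-encoding, not a confirmed fix):]

```python
from urllib.parse import urlencode

# A phrase is double-quoted; wildcards around it are not phrase syntax.
params = {
    "q": 'solr.title:"Costa, Julie" AND content:tel',
    "defType": "edismax",
    "wt": "json",
}
query_string = urlencode(params)
print(query_string)
```

The quotes and comma are percent-encoded by `urlencode`, so the phrase survives intact in the request URL.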
ivy errors trying to build solr from trunk
I downloaded the latest from Solr trunk, applied a patch, cd'd to the solr dir, and ran ant dist. I get these ivy errors:

ivy-availability-check:
[echo] Building analyzers-phonetic...
ivy-fail:
[echo] This build requires Ivy and Ivy could not be found in your ant classpath.
[echo] (Due to classpath issues and the recursive nature of the Lucene/Solr
[echo] build system, a local copy of Ivy can not be used and loaded dynamically
[echo] by the build.xml)
[echo] You can either manually install a copy of Ivy 2.2.0 in your ant classpath:
[echo]   http://ant.apache.org/manual/install.html#optionalTasks
[echo] Or this build file can do it for you by running the Ivy Bootstrap target:
[echo]   ant ivy-bootstrap
[echo]
[echo] Either way you will only have to install Ivy one time.
[echo] 'ant ivy-bootstrap' will install a copy of Ivy into your Ant User Library:
[echo]   C:\Users\da24005/.ant/lib
[echo]
[echo] If you would prefer, you can have it installed into an alternative
[echo] directory using the -Divy_install_path=/some/path/you/choose option,
[echo] but you will have to specify this path every time you build Lucene/Solr
[echo] in the future...
[echo]   ant ivy-bootstrap -Divy_install_path=/some/path/you/choose
[echo]   ...
[echo]   ant -lib /some/path/you/choose clean compile
[echo] If you have already run ivy-bootstrap, and still get this message, please
[echo] try using the --noconfig option when running ant, or editing your global
[echo] ant config to allow the user lib to be loaded. See the wiki for more details:
[echo]   http://wiki.apache.org/lucene-java/HowToContribute#antivy
BUILD FAILED

I tried the ivy-bootstrap target but I still get the same error, and I have the ivy jar in the ant lib directory. What am I doing wrong? And it says to use --noconfig if ivy-bootstrap didn't work. Well, --noconfig is not a valid ant command. Where/how do I use it?
-- View this message in context: http://lucene.472066.n3.nabble.com/ivy-errors-trying-to-build-solr-from-trunk-tp4032300.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: ivy errors trying to build solr from trunk
Ok, the old problem was that Eclipse was using a different version of ant (1.8.3). I dropped the ivy jar in the build path and now I get these errors:

[ivy:retrieve] ERRORS
[ivy:retrieve] Server access Error: Connection timed out: connect url=http://repo1.maven.org/maven2/commons-codec/commons-codec/1.7/commons-codec-1.7.pom
[ivy:retrieve] Server access Error: Connection timed out: connect url=http://repo1.maven.org/maven2/commons-codec/commons-codec/1.7/commons-codec-1.7.jar
[ivy:retrieve] Server access Error: Connection timed out: connect url=http://oss.sonatype.org/content/repositories/releases/commons-codec/commons-codec/1.7/commons-codec-1.7.pom
[ivy:retrieve] Server access Error: Connection timed out: connect url=http://oss.sonatype.org/content/repositories/releases/commons-codec/commons-codec/1.7/commons-codec-1.7.jar
[ivy:retrieve] Server access Error: Connection timed out: connect url=http://mirror.netcologne.de/maven2/commons-codec/commons-codec/1.7/commons-codec-1.7.pom
[ivy:retrieve] Server access Error: Connection timed out: connect url=http://mirror.netcologne.de/maven2/commons-codec/commons-codec/1.7/commons-codec-1.7.jar

Apparently, I can't get to Maven since I'm behind a firewall. Are the Solr deps available for manual download somewhere? -- View this message in context: http://lucene.472066.n3.nabble.com/ivy-errors-trying-to-build-solr-from-trunk-tp4032300p4032332.html Sent from the Solr - User mailing list archive at Nabble.com.
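[Editor's note: Ivy downloads through Ant's JVM, so a common workaround behind a firewall is to pass the standard JVM proxy properties via ANT_OPTS. The proxy host and port below are placeholders, not values from the thread:]

```shell
# Hypothetical proxy host/port -- substitute your site's values.
export ANT_OPTS="-Dhttp.proxyHost=proxy.example.com -Dhttp.proxyPort=8080"
echo "$ANT_OPTS"
# then rerun the build, e.g.: ant dist
```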
solr invalid date string
I'm currently running Solr 4.0 alpha with ManifoldCF v1.1 dev. ManifoldCF is sending Solr the datetime as milliseconds elapsed since 1-1-1970. I've tried setting several date.formats in the extraction handler, but I always get this error and the ManifoldCF crawl aborts:

SolrCore org.apache.solr.common.SolrException: Invalid Date String:'134738361' at org.apache.solr.schema.DateField.parseMath(DateField.java:174) at org.apache.solr.schema.TrieField.createField(TrieField.java:540)

Here's my extraction handler:

<requestHandler name="/update/extract" class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="fmap.content">text</str>
    <str name="fmap.title">solr.title</str>
    <str name="fmap.name">solr.name</str>
    <str name="link">link</str>
    <str name="fmap.pubdate">pubdate</str>
    <str name="summary">summary</str>
    <str name="description">comments</str>
    <str name="published">published</str>
    <str name="fmap.Last-Modified">last_modified</str>
    <str name="uprefix">attr_</str>
    <str name="lowernames">true</str>
    <str name="fmap.div">ignored_</str>
  </lst>
  <lst name="date.formats">
    <str>yyyy-MM-dd</str>
    <str>yyyy-MM-dd'T'HH:mm:ss.SSS'Z'</str>
  </lst>
</requestHandler>

Here's pubdate in the schema:

<field name="pubdate" type="date" indexed="true" stored="true" multiValued="true"/>

The dates are already in UTC time; they're just in milliseconds... What am I doing wrong? -- View this message in context: http://lucene.472066.n3.nabble.com/solr-invalid-date-string-tp4031661.html Sent from the Solr - User mailing list archive at Nabble.com.
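[Editor's note: Solr's DateField accepts ISO-8601 strings, not epoch milliseconds, so whatever feeds Solr would need to render the value first. A minimal sketch of the conversion; the helper name is made up for illustration, not part of any Solr or ManifoldCF API:]

```python
from datetime import datetime, timezone

def epoch_ms_to_solr_date(ms: int) -> str:
    """Render epoch milliseconds in Solr's canonical yyyy-MM-dd'T'HH:mm:ss.SSS'Z' form."""
    # Split integer milliseconds exactly rather than dividing into a float.
    sec, msec = divmod(ms, 1000)
    dt = datetime.fromtimestamp(sec, tz=timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:%M:%S") + f".{msec:03d}Z"

print(epoch_ms_to_solr_date(0))  # 1970-01-01T00:00:00.000Z
print(epoch_ms_to_solr_date(1358453312320))
```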
Re: solr invalid date string
I'll certainly ask the ManifoldCF folks if they can send the date in the correct format. Meanwhile, how would I create an update processor to change the format of a date? Are there any decent examples out there? Thanks, -- View this message in context: http://lucene.472066.n3.nabble.com/solr-invalid-date-string-tp4031661p4031669.html Sent from the Solr - User mailing list archive at Nabble.com.
is there an easy way to upgrade from Solr 4 alpha to 4.0 final?
I just found out I must upgrade to Solr 4.0 final (from 4.0 alpha). I'm currently running Solr 4.0 alpha on Tomcat 7. Is there an easy way to surgically replace files and upgrade, or should I completely start over with a fresh install? Ideally, I'm looking for a set of steps... Thanks, -- View this message in context: http://lucene.472066.n3.nabble.com/is-there-an-easy-way-to-upgrade-from-Solr-4-alpha-to-4-0-final-tp4031682.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Too many Tika errors
Ok, I managed to fix it. The universal charset error is caused by a missing dependency: just download universalchardet-1.0.3.jar and put it in your extraction lib. The Microsoft errors will probably be fixed in a future release of the POI jars (v3.9 didn't fix this error). -- View this message in context: http://lucene.472066.n3.nabble.com/Too-many-Tika-errors-tp4026126p4026347.html Sent from the Solr - User mailing list archive at Nabble.com.
Too many Tika errors
I'm running Solr 4.0 on Tomcat 7.0.8 (and I'm running the solr/example single core as well) with ManifoldCF v1.1. I had everything working, but then the crawler stops and I have Tika errors in the Solr log. I had Tika 1.1 and that produces these errors: org.apache.solr.common.SolrException: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@17bc9c03 So I upgraded to Tika 1.2 and again everything seemed to be working (I indexed 24,000 files). Then I recrawled the repository and again it stops; this time the Tika errors are: null:java.lang.RuntimeException: java.lang.NoClassDefFoundError: org/mozilla/universalchardet/CharsetListener at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:456) What's going on here? What version of Tika should I use? -- View this message in context: http://lucene.472066.n3.nabble.com/Too-many-Tika-errors-tp4026126.html Sent from the Solr - User mailing list archive at Nabble.com.
How do you get the document name from Open Text?
I'm using Solr 4.0 with ManifoldCF .5.1 crawling Open Text v10.5. I have the cats/atts turned on in Open Text and I can see them all in the Solr index. However, the id is just the URL to download the doc from Open Text, and the document name (either from Open Text or the document properties) is nowhere to be found. I tried using resourceName in the solrconfig.xml as it was described in the manual, but it doesn't work. I used this:

<requestHandler name="/update/extract" class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="fmap.content">text</str>
    <str name="fmap.Last-Modified">last_modified</str>
    <str name="uprefix">attr_</str>
    <str name="resourceName">File Name</str>
    <str name="lowernames">true</str>
  </lst>
</requestHandler>

but all I get is "File Name" in resourceName. Should I leave the value blank or is there some other field I should use? Please advise -- View this message in context: http://lucene.472066.n3.nabble.com/How-do-you-get-the-document-name-from-Open-Text-tp3998908.html Sent from the Solr - User mailing list archive at Nabble.com.
Solr not getting OpenText document name and metadata
Hi, I'm currently using ManifoldCF (v.5.1) to crawl OpenText (v10.5), and the output is sent to Solr (4.0 alpha). All I see in the index is an id equal to the OpenText download URL and a version (a big integer value). What I don't see is the document name from OpenText or any of the OpenText metadata. Does anyone know how I can get this data? Because I can't even search by document name or by document extension! Only a few of the documents actually have a title in the Solr index, but the OpenText name of the document is nowhere to be found. If I know some text within the document, I can search for that. I'm using the default schema with Tika as the extraction handler. I'm also using uprefix = attr to get all of the ignored properties, but most of those are useless. Please advise... -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-not-getting-OpenText-document-name-and-metadata-tp3997786.html Sent from the Solr - User mailing list archive at Nabble.com.