Re: Fault tolerant Solr replication architecture
Hi Parvin, Fault tolerance is something you need to decide based on your requirements. At some point some manual intervention may be required to recover from a crash. You need to decide what degree of fault tolerance you can support; it certainly may not be 100%. Network failures can be handled, but crashes are much harder. Consider you have one master and two slaves. You could put a load balancer in front of the slaves, so that you can do round-robin or fail-over between them. If you are not using a load balancer then you should handle this in your application. If the master crashes, then you may need to rebuild the index, but the chances of that are low. Regards Aditya www.findbestopensource.com On Mon, May 21, 2012 at 12:55 PM, Parvin Gasimzade parvin.gasimz...@gmail.com wrote: Hi, I am using solr with replication. I have one master that indexes data and two slaves which pulls index from master and responds to the queries. My question is, how can i create fault tolerant architecture? I mean what should i do when master server crashes? I heard that repeater is used for this type of architecture. Then, do I have to create one master, one slave with repeater and one slave? Another question is, if master crashes then does slave with repeater start indexing authomatically or should i configure it manually? I asked similar question on the stackoverflow : http://stackoverflow.com/questions/10597053/fault-tolerant-solr-replication-architecture Any help will be appreciated. Regards, Parvin
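For reference, a repeater in Solr 3.x replication is simply a core whose /replication handler is configured as both master and slave. A minimal sketch for solrconfig.xml follows; the host name, poll interval and conf file list are placeholders, not taken from Parvin's setup:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <!-- master section: lets downstream slaves pull from this core -->
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
  <!-- slave section: this core itself polls the current master -->
  <lst name="slave">
    <str name="masterUrl">http://current-master:8983/solr/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>

If the master dies, the repeater already holds a recent copy of the index; promoting it means repointing the remaining slaves' masterUrl at the repeater and sending new indexing traffic to it, which is a manual or scripted step rather than an automatic fail-over.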
Re: using Carrot2 custom ITokenizerFactory
Hi Koji, Dawid came up with a simple fix for this, it's committed to trunk and 3.6 branch. Staszek On Sun, May 20, 2012 at 5:15 PM, Koji Sekiguchi k...@r.email.ne.jp wrote: Hi Staszek, Thank you for the fix so quickly! As a trial, I set: str name=PreprocessingPipeline.tokenizerFactoryorg.apache.solr.handler.clustering.carrot2.LuceneCarrot2TokenizerFactory/str then I could start Solr without error. But when I make a request: http://localhost:8983/solr/clustering?q=*%3A*&version=2.2&start=0&rows=10&indent=on&wt=json&fl=id&carrot.produceSummary=false I got an exception:
org.apache.solr.common.SolrException: Carrot2 clustering failed
at org.apache.solr.handler.clustering.carrot2.CarrotClusteringEngine.cluster(CarrotClusteringEngine.java:224)
at org.apache.solr.handler.clustering.ClusteringComponent.process(ClusteringComponent.java:91)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:186)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:244)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1376)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:365)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:260)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by: org.carrot2.core.ComponentInitializationException: org.carrot2.util.attribute.AttributeBindingException: Could not assign field org.carrot2.text.preprocessing.pipeline.CompletePreprocessingPipeline#tokenizerFactory with value org.apache.solr.handler.clustering.carrot2.LuceneCarrot2TokenizerFactory
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at org.carrot2.util.ExceptionUtils.wrapAs(ExceptionUtils.java:63)
at org.carrot2.core.PoolingProcessingComponentManager$ComponentInstantiationListener.objectInstantiated(PoolingProcessingComponentManager.java:234)
at org.carrot2.core.PoolingProcessingComponentManager$ComponentInstantiationListener.objectInstantiated(PoolingProcessingComponentManager.java:169)
at org.carrot2.util.pool.SoftUnboundedPool.borrowObject(SoftUnboundedPool.java:83)
at org.carrot2.core.PoolingProcessingComponentManager.prepare(PoolingProcessingComponentManager.java:128)
at org.carrot2.core.Controller.process(Controller.java:333)
at org.carrot2.core.Controller.process(Controller.java:240)
at org.apache.solr.handler.clustering.carrot2.CarrotClusteringEngine.cluster(CarrotClusteringEngine.java:220)
... 24 more
Caused by: org.carrot2.util.attribute.AttributeBindingException: Could not assign field org.carrot2.text.preprocessing.pipeline.
Re: No Effect of omitNorms and omitTermFreqAndPositions when using MLT handler?
Ahh, this is because I have to override DefaultSimilarity to turn off tf/idf scoring? But this will apply to all the fields and general search on text fields as well? Is there a way to apply custom similarity to specific field types or fields only? Is there no way of turning TF/IDF off without this? Thanks, Ravish On Mon, May 21, 2012 at 10:24 AM, Ravish Bhagdev ravish.bhag...@gmail.comwrote: Hi All, I was wondering if omitNorms will have any effect on MLT handler at all? I'm using schema version 1.2 with Solr 1.4 and have defined couple of fields, which I want to use for MLT lookup and don't want factors like field length or TF/IDF to affect the scores. The definitions are as below: fieldType name=lowercase class=solr.TextField positionIncrementGap=100 omitNorms=true omitTermFreqAndPositions=true analyzer tokenizer class=solr.KeywordTokenizerFactory/ filter class=solr.LowerCaseFilterFactory / /analyzer /fieldType fieldType name=text_nonorms class=solr.TextField positionIncrementGap=100 omitNorms=true omitTermFreqAndPositions=true analyzer type=index tokenizer class=solr.WhitespaceTokenizerFactory / filter class=solr.StopFilterFactory ignoreCase=true words=stopwords.txt / filter class=solr.WordDelimiterFilterFactory generateWordParts=1 generateNumberParts=1 catenateWords=1 catenateNumbers=1 catenateAll=0 / filter class=solr.LowerCaseFilterFactory / filter class=solr.EnglishPorterFilterFactory protected=protwords.txt / filter class=solr.RemoveDuplicatesTokenFilterFactory / /analyzer analyzer type=query tokenizer class=solr.WhitespaceTokenizerFactory / filter class=solr.SynonymFilterFactory synonyms=synonyms.txt ignoreCase=true expand=true / filter class=solr.StopFilterFactory ignoreCase=true words=stopwords.txt / filter class=solr.WordDelimiterFilterFactory generateWordParts=1 generateNumberParts=1 catenateWords=0 catenateNumbers=0 catenateAll=0 / filter class=solr.LowerCaseFilterFactory / filter class=solr.EnglishPorterFilterFactory protected=protwords.txt / filter class=solr.RemoveDuplicatesTokenFilterFactory / /analyzer /fieldType !-- and the fields that use the above field types are -- field name=PROFILE_TAGS type=lowercase indexed=true stored=true multiValued=true termVectors=true/ field name=PROFILE_TAGS_TXT type=text_nonorms indexed=true stored=true multiValued=true termVectors=true/ In My solrconfig.xml I have defined following for my MLT request handler: requestHandler name=/mlt class=solr.MoreLikeThisHandler lst name=defaults str name=mlt.flPROFILE_TAGS,PROFILE_TAGS_TXT/str str name=mlt.qfPROFILE_TAGS^10.0 PROFILE_TAGS_TXT^2.0/str int name=mlt.mindf1/int int name=mlt.mintf1/int str name=flid,score/str str name=mlt.flPROFILE_TAGS,PROFILE_TAGS_TXT/str /lst /requestHandler However, when I run my query as follows: http://localhost:9090/solr/mlt?fl=*,scorestart=0q=id:4417454.matchRecordqt=/mltfq=targetDB:ConnectMeDBrows=1000debugQuery=on the debug scoring info shows following: str name=5042172.matchRecord 0.17156276 = (MATCH) product of: 1.4296896 = (MATCH) sum of: 0.24737607 = (MATCH) weight(PROFILE_TAGS_TXT:system^5.0 in 1472), product of: 0.06376338 = queryWeight(PROFILE_TAGS_TXT:system^5.0), product of: 5.0 = boost 3.8795946 = idf(docFreq=538, maxDocs=9598) 0.0032871156 = queryNorm 3.8795946 = (MATCH) fieldWeight(PROFILE_TAGS_TXT:system in 1472), product of: 1.0 = tf(termFreq(PROFILE_TAGS_TXT:system)=1) 3.8795946 = idf(docFreq=538, maxDocs=9598) 1.0 = fieldNorm(field=PROFILE_TAGS_TXT, doc=1472) 0.65193653 = (MATCH) weight(PROFILE_TAGS_TXT:adapt^5.0 in 1472), product of: 0.10351306 
= queryWeight(PROFILE_TAGS_TXT:adapt^5.0), product of: 5.0 = boost 6.298109 = idf(docFreq=47, maxDocs=9598) 0.0032871156 = queryNorm 6.298109 = (MATCH) fieldWeight(PROFILE_TAGS_TXT:adapt in 1472), product of: 1.0 = tf(termFreq(PROFILE_TAGS_TXT:adapt)=1) 6.298109 = idf(docFreq=47, maxDocs=9598) 1.0 = fieldNorm(field=PROFILE_TAGS_TXT, doc=1472) 0.530377 = (MATCH) weight(PROFILE_TAGS_TXT:optic^5.0 in 1472), product of: 0.093365155 = queryWeight(PROFILE_TAGS_TXT:optic^5.0), product of: 5.0 = boost 5.6806736 = idf(docFreq=88, maxDocs=9598) 0.0032871156 = queryNorm 5.6806736 = (MATCH) fieldWeight(PROFILE_TAGS_TXT:optic in 1472), product of: 1.0 = tf(termFreq(PROFILE_TAGS_TXT:optic)=1) 5.6806736 = idf(docFreq=88, maxDocs=9598) 1.0 = fieldNorm(field=PROFILE_TAGS_TXT, doc=1472) 0.12 = coord(3/25) /str Which seems to suggest that the TF/IDF is being performed on these fields! Also, does it make any difference if I specify omitNorms in field definition vs specifying in fieldType definition? I will appreciate any help with this. Thanks, Ravish
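For reference, Solr 1.4 only allows the Similarity to be swapped globally in schema.xml, not per field or per field type. A minimal sketch follows; the class name is hypothetical and would be a custom DefaultSimilarity subclass returning 1.0f from tf(), idf() and lengthNorm():

<!-- schema.xml, top level: in Solr 1.4 this applies to the whole index -->
<similarity class="com.example.NoTfIdfSimilarity"/>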
org.apache.solr.common.SolrException: ERROR: [doc=null] missing required field: id
Hi, I am getting this error: [doc=null] missing required field: id request: http://localhost:8983/solr/update?wt=javabinversion=2 at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:430) at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244) at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105) at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:49) at org.apache.nutch.indexer.solr.SolrWriter.close(SolrWriter.java:93) at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:48) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:474) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411) at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216) 2012-05-21 11:44:29,953 ERROR solr.SolrIndexer - java.io.IOException: Job failed! I've got this entry in schema.xml: field name=id type=string stored=true indexed=true/ What to do? Regards,
Re: No Effect of omitNorms and omitTermFreqAndPositions when using MLT handler?
I found this: https://issues.apache.org/jira/browse/LUCENE-2236 So, it seems this feature is not supported in Solr 1.4 at all. Is there any possible work around? If not, I'll have to consider splitting my schema into two which will be quite a big change :( - Ravish On Mon, May 21, 2012 at 11:03 AM, Ravish Bhagdev ravish.bhag...@gmail.comwrote: Ahh, this is because I have to override DefaultSimilarity to turn off tf/idf scoring? But this will apply to all the fields and general search on text fields as well? Is there a way to apply custom similarity to specific field types or fields only? Is there no way of turning TF/IDF off without this? Thanks, Ravish On Mon, May 21, 2012 at 10:24 AM, Ravish Bhagdev ravish.bhag...@gmail.com wrote: Hi All, I was wondering if omitNorms will have any effect on MLT handler at all? I'm using schema version 1.2 with Solr 1.4 and have defined couple of fields, which I want to use for MLT lookup and don't want factors like field length or TF/IDF to affect the scores. The definitions are as below: fieldType name=lowercase class=solr.TextField positionIncrementGap=100 omitNorms=true omitTermFreqAndPositions=true analyzer tokenizer class=solr.KeywordTokenizerFactory/ filter class=solr.LowerCaseFilterFactory / /analyzer /fieldType fieldType name=text_nonorms class=solr.TextField positionIncrementGap=100 omitNorms=true omitTermFreqAndPositions=true analyzer type=index tokenizer class=solr.WhitespaceTokenizerFactory / filter class=solr.StopFilterFactory ignoreCase=true words=stopwords.txt / filter class=solr.WordDelimiterFilterFactory generateWordParts=1 generateNumberParts=1 catenateWords=1 catenateNumbers=1 catenateAll=0 / filter class=solr.LowerCaseFilterFactory / filter class=solr.EnglishPorterFilterFactory protected=protwords.txt / filter class=solr.RemoveDuplicatesTokenFilterFactory / /analyzer analyzer type=query tokenizer class=solr.WhitespaceTokenizerFactory / filter class=solr.SynonymFilterFactory synonyms=synonyms.txt ignoreCase=true expand=true / filter class=solr.StopFilterFactory ignoreCase=true words=stopwords.txt / filter class=solr.WordDelimiterFilterFactory generateWordParts=1 generateNumberParts=1 catenateWords=0 catenateNumbers=0 catenateAll=0 / filter class=solr.LowerCaseFilterFactory / filter class=solr.EnglishPorterFilterFactory protected=protwords.txt / filter class=solr.RemoveDuplicatesTokenFilterFactory / /analyzer /fieldType !-- and the fields that use the above field types are -- field name=PROFILE_TAGS type=lowercase indexed=true stored=true multiValued=true termVectors=true/ field name=PROFILE_TAGS_TXT type=text_nonorms indexed=true stored=true multiValued=true termVectors=true/ In My solrconfig.xml I have defined following for my MLT request handler: requestHandler name=/mlt class=solr.MoreLikeThisHandler lst name=defaults str name=mlt.flPROFILE_TAGS,PROFILE_TAGS_TXT/str str name=mlt.qfPROFILE_TAGS^10.0 PROFILE_TAGS_TXT^2.0/str int name=mlt.mindf1/int int name=mlt.mintf1/int str name=flid,score/str str name=mlt.flPROFILE_TAGS,PROFILE_TAGS_TXT/str /lst /requestHandler However, when I run my query as follows: http://localhost:9090/solr/mlt?fl=*,scorestart=0q=id:4417454.matchRecordqt=/mltfq=targetDB:ConnectMeDBrows=1000debugQuery=on the debug scoring info shows following: str name=5042172.matchRecord 0.17156276 = (MATCH) product of: 1.4296896 = (MATCH) sum of: 0.24737607 = (MATCH) weight(PROFILE_TAGS_TXT:system^5.0 in 1472), product of: 0.06376338 = queryWeight(PROFILE_TAGS_TXT:system^5.0), product of: 5.0 = boost 3.8795946 = 
idf(docFreq=538, maxDocs=9598) 0.0032871156 = queryNorm 3.8795946 = (MATCH) fieldWeight(PROFILE_TAGS_TXT:system in 1472), product of: 1.0 = tf(termFreq(PROFILE_TAGS_TXT:system)=1) 3.8795946 = idf(docFreq=538, maxDocs=9598) 1.0 = fieldNorm(field=PROFILE_TAGS_TXT, doc=1472) 0.65193653 = (MATCH) weight(PROFILE_TAGS_TXT:adapt^5.0 in 1472), product of: 0.10351306 = queryWeight(PROFILE_TAGS_TXT:adapt^5.0), product of: 5.0 = boost 6.298109 = idf(docFreq=47, maxDocs=9598) 0.0032871156 = queryNorm 6.298109 = (MATCH) fieldWeight(PROFILE_TAGS_TXT:adapt in 1472), product of: 1.0 = tf(termFreq(PROFILE_TAGS_TXT:adapt)=1) 6.298109 = idf(docFreq=47, maxDocs=9598) 1.0 = fieldNorm(field=PROFILE_TAGS_TXT, doc=1472) 0.530377 = (MATCH) weight(PROFILE_TAGS_TXT:optic^5.0 in 1472), product of: 0.093365155 = queryWeight(PROFILE_TAGS_TXT:optic^5.0), product of: 5.0 = boost 5.6806736 = idf(docFreq=88, maxDocs=9598) 0.0032871156 = queryNorm 5.6806736 = (MATCH) fieldWeight(PROFILE_TAGS_TXT:optic in 1472), product of: 1.0 = tf(termFreq(PROFILE_TAGS_TXT:optic)=1) 5.6806736 = idf(docFreq=88, maxDocs=9598) 1.0 =
Re: org.apache.solr.common.SolrException: ERROR: [doc=null] missing required field: id
Am 21.05.2012 12:07, schrieb Tolga: Hi, I am getting this error: [doc=null] missing required field: id [...] I've got this entry in schema.xml: field name=id type=string stored=true indexed=true/ What to do? Simply make sure that every document you're sending to Solr contains this id field. I assume it's declared as your unique id field, so it's mandatory. Greetings, Kuli
Re: org.apache.solr.common.SolrException: ERROR: [doc=null] missing required field: id
How do I verify it exists? I've been crawling the same site and it wasn't giving an error on Thursday. Regards, On 5/21/12 1:20 PM, Michael Kuhlmann wrote: Am 21.05.2012 12:07, schrieb Tolga: Hi, I am getting this error: [doc=null] missing required field: id [...] I've got this entry in schema.xml: field name=id type=string stored=true indexed=true/ What to do? Simply make sure that every document you're sending to Solr contains this id field. I assume it's declared as your unique id field, so it's mandatory. Greetings, Kuli
Re: org.apache.solr.common.SolrException: ERROR: [doc=null] missing required field: id
Am 21.05.2012 12:40, schrieb Tolga: How do I verify it exists? I've been crawling the same site and it wasn't giving an error on Thursday. It depends on what you're doing. Are you using nutch? -Kuli
Re: org.apache.solr.common.SolrException: ERROR: [doc=null] missing required field: id
Yes. On 5/21/12 1:49 PM, Michael Kuhlmann wrote: Am 21.05.2012 12:40, schrieb Tolga: How do I verify it exists? I've been crawling the same site and it wasn't giving an error on Thursday. It depends on what you're doing. Are you using nutch? -Kuli
Re: Not able to use the highlighting feature! Want to return snippets of text
text:abstract&hl=true&hl.fl=text&f.text.hl.snippets=2&f.text.hl.fragsize=200&debugQuery=true Three things to check: 1) Make sure your text field is declared as suitable for highlighting: http://wiki.apache.org/solr/FieldOptionsByUseCase 2) Increase hl.maxAnalyzedChars (e.g. to Integer.MAX_VALUE). 3) Increase maxFieldLength in solrconfig.xml (e.g. to Integer.MAX_VALUE). For some reason (complex analysis etc.) snippets cannot always be generated. For these cases consider using hl.alternateField and hl.maxAlternateFieldLength.
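A minimal sketch of a field declared with highlighting in mind; the field and type names are placeholders, stored=true is the hard requirement, and the term vector options only speed up fragment generation:

<field name="text" type="text_general" indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true"/>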
Re: Not able to use the highlighting feature! Want to return snippets of text
Take a look at the /browse request handler in the example solrconfig.xml and compare how it does highlighting to what you are doing. There are a lot of little details, so maybe even one might be missing. Also, you can only highlight stored fields, so make sure that text is stored. In the Solr example it is not stored and not intended to be stored, and highlighting should be performed using some other field containing the text as a stored field. -- Jack Krupansky -Original Message- From: 12rad Sent: Sunday, May 20, 2012 11:26 PM To: solr-user@lucene.apache.org Subject: Re: Not able to use the highlighting feature! Want to return snippets of text My query parameters are this: text:abstract&hl=true&hl.fl=text&f.text.hl.snippets=2&f.text.hl.fragsize=200&debugQuery=true I still get the entire string as the result in the lst name=highlighting tag.
Facing problem to integrate UIMA in SOLR
Hello all, I am facing a problem integrating UIMA in Solr. I followed the steps provided in the README file shipped along with the UIMA contrib to integrate it into Solr.
Step 1. I set lib tags in solrconfig.xml appropriately to point to the jar files. lib dir=../../contrib/uima/lib / lib dir=../../dist/ regex=apache-solr-uima-\d.*\.jar /
Step 2. Modified my schema.xml, adding the fields I wanted to hold metadata, specifying proper values for the type, indexed, stored and multiValued options as follows: field name=language type=string indexed=true stored=true required=false/ field name=concept type=string indexed=true stored=true multiValued=true required=false/ field name=sentence type=text indexed=true stored=true multiValued=true required=false /
Step 3. Modified my solrconfig.xml, adding the following snippet: updateRequestProcessorChain name=uima default=true processor class=org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory lst name=uimaConfig lst name=runtimeParameters str name=keyword_apikeyVALID_ALCHEMYAPI_KEY/str str name=concept_apikeyVALID_ALCHEMYAPI_KEY/str str name=lang_apikeyVALID_ALCHEMYAPI_KEY/str str name=cat_apikeyVALID_ALCHEMYAPI_KEY/str str name=entities_apikeyVALID_ALCHEMYAPI_KEY/str str name=oc_licenseIDVALID_OPENCALAIS_KEY/str /lst str name=analysisEngine/org/apache/uima/desc/OverridingParamsExtServicesAE.xml/str bool name=ignoreErrorstrue/bool lst name=analyzeFields bool name=mergefalse/bool arr name=fields strtext/str /arr /lst lst name=fieldMappings lst name=type str name=nameorg.apache.uima.alchemy.ts.concept.ConceptFS/str lst name=mapping str name=featuretext/str str name=fieldconcept/str /lst /lst lst name=type str name=nameorg.apache.uima.alchemy.ts.language.LanguageFS/str lst name=mapping str name=featurelanguage/str str name=fieldlanguage/str /lst /lst lst name=type str name=nameorg.apache.uima.SentenceAnnotation/str lst name=mapping str name=featurecoveredText/str str name=fieldsentence/str /lst /lst /lst /lst /processor processor class=solr.LogUpdateProcessorFactory / processor class=solr.RunUpdateProcessorFactory / /updateRequestProcessorChain
Step 4. And finally created a new UpdateRequestHandler with the following: requestHandler name=/update class=solr.XmlUpdateRequestHandler lst name=defaults str name=update.processoruima/str /lst
Then I indexed a Word file called test.docx using the following command: curl "http://localhost:8983/solr/update/extract?fmap.content=content&literal.id=doc47&commit=true" -F file=@test.docx
When I searched for the same document with http://localhost:8983/solr/select?q=id:doc47 I got the following result, i.e. I am not getting the additional UIMA fields in the response. result name=response numFound=1 start=0 doc str name=authordivakar/str arr name=content_type str application/vnd.openxmlformats-officedocument.wordprocessingml.document /str /arr str name=iddoc47/str date name=last_modified2012-04-18T14:19:00Z/date /doc /result
Can anyone suggest how to solve this problem? With Regards, Thanks Divakar
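For reference, the update.processor default that Step 4 attaches to /update can also be attached to the /update/extract handler that the curl command above actually posts to. This is only a sketch under that assumption, with the handler class and parameter names as in the stock Solr 3.x example solrconfig.xml, untested against this setup:

<requestHandler name="/update/extract" class="solr.extraction.ExtractingRequestHandler" startup="lazy">
  <lst name="defaults">
    <!-- run the same UIMA chain for documents indexed via Solr Cell -->
    <str name="update.processor">uima</str>
    <str name="fmap.content">content</str>
  </lst>
</requestHandler>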
Re: Indexing Searching MySQL table with Hindi and English data
Hi, Thank you so much for replying. The MySQL database server is running on a Fedora Core 12 Machine with Hindi Language Support enabled. Details of the database are - ENGINE=MyISAM and DEFAULT CHARSET=utf8 Data is imported using the Solr DataImportHandler (mysql jdbc driver). In the schema.xml file the title field is defined as: field name=title type=text_general indexed=true stored=true/ I tried saving the query results directly to a text file from the MySQL command prompt but it is not storing the results correctly. The file contains the following characters. à ¤¸à ¥Åà ¤° à ¤Šà ¤°à ¥8dà ¤Åà ¤¾ Saur oorja Please suggest what I have to do to solve this issue. Regards, Sanjailal KP -- On Sun, May 20, 2012 at 6:59 AM, Lance Norskog goks...@gmail.com wrote: Also, try saving data from a query into a file and verify that it is UTF-8 and the characters are correct. On Fri, May 18, 2012 at 7:54 AM, Jack Krupansky j...@basetechnology.com wrote: Check the analyzers for the field types containing Hindi text to be sure that they are not using a character mapping or folding filter that might mangle the Hindi characters. Post the field type, say for the title field. Also, try manually (using curl or the post jar) adding a single document that has Hindi data and see if that works. -- Jack Krupansky -Original Message- From: KP Sanjailal Sent: Thursday, May 17, 2012 5:55 AM To: solr-user@lucene.apache.org Subject: Indexing Searching MySQL table with Hindi and English data Hi, I tried to setup indexing of MySQL tables in Apache Solr 3.6. Everything works fine but text in Hindi script (only some 10% of total records) not getting indexed properly. A search with keyword in Hindi retrieve emptly result set. Also a retrieved hindi record displays junk characters. The database tables contains bibliographical details of books such as title, author, publisher, isbn, publishing place, series etc. and out of the total records about 10% of records contains text in Hindi in title, author, publisher fields. Example: *Search Results from MySQL using PHP* 1. http://192.168.0.132/shared/biblio_view.php?bibid=26913tab=opac *Title:* सौर ऊर्जा Saur oorjahttp://192.168.0.132/shared/biblio_view.php?bibid=26913tab=opac *Author(s):* विनोद कुमार मिश्र MISHRA (VK) *Material:* Books ** ** *Search Results from Apache Solr (searched using keyword in English)* 1. http://192.168.0.132/test/biblio_view.php?bibid=26913tab=opac *Title:* सौर ऊरॠजा Saur oorjahttp://192.168.0.132/test/biblio_view.php?bibid=26913tab=opac *Author(s):* विनोद कॠमार मिशॠर MISHRA (VK) * Material:* Books How do I go about solving this language problem. Thanks in advace. K. P. Sanjailal -- -- Lance Norskog goks...@gmail.com
Re: org.apache.solr.common.SolrException: ERROR: [doc=null] missing required field: id
Solr appears to force your UniqueKey field to be required even though you don't have an explicit required=true attribute. As a debugging aid, try adding default=missing to your id field definition and then you can query on id:missing and see what data is being indexed without an id. But, it would be better to examine the input data and see why it is missing the id field, since that is the real problem that needs to be resolved. -- Jack Krupansky -Original Message- From: Tolga Sent: Monday, May 21, 2012 6:07 AM To: solr-user@lucene.apache.org Subject: org.apache.solr.common.SolrException: ERROR: [doc=null] missing required field: id Hi, I am getting this error: [doc=null] missing required field: id request: http://localhost:8983/solr/update?wt=javabinversion=2 at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:430) at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244) at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105) at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:49) at org.apache.nutch.indexer.solr.SolrWriter.close(SolrWriter.java:93) at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:48) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:474) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411) at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216) 2012-05-21 11:44:29,953 ERROR solr.SolrIndexer - java.io.IOException: Job failed! I've got this entry in schema.xml: field name=id type=string stored=true indexed=true/ What to do? Regards,
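A sketch of the temporary debugging aid described above; the default attribute is removed again once the documents arriving without an id have been tracked down:

<field name="id" type="string" indexed="true" stored="true" default="missing"/>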
Re: Indexing Searching MySQL table with Hindi and English data
Is it possible that your text editor/display does not support UTF-8 encoding? Assuming the data is properly encoded, do you have the encoding=UTF-8 attribute in your DIH dataSource tag? -- Jack Krupansky -Original Message- From: KP Sanjailal Sent: Monday, May 21, 2012 7:37 AM To: solr-user@lucene.apache.org Subject: Re: Indexing Searching MySQL table with Hindi and English data Hi, Thank you so much for replying. The MySQL database server is running on a Fedora Core 12 Machine with Hindi Language Support enabled. Details of the database are - ENGINE=MyISAM and DEFAULT CHARSET=utf8 Data is imported using the Solr DataImportHandler (mysql jdbc driver). In the schema.xml file the title field is defined as: field name=title type=text_general indexed=true stored=true/ I tried saving the query results directly to a text file from the MySQL command prompt but it is not storing the results correctly. The file contains the following characters. à ¤¸à ¥Åà ¤° à ¤Šà ¤°à ¥8dà ¤Åà ¤¾ Saur oorja Please suggest what I have to do to solve this issue. Regards, Sanjailal KP -- On Sun, May 20, 2012 at 6:59 AM, Lance Norskog goks...@gmail.com wrote: Also, try saving data from a query into a file and verify that it is UTF-8 and the characters are correct. On Fri, May 18, 2012 at 7:54 AM, Jack Krupansky j...@basetechnology.com wrote: Check the analyzers for the field types containing Hindi text to be sure that they are not using a character mapping or folding filter that might mangle the Hindi characters. Post the field type, say for the title field. Also, try manually (using curl or the post jar) adding a single document that has Hindi data and see if that works. -- Jack Krupansky -Original Message- From: KP Sanjailal Sent: Thursday, May 17, 2012 5:55 AM To: solr-user@lucene.apache.org Subject: Indexing Searching MySQL table with Hindi and English data Hi, I tried to setup indexing of MySQL tables in Apache Solr 3.6. Everything works fine but text in Hindi script (only some 10% of total records) not getting indexed properly. A search with keyword in Hindi retrieve emptly result set. Also a retrieved hindi record displays junk characters. The database tables contains bibliographical details of books such as title, author, publisher, isbn, publishing place, series etc. and out of the total records about 10% of records contains text in Hindi in title, author, publisher fields. Example: *Search Results from MySQL using PHP* 1. http://192.168.0.132/shared/biblio_view.php?bibid=26913tab=opac *Title:* सौर ऊर्जा Saur oorjahttp://192.168.0.132/shared/biblio_view.php?bibid=26913tab=opac *Author(s):* विनोद कुमार मिश्र MISHRA (VK) *Material:* Books ** ** *Search Results from Apache Solr (searched using keyword in English)* 1. http://192.168.0.132/test/biblio_view.php?bibid=26913tab=opac *Title:* सौर ऊरॠजा Saur oorjahttp://192.168.0.132/test/biblio_view.php?bibid=26913tab=opac *Author(s):* विनोद कॠमार मिशॠर MISHRA (VK) * Material:* Books How do I go about solving this language problem. Thanks in advace. K. P. Sanjailal -- -- Lance Norskog goks...@gmail.com
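For reference, a sketch of a DIH dataSource that forces UTF-8 at the JDBC driver level; the database name and credentials are placeholders, and useUnicode/characterEncoding are MySQL Connector/J connection parameters:

<dataSource type="JdbcDataSource"
            driver="com.mysql.jdbc.Driver"
            url="jdbc:mysql://localhost:3306/library?useUnicode=true&amp;characterEncoding=UTF-8"
            user="dbuser" password="dbpass"/>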
Re: problem in replication
Hi Tomas, my queries are complex: I am faceting on many fields and using highlighting and boosts etc. in the same query. Auto-warming takes a hell of a lot of time, hence I have removed it.
no css on browse UI when multicore
Hi The CSS files from the browse GUI in Solr 3.6 do not seem to work properly when Solr is deployed with multiple cores, and I can't figure out how to solve this. I know this has been an issue in Solr but I thought it was fixed in the newer versions. Any answers or pointers on how to get this fixed are much appreciated :) Regards, Aleksander Akerø
boost function parameter (bf) ignores character escaping
Hey, I'm running solr (3.5.0.2011.11.30.16.37.06) and have encountered what I think is a bug with the boost function (bf) parameter. I've used sunspot (for use of solr with rails) which allows managing dynamic fields, and which by default creates fields like dynamicfield:value1, dynamicfield:value2, thus using the : character in the field name, which needs to be escaped. If I use a query which includes q=dynamicfield\:value1:6, everything works fine and matches are found. However, if I use the bf field with bf=dynamicfield\:value1, I get an error message undefined field dynamicfield, and the same happens without escaping the : Should I file a bug report? Best, Nils ?xml version=1.0 encoding=UTF-8? response lst name=responseHeaderint name=status0/intint name=QTime3/intlst name=paramsstr name=start0/strstr name=qexpertise\:solution_database_i:6/strstr name=defTypeedismax/strstr name=rows10/str/lst/lstresult name=response numFound=1 start=0docarr name=full_name_textsstrNils Kaiser/str/arrstr name=idUser 4f32081ccd112e65d36c/str/doc/result /response
Re: no css on browse UI when multicore
On May 21, 2012, at 08:11 , Aleksander Akerø wrote: The css files from the browse GUI in solr 3.6 does not seem to work properly when solr is deployed with multiple cores and I can’t figure out how to solve this. I know this have been an issue in solr but I thought it was fixed in the newer versions. Any answers or pointers on how to get this fixed is much appreciatedJ Each core has its own templates, and thus it'll be core dependent. There is a conf/velocity/VM_global_library.vm that has the base path that the other templates can use for a base path, and it should look like this: #macro(url_for_solr)/solr#if($request.core.name != "")/$request.core.name#end#end And the stylesheet is referenced in head.vm like this: link rel=stylesheet type=text/css href=#{url_for_solr}/admin/file?file=/velocity/main.css&contentType=text/css/ This requires that the file serving handler (/admin/file) be enabled and that conf/velocity/main.css exist. Does that help? If not, what does the HTML rendered from /browse say as the CSS URL? What error does hitting that URL directly give? Erik
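For reference, a minimal sketch of the file serving handler Erik mentions, along the lines of the stock example solrconfig.xml (the hidden entry is optional):

<requestHandler name="/admin/file" class="solr.admin.ShowFileRequestHandler">
  <lst name="invariants">
    <!-- files listed here are not served out -->
    <str name="hidden">synonyms.txt</str>
  </lst>
</requestHandler>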
UI
Hi, Can you recommend a good PHP UI to search? Is SolrPHPClient good?
Re: boost function parameter (bf) ignores character escaping
Yeah, a bug report would be good. But really this is a Sunspot bug report. Field names should NOT have :'s in them. Field names should stick to standard Java identifier rules, otherwise it's escaping madness. You could try something like this as a workaround: bq=_val_:dynamicfield\:value1 I don't know if that'll do better than the bf issue you've hit, but it's another way of doing the same sort of thing. Erik On May 21, 2012, at 08:01 , m...@nils-kaiser.de wrote: Hey, I'm running solr (3.5.0.2011.11.30.16.37.06) and have encountered what I think is a bug with the boost function (bf) parameter. I've used sunspot (for use of solr with rails) which allows managing dynamic fields, which by default creates fields like dynamicfield:value1,dynamicfield:value2, though using the : character in the field name, which needs to be escaped. If I use a query which includes q=dynamicfield\:value1:6, everything works fines and matches are found. However, if I use the bf field with bf=dynamicfield\:value1, I get an error message undefined field dynamicfield, the same without escaping the : Should I file a bug report? Best, Nils solr_debug_normalquery.xml
Re: boost function parameter (bf) ignores character escaping
Quoting from the new trunk example schema: field names should consist of alphanumeric or underscore characters only and not start with a digit. This is not currently strictly enforced, but other field names will not have first class support from all components and back compatibility is not guaranteed. In other words, don't do it. Replace the colon with an underscore in your field names. -- Jack Krupansky -Original Message- From: m...@nils-kaiser.de Sent: Monday, May 21, 2012 8:01 AM To: solr-user@lucene.apache.org Subject: boost function parameter (bf) ignores character escaping Hey, I'm running solr (3.5.0.2011.11.30.16.37.06) and have encountered what I think is a bug with the boost function (bf) parameter. I've used sunspot (for use of solr with rails) which allows managing dynamic fields, which by default creates fields like dynamicfield:value1,dynamicfield:value2, though using the : character in the field name, which needs to be escaped. If I use a query which includes q=dynamicfield\:value1:6, everything works fines and matches are found. However, if I use the bf field with bf=dynamicfield\:value1, I get an error message undefined field dynamicfield, the same without escaping the : Should I file a bug report? Best, Nils
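A sketch of the underscore convention with a Sunspot-style dynamic field; the pattern and the concrete field name are only illustrative:

<dynamicField name="*_i" type="int" indexed="true" stored="true"/>

A field named, say, expertise_solution_database_i matches this pattern and can then be used directly as bf=expertise_solution_database_i without any escaping.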
RE: Solr Single Core vs Multiple Cores installation for localization
We intend to have separate, language specific search UI. At the moment we like solution with separate cores more because it is more flexible. But as a rule flexibility costs in terms of performance and we would like to know that price. Jack, what did you mean by 'Managing a bunch of small and tiny cores could be a pain'? Could you please provide more details. Thank you for your help, Ivan -Original Message- From: Jack Krupansky [mailto:j...@basetechnology.com] Sent: Thursday, May 17, 2012 3:17 AM To: solr-user@lucene.apache.org Subject: Re: Solr Single Core vs Multiple Cores installation for localization First you have to answer the twin questions of what you want the user experience to be and what expectations users may have independent of your intentions. Do you intend to have separate, language specific search UI? That would match up with separate cores, but can be done with a language type field as well. Sometimes, users want only documents in a specific language, but sometimes they want a globalized search for technical terms or names across all languages, such as searching for Lucene OR Solr and then faceting by language to get an idea of use by language. From a practical perspective, maybe most docs would be English, so that would be one big core anyway. And the main secondary languages would be modest sized, and then you may have a large number of tiny cores. Managing a bunch of small and tiny cores could be a pain. Maybe three cores: English-only, all non-English, and all language - if globalized search is desired. The all non-English could have a filter query on the specific language desired, or using different field sets for query and returned fields in a edismax query request. This is just one technical approach, but it still all depends on intended user experience and user expectations. -- Jack Krupansky -Original Message- From: Ivan Hrytsyuk Sent: Wednesday, May 16, 2012 6:31 AM To: solr-user@lucene.apache.org Subject: Solr Single Core vs Multiple Cores installation for localization Hello, We are going to add multi-language support for our Solr-based project. We consider next Solr installation types: 1. Single core - all fields for all languages reside in a single core. I.e. title_en, description_en, title_de, description_de, title_fr, description_fr 2. Multiple cores - one core for one language Looks like Multiple cores installation is more appropriate for multi-language, but we would like to see expert comments on this. What we have found so far for Multiple cores are: * Pros o Searching is faster because there is a linear relationship between index size and query response time as the size of index volumes increases o More flexible. We can shut-down any core at any time o Easier to maintain * Cons o Startup time is bigger in comparison with Single core Could anyone suggest: 1. Indexing for Multiple cores will be faster in comparison to Single core installation because size of index is smaller. Is there any relationship between size of index and time for indexing process? 2. How bigger startup time is for Solr with 30 multiple cores in comparison to Single core in case cache warming is disabled? This option is really important for us. 3. What processes are executed during Solr startup? Thank you in advance, Ivan
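For reference, a minimal sketch of the per-language multicore layout being discussed, for solr.xml; core names and instance directories are placeholders, and each core carries its own conf/schema.xml with the language-specific analyzers:

<solr persistent="true">
  <cores adminPath="/admin/cores">
    <core name="core_en" instanceDir="core_en"/>
    <core name="core_de" instanceDir="core_de"/>
    <core name="core_fr" instanceDir="core_fr"/>
  </cores>
</solr>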
RE: no css on browse UI when multicore
Ok, thanks a bunch! I think the url's are set up properly but we have sort of made our own solrconfig files so it's probably the file handler then. I will look into that, but I'm 99.999% sure that this was my problem. Again, thank you for the quick reply! -Original Message- From: Erik Hatcher [mailto:erik.hatc...@gmail.com] Sent: 21. mai 2012 14:33 To: solr-user@lucene.apache.org Subject: Re: no css on browse UI when multicore On May 21, 2012, at 08:11 , Aleksander Akerø wrote: The css files from the browse GUI in solr 3.6 does not seem to work properly when solr is deployed with multiple cores and I cant figure out how to solve this. I know this have been an issue in solr but I thought it was fixed in the newer versions. Any answers or pointers on how to get this fixed is much appreciatedJ Each core has it's own templates, and thus it'll be core dependent. There is a conf/velocity/VM_global_library.vm that has the base path that the other templates can use for a base path, and it should look like this: #macro(url_for_solr)/solr#if($request.core.name != )/$request.core.name#end#end And the stylesheet is referenced in head.vm like this: link rel=stylesheet type=text/css href=#{url_for_solr}/admin/file?file=/velocity/main.csscontentType=text/cs s/ This requires that the file serving handler (/admin/file) be enabled and that conf/velocity/main.css exist. Does that help? If not, what is the HTML rendered from /browse say as the CSS URL? What error does hitting that URL directly give? Erik
Re: Fault tolerant Solr replication architecture
Have you looked at DataStax Enterprise? On May 21, 2012 12:25 AM, Parvin Gasimzade parvin.gasimz...@gmail.com wrote: Hi, I am using solr with replication. I have one master that indexes data and two slaves which pulls index from master and responds to the queries. My question is, how can i create fault tolerant architecture? I mean what should i do when master server crashes? I heard that repeater is used for this type of architecture. Then, do I have to create one master, one slave with repeater and one slave? Another question is, if master crashes then does slave with repeater start indexing authomatically or should i configure it manually? I asked similar question on the stackoverflow : http://stackoverflow.com/questions/10597053/fault-tolerant-solr-replication-architecture Any help will be appreciated. Regards, Parvin
Re: using Carrot2 custom ITokenizerFactory
My problem was gone. Thanks Staszek and Dawid! koji -- Query Log Visualizer for Apache Solr http://soleami.com/ (12/05/21 18:11), Stanislaw Osinski wrote: Hi Koji, Dawid came up with a simple fix for this, it's committed to trunk and 3.6 branch. Staszek
RE: SolrCloud deduplication
Hi, SOLR-2822 seems to work just fine as long as the SignatureProcessor precedes the DistributedProcessor in the update chain. Thanks, Markus -Original message- From:Mark Miller markrmil...@gmail.com Sent: Fri 18-May-2012 16:05 To: solr-user@lucene.apache.org; Markus Jelsma markus.jel...@openindex.io Subject: Re: SolrCloud deduplication Hey Markus - When I ran into a similar issue with another update proc, I created https://issues.apache.org/jira/browse/SOLR-3215 so that I could order things to avoid this. I have not committed this yet though, in favor of waiting for https://issues.apache.org/jira/browse/SOLR-2822 Go vote? :) On May 18, 2012, at 7:49 AM, Markus Jelsma wrote: Hi, Deduplication on SolrCloud through the SignatureUpdateRequestProcessor is not functional anymore. The problem is that documents are passed multiple times through the URP and the digest field is added as if it is an multi valued field. If the field is not multi valued you'll get this typical error. Changing the order or URP's in the chain does not solve the problem. Any hints on how to resolve the issue? Is this a problem in the SignatureUpdateRequestProcessor and does it need to be updated to work with SolrCloud? Thanks, Markus - Mark Miller lucidimagination.com
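For reference, a sketch of the chain ordering described above, with the signature processor ahead of the distributed update processor; the signature field, the fields used for the hash and the signature implementation are placeholders, and the digest field must exist in the schema as a single-valued field:

<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">digest</str>
    <bool name="overwriteDupes">true</bool>
    <str name="fields">url,content</str>
    <str name="signatureClass">solr.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.DistributedUpdateProcessorFactory"/>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>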
Re: Duplicate documents being added even with unique key
Changing my field type to string for my uniquekey field solved the problem. Thanks to Jack and Erik for the fix! On May 18, 2012, at 5:33 PM, Jack Krupansky wrote: Typically the uniqueKey field is a string field type (your schema uses text_general), although I don't think it is supposed to be a requirement. Still, it is one thing that stands out. Actually, you may be running into some variation of SOLR-1401: https://issues.apache.org/jira/browse/SOLR-1401 In other words, stick with string and stay away from a tokenized (text) key. You could also get duplicates by merging cores or if your add has allowDups = true or overwrite=false. -- Jack Krupansky -Original Message- From: Parmeley, Michael Sent: Friday, May 18, 2012 5:50 PM To: solr-user@lucene.apache.org Subject: Duplicate documents being added even with unique key I have a uniquekey set in my schema; however, I am still getting duplicated documents added. Can anyone provide any insight into why this may be happening? This is in my schema.xml: !-- Field to use to determine and enforce document uniqueness. Unless this field is marked with required=false, it will be a required field -- uniqueKeyuniquekey/uniqueKey field name=uniquekey type=text_general indexed=true stored=true required=true / On startup I get this message in catalina.out: INFO: unique key field: uniquekey However, you can see I get multiple documents: result name=response numFound=7 start=0 doc str name=abbreviationPSR3/str int name=clientid1/int str name=entitytypeSkill/str int name=id510/int str name=nameBody and Soul/str int name=projectid1/int int name=skillnumber281/int str name=uniquekeySkill510/str /doc doc str name=abbreviationPSR3/str int name=clientid1/int str name=entitytypeSkill/str int name=id510/int str name=nameBody and Soul/str int name=projectid1/int int name=skillnumber281/int str name=uniquekeySkill510/str /doc doc str name=abbreviationPSR3/str int name=clientid1/int str name=entitytypeSkill/str int name=id510/int str name=nameBody and Soul/str int name=projectid1/int int name=skillnumber281/int str name=uniquekeySkill510/str /doc doc str name=abbreviationPSR3/str int name=clientid1/int str name=entitytypeSkill/str int name=id510/int str name=nameBody and Soul/str int name=projectid1/int int name=skillnumber281/int str name=uniquekeySkill510/str /doc doc str name=abbreviationPSR3/str int name=clientid1/int str name=entitytypeSkill/str int name=id510/int str name=nameBody and Soul/str int name=projectid1/int int name=skillnumber281/int str name=uniquekeySkill510/str /doc doc str name=abbreviationPSR3/str int name=clientid1/int str name=entitytypeSkill/str int name=id510/int str name=nameBody and Soul/str int name=projectid1/int int name=skillnumber281/int str name=uniquekeySkill510/str /doc doc str name=abbreviationPSR3/str int name=clientid1/int str name=entitytypeSkill/str int name=id510/int str name=nameBody and Soul/str int name=projectid1/int int name=skillnumber281/int str name=uniquekeySkill510/str /doc /result
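For reference, a sketch of the arrangement that resolved this, with the uniqueKey backed by a plain string type rather than a tokenized text type:

<field name="uniquekey" type="string" indexed="true" stored="true" required="true"/>
<uniqueKey>uniquekey</uniqueKey>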
RE: SolrCloud deduplication
Hi again, It seemed to work fine but in the end duplicates are not overwritten. We first run the SignatureProcessor and then the DistributedProcessor. If we do it the other way around the digest field receives multiple values and throws errors. Is there anything else we can do or another patch to try? Thanks Markus -Original message- From:Markus Jelsma markus.jel...@openindex.io Sent: Mon 21-May-2012 15:58 To: solr-user@lucene.apache.org; Mark Miller markrmil...@gmail.com Subject: RE: SolrCloud deduplication Hi, SOLR-2822 seems to work just fine as long as the SignatureProcessor precedes the DistributedProcessor in the update chain. Thanks, Markus -Original message- From:Mark Miller markrmil...@gmail.com Sent: Fri 18-May-2012 16:05 To: solr-user@lucene.apache.org; Markus Jelsma markus.jel...@openindex.io Subject: Re: SolrCloud deduplication Hey Markus - When I ran into a similar issue with another update proc, I created https://issues.apache.org/jira/browse/SOLR-3215 so that I could order things to avoid this. I have not committed this yet though, in favor of waiting for https://issues.apache.org/jira/browse/SOLR-2822 Go vote? :) On May 18, 2012, at 7:49 AM, Markus Jelsma wrote: Hi, Deduplication on SolrCloud through the SignatureUpdateRequestProcessor is not functional anymore. The problem is that documents are passed multiple times through the URP and the digest field is added as if it is an multi valued field. If the field is not multi valued you'll get this typical error. Changing the order or URP's in the chain does not solve the problem. Any hints on how to resolve the issue? Is this a problem in the SignatureUpdateRequestProcessor and does it need to be updated to work with SolrCloud? Thanks, Markus - Mark Miller lucidimagination.com
Re: Question about wildcards
Hi. In debug mode, the generated query was: str name=rawquerystringfield:*2231-7/str str name=querystringfield:*2231-7/str str name=parsedqueryfield:*2231-7/str str name=parsedquery_toStringfield:*2231-7/str The analysis of indexing the text .2231-7 produces this result: Index Analyzer .22317 .22317 .22317 .22317 #1;1322. #1;7 .22317 And the search for *2231-7 produces this result: Query Analyzer 22317 22317 22317 22317 22317 I don't understand why it doesn't find results when I use field:*2231-7. When I use field:*2231 without -7 the document is found. As Ahmet said, I think it is using -7 to exclude the document, but the debug query doesn't show this. Any idea how to solve this? Thanks 2012/5/18 Ahmet Arslan iori...@yahoo.com I have a field that was indexed with the string .2231-7. When i search using '*' or '?' like this *2231-7 the query don't returns results. When i remove -7 substring and search agin using *2231 the query returns. Finally when i search using .2231-7 the query returns too. May be standard tokenizer is splitting .2231-7 into multiple tokens? You can check that admin/analysis page. May be -7 is treated as negative clause? You can check that with debugQuery=on
Re: Question about wildcards
Before Solr 3.6, which added MultiTermAwareComponent for analyzers, the presence of a wildcard completely short-circuited (prevented) the query-time analysis, so you have to manually emulate all steps of the query analyzer yourself if you want to do a wildcard. Even with 3.6, not all filters are multi-term aware. See: http://wiki.apache.org/solr/MultitermQueryAnalysis Do a query for .2231-7 and that will tell you which analyzer steps you will have to do manually. -- Jack Krupansky -Original Message- From: Anderson vasconcelos Sent: Monday, May 21, 2012 11:03 AM To: solr-user@lucene.apache.org Subject: Re: Question about wildcards Hi. In debug mode, the generated query was: str name=rawquerystringfield:*2231-7/str str name=querystringfield:*2231-7/str str name=parsedqueryfield:*2231-7/str str name=parsedquery_toStringfield:*2231-7/str The analisys of indexing the text .2231-7 produces this result: Index Analyzer .22317 .22317 .22317 .22317 #1;1322. #1;7 .22317 And for search for *2231-7 , produces this result: Query Analyzer 22317 22317 22317 22317 22317 I don't understand why he don't find results when i use field:*2231-7. When i use field:*2231 without -7 the document was found. How Ahmet said, i think they using -7 to ignore the document. But in debug query, they don't show this. Any idea to solve this? Thanks 2012/5/18 Ahmet Arslan iori...@yahoo.com I have a field that was indexed with the string .2231-7. When i search using '*' or '?' like this *2231-7 the query don't returns results. When i remove -7 substring and search agin using *2231 the query returns. Finally when i search using .2231-7 the query returns too. May be standard tokenizer is splitting .2231-7 into multiple tokens? You can check that admin/analysis page. May be -7 is treated as negative clause? You can check that with debugQuery=on
Re: Question about wildcards
I changed the field type of the field to the following: fieldType name=text_ws class=solr.TextField positionIncrementGap=100 analyzertokenizer class=solr.WhitespaceTokenizerFactory//analyzer /fieldType As you can see, I just keep the WhitespaceTokenizerFactory. That works. Now I can find it using *2231?7, *2231*7, *2231-7, *2231*, .2231-7. As far as I can see, with this tokenizer the text is not split. Is that the best way to solve this? Thanks 2012/5/21 Anderson vasconcelos anderson.v...@gmail.com Hi. In debug mode, the generated query was: str name=rawquerystringfield:*2231-7/str str name=querystringfield:*2231-7/str str name=parsedqueryfield:*2231-7/str str name=parsedquery_toStringfield:*2231-7/str The analisys of indexing the text .2231-7 produces this result: Index Analyzer .22317 .22317 .22317 .22317 #1;1322.#1;7 .22317 And for search for *2231-7 , produces this result: Query Analyzer 22317 22317 22317 22317 22317 I don't understand why he don't find results when i use field:*2231-7. When i use field:*2231 without -7 the document was found. How Ahmet said, i think they using -7 to ignore the document. But in debug query, they don't show this. Any idea to solve this? Thanks 2012/5/18 Ahmet Arslan iori...@yahoo.com I have a field that was indexed with the string .2231-7. When i search using '*' or '?' like this *2231-7 the query don't returns results. When i remove -7 substring and search agin using *2231 the query returns. Finally when i search using .2231-7 the query returns too. May be standard tokenizer is splitting .2231-7 into multiple tokens? You can check that admin/analysis page. May be -7 is treated as negative clause? You can check that with debugQuery=on
Re: Question about wildcards
And, generally when I see a field that has values like .2231-7, it should be a string field rather than tokenized text. As a string, you can then do straight wildcards without surprises. -- Jack Krupansky -Original Message- From: Jack Krupansky Sent: Monday, May 21, 2012 11:23 AM To: solr-user@lucene.apache.org Subject: Re: Question about wildcards Before Solr 3.6, which added MultiTermAwareComponent for analyzers, the presence of a wildcard completely short-circuited (prevented) the query-time analysis, so you have to manually emulate all steps of the query analyzer yourself if you want to do a wildcard. Even with 3.6, not all filters are multi-term aware. See: http://wiki.apache.org/solr/MultitermQueryAnalysis Do a query for .2231-7 and that will tell you which analyzer steps you will have to do manually. -- Jack Krupansky -Original Message- From: Anderson vasconcelos Sent: Monday, May 21, 2012 11:03 AM To: solr-user@lucene.apache.org Subject: Re: Question about wildcards Hi. In debug mode, the generated query was: str name=rawquerystringfield:*2231-7/str str name=querystringfield:*2231-7/str str name=parsedqueryfield:*2231-7/str str name=parsedquery_toStringfield:*2231-7/str The analisys of indexing the text .2231-7 produces this result: Index Analyzer .22317 .22317 .22317 .22317 #1;1322. #1;7 .22317 And for search for *2231-7 , produces this result: Query Analyzer 22317 22317 22317 22317 22317 I don't understand why he don't find results when i use field:*2231-7. When i use field:*2231 without -7 the document was found. How Ahmet said, i think they using -7 to ignore the document. But in debug query, they don't show this. Any idea to solve this? Thanks 2012/5/18 Ahmet Arslan iori...@yahoo.com I have a field that was indexed with the string .2231-7. When i search using '*' or '?' like this *2231-7 the query don't returns results. When i remove -7 substring and search agin using *2231 the query returns. Finally when i search using .2231-7 the query returns too. May be standard tokenizer is splitting .2231-7 into multiple tokens? You can check that admin/analysis page. May be -7 is treated as negative clause? You can check that with debugQuery=on
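A minimal sketch of that suggestion; the field name is a placeholder. On a string field the whole value .2231-7 is indexed as a single term, so a wildcard query such as field:*2231-7 matches it without any analysis surprises:

<field name="code" type="string" indexed="true" stored="true"/>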
Re: Question about wildcards
Thanks all for the explanations. Anderson 2012/5/21 Jack Krupansky j...@basetechnology.com And, generally when I see a field that has values like .2231-7, it should be a string field rather than tokenized text. As a string, you can then do straight wildcards without surprises. -- Jack Krupansky -Original Message- From: Jack Krupansky Sent: Monday, May 21, 2012 11:23 AM To: solr-user@lucene.apache.org Subject: Re: Question about wildcards Before Solr 3.6, which added MultiTermAwareComponent for analyzers, the presence of a wildcard completely short-circuited (prevented) the query-time analysis, so you have to manually emulate all steps of the query analyzer yourself if you want to do a wildcard. Even with 3.6, not all filters are multi-term aware. See: http://wiki.apache.org/solr/**MultitermQueryAnalysishttp://wiki.apache.org/solr/MultitermQueryAnalysis Do a query for .2231-7 and that will tell you which analyzer steps you will have to do manually. -- Jack Krupansky -Original Message- From: Anderson vasconcelos Sent: Monday, May 21, 2012 11:03 AM To: solr-user@lucene.apache.org Subject: Re: Question about wildcards Hi. In debug mode, the generated query was: str name=rawquerystringfield:***2231-7/str str name=querystringfield:***2231-7/str str name=parsedqueryfield:***2231-7/str str name=parsedquery_toString**field:*2231-7/str The analisys of indexing the text .2231-7 produces this result: Index Analyzer .22317 .22317 .22317 .22317 #1;1322. #1;7 .22317 And for search for *2231-7 , produces this result: Query Analyzer 22317 22317 22317 22317 22317 I don't understand why he don't find results when i use field:*2231-7. When i use field:*2231 without -7 the document was found. How Ahmet said, i think they using -7 to ignore the document. But in debug query, they don't show this. Any idea to solve this? Thanks 2012/5/18 Ahmet Arslan iori...@yahoo.com I have a field that was indexed with the string .2231-7. When i search using '*' or '?' like this *2231-7 the query don't returns results. When i remove -7 substring and search agin using *2231 the query returns. Finally when i search using .2231-7 the query returns too. May be standard tokenizer is splitting .2231-7 into multiple tokens? You can check that admin/analysis page. May be -7 is treated as negative clause? You can check that with debugQuery=on
RE: SolrCloud deduplication
https://issues.apache.org/jira/browse/SOLR-3473 -Original message- From:Mark Miller markrmil...@gmail.com Sent: Mon 21-May-2012 18:11 To: solr-user@lucene.apache.org Subject: Re: SolrCloud deduplication Looking again at the SignatureUpdateProcessor code, I think that indeed this won't currently work with distrib updates. Could you file a JIRA issue for that? The problem is that we convert update commands into solr documents - and that can cause a loss of info if an update proc modifies the update command. I think the reason that you see a multiple values error when you try the other order is because of the lack of a document clone (the other issue I mentioned a few emails back). Addressing that won't solve your issue though - we have to come up with a way to propagate the currently lost info on the update command. - Mark On May 21, 2012, at 10:39 AM, Markus Jelsma wrote: Hi again, It seemed to work fine but in the end duplicates are not overwritten. We first run the SignatureProcessor and then the DistributedProcessor. If we do it the other way around the digest field receives multiple values and throws errors. Is there anything else we can do or another patch to try? Thanks Markus -Original message- From:Markus Jelsma markus.jel...@openindex.io Sent: Mon 21-May-2012 15:58 To: solr-user@lucene.apache.org; Mark Miller markrmil...@gmail.com Subject: RE: SolrCloud deduplication Hi, SOLR-2822 seems to work just fine as long as the SignatureProcessor precedes the DistributedProcessor in the update chain. Thanks, Markus -Original message- From:Mark Miller markrmil...@gmail.com Sent: Fri 18-May-2012 16:05 To: solr-user@lucene.apache.org; Markus Jelsma markus.jel...@openindex.io Subject: Re: SolrCloud deduplication Hey Markus - When I ran into a similar issue with another update proc, I created https://issues.apache.org/jira/browse/SOLR-3215 so that I could order things to avoid this. I have not committed this yet though, in favor of waiting for https://issues.apache.org/jira/browse/SOLR-2822 Go vote? :) On May 18, 2012, at 7:49 AM, Markus Jelsma wrote: Hi, Deduplication on SolrCloud through the SignatureUpdateRequestProcessor is not functional anymore. The problem is that documents are passed multiple times through the URP and the digest field is added as if it is an multi valued field. If the field is not multi valued you'll get this typical error. Changing the order or URP's in the chain does not solve the problem. Any hints on how to resolve the issue? Is this a problem in the SignatureUpdateRequestProcessor and does it need to be updated to work with SolrCloud? Thanks, Markus - Mark Miller lucidimagination.com - Mark Miller lucidimagination.com
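For reference, a sketch of the chain ordering Markus describes; the signature field name, source fields, and signature class below are illustrative, not taken from his configuration:

  <updateRequestProcessorChain name="dedupe">
    <processor class="solr.processor.SignatureUpdateProcessorFactory">
      <bool name="enabled">true</bool>
      <str name="signatureField">digest</str>
      <bool name="overwriteDupes">true</bool>
      <str name="fields">url,content</str>
      <str name="signatureClass">solr.processor.Lookup3Signature</str>
    </processor>
    <processor class="solr.DistributedUpdateProcessorFactory"/>
    <processor class="solr.LogUpdateProcessorFactory"/>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>

Even with this ordering, as the thread notes, the computed signature is currently lost when the update is forwarded to other nodes, which is what SOLR-3473 tracks.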
Re: boost function parameter (bf) ignores character escaping
I think there is a way in sunspot to give an explicit name to a field so that sunspot doesn't generate class-name:field-name for field names. I think it is the :as option, such as: string :name, :as => :name_s So, you can then refer to name in your ruby code and name_s will be the field name in Solr. -- Jack Krupansky -Original Message- From: Jack Krupansky Sent: Monday, May 21, 2012 8:37 AM To: solr-user@lucene.apache.org Subject: Re: boost function parameter (bf) ignores character escaping Quoting from the new trunk example schema: field names should consist of alphanumeric or underscore characters only and not start with a digit. This is not currently strictly enforced, but other field names will not have first class support from all components and back compatibility is not guaranteed. In other words, don't do it. Replace the colon with an underscore in your field names. -- Jack Krupansky -Original Message- From: m...@nils-kaiser.de Sent: Monday, May 21, 2012 8:01 AM To: solr-user@lucene.apache.org Subject: boost function parameter (bf) ignores character escaping Hey, I'm running solr (3.5.0.2011.11.30.16.37.06) and have encountered what I think is a bug with the boost function (bf) parameter. I've used sunspot (for use of solr with rails), which allows managing dynamic fields and by default creates fields like dynamicfield:value1, dynamicfield:value2, thus using the : character in the field name, which needs to be escaped. If I use a query which includes q=dynamicfield\:value1:6, everything works fine and matches are found. However, if I use the bf field with bf=dynamicfield\:value1, I get an error message "undefined field dynamicfield"; the same happens without escaping the colon. Should I file a bug report? Best, Nils
RE: trunk cloud ui not working
After further investigation I have found that it is not a problem on firefox, only chrome and IE. Phil -Original Message- Sent: 21 May 2012 18:05 To: solr-user@lucene.apache.org Subject: trunk cloud ui not working Hi, I am running from the trunk and the localhost:8983/solr/#/~cloud page shows nothing but Fetch Zookeeper Data. If I run fiddler I see that: http://localhost:8983/solr/zookeeper?wt=json&detail=true&path=%2Fclusterstate.json and http://localhost:8983/solr/zookeeper?wt=json&path=%2Flive_nodes are called and return data but no update to the ui. Cheers, Phil
Re: trunk cloud ui not working
What OS? I was just trying trunk and looking at that view on Chrome on OSX and Linux and did not see an issue. On May 21, 2012, at 1:15 PM, Phil Hoy wrote: [...] - Mark Miller lucidimagination.com
Re: Not able to use the highlighting feature! Want to return snippets of text
The field I am trying to highlight is stored: <field name="text" type="text_en" required="false" compressed="false" omitNorms="false" indexed="true" stored="true" multiValued="true" termVectors="true" termPositions="true" termOffsets="true"/> In the searchHandler I've set the parameters as follows:
<str name="hl">on</str>
<str name="hl.fl">text</str>
<str name="hl.snippets">5</str>
<str name="hl.fragsize">1000</str>
<str name="hl.maxAnalyzedChars">51</str>
<str name="hl.requireFieldMatch">true</str>
<str name="hl.fragmenter">regex</str>
<str name="hl.fragListBuilder">simple</str>
<str name="hl.fragmentsBuilder">colored</str>
<str name="hl.phraseLimit">1000</str>
<str name="hl.usePhraseHighlighter">true</str>
<str name="hl.highlightMultiTerm">true</str>
<str name="hl.useFastVectorHighligher">true</str>
I still don't see any highlighting. I've managed to get snippets of text, but the actual word is not highlighted. I don't know where I'm going wrong.
Re: Fault tolerant Solr replication architecture
Parvin, What you are looking for is already available in the bleeding edge, unreleased version of Solr, which will become version 4.0 sometime later this year. You can download it at [1] and test it out. The feature is called SolrCloud [2] and it replaces the old replication mechanism in 1.x and 3.x versions. Instead of slaves pulling the whole index from masters, the masters will forward individual updates to the slaves. Note that this feature is still under development and certain things will change before 4.0 release, but it is pretty stable and even in use in production some places. [1] https://builds.apache.org/job/Solr-trunk/lastSuccessfulBuild/artifact/artifacts/ [2] http://wiki.apache.org/solr/SolrCloud -- Jan Høydahl, search solution architect Cominvent AS - www.facebook.com/Cominvent Solr Training - www.solrtraining.com On 21. mai 2012, at 09:25, Parvin Gasimzade wrote: Hi, I am using solr with replication. I have one master that indexes data and two slaves which pulls index from master and responds to the queries. My question is, how can i create fault tolerant architecture? I mean what should i do when master server crashes? I heard that repeater is used for this type of architecture. Then, do I have to create one master, one slave with repeater and one slave? Another question is, if master crashes then does slave with repeater start indexing authomatically or should i configure it manually? I asked similar question on the stackoverflow : http://stackoverflow.com/questions/10597053/fault-tolerant-solr-replication-architecture Any help will be appreciated. Regards, Parvin
Re: CloudSolrServer not working with standalone Zookeeper
Ok, it seems that a maven dependency to zookeeper version 3.3 broke this. Now it connects to the zk instance. Thanks. On Mon, May 21, 2012 at 5:31 PM, Daniel Brügge daniel.brue...@googlemail.com wrote: Thanks for your feedback. I don't know. I've tried just now with the newest trunk version and the embedded ZK on port 9983. In the logs of the zk-solr it shows: *INFO: Accepted socket connection from /XXX.XXX.XXX.XXX:1055* *May 21, 2012 3:27:34 PM org.apache.zookeeper.server.NIOServerCnxn doIO* *WARNING: EndOfStreamException: Unable to read additional data from client sessionid 0x0, likely client has closed socket* *May 21, 2012 3:27:34 PM org.apache.zookeeper.server.NIOServerCnxn closeSock* *INFO: Closed socket connection for client /XXX.XXX.XXX.XXX:1055 (no session established for client)* So it can definitely connects to the port in my opinion, but it closes the connection after the defined timeout (here 1ms) *Caused by: java.util.concurrent.TimeoutException: Could not connect to ZooKeeper MYZKHOST.:9983 within 1 m* Hmm. I also thought that this trivial setup should work. Will check again. Daniel On Fri, May 18, 2012 at 4:23 PM, Mark Miller markrmil...@gmail.comwrote: Seems something is stopping the connection from occurring? Tests are constantly running and doing this using an embedded zk server - and I know more than a few people using an external zk setup. I'd have to guess something in your env or URL is causing this? On May 16, 2012, at 3:11 PM, Daniel Brügge wrote: OK, it's also not working with an internal started Zookeeper. On Wed, May 16, 2012 at 8:29 PM, Daniel Brügge daniel.brue...@googlemail.com wrote: Hi, I am just playing around with SolrCloud and have read in articles like http://www.lucidimagination.com/blog/2012/03/05/scaling-solr-indexing-with-solrcloud-hadoop-and-behemoth/thatit is sufficient to create the connection to the Zookeeper instance and not to the Solr instance. When I try to connect to my standalone Zookeeper instance (not started with a Solr instance and -DzkRun) I am getting this error: Caused by: java.util.concurrent.TimeoutException: Could not connect to ZooKeeper I am also getting this error when I try to connect directly to one of the Solr instances. My code looks like this: solr = new CloudSolrServer(myzkhost:2181); ((CloudSolrServer) solr).setDefaultCollection(collection1); I am working with the latest Solr trunk version ( https://builds.apache.org/view/S-Z/view/Solr/job/Solr-trunk/1855/) Do I need to start the zookeeper in Solr to keep this working? Thanks regards Daniel - Mark Miller lucidimagination.com
Re: Lucene FieldCache - Out of memory exception
: I am using solr 1.3 with jdk 1.5.0_14 and weblogic 10MP1 application server : on Solaris. I use embedded solr server. More details : FWIW: Solr 1.3 is *REALLY* old ... do not be surprised if much of the info you are given (or read) doesn't apply. : - some mail threads on this forum seem to indicate that there could be some : connection between having dynamic fields and usage of FieldCache. Is this : true ? Most of the fields in my index are dynamic fields. There is no specific correlation between dynamic fields and the field cache -- what you may be seeing is people commenting about the dangers of *using* field caches with dynamic fields, because typically when people use dynamic fields there is no fixed number of pre-defined fields in use (that's the whole perk of dynamic fields), so if you are using hundreds or thousands of dynamic fields in a way that involves the field cache, you might have problems (because field cache objects tend to be large). : - as mentioned above, most of my faceted queries could have around 50-70 : facet fields (I would do SolrQuery.addFacetField() for around 50-70 fields : per query). Could this be the source of the problem ? Is this too high for : solr to support ? In Solr 1.3, faceting does not use the field cache AT ALL! Starting with Solr 1.4, faceting can use the field cache (or a similar concept called UnInvertedField when multivalued). You can force Solr 1.4+ not to use the field cache for this by specifying facet.method=enum https://wiki.apache.org/solr/SimpleFacetParameters#facet.method : - Initially, I had a facet.sort defined in solrconfig.xml. Since FieldCache : builds up on sorting, I even removed the facet.sort and tried, but no : respite. The behavior is same as before. Facet sorting is not the same as result sorting; facet sorting does not use the field cache at all. Nothing you've mentioned in your initial email, or the example query you posted, should involve the field cache in any way (in Solr 1.3!), so if you are seeing your heap eaten up by field cache objects there is more going on in your system than you know about (or than you've told us) ... you need to look at the fields associated with those field caches, and then see how you are using those fields in requests, to make sense of why they exist in your heap. -Hoss
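On Solr 1.4 and later, forcing the non-FieldCache faceting method looks like this (the field name category is only illustrative); it can be set for the whole request or per field:

  ...&facet=true&facet.field=category&facet.method=enum
  ...&facet=true&facet.field=category&f.category.facet.method=enum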
Re: Not able to use the highlighting feature! Want to return snippets of text
Hi, Can you please provide the definitions of the following 3 objects from your solrconfig.xml? <str name="hl.fragListBuilder">simple</str> <str name="hl.fragmentsBuilder">colored</str> <str name="hl.fragmenter">regex</str> For example, the simple hl.fragListBuilder should be defined as mentioned below in your solrconfig.xml: <fragListBuilder name="simple" class="org.apache.solr.highlight.SimpleFragListBuilder" default="true"/> On Mon, May 21, 2012 at 2:06 PM, 12rad prama.an...@gmail.com wrote: [...] -- Thanks and Regards Rahul A. Warawdekar
SolrJ: clusters, labels, docs - search results
Hello, Was wondering how to access the cluster labels and docs (ids) via SolrJ? I have added the following: query.setParam("q", userQuery); query.setParam("clustering", true); query.setParam("qt", "/core2/clustering"); query.setParam("carrot.title", title); But how do I access the labels and docs in the clusters and display them in a search result? Also, I've seen others specify clustering in this manner... ModifiableSolrParams params = new ModifiableSolrParams(); params.set("qt", "/core2/clustering"); params.set("q", userQuery); params.set("carrot.title", title); params.set("clustering", true); Is this preferred over the other? Thanks
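One way to read the clusters, sketched on the assumption that the handler returns the standard clustering output (an array named clusters whose entries carry labels and docs); SolrJ has no typed accessor for this, so it goes through the raw response. Not tested against this exact setup:

  // needs org.apache.solr.common.util.NamedList and java.util.List;
  // server is your (Http)SolrServer instance
  QueryResponse rsp = server.query(query);
  NamedList<Object> raw = rsp.getResponse();
  @SuppressWarnings("unchecked")
  List<NamedList<Object>> clusters = (List<NamedList<Object>>) raw.get("clusters");
  if (clusters != null) {
    for (NamedList<Object> cluster : clusters) {
      List<?> labels = (List<?>) cluster.get("labels"); // cluster label phrases
      List<?> docs = (List<?>) cluster.get("docs");     // ids of the documents in this cluster
      System.out.println(labels + " -> " + docs);
    }
  }

As for the two styles of setting parameters: SolrQuery extends ModifiableSolrParams, so query.setParam(...) and params.set(...) build the same request; neither is preferred over the other.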
Re: Not able to use the highlighting feature! Want to return snippets of text
For the fragListBuilder it's <fragListBuilder name="simple" default="true" class="solr.highlight.SimpleFragListBuilder"/> fragment builder is
<fragmentsBuilder name="colored" class="solr.highlight.ScoreOrderFragmentsBuilder">
  <lst name="defaults">
    <str name="hl.tag.pre"></str>
    <str name="hl.tag.post"></str>
  </lst>
</fragmentsBuilder>
<fragmenter name="regex" class="solr.highlight.RegexFragmenter">
  <lst name="defaults">
    <int name="hl.fragsize">70</int>
    <float name="hl.regex.slop">0.5</float>
    <str name="hl.regex.pattern">[-\w ,/\n\&quot;&apos;]{20,200}</str>
  </lst>
</fragmenter>
Thanks!
Re: Solr Facets and doc count for a term
: Is there a way to not only get the number of times a term appears for : a particular field (faceting) as well as the number of documents that : were associated with a particular term? So for instance if I had the : following docs Nope... faceting is associated with _sets_ of documents, so there is no scoring info associated with each constraint, just the number of documents in the set (that's what allows it to be very efficient) -Hoss
Re: Not able to use the highlighting feature! Want to return snippets of text
Hi, I believe in your colored fragmentsBuilder definition you have not mentioned anything in your pre and post tags, and that may be the reason that you are getting snippets of text without highlighting. Please refer to http://wiki.apache.org/solr/HighlightingParameters and check the hl.fragmentsBuilder section. Try specifying the pre and post tags with information as mentioned below (same as the wiki link above):
<!-- multi-colored tag FragmentsBuilder -->
<fragmentsBuilder name="colored" class="org.apache.solr.highlight.ScoreOrderFragmentsBuilder">
  <lst name="defaults">
    <str name="hl.tag.pre"><![CDATA[
      <b style="background:yellow">,<b style="background:lawgreen">,
      <b style="background:aquamarine">,<b style="background:magenta">,
      <b style="background:palegreen">,<b style="background:coral">,
      <b style="background:wheat">,<b style="background:khaki">,
      <b style="background:lime">,<b style="background:deepskyblue">]]></str>
    <str name="hl.tag.post"><![CDATA[</b>]]></str>
  </lst>
</fragmentsBuilder>
On Mon, May 21, 2012 at 3:52 PM, 12rad prama.an...@gmail.com wrote: [...] -- Thanks and Regards Rahul A. Warawdekar
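With the pre/post tags filled in, a request along the following lines should come back with the matched terms wrapped in those tags (the query and URL are only an illustration, and note the full parameter name is hl.useFastVectorHighlighter):

  http://localhost:8983/solr/select?q=text:memory&hl=true&hl.fl=text&hl.useFastVectorHighlighter=true&hl.fragListBuilder=simple&hl.fragmentsBuilder=colored&hl.fragmenter=regex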
how to join 3 tables to pull required data
I have a situation where I need to join 3 tables to pull the required information. Can anyone throw me some ideas? select A.sid, B.cid, C.NAME from table1 A, table2 B, table3 C where A.sid = C.sid and A.oid = B.oid and C.typeid = 5 and C.flag = 0 and B.cid = 1000; Can you please suggest a schema/data-config for the above requirement?
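A DataImportHandler data-config along these lines is one way to feed such a join into Solr; the driver, connection URL, credentials and Solr field names below are placeholders to adapt:

  <dataConfig>
    <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/mydb" user="db_user" password="db_pass"/>
    <document>
      <entity name="item"
              query="select A.sid, B.cid, C.NAME from table1 A, table2 B, table3 C
                     where A.sid = C.sid and A.oid = B.oid and C.typeid = 5 and C.flag = 0 and B.cid = 1000">
        <field column="sid" name="sid"/>
        <field column="cid" name="cid"/>
        <field column="NAME" name="name"/>
      </entity>
    </document>
  </dataConfig>

Each selected column then needs a matching field (sid, cid, name here) declared in schema.xml, with one of them marked as the uniqueKey.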
Remote streaming - posting a URL which is password protected
I want to index an http document that is password protected (it requires a username and password to log in). I tried doing this: curl -u username:password "http://localhost:8983/solr/update/extract?literal.id=doc900&commit=true" -F stream.url=http://somewebsite.com/docs/DOC2609 but it just indexes the login page.
Solr mail dataimporter cannot be found
Hi, I want to index emails using solr. I put the user name, password and hostname in data-config.xml under the mail folder. This is a valid email account, but when I run the url http://localhost:8983/solr/mail/dataimport?command=full-import it says it cannot access mail/dataimporter, reason: not found. But when I run http://localhost:8983/solr/rss/dataimport?command=full-import or http://localhost:8983/solr/db/dataimport?command=full-import they can be found. In addition, when I run the command java -Dsolr.solr.home=./example-DIH/solr/ -jar start.jar, on the left side of the solr UI there are db, rss, tika and solr but no mail. Is this a bug in the mail indexing example? Thank you so much! Best, Emma
Re: Remote streaming - posting a URL which is password protected
Hi, Using curl -u will only attempt to log in to Jetty/Solr, which is not password protected I assume. What you really would like is for the HTTP call which Solr does based on stream.url to attempt a login. Such functionality is not implemented as far as I know. You may try the syntax stream.url=http://username:passw...@somewebsite.com/docs/DOC2609 but I have not tested it. Why can't you download the file locally first? If you're looking for a production grade HTTP crawler you could look at ManifoldCF. -- Jan Høydahl, search solution architect Cominvent AS - www.facebook.com/Cominvent Solr Training - www.solrtraining.com On 21. mai 2012, at 22:44, 12rad wrote: I want to post index a http document that is password protected. It has a username name login. I tried doing this curl -u username:password http://localhost:8983/solr/update/extract?literal.id=doc900commit=true; -F stream.url=http://somewebsite.com/docs/DOC2609 but it just indexes the login page only. -- View this message in context: http://lucene.472066.n3.nabble.com/Remote-streaming-posting-a-URL-which-is-password-protected-tp3985221.html Sent from the Solr - User mailing list archive at Nabble.com.
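If the inline-credentials URL doesn't work either, a two-step version of the "download locally first" route could look like this (file name, literal.id, and credentials are placeholders):

  curl -u username:password -o DOC2609.html "http://somewebsite.com/docs/DOC2609"
  curl "http://localhost:8983/solr/update/extract?literal.id=doc900&commit=true" -F "myfile=@DOC2609.html"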
Re: UI
yes, I am using this library and it works perfectly so far. If something does not work you can just modify it http://code.google.com/p/solr-php-client/ Johannes 2012/5/21 Tolga to...@ozses.net: Hi, Can you recommend a good PHP UI to search? Is SolrPHPClient good?
Re: Newbie with Carrot2?
: Subject: Newbie with Carrot2? : References: 35E48F3294A0416A8F476E9C173321F3@msrvcn04 : In-Reply-To: 35E48F3294A0416A8F476E9C173321F3@msrvcn04 https://people.apache.org/~hossman/#threadhijack Thread Hijacking on Mailing Lists When starting a new discussion on a mailing list, please do not reply to an existing message, instead start a fresh email. Even if you change the subject line of your email, other mail headers still track which thread you replied to and your question is hidden in that thread and gets less attention. It makes following discussions in the mailing list archives particularly difficult. -Hoss
Re: Date format in the schema.xml
: Subject: Date format in the schema.xml : References: 1336981696.60953.yahoomailclas...@web121705.mail.ne1.yahoo.com : In-Reply-To: 1336981696.60953.yahoomailclas...@web121705.mail.ne1.yahoo.com https://people.apache.org/~hossman/#threadhijack Thread Hijacking on Mailing Lists When starting a new discussion on a mailing list, please do not reply to an existing message, instead start a fresh email. Even if you change the subject line of your email, other mail headers still track which thread you replied to and your question is hidden in that thread and gets less attention. It makes following discussions in the mailing list archives particularly difficult. -Hoss
Re: UI
My favourite php library is solarium. Everything OOP. I've tried a few. http://www.solarium-project.org/ Sent from my iPhone On 21/05/2012, at 6:44 PM, Johannes Goll johannes.g...@gmail.com wrote: yes, I am using this library and it works perfectly so far. If something does not work you can just modify it http://code.google.com/p/solr-php-client/ Johannes 2012/5/21 Tolga to...@ozses.net: Hi, Can you recommend a good PHP UI to search? Is SolrPHPClient good?
Re: Solr 3.6.0 problem with multi-core and json
: I should clarify the error a bit. When I make a select request on my first : core (called core0) using the wt=json parameter I get a 400 response with : the explanation undefined field: gid. The field gid is not defined in the : schema.xml file of my first core. But, it is defined in the schema.xml file : of my third core (core2). Hopefully, this is a slightly better explanation : of the problem. What is the full stack trace of the error? (even if your client doesn't get it, it should be in the log) Are you sure there is no reference to gid in your core0 solrconfig.xml? -Hoss
SolrCloud: how to index documents into a specific core and how to search against that core?
Hi Guys, I use following command to start solr cloud according to solr cloud wiki. yydzero:example bjcoe$ java -Dbootstrap_confdir=./solr/conf -Dcollection.configName=myconf -DzkRun -DnumShards=2 -jar start.jar yydzero:example2 bjcoe$ java -Djetty.port=7574 -DzkHost=localhost:9983 -jar start.jar Then I have created several cores using CoreAdmin API ( http://localhost:8983/solr/admin/cores?action=CREATEname= coreNamecollection=collection1), and clusterstate.json show following topology: collection1: -- shard1: -- collection1 -- CoreForCustomer1 -- CoreForCustomer3 -- CoreForCustomer5 -- shard2: -- collection1 -- CoreForCustomer2 -- CoreForCustomer4 1) Index: Using following command to index mem.xml file in exampledocs directory. yydzero:exampledocs bjcoe$ java -Durl= http://localhost:8983/solr/coreForCustomer3/update -jar post.jar mem.xml SimplePostTool: version 1.4 SimplePostTool: POSTing files to http://localhost:8983/solr/coreForCustomer3/update.. SimplePostTool: POSTing file mem.xml SimplePostTool: COMMITting Solr index changes. And now SolrAdmin UI shows that 'coreForCustomer1', 'coreForCustomer3', 'coreForCustomer5' has 3 documents (mem.xml has 3 documents) and other 2 core has 0 documents. *Question 1:* Is this expected behavior? How do I to index documents into a specific core? *Question 2*: If SolrCloud don't support this yet, how could I extend it to support this feature (index document to particular core), where should i start, the hashing algorithm? *Question 3*: Why the documents are also indexed into 'coreForCustomer1' and 'coreForCustomer5'? The default replica for documents are 1, right? Then I try to index some document to 'coreForCustomer2': $ java -Durl=http://localhost:8983/solr/coreForCustomer2/update -jar post.jar ipod_video.xml While 'coreForCustomer2' still have 0 documents and documents in ipod_video are indexed to core for customer 1/3/5. *Question 4*: Why this happens? 2) Search: I use http://localhost:8983/solr/coreForCustomer2/select?q=*%3A*wt=xml; to search against 'CoreForCustomer2', while it will return all documents in the whole collection even though this core has no documents at all. Then I use http://localhost:8983/solr/coreForCustomer2/select?q=*%3A*wt=xmlshards=localhost:8983/solr/coreForCustomer2;, and it will return 0 documents. *Question 5*: So If want to search against a particular core, we need to use 'shards' parameter and use solrCore name as parameter value, right? Thanks very much in advance! Regards, Yandong
Date boosting mlt results - possible?
Specifically if I'm doing a query using the solr mlt handler (http://wiki.apache.org/solr/MoreLikeThisHandler) and stream.body to supply the source doc is there any way to boost result documents based on document age? I already know how to do that for a regular query using dismax (http://wiki.apache.org/solr/FunctionQuery#Date_Boosting) but I can't quite figure out the magic incantation to do it for the mlt handler. John Pettitt Email: j...@p.tt
Re: And results before Or results
: I want to have a strict enforcement that in case of a 3 word search, those : results that match all 3 terms should be presented ahead of those that match : 2 terms when I set mm=2. : : I have seen quite some cases where those results that match 2 out of 3 : words appear ahead of those matching all 3 words. That can happen because of tf/idf and length normalization. If you disable all of those things for the fields you search on (omitNorms=true omitTf=true) you should see a strict ordering based on the number of matching clauses. -Hoss
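In schema.xml terms that is roughly the following (field name and type are illustrative; in current schema versions the term-frequency switch is spelled omitTermFreqAndPositions, and turning it on also drops positions, so phrase queries on that field stop working):

  <field name="title" type="text_general" indexed="true" stored="true"
         omitNorms="true" omitTermFreqAndPositions="true"/>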
Re: UI
The php.net plugin is the best. SolrPHPClient is missing several features. Sent from my Mobile device 720-256-8076 On May 21, 2012, at 6:35 AM, Tolga to...@ozses.net wrote: Hi, Can you recommend a good PHP UI to search? Is SolrPHPClient good?
Re: SolrCloud: how to index documents into a specific core and how to search against that core?
Why do you want to control what gets indexed into a core and then have to know which core to search? That is exactly the kind of bookkeeping SolrCloud takes care of. In SolrCloud, the distribution of documents across shards is handled for you, and they are retrieved regardless of which node is searched from. That is the point of cloud: you don't know the details of where exactly documents are being managed (i.e. they are cloudy), and it can change and re-balance from time to time. SolrCloud performs the distributed search for you, therefore when you search a node/core with no documents, all the results from the cloud are retrieved regardless. This is considered A Good Thing. It requires a change in thinking about indexing and searching. On Tue, 2012-05-22 at 08:43 +0800, Yandong Yao wrote: [...]
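That said, if the goal is only to inspect what a single core physically holds (for debugging rather than normal searching), distrib=false keeps the query from fanning out across the cloud, e.g.:

  http://localhost:8983/solr/coreForCustomer2/select?q=*:*&distrib=false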
Re: adding an OR to a fq makes some doc that matched not match anymore
: - /suggest?q=suggest_terms:lap*&fq=type:P&fq=(-type:B) : numFound=1 : doc, so adding a doc will also fulfill right? : /suggest?q=suggest_terms:lap*&fq=type:P&fq=(-type:B OR name:aa) : numFound=0 : : is there a logical explanation?? http://www.lucidimagination.com/blog/2011/12/28/why-not-and-or-and-not/ -Hoss
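The usual workaround from that article is to give the purely negative clause something to subtract from, e.g.:

  /suggest?q=suggest_terms:lap*&fq=type:P&fq=((*:* -type:B) OR name:aa)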
Re: And results before Or results
Interesting. Even though omitTf=true would give strict enforcement, wouldn't it affect the relevancy? I am wondering whether the ordering amongst the three-word matches would not be as good with omitNorms=true and omitTf=true as it is without them. Do you have an idea? On Mon, May 21, 2012 at 8:51 PM, Chris Hostetter hossman_luc...@fucit.org wrote: [...] -- -- Karthick D S Master's in Computer Engineering ( Software Track ) Syracuse University Syracuse - 13210 New York United States of America
RE: Advanced search with results matrix
: No, it's not just one single query, rather, as I've mentioned before, it's : a combination of searches with a result count for each combination. Explained : in detail below: : 1) (SQL Server OR SQL) : 2) (Visual Basic OR VB.NET) : 3) (Java AND JavaScript) : 4) (SQL Server OR SQL) AND (Visual Basic OR VB.NET) : 5) (Visual Basic OR VB.NET) AND (Java AND JavaScript) : 6) (SQL Server OR SQL) AND (Java AND JavaScript) : 7) (SQL Server OR SQL) AND (Visual Basic OR VB.NET) AND (Java AND : JavaScript) As an added bonus, you can use nested parsers to simplify how you express your query... q1=...input from textbox #1... q2=...input from textbox #2... q3=...input from textbox #3... q=*:* facet=true facet.query={!v=$q1} facet.query={!v=$q2} facet.query={!v=$q3} facet.query=+_query_:"{!v=$q1}" +_query_:"{!v=$q2}" facet.query=+_query_:"{!v=$q1}" +_query_:"{!v=$q3}" facet.query=+_query_:"{!v=$q2}" +_query_:"{!v=$q3}" facet.query=+_query_:"{!v=$q1}" +_query_:"{!v=$q2}" +_query_:"{!v=$q3}" ...which doesn't look simpler, until you realize that you can hardcode everything except q1, q2, and q3 in default params for a special request handler in your solrconfig.xml -Hoss
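A sketch of that hardcoding, with a made-up handler name of /matrix; the client then only sends q1, q2 and q3 (values URL-encoded in practice):

  <requestHandler name="/matrix" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="q">*:*</str>
      <bool name="facet">true</bool>
      <str name="facet.query">{!v=$q1}</str>
      <str name="facet.query">{!v=$q2}</str>
      <str name="facet.query">{!v=$q3}</str>
      <str name="facet.query">+_query_:"{!v=$q1}" +_query_:"{!v=$q2}"</str>
      <str name="facet.query">+_query_:"{!v=$q1}" +_query_:"{!v=$q3}"</str>
      <str name="facet.query">+_query_:"{!v=$q2}" +_query_:"{!v=$q3}"</str>
      <str name="facet.query">+_query_:"{!v=$q1}" +_query_:"{!v=$q2}" +_query_:"{!v=$q3}"</str>
    </lst>
  </requestHandler>

  /matrix?q1=(SQL Server OR SQL)&q2=(Visual Basic OR VB.NET)&q3=(Java AND JavaScript)&rows=0

Each facet.query count in the response then corresponds to one of the seven combinations.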
Re: And results before Or results
: Interesting. Even though omitTf=true would give strict enforcement, : wouldn't it affect the relevancy? I am wondering whether the ordering : amongst the three-word matches would not be as good with omitNorms=true : and omitTf=true as it is without them. Do you have an idea? It will *absolutely* affect the ranking ... that's the entire point. If the complaint is docA containing only two of the clauses scores higher than docB matching all 3 clauses, the reason for that is (usually) that the tf/idf scoring for docA is a *REALLY* good match for those two clauses (i.e. they occur many, many times), whereas docB might match all three but only match each of them once. You can't guarantee a strict ordering based on the number of clauses that match unless you eliminate term freq and norms from the equation. That said, I realize now that I forgot to finish my previous message with the However... comment... However... if you still want the tf/idf and length norm to be a factor, but you just want to make the penalty for not matching all terms much higher (which doesn't guarantee a strict ordering, but biases things so much it's unlikely to ever be a factor) you could also play around with a custom implementation of the coord factor in the similarity... http://lucene.apache.org/core/3_6_0/api/all/org/apache/lucene/search/Similarity.html#coord%28int,%20int%29 [...] -Hoss
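A sketch of such a coord override against the 3.x Similarity API; the class name and package are made up, and squaring the ratio is just one arbitrary way of punishing missing clauses harder:

  import org.apache.lucene.search.DefaultSimilarity;

  public class StrictCoordSimilarity extends DefaultSimilarity {
    @Override
    public float coord(int overlap, int maxOverlap) {
      // default coord is overlap / maxOverlap; squaring it makes docs
      // that miss query clauses fall much further behind full matches
      float ratio = (float) overlap / (float) maxOverlap;
      return ratio * ratio;
    }
  }

It would be wired in globally via <similarity class="com.example.StrictCoordSimilarity"/> in schema.xml (the package com.example is a placeholder), with the jar dropped into Solr's lib directory.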
Re: Indexing files using multi-cores - could not fix after many retries
On 22 May 2012 05:12, sudarshan chakravarthy.sudars...@gmail.com wrote: [...] requestHandler name=/update/csv class=solr.CSVRequestHandler startup=lazy / [...] Response: html head meta http-equiv=Content-Type content=text/html; charset=ISO-8859-1/ titleError 400 Unexpected character 'b' (code 98) in prolog; expected 'lt;' at [row,col {unknown-source}]: [1,1]/title /head body HTTP ERROR 400 pProblem accessing /solr/core0/update/. [...] Looks like your CSV handler is set up at /update/csv whereas you are posting to /update. By default, the handler there expects XML, which is the source of the error. Try posting to /solr/core0/update/csv/ Regards, Gora
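Once the request is aimed at the CSV handler, a typical post looks like this (the file name is a placeholder):

  curl "http://localhost:8983/solr/core0/update/csv?commit=true" --data-binary @data.csv -H 'Content-type:text/plain; charset=utf-8'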