Re: Solr 4.0 BETA Replication problems on Tomcat
I opened SOLR-3789. As a workaround you can remove <str name="compression">internal</str> from the config and it should work.

-- Sami Siren

On Wed, Sep 5, 2012 at 5:58 AM, Ravi Solr ravis...@gmail.com wrote:

Hello, I have a very simple setup, one master and one slave, configured as below, but replication keeps failing with the stack trace shown below. Note that 3.6 works fine on the same machines, so I am thinking that I am missing something in the configuration with regard to Solr 4.0. Can somebody kindly let me know if I am missing something? I am running Solr 4.0 on Tomcat 7.0.29 with Java 6. FYI, I never had any problem with Solr on Glassfish; this is the first time I am using it on Tomcat.

On Master:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="replicateAfter">optimize</str>
    <str name="confFiles">schema.xml,stopwords.txt,synonyms.txt</str>
    <str name="commitReserveDuration">00:00:10</str>
  </lst>
</requestHandler>

On Slave:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://testslave:8080/solr/mycore/replication</str>
    <str name="pollInterval">00:00:50</str>
    <str name="compression">internal</str>
    <str name="httpConnTimeout">5000</str>
    <str name="httpReadTimeout">1</str>
  </lst>
</requestHandler>

Error:

22:44:10 WARNING SnapPuller Error in fetching packets
java.util.zip.ZipException: unknown compression method
        at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:147)
        at org.apache.solr.common.util.FastInputStream.readWrappedStream(FastInputStream.java:79)
        at org.apache.solr.common.util.FastInputStream.refill(FastInputStream.java:88)
        at org.apache.solr.common.util.FastInputStream.read(FastInputStream.java:124)
        at org.apache.solr.common.util.FastInputStream.readFully(FastInputStream.java:149)
        at org.apache.solr.common.util.FastInputStream.readFully(FastInputStream.java:144)
        at org.apache.solr.handler.SnapPuller$FileFetcher.fetchPackets(SnapPuller.java:1024)
        at org.apache.solr.handler.SnapPuller$FileFetcher.fetchFile(SnapPuller.java:985)
        at org.apache.solr.handler.SnapPuller.downloadIndexFiles(SnapPuller.java:627)
        at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:331)
        at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:297)
        at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:175)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
        at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)

22:44:10 SEVERE ReplicationHandler SnapPull failed: org.apache.solr.common.SolrException: Unable to download _3_Lucene40_0.tip completely. Downloaded 0!=170
        at org.apache.solr.handler.SnapPuller$FileFetcher.cleanup(SnapPuller.java:1115)
        at org.apache.solr.handler.SnapPuller$FileFetcher.fetchFile(SnapPuller.java:999)
        at org.apache.solr.handler.SnapPuller.downloadIndexFiles(SnapPuller.java:627)
        at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:331)
        at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:297)
        at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:175)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
        at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
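Applying the workaround Sami describes, the slave's replication handler simply drops the compression line. A sketch based on the config quoted above:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <!-- masterUrl and timeouts unchanged; the compression entry is removed,
         which avoids the ZipException until SOLR-3789 is fixed -->
    <str name="masterUrl">http://testslave:8080/solr/mycore/replication</str>
    <str name="pollInterval">00:00:50</str>
    <str name="httpConnTimeout">5000</str>
    <str name="httpReadTimeout">1</str>
  </lst>
</requestHandler>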
Re: Setting up two cores in solr.xml for Solr 4.0
<cores adminPath="/admin/cores">
  <core name="core0" instanceDir="core0" />
  <core name="core1" instanceDir="core1" />
</cores>

Try the above snippet in solr.xml; it works on Tomcat.

On Wed, Sep 5, 2012 at 1:10 AM, Chris Hostetter hossman_luc...@fucit.org wrote:

: <core name="MYCORE_test" instanceDir="MYCORE" dataDir="MYCORE_test" />

I'm pretty sure what you have above tells solr that core MYCORE_test should use the instanceDir MYCORE but ignore the <dataDir/> in that solrconfig.xml and use the one you specified.

This on the other hand...

: <core name="MYCORE_test" instanceDir="MYCORE">
:   <property name="dataDir" value="MYCORE_test" />
: </core>

...tells solr that the MYCORE_test SolrCore should use the instanceDir MYCORE, and when parsing that solrconfig.xml file it should set the variable ${dataDir} to be MYCORE_test -- but if your solrconfig.xml file does not ever refer to the ${dataDir} variable, it won't have any effect.

so the question becomes -- what does your solrconfig.xml look like?

-Hoss

-- Regards, Veena. Banglore.
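For context, a minimal complete solr.xml wrapping that snippet might look like the following. This is a sketch: the persistent attribute and XML prolog follow the 4.0 example's conventions, and should be treated as assumptions rather than part of the original post.

<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="true">
  <cores adminPath="/admin/cores">
    <core name="core0" instanceDir="core0" />
    <core name="core1" instanceDir="core1" />
  </cores>
</solr>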
Re: Sorting on mutivalued fields still impossible?
On Fri, 2012-08-31 at 13:35 +0200, Erick Erickson wrote:
> Imagine you have two entries, aardvark and emu in your multiValued field. How should that document sort relative to another doc with camel and zebra? Any heuristic you apply will be wrong for someone else

I see two obvious choices here:

1) Sort by the value that is ordered first by the comparator function.
Doc1: aardvark, (emu)
Doc2: camel, (zebra)
This is what Uwe wants to do, and it is normally done by preprocessing and collapsing to a single value. It could be implemented with an ordered multi-valued field cache by comparing on the first (or last, in the case of reverse sort) entry for each matching document.

2) Make duplicate entries in the result set, one for each value.
Doc1: aardvark, (emu)
Doc2: camel, (zebra)
Doc1: (aardvark), emu
Doc2: (camel), zebra
I have a hard time coming up with a real-world use case for this. It could be implemented by using a multi-valued field cache as above and putting the same document ID into the sliding window sorter once for each field value.

Collapsing this into a single algorithm: step through all IDs; for each ID, give access to the list of field values and provide a callback for adding one or more (value, ID) pairs to the sliding window sorter.

Are there some other realistic heuristics that I have missed?
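To make choice (1) concrete, here is a minimal Java sketch of comparing two documents by the comparator-first value of a multi-valued field. It is illustrative only — not Solr's actual FieldComparator API — and it assumes each document's values are already sorted ascending, as an ordered field cache would hold them.

import java.util.Comparator;
import java.util.List;

// Hypothetical illustration of choice (1): a document sorts by the first
// value of its multi-valued field (or the last value for reverse sort).
class FirstValueDocComparator implements Comparator<List<String>> {
    private final boolean reverse;

    FirstValueDocComparator(boolean reverse) {
        this.reverse = reverse;
    }

    @Override
    public int compare(List<String> docA, List<String> docB) {
        // Each list holds one document's field values, pre-sorted ascending.
        String a = reverse ? docA.get(docA.size() - 1) : docA.get(0);
        String b = reverse ? docB.get(docB.size() - 1) : docB.get(0);
        int cmp = a.compareTo(b);
        return reverse ? -cmp : cmp;
    }
}

With this rule, a document holding {aardvark, emu} sorts before one holding {camel, zebra} in an ascending sort, matching Doc1/Doc2 in the example above.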
Re: Solr Cloud Implementation with Apache Tomcat
Hi Rafal, I set up a standalone ZooKeeper, and it starts fine. The next step is that I want to configure ZooKeeper with my SolrCloud setup using Apache Tomcat. How is that possible? Can you please tell me the steps I have to follow to implement SolrCloud with Apache Tomcat? Thanks in advance.

Thanks, Guru
RE: Solr Cloud Implementation with Apache Tomcat
Set the -DzkHost= property in some Tomcat configuration as per the wiki page and point it to the ZooKeeper(s). On Debian systems you can use /etc/default/tomcat6 to configure your properties.

-Original message-
From: bsargurunathan bsargurunat...@gmail.com
Sent: Wed 05-Sep-2012 10:40
To: solr-user@lucene.apache.org
Subject: Re: Solr Cloud Implementation with Apache Tomcat
<snip>
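On a Debian-style install, that could look like the following untested sketch; the file location matches the advice above, but the ZooKeeper host names and ports are placeholders, not values from this thread:

# /etc/default/tomcat6 -- hypothetical ZooKeeper ensemble addresses
JAVA_OPTS="$JAVA_OPTS -DzkHost=zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181"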
Re: AW: AW: auto completion search with solr using NGrams in SOLR
Hi, You are trying to use two different approaches at the same time.

1) Remove

<arr name="last-components">
  <str>suggest</str>
  <str>query</str>
</arr>

from your requestHandler.

2) Execute this query URL: suggest/?q=michael b&df=title&defType=lucene

And you will see my point.

--- On Wed, 9/5/12, aniljayanti anil.jaya...@gmail.com wrote:

From: aniljayanti anil.jaya...@gmail.com
Subject: Re: AW: AW: auto completion search with solr using NGrams in SOLR
To: solr-user@lucene.apache.org
Date: Wednesday, September 5, 2012, 7:29 AM

Hi, thanks. I am sending my whole configuration in the schema.xml and solrconfig.xml files.

schema.xml
----------

<fieldType name="edgytext" class="solr.TextField" positionIncrementGap="100" omitNorms="true">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.PatternReplaceFilterFactory" pattern="\s+" replacement=" " replace="all" />
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="15" side="front" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.PatternReplaceFilterFactory" pattern="\s+" replacement=" " replace="all" />
  </analyzer>
</fieldType>

<field name="title" type="edgytext" indexed="true" stored="true" />
<field name="empname" type="edgytext" indexed="true" stored="true" />
<field name="autocomplete_text" type="edgytext" indexed="true" stored="false" multiValued="true" omitNorms="true" omitTermFreqAndPositions="false" />

<copyField source="title" dest="autocomplete_text" />
<copyField source="empname" dest="autocomplete_text" />

solrconfig.xml
--------------

<searchComponent class="solr.SpellCheckComponent" name="suggest">
  <lst name="spellchecker">
    <str name="name">suggest</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.FSTLookup</str>
    <str name="storeDir">suggest</str>
    <str name="field">autocomplete_text</str>
    <bool name="exactMatchFirst">true</bool>
    <float name="threshold">0.005</float>
    <str name="buildOnCommit">true</str>
    <str name="buildOnOptimize">true</str>
  </lst>
  <lst name="spellchecker">
    <str name="name">jarowinkler</str>
    <str name="field">lowerfilt</str>
    <str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str>
    <str name="spellcheckIndexDir">spellchecker</str>
  </lst>
  <str name="queryAnalyzerFieldType">edgytext</str>
</searchComponent>

<requestHandler class="org.apache.solr.handler.component.SearchHandler" name="/suggest" startup="lazy">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">suggest</str>
    <str name="spellcheck.onlyMorePopular">true</str>
    <str name="spellcheck.count">5</str>
    <str name="spellcheck.collate">false</str>
    <str name="spellcheck.maxCollations">5</str>
    <str name="spellcheck.maxCollationTries">1000</str>
    <str name="spellcheck.collateExtendedResults">true</str>
  </lst>
  <arr name="last-components">
    <str>suggest</str>
    <str>query</str>
  </arr>
</requestHandler>

URL: suggest/?q=michael b

Response:

<?xml version="1.0" encoding="UTF-8" ?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">3</int>
  </lst>
  <result name="response" numFound="0" start="0" />
  <lst name="spellcheck">
    <lst name="suggestions">
      <lst name="michael">
        <int name="numFound">10</int>
        <int name="startOffset">1</int>
        <int name="endOffset">8</int>
        <arr name="suggestion">
          <str>michael bully herbig</str>
          <str>michael bolton</str>
          <str>michael bolton: arias</str>
          <str>michael falch</str>
          <str>michael holm</str>
          <str>michael jackson</str>
          <str>michael neale</str>
          <str>michael penn</str>
          <str>michael salgado</str>
          <str>michael w. smith</str>
        </arr>
      </lst>
      <lst name="b">
        <int name="numFound">10</int>
        <int name="startOffset">9</int>
        <int name="endOffset">10</int>
        <arr name="suggestion">
          <str>b in the mix - the remixes</str>
          <str>b2k</str>
          <str>backstreet boys</str>
          <str>backyard babies</str>
          <str>banda maguey</str>
          <str>barbra streisand</str>
          <str>barry manilow</str>
          <str>benny goodman</str>
          <str>beny more</str>
          <str>beyonce</str>
        </arr>
      </lst>
      <str name="collation">michael bully herbig b in the mix - the remixes</str>
    </lst>
  </lst>
</response>
RE: Solr Cloud Implementation with Apache Tomcat
Hi Markus, Can you please tell me the exact file name in the Tomcat folder? That is, where do I have to set the properties? I am using a Windows machine with Tomcat 6.

Thanks, Guru
Re: Replication lag after cache optimizations
Thanks for all the information.

> I'm not sure how exactly you are measuring/defining replication lag but if you mean lag in how long until the newly replicated documents are visible in searches

That is exactly what I wanted to say. I've attached the cache statistics. In case you are interested, a few more details on our use case: we currently have only a few hits on Solr (about 2 req/s), but we will quickly have more than 50 req/s. The requests are mainly facet requests. The index contains about 1.5M documents, and we plan for a size of 15M documents in one year.

Best regards, Damien

CACHE

name: queryResultCache
class: org.apache.solr.search.FastLRUCache
version: 1.0
description: Concurrent LRU Cache(maxSize=16384, initialSize=4096, minSize=14745, acceptableSize=15564, cleanupThread=false, autowarmCount=1024, regenerator=org.apache.solr.search.SolrIndexSearcher$3@3d762027)
stats: lookups: 4, hits: 4, hitratio: 1.00, inserts: 0, evictions: 0, size: 1024, warmupTime: 20, cumulative_lookups: 1003454, cumulative_hits: 894365, cumulative_hitratio: 0.89, cumulative_inserts: 120343, cumulative_evictions: 0

name: fieldCache
class: org.apache.solr.search.SolrFieldCacheMBean
insanity_count: 0

name: documentCache
class: org.apache.solr.search.FastLRUCache
version: 1.0
description: Concurrent LRU Cache(maxSize=16384, initialSize=4096, minSize=14745, acceptableSize=15564, cleanupThread=false, autowarmCount=1024, regenerator=null)
stats: lookups: 80, hits: 60, hitratio: 0.75, inserts: 20, evictions: 0, size: 20, warmupTime: 0, cumulative_lookups: 10844723, cumulative_hits: 8318341, cumulative_hitratio: 0.76, cumulative_inserts: 2526382, cumulative_evictions: 0

name: fieldValueCache
class: org.apache.solr.search.FastLRUCache
version: 1.0
description: Concurrent LRU Cache(maxSize=16384, initialSize=16384, minSize=14745, acceptableSize=15564, cleanupThread=false, autowarmCount=1024, regenerator=org.apache.solr.search.SolrIndexSearcher$1@38bdc9b3)
stats: lookups: 2, hits: 2, hitratio: 1.00, inserts: 0, evictions: 0, size: 1, warmupTime: 1369, cumulative_lookups: 485281, cumulative_hits: 485276, cumulative_hitratio: 0.99, cumulative_inserts: 2, cumulative_evictions: 0
item_tags: {field=tags,memSize=5804302,tindexSize=36148,time=1369,phase1=1357,nTerms=118241,bigTerms=0,termInstances=448772,uses=2}

name: filterCache
class: org.apache.solr.search.FastLRUCache
version: 1.0
description: Concurrent LRU Cache(maxSize=16384, initialSize=4096, minSize=14745, acceptableSize=15564, cleanupThread=false, autowarmCount=1024, regenerator=org.apache.solr.search.SolrIndexSearcher$2@340523df)
stats: lookups: 21, hits: 21, hitratio: 1.00, inserts: 0, evictions: 0, size: 1024, warmupTime: 1305, cumulative_lookups: 5956615, cumulative_hits: 5868136, cumulative_hitratio: 0.98, cumulative_inserts: 88479, cumulative_evictions: 0
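Given the warmupTime values in those statistics, one knob that directly affects how soon a newly opened searcher can serve queries after replication is autowarmCount. A hedged illustration only — the numbers are hypothetical, and lowering autowarm trades cache hit ratio for faster visibility:

<!-- solrconfig.xml sketch: lower autowarmCount to shorten searcher warm-up -->
<filterCache class="solr.FastLRUCache" size="16384" initialSize="4096" autowarmCount="128" />
<queryResultCache class="solr.FastLRUCache" size="16384" initialSize="4096" autowarmCount="128" />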
Re: AW: AW: auto completion search with solr using NGrams in SOLR
Hi, Thanks. I want to search with title and empname both. For example, when we use any search engine like Google or Yahoo, we do not specify any type (name or title or song...). Here (suggest/?q=michael b&df=title&defType=lucene) we are specifying a title-only search. I removed the said configuration in the solrconfig.xml file and got a result like below:

<lst name="spellcheck">
  <lst name="suggestions">
    <lst name="michael">
      <int name="numFound">10</int>
      <int name="startOffset">1</int>
      <int name="endOffset">8</int>
      <arr name="suggestion">
        <str>michael</str>
        <str>michael</str>
        <str>michael </str>
        <str>michael j</str>
        <str>michael ja</str>
        <str>michael jac</str>
        <str>michael jack</str>
        <str>michael jacks</str>
        <str>michael jackso</str>
        <str>michael jackson</str>
      </arr>
    </lst>
    <lst name="b">
      <int name="numFound">10</int>
      <int name="startOffset">9</int>
      <int name="endOffset">10</int>
      <arr name="suggestion">
        <str>b</str>
        <str>b</str>
        <str>ba</str>
        <str>bab</str>
        <str>bar</str>
        <str>barb</str>
        <str>be</str>
        <str>ben</str>
        <str>bi</str>
        <str>bl</str>
      </arr>
    </lst>
    <str name="collation">michael b</str>
  </lst>
</lst>

I sent my schema.xml and solrconfig.xml configurations. Please check.

Aniljayanti
Solr Cloud partitioning
Hi, At the moment, partitioning with SolrCloud is hash-based on the unique id. What I'd like to do is custom partitioning, e.g. based on date (shard_MMYY). I'm aware of https://issues.apache.org/jira/browse/SOLR-2592, but after a cursory look it seems that with the latest patch one might end up with multiple partitions in the same shard, perhaps all of them (e.g. if two or more partition hash values end up in the same range), which I'd not want. Has anyone else implemented custom shard partitioning for SolrCloud? I think the answer is to have the partition class itself pluggable (defaulting to the hash of the unique key, as now), but I'm not sure how to pass the pluggable partition class from solrConfig through to ClusterState (which is in solrj, not core)? Any advice? Cheers, Dan
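To illustrate the kind of pluggable partitioner being described, here is a purely hypothetical Java sketch of a date-based router that maps a document to a month-named shard. It is not an existing SolrCloud API; SOLR-2592 is where making the strategy pluggable is being discussed.

import java.text.SimpleDateFormat;
import java.util.Date;

// Hypothetical date-based partitioner: documents land on a shard named
// after the month of their date field, e.g. shard_0912 for September 2012.
class DatePartitioner {
    // "MMyy" matches the shard_MMYY naming from the question above.
    private static final String PATTERN = "MMyy";

    static String shardFor(Date docDate) {
        return "shard_" + new SimpleDateFormat(PATTERN).format(docDate);
    }
}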
Re: Setting up two cores in solr.xml for Solr 4.0
I don't think I changed my solrconfig.xml file from the default that was provided in the example folder for Solr 4.0.

On Tue, Sep 4, 2012 at 3:40 PM, Chris Hostetter hossman_luc...@fucit.org wrote:
<snip>
Re: AW: AW: auto completion search with solr using NGrams in SOLR
> i want to search with title and empname both.

I know; I gave that URL just to get the idea across. If you try suggest/?q=michael b&df=title&defType=lucene&fl=title you will see that what you are interested in will be in the results section, not in the <lst name="spellcheck"> section.

> or title or song...). Here (suggest/?q=michael b&df=title&defType=lucene) we are specifying the title type search.

q=title:michael b OR empname:michael b&fl=title,empname would do the trick.

> I removed said configurations in solrconfig.xml file, got result like below.

If you removed it, then there shouldn't be a spellcheck response. And you are still looking for results in the wrong place.
Still see document after delete with commit in solr 4.0
I've recently upgraded to solr 4.0 from solr 3.5 and I think my delete statement used to work, but now it doesn't seem to be deleting. I've been experimenting around, and it seems like this should be the URL for deleting the document with the uri of network_24. In a browser, I first go here:

http://localhost:8983/solr/MYCORE/update?stream.body=%3Cdelete%3E%3Cquery%3Euri%3Anetwork_24%3C%2Fquery%3E%3C%2Fdelete%3E&commit=true

I get this response:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">5</int>
  </lst>
</response>

And this is in the log file:

(timestamp) org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start commit{flags=0,_version_=0,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false}
(timestamp) org.apache.solr.search.SolrIndexSearcher init
INFO: Opening Searcher@646dd60e main
(timestamp) org.apache.solr.update.DirectUpdateHandler2 commit
INFO: end_commit_flush
(timestamp) org.apache.solr.core.QuerySenderListener newSearcher
INFO: QuerySenderListener sending requests to Searcher@646dd60e main{StandardDirectoryReader(segments_2v:447 _4p(4.0.0.1):C3244)}
(timestamp) org.apache.solr.core.QuerySenderListener newSearcher
INFO: QuerySenderListener done.
(timestamp) org.apache.solr.core.SolrCore registerSearcher
INFO: [MYCORE] Registered new searcher Searcher@646dd60e main{StandardDirectoryReader(segments_2v:447 _4p(4.0.0.1):C3244)}
(timestamp) org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: [MYCORE] webapp=/solr path=/update params={commit=true&stream.body=<delete><query>uri:network_24</query></delete>} {deleteByQuery=uri:network_24,commit=} 0 5

But if I then go to this URL:

http://localhost:8983/solr/MYCORE/select?q=uri%3Anetwork_24&wt=xml

I get this response:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1</int>
    <lst name="params">
      <str name="wt">xml</str>
      <str name="q">uri:network_24</str>
    </lst>
  </lst>
  <result name="response" numFound="1" start="0">
    <doc>
      <str name="name">network24</str>
      <str name="uri">network_24</str>
    </doc>
  </result>
</response>

Why didn't that document disappear?
Re: Still see document after delete with commit in solr 4.0
Check to make sure that you are not stumbling into SOLR-3432: deleteByQuery silently ignored if updateLog is enabled, but {{_version_}} field does not exist in schema. See: https://issues.apache.org/jira/browse/SOLR-3432

-- Jack Krupansky

-Original Message-
From: Paul
Sent: Wednesday, September 05, 2012 10:05 AM
To: solr-user
Subject: Still see document after delete with commit in solr 4.0
<snip>
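For reference, the fix Paul later confirms in this thread is declaring the _version_ field in schema.xml, as the 4.0 example schema does:

<field name="_version_" type="long" indexed="true" stored="true"/>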
RE: exception in highlighter when using phrase search
I think I found the cause for this. It is partially my fault, because I sent Solr a field with an empty value, but this is also a configuration problem. https://issues.apache.org/jira/browse/SOLR-3792

-Original Message-
From: Yoni Amir [mailto:yoni.a...@actimize.com]
Sent: Tuesday, September 04, 2012 3:53 PM
To: solr-user@lucene.apache.org
Subject: exception in highlighter when using phrase search

I got this problem with Solr 4 beta and the highlighting component. When I search for a phrase, such as foo bar, everything works OK. When I add highlighting, I get the exception below. You can see from the first log line that I am searching only one field (all_text), but what is not visible in the log is that I am highlighting on all fields in the document, with hl.requireFieldMatch=false and hl.fl=*.

INFO (SolrCore.java:1670) - [rcmCore] webapp=/solr path=/select params={fq={!edismax}module:Alerts+and+bu:abcd+Region1&qf=attachment&qf=all_text&version=2&rows=20&wt=javabin&start=0&q=foo bar} hits=103 status=500 QTime=38
ERROR (SolrException.java:104) - null:java.lang.NullPointerException
        at org.apache.lucene.analysis.util.CharacterUtils$Java5CharacterUtils.fill(CharacterUtils.java:191)
        at org.apache.lucene.analysis.util.CharTokenizer.incrementToken(CharTokenizer.java:152)
        at org.apache.lucene.analysis.miscellaneous.WordDelimiterFilter.incrementToken(WordDelimiterFilter.java:209)
        at org.apache.lucene.analysis.util.FilteringTokenFilter.incrementToken(FilteringTokenFilter.java:50)
        at org.apache.lucene.analysis.miscellaneous.RemoveDuplicatesTokenFilter.incrementToken(RemoveDuplicatesTokenFilter.java:54)
        at org.apache.lucene.analysis.core.LowerCaseFilter.incrementToken(LowerCaseFilter.java:54)
        at org.apache.solr.highlight.TokenOrderingFilter.incrementToken(DefaultSolrHighlighter.java:629)
        at org.apache.lucene.analysis.CachingTokenFilter.fillCache(CachingTokenFilter.java:78)
        at org.apache.lucene.analysis.CachingTokenFilter.incrementToken(CachingTokenFilter.java:50)
        at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:225)
        at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlightingByHighlighter(DefaultSolrHighlighter.java:510)
        at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:401)
        at org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:136)
        at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:206)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1656)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:454)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:275)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849)
        at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
        at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454)
        at java.lang.Thread.run(Thread.java:736)

Any idea?

Thanks, Yoni
Re: Solr 4.0 BETA Replication problems on Tomcat
Wow, that was quick. Thank you very much Mr. Siren. I shall remove the compression node in the solrconfig.xml and let you know how it went.

Thanks, Ravi Kiran Bhaskar

On Wed, Sep 5, 2012 at 2:54 AM, Sami Siren ssi...@gmail.com wrote:
> I opened SOLR-3789. As a workaround you can remove <str name="compression">internal</str> from the config and it should work. -- Sami Siren
<snip>
Website (crawler for) indexing
This may be a bit off topic: how do you index an existing website and control the data going into the index? We already have Java code to process the HTML (or XHTML) and turn it into a SolrJ document (removing tags and other things we do not want in the index). We use SolrJ for indexing. So I guess the question is essentially which Java crawler could be useful. We used to use wget on the command line in our publishing process, but we no longer want to do that. Thanks, Alexander
RE: Website (crawler for) indexing
Please take a look at the Apache Nutch project. http://nutch.apache.org/

-Original message-
From: Lochschmied, Alexander alexander.lochschm...@vishay.com
Sent: Wed 05-Sep-2012 17:09
To: solr-user@lucene.apache.org
Subject: Website (crawler for) indexing
<snip>
Re: Website (crawler for) indexing
Hello! You can implement your own crawler using Droids (http://incubator.apache.org/droids/) or use Apache Nutch (http://nutch.apache.org/), which is very easy to integrate with Solr and is a very powerful crawler.

-- Regards, Rafał Kuć
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

<snip>
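For the Nutch route, the crawl-then-index flow of that era looked roughly like the following. This is a sketch: the seed directory, depth, and topN values are placeholders, and the exact flags depend on the Nutch version.

# put seed URLs in a 'urls' directory, then crawl and push the results to Solr
bin/nutch crawl urls -solr http://localhost:8983/solr/ -depth 3 -topN 1000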
Re: Delete all documents in the index
Check to make sure that you are not stumbling into SOLR-3432: deleteByQuery silently ignored if updateLog is enabled, but {{_version_}} field does not exist in schema. See: https://issues.apache.org/jira/browse/SOLR-3432

This could happen if you kept the new 4.0 solrconfig.xml, but copied in your pre-4.0 schema.xml.

-- Jack Krupansky

-Original Message-
From: Rohit Harchandani
Sent: Wednesday, September 05, 2012 12:48 PM
To: solr-user@lucene.apache.org
Subject: Delete all documents in the index

Hi, I am having difficulty deleting documents from the index using curl. The URLs I tried were:

curl "http://localhost:9020/solr/core1/update/?stream.body=<delete><query>*:*</query></delete>&commit=true"

curl "http://localhost:9020/solr/core1/update/?commit=true" -H "Content-Type: text/xml" --data-binary '<delete><query>id:[* TO *]</query></delete>'

curl "http://localhost:9020/solr/core1/update/?commit=true" -H "Content-Type: text/xml" --data-binary '<delete><query>*:*</query></delete>'

I also tried:

curl "http://localhost:9020/solr/core1/update/?stream.body=%3Cdelete%3E%3Cquery%3E*:*%3C/query%3E%3C/delete%3E&commit=true"

as suggested on some forums. I get a response with status=0 in all cases, but none of the above seem to work. When I run

curl "http://localhost:9020/solr/core1/select?q=*:*&rows=0&wt=xml"

I still get a value for numFound. I am currently using the Solr 4.0 beta version. Thanks for your help in advance.

Regards, Rohit
Re: Delete all documents in the index
Rohit: If it's feasible, the easiest thing to do is to shut down your servlet container, rm -r * inside of the data directory, and then restart the container.

Michael Della Bitta
Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017
www.appinions.com
Where Influence Isn't a Game

On Wed, Sep 5, 2012 at 12:56 PM, Jack Krupansky j...@basetechnology.com wrote:
<snip>
Solr index on Amazon S3
Hi, We currently share a single Solr read index on an NFS mount accessed by various Solr instances from various machines, which gives us a highly performant cluster setup. We would like to migrate to Amazon or another cloud. Is there any way (compatibility-wise) to have the Solr index on the Amazon S3 cloud file system, so that we could access a single index from various Solr instances as we currently do? Thanks for helping!
EdgeNgramTokenFilter and positions
In the analysis page, the n-grams produced by EdgeNgramTokenFilter are at sequential positions. This seems wrong, because an n-gram is associated with a source token at a specific position. It also really messes up phrase matches. With the source text fleen, these positions and tokens are generated:

1, fl
2, fle
3, flee
4, fleen

Is this a known bug? Fixed? I'm running 3.3.

wunder
--
Walter Underwood
Search Guy
wun...@chegg.com
Re: Delete all documents in the index
Thanks everyone. Adding the _version_ field in the schema worked. Deleting the data directory works for me, but I was not sure why deleting using curl was not working.

On Wed, Sep 5, 2012 at 1:49 PM, Michael Della Bitta michael.della.bi...@appinions.com wrote:
<snip>
Re: Solr index on Amazon S3
Amazon doesn't have a prebuilt network filesystem that's mountable on multiple hosts out of the box. The closest thing would be setting up NFS among your hosts yourself, but at that point it'd probably be easier to set up Solr replication.

Michael Della Bitta
Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017
www.appinions.com
Where Influence Isn't a Game

On Wed, Sep 5, 2012 at 1:26 PM, Nicolas de Saint-Aubert dsanico...@gmail.com wrote:
<snip>
Re: Still see document after delete with commit in solr 4.0
That was exactly it. I added the following line to schema.xml and it now works:

<field name="_version_" type="long" indexed="true" stored="true"/>

On Wed, Sep 5, 2012 at 10:13 AM, Jack Krupansky j...@basetechnology.com wrote:
<snip>
Re: Solr index on Amazon S3
Nicolas - Can you elaborate on your use and configuration of Solr on NFS? What lock factory are you using? (You had to change from the default, right?) And how are you coordinating updates/commits to the other servers? Where does indexing occur, and then how are commits sent to the NFS-mounted servers? Thanks for sharing anything you can about this.

Erik

On Sep 5, 2012, at 13:26, Nicolas de Saint-Aubert wrote:
<snip>
Re: Still see document after delete with commit in solr 4.0
: That was exactly it. I added the following line to schema.xml and it now works.
:
: <field name="_version_" type="long" indexed="true" stored="true"/>

Just to be clear: how exactly did you "upgrade to solr 4.0 from solr 3.5" -- did you throw out your old solrconfig.xml and use the example solrconfig.xml from 4.0, but keep your 3.5 schema.xml? Do you in fact have an <updateLog ... /> in your solrconfig.xml?

(if so: then this is all known as part of SOLR-3432, and won't affect any users of 4.0-final -- but i want to be absolutely sure there isn't some other edge case of this bug)

-Hoss
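For reference, the updateLog declaration Hoss is asking about looks roughly like this in the 4.0 example solrconfig.xml (quoted from memory, so treat the details as an approximation):

<updateLog>
  <str name="dir">${solr.ulog.dir:}</str>
</updateLog>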
Re: Setting up two cores in solr.xml for Solr 4.0
: I don't think I changed my solrconfig.xml file from the default that
: was provided in the example folder for solr 4.0.

ok ... well the Solr 4.0-BETA example solrconfig.xml has this in it...

<dataDir>${solr.data.dir:}</dataDir>

So if you want to override the dataDir using a property like your second example, it should be something like...

<core name="MYCORE_test" instanceDir="MYCORE">
  <property name="solr.data.dir" value="MYCORE_test" />
</core>

...the property name used in the solrconfig.xml has to match the property name you use when declaring the core, or it won't get used and you'll get the default behavior. solr.data.dir isn't special here -- you could use any number of properties in your solrconfig.xml, and declare them when defining your individual cores.

that's very different from your other example...

<core name="MYCORE_test" instanceDir="MYCORE" dataDir="MYCORE_test" />

...which doesn't use properties at all, and says "this is what the dataDir should be", regardless of what the <dataDir>...</dataDir> looks like in the solrconfig.xml (at least: i'm pretty sure that's how it works)

-Hoss
Re: Still see document after delete with commit in solr 4.0
Actually, I didn't technically upgrade. I downloaded the new version, grabbed the example, and pasted in the fields from my schema into the new one. So the only two files I changed from the example are schema.xml and solr.xml. Then I reindexed everything from scratch so there was no old index involved, either.

On Wed, Sep 5, 2012 at 2:42 PM, Chris Hostetter hossman_luc...@fucit.org wrote:
<snip>
Re: EdgeNgramTokenFilter and positions
I don't see a Jira issue for it, but I do see the bad behavior in both Solr 3.6 and 4.0-BETA in the Solr admin analysis page. Interestingly, the screenshot for LUCENE-3642 does in fact show the (improperly) incremented positions for successive n-grams. See: https://issues.apache.org/jira/browse/LUCENE-3642 I'm surprised that nobody noticed the bogus positions back then. Technically, this is a Lucene issue.

-- Jack Krupansky

-Original Message-
From: Walter Underwood
Sent: Wednesday, September 05, 2012 1:51 PM
To: solr-user@lucene.apache.org
Subject: EdgeNgramTokenFilter and positions
<snip>
Re: Still see document after delete with commit in solr 4.0
And when you pasted your 3.5 fields into the 4.0 schema, did you delete the existing fields (including _version_) at the same time?

-- Jack Krupansky

-Original Message-
From: Paul
Sent: Wednesday, September 05, 2012 4:32 PM
To: solr-user@lucene.apache.org
Subject: Re: Still see document after delete with commit in solr 4.0
<snip>
Re: Still see document after delete with commit in solr 4.0
: Actually, I didn't technically upgrade. I downloaded the new
: version, grabbed the example, and pasted in the fields from my schema
: into the new one. So the only two files I changed from the example are
: schema.xml and solr.xml.

ok -- so with the fix for SOLR-3432, anyone who tries similar steps with 4.0-final will get a clear error on startup -- that was my main concern. thanks for clarifying.

-Hoss
Re: use of filter queries in Lucene/Solr Alpha40 and Beta4.0
: Subject: Re: use of filter queries in Lucene/Solr Alpha40 and Beta4.0

Günter, this is definitely strange. The good news is, i can reproduce your problem. The bad news is, i can reproduce your problem - and i have no idea what's causing it.

I've opened SOLR-3793 to try to get to the bottom of this, and included some basic steps to demonstrate the bug using the Solr 4.0-BETA example data, but i'm really not sure what the problem might be...

https://issues.apache.org/jira/browse/SOLR-3793

-Hoss
Re: Solr 4.0 BETA Replication problems on Tomcat
The replication finally worked after I removed the compression setting from the solrconfig.xml on the slave. Thanks for providing the workaround.

Ravi Kiran

On Wed, Sep 5, 2012 at 10:23 AM, Ravi Solr ravis...@gmail.com wrote:
<snip>
Duplicates in the suggester.
Not sure whether this is a duplicate question; I did try to browse the archive and did not find anything specific to what I was looking for. I see duplicates in the dictionary if I update a document concurrently. I am using Solr 3.6.1 with the following configuration for the suggester:

Solr Config:

<searchComponent name="suggest" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">text_auto_suggest</str>
  <lst name="spellchecker">
    <str name="name">suggest</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
    <str name="field">name_auto</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>

<requestHandler name="/suggest" class="org.apache.solr.handler.component.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">suggest</str>
    <str name="spellcheck.count">10</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>

Schema:

<fieldType name="text_auto_suggest" class="solr.TextField" omitNorms="true">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory" />
    <!-- <tokenizer class="solr.KeywordTokenizerFactory" /> -->
    <!-- <filter class="solr.LowerCaseFilterFactory" /> -->
    <filter class="solr.ClassicFilterFactory" />
    <!-- <filter class="solr.LengthFilterFactory" min="2" /> -->
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.TrimFilterFactory" />
    <filter class="solr.ClassicFilterFactory" />
    <!-- <filter class="solr.LengthFilterFactory" min="2" /> -->
  </analyzer>
</fieldType>

<field name="name_auto" type="text_auto_suggest" indexed="true" stored="true" multiValued="false" />

Example text I would be indexing for the suggester: foo_bar %|4%|1%|food

%| is used as a combiner.
Part 1: foo_bar, name of the entity.
Part 2: number of activities (application specific) on the entity.
Part 3: id of the document.
Part 4: food, category of the entity.

As I mentioned earlier, I saw duplicates in the spellcheck index documents when I updated them concurrently:

<arr name="suggestion">
  <str>foo_bar %|4%|1%|food</str>
  <str>foo_bar %|1%|1%|food</str>
  <str>foo_bar %|2%|1%|food</str>
  <str>foo_bar %|3%|1%|food</str>
</arr>

I do not see duplicates when I update the documents sequentially. I have a strong suspicion this is happening because of the way I am combining multiple fields using %|. Would appreciate it if somebody could suggest any suitable changes that would help me with this issue.

-- Thanks, Sharath
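For clarity on that payload format, here is a small hypothetical Java helper showing how such a combined suggestion string would be unpacked on the client side. It is illustrative only; nothing like it appears in the original post.

import java.util.regex.Pattern;

// Hypothetical client-side decoding of the "%|"-combined suggestion value.
class SuggestionPayload {
    final String name;       // Part 1: entity name, e.g. "foo_bar "
    final String activities; // Part 2: number of activities on the entity
    final String docId;      // Part 3: id of the document
    final String category;   // Part 4: entity category, e.g. "food"

    SuggestionPayload(String combined) {
        // Split on the literal combiner "%|" (quoted for regex safety),
        // keeping at most four parts to match the format above.
        String[] parts = combined.split(Pattern.quote("%|"), 4);
        name = parts[0];
        activities = parts[1];
        docId = parts[2];
        category = parts[3];
    }
}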
Re: Delete all documents in the index
Thanks for posting this! I ran into exactly this issue yesterday, and ended up deleting the files to get around it. Mark Sent from my mobile doohickey. On Sep 6, 2012 4:13 AM, Rohit Harchandani rhar...@gmail.com wrote: Thanks everyone. Adding the _version_ field in the schema worked. Deleting the data directory works for me, but I was not sure why deleting using curl was not working. On Wed, Sep 5, 2012 at 1:49 PM, Michael Della Bitta michael.della.bi...@appinions.com wrote: Rohit: If it's an option, the easiest thing to do is to shut down your servlet container, rm -r * inside the data directory, and then restart the container. Michael Della Bitta Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017 www.appinions.com Where Influence Isn't a Game On Wed, Sep 5, 2012 at 12:56 PM, Jack Krupansky j...@basetechnology.com wrote: Check to make sure that you are not stumbling into SOLR-3432: deleteByQuery silently ignored if updateLog is enabled, but {{_version_}} field does not exist in schema. See: https://issues.apache.org/jira/browse/SOLR-3432 This could happen if you kept the new 4.0 solrconfig.xml, but copied in your pre-4.0 schema.xml. -- Jack Krupansky -Original Message- From: Rohit Harchandani Sent: Wednesday, September 05, 2012 12:48 PM To: solr-user@lucene.apache.org Subject: Delete all documents in the index Hi, I am having difficulty deleting documents from the index using curl. The URLs I tried were:

  curl "http://localhost:9020/solr/core1/update/?stream.body=<delete><query>*:*</query></delete>&commit=true"
  curl "http://localhost:9020/solr/core1/update/?commit=true" -H "Content-Type: text/xml" --data-binary '<delete><query>id:[* TO *]</query></delete>'
  curl "http://localhost:9020/solr/core1/update/?commit=true" -H "Content-Type: text/xml" --data-binary '<delete><query>*:*</query></delete>'

I also tried:

  curl "http://localhost:9020/solr/core1/update/?stream.body=%3Cdelete%3E%3Cquery%3E*:*%3C/query%3E%3C/delete%3E&commit=true"

as suggested on some forums. I get a response with status=0 in all cases, but none of the above seems to work. When I run

  curl "http://localhost:9020/solr/core1/select?q=*:*&rows=0&wt=xml"

I still get a nonzero value for numFound. I am currently using the Solr 4.0 beta version. Thanks for your help in advance. Regards, Rohit
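For readers who hit SOLR-3432: the schema-side fix is a one-line field declaration. A minimal sketch, matching the _version_ field as it appears in the stock 4.0 example schema:

  <field name="_version_" type="long" indexed="true" stored="true"/>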
Solr not allowing persistent HTTP connections
Hi, Running the example Solr from the 3.6.1 distribution, I cannot make it keep persistent HTTP connections:

  $ ab -c 1 -n 100 -k 'http://localhost:8983/solr/select?q=*:*' | grep Keep-Alive
  Keep-Alive requests:    0

What should I change to fix that? P.S. We have the same issue in production with Jetty 7, but I thought it would be better to ask about the Solr example, since it is easier for anyone to reproduce the issue. -- Aleksey
Re: Solr not allowing persistent HTTP connections
Some extra information. If I use curl and force it to use HTTP 1.0, it is more visible that Solr doesn't allow persistent connections:

  $ curl -v -0 'http://localhost:8983/solr/select?q=*:*' -H 'Connection: Keep-Alive'
  * About to connect() to localhost port 8983 (#0)
  *   Trying ::1... connected
  > GET /solr/select?q=*:* HTTP/1.0
  > User-Agent: curl/7.22.0 (x86_64-pc-linux-gnu) libcurl/7.22.0 OpenSSL/1.0.1 zlib/1.2.3.4 libidn/1.23 librtmp/2.3
  > Host: localhost:8983
  > Accept: */*
  > Connection: Keep-Alive
  >
  < HTTP/1.1 200 OK
  < Content-Type: application/xml; charset=UTF-8
  * no chunk, no close, no size. Assume close to signal end
  <?xml version="1.0" encoding="UTF-8"?>
  <response>
  ...removed the rest of the response body...

-- Aleksey

On 12-09-05 03:54 PM, Aleksey Vorona wrote: Running the example Solr from the 3.6.1 distribution, I cannot make it keep persistent HTTP connections. What should I change to fix that?
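An editorial aside on reproducing this: giving curl two URLs in a single invocation is another easy way to see whether the server honors keep-alive, because verbose mode reports when an existing connection is re-used (host and port below match the example above, but any repeated URL works):

  $ curl -sv -o /dev/null -o /dev/null \
      'http://localhost:8983/solr/select?q=*:*' \
      'http://localhost:8983/solr/select?q=*:*' 2>&1 | grep -i 'connection'

If keep-alive is working, curl should report re-using the existing connection for the second request; if not, it reports closing the connection and opening a new one.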
Re: Problem with verifying signature?
: I download solr 4.0 beta and the .asc file. I use gpg4win and type this in : the command line: : : gpg --verify file.zip file.asc : : I get a message like this: : : *gpg: Can't check signature: No public key* you can verify the asc sig file using the public KEYS file hosted on the main apache download site (do not trust asc or KEYS from a download mirror, that defeats the point) https://www.apache.org/dist/lucene/solr/KEYS -Hoss
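Spelled out as commands (the artifact names below are illustrative for the 4.0 BETA download; substitute whatever files you actually fetched):

  $ wget https://www.apache.org/dist/lucene/solr/KEYS
  $ gpg --import KEYS
  $ gpg --verify apache-solr-4.0.0-BETA.zip.asc apache-solr-4.0.0-BETA.zip

Note that gpg --verify takes the detached signature first and the signed file second.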
deletedPkQuery not working in Solr 3.3
I have a data-config.xml with two entities, like:

  <entity name="full" PK="ID" ... > ... </entity>

and

  <entity name="delta_build" PK="ID" ... > ... </entity>

The entity delta_build is for delta import; the query is ?command=full-import&entity=delta_build&clean=false. I want to use deletedPkQuery to delete from the index, so I added these attributes to entity delta_build:

  deltaQuery="select -1 as ID from dual"
  deltaImportQuery="select * from product where a.id='${dataimporter.delta.ID}'"
  deletedPKQuery="select product_id as ID from modified_product where gmt_create &gt; to_date('${dataimporter.last_index_time}','-mm-dd hh24:mi:ss') and modification = 'deleted'"

deltaQuery and deltaImportQuery are there simply to keep the delta import from picking up any records, because delta import has already been implemented via full import; I just want to use delta for deleting from the index. But when I hit ?command=delta-import, deltaQuery and deltaImportQuery can be found in the log, but deletedPKQuery is missing. Is there anything wrong in the config file? -- from Jun Wang
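An editorial note, not from the thread: DataImportHandler attribute names are case-sensitive, and the DIH documentation spells this one deletedPkQuery (lowercase "k"), which differs from the casing above and could explain why it never shows up in the log. A sketch of the entity with that spelling (queries shortened from Jun's config):

  <entity name="delta_build" pk="ID"
          deltaQuery="select -1 as ID from dual"
          deltaImportQuery="select * from product where a.id='${dataimporter.delta.ID}'"
          deletedPkQuery="select product_id as ID from modified_product where modification = 'deleted'">
    ...
  </entity>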
Re: Problem with verifying signature?
Thank you Hoss. I imported the KEYS file using gpg --import KEYS.txt. Then I did the --verify again. This time I get output like this:

  gpg: Signature made 08/06/12 19:52:21 Pacific Daylight Time using RSA key ID 322D7ECA
  gpg: Good signature from Robert Muir (Code Signing Key) rm...@apache.org
  gpg: WARNING: This key is not certified with a trusted signature!
  gpg: There is no indication that the signature belongs to the owner.
  Primary key fingerprint: 6661 9BA3 C030 DD55 3625 1303 817A E1DD 322D 7ECA

Is this acceptable? Thanks

On Wed, Sep 5, 2012 at 5:38 PM, Chris Hostetter hossman_luc...@fucit.org wrote: you can verify the asc sig file using the public KEYS file hosted on the main apache download site (do not trust asc or KEYS from a download mirror, that defeats the point) https://www.apache.org/dist/lucene/solr/KEYS -Hoss
Re: Searching of Chinese characters and English
Any thoughts? It is weird: I can see the words being cut correctly in Field Analysis. I checked almost every website, and they recommend either CJKAnalyzer, IKAnalyzer or SmartChineseAnalyzer. But if I can see the words being cut, then the problem should not be the choice of analyzer. Am I correct? Anyone have an idea or hints? Thanks so much, Wayne

On 4/9/2012 13:03, waynelam wrote: Hi all, I tried to modify the schema.xml and solrconfig.xml that come with the Drupal search_api_solr module so that they are suitable for a CJK environment. I can see Chinese words cut up into two-character tokens in Field Analysis. If I use the following query

  my_ip_address:8080/solr/select?indent=on&version=2.2&fq=t_title:Find&start=0&rows=10&fl=t_title

I can see it returning results. The problem is when I change the search keywords for one of my fields (e.g. t_title) to Chinese characters. It always shows

  <result name="response" numFound="0" start="0"/>

in the results. It is strange because if a title contains both Chinese and English (e.g. testing ??), when I search just the English part (e.g. fq=t_title:testing), I can find the result perfectly. It just happens to be a problem when searching Chinese characters. Much appreciated if you guys can show me which part I did wrong. Thanks, Wayne

My Settings:
  Java : 1.6.0_24
  Solr : version 3.6.1
  Tomcat: version 6.0.35

My schema.xml (the text field type and the t_title field below are the parts I changed from the default):

  <fieldType name="text" class="solr.TextField" indexed="true" stored="true" multiValued="true">
    <analyzer type="index" class="org.apache.lucene.analysis.cjk.CJKAnalyzer">
      <tokenizer class="org.apache.lucene.analysis.cjk.CJKTokenizer"/>
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      <filter class="schema.UnicodeNormalizationFilterFactory" version="icu4j" composed="false" remove_diacritics="true" remove_modifiers="true" fold="true"/>
      <filter class="solr.ISOLatin1AccentFilterFactory"/>
    </analyzer>
    <analyzer type="query" class="org.apache.lucene.analysis.cjk.CJKAnalyzer">
      <tokenizer class="org.apache.lucene.analysis.cjk.CJKTokenizer"/>
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      <filter class="schema.UnicodeNormalizationFilterFactory" version="icu4j" composed="false" remove_diacritics="true" remove_modifiers="true" fold="true"/>
      <filter class="solr.ISOLatin1AccentFilterFactory"/>
    </analyzer>
  </fieldType>

  <fieldType name="sortString" class="solr.TextField" indexed="true" stored="true" sortMissingLast="true" omitNorms="true">
    <analyzer>
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.TrimFilterFactory"/>
    </analyzer>
  </fieldType>

  <fieldType name="rand" class="solr.RandomSortField" indexed="true"/>
  <fieldtype name="ignored" stored="true" indexed="false" class="solr.StrField"/>
  </types>

  <fields>
  <field name="id" type="string" indexed="true" stored="true" required="true"/>
  <field name="item_id" type="string" indexed="true" stored="true" required="true"/>
  <field name="index_id" type="string" indexed="true" stored="true" required="true"/>
  <copyField source="item_id" dest="ss_search_api_id"/>
  <field name="spell" type="textSpell" indexed="true" stored="true" multiValued="true"/>
  <copyField source="t_*" dest="spell"/>
  <field name="t_title" type="text" indexed="true" stored="true" autoGeneratePhraseQueries="false"/>
  <dynamicField name="t_*" type="text" termVectors="true"/>
  <dynamicField name="ss_*" type="sortString" multiValued="false" termVectors="true"/>
  <dynamicField name="sm_*" type="sortString" multiValued="true" termVectors="true"/>
  <dynamicField name="is_*" type="tlong" multiValued="false" termVectors="true"/>
  <dynamicField name="im_*" type="long" multiValued="true" termVectors="true"/>
  <dynamicField name="fs_*" type="tdouble" multiValued="false" termVectors="true"/>
  <dynamicField name="fm_*" type="tdouble" multiValued="true" termVectors="true"/>
  <dynamicField name="ds_*" type="tdate" multiValued="false" termVectors="true"/>
  <dynamicField name="dm_*" type="tdate" multiValued="true" termVectors="true"/>
  <dynamicField name="bs_*" type="boolean" multiValued="false" termVectors="true"/>
  <dynamicField name="bm_*" type="boolean" multiValued="true" termVectors="true"/>
  <dynamicField name="f_ss_*" type="string" multiValued="false"
Re: Document Processing
There is another way to do this: crawl the mobile site! The Fennec browser from Mozilla runs on Android. I often use it to get pagecrap off my screen.

- Original Message -
| From: Lance Norskog goks...@gmail.com
| To: solr-user@lucene.apache.org
| Sent: Wednesday, August 29, 2012 7:37:37 PM
| Subject: Re: Document Processing
|
| I've seen the JSoup HTML parser library used for this. It worked
| really well. The Boilerpipe library may be what you want. Its
| schwerpunkt (*) is to separate boilerplate from wanted text in an
| HTML page. I don't know what fine-grained control it has.
|
| * raison d'être. There is no English word for this concept.
|
| On Tue, Dec 6, 2011 at 1:39 PM, Tommaso Teofili
| tommaso.teof...@gmail.com wrote:
| Hello Michael,
|
| I can help you with using the UIMA UpdateRequestProcessor [1]; the
| current implementation uses in-memory execution of UIMA pipelines,
| but since I was planning to add support for higher scalability
| (with UIMA-AS [2]), that may help you as well.
|
| Tommaso
|
| [1] : http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/contrib/uima/src/java/org/apache/solr/uima/processor/UIMAUpdateRequestProcessor.java
| [2] : http://uima.apache.org/doc-uimaas-what.html
|
| 2011/12/5 Michael Kelleher mj.kelle...@gmail.com
|
| Hello Erik,
|
| I will take a look at both
| org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessor
| and
| org.apache.solr.update.processor.TikaLanguageIdentifierUpdateProcessor
| and figure out what I need to extend to handle processing in the
| way I am looking for. I am assuming that component configuration is
| handled in a standard way, such that I can configure my new
| UpdateProcessor the same way I would configure any other
| UpdateProcessor component?
|
| Thanks for the suggestion.
|
| One more question: given that I am probably going to convert the
| HTML to XML so I can use XPath expressions to extract my content,
| do you think that this kind of processing will overload Solr? This
| Solr instance will be used solely for indexing, and will only ever
| have a single ManifoldCF crawling job feeding it documents at one
| time.
|
| --mike
|
| --
| Lance Norskog
| goks...@gmail.com
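Since JSoup comes up here without an example, a minimal editorial sketch (not from the thread) of pulling title and body text out of a page before handing it to Solr; the URL and the CSS selector are placeholder assumptions:

  import org.jsoup.Jsoup;
  import org.jsoup.nodes.Document;

  public class ExtractText {
      public static void main(String[] args) throws Exception {
          // Fetch and parse the page (URL is a placeholder)
          Document doc = Jsoup.connect("http://example.com/page.html").get();
          // Text of the <title> element
          String title = doc.title();
          // Visible text under an assumed content div; the selector is a guess
          String body = doc.select("div.content").text();
          System.out.println(title);
          System.out.println(body);
      }
  }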
Re: Searching of Chinese characters and English
I believe that you should remove the Analyzer class name from the field type; I think it overrides the stack of tokenizer/token filters. The other fieldType declarations do not have both an Analyzer class and tokenizers.

  <analyzer type="index" class="org.apache.lucene.analysis.cjk.CJKAnalyzer">

should be:

  <analyzer type="index">

This may not help with your searching problem.

- Original Message -
| From: waynelam wayne...@ln.edu.hk
| To: solr-user@lucene.apache.org
| Sent: Wednesday, September 5, 2012 8:07:36 PM
| Subject: Re: Searching of Chinese characters and English
Re: Searching of Chinese characters and English
Thank you Lance. I just found the problem; in case somebody else comes across this: it turned out that Tomcat does not accept UTF-8 in URLs by default.

  http://wiki.apache.org/solr/SolrTomcat#URI_Charset_Config

I have no idea why that is the case, but after I followed the instructions in the document above, the problem was solved!! Thanks so much for your help! Wayne

On 6/9/2012 11:19, Lance Norskog wrote: I believe that you should remove the Analyzer class name from the field type.
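For reference, the setting the linked wiki page describes is the URIEncoding attribute on Tomcat's HTTP connector in conf/server.xml. A sketch (the port matches Wayne's setup; the other attributes are stock Tomcat defaults):

  <Connector port="8080" protocol="HTTP/1.1"
             connectionTimeout="20000"
             redirectPort="8443"
             URIEncoding="UTF-8"/>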