Re: alternativeTermCount and WordBreakSolrSpellChecker combination not working
Hi O. Klein, How did you sort the suggestions by frequency? I think you used <str name="comparatorClass">freq</str> to sort suggestions by frequency. Are you using sharding/multiple servers in Solr? On a single node comparatorClass is working, but on multiple servers it is not. Please assist me with how to sort by frequency on multiple servers/shards.

On Wed, Feb 11, 2015 at 12:56 AM, O. Klein kl...@octoweb.nl wrote: I did some testing and the order of dictionaries doesn't seem to have an effect. They are sorted by frequency. So if mm was applied, "holy wood" would have a lower frequency and solve this problem.

"suggestions": ["holywood", {
    "numFound": 4,
    "startOffset": 0,
    "endOffset": 8,
    "origFreq": 4,
    "suggestion": [
      {"word": "holy wood", "freq": 71828},
      {"word": "hollywood", "freq": 2669},
      {"word": "holyrood", "freq": 14},
      {"word": "homewood", "freq": 737}]},
  "correctlySpelled", false,
  "collation", "(holy wood)",
  "collation", "hollywood"]}}

-- View this message in context: http://lucene.472066.n3.nabble.com/alternativeTermCount-and-WordBreakSolrSpellChecker-combination-not-working-tp4185352p4185461.html Sent from the Solr - User mailing list archive at Nabble.com.
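For reference, the frequency comparator mentioned above is configured per spellchecker in solrconfig.xml. A minimal sketch (the spellchecker and field names here are illustrative, not taken from the thread):

```xml
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">spell</str>  <!-- illustrative field name -->
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <!-- rank suggestions by document frequency instead of by score -->
    <str name="comparatorClass">freq</str>
  </lst>
</searchComponent>
```

Note that, as reported in this thread, this ordering works on a single node but is not honored the same way for distributed (sharded) requests.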
RE: variation on boosting recent documents gives exception
Hello Michael, You can always change the type of your sortyear field to an int, or create an int version of it and use copyField to populate it. Using NOW/YEAR will round the current date to the start of the year; you can read more about this in the Javadoc: http://lucene.apache.org/solr/4_10_3/solr-core/org/apache/solr/util/DateMathParser.html You can test it using the example collection: http://localhost:8983/solr/collection1/select?q=*:*&boost=recip(ms(NOW/YEAR,manufacturedate_dt),3.16e-11,1,1)&fl=id,manufacturedate_dt,score,[explain]&defType=edismax and checking the explain field for the numeric value given to NOW/YEAR vs NOW/HOUR, etc. Gonzalo

-----Original Message-----
From: Michael Lackhoff [mailto:mich...@lackhoff.de]
Sent: Thursday, February 12, 2015 8:57 AM
To: solr-user@lucene.apache.org
Subject: variation on boosting recent documents gives exception

Since my field to measure recency is not a date field but a string field (with only year numbers in it), I tried a variation on the suggested boost function for recent documents: recip(sub(2015,min(sortyear,2015)),1,10,10) But this gives an exception when used in a boost or bf parameter. I guess the reason is that all the mathematics doesn't work with a string field even if it only contains numbers. Am I right with this guess? And if so, is there a function I can use to change the type to something numeric? Or are there other problems with my function? Another related question: as you can see, the current year (2015) is hard coded. Is there an easy way to get the current year within the function? Messing around with NOW looks very complicated. -Michael
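For reference, Solr's recip(x,m,a,b) function query computes a/(m*x+b), so the behavior of Michael's proposed boost can be checked with plain arithmetic. A small sketch in Python (the helper names are illustrative, not Solr code):

```python
# Solr's recip(x, m, a, b) function query computes a / (m*x + b).
def recip(x, m, a, b):
    return a / (m * x + b)

# Michael's proposed boost: recip(sub(2015, min(sortyear, 2015)), 1, 10, 10)
# i.e. boost decays with the age of the document in years.
def year_boost(sortyear, current_year=2015):
    age = current_year - min(sortyear, current_year)
    return recip(age, 1, 10, 10)

print(year_boost(2015))  # 1.0  -> a document from the current year gets full boost
print(year_boost(2005))  # 0.5  -> a 10-year-old document gets half the boost
print(year_boost(1915))  # ~0.09 -> a 100-year-old document is boosted only slightly
```

This also illustrates why the thread's exception occurs: the arithmetic only works if sortyear is indexed as a numeric type, not as a string.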
Re: alternativeTermCount and WordBreakSolrSpellChecker combination not working
I am using the default on a single node, which is frequency. On the Wiki it says: "In case of a distributed request to the SpellCheckComponent, the shards are requested for at least five suggestions even if the spellcheck.count parameter value is less than five. Once the suggestions are collected, they are ranked by the configured distance measure (Levenshtein Distance by default) and then by aggregate frequency." So for distributed this is different. Maybe James knows how to get the behavior you are looking for. -- View this message in context: http://lucene.472066.n3.nabble.com/alternativeTermCount-and-WordBreakSolrSpellChecker-combination-not-working-tp4185352p4186214.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Stopwords in shingles suggester
I found the issue in Jira https://issues.apache.org/jira/browse/SOLR-6468 O. Klein wrote With more and more people starting to use the Suggester it seems that enablePositionIncrements for StopFilterFactory is still needed. Not sure why it is being removed from Solr5, but is there a way to keep the functionality beyond lucene 4.3 ? Or can this feature be reinstated? -- View this message in context: http://lucene.472066.n3.nabble.com/Stopwords-in-shingles-suggester-tp4166057p4186219.html Sent from the Solr - User mailing list archive at Nabble.com.
Solr - Mahout
Sir, I need to know whether Solr has a built-in recommendation (collaborative filtering) feature, or whether it is possible to add a recommender without using Mahout. I kindly request a fast reply. Yours, Mohamed Sahad K P
Re: 43sec commit duration - blocked by index merge events?
Thanks Otis, can you confirm that a commit call will wait for merges to complete before returning?

On Thu, Feb 12, 2015 at 8:46 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: If you are using Solr and SPM for Solr, you can check a report that shows the # of files in an index and the report that shows you the max docs-num docs delta. If you see the # of files drop during a commit, that's a merge. If you see a big delta change, that's probably a merge, too. You could also jstack or kill -3 the JVM and see where it's spending its time to give you some ideas what's going on inside. HTH. Otis -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr Elasticsearch Support * http://sematext.com/

On Sun, Feb 8, 2015 at 6:48 AM, Gili Nachum gilinac...@gmail.com wrote: Hello, During a load test I noticed a commit that took 43 seconds to complete (a hard commit, measured at the client). Is this to be expected? What's causing it? I have a pair of machines hosting a 128M docs collection (8 shards, replication factor=2). Could it be merges? In Lucene merges happen asynchronously from commit statements, but reading Solr's doc for the Update Handler https://cwiki.apache.org/confluence/display/solr/UpdateHandlers+in+SolrConfig it sounds like hard commits do wait for merges to occur: *The tradeoff is that a soft commit gives you faster visibility because it's not waiting for background merges to finish.* Thanks.
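Related to tuning commit cost, the update handler section of solrconfig.xml controls automatic hard and soft commits; a minimal sketch (the interval values below are illustrative, not from the thread):

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>60000</maxTime>           <!-- illustrative: hard commit every 60s -->
    <openSearcher>false</openSearcher> <!-- flush segments without opening a new searcher -->
  </autoCommit>
  <autoSoftCommit>
    <maxTime>5000</maxTime>            <!-- illustrative: soft commit for search visibility -->
  </autoSoftCommit>
</updateHandler>
```

Letting the server issue hard commits automatically (with openSearcher=false) and relying on soft commits for visibility can keep explicit client-side commit calls from blocking on expensive searcher-opening work.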
Re: Multi words query
A couple more things would help debug this. First, could you grab the specific Solr log entry when this query is sent? Also, have you changed the default schema at all? If you're querying string fields you have to exactly match what's indexed there, versus text which gets tokenized.

k/r, Scott

On Thu, Feb 12, 2015 at 4:22 AM, melb melaggo...@gmail.com wrote: I am using the ruby gem rsolr and simply querying the collection with this query:

response = solr.get 'select', :params => {
  :q => query,
  :fl => 'id,title,description,body',
  :rows => 10
}
response["response"]["docs"].each { |doc| puts doc["id"] }

I created a text field to copy all the fields to and the query handler requests this field

rgds,

-- View this message in context: http://lucene.472066.n3.nabble.com/Multi-words-query-tp4185625p4185922.html Sent from the Solr - User mailing list archive at Nabble.com. -- Scott Stults | Founder Solutions Architect | OpenSource Connections, LLC | 434.409.2780 http://www.opensourceconnections.com
Re: bulk indexing with optimistick lock
This isn't a Solr-specific answer, but the easiest approach might be to just collect the document IDs you're about to add, query for them, and then filter out the ones Solr already has (this'll give you a nice list for later reporting). You'll need to keep your batch sizes below maxBooleanClauses in solrconfig.xml. Overall, this might be simpler to maintain and less prone to bugs. k/r, Scott On Wed, Feb 11, 2015 at 4:59 AM, Sankalp Gupta sankalp.gu...@snapdeal.com wrote: Hi All, My server side we are trying to add multiple documents in a list and then ask solr to add them in solr (using solrj client) and then after its finished calling the commit. Now we also want to control concurrency and for that we wanted to use solr's optimistic lock/versioning feature. That is good but *in case of bulk docs add, the solr doesn't perform add docs as expected.* It fails as soon as it finds any doc with optimistic lock failure and return response telling only the first failed doc (adding all docs before that and no docs are added after that). *We require solr to add all docs for which no versioning problem is there and return list of all failed docs. * Please can anyone suggest a way to do this? Regards Sankalp Gupta -- Scott Stults | Founder Solutions Architect | OpenSource Connections, LLC | 434.409.2780 http://www.opensourceconnections.com
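The "query first, then filter" batching approach Scott describes can be sketched as follows. This is an assumed workflow, not SolrJ code: `fetch_existing_ids` is a hypothetical stand-in for a real Solr ID query such as q=id:(id1 OR id2 ...) with rows set to the batch size, keeping batches below maxBooleanClauses.

```python
# Sketch of the pre-filter approach: before adding a batch, ask the index which
# IDs it already has, then split the batch into new docs and already-indexed docs.
def split_batch(batch, fetch_existing_ids):
    """Return (docs_to_add, already_indexed) for one batch of documents."""
    ids = [doc["id"] for doc in batch]
    existing = set(fetch_existing_ids(ids))  # IDs the index already contains
    to_add = [doc for doc in batch if doc["id"] not in existing]
    skipped = [doc for doc in batch if doc["id"] in existing]
    return to_add, skipped

# Example with a stubbed index that already contains document "a":
to_add, skipped = split_batch(
    [{"id": "a"}, {"id": "b"}],
    lambda ids: [i for i in ids if i == "a"],
)
print([d["id"] for d in to_add])   # ['b']
print([d["id"] for d in skipped])  # ['a']
```

The `skipped` list gives you the per-document failure report that Solr's optimistic-locking bulk add does not, at the cost of one extra query per batch.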
Re: variation on boosting recent documents gives exception
Am 13.02.2015 um 11:18 schrieb Gonzalo Rodriguez: You can always change the type of your sortyear field to an int, or create an int version of it and use copyField to populate it. But that would require me to reindex. Would be nice to have some type conversion available within a function query. And using NOW/YEAR will round the current date to the start of the year, you can read more about this in the Javadoc: http://lucene.apache.org/solr/4_10_3/solr-core/org/apache/solr/util/DateMathParser.html You can test it using the example collection: http://localhost:8983/solr/collection1/select?q=*:*&boost=recip(ms(NOW/YEAR,manufacturedate_dt),3.16e-11,1,1)&fl=id,manufacturedate_dt,score,[explain]&defType=edismax and checking the explain field for the numeric value given to NOW/YEAR vs NOW/HOUR, etc. The definition of *_dt fields in the example schema is 'date', but my field is text or (t)int if I have to reindex. To compare against this int field I need another (comparable) int. ms(NOW/YEAR,manufacturedate_dt) is an int, but a huge one, which is very difficult to bring into a sensible relationship to e.g. '2015'. Your suggestion would only work if I change my year to a date like 2015-01-01T00:00:00Z, which is not a sensible format for a publication year and not even easily creatable by copyField. What I need is a real year number, not a date truncated to the year, which is only accessible as the number of milliseconds since the epoch of Jan 1st, 00:00:00h, which is not very handy. -Michael
Dovecot FTS Solr Error [urgent/serious]
Hi Guys, Serious help requested with Dovecot and Apache Solr. I'd appreciate it if someone could look at the Solr log output and tell me what's going wrong.

Problem: doveadm keeps reporting an error as shown below:
root@mail:/var/log# doveadm index -u u...@domain.net inbox
doveadm(t...@sicl.net): Error: fts_solr: Indexing failed: Server Error

The log suggests as follows: http://pastebin.com/KSvignc9

My system settings:
solr-spec: 4.10.2
solr-impl: 4.10.2 1634293 - mike - 2014-10-26 05:56:21
lucene-spec: 4.10.2
lucene-impl: 4.10.2 1634293 - mike - 2014-10-26 05:51:56
Physical Memory: 98.8%
Swap Space: 0.0%
File Descriptor Count: 2.3%
JVM-Memory: 5.4%

I log in to my server as follows:
kevin-MBP:~ kevin$ ssh -t -L 8983:localhost:8983 ad...@server.net

I start my server with the following command:
:/opt/solr# java -jar start.jar

The startup log is as follows: http://pastebin.com/EYVJ06rL

It keeps indicating u...@domain.net (this was the user name that was part of a command passed from doveadm as follows): doveadm index -u u...@domain.net inbox

Could someone tell me what is happening in the log (can Apache Solr read the request from Dovecot correctly, or is this some schema problem, or what)? Thanks Kevin
Re: Dovecot FTS Solr Error [urgent/serious]
Look at the admin UI screen, the overview screen has this in the lower-right corner. To add more memory, you can start it like this: java -Xmx4G -Xms4G -jar start.jar Best, Erick On Fri, Feb 13, 2015 at 10:24 AM, Kevin Laurie superinterstel...@gmail.com wrote: Hi, how do i check the heap size for Solr java? i am not very well versed with java, only using it for my mail server so appreciate if you could help thanks Kevin On Saturday, February 14, 2015, Shalin Shekhar Mangar shalinman...@gmail.com wrote: According to the logs, the Solr process has run out of memory and therefore it won't accept any more writes. What is the heap size for the Solr java process? The u...@domain.net in the logs is a red herring. Those just seem to be part of the document's id. On Fri, Feb 13, 2015 at 11:06 PM, Kevin Laurie superinterstel...@gmail.com wrote: Hi Guys, Serious help requested with dovecot and apache solr. Appreciate if someone could see the solr log outputs and tell me whats going wrong.
Problem: Dovecot adm keeps reporting error as shown below:- root@mail:/var/log# doveadm index -u u...@domain.net inbox doveadm(t...@sicl.net): Error: fts_solr: Indexing failed: Server Error The log suggests as follows:- http://pastebin.com/KSvignc9 My system settings: solr-spec4.10.2 solr-impl4.10.2 1634293 - mike - 2014-10-26 05:56:21 lucene-spec4.10.2 lucene-impl4.10.2 1634293 - mike - 2014-10-26 05:51:56 Physical Memory 98.8% Swap Space 0.0% File Descriptor Count 2.3% JVM-Memory 5.4% I login to my server as follows:- kevin-MBP:~ kevin$ ssh -t -L 8983:localhost:8983 ad...@server.net I start my server with following command: :/opt/solr# java -jar start.jar Startup log is showing as follows:- http://pastebin.com/EYVJ06rL It keeps indicating u...@domain.net (This was the user name that was part of a command passed from doveadm as follows):- doveadm index -u u...@domain.net inbox Could someone tell me what is happening in the log(can the apache solr read the request from dovecot correctly or is this some schema problem or what?) ? Thanks Kevin -- Regards, Shalin Shekhar Mangar.
RE: Collations are not working fine.
Nitin, Can you post the full spellcheck response when you query: q=gram_ci:"gone wthh thes wint"&wt=json&indent=true&shards.qt=/spell

James Dyer
Ingram Content Group

-----Original Message-----
From: Nitin Solanki [mailto:nitinml...@gmail.com]
Sent: Friday, February 13, 2015 1:05 AM
To: solr-user@lucene.apache.org
Subject: Re: Collations are not working fine.

Hi James Dyer, I did the same as you told me. Used WordBreakSolrSpellChecker instead of shingles. But still collations are not coming or working. For instance, I tried to get the collation of "gone with the wind" by searching "gone wthh thes wint" on field=gram_ci but didn't succeed. Even, I am getting the suggestions of wtth as *with*, thes as *the*, wint as *wind*. Also I have documents which contain "gone with the wind" 167 times. I don't know whether I am missing something or not. Please check my solr configuration below:

*URL:* localhost:8983/solr/wikingram/spell?q=gram_ci:"gone wthh thes wint"&wt=json&indent=true&shards.qt=/spell

*solrconfig.xml:*

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">textSpellCi</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">gram_ci</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <str name="distanceMeasure">internal</str>
    <float name="accuracy">0.5</float>
    <int name="maxEdits">2</int>
    <int name="minPrefix">0</int>
    <int name="maxInspections">5</int>
    <int name="minQueryLength">2</int>
    <float name="maxQueryFrequency">0.9</float>
    <str name="comparatorClass">freq</str>
  </lst>
  <lst name="spellchecker">
    <str name="name">wordbreak</str>
    <str name="classname">solr.WordBreakSolrSpellChecker</str>
    <str name="field">gram</str>
    <str name="combineWords">true</str>
    <str name="breakWords">true</str>
    <int name="maxChanges">5</int>
  </lst>
</searchComponent>

<requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="df">gram_ci</str>
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck">on</str>
    <str name="spellcheck.extendedResults">true</str>
    <str name="spellcheck.count">25</str>
    <str name="spellcheck.onlyMorePopular">true</str>
    <str name="spellcheck.maxResultsForSuggest">1</str>
    <str name="spellcheck.alternativeTermCount">25</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.maxCollations">50</str>
    <str name="spellcheck.maxCollationTries">50</str>
    <str name="spellcheck.collateExtendedResults">true</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>

*Schema.xml:*

<field name="gram_ci" type="textSpellCi" indexed="true" stored="true" multiValued="false"/>

<fieldType name="textSpellCi" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
Re: Dovecot FTS Solr Error [urgent/serious]
Hi, how do i check the heap size for Solr java? i am not very well versed with java, only using it for my mail server so appreciate if you could help thanks Kevin On Saturday, February 14, 2015, Shalin Shekhar Mangar shalinman...@gmail.com wrote: According to the logs, the Solr process has run out of memory and therefore it won't accept any more writes. What is the heap size for the Solr java process? The u...@domain.net in the logs is a red herring. Those just seem to be part of the document's id. On Fri, Feb 13, 2015 at 11:06 PM, Kevin Laurie superinterstel...@gmail.com wrote: Hi Guys, Serious help requested with dovecot and apache solr. Appreciate if someone could see the solr log outputs and tell me whats going wrong. Problem: Dovecot adm keeps reporting error as shown below:- root@mail:/var/log# doveadm index -u u...@domain.net inbox doveadm(t...@sicl.net): Error: fts_solr: Indexing failed: Server Error The log suggests as follows:- http://pastebin.com/KSvignc9 My system settings: solr-spec4.10.2 solr-impl4.10.2 1634293 - mike - 2014-10-26 05:56:21 lucene-spec4.10.2 lucene-impl4.10.2 1634293 - mike - 2014-10-26 05:51:56 Physical Memory 98.8% Swap Space 0.0% File Descriptor Count 2.3% JVM-Memory 5.4% I login to my server as follows:- kevin-MBP:~ kevin$ ssh -t -L 8983:localhost:8983 ad...@server.net I start my server with following command: :/opt/solr# java -jar start.jar Startup log is showing as follows:- http://pastebin.com/EYVJ06rL It keeps indicating u...@domain.net (This was the user name that was part of a command passed from doveadm as follows):- doveadm index -u u...@domain.net inbox Could someone tell me what is happening in the log(can the apache solr read the request from dovecot correctly or is this some schema problem or what?) ? Thanks Kevin -- Regards, Shalin Shekhar Mangar.
Re: Index directory containing only segments.gen
Erick Erickson wrote OK, I think this is the root of your problem: bq: Everything was setup using the - now deprecated - tags cores and core inside solr.xml. There are a bunch of ways this could go wrong. I'm pretty sure you have something that would take quite a while to untangle, so unless you have a _very_ good reason for making this work, I'd blow everything away. I started playing with SolrCloud before the new solr.xml made its appearance (in the example files of the 4.4 distribution, if I'm not mistaken) and since it was classified only as deprecated I decided to postpone the transition to the new solr.xml until the migration to Solr 5.0. Anyway, what you are saying is that the use of the new solrcloud-friendly configuration file is accompanied by changes in SolrCloud behavior? Erick Erickson wrote If you're using an external Zookeeper shut it off and 'rm -rf /tmp/zookeeper'. If using embedded, you can remove zoo_data under your SOLR_HOME. Do you mean getting rid of Zookeeper snapshot and transaction logs, basically clearing things and removing zknodes like clusterstate.json, overseer and the like? Erick Erickson wrote OK, now use the Collections API to create your collection, see: https://cwiki.apache.org/confluence/display/solr/Collections+API and go from there (don't forget to push your configs to Zookeeper first). I've successfully tried your proposed approach using the new solr.xml, but I've bypassed the Collections API and added core.properties files inside my collection directories. Directories contain no other files and the configuration has been preloaded into Zookeeper. I prefer to have everything ready before starting the Solr servers. Do you see anything unusual there? One last thing, what exactly is HttpShardHandlerFactory responsible for? Because there was no such definition in the deprecated solr.xml I was using. Thanks Erick, Zisis T.
-- View this message in context: http://lucene.472066.n3.nabble.com/Index-directory-containing-only-segments-gen-tp4186045p4186316.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Index directory containing only segments.gen
Zisis: It's not so much that the behavior has changed, and it's still perfectly possible to use an old-style solr.xml. Rather, it's that there has been quite a lot of hardening of how SolrCloud creates things. I realize that the Collections API is more of a black box and you have to take on faith that it's doing the right thing. That said, the API was written by people deeply involved in the guts of how SolrCloud works and has some safeguards built in. bq: Do you mean getting rid of Zookeeper snapshot and transaction logs, basically clearing things and removing zknodes like clusterstate.json, overseer and the like? This is just a recipe for how I completely blow away Zookeeper's knowledge of the system so I'm _sure_ nothing's hanging around. Note that I frequently bounce around from one thing to another rather than maintain a single system, so I'm pretty cavalier about this. Mostly I've spent too much time being hammered by something I'd forgotten I changed when moving from one problem to another. I mean I'll spin up Solr 5.x, one or two Solr 4.x versions and maybe trunk over the course of a day while working on various problems; it's easy to lose track. Of course you wouldn't want to resort to this in a real environment. 'rm -rf /tmp/zookeeper' is just an incantation I often use ;) But yes, that's what's going on. The clusterstate and all non-ephemeral nodes are just gone. There are more sophisticated ways to do this that aren't so blunt if you'd prefer. Do be a bit aware, though, that if the replicas are still on the Solr nodes, they can re-register themselves after you blow away the Zookeeper info. bq: Do you see anything unusual there. Unusual, but not necessarily bad. The collections API takes much of the guesswork out of this though. For instance, are you quite sure you're naming each replica such that there are no collisions? Note that you can also specify what nodes the leaders and replicas go on, and you can script this if using the Collections API.
Ditto with adding replicas (of course this latter came in later than 4.4 IIRC). Not intimately familiar with HttpShardHandlerFactory, but on a quick glance it handles thread pooling for sub-requests to other shards. May be way off base here. Erick On Fri, Feb 13, 2015 at 9:21 AM, Zisis Tachtsidis zist...@runbox.com wrote: Erick Erickson wrote OK, I think this is the root of your problem: bq: Everything was setup using the - now deprecated - tags cores and core inside solr.xml. There are a bunch of ways this could go wrong. I'm pretty sure you have something that would take quite a while to untangle, so unless you have a _very_ good reason for making this work, I'd blow everything away. I started playing with SolrCloud before the new solr.xml made its appearance (in the example files of the 4.4 distribution, if I'm not mistaken) and since it was classified only as deprecated I decided to postpone the transition to the new solr.xml until the migration to Solr 5.0. Anyway, what you are saying is that the use of the new solrcloud-friendly configuration file is accompanied by changes in SolrCloud behavior? Erick Erickson wrote If you're using an external Zookeeper shut it off and 'rm -rf /tmp/zookeeper'. If using embedded, you can remove zoo_data under your SOLR_HOME. Do you mean getting rid of Zookeeper snapshot and transaction logs, basically clearing things and removing zknodes like clusterstate.json, overseer and the like? Erick Erickson wrote OK, now use the Collections API to create your collection, see: https://cwiki.apache.org/confluence/display/solr/Collections+API and go from there (don't forget to push your configs to Zookeeper first). I've successfully tried your proposed approach using the new solr.xml, but I've bypassed the Collections API and added core.properties files inside my collection directories. Directories contain no other files and the configuration has been preloaded into Zookeeper.
I prefer to have everything ready before starting the Solr servers. Do you see anything unusual there? One last thing, what exactly is HttpShardHandlerFactory responsible for? Because there was no such definition in the deprecated solr.xml I was using. Thanks Erick, Zisis T. -- View this message in context: http://lucene.472066.n3.nabble.com/Index-directory-containing-only-segments-gen-tp4186045p4186316.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Dovecot FTS Solr Error [urgent/serious]
According to the logs, the Solr process has run out of memory and therefore it won't accept any more writes. What is the heap size for the Solr java process? The u...@domain.net in the logs is a red herring. Those just seem to be part of the document's id. On Fri, Feb 13, 2015 at 11:06 PM, Kevin Laurie superinterstel...@gmail.com wrote: Hi Guys, Serious help requested with dovecot and apache solr. Appreciate if someone could see the solr log outputs and tell me whats going wrong. Problem: Dovecot adm keeps reporting error as shown below:- root@mail:/var/log# doveadm index -u u...@domain.net inbox doveadm(t...@sicl.net): Error: fts_solr: Indexing failed: Server Error The log suggests as follows:- http://pastebin.com/KSvignc9 My system settings: solr-spec4.10.2 solr-impl4.10.2 1634293 - mike - 2014-10-26 05:56:21 lucene-spec4.10.2 lucene-impl4.10.2 1634293 - mike - 2014-10-26 05:51:56 Physical Memory 98.8% Swap Space 0.0% File Descriptor Count 2.3% JVM-Memory 5.4% I login to my server as follows:- kevin-MBP:~ kevin$ ssh -t -L 8983:localhost:8983 ad...@server.net I start my server with following command: :/opt/solr# java -jar start.jar Startup log is showing as follows:- http://pastebin.com/EYVJ06rL It keeps indicating u...@domain.net (This was the user name that was part of a command passed from doveadm as follows):- doveadm index -u u...@domain.net inbox Could someone tell me what is happening in the log(can the apache solr read the request from dovecot correctly or is this some schema problem or what?) ? Thanks Kevin -- Regards, Shalin Shekhar Mangar.
Solr and UIMA, capturing fields
Hi, I successfully combined Solr and UIMA with the help of https://wiki.apache.org/solr/SolrUIMA and other pages (and am happy to provide some help about how to reach this step). Right now I can run an analysis engine and get some primitive features/fields, which I specify in the schema.xml, automatically recognized by Solr. But if the features themselves are objects, I do not know how to capture them in Solr. I provided the relevant solrconfig.xml in [1] and the schema.xml addition in [2] for the following small example; they are using the AE directly provided by the UIMA examples. With the input "This is a sentence with an email at u...@host.com", Solr correctly adds the field: UIMAname: [ 36 ] since this is the index where the email token starts. I could also successfully capture the feature <str name="feature">end</str> to indicate where the found email token ends. However, example.EmailAddress has the features: begin, end, sofa. sofa is not a primitive feature, but an object which itself has the features sofaNum, sofaID, sofaString, ... How can I access fields in Solr from an annotation like example.EmailAddress that are not simple strings but themselves objects? I made an image of the CAS Visual Debugger with this AE and the sentence to show which fields I mean, I hope this makes it more clear: http://tinypic.com/view.php?pic=34rud1ss=8#.VN5bF7s2cWN Does anyone know how to access such fields with Solr and UIMA? Thanks a lot for any help, Tom

[1]

<updateRequestProcessorChain name="uima" default="true">
  <processor class="org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory">
    <lst name="uimaConfig">
      <lst name="runtimeParameters"/>
      <str name="analysisEngine">/home/toliwa/javalibs/uimaj-2.6.0-bin/apache-uima/examples/descriptors/analysis_engine/UIMA_Analysis_Example.xml</str>
      <!-- Set to true if you want to continue indexing even if text processing fails.
           Default is false. That is, Solr throws RuntimeException and never indexed
           documents entirely in your session. -->
      <bool name="ignoreErrors">false</bool>
      <!-- This is optional. It is used for logging when text processing fails.
           If logField is not specified, uniqueKey will be used as logField. -->
      <str name="logField">id</str>
      <lst name="analyzeFields">
        <bool name="merge">false</bool>
        <arr name="fields">
          <str>text</str>
        </arr>
      </lst>
      <lst name="fieldMappings">
        <lst name="type">
          <str name="name">example.EmailAddress</str>
          <lst name="mapping">
            <str name="feature">begin</str>
            <str name="field">UIMAname</str>
          </lst>
        </lst>
      </lst>
    </lst>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

[2]

<field name="UIMAname" type="string" indexed="true" stored="true" multiValued="true" required="false"/>
Re: Collations are not working fine.
Hi Nitin, Can u try with the below config, we have these config seems to be working for us. searchComponent name=spellcheck class=solr.SpellCheckComponent str name=queryAnalyzerFieldTypetext_general/str lst name=spellchecker str name=namewordbreak/str str name=classnamesolr.WordBreakSolrSpellChecker/str str name=fieldtextSpell/str str name=combineWordstrue/str str name=breakWordsfalse/str int name=maxChanges5/int /lst lst name=spellchecker str name=namedefault/str str name=fieldtextSpell/str str name=classnamesolr.IndexBasedSpellChecker/str str name=spellcheckIndexDir./spellchecker/str str name=accuracy0.75/str float name=thresholdTokenFrequency0.01/float str name=buildOnCommittrue/str str name=spellcheck.maxResultsForSuggest5/str /lst /searchComponent str name=spellchecktrue/str str name=spellcheck.dictionarydefault/str str name=spellcheck.dictionarywordbreak/str int name=spellcheck.count5/int str name=spellcheck.alternativeTermCount15/str str name=spellcheck.collatetrue/str str name=spellcheck.onlyMorePopularfalse/str str name=spellcheck.extendedResultstrue/str str name =spellcheck.maxCollations100/str str name=spellcheck.collateParam.mm100%/str str name=spellcheck.collateParam.q.opAND/str str name=spellcheck.maxCollationTries1000/str *Rajesh.* On Fri, Feb 13, 2015 at 1:01 PM, Dyer, James james.d...@ingramcontent.com wrote: Nitin, Can you post the full spellcheck response when you query: q=gram_ci:gone wthh thes wintwt=jsonindent=trueshards.qt=/spell James Dyer Ingram Content Group -Original Message- From: Nitin Solanki [mailto:nitinml...@gmail.com] Sent: Friday, February 13, 2015 1:05 AM To: solr-user@lucene.apache.org Subject: Re: Collations are not working fine. Hi James Dyer, I did the same as you told me. Used WordBreakSolrSpellChecker instead of shingles. But still collations are not coming or working. For instance, I tried to get collation of gone with the wind by searching gone wthh thes wint on field=gram_ci but didn't succeed. 
Even so, I am getting the suggestions wthh as *with*, thes as *the*, wint as *wind*. Also I have documents which contain "gone with the wind" 167 times. I don't know whether I am missing something or not. Please check my Solr configuration below:

*URL:* localhost:8983/solr/wikingram/spell?q=gram_ci:gone wthh thes wint&wt=json&indent=true&shards.qt=/spell

*solrconfig.xml:*

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">textSpellCi</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">gram_ci</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <str name="distanceMeasure">internal</str>
    <float name="accuracy">0.5</float>
    <int name="maxEdits">2</int>
    <int name="minPrefix">0</int>
    <int name="maxInspections">5</int>
    <int name="minQueryLength">2</int>
    <float name="maxQueryFrequency">0.9</float>
    <str name="comparatorClass">freq</str>
  </lst>
  <lst name="spellchecker">
    <str name="name">wordbreak</str>
    <str name="classname">solr.WordBreakSolrSpellChecker</str>
    <str name="field">gram</str>
    <str name="combineWords">true</str>
    <str name="breakWords">true</str>
    <int name="maxChanges">5</int>
  </lst>
</searchComponent>

<requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="df">gram_ci</str>
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck">on</str>
    <str name="spellcheck.extendedResults">true</str>
    <str name="spellcheck.count">25</str>
    <str name="spellcheck.onlyMorePopular">true</str>
    <str name="spellcheck.maxResultsForSuggest">1</str>
    <str name="spellcheck.alternativeTermCount">25</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.maxCollations">50</str>
    <str name="spellcheck.maxCollationTries">50</str>
    <str name="spellcheck.collateExtendedResults">true</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>

*Schema.xml:*

<field name="gram_ci" type="textSpellCi" indexed="true" stored="true" multiValued="false"/>

<fieldType name="textSpellCi" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
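[Editor's note: a minimal sketch of building the /spell request from the message above, with the ampersands (stripped by the mail archive) restored; the host, collection name, and quoting of the query phrase are assumptions, not taken from the original mail.]

```python
from urllib.parse import urlencode

# Hypothetical reconstruction of the spellcheck request URL.
# Quoting the phrase keeps the spaces inside the q parameter.
params = {
    "q": 'gram_ci:"gone wthh thes wint"',
    "wt": "json",
    "indent": "true",
    "shards.qt": "/spell",   # route the distributed sub-requests to /spell
}
url = "http://localhost:8983/solr/wikingram/spell?" + urlencode(params)
print(url)
```

urlencode percent-escapes the spaces and quotes, so the URL can be pasted into curl or a browser as-is.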
Re: Dovecot FTS Solr Error [urgent/serious]
Hi Erick, The link below shows my admin UI output: http://postimg.org/image/4rbubv54x/ Could you confirm whether the problem is that my system is short of memory? My VPS runs on 2GB RAM; should I increase the RAM/processor? Thanks Kevin On Sat, Feb 14, 2015 at 2:27 AM, Erick Erickson erickerick...@gmail.com wrote: Look at the admin UI screen; the overview screen has this in the lower-right corner. To add more memory, you can start it like this: java -Xmx4G -Xms4G -jar start.jar Best, Erick On Fri, Feb 13, 2015 at 10:24 AM, Kevin Laurie superinterstel...@gmail.com wrote: Hi, how do I check the heap size for the Solr java process? I am not very well versed in Java, only using it for my mail server, so I'd appreciate your help. Thanks Kevin On Saturday, February 14, 2015, Shalin Shekhar Mangar shalinman...@gmail.com wrote: According to the logs, the Solr process has run out of memory and therefore it won't accept any more writes. What is the heap size for the Solr java process? The u...@domain.net in the logs is a red herring; those just seem to be part of the document's id. On Fri, Feb 13, 2015 at 11:06 PM, Kevin Laurie superinterstel...@gmail.com wrote: Hi Guys, Serious help requested with Dovecot and Apache Solr. I'd appreciate it if someone could look at the Solr log output and tell me what's going wrong.
Problem: doveadm keeps reporting the error shown below:

root@mail:/var/log# doveadm index -u u...@domain.net inbox
doveadm(t...@sicl.net): Error: fts_solr: Indexing failed: Server Error

The log suggests as follows: http://pastebin.com/KSvignc9

My system settings:
solr-spec 4.10.2
solr-impl 4.10.2 1634293 - mike - 2014-10-26 05:56:21
lucene-spec 4.10.2
lucene-impl 4.10.2 1634293 - mike - 2014-10-26 05:51:56
Physical Memory 98.8%
Swap Space 0.0%
File Descriptor Count 2.3%
JVM-Memory 5.4%

I log in to my server as follows: kevin-MBP:~ kevin$ ssh -t -L 8983:localhost:8983 ad...@server.net I start my server with the following command: :/opt/solr# java -jar start.jar The startup log is as follows: http://pastebin.com/EYVJ06rL It keeps indicating u...@domain.net (this was the user name that was part of a command passed from doveadm, as follows): doveadm index -u u...@domain.net inbox Could someone tell me what is happening in the log? (Can Apache Solr read the request from Dovecot correctly, or is this some schema problem, or what?) Thanks Kevin -- Regards, Shalin Shekhar Mangar.
Re: 43sec commit duration - blocked by index merge events?
I wasn't able to follow Otis' answer but... the purpose of commit is to make recent document changes (since the last commit) visible to queries, and has nothing to do with merging of segments. IOW, take the new segment that is being created and not yet ready for use by query, and finish it so that query can access it. Soft commit vs. hard commit is simply a matter of whether Solr will wait for the I/O to write the new segment to disk to complete. Merging is an independent, background procedure (thread) that merges existing segments. It does seem odd that the cited doc does say that soft commit waits for background merges! (Hoss??) -- Jack Krupansky On Fri, Feb 13, 2015 at 4:47 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Check http://search-lucene.com/?q=commit+wait+block&fc_type=mail+_hash_+user e.g. http://search-lucene.com/m/QTPa7Sqx81 Otis -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr Elasticsearch Support * http://sematext.com/ On Fri, Feb 13, 2015 at 8:50 AM, Gili Nachum gilinac...@gmail.com wrote: Thanks Otis, can you confirm that a commit call will wait for merges to complete before returning? On Thu, Feb 12, 2015 at 8:46 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: If you are using Solr and SPM for Solr, you can check a report that shows the # of files in an index and the report that shows you the max docs-num docs delta. If you see the # of files drop during a commit, that's a merge. If you see a big delta change, that's probably a merge, too. You could also jstack or kill -3 the JVM and see where it's spending its time to give you some ideas what's going on inside. HTH. Otis -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr Elasticsearch Support * http://sematext.com/ On Sun, Feb 8, 2015 at 6:48 AM, Gili Nachum gilinac...@gmail.com wrote: Hello, During a load test I noticed a commit that took 43 seconds to complete (client hard commit). Is this to be expected?
What's causing it? I have a pair of machines hosting a 128M docs collection (8 shards, replication factor=2). Could it be merges? In Lucene merges happen async of commit statements, but reading Solr's doc for the Update Handler https://cwiki.apache.org/confluence/display/solr/UpdateHandlers+in+SolrConfig it sounds like hard commits do wait for merges to occur: *The tradeoff is that a soft commit gives you faster visibility because it's not waiting for background merges to finish.* Thanks.
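[Editor's note: the soft/hard commit distinction discussed above maps onto parameters of Solr's update handler. A minimal sketch of building an explicit commit request follows; the host and collection name are hypothetical.]

```python
from urllib.parse import urlencode

# Build an explicit commit request against the update handler.
# waitSearcher=true makes the call block until the new searcher is open
# (warmup included); softCommit=true skips the fsync to disk but still
# makes recent changes visible to queries.
def commit_url(base, soft=False, wait_searcher=True):
    params = {"commit": "true", "waitSearcher": str(wait_searcher).lower()}
    if soft:
        params["softCommit"] = "true"
    return base + "/update?" + urlencode(params)

hard = commit_url("http://localhost:8983/solr/collection1")
soft = commit_url("http://localhost:8983/solr/collection1", soft=True)
print(hard)
print(soft)
```

Timing the request with waitSearcher=false vs. true would separate the cost of writing the segment from the cost of opening and warming the new searcher.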
Re: Solr - Mahout
There is no recommendation engine built into Solr itself, but you might get some good ideas from this presentation: http://www.slideshare.net/treygrainger/building-a-real-time-solrpowered-recommendation-engine -- Jack Krupansky On Fri, Feb 13, 2015 at 8:33 AM, mohamed.sa...@experionglobal.com wrote: Sir, I need to know whether Solr has a built-in recommendation (collaborative filtering) capability, or whether it is possible to add a recommender without using Mahout. I kindly request a fast reply. Yours, Mohamed Sahad K P
Solr scoring confusion
We are getting inconsistent scoring results in Solr. It works about 95% of the time: a search on one term returns the results which equal exactly that one term at the top, and results with multiple terms that also contain that one term are returned lower. Occasionally, however, if a subset of the data has been re-indexed (the same data just added to the index again), the results will be slightly off; for example, the data from the earlier index will get a higher score than it should, until we re-index all the data. Our assumption here is that setting omitNorms to false, then indexing the data, then searching, should result in scores where the data with an exact match has a higher score. We usually see this but not always. Is something added to the score besides the value that is being searched that we are not understanding? Thanks. Scott Johnson Data Advantage Group, Inc. 604 Mission Street San Francisco, CA 94105 Office: +1.415.947.0400 x204 Fax: +1.415.947.0401 Take the first step towards a successful meta data initiative with MetaCenter - the only plug and play, real-time meta data solution. http://www.dag.com/ www.dag.com
Re: How to make SolrCloud more elastic
Hi Matt, See: http://search-lucene.com/?q=query+routing&fc_project=Solr https://cwiki.apache.org/confluence/display/solr/Shards+and+Indexing+Data+in+SolrCloud Otis -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr Elasticsearch Support * http://sematext.com/ On Thu, Feb 12, 2015 at 2:09 PM, Matt Kuiper matt.kui...@issinc.com wrote: Otis, Thanks for your reply. I see your point about too many shards and search efficiency. I also agree that I need to get a better handle on customer requirements and expected loads. Initially I figured that with the shard splitting option, I would need to double my Solr nodes every time I split (as I would want to split every shard within the collection), whereas actually only the number of shards would double, and then I would have the opportunity to rebalance the shards over the existing Solr nodes plus a number of new nodes that make sense at the time. This may be preferable to defining many micro shards up front. The time-based collections may be an option for this project. I am not familiar with query routing; can you point me to any documentation on how this might be implemented? Thanks, Matt -----Original Message----- From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com] Sent: Wednesday, February 11, 2015 9:13 PM To: solr-user@lucene.apache.org Subject: Re: How to make SolrCloud more elastic Hi Matt, You could create extra shards up front, but if your queries are fanned out to all of them, you can run into situations where there are too many concurrent queries per node, causing lots of context switching and ultimately being less efficient than if you had fewer shards.
So while this is an approach to take, I'd personally first try to run tests to see how much a single node can handle in terms of volume, expected query rates, and target latency, and then use monitoring/alerting/whatever-helps tools to keep an eye on the cluster so that when you start approaching the target limits you are ready with additional nodes and shard splitting if needed. Of course, if your data and queries are such that newer documents are queried more, you should look into time-based collections... and if your queries can query only a subset of data you should look into query routing. Otis -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr Elasticsearch Support * http://sematext.com/ On Wed, Feb 11, 2015 at 3:32 PM, Matt Kuiper matt.kui...@issinc.com wrote: I am starting a new project and one of the requirements is that Solr must scale to handle increasing load (both search performance and index size). My understanding is that one way to address search performance is by adding more replicas. I am more concerned about handling a growing index size. I have already been given some good input on this topic and am considering a shard splitting approach, but am more focused on a rebalancing approach that includes defining many shards up front and then moving these existing shards onto new Solr servers as needed. I plan to experiment with this approach first. Before I get too deep, I wondered if anyone has any tips or warnings about these approaches, or has scaled Solr in a different manner. Thanks, Matt
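[Editor's note: the shard-splitting approach discussed in this thread is driven by the Collections API. A minimal sketch of building the SPLITSHARD call follows; the host, collection, and shard names are hypothetical.]

```python
from urllib.parse import urlencode

# SPLITSHARD divides one shard's hash range into two sub-shards; once the
# sub-shards are active, the parent shard can be deleted and the new shards
# rebalanced onto additional nodes.
params = {
    "action": "SPLITSHARD",
    "collection": "mycollection",  # hypothetical collection name
    "shard": "shard1",             # hypothetical shard to split
}
url = "http://localhost:8983/solr/admin/collections?" + urlencode(params)
print(url)
```

Issuing this URL via curl or a browser triggers the split; splitting is I/O-heavy, so it is usually done during low-traffic windows.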
Re: Solr scoring confusion
Hi Scott, Try optimizing after reindexing and this should go away. It has to do with updated/deleted docs participating in score computation. Otis On Feb 13, 2015, at 18:29, Scott Johnson sjohn...@dag.com wrote: We are getting inconsistent scoring results in Solr. It works about 95% of the time, where a search on one term returns the results which equal exactly that one term at the top, and results with multiple terms that also contain that one term are returned lower. Occasionally, however, if a subset of the data has been re-indexed (the same data just added to the index again) then the results will be slightly off, for example the data from the earlier index will get a higher score than it should, until we re-index all the data. Our assumption here is that setting omitNorms to false, then indexing the data, then searching, should result in scores where the data with an exact match has a higher score. We usually see this but not always. Is something added to the score besides the value that is being searched that we are not understanding? Thanks. Scott Johnson Data Advantage Group, Inc. 604 Mission Street San Francisco, CA 94105 Office: +1.415.947.0400 x204 Fax: +1.415.947.0401 Take the first step towards a successful meta data initiative with MetaCenter - the only plug and play, real-time meta data solution. http://www.dag.com/ www.dag.com
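[Editor's note: a sketch of why deleted docs skew scores, using the idf formula from classic Lucene TF-IDF scoring. Reindexing a subset leaves the old copies as deleted-but-not-yet-merged docs, which still count toward document frequency until a merge or optimize expunges them. The numbers below are illustrative, not from the thread.]

```python
import math

# Classic Lucene (DefaultSimilarity) inverse document frequency:
#   idf(t) = 1 + ln(numDocs / (docFreq + 1))
# Deleted documents are still counted in numDocs and docFreq until their
# segments are merged away, so scores drift after partial reindexing and
# settle again after an optimize.
def idf(num_docs, doc_freq):
    return 1.0 + math.log(num_docs / (doc_freq + 1))

before = idf(1000, 10)  # docFreq before reindexing a subset
after = idf(1000, 20)   # same term; deleted duplicates inflate docFreq
```

Since a higher docFreq lowers idf, the inflated counts change relative scores until the deletes are expunged, which matches the "optimize after reindexing" advice above.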
Re: 43sec commit duration - blocked by index merge events?
Exactly how are you issuing the commit? I'm assuming you're using SolrJ. The server.commit(whatever, true) call waits for the searcher to be opened before returning. This includes (I believe) warmup times. It could be that the warmup times are huge in your case; the Solr logs should show you the autowarm times for a new searcher. Best, Erick On Fri, Feb 13, 2015 at 2:53 PM, Jack Krupansky jack.krupan...@gmail.com wrote: I wasn't able to follow Otis' answer but... the purpose of commit is to make recent document changes (since the last commit) visible to queries, and has nothing to do with merging of segments. IOW, take the new segment that is being created and not yet ready for use by query, and finish it so that query can access it. Soft commit vs. hard commit is simply a matter of whether Solr will wait for the I/O to write the new segment to disk to complete. Merging is an independent, background procedure (thread) that merges existing segments. It does seem odd that the cited doc does say that soft commit waits for background merges! (Hoss??) -- Jack Krupansky On Fri, Feb 13, 2015 at 4:47 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Check http://search-lucene.com/?q=commit+wait+block&fc_type=mail+_hash_+user e.g. http://search-lucene.com/m/QTPa7Sqx81 Otis -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr Elasticsearch Support * http://sematext.com/ On Fri, Feb 13, 2015 at 8:50 AM, Gili Nachum gilinac...@gmail.com wrote: Thanks Otis, can you confirm that a commit call will wait for merges to complete before returning? On Thu, Feb 12, 2015 at 8:46 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: If you are using Solr and SPM for Solr, you can check a report that shows the # of files in an index and the report that shows you the max docs-num docs delta. If you see the # of files drop during a commit, that's a merge. If you see a big delta change, that's probably a merge, too.
You could also jstack or kill -3 the JVM and see where it's spending its time to give you some ideas what's going on inside. HTH. Otis -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr Elasticsearch Support * http://sematext.com/ On Sun, Feb 8, 2015 at 6:48 AM, Gili Nachum gilinac...@gmail.com wrote: Hello, During a load test I noticed a commit that took 43 seconds to complete (client hard commit). Is this to be expected? What's causing it? I have a pair of machines hosting a 128M docs collection (8 shards, replication factor=2). Could it be merges? In Lucene merges happen async of commit statements, but reading Solr's doc for the Update Handler https://cwiki.apache.org/confluence/display/solr/UpdateHandlers+in+SolrConfig it sounds like hard commits do wait for merges to occur: *The tradeoff is that a soft commit gives you faster visibility because it's not waiting for background merges to finish.* Thanks.
Re: 43sec commit duration - blocked by index merge events?
I think Mark found something similar - https://issues.apache.org/jira/browse/SOLR-6838 On Sat, Feb 14, 2015 at 2:05 AM, Erick Erickson erickerick...@gmail.com wrote: Exactly how are you issuing the commit? I'm assuming you're using SolrJ. The server.commit(whatever, true) call waits for the searcher to be opened before returning. This includes (I believe) warmup times. It could be that the warmup times are huge in your case; the Solr logs should show you the autowarm times for a new searcher. Best, Erick On Fri, Feb 13, 2015 at 2:53 PM, Jack Krupansky jack.krupan...@gmail.com wrote: I wasn't able to follow Otis' answer but... the purpose of commit is to make recent document changes (since the last commit) visible to queries, and has nothing to do with merging of segments. IOW, take the new segment that is being created and not yet ready for use by query, and finish it so that query can access it. Soft commit vs. hard commit is simply a matter of whether Solr will wait for the I/O to write the new segment to disk to complete. Merging is an independent, background procedure (thread) that merges existing segments. It does seem odd that the cited doc does say that soft commit waits for background merges! (Hoss??) -- Jack Krupansky On Fri, Feb 13, 2015 at 4:47 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Check http://search-lucene.com/?q=commit+wait+block&fc_type=mail+_hash_+user e.g. http://search-lucene.com/m/QTPa7Sqx81 Otis -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr Elasticsearch Support * http://sematext.com/ On Fri, Feb 13, 2015 at 8:50 AM, Gili Nachum gilinac...@gmail.com wrote: Thanks Otis, can you confirm that a commit call will wait for merges to complete before returning? On Thu, Feb 12, 2015 at 8:46 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: If you are using Solr and SPM for Solr, you can check a report that shows the # of files in an index and the report that shows you the max docs-num docs delta.
If you see the # of files drop during a commit, that's a merge. If you see a big delta change, that's probably a merge, too. You could also jstack or kill -3 the JVM and see where it's spending its time to give you some ideas what's going on inside. HTH. Otis -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr Elasticsearch Support * http://sematext.com/ On Sun, Feb 8, 2015 at 6:48 AM, Gili Nachum gilinac...@gmail.com wrote: Hello, During a load test I noticed a commit that took 43 seconds to complete (client hard commit). Is this to be expected? What's causing it? I have a pair of machines hosting a 128M docs collection (8 shards, replication factor=2). Could it be merges? In Lucene merges happen async of commit statements, but reading Solr's doc for the Update Handler https://cwiki.apache.org/confluence/display/solr/UpdateHandlers+in+SolrConfig it sounds like hard commits do wait for merges to occur: *The tradeoff is that a soft commit gives you faster visibility because it's not waiting for background merges to finish.* Thanks.