CSV entry as multiple documents
Hi all,

I was wondering if there is a way to tell Solr to treat a CSV entry as multiple documents instead of one document. For instance, suppose that a CSV file has 4 fields and a single entry:

  t1,v1,v2,v3
  2015-01-01T01:00:59Z,0.3,0.5,0.7

I want Solr to update its index as if these were 3 different documents:

  t1,v
  2015-01-01T01:00:59Z,0.3
  2015-01-01T01:00:59Z,0.5
  2015-01-01T01:00:59Z,0.7

Is that possible, or do I have to create a different CSV for it?

Many thanks,
Henrique.
Re: Simple Sort Is Not Working In Solr 4.7?
What's the field definition for your title field? Is it just string, or are you doing some tokenizing? It should be a string, or a single token cleaned up (e.g. lower-cased) using KeywordTokenizer. In the example schema, you will normally see the original field tokenized and the sort field kept separately, with a copyField connection. In the latest Solr, docValues are also recommended for sort fields.

Regards,
Alex.

Sign up for my Solr resources newsletter at http://www.solr-start.com/

On 17 February 2015 at 19:52, Simon Cheng simonwhch...@gmail.com wrote:
> I don't know whether it is my setup or any other reasons. But the fact is
> that a very simple sort is not working in my Solr 4.7 environment. The
> query is very simple:
> http://localhost:8983/solr/bibs/select?q=author:soros&fl=id,author,title&sort=title+asc&wt=xml&start=0&indent=true
> And the output is NOT sorted according to title. [...]
Re: Checkout the source Code to the Release Version of Solr?
The SVN source is under tags, not branches.

http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_4_10_3/

On Tue, Feb 17, 2015 at 4:39 PM, O. Olson olson_...@yahoo.it wrote:
> Thank you Hrishikesh. Funny how GitHub is not mentioned on
> http://lucene.apache.org/solr/resources.html. I think common-build.xml is
> what I was looking for. [...]
Re: CSV entry as multiple documents
Hi Henrique,

Solr supports posting a CSV with multiple rows. Have a look at the documentation in the ref. guide here:
https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers#UploadingDatawithIndexHandlers-CSVFormattedIndexUpdates

On Tue, Feb 17, 2015 at 2:44 PM, Henrique Oliveira hensan...@gmail.com wrote:
> Hi all, I was wondering if there is a way to tell Solr to treat a CSV entry
> as multiple documents instead of one document. [...]

--
Anshum Gupta
http://about.me/anshumgupta
Re: Using TimestampUpdateProcessorFactory and updateRequestProcessorChain
> if your goal is that *every* doc will get a last_modified, regardless of
> how it is indexed, then you don't need to set the update.chain default on
> every requestHandler -- instead just mark your updateRequestProcessorChain
> as the default...
>
>   <updateRequestProcessorChain name="last_modified" default="true">
>     <processor class="solr.TimestampUpdateProcessorFactory">
>       <str name="fieldName">last_modified</str>
>     </processor>
>     ...

Thanks for this. There was some confusion between me and my coworker about which requestHandler to set it on, but setting it as a default should solve the problem. Unfortunately, I'm still not getting it back. I'm now wondering if it's the schema that I'm screwing up, or how I'm sending the index command.

schema.xml:

  <field name="last_modified" type="date" indexed="true" stored="true"/>
  <fieldType name="date" class="solr.TrieDateField" precisionStep="0" positionIncrementGap="0"/>

And the update command:

  curl "http://localhost:8983/solr/update/extract?uprefix=attr_&fmap.content=body&literal.id=1234.id&last_modified=NOW" -F "sc=@1234.txt"

--

On Feb 17, 2015, at 10:26 AM, Chris Hostetter hossman_luc...@fucit.org wrote:

: Hi,
: You are using /update when registering, but using /update/extract when invoking.
: Ahmet

if your goal is that *every* doc will get a last_modified, regardless of how it is indexed, then you don't need to set the update.chain default on every requestHandler -- instead just mark your updateRequestProcessorChain as the default:

  <updateRequestProcessorChain name="last_modified" default="true">
    <processor class="solr.TimestampUpdateProcessorFactory">
      <str name="fieldName">last_modified</str>
    </processor>
    ...

: On Tuesday, February 17, 2015 6:28 PM, Shu-Wai Chow sc...@alumni.rutgers.edu wrote:
: Hi, all. I'm trying to insert a field into Solr called last_modified, which
: holds a timestamp of the update. Since this is a cloud setup, I'm using the
: TimestampUpdateProcessorFactory in the updateRequestProcessorChain.
:
: solrconfig.xml:
:
:   <requestHandler name="/update" class="solr.UpdateRequestHandler">
:     <lst name="defaults">
:       <str name="update.chain">last_modified</str>
:     </lst>
:   </requestHandler>
:
:   <updateRequestProcessorChain name="last_modified">
:     <processor class="solr.TimestampUpdateProcessorFactory">
:       <str name="fieldName">last_modified</str>
:     </processor>
:     <processor class="solr.LogUpdateProcessorFactory"/>
:     <processor class="solr.RunUpdateProcessorFactory"/>
:   </updateRequestProcessorChain>
:
: In schema.xml, I have:
:
:   <field name="last_modified" type="date" indexed="true" stored="true"/>
:   <fieldType name="date" class="solr.TrieDateField" precisionStep="0" positionIncrementGap="0"/>
:
: This is the command I'm using to index:
:
:   curl "http://localhost:8983/solr/update/extract?uprefix=attr_&fmap.content=body&literal.id=1234.id&last_modified=NOW" -F "sc=@1234.txt"
:
: However, after indexing, the last_modified field is still not showing up on queries. Is there something else I should be doing? Thanks.

-Hoss
http://www.lucidworks.com/
Re: Checkout the source Code to the Release Version of Solr?
On 2/17/2015 3:20 PM, O. Olson wrote:
> At this time the latest released version of Solr is 4.10.3. Is there any way
> we can get the source code for this release version? I tried to check out
> the Solr code from
> http://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_4_10/
> In the commit log, I see a number of revisions but nothing mentions which is
> the release version. The latest revision being 1657441 on Feb 4. Does this
> correspond to 4.10.3? If not, then how do I go about getting the source code
> of 4.10.3?

That is the current development branch for 4.10.x. There are some changes in that branch that are not in any released version yet. If a 4.10.4 is ever released, it will come from that branch. There is no guarantee that a 4.10.4 will ever be released.

It is likely that the 5.0.0 release will be announced in the next few days. A problem could still be found, but the current release candidate is looking good so far.

> I'm also curious where the version number is embedded, i.e. is it in a file
> somewhere?

Yes. You can find it in lucene/version.properties in a typical checkout.

> I want to ensure I am using the released version, and not some bug fixes
> after the version got released.

For that exact version, you want to use this URL for your svn checkout:

http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_4_10_3/

I don't see lucene/version.properties in that tag, but the 4.10.3 version does show up in lucene/common-build.xml.

Thanks,
Shawn
Re: Checkout the source Code to the Release Version of Solr?
Thank you Mike. This is what I was looking for. I apparently did not understand what tags were.

Mike Drob wrote:
> The SVN source is under tags, not branches.
> http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_4_10_3/
Re: CSV entry as multiple documents
Yes, Alexandre is right about my question. To make it clear, a CSV that looks like:

  t1,v1,v2,v3
  2015-01-01T01:59:00Z,0.3,0.5,0.7
  2015-01-01T02:00:00Z,0.4,0.5,0.8

would be the same as indexing:

  t1,v
  2015-01-01T01:59:00Z,0.3
  2015-01-01T01:59:00Z,0.5
  2015-01-01T01:59:00Z,0.7
  2015-01-01T02:00:00Z,0.4
  2015-01-01T02:00:00Z,0.5
  2015-01-01T02:00:00Z,0.8

I don't know if a multiValued field would do the trick. Do you have more info on that split command?

Henrique

On Feb 17, 2015, at 7:57 PM, Alexandre Rafalovitch arafa...@gmail.com wrote:
> I think the question asked was a bit different. It was about having one
> row/document split into multiple, with some fields replicated and some
> mapped. JSON (single-document format) has a split command which might be
> similar to what's being asked. CSV has a split command as well, but I think
> it is more about creating a multiValued field. [...]
Re: Checkout the source Code to the Release Version of Solr?
Thank you Hrishikesh. Funny how GitHub is not mentioned on http://lucene.apache.org/solr/resources.html. I think common-build.xml is what I was looking for.

Hrishikesh Gadre-3 wrote:
> Also the version number is encoded (at least) in the build file:
> https://github.com/apache/lucene-solr/blob/817303840fce547a1557e330e93e5a8ac0618f34/lucene/common-build.xml#L32
> Hope this helps.

Hrishikesh Gadre-3 wrote:
> Hi, you can get the released code base here:
> https://github.com/apache/lucene-solr/releases
Re: Checkout the source Code to the Release Version of Solr?
Thank you Shawn. I have not updated my version in a while, so I prefer to go to 4.10 first, rather than directly to 5.0. I'd be working on it towards the end of this week.
Solrcloud sizing
One of our customers needs to index 15 billion documents in a collection. As this volume is not usual for me, I need some advice about SolrCloud sizing (how many servers, nodes, shards, replicas, how much memory, ...).

Some inputs:

- Collection size: 15 billion documents
- Collection update: 8 million new documents / day + 8 million deleted documents / day
- Updates occur during the night, without queries
- Queries occur during the day, without updates
- Document size is nearly 300 bytes
- Document fields are mainly strings, including one date field
- The same terms will occur several times for a given field (from 10 to 100,000)
- Queries will use a date period and a filter query on one or more fields
- 10,000 queries / minute
- Expected response time: 500 ms
- 1 billion documents indexed = 5 GB index size
- No SSD drives

So, what is your advice about:

- # of shards: 15 billion documents -> 16 shards?
- # of replicas?
- # of nodes = # of shards?
- Heap memory per node?
- Direct memory per node?

Thank you for your advice.

Dominique
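For what it's worth, the numbers above allow a back-of-envelope calculation. This is only a sketch: the 16-shard layout is the question's own guess, and the linear "1 billion docs ~ 5 GB" scaling is an assumption; real sizing needs prototyping.

```python
# Rough sizing arithmetic from the figures given in the post.
# Assumptions (not validated): index size scales linearly with doc count,
# and a hypothetical 16-shard layout.

TOTAL_DOCS = 15_000_000_000
GB_PER_BILLION_DOCS = 5
SHARDS = 16

total_index_gb = TOTAL_DOCS / 1e9 * GB_PER_BILLION_DOCS  # whole-collection index size
gb_per_shard = total_index_gb / SHARDS                   # index size per shard
docs_per_shard = TOTAL_DOCS // SHARDS                    # docs per shard
cluster_qps = 10_000 / 60                                # 10,000 queries/minute

print(total_index_gb)  # 75.0 GB total
print(gb_per_shard)    # 4.6875 GB per shard
print(docs_per_shard)  # 937,500,000 docs per shard
```

Note that each shard would still hold nearly a billion documents, so the per-shard doc count, not the raw index size, is likely the limiting factor here.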
Re: CSV entry as multiple documents
I think the question asked was a bit different. It was about having one row/document split into multiple, with some fields replicated and some mapped.

JSON (single-document format) has a split command which might be similar to what's being asked. CSV has a split command as well, but I think it is more about creating a multiValued field. Or did I miss a different parameter?

Regards,
Alex.

On 17 February 2015 at 19:41, Anshum Gupta ans...@anshumgupta.net wrote:
> Hi Henrique,
> Solr supports posting a CSV with multiple rows. Have a look at the
> documentation in the ref. guide here:
> https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers#UploadingDatawithIndexHandlers-CSVFormattedIndexUpdates
[...]
Re: Simple Sort Is Not Working In Solr 4.7?
Hi Alex,

It's okay after I added a new field s_title in the schema and re-indexed:

  <field name="s_title" type="string" indexed="true" stored="false" multiValued="false"/>
  <copyField source="title" dest="s_title"/>

But how can I ignore the articles (A, An, The) in the sorting? As you can see from the example below:

http://localhost:8983/solr/bibs/select?q=singapore&fl=id,title&sort=s_title+asc&wt=xml&start=0&rows=20&indent=true

<response>
<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">0</int>
  <lst name="params">
    <str name="q">singapore</str>
    <str name="indent">true</str>
    <str name="fl">id,title</str>
    <str name="start">0</str>
    <str name="sort">s_title asc</str>
    <str name="rows">20</str>
    <str name="wt">xml</str>
  </lst>
</lst>
<result name="response" numFound="18" start="0">
  <doc><str name="id">36</str><str name="title">5th SEACEN-Toronto Centre Leadership Seminar for Senior Management of Central Banks on Financial System Oversight, 16-21 Oct 2005, Singapore</str></doc>
  <doc><str name="id">70</str><str name="title">Anti-money laundering counter-terrorism financing / Commercial Affairs Dept</str></doc>
  <doc><str name="id">15</str><str name="title">China's anti-secession law : a legal perspective / Zou, Keyuan</str></doc>
  <doc><str name="id">12</str><str name="title">China's currency peg : firm in the eye of the storm / Calla Wiemer</str></doc>
  <doc><str name="id">22</str><str name="title">China's politics in 2004 : dawn of the Hu Jintao era / Zheng Yongnian Lye Liang Fook</str></doc>
  <doc><str name="id">92</str><str name="title">Goods and Services Tax Act [2005 ed.] (Chapter 117A)</str></doc>
  <doc><str name="id">13</str><str name="title">Governing capacity in China : creating a contingent of qualified personnel / Kjeld Erik Brodsgaard</str></doc>
  <doc><str name="id">21</str><str name="title">Health care marketization in urban China / Gu Xin</str></doc>
  <doc><str name="id">85</str><str name="title">Lianhe Zaobao, Sunday</str></doc>
  <doc><str name="id">84</str><str name="title">Singapore : vision of a global city / Jones Lang LaSalle</str></doc>
  <doc><str name="id">7</str><str name="title">Singapore real estate investment trusts : leveraged value / Tony Darwell</str></doc>
  <doc><str name="id">96</str><str name="title">Singapore's success : engineering economic growth / Henri Ghesquiere</str></doc>
  <doc><str name="id">23</str><str name="title">The Chen-Soong meeting : the beginning of inter-party rapprochement in Taiwan? / Raymond R. Wu</str></doc>
  <doc><str name="id">17</str><str name="title">The Haw Par saga in the 1970s / project sponsor, Low Kwok Mun; team leader, Sandy Ho; team members, Audrey Low ... et al</str></doc>
  <doc><str name="id">78</str><str name="title">The New paper on Sunday</str></doc>
  <doc><str name="id">95</str><str name="title">The little Red Dot : reflections by Singapore's diplomats / editors, Tommy Koh, Chang Li Lin</str></doc>
  <doc><str name="id">52</str><str name="title">[Press releases and articles on policy changes affecting the Singapore property market] / compiled by the Information Resource Centre, Monetary Authority of Singapore</str></doc>
  <doc><str name="id">dataq</str><str name="title">Simon is testing Solr - This one is in English. Color of the Wind. 我是中国人 , БOΛbШ OЙ PYCCKO-KИTAЙCKИЙ CΛOBAPb , Français-Chinois</str></doc>
</result>
</response>
Re: Simple Sort Is Not Working In Solr 4.7?
Like I mentioned before, you could use the string type if you just want the title as it is. Or you can use a custom type to normalize the indexed value, as long as you end up with a single token. So, if you want to strip leading A/An/The, you can use KeywordTokenizer combined with whatever post-processing you need. I would suggest a LowerCase filter, and perhaps a Regex filter to strip off those leading articles. You may need to iterate a couple of times on that specific chain.

The good news is that you can just make a couple of type definitions with different values/order, reload the core (from the Cores screen of the Web Admin UI), and run some of your sample titles through those different definitions in the Analysis screen, without having to reindex.

Regards,
Alex.

On 17 February 2015 at 22:36, Simon Cheng simonwhch...@gmail.com wrote:
> Hi Alex,
> It's okay after I added a new field s_title in the schema and re-indexed.
> But how can I ignore the articles (A, An, The) in the sorting? [...]
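An untested sketch of the kind of chain Alex describes. The type and field names ("titleSort", "s_title") are illustrative, not from the thread, and the regex may need iteration in the Analysis screen as he suggests:

```xml
<!-- Sketch: single-token sort type that lowercases and strips one leading article. -->
<fieldType name="titleSort" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <!-- KeywordTokenizer keeps the whole title as one token -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
    <!-- strip a leading "a ", "an ", or "the " -->
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="^(a |an |the )" replacement="" replace="first"/>
  </analyzer>
</fieldType>

<field name="s_title" type="titleSort" indexed="true" stored="false" multiValued="false"/>
<copyField source="title" dest="s_title"/>
```

With something like this in place, sorting on s_title should ignore case and leading articles while the original title field stays searchable.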
Re: Simple Sort Is Not Working In Solr 4.7?
Hi Alex,

It's simply defined like this in schema.xml:

  <field name="title" type="text_general" indexed="true" stored="true" multiValued="false"/>

and it is cloned to the other multi-valued field o_title:

  <copyField source="title" dest="o_title"/>

Should I simply change the type to string instead?

Thanks again,
Simon.

On Wed, Feb 18, 2015 at 12:00 PM, Alexandre Rafalovitch arafa...@gmail.com wrote:
> What's the field definition for your title field? Is it just string, or are
> you doing some tokenizing? It should be a string, or a single token cleaned
> up (e.g. lower-cased) using KeywordTokenizer. In the example schema, you
> will normally see the original field tokenized and the sort field kept
> separately, with a copyField connection. In the latest Solr, docValues are
> also recommended for sort fields.
Simple Sort Is Not Working In Solr 4.7?
Hi,

I don't know whether it is my setup or some other reason, but the fact is that a very simple sort is not working in my Solr 4.7 environment. The query is very simple:

http://localhost:8983/solr/bibs/select?q=author:soros&fl=id,author,title&sort=title+asc&wt=xml&start=0&indent=true

And the output is NOT sorted according to title:

<response>
<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">1</int>
  <lst name="params">
    <str name="sort">title asc</str>
    <str name="fl">id,author,title</str>
    <str name="indent">true</str>
    <str name="start">0</str>
    <str name="q">author:soros</str>
    <str name="wt">xml</str>
  </lst>
</lst>
<result name="response" numFound="13" start="0">
  <doc><str name="id">9018</str><arr name="author"><str>Soros, George, 1930-</str></arr><str name="title">The alchemy of finance : reading the mind of the market / George Soros</str></doc>
  <doc><str name="id">15785</str><arr name="author"><str>Soros, George, 1930-</str><str>Soros Foundations</str></arr><str name="title">Bosnia / by George Soros</str></doc>
  <doc><str name="id">16281</str><arr name="author"><str>Soros, George, 1930-</str><str>Soros Foundations</str></arr><str name="title">Prospect for European disintegration / by George Soros</str></doc>
  <doc><str name="id">25807</str><arr name="author"><str>Soros, George</str></arr><str name="title">Open society : reforming global capitalism / George Soros</str></doc>
  <doc><str name="id">27440</str><str name="title">George Soros on globalization</str><arr name="author"><str>Soros, George, 1930-</str></arr></doc>
  <doc><str name="id">22254</str><arr name="author"><str>Soros, George, 1930-</str></arr><str name="title">The crisis of global capitalism : open society endangered / George Soros</str></doc>
  <doc><str name="id">16914</str><arr name="author"><str>Soros, George, 1930-</str><str>Soros Fund Management</str></arr><str name="title">The theory of reflexivity / by George Soros</str></doc>
  <doc><str name="id">17343</str><str name="title">Financial turmoil in Europe and the United States : essays / George Soros</str><arr name="author"><str>Soros, George, 1930-</str></arr></doc>
  <doc><str name="id">15542</str><arr name="author"><str>Soros, George, 1930-</str><str>Harvard Club of New York City</str></arr><str name="title">Nationalist dictatorships versus open society / by George Soros</str></doc>
  <doc><str name="id">15891</str><arr name="author"><str>Soros, George</str></arr><str name="title">The new paradigm for financial markets : the credit crisis of 2008 and what it means / George Soros</str></doc>
</result>
</response>

Thank you for the help in advance,
Simon.
Re: Simple Sort Is Not Working In Solr 4.7?
If you are not searching against the title field directly, you can change it to string. If you do search against it, create a separate field specifically for sorting. You should be able to use docValues with that field even in Solr 4.7. Remember to re-index.

Regards,
Alex.

On 17 February 2015 at 20:16, Simon Cheng simonwhch...@gmail.com wrote:
> Hi Alex,
> It's simply defined like this in schema.xml:
>   <field name="title" type="text_general" indexed="true" stored="true" multiValued="false"/>
> and it is cloned to the other multi-valued field o_title:
>   <copyField source="title" dest="o_title"/>
> Should I simply change the type to be string instead? [...]
Re: CSV entry as multiple documents
What's your business use case? You don't need the split command, as you already have those values in separate fields. You could copyField them into a single multiValued field, but you would still have one document per original CSV line. Why do you need multiple documents out of one big CSV entry?

Regards,
Alex.

On 17 February 2015 at 20:37, Henrique Oliveira hensan...@gmail.com wrote:
> Yes, Alexandre is right about my question. To make it clear, a CSV that
> looks like:
>
>   t1,v1,v2,v3
>   2015-01-01T01:59:00Z,0.3,0.5,0.7
>   2015-01-01T02:00:00Z,0.4,0.5,0.8
>
> would be the same as indexing:
>
>   t1,v
>   2015-01-01T01:59:00Z,0.3
>   2015-01-01T01:59:00Z,0.5
>   2015-01-01T01:59:00Z,0.7
>   2015-01-01T02:00:00Z,0.4
>   2015-01-01T02:00:00Z,0.5
>   2015-01-01T02:00:00Z,0.8
>
> I don't know if a multiValued field would do the trick. Do you have more
> info on that split command? [...]
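Since the CSV update handler has no row-exploding option mentioned anywhere in this thread, one answer to the original "do I have to create a different CSV for it?" question is yes: reshape the file before posting it. A minimal sketch (column names t1/v1..v3 are taken from the example; everything else here is hypothetical):

```python
import csv
import io

def explode_rows(raw_csv, time_col="t1"):
    """Turn each wide row (t1,v1,v2,v3) into one (t1,v) document per value column."""
    reader = csv.DictReader(io.StringIO(raw_csv))
    # every column except the timestamp becomes its own document
    value_cols = [c for c in reader.fieldnames if c != time_col]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([time_col, "v"])
    for row in reader:
        for col in value_cols:
            writer.writerow([row[time_col], row[col]])
    return out.getvalue()
```

The resulting narrow CSV can then be posted to Solr's CSV update handler as usual. Note that each exploded document would still need a unique key, which this sketch does not address.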
Confirm Solr index corruption
Hi All,

I use Solr 4.4.0 in a master-slave configuration. Last week, the master server ran out of disk space (logs got too big too quickly due to a bug in our system). Because of this, we weren't able to add new docs to an index. The first thing I did was delete a few old log files to free up disk space (later I moved the other logs to free up more). The index is working fine even after this fiasco.

The next day, a colleague of mine pointed out that we may be missing a few documents in the index. I suspect the above scenario may have broken the index. I ran CheckIndex against this index; it didn't mention any corruption, though. Right now, the index has about 25k docs. I haven't optimized this index in a while, and there are about 4000 deleted docs.

How can I confirm whether we lost anything? If we've lost docs, is there a way to recover them?

Thanks in advance!!

Regards,
Thomas
Re: spellcheck.count v/s spellcheck.alternativeTermCount
Thanks James. I tried the same thing (spellcheck.count=10&spellcheck.alternativeTermCount=5), and I got 5 suggestions for both "life" and "hope" -- not the behavior you described: *The spellchecker will try to return you up to 10 suggestions for hope, but only up to 5 suggestions for life.*

On Wed, Feb 18, 2015 at 1:10 AM, Dyer, James james.d...@ingramcontent.com wrote:
> Here is an example to illustrate what I mean...
>
> - query: q=text:(life AND hope)&spellcheck.count=10&spellcheck.alternativeTermCount=5
> - suppose at least one document in your dictionary field has "life" in it
> - also suppose zero documents in your dictionary field have "hope" in them
> - the spellchecker will try to return you up to 10 suggestions for "hope",
>   but only up to 5 suggestions for "life"
>
> James Dyer
> Ingram Content Group
>
> From: Nitin Solanki
> Subject: Re: spellcheck.count v/s spellcheck.alternativeTermCount
>
> Hi James, how can you say that "count" doesn't use the index/dictionary?
> Then where do the suggestions come from?
>
> On Tue, Feb 17, 2015 at 10:29 PM, Dyer, James wrote:
>> See http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.count and
>> the following section for details. Briefly, "count" is the # of suggestions
>> it will return for terms that are *not* in your index/dictionary.
>> "alternativeTermCount" is the # of alternatives you want returned for terms
>> that *are* in your dictionary. You can set them to the same value, unless
>> you want fewer suggestions when the term is in the dictionary.
>
> From: Nitin Solanki
> Subject: spellcheck.count v/s spellcheck.alternativeTermCount
>
> Hello everyone, I am confused about the difference between spellcheck.count
> and spellcheck.alternativeTermCount in Solr. Any help in detail?
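James's rule can be stated mechanically. This toy sketch (my own names, not a Solr API) just encodes which cap applies to which kind of term:

```python
def suggestion_cap(term_in_index, count=10, alternative_term_count=5):
    # Per the SpellCheckComponent docs: "count" caps suggestions for terms
    # NOT found in the index/dictionary; "alternativeTermCount" caps
    # suggestions for terms that ARE found in it.
    return alternative_term_count if term_in_index else count

# James's example: "life" occurs in the index, "hope" does not.
print(suggestion_cap(term_in_index=True))   # "life": up to 5 suggestions
print(suggestion_cap(term_in_index=False))  # "hope": up to 10 suggestions
```

So if you observed 5 suggestions for both terms, that would suggest (assuming the rule above holds) that both terms were treated as present in the dictionary, or that some other parameter is capping the result.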
Re: Boosting by calculated distance buckets
Raav,

You may need to actually subscribe to the solr-user list. Nabble seems to not be working too well. (p.s. I'm on vacation this week, so I can't be very responsive.)

First of all, it's not clear you actually want to *boost* (since you seem to not care about the relevancy score); it seems you want to *sort* based on a function query. So simply sort by the function query instead of using the 'bq' param.

Have you read about geodist() in the Solr Reference Guide? It returns the spatial distance. With that and other function queries like map(), you could do something like:

  sum(map(geodist(),0,40,40,0),map(geodist(),0,20,10,0))

and you could put that into your main function query. I purposefully overlapped the map ranges so that I didn't have to deal with double-counting an edge.

The only thing I don't like about this is that the distance is going to be calculated as many times as you reference the function, and it's slow. So you may want to write your own function query (internally called a ValueSource), which is relatively easy to do in Solr.

~ David
Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
Independent Lucene/Solr search consultant, http://www.linkedin.com/in/davidwsmiley

sraav wrote:
> David,
>
> Thank you for your prompt response. I truly appreciate it. (My post was not
> accepted the first two times, so I am posting it again one final time.)
>
> In my case I want to turn off the dependency on scoring and let Solr use
> just the boost values that I pass to each function to sort on. Here is a
> quick example of how I got that to work with non-geo fields which are
> present in the document and are not dynamically calculated (using edismax,
> of course). I was able to turn off the scoring (I mean remove the
> dependency on score) on the result set and drive the sort by the boost
> mentioned in the query below.
>
> In the function below, for example, if document1 matches the date listed
> it gets a boost of 5. If the same document matches the owner AND product,
> it gets an additional boost of 5. The total boost of document1 is 10. From
> whatever I have seen, it seems like I was able to negate the effects of
> the Solr score. (There was a queryNorm param affecting the boost, but it
> seemed to be a constant around 0.70345 most of the time for any fq
> mentioned.)
>
>   bq={!func}sum(
>     if(query({!v='datelisted:[2015-01-22T00:00:00.000Z TO *]'}),5,0),
>     if(and(query({!v='owner:*BRAVE*'}),query({!v='PRODUCT:*SWORD*'})),5,0))
>
> What I am trying to do is add an additional boosting function to the
> custom boost that will eventually tie into the above function and boost
> value. For example, if document1 falls in the 0-20 km range I would like
> to add a boost of 50, making the final boost value 60. If it falls in the
> 20-40 km range, I would like to add a boost of 40, and so on. Is there a
> way we can do this? Please let me know if I can provide better clarity on
> the use case that I am trying to solve.
>
> Thank you David.
>
> Thanks,
> Raav
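To sanity-check the bucket arithmetic in David's suggestion, here is a toy model of map(x,min,max,target,default) semantics as used in his expression (a sketch for reasoning about the buckets, not Solr code):

```python
def solr_map(x, mn, mx, target, default):
    # Models map(x, min, max, target, default): returns target when
    # min <= x <= max, otherwise default.
    return target if mn <= x <= mx else default

def bucket_boost(distance_km):
    # sum(map(geodist(),0,40,40,0), map(geodist(),0,20,10,0))
    return (solr_map(distance_km, 0, 40, 40, 0)
            + solr_map(distance_km, 0, 20, 10, 0))

print(bucket_boost(10))  # 0-20 km: both maps fire, 40 + 10 = 50
print(bucket_boost(30))  # 20-40 km: only the wider map fires, 40
print(bucket_boost(55))  # beyond 40 km: 0
```

This shows why overlapping the ranges works: the inner bucket's extra boost stacks on top of the outer one, producing the 50/40/0 tiers Raav asked for.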
Collations problem even when term is available in documents
Hi, I am misspelling the query "hota hai" as "hota hain". Inside collations, "hota hai" is not coming; instead, collations like "hot main", "home have", etc. are coming. I have 37 documents where "hota hai" is present.

*URL:* localhost:8983/solr/wikingram/spell?q=gram_ci:hota hain&wt=json&indent=true&shards.qt=/spell

*Configuration:*

*solrconfig.xml:*

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">textSpellCi</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">gram_ci</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <str name="distanceMeasure">internal</str>
    <float name="accuracy">0.5</float>
    <int name="maxEdits">2</int>
    <int name="minPrefix">0</int>
    <int name="maxInspections">5</int>
    <int name="minQueryLength">2</int>
    <float name="maxQueryFrequency">0.9</float>
    <str name="comparatorClass">freq</str>
  </lst>
</searchComponent>

<requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="df">gram_ci</str>
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck">on</str>
    <str name="spellcheck.extendedResults">true</str>
    <str name="spellcheck.count">15</str>
    <str name="spellcheck.onlyMorePopular">true</str>
    <str name="spellcheck.alternativeTermCount">15</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.maxCollations">1000</str>
    <str name="spellcheck.maxCollationTries">3000</str>
    <str name="spellcheck.collateExtendedResults">true</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>

*schema.xml:*

<field name="gram_ci" type="textSpellCi" indexed="true" stored="true" multiValued="false"/>

<fieldType name="textSpellCi" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="5" minShingleSize="2" outputUnigrams="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="5" minShingleSize="2" outputUnigrams="true"/>
  </analyzer>
</fieldType>
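As an aside, the shingle analysis above means gram_ci indexes word n-grams, which is what lets a phrase like "hota hai" act as a single dictionary term for the spellchecker. A rough Python sketch of what ShingleFilterFactory emits for these settings (simplified; the real filter has more options, such as filler tokens and separators):

```python
def shingles(tokens, min_size=2, max_size=5, output_unigrams=True):
    """Approximate ShingleFilterFactory: emit word n-grams of size
    min_size..max_size, optionally including the original unigrams."""
    out = list(tokens) if output_unigrams else []
    for n in range(min_size, max_size + 1):
        for i in range(len(tokens) - n + 1):
            out.append(" ".join(tokens[i:i + n]))
    return out

print(shingles(["hota", "hai"]))  # the two unigrams plus the bigram "hota hai"
```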
Re: Solrcloud sizing
Well, it's really impossible to say; you have to prototype. Here's something explaining this a bit: https://lucidworks.com/blog/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/ This is a major undertaking. Your question is simply impossible to answer without prototyping as in the link above; anything else is guesswork, and at this scale being wrong is expensive. So my advice would be to test on a small cluster, say a two-shard system, see what kind of performance you can get, and extrapolate from there, with your data, your queries, etc. Perhaps work with your client on a limited-scope proof of concept. Plan on spending some time tuning even the small cluster to get enough answers to form a go/no-go decision. Best, Erick On Tue, Feb 17, 2015 at 4:40 PM, Dominique Bejean dominique.bej...@eolya.fr wrote: One of our customers needs to index 15 billion documents in a collection. As this volume is not usual for me, I need some advice about SolrCloud sizing (how many servers, nodes, shards, replicas, how much memory, ...). Some inputs: - Collection size: 15 billion documents - Collection update: 8 million new documents/day + 8 million deleted documents/day - Updates occur during the night, without queries - Queries occur during the day, without updates - Document size is nearly 300 bytes - Document fields are mainly strings, including one date field - The same term will occur several times for a given field (from 10 to 100,000) - Queries will use a date period and a filter query on one or more fields - 10,000 queries/minute - expected response time 500 ms - 1 billion documents indexed = 5 GB index size - no SSD drives So, what is your advice about: # of shards (15 billion documents -> 16 shards?), # of replicas, # of nodes = # of shards?, heap memory per node, direct memory per node? Thanks for your advice. Dominique
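As a rough back-of-the-envelope check before any prototyping (a sketch only, using the poster's own figures; real sizing still has to be measured, as Erick says):

```python
# Back-of-the-envelope sizing from the numbers in the thread above.
# These are the poster's own figures, not recommendations.
total_docs = 15_000_000_000
gb_per_billion_docs = 5           # "1 billion documents indexed = 5 GB index size"
shards = 16                       # the shard count the poster is considering

total_index_gb = total_docs / 1_000_000_000 * gb_per_billion_docs
per_shard_gb = total_index_gb / shards
qps = 10_000 / 60                 # "10,000 queries / minute"

print(total_index_gb)             # 75.0   -> ~75 GB total index
print(per_shard_gb)               # 4.6875 -> ~4.7 GB per shard at 16 shards
print(round(qps))                 # 167 queries/second sustained
```

The raw index size looks modest per shard; the harder constraints here are the ~167 qps at a 500 ms target and the nightly 8M-add/8M-delete churn, which is exactly what the small-cluster prototype should measure.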
Re: Solrcloud sizing
Thank you Erick. This was also my own opinion. 2015-02-18 7:12 GMT+01:00 Erick Erickson erickerick...@gmail.com:
Re: Block Join Query Parsers regular expression feature workaround req
Hi Mikhail, It won't solve my problem. For example, suppose my docs are like this:

<doc>
  <field name="userid">1</field>
  <doc><field name="address">city1</field></doc>
  <doc><field name="address">city2</field></doc>
</doc>
<doc>
  <field name="userid">2</field>
  <doc><field name="address">city2</field></doc>
  <doc><field name="address">city3</field></doc>
</doc>

Now I want a query to return all the users not having any address related to city1 (i.e. only userid=2 should be in the result). If I query: q={!parent which=userid:*}*:* -address:city1 this returns two results, i.e. userid=2 and userid=1 (as userid=1 also has a child whose address is city2); the desired output was userid=2 only. On Tue, Feb 17, 2015 at 8:12 PM, Mikhail Khludnev mkhlud...@griddynamics.com wrote: try to search all children, remove those that have value1 by a dash, then join the remaining: q={!parent which=contentType:parent}contentType:child -contentType:value1 if the space in the underneath query causes the problem, try to escape it or wrap it in v=$subq
Re: Using TimestampUpdateProcessorFactory and updateRequestProcessorChain
Hi, You are using /update when registering, but using /update/extract when invoking. Ahmet On Tuesday, February 17, 2015 6:28 PM, Shu-Wai Chow sc...@alumni.rutgers.edu wrote:
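(Aside: the archive has stripped the `&` separators from the curl URL in the original message. A small sketch of building the parameters programmatically, and of passing the chain explicitly via the standard update.chain request parameter so it applies to the /update/extract handler too, which is the handler-path mismatch Ahmet points out. Host, port, and literal values are the ones from the question:)

```python
from urllib.parse import urlencode

# Values taken from the question; host/port assumed from the examples above.
params = {
    "uprefix": "attr_",
    "fmap.content": "body",
    "literal.id": "1234.id",
    "update.chain": "last_modified",  # apply the chain per-request
}
url = "http://localhost:8983/solr/update/extract?" + urlencode(params)
print(url)  # note the properly &-separated query string
```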
Using TimestampUpdateProcessorFactory and updateRequestProcessorChain
Hi, all. I’m trying to insert a field into Solr called last_modified, which holds a timestamp of the update. Since this is a cloud setup, I'm using the TimestampUpdateProcessorFactory to update the updateRequestProcessorChain.

solrconfig.xml:

<requestHandler name="/update" class="solr.UpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">last_modified</str>
  </lst>
</requestHandler>

<updateRequestProcessorChain name="last_modified">
  <processor class="solr.TimestampUpdateProcessorFactory">
    <str name="fieldName">last_modified</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

In schema.xml, I have:

<field name="last_modified" type="date" indexed="true" stored="true"/>
<fieldType name="date" class="solr.TrieDateField" precisionStep="0" positionIncrementGap="0"/>

This is the command I'm using to index:

curl "http://localhost:8983/solr/update/extract?uprefix=attr_&fmap.content=body&literal.id=1234.id&last_modified=NOW" -F "sc=@1234.txt"

However, after indexing, the last_modified field is still not showing up on queries. Is there something else I should be doing? Thanks.
RE: spellcheck.count v/s spellcheck.alternativeTermCount
See http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.count and the following section, for details. Briefly, count is the # of suggestions it will return for terms that are *not* in your index/dictionary. alternativeTermCount is the # of alternatives you want returned for terms that *are* in your dictionary. You can set them to the same value, unless you want fewer suggestions when the term is in the dictionary. James Dyer Ingram Content Group -Original Message- From: Nitin Solanki [mailto:nitinml...@gmail.com] Sent: Tuesday, February 17, 2015 5:27 AM To: solr-user@lucene.apache.org Subject: spellcheck.count v/s spellcheck.alternativeTermCount Hello Everyone, I got confused between spellcheck.count and spellcheck.alternativeTermCount in Solr. Any help in detail?
Discrepancy between Full import and Delta import query
Hi Folks, I am running Solr 3.4 and using DIH for importing data from a SQL Server backend. The query for the full import and the delta import is the same, i.e. both pull the same data.

Full and delta import query: SELECT KB_ENTRY.ADDITIONAL_INFO, KB_ENTRY.KNOWLEDGE_REF ID, SU_ENTITY_TYPE.REF ENTRY_TYPE_REF, KB_ENTRY.PROFILE_REF, KB_ENTRY.ITEM_REF, KB_ENTRY.TITLE, KB_ENTRY.ABSTRACT, KB_ENTRY.SOLUTION, KB_ENTRY.SOLUTION_HTML, KB_ENTRY.FREE_TEXT, KB_ENTRY.DATE_UPDATED, KB_ENTRY.STATUS_REF, KB_ENTRY.CALL_NUMBER, SU_ENTITY_TYPE.DISPLAY ENTRY_TYPE, KB_PROFILE.NAME PROFILE_TYPE, AR_PRIMARY_ASSET.ASSET_REF SERVICE_TYPE, AR_PERSON.FULL_NAME CONTRIBUTOR, IN_SYS_SOURCE.NAME SOURCE, KB_ENTRY_STATUS.NAME STATUS, (SELECT COUNT(CL_KB_REFER.CALL_NUMBER) FROM CL_KB_REFER WHERE CL_KB_REFER.ARTICLE_REF = KB_ENTRY.KNOWLEDGE_REF) LINK_RATE FROM KB_ENTRY, SU_ENTITY_TYPE, KB_PROFILE, AR_PRIMARY_ASSET, AR_PERSON, IN_SYS_SOURCE, KB_ENTRY_STATUS WHERE KB_ENTRY.PARTITION = 1 AND KB_ENTRY.STATUS = 'A' AND AR_PERSON.OFFICER_IND = 1 AND KB_ENTRY.CREATED_BY_REF = AR_PERSON.REF AND KB_ENTRY.SOURCE = IN_SYS_SOURCE.REF AND KB_ENTRY.STATUS_REF = KB_ENTRY_STATUS.REF AND KB_ENTRY_STATUS.STATUS = 'A' AND KB_ENTRY.PROFILE_REF = KB_PROFILE.REF AND KB_ENTRY.ITEM_REF = AR_PRIMARY_ASSET.ITEM_REF AND KB_ENTRY.ENTITY_TYPE = SU_ENTITY_TYPE.REF AND KB_ENTRY.KNOWLEDGE_REF = '${dataimporter.delta.ID}'

Delta query: SELECT KNOWLEDGE_REF AS ID FROM KB_ENTRY WHERE (DATE_UPDATED > '${dataimporter.last_index_time}' OR DATE_CREATED > '${dataimporter.last_index_time}')

The problem is that when I run the full import, everything works fine and all the fields/data are displayed fine in the search. However, when I run the delta import, for some records the ENTRY_TYPE field is not returned from the database. Let me illustrate with an example:

Search result after running full import: Record Name: John Doe, Entry ID: 500, Entry Type: Worker
Search result after running delta import: Record Name: John Doe, Entry ID: 500, Entry Type: (blank)

FYI: I have run the full and delta import queries (though both are the same) in the SQL Server IDE and both return the ENTRY_TYPE field correctly. Not sure why the ENTRY_TYPE field vanishes from Solr when the delta import is run. Any idea why this would happen? Thanks, Aniket
Collations are not using suggestions to build collations
Hi, I want to build collations using the suggestions of the query. But the collations are being built without using the suggestions; they use their own corrections (*misspellingsAndCorrections*) and I don't know where those come from. You can see this in the response below for the query.

*URL:* http://localhost:8983/solr/wikingram/spell?q=gram_ci:%22kuchi%20kucch%20hota%22&wt=json&indent=true&shards.qt=/spell&shards.tolerant=true&rows=1

You can see that the term "kuch" is not among the suggestions for either "kuchi" or "kucch", yet "kuch" appears in misspellingsAndCorrections: ["kuchi","kuch", "kucch","kuch", "hota","hota"]. How is this happening?

*Response:*

{
  "responseHeader":{"status":0,"QTime":3440},
  "response":{"numFound":0,"start":0,"maxScore":0.0,"docs":[]},
  "spellcheck":{
    "suggestions":[
      "kuchi",{
        "numFound":5, "startOffset":9, "endOffset":14, "origFreq":40,
        "suggestion":[
          {"word":"kochi","freq":976},
          {"word":"k chi","freq":442},
          {"word":"yuchi","freq":71},
          {"word":"kucha","freq":32},
          {"word":"kichi","freq":17}]},
      "kucch",{
        "numFound":2, "startOffset":15, "endOffset":20, "origFreq":9,
        "suggestion":[
          {"word":"kutch","freq":231},
          {"word":"kusch","freq":67}]},
      "correctlySpelled",false,
      "collation",[
        "collationQuery","gram_ci:\"kuch kuch hota\"",
        "hits",22,
        "misspellingsAndCorrections",[
          "kuchi","kuch",
          "kucch","kuch",
          "hota","hota"]]]}}
Re: spellcheck.count v/s spellcheck.alternativeTermCount
Any help please? On Tue, Feb 17, 2015 at 4:57 PM, Nitin Solanki nitinml...@gmail.com wrote: Hello Everyone, I got confusion between spellcheck.count and spellcheck.alternativeTermCount in Solr. Any help in details?
Re: Block Join Query Parsers regular expression feature workaround req
Try to search all children, remove those that have value1 by a dash, then join the remaining: q={!parent which=contentType:parent}contentType:child -contentType:value1 If the space in the underneath query causes a problem, try to escape it or wrap it in v=$subq. On Tue, Feb 17, 2015 at 4:13 PM, Sankalp Gupta sankalp.gu...@snapdeal.com wrote: Hi, I need a query that chooses only those parent docs none of whose children's field has the specified value, i.e. I need something like this: http://localhost:8983/solr/core1/select?q={!parent which=contentType:parent}childField:NOT value1 The problem is the NOT operator is not supported in the Block Join Query Parsers. Could anyone please suggest a way to work around this problem? I have also added the problem on stackoverflow: http://stackoverflow.com/questions/28562355/in-solr-does-block-join-query-parsers-lack-regular-expression-feature Regards Sankalp Gupta -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
Re: Having a spot of trouble setting up /browse
And FYI, out of the box with Solr 5.0, using the data-driven config (the default when creating a collection with `bin/solr create -c …`), /browse is wired in by default with no templates explicit in the configuration, as they are baked into the VrW (VelocityResponseWriter) library itself. But yeah, what Alexandre said - you need the libs included, as in Solr 4.10.3's example collection1 configuration, as well as the conf/velocity files. — Erik Hatcher, Senior Solutions Architect http://www.lucidworks.com On Feb 16, 2015, at 8:44 PM, Alexandre Rafalovitch arafa...@gmail.com wrote: Velocity libraries and .vm templates as a first step! Did you get those set up? Regards, Alex. Sign up for my Solr resources newsletter at http://www.solr-start.com/ On 16 February 2015 at 19:33, Benson Margulies ben...@basistech.com wrote: So, I had set up a Solr core modelled on the 'multicore' example in 4.10.3, which has no /browse. Upon request, I went to set up /browse. I copied in a minimal version. When I go there, I just get some XML back:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">4</int>
    <lst name="params"/>
  </lst>
  <result name="response" numFound="0" start="0" maxScore="0.0"/>
</response>

What else does /browse depend upon?
Re: Too many merges, stalling...
On 2/16/2015 8:12 PM, ralph tice wrote: Recently I turned on INFO level logging in order to get better insight as to what our Solr cluster is doing. Sometimes as frequently as almost 3 times a second we get messages like: [CMS][qtp896644936-33133]: too many merges; stalling... Less frequently we get: [TMP][commitScheduler-8-thread-1]: seg=_5dy(4.10.3):C13520226/1044084:delGen=318 size=2784.291 MB [skip: too large] where size is 2500-4900MB. I've trimmed most of your original message, but I will refer to things you have mentioned in the unquoted portion. The first message simply indicates that you have reached more simultaneous merges than CMS is configured to allow (3 by default), so it will stall all of them except one. The javadocs say that the one allowed to run will be the smallest, but I have observed the opposite -- the one that is allowed to run is always the largest. The second message indicates that the merge under consideration would have exceeded the maximum size, which defaults to 5GB, so it refused to do that merge. The mergeFactor setting is deprecated, but still works for now in 4.x releases. The reason your merges are happening so frequently is that you have set this to a low value - 5. Setting it to a larger value will make merges less frequent. The mergeFactor value is used to set maxMergeAtOnce and segmentsPerTier. A proper TieredMergePolicy config will have those two settings (normally set to the same value) as well as maxMergeAtOnceExplicit, which should be set to three times the value of the other two. My config uses 35, 35, and 105 for each of those values, respectively. You can also allow more simultaneous merges in the CMS config. I use a value of 6 here, to avoid lengthy indexing stalls that will kill the DIH connection to MySQL. If the disks are standard spinning magnetic disks, the number of CMS threads should be one. If it's SSD, you can use more threads. Thanks, Shawn
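The settings Shawn describes might look like the sketch below in a 4.x solrconfig.xml. This is only a sketch with Shawn's numbers, not defaults; element names and placement should be verified against the reference guide for your Solr version before use:

```xml
<indexConfig>
  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <int name="maxMergeAtOnce">35</int>
    <int name="segmentsPerTier">35</int>
    <!-- recommended: 3x the value of the other two -->
    <int name="maxMergeAtOnceExplicit">105</int>
  </mergePolicy>
  <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
    <!-- allow more scheduled merges before stalling the indexing thread -->
    <int name="maxMergeCount">6</int>
    <!-- 1 for spinning disks; more only for SSD -->
    <int name="maxThreadCount">1</int>
  </mergeScheduler>
</indexConfig>
```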
Re: Block Join Query Parsers regular expression feature workaround req
How about: find all parents which have at least one child with address:city1, and then negate it (not sure about the syntax at all): q=-{!parent which=userid:*}address:city1 17.02.2015, 20:21, Sankalp Gupta sankalp.gu...@snapdeal.com:
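To make the intended semantics of that negated {!parent} query concrete, here is a tiny Python sketch of the set logic it expresses (plain data structures, no Solr involved; the data mirrors the userid/address example in this thread):

```python
def parents_without(addresses, banned):
    """Return all parents having NO child with the banned value:
    (all parents) minus (parents joined from children matching the value).
    `addresses` maps parent id -> list of child address values."""
    with_banned = {uid for uid, cities in addresses.items() if banned in cities}
    return sorted(set(addresses) - with_banned)

# users with their child address docs, mirroring the example above
users = {1: ["city1", "city2"], 2: ["city2", "city3"]}
print(parents_without(users, "city1"))  # [2] -- only userid=2, as desired
```

The key point is that the negation must apply to the whole parent join (exclude any parent with a matching child), not to the child clause inside the join.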
Re: Too many merges, stalling...
On 2/17/2015 7:47 AM, Shawn Heisey wrote: The first message simply indicates that you have reached more simultaneous merges than CMS is configured to allow (3 by default), so it will stall all of them except one. The javadocs say that the one allowed to run will be the smallest, but I have observed the opposite -- the one that is allowed to run is always the largest. I have stated some things incorrectly here. The gist of what I wrote is correct, but the details are not. These details are important, especially for those who read this in the archives later. As long as you are below maxMergeCount (default 3) for the number of merges that have been scheduled, the system will simultaneously run up to maxThreads (default 1) merges from that list, and it will ALSO allow the incoming thread (indexing new data) to run. Once you reach maxMergeCount, the incoming thread is stalled until you are back below maxMergeCount, and up to maxThreads merges will be running while the incoming thread is stalled. Thanks, Shawn
Re: spellcheck.count v/s spellcheck.alternativeTermCount
Hi James, If count applies only to terms that are *not* in the index/dictionary, then where do its suggestions come from? On Tue, Feb 17, 2015 at 10:29 PM, Dyer, James james.d...@ingramcontent.com wrote:
Re: Using TimestampUpdateProcessorFactory and updateRequestProcessorChain
: Hi,
: You are using /update when registering, but using /update/extract when invoking.
: Ahmet

if your goal is that *every* doc will get a last_modified, regardless of how it is indexed, then you don't need to set the update.chain default on every requestHandler -- instead just mark your updateRequestProcessorChain as the default...

<updateRequestProcessorChain name="last_modified" default="true">
  <processor class="solr.TimestampUpdateProcessorFactory">
    <str name="fieldName">last_modified</str>
  </processor>
  ...
</updateRequestProcessorChain>

-Hoss http://www.lucidworks.com/
Re: Solr 4.8.1 : Response Code 500 when creating the new request handler
: 1. Look further down in the stack trace for the caused by that details
: the specific cause of the exception.
: I am still not able to find the cause of this.

Jack is referring to the log file from your server ... sometimes there are more details there.

: Sorry but i don't know how it is a non-standard approach. please guide me here.

I'm not sure what Jack was referring to -- i don't see anything non-standard about how you have your handler configured.

: We are trying to find all the results so we are using q.alt=*:*.
: There are some products in our company who want to find all the results whose
: type is garments, and i forgot to mention we are trying to find only 6
: rows. So using this request handler we are providing the 6 rows.

Jack's point here is that you have specified a q.alt in your invariants, but you have also specified it in the query params -- which will be totally ignored. what specifically is your goal of having that query param in the sample query you tried?

As a general debugging tip: Did you try ignoring your custom requestHandler, and just running a simple /select query with all of those params specified in the URL? ... it can help to try and narrow down the problem -- in this case, i'm pretty sure you would have gotten the same error, and then the distractions of the invariants question would have been irrelevant.

Looking at the source code for 4.8.1, it appears that the error you are seeing is edismax doing a really bad job of reporting an error in parsing the qf param -- which you haven't specified at all in your params:

try {
  queryFields = DisMaxQParser.parseQueryFields(req.getSchema(), solrParams);
  // req.getSearcher() here causes searcher refcount imbalance
} catch (SyntaxError e) {
  throw new RuntimeException();
}

...if you add a qf param with the list of fields you want to search (or a 'df' param to specify a default field), i suspect this error will go away. I filed a bug to fix this terrible code to give a useful error msg in the future... https://issues.apache.org/jira/browse/SOLR-7120

: 3. You have q.alt in invariants, but also in the actual request, which is a
: contradiction in terms - what is your actual intent? This isn't the cause
: of the exception, but does raise questions of what you are trying to do.
: 4. Why don't you have a q parameter for the actual query?
:
: -- Jack Krupansky
:
: On Sat, Feb 14, 2015 at 1:57 AM, Aman Tandon amantandon...@gmail.com wrote:
:
: Hi,
: I am using Solr 4.8.1 and when i am creating the new request handler i am
: getting the following error:
:
: *Request Handler config:*
:
: <requestHandler name="my_clothes_data" class="solr.SearchHandler">
:   <lst name="invariants">
:     <str name="defType">edismax</str>
:     <str name="indent">on</str>
:     <str name="q.alt">*:*</str>
:     <float name="tie">0.01</float>
:   </lst>
:   <lst name="appends">
:     <str name="fq">type:garments</str>
:   </lst>
: </requestHandler>
:
: *Error:*
:
: java.lang.RuntimeException
:   at org.apache.solr.search.ExtendedDismaxQParser$ExtendedDismaxConfiguration.init(ExtendedDismaxQParser.java:1455)
:   at org.apache.solr.search.ExtendedDismaxQParser.createConfiguration(ExtendedDismaxQParser.java:239)
:   at org.apache.solr.search.ExtendedDismaxQParser.init(ExtendedDismaxQParser.java:108)
:   at org.apache.solr.search.ExtendedDismaxQParserPlugin.createParser(ExtendedDismaxQParserPlugin.java:37)
:   at org.apache.solr.search.QParser.getParser(QParser.java:315)
:   at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:144)
:   at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:197)
:   at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
:   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1952)
:   ... (servlet filter and Jetty frames elided)
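Following the suggestion above, the handler config might become something like the sketch below. The field names in qf are hypothetical placeholders (not from the original post); substitute fields that actually exist in the schema:

```xml
<requestHandler name="my_clothes_data" class="solr.SearchHandler">
  <lst name="invariants">
    <str name="defType">edismax</str>
    <str name="indent">on</str>
    <str name="q.alt">*:*</str>
    <!-- hypothetical field names; edismax needs qf (or df) to parse -->
    <str name="qf">title description</str>
    <float name="tie">0.01</float>
  </lst>
  <lst name="appends">
    <str name="fq">type:garments</str>
  </lst>
</requestHandler>
```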
Re: unusually high 4.10.2 vs 4.3.1 RAM consumption
On Tue, 2015-02-17 at 11:05 +0100, Dmitry Kan wrote: Solr: 4.10.2 (high load, mass indexing) Java: 1.7.0_76 (Oracle) -Xmx25600m Solr: 4.3.1 (normal load, no mass indexing) Java: 1.7.0_11 (Oracle) -Xmx25600m The RAM consumption remained the same after the load has stopped on the 4.10.2 cluster. Manually collecting the memory on a 4.10.2 shard via jvisualvm dropped the used RAM from 8,5G to 0,5G. But the reserved RAM as seen by top remained at 9G level. As the JVM does not free OS memory once allocated, top just shows whatever peak it reached at some point. When you tell the JVM that it is free to use 25GB, it makes a lot of sense to allocate a fair chunk of that instead of garbage collecting if there is a period of high usage (mass indexing for example). What else could be the artifact of such a difference -- Solr or JVM? Can it only be explained by the mass indexing? What is worrisome is that the 4.10.2 shard reserves 8x times it uses. If you set your Xmx to a lot less, the JVM will probably favour more frequent garbage collections over extra heap allocation. - Toke Eskildsen, State and University Library, Denmark
RE: unusually high 4.10.2 vs 4.3.1 RAM consumption
We have seen an increase between 4.8.1 and 4.10. -Original message- From:Dmitry Kan solrexp...@gmail.com Sent: Tuesday 17th February 2015 11:06 To: solr-user@lucene.apache.org Subject: unusually high 4.10.2 vs 4.3.1 RAM consumption Hi, We are currently comparing the RAM consumption of two parallel Solr clusters with different Solr versions: 4.10.2 and 4.3.1. For comparable index sizes of a shard (20G and 26G), we observed a 9G vs 5.6G RAM footprint (reserved RAM as seen by top), 4.3.1 being the winner. We have not changed solrconfig.xml in the upgrade to 4.10.2 and have reindexed the data from scratch. The commits are all controlled on the client, i.e. no auto-commits. Solr: 4.10.2 (high load, mass indexing) Java: 1.7.0_76 (Oracle) -Xmx25600m Solr: 4.3.1 (normal load, no mass indexing) Java: 1.7.0_11 (Oracle) -Xmx25600m The RAM consumption remained the same after the load stopped on the 4.10.2 cluster. Manually collecting the memory on a 4.10.2 shard via jvisualvm dropped the used RAM from 8.5G to 0.5G, but the reserved RAM as seen by top remained at the 9G level. This unusual spike happened during mass data indexing. What else could be the artifact of such a difference -- Solr or the JVM? Can it only be explained by the mass indexing? What is worrisome is that the 4.10.2 shard reserves 8x what it uses. What can be done about this? -- Dmitry Kan Luke Toolbox: http://github.com/DmitryKey/luke Blog: http://dmitrykan.blogspot.com Twitter: http://twitter.com/dmitrykan SemanticAnalyzer: www.semanticanalyzer.info
Re: unusually high 4.10.2 vs 4.3.1 RAM consumption
;) ok. Currently I'm trying parallel GC options, mentioned here: http://comments.gmane.org/gmane.comp.jakarta.lucene.solr.user/101377 At least the saw-tooth RAM chart is starting to shape up. On Tue, Feb 17, 2015 at 12:55 PM, Markus Jelsma markus.jel...@openindex.io wrote: I would have shared it if i had one :) -Original message- From:Dmitry Kan solrexp...@gmail.com Sent: Tuesday 17th February 2015 11:40 To: solr-user@lucene.apache.org Subject: Re: unusually high 4.10.2 vs 4.3.1 RAM consumption Have you found an explanation to that? On Tue, Feb 17, 2015 at 12:12 PM, Markus Jelsma markus.jel...@openindex.io wrote: We have seen an increase between 4.8.1 and 4.10. -Original message- From:Dmitry Kan solrexp...@gmail.com Sent: Tuesday 17th February 2015 11:06 To: solr-user@lucene.apache.org Subject: unusually high 4.10.2 vs 4.3.1 RAM consumption Hi, We are currently comparing the RAM consumption of two parallel Solr clusters with different solr versions: 4.10.2 and 4.3.1. For comparable index sizes of a shard (20G and 26G), we observed 9G vs 5.6G RAM footprint (reserved RAM as seen by top), 4.3.1 being the winner. We have not changed the solrconfig.xml to upgrade to 4.10.2 and have reindexed data from scratch. The commits are all controlled on the client, i.e. not auto-commits. Solr: 4.10.2 (high load, mass indexing) Java: 1.7.0_76 (Oracle) -Xmx25600m Solr: 4.3.1 (normal load, no mass indexing) Java: 1.7.0_11 (Oracle) -Xmx25600m The RAM consumption remained the same after the load has stopped on the 4.10.2 cluster. Manually collecting the memory on a 4.10.2 shard via jvisualvm dropped the used RAM from 8,5G to 0,5G. But the reserved RAM as seen by top remained at 9G level. This unusual spike happened during mass data indexing. What else could be the artifact of such a difference -- Solr or JVM? Can it only be explained by the mass indexing? What is worrisome is that the 4.10.2 shard reserves 8x times it uses. What can be done about this? 
-- Dmitry Kan Luke Toolbox: http://github.com/DmitryKey/luke Blog: http://dmitrykan.blogspot.com Twitter: http://twitter.com/dmitrykan SemanticAnalyzer: www.semanticanalyzer.info
RE: spellcheck.count v/s spellcheck.alternativeTermCount
Here is an example to illustrate what I mean...
- query: q=text:(life AND hope)&spellcheck.count=10&spellcheck.alternativeTermCount=5
- suppose at least one document in your dictionary field has "life" in it
- also suppose zero documents in your dictionary field have "hope" in them
- the spellchecker will try to return up to 10 suggestions for "hope", but only up to 5 suggestions for "life"
James Dyer Ingram Content Group -Original Message- From: Nitin Solanki [mailto:nitinml...@gmail.com] Sent: Tuesday, February 17, 2015 11:35 AM To: solr-user@lucene.apache.org Subject: Re: spellcheck.count v/s spellcheck.alternativeTermCount Hi James, If count doesn't use the index/dictionary, then where do its suggestions come from? On Tue, Feb 17, 2015 at 10:29 PM, Dyer, James james.d...@ingramcontent.com wrote: See http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.count and the following section for details. Briefly, count is the # of suggestions it will return for terms that are *not* in your index/dictionary. alternativeTermCount is the # of alternatives you want returned for terms that *are* in your dictionary. You can set them to the same value, unless you want fewer suggestions when the term is in the dictionary. James Dyer Ingram Content Group -Original Message- From: Nitin Solanki [mailto:nitinml...@gmail.com] Sent: Tuesday, February 17, 2015 5:27 AM To: solr-user@lucene.apache.org Subject: spellcheck.count v/s spellcheck.alternativeTermCount Hello everyone, I am confused about the difference between spellcheck.count and spellcheck.alternativeTermCount in Solr. Can anyone explain in detail?
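James's example request, assembled programmatically (a sketch — the host, core name, and field name are placeholder assumptions, not taken from the thread):

```python
from urllib.parse import urlencode

# spellcheck.count caps suggestions for terms NOT in the index ("hope" here);
# spellcheck.alternativeTermCount caps suggestions for terms that ARE in the
# index ("life" here).
params = {
    "q": "text:(life AND hope)",
    "spellcheck": "true",
    "spellcheck.count": 10,
    "spellcheck.alternativeTermCount": 5,
    "wt": "json",
}
url = "http://localhost:8983/solr/collection1/select?" + urlencode(params)
print(url)
```

urlencode takes care of the `&` separators and of escaping the parentheses in the query expression.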
Re: Block Join Query Parsers regular expression feature workaround req
Sankalp, would you mind posting the debugQuery=on output? Without it, it is hard to tell what the problem is. That said, it's worth mentioning that Andrey's suggestion seems really promising. On Tue, Feb 17, 2015 at 8:19 PM, Sankalp Gupta sankalp.gu...@snapdeal.com wrote: Hi Mikhail, It won't solve my problem. For example, suppose my docs are like this:

<doc>
  <field name="userid">1</field>
  <doc>
    <field name="address">city1</field>
  </doc>
  <doc>
    <field name="address">city2</field>
  </doc>
</doc>
<doc>
  <field name="userid">2</field>
  <doc>
    <field name="address">city2</field>
  </doc>
  <doc>
    <field name="address">city3</field>
  </doc>
</doc>

Now if I want a query to return all the users not having any address related to city1 (i.e. only userid=2 should be in the result) and I query:

q={!parent which=userid:*}*:* -address:city1

this will return two results, i.e. userid=2 and userid=1 (as userid=1 also has a child whose address is city2); the desired output was userid=2 only. On Tue, Feb 17, 2015 at 8:12 PM, Mikhail Khludnev mkhlud...@griddynamics.com wrote: Try to search all children, remove those that have value1 with the minus sign, then join the remaining ones: q={!parent which=contentType:parent}contentType:child -contentType:value1 If the space in the underlying query causes a problem, try to escape it or wrap it in v=$subq. On Tue, Feb 17, 2015 at 4:13 PM, Sankalp Gupta sankalp.gu...@snapdeal.com wrote: Hi I need a query that selects only those parent docs none of whose children have the specified value in a field, i.e. something like this: http://localhost:8983/solr/core1/select?q={!parent which=contentType:parent}childField:NOT value1 The problem is that the NOT operator is not supported in the Block Join Query Parsers. Could anyone please suggest a way to work around this problem?
Have also added the problem on stackoverflow: http://stackoverflow.com/questions/28562355/in-solr-does-block-join-query-parsers-lack-regular-expression-feature Regards Sankalp Gupta -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
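Mikhail's pattern excludes child docs by value and joins the rest, which, as Sankalp points out, still returns parents that have any other child. One commonly suggested variant (an assumption on my part, not quoted from the thread) is to subtract the offending parents at the parent level instead:

```python
from urllib.parse import urlencode

# "Parents with NO child matching address:city1":
# select all parent docs, then subtract the parents that DO own a city1
# child. Field names (userid, address) follow Sankalp's example docs.
q = "userid:* -{!parent which=userid:*}address:city1"
print(urlencode({"q": q}))
```

For the example data this keeps only userid=2, since userid=1 owns a city1 child and is subtracted as a whole parent rather than child-by-child.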
Re: Collations are not working fine.
Hey James Dyer, Sorry for responding late; I was away for a couple of days. I have tried the configuration Rajesh Hazari pasted in his mail, and it seems to be working. I believe it works because reducing <str name="spellcheck.count">25</str> to <str name="spellcheck.count">5</str> produces fewer collations, and spellcheck.maxCollationTries is then able to identify/evaluate the collation "gone with the wind". But here, the problem is that the hits for "gone with the wind" come out low (only 53) {see collations.png}, while there are 394 hits for "gone with the wind" if I query the correct phrase directly with q="gone with the wind" — I get 394 in numFound in the response {see response.png}. Any idea about it? On Fri, Feb 13, 2015 at 11:31 PM, Dyer, James james.d...@ingramcontent.com wrote: Nitin, Can you post the full spellcheck response when you query: q=gram_ci:"gone wthh thes wint"&wt=json&indent=true&shards.qt=/spell James Dyer Ingram Content Group -Original Message- From: Nitin Solanki [mailto:nitinml...@gmail.com] Sent: Friday, February 13, 2015 1:05 AM To: solr-user@lucene.apache.org Subject: Re: Collations are not working fine. Hi James Dyer, I did the same as you told me: used WordBreakSolrSpellChecker instead of shingles. But still collations are not coming or working. For instance, I tried to get the collation of "gone with the wind" by searching "gone wthh thes wint" on field=gram_ci but didn't succeed. I am even getting the suggestions "with" for wtth, "the" for thes, "wind" for wint. Also, "gone with the wind" occurs 167 times in my documents. I don't know whether I am missing something or not.
Please check my solr configuration below:

*URL:* localhost:8983/solr/wikingram/spell?q=gram_ci:"gone wthh thes wint"&wt=json&indent=true&shards.qt=/spell

*solrconfig.xml:*

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">textSpellCi</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">gram_ci</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <str name="distanceMeasure">internal</str>
    <float name="accuracy">0.5</float>
    <int name="maxEdits">2</int>
    <int name="minPrefix">0</int>
    <int name="maxInspections">5</int>
    <int name="minQueryLength">2</int>
    <float name="maxQueryFrequency">0.9</float>
    <str name="comparatorClass">freq</str>
  </lst>
  <lst name="spellchecker">
    <str name="name">wordbreak</str>
    <str name="classname">solr.WordBreakSolrSpellChecker</str>
    <str name="field">gram</str>
    <str name="combineWords">true</str>
    <str name="breakWords">true</str>
    <int name="maxChanges">5</int>
  </lst>
</searchComponent>

<requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="df">gram_ci</str>
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck">on</str>
    <str name="spellcheck.extendedResults">true</str>
    <str name="spellcheck.count">25</str>
    <str name="spellcheck.onlyMorePopular">true</str>
    <str name="spellcheck.maxResultsForSuggest">1</str>
    <str name="spellcheck.alternativeTermCount">25</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.maxCollations">50</str>
    <str name="spellcheck.maxCollationTries">50</str>
    <str name="spellcheck.collateExtendedResults">true</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>

*Schema.xml:*

<field name="gram_ci" type="textSpellCi" indexed="true" stored="true" multiValued="false"/>

<fieldType name="textSpellCi" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
Re: Release date for Solr 5
Hi, Can I get any developer version to test and run for now? On Tue, Feb 17, 2015 at 12:45 PM, Anshum Gupta ans...@anshumgupta.net wrote: There's a vote going on for the 3rd release candidate of Solr / Lucene 5.0. If everything goes smoothly and the vote passes, the release should happen in about 4-5 days. On Mon, Feb 16, 2015 at 10:09 PM, CKReddy Bhimavarapu chaitu...@gmail.com wrote: What is the anticipated release date for Solr 5? -- ckreddybh. chaitu...@gmail.com -- Anshum Gupta http://about.me/anshumgupta -- ckreddybh. chaitu...@gmail.com
Re: Collations are not working fine.
Hey Rajesh, Sorry for responding late; I was away for a couple of days. I have tried the configuration you sent me — thanks a lot, it seems to be working. I believe it works because reducing <str name="spellcheck.count">25</str> to <str name="spellcheck.count">5</str> produces fewer collations, and spellcheck.maxCollationTries is then able to identify/evaluate the collation "gone with the wind". But here, the problem is that the hits for "gone with the wind" come out low (only 53) {see collations.png}, while there are 394 hits for "gone with the wind" if I query the correct phrase directly with q="gone with the wind" — I get 394 in numFound in the response {see response.png}. Any idea about it? One more thing: you used <str name="spellcheck.collateParam.mm">100%</str> and <str name="spellcheck.collateParam.q.op">AND</str>, but they don't seem to be doing anything. I tried removing those 2 lines, and it doesn't affect the results. I also changed spellcheck.collateParam.mm to 0% and spellcheck.collateParam.q.op to OR; even that doesn't affect the results. I am unable to understand what spellcheck.collateParam.mm and spellcheck.collateParam.q.op do, even after googling. Will you please assist me? Thanks. On Sat, Feb 14, 2015 at 2:18 AM, Rajesh Hazari rajeshhaz...@gmail.com wrote: Hi Nitin, Can you try with the below config? We have these config and they seem to be working for us.
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">text_general</str>
  <lst name="spellchecker">
    <str name="name">wordbreak</str>
    <str name="classname">solr.WordBreakSolrSpellChecker</str>
    <str name="field">textSpell</str>
    <str name="combineWords">true</str>
    <str name="breakWords">false</str>
    <int name="maxChanges">5</int>
  </lst>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">textSpell</str>
    <str name="classname">solr.IndexBasedSpellChecker</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
    <str name="accuracy">0.75</str>
    <float name="thresholdTokenFrequency">0.01</float>
    <str name="buildOnCommit">true</str>
    <str name="spellcheck.maxResultsForSuggest">5</str>
  </lst>
</searchComponent>

<str name="spellcheck">true</str>
<str name="spellcheck.dictionary">default</str>
<str name="spellcheck.dictionary">wordbreak</str>
<int name="spellcheck.count">5</int>
<str name="spellcheck.alternativeTermCount">15</str>
<str name="spellcheck.collate">true</str>
<str name="spellcheck.onlyMorePopular">false</str>
<str name="spellcheck.extendedResults">true</str>
<str name="spellcheck.maxCollations">100</str>
<str name="spellcheck.collateParam.mm">100%</str>
<str name="spellcheck.collateParam.q.op">AND</str>
<str name="spellcheck.maxCollationTries">1000</str>

*Rajesh.* On Fri, Feb 13, 2015 at 1:01 PM, Dyer, James james.d...@ingramcontent.com wrote: Nitin, Can you post the full spellcheck response when you query: q=gram_ci:"gone wthh thes wint"&wt=json&indent=true&shards.qt=/spell James Dyer Ingram Content Group -Original Message- From: Nitin Solanki [mailto:nitinml...@gmail.com] Sent: Friday, February 13, 2015 1:05 AM To: solr-user@lucene.apache.org Subject: Re: Collations are not working fine. Hi James Dyer, I did the same as you told me: used WordBreakSolrSpellChecker instead of shingles. But still collations are not coming or working. For instance, I tried to get the collation of "gone with the wind" by searching "gone wthh thes wint" on field=gram_ci but didn't succeed. I am even getting the suggestions "with" for wtth, "the" for thes, "wind" for wint.
Also, "gone with the wind" occurs 167 times in my documents. I don't know whether I am missing something or not. Please check my solr configuration below:

*URL:* localhost:8983/solr/wikingram/spell?q=gram_ci:"gone wthh thes wint"&wt=json&indent=true&shards.qt=/spell

*solrconfig.xml:*

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">textSpellCi</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">gram_ci</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <str name="distanceMeasure">internal</str>
    <float name="accuracy">0.5</float>
    <int name="maxEdits">2</int>
    <int name="minPrefix">0</int>
    <int name="maxInspections">5</int>
    <int name="minQueryLength">2</int>
    <float name="maxQueryFrequency">0.9</float>
    <str name="comparatorClass">freq</str>
  </lst>
  <lst name="spellchecker">
    <str name="name">wordbreak</str>
    <str name="classname">solr.WordBreakSolrSpellChecker</str>
    <str name="field">gram</str>
    <str name="combineWords">true</str>
    <str name="breakWords">true</str>
    <int name="maxChanges">5</int>
  </lst>
</searchComponent>

<requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="df">gram_ci</str>
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck">on</str>
  </lst>
</requestHandler>
Re: Release date for Solr 5
You can either checkout the release branch and build it yourself from: http://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_5_0 or download it from the RC here: http://people.apache.org/~anshum/staging_area/lucene-solr-5.0.0-RC3-rev1659987 You should remember that this is a release candidate and not a release at this point. On Tue, Feb 17, 2015 at 12:13 AM, CKReddy Bhimavarapu chaitu...@gmail.com wrote: Hi, Can i get any developer version to test and run for now. On Tue, Feb 17, 2015 at 12:45 PM, Anshum Gupta ans...@anshumgupta.net wrote: There's a vote going on for the 3rd release candidate of Solr / Lucene 5.0. If everything goes smooth and the vote passes, the release should happen in about 4-5 days. On Mon, Feb 16, 2015 at 10:09 PM, CKReddy Bhimavarapu chaitu...@gmail.com wrote: What is the anticipated release date for Solr 5? -- ckreddybh. chaitu...@gmail.com -- Anshum Gupta http://about.me/anshumgupta -- ckreddybh. chaitu...@gmail.com -- Anshum Gupta http://about.me/anshumgupta
Sort collation on hits.
Hi all, I want to sort the collations by hits in descending order. How can I do that?
Re: Solr suggest is related to second letter, not to initial letter
First of all, thank you for your answer. Example: doc 1 suggest_field: "galaxy samsung s5 phone"; doc 2 suggest_field: "shoe adidas 2 hiking". URL: http://localhost:8983/solr/solr/suggest?q=galaxy+s The result I am waiting for is just like the one indicated below. The "galaxy shoe" suggestion isn't supposed to appear; unfortunately, it appears now.

<lst name="collation">
  <str name="collationQuery">galaxy samsung</str>
  <int name="hits">0</int>
  <lst name="misspellingsAndCorrections">
    <str name="galaxy">galaxy</str>
    <str name="samsung">samsung</str>
  </lst>
</lst>
<lst name="collation">
  <str name="collationQuery">galaxy s5</str>
  <int name="hits">0</int>
  <lst name="misspellingsAndCorrections">
    <str name="galaxy">galaxy</str>
    <str name="s5">s5</str>
  </lst>
</lst>

I don't want to use KeywordTokenizer, because as long as the compound words written by the user are available in any document, I am able to get a result. I just don't want "q=galaxy + samsung" to appear, because it is an inappropriate suggestion and it doesn't work. Many thanks ahead of time!
My settings:

<searchComponent class="solr.SpellCheckComponent" name="suggest">
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
    <str name="field">suggestions</str>
    <float name="threshold">0.1</float>
    <str name="buildOnCommit">true</str>
  </lst>
  <str name="queryAnalyzerFieldType">suggest_term</str>
</searchComponent>

<!-- auto-complete -->
<requestHandler name="/suggest" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.build">false</str>
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck.onlyMorePopular">true</str>
    <str name="spellcheck.count">10</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.collateExtendedResults">true</str>
    <str name="spellcheck.maxCollations">10</str>
    <str name="spellcheck.maxCollationTries">100</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>

<fieldType name="suggest_term" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-PunctuationToSpace.txt"/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TurkishLowerCaseFilterFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-PunctuationToSpace.txt"/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.ApostropheFilterFactory"/>
    <filter class="solr.TurkishLowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
  </analyzer>
</fieldType>

On 16 Feb 2015, at 03:52, Michael Sokolov
msoko...@safaribooksonline.com wrote: StandardTokenizer splits your text into tokens, and the suggester suggests tokens independently. It sounds as if you want the suggestions to be based on the entire text (not just the current word), and that only adjacent words in the original should appear as suggestions. Assuming that's what you are after (it's a little hard to tell from your e-mail -- you might want to clarify by providing a few examples of how you *do* want it to work instead of just examples of how you *don't* want it to work), you have a couple of choices:
1) don't use StandardTokenizer, use KeywordTokenizer instead - this will preserve the entire original text and suggest complete texts, rather than words
2) maybe consider using a shingle filter along with the standard tokenizer, so that your tokens include multi-word shingles
3) use a suggester with better support for a statistical language model, like this one: http://blog.mikemccandless.com/2014/01/finding-long-tail-suggestions-using.html, but to do this you will probably need to do some Java programming since it isn't well integrated into Solr
-Mike On 2/14/2015 3:44 AM, Volkan Altan wrote: Any idea? On 12 Feb 2015, at 11:12, Volkan Altan volkanal...@gmail.com wrote: Hello Everyone, All I want to do with Solr suggester is obtaining the
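Michael's option 2 (shingles alongside StandardTokenizer) could look roughly like the sketch below; the field type name and shingle sizes are illustrative assumptions, not taken from Volkan's schema:

```xml
<!-- Illustrative only: emit 2- and 3-word shingles in addition to single
     tokens, so suggestions can be multi-word phrases from one document. -->
<fieldType name="suggest_shingle" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ShingleFilterFactory" minShingleSize="2"
            maxShingleSize="3" outputUnigrams="true"/>
  </analyzer>
</fieldType>
```

With outputUnigrams="true" the field still matches single-word prefixes, while the shingles keep suggested word pairs adjacent in the original text.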
Re: Collations are not working fine.
Hi Charles, Will you please send the configuration you tried? It will help solve my problem. Have you sorted the collations on hits or on frequencies of suggestions? If you did, please assist me. On Mon, Feb 16, 2015 at 7:59 PM, Reitzel, Charles charles.reit...@tiaa-cref.org wrote: I have been working with collations the last couple of days, and I kept adding collation-related parameters until it started working for me. It seems I needed <str name="spellcheck.collateMaxCollectDocs">50</str>. But I am using the Suggester with the WFSTLookupFactory. Also, I needed to patch the suggester to get frequency information in the spellcheck response. -Original Message- From: Rajesh Hazari [mailto:rajeshhaz...@gmail.com] Sent: Friday, February 13, 2015 3:48 PM To: solr-user@lucene.apache.org Subject: Re: Collations are not working fine. Hi Nitin, Can you try with the below config? We have these config and they seem to be working for us.

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">text_general</str>
  <lst name="spellchecker">
    <str name="name">wordbreak</str>
    <str name="classname">solr.WordBreakSolrSpellChecker</str>
    <str name="field">textSpell</str>
    <str name="combineWords">true</str>
    <str name="breakWords">false</str>
    <int name="maxChanges">5</int>
  </lst>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">textSpell</str>
    <str name="classname">solr.IndexBasedSpellChecker</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
    <str name="accuracy">0.75</str>
    <float name="thresholdTokenFrequency">0.01</float>
    <str name="buildOnCommit">true</str>
    <str name="spellcheck.maxResultsForSuggest">5</str>
  </lst>
</searchComponent>

<str name="spellcheck">true</str>
<str name="spellcheck.dictionary">default</str>
<str name="spellcheck.dictionary">wordbreak</str>
<int name="spellcheck.count">5</int>
<str name="spellcheck.alternativeTermCount">15</str>
<str name="spellcheck.collate">true</str>
<str name="spellcheck.onlyMorePopular">false</str>
<str name="spellcheck.extendedResults">true</str>
<str name="spellcheck.maxCollations">100</str>
<str name="spellcheck.collateParam.mm">100%</str>
<str name="spellcheck.collateParam.q.op">AND</str>
<str name="spellcheck.maxCollationTries">1000</str>

*Rajesh.* On Fri, Feb 13, 2015 at 1:01 PM, Dyer, James james.d...@ingramcontent.com wrote: Nitin, Can you post the full spellcheck response when you query: q=gram_ci:"gone wthh thes wint"&wt=json&indent=true&shards.qt=/spell James Dyer Ingram Content Group -Original Message- From: Nitin Solanki [mailto:nitinml...@gmail.com] Sent: Friday, February 13, 2015 1:05 AM To: solr-user@lucene.apache.org Subject: Re: Collations are not working fine. Hi James Dyer, I did the same as you told me: used WordBreakSolrSpellChecker instead of shingles. But still collations are not coming or working. For instance, I tried to get the collation of "gone with the wind" by searching "gone wthh thes wint" on field=gram_ci but didn't succeed. I am even getting the suggestions "with" for wtth, "the" for thes, "wind" for wint. Also, "gone with the wind" occurs 167 times in my documents. I don't know whether I am missing something or not.
Please check my solr configuration below:

*URL:* localhost:8983/solr/wikingram/spell?q=gram_ci:"gone wthh thes wint"&wt=json&indent=true&shards.qt=/spell

*solrconfig.xml:*

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">textSpellCi</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">gram_ci</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <str name="distanceMeasure">internal</str>
    <float name="accuracy">0.5</float>
    <int name="maxEdits">2</int>
    <int name="minPrefix">0</int>
    <int name="maxInspections">5</int>
    <int name="minQueryLength">2</int>
    <float name="maxQueryFrequency">0.9</float>
    <str name="comparatorClass">freq</str>
  </lst>
  <lst name="spellchecker">
    <str name="name">wordbreak</str>
    <str name="classname">solr.WordBreakSolrSpellChecker</str>
    <str name="field">gram</str>
    <str name="combineWords">true</str>
    <str name="breakWords">true</str>
    <int name="maxChanges">5</int>
  </lst>
</searchComponent>

<requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="df">gram_ci</str>
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck">on</str>
    <str name="spellcheck.extendedResults">true</str>
    <str name="spellcheck.count">25</str>
    <str name="spellcheck.onlyMorePopular">true</str>
    <str name="spellcheck.maxResultsForSuggest">1</str>
    <str name="spellcheck.alternativeTermCount">25</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.maxCollations">50</str>
    <str name="spellcheck.maxCollationTries">50</str>
    <str name="spellcheck.collateExtendedResults">true</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
Re: Release date for Solr 5
You can help by testing out the release candidate available from: http://people.apache.org/~anshum/staging_area/lucene-solr-5.0.0-RC3-rev1659987 Note that this is *NOT* an official release. On Tue, Feb 17, 2015 at 1:43 PM, CKReddy Bhimavarapu chaitu...@gmail.com wrote: Hi, Can i get any developer version to test and run for now. On Tue, Feb 17, 2015 at 12:45 PM, Anshum Gupta ans...@anshumgupta.net wrote: There's a vote going on for the 3rd release candidate of Solr / Lucene 5.0. If everything goes smooth and the vote passes, the release should happen in about 4-5 days. On Mon, Feb 16, 2015 at 10:09 PM, CKReddy Bhimavarapu chaitu...@gmail.com wrote: What is the anticipated release date for Solr 5? -- ckreddybh. chaitu...@gmail.com -- Anshum Gupta http://about.me/anshumgupta -- ckreddybh. chaitu...@gmail.com -- Regards, Shalin Shekhar Mangar.
Re: Weird Solr Replication Slave out of sync
Hi, This sounds quite strange. Do you see any error messages, either on the Solr admin's replication page or in the master's or slave's logs? When we had issues with a slave replicating from the master, they related to the slave running out of disk. I'm sure there could be a bunch of other reasons for failed replication, but those should generally be evident in the logs. On Tue, Feb 17, 2015 at 7:46 AM, Summer Shire shiresum...@gmail.com wrote: Hi All, My master's and slave's index version and generation are the same, yet the indexes are not in sync: when I execute the same query on both master and slave, I see old docs on the slave which should not be there. I also tried to fetch a specific index version on the slave using command=fetchindex&indexversion=latestMasterVersion This is very spooky because I do not get any errors on master or slave. I also see in the logs that the slave is polling the master every 15 mins. I was able to find this issue only because I was looking at a specific old document. Now I can manually delete the index folder on the slave and restart my slave, but I really want to find out what could be going on, because these types of issues are going to be hard to find, especially when there are no errors. What could be happening, and how can I avoid it from happening? Thanks, Summer -- Dmitry Kan Luke Toolbox: http://github.com/DmitryKey/luke Blog: http://dmitrykan.blogspot.com Twitter: http://twitter.com/dmitrykan SemanticAnalyzer: www.semanticanalyzer.info
Checkout the source Code to the Release Version of Solr?
At this time the latest released version of Solr is 4.10.3. Is there any way we can get the source code for this release version? I tried to check out the Solr code from http://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_4_10/ In the commit log, I see a number of revisions, but nothing mentions which one is the release version. The latest revision is 1657441 on Feb 4. Does this correspond to 4.10.3? If not, how do I go about getting the source code of 4.10.3? I'm also curious where the version number is embedded, i.e. is it in a file somewhere? I want to ensure I am using the released version, and not some bug fixes added after the version got released. Thank you in anticipation. -- View this message in context: http://lucene.472066.n3.nabble.com/Checkout-the-source-Code-to-the-Release-Version-of-Solr-tp4187041.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Collations are not working fine.
Hi Nitin, I was trying many different options for a couple of different queries. In fact, I have collations working OK now with the Suggester and WFSTLookup. The problem may have been due to a different dictionary and/or lookup implementation and the specific options I was sending. In general, we're using spellcheck for search suggestions. The Suggester component (vs. the Suggester spellcheck implementation) doesn't handle all of our cases, but we can get things working using the spellcheck interface. What gives us particular trouble are the cases where a term may be valid by itself but also be the start of longer words. The specific terms are acronyms specific to our business, but I'll attempt to show generic examples. E.g. a partial term like "fo" can expand to fox, fog, etc., and a full term like "brown" can also expand to something like brownstone. And, yes, the collation "brownstone fox" is nonsense, but assume, for the sake of argument, it appears in our documents somewhere. For a multiple-term query with a spelling error (or partially typed term), "brown fo", we get collations in order of hits, descending, like: brown fox, brown fog, brownstone fox. So far, so good. For a single-term query, "brown", we get a single suggestion, "brownstone", and no collations. So we don't know to keep the term "brown"! At this point, we need spellcheck.extendedResults=true and to look at the origFreq value in the suggested corrections. Unfortunately, the Suggester (spellcheck dictionary) does not populate the original frequency information, and without this information the SpellCheckComponent cannot format the extended results. However, with a simple change to Suggester.java, it was easy to get the needed frequency information and use it to make a sound decision to keep or drop the input term. But I'd be much obliged if there is a better way to go about it. Configs below.
Thanks, Charlie

<!-- SpellCheck component -->
<searchComponent class="solr.SpellCheckComponent" name="suggestSC">
  <lst name="spellchecker">
    <str name="name">suggestDictionary</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.WFSTLookupFactory</str>
    <str name="field">text_all</str>
    <float name="threshold">0.0001</float>
    <str name="exactMatchFirst">true</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>

<!-- Request Handler -->
<requestHandler name="/tcSuggest" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="title">Search Suggestions (spellcheck)</str>
    <str name="echoParams">explicit</str>
    <str name="wt">json</str>
    <str name="rows">0</str>
    <str name="defType">edismax</str>
    <str name="df">text_all</str>
    <str name="fl">id,name,ticker,entityType,transactionType,accountType</str>
    <str name="spellcheck">true</str>
    <str name="spellcheck.count">5</str>
    <str name="spellcheck.dictionary">suggestDictionary</str>
    <str name="spellcheck.alternativeTermCount">5</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.extendedResults">true</str>
    <str name="spellcheck.maxCollationTries">10</str>
    <str name="spellcheck.maxCollations">5</str>
  </lst>
  <arr name="last-components">
    <str>suggestSC</str>
  </arr>
</requestHandler>

-Original Message- From: Nitin Solanki [mailto:nitinml...@gmail.com] Sent: Tuesday, February 17, 2015 3:17 AM To: solr-user@lucene.apache.org Subject: Re: Collations are not working fine. Hi Charles, Will you please send the configuration you tried? It will help solve my problem. Have you sorted the collations on hits or on frequencies of suggestions? If you did, please assist me. On Mon, Feb 16, 2015 at 7:59 PM, Reitzel, Charles charles.reit...@tiaa-cref.org wrote: I have been working with collations the last couple of days, and I kept adding collation-related parameters until it started working for me. It seems I needed <str name="spellcheck.collateMaxCollectDocs">50</str>. But I am using the Suggester with the WFSTLookupFactory.
Also, I needed to patch the suggester to get frequency information in the spellcheck response. -Original Message- From: Rajesh Hazari [mailto:rajeshhaz...@gmail.com] Sent: Friday, February 13, 2015 3:48 PM To: solr-user@lucene.apache.org Subject: Re: Collations are not working fine. Hi Nitin, Can you try with the below config? We have these config and they seem to be working for us.

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">text_general</str>
  <lst name="spellchecker">
    <str name="name">wordbreak</str>
    <str name="classname">solr.WordBreakSolrSpellChecker</str>
    <str name="field">textSpell</str>
    <str name="combineWords">true</str>
    <str name="breakWords">false</str>
    <int name="maxChanges">5</int>
  </lst>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">textSpell</str>
    <str name="classname">solr.IndexBasedSpellChecker</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
    <str name="accuracy">0.75</str>
    <float name="thresholdTokenFrequency">0.01</float>
  </lst>
</searchComponent>
Re: Checkout the source Code to the Release Version of Solr?
Hi, You can get the released code base here https://github.com/apache/lucene-solr/releases Thanks Hrishikesh On Tue, Feb 17, 2015 at 2:20 PM, O. Olson olson_...@yahoo.it wrote: At this time the latest released version of Solr is 4.10.3. Is there anyway we can get the source code for this release version? I tried to checkout the Solr code from http://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_4_10/ In the commit log, I see a number of revisions but nothing mention which is the release version. The latest revision being 1657441 on Feb 4. Does this correspond to 4.10.3? If no, then how do I go about getting the source code of 4.10.3. I'm also curious where the version number is embedded i.e. is it in a file somewhere? I want to ensure I am using the released version, and not some bug fixes after the version got released. Thank you in anticipation. -- View this message in context: http://lucene.472066.n3.nabble.com/Checkout-the-source-Code-to-the-Release-Version-of-Solr-tp4187041.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Checkout the source Code to the Release Version of Solr?
Also, the version number is encoded (at least) in the build file: https://github.com/apache/lucene-solr/blob/817303840fce547a1557e330e93e5a8ac0618f34/lucene/common-build.xml#L32 Hope this helps. Thanks Hrishikesh On Tue, Feb 17, 2015 at 2:25 PM, Hrishikesh Gadre gadre.s...@gmail.com wrote: Hi, You can get the released code base here: https://github.com/apache/lucene-solr/releases
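As noted elsewhere in this thread, released source also lives under the Subversion `tags` directory rather than `branches`; a checkout of the 4.10.3 tag would look like this sketch (the target directory name is arbitrary):

```
svn checkout http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_4_10_3/ lucene_solr_4_10_3
```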
Better way of copying/backup of index in Solr 4.10.2
What is the best way to copy or back up an index in Solr 4.10.2? -- Best Regards, Dinesh Naik
unusually high 4.10.2 vs 4.3.1 RAM consumption
Hi, We are currently comparing the RAM consumption of two parallel Solr clusters running different Solr versions: 4.10.2 and 4.3.1. For comparable index sizes of a shard (20G and 26G), we observed a 9G vs 5.6G RAM footprint (reserved RAM as seen by top), 4.3.1 being the winner. We have not changed solrconfig.xml for the upgrade to 4.10.2 and have reindexed the data from scratch. The commits are all controlled on the client, i.e. no auto-commits. Solr: 4.10.2 (high load, mass indexing) Java: 1.7.0_76 (Oracle) -Xmx25600m Solr: 4.3.1 (normal load, no mass indexing) Java: 1.7.0_11 (Oracle) -Xmx25600m The RAM consumption remained the same after the load stopped on the 4.10.2 cluster. Manually triggering garbage collection on a 4.10.2 shard via jvisualvm dropped the used RAM from 8.5G to 0.5G, but the reserved RAM as seen by top remained at the 9G level. This unusual spike happened during mass data indexing. What else could account for such a difference -- Solr or the JVM? Can it only be explained by the mass indexing? What is worrisome is that the 4.10.2 shard reserves 8x what it uses. What can be done about this? -- Dmitry Kan Luke Toolbox: http://github.com/DmitryKey/luke Blog: http://dmitrykan.blogspot.com Twitter: http://twitter.com/dmitrykan SemanticAnalyzer: www.semanticanalyzer.info
Re: unusually high 4.10.2 vs 4.3.1 RAM consumption
Have you found an explanation for that? On Tue, Feb 17, 2015 at 12:12 PM, Markus Jelsma markus.jel...@openindex.io wrote: We have seen an increase between 4.8.1 and 4.10. -- Dmitry Kan Luke Toolbox: http://github.com/DmitryKey/luke Blog: http://dmitrykan.blogspot.com Twitter: http://twitter.com/dmitrykan SemanticAnalyzer: www.semanticanalyzer.info
Re: Possibility of Indexing without feeding again in Solr 4.10.2
On 17 February 2015 at 15:18, dinesh naik dineshkumarn...@gmail.com wrote: Hi all, How can we do re-indexing in Solr without importing the data again? Is there a way to re-index only a few documents? If you have a unique ID for your documents, updating the index with that ID will update just that document. Other than that, you need to import all your data again if you want to change the Solr index. Regards, Gora
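In practice, updating by unique ID means re-sending the document, or sending an atomic update for individual fields. A hedged sketch as an HTTP request (core name, ID, and field names are hypothetical; the `set` atomic-update modifier requires the document's other fields to be stored):

```
POST http://localhost:8983/solr/collection1/update?commit=true
Content-Type: application/json

[{"id": "doc42", "title": {"set": "New title"}}]
```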
Possibility of Indexing without feeding again in Solr 4.10.2
Hi all, How can we do re-indexing in Solr without importing the data again? Is there a way to re-index only a few documents? -- Best Regards, Dinesh Naik
Re: Better way of copying/backup of index in Solr 4.10.2
On 17 February 2015 at 15:19, dinesh naik dineshkumarn...@gmail.com wrote: What is the best way for copying/backup of index in Solr 4.10.2? Please take a look at https://cwiki.apache.org/confluence/display/solr/Backing+Up Regards, Gora
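The page above covers the ReplicationHandler route. As a sketch, with the replication handler enabled in solrconfig.xml, an online backup can be triggered over HTTP (core name and location are hypothetical):

```
http://localhost:8983/solr/core1/replication?command=backup&location=/path/to/backups
```

This snapshots the index as of the latest commit point, which is generally safer than copying the data directory of a live core by hand.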
Re: unusually high 4.10.2 vs 4.3.1 RAM consumption
Thanks Toke! Now I consistently see the saw-tooth pattern on two shards with the new GC parameters; next I will try your suggestion. The current params are: -Xmx25600m -XX:+UseParNewGC -XX:+ExplicitGCInvokesConcurrent -XX:+UseConcMarkSweepGC -XX:MaxTenuringThreshold=8 -XX:CMSInitiatingOccupancyFraction=40 Dmitry On Tue, Feb 17, 2015 at 1:34 PM, Toke Eskildsen t...@statsbiblioteket.dk wrote: On Tue, 2015-02-17 at 11:05 +0100, Dmitry Kan wrote: Solr: 4.10.2 (high load, mass indexing) Java: 1.7.0_76 (Oracle) -Xmx25600m Solr: 4.3.1 (normal load, no mass indexing) Java: 1.7.0_11 (Oracle) -Xmx25600m The RAM consumption remained the same after the load stopped on the 4.10.2 cluster. Manually triggering garbage collection on a 4.10.2 shard via jvisualvm dropped the used RAM from 8.5G to 0.5G, but the reserved RAM as seen by top remained at the 9G level. As the JVM does not free OS memory once allocated, top just shows whatever peak it reached at some point. When you tell the JVM that it is free to use 25GB, it makes a lot of sense for it to allocate a fair chunk of that instead of garbage collecting during a period of high usage (mass indexing, for example). What else could account for such a difference -- Solr or the JVM? Can it only be explained by the mass indexing? What is worrisome is that the 4.10.2 shard reserves 8x what it uses. If you set your Xmx to a lot less, the JVM will probably favour more frequent garbage collections over extra heap allocation. - Toke Eskildsen, State and University Library, Denmark -- Dmitry Kan Luke Toolbox: http://github.com/DmitryKey/luke Blog: http://dmitrykan.blogspot.com Twitter: http://twitter.com/dmitrykan SemanticAnalyzer: www.semanticanalyzer.info
RE: unusually high 4.10.2 vs 4.3.1 RAM consumption
I would have shared it if I had one :) -Original message- From: Dmitry Kan solrexp...@gmail.com Sent: Tuesday 17th February 2015 11:40 To: solr-user@lucene.apache.org Subject: Re: unusually high 4.10.2 vs 4.3.1 RAM consumption Have you found an explanation for that?
spellcheck.count v/s spellcheck.alternativeTermCount
Hello everyone, I am confused about the difference between spellcheck.count and spellcheck.alternativeTermCount in Solr. Could anyone explain it in detail?
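My understanding from the Solr spellcheck documentation: `spellcheck.count` caps how many suggestions are returned for a query term that is absent from the index, while `spellcheck.alternativeTermCount` caps the suggestions returned for a term that does exist in the index (useful for context-sensitive "did you mean" together with `spellcheck.maxResultsForSuggest`). A request sketch, with a hypothetical field and query term:

```
/select?q=textSpell:mispeled
    &spellcheck=true
    &spellcheck.count=10
    &spellcheck.alternativeTermCount=5
    &spellcheck.maxResultsForSuggest=5
```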
Block Join Query Parsers regular expression feature workaround req
Hi, I need a query that selects only those parent docs none of whose children have the specified value in a field, i.e. something like this: http://localhost:8983/solr/core1/select?q={!parent which=contentType:parent}childField:NOT value1 The problem is that the NOT operator is not supported in the Block Join Query Parser. Could anyone please suggest a workaround for this problem? I have also posted the question on Stack Overflow: http://stackoverflow.com/questions/28562355/in-solr-does-block-join-query-parsers-lack-regular-expression-feature Regards Sankalp Gupta
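One possible workaround, untested and assuming the default lucene query parser plus the field names from the question: rather than negating inside the block-join parser, select all parents and subtract those parents that do have a matching child, nesting the block-join query via the `_query_` pseudo-field:

```
q=+contentType:parent -_query_:"{!parent which=contentType:parent}childField:value1"
```

(URL-encode this when sending it over HTTP.) The intent is that only parents for which no child matches `childField:value1` remain in the result set.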