Re: Please add to contributers group

2015-02-16 Thread Erick Erickson
Done.


On Mon, Feb 16, 2015 at 1:58 AM, Christian Marquardt
christianmarqua...@gmx.net wrote:
 Hi,

 please add me to the contributors group.

 Username: ChristianMarquardt https://wiki.apache.org/solr/ChristianMarquardt

 Best Regards

 Christian


 Best regards

 Christian Marquardt

 Tannenweg 43
 86391 Stadtbergen
 +49-179-9735764
 christianmarqua...@gmx.net



Re: Solr cloud confusion?

2015-02-16 Thread Erick Erickson
First, if master/slave suits your use-case, there's no reason to go to
SolrCloud.
However, the following are some of the things you get with SolrCloud:

1. Automatic document routing (irrelevant if you don't have enough
docs to need more than one shard).

2. Automatic fail-over/recovery if nodes go down, without having to
implement re-indexing strategies for docs that might have been indexed
to the master but not yet replicated to the slaves.

3. Near Real Time (NRT) searches on all the nodes. In master/slave
setups you can't search a doc until it has been both indexed and
replicated, which probably takes many minutes.

4. Centralized configuration management via ZooKeeper.

5+. There's lots more, but these are the high points.

None of these are relevant in all situations. For instance, say you
have an application that re-indexes your entire corpus once a day and
your corpus fits on a single shard. Then there's no compelling reason to go
to SolrCloud. In fact, there's added complexity you can just avoid.
Adding capacity is just adding slaves, as you indicated.

Contrast that with a large collection spanning, say, 10 shards that is
continuously getting updated, and add the requirement that new
documents be searchable within 10 seconds of being received by the
system-of-record. Add that nodes may come up and go down for some
reason, and you get the idea ;).

SolrCloud is really not about performance measured by query response
time as much as management of large, multi-sharded collections.

Best,
Erick


On Sun, Feb 15, 2015 at 9:05 PM, CKReddy Bhimavarapu
chaitu...@gmail.com wrote:
 Dear all,

 I don't understand when and where we should use SolrCloud to get the
 best performance.  How do we know the peak point of a Solr master server?

 My question is: when Solr can't handle the requests, i.e. we have reached
 the peak point, will increasing my master configuration
 solve the problem?

 thanks in advance,
 ckreddybh. chaitu...@gmail.com


AND query not working on stopwords as expected

2015-02-16 Thread Arun Rangarajan
Solr version 4.2.1

In my schema, I have text type defined as follows:
---
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">

  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt"
            ignoreCase="true"/>
    <filter class="solr.WordDelimiterFilterFactory"
            preserveOriginal="1" generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="0" catenateAll="1"
            splitOnCaseChange="1"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>

  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt"
            ignoreCase="true"/>
    <filter class="solr.WordDelimiterFilterFactory"
            preserveOriginal="1" generateWordParts="1" generateNumberParts="1"
            catenateWords="0" catenateNumbers="0" catenateAll="0"
            splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>

</fieldType>
---

Field name is of type text.

I have another multi-valued int field called all_class_ids.

Both fields are indexed. I have 'of' in stopwords.txt file.

I am using lucene query parser.

This query
q=name:of&rows=0
gives no results, as expected.

However, this query:
q=name:of AND all_class_ids:(371)&rows=0
gives results, and the number of results is the same as for
q=all_class_ids:(371)&rows=0

This is happening only for stopwords. Why?

Thanks.


Re: AND query not working on stopwords as expected

2015-02-16 Thread Yonik Seeley
On Mon, Feb 16, 2015 at 4:32 PM, Arun Rangarajan
arunrangara...@gmail.com wrote:
[...]
 This query
 q=name:of&rows=0
 gives no results, as expected.

 However, this query:
 q=name:of AND all_class_ids:(371)&rows=0
 gives results, and the number of results is the same as for
 q=all_class_ids:(371)&rows=0

 This is happening only for stopwords. Why?

This is more of a full-text search thing.
Removal of stopwords is more like a "don't care, it's not important".
Hence a query for "a plane" should return all documents containing
"plane", ignoring the question of whether the document contained an "a"
(which we can't tell since stopwords were removed during indexing).

Now I understand your point about consistency too.  Using the example
above, something like q=name:of should arguably match all documents
(or at least all documents with a name field).  It is very odd to
add an additional restriction and end up with more docs.

-Yonik


Re: AND query not working on stopwords as expected

2015-02-16 Thread Erick Erickson
Query parsing is not strict boolean logic. This trips up many people,
even though AND, NOT and OR are used. See:
https://lucidworks.com/blog/why-not-and-or-and-not/
I think what you've really got at the top level is a single MUST
clause, namely all_class_ids:(371).

What is _not_ happening here is a set intersection, as it would if the
logic were strictly boolean, and I suspect that expectation is what's
misleading you.

If that's not the case, post the results of adding &debug=query to
your URL; that'll help.
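
For anyone following along, here is a minimal Python sketch of building that request with debug=query added. The host, port and core name (localhost:8983, collection1) are placeholders and not taken from this thread; only the query string and the debug parameter come from the messages above.

```python
from urllib.parse import urlencode

# Hypothetical base URL: host, port and core name are assumptions.
SOLR_SELECT = "http://localhost:8983/solr/collection1/select"


def debug_query_url(q, base=SOLR_SELECT):
    """Return a select URL that asks Solr to include the parsed query."""
    params = {"q": q, "rows": 0, "debug": "query", "wt": "json"}
    # urlencode escapes ':', '(' and ')' and turns spaces into '+',
    # and inserts the '&' separators between parameters.
    return base + "?" + urlencode(params)


url = debug_query_url("name:of AND all_class_ids:(371)")
print(url)
```

Letting a library assemble the URL also avoids losing the & separators between parameters.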


Best,
Erick

On Mon, Feb 16, 2015 at 1:32 PM, Arun Rangarajan
arunrangara...@gmail.com wrote:
 [...]


Custom facet.sort

2015-02-16 Thread Ryan Josal
Hey guys, I have a desire to order (field) facets by their order of
appearance in the search results.

When I first thought about it, I figured there would be some way to plug a
custom Comparator into FacetComponent and link it to facet.sort=rank or
something like that, but not only is there no real way to plug in a custom
sort (nor is subclassing the component feasible), the complexity is further
compounded by the fact faceting really only operates on the docset and so
scores aren't available.  If max(score) was an attribute of a facetcount
object this type of sort could be done.  sum(score) might also be
interesting for a weighted approach.  I can imagine performance concerns
with doing this though.  Operating on the doclist isn't enough because it's
only a slice of the results.  What if I reduce my scope to only needing the
top 2 facets in order?  It still seems to be just as complex because you
have to start from the first page and request an extra long docslice from
QueryComponent by hacking the start/rows params, and for all you know you
need to get to the last document to get all the facets.
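
One client-side workaround, sketched in Python under clear assumptions: the caller has already fetched a ranked slice of documents, and the facet field (here the hypothetical "brand") is present on each document. It orders facet values by the rank at which they first appear. It is not a FacetComponent plug-in, and its counts only cover the fetched slice, which is exactly the limitation described above.

```python
def facets_by_first_appearance(ranked_docs, field):
    """Order facet values by the rank of the first document that has them.

    ranked_docs: list of dicts, already in result (score) order.
    Returns a list of (value, count_in_slice, first_rank) tuples.
    """
    first_rank = {}
    counts = {}
    for rank, doc in enumerate(ranked_docs):
        values = doc.get(field, [])
        if not isinstance(values, list):
            values = [values]  # single-valued field
        for value in values:
            first_rank.setdefault(value, rank)
            counts[value] = counts.get(value, 0) + 1
    return sorted(
        ((v, counts[v], first_rank[v]) for v in counts),
        key=lambda item: item[2],
    )


# Hypothetical result slice, best match first.
docs = [{"brand": "acme"}, {"brand": ["zenith", "acme"]}, {"brand": "bolt"}]
print(facets_by_first_appearance(docs, "brand"))
```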

So does anyone have any ideas of how to implement this?  Maybe it isn't
even through faceting.

Ryan


Re: Too many merges, stalling...

2015-02-16 Thread Erick Erickson
I guess my first question is why you're splitting up your shards this way. You
may have very good reasons, but as you outline, a huge amount of your work is
on a single shard.

Is it even possible to spread docs randomly instead, thus spreading the load
over the entire cluster rather than the hot shard?

As you can probably tell, I don't have much specific to say, but LUCENE-6161
seems quite possible in your situation. Of course turning off deletes would
pinpoint whether that's really the problem.

Best,
Erick

On Mon, Feb 16, 2015 at 7:12 PM, ralph tice ralph.t...@gmail.com wrote:
 [...]


Re: Release date for Solr 5

2015-02-16 Thread Anshum Gupta
There's a vote going on for the 3rd release candidate of Solr / Lucene 5.0.
If everything goes smoothly and the vote passes, the release should happen in
about 4-5 days.

On Mon, Feb 16, 2015 at 10:09 PM, CKReddy Bhimavarapu chaitu...@gmail.com
wrote:

 What is the anticipated release date for Solr 5?

 --
 ckreddybh. chaitu...@gmail.com




-- 
Anshum Gupta
http://about.me/anshumgupta


Re: AND query not working on stopwords as expected

2015-02-16 Thread Alexandre Rafalovitch
On 16 February 2015 at 19:12, Jack Krupansky jack.krupan...@gmail.com wrote:
 In fact, it would be better to only remove stop words at query time
 when they are not at either end of the query.

And how is that achieved in Solr? This sounds interesting but
stretches my knowledge of the available filters.

Regards,
   Alex.


Sign up for my Solr resources newsletter at http://www.solr-start.com/


Having a spot of trouble setting up /browse

2015-02-16 Thread Benson Margulies
So, I had set up a solr core modelled on the 'multicore' example in 4.10.3,
which has no /browse.

Upon request, I went to set up /browse.

I copied in a minimal version. When I go there, I just get some XML back:

<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">4</int>
<lst name="params"/>
</lst>
<result name="response" numFound="0" start="0" maxScore="0.0"/>
</response>

What else does /browse depend upon?


Re: AND query not working on stopwords as expected

2015-02-16 Thread Jack Krupansky
Notice that I said "would be" rather than "is"!

Yeah, Solr is basically broken WRT intelligent stop word handling, but
nobody wants to admit it. edismax does have some limited support for the
case of the query being all stop words, but that doesn't work for more
complex queries with operators and the case of a leading or trailing
stopword. The old Lucid query parser did have better support for queries
with stop words, but that's no longer available in their current product.

-- Jack Krupansky

On Mon, Feb 16, 2015 at 8:16 PM, Alexandre Rafalovitch arafa...@gmail.com
wrote:

 On 16 February 2015 at 19:12, Jack Krupansky jack.krupan...@gmail.com
 wrote:
  In fact, it would be better to only remove stop words at query time
  when they are not at either end of the query.

 And how is that achieved in Solr? This sounds interesting but
 stretches my knowledge of the available filters.

 Regards,
Alex.

 
 Sign up for my Solr resources newsletter at http://www.solr-start.com/



Re: AND query not working on stopwords as expected

2015-02-16 Thread Alexandre Rafalovitch
Well, there are the CommonGrams and CommonGramsQuery filters (e.g.
http://www.solr-start.com/javadoc/solr-lucene/org/apache/lucene/analysis/commongrams/CommonGramsQueryFilter.html
). But I haven't seen them in use much.
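
To illustrate the general idea behind common grams, here is a simplified Python sketch: every token is kept, and a bigram is added wherever either neighbour is a common word. This is an illustration only, not the Lucene implementation. It ignores token positions, and the query-side filter behaves differently. The "_" separator and the word list are assumptions made for the example.

```python
def common_grams(tokens, common_words, sep="_"):
    """Emit each token, plus a bigram when it or its successor is common."""
    out = []
    for i, token in enumerate(tokens):
        out.append(token)
        if i + 1 < len(tokens):
            nxt = tokens[i + 1]
            if token in common_words or nxt in common_words:
                out.append(token + sep + nxt)
    return out


COMMON = {"the", "a", "of"}  # hypothetical common-word list
print(common_grams(["the", "office"], COMMON))
print(common_grams(["vitamin", "a"], COMMON))
```

With bigrams such as the_office in the index, a phrase containing a stop word can still be matched precisely.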

If the description above (about the first/last token) would actually
be useful, it is probably implementable in a similar fashion. But it
would need a bunch of use-cases and/or reference material to check the
benefits against.

Or it could be just a matter of marking the first/last token as keywords.
Well, it would have been if StopFilter actually checked for
*isKeyword()*. It does not seem to.

Regards,
   Alex.



On 16 February 2015 at 20:33, Jack Krupansky jack.krupan...@gmail.com wrote:
 Notice that I said "would be" rather than "is"!

 Yeah, Solr is basically broken WRT intelligent stop word handling, but
 nobody wants to admit it. edismax does have some limited support for the
 case of the query being all stop words, but that doesn't work for more
 complex queries with operators and the case of a leading or trailing
 stopword. The old Lucid query parser did have better support for queries
 with stop words, but that's no longer available in their current product.

 -- Jack Krupansky

 On Mon, Feb 16, 2015 at 8:16 PM, Alexandre Rafalovitch arafa...@gmail.com
 wrote:

 On 16 February 2015 at 19:12, Jack Krupansky jack.krupan...@gmail.com
 wrote:
  In fact, it would be better to only remove stop words at query time
  when they are not at either end of the query.

 And how is that achieved in Solr? This sounds interesting but
 stretches my knowledge of the available filters.

 Regards,
Alex.

 
 Sign up for my Solr resources newsletter at http://www.solr-start.com/



Re: AND query not working on stopwords as expected

2015-02-16 Thread Jack Krupansky
Specifically, what is happening is that the query parser passes off to the
analyzer for the name field, which removes the stopwords, including "of",
which results in no term to be queried. A Lucene BooleanQuery with no terms
will match... nothing. But then when you add another clause, you have the
combination of an empty term and a specific term, which is equivalent to
just using the specific term. Think of a sequence of terms to be ANDed as a
set - if a term analyzes to no terms, there are no terms to add to the set
of terms to be ANDed.

Diving a little deeper, the AND operator of the two terms simply means
that all terms MUST be present, but since your first term analyzed to no
terms, only one term is present.

Another example where this could happen is a query such as "$,@. AND 371" -
the "$,@." gets parsed as a term, but then all the punctuation gets removed
by the analyzer, leaving no term.
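
The behaviour described above can be mimicked with a toy Python model. It is only a sketch of the logic, not Lucene's code: the analyzer, the stop word list and the sample document are all made up for illustration.

```python
STOPWORDS = {"of", "a", "the"}  # hypothetical stopwords.txt contents


def analyze(text, stopwords=STOPWORDS):
    """Toy analyzer: split on whitespace, lowercase, drop stop words."""
    return [t.lower() for t in text.split() if t.lower() not in stopwords]


def match_and(doc_fields, clauses):
    """AND together (field, text) clauses against one document.

    A clause whose text analyzes to no terms adds nothing to the
    set of required terms, so it silently disappears.
    """
    required = []
    for field, text in clauses:
        for term in analyze(text):
            required.append((field, term))
    if not required:
        return False  # a query with no terms matches nothing
    return all(term in doc_fields.get(field, set()) for field, term in required)


doc = {"name": {"bank", "america"}, "all_class_ids": {"371"}}
print(match_and(doc, [("name", "of")]))
print(match_and(doc, [("name", "of"), ("all_class_ids", "371")]))
```

The first call has no terms left and matches nothing. The second is left with only the all_class_ids term, so it matches whatever that clause alone matches.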

These days, the recommended practice is to keep stopwords in the index but
remove them at query time unless all of the terms in the query are stop
words. In fact, it would be better to only remove stop words at query time
when they are not at either end of the query. This way, queries such as "to
be or not to be", "vitamin a", and "the office" can still provide
meaningful and precise matches even as stop words are generally ignored.
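
As a rough Python sketch of that policy (a hypothetical helper, not an existing Solr filter): keep every token when the whole query consists of stop words, and otherwise drop stop words only when they are not the first or last token.

```python
def strip_inner_stopwords(tokens, stopwords):
    """Drop stop words except at either end of the query.

    If every token is a stop word, the query is returned unchanged.
    """
    if all(t.lower() in stopwords for t in tokens):
        return list(tokens)
    last = len(tokens) - 1
    return [
        t for i, t in enumerate(tokens)
        if i in (0, last) or t.lower() not in stopwords
    ]


STOP = {"to", "be", "or", "not", "a", "the", "of"}  # illustrative list
for query in ("to be or not to be", "vitamin a", "the office", "bank of america"):
    print(query, "->", strip_inner_stopwords(query.split(), STOP))
```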



-- Jack Krupansky

On Mon, Feb 16, 2015 at 4:32 PM, Arun Rangarajan arunrangara...@gmail.com
wrote:

 [...]



Re: Having a spot of trouble setting up /browse

2015-02-16 Thread Alexandre Rafalovitch
Velocity libraries and .vm templates as a first step! Did you get those set up?

Regards,
   Alex.



On 16 February 2015 at 19:33, Benson Margulies ben...@basistech.com wrote:
 So, I had set up a solr core modelled on the 'multicore' example in 4.10.3,
 which has no /browse.

 Upon request, I went to set up /browse.

 I copied in a minimal version. When I go there, I just get some XML back:

 <response>
 <lst name="responseHeader">
 <int name="status">0</int>
 <int name="QTime">4</int>
 <lst name="params"/>
 </lst>
 <result name="response" numFound="0" start="0" maxScore="0.0"/>
 </response>

 What else does /browse depend upon?


RE: Collations are not working fine.

2015-02-16 Thread Reitzel, Charles
I have been working with collations the last couple days and I kept adding the
collation-related parameters until it started working for me.   It seems I
needed <str name="spellcheck.collateMaxCollectDocs">50</str>.

But, I am using the Suggester with the WFSTLookupFactory.

Also, I needed to patch the suggester to get frequency information in the 
spellcheck response.

-Original Message-
From: Rajesh Hazari [mailto:rajeshhaz...@gmail.com] 
Sent: Friday, February 13, 2015 3:48 PM
To: solr-user@lucene.apache.org
Subject: Re: Collations are not working fine.

Hi Nitin,

Can you try the config below? It seems to be working for
us.

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">

  <str name="queryAnalyzerFieldType">text_general</str>

  <lst name="spellchecker">
    <str name="name">wordbreak</str>
    <str name="classname">solr.WordBreakSolrSpellChecker</str>
    <str name="field">textSpell</str>
    <str name="combineWords">true</str>
    <str name="breakWords">false</str>
    <int name="maxChanges">5</int>
  </lst>

  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">textSpell</str>
    <str name="classname">solr.IndexBasedSpellChecker</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
    <str name="accuracy">0.75</str>
    <float name="thresholdTokenFrequency">0.01</float>
    <str name="buildOnCommit">true</str>
    <str name="spellcheck.maxResultsForSuggest">5</str>
  </lst>

</searchComponent>



<str name="spellcheck">true</str>
<str name="spellcheck.dictionary">default</str>
<str name="spellcheck.dictionary">wordbreak</str>
<int name="spellcheck.count">5</int>
<str name="spellcheck.alternativeTermCount">15</str>
<str name="spellcheck.collate">true</str>
<str name="spellcheck.onlyMorePopular">false</str>
<str name="spellcheck.extendedResults">true</str>
<str name="spellcheck.maxCollations">100</str>
<str name="spellcheck.collateParam.mm">100%</str>
<str name="spellcheck.collateParam.q.op">AND</str>
<str name="spellcheck.maxCollationTries">1000</str>


*Rajesh.*

On Fri, Feb 13, 2015 at 1:01 PM, Dyer, James james.d...@ingramcontent.com
wrote:

 Nitin,

 Can you post the full spellcheck response when you query:

 q=gram_ci:gone wthh thes wint&wt=json&indent=true&shards.qt=/spell

 James Dyer
 Ingram Content Group


 -Original Message-
 From: Nitin Solanki [mailto:nitinml...@gmail.com]
 Sent: Friday, February 13, 2015 1:05 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Collations are not working fine.

 Hi James Dyer,
   I did as you told me and used
 WordBreakSolrSpellChecker instead of shingles, but collations
 are still not coming back.
 For instance, I tried to get the collation "gone with the wind" by
 searching for "gone wthh thes wint" on field=gram_ci, but didn't succeed,
 even though I am getting the suggestions wthh as *with*, thes as *the*, and
 wint as *wind*.
 Also, the phrase "gone with the wind" occurs 167 times in my
 documents. I don't know whether I am missing something.
 Please check my Solr configuration below:

 *URL: *localhost:8983/solr/wikingram/spell?q=gram_ci:gone wthh thes wint&wt=json&indent=true&shards.qt=/spell

 *solrconfig.xml:*

 <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
   <str name="queryAnalyzerFieldType">textSpellCi</str>
   <lst name="spellchecker">
     <str name="name">default</str>
     <str name="field">gram_ci</str>
     <str name="classname">solr.DirectSolrSpellChecker</str>
     <str name="distanceMeasure">internal</str>
     <float name="accuracy">0.5</float>
     <int name="maxEdits">2</int>
     <int name="minPrefix">0</int>
     <int name="maxInspections">5</int>
     <int name="minQueryLength">2</int>
     <float name="maxQueryFrequency">0.9</float>
     <str name="comparatorClass">freq</str>
   </lst>
   <lst name="spellchecker">
     <str name="name">wordbreak</str>
     <str name="classname">solr.WordBreakSolrSpellChecker</str>
     <str name="field">gram</str>
     <str name="combineWords">true</str>
     <str name="breakWords">true</str>
     <int name="maxChanges">5</int>
   </lst>
 </searchComponent>

 <requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
   <lst name="defaults">
     <str name="df">gram_ci</str>
     <str name="spellcheck.dictionary">default</str>
     <str name="spellcheck">on</str>
     <str name="spellcheck.extendedResults">true</str>
     <str name="spellcheck.count">25</str>
     <str name="spellcheck.onlyMorePopular">true</str>
     <str name="spellcheck.maxResultsForSuggest">1</str>
     <str name="spellcheck.alternativeTermCount">25</str>
     <str name="spellcheck.collate">true</str>
     <str name="spellcheck.maxCollations">50</str>
     <str name="spellcheck.maxCollationTries">50</str>
     <str name="spellcheck.collateExtendedResults">true</str>
   </lst>
   <arr name="last-components">
     <str>spellcheck</str>
   </arr>
 </requestHandler>

 *Schema.xml: *

 <field name="gram_ci" type="textSpellCi" indexed="true" stored="true"
        multiValued="false"/>

 </fieldType><fieldType name="textSpellCi" class="solr.TextField"
 positionIncrementGap="100">
   <analyzer type="index">
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
   <analyzer

Too many merges, stalling...

2015-02-16 Thread ralph tice
We index lots of relatively small documents, minimum of around
6k/second, but up to 20k/second.  At the same time we are deleting
400-900 documents a second.  We have our shards organized by time, so
the bulk of our indexing happens in one 'hot' shard, but deletes can
go back in time to our epoch.

Recently I turned on INFO level logging in order to get better insight
as to what our Solr cluster is doing.  Sometimes as frequently as
almost 3 times a second we get messages like:
[CMS][qtp896644936-33133]: too many merges; stalling...

Less frequently we get:
[TMP][commitScheduler-8-thread-1]:
seg=_5dy(4.10.3):C13520226/1044084:delGen=318 size=2784.291 MB [skip:
too large]

where size is 2500-4900MB.

Am I correct in assuming that CMS is getting overwhelmed by merge
activity given these log statements?  I notice index data files grow
up to ~1000 (~50-70GB on disk), where a non-actively indexed shard
will generally use around ~400-450 data files in this SolrCloud.
Also, transaction logs tend to accumulate, I suspect in relation to
how far behind CMS gets.

We are using default TieredMergePolicy on Solr 4.10.3.  We have
mergeFactor set to 5.  I notice maxMergedSegmentBytes defaults to 5GB,
has anyone had any success (or horror stories) trying to tune this
value?  Should we be looking into custom merge policies at our
indexing rate?  Any advice for getting better performance out of
merging, or work in progress in this area? Worst case, are there any
metrics to look at to monitor these sorts of situations?  It seems
like I need to parse log files to get a useful set of metrics data...
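
If log parsing does turn out to be the only option, a small Python sketch along these lines could be a starting point. The regular expressions are written against the two sample messages above and nothing else. Reading C13520226/1044084 as "document count / deleted count" is an assumption, and timestamps are ignored because the log prefix format is not shown here.

```python
import re

# Patterns derived only from the two sample messages in this thread.
STALL_RE = re.compile(r"\[CMS\]\[[^\]]+\]: too many merges; stalling")
SEG_RE = re.compile(
    r"seg=(\S+?)\(([^)]+)\):[cC](\d+)/(\d+):delGen=\d+\s+size=([\d.]+) MB"
)


def summarize(lines):
    """Count CMS stall messages and collect the segments TMP reported."""
    stalls = 0
    segments = []
    for line in lines:
        if STALL_RE.search(line):
            stalls += 1
        match = SEG_RE.search(line)
        if match:
            name, _version, docs, deleted, size_mb = match.groups()
            segments.append({
                "segment": name,
                "docs": int(docs),
                "deleted": int(deleted),  # assumed meaning of the second number
                "deleted_pct": 100.0 * int(deleted) / int(docs),
                "size_mb": float(size_mb),
            })
    return stalls, segments


SAMPLE = [
    "[CMS][qtp896644936-33133]: too many merges; stalling...",
    "[TMP][commitScheduler-8-thread-1]: seg=_5dy(4.10.3):C13520226/1044084:"
    "delGen=318 size=2784.291 MB [skip: too large]",
]
print(summarize(SAMPLE))
```

Feeding it a log file line by line gives a stall count and a per-segment deleted percentage that could be pushed to whatever metrics system is in use.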

One thought I had was deferring deletes until our 'hot' shard rotates
out of its active indexing time window, I suspect that may make a
large enough difference but I need to see whether we can satisfy our
business rule constraints to accommodate this.

https://issues.apache.org/jira/browse/SOLR-6816 and
https://issues.apache.org/jira/browse/SOLR-6838 and
https://issues.apache.org/jira/browse/LUCENE-6161 seem relevant, we
set ramBufferSizeMB to 256 but I don't know that this is the same
setting as described in the LUCENE issue.

Thanks for any thoughts,

--Ralph


Re: Solr 4.8.1 : Response Code 500 when creating the new request handler

2015-02-16 Thread Aman Tandon
Dear Jack,

 1. Look further down in the stack trace for the "caused by" that details
 the specific cause of the exception.


I am still not able to find the cause of this.

 2. Please explain in plain English what you are really trying to do with
 this non-standard approach - why aren't you just using the normal request
 handler?

Sorry, but I didn't know this is a non-standard approach. Please guide me here.

 3. You have q.alt in invariants, but also in the actual request, which is a
 contradiction in terms - what is your actual intent? This isn't the cause
 of the exception, but does raise questions of what you are trying to do.


We are trying to find all the results, so we are using q.alt=*:*.
Some products in our company want to find all the results *whose
type is garments*, and I forgot to mention that we only want to return 6
rows. So we use this request handler to provide the 6 rows.
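
To make point 3 concrete, here is a small Python model of how a request handler's defaults, appends and invariants combine with the parameters sent on the request. It is a simplified sketch of the documented behaviour, not Solr's code, and the request values (name:shirt, color:red) are made up.

```python
def as_list(value):
    """Normalise a single value or a list of values to a list."""
    return list(value) if isinstance(value, (list, tuple)) else [value]


def effective_params(request, defaults=None, appends=None, invariants=None):
    """Combine handler config with request parameters.

    defaults   apply only when the request does not set the parameter
    appends    are added to whatever the request sent
    invariants always win, whatever the request sent
    """
    merged = {k: as_list(v) for k, v in (defaults or {}).items()}
    for key, value in request.items():
        merged[key] = as_list(value)
    for key, value in (appends or {}).items():
        merged[key] = merged.get(key, []) + as_list(value)
    for key, value in (invariants or {}).items():
        merged[key] = as_list(value)
    return merged


# Handler settings as posted in this thread; the request is hypothetical.
invariants = {"defType": "edismax", "indent": "on", "q.alt": "*:*", "tie": "0.01"}
appends = {"fq": "type:garments"}
request = {"q.alt": "name:shirt", "rows": "6", "fq": "color:red"}

print(effective_params(request, appends=appends, invariants=invariants))
```

In this model the q.alt sent on the request is ignored because the invariant wins, which is why sending it as well is contradictory.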

 4. Why don't you have a q parameter for the actual query?

Initially some of our products were using Solr as a database, so we will
stop these types of queries.

With Regards
Aman Tandon

On Sun, Feb 15, 2015 at 6:25 PM, Jack Krupansky jack.krupan...@gmail.com
wrote:

 1. Look further down in the stack trace for the "caused by" that details
 the specific cause of the exception.
 2. Please explain in plain English what you are really trying to do with
 this non-standard approach - why aren't you just using the normal request
 handler?
 3. You have q.alt in invariants, but also in the actual request, which is a
 contradiction in terms - what is your actual intent? This isn't the cause
 of the exception, but does raise questions of what you are trying to do.
 4. Why don't you have a q parameter for the actual query?


 -- Jack Krupansky

 On Sat, Feb 14, 2015 at 1:57 AM, Aman Tandon amantandon...@gmail.com
 wrote:

  Hi,
 
  I am using Solr 4.8.1 and when i am creating the new request handler i am
  getting the following error:
 
  *Request Handler config:*
 
  <requestHandler name="my_clothes_data" class="solr.SearchHandler">
  <lst name="invariants">
  <str name="defType">edismax</str>
  <str name="indent">on</str>
  <str name="q.alt">*:*</str>

  <float name="tie">0.01</float>
  </lst>

  <lst name="appends">
  <str name="fq">type:garments</str>
  </lst>
  </requestHandler>
 
  *Error:*
 
  java.lang.RuntimeException
    at org.apache.solr.search.ExtendedDismaxQParser$ExtendedDismaxConfiguration.<init>(ExtendedDismaxQParser.java:1455)
    at org.apache.solr.search.ExtendedDismaxQParser.createConfiguration(ExtendedDismaxQParser.java:239)
    at org.apache.solr.search.ExtendedDismaxQParser.<init>(ExtendedDismaxQParser.java:108)
    at org.apache.solr.search.ExtendedDismaxQParserPlugin.createParser(ExtendedDismaxQParserPlugin.java:37)
    at org.apache.solr.search.QParser.getParser(QParser.java:315)
    at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:144)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:197)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1952)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:774)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
    at org.eclipse.jetty.server.Server.handle(Server.java:368)
    at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
    at [...]

Weird Solr Replication Slave out of sync

2015-02-16 Thread Summer Shire
Hi All,

My master and slave index version and generation are the same,
yet the index is not in sync: when I execute the same query
on both master and slave, I see old docs on the slave which should not be there.

I also tried to fetch a specific index version on the slave using
command=fetchindex&indexversion=latestMasterVersion

This is very spooky because I do not get any errors on master or slave.
I also see in the logs that the slave is polling the master every 15 mins.
I was able to find this issue only because I was looking at the specific old
document.

Now I can manually delete the index folder on the slave and restart it.
But I really want to find out what could be going on, because these types of
issues are going to
be hard to find, especially when there are no errors.

What could be happening, and how can I keep it from happening?
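
Since there are no errors to alert on, one option is a periodic consistency check. Below is a minimal Python sketch of the comparison step only. It assumes the caller has already collected, for each node, the index version, the generation and the numFound of a q=*:*&rows=0 query. The dictionary keys and the sample numbers are invented for the example.

```python
def compare_nodes(master, slave, keys=("indexversion", "generation", "numFound")):
    """Return a list of human-readable differences between two nodes."""
    problems = []
    for key in keys:
        if master.get(key) != slave.get(key):
            problems.append(
                f"{key}: master={master.get(key)} slave={slave.get(key)}"
            )
    return problems


# Hypothetical values: versions agree, document counts do not.
master = {"indexversion": 1424131200000, "generation": 42, "numFound": 1000}
slave = {"indexversion": 1424131200000, "generation": 42, "numFound": 1003}
print(compare_nodes(master, slave))
```

Including the document count catches the case described here, where version and generation match but the slave still serves old documents.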


Thanks,
Summer



Release date for Solr 5

2015-02-16 Thread CKReddy Bhimavarapu
What is the anticipated release date for Solr 5?

-- 
ckreddybh. chaitu...@gmail.com


Please add to contributers group

2015-02-16 Thread Christian Marquardt
Hi,

please add me to the contributors group.

Username: ChristianMarquardt https://wiki.apache.org/solr/ChristianMarquardt

Best Regards

Christian


Best regards

Christian Marquardt

Tannenweg 43
86391 Stadtbergen
+49-179-9735764
christianmarqua...@gmx.net