Re: alternativeTermCount and WordBreakSolrSpellChecker combination not working

2015-02-13 Thread Nitin Solanki
Hi O. Klein,
             How did you sort the suggestions by frequency? I think you
used <str name="comparatorClass">freq</str> to sort suggestions by
frequency. Are you using sharding/multiple servers in Solr? On a
single node comparatorClass is working, but on multiple servers it is not
working. Please assist me with how to sort on frequency on multiple
servers/shards.

On Wed, Feb 11, 2015 at 12:56 AM, O. Klein kl...@octoweb.nl wrote:

 I did some testing and the order of dictionaries doesn't seem to have an
 effect. They are sorted by frequency. So if mm was applied, "holy wood"
 would have a lower frequency and that would solve this problem.

   "suggestions":[
     "holywood",{
       "numFound":4,
       "startOffset":0,
       "endOffset":8,
       "origFreq":4,
       "suggestion":[{
           "word":"holy wood",
           "freq":71828},
         {
           "word":"hollywood",
           "freq":2669},
         {
           "word":"holyrood",
           "freq":14},
         {
           "word":"homewood",
           "freq":737}]},
     "correctlySpelled",false,
     "collation","(holy wood)",
     "collation","hollywood"]}}
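
 By the way, if you want the mm requirement applied when collations are
 verified, the spellcheck.collateParam.* prefix lets you pass query params
 through just for the collation test queries, e.g. something like this in
 the /spell handler defaults (only a sketch; I have not tested it for this
 particular case):

 <str name="spellcheck.collateParam.mm">100%</str>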



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/alternativeTermCount-and-WordBreakSolrSpellChecker-combination-not-working-tp4185352p4185461.html
 Sent from the Solr - User mailing list archive at Nabble.com.



RE: alternativeTermCount and WordBreakSolrSpellChecker combination not working

2015-02-13 Thread iNikkz
Hi O. Klein,
How did you sort the suggestions by frequency? I think you
used <str name="comparatorClass">freq</str> to sort suggestions by
frequency. Are you using sharding/multiple servers in Solr? On a single
node comparatorClass is working, but on multiple servers it is not working.
Please assist me with how to sort on frequency on multiple servers/shards.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/alternativeTermCount-and-WordBreakSolrSpellChecker-combination-not-working-tp4185352p4186206.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: variaton on boosting recent documents gives exception

2015-02-13 Thread Gonzalo Rodriguez
Hello Michael,

You can always change the type of your sortyear field to an int, or create an 
int version of it and use copyField to populate it.
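
For example, something along these lines in schema.xml (sortyear_i and the
int type name are just placeholders for whatever your schema uses):

<field name="sortyear_i" type="int" indexed="true" stored="true"/>
<copyField source="sortyear" dest="sortyear_i"/>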

And using NOW/YEAR will round the current date to the start of the year, you 
can read more about this in the Javadoc: 
http://lucene.apache.org/solr/4_10_3/solr-core/org/apache/solr/util/DateMathParser.html

You can test it using the example collection: 
http://localhost:8983/solr/collection1/select?q=*:*&boost=recip(ms(NOW/YEAR,manufacturedate_dt),3.16e-11,1,1)&fl=id,manufacturedate_dt,score,[explain]&defType=edismax
 and checking the explain field for the numeric value given to NOW/YEAR vs 
NOW/HOUR, etc.


Gonzalo

-Original Message-
From: Michael Lackhoff [mailto:mich...@lackhoff.de] 
Sent: Thursday, February 12, 2015 8:57 AM
To: solr-user@lucene.apache.org
Subject: variaton on boosting recent documents gives exception

Since my field to measure recency is not a date field but a string field (with 
only year-numbers in it), I tried a variation on the suggested boost function 
for recent documents:
  recip(sub(2015,min(sortyear,2015)),1,10,10)
But this gives an exception when used in a boost or bf parameter.
I guess the reason is that all the mathematics doesn't work with a string field 
even if it only contains numbers. Am I right with this guess? And if so, is 
there a function I can use to change the type to something numeric? Or are 
there other problems with my function?

Another related question: as you can see the current year (2015) is hard coded. 
Is there an easy way to get the current year within the function?
Messing around with NOW looks very complicated.

-Michael


Re: alternativeTermCount and WordBreakSolrSpellChecker combination not working

2015-02-13 Thread O. Klein
I am using the default on a single node, which is frequency.

On the Wiki it says: In case of a distributed request to the
SpellCheckComponent, the shards are requested for at least five suggestions
even if the spellcheck.count parameter value is less than five. Once the
suggestions are collected, they are ranked by the configured distance
measure (Levenshtein Distance by default) and then by aggregate frequency.

So for distributed this is different. Maybe James knows how to get the
behavior you are looking for.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/alternativeTermCount-and-WordBreakSolrSpellChecker-combination-not-working-tp4185352p4186214.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Stopwords in shingles suggester

2015-02-13 Thread O. Klein
I found the issue in Jira https://issues.apache.org/jira/browse/SOLR-6468


O. Klein wrote
 With more and more people starting to use the Suggester it seems that
 enablePositionIncrements for StopFilterFactory is still needed.
 
 Not sure why it is being removed from Solr5, but is there a way to keep
 the functionality beyond lucene 4.3 ? Or can this feature be reinstated?





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Stopwords-in-shingles-suggester-tp4166057p4186219.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solr - Mahout

2015-02-13 Thread mohamed . sahad
Sir,
   I need to know whether Solr has a built-in recommendation (collaborative filtering)
feature, or whether it is possible to add a recommender without using Mahout. I
kindly request a fast reply.


Your 
Mohamed Sahad K P



Re: 43sec commit duration - blocked by index merge events?

2015-02-13 Thread Gili Nachum
Thanks Otis, can you confirm that a commit call will wait for merges to
complete before returning?

On Thu, Feb 12, 2015 at 8:46 PM, Otis Gospodnetic 
otis.gospodne...@gmail.com wrote:

 If you are using Solr and SPM for Solr, you can check a report that shows
 the # of files in an index and the report that shows you the max docs-num
 docs delta.  If you see the # of files drop during a commit, that's a
 merge.  If you see a big delta change, that's probably a merge, too.

 You could also jstack or kill -3 the JVM and see where it's spending its
 time to give you some ideas what's going on inside.

 HTH.

 Otis
 --
 Monitoring * Alerting * Anomaly Detection * Centralized Log Management
 Solr  Elasticsearch Support * http://sematext.com/


 On Sun, Feb 8, 2015 at 6:48 AM, Gili Nachum gilinac...@gmail.com wrote:

  Hello,
 
  During a load test I noticed a commit that took 43 seconds to complete
  (client hard complete).
  Is this to be expected? What's causing it?
  I have a pair of machines hosting a 128M docs collection (8 shards,
  replication factor=2).
 
  Could it be merges? In Lucene merges happen async of commit statements,
 but
  reading Solr's doc for Update Handler
  
 
 https://cwiki.apache.org/confluence/display/solr/UpdateHandlers+in+SolrConfig
  
  it sounds like hard commits do wait for merges to occur: * The tradeoff
 is
  that a soft commit gives you faster visibility because it's not waiting
 for
  background merges to finish.*
  Thanks.
 



Re: Multi words query

2015-02-13 Thread Scott Stults
 A couple more things would help debug this. First, could you grab the
specific Solr log entry when this query is sent? Also, have you changed the
default schema at all? If you're querying string fields you have to
exactly match what's indexed there, versus text which gets tokenized.
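
For illustration only (field names are made up, types as in the example schema):

<field name="title_s"   type="string"       indexed="true" stored="true"/>
<field name="title_txt" type="text_general" indexed="true" stored="true"/>

A query against title_s has to match the whole stored value exactly, while
title_txt is tokenized and will match individual words.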


k/r,
Scott

On Thu, Feb 12, 2015 at 4:22 AM, melb melaggo...@gmail.com wrote:

 I am using the ruby gem rsolr and simply querying the collection with this query:

 response = solr.get 'select', :params => {
   :q => query,
   :fl => 'id,title,description,body',
   :rows => 10
 }

 response["response"]["docs"].each{|doc| puts doc["id"] }

 I created a text field to copy all the fields into, and the query handler
 requests this field.

 rgds,



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Multi-words-query-tp4185625p4185922.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC
| 434.409.2780
http://www.opensourceconnections.com


Re: bulk indexing with optimistick lock

2015-02-13 Thread Scott Stults
This isn't a Solr-specific answer, but the easiest approach might be to
just collect the document IDs you're about to add, query for them, and then
filter out the ones Solr already has (this'll give you a nice list for
later reporting). You'll need to keep your batch sizes below
maxBooleanClauses in solrconfig.xml.
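
For example, one check per batch could look something like this (the ids are
made up):

http://localhost:8983/solr/collection1/select?q=*:*&fq=id:(1001 OR 1002 OR 1003)&fl=id&rows=1000

Whatever comes back already exists in the index, so drop those from the batch
before you send the add.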

Overall, this might be simpler to maintain and less prone to bugs.

k/r,
Scott

On Wed, Feb 11, 2015 at 4:59 AM, Sankalp Gupta sankalp.gu...@snapdeal.com
wrote:

 Hi All,
 On the server side we are adding multiple documents to a list, then
 asking Solr to add them (using the SolrJ client), and then calling
 commit once that is finished.
 We also want to control concurrency, and for that we wanted to use
 Solr's optimistic lock/versioning feature. That works, but *in the case of
 a bulk add, Solr doesn't add the docs as expected.* It fails as
 soon as it finds any doc with an optimistic lock failure and returns a
 response reporting only the first failed doc (all docs before that are
 added and no docs are added after that). *We require Solr to add all docs
 that have no versioning problem and to return a list of all failed docs.*
 Please can anyone suggest a way to do this?

 Regards
 Sankalp Gupta




-- 
Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC
| 434.409.2780
http://www.opensourceconnections.com


Re: variaton on boosting recent documents gives exception

2015-02-13 Thread Michael Lackhoff
Am 13.02.2015 um 11:18 schrieb Gonzalo Rodriguez:

 You can always change the type of your sortyear field to an int, or create an 
 int version of it and use copyField to populate it.

But that would require me to reindex. Would be nice to have some type
conversion available within a function query.

 And using NOW/YEAR will round the current date to the start of the year, you 
 can read more about this in the Javadoc: 
 http://lucene.apache.org/solr/4_10_3/solr-core/org/apache/solr/util/DateMathParser.html
 
 You can test it using the example collection: 
 http://localhost:8983/solr/collection1/select?q=*:*boost=recip(ms(NOW/YEAR,manufacturedate_dt),3.16e-11,1,1)fl=id,manufacturedate_dt,score,[explain]defType=edismax
  and checking the explain field for the numeric value given to NOW/YEAR vs 
 NOW/HOUR, etc.

The definition of the *_dt fields in the example schema is 'date', but my
field is text, or (t)int if I have to reindex.

To compare against this int field I need another (comparable) int.
ms(NOW/YEAR,manufacturedate_dt) is an int, but a huge one, which is very
difficult to bring into a sensible relationship to e.g. '2015'.

Your suggestion would only work if I change my year to a date like
2015-01-01T00:00:00Z which is not a sensible format for a publication
year and not even easily creatable by copyField.

What I need is a real year number, not a date truncated to the year,
which is only accessible as the number of milliseconds since the epoch
of Jan, 1st 00:00:00h, which is not very handy.

-Michael


Dovecot FTS Solr Error [urgent/serious]

2015-02-13 Thread Kevin Laurie
Hi Guys,
Serious help requested with Dovecot and Apache Solr. I'd appreciate it if
someone could look at the Solr log output and tell me what's going wrong.


Problem:
Dovecot adm keeps reporting error as shown below:-

root@mail:/var/log# doveadm index -u u...@domain.net inbox
doveadm(t...@sicl.net): Error: fts_solr: Indexing failed: Server Error

The log suggests as follows:-
http://pastebin.com/KSvignc9

My system settings:

solr-spec4.10.2 solr-impl4.10.2 1634293 - mike - 2014-10-26 05:56:21
lucene-spec4.10.2 lucene-impl4.10.2 1634293 - mike - 2014-10-26 05:51:56
Physical Memory 98.8%
Swap Space 0.0%
File Descriptor Count 2.3%
JVM-Memory 5.4%


I login to my server as follows:-

kevin-MBP:~ kevin$ ssh -t -L 8983:localhost:8983 ad...@server.net

I start my server with following command:

:/opt/solr# java -jar start.jar

Startup log is showing as follows:-
http://pastebin.com/EYVJ06rL

It keeps indicating u...@domain.net  (This was the user name that was
part of a command passed from doveadm as follows):-

doveadm index -u u...@domain.net inbox


Could someone tell me what is happening in the log(can the apache solr
read the request from dovecot correctly or is this some schema problem
or what?) ?

Thanks

Kevin


Re: Dovecot FTS Solr Error [urgent/serious]

2015-02-13 Thread Erick Erickson
Look at the admin UI screen, the overview screen has this in the
lower-right corner.

To add more memory, you can start it like this:

java -Xmx4G -Xms4G -jar start.jar

Best,
Erick

On Fri, Feb 13, 2015 at 10:24 AM, Kevin Laurie
superinterstel...@gmail.com wrote:
 Hi,
 how do i check the heap size for Solr java?

 i am not very well versed with java, only using it for my mail server so
 appreciate if you could help
 thanks
 Kevin

 On Saturday, February 14, 2015, Shalin Shekhar Mangar 
 shalinman...@gmail.com wrote:

 According to the logs, the Solr process has run out of memory and therefore
 it won't accept any more writes. What is the heap size for the Solr java
 process?

 The u...@domain.net javascript:; in the logs is a red herring. Those
 just seem to be
 part of the document's id.



 On Fri, Feb 13, 2015 at 11:06 PM, Kevin Laurie 
 superinterstel...@gmail.com javascript:;
 wrote:

  Hi Guys,
  Serious help requested with dovecot and apache solr. Appreciate if
  someone could see the solr log outputs and tell me whats going wrong.
 
 
  Problem:
  Dovecot adm keeps reporting error as shown below:-
 
  root@mail:/var/log# doveadm index -u u...@domain.net javascript:;
 inbox
  doveadm(t...@sicl.net javascript:;): Error: fts_solr: Indexing
 failed: Server Error
 
  The log suggests as follows:-
  http://pastebin.com/KSvignc9
 
  My system settings:
 
  solr-spec4.10.2 solr-impl4.10.2 1634293 - mike - 2014-10-26 05:56:21
  lucene-spec4.10.2 lucene-impl4.10.2 1634293 - mike - 2014-10-26 05:51:56
  Physical Memory 98.8%
  Swap Space 0.0%
  File Descriptor Count 2.3%
  JVM-Memory 5.4%
 
 
  I login to my server as follows:-
 
  kevin-MBP:~ kevin$ ssh -t -L 8983:localhost:8983 ad...@server.net
 javascript:;
 
  I start my server with following command:
 
  :/opt/solr# java -jar start.jar
 
  Startup log is showing as follows:-
  http://pastebin.com/EYVJ06rL
 
  It keeps indicating u...@domain.net javascript:;  (This was the user
 name that was
  part of a command passed from doveadm as follows):-
 
  doveadm index -u u...@domain.net javascript:; inbox
 
 
  Could someone tell me what is happening in the log(can the apache solr
  read the request from dovecot correctly or is this some schema problem
  or what?) ?
 
  Thanks
 
  Kevin
 



 --
 Regards,
 Shalin Shekhar Mangar.



RE: Collations are not working fine.

2015-02-13 Thread Dyer, James
Nitin,

Can you post the full spellcheck response when you query:

q=gram_ci:"gone wthh thes wint"&wt=json&indent=true&shards.qt=/spell

James Dyer
Ingram Content Group


-Original Message-
From: Nitin Solanki [mailto:nitinml...@gmail.com] 
Sent: Friday, February 13, 2015 1:05 AM
To: solr-user@lucene.apache.org
Subject: Re: Collations are not working fine.

Hi James Dyer,
          I did the same as you told me and used
WordBreakSolrSpellChecker instead of shingles, but the collations are still
not coming back or working.
For instance, I tried to get the collation "gone with the wind" by searching
"gone wthh thes wint" on field=gram_ci but didn't succeed. I am even
getting the suggestions of wthh as *with*, thes as *the*, wint as *wind*.
Also, I have documents which contain "gone with the wind" 167 times,
so I don't know whether I am missing something or not.
Please check my Solr configuration below:

*URL: *localhost:8983/solr/wikingram/spell?q=gram_ci:"gone wthh thes
wint"&wt=json&indent=true&shards.qt=/spell

*solrconfig.xml:*

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">textSpellCi</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">gram_ci</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <str name="distanceMeasure">internal</str>
    <float name="accuracy">0.5</float>
    <int name="maxEdits">2</int>
    <int name="minPrefix">0</int>
    <int name="maxInspections">5</int>
    <int name="minQueryLength">2</int>
    <float name="maxQueryFrequency">0.9</float>
    <str name="comparatorClass">freq</str>
  </lst>
  <lst name="spellchecker">
    <str name="name">wordbreak</str>
    <str name="classname">solr.WordBreakSolrSpellChecker</str>
    <str name="field">gram</str>
    <str name="combineWords">true</str>
    <str name="breakWords">true</str>
    <int name="maxChanges">5</int>
  </lst>
</searchComponent>

<requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="df">gram_ci</str>
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck">on</str>
    <str name="spellcheck.extendedResults">true</str>
    <str name="spellcheck.count">25</str>
    <str name="spellcheck.onlyMorePopular">true</str>
    <str name="spellcheck.maxResultsForSuggest">1</str>
    <str name="spellcheck.alternativeTermCount">25</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.maxCollations">50</str>
    <str name="spellcheck.maxCollationTries">50</str>
    <str name="spellcheck.collateExtendedResults">true</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>

*Schema.xml: *

<field name="gram_ci" type="textSpellCi" indexed="true" stored="true"
multiValued="false"/>

<fieldType name="textSpellCi" class="solr.TextField"
positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>


Re: Dovecot FTS Solr Error [urgent/serious]

2015-02-13 Thread Kevin Laurie
Hi,
How do I check the heap size for the Solr Java process?

I am not very well versed with Java (I only use it for my mail server), so
I'd appreciate your help.
Thanks
Kevin

On Saturday, February 14, 2015, Shalin Shekhar Mangar 
shalinman...@gmail.com wrote:

 According to the logs, the Solr process has run out of memory and therefore
 it won't accept any more writes. What is the heap size for the Solr java
 process?

 The u...@domain.net javascript:; in the logs is a red herring. Those
 just seem to be
 part of the document's id.



 On Fri, Feb 13, 2015 at 11:06 PM, Kevin Laurie 
 superinterstel...@gmail.com javascript:;
 wrote:

  Hi Guys,
  Serious help requested with dovecot and apache solr. Appreciate if
  someone could see the solr log outputs and tell me whats going wrong.
 
 
  Problem:
  Dovecot adm keeps reporting error as shown below:-
 
  root@mail:/var/log# doveadm index -u u...@domain.net javascript:;
 inbox
  doveadm(t...@sicl.net javascript:;): Error: fts_solr: Indexing
 failed: Server Error
 
  The log suggests as follows:-
  http://pastebin.com/KSvignc9
 
  My system settings:
 
  solr-spec4.10.2 solr-impl4.10.2 1634293 - mike - 2014-10-26 05:56:21
  lucene-spec4.10.2 lucene-impl4.10.2 1634293 - mike - 2014-10-26 05:51:56
  Physical Memory 98.8%
  Swap Space 0.0%
  File Descriptor Count 2.3%
  JVM-Memory 5.4%
 
 
  I login to my server as follows:-
 
  kevin-MBP:~ kevin$ ssh -t -L 8983:localhost:8983 ad...@server.net
 javascript:;
 
  I start my server with following command:
 
  :/opt/solr# java -jar start.jar
 
  Startup log is showing as follows:-
  http://pastebin.com/EYVJ06rL
 
  It keeps indicating u...@domain.net javascript:;  (This was the user
 name that was
  part of a command passed from doveadm as follows):-
 
  doveadm index -u u...@domain.net javascript:; inbox
 
 
  Could someone tell me what is happening in the log(can the apache solr
  read the request from dovecot correctly or is this some schema problem
  or what?) ?
 
  Thanks
 
  Kevin
 



 --
 Regards,
 Shalin Shekhar Mangar.



Re: Index directory containing only segments.gen

2015-02-13 Thread Zisis Tachtsidis
Erick Erickson wrote
 OK, I think this is the root of your problem:
 
 bq:  Everything was setup using the - now deprecated - tags 
 cores
 and 
 core
   inside solr.xml.
 
 There are a bunch of ways this could go wrong. I'm pretty sure you
 have something that would take quite a while to untangle, so unless
 you have a _very_ good reason for making this work, I'd blow
 everything away.

I've started playing with SolrCloud before the new solr.xml made its
appearance (in the example files of 4.4 distribution If I'm not mistaken)
and since it was classified only as deprecated I decided to postpone the
transition to the new solr.xml for the migration to Solr 5.0. Anyway, what
you are saying is that the use of the new solrcloud-friendly configuration
file is accompanied by changes in SolrCloud behavior?


Erick Erickson wrote
 If you're using an external Zookeeper shut if off and, 'rm -rf
 /tmp/zookeeper'. If using embedded, you can remove zoo_data under your
 SOLR_HOME.

Do you mean getting rid of Zookeeper snapshot and transaction logs,
basically clearing things and removing zknodes like clusterstate.json,
overseer and the like?


Erick Erickson wrote
 OK, now use the Collections API to create your collection, see:
 https://cwiki.apache.org/confluence/display/solr/Collections+API and
 go from there (don't forget to push your configs to Zookeeper first)
 and go from there.

I've successfully tried your proposed approach using the new solr.xml but
I've bypassed the collections API and added core.properties files inside my
collection directories. Directories contain no other files and configuration
has been preloaded into Zookeeper. I prefer to have everything ready before
starting the Solr servers. Do you see anything unusual there?
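
For reference, each core.properties only contains a few lines, roughly like
this (the names are made up):

name=mycollection_shard1_replica1
collection=mycollection
shard=shard1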

One last thing, what exactly is HttpShardHandlerFactory responsible for?
Because there was no such definition in the deprecated solr.xml I was using.

Thanks Erick,
Zisis T.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Index-directory-containing-only-segments-gen-tp4186045p4186316.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Index directory containing only segments.gen

2015-02-13 Thread Erick Erickson
Zisis:

It's not so much that the behavior has changed, and it's still
perfectly possible to use old-style solr.xml. Rather,
it's that there has been quite a lot of hardening how SolrCloud
creates things. I realize that the Collections API
is more of a black box and you have to take on faith that it's doing
the right thing. That said, the API was
written by people deeply involved in the guts of how SolrCloud works
and has some safeguards built in.

bq: Do you mean getting rid of Zookeeper snapshot and transaction logs,
basically clearing things and removing zknodes like clusterstate.json,
overseer and the like?

This is just a recipe for how I completely blow away Zookeeper's
knowledge of the system so I'm _sure_
nothing's hanging around. Note that I frequently bounce around from
one thing to another rather than maintain
a single system, so I'm pretty cavalier about this. Mostly I've spent
too much time being hammered by something
I'd forgotten I changed when moving from one problem to another. I
mean I'll spin up Solr 5.x, one or two
Solr 4.x versions and maybe trunk over the course of a day while
working on various problems; it's easy to lose track.
Of course you wouldn't want to resort to this in a real environment.
'rm -rf /tmp/zookeeper' is just an incantation
I often use ;)

But yes, that's what's going on. The clusterstate and all
non-ephemeral nodes are just gone. There are more
sophisticated ways to do this that aren't so blunt if you'd prefer. Do
be a bit aware, though, that if the replicas
are still on the Solr nodes, they can re-register themselves after you
blow away the Zookeeper info.

bq: Do you see anything unusual there.
Unusual, but not necessarily bad. The collections API takes much of
the guesswork out of this though. For instance,
are you quite sure you're naming each replica such that there are no
collisions? Note that you can also specify
what nodes the leaders and replicas go on, and you can script this if
using the Collections API. Ditto with adding
replicas (of course this latter came in later than 4.4 IIRC).
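
For instance, something like this (collection, config and node names are
just placeholders):

http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=2&replicationFactor=2&collection.configName=myconf&createNodeSet=host1:8983_solr,host2:8983_solr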

Not intimately familiar with HttpShardHandlerFactory, but on a quick
glance it's handling pooling threads for
sub-requests to other shards. May be way off base here.

Erick

On Fri, Feb 13, 2015 at 9:21 AM, Zisis Tachtsidis zist...@runbox.com wrote:
 Erick Erickson wrote
 OK, I think this is the root of your problem:

 bq:  Everything was setup using the - now deprecated - tags
 cores
 and
 core
   inside solr.xml.

 There are a bunch of ways this could go wrong. I'm pretty sure you
 have something that would take quite a while to untangle, so unless
 you have a _very_ good reason for making this work, I'd blow
 everything away.

 I've started playing with SolrCloud before the new solr.xml made its
 appearance (in the example files of 4.4 distribution If I'm not mistaken)
 and since it was classified only as deprecated I decided to postpone the
 transition to the new solr.xml for the migration to Solr 5.0. Anyway, what
 you are saying is that the use of the new solrcloud-friendly configuration
 file is accompanied by changes in SolrCloud behavior?


 Erick Erickson wrote
 If you're using an external Zookeeper shut if off and, 'rm -rf
 /tmp/zookeeper'. If using embedded, you can remove zoo_data under your
 SOLR_HOME.

 Do you mean getting rid of Zookeeper snapshot and transaction logs,
 basically clearing things and removing zknodes like clusterstate.json,
 overseer and the like?


 Erick Erickson wrote
 OK, now use the Collections API to create your collection, see:
 https://cwiki.apache.org/confluence/display/solr/Collections+API and
 go from there (don't forget to push your configs to Zookeeper first)
 and go from there.

 I've successfully tried your proposed approach using the new solr.xml but
 I've bypassed the collections API and added core.properties files inside my
 collection directories. Directories contain no other files and configuration
 has been preloaded into Zookeeper. I prefer to have everything ready before
 starting the Solr servers. Do you see anything unusual there?

 One last thing, what exactly is HttpShardHandlerFactory responsible for?
 Because there was no such definition in the deprecated solr.xml I was using.

 Thanks Erick,
 Zisis T.





 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Index-directory-containing-only-segments-gen-tp4186045p4186316.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: Dovecot FTS Solr Error [urgent/serious]

2015-02-13 Thread Shalin Shekhar Mangar
According to the logs, the Solr process has run out of memory and therefore
it won't accept any more writes. What is the heap size for the Solr java
process?

The u...@domain.net in the logs is a red herring. Those just seem to be
part of the document's id.



On Fri, Feb 13, 2015 at 11:06 PM, Kevin Laurie superinterstel...@gmail.com
wrote:

 Hi Guys,
 Serious help requested with dovecot and apache solr. Appreciate if
 someone could see the solr log outputs and tell me whats going wrong.


 Problem:
 Dovecot adm keeps reporting error as shown below:-

 root@mail:/var/log# doveadm index -u u...@domain.net inbox
 doveadm(t...@sicl.net): Error: fts_solr: Indexing failed: Server Error

 The log suggests as follows:-
 http://pastebin.com/KSvignc9

 My system settings:

 solr-spec4.10.2 solr-impl4.10.2 1634293 - mike - 2014-10-26 05:56:21
 lucene-spec4.10.2 lucene-impl4.10.2 1634293 - mike - 2014-10-26 05:51:56
 Physical Memory 98.8%
 Swap Space 0.0%
 File Descriptor Count 2.3%
 JVM-Memory 5.4%


 I login to my server as follows:-

 kevin-MBP:~ kevin$ ssh -t -L 8983:localhost:8983 ad...@server.net

 I start my server with following command:

 :/opt/solr# java -jar start.jar

 Startup log is showing as follows:-
 http://pastebin.com/EYVJ06rL

 It keeps indicating u...@domain.net  (This was the user name that was
 part of a command passed from doveadm as follows):-

 doveadm index -u u...@domain.net inbox


 Could someone tell me what is happening in the log(can the apache solr
 read the request from dovecot correctly or is this some schema problem
 or what?) ?

 Thanks

 Kevin




-- 
Regards,
Shalin Shekhar Mangar.


Solr and UIMA, capturing fields

2015-02-13 Thread Tom Devel
Hi,

I successfully combined Solr and UIMA with the help of
https://wiki.apache.org/solr/SolrUIMA and other pages (and am happy to
provide some help about how to reach this step).

Right now I can run an analysis engine and get some primitive
feature/fields which I specify in the schema.xml automatically recognized
by Solr. But if the features itself are objects, I do not know how to
capture them in Solr.

I provided the relevant solrconfig.xml in [1], and the schema.xml addition
in [2] for the following small example, they are using the AE directly
provided by the UIMA example.

With the input "This is a sentence with an email at u...@host.com", Solr
correctly adds the field:

UIMAname: [
  36
]

since this is the index where the email token starts. I could also
successfully capture the feature
<str name="feature">end</str> to indicate where the found email token ends.

However, example.EmailAddress has the features: begin, end, sofa. sofa is
not a primitive feature, but an object which itself has features
sofaNum, sofaID, sofaString, ...

How can I access fields in Solr from an annotation like
example.EmailAddress that are not simple strings but itself objects?

I made an image of the CAS Visual Debugger with this AE and the sentence to
show which fields I mean, I hope this makes it more clear:
http://tinypic.com/view.php?pic=34rud1ss=8#.VN5bF7s2cWN

Does anyone know how to access such fields with Solr and UIMA?

Thanks a lot for any help,
Tom


[1]
  <updateRequestProcessorChain name="uima" default="true">
    <processor
class="org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory">
      <lst name="uimaConfig">
        <lst name="runtimeParameters">
        </lst>
        <str
name="analysisEngine">/home/toliwa/javalibs/uimaj-2.6.0-bin/apache-uima/examples/descriptors/analysis_engine/UIMA_Analysis_Example.xml</str>
        <!-- Set to true if you want to continue indexing even if text
processing fails.
             Default is false. That is, Solr throws RuntimeException and
             never indexed documents entirely in your session. -->
        <bool name="ignoreErrors">false</bool>
        <!-- This is optional. It is used for logging when text processing
fails.
             If logField is not specified, uniqueKey will be used as
logField.
        <str name="logField">id</str>
        -->
        <str name="logField">id</str>
        <lst name="analyzeFields">
          <bool name="merge">false</bool>
          <arr name="fields">
            <str>text</str>
          </arr>
        </lst>
        <lst name="fieldMappings">
          <lst name="type">
            <str name="name">example.EmailAddress</str>
            <lst name="mapping">
              <str name="feature">begin</str>
              <str name="field">UIMAname</str>
            </lst>
          </lst>
        </lst>
      </lst>
    </processor>
    <processor class="solr.LogUpdateProcessorFactory" />
    <processor class="solr.RunUpdateProcessorFactory" />
  </updateRequestProcessorChain>

[2]
<field name="UIMAname" type="string" indexed="true" stored="true"
multiValued="true" required="false"/>


Re: Collations are not working fine.

2015-02-13 Thread Rajesh Hazari
Hi Nitin,

Can you try the config below? This config seems to be working
for us.

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">

  <str name="queryAnalyzerFieldType">text_general</str>

  <lst name="spellchecker">
    <str name="name">wordbreak</str>
    <str name="classname">solr.WordBreakSolrSpellChecker</str>
    <str name="field">textSpell</str>
    <str name="combineWords">true</str>
    <str name="breakWords">false</str>
    <int name="maxChanges">5</int>
  </lst>

  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">textSpell</str>
    <str name="classname">solr.IndexBasedSpellChecker</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
    <str name="accuracy">0.75</str>
    <float name="thresholdTokenFrequency">0.01</float>
    <str name="buildOnCommit">true</str>
    <str name="spellcheck.maxResultsForSuggest">5</str>
  </lst>

</searchComponent>



<str name="spellcheck">true</str>
<str name="spellcheck.dictionary">default</str>
<str name="spellcheck.dictionary">wordbreak</str>
<int name="spellcheck.count">5</int>
<str name="spellcheck.alternativeTermCount">15</str>
<str name="spellcheck.collate">true</str>
<str name="spellcheck.onlyMorePopular">false</str>
<str name="spellcheck.extendedResults">true</str>
<str name="spellcheck.maxCollations">100</str>
<str name="spellcheck.collateParam.mm">100%</str>
<str name="spellcheck.collateParam.q.op">AND</str>
<str name="spellcheck.maxCollationTries">1000</str>


*Rajesh.*

On Fri, Feb 13, 2015 at 1:01 PM, Dyer, James james.d...@ingramcontent.com
wrote:

 Nitin,

 Can you post the full spellcheck response when you query:

 q=gram_ci:"gone wthh thes wint"&wt=json&indent=true&shards.qt=/spell

 James Dyer
 Ingram Content Group


 -Original Message-
 From: Nitin Solanki [mailto:nitinml...@gmail.com]
 Sent: Friday, February 13, 2015 1:05 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Collations are not working fine.

 Hi James Dyer,
   I did the same as you told me. Used
 WordBreakSolrSpellChecker instead of shingles. But still collations are not
 coming or working.
 For instance, I tried to get collation of gone with the wind by searching
 gone wthh thes wint on field=gram_ci but didn't succeed. Even, I am
 getting the suggestions of wtth as *with*, thes as *the*, wint as *wind*.
 Also I have documents which contains gone with the wind having 167 times
 in the documents. I don't know that I am missing something or not.
 Please check my below solr configuration:

 *URL: *localhost:8983/solr/wikingram/spell?q=gram_ci:"gone wthh thes
 wint"&wt=json&indent=true&shards.qt=/spell

 *solrconfig.xml:*

 <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
   <str name="queryAnalyzerFieldType">textSpellCi</str>
   <lst name="spellchecker">
     <str name="name">default</str>
     <str name="field">gram_ci</str>
     <str name="classname">solr.DirectSolrSpellChecker</str>
     <str name="distanceMeasure">internal</str>
     <float name="accuracy">0.5</float>
     <int name="maxEdits">2</int>
     <int name="minPrefix">0</int>
     <int name="maxInspections">5</int>
     <int name="minQueryLength">2</int>
     <float name="maxQueryFrequency">0.9</float>
     <str name="comparatorClass">freq</str>
   </lst>
   <lst name="spellchecker">
     <str name="name">wordbreak</str>
     <str name="classname">solr.WordBreakSolrSpellChecker</str>
     <str name="field">gram</str>
     <str name="combineWords">true</str>
     <str name="breakWords">true</str>
     <int name="maxChanges">5</int>
   </lst>
 </searchComponent>

 <requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
   <lst name="defaults">
     <str name="df">gram_ci</str>
     <str name="spellcheck.dictionary">default</str>
     <str name="spellcheck">on</str>
     <str name="spellcheck.extendedResults">true</str>
     <str name="spellcheck.count">25</str>
     <str name="spellcheck.onlyMorePopular">true</str>
     <str name="spellcheck.maxResultsForSuggest">1</str>
     <str name="spellcheck.alternativeTermCount">25</str>
     <str name="spellcheck.collate">true</str>
     <str name="spellcheck.maxCollations">50</str>
     <str name="spellcheck.maxCollationTries">50</str>
     <str name="spellcheck.collateExtendedResults">true</str>
   </lst>
   <arr name="last-components">
     <str>spellcheck</str>
   </arr>
 </requestHandler>

 *Schema.xml: *

 <field name="gram_ci" type="textSpellCi" indexed="true" stored="true"
 multiValued="false"/>

 <fieldType name="textSpellCi" class="solr.TextField"
 positionIncrementGap="100">
   <analyzer type="index">
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
 </fieldType>



Re: Dovecot FTS Solr Error [urgent/serious]

2015-02-13 Thread Kevin Laurie
Hi Erik,
Below link is my admin_ui output:

http://postimg.org/image/4rbubv54x/

Could you confirm whether the problem is that my system is short of memory?
Also my VPS system runs on 2GB ram, should I increase the ram/processor?

Thanks
Kevin

On Sat, Feb 14, 2015 at 2:27 AM, Erick Erickson erickerick...@gmail.com wrote:
 Look at the admin UI screen, the overview screen has this in the
 lower-right corner.

 To add more memory, you can start it like this:

 java -Xmx4G -Xms4G -jar start.jar

 Best,
 Erick

 On Fri, Feb 13, 2015 at 10:24 AM, Kevin Laurie
 superinterstel...@gmail.com wrote:
 Hi,
 how do i check the heap size for Solr java?

 i am not very well versed with java, only using it for my mail server so
 appreciate if you could help
 thanks
 Kevin

 On Saturday, February 14, 2015, Shalin Shekhar Mangar 
 shalinman...@gmail.com wrote:

 According to the logs, the Solr process has run out of memory and therefore
 it won't accept any more writes. What is the heap size for the Solr java
 process?

 The u...@domain.net javascript:; in the logs is a red herring. Those
 just seem to be
 part of the document's id.



 On Fri, Feb 13, 2015 at 11:06 PM, Kevin Laurie 
 superinterstel...@gmail.com javascript:;
 wrote:

  Hi Guys,
  Serious help requested with dovecot and apache solr. Appreciate if
  someone could see the solr log outputs and tell me whats going wrong.
 
 
  Problem:
  Dovecot adm keeps reporting error as shown below:-
 
  root@mail:/var/log# doveadm index -u u...@domain.net javascript:;
 inbox
  doveadm(t...@sicl.net javascript:;): Error: fts_solr: Indexing
 failed: Server Error
 
  The log suggests as follows:-
  http://pastebin.com/KSvignc9
 
  My system settings:
 
  solr-spec4.10.2 solr-impl4.10.2 1634293 - mike - 2014-10-26 05:56:21
  lucene-spec4.10.2 lucene-impl4.10.2 1634293 - mike - 2014-10-26 05:51:56
  Physical Memory 98.8%
  Swap Space 0.0%
  File Descriptor Count 2.3%
  JVM-Memory 5.4%
 
 
  I login to my server as follows:-
 
  kevin-MBP:~ kevin$ ssh -t -L 8983:localhost:8983 ad...@server.net
 javascript:;
 
  I start my server with following command:
 
  :/opt/solr# java -jar start.jar
 
  Startup log is showing as follows:-
  http://pastebin.com/EYVJ06rL
 
  It keeps indicating u...@domain.net javascript:;  (This was the user
 name that was
  part of a command passed from doveadm as follows):-
 
  doveadm index -u u...@domain.net javascript:; inbox
 
 
  Could someone tell me what is happening in the log(can the apache solr
  read the request from dovecot correctly or is this some schema problem
  or what?) ?
 
  Thanks
 
  Kevin
 



 --
 Regards,
 Shalin Shekhar Mangar.



Re: 43sec commit duration - blocked by index merge events?

2015-02-13 Thread Jack Krupansky
I wasn't able to follow Otis' answer but... the purpose of commit is to
make recent document changes (since the last commit) visible to
queries, and it has nothing to do with merging of segments. IOW, take the new
segment that is being created and not yet ready for use by queries, and
finish it so that queries can access it. Soft commit vs. hard commit is
simply a matter of whether Solr will wait for the I/O that writes the new
segment to disk to complete. Merging is an independent, background
procedure (thread) that merges existing segments. It does seem odd that the
cited doc implies that a hard commit waits for background merges! (Hoss??)
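
For reference, the usual solrconfig.xml pattern is hard commits that do not
open a new searcher plus more frequent soft commits for visibility, roughly
like this (the times are only illustrative):

<autoCommit>
  <maxTime>60000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxTime>5000</maxTime>
</autoSoftCommit>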

-- Jack Krupansky

On Fri, Feb 13, 2015 at 4:47 PM, Otis Gospodnetic 
otis.gospodne...@gmail.com wrote:

 Check
  http://search-lucene.com/?q=commit+wait+block&fc_type=mail+_hash_+user

 e.g. http://search-lucene.com/m/QTPa7Sqx81

 Otis
 --
 Monitoring * Alerting * Anomaly Detection * Centralized Log Management
 Solr  Elasticsearch Support * http://sematext.com/


 On Fri, Feb 13, 2015 at 8:50 AM, Gili Nachum gilinac...@gmail.com wrote:

  Thanks Otis, can you confirm that a commit call will wait for merges to
  complete before returning?
 
  On Thu, Feb 12, 2015 at 8:46 PM, Otis Gospodnetic 
  otis.gospodne...@gmail.com wrote:
 
   If you are using Solr and SPM for Solr, you can check a report that
 shows
   the # of files in an index and the report that shows you the max
 docs-num
   docs delta.  If you see the # of files drop during a commit, that's a
   merge.  If you see a big delta change, that's probably a merge, too.
  
   You could also jstack or kill -3 the JVM and see where it's spending
 its
   time to give you some ideas what's going on inside.
  
   HTH.
  
   Otis
   --
   Monitoring * Alerting * Anomaly Detection * Centralized Log Management
   Solr  Elasticsearch Support * http://sematext.com/
  
  
   On Sun, Feb 8, 2015 at 6:48 AM, Gili Nachum gilinac...@gmail.com
  wrote:
  
Hello,
   
During a load test I noticed a commit that took 43 seconds to
 complete
(client hard complete).
Is this to be expected? What's causing it?
I have a pair of machines hosting a 128M docs collection (8 shards,
replication factor=2).
   
Could it be merges? In Lucene merges happen async of commit
 statements,
   but
    reading Solr's doc for Update Handler

   
  
 
 https://cwiki.apache.org/confluence/display/solr/UpdateHandlers+in+SolrConfig

it sounds like hard commits do wait for merges to occur: * The
  tradeoff
   is
that a soft commit gives you faster visibility because it's not
 waiting
   for
background merges to finish.*
Thanks.
   
  
 



Re: Solr - Mahout

2015-02-13 Thread Jack Krupansky
There is no recommendation built into Solr itself, but you might get some
good ideas from this presentation:
http://www.slideshare.net/treygrainger/building-a-real-time-solrpowered-recommendation-engine


-- Jack Krupansky

On Fri, Feb 13, 2015 at 8:33 AM, mohamed.sa...@experionglobal.com wrote:

 Sir,
    I need to know whether Solr has a built-in recommendation (collaborative
 filtering) feature, or whether it is possible to add a recommender without
 using Mahout. I kindly request a fast reply.


 Your
 Mohamed Sahad K P




Solr scoring confusion

2015-02-13 Thread Scott Johnson
We are getting inconsistent scoring results in Solr. It works about 95% of
the time, where a search on one term returns the results which equal exactly
that one term at the top, and results with multiple terms that also contain
that one term are returned lower. Occasionally, however, if a subset of the
data has been re-indexed (the same data just added to the index again) then
the results will be slightly off, for example the data from the earlier
index will get a higher score than it should, until we re-index all the
data.

 

Our assumption here is that setting omitNorms to false, then indexing the
data, then searching, should result in scores where the data with an exact
match has a higher score. We usually see this but not always. Is something
added to the score besides the value that is being searched that we are not
understanding?

 

Thanks.

..
Scott Johnson
Data Advantage Group, Inc.

604 Mission Street 
San Francisco, CA 94105 
Office:   +1.415.947.0400 x204
Fax:  +1.415.947.0401

Take the first step towards a successful
meta data initiative with MetaCenter - 
the only plug and play, real-time 
meta data solution.http://www.dag.com/ www.dag.com 
..

 



Re: 43sec commit duration - blocked by index merge events?

2015-02-13 Thread Otis Gospodnetic
Check http://search-lucene.com/?q=commit+wait+block&fc_type=mail+_hash_+user

e.g. http://search-lucene.com/m/QTPa7Sqx81

Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/


On Fri, Feb 13, 2015 at 8:50 AM, Gili Nachum gilinac...@gmail.com wrote:

 Thanks Otis, can you confirm that a commit call will wait for merges to
 complete before returning?

 On Thu, Feb 12, 2015 at 8:46 PM, Otis Gospodnetic 
 otis.gospodne...@gmail.com wrote:

  If you are using Solr and SPM for Solr, you can check a report that shows
  the # of files in an index and the report that shows you the max docs-num
  docs delta.  If you see the # of files drop during a commit, that's a
  merge.  If you see a big delta change, that's probably a merge, too.
 
  You could also jstack or kill -3 the JVM and see where it's spending its
  time to give you some ideas what's going on inside.
 
  HTH.
 
  Otis
  --
  Monitoring * Alerting * Anomaly Detection * Centralized Log Management
  Solr  Elasticsearch Support * http://sematext.com/
 
 
  On Sun, Feb 8, 2015 at 6:48 AM, Gili Nachum gilinac...@gmail.com
 wrote:
 
   Hello,
  
   During a load test I noticed a commit that took 43 seconds to complete
   (client hard complete).
   Is this to be expected? What's causing it?
   I have a pair of machines hosting a 128M docs collection (8 shards,
   replication factor=2).
  
   Could it be merges? In Lucene merges happen async of commit statements,
  but
    reading Solr's doc for Update Handler
   
  
 
 https://cwiki.apache.org/confluence/display/solr/UpdateHandlers+in+SolrConfig
   
   it sounds like hard commits do wait for merges to occur: * The
 tradeoff
  is
   that a soft commit gives you faster visibility because it's not waiting
  for
   background merges to finish.*
   Thanks.
  
 



Re: How to make SolrCloud more elastic

2015-02-13 Thread Otis Gospodnetic
Hi Matt,

See:
http://search-lucene.com/?q=query+routing&fc_project=Solr
https://cwiki.apache.org/confluence/display/solr/Shards+and+Indexing+Data+in+SolrCloud
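
If you end up routing queries, the default compositeId scheme is the usual
tool, e.g. prefix the document ids and then restrict queries with _route_
(the values below are made up):

id = tenant1!doc42
http://localhost:8983/solr/collection1/select?q=*:*&_route_=tenant1!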

Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/


On Thu, Feb 12, 2015 at 2:09 PM, Matt Kuiper matt.kui...@issinc.com wrote:

 Otis,

 Thanks for your reply.  I see your point about too many shards and search
 efficiency.  I also agree that I need to get a better handle on customer
 requirements and expected loads.

 Initially I figured that with the shard splitting option, I would need to
 double my Solr nodes every time I split (as I would want to split every
 shard within the collection).  Where actually only the number of shards
 would double, and then I would have the opportunity to rebalance the shards
 over the existing Solr nodes plus a number of new nodes that make sense at
 the time.  This may be preferable to defining many micro shards up front.

  The time-based collections may be an option for this project.  I am not
 familiar with query routing, can you point me to any documentation on how
 this might be implemented?

 Thanks,
 Matt

 -Original Message-
 From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com]
 Sent: Wednesday, February 11, 2015 9:13 PM
 To: solr-user@lucene.apache.org
 Subject: Re: How to make SolrCloud more elastic

 Hi Matt,

 You could create extra shards up front, but if your queries are fanned out
 to all of them, you can run into situations where there are too many
  concurrent queries per node causing lots of context switching and
 ultimately being less efficient than if you had fewer shards.  So while
 this is an approach to take, I'd personally first try to run tests to see
 how much a single node can handle in terms of volume, expected query rates,
 and target latency, and then use monitoring/alerting/whatever-helps tools
 to keep an eye on the cluster so that when you start approaching the target
 limits you are ready with additional nodes and shard splitting if needed.

 Of course, if your data and queries are such that newer documents are
 queries   more, you should look into time-based collections... and if your
 queries can only query a subset of data you should look into query routing.

 Otis
 --
 Monitoring * Alerting * Anomaly Detection * Centralized Log Management
 Solr  Elasticsearch Support * http://sematext.com/


 On Wed, Feb 11, 2015 at 3:32 PM, Matt Kuiper matt.kui...@issinc.com
 wrote:

  I am starting a new project and one of the requirements is that Solr
  must scale to handle increasing load (both search performance and index
 size).
 
  My understanding is that one way to address search performance is by
  adding more replicas.
 
  I am more concerned about handling a growing index size.  I have
  already been given some good input on this topic and am considering a
  shard splitting approach, but am more focused on a rebalancing
  approach that includes defining many shards up front and then moving
  these existing shards on to new Solr servers as needed.  Plan to
  experiment with this approach first.
 
  Before I got too deep, I wondered if anyone has any tips or warnings
  on these approaches, or has scaled Solr in a different manner.
 
  Thanks,
  Matt
 



Re: Solr scoring confusion

2015-02-13 Thread Otis Gospodnetic
Hi Scott,

Try optimizing after reindexing and this should go away. It has to do with
updated/deleted docs still participating in the score computation.
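
An explicit optimize can be as simple as this (the collection name is just
an example):

http://localhost:8983/solr/collection1/update?optimize=true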

Otis
 

 On Feb 13, 2015, at 18:29, Scott Johnson sjohn...@dag.com wrote:
 
 We are getting inconsistent scoring results in Solr. It works about 95% of
 the time, where a search on one term returns the results which equal exactly
 that one term at the top, and results with multiple terms that also contain
 that one term are returned lower. Occasionally, however, if a subset of the
 data has been re-indexed (the same data just added to the index again) then
 the results will be slightly off, for example the data from the earlier
 index will get a higher score than it should, until we re-index all the
 data.
 
 
 
 Our assumption here is that setting omitNorms to false, then indexing the
 data, then searching, should result in scores where the data with an exact
 match has a higher score. We usually see this but not always. Is something
 added to the score besides the value that is being searched that we are not
 understanding?
 
 
 
 Thanks.
 
 ..
 Scott Johnson
 Data Advantage Group, Inc.
 
 604 Mission Street 
 San Francisco, CA 94105 
 Office:   +1.415.947.0400 x204
 Fax:  +1.415.947.0401
 
 Take the first step towards a successful
 meta data initiative with MetaCenter - 
 the only plug and play, real-time 
 meta data solution.http://www.dag.com/ www.dag.com 
 ..
 
 
 


Re: 43sec commit duration - blocked by index merge events?

2015-02-13 Thread Erick Erickson
Exactly how are you issuing the commit? I'm assuming you're
using SolrJ. The server.commit(whatever, true) call waits for the searcher
to be opened before returning. This includes (I believe) warmup
times. It could be that the warmup times are huge in your case, the
solr logs should show you the autowarm times for a new searcher.
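
Those autowarm times are mostly driven by the cache autowarmCount settings
(and any newSearcher listeners) in solrconfig.xml, e.g. (sizes purely
illustrative):

<filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="128"/>
<queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="64"/>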

Best,
Erick

On Fri, Feb 13, 2015 at 2:53 PM, Jack Krupansky
jack.krupan...@gmail.com wrote:
 I wasn't able to follow Otis' answer but... the purpose of commit is to
 make recent document changes (since the last commit) visible to
 queries, and has nothing to do with merging of segments. IOW, take the new
 segment that is being created and not yet ready for use by query, and
 finish it so that query can access it. Soft commit vs. hard commit is
 simply a matter of whether Solr will wait for the I/O to write the new
 segment to disk to complete. Merging is an independent, background
 procedure (thread) that merges existing segments. It does seem odd that the
 cited doc does say that soft commit waits for background merges! (Hoss??)

 -- Jack Krupansky

 On Fri, Feb 13, 2015 at 4:47 PM, Otis Gospodnetic 
 otis.gospodne...@gmail.com wrote:

 Check
  http://search-lucene.com/?q=commit+wait+block&fc_type=mail+_hash_+user

 e.g. http://search-lucene.com/m/QTPa7Sqx81

 Otis
 --
 Monitoring * Alerting * Anomaly Detection * Centralized Log Management
 Solr  Elasticsearch Support * http://sematext.com/


 On Fri, Feb 13, 2015 at 8:50 AM, Gili Nachum gilinac...@gmail.com wrote:

  Thanks Otis, can you confirm that a commit call will wait for merges to
  complete before returning?
 
  On Thu, Feb 12, 2015 at 8:46 PM, Otis Gospodnetic 
  otis.gospodne...@gmail.com wrote:
 
   If you are using Solr and SPM for Solr, you can check a report that
 shows
   the # of files in an index and the report that shows you the max
 docs-num
   docs delta.  If you see the # of files drop during a commit, that's a
   merge.  If you see a big delta change, that's probably a merge, too.
  
   You could also jstack or kill -3 the JVM and see where it's spending
 its
   time to give you some ideas what's going on inside.
  
   HTH.
  
   Otis
   --
   Monitoring * Alerting * Anomaly Detection * Centralized Log Management
   Solr  Elasticsearch Support * http://sematext.com/
  
  
   On Sun, Feb 8, 2015 at 6:48 AM, Gili Nachum gilinac...@gmail.com
  wrote:
  
Hello,
   
During a load test I noticed a commit that took 43 seconds to
 complete
(client hard complete).
Is this to be expected? What's causing it?
I have a pair of machines hosting a 128M docs collection (8 shards,
replication factor=2).
   
Could it be merges? In Lucene merges happen async of commit
 statements,
   but
reading Solr's doc for Update Handler

   
  
 
 https://cwiki.apache.org/confluence/display/solr/UpdateHandlers+in+SolrConfig

it sounds like hard commits do wait for merges to occur: * The
  tradeoff
   is
that a soft commit gives you faster visibility because it's not
 waiting
   for
background merges to finish.*
Thanks.
   
  
 



Re: 43sec commit duration - blocked by index merge events?

2015-02-13 Thread Timothy Potter
I think Mark found something similar -
https://issues.apache.org/jira/browse/SOLR-6838

On Sat, Feb 14, 2015 at 2:05 AM, Erick Erickson erickerick...@gmail.com
wrote:

 Exactly how are you issuing the commit? I'm assuming you're
 using SolrJ. the server.commit(whatever, true) waits for the searcher
 to be opened before returning. This includes (I believe) warmup
 times. It could be that the warmup times are huge in your case, the
 solr logs should show you the autowarm times for a new searcher.

 Best,
 Erick

 On Fri, Feb 13, 2015 at 2:53 PM, Jack Krupansky
 jack.krupan...@gmail.com wrote:
  I wasn't able to follow Otis' answer but... the purpose of commit is to
  make recent document changes (since the last commit) visible to
  queries, and has nothing to do with merging of segments. IOW, take the
 new
  segment that is being created and not yet ready for use by query, and
  finish it so that query can access it. Soft commit vs. hard commit is
  simply a matter of whether Solr will wait for the I/O to write the new
  segment to disk to complete. Merging is an independent, background
  procedure (thread) that merges existing segments. It does seem odd that
 the
  cited doc does say that soft commit waits for background merges! (Hoss??)
 
  -- Jack Krupansky
 
  On Fri, Feb 13, 2015 at 4:47 PM, Otis Gospodnetic 
  otis.gospodne...@gmail.com wrote:
 
  Check
   http://search-lucene.com/?q=commit+wait+block&fc_type=mail+_hash_+user
 
  e.g. http://search-lucene.com/m/QTPa7Sqx81
 
  Otis
  --
  Monitoring * Alerting * Anomaly Detection * Centralized Log Management
  Solr  Elasticsearch Support * http://sematext.com/
 
 
  On Fri, Feb 13, 2015 at 8:50 AM, Gili Nachum gilinac...@gmail.com
 wrote:
 
   Thanks Otis, can you confirm that a commit call will wait for merges
 to
   complete before returning?
  
   On Thu, Feb 12, 2015 at 8:46 PM, Otis Gospodnetic 
   otis.gospodne...@gmail.com wrote:
  
If you are using Solr and SPM for Solr, you can check a report that
  shows
the # of files in an index and the report that shows you the max
  docs-num
docs delta.  If you see the # of files drop during a commit, that's
 a
merge.  If you see a big delta change, that's probably a merge, too.
   
You could also jstack or kill -3 the JVM and see where it's spending
  its
time to give you some ideas what's going on inside.
   
HTH.
   
Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log
 Management
Solr  Elasticsearch Support * http://sematext.com/
   
   
On Sun, Feb 8, 2015 at 6:48 AM, Gili Nachum gilinac...@gmail.com
   wrote:
   
 Hello,

 During a load test I noticed a commit that took 43 seconds to
  complete
 (client hard complete).
 Is this to be expected? What's causing it?
 I have a pair of machines hosting a 128M docs collection (8
 shards,
 replication factor=2).

 Could it be merges? In Lucene merges happen async of commit
  statements,
but
 reading Solr's doc for Update Handler
 

   
  
 
 https://cwiki.apache.org/confluence/display/solr/UpdateHandlers+in+SolrConfig
 
 it sounds like hard commits do wait for merges to occur: * The
   tradeoff
is
 that a soft commit gives you faster visibility because it's not
  waiting
for
 background merges to finish.*
 Thanks.