Re: q and logical operators.

2014-09-15 Thread John Nielsen
Thanks for the heads up.

On Fri, Sep 12, 2014 at 5:48 PM, Erick Erickson erickerick...@gmail.com
wrote:

 John:

 Glad it worked. Be a little careful with large slops. As the slop
 increases, you approach the same result set as

 vis AND dis AND dur

 so choosing the appropriate slop is something of a balancing act.

 Best,
 Erick

 On Fri, Sep 12, 2014 at 2:10 AM, John Nielsen j...@mcb.dk wrote:
  I didn't know about sloppy queries. This is great stuff!
 
  I solved it with a qs=100.
 
  Thank you for the help.
 
 
 
  On Thu, Sep 11, 2014 at 11:36 PM, Erick Erickson 
 erickerick...@gmail.com
  wrote:
 
  just skimmed, but:
 
  bq:  I would get a hit for vis dis dur, but vis dur dis no longer
  returns anything. This is not an option for me
 
  Would slop help here? i.e. vis dur dis~3 or some such?
 
  Best
  Erick
 
  On Thu, Sep 11, 2014 at 4:34 AM, John Nielsen j...@mcb.dk wrote:
   q and logical operators.
  
   Hi all,
  
   I have a strange problem which seems to stump my Google-fu skills.
  
   We have a webshop which has a Solr-based search mechanism which allows
   customers to search for products based on a range of different fields,
   including item numbers. I recently added a feature which allows users
 who
   are logged in to search for custom item numbers which are associated
 with
   that user. What this means in practical terms is that when a user logs
  in,
   the solr search query has to look in one extra field compared to when
 the
   user is not logged in.
  
   The standard non-logged in search query looks like this (I only
 included
   the relevant first part of the query.):
    http://secret/solr/11731_Danish/search?defType=edismax&q=Visitkort+display+Durable+4+rum+til+240+kort
  
   When doing the same search while logged in, the query looks like this:
    http://secret/solr/11731_Danish/search?defType=edismax&q=Visitkort+display+Durable+4+rum+til+240+kort+OR+customer_5266762_product_number_string:Visitkort+display+Durable+4+rum+til+240+kort
  
   Here I add an extra field, customer_5266762_product_number_string
  (5266762
    being the logged-in user's internal ID), basically including the same
    search term twice.
  
   The above examples work beautifully when searching for a specific item
   number stored in the customer_5266762_product_number_string. The
 problem
  is
    that when a user is logged in and wants to do regular searches, the
 system
   begins to break down. In the specific example above, I expect to get a
   single hit for a product with the title Visitkort display Durable 4
 rum
   til 240 kort. It works as expected with the first non-logged-in
 example.
   The second logged-in example returns over 7000 hits. I would expect
 it to
   return just one hit since there is nothing relevant in the
   customer_5266762_product_number_string for this query.
  
   Now, the following is where my brain begins to melt down.
  
   I discovered that if you put the search text in quotation marks, it
 will
   work as expected, but doing so breaks another loved feature we have:
  
    If I want a hit on the product named Visitkort display Durable 4 rum
    til 240 kort, I could do a search for vis dis dur, and it would show
    up. I could also get a hit if I write vis dur dis, changing the order
    of the words. If I put the search query in quotation marks, I break that
    capability. I would get a hit for vis dis dur, but vis dur dis no
    longer returns anything. This is not an option for me.
  
    It is entirely possible that there is a better way of implementing this
  and
   fortunately, a rewrite is possible at this time. If my basic approach
 is
   correct and I just don't understand how to construct my query
 correctly,
  an
   RTFM pointer will be most welcome!
  
   --
   Med venlig hilsen / Best regards
  
   *John Nielsen*
   Programmer
  
  
  
   *MCB A/S*
   Enghaven 15
   DK-7500 Holstebro
  
   Kundeservice: +45 9610 2824
   p...@mcb.dk
   www.mcb.dk
 
 
 
 
  --
  Med venlig hilsen / Best regards
 
  *John Nielsen*
  Programmer
 
 
 
  *MCB A/S*
  Enghaven 15
  DK-7500 Holstebro
 
  Kundeservice: +45 9610 2824
  p...@mcb.dk
  www.mcb.dk




-- 
Med venlig hilsen / Best regards

*John Nielsen*
Programmer



*MCB A/S*
Enghaven 15
DK-7500 Holstebro

Kundeservice: +45 9610 2824
p...@mcb.dk
www.mcb.dk


Re: q and logical operators.

2014-09-12 Thread John Nielsen
I didn't know about sloppy queries. This is great stuff!

I solved it with a qs=100.

Thank you for the help.
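For anyone finding this in the archives: qs is the edismax "query phrase slop"
parameter, so the request ends up looking roughly like this (a sketch based on
the URLs quoted below; the exact final URL isn't in the thread):

http://secret/solr/11731_Danish/search?defType=edismax&qs=100&q=Visitkort+display+Durable+4+rum+til+240+kort+OR+customer_5266762_product_number_string:%22Visitkort+display+Durable+4+rum+til+240+kort%22

With qs=100, a quoted phrase in q still matches when its words appear in a
different order or with other words in between, as long as they stay within
100 positions of each other.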



On Thu, Sep 11, 2014 at 11:36 PM, Erick Erickson erickerick...@gmail.com
wrote:

 just skimmed, but:

 bq:  I would get a hit for vis dis dur, but vis dur dis no longer
 returns anything. This is not an option for me

 Would slop help here? i.e. vis dur dis~3 or some such?

 Best
 Erick

 On Thu, Sep 11, 2014 at 4:34 AM, John Nielsen j...@mcb.dk wrote:
  q and logical operators.
 
  Hi all,
 
  I have a strange problem which seems to stump my Google-fu skills.
 
  We have a webshop which has a Solr-based search mechanism which allows
  customers to search for products based on a range of different fields,
  including item numbers. I recently added a feature which allows users who
  are logged in to search for custom item numbers which are associated with
  that user. What this means in practical terms is that when a user logs
 in,
  the solr search query has to look in one extra field compared to when the
  user is not logged in.
 
  The standard non-logged in search query looks like this (I only included
  the relevant first part of the query.):
  http://secret/solr/11731_Danish/search?defType=edismax&q=Visitkort+display+Durable+4+rum+til+240+kort
 
  When doing the same search while logged in, the query looks like this:
  http://secret/solr/11731_Danish/search?defType=edismax&q=Visitkort+display+Durable+4+rum+til+240+kort+OR+customer_5266762_product_number_string:Visitkort+display+Durable+4+rum+til+240+kort
 
  Here I add an extra field, customer_5266762_product_number_string
 (5266762
  being the logged-in user's internal ID), basically including the same
  search term twice.
 
  The above examples work beautifully when searching for a specific item
  number stored in the customer_5266762_product_number_string. The problem
 is
  that when a user is logged in and wants to do regular searches, the system
  begins to break down. In the specific example above, I expect to get a
  single hit for a product with the title Visitkort display Durable 4 rum
  til 240 kort. It works as expected with the first non-logged-in example.
  The second logged-in example returns over 7000 hits. I would expect it to
  return just one hit since there is nothing relevant in the
  customer_5266762_product_number_string for this query.
 
  Now, the following is where my brain begins to melt down.
 
  I discovered that if you put the search text in quotation marks, it will
  work as expected, but doing so breaks another loved feature we have:
 
  If I want a hit on the product named Visitkort display Durable 4 rum til
  240 kort, I could do a search for vis dis dur, and it would show up. I
  could also get a hit if I write vis dur dis, changing the order of the
  words. If I put the search query in quotation marks, I break that
  capability. I would get a hit for vis dis dur, but vis dur dis no
  longer returns anything. This is not an option for me.
 
  It is entirely possible that there is a better way of implementing this
 and
  fortunately, a rewrite is possible at this time. If my basic approach is
  correct and I just don't understand how to construct my query correctly,
 an
  RTFM pointer will be most welcome!
 
  --
  Med venlig hilsen / Best regards
 
  *John Nielsen*
  Programmer
 
 
 
  *MCB A/S*
  Enghaven 15
  DK-7500 Holstebro
 
  Kundeservice: +45 9610 2824
  p...@mcb.dk
  www.mcb.dk




-- 
Med venlig hilsen / Best regards

*John Nielsen*
Programmer



*MCB A/S*
Enghaven 15
DK-7500 Holstebro

Kundeservice: +45 9610 2824
p...@mcb.dk
www.mcb.dk


q and logical operators.

2014-09-11 Thread John Nielsen
q and logical operators.

Hi all,

I have a strange problem which seems to stump my Google-fu skills.

We have a webshop which has a Solr-based search mechanism which allows
customers to search for products based on a range of different fields,
including item numbers. I recently added a feature which allows users who
are logged in to search for custom item numbers which are associated with
that user. What this means in practical terms is that when a user logs in,
the solr search query has to look in one extra field compared to when the
user is not logged in.

The standard non-logged-in search query looks like this (I only included
the relevant first part of the query):
http://secret/solr/11731_Danish/search?defType=edismax&q=Visitkort+display+Durable+4+rum+til+240+kort

When doing the same search while logged in, the query looks like this:
http://secret/solr/11731_Danish/search?defType=edismax&q=Visitkort+display+Durable+4+rum+til+240+kort+OR+customer_5266762_product_number_string:Visitkort+display+Durable+4+rum+til+240+kort

Here I add an extra field, customer_5266762_product_number_string (5266762
being the logged-in user's internal ID), basically including the same search
term twice.
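(URL-decoded, the q parameter of the logged-in request reads:

q=Visitkort display Durable 4 rum til 240 kort OR customer_5266762_product_number_string:Visitkort display Durable 4 rum til 240 kort

Worth noting: in the Lucene/edismax query syntax a "field:" prefix only binds
to the single term right after the colon, so only "Visitkort" is actually
searched against the customer field here; the remaining words are searched
against the default/qf fields again.)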

The above examples work beautifully when searching for a specific item
number stored in the customer_5266762_product_number_string. The problem is
that when a user is logged in and wants to do regular searches, the system
begins to break down. In the specific example above, I expect to get a
single hit for a product with the title "Visitkort display Durable 4 rum
til 240 kort". It works as expected with the first non-logged-in example.
The second logged-in example returns over 7000 hits. I would expect it to
return just one hit since there is nothing relevant in the
customer_5266762_product_number_string for this query.

Now, the following is where my brain begins to melt down.

I discovered that if you put the search text in quotation marks, it will
work as expected, but doing so breaks another loved feature we have:
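(For reference, the quoted variant looks roughly like this, URL-decoded; the
exact request isn't in the thread:

q="Visitkort display Durable 4 rum til 240 kort" OR customer_5266762_product_number_string:"Visitkort display Durable 4 rum til 240 kort")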

If I want a hit on the product named "Visitkort display Durable 4 rum til
240 kort", I could do a search for "vis dis dur", and it would show up. I
could also get a hit if I write "vis dur dis", changing the order of the
words. If I put the search query in quotation marks, I break that
capability. I would get a hit for "vis dis dur", but "vis dur dis" no
longer returns anything. This is not an option for me.

It is entirely possible that there is a better way of implementing this and
fortunately, a rewrite is possible at this time. If my basic approach is
correct and I just don't understand how to construct my query correctly, an
RTFM pointer will be most welcome!

-- 
Med venlig hilsen / Best regards

*John Nielsen*
Programmer



*MCB A/S*
Enghaven 15
DK-7500 Holstebro

Kundeservice: +45 9610 2824
p...@mcb.dk
www.mcb.dk


Strange relevance scoring

2014-04-08 Thread John Nielsen
Hi,

We are seeing a strange phenomenon with our Solr setup which I have been
unable to answer.

My Google-fu is clearly not up to the task, so I am trying here.

It appears that if I do a freetext search for a single word, say "modellering",
on a text field, the scoring is massively boosted if the first word of the
text field is a hit.

For instance, if there is only one occurrence of the word "modellering" in
the text field and that occurrence is the first word of the text, then that
document gets a higher relevancy than if the word "modellering" occurs 5
times in the text and the first word of the text is any other word.

Is this normal behavior? Is special attention paid to the first word in a
text field? I would think that the latter case would get the highest score.


-- 
Med venlig hilsen / Best regards

*John Nielsen*
Programmer



*MCB A/S*
Enghaven 15
DK-7500 Holstebro

Kundeservice: +45 9610 2824
p...@mcb.dk
www.mcb.dk


Re: Strange relevance scoring

2014-04-08 Thread John Nielsen
Interesting.

Most of the text fields are single word fields or close to it, but on some
of the documents, long text appears.

How long does a text need to be before hitting length normalization?
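(For context, and assuming the default similarity: the length norm is roughly
1/sqrt(number of terms in the field), encoded into a single byte, so there is
no threshold where it kicks in. A quick worked example: a 1-term field gets a
norm of 1.0, a 4-term field 0.5, a 25-term field 0.2 and a 100-term field 0.1,
while 5 occurrences of a word only contribute a tf of sqrt(5) ≈ 2.24, so one
hit in a very short field can easily outscore five hits in a long one.)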


On Tue, Apr 8, 2014 at 11:36 AM, Ahmet Arslan iori...@yahoo.com wrote:

 Hi Nielsen,

 There is no special attention paid to the first word. You are probably hitting
 length normalisation.
 Lucene/Solr punishes long documents and favours short documents.
 Is the document (the one where the word appears 5 times) longer?



 On Tuesday, April 8, 2014 12:03 PM, John Nielsen j...@mcb.dk wrote:
 Hi,

 We are seeing a strange phenomenon with our Solr setup which I have been
 unable to answer.

 My Google-fu is clearly not up to the task, so I am trying here.

  It appears that if I do a freetext search for a single word, say
  "modellering", on a text field, the scoring is massively boosted if the
  first word of the text field is a hit.

  For instance, if there is only one occurrence of the word "modellering" in
  the text field and that occurrence is the first word of the text, then that
  document gets a higher relevancy than if the word "modellering" occurs 5
  times in the text and the first word of the text is any other word.

 Is this normal behavior? Is special attention paid to the first word in a
 text field? I would think that the latter case would get the highest score.


 --
 Med venlig hilsen / Best regards

 *John Nielsen*
 Programmer



 *MCB A/S*
 Enghaven 15
 DK-7500 Holstebro

 Kundeservice: +45 9610 2824
 p...@mcb.dk
 www.mcb.dk




-- 
Med venlig hilsen / Best regards

*John Nielsen*
Programmer



*MCB A/S*
Enghaven 15
DK-7500 Holstebro

Kundeservice: +45 9610 2824
p...@mcb.dk
www.mcb.dk


Re: Strange relevance scoring

2014-04-08 Thread John Nielsen
Hi,

I couldn't find any occurrence of SpanFirstQuery in either the schema.xml
or solrconfig.xml files.

This is the query I used with debug=results:
http://pastebin.com/bWzUkjKz

And here is the answer:
http://pastebin.com/nCXFcuky

I am not sure what I am supposed to be looking for.



On Tue, Apr 8, 2014 at 11:34 AM, Markus Jelsma
markus.jel...@openindex.io wrote:

  Hi - the thing you describe is possible when your setup uses
  SpanFirstQuery. But to be sure what's going on you should post the debug
 output.

 -Original message-
  From:John Nielsen j...@mcb.dk
  Sent: Tuesday 8th April 2014 11:03
  To: solr-user@lucene.apache.org
  Subject: Strange relevance scoring
 
  Hi,
 
  We are seeing a strange phenomenon with our Solr setup which I have been
  unable to answer.
 
  My Google-fu is clearly not up to the task, so I am trying here.
 
  It appears that if i do a freetext search for a single word, say
 modellering
  on a text field, the scoring is massively boosted if the first word of
 the
  text field is a hit.
 
  For instance if there is only one occurrence of the word modellering in
  the text field and that occurrence is the first word of the text, then
 that
   document gets a higher relevancy than if the word "modellering" occurs 5
  times in the text and the first word of the text is any other word.
 
  Is this normal behavior? Is special attention paid to the first word in a
  text field? I would think that the latter case would get the highest
 score.
 
 
  --
  Med venlig hilsen / Best regards
 
  *John Nielsen*
  Programmer
 
 
 
  *MCB A/S*
  Enghaven 15
  DK-7500 Holstebro
 
  Kundeservice: +45 9610 2824
  p...@mcb.dk
  www.mcb.dk
 




-- 
Med venlig hilsen / Best regards

*John Nielsen*
Programmer



*MCB A/S*
Enghaven 15
DK-7500 Holstebro

Kundeservice: +45 9610 2824
p...@mcb.dk
www.mcb.dk


Re: Informal poll on running Solr 4 on Java 7 with G1GC

2013-06-20 Thread John Nielsen

  
  
We used to use G1, but recently went back to CMS.

G1 gave us too long stop-the-world events. CMS uses more resources for the
same work, but it is more predictable and we get better worst-case
performance out of it.
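(For anyone comparing, the relevant JVM flags are, in simplified form and
leaving out our exact tuning options:

G1:  -XX:+UseG1GC
CMS: -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
)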
  
  
  
Med venlig hilsen / Best regards

John Nielsen
Programmer

MCB A/S
Enghaven 15
DK-7500 Holstebro

Kundeservice: +45 9610 2824
p...@mcb.dk
www.mcb.dk



  
On 20-06-2013 00:18, Timothy Potter wrote:

 I'm sure there's some site to do this but wanted to get a feel for
 who's running Solr 4 on Java 7 with G1 gc enabled?

 Cheers,
 Tim



  



Searching for cache stats

2013-06-17 Thread John Nielsen

  
  
Hi,

I am looking for an automated way of getting cache stats from Solr.

Specificly what I am looking for are the cumulative evictions for
each cache type for each core:

http://screencast.com/t/IrD0VItfVduk

An example of how I would like to be able to query the cache
information is basically something like when I get core information
like this:

http://URL:8000/solr/admin/cores

Does anything similar exist which will allow me to get the cache
information?
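(A hedged pointer in case someone else needs the same thing: the per-core
mbeans handler can expose these stats, e.g. something like

http://URL:8000/solr/core_name/admin/mbeans?cat=CACHE&stats=true&wt=json

where core_name is a placeholder, and the cache entries in the response
include cumulative_evictions.)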


-- 
  
  Med venlig hilsen / Best regards
  
  John Nielsen
  Programmer
  
  MCB A/S
  Enghaven 15
  DK-7500 Holstebro
  
  Kundeservice: +45 9610 2824
  p...@mcb.dk
  www.mcb.dk
  
  
  

  



Re: Solr using a ridiculous amount of memory

2013-06-14 Thread John Nielsen
Sorry for not getting back to the list sooner. It seems like I finally
solved the memory problems by following Toke's instruction of splitting the
cores up into smaller chunks.

After some major refactoring, our 15 cores have now turned into ~500 cores
and our memory consumption has dropped dramatically. Running 200 webshops now
actually uses less memory than our 24 test shops did before.

Thank you to everyone who helped, and especially to Toke.

I looked at the wiki, but could not find any reference to this unintuitive
way of using memory. Did I miss it somewhere?



On Fri, Apr 19, 2013 at 1:30 PM, Erick Erickson erickerick...@gmail.com wrote:

 Hmmm. There has been quite a bit of work lately to support a couple of
 things that might be of interest (4.3, which Simon cut today, probably
 available to all mid next week at the latest). Basically, you can
 choose to pre-define all the cores in solr.xml (so-called old style)
 _or_ use the new-style solr.xml which uses auto-discover mode to
 walk the indicated directory and find all the cores (indicated by the
 presence of a 'core.properties' file). Don't know if this would make
 your particular case easier, and I should warn you that this is
 relatively new code (although there are some reasonable unit tests).

 You also have the option to only load the cores when they are
 referenced, and only keep N cores open at a time (loadOnStartup and
 transient properties).

 See: http://wiki.apache.org/solr/CoreAdmin#Configuration and
 http://wiki.apache.org/solr/Solr.xml%204.3%20and%20beyond

 Note, the docs are somewhat sketchy, so if you try to go down this
 route let us know anything that should be improved (or you can be
 added to the list of wiki page contributors and help out!)

 Best
 Erick

 On Thu, Apr 18, 2013 at 8:31 AM, John Nielsen j...@mcb.dk wrote:
  You are missing an essential part: Both the facet and the sort
  structures needs to hold one reference for each document
  _in_the_full_index_, even when the document does not have any values in
  the fields.
 
 
  Wow, thank you for this awesome explanation! This is where the penny
  dropped for me.
 
   I will definitely move to a multi-core setup. It will take some time and
 a
  lot of re-coding. As soon as I know the result, I will let you know!
 
 
 
 
 
 
  --
  Med venlig hilsen / Best regards
 
  *John Nielsen*
  Programmer
 
 
 
  *MCB A/S*
  Enghaven 15
  DK-7500 Holstebro
 
  Kundeservice: +45 9610 2824
  p...@mcb.dk
  www.mcb.dk




-- 
Med venlig hilsen / Best regards

*John Nielsen*
Programmer



*MCB A/S*
Enghaven 15
DK-7500 Holstebro

Kundeservice: +45 9610 2824
p...@mcb.dk
www.mcb.dk


Re: SolrCloud loadbalancing, replication, and failover

2013-04-19 Thread John Nielsen
 jobs and have
 similar requirements.  I would imagine that Google gets incredible response
 time because they have incredible amounts of RAM at their disposal that
 keep
 the important bits of their index instantly available.  They have thousands
 of servers in each data center.  I once got a look at the extent of
 Google's
 hardware in one data center - it was HUGE.  I couldn't get in to examine
 things closely, they keep that stuff very locked down.

 Thanks,
 Shawn




-- 
Med venlig hilsen / Best regards

*John Nielsen*
Programmer



*MCB A/S*
Enghaven 15
DK-7500 Holstebro

Kundeservice: +45 9610 2824
p...@mcb.dk
www.mcb.dk


Re: Solr using a ridiculous amount of memory

2013-04-18 Thread John Nielsen
 That was strange. As you are using a multi-valued field with the new
setup, they should appear there.

Yes, the new field we use for faceting is a multi valued field.

 Can you find the facet fields in any of the other caches?

Yes, here it is, in the field cache:

http://screencast.com/t/mAwEnA21yL

 I hope you are not calling the facets with facet.method=enum? Could you
paste a typical facet-enabled search request?

Here is a typical example (I added newlines for readability):

http://172.22.51.111:8000/solr/default1_Danish/search
?defType=edismax
q=*%3a*
facet.field=%7b!ex%3dtagitemvariantoptions_int_mv_7+key%3ditemvariantoptions_int_mv_7%7ditemvariantoptions_int_mv
facet.field=%7b!ex%3dtagitemvariantoptions_int_mv_9+key%3ditemvariantoptions_int_mv_9%7ditemvariantoptions_int_mv
facet.field=%7b!ex%3dtagitemvariantoptions_int_mv_8+key%3ditemvariantoptions_int_mv_8%7ditemvariantoptions_int_mv
facet.field=%7b!ex%3dtagitemvariantoptions_int_mv_2+key%3ditemvariantoptions_int_mv_2%7ditemvariantoptions_int_mv
fq=site_guid%3a(10217)
fq=item_type%3a(PRODUCT)
fq=language_guid%3a(1)
fq=item_group_1522_combination%3a(*)
fq=is_searchable%3a(True)
sort=item_group_1522_name_int+asc, variant_of_item_guid+asc
querytype=Technical
fl=feed_item_serialized
facet=true
group=true
group.facet=true
group.ngroups=true
group.field=groupby_variant_of_item_guid
group.sort=name+asc
rows=0
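(URL-decoded, the four facet.field parameters above all point at the same
field and only differ in their local params, i.e. something like:

facet.field={!ex=tagitemvariantoptions_int_mv_7 key=itemvariantoptions_int_mv_7}itemvariantoptions_int_mv

and correspondingly for _9, _8 and _2.)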

 Are you warming all the sort- and facet-fields?

I'm sorry, I don't know. I have the field value cache commented out in my
config, so... Whatever is default?

Removing the custom sort fields is unfortunately quite a bit more difficult
than my other facet modification.

The problem is that each item can have several sort orders. The sort order
to use is defined by a group number which is known ahead of time. The group
number is included in the sort order field name. To solve it in the same
way I solved the facet problem, I would need to be able to sort on a
multi-valued field, and unless I'm wrong, I don't think that it's possible.

I am quite stumped on how to fix this.




On Wed, Apr 17, 2013 at 3:06 PM, Toke Eskildsen t...@statsbiblioteket.dk wrote:

 John Nielsen [j...@mcb.dk]:
  I never seriously looked at my fieldValueCache. It never seemed to get
 used:

  http://screencast.com/t/YtKw7UQfU

 That was strange. As you are using a multi-valued field with the new
 setup, they should appear there. Can you find the facet fields in any of
 the other caches?

 ...I hope you are not calling the facets with facet.method=enum? Could you
 paste a typical facet-enabled search request?

  Yep. We still do a lot of sorting on dynamic field names, so the field
 cache
  has a lot of entries. (9.411 entries as we speak. This is considerably
 lower
  than before.). You mentioned in an earlier mail that faceting on a field
  shared between all facet queries would bring down the memory needed.
  Does the same thing go for sorting?

 More or less. Sorting stores the raw string representations (utf-8) in
 memory so the number of unique values has more to say than it does for
 faceting. Just as with faceting, a list of pointers from documents to
 values (1 value/document as we are sorting) is maintained, so the overhead
 is something like

 #documents*log2(#unique_terms*average_term_length) +
 #unique_terms*average_term_length
 (where average_term_length is in bits)

 Caveat: This is with the index-wide sorting structure. I am fairly
 confident that this is what Solr uses, but I have not looked at it lately
 so it is possible that some memory-saving segment-based trickery has been
 implemented.

  Does those 9411 entries duplicate data between them?

 Sorry, I do not know. SOLR- discusses the problems with the field
 cache and duplication of data, but I cannot infer if it is has been solved
 or not. I am not familiar with the stat breakdown of the fieldCache, but it
 _seems_ to me that there are 2 or 3 entries for each segment for each sort
 field. Guesstimating further, let's say you have 30 segments in your index.
 Going with the guesswork, that would bring the number of sort fields to
 9411/3/30 ~= 100. Looks like you use a custom sort field for each client?

 Extrapolating from 1.4M documents and 180 clients, let's say that there
 are 1.4M/180/5 unique terms for each sort-field and that their average
 length is 10. We thus have
 1.4M*log2(1500*10*8) + 1500*10*8 bit ~= 23MB
 per sort field or about 4GB for all the 180 fields.

 With this few unique values, the doc-value structure is by far the
 biggest, just as with facets. As opposed to the faceting structure, this is
 fairly close to the actual memory usage. Switching to a single sort field
 would reduce the memory usage from 4GB to about 55MB.

  I do commit a bit more often than i should. I get these in my log file
 from
  time to time: PERFORMANCE WARNING: Overlapping onDeckSearchers=2

 So 1 active searcher and 2 warming searchers. Ignoring that one of the
 warming searchers is highly likely

Re: Solr using a ridiculous amount of memory

2013-04-18 Thread John Nielsen

  http://172.22.51.111:8000/solr/default1_Danish/search

 [...]

  fq=site_guid%3a(10217)

 This constraints to hits to a specific customer, right? Any search will
 only be in a single customer's data?


Yes, that's right. No search from any given client ever returns anything
from another client.


[Toke: Are you warming all the sort- and facet-fields?]

  I'm sorry, I don't know. I have the field value cache commented out in
  my config, so... Whatever is default?

 (a bit shaky here) I would say not warming. You could check simply by
 starting solr and looking at the caches before you issue any searches.


The field cache shows 0 entries at startup. On the running server, forcing
a commit (and thus opening a new searcher) does not change the number of
entries.


  The problem is that each item can have several sort orders. The sort
  order to use is defined by a group number which is known ahead of
  time. The group number is included in the sort order field name. To
  solve it in the same way i solved the facet problem, I would need to
  be able to sort on a multi-valued field, and unless I'm wrong, I don't
  think that it's possible.

 That is correct.

 Three suggestions off the bat:

 1) Reduce the number of sort fields by mapping names.
 Count the maximum number of unique sort fields for any given customer.
 That will be the total number of sort fields in the index. For each
 group number for a customer, map that number to one of the index-wide
 sort fields.
 This only works if the maximum number of unique fields is low (let's say
 a single field takes 50MB, so 20 fields should be okay).


I just checked our DB. Our worst-case-scenario client has over a thousand
groups for sorting. Granted, it may be (and probably is) an error in the
data. It is an interesting idea though and I will look into this possibility.


 3) Switch to a layout where each customer has a dedicated core.
 The basic overhead is a lot larger than for a shared index, but it would
 make your setup largely immune to the adverse effect of many documents
 coupled with many facet- and sort-fields.


Now this is where my brain melts down.

If I understand the fieldCache mechanism correctly (which I can see that I
don't), the data used for faceting and sorting is saved in the fieldCache
using a key composed of the fields used for said faceting/sorting. That
data only contains the data which is actually used for the operation. This
is what the fq queries are for.

So if I generate a core for each client, I would have a client-specific
fieldCache containing the data from that client. Wouldn't I just split up
the same data into several cores?

I'm afraid I don't understand how this would help.


-- 
Med venlig hilsen / Best regards

*John Nielsen*
Programmer



*MCB A/S*
Enghaven 15
DK-7500 Holstebro

Kundeservice: +45 9610 2824
p...@mcb.dk
www.mcb.dk


Re: Solr using a ridiculous amount of memory

2013-04-18 Thread John Nielsen
 You are missing an essential part: Both the facet and the sort
 structures needs to hold one reference for each document
 _in_the_full_index_, even when the document does not have any values in
 the fields.


Wow, thank you for this awesome explanation! This is where the penny
dropped for me.

I will definitely move to a multi-core setup. It will take some time and a
lot of re-coding. As soon as I know the result, I will let you know!






-- 
Med venlig hilsen / Best regards

*John Nielsen*
Programmer



*MCB A/S*
Enghaven 15
DK-7500 Holstebro

Kundeservice: +45 9610 2824
p...@mcb.dk
www.mcb.dk


Re: Solr using a ridiculous amount of memory

2013-04-17 Thread John Nielsen
I managed to get this done. The facet queries now facet on a multi-valued
field as opposed to the dynamic field names.

Unfortunately it doesn't seem to have done much difference, if any at all.

Some more information that might help:

The JVM memory seems to be eaten up slowly. I don't think that there is one
single query that causes the problem. My test case (dumping 180 clients on
top of solr) takes hours before it causes an OOM. Often a full day. The
memory usage wobbles up and down, so the GC is at least partially doing its
job. It still works its way up to 100% eventually. When that happens it
either OOM's or it stops the world and brings the memory consumption to
10-15 gigs.

I did try to facet on all products across all clients (about 1.4 million docs)
and I could not make it OOM on a server with a 4 GB JVM. This was on a
dedicated test server with my test being the only traffic.

I am beginning to think that this may be related to traffic volume and not
just on the type of query that I do.

I tried to calculate the memory requirement example you gave me above based
on the change that got rid of the dynamic fields.

documents = ~1.400.000
references = 11.200.000 (we facet on two multi-valued fields with 4
values each on average, so 1.400.000 * 2 * 4 = 11.200.000)
unique values = 1.132.344 (total number of variant options across all
clients. This is what we facet on)

1.400.000 * log2(11.200.000) + 1.400.000 * log2(1132344) = ~14MB per field
(we have 4 fields)?

I must be calculating this wrong.






On Mon, Apr 15, 2013 at 2:10 PM, John Nielsen j...@mcb.dk wrote:

 I did a search. I have no occurrence of UnInverted in the solr logs.

  Another explanation for the large amount of memory presents itself if
  you use a single index: If each of your clients facet on at least one
  fields specific to the client (client123_persons or something like
  that), then your memory usage goes through the roof.

  This is exactly how we facet right now! I will definitely rewrite the
 relevant parts of our product to test this out before moving further down
 the docValues path.

 I will let you know as soon as I know one way or the other.


 On Mon, Apr 15, 2013 at 1:38 PM, Toke Eskildsen 
  t...@statsbiblioteket.dk wrote:

 On Mon, 2013-04-15 at 10:25 +0200, John Nielsen wrote:

  The FieldCache is the big culprit. We do a huge amount of faceting so
  it seems right.

 Yes, you wrote that earlier. The mystery is that the math does not check
 out with the description you have given us.

  Unfortunately I am super swamped at work so I have precious little
  time to work on this, which is what explains my silence.

 No problem, we've all been there.
 
 [Band aid: More memory]

  The extra memory helped a lot, but it still OOM with about 180 clients
  using it.

 You stated earlier that you has a solr cluster and your total(?) index
 size was 35GB, with each register being between 15k and 30k. I am
 using the quotes to signify that it is unclear what you mean. Is your
 cluster multiple machines (I'm guessing no), multiple Solr's, cores,
 shards or maybe just a single instance prepared for later distribution?
 Is a register a core, shard or a simply logical part (one client's data)
 of the index?

 If each client has their own core or shard, that would mean that each
 client uses more than 25GB/180 bytes ~= 142MB of heap to access 35GB/180
 ~= 200MB of index. That sounds quite high and you would need a very
 heavy facet to reach that.

 If you could grep UnInverted from the Solr log file and paste the
 entries here, that would help to clarify things.


 Another explanation for the large amount of memory presents itself if
 you use a single index: If each of your clients facet on at least one
 fields specific to the client (client123_persons or something like
 that), then your memory usage goes through the roof.

 Assuming an index with 10M documents, each with 5 references to a modest
 10K unique values in a facet field, the simplified formula
   #documents*log2(#references) + #references*log2(#unique_values) bit
 tells us that this takes at least 110MB with field cache based faceting.

 180 clients @ 110MB ~= 20GB. As that is a theoretical low, we can at
 least double that. This fits neatly with your new heap of 64GB.


 If my guessing is correct, you can solve your memory problems very
 easily by sharing _all_ the facet fields between your clients.
 This should bring your memory usage down to a few GB.

 You are probably already restricting their searches to their own data by
 filtering, so this should not influence the returned facet values and
 counts, as compared to separate fields.

 This is very similar to the thread Facets with 5000 facet fields BTW.

  Today I finally managed to set up a test core so I can begin to play
  around with docValues.

 If you are using a single index with the individual-facet-fields for
 each client approach, the DocValues will also have scaling issues, as
 the amount of values (of which

Re: Solr using a ridiculous amount of memory

2013-04-17 Thread John Nielsen
 I am surprised about the lack of UnInverted from your logs as it is
logged on INFO level.

Nope, no trace of it. No mention either in Logging - Level from the admin
interface.

 It should also be available from the admin interface under
collection/Plugin / Stats/CACHE/fieldValueCache.

I never seriously looked at my fieldValueCache. It never seemed to get used:

http://screencast.com/t/YtKw7UQfU

 You stated that you were unable to make a 4GB JVM OOM when you just
performed faceting (I guesstimate that it will also run fine with just ½GB
or at least with 1GB, based on the
 numbers above) and you have observed that the field cache eats the
memory.

Yep. We still do a lot of sorting on dynamic field names, so the field
cache has a lot of entries. (9.411 entries as we speak. This is
considerably lower than before.). You mentioned in an earlier mail that
faceting on a field shared between all facet queries would bring down the
memory needed. Does the same thing go for sorting? Does those 9411 entries
duplicate data between them? If this is where all the memory is going, I
have a lot of coding to do.

 Guessing wildly: Do you issue a high frequency small updates with
frequent commits? If you pause the indexing, does memory use fall back to
the single GB level

I do commit a bit more often than I should. I get these in my log file from
time to time: "PERFORMANCE WARNING: Overlapping onDeckSearchers=2". The way I
understand this is that two searchers are being warmed at the same time and
that one will be discarded when it finishes its auto-warming procedure. If
the math above is correct, I would need tens of searchers auto-warming in
parallel to cause my problem. If I misunderstand how this works, do let me
know.
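(For reference, the knob related to that warning is maxWarmingSearchers in
solrconfig.xml, which in the stock config is something like

<maxWarmingSearchers>2</maxWarmingSearchers>

The warning fires whenever a commit opens a new searcher while an older one
is still warming; exceeding the configured limit turns it into a hard error.)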

My indexer has a cleanup routine that deletes replay logs and other things
when it has nothing to do. This includes running a commit on the solr
server to make sure nothing is ever in a state where something is not
written to disk anywhere. In theory it can commit once every 60 seconds,
though I doubt that ever happens. The less work the indexer has, the more
often it commits. (Yes I know, it's on my todo list.)

Other than that, my autocommit settings look like this:

<autoCommit>
  <maxTime>6</maxTime>
  <maxDocs>6000</maxDocs>
  <openSearcher>false</openSearcher>
</autoCommit>

The control panel says that the warm up time of the last searcher is 5574.
Is that seconds or milliseconds?
http://screencast.com/t/d9oIbGLCFQwl

I would prefer to not turn off the indexer unless the numbers above
suggests that I really should try this. Waiting for a full GC would take a
long time. Unfortunately I don't know of a way to provoke a full GC on
command.


On Wed, Apr 17, 2013 at 11:48 AM, Toke Eskildsen 
t...@statsbiblioteket.dkwrote:

 John Nielsen [j...@mcb.dk] wrote:
  I managed to get this done. The facet queries now facets on a multivalue
 field as opposed to the dynamic field names.

  Unfortunately it doesn't seem to have done much difference, if any at
 all.

 I am sorry to hear that.

  documents = ~1.400.000
  references 11.200.000  (we facet on two multivalue fields with each 4
 values
  on average, so 1.400.000 * 2 * 4 = 11.200.000
  unique values = 1.132.344 (total number of variant options across all
 clients.
  This is what we facet on)

  1.400.000 * log2(11.200.000) + 1.400.000 * log2(1132344) = ~14MB per
 field (we have 4 fields)?

  I must be calculating this wrong.

 No, that sounds about right. In reality you need to multiply with 3 or 4,
 so let's round to 50MB/field: 1.4M documents with 2 fields with 5M
 references/field each is not very much and should not take a lot of memory.
 In comparison, we facet on 12M documents with 166M references and do some
 other stuff (in Lucene with a different faceting implementation, but at
 this level it is equivalent to Solr's in terms of memory). Our heap is 3GB.

 I am surprised about the lack of UnInverted from your logs as it is
 logged on INFO level. It should also be available from the admin interface
 under collection/Plugin / Stats/CACHE/fieldValueCache. But I am guessing
 you got your numbers from that and that the list only contains the few
 facets you mentioned previously? It might be wise to sanity check by
 summing the memSizes though; they ought to take up far below 1GB.

 From your description, your index is small and your faceting requirements
 modest. A SSD-equipped laptop should be adequate as server. So we are back
 to math does not check out.


 You stated that you were unable to make a 4GB JVM OOM when you just
 performed faceting (I guesstimate that it will also run fine with just ½GB
 or at least with 1GB, based on the numbers above) and you have observed
 that the field cache eats the memory. This does indicate that the old
 caches are somehow not freed when the index is updated. That is strange as
 Solr should take care of that automatically.

 Guessing wildly: Do you issue a high frequency small updates with frequent
 commits? If you pause

Re: Solr using a ridiculous amount of memory

2013-04-15 Thread John Nielsen
Yes and no,

The FieldCache is the big culprit. We do a huge amount of faceting so it
seems right. Unfortunately I am super swamped at work so I have precious
little time to work on this, which is what explains my silence.

Out of desperation, I added another 32G of memory to each server and
increased the JVM size to 64G from 25G. The servers are running with 96G
memory right now (this is the max amount supported by the hardware) which
leaves solr somewhat starved for memory. I am aware of the performance
implications of doing this but I have little choice.

The extra memory helped a lot, but it still OOM with about 180 clients
using it. Unfortunately I need to support at least double that. After
upgrading the RAM, I ran for almost two weeks with the same workload that
used to OOM a couple of times a day, so it doesn't look like a leak.

Today I finally managed to set up a test core so I can begin to play around
with docValues.

I actually have a couple of questions regarding docValues:
1) If I facet on multiple fields and only some of those fields are using
docValues, will I still get the memory-saving benefit of docValues? (One of
the facet fields uses null values and will require a lot of work in our
product to fix.)
2) If I just use docValues on one small core with very limited traffic at
first for testing purposes, how can I test that it is actually using the
disk for caching?

I really appreciate all the help I have received on this list so far. I do
feel confident that I will be able to solve this issue eventually.



On Mon, Apr 15, 2013 at 9:00 AM, Toke Eskildsen t...@statsbiblioteket.dk wrote:

 On Sun, 2013-03-24 at 09:19 +0100, John Nielsen wrote:
  Our memory requirements are running amok. We have less than a quarter of
  our customers running now and even though we have allocated 25GB to the
 JVM
  already, we are still seeing daily OOM crashes.

 Out of curiosity: Did you manage to pinpoint the memory eater in your
 setup?

 - Toke Eskildsen




-- 
Med venlig hilsen / Best regards

*John Nielsen*
Programmer



*MCB A/S*
Enghaven 15
DK-7500 Holstebro

Kundeservice: +45 9610 2824
p...@mcb.dk
www.mcb.dk


Re: Solr using a ridiculous amount of memory

2013-04-15 Thread John Nielsen
I did a search. I have no occurrence of UnInverted in the solr logs.

 Another explanation for the large amount of memory presents itself if
 you use a single index: If each of your clients facet on at least one
 fields specific to the client (client123_persons or something like
 that), then your memory usage goes through the roof.

This is exactly how we facet right now! I will definitely rewrite the
relevant parts of our product to test this out before moving further down
the docValues path.

I will let you know as soon as I know one way or the other.


On Mon, Apr 15, 2013 at 1:38 PM, Toke Eskildsen t...@statsbiblioteket.dk wrote:

 On Mon, 2013-04-15 at 10:25 +0200, John Nielsen wrote:

  The FieldCache is the big culprit. We do a huge amount of faceting so
  it seems right.

 Yes, you wrote that earlier. The mystery is that the math does not check
 out with the description you have given us.

  Unfortunately I am super swamped at work so I have precious little
  time to work on this, which is what explains my silence.

 No problem, we've all been there.
 
 [Band aid: More memory]

  The extra memory helped a lot, but it still OOM with about 180 clients
  using it.

 You stated earlier that you has a solr cluster and your total(?) index
 size was 35GB, with each register being between 15k and 30k. I am
 using the quotes to signify that it is unclear what you mean. Is your
 cluster multiple machines (I'm guessing no), multiple Solr's, cores,
 shards or maybe just a single instance prepared for later distribution?
 Is a register a core, shard or a simply logical part (one client's data)
 of the index?

 If each client has their own core or shard, that would mean that each
 client uses more than 25GB/180 bytes ~= 142MB of heap to access 35GB/180
 ~= 200MB of index. That sounds quite high and you would need a very
 heavy facet to reach that.

 If you could grep UnInverted from the Solr log file and paste the
 entries here, that would help to clarify things.


 Another explanation for the large amount of memory presents itself if
 you use a single index: If each of your clients facet on at least one
 fields specific to the client (client123_persons or something like
 that), then your memory usage goes through the roof.

 Assuming an index with 10M documents, each with 5 references to a modest
 10K unique values in a facet field, the simplified formula
   #documents*log2(#references) + #references*log2(#unique_values) bit
 tells us that this takes at least 110MB with field cache based faceting.

 180 clients @ 110MB ~= 20GB. As that is a theoretical low, we can at
 least double that. This fits neatly with your new heap of 64GB.


 If my guessing is correct, you can solve your memory problems very
 easily by sharing _all_ the facet fields between your clients.
 This should bring your memory usage down to a few GB.

 You are probably already restricting their searches to their own data by
 filtering, so this should not influence the returned facet values and
 counts, as compared to separate fields.

 This is very similar to the thread Facets with 5000 facet fields BTW.

  Today I finally managed to set up a test core so I can begin to play
  around with docValues.

 If you are using a single index with the individual-facet-fields for
 each client approach, the DocValues will also have scaling issues, as
 the amount of values (of which the majority will be null) will be
   #clients*#documents*#facet_fields
 This means that the adding a new client will be progressively more
 expensive.

 On the other hand, if you use a lot of small shards, DocValues should
 work for you.

 Regards,
 Toke Eskildsen





-- 
Med venlig hilsen / Best regards

*John Nielsen*
Programmer



*MCB A/S*
Enghaven 15
DK-7500 Holstebro

Kundeservice: +45 9610 2824
p...@mcb.dk
www.mcb.dk


Re: Solr using a ridiculous amount of memory

2013-03-25 Thread John Nielsen
I apologize for the slow reply. Today has been killer. I will reply to
everyone as soon as I get the time.

I am having difficulties understanding how docValues work.

Should I only add docValues to the fields that I actually use for sorting
and faceting, or to all fields?

Will the docValues magic apply only to the fields I activate docValues on, or
to the entire document when sorting/faceting on a field that has docValues
activated?

I'm not even sure which question to ask. I am struggling to understand this
on a conceptual level.
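(In case it helps: as far as I can tell, Robert's two corrections below boil
down to something like this in schema.xml, with made-up names:

<fieldType name="string_dv_disk" class="solr.StrField" docValuesFormat="Disk"/>
<field name="my_facet_field" type="string_dv_disk" indexed="true" stored="false" multiValued="true" docValues="true"/>

i.e. docValues="true" (lowercase d) on the field, and docValuesFormat="Disk"
on the field type if the bulk of the data should stay out of the heap.)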


On Sun, Mar 24, 2013 at 7:11 PM, Robert Muir rcm...@gmail.com wrote:

 On Sun, Mar 24, 2013 at 4:19 AM, John Nielsen j...@mcb.dk wrote:

  Schema with DocValues attempt at solving problem:
  http://pastebin.com/Ne23NnW4
  Config: http://pastebin.com/x1qykyXW
 

 This schema isn't using docvalues, due to a typo in your config.
 it should not be DocValues=true but docValues=true.

 Are you not getting an error? Solr needs to throw exception if you
 provide invalid attributes to the field. Nothing is more frustrating
 than having a typo or something in your configuration and solr just
 ignores this, reports no error, and doesnt work the way you want.
 I'll look into this (I already intend to add these checks to analysis
 factories for the same reason).

 Separately, if you really want the terms data and so on to remain on
 disk, it is not enough to just enable docvalues for the field. The
 default implementation uses the heap. So if you want that, you need to
 set docValuesFormat=Disk on the fieldtype. This will keep the
 majority of the data on disk, and only some key datastructures in heap
 memory. This might have significant performance impact depending upon
 what you are doing so you need to test that.




-- 
Med venlig hilsen / Best regards

*John Nielsen*
Programmer



*MCB A/S*
Enghaven 15
DK-7500 Holstebro

Kundeservice: +45 9610 2824
p...@mcb.dk
www.mcb.dk


Solr using a ridiculous amount of memory

2013-03-24 Thread John Nielsen
Hello all,

We are running a solr cluster which is now running solr-4.2.

The index is about 35GB on disk with each "register" between 15k and 30k.
(This is simply the size of a full XML reply of one register. I'm not sure
how to measure it otherwise.)

Our memory requirements are running amok. We have less than a quarter of
our customers running now and even though we have allocated 25GB to the JVM
already, we are still seeing daily OOM crashes. We used to just allocate
more memory to the JVM, but with the way solr is scaling, we would need
well over 100GB of memory on each node to finish the project, and that's
just not going to happen. I need to lower the memory requirements somehow.

I can see from the memory dumps we've done that the field cache is by far
the biggest sinner. Of special interest to me is the recent introduction of
DocValues which supposedly mitigates this issue by using memory outside the
JVM. I just can't, because of lack of documentation, seem to make it work.

We do a lot of faceting. One client facets on about 50.000 docs of approx
30k each on 5 fields. I understand that this is VERY memory intensive.

Schema with DocValues attempt at solving problem:
http://pastebin.com/Ne23NnW4
Config: http://pastebin.com/x1qykyXW

The cache is pretty well tuned. Any lower and I get evictions.
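(To be clear, "the cache" here means the caches defined in solrconfig.xml,
e.g. a filterCache entry along the lines of

<filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="0"/>

with sizes that are illustrative rather than our actual values. The Lucene
FieldCache that is eating the memory is not one of these; it is internal and
has no configurable size limit.)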

Come hell or high water, my JVM memory requirements must come down. Simply
moving some memory load outside of the JVM would be awesome! Making it not
use the field cache for anything would also (probably) work for me. I
thought about killing off my other caches, but from the dumps, they just
don't seem to use that much memory.

I am at my wits end. Any help would be sorely appreciated.

-- 
Med venlig hilsen / Best regards

*John Nielsen*
Programmer



*MCB A/S*
Enghaven 15
DK-7500 Holstebro

Kundeservice: +45 9610 2824
p...@mcb.dk
www.mcb.dk


Re: SOLR - Documents with large number of fields ~ 450

2013-03-22 Thread John Nielsen
 with the on disk option.

Could you elaborate on that?

On 22/03/2013 05.25, Mark Miller markrmil...@gmail.com wrote:

 You might try using docvalues with the on disk option and try and let the
 OS manage all the memory needed for all the faceting/sorting. This would
 require Solr 4.2.

 - Mark

 On Mar 21, 2013, at 2:56 AM, kobe.free.wo...@gmail.com wrote:

  Hello All,
 
  Scenario:
 
  My data model consist of approx. 450 fields with different types of
 data. We
  want to include each field for indexing as a result it will create a
 single
  SOLR document with *450 fields*. The total of number of records in the
 data
  set is *755K*. We will be using the features like faceting and sorting on
  approx. 50 fields.
 
  We are planning to use SOLR 4.1. Following is the hardware configuration
 of
  the web server that we plan to install SOLR on:-
 
  CPU: 2 x Dual Core (4 cores) | RAM: 12GB | Storage: 212 GB
 
  Questions :
 
  1) What's the best approach when dealing with documents with a large number
  of fields? What's the drawback of having a single document with a very large
  number of fields? Does SOLR support documents with a large number of fields,
  as in my case?

  2) Will there be any performance issue if I define all of the 450 fields for
  indexing? Also if faceting is done on 50 fields with documents having a
  large number of fields and a huge number of records?

  3) The names of the fields in the data set are quite lengthy, around 60
  characters. Will it be a problem defining fields with such long names in
  the schema file? Is there any best practice to be followed related to naming
  conventions? Will big field names create problems during querying?
 
  Thanks!
 
 
 
  --
  View this message in context:
 http://lucene.472066.n3.nabble.com/SOLR-Documents-with-large-number-of-fields-450-tp4049633.html
  Sent from the Solr - User mailing list archive at Nabble.com.




Re: Facets with 5000 facet fields

2013-03-21 Thread John Nielsen
It looks like docvalues might solve a problem we have. (sorry for the
thread jacking)

I looked for info on it on the wiki, but could not find any.

Is there any documentation done on it yet?




On Wed, Mar 20, 2013 at 6:09 PM, Mark Miller markrmil...@gmail.com wrote:


 On Mar 20, 2013, at 11:29 AM, Chris Hostetter hossman_luc...@fucit.org
 wrote:

  Not true ... per segment FIeldCache support is available in solr
  faceting, you just have to specify facet.method=fcs (FieldCache per
  Segment)

 Also, if you use docvalues in 4.2, Robert tells me it is uses a new per
 seg faceting method that may have some better nrt characteristics than fcs.
 I have not played with it yet but hope to soon.

 - Mark




-- 
Med venlig hilsen / Best regards

*John Nielsen*
Programmer



*MCB A/S*
Enghaven 15
DK-7500 Holstebro

Kundeservice: +45 9610 2824
p...@mcb.dk
www.mcb.dk


Re: Data from deleted from Solr (Solr cloud)

2012-12-20 Thread John Nielsen
Yeah, I ran into this issue myself with solr-4.0.0.

To fix it, I had to compile my own version from the solr-4x branch. That
is, I assume it's fixed as I have been unable to replicate it after the
switch.

I'm afraid you will have to reindex your data.


-- 
Med venlig hilsen / Best regards

*John Nielsen*
Programmer



*MCB A/S*
Enghaven 15
DK-7500 Holstebro

Kundeservice: +45 9610 2824
p...@mcb.dk
www.mcb.dk


On Wed, Dec 19, 2012 at 5:08 PM, shreejay shreej...@gmail.com wrote:

 Hi All,

  I have a SolrCloud setup with 3 shards. Each shard has 2 instances (2
  servers, each running an instance of Solr).

  Let's say I had instance1 and instance2 in shard1. At some point, instance2
  went down due to OOM (out of memory). instance1 for some reason was not
  replicating the data properly and when it became the leader, it had only
  around 1% of the data that instance2 had. I restarted instance2, and hoped
  that instance1 would replicate from it, but instead instance2 replicated
  from instance1 and ended up deleting the original index folder it had. There
  were around 2 million documents in that instance.

  Can any of the SolrCloud users give any hints on whether I can recover this data?




 --Shreejay



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Data-from-deleted-from-Solr-Solr-cloud-tp4028055.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Strange data-loss problem on one of our cores

2012-12-18 Thread John Nielsen
I built a Solr version from the solr-4x branch yesterday and so far am
unable to replicate the problems I had before.

I am cautiously optimistic that the problem has been resolved. If i run
into any more problems, I'll let you all know.


-- 
Med venlig hilsen / Best regards

*John Nielsen*
Programmer



*MCB A/S*
Enghaven 15
DK-7500 Holstebro

Kundeservice: +45 9610 2824
p...@mcb.dk
www.mcb.dk



On Fri, Dec 14, 2012 at 7:33 PM, Markus Jelsma
markus.jel...@openindex.io wrote:

  Mark, no issue has been filed. That cluster runs a checkout from around the
  end of July/beginning of August. I'm in the process of including another
  cluster in the indexing and removal of documents besides the old production
  clusters. I'll start writing to that one Tuesday or so.
  If I notice a discrepancy after some time I am sure to report it. I doubt
  I'll find it before 2013, if the problem is still there.


 -Original message-
  From:Mark Miller markrmil...@gmail.com
  Sent: Fri 14-Dec-2012 19:05
  To: solr-user@lucene.apache.org
  Subject: Re: Strange data-loss problem on one of our cores
 
  Have you filed a JIRA issue for this that I don't remember Markus?
 
  We need to make sure this is fixed.
 
  Any idea around when the trunk version came from? Before or after 4.0?
 
  - Mark
 
  On Dec 14, 2012, at 6:36 AM, Markus Jelsma markus.jel...@openindex.io
 wrote:
 
   We did not solve it but reindexing can remedy the problem.
  
   -Original message-
   From:John Nielsen j...@mcb.dk
   Sent: Fri 14-Dec-2012 12:31
   To: solr-user@lucene.apache.org
   Subject: Re: Strange data-loss problem on one of our cores
  
   How did you solve the problem?
  
  
   --
   Med venlig hilsen / Best regards
  
   *John Nielsen*
   Programmer
  
  
  
   *MCB A/S*
   Enghaven 15
   DK-7500 Holstebro
  
   Kundeservice: +45 9610 2824
   p...@mcb.dk
   www.mcb.dk
  
  
  
   On Fri, Dec 14, 2012 at 12:04 PM, Markus Jelsma
    markus.jel...@openindex.io wrote:
  
   FYI, we observe the same issue, after some time (days, months) a
 cluster
   running an older trunk version has at least two shards where the
 leader and
   the replica do not contain the same number of records. No recovery is
   attempted, it seems it thinks everything is alright. Also, one core
 of one
   of the unsynced shards waits forever loading
   /replication?command=detailwt=json, other cores load it in a few
 ms. Both
   cores of another unsynced shard does not show this problem.
  
   -Original message-
   From:John Nielsen j...@mcb.dk
   Sent: Fri 14-Dec-2012 11:50
   To: solr-user@lucene.apache.org
   Subject: Re: Strange data-loss problem on one of our cores
  
   I did a manual commit, and we are still missing docs, so it doesn't
 look
   like the search race condition you mention.
  
    My boss wasn't happy when I mentioned that I wanted to try out
  unreleased
    code. I'll get him won over though and return with my findings. It
 will
   probably be some time next week.
  
   Thanks for your help.
  
  
   --
   Med venlig hilsen / Best regards
  
   *John Nielsen*
   Programmer
  
  
  
   *MCB A/S*
   Enghaven 15
   DK-7500 Holstebro
  
   Kundeservice: +45 9610 2824
   p...@mcb.dk
   www.mcb.dk
  
  
  
   On Thu, Dec 13, 2012 at 4:10 PM, Mark Miller markrmil...@gmail.com
 
   wrote:
  
   Couple things to start:
  
   By default SolrCloud distributes updates a doc at a time. So if you
   have 1
   shard, whatever node you index too, it will send updates to the
 other.
   Replication is only used for recovery, not distributing data. So
 for
   some
   reason, there is an IOException when it tries to forward.
  
   The other issue is not something that Ive seen reported. Can/did
 you
   try
   and do another hard commit to make sure you had the latest search
 open
   when
   checking the # of docs on each node? There was previously a race
 around
   commit that could cause some issues around expected visibility.
  
   If you are able to, you might try out a nightly build - 4.1 will be
   ready
   very soon and has numerous bug fixes for SolrCloud.
  
   - Mark
  
   On Dec 13, 2012, at 9:53 AM, John Nielsen j...@mcb.dk wrote:
  
   Hi all,
  
   We are seeing a strange problem on our 2-node solr4 cluster. This
   problem
   has resultet in data loss.
  
   We have two servers, varnish01 and varnish02. Zookeeper is
 running on
   varnish02, but in a separate jvm.
  
   We index directly to varnish02 and we read from varnish01. Data is
   thus
   replicated from varnish02 to varnish01.
  
   I found this in the varnish01 log:
  
   *INFO: [default1_Norwegian] webapp=/solr path=/update
   params={distrib.from=
  
  
  
 http://varnish02.lynero.net:8000/solr/default1_Norwegian/update.distrib=TOLEADERwt=javabinversion=2
   }
   status=0 QTime=42
   Dec 13, 2012 12:23:36 PM org.apache.solr.core.SolrCore execute
   INFO: [default1_Norwegian] webapp=/solr path=/update
   params={distrib.from=
  
  
  
 http://varnish02.lynero.net:8000/solr

Re: Strange data-loss problem on one of our cores

2012-12-14 Thread John Nielsen
I did a manual commit, and we are still missing docs, so it doesn't look
like the search race condition you mention.
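
(For anyone replaying this: an explicit hard commit from SolrJ might look roughly like the sketch below. The core URL is only illustrative, built from host names that appear later in the thread, and this is not necessarily how the commit was actually issued here. waitSearcher=true makes the call block until the new searcher is open, so a doc-count check done right afterwards is not racing the commit.)

import org.apache.solr.client.solrj.impl.HttpSolrServer;

// Illustrative only: an explicit hard commit against one core, so that a
// later numFound check is looking at a fresh searcher.
public class ManualCommit {
    public static void main(String[] args) throws Exception {
        // Core URL is a placeholder built from host names seen in this thread.
        HttpSolrServer core =
            new HttpSolrServer("http://varnish01.lynero.net:8000/solr/default1_Norwegian");
        core.commit(true, true); // waitFlush=true, waitSearcher=true
        core.shutdown();
    }
}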

My boss wasn't happy when I mentioned that I wanted to try out unreleased
code. I'll get him won over though and return with my findings. It will
probably be some time next week.

Thanks for your help.


-- 
Med venlig hilsen / Best regards

*John Nielsen*
Programmer



*MCB A/S*
Enghaven 15
DK-7500 Holstebro

Kundeservice: +45 9610 2824
p...@mcb.dk
www.mcb.dk



On Thu, Dec 13, 2012 at 4:10 PM, Mark Miller markrmil...@gmail.com wrote:

 Couple things to start:

 By default SolrCloud distributes updates a doc at a time. So if you have 1
 shard, whatever node you index to, it will send updates to the other.
 Replication is only used for recovery, not distributing data. So for some
 reason, there is an IOException when it tries to forward.

 The other issue is not something that I've seen reported. Can/did you try
 and do another hard commit to make sure you had the latest searcher open when
 checking the # of docs on each node? There was previously a race around
 commit that could cause some issues around expected visibility.

 If you are able to, you might try out a nightly build - 4.1 will be ready
 very soon and has numerous bug fixes for SolrCloud.

 - Mark

 On Dec 13, 2012, at 9:53 AM, John Nielsen j...@mcb.dk wrote:

  Hi all,
 
  We are seeing a strange problem on our 2-node solr4 cluster. This problem
  has resulted in data loss.
 
  We have two servers, varnish01 and varnish02. Zookeeper is running on
  varnish02, but in a separate jvm.
 
  We index directly to varnish02 and we read from varnish01. Data is thus
  replicated from varnish02 to varnish01.
 
  I found this in the varnish01 log:
 
  *INFO: [default1_Norwegian] webapp=/solr path=/update
 params={distrib.from=
 
 http://varnish02.lynero.net:8000/solr/default1_Norwegian/update.distrib=TOLEADERwt=javabinversion=2
 }
  status=0 QTime=42
  Dec 13, 2012 12:23:36 PM org.apache.solr.core.SolrCore execute
  INFO: [default1_Norwegian] webapp=/solr path=/update
 params={distrib.from=
 
 http://varnish02.lynero.net:8000/solr/default1_Norwegian/update.distrib=TOLEADERwt=javabinversion=2
 }
  status=0 QTime=41
  Dec 13, 2012 12:23:36 PM org.apache.solr.core.SolrCore execute
  INFO: [default1_Norwegian] webapp=/solr path=/update
 params={distrib.from=
 
 http://varnish02.lynero.net:8000/solr/default1_Norwegian/update.distrib=TOLEADERwt=javabinversion=2
 }
  status=0 QTime=33
  Dec 13, 2012 12:23:36 PM org.apache.solr.core.SolrCore execute
  INFO: [default1_Norwegian] webapp=/solr path=/update
 params={distrib.from=
 
 http://varnish02.lynero.net:8000/solr/default1_Norwegian/update.distrib=TOLEADERwt=javabinversion=2
 }
  status=0 QTime=33
  Dec 13, 2012 12:23:39 PM org.apache.solr.common.SolrException log
  SEVERE: shard update error StdNode:
 
 http://varnish02.lynero.net:8000/solr/default1_Norwegian/:org.apache.solr.client.solrj.SolrServerException
 :
  IOException occured when talking to server at:
  http://varnish02.lynero.net:8000/solr/default1_Norwegian
 at
 
 org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:413)
 at
 
 org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
 at
 
 org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:335)
 at
 
 org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:309)
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
 at java.util.concurrent.FutureTask.run(FutureTask.java:166)
 at
  java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
 at java.util.concurrent.FutureTask.run(FutureTask.java:166)
 at
 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
 at
 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
 at java.lang.Thread.run(Thread.java:636)
  Caused by: org.apache.http.NoHttpResponseException: The target server
  failed to respond
 at
 
 org.apache.http.impl.conn.DefaultResponseParser.parseHead(DefaultResponseParser.java:101)
 at
 
 org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:252)
 at
 
 org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:282)
 at
 
 org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:247)
 at
 
 org.apache.http.impl.conn.AbstractClientConnAdapter.receiveResponseHeader(AbstractClientConnAdapter.java:216)
 at
 
 org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:298)
 at
 
 org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
 at
 
 org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:647

Re: Strange data-loss problem on one of our cores

2012-12-14 Thread John Nielsen
How did you solve the problem?


-- 
Med venlig hilsen / Best regards

*John Nielsen*
Programmer



*MCB A/S*
Enghaven 15
DK-7500 Holstebro

Kundeservice: +45 9610 2824
p...@mcb.dk
www.mcb.dk



On Fri, Dec 14, 2012 at 12:04 PM, Markus Jelsma
markus.jel...@openindex.io wrote:

 FYI, we observe the same issue, after some time (days, months) a cluster
 running an older trunk version has at least two shards where the leader and
 the replica do not contain the same number of records. No recovery is
 attempted, it seems it thinks everything is alright. Also, one core of one
 of the unsynced shards waits forever loading
 /replication?command=detailwt=json, other cores load it in a few ms. Both
 cores of another unsynced shard does not show this problem.

 -Original message-
  From:John Nielsen j...@mcb.dk
  Sent: Fri 14-Dec-2012 11:50
  To: solr-user@lucene.apache.org
  Subject: Re: Strange data-loss problem on one of our cores
 
  I did a manual commit, and we are still missing docs, so it doesn't look
  like the search race condition you mention.
 
  My boss wasn't happy when i mentioned that I wanted to try out unreleased
  code. Ill get him won over though and return with my findings. It will
  probably be some time next week.
 
  Thanks for your help.
 
 
  --
  Med venlig hilsen / Best regards
 
  *John Nielsen*
  Programmer
 
 
 
  *MCB A/S*
  Enghaven 15
  DK-7500 Holstebro
 
  Kundeservice: +45 9610 2824
  p...@mcb.dk
  www.mcb.dk
 
 
 
  On Thu, Dec 13, 2012 at 4:10 PM, Mark Miller markrmil...@gmail.com
 wrote:
 
   Couple things to start:
  
   By default SolrCloud distributes updates a doc at a time. So if you
 have 1
   shard, whatever node you index too, it will send updates to the other.
   Replication is only used for recovery, not distributing data. So for
 some
   reason, there is an IOException when it tries to forward.
  
   The other issue is not something that Ive seen reported. Can/did you
 try
   and do another hard commit to make sure you had the latest search open
 when
   checking the # of docs on each node? There was previously a race around
   commit that could cause some issues around expected visibility.
  
   If you are able to, you might try out a nightly build - 4.1 will be
 ready
   very soon and has numerous bug fixes for SolrCloud.
  
   - Mark
  
   On Dec 13, 2012, at 9:53 AM, John Nielsen j...@mcb.dk wrote:
  
Hi all,
   
We are seeing a strange problem on our 2-node solr4 cluster. This
 problem
has resultet in data loss.
   
We have two servers, varnish01 and varnish02. Zookeeper is running on
varnish02, but in a separate jvm.
   
We index directly to varnish02 and we read from varnish01. Data is
 thus
replicated from varnish02 to varnish01.
   
I found this in the varnish01 log:
   
*INFO: [default1_Norwegian] webapp=/solr path=/update
   params={distrib.from=
   
  
 http://varnish02.lynero.net:8000/solr/default1_Norwegian/update.distrib=TOLEADERwt=javabinversion=2
   }
status=0 QTime=42
Dec 13, 2012 12:23:36 PM org.apache.solr.core.SolrCore execute
INFO: [default1_Norwegian] webapp=/solr path=/update
   params={distrib.from=
   
  
 http://varnish02.lynero.net:8000/solr/default1_Norwegian/update.distrib=TOLEADERwt=javabinversion=2
   }
status=0 QTime=41
Dec 13, 2012 12:23:36 PM org.apache.solr.core.SolrCore execute
INFO: [default1_Norwegian] webapp=/solr path=/update
   params={distrib.from=
   
  
 http://varnish02.lynero.net:8000/solr/default1_Norwegian/update.distrib=TOLEADERwt=javabinversion=2
   }
status=0 QTime=33
Dec 13, 2012 12:23:36 PM org.apache.solr.core.SolrCore execute
INFO: [default1_Norwegian] webapp=/solr path=/update
   params={distrib.from=
   
  
 http://varnish02.lynero.net:8000/solr/default1_Norwegian/update.distrib=TOLEADERwt=javabinversion=2
   }
status=0 QTime=33
Dec 13, 2012 12:23:39 PM org.apache.solr.common.SolrException log
SEVERE: shard update error StdNode:
   
  
 http://varnish02.lynero.net:8000/solr/default1_Norwegian/:org.apache.solr.client.solrj.SolrServerException
   :
IOException occured when talking to server at:
http://varnish02.lynero.net:8000/solr/default1_Norwegian
   at
   
  
 org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:413)
   at
   
  
 org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
   at
   
  
 org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:335)
   at
   
  
 org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:309)
   at
 java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
   at java.util.concurrent.FutureTask.run(FutureTask.java:166)
   at
   
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
   at
 java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
   at java.util.concurrent.FutureTask.run(FutureTask.java:166

Re: Strange data-loss problem on one of our cores

2012-12-14 Thread John Nielsen
I'm building a simple tool which will help us monitor the Solr cores for
this problem. Basically it does a q=*:* against each core on both servers and
compares the numFound of each result. The problem is that since this is a cloud
setup, I can't be sure which server gives me the result. Is there a
parameter I can add to the GET requests that will lock the request to a
specific node in the cluster, treating the server receiving the request as
a standalone server as opposed to a member of a cluster?

I tried googling it without luck.
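
(A minimal sketch of the kind of checker described above, not the actual tool. The core URLs are placeholders, and the distrib=false parameter is a standard Solr request parameter that is not mentioned in the replies below; it keeps the query on the core that receives it rather than letting the node fan it out across the cloud.)

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

// Sketch of such a checker (not the actual tool). Core URLs are placeholders;
// distrib=false keeps each query on the core that receives it instead of
// letting the node distribute the request across the cloud.
public class NumFoundChecker {
    public static void main(String[] args) throws Exception {
        String[] coreUrls = {
            "http://varnish01.lynero.net:8000/solr/default1_Norwegian",
            "http://varnish02.lynero.net:8000/solr/default1_Norwegian"
        };
        for (String url : coreUrls) {
            HttpSolrServer core = new HttpSolrServer(url);
            SolrQuery q = new SolrQuery("*:*");
            q.setRows(0);            // only numFound is needed
            q.set("distrib", false); // do not forward to other nodes
            long numFound = core.query(q).getResults().getNumFound();
            System.out.println(url + " numFound=" + numFound);
            core.shutdown();
        }
    }
}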



-- 
Med venlig hilsen / Best regards

*John Nielsen*
Programmer



*MCB A/S*
Enghaven 15
DK-7500 Holstebro

Kundeservice: +45 9610 2824
p...@mcb.dk
www.mcb.dk



On Fri, Dec 14, 2012 at 12:36 PM, Markus Jelsma
markus.jel...@openindex.io wrote:

 We did not solve it but reindexing can remedy the problem.

 -Original message-
  From:John Nielsen j...@mcb.dk
  Sent: Fri 14-Dec-2012 12:31
  To: solr-user@lucene.apache.org
  Subject: Re: Strange data-loss problem on one of our cores
 
  How did you solve the problem?
 
 
  --
  Med venlig hilsen / Best regards
 
  *John Nielsen*
  Programmer
 
 
 
  *MCB A/S*
  Enghaven 15
  DK-7500 Holstebro
 
  Kundeservice: +45 9610 2824
  p...@mcb.dk
  www.mcb.dk
 
 
 
  On Fri, Dec 14, 2012 at 12:04 PM, Markus Jelsma
  markus.jel...@openindex.iowrote:
 
   FYI, we observe the same issue, after some time (days, months) a
 cluster
   running an older trunk version has at least two shards where the
 leader and
   the replica do not contain the same number of records. No recovery is
   attempted, it seems it thinks everything is alright. Also, one core of
 one
   of the unsynced shards waits forever loading
   /replication?command=detailwt=json, other cores load it in a few ms.
 Both
   cores of another unsynced shard does not show this problem.
  
   -Original message-
From:John Nielsen j...@mcb.dk
Sent: Fri 14-Dec-2012 11:50
To: solr-user@lucene.apache.org
Subject: Re: Strange data-loss problem on one of our cores
   
I did a manual commit, and we are still missing docs, so it doesn't
 look
like the search race condition you mention.
   
My boss wasn't happy when i mentioned that I wanted to try out
 unreleased
code. Ill get him won over though and return with my findings. It
 will
probably be some time next week.
   
Thanks for your help.
   
   
--
Med venlig hilsen / Best regards
   
*John Nielsen*
Programmer
   
   
   
*MCB A/S*
Enghaven 15
DK-7500 Holstebro
   
Kundeservice: +45 9610 2824
p...@mcb.dk
www.mcb.dk
   
   
   
On Thu, Dec 13, 2012 at 4:10 PM, Mark Miller markrmil...@gmail.com
   wrote:
   
 Couple things to start:

 By default SolrCloud distributes updates a doc at a time. So if you
   have 1
 shard, whatever node you index too, it will send updates to the
 other.
 Replication is only used for recovery, not distributing data. So
 for
   some
 reason, there is an IOException when it tries to forward.

 The other issue is not something that Ive seen reported. Can/did
 you
   try
 and do another hard commit to make sure you had the latest search
 open
   when
 checking the # of docs on each node? There was previously a race
 around
 commit that could cause some issues around expected visibility.

 If you are able to, you might try out a nightly build - 4.1 will be
   ready
 very soon and has numerous bug fixes for SolrCloud.

 - Mark

 On Dec 13, 2012, at 9:53 AM, John Nielsen j...@mcb.dk wrote:

  Hi all,
 
  We are seeing a strange problem on our 2-node solr4 cluster. This
   problem
  has resultet in data loss.
 
  We have two servers, varnish01 and varnish02. Zookeeper is
 running on
  varnish02, but in a separate jvm.
 
  We index directly to varnish02 and we read from varnish01. Data
 is
   thus
  replicated from varnish02 to varnish01.
 
  I found this in the varnish01 log:
 
  *INFO: [default1_Norwegian] webapp=/solr path=/update
 params={distrib.from=
 

  
 http://varnish02.lynero.net:8000/solr/default1_Norwegian/update.distrib=TOLEADERwt=javabinversion=2
 }
  status=0 QTime=42
  Dec 13, 2012 12:23:36 PM org.apache.solr.core.SolrCore execute
  INFO: [default1_Norwegian] webapp=/solr path=/update
 params={distrib.from=
 

  
 http://varnish02.lynero.net:8000/solr/default1_Norwegian/update.distrib=TOLEADERwt=javabinversion=2
 }
  status=0 QTime=41
  Dec 13, 2012 12:23:36 PM org.apache.solr.core.SolrCore execute
  INFO: [default1_Norwegian] webapp=/solr path=/update
 params={distrib.from=
 

  
 http://varnish02.lynero.net:8000/solr/default1_Norwegian/update.distrib=TOLEADERwt=javabinversion=2
 }
  status=0 QTime=33
  Dec 13, 2012 12:23:36 PM org.apache.solr.core.SolrCore execute
  INFO: [default1_Norwegian] webapp=/solr

Re: Strange data-loss problem on one of our cores

2012-12-14 Thread John Nielsen
Awesome!

http://host:port/solr/admin/cores is exactly what I needed!
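
(For reference, a rough SolrJ equivalent of walking the cores handler by hand. It assumes Solr 4.x and that each core's status keeps its document count under an "index" section with a "numDocs" entry; treat that layout as an assumption and check it against the raw /solr/admin/cores output.)

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.CoreAdminRequest;
import org.apache.solr.client.solrj.response.CoreAdminResponse;
import org.apache.solr.common.util.NamedList;

// Sketch of reading per-core document counts through the cores handler.
// Host/port are placeholders; the "index" -> "numDocs" nesting is an
// assumption about the 4.x status response, so verify before relying on it.
public class CoreStatusCheck {
    public static void main(String[] args) throws Exception {
        HttpSolrServer node = new HttpSolrServer("http://varnish01.lynero.net:8000/solr");
        CoreAdminResponse status = CoreAdminRequest.getStatus(null, node); // null = all cores
        for (int i = 0; i < status.getCoreStatus().size(); i++) {
            String coreName = status.getCoreStatus().getName(i);
            NamedList<Object> indexInfo =
                (NamedList<Object>) status.getCoreStatus(coreName).get("index");
            System.out.println(coreName + " numDocs=" + indexInfo.get("numDocs"));
        }
        node.shutdown();
    }
}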



-- 
Med venlig hilsen / Best regards

*John Nielsen*
Programmer



*MCB A/S*
Enghaven 15
DK-7500 Holstebro

Kundeservice: +45 9610 2824
p...@mcb.dk
www.mcb.dk



On Fri, Dec 14, 2012 at 1:21 PM, Markus Jelsma
markus.jel...@openindex.io wrote:

 You must use the core's name and not the collection name, so you have to
 know which core is on which server.
 http://host:port/solr/corename/select

 You can use the cores handler to find out about the cores on the node:
 http://host:port/solr/admin/cores

 You can also use luke for this. It returns the same stats as in the
 interface:
 http://host:port/solr/corename/admin/luke

 -Original message-
  From:John Nielsen j...@mcb.dk
  Sent: Fri 14-Dec-2012 13:16
  To: solr-user@lucene.apache.org
  Subject: Re: Strange data-loss problem on one of our cores
 
  I'm building a simple tool which will help us monitor the solr cores for
  this problem. Basically it does a q=*:* on both servers on each cores and
  compares numFound of each result. Problem is that since this is a cloud
  setup, i can't be sure which server gets me the result. Is there a
  parameter I can add to the GET requests that will lock the request to a
  specific node in the cluster, treating the server receiving the request
 as
  a standalone server as opposed to a member of a cluster?
 
  I tried googeling it without luck.
 
 
 
  --
  Med venlig hilsen / Best regards
 
  *John Nielsen*
  Programmer
 
 
 
  *MCB A/S*
  Enghaven 15
  DK-7500 Holstebro
 
  Kundeservice: +45 9610 2824
  p...@mcb.dk
  www.mcb.dk
 
 
 
  On Fri, Dec 14, 2012 at 12:36 PM, Markus Jelsma
  markus.jel...@openindex.iowrote:
 
   We did not solve it but reindexing can remedy the problem.
  
   -Original message-
From:John Nielsen j...@mcb.dk
Sent: Fri 14-Dec-2012 12:31
To: solr-user@lucene.apache.org
Subject: Re: Strange data-loss problem on one of our cores
   
How did you solve the problem?
   
   
--
Med venlig hilsen / Best regards
   
*John Nielsen*
Programmer
   
   
   
*MCB A/S*
Enghaven 15
DK-7500 Holstebro
   
Kundeservice: +45 9610 2824
p...@mcb.dk
www.mcb.dk
   
   
   
On Fri, Dec 14, 2012 at 12:04 PM, Markus Jelsma
markus.jel...@openindex.iowrote:
   
 FYI, we observe the same issue, after some time (days, months) a
   cluster
 running an older trunk version has at least two shards where the
   leader and
 the replica do not contain the same number of records. No recovery
 is
 attempted, it seems it thinks everything is alright. Also, one
 core of
   one
 of the unsynced shards waits forever loading
 /replication?command=detailwt=json, other cores load it in a few
 ms.
   Both
 cores of another unsynced shard does not show this problem.

 -Original message-
  From:John Nielsen j...@mcb.dk
  Sent: Fri 14-Dec-2012 11:50
  To: solr-user@lucene.apache.org
  Subject: Re: Strange data-loss problem on one of our cores
 
  I did a manual commit, and we are still missing docs, so it
 doesn't
   look
  like the search race condition you mention.
 
  My boss wasn't happy when i mentioned that I wanted to try out
   unreleased
  code. Ill get him won over though and return with my findings. It
   will
  probably be some time next week.
 
  Thanks for your help.
 
 
  --
  Med venlig hilsen / Best regards
 
  *John Nielsen*
  Programmer
 
 
 
  *MCB A/S*
  Enghaven 15
  DK-7500 Holstebro
 
  Kundeservice: +45 9610 2824
  p...@mcb.dk
  www.mcb.dk
 
 
 
  On Thu, Dec 13, 2012 at 4:10 PM, Mark Miller 
 markrmil...@gmail.com
 wrote:
 
   Couple things to start:
  
   By default SolrCloud distributes updates a doc at a time. So
 if you
 have 1
   shard, whatever node you index too, it will send updates to the
   other.
   Replication is only used for recovery, not distributing data.
 So
   for
 some
   reason, there is an IOException when it tries to forward.
  
   The other issue is not something that Ive seen reported.
 Can/did
   you
 try
   and do another hard commit to make sure you had the latest
 search
   open
 when
   checking the # of docs on each node? There was previously a
 race
   around
   commit that could cause some issues around expected visibility.
  
   If you are able to, you might try out a nightly build - 4.1
 will be
 ready
   very soon and has numerous bug fixes for SolrCloud.
  
   - Mark
  
   On Dec 13, 2012, at 9:53 AM, John Nielsen j...@mcb.dk wrote:
  
Hi all,
   
We are seeing a strange problem on our 2-node solr4 cluster.
 This
 problem
has resultet in data loss.
   
We have two servers, varnish01 and varnish02

Strange data-loss problem on one of our cores

2012-12-13 Thread John Nielsen
,+variant_of_item_guid+ascgroup.distributed.first=truefacet.limit=1000q.alt=*:*q.alt=*:*distrib=falsefacet.method=enumversion=2df=textfl=docidshard.url=
varnish02.lynero.net:8000/solr/default1_Danish/|varnish01.lynero.net:8000/solr/default1_Danish/NOW=1355397828395group.field=groupby_variant_of_item_guidfacet.field=itemgroups_int_mvfq=site_guid:(11440)fq=item_type:(PRODUCT)fq=language_guid:(1)fq=item_group_56823_combination:(*)fq=item_group_45879_combination:(*)fq=is_searchable:(True)querytype=Technicalmm=100%25facet.missing=ongroup.ngroups=truefacet.mincount=1qf=%0a++text^0.5+name^1.2+searchable_text^0.8+typeahead_text^1.0+keywords^1.1+item_no^5.0%0a++ranking1_text^1.0+ranking2_text^2.0+ranking3_text^3.0%0a+++wt=javabingroup.facet=truedefType=edismaxrows=0facet.sort=lexstart=0group=truegroup.sort=name+ascisShard=true}
status=0 QTime=8
Dec 13, 2012 12:23:48 PM org.apache.solr.common.cloud.ZkStateReader
updateClusterState
INFO: Updating cloud state from ZooKeeper... *

Which is picked up on varnish01:

*Dec 13, 2012 12:23:48 PM org.apache.solr.common.cloud.ZkStateReader$2
process
INFO: A cluster state change has occurred - updating...*

It looks like it replicated successfully, only it didn't. The
default1_Norwegian core on varnish01 now has 55.071 docs and the same core
on varnish02 has 35.088 docs.

I checked the log files for both JVMs and no stop-the-world GC was taking
place.

There is also nothing in the zookeeper log of interest that I can see.


-- 
Med venlig hilsen / Best regards

*John Nielsen*
Programmer



*MCB A/S*
Enghaven 15
DK-7500 Holstebro

Kundeservice: +45 9610 2824
p...@mcb.dk
www.mcb.dk


Re: SOLR4 cluster - strange CPU spike on slave

2012-12-05 Thread John Nielsen
Very interesting!

I've seen references to NRTCachingDirectory, MMapDirectory, FSDirectory,
RAMDirectory and NIOFSDirectory, and that's just what I can remember. I have
tried to search for more information about these, but I'm not having much
luck.

Is there a place where I can read up on these?

Med venlig hilsen / Best regards

*John Nielsen*
Programmer



*MCB A/S*
Enghaven 15
DK-7500 Holstebro

Kundeservice: +45 9610 2824
p...@mcb.dk
www.mcb.dk



On Wed, Dec 5, 2012 at 1:11 AM, Mark Miller markrmil...@gmail.com wrote:


 On Dec 4, 2012, at 2:25 AM, John Nielsen j...@mcb.dk wrote:

  The post about mmapdirectory is really interesting. We switched to using
  that from NRTCachingDirectory and am monitoring performance as well.
  Initially performance doesn't look stellar, but i suspect that we lack
  memory in the server to really make it shine.

 NRTCachingDirectory delegates to another directory and simply caches small
 segments in RAM - usually it delegates to MMapDirectory by default. So likely
 you won't notice any changes, because you have not likely really changed
 anything. NRTCachingDirectory simply helps in the NRT case and doesn't
 really hurt that I've seen in the std case. It's more like a help dir than
 a replacement.

 - Mark


Re: SOLR4 cluster - strange CPU spike on slave

2012-12-05 Thread John Nielsen
I'm not sure I understand why this is important. Too much memory would just
be unused.

This is what the heap looks like now:

Heap Configuration:
   MinHeapFreeRatio = 40
   MaxHeapFreeRatio = 70
   MaxHeapSize  = 17179869184 (16384.0MB)
   NewSize  = 21757952 (20.75MB)
   MaxNewSize   = 283508736 (270.375MB)
   OldSize  = 65404928 (62.375MB)
   NewRatio = 7
   SurvivorRatio= 8
   PermSize = 21757952 (20.75MB)
   MaxPermSize  = 176160768 (168.0MB)

Heap Usage:
New Generation (Eden + 1 Survivor Space):
   capacity = 255197184 (243.375MB)
   used = 108828496 (103.78694152832031MB)
   free = 146368688 (139.5880584716797MB)
   42.644865548359654% used
Eden Space:
   capacity = 226885632 (216.375MB)
   used = 83498424 (79.63030242919922MB)
   free = 143387208 (136.74469757080078MB)
   36.80198841326365% used
From Space:
   capacity = 28311552 (27.0MB)
   used = 25330072 (24.156639099121094MB)
   free = 2981480 (2.8433609008789062MB)
   89.46903370044849% used
To Space:
   capacity = 28311552 (27.0MB)
   used = 0 (0.0MB)
   free = 28311552 (27.0MB)
   0.0% used
concurrent mark-sweep generation:
   capacity = 16896360448 (16113.625MB)
   used = 12452710200 (11875.829887390137MB)
   free = 4443650248 (4237.795112609863MB)
   73.70054775005708% used
Perm Generation:
   capacity = 70578176 (67.30859375MB)
   used = 37652032 (35.90777587890625MB)
   free = 32926144 (31.40081787109375MB)
   53.347981109627995% used
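
(Aside, not from the thread: if attaching an external heap tool to the running JVM is inconvenient, roughly the same used/committed/max numbers can be logged from inside the process, for example like this.)

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Reads heap usage from the running JVM itself; an alternative to attaching
// an external tool to get comparable used/committed/max figures.
public class HeapLogger {
    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        System.out.printf("heap used=%dMB committed=%dMB max=%dMB%n",
                heap.getUsed() >> 20, heap.getCommitted() >> 20, heap.getMax() >> 20);
    }
}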



Med venlig hilsen / Best regards

*John Nielsen*
Programmer



*MCB A/S*
Enghaven 15
DK-7500 Holstebro

Kundeservice: +45 9610 2824
p...@mcb.dk
www.mcb.dk



On Thu, Nov 29, 2012 at 4:08 AM, Otis Gospodnetic 
otis.gospodne...@gmail.com wrote:

 If this is caused by index segment merging you should be able to see that
 very clearly on the Index report in SPM, where you would see sudden graph
 movement at the time of spike and corresponding to CPU and disk activity.
 I think uncommenting that infostream in solrconfig would also show it.

 Otis
 --
 SOLR Performance Monitoring - http://sematext.com/spm
 On Nov 28, 2012 9:20 PM, Erick Erickson erickerick...@gmail.com wrote:

  Am I reading this right? All you're doing on varnish1 is replicating to
 it?
  You're not searching or indexing? I'm sure I'm misreading this.
 
 
  The spike, which only lasts for a couple of minutes, sends the disks
  racing This _sounds_ suspiciously like segment merging, especially the
  disks racing bit. Or possibly replication. Neither of which make much
  sense. But is there any chance that somehow multiple commits are being
  issued? Of course if varnish1 is a slave, that shouldn't be happening
  either.
 
  And the whole bit about nothing going to the logs is just bizarre. I'm
  tempted to claim hardware gremlins, especially if you see nothing similar
  on varnish2. Or some other process is pegging the machine. All of which
 is
  a way of saying I have no idea
 
  Yours in bewilderment,
  Erick
 
 
 
  On Wed, Nov 28, 2012 at 6:15 AM, John Nielsen j...@mcb.dk wrote:
 
   I apologize for the late reply.
  
   The query load is more or less stable during the spikes. There are
 always
   fluctuations, but nothing on the order of magnitude that could explain
  this
   spike. In fact, the latest spike occured last night when there were
  almost
   noone using it.
  
   To test a hunch of mine, I tried to deactivate all caches by commenting
  out
   all cache entries in solrconfig.xml. It still spikes, so I dont think
 it
   has anything to do with cache warming or hits/misses or anything of the
   sort.
  
   One interesting thing GC though. This is our latest spike with cpu load
   (this server has 8 cores, so a load higher than 8 is potentially
   troublesome):
  
   2012.Nov.27 19:58:18  2.27
   2012.Nov.27 19:57:17  4.06
   2012.Nov.27 19:56:18  8.95
   2012.Nov.27 19:55:17  19.97
   2012.Nov.27 19:54:17  32.27
   2012.Nov.27 19:53:18  1.67
   2012.Nov.27 19:52:17  1.6
   2012.Nov.27 19:51:18  1.77
   2012.Nov.27 19:50:17  1.89
  
   This is what the GC was doing around that time:
  
   2012-11-27T19:50:04.933+0100: [GC [PSYoungGen:
  4777586K->277641K(4969216K)]
   8887542K->4387693K(9405824K), 0.0856360 secs] [Times: user=0.54
 sys=0.00,
   real=0.09 secs]
   2012-11-27T19:50:30.785+0100: [GC [PSYoungGen:
  4749769K->325171K(5068096K)]
   8859821K->4435320K(9504704K), 0.0992160 secs] [Times: user=0.63
 sys=0.00,
   real=0.10 secs]
   2012-11-27T19:51:12.293+0100: [GC [PSYoungGen:
  4911603K->306181K(5071168K)]
   9021752K->4416617K(9507776K), 0.0957890 secs] [Times: user=0.62
 sys=0.00,
   real=0.09 secs]
   2012-11-27T19:51:52.817+0100: [GC [PSYoungGen:
  4892613K->376175K(5075328K)]
   9003049K->4486755K(9511936K), 0.1099830 secs] [Times: user=0.79
 sys=0.01,
   real=0.11 secs]
   2012-11-27T19:52:29.454+0100: [GC [PSYoungGen:
  4972847K->271468K(4868160K)]
   9083427K->4383520K(9304768K), 0.0699660 secs] [Times

Re: SOLR4 cluster - strange CPU spike on slave

2012-12-04 Thread John Nielsen
Success!

I tried adding -XX:+UseConcMarkSweepGC to java to make it GC earlier. We
haven't seen any spikes since.

I'm cautiously optimistic though and will be monitoring the servers for a
week or so before declaring final victory.

The post about MMapDirectory is really interesting. We switched to using
that from NRTCachingDirectory and are monitoring performance as well.
Initially performance doesn't look stellar, but I suspect that we lack
memory in the server to really make it shine.


Med venlig hilsen / Best regards

*John Nielsen*
Programmer



*MCB A/S*
Enghaven 15
DK-7500 Holstebro

Kundeservice: +45 9610 2824
p...@mcb.dk
www.mcb.dk



On Fri, Nov 30, 2012 at 3:13 PM, Erick Erickson erickerick...@gmail.com wrote:

 right, so here's what I'd check for.

 Your logs should show a replication pretty coincident with the spike and
 that should be in the log. Note: the replication should complete just
 before the spike.

 Or you can just turn replication off and fire it manually to try to force
 the situation at will, see:
 http://wiki.apache.org/solr/SolrReplication#HTTP_API. (but note that
 you'll
 have to wait until the index has changed on the master to see any action).
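
(For reference, a rough sketch of doing that from SolrJ, using the disablepoll and fetchindex commands from the SolrReplication HTTP API page linked above. The core URL is a placeholder, and the command names should be checked against that wiki page for the version in use.)

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.common.params.ModifiableSolrParams;

// Rough sketch of firing replication by hand on a slave core, per the
// SolrReplication HTTP API. Core URL is a placeholder.
public class ManualReplication {
    public static void main(String[] args) throws Exception {
        HttpSolrServer slave =
            new HttpSolrServer("http://varnish01.lynero.net:8000/solr/default1_Norwegian");

        ModifiableSolrParams p = new ModifiableSolrParams();
        p.set("command", "disablepoll"); // stop automatic polling of the master
        QueryRequest req = new QueryRequest(p);
        req.setPath("/replication");
        req.process(slave);

        p.set("command", "fetchindex");  // pull the index once, on demand
        req = new QueryRequest(p);
        req.setPath("/replication");
        req.process(slave);

        slave.shutdown();
    }
}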

 So you should be able to create your spike at will. And this will be pretty
 normal. When replication happens, a new searcher is opened, caches are
 filled, autowarming is done, all kinds of stuff like that. During this
 period, the _old_ searcher is still open, which will both cause the CPU to
 be busier and require additional memory. Once the new searcher is warmed,
 new queries go to it, and when the old searcher has finished serving all
 the queries it shuts down and all the resources are freed. Which is why
 commits are expensive operations.

 All of which means that so far I don't think there's a problem, this is
 just normal Solr operation. If you're seeing responsiveness problems when
 serving queries you probably want to throw more hardware (particularly
 memory) at the problem.

 But when thinking about memory allocating to the JVM, _really_ read Uwe's
 post here:
 http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

 Best
 Erick


 On Thu, Nov 29, 2012 at 2:39 AM, John Nielsen j...@mcb.dk wrote:

  Yup you read it right.
 
  We originally intended to do all our indexing to varnish02, replicate to
  varnish01 and then search from varnish01 (through a fail-over ip which
  would switch the reader to varnish02 in case of trouble).
 
  When I saw the spikes, I tried to eliminate possibilities by starting
  searching from varnish02, leaving varnish01 with nothing to do but to
  receive replication data. This did not remove the spikes. As soon as this
  spike is fixed, I will start searching from varnish01 again. These sort
 of
  debug antics are only possible because, although we do have customers
 using
  this, we are still in our beta phase.
 
  Varnish01 never receives any manual commit orders. Varnish02 does from
 time
  to time.
 
  Oh, and I accidentally misinformed you before. (damn secondary language)
 We
  are actually seeing the spikes on both servers. I was just focusing on
  varnish01 because I use it to eliminate possibilities.
 
  It just occurred to me now; We tried switching off our feeder/index tool
  for 24 hours, and we didn't see any spikes during this period, so
 receiving
  replication data certainly has something to do with it.
 
  Med venlig hilsen / Best regards
 
  *John Nielsen*
  Programmer
 
 
 
  *MCB A/S*
  Enghaven 15
  DK-7500 Holstebro
 
  Kundeservice: +45 9610 2824
  p...@mcb.dk
  www.mcb.dk
 
 
 
  On Thu, Nov 29, 2012 at 3:20 AM, Erick Erickson erickerick...@gmail.com
  wrote:
 
   Am I reading this right? All you're doing on varnish1 is replicating to
  it?
   You're not searching or indexing? I'm sure I'm misreading this.
  
  
   The spike, which only lasts for a couple of minutes, sends the disks
   racing This _sounds_ suspiciously like segment merging, especially the
   disks racing bit. Or possibly replication. Neither of which make much
   sense. But is there any chance that somehow multiple commits are being
   issued? Of course if varnish1 is a slave, that shouldn't be happening
   either.
  
   And the whole bit about nothing going to the logs is just bizarre. I'm
   tempted to claim hardware gremlins, especially if you see nothing
 similar
   on varnish2. Or some other process is pegging the machine. All of which
  is
   a way of saying I have no idea
  
   Yours in bewilderment,
   Erick
  
  
  
   On Wed, Nov 28, 2012 at 6:15 AM, John Nielsen j...@mcb.dk wrote:
  
I apologize for the late reply.
   
The query load is more or less stable during the spikes. There are
  always
fluctuations, but nothing on the order of magnitude that could
 explain
   this
spike. In fact, the latest spike occured last night when there were
   almost
noone using it.
   
To test a hunch of mine, I tried to deactivate all caches by
 commenting

Re: SOLR4 cluster - strange CPU spike on slave

2012-11-28 Thread John Nielsen
 this could be. Google has not
given me any clues.


Med venlig hilsen / Best regards

*John Nielsen*
Programmer



*MCB A/S*
Enghaven 15
DK-7500 Holstebro

Kundeservice: +45 9610 2824
p...@mcb.dk
www.mcb.dk



On Fri, Nov 23, 2012 at 1:56 PM, Otis Gospodnetic 
otis.gospodne...@gmail.com wrote:

 Strange indeed. What about query load/ayes during that time? What about GC?
 And does cache hit rate drop?

 Otis
 --
 SOLR Performance Monitoring - http://sematext.com/spm
 On Nov 23, 2012 2:45 AM, John Nielsen j...@mcb.dk wrote:

  Hi all,
 
  We are seeing a strange CPU spike on one of our solr4 servers which we
 are
  unable to explain. The spike, which only lasts for a couple of minutes,
  sends the disks racing. This happens a few times a times a day. This is
  what the load looks like:
 
  2012.Nov.14 13:37:17  2.77
  2012.Nov.14 13:36:17  3.65
  2012.Nov.14 13:35:18  3.92
  2012.Nov.14 13:34:17  3.95
  2012.Nov.14 13:33:18  6.56
  2012.Nov.14 13:32:17  10.79
  2012.Nov.14 13:31:17  24.38
  2012.Nov.14 13:30:17  63.35
  2012.Nov.14 13:29:17  24.68
  2012.Nov.14 13:28:17  2.44
  2012.Nov.14 13:27:18  3.51
  2012.Nov.14 13:26:17  5.26
  2012.Nov.14 13:25:18  5.71
  2012.Nov.14 13:24:17  2.7
 
  The problem is that out of a 3 minute spike, I get about 40 seconds of
  silence in the logs. This log usually adds like a thousand lines every
  second. Not being able to communicate with the server for this long,
 breaks
  our use case.
 
  We have two servers, varnish01 and varnish02. We used to feed data to
  varnish02, replicate it to varnish01 where the data is then read from.
 When
  we discovered this issue, we moved all traffic to varnish02 so that data
 is
  being replicated to varnish01, but other than that, gets zero traffic.
 The
  spike did not disappear.
 
  The spike we are seeing is on varnish01 only.
 
  Please note that our use case requires us to continuously feed large
  amounts of data from our main system in the order of up to 1.000
 registers
  every minute. Our solrconfig.xml is attached.
 
  Has anyone seen this phenomenon before?
 
  Med venlig hilsen / Best regards
 
  *John Nielsen*
  Programmer
 
 
 
  *MCB A/S*
  Enghaven 15
  DK-7500 Holstebro
 
  Kundeservice: +45 9610 2824
  p...@mcb.dk
  www.mcb.dk
 
 



Re: SOLR4 cluster - strange CPU spike on slave

2012-11-28 Thread John Nielsen
Yup you read it right.

We originally intended to do all our indexing to varnish02, replicate to
varnish01 and then search from varnish01 (through a fail-over ip which
would switch the reader to varnish02 in case of trouble).

When I saw the spikes, I tried to eliminate possibilities by starting
searching from varnish02, leaving varnish01 with nothing to do but to
receive replication data. This did not remove the spikes. As soon as this
spike is fixed, I will start searching from varnish01 again. These sort of
debug antics are only possible because, although we do have customers using
this, we are still in our beta phase.

Varnish01 never receives any manual commit orders. Varnish02 does from time
to time.

Oh, and I accidentally misinformed you before. (damn secondary language) We
are actually seeing the spikes on both servers. I was just focusing on
varnish01 because I use it to eliminate possibilities.

It just occurred to me now: we tried switching off our feeder/index tool
for 24 hours, and we didn't see any spikes during this period, so receiving
replication data certainly has something to do with it.

Med venlig hilsen / Best regards

*John Nielsen*
Programmer



*MCB A/S*
Enghaven 15
DK-7500 Holstebro

Kundeservice: +45 9610 2824
p...@mcb.dk
www.mcb.dk



On Thu, Nov 29, 2012 at 3:20 AM, Erick Erickson erickerick...@gmail.com wrote:

 Am I reading this right? All you're doing on varnish1 is replicating to it?
 You're not searching or indexing? I'm sure I'm misreading this.


 The spike, which only lasts for a couple of minutes, sends the disks
 racing This _sounds_ suspiciously like segment merging, especially the
 disks racing bit. Or possibly replication. Neither of which make much
 sense. But is there any chance that somehow multiple commits are being
 issued? Of course if varnish1 is a slave, that shouldn't be happening
 either.

 And the whole bit about nothing going to the logs is just bizarre. I'm
 tempted to claim hardware gremlins, especially if you see nothing similar
 on varnish2. Or some other process is pegging the machine. All of which is
 a way of saying I have no idea

 Yours in bewilderment,
 Erick



 On Wed, Nov 28, 2012 at 6:15 AM, John Nielsen j...@mcb.dk wrote:

  I apologize for the late reply.
 
  The query load is more or less stable during the spikes. There are always
  fluctuations, but nothing on the order of magnitude that could explain
 this
  spike. In fact, the latest spike occured last night when there were
 almost
  noone using it.
 
  To test a hunch of mine, I tried to deactivate all caches by commenting
 out
  all cache entries in solrconfig.xml. It still spikes, so I dont think it
  has anything to do with cache warming or hits/misses or anything of the
  sort.
 
  One interesting thing GC though. This is our latest spike with cpu load
  (this server has 8 cores, so a load higher than 8 is potentially
  troublesome):
 
  2012.Nov.27 19:58:18  2.27
  2012.Nov.27 19:57:17  4.06
  2012.Nov.27 19:56:18  8.95
  2012.Nov.27 19:55:17  19.97
  2012.Nov.27 19:54:17  32.27
  2012.Nov.27 19:53:18  1.67
  2012.Nov.27 19:52:17  1.6
  2012.Nov.27 19:51:18  1.77
  2012.Nov.27 19:50:17  1.89
 
  This is what the GC was doing around that time:
 
  2012-11-27T19:50:04.933+0100: [GC [PSYoungGen:
 4777586K->277641K(4969216K)]
  8887542K->4387693K(9405824K), 0.0856360 secs] [Times: user=0.54 sys=0.00,
  real=0.09 secs]
  2012-11-27T19:50:30.785+0100: [GC [PSYoungGen:
 4749769K->325171K(5068096K)]
  8859821K->4435320K(9504704K), 0.0992160 secs] [Times: user=0.63 sys=0.00,
  real=0.10 secs]
  2012-11-27T19:51:12.293+0100: [GC [PSYoungGen:
 4911603K->306181K(5071168K)]
  9021752K->4416617K(9507776K), 0.0957890 secs] [Times: user=0.62 sys=0.00,
  real=0.09 secs]
  2012-11-27T19:51:52.817+0100: [GC [PSYoungGen:
 4892613K->376175K(5075328K)]
  9003049K->4486755K(9511936K), 0.1099830 secs] [Times: user=0.79 sys=0.01,
  real=0.11 secs]
  2012-11-27T19:52:29.454+0100: [GC [PSYoungGen:
 4972847K->271468K(4868160K)]
  9083427K->4383520K(9304768K), 0.0699660 secs] [Times: user=0.48 sys=0.01,
  real=0.07 secs]
  2012-11-27T19:53:08.176+0100: [GC [PSYoungGen:
 4868140K->336421K(5090944K)]
  8980192K->4448572K(9527552K), 0.0824350 secs] [Times: user=0.56 sys=0.01,
  real=0.08 secs]
  2012-11-27T19:54:53.534+0100: [GC [PSYoungGen:
 4950373K->340513K(5092864K)]
  9062524K->4468215K(9529472K), 0.1016770 secs] [Times: user=0.71 sys=0.00,
  real=0.10 secs]
  2012-11-27T19:55:02.906+0100: [GC [PSYoungGen:
 4954465K->480488K(4952000K)]
  9082167K->4684537K(9388608K), 0.1813290 secs] [Times: user=1.23 sys=0.09,
  real=0.19 secs]
  2012-11-27T19:55:09.114+0100: [GC [PSYoungGen:
 4951976K->560434K(5031936K)]
  9156025K->5075285K(9547072K), 0.3511090 secs] [Times: user=2.32 sys=0.12,
  real=0.35 secs]
  2012-11-27T19:55:09.465+0100: [Full GC [PSYoungGen:
 560434K->0K(5031936K)]
  [PSOldGen: 4514851K->2793342K(5047296K)] 5075285K->2793342K(10079232K)
  [PSPermGen: 35285K->35285K(44864K)], 5.2310820 secs

SOLR4 cluster - strange CPU spike on slave

2012-11-23 Thread John Nielsen
Hi all,

We are seeing a strange CPU spike on one of our solr4 servers which we are
unable to explain. The spike, which only lasts for a couple of minutes,
sends the disks racing. This happens a few times a day. This is
what the load looks like:

2012.Nov.14 13:37:17  2.77
2012.Nov.14 13:36:17  3.65
2012.Nov.14 13:35:18  3.92
2012.Nov.14 13:34:17  3.95
2012.Nov.14 13:33:18  6.56
2012.Nov.14 13:32:17  10.79
2012.Nov.14 13:31:17  24.38
2012.Nov.14 13:30:17  63.35
2012.Nov.14 13:29:17  24.68
2012.Nov.14 13:28:17  2.44
2012.Nov.14 13:27:18  3.51
2012.Nov.14 13:26:17  5.26
2012.Nov.14 13:25:18  5.71
2012.Nov.14 13:24:17  2.7

The problem is that out of a 3-minute spike, I get about 40 seconds of
silence in the logs. This log usually adds about a thousand lines every
second. Not being able to communicate with the server for this long breaks
our use case.

We have two servers, varnish01 and varnish02. We used to feed data to
varnish02 and replicate it to varnish01, where the data is then read from. When
we discovered this issue, we moved all traffic to varnish02, so that data is
still being replicated to varnish01 but varnish01 otherwise gets zero traffic.
The spike did not disappear.

The spike we are seeing is on varnish01 only.

Please note that our use case requires us to continuously feed large
amounts of data from our main system, on the order of up to 1,000 records
every minute.

Has anyone seen this phenomenon before?

Best regards

John Nielsen