Re: 2 Async exceptions during distributed update issue...

2019-11-19 Thread Erick Erickson
First, please don’t use “schemaless” mode (the add-unknown-fields-to-the-schema 
update chain in your solrconfig) while load testing. Solr does quite a bit of work 
when it discovers an unknown field, and that can cause instability under heavy 
load.
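
To take schemaless mode out of the picture for the load test, one option is to point the default update.chain at a plain chain instead of the add-unknown-fields one. This is a sketch against the stock 6.x data-driven config; the chain name "no-schema-guessing" is illustrative and your chain wiring may differ:

```xml
<!-- Hypothetical minimal chain: no field-type guessing, no schema mutation -->
<updateRequestProcessorChain name="no-schema-guessing">
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.DistributedUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

<!-- Make it the default for all update handlers -->
<initParams path="/update/**">
  <lst name="defaults">
    <str name="update.chain">no-schema-guessing</str>
  </lst>
</initParams>
```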

Second, when you send a large batch to Solr, the update may simply take longer 
than the timeout allows. There are several timeouts you can increase; see the 
“solr.xml” section of the ref guide.
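
The relevant knobs live in solr.xml. A sketch of the distributed-update and shard-handler timeouts (the values shown are illustrative, not recommendations):

```xml
<solr>
  <solrcloud>
    <!-- time allowed to establish a connection to another node for a forwarded update -->
    <int name="distribUpdateConnTimeout">${distribUpdateConnTimeout:60000}</int>
    <!-- socket (read) timeout for forwarded updates; raise this if large batches exceed it -->
    <int name="distribUpdateSoTimeout">${distribUpdateSoTimeout:600000}</int>
  </solrcloud>
  <shardHandlerFactory name="shardHandlerFactory" class="HttpShardHandlerFactory">
    <int name="connTimeout">${connTimeout:60000}</int>
    <int name="socketTimeout">${socketTimeout:600000}</int>
  </shardHandlerFactory>
</solr>
```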

Best,
Erick

> On Nov 19, 2019, at 12:29 PM, Fiz N  wrote:
> 
> 
> Hello  Solr Experts, 
> 
> Just wanted to follow up on my question. I would appreciate help on this.
>  
> SOLR Version : 6.6.2
> OS – Linux 3.1.2
> JDK – 1.8
>  
> Shard – 16 – All are active.
> Xms – 16 gb
> Xmx – 16 gb
> Host has 64 cores.
>  
> I am attaching the complete updateRequestProcessorChain in a file.
> Attaching a screenshot of physical memory and CPU usage.
>  
> There are multiple threads sending products to Solr. With a batch size of 50 
> or 100 per thread it works fine without error, but with a batch size of 1000 
> the following error occurs.
>  
>  
> I am getting the following error when the batch size is 1000. Please advise.
>  
> 2019-11-14T17:45:44,646 - ERROR [qtp876213901-1169:SolrException@YYY] - 
> {collection=c:ducts, core=x:ducts_shard15_replica1, 
> node_name=n:10.YYY.40.81:8983_solr, replica=r:core_node30, shard=s:shard15} - 
> org.apache.solr.update.processor.DistributedUpdateProcessor$DistributedUpdatesAsyncException:
>  2 Async exceptions during distributed update:
> 10.YYY.40.62:8983 failed to respond
> 10.YYY.40.62:8983 failed to respond
>  
> 2019-11-14T17:45:48,821 - ERROR [qtp876213901-202:SolrException@YYY] - 
> {collection=c:ducts, core=x:ducts_shard7_replica1, 
> node_name=n:10.YYY.40.81:8983_solr, replica=r:core_node29, shard=s:shard7} - 
> org.apache.solr.update.processor.DistributedUpdateProcessor$DistributedUpdatesAsyncException:
>  Async exception during distributed update: 10.YYY.40.81:8983 failed to 
> respond
>  
>  
> 2019-11-14T19:36:11,599 - ERROR 
> [updateExecutor-2-thread-176-processing-x:ducts_shard3_replica2 
> http:10.YYY.40.68:8983//solr//ducts_shard11_replica2 r:core_node26 
> n:10.YYY.40.68:8983_solr c:ducts 
> s:shard3:StreamingSolrClients$ErrorReportingConcurrentUpdateSolrClient@131] - 
> {ConcurrentUpdateSolrClient.url=http://10.YYY.40.68:8983/solr/ducts_shard11_replica2,
>  collection=c:ducts, core=x:ducts_shard3_replica2, 
> node_name=n:10.YYY.40.68:8983_solr, replica=r:core_node26, shard=s:shard3} - 
> error
> org.apache.http.NoHttpResponseException: 10.YYY.40.68:8983 failed to respond
> at 
> org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:143)
>  ~[httpclient-4.4.1.jar:4.4.1]
> at 
> org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57)
>  ~[httpclient-4.4.1.jar:4.4.1]
> at 
> org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:261)
>  ~[httpcore-4.4.1.jar:4.4.1]
>  
>  
> 2019-11-14T19:36:14,567 - ERROR 
> [updateExecutor-2-thread-189-processing-x:ducts_shard11_replica2 
> http:10.YYY.40.62:8983//solr//ducts_shard2_replica1 r:core_node25 
> n:10.YYY.40.68:8983_solr c:ducts 
> s:shard11:StreamingSolrClients$ErrorReportingConcurrentUpdateSolrClient@131] 
> - 
> {ConcurrentUpdateSolrClient.url=http://10.YYY.40.62:8983/solr/ducts_shard2_replica1,
>  collection=c:ducts, core=x:ducts_shard11_replica2, 
> node_name=n:10.YYY.40.68:8983_solr, replica=r:core_node25, shard=s:shard11} - 
> error
> java.net.SocketException: Broken pipe (Write failed)
> at java.net.SocketOutputStream.socketWrite0(Native Method) 
> ~[?:1.8.0_232]
> at 
> java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111) 
> ~[?:1.8.0_232]
> at java.net.SocketOutputStream.write(SocketOutputStream.java:155) 
> ~[?:1.8.0_232]
>  
>  
> 2019-11-14T19:36:38,851 - ERROR [qtp876213901-542:SolrException@159] - 
> {collection=c:ducts, core=x:ducts_shard11_replica2, 
> node_name=n:10.YYY.40.68:8983_solr, replica=r:core_node25, shard=s:shard11} - 
> null:org.apache.solr.update.processor.DistributedUpdateProcessor$DistributedUpdatesAsyncException:
>  Async exception during distributed update: Broken pipe (Write failed)
>at 
> org.apache.solr.update.processor.DistributedUpdateProcessor.doFinish(DistributedUpdateProcessor.java:972)
> at 
> org.apache.solr.update.processor.DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1911)
>  
>  
> Thanks 
>  
> 



2 Async exceptions during distributed update issue...

2019-11-19 Thread Fiz N
Hello  Solr Experts,


Just wanted to follow up on my question. I would appreciate help on this.



SOLR Version : 6.6.2

OS – Linux 3.1.2

JDK – 1.8



Shard – 16 – All are active.

Xms – 16 gb

Xmx – 16 gb

Host has 64 cores.



I am attaching the complete updateRequestProcessorChain in a file.

Attaching a screenshot of physical memory and CPU usage.



There are multiple threads sending products to Solr. With a batch size of 50
or 100 per thread it works fine without error, but with a batch size of 1000
the following error occurs.
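
Since 50- and 100-document batches succeed while 1000 fails, one client-side mitigation (my sketch, not from the thread; `BatchSplitter` and the batch size are illustrative) is to cap the batch size before sending:

```java
import java.util.ArrayList;
import java.util.List;

public class BatchSplitter {
    // Split a document list into batches of at most batchSize elements.
    static <T> List<List<T>> partition(List<T> docs, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < docs.size(); i += batchSize) {
            batches.add(new ArrayList<>(docs.subList(i, Math.min(i + batchSize, docs.size()))));
        }
        return batches;
    }

    public static void main(String[] args) {
        // 1000 placeholder "documents" stand in for the real SolrInputDocuments.
        List<Integer> docs = new ArrayList<>();
        for (int i = 0; i < 1000; i++) docs.add(i);

        List<List<Integer>> batches = partition(docs, 100);
        System.out.println(batches.size());                          // prints 10
        System.out.println(batches.get(batches.size() - 1).size()); // prints 100
    }
}
```

Each batch would then go to Solr via something like `client.add(batch)` on a SolrJ `SolrClient` (hypothetical wiring, not shown in the thread), with a single commit at the end rather than one per batch.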





*I am getting the following error when the batch size is 1000. Please
advise.*



2019-11-14T17:45:44,646 - ERROR [qtp876213901-1169:SolrException@YYY] -
{collection=c:ducts, core=x:ducts_shard15_replica1,
node_name=n:10.YYY.40.81:8983_solr, replica=r:core_node30, shard=s:shard15}
-
org.apache.solr.update.processor.DistributedUpdateProcessor$DistributedUpdatesAsyncException:
2 Async exceptions during distributed update:

10.YYY.40.62:8983 failed to respond

10.YYY.40.62:8983 failed to respond



2019-11-14T17:45:48,821 - ERROR [qtp876213901-202:SolrException@YYY] -
{collection=c:ducts, core=x:ducts_shard7_replica1,
node_name=n:10.YYY.40.81:8983_solr, replica=r:core_node29, shard=s:shard7}
-
org.apache.solr.update.processor.DistributedUpdateProcessor$DistributedUpdatesAsyncException:
Async exception during distributed update: 10.YYY.40.81:8983 failed to
respond





2019-11-14T19:36:11,599 - ERROR
[updateExecutor-2-thread-176-processing-x:ducts_shard3_replica2
http:10.YYY.40.68:8983//solr//ducts_shard11_replica2
r:core_node26
n:10.YYY.40.68:8983_solr c:ducts
s:shard3:StreamingSolrClients$ErrorReportingConcurrentUpdateSolrClient@131]
- {ConcurrentUpdateSolrClient.url=
http://10.YYY.40.68:8983/solr/ducts_shard11_replica2,
collection=c:ducts, core=x:ducts_shard3_replica2,
node_name=n:10.YYY.40.68:8983_solr, replica=r:core_node26, shard=s:shard3}
- error

org.apache.http.NoHttpResponseException: 10.YYY.40.68:8983 failed to respond

at
org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:143)
~[httpclient-4.4.1.jar:4.4.1]

at
org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57)
~[httpclient-4.4.1.jar:4.4.1]

at
org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:261)
~[httpcore-4.4.1.jar:4.4.1]





2019-11-14T19:36:14,567 - ERROR
[updateExecutor-2-thread-189-processing-x:ducts_shard11_replica2
http:10.YYY.40.62:8983//solr//ducts_shard2_replica1
r:core_node25
n:10.YYY.40.68:8983_solr c:ducts
s:shard11:StreamingSolrClients$ErrorReportingConcurrentUpdateSolrClient@131]
- {ConcurrentUpdateSolrClient.url=
http://10.YYY.40.62:8983/solr/ducts_shard2_replica1,
collection=c:ducts, core=x:ducts_shard11_replica2,
node_name=n:10.YYY.40.68:8983_solr, replica=r:core_node25, shard=s:shard11}
- error

java.net.SocketException: Broken pipe (Write failed)

at java.net.SocketOutputStream.socketWrite0(Native Method)
~[?:1.8.0_232]

at
java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111)
~[?:1.8.0_232]

at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
~[?:1.8.0_232]





2019-11-14T19:36:38,851 - ERROR [qtp876213901-542:SolrException@159] -
{collection=c:ducts, core=x:ducts_shard11_replica2,
node_name=n:10.YYY.40.68:8983_solr, replica=r:core_node25, shard=s:shard11}
-
null:org.apache.solr.update.processor.DistributedUpdateProcessor$DistributedUpdatesAsyncException:
Async exception during distributed update: Broken pipe (Write failed)

   at
org.apache.solr.update.processor.DistributedUpdateProcessor.doFinish(DistributedUpdateProcessor.java:972)

at
org.apache.solr.update.processor.DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1911)





Thanks

  




<updateRequestProcessorChain name="add-unknown-fields-to-the-schema">
  <processor class="solr.UUIDUpdateProcessorFactory"/>
  <processor class="solr.RemoveBlankFieldUpdateProcessorFactory"/>
  <processor class="solr.FieldNameMutatingUpdateProcessorFactory">
    <str name="pattern">[^\w-\.]</str>
    <str name="replacement">_</str>
  </processor>
  <processor class="solr.ParseBooleanFieldUpdateProcessorFactory"/>
  <processor class="solr.ParseLongFieldUpdateProcessorFactory"/>
  <processor class="solr.ParseDoubleFieldUpdateProcessorFactory"/>
  <processor class="solr.ParseDateFieldUpdateProcessorFactory">
    <arr name="format">
      <str>yyyy-MM-dd'T'HH:mm:ss.SSSZ</str>
      <str>yyyy-MM-dd'T'HH:mm:ss,SSSZ</str>
      <str>yyyy-MM-dd'T'HH:mm:ss.SSS</str>
      <str>yyyy-MM-dd'T'HH:mm:ss,SSS</str>
      <str>yyyy-MM-dd'T'HH:mm:ssZ</str>
      <str>yyyy-MM-dd'T'HH:mm:ss</str>
      <str>yyyy-MM-dd'T'HH:mmZ</str>
      <str>yyyy-MM-dd'T'HH:mm</str>
      <str>yyyy-MM-dd HH:mm:ss.SSSZ</str>
      <str>yyyy-MM-dd HH:mm:ss,SSSZ</str>
      <str>yyyy-MM-dd HH:mm:ss.SSS</str>
      <str>yyyy-MM-dd HH:mm:ss,SSS</str>
      <str>yyyy-MM-dd HH:mm:ssZ</str>
      <str>yyyy-MM-dd HH:mm:ss</str>
      <str>yyyy-MM-dd HH:mmZ</str>
      <str>yyyy-MM-dd HH:mm</str>
      <str>yyyy-MM-dd</str>
    </arr>
  </processor>
  <processor class="solr.AddSchemaFieldsUpdateProcessorFactory">
    <str name="defaultFieldType">strings</str>
    <lst name="typeMapping">
      <str name="valueClass">java.lang.Boolean</str>
      <str name="fieldType">booleans</str>
    </lst>
    <lst name="typeMapping">
      <str name="valueClass">java.util.Date</str>
      <str name="fieldType">tdates</str>
    </lst>
    <lst name="typeMapping">
      <str name="valueClass">java.lang.Long</str>
      <str name="valueClass">java.lang.Integer</str>
      <str name="fieldType">tlongs</str>
    </lst>
    <lst name="typeMapping">
      <str name="valueClass">java.lang.Number</str>
      <str name="fieldType">tdoubles</str>
    </lst>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.DistributedUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

Fwd: 2 Async exceptions during distributed update

2019-11-18 Thread Fiz N
Thanks, Jörn, for getting back to me. Please find attached the memory/CPU
utilization and the complete update processor chain.
Please let me know your thoughts.

On Sat, Nov 16, 2019 at 9:09 AM Jörn Franke  wrote:

> Can you please provide the whole update chain? What you posted below is the
> directoryFactory, not the update processor chain.
>
> > On 15.11.2019 at 17:34, Fiz N wrote:
> >
> > 
> > Thanks for your response.
> >
> > <directoryFactory name="DirectoryFactory" class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/>
> >
> > attaching the screenshot of physical memory and cpu.
> > Please let me know your thoughts on the below issue.
> >
> >

Re: 2 Async exceptions during distributed update

2019-11-16 Thread Fiz N
Thanks, Jörn, for getting back to me. Please find attached the memory/CPU
utilization and the complete update processor chain.
Please let me know your thoughts.


Re: 2 Async exceptions during distributed update

2019-11-16 Thread Jörn Franke
Can you please provide the whole update chain? What you posted below is the 
directoryFactory, not the update processor chain.
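
The distinction Jörn is drawing can be sketched like this (illustrative snippet, not the poster's actual config):

```xml
<!-- This is a storage setting, not an update chain: -->
<directoryFactory name="DirectoryFactory"
                  class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/>

<!-- An update processor chain, by contrast, looks like this (minimal example): -->
<updateRequestProcessorChain name="example-chain">
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.DistributedUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```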

> On 15.11.2019 at 17:34, Fiz N wrote:
> 
> 
> Thanks for your response.
> 
> <directoryFactory name="DirectoryFactory" class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/>
>  
> attaching the screenshot of physical memory and cpu.
> Please let me know your thoughts on the below issue.
> 
> 

Re: 2 Async exceptions during distributed update

2019-11-16 Thread Jörn Franke
No screenshot attached. The Apache mailservers filter attachments. Can you
please provide an external link.

On Fri, Nov 15, 2019 at 5:34 PM Fiz N  wrote:

> Thanks for your response.
>
> <directoryFactory name="DirectoryFactory" class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/>
>
>
>
> attaching the screenshot of physical memory and cpu.
>
> Please let me know your thoughts on the below issue.
>
>
>

Re: 2 Async exceptions during distributed update

2019-11-16 Thread Fiz N
Hi Solr Experts,

Do you have any thoughts on the below issue ?

On Fri, Nov 15, 2019 at 11:33 AM Fiz N  wrote:

> Thanks for your response.
>
> <directoryFactory name="DirectoryFactory" class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/>
>
>
>
> attaching the screenshot of physical memory and cpu.
>
> Please let me know your thoughts on the below issue.
>
>
>

Re: 2 Async exceptions during distributed update

2019-11-15 Thread Fiz N
Thanks for your response.

<directoryFactory name="DirectoryFactory" class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/>

attaching the screenshot of physical memory and cpu.

Please let me know your thoughts on the below issue.



On Fri, Nov 15, 2019 at 2:18 AM Jörn Franke  wrote:

> Do you use an update processor factory? What does it look like?
>
> What is the physical memory size and CPU?
> What do you mean by “there are 64 cores sending concurrently”? Does the
> application have 64 threads that send those updates concurrently?
>


Re: 2 Async exceptions during distributed update

2019-11-14 Thread Jörn Franke
Do you use an update processor factory? What does it look like?

What is the physical memory size and CPU?
What do you mean by “there are 64 cores sending concurrently”? Does the
application have 64 threads that send those updates concurrently?

> Am 15.11.2019 um 02:14 schrieb Fiz N :
> 
> Hi Solr Experts,
> 
> SOLR Version : 6.6.2
> 
> OS – Linux 3.1.2
> 
> JDK – 1.8
> 
> 
> 
> Shard – 16 – All are active.
> 
> Xms – 16 gb
> 
> Xmx – 16 gb
> 
> 
> 
> Schema fields count – 91
> 
> Dynamic fields – 83.
> 
> 
> 
> There are multiple threads sending products to Solr. With a batch size of 50
> or 100 per thread it works fine without errors; with a batch size of 1000,
> the following error occurs.
> 
> There are 64 cores that are sending batches concurrently.
> 
> 
> 
> *I am getting the following error when the batch size is 1000. Please
> advise.*
> 
> 
> 
> 2019-11-14T17:45:44,646 - ERROR [qtp876213901-1169:SolrException@YYY] -
> {collection=c:ducts, core=x:ducts_shard15_replica1,
> node_name=n:10.YYY.40.81:8983_solr, replica=r:core_node30, shard=s:shard15}
> -
> org.apache.solr.update.processor.DistributedUpdateProcessor$DistributedUpdatesAsyncException:
> 2 Async exceptions during distributed update:
> 
> 10.YYY.40.62:8983 failed to respond
> 
> 10.YYY.40.62:8983 failed to respond
> 
> 
> 
> 2019-11-14T17:45:48,821 - ERROR [qtp876213901-202:SolrException@YYY] -
> {collection=c:ducts, core=x:ducts_shard7_replica1,
> node_name=n:10.YYY.40.81:8983_solr, replica=r:core_node29, shard=s:shard7}
> -
> org.apache.solr.update.processor.DistributedUpdateProcessor$DistributedUpdatesAsyncException:
> Async exception during distributed update: 10.YYY.40.81:8983 failed to
> respond
> 
> 
> 
> 
> 
> 2019-11-14T19:36:11,599 - ERROR
> [updateExecutor-2-thread-176-processing-x:ducts_shard3_replica2
> http:10.YYY.40.68:8983//solr//ducts_shard11_replica2
> r:core_node26
> n:10.YYY.40.68:8983_solr c:ducts
> s:shard3:StreamingSolrClients$ErrorReportingConcurrentUpdateSolrClient@131]
> - {ConcurrentUpdateSolrClient.url=
> http://10.YYY.40.68:8983/solr/ducts_shard11_replica2,
> collection=c:ducts, core=x:ducts_shard3_replica2,
> node_name=n:10.YYY.40.68:8983_solr, replica=r:core_node26, shard=s:shard3}
> - error
> 
> org.apache.http.NoHttpResponseException: 10.YYY.40.68:8983 failed to respond
> 
>at
> org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:143)
> ~[httpclient-4.4.1.jar:4.4.1]
> 
>at
> org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57)
> ~[httpclient-4.4.1.jar:4.4.1]
> 
>at
> org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:261)
> ~[httpcore-4.4.1.jar:4.4.1]
> 
> 
> 
> 
> 
> 2019-11-14T19:36:14,567 - ERROR
> [updateExecutor-2-thread-189-processing-x:ducts_shard11_replica2
> http:10.YYY.40.62:8983//solr//ducts_shard2_replica1
> r:core_node25
> n:10.YYY.40.68:8983_solr c:ducts
> s:shard11:StreamingSolrClients$ErrorReportingConcurrentUpdateSolrClient@131]
> - {ConcurrentUpdateSolrClient.url=
> http://10.YYY.40.62:8983/solr/ducts_shard2_replica1,
> collection=c:ducts, core=x:ducts_shard11_replica2,
> node_name=n:10.YYY.40.68:8983_solr, replica=r:core_node25, shard=s:shard11}
> - error
> 
> java.net.SocketException: Broken pipe (Write failed)
> 
>at java.net.SocketOutputStream.socketWrite0(Native Method)
> ~[?:1.8.0_232]
> 
>at
> java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111)
> ~[?:1.8.0_232]
> 
>at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
> ~[?:1.8.0_232]
> 
> 
> 
> 
> 
> 2019-11-14T19:36:38,851 - ERROR [qtp876213901-542:SolrException@159] -
> {collection=c:ducts, core=x:ducts_shard11_replica2,
> node_name=n:10.YYY.40.68:8983_solr, replica=r:core_node25, shard=s:shard11}
> -
> null:org.apache.solr.update.processor.DistributedUpdateProcessor$DistributedUpdatesAsyncException:
> Async exception during distributed update: Broken pipe (Write failed)
> 
>   at
> org.apache.solr.update.processor.DistributedUpdateProcessor.doFinish(DistributedUpdateProcessor.java:972)
> 
>at
> org.apache.solr.update.processor.DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1911)
> 
> 
> 
> 
> Thanks
> 
> Fiz..


2 Async exceptions during distributed update

2019-11-14 Thread Fiz N
Hi Solr Experts,

SOLR Version : 6.6.2

OS – Linux 3.1.2

JDK – 1.8



Shard – 16 – All are active.

Xms – 16 gb

Xmx – 16 gb



Schema fields count – 91

Dynamic fields – 83.



There are multiple threads sending products to Solr. With a batch size of 50
or 100 per thread it works fine without errors; with a batch size of 1000,
the following error occurs.

There are 64 cores that are sending batches concurrently.
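Since batches of 50 or 100 per thread succeed while batches of 1000 fail, one
client-side mitigation is to cap the batch size before sending. A minimal
sketch (plain Python rather than the poster's actual indexer; `send_batch` is
a hypothetical stand-in for whatever call posts a batch to /update):

```python
def chunked(docs, batch_size=100):
    """Yield successive slices of at most batch_size documents."""
    for i in range(0, len(docs), batch_size):
        yield docs[i:i + batch_size]

def index_all(docs, send_batch, batch_size=100):
    """Send docs in bounded batches via send_batch; return the batch count."""
    count = 0
    for batch in chunked(docs, batch_size):
        send_batch(batch)  # in the real client: POST this batch to Solr
        count += 1
    return count
```

With 1000 documents and batch_size=100 this issues ten smaller requests
instead of one request of 1000, staying under the size that triggered the
errors here.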



*I am getting the following error when the batch size is 1000. Please
advise.*



2019-11-14T17:45:44,646 - ERROR [qtp876213901-1169:SolrException@YYY] -
{collection=c:ducts, core=x:ducts_shard15_replica1,
node_name=n:10.YYY.40.81:8983_solr, replica=r:core_node30, shard=s:shard15}
-
org.apache.solr.update.processor.DistributedUpdateProcessor$DistributedUpdatesAsyncException:
2 Async exceptions during distributed update:

10.YYY.40.62:8983 failed to respond

10.YYY.40.62:8983 failed to respond



2019-11-14T17:45:48,821 - ERROR [qtp876213901-202:SolrException@YYY] -
{collection=c:ducts, core=x:ducts_shard7_replica1,
node_name=n:10.YYY.40.81:8983_solr, replica=r:core_node29, shard=s:shard7}
-
org.apache.solr.update.processor.DistributedUpdateProcessor$DistributedUpdatesAsyncException:
Async exception during distributed update: 10.YYY.40.81:8983 failed to
respond





2019-11-14T19:36:11,599 - ERROR
[updateExecutor-2-thread-176-processing-x:ducts_shard3_replica2
http:10.YYY.40.68:8983//solr//ducts_shard11_replica2
r:core_node26
n:10.YYY.40.68:8983_solr c:ducts
s:shard3:StreamingSolrClients$ErrorReportingConcurrentUpdateSolrClient@131]
- {ConcurrentUpdateSolrClient.url=
http://10.YYY.40.68:8983/solr/ducts_shard11_replica2,
collection=c:ducts, core=x:ducts_shard3_replica2,
node_name=n:10.YYY.40.68:8983_solr, replica=r:core_node26, shard=s:shard3}
- error

org.apache.http.NoHttpResponseException: 10.YYY.40.68:8983 failed to respond

at
org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:143)
~[httpclient-4.4.1.jar:4.4.1]

at
org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57)
~[httpclient-4.4.1.jar:4.4.1]

at
org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:261)
~[httpcore-4.4.1.jar:4.4.1]





2019-11-14T19:36:14,567 - ERROR
[updateExecutor-2-thread-189-processing-x:ducts_shard11_replica2
http:10.YYY.40.62:8983//solr//ducts_shard2_replica1
r:core_node25
n:10.YYY.40.68:8983_solr c:ducts
s:shard11:StreamingSolrClients$ErrorReportingConcurrentUpdateSolrClient@131]
- {ConcurrentUpdateSolrClient.url=
http://10.YYY.40.62:8983/solr/ducts_shard2_replica1,
collection=c:ducts, core=x:ducts_shard11_replica2,
node_name=n:10.YYY.40.68:8983_solr, replica=r:core_node25, shard=s:shard11}
- error

java.net.SocketException: Broken pipe (Write failed)

at java.net.SocketOutputStream.socketWrite0(Native Method)
~[?:1.8.0_232]

at
java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111)
~[?:1.8.0_232]

at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
~[?:1.8.0_232]





2019-11-14T19:36:38,851 - ERROR [qtp876213901-542:SolrException@159] -
{collection=c:ducts, core=x:ducts_shard11_replica2,
node_name=n:10.YYY.40.68:8983_solr, replica=r:core_node25, shard=s:shard11}
-
null:org.apache.solr.update.processor.DistributedUpdateProcessor$DistributedUpdatesAsyncException:
Async exception during distributed update: Broken pipe (Write failed)

   at
org.apache.solr.update.processor.DistributedUpdateProcessor.doFinish(DistributedUpdateProcessor.java:972)

at
org.apache.solr.update.processor.DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1911)




Thanks

Fiz..
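The NoHttpResponseException and broken-pipe errors above are transient
connection failures, so a common client-side mitigation (alongside raising the
timeouts Erick mentions) is to retry a failed batch with backoff. A hedged
sketch, not from the poster's code; `send` stands in for the actual update
call:

```python
import time

def send_with_retry(send, batch, retries=3, backoff_s=1.0):
    """Call send(batch), retrying up to `retries` times on any exception.

    Uses exponential backoff between attempts and re-raises the last
    exception if every attempt fails.
    """
    for attempt in range(retries + 1):
        try:
            return send(batch)
        except Exception:
            if attempt == retries:
                raise
            time.sleep(backoff_s * (2 ** attempt))
```

Retries only make sense for idempotent updates (documents with unique keys
being added or overwritten), which is the usual Solr indexing case.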


Re: Async exceptions during distributed update

2018-05-14 Thread Jay Potharaju
Adding some more context to my last email
Solr:6.6.3
2 nodes : 3 shards each
No replication .
Can someone answer the following questions?
1) Any ideas on why the following errors keep happening? AFAIK the
StreamingSolrClients error is caused by timeouts when connecting to other
nodes, and the async errors are also network-related, as Emir explained
earlier in this thread. There were no network issues this time, but the error
has come back and is filling up my logs.
2) Is anyone using Solr 6.6.3 in production, and what has their experience
been so far?
3) Is there any good documentation or blog post that explains the inner
workings of SolrCloud networking?

Thanks
Jay
org.apache.solr.update.StreamingSolrClients  
>  
> org.apache.solr.update.processor.DistributedUpdateProcessor$DistributedUpdatesAsyncException:
>  Async exception during 


> On May 13, 2018, at 9:21 PM, Jay Potharaju  wrote:
> 
> Hi,
> I restarted both my solr servers but I am seeing the async error again. In 
> older 5x version of solrcloud, solr would normally recover gracefully in case 
> of network errors, but solr 6.6.3 does not seem to be doing that. At this 
> time I am not doing only a small percentage of  deletebyquery operations, its 
> mostly indexing of documents only.
> I have not noticed any network blip like last time.  Any suggestions or is 
> any else also having the same issue on solr 6.6.3?
> 
>   I am again seeing the following two errors back to back. 
> 
>  ERROR org.apache.solr.update.StreamingSolrClients  
>  
> org.apache.solr.update.processor.DistributedUpdateProcessor$DistributedUpdatesAsyncException:
>  Async exception during distributed update: Read timed out
> Thanks
> Jay 
>  
> 
> 
>> On Wed, May 9, 2018 at 12:34 AM Emir Arnautović 
>>  wrote:
>> Hi Jay,
>> Network blip might be the cause, but also the consequence of this issue. 
>> Maybe you can try avoiding DBQ while indexing and see if it is the cause. 
>> You can do thread dump on “the other” node and see if there are blocked 
>> threads and that can give you more clues what’s going on.
>> 
>> Thanks,
>> Emir
>> --
>> Monitoring - Log Management - Alerting - Anomaly Detection
>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>> 
>> 
>> 
>> > On 8 May 2018, at 17:53, Jay Potharaju  wrote:
>> > 
>> > Hi Emir,
>> > I was seeing this error as long as the indexing was running. Once I stopped
>> > the indexing the errors also stopped.  Yes, we do monitor both hosts & solr
>> > but have not seen anything out of the ordinary except for a small network
>> > blip. In my experience solr generally recovers after a network blip and
>> > there are a few errors for streaming solr client...but have never seen this
>> > error before.
>> > 
>> > Thanks
>> > Jay
>> > 
>> > Thanks
>> > Jay Potharaju
>> > 
>> > 
>> > On Tue, May 8, 2018 at 12:56 AM, Emir Arnautović <
>> > emir.arnauto...@sematext.com> wrote:
>> > 
>> >> Hi Jay,
>> >> This is low ingestion rate. What is the size of your index? What is heap
>> >> size? I am guessing that this is not a huge index, so  I am leaning toward
>> >> what Shawn mentioned - some combination of DBQ/merge/commit/optimise that
>> >> is blocking indexing. Though, it is strange that it is happening only on
>> >> one node if you are sending updates randomly to both nodes. Do you monitor
>> >> your hosts/Solr? Do you see anything different at the time when timeouts
>> >> happen?
>> >> 
>> >> Thanks,
>> >> Emir
>> >> --
>> >> Monitoring - Log Management - Alerting - Anomaly Detection
>> >> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>> >> 
>> >> 
>> >> 
>> >>> On 8 May 2018, at 03:23, Jay Potharaju  wrote:
>> >>> 
>> >>> I have about 3-5 updates per second.
>> >>> 
>> >>> 
>>  On May 7, 2018, at 5:02 PM, Shawn Heisey  wrote:
>>  
>> > On 5/7/2018 5:05 PM, Jay Potharaju wrote:
>> > There are some deletes by query. I have not had any issues with DBQ,
>> > currently have 5.3 running in production.
>>  
>>  Here's the big problem with DBQ.  Imagine this sequence of events with
>>  these timestamps:
>>  
>>  13:00:00: A commit for change visibility happens.
>>  13:00:00: A segment merge is triggered by the commit.
>>  (It's a big merge that takes exactly 3 minutes.)
>>  13:00:05: A deleteByQuery is sent.
>>  13:00:15: An update to the index is sent.
>>  13:00:25: An update to the index is sent.
>>  13:00:35: An update to the index is sent.
>>  13:00:45: An update to the index is sent.
>>  13:00:55: An update to the index is sent.
>>  13:01:05: An update to the index is sent.
>>  13:01:15: An update to the index is sent.
>>  13:01:25: An update to the index is sent.
>>  {time passes, more updates might be sent}
>>  13:03:00: The merge finishes.
>>  
>>  Here's what would happen in this scenario:  The DBQ and all of the
>>  update requests sent *after* the DBQ will block until the merge
>>  finishes.  

Re: Async exceptions during distributed update

2018-05-13 Thread Jay Potharaju
Hi,
I restarted both my Solr servers but I am seeing the async error again. In
older 5.x versions of SolrCloud, Solr would normally recover gracefully from
network errors, but Solr 6.6.3 does not seem to be doing that. At this time
only a small percentage of operations are deleteByQuery; it is mostly
indexing of documents. I have not noticed any network blip like last time.
Any suggestions, or is anyone else also seeing this issue on Solr 6.6.3?

  I am again seeing the following two errors back to back.

 ERROR org.apache.solr.update.StreamingSolrClients

org.apache.solr.update.processor.DistributedUpdateProcessor$DistributedUpdatesAsyncException:
Async exception during distributed update: Read timed out
Thanks
Jay



On Wed, May 9, 2018 at 12:34 AM Emir Arnautović <
emir.arnauto...@sematext.com> wrote:

> Hi Jay,
> Network blip might be the cause, but also the consequence of this issue.
> Maybe you can try avoiding DBQ while indexing and see if it is the cause.
> You can do thread dump on “the other” node and see if there are blocked
> threads and that can give you more clues what’s going on.
>
> Thanks,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
> > On 8 May 2018, at 17:53, Jay Potharaju  wrote:
> >
> > Hi Emir,
> > I was seeing this error as long as the indexing was running. Once I
> stopped
> > the indexing the errors also stopped.  Yes, we do monitor both hosts &
> solr
> > but have not seen anything out of the ordinary except for a small network
> > blip. In my experience solr generally recovers after a network blip and
> > there are a few errors for streaming solr client...but have never seen
> this
> > error before.
> >
> > Thanks
> > Jay
> >
> > Thanks
> > Jay Potharaju
> >
> >
> > On Tue, May 8, 2018 at 12:56 AM, Emir Arnautović <
> > emir.arnauto...@sematext.com> wrote:
> >
> >> Hi Jay,
> >> This is low ingestion rate. What is the size of your index? What is heap
> >> size? I am guessing that this is not a huge index, so  I am leaning
> toward
> >> what Shawn mentioned - some combination of DBQ/merge/commit/optimise
> that
> >> is blocking indexing. Though, it is strange that it is happening only on
> >> one node if you are sending updates randomly to both nodes. Do you
> monitor
> >> your hosts/Solr? Do you see anything different at the time when timeouts
> >> happen?
> >>
> >> Thanks,
> >> Emir
> >> --
> >> Monitoring - Log Management - Alerting - Anomaly Detection
> >> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
> >>
> >>
> >>
> >>> On 8 May 2018, at 03:23, Jay Potharaju  wrote:
> >>>
> >>> I have about 3-5 updates per second.
> >>>
> >>>
>  On May 7, 2018, at 5:02 PM, Shawn Heisey  wrote:
> 
> > On 5/7/2018 5:05 PM, Jay Potharaju wrote:
> > There are some deletes by query. I have not had any issues with DBQ,
> > currently have 5.3 running in production.
> 
>  Here's the big problem with DBQ.  Imagine this sequence of events with
>  these timestamps:
> 
>  13:00:00: A commit for change visibility happens.
>  13:00:00: A segment merge is triggered by the commit.
>  (It's a big merge that takes exactly 3 minutes.)
>  13:00:05: A deleteByQuery is sent.
>  13:00:15: An update to the index is sent.
>  13:00:25: An update to the index is sent.
>  13:00:35: An update to the index is sent.
>  13:00:45: An update to the index is sent.
>  13:00:55: An update to the index is sent.
>  13:01:05: An update to the index is sent.
>  13:01:15: An update to the index is sent.
>  13:01:25: An update to the index is sent.
>  {time passes, more updates might be sent}
>  13:03:00: The merge finishes.
> 
>  Here's what would happen in this scenario:  The DBQ and all of the
>  update requests sent *after* the DBQ will block until the merge
>  finishes.  That means that it's going to take up to three minutes for
>  Solr to respond to those requests.  If the client that is sending the
>  request is configured with a 60 second socket timeout, which
> inter-node
>  requests made by Solr are by default, then it is going to experience a
>  timeout error.  The request will probably complete successfully once
> the
>  merge finishes, but the connection is gone, and the client has already
>  received an error.
> 
>  Now imagine what happens if an optimize (forced merge of the entire
>  index) is requested on an index that's 50GB.  That optimize may take
> 2-3
>  hours, possibly longer.  A deleteByQuery started on that index after
> the
>  optimize begins (and any updates requested after the DBQ) will pause
>  until the optimize is done.  A pause of 2 hours or more is a BIG
> >> problem.
> 
>  This is why deleteByQuery is not recommended.
> 
>  If the deleteByQuery were changed into a two-step process

Re: Async exceptions during distributed update

2018-05-09 Thread Emir Arnautović
Hi Jay,
A network blip might be the cause, but also a consequence, of this issue.
Maybe you can try avoiding DBQ while indexing and see whether it is the cause.
You can also take a thread dump on “the other” node and check for blocked
threads; that can give you more clues about what is going on.

Thanks,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/
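One concrete way to act on the thread-dump advice above: capture a dump with
`jstack <pid>` on the slow node, then look for BLOCKED threads. A small sketch
that scans a dump already saved as text (the format assumed is the standard
HotSpot one, where each thread entry starts with its quoted name):

```python
import re

def blocked_threads(dump_text):
    """Return the names of threads a HotSpot thread dump reports as BLOCKED."""
    blocked, current = [], None
    for line in dump_text.splitlines():
        name = re.match(r'^"([^"]+)"', line)
        if name:
            current = name.group(1)
        elif "java.lang.Thread.State: BLOCKED" in line and current:
            blocked.append(current)
    return blocked
```

Running this over dumps taken while the timeouts occur would show whether
update threads are stuck waiting on something like a merge or commit.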



> On 8 May 2018, at 17:53, Jay Potharaju  wrote:
> 
> Hi Emir,
> I was seeing this error as long as the indexing was running. Once I stopped
> the indexing the errors also stopped.  Yes, we do monitor both hosts & solr
> but have not seen anything out of the ordinary except for a small network
> blip. In my experience solr generally recovers after a network blip and
> there are a few errors for streaming solr client...but have never seen this
> error before.
> 
> Thanks
> Jay
> 
> Thanks
> Jay Potharaju
> 
> 
> On Tue, May 8, 2018 at 12:56 AM, Emir Arnautović <
> emir.arnauto...@sematext.com> wrote:
> 
>> Hi Jay,
>> This is low ingestion rate. What is the size of your index? What is heap
>> size? I am guessing that this is not a huge index, so  I am leaning toward
>> what Shawn mentioned - some combination of DBQ/merge/commit/optimise that
>> is blocking indexing. Though, it is strange that it is happening only on
>> one node if you are sending updates randomly to both nodes. Do you monitor
>> your hosts/Solr? Do you see anything different at the time when timeouts
>> happen?
>> 
>> Thanks,
>> Emir
>> --
>> Monitoring - Log Management - Alerting - Anomaly Detection
>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>> 
>> 
>> 
>>> On 8 May 2018, at 03:23, Jay Potharaju  wrote:
>>> 
>>> I have about 3-5 updates per second.
>>> 
>>> 
 On May 7, 2018, at 5:02 PM, Shawn Heisey  wrote:
 
> On 5/7/2018 5:05 PM, Jay Potharaju wrote:
> There are some deletes by query. I have not had any issues with DBQ,
> currently have 5.3 running in production.
 
 Here's the big problem with DBQ.  Imagine this sequence of events with
 these timestamps:
 
 13:00:00: A commit for change visibility happens.
 13:00:00: A segment merge is triggered by the commit.
 (It's a big merge that takes exactly 3 minutes.)
 13:00:05: A deleteByQuery is sent.
 13:00:15: An update to the index is sent.
 13:00:25: An update to the index is sent.
 13:00:35: An update to the index is sent.
 13:00:45: An update to the index is sent.
 13:00:55: An update to the index is sent.
 13:01:05: An update to the index is sent.
 13:01:15: An update to the index is sent.
 13:01:25: An update to the index is sent.
 {time passes, more updates might be sent}
 13:03:00: The merge finishes.
 
 Here's what would happen in this scenario:  The DBQ and all of the
 update requests sent *after* the DBQ will block until the merge
 finishes.  That means that it's going to take up to three minutes for
 Solr to respond to those requests.  If the client that is sending the
 request is configured with a 60 second socket timeout, which inter-node
 requests made by Solr are by default, then it is going to experience a
 timeout error.  The request will probably complete successfully once the
 merge finishes, but the connection is gone, and the client has already
 received an error.
 
 Now imagine what happens if an optimize (forced merge of the entire
 index) is requested on an index that's 50GB.  That optimize may take 2-3
 hours, possibly longer.  A deleteByQuery started on that index after the
 optimize begins (and any updates requested after the DBQ) will pause
 until the optimize is done.  A pause of 2 hours or more is a BIG
>> problem.
 
 This is why deleteByQuery is not recommended.
 
 If the deleteByQuery were changed into a two-step process involving a
 query to retrieve ID values and then one or more deleteById requests,
 then none of that blocking would occur.  The deleteById operation can
 run at the same time as a segment merge, so neither it nor subsequent
 update requests will have the significant pause.  From what I
 understand, you can even do commits in this scenario and have changes be
 visible before the merge completes.  I haven't verified that this is the
 case.
 
 Experienced devs: Can we fix this problem with DBQ?  On indexes with a
 uniqueKey, can DBQ be changed to use the two-step process I mentioned?
 
 Thanks,
 Shawn
 
>> 
>> 
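Shawn's two-step alternative quoted above (query for the matching unique keys,
then deleteById) can be sketched as follows. This is a hypothetical outline,
not Solr code: `search` and `delete_by_id` stand in for the real query and
delete calls (e.g. via SolrJ or HTTP):

```python
def two_step_delete(search, delete_by_id, query, batch_size=500):
    """Emulate deleteByQuery as query-then-deleteById.

    search(query)     -> list of unique-key values matching the query
    delete_by_id(ids) -> delete the given documents by id
    Returns the number of ids deleted.
    """
    ids = search(query)
    for i in range(0, len(ids), batch_size):
        # deleteById, unlike deleteByQuery, does not block behind a merge
        delete_by_id(ids[i:i + batch_size])
    return len(ids)
```

Because deleteById can proceed while a segment merge runs, subsequent updates
are not stalled the way they are behind a deleteByQuery.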



Re: Async exceptions during distributed update

2018-05-08 Thread Jay Potharaju
Hi Emir,
I was seeing this error as long as the indexing was running. Once I stopped
the indexing the errors also stopped.  Yes, we do monitor both hosts & solr
but have not seen anything out of the ordinary except for a small network
blip. In my experience solr generally recovers after a network blip and
there are a few errors for streaming solr client...but have never seen this
error before.

Thanks,
Jay Potharaju


On Tue, May 8, 2018 at 12:56 AM, Emir Arnautović <
emir.arnauto...@sematext.com> wrote:

> Hi Jay,
> This is low ingestion rate. What is the size of your index? What is heap
> size? I am guessing that this is not a huge index, so  I am leaning toward
> what Shawn mentioned - some combination of DBQ/merge/commit/optimise that
> is blocking indexing. Though, it is strange that it is happening only on
> one node if you are sending updates randomly to both nodes. Do you monitor
> your hosts/Solr? Do you see anything different at the time when timeouts
> happen?
>
> Thanks,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
> > On 8 May 2018, at 03:23, Jay Potharaju  wrote:
> >
> > I have about 3-5 updates per second.
> >
> >
> >> On May 7, 2018, at 5:02 PM, Shawn Heisey  wrote:
> >>
> >>> On 5/7/2018 5:05 PM, Jay Potharaju wrote:
> >>> There are some deletes by query. I have not had any issues with DBQ,
> >>> currently have 5.3 running in production.
> >>
> >> Here's the big problem with DBQ.  Imagine this sequence of events with
> >> these timestamps:
> >>
> >> 13:00:00: A commit for change visibility happens.
> >> 13:00:00: A segment merge is triggered by the commit.
> >> (It's a big merge that takes exactly 3 minutes.)
> >> 13:00:05: A deleteByQuery is sent.
> >> 13:00:15: An update to the index is sent.
> >> 13:00:25: An update to the index is sent.
> >> 13:00:35: An update to the index is sent.
> >> 13:00:45: An update to the index is sent.
> >> 13:00:55: An update to the index is sent.
> >> 13:01:05: An update to the index is sent.
> >> 13:01:15: An update to the index is sent.
> >> 13:01:25: An update to the index is sent.
> >> {time passes, more updates might be sent}
> >> 13:03:00: The merge finishes.
> >>
> >> Here's what would happen in this scenario:  The DBQ and all of the
> >> update requests sent *after* the DBQ will block until the merge
> >> finishes.  That means that it's going to take up to three minutes for
> >> Solr to respond to those requests.  If the client that is sending the
> >> request is configured with a 60 second socket timeout, which inter-node
> >> requests made by Solr are by default, then it is going to experience a
> >> timeout error.  The request will probably complete successfully once the
> >> merge finishes, but the connection is gone, and the client has already
> >> received an error.
> >>
> >> Now imagine what happens if an optimize (forced merge of the entire
> >> index) is requested on an index that's 50GB.  That optimize may take 2-3
> >> hours, possibly longer.  A deleteByQuery started on that index after the
> >> optimize begins (and any updates requested after the DBQ) will pause
> >> until the optimize is done.  A pause of 2 hours or more is a BIG
> problem.
> >>
> >> This is why deleteByQuery is not recommended.
> >>
> >> If the deleteByQuery were changed into a two-step process involving a
> >> query to retrieve ID values and then one or more deleteById requests,
> >> then none of that blocking would occur.  The deleteById operation can
> >> run at the same time as a segment merge, so neither it nor subsequent
> >> update requests will have the significant pause.  From what I
> >> understand, you can even do commits in this scenario and have changes be
> >> visible before the merge completes.  I haven't verified that this is the
> >> case.
> >>
> >> Experienced devs: Can we fix this problem with DBQ?  On indexes with a
> >> uniqueKey, can DBQ be changed to use the two-step process I mentioned?
> >>
> >> Thanks,
> >> Shawn
> >>
>
>
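The 60-second inter-node socket timeout Shawn refers to is configurable in
solr.xml, as Erick also notes at the top of this thread. A hedged fragment
(parameter names per the Solr 6.x reference guide; the values shown are
illustrative, not recommendations — verify against your version):

```xml
<solr>
  <solrcloud>
    <!-- timeouts for inter-node (distributed) update requests, in ms -->
    <int name="distribUpdateConnTimeout">60000</int>
    <int name="distribUpdateSoTimeout">600000</int>
  </solrcloud>
</solr>
```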


Re: Async exceptions during distributed update

2018-05-08 Thread Emir Arnautović
Hi Jay,
This is a low ingestion rate. What is the size of your index? What is the heap
size? I am guessing that this is not a huge index, so I am leaning toward what
Shawn mentioned - some combination of DBQ/merge/commit/optimise that is
blocking indexing. Though, it is strange that it is happening on only one node
if you are sending updates randomly to both nodes. Do you monitor your
hosts/Solr? Do you see anything different at the times when the timeouts
happen?

Thanks,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 8 May 2018, at 03:23, Jay Potharaju  wrote:
> 
> I have about 3-5 updates per second.
> 
> 
>> On May 7, 2018, at 5:02 PM, Shawn Heisey  wrote:
>> 
>>> On 5/7/2018 5:05 PM, Jay Potharaju wrote:
>>> There are some deletes by query. I have not had any issues with DBQ,
>>> currently have 5.3 running in production.
>> 
>> Here's the big problem with DBQ.  Imagine this sequence of events with
>> these timestamps:
>> 
>> 13:00:00: A commit for change visibility happens.
>> 13:00:00: A segment merge is triggered by the commit.
>> (It's a big merge that takes exactly 3 minutes.)
>> [...]



Re: Async exceptions during distributed update

2018-05-07 Thread Jay Potharaju
I have about 3-5 updates per second.


> On May 7, 2018, at 5:02 PM, Shawn Heisey  wrote:
> 
>> On 5/7/2018 5:05 PM, Jay Potharaju wrote:
>> There are some deletes by query. I have not had any issues with DBQ,
>> currently have 5.3 running in production.
> [...]


Re: Async exceptions during distributed update

2018-05-07 Thread Jay Potharaju
Thanks for explaining that, Shawn!
Emir, I use a PHP library called Solarium to do updates/deletes to Solr. The 
request is sent to any of the available nodes in the cluster.

> On May 7, 2018, at 5:02 PM, Shawn Heisey  wrote:
> 
>> On 5/7/2018 5:05 PM, Jay Potharaju wrote:
>> There are some deletes by query. I have not had any issues with DBQ,
>> currently have 5.3 running in production.
> [...]


Re: Async exceptions during distributed update

2018-05-07 Thread Shawn Heisey
On 5/7/2018 5:05 PM, Jay Potharaju wrote:
> There are some deletes by query. I have not had any issues with DBQ,
> currently have 5.3 running in production.

Here's the big problem with DBQ.  Imagine this sequence of events with
these timestamps:

13:00:00: A commit for change visibility happens.
13:00:00: A segment merge is triggered by the commit.
(It's a big merge that takes exactly 3 minutes.)
13:00:05: A deleteByQuery is sent.
13:00:15: An update to the index is sent.
13:00:25: An update to the index is sent.
13:00:35: An update to the index is sent.
13:00:45: An update to the index is sent.
13:00:55: An update to the index is sent.
13:01:05: An update to the index is sent.
13:01:15: An update to the index is sent.
13:01:25: An update to the index is sent.
{time passes, more updates might be sent}
13:03:00: The merge finishes.

Here's what would happen in this scenario:  The DBQ and all of the
update requests sent *after* the DBQ will block until the merge
finishes.  That means that it's going to take up to three minutes for
Solr to respond to those requests.  If the client that is sending the
request is configured with a 60 second socket timeout, which inter-node
requests made by Solr are by default, then it is going to experience a
timeout error.  The request will probably complete successfully once the
merge finishes, but the connection is gone, and the client has already
received an error.
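For reference, those inter-node update timeouts can be raised in solr.xml. A
hedged sketch (element names per the ref guide's solr.xml section; the
millisecond values are illustrative, not recommendations):

```xml
<solr>
  <solrcloud>
    <!-- socket (read) timeout for inter-node update requests, in ms -->
    <int name="distribUpdateSoTimeout">${distribUpdateSoTimeout:600000}</int>
    <!-- connection timeout for inter-node update requests, in ms -->
    <int name="distribUpdateConnTimeout">${distribUpdateConnTimeout:60000}</int>
  </solrcloud>
</solr>
```

Raising the timeout only hides the pause; it does not remove the blocking
described above.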

Now imagine what happens if an optimize (forced merge of the entire
index) is requested on an index that's 50GB.  That optimize may take 2-3
hours, possibly longer.  A deleteByQuery started on that index after the
optimize begins (and any updates requested after the DBQ) will pause
until the optimize is done.  A pause of 2 hours or more is a BIG problem.

This is why deleteByQuery is not recommended.

If the deleteByQuery were changed into a two-step process involving a
query to retrieve ID values and then one or more deleteById requests,
then none of that blocking would occur.  The deleteById operation can
run at the same time as a segment merge, so neither it nor subsequent
update requests will have the significant pause.  From what I
understand, you can even do commits in this scenario and have changes be
visible before the merge completes.  I haven't verified that this is the
case.

Experienced devs: Can we fix this problem with DBQ?  On indexes with a
uniqueKey, can DBQ be changed to use the two-step process I mentioned?

Thanks,
Shawn
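The two-step process described above can be sketched as follows (stdlib Python
against Solr's JSON APIs, for illustration only; the function names, the `id`
uniqueKey, and the single-page rows fetch are assumptions — a production
version would page through results with cursorMark):

```python
import json
from urllib.request import Request, urlopen  # stdlib only; SolrJ/Solarium work the same way

def delete_by_id_payloads(ids, batch_size=500):
    """Chunk uniqueKey values into Solr JSON delete bodies: {"delete": [...]}."""
    for i in range(0, len(ids), batch_size):
        yield json.dumps({"delete": ids[i:i + batch_size]})

def delete_matching(solr_url, collection, query):
    """Two-step delete: query the ids that match, then send deleteById batches.
    Assumes the uniqueKey field is 'id'; rows=10000 is a single-page shortcut."""
    select = f"{solr_url}/{collection}/select?q={query}&fl=id&rows=10000&wt=json"
    with urlopen(select) as resp:
        ids = [d["id"] for d in json.load(resp)["response"]["docs"]]
    for body in delete_by_id_payloads(ids):
        req = Request(f"{solr_url}/{collection}/update", data=body.encode("utf-8"),
                      headers={"Content-Type": "application/json"})
        urlopen(req).read()
    return len(ids)
```

The point is `delete_by_id_payloads`: each batch is an ordinary
`{"delete": [...]}` update body, so it can proceed concurrently with segment
merges instead of blocking behind them.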



Re: Async exceptions during distributed update

2018-05-07 Thread Emir Arnautović
How many concurrent updates can be sent? Do you always send updates to the
same node? Do you use solrj?

Emir

On Tue, May 8, 2018, 1:02 AM Jay Potharaju  wrote:

> The updates are pushed in real time not batched. No complex analysis and
> everything is committed using autocommit settings in solr.
>
> Thanks
> Jay Potharaju
>
>
> [...]

Re: Async exceptions during distributed update

2018-05-07 Thread Jay Potharaju
There are some deletes by query. I have not had any issues with DBQ,
currently have 5.3 running in production.

Thanks
Jay Potharaju


On Mon, May 7, 2018 at 4:02 PM, Jay Potharaju  wrote:

> The updates are pushed in real time not batched. No complex analysis and
> everything is committed using autocommit settings in solr.
>
> Thanks
> Jay Potharaju
>
>
> [...]

Re: Async exceptions during distributed update

2018-05-07 Thread Jay Potharaju
The updates are pushed in real time not batched. No complex analysis and
everything is committed using autocommit settings in solr.

Thanks
Jay Potharaju


On Mon, May 7, 2018 at 4:00 PM, Emir Arnautović <
emir.arnauto...@sematext.com> wrote:

> How do you send documents? Large batches? Complex analysis? Do you send all
> batches to the same node? How do you commit? Do you delete by query while
> indexing?
>
> Emir
>
> [...]

Re: Async exceptions during distributed update

2018-05-07 Thread Emir Arnautović
How do you send documents? Large batches? Complex analysis? Do you send all
batches to the same node? How do you commit? Do you delete by query while
indexing?

Emir

On Tue, May 8, 2018, 12:30 AM Jay Potharaju  wrote:

> I didn't see any OOM errors in the logs on either of the nodes. I saw GC
> pause of 1 second on the box that was throwing error ...but nothing on the
> other node. Any other recommendations?
> Thanks
>
>
> Thanks
> Jay Potharaju
>
>
> [...]

Re: Async exceptions during distributed update

2018-05-07 Thread Jay Potharaju
I didn't see any OOM errors in the logs on either of the nodes. I saw a GC
pause of 1 second on the box that was throwing the error, but nothing on the
other node. Any other recommendations?
Thanks


Thanks
Jay Potharaju


On Mon, May 7, 2018 at 9:48 AM, Jay Potharaju  wrote:

> Ah thanks for explaining that!
>
> Thanks
> Jay Potharaju
>
>
> On Mon, May 7, 2018 at 9:45 AM, Emir Arnautović <
> emir.arnauto...@sematext.com> wrote:
>
>> Node A receives batch of documents to index. It forwards documents to
>> shards that are on the node B. Node B is having issues with GC so it takes
>> a while to respond. Node A sees it as read timeout and reports it in logs.
>> So the issue is on node B not node A.
>>
>> Emir
>> --
>> Monitoring - Log Management - Alerting - Anomaly Detection
>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>>
>>
>>
>> > On 7 May 2018, at 18:39, Jay Potharaju  wrote:
>> >
>> > Yes, the nodes are well balanced. I am just using these boxes for
>> indexing
>> > the data and is not serving any traffic at this time.  The error
>> indicates
>> > it is having errors on the shards that are hosted on the box and
>> not

Re: Async exceptions during distributed update

2018-05-07 Thread Jay Potharaju
Ah thanks for explaining that!

Thanks
Jay Potharaju


On Mon, May 7, 2018 at 9:45 AM, Emir Arnautović <
emir.arnauto...@sematext.com> wrote:

> Node A receives a batch of documents to index. It forwards documents to
> shards that live on node B. Node B is having GC issues, so it takes a while
> to respond. Node A sees that as a read timeout and reports it in its logs.
> So the issue is on node B, not node A.
>
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/

Re: Async exceptions during distributed update

2018-05-07 Thread Emir Arnautović
Node A receives a batch of documents to index. It forwards documents to shards
that live on node B. Node B is having GC issues, so it takes a while to
respond. Node A sees that as a read timeout and reports it in its logs. So the
issue is on node B, not node A.

Emir 
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/
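Emir's routing explanation can be sketched in a few lines. This is an illustrative toy model only: SolrCloud's compositeId router actually uses MurmurHash3 over a 32-bit hash ring, and the md5 stand-in, shard count, and document ids below are assumptions for demonstration, not Solr code.

```python
import hashlib

# Toy model of SolrCloud document routing (illustration only: Solr's
# compositeId router uses MurmurHash3; md5 here is just a stand-in).
# The 32-bit hash ring is split into one contiguous range per shard.
def shard_ranges(num_shards):
    step = 2**32 // num_shards
    return [(i * step, (i + 1) * step - 1) for i in range(num_shards)]

def route(doc_id, num_shards):
    h = int(hashlib.md5(doc_id.encode()).hexdigest(), 16) % 2**32
    for shard, (lo, hi) in enumerate(shard_ranges(num_shards)):
        if lo <= h <= hi:
            return shard
    return num_shards - 1  # hashes above the last range boundary

# With 6 shards spread over two nodes (3 each, as in this thread), a batch
# sent to node A almost always contains documents owned by node B's shards,
# so node A must forward those and wait for node B to respond.
batch = ["doc-%d" % i for i in range(10)]
by_shard = {}
for doc in batch:
    by_shard.setdefault(route(doc, 6), []).append(doc)
```

If node B pauses for a long GC while node A is waiting on those forwarded sub-batches, node A reports exactly the kind of read timeout seen in the logs below.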




Re: Async exceptions during distributed update

2018-05-07 Thread Jay Potharaju
Yes, the nodes are well balanced. I am just using these boxes for indexing
the data; they are not serving any traffic at this time. The error indicates
issues on the shards that are hosted on this box, not on the other box.
I will check the GC logs to see if there were any issues.

Thanks
Jay Potharaju
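The GC-log check mentioned above can be sketched as a small scan for long stop-the-world pauses. Assumption: the JVM was started with -XX:+PrintGCApplicationStoppedTime (JDK 8-style logging); the sample lines and the 1-second threshold are made up for illustration, and other JVM versions or flags need a different pattern.

```python
import re

# Hedged sketch: flag safepoint pauses longer than a threshold in a JDK 8
# GC log. Pattern assumes -XX:+PrintGCApplicationStoppedTime output.
PAUSE_RE = re.compile(
    r"Total time for which application threads were stopped: "
    r"([0-9.]+) seconds")

def long_pauses(lines, threshold_secs=1.0):
    pauses = []
    for line in lines:
        m = PAUSE_RE.search(line)
        if m and float(m.group(1)) >= threshold_secs:
            pauses.append(float(m.group(1)))
    return pauses

# Hypothetical sample lines for illustration:
sample = [
    "2018-05-07T17:45:01: Total time for which application threads were stopped: 0.0123456 seconds",
    "2018-05-07T17:45:44: Total time for which application threads were stopped: 8.4321000 seconds",
]
print(long_pauses(sample))  # a pause near the read timeout is a likely culprit
```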


On Mon, May 7, 2018 at 9:34 AM, Emir Arnautović <
emir.arnauto...@sematext.com> wrote:

> Hi Jay,
> My first guess would be that there was a major GC pause on the other box, so
> it did not respond in time. Are your nodes well balanced - do they serve an
> equal amount of data?
>
> Thanks,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>


Re: Async exceptions during distributed update

2018-05-07 Thread Emir Arnautović
Hi Jay,
My first guess would be that there was a major GC pause on the other box, so it
did not respond in time. Are your nodes well balanced - do they serve an equal
amount of data?

Thanks,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/
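Emir's balance question can be answered from per-shard document counts (visible on each core's admin Statistics page). A sketch with made-up numbers, assuming the two-node, three-shards-per-node layout from this thread:

```python
# Hedged sketch of the balance check: the shard names, node assignment, and
# document counts below are invented for illustration; substitute the real
# numDocs values from each core.
shard_docs = {
    "shard1": 1_200_000, "shard2": 1_150_000, "shard3": 1_180_000,  # node A
    "shard4": 1_210_000, "shard5": 1_170_000, "shard6": 1_190_000,  # node B
}
node_shards = {"nodeA": ["shard1", "shard2", "shard3"],
               "nodeB": ["shard4", "shard5", "shard6"]}

total = sum(shard_docs.values())
for node, shards in node_shards.items():
    docs = sum(shard_docs[s] for s in shards)
    print(f"{node}: {docs} docs ({100 * docs / total:.1f}%)")
```

Roughly equal shares (here about 50/50) would suggest imbalance is not the cause and point back at GC or I/O on the slow node.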



> On 7 May 2018, at 18:11, Jay Potharaju  wrote:
> 
> Hi,
> I am seeing the following lines in the error log. My setup has 2 nodes in
> the SolrCloud cluster; each node has 3 shards with no replication. From the
> error log it seems like all the shards on this box are throwing async
> exception errors. The other node in the cluster does not have any errors
> in its logs. Any suggestions on how to tackle this error?
> 
> Solr setup
> Solr:6.6.3
> 2Nodes: 3 shards each



Async exceptions during distributed update

2018-05-07 Thread Jay Potharaju
Hi,
I am seeing the following lines in the error log. My setup has 2 nodes in
the SolrCloud cluster; each node has 3 shards with no replication. From the
error log it seems like all the shards on this box are throwing async
exception errors. The other node in the cluster does not have any errors in
its logs. Any suggestions on how to tackle this error?

Solr setup
Solr:6.6.3
2 nodes: 3 shards each


ERROR org.apache.solr.servlet.HttpSolrCall  [test_shard3_replica1] ?
null:org.apache.solr.update.processor.DistributedUpdateProcessor$DistributedUpdatesAsyncException:
Async exception during distributed update: Read timed out
at org.apache.solr.update.processor.DistributedUpdateProcessor.doFinish(DistributedUpdateProcessor.java:972)
at org.apache.solr.update.processor.DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1911)
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:78)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:173)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2477)
at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:723)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:529)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:361)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:305)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
at org.eclipse.jetty.server.Server.handle(Server.java:534)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
at java.lang.Thread.run(Unknown Source)


Thanks
Jay
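For read timeouts like the one above, one mitigation worth testing is raising the timeouts Solr uses when forwarding updates between nodes, set in the solrcloud section of solr.xml. A sketch only; the values here are illustrative, not recommendations:

```xml
<solr>
  <solrcloud>
    <!-- Sketch: how long a node waits when forwarding updates to another
         node before giving up. 60 s connect / 10 min socket are
         illustrative values only. -->
    <int name="distribUpdateConnTimeout">60000</int>
    <int name="distribUpdateSoTimeout">600000</int>
    <!-- other solrcloud settings (zkHost, hostPort, ...) unchanged -->
  </solrcloud>
</solr>
```

Raising timeouts only hides long GC pauses; it buys headroom while the underlying pause is diagnosed.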