Re: Configuring shardhandler factory for select handler

2020-03-30 Thread Jay Potharaju
Figured it out by referring to the docs here:
https://github.com/apache/lucene-solr/commit/0ce635ec01e9d3ce04a5fbf5d472ea9d5d28bfee?short_path=421a323#diff-421a323f596319f0485e0b03070d94e6


Thanks
Jay



On Mon, Mar 30, 2020 at 3:38 PM Jay Potharaju  wrote:

> Hi,
> I am trying to update the connectionTimeout & socketTimeout values for my
> `select` handler. After updating the configs I do not see the values being
> set; they default to 60 sec.
> How can I update these values?
>
> Also, it looks like the docs have the socketTimeout & connectionTimeout
> values swapped.
>
>
> https://github.com/apache/lucene-solr/blob/releases/lucene-solr/7.7.2/solr/core/src/java/org/apache/solr/handler/component/HttpShardHandlerFactory.java#L193
>
>
>
> https://lucene.apache.org/solr/guide/7_0/distributed-requests.html#configuring-the-shardhandlerfactory
>
> Thanks
> Jay
>
>


Configuring shardhandler factory for select handler

2020-03-30 Thread Jay Potharaju
Hi,
I am trying to update the connectionTimeout & socketTimeout values for my `select`
handler. After updating the configs I do not see the values being set;
they default to 60 sec.
How can I update these values?

Also, it looks like the docs have the socketTimeout & connectionTimeout values
swapped.

https://github.com/apache/lucene-solr/blob/releases/lucene-solr/7.7.2/solr/core/src/java/org/apache/solr/handler/component/HttpShardHandlerFactory.java#L193


https://lucene.apache.org/solr/guide/7_0/distributed-requests.html#configuring-the-shardhandlerfactory
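For reference, the shardHandlerFactory can be configured per request handler inside solrconfig.xml; a sketch along the lines of the ref guide (the timeout values here are illustrative, in milliseconds):

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <!-- Per-handler shard handler; timeout values are illustrative (ms) -->
  <shardHandlerFactory class="HttpShardHandlerFactory">
    <int name="socketTimeout">120000</int>
    <int name="connTimeout">15000</int>
  </shardHandlerFactory>
</requestHandler>
```

Note the config parameter is `connTimeout`, while the Java code uses the name `connectionTimeout`, which may be part of the confusion over the docs.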

Thanks
Jay


Re: cdcr replicator NPE errors

2019-10-25 Thread Jay Potharaju
Thanks Shawn!
Can any of the committers comment about the CDCR error that I posted above?

Thanks
Jay



On Fri, Oct 25, 2019 at 2:56 PM Shawn Heisey  wrote:

> On 10/25/2019 3:22 PM, Jay Potharaju wrote:
> > Is there a solr slack channel?
>
> People with @apache.org email addresses can readily join the ASF
> workspace, I do not know whether it is possible for others.  That
> workspace might be only for ASF members.
>
> https://the-asf.slack.com
>
> In that workspace, there is a lucene-solr channel and a solr-dev channel.
>
> Thanks,
> Shawn
>


Re: cdcr replicator NPE errors

2019-10-25 Thread Jay Potharaju
Is there a solr slack channel?
Thanks
Jay Potharaju



On Fri, Oct 25, 2019 at 9:00 AM Jay Potharaju  wrote:

> Hi,
> I am frequently seeing cdcr-replicator null pointer exception errors in
> the logs.
> Any suggestions on how to address this?
> *Solr version: 7.7.2*
>
> ExecutorUtil
> Uncaught exception java.lang.NullPointerException thrown by thread:
> cdcr-replicator-773-thread-3
> java.lang.Exception: Submitter stack trace
> at
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.execute(ExecutorUtil.java:184)
> at
> org.apache.solr.handler.CdcrReplicatorScheduler.lambda$start$1(CdcrReplicatorScheduler.java:76)
> at
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
> at
> java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
> at
> java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
> at
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> at
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> at java.base/java.lang.Thread.run(Thread.java:834)
>
> Thanks
> Jay
>
>


cdcr replicator NPE errors

2019-10-25 Thread Jay Potharaju
Hi,
I am frequently seeing cdcr-replicator null pointer exception errors in the
logs.
Any suggestions on how to address this?
*Solr version: 7.7.2*

ExecutorUtil
Uncaught exception java.lang.NullPointerException thrown by thread:
cdcr-replicator-773-thread-3
java.lang.Exception: Submitter stack trace
at
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.execute(ExecutorUtil.java:184)
at
org.apache.solr.handler.CdcrReplicatorScheduler.lambda$start$1(CdcrReplicatorScheduler.java:76)
at
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at
java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
at
java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
at
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)

Thanks
Jay


Solr field auditing

2019-09-09 Thread Jay Potharaju
Hi,
I am trying to implement some auditing fields in Solr to track when a
document was last updated. Basically, when a document is updated, I
would like to store the previous update time along with the current timestamp.
example :
*First time indexing*
Doc1 : {id:1, category:shoes, update_date: NOW(), last_update_date:[NOW()]}
*After Update*
Doc1: {id:1, category:shirt, update_date: NOW(), last_update_date:[NOW(),
'2019-09-01']}

I know this can be done easily by logging something in the DB during
indexing, or by making a call to Solr during indexing to get the last
indexed time and updating the field. But I was trying to see if an update
request processor can be used to do this.
Any suggestions?
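Solr's built-in TimestampUpdateProcessorFactory covers the update_date part of this automatically; preserving the previous value into last_update_date would still need a custom processor or an atomic-update "add". A sketch for solrconfig.xml (chain and field names are illustrative):

```xml
<updateRequestProcessorChain name="audit">
  <!-- Stamps update_date with the current time on each add -->
  <processor class="solr.TimestampUpdateProcessorFactory">
    <str name="fieldName">update_date</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```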

Thanks
Jay


Re: Relevance by term position

2019-07-08 Thread Jay Potharaju
Thanks, use of payloads works for my use case.
Jay

> On Jun 28, 2019, at 6:46 AM, Alexandre Rafalovitch  wrote:
> 
> This past thread may be relevant: 
> https://markmail.org/message/aau6bjllkpwcpmro
> It suggests that using SpanFirst of XMLQueryParser will have automatic
> boost for earlier matches.
> The other approach suggested was to use Payloads (which got better
> since the original thread).
> 
> Regards,
>   Alex.
> 
>> On Thu, 27 Jun 2019 at 22:01, Jay Potharaju  wrote:
>> 
>> Hi,
>> I am trying to implement autocomplete feature that should rank documents 
>> based on term position in the search field.
>> Example-
>> Doc1- hello world
>> Doc2- blue sky hello
>> Doc3 - John hello
>> 
>> Searching for hello should return
>> Hello world
>> John hello
>> Blue sky hello
>> 
>> I am currently using ngram to do autocomplete. But this does not allow me to 
>> rank results based on term position.
>> 
>> Any suggestions on how this can be done?
>> Thanks
>> 


Relevance by term position

2019-06-27 Thread Jay Potharaju
Hi,
I am trying to implement an autocomplete feature that should rank documents
based on term position in the search field.
Example- 
Doc1- hello world 
Doc2- blue sky hello
Doc3 - John hello

Searching for hello should return 
Hello world
John hello
Blue sky hello

I am currently using ngrams for autocomplete, but this does not allow me to
rank results based on term position.

Any suggestions on how this can be done?
Thanks
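The desired ranking can be sketched outside Solr as a toy scorer that rewards earlier term positions (illustrative only; inside Solr the same effect comes from e.g. SpanFirst boosts or payloads, as discussed in the reply above):

```python
def position_score(text: str, term: str) -> float:
    """Toy scorer: a match earlier in the field scores higher."""
    tokens = text.lower().split()
    if term.lower() not in tokens:
        return 0.0
    # 1.0 for position 0, decaying for later positions
    return 1.0 / (1 + tokens.index(term.lower()))

docs = ["hello world", "blue sky hello", "John hello"]
ranked = sorted(docs, key=lambda d: -position_score(d, "hello"))
print(ranked)  # -> ['hello world', 'John hello', 'blue sky hello']
```

This reproduces the ordering asked for in the thread: the earlier "hello" appears, the higher the document ranks.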



Autocomplete results from multiple fields

2019-05-09 Thread Jay Potharaju
Hi
I am trying to implement an autocomplete feature over two fields, name &
attribute1. I have set up ngrams for both fields and have a copyField that
stores the ngram results.
fl only contains name, which is used by the UI.
Is it possible to display records that match on values from attribute1? The
only way I can think of doing so is adding another document of type
attribute. Are there any other options that I can use?
Thanks
Jay
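For reference, the two-source copyField setup described above can be sketched in the schema like this (field and type names are illustrative). To show which records matched on attribute1, one option is to return both fields in fl and let the UI decide, or use highlighting to see which source field matched:

```xml
<!-- Field and type names are illustrative -->
<field name="suggest_ngram" type="text_ngram" indexed="true" stored="false"
       multiValued="true"/>
<copyField source="name" dest="suggest_ngram"/>
<copyField source="attribute1" dest="suggest_ngram"/>
```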

CDCR - shards not in sync

2019-04-15 Thread Jay Potharaju
Hi,
I have a collection with 8 shards. 6 of the shards are in sync, but the
other 2 are lagging behind by more than 10 hours. The tlog is only 0.5
GB in size. I have tried stopping and starting CDCR a number of times, but it
has not helped.
From what I have noticed, there is always a shard that is slower than the others.

Solr version: 7.7.0
CDCR config

<lst name="replicator">
  <str name="threadPoolSize">2</str>
  <str name="schedule">10</str>
  <str name="batchSize">4500</str>
</lst>

<lst name="updateLogSynchronizer">
  <str name="schedule">6</str>
</lst>


Thanks
Jay


Solr 7.7 - group faceting errors

2019-03-29 Thread Jay Potharaju
   at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:765)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:683)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalStateException: unexpected docvalues type
NUMERIC for field 'product_id' (expected=SORTED). Re-index with
correct docvalues type.
at org.apache.lucene.index.DocValues.checkField(DocValues.java:340)
at org.apache.lucene.index.DocValues.getSorted(DocValues.java:392)
at 
org.apache.lucene.search.grouping.TermGroupFacetCollector$MV.doSetNextReader(TermGroupFacetCollector.java:312)
at 
org.apache.lucene.search.SimpleCollector.getLeafCollector(SimpleCollector.java:33)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:661)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:471)
at 
org.apache.solr.request.SimpleFacets.getGroupedCounts(SimpleFacets.java:721)
at 
org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:497)
at 
org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:426)
at 
org.apache.solr.request.SimpleFacets.lambda$getFacetFieldCounts$0(SimpleFacets.java:826)
... 46 more

Thanks
Jay Potharaju
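The exception above says the product_id field carries NUMERIC docValues while group faceting needs SORTED (i.e. a single-valued, string-like field). One common workaround, sketched here with illustrative names, is to facet on a string copy of the field:

```xml
<!-- product_id_str is an illustrative name; string fields get SORTED docValues -->
<field name="product_id_str" type="string" indexed="true" stored="false"
       docValues="true"/>
<copyField source="product_id" dest="product_id_str"/>
```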


Re: Re: solr _route_ key not working

2019-03-27 Thread Jay Potharaju
I was reading the debug info incorrectly; it is working as expected.
Thanks for the help.
Thanks
Jay Potharaju



On Tue, Mar 26, 2019 at 10:58 PM Jay Potharaju 
wrote:

> Edwin, I tried escaping the special characters but it does not seem to
> work. I am using 7.7.
> Thanks Jeremy for the example.
> id:123:456!789
> I can see that all the data for the same key is co-located in the same
> shard when querying that shard with fq=fieldB:456&shards=shard1.
>
> Any suggestions why that would not be working when using _route_ to query
> the documents.
>
> Thanks
> Jay Potharaju
>
>
>
> On Tue, Mar 26, 2019 at 5:58 AM Branham, Jeremy (Experis) <
> jb...@allstate.com> wrote:
>
>> Jay –
>> I’m not familiar with the document ID format you mention [having a “:” in
>> the prefix], but it looks similar to the composite ID routing I’m using.
>> Document Id format: “a/1!id”
>>
>> Then I can use a _route_ value of “a/1!” when querying.
>>
>> Example Doc IDs:
>> a/1!768456
>> a/1!563575
>> b/1!456234
>> b/1!245698
>>
>> The document ID prefix “x/1!” tells Solr to spread the documents over ½
>> of the available shards. When querying with the same value for _route_ it
>> will retrieve documents only from those shards.
>>
>> Jeremy Branham
>> jb...@allstate.com
>>
>> On 3/25/19, 9:13 PM, "Zheng Lin Edwin Yeo"  wrote:
>>
>> Hi,
>>
>> Sorry, didn't see that you have an exclamation mark in your query as
>> well.
>> You will need to escape the exclamation mark as well.
>> So you can try it with the query _route_=“123\:456\!”
>>
>> You can refer to the message in the link on which special characters
>> requires escaping.
>>
>> https://stackoverflow.com/questions/21914956/which-special-characters-need-escaping-in-a-solr-query
>>
>> By the way, which Solr version are you using?
>>
>> Regards,
>> Edwin
>>
>> On Tue, 26 Mar 2019 at 01:12, Jay Potharaju 
>> wrote:
>>
>> > That did not work . Any other suggestions
>> > My id is 123:456!678
>> > Tried running query as _route_=“123\:456!” But didn’t give expected
>> > results
>> > Thanks
>> > Jay
>> >
>> > > On Mar 24, 2019, at 8:30 PM, Zheng Lin Edwin Yeo <
>> edwinye...@gmail.com>
>> > wrote:
>> > >
>> > > Hi,
>> > >
>> > > The character ":" is a special character, so it requires escaping
>> during
>> > > the search.
>> > > You can try to search with query _route_="a\:b!".
>> > >
>> > > Regards,
>> > > Edwin
>> > >
>> > >> On Mon, 25 Mar 2019 at 07:59, Jay Potharaju <
>> jspothar...@gmail.com>
>> > wrote:
>> > >>
>> > >> Hi,
>> > >> My document id has a format of a:b!c, when I query
>> _route_="a:b!" it
>> > does
>> > >> not return any values. Any suggestions?
>> > >>
>> > >> Thanks
>> > >> Jay Potharaju
>> > >>
>> >
>>
>>
>>


Re: Re: solr _route_ key not working

2019-03-26 Thread Jay Potharaju
Edwin, I tried escaping the special characters but it does not seem to
work. I am using 7.7.
Thanks Jeremy for the example.
id:123:456!789
I can see that all the data for the same key is co-located in the same
shard when querying that shard with fq=fieldB:456&shards=shard1.

Any suggestions on why that would not work when using _route_ to query
the documents?
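For reference, the query-parser escaping Edwin describes can be sketched as a small helper over the standard Lucene special-character set (illustrative; as the follow-up in this thread shows, routing was in fact working and the debug output had simply been misread):

```python
import re

# Lucene/Solr query-parser special characters:
# + - & | ! ( ) { } [ ] ^ " ~ * ? : \ /
_SPECIAL = re.compile(r'([+\-&|!(){}\[\]^"~*?:\\/])')

def escape_solr(value: str) -> str:
    """Backslash-escape query-parser special characters in a term."""
    return _SPECIAL.sub(r'\\\1', value)

print(escape_solr("123:456!789"))  # -> 123\:456\!789
```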

Thanks
Jay Potharaju



On Tue, Mar 26, 2019 at 5:58 AM Branham, Jeremy (Experis) <
jb...@allstate.com> wrote:

> Jay –
> I’m not familiar with the document ID format you mention [having a “:” in
> the prefix], but it looks similar to the composite ID routing I’m using.
> Document Id format: “a/1!id”
>
> Then I can use a _route_ value of “a/1!” when querying.
>
> Example Doc IDs:
> a/1!768456
> a/1!563575
> b/1!456234
> b/1!245698
>
> The document ID prefix “x/1!” tells Solr to spread the documents over ½ of
> the available shards. When querying with the same value for _route_ it will
> retrieve documents only from those shards.
>
> Jeremy Branham
> jb...@allstate.com
>
> On 3/25/19, 9:13 PM, "Zheng Lin Edwin Yeo"  wrote:
>
> Hi,
>
> Sorry, didn't see that you have an exclamation mark in your query as
> well.
> You will need to escape the exclamation mark as well.
> So you can try it with the query _route_=“123\:456\!”
>
> You can refer to the message in the link on which special characters
> requires escaping.
>
> https://stackoverflow.com/questions/21914956/which-special-characters-need-escaping-in-a-solr-query
>
> By the way, which Solr version are you using?
>
> Regards,
> Edwin
>
> On Tue, 26 Mar 2019 at 01:12, Jay Potharaju 
> wrote:
>
> > That did not work . Any other suggestions
> > My id is 123:456!678
> > Tried running query as _route_=“123\:456!” But didn’t give expected
> > results
> > Thanks
> > Jay
> >
> > > On Mar 24, 2019, at 8:30 PM, Zheng Lin Edwin Yeo <
> edwinye...@gmail.com>
> > wrote:
> > >
> > > Hi,
> > >
> > > The character ":" is a special character, so it requires escaping
> during
> > > the search.
> > > You can try to search with query _route_="a\:b!".
> > >
> > > Regards,
> > > Edwin
>     > >
> > >> On Mon, 25 Mar 2019 at 07:59, Jay Potharaju <
> jspothar...@gmail.com>
> > wrote:
> > >>
> > >> Hi,
> > >> My document id has a format of a:b!c, when I query _route_="a:b!"
> it
> > does
> > >> not return any values. Any suggestions?
> > >>
> > >> Thanks
> > >> Jay Potharaju
> > >>
> >
>
>
>


Re: Java 9 & solr 7.7.0

2019-03-25 Thread Jay Potharaju
I just learnt that Java 11 is the current LTS release. Is anyone using OpenJDK 11 in production?
Thanks


> On Mar 23, 2019, at 5:15 PM, Jay Potharaju  wrote:
> 
> I have not kept up with jdk versions ...will try with jdk 11 and see if it 
> addresses the high cpu issue. Thanks
> 
> 
>> On Mar 23, 2019, at 11:48 AM, Jay Potharaju  wrote:
>> 
>> Thanks for that info Tim 
>> 
>>> On Mar 23, 2019, at 11:26 AM, Tim Underwood  wrote:
>>> 
>>> We are successfully running Solr 7.6.0 (and 7.5.0 before it) on OpenJDK 11
>>> without problems.  We are also using G1.  We do not use Solr Cloud but do
>>> rely on the legacy replication.
>>> 
>>> -Tim
>>> 
>>> On Sat, Mar 23, 2019 at 10:13 AM Erick Erickson 
>>> wrote:
>>> 
>>>> I am, in fact, trying to get a summary of all this together, we’ll see how
>>>> successful I am.
>>>> 
>>>> I can say that Solr is tested (and has been for quite some time) against
>>>> JDK 8,9,10,11,12 and even 13. JDK9, from a 10,000 foot perspective, has a
>>>> success rate in our automated tests that’s in line with all the other JDKs.
>>>> 
>>>> That said, people seem to be settling on JDK11 anecdotally, what’s your
>>>> reason for using 9 .vs. 11?
>>>> 
>>>> Finally, there was one issue with JDK 9 and Kerberos that I’m unsure what
>>>> the resolution is, if there is any. If you use Kerberos, be sure to test
>>>> that first.
>>>> 
>>>> Best,
>>>> Erick
>>>> 
>>>>> On Mar 23, 2019, at 9:47 AM, Jay Potharaju 
>>>> wrote:
>>>>> 
>>>>> Thanks I missed that info. Will try running with jdk9 and see if it
>>>> addresses the issue.
>>>>> Jay
>>>>> 
>>>>>>> On Mar 23, 2019, at 9:00 AM, Shawn Heisey  wrote:
>>>>>>> 
>>>>>>> On 3/23/2019 8:12 AM, Jay Potharaju wrote:
>>>>>>> Can I use java 9 with 7.7.0. I am planning to test if fixes issue with
>>>> high cpu that I am running into.
>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8129861
>>>>>>> Was solr 7.7 tested with java 9?
>>>>>> 
>>>>>> The info for the 7.0.0 release said it was qualified with Java 9, so
>>>> you should be fine running 7.7.x in Java 9 as well.  I do not know if it
>>>> works with Java 10, 11, or 12.
>>>>>> 
>>>>>> Thanks,
>>>>>> Shawn
>>>> 
>>>> 


Re: solr _route_ key not working

2019-03-25 Thread Jay Potharaju
That did not work. Any other suggestions?
My id is 123:456!678
Tried running the query as _route_=“123\:456!” but it didn’t give the expected results.
Thanks
Jay

> On Mar 24, 2019, at 8:30 PM, Zheng Lin Edwin Yeo  wrote:
> 
> Hi,
> 
> The character ":" is a special character, so it requires escaping during
> the search.
> You can try to search with query _route_="a\:b!".
> 
> Regards,
> Edwin
> 
>> On Mon, 25 Mar 2019 at 07:59, Jay Potharaju  wrote:
>> 
>> Hi,
>> My document id has a format of a:b!c, when I query _route_="a:b!" it does
>> not return any values. Any suggestions?
>> 
>> Thanks
>> Jay Potharaju
>> 


solr _route_ key not working

2019-03-24 Thread Jay Potharaju
Hi,
My document id has the format a:b!c; when I query with _route_="a:b!" it does
not return any values. Any suggestions?

Thanks
Jay Potharaju


Re: Java 9 & solr 7.7.0

2019-03-23 Thread Jay Potharaju
I have not kept up with jdk versions ...will try with jdk 11 and see if it 
addresses the high cpu issue. Thanks


> On Mar 23, 2019, at 11:48 AM, Jay Potharaju  wrote:
> 
> Thanks for that info Tim 
> 
>> On Mar 23, 2019, at 11:26 AM, Tim Underwood  wrote:
>> 
>> We are successfully running Solr 7.6.0 (and 7.5.0 before it) on OpenJDK 11
>> without problems.  We are also using G1.  We do not use Solr Cloud but do
>> rely on the legacy replication.
>> 
>> -Tim
>> 
>> On Sat, Mar 23, 2019 at 10:13 AM Erick Erickson 
>> wrote:
>> 
>>> I am, in fact, trying to get a summary of all this together, we’ll see how
>>> successful I am.
>>> 
>>> I can say that Solr is tested (and has been for quite some time) against
>>> JDK 8,9,10,11,12 and even 13. JDK9, from a 10,000 foot perspective, has a
>>> success rate in our automated tests that’s in line with all the other JDKs.
>>> 
>>> That said, people seem to be settling on JDK11 anecdotally, what’s your
>>> reason for using 9 .vs. 11?
>>> 
>>> Finally, there was one issue with JDK 9 and Kerberos that I’m unsure what
>>> the resolution is, if there is any. If you use Kerberos, be sure to test
>>> that first.
>>> 
>>> Best,
>>> Erick
>>> 
>>>> On Mar 23, 2019, at 9:47 AM, Jay Potharaju 
>>> wrote:
>>>> 
>>>> Thanks I missed that info. Will try running with jdk9 and see if it
>>> addresses the issue.
>>>> Jay
>>>> 
>>>>>> On Mar 23, 2019, at 9:00 AM, Shawn Heisey  wrote:
>>>>>> 
>>>>>> On 3/23/2019 8:12 AM, Jay Potharaju wrote:
>>>>>> Can I use java 9 with 7.7.0. I am planning to test if fixes issue with
>>> high cpu that I am running into.
>>>>>> https://bugs.openjdk.java.net/browse/JDK-8129861
>>>>>> Was solr 7.7 tested with java 9?
>>>>> 
>>>>> The info for the 7.0.0 release said it was qualified with Java 9, so
>>> you should be fine running 7.7.x in Java 9 as well.  I do not know if it
>>> works with Java 10, 11, or 12.
>>>>> 
>>>>> Thanks,
>>>>> Shawn
>>> 
>>> 


Re: Java 9 & solr 7.7.0

2019-03-23 Thread Jay Potharaju
Thanks for that info Tim 

> On Mar 23, 2019, at 11:26 AM, Tim Underwood  wrote:
> 
> We are successfully running Solr 7.6.0 (and 7.5.0 before it) on OpenJDK 11
> without problems.  We are also using G1.  We do not use Solr Cloud but do
> rely on the legacy replication.
> 
> -Tim
> 
> On Sat, Mar 23, 2019 at 10:13 AM Erick Erickson 
> wrote:
> 
>> I am, in fact, trying to get a summary of all this together, we’ll see how
>> successful I am.
>> 
>> I can say that Solr is tested (and has been for quite some time) against
>> JDK 8,9,10,11,12 and even 13. JDK9, from a 10,000 foot perspective, has a
>> success rate in our automated tests that’s in line with all the other JDKs.
>> 
>> That said, people seem to be settling on JDK11 anecdotally, what’s your
>> reason for using 9 .vs. 11?
>> 
>> Finally, there was one issue with JDK 9 and Kerberos that I’m unsure what
>> the resolution is, if there is any. If you use Kerberos, be sure to test
>> that first.
>> 
>> Best,
>> Erick
>> 
>>> On Mar 23, 2019, at 9:47 AM, Jay Potharaju 
>> wrote:
>>> 
>>> Thanks I missed that info. Will try running with jdk9 and see if it
>> addresses the issue.
>>> Jay
>>> 
>>>>> On Mar 23, 2019, at 9:00 AM, Shawn Heisey  wrote:
>>>>> 
>>>>> On 3/23/2019 8:12 AM, Jay Potharaju wrote:
>>>>> Can I use java 9 with 7.7.0. I am planning to test if fixes issue with
>> high cpu that I am running into.
>>>>> https://bugs.openjdk.java.net/browse/JDK-8129861
>>>>> Was solr 7.7 tested with java 9?
>>>> 
>>>> The info for the 7.0.0 release said it was qualified with Java 9, so
>> you should be fine running 7.7.x in Java 9 as well.  I do not know if it
>> works with Java 10, 11, or 12.
>>>> 
>>>> Thanks,
>>>> Shawn
>> 
>> 


Re: Java 9 & solr 7.7.0

2019-03-23 Thread Jay Potharaju
Thanks I missed that info. Will try running with jdk9 and see if it addresses 
the issue.
Jay

> On Mar 23, 2019, at 9:00 AM, Shawn Heisey  wrote:
> 
>> On 3/23/2019 8:12 AM, Jay Potharaju wrote:
>> Can I use java 9 with 7.7.0. I am planning to test if fixes issue with high 
>> cpu that I am running into.
>> https://bugs.openjdk.java.net/browse/JDK-8129861
>> Was solr 7.7 tested with java 9?
> 
> The info for the 7.0.0 release said it was qualified with Java 9, so you 
> should be fine running 7.7.x in Java 9 as well.  I do not know if it works 
> with Java 10, 11, or 12.
> 
> Thanks,
> Shawn


Java 9 & solr 7.7.0

2019-03-23 Thread Jay Potharaju
Hi
Can I use Java 9 with 7.7.0? I am planning to test whether it fixes the
high CPU issue that I am running into.
https://bugs.openjdk.java.net/browse/JDK-8129861
Was Solr 7.7 tested with Java 9?

Thanks 
Jay

Re: CDCR issues

2019-03-22 Thread Jay Potharaju
This might be causing the high CPU in 7.7.x.

https://github.com/apache/lucene-solr/commit/eb652b84edf441d8369f5188cdd5e3ae2b151434#diff-e54b251d166135a1afb7938cfe152bb5
That is related to this JDK bug
https://bugs.openjdk.java.net/browse/JDK-8129861.


Thanks
Jay Potharaju



On Thu, Mar 21, 2019 at 10:20 PM Jay Potharaju 
wrote:

> Hi,
> I just enabled CDCR for one  collection. I am seeing high CPU usage and
> the high number of tlog files and increasing.
> The collection does not have lot of data , just started reindexing of
> data.
> .
> Solr 7.7.0 , implicit sharding 8 shards
> I have enabled buffer on source side and disabled buffer on target side.
> The number of replicators is set to 4.
>  Any suggestions on how to tackle high cpu and growing tlog. The tlog are
> small in size but for the one shard I checked there were about 100 of them.
>
> Thanks
> Jay


CDCR issues

2019-03-21 Thread Jay Potharaju
Hi,
I just enabled CDCR for one collection. I am seeing high CPU usage and a
high and increasing number of tlog files.
The collection does not have a lot of data; I just started reindexing.
Solr 7.7.0, implicit sharding, 8 shards.
I have enabled the buffer on the source side and disabled it on the target side.
The number of replicators is set to 4.
Any suggestions on how to tackle the high CPU and growing tlog? The tlogs are
small in size, but for the one shard I checked there were about 100 of them.

Thanks
Jay

Re: copyfield not working

2019-01-14 Thread Jay Potharaju
 thanks for the info Andrea!
Thanks
Jay



On Sun, Jan 13, 2019 at 11:53 PM Andrea Gazzarini 
wrote:

> Hi Jay, the text analysis always operates on the indexed content. The
> stored content of a filed is left untouched unless you do something
> before it gets indexed (e.g. on client side or by an
> UpdateRequestProcessor).
>
> Cheers,
> Andrea
>
> On 14/01/2019 08:46, Jay Potharaju wrote:
> > Hi,
> > I have a copy field in which i am copying the contents of text_en field
> to
> > another custom field.
> > After indexing i was expecting any of the special characters in the
> > paragraph to be removed, but it does not look like that is happening. The
> > copied content is same as the what is there in the source. I ran analysis
> > ...looks like the pattern matching works as expected and the special
> > characters are removed.
> >
> > Any suggestions?
> > 
> > <fieldType ...>
> >   <analyzer>
> >     <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="['!#\$%'\(\)\*+,-\./:;=?@\[\]\^_`{|}~!@#$%^*]" />
> >     <tokenizer class="solr.StandardTokenizerFactory"/>
> >     <filter class="solr.SuggestStopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt" />
> >     <filter class="solr.EnglishPossessiveFilterFactory"/>
> >     <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
> >   </analyzer>
> > </fieldType>
> >
> > Thanks
> > Jay
> >
>
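As Andrea explains, analysis changes only the indexed terms; to change the stored value, the cleanup has to happen before indexing, e.g. in an update request processor chain. A sketch using Solr's RegexReplaceProcessorFactory (chain name, field name, and pattern are illustrative):

```xml
<updateRequestProcessorChain name="strip-special">
  <!-- Rewrites the field value before it is stored and indexed -->
  <processor class="solr.RegexReplaceProcessorFactory">
    <str name="fieldName">my_custom_field</str>
    <str name="pattern">[!#\$%\^\*]</str>
    <str name="replacement"></str>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```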


Re: copyfield not working

2019-01-13 Thread Jay Potharaju
copyfield syntax from my schema file...

Thanks
Jay



On Sun, Jan 13, 2019 at 11:46 PM Jay Potharaju 
wrote:

> Hi,
> I have a copy field in which i am copying the contents of text_en field to
> another custom field.
> After indexing i was expecting any of the special characters in the
> paragraph to be removed, but it does not look like that is happening. The
> copied content is same as the what is there in the source. I ran analysis
> ...looks like the pattern matching works as expected and the special
> characters are removed.
>
> Any suggestions?
> <fieldType ...>
>   <analyzer>
>     <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="['!#\$%'\(\)\*+,-\./:;=?@\[\]\^_`{|}~!@#$%^*]" />
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.SuggestStopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt" />
>     <filter class="solr.EnglishPossessiveFilterFactory"/>
>     <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
>   </analyzer>
> </fieldType>
> 
>
> Thanks
> Jay
>
>


copyfield not working

2019-01-13 Thread Jay Potharaju
Hi,
I have a copy field in which I am copying the contents of a text_en field to
another custom field.
After indexing I was expecting any of the special characters in the
paragraph to be removed, but it does not look like that is happening. The
copied content is the same as what is in the source. I ran analysis, and
it looks like the pattern matching works as expected there: the special
characters are removed.

Any suggestions?
<fieldType ...>
  <analyzer>
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="['!#\$%'\(\)\*+,-\./:;=?@\[\]\^_`{|}~!@#$%^*]" />
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SuggestStopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt" />
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
  </analyzer>
</fieldType>

Thanks
Jay


Re: Search query with & without question mark

2019-01-13 Thread Jay Potharaju
The parsedquery is the same when debugging, but different terms are being
taken into consideration when calculating the scores. Why would that be the
case? My guess is that the SuggestStopFilterFactory is not working as I
expect it to, causing this weird situation.

Updated field type definition:
<fieldType ...>
  <analyzer>
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="['!#\$%'\(\)\*+,-\./:;=?@\[\]\^_`{|}~!@#$%^*]" />
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SuggestStopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt" />
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
  </analyzer>
</fieldType>

Debug Query:
*"rawquerystring":"how do i add a field"*,
"querystring":"how do i add a field",
"parsedquery":"(+(DisjunctionMaxQuery((topic_title_plain:how))
DisjunctionMaxQuery((topic_title_plain:do))
DisjunctionMaxQuery((topic_title_plain:i))
DisjunctionMaxQuery((topic_title_plain:add))
DisjunctionMaxQuery((topic_title_plain:a))
DisjunctionMaxQuery((topic_title_plain:field/no_coord",
"parsedquery_toString":"+((topic_title_plain:how)
(topic_title_plain:do) (topic_title_plain:i) (topic_title_plain:add)
(topic_title_plain:a) (topic_title_plain:field))",
"explain":{
  "1":"
6.1034017 = sum of:
  2.0065408 = weight(topic_title_plain:add in 107) [SchemaSimilarity],
result of:
2.0065408 = score(doc=107,freq=1.0 = termFreq=1.0
), product of:
  2.1391609 = idf, computed as log(1 + (docCount - docFreq + 0.5) /
(docFreq + 0.5)) from:
32.0 = docFreq
275.0 = docCount
  0.9380037 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 -
b + b * fieldLength / avgFieldLength)) from:
1.0 = termFreq=1.0
1.2 = parameter k1
0.75 = parameter b
3.4436364 = avgFieldLength
4.0 = fieldLength
  4.096861 = weight(topic_title_plain:field in 107) [SchemaSimilarity],
result of:
4.096861 = score(doc=107,freq=1.0 = termFreq=1.0
), product of:
  4.367638 = idf, computed as log(1 + (docCount - docFreq + 0.5) /
(docFreq + 0.5)) from:
3.0 = docFreq
275.0 = docCount
  0.9380037 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 -
b + b * fieldLength / avgFieldLength)) from:
1.0 = termFreq=1.0
1.2 = parameter k1
0.75 = parameter b
3.4436364 = avgFieldLength
4.0 = fieldLength
"},

*rawquerystring":"how do i add a field?",*
"querystring":"how do i add a field?",
"parsedquery":"(+(DisjunctionMaxQuery((topic_title_plain:how))
DisjunctionMaxQuery((topic_title_plain:do))
DisjunctionMaxQuery((topic_title_plain:i))
DisjunctionMaxQuery((topic_title_plain:add))
DisjunctionMaxQuery((topic_title_plain:a))
DisjunctionMaxQuery((topic_title_plain:field/no_coord",
"parsedquery_toString":"+((topic_title_plain:how)
(topic_title_plain:do) (topic_title_plain:i) (topic_title_plain:add)
(topic_title_plain:a) (topic_title_plain:field))",
"explain":{
  "2":"
3.798876 = sum of:
  2.033249 = weight(topic_title_plain:how in 202) [SchemaSimilarity],
result of:
2.033249 = score(doc=202,freq=1.0 = termFreq=1.0
), product of:
  2.4634004 = idf, computed as log(1 + (docCount - docFreq + 0.5) /
(docFreq + 0.5)) from:
23.0 = docFreq
275.0 = docCount
  0.82538307 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1
- b + b * fieldLength / avgFieldLength)) from:
1.0 = termFreq=1.0
1.2 = parameter k1
0.75 = parameter b
3.4436364 = avgFieldLength
5.2244897 = fieldLength
*  1.7656271 = weight(topic_title_plain:add in 202) [SchemaSimilarity],
result of:*
1.7656271 = score(doc=202,freq=1.0 = termFreq=1.0
), product of:
  2.1391609 = idf, computed as log(1 + (docCount - docFreq + 0.5) /
(docFreq + 0.5)) from:
32.0 = docFreq
275.0 = docCount
  0.82538307 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1
- b + b * fieldLength / avgFieldLength)) from:
1.0 = termFreq=1.0
1.2 = parameter k1
0.75 = parameter b
    3.4436364 = avgFieldLength
5.2244897 = fieldLength
"},
Thanks
Jay



On Sun, Jan 13, 2019 at 10:32 PM Erick Erickson 
wrote:

> What does adding debug=query show in both cases?
>
> Best,
> Erick
>
> On Sun, Jan 13, 2019 at 9:30 PM Jay Potharaju 
> wrote:
> >
> > Hi,
> > When searching  I get different results when the query contains question
> > mark vs without question mark  . The field i am searching on does not
> have
> > any question marks.
> > Any suggestions?
> >
> > 
> > <fieldType ...>
> >   <analyzer>
> >     <tokenizer class="solr.StandardTokenizerFactory"/>
> >     <filter class="solr.PatternReplaceFilterFactory" pattern="['!#\$%'\(\)\*+,-\./:;=?@\[\]^_`{|}~]" replacement=" " replace="all" />
> >     <filter class="solr.LowerCaseFilterFactory"/>
> >     <filter class="solr.SuggestStopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt" />
> >     <filter class="solr.EnglishPossessiveFilterFactory"/>
> >     <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
> >   </analyzer>
> > </fieldType>
> >
> > Thanks
> > Jay
>


Search query with & without question mark

2019-01-13 Thread Jay Potharaju
Hi,
When searching, I get different results when the query contains a question
mark vs. without one. The field I am searching on does not contain any
question marks.
Any suggestions?

<fieldType ...>
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="['!#\$%'\(\)\*+,-\./:;=?@\[\]^_`{|}~]"
            replacement=" " replace="all"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SuggestStopFilterFactory" ignoreCase="true"
            words="lang/stopwords_en.txt"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
  </analyzer>
</fieldType>

Thanks
Jay
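Two things are worth separating here. First, in the Lucene query parsers `?` is a single-character wildcard, and wildcard terms largely bypass the analysis chain, so `q=add?` may be handled as a wildcard query rather than analyzed like `add` — escaping it as `add\?` is worth testing. Second, the PatternReplaceFilterFactory in the schema does strip `?` from analyzed tokens. A minimal sketch of what that pattern does to a token — the character class is copied from the schema snippet, and Python's `re` happens to accept it as-is:

```python
import re

# Character class from the schema's PatternReplaceFilterFactory
punct = re.compile(r"['!#\$%'\(\)\*+,-\./:;=?@\[\]^_`{|}~]")

def replace_all(token):
    # replacement=" " replace="all", as configured in the schema
    return punct.sub(" ", token)

print(repr(replace_all("add?")))  # the '?' becomes a trailing space
print(repr(replace_all("add")))   # unchanged
```

So after tokenization and this filter, `add?` and `add` should index to the same term — which points the suspicion at query-time wildcard handling rather than the index.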


Re: CDCR Custom Document Routing

2018-07-02 Thread Jay Potharaju
Solr cdcr : https://issues.apache.org/jira/browse/SOLR-12380
deletebyid: https://issues.apache.org/jira/browse/SOLR-8889

Thanks
Jay Potharaju



On Mon, Jul 2, 2018 at 5:41 PM Jay Potharaju  wrote:

> Hi Amrit,
> I am using a curl command to send a request to solr for deleting
> documents. That is because deleteById does not work for collections setup
> with implicit routing.
>
> curl http:/localhost:8983/solr/test_5_replica2/update/json/ -H
> 'Content-type:application/json/docs' -d '{
> "delete": {"id":"documentid13123123"}
> }'
> The deletes don't seem to propagate correctly to the target side.
>
> Thanks
> Jay Potharaju
>
>
>
> On Mon, Jul 2, 2018 at 5:37 PM Amrit Sarkar 
> wrote:
>
>> Jay,
>>
>> Can you sample delete command you are firing at the source to understand
>> the issue with Cdcr.
>>
>> On Tue, 3 Jul 2018, 4:22 am Jay Potharaju,  wrote:
>>
>> > Hi
>> > The current cdcr setup does not work if my collection uses implicit
>> > routing.
>> > In my testing i found that adding documents works without any problems.
>> It
>> > doesn't seem to work correctly when deleting documents.
>> > Is there an alternative to cdcr that would work in cross data center
>> > scenario.
>> >
>> > Setup:
>> > 8 shards : 2 on each node
>> > Solr:6.6.4
>> >
>> > Thanks
>> > Jay Potharaju
>> >
>>
>


Re: CDCR Custom Document Routing

2018-07-02 Thread Jay Potharaju
Hi Amrit,
I am using a curl command to send a request to solr for deleting documents.
That is because deleteById does not work for collections set up with
implicit routing.

curl http:/localhost:8983/solr/test_5_replica2/update/json/ -H
'Content-type:application/json/docs' -d '{
"delete": {"id":"documentid13123123"}
}'
The deletes don't seem to propagate correctly to the target side.

Thanks
Jay Potharaju



On Mon, Jul 2, 2018 at 5:37 PM Amrit Sarkar  wrote:

> Jay,
>
> Can you sample delete command you are firing at the source to understand
> the issue with Cdcr.
>
> On Tue, 3 Jul 2018, 4:22 am Jay Potharaju,  wrote:
>
> > Hi
> > The current cdcr setup does not work if my collection uses implicit
> > routing.
> > In my testing i found that adding documents works without any problems.
> It
> > doesn't seem to work correctly when deleting documents.
> > Is there an alternative to cdcr that would work in cross data center
> > scenario.
> >
> > Setup:
> > 8 shards : 2 on each node
> > Solr:6.6.4
> >
> > Thanks
> > Jay Potharaju
> >
>


CDCR Custom Document Routing

2018-07-02 Thread Jay Potharaju
Hi
The current CDCR setup does not work if my collection uses implicit
routing.
In my testing I found that adding documents works without any problems, but
it doesn't seem to work correctly when deleting documents.
Is there an alternative to CDCR that would work in a cross-data-center
scenario?

Setup:
8 shards : 2 on each node
Solr:6.6.4

Thanks
Jay Potharaju
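One workaround while deleteById is broken for implicit routing (SOLR-8889) is the approach that worked in the deleteById thread: post the delete directly to the core that owns the document. A minimal sketch with hypothetical host/core names — untested against a live cluster, and note it sidesteps rather than fixes the CDCR forwarding issue (SOLR-12380):

```python
import json
import urllib.request

def delete_payload(doc_id):
    """JSON body for a delete-by-id request (standard JSON update format)."""
    return json.dumps({"delete": {"id": doc_id}})

def core_update_url(host, core):
    """Update endpoint of the specific core (shard replica) that owns the doc."""
    return "http://%s/solr/%s/update?commitWithin=10000" % (host, core)

def delete_direct(host, core, doc_id):
    req = urllib.request.Request(
        core_update_url(host, core),
        data=delete_payload(doc_id).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    return urllib.request.urlopen(req)  # network call; needs a live Solr

# Usage with hypothetical names, e.g.:
#   delete_direct("solrserver:8983", "test_shardaa_replica1", "aa:1112312:444")
```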


Re: deletebyQuery vs deletebyId

2018-05-24 Thread Jay Potharaju
Hi Erick,
Yes, I commented on the ticket after finding it during my search for the
issue in the Solr JIRA.

Setup:
2 Nodes, 6 shards , 3 shards on each node (no replication)
Collection uses implicit routing.

Just to give some background: the first time I tried it, it worked, but
when I went back later and tested it again it was only working
intermittently. That led me to believe either there was a problem in how
I was posting the request or a Solr issue.

Based on your suggestion about using HttpClient, I just tried posting a
request directly to the shard and it works:
curl http://solrserver:8983/solr/test_shardaa_replica1/update/json/ -H
'Content-type:application/json/docs' -d '{
  "delete": {"id":"aa:1112312:444"}
}'

Thanks
Jay



On Wed, May 23, 2018 at 9:03 PM Erick Erickson <erickerick...@gmail.com>
wrote:

> Hmmm, this looks like https://issues.apache.org/jira/browse/SOLR-8889?
> And are you the "Jay" who commented there?
>
> On Wed, May 23, 2018 at 11:28 PM, Erick Erickson
> <erickerick...@gmail.com> wrote:
> > Tell us some more about your setup, particularly:
> > - you mention routing key. Is the collection used with implicit
> > routing or compositeID?
> > - What does adding =query show?
> > - I'm not entirely sure, frankly, how delete by id and having a
> > different routing field play together. The supposition behind
> > deleteById is that the deletions can be routed to the correct leader
> > by hashing on the id field.
> >
> > Best,
> > Erick
> >
> > On Wed, May 23, 2018 at 6:02 PM, Jay Potharaju <jspothar...@gmail.com>
> wrote:
> >> Thanks Emir & Shawn for chiming in!.
> >> I am testing deleteById in solr6.6.3 and it does not seem to work. I
> have a
> >> 6 shards in my collection and when sending query to solr a routing key
> is
> >> also passed. Also tested this in solr 5.3 also, with same results.
> >> Any suggestions why that would be happening?
> >>
> >> Thanks
> >> Jay
> >>
> >>
> >>
> >> On Wed, May 23, 2018 at 1:26 AM Emir Arnautović <
> >> emir.arnauto...@sematext.com> wrote:
> >>
> >>> Hi Jay,
> >>> Solr does not handle it differently from any other DBQ. It will show
> >>> fewer issues than some other DBQs because it affects fewer documents,
> >>> but the mechanics of DBQ are the same and do not play well with
> >>> concurrent changes to the index (merges/updates), especially in
> >>> SolrCloud mode. Here are some thoughts on
> >>> DBQ: http://www.od-bits.com/2018/03/dbq-or-delete-by-query.html <
> >>> http://www.od-bits.com/2018/03/dbq-or-delete-by-query.html>
> >>>
> >>> Thanks,
> >>> Emir
> >>> --
> >>> Monitoring - Log Management - Alerting - Anomaly Detection
> >>> Solr & Elasticsearch Consulting Support Training -
> http://sematext.com/
> >>>
> >>>
> >>>
> >>> > On 23 May 2018, at 02:35, Jay Potharaju <jspothar...@gmail.com>
> wrote:
> >>> >
> >>> > Hi,
> >>> > I have a quick question about deletebyQuery vs deleteById. When using
> >>> > deleteByQuery, if query is id:123 is that same as deleteById in
> terms of
> >>> > performance.
> >>> >
> >>> >
> >>> > Thanks
> >>> > Jay
> >>>
> >>>
>


Re: deletebyQuery vs deletebyId

2018-05-23 Thread Jay Potharaju
Thanks Emir & Shawn for chiming in!
I am testing deleteById in Solr 6.6.3 and it does not seem to work. I have
6 shards in my collection, and when sending the query to Solr a routing key
is also passed. I tested this in Solr 5.3 as well, with the same results.
Any suggestions why that would be happening?

Thanks
Jay



On Wed, May 23, 2018 at 1:26 AM Emir Arnautović <
emir.arnauto...@sematext.com> wrote:

> Hi Jay,
> Solr does not handle it differently from any other DBQ. It will show fewer
> issues than some other DBQs because it affects fewer documents, but the
> mechanics of DBQ are the same and do not play well with concurrent changes
> to the index (merges/updates), especially in SolrCloud mode. Here are some
> thoughts on
> DBQ: http://www.od-bits.com/2018/03/dbq-or-delete-by-query.html <
> http://www.od-bits.com/2018/03/dbq-or-delete-by-query.html>
>
> Thanks,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
> > On 23 May 2018, at 02:35, Jay Potharaju <jspothar...@gmail.com> wrote:
> >
> > Hi,
> > I have a quick question about deletebyQuery vs deleteById. When using
> > deleteByQuery, if query is id:123 is that same as deleteById in terms of
> > performance.
> >
> >
> > Thanks
> > Jay
>
>


deletebyQuery vs deletebyId

2018-05-22 Thread Jay Potharaju
Hi,
I have a quick question about deletebyQuery vs deleteById. When using
deleteByQuery, if query is id:123 is that same as deleteById in terms of
performance.


Thanks
Jay
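For context, the two operations differ only slightly on the wire (standard JSON update format shown below), but they take very different code paths inside Solr, which is what the replies in this thread dig into:

```python
import json

# The same document targeted two ways:
delete_by_id = {"delete": {"id": "123"}}
delete_by_query = {"delete": {"query": "id:123"}}

# deleteById can proceed concurrently with segment merges, while
# deleteByQuery waits for in-flight merges and blocks updates queued
# behind it — so they are not equivalent in cost even for one document.
print(json.dumps(delete_by_id))
print(json.dumps(delete_by_query))
```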


Re: Async exceptions during distributed update

2018-05-14 Thread Jay Potharaju
Adding some more context to my last email:
Solr: 6.6.3
2 nodes: 3 shards each
No replication.
Can someone answer the following questions?
1) Any ideas on why the following errors keep happening? AFAIK the streaming
Solr clients error is because of timeouts when connecting to other nodes.
Async errors are also network related, as Emir explained earlier in the
thread. There were no network issues, but the error has come back and is
filling up my logs.
2) Is anyone using Solr 6.6.3 in production, and what has their experience
been so far?
3) Is there any good documentation or blog post that explains the inner
workings of SolrCloud networking?

Thanks
Jay
org.apache.solr.update.StreamingSolrClients  
>  
> org.apache.solr.update.processor.DistributedUpdateProcessor$DistributedUpdatesAsyncException:
>  Async exception during 


> On May 13, 2018, at 9:21 PM, Jay Potharaju <jspothar...@gmail.com> wrote:
> 
> Hi,
> I restarted both my Solr servers, but I am seeing the async error again. In
> older 5.x versions of SolrCloud, Solr would normally recover gracefully in
> case of network errors, but Solr 6.6.3 does not seem to be doing that. At
> this time I am doing only a small percentage of deleteByQuery operations;
> it's mostly indexing of documents.
> I have not noticed any network blip like last time. Any suggestions, or is
> anyone else also having the same issue on Solr 6.6.3?
> 
>   I am again seeing the following two errors back to back. 
> 
>  ERROR org.apache.solr.update.StreamingSolrClients  
>  
> org.apache.solr.update.processor.DistributedUpdateProcessor$DistributedUpdatesAsyncException:
>  Async exception during distributed update: Read timed out
> Thanks
> Jay 
>  
> 
> 
>> On Wed, May 9, 2018 at 12:34 AM Emir Arnautović 
>> <emir.arnauto...@sematext.com> wrote:
>> Hi Jay,
>> Network blip might be the cause, but also the consequence of this issue. 
>> Maybe you can try avoiding DBQ while indexing and see if it is the cause. 
>> You can do thread dump on “the other” node and see if there are blocked 
>> threads and that can give you more clues what’s going on.
>> 
>> Thanks,
>> Emir
>> --
>> Monitoring - Log Management - Alerting - Anomaly Detection
>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>> 
>> 
>> 
>> > On 8 May 2018, at 17:53, Jay Potharaju <jspothar...@gmail.com> wrote:
>> > 
>> > Hi Emir,
>> > I was seeing this error as long as the indexing was running. Once I stopped
>> > the indexing the errors also stopped.  Yes, we do monitor both hosts & solr
>> > but have not seen anything out of the ordinary except for a small network
>> > blip. In my experience solr generally recovers after a network blip and
>> > there are a few errors for streaming solr client...but have never seen this
>> > error before.
>> > 
>> > Thanks
>> > Jay
>> > 
>> > Thanks
>> > Jay Potharaju
>> > 
>> > 
>> > On Tue, May 8, 2018 at 12:56 AM, Emir Arnautović <
>> > emir.arnauto...@sematext.com> wrote:
>> > 
>> >> Hi Jay,
>> >> This is low ingestion rate. What is the size of your index? What is heap
>> >> size? I am guessing that this is not a huge index, so  I am leaning toward
>> >> what Shawn mentioned - some combination of DBQ/merge/commit/optimise that
>> >> is blocking indexing. Though, it is strange that it is happening only on
>> >> one node if you are sending updates randomly to both nodes. Do you monitor
>> >> your hosts/Solr? Do you see anything different at the time when timeouts
>> >> happen?
>> >> 
>> >> Thanks,
>> >> Emir
>> >> --
>> >> Monitoring - Log Management - Alerting - Anomaly Detection
>> >> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>> >> 
>> >> 
>> >> 
>> >>> On 8 May 2018, at 03:23, Jay Potharaju <jspothar...@gmail.com> wrote:
>> >>> 
>> >>> I have about 3-5 updates per second.
>> >>> 
>> >>> 
>> >>>> On May 7, 2018, at 5:02 PM, Shawn Heisey <apa...@elyograg.org> wrote:
>> >>>> 
>> >>>>> On 5/7/2018 5:05 PM, Jay Potharaju wrote:
>> >>>>> There are some deletes by query. I have not had any issues with DBQ,
>> >>>>> currently have 5.3 running in production.
>> >>>> 
>> >>>> Here's the big problem with DBQ.  Imagine 

Re: Async exceptions during distributed update

2018-05-13 Thread Jay Potharaju
Hi,
I restarted both my Solr servers, but I am seeing the async error again. In
older 5.x versions of SolrCloud, Solr would normally recover gracefully in
case of network errors, but Solr 6.6.3 does not seem to be doing that. At
this time I am doing only a small percentage of deleteByQuery operations;
it's mostly indexing of documents.
I have not noticed any network blip like last time. Any suggestions, or is
anyone else also having the same issue on Solr 6.6.3?

  I am again seeing the following two errors back to back.

 ERROR org.apache.solr.update.StreamingSolrClients

org.apache.solr.update.processor.DistributedUpdateProcessor$DistributedUpdatesAsyncException:
Async exception during distributed update: Read timed out
Thanks
Jay



On Wed, May 9, 2018 at 12:34 AM Emir Arnautović <
emir.arnauto...@sematext.com> wrote:

> Hi Jay,
> Network blip might be the cause, but also the consequence of this issue.
> Maybe you can try avoiding DBQ while indexing and see if it is the cause.
> You can do thread dump on “the other” node and see if there are blocked
> threads and that can give you more clues what’s going on.
>
> Thanks,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
> > On 8 May 2018, at 17:53, Jay Potharaju <jspothar...@gmail.com> wrote:
> >
> > Hi Emir,
> > I was seeing this error as long as the indexing was running. Once I
> stopped
> > the indexing the errors also stopped.  Yes, we do monitor both hosts &
> solr
> > but have not seen anything out of the ordinary except for a small network
> > blip. In my experience solr generally recovers after a network blip and
> > there are a few errors for streaming solr client...but have never seen
> this
> > error before.
> >
> > Thanks
> > Jay
> >
> > Thanks
> > Jay Potharaju
> >
> >
> > On Tue, May 8, 2018 at 12:56 AM, Emir Arnautović <
> > emir.arnauto...@sematext.com> wrote:
> >
> >> Hi Jay,
> >> This is low ingestion rate. What is the size of your index? What is heap
> >> size? I am guessing that this is not a huge index, so  I am leaning
> toward
> >> what Shawn mentioned - some combination of DBQ/merge/commit/optimise
> that
> >> is blocking indexing. Though, it is strange that it is happening only on
> >> one node if you are sending updates randomly to both nodes. Do you
> monitor
> >> your hosts/Solr? Do you see anything different at the time when timeouts
> >> happen?
> >>
> >> Thanks,
> >> Emir
> >> --
> >> Monitoring - Log Management - Alerting - Anomaly Detection
> >> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
> >>
> >>
> >>
> >>> On 8 May 2018, at 03:23, Jay Potharaju <jspothar...@gmail.com> wrote:
> >>>
> >>> I have about 3-5 updates per second.
> >>>
> >>>
> >>>> On May 7, 2018, at 5:02 PM, Shawn Heisey <apa...@elyograg.org> wrote:
> >>>>
> >>>>> On 5/7/2018 5:05 PM, Jay Potharaju wrote:
> >>>>> There are some deletes by query. I have not had any issues with DBQ,
> >>>>> currently have 5.3 running in production.
> >>>>
> >>>> Here's the big problem with DBQ.  Imagine this sequence of events with
> >>>> these timestamps:
> >>>>
> >>>> 13:00:00: A commit for change visibility happens.
> >>>> 13:00:00: A segment merge is triggered by the commit.
> >>>> (It's a big merge that takes exactly 3 minutes.)
> >>>> 13:00:05: A deleteByQuery is sent.
> >>>> 13:00:15: An update to the index is sent.
> >>>> 13:00:25: An update to the index is sent.
> >>>> 13:00:35: An update to the index is sent.
> >>>> 13:00:45: An update to the index is sent.
> >>>> 13:00:55: An update to the index is sent.
> >>>> 13:01:05: An update to the index is sent.
> >>>> 13:01:15: An update to the index is sent.
> >>>> 13:01:25: An update to the index is sent.
> >>>> {time passes, more updates might be sent}
> >>>> 13:03:00: The merge finishes.
> >>>>
> >>>> Here's what would happen in this scenario:  The DBQ and all of the
> >>>> update requests sent *after* the DBQ will block until the merge
> >>>> finishes.  That means that it's going to take up to three minutes for
> >>>> Solr to respond t

Re: Async exceptions during distributed update

2018-05-08 Thread Jay Potharaju
Hi Emir,
I was seeing this error as long as the indexing was running. Once I stopped
the indexing the errors also stopped.  Yes, we do monitor both hosts & solr
but have not seen anything out of the ordinary except for a small network
blip. In my experience solr generally recovers after a network blip and
there are a few errors for streaming solr client...but have never seen this
error before.

Thanks
Jay

Thanks
Jay Potharaju


On Tue, May 8, 2018 at 12:56 AM, Emir Arnautović <
emir.arnauto...@sematext.com> wrote:

> Hi Jay,
> This is low ingestion rate. What is the size of your index? What is heap
> size? I am guessing that this is not a huge index, so  I am leaning toward
> what Shawn mentioned - some combination of DBQ/merge/commit/optimise that
> is blocking indexing. Though, it is strange that it is happening only on
> one node if you are sending updates randomly to both nodes. Do you monitor
> your hosts/Solr? Do you see anything different at the time when timeouts
> happen?
>
> Thanks,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
> > On 8 May 2018, at 03:23, Jay Potharaju <jspothar...@gmail.com> wrote:
> >
> > I have about 3-5 updates per second.
> >
> >
> >> On May 7, 2018, at 5:02 PM, Shawn Heisey <apa...@elyograg.org> wrote:
> >>
> >>> On 5/7/2018 5:05 PM, Jay Potharaju wrote:
> >>> There are some deletes by query. I have not had any issues with DBQ,
> >>> currently have 5.3 running in production.
> >>
> >> Here's the big problem with DBQ.  Imagine this sequence of events with
> >> these timestamps:
> >>
> >> 13:00:00: A commit for change visibility happens.
> >> 13:00:00: A segment merge is triggered by the commit.
> >> (It's a big merge that takes exactly 3 minutes.)
> >> 13:00:05: A deleteByQuery is sent.
> >> 13:00:15: An update to the index is sent.
> >> 13:00:25: An update to the index is sent.
> >> 13:00:35: An update to the index is sent.
> >> 13:00:45: An update to the index is sent.
> >> 13:00:55: An update to the index is sent.
> >> 13:01:05: An update to the index is sent.
> >> 13:01:15: An update to the index is sent.
> >> 13:01:25: An update to the index is sent.
> >> {time passes, more updates might be sent}
> >> 13:03:00: The merge finishes.
> >>
> >> Here's what would happen in this scenario:  The DBQ and all of the
> >> update requests sent *after* the DBQ will block until the merge
> >> finishes.  That means that it's going to take up to three minutes for
> >> Solr to respond to those requests.  If the client that is sending the
> >> request is configured with a 60 second socket timeout, which inter-node
> >> requests made by Solr are by default, then it is going to experience a
> >> timeout error.  The request will probably complete successfully once the
> >> merge finishes, but the connection is gone, and the client has already
> >> received an error.
> >>
> >> Now imagine what happens if an optimize (forced merge of the entire
> >> index) is requested on an index that's 50GB.  That optimize may take 2-3
> >> hours, possibly longer.  A deleteByQuery started on that index after the
> >> optimize begins (and any updates requested after the DBQ) will pause
> >> until the optimize is done.  A pause of 2 hours or more is a BIG
> problem.
> >>
> >> This is why deleteByQuery is not recommended.
> >>
> >> If the deleteByQuery were changed into a two-step process involving a
> >> query to retrieve ID values and then one or more deleteById requests,
> >> then none of that blocking would occur.  The deleteById operation can
> >> run at the same time as a segment merge, so neither it nor subsequent
> >> update requests will have the significant pause.  From what I
> >> understand, you can even do commits in this scenario and have changes be
> >> visible before the merge completes.  I haven't verified that this is the
> >> case.
> >>
> >> Experienced devs: Can we fix this problem with DBQ?  On indexes with a
> >> uniqueKey, can DBQ be changed to use the two-step process I mentioned?
> >>
> >> Thanks,
> >> Shawn
> >>
>
>
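As an aside, the 60-second inter-node socket timeout Shawn mentions is configurable. A hedged sketch (parameter names per the Solr reference guide; the millisecond values are arbitrary examples): search-side fan-out timeouts go on the handler's shardHandlerFactory in solrconfig.xml, while the timeouts governing distributed updates live in the <solrcloud> section of solr.xml:

```xml
<!-- solrconfig.xml: search fan-out timeouts for one handler -->
<requestHandler name="/select" class="solr.SearchHandler">
  <shardHandlerFactory class="HttpShardHandlerFactory">
    <int name="socketTimeout">120000</int>
    <int name="connTimeout">5000</int>
  </shardHandlerFactory>
</requestHandler>

<!-- solr.xml: inter-node timeouts for distributed updates -->
<solrcloud>
  <int name="distribUpdateSoTimeout">120000</int>
  <int name="distribUpdateConnTimeout">5000</int>
</solrcloud>
```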


Re: Async exceptions during distributed update

2018-05-07 Thread Jay Potharaju
I have about 3-5 updates per second.


> On May 7, 2018, at 5:02 PM, Shawn Heisey <apa...@elyograg.org> wrote:
> 
>> On 5/7/2018 5:05 PM, Jay Potharaju wrote:
>> There are some deletes by query. I have not had any issues with DBQ,
>> currently have 5.3 running in production.
> 
> Here's the big problem with DBQ.  Imagine this sequence of events with
> these timestamps:
> 
> 13:00:00: A commit for change visibility happens.
> 13:00:00: A segment merge is triggered by the commit.
> (It's a big merge that takes exactly 3 minutes.)
> 13:00:05: A deleteByQuery is sent.
> 13:00:15: An update to the index is sent.
> 13:00:25: An update to the index is sent.
> 13:00:35: An update to the index is sent.
> 13:00:45: An update to the index is sent.
> 13:00:55: An update to the index is sent.
> 13:01:05: An update to the index is sent.
> 13:01:15: An update to the index is sent.
> 13:01:25: An update to the index is sent.
> {time passes, more updates might be sent}
> 13:03:00: The merge finishes.
> 
> Here's what would happen in this scenario:  The DBQ and all of the
> update requests sent *after* the DBQ will block until the merge
> finishes.  That means that it's going to take up to three minutes for
> Solr to respond to those requests.  If the client that is sending the
> request is configured with a 60 second socket timeout, which inter-node
> requests made by Solr are by default, then it is going to experience a
> timeout error.  The request will probably complete successfully once the
> merge finishes, but the connection is gone, and the client has already
> received an error.
> 
> Now imagine what happens if an optimize (forced merge of the entire
> index) is requested on an index that's 50GB.  That optimize may take 2-3
> hours, possibly longer.  A deleteByQuery started on that index after the
> optimize begins (and any updates requested after the DBQ) will pause
> until the optimize is done.  A pause of 2 hours or more is a BIG problem.
> 
> This is why deleteByQuery is not recommended.
> 
> If the deleteByQuery were changed into a two-step process involving a
> query to retrieve ID values and then one or more deleteById requests,
> then none of that blocking would occur.  The deleteById operation can
> run at the same time as a segment merge, so neither it nor subsequent
> update requests will have the significant pause.  From what I
> understand, you can even do commits in this scenario and have changes be
> visible before the merge completes.  I haven't verified that this is the
> case.
> 
> Experienced devs: Can we fix this problem with DBQ?  On indexes with a
> uniqueKey, can DBQ be changed to use the two-step process I mentioned?
> 
> Thanks,
> Shawn
> 
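Shawn's two-step alternative can be done client-side today: run the would-be delete query with fl=id (using cursorMark paging for large result sets), then send the collected ids as delete-by-id batches. A minimal sketch of the batching step, using the list form of "delete" in the standard JSON update format; the endpoint and batch size are illustrative:

```python
import json

def ids_to_delete_batches(ids, batch_size=500):
    """Turn uniqueKey values (fetched via q=<your DBQ>&fl=id) into
    delete-by-id JSON bodies for POSTing to /update."""
    batches = []
    for start in range(0, len(ids), batch_size):
        chunk = ids[start:start + batch_size]
        batches.append(json.dumps({"delete": chunk}))
    return batches

# POST each body to http://host:8983/solr/<collection>/update with
# Content-Type: application/json, then commit once at the end.
```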


Re: Async exceptions during distributed update

2018-05-07 Thread Jay Potharaju
Thanks for explaining that Shawn!
Emir, I use a PHP library called Solarium to do updates/deletes to Solr. The
request is sent to any of the available nodes in the cluster.

> On May 7, 2018, at 5:02 PM, Shawn Heisey <apa...@elyograg.org> wrote:
> 
>> On 5/7/2018 5:05 PM, Jay Potharaju wrote:
>> There are some deletes by query. I have not had any issues with DBQ,
>> currently have 5.3 running in production.
> 
> Here's the big problem with DBQ.  Imagine this sequence of events with
> these timestamps:
> 
> 13:00:00: A commit for change visibility happens.
> 13:00:00: A segment merge is triggered by the commit.
> (It's a big merge that takes exactly 3 minutes.)
> 13:00:05: A deleteByQuery is sent.
> 13:00:15: An update to the index is sent.
> 13:00:25: An update to the index is sent.
> 13:00:35: An update to the index is sent.
> 13:00:45: An update to the index is sent.
> 13:00:55: An update to the index is sent.
> 13:01:05: An update to the index is sent.
> 13:01:15: An update to the index is sent.
> 13:01:25: An update to the index is sent.
> {time passes, more updates might be sent}
> 13:03:00: The merge finishes.
> 
> Here's what would happen in this scenario:  The DBQ and all of the
> update requests sent *after* the DBQ will block until the merge
> finishes.  That means that it's going to take up to three minutes for
> Solr to respond to those requests.  If the client that is sending the
> request is configured with a 60 second socket timeout, which inter-node
> requests made by Solr are by default, then it is going to experience a
> timeout error.  The request will probably complete successfully once the
> merge finishes, but the connection is gone, and the client has already
> received an error.
> 
> Now imagine what happens if an optimize (forced merge of the entire
> index) is requested on an index that's 50GB.  That optimize may take 2-3
> hours, possibly longer.  A deleteByQuery started on that index after the
> optimize begins (and any updates requested after the DBQ) will pause
> until the optimize is done.  A pause of 2 hours or more is a BIG problem.
> 
> This is why deleteByQuery is not recommended.
> 
> If the deleteByQuery were changed into a two-step process involving a
> query to retrieve ID values and then one or more deleteById requests,
> then none of that blocking would occur.  The deleteById operation can
> run at the same time as a segment merge, so neither it nor subsequent
> update requests will have the significant pause.  From what I
> understand, you can even do commits in this scenario and have changes be
> visible before the merge completes.  I haven't verified that this is the
> case.
> 
> Experienced devs: Can we fix this problem with DBQ?  On indexes with a
> uniqueKey, can DBQ be changed to use the two-step process I mentioned?
> 
> Thanks,
> Shawn
> 


Re: Async exceptions during distributed update

2018-05-07 Thread Jay Potharaju
There are some deletes by query. I have not had any issues with DBQ,
currently have 5.3 running in production.

Thanks
Jay Potharaju


On Mon, May 7, 2018 at 4:02 PM, Jay Potharaju <jspothar...@gmail.com> wrote:

> The updates are pushed in real time not batched. No complex analysis and
> everything is committed using autocommit settings in solr.
>
> Thanks
> Jay Potharaju
>
>
> On Mon, May 7, 2018 at 4:00 PM, Emir Arnautović <
> emir.arnauto...@sematext.com> wrote:
>
>> How do you send documents? Large batches? Complex analysis? Do you send
>> all
>> batches to the same node? How do you commit? Do you delete by query while
>> indexing?
>>
>> Emir
>>
>> On Tue, May 8, 2018, 12:30 AM Jay Potharaju <jspothar...@gmail.com>
>> wrote:
>>
>> > I didn't see any OOM errors in the logs on either of the nodes. I saw GC
>> > pause of 1 second on the box that was throwing error ...but nothing on
>> the
>> > other node. Any other recommendations?
>> > Thanks
>> >
>> >
>> > Thanks
>> > Jay Potharaju
>> >
>> >
>> > On Mon, May 7, 2018 at 9:48 AM, Jay Potharaju <jspothar...@gmail.com>
>> > wrote:
>> >
>> > > Ah thanks for explaining that!
>> > >
>> > > Thanks
>> > > Jay Potharaju
>> > >
>> > >
>> > > On Mon, May 7, 2018 at 9:45 AM, Emir Arnautović <
>> > > emir.arnauto...@sematext.com> wrote:
>> > >
>> > >> Node A receives batch of documents to index. It forwards documents to
>> > >> shards that are on the node B. Node B is having issues with GC so it
>> > takes
>> > >> a while to respond. Node A sees it as read timeout and reports it in
>> > logs.
>> > >> So the issue is on node B not node A.
>> > >>
>> > >> Emir
>> > >> --
>> > >> Monitoring - Log Management - Alerting - Anomaly Detection
>> > >> Solr & Elasticsearch Consulting Support Training -
>> http://sematext.com/
>> > >>
>> > >>
>> > >>
>> > >> > On 7 May 2018, at 18:39, Jay Potharaju <jspothar...@gmail.com>
>> wrote:
>> > >> >
>> > >> > Yes, the nodes are well balanced. I am just using these boxes for
>> > >> indexing
>> > >> > the data and is not serving any traffic at this time.  The error
>> > >> indicates
>> > >> > it is having issues errors on the shards that are hosted on the box
>> > and
>> > >> not
>> > >> > on the other box.
>> > >> > I will check GC logs to see if there were any issues.
>> > >> > thanks
>> > >> >
>> > >> > Thanks
>> > >> > Jay Potharaju
>> > >> >
>> > >> >
>> > >> > On Mon, May 7, 2018 at 9:34 AM, Emir Arnautović <
>> > >> > emir.arnauto...@sematext.com> wrote:
>> > >> >
>> > >> >> Hi Jay,
>> > >> >> My first guess would be that there was some major GC on other box
>> so
>> > it
>> > >> >> did not respond on time. Are your nodes well balanced - do they
>> serve
>> > >> equal
>> > >> >> amount of data?
>> > >> >>
>> > >> >> Thanks,
>> > >> >> Emir
>> > >> >> --
>> > >> >> Monitoring - Log Management - Alerting - Anomaly Detection
>> > >> >> Solr & Elasticsearch Consulting Support Training -
>> > >> http://sematext.com/
>> > >> >>
>> > >> >>
>> > >> >>
>> > >> >>> On 7 May 2018, at 18:11, Jay Potharaju <jspothar...@gmail.com>
>> > wrote:
>> > >> >>>
>> > >> >>> Hi,
>> > >> >>> I am seeing the following lines in the error log. My setup has 2
>> > >> nodes in
>> > >> >>> the solrcloud cluster, each node has 3 shards with no
>> replication.
>> > >> From
>> > >> >> the
>> > >> >>> error log it seems like all the shards on this box are throwing
>> > async
>> > >> >>> exception errors. Other node in the cluster does not have any
>> erro

Re: Async exceptions during distributed update

2018-05-07 Thread Jay Potharaju
The updates are pushed in real time, not batched. There is no complex analysis,
and everything is committed using autocommit settings in Solr.

Thanks
Jay Potharaju


On Mon, May 7, 2018 at 4:00 PM, Emir Arnautović <
emir.arnauto...@sematext.com> wrote:

> How do you send documents? Large batches? Complex analysis? Do you send all
> batches to the same node? How do you commit? Do you delete by query while
> indexing?
>
> Emir
>
> On Tue, May 8, 2018, 12:30 AM Jay Potharaju <jspothar...@gmail.com> wrote:
>
> > I didn't see any OOM errors in the logs on either of the nodes. I saw GC
> > pause of 1 second on the box that was throwing error ...but nothing on
> the
> > other node. Any other recommendations?
> > Thanks
> >
> >
> > Thanks
> > Jay Potharaju
> >
> >
> > On Mon, May 7, 2018 at 9:48 AM, Jay Potharaju <jspothar...@gmail.com>
> > wrote:
> >
> > > Ah thanks for explaining that!
> > >
> > > Thanks
> > > Jay Potharaju
> > >
> > >
> > > On Mon, May 7, 2018 at 9:45 AM, Emir Arnautović <
> > > emir.arnauto...@sematext.com> wrote:
> > >
> > >> Node A receives batch of documents to index. It forwards documents to
> > >> shards that are on the node B. Node B is having issues with GC so it
> > takes
> > >> a while to respond. Node A sees it as read timeout and reports it in
> > logs.
> > >> So the issue is on node B not node A.
> > >>
> > >> Emir
> > >> --
> > >> Monitoring - Log Management - Alerting - Anomaly Detection
> > >> Solr & Elasticsearch Consulting Support Training -
> http://sematext.com/
> > >>
> > >>
> > >>
> > >> > On 7 May 2018, at 18:39, Jay Potharaju <jspothar...@gmail.com>
> wrote:
> > >> >
> > >> > Yes, the nodes are well balanced. I am just using these boxes for
> > >> indexing
> > >> > the data and is not serving any traffic at this time.  The error
> > >> indicates
> > >> > it is having issues errors on the shards that are hosted on the box
> > and
> > >> not
> > >> > on the other box.
> > >> > I will check GC logs to see if there were any issues.
> > >> > thanks
> > >> >
> > >> > Thanks
> > >> > Jay Potharaju
> > >> >
> > >> >
> > >> > On Mon, May 7, 2018 at 9:34 AM, Emir Arnautović <
> > >> > emir.arnauto...@sematext.com> wrote:
> > >> >
> > >> >> Hi Jay,
> > >> >> My first guess would be that there was some major GC on other box
> so
> > it
> > >> >> did not respond on time. Are your nodes well balanced - do they
> serve
> > >> equal
> > >> >> amount of data?
> > >> >>
> > >> >> Thanks,
> > >> >> Emir
> > >> >> --
> > >> >> Monitoring - Log Management - Alerting - Anomaly Detection
> > >> >> Solr & Elasticsearch Consulting Support Training -
> > >> http://sematext.com/
> > >> >>
> > >> >>
> > >> >>
> > >> >>> On 7 May 2018, at 18:11, Jay Potharaju <jspothar...@gmail.com>
> > wrote:
> > >> >>>
> > >> >>> Hi,
> > >> >>> I am seeing the following lines in the error log. My setup has 2
> > >> nodes in
> > >> >>> the solrcloud cluster, each node has 3 shards with no replication.
> > >> From
> > >> >> the
> > >> >>> error log it seems like all the shards on this box are throwing
> > async
> > >> >>> exception errors. Other node in the cluster does not have any
> errors
> > >> in
> > >> >> the
> > >> >>> logs. Any suggestions on how to tackle this error?
> > >> >>>
> > >> >>> Solr setup
> > >> >>> Solr:6.6.3
> > >> >>> 2Nodes: 3 shards each
> > >> >>>
> > >> >>>
> > >> >>> ERROR org.apache.solr.servlet.HttpSolrCall
> [test_shard3_replica1] ?
> > >> >>> null:org.apache.solr.update.processor.DistributedUpdateProcessor$
> > >> >> DistributedUpdatesAsyncException:
> > >> >>> Async exception duri

Re: Async exceptions during distributed update

2018-05-07 Thread Jay Potharaju
I didn't see any OOM errors in the logs on either of the nodes. I saw a GC
pause of 1 second on the box that was throwing errors, but nothing on the
other node. Any other recommendations?
Thanks


Thanks
Jay Potharaju


On Mon, May 7, 2018 at 9:48 AM, Jay Potharaju <jspothar...@gmail.com> wrote:

> Ah thanks for explaining that!
>
> Thanks
> Jay Potharaju
>
>
> On Mon, May 7, 2018 at 9:45 AM, Emir Arnautović <
> emir.arnauto...@sematext.com> wrote:
>
>> Node A receives batch of documents to index. It forwards documents to
>> shards that are on the node B. Node B is having issues with GC so it takes
>> a while to respond. Node A sees it as read timeout and reports it in logs.
>> So the issue is on node B not node A.
>>
>> Emir
>> --
>> Monitoring - Log Management - Alerting - Anomaly Detection
>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>>
>>
>>
>> > On 7 May 2018, at 18:39, Jay Potharaju <jspothar...@gmail.com> wrote:
>> >
>> > Yes, the nodes are well balanced. I am just using these boxes for
>> indexing
>> > the data and is not serving any traffic at this time.  The error
>> indicates
>> > it is having issues errors on the shards that are hosted on the box and
>> not
>> > on the other box.
>> > I will check GC logs to see if there were any issues.
>> > thanks
>> >
>> > Thanks
>> > Jay Potharaju
>> >
>> >
>> > On Mon, May 7, 2018 at 9:34 AM, Emir Arnautović <
>> > emir.arnauto...@sematext.com> wrote:
>> >
>> >> Hi Jay,
>> >> My first guess would be that there was some major GC on other box so it
>> >> did not respond on time. Are your nodes well balanced - do they serve
>> equal
>> >> amount of data?
>> >>
>> >> Thanks,
>> >> Emir
>> >> --
>> >> Monitoring - Log Management - Alerting - Anomaly Detection
>> >> Solr & Elasticsearch Consulting Support Training -
>> http://sematext.com/
>> >>
>> >>
>> >>
>> >>> On 7 May 2018, at 18:11, Jay Potharaju <jspothar...@gmail.com> wrote:
>> >>>
>> >>> Hi,
>> >>> I am seeing the following lines in the error log. My setup has 2
>> nodes in
>> >>> the solrcloud cluster, each node has 3 shards with no replication.
>> From
>> >> the
>> >>> error log it seems like all the shards on this box are throwing async
>> >>> exception errors. Other node in the cluster does not have any errors
>> in
>> >> the
>> >>> logs. Any suggestions on how to tackle this error?
>> >>>
>> >>> Solr setup
>> >>> Solr:6.6.3
>> >>> 2Nodes: 3 shards each
>> >>>
>> >>>
>> >>> ERROR org.apache.solr.servlet.HttpSolrCall  [test_shard3_replica1] ?
>> >>> null:org.apache.solr.update.processor.DistributedUpdateProcessor$
>> >> DistributedUpdatesAsyncException:
>> >>> Async exception during distributed update: Read timed out
>> >>> at
>> >>> org.apache.solr.update.processor.DistributedUpdateProcessor.doFinish(
>> >> DistributedUpdateProcessor.java:972)
>> >>> at
>> >>> org.apache.solr.update.processor.DistributedUpdateProcessor.finish(
>> >> DistributedUpdateProcessor.java:1911)
>> >>> at
>> >>> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(
>> >> ContentStreamHandlerBase.java:78)
>> >>> at
>> >>> org.apache.solr.handler.RequestHandlerBase.handleRequest(
>> >> RequestHandlerBase.java:173)
>> >>> at org.apache.solr.core.SolrCore.execute(SolrCore.java:2477)
>> >>> at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.
>> java:723)
>> >>> at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:529)
>> >>> at
>> >>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(
>> >> SolrDispatchFilter.java:361)
>> >>> at
>> >>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(
>> >> SolrDispatchFilter.java:305)
>> >>> at
>> >>> org.eclipse.jetty.servlet.ServletHandler$CachedChain.
>> >> doFilter(ServletHandler.java:1691)
>> >>> at
>> >>> org.eclipse.jetty.servlet.ServletHandler.doHandle(
>> >> ServletHandler.ja

Re: Async exceptions during distributed update

2018-05-07 Thread Jay Potharaju
Ah thanks for explaining that!

Thanks
Jay Potharaju


On Mon, May 7, 2018 at 9:45 AM, Emir Arnautović <
emir.arnauto...@sematext.com> wrote:

> Node A receives batch of documents to index. It forwards documents to
> shards that are on the node B. Node B is having issues with GC so it takes
> a while to respond. Node A sees it as read timeout and reports it in logs.
> So the issue is on node B not node A.
>
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
> > On 7 May 2018, at 18:39, Jay Potharaju <jspothar...@gmail.com> wrote:
> >
> > Yes, the nodes are well balanced. I am just using these boxes for
> indexing
> > the data and is not serving any traffic at this time.  The error
> indicates
> > it is having issues errors on the shards that are hosted on the box and
> not
> > on the other box.
> > I will check GC logs to see if there were any issues.
> > thanks
> >
> > Thanks
> > Jay Potharaju
> >
> >
> > On Mon, May 7, 2018 at 9:34 AM, Emir Arnautović <
> > emir.arnauto...@sematext.com> wrote:
> >
> >> Hi Jay,
> >> My first guess would be that there was some major GC on other box so it
> >> did not respond on time. Are your nodes well balanced - do they serve
> equal
> >> amount of data?
> >>
> >> Thanks,
> >> Emir
> >> --
> >> Monitoring - Log Management - Alerting - Anomaly Detection
> >> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
> >>
> >>
> >>
> >>> On 7 May 2018, at 18:11, Jay Potharaju <jspothar...@gmail.com> wrote:
> >>>
> >>> Hi,
> >>> I am seeing the following lines in the error log. My setup has 2 nodes
> in
> >>> the solrcloud cluster, each node has 3 shards with no replication. From
> >> the
> >>> error log it seems like all the shards on this box are throwing async
> >>> exception errors. Other node in the cluster does not have any errors in
> >> the
> >>> logs. Any suggestions on how to tackle this error?
> >>>
> >>> Solr setup
> >>> Solr:6.6.3
> >>> 2Nodes: 3 shards each
> >>>
> >>>
> >>> ERROR org.apache.solr.servlet.HttpSolrCall  [test_shard3_replica1] ?
> >>> null:org.apache.solr.update.processor.DistributedUpdateProcessor$
> >> DistributedUpdatesAsyncException:
> >>> Async exception during distributed update: Read timed out
> >>> at
> >>> org.apache.solr.update.processor.DistributedUpdateProcessor.doFinish(
> >> DistributedUpdateProcessor.java:972)
> >>> at
> >>> org.apache.solr.update.processor.DistributedUpdateProcessor.finish(
> >> DistributedUpdateProcessor.java:1911)
> >>> at
> >>> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(
> >> ContentStreamHandlerBase.java:78)
> >>> at
> >>> org.apache.solr.handler.RequestHandlerBase.handleRequest(
> >> RequestHandlerBase.java:173)
> >>> at org.apache.solr.core.SolrCore.execute(SolrCore.java:2477)
> >>> at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:723)
> >>> at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:529)
> >>> at
> >>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(
> >> SolrDispatchFilter.java:361)
> >>> at
> >>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(
> >> SolrDispatchFilter.java:305)
> >>> at
> >>> org.eclipse.jetty.servlet.ServletHandler$CachedChain.
> >> doFilter(ServletHandler.java:1691)
> >>> at
> >>> org.eclipse.jetty.servlet.ServletHandler.doHandle(
> >> ServletHandler.java:582)
> >>> at
> >>> org.eclipse.jetty.server.handler.ScopedHandler.handle(
> >> ScopedHandler.java:143)
> >>> at
> >>> org.eclipse.jetty.security.SecurityHandler.handle(
> >> SecurityHandler.java:548)
> >>> at
> >>> org.eclipse.jetty.server.session.SessionHandler.
> >> doHandle(SessionHandler.java:226)
> >>> at
> >>> org.eclipse.jetty.server.handler.ContextHandler.
> >> doHandle(ContextHandler.java:1180)
> >>> at org.eclipse.jetty.servlet.ServletHandler.doScope(
> >> ServletHandler.java:512)
> >>> at
> >>> org.eclipse.jetty.server.session.SessionHandler.
>

Re: Async exceptions during distributed update

2018-05-07 Thread Jay Potharaju
Yes, the nodes are well balanced. I am just using these boxes for indexing
the data; they are not serving any traffic at this time. The error indicates
issues on the shards that are hosted on this box and not on the other box.
I will check the GC logs to see if there were any issues.
thanks

Thanks
Jay Potharaju


On Mon, May 7, 2018 at 9:34 AM, Emir Arnautović <
emir.arnauto...@sematext.com> wrote:

> Hi Jay,
> My first guess would be that there was some major GC on other box so it
> did not respond on time. Are your nodes well balanced - do they serve equal
> amount of data?
>
> Thanks,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
> > On 7 May 2018, at 18:11, Jay Potharaju <jspothar...@gmail.com> wrote:
> >
> > Hi,
> > I am seeing the following lines in the error log. My setup has 2 nodes in
> > the solrcloud cluster, each node has 3 shards with no replication. From
> the
> > error log it seems like all the shards on this box are throwing async
> > exception errors. Other node in the cluster does not have any errors in
> the
> > logs. Any suggestions on how to tackle this error?
> >
> > Solr setup
> > Solr:6.6.3
> > 2Nodes: 3 shards each
> >
> >
> > ERROR org.apache.solr.servlet.HttpSolrCall  [test_shard3_replica1] ?
> > null:org.apache.solr.update.processor.DistributedUpdateProcessor$
> DistributedUpdatesAsyncException:
> > Async exception during distributed update: Read timed out
> > at
> > org.apache.solr.update.processor.DistributedUpdateProcessor.doFinish(
> DistributedUpdateProcessor.java:972)
> > at
> > org.apache.solr.update.processor.DistributedUpdateProcessor.finish(
> DistributedUpdateProcessor.java:1911)
> > at
> > org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(
> ContentStreamHandlerBase.java:78)
> > at
> > org.apache.solr.handler.RequestHandlerBase.handleRequest(
> RequestHandlerBase.java:173)
> > at org.apache.solr.core.SolrCore.execute(SolrCore.java:2477)
> > at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:723)
> > at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:529)
> > at
> > org.apache.solr.servlet.SolrDispatchFilter.doFilter(
> SolrDispatchFilter.java:361)
> > at
> > org.apache.solr.servlet.SolrDispatchFilter.doFilter(
> SolrDispatchFilter.java:305)
> > at
> > org.eclipse.jetty.servlet.ServletHandler$CachedChain.
> doFilter(ServletHandler.java:1691)
> > at
> > org.eclipse.jetty.servlet.ServletHandler.doHandle(
> ServletHandler.java:582)
> > at
> > org.eclipse.jetty.server.handler.ScopedHandler.handle(
> ScopedHandler.java:143)
> > at
> > org.eclipse.jetty.security.SecurityHandler.handle(
> SecurityHandler.java:548)
> > at
> > org.eclipse.jetty.server.session.SessionHandler.
> doHandle(SessionHandler.java:226)
> > at
> > org.eclipse.jetty.server.handler.ContextHandler.
> doHandle(ContextHandler.java:1180)
> > at org.eclipse.jetty.servlet.ServletHandler.doScope(
> ServletHandler.java:512)
> > at
> > org.eclipse.jetty.server.session.SessionHandler.
> doScope(SessionHandler.java:185)
> > at
> > org.eclipse.jetty.server.handler.ContextHandler.
> doScope(ContextHandler.java:1112)
> > at
> > org.eclipse.jetty.server.handler.ScopedHandler.handle(
> ScopedHandler.java:141)
> > at
> > org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(
> ContextHandlerCollection.java:213)
> > at
> > org.eclipse.jetty.server.handler.HandlerCollection.
> handle(HandlerCollection.java:119)
> > at
> > org.eclipse.jetty.server.handler.HandlerWrapper.handle(
> HandlerWrapper.java:134)
> > at
> > org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(
> RewriteHandler.java:335)
> > at
> > org.eclipse.jetty.server.handler.HandlerWrapper.handle(
> HandlerWrapper.java:134)
> > at org.eclipse.jetty.server.Server.handle(Server.java:534)
> > at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
> > at
> > org.eclipse.jetty.server.HttpConnection.onFillable(
> HttpConnection.java:251)
> > at
> > org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(
> AbstractConnection.java:273)
> > at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
> > at
> > org.eclipse.jetty.io.SelectChannelEndPoint$2.run(
> SelectChannelEndPoint.java:93)
> > at
> > org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(
> QueuedThreadPool.java:671)
> > at
> > org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(
> QueuedThreadPool.java:589)
> > at java.lang.Thread.run(Unknown Source)
> >
> >
> > Thanks
> > Jay
>
>
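[Editor's note: Jay says above that he will check the GC logs for the pauses Emir describes. A minimal sketch of that check, assuming the JVM's standard safepoint summary lines ("Total time for which application threads were stopped: … seconds"); the exact log format depends on your GC flags, so treat the regex as an assumption:]

```python
import re

# Matches the JVM safepoint summary line that GC logging commonly emits, e.g.:
#   "Total time for which application threads were stopped: 1.0231230 seconds"
PAUSE_RE = re.compile(
    r"Total time for which application threads were stopped: "
    r"([0-9]+\.[0-9]+) seconds"
)

def long_pauses(log_lines, threshold_secs=0.5):
    """Return the pause durations (in seconds) that exceed the threshold."""
    pauses = []
    for line in log_lines:
        m = PAUSE_RE.search(line)
        if m:
            secs = float(m.group(1))
            if secs > threshold_secs:
                pauses.append(secs)
    return pauses

sample = [
    "2018-05-07T09:00:01: Total time for which application threads "
    "were stopped: 0.0031230 seconds",
    "2018-05-07T09:00:05: Total time for which application threads "
    "were stopped: 1.0231230 seconds",
]
print(long_pauses(sample))  # -> [1.023123]
```

Run against the receiving node's GC log: any pause near or above the shard handler's read timeout would explain the "Async exception during distributed update: Read timed out" on the sending node.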


Async exceptions during distributed update

2018-05-07 Thread Jay Potharaju
Hi,
I am seeing the following lines in the error log. My setup has 2 nodes in
the solrcloud cluster, each node has 3 shards with no replication. From the
error log it seems like all the shards on this box are throwing async
exception errors. Other node in the cluster does not have any errors in the
logs. Any suggestions on how to tackle this error?

Solr setup
Solr:6.6.3
2Nodes: 3 shards each


ERROR org.apache.solr.servlet.HttpSolrCall  [test_shard3_replica1] ?
null:org.apache.solr.update.processor.DistributedUpdateProcessor$DistributedUpdatesAsyncException:
Async exception during distributed update: Read timed out
at
org.apache.solr.update.processor.DistributedUpdateProcessor.doFinish(DistributedUpdateProcessor.java:972)
at
org.apache.solr.update.processor.DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1911)
at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:78)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:173)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2477)
at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:723)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:529)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:361)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:305)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
at
org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
at org.eclipse.jetty.server.Server.handle(Server.java:534)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
at
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
at
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
at
org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
at java.lang.Thread.run(Unknown Source)


Thanks
Jay


Re: How to protect middle initials during search

2018-04-18 Thread Jay Potharaju
"A" is part of the stopwords list... that is why it got dropped. Protected
words will only stop it from being stemmed, not from being removed as a stopword.

https://lucene.apache.org/solr/guide/6_6/language-analysis.html

Thanks
Jay Potharaju


On Wed, Apr 18, 2018 at 11:35 AM, Wendy2 <wendy@rcsb.org> wrote:

> Hi fellow Users,
>
> Why did Solr return "Ellington, W.R." when I did a name search for
> "Ellington, A."?
> I even added "A." in the protwords.txt file. The debugQuery shows that the
> middle initial got dropped in the parsedquery.
> How can I make Solr NOT to drop the middle initial?  Thanks for your
> help!!
>
> ==Search results
> Ellington, A.D.
> Ellington, R.W..
>
> ===debugQuery=
> {
>   "responseHeader":{
> "status":0,
> "QTime":51,
> "params":{
>   "q":"\"Ellington, A.\"",
>   "indent":"on",
>   "fl":"audit_author.name",
>   "wt":"json",
>   "debugQuery":"true"}},
>   "response":{"numFound":2,"start":0,"docs":[
>   {
> "audit_author.name":"Azzi, A., Clark, S.A., Ellington, R.W.,
> Chapman, M.S."},
>   {
> "audit_author.name":"Ye, X., Gorin, A., Ellington, A.D., Patel,
> D.J."}]
>   },
>   "debug":{
> "rawquerystring":"\"Ellington, A.\"",
> "querystring":"\"Ellington, A.\"",
>
> "parsedquery":"(+DisjunctionMaxQuery(((entity_name_com.name:
> ellington)^20.0)))/no_coord",
> "parsedquery_toString":"+((entity_name_com.name:ellington)^20.0)",
>"QParser":"ExtendedDismaxQParser",
>
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>
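[Editor's note: as a toy illustration of the point above — these few lines are an illustrative stand-in, not Solr's actual analysis chain. The stopword filter and the stemmer are separate stages, and the protected-words list is only consulted by the stemmer, so "A." is gone before protection ever matters:]

```python
STOPWORDS = {"a", "an", "and", "the"}   # toy stand-in for stopwords.txt

def crude_stem(token):
    # toy stemmer: chop a plural "s"
    return token[:-1] if token.endswith("s") else token

def analyze(text, protected=frozenset()):
    # toy tokenizer: lowercase and strip punctuation
    tokens = [t.lower().strip(".,") for t in text.split()]
    # the stopword filter runs regardless of the protected-words list ...
    tokens = [t for t in tokens if t not in STOPWORDS]
    # ... protwords.txt is only consulted by the stemming stage
    return [t if t in protected else crude_stem(t) for t in tokens]

# protecting "a" does not save it: it is removed before stemming ever runs
print(analyze("Ellington, A.", protected={"a"}))  # -> ['ellington']
```

The fix in Solr is therefore to adjust the stopwords list (or use a field type without stopword removal for names), not protwords.txt.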


Re: solr 6.6.3 intermittent group faceting errors(Lucene54DocValuesProducer)

2018-04-18 Thread Jay Potharaju
Thanks Erick & Shawn for chiming in! In my solrconfig the luceneMatchVersion
is set to 6.6.3. I do see that the index has Lucene54 files.

With respect to the group faceting error, it is similar to what is being
reported in SOLR-7867 <https://issues.apache.org/jira/browse/SOLR-7867>.

Thanks
Jay Potharaju


On Tue, Apr 17, 2018 at 8:17 PM, Shawn Heisey <apa...@elyograg.org> wrote:

> On 4/17/2018 8:44 PM, Erick Erickson wrote:
>
>> The other possibility is that you have LuceneMatchVersion set to
>> 5-something in solrconfig.xml.
>>
>
> It's my understanding that luceneMatchVersion does NOT affect index format
> in any way, that about the only things that pay attention to this value are
> a subset of analysis components. Do I have an incorrect understanding?
>
> Does a Solr user even have the ability to influence the index format used
> without writing custom code?
>
> Thanks,
> Shawn
>
>


Re: solr 6.6.3 intermittent group faceting errors(Lucene54DocValuesProducer)

2018-04-17 Thread Jay Potharaju
(SimpleFacets.java:405)
at 
org.apache.solr.request.SimpleFacets.lambda$getFacetFieldCounts$0(SimpleFacets.java:803)



Thanks
Jay Potharaju


On Tue, Apr 17, 2018 at 8:10 AM, Jay Potharaju <jspothar...@gmail.com>
wrote:

> Hi
> Has anyone seen issues with group faceting on multivalued fields in solr
> 6x? Can any of the committers comment?
> Thanks
> Jay
>
> On Apr 16, 2018, at 1:44 PM, Jay Potharaju <jspothar...@gmail.com> wrote:
>
> I deleted my collection and rebuilt it to check if there are any issues
> with indexing. I didn't see any errors during indexing. My collection is
> sharded and we use implicit routing...But after rebuilding my collection
> also I am getting errors on group faceting. This is not happening all the
> time but rather on small subset of data, which is fixed by reindexing.
>
> Any suggestions on what else to check for??
>
> Thanks
> Jay Potharaju
>
>
> On Mon, Apr 16, 2018 at 10:20 AM, Jay Potharaju <jspothar...@gmail.com>
> wrote:
>
>> Hi,
>> I am testing solr 6.6.3 and have been running into intermittent group
>> faceting errors.  I did some bulk indexing to  initially setup the
>> collection I have multiple facet fields it only throws error on one of
>> the fields. The issue goes away when I reindex the data.
>>
>> > required="false" multiValued="true" docValues="true"/>
>>  I am upgrading from solr 5.3, didn't see this issue with the existing
>> version we are using.  Any suggestions why this might be happening?
>>
>> Exception during facet.field: category_id
>>   at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMeth
>> od(HttpSolrClient.java:612)
>>   at org.apache.solr.client.solrj.impl.HttpSolrClient.request(Htt
>> pSolrClient.java:279)
>>   at org.apache.solr.client.solrj.impl.HttpSolrClient.request(Htt
>> pSolrClient.java:268)
>>   at org.apache.solr.client.solrj.SolrClient.request(SolrClient.j
>> ava:1219)
>>   at org.apache.solr.handler.component.HttpShardHandler.lambda$
>> submit$0(HttpShardHandler.java:163)
>>   at java.util.concurrent.FutureTask.run(Unknown Source)
>>   at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
>>   at java.util.concurrent.FutureTask.run(Unknown Source)
>>   at com.codahale.metrics.InstrumentedExecutorService$Instrumente
>> dRunnable.run(InstrumentedExecutorService.java:176)
>>   at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolE
>> xecutor.lambda$execute$0(ExecutorUtil.java:229)
>>   at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>> Thanks
>> Jay
>>
>>
>
>


Re: solr 6.6.3 intermittent group faceting errors

2018-04-17 Thread Jay Potharaju
Hi 
Has anyone seen issues with group faceting on multivalued fields in solr 6x? 
Can any of the committers comment?
Thanks
Jay

> On Apr 16, 2018, at 1:44 PM, Jay Potharaju <jspothar...@gmail.com> wrote:
> 
> I deleted my collection and rebuilt it to check if there are any issues with 
> indexing. I didn't see any errors during indexing. My collection is sharded 
> and we use implicit routing...But after rebuilding my collection also I am 
> getting errors on group faceting. This is not happening all the time but 
> rather on small subset of data, which is fixed by reindexing.
> 
> Any suggestions on what else to check for??
> 
> Thanks
> Jay Potharaju
>  
> 
>> On Mon, Apr 16, 2018 at 10:20 AM, Jay Potharaju <jspothar...@gmail.com> 
>> wrote:
>> Hi,
>> I am testing solr 6.6.3 and have been running into intermittent group 
>> faceting errors.  I did some bulk indexing to  initially setup the 
>> collection I have multiple facet fields it only throws error on one of the 
>> fields. The issue goes away when I reindex the data.
>> 
>> > required="false" multiValued="true" docValues="true"/>
>> 
>>  I am upgrading from solr 5.3, didn't see this issue with the existing 
>> version we are using.  Any suggestions why this might be happening?
>> 
>> Exception during facet.field: category_id
>>   at 
>> org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:612)
>>   at 
>> org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:279)
>>   at 
>> org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:268)
>>   at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1219)
>>   at 
>> org.apache.solr.handler.component.HttpShardHandler.lambda$submit$0(HttpShardHandler.java:163)
>>   at java.util.concurrent.FutureTask.run(Unknown Source)
>>   at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
>>   at java.util.concurrent.FutureTask.run(Unknown Source)
>>   at 
>> com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
>>   at 
>> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
>>   at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>> Thanks
>> Jay  
>>  
> 


Re: solr 6.6.3 intermittent group faceting errors

2018-04-16 Thread Jay Potharaju
I deleted my collection and rebuilt it to check if there are any issues
with indexing. I didn't see any errors during indexing. My collection is
sharded and we use implicit routing... but even after rebuilding my
collection I am getting errors on group faceting. This is not happening all
the time, but rather on a small subset of data, and it is fixed by reindexing.

Any suggestions on what else to check for??

Thanks
Jay Potharaju


On Mon, Apr 16, 2018 at 10:20 AM, Jay Potharaju <jspothar...@gmail.com>
wrote:

> Hi,
> I am testing solr 6.6.3 and have been running into intermittent group
> faceting errors.  I did some bulk indexing to  initially setup the
> collection I have multiple facet fields it only throws error on one of
> the fields. The issue goes away when I reindex the data.
>
>  required="false" multiValued="true" docValues="true"/>
>  I am upgrading from solr 5.3, didn't see this issue with the existing
> version we are using.  Any suggestions why this might be happening?
>
> Exception during facet.field: category_id
>   at org.apache.solr.client.solrj.impl.HttpSolrClient.
> executeMethod(HttpSolrClient.java:612)
>   at org.apache.solr.client.solrj.impl.HttpSolrClient.request(
> HttpSolrClient.java:279)
>   at org.apache.solr.client.solrj.impl.HttpSolrClient.request(
> HttpSolrClient.java:268)
>   at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1219)
>   at org.apache.solr.handler.component.HttpShardHandler.lambda$submit$0(
> HttpShardHandler.java:163)
>   at java.util.concurrent.FutureTask.run(Unknown Source)
>   at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
>   at java.util.concurrent.FutureTask.run(Unknown Source)
>   at com.codahale.metrics.InstrumentedExecutorService$
> InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
>   at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.
> lambda$execute$0(ExecutorUtil.java:229)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
> Thanks
> Jay
>
>


solr 6.6.3 intermittent group faceting errors

2018-04-16 Thread Jay Potharaju
Hi,
I am testing Solr 6.6.3 and have been running into intermittent group
faceting errors. I did some bulk indexing to initially set up the
collection. I have multiple facet fields; it only throws an error on one of
the fields. The issue goes away when I reindex the data.


 I am upgrading from Solr 5.3 and didn't see this issue with the version we
are currently using. Any suggestions why this might be happening?

Exception during facet.field: category_id
  at
org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:612)
  at
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:279)
  at
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:268)
  at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1219)
  at
org.apache.solr.handler.component.HttpShardHandler.lambda$submit$0(HttpShardHandler.java:163)
  at java.util.concurrent.FutureTask.run(Unknown Source)
  at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
  at java.util.concurrent.FutureTask.run(Unknown Source)
  at
com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
  at
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
Thanks
Jay
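[Editor's note: for reference, a group-faceted request of the kind this thread concerns might be built as below. The host, collection name, and q value are assumptions; group.facet=true selects the per-group facet counting code path that SOLR-7867 (cited later in the thread) is about. The request is only built here, not sent:]

```python
from urllib.parse import urlencode

# Illustrative parameters for a group-faceted query; host and
# collection name are assumptions, not taken from the thread.
params = {
    "q": "*:*",
    "group": "true",
    "group.field": "product_id",
    "group.facet": "true",        # per-group facet counts (the failing path)
    "facet": "true",
    "facet.field": "category_id", # the field the exception names
    "wt": "json",
}

url = "http://localhost:8983/solr/test/select?" + urlencode(params)
print(url)
```

With group.facet=true, facet counts are computed per group rather than per document, which exercises docValues differently than plain faceting — consistent with the error appearing only under group faceting.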


this IndexWriter is closed

2018-04-09 Thread Jay Potharaju
Hi,
I am getting an "IndexWriter is closed" error only on some of my shards in
the collection. This seems to be happening on leader shards only. There are
other shards on the box and they are not throwing any errors. Also, there is
enough disk space available on the box at this time.

Solr: 5.3.0.

Any recommendations on how to address this issue??

null:org.apache.lucene.store.AlreadyClosedException: this IndexWriter is closed
at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:719)
at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:733)
at 
org.apache.lucene.index.IndexWriter.deleteDocuments(IndexWriter.java:1438)
at 
org.apache.solr.update.DirectUpdateHandler2.deleteByQuery(DirectUpdateHandler2.java:408)
at 
org.apache.solr.update.processor.RunUpdateProcessor.processDelete(RunUpdateProcessorFactory.java:80)
at 
org.apache.solr.update.processor.UpdateRequestProcessor.processDelete(UpdateRequestProcessor.java:55)
at 
org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalDelete(DistributedUpdateProcessor.java:960)
at org.apache.solr.update.processor.DistributedUpdateProcessor.doDeleteByQuery(DistributedUpdateProcessor.java:1360)
at org.apache.solr.update.processor.DistributedUpdateProcessor.processDelete(DistributedUpdateProcessor.java:1154)
at org.apache.solr.handler.loader.JavabinLoader.delete(JavabinLoader.java:163)
at org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:116)
at org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:58)
at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:98)
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2068)
at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:669)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:462)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:210)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:179)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
at org.eclipse.jetty.server.Server.handle(Server.java:499)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
at java.lang.Thread.run(Unknown Source)
Caused by: java.io.IOException: No space left on device
at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
at sun.nio.ch.FileDispatcherImpl.write(Unknown Source)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(Unknown Source)
at sun.nio.ch.IOUtil.write(Unknown Source)
at sun.nio.ch.FileChannelImpl.write(Unknown Source)
at java.nio.channels.Channels.writeFullyImpl(Unknown Source)
at java.nio.channels.Channels.writeFully(Unknown Source)
at java.nio.channels.Channels.access$000(Unknown Source)
at java.nio.channels.Channels$1.write(Unknown Source)
at org.apache.lucene.store.FSDirectory$FSIndexOutput$1.write(FSDirectory.java:271)
at java.util.zip.CheckedOutputStream.write(Unknown Source)
at java.io.BufferedOutputStream.flushBuffer(Unknown Source)
at 

Re: Solr 6.6.3: Errors when using facet.field

2018-03-16 Thread Jay Potharaju
This is my
query: 
facet=true=true=true=product_id=true=category_id

Field def:



Tried it both with docValues and without docValues.

Shards: 2

Has anyone else experienced this error?

Thanks


Thanks
Jay Potharaju


On Fri, Mar 16, 2018 at 2:20 PM, Jay Potharaju <jspothar...@gmail.com>
wrote:

> It looks like it was fixed as part of 6.6.3 : SOLR-6160
> <http://issues.apache.org/jira/browse/SOLR-6160>.
> FYI: I have 2 shards in my test environment.
>
>
> Thanks
> Jay Potharaju
>
>
> On Fri, Mar 16, 2018 at 2:07 PM, Jay Potharaju <jspothar...@gmail.com>
> wrote:
>
>> Hi,
>> I am running a simple query with group by & faceting.
>>
>> facet=true=true=true=product_
>> id=true=true=product_id=1
>>
>>
>> When I run the query I get errors
>>
>>  
>>   
>> org.apache.solr.common.SolrException
>> java.lang.IllegalStateException
>> > name="error-class">org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException
>> > name="root-error-class">org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException
>>   
>>   Error from server at 
>> http://localhost:9223/solr/test2_shard2_replica1: Exception during 
>> facet.field: category_id
>>   > name="trace">org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:
>>  Error from server at http://localhost:9223/solr/test2_shard2_replica1: 
>> Exception during facet.field: category_id
>>  at 
>> org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:612)
>>  at 
>> org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:279)
>>  at 
>> org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:268)
>>  at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1219)
>>  at 
>> org.apache.solr.handler.component.HttpShardHandler.lambda$submit$0(HttpShardHandler.java:163)
>>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>  at 
>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>  at 
>> com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
>>  at 
>> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
>>  at 
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>  at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>  at java.lang.Thread.run(Thread.java:748)
>> 
>>   500
>>
>>
>> The above query worked in solr 5.3. Any suggestions ?
>> Thanks
>> Jay Potharaju
>>
>>
>
>


Re: Solr 6.6.3: Errors when using facet.field

2018-03-16 Thread Jay Potharaju
It looks like it was fixed as part of 6.6.3 : SOLR-6160
<http://issues.apache.org/jira/browse/SOLR-6160>.
FYI: I have 2 shards in my test environment.


Thanks
Jay Potharaju


On Fri, Mar 16, 2018 at 2:07 PM, Jay Potharaju <jspothar...@gmail.com>
wrote:

> Hi,
> I am running a simple query with group by & faceting.
>
> facet=true=true=true=
> product_id=true=true=
> product_id=1
>
>
> When I run the query I get errors
>
>   
>   
> org.apache.solr.common.SolrException
> java.lang.IllegalStateException
>  name="error-class">org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException
>  name="root-error-class">org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException
>   
>   Error from server at 
> http://localhost:9223/solr/test2_shard2_replica1: Exception during 
> facet.field: category_id
>name="trace">org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:
>  Error from server at http://localhost:9223/solr/test2_shard2_replica1: 
> Exception during facet.field: category_id
>   at 
> org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:612)
>   at 
> org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:279)
>   at 
> org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:268)
>   at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1219)
>   at 
> org.apache.solr.handler.component.HttpShardHandler.lambda$submit$0(HttpShardHandler.java:163)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
>   at 
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> 
>   500
>
>
> The above query worked in solr 5.3. Any suggestions ?
> Thanks
> Jay Potharaju
>
>


Solr 6.6.3: Errors when using facet.field

2018-03-16 Thread Jay Potharaju
Hi,
I am running a simple query with group by & faceting.

facet=true=true=true=product_id=true=true=product_id=1


When I run the query I get errors


  
org.apache.solr.common.SolrException
java.lang.IllegalStateException
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException
  
  Error from server at
http://localhost:9223/solr/test2_shard2_replica1: Exception during
facet.field: category_id
  org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:
Error from server at http://localhost:9223/solr/test2_shard2_replica1:
Exception during facet.field: category_id
at 
org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:612)
at 
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:279)
at 
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:268)
at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1219)
at 
org.apache.solr.handler.component.HttpShardHandler.lambda$submit$0(HttpShardHandler.java:163)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
at 
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

  500


The above query worked in Solr 5.3. Any suggestions?
Thanks
Jay Potharaju
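The query string above lost its `&` separators in archiving. As a hedged reconstruction — assuming it was a grouped query on `product_id` with a facet on `category_id`, which matches the visible fragments and the error — the parameters can be rebuilt and encoded safely in Python. `group`, `group.field`, `group.facet`, `facet`, and `facet.field` are standard Solr parameters; the exact original combination is an assumption:

```python
from urllib.parse import urlencode

# Hedged reconstruction of the (mangled) request: a grouped query on
# product_id with faceting on category_id. The parameter names are
# standard Solr parameters; the exact original mix is an assumption.
params = {
    "q": "*:*",
    "rows": 1,
    "group": "true",
    "group.field": "product_id",
    "group.facet": "true",   # compute facet counts per group
    "facet": "true",
    "facet.field": "category_id",
}

query_string = urlencode(params)
print(query_string)
```

`urlencode` keeps the `&` separators intact and percent-escapes anything unsafe, which is exactly the structure the archived copy lost.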


Re: Solr 6.6.3: Cant create shard name with hyphen

2018-03-14 Thread Jay Potharaju
nvm, I see the first comment on the ticket ... hyphens are allowed,
but not as the first character. It looks like 5.5 was the last
version where a leading hyphen was supported.
Thanks
Jay

Thanks
Jay Potharaju


On Wed, Mar 14, 2018 at 9:29 PM, Jay Potharaju <jspothar...@gmail.com>
wrote:

> Thanks for the reply Shawn. Was this a recent change ? As per the ticket
> it was fixed in 6.0. Is this change(no hyphens as starting of name )
> applicable to all 6x versions.
> Thanks
>
>
> > On Mar 14, 2018, at 6:25 PM, Shawn Heisey <apa...@elyograg.org> wrote:
> >
> >> On 3/14/2018 6:20 PM, Jay Potharaju wrote:
> >> I am creating a new collection in solr 6.6.3 and it wont allow me
> create a
> >> shard with hyphen. This ticket(
> >> https://issues.apache.org/jira/browse/SOLR-8725) was closed earlier.
> But it
> >> is not working for me in 6.6.3.
> >> Upgrading from 5.3 to 6.6.3.
> >
> > I can get 6.6.3 to create cores/collections with a hyphen in the name --
> > unless the hyphen is the first character.
> >
> > Names with hyphens as the first character are not allowed.  This is an
> > example of a name that is likely to not work properly with all Solr
> > features.  It is extremely unlikely that this rule will be changed.
> >
> > I can see from previous email you've sent to the list that you have
> > asked about SolrCloud, so I'm betting you're running in cloud mode.
> >
> > If you were in standalone mode, then I would say that you can probably
> > override this behavior by creating the index with an allowed name,
> > stopping Solr, renaming directories, editing core.properties files, and
> > starting Solr back up ... but if you're in cloud mode, the names are
> > also in ZooKeeper.  There are no easy tools that I know of for editing
> > the ZK database.
> >
> > You are free to change the source code to remove the restrictions that
> > have been added ... but those restrictions were put in place for good
> > reason.  Even if you manage to create an index with a name that's
> > currently not allowed, that name might have problems with at least one
> > Solr feature.
> >
> > Thanks,
> > Shawn
> >
>
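Shawn's rule — hyphens are allowed anywhere except as the first character — can be sketched as a small validity check. Treat this as an illustration of the restriction described above, not Solr's actual validator; the exact character set and pattern Solr uses internally may be stricter:

```python
import re

# Illustration of the naming rule discussed above: letters, digits,
# underscore, period, and hyphen are allowed, but the name must not
# start with a hyphen. (Solr's real validator may differ in detail.)
NAME_RE = re.compile(r"^(?!-)[A-Za-z0-9._-]+$")

def is_valid_name(name: str) -> bool:
    return bool(NAME_RE.match(name))

print(is_valid_name("my-shard"))   # True: hyphen in the middle is fine
print(is_valid_name("-my-shard"))  # False: leading hyphen is rejected
```

The negative lookahead `(?!-)` is what encodes "hyphens everywhere except position one".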


Re: Solr 6.6.3: Cant create shard name with hyphen

2018-03-14 Thread Jay Potharaju
Thanks for the reply, Shawn. Was this a recent change? As per the ticket it was 
fixed in 6.0. Is this change (no hyphens at the start of a name) applicable to all 
6.x versions?
Thanks


> On Mar 14, 2018, at 6:25 PM, Shawn Heisey <apa...@elyograg.org> wrote:
> 
>> On 3/14/2018 6:20 PM, Jay Potharaju wrote:
>> I am creating a new collection in solr 6.6.3 and it wont allow me create a
>> shard with hyphen. This ticket(
>> https://issues.apache.org/jira/browse/SOLR-8725) was closed earlier. But it
>> is not working for me in 6.6.3.
>> Upgrading from 5.3 to 6.6.3.
> 
> I can get 6.6.3 to create cores/collections with a hyphen in the name --
> unless the hyphen is the first character.
> 
> Names with hyphens as the first character are not allowed.  This is an
> example of a name that is likely to not work properly with all Solr
> features.  It is extremely unlikely that this rule will be changed.
> 
> I can see from previous email you've sent to the list that you have
> asked about SolrCloud, so I'm betting you're running in cloud mode.
> 
> If you were in standalone mode, then I would say that you can probably
> override this behavior by creating the index with an allowed name,
> stopping Solr, renaming directories, editing core.properties files, and
> starting Solr back up ... but if you're in cloud mode, the names are
> also in ZooKeeper.  There are no easy tools that I know of for editing
> the ZK database.
> 
> You are free to change the source code to remove the restrictions that
> have been added ... but those restrictions were put in place for good
> reason.  Even if you manage to create an index with a name that's
> currently not allowed, that name might have problems with at least one
> Solr feature.
> 
> Thanks,
> Shawn
> 


Re: Solr 6.6.3: Cant create shard name with hyphen

2018-03-14 Thread Jay Potharaju
I tested in Solr 6.5.1 and it is broken there as well. Any recommendation
on which 6.x version has that feature working? At this time the shard name
can't be changed because of dependencies on other applications.
Thanks


Thanks
Jay Potharaju


On Wed, Mar 14, 2018 at 5:20 PM, Jay Potharaju <jspothar...@gmail.com>
wrote:

> Hi ,
> I am creating a new collection in solr 6.6.3 and it wont allow me create a
> shard with hyphen. This ticket(https://issues.apache.
> org/jira/browse/SOLR-8725) was closed earlier. But it is not working for
> me in 6.6.3.
> Upgrading from 5.3 to 6.6.3.
>
> Thanks
> Jay
>
>


Solr 6.6.3: Cant create shard name with hyphen

2018-03-14 Thread Jay Potharaju
Hi,
I am creating a new collection in Solr 6.6.3 and it won't allow me to create a
shard with a hyphen. This ticket (
https://issues.apache.org/jira/browse/SOLR-8725) was closed earlier, but it
is not working for me in 6.6.3.
Upgrading from 5.3 to 6.6.3.

Thanks
Jay


Re: SynonymGraphFilterFactory with WordDelimiterGraphFilterFactory usage

2018-03-14 Thread Jay Potharaju
Thanks for the response, Rick! I checked 6.6.2 and it has the same issue.
The only workaround that I have now is to comment out the
SynonymGraphFilterFactory, as we are not using synonyms as of now. But I would
like to know how to address this issue once we start using them down the line.
Thanks
J

Thanks
Jay Potharaju


On Wed, Mar 14, 2018 at 1:02 PM, Rick Leir <rl...@leirtech.com> wrote:

> Jay
> Did you try using text_en_splitting copied out of another release?
> Though if someone went to the trouble of removing it from the example,
> there could be something broken in it.
> Cheers -- Rick
> --
> Sorry for being brief. Alternate email is rickleir at yahoo dot com


Re: SynonymGraphFilterFactory with WordDelimiterGraphFilterFactory usage

2018-03-13 Thread Jay Potharaju
I am upgrading to Solr 6.6.3 and one of my fields uses text_en_splitting.
Are there any recommendations on how to adjust the fieldtype definition for
these fields?
Thanks

Thanks
Jay Potharaju


On Wed, Feb 7, 2018 at 5:09 AM, Steve Rowe <sar...@gmail.com> wrote:

> Thanks Webster,
>
> I created https://issues.apache.org/jira/browse/SOLR-11955 to work on
> this.
>
> --
> Steve
> www.lucidworks.com
>
> > On Feb 6, 2018, at 2:47 PM, Webster Homer <webster.ho...@sial.com>
> wrote:
> >
> > I noticed that in some of the current example schemas that are shipped
> with
> > Solr, there is a fieldtype, text_en_splitting, that feeds the output
> > of SynonymGraphFilterFactory into WordDelimiterGraphFilterFactory. So if
> > this isn't supported, the example should probably be updated or removed.
> >
> > On Mon, Feb 5, 2018 at 10:27 AM, Steve Rowe <sar...@gmail.com> wrote:
> >
> >> Hi Александр,
> >>
> >>> On Feb 5, 2018, at 11:19 AM, Shawn Heisey <apa...@elyograg.org> wrote:
> >>>
> >>> There should be no problem with using them together.
> >>
> >> I believe Shawn is wrong.
> >>
> >> From <http://lucene.apache.org/core/7_2_0/analyzers-common/
> >> org/apache/lucene/analysis/synonym/SynonymGraphFilter.html>:
> >>
> >>> NOTE: this cannot consume an incoming graph; results will be undefined.
> >>
> >> Unfortunately, the ref guide entry for Synonym Graph Filter <
> >> https://lucene.apache.org/solr/guide/7_2/filter-
> descriptions.html#synonym-
> >> graph-filter> doesn’t include a warning about this, but it should, like
> >> the warning on Word Delimiter Graph Filter <https://lucene.apache.org/
> >> solr/guide/7_2/filter-descriptions.html#word-delimiter-graph-filter>:
> >>
> >>> Note: although this filter produces correct token graphs, it cannot
> >> consume an input token graph correctly.
> >>
> >> (I’ve just committed a change to the ref guide source to add this also
> on
> >> the Synonym Graph Filter and Managed Synonym Graph Filter entries, to be
> >> included in the ref guide for Solr 7.3.)
> >>
> >> In short, the combination of the two filters is not supported, because
> >> WDGF produces a token graph, which SGF cannot correctly interpret.
> >>
> >> Other filters also have this issue, see e.g. <
> https://issues.apache.org/
> >> jira/browse/LUCENE-3475> for ShingleFilter; this issue has gotten some
> >> attention recently, and hopefully it will inspire fixes elsewhere.
> >>
> >> Patches welcome!
> >>
> >> --
> >> Steve
> >> www.lucidworks.com
> >>
> >>
> >>> On Feb 5, 2018, at 11:19 AM, Shawn Heisey <apa...@elyograg.org> wrote:
> >>>
> >>> On 2/5/2018 3:55 AM, Александр Шестак wrote:
> >>>>
> >>>> Hi, I have misunderstanding about usage of SynonymGraphFilterFactory
> >>>> and  WordDelimiterGraphFilterFactory. Can they be used together?
> >>>>
> >>>
> >>> There should be no problem with using them together.  But it is always
> >>> possible that the behavior will surprise you, while working 100% as
> >>> designed.
> >>>
> >>>> I have solr type configured in next way
> >>>>
> >>>>  >>>> autoGeneratePhraseQueries="true">
> >>>>  
> >>>>
> >>>> >>>>generateWordParts="1" generateNumberParts="1"
> >>>> splitOnNumerics="1"
> >>>>catenateWords="1" catenateNumbers="1" catenateAll="0"
> >>>> preserveOriginal="1" protected="protwords_en.txt"/>
> >>>>
> >>>>  
> >>>>  
> >>>>
> >>>> >>>>generateWordParts="1" generateNumberParts="1"
> >>>> splitOnNumerics="1"
> >>>>catenateWords="0" catenateNumbers="0" catenateAll="0"
> >>>> preserveOriginal="1" protected="protwords_en.txt"/>
> >>>>
> >>>> >>>>synonyms="synonyms_en.txt" ignoreCase="true"
> expand="true"/>
> >>>>  
> >>>> 
> >>>>
> >>

book on solr

2017-10-12 Thread Jay Potharaju
Hi,
I am looking for a book that covers some basic principles of how to scale
Solr. Are there any suggestions?
For example, how to scale by adding shards or replicas in the case of high RPS
and high indexing rates.

Any blog or documentation that would provide some basic rules or
guidelines for scaling would also be great.

Thanks
Jay Potharaju


Best practices for backup & restore

2017-05-16 Thread Jay Potharaju
Hi,
I was wondering if there are any best practices for doing solr backup &
restore. In the past when running backup, I stopped indexing during the
backup process.

I am looking at this documentation and it says that indexing can continue
when backup is in progress.
https://cwiki.apache.org/confluence/display/solr/Making+and+Restoring+Backups

Any recommendations ?

-- 
Thanks
Jay


Re: solrcloud load balancing

2016-10-22 Thread Jay Potharaju
Thanks Erick & Shawn for the response.

In the case of non-distributed queries (single shard with replicas), is there a
way for me to determine how long it takes to retrieve the documents
and send the response?

In my load test, I see that the response time at the client API is in
seconds, but I am not able to see any high response times in the Solr logs.
Is it possible that under high load it takes a long time to retrieve
and send the documents?
If I run the same query in the browser individually, it comes back quickly.

Thanks
Jay

On Sat, Oct 22, 2016 at 6:14 PM, Shawn Heisey <apa...@elyograg.org> wrote:

> On 10/22/2016 6:19 PM, Jay Potharaju wrote:
> > I am trying to understand how load balancing works in solrcloud.
> >
> > As per my understanding solrcloud provides load balancing when querying
> > using an http endpoint.  When a query is sent to any of the nodes , solr
> > will intelligently decide which server can fulfill the request and will
> be
> > processed by one of the nodes in the cluster.
>
> Erick already responded, but I had this mostly written before I saw his
> response.  I decided to send it anyway.
>
> > 1) Does the logic change when there is only 1 shard vs multiple shards?
>
> The way I understand it, each shard is independently load balanced.  You
> might have a situation where one shard has more replicas than another
> shard, and I believe in that even in that situation, all replicas should
> be used.
>
> > 2) Does the QTime displayed is sum of processing time for the query
> request + latency(if processed by another node) + time to decide which node
> will process the request(which i am guessing is minimal and can be ignored)
>
> There are three phases in a distributed (multi-shard) query.
>
> 1) Each shard is sent the query, with the field list set to include the
> score, the unique key field, and if there is a sort parameter, whichever
> fields are used for sorting.  These requests happen in parallel.
> Whichever request takes the longest will determine the total time for
> this phase.
>
> 2) The responses from the subqueries are combined to determine which
> documents will make up the final result.
>
> 3) Additional queries are sent to the individual shards to retrieve the
> matching documents.  These requests are also in parallel, so the slowest
> such request will determine the time for this whole phase.
>
> > 3) In my solr logs i display the "slow" queries, is the qtime displayed
> > takes all of the above and shows the correct time taken.
>
> For non-distributed queries, QTime includes the time required to process
> the query, but not the time to retrieve the documents and send the
> response.  I *think* that when the query is distributed, QTime will be
> the sum of the first two phases that I mentioned above, but I'm not 100%
> sure.
>
> Thanks,
> Shawn
>
>


-- 
Thanks
Jay Potharaju
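Shawn's three-phase description also explains the "laggard" effect Erick mentions: the fan-out phases run in parallel, so each one costs as much as its slowest shard. A toy sketch of that arithmetic (the shard timings are made-up numbers, purely illustrative of the model described above):

```python
# Toy model of a distributed query's total time, per the phases described
# above: phases 1 and 3 fan out to all shards in parallel, so each phase
# costs as much as its slowest shard. All timings here are made up.
phase1_ms = {"shard1": 40, "shard2": 5000}   # shard2 hits a 5 s GC pause
merge_ms = 3                                  # phase 2: merge per-shard ids
phase3_ms = {"shard1": 25, "shard2": 30}      # fetch the stored fields

total = max(phase1_ms.values()) + merge_ms + max(phase3_ms.values())
print(total)  # 5033 -> one slow replica dominates the whole request
```

This is why a single stop-the-world GC on one replica shows up as seconds of latency for the whole request, even when every other shard answered in tens of milliseconds.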


Re: solrcloud load balancing

2016-10-22 Thread Jay Potharaju
Thanks Erick for the response
I am currently using a load balancer for my solrcloud, but was particularly
interested to know if solrcloud is doing load balancing internally in the
case of a single shard.
All the documentation that I have seen assumes multi-shard scenarios but
not for a single shard. Can you please point me to some code/documentation
that can help me understand this better?

Thanks
Jay

On Sat, Oct 22, 2016 at 6:00 PM, Erick Erickson <erickerick...@gmail.com>
wrote:

> 1) Single shards have some short circuiting in them. And anyway it's
> best to have some kind of load balancer in front or use SolrJ with
> CloudSolrClient. If you just use an HTTP end-point, you have a single
> point of failure if that node goes down.
>
> 2) yes. What it does _not_ include is the time taken to assemble the
> final document list, i.e. get the "fl" parameters. And also note that
> there's "the laggard problem" here. The time will be something close
> to the _longest_ time it takes any replica to respond. Say you have 4
> shards and the replica for one of them happens to hit a 5 second
> stop-the-world GC collection. Your QTime will be 5 seconds+. I really
> have no idea whether the QTime includes the decision process for
> selecting nodes, but I've also never heard of it being significant.
>
> 3) I guess, although I'm not quite sure I understand the question.
> Slow queries will include (roughly) the max of the sub-request QTimes.
>
> Best,
> Erick
>
> On Sat, Oct 22, 2016 at 5:19 PM, Jay Potharaju <jspothar...@gmail.com>
> wrote:
> > Hi,
> > I am trying to understand how load balancing works in solrcloud.
> >
> > As per my understanding solrcloud provides load balancing when querying
> > using an http endpoint.  When a query is sent to any of the nodes , solr
> > will intelligently decide which server can fulfill the request and will
> be
> > processed by one of the nodes in the cluster.
> >
> > 1) Does the logic change when there is only 1 shard vs multiple shards?
> >
> > 2) Does the QTime displayed is sum of processing time for the query
> request
> > + latency(if processed by another node) + time to decide which node will
> > process the request(which i am guessing is minimal and can be ignored)
> >
> > 3) In my solr logs i display the "slow" queries, is the qtime displayed
> > takes all of the above and shows the correct time taken.
> >
> > Solr version: 5.5.0
> >
> >
> > --
> > Thanks
> > Jay
>



-- 
Thanks
Jay Potharaju


solrcloud load balancing

2016-10-22 Thread Jay Potharaju
Hi,
I am trying to understand how load balancing works in solrcloud.

As per my understanding, SolrCloud provides load balancing when querying
using an HTTP endpoint. When a query is sent to any of the nodes, Solr
will intelligently decide which server can fulfill the request, and it will
be processed by one of the nodes in the cluster.

1) Does the logic change when there is only 1 shard vs. multiple shards?

2) Is the QTime displayed the sum of the processing time for the query request
+ latency (if processed by another node) + the time to decide which node will
process the request (which I am guessing is minimal and can be ignored)?

3) In my Solr logs I display the "slow" queries; does the QTime displayed
take all of the above into account and show the correct time taken?

Solr version: 5.5.0


-- 
Thanks
Jay


Re: json facet - date range & interval

2016-06-28 Thread Jay Potharaju
that worked ...thanks David!

On Tue, Jun 28, 2016 at 11:22 AM, David Santamauro <
david.santama...@gmail.com> wrote:

>
> Have you tried %-escaping?
>
> json.facet = {
>   daterange : { type  : range,
> field : datefield,
> start : "NOW/DAY%2D10DAYS",
> end   : "NOW/DAY",
> gap   : "%2B1DAY"
>
>   }
> }
>
>
> On 06/28/2016 01:19 PM, Jay Potharaju wrote:
>
>> json.facet={daterange : {type : range, field : datefield, start :
>> "NOW/DAY-10DAYS", end : "NOW/DAY",gap:"\+1DAY"} }
>>
>> Escaping the plus sign also gives the same error. Any other suggestions
>> how
>> can i make this work?
>> Thanks
>> Jay
>>
>> On Mon, Jun 27, 2016 at 10:23 PM, Erick Erickson <erickerick...@gmail.com
>> >
>> wrote:
>>
>> First thing I'd do is escape the plus. It's probably being interpreted
>>> as a space.
>>>
>>> Best,
>>> Erick
>>>
>>> On Mon, Jun 27, 2016 at 9:24 AM, Jay Potharaju <jspothar...@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>> I am trying to use the json range facet with a tdate field. I tried the
>>>> following but get an error. Any suggestions on how to fix the following
>>>> error /examples for date range facets.
>>>>
>>>> json.facet={daterange : {type : range, field : datefield, start
>>>> :"NOW-10DAYS", end : "NOW/DAY", gap : "+1DAY" } }
>>>>
>>>>   msg": "Can't add gap 1DAY to value Fri Jun 17 15:49:36 UTC 2016 for
>>>>
>>> field:
>>>
>>>> datefield", "code": 400
>>>>
>>>> --
>>>> Thanks
>>>> Jay
>>>>
>>>
>>>
>>
>>
>>


-- 
Thanks
Jay Potharaju
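The fix above works because `+` is special in a URL query string — it decodes to a space — so Solr date math like `+1DAY` must arrive percent-encoded as `%2B1DAY`. Python's standard encoding helpers do this automatically; a minimal sketch, reusing the facet JSON from this thread:

```python
from urllib.parse import urlencode

# A literal '+' in a URL query string decodes to a space on the server,
# which is why gap:"+1DAY" produced "Can't add gap 1DAY ..." earlier in
# this thread. urlencode sends it as %2B so the gap arrives intact.
json_facet = ('{daterange : {type : range, field : datefield, '
              'start : "NOW/DAY-10DAYS", end : "NOW/DAY", gap : "+1DAY"}}')

qs = urlencode({"q": "*:*", "json.facet": json_facet})
print("%2B1DAY" in qs)  # True
```

The same applies to any client: encode the whole parameter value rather than hand-escaping individual characters.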


Re: json facet - date range & interval

2016-06-28 Thread Jay Potharaju
json.facet={daterange : {type : range, field : datefield, start :
"NOW/DAY-10DAYS", end : "NOW/DAY",gap:"\+1DAY"} }

Escaping the plus sign also gives the same error. Any other suggestions how
can i make this work?
Thanks
Jay

On Mon, Jun 27, 2016 at 10:23 PM, Erick Erickson <erickerick...@gmail.com>
wrote:

> First thing I'd do is escape the plus. It's probably being interpreted
> as a space.
>
> Best,
> Erick
>
> On Mon, Jun 27, 2016 at 9:24 AM, Jay Potharaju <jspothar...@gmail.com>
> wrote:
> > Hi,
> > I am trying to use the json range facet with a tdate field. I tried the
> > following but get an error. Any suggestions on how to fix the following
> > error /examples for date range facets.
> >
> > json.facet={daterange : {type : range, field : datefield, start
> > :"NOW-10DAYS", end : "NOW/DAY", gap : "+1DAY" } }
> >
> >  msg": "Can't add gap 1DAY to value Fri Jun 17 15:49:36 UTC 2016 for
> field:
> > datefield", "code": 400
> >
> > --
> > Thanks
> > Jay
>



-- 
Thanks
Jay Potharaju


json facet - date range & interval

2016-06-27 Thread Jay Potharaju
Hi,
I am trying to use the json range facet with a tdate field. I tried the
following but get an error. Any suggestions on how to fix the following
error, or examples of date range facets?

json.facet={daterange : {type : range, field : datefield, start
:"NOW-10DAYS", end : "NOW/DAY", gap : "+1DAY" } }

"msg": "Can't add gap 1DAY to value Fri Jun 17 15:49:36 UTC 2016 for field:
datefield", "code": 400

-- 
Thanks
Jay


Re: Sorting & searching on the same field

2016-06-24 Thread Jay Potharaju
Thanks Alex, I will check this out.
Is it possible to do something at query time, using a function query to
lowercase the field and then sort on it?
Jay

> On Jun 24, 2016, at 12:03 AM, Alexandre Rafalovitch <arafa...@gmail.com> 
> wrote:
> 
> Keep voting for SOLR-8362?
> 
> You could do your preprocessing in UpdateRequestProcessor chain. There
> is nothing specifically for Lower/Upper case, but there is a generic
> scripting one: 
> http://www.solr-start.com/javadoc/solr-lucene/org/apache/solr/update/processor/StatelessScriptUpdateProcessorFactory.html
> 
> Regards,
>   Alex.
> 
> Newsletter and resources for Solr beginners and intermediates:
> http://www.solr-start.com/
> 
> 
>> On 24 June 2016 at 13:42, Jay Potharaju <jspothar...@gmail.com> wrote:
>> Any ideas on how to handle case insensitive search, string fields and
>> docvalues in 1 field?
>> 
>> On Thu, Jun 23, 2016 at 8:14 PM, Alexandre Rafalovitch <arafa...@gmail.com>
>> wrote:
>> 
>>> At least you don't need to store the sort field. Or even index, if it is
>>> docvalues (good for sort).
>>> 
>>> Regards,
>>>Alex
>>>> On 24 Jun 2016 9:01 AM, "Jay Potharaju" <jspothar...@gmail.com> wrote:
>>>> 
>>>> yes, that is what i thought. but was checking to see if there was
>>> something
>>>> I was missing.
>>>> Thanks
>>>> 
>>>> On Thu, Jun 23, 2016 at 12:55 PM, Ahmet Arslan <iori...@yahoo.com.invalid
>>>> 
>>>> wrote:
>>>> 
>>>>> Hi Jay,
>>>>> 
>>>>> I don't think it can be combined.
>>>>> Mainly because: searching requires a tokenized field.
>>>>> Sorting requires a single value (token) to be meaningful.
>>>>> 
>>>>> Ahmet
>>>>> 
>>>>> 
>>>>> 
>>>>> On Thursday, June 23, 2016 7:43 PM, Jay Potharaju <
>>> jspothar...@gmail.com
>>>>> 
>>>>> wrote:
>>>>> Hi,
>>>>> I would like to have 1 field that can used for both searching and case
>>>>> insensitive sorting. As far as i know the only way to do is to have two
>>>>> fields one for searching (text_en) and one for sorting(lowercase &
>>>> string).
>>>>> Any ideas how the two can be combined into 1 field.
>>>>> 
>>>>> 
>>>>> --
>>>>> Thanks
>>>>> Jay Potharaju
>>>> 
>>>> 
>>>> 
>>>> --
>>>> Thanks
>>>> Jay Potharaju
>> 
>> 
>> 
>> --
>> Thanks
>> Jay Potharaju


Re: Sorting & searching on the same field

2016-06-23 Thread Jay Potharaju
Any ideas on how to handle case insensitive search, string fields and
docvalues in 1 field?

On Thu, Jun 23, 2016 at 8:14 PM, Alexandre Rafalovitch <arafa...@gmail.com>
wrote:

> At least you don't need to store the sort field. Or even index, if it is
> docvalues (good for sort).
>
> Regards,
> Alex
> On 24 Jun 2016 9:01 AM, "Jay Potharaju" <jspothar...@gmail.com> wrote:
>
> > yes, that is what i thought. but was checking to see if there was
> something
> > I was missing.
> > Thanks
> >
> > On Thu, Jun 23, 2016 at 12:55 PM, Ahmet Arslan <iori...@yahoo.com.invalid
> >
> > wrote:
> >
> > > Hi Jay,
> > >
> > > I don't think it can be combined.
> > > Mainly because: searching requires a tokenized field.
> > > Sorting requires a single value (token) to be meaningful.
> > >
> > > Ahmet
> > >
> > >
> > >
> > > On Thursday, June 23, 2016 7:43 PM, Jay Potharaju <
> jspothar...@gmail.com
> > >
> > > wrote:
> > > Hi,
> > > I would like to have 1 field that can used for both searching and case
> > > insensitive sorting. As far as i know the only way to do is to have two
> > > fields one for searching (text_en) and one for sorting(lowercase &
> > string).
> > > Any ideas how the two can be combined into 1 field.
> > >
> > >
> > > --
> > > Thanks
> > > Jay Potharaju
> > >
> >
> >
> >
> > --
> > Thanks
> > Jay Potharaju
> >
>



-- 
Thanks
Jay Potharaju


clarification on using docvalues for sorting

2016-06-23 Thread Jay Potharaju
Hi,
I am trying to do case-insensitive sorting on a couple of fields.
For this I am using a "lowercase" field type for the sort fields (a
KeywordTokenizer followed by a LowerCaseFilter).

That field type would not allow docValues: docValues can only be used with
string & trie fields. And docValues are recommended for sorting & faceting.

How can I accomplish case-insensitive sorting with docValues?
Or is what I am trying to do not possible?

-- 
Thanks
Jay


Slow facet range performance

2016-06-23 Thread Jay Potharaju
Hi,
I am running a facet query on a date field and the results are coming back in
about 500 ms on average. The field is set to use docValues & the field type is
tdate.

SOLR - 5.5

facet=true&facet.interval=date_field&f.date_field.facet.interval.set=[NOW-7DAY,NOW]&f.date_field.facet.interval.set=[NOW-30DAY,NOW-7DAY]&f.date_field.facet.interval.set=[NOW-1MONTH,NOW-7DAY]&f.date_field.facet.interval.set=[NOW-1YEAR,NOW-1MONTH]

Any suggestions on how to speed this up?
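For reference, the interval-facet request above can be assembled
programmatically; this sketch only builds the query string (the field name
date_field is from the message, everything else is illustrative):

```python
# Sketch: assembling the interval-facet request parameters.
from urllib.parse import urlencode

params = [
    ("q", "*:*"),
    ("facet", "true"),
    ("facet.interval", "date_field"),
    ("f.date_field.facet.interval.set", "[NOW-7DAY,NOW]"),
    ("f.date_field.facet.interval.set", "[NOW-30DAY,NOW-7DAY]"),
    ("f.date_field.facet.interval.set", "[NOW-1MONTH,NOW-7DAY]"),
    ("f.date_field.facet.interval.set", "[NOW-1YEAR,NOW-1MONTH]"),
]
query_string = urlencode(params)
print(query_string)
```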
-- 
Thanks
Jay


Re: Sorting & searching on the same field

2016-06-23 Thread Jay Potharaju
Yes, that is what I thought, but I was checking to see if there was something
I was missing.
Thanks

On Thu, Jun 23, 2016 at 12:55 PM, Ahmet Arslan <iori...@yahoo.com.invalid>
wrote:

> Hi Jay,
>
> I don't think it can be combined.
> Mainly because: searching requires a tokenized field.
> Sorting requires a single value (token) to be meaningful.
>
> Ahmet
>
>
>
> On Thursday, June 23, 2016 7:43 PM, Jay Potharaju <jspothar...@gmail.com>
> wrote:
> Hi,
> I would like to have 1 field that can used for both searching and case
> insensitive sorting. As far as i know the only way to do is to have two
> fields one for searching (text_en) and one for sorting(lowercase & string).
> Any ideas how the two can be combined into 1 field.
>
>
> --
> Thanks
> Jay Potharaju
>



-- 
Thanks
Jay Potharaju


Sorting & searching on the same field

2016-06-23 Thread Jay Potharaju
Hi,
I would like to have one field that can be used for both searching and
case-insensitive sorting. As far as I know, the only way to do this is to have
two fields: one for searching (text_en) and one for sorting (lowercase &
string).
Any ideas on how the two can be combined into one field?


-- 
Thanks
Jay Potharaju
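A sketch of the two-field approach discussed in this thread, in schema.xml
form. All type and field names here are illustrative, and in Solr 5.x
docValues cannot be enabled on an analyzed TextField, so the sort field below
does not use docValues:

```xml
<!-- Sketch only: a search field plus a lowercased sort field.
     Names (text_en, string_ci, title*) are illustrative. -->
<fieldType name="string_ci" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <!-- KeywordTokenizer keeps the whole value as a single token for sorting -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="title"      type="text_en"   indexed="true" stored="true"/>
<field name="title_sort" type="string_ci" indexed="true" stored="false"/>
<copyField source="title" dest="title_sort"/>
```

Searches would then hit title while sorting uses sort=title_sort asc.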


Re: result grouping in sharded index

2016-06-15 Thread Jay Potharaju
Collapse would also not work since it requires all the data to be on the
same shard.
"In order to use these features with SolrCloud, the documents must be
located on the same shard. To ensure document co-location, you can define
the router.name parameter as compositeId when creating the collection. "
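For context, co-location with the compositeId router is driven by the document
id itself: ids of the form "shardkey!docid" hash on the prefix, so all
documents sharing a prefix land on the same shard. A minimal sketch (the key
names are illustrative):

```python
# Sketch: building compositeId routing keys. All ids sharing the
# "customer42" prefix are routed to the same shard.
def composite_id(shard_key: str, doc_id: str) -> str:
    return f"{shard_key}!{doc_id}"

ids = [composite_id("customer42", str(n)) for n in range(3)]
print(ids)  # ['customer42!0', 'customer42!1', 'customer42!2']
```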

On Wed, Jun 15, 2016 at 3:03 AM, Tom Evans <tevans...@googlemail.com> wrote:

> Do you have to group, or can you collapse instead?
>
>
> https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results
>
> Cheers
>
> Tom
>
> On Tue, Jun 14, 2016 at 4:57 PM, Jay Potharaju <jspothar...@gmail.com>
> wrote:
> > Any suggestions on how to handle result grouping in sharded index?
> >
> >
> > On Mon, Jun 13, 2016 at 1:15 PM, Jay Potharaju <jspothar...@gmail.com>
> > wrote:
> >
> >> Hi,
> >> I am working on a functionality that would require me to group documents
> >> by a id field. I read that the ngroups feature would not work in a
> sharded
> >> index.
> >> Can someone recommend how to handle this in a sharded index?
> >>
> >>
> >> Solr Version: 5.5
> >>
> >>
> >>
> https://cwiki.apache.org/confluence/display/solr/Result+Grouping#ResultGrouping-DistributedResultGroupingCaveats
> >>
> >> --
> >> Thanks
> >> Jay
> >>
> >>
> >
> >
> >
> > --
> > Thanks
> > Jay Potharaju
>



-- 
Thanks
Jay Potharaju


Re: result grouping in sharded index

2016-06-14 Thread Jay Potharaju
Any suggestions on how to handle result grouping in a sharded index?


On Mon, Jun 13, 2016 at 1:15 PM, Jay Potharaju <jspothar...@gmail.com>
wrote:

> Hi,
> I am working on a functionality that would require me to group documents
> by a id field. I read that the ngroups feature would not work in a sharded
> index.
> Can someone recommend how to handle this in a sharded index?
>
>
> Solr Version: 5.5
>
>
> https://cwiki.apache.org/confluence/display/solr/Result+Grouping#ResultGrouping-DistributedResultGroupingCaveats
>
> --
> Thanks
> Jay
>
>



-- 
Thanks
Jay Potharaju


result grouping in sharded index

2016-06-13 Thread Jay Potharaju
Hi,
I am working on functionality that would require me to group documents by an
id field. I read that the ngroups feature would not work in a sharded index.
Can someone recommend how to handle this in a sharded index?


Solr Version: 5.5

https://cwiki.apache.org/confluence/display/solr/Result+Grouping#ResultGrouping-DistributedResultGroupingCaveats

-- 
Thanks
Jay


Re: Slow date filter query

2016-05-30 Thread Jay Potharaju
There are about 30 million docs and the index size is 75 GB. I am using a full
timestamp value when querying and not using NOW. The fq queries cover almost
all the docs (20+ million) in the index.
Thanks


On Mon, May 30, 2016 at 8:17 PM, Erick Erickson <erickerick...@gmail.com>
wrote:

> Oops, fat fingers.
>
> see:
> searchhub.org/2012/02/23/date-math-now-and-filter-queries/
>
> If you're not re-using the _same_ filter query, you'll be better
> off using fq={!cache=false}range_query
>
> Best,
> Erick
>
> On Mon, May 30, 2016 at 8:16 PM, Erick Erickson <erickerick...@gmail.com>
> wrote:
> > That does seem long, but you haven't provided many details
> > about the fields. Are there 100 docs in your index? 100M docs? 500M docs?
> >
> > Are you using NOW in appropriately? See:
> >
> > On Fri, May 27, 2016 at 1:32 PM, Jay Potharaju <jspothar...@gmail.com>
> wrote:
> >> Hi,
> >> I am running filter query(range query) on date fields(high cardinality)
> and
> >> the performance is really bad ...it takes about 2-5 seconds for it to
> come
> >> back with response. I am rebuilding the index to have docvalues & tdates
> >> instead of "date" field. But not sure if that will alleviate the problem
> >> because of high cardinality.
> >>
> >> Can I store the date as MMDD and run range queries on them instead
> of
> >> date fields?
> >> Is that a good option?
> >>
> >> --
> >> Thanks
> >> Jay
>



-- 
Thanks
Jay Potharaju


Slow date filter query

2016-05-27 Thread Jay Potharaju
Hi,
I am running a filter query (range query) on date fields (high cardinality)
and the performance is really bad: it takes about 2-5 seconds to come back
with a response. I am rebuilding the index to use docValues & tdate instead of
the "date" field type, but I am not sure that will alleviate the problem
because of the high cardinality.

Can I store the date as YYYYMMDD and run range queries on that instead of
date fields?
Is that a good option?
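A sketch of the date-as-integer (YYYYMMDD) idea above, assuming the values are
written by the indexing client (the field name date_int is illustrative):
integer order matches chronological order, so a numeric range filter such as
fq=date_int:[20160401 TO 20160527] behaves like a date range.

```python
# Sketch: encode a date as a YYYYMMDD integer so numeric order
# equals chronological order.
from datetime import date

def to_yyyymmdd(d: date) -> int:
    return d.year * 10000 + d.month * 100 + d.day

start = to_yyyymmdd(date(2016, 4, 1))
end = to_yyyymmdd(date(2016, 5, 27))
print(start, end)  # 20160401 20160527
```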

-- 
Thanks
Jay


Re: debugging solr query

2016-05-27 Thread Jay Potharaju
Thanks for the suggestion. At this time I won't be able to change any code in
the API; my options are limited to changing things at the Solr level. Any
suggestions regarding Solr settings in the config or schema changes are within
my control.



On Fri, May 27, 2016 at 7:03 AM, Ahmet Arslan <iori...@yahoo.com> wrote:

> Hi Jay,
>
> Please separate the clauses. Feed one of them to the main q parameter with
> the constant-score operator ^= since you are sorting on a structured field
> (e.g. date):
>
> q=fieldB:(123 OR 456)^=1.0
> &fq=dt1:[date1 TO *]
> &fq=dt2:[* TO NOW/DAY+1]
> &fq=fieldA:abc
> &sort=dt1 asc,field2 asc, fieldC desc
>
> Play with the caches.
> Also consider disabling caching, and/or supplying execution order for the
> filer queries.
> Please see :
> https://lucidworks.com/blog/2012/02/10/advanced-filter-caching-in-solr/
>
> Ahmet
>
>
>
> On Friday, May 27, 2016 4:01 PM, Jay Potharaju <jspothar...@gmail.com>
> wrote:
> I updated almost 1/3 of the data and ran my queries with new columns as
> mentioned earlier. The query returns data in  almost half the time as
> compared to before.
> I am thinking that if I update all the columns there would not be much
> difference in query response time.
>
> Are there any suggestions on how to handle filtering/querying/sorting on
> high-cardinality date fields?
>
> Index size: 30Million
> Solr: 4.3.1
>
> Thanks
>
> On Thu, May 26, 2016 at 6:04 AM, Jay Potharaju <jspothar...@gmail.com>
> wrote:
>
> > Hi,
> > Thanks for the feedback. The queries I run are very basic filter queries
> > with some sorting.
> >
> > q:*:*=(dt1:[date1 TO *] && dt2:[* TO NOW/DAY+1]) && fieldA:abc &&
> > fieldB:(123 OR 456)=dt1 asc,field2 asc, fieldC desc
> >
> > I noticed that the date fields(dt1,dt2) are using date instead of tdate
> > fields & there are no docValues set on any of the fields used for
> sorting.
> >
> > In order to fix this I plan to add a new field using tdate & docvalues
> > where required to the schema & update the new columns only for documents
> > that have fieldA set to abc. Once the fields are updated query on the new
> > fields to measure query performance .
> >
> >
> >- Would the new added fields be used effectively by the solr index
> >when querying & filtering? What I am not sure is whether only
> populating
> >small number of documents(fieldA:abc) that are used for the above
> query
> >provide performance benefits.
> >- Would there be a performance penalty because majority of the
> >documents(!fieldA:abc) dont have values in the new columns?
> >
> > Thanks
> >
> > On Wed, May 25, 2016 at 8:40 PM, Jay Potharaju <jspothar...@gmail.com>
> > wrote:
> >
> >> Any links that illustrate and talk about solr internals and how
> >> indexing/querying works would be a great help.
> >> Thanks
> >> Jay
> >>
> >> On Wed, May 25, 2016 at 6:30 PM, Jay Potharaju <jspothar...@gmail.com>
> >> wrote:
> >>
> >>> Hi,
> >>> Thanks for the feedback. The queries I run are very basic filter
> queries
> >>> with some sorting.
> >>>
> >>> q:*:*=(dt1:[date1 TO *] && dt2:[* TO NOW/DAY+1]) && fieldA:abc &&
> >>> fieldB:(123 OR 456)=dt1 asc,field2 asc, fieldC desc
> >>>
> >>> I noticed that the date fields(dt1,dt2) are using date instead of tdate
> >>> fields & there are no docValues set on any of the fields used for
> sorting.
> >>>
> >>> In order to fix this I plan to add a new field using tdate & docvalues
> >>> where required to the schema & update the new columns only for
> documents
> >>> that have fieldA set to abc. Once the fields are updated query on the
> new
> >>> fields to measure query performance .
> >>>
> >>>
> >>>- Would the new added fields be used effectively by the solr index
> >>>when querying & filtering? What I am not sure is whether only
> populating
> >>>small number of documents(fieldA:abc) that are used for the above
> query
> >>>provide performance benefits.
> >>>- Would there be a performance penalty because majority of the
> >>>documents(!fieldA:abc) dont have values in the new columns?
> >>>
> >>>
> >>> Thanks
> >>> Jay
> >>>
> >>> On Tu

Re: debugging solr query

2016-05-27 Thread Jay Potharaju
I updated almost 1/3 of the data and ran my queries with new columns as
mentioned earlier. The query returns data in  almost half the time as
compared to before.
I am thinking that if I update all the columns there would not be much
difference in query response time.

Are there any suggestions on how to handle filtering/querying/sorting on
high-cardinality date fields?

Index size: 30 million docs
Solr: 4.3.1

Thanks

On Thu, May 26, 2016 at 6:04 AM, Jay Potharaju <jspothar...@gmail.com>
wrote:

> Hi,
> Thanks for the feedback. The queries I run are very basic filter queries
> with some sorting.
>
> q:*:*=(dt1:[date1 TO *] && dt2:[* TO NOW/DAY+1]) && fieldA:abc &&
> fieldB:(123 OR 456)=dt1 asc,field2 asc, fieldC desc
>
> I noticed that the date fields(dt1,dt2) are using date instead of tdate
> fields & there are no docValues set on any of the fields used for sorting.
>
> In order to fix this I plan to add a new field using tdate & docvalues
> where required to the schema & update the new columns only for documents
> that have fieldA set to abc. Once the fields are updated query on the new
> fields to measure query performance .
>
>
>- Would the new added fields be used effectively by the solr index
>when querying & filtering? What I am not sure is whether only populating
>small number of documents(fieldA:abc) that are used for the above query
>provide performance benefits.
>- Would there be a performance penalty because majority of the
>    documents(!fieldA:abc) dont have values in the new columns?
>
> Thanks
>
> On Wed, May 25, 2016 at 8:40 PM, Jay Potharaju <jspothar...@gmail.com>
> wrote:
>
>> Any links that illustrate and talk about solr internals and how
>> indexing/querying works would be a great help.
>> Thanks
>> Jay
>>
>> On Wed, May 25, 2016 at 6:30 PM, Jay Potharaju <jspothar...@gmail.com>
>> wrote:
>>
>>> Hi,
>>> Thanks for the feedback. The queries I run are very basic filter queries
>>> with some sorting.
>>>
>>> q:*:*=(dt1:[date1 TO *] && dt2:[* TO NOW/DAY+1]) && fieldA:abc &&
>>> fieldB:(123 OR 456)=dt1 asc,field2 asc, fieldC desc
>>>
>>> I noticed that the date fields(dt1,dt2) are using date instead of tdate
>>> fields & there are no docValues set on any of the fields used for sorting.
>>>
>>> In order to fix this I plan to add a new field using tdate & docvalues
>>> where required to the schema & update the new columns only for documents
>>> that have fieldA set to abc. Once the fields are updated query on the new
>>> fields to measure query performance .
>>>
>>>
>>>- Would the new added fields be used effectively by the solr index
>>>when querying & filtering? What I am not sure is whether only populating
>>>small number of documents(fieldA:abc) that are used for the above query
>>>provide performance benefits.
>>>- Would there be a performance penalty because majority of the
>>>documents(!fieldA:abc) dont have values in the new columns?
>>>
>>>
>>> Thanks
>>> Jay
>>>
>>> On Tue, May 24, 2016 at 8:06 PM, Erick Erickson <erickerick...@gmail.com
>>> > wrote:
>>>
>>>> Try adding debug=timing, that'll give you an idea of what component is
>>>> taking all the time.
>>>> From there, it's "more art than science".
>>>>
>>>> But you haven't given us much to go on. What is the query? Are you
>>>> grouping?
>>>> Faceting on high-cardinality fields? Returning 10,000 rows?
>>>>
>>>> Best,
>>>> Erick
>>>>
>>>> On Tue, May 24, 2016 at 4:52 PM, Ahmet Arslan <iori...@yahoo.com.invalid>
>>>> wrote:
>>>> >
>>>> >
>>>> > Hi,
>>>> >
>>>> > Is it QueryComponent taking time?
>>>> > Ot other components?
>>>> >
>>>> > Also make sure there is plenty of RAM for OS cache.
>>>> >
>>>> > Ahmet
>>>> >
>>>> > On Wednesday, May 25, 2016 1:47 AM, Jay Potharaju <
>>>> jspothar...@gmail.com> wrote:
>>>> >
>>>> >
>>>> >
>>>> > Hi,
>>>> > I am trying to debug solr performance problems on an old version of
>>>> solr,
>>>> > 4.3.1.
>>>> > The queries are taking really long -in the range of 2-5 seconds!!.
>>>> > Running filter query with only one condition also takes about a
>>>> second.
>>>> >
>>>> > There is memory available on the box for solr to use. I have been
>>>> looking
>>>> > at the following link but was looking for some more reference that
>>>> would
>>>> > tell me why a particular query is slow.
>>>> >
>>>> > https://wiki.apache.org/solr/SolrPerformanceProblems
>>>> >
>>>> > Solr version:4.3.1
>>>> > Index size:128 GB
>>>> > Heap:65 GB
>>>> > Index size:75 GB
>>>> > Memory usage:70 GB
>>>> >
>>>> > Even though there is available memory is high all is not being used
>>>> ..i
>>>> > would expect the complete index to be in memory but it doesnt look
>>>> like it
>>>> > is. Any recommendations ??
>>>> >
>>>> > --
>>>> > Thanks
>>>> > Jay
>>>>
>>>
>>>
>>>
>>> --
>>> Thanks
>>> Jay Potharaju
>>>
>>>
>>
>>
>>
>> --
>> Thanks
>> Jay Potharaju
>>
>>
>
>
>
> --
> Thanks
> Jay Potharaju
>
>



-- 
Thanks
Jay Potharaju


Re: debugging solr query

2016-05-26 Thread Jay Potharaju
Hi,
Thanks for the feedback. The queries I run are very basic filter queries
with some sorting.

q=*:*&fq=(dt1:[date1 TO *] && dt2:[* TO NOW/DAY+1]) && fieldA:abc &&
fieldB:(123 OR 456)&sort=dt1 asc,field2 asc, fieldC desc

I noticed that the date fields(dt1,dt2) are using date instead of tdate
fields & there are no docValues set on any of the fields used for sorting.

In order to fix this I plan to add new fields to the schema using tdate &
docValues where required, and update the new columns only for documents that
have fieldA set to abc. Once the fields are updated, I will query on the new
fields to measure query performance.


   - Would the newly added fields be used effectively by the Solr index when
   querying & filtering? What I am not sure about is whether populating only
   the small number of documents (fieldA:abc) that are used for the above
   query provides performance benefits.
   - Would there be a performance penalty because the majority of the
   documents (!fieldA:abc) don't have values in the new columns?

Thanks

On Wed, May 25, 2016 at 8:40 PM, Jay Potharaju <jspothar...@gmail.com>
wrote:

> Any links that illustrate and talk about solr internals and how
> indexing/querying works would be a great help.
> Thanks
> Jay
>
> On Wed, May 25, 2016 at 6:30 PM, Jay Potharaju <jspothar...@gmail.com>
> wrote:
>
>> Hi,
>> Thanks for the feedback. The queries I run are very basic filter queries
>> with some sorting.
>>
>> q:*:*=(dt1:[date1 TO *] && dt2:[* TO NOW/DAY+1]) && fieldA:abc &&
>> fieldB:(123 OR 456)=dt1 asc,field2 asc, fieldC desc
>>
>> I noticed that the date fields(dt1,dt2) are using date instead of tdate
>> fields & there are no docValues set on any of the fields used for sorting.
>>
>> In order to fix this I plan to add a new field using tdate & docvalues
>> where required to the schema & update the new columns only for documents
>> that have fieldA set to abc. Once the fields are updated query on the new
>> fields to measure query performance .
>>
>>
>>- Would the new added fields be used effectively by the solr index
>>when querying & filtering? What I am not sure is whether only populating
>>small number of documents(fieldA:abc) that are used for the above query
>>provide performance benefits.
>>- Would there be a performance penalty because majority of the
>>documents(!fieldA:abc) dont have values in the new columns?
>>
>>
>> Thanks
>> Jay
>>
>> On Tue, May 24, 2016 at 8:06 PM, Erick Erickson <erickerick...@gmail.com>
>> wrote:
>>
>>> Try adding debug=timing, that'll give you an idea of what component is
>>> taking all the time.
>>> From there, it's "more art than science".
>>>
>>> But you haven't given us much to go on. What is the query? Are you
>>> grouping?
>>> Faceting on high-cardinality fields? Returning 10,000 rows?
>>>
>>> Best,
>>> Erick
>>>
>>> On Tue, May 24, 2016 at 4:52 PM, Ahmet Arslan <iori...@yahoo.com.invalid>
>>> wrote:
>>> >
>>> >
>>> > Hi,
>>> >
>>> > Is it QueryComponent taking time?
>>> > Ot other components?
>>> >
>>> > Also make sure there is plenty of RAM for OS cache.
>>> >
>>> > Ahmet
>>> >
>>> > On Wednesday, May 25, 2016 1:47 AM, Jay Potharaju <
>>> jspothar...@gmail.com> wrote:
>>> >
>>> >
>>> >
>>> > Hi,
>>> > I am trying to debug solr performance problems on an old version of
>>> solr,
>>> > 4.3.1.
>>> > The queries are taking really long -in the range of 2-5 seconds!!.
>>> > Running filter query with only one condition also takes about a second.
>>> >
>>> > There is memory available on the box for solr to use. I have been
>>> looking
>>> > at the following link but was looking for some more reference that
>>> would
>>> > tell me why a particular query is slow.
>>> >
>>> > https://wiki.apache.org/solr/SolrPerformanceProblems
>>> >
>>> > Solr version:4.3.1
>>> > Index size:128 GB
>>> > Heap:65 GB
>>> > Index size:75 GB
>>> > Memory usage:70 GB
>>> >
>>> > Even though there is available memory is high all is not being used ..i
>>> > would expect the complete index to be in memory but it doesnt look
>>> like it
>>> > is. Any recommendations ??
>>> >
>>> > --
>>> > Thanks
>>> > Jay
>>>
>>
>>
>>
>> --
>> Thanks
>> Jay Potharaju
>>
>>
>
>
>
> --
> Thanks
> Jay Potharaju
>
>



-- 
Thanks
Jay Potharaju


Re: debugging solr query

2016-05-25 Thread Jay Potharaju
Any links that illustrate and talk about solr internals and how
indexing/querying works would be a great help.
Thanks
Jay

On Wed, May 25, 2016 at 6:30 PM, Jay Potharaju <jspothar...@gmail.com>
wrote:

> Hi,
> Thanks for the feedback. The queries I run are very basic filter queries
> with some sorting.
>
> q:*:*=(dt1:[date1 TO *] && dt2:[* TO NOW/DAY+1]) && fieldA:abc &&
> fieldB:(123 OR 456)=dt1 asc,field2 asc, fieldC desc
>
> I noticed that the date fields(dt1,dt2) are using date instead of tdate
> fields & there are no docValues set on any of the fields used for sorting.
>
> In order to fix this I plan to add a new field using tdate & docvalues
> where required to the schema & update the new columns only for documents
> that have fieldA set to abc. Once the fields are updated query on the new
> fields to measure query performance .
>
>
>- Would the new added fields be used effectively by the solr index
>when querying & filtering? What I am not sure is whether only populating
>small number of documents(fieldA:abc) that are used for the above query
>provide performance benefits.
>- Would there be a performance penalty because majority of the
>documents(!fieldA:abc) dont have values in the new columns?
>
>
> Thanks
> Jay
>
> On Tue, May 24, 2016 at 8:06 PM, Erick Erickson <erickerick...@gmail.com>
> wrote:
>
>> Try adding debug=timing, that'll give you an idea of what component is
>> taking all the time.
>> From there, it's "more art than science".
>>
>> But you haven't given us much to go on. What is the query? Are you
>> grouping?
>> Faceting on high-cardinality fields? Returning 10,000 rows?
>>
>> Best,
>> Erick
>>
>> On Tue, May 24, 2016 at 4:52 PM, Ahmet Arslan <iori...@yahoo.com.invalid>
>> wrote:
>> >
>> >
>> > Hi,
>> >
>> > Is it QueryComponent taking time?
>> > Ot other components?
>> >
>> > Also make sure there is plenty of RAM for OS cache.
>> >
>> > Ahmet
>> >
>> > On Wednesday, May 25, 2016 1:47 AM, Jay Potharaju <
>> jspothar...@gmail.com> wrote:
>> >
>> >
>> >
>> > Hi,
>> > I am trying to debug solr performance problems on an old version of
>> solr,
>> > 4.3.1.
>> > The queries are taking really long -in the range of 2-5 seconds!!.
>> > Running filter query with only one condition also takes about a second.
>> >
>> > There is memory available on the box for solr to use. I have been
>> looking
>> > at the following link but was looking for some more reference that would
>> > tell me why a particular query is slow.
>> >
>> > https://wiki.apache.org/solr/SolrPerformanceProblems
>> >
>> > Solr version:4.3.1
>> > Index size:128 GB
>> > Heap:65 GB
>> > Index size:75 GB
>> > Memory usage:70 GB
>> >
>> > Even though there is available memory is high all is not being used ..i
>> > would expect the complete index to be in memory but it doesnt look like
>> it
>> > is. Any recommendations ??
>> >
>> > --
>> > Thanks
>> > Jay
>>
>
>
>
> --
> Thanks
> Jay Potharaju
>
>



-- 
Thanks
Jay Potharaju


Re: How to save index data to other place? [scottchu]

2016-05-25 Thread Jay Potharaju
use property.dataDir=value
https://cwiki.apache.org/confluence/display/solr/Defining+core.properties

On Wed, May 25, 2016 at 8:20 PM, scott.chu <scott@udngroup.com> wrote:

>
> When I create a collection, say named 'cugna'. Solr create a folder with
> same name under server\slolr, e.g. /local/solr-5.4.1/server/solr/cugna.
> Index data is also saved there. But wish to save index data on other
> folder, say /var/sc_data/cugna. How can I dothis?
>
> scott.chu,scott@udngroup.com
> 2016/5/26 (週四)
>



-- 
Thanks
Jay Potharaju
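For reference, a minimal core.properties sketch, assuming the "cugna" core
from the question (the paths are illustrative):

```properties
# server/solr/cugna/core.properties
name=cugna
# dataDir may be a relative or absolute path
dataDir=/var/sc_data/cugna
```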


Re: How to perform a contains query

2016-05-25 Thread Jay Potharaju
code and a custom JMX configuration, allows remote attackers to
>> execute
>> : > arbitrary code by uploading and accessing a JSP file.",
>> : >
>> : > "summary": "A certain tomcat7 package for Apache Tomcat 7 in
>> Red Hat
>> : > Enterprise Linux (RHEL) 7 allows remote attackers to cause a denial of
>> : > service (CPU consumption) via a crafted request.  NOTE: this
>> vulnerability
>> : > exists because of an unspecified regression.",
>> : >
>> : > "summary": "Apache Tomcat 7.0.0 through 7.0.3, 6.0.x, and
>> 5.5.x,
>> : > when running within a SecurityManager, does not make the
>> ServletContext
>> : > attribute read-only, which allows local web applications to read or
>> write
>> : > files outside of the intended working directory, as demonstrated
>> using a
>> : > directory traversal attack.",
>> : >
>> : > "summary": "Apache Tomcat 7.0.11, when web.xml has no login
>> : > configuration, does not follow security constraints, which allows
>> remote
>> : > attackers to bypass intended access restrictions via HTTP requests to
>> a
>> : > meta-data complete web application.  NOTE: this vulnerability exists
>> because
>> : > of an incorrect fix for CVE-2011-1088 and CVE-2011-1419.",
>> : >
>> : > "summary": "Apache Tomcat 7.x before 7.0.11, when web.xml has
>> no
>> : > security constraints, does not follow ServletSecurity annotations,
>> which
>> : > allows remote attackers to bypass intended access restrictions via
>> HTTP
>> : > requests to a web application.  NOTE: this vulnerability exists
>> because of
>> : > an incomplete fix for CVE-2011-1088.",
>> : >
>> : > "summary": "The HTTP BIO connector in Apache Tomcat 7.0.x
>> before
>> : > 7.0.12 does not properly handle HTTP pipelining, which allows remote
>> : > attackers to read responses intended for other clients in
>> opportunistic
>> : > circumstances by examining the application data in HTTP packets,
>> related to
>> : > \"a mix-up of responses for requests from different users.\"",
>> : >
>> : > "summary": "Apache Tomcat 7.0.12 and 7.0.13 processes the
>> first
>> : > request to a servlet without following security constraints that have
>> been
>> : > configured through annotations, which allows remote attackers to
>> bypass
>> : > intended access restrictions via HTTP requests. NOTE: this
>> vulnerability
>> : > exists because of an incomplete fix for CVE-2011-1088, CVE-2011-1183,
>> and
>> : > CVE-2011-1419.",
>> : >
>> : > "summary": "Apache Tomcat 7.0.x before 7.0.17 permits web
>> : > applications to replace an XML parser used for other web
>> applications, which
>> : > allows local users to read or modify the (1) web.xml, (2)
>> context.xml, or
>> : > (3) tld files of arbitrary web applications via a crafted application
>> that
>> : > is loaded earlier than the target application.  NOTE: this
>> vulnerability
>> : > exists because of a CVE-2009-0783 regression.",
>> : >
>> : > "summary": "Certain AJP protocol connector implementations in
>> Apache
>> : > Tomcat 7.0.0 through 7.0.20, 6.0.0 through 6.0.33, 5.5.0 through
>> 5.5.33, and
>> : > possibly other versions allow remote attackers to spoof AJP requests,
>> bypass
>> : > authentication, and obtain sensitive information by causing the
>> connector to
>> : > interpret a request body as a new request.",
>> : >
>> : > "summary": "** DISPUTED ** Apache Tomcat 7.x uses
>> world-readable
>> : > permissions for the log directory and its files, which might allow
>> local
>> : > users to obtain sensitive information by reading a file. NOTE: One
>> Tomcat
>> : > distributor has stated \"The tomcat log directory does not contain any
>> : > sensitive information.\"",
>> : >
>> : > "summary":
>> "java/org/apache/catalina/core/AsyncContextImpl.java in
>> : > Apache Tomcat 7.x before 7.0.40 does not properly handle the throwing
>> of a
>> : > RuntimeException in an AsyncListener in an application, which allows
>> : > context-dependent attackers to obtain sensitive request information
>> intended
>> : > for other applications in opportunistic circumstances via an
>> application
>> : > that records the requests that it processes.",
>> : >
>> : > "summary": "Session fixation vulnerability in Apache Tomcat
>> 7.x
>> : > before 7.0.66, 8.x before 8.0.30, and 9.x before 9.0.0.M2, when
>> different
>> : > session settings are used for deployments of multiple versions of the
>> same
>> : > web application, might allow remote attackers to hijack web sessions
>> by
>> : > leveraging use of a requestedSessionSSL field for an unintended
>> request,
>> : > related to CoyoteAdapter.java and Request.java.",
>> : >
>> : > "summary": "The (1) Manager and (2) Host Manager applications
>> in
>> : > Apache Tomcat 7.x before 7.0.68, 8.x before 8.0.31, and 9.x before
>> 9.0.0.M2
>> : > establish sessions and send CSRF tokens for arbitrary new requests,
>> which
>> : > allows remote attackers to bypass a CSRF protection mechanism by
>> using a
>> : > token.",
>> : >
>> : > "summary": "The setGlobalContext method in
>> : > org/apache/naming/factory/ResourceLinkFactory.java in Apache Tomcat
>> 7.x
>> : > before 7.0.68, 8.x before 8.0.31, and 9.x before 9.0.0.M3 does not
>> consider
>> : > whether ResourceLinkFactory.setGlobalContext callers are authorized,
>> which
>> : > allows remote authenticated users to bypass intended SecurityManager
>> : > restrictions and read or write to arbitrary application data, or
>> cause a
>> : > denial of service (application disruption), via a web application
>> that sets
>> : > a crafted global context.",
>> :
>> :
>>
>> -Hoss
>> http://www.lucidworks.com/
>>
>
>


-- 
Thanks
Jay Potharaju


Re: debugging solr query

2016-05-25 Thread Jay Potharaju
Hi,
Thanks for the feedback. The queries I run are very basic filter queries
with some sorting.

q=*:*&fq=(dt1:[date1 TO *] && dt2:[* TO NOW/DAY+1]) && fieldA:abc &&
fieldB:(123 OR 456)&sort=dt1 asc,field2 asc, fieldC desc

I noticed that the date fields(dt1,dt2) are using date instead of tdate
fields & there are no docValues set on any of the fields used for sorting.

In order to fix this I plan to add new fields to the schema using tdate &
docValues where required, and update the new columns only for documents that
have fieldA set to abc. Once the fields are updated, I will query on the new
fields to measure query performance.


   - Would the newly added fields be used effectively by the Solr index when
   querying & filtering? What I am not sure about is whether populating only
   the small number of documents (fieldA:abc) that are used for the above
   query provides performance benefits.
   - Would there be a performance penalty because the majority of the
   documents (!fieldA:abc) don't have values in the new columns?


Thanks
Jay

On Tue, May 24, 2016 at 8:06 PM, Erick Erickson <erickerick...@gmail.com>
wrote:

> Try adding debug=timing, that'll give you an idea of what component is
> taking all the time.
> From there, it's "more art than science".
>
> But you haven't given us much to go on. What is the query? Are you
> grouping?
> Faceting on high-cardinality fields? Returning 10,000 rows?
>
> Best,
> Erick
>
> On Tue, May 24, 2016 at 4:52 PM, Ahmet Arslan <iori...@yahoo.com.invalid>
> wrote:
> >
> >
> > Hi,
> >
> > Is it QueryComponent taking time?
> > Ot other components?
> >
> > Also make sure there is plenty of RAM for OS cache.
> >
> > Ahmet
> >
> > On Wednesday, May 25, 2016 1:47 AM, Jay Potharaju <jspothar...@gmail.com>
> wrote:
> >
> >
> >
> > Hi,
> > I am trying to debug solr performance problems on an old version of solr,
> > 4.3.1.
> > The queries are taking really long -in the range of 2-5 seconds!!.
> > Running filter query with only one condition also takes about a second.
> >
> > There is memory available on the box for solr to use. I have been looking
> > at the following link but was looking for some more reference that would
> > tell me why a particular query is slow.
> >
> > https://wiki.apache.org/solr/SolrPerformanceProblems
> >
> > Solr version:4.3.1
> > Index size:128 GB
> > Heap:65 GB
> > Index size:75 GB
> > Memory usage:70 GB
> >
> > Even though there is available memory is high all is not being used ..i
> > would expect the complete index to be in memory but it doesnt look like
> it
> > is. Any recommendations ??
> >
> > --
> > Thanks
> > Jay
>



-- 
Thanks
Jay Potharaju


debugging solr query

2016-05-24 Thread Jay Potharaju
Hi,
I am trying to debug Solr performance problems on an old version of Solr,
4.3.1.
The queries are taking really long - in the range of 2-5 seconds!
Running a filter query with only one condition also takes about a second.

There is memory available on the box for solr to use. I have been looking
at the following link but was looking for some more reference that would
tell me why a particular query is slow.

https://wiki.apache.org/solr/SolrPerformanceProblems

Solr version:4.3.1
Index size:128 GB
Heap:65 GB
Index size:75 GB
Memory usage:70 GB

Even though the available memory is high, not all of it is being used. I
would expect the complete index to be in memory, but it doesn't look like it
is. Any recommendations?

-- 
Thanks
Jay
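[Editor's note] The debug=timing suggestion in the reply above breaks query time down per search component. A sketch of reading that section of the response; the shape mirrors Solr's debug output, but the millisecond values here are invented sample data:

```python
# Shape mirrors Solr's debug=timing section; values are invented samples.
timing = {
    "time": 2100.0,
    "prepare": {"time": 5.0, "query": {"time": 2.0}},
    "process": {
        "time": 2050.0,
        "query": {"time": 1900.0},
        "facet": {"time": 120.0},
        "highlight": {"time": 30.0},
    },
}

def slowest_component(section):
    """Return the sub-component with the largest 'time' in a timing section."""
    comps = {k: v["time"] for k, v in section.items() if isinstance(v, dict)}
    return max(comps, key=comps.get)

print(slowest_component(timing["process"]))  # -> query
```

Whichever component dominates the process section is where tuning effort should go first.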


Re: Filter queries & caching

2016-05-10 Thread Jay Potharaju
Thanks for the explanation Erick.

So that I understand this clearly


1)  fq=filter(fromfield:[* TO NOW/DAY+1DAY]&& tofield:[NOW/DAY-7DAY TO *])
&& fq=type:abc
2) fq= fromfield:[* TO NOW/DAY+1DAY]&& fq=tofield:[NOW/DAY-7DAY TO *]) &&
fq=type:abc

Using 1) would benefit from having 2 separate filter-cache entries instead of
3 slots in the cache. But in general both would be using the filter cache.
And secondly, it would be more useful to use filter() in a scenario like the
one above (mentioned in your email).
Thanks
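[Editor's sketch] The cache-entry counting argument can be made concrete with a toy model that treats each distinct cached unit as one filterCache entry. This is a simulation of the bookkeeping, not Solr's actual cache:

```python
# Toy filterCache: each distinct cached unit is one entry.
cache = set()

def run_fq(fq_string):
    cache.add(fq_string)        # a whole fq string caches as one unit

def run_filter_clauses(clauses):
    cache.update(clauses)       # filter(X) caches each clause on its own

# Four queries written as whole fq strings -> four entries:
for fq in ["A OR B", "A AND B", "A", "B"]:
    run_fq(fq)
print(len(cache))               # -> 4

cache.clear()
# The same four queries written with filter(A)/filter(B) -> two entries:
for clauses in (["A", "B"], ["A", "B"], ["A"], ["B"]):
    run_filter_clauses(clauses)
print(len(cache))               # -> 2
```

The second style satisfies all four queries from only two cached clauses, which is the saving filter() buys inside compound q clauses.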




On Mon, May 9, 2016 at 9:43 PM, Erick Erickson <erickerick...@gmail.com>
wrote:

> You're confusing a query clause with fq when thinking about filter() I
> think.
>
> Essentially they don't need to be used together, i.e.
>
> q=myclause AND filter(field:value)
>
> is identical to
>
> q=myclause=field:value
>
> both in docs returned and filterCache usage.
>
> q=myclause(fq=field:value)
>
> actually uses two filterCache entries, so is probably not what you want to
> use.
>
> the filter() syntax attached to a q clause (not an fq clause) is meant
> to allow you to get speedups
> you want to use compound clauses without having every combination be
> separate filterCache entries.
>
> Consider the following:
> fq=A OR B
> fq=A AND B
> fq=A
> fq=B
>
> These would require 4 filterCache entries.
>
> q=filter(A) OR filter(B)
> q=filter(A) AND filter(B)
> q=filter(A)
> q=filter(B)
>
> would only require two. Yet all of them would be satisfied only by
> looking at the filterCache.
>
> Aside from the example immediately above, which one you use is largely
> a matter of taste.
>
> Best,
> Erick
>
> On Mon, May 9, 2016 at 12:47 PM, Jay Potharaju <jspothar...@gmail.com>
> wrote:
> > Thanks Ahmet...but I am not still clear how is adding filter() option
> > better or is it the same as filtercache?
> >
> > My question is below.
> >
> > "As mentioned above adding filter() will add the filter query to the
> cache.
> > This would mean that results are fetched from cache instead of running n
> > number of filter queries  in parallel.
> > Is it necessary to use the filter() option? I was under the impression
> that
> > all filter queries will get added to the "filtercache". What is the
> > advantage of using filter()?"
> >
> > Thanks
> >
> > On Sun, May 8, 2016 at 6:30 PM, Ahmet Arslan <iori...@yahoo.com.invalid>
> > wrote:
> >
> >> Hi,
> >>
> >> As I understand it useful incase you use an OR operator between two
> >> restricting clauses.
> >> Recall that multiple fq means implicit AND.
> >>
> >> ahmet
> >>
> >>
> >>
> >> On Monday, May 9, 2016 4:02 AM, Jay Potharaju <jspothar...@gmail.com>
> >> wrote:
> >> As mentioned above adding filter() will add the filter query to the
> cache.
> >> This would mean that results are fetched from cache instead of running n
> >> number of filter queries  in parallel.
> >> Is it necessary to use the filter() option? I was under the impression
> that
> >> all filter queries will get added to the "filtercache". What is the
> >> advantage of using filter()?
> >>
> >> *From
> >> doc:
> >>
> https://cwiki.apache.org/confluence/display/solr/Query+Settings+in+SolrConfig
> >> <
> >>
> https://cwiki.apache.org/confluence/display/solr/Query+Settings+in+SolrConfig
> >> >*
> >> This cache is used by SolrIndexSearcher for filters (DocSets) for
> unordered
> >> sets of all documents that match a query. The numeric attributes control
> >> the number of entries in the cache.
> >> Solr uses the filterCache to cache results of queries that use the fq
> >> search parameter. Subsequent queries using the same parameter setting
> >> result in cache hits and rapid returns of results. See Searching for a
> >> detailed discussion of the fq parameter.
> >>
> >> *From Yonik's site: http://yonik.com/solr/query-syntax/#FilterQuery
> >> <http://yonik.com/solr/query-syntax/#FilterQuery>*
> >>
> >> (Since Solr 5.4)
> >>
> >> A filter query retrieves a set of documents matching a query from the
> >> filter cache. Since scores are not cached, all documents that match the
> >> filter produce the same score (0 by default). Cached filters will be
> >> extremely fast when they are used again in another query.
> >>
> >>
> >> Thanks
> >>
> >>
> >> On Fr

Error on creating new collection with existing configs

2016-05-09 Thread Jay Potharaju
Hi,
I created a new config and uploaded it to zk with the name test_conf. And
then created a collection which uses this config.

CREATE COLLECTION:
/solr/admin/collections?action=CREATE&name=test2&numShards=1&replicationFactor=2&collection.configName=test_conf

 When indexing the data using DIH I get an error.

org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode

for /configs/test2/dataimport.properties


When I create the collection using the command line and don't pass the
configname but just the confdir, DIH indexing works.

Using Solr 5.5

Am I missing something??

-- 
Thanks
Jay
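[Editor's sketch] For reference, the same CREATE call with the standard Collections API parameter names spelled out; the host and port are assumptions, and the key point is that collection.configName selects the config already uploaded to ZooKeeper:

```python
from urllib.parse import urlencode

# Standard Collections API CREATE parameters; adjust host/port to your setup.
params = {
    "action": "CREATE",
    "name": "test2",
    "numShards": 1,
    "replicationFactor": 2,
    "collection.configName": "test_conf",
}
url = "http://localhost:8983/solr/admin/collections?" + urlencode(params)
print(url)
```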


Re: Filter queries & caching

2016-05-09 Thread Jay Potharaju
Thanks Ahmet... but I am still not clear: how is adding the filter() option
better, or is it the same as the filterCache?

My question is below.

"As mentioned above adding filter() will add the filter query to the cache.
This would mean that results are fetched from cache instead of running n
number of filter queries  in parallel.
Is it necessary to use the filter() option? I was under the impression that
all filter queries will get added to the "filtercache". What is the
advantage of using filter()?"

Thanks

On Sun, May 8, 2016 at 6:30 PM, Ahmet Arslan <iori...@yahoo.com.invalid>
wrote:

> Hi,
>
> As I understand it useful incase you use an OR operator between two
> restricting clauses.
> Recall that multiple fq means implicit AND.
>
> ahmet
>
>
>
> On Monday, May 9, 2016 4:02 AM, Jay Potharaju <jspothar...@gmail.com>
> wrote:
> As mentioned above adding filter() will add the filter query to the cache.
> This would mean that results are fetched from cache instead of running n
> number of filter queries  in parallel.
> Is it necessary to use the filter() option? I was under the impression that
> all filter queries will get added to the "filtercache". What is the
> advantage of using filter()?
>
> *From
> doc:
> https://cwiki.apache.org/confluence/display/solr/Query+Settings+in+SolrConfig
> <
> https://cwiki.apache.org/confluence/display/solr/Query+Settings+in+SolrConfig
> >*
> This cache is used by SolrIndexSearcher for filters (DocSets) for unordered
> sets of all documents that match a query. The numeric attributes control
> the number of entries in the cache.
> Solr uses the filterCache to cache results of queries that use the fq
> search parameter. Subsequent queries using the same parameter setting
> result in cache hits and rapid returns of results. See Searching for a
> detailed discussion of the fq parameter.
>
> *From Yonik's site: http://yonik.com/solr/query-syntax/#FilterQuery
> <http://yonik.com/solr/query-syntax/#FilterQuery>*
>
> (Since Solr 5.4)
>
> A filter query retrieves a set of documents matching a query from the
> filter cache. Since scores are not cached, all documents that match the
> filter produce the same score (0 by default). Cached filters will be
> extremely fast when they are used again in another query.
>
>
> Thanks
>
>
> On Fri, May 6, 2016 at 9:46 AM, Jay Potharaju <jspothar...@gmail.com>
> wrote:
>
> > We have high query load and considering that I think the suggestions made
> > above will help with performance.
> > Thanks
> > Jay
> >
> > On Fri, May 6, 2016 at 7:26 AM, Shawn Heisey <apa...@elyograg.org>
> wrote:
> >
> >> On 5/6/2016 7:19 AM, Shawn Heisey wrote:
> >> > With three separate
> >> > fq parameters, you'll get three cache entries in filterCache from the
> >> > one query.
> >>
> >> One more tidbit of information related to this:
> >>
> >> When you have multiple filters and they aren't cached, I am reasonably
> >> certain that they run in parallel.  Instead of one complex filter, you
> >> would have three simple filters running simultaneously.  For low to
> >> medium query loads on a server with a whole bunch of CPUs, where there
> >> is plenty of spare CPU power, this can be a real gain in performance ...
> >> but if the query load is really high, it might be a bad thing.
> >>
> >> Thanks,
> >> Shawn
> >>
> >>
> >
> >
> > --
> > Thanks
> > Jay Potharaju
>
> >
> >
>
>
>
> --
> Thanks
> Jay Potharaju
>



-- 
Thanks
Jay Potharaju


Re: Filter queries & caching

2016-05-08 Thread Jay Potharaju
As mentioned above, adding filter() will add the filter query to the cache.
This would mean that results are fetched from the cache instead of running n
number of filter queries in parallel.
Is it necessary to use the filter() option? I was under the impression that
all filter queries get added to the "filtercache". What is the
advantage of using filter()?

*From
doc: 
https://cwiki.apache.org/confluence/display/solr/Query+Settings+in+SolrConfig
<https://cwiki.apache.org/confluence/display/solr/Query+Settings+in+SolrConfig>*
This cache is used by SolrIndexSearcher for filters (DocSets) for unordered
sets of all documents that match a query. The numeric attributes control
the number of entries in the cache.
Solr uses the filterCache to cache results of queries that use the fq
search parameter. Subsequent queries using the same parameter setting
result in cache hits and rapid returns of results. See Searching for a
detailed discussion of the fq parameter.

*From Yonik's site: http://yonik.com/solr/query-syntax/#FilterQuery
<http://yonik.com/solr/query-syntax/#FilterQuery>*

(Since Solr 5.4)

A filter query retrieves a set of documents matching a query from the
filter cache. Since scores are not cached, all documents that match the
filter produce the same score (0 by default). Cached filters will be
extremely fast when they are used again in another query.


Thanks


On Fri, May 6, 2016 at 9:46 AM, Jay Potharaju <jspothar...@gmail.com> wrote:

> We have high query load and considering that I think the suggestions made
> above will help with performance.
> Thanks
> Jay
>
> On Fri, May 6, 2016 at 7:26 AM, Shawn Heisey <apa...@elyograg.org> wrote:
>
>> On 5/6/2016 7:19 AM, Shawn Heisey wrote:
>> > With three separate
>> > fq parameters, you'll get three cache entries in filterCache from the
>> > one query.
>>
>> One more tidbit of information related to this:
>>
>> When you have multiple filters and they aren't cached, I am reasonably
>> certain that they run in parallel.  Instead of one complex filter, you
>> would have three simple filters running simultaneously.  For low to
>> medium query loads on a server with a whole bunch of CPUs, where there
>> is plenty of spare CPU power, this can be a real gain in performance ...
>> but if the query load is really high, it might be a bad thing.
>>
>> Thanks,
>> Shawn
>>
>>
>
>
> --
> Thanks
> Jay Potharaju
>
>



-- 
Thanks
Jay Potharaju


Re: understanding phonetic matching

2016-05-07 Thread Jay Potharaju
Thanks will check it out.


On Sat, May 7, 2016 at 7:05 PM, Susheel Kumar <susheel2...@gmail.com> wrote:

> Jay,
>
> There are mainly three phonetics algorithms available in Solr i.e.
> RefinedSoundex, DoubleMetaphone & BeiderMorse.  We did extensive comparison
> considering various tests cases and found BeiderMorse to be the best among
> those for finding sound like matches and it also supports multiple
> languages.  We also customized Beider Morse extensively for our use case.
>
> So please take a closer look at Beider Morse and i am sure it will help you
> out.
>
> Thanks,
> Susheel
>
> On Sat, May 7, 2016 at 2:13 PM, Jay Potharaju <jspothar...@gmail.com>
> wrote:
>
> > Thanks for the feedback, I was getting correct results when searching for
> > jon & john. But when I tried other names like 'khloe' it matched on
> > 'collier' because the phonetic filter generated KL as the token.
> > Is phonetic filter the best way to find similar sounding names?
> >
> >
> > On Wed, Mar 23, 2016 at 12:01 AM, davidphilip cherian <
> > davidphilipcher...@gmail.com> wrote:
> >
> > > The "phonetic_en" analyzer definition available in solr-schema does
> > return
> > > documents having "Jon", "JN", "John" when search term is "John".
> Checkout
> > > screen shot here : http://imgur.com/0R6SvX2
> > >
> > > This wiki page explains how phonetic matching works :
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/solr/Phonetic+Matching#PhoneticMatching-DoubleMetaphone
> > >
> > >
> > > Hope that helps.
> > >
> > >
> > >
> > > On Wed, Mar 23, 2016 at 11:18 AM, Alexandre Rafalovitch <
> > > arafa...@gmail.com>
> > > wrote:
> > >
> > > > I'd start by putting LowerCaseFF before the PhoneticFilter.
> > > >
> > > > But then, you say you were using Analysis screen and what? Do you get
> > > > the matches when you put your sample text and the query text in the
> > > > two boxes in the UI? I am not sure what "look at my solr data" means
> > > > in this particular context.
> > > >
> > > > Regards,
> > > >Alex.
> > > > 
> > > > Newsletter and resources for Solr beginners and intermediates:
> > > > http://www.solr-start.com/
> > > >
> > > >
> > > > On 23 March 2016 at 16:27, Jay Potharaju <jspothar...@gmail.com>
> > wrote:
> > > > > Hi,
> > > > > I am trying to do name matching using the phonetic filter factory.
> As
> > > > part
> > > > > of that I was analyzing the data using analysis screen in solr UI.
> > If i
> > > > > search for john, any documents containing john or jon should be
> > found.
> > > > >
> > > > > Following is my definition of the custom field that I use for
> > indexing
> > > > the
> > > > > data. When I look at my solr data I dont see any similar sounding
> > names
> > > > in
> > > > > my solr data, even though I have set inject="true". Is that not how
> > it
> > > is
> > > > > supposed to work?
> > > > > Can someone explain how phonetic matching works?
> > > > >
> > > > > <fieldType name="..." class="solr.TextField"
> > > > > positionIncrementGap="100">
> > > > >   <analyzer>
> > > > >     <tokenizer class="..."/>
> > > > >     <filter class="solr.PhoneticFilterFactory" encoder="DoubleMetaphone"
> > > > > inject="true" maxCodeLength="5"/>
> > > > >   </analyzer>
> > > > > </fieldType>
> > > > >
> > > > > --
> > > > > Thanks
> > > > > Jay
> > > >
> > >
> >
> >
> >
> > --
> > Thanks
> > Jay Potharaju
> >
>



-- 
Thanks
Jay Potharaju


Re: understanding phonetic matching

2016-05-07 Thread Jay Potharaju
Thanks for the feedback, I was getting correct results when searching for
jon & john. But when I tried other names like 'khloe' it matched on
'collier' because the phonetic filter generated KL as the token.
Is phonetic filter the best way to find similar sounding names?
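[Editor's sketch] To make the collision concrete, here is a deliberately crude consonant-skeleton key (this is NOT real DoubleMetaphone, just an illustration of the idea) showing how a short phonetic code makes 'khloe' and 'collier' fall together:

```python
def toy_phonetic_key(name, max_len=2):
    """Crude consonant skeleton, truncated to max_len characters.
    NOT DoubleMetaphone -- only a sketch of why short codes collide."""
    name = name.lower()
    mapped = {"c": "k", "q": "k"}          # a few shared-sound merges
    out = []
    for ch in name:
        ch = mapped.get(ch, ch)
        if ch in "aeiouhwy":               # drop vowels and weak letters
            continue
        if not out or out[-1] != ch:       # collapse doubled letters
            out.append(ch)
    return "".join(out)[:max_len].upper()

print(toy_phonetic_key("khloe"))    # KL
print(toy_phonetic_key("collier"))  # KL  (same key -> they "match")
```

With a longer max_len the keys diverge (KL vs KLR), which is why tuning the code length, or trying a different encoder, changes match quality so much.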


On Wed, Mar 23, 2016 at 12:01 AM, davidphilip cherian <
davidphilipcher...@gmail.com> wrote:

> The "phonetic_en" analyzer definition available in solr-schema does return
> documents having "Jon", "JN", "John" when search term is "John". Checkout
> screen shot here : http://imgur.com/0R6SvX2
>
> This wiki page explains how phonetic matching works :
>
> https://cwiki.apache.org/confluence/display/solr/Phonetic+Matching#PhoneticMatching-DoubleMetaphone
>
>
> Hope that helps.
>
>
>
> On Wed, Mar 23, 2016 at 11:18 AM, Alexandre Rafalovitch <
> arafa...@gmail.com>
> wrote:
>
> > I'd start by putting LowerCaseFF before the PhoneticFilter.
> >
> > But then, you say you were using Analysis screen and what? Do you get
> > the matches when you put your sample text and the query text in the
> > two boxes in the UI? I am not sure what "look at my solr data" means
> > in this particular context.
> >
> > Regards,
> >Alex.
> > 
> > Newsletter and resources for Solr beginners and intermediates:
> > http://www.solr-start.com/
> >
> >
> > On 23 March 2016 at 16:27, Jay Potharaju <jspothar...@gmail.com> wrote:
> > > Hi,
> > > I am trying to do name matching using the phonetic filter factory. As
> > part
> > > of that I was analyzing the data using analysis screen in solr UI. If i
> > > search for john, any documents containing john or jon should be found.
> > >
> > > Following is my definition of the custom field that I use for indexing
> > the
> > > data. When I look at my solr data I dont see any similar sounding names
> > in
> > > my solr data, even though I have set inject="true". Is that not how it
> is
> > > supposed to work?
> > > Can someone explain how phonetic matching works?
> > >
> > > <fieldType name="..." class="solr.TextField"
> > > positionIncrementGap="100">
> > >   <analyzer>
> > >     <tokenizer class="..."/>
> > >     <filter class="solr.PhoneticFilterFactory" encoder="DoubleMetaphone"
> > > inject="true" maxCodeLength="5"/>
> > >   </analyzer>
> > > </fieldType>
> > >
> > > --
> > > Thanks
> > > Jay
> >
>



-- 
Thanks
Jay Potharaju


Re: Filter queries & caching

2016-05-06 Thread Jay Potharaju
We have high query load and considering that I think the suggestions made
above will help with performance.
Thanks
Jay

On Fri, May 6, 2016 at 7:26 AM, Shawn Heisey <apa...@elyograg.org> wrote:

> On 5/6/2016 7:19 AM, Shawn Heisey wrote:
> > With three separate
> > fq parameters, you'll get three cache entries in filterCache from the
> > one query.
>
> One more tidbit of information related to this:
>
> When you have multiple filters and they aren't cached, I am reasonably
> certain that they run in parallel.  Instead of one complex filter, you
> would have three simple filters running simultaneously.  For low to
> medium query loads on a server with a whole bunch of CPUs, where there
> is plenty of spare CPU power, this can be a real gain in performance ...
> but if the query load is really high, it might be a bad thing.
>
> Thanks,
> Shawn
>
>


-- 
Thanks
Jay Potharaju


Re: Filter queries & caching

2016-05-06 Thread Jay Potharaju
Thanks Shawn, Erick & Ahmet, this was very helpful. 

> On May 6, 2016, at 6:19 AM, Shawn Heisey <apa...@elyograg.org> wrote:
> 
>> On 5/5/2016 2:44 PM, Jay Potharaju wrote:
>> Are you suggesting rewriting it like this ?
>> fq=filter(fromfield:[* TO NOW/DAY+1DAY]&& tofield:[NOW/DAY-7DAY TO *] )
>> fq=filter(type:abc)
>> 
>> Is this a better use of the cache as supposed to fq=fromfield:[* TO
>> NOW/DAY+1DAY]&& tofield:[NOW/DAY-7DAY TO *] && type:"abc"
> 
> I keep writing emails and forgetting to send them.  Supplementing the
> excellent information you've already gotten:
> 
> Because all three clauses are ANDed together, what I would suggest doing
> is three filter queries:
> 
> fq=fromfield:[* TO NOW/DAY+1DAY]
> fq=tofield:[NOW/DAY-7DAY TO *]
> fq=type:abc
> 
> Whether or not to split your fq like this will depend on how you use
> filters, and how much memory you can let them use.  With three separate
> fq parameters, you'll get three cache entries in filterCache from the
> one query.  If the next query changes only one of those filters to
> something that's not in the cache yet, but leaves the other two alone,
> then Solr can get the results from the cache for two of them, and then
> will only need to run the query for one of them, saving time and system
> resources.
> 
> I removed the quotes from "abc" because for that specific example,
> quotes are not necessary.  For more complex information than abc, quotes
> might be important.  Experiment, and use what gets you the results you want.
> 
> Thanks,
> Shawn
> 


Re: Filter queries & caching

2016-05-05 Thread Jay Potharaju
I have almost 50 million docs and growing... that being said, in a
high-query-volume case does it make sense to use

 fq=filter(fromfield:[* TO NOW/DAY+1DAY]&& tofield:[NOW/DAY-7DAY TO *]  &&
type:"abc")

OR
fq=filter(fromfield:[* TO NOW/DAY+1DAY]&& tofield:[NOW/DAY-7DAY TO *] )
fq=filter(type:abc)

Is this something that I would need to determine by running some tests?
Thanks
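[Editor's sketch] Measuring is the only reliable answer at this scale. A small harness like the one below gives comparable medians; the query callable here is a stub, to be swapped for a real HTTP request to /select with each fq variant:

```python
import statistics
import time

def bench(run_query, n=20):
    """Median wall-clock latency of a query callable, in milliseconds."""
    samples = []
    for _ in range(n):
        t0 = time.monotonic()
        run_query()
        samples.append((time.monotonic() - t0) * 1000.0)
    return statistics.median(samples)

# Stub standing in for a real Solr request; replace with e.g. an urllib
# call using the combined-fq parameters, then again with the split-fq ones.
stub_query = lambda: sum(range(10_000))
median_ms = bench(stub_query)
print(median_ms >= 0.0)   # True
```

Run it once per fq style against a warmed cache and once against a cold one; the difference between the two runs is roughly what the filterCache is buying.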

On Thu, May 5, 2016 at 1:44 PM, Jay Potharaju <jspothar...@gmail.com> wrote:

> Are you suggesting rewriting it like this ?
> fq=filter(fromfield:[* TO NOW/DAY+1DAY]&& tofield:[NOW/DAY-7DAY TO *] )
> fq=filter(type:abc)
>
> Is this a better use of the cache as supposed to fq=fromfield:[* TO
> NOW/DAY+1DAY]&& tofield:[NOW/DAY-7DAY TO *] && type:"abc"
>
> Thanks
>
> On Thu, May 5, 2016 at 12:50 PM, Ahmet Arslan <iori...@yahoo.com.invalid>
> wrote:
>
>> Hi,
>>
>> Cache enemy is not * but NOW. Since you round it to DAY, cache will work
>> within-day.
>> I would use separate filter queries, especially fq=type:abc for the
>> structured query so it will be cached independently.
>>
>> Also consider disabling caching (using cost) in expensive queries:
>> http://yonik.com/advanced-filter-caching-in-solr/
>>
>> Ahmet
>>
>>
>>
>> On Thursday, May 5, 2016 8:25 PM, Jay Potharaju <jspothar...@gmail.com>
>> wrote:
>> Hi,
>> I have a filter query that gets  documents based on date ranges from last
>> n
>> days to anytime in future.
>>
>> The objective is to get documents between a date range, but the start date
>> and end date values are stored in different fields and that is why I wrote
>> the filter query as below
>>
>> fq=fromfield:[* TO NOW/DAY+1DAY]&& tofield:[NOW/DAY-7DAY TO *] &&
>> type:"abc"
>>
>> The way these queries are currently written I think wont leverage the
>> filter cache because of "*". Is there a better way to write this query so
>> that I can leverage the cache.
>>
>>
>>
>> --
>> Thanks
>> Jay
>>
>
>
>
> --
> Thanks
> Jay Potharaju
>
>



-- 
Thanks
Jay Potharaju


Re: Filter queries & caching

2016-05-05 Thread Jay Potharaju
Are you suggesting rewriting it like this?
fq=filter(fromfield:[* TO NOW/DAY+1DAY]&& tofield:[NOW/DAY-7DAY TO *] )
fq=filter(type:abc)

Is this a better use of the cache as opposed to fq=fromfield:[* TO
NOW/DAY+1DAY] && tofield:[NOW/DAY-7DAY TO *] && type:"abc"

Thanks

On Thu, May 5, 2016 at 12:50 PM, Ahmet Arslan <iori...@yahoo.com.invalid>
wrote:

> Hi,
>
> Cache enemy is not * but NOW. Since you round it to DAY, cache will work
> within-day.
> I would use separate filter queries, especially fq=type:abc for the
> structured query so it will be cached independently.
>
> Also consider disabling caching (using cost) in expensive queries:
> http://yonik.com/advanced-filter-caching-in-solr/
>
> Ahmet
>
>
>
> On Thursday, May 5, 2016 8:25 PM, Jay Potharaju <jspothar...@gmail.com>
> wrote:
> Hi,
> I have a filter query that gets  documents based on date ranges from last n
> days to anytime in future.
>
> The objective is to get documents between a date range, but the start date
> and end date values are stored in different fields and that is why I wrote
> the filter query as below
>
> fq=fromfield:[* TO NOW/DAY+1DAY]&& tofield:[NOW/DAY-7DAY TO *] &&
> type:"abc"
>
> The way these queries are currently written I think wont leverage the
> filter cache because of "*". Is there a better way to write this query so
> that I can leverage the cache.
>
>
>
> --
> Thanks
> Jay
>



-- 
Thanks
Jay Potharaju


Filter queries & caching

2016-05-05 Thread Jay Potharaju
Hi,
I have a filter query that gets documents based on date ranges from the last
n days to any time in the future.

The objective is to get documents between a date range, but the start date
and end date values are stored in different fields and that is why I wrote
the filter query as below

fq=fromfield:[* TO NOW/DAY+1DAY]&& tofield:[NOW/DAY-7DAY TO *] && type:"abc"

The way these queries are currently written, I think they won't leverage the
filter cache because of "*". Is there a better way to write this query so
that I can leverage the cache?



-- 
Thanks
Jay
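[Editor's sketch] One commonly suggested rewrite, per the replies in this thread, is splitting the compound filter into one fq per clause so each is cached independently; the NOW/DAY rounding keeps each entry reusable within the same day. The resulting request parameters:

```python
from urllib.parse import urlencode

# One fq per clause -> three independently cached filterCache entries.
params = [
    ("q", "*:*"),
    ("fq", "fromfield:[* TO NOW/DAY+1DAY]"),
    ("fq", "tofield:[NOW/DAY-7DAY TO *]"),
    ("fq", "type:abc"),
]
query_string = urlencode(params)
print(query_string.count("fq="))   # -> 3
```

If a later query changes only one of the three clauses, the other two are still served from the cache.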


Re: query action with wrong result size zero

2016-05-05 Thread Jay Potharaju
Can you check if the field you are searching on is case sensitive? You can
quickly test it by copying the exact contents of the brand field into your
query and comparing it against the query you have posted above.

On Thu, May 5, 2016 at 8:57 AM, mixiangliu <852262...@qq.com> wrote:

>
> I found a strange thing with a Solr query: when I set the query field value
> like "brand:amd", the size of the query result is zero, but the real data
> is not zero. Can somebody tell me why? Thank you very much!!
> My English is not very good; I hope somebody understands my words!
>



-- 
Thanks
Jay Potharaju
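[Editor's sketch] A quick way to see why a non-analyzed string field behaves this way: the comparison is byte-for-byte, so a query term 'amd' never matches an indexed 'AMD'. A lowercase filter at both index and query time (or querying the exact stored case) is what makes them meet:

```python
stored_value = "AMD"   # value as indexed in a non-analyzed string field
query_term = "amd"

# Byte-for-byte comparison, as on a string field: no match.
print(stored_value == query_term)                  # False
# What a lowercase filter effectively does at index and query time:
print(stored_value.lower() == query_term.lower())  # True
```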


Using updateRequest Processor with DIH

2016-05-01 Thread Jay Potharaju
Hi,
I was wondering if it is possible to use Update Request Processor with DIH.
I would like to update an index_time field whenever documents are
added/updated in the collection.
I know that I could easily pass a timestamp which would update the field
in my collection, but I was trying to do it using an update request processor.

I tried the following but got an error. Any recommendations on how to use
this correctly?



<updateRequestProcessorChain name="update_indextime">
  <processor class="solr.TimestampUpdateProcessorFactory">
    <str name="fieldName">index_time</str>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

<requestHandler name="/dataimport"
    class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config.xml</str>
    <str name="update.chain">update_indextime</str>
  </lst>
</requestHandler>

Error:
Error from server at unknown UpdateRequestProcessorChain: update_indextime

-- 
Thanks
Jay

