Help with understanding the Arbitrary Rectangle filter

2020-12-18 Thread hank
Hi,
I'm trying to search within a rectangle using the Filtering by an Arbitrary
Rectangle method, but the query below doesn't return any results, even though I
know that items are present within that area.

http://drupalvm.local:8983/solr/cars/select?q=*:*&fq=locs_computed_location:[51.996461924257,-4.770558354898171%20TO%2052.094510963399976,-4.489583013101296]&wt=json

In solr I have:

Field: locs_computed_location
Field-Type: org.apache.solr.schema.LatLonType

I know the documentation mentions this method does not support rectangles that
cross the dateline, but that shouldn't be an issue here. I'm a bit lost as to
why this isn't working; there are no errors being shown to help debug this. Any
ideas about what is or isn't going on would be welcome.
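
For comparison, here is the same filter issued from a shell with curl (a sketch
assuming the stock /select handler; the brackets and ampersands are the usual
trouble spots when a query like this is pasted into a browser or a script):

  # -g stops curl from treating [ ] as globbing characters; quoting the URL
  # stops the shell from splitting it at each &
  curl -g 'http://drupalvm.local:8983/solr/cars/select?q=*:*&wt=json&fq=locs_computed_location:[51.996461924257,-4.770558354898171%20TO%2052.094510963399976,-4.489583013101296]'

If this still returns numFound=0, it is worth double-checking that
locs_computed_location is actually indexed and that the values were stored in
lat,lon order.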

Thanks


Re: Need help to configure automated deletion of shard in solr

2020-12-08 Thread Pushkar Mishra
Hi Erick,

COLSTATUS does not work with an implicit-router collection. Is there
any way to get the replica details?
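
If it helps, one way to get that detail regardless of router type (a sketch,
assuming a node at localhost:8983 and a collection named collection1) is the
Collections API CLUSTERSTATUS action, which lists every shard with its replica
core names, nodes, and state:

  # Shard/replica layout for a single collection
  curl 'http://localhost:8983/solr/admin/collections?action=CLUSTERSTATUS&collection=collection1&wt=json'

The core names it returns can then be queried one at a time with distrib=false,
as described further down in this thread.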

Regards

On Mon, Nov 30, 2020 at 8:48 PM Erick Erickson 
wrote:

> Are you using the implicit router? Otherwise you cannot delete a shard.
> And you won’t have any shards that have zero documents anyway.
>
> It’d be a little convoluted, but you could use the collections COLSTATUS
> Api to
> find the names of all your replicas. Then query _one_ replica of each
> shard with something like
> solr/collection1_shard1_replica_n1/select?q=*:*&distrib=false
>
> that’ll return the number of live docs (i.e. non-deleted docs) and if it’s
> zero
> you can delete the shard.
>
> But the implicit router requires you take complete control of where
> documents
> go, i.e. which shard they land on.
>
> This really sounds like an XY problem. What’s the use  case you’re trying
> to support where you expect a shard’s number of live docs to drop to zero?
>
> Best,
> Erick
>
> > On Nov 30, 2020, at 4:57 AM, Pushkar Mishra 
> wrote:
> >
> > Hi Solr team,
> >
> > I am using solr cloud.(version 8.5.x). I have a need to find out a
> > configuration where I can delete a shard , when number of documents
> reaches
> > to zero in the shard , can some one help me out to achieve that ?
> >
> >
> > It is urgent , so a quick response will be highly appreciated .
> >
> > Thanks
> > Pushkar
> >
> > --
> > Pushkar Kumar Mishra
> > "Reactions are always instinctive whereas responses are always well
> thought
> > of... So start responding rather than reacting in life"
>
>

-- 
Pushkar Kumar Mishra
"Reactions are always instinctive whereas responses are always well thought
of... So start responding rather than reacting in life"


Re: Need help to configure automated deletion of shard in solr

2020-12-02 Thread Erick Erickson
You can certainly use the TTL logic. Note: not the TimeRoutedAlias, but
the DocExpirationUpdateProcessorFactory. DocExpirationUpdateProcessorFactory
operates on each document individually so you can mix-n-match
if you want.
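
A minimal sketch of adding that processor through the Config API (the
collection name, field names, and scan period below are illustrative, and the
processor still has to be referenced from whatever update chain the indexing
requests use):

  # Deletes any document whose _ttl_ (e.g. "+30DAYS") has elapsed,
  # checking every 300 seconds
  curl -X POST -H 'Content-type:application/json' \
    'http://localhost:8983/solr/mycollection/config' -d '{
    "add-updateprocessor": {
      "name": "docExpiration",
      "class": "solr.DocExpirationUpdateProcessorFactory",
      "ttlFieldName": "_ttl_",
      "expirationFieldName": "_expire_at_",
      "autoDeletePeriodSeconds": 300
    }
  }'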

As for knowing when a shard is empty, I suggested a method for that
in one of the earlier e-mails.

If you have a collection per customer, and assuming that a customer
has the same retention policy for all docs, then TimeRoutedAlias would
work.

Best,
Erick

> On Dec 2, 2020, at 12:19 AM, Pushkar Mishra  wrote:
> 
> Hi Erick,
> It is implicit.
> The TTL thing I have explored, but due to some complications we can't use that.
> Let me explain the actual use case .
> 
> We have limited space ,we can't keep storing the document for infinite
> time  . So based on the customer's retention policy ,I need to delete the
> documents. And in this process  if any shard gets empty , need to delete
> the shard as well.
> 
> So lets say , is there a way to know, when solr completes the purging of
> deleted documents, then based on that flag we can configure shard deletion
> 
> Thanks
> Pushkar
> 
> On Tue, Dec 1, 2020 at 9:02 PM Erick Erickson 
> wrote:
> 
>> This is still confusing. You haven’t told us what router you are using,
>> compositeId or implicit?
>> 
>> If you’re using compositeId (the default), you will never have empty shards
>> because docs get assigned to shards via a hashing algorithm that
>> distributes
>> them very evenly across all available shards. You cannot delete any
>> shard when using compositeId as your routing method.
>> 
>> If you don’t know which router you’re using, then you’re using compositeId.
>> 
>> NOTE: for the rest, “documents” means non-deleted documents. Solr will
>> take care of purging the deleted documents automatically.
>> 
>> I think you’re making this much more difficult than you need to. Assuming
>> that the total number of documents remains relatively constant, you can
>> just
>> let Solr take care of it all and not bother with trying to individually
>> manage
>> shards by using the default compositeID routing.
>> 
>> If the number of docs increases you might need to use splitshard. But it
>> sounds like the total number of “live” documents isn’t going to increase.
>> 
>> For TTL, if you have a _fixed_ TTL, i.e. the docs should always expire
>> after,
>> say, 30 days... which it doesn’t sound like you do, you can use
>> the “Time Routed Alias” option, see:
>> https://lucene.apache.org/solr/guide/7_5/time-routed-aliases.html
>> 
>> Assuming your TTL isn’t a fixed-interval, you can configure
>> DocExpirationUpdateProcessorFactory to deal with TTL automatically.
>> 
>> And if you still think you need to handle this, you need to explain exactly
>> what problem you’re trying to solve because so far it appears that
>> you’re simply taking on way more work than you need to.
>> 
>> Best,
>> Erick
>> 
>>> On Dec 1, 2020, at 9:46 AM, Pushkar Mishra 
>> wrote:
>>> 
>>> Hi Team,
>>> As I explained the use case , can someone help me out to find out the
>>> configuration way to delete the shard here ?
>>> A quick response  will be greatly appreciated.
>>> 
>>> Regards
>>> Pushkar
>>> 
>>> 
>>> On Mon, Nov 30, 2020 at 11:32 PM Pushkar Mishra 
>>> wrote:
>>> 
>>>> 
>>>> 
>>>> On Mon, Nov 30, 2020, 9:15 PM Pushkar Mishra 
>>>> wrote:
>>>> 
>>>>> Hi Erick,
>>>>> First of all thanks for your response . I will check the possibility  .
>>>>> Let me explain my problem  in detail :
>>>>> 
>>>>> 1. We have other use cases where we are making use of listener on
>>>>> postCommit to delete/shift/split the shards . So we have capability to
>>>>> delete the shards .
>>>>> 2. The current use case is , where we have to delete the documents from
>>>>> the shard , and during deletion process(it will be scheduled process,
>> may
>>>>> be hourly or daily, which will delete the documents) , if shards  gets
>>>>> empty (or may be lets  say nominal documents are left ) , then delete
>> the
>>>>> shard.  And I am exploring to do this using configuration .
>>>>> 
>>>> 3. Also it will not be in live shard for sure as only those documents
>> are
>>>> deleted which have TTL got over . TTL could be a month or year.
>>>> 
>>>> Please assist if you have any config based idea on 

Re: Need help to configure automated deletion of shard in solr

2020-12-01 Thread Pushkar Mishra
Hi Erick,
It is implicit.
The TTL thing I have explored, but due to some complications we can't use that.
Let me explain the actual use case.

We have limited space; we can't keep storing documents for an infinite
time. So, based on the customer's retention policy, I need to delete the
documents. And in this process, if any shard gets empty, we need to delete
the shard as well.

So let's say: is there a way to know when Solr completes the purging of
deleted documents, so that based on that flag we can configure shard deletion?
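
One way to check that per shard (a sketch; the core name is illustrative) is
the Luke handler, which reports both live and not-yet-purged documents:

  # numDocs = live docs, deletedDocs = docs deleted but not yet merged away
  curl 'http://localhost:8983/solr/collection1_shard1_replica_n1/admin/luke?numTerms=0&wt=json'

When numDocs is 0 on every replica of a shard, the shard is empty as far as
searches are concerned, even if the physical purge (segment merging) has not
finished yet.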

Thanks
Pushkar

On Tue, Dec 1, 2020 at 9:02 PM Erick Erickson 
wrote:

> This is still confusing. You haven’t told us what router you are using,
> compositeId or implicit?
>
> If you’re using compositeId (the default), you will never have empty shards
> because docs get assigned to shards via a hashing algorithm that
> distributes
> them very evenly across all available shards. You cannot delete any
> shard when using compositeId as your routing method.
>
> If you don’t know which router you’re using, then you’re using compositeId.
>
> NOTE: for the rest, “documents” means non-deleted documents. Solr will
> take care of purging the deleted documents automatically.
>
> I think you’re making this much more difficult than you need to. Assuming
> that the total number of documents remains relatively constant, you can
> just
> let Solr take care of it all and not bother with trying to individually
> manage
> shards by using the default compositeID routing.
>
> If the number of docs increases you might need to use splitshard. But it
> sounds like the total number of “live” documents isn’t going to increase.
>
> For TTL, if you have a _fixed_ TTL, i.e. the docs should always expire
> after,
> say, 30 days... which it doesn’t sound like you do, you can use
> the “Time Routed Alias” option, see:
> https://lucene.apache.org/solr/guide/7_5/time-routed-aliases.html
>
> Assuming your TTL isn’t a fixed-interval, you can configure
> DocExpirationUpdateProcessorFactory to deal with TTL automatically.
>
> And if you still think you need to handle this, you need to explain exactly
> what problem you’re trying to solve because so far it appears that
> you’re simply taking on way more work than you need to.
>
> Best,
> Erick
>
> > On Dec 1, 2020, at 9:46 AM, Pushkar Mishra 
> wrote:
> >
> > Hi Team,
> > As I explained the use case , can someone help me out to find out the
> > configuration way to delete the shard here ?
> > A quick response  will be greatly appreciated.
> >
> > Regards
> > Pushkar
> >
> >
> > On Mon, Nov 30, 2020 at 11:32 PM Pushkar Mishra 
> > wrote:
> >
> >>
> >>
> >> On Mon, Nov 30, 2020, 9:15 PM Pushkar Mishra 
> >> wrote:
> >>
> >>> Hi Erick,
> >>> First of all thanks for your response . I will check the possibility  .
> >>> Let me explain my problem  in detail :
> >>>
> >>> 1. We have other use cases where we are making use of listener on
> >>> postCommit to delete/shift/split the shards . So we have capability to
> >>> delete the shards .
> >>> 2. The current use case is , where we have to delete the documents from
> >>> the shard , and during deletion process(it will be scheduled process,
> may
> >>> be hourly or daily, which will delete the documents) , if shards  gets
> >>> empty (or may be lets  say nominal documents are left ) , then delete
> the
> >>> shard.  And I am exploring to do this using configuration .
> >>>
> >> 3. Also it will not be in live shard for sure as only those documents
> are
> >> deleted which have TTL got over . TTL could be a month or year.
> >>
> >> Please assist if you have any config based idea on this
> >>
> >>> Regards
> >>> Pushkar
> >>>
> >>> On Mon, Nov 30, 2020, 8:48 PM Erick Erickson 
> >>> wrote:
> >>>
> >>>> Are you using the implicit router? Otherwise you cannot delete a
> shard.
> >>>> And you won’t have any shards that have zero documents anyway.
> >>>>
> >>>> It’d be a little convoluted, but you could use the collections
> COLSTATUS
> >>>> Api to
> >>>> find the names of all your replicas. Then query _one_ replica of each
> >>>> shard with something like
> >>>> solr/collection1_shard1_replica_n1/select?q=*:*&distrib=false
> >>>>
> >>>> that’ll return the number of live docs (i.e. non-deleted docs) and if
> >>>> it’s zero
> >>>> you

Re: Need help to configure automated deletion of shard in solr

2020-12-01 Thread Erick Erickson
This is still confusing. You haven’t told us what router you are using, 
compositeId or implicit?

If you’re using compositeId (the default), you will never have empty shards
because docs get assigned to shards via a hashing algorithm that distributes
them very evenly across all available shards. You cannot delete any
shard when using compositeId as your routing method.

If you don’t know which router you’re using, then you’re using compositeId.

NOTE: for the rest, “documents” means non-deleted documents. Solr will
take care of purging the deleted documents automatically.

I think you’re making this much more difficult than you need to. Assuming
that the total number of documents remains relatively constant, you can just
let Solr take care of it all and not bother with trying to individually manage
shards by using the default compositeID routing.

If the number of docs increases you might need to use splitshard. But it
sounds like the total number of “live” documents isn’t going to increase.

For TTL, if you have a _fixed_ TTL, i.e. the docs should always expire after,
say, 30 days... which it doesn’t sound like you do, you can use
the “Time Routed Alias” option, see:
https://lucene.apache.org/solr/guide/7_5/time-routed-aliases.html
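
For reference, creating such an alias looks roughly like this (a sketch; the
alias name, routing field, interval, and config set are illustrative):

  # One underlying collection per month, routed on timestamp_dt; expired
  # months can then be dropped as whole collections
  curl 'http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=logs&router.name=time&router.field=timestamp_dt&router.start=NOW/MONTH&router.interval=%2B1MONTH&create-collection.collection.configName=myconfig&create-collection.numShards=2'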

Assuming your TTL isn’t a fixed-interval, you can configure
DocExpirationUpdateProcessorFactory to deal with TTL automatically.

And if you still think you need to handle this, you need to explain exactly
what problem you’re trying to solve because so far it appears that 
you’re simply taking on way more work than you need to.

Best,
Erick

> On Dec 1, 2020, at 9:46 AM, Pushkar Mishra  wrote:
> 
> Hi Team,
> As I explained the use case , can someone help me out to find out the
> configuration way to delete the shard here ?
> A quick response  will be greatly appreciated.
> 
> Regards
> Pushkar
> 
> 
> On Mon, Nov 30, 2020 at 11:32 PM Pushkar Mishra 
> wrote:
> 
>> 
>> 
>> On Mon, Nov 30, 2020, 9:15 PM Pushkar Mishra 
>> wrote:
>> 
>>> Hi Erick,
>>> First of all thanks for your response . I will check the possibility  .
>>> Let me explain my problem  in detail :
>>> 
>>> 1. We have other use cases where we are making use of listener on
>>> postCommit to delete/shift/split the shards . So we have capability to
>>> delete the shards .
>>> 2. The current use case is , where we have to delete the documents from
>>> the shard , and during deletion process(it will be scheduled process, may
>>> be hourly or daily, which will delete the documents) , if shards  gets
>>> empty (or may be lets  say nominal documents are left ) , then delete the
>>> shard.  And I am exploring to do this using configuration .
>>> 
>> 3. Also it will not be in live shard for sure as only those documents are
>> deleted which have TTL got over . TTL could be a month or year.
>> 
>> Please assist if you have any config based idea on this
>> 
>>> Regards
>>> Pushkar
>>> 
>>> On Mon, Nov 30, 2020, 8:48 PM Erick Erickson 
>>> wrote:
>>> 
>>>> Are you using the implicit router? Otherwise you cannot delete a shard.
>>>> And you won’t have any shards that have zero documents anyway.
>>>> 
>>>> It’d be a little convoluted, but you could use the collections COLSTATUS
>>>> Api to
>>>> find the names of all your replicas. Then query _one_ replica of each
>>>> shard with something like
>>>> solr/collection1_shard1_replica_n1/select?q=*:*&distrib=false
>>>> 
>>>> that’ll return the number of live docs (i.e. non-deleted docs) and if
>>>> it’s zero
>>>> you can delete the shard.
>>>> 
>>>> But the implicit router requires you take complete control of where
>>>> documents
>>>> go, i.e. which shard they land on.
>>>> 
>>>> This really sounds like an XY problem. What’s the use  case you’re trying
>>>> to support where you expect a shard’s number of live docs to drop to
>>>> zero?
>>>> 
>>>> Best,
>>>> Erick
>>>> 
>>>>> On Nov 30, 2020, at 4:57 AM, Pushkar Mishra 
>>>> wrote:
>>>>> 
>>>>> Hi Solr team,
>>>>> 
>>>>> I am using solr cloud.(version 8.5.x). I have a need to find out a
>>>>> configuration where I can delete a shard , when number of documents
>>>> reaches
>>>>> to zero in the shard , can some one help me out to achieve that ?
>>>>> 
>>>>> 
>>>>> It is urgent , so a quick response will be highly appreciated .
>>>>> 
>>>>> Thanks
>>>>> Pushkar
>>>>> 
>>>>> --
>>>>> Pushkar Kumar Mishra
>>>>> "Reactions are always instinctive whereas responses are always well
>>>> thought
>>>>> of... So start responding rather than reacting in life"
>>>> 
>>>> 
> 
> -- 
> Pushkar Kumar Mishra
> "Reactions are always instinctive whereas responses are always well thought
> of... So start responding rather than reacting in life"



Re: Need help to configure automated deletion of shard in solr

2020-12-01 Thread Pushkar Mishra
Hi Team,
As I explained the use case, can someone help me find a configuration-based
way to delete the shard here?
A quick response will be greatly appreciated.

Regards
Pushkar


On Mon, Nov 30, 2020 at 11:32 PM Pushkar Mishra 
wrote:

>
>
> On Mon, Nov 30, 2020, 9:15 PM Pushkar Mishra 
> wrote:
>
>> Hi Erick,
>> First of all thanks for your response . I will check the possibility  .
>> Let me explain my problem  in detail :
>>
>> 1. We have other use cases where we are making use of listener on
>> postCommit to delete/shift/split the shards . So we have capability to
>> delete the shards .
>> 2. The current use case is , where we have to delete the documents from
>> the shard , and during deletion process(it will be scheduled process, may
>> be hourly or daily, which will delete the documents) , if shards  gets
>> empty (or may be lets  say nominal documents are left ) , then delete the
>> shard.  And I am exploring to do this using configuration .
>>
> 3. Also it will not be in live shard for sure as only those documents are
> deleted which have TTL got over . TTL could be a month or year.
>
> Please assist if you have any config based idea on this
>
>> Regards
>> Pushkar
>>
>> On Mon, Nov 30, 2020, 8:48 PM Erick Erickson 
>> wrote:
>>
>>> Are you using the implicit router? Otherwise you cannot delete a shard.
>>> And you won’t have any shards that have zero documents anyway.
>>>
>>> It’d be a little convoluted, but you could use the collections COLSTATUS
>>> Api to
>>> find the names of all your replicas. Then query _one_ replica of each
>>> shard with something like
>>> solr/collection1_shard1_replica_n1/select?q=*:*&distrib=false
>>>
>>> that’ll return the number of live docs (i.e. non-deleted docs) and if
>>> it’s zero
>>> you can delete the shard.
>>>
>>> But the implicit router requires you take complete control of where
>>> documents
>>> go, i.e. which shard they land on.
>>>
>>> This really sounds like an XY problem. What’s the use  case you’re trying
>>> to support where you expect a shard’s number of live docs to drop to
>>> zero?
>>>
>>> Best,
>>> Erick
>>>
>>> > On Nov 30, 2020, at 4:57 AM, Pushkar Mishra 
>>> wrote:
>>> >
>>> > Hi Solr team,
>>> >
>>> > I am using solr cloud.(version 8.5.x). I have a need to find out a
>>> > configuration where I can delete a shard , when number of documents
>>> reaches
>>> > to zero in the shard , can some one help me out to achieve that ?
>>> >
>>> >
>>> > It is urgent , so a quick response will be highly appreciated .
>>> >
>>> > Thanks
>>> > Pushkar
>>> >
>>> > --
>>> > Pushkar Kumar Mishra
>>> > "Reactions are always instinctive whereas responses are always well
>>> thought
>>> > of... So start responding rather than reacting in life"
>>>
>>>

-- 
Pushkar Kumar Mishra
"Reactions are always instinctive whereas responses are always well thought
of... So start responding rather than reacting in life"


Re: Need help to configure automated deletion of shard in solr

2020-11-30 Thread Pushkar Mishra
On Mon, Nov 30, 2020, 9:15 PM Pushkar Mishra  wrote:

> Hi Erick,
> First of all thanks for your response . I will check the possibility  .
> Let me explain my problem  in detail :
>
> 1. We have other use cases where we are making use of listener on
> postCommit to delete/shift/split the shards . So we have capability to
> delete the shards .
> 2. The current use case is , where we have to delete the documents from
> the shard , and during deletion process(it will be scheduled process, may
> be hourly or daily, which will delete the documents) , if shards  gets
> empty (or may be lets  say nominal documents are left ) , then delete the
> shard.  And I am exploring to do this using configuration .
>
3. Also, it will not be a live shard for sure, as only those documents are
deleted whose TTL is over. TTL could be a month or a year.

Please assist if you have any config-based idea on this.

> Regards
> Pushkar
>
> On Mon, Nov 30, 2020, 8:48 PM Erick Erickson 
> wrote:
>
>> Are you using the implicit router? Otherwise you cannot delete a shard.
>> And you won’t have any shards that have zero documents anyway.
>>
>> It’d be a little convoluted, but you could use the collections COLSTATUS
>> Api to
>> find the names of all your replicas. Then query _one_ replica of each
>> shard with something like
>> solr/collection1_shard1_replica_n1/select?q=*:*&distrib=false
>>
>> that’ll return the number of live docs (i.e. non-deleted docs) and if
>> it’s zero
>> you can delete the shard.
>>
>> But the implicit router requires you take complete control of where
>> documents
>> go, i.e. which shard they land on.
>>
>> This really sounds like an XY problem. What’s the use  case you’re trying
>> to support where you expect a shard’s number of live docs to drop to zero?
>>
>> Best,
>> Erick
>>
>> > On Nov 30, 2020, at 4:57 AM, Pushkar Mishra 
>> wrote:
>> >
>> > Hi Solr team,
>> >
>> > I am using solr cloud.(version 8.5.x). I have a need to find out a
>> > configuration where I can delete a shard , when number of documents
>> reaches
>> > to zero in the shard , can some one help me out to achieve that ?
>> >
>> >
>> > It is urgent , so a quick response will be highly appreciated .
>> >
>> > Thanks
>> > Pushkar
>> >
>> > --
>> > Pushkar Kumar Mishra
>> > "Reactions are always instinctive whereas responses are always well
>> thought
>> > of... So start responding rather than reacting in life"
>>
>>


Re: Need help to configure automated deletion of shard in solr

2020-11-30 Thread Pushkar Mishra
Hi Erick,
First of all, thanks for your response. I will check that possibility.
Let me explain my problem in detail:

1. We have other use cases where we make use of a listener on postCommit to
delete/shift/split shards, so we have the capability to delete shards.
2. The current use case is that we have to delete documents from the shard,
and during the deletion process (a scheduled process, maybe hourly or daily,
which will delete the documents), if a shard gets empty (or, let's say, only
nominal documents are left), then delete the shard. I am exploring how to do
this using configuration; see the sketch below.
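
For the deletion step itself, the Collections API DELETESHARD action is what
would be called once a shard is known to be empty (a sketch with illustrative
names; it only works for shards with no hash range, i.e. the implicit router,
or shards that are inactive):

  # Drop an empty shard from an implicit-router collection
  curl 'http://localhost:8983/solr/admin/collections?action=DELETESHARD&collection=mycollection&shard=shard1'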
Regards
Pushkar

On Mon, Nov 30, 2020, 8:48 PM Erick Erickson 
wrote:

> Are you using the implicit router? Otherwise you cannot delete a shard.
> And you won’t have any shards that have zero documents anyway.
>
> It’d be a little convoluted, but you could use the collections COLSTATUS
> Api to
> find the names of all your replicas. Then query _one_ replica of each
> shard with something like
> solr/collection1_shard1_replica_n1/select?q=*:*&distrib=false
>
> that’ll return the number of live docs (i.e. non-deleted docs) and if it’s
> zero
> you can delete the shard.
>
> But the implicit router requires you take complete control of where
> documents
> go, i.e. which shard they land on.
>
> This really sounds like an XY problem. What’s the use  case you’re trying
> to support where you expect a shard’s number of live docs to drop to zero?
>
> Best,
> Erick
>
> > On Nov 30, 2020, at 4:57 AM, Pushkar Mishra 
> wrote:
> >
> > Hi Solr team,
> >
> > I am using solr cloud.(version 8.5.x). I have a need to find out a
> > configuration where I can delete a shard , when number of documents
> reaches
> > to zero in the shard , can some one help me out to achieve that ?
> >
> >
> > It is urgent , so a quick response will be highly appreciated .
> >
> > Thanks
> > Pushkar
> >
> > --
> > Pushkar Kumar Mishra
> > "Reactions are always instinctive whereas responses are always well
> thought
> > of... So start responding rather than reacting in life"
>
>


Re: Need help to configure automated deletion of shard in solr

2020-11-30 Thread Erick Erickson
Are you using the implicit router? Otherwise you cannot delete a shard.
And you won’t have any shards that have zero documents anyway.

It’d be a little convoluted, but you could use the collections COLSTATUS Api to
find the names of all your replicas. Then query _one_ replica of each
shard with something like
solr/collection1_shard1_replica_n1/select?q=*:*&distrib=false

that’ll return the number of live docs (i.e. non-deleted docs) and if it’s zero
you can delete the shard.
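
Spelled out, that two-step check might look like this (a sketch; the collection
and core names are illustrative):

  # 1. List each shard's replica core names
  curl 'http://localhost:8983/solr/admin/collections?action=COLSTATUS&collection=collection1&wt=json'

  # 2. Ask one replica how many live (non-deleted) docs it holds, without
  #    fanning the query out to the other shards
  curl 'http://localhost:8983/solr/collection1_shard1_replica_n1/select?q=*:*&rows=0&distrib=false'

numFound in the second response is the live-doc count for that shard.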

But the implicit router requires you take complete control of where documents
go, i.e. which shard they land on.

This really sounds like an XY problem. What’s the use  case you’re trying
to support where you expect a shard’s number of live docs to drop to zero?

Best,
Erick

> On Nov 30, 2020, at 4:57 AM, Pushkar Mishra  wrote:
> 
> Hi Solr team,
> 
> I am using solr cloud.(version 8.5.x). I have a need to find out a
> configuration where I can delete a shard , when number of documents reaches
> to zero in the shard , can some one help me out to achieve that ?
> 
> 
> It is urgent , so a quick response will be highly appreciated .
> 
> Thanks
> Pushkar
> 
> -- 
> Pushkar Kumar Mishra
> "Reactions are always instinctive whereas responses are always well thought
> of... So start responding rather than reacting in life"



Need help to configure automated deletion of shard in solr

2020-11-30 Thread Pushkar Mishra
Hi Solr team,

I am using Solr Cloud (version 8.5.x). I need to find a configuration where I
can delete a shard when the number of documents in the shard reaches zero. Can
someone help me achieve that?


It is urgent, so a quick response will be highly appreciated.

Thanks
Pushkar

-- 
Pushkar Kumar Mishra
"Reactions are always instinctive whereas responses are always well thought
of... So start responding rather than reacting in life"


Re: security.json help

2020-11-25 Thread Jason Gerlowski
Hi Mark,

It looks like you're using the "path" wildcard as it's intended, but
some bug is causing the behavior you're seeing.  It should be working
as you expected, but evidently it's not.

One potential workaround might be to leave out the "path" property
entirely in your "custom-example" permission.  When I do that (on Solr
8.6.2), I get the following behavior in the following pastebin link,
which looks close to what you're after: https://paste.apache.org/ygndt
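
In other words, the first permission would become something like this (a sketch
based on the permission in the original message, with the "path" property left
out):

  {
    "name": "custom-example",
    "collection": "example",
    "role": ["admin", "example"]
  }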

Hope that helps!

Jason

On Mon, Oct 19, 2020 at 3:49 PM Mark Dadisman
 wrote:
>
> Hey, I'm new to configuring Solr. I'm trying to configure Solr with Rule 
> Based Authorization. 
> https://lucene.apache.org/solr/guide/8_6/rule-based-authorization-plugin.html
>
> I have permissions working if I allow everything with "all", but I want to 
> limit access so that a site can only access its own collection, in addition 
> to a server ping path, so I'm trying to add the collection-specific 
> permission at the top:
>
> "permissions": [
>   {
> "name": "custom-example",
> "collection": "example",
> "path": "*",
> "role": [
>   "admin",
>   "example"
> ]
>   },
>   {
> "name": "custom-collection",
> "collection": "*",
> "path": [
>   "/admin/luke",
>   "/admin/mbeans",
>   "/admin/system"
> ],
> "role": "*"
>   },
>   {
> "name": "custom-ping",
> "collection": null,
> "path": [
>   "/admin/info/system"
> ],
> "role": "*"
>   },
>   {
> "name": "all",
> "role": "admin"
>   }
> ]
>
> The rule "custom-ping" works, and "all" works. But when the above permissions 
> are used, access is denied to the "example" user-role for collection 
> "example" at the path "/solr/example/select". If I specify paths explicitly, 
> the permissions work, but I can't get permissions to work with path wildcards 
> for a specific collection.
>
> I also had to declare "custom-collection" with the specific paths needed to 
> get collection info in order for those paths to work. I would've expected 
> that these paths would be included in the collection-specific paths and be 
> covered by the first rule, but they aren't. For example, the call to 
> "/solr/example/admin/luke" will fail if the path is removed from this rule.
>
> I don't really want to specify every single path I might need to use. Am I 
> using the path wildcard wrong somehow? Is there a better way to do 
> collection-specific authorizations for a collection "example"?
>
> Thanks.
> - M
>


Re: Need help to resolve Apache Solr vulnerability

2020-11-12 Thread Dave
Solr isn’t meant to be public facing. Not sure how anyone would send these 
commands since it can’t be reached from the outside world 

> On Nov 12, 2020, at 7:12 AM, Sheikh, Wasim A. 
>  wrote:
> 
> Hi Team,
> 
> Currently we are facing the below vulnerability for Apache Solr tool. So can 
> you please check the below details and help us to fix this issue.
> 
> /etc/init.d/solr-master version
> 
> Server version: Apache Tomcat/7.0.62
> Server built: May 7 2015 17:14:55 UTC
> Server number: 7.0.62.0
> OS Name: Linux
> OS Version: 2.6.32-431.29.2.el6.x86_64
> Architecture: amd64
> JVM Version: 1.8.0_20-b26
> JVM Vendor: Oracle Corporation
> 
> 
> solr-spec-version:4.10.4,
> Solr is an enterprise search platform.
> Solr is prone to remote code execution vulnerability.
> 
> Affected Versions:
> Apache Solr version prior to 6.6.2 and prior to 7.1.0
> 
> QID Detection Logic (Unauthenticated):
> This QID sends specifically crafted request which include special entities in 
> the xml document and looks for the vulnerable response.
> Alternatively, in another check, this QID matches vulnerable versions in the 
> response webpage
> Successful exploitation allows attacker to execute arbitrary code.
> The vendor has issued updated packages to fix this vulnerability. For more 
> information about the vulnerability and obtaining patches, refer to the 
> following Fedora security advisories: Apache Solr 6.6.2 (https://lucene.apache.org/solr/news.html).
> More information regarding the update can be found at Apache Solr 7.1.0
> (https://lucene.apache.org/solr/news.html).
> 
> 
> 
> 
> 
> 
> 
> Patch:
> Following are links for downloading patches to fix the vulnerabilities:
> Apache Solr 6.6.2: https://lucene.apache.org/solr/news.html
> Apache Solr 7.1.0: https://lucene.apache.org/solr/news.html
> 
> 
> Thanks...
> Wasim Shaikh
> 
> 
> 


Need help to resolve Apache Solr vulnerability

2020-11-12 Thread Sheikh, Wasim A.
Hi Team,

Currently we are facing the below vulnerability for Apache Solr tool. So can 
you please check the below details and help us to fix this issue.

/etc/init.d/solr-master version

Server version: Apache Tomcat/7.0.62
Server built: May 7 2015 17:14:55 UTC
Server number: 7.0.62.0
OS Name: Linux
OS Version: 2.6.32-431.29.2.el6.x86_64
Architecture: amd64
JVM Version: 1.8.0_20-b26
JVM Vendor: Oracle Corporation


solr-spec-version:4.10.4,
Solr is an enterprise search platform.
Solr is prone to remote code execution vulnerability.

Affected Versions:
Apache Solr version prior to 6.6.2 and prior to 7.1.0

QID Detection Logic (Unauthenticated):
This QID sends specifically crafted request which include special entities in 
the xml document and looks for the vulnerable response.
Alternatively, in another check, this QID matches vulnerable versions in the 
response webpage
Successful exploitation allows attacker to execute arbitrary code.
The vendor has issued updated packages to fix this vulnerability. For more 
information about the vulnerability and obtaining patches, refer to the 
following Fedora security advisories: Apache Solr 6.6.2 (https://lucene.apache.org/solr/news.html).
More information regarding the update can be found at Apache Solr 7.1.0
(https://lucene.apache.org/solr/news.html).







Patch:
Following are links for downloading patches to fix the vulnerabilities:
Apache Solr 6.6.2: https://lucene.apache.org/solr/news.html
Apache Solr 7.1.0: https://lucene.apache.org/solr/news.html


Thanks...
Wasim Shaikh



This message is for the designated recipient only and may contain privileged, 
proprietary, or otherwise confidential information. If you have received it in 
error, please notify the sender immediately and delete the original. Any other 
use of the e-mail by you is prohibited. Where allowed by local law, electronic 
communications with Accenture and its affiliates, including e-mail and instant 
messaging (including content), may be scanned by our systems for the purposes 
of information security and assessment of internal compliance with Accenture 
policy. Your privacy is important to us. Accenture uses your personal data only 
in compliance with data protection laws. For further information on how 
Accenture processes your personal data, please see our privacy statement at 
https://www.accenture.com/us-en/privacy-policy.
__

www.accenture.com


Re: Need help in understanding the below error message when running solr-exporter

2020-10-19 Thread yaswanth kumar
Can someone help on the above pls??

On Sat, Oct 17, 2020 at 6:22 AM yaswanth kumar 
wrote:

> Using Solr 8.2; Zoo 3.4; Solr mode: Cloud with multiple collections; Basic
> Authentication: Enabled
>
> I am trying to run the
>
> export JAVA_OPTS="-Djavax.net.ssl.trustStore=etc/solr-keystore.jks
> -Djavax.net.ssl.trustStorePassword=solrssl
> -Dsolr.httpclient.builder.factory=org.apache.solr.client.solrj.impl.PreemptiveBasicAuthClientBuilderFactory
> -Dbasicauth=solrrocks:"
>
> export
> CLASSPATH_PREFIX="../../server/solr-webapp/webapp/WEB-INF/lib/commons-codec-1.11.jar"
>
> /bin/solr-exporter -p 8085 -z localhost:2181/solr -f
> ./conf/solr-exporter-config.xml -n 16
>
> and seeing these below messages and on the grafana solr dashboard I do see
> panels coming in but data is not populating on them.
>
> Can someone help me if I am missing something interms of configuration?
>
> WARN  - 2020-10-17 11:17:59.687; org.apache.solr.prometheus.scraper.Async;
> Error occurred during metrics collection =>
> java.util.concurrent.ExecutionException: java.lang.RuntimeException:
> java.lang.NullPointerException
> at
> java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395)
> java.util.concurrent.ExecutionException: java.lang.RuntimeException:
> java.lang.NullPointerException
> at
> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395)
> ~[?:?]
> at
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999)
> ~[?:?]
> at
> org.apache.solr.prometheus.scraper.Async.lambda$null$1(Async.java:45)
> [solr-prometheus-exporter-8.2.0.jar:8.2.0
> 31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe - ivera - 2019-07-19 15:10:57]
> at
> org.apache.solr.prometheus.scraper.Async$$Lambda$190/.accept(Unknown
> Source) [solr-prometheus-exporter-8.2.0.jar:8.2.0
> 31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe - ivera - 2019-07-19 15:10:57]
> at
> java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
> [?:?]
> at
> java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:177)
> [?:?]
> at
> java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1654)
> [?:?]
> at
> java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:497) [?:?]
> at
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:487)
> [?:?]
> at
> java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
> [?:?]
> at
> java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
> [?:?]
> at
> java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:239) [?:?]
> at
> java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497) [?:?]
> at
> org.apache.solr.prometheus.scraper.Async.lambda$waitForAllSuccessfulResponses$3(Async.java:43)
> [solr-prometheus-exporter-8.2.0.jar:8.2.0
> 31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe - ivera - 2019-07-19 15:10:57]
> at
> org.apache.solr.prometheus.scraper.Async$$Lambda$165/.apply(Unknown
> Source) [solr-prometheus-exporter-8.2.0.jar:8.2.0
> 31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe - ivera - 2019-07-19 15:10:57]
> at
> java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:986)
> [?:?]
> at
> java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:970)
> [?:?]
> at
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
> [?:?]
> at
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1705)
> [?:?]
> at
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209)
> [solr-solrj-8.2.0.jar:8.2.0 31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe -
> ivera - 2019-07-19 15:11:07]
> at
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$$Lambda$142/.run(Unknown
> Source) [solr-solrj-8.2.0.jar:8.2.0
> 31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe - ivera - 2019-07-19 15:11:07]
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> [?:?]
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> [?:?]
> at java.lang.Thread.run(Thread.java:834) [?:?]
> Caused by: java.lang.RuntimeException: java.lang.NullPointerException
> at
> org.apache.solr.prometheus.collector.SchedulerMetricsCollector.lambda$collectMetrics$0(SchedulerMetricsCollector.java:92)
> ~[solr-pr

security.json help

2020-10-19 Thread Mark Dadisman
Hey, I'm new to configuring Solr. I'm trying to configure Solr with Rule Based 
Authorization. 
https://lucene.apache.org/solr/guide/8_6/rule-based-authorization-plugin.html

I have permissions working if I allow everything with "all", but I want to 
limit access so that a site can only access its own collection, in addition to 
a server ping path, so I'm trying to add the collection-specific permission at 
the top:

"permissions": [
  {
"name": "custom-example",
"collection": "example",
"path": "*",
"role": [
  "admin",
  "example"
]
  },
  {
"name": "custom-collection",
"collection": "*",
"path": [
  "/admin/luke",
  "/admin/mbeans",
  "/admin/system"
],
"role": "*"
  },
  {
"name": "custom-ping",
"collection": null,
"path": [
  "/admin/info/system"
],
"role": "*"
  },
  {
"name": "all",
"role": "admin"
  }
]

The rule "custom-ping" works, and "all" works. But when the above permissions 
are used, access is denied to the "example" user-role for collection "example" 
at the path "/solr/example/select". If I specify paths explicitly, the 
permissions work, but I can't get permissions to work with path wildcards for a 
specific collection.

I also had to declare "custom-collection" with the specific paths needed to get 
collection info in order for those paths to work. I would've expected that 
these paths would be included in the collection-specific paths and be covered 
by the first rule, but they aren't. For example, the call to 
"/solr/example/admin/luke" will fail if the path is removed from this rule.

I don't really want to specify every single path I might need to use. Am I 
using the path wildcard wrong somehow? Is there a better way to do 
collection-specific authorizations for a collection "example"?

Thanks.
- M



Need help in understanding the below error message when running solr-exporter

2020-10-17 Thread yaswanth kumar
Using Solr 8.2; Zoo 3.4; Solr mode: Cloud with multiple collections; Basic
Authentication: Enabled

I am trying to run the solr-exporter:

export JAVA_OPTS="-Djavax.net.ssl.trustStore=etc/solr-keystore.jks
-Djavax.net.ssl.trustStorePassword=solrssl
-Dsolr.httpclient.builder.factory=org.apache.solr.client.solrj.impl.PreemptiveBasicAuthClientBuilderFactory
-Dbasicauth=solrrocks:"

export
CLASSPATH_PREFIX="../../server/solr-webapp/webapp/WEB-INF/lib/commons-codec-1.11.jar"

/bin/solr-exporter -p 8085 -z localhost:2181/solr -f
./conf/solr-exporter-config.xml -n 16

and I am seeing the below messages; on the Grafana Solr dashboard I do see the
panels coming in, but data is not populating in them.

Can someone help me figure out whether I am missing something in terms of configuration?

WARN  - 2020-10-17 11:17:59.687; org.apache.solr.prometheus.scraper.Async;
Error occurred during metrics collection =>
java.util.concurrent.ExecutionException: java.lang.RuntimeException:
java.lang.NullPointerException
at
java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395)
java.util.concurrent.ExecutionException: java.lang.RuntimeException:
java.lang.NullPointerException
at
java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395)
~[?:?]
at
java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999)
~[?:?]
at
org.apache.solr.prometheus.scraper.Async.lambda$null$1(Async.java:45)
[solr-prometheus-exporter-8.2.0.jar:8.2.0
31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe - ivera - 2019-07-19 15:10:57]
at
org.apache.solr.prometheus.scraper.Async$$Lambda$190/.accept(Unknown
Source) [solr-prometheus-exporter-8.2.0.jar:8.2.0
31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe - ivera - 2019-07-19 15:10:57]
at
java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
[?:?]
at
java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:177)
[?:?]
at
java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1654)
[?:?]
at
java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:497) [?:?]
at
java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:487)
[?:?]
at
java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
[?:?]
at
java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
[?:?]
at
java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:239) [?:?]
at
java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497) [?:?]
at
org.apache.solr.prometheus.scraper.Async.lambda$waitForAllSuccessfulResponses$3(Async.java:43)
[solr-prometheus-exporter-8.2.0.jar:8.2.0
31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe - ivera - 2019-07-19 15:10:57]
at
org.apache.solr.prometheus.scraper.Async$$Lambda$165/.apply(Unknown
Source) [solr-prometheus-exporter-8.2.0.jar:8.2.0
31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe - ivera - 2019-07-19 15:10:57]
at
java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:986)
[?:?]
at
java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:970)
[?:?]
at
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
[?:?]
at
java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1705)
[?:?]
at
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209)
[solr-solrj-8.2.0.jar:8.2.0 31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe -
ivera - 2019-07-19 15:11:07]
at
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$$Lambda$142/.run(Unknown
Source) [solr-solrj-8.2.0.jar:8.2.0
31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe - ivera - 2019-07-19 15:11:07]
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
[?:?]
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
[?:?]
at java.lang.Thread.run(Thread.java:834) [?:?]
Caused by: java.lang.RuntimeException: java.lang.NullPointerException
at
org.apache.solr.prometheus.collector.SchedulerMetricsCollector.lambda$collectMetrics$0(SchedulerMetricsCollector.java:92)
~[solr-prometheus-exporter-8.2.0.jar:8.2.0
31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe - ivera - 2019-07-19 15:10:57]
at
org.apache.solr.prometheus.collector.SchedulerMetricsCollector$$Lambda$163/.get(Unknown
Source) ~[?:?]
at
java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)
~[?:?]
... 5 more
Caused by: java.lang.NullPointerException
at
org.apache.solr.prometheus.scraper.SolrScraper.request(SolrScraper.java:112)
~[solr-prometheus-exporter-8.2.0.jar:8.2.0
31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe - iver

Re: Need urgent help -- High cpu on solr

2020-10-16 Thread Rahul Goswami
In addition to the insightful pointers by Zisis and Erick, I would like to
mention an approach in the link below that I generally use to pinpoint
exactly which threads are causing the CPU spike. Knowing this you can
understand which aspect of Solr (search thread, GC, update thread etc) is
taking more CPU and develop a mitigation strategy accordingly. (eg: if it's
a GC thread, maybe try tuning the params or switch to G1 GC). Just helps to
take the guesswork out of the many possible causes. Of course the
suggestions received earlier are best practices and should be taken into
consideration nevertheless.

https://backstage.forgerock.com/knowledge/kb/article/a39551500

The hex number the author talks about in the link above is the native
thread id.
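
As a rough sketch of that procedure (standard JDK tooling, nothing
Solr-specific; <solr_pid>, <tid>, and <hex> are placeholders):

  # 1. Show per-thread CPU usage inside the Solr JVM
  top -H -p <solr_pid>

  # 2. Convert the hottest thread's id (PID column) to hex
  printf '0x%x\n' <tid>

  # 3. Find that native thread id (nid) in a thread dump
  jstack <solr_pid> | grep -A 20 'nid=<hex>'

The stack frames under the matching nid show whether the hot thread is GC, a
searcher thread, an indexing thread, and so on.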

Best,
Rahul


On Wed, Oct 14, 2020 at 8:00 AM Erick Erickson 
wrote:

> Zisis makes good points. One other thing is I’d look to
> see if the CPU spikes coincide with commits. But GC
> is where I’d look first.
>
> Continuing on with the theme of caches, yours are far too large
> at first glance. The default is, indeed, size=512. Every time
> you open a new searcher, you’ll be executing 128 queries
> for autowarming the filterCache and another 128 for the queryResultCache.
> autowarming alone might be accounting for it. I’d reduce
> the size back to 512 and an autowarm count nearer 16
> and monitor the cache hit ratio. There’s little or no benefit
> in squeezing the last few percent from the hit ratio. If your
> hit ratio is small even with the settings you have, then your caches
> don’t do you much good anyway so I’d make them much smaller.
>
> You haven’t told us how often your indexes are
> updated, which will be significant CPU hit due to
> your autowarming.
>
> Once you’re done with that, I’d then try reducing the heap. Most
> of the actual searching is done in Lucene via MMapDirectory,
> which resides in the OS memory space. See:
>
> https://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
>
> Finally, if it is GC, consider G1GC if you’re not using that
> already.
>
> Best,
> Erick
>
>
> > On Oct 14, 2020, at 7:37 AM, Zisis T.  wrote:
> >
> > The values you have for the caches and the maxwarmingsearchers do not
> look
> > like the default. Cache sizes are 512 for the most part and
> > maxwarmingsearchers are 2 (if not limit them to 2)
> >
> > Sudden CPU spikes probably indicate GC issues. The #  of documents you
> have
> > is small, are they huge documents? The # of collections is OK in general
> but
> > since they are crammed in 5 Solr nodes the memory requirements might be
> > bigger. Especially if filter and the other caches get populated with 50K
> > entries.
> >
> > I'd first go through the GC activity to make sure that this is not
> causing
> > the issue. The fact that you lose some Solr servers is also an indicator
> of
> > large GC pauses that might create a problem when Solr communicates with
> > Zookeeper.
> >
> >
> >
> > --
> > Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>
>


Re: Need urgent help -- High cpu on solr

2020-10-14 Thread Erick Erickson
Zisis makes good points. One other thing is I’d look to 
see if the CPU spikes coincide with commits. But GC
is where I’d look first.

Continuing on with the theme of caches, yours are far too large
at first glance. The default is, indeed, size=512. Every time
you open a new searcher, you’ll be executing 128 queries
for autowarming the filterCache and another 128 for the queryResultCache.
autowarming alone might be accounting for it. I’d reduce
the size back to 512 and an autowarm count nearer 16
and monitor the cache hit ratio. There’s little or no benefit
in squeezing the last few percent from the hit ratio. If your
hit ratio is small even with the settings you have, then your caches
don’t do you much good anyway so I’d make them much smaller.
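
If it helps, those sizes can be changed without hand-editing solrconfig.xml via
the Config API (a sketch using the 512/16 values suggested above; the
collection name is illustrative):

  # Shrink the filterCache and its autowarm count
  curl -X POST -H 'Content-type:application/json' \
    'http://localhost:8983/solr/mycollection/config' -d '{
    "set-property": {
      "query.filterCache.size": 512,
      "query.filterCache.autowarmCount": 16
    }
  }'

The matching query.queryResultCache.* and query.documentCache.* properties
cover the other caches.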

You haven’t told us how often your indexes are
updated, which will be significant CPU hit due to
your autowarming.

Once you’re done with that, I’d then try reducing the heap. Most
of the actual searching is done in Lucene via MMapDirectory,
which resides in the OS memory space. See:

https://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

Finally, if it is GC, consider G1GC if you’re not using that
already.

Best,
Erick


> On Oct 14, 2020, at 7:37 AM, Zisis T.  wrote:
> 
> The values you have for the caches and the maxwarmingsearchers do not look
> like the default. Cache sizes are 512 for the most part and
> maxwarmingsearchers are 2 (if not limit them to 2)
> 
> Sudden CPU spikes probably indicate GC issues. The #  of documents you have
> is small, are they huge documents? The # of collections is OK in general but
> since they are crammed in 5 Solr nodes the memory requirements might be
> bigger. Especially if filter and the other caches get populated with 50K
> entries. 
> 
> I'd first go through the GC activity to make sure that this is not causing
> the issue. The fact that you lose some Solr servers is also an indicator of
> large GC pauses that might create a problem when Solr communicates with
> Zookeeper. 
> 
> 
> 
> --
> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html



Re: Need urgent help -- High cpu on solr

2020-10-14 Thread Zisis T.
The values you have for the caches and maxWarmingSearchers do not look
like the defaults. Cache sizes are 512 for the most part, and
maxWarmingSearchers is 2 (if yours is not, limit it to 2).

Sudden CPU spikes probably indicate GC issues. The # of documents you have
is small; are they huge documents? The # of collections is OK in general, but
since they are crammed into 5 Solr nodes the memory requirements might be
bigger, especially if the filter and other caches get populated with 50K
entries.

I'd first go through the GC activity to make sure that this is not causing
the issue. The fact that you lose some Solr servers is also an indicator of
large GC pauses that might create a problem when Solr communicates with
Zookeeper.
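
A quick way to get that GC visibility (a sketch assuming the stock solr.in.sh
startup script; the pause target is illustrative, not a recommendation):

  # solr.in.sh: switch the JVM to G1 with a pause-time goal
  GC_TUNE="-XX:+UseG1GC -XX:MaxGCPauseMillis=250"

Solr's start script already writes GC logs (solr_gc.log*) into the logs
directory by default, so pauses can be lined up against the timestamps of the
CPU spikes.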



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Need urgent help -- High cpu on solr

2020-10-13 Thread yaswanth kumar
I am using Solr 8.2 with ZooKeeper 3.4 and have configured a 5-node Solr Cloud
with around 100 collections, each collection having ~20k documents.

These nodes are VMs with 6-core CPUs (2 cores per socket). All of a sudden we
are seeing spikes on the CPUs, which brought down some nodes (GONE state in
Solr Cloud), and we also faced latencies while trying to log in to those nodes
over SSH.

Memory: 32 GB, of which 20 GB was allotted for the JVM heap in the Solr config.

 
 
 
[solrconfig.xml cache settings; the cache element names and size attributes
were stripped by the mail archive, leaving only the values 200, 100, true,
false, and 4]
These are just from the defaults that shipped with SOLR package.

One data point is that these nodes get very frequent search hits, so do I need
to consider increasing the above sizes to bring down the CPU usage and see a
more stable Solr Cloud?

-- 
Thanks & Regards,
Yaswanth Kumar Konathala.
yaswanth...@gmail.com


Need help in trying to understand the error

2020-10-13 Thread yaswanth kumar
I am seeing the below errors frequently in the Solr logs. Every functionality
seems to be working fine, but I'm not really sure why there are lots of these
errors happening in the backend.

Using: Solr 8.2, ZooKeeper 3.4.
We have enabled Solr basic authentication with security.json.

2020-10-13 20:37:12.320 ERROR (qtp969996005-4438) [   ]
o.a.s.c.s.i.HttpClientUtil  => org.apache.solr.common.SolrException:
javax.crypto.BadPaddingException: RSA private key operation failed
at org.apache.solr.util.CryptoKeys$RSAKeyPair.encrypt(CryptoKeys.java:325)
org.apache.solr.common.SolrException: javax.crypto.BadPaddingException: RSA
private key operation failed
at org.apache.solr.util.CryptoKeys$RSAKeyPair.encrypt(CryptoKeys.java:325)
~[solr-core-8.2.0.jar:8.2.0 31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe -
ivera - 2019-07-19 15:11:04]
at
org.apache.solr.security.PKIAuthenticationPlugin.generateToken(PKIAuthenticationPlugin.java:305)
~[solr-core-8.2.0.jar:8.2.0 31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe -
ivera - 2019-07-19 15:11:04]
at
org.apache.solr.security.PKIAuthenticationPlugin.setHeader(PKIAuthenticationPlugin.java:311)
~[solr-core-8.2.0.jar:8.2.0 31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe -
ivera - 2019-07-19 15:11:04]
at
org.apache.solr.security.PKIAuthenticationPlugin$HttpHeaderClientInterceptor.process(PKIAuthenticationPlugin.java:271)
~[solr-core-8.2.0.jar:8.2.0 31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe -
ivera - 2019-07-19 15:11:04]
at
org.apache.solr.client.solrj.impl.HttpClientUtil$DynamicInterceptor$1.accept(HttpClientUtil.java:179)
~[solr-solrj-8.2.0.jar:8.2.0 31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe -
ivera - 2019-07-19 15:11:07]
at
org.apache.solr.client.solrj.impl.HttpClientUtil$DynamicInterceptor$1.accept(HttpClientUtil.java:174)
~[solr-solrj-8.2.0.jar:8.2.0 31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe -
ivera - 2019-07-19 15:11:07]
at
java.util.concurrent.CopyOnWriteArrayList.forEach(CopyOnWriteArrayList.java:804)
~[?:?]
at
org.apache.solr.client.solrj.impl.HttpClientUtil$DynamicInterceptor.process(HttpClientUtil.java:174)
~[solr-solrj-8.2.0.jar:8.2.0 31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe -
ivera - 2019-07-19 15:11:07]
at
org.apache.http.protocol.ImmutableHttpProcessor.process(ImmutableHttpProcessor.java:133)
~[httpcore-4.4.10.jar:4.4.10]
at
org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:183)
~[httpclient-4.5.6.jar:4.5.6]
at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
~[httpclient-4.5.6.jar:4.5.6]
at
org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
~[httpclient-4.5.6.jar:4.5.6]
at
org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
~[httpclient-4.5.6.jar:4.5.6]
at
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
~[httpclient-4.5.6.jar:4.5.6]
at
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
~[httpclient-4.5.6.jar:4.5.6]
at org.apache.solr.servlet.HttpSolrCall.remoteQuery(HttpSolrCall.java:688)
~[solr-core-8.2.0.jar:8.2.0 31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe -
ivera - 2019-07-19 15:11:04]
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:550)
~[solr-core-8.2.0.jar:8.2.0 31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe -
ivera - 2019-07-19 15:11:04]
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:423)
~[solr-core-8.2.0.jar:8.2.0 31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe -
ivera - 2019-07-19 15:11:04]
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:350)
~[solr-core-8.2.0.jar:8.2.0 31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe -
ivera - 2019-07-19 15:11:04]
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602)
~[jetty-servlet-9.4.19.v20190610.jar:9.4.19.v20190610]
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)
~[jetty-servlet-9.4.19.v20190610.jar:9.4.19.v20190610]
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
~[jetty-server-9.4.19.v20190610.jar:9.4.19.v20190610]
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
~[jetty-security-9.4.19.v20190610.jar:9.4.19.v20190610]
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
~[jetty-server-9.4.19.v20190610.jar:9.4.19.v20190610]
at
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
~[jetty-server-9.4.19.v20190610.jar:9.4.19.v20190610]
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1711)
~[jetty-server-9.4.19.v20190610.jar:9.4.19.v20190610]
at
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
~[jetty-server-9.4.19.v20190610.jar:9.4.19.v20190610]
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1347)
~[jetty-server-9.4.19.v20190610.jar:9.4.19.v20190610]
at
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)

Re: Help with uploading files to a core.

2020-10-11 Thread Walter Underwood
Solr is not a database. You can make a huge mess pretending it is a DB.

Also, it doesn’t store files.

What is your use case?

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Oct 11, 2020, at 1:28 PM, Guilherme dos Reis Meneguello 
>  wrote:
> 
> Hello! My name is Guilherme and I'm a new user of Solr.
> 
> Basically, I'm developing a database to help a research team in my
> university, but I'm having some problems uploading the files to the
> database. Either using curl commands or through the admin interface, I
> can't quite upload the files from my computer to Solr and set up the field
> types I want that file to have while indexed. I can do that through the
> document builder, but my intent was to have the research team I'm
> supporting just upload them through the terminal or something like that. My
> schema is all set up nicely, however the Solr's field class guessing isn't
> guessing correctly.
> 
> The reference guides in lucene apache's website didn't help me much. I'm
> pretty newbie when it comes to this field, but I feel it's something really
> basic that I'm missing. If anyone could help me or point me in the right
> direction, I'd be really thankful.
> 
> Regards,
> Guilherme.
> 



Re: Help with uploading files to a core.

2020-10-11 Thread Shawn Heisey

On 10/11/2020 2:28 PM, Guilherme dos Reis Meneguello wrote:

Hello! My name is Guilherme and I'm a new user of Solr.

Basically, I'm developing a database to help a research team in my
university, but I'm having some problems uploading the files to the
database. Either using curl commands or through the admin interface, I
can't quite upload the files from my computer to Solr and set up the field
types I want that file to have while indexed. I can do that through the
document builder, but my intent was to have the research team I'm
supporting just upload them through the terminal or something like that. My
schema is all set up nicely, however the Solr's field class guessing isn't
guessing correctly.


If you're using the capability to automatically add unknown fields, then 
your schema is NOT "all set up nicely".  It's apparently not set up at all.


The "add unknown fields" update processor is not recommended for 
production, because as you have noticed, it sometimes guesses the field 
type incorrectly.  The fact that it guesses incorrectly is not a bug ... 
we can't fix it because it's not actually broken.  Getting it right in 
every case is not possible.


Your best bet will be to set up the entire schema manually in advance of 
any indexing.  To do that, you're going to have to know every field that 
the data uses, and have field definitions already in the schema.
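
For example, a field can be declared up front through the Schema API with the 
same kind of curl command you are already using for uploads (the core name 
"mycore" and the field definition below are only placeholders for your own):

curl -X POST -H 'Content-type:application/json' \
  http://localhost:8983/solr/mycore/schema --data-binary '{
    "add-field": { "name": "title", "type": "text_general", "stored": true }
  }'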


Thanks,
Shawn


Help with uploading files to a core.

2020-10-11 Thread Guilherme dos Reis Meneguello
Hello! My name is Guilherme and I'm a new user of Solr.

Basically, I'm developing a database to help a research team in my
university, but I'm having some problems uploading the files to the
database. Either using curl commands or through the admin interface, I
can't quite upload the files from my computer to Solr and set up the field
types I want that file to have while indexed. I can do that through the
document builder, but my intent was to have the research team I'm
supporting just upload them through the terminal or something like that. My
schema is all set up nicely, however the Solr's field class guessing isn't
guessing correctly.

The reference guides in lucene apache's website didn't help me much. I'm
pretty newbie when it comes to this field, but I feel it's something really
basic that I'm missing. If anyone could help me or point me in the right
direction, I'd be really thankful.

Regards,
Guilherme.



Re: Help using Noggit for streaming JSON data

2020-10-07 Thread Christopher Schultz
Yonik,

Thanks for the reply, and apologies for the long delay in getting back to you. Also 
apologies for top-posting; I’m writing from my phone. :(

Oh, of course... simply subclass the CharArr.

In my case, I should be able to immediately base64-decode the value (saves 1/4 
in-memory representation) and, if I do everything correctly, may be able to 
stream directly to my database.

With a *very* complicated CharArr implementation of course :)

Thanks,
-chris

> On Sep 17, 2020, at 12:22, Yonik Seeley  wrote:
> 
> See this method:
> 
>  /** Reads a JSON string into the output, decoding any escaped characters.
> */
>  public void getString(CharArr output) throws IOException
> 
> And then the idea is to create a subclass of CharArr to incrementally
> handle the string that is written to it.
> You could overload write methods, or perhaps reserve() to flush/handle the
> buffer when it reaches a certain size.
> 
> -Yonik
> 
> 
>> On Thu, Sep 17, 2020 at 11:48 AM Christopher Schultz <
>> ch...@christopherschultz.net> wrote:
>> 
>> All,
>> 
>> Is this an appropriate forum for asking questions about how to use
>> Noggit? The Github doesn't have any discussions available and filing an
>> "issue" to ask a question is kinda silly. I'm happy to be redirected to
>> the right place if this isn't appropriate.
>> 
>> I've been able to figure out most things in Noggit by reading the code,
>> but I have a new use-case where I expect that I'll have very large
>> values (base64-encoded binary) and I'd like to stream those rather than
>> calling parser.getString() and getting a potentially huge string coming
>> back. I'm streaming into a database so I never need the whole string in
>> one place at one time.
>> 
>> I was thinking something like this:
>> 
>> JSONParser p = ...;
>> 
>> int evt = p.nextEvent();
>> if(JSONParser.STRING == evt) {
>>  // Start streaming
>>  boolean eos = false;
>>  while(!eos) {
>>char c = p.getChar();
>>if(c == '"') {
>>  eos = true;
>>} else {
>>  append to stream
>>}
>>  }
>> }
>> 
>> But getChar() is not public. The only "documentation" I've really been
>> able to find for Noggit is this post from Yonik back in 2014:
>> 
>> http://yonik.com/noggit-json-parser/
>> 
>> It mostly says "Noggit is great!" and specifically mentions huge, long
>> strings but does not actually show any Java code to consume the JSON
>> data in any kind of streaming way.
>> 
>> The ObjectBuilder class is a great user of JSONParser, but it just
>> builds standard objects and would consume tons of memory in my case.
>> 
>> I know for sure that Solr consumes huge JSON documents and I'm assuming
>> that Noggit is being used in that situation, though I have not looked at
>> the code used to do that.
>> 
>> Any suggestions?
>> 
>> -chris
>> 


Re: Help using Noggit for streaming JSON data

2020-09-17 Thread Yonik Seeley
See this method:

  /** Reads a JSON string into the output, decoding any escaped characters.
*/
  public void getString(CharArr output) throws IOException

And then the idea is to create a subclass of CharArr to incrementally
handle the string that is written to it.
You could overload write methods, or perhaps reserve() to flush/handle the
buffer when it reaches a certain size.
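
A very rough sketch of that idea (it assumes the noggit CharArr write(char) and
write(char[], int, int) overloads are overridable in your version; the Writer
sink is just a stand-in for your base64-decoding / database stream):

import java.io.IOException;
import java.io.Writer;
import org.noggit.CharArr;

class StreamingCharArr extends CharArr {
  private final Writer sink;                 // wherever the decoded chars should go
  StreamingCharArr(Writer sink) { this.sink = sink; }

  @Override
  public void write(char c) {
    try { sink.write(c); } catch (IOException e) { throw new RuntimeException(e); }
  }

  @Override
  public void write(char[] b, int off, int len) {
    try { sink.write(b, off, len); } catch (IOException e) { throw new RuntimeException(e); }
  }
}

// usage idea, inside the event loop from the original question:
//   if (evt == JSONParser.STRING) parser.getString(new StreamingCharArr(mySink));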

-Yonik


On Thu, Sep 17, 2020 at 11:48 AM Christopher Schultz <
ch...@christopherschultz.net> wrote:

> All,
>
> Is this an appropriate forum for asking questions about how to use
> Noggit? The Github doesn't have any discussions available and filing an
> "issue" to ask a question is kinda silly. I'm happy to be redirected to
> the right place if this isn't appropriate.
>
> I've been able to figure out most things in Noggit by reading the code,
> but I have a new use-case where I expect that I'll have very large
> values (base64-encoded binary) and I'd like to stream those rather than
> calling parser.getString() and getting a potentially huge string coming
> back. I'm streaming into a database so I never need the whole string in
> one place at one time.
>
> I was thinking something like this:
>
> JSONParser p = ...;
>
> int evt = p.nextEvent();
> if(JSONParser.STRING == evt) {
>   // Start streaming
>   boolean eos = false;
>   while(!eos) {
> char c = p.getChar();
> if(c == '"') {
>   eos = true;
> } else {
>   append to stream
> }
>   }
> }
>
> But getChar() is not public. The only "documentation" I've really been
> able to find for Noggit is this post from Yonik back in 2014:
>
> http://yonik.com/noggit-json-parser/
>
> It mostly says "Noggit is great!" and specifically mentions huge, long
> strings but does not actually show any Java code to consume the JSON
> data in any kind of streaming way.
>
> The ObjectBuilder class is a great user of JSONParser, but it just
> builds standard objects and would consume tons of memory in my case.
>
> I know for sure that Solr consumes huge JSON documents and I'm assuming
> that Noggit is being used in that situation, though I have not looked at
> the code used to do that.
>
> Any suggestions?
>
> -chris
>


Help using Noggit for streaming JSON data

2020-09-17 Thread Christopher Schultz
All,

Is this an appropriate forum for asking questions about how to use
Noggit? The Github doesn't have any discussions available and filing an
"issue" to ask a question is kinda silly. I'm happy to be redirected to
the right place if this isn't appropriate.

I've been able to figure out most things in Noggit by reading the code,
but I have a new use-case where I expect that I'll have very large
values (base64-encoded binary) and I'd like to stream those rather than
calling parser.getString() and getting a potentially huge string coming
back. I'm streaming into a database so I never need the whole string in
one place at one time.

I was thinking something like this:

JSONParser p = ...;

int evt = p.nextEvent();
if(JSONParser.STRING == evt) {
  // Start streaming
  boolean eos = false;
  while(!eos) {
char c = p.getChar();
if(c == '"') {
  eos = true;
} else {
  append to stream
}
  }
}

But getChar() is not public. The only "documentation" I've really been
able to find for Noggit is this post from Yonik back in 2014:

http://yonik.com/noggit-json-parser/

It mostly says "Noggit is great!" and specifically mentions huge, long
strings but does not actually show any Java code to consume the JSON
data in any kind of streaming way.

The ObjectBuilder class is a great user of JSONParser, but it just
builds standard objects and would consume tons of memory in my case.

I know for sure that Solr consumes huge JSON documents and I'm assuming
that Noggit is being used in that situation, though I have not looked at
the code used to do that.

Any suggestions?

-chris


SolrJ and AWS help!

2020-09-15 Thread Dhara Patel
Hi,

I’m a student working on a personal project that uses SolrJ to search a 
database using a solr core that lives on an AWS instance. Currently, I am using 
EmbeddedSolrServer() to initialize a Solr core.

CoreContainer.Initializer initializer = new CoreContainer.Initializer();
CoreContainer coreContainer = initializer.initialize();
solr = new EmbeddedSolrServer(coreContainer, "test-vpn");

solr_active = true; //successfully connected to solr core on aws

I would love your input on whether or not this is the correct method for this 
particular implementation.
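
For what it is worth, when the core lives on a remote host (such as an EC2
instance), SolrJ is usually pointed at it over HTTP rather than embedded; a
minimal sketch, with the base URL purely a placeholder:

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class RemoteSolrExample {
  public static void main(String[] args) throws Exception {
    // the base URL is a placeholder for the AWS-hosted Solr instance and core
    try (SolrClient solr = new HttpSolrClient.Builder(
        "http://my-aws-host:8983/solr/test-vpn").build()) {
      QueryResponse rsp = solr.query(new SolrQuery("*:*"));
      System.out.println("docs found: " + rsp.getResults().getNumFound());
    }
  }
}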

Thanks!
Dhara


Sent from Mail for Windows 10



Re: BasicAuth help

2020-09-08 Thread Dwane Hall
Just adding some assistance to the Solr-LDAP integration options. A colleague 
of mine wrote a plugin that adopts a similar approach to the one Jan suggested 
of "plugging-in" an LDAP provider.

He provides the following notes on its design and use



1.   It authenticates with LDAP on every request, which can be expensive. In the
same repo he has written an optimisation for a gremlin-ldap-plugin that could
probably be ported here: once LDAP successfully authenticates, the plugin caches
the credentials locally as a BCrypt hash and uses that cached hash to validate
subsequent requests until the cache times out, at which point it goes back to
LDAP again. Any password changes in LDAP are therefore still picked up correctly.
The caching can be turned on and off with a parameter, depending on how expensive
the LDAP auth is. (A rough sketch of this caching idea follows the list below.)

2.  He had to copy large swaths of code from 
org.apache.solr.security.RuleBasedAuthorizationPlugin into the ldap 
authorisation plugin because the Solr class is not extensible. Refactoring the
class to make extension easier would prevent this.

3.  Finally, the inter-node authentication. Need to look into it to see if 
there is a mechanism to extend the inter-node auth to include roles in the 
payload so that LDAP role look up isn’t happening on every node that request 
ends up hitting.
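
A very rough sketch of the caching idea from point 1 above (not the plugin's
actual code; it assumes the jBCrypt library, org.mindrot.jbcrypt.BCrypt, and a
hypothetical ldapBind() that performs the real LDAP authentication):

import java.util.concurrent.ConcurrentHashMap;
import org.mindrot.jbcrypt.BCrypt;

class CachedLdapAuthenticator {
  private static final long TTL_MS = 5 * 60 * 1000;      // cache timeout

  private static final class Entry {
    final String bcryptHash; final long expiresAt;
    Entry(String h, long e) { bcryptHash = h; expiresAt = e; }
  }

  private final ConcurrentHashMap<String, Entry> cache = new ConcurrentHashMap<>();

  boolean authenticate(String user, String password) {
    Entry e = cache.get(user);
    if (e != null && e.expiresAt > System.currentTimeMillis()
        && BCrypt.checkpw(password, e.bcryptHash)) {
      return true;                                        // validated against cached hash
    }
    if (!ldapBind(user, password)) return false;          // cache miss or expired: ask LDAP
    cache.put(user, new Entry(BCrypt.hashpw(password, BCrypt.gensalt()),
        System.currentTimeMillis() + TTL_MS));            // refresh the cached hash
    return true;
  }

  // placeholder: the real plugin binds against the configured LDAP server here
  private boolean ldapBind(String user, String password) { return false; }
}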



But if someone really wants LDAP integration they can use it as is. It's a good 
starting point anyway.  (https://github.com/vjgorla/solr-ldap-plugin)

Thanks,

Dwane

From: Jan Høydahl 
Sent: Monday, 7 September 2020 5:21 PM
To: solr-user@lucene.apache.org 
Subject: Re: BasicAuth help

That github patch is interesting.
My initial proposal for how to plug LDAP into Solr was to make the 
AuthenticationProvider pluggable in BasicAuthPlugin, so you could plug in an 
LDAPAuthProvider. See https://issues.apache.org/jira/browse/SOLR-8951 
<https://issues.apache.org/jira/browse/SOLR-8951>. No need to replace the whole 
BasicAuth class I think. Anyone who wants to give it a shot, borrowing some 
code from the ldap_solr repo, feel free :)

Jan

> 4. sep. 2020 kl. 09:43 skrev Aroop Ganguly :
>
> Try looking at a simple ldap authentication suggested here: 
> https://github.com/itzmestar/ldap_solr 
> <https://github.com/itzmestar/ldap_solr>
> You can combine this for authentication and couple it with rule based 
> authorization.
>
>
>
>> On Aug 28, 2020, at 12:26 PM, Vanalli, Ali A - DOT > <mailto:ali.vana...@dot.wi.gov>> wrote:
>>
>> Hello,
>>
>> Solr is running on windows machine and wondering if it possible to setup 
>> BasicAuth with the LDAP?
>>
>> Also, tried the example of Basic-Authentication that is published 
>> here<https://lucene.apache.org/solr/guide/8_6/rule-based-authorization-plugin.html#rule-based-authorization-plugin
>>  
>> <https://lucene.apache.org/solr/guide/8_6/rule-based-authorization-plugin.html#rule-based-authorization-plugin>>
>>  but this did not work too.
>>
>> Thanks...Ali
>>
>>
>



Re: BasicAuth help

2020-09-07 Thread Jan Høydahl
That github patch is interesting.
My initial proposal for how to plug LDAP into Solr was to make the 
AuthenticationProvider pluggable in BasicAuthPlugin, so you could plug in an 
LDAPAuthProvider. See https://issues.apache.org/jira/browse/SOLR-8951 
. No need to replace the whole 
BasicAuth class I think. Anyone who wants to give it a shot, borrowing some 
code from the ldap_solr repo, feel free :)

Jan

> 4. sep. 2020 kl. 09:43 skrev Aroop Ganguly :
> 
> Try looking at a simple ldap authentication suggested here: 
> https://github.com/itzmestar/ldap_solr 
> 
> You can combine this for authentication and couple it with rule based 
> authorization.
> 
> 
> 
>> On Aug 28, 2020, at 12:26 PM, Vanalli, Ali A - DOT > > wrote:
>> 
>> Hello,
>> 
>> Solr is running on windows machine and wondering if it possible to setup 
>> BasicAuth with the LDAP?
>> 
>> Also, tried the example of Basic-Authentication that is published 
>> here>  
>> >
>>  but this did not work too.
>> 
>> Thanks...Ali
>> 
>> 
> 



Re: BasicAuth help

2020-09-04 Thread Joe Doupnik
    There is an effective alternative approach to placing 
authentication within Solr. It is to use the web server (say Apache) as 
a smart proxy to Solr and in so doing also apply access restrictions of 
various kinds. Thus Solr remains intact, no addition needed for 
authentication, and authentication can be accomplished with a known 
robust tool.
    Sketching the Apache part, to clarify matters. This example 
requires both an IP range and an LDAP authentication, and it supports 
https as well.


    
    <Location "/solr">
        require ip  11.22.33.44/24  5.6.7.8/28
        AuthType Basic
        AuthBasicProvider ldap
        AuthName "Solr"
        AuthLDAPUrl ldap://example.com/o=GCHQ?uid?one?(objectClass=user)
        require ldap-user admin james moneypenny
        proxypass  "http://localhost:8983/solr"  keepalive=on
        proxypassreverse  "http://localhost:8983/solr"
    </Location>
    

    Above, localhost can be replaced with the DNS name of another 
machine, the one where Solr itself resides. The URI name /solr is clearly
something which we can choose to suit ourselves. This example may be 
enhanced for local requirements.
    The Apache manual has full details, naturally. It is important to 
use proven robust tools when we deal with the bad guys.

    Thanks,
    Joe D.

On 04/09/2020 08:43, Aroop Ganguly wrote:

Try looking at a simple ldap authentication suggested here: 
https://github.com/itzmestar/ldap_solr 
You can combine this for authentication and couple it with rule based 
authorization.




On Aug 28, 2020, at 12:26 PM, Vanalli, Ali A - DOT mailto:ali.vana...@dot.wi.gov>> wrote:

Hello,

Solr is running on windows machine and wondering if it possible to setup 
BasicAuth with the LDAP?

Also, tried the example of Basic-Authentication that is published 
here>
 but this did not work too.

Thanks...Ali








Re: BasicAuth help

2020-09-04 Thread Aroop Ganguly
Try looking at a simple ldap authentication suggested here: 
https://github.com/itzmestar/ldap_solr 
You can combine this for authentication and couple it with rule based 
authorization.



> On Aug 28, 2020, at 12:26 PM, Vanalli, Ali A - DOT  > wrote:
> 
> Hello,
> 
> Solr is running on windows machine and wondering if it possible to setup 
> BasicAuth with the LDAP?
> 
> Also, tried the example of Basic-Authentication that is published 
> here  
> >
>  but this did not work too.
> 
> Thanks...Ali
> 
> 



Re: BasicAuth help

2020-09-03 Thread Jason Gerlowski
Hi Ali,

1. Solr doesn't have any support for LDAP authentication ootb (at
least, as far as I'm aware).  The BasicAuth plugin requires users to
be defined in the JSON configuration (a stock security.json example is
sketched just below this message).

2. What failed when you ran the documented BasicAuth example?  What
error messages did you get etc.?  If there's something wrong with that
example, maybe we can fix the docs.

Jason
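
For reference, the JSON configuration Jason mentions in point 1 is the
security.json uploaded to Solr; the stock example from the ref guide looks
roughly like this (the credentials line is the documented sample user "solr"
with password "SolrRocks" and must be replaced before any real use):

{
  "authentication": {
    "blockUnknown": true,
    "class": "solr.BasicAuthPlugin",
    "credentials": {
      "solr": "IV0EHq1OnNrj6gvRCwvFwTrZ1+z1oBbnQdiVC3otuq0= Ndd7LKvVBAaZIF0QAVi1ekCfAJXr1GGfLtRUXhgrF8c="
    }
  },
  "authorization": {
    "class": "solr.RuleBasedAuthorizationPlugin",
    "user-role": { "solr": "admin" },
    "permissions": [ { "name": "security-edit", "role": "admin" } ]
  }
}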

On Fri, Aug 28, 2020 at 3:28 PM Vanalli, Ali A - DOT
 wrote:
>
> Hello,
>
> Solr is running on windows machine and wondering if it possible to setup 
> BasicAuth with the LDAP?
>
> Also, tried the example of Basic-Authentication that is published 
> here
>  but this did not work too.
>
> Thanks...Ali
>
>


BasicAuth help

2020-08-28 Thread Vanalli, Ali A - DOT
Hello,

Solr is running on windows machine and wondering if it possible to setup 
BasicAuth with the LDAP?

Also, tried the example of Basic-Authentication that is published 
here
 but this did not work too.

Thanks...Ali




Re: Production Issue: TIMED_WAITING - Will net.ipv4.tcp_tw_reuse=1 help?

2020-08-11 Thread Doss
LAMP solution when the Apache server can't
>>>> anymore connect to the MySQL/MariaDB database.
>>>> In this case, tweak net.ipv4.tcp_tw_reuse is a possible solution (but
>>>> never net.ipv4.tcp_tw_recycle as you suggested in your previous post).
>>>> This
>>>> is well explained in this great article
>>>> https://vincent.bernat.ch/en/blog/2014-tcp-time-wait-state-linux
>>>>
>>>> However, in general and more specifically in your case, I would
>>>> investigate
>>>> the root cause of your issue and do not try to find a workaround.
>>>>
>>>> Can you provide more information about your use case (we know : 3 node
>>>> SOLR
>>>> (8.3.1 NRT) + 3 Node Zookeeper Ensemble) ?
>>>>
>>>>- hardware architecture and sizing
>>>>- JVM version / settings
>>>>- Solr settings
>>>>- collections and queries information
>>>>- gc logs or gceasy results
>>>>
>>>> Regards
>>>>
>>>> Dominique
>>>>
>>>>
>>>>
>>>> Le lun. 10 août 2020 à 15:43, Doss  a écrit :
>>>>
>>>> > Hi,
>>>> >
>>>> > In solr 8.3.1 source, I see the following , which I assume could be
>>>> the
>>>> > reason for the issue "Max requests queued per destination 3000
>>>> exceeded for
>>>> > HttpDestination",
>>>> >
>>>> >
>>>> solr/solrj/src/java/org/apache/solr/client/solrj/impl/Http2SolrClient.java:
>>>> >private static final int MAX_OUTSTANDING_REQUESTS = 1000;
>>>> >
>>>> solr/solrj/src/java/org/apache/solr/client/solrj/impl/Http2SolrClient.java:
>>>> >  available = new Semaphore(MAX_OUTSTANDING_REQUESTS, false);
>>>> >
>>>> solr/solrj/src/java/org/apache/solr/client/solrj/impl/Http2SolrClient.java:
>>>> >  return MAX_OUTSTANDING_REQUESTS * 3;
>>>> >
>>>> > how can I increase this?
>>>> >
>>>> > On Mon, Aug 10, 2020 at 12:01 AM Doss  wrote:
>>>> >
>>>> > > Hi,
>>>> > >
>>>> > > We are having 3 node SOLR (8.3.1 NRT) + 3 Node Zookeeper Ensemble
>>>> now and
>>>> > > then we are facing "Max requests queued per destination 3000
>>>> exceeded for
>>>> > > HttpDestination"
>>>> > >
>>>> > > After restart evering thing starts working fine until another
>>>> problem.
>>>> > > Once a problem occurred we are seeing soo many TIMED_WAITING threads
>>>> > >
>>>> > > Server 1:
>>>> > >*7722*  Threads are in TIMED_WATING
>>>> > >
>>>> >
>>>> ("lock":"java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@151d5f2f
>>>> > > ")
>>>> > > Server 2:
>>>> > >*4046*   Threads are in TIMED_WATING
>>>> > >
>>>> >
>>>> ("lock":"java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@1e0205c3
>>>> > > ")
>>>> > > Server 3:
>>>> > >*4210*   Threads are in TIMED_WATING
>>>> > >
>>>> >
>>>> ("lock":"java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@5ee792c0
>>>> > > ")
>>>> > >
>>>> > > Please suggest whether net.ipv4.tcp_tw_reuse=1 will help ? or how
>>>> can we
>>>> > > increase the 3000 limit?
>>>> > >
>>>> > > Sorry, since I haven't got any response to my previous query,  I am
>>>> > > creating this as new,
>>>> > >
>>>> > > Thanks,
>>>> > > Mohandoss.
>>>> > >
>>>> >
>>>>
>>>


Re: Production Issue: TIMED_WAITING - Will net.ipv4.tcp_tw_reuse=1 help?

2020-08-10 Thread Doss
Zookeeper Ensemble) ?
>>>
>>>- hardware architecture and sizing
>>>- JVM version / settings
>>>- Solr settings
>>>- collections and queries information
>>>- gc logs or gceasy results
>>>
>>> Regards
>>>
>>> Dominique
>>>
>>>
>>>
>>> Le lun. 10 août 2020 à 15:43, Doss  a écrit :
>>>
>>> > Hi,
>>> >
>>> > In solr 8.3.1 source, I see the following , which I assume could be the
>>> > reason for the issue "Max requests queued per destination 3000
>>> exceeded for
>>> > HttpDestination",
>>> >
>>> > solr/solrj/src/java/org/apache/solr/client/solrj/impl/
>>> Http2SolrClient.java:
>>> >private static final int MAX_OUTSTANDING_REQUESTS = 1000;
>>> > solr/solrj/src/java/org/apache/solr/client/solrj/impl/
>>> Http2SolrClient.java:
>>> >  available = new Semaphore(MAX_OUTSTANDING_REQUESTS, false);
>>> > solr/solrj/src/java/org/apache/solr/client/solrj/impl/
>>> Http2SolrClient.java:
>>> >  return MAX_OUTSTANDING_REQUESTS * 3;
>>> >
>>> > how can I increase this?
>>> >
>>> > On Mon, Aug 10, 2020 at 12:01 AM Doss  wrote:
>>> >
>>> > > Hi,
>>> > >
>>> > > We are having 3 node SOLR (8.3.1 NRT) + 3 Node Zookeeper Ensemble
>>> now and
>>> > > then we are facing "Max requests queued per destination 3000
>>> exceeded for
>>> > > HttpDestination"
>>> > >
>>> > > After restart evering thing starts working fine until another
>>> problem.
>>> > > Once a problem occurred we are seeing soo many TIMED_WAITING threads
>>> > >
>>> > > Server 1:
>>> > >*7722*  Threads are in TIMED_WATING
>>> > >
>>> > ("lock":"java.util.concurrent.locks.AbstractQueuedSynchronizer$
>>> ConditionObject@151d5f2f
>>> > > ")
>>> > > Server 2:
>>> > >*4046*   Threads are in TIMED_WATING
>>> > >
>>> > ("lock":"java.util.concurrent.locks.AbstractQueuedSynchronizer$
>>> ConditionObject@1e0205c3
>>> > > ")
>>> > > Server 3:
>>> > >*4210*   Threads are in TIMED_WATING
>>> > >
>>> > ("lock":"java.util.concurrent.locks.AbstractQueuedSynchronizer$
>>> ConditionObject@5ee792c0
>>> > > ")
>>> > >
>>> > > Please suggest whether net.ipv4.tcp_tw_reuse=1 will help ? or how
>>> can we
>>> > > increase the 3000 limit?
>>> > >
>>> > > Sorry, since I haven't got any response to my previous query,  I am
>>> > > creating this as new,
>>> > >
>>> > > Thanks,
>>> > > Mohandoss.
>>> > >
>>> >
>>>
>>


Re: Production Issue: TIMED_WAITING - Will net.ipv4.tcp_tw_reuse=1 help?

2020-08-10 Thread Dominique Bejean
Doss,

See below.

Dominique


On Mon, Aug 10, 2020 at 17:41, Doss  wrote:

> Hi Dominique,
>
> Thanks for your response. Find below the details, please do let me know if
> anything I missed.
>
>
> *- hardware architecture and sizing*
> >> Centos 7, VMs,4CPUs, 66GB RAM, 16GB Heap, 250GB SSD
>
>
> *- JVM version / settings*
> >> Red Hat, Inc. OpenJDK 64-Bit Server VM, version:"14.0.1 14.0.1+7" -
> Default Settings including GC
>

I don't think I would use JVM version 14. OpenJDK 11, in my opinion, is the
best choice as an LTS version.


>
> *- Solr settings*
> >> softCommit: 15000 (15 sec), autoCommit: 30 (5 mins)
>  class="org.apache.solr.index.TieredMergePolicyFactory"> name="maxMergeAtOnce">30 100
> 30.0 
>
>class="org.apache.lucene.index.ConcurrentMergeScheduler"> name="maxMergeCount">18 name="maxThreadCount">6
>

You changed a lot of default values. Any specific reasons? It seems very
aggressive!


>
>
> *- collections and queries information   *
> >> One Collection, with 4 shards , 3 replicas , 3.5 Million Records, 150
> columns, mostly integer fields, Average doc size is 350kb. Insert / Updates
> 0.5 Million Span across the whole day (peak time being 6PM to 10PM) ,
> selects not yet started. Daily once we do delta import of cetrain fields of
> type multivalued with some good amount of data.
>
> *- gc logs or gceasy results*
>
> Easy GC Report says GC health is good, one server's gc report:
> https://drive.google.com/file/d/1C2SqEn0iMbUOXnTNlYi46Gq9kF_CmWss/view?usp=sharing
> CPU Load Pattern:
> https://drive.google.com/file/d/1rjRMWv5ritf5QxgbFxDa0kPzVlXdbySe/view?usp=sharing
>
>
You have to analyze GC on all nodes!
Your heap is very big. Given the full GC frequency, I don't think you
really need such a big heap for indexing only. Maybe you will once you
start running queries.

Did you check your network performance?
Did you check the Zookeeper logs?


>
> Thanks,
> Doss.
>
>
>
> On Mon, Aug 10, 2020 at 7:39 PM Dominique Bejean <
> dominique.bej...@eolya.fr> wrote:
>
>> Hi Doss,
>>
>> See a lot of TIMED_WATING connection occurs with high tcp traffic
>> infrastructure as in a LAMP solution when the Apache server can't
>> anymore connect to the MySQL/MariaDB database.
>> In this case, tweak net.ipv4.tcp_tw_reuse is a possible solution (but
>> never net.ipv4.tcp_tw_recycle as you suggested in your previous post).
>> This
>> is well explained in this great article
>> https://vincent.bernat.ch/en/blog/2014-tcp-time-wait-state-linux
>>
>> However, in general and more specifically in your case, I would
>> investigate
>> the root cause of your issue and do not try to find a workaround.
>>
>> Can you provide more information about your use case (we know : 3 node
>> SOLR
>> (8.3.1 NRT) + 3 Node Zookeeper Ensemble) ?
>>
>>- hardware architecture and sizing
>>- JVM version / settings
>>- Solr settings
>>- collections and queries information
>>- gc logs or gceasy results
>>
>> Regards
>>
>> Dominique
>>
>>
>>
>> Le lun. 10 août 2020 à 15:43, Doss  a écrit :
>>
>> > Hi,
>> >
>> > In solr 8.3.1 source, I see the following , which I assume could be the
>> > reason for the issue "Max requests queued per destination 3000 exceeded
>> for
>> > HttpDestination",
>> >
>> >
>> solr/solrj/src/java/org/apache/solr/client/solrj/impl/Http2SolrClient.java:
>> >private static final int MAX_OUTSTANDING_REQUESTS = 1000;
>> >
>> solr/solrj/src/java/org/apache/solr/client/solrj/impl/Http2SolrClient.java:
>> >  available = new Semaphore(MAX_OUTSTANDING_REQUESTS, false);
>> >
>> solr/solrj/src/java/org/apache/solr/client/solrj/impl/Http2SolrClient.java:
>> >  return MAX_OUTSTANDING_REQUESTS * 3;
>> >
>> > how can I increase this?
>> >
>> > On Mon, Aug 10, 2020 at 12:01 AM Doss  wrote:
>> >
>> > > Hi,
>> > >
>> > > We are having 3 node SOLR (8.3.1 NRT) + 3 Node Zookeeper Ensemble now
>> and
>> > > then we are facing "Max requests queued per destination 3000 exceeded
>> for
>> > > HttpDestination"
>> > >
>> > > After restart evering thing starts working fine until another problem.
>> > > Once a problem occurred we are seeing soo many TIMED_WAITING threads
>> > >
>> > > Server 1:
>> > >*7722*  Threads are in TIMED_WATING
>> > >
>> >
>> ("lock":"java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@151d5f2f
>> > > ")
>> > > Server 2:
>> > >*4046*   Threads are in TIMED_WATING
>> > >
>> >
>> ("lock":"java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@1e0205c3
>> > > ")
>> > > Server 3:
>> > >*4210*   Threads are in TIMED_WATING
>> > >
>> >
>> ("lock":"java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@5ee792c0
>> > > ")
>> > >
>> > > Please suggest whether net.ipv4.tcp_tw_reuse=1 will help ? or how can
>> we
>> > > increase the 3000 limit?
>> > >
>> > > Sorry, since I haven't got any response to my previous query,  I am
>> > > creating this as new,
>> > >
>> > > Thanks,
>> > > Mohandoss.
>> > >
>> >
>>
>


Re: Production Issue: TIMED_WAITING - Will net.ipv4.tcp_tw_reuse=1 help?

2020-08-10 Thread Doss
Hi Dominique,

Thanks for your response. Find below the details, please do let me know if
anything I missed.


*- hardware architecture and sizing*
>> Centos 7, VMs,4CPUs, 66GB RAM, 16GB Heap, 250GB SSD


*- JVM version / settings*
>> Red Hat, Inc. OpenJDK 64-Bit Server VM, version:"14.0.1 14.0.1+7" -
Default Settings including GC


*- Solr settings*
>> softCommit: 15000 (15 sec), autoCommit: 30 (5 mins)
30 100
30.0 

  186


*- collections and queries information   *
>> One Collection, with 4 shards, 3 replicas, 3.5 Million Records, 150
columns, mostly integer fields, Average doc size is 350kb. Inserts / Updates:
0.5 Million spread across the whole day (peak time being 6PM to 10PM),
selects not yet started. Once daily we do a delta import of certain fields of
type multivalued with some good amount of data.

*- gc logs or gceasy results*

Easy GC Report says GC health is good, one server's gc report:
https://drive.google.com/file/d/1C2SqEn0iMbUOXnTNlYi46Gq9kF_CmWss/view?usp=sharing
CPU Load Pattern:
https://drive.google.com/file/d/1rjRMWv5ritf5QxgbFxDa0kPzVlXdbySe/view?usp=sharing



Thanks,
Doss.



On Mon, Aug 10, 2020 at 7:39 PM Dominique Bejean 
wrote:

> Hi Doss,
>
> See a lot of TIMED_WATING connection occurs with high tcp traffic
> infrastructure as in a LAMP solution when the Apache server can't
> anymore connect to the MySQL/MariaDB database.
> In this case, tweak net.ipv4.tcp_tw_reuse is a possible solution (but
> never net.ipv4.tcp_tw_recycle as you suggested in your previous post). This
> is well explained in this great article
> https://vincent.bernat.ch/en/blog/2014-tcp-time-wait-state-linux
>
> However, in general and more specifically in your case, I would investigate
> the root cause of your issue and do not try to find a workaround.
>
> Can you provide more information about your use case (we know : 3 node SOLR
> (8.3.1 NRT) + 3 Node Zookeeper Ensemble) ?
>
>- hardware architecture and sizing
>- JVM version / settings
>- Solr settings
>- collections and queries information
>- gc logs or gceasy results
>
> Regards
>
> Dominique
>
>
>
> Le lun. 10 août 2020 à 15:43, Doss  a écrit :
>
> > Hi,
> >
> > In solr 8.3.1 source, I see the following , which I assume could be the
> > reason for the issue "Max requests queued per destination 3000 exceeded
> for
> > HttpDestination",
> >
> >
> solr/solrj/src/java/org/apache/solr/client/solrj/impl/Http2SolrClient.java:
> >private static final int MAX_OUTSTANDING_REQUESTS = 1000;
> >
> solr/solrj/src/java/org/apache/solr/client/solrj/impl/Http2SolrClient.java:
> >  available = new Semaphore(MAX_OUTSTANDING_REQUESTS, false);
> >
> solr/solrj/src/java/org/apache/solr/client/solrj/impl/Http2SolrClient.java:
> >  return MAX_OUTSTANDING_REQUESTS * 3;
> >
> > how can I increase this?
> >
> > On Mon, Aug 10, 2020 at 12:01 AM Doss  wrote:
> >
> > > Hi,
> > >
> > > We are having 3 node SOLR (8.3.1 NRT) + 3 Node Zookeeper Ensemble now
> and
> > > then we are facing "Max requests queued per destination 3000 exceeded
> for
> > > HttpDestination"
> > >
> > > After restart evering thing starts working fine until another problem.
> > > Once a problem occurred we are seeing soo many TIMED_WAITING threads
> > >
> > > Server 1:
> > >*7722*  Threads are in TIMED_WATING
> > >
> >
> ("lock":"java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@151d5f2f
> > > ")
> > > Server 2:
> > >*4046*   Threads are in TIMED_WATING
> > >
> >
> ("lock":"java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@1e0205c3
> > > ")
> > > Server 3:
> > >*4210*   Threads are in TIMED_WATING
> > >
> >
> ("lock":"java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@5ee792c0
> > > ")
> > >
> > > Please suggest whether net.ipv4.tcp_tw_reuse=1 will help ? or how can
> we
> > > increase the 3000 limit?
> > >
> > > Sorry, since I haven't got any response to my previous query,  I am
> > > creating this as new,
> > >
> > > Thanks,
> > > Mohandoss.
> > >
> >
>


Re: Production Issue: TIMED_WAITING - Will net.ipv4.tcp_tw_reuse=1 help?

2020-08-10 Thread Dominique Bejean
Hi Doss,

A lot of TIMED_WAITING connections occur in high-TCP-traffic
infrastructures, as in a LAMP stack when the Apache server can no
longer connect to the MySQL/MariaDB database.
In this case, tweaking net.ipv4.tcp_tw_reuse is a possible solution (but
never net.ipv4.tcp_tw_recycle, which you suggested in your previous post). This
is well explained in this great article
https://vincent.bernat.ch/en/blog/2014-tcp-time-wait-state-linux
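
For completeness, that tweak is a one-line sysctl on Linux, shown here only as
a sketch of the workaround being discussed, not a recommendation:

  # allow reuse of TIME_WAIT sockets for new outgoing connections
  sysctl -w net.ipv4.tcp_tw_reuse=1
  # persist across reboots
  echo 'net.ipv4.tcp_tw_reuse = 1' > /etc/sysctl.d/99-tcp-tw-reuse.conf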

However, in general and more specifically in your case, I would investigate
the root cause of your issue and do not try to find a workaround.

Can you provide more information about your use case (we know : 3 node SOLR
(8.3.1 NRT) + 3 Node Zookeeper Ensemble) ?

   - hardware architecture and sizing
   - JVM version / settings
   - Solr settings
   - collections and queries information
   - gc logs or gceasy results

Regards

Dominique



On Mon, Aug 10, 2020 at 15:43, Doss  wrote:

> Hi,
>
> In solr 8.3.1 source, I see the following , which I assume could be the
> reason for the issue "Max requests queued per destination 3000 exceeded for
> HttpDestination",
>
> solr/solrj/src/java/org/apache/solr/client/solrj/impl/Http2SolrClient.java:
>private static final int MAX_OUTSTANDING_REQUESTS = 1000;
> solr/solrj/src/java/org/apache/solr/client/solrj/impl/Http2SolrClient.java:
>  available = new Semaphore(MAX_OUTSTANDING_REQUESTS, false);
> solr/solrj/src/java/org/apache/solr/client/solrj/impl/Http2SolrClient.java:
>  return MAX_OUTSTANDING_REQUESTS * 3;
>
> how can I increase this?
>
> On Mon, Aug 10, 2020 at 12:01 AM Doss  wrote:
>
> > Hi,
> >
> > We are having 3 node SOLR (8.3.1 NRT) + 3 Node Zookeeper Ensemble now and
> > then we are facing "Max requests queued per destination 3000 exceeded for
> > HttpDestination"
> >
> > After restart evering thing starts working fine until another problem.
> > Once a problem occurred we are seeing soo many TIMED_WAITING threads
> >
> > Server 1:
> >*7722*  Threads are in TIMED_WATING
> >
> ("lock":"java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@151d5f2f
> > ")
> > Server 2:
> >*4046*   Threads are in TIMED_WATING
> >
> ("lock":"java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@1e0205c3
> > ")
> > Server 3:
> >*4210*   Threads are in TIMED_WATING
> >
> ("lock":"java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@5ee792c0
> > ")
> >
> > Please suggest whether net.ipv4.tcp_tw_reuse=1 will help ? or how can we
> > increase the 3000 limit?
> >
> > Sorry, since I haven't got any response to my previous query,  I am
> > creating this as new,
> >
> > Thanks,
> > Mohandoss.
> >
>


Re: Production Issue: TIMED_WAITING - Will net.ipv4.tcp_tw_reuse=1 help?

2020-08-10 Thread Doss
Hi,

In solr 8.3.1 source, I see the following , which I assume could be the
reason for the issue "Max requests queued per destination 3000 exceeded for
HttpDestination",

solr/solrj/src/java/org/apache/solr/client/solrj/impl/Http2SolrClient.java:
   private static final int MAX_OUTSTANDING_REQUESTS = 1000;
solr/solrj/src/java/org/apache/solr/client/solrj/impl/Http2SolrClient.java:
 available = new Semaphore(MAX_OUTSTANDING_REQUESTS, false);
solr/solrj/src/java/org/apache/solr/client/solrj/impl/Http2SolrClient.java:
 return MAX_OUTSTANDING_REQUESTS * 3;

how can I increase this?

On Mon, Aug 10, 2020 at 12:01 AM Doss  wrote:

> Hi,
>
> We are having 3 node SOLR (8.3.1 NRT) + 3 Node Zookeeper Ensemble now and
> then we are facing "Max requests queued per destination 3000 exceeded for
> HttpDestination"
>
> After restart evering thing starts working fine until another problem.
> Once a problem occurred we are seeing soo many TIMED_WAITING threads
>
> Server 1:
>*7722*  Threads are in TIMED_WATING
> ("lock":"java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@151d5f2f
> ")
> Server 2:
>*4046*   Threads are in TIMED_WATING
> ("lock":"java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@1e0205c3
> ")
> Server 3:
>*4210*   Threads are in TIMED_WATING
> ("lock":"java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@5ee792c0
> ")
>
> Please suggest whether net.ipv4.tcp_tw_reuse=1 will help ? or how can we
> increase the 3000 limit?
>
> Sorry, since I haven't got any response to my previous query,  I am
> creating this as new,
>
> Thanks,
> Mohandoss.
>


Fwd: Help For solr in data config.xml regarding fetching record

2020-08-10 Thread Rajat Diwate
Hi Team,
I need some help with an issue I have posted on Nabble, but there has been no
response yet.
Kindly check the shared link and provide a solution if possible; if not, please
reply with a suggested solution or a mailing list where I can get help with this
issue.

https://lucene.472066.n3.nabble.com/Need-Help-For-solr-in-data-config-xml-regarding-fetching-record-td4461139.html

Regards,
Rajat


Production Issue: TIMED_WAITING - Will net.ipv4.tcp_tw_reuse=1 help?

2020-08-09 Thread Doss
Hi,

We are having 3 node SOLR (8.3.1 NRT) + 3 Node Zookeeper Ensemble now and
then we are facing "Max requests queued per destination 3000 exceeded for
HttpDestination"

After a restart everything starts working fine until the next problem. Once
a problem occurs we are seeing so many TIMED_WAITING threads

Server 1:
   *7722*  Threads are in TIMED_WAITING
("lock":"java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@151d5f2f
")
Server 2:
   *4046*   Threads are in TIMED_WAITING
("lock":"java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@1e0205c3
")
Server 3:
   *4210*   Threads are in TIMED_WAITING
("lock":"java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@5ee792c0
")

Please suggest whether net.ipv4.tcp_tw_reuse=1 will help, or how we can
increase the 3000 limit?

Sorry, since I haven't got any response to my previous query,  I am
creating this as new,

Thanks,
Mohandoss.


Re: partial search help request

2020-08-05 Thread Philip Smith
Great advice Erick, kindly appreciated.

I removed PorterStemFilter as you suggested and it worked as one would
expect it to. Very useful to learn about avoiding KeywordTokenizerFactory,
the limitation of the WhitespaceTokenizer and the testing approach.

Best,
Phil

On Wed, Aug 5, 2020 at 8:37 PM Erick Erickson 
wrote:

> First of all, lots of attachments are stripped by the mail server so a
> number of your attachments didn’t come through, although your field
> definitions did so we can’t see your results.
>
> KeywordTokenizerFactory is something I’d avoid at this point. It doesn’t
> break up the input at all, so input of “my dog has fleas” indexes exactly
> one token, “my dog has fleas” which is usually not what people want.
>
> For the other problems, I’d suggest several ways to narrow down the issue.
>
> 1> remove PorterStemFilter and see what you get. This is something of a
> long shot, but I’ve seen this cause unexpected results due to the
> algorithmic nature of the stemmer not quite matching your assumptions.
>
> 2> add debug=query to your URL and look particularly at the “parsed
> query” section. That’ll show you exactly how the search string was
> transmogrified prior to search and often offers clues.
>
> 3> Don’t use edismax to start. What you’ve shown looks correct, this is
> just on the theory that using something simpler to start means fewer moving
> parts.
>
>
> Also, be a little careful of WhitespaceTokenizer. For controlled
> experiments where you’re tightly controlling the input, but going to prod
> has some issues. That tokenizer works fine, it’s just that it’ll include,
> say, the period at the end of a sentence with the last word of the sentence…
>
> Best,
> Erick
>
> > On Aug 5, 2020, at 8:08 AM, Philip Smith  wrote:
> >
> > Hello,
> > I've had a break-through with my partial string search problem, I don't
> understand why though.
> >
> > I found yet another example,
> https://medium.com/aubergine-solutions/partial-string-search-in-apache-solr-4b9200e8e6bb
> > and this one uses a different tokenizer, whitespaceTokenizerFactory
> >
> >  positionIncrementGap="100">
> >   
> > 
> >  maxGramSize="50"/>
> > 
> >   
> >   
> > 
> > 
> >   
> > 
> >
> > The analysis results look very different. It seems to be returning the
> desired results so far.
> >
> >
> > I don't understand why the other examples that worked for other people
> weren't working for me. Is it version 8?
> > StandardTokenizerFactory didn't work and when I was trying with the
> KeywordTokenizerFactory it wasn't even matching the full search term.
> > If anyone can shed any light, then I'd be grateful.
> > Thanks.
> >
> >
> > On Wed, Aug 5, 2020 at 7:12 PM Philip Smith  wrote:
> > Hello,
> > I'm new to Solr and to this user group. Any help with this problem would
> be greatly appreciated.
> >
> > I'm trying to get partial keyword search results working. This seems
> like a fairly common problem, I've found numerous google results offering
> solutions
> > for instance
> https://stackoverflow.com/questions/28753671/how-to-configure-solr-to-do-partial-word-matching
> > but when I attempt to implement them I'm not receiving the desired
> results.
> >
> > I'm running solr 8.5.2 in standalone mode, manually editing the configs.
> >
> > I have configured the title field as
> >
> >  stored="true" multiValued="false"/>
> >
> > I have also tried it with this parameter
> omitTermFreqAndPositions="true"
> >
> > The field type definition is:
> >
> >omitNorms="false">
> >   
> > 
> > 
> > 
> > 
> >  maxGramSize="35" />
> >   
> >   
> > 
> > 
> > 
> > 
> >   
> > 
> >
> > I'm using edismax and searching on title.
> >
> >
> http://localhost:8983/solr/events/select?defType=edismax=title=title=educatio
> >
> > when using edge_ngram_test_5
> >
> > edu  correctly finds 4 results
> > educa   finds 0
> > educat  finds 0
> > educati finds 0
> > educatio   finds 0
> > education correctly finds 4.
> >
> > Steps taken between changes to the schema.
> > bin/solr restart
> > reimport data
> > core admin > reload core
> >
> > In admin, I see the correct value,
> > Typeedge_ngram_test_5 when I check in schema.
> >
> > In admin , when I check in analysis and

Re: partial search help request

2020-08-05 Thread Erick Erickson
First of all, lots of attachments are stripped by the mail server so a number 
of your attachments didn’t come through, although your field definitions did so 
we can’t see your results.

KeywordTokenizerFactory is something I’d avoid at this point. It doesn’t break 
up the input at all, so input of “my dog has fleas” indexes exactly one token, 
“my dog has fleas” which is usually not what people want.

For the other problems, I’d suggest several ways to narrow down the issue.

1> remove PorterStemFilter and see what you get. This is something of a long 
shot, but I’ve seen this cause unexpected results due to the algorithmic nature 
of the stemmer not quite matching your assumptions.

2> add debug=query to your URL and look particularly at the “parsed query” 
section. That’ll show you exactly how the search string was transmogrified 
prior to search and often offers clues.

3> Don’t use edismax to start. What you’ve shown looks correct, this is just on 
the theory that using something simpler to start means fewer moving parts.


Also, be a little careful of WhitespaceTokenizer. For controlled experiments 
where you’re tightly controlling the input, but going to prod has some issues. 
That tokenizer works fine, it’s just that it’ll include, say, the period at the 
end of a sentence with the last word of the sentence…

Best,
Erick

> On Aug 5, 2020, at 8:08 AM, Philip Smith  wrote:
> 
> Hello, 
> I've had a break-through with my partial string search problem, I don't 
> understand why though. 
> 
> I found yet another example, 
> https://medium.com/aubergine-solutions/partial-string-search-in-apache-solr-4b9200e8e6bb
> and this one uses a different tokenizer, whitespaceTokenizerFactory
> 
> 
>   
> 
> 
> 
>   
>   
> 
> 
>   
> 
> 
> The analysis results look very different. It seems to be returning the 
> desired results so far. 
> 
> 
> I don't understand why the other examples that worked for other people 
> weren't working for me. Is it version 8?
> StandardTokenizerFactory didn't work and when I was trying with the 
> KeywordTokenizerFactory it wasn't even matching the full search term.
> If anyone can shed any light, then I'd be grateful.
> Thanks.
> 
> 
> On Wed, Aug 5, 2020 at 7:12 PM Philip Smith  wrote:
> Hello,
> I'm new to Solr and to this user group. Any help with this problem would be 
> greatly appreciated. 
> 
> I'm trying to get partial keyword search results working. This seems like a 
> fairly common problem, I've found numerous google results offering solutions 
> for instance 
> https://stackoverflow.com/questions/28753671/how-to-configure-solr-to-do-partial-word-matching
> but when I attempt to implement them I'm not receiving the desired results. 
> 
> I'm running solr 8.5.2 in standalone mode, manually editing the configs. 
> 
> I have configured the title field as 
> 
>  multiValued="false"/>
> 
> I have also tried it with this parameter  omitTermFreqAndPositions="true"  
> 
> The field type definition is:
> 
>omitNorms="false">
>   
> 
> 
> 
> 
>  maxGramSize="35" />
>   
>   
> 
> 
> 
> 
>   
> 
> 
> I'm using edismax and searching on title.
> 
> http://localhost:8983/solr/events/select?defType=edismax=title=title=educatio
> 
> when using edge_ngram_test_5
> 
> edu  correctly finds 4 results
> educa   finds 0
> educat  finds 0
> educati finds 0
> educatio   finds 0
> education correctly finds 4.
> 
> Steps taken between changes to the schema.
> bin/solr restart
> reimport data
> core admin > reload core
> 
> In admin, I see the correct value, 
> Typeedge_ngram_test_5 when I check in schema. 
> 
> In admin , when I check in analysis and search on text analyse 
> 
> 
> it appears to be breaking the word down into letters as I would guess is the 
> correct step.
> 
> These are the query results:
> 
> 
> it looks like it is applying the correct filter names and the search term 
> isn't being altered. I don't understand enough to be able to determine why 
> the query can't find the search result when it appears to have been indexed. 
> Any advice is very welcome as I've spent hours trying to get this working. 
> 
> 
> I've also tried with:
>  positionIncrementGap="100">
>   
> 
> 
>  maxGramSize="25"/>
>   
>   
> 
> 
>   
> 
> 
>  positionIncrementGap="100" >
>   
>
>  words="stopwords.txt" />
> 
>  maxGramSize="30"/> 
>   
>   
> 
>  words="stopwords.txt" />
>   
> 
>   
> 
> 
> 
>  positionIncrementGap="100" >
>   
>
> 
>  maxGramSize="25" />
>   
>   
> 
>   
> 
> 
> 
> Thanks in advance for any insights offered.
> Kind regards,
> Phil.



Re: partial search help request

2020-08-05 Thread Philip Smith
Hello,
I've had a breakthrough with my partial string search problem; I don't
understand why, though.

I found yet another example,
https://medium.com/aubergine-solutions/partial-string-search-in-apache-solr-4b9200e8e6bb
and this one uses a different tokenizer, whitespaceTokenizerFactory













The analysis results look very different. It seems to be returning the
desired results so far.
[image: image.png]
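
The field type XML above was stripped by the list archive. The pattern the
linked article describes boils down to an analyzer pair along these lines
(the type name and gram sizes here are illustrative, not the exact values from
the original message; EdgeNGramFilterFactory can be swapped for
NGramFilterFactory if infix matches are needed):

<fieldType name="text_partial" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="50"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>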

I don't understand why the other examples that worked for other people
weren't working for me. Is it version 8?
StandardTokenizerFactory didn't work and when I was trying with
the KeywordTokenizerFactory it wasn't even matching the full search term.
If anyone can shed any light, then I'd be grateful.
Thanks.


On Wed, Aug 5, 2020 at 7:12 PM Philip Smith  wrote:

> Hello,
> I'm new to Solr and to this user group. Any help with this problem
> would be greatly appreciated.
>
> I'm trying to get partial keyword search results working. This seems like
> a fairly common problem, I've found numerous google results offering
> solutions
> for instance
> https://stackoverflow.com/questions/28753671/how-to-configure-solr-to-do-partial-word-matching
> but when I attempt to implement them I'm not receiving the desired
> results.
>
> I'm running solr 8.5.2 in standalone mode, manually editing the configs.
>
> I have configured the title field as
>
>  multiValued="false"/>
>
> I have also tried it with this parameter  omitTermFreqAndPositions="true"
>
> The field type definition is:
>
>  "false">
> 
> 
> 
> 
> 
>  "35" />
> 
> 
> 
> 
> 
> 
> 
> 
>
> I'm using edismax and searching on title.
>
>
> http://localhost:8983/solr/events/select?defType=edismax=title=title=educatio
>
> when using edge_ngram_test_5
>
> edu  correctly finds 4 results
> educa   finds 0
> educat  finds 0
> educati finds 0
> educatio   finds 0
> education correctly finds 4.
>
> Steps taken between changes to the schema.
> bin/solr restart
> reimport data
> core admin > reload core
>
> In admin, I see the correct value,
> Typeedge_ngram_test_5 when I check in schema.
>
> In admin , when I check in analysis and search on text analyse
>
> [image: image.png]
> it appears to be breaking the word down into letters as I would guess is
> the correct step.
>
> These are the query results:
> [image: image.png]
>
> it looks like it is applying the correct filter names and the search term
> isn't being altered. I don't understand enough to be able to determine why
> the query can't find the search result when it appears to have been
> indexed. Any advice is very welcome as I've spent hours trying to get this
> working.
>
>
> I've also tried with:
>  positionIncrementGap="100">
> 
> 
> 
>  "25"/>
> 
> 
> 
> 
> 
> 
>
>  positionIncrementGap="100" >
> 
> 
>  "stopwords.txt" />
> 
>  "30"/> 
> 
> 
> 
>  "stopwords.txt" />
> 
> 
> 
> 
>
>
>  positionIncrementGap="100" >
> 
> 
> 
>  "25" />
> 
> 
> 
> 
> 
>
>
> Thanks in advance for any insights offered.
> Kind regards,
> Phil.
>


partial search help request

2020-08-05 Thread Philip Smith
Hello,
I'm new to Solr and to this user group. Any help with this problem would be
greatly appreciated.

I'm trying to get partial keyword search results working. This seems like a
fairly common problem, I've found numerous google results offering
solutions
for instance
https://stackoverflow.com/questions/28753671/how-to-configure-solr-to-do-partial-word-matching
but when I attempt to implement them I'm not receiving the desired results.

I'm running solr 8.5.2 in standalone mode, manually editing the configs.

I have configured the title field as



I have also tried it with this parameter  omitTermFreqAndPositions="true"

The field type definition is:

















I'm using edismax and searching on title.

http://localhost:8983/solr/events/select?defType=edismax=title=title=educatio

when using edge_ngram_test_5

edu  correctly finds 4 results
educa   finds 0
educat  finds 0
educati finds 0
educatio   finds 0
education correctly finds 4.

Steps taken between changes to the schema.
bin/solr restart
reimport data
core admin > reload core

In admin, I see the correct value,
Typeedge_ngram_test_5 when I check in schema.

In admin , when I check in analysis and search on text analyse

[image: image.png]
it appears to be breaking the word down into letters as I would guess is
the correct step.

These are the query results:
[image: image.png]

it looks like it is applying the correct filter names and the search term
isn't being altered. I don't understand enough to be able to determine why
the query can't find the search result when it appears to have been
indexed. Any advice is very welcome as I've spent hours trying to get this
working.


I've also tried with:









































Thanks in advance for any insights offered.
Kind regards,
Phil.


Re: sorting help

2020-07-15 Thread Dave
That’s a good place to start. The idea was to make sure titles that started 
with a date would not always be at the forefront and the actual title of the 
doc would be sorted. 

> On Jul 15, 2020, at 4:58 PM, Erick Erickson  wrote:
> 
> Yeah, it’s always a question “how much is enough/too much”.
> 
> That looks reasonable for alphatitle, but what about title? Your original
> question was that the sorting changes depending on which field you 
> sort on. If your title field uses something that tokenizes or doesn’t
> include the same analysis chain (particularly the lowercasing
> and patternreplace) then I’d expect the order to change.
> 
> Best,
> Erick
> 
>> On Jul 15, 2020, at 4:49 PM, David Hastings  
>> wrote:
>> 
>> thanks, I'll check the admin, didn't want to send a big block of text but:
>> 
>> 
>>  -
>> -
>> 
>> Tokenizer:
>> org.apache.lucene.analysis.core.KeywordTokenizerFactoryclass:
>> solr.KeywordTokenizerFactoryluceneMatchVersion: 7.1.0
>> -
>> 
>> Token Filters:
>> org.apache.lucene.analysis.core.LowerCaseFilterFactoryclass:
>> solr.LowerCaseFilterFactoryluceneMatchVersion: 7.1.0
>> org.apache.lucene.analysis.miscellaneous.TrimFilterFactoryclass:
>> solr.TrimFilterFactoryluceneMatchVersion: 7.1.0
>> org.apache.lucene.analysis.pattern.PatternReplaceFilterFactorypattern:
>> ([^a-z])replace: allclass: solr.PatternReplaceFilterFactoryreplacement
>> luceneMatchVersion: 7.1.0
>>  -
>> 
>>  Query Analyzer:
>>  
>> 
>>  org.apache.solr.analysis.TokenizerChain
>> -
>> 
>> Tokenizer:
>> org.apache.lucene.analysis.core.KeywordTokenizerFactoryclass:
>> solr.KeywordTokenizerFactoryluceneMatchVersion: 7.1.0
>> -
>> 
>> Token Filters:
>> org.apache.lucene.analysis.core.LowerCaseFilterFactoryclass:
>> solr.LowerCaseFilterFactoryluceneMatchVersion: 7.1.0
>> org.apache.lucene.analysis.miscellaneous.TrimFilterFactoryclass:
>> solr.TrimFilterFactoryluceneMatchVersion: 7.1.0
>> org.apache.lucene.analysis.pattern.PatternReplaceFilterFactorypattern:
>> ([^a-z])replace: allclass: solr.PatternReplaceFilterFactoryreplacement
>> luceneMatchVersion: 7.1.0
>> 
>> 
>>> On Wed, Jul 15, 2020 at 4:47 PM Erick Erickson 
>>> wrote:
>>> 
>>> I’d look two places:
>>> 
>>> 1> try the admin/analysis page from the admin UI. In particular, look at
>>> what tokens actually get in the index.
>>> 
>>> 2> again, the admin UI will let you choose the field (alphatitle and
>>> title) and see what the actual indexed tokens are.
>>> 
>>> Both have the issue that I don’t know what tokenizer you are using. For
>>> sorting it better be something
>>> like KeywordTokenizer. Anything that breaks up the input into separate
>>> tokens will produce surprises.
>>> 
>>> And unless you have lowercaseFilter in front of your patternreplace,
>>> you’re removing uppercase characters.
>>> 
>>> Best,
>>> Erick
>>> 
 On Jul 15, 2020, at 3:06 PM, David Hastings <
>>> hastings.recurs...@gmail.com> wrote:
 
 howdy,
 i have a field that sorts fine all other content, and i cant seem to
>>> debug
 why it wont sort for me on this one chunk of it.
 "sort":"alphatitle asc", "debugQuery":"on", "_":"1594733127740"}},
>>> "response
 ":{"numFound":3,"start":0,"docs":[ { "title":"Money orders", {
 "title":"Finance,
 consolidation and rescheduling of debts", { "title":"Rights in former
 German Islands in Pacific", },
 
 its using a copyfield from "title" to "alphatitle" that replaces all
 punctuation
 pattern: ([^a-z])replace: allclass: solr.PatternReplaceFilterFactory
 
 and if i use just title it flips:
 
 "title":"Finance, consolidation and rescheduling of debts"}, {
>>> "title":"Rights
 in former German Islands in Pacific"}, { "title":"Money orders"}]
 
 and im banging my head trying to figure out what it is about this
 content in particular that is not sorting the way I would expect.
 don't suppose someone would be able to lead me to a good place to look?
>>> 
>>> 
> 


Re: sorting help

2020-07-15 Thread Erick Erickson
Yeah, it’s always a question “how much is enough/too much”.

That looks reasonable for alphatitle, but what about title? Your original
question was that the sorting changes depending on which field you 
sort on. If your title field uses something that tokenizes or doesn’t
include the same analysis chain (particularly the lowercasing
and patternreplace) then I’d expect the order to change.

Best,
Erick

> On Jul 15, 2020, at 4:49 PM, David Hastings  
> wrote:
> 
> thanks, I'll check the admin, didn't want to send a big block of text but:
> 
> 
> Index Analyzer:
> 
>   org.apache.solr.analysis.TokenizerChain
> 
>   Tokenizer:
>     org.apache.lucene.analysis.core.KeywordTokenizerFactory
>       (class: solr.KeywordTokenizerFactory, luceneMatchVersion: 7.1.0)
> 
>   Token Filters:
>     org.apache.lucene.analysis.core.LowerCaseFilterFactory
>       (class: solr.LowerCaseFilterFactory, luceneMatchVersion: 7.1.0)
>     org.apache.lucene.analysis.miscellaneous.TrimFilterFactory
>       (class: solr.TrimFilterFactory, luceneMatchVersion: 7.1.0)
>     org.apache.lucene.analysis.pattern.PatternReplaceFilterFactory
>       (pattern: ([^a-z]), replace: all, replacement: (empty),
>        class: solr.PatternReplaceFilterFactory, luceneMatchVersion: 7.1.0)
> 
> Query Analyzer:
> 
>   org.apache.solr.analysis.TokenizerChain
> 
>   Tokenizer:
>     org.apache.lucene.analysis.core.KeywordTokenizerFactory
>       (class: solr.KeywordTokenizerFactory, luceneMatchVersion: 7.1.0)
> 
>   Token Filters:
>     org.apache.lucene.analysis.core.LowerCaseFilterFactory
>       (class: solr.LowerCaseFilterFactory, luceneMatchVersion: 7.1.0)
>     org.apache.lucene.analysis.miscellaneous.TrimFilterFactory
>       (class: solr.TrimFilterFactory, luceneMatchVersion: 7.1.0)
>     org.apache.lucene.analysis.pattern.PatternReplaceFilterFactory
>       (pattern: ([^a-z]), replace: all, replacement: (empty),
>        class: solr.PatternReplaceFilterFactory, luceneMatchVersion: 7.1.0)
> 
> 
> On Wed, Jul 15, 2020 at 4:47 PM Erick Erickson 
> wrote:
> 
>> I’d look two places:
>> 
>> 1> try the admin/analysis page from the admin UI. In particular, look at
>> what tokens actually get in the index.
>> 
>> 2> again, the admin UI will let you choose the field (alphatitle and
>> title) and see what the actual indexed tokens are.
>> 
>> Both have the issue that I don’t know what tokenizer you are using. For
>> sorting it better be something
>> like KeywordTokenizer. Anything that breaks up the input into separate
>> tokens will produce surprises.
>> 
>> And unless you have lowercaseFilter in front of your patternreplace,
>> you’re removing uppercase characters.
>> 
>> Best,
>> Erick
>> 
>>> On Jul 15, 2020, at 3:06 PM, David Hastings <
>> hastings.recurs...@gmail.com> wrote:
>>> 
>>> howdy,
>>> i have a field that sorts fine all other content, and i cant seem to
>> debug
>>> why it wont sort for me on this one chunk of it.
>>> "sort":"alphatitle asc", "debugQuery":"on", "_":"1594733127740"}},
>> "response
>>> ":{"numFound":3,"start":0,"docs":[ { "title":"Money orders", {
>>> "title":"Finance,
>>> consolidation and rescheduling of debts", { "title":"Rights in former
>>> German Islands in Pacific", },
>>> 
>>> its using a copyfield from "title" to "alphatitle" that replaces all
>>> punctuation
>>> pattern: ([^a-z])replace: allclass: solr.PatternReplaceFilterFactory
>>> 
>>> and if i use just title it flips:
>>> 
>>> "title":"Finance, consolidation and rescheduling of debts"}, {
>> "title":"Rights
>>> in former German Islands in Pacific"}, { "title":"Money orders"}]
>>> 
>>> and im banging my head trying to figure out what it is about this
>>> content in particular that is not sorting the way I would expect.
>>> don't suppose someone would be able to lead me to a good place to look?
>> 
>> 



Re: sorting help

2020-07-15 Thread David Hastings
thanks, I'll check the admin, didn't want to send a big block of text but:


Index Analyzer:

  org.apache.solr.analysis.TokenizerChain

  Tokenizer:
    org.apache.lucene.analysis.core.KeywordTokenizerFactory
      (class: solr.KeywordTokenizerFactory, luceneMatchVersion: 7.1.0)

  Token Filters:
    org.apache.lucene.analysis.core.LowerCaseFilterFactory
      (class: solr.LowerCaseFilterFactory, luceneMatchVersion: 7.1.0)
    org.apache.lucene.analysis.miscellaneous.TrimFilterFactory
      (class: solr.TrimFilterFactory, luceneMatchVersion: 7.1.0)
    org.apache.lucene.analysis.pattern.PatternReplaceFilterFactory
      (pattern: ([^a-z]), replace: all, replacement: (empty),
       class: solr.PatternReplaceFilterFactory, luceneMatchVersion: 7.1.0)

Query Analyzer:

  org.apache.solr.analysis.TokenizerChain

  Tokenizer:
    org.apache.lucene.analysis.core.KeywordTokenizerFactory
      (class: solr.KeywordTokenizerFactory, luceneMatchVersion: 7.1.0)

  Token Filters:
    org.apache.lucene.analysis.core.LowerCaseFilterFactory
      (class: solr.LowerCaseFilterFactory, luceneMatchVersion: 7.1.0)
    org.apache.lucene.analysis.miscellaneous.TrimFilterFactory
      (class: solr.TrimFilterFactory, luceneMatchVersion: 7.1.0)
    org.apache.lucene.analysis.pattern.PatternReplaceFilterFactory
      (pattern: ([^a-z]), replace: all, replacement: (empty),
       class: solr.PatternReplaceFilterFactory, luceneMatchVersion: 7.1.0)
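
For reference, a field type reproducing the chain above could be created
through the Schema API roughly like this (a sketch only; the collection name
"mycollection" and the type name "alphaOnlySort" are placeholders, not taken
from the thread):

curl -X POST -H 'Content-type:application/json' \
  http://localhost:8983/solr/mycollection/schema -d '{
  "add-field-type": {
    "name": "alphaOnlySort",
    "class": "solr.TextField",
    "sortMissingLast": true,
    "analyzer": {
      "tokenizer": { "class": "solr.KeywordTokenizerFactory" },
      "filters": [
        { "class": "solr.LowerCaseFilterFactory" },
        { "class": "solr.TrimFilterFactory" },
        { "class": "solr.PatternReplaceFilterFactory",
          "pattern": "([^a-z])", "replacement": "", "replace": "all" }
      ]
    }
  },
  "add-field": { "name": "alphatitle", "type": "alphaOnlySort",
                 "indexed": true, "stored": false },
  "add-copy-field": { "source": "title", "dest": "alphatitle" }
}'

The filters mirror the listing: keyword tokenizer (whole title as one token),
lowercase, trim, then strip every character that is not a-z before sorting.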


On Wed, Jul 15, 2020 at 4:47 PM Erick Erickson 
wrote:

> I’d look two places:
>
> 1> try the admin/analysis page from the admin UI. In particular, look at
> what tokens actually get in the index.
>
> 2> again, the admin UI will let you choose the field (alphatitle and
> title) and see what the actual indexed tokens are.
>
> Both have the issue that I don’t know what tokenizer you are using. For
> sorting it better be something
> like KeywordTokenizer. Anything that breaks up the input into separate
> tokens will produce surprises.
>
> And unless you have lowercaseFilter in front of your patternreplace,
> you’re removing uppercase characters.
>
> Best,
> Erick
>
> > On Jul 15, 2020, at 3:06 PM, David Hastings <
> hastings.recurs...@gmail.com> wrote:
> >
> > howdy,
> > i have a field that sorts fine all other content, and i cant seem to
> debug
> > why it wont sort for me on this one chunk of it.
> > "sort":"alphatitle asc", "debugQuery":"on", "_":"1594733127740"}},
> "response
> > ":{"numFound":3,"start":0,"docs":[ { "title":"Money orders", {
> > "title":"Finance,
> > consolidation and rescheduling of debts", { "title":"Rights in former
> > German Islands in Pacific", },
> >
> > its using a copyfield from "title" to "alphatitle" that replaces all
> > punctuation
> > pattern: ([^a-z])replace: allclass: solr.PatternReplaceFilterFactory
> >
> > and if i use just title it flips:
> >
> > "title":"Finance, consolidation and rescheduling of debts"}, {
> "title":"Rights
> > in former German Islands in Pacific"}, { "title":"Money orders"}]
> >
> > and im banging my head trying to figure out what it is about this
> > content in particular that is not sorting the way I would expect.
> > don't suppose someone would be able to lead me to a good place to look?
>
>


Re: sorting help

2020-07-15 Thread Erick Erickson
I’d look two places:

1> try the admin/analysis page from the admin UI. In particular, look at what 
tokens actually get in the index.

2> again, the admin UI will let you choose the field (alphatitle and title) and 
see what the actual indexed tokens are.

Both have the issue that I don’t know what tokenizer you are using. For sorting 
it better be something
like KeywordTokenizer. Anything that breaks up the input into separate tokens 
will produce surprises.

And unless you have lowercaseFilter in front of your patternreplace, you’re 
removing uppercase characters.

Best,
Erick
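
The token output can also be checked outside the admin UI with the field
analysis handler; a sketch, with host, collection, and field names as
placeholders:

curl "http://localhost:8983/solr/mycollection/analysis/field?analysis.fieldname=alphatitle&analysis.fieldvalue=Money%20orders&wt=json"

The response lists the tokens after each tokenizer and filter stage, which
makes it easy to see whether the whole title survives as a single token.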

> On Jul 15, 2020, at 3:06 PM, David Hastings  
> wrote:
> 
> howdy,
> i have a field that sorts fine all other content, and i cant seem to debug
> why it wont sort for me on this one chunk of it.
> "sort":"alphatitle asc", "debugQuery":"on", "_":"1594733127740"}}, "response
> ":{"numFound":3,"start":0,"docs":[ { "title":"Money orders", {
> "title":"Finance,
> consolidation and rescheduling of debts", { "title":"Rights in former
> German Islands in Pacific", },
> 
> its using a copyfield from "title" to "alphatitle" that replaces all
> punctuation
> pattern: ([^a-z])replace: allclass: solr.PatternReplaceFilterFactory
> 
> and if i use just title it flips:
> 
> "title":"Finance, consolidation and rescheduling of debts"}, { "title":"Rights
> in former German Islands in Pacific"}, { "title":"Money orders"}]
> 
> and im banging my head trying to figure out what it is about this
> content in particular that is not sorting the way I would expect.
> don't suppose someone would be able to lead me to a good place to look?



sorting help

2020-07-15 Thread David Hastings
howdy,
I have a field that sorts fine for all other content, and I can't seem to debug
why it won't sort for me on this one chunk of it.
With "sort":"alphatitle asc" ("debugQuery":"on", "_":"1594733127740") the response is:

"response":{"numFound":3,"start":0,"docs":[
  { "title":"Money orders" },
  { "title":"Finance, consolidation and rescheduling of debts" },
  { "title":"Rights in former German Islands in Pacific" } ]

It's using a copyField from "title" to "alphatitle" that replaces all
punctuation:
pattern: ([^a-z]), replace: all, class: solr.PatternReplaceFilterFactory

and if I use just "title" it flips:

  { "title":"Finance, consolidation and rescheduling of debts" },
  { "title":"Rights in former German Islands in Pacific" },
  { "title":"Money orders" } ]

and I'm banging my head trying to figure out what it is about this
content in particular that is not sorting the way I would expect.
Don't suppose someone would be able to lead me to a good place to look?


Needs Help for Using Jaeger to Trace Solr

2020-06-01 Thread Yihao Huang
Hi,

I am new to Solr and Jaeger, and currently I am working out how to use Jaeger
to trace Solr. The problem I am having is that Jaeger seems unable to pick up
any traces from Solr.

I am using the Techproducts example data on Solr. I start the Solr service by
running ./bin/solr start -e cloud, but the Jaeger UI (which I have tested to be
working using the demo application HotROD) does not show anything related to
Solr. To solve the problem, I have attempted to:

1. Change the sampling rate by setting /admin/collections?action="" but nothing
   changed on the Jaeger UI.
2. Set up the tracer configurator in solr.xml under solr/example/cloud/node1/solr
   and solr/example/cloud/node2/solr as shown in the attached file, but it is
   reported that "ERROR: Did not see Solr at http://localhost:8983/solr come
   online within 30"

I am not sure whether the operations I have made are correct. I also notice that
https://lucene.apache.org/solr/guide/8_2/solr-tracing.html#jaeger-tracer-configurator
says "Note that all library of jaegertracer-configurator must be included in the
classpath of all nodes...". So I also attempted to:

3. Run ./bin/solr start -e cloud -a "-classpath org.apache.solr.jaeger.JaegerTracerConfigurator"
   to include the classpath. But Solr reports "ERROR: Unbalanced quotes in
   "bin/solr" start -cloud -p 8983 -s "example/cloud/node1/solr"
   org.apache.solr.jaeger.JaegerTracerConfigurator" -a "-classpath". I searched
   online and it is probably an unfixed bug in Solr
   (https://issues.apache.org/jira/browse/SOLR-8552). So this also doesn't work.

Again, as I am new to Solr and Jaeger, I am not sure whether these operations
are stupid or not :-(. So I do hope that I can get some help from your team for
making Jaeger and Solr work together. I would really appreciate your reply!

Best regards,
Yihao

solr.xml
Description: XML document


Re: Need help on handling large size of index.

2020-05-22 Thread Phill Campbell
Maybe your problems are in AWS land.


> On May 22, 2020, at 3:45 AM, Modassar Ather  wrote:
> 
> Thanks Erick and Phill.
> 
> We index data weekly once and that is why we do the optimisation and it has
> helped in faster query result. I will experiment with a fewer segments with
> the current hardware.
> The thing I am not  clear about is although there is no constant high usage
> of extra IOPs other than a couple of spike during optimisation why there is
> so much difference in optimisation time when there is extra IOPs vs no
> Extra IOPs.
> The optimisation on different datacenter machine which was of same
> configuration with SSD used to take 4-5 hours to optimise. This time to
> optimise is comparable to r5a.16xlarge with extra 3 IOPs time.
> 
> Best,
> Modassar
> 
> On Fri, May 22, 2020 at 12:56 AM Phill Campbell
>  wrote:
> 
>> The optimal size for a shard of the index is be definition what works best
>> on the hardware with the JVM heap that is in use.
>> More shards mean smaller sizes of the index for the shard as you already
>> know.
>> 
>> I spent months changing the sharing, the JVM heap, the GC values before
>> taking the system live.
>> RAM is important, and I run with enough to allow Solr to load the entire
>> index into RAM. From my understanding Solr uses the system to memory map
>> the index files. I might be wrong.
>> I experimented with less RAM and SSD drives and found that was another way
>> to get the performance I needed. Since RAM is cheaper, I choose that
>> approach.
>> 
>> Again we never optimize. When we have to recover we rebuild the index by
>> spinning up new machines and use a massive EMR (Map reduce job) to force
>> the data into the system. Takes about 3 hours. Solr can ingest data at an
>> amazing rate. Then we do a blue/green switch over.
>> 
>> Query time, from my experience with my environment, is improved with more
>> sharding and additional hardware. Not just more sharding on the same
>> hardware.
>> 
>> My fields are not stored either, except ID. There are some fields that are
>> indexed and have DocValues and those are used for sorting and facets. My
>> queries can have any number of wildcards as well, but my field’s data
>> lengths are maybe a maximum of 100 characters so proximity searching is not
>> too bad. I tokenize and index everything. I do not expand terms at query
>> time to get broader results, I index the alternatives and let the indexer
>> do what it does best.
>> 
>> If you are running in SolrCloud mode and you are using the embedded
>> zookeeper I would change that. Solr and ZK are very chatty with each other,
>> run ZK on machines in proximity to Solr.
>> 
>> Regards
>> 
>>> On May 21, 2020, at 2:46 AM, Modassar Ather 
>> wrote:
>>> 
>>> Thanks Phill for your response.
>>> 
>>> Optimal Index size: Depends on what you are optimizing for. Query Speed?
>>> Hardware utilization?
>>> We are optimising it for query speed. What I understand even if we set
>> the
>>> merge policy to any number the amount of hard disk will still be required
>>> for the bigger segment merges. Please correct me if I am wrong.
>>> 
>>> Optimizing the index is something I never do. We live with about 28%
>>> deletes. You should check your configuration for your merge policy.
>>> There is a delete of about 10-20% in our updates. We have no merge policy
>>> set in configuration as we do a full optimisation after the indexing.
>>> 
>>> Increased sharding has helped reduce query response time, but surely
>> there
>>> is a point where the colation of results starts to be the bottleneck.
>>> The query response time is my concern. I understand the aggregation of
>>> results may increase the search response time.
>>> 
>>> *What does your schema look like? I index around 120 fields per
>> document.*
>>> The schema has a combination of text and string fields. None of the field
>>> except Id field is stored. We also have around 120 fields. A few of them
>>> have docValues enabled.
>>> 
>>> *What does your queries look like? Mine are so varied that caching never
>>> helps, the same query rarely comes through.*
>>> Our search queries are combination of proximity, nested proximity and
>>> wildcards most of the time. The query can be very complex with 100s of
>>> wildcard and proximity terms in it. Different grouping option are also
>>> enabled on search result. And the search queries vary a lot.
>>&

Re: Need help on handling large size of index.

2020-05-22 Thread Modassar Ather
Thanks Erick and Phill.

We index data once a week, which is why we do the optimisation, and it has
helped produce faster query results. I will experiment with fewer segments on
the current hardware.
What I am not clear about is why there is so much difference in optimisation
time with extra IOPs versus no extra IOPs, although there is no constant high
usage of the extra IOPs other than a couple of spikes during optimisation.
The optimisation on a different datacenter machine of the same configuration
with SSDs used to take 4-5 hours. That time is comparable to the r5a.16xlarge
with the extra 3 IOPs.

Best,
Modassar

On Fri, May 22, 2020 at 12:56 AM Phill Campbell
 wrote:

> The optimal size for a shard of the index is be definition what works best
> on the hardware with the JVM heap that is in use.
> More shards mean smaller sizes of the index for the shard as you already
> know.
>
> I spent months changing the sharing, the JVM heap, the GC values before
> taking the system live.
> RAM is important, and I run with enough to allow Solr to load the entire
> index into RAM. From my understanding Solr uses the system to memory map
> the index files. I might be wrong.
> I experimented with less RAM and SSD drives and found that was another way
> to get the performance I needed. Since RAM is cheaper, I choose that
> approach.
>
> Again we never optimize. When we have to recover we rebuild the index by
> spinning up new machines and use a massive EMR (Map reduce job) to force
> the data into the system. Takes about 3 hours. Solr can ingest data at an
> amazing rate. Then we do a blue/green switch over.
>
> Query time, from my experience with my environment, is improved with more
> sharding and additional hardware. Not just more sharding on the same
> hardware.
>
> My fields are not stored either, except ID. There are some fields that are
> indexed and have DocValues and those are used for sorting and facets. My
> queries can have any number of wildcards as well, but my field’s data
> lengths are maybe a maximum of 100 characters so proximity searching is not
> too bad. I tokenize and index everything. I do not expand terms at query
> time to get broader results, I index the alternatives and let the indexer
> do what it does best.
>
> If you are running in SolrCloud mode and you are using the embedded
> zookeeper I would change that. Solr and ZK are very chatty with each other,
> run ZK on machines in proximity to Solr.
>
> Regards
>
> > On May 21, 2020, at 2:46 AM, Modassar Ather 
> wrote:
> >
> > Thanks Phill for your response.
> >
> > Optimal Index size: Depends on what you are optimizing for. Query Speed?
> > Hardware utilization?
> > We are optimising it for query speed. What I understand even if we set
> the
> > merge policy to any number the amount of hard disk will still be required
> > for the bigger segment merges. Please correct me if I am wrong.
> >
> > Optimizing the index is something I never do. We live with about 28%
> > deletes. You should check your configuration for your merge policy.
> > There is a delete of about 10-20% in our updates. We have no merge policy
> > set in configuration as we do a full optimisation after the indexing.
> >
> > Increased sharding has helped reduce query response time, but surely
> there
> > is a point where the colation of results starts to be the bottleneck.
> > The query response time is my concern. I understand the aggregation of
> > results may increase the search response time.
> >
> > *What does your schema look like? I index around 120 fields per
> document.*
> > The schema has a combination of text and string fields. None of the field
> > except Id field is stored. We also have around 120 fields. A few of them
> > have docValues enabled.
> >
> > *What does your queries look like? Mine are so varied that caching never
> > helps, the same query rarely comes through.*
> > Our search queries are combination of proximity, nested proximity and
> > wildcards most of the time. The query can be very complex with 100s of
> > wildcard and proximity terms in it. Different grouping option are also
> > enabled on search result. And the search queries vary a lot.
> >
> > Oh, another thing, are you concerned about  availability? Do you have a
> > replication factor > 1? Do you run those replicas in a different region
> for
> > safety?
> > How many zookeepers are you running and where are they?
> > As of now we do not have any replication factor. We are not using
> zookeeper
> > ensemble but would like to move to it sooner.
> >
> > Best,
> > Modassar
> >

Re: Need help on handling large size of index.

2020-05-21 Thread Phill Campbell
The optimal size for a shard of the index is by definition what works best on
the hardware with the JVM heap that is in use.
More shards mean a smaller index per shard, as you already know.

I spent months changing the sharding, the JVM heap, and the GC values before
taking the system live.
RAM is important, and I run with enough to allow Solr to load the entire index
into RAM. From my understanding Solr uses the operating system to memory-map
the index files. I might be wrong.
I experimented with less RAM and SSD drives and found that was another way to 
get the performance I needed. Since RAM is cheaper, I choose that approach.

Again we never optimize. When we have to recover we rebuild the index by 
spinning up new machines and use a massive EMR (Map reduce job) to force the 
data into the system. Takes about 3 hours. Solr can ingest data at an amazing 
rate. Then we do a blue/green switch over.

Query time, from my experience with my environment, is improved with more 
sharding and additional hardware. Not just more sharding on the same hardware.

My fields are not stored either, except ID. There are some fields that are 
indexed and have DocValues and those are used for sorting and facets. My 
queries can have any number of wildcards as well, but my field’s data lengths 
are maybe a maximum of 100 characters so proximity searching is not too bad. I 
tokenize and index everything. I do not expand terms at query time to get 
broader results, I index the alternatives and let the indexer do what it does 
best.

If you are running in SolrCloud mode and you are using the embedded zookeeper I 
would change that. Solr and ZK are very chatty with each other, run ZK on 
machines in proximity to Solr.

Regards
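
For the move off the embedded ZooKeeper: each node is simply pointed at an
external ensemble at startup; a sketch with hypothetical host names:

bin/solr zk mkroot /solr -z zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181
bin/solr start -c -z zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181/solr

The /solr chroot keeps Solr's znodes in their own subtree of the ensemble and
only needs to be created once.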

> On May 21, 2020, at 2:46 AM, Modassar Ather  wrote:
> 
> Thanks Phill for your response.
> 
> Optimal Index size: Depends on what you are optimizing for. Query Speed?
> Hardware utilization?
> We are optimising it for query speed. What I understand even if we set the
> merge policy to any number the amount of hard disk will still be required
> for the bigger segment merges. Please correct me if I am wrong.
> 
> Optimizing the index is something I never do. We live with about 28%
> deletes. You should check your configuration for your merge policy.
> There is a delete of about 10-20% in our updates. We have no merge policy
> set in configuration as we do a full optimisation after the indexing.
> 
> Increased sharding has helped reduce query response time, but surely there
> is a point where the colation of results starts to be the bottleneck.
> The query response time is my concern. I understand the aggregation of
> results may increase the search response time.
> 
> *What does your schema look like? I index around 120 fields per document.*
> The schema has a combination of text and string fields. None of the field
> except Id field is stored. We also have around 120 fields. A few of them
> have docValues enabled.
> 
> *What does your queries look like? Mine are so varied that caching never
> helps, the same query rarely comes through.*
> Our search queries are combination of proximity, nested proximity and
> wildcards most of the time. The query can be very complex with 100s of
> wildcard and proximity terms in it. Different grouping option are also
> enabled on search result. And the search queries vary a lot.
> 
> Oh, another thing, are you concerned about  availability? Do you have a
> replication factor > 1? Do you run those replicas in a different region for
> safety?
> How many zookeepers are you running and where are they?
> As of now we do not have any replication factor. We are not using zookeeper
> ensemble but would like to move to it sooner.
> 
> Best,
> Modassar
> 
> On Thu, May 21, 2020 at 9:19 AM Shawn Heisey  wrote:
> 
>> On 5/20/2020 11:43 AM, Modassar Ather wrote:
>>> Can you please help me with following few questions?
>>> 
>>>- What is the ideal index size per shard?
>> 
>> We have no way of knowing that.  A size that works well for one index
>> use case may not work well for another, even if the index size in both
>> cases is identical.  Determining the ideal shard size requires
>> experimentation.
>> 
>> 
>> https://lucidworks.com/post/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
>> 
>>>- The optimisation takes lot of time and IOPs to complete. Will
>>>increasing the number of shards help in reducing the optimisation
>> time and
>>>IOPs?
>> 
>> No, changing the number of shards will not help with the time required
>> to optimize, and might make it slower.  Increasing the speed of the
>> disks won't help either.  Optimizing involves a lot more th

Re: Need help on handling large size of index.

2020-05-21 Thread Erick Erickson
Please consider _not_ optimizing. It's kind of a misleading name anyway, and
depending on the version of Solr you're using it may have unintended consequences, see:

https://lucidworks.com/post/segment-merging-deleted-documents-optimize-may-bad/
and
https://lucidworks.com/post/solr-and-optimizing-your-index-take-ii/

There are situations where optimizing makes sense, but far too often people 
think
it’s A Good Thing (based almost entirely on the name, who _wouldn’t_ want an
optimized index?) without measuring, leading to tons of work to no real benefit.

Best,
Erick
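
If the weekly delta leaves a sizeable fraction of deleted documents, two
lighter options than merging down to a single segment are an expunge-deletes
commit or an optimize capped at a segment count (sketches; the collection
name is a placeholder):

curl "http://localhost:8983/solr/mycollection/update?commit=true&expungeDeletes=true"

curl "http://localhost:8983/solr/mycollection/update?optimize=true&maxSegments=16"

Both still cause merging, just much less of it than a full optimize down to
one segment.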

> On May 21, 2020, at 4:58 AM, Modassar Ather  wrote:
> 
> Thanks Shawn for your response.
> 
> We have seen a performance increase in optimisation with a bigger number of
> IOPs. Without the IOPs we saw the optimisation took around 15-20 hours
> whereas the same index took 5-6 hours to optimise with higher IOPs.
> Yes the entire extra IOPs were never used to full other than a couple of
> spike in its usage. So not able to understand how the increased IOPs makes
> so much of difference.
> Can you please help me understand what it involves to optimise? Is it the
> more RAM/IOPs?
> 
> Search response time is very important. Please advise if we increase the
> shard with extra servers how much effect it may have on search response
> time.
> 
> Best,
> Modassar
> 
> On Thu, May 21, 2020 at 2:16 PM Modassar Ather 
> wrote:
> 
>> Thanks Phill for your response.
>> 
>> Optimal Index size: Depends on what you are optimizing for. Query Speed?
>> Hardware utilization?
>> We are optimising it for query speed. What I understand even if we set the
>> merge policy to any number the amount of hard disk will still be required
>> for the bigger segment merges. Please correct me if I am wrong.
>> 
>> Optimizing the index is something I never do. We live with about 28%
>> deletes. You should check your configuration for your merge policy.
>> There is a delete of about 10-20% in our updates. We have no merge policy
>> set in configuration as we do a full optimisation after the indexing.
>> 
>> Increased sharding has helped reduce query response time, but surely there
>> is a point where the colation of results starts to be the bottleneck.
>> The query response time is my concern. I understand the aggregation of
>> results may increase the search response time.
>> 
>> *What does your schema look like? I index around 120 fields per document.*
>> The schema has a combination of text and string fields. None of the field
>> except Id field is stored. We also have around 120 fields. A few of them
>> have docValues enabled.
>> 
>> *What does your queries look like? Mine are so varied that caching never
>> helps, the same query rarely comes through.*
>> Our search queries are combination of proximity, nested proximity and
>> wildcards most of the time. The query can be very complex with 100s of
>> wildcard and proximity terms in it. Different grouping option are also
>> enabled on search result. And the search queries vary a lot.
>> 
>> Oh, another thing, are you concerned about  availability? Do you have a
>> replication factor > 1? Do you run those replicas in a different region for
>> safety?
>> How many zookeepers are you running and where are they?
>> As of now we do not have any replication factor. We are not using
>> zookeeper ensemble but would like to move to it sooner.
>> 
>> Best,
>> Modassar
>> 
>> On Thu, May 21, 2020 at 9:19 AM Shawn Heisey  wrote:
>> 
>>> On 5/20/2020 11:43 AM, Modassar Ather wrote:
>>>> Can you please help me with following few questions?
>>>> 
>>>>- What is the ideal index size per shard?
>>> 
>>> We have no way of knowing that.  A size that works well for one index
>>> use case may not work well for another, even if the index size in both
>>> cases is identical.  Determining the ideal shard size requires
>>> experimentation.
>>> 
>>> 
>>> https://lucidworks.com/post/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
>>> 
>>>>- The optimisation takes lot of time and IOPs to complete. Will
>>>>increasing the number of shards help in reducing the optimisation
>>> time and
>>>>IOPs?
>>> 
>>> No, changing the number of shards will not help with the time required
>>> to optimize, and might make it slower.  Increasing the speed of the
>>> disks won't help either.  Optimizing involves a lot more than just
>>> copying data -- it will never use all the available disk bandwidth of
>>> m

Re: Need help on handling large size of index.

2020-05-21 Thread Modassar Ather
Thanks Shawn for your response.

We have seen a performance increase in optimisation with a bigger number of
IOPs. Without the extra IOPs the optimisation took around 15-20 hours,
whereas the same index took 5-6 hours to optimise with the higher IOPs.
Yes, the extra IOPs were never used to the full other than a couple of
spikes in usage, so I am not able to understand how the increased IOPs make
so much of a difference.
Can you please help me understand what optimising involves? Is it mostly
more RAM/IOPs?

Search response time is very important. Please advise how much effect
increasing the number of shards with extra servers may have on search
response time.

Best,
Modassar

On Thu, May 21, 2020 at 2:16 PM Modassar Ather 
wrote:

> Thanks Phill for your response.
>
> Optimal Index size: Depends on what you are optimizing for. Query Speed?
> Hardware utilization?
> We are optimising it for query speed. What I understand even if we set the
> merge policy to any number the amount of hard disk will still be required
> for the bigger segment merges. Please correct me if I am wrong.
>
> Optimizing the index is something I never do. We live with about 28%
> deletes. You should check your configuration for your merge policy.
> There is a delete of about 10-20% in our updates. We have no merge policy
> set in configuration as we do a full optimisation after the indexing.
>
> Increased sharding has helped reduce query response time, but surely there
> is a point where the colation of results starts to be the bottleneck.
> The query response time is my concern. I understand the aggregation of
> results may increase the search response time.
>
> *What does your schema look like? I index around 120 fields per document.*
> The schema has a combination of text and string fields. None of the field
> except Id field is stored. We also have around 120 fields. A few of them
> have docValues enabled.
>
> *What does your queries look like? Mine are so varied that caching never
> helps, the same query rarely comes through.*
> Our search queries are combination of proximity, nested proximity and
> wildcards most of the time. The query can be very complex with 100s of
> wildcard and proximity terms in it. Different grouping option are also
> enabled on search result. And the search queries vary a lot.
>
> Oh, another thing, are you concerned about  availability? Do you have a
> replication factor > 1? Do you run those replicas in a different region for
> safety?
> How many zookeepers are you running and where are they?
> As of now we do not have any replication factor. We are not using
> zookeeper ensemble but would like to move to it sooner.
>
> Best,
> Modassar
>
> On Thu, May 21, 2020 at 9:19 AM Shawn Heisey  wrote:
>
>> On 5/20/2020 11:43 AM, Modassar Ather wrote:
>> > Can you please help me with following few questions?
>> >
>> > - What is the ideal index size per shard?
>>
>> We have no way of knowing that.  A size that works well for one index
>> use case may not work well for another, even if the index size in both
>> cases is identical.  Determining the ideal shard size requires
>> experimentation.
>>
>>
>> https://lucidworks.com/post/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
>>
>> > - The optimisation takes lot of time and IOPs to complete. Will
>> > increasing the number of shards help in reducing the optimisation
>> time and
>> > IOPs?
>>
>> No, changing the number of shards will not help with the time required
>> to optimize, and might make it slower.  Increasing the speed of the
>> disks won't help either.  Optimizing involves a lot more than just
>> copying data -- it will never use all the available disk bandwidth of
>> modern disks.  SolrCloud does optimizes of the shard replicas making up
>> a full collection sequentially, not simultaneously.
>>
>> > - We are planning to reduce each shard index size to 30GB and the
>> entire
>> > 3.5 TB index will be distributed across more shards. In this case
>> to almost
>> > 70+ shards. Will this help?
>>
>> Maybe.  Maybe not.  You'll have to try it.  If you increase the number
>> of shards without adding additional servers, I would expect things to
>> get worse, not better.
>>
>> > Kindly share your thoughts on how best we can use Solr with such a large
>> > index size.
>>
>> Something to keep in mind -- memory is the resource that makes the most
>> difference in performance.  Buying enough memory to get decent
>> performance out of an index that big would probably be very expensive.
>> You should probably explore ways to make your index smaller.  Another
>> idea is to split things up so the most frequently accessed search data
>> is in a relatively small index and lives on beefy servers, and data used
>> for less frequent or data-mining queries (where performance doesn't
>> matter as much) can live on less expensive servers.
>>
>> Thanks,
>> Shawn
>>
>


Re: Need help on handling large size of index.

2020-05-21 Thread Modassar Ather
Thanks Phill for your response.

Optimal Index size: Depends on what you are optimizing for. Query Speed?
Hardware utilization?
We are optimising it for query speed. What I understand is that even if we set
the merge policy to any number, the same amount of hard disk space will still
be required for the bigger segment merges. Please correct me if I am wrong.

Optimizing the index is something I never do. We live with about 28%
deletes. You should check your configuration for your merge policy.
There is a delete of about 10-20% in our updates. We have no merge policy
set in configuration as we do a full optimisation after the indexing.

Increased sharding has helped reduce query response time, but surely there
is a point where the collation of results starts to be the bottleneck.
The query response time is my concern. I understand the aggregation of
results may increase the search response time.

*What does your schema look like? I index around 120 fields per document.*
The schema has a combination of text and string fields. None of the fields
except the Id field is stored. We also have around 120 fields. A few of them
have docValues enabled.

*What does your queries look like? Mine are so varied that caching never
helps, the same query rarely comes through.*
Our search queries are a combination of proximity, nested proximity and
wildcard terms most of the time. A query can be very complex, with 100s of
wildcard and proximity terms in it. Different grouping options are also
enabled on the search results. And the search queries vary a lot.

Oh, another thing, are you concerned about  availability? Do you have a
replication factor > 1? Do you run those replicas in a different region for
safety?
How many zookeepers are you running and where are they?
As of now we do not have any replication factor. We are not using a ZooKeeper
ensemble but would like to move to one soon.

Best,
Modassar

On Thu, May 21, 2020 at 9:19 AM Shawn Heisey  wrote:

> On 5/20/2020 11:43 AM, Modassar Ather wrote:
> > Can you please help me with following few questions?
> >
> > - What is the ideal index size per shard?
>
> We have no way of knowing that.  A size that works well for one index
> use case may not work well for another, even if the index size in both
> cases is identical.  Determining the ideal shard size requires
> experimentation.
>
>
> https://lucidworks.com/post/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
>
> > - The optimisation takes lot of time and IOPs to complete. Will
> > increasing the number of shards help in reducing the optimisation
> time and
> > IOPs?
>
> No, changing the number of shards will not help with the time required
> to optimize, and might make it slower.  Increasing the speed of the
> disks won't help either.  Optimizing involves a lot more than just
> copying data -- it will never use all the available disk bandwidth of
> modern disks.  SolrCloud does optimizes of the shard replicas making up
> a full collection sequentially, not simultaneously.
>
> > - We are planning to reduce each shard index size to 30GB and the
> entire
> >     3.5 TB index will be distributed across more shards. In this case to
> almost
> > 70+ shards. Will this help?
>
> Maybe.  Maybe not.  You'll have to try it.  If you increase the number
> of shards without adding additional servers, I would expect things to
> get worse, not better.
>
> > Kindly share your thoughts on how best we can use Solr with such a large
> > index size.
>
> Something to keep in mind -- memory is the resource that makes the most
> difference in performance.  Buying enough memory to get decent
> performance out of an index that big would probably be very expensive.
> You should probably explore ways to make your index smaller.  Another
> idea is to split things up so the most frequently accessed search data
> is in a relatively small index and lives on beefy servers, and data used
> for less frequent or data-mining queries (where performance doesn't
> matter as much) can live on less expensive servers.
>
> Thanks,
> Shawn
>


Re: Need help on handling large size of index.

2020-05-20 Thread Shawn Heisey

On 5/20/2020 11:43 AM, Modassar Ather wrote:

Can you please help me with following few questions?

- What is the ideal index size per shard?


We have no way of knowing that.  A size that works well for one index 
use case may not work well for another, even if the index size in both 
cases is identical.  Determining the ideal shard size requires 
experimentation.


https://lucidworks.com/post/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/


- The optimisation takes lot of time and IOPs to complete. Will
increasing the number of shards help in reducing the optimisation time and
IOPs?


No, changing the number of shards will not help with the time required 
to optimize, and might make it slower.  Increasing the speed of the 
disks won't help either.  Optimizing involves a lot more than just 
copying data -- it will never use all the available disk bandwidth of 
modern disks.  SolrCloud does optimizes of the shard replicas making up 
a full collection sequentially, not simultaneously.



- We are planning to reduce each shard index size to 30GB and the entire
3.5 TB index will be distributed across more shards. In this case to almost
70+ shards. Will this help?


Maybe.  Maybe not.  You'll have to try it.  If you increase the number 
of shards without adding additional servers, I would expect things to 
get worse, not better.



Kindly share your thoughts on how best we can use Solr with such a large
index size.


Something to keep in mind -- memory is the resource that makes the most 
difference in performance.  Buying enough memory to get decent 
performance out of an index that big would probably be very expensive. 
You should probably explore ways to make your index smaller.  Another 
idea is to split things up so the most frequently accessed search data 
is in a relatively small index and lives on beefy servers, and data used 
for less frequent or data-mining queries (where performance doesn't 
matter as much) can live on less expensive servers.


Thanks,
Shawn


Re: Need help on handling large size of index.

2020-05-20 Thread Phill Campbell
In my world your index size is common.

Optimal Index size: Depends on what you are optimizing for. Query Speed? 
Hardware utilization? 
Optimizing the index is something I never do. We live with about 28% deletes. 
You should check your configuration for your merge policy.
I run 120 shards, and I am currently redesigning for 256 shards.
Increased sharding has helped reduce query response time, but surely there is a 
point where the collation of results starts to be the bottleneck.
I run the 120 shards on 90 r4.4xlarge instances with a replication factor of 3.

The things missing are:
What does your schema look like? I index around 120 fields per document.
What does your queries look like? Mine are so varied that caching never helps, 
the same query rarely comes through.
My system takes continuous updates, yours does not.

It is really up to you to experiment.

If you follow the development pattern of Design By Use (DBU) the first thing 
you do for solr and even for SQL is to come up with your queries first. Then 
design the schema. Then figure out how to distribute it for performance.

Oh, another thing, are you concerned about  availability? Do you have a 
replication factor > 1? Do you run those replicas in a different region for 
safety?
How many zookeepers are you running and where are they?

Lots of questions.

Regards
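
The merge policy mentioned above is configured in the <indexConfig> section of
solrconfig.xml; a minimal sketch of making it explicit (the values shown are
the usual defaults, not recommendations for this particular index):

<indexConfig>
  <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
    <int name="maxMergeAtOnce">10</int>
    <int name="segmentsPerTier">10</int>
  </mergePolicyFactory>
</indexConfig>

Raising segmentsPerTier leaves more, smaller segments per tier; lowering it
merges more aggressively.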

> On May 20, 2020, at 11:43 AM, Modassar Ather  wrote:
> 
> Hi,
> 
> Currently we have index of size 3.5 TB. These index are distributed across
> 12 shards under two cores. The size of index on each shards are almost
> equal.
> We do a delta indexing every week and optimise the index.
> 
> The server configuration is as follows.
> 
>   - Solr Version  : 6.5.1
>   - AWS instance type : r5a.16xlarge
>   - CPU(s)  : 64
>   - RAM  : 512GB
>   - EBS size  : 7 TB (For indexing as well as index optimisation.)
>   - IOPs  : 3 (For faster index optimisation)
> 
> 
> Can you please help me with following few questions?
> 
>   - What is the ideal index size per shard?
>   - The optimisation takes lot of time and IOPs to complete. Will
>   increasing the number of shards help in reducing the optimisation time and
>   IOPs?
>   - We are planning to reduce each shard index size to 30GB and the entire
>   3.5 TB index will be distributed across more shards. In this case to almost
>   70+ shards. Will this help?
>   - Will adding so many new shards increase the search response time and
>   possibly how much?
>   - If we have to increase the shards should we do it on a single larger
>   server or should do it on multiple small servers?
> 
> 
> Kindly share your thoughts on how best we can use Solr with such a large
> index size.
> 
> Best,
> Modassar



Need help on handling large size of index.

2020-05-20 Thread Modassar Ather
Hi,

Currently we have an index of size 3.5 TB. The index is distributed across
12 shards under two cores. The size of the index on each shard is almost
equal.
We do a delta indexing every week and optimise the index.

The server configuration is as follows.

   - Solr Version  : 6.5.1
   - AWS instance type : r5a.16xlarge
   - CPU(s)  : 64
   - RAM  : 512GB
   - EBS size  : 7 TB (For indexing as well as index optimisation.)
   - IOPs  : 3 (For faster index optimisation)


Can you please help me with following few questions?

   - What is the ideal index size per shard?
   - The optimisation takes lot of time and IOPs to complete. Will
   increasing the number of shards help in reducing the optimisation time and
   IOPs?
   - We are planning to reduce each shard index size to 30GB and the entire
   3.5 TB index will be distributed across more shards. In this case to almost
   70+ shards. Will this help?
   - Will adding so many new shards increase the search response time and
   possibly how much?
   - If we have to increase the shards should we do it on a single larger
   server or should do it on multiple small servers?


Kindly share your thoughts on how best we can use Solr with such a large
index size.

Best,
Modassar


requesting help on auto scaling of solr

2020-04-17 Thread saicharan.k...@spglobal.com
Hello,

I am trying to add auto scaling for a cluster that is hosted on aws

I am using cluster policy as

{
"set-cluster-policy": [
{
"replica": "<2",
"shard": "#EACH",
"node": "#ANY"
}
]
}
Node-added-trigger

{
  "set-trigger": {
"name": "node_added_trigger",
"event": "nodeAdded",
"waitFor": "5s",
"preferredOperation": "ADDREPLICA"
  }
}

Node-lost-trigger as
{
  "set-trigger": {
"name": "node_lost_trigger",
"event": "nodeLost",
"waitFor": "120s",
"preferredOperation": "DELETENODE"
  }
}

When a node is terminated and a new node gets added, the replicas of the lost
node are not getting added to the new node.
Rather, a replica of each shard is added on the new node (10.24.76.89 is the
new node).

[inline screenshot attachment: image001.png]

What we expect is for the replicas of the lost node to be added on the new
node using autoscaling.

Thank you,
charan
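
One way to see how the autoscaling framework evaluates the current layout and
which operations it would propose is the read side of the autoscaling API (a
sketch; the host is a placeholder):

curl "http://localhost:8983/solr/admin/autoscaling/diagnostics"

curl "http://localhost:8983/solr/admin/autoscaling/suggestions"

The suggestions output lists any violated policy rules and the operations Solr
would issue for them, which can help explain why the new node ended up with
one replica per shard.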




Re: requesting help to solve an issue

2020-04-09 Thread Sandeep Dharembra
The property feature didn't work for me. I wanted to spawn TLOG replica types
on certain nodes and PULL on others. I started nodes with a type key with the
values TLOG and PULL, but that didn't work.

I didn't probe further but followed a workaround of starting Solr on one port
where I wanted TLOG replicas and on another port where I wanted PULL types,
then set the policies accordingly, and it is working for me.

You can try something similar if you like, or if someone else replies, I can
also benefit from it.

Thanks,
Sandeep

On Thu, Apr 9, 2020, 10:36 AM saicharan.k...@spglobal.com <
saicharan.k...@spglobal.com> wrote:

> Hi,
> That’s not happening the replicas are created even on property
> value-typeahead
> Thanks.
>
>
> -Original Message-
> From: Sandeep Dharembra 
> Sent: Wednesday, April 08, 2020 8:23 PM
> To: solr-user@lucene.apache.org
> Subject: Re: requesting help to solve an issue
>
> Hi,
>
> You have used ! - not operator, so, the replicas would be created on all
> nodes not having that property value (typeahead)
>
> Thanks
>
>
> On Wed, Apr 8, 2020, 4:21 PM saicharan.k...@spglobal.com <
> saicharan.k...@spglobal.com> wrote:
>
> > Hi there,
> >
> > We are trying to apply the following collection specific policy in
> > solr {
> > "set-policy": {
> > "generalpolicy": [
> > {
> > "replica": "<2",
> > "shard": "#EACH",
> > "sysprop.key":"!typeahead",
> > "strict": "true"
> > }
> > ]
> > }
> > }
> >
> > And when we are trying to create a collection using  the following
> > CREATE api
> >
> >
> > http://search-solr2-av.midevcld.spglobal.com:8983/solr/admin/collectio
> > ns?action=CREATE=generalcollection=1=
> > 2=generalpolicy
> >
> >
> > The replica's are spreading across other keys rather than being
> > confined to the key mentioned in the policy
> >
> > We have created the sysprop key during the node start-up by including
> > "-Dkey=typeahead" in user data script
> >
> > 
> >
> > The information contained in this message is intended only for the
> > recipient, and may be a confidential attorney-client communication or
> > may otherwise be privileged and confidential and protected from
> > disclosure. If the reader of this message is not the intended
> > recipient, or an employee or agent responsible for delivering this
> > message to the intended recipient, please be aware that any
> > dissemination or copying of this communication is strictly prohibited.
> > If you have received this communication in error, please immediately
> > notify us by replying to the message and deleting it from your
> > computer. S Global Inc. reserves the right, subject to applicable
> > local law, to monitor, review and process the content of any
> > electronic message or information sent to or from S Global Inc.
> > e-mail addresses without informing the sender or recipient of the
> > message. By sending electronic message or information to S Global
> > Inc. e-mail addresses you, as the sender, are consenting to S Global
> Inc. processing any of your personal data therein.
> >
>
> 
>
> The information contained in this message is intended only for the
> recipient, and may be a confidential attorney-client communication or may
> otherwise be privileged and confidential and protected from disclosure. If
> the reader of this message is not the intended recipient, or an employee or
> agent responsible for delivering this message to the intended recipient,
> please be aware that any dissemination or copying of this communication is
> strictly prohibited. If you have received this communication in error,
> please immediately notify us by replying to the message and deleting it
> from your computer. S Global Inc. reserves the right, subject to
> applicable local law, to monitor, review and process the content of any
> electronic message or information sent to or from S Global Inc. e-mail
> addresses without informing the sender or recipient of the message. By
> sending electronic message or information to S Global Inc. e-mail
> addresses you, as the sender, are consenting to S Global Inc. processing
> any of your personal data therein.
>


RE: requesting help to solve an issue

2020-04-08 Thread saicharan.k...@spglobal.com
Hi,
That's not happening; the replicas are created even on nodes with the property value typeahead.
Thanks.


-Original Message-
From: Sandeep Dharembra 
Sent: Wednesday, April 08, 2020 8:23 PM
To: solr-user@lucene.apache.org
Subject: Re: requesting help to solve an issue

Hi,

You have used ! - not operator, so, the replicas would be created on all nodes 
not having that property value (typeahead)

Thanks


On Wed, Apr 8, 2020, 4:21 PM saicharan.k...@spglobal.com < 
saicharan.k...@spglobal.com> wrote:

> Hi there,
>
> We are trying to apply the following collection specific policy in
> solr {
> "set-policy": {
> "generalpolicy": [
> {
> "replica": "<2",
> "shard": "#EACH",
> "sysprop.key":"!typeahead",
> "strict": "true"
> }
> ]
> }
> }
>
> And when we are trying to create a collection using  the following
> CREATE api
>
>
> http://search-solr2-av.midevcld.spglobal.com:8983/solr/admin/collectio
> ns?action=CREATE=generalcollection=1=
> 2=generalpolicy
>
>
> The replica's are spreading across other keys rather than being
> confined to the key mentioned in the policy
>
> We have created the sysprop key during the node start-up by including
> "-Dkey=typeahead" in user data script
>
> 
>
> The information contained in this message is intended only for the
> recipient, and may be a confidential attorney-client communication or
> may otherwise be privileged and confidential and protected from
> disclosure. If the reader of this message is not the intended
> recipient, or an employee or agent responsible for delivering this
> message to the intended recipient, please be aware that any
> dissemination or copying of this communication is strictly prohibited.
> If you have received this communication in error, please immediately
> notify us by replying to the message and deleting it from your
> computer. S Global Inc. reserves the right, subject to applicable
> local law, to monitor, review and process the content of any
> electronic message or information sent to or from S Global Inc.
> e-mail addresses without informing the sender or recipient of the
> message. By sending electronic message or information to S Global
> Inc. e-mail addresses you, as the sender, are consenting to S Global Inc. 
> processing any of your personal data therein.
>



Re: requesting help to solve an issue

2020-04-08 Thread Sandeep Dharembra
Hi,

You have used ! (the not operator), so the replicas would be created on all
nodes not having that property value (typeahead).
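
For illustration only, a sketch against the 7.x/8.x autoscaling policy syntax (not a
tested rule): a zero-replica rule applies the restriction to whichever nodes match its
attributes, e.g.

  { "replica": 0, "shard": "#EACH", "sysprop.key": "typeahead", "strict": "true" }

would forbid replicas on the typeahead nodes, while the negated "!typeahead" in your
policy constrains only the nodes that do not carry that sysprop value.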

Thanks


On Wed, Apr 8, 2020, 4:21 PM saicharan.k...@spglobal.com <
saicharan.k...@spglobal.com> wrote:

> Hi there,
>
> We are trying to apply the following collection specific policy in solr
> {
> "set-policy": {
> "generalpolicy": [
> {
> "replica": "<2",
> "shard": "#EACH",
> "sysprop.key":"!typeahead",
> "strict": "true"
> }
> ]
> }
> }
>
> And when we are trying to create a collection using  the following  CREATE
> api
>
>
> http://search-solr2-av.midevcld.spglobal.com:8983/solr/admin/collections?action=CREATE=generalcollection=1=2=generalpolicy
>
>
> The replica's are spreading across other keys rather than being confined
> to the key mentioned in the policy
>
> We have created the sysprop key during the node start-up by including
> "-Dkey=typeahead" in user data script
>


requesting help to solve an issue

2020-04-08 Thread saicharan.k...@spglobal.com
Hi there,

We are trying to apply the following collection specific policy in solr
{
"set-policy": {
"generalpolicy": [
{
"replica": "<2",
"shard": "#EACH",
"sysprop.key":"!typeahead",
"strict": "true"
}
]
}
}

And when we are trying to create a collection using  the following  CREATE api

http://search-solr2-av.midevcld.spglobal.com:8983/solr/admin/collections?action=CREATE=generalcollection=1=2=generalpolicy


The replica's are spreading across other keys rather than being confined to the 
key mentioned in the policy

We have created the sysprop key during the node start-up by including  
"-Dkey=typeahead" in user data script





Re: Need Help in Apache SOLR scores logic

2020-02-25 Thread Jon Kjær Amundsen
Relevance scoring has indeed changed since Solr 6 from the tf/idf vector
model to Okapi BM25.
You will need to set the similarity to ClassicSimilarityFactory in the
schema.
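
For example, a minimal sketch of a global similarity override, placed at the top level
of the schema (managed-schema or schema.xml):

  <!-- restore the pre-6.0 TF-IDF (ClassicSimilarity) scoring for all fields -->
  <similarity class="solr.ClassicSimilarityFactory"/>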

Consult the reference guide[1] on how to do it.

[1]
https://lucene.apache.org/solr/guide/8_4/other-schema-elements.html#similarity

Venlig hilsen/Best regards

*Jon Kjær Amundsen*
Developer


Phone: +45 7023 9080
E-mail: j...@udbudsvagten.dk
Web: www.udbudsvagten.dk
Parken - Tårn D - 8. Sal
Øster Allé 48 | DK - 2100 København

<http://dk.linkedin.com/in/JonKjaerAmundsen/>

Intelligent Offentlig Samhandel
*Før, under og efter udbud*

*Følg UdbudsVagten og markedet her Linkedin
<http://www.linkedin.com/groups?groupDashboard==1862353> *


Den tir. 25. feb. 2020 kl. 18.24 skrev Karthik Reddy :

> Hello Team,
>
> How are you? This is Karthik Reddy and I am working as a Software
> Developer. I have one question regarding SOLR scores. One of the projects,
> which I am working on we are using Lucene Apache SOLR.
> We were using SOLR 5.4.1 initially and then migrated to SOLR 8.4.1. After
> migration, I do see the score which is returned by SOLR is got changed in
> 8.2.0. I would like to use the same score logic as SOLR 5.4.1. Could you
> please help what configuration should I change in SOLR 8.4.1 to get the
> same scores as version 5.4.1. Thanks in advance.
>
>
>
> Regards
> Karthik Reddy
>


Need Help in Apache SOLR scores logic

2020-02-25 Thread Karthik Reddy
Hello Team,

How are you? This is Karthik Reddy and I am working as a software
developer. I have a question regarding Solr scores. In one of the projects
I am working on, we are using Apache Lucene/Solr.
We were initially using Solr 5.4.1 and then migrated to Solr 8.4.1. After the
migration, I see that the score returned by Solr has changed in 8.2.0.
I would like to use the same scoring logic as Solr 5.4.1. Could you
please tell me what configuration I should change in Solr 8.4.1 to get the
same scores as version 5.4.1? Thanks in advance.



Regards
Karthik Reddy


Re: Need help in configuring Spell check in Apache Solr 8.4

2020-02-05 Thread kumar gaurav
HI Seetesh

For IndexBasedSpellChecker, the default distanceMeasure is LevensteinDistance
itself. That's why that line is commented out in the Reference Guide.
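
For reference, a sketch of an IndexBasedSpellChecker entry (the field name and index
directory below are placeholders, not your config); omitting distanceMeasure gives the
Levenstein default:

  <lst name="spellchecker">
    <str name="classname">solr.IndexBasedSpellChecker</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
    <str name="field">name</str>
    <str name="buildOnCommit">true</str>
    <!-- optional; this is already the default:
    <str name="distanceMeasure">org.apache.lucene.search.spell.LevensteinDistance</str>
    -->
  </lst>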


regards
Kumar Gaurav

On Tue, Jan 28, 2020 at 1:01 PM seeteshh  wrote:

> Hello Kumar Gaurav
>
> For IndexBasedSpellchecker is there a better option of using
> org.apache.lucene.search.spell.LevensteinDistance as this is not valid in
> Solr 8.4
>
> This line seems to be commented in the Reference Guide
>
> Regards,
>
> Seetesh Hindlekar
>
>
>
> -
> Seetesh Hindlekar
> --
> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


Re: Need help in configuring Spell check in Apache Solr 8.4

2020-01-27 Thread seeteshh
My searchComponent is as follows:

  <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
    <str name="queryAnalyzerFieldType">text_general</str>

    <lst name="spellchecker">
      <str name="name">default</str>
      <str name="field">name</str>
      <str name="classname">solr.DirectSolrSpellChecker</str>
      <str name="distanceMeasure">internal</str>
      <float name="accuracy">0.5</float>
      <int name="maxEdits">2</int>
      <int name="minPrefix">1</int>
      <int name="maxInspections">5</int>
      <int name="minQueryLength">4</int>
      <float name="maxQueryFrequency">0.01</float>
    </lst>

    <lst name="spellchecker">
      <str name="name">wordbreak</str>
      <str name="classname">solr.WordBreakSolrSpellChecker</str>
      <str name="field">name</str>
      <str name="combineWords">true</str>
      <str name="breakWords">true</str>
      <int name="maxChanges">10</int>
    </lst>
  </searchComponent>

and my requestHandler is:

  <requestHandler name="/spell" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="df">name</str>
      <str name="spellcheck.dictionary">default</str>
      <str name="spellcheck">on</str>
      <str name="spellcheck.extendedResults">true</str>
      <str name="spellcheck.count">10</str>
      <str name="spellcheck.alternativeTermCount">5</str>
      <str name="spellcheck.maxResultsForSuggest">5</str>
      <str name="spellcheck.collate">true</str>
      <str name="spellcheck.collateExtendedResults">true</str>
      <str name="spellcheck.maxCollationTries">10</str>
      <str name="spellcheck.maxCollations">5</str>
    </lst>
    <arr name="last-components">
      <str>spellcheck</str>
    </arr>
  </requestHandler>


Regards,

Seetesh Hindlekar



-
Seetesh Hindlekar
--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Need help in configuring Spell check in Apache Solr 8.4

2020-01-27 Thread seeteshh
Hello Kumar Gaurav

For IndexBasedSpellchecker is there a better option of using
org.apache.lucene.search.spell.LevensteinDistance as this is not valid in
Solr 8.4

This line seems to be commented in the Reference Guide

Regards,

Seetesh Hindlekar



-
Seetesh Hindlekar
--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Solr 8.0 Json Facets are slow - need help

2020-01-23 Thread Mikhail Khludnev
Hello, Kumar.

I don't know; the 3 / 84 ratio seems reasonable. The only unknown part of the
equation was that {!simpleFilter}. Anyway, a profiler/sampler might give an exact
answer.

On Fri, Jan 24, 2020 at 8:55 AM kumar gaurav  wrote:

> HI Mikhail
>
> Can you please see above debug log and help ?
>
> Thanks
>
>
> On Thu, Jan 23, 2020 at 12:05 AM kumar gaurav  wrote:
>
> > Also
> >
> > its not looks like box is slow . because for following query prepare time
> > is 3 ms but facet time is 84ms on the same box .Don't know why prepare
> time
> > was huge for that example :( .
> >
> > debug:
> > {
> >
> >- rawquerystring:
> >"{!parent tag=top which=$pq filters=$child.fq score=max v=$cq}",
> >- querystring:
> >"{!parent tag=top which=$pq filters=$child.fq score=max v=$cq}",
> >- parsedquery:
> >"AllParentsAware(ToParentBlockJoinQuery (+(+docType:sku +store_873:1)
> #color_refine:Blue #size_refine:L))"
> >,
> >- parsedquery_toString:
> >"ToParentBlockJoinQuery (+(+docType:sku +store_873:1)
> #color_refine:Blue #size_refine:L)"
> >,
> >- explain:
> >{
> >   - 1729659: "
> >   2.0 = Score based on 2 child docs in range from 5103808 to
> 5104159, best match:
> > 2.0 = sum of: 2.0 = sum of:   1.0 = docType:sku
> > 1.0 = store_873:1 0.0 = match on required clause,
> product of:
> > 0.0 = # clause
> > 1.0 = weight(color_refine:Blue in 4059732)
> [DisabledStatisticsSimilarity], result of:
> >   1.0 = score(freq=1.0), product of:
> > 1.0 = idf(docFreq, docCount)
> >   1 = docFreq, number of documents containing term
> >   1 = docCount, total number of documents with field
> > 1.0 = tf(freq=1.0), with freq of:
> >   1.0 = freq, occurrences of term within document
> > 1.0 = fieldNorm 0.0 = match on required clause,
> product of:
> > 0.0 = # clause
> > 1.0 = weight(size_refine:L in 4059732)
> [DisabledStatisticsSimilarity], result of:
> >   1.0 = score(freq=1.0), product of:
> > 1.0 = idf(docFreq, docCount)
> >   1 = docFreq, number of documents containing term
> >   1 = docCount, total number of documents with field
> > 1.0 = tf(freq=1.0), with freq of:
> >   1.0 = freq, occurrences of term within document
> > 1.0 = fieldNorm ",
> >   - 1730320: "
> >   2.0 = Score based on 1 child docs in range from 5099889 to
> 5100070, best match:
> > 2.0 = sum of: 2.0 = sum of:   1.0 = docType:sku
> > 1.0 = store_873:1 0.0 = match on required clause,
> product of:
> > 0.0 = # clause
> > 1.0 = weight(color_refine:Blue in 4055914)
> [DisabledStatisticsSimilarity], result of:
> >   1.0 = score(freq=1.0), product of:
> > 1.0 = idf(docFreq, docCount)
> >   1 = docFreq, number of documents containing term
> >   1 = docCount, total number of documents with field
> > 1.0 = tf(freq=1.0), with freq of:
> >   1.0 = freq, occurrences of term within document
> > 1.0 = fieldNorm 0.0 = match on required clause,
> product of:
> > 0.0 = # clause
> > 1.0 = weight(size_refine:L in 4055914)
> [DisabledStatisticsSimilarity], result of:
> >   1.0 = score(freq=1.0), product of:
> > 1.0 = idf(docFreq, docCount)
> >   1 = docFreq, number of documents containing term
> >   1 = docCount, total number of documents with field
> > 1.0 = tf(freq=1.0), with freq of:
> >   1.0 = freq, occurrences of term within document
> > 1.0 = fieldNorm ",
> >   - 1730721: "
> >   2.0 = Score based on 4 child docs in range from 5097552 to
> 5097808, best match:
> > 2.0 = sum of: 2.0 = sum of:   1.0 = docType:sku
> > 1.0 = store_873:1 0.0 = match on required clause,
> product of:
> > 0.0 = # clause
> > 1.0 = weight(color_refine:Blue in 4053487)
> [DisabledStatisticsSimilarity], result of:
> >   1.0 = score(freq=1.0), product of:
> > 1.0 = idf(docFreq, docCount)
> &g

Re: Solr 8.0 Json Facets are slow - need help

2020-01-23 Thread kumar gaurav
HI Mikhail

Can you please see above debug log and help ?

Thanks


On Thu, Jan 23, 2020 at 12:05 AM kumar gaurav  wrote:

> Also
>
> its not looks like box is slow . because for following query prepare time
> is 3 ms but facet time is 84ms on the same box .Don't know why prepare time
> was huge for that example :( .
>
> debug:
> {
>
>- rawquerystring:
>"{!parent tag=top which=$pq filters=$child.fq score=max v=$cq}",
>- querystring:
>"{!parent tag=top which=$pq filters=$child.fq score=max v=$cq}",
>- parsedquery:
>"AllParentsAware(ToParentBlockJoinQuery (+(+docType:sku +store_873:1) 
> #color_refine:Blue #size_refine:L))"
>,
>- parsedquery_toString:
>"ToParentBlockJoinQuery (+(+docType:sku +store_873:1) #color_refine:Blue 
> #size_refine:L)"
>,
>- explain:
>{
>   - 1729659: "
>   2.0 = Score based on 2 child docs in range from 5103808 to 5104159, 
> best match:
> 2.0 = sum of: 2.0 = sum of:   1.0 = docType:sku
> 1.0 = store_873:1 0.0 = match on required clause, product of:
> 0.0 = # clause
> 1.0 = weight(color_refine:Blue in 4059732) 
> [DisabledStatisticsSimilarity], result of:
>   1.0 = score(freq=1.0), product of:
> 1.0 = idf(docFreq, docCount)
>   1 = docFreq, number of documents containing term
>   1 = docCount, total number of documents with field
> 1.0 = tf(freq=1.0), with freq of:
>   1.0 = freq, occurrences of term within document
> 1.0 = fieldNorm 0.0 = match on required clause, product 
> of:
> 0.0 = # clause
> 1.0 = weight(size_refine:L in 4059732) 
> [DisabledStatisticsSimilarity], result of:
>   1.0 = score(freq=1.0), product of:
> 1.0 = idf(docFreq, docCount)
>   1 = docFreq, number of documents containing term
>   1 = docCount, total number of documents with field
> 1.0 = tf(freq=1.0), with freq of:
>   1.0 = freq, occurrences of term within document
> 1.0 = fieldNorm ",
>   - 1730320: "
>   2.0 = Score based on 1 child docs in range from 5099889 to 5100070, 
> best match:
> 2.0 = sum of: 2.0 = sum of:   1.0 = docType:sku
> 1.0 = store_873:1 0.0 = match on required clause, product of:
> 0.0 = # clause
> 1.0 = weight(color_refine:Blue in 4055914) 
> [DisabledStatisticsSimilarity], result of:
>   1.0 = score(freq=1.0), product of:
> 1.0 = idf(docFreq, docCount)
>   1 = docFreq, number of documents containing term
>   1 = docCount, total number of documents with field
> 1.0 = tf(freq=1.0), with freq of:
>   1.0 = freq, occurrences of term within document
> 1.0 = fieldNorm 0.0 = match on required clause, product 
> of:
> 0.0 = # clause
> 1.0 = weight(size_refine:L in 4055914) 
> [DisabledStatisticsSimilarity], result of:
>   1.0 = score(freq=1.0), product of:
> 1.0 = idf(docFreq, docCount)
>   1 = docFreq, number of documents containing term
>   1 = docCount, total number of documents with field
> 1.0 = tf(freq=1.0), with freq of:
>   1.0 = freq, occurrences of term within document
> 1.0 = fieldNorm ",
>   - 1730721: "
>   2.0 = Score based on 4 child docs in range from 5097552 to 5097808, 
> best match:
> 2.0 = sum of: 2.0 = sum of:   1.0 = docType:sku
> 1.0 = store_873:1 0.0 = match on required clause, product of:
> 0.0 = # clause
> 1.0 = weight(color_refine:Blue in 4053487) 
> [DisabledStatisticsSimilarity], result of:
>   1.0 = score(freq=1.0), product of:
> 1.0 = idf(docFreq, docCount)
>   1 = docFreq, number of documents containing term
>   1 = docCount, total number of documents with field
> 1.0 = tf(freq=1.0), with freq of:
>   1.0 = freq, occurrences of term within document
> 1.0 = fieldNorm 0.0 = match on required clause, product 
> of:
> 0.0 = # clause
> 1.0 = weight(size_refine:L in 4053487) 
> [DisabledStatisticsSimilarity], result of:
>   1.0 = score(freq=1.0), product of:
> 1.0 = idf(docFreq, docCount)
>   1 = docFreq, number of document

Re: Solr 8.0 Json Facets are slow - need help

2020-01-22 Thread kumar gaurav
Also

It doesn't look like the box is slow, because for the following query the prepare time
is 3 ms but the facet time is 84 ms on the same box. I don't know why the prepare time
was so huge for that other example :(

debug:
{

   - rawquerystring:
   "{!parent tag=top which=$pq filters=$child.fq score=max v=$cq}",
   - querystring:
   "{!parent tag=top which=$pq filters=$child.fq score=max v=$cq}",
   - parsedquery:
   "AllParentsAware(ToParentBlockJoinQuery (+(+docType:sku
+store_873:1) #color_refine:Blue #size_refine:L))"
   ,
   - parsedquery_toString:
   "ToParentBlockJoinQuery (+(+docType:sku +store_873:1)
#color_refine:Blue #size_refine:L)"
   ,
   - explain:
   {
  - 1729659: "
  2.0 = Score based on 2 child docs in range from 5103808 to
5104159, best match:
2.0 = sum of: 2.0 = sum of:   1.0 = docType:sku
1.0 = store_873:1 0.0 = match on required clause, product of:
0.0 = # clause
1.0 = weight(color_refine:Blue in 4059732)
[DisabledStatisticsSimilarity], result of:
  1.0 = score(freq=1.0), product of:
1.0 = idf(docFreq, docCount)
  1 = docFreq, number of documents containing term
  1 = docCount, total number of documents with field
1.0 = tf(freq=1.0), with freq of:
  1.0 = freq, occurrences of term within document
1.0 = fieldNorm 0.0 = match on required clause, product of:
0.0 = # clause
1.0 = weight(size_refine:L in 4059732)
[DisabledStatisticsSimilarity], result of:
  1.0 = score(freq=1.0), product of:
1.0 = idf(docFreq, docCount)
  1 = docFreq, number of documents containing term
  1 = docCount, total number of documents with field
1.0 = tf(freq=1.0), with freq of:
  1.0 = freq, occurrences of term within document
1.0 = fieldNorm ",
  - 1730320: "
  2.0 = Score based on 1 child docs in range from 5099889 to
5100070, best match:
2.0 = sum of: 2.0 = sum of:   1.0 = docType:sku
1.0 = store_873:1 0.0 = match on required clause, product of:
0.0 = # clause
1.0 = weight(color_refine:Blue in 4055914)
[DisabledStatisticsSimilarity], result of:
  1.0 = score(freq=1.0), product of:
1.0 = idf(docFreq, docCount)
  1 = docFreq, number of documents containing term
  1 = docCount, total number of documents with field
1.0 = tf(freq=1.0), with freq of:
  1.0 = freq, occurrences of term within document
1.0 = fieldNorm 0.0 = match on required clause, product of:
0.0 = # clause
1.0 = weight(size_refine:L in 4055914)
[DisabledStatisticsSimilarity], result of:
  1.0 = score(freq=1.0), product of:
1.0 = idf(docFreq, docCount)
  1 = docFreq, number of documents containing term
  1 = docCount, total number of documents with field
1.0 = tf(freq=1.0), with freq of:
  1.0 = freq, occurrences of term within document
1.0 = fieldNorm ",
  - 1730721: "
  2.0 = Score based on 4 child docs in range from 5097552 to
5097808, best match:
2.0 = sum of: 2.0 = sum of:   1.0 = docType:sku
1.0 = store_873:1 0.0 = match on required clause, product of:
0.0 = # clause
1.0 = weight(color_refine:Blue in 4053487)
[DisabledStatisticsSimilarity], result of:
  1.0 = score(freq=1.0), product of:
1.0 = idf(docFreq, docCount)
  1 = docFreq, number of documents containing term
  1 = docCount, total number of documents with field
1.0 = tf(freq=1.0), with freq of:
  1.0 = freq, occurrences of term within document
1.0 = fieldNorm 0.0 = match on required clause, product of:
0.0 = # clause
1.0 = weight(size_refine:L in 4053487)
[DisabledStatisticsSimilarity], result of:
  1.0 = score(freq=1.0), product of:
1.0 = idf(docFreq, docCount)
  1 = docFreq, number of documents containing term
  1 = docCount, total number of documents with field
1.0 = tf(freq=1.0), with freq of:
  1.0 = freq, occurrences of term within document
1.0 = fieldNorm ",
  - 1759239: "
  2.0 = Score based on 1 child docs in range from 5061166 to
5061231, best match:
2.0 = sum of: 2.0 = sum of:   1.0 = docType:sku
1.0 = store_873:1 0.0 = match on required clause, product of:
0.0 = # clause
1.0 = weight(color_refine:Blue in 4017096)
[DisabledStatisticsSimilarity], result of:
  1.0 = score(freq=1.0), product of:
1.0 = 

Re: Solr 8.0 Json Facets are slow - need help

2020-01-22 Thread kumar gaurav
Lots of thanks Mikhail.

Also, can you please answer: should I use docValues="true" for the _root_
field to improve this json.facet performance?

On Wed, Jan 22, 2020 at 11:42 PM Mikhail Khludnev  wrote:

> Initial request refers unknown (to me) query parser  {!simpleFilter, I
> can't comment on it.
> Parsing queries took in millis: - time: 261, usually prepare for query
> takes a moment. I suspect the box is really slow per se or encounter heavy
> load.
> And then facets took about 6 times more  - facet_module: {   - time: 1122,
> that a reasonable ratio.
> I also notice limit: -1 that's really expensive usually. If tweaking can't
> help, only profiling might give a clue.
> Note: in 8.5 there will be uniqueBlockQuery() operation, which is expected
> to be faster than uniqueBlock()
>
> On Wed, Jan 22, 2020 at 5:36 PM kumar gaurav  wrote:
>
> > HI Mikhail
> >
> > Here is full debug log . Please have a look .
> >
> > debug:
> > {
> >
> >- rawquerystring:
> >"{!parent tag=top which=$pq filters=$child.fq score=max v=$cq}",
> >- querystring:
> >"{!parent tag=top which=$pq filters=$child.fq score=max v=$cq}",
> >- parsedquery:
> >"AllParentsAware(ToParentBlockJoinQuery (+(+docType:sku
> > +(store_873:1)^0.0) #(filter(color_refine:Black)
> > filter(color_refine:Blue"
> >,
> >- parsedquery_toString:
> >"ToParentBlockJoinQuery (+(+docType:sku +(store_873:1)^0.0)
> > #(filter(color_refine:Black) filter(color_refine:Blue)))"
> >,
> >- explain:
> >{
> >   - 5172: "
> >   1.0 = Score based on 240 child docs in range from 2572484 to
> > 2573162, best match:
> > 1.0 = sum of: 1.0 = sum of:   1.0 = docType:sku
> > 0.0 = ConstantScore(store_873:1)^0.0
> >   0.0 = match on required clause, product of:   0.0 = #
> clause
> > 0.0 = sum of: 0.0 =
> > ConstantScore(BitSetDocTopFilter)^0.0 "
> >   ,
> >   - 5178: "
> >   1.0 = Score based on 304 child docs in range from 2571860 to
> > 2572404, best match:
> > 1.0 = sum of: 1.0 = sum of:   1.0 = docType:sku
> > 0.0 = ConstantScore(store_873:1)^0.0
> >   0.0 = match on required clause, product of:   0.0 = #
> clause
> > 0.0 = sum of: 0.0 =
> > ConstantScore(BitSetDocTopFilter)^0.0 "
> >   ,
> >   - 9301: "
> >   1.0 = Score based on 93 child docs in range from 710150 to
> > 710796, best match:
> > 1.0 = sum of: 1.0 = sum of:   1.0 = docType:sku
> > 0.0 = ConstantScore(store_873:1)^0.0
> >   0.0 = match on required clause, product of:   0.0 = #
> clause
> > 0.0 = sum of: 0.0 =
> > ConstantScore(BitSetDocTopFilter)^0.0 "
> >   ,
> >   - 118561: "
> >   1.0 = Score based on 177 child docs in range from 5728215 to
> > 5728505, best match:
> > 1.0 = sum of: 1.0 = sum of:   1.0 = docType:sku
> > 0.0 = ConstantScore(store_873:1)^0.0
> >   0.0 = match on required clause, product of:   0.0 = #
> clause
> > 0.0 = sum of: 0.0 =
> > ConstantScore(BitSetDocTopFilter)^0.0 "
> >   ,
> >   - 266659: "
> >   1.0 = Score based on 89 child docs in range from 5368923 to
> > 5369396, best match:
> > 1.0 = sum of: 1.0 = sum of:   1.0 = docType:sku
> > 0.0 = ConstantScore(store_873:1)^0.0
> >   0.0 = match on required clause, product of:   0.0 = #
> clause
> > 0.0 = sum of: 0.0 =
> > ConstantScore(BitSetDocTopFilter)^0.0 "
> >   ,
> >   - 323407: "
> >   1.0 = Score based on 321 child docs in range from 4807493 to
> > 4808441, best match:
> > 1.0 = sum of: 1.0 = sum of:   1.0 = docType:sku
> > 0.0 = ConstantScore(store_873:1)^0.0
> >   0.0 = match on required clause, product of:   0.0 = #
> clause
> > 0.0 = sum of: 0.0 =
> > ConstantScore(BitSetDocTopFilter)^0.0 "
> >   ,
> >   - 381312: "
> >   1.0 = Score based on 232 child docs in range from 2660717 to
> > 2661101, best match:
> > 1.0 = sum of: 1.0 = sum of:   1.0 = docType:sku
> > 0.0 = ConstantScore(store_873:1)^0.0
> >   0.0 = match on required clause, product of:   0.0 = 

Re: Solr 8.0 Json Facets are slow - need help

2020-01-22 Thread Mikhail Khludnev
The initial request refers to a query parser unknown to me ({!simpleFilter), so I
can't comment on it.
Query parsing took 261 ms (prepare: - time: 261); usually prepare takes only a
moment, so I suspect the box is really slow per se or under heavy load.
The facets then took about 6 times more (facet_module: - time: 1122), which is a
reasonable ratio.
I also notice limit: -1, which is usually really expensive. If tweaking can't
help, only profiling might give a clue.
Note: 8.5 will add a uniqueBlockQuery() operation, which is expected
to be faster than uniqueBlock()
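
As a sketch of that tweak only (the 100 below is an arbitrary placeholder, not a
recommendation), here is the color_refine facet from the original request with a
bounded limit instead of limit:-1:

  json.facet={color_refine:{
      domain:{
        filter:["{!filters param=$child.fq excludeTags=rcolor_refine v=$sq}",
                "{!child of=$pq filters=$fq}docType:(product)"]
      },
      type:terms,
      field:color_refine,
      limit:100,
      facet:{productsCount:"uniqueBlock(_root_)"}}}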

On Wed, Jan 22, 2020 at 5:36 PM kumar gaurav  wrote:

> HI Mikhail
>
> Here is full debug log . Please have a look .
>
> debug:
> {
>
>- rawquerystring:
>"{!parent tag=top which=$pq filters=$child.fq score=max v=$cq}",
>- querystring:
>"{!parent tag=top which=$pq filters=$child.fq score=max v=$cq}",
>- parsedquery:
>"AllParentsAware(ToParentBlockJoinQuery (+(+docType:sku
> +(store_873:1)^0.0) #(filter(color_refine:Black)
> filter(color_refine:Blue"
>,
>- parsedquery_toString:
>"ToParentBlockJoinQuery (+(+docType:sku +(store_873:1)^0.0)
> #(filter(color_refine:Black) filter(color_refine:Blue)))"
>,
>- explain:
>{
>   - 5172: "
>   1.0 = Score based on 240 child docs in range from 2572484 to
> 2573162, best match:
> 1.0 = sum of: 1.0 = sum of:   1.0 = docType:sku
> 0.0 = ConstantScore(store_873:1)^0.0
>   0.0 = match on required clause, product of:   0.0 = # clause
> 0.0 = sum of: 0.0 =
> ConstantScore(BitSetDocTopFilter)^0.0 "
>   ,
>   - 5178: "
>   1.0 = Score based on 304 child docs in range from 2571860 to
> 2572404, best match:
> 1.0 = sum of: 1.0 = sum of:   1.0 = docType:sku
> 0.0 = ConstantScore(store_873:1)^0.0
>   0.0 = match on required clause, product of:   0.0 = # clause
> 0.0 = sum of: 0.0 =
> ConstantScore(BitSetDocTopFilter)^0.0 "
>   ,
>   - 9301: "
>   1.0 = Score based on 93 child docs in range from 710150 to
> 710796, best match:
> 1.0 = sum of: 1.0 = sum of:   1.0 = docType:sku
> 0.0 = ConstantScore(store_873:1)^0.0
>   0.0 = match on required clause, product of:   0.0 = # clause
> 0.0 = sum of: 0.0 =
> ConstantScore(BitSetDocTopFilter)^0.0 "
>   ,
>   - 118561: "
>   1.0 = Score based on 177 child docs in range from 5728215 to
> 5728505, best match:
> 1.0 = sum of: 1.0 = sum of:   1.0 = docType:sku
> 0.0 = ConstantScore(store_873:1)^0.0
>   0.0 = match on required clause, product of:   0.0 = # clause
> 0.0 = sum of: 0.0 =
> ConstantScore(BitSetDocTopFilter)^0.0 "
>   ,
>   - 266659: "
>   1.0 = Score based on 89 child docs in range from 5368923 to
> 5369396, best match:
> 1.0 = sum of: 1.0 = sum of:   1.0 = docType:sku
> 0.0 = ConstantScore(store_873:1)^0.0
>   0.0 = match on required clause, product of:   0.0 = # clause
> 0.0 = sum of: 0.0 =
> ConstantScore(BitSetDocTopFilter)^0.0 "
>   ,
>   - 323407: "
>   1.0 = Score based on 321 child docs in range from 4807493 to
> 4808441, best match:
> 1.0 = sum of: 1.0 = sum of:   1.0 = docType:sku
> 0.0 = ConstantScore(store_873:1)^0.0
>   0.0 = match on required clause, product of:   0.0 = # clause
> 0.0 = sum of: 0.0 =
> ConstantScore(BitSetDocTopFilter)^0.0 "
>   ,
>   - 381312: "
>   1.0 = Score based on 232 child docs in range from 2660717 to
> 2661101, best match:
> 1.0 = sum of: 1.0 = sum of:   1.0 = docType:sku
> 0.0 = ConstantScore(store_873:1)^0.0
>   0.0 = match on required clause, product of:   0.0 = # clause
> 0.0 = sum of: 0.0 =
> ConstantScore(BitSetDocTopFilter)^0.0 "
>   ,
>   - 851246: "
>   1.0 = Score based on 61 child docs in range from 730259 to
> 730562, best match:
> 1.0 = sum of: 1.0 = sum of:   1.0 = docType:sku
> 0.0 = ConstantScore(store_873:1)^0.0
>   0.0 = match on required clause, product of:   0.0 = # clause
> 0.0 = sum of: 0.0 =
> ConstantScore(BitSetDocTopFilter)^0.0 "
>   ,
>   - 1564330: "
>   1.0 = Score based on 12 child docs in range from 6831792 to
> 6832154, best match:
> 1.0 = sum of: 1.0 =

Re: Solr 8.0 Json Facets are slow - need help

2020-01-22 Thread kumar gaurav
- excludeTags: "rassortment,top,top2,top3,top4,",
   - filter:
   [
  -
  "{!filters param=$child.fq
excludeTags=rsizeRange_refine v=$sq}"
  ,
  -
  "{!child of=$pq filters=$fq}docType:(product collection)",
  ],
   },
- type: "terms",
- field: "sizeRange_refine",
- limit: -1,
- facet:
{
   - productsCount: "uniqueBlock(_root_)"
   },
},
 - material_refine:
 {
- domain:
{
   - excludeTags: "rassortment,top,top2,top3,top4,",
   - filter:
   [
  -
  "{!filters param=$child.fq
excludeTags=rmaterial_refine v=$sq}"
  ,
  -
  "{!child of=$pq filters=$fq}docType:(product collection)",
  ],
   },
- type: "terms",
- field: "material_refine",
- limit: -1,
- facet:
{
   - productsCount: "uniqueBlock(_root_)"
   },
},
 - ageAppropriate_refine:
 {
- domain:
{
   - excludeTags: "rassortment,top,top2,top3,top4,",
   - filter:
   [
  -
  "{!filters param=$child.fq
excludeTags=rageAppropriate_refine v=$sq}"
  ,
  -
  "{!child of=$pq filters=$fq}docType:(product collection)",
  ],
   },
- type: "terms",
- field: "ageAppropriate_refine",
- limit: -1,
- facet:
{
   - productsCount: "uniqueBlock(_root_)"
   },
},
 - price_refine:
 {
- domain:
{
   - excludeTags: "rassortment,top,top2,top3,top4,",
   - filter:
   [
  -
  "{!filters param=$child.fq excludeTags=rprice_refine v=$sq}"
  ,
  -
  "{!child of=$pq filters=$fq}docType:(product collection)",
  ],
   },
- type: "terms",
- field: "price_refine",
- limit: -1,
- facet:
{
   - productsCount: "uniqueBlock(_root_)"
   },
},
 - size_refine:
 {
- domain:
{
   - excludeTags: "rassortment,top,top2,top3,top4,",
   - filter:
   [
  -
  "{!filters param=$child.fq excludeTags=rsize_refine v=$sq}"
  ,
  -
  "{!child of=$pq filters=$fq}docType:(product collection)",
  ],
   },
- type: "terms",
- field: "size_refine",
- limit: -1,
- facet:
{
   - productsCount: "uniqueBlock(_root_)"
   },
},
 - inStoreOnline_refine:
 {
- domain:
{
   - excludeTags: "rassortment,top,top2,top3,top4,",
   - filter:
   [
  -
  "{!filters param=$child.fq
excludeTags=rinStoreOnline_refine v=$sq}"
  ,
  -
  "{!child of=$pq filters=$fq}docType:(product collection)",
  ],
   },
- type: "terms",
- field: "inStoreOnline_refine",
- limit: -1,
- facet:
{
   - productsCount: "uniqueBlock(_root_)"
   },
},
 }
  },
   - QParser: "BlockJoinParentQParser",
   - filter_queries:
   [
  - "{!tag=top2}(*:* -pvgc:true)",
  - "{!tag=top3}{!query v=$eligibleCollections}",
  - "{!tag=top3}{!query v=$eligibleCollections}",
  ],
   - parsed_filter_queries:
   [
  - "MatchAllDocsQuery(*:*) -pvgc:true",
  - "docType:product (+docType:collection +(eligibleToShow:[1 TO 1]))",
  - "docType:product (+docType:collection +(eligibleToShow:[1 TO 1]))",
  ],
   - timing:
   {
  - time: 1667,
  - prepare:
  {
 - time: 261,
 - query:
 {
    - time: 261
},
 - facet:
 {
- time: 0
},

Re: Solr 8.0 Json Facets are slow - need help

2020-01-22 Thread Mikhail Khludnev
The screenshot didn't come through the list. That excerpt doesn't have any
informative numbers.

On Tue, Jan 21, 2020 at 5:18 PM kumar gaurav  wrote:

> Hi Mikhail
>
> Thanks for your reply . Please help me in this .
>
> Followings are the screenshot:-
>
> [image: image.png]
>
>
> [image: image.png]
>
>
> json facet debug Output:-
>
> json:
> {
>
>- facet:
>{
>   - color_refine:
>   {
>  - domain:
>  {
> - excludeTags: "rassortment,top,top2,top3,top4,",
> - filter:
> [
>-
>"{!filters param=$child.fq excludeTags=rcolor_refine v=$sq}"
>,
>- "{!child of=$pq filters=$fq}docType:(product collection)"
>,
>],
> },
>  - type: "terms",
>  - field: "color_refine",
>  - limit: -1,
>  - facet:
>  {
> - productsCount: "uniqueBlock(_root_)"
> },
>  },
>   - size_refine:
>   {
>  - domain:
>  {
> - excludeTags: "rassortment,top,top2,top3,top4,",
> - filter:
> [
>-
>"{!filters param=$child.fq excludeTags=rsize_refine v=$sq}"
>,
>- "{!child of=$pq filters=$fq}docType:(product collection)"
>,
>],
> },
>  - type: "terms",
>  - field: "size_refine",
>  - limit: -1,
>  - facet:
>  {
> - productsCount: "uniqueBlock(_root_)"
> },
>  },
>   }
>
> }
>
>
>
> regards
> Kumar Gaurav
>
>
> On Tue, Jan 21, 2020 at 5:25 PM Mikhail Khludnev  wrote:
>
>> Hi.
>> Can you share debugQuery=true output?
>>
>> On Tue, Jan 21, 2020 at 1:37 PM kumar gaurav  wrote:
>>
>> > HI
>> >
>> > i have a parent child query in which i have used json facet for child
>> > faceting like following.
>> >
>> > qt=/dismax
>> > matchAllQueryRef1=+(+({!query v=$cq}))
>> > sq=+{!lucene v=$matchAllQueryRef1}
>> > q={!parent tag=top which=$pq filters=$child.fq score=max v=$cq}
>> > child.fq={!tag=rcolor_refine}filter({!term f=color_refine
>> > v=$qcolor_refine1}) filter({!term f=color_refine v=$qcolor_refine2})
>> > qcolor_refine1=Blue
>> > qcolor_refine2=Other clrs
>> > cq=+{!simpleFilter v=docType:sku}
>> > pq=docType:(product)
>> > facet=true
>> > facet.mincount=1
>> > facet.limit=-1
>> > facet.missing=false
>> > json.facet= {color_refine:{
>> > domain:{
>> > filter:["{!filters param=$child.fq excludeTags=rcolor_refine
>> > v=$sq}","{!child of=$pq filters=$fq}docType:(product)"]
>> >},
>> > type:terms,
>> > field:color_refine,
>> > limit:-1,
>> > facet:{productsCount:"uniqueBlock(_root_)"}}}
>> >
>> > schema :-
>> > > > multiValued="true" docValues="true"/>
>> >
>> > i have observed that json facets are slow . It is taking much time than
>> > expected .
>> > Can anyone please check this query specially child.fq and json.facet
>> part .
>> >
>> > Please help me in this .
>> >
>> > Thanks & regards
>> > Kumar Gaurav
>> >
>>
>>
>> --
>> Sincerely yours
>> Mikhail Khludnev
>>
>

-- 
Sincerely yours
Mikhail Khludnev


Re: Solr 8.0 Json Facets are slow - need help

2020-01-21 Thread kumar gaurav
HI Mikhail

Can you please help ?

On Tue, Jan 21, 2020 at 7:48 PM kumar gaurav  wrote:

> Hi Mikhail
>
> Thanks for your reply . Please help me in this .
>
> Followings are the screenshot:-
>
> [image: image.png]
>
>
> [image: image.png]
>
>
> json facet debug Output:-
>
> json:
> {
>
>- facet:
>{
>   - color_refine:
>   {
>  - domain:
>  {
> - excludeTags: "rassortment,top,top2,top3,top4,",
> - filter:
> [
>-
>"{!filters param=$child.fq excludeTags=rcolor_refine v=$sq}"
>,
>- "{!child of=$pq filters=$fq}docType:(product collection)"
>,
>],
> },
>  - type: "terms",
>  - field: "color_refine",
>  - limit: -1,
>  - facet:
>  {
> - productsCount: "uniqueBlock(_root_)"
> },
>  },
>   - size_refine:
>   {
>  - domain:
>  {
> - excludeTags: "rassortment,top,top2,top3,top4,",
> - filter:
> [
>-
>"{!filters param=$child.fq excludeTags=rsize_refine v=$sq}"
>,
>- "{!child of=$pq filters=$fq}docType:(product collection)"
>,
>],
> },
>  - type: "terms",
>  - field: "size_refine",
>  - limit: -1,
>  - facet:
>  {
> - productsCount: "uniqueBlock(_root_)"
> },
>  },
>   }
>
> }
>
>
>
> regards
> Kumar Gaurav
>
>
> On Tue, Jan 21, 2020 at 5:25 PM Mikhail Khludnev  wrote:
>
>> Hi.
>> Can you share debugQuery=true output?
>>
>> On Tue, Jan 21, 2020 at 1:37 PM kumar gaurav  wrote:
>>
>> > HI
>> >
>> > i have a parent child query in which i have used json facet for child
>> > faceting like following.
>> >
>> > qt=/dismax
>> > matchAllQueryRef1=+(+({!query v=$cq}))
>> > sq=+{!lucene v=$matchAllQueryRef1}
>> > q={!parent tag=top which=$pq filters=$child.fq score=max v=$cq}
>> > child.fq={!tag=rcolor_refine}filter({!term f=color_refine
>> > v=$qcolor_refine1}) filter({!term f=color_refine v=$qcolor_refine2})
>> > qcolor_refine1=Blue
>> > qcolor_refine2=Other clrs
>> > cq=+{!simpleFilter v=docType:sku}
>> > pq=docType:(product)
>> > facet=true
>> > facet.mincount=1
>> > facet.limit=-1
>> > facet.missing=false
>> > json.facet= {color_refine:{
>> > domain:{
>> > filter:["{!filters param=$child.fq excludeTags=rcolor_refine
>> > v=$sq}","{!child of=$pq filters=$fq}docType:(product)"]
>> >},
>> > type:terms,
>> > field:color_refine,
>> > limit:-1,
>> > facet:{productsCount:"uniqueBlock(_root_)"}}}
>> >
>> > schema :-
>> > > > multiValued="true" docValues="true"/>
>> >
>> > i have observed that json facets are slow . It is taking much time than
>> > expected .
>> > Can anyone please check this query specially child.fq and json.facet
>> part .
>> >
>> > Please help me in this .
>> >
>> > Thanks & regards
>> > Kumar Gaurav
>> >
>>
>>
>> --
>> Sincerely yours
>> Mikhail Khludnev
>>
>


Re: Need help in configuring Spell check in Apache Solr 8.4

2020-01-21 Thread kumar gaurav
Can you share spellcheck component and handler which you have used ?

On Mon, Jan 20, 2020 at 3:35 PM seeteshh  wrote:

> Hello all,
>
> I am not able to check and test the spell check feature in Apache solr 8.4
>
> Tried multiple examples including
>
>
> https://examples.javacodegeeks.com/enterprise-java/apache-solr/solr-spellcheck-example/
>
> However I am not getting any results
>
> Regards,
>
> Seetesh Hindlekar
>
>
>
> --
> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


Re: Solr 8.0 Json Facets are slow - need help

2020-01-21 Thread kumar gaurav
Hi Mikhail

Thanks for your reply . Please help me in this .

Followings are the screenshot:-

[image: image.png]


[image: image.png]


json facet debug Output:-

json:
{

   - facet:
   {
  - color_refine:
  {
 - domain:
 {
- excludeTags: "rassortment,top,top2,top3,top4,",
- filter:
[
   -
   "{!filters param=$child.fq excludeTags=rcolor_refine v=$sq}",
   - "{!child of=$pq filters=$fq}docType:(product collection)",
   ],
},
 - type: "terms",
 - field: "color_refine",
 - limit: -1,
 - facet:
 {
- productsCount: "uniqueBlock(_root_)"
},
 },
  - size_refine:
  {
 - domain:
 {
- excludeTags: "rassortment,top,top2,top3,top4,",
- filter:
[
   - "{!filters param=$child.fq excludeTags=rsize_refine v=$sq}"
   ,
   - "{!child of=$pq filters=$fq}docType:(product collection)",
   ],
},
 - type: "terms",
 - field: "size_refine",
 - limit: -1,
 - facet:
 {
- productsCount: "uniqueBlock(_root_)"
},
 },
  }

}



regards
Kumar Gaurav


On Tue, Jan 21, 2020 at 5:25 PM Mikhail Khludnev  wrote:

> Hi.
> Can you share debugQuery=true output?
>
> On Tue, Jan 21, 2020 at 1:37 PM kumar gaurav  wrote:
>
> > HI
> >
> > i have a parent child query in which i have used json facet for child
> > faceting like following.
> >
> > qt=/dismax
> > matchAllQueryRef1=+(+({!query v=$cq}))
> > sq=+{!lucene v=$matchAllQueryRef1}
> > q={!parent tag=top which=$pq filters=$child.fq score=max v=$cq}
> > child.fq={!tag=rcolor_refine}filter({!term f=color_refine
> > v=$qcolor_refine1}) filter({!term f=color_refine v=$qcolor_refine2})
> > qcolor_refine1=Blue
> > qcolor_refine2=Other clrs
> > cq=+{!simpleFilter v=docType:sku}
> > pq=docType:(product)
> > facet=true
> > facet.mincount=1
> > facet.limit=-1
> > facet.missing=false
> > json.facet= {color_refine:{
> > domain:{
> > filter:["{!filters param=$child.fq excludeTags=rcolor_refine
> > v=$sq}","{!child of=$pq filters=$fq}docType:(product)"]
> >},
> > type:terms,
> > field:color_refine,
> > limit:-1,
> > facet:{productsCount:"uniqueBlock(_root_)"}}}
> >
> > schema :-
> >  > multiValued="true" docValues="true"/>
> >
> > i have observed that json facets are slow . It is taking much time than
> > expected .
> > Can anyone please check this query specially child.fq and json.facet
> part .
> >
> > Please help me in this .
> >
> > Thanks & regards
> > Kumar Gaurav
> >
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>


Re: Solr 8.0 Json Facets are slow - need help

2020-01-21 Thread Mikhail Khludnev
Hi.
Can you share debugQuery=true output?

On Tue, Jan 21, 2020 at 1:37 PM kumar gaurav  wrote:

> HI
>
> i have a parent child query in which i have used json facet for child
> faceting like following.
>
> qt=/dismax
> matchAllQueryRef1=+(+({!query v=$cq}))
> sq=+{!lucene v=$matchAllQueryRef1}
> q={!parent tag=top which=$pq filters=$child.fq score=max v=$cq}
> child.fq={!tag=rcolor_refine}filter({!term f=color_refine
> v=$qcolor_refine1}) filter({!term f=color_refine v=$qcolor_refine2})
> qcolor_refine1=Blue
> qcolor_refine2=Other clrs
> cq=+{!simpleFilter v=docType:sku}
> pq=docType:(product)
> facet=true
> facet.mincount=1
> facet.limit=-1
> facet.missing=false
> json.facet= {color_refine:{
> domain:{
> filter:["{!filters param=$child.fq excludeTags=rcolor_refine
> v=$sq}","{!child of=$pq filters=$fq}docType:(product)"]
>},
> type:terms,
> field:color_refine,
> limit:-1,
> facet:{productsCount:"uniqueBlock(_root_)"}}}
>
> schema :-
>  multiValued="true" docValues="true"/>
>
> i have observed that json facets are slow . It is taking much time than
> expected .
> Can anyone please check this query specially child.fq and json.facet part .
>
> Please help me in this .
>
> Thanks & regards
> Kumar Gaurav
>


-- 
Sincerely yours
Mikhail Khludnev


Solr 8.0 Json Facets are slow - need help

2020-01-21 Thread kumar gaurav
HI

I have a parent-child query in which I have used a JSON facet for child
faceting, as follows.

qt=/dismax
matchAllQueryRef1=+(+({!query v=$cq}))
sq=+{!lucene v=$matchAllQueryRef1}
q={!parent tag=top which=$pq filters=$child.fq score=max v=$cq}
child.fq={!tag=rcolor_refine}filter({!term f=color_refine
v=$qcolor_refine1}) filter({!term f=color_refine v=$qcolor_refine2})
qcolor_refine1=Blue
qcolor_refine2=Other clrs
cq=+{!simpleFilter v=docType:sku}
pq=docType:(product)
facet=true
facet.mincount=1
facet.limit=-1
facet.missing=false
json.facet= {color_refine:{
domain:{
filter:["{!filters param=$child.fq excludeTags=rcolor_refine
v=$sq}","{!child of=$pq filters=$fq}docType:(product)"]
   },
type:terms,
field:color_refine,
limit:-1,
facet:{productsCount:"uniqueBlock(_root_)"}}}

schema :-


I have observed that the JSON facets are slow; they are taking much more time than
expected.
Can anyone please check this query, especially the child.fq and json.facet parts?

Please help me in this .

Thanks & regards
Kumar Gaurav


Need help in configuring Spell check in Apache Solr 8.4

2020-01-20 Thread seeteshh
Hello all,

I am not able to check and test the spell check feature in Apache solr 8.4

Tried multiple examples including

https://examples.javacodegeeks.com/enterprise-java/apache-solr/solr-spellcheck-example/

However I am not getting any results 

Regards,

Seetesh Hindlekar



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Help for importing large data (approx. 8GB) from old solr version to new solr version

2020-01-09 Thread Paras Lehana
Hi Ken,

I also recommend at least reading if not following "Taking Solr to
Production":
https://lucene.apache.org/solr/guide/8_4/taking-solr-to-production.html.

Following this cleared my doubts regarding upgrading and core referencing,
and it made the upgrade very easy and fast.

While starting Solr, you can also define the Solr home (where your older core
lives) by using the -s option.



On Wed, 25 Dec 2019 at 21:44, David Hastings  wrote:

> Exactly. Although I’m a bit curious why your going a .1 version up, I
> always wait until an x2, so I won’t be upgrading until 9.3
>
> > On Dec 25, 2019, at 9:45 AM, Erick Erickson 
> wrote:
> >
> > Should work. At any rate, just try it. Since all you’re doing is
> copying data, even if the new installation doesn’t work you still have the
> original.
> >
> >> On Dec 25, 2019, at 1:35 AM, Ken Walker  wrote:
> >>
> >> Hello Erick,
> >>
> >> Thanks for your reply!
> >>
> >> You mean that, we should follow below steps right?
> >> Here is the data directory path :
> >> solr/solr-8.2.0/server/solr/product/item_core/data
> >>
> >> STEPS :-
> >> 1. Stop old solr-8.2.0 server
> >> 2. Copy data directory (from old solr version to new solr version)
> >> copy solr/solr-8.2.1/server/solr/product/item_core/data to
> >> solr/solr-8.3.1/server/solr/product/item_core/data
> >> 3. Start new solr version solr-8.3.1
> >>
> >> Is it correct way to copy just index only from old to new solr version?
> >> Is it lose any data or anything break in new solr version ?
> >>
> >> Thanks in advance!
> >> -Ken
> >
>


-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, Auto-Suggest,
IndiaMART Intermesh Ltd.

8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
Noida, UP, IN - 201303

Mob.: +91-9560911996
Work: 01203916600 | Extn:  *8173*



Re: Help for importing large data (approx. 8GB) from old solr version to new solr version

2019-12-25 Thread David Hastings
Exactly. Although I’m a bit curious why you’re going up by a .1 version; I always
wait until an x.2, so I won’t be upgrading until 9.3.

> On Dec 25, 2019, at 9:45 AM, Erick Erickson  wrote:
> 
> Should work. At any rate, just try it. Since all you’re doing is copying 
> data, even if the new installation doesn’t work you still have the original.
> 
>> On Dec 25, 2019, at 1:35 AM, Ken Walker  wrote:
>> 
>> Hello Erick,
>> 
>> Thanks for your reply!
>> 
>> You mean that, we should follow below steps right?
>> Here is the data directory path :
>> solr/solr-8.2.0/server/solr/product/item_core/data
>> 
>> STEPS :-
>> 1. Stop old solr-8.2.0 server
>> 2. Copy data directory (from old solr version to new solr version)
>> copy solr/solr-8.2.1/server/solr/product/item_core/data to
>> solr/solr-8.3.1/server/solr/product/item_core/data
>> 3. Start new solr version solr-8.3.1
>> 
>> Is it correct way to copy just index only from old to new solr version?
>> Is it lose any data or anything break in new solr version ?
>> 
>> Thanks in advance!
>> -Ken
> 


Re: Help for importing large data (approx. 8GB) from old solr version to new solr version

2019-12-25 Thread Erick Erickson
Should work. At any rate, just try it. Since all you’re doing is copying data, 
even if the new installation doesn’t work you still have the original.

> On Dec 25, 2019, at 1:35 AM, Ken Walker  wrote:
> 
> Hello Erick,
> 
> Thanks for your reply!
> 
> You mean that, we should follow below steps right?
> Here is the data directory path :
> solr/solr-8.2.0/server/solr/product/item_core/data
> 
> STEPS :-
> 1. Stop old solr-8.2.0 server
> 2. Copy data directory (from old solr version to new solr version)
> copy solr/solr-8.2.1/server/solr/product/item_core/data to
> solr/solr-8.3.1/server/solr/product/item_core/data
> 3. Start new solr version solr-8.3.1
> 
> Is it correct way to copy just index only from old to new solr version?
> Is it lose any data or anything break in new solr version ?
> 
> Thanks in advance!
> -Ken



Re: Help for importing large data (approx. 8GB) from old solr version to new solr version

2019-12-24 Thread Ken Walker
Hello Erick,

Thanks for your reply!

You mean that we should follow the steps below, right?
Here is the data directory path :
solr/solr-8.2.0/server/solr/product/item_core/data

STEPS :-
1. Stop old solr-8.2.0 server
2. Copy data directory (from old solr version to new solr version)
copy solr/solr-8.2.1/server/solr/product/item_core/data to
solr/solr-8.3.1/server/solr/product/item_core/data
3. Start new solr version solr-8.3.1
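
A shell sketch of step 2, assuming both instances are stopped, the source is the 8.2.0
install, and the item_core core (core.properties and conf/) already exists under 8.3.1:

  # run with both Solr servers stopped
  cp -r solr/solr-8.2.0/server/solr/product/item_core/data \
        solr/solr-8.3.1/server/solr/product/item_core/data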

Is this the correct way to copy just the index from the old to the new Solr version?
Will it lose any data or break anything in the new Solr version?

Thanks in advance!
-Ken


Re: Help for importing large data (approx. 8GB) from old solr version to new solr version

2019-12-24 Thread Erick Erickson
Here’s the very simplest way:
1> shut down your 8.2 Solr instance
2> install your 8.3.1 instance on the same machine
3> when you start your 8.3.1 instance, specify the environment variable 
SOLR_HOME to point to the same one you used in 8.2

If you don’t know what SOLR_HOME used to point to, bring up your 8.2 instance 
first and look at the admin UI, your environment variables will point there.

NOTE: If you do it this way, you may _NOT_ have both 8.2 and 8.3.1 running at the
same time.
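
A sketch of that start command on Linux, assuming the old Solr home is /var/solr/data
(the two forms below are equivalent; -s overrides SOLR_HOME):

  SOLR_HOME=/var/solr/data solr-8.3.1/bin/solr start
  # or
  solr-8.3.1/bin/solr start -s /var/solr/data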

Best,
Erick

> On Dec 24, 2019, at 5:54 AM, Ken Walker  wrote:
> 
> Hello Jörn,
> 
> Thanks for your reply!
> 
> As per Shawn "Why not just copy the index and use it directly rather
> than importing it?  Solr 8.x can directly use indexes built by
> versions back to 7.0.0." in previous mail comment.
> 
> Is it possible and how we can do that ?
> 
> Thanks in advance
> - Ken
> 
> On Tue, Dec 24, 2019 at 3:26 PM Jörn Franke  wrote:
>> 
>> It seems that you got this handed over with little documentation. You have 
>> to explore what the import handler does. This is a custom configuration that 
>> you need to check how it works.
>> 
>> Then as already said. You can simply install another version of Solr if you 
>> are within a Solr major version 8.x in Linux is simply a symbolic link 
>> pointing from one Solr version to the other. In this way you can easily 
>> switch back as well.
>> 
>> Finally, check your memory consumption. Normally heap is significant smaller 
>> then the total available memory as the non-heap memory is used by Solr for 
>> caching.
>> 
>> If you have 8g mb of heap I would expect that the total amount of memory 
>> available is more than 32 gb.
>> As always it depends, but maybe you can give more details on no of cores, 
>> heap memory, total memory and if other processes than Solr run on the 
>> machine.
>> 
>>> Am 24.12.2019 um 05:59 schrieb Ken Walker :
>>> 
>>> Hello,
>>> 
>>> We are using solr version 8.2.0 in our production server.
>>> 
>>> We are upgrading solr version from solr 8.2.0 version to solr 8.3.1
>>> version but we have faced out of memory error while importing data and
>>> then we have extended memory in our server and then again start
>>> importing process but it has work too slowy for 8GB data ( it has
>>> taken more than 2 days for importing data from solr 8.2.0 version to
>>> solr 8.3.1 version).
>>> 
>>> Could you please help me how we can do it fast for importing 8GB data
>>> from old solr version to new solr version?
>>> 
>>> We are using below command for importing data from one solr version to
>>> another solr version
>>> $ curl 
>>> 'http://IP-ADDRESS:8983/solr/items/dataimport?command=full-import=true=false=json=true=false=false=false'
>>> 
>>> Thanks in advance!
>>> - Ken



Re: Help for importing large data (approx. 8GB) from old solr version to new solr version

2019-12-24 Thread Ken Walker
Hello Jörn,

Thanks for your reply!

As per Shawn "Why not just copy the index and use it directly rather
than importing it?  Solr 8.x can directly use indexes built by
versions back to 7.0.0." in previous mail comment.

Is it possible and how we can do that ?

Thanks in advance
- Ken

On Tue, Dec 24, 2019 at 3:26 PM Jörn Franke  wrote:
>
> It seems that you got this handed over with little documentation. You have to 
> explore what the import handler does. This is a custom configuration that you 
> need to check how it works.
>
> Then as already said. You can simply install another version of Solr if you 
> are within a Solr major version 8.x in Linux is simply a symbolic link 
> pointing from one Solr version to the other. In this way you can easily 
> switch back as well.
>
> Finally, check your memory consumption. Normally heap is significant smaller 
> then the total available memory as the non-heap memory is used by Solr for 
> caching.
>
> If you have 8g mb of heap I would expect that the total amount of memory 
> available is more than 32 gb.
> As always it depends, but maybe you can give more details on no of cores, 
> heap memory, total memory and if other processes than Solr run on the machine.
>
> > Am 24.12.2019 um 05:59 schrieb Ken Walker :
> >
> > Hello,
> >
> > We are using solr version 8.2.0 in our production server.
> >
> > We are upgrading solr version from solr 8.2.0 version to solr 8.3.1
> > version but we have faced out of memory error while importing data and
> > then we have extended memory in our server and then again start
> > importing process but it has work too slowy for 8GB data ( it has
> > taken more than 2 days for importing data from solr 8.2.0 version to
> > solr 8.3.1 version).
> >
> > Could you please help me how we can do it fast for importing 8GB data
> > from old solr version to new solr version?
> >
> > We are using below command for importing data from one solr version to
> > another solr version
> > $ curl 
> > 'http://IP-ADDRESS:8983/solr/items/dataimport?command=full-import=true=false=json=true=false=false=false'
> >
> > Thanks in advance!
> > - Ken


Re: Help for importing large data (approx. 8GB) from old solr version to new solr version

2019-12-24 Thread Jörn Franke
It seems that you got this handed over with little documentation. You have to 
explore what the import handler does. This is a custom configuration that you 
need to check how it works.

Then as already said. You can simply install another version of Solr if you are 
within a Solr major version 8.x in Linux is simply a symbolic link pointing 
from one Solr version to the other. In this way you can easily switch back as 
well.

Finally, check your memory consumption. Normally the heap is significantly smaller
than the total available memory, as the non-heap memory is used for caching.

If you have 8 GB of heap, I would expect the total amount of memory available to be
more than 32 GB.
As always it depends, but maybe you can give more details on the number of cores, heap
memory, total memory, and whether processes other than Solr run on the machine.

> Am 24.12.2019 um 05:59 schrieb Ken Walker :
> 
> Hello,
> 
> We are using solr version 8.2.0 in our production server.
> 
> We are upgrading solr version from solr 8.2.0 version to solr 8.3.1
> version but we have faced out of memory error while importing data and
> then we have extended memory in our server and then again start
> importing process but it has work too slowy for 8GB data ( it has
> taken more than 2 days for importing data from solr 8.2.0 version to
> solr 8.3.1 version).
> 
> Could you please help me how we can do it fast for importing 8GB data
> from old solr version to new solr version?
> 
> We are using below command for importing data from one solr version to
> another solr version
> $ curl 
> 'http://IP-ADDRESS:8983/solr/items/dataimport?command=full-import=true=false=json=true=false=false=false'
> 
> Thanks in advance!
> - Ken


can you help me?

2019-12-24 Thread Jie ant
Highlighting: the results display when querying by ID, but the return value of
highlighting contains only the ID information and the content is empty.

Re: Help for importing large data (approx. 8GB) from old solr version to new solr version

2019-12-24 Thread Ken Walker
Hello Shawn,

Thanks for your reply!

Actually, we don't know how that works (just copying the index), so could
you please give us some reference URLs or steps for it?

Thanks in advance
- Ken

On Tue, Dec 24, 2019 at 11:56 AM Shawn Heisey  wrote:
>
> On 12/23/2019 9:58 PM, Ken Walker wrote:
> > We are upgrading solr version from solr 8.2.0 version to solr 8.3.1
> > version but we have faced out of memory error while importing data and
> > then we have extended memory in our server and then again start
> > importing process but it has work too slowy for 8GB data ( it has
> > taken more than 2 days for importing data from solr 8.2.0 version to
> > solr 8.3.1 version).
> >
> > Could you please help me how we can do it fast for importing 8GB data
> > from old solr version to new solr version?
>
> Why not just copy the index and use it directly rather than importing
> it?  Solr 8.x can directly use indexes built by versions back to 7.0.0.
>
> Thanks,
> Shawn


Re: Help for importing large data (approx. 8GB) from old solr version to new solr version

2019-12-23 Thread Shawn Heisey

On 12/23/2019 9:58 PM, Ken Walker wrote:

We are upgrading solr version from solr 8.2.0 version to solr 8.3.1
version but we have faced out of memory error while importing data and
then we have extended memory in our server and then again start
importing process but it has work too slowy for 8GB data ( it has
taken more than 2 days for importing data from solr 8.2.0 version to
solr 8.3.1 version).

Could you please help me how we can do it fast for importing 8GB data
from old solr version to new solr version?


Why not just copy the index and use it directly rather than importing 
it?  Solr 8.x can directly use indexes built by versions back to 7.0.0.


Thanks,
Shawn


Help for importing large data (approx. 8GB) from old solr version to new solr version

2019-12-23 Thread Ken Walker
Hello,

We are using solr version 8.2.0 in our production server.

We are upgrading from Solr 8.2.0 to Solr 8.3.1, but we hit an out-of-memory
error while importing the data. After extending the memory on our server and
restarting the import, it runs but is far too slow for 8 GB of data (it has
taken more than 2 days to import the data from Solr 8.2.0 into Solr 8.3.1).

Could you please help us speed up importing the 8 GB of data
from the old Solr version to the new one?

We are using below command for importing data from one solr version to
another solr version
$ curl 
'http://IP-ADDRESS:8983/solr/items/dataimport?command=full-import=true=false=json=true=false=false=false'

Thanks in advance!
- Ken


Re: Need help in GeoSpatial Searching into Solr Server

2019-12-23 Thread Erick Erickson
Why are you using a text field for location? You must use the proper spatial field type.

You need to follow the instructions in the “spatial search” section of
the reference guide, here’s the ref guide for Solr 7:

https://lucene.apache.org/solr/guide/7_7/spatial-search.html
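
A minimal sketch, assuming a hypothetical field named location_p that stores "lat,lon"
values, using the stock LatLonPointSpatialField type (a reindex is needed after
changing the field type):

  <fieldType name="location" class="solr.LatLonPointSpatialField" docValues="true"/>
  <field name="location_p" type="location" indexed="true" stored="true"/>

and a filter for documents within 5 km of a point:

  q=*:*&fq={!geofilt sfield=location_p pt=45.15,-93.85 d=5}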

Best,
Erick


> On Dec 23, 2019, at 6:53 AM, niraj kumar  wrote:
> 
> I have 100 documents into Solr, type of location field is
> *org.apache.solr.schema.TextField.*
> 
> I am unable to run any query to search nearby points with reference to that
> field.
> 
> So if you can help into it or provide some program reference in JAVA with
> same kind of implementation.
> 
> 
> Thanks,
> Niraj



Need help in GeoSpatial Searching into Solr Server

2019-12-23 Thread niraj kumar
I have 100 documents in Solr; the type of the location field is
*org.apache.solr.schema.TextField.*

I am unable to run any query to search for nearby points against that
field.

It would help if you could assist with this or provide a reference Java program with
the same kind of implementation.


Thanks,
Niraj


Re: Need some help on solr versions (LTS vs stable)

2019-11-13 Thread Adam Walz
The LTS idea I believe comes from the solr downloads page where 7.7.x is
designated as LTS. https://lucene.apache.org/solr/downloads.html

On Wed, Nov 13, 2019 at 9:41 AM Shawn Heisey  wrote:

> On 11/6/2019 9:58 AM, suyog joshi wrote:
> > So we can say its better to go with latest stable version (8.x) instead
> of
> > 7.x, which is LTS right now, but can soon become EOL post launching of
> 9.x
> > sometime early next year.
>
> I don't know where you got the idea that 7.x is LTS ... but I do not
> think that is correct.  I don't think we have a version that could be
> called LTS, at least not the way I have seen the term used.
>
> It's true that 7.x currently is in a state where it is unlikely to have
> its feature list changed, which could be seen as stability.  But chances
> are that if you DO run into a bug with a 7.x version, the fix for that
> problem will probably only make it into the current stable branch, so
> you'd be upgrading to at least an 8.x version in order to obtain the fix.
>
> Changing to an LTS model would mean changes to the way development is
> done on the project.  Change is always scary.  I've asked on the dev
> list about this.
>
> Thanks,
> Shawn
>


-- 
Adam Walz


Re: Need some help on solr versions (LTS vs stable)

2019-11-13 Thread Shawn Heisey

On 11/6/2019 9:58 AM, suyog joshi wrote:

So we can say its better to go with latest stable version (8.x) instead of
7.x, which is LTS right now, but can soon become EOL post launching of 9.x
sometime early next year.


I don't know where you got the idea that 7.x is LTS ... but I do not 
think that is correct.  I don't think we have a version that could be 
called LTS, at least not the way I have seen the term used.


It's true that 7.x currently is in a state where it is unlikely to have 
its feature list changed, which could be seen as stability.  But chances 
are that if you DO run into a bug with a 7.x version, the fix for that 
problem will probably only make it into the current stable branch, so 
you'd be upgrading to at least an 8.x version in order to obtain the fix.


Changing to an LTS model would mean changes to the way development is 
done on the project.  Change is always scary.  I've asked on the dev 
list about this.


Thanks,
Shawn

