Re: Multi tenant setup

2017-06-13 Thread Zisis T.
We are talking about fewer collections, so that won't be an issue. 

The problem comes when, using the proposed setup, I want to send a query
across all those collections and get properly ranked results. Each
collection has its own IDF etc., so the scores are not comparable, which
most likely means that results from one collection will dominate. 

This led me to try the Distributed IDF configuration, but that did not work
either, due to the issues described in the link in the original post. 
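To make the incomparability concrete, here is a toy sketch using the classic Lucene idf formula; all document counts below are fabricated:

```java
public class IdfSketch {
    // Classic Lucene TFIDFSimilarity idf: 1 + ln(numDocs / (docFreq + 1))
    static double idf(long docFreq, long numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    public static void main(String[] args) {
        // The same term, in two collections with different local statistics.
        double idfA = idf(5, 1_000);    // rare in collection A
        double idfB = idf(400, 1_000);  // common in collection B
        System.out.printf("collection A idf=%.4f%n", idfA);
        System.out.printf("collection B idf=%.4f%n", idfB);
        // A match in A outscores the identical match in B purely because of
        // per-collection statistics, not relevance.
    }
}
```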



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Multi-tenant-setup-tp4340377p4340421.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Failure to load shards

2017-06-13 Thread Erick Erickson
John:

The patch certainly doesn't apply cleanly to 5.4. That said, there
were just a few conflicts, so that part doesn't look too bad. I don't
know anyone who has actually backported it that far, so it's unexplored
territory.

Note that the patch was generated for 6x, which requires Java 1.8,
whereas the 5x code line required only Java 1.7. That's not to say 1.8 is
_required_ unless the patch has some Java 8 idioms, and even in that
case you might be able to unwind them.

That said, Solr 5 works with Java 8 anyway, so if I had to choose I'd
just compile under Java 8. You can force this by building like this:

ant -Djavac.source=1.8 -Djavac.target=1.8 whatever_target

I'd definitely check out the Overseer queues before bothering though.

Good luck!
Erick

On Tue, Jun 13, 2017 at 11:33 AM, John Bickerstaff
 wrote:
> Erick,
>
> We're using Solr 5.5.4 and aren't really eager to change at this moment...
>
> Off the top of your head - what probability that the patch here:
> https://issues.apache.org/jira/browse/SOLR-10524
>
> ... will work in 5.5.4 with minimal difficulty?
>
> For example, were there other classes introduced in 6 that the patch
> uses/depends on?
>
> Thanks...
>
> On Fri, Jun 9, 2017 at 12:03 PM, John Bickerstaff 
> wrote:
>
>> Hi all,
>>
>> Here's my situation...
>>
>> In AWS with zookeeper / solr.
>>
>> When trying to spin up additional Solr boxes from an "auto scaling group"
>> I get this failure.
>>
>> The code used is exactly the same code that successfully spun up the first
>> 3 or 4 solr boxes in each "auto scaling group".
>>
>> Below is a copy of my email to some of my compatriots within the company
>> who also use solr/zookeeper.
>>
>> I'm looking for any advice on what _might_ be the cause of this
>> failure...  Overload on Zookeeper in some way is our best guess.
>>
>> I know this isn't a zookeeper forum - - just hoping someone out there has
>> some experience troubleshooting similar issues.
>>
>> Many thanks in advance...
>>
>> =
>>
>> We have 6 zookeepers. (3 of them are observers).
>>
>> They are not under a load balancer
>>
>> How do I check if zookeeper nodes are under heavy load?
>>
>>
>> The problem arises when we try to scale up with more solr nodes. In our
>> current setup we have 160 nodes connected to zookeeper, each node with 40
>> cores, so around 6400 cores. When we scale up, 40 to 80 solr nodes will
>> spin up at one time.
>>
>> And we are getting errors like the following, which stop the index
>> distribution process:
>>
>> 2017-06-05 20:06:34.357 ERROR [pool-3-thread-2] o.a.s.c.CoreContainer -
>> Error creating core [p44_b1_s37]: Could not get shard id for core:
>> p44_b1_s37
>>
>>
>> org.apache.solr.common.SolrException: Could not get shard id for core: p44_b1_s37
>>   at org.apache.solr.cloud.ZkController.waitForShardId(ZkController.java:1496)
>>   at org.apache.solr.cloud.ZkController.doGetShardIdAndNodeNameProcess(ZkController.java:1438)
>>   at org.apache.solr.cloud.ZkController.preRegister(ZkController.java:1548)
>>   at org.apache.solr.core.CoreContainer.create(CoreContainer.java:815)
>>   at org.apache.solr.core.CoreContainer.create(CoreContainer.java:757)
>>   at com.ancestry.solr.servlet.AcomServlet.indexTransfer(AcomServlet.java:319)
>>   at com.ancestry.solr.servlet.AcomServlet.lambda$indexTransferStart$1(AcomServlet.java:303)
>>   at com.ancestry.solr.service.IndexTransferWorker.run(IndexTransferWorker.java:78)
>>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>   at java.lang.Thread.run(Thread.java:745)
>>
>>
>> We suspect this has to do with zookeeper not responding fast enough.
>>
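On the question of checking ZooKeeper load above: ZooKeeper answers the four-letter `mntr` command on its client port (e.g. `echo mntr | nc zkhost 2181`) with tab-separated metrics; a small parser over that output lets you watch latency and outstanding requests. The sample values below are fabricated:

```java
import java.util.HashMap;
import java.util.Map;

public class MntrParse {
    // Parse "key\tvalue" lines as emitted by ZooKeeper's mntr command.
    static Map<String, String> parseMntr(String output) {
        Map<String, String> m = new HashMap<>();
        for (String line : output.split("\n")) {
            String[] kv = line.split("\t", 2);
            if (kv.length == 2) m.put(kv[0].trim(), kv[1].trim());
        }
        return m;
    }

    public static void main(String[] args) {
        // Fabricated sample; real output comes from: echo mntr | nc zkhost 2181
        String sample = "zk_avg_latency\t12\n"
                      + "zk_outstanding_requests\t87\n"
                      + "zk_num_alive_connections\t164\n";
        Map<String, String> stats = parseMntr(sample);
        // High zk_outstanding_requests and zk_avg_latency suggest the ensemble
        // is struggling to keep up with a registration storm.
        System.out.println("outstanding=" + stats.get("zk_outstanding_requests"));
    }
}
```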


Re: Failure to load shards

2017-06-13 Thread John Bickerstaff
Erick,

We're using Solr 5.5.4 and aren't really eager to change at this moment...

Off the top of your head - what probability that the patch here:
https://issues.apache.org/jira/browse/SOLR-10524

... will work in 5.5.4 with minimal difficulty?

For example, were there other classes introduced in 6 that the patch
uses/depends on?

Thanks...

On Fri, Jun 9, 2017 at 12:03 PM, John Bickerstaff 
wrote:

> Hi all,
>
> Here's my situation...
>
> In AWS with zookeeper / solr.
>
> When trying to spin up additional Solr boxes from an "auto scaling group"
> I get this failure.
>
> The code used is exactly the same code that successfully spun up the first
> 3 or 4 solr boxes in each "auto scaling group".
>
> Below is a copy of my email to some of my compatriots within the company
> who also use solr/zookeeper.
>
> I'm looking for any advice on what _might_ be the cause of this
> failure...  Overload on Zookeeper in some way is our best guess.
>
> I know this isn't a zookeeper forum - - just hoping someone out there has
> some experience troubleshooting similar issues.
>
> Many thanks in advance...
>
> =
>
> We have 6 zookeepers. (3 of them are observers).
>
> They are not under a load balancer
>
> How do I check if zookeeper nodes are under heavy load?
>
>
> The problem arises when we try to scale up with more solr nodes. In our
> current setup we have 160 nodes connected to zookeeper, each node with 40
> cores, so around 6400 cores. When we scale up, 40 to 80 solr nodes will
> spin up at one time.
>
> And we are getting errors like the following, which stop the index
> distribution process:
>
> 2017-06-05 20:06:34.357 ERROR [pool-3-thread-2] o.a.s.c.CoreContainer -
> Error creating core [p44_b1_s37]: Could not get shard id for core:
> p44_b1_s37
>
>
> org.apache.solr.common.SolrException: Could not get shard id for core: p44_b1_s37
>   at org.apache.solr.cloud.ZkController.waitForShardId(ZkController.java:1496)
>   at org.apache.solr.cloud.ZkController.doGetShardIdAndNodeNameProcess(ZkController.java:1438)
>   at org.apache.solr.cloud.ZkController.preRegister(ZkController.java:1548)
>   at org.apache.solr.core.CoreContainer.create(CoreContainer.java:815)
>   at org.apache.solr.core.CoreContainer.create(CoreContainer.java:757)
>   at com.ancestry.solr.servlet.AcomServlet.indexTransfer(AcomServlet.java:319)
>   at com.ancestry.solr.servlet.AcomServlet.lambda$indexTransferStart$1(AcomServlet.java:303)
>   at com.ancestry.solr.service.IndexTransferWorker.run(IndexTransferWorker.java:78)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
>
>
> We suspect this has to do with zookeeper not responding fast enough.
>


Solr 6: how to get SortedSetDocValues from index by field name

2017-06-13 Thread SOLR4189
How do I get SortedSetDocValues from index by field name?

I tried the following and it works for me, but I don't understand why
leaves.get(0) is used. What does it mean? (I saw this usage in
TestUninvertedReader.java of Solr 6.5.1):

Map<String, UninvertingReader.Type> mapping = new HashMap<>();
mapping.put(fieldName, UninvertingReader.Type.SORTED);

SolrIndexSearcher searcher = req.getSearcher();

DirectoryReader dReader = searcher.getIndexReader();
LeafReader reader = null;

if (!dReader.leaves().isEmpty()) {
  reader = dReader.leaves().get(0).reader();
} else {
  return null;
}

SortedSetDocValues sourceIndex = reader.getSortedSetDocValues(fieldName);

Or do I need to use the SlowAtomicReader, like this:

UninvertingReader reader = new
    UninvertingReader(searcher.getSlowAtomicReader(), mapping);

What is the right way to get SortedSetDocValues, and why?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-6-how-to-get-SortedSetDocValues-from-index-by-field-name-tp4340388.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Multi tenant setup

2017-06-13 Thread Susheel Kumar
Going with a single cluster holding multiple collections (one per client) is
what I would try. How many clients do you have? If 10K, that means 10K
collections, and then you will need to work out how many documents they
hold, their size, etc. to nail down the number of machines and their
memory/CPU requirements. Going with a single collection is not really a
multi-tenant setup, especially when the clients have different schemas.
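The sizing arithmetic alluded to above can be scripted as a first pass; every number in this sketch is a placeholder to be replaced with your own measurements:

```java
public class TenantSizing {
    // Rough total index size in GB for a uniform tenant population.
    static long totalIndexGb(long tenants, long docsPerTenant, long bytesPerDoc) {
        return tenants * docsPerTenant * bytesPerDoc / (1024L * 1024 * 1024);
    }

    public static void main(String[] args) {
        // All inputs are hypothetical: 10K clients, 2M docs each, ~2 KB/doc.
        long totalGb = totalIndexGb(10_000, 2_000_000, 2_000);
        long indexGbPerNode = 500;  // disk budget per node, also hypothetical
        long nodesForDisk = (totalGb + indexGbPerNode - 1) / indexGbPerNode;
        System.out.printf("total index ~%d GB -> at least %d nodes by disk alone%n",
                          totalGb, nodesForDisk);
        // RAM/CPU need the same treatment: measure per-doc heap and query cost,
        // then divide by per-node capacity.
    }
}
```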

Thanks,
Susheel


On Tue, Jun 13, 2017 at 12:35 PM, Zisis T.  wrote:

> I'm trying to setup a multi-tenant Solr cluster (v6.5.1) which must meet
> the
> following requirements. The tenants are different customers with similar
> type of data.
>
> * Ability to query per client but also across all clients
> * Don't want to hit all shards for all type of requests (per client, across
> clients)
> * Don't want to have everything under a single multi-sharded collection to
> avoid a SPOF and maintenance headaches
>(e.g. a schema change will force an all-client reindexing. single huge
> backup/restore)
> * Ability to semi-support different schemas.
>
> Based on the above I ruled out the following setups
> * Single multi-sharded collection for all clients and all its variations
> (e.g. multiple clients in a singe shard)
> * One collection per client
>
> My preference lies in a setup like the following
> * Create a limited # of collections
> * Split the clients in the collections created above based on some criteria
> (size, content-type)
> * Client specific requests will be limited in a single collection
> * Across clients requests will target a limited # of collections (using
> collection=col_1,col_2,col_3)
>
> The approach above meets the requirements posted above but the issue that
> is
> blocking me is the Distributed IDF not working properly across collections.
> (Check comment#3, bullet#2 of
> http://lucene.472066.n3.nabble.com/Distributed-IDF-in-
> inter-collections-distributed-queries-td4317519.html)
>
>
> -> Do you see anything wrong with my assumptions/approach above? Are there
> any alternatives besides having separate clusters for the search across
> clients and the individual clients?
> -> Is it safe to go with a single collection? If it is, I still need to
> handle the possible different schemas per client somehow.
> -> Is there a way to enforce local stats when querying a single collection
> and use global stats only when querying across collections? (see link
> above)
>
> Thanks
>
>
>
> --
> View this message in context: http://lucene.472066.n3.
> nabble.com/Multi-tenant-setup-tp4340377.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Multi tenant setup

2017-06-13 Thread Zisis T.
I'm trying to setup a multi-tenant Solr cluster (v6.5.1) which must meet the
following requirements. The tenants are different customers with similar
type of data.

* Ability to query per client but also across all clients
* Don't want to hit all shards for all type of requests (per client, across
clients)
* Don't want to have everything under a single multi-sharded collection to
avoid a SPOF and maintenance headaches 
   (e.g. a schema change will force an all-client reindexing. single huge
backup/restore)
* Ability to semi-support different schemas.

Based on the above I ruled out the following setups 
* Single multi-sharded collection for all clients and all its variations
(e.g. multiple clients in a singe shard)
* One collection per client 

My preference lies in a setup like the following
* Create a limited # of collections
* Split the clients in the collections created above based on some criteria
(size, content-type)
* Client specific requests will be limited in a single collection
* Across clients requests will target a limited # of collections (using
collection=col_1,col_2,col_3)
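For that last point, the cross-collection fan-out is just Solr's standard collection request parameter; assembling such a request is plain string work. The host and collection names below are placeholders, and URL-encoding of q is omitted for brevity:

```java
import java.util.Arrays;
import java.util.List;

public class CrossCollectionQuery {
    // Build a /select URL that fans out over an explicit list of collections
    // via the collection parameter (q is assumed already URL-encoded).
    static String buildUrl(String baseUrl, List<String> collections, String q) {
        return baseUrl + "/solr/" + collections.get(0) + "/select"
             + "?q=" + q
             + "&collection=" + String.join(",", collections);
    }

    public static void main(String[] args) {
        String url = buildUrl("http://localhost:8983",
                              Arrays.asList("col_1", "col_2", "col_3"), "*:*");
        System.out.println(url);
    }
}
```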

The approach above meets the requirements posted above but the issue that is
blocking me is the Distributed IDF not working properly across collections.
(Check comment#3, bullet#2 of
http://lucene.472066.n3.nabble.com/Distributed-IDF-in-inter-collections-distributed-queries-td4317519.html)


-> Do you see anything wrong with my assumptions/approach above? Are there
any alternatives besides having separate clusters for the search across
clients and the individual clients?
-> Is it safe to go with a single collection? If it is, I still need to
handle the possible different schemas per client somehow.
-> Is there a way to enforce local stats when querying a single collection
and use global stats only when querying across collections? (see link above)

Thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Multi-tenant-setup-tp4340377.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Same queries taking more time

2017-06-13 Thread Erick Erickson
Well, segments won't be merged except during active indexing,
specifically when you issue a commit. Do you have any commit settings
specified?
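For reference, commit cadence lives in solrconfig.xml under `<updateHandler>`; a common shape is below, with illustrative (not recommended) intervals:

```xml
<!-- solrconfig.xml, inside <updateHandler> -->
<autoCommit>
  <maxTime>60000</maxTime>            <!-- hard commit: flush segments to disk -->
  <openSearcher>false</openSearcher>  <!-- don't open a new searcher on hard commit -->
</autoCommit>
<autoSoftCommit>
  <maxTime>15000</maxTime>            <!-- soft commit: make new docs visible -->
</autoSoftCommit>
```

With no commit settings and no explicit commits, flushed segments pile up and merging never gets a chance to run.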

You can't tell much from the raw heap size, since you don't control when
garbage collections occur. What happens if you attach, say, jconsole
to it and force a GC?

Finally, have you done anything with your merge policy settings? What
version of Solr?

Best,
Erick

On Tue, Jun 13, 2017 at 7:12 AM, kshitij tyagi
 wrote:
> Hi,
>
> We are using master slave architecture, here are the observations:
>
> 1. Heap size and connection counts on the slave are increasing, leading to
> longer query times.
>
> 2. We are noticing in the Solr admin UI that the segment count is huge and
> that no merging is taking place.
>
> 3. We have not made any changes; a new searcher was opened around 5 hrs ago
> by Solr, and since then we have been seeing these issues.
>
> What are the aspects we should check as of now?
>
> Help appreciated.
>
> Regards,
> Kshitij


Re: Setting q parameter for fetch Streaming expression

2017-06-13 Thread Joel Bernstein
Currently you cannot specify a query for fetch. You can filter tuples
emitted by fetch by wrapping it in a "having" expression.

In the future I think it makes sense to support filter queries with fetch.
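A sketch of that workaround against the example from the question; the eq evaluator and the state field/value are illustrative, and the exact evaluator syntax varies by Solr version:

```
having(
  fetch(addresses,
        search(people, q="*:*", fl="username, firstName, lastName",
               sort="username asc"),
        fl="streetAddress, city, state, country, zip",
        on="username=userId"),
  eq(state, "CA"))
```

Note this filters after the fetch, so it doesn't reduce the work done against the addresses collection the way a real filter query would.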

Joel Bernstein
http://joelsolr.blogspot.com/

On Tue, Jun 13, 2017 at 4:46 AM, Zheng Lin Edwin Yeo 
wrote:

> Hi,
>
> For the Streaming expression on fetch, is it possible to set the q
> parameter for the "addresses" collection?
> In the below example from the Solr Documentation, it is only setting the q
> parameter for the "people" collection.
>
> I'm using Solr 6.5.1.
>
> fetch(addresses,
>   search(people, q="*:*", fl="username, firstName, lastName",
> sort="username
> asc"),
>   fl="streetAddress, city, state, country, zip",
>   on="username=userId")
>
> Regards,
> Edwin
>


Same queries taking more time

2017-06-13 Thread kshitij tyagi
Hi,

We are using master slave architecture, here are the observations:

1. Heap size and connection counts on the slave are increasing, leading to
longer query times.

2. We are noticing in the Solr admin UI that the segment count is huge and
that no merging is taking place.

3. We have not made any changes; a new searcher was opened around 5 hrs ago
by Solr, and since then we have been seeing these issues.

What are the aspects we should check as of now?

Help appreciated.

Regards,
Kshitij


Re: Odd Boolean Query behavior in SOLR 3.6

2017-06-13 Thread Erik Hatcher
Purely negative inner queries match nothing. A query is about matching, and 
skipping over things that don't match. The fix, when using (-something), is 
to write (*:* -something) so it matches everything and then skips the 
negative-clause items.

In your example, try fq=((*:* -documentTypeId:3) AND companyId:29096)
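The semantics described here can be mimicked with plain sets: each clause contributes only the documents it matches, and a purely negative clause has nothing to contribute. A stdlib sketch with made-up doc ids:

```java
import java.util.HashSet;
import java.util.Set;

public class NegativeClauseSketch {
    // AND of clauses modeled as set intersection.
    static Set<Integer> and(Set<Integer> a, Set<Integer> b) {
        Set<Integer> r = new HashSet<>(a);
        r.retainAll(b);
        return r;
    }

    // "all minus a": what (*:* -a) matches.
    static Set<Integer> not(Set<Integer> all, Set<Integer> a) {
        Set<Integer> r = new HashSet<>(all);
        r.removeAll(a);
        return r;
    }

    public static void main(String[] args) {
        Set<Integer> all = Set.of(1, 2, 3, 4);
        Set<Integer> type3 = Set.of(1, 2);    // docs with documentTypeId:3
        Set<Integer> company = Set.of(2, 3);  // docs with companyId:29096

        // (-documentTypeId:3) AND companyId:29096 : the negative clause has
        // no positive clause supplying documents, so it contributes nothing.
        Set<Integer> broken = and(new HashSet<>(), company);

        // (*:* -documentTypeId:3) AND companyId:29096 : *:* supplies "all".
        Set<Integer> fixed = and(not(all, type3), company);

        System.out.println(broken + " vs " + fixed); // [] vs [3]
    }
}
```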

Erik

> On Jun 13, 2017, at 3:15 AM, abhi Abhishek  wrote:
> 
> Hi Everyone,
> 
> I have hit a weird behavior of Boolean Query: when I run the query with the
> params below, it does not behave as expected. Can you please help me
> understand the behavior here?
> 
> 
> 
> q=*:*&fq=((-documentTypeId:3)+AND+companyId:29096)&version=2.2&start=0&rows=10&indent=on&debugQuery=true
> 
> → Returns 0 matches
> 
> filter_queries: ((-documentTypeId:3) AND companyId:29096)
> 
> parsed_filter_queries: +(-documentTypeId:3) +companyId:29096
> 
> 
> 
> q=*:*&fq=(-documentTypeId:3+AND+companyId:29096)&version=2.2&start=0&rows=10&indent=on&debugQuery=true
> 
> → returns 1600 matches
> 
> filter_queries:(-documentTypeId:3 AND companyId:29096)
> 
> parsed_filter_queries:-documentTypeId:3 +companyId:29096
> 
> 
> 
> Can you please help me understand what am I missing here?
> 
> 
> Thanks in Advance.
> 
> 
> Thanks & Best Regards,
> 
> Abhishek



RE: Managed Synonyms query

2017-06-13 Thread Sweta Parekh
Hi All,

I'd appreciate it if anyone can help me understand the index-side synonyms implementation.

Regards,
Sweta Parekh
Search / CRO - Associate Program Manager


From: Sweta Parekh
Sent: Monday, June 05, 2017 8:46 PM
To: 'solr-user@lucene.apache.org'
Subject: Managed Synonyms query

Hi All,

We are using the managed synonyms functionality at index time, but one-way &
replacement synonyms are not working as desired. Below are some examples that
are not working. Can anyone help explain how index-time synonyms work in
Solr? We are using edismax with mm=2>-1 5>100%, ps=5.

The entries highlighted in yellow below are not working. We need a search for
the "from" word to return results for the "to" term as well, but not
vice-versa.

rust-oleum ; rust-oleum, rust o leum - this is not working
rust-oleum ; rust-oleum, rust oleum - in the analyzer it shows that rust
oleum is replaced with rust-oleum, and then it should show results
rust-oleum ; rust-oleum, rust oleum Canada - this will work once the above
entry works
rust-oleum ; rust-oleum, rust-oleum Canada - this is working
rust-oleum ; rust-oleum, rustoleum canada - this is working

eye wash  ;   eye wash, douche occulaire  - this is not working
eye wash  ;   eye wash, douche oculaire - this is working


Gatorade ; gatoraid - this is not working. We want to replace the "to" term
"gatoraid" with the "from" term "Gatorade", but the output is reversed. When
we switch the entry, the rule has no impact.
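As a rough model of the one-way behavior being asked about: a one-way synonym entry maps only its left-hand term to the expansion list, never the reverse. A minimal stdlib sketch, with terms taken from the examples above:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

public class OneWaySynonyms {
    // One-way mapping "from => to1, to2": only the left-hand side expands.
    static final Map<String, List<String>> SYNONYMS =
        Map.of("rust-oleum", Arrays.asList("rust-oleum", "rust oleum"));

    static List<String> expand(String term) {
        // Terms without an entry pass through unchanged; there is no
        // automatic reverse lookup from "rust oleum" back to "rust-oleum".
        return SYNONYMS.getOrDefault(term, Arrays.asList(term));
    }

    public static void main(String[] args) {
        System.out.println(expand("rust-oleum")); // expanded to both forms
        System.out.println(expand("rust oleum")); // no reverse mapping
    }
}
```

If the analyzer is applying the rule in the opposite direction, the left- and right-hand sides of the managed-synonyms entry are effectively swapped relative to this model.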

Regards,
Sweta Parekh
Search / CRO - Associate Program Manager
Digital Marketing Services
sweta.par...@clerx.com
Extn: 284887 | Mobile: +(91) 9004667625
eClerx Services Limited [www.eClerx.com]



Setting q parameter for fetch Streaming expression

2017-06-13 Thread Zheng Lin Edwin Yeo
Hi,

For the Streaming expression on fetch, is it possible to set the q
parameter for the "addresses" collection?
In the below example from the Solr Documentation, it is only setting the q
parameter for the "people" collection.

I'm using Solr 6.5.1.

fetch(addresses,
  search(people, q="*:*", fl="username, firstName, lastName",
sort="username
asc"),
  fl="streetAddress, city, state, country, zip",
  on="username=userId")

Regards,
Edwin


Odd Boolean Query behavior in SOLR 3.6

2017-06-13 Thread abhi Abhishek
Hi Everyone,

I have hit a weird behavior of Boolean Query: when I run the query with the
params below, it does not behave as expected. Can you please help me
understand the behavior here?



q=*:*&fq=((-documentTypeId:3)+AND+companyId:29096)&version=2.2&start=0&rows=10&indent=on&debugQuery=true

→ Returns 0 matches

filter_queries: ((-documentTypeId:3) AND companyId:29096)

parsed_filter_queries: +(-documentTypeId:3) +companyId:29096



q=*:*&fq=(-documentTypeId:3+AND+companyId:29096)&version=2.2&start=0&rows=10&indent=on&debugQuery=true

→ returns 1600 matches

filter_queries:(-documentTypeId:3 AND companyId:29096)

parsed_filter_queries:-documentTypeId:3 +companyId:29096



Can you please help me understand what am I missing here?


Thanks in Advance.


Thanks & Best Regards,

Abhishek