Re: SQL rpt_location question

2017-03-24 Thread Joel Bernstein
You can use the _query_ field to support any Solr query in the where clause:

select a, b from c where _query_='(any solr query)'

This is definitely supported in the 6.5 release which is the first release
with Apache Calcite as the SQL engine. But I believe it's also supported in
older versions of the SQL handler.
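
For example, a spatial filter on an RPT field might look something like this
(the collection, point and distance are illustrative, and rpt_location is the
field mentioned below):

select id, name from products where _query_ = '{!geofilt sfield=rpt_location pt=45.15,-93.85 d=10}'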



Joel Bernstein
http://joelsolr.blogspot.com/

On Fri, Mar 24, 2017 at 10:09 AM, GW  wrote:

> Dear reader,
>
> I've found that using the distinct clause gives me the list I want.
>
> I also have a multivalued rpt_location in the collection that I'd like to
> use in the filter.
>
> Is this possible in any way, shape or form?
>
> Many thanks in advance,
>
> Greg
>


Re: Difference between hashJoin and innerJoin in Streaming Expression

2017-03-24 Thread Joel Bernstein
The innerJoin is a merge join and the hashJoin is a hash join.

The merge join can support joins of unlimited size and never runs out of
memory. But it requires that both sides of the join are sorted on the join
keys.

The hash join reads one side of the join into a hash map keyed on the join
keys. This doesn't require any specific sort but it is limited in size by
how much data can fit in the hash map.

You can parallelize both joins using the parallel function to improve
scalability and performance.
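
For example (collection and field names are illustrative; both joins emit the
same tuples, they just build them differently):

innerJoin(
  search(people, q="*:*", fl="personId,name", sort="personId asc"),
  search(pets, q="*:*", fl="ownerId,petName", sort="ownerId asc"),
  on="personId=ownerId"
)

hashJoin(
  search(people, q="*:*", fl="personId,name", sort="personId asc"),
  hashed=search(pets, q="*:*", fl="ownerId,petName"),
  on="personId=ownerId"
)

The innerJoin needs both streams sorted on the join keys, while the hashJoin
only needs the hashed stream to fit in memory.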

Joel Bernstein
http://joelsolr.blogspot.com/

On Fri, Mar 24, 2017 at 4:49 AM, Zheng Lin Edwin Yeo 
wrote:

> Hi,
>
> What is the main difference between hashJoin and innerJoin in Solr
> Streaming Expression?
>
> I understand that both will emit a tuple containing the fields of both
> tuples.
>
> When I tried both hashJoin and innerJoin with the same query, I get exactly
> the same results, and there is no difference in performance.
>
> Under what circumstances should we use hashJoin, and under what
> circumstances should we use innerJoin?
>
> Regards,
> Edwin
>


Re: JSON Facet API Virtual Field Support

2017-03-24 Thread Yonik Seeley
On Fri, Mar 24, 2017 at 7:52 PM, Furkan KAMACI  wrote:
> Hi,
>
> I am testing the JSON Facet API of Solr. Is it possible to create a virtual
> field which is generated from existing fields in the response and supports
> elementary arithmetic operations?
>
> Example:
>
> Schema fields:
>
> products,
> sold_products,
> date
>
> I want to run a date range facet and add another field to the response which
> is the percentage of sold products (the ratio will be calculated as
> sold_products * 100 / products).

Currently only half supported.  By this I mean we can do math on
fields and aggregate them per bucket.
Basically sum(div(sold_products,products)), assuming products and
sold_products exist on each document.

What we can't do yet is do math on aggregations:
  div(sum(sold_products),sum(products))

If the former works for you, simply place that in a facet block within
a parent facet (like your range facet).
http://yonik.com/solr-facet-functions/
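
For example (the bucket dates are illustrative), nesting the aggregation under
a date range facet:

json.facet={
  by_month : {
    type  : range,
    field : date,
    start : "2017-01-01T00:00:00Z",
    end   : "2017-04-01T00:00:00Z",
    gap   : "+1MONTH",
    facet : {
      sold_ratio : "sum(div(sold_products,products))"
    }
  }
}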

-Yonik


JSON Facet API Virtual Field Support

2017-03-24 Thread Furkan KAMACI
Hi,

I am testing the JSON Facet API of Solr. Is it possible to create a virtual
field which is generated from existing fields in the response and supports
elementary arithmetic operations?

Example:

Schema fields:

products,
sold_products,
date

I want to run a date range facet and add another field to the response which
is the percentage of sold products (the ratio will be calculated as
sold_products * 100 / products).

Kind Regards,
Furkan KAMACI


Re: Licensing issue advice for Solr.

2017-03-24 Thread Chris Hostetter

: I know that the product in general is licensed as Apache 2.0, but unfortunately
: there are packages included in the build that are considered "non-permissive"
: by my company and as such, means that
...
: It appears that the vast majority of the licensing issues are within the
: contrib directory. I know these
...
: just the Solr and Lucene server distribution (minus demos and contrib). Some
: of the packages are dual licensed so I am able to deal with that by selecting
: which we wish to use, but there are some that are either not licensed at all
: or are only non-permissive (ie: not Apache, BSD, MIT, etc.) like GPL, CDDL, etc.

Can you give a specific example of a dependency that you are seeing used 
in contrib that is not (dual) licensed under a "permissive" license?

Every jar Lucene/Solr depends on has a corresponding file in our 
*/licenses directories -- so I'm not sure what you mean by "not licensed 
at all."  There should most certainly not be anything that's licensed as 
GPL (w/o a dual license option that's more permissive).  

Granted: If you consider CDDL to be problematic then you will certainly 
have some problems since the javax.servlet *API* is itself under the 
CDDL, so you're kind of going to be out of luck as far as being able to 
run Solr ... or any Java-based app that uses servlets.


Having said all of that...

If your legal department feels (for whatever reasons) that solr-core's 
dependencies are "ok", but there are contribs with deps that are "not ok", 
then there is an easy solution:

1) download a "src" release
2) run "cd solr/webapp && ant dist"
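
Concretely, something like this (release version and extracted path are
illustrative; if Ivy isn't already set up, the build will ask you to run
"ant ivy-bootstrap" first):

  tar xzf solr-6.4.2-src.tgz
  cd solr-6.4.2/solr/webapp    # adjust to wherever the source tree was extracted
  ant dist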

You'll now have a fully functional copy of the Solr app you can run, and 
only the compile dependencies for the core server & solrj will have been 
downloaded to your machine -- nothing in contrib will be built, let alone 
have any 3rd party deps downloaded.


-Hoss
http://www.lucidworks.com/


Shingles from WDFF

2017-03-24 Thread Ken Krugler
Hi all,

I’ve got some ancient Lucene tokenizer code from 2006 that I’m trying to avoid 
forward-porting, but I don’t think there’s an equivalent in Solr 5/6.

Specifically it’s applying shingles to the output of something like the 
WordDelimiterFilter - e.g. MySuperSink gets split into “My” “Super” “Sink”, and 
then shingled (if we’re using shingle size of 2) to be “My”, “MySuper”, 
“Super”, “SuperSink”, “Sink”.

I can’t just follow the WDF with a single filter because shingles aren’t 
created across terms coming into the WDF - it’s only for the pieces generated 
by the WDF.
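
For reference, a sketch of the straightforward chain being ruled out above
(field type name and attribute values are illustrative); a ShingleFilter placed
after the WDF would also shingle across adjacent original terms, not just the
pieces the WDF generates:

  <fieldType name="text_wdf_shingle" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <!-- MySuperSink -> My / Super / Sink -->
      <filter class="solr.WordDelimiterFilterFactory"
              generateWordParts="1" splitOnCaseChange="1"/>
      <!-- bigram shingles: My, MySuper, Super, SuperSink, Sink -->
      <filter class="solr.ShingleFilterFactory"
              minShingleSize="2" maxShingleSize="2"
              outputUnigrams="true" tokenSeparator=""/>
    </analyzer>
  </fieldType>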

Or is there actually a way to make this work with Solr 5/6?

Thanks,

— Ken

--
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr





Re: Licensing issue advice for Solr.

2017-03-24 Thread Alexandre Rafalovitch
There is no official build with minimal Solr configuration. Some
downstream projects may do so, but we don't keep track of their
installation specifics.

If it is an issue with the contrib directory, I would think you should be
able to just not use it or even delete it.

As to the searching, you've already been shown Markmail. There is also
http://search-lucene.com/?project=Solr=mail+_hash_+user=license

Regards,
   Alex.

http://www.solr-start.com/ - Resources for Solr users, new and experienced


On 24 March 2017 at 13:53,   wrote:
> Hi all,
>
> I'm just getting started with Solr (6.4.2) and am trying to get approval for 
> usage in my workplace.
> I know that the product in general is licensed as Apache 2.0, but 
> unfortunately there are packages
> included in the build that are considered "non-permissive" by my company and 
> as such, means that
> I am having trouble getting things approved.
>
> It appears that the vast majority of the licensing issues are within the 
> contrib directory. I know these
> provide significant functionality for Solr, but I was wondering if there is 
> an official build that contains
> just the Solr and Lucene server distribution (minus demos and contrib). Some 
> of the packages are
> dual licensed so I am able to deal with that by selecting which we wish to 
> use, but there are some
> that are either not licensed at all or are only non-permissive (ie: not 
> Apache, BSD, MIT, etc.) like
> GPL, CDDL, etc.
>
> Has anyone had to deal with this in the past? My apologies if this has been 
> discussed before, but
> it doesn't appear that the mail list archive has a search option (correct me 
> if I'm wrong on that).
>
> Thanks
>
>


Re: unable to get more throughput with more threads

2017-03-24 Thread Suresh Pendap
Hi Shawn,
It looks like you probably have pointed to the root cause of the issue.
I am using a java client and using HttpClient library directly to fire the
Http get queries. I am not using SolrJ client for firing the queries.

The following is my code

import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;

// createDefault() uses the default pool: only 2 connections per route
CloseableHttpClient httpclient = HttpClients.createDefault();
HttpGet httpGet = new HttpGet(url);
CloseableHttpResponse response = null;
try {
   response = httpclient.execute(httpGet);
} finally {
   if (response != null) {
       response.close();  // release the connection back to the pool
   }
}

Is a route defined per httpHost/port combination?

I will try to set the MaxConnPerRoute and maxTotalConn to a larger value
and see if that gives me some benefit.
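
Something along these lines, I assume (the limits are only a guess):

CloseableHttpClient httpclient = HttpClients.custom()
        .setMaxConnPerRoute(100)   // connections allowed per host:port route
        .setMaxConnTotal(400)      // total pooled connections across all routes
        .build();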


Thanks
Suresh









On 3/24/17 6:05 AM, "Shawn Heisey"  wrote:

>On 3/23/2017 6:10 PM, Suresh Pendap wrote:
>> I performed the test with 1 thread, 10 client threads and 50 client
>> threads. I noticed that as I increased the number of threads, the
>> query latency kept increasing drastically which I was not expecting.
>
>What language and Solr library was the client using?  If it's Java and
>SolrJ, then the following will apply.  If the client is written in a
>language other than Java, you may find that there are similar default
>settings in the HTTP library:
>
>A dependency called HttpClient is used by SolrJ.  The default settings
>for HttpClient are only capable of making *two* simultaneous connections
>to a target server.  Further connections will wait until existing
>connections are complete.  Unless it is overridden, SolrJ creates the
>HttpClient object with default settings.
>
>If more threads are needed, the SolrClient object must be built with a
>custom HttpClient.  Here's some SolrJ code to create a
>multithread-capable client object (300 threads to a single server):
>
>  RequestConfig rc = RequestConfig.custom().setConnectTimeout(15000)
>      .setSocketTimeout(Const.SOCKET_TIMEOUT).build();
>  httpClient = HttpClients.custom().setDefaultRequestConfig(rc)
>      .setMaxConnPerRoute(300).setMaxConnTotal(5000).disableAutomaticRetries()
>      .build();
>  client = new HttpSolrClient(serverBaseUrl, httpClient);
>
>I have also placed this code at the following URL.  It will expire in
>one month:
>
>http://apaste.info/BpoWY
>
>A similar technique can be used with CloudSolrClient if needed.
>
>It's my opinion that SolrJ needs to create client objects by default
>that are capable of more threads, but I have not yet put any time into
>making it happen.
>
>Thanks,
>Shawn
>



Re: Licensing issue advice for Solr.

2017-03-24 Thread Pablo Pita Leira
No answer from my side, but if you would like to search the mailing list, you 
can try this:


http://markmail.org/search/?q=license+list%3Aorg.apache.lucene.solr-user


On 24.03.2017 18:53, russell.lemas...@comcast.net wrote:

Hi all,

I'm just getting started with Solr (6.4.2) and am trying to get approval for 
usage in my workplace.
I know that the product in general is licensed as Apache 2.0, but unfortunately 
there are packages
included in the build that are considered "non-permissive" by my company and as 
such, means that
I am having trouble getting things approved.

It appears that the vast majority of the licensing issues are within the 
contrib directory. I know these
provide significant functionality for Solr, but I was wondering if there is an 
official build that contains
just the Solr and Lucene server distribution (minus demos and contrib). Some of 
the packages are
dual licensed so I am able to deal with that by selecting which we wish to use, 
but there are some
that are either not licensed at all or are only non-permissive (ie: not Apache, 
BSD, MIT, etc.) like
GPL, CDDL, etc.

Has anyone had to deal with this in the past? My apologies if this has been 
discussed before, but
it doesn't appear that the mail list archive has a search option (correct me if 
I'm wrong on that).

Thanks







Re: unable to get more throughput with more threads

2017-03-24 Thread Suresh Pendap
Erick,
- I think I checked that my QueryResultsCache and DocumentCache ratios
were close to 1. I will double check that by repeating my test.
- I think checking the QTimes in the log is a very good suggestion; I will
also check that the next time I run my test.
- It is not possible, as the client is just a Java client program which
just fires the queries using a REST client API.

Is there any way that SOLR publishes its thread pool statistics?

For example, in Cassandra you have a command like nodetool tpstats which
provides a nice table of stats for all the thread pools involved,
how many jobs are pending, etc.


Thanks
Suresh

On 3/23/17 9:33 PM, "Erick Erickson"  wrote:

>I'd check my I/O. Since you're firing the same query, I expect that
>you aren't I/O bound at all, since, as you say, the docs should
>already be in memory. This assumes that your document cache size is >
>0. You can check this. Go to the admin UI, select one of your cores
>(not collection) and go to plugins/stats. You should see the
>documentCache as one of the entries and you should be hitting an
>insane hit ratio close to 100% as your test runs.
>
>Also check your queryResultCache. That also should be near 100% in
>your test. Do note that these caches really never hit this "for real",
>but as you indicated this is a highly artificial test so such high hit
>ratios are what I'd expect.
>
>Assuming that those caches are being hit near 100%, Solr really isn't
>doing any work to speak of so there almost has to be some kind of
>queueing going on.
>
>The fact that your CPU is only running 8-10% is an indication that
>your requests are queued up somewhere, but where I have no clue. The
>Jetty thread pool is quite large.  What are the QTimes reported in the
>responses? My guess is that the QTime stays pretty constant (and very
>low) even as your response time increases, another indication that
>you're queueing.
>
>Hmmm, is it possible that the queueing is on the _client_ side?
>What aggregate throughput do you get if you fire up 10 _clients_ each
>with one thread rather than 1 client and 10 threads? That's a shot in
>the dark, but worth a try I suppose. And how does your client fire
>queries? SolrJ? Http? Jmeter or the like?
>
>But yeah, this is weird. Since you're firing the same query, Solr
>isn't really doing any work at all.
>
>Best,
>Erick
>
>On Thu, Mar 23, 2017 at 7:56 PM, Aman Deep Singh
> wrote:
>> You can play with the merge factor in the index config.
>> If there are no frequent updates then make it 2; it will give you high
>> throughput and less latency.
>>
>> On 24-Mar-2017 8:22 AM, "Zheng Lin Edwin Yeo" 
>>wrote:
>>
>>> I also did find that beyond 10 threads for 8GB heap size, there isn't
>>> much improvement with the performance. But you can increase your heap size a
>>> little if your system allows it.
>>>
>>> By the way, which Solr version are you using?
>>>
>>> Regards,
>>> Edwin
>>>
>>>
>>> On 24 March 2017 at 09:21, Matt Magnusson  wrote:
>>>
>>> > Out of curiosity, what is your index size? I'm trying to do something
>>> > similar with maximizing output, I'm currently looking at streaming
>>> > expressions which I'm seeing some interesting results for, I'm also
>>> > finding that the direct mass query route seems to hit a wall for
>>> > performance. I'm also finding that about 10 threads seems to be an
>>> > optimum number.
>>> >
>>> > On Thu, Mar 23, 2017 at 8:10 PM, Suresh Pendap  wrote:
>>> > > Hi,
>>> > > I am new to SOLR search engine technology and I am trying to get some
>>> > > performance numbers to get maximum throughput from the SOLR cluster of a
>>> > > given size.
>>> > > I am currently doing only query load testing in which I randomly fire a
>>> > > bunch of queries to the SOLR cluster to generate the query load.  I
>>> > > understand that it is not the ideal workload as the ingestion and commits
>>> > > happening invalidate the Solr Caches, so it is advisable to perform query
>>> > > load along with some documents being ingested.
>>> > >
>>> > > The SOLR cluster was made up of 2 shards and 2 replicas. So there were
>>> > > total 4 replicas serving the queries. The SOLR nodes were running on an LXD
>>> > > container with 12 cores and 88GB RAM.
>>> > > The heap size allocated was 8g min and 8g max. All the other SOLR
>>> > > configurations were default.
>>> > >
>>> > > The client node was running on an 8 core VM.
>>> > >
>>> > > I performed the test with 1 thread, 10 client threads and 50 client
>>> > > threads.  I noticed that as I increased the number of threads, the query
>>> > > latency kept increasing drastically which I was not expecting.
>>> > >
>>> > > Since my initial test was randomly picking queries from a file, I
>>> > > decided to keep things constant and ran the program which fired the same
>>> > > query again and again. Since it is the same query, all the documents

Re: Architecture suggestions

2017-03-24 Thread Shawn Heisey
On 3/24/2017 7:47 AM, vrindavda wrote:
>  In my case query rate will be average or say low, 100-120 concurrent
> requests.

That is not a low query rate.  A low query rate would be X queries per
second, where X is a small single-digit number.  If there are 100
*simultaneous* requests, then the query rate is likely at least several
hundred per second, which is very high.  Handling that many requests per
second with an index of the size you have mentioned is almost certainly
going to require more than two servers/replicas.

> As per my understanding replicas also aid shards in serving result documents;
> correct me if I am wrong.

SolrCloud will automatically load balance requests sent to a single
server across the cloud, taking advantage of multiple replicas. 
Depending on what kind of client software is in use, a separate load
balancer might still be a good idea, so the IP address and port isn't a
single point of failure.  If you have software that can move the IP
address to another machine in the event of a failure, that would
probably be enough.
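
If the client is SolrJ, the ZooKeeper-aware client is one way to avoid pointing
at any single Solr node at all.  A minimal sketch, assuming a three-node
ZooKeeper ensemble and an illustrative collection name:

  CloudSolrClient client =
      new CloudSolrClient("zk1:2181,zk2:2181,zk3:2181");
  client.setDefaultCollection("mycollection");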

Thanks,
Shawn



Licensing issue advice for Solr.

2017-03-24 Thread russell . lemaster
Hi all, 

I'm just getting started with Solr (6.4.2) and am trying to get approval for 
usage in my workplace. 
I know that the product in general is licensed as Apache 2.0, but unfortunately 
there are packages 
included in the build that are considered "non-permissive" by my company and as 
such, means that 
I am having trouble getting things approved. 

It appears that the vast majority of the licensing issues are within the 
contrib directory. I know these 
provide significant functionality for Solr, but I was wondering if there is an 
official build that contains 
just the Solr and Lucene server distribution (minus demos and contrib). Some of 
the packages are 
dual licensed so I am able to deal with that by selecting which we wish to use, 
but there are some 
that are either not licensed at all or are only non-permissive (ie: not Apache, 
BSD, MIT, etc.) like 
GPL, CDDL, etc. 

Has anyone had to deal with this in the past? My apologies if this has been
discussed before, but it doesn't appear that the mailing list archive has a
search option (correct me if I'm wrong on that).

Thanks 




Re: to handle expired documents: collection alias or delete by id query

2017-03-24 Thread Tom Evans
On Thu, Mar 23, 2017 at 6:10 AM, Derek Poh  wrote:
> Hi
>
> I have collections of products. I am doing indexing 3-4 times daily.
> Every day there are products that expired and I need to remove them from
> these collections daily.
>
> I can think of 2 ways to do this.
> 1. using a collection alias to switch between a main and temp collection.
> - clear and index the temp collection
> - create alias to temp collection.
> - clear and index the main collection.
> - create alias to main collection.
>
> this way requires additional collections.
>

Another way of doing this is to have a moving alias (not constantly
clearing the "temp" collection). If you reindex daily, your index
would be called "products_yyyymmdd" with an alias to "products". The
advantage of this is that you can roll back to a previous version of
the index if there are problems, and each index is guaranteed to be
freshly created with no artifacts.

The biggest consideration for me would be how long indexing your full
corpus takes you. If you can do it in a small period of time, then
full indexes would be preferable. If it takes a very long time,
deleting is preferable.

If you are doing a cloud setup, full indexes are even more appealing.
You can create the new collection on a single node (even if sharded;
just place each shard on the same node). This would only place the
indexing cost on that one node, whilst other nodes would be unaffected
by indexing degrading regular query response time. You also don't have
to distribute the documents around the cluster. There is no
distributed indexing in Solr, each replica has to index each document
again, even if it is not the leader.

Once indexing is complete, you can expand the collection by adding
replicas of that shard on other nodes - perhaps even removing it from
the node that did the indexing. We have a node that solely does
indexing, before the collection is queried for anything it is added to
the querying nodes.

You can do this manually, or you can automate it using the collections API.
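
For example, the manual version with the Collections API (collection names,
shard counts and node names are illustrative):

  # create the day's collection on the indexing node, then index into it
  curl 'http://localhost:8983/solr/admin/collections?action=CREATE&name=products_20170324&numShards=2&replicationFactor=1&createNodeSet=indexnode:8983_solr'

  # once indexing is finished, expand onto the query nodes
  curl 'http://localhost:8983/solr/admin/collections?action=ADDREPLICA&collection=products_20170324&shard=shard1&node=querynode1:8983_solr'

  # atomically repoint the alias that the application queries
  curl 'http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=products&collections=products_20170324'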

Cheers

Tom


SQL rpt_location question

2017-03-24 Thread GW
Dear reader,

I've found that using the distinct clause gives me the list I want.

I also have a multivalued rpt_location in the collection that I'd like to
use in the filter.

Is this possible in any way, shape or form?

Many thanks in advance,

Greg


Re: Architecture suggestions

2017-03-24 Thread vrindavda
Thanks Shawn,
 In my case query rate will be average or say low, 100-120 concurrent
requests.

As per my understanding replicas also aid shards in serving result documents;
correct me if I am wrong.

Moreover, I intend to have a fault tolerant architecture, hence opting for
shards/replicas on different servers.

Please advise.

Thanks,
Vrinda Davda



On 24-Mar-2017 6:53 PM, "Shawn Heisey-2 [via Lucene]" <
ml-node+s472066n4326641...@n3.nabble.com> wrote:

On 3/24/2017 1:15 AM, vrindavda wrote:
> Thanks Erick and Emir , for your prompt reply.
>
> We are expecting around 50M documents to sit on 80GB . I understand that
> there is no equation to predict the number/size of server. But
considering
> to have minimal fault tolerant architecture, Will 2 shards and 2 replicas
> with 128GB RAM, 4 core solr instance be advisable ? Will that suffice ?
>
> I am planning to use two solr instances for shards and replicas each and
3
> instances for zookeeper. Please suggest if I am in right direction.

If you have two servers with 128GB and the entire index will be 80GB in
size, this should work well.  The heap would likely be fine at around
8GB, so each server would have a complete copy of the index and would
have enough memory available to cache it entirely.  With two servers,
you want two replicas, regardless of the number of shards.  When I say
two replicas, I am talking about a total of two copies -- not a leader
and two followers.

If the query rate is very low, then sharding would be worthwhile,
because multiple CPUs will be used by a single query.  If the query rate
is high, then you would want all the documents in a single shard, so the
CPUs are not overwhelmed.  If you don't know what the query rate will
be, assume it will be high.

A more detailed discussion:

https://wiki.apache.org/solr/SolrPerformanceProblems

Thanks,
Shawn









Re: Architecture suggestions

2017-03-24 Thread Shawn Heisey
On 3/24/2017 1:15 AM, vrindavda wrote:
> Thanks Erick and Emir , for your prompt reply.
>
> We are expecting around 50M documents to sit on 80GB . I understand that
> there is no equation to predict the number/size of server. But considering
> to have minimal fault tolerant architecture, Will 2 shards and 2 replicas
> with 128GB RAM, 4 core solr instance be advisable ? Will that suffice ?
>
> I am planning to use two solr instances for shards and replicas each and 3
> instances for zookeeper. Please suggest if I am in right direction.

If you have two servers with 128GB and the entire index will be 80GB in
size, this should work well.  The heap would likely be fine at around
8GB, so each server would have a complete copy of the index and would
have enough memory available to cache it entirely.  With two servers,
you want two replicas, regardless of the number of shards.  When I say
two replicas, I am talking about a total of two copies -- not a leader
and two followers.

If the query rate is very low, then sharding would be worthwhile,
because multiple CPUs will be used by a single query.  If the query rate
is high, then you would want all the documents in a single shard, so the
CPUs are not overwhelmed.  If you don't know what the query rate will
be, assume it will be high.

A more detailed discussion:

https://wiki.apache.org/solr/SolrPerformanceProblems

Thanks,
Shawn



Re: unable to get more throughput with more threads

2017-03-24 Thread Shawn Heisey
On 3/23/2017 6:10 PM, Suresh Pendap wrote:
> I performed the test with 1 thread, 10 client threads and 50 client
> threads. I noticed that as I increased the number of threads, the
> query latency kept increasing drastically which I was not expecting. 

What language and Solr library was the client using?  If it's Java and
SolrJ, then the following will apply.  If the client is written in a
language other than Java, you may find that there are similar default
settings in the HTTP library:

A dependency called HttpClient is used by SolrJ.  The default settings
for HttpClient are only capable of making *two* simultaneous connections
to a target server.  Further connections will wait until existing
connections are complete.  Unless it is overridden, SolrJ creates the
HttpClient object with default settings.

If more threads are needed, the SolrClient object must be built with a
custom HttpClient.  Here's some SolrJ code to create a
multithread-capable client object (300 threads to a single server):

  RequestConfig rc = RequestConfig.custom().setConnectTimeout(15000)
      .setSocketTimeout(Const.SOCKET_TIMEOUT).build();
  httpClient = HttpClients.custom().setDefaultRequestConfig(rc)
      .setMaxConnPerRoute(300).setMaxConnTotal(5000).disableAutomaticRetries()
      .build();
  client = new HttpSolrClient(serverBaseUrl, httpClient);

I have also placed this code at the following URL.  It will expire in
one month:

http://apaste.info/BpoWY

A similar technique can be used with CloudSolrClient if needed.

It's my opinion that SolrJ needs to create client objects by default
that are capable of more threads, but I have not yet put any time into
making it happen.

Thanks,
Shawn



Difference between hashJoin and innerJoin in Streaming Expression

2017-03-24 Thread Zheng Lin Edwin Yeo
Hi,

What is the main difference between hashJoin and innerJoin in Solr
Streaming Expression?

I understand that both will emit a tuple containing the fields of both
tuples.

When I tried both hashJoin and innerJoin with the same query, I get exactly
the same results, and there is no difference in performance.

Under what circumstances should we use hashJoin, and under what
circumstances should we use innerJoin?

Regards,
Edwin


Re: Architecture suggestions

2017-03-24 Thread vrindavda
Thanks Erick and Emir, for your prompt reply.

We are expecting around 50M documents to sit on 80GB. I understand that
there is no equation to predict the number/size of servers. But considering
that we want a minimal fault tolerant architecture, will 2 shards and 2 replicas
with 128GB RAM, 4-core Solr instances be advisable? Will that suffice?

I am planning to use two Solr instances for shards and replicas each and 3
instances for zookeeper. Please suggest if I am in the right direction.


