Re: Index very large number of documents from large number of clients

2015-08-15 Thread Toke Eskildsen
Troy Edwards wrote:
> 1) There are about 6000 clients
> 2) The number of documents from each client are about 50 (average document size is about 400 bytes)

So roughly 3 billion documents / 1 TB index size. So at least 2 shards, due to the 2 billion document limit in Lucene. If you want more advice t
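Toke's back-of-the-envelope numbers can be sketched as a quick calculation. This is only a sketch of the estimate in the message; the per-shard ceiling used here is Lucene's hard limit of 2^31 - 1 documents per index:

```python
import math

LUCENE_MAX_DOCS = (1 << 31) - 1  # ~2.147 billion docs per Lucene index (one shard)

total_docs = 3_000_000_000       # Toke's rough total: ~3 billion documents
avg_doc_bytes = 400              # average document size from the thread

# Minimum shard count forced purely by the per-shard document limit.
min_shards = math.ceil(total_docs / LUCENE_MAX_DOCS)   # -> 2

# Very rough raw-data size; the actual index size depends on analysis,
# stored fields, and doc values, so treat this as an order of magnitude.
approx_tb = total_docs * avg_doc_bytes / 1e12          # -> 1.2 (TB)

print(min_shards, round(approx_tb, 1))
```

In practice you would use far more than the bare minimum of 2 shards at this scale, so that each shard stays a manageable size; the point of the calculation is only that one shard is impossible.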

Re: Index very large number of documents from large number of clients

2015-08-15 Thread Erick Erickson
Piling on here. At the scale you're talking about, I suspect you'll not only have a bunch of servers, you'll really have a bunch of completely separate "Solr Clouds", complete with their own ZooKeepers etc. Partly for administration's sake, partly for stability, etc. Not sure that'll be true, mind you, bu

Re: Admin Login

2015-08-15 Thread Erick Erickson
Scott: You'd better not even let them access Solr directly: http://server:port/solr/admin/collections?ACTION=delete&name=collection. Try it sometime on a collection that's not important ;) But as Walter said, that'd be similar to allowing end users unrestricted access to a SQL database, t
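Erick's point is easy to demonstrate: an unexposed-by-auth Solr node accepts destructive Collections API calls from anyone who can reach its port. A minimal sketch (the hostname and collection name are hypothetical; note that the Collections API spells the parameter `action=DELETE`):

```python
from urllib.parse import urlencode

# Hypothetical host/port; any client that can reach this port can do this.
base = "http://solr.example.com:8983/solr/admin/collections"
params = urlencode({"action": "DELETE", "name": "important-collection"})
delete_url = f"{base}?{params}"

# One unauthenticated GET to this URL drops the whole collection,
# which is why Solr should never be exposed directly to end users.
print(delete_url)
```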

Re: Index very large number of documents from large number of clients

2015-08-15 Thread Shawn Heisey
On 8/15/2015 2:03 PM, Troy Edwards wrote:
> I am using SolrCloud
>
> My initial requirements are:
>
> 1) There are about 6000 clients
> 2) The number of documents from each client are about 50 (average document size is about 400 bytes)
> 3) I have to wipe off the index/collection every night

Re: Admin Login

2015-08-15 Thread Scott Derrick
Walter, actually that explains it perfectly! I will move it behind my Apache server... thanks, Scott

On 8/15/2015 6:15 PM, Walter Underwood wrote:
> No one runs a public-facing Solr server. Just like no one runs a public-facing MySQL server. wunder Walter Underwood wun...@wunderwood.org http://
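For reference, "moving behind Apache" usually means a reverse proxy with authentication in front of Solr, with Solr's own port firewalled to localhost. A minimal httpd sketch, not from the thread; the paths, realm name, and htpasswd file are assumptions:

```apache
# httpd.conf sketch: proxy /solr to a local Solr node and require a login.
# Assumes mod_proxy, mod_proxy_http, and mod_auth_basic are loaded.
<Location "/solr">
    ProxyPass        "http://127.0.0.1:8983/solr"
    ProxyPassReverse "http://127.0.0.1:8983/solr"
    AuthType Basic
    AuthName "Solr Admin"
    AuthUserFile "/etc/httpd/solr.htpasswd"
    Require valid-user
</Location>
```

This only helps if port 8983 is not reachable from outside the host; otherwise clients can simply bypass the proxy.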

Re: Admin Login

2015-08-15 Thread Walter Underwood
No one runs a public-facing Solr server. Just like no one runs a public-facing MySQL server.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)

On Aug 15, 2015, at 4:15 PM, Scott Derrick wrote:
> I'm somewhat puzzled there is no built-in security. I can'

Admin Login

2015-08-15 Thread Scott Derrick
I'm somewhat puzzled there is no built-in security. I can't imagine anybody is running a public-facing Solr server with the admin page wide open? I've searched and haven't found any solutions that work out of the box. I've tried the solutions here to no avail. https://wiki.apache.org/solr/Solr

Re: Cache for percentiles facets

2015-08-15 Thread Erick Erickson
You have to provide a lot more info about your problem, including what you've tried, what your data looks like, etc. You might review: http://wiki.apache.org/solr/UsingMailingLists Best, Erick On Sat, Aug 15, 2015 at 10:27 AM, Håvard Wahl Kongsgård wrote: > Hi, I have tried various options to s

Re: Index very large number of documents from large number of clients

2015-08-15 Thread Alexandre Rafalovitch
This is beyond my direct area of expertise, but one way to look at this would be:
1) Create new collections offline. Down to each of the 6000 clients having its own private collection (embedded SolrJ/server). Or some sort of mini-hubs, e.g. a server per N clients.
2) Bring those collections into ce
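An offline-build workflow like Alexandre's usually ends with an atomic alias swap, so queries move to the new collection only once it is fully built, and the old one can then be dropped. A sketch of the Collections API calls involved (host, alias, and collection names are hypothetical):

```python
from urllib.parse import urlencode

base = "http://solr.example.com:8983/solr/admin/collections"

def api(action, **params):
    """Build a Collections API URL for the given action (sketch only)."""
    return f"{base}?{urlencode({'action': action, **params})}"

# 1) Build tonight's collection offline and index into it, then swap:
swap = api("CREATEALIAS", name="clients", collections="clients_2015_08_16")

# 2) Queries against the "clients" alias now hit the new collection;
#    yesterday's collection can be deleted afterwards.
drop_old = api("DELETE", name="clients_2015_08_15")

print(swap)
print(drop_old)
```

CREATEALIAS repoints an existing alias in one step, which is what makes the cutover effectively atomic from the query side.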

Index very large number of documents from large number of clients

2015-08-15 Thread Troy Edwards
I am using SolrCloud. My initial requirements are:
1) There are about 6000 clients
2) The number of documents from each client are about 50 (average document size is about 400 bytes)
3) I have to wipe off the index/collection every night and create new
Any thoughts/ideas/suggestions on:
1) Ho
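Requirement 3 (wipe nightly and rebuild) has two common shapes: delete-by-query on the existing collection, or dropping and recreating the collection. A sketch of the requests for both, with a hypothetical host and collection name; the shard/replica counts are placeholders, not a recommendation:

```python
import json
from urllib.parse import urlencode

base = "http://solr.example.com:8983/solr"
collection = "clients"  # hypothetical

# Option A: clear all documents in place, committing the deletion.
wipe_url = f"{base}/{collection}/update?commit=true"
wipe_body = json.dumps({"delete": {"query": "*:*"}})

# Option B: drop and recreate the collection via the Collections API,
# which also discards deleted-document overhead and old segment files.
drop_url = f"{base}/admin/collections?" + urlencode(
    {"action": "DELETE", "name": collection})
create_url = f"{base}/admin/collections?" + urlencode(
    {"action": "CREATE", "name": collection,
     "numShards": 2, "replicationFactor": 2})

print(wipe_url, wipe_body)
print(drop_url)
print(create_url)
```

At billions of documents, Option B (or the build-offline-and-alias-swap approach suggested elsewhere in this thread) is usually preferred over a giant delete-by-query.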

Cache for percentiles facets

2015-08-15 Thread Håvard Wahl Kongsgård
Hi, I have tried various options to speed up percentile calculation for facets, but the internal Solr cache only speeds up my queries from 22 to 19 sec. I'm using the new JSON facets: http://yonik.com/json-facet-api/ Any tips for caching stats? -Håvard
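For context, a percentile stat in the JSON Facet API that Håvard references looks roughly like the request below. This is a sketch following the syntax documented at yonik.com/json-facet-api; the field names are hypothetical:

```python
import json

# Sketch of a JSON Facet request: per-category price percentiles.
facet_request = {
    "query": "*:*",
    "facet": {
        "by_category": {
            "type": "terms",
            "field": "category",        # hypothetical field
            "facet": {
                # 50th/90th/99th percentile of "price" within each bucket.
                "price_pct": "percentile(price,50,90,99)"
            }
        }
    }
}

payload = json.dumps(facet_request)
print(payload)
```

Percentiles are computed per request over the matching documents, which is why ordinary result caching helps so little here: a different query means the stat must be recomputed.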

Re: phonetic filter factory question

2015-08-15 Thread Alexandre Rafalovitch
From the "teaching to fish" category of advice (since I don't know the actual answer): did you try the "Analysis" screen in the Admin UI? If you check the "Verbose output" box, you will see all the offsets and can easily confirm the detailed behavior for yourself. Regards, Alex. Solr Analyzers,

phonetic filter factory question

2015-08-15 Thread Jamie Johnson
The JavaDoc says that the PhoneticFilterFactory will "inject" tokens with an offset of 0 into the stream. I'm assuming this means an offset of 0 from the token that it is analyzing; is that right? I am trying to collapse some of my schema; I currently have a text field that I use for general purp
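On the terminology: what the JavaDoc describes is a position increment of 0, i.e. with inject="true" the phonetic code is added at the same token position as the original token it was derived from (character offsets are copied from that token). A schema sketch; the fieldType name and encoder choice are illustrative, not from the thread:

```xml
<!-- Sketch: analyzer that injects phonetic codes alongside the original
     tokens. With inject="true", each phonetic token is emitted with a
     position increment of 0, so it occupies the SAME position as the
     token it came from and phrase/position queries still line up. -->
<fieldType name="text_phonetic" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PhoneticFilterFactory" encoder="DoubleMetaphone" inject="true"/>
  </analyzer>
</fieldType>
```

The Admin UI Analysis screen suggested in the reply above will show both tokens sharing one position, which is an easy way to confirm this.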

Re: Big SolrCloud cluster with a lot of collections

2015-08-15 Thread Toke Eskildsen
yura last wrote:
> Hi All, I am testing a SolrCloud with many collections. The version is 5.2.1 and I installed 3 machines, each one with 4 cores and 8 GB RAM. Then I created collections with 3 shards and a replication factor of 2. It gives me 2 cores per collection on each machine. I reached a

Re: Big SolrCloud cluster with a lot of collections

2015-08-15 Thread Jack Krupansky
1. Keep the number of collections down to the low hundreds max. Preferably no more than a few dozen or a hundred.
2. 8 GB is too small to be useful. 16 GB min.
3. If you need large numbers of machines, organize them as separate clusters.
4. Figure 100 to 200 million documents on a Solr server. E.g.,

Re: Solr relevant results

2015-08-15 Thread Alexandre Rafalovitch
If I understood your question correctly, that's what I am suggesting to try. Note that, as I mentioned earlier, this ignores all the complexity of similarity, ranking, etc. that Solr offers. But it does not seem you need it in your particular case, as you are just searching for presence/absence o

Big SolrCloud cluster with a lot of collections

2015-08-15 Thread yura last
Hi All, I am testing a SolrCloud with many collections. The version is 5.2.1 and I installed 3 machines, each one with 4 cores and 8 GB RAM. Then I created collections with 3 shards and a replication factor of 2. It gives me 2 cores per collection on each machine. I reached almost 900 collections a
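The arithmetic behind this cluster is worth making explicit, since it is what the replies above are reacting to. A sketch using only the numbers in the message ("core" here means a Solr core, not a CPU core):

```python
machines = 3
collections = 900
shards_per_collection = 3
replication_factor = 2

# Every shard replica is one Solr core somewhere in the cluster.
total_cores = collections * shards_per_collection * replication_factor
cores_per_machine = total_cores // machines

print(total_cores)        # Solr cores cluster-wide
print(cores_per_machine)  # Solr cores hosted per machine
```

That works out to 5400 cores across the cluster and 1800 per machine, each machine having 8 GB of RAM, which is why the replies recommend keeping collection counts in the low hundreds at most.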