Re: Shard size variation

2018-04-30 Thread Shawn Heisey
On 4/30/2018 2:56 PM, Michael Joyner wrote: > Based on experience, 2x headroom is not always enough, > sometimes not even 3x, if you are optimizing from many segments down > to 1 segment in a single go. In all situations a user is likely to encounter in the wild, having enough extra disk

Re: Load Balancing between Two Cloud Clusters

2018-04-30 Thread Shawn Heisey
On 4/30/2018 12:03 PM, Monica Skidmore wrote: > As we try to set up an external load balancer to go between two clusters, > though, we still have some questions. We need a way to determine that a node > is still 'alive' and should be in the load balancer, and we need a way to > know that a new

Re: Shard size variation

2018-04-30 Thread Antony A
Thank you all. I have around 70% free space in production. I will compute for the additional fields. > On Apr 30, 2018, at 5:10 PM, Erick Erickson wrote: > > There's really no good way to purge deleted documents from the

Re: Load Balancing between Two Cloud Clusters

2018-04-30 Thread Erick Erickson
"We need a way to determine that a node is still 'alive' and should be in the load balancer, and we need a way to know that a new node is now available and fully ready with its replicas to add to the load balancer." Why? If a Solr node is running but the replicas aren't up yet, it'll pass the

Re: Shard size variation

2018-04-30 Thread Erick Erickson
There's really no good way to purge deleted documents from the index other than to wait until merging happens. Optimize/forceMerge and expungeDeletes both suffer from the problem that they create massive segments that then stick around for a very long time, see:

Re: Shard size variation

2018-04-30 Thread Michael Joyner
Based on experience, 2x headroom is not always enough, sometimes not even 3x, if you are optimizing from many segments down to 1 segment in a single go. We have however figured out a way that can work with as little as 51% free space via the following iteration cycle: public void

Re: Shard size variation

2018-04-30 Thread Walter Underwood
You need 2X the minimum index size in disk space anyway, so don’t worry about keeping the indexes as small as possible. Worry about having enough headroom. If your indexes are 250 GB, you need 250 GB of free space. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/
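
A minimal sketch of the headroom rule above (free space at least equal to the index size), written in Python. The function names and the idea of walking an index directory are assumptions made for illustration; none of this is part of Solr.

```python
import os
import shutil

GB = 10**9  # decimal gigabytes, used only for the example figures


def has_merge_headroom(index_bytes, free_bytes, factor=1.0):
    """Return True if free space is at least factor * index size.

    factor=1.0 encodes the "250 GB of index needs 250 GB free" rule;
    raise it if a more conservative margin is wanted.
    """
    return free_bytes >= factor * index_bytes


def index_size_bytes(index_dir):
    """Sum the sizes of all regular files under index_dir (hypothetical path)."""
    total = 0
    for root, _dirs, files in os.walk(index_dir):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def check_index_dir(index_dir, factor=1.0):
    """Compare the on-disk index size with the free space on the same volume."""
    free = shutil.disk_usage(index_dir).free
    return has_merge_headroom(index_size_bytes(index_dir), free, factor)


# The figures from the message: a 250 GB index with 250 GB free passes,
# the same index with only 200 GB free does not.
print(has_merge_headroom(250 * GB, 250 * GB))
print(has_merge_headroom(250 * GB, 200 * GB))
```

To check a real node, check_index_dir would be pointed at the directory that holds the index files; that location depends on the installation and is not given in this thread.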

Re: Shard size variation

2018-04-30 Thread Antony A
Thanks Erick/Deepak. The cloud is running on bare metal (128 GB/24 CPUs). Is there an option to run a compact on the data files to make the size equal on both the clouds? I am trying to find all the options before I add the new fields into the production cloud. Thanks AA On Mon, Apr 30, 2018 at

Re: Confusing SOLR results after upgrading from 4.10 to 7.1

2018-04-30 Thread Susheel Kumar
This may not be the reason, but I noticed you have FlattenGraphFilterFactory at query time while it's only required at index time. I would suggest checking the Analysis tab if you have not already. Thanks On Mon, Apr 30, 2018 at 2:22 PM, Hodder, Rick wrote: > I upgraded from SOLR 4.10

Confusing SOLR results after upgrading from 4.10 to 7.1

2018-04-30 Thread Hodder, Rick
I upgraded from SOLR 4.10 to SOLR 7.1 In the core, I have a string field called "company" and string field "year", and I have an index on company called IDX_Company. Here is the definition of the company field, and the definition of text_general in my schema in 4.10

Re: Load Balancing between Two Cloud Clusters

2018-04-30 Thread Monica Skidmore
Thank you, Erick. That confirms our understanding for a single cluster, or once we select a node from one of the two clusters to query. As we try to set up an external load balancer to go between two clusters, though, we still have some questions. We need a way to determine that a node is

Re: Load Balancing between Two Cloud Clusters

2018-04-30 Thread Erick Erickson
Multiple clusters with the same dataset aren't load-balanced by Solr, you'll have to accomplish that from "outside", e.g. something that sends queries to each cluster. _Within_ a cluster (collection), as long as a request gets to any Solr node, sub-requests are distributed with an internal

Re: Team please help

2018-04-30 Thread Greg Solovyev
Sujeet, what do you mean by migrating? E.g., are you moving your data from Cloudera CDH to Azure HDI? Are you migrating your application code written on top of Cloudera CDH to run on top of Azure HDI? As far as I know, Azure HDI does not include Solr, so if your application on top of Cloudera CDH is

Load Balancing between Two Cloud Clusters

2018-04-30 Thread Monica Skidmore
We are migrating from a master-slave configuration to Solr cloud (7.3) and have questions about the preferred way to load balance between the two clusters. It looks like we want to use a load balancer that directs queries to any of the server nodes in either cluster, trusting that node to

Re: Shard size variation

2018-04-30 Thread Erick Erickson
Antony: You are probably seeing the results of removing deleted documents from the shards as they're merged. Even on replicas in the same _shard_, the size of the index on disk won't necessarily be identical. This has to do with which segments are selected for merging, which are not necessarily

Re: Multiple Solr Versions on same tomcat instance

2018-04-30 Thread THADC
Thank you Shawn. We will go ahead and migrate to Solr 7.3.0 and run it standalone, so I do not need to worry about the Tomcat issue. -- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: Shard size variation

2018-04-30 Thread Deepak Goel
Could you please also give the machine details of the two clouds you are running? Deepak

Re: Shard size variation

2018-04-30 Thread Antony A
Hi Shawn, The cloud is running version 6.2.1 with ClassicIndexSchemaFactory. The sum of the sizes shown in the admin UI across all shards is around 265 GB vs. 224 GB between the two clouds. I created the collection using "numShards", so it uses the compositeId router. If you need more information, please let me know.

Re: Shard size variation

2018-04-30 Thread Shawn Heisey
On 4/30/2018 9:51 AM, Antony A wrote: I am running two separate solr clouds. I have 8 shards in each with a total of 300 million documents. Both the clouds are indexing the document from the same source/configuration. I am noticing there is a difference in the size of the collection between

Shard size variation

2018-04-30 Thread Antony A
Hi all, I am trying to find if anyone has suggestion for the below. I am running two separate solr clouds. I have 8 shards in each with a total of 300 million documents. Both the clouds are indexing the document from the same source/configuration. I am noticing there is a difference in the size

Re: Multiple Solr Versions on same tomcat instance

2018-04-30 Thread Shawn Heisey
On 4/30/2018 4:32 AM, THADC wrote: First of all, I have a second (unrelated) question on this solr user group. I hope it is ok to have more than one question being asked at the same time against this group. Please let me know if not. Anyway, I have a need to keep our existing solr version 4.7

RE: Missing bin folder

2018-04-30 Thread Lucía Sarni Cornes
Thanks Erick! Problem solved. From: Erick Erickson Sent: Monday, April 30, 2018 12:07 To: solr-user Subject: Re: Missing bin folder This was only added in 4.10, see: SOLR-3617 Best, Erick On Mon, Apr 30, 2018 at 6:24 AM, Lucía

Re: Missing bin folder

2018-04-30 Thread Erick Erickson
This was only added in 4.10, see: SOLR-3617 Best, Erick On Mon, Apr 30, 2018 at 6:24 AM, Lucía Sarni Cornes wrote: > I want to use Solr 4.8.1, and I download it from here but it's missing the > bin folder https://archive.apache.org/dist/lucene/solr/4.8.1/ > This seems

Re: solr can't find my config set when creating a new collection

2018-04-30 Thread Erick Erickson
The error message you were getting was that "multiValued" must be camel-cased: multivalued vs. multiValued. The latter is correct. On Mon, Apr 30, 2018 at 6:25 AM, THADC wrote: > ok, I have fixed my issue. I needed to delete the config set first, so I did: >

Re: solr can't find my config set when creating a new collection

2018-04-30 Thread THADC
OK, I have fixed my issue. I needed to delete the config set first, so I did: http://localhost:8983/solr/admin/configs?action=DELETE&name=timConfig , then I ran: ./bin/solr zk upconfig -n timConfig -d /home/tim/solr-7.3.0/server/solr/configsets/timConfig/ -z localhost:2181 , then tried creating

Missing bin folder

2018-04-30 Thread Lucía Sarni Cornes
I want to use Solr 4.8.1, and I downloaded it from here, but it's missing the bin folder: https://archive.apache.org/dist/lucene/solr/4.8.1/ This seems to be the case for other versions on the archive. I did install the latest version using brew but I need an older version that has the

Parent child documents partial update

2018-04-30 Thread Krishna Kumar Sharma
Hello, can I do a partial update of a parent document? Please help. Thanks, Krishna Kumar Sharma

Re: Regarding LTR feature

2018-04-30 Thread Alessandro Benedetti
Hi Prateek, with query and FQ Solr is expected to score a document only if that document is a match of all the FQ results intersected with the query results [1]. Then re-ranking happens, so effectively, only the top K intersected documents will be re-ranked. If you are curious about the code,

Re: How to create a solr collection providing as much searching flexibility as possible?

2018-04-30 Thread Alessandro Benedetti
Hi Raymond, as Charlie correctly stated, the input format is not that important, what is important is to focus on your requirements and properly design a configuration and data model to solve them. Extracting the information for such a data format is not going to be particularly challenging ( as

Multiple Solr Versions on same tomcat instance

2018-04-30 Thread THADC
Hello, First of all, I have a second (unrelated) question on this solr user group. I hope it is ok to have more than one question being asked at the same time against this group. Please let me know if not. Anyway, I have a need to keep our existing solr version 4.7 instance running while I test

Re: Error when starting standalone zookeeper instance

2018-04-30 Thread THADC
Sorry, I found the ZooKeeper group and got my question answered there. I will be more careful in the future.

Re: solr can't find my config set when creating a new collection

2018-04-30 Thread THADC
thanks, I got further this time after first uploading. I ran: ./bin/solr zk upconfig -n timConfig -d /home/tim/solr-7.3.0/server/solr/configsets/timConfig/ -z localhost:2181 However, I then got an error when trying to create the collections again: responseHeader status 0 QTime

404 error on Solr 7.2.1 dataimport handler (on Windows via Cygwin)

2018-04-30 Thread PeterKerk
I'm running Solr 7.2.1 on Windows via Cygwin. I've installed Solr 7.2.1 but I'm getting a 404 when trying to run the dataimport handler: http://localhost:8983/solr/tt-giftsamplecatalog/dataimport?command=full-import After calling this URL, I don't see any logging in the console. The error in my

Re: How to create a solr collection providing as much searching flexibility as possible?

2018-04-30 Thread Charlie Hull
On 29/04/2018 22:25, Raymond Xie wrote: Thank you Alessandro, It looks like my requirement is vague, but indeed I already indicated my data is in FIX format, which is a format, here is an example in the Wiki link in my original question: 8=FIX.4.2 | 9=178 | 35=8 | 49=PHLX | 56=PERS |