Hi Luca,
I'm not sure I understood correctly. Is your question
"Why are indexing times on a SolrCloud collection with 2 replicas higher than
on a SolrCloud collection with 1 replica?"
With 2 replicas, every document has to be indexed separately in 2 places, and
Solr has to confirm that both indexing operations succeeded.
Indexing times are lower on a SolrCloud collection with 2 shards (just one
replica, the leader, per shard) because each document is indexed only once and
the load is spread across 2 servers instead of one.
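
To make the difference concrete, here is a rough sketch of the amount of indexing work in each layout. It is only a simplified model, not how Solr accounts for anything internally: the function name `indexing_work` and its parameters are made up for illustration, one "write" stands for indexing one document on one core, the replication factor counts the leader, and cores are assumed to be spread evenly over the servers.

```python
def indexing_work(num_docs, num_shards, replication_factor, num_servers):
    """Simplified model (an assumption, not Solr's own accounting).

    Each document is routed to exactly one shard and is indexed once on
    every replica of that shard, the leader included.
    """
    docs_per_shard = num_docs / num_shards
    cores = num_shards * replication_factor   # every replica is a full copy of its shard
    total_writes = docs_per_shard * cores     # equals num_docs * replication_factor
    writes_per_server = total_writes / num_servers
    return total_writes, writes_per_server


# 1000 documents on 2 servers in both cases.
one_shard_two_replicas = indexing_work(1000, num_shards=1, replication_factor=2, num_servers=2)
two_shards_one_replica = indexing_work(1000, num_shards=2, replication_factor=1, num_servers=2)

print(one_shard_two_replicas)   # total writes, writes per server
print(two_shards_one_replica)
```

In this model the replicated layout does twice the total work, and each server still indexes every document, while the sharded layout halves the work per server. The model leaves out the extra wait for both replicas to confirm.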
2015-12-30 2:03 GMT+01:00 Luca Quarello <[email protected]>:
> Hi,
>
> I have a 260M-document index (90 GB) with this structure:
>
>
> <field name="fragment" type="text_general" indexed="true" stored="true"
> multiValued="false" termVectors="false" termPositions="false"
> termOffsets="false" />
>
> <field name="parentId" type="long" indexed="false" stored="true"
> multiValued="false"/>
>
> <field name="fragmentContentType" type="string" indexed="false"
> stored="true" multiValued="false"/>
>
> <field name="creationDate" type="date" indexed="true" stored="true"
> multiValued="false"/>
>
> <field name="creationTimestamp" type="date" indexed="true" stored="true"
> multiValued="false"/>
>
> <field name="visibility" type="string" indexed="true" stored="true"
> multiValued="false"/>
>
> <field name="category" type="string" indexed="true" stored="true"
> multiValued="false"/>
>
> <field name="marked" type="string" indexed="true" stored="true"
> multiValued="false"/>
>
> <!-- catchall field, containing all other searchable text fields
> (implemented via copyField further on in this schema) -->
>
> <field name="text" type="text_general" indexed="true" stored="false"
> multiValued="true"/>
>
> <copyField source="fragment" dest="text"/>
>
> <copyField source="parentId" dest="text"/>
>
> <copyField source="fragmentContentType" dest="text"/>
>
> <copyField source="creationDate" dest="text"/>
>
> <copyField source="visibility" dest="text"/>
>
> <copyField source="category" dest="text"/>
>
> <copyField source="marked" dest="text"/>
>
>
> where the fragment field contains XML messages.
>
> There is a search function that returns the messages satisfying a search
> criterion.
>
>
> TARGET:
>
> Find the configuration that gives the best response time for a
> two-instance SolrCloud cluster running on 2 VMs with 8 cores and 32 GB.
>
>
> TEST RESULTS:
>
>
> 1. Configurations:
>
>    Best configuration without replicas:
>    - CONF1: 16 shards of 17M documents (8 per VM)
>
>    Configurations with replicas:
>    - CONF2: 8 shards of 35M documents with a replication factor of 1
>    - CONF3: 16 shards of 35M documents with a replication factor of 1
>
>
>
> 2. Executed tests:
>
>    - sequential requests
>    - 5 parallel requests
>    - 10 parallel requests
>    - 20 parallel requests
>
>    in two scenarios: during an indexing phase and with no indexing running.
>
>
> The calls are of the form:
> http://localhost:8983/solr/sepa/select?q=+fragment%3A*AAA*+&fq=marked%3AT&fq=-fragmentContentType%3ABULK&start=0&rows=100&sort=creationTimestamp+desc%2Cid+asc
>
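
For readability, the request above can be decoded into its individual parameters. This is a small sketch using Python's standard `urllib.parse`, assuming the wrapped lines of the quoted URL join without any extra whitespace:

```python
from urllib.parse import urlsplit, parse_qs

# URL copied from the quoted message; the line wrapping is assumed to be
# an artifact of the email and the pieces are joined as-is.
url = ("http://localhost:8983/solr/sepa/select?"
       "q=+fragment%3A*AAA*+&fq=marked%3AT&fq=-fragmentContentType"
       "%3ABULK&start=0&rows=100&sort=creationTimestamp+desc%2Cid+asc")

# parse_qs decodes percent-escapes, turns '+' into a space, and collects
# repeated parameters (here: fq) into a list.
params = parse_qs(urlsplit(url).query)

for name, values in params.items():
    print(name, values)
```

Decoded, the main query is `fragment:*AAA*` (a wildcard on both sides of the term), filtered by `marked:T` and `-fragmentContentType:BULK`, returning the first 100 rows sorted by `creationTimestamp desc, id asc`.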
>
> 3. Test results
>
>    All the tests showed an I/O throughput of 100 MB/s while data was being
>    loaded into the disk cache, a disk cache utilization of 20 GB and a core
>    utilization of 100% (all 8 cores).
>
>    No indexing (average time and maximum time):
>
>                     CONF1         CONF2         CONF3
>                     avg    max    avg    max    avg    max
>    sequential       4.1    6.9   12.3   17.4    6.9    9.9
>    5 parallel      15.6   19.1   32.5   34.2   33.2   37.5
>    10 parallel     23.6   30.2   45.4   49     46     51
>    20 parallel     48     52.2   64.6   74     68     83
>
>    During indexing (is it possible to view the total throughput in the
>    Solr admin console? I can only find it per shard):
>
>                     CONF1         CONF2         CONF3
>                     avg    max    avg    max    avg    max
>    sequential       7.7    9.5   12.3   19     10     18.9
>    5 parallel      26.8   28.4   39     40.8   36.5   41.9
>    10 parallel     31.8   37.8   56.6   62.9   63.7   64.1
>    20 parallel     42     52.5   79    116     85    120
>
>
>
> I have two questions:
>
> - The response times of the configurations with replicas are worse than
>   those of the configuration without replicas (about three times worse for
>   sequential requests). Is this an expected result?
> - Why don't replicas help to reduce the response time during index
>   inserts and updates?
>
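
As a quick check of the "about three times" figure, the ratios can be computed from the sequential, no-indexing numbers quoted above. This is just a sketch: the helper name `slowdown_vs_baseline` is made up, and the decimal commas in the quoted results are read as decimal points.

```python
# Sequential requests, no indexing: (average, maximum) from the quoted results.
sequential_no_indexing = {
    "CONF1": (4.1, 6.9),
    "CONF2": (12.3, 17.4),
    "CONF3": (6.9, 9.9),
}


def slowdown_vs_baseline(results, baseline="CONF1"):
    """Ratio of each configuration's average time to the baseline's average."""
    base_avg = results[baseline][0]
    return {name: round(avg / base_avg, 2) for name, (avg, _maximum) in results.items()}


ratios = slowdown_vs_baseline(sequential_no_indexing)
print(ratios)
```

On these numbers CONF2 is about 3 times slower than CONF1 on average for sequential requests, while CONF3 is about 1.7 times slower, so the factor of three applies to CONF2 only.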