Solr Document Routing

2017-06-01 Thread Sathyam
Hi,

I am indexing documents into a 10-shard collection (testcollection, with no
replicas) in a Solr 6 cluster using CloudSolrClient. Looking at the Solr
logs, I see a lot of peer-to-peer document distribution going on.

An example log statement is as follows:
2017-06-01 06:07:28.378 INFO  (qtp1358444045-3673692) [c:testcollection
s:shard8 r:core_node7 x:testcollection_shard8_replica1]
o.a.s.u.p.LogUpdateProcessorFactory [testcollection_shard8_replica1]
 webapp=/solr path=/update params={update.distrib=TOLEADER&distrib.from=
http://10.199.42.29:8983/solr/testcollection_shard7_replica1/&wt=javabin&version=2}{add=[BQECDwZGTCEBHZZBBiIP
(1568981383488995328), BQEBBQZB2il3wGT/0/mB (1568981383490043904),
BQEBBQZFnhOJRj+m9RJC (1568981383491092480), BQEGBgZIeBE1klHS4fxk
(1568981383492141056), BQEBBQZFVTmRx2VuCgfV (1568981383493189632)]} 0 25

When I went through the code of CloudSolrClient on grepcode, I saw that the
client itself works out which server to hit by hashing the message id and
looking up the shard range information in state.json.
So it is confusing to me why documents are being forwarded between peers,
since there is no replication and every shard is its own leader.
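
The client-side routing described above can be sketched roughly as follows. This is a minimal Python illustration, not CloudSolrClient's actual code: `shard_ranges`, `shard_for_hash`, and `group_by_shard` are hypothetical names, the even split of the signed 32-bit hash space is an assumption (the real ranges are whatever state.json lists for the collection), and the hash function is passed in by the caller rather than reimplemented.

```python
def shard_ranges(num_shards):
    """Split the signed 32-bit hash space into contiguous ranges.

    Assumption: an even split. In a real cluster the ranges would be
    read from the collection's state.json instead of computed here.
    """
    lo, hi = -2**31, 2**31 - 1
    step = 2**32 // num_shards
    ranges = []
    start = lo
    for i in range(num_shards):
        # The last shard absorbs any remainder so the space is fully covered.
        end = hi if i == num_shards - 1 else start + step - 1
        ranges.append((f"shard{i + 1}", start, end))
        start = end + 1
    return ranges


def shard_for_hash(hash_value, ranges):
    """Return the name of the shard whose range contains hash_value."""
    for name, start, end in ranges:
        if start <= hash_value <= end:
            return name
    raise ValueError(f"hash {hash_value} is outside the 32-bit range")


def group_by_shard(doc_ids, hash_fn, ranges):
    """Group document ids by target shard so each batch can go to one leader."""
    batches = {}
    for doc_id in doc_ids:
        shard = shard_for_hash(hash_fn(doc_id), ranges)
        batches.setdefault(shard, []).append(doc_id)
    return batches


if __name__ == "__main__":
    ranges = shard_ranges(10)
    for name, start, end in ranges:
        print(name, start, end)
```

If the table the client routes with ever differed from the one the servers use, a document would arrive at the wrong leader and need forwarding, so comparing the two may be one thing worth ruling out.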

I would like to know why this is happening and how to avoid it, or whether
the log statement above means something else and I am misinterpreting it.

-- 
Sathyam Doraswamy


Re: Query regarding URL Analysers

2014-08-28 Thread Sathyam
Gentle Reminder


On 21 August 2014 18:05, Sathyam sathyam.dorasw...@gmail.com wrote:

 Hi,

 I needed to generate tokens out of a URL such that I am able to get
 hierarchical units of the URL as well as each individual entity as tokens.
 For example:
 *Given a URL:*

 http://www.google.com/abcd/efgh/ijkl/mnop.php?a=10&b=20&c=30#xyz

 The tokens that I need are :

 *Hierarchical subsets of the URL*

 1 http://

 2 http://www.google.com/

 3 http://www.google.com/abcd/

 4 http://www.google.com/abcd/efgh/

 5 http://www.google.com/abcd/efgh/ijkl/

 6 http://www.google.com/abcd/efgh/ijkl/mnop.php

 *Individual elements in the path to the resource*

 7 abcd

 8 efgh

 9 ijkl

 10 mnop.php

 *Query Terms*

 11 a=10

 12 b=20

 13 c=30

 *Fragment*
 14 xyz

 This comes to a total of 14 tokens for the given URL.
 Basically, I need a URL analyzer that creates tokens for each of the
 categories shown in bold, plus a separate token for the port (if present).

 I would like to know how this can be achieved with a single analyzer
 built from the tokenizers and filters that Solr provides.
 I am also curious why an analyzer is restricted to only *one* tokenizer.
 I look forward to hearing the best way to get as close as possible to
 what I need.

 Thanks.
 --
 Sathyam Doraswamy


-- 
Sathyam Doraswamy


Query regarding URL Analysers

2014-08-21 Thread Sathyam
Hi,

I needed to generate tokens out of a URL such that I am able to get
hierarchical units of the URL as well as each individual entity as tokens.
For example:
*Given a URL:*

http://www.google.com/abcd/efgh/ijkl/mnop.php?a=10&b=20&c=30#xyz

The tokens that I need are :

*Hierarchical subsets of the URL*

1 http://

2 http://www.google.com/

3 http://www.google.com/abcd/

4 http://www.google.com/abcd/efgh/

5 http://www.google.com/abcd/efgh/ijkl/

6 http://www.google.com/abcd/efgh/ijkl/mnop.php

*Individual elements in the path to the resource*

7 abcd

8 efgh

9 ijkl

10 mnop.php

*Query Terms*

11 a=10

12 b=20

13 c=30

*Fragment*
14 xyz

This comes to a total of 14 tokens for the given URL.
Basically, I need a URL analyzer that creates tokens for each of the
categories shown in bold, plus a separate token for the port (if present).
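
To pin down the expected output, the token list above can be reproduced with a short Python sketch. This is only an illustration of the desired tokens, not a Solr analyzer: `url_tokens` is a hypothetical helper, it assumes the query terms are separated by `&`, and appending the port token at the end is an arbitrary choice.

```python
from urllib.parse import urlsplit


def url_tokens(url):
    """Return hierarchical, path, query, fragment and port tokens for a URL."""
    parts = urlsplit(url)
    tokens = []

    # Hierarchical subsets: scheme, then host, then one token per path level.
    prefix = f"{parts.scheme}://"
    tokens.append(prefix)
    prefix += parts.netloc + "/"
    tokens.append(prefix)

    segments = [s for s in parts.path.split("/") if s]
    ends_with_slash = parts.path.endswith("/")
    for i, segment in enumerate(segments):
        is_last = i == len(segments) - 1
        # Directories keep their trailing slash; a final file name does not.
        prefix += segment if (is_last and not ends_with_slash) else segment + "/"
        tokens.append(prefix)

    # Individual elements in the path to the resource.
    tokens.extend(segments)

    # Query terms, assumed to be separated by '&'.
    tokens.extend(term for term in parts.query.split("&") if term)

    # Fragment, if any.
    if parts.fragment:
        tokens.append(parts.fragment)

    # Separate token for the port, only when the URL names one.
    if parts.port is not None:
        tokens.append(str(parts.port))

    return tokens


if __name__ == "__main__":
    example = "http://www.google.com/abcd/efgh/ijkl/mnop.php?a=10&b=20&c=30#xyz"
    for number, token in enumerate(url_tokens(example), start=1):
        print(number, token)
```

For the example URL this yields the 14 tokens listed above, in the same order.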

I would like to know how this can be achieved with a single analyzer
built from the tokenizers and filters that Solr provides.
I am also curious why an analyzer is restricted to only *one* tokenizer.
I look forward to hearing the best way to get as close as possible to
what I need.

Thanks.
-- 
Sathyam Doraswamy