Solr Document Routing
Hi,

I am indexing documents into a 10-shard collection (testcollection, with no replicas) in a Solr 6 cluster using CloudSolrClient. Looking at the Solr logs, I saw a lot of peer-to-peer document distribution going on. An example log statement is as follows:

2017-06-01 06:07:28.378 INFO (qtp1358444045-3673692) [c:testcollection s:shard8 r:core_node7 x:testcollection_shard8_replica1] o.a.s.u.p.LogUpdateProcessorFactory [testcollection_shard8_replica1] webapp=/solr path=/update params={update.distrib=TOLEADER= http://10.199.42.29:8983/solr/testcollection_shard7_replica1/=javabin=2}{add=[BQECDwZGTCEBHZZBBiIP (1568981383488995328), BQEBBQZB2il3wGT/0/mB (1568981383490043904), BQEBBQZFnhOJRj+m9RJC (1568981383491092480), BQEGBgZIeBE1klHS4fxk (1568981383492141056), BQEBBQZFVTmRx2VuCgfV (1568981383493189632)]} 0 25

When I went through the code of CloudSolrClient on GrepCode, I saw that the client itself works out which server it needs to hit, by hashing the message id and looking up the shard range information in state.json. So it is confusing to me why documents are being distributed between peers when there is no replication and every shard is its own leader.

I would like to know why this is happening and how to avoid it, or whether the above log statement means something else and I am misinterpreting it.

-- Sathyam Doraswamy
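To make the client-side routing described above concrete, here is a minimal sketch of the idea: hash the document id, then look up which shard's hash range contains that hash. It is not the SolrJ implementation. The class and method names are hypothetical, the hash is a plain MurmurHash3 (x86, 32-bit, seed 0) over the UTF-8 bytes of the id, and the ranges are assumed to be an even split of the signed 32-bit hash space across the shards. The ranges actually in effect are the ones recorded in state.json and may differ from this even split.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of hash-range routing; not part of SolrJ.
class ShardRoutingSketch {

    // MurmurHash3, x86 32-bit variant, over a byte array.
    static int murmur3(byte[] data, int seed) {
        final int c1 = 0xcc9e2d51;
        final int c2 = 0x1b873593;
        int h1 = seed;
        int blocks = data.length / 4;

        // Body: mix in 4 bytes at a time, little-endian.
        for (int i = 0; i < blocks; i++) {
            int o = i * 4;
            int k1 = (data[o] & 0xff)
                    | ((data[o + 1] & 0xff) << 8)
                    | ((data[o + 2] & 0xff) << 16)
                    | ((data[o + 3] & 0xff) << 24);
            k1 *= c1;
            k1 = Integer.rotateLeft(k1, 15);
            k1 *= c2;
            h1 ^= k1;
            h1 = Integer.rotateLeft(h1, 13);
            h1 = h1 * 5 + 0xe6546b64;
        }

        // Tail: the remaining 0 to 3 bytes.
        int k1 = 0;
        int tail = blocks * 4;
        switch (data.length & 3) {
            case 3: k1 ^= (data[tail + 2] & 0xff) << 16; // fall through
            case 2: k1 ^= (data[tail + 1] & 0xff) << 8;  // fall through
            case 1: k1 ^= (data[tail] & 0xff);
                    k1 *= c1;
                    k1 = Integer.rotateLeft(k1, 15);
                    k1 *= c2;
                    h1 ^= k1;
        }

        // Finalization: force all bits of the hash to avalanche.
        h1 ^= data.length;
        h1 ^= h1 >>> 16;
        h1 *= 0x85ebca6b;
        h1 ^= h1 >>> 13;
        h1 *= 0xc2b2ae35;
        h1 ^= h1 >>> 16;
        return h1;
    }

    // Maps a signed 32-bit hash to a 1-based shard number, assuming the hash
    // space [Integer.MIN_VALUE, Integer.MAX_VALUE] is split evenly and
    // shard1 owns the lowest range.
    static int shardForHash(int hash, int numShards) {
        long offset = (long) hash - Integer.MIN_VALUE; // 0 .. 2^32 - 1
        return (int) ((offset * numShards) >>> 32) + 1;
    }

    static int shardFor(String id, int numShards) {
        return shardForHash(murmur3(id.getBytes(StandardCharsets.UTF_8), 0), numShards);
    }

    public static void main(String[] args) {
        // One of the ids from the log line above, routed across 10 shards.
        String id = "BQECDwZGTCEBHZZBBiIP";
        System.out.println(id + " -> shard" + shardFor(id, 10));
    }
}
```

If the shard computed on the client for a given id differs from the shard that logs the TOLEADER forward, that would suggest the client and the server are not applying the same id-to-range mapping for that document. Comparing against the real ranges in state.json is the reliable check.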
Re: Query regarding URL Analysers
Gentle reminder regarding my original query of 21 August 2014 18:05, reproduced in full below.

-- Sathyam Doraswamy
Query regarding URL Analysers
Hi,

I need to generate tokens out of a URL such that I get the hierarchical units of the URL as well as each individual entity as tokens. For example:

*Given a URL:*
http://www.google.com/abcd/efgh/ijkl/mnop.php?a=10&b=20&c=30#xyz

The tokens that I need are:

*Hierarchical subsets of the URL*
1 http://
2 http://www.google.com/
3 http://www.google.com/abcd/
4 http://www.google.com/abcd/efgh/
5 http://www.google.com/abcd/efgh/ijkl/
6 http://www.google.com/abcd/efgh/ijkl/mnop.php

*Individual elements in the path to the resource*
7 abcd
8 efgh
9 ijkl
10 mnop.php

*Query Terms*
11 a=10
12 b=20
13 c=30

*Fragment*
14 xyz

This comes to a total of 14 tokens for the given URL. Basically, I need a URL analyzer that creates tokens based on the categories mentioned in bold, plus a separate token for the port (if mentioned).

I would like to know how this can be achieved with a single analyzer that uses a combination of the tokenizers and filters provided by Solr. I am also curious to know why an analyzer is restricted to only *one* tokenizer.

Looking forward to a response describing the best possible way to get as close as possible to what I need.

Thanks.

-- Sathyam Doraswamy
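The token list above can be pinned down as a small reference implementation. The following is a minimal sketch in plain Java using java.net.URI. It is not a Solr analyzer or tokenizer, and the class and method names are hypothetical. It assumes an absolute URL with a scheme and an authority, keeps the port inside the hierarchical tokens, and emits the port as an extra token after them. That placement is an assumption, since the example URL has no port.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that produces the token categories listed above.
class UrlTokenSketch {

    static List<String> tokens(String url) {
        URI uri = URI.create(url);
        String path = uri.getRawPath() == null ? "" : uri.getRawPath();

        // Individual path elements, e.g. abcd, efgh, ijkl, mnop.php.
        List<String> segments = new ArrayList<>();
        for (String s : path.split("/")) {
            if (!s.isEmpty()) {
                segments.add(s);
            }
        }

        // Hierarchical subsets: scheme, scheme + authority, then one
        // token per path level.
        List<String> out = new ArrayList<>();
        StringBuilder prefix = new StringBuilder(uri.getScheme()).append("://");
        out.add(prefix.toString());
        prefix.append(uri.getRawAuthority()).append('/');
        out.add(prefix.toString());
        for (int i = 0; i < segments.size(); i++) {
            prefix.append(segments.get(i));
            boolean last = (i == segments.size() - 1);
            // Intermediate levels end with '/'; the last level keeps the
            // trailing slash only if the original path had one.
            if (!last || path.endsWith("/")) {
                prefix.append('/');
            }
            out.add(prefix.toString());
        }

        // Separate token for the port, if one is present (assumed placement).
        if (uri.getPort() != -1) {
            out.add(String.valueOf(uri.getPort()));
        }

        out.addAll(segments);

        // Query terms, split on '&'.
        String query = uri.getRawQuery();
        if (query != null && !query.isEmpty()) {
            for (String term : query.split("&")) {
                if (!term.isEmpty()) {
                    out.add(term);
                }
            }
        }

        // Fragment.
        if (uri.getRawFragment() != null) {
            out.add(uri.getRawFragment());
        }
        return out;
    }

    public static void main(String[] args) {
        String url = "http://www.google.com/abcd/efgh/ijkl/mnop.php?a=10&b=20&c=30#xyz";
        for (String token : tokens(url)) {
            System.out.println(token);
        }
    }
}
```

For the example URL this yields the 14 tokens in the order listed above. Logic like this could serve as a reference when checking what an analyzer chain produces, or be applied to the field value before it is sent to Solr.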