Re: Mismatch between replication API & index.properties
Yes, that is my understanding too, but the Replication handler response says it is referring to the index folder, not to the one shown in index.properties. Due to that confusion I am not able to delete the folder. Is this a bug, or default behavior where, irrespective of index.properties, it will always show the index folder only? Solr version - 6.6.2

On Wed, Jul 31, 2019, 21:17 jai dutt wrote:
> It's correct behaviour; Solr puts replica index files in this format only,
> and you can find the latest index pointed to in the index.properties file. Usually
> after a successful full replication Solr removes the old timestamp dir.
>
> On Wed, 31 Jul, 2019, 8:02 PM Aman Tandon, wrote:
> >
> > Hi,
> >
> > We are having a situation where the whole disk space is full, and on the server
> > we are seeing multiple index directories ending with a timestamp. Upon checking
> > the index.properties file for a particular shard replica, it is not referring
> > to the folder named *index*, but when I use the replication API I see it
> > pointing to the *index* folder. Am I missing something? Kindly advise.
> >
> > *directory*
> >
> > drwxrwxr-x. 2 fusion fusion 69632 Jul 30 23:24 index
> > drwxrwxr-x. 2 fusion fusion 28672 Jul 31 03:02 index.20190731005047763
> > drwxrwxr-x. 2 fusion fusion  4096 Jul 31 10:20 index.20190731095757917
> > -rw-rw-r--. 1 fusion fusion    78 Jul 31 03:02 index.properties
> > -rw-rw-r--. 1 fusion fusion   296 Jul 31 09:56 replication.properties
> > drwxrwxr-x. 2 fusion fusion  4096 Jan 16  2019 snapshot_metadata
> > drwxrwxr-x. 2 fusion fusion  4096 Jul 30 23:24 tlog
> >
> > *index.properties*
> >
> > #index.properties
> > #Wed Jul 31 03:02:12 EDT 2019
> > index=index.20190731005047763
> >
> > *REPLICATION API STATUS*
> >
> > 280.56 GB
> > /opt/solr/x_shard4_replica3/data/index/
> > ...
> > true
> > false
> > 1564543395563
> > 98884
> > ...
> > ...
> >
> > Regards,
> > Aman
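For anyone hitting the same disk-space situation, the two things to compare are the replica's index.properties file and the replication handler's details output; a sketch of the check (host, port, and core name are placeholders taken from the paths in this thread):

```
# Which directory the replica should currently be using:
cat /opt/solr/x_shard4_replica3/data/index.properties

# What the replication handler reports (inspect the indexPath value):
http://localhost:8983/solr/x_shard4_replica3/replication?command=details
```

After a successful full replication Solr normally removes the old index.&lt;timestamp&gt; directories itself; a directory that lingers and is not the one named in index.properties is the leftover candidate for cleanup.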
Mismatch between replication API & index.properties
Hi, We are having a situation where the whole disk space is full, and on the server we are seeing multiple index directories ending with a timestamp. Upon checking the index.properties file for a particular shard replica, it is not referring to the folder named *index*, but when I use the replication API I see it pointing to the *index* folder. Am I missing something? Kindly advise.

*directory*

drwxrwxr-x. 2 fusion fusion 69632 Jul 30 23:24 index
drwxrwxr-x. 2 fusion fusion 28672 Jul 31 03:02 index.20190731005047763
drwxrwxr-x. 2 fusion fusion  4096 Jul 31 10:20 index.20190731095757917
-rw-rw-r--. 1 fusion fusion    78 Jul 31 03:02 index.properties
-rw-rw-r--. 1 fusion fusion   296 Jul 31 09:56 replication.properties
drwxrwxr-x. 2 fusion fusion  4096 Jan 16  2019 snapshot_metadata
drwxrwxr-x. 2 fusion fusion  4096 Jul 30 23:24 tlog

*index.properties*

#index.properties
#Wed Jul 31 03:02:12 EDT 2019
index=index.20190731005047763

*REPLICATION API STATUS*

280.56 GB
/opt/solr/x_shard4_replica3/data/index/
...
true
false
1564543395563
98884
...
...

Regards,
Aman
Re: Solr relevancy score different on replicated nodes
Thanks Erick for your suggestions and time. On Tue, Feb 12, 2019, 22:32 Erick Erickson You really only have four > 1> use exactstats. This won't guarantee precise matches, but they'll be > closer > 2> optimize (not particularly recommended, but if you're willing to do > it periodically it'll have the stats match until the next updates). > 3> use TLOG/PULL replicas and confine the requests to the PULL > replicas. There'll _still_ be some window for mismatches, > specifically the default is commit_interval/2 > 4> define the problem away. > > Best, > Erick > > On Tue, Feb 12, 2019 at 2:42 AM Aman Tandon > wrote: > > > > Hi Erick, > > > > Any suggestions on this? > > > > Regards, > > Aman > > > > On Fri, Feb 8, 2019, 17:07 Aman Tandon > > > > Hi Erick, > > > > > > I find this thread very relevant to the people who are facing the same > > > problem. > > > > > > In our case, we have a signals aggregation collection which is having > > > total of around 8 million records. We have Solr cloud architecture(3 > shards > > > and 4 replicas) and the whole size of index is of around 2.5 GB. > > > > > > We use this collection to fetch the most clicked products against a > query > > > and boost in search results. Boost score is the query score on > aggregation > > > collection. > > > > > > But when the query goes to different replica we get different boost > score > > > for some of the keywords, hence on page refresh results ordering keep > on > > > changing. > > > > > > In order to solve we tried the exactstats cache for distributed IDF > and on > > > debug level I am seeing global stats merge in logs but still the > different > > > scores coming on refreshing the results from aggregation collection. > > > > > > Our indexing occur once a day so should we do daily optimization or > should > > > we reduce merge segment count to 2/3 currently it is -1. > > > > > > What are your suggestions on this? 
> > > > > > Regards, > > > Aman > > > > > > On Fri, Feb 8, 2019, 00:15 Erick Erickson wrote: > > > > > >> Optimization is safe. The large segment is irrelevant, you'll > > >> lose a little parallelization, but on an index with this few > > >> documents I doubt you'll notice. > > >> > > >> As of Solr 5, optimize will respect the max segment size > > >> which defaults to 5G, but you're well under that limit. > > >> > > >> Best, > > >> Erick > > >> > > >> On Sun, Feb 3, 2019 at 11:54 PM Ashish Bisht > > > >> wrote: > > >> > > > >> > Thanks Erick and everyone.We are checking on stats cache. > > >> > > > >> > I noticed stats skew again and optimized the index to correct the > > >> same.As > > >> > per the documents. > > >> > > > >> > > > >> > https://lucidworks.com/2017/10/13/segment-merging-deleted-documents-optimize-may-bad/ > > >> > and > > >> > > > >> > https://lucidworks.com/2018/06/20/solr-and-optimizing-your-index-take-ii/ > > >> > > > >> > wanted to check on below points considering we want stats skew to be > > >> > corrected. > > >> > > > >> > 1.When optimized single segment won't be natural merged easily.As we > > >> might > > >> > be doing manual optimize every time,what I visualize is at a certain > > >> point > > >> > in future we might be having a single large segment.What impact this > > >> large > > >> > segment is going to have? > > >> > Our index ~30k documents i.e files with content(Segment size <1Gb > as of > > >> now) > > >> > > > >> > 1.Do you recommend going for optimize in these situations?Probably > it > > >> will > > >> > be done only when stats skew.Is it safe? > > >> > > > >> > Regards > > >> > Ashish > > >> > > > >> > > > >> > > > >> > > > >> > > > >> > > > >> > -- > > >> > Sent from: > http://lucene.472066.n3.nabble.com/Solr-User-f472068.html > > >> > > > >
Re: Solr relevancy score different on replicated nodes
Hi Erick, Any suggestions on this? Regards, Aman On Fri, Feb 8, 2019, 17:07 Aman Tandon Hi Erick, > > I find this thread very relevant to the people who are facing the same > problem. > > In our case, we have a signals aggregation collection which is having > total of around 8 million records. We have Solr cloud architecture(3 shards > and 4 replicas) and the whole size of index is of around 2.5 GB. > > We use this collection to fetch the most clicked products against a query > and boost in search results. Boost score is the query score on aggregation > collection. > > But when the query goes to different replica we get different boost score > for some of the keywords, hence on page refresh results ordering keep on > changing. > > In order to solve we tried the exactstats cache for distributed IDF and on > debug level I am seeing global stats merge in logs but still the different > scores coming on refreshing the results from aggregation collection. > > Our indexing occur once a day so should we do daily optimization or should > we reduce merge segment count to 2/3 currently it is -1. > > What are your suggestions on this? > > Regards, > Aman > > On Fri, Feb 8, 2019, 00:15 Erick Erickson >> Optimization is safe. The large segment is irrelevant, you'll >> lose a little parallelization, but on an index with this few >> documents I doubt you'll notice. >> >> As of Solr 5, optimize will respect the max segment size >> which defaults to 5G, but you're well under that limit. >> >> Best, >> Erick >> >> On Sun, Feb 3, 2019 at 11:54 PM Ashish Bisht >> wrote: >> > >> > Thanks Erick and everyone.We are checking on stats cache. >> > >> > I noticed stats skew again and optimized the index to correct the >> same.As >> > per the documents. 
>> > >> > >> https://lucidworks.com/2017/10/13/segment-merging-deleted-documents-optimize-may-bad/ >> > and >> > >> https://lucidworks.com/2018/06/20/solr-and-optimizing-your-index-take-ii/ >> > >> > wanted to check on below points considering we want stats skew to be >> > corrected. >> > >> > 1.When optimized single segment won't be natural merged easily.As we >> might >> > be doing manual optimize every time,what I visualize is at a certain >> point >> > in future we might be having a single large segment.What impact this >> large >> > segment is going to have? >> > Our index ~30k documents i.e files with content(Segment size <1Gb as of >> now) >> > >> > 1.Do you recommend going for optimize in these situations?Probably it >> will >> > be done only when stats skew.Is it safe? >> > >> > Regards >> > Ashish >> > >> > >> > >> > >> > >> > >> > -- >> > Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html >> >
Re: Solr relevancy score different on replicated nodes
Hi Erick, I find this thread very relevant to the people who are facing the same problem. In our case, we have a signals aggregation collection which has a total of around 8 million records. We have a SolrCloud architecture (3 shards and 4 replicas), and the whole index is around 2.5 GB in size. We use this collection to fetch the most-clicked products for a query and boost them in search results; the boost score is the query score on the aggregation collection. But when the query goes to a different replica we get a different boost score for some of the keywords, hence on page refresh the results ordering keeps changing. To solve this we tried the exactStatsCache for distributed IDF, and at debug level I can see the global stats merge in the logs, but still different scores come back on refreshing the results from the aggregation collection. Our indexing occurs once a day, so should we do a daily optimization, or should we reduce the merge segment count to 2/3 (currently it is -1)? What are your suggestions on this? Regards, Aman On Fri, Feb 8, 2019, 00:15 Erick Erickson wrote: > Optimization is safe. The large segment is irrelevant, you'll > lose a little parallelization, but on an index with this few > documents I doubt you'll notice. > > As of Solr 5, optimize will respect the max segment size > which defaults to 5G, but you're well under that limit. > > Best, > Erick > > On Sun, Feb 3, 2019 at 11:54 PM Ashish Bisht > wrote: > > > > Thanks Erick and everyone. We are checking on stats cache. > > > > I noticed stats skew again and optimized the index to correct the same. As > > per the documents > > > > > https://lucidworks.com/2017/10/13/segment-merging-deleted-documents-optimize-may-bad/ > > and > > > https://lucidworks.com/2018/06/20/solr-and-optimizing-your-index-take-ii/ > > > > I wanted to check on the below points, considering we want the stats skew to be > > corrected.
> > > > 1.When optimized single segment won't be natural merged easily.As we > might > > be doing manual optimize every time,what I visualize is at a certain > point > > in future we might be having a single large segment.What impact this > large > > segment is going to have? > > Our index ~30k documents i.e files with content(Segment size <1Gb as of > now) > > > > 1.Do you recommend going for optimize in these situations?Probably it > will > > be done only when stats skew.Is it safe? > > > > Regards > > Ashish > > > > > > > > > > > > > > -- > > Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html >
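For readers of the archive: the exactStatsCache discussed in this thread is a one-line change in solrconfig.xml; a minimal sketch:

```xml
<!-- solrconfig.xml: use exact global collection statistics (distributed IDF)
     instead of per-shard/per-replica stats, so scores agree across nodes. -->
<statsCache class="org.apache.solr.search.stats.ExactStatsCache"/>
```

Note that, as Erick's reply points out, replicas of the same shard can still briefly disagree because their segment sets (and deleted-document counts) differ until merges and commits line up, which is why an optimize temporarily corrects the skew.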
Re: Get MLT Interesting Terms for a set of documents corresponding to the query specified
I see two rows params; it looks like one will be overwritten by rows=2, and then your tags filter is matching only one document. Please remove the extra rows param and try again. On Mon, Jan 21, 2019, 08:44 Pratik Patel wrote: > Hi Everyone! > > I am trying to use the MLT request handler. My query matches more than one > document, but the response always seems to pick up the first document, and > interestingTerms also seems to correspond to that single document only. > > What I am expecting is that if my query matches multiple documents then the > interestingTerms result also corresponds to that set of documents > and not just the first document. > > Following is my query, > > > http://localhost:8081/solr/collection1/mlt?debugQuery=on=tags:test=true=mlt.fl=textpropertymlt=details=1=2=3=*:*=100=2=0 > > Ultimately, my goal is to get interesting terms corresponding to this whole > set of documents. I don't need similar documents as such. If not with mlt, > is there any other way I can achieve this? That is, given a query matching a > set of documents, find interestingTerms for that set of documents based on > tf-idf? > > Thanks! > Pratik >
Re: solr-query
Hi Shilpa, I am assuming you know what synonyms do. In Solr, synonyms are applied to the tokens being indexed/queried for a field. To apply synonyms to a field you need to update the schema.xml configuration, where you also point to a file (synonyms.txt by default; you can create a separate file per field as well) that holds the synonyms for your business requirement. You can configure synonyms to apply at index time, query time, or both for a field. However, if you apply them at index time, then any new addition to the synonym file requires reindexing the whole collection. You can read more about synonyms here: https://lucene.apache.org/solr/guide/6_6/filter-descriptions.html#FilterDescriptions-SynonymGraphFilter Also a good blog on multi-word synonyms: https://lucidworks.com/2017/04/18/multi-word-synonyms-solr-adds-query-time-support/ On Fri, Jan 18, 2019, 21:44 Shilpa Solanki wrote: > Hello, > > Can you tell me how to use synonyms with Apache Solr? > > > Thanks & Regards, > Shilpa solanki >
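A minimal sketch of what the schema.xml change described above can look like (the field type name and file name here are examples, not from the original mail):

```xml
<fieldType name="text_syn" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- query-time synonyms: editing synonyms.txt needs only a core reload,
         not a full reindex -->
    <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```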
Re: Zookeeper timeout issue -
As Jan mentioned, also check GC activity and memory issues, and check the threads to see whether any are pending/waiting for too long. On Fri, Dec 28, 2018, 16:14 AshB wrote: > Hi Dominique, > > Yes, we are load testing with 50 users. We tried changing the timeout but it's > not reflecting. > > Regards > Ashish >
Re: REBALANCELEADERS is not reliable
++ correction On Fri, Nov 30, 2018, 01:10 Aman Tandon For me today, I deleted the leader replica of one of the two shard > collection. Then other replicas of that shard wasn't getting elected for > leader. > > After waiting for long tried the setting addreplicaprop preferred leader > on one of the replica then tried FORCELEADER but no luck. Then also tried > rebalance but no help. Finally have to recreate the whole collection. > > Not sure what was the issue but both FORCELEADER AND REBALANCING didn't > work if there was no leader however preferred leader property was setted. > > On Wed, Nov 28, 2018, 12:54 Bernd Fehling wrote: > >> Hi Vadim, >> >> thanks for confirming. >> So it seems to be a general problem with Solr 6.x, 7.x and might >> be still there in the most recent versions. >> >> But where to start to debug this problem, is it something not >> correctly stored in zookeeper or is overseer the problem? >> >> I was also reading something about a "leader queue" where possible >> leaders have to be requeued or something similar. >> >> May be I should try to get a situation where a "locked" core >> is on the overseer and then connect the debugger to it and step >> through it. >> Peeking and poking around, like old Commodore 64 days :-) >> >> Regards, Bernd >> >> >> Am 27.11.18 um 15:47 schrieb Vadim Ivanov: >> > Hi, Bernd >> > I have tried REBALANCELEADERS with Solr 6.3 and 7.5 >> > I had very similar results and notion that it's not reliable :( >> > -- >> > Br, Vadim >> > >> >> -Original Message- >> >> From: Bernd Fehling [mailto:bernd.fehl...@uni-bielefeld.de] >> >> Sent: Tuesday, November 27, 2018 5:13 PM >> >> To: solr-user@lucene.apache.org >> >> Subject: REBALANCELEADERS is not reliable >> >> >> >> Hi list, >> >> >> >> unfortunately REBALANCELEADERS is not reliable and the leader >> >> election has unpredictable results with SolrCloud 6.6.5 and >> >> Zookeeper 3.4.10. >> >> Seen with 5 shards / 3 replicas. 
>> >> >> >> - CLUSTERSTATUS reports all replicas (core_nodes) as state=active. >> >> - setting with ADDREPLICAPROP the property preferredLeader to other >> replicas >> >> - calling REBALANCELEADERS >> >> - some leaders have changed, some not. >> >> >> >> I then tried: >> >> - removing all preferredLeader properties from replicas which >> succeeded. >> >> - trying again REBALANCELEADERS for the rest. No success. >> >> - Shutting down nodes to force the leader to a specific replica left >> running. >> >>No success. >> >> - calling REBALANCELEADERS responds that the replica is inactive!!! >> >> - calling CLUSTERSTATUS reports that the replica is active!!! >> >> >> >> Also, the replica which don't want to become leader is not in the list >> >> of collections->[collection_name]->leader_elect->shard1..x->election >> >> >> >> Where is CLUSTERSTATUS getting it's state info from? >> >> >> >> Has anyone else problems with REBALANCELEADERS? >> >> >> >> I noticed that the Reference Guide writes "preferredLeader" (with >> capital "L") >> >> but the JAVA code has "preferredleader". >> >> >> >> Regards, Bernd >> > >> >
Re: REBALANCELEADERS is not reliable
For me today, I deleted the leader replica of one of the two shard collection. Then other replica of that shard was getting elected for leader. After waiting for long tried the setting addreplicaprop preferred leader on one of the replica then tried FORCELEADER but no luck. Then also tried rebalance but no help. Finally have to recreate the whole collection. Not sure what was the issue but both FORCELEADER AND REBALANCING didn't work if there was no leader however preferred leader property was setted. On Wed, Nov 28, 2018, 12:54 Bernd Fehling Hi Vadim, > > thanks for confirming. > So it seems to be a general problem with Solr 6.x, 7.x and might > be still there in the most recent versions. > > But where to start to debug this problem, is it something not > correctly stored in zookeeper or is overseer the problem? > > I was also reading something about a "leader queue" where possible > leaders have to be requeued or something similar. > > May be I should try to get a situation where a "locked" core > is on the overseer and then connect the debugger to it and step > through it. > Peeking and poking around, like old Commodore 64 days :-) > > Regards, Bernd > > > Am 27.11.18 um 15:47 schrieb Vadim Ivanov: > > Hi, Bernd > > I have tried REBALANCELEADERS with Solr 6.3 and 7.5 > > I had very similar results and notion that it's not reliable :( > > -- > > Br, Vadim > > > >> -Original Message- > >> From: Bernd Fehling [mailto:bernd.fehl...@uni-bielefeld.de] > >> Sent: Tuesday, November 27, 2018 5:13 PM > >> To: solr-user@lucene.apache.org > >> Subject: REBALANCELEADERS is not reliable > >> > >> Hi list, > >> > >> unfortunately REBALANCELEADERS is not reliable and the leader > >> election has unpredictable results with SolrCloud 6.6.5 and > >> Zookeeper 3.4.10. > >> Seen with 5 shards / 3 replicas. > >> > >> - CLUSTERSTATUS reports all replicas (core_nodes) as state=active. 
> >> - setting with ADDREPLICAPROP the property preferredLeader to other > replicas > >> - calling REBALANCELEADERS > >> - some leaders have changed, some not. > >> > >> I then tried: > >> - removing all preferredLeader properties from replicas which succeeded. > >> - trying again REBALANCELEADERS for the rest. No success. > >> - Shutting down nodes to force the leader to a specific replica left > running. > >>No success. > >> - calling REBALANCELEADERS responds that the replica is inactive!!! > >> - calling CLUSTERSTATUS reports that the replica is active!!! > >> > >> Also, the replica which don't want to become leader is not in the list > >> of collections->[collection_name]->leader_elect->shard1..x->election > >> > >> Where is CLUSTERSTATUS getting it's state info from? > >> > >> Has anyone else problems with REBALANCELEADERS? > >> > >> I noticed that the Reference Guide writes "preferredLeader" (with > capital "L") > >> but the JAVA code has "preferredleader". > >> > >> Regards, Bernd > > >
Re: SOLR Partial search
Hi Piyush, I suppose your end goal is to search special chars too, and I assume you are using this for typeahead. The keyword tokenizer keeps the complete string as a single token, so when you search with a partial string it won't match. You could add an n-gram filter; the output of the keyword tokenizer will then be broken into the configured grams, and that might help you. Please give it a try and let us know. Regards, Aman On Mon, Oct 8, 2018, 10:24 Rathor, Piyush (US - Philadelphia) < prat...@deloitte.com> wrote: > Hi All, > > I am trying to use “KeywordTokenizerFactory” to support searching against > the special characters in the search. > > But the partial search does not work well with “KeywordTokenizerFactory”. > > The partial match results are better with “StandardTokenizerFactory”. > > Field type – text_general > > Example for both scenarios: > > Partial search parameter: Nah' > Expected result on top: Nah’bir > > Partial search: shar > Full name: Sharma > > Please let me know if there is something that can be done to cater to both > special characters and partial matches together. > > Thanks & Regards > > Piyush R > > This message (including any attachments) contains confidential information > intended for a specific individual and purpose, and is protected by law. If > you are not the intended recipient, you should delete this message; any > disclosure, copying, or distribution of this message, or the taking of any > action based on it, is strictly prohibited. > > v.E.1 >
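A sketch of the suggested field type (names and gram sizes are illustrative): keep the whole string as one token, lowercase it, and emit edge n-grams at index time so partial prefixes like "shar" match "Sharma"; special characters survive because no tokenizer splits on them:

```xml
<fieldType name="text_partial" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- EdgeNGramFilterFactory matches prefixes; swap in NGramFilterFactory
         if matches anywhere inside the string are needed -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```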
Re: deleted master index files replica did not replicate
Hi Jeff, There should be a slave configuration section in the solrconfig file which tells the slave to ping the master, check the version, and fetch the modified files. If replication is configured on the slave you will see commands getting triggered, and you could get some idea from there. You could also paste that log here if it is not clear. Regards, Aman On Mon, Jun 4, 2018, 23:57 Jeff Courtade wrote: > To be clear, I deleted the actual index files out from under the running > master. > > On Mon, Jun 4, 2018, 2:25 PM Jeff Courtade wrote: > > > So are you saying it should have? > > > > It really acted like a normal function; this happened on 5 different pairs > > in the same way. > > > > > > On Mon, Jun 4, 2018, 2:23 PM Aman Tandon > wrote: > > > >> Could you please check the replication request commands in the solr logs of > >> the slave and see if it is complaining about anything. > >> > >> On Mon, Jun 4, 2018, 23:45 Jeff Courtade > wrote: > >> > >> > Hi, > >> > > >> > This I think is a very simple question. > >> > > >> > I have a solr 4.3 master slave setup. > >> > > >> > Simple replication. > >> > > >> > The master and slave were both running and synchronized up to date. > >> > > >> > I went on the master and deleted the index files while solr was > running. > >> > Solr created new empty index files and continued to serve requests. > >> > The slave did not delete its indexes, kept all of the old data in > place, > >> > and continued to serve requests. > >> > > >> > This was strange, as I would have thought the replica would have > replicated > >> > an empty index from the master. > >> > > >> > Does anyone have an explanation for this? I am fairly certain I just am > not > >> > understanding something basic. > >> > > >> > J > >> > > >> > -- > >> > > >> > Jeff Courtade > >> > M: 240.507.6116 > >> > > >> > > -- > > > > Jeff Courtade > > M: 240.507.6116 > > > -- > > Jeff Courtade > M: 240.507.6116 >
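For reference, the slave-side configuration being referred to is the ReplicationHandler section of solrconfig.xml; a minimal sketch (the masterUrl, core name, and poll interval are placeholders):

```xml
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <!-- the slave polls this URL for the master's index version -->
    <str name="masterUrl">http://master-host:8983/solr/core1/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```

With this in place the slave's log shows the periodic poll and any fetchindex activity against the master, which is where to look when replication does not behave as expected.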
Re: UUIDUpdateProcessorFactory can cause duplicate documents?
Hi, Suppose id is the field linked to the UUID processor in the configuration. If it is missing in a document coming in for indexing, the processor will generate a UUID and set it in the id field; however, if the id field is present with some value, it won't. So yes, indexing the same document twice without the id field will produce two documents, each with its own UUID. Kindly refer to http://lucene.apache.org/solr/5_5_0/solr-core/org/apache/solr/update/processor/UUIDUpdateProcessorFactory.html On Mon, Jun 4, 2018, 23:52 S G wrote: > Hi, > > Is it correct to assume that UUIDUpdateProcessorFactory will produce 2 > documents even if the same document is indexed twice without the "id" field > ? > > And to avoid such a thing, we can use the technique mentioned in > https://wiki.apache.org/solr/Deduplication ? > > Thanks > SG >
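A sketch of the configuration the javadoc describes, wiring the processor into an update chain (the chain name is an example):

```xml
<updateRequestProcessorChain name="uuid">
  <!-- fills in "id" with a fresh UUID only when the incoming document
       does not already carry a value for it -->
  <processor class="solr.UUIDUpdateProcessorFactory">
    <str name="fieldName">id</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

Because each document without an id gets a fresh UUID, re-sending the same content does create a second document, which is where the deduplication technique (SignatureUpdateProcessorFactory) from the wiki page linked in the question comes in.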
Re: deleted master index files replica did not replicate
Could you please check the replication request commands in the Solr logs of the slave and see if it is complaining about anything. On Mon, Jun 4, 2018, 23:45 Jeff Courtade wrote: > Hi, > > This I think is a very simple question. > > I have a solr 4.3 master slave setup. > > Simple replication. > > The master and slave were both running and synchronized up to date. > > I went on the master and deleted the index files while solr was running. > Solr created new empty index files and continued to serve requests. > The slave did not delete its indexes, kept all of the old data in place, > and continued to serve requests. > > This was strange, as I would have thought the replica would have replicated > an empty index from the master. > > Does anyone have an explanation for this? I am fairly certain I just am not > understanding something basic. > > J > > -- > > Jeff Courtade > M: 240.507.6116 >
Re: Solr score use cases
Hi Faraz, The Solr score, which you can retrieve by adding score to the fl parameter, can be helpful in the following ways: 1) Search relevance ranking: seeing how much score Solr has given to the top and second document, and with debug=true you can better understand what produced that score. 2) You can use a function query to multiply the score by some feature, e.g. a paid-customer score, popularity score, etc., to improve relevance as per the business. These are the points I can think of; someone else can shed more light if I am missing anything. I hope this is what you wanted to know. Regards, Aman On Dec 1, 2017 13:38, "Faraz Fallahi" wrote: Hi A simple question: what are the most common use cases for the Solr score of documents retrieved after firing queries? I don't have a real understanding of its purpose at the moment. Thx for helping
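A few request sketches for the two points above (the collection and field names, such as popularity_score, are examples, not from the original mail):

```
# 1) return the score with each hit, and explain how it was computed:
q=laptop&fl=id,name,score&debug=true

# 2) multiply the relevance score by a popularity feature (edismax):
q=laptop&defType=edismax&qf=name&boost=popularity_score
```

The edismax boost parameter is multiplicative, so documents with a higher popularity_score are lifted in proportion to their text relevance rather than by a flat additive amount.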
Range facet over currency field
Hi, I have a question about how to do a range facet in a different currency. I have indexed the price data in USD in the field price_usd_c, and I have a currency.xml which is generated by a process. If I want to do a range facet on the field price_usd_c in the Euro currency, how can I do it, and what is the syntax? Is there any way to do so? If so, kindly help. Regards, Aman
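In case it helps later readers: the Reference Guide documents range faceting on currency fields by giving the currency code on the range parameters; a sketch (values are illustrative — check the Currencies section of the Reference Guide for your Solr version):

```
q=*:*&facet=true&facet.range=price_usd_c
  &f.price_usd_c.facet.range.start=0.00,EUR
  &f.price_usd_c.facet.range.end=1000.00,EUR
  &f.price_usd_c.facet.range.gap=100.00,EUR
```

The conversion between the indexed USD values and the EUR buckets uses the exchange rates from currency.xml at query time.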
Re: How to build solr
Hi Srini, Kindly refer to the README of the GitHub repository at the link below; that should work. https://github.com/apache/lucene-solr/blob/master/README.md With regards, Aman Tandon On Sep 21, 2017 1:53 PM, "srini sampath" <sampathsrini.c...@gmail.com> wrote: > Hi, > How do I build and compile Solr on my local machine? It seems the > https://wiki.apache.org/solr/HowToCompileSolr page became obsolete. > Thanks in advance >
Re: Provide suggestion on indexing performance
Hi Shawn, Thanks for your reply, this is really helpful. I will try this out to see the performance with the docValues. With regards, Aman Tandon On Sep 15, 2017 9:10 PM, "Shawn Heisey" <apa...@elyograg.org> wrote: > On 9/11/2017 9:06 PM, Aman Tandon wrote: > > We want to know about the indexing performance in the below mentioned > > scenarios, consider the total number of 10 string fields and total number > > of documents are 10 million. > > > > 1) indexed=true, stored=true > > 2) indexed=true, docValues=true > > > > Which one should we prefer in terms of indexing performance, please share > > your experience. > > There are several settings in the schema for each field, things like > indexed, stored, docValues, multiValued, and others. You should base > your choices on what you need Solr to do. Choosing these settings based > purely on desired indexing speed may result in Solr not doing what you > want it to do. > > When the indexing system sends data to Solr with several threads or > processes, Solr is *usually* capable of indexing data faster than most > systems can supply it. The more settings you disable on a field, the > faster Solr will be able to index. > > It is not possible to provide precise numbers, because performance > depends on many factors, some of which you may not even know until you > build a production system. > > https://lucidworks.com/sizing-hardware-in-the-abstract-why- > we-dont-have-a-definitive-answer/ > > All that said ... docValues MIGHT be a little bit faster than stored, > because stored data is compressed, and the compression takes CPU time. > On a fully populated production system, that statement might turn out to > be wrong. There may be factors that result in stored fields working > better. The best way to decide is to try it both ways with all your data. > > Thanks, > Shawn > >
Re: Provide suggestion on indexing performance
Hi Tom, Thanks for your suggestion and the information. I will try this out to test and will share the results. On Sep 14, 2017 2:32 PM, "Sreenivas.T" <sree...@gmail.com> wrote: > I agree with Tom. Doc values and stored fields are present for different > reasons. Doc values is another index that gets build for faster > sorting/faceting. > > On Wed, Sep 13, 2017 at 11:30 PM Tom Evans <tevans...@googlemail.com> > wrote: > > > On Tue, Sep 12, 2017 at 4:06 AM, Aman Tandon <amantandon...@gmail.com> > > wrote: > > > Hi, > > > > > > We want to know about the indexing performance in the below mentioned > > > scenarios, consider the total number of 10 string fields and total > number > > > of documents are 10 million. > > > > > > 1) indexed=true, stored=true > > > 2) indexed=true, docValues=true > > > > > > Which one should we prefer in terms of indexing performance, please > share > > > your experience. > > > > > > With regards, > > > Aman Tandon > > > > Your question doesn't make much sense. You turn on stored when you > > need to retrieve the original contents of the fields after searching, > > and you use docvalues to speed up faceting, sorting and grouping. > > Using docvalues to retrieve values during search is more expensive > > than simply using stored values, so if your primary aim is retrieving > > stored values, use stored=true. > > > > Secondly, the only way to answer performance questions for your schema > > and data is to try it out. Generate 10 million docs, store them in a > > doc (eg as CSV), and then use the post tool to try different schema > > and query options. > > > > Cheers > > > > Tom > > >
Provide suggestion on indexing performance
Hi, We want to know about the indexing performance in the below-mentioned scenarios: consider a total of 10 string fields and a total of 10 million documents. 1) indexed=true, stored=true 2) indexed=true, docValues=true Which one should we prefer in terms of indexing performance? Please share your experience. With regards, Aman Tandon
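For concreteness, the two variants being compared are schema.xml field definitions like the following (the field name is an example):

```xml
<!-- 1) stored: original values retrieved from (compressed) stored fields -->
<field name="brand" type="string" indexed="true" stored="true"/>

<!-- 2) docValues: columnar storage, built for sorting/faceting/grouping -->
<field name="brand" type="string" indexed="true" stored="false" docValues="true"/>
```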
Re: Problems retrieving large documents
Did you find any error in the Solr logs? On Sat, Jul 29, 2017, 23:13 Aman Tandon <amantandon...@gmail.com> wrote: > Hello, > > Kindly check the Solr logs when you are hitting the query. Attach them > here, so that I can give more insight. > > To me it looks like an OOM, but check the Solr logs; I hope we can get > more information from there. > > On Sat, Jul 29, 2017, 14:35 SOLR6932 <lbarlet...@gmail.com> wrote: > >> Hey all, >> I am using Solr 4.10.3 and my collection consists of around 2300 large >> documents that are distributed across a number of shards. Each document is >> estimated to be around 50-70 megabytes. The queries that I run are >> sophisticated and involve a range of parameters and diverse query filters. >> Whenever I wish to retrieve all the returned document fields (fl:* [around >> 50 fields in my schema]), I receive an impossible exception - specifically >> /org.apache.solr.common.SolrException: Impossible Exception/ that is >> logged >> by both SolrCore and SolrDispatchFilter. Has anyone experienced a similar >> problem and knows how to solve this issue? >> Thanks in advance, >> Louie. >> >> -- >> View this message in context: >> http://lucene.472066.n3.nabble.com/Problems-retrieving-large-documents-tp4348169.html >> Sent from the Solr - User mailing list archive at Nabble.com. >> >
Re: edismax, pf2 and use of both AND and OR parameter
Hi, Ideally it should, but from the debug query it seems that it is not respecting the Boolean clauses. Could anyone else help here? Is this the intended behavior? On Jul 31, 2017 5:47 PM, "Niraj Aswani" <nirajasw...@gmail.com> wrote: > Hi Aman, > > Thank you very much your reply. > > Let me elaborate my question a bit more using your example in this case. > > AFAIK, what the pf2 parameter is doing to the query is adding the following > phrase queries: > > (_text_:"system memory") (_text_:"memory oem") (_text_:"oem retail") > > There are three phrases being checked here: > - system memory > - memory oem > - oem retail > > However, what I actually expected it to look like is the following: > - system memory > - memory oem > - memory retail > > My understanding of the edismax parser is that it interprets the AND / OR > parameters correctly so it should generate the bi-gram phrases respecting > the AND /OR parameters as well, right? > > Am I missing something here? > > Regards, > Niraj > > On Mon, Jul 31, 2017 at 4:24 AM, Aman Tandon <amantandon...@gmail.com> > wrote: > > > Hi Niraj, > > > > Should I expect it to check the following bigram phrases? > > > > Yes it will check.
> > > > ex- documents & query is given below > > > > http://localhost:8983/solr/myfile/select?wt=xml=name; > > indent=on=*System > > AND Memory AND (OEM OR Retail)*=50=json&*qf=_text_=_text_* > > =true=edismax > > > > > > > > > > > > A-DATA V-Series 1GB 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) System > > Memory - OEM > > > > > > > > > > > > > > CORSAIR ValueSelect 1GB 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) > > System Memory - Retail > > > > > > > > > > > > > > CORSAIR XMS 2GB (2 x 1GB) 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) > > Dual Channel Kit System Memory - Retail > > > > > > > > > > > > > > *Below is the parsed query* > > > > > > +(+(_text_:system) +(_text_:memory) +((_text_:oem) (_text_:retail))) > > ((_text_:"system memory") (_text_:"memory oem") (_text_:"oem retail")) > > > > > > In case if you are in such scenarios where you need to knwo what query > will > > form, then you could us the debug=true to know more about the query & > > timings of different component. > > > > *And when the ps2 is not specified default ps will be applied on pf2.* > > > > I hope this helps. > > > > With Regards > > Aman Tandon > > > > On Mon, Jul 31, 2017 at 4:18 AM, Niraj Aswani <nirajasw...@gmail.com> > > wrote: > > > > > Hi, > > > > > > I am using solr 4.4 and bit confused about how does the edismax parser > > > treat the pf2 parameter when both the AND and OR operators are used in > > the > > > query with ps2=0 > > > > > > For example: > > > > > > pf2=title^100 > > > q=HDMI AND Video AND (Wire OR Cable) > > > > > > Should I expect it to check the following bigram phrases? > > > > > > hdmi video > > > video wire > > > video cable > > > > > > Regards > > > Niraj > > > > > >
Re: edismax, pf2 and use of both AND and OR parameter
Hi Niraj, Should I expect it to check the following bigram phrases? Yes it will check. ex- documents & query is given below http://localhost:8983/solr/myfile/select?wt=xml=name=on=*System AND Memory AND (OEM OR Retail)*=50=json&*qf=_text_=_text_* =true=edismax A-DATA V-Series 1GB 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) System Memory - OEM CORSAIR ValueSelect 1GB 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) System Memory - Retail CORSAIR XMS 2GB (2 x 1GB) 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) Dual Channel Kit System Memory - Retail *Below is the parsed query* +(+(_text_:system) +(_text_:memory) +((_text_:oem) (_text_:retail))) ((_text_:"system memory") (_text_:"memory oem") (_text_:"oem retail")) If you are in such a scenario where you need to know what query will be formed, you could use debug=true to learn more about the query and the timings of the different components. *And when ps2 is not specified, the default ps will be applied to pf2.* I hope this helps. With Regards Aman Tandon On Mon, Jul 31, 2017 at 4:18 AM, Niraj Aswani <nirajasw...@gmail.com> wrote: > Hi, > > I am using solr 4.4 and bit confused about how does the edismax parser > treat the pf2 parameter when both the AND and OR operators are used in the > query with ps2=0 > > For example: > > pf2=title^100 > q=HDMI AND Video AND (Wire OR Cable) > > Should I expect it to check the following bigram phrases? > > hdmi video > video wire > video cable > > Regards > Niraj >
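The parsed query above shows what pf2 actually does: it pairs up adjacent terms in query order and ignores the Boolean grouping. A minimal Python sketch of that observed behavior (an illustration only, not Solr's actual implementation):

```python
def pf2_phrases(terms):
    # pf2 builds boost phrases from adjacent term pairs in query order,
    # ignoring any Boolean grouping such as (OEM OR Retail)
    return [f'"{a} {b}"' for a, b in zip(terms, terms[1:])]

# Terms, in query order, for: System AND Memory AND (OEM OR Retail)
print(pf2_phrases(["system", "memory", "oem", "retail"]))
# -> ['"system memory"', '"memory oem"', '"oem retail"']
```

This matches the `(_text_:"system memory") (_text_:"memory oem") (_text_:"oem retail")` clauses seen in the debug output, including the "oem retail" phrase that Niraj did not expect.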
Re: Problems retrieving large documents
Hello, Kindly check the Solr logs when you are hitting the query. Attach them here so that I can give more insight. To me it looks like an OOM, but check the Solr logs; I hope we can get more information from there. On Sat, Jul 29, 2017, 14:35 SOLR6932 wrote: > Hey all, > I am using Solr 4.10.3 and my collection consists around 2300 large > documents that are distributed across a number of shards. Each document is > estimated to be around 50-70 megabytes. The queries that I run are > sophisticated, involve a range of parameters and diverse query filters. > Whenever I wish to retrieve all the returned document fields (fl:* [around > 50 fields in my schema]), I receive an impossible exception - specifically > /org.apache.solr.common.SolrException: Impossible Exception/ that is logged > by both SolrCore and SolrDispachFilter. Has anyone experienced a similar > problem and knows how to solve this issue? > Thanks in advance, > Louie. > > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Problems-retrieving-large-documents-tp4348169.html > Sent from the Solr - User mailing list archive at Nabble.com. >
Re: solr cloud vs standalone solr
Hello Sara, There is no hard and fast rule; performance depends on caches, RAM, HDD, etc., and on how many resources you can invest to keep performance acceptable. Information on the number of indexed documents and the number of dynamic fields can be found at the link below. I hope this helps. http://lucene.472066.n3.nabble.com/Solr-limitations-td4076250.html On Sat, Jul 29, 2017, 13:23 sara hajili wrote: > hi all, > I want to know when standalone solr can't be sufficient for storing data > and we need to migrate to solr cloud?for example standalone solr take too > much time to return query result or to store document or etc. > > in other word ,what is best capacity and data index size in standalone > solr that doesn't bad effect on query running and data inserting > performance?and after passing this index size i must switch to solr cloud? >
Problem to specify end parameter for range facets
Hi, I want to do range facets with a gap of 10, but I don't know the end value, as it could be very large. How can I do that? Thanks Aman Tandon
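One option (a hedged sketch; `price` is a hypothetical field name) is to pick a reasonable facet.range.end — the parameter is mandatory for range faceting — and let facet.range.other=after report a single count for everything beyond it:

```text
facet=true
&facet.range=price
&facet.range.start=0
&facet.range.end=1000
&facet.range.gap=10
&facet.range.other=after
```

With this, values above the end value are not lost; they show up in one "after" bucket instead of needing an unbounded number of ranges.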
Re: Multilevel sorting in JSON-facet
any help here? With Regards Aman Tandon On Thu, Nov 17, 2016 at 7:16 PM, Wonderful Little Things < amantandon...@gmail.com> wrote: > Hi, > > I want to do the sorting on multiple fields using the JSON-facet API, so > is this available? And if it is, then what would be the syntax? > > Thanks, > Aman Tandon >
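For reference, a basic JSON-facet sort looks like the sketch below (field names hypothetical). The sort option takes a single criterion per facet level — which is why sorting on multiple fields at once is the open question here — though each sub-facet can carry its own sort:

```json
{
  "categories": {
    "type": "terms",
    "field": "cat_s",
    "sort": "count desc",
    "facet": {
      "avg_price": "avg(price)"
    }
  }
}
```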
Solr Job opportunity - Noida, India
Hi Everyone, If anyone is interested in applying for the Solr Developer position in Noida, India, please forward me your resume with your contact number and email. *Company Name: Genpact Headstrong Capital Markets* *Experience required: 3 - 7 years* With Regards Aman Tandon
Re: Help: Lucidwork Fusion documentation
I am looking for the Lucidworks documentation. OK Chris, I will contact Lucidworks then. Thank you. On Friday, June 3, 2016, Chris Hostetter <hossman_luc...@fucit.org> wrote: > > Lucidworks Fusion is a commercial product, not a part of the Apache > Software Foundation - questions about using it are not really appropriate > for this mailing list. You should contact Lucidworks support directly... > > https://lucidworks.com/company/contact/ > > ...with that in mind, the docs for Fusion can be found here... > > https://doc.lucidworks.com/index.html > > > > : Date: Fri, 3 Jun 2016 04:40:57 +0530 > : From: Aman Tandon <amantandon...@gmail.com> > : Reply-To: solr-user@lucene.apache.org > : To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org> > : Subject: Help: Lucidwork Fusion documentation > : > : Hi, > : > : How could I download the Fusion documentation pdf ? If anyone is aware, > : please help me!! > : > : With Regards > : Aman Tandon > : > > -Hoss > http://www.lucidworks.com/ > -- Sent from Gmail Mobile
Help: Lucidwork Fusion documentation
Hi, How can I download the Fusion documentation PDF? If anyone is aware, please help me! With Regards Aman Tandon
Re: Configure it on server
Hi Prateek, Your question is a little ambiguous. Could you please describe more precisely what you want to configure on the server, and what your requirement and problem are? That would make it easier to understand your problem. With Regards Aman Tandon On Wed, Nov 18, 2015 at 4:29 PM, Prateek Sharma <prateek.sha...@amdocs.com> wrote: > Hi, > > Can you help me out how I can configure it on a server? > It was configured on one of our servers but I am unable to replicate it. > > Can you please help. > > Thanks, > Prateek > > This message and the information contained herein is proprietary and > confidential and subject to the Amdocs policy statement, > you may review at http://www.amdocs.com/email_disclaimer.asp >
Re: Exclude documents having same data in two fields
Hi, I tried to use the same as mentioned in the url <http://stackoverflow.com/questions/16258605/query-for-document-that-two-fields-are-equal> . And I used the description field to check because mapping field is multivalued. So I add the fq={!frange%20l=0%20u=1}strdist(title,description,edit) in my url, but I am getting this error. As mentioned below. Please take a look. *Solr Version 4.8.1* *Url is* http://localhost:8150/solr/core1/select?q.alt=*:*=big*,title,catid={!frange%20l=0%20u=1}strdist(title,description,edit)=edismax > > > 500 > 8 > > *:* > edismax > big*,title,catid > {!frange l=0 u=1}strdist(title,description,edit) > > > > > java.lang.RuntimeException at > org.apache.solr.search.ExtendedDismaxQParser$ExtendedDismaxConfiguration.(ExtendedDismaxQParser.java:1455) > at > org.apache.solr.search.ExtendedDismaxQParser.createConfiguration(ExtendedDismaxQParser.java:239) > at > org.apache.solr.search.ExtendedDismaxQParser.(ExtendedDismaxQParser.java:108) > at > org.apache.solr.search.ExtendedDismaxQParserPlugin.createParser(ExtendedDismaxQParserPlugin.java:37) > at org.apache.solr.search.QParser.getParser(QParser.java:315) at > org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:144) > at > org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:197) > at > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) > at org.apache.solr.core.SolrCore.execute(SolrCore.java:1952) at > org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:774) > at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418) > at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207) > at > org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243) > at > org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210) > at > 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222) > at > org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123) > at > org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171) > at > org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99) > at > org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:953) > at > org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118) > at > org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408) > at > org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1023) > at > org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589) > at > org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > > 500 > > > With Regards Aman Tandon On Thu, Oct 8, 2015 at 8:07 PM, Alessandro Benedetti < benedetti.ale...@gmail.com> wrote: > Hi agree with Nutch, > using the Function Range Query Parser, should do your trick : > > > https://lucene.apache.org/solr/5_3_0/solr-core/org/apache/solr/search/FunctionRangeQParserPlugin.html > > Cheers > > On 8 October 2015 at 13:31, NutchDev <nutchsolru...@gmail.com> wrote: > > > Hi Aman, > > > > Have a look at this , it has query time approach also using Solr function > > query, > > > > > > > http://stackoverflow.com/questions/15927893/how-to-check-equality-of-two-solr-fields > > > > > http://stackoverflow.com/questions/16258605/query-for-document-that-two-fields-are-equal > > > > > > > > -- > > View this message in context: > > > http://lucene.472066.n3.nabble.com/Exclude-documents-having-same-data-in-two-fields-tp4233408p4233489.html > > Sent from 
the Solr - User mailing list archive at Nabble.com. > > > > > > -- > -- > > Benedetti Alessandro > Visiting card - http://about.me/alessandro_benedetti > Blog - http://alexbenedetti.blogspot.co.uk > > "Tyger, tyger burning bright > In the forests of the night, > What immortal hand or eye > Could frame thy fearful symmetry?" > > William Blake - Songs of Experience -1794 England >
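For reference, the intended request (decoded; a sketch assuming single-valued title and description fields) would carry parameters along these lines — note that the spaces inside the {!frange} local params must be URL-encoded as %20, as in the original URL:

```text
q.alt=*:*
&defType=edismax
&fl=title,description
&fq={!frange l=0 u=1}strdist(title,description,edit)
```

strdist here returns an edit-distance-based similarity, so the frange bounds select documents whose two field values are within the given similarity range.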
Re: Exclude documents having same data in two fields
okay Thanks With Regards Aman Tandon On Fri, Oct 9, 2015 at 4:25 PM, Upayavira <u...@odoko.co.uk> wrote: > Just beware of performance here. This is fine for smaller indexes, but > for larger ones won't work so well. It will need to do this calculation > for every document in your index, thereby undoing all benefits of having > an inverted index. > > If your index (or resultset) is small enough, it can work, but might > catch you out later. > > Upayavira > > On Fri, Oct 9, 2015, at 10:59 AM, Aman Tandon wrote: > > Hi, > > > > I tried to use the same as mentioned in the url > > < > http://stackoverflow.com/questions/16258605/query-for-document-that-two-fields-are-equal > > > > . > > > > And I used the description field to check because mapping field > > is multivalued. > > > > So I add the fq={!frange%20l=0%20u=1}strdist(title,description,edit) in > > my > > url, but I am getting this error. As mentioned below. Please take a look. > > > > *Solr Version 4.8.1* > > > > *Url is* > > > http://localhost:8150/solr/core1/select?q.alt=*:*=big*,title,catid={!frange%20l=0%20u=1}strdist(title,description,edit)=edismax > > > > > > > > > > > 500 > > > 8 > > > > > > *:* > > > edismax > > > big*,title,catid > > > {!frange l=0 u=1}strdist(title,description,edit) > > > > > > > > > > > > > > > java.lang.RuntimeException at > > > > org.apache.solr.search.ExtendedDismaxQParser$ExtendedDismaxConfiguration.(ExtendedDismaxQParser.java:1455) > > > at > > > > org.apache.solr.search.ExtendedDismaxQParser.createConfiguration(ExtendedDismaxQParser.java:239) > > > at > > > > org.apache.solr.search.ExtendedDismaxQParser.(ExtendedDismaxQParser.java:108) > > > at > > > > org.apache.solr.search.ExtendedDismaxQParserPlugin.createParser(ExtendedDismaxQParserPlugin.java:37) > > > at org.apache.solr.search.QParser.getParser(QParser.java:315) at > > > > org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:144) > > > at > > > > 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:197) > > > at > > > > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) > > > at org.apache.solr.core.SolrCore.execute(SolrCore.java:1952) at > > > > org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:774) > > > at > > > > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418) > > > at > > > > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207) > > > at > > > > org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243) > > > at > > > > org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210) > > > at > > > > org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222) > > > at > > > > org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123) > > > at > > > > org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171) > > > at > > > > org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99) > > > at > > > > org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:953) > > > at > > > > org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118) > > > at > > > > org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408) > > > at > > > > org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1023) > > > at > > > > org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589) > > > at > > > > org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310) > > > at > > > > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > > > at > > > > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > > > at 
java.lang.Thread.run(Thread.java:745) > > > > > > 500 > > > > > > > > > > > > > With Regards > > Aman Tandon > > > > On Thu, Oct 8, 2015 at 8:07 PM, Alessandro Benedetti < > > benedetti.ale
Re: Exclude documents having same data in two fields
Thanks Mikhail for the suggestion. I will try that on Monday and will let you know. *@*Walter This was just a random requirement: to find the documents whose fields are not the same and then reindex only those. I can do a full index, but I was wondering if there might be some function or something. With Regards Aman Tandon On Fri, Oct 9, 2015 at 9:05 PM, Mikhail Khludnev <mkhlud...@griddynamics.com > wrote: > Aman, > > You can invoke Terms Component for the filed M, let it returns terms: > {a,c,d,f} > then you invoke it for field T let it return {b,c,f,e}, > then you intersect both lists (it's quite romantic if they are kept > ordered), you've got {c,f} > and then you applies filter: > fq=-((+M:c +T:c) (+M:f +T:f)) > etc > > > On Thu, Oct 8, 2015 at 8:29 AM, Aman Tandon <amantandon...@gmail.com> > wrote: > > > Hi, > > > > Is there a way in solr to remove all those documents from the search > > results in which two of the fields, *mapping* and *title* is the exactly > > same. > > > > With Regards > > Aman Tandon > > > > > > -- > Sincerely yours > Mikhail Khludnev > Principal Engineer, > Grid Dynamics > > <http://www.griddynamics.com> > <mkhlud...@griddynamics.com> >
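Mikhail's recipe — fetch the term lists for both fields, intersect them, and build one negative filter query — can be sketched in Python (an illustration only; M and T are the hypothetical field names from his example, and the term lists stand in for Terms Component responses):

```python
def build_exclusion_filter(terms_m, terms_t):
    """Intersect the term lists returned for fields M and T, then build a
    negative filter query excluding docs where both fields hold the same
    shared term."""
    shared = sorted(set(terms_m) & set(terms_t))
    clauses = " ".join(f"(+M:{t} +T:{t})" for t in shared)
    return f"fq=-({clauses})"

print(build_exclusion_filter(["a", "c", "d", "f"], ["b", "c", "f", "e"]))
# -> fq=-((+M:c +T:c) (+M:f +T:f))
```

This reproduces the fq from Mikhail's example; note the filter only grows with the number of terms shared by both fields, not with index size.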
Re: Exclude documents having same data in two fields
No Susheel. As our index size is 62 GB, it seems hard to find those records. With Regards Aman Tandon On Fri, Oct 9, 2015 at 7:30 PM, Susheel Kumar <susheel2...@gmail.com> wrote: > Hi Aman, Did the problem resolved or still having some errors. > > Thnx > > On Fri, Oct 9, 2015 at 8:28 AM, Aman Tandon <amantandon...@gmail.com> > wrote: > > > okay Thanks > > > > With Regards > > Aman Tandon > > > > On Fri, Oct 9, 2015 at 4:25 PM, Upayavira <u...@odoko.co.uk> wrote: > > > > > Just beware of performance here. This is fine for smaller indexes, but > > > for larger ones won't work so well. It will need to do this calculation > > > for every document in your index, thereby undoing all benefits of > having > > > an inverted index. > > > > > > If your index (or resultset) is small enough, it can work, but might > > > catch you out later. > > > > > > Upayavira > > > > > > On Fri, Oct 9, 2015, at 10:59 AM, Aman Tandon wrote: > > > > Hi, > > > > > > > > I tried to use the same as mentioned in the url > > > > < > > > > > > http://stackoverflow.com/questions/16258605/query-for-document-that-two-fields-are-equal > > > > > > > > . > > > > > > > > And I used the description field to check because mapping field > > > > is multivalued. > > > > > > > > So I add the fq={!frange%20l=0%20u=1}strdist(title,description,edit) > in > > > > my > > > > url, but I am getting this error. As mentioned below. Please take a > > look.
> > > > > > > > *Solr Version 4.8.1* > > > > > > > > *Url is* > > > > > > > > > > http://localhost:8150/solr/core1/select?q.alt=*:*=big*,title,catid={!frange%20l=0%20u=1}strdist(title,description,edit)=edismax > > > > > > > > > > > > > > > > > > > 500 > > > > > 8 > > > > > > > > > > *:* > > > > > edismax > > > > > big*,title,catid > > > > > {!frange l=0 > u=1}strdist(title,description,edit) > > > > > > > > > > > > > > > > > > > > > > > > > java.lang.RuntimeException at > > > > > > > > > > > org.apache.solr.search.ExtendedDismaxQParser$ExtendedDismaxConfiguration.(ExtendedDismaxQParser.java:1455) > > > > > at > > > > > > > > > > > org.apache.solr.search.ExtendedDismaxQParser.createConfiguration(ExtendedDismaxQParser.java:239) > > > > > at > > > > > > > > > > > org.apache.solr.search.ExtendedDismaxQParser.(ExtendedDismaxQParser.java:108) > > > > > at > > > > > > > > > > > org.apache.solr.search.ExtendedDismaxQParserPlugin.createParser(ExtendedDismaxQParserPlugin.java:37) > > > > > at org.apache.solr.search.QParser.getParser(QParser.java:315) at > > > > > > > > > > > org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:144) > > > > > at > > > > > > > > > > > org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:197) > > > > > at > > > > > > > > > > > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) > > > > > at org.apache.solr.core.SolrCore.execute(SolrCore.java:1952) at > > > > > > > > > > > org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:774) > > > > > at > > > > > > > > > > > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418) > > > > > at > > > > > > > > > > > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207) > > > > > at > > > > > > > > > > > org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243) > > > > > at > > > > > > > > > > > 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210) > > > > > at > > > > > > > > > > > org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222) > > > > > at > > > > > > > > > &g
Re: Exclude documents having same data in two fields
But I want to do it at run time, without indexing an extra field. With Regards Aman Tandon On Thu, Oct 8, 2015 at 11:55 AM, NutchDev <nutchsolru...@gmail.com> wrote: > One option could be creating another boolean field field1_equals_field2 and > set it to true for documents matching it while indexing. Use this field as > a > filter criteria while querying solr. > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Exclude-documents-having-same-data-in-two-fields-tp4233408p4233411.html > Sent from the Solr - User mailing list archive at Nabble.com. >
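For completeness, NutchDev's index-time approach can be sketched as a small preprocessing step applied to each document before it is sent to Solr (field names hypothetical):

```python
def add_equality_flag(doc):
    """Add a boolean field recording whether title and mapping match,
    so queries can filter on it cheaply, e.g. fq=-title_equals_mapping:true."""
    doc["title_equals_mapping"] = doc.get("title") == doc.get("mapping")
    return doc

doc = add_equality_flag({"id": "1", "title": "red shoes", "mapping": "red shoes"})
print(doc["title_equals_mapping"])  # -> True
```

The trade-off the thread discusses: this needs a reindex, but filtering on a precomputed boolean is far cheaper than evaluating a function query over every document at search time.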
Which one is faster: synonym_edismax or edismax?
Hi, Currently we are using the *synonym_edismax query parser* plugin to handle multi-word synonyms. I want to know which is faster: *edismax* or *synonym_edismax*. As we have only a small number of multi-word synonyms in our dictionary, we are thinking of using the standard edismax query parser. Any suggestions or observations would be helpful. With Regards Aman Tandon
Exclude documents having same data in two fields
Hi, Is there a way in Solr to remove from the search results all documents in which two of the fields, *mapping* and *title*, are exactly the same? With Regards Aman Tandon
Re: How to know index file in OS Cache
okay thanks Markus :) With Regards Aman Tandon On Fri, Sep 25, 2015 at 12:27 PM, Markus Jelsma <markus.jel...@openindex.io> wrote: > Hello - as far as i remember, you don't. A file itself is not the unit to > cache, but blocks are. > Markus > > > -Original message- > > From:Aman Tandon <amantandon...@gmail.com> > > Sent: Friday 25th September 2015 5:56 > > To: solr-user@lucene.apache.org > > Subject: How to know index file in OS Cache > > > > Hi, > > > > Is there any way to know that the index file/s is present in the OS cache > > or RAM. I want to check if the index is present in the RAM or in OS cache > > and which files are not in either of them. > > > > With Regards > > Aman Tandon > > >
Re: How to know index file in OS Cache
Awesome, thank you Mikhail. This is what I was looking for. This was just a random question that popped up in my mind, so I asked it on the group. With Regards Aman Tandon On Fri, Sep 25, 2015 at 2:49 PM, Mikhail Khludnev < mkhlud...@griddynamics.com> wrote: > What about Linux: > $less /proc//maps > $pmap > > On Fri, Sep 25, 2015 at 10:57 AM, Markus Jelsma < > markus.jel...@openindex.io> > wrote: > > > Hello - as far as i remember, you don't. A file itself is not the unit to > > cache, but blocks are. > > Markus > > > > > > -Original message- > > > From:Aman Tandon <amantandon...@gmail.com> > > > Sent: Friday 25th September 2015 5:56 > > > To: solr-user@lucene.apache.org > > > Subject: How to know index file in OS Cache > > > > > > Hi, > > > > > > Is there any way to know that the index file/s is present in the OS > cache > > > or RAM. I want to check if the index is present in the RAM or in OS > cache > > > and which files are not in either of them. > > > > > > With Regards > > > Aman Tandon > > > > > > > > > -- > Sincerely yours > Mikhail Khludnev > Principal Engineer, > Grid Dynamics > > <http://www.griddynamics.com> > <mkhlud...@griddynamics.com> >
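Mikhail's /proc/<pid>/maps suggestion can be scripted: each line of that file describes one mapped region, with the backing file path in the last column. A Python sketch (the sample line is fabricated for illustration; note this shows which index files are memory-mapped by the process, not which pages are actually resident in RAM — per-page residency would need mincore(2) or a tool such as vmtouch):

```python
def mapped_index_files(maps_text):
    """Parse /proc/<pid>/maps output and list the memory-mapped files
    whose path contains an index directory."""
    files = set()
    for line in maps_text.splitlines():
        parts = line.split(None, 5)  # last column (if any) is the pathname
        if len(parts) == 6 and "/index/" in parts[5]:
            files.add(parts[5])
    return sorted(files)

sample = (
    "7f2a00000000-7f2a10000000 r--s 00000000 08:01 123 /var/solr/data/index/_0.cfs\n"
    "7f2a20000000-7f2a20001000 r-xp 00000000 08:01 456 /usr/lib/libc.so\n"
)
print(mapped_index_files(sample))  # -> ['/var/solr/data/index/_0.cfs']
```

In practice one would read `/proc/<solr-pid>/maps` directly, or run `pmap <solr-pid>` for the same information.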
How to know index file in OS Cache
Hi, Is there any way to know whether the index file(s) are present in the OS cache or in RAM? I want to check if the index is in RAM or in the OS cache, and which files are in neither of them. With Regards Aman Tandon
Re: solr update dynamic field generates multiValued error
Sure. thank you Upayavira With Regards Aman Tandon On Mon, Sep 21, 2015 at 6:01 PM, Upayavira <u...@odoko.co.uk> wrote: > You cannot do multi valued fields with LatLongType fields. Therefore, if > that is a need, you will have to investigate RPT fields. > > I'm not sure how you do distance boosting there, so I'd suggest you ask > that as a separate question with a new title. > > Upayavira > > On Mon, Sep 21, 2015, at 01:27 PM, Aman Tandon wrote: > > We are using LatLonType to use the gradual boosting / distance based > > boosting of search results. > > > > With Regards > > Aman Tandon > > > > On Mon, Sep 21, 2015 at 5:39 PM, Upayavira <u...@odoko.co.uk> wrote: > > > > > Aman, > > > > > > I cannot promise to answer questions promptly - like most people on > this > > > list, we answer if/when we have a gap in our workload. > > > > > > The reason you are getting the non multiValued field error is because > > > your latlon field does not have multiValued="true" enabled. > > > > > > However, the field type definition notes that this field type does not > > > support multivalued fields, so you're not gonna get anywhere with that > > > route. > > > > > > Have you tried the location_rpt type? > > > (solr.SpatialRecursivePrefixTreeFieldType). This is a newer, and as I > > > understand it, far more flexible field type - for example, you can > index > > > shapes into it as well as locations. 
> > > > > > I'd suggest you read this page, and pay particular attention to > mentions > > > of RPT: > > > > > > https://cwiki.apache.org/confluence/display/solr/Spatial+Search > > > > > > Upayavira > > > > > > On Mon, Sep 21, 2015, at 10:36 AM, Aman Tandon wrote: > > > > Upayavira, please help > > > > > > > > With Regards > > > > Aman Tandon > > > > > > > > On Mon, Sep 21, 2015 at 2:38 PM, Aman Tandon < > amantandon...@gmail.com> > > > > wrote: > > > > > > > > > Error is > > > > > > > > > > > > > > > > > > > > 400 > > > > name="QTime">28ERROR: > > > > > [doc=9474144846] multiple values encountered for non multiValued > field > > > > > latlon_0_coordinate: [11.0183, 11.0183] > > > > name="code">400 > > > > > > > > > > > > > > > And my configuration is > > > > > > > > > > > > > > > > > > > stored="true" /> > > > > > > > > > > > > > > > > > > > subFieldSuffix="_coordinate"/> > > > > > > > > > >> > > > required="false" multiValued="false" /> > > > > > > > > > > how you know it is because of stored="true"? > > > > > > > > > > As Erick replied in the last mail thread, > > > > > I'm not getting any multiple values in the _coordinate fields. > > > However, I > > > > > _do_ get the error if my dynamic *_coordinate field is set to > > > > > stored="true". > > > > > > > > > > And stored="true" is mandatory for using the atomic updates. > > > > > > > > > > With Regards > > > > > Aman Tandon > > > > > > > > > > On Mon, Sep 21, 2015 at 2:22 PM, Upayavira <u...@odoko.co.uk> wrote: > > > > > > > > > >> Can you show the error you are getting, and how you know it is > because > > > > >> of stored="true"? > > > > >> > > > > >> Upayavira > > > > >> > > > > >> On Mon, Sep 21, 2015, at 09:30 AM, Aman Tandon wrote: > > > > >> > Hi Erick, > > > > >> > > > > > >> > I am getting the same error because my dynamic field > *_coordinate is > > > > >> > stored="true". > > > > >> > How can I get rid of this error? > > > > >> > > > > > >> > And I have to use the atomic update. Please help!! 
> > > > >> > > > > > >> > With Regards > > > > >> > Aman Tandon > > > > >>
Re: solr update dynamic field generates multiValued error
We are using LatLonType to use the gradual boosting / distance based boosting of search results. With Regards Aman Tandon On Mon, Sep 21, 2015 at 5:39 PM, Upayavira <u...@odoko.co.uk> wrote: > Aman, > > I cannot promise to answer questions promptly - like most people on this > list, we answer if/when we have a gap in our workload. > > The reason you are getting the non multiValued field error is because > your latlon field does not have multiValued="true" enabled. > > However, the field type definition notes that this field type does not > support multivalued fields, so you're not gonna get anywhere with that > route. > > Have you tried the location_rpt type? > (solr.SpatialRecursivePrefixTreeFieldType). This is a newer, and as I > understand it, far more flexible field type - for example, you can index > shapes into it as well as locations. > > I'd suggest you read this page, and pay particular attention to mentions > of RPT: > > https://cwiki.apache.org/confluence/display/solr/Spatial+Search > > Upayavira > > On Mon, Sep 21, 2015, at 10:36 AM, Aman Tandon wrote: > > Upayavira, please help > > > > With Regards > > Aman Tandon > > > > On Mon, Sep 21, 2015 at 2:38 PM, Aman Tandon <amantandon...@gmail.com> > > wrote: > > > > > Error is > > > > > > > > > > > > 400 > > name="QTime">28ERROR: > > > [doc=9474144846] multiple values encountered for non multiValued field > > > latlon_0_coordinate: [11.0183, 11.0183] > > name="code">400 > > > > > > > > > And my configuration is > > > > > > > > > > > stored="true" /> > > > > > > > > > > > subFieldSuffix="_coordinate"/> > > > > > >> > required="false" multiValued="false" /> > > > > > > how you know it is because of stored="true"? > > > > > > As Erick replied in the last mail thread, > > > I'm not getting any multiple values in the _coordinate fields. > However, I > > > _do_ get the error if my dynamic *_coordinate field is set to > > > stored="true". > > > > > > And stored="true" is mandatory for using the atomic updates. 
> > > > > > With Regards > > > Aman Tandon > > > > > > On Mon, Sep 21, 2015 at 2:22 PM, Upayavira <u...@odoko.co.uk> wrote: > > > > > >> Can you show the error you are getting, and how you know it is because > > >> of stored="true"? > > >> > > >> Upayavira > > >> > > >> On Mon, Sep 21, 2015, at 09:30 AM, Aman Tandon wrote: > > >> > Hi Erick, > > >> > > > >> > I am getting the same error because my dynamic field *_coordinate is > > >> > stored="true". > > >> > How can I get rid of this error? > > >> > > > >> > And I have to use the atomic update. Please help!! > > >> > > > >> > With Regards > > >> > Aman Tandon > > >> > > > >> > On Tue, Aug 5, 2014 at 10:27 PM, Franco Giacosa <fgiac...@gmail.com > > > > >> > wrote: > > >> > > > >> > > Hey Erick, i think that you were right, there was a mix in the > > >> schemas and > > >> > > that was generating the error on some of the documents. > > >> > > > > >> > > Thanks for the help guys! > > >> > > > > >> > > > > >> > > 2014-08-05 1:28 GMT-03:00 Erick Erickson <erickerick...@gmail.com > >: > > >> > > > > >> > > > Hmmm, I jus tried this with a 4.x build and I can update the > > >> document > > >> > > > multiple times without a problem. I just indexed the standard > > >> exampledocs > > >> > > > and then updated a doc like this (vidcard.xml was the base): > > >> > > > > > >> > > > > > >> > > > > > >> > > > EN7800GTX/2DHTV/256M > > >> > > > > > >> > > > eoe changed this > > >> puppy > > >> > > > > > >> > > > > > >> > > > > > >> > > > > > >> > > > I'm not getting any multiple values in the _coordinate
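A sketch of the configuration this thread converges on (field and type names hypothetical; based on Erick's observation that the internal *_coordinate sub-fields must stay stored="false", while the LatLonType field itself can be stored to support atomic updates):

```xml
<fieldType name="location" class="solr.LatLonType" subFieldSuffix="_coordinate"/>
<!-- user-facing field: holds the original "lat,lon" string, stored for atomic updates -->
<field name="latlon" type="location" indexed="true" stored="true" multiValued="false"/>
<!-- internal sub-fields created by LatLonType: indexed only, never stored -->
<dynamicField name="*_coordinate" type="tdouble" indexed="true" stored="false"/>
```

Storing the derived *_coordinate fields makes atomic updates re-submit them alongside the values LatLonType regenerates, producing the "multiple values encountered for non multiValued field" error seen above.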
Spatial Search: distance based boosting
Hi, Is there a way in Solr to do distance-based boosting using a spatial RPT field? With Regards Aman Tandon
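In older Solr versions the RPT field type does not work with `geodist()` directly, but it can report distance through `{!geofilt score=distance}`, which can then feed a boost function. A minimal sketch of the request parameters, assuming a hypothetical RPT field named `geo_rpt` (the `recip` constants are illustrative and need tuning):

```
q=jute bags
defType=edismax
qf=title

# {!geofilt score=distance filter=false ...} returns each document's
# distance (km) from pt as its score; recip() turns smaller distances
# into larger multiplicative boosts.
boost=recip(query({!geofilt score=distance filter=false sfield=geo_rpt pt=28.61,77.20 d=100}),1,10,10)
```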
Re: solr update dynamic field generates multiValued error
Upayavira, please help With Regards Aman Tandon On Mon, Sep 21, 2015 at 2:38 PM, Aman Tandon <amantandon...@gmail.com> wrote: > Error is > > > > 400 name="QTime">28ERROR: > [doc=9474144846] multiple values encountered for non multiValued field > latlon_0_coordinate: [11.0183, 11.0183] name="code">400 > > > And my configuration is > > > stored="true" /> > > > subFieldSuffix="_coordinate"/> > >required="false" multiValued="false" /> > > how you know it is because of stored="true"? > > As Erick replied in the last mail thread, > I'm not getting any multiple values in the _coordinate fields. However, I > _do_ get the error if my dynamic *_coordinate field is set to > stored="true". > > And stored="true" is mandatory for using the atomic updates. > > With Regards > Aman Tandon > > On Mon, Sep 21, 2015 at 2:22 PM, Upayavira <u...@odoko.co.uk> wrote: > >> Can you show the error you are getting, and how you know it is because >> of stored="true"? >> >> Upayavira >> >> On Mon, Sep 21, 2015, at 09:30 AM, Aman Tandon wrote: >> > Hi Erick, >> > >> > I am getting the same error because my dynamic field *_coordinate is >> > stored="true". >> > How can I get rid of this error? >> > >> > And I have to use the atomic update. Please help!! >> > >> > With Regards >> > Aman Tandon >> > >> > On Tue, Aug 5, 2014 at 10:27 PM, Franco Giacosa <fgiac...@gmail.com> >> > wrote: >> > >> > > Hey Erick, i think that you were right, there was a mix in the >> schemas and >> > > that was generating the error on some of the documents. >> > > >> > > Thanks for the help guys! >> > > >> > > >> > > 2014-08-05 1:28 GMT-03:00 Erick Erickson <erickerick...@gmail.com>: >> > > >> > > > Hmmm, I jus tried this with a 4.x build and I can update the >> document >> > > > multiple times without a problem. 
I just indexed the standard >> exampledocs >> > > > and then updated a doc like this (vidcard.xml was the base): >> > > > >> > > > >> > > > >> > > > EN7800GTX/2DHTV/256M >> > > > >> > > > eoe changed this >> puppy >> > > > >> > > > >> > > > >> > > > >> > > > I'm not getting any multiple values in the _coordinate fields. >> However, I >> > > > _do_ get the error if my dynamic *_coordinate field is set to >> > > > stored="true". >> > > > >> > > > Did you perhaps change this at some point? Whenever I change the >> schema, >> > > I >> > > > try to 'rm -rf solr/collection/data' just to be sure I've purged all >> > > traces >> > > > of the former schema definition. >> > > > >> > > > Best, >> > > > Erick >> > > > >> > > > >> > > > On Mon, Aug 4, 2014 at 7:04 PM, Franco Giacosa <fgiac...@gmail.com> >> > > wrote: >> > > > >> > > > > No, they are not declarad explicitly. >> > > > > >> > > > > This is how they are created: >> > > > > >> > > > > > stored="true"/> >> > > > > >> > > > > > > > > > stored="false"/> >> > > > > >> > > > > > > > > > subFieldSuffix="_coordinate"/> >> > > > > >> > > > > >> > > > > >> > > > > >> > > > > 2014-08-04 22:28 GMT-03:00 Michael Ryan <mr...@moreover.com>: >> > > > > >> > > > > > Are the latLong_0_coordinate and latLong_1_coordinate fields >> > > populated >> > > > > > using copyField? If so, this sounds like it could be >> > > > > > https://issues.apache.org/jira/browse/SOLR-3502. >> > > > > > >> > > > > > -Michael >> > > > > > >> > > > > > -Original Message- >> > > > > > From: Franco Giacosa [mailto:fgiac...@gmail.com] >> > > > > > Sent: Monday, August 04, 2014 9:05 PM &
Re: solr update dynamic field generates multiValued error
Hi Erick, I am getting the same error because my dynamic field *_coordinate is stored="true". How can I get rid of this error? And I have to use the atomic update. Please help!! With Regards Aman Tandon On Tue, Aug 5, 2014 at 10:27 PM, Franco Giacosa <fgiac...@gmail.com> wrote: > Hey Erick, i think that you were right, there was a mix in the schemas and > that was generating the error on some of the documents. > > Thanks for the help guys! > > > 2014-08-05 1:28 GMT-03:00 Erick Erickson <erickerick...@gmail.com>: > > > Hmmm, I jus tried this with a 4.x build and I can update the document > > multiple times without a problem. I just indexed the standard exampledocs > > and then updated a doc like this (vidcard.xml was the base): > > > > > > > > EN7800GTX/2DHTV/256M > > > > eoe changed this puppy > > > > > > > > > > I'm not getting any multiple values in the _coordinate fields. However, I > > _do_ get the error if my dynamic *_coordinate field is set to > > stored="true". > > > > Did you perhaps change this at some point? Whenever I change the schema, > I > > try to 'rm -rf solr/collection/data' just to be sure I've purged all > traces > > of the former schema definition. > > > > Best, > > Erick > > > > > > On Mon, Aug 4, 2014 at 7:04 PM, Franco Giacosa <fgiac...@gmail.com> > wrote: > > > > > No, they are not declarad explicitly. > > > > > > This is how they are created: > > > > > > > > > > > > > > stored="false"/> > > > > > > > > subFieldSuffix="_coordinate"/> > > > > > > > > > > > > > > > 2014-08-04 22:28 GMT-03:00 Michael Ryan <mr...@moreover.com>: > > > > > > > Are the latLong_0_coordinate and latLong_1_coordinate fields > populated > > > > using copyField? If so, this sounds like it could be > > > > https://issues.apache.org/jira/browse/SOLR-3502. 
> > > > > > > > -Michael > > > > > > > > -Original Message- > > > > From: Franco Giacosa [mailto:fgiac...@gmail.com] > > > > Sent: Monday, August 04, 2014 9:05 PM > > > > To: solr-user@lucene.apache.org > > > > Subject: solr update dynamic field generates multiValued error > > > > > > > > Hello everyone, this is my first time posting a question, so forgive > me > > > if > > > > i'm missing something. > > > > > > > > This is my problem: > > > > > > > > I have a schema.xml that has the following latLong information > > > > > > > > The dynamicField generates 2 dynamic fields that have the lat and the > > > long > > > > (latLong_0_coordinate and latLong_1_coordinate) > > > > > > > > So for example a document will have > > > > > > > > "latLong_0_coordinate": 40.4114, "latLong_1_coordinate": -74.1031, > > > > "latLong": "40.4114,-74.1031", > > > > > > > > Now when I try to update a document (i don't update the latLong > field. > > I > > > > just update other parts of the document using atomic update) solr > > > > re-creates the dynamicField and adds the same value again, like its > > using > > > > add instead of set. So when i do an update the fields of the doc look > > > like > > > > this > > > > > > > > "latLong_0_coordinate": [40.4114,40.4114] "latLong_1_coordinate": > > > > [-74.1031,-74.1031] "latLong": "40.4114,-74.1031", > > > > > > > > So the dynamicFields now have 2 values, so the next time that I want > to > > > > update the document a schema error is throw because im trying to > store > > a > > > > collection into a none multivalued field. > > > > > > > > > > > > Thanks in advanced. > > > > > > > > > >
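For reference, an atomic (partial) update uses modifiers such as `set` instead of re-sending the whole document; Solr re-reads all stored fields internally to rebuild the document, which is why derived fields like the `*_coordinate` sub-fields must not also be stored. A hedged sketch (URL, collection name, and field values are illustrative; the id is taken from the error in this thread):

```
curl 'http://localhost:8983/solr/collection1/update?commit=true' \
  -H 'Content-Type: application/json' \
  -d '[{"id": "9474144846", "title": {"set": "jute bags - updated"}}]'
```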
Re: solr update dynamic field generates multiValued error
Error is 40028ERROR: [doc=9474144846] multiple values encountered for non multiValued field latlon_0_coordinate: [11.0183, 11.0183]400 And my configuration is how you know it is because of stored="true"? As Erick replied in the last mail thread, I'm not getting any multiple values in the _coordinate fields. However, I _do_ get the error if my dynamic *_coordinate field is set to stored="true". And stored="true" is mandatory for using the atomic updates. With Regards Aman Tandon On Mon, Sep 21, 2015 at 2:22 PM, Upayavira <u...@odoko.co.uk> wrote: > Can you show the error you are getting, and how you know it is because > of stored="true"? > > Upayavira > > On Mon, Sep 21, 2015, at 09:30 AM, Aman Tandon wrote: > > Hi Erick, > > > > I am getting the same error because my dynamic field *_coordinate is > > stored="true". > > How can I get rid of this error? > > > > And I have to use the atomic update. Please help!! > > > > With Regards > > Aman Tandon > > > > On Tue, Aug 5, 2014 at 10:27 PM, Franco Giacosa <fgiac...@gmail.com> > > wrote: > > > > > Hey Erick, i think that you were right, there was a mix in the schemas > and > > > that was generating the error on some of the documents. > > > > > > Thanks for the help guys! > > > > > > > > > 2014-08-05 1:28 GMT-03:00 Erick Erickson <erickerick...@gmail.com>: > > > > > > > Hmmm, I jus tried this with a 4.x build and I can update the document > > > > multiple times without a problem. I just indexed the standard > exampledocs > > > > and then updated a doc like this (vidcard.xml was the base): > > > > > > > > > > > > > > > > EN7800GTX/2DHTV/256M > > > > > > > > eoe changed this puppy > > > > > > > > > > > > > > > > > > > > I'm not getting any multiple values in the _coordinate fields. > However, I > > > > _do_ get the error if my dynamic *_coordinate field is set to > > > > stored="true". > > > > > > > > Did you perhaps change this at some point? 
Whenever I change the > schema, > > > I > > > > try to 'rm -rf solr/collection/data' just to be sure I've purged all > > > traces > > > > of the former schema definition. > > > > > > > > Best, > > > > Erick > > > > > > > > > > > > On Mon, Aug 4, 2014 at 7:04 PM, Franco Giacosa <fgiac...@gmail.com> > > > wrote: > > > > > > > > > No, they are not declarad explicitly. > > > > > > > > > > This is how they are created: > > > > > > > > > > stored="true"/> > > > > > > > > > > > > > > stored="false"/> > > > > > > > > > > > > > > subFieldSuffix="_coordinate"/> > > > > > > > > > > > > > > > > > > > > > > > > > 2014-08-04 22:28 GMT-03:00 Michael Ryan <mr...@moreover.com>: > > > > > > > > > > > Are the latLong_0_coordinate and latLong_1_coordinate fields > > > populated > > > > > > using copyField? If so, this sounds like it could be > > > > > > https://issues.apache.org/jira/browse/SOLR-3502. > > > > > > > > > > > > -Michael > > > > > > > > > > > > -Original Message- > > > > > > From: Franco Giacosa [mailto:fgiac...@gmail.com] > > > > > > Sent: Monday, August 04, 2014 9:05 PM > > > > > > To: solr-user@lucene.apache.org > > > > > > Subject: solr update dynamic field generates multiValued error > > > > > > > > > > > > Hello everyone, this is my first time posting a question, so > forgive > > > me > > > > > if > > > > > > i'm missing something. > > > > > > > > > > > > This is my problem: > > > > > > > > > > > > I have a schema.xml that has the following latLong information > > > > > > > > > > > > The dynamicField generates 2 dynamic fields that have the lat > and the > > > > > long > > > > > > (latLong_0_coordinate and latLong_1_coordinate) > > > > > > > > > > > > So for example a document will have > > > > > > > > > > > > "latLong_0_coordinate": 40.4114, "latLong_1_coordinate": > -74.1031, > > > > > > "latLong": "40.4114,-74.1031", > > > > > > > > > > > > Now when I try to update a document (i don't update the latLong > > > field. 
> > > > I > > > > > > just update other parts of the document using atomic update) solr > > > > > > re-creates the dynamicField and adds the same value again, like > its > > > > using > > > > > > add instead of set. So when i do an update the fields of the doc > look > > > > > like > > > > > > this > > > > > > > > > > > > "latLong_0_coordinate": [40.4114,40.4114] "latLong_1_coordinate": > > > > > > [-74.1031,-74.1031] "latLong": "40.4114,-74.1031", > > > > > > > > > > > > So the dynamicFields now have 2 values, so the next time that I > want > > > to > > > > > > update the document a schema error is throw because im trying to > > > store > > > > a > > > > > > collection into a none multivalued field. > > > > > > > > > > > > > > > > > > Thanks in advanced. > > > > > > > > > > > > > > > > > > >
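The workaround discussed in this thread (and tracked as SOLR-3502) is to keep the generated coordinate sub-fields unstored, so atomic updates cannot re-add their values. A schema sketch, with field names as used in the thread but attribute details assumed:

```xml
<fieldType name="location" class="solr.LatLonType"
           subFieldSuffix="_coordinate"/>

<!-- the source field stays stored, so atomic updates still work -->
<field name="latLong" type="location" indexed="true" stored="true"/>

<!-- stored="false" is the key: stored coordinate sub-fields get
     re-added on every atomic update, producing the multiValued error -->
<dynamicField name="*_coordinate" type="tdouble"
              indexed="true" stored="false"/>
```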
Re: How to reordering search result by some function query
> > boost=product_guideline_score Thank you Upayavira. Leonardo, thanks for the suggestion. But I think boost parameter will work great for us. Thank you so much for your help. With Regards Aman Tandon On Thu, Sep 10, 2015 at 5:11 PM, Upayavira <u...@odoko.co.uk> wrote: > Aman, > > If you are using edismax then what you have written is just fine. > > For Lucene query parser queries, wrap them with the boost query parser: > > q={!boost b=product_guideline_score v=$qq}=jute > > Note in your example you don't need product(), just do > boost=product_guideline_score > > Upayavira > > On Thu, Sep 10, 2015, at 07:33 AM, Aman Tandon wrote: > > Hi, > > > > I figured it out to implement the same. I will be doing this by using the > > boost parameter > > > > e.g. http://server:8112/solr/products/select?q=jute=title > > *=product(1,product_guideline_score)* > > > > If there is any other alternative then please suggest. > > > > With Regards > > Aman Tandon > > > > On Thu, Sep 10, 2015 at 11:02 AM, Aman Tandon <amantandon...@gmail.com> > > wrote: > > > > > Hi, > > > > > > I have a requirement to reorder the search results by multiplying the > *text relevance > > > score* of a product with the *product_guideline_score,* which will be > > > stored in index and will have some floating point number. > > > > > > e.g. 
On searching the *jute* in title if we got some results ID1 & ID2 > > > > > > ID1 -> title = jute > > > score = 8.0 > > > * product_guideline_score = 2.0* > > > > > > ID2 -> title = jute bags > > > score = 7.5 > > > * product_guideline_score** = 2.2* > > > > > > So the new score should be like this > > > > > > ID1 -> title = jute > > > score = *product_score * 8 = 16.0* > > > * product_guideline_score** = 2.0* > > > > > > ID2 -> title = jute bags > > > score = *product_score * 7.5 = 16.5* > > > * product_guideline_score** = 2.2* > > > > > > *So new ordering should be* > > > > > > ID2 -> title = jute bags > > > score* = 16.5* > > > > > > ID1 -> title = jute > > > score =* 16.0* > > > > > > How can I do this in single query on runtime in solr. > > > > > > With Regards > > > Aman Tandon > > > >
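The archive stripped the `&` separators from the query strings quoted above; a reconstructed sketch of both forms Upayavira describes (field names as in the thread):

```
# edismax: multiplicative boost by the stored score field
q=jute&defType=edismax&qf=title&boost=product_guideline_score&fl=id,title,score

# lucene query parser: wrap the query with the boost query parser
q={!boost b=product_guideline_score v=$qq}&qq=jute
```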
Re: How to reordering search result by some function query
Hi, I figured it out to implement the same. I will be doing this by using the boost parameter e.g. http://server:8112/solr/products/select?q=jute=title *=product(1,product_guideline_score)* If there is any other alternative then please suggest. With Regards Aman Tandon On Thu, Sep 10, 2015 at 11:02 AM, Aman Tandon <amantandon...@gmail.com> wrote: > Hi, > > I have a requirement to reorder the search results by multiplying the *text > relevance > score* of a product with the *product_guideline_score,* which will be > stored in index and will have some floating point number. > > e.g. On searching the *jute* in title if we got some results ID1 & ID2 > > ID1 -> title = jute > score = 8.0 > * product_guideline_score = 2.0* > > ID2 -> title = jute bags > score = 7.5 > * product_guideline_score** = 2.2* > > So the new score should be like this > > ID1 -> title = jute > score = *product_score * 8 = 16.0* > * product_guideline_score** = 2.0* > > ID2 -> title = jute bags > score = *product_score * 7.5 = 16.5* > * product_guideline_score** = 2.2* > > *So new ordering should be* > > ID2 -> title = jute bags > score* = 16.5* > > ID1 -> title = jute > score =* 16.0* > > How can I do this in single query on runtime in solr. > > With Regards > Aman Tandon >
Boosting related doubt?
Hi, I need to ask that when i am looking for the all the parameters of the query using the *echoParams=ALL*, I am getting the boost parameter twice in the information printed on the browser screen. So does it mean that it is also applying twice on the data/result set and we are using the ? ** * 0* * 66* * * ** * map(query({!dismax qf=mcatid v=$mc1 pf=""}),0,0,1,2.0)* * map(eff_views,1,2,1.15,1)* * map(query({!dismax qf=titlex v=$ql1 pf=""}),0,0,1,1.5)* * map(query({!dismax qf=titlex v=$ql2 pf=""}),0,0,1,1.5)* * map(query({!dismax qf=attribs v='poorDescription' pf=''},0),0,0,1,0.02)* * if(exists(itemprice2),map(query({!dismax qf=itemprice2 v='0'}),0,0,1.2,1),1)* * map(sdesclen,0,150,1,1.5)* * map(sdesclen,0,0,0.1,1)* * map(CustTypeWt,700,1869,1.1,1)* * map(CustTypeWt,699,699,1.2,1)* * map(CustTypeWt,199,199,1.3,1)* * map(CustTypeWt,0,179,1.35,1)* * map(CustTypeWt,3399,3999,0.07,1)* * map(query({!dismax qf=attribs v='hot'}),0,0,1,1.2)* * map(query({!dismax qf=isphoto v='true' pf=""}),0,0,0.05,1)* ** ** * mcatid:(1223 6240 825 1936 31235) titlex:("imswjutebagimsw")* * attribs:(locprefglobal locprefnational locprefcity locprefunknown)* * displayid:4768979112* * +((+datatype:product +attribs:(aprstatus20 aprstatus40 aprstatus50) +aggregate:true -attribs:liststatusnfl +((+countryiso:IN +isfcp:true) (+CustTypeWt:[149 TO 1499]) CustTypeWt:1870)) (+datatype:company -attribs:liststatusnfl +((+countryiso:IN +isfcp:true) (+CustTypeWt:[149 TO 1499]) CustTypeWt:1870))) -attribs:liststatusdnf* ** *2-1 470%* ** * {!ex=cityf}city* * {!ex=datatypef}datatype* * {!ex=biztypef}biztype* ** *default* *ALL* 
*displayid,datatype,title,smalldescorg,photo,catid,mcatname,companyname,CustTypeWt,glusrid,usrpcatflname,paidurl,fcpurl,city,state,countryname,countryiso,tscode,address,state,zipcode,phone,mobile,contactperson,pns,dupimg,smalldesc,etoofrqty,lastactiondatet,mcatid,isadult,pnsdisabled,membersince,locpref,categoryinfo,distance:geodist($lat,$lon,latlon),iildisplayflag,dispflagval,biztype,datarefid,parentglusrid,itemcode,itemprice,itemcurrency,largedesc,ecom_url,ecom_source_id,moq,moq_type* *0* *20* *true* *true* *15* *true* ** * mcatnametext^0.2* * titlews^0.5* * smalldesc^0.01* * title_text^1.5* * usrpcatname^0.1* * customspell^0.1* ** *true* ** * mcatnametext^0.5* * titlews* * title_text^3* * usrpcatname^0.1* * smalldesc^0.01* * customspell^0.1* ** *true* *1* *10* *xml* *true* *0* *parentglusrid* *true* *true* *im.search* *2* *true* *ALL* *1* *0* ** * mcatid:(1223 6240 825 1936 31235) titlex:("imswjutebagimsw")* * attribs:(locprefglobal locprefnational locprefcity locprefunknown)* * displayid:4768979112* * +((+datatype:product +attribs:(aprstatus20 aprstatus40 aprstatus50) +aggregate:true -attribs:liststatusnfl +((+countryiso:IN +isfcp:true) (+CustTypeWt:[149 TO 1499]) CustTypeWt:1870)) (+datatype:company -attribs:liststatusnfl +((+countryiso:IN +isfcp:true) (+CustTypeWt:[149 TO 1499]) CustTypeWt:1870))) -attribs:liststatusdnf* ** *20* *jute bags* *true* *"jutebagimsw"* *"bagimsw"* *"1223"* ** * map(query({!dismax qf=mcatid v=$mc1 pf=""}),0,0,1,2.0)* * map(eff_views,1,2,1.15,1)* * map(query({!dismax qf=titlex v=$ql1 pf=""}),0,0,1,1.5)* * map(query({!dismax qf=titlex v=$ql2 pf=""}),0,0,1,1.5)* * map(query({!dismax qf=attribs v='poorDescription' pf=''},0),0,0,1,0.02)* * if(exists(itemprice2),map(query({!dismax qf=itemprice2 v='0'}),0,0,1.2,1),1)* * map(sdesclen,0,150,1,1.5)* * map(sdesclen,0,0,0.1,1)* * map(CustTypeWt,700,1869,1.1,1)* * map(CustTypeWt,699,699,1.2,1)* * map(CustTypeWt,199,199,1.3,1)* * map(CustTypeWt,0,179,1.35,1)* * map(CustTypeWt,3399,3999,0.07,1)* * 
map(query({!dismax qf=attribs v='hot'}),0,0,1,1.2)* * map(query({!dismax qf=isphoto v='true' pf=""}),0,0,0.05,1)* ** *xml* *0* *0.3* *synonym_edismax* *on* *true* * * ** With Regards Aman Tandon
How to reordering search result by some function query
Hi, I have a requirement to reorder the search results by multiplying the *text relevance score* of a product with the *product_guideline_score,* which will be stored in index and will have some floating point number. e.g. On searching the *jute* in title if we got some results ID1 & ID2 ID1 -> title = jute score = 8.0 * product_guideline_score = 2.0* ID2 -> title = jute bags score = 7.5 * product_guideline_score** = 2.2* So the new score should be like this ID1 -> title = jute score = *product_score * 8 = 16.0* * product_guideline_score** = 2.0* ID2 -> title = jute bags score = *product_score * 7.5 = 16.5* * product_guideline_score** = 2.2* *So new ordering should be* ID2 -> title = jute bags score* = 16.5* ID1 -> title = jute score =* 16.0* How can I do this in single query on runtime in solr. With Regards Aman Tandon
Re: Maximum Number of entires in External Field?
> > I can provide examples if needed. Yes that will be so much helpful. Thank you so much. Then I will try both methodology. And will report the results back here. With Regards Aman Tandon On Tue, Sep 8, 2015 at 2:11 PM, Upayavira <u...@odoko.co.uk> wrote: > If you have just 5-7 items, then an external file will work, as will the > join query. You'll need to handle the 'default' case with the join > query, that is, making sure you do OR so that > documents matching the join are boosted above those matching the main > query, rather than the join being a filter on the main query. > > I can provide examples if needed. > > Upayavira > > On Mon, Sep 7, 2015, at 07:21 PM, Aman Tandon wrote: > > I am currently doing boosting for 5-7 things. will it work great with > > this > > too? > > > > With Regards > > Aman Tandon > > > > On Mon, Sep 7, 2015 at 11:42 PM, Upayavira <u...@odoko.co.uk> wrote: > > > > > External file field would work, but requires a full import of the > > > external file field every time you change a single entry, which is > > > pretty extreme. > > > > > > I've tested out "score joins" which seemed to perform very well and > > > achieved the same effect, but using another core, rather than an > > > external file. > > > > > > Thus: > > > > > > {!join score=max fromIndex=prices from=id to=id}{!boost b=price}*:* > > > > > > seemed to do the job of using the price as a boost. Of course you could > > > extend this like so: > > > > > > q={!join score=max fromIndex=prices from=id to=id}{!boost b=$b}*:* > > > b=sqrt(price) > > > > > > or such things to make the price a more reasonable value. > > > > > > Upayavira > > > > > > On Mon, Sep 7, 2015, at 06:21 PM, Aman Tandon wrote: > > > > Any suggestions? > > > > > > > > With Regards > > > > Aman Tandon > > > > > > > > On Mon, Sep 7, 2015 at 1:07 PM, Aman Tandon <amantandon...@gmail.com > > > > > > wrote: > > > > > > > > > Hi Upayavira, > > > > > > > > > > Have you tried it? 
> > > > > > > > > > > > > > > No > > > > > > > > > > E.g. external file fields don't play nice with Solr Cloud > > > > > > > > > > > > > > > We are not using Solr Cloud. > > > > > > > > > > > > > > >> What are you using the external file for? > > > > > > > > > > > > > > > We are doing the boosting in the search result which are *having > price > > > by > > > > > 1.2* & *country is India by 1.1*. We are doing by using the > boosting > > > > > parameter in conjucation with query & map function e.g. > > > *=map(query({!dismax > > > > > qf=hasPrice v='yes' pf=''},0),1,1,1,1)* > > > > > > > > > > This is being done with 5/6 parameters. And I am hoping it will > > > increase > > > > > query time. So I am planning to make the single score and populate > it > > > in > > > > > external file field. And this might reduce some time. > > > > > > > > > > Just to mention we are doing incremental updates after every 10 > > > minutes. > > > > > > > > > > With Regards > > > > > Aman Tandon > > > > > > > > > > On Mon, Sep 7, 2015 at 12:53 PM, Upayavira <u...@odoko.co.uk> wrote: > > > > > > > > > >> Have you tried it? I suspect your issue will be with the process > of > > > > >> reloading the external file rather than consuming it once loaded. > > > > >> > > > > >> What are you using the external file for? There may be other ways > > > also. > > > > >> E.g. external file fields don't play nice with Solr Cloud. > > > > >> > > > > >> Upayavira > > > > >> > > > > >> On Mon, Sep 7, 2015, at 07:05 AM, Aman Tandon wrote: > > > > >> > Hi, > > > > >> > > > > > >> > How much ids information can I define in External File? > Currently I > > > am > > > > >> > having the 100 Million records in my index. > > > > >> > > > > > >> > With Regards > > > > >> > Aman Tandon > > > > >> > > > > > > > > > > > > > >
Re: Maximum Number of entires in External Field?
Hi Upayavira, Have you tried it? No E.g. external file fields don't play nice with Solr Cloud We are not using Solr Cloud. > What are you using the external file for? We are doing the boosting in the search result which are *having price by 1.2* & *country is India by 1.1*. We are doing by using the boosting parameter in conjucation with query & map function e.g. *=map(query({!dismax qf=hasPrice v='yes' pf=''},0),1,1,1,1)* This is being done with 5/6 parameters. And I am hoping it will increase query time. So I am planning to make the single score and populate it in external file field. And this might reduce some time. Just to mention we are doing incremental updates after every 10 minutes. With Regards Aman Tandon On Mon, Sep 7, 2015 at 12:53 PM, Upayavira <u...@odoko.co.uk> wrote: > Have you tried it? I suspect your issue will be with the process of > reloading the external file rather than consuming it once loaded. > > What are you using the external file for? There may be other ways also. > E.g. external file fields don't play nice with Solr Cloud. > > Upayavira > > On Mon, Sep 7, 2015, at 07:05 AM, Aman Tandon wrote: > > Hi, > > > > How much ids information can I define in External File? Currently I am > > having the 100 Million records in my index. > > > > With Regards > > Aman Tandon >
Maximum Number of entires in External Field?
Hi, How many id entries can I define in an external file? I currently have 100 million records in my index. With Regards Aman Tandon
Re: Maximum Number of entires in External Field?
Any suggestions? With Regards Aman Tandon On Mon, Sep 7, 2015 at 1:07 PM, Aman Tandon <amantandon...@gmail.com> wrote: > Hi Upayavira, > > Have you tried it? > > > No > > E.g. external file fields don't play nice with Solr Cloud > > > We are not using Solr Cloud. > > >> What are you using the external file for? > > > We are doing the boosting in the search result which are *having price by > 1.2* & *country is India by 1.1*. We are doing by using the boosting > parameter in conjucation with query & map function e.g. > *=map(query({!dismax > qf=hasPrice v='yes' pf=''},0),1,1,1,1)* > > This is being done with 5/6 parameters. And I am hoping it will increase > query time. So I am planning to make the single score and populate it in > external file field. And this might reduce some time. > > Just to mention we are doing incremental updates after every 10 minutes. > > With Regards > Aman Tandon > > On Mon, Sep 7, 2015 at 12:53 PM, Upayavira <u...@odoko.co.uk> wrote: > >> Have you tried it? I suspect your issue will be with the process of >> reloading the external file rather than consuming it once loaded. >> >> What are you using the external file for? There may be other ways also. >> E.g. external file fields don't play nice with Solr Cloud. >> >> Upayavira >> >> On Mon, Sep 7, 2015, at 07:05 AM, Aman Tandon wrote: >> > Hi, >> > >> > How much ids information can I define in External File? Currently I am >> > having the 100 Million records in my index. >> > >> > With Regards >> > Aman Tandon >> > >
Re: Maximum Number of entires in External Field?
I am currently doing boosting for 5-7 things. will it work great with this too? With Regards Aman Tandon On Mon, Sep 7, 2015 at 11:42 PM, Upayavira <u...@odoko.co.uk> wrote: > External file field would work, but requires a full import of the > external file field every time you change a single entry, which is > pretty extreme. > > I've tested out "score joins" which seemed to perform very well and > achieved the same effect, but using another core, rather than an > external file. > > Thus: > > {!join score=max fromIndex=prices from=id to=id}{!boost b=price}*:* > > seemed to do the job of using the price as a boost. Of course you could > extend this like so: > > q={!join score=max fromIndex=prices from=id to=id}{!boost b=$b}*:* > b=sqrt(price) > > or such things to make the price a more reasonable value. > > Upayavira > > On Mon, Sep 7, 2015, at 06:21 PM, Aman Tandon wrote: > > Any suggestions? > > > > With Regards > > Aman Tandon > > > > On Mon, Sep 7, 2015 at 1:07 PM, Aman Tandon <amantandon...@gmail.com> > > wrote: > > > > > Hi Upayavira, > > > > > > Have you tried it? > > > > > > > > > No > > > > > > E.g. external file fields don't play nice with Solr Cloud > > > > > > > > > We are not using Solr Cloud. > > > > > > > > >> What are you using the external file for? > > > > > > > > > We are doing the boosting in the search result which are *having price > by > > > 1.2* & *country is India by 1.1*. We are doing by using the boosting > > > parameter in conjucation with query & map function e.g. > *=map(query({!dismax > > > qf=hasPrice v='yes' pf=''},0),1,1,1,1)* > > > > > > This is being done with 5/6 parameters. And I am hoping it will > increase > > > query time. So I am planning to make the single score and populate it > in > > > external file field. And this might reduce some time. > > > > > > Just to mention we are doing incremental updates after every 10 > minutes. 
> > > > > > With Regards > > > Aman Tandon > > > > > > On Mon, Sep 7, 2015 at 12:53 PM, Upayavira <u...@odoko.co.uk> wrote: > > > > > >> Have you tried it? I suspect your issue will be with the process of > > >> reloading the external file rather than consuming it once loaded. > > >> > > >> What are you using the external file for? There may be other ways > also. > > >> E.g. external file fields don't play nice with Solr Cloud. > > >> > > >> Upayavira > > >> > > >> On Mon, Sep 7, 2015, at 07:05 AM, Aman Tandon wrote: > > >> > Hi, > > >> > > > >> > How much ids information can I define in External File? Currently I > am > > >> > having the 100 Million records in my index. > > >> > > > >> > With Regards > > >> > Aman Tandon > > >> > > > > > > >
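For completeness, a sketch of the external file field setup being considered, with hypothetical names; the data file lives in the core's `data/` directory and is reloaded via an event listener:

```xml
<!-- schema.xml: keyField ties each line of the file to a document id -->
<fieldType name="extScore" class="solr.ExternalFileField"
           keyField="id" defVal="1.0"/>
<field name="combined_boost" type="extScore"/>

<!-- solrconfig.xml: reload the file whenever a new searcher opens -->
<listener event="newSearcher"
          class="org.apache.solr.schema.ExternalFileFieldReloader"/>

<!-- data/external_combined_boost.txt, one id=value line per document:
       doc123=1.25
       doc456=0.90
     then reference it in a function, e.g. boost=field(combined_boost) -->
```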
Re: How to configure solr to not bind at 8983
Hi Samy, Any particular reason not to use the -p parameter to start it on another port? ./solr start -p 9983 With Regards Aman Tandon On Thu, Aug 20, 2015 at 2:02 PM, Modassar Ather modather1...@gmail.com wrote: I think you need to add the port number in solr.xml too, under the hostPort attribute. STOP.PORT is SOLR.PORT-1000 and is set in the SOLR_HOME/bin/solr file. As far as I understand this cannot be changed, but I am not sure. Regards, Modassar On Thu, Aug 20, 2015 at 11:39 AM, Samy Ateia samyat...@hotmail.de wrote: I changed the Solr listen port in the solr.in.sh file in my Solr home directory by setting the variable SOLR_PORT=. But Solr is still trying to also listen on 8983, because it gets started with the -DSTOP.PORT=8983 variable. What is this -DSTOP.PORT variable for, and where should I configure it? I ran the install_solr_service.sh script to set up Solr and changed the SOLR_PORT afterwards. Best regards, Samy
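As Modassar notes, the stop port is derived from the listen port inside `bin/solr`; a sketch of the relationship (so changing SOLR_PORT in `solr.in.sh` moves both, provided both ports are free):

```shell
# solr.in.sh equivalent: choose a custom listen port
SOLR_PORT=9983

# bin/solr derives jetty's STOP.PORT from it as SOLR_PORT - 1000
STOP_PORT=$((SOLR_PORT - 1000))
echo "STOP.PORT=$STOP_PORT"   # prints STOP.PORT=8983
```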
Re: docValues
Hi, I am seeing a significant difference in the query time after using docValue what kind of difference, is it good or bad? With Regards Aman Tandon On Sat, Aug 8, 2015 at 11:38 PM, Nagasharath sharathrayap...@gmail.com wrote: I am seeing a significant difference in the query time after using docValue. I am curious to know what's happening with 'docValue' included in the schema On 07-Aug-2015, at 4:31 pm, Shawn Heisey apa...@elyograg.org wrote: On 8/7/2015 11:47 AM, naga sharathrayapati wrote: JVM-Memory has gone up from 3% to 17.1% In my experience, a healthy Java application (after the heap size has stabilized) will have a heap utilization graph where the low points are between 50 and 75 percent. If the low points in heap utilization are consistently below 25 percent, you would be better off reducing the heap size and allowing the OS to use that memory instead. If you want to track heap utilization, JVM-Memory in the Solr dashboard is a very poor tool. Use tools like visualvm or jconsole. https://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap I need to add what I said about very low heap utilization to that wiki page. Thanks, Shawn
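For context, docValues are enabled per field in the schema and move sorting/faceting/function data into an on-disk column store read through the OS page cache rather than built up on the Java heap, which is why heap behaviour changes after enabling them. A sketch (field name hypothetical; the collection must be reindexed after the change):

```xml
<field name="price" type="tfloat" indexed="true" stored="true"
       docValues="true"/>
```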
Re: Query ReRanking question
Hi, Very nice mail thread. I think many people face the problem of maintaining both relevance and recency at the same time. boost=max(recip(ms(NOW/HOUR,publish_date),7.889e-10,1,1),scale(query($q),0,1)) Currently our search uses recency without any condition, and this approach should work well for our case too. Thanks Joel, Erick and Ravi. But Joel, I think it is a good idea to apply the sorting or any function last, or only in the re-ranking; otherwise it will affect search relevance. If many users like this idea, then we should work on this feature. With Regards Aman Tandon On Fri, Jan 16, 2015 at 11:23 PM, Erick Erickson erickerick...@gmail.com wrote: Ravi: Yep, this is the standard way to have recency influence the rank rather than take over absolute ordering via a sort=date_time or similar. Of course, how strongly the rank is influenced is more an art than a science as far as figuring out what actual constants to put in. Best, Erick On Fri, Jan 16, 2015 at 8:03 AM, Ravi Solr ravis...@gmail.com wrote: As per Erick's suggestion, reposting my response to the group. Joel and Erick, Thank you very much for helping me out with the ReRanking question a while ago. I have an alternative which seems to be working better for me than ReRanking; can you kindly let me know of any pitfalls you can think of with this approach? Since we value relevancy and recency at the same time, even though both are mutually exclusive, I thought maybe I could use function queries to adjust the boost as follows: boost=max(recip(ms(NOW/HOUR,publish_date),7.889e-10,1,1),scale(query($q),0,1)) What I intended to do here is: if it matched a more recent doc it will take recency into consideration, however if the relevancy is better than the date boost we keep relevancy. What do you guys think?
Thanks, Ravi Kiran Bhaskar On Mon, Sep 8, 2014 at 12:35 PM, Ravi Solr ravis...@gmail.com wrote: Joel and Erick, Thank you very much for explaining how the ReRanking works. Now it's a bit more clear. Thanks, Ravi Kiran Bhaskar On Sun, Sep 7, 2014 at 4:45 PM, Joel Bernstein joels...@gmail.com wrote: Oops, wrong usage pattern. It should be: 1) Main query is sorted by a field (scores tracked silently in the background). 2) Reranker is re-ranking docs based on the score from the main query. Joel Bernstein Search Engineer at Heliosearch On Sun, Sep 7, 2014 at 4:43 PM, Joel Bernstein joels...@gmail.com wrote: Ok, just reviewed the code. The ReRankingQParserPlugin always tracks the scores from the main query. So this explains things. Speaking of explaining things, the ReRankingParserPlugin also works with Lucene's explain. So if you use debugQuery=true we should see that the score from the initial query was combined with the score from the reRankQuery, which should be 1. You have stumbled on an interesting usage pattern which I never considered. But basically what's happening is: 1) Main query is sorted by score. 2) Reranker is re-ranking docs based on the score from the main query. No worries, Erick, you've taught me a lot over the past couple of years! Joel Bernstein Search Engineer at Heliosearch On Sun, Sep 7, 2014 at 11:37 AM, Erick Erickson erickerick...@gmail.com wrote: Joel: I find that whenever I say something totally wrong publicly, I remember the correction really really well... Thanks for straightening that out! Erick On Sat, Sep 6, 2014 at 12:58 PM, Joel Bernstein joels...@gmail.com wrote: The following query: http://localhost:8080/solr/select?q=malaysian airline crash&rq={!rerank reRankQuery=$rqq reRankDocs=1000}&rqq=*:*&sort=publish_date desc&fl=headline,publish_date,score is doing the following: The main query is sorted by publish_date. Then the results are reranked by *:*, which in theory would have no effect at all.
The reranker only uses the reRankQuery to re-rank the results. The sort param will always apply to the main query. Joel Bernstein Search Engineer at Heliosearch On Sat, Sep 6, 2014 at 2:33 PM, Ravi Solr ravis...@gmail.com wrote: Erick, Your idea about reversing Joel's suggestion seems to give the best results of all the options I tried... but I can't seem to understand why. I thought the query shown below should give irrelevant results, as sorting by date would throw relevancy off... but somehow it's getting relevant results with fair enough reverse chronology. It is as if the sort is applied after the docs are collected and reranked (which is what I wanted). One more thing that baffled me was, if I change reRankDocs from 1000 to 100
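Putting the two halves of this thread together: Joel's rerank request parameters, and the recip() date boost from Ravi's follow-up. The sketch below uses Python purely as a scratchpad; the host, port, core, and field names come from the thread's examples and may differ in your setup, and recip(x,m,a,b) reimplements Solr's function-query definition a/(m*x+b) as plain arithmetic to show how the 7.889e-10 constant decays the boost:

```python
from urllib.parse import urlencode

# Joel's rerank example: main query sorted by date, top 1000 docs
# re-ranked by the query in $rqq ('&' separators restored).
params = {
    "q": "malaysian airline crash",
    "rq": "{!rerank reRankQuery=$rqq reRankDocs=1000}",
    "rqq": "*:*",
    "sort": "publish_date desc",
    "fl": "headline,publish_date,score",
}
url = "http://localhost:8080/solr/select?" + urlencode(params)
print(url)

def recip(x, m, a, b):
    # Solr function query: recip(x, m, a, b) = a / (m * x + b)
    return a / (m * x + b)

MS_PER_DAY = 24 * 60 * 60 * 1000

# Decay of the date-boost term from boost=max(recip(...), scale(query($q),0,1))
# for documents of increasing age (in days):
for days in (0, 7, 30, 365):
    print(days, round(recip(days * MS_PER_DAY, 7.889e-10, 1, 1), 3))
```

With this constant the date boost falls to half after roughly two weeks, which is why taking max() with the scaled relevancy score lets an older but highly relevant document still win.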
Re: DocValues: Which format is better Default or Memory?
Hi, I tried to query with and without docValues; the query with docValues was taking more time. Could it be because IO gets involved, since some of the data will be in files on disk? "Are you sure anything else could affect your times?" Yes, I am sure. We re-indexed the whole index of 40 million records to implement docValues to improve the speed. And I somehow managed to run simultaneous queries with/without docValues, and I am getting a higher time with docValues by approx 200ms. As far as I could see, it increases as the number of hits increases. *My configuration for docValues is:* <field name="citydv" type="string" docValues="true" stored="true" required="false" omitNorms="true" multiValued="false" /> With Regards Aman Tandon On Thu, Jul 2, 2015 at 3:15 PM, Alessandro Benedetti benedetti.ale...@gmail.com wrote: So first of all, DocValues is a strategy to store on disk (or in memory) the un-inverted index for the fields of interest. This has been done to SPEED UP the faceting calculation using the fc algorithm, and improve the memory usage. It is really weird that this is the cause of degraded performance. Building the DocValues should improve the query time to build facets, while increasing the indexing time. Are you sure anything else could affect your times? Let's try to help you out! 2015-07-02 4:19 GMT+01:00 Aman Tandon amantandon...@gmail.com: Hi, I tried to use docValues to reduce the search time, but when I am using the default format for docValues it is taking more time as compared to the normal faceting technique (without docValues). Should I go for the Memory format, or is there something missing? *Note:-* I am doing the indexing every 10 minutes and I am using Solr 4.8.1 With Regards Aman Tandon -- -- Benedetti Alessandro Visiting card : http://about.me/alessandro_benedetti Tyger, tyger burning bright In the forests of the night, What immortal hand or eye Could frame thy fearful symmetry? William Blake - Songs of Experience -1794 England
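When chasing a regression like this it helps to time the exact same facet request against both field variants so that only the faceting path differs. A minimal sketch of building the two requests — the host, core name, the "city"/"citydv" field pair, and the idea of keeping a non-docValues copy of the field around for comparison are assumptions for illustration, not details from the thread:

```python
from urllib.parse import urlencode

def facet_url(field):
    # Identical query each time; only the facet field (docValues vs.
    # plain indexed twin) changes between runs.
    params = {
        "q": "*:*",
        "rows": "0",
        "facet": "true",
        "facet.field": field,
        "facet.method": "fc",
    }
    return "http://localhost:8983/solr/collection1/select?" + urlencode(params)

for f in ("citydv", "city"):  # hypothetical docValues / non-docValues pair
    print(facet_url(f))
```

Firing each URL repeatedly (after warm-up) and comparing QTime in the responses isolates the faceting cost from caching effects.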
Re: DocValues: Which format is better Default or Memory?
Anything wrong? With Regards Aman Tandon On Thu, Jul 2, 2015 at 4:19 PM, Aman Tandon amantandon...@gmail.com wrote: > [previous messages in this thread quoted in full; snipped]
Re: DocValues: Which format is better Default or Memory?
So should I use the Memory format? With Regards Aman Tandon On Thu, Jul 2, 2015 at 9:20 PM, Toke Eskildsen t...@statsbiblioteket.dk wrote: Alessandro Benedetti benedetti.ale...@gmail.com wrote: DocValues is a strategy to store on the disk (or in memory) the un-inverted index for the fields of interest. True. This has been done to SPEED UP the faceting calculation using the fc algorithm, and improve the memory usage. Part of the reason was to speed up the _startup_ time for faceting. This is not the first time I have read about people getting poorer query performance with DocValues. It does make sense: DocValues in the index means that they compete with other files for disk caching, and even when they are fully cached, the un-inverted structure has a speed edge due to being directly accessible as standard on-heap memory structures. The difference is likely to vary a great deal depending on the concrete corpus & hardware. - Toke Eskildsen
DocValues: Which format is better Default or Memory?
Hi, I tried to use docValues to reduce the search time, but when I am using the default format for docValues it is taking more time as compared to the normal faceting technique (without docValues). Should I go for the Memory format, or is there something missing? *Note:-* I am doing the indexing every 10 minutes and I am using Solr 4.8.1 With Regards Aman Tandon
Re: Help: Problem in customized token filter
Steve, Thank you, thank you so much. You guys are awesome. Steve, how can I learn more about the Lucene indexing process in more detail, e.g. after we send documents for indexing, which functions are called until the doc is actually stored in the index files? I will be thankful if you guide me here. With Regards Aman Tandon On Fri, Jun 19, 2015 at 10:48 AM, Steve Rowe sar...@gmail.com wrote: Aman, Solr uses the same token filter instances over and over, calling reset() before sending each document through. Your code sets "exhausted" to true and then never sets it back to false, so the next time the token filter instance is used, its "exhausted" value is still true, so no input stream tokens are concatenated ever again. Does that make sense? Steve www.lucidworks.com On Jun 19, 2015, at 1:10 AM, Aman Tandon amantandon...@gmail.com wrote: Hi Steve, "you never set exhausted to false, and when the filter got reused, it incorrectly carried state from the previous document." Thanks for replying, but I am not able to understand this. With Regards Aman Tandon On Fri, Jun 19, 2015 at 10:25 AM, Steve Rowe sar...@gmail.com wrote: Hi Aman, The admin UI screenshot you linked to is from an older version of Solr - what version are you using? Lots of extraneous angle brackets and asterisks got into your email and made for a bunch of cleanup work before I could read or edit it. In the future, please put your code somewhere people can easily read it and copy/paste it into an editor: in a github gist or on a paste service, etc. Looks to me like your use of "exhausted" is unnecessary, and is likely the cause of the problem you saw (only one document getting processed): you never set exhausted to false, and when the filter got reused, it incorrectly carried state from the previous document. Here's a simpler version that's hopefully more correct and more efficient (2 fewer copies from the StringBuilder to the final token).
Note: I didn't test it: https://gist.github.com/sarowe/9b9a52b683869ced3a17 Steve www.lucidworks.com On Jun 18, 2015, at 11:33 AM, Aman Tandon amantandon...@gmail.com wrote: Please help, what wrong I am doing here. please guide me. With Regards Aman Tandon On Thu, Jun 18, 2015 at 4:51 PM, Aman Tandon amantandon...@gmail.com wrote: Hi, I created a token concat filter to concat all the tokens from the token stream. It creates the concatenated token as expected. But when I am posting the xml containing more than 30,000 documents, then only the first document is having the data of that field. > [schema and filter code quoted in full; snipped]
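Steve's point about reuse is the key Lucene contract in this thread: the same TokenFilter instance is recycled for every document, and any per-document state must be cleared in reset(). Here is a toy model of the bug in Python (not Lucene code; the "exhausted" flag mirrors the one in Aman's filter, and the class name is made up for illustration):

```python
class ConcatFilter:
    """Toy model of a reused token filter: emits each token plus one
    concatenated token, then nothing until reset() is called."""

    def __init__(self):
        self.exhausted = False

    def reset(self):
        # The fix: clear per-document state when the instance is reused.
        self.exhausted = False

    def run(self, tokens):
        if self.exhausted:        # the bug: stays True for doc 2, 3, ...
            return []
        self.exhausted = True
        return tokens + ["".join(tokens)]

f = ConcatFilter()
print(f.run(["solr", "train"]))    # doc 1: tokens plus concatenation
print(f.run(["lucene", "index"]))  # doc 2 without reset(): nothing!
f.reset()
print(f.run(["lucene", "index"]))  # doc 2 after reset(): works again
```

In the real filter the same effect would come from overriding reset() to call super.reset(), set exhausted back to false, and clear the StringBuilder — which is essentially what Steve's gist does.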
Help: Problem in customized token filter
Hi, I created a *token concat filter* to concat all the tokens from the token stream. It creates the concatenated token as expected. But when I am posting the xml containing more than 30,000 documents, then only the first document is having the data of that field.

*Schema:*

  <field name="titlex" type="text" indexed="true" stored="false" required="false" omitNorms="false" multiValued="false" />

  <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <charFilter class="solr.HTMLStripCharFilterFactory"/>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.ShingleFilterFactory" maxShingleSize="3" outputUnigrams="true" tokenSeparator=""/>
      <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
      <filter class="com.xyz.analysis.concat.ConcatenateWordsFilterFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="stemmed_synonyms_text_prime_ex_index.txt" ignoreCase="true" expand="true"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_text_prime_search.txt" enablePositionIncrements="true"/>
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
      <filter class="com.xyz.analysis.concat.ConcatenateWordsFilterFactory"/>
    </analyzer>
  </fieldType>

Please help me. The code for the filter is as follows, please take a look. Here is a picture of what the filter is doing: http://i.imgur.com/THCsYtG.png?1 The code of the concat filter is:

  package com.xyz.analysis.concat;

  import java.io.IOException;

  import org.apache.lucene.analysis.TokenFilter;
  import org.apache.lucene.analysis.TokenStream;
  import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
  import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
  import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
  import org.apache.lucene.analysis.tokenattributes.TypeAttribute;

  public class ConcatenateWordsFilter extends TokenFilter {

    private CharTermAttribute charTermAttribute = addAttribute(CharTermAttribute.class);
    private OffsetAttribute offsetAttribute = addAttribute(OffsetAttribute.class);
    PositionIncrementAttribute posIncr = addAttribute(PositionIncrementAttribute.class);
    TypeAttribute typeAtrr = addAttribute(TypeAttribute.class);

    private StringBuilder stringBuilder = new StringBuilder();
    private boolean exhausted = false;

    /**
     * Creates a new ConcatenateWordsFilter
     * @param input TokenStream that will be filtered
     */
    public ConcatenateWordsFilter(TokenStream input) {
      super(input);
    }

    /**
     * {@inheritDoc}
     */
    @Override
    public final boolean incrementToken() throws IOException {
      while (!exhausted && input.incrementToken()) {
        char terms[] = charTermAttribute.buffer();
        int termLength = charTermAttribute.length();
        if (typeAtrr.type().equals("<ALPHANUM>")) {
          stringBuilder.append(terms, 0, termLength);
        }
        charTermAttribute.copyBuffer(terms, 0, termLength);
        return true;
      }
      if (!exhausted) {
        exhausted = true;
        String sb = stringBuilder.toString();
        System.err.println("The Data got is " + sb);
        int sbLength = sb.length();
        // posIncr.setPositionIncrement(0);
        charTermAttribute.copyBuffer(sb.toCharArray(), 0, sbLength);
        offsetAttribute.setOffset(offsetAttribute.startOffset(), offsetAttribute.startOffset() + sbLength);
        stringBuilder.setLength(0);
        // typeAtrr.setType("CONCATENATED");
        return true;
      }
      return false;
    }
  }

With Regards Aman Tandon
Re: Help: Problem in customized token filter
Please help, what wrong I am doing here. please guide me. With Regards Aman Tandon On Thu, Jun 18, 2015 at 4:51 PM, Aman Tandon amantandon...@gmail.com wrote: > [original message with schema and filter code quoted in full; snipped]
Re: Help: Problem in customized token filter
Hi Steve, you never set exhausted to false, and when the filter got reused, *it incorrectly carried state from the previous document.* Thanks for replying, but I am not able to understand this. With Regards Aman Tandon On Fri, Jun 19, 2015 at 10:25 AM, Steve Rowe sar...@gmail.com wrote: Hi Aman, The admin UI screenshot you linked to is from an older version of Solr - what version are you using? Lots of extraneous angle brackets and asterisks got into your email and made for a bunch of cleanup work before I could read or edit it. In the future, please put your code somewhere people can easily read it and copy/paste it into an editor: into a github gist or on a paste service, etc. Looks to me like your use of “exhausted” is unnecessary, and is likely the cause of the problem you saw (only one document getting processed): you never set exhausted to false, and when the filter got reused, it incorrectly carried state from the previous document. Here’s a simpler version that’s hopefully more correct and more efficient (2 fewer copies from the StringBuilder to the final token). Note: I didn’t test it: https://gist.github.com/sarowe/9b9a52b683869ced3a17 Steve www.lucidworks.com On Jun 18, 2015, at 11:33 AM, Aman Tandon amantandon...@gmail.com wrote: Please help, what wrong I am doing here. please guide me. With Regards Aman Tandon On Thu, Jun 18, 2015 at 4:51 PM, Aman Tandon amantandon...@gmail.com wrote: Hi, I created a *token concat filter* to concat all the tokens from token stream. It creates the concatenated token as expected. But when I am posting the xml containing more than 30,000 documents, then only first document is having the data of that field. 
> [schema and filter code quoted in full; snipped]
Re: Help: Problem in customized token filter
Yes I just saw. With Regards Aman Tandon On Fri, Jun 19, 2015 at 10:39 AM, Steve Rowe sar...@gmail.com wrote: Aman, My version won’t produce anything at all, since incrementToken() always returns false… I updated the gist (at the same URL) to fix the problem by returning true from incrementToken() once and then false until reset() is called. It also handles the case when the concatenated token is zero length by not emitting a token. Steve www.lucidworks.com On Jun 19, 2015, at 12:55 AM, Steve Rowe sar...@gmail.com wrote: Hi Aman, The admin UI screenshot you linked to is from an older version of Solr - what version are you using? Lots of extraneous angle brackets and asterisks got into your email and made for a bunch of cleanup work before I could read or edit it. In the future, please put your code somewhere people can easily read it and copy/paste it into an editor: into a github gist or on a paste service, etc. Looks to me like your use of “exhausted” is unnecessary, and is likely the cause of the problem you saw (only one document getting processed): you never set exhausted to false, and when the filter got reused, it incorrectly carried state from the previous document. Here’s a simpler version that’s hopefully more correct and more efficient (2 fewer copies from the StringBuilder to the final token). Note: I didn’t test it: https://gist.github.com/sarowe/9b9a52b683869ced3a17 Steve www.lucidworks.com On Jun 18, 2015, at 11:33 AM, Aman Tandon amantandon...@gmail.com wrote: Please help, what wrong I am doing here. please guide me. With Regards Aman Tandon On Thu, Jun 18, 2015 at 4:51 PM, Aman Tandon amantandon...@gmail.com wrote: Hi, I created a *token concat filter* to concat all the tokens from token stream. It creates the concatenated token as expected. But when I am posting the xml containing more than 30,000 documents, then only first document is having the data of that field. 
> [schema and filter code quoted in full; snipped]
Re: How to create concatenated token
Hi Erick, In that issue you forwarded to me, they want to make one token from all the tokens received from the token stream, but in my case I want to keep the tokens the same and create an extra new token which is the concatenation of all the tokens. "which, I'd guess, is the case here. I mean do you really want to concatenate 50 tokens?" — We are applying it on the *title field* of products, so the max length can be 10 I guess, and even that will be a rare case. With Regards Aman Tandon On Wed, Jun 17, 2015 at 7:16 PM, Erick Erickson erickerick...@gmail.com wrote: If you used the JIRA I linked, vote for it, add any improvements etc. Anyone can attach a patch to a JIRA, you just have to create a login. That said, this may be too rare a use-case to deal with. I just thought of shingling, which I should have suggested before, that will work for concatenating small numbers of tokens which, I'd guess, is the case here. I mean do you really want to concatenate 50 tokens? Best, Erick On Wed, Jun 17, 2015 at 12:07 AM, Aman Tandon amantandon...@gmail.com wrote: Dear Erick, e.g. "Solr training" — Porter: solr train (positions 1 2); Concatenated: solr train solrtrain (positions 1 2). I did implement the filter as per my requirement. Thank you so much for your help and guidance. So how could I contribute it to Solr? With Regards Aman Tandon On Wed, Jun 17, 2015 at 10:14 AM, Aman Tandon amantandon...@gmail.com wrote: Hi Erick, Thank you so much, it will be helpful for me to learn how to save the state of a token. I had no idea how to save the state of previous tokens, so it was difficult to generate a concatenated token at the end. Is there anything I should read to learn more about it? With Regards Aman Tandon On Wed, Jun 17, 2015 at 9:20 AM, Erick Erickson erickerick...@gmail.com wrote: I really question the premise, but have a look at: https://issues.apache.org/jira/browse/SOLR-7193 Note that this is not committed and I haven't reviewed it, so I don't have anything to say about that.
And you'd have to implement it as a custom Filter. Best, Erick On Tue, Jun 16, 2015 at 5:55 PM, Aman Tandon amantandon...@gmail.com wrote: Hi, Any guesses, how could I achieve this behaviour. With Regards Aman Tandon On Tue, Jun 16, 2015 at 8:15 PM, Aman Tandon amantandon...@gmail.com wrote: e.g. Intent for solr training: fq=id: 234, 456, 545 title(solr training) typo error e.g. Intent for solr training: fq=id:(234 456 545) title:(solr training) With Regards Aman Tandon On Tue, Jun 16, 2015 at 8:13 PM, Aman Tandon amantandon...@gmail.com wrote: We has some business logic to search the user query in user intent or finding the exact matching products. e.g. Intent for solr training: fq=id: 234, 456, 545 title(solr training) As we can see it is phrase query so it will took more time than the single stemmed token query. There are also 5-7 words phrase query. So we want to reduce the search time by implementing this feature. With Regards Aman Tandon On Tue, Jun 16, 2015 at 6:42 PM, Alessandro Benedetti benedetti.ale...@gmail.com wrote: Can I ask you why you need to concatenate the tokens ? Maybe we can find a better solution to concat all the tokens in one single big token . I find it difficult to understand the reasons behind tokenising, token filtering and then un-tokenizing again :) It would be great if you explain a little bit better what you would like to do ! Cheers 2015-06-16 13:26 GMT+01:00 Aman Tandon amantandon...@gmail.com: Hi, I have a requirement to create the concatenated token of all the tokens created from the last item of my analyzer chain. *Suppose my analyzer chain is :* * tokenizer class=solr.WhitespaceTokenizerFactory / filter class=solr.WordDelimiterFilterFactory catenateAll=1 splitOnNumerics=1 preserveOriginal=1/filter class=solr.EdgeNGramFilterFactory minGramSize=2 maxGramSize=15 side=front /filter class=solr.PorterStemmerFilterFactory/* I want to create a concatenated token plugin to add at concatenated token along with the last token. e.g. 
Solr training *Porter:-* solr train Position 1 2 *Concatenated :-* solr train solrtrain Position 1 2 Please help me out. How to create custom filter for this requirement. With Regards Aman Tandon -- -- Benedetti Alessandro Visiting card : http://about.me/alessandro_benedetti
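[Editor's note] The buffer-then-emit logic discussed in this thread can be sketched outside Lucene. The function below is only a toy model in Python: a real Solr filter is a Java TokenFilter subclass that buffers terms (e.g. via captureState()) and emits the concatenated term from incrementToken(). The function name and the (term, position_increment) tuple representation are invented for illustration.

```python
def concat_filter(tokens):
    """Pass tokens through unchanged, then emit one extra token that is the
    concatenation of all terms. The extra token gets position increment 0,
    i.e. it is stacked on the position of the last real token."""
    out = list(tokens)  # each token is a (term, position_increment) pair
    if out:
        concatenated = "".join(term for term, _ in out)
        out.append((concatenated, 0))
    return out
```

For the thread's example, concat_filter([("solr", 1), ("train", 1)]) yields solr and train at positions 1 and 2, plus solrtrain stacked on position 2, which matches the desired "solr train solrtrain" output.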
Contribute the Customized Phonetic Filter to Apache Solr
Hi, We created a new phonetic filter. It is working great on our products; most of our suppliers are Indian, and it is quite helpful for providing exact results, e.g. 1) rikshaw still finds the suppliers of rickshaw 2) telefone still finds the suppliers of telephone. We also analyzed our search satisfaction feedback, and it improved by 13 points (54% to 67%) just after implementing it. We want to contribute it to Solr, so how can I do that? With Regards Aman Tandon
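[Editor's note] The usual contribution route at the time of this thread was to open a JIRA issue and attach a patch. A hedged sketch of the workflow follows; SOLR-XXXX is a placeholder for the issue number you get when you open the ticket, and the exact build targets may differ by branch.

```shell
# Check out the source, apply your filter changes under solr/, and run the tests
# (the Lucene/Solr build used Ant in this era):
git clone https://github.com/apache/lucene-solr.git
cd lucene-solr
ant test

# Produce a patch named after the JIRA issue you opened at
# https://issues.apache.org/jira/browse/SOLR, then attach it to that issue:
git diff > SOLR-XXXX.patch
```

Anyone can attach a patch to a JIRA issue after creating a login, as Erick notes elsewhere in this archive.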
Re: How to create concatenated token
Dear Erick, e.g. Solr training *Porter:-* solr train Position 1 2 *Concatenated :-* solr train solrtrain Position 1 2 I did implemented the filter as per my requirement. Thank you so much for your help and guidance. So how could I contribute it to the solr. With Regards Aman Tandon On Wed, Jun 17, 2015 at 10:14 AM, Aman Tandon amantandon...@gmail.com wrote: Hi Erick, Thank you so much, it will be helpful for me to learn how to save the state of token. I has no idea of how to save state of previous tokens due to this it was difficult to generate a concatenated token in the last. So is there anything should I read to learn more about it. With Regards Aman Tandon On Wed, Jun 17, 2015 at 9:20 AM, Erick Erickson erickerick...@gmail.com wrote: I really question the premise, but have a look at: https://issues.apache.org/jira/browse/SOLR-7193 Note that this is not committed and I haven't reviewed it so I don't have anything to say about that. And you'd have to implement it as a custom Filter. Best, Erick On Tue, Jun 16, 2015 at 5:55 PM, Aman Tandon amantandon...@gmail.com wrote: Hi, Any guesses, how could I achieve this behaviour. With Regards Aman Tandon On Tue, Jun 16, 2015 at 8:15 PM, Aman Tandon amantandon...@gmail.com wrote: e.g. Intent for solr training: fq=id: 234, 456, 545 title(solr training) typo error e.g. Intent for solr training: fq=id:(234 456 545) title:(solr training) With Regards Aman Tandon On Tue, Jun 16, 2015 at 8:13 PM, Aman Tandon amantandon...@gmail.com wrote: We has some business logic to search the user query in user intent or finding the exact matching products. e.g. Intent for solr training: fq=id: 234, 456, 545 title(solr training) As we can see it is phrase query so it will took more time than the single stemmed token query. There are also 5-7 words phrase query. So we want to reduce the search time by implementing this feature. 
With Regards Aman Tandon On Tue, Jun 16, 2015 at 6:42 PM, Alessandro Benedetti benedetti.ale...@gmail.com wrote: Can I ask you why you need to concatenate the tokens ? Maybe we can find a better solution to concat all the tokens in one single big token . I find it difficult to understand the reasons behind tokenising, token filtering and then un-tokenizing again :) It would be great if you explain a little bit better what you would like to do ! Cheers 2015-06-16 13:26 GMT+01:00 Aman Tandon amantandon...@gmail.com: Hi, I have a requirement to create the concatenated token of all the tokens created from the last item of my analyzer chain. *Suppose my analyzer chain is :* * tokenizer class=solr.WhitespaceTokenizerFactory / filter class=solr.WordDelimiterFilterFactory catenateAll=1 splitOnNumerics=1 preserveOriginal=1/filter class=solr.EdgeNGramFilterFactory minGramSize=2 maxGramSize=15 side=front /filter class=solr.PorterStemmerFilterFactory/* I want to create a concatenated token plugin to add at concatenated token along with the last token. e.g. Solr training *Porter:-* solr train Position 1 2 *Concatenated :-* solr train solrtrain Position 1 2 Please help me out. How to create custom filter for this requirement. With Regards Aman Tandon -- -- Benedetti Alessandro Visiting card : http://about.me/alessandro_benedetti Tyger, tyger burning bright In the forests of the night, What immortal hand or eye Could frame thy fearful symmetry? William Blake - Songs of Experience -1794 England
How to create concatenated token
Hi, I have a requirement to create a concatenated token of all the tokens produced by the last stage of my analyzer chain. *Suppose my analyzer chain is:*

<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.WordDelimiterFilterFactory" catenateAll="1" splitOnNumerics="1" preserveOriginal="1"/>
<filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front"/>
<filter class="solr.PorterStemFilterFactory"/>

I want to create a concatenated-token plugin that adds a concatenated token along with the last token. e.g. Solr training *Porter:-* solr train Position 1 2 *Concatenated:-* solr train solrtrain Position 1 2 Please help me out: how do I create a custom filter for this requirement? With Regards Aman Tandon
Re: How to create concatenated token
e.g. Intent for solr training: fq=id: 234, 456, 545 title(solr training) typo error e.g. Intent for solr training: fq=id:(234 456 545) title:(solr training) With Regards Aman Tandon On Tue, Jun 16, 2015 at 8:13 PM, Aman Tandon amantandon...@gmail.com wrote: We has some business logic to search the user query in user intent or finding the exact matching products. e.g. Intent for solr training: fq=id: 234, 456, 545 title(solr training) As we can see it is phrase query so it will took more time than the single stemmed token query. There are also 5-7 words phrase query. So we want to reduce the search time by implementing this feature. With Regards Aman Tandon On Tue, Jun 16, 2015 at 6:42 PM, Alessandro Benedetti benedetti.ale...@gmail.com wrote: Can I ask you why you need to concatenate the tokens ? Maybe we can find a better solution to concat all the tokens in one single big token . I find it difficult to understand the reasons behind tokenising, token filtering and then un-tokenizing again :) It would be great if you explain a little bit better what you would like to do ! Cheers 2015-06-16 13:26 GMT+01:00 Aman Tandon amantandon...@gmail.com: Hi, I have a requirement to create the concatenated token of all the tokens created from the last item of my analyzer chain. *Suppose my analyzer chain is :* * tokenizer class=solr.WhitespaceTokenizerFactory / filter class=solr.WordDelimiterFilterFactory catenateAll=1 splitOnNumerics=1 preserveOriginal=1/filter class=solr.EdgeNGramFilterFactory minGramSize=2 maxGramSize=15 side=front /filter class=solr.PorterStemmerFilterFactory/* I want to create a concatenated token plugin to add at concatenated token along with the last token. e.g. Solr training *Porter:-* solr train Position 1 2 *Concatenated :-* solr train solrtrain Position 1 2 Please help me out. How to create custom filter for this requirement. 
With Regards Aman Tandon -- -- Benedetti Alessandro Visiting card : http://about.me/alessandro_benedetti Tyger, tyger burning bright In the forests of the night, What immortal hand or eye Could frame thy fearful symmetry? William Blake - Songs of Experience -1794 England
Re: How to create concatenated token
We have some business logic to search the user query within the user intent, or to find the exact matching products, e.g. intent for solr training: fq=id:(234 456 545) title:(solr training). As you can see it is a phrase query, so it takes more time than a single stemmed-token query. There are also 5-7 word phrase queries. So we want to reduce the search time by implementing this feature. With Regards Aman Tandon On Tue, Jun 16, 2015 at 6:42 PM, Alessandro Benedetti benedetti.ale...@gmail.com wrote: Can I ask you why you need to concatenate the tokens? Maybe we can find a better solution than concatenating all the tokens into one single big token. I find it difficult to understand the reasons behind tokenising, token filtering and then un-tokenizing again :) It would be great if you could explain a little better what you would like to do! Cheers 2015-06-16 13:26 GMT+01:00 Aman Tandon amantandon...@gmail.com: Hi, I have a requirement to create a concatenated token of all the tokens produced by the last stage of my analyzer chain. *Suppose my analyzer chain is:*

<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.WordDelimiterFilterFactory" catenateAll="1" splitOnNumerics="1" preserveOriginal="1"/>
<filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front"/>
<filter class="solr.PorterStemFilterFactory"/>

I want to create a concatenated-token plugin that adds a concatenated token along with the last token. e.g. Solr training *Porter:-* solr train Position 1 2 *Concatenated:-* solr train solrtrain Position 1 2 Please help me out: how do I create a custom filter for this requirement? With Regards Aman Tandon -- Benedetti Alessandro Visiting card : http://about.me/alessandro_benedetti Tyger, tyger burning bright In the forests of the night, What immortal hand or eye Could frame thy fearful symmetry? William Blake - Songs of Experience - 1794 England
Re: How to create concatenated token
Hi, Any guesses, how could I achieve this behaviour. With Regards Aman Tandon On Tue, Jun 16, 2015 at 8:15 PM, Aman Tandon amantandon...@gmail.com wrote: e.g. Intent for solr training: fq=id: 234, 456, 545 title(solr training) typo error e.g. Intent for solr training: fq=id:(234 456 545) title:(solr training) With Regards Aman Tandon On Tue, Jun 16, 2015 at 8:13 PM, Aman Tandon amantandon...@gmail.com wrote: We has some business logic to search the user query in user intent or finding the exact matching products. e.g. Intent for solr training: fq=id: 234, 456, 545 title(solr training) As we can see it is phrase query so it will took more time than the single stemmed token query. There are also 5-7 words phrase query. So we want to reduce the search time by implementing this feature. With Regards Aman Tandon On Tue, Jun 16, 2015 at 6:42 PM, Alessandro Benedetti benedetti.ale...@gmail.com wrote: Can I ask you why you need to concatenate the tokens ? Maybe we can find a better solution to concat all the tokens in one single big token . I find it difficult to understand the reasons behind tokenising, token filtering and then un-tokenizing again :) It would be great if you explain a little bit better what you would like to do ! Cheers 2015-06-16 13:26 GMT+01:00 Aman Tandon amantandon...@gmail.com: Hi, I have a requirement to create the concatenated token of all the tokens created from the last item of my analyzer chain. *Suppose my analyzer chain is :* * tokenizer class=solr.WhitespaceTokenizerFactory / filter class=solr.WordDelimiterFilterFactory catenateAll=1 splitOnNumerics=1 preserveOriginal=1/filter class=solr.EdgeNGramFilterFactory minGramSize=2 maxGramSize=15 side=front /filter class=solr.PorterStemmerFilterFactory/* I want to create a concatenated token plugin to add at concatenated token along with the last token. e.g. Solr training *Porter:-* solr train Position 1 2 *Concatenated :-* solr train solrtrain Position 1 2 Please help me out. 
How to create custom filter for this requirement. With Regards Aman Tandon -- -- Benedetti Alessandro Visiting card : http://about.me/alessandro_benedetti Tyger, tyger burning bright In the forests of the night, What immortal hand or eye Could frame thy fearful symmetry? William Blake - Songs of Experience -1794 England
Re: How to create concatenated token
Hi Erick, Thank you so much, it will be helpful for me to learn how to save the state of token. I has no idea of how to save state of previous tokens due to this it was difficult to generate a concatenated token in the last. So is there anything should I read to learn more about it. With Regards Aman Tandon On Wed, Jun 17, 2015 at 9:20 AM, Erick Erickson erickerick...@gmail.com wrote: I really question the premise, but have a look at: https://issues.apache.org/jira/browse/SOLR-7193 Note that this is not committed and I haven't reviewed it so I don't have anything to say about that. And you'd have to implement it as a custom Filter. Best, Erick On Tue, Jun 16, 2015 at 5:55 PM, Aman Tandon amantandon...@gmail.com wrote: Hi, Any guesses, how could I achieve this behaviour. With Regards Aman Tandon On Tue, Jun 16, 2015 at 8:15 PM, Aman Tandon amantandon...@gmail.com wrote: e.g. Intent for solr training: fq=id: 234, 456, 545 title(solr training) typo error e.g. Intent for solr training: fq=id:(234 456 545) title:(solr training) With Regards Aman Tandon On Tue, Jun 16, 2015 at 8:13 PM, Aman Tandon amantandon...@gmail.com wrote: We has some business logic to search the user query in user intent or finding the exact matching products. e.g. Intent for solr training: fq=id: 234, 456, 545 title(solr training) As we can see it is phrase query so it will took more time than the single stemmed token query. There are also 5-7 words phrase query. So we want to reduce the search time by implementing this feature. With Regards Aman Tandon On Tue, Jun 16, 2015 at 6:42 PM, Alessandro Benedetti benedetti.ale...@gmail.com wrote: Can I ask you why you need to concatenate the tokens ? Maybe we can find a better solution to concat all the tokens in one single big token . I find it difficult to understand the reasons behind tokenising, token filtering and then un-tokenizing again :) It would be great if you explain a little bit better what you would like to do ! 
Cheers 2015-06-16 13:26 GMT+01:00 Aman Tandon amantandon...@gmail.com: Hi, I have a requirement to create the concatenated token of all the tokens created from the last item of my analyzer chain. *Suppose my analyzer chain is :* * tokenizer class=solr.WhitespaceTokenizerFactory / filter class=solr.WordDelimiterFilterFactory catenateAll=1 splitOnNumerics=1 preserveOriginal=1/filter class=solr.EdgeNGramFilterFactory minGramSize=2 maxGramSize=15 side=front /filter class=solr.PorterStemmerFilterFactory/* I want to create a concatenated token plugin to add at concatenated token along with the last token. e.g. Solr training *Porter:-* solr train Position 1 2 *Concatenated :-* solr train solrtrain Position 1 2 Please help me out. How to create custom filter for this requirement. With Regards Aman Tandon -- -- Benedetti Alessandro Visiting card : http://about.me/alessandro_benedetti Tyger, tyger burning bright In the forests of the night, What immortal hand or eye Could frame thy fearful symmetry? William Blake - Songs of Experience -1794 England
Re: How To: Debugging the whole indexing process
Please help me here. With Regards Aman Tandon On Sat, May 30, 2015 at 12:43 AM, Aman Tandon amantandon...@gmail.com wrote: Thanks Alex, yes it is for my testing, to understand the code/process flow actually. Any other ideas? With Regards Aman Tandon On Fri, May 29, 2015 at 12:48 PM, Alexandre Rafalovitch arafa...@gmail.com wrote: In production or in test? I assume in test. This level of detail usually implies some sort of Java debugger and Java instrumentation enabled. E.g. Chronon, which is commercial but can be tried as a plugin with the IntelliJ IDEA full-version trial. Regards, Alex On 29 May 2015 4:38 pm, Aman Tandon amantandon...@gmail.com wrote: Hi, I want to debug the whole indexing process, i.e. the life cycle of indexing (each and every function call, going function by function), from the posting of data.xml to the creation of the various index files (_fnm, _fdt, etc.). So what should I set up, and how do I start? Please help; I will be thankful to you.

<add>
  <doc>
    <field name="title"><![CDATA[Aman Tandon]]></field>
    <field name="job_role"><![CDATA[Search Engineer]]></field>
  </doc>
</add>

With Regards Aman Tandon
How To: Debugging the whole indexing process
Hi, I want to debug the whole indexing process, i.e. the life cycle of indexing (each and every function call, going function by function), from the posting of data.xml to the creation of the various index files (_fnm, _fdt, etc.). So what should I set up, and how do I start? Please help; I will be thankful to you.

<add>
  <doc>
    <field name="title"><![CDATA[Aman Tandon]]></field>
    <field name="job_role"><![CDATA[Search Engineer]]></field>
  </doc>
</add>

With Regards Aman Tandon
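[Editor's note] One common way to step through the indexing path function by function is to attach a remote Java debugger over JDWP and set breakpoints in the update/indexing classes. A minimal config sketch follows, assuming a standalone Solr started via bin/solr with its -a flag for extra JVM arguments; the core name "mycore" and the debug port 18983 are arbitrary examples.

```shell
# Start Solr with the JVM listening for a remote debugger.
# (Use suspend=y instead if you want to trace startup code too.)
bin/solr start -a "-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=18983"

# Attach a remote-debug session (IntelliJ IDEA / Eclipse) to port 18983,
# set breakpoints in the indexing code, then post the document:
curl "http://localhost:8983/solr/mycore/update?commit=true" \
     -H "Content-Type: text/xml" --data-binary @data.xml
```

Stepping down from the update handler into Lucene's IndexWriter will show the call chain that ultimately writes the _fnm, _fdt, and other segment files mentioned above.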
Re: docValues: Can we apply synonym
Hi Upayavira, How the copyField will help in my scenario when I have to add the synonym in docValue enable field. With Regards Aman Tandon On Sat, May 30, 2015 at 1:18 AM, Upayavira u...@odoko.co.uk wrote: Use copyField to clone the field for faceting purposes. Upayavira On Fri, May 29, 2015, at 08:06 PM, Aman Tandon wrote: Hi Erick, Thanks for suggestion, We are this query parser plugin ( *SynonymExpandingExtendedDismaxQParserPlugin*) to manage multi-word synonym. So it does work slower than edismax that's why it is not in contrib right? (I am asking this question because we are using for all our searches to handle 10 multiword ice cube, icecube etc) *Moreover I thought a solution for this docValue problem* I need to make city field as *multivalued* and by this I mean i will add the synonym (*mumbai, bombay*) as an extra value to that field if present. Now searching operation will work fine as before. *field name=citymumbai/fieldfield name=citybombay/field* The only prob is if we have to remove the 'city alias/synonym facets' when we are providing results to the clients. *mumbai, 1000* With Regards Aman Tandon On Fri, May 29, 2015 at 7:26 PM, Erick Erickson erickerick...@gmail.com wrote: Do take time for performance testing with that parser. It can be slow depending on your data as I remember. That said it solves the problem it set out to solve so if it meets your SLAs, it can be a life-saver. Best, Erick On Fri, May 29, 2015 at 2:35 AM, Alessandro Benedetti benedetti.ale...@gmail.com wrote: Even if a little bit outdated, that query parser is really really cool to manage synonyms ! +1 ! 2015-05-29 1:01 GMT+01:00 Aman Tandon amantandon...@gmail.com: Thanks chris. Yes we are using it for handling multiword synonym problem. With Regards Aman Tandon On Fri, May 29, 2015 at 12:38 AM, Reitzel, Charles charles.reit...@tiaa-cref.org wrote: Again, I would recommend using Nolan Lawson's SynonymExpandingExtendedDismaxQParserPlugin. 
http://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/ -Original Message- From: Aman Tandon [mailto:amantandon...@gmail.com] Sent: Wednesday, May 27, 2015 6:42 PM To: solr-user@lucene.apache.org Subject: Re: docValues: Can we apply synonym Ok and what synonym processor you is talking about maybe it could help ? With Regards Aman Tandon On Thu, May 28, 2015 at 4:01 AM, Reitzel, Charles charles.reit...@tiaa-cref.org wrote: Sorry, my bad. The synonym processor I mention works differently. It's an extension of the EDisMax query processor and doesn't require field level synonym configs. -Original Message- From: Reitzel, Charles [mailto:charles.reit...@tiaa-cref.org] Sent: Wednesday, May 27, 2015 6:12 PM To: solr-user@lucene.apache.org Subject: RE: docValues: Can we apply synonym But the query analysis isn't on a specific field, it is applied to the query string. -Original Message- From: Aman Tandon [mailto:amantandon...@gmail.com] Sent: Wednesday, May 27, 2015 6:08 PM To: solr-user@lucene.apache.org Subject: Re: docValues: Can we apply synonym Hi Charles, The problem here is that the docValues works only with primitives data type only like String, int, etc So how could we apply synonym on primitive data type. With Regards Aman Tandon On Thu, May 28, 2015 at 3:19 AM, Reitzel, Charles charles.reit...@tiaa-cref.org wrote: Is there any reason you cannot apply the synonyms at query time? Applying synonyms at indexing time has problems, e.g. polluting the term frequency for synonyms added, preventing distance queries, ... Since city names often have multiple terms, e.g. New York, Den Hague, etc., I would recommend using Nolan Lawson's SynonymExpandingExtendedDismaxQParserPlugin. Tastes great, less filling. http://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/ We found this to fix synonyms like ny for New York and vice versa. Haven't tried it with docValues, tho. 
-Original Message- From: Aman Tandon [mailto:amantandon...@gmail.com] Sent: Tuesday, May 26, 2015 11:15 PM To: solr-user@lucene.apache.org Subject: Re: docValues: Can we apply synonym Yes it could be :) Anyway thanks for helping. With Regards Aman Tandon On Tue, May 26, 2015 at 10:22 PM, Alessandro Benedetti benedetti.ale...@gmail.com wrote: I should investigate
Re: docValues: Can we apply synonym
Hi Erick, Thanks for the suggestion. We are using this query parser plugin (*SynonymExpandingExtendedDismaxQParserPlugin*) to manage multi-word synonyms. So it does work slower than edismax; is that why it is not in contrib? (I am asking this question because we are using it for all our searches to handle 10 multiword synonyms, ice cube, icecube, etc.) *Moreover, I thought of a solution for this docValues problem.* I need to make the city field *multivalued*; by this I mean I will add the synonym (*mumbai, bombay*) as an extra value to that field if present. Then the searching operation will work fine as before: <field name="city">mumbai</field><field name="city">bombay</field>. The only problem is that we then have to remove the 'city alias/synonym' facets when we provide results to the clients. *mumbai, 1000* With Regards Aman Tandon On Fri, May 29, 2015 at 7:26 PM, Erick Erickson erickerick...@gmail.com wrote: Do take time for performance testing with that parser. It can be slow depending on your data as I remember. That said it solves the problem it set out to solve so if it meets your SLAs, it can be a life-saver. Best, Erick On Fri, May 29, 2015 at 2:35 AM, Alessandro Benedetti benedetti.ale...@gmail.com wrote: Even if a little bit outdated, that query parser is really really cool to manage synonyms ! +1 ! 2015-05-29 1:01 GMT+01:00 Aman Tandon amantandon...@gmail.com: Thanks Chris. Yes, we are using it for handling the multiword synonym problem. With Regards Aman Tandon On Fri, May 29, 2015 at 12:38 AM, Reitzel, Charles charles.reit...@tiaa-cref.org wrote: Again, I would recommend using Nolan Lawson's SynonymExpandingExtendedDismaxQParserPlugin. http://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/ -Original Message- From: Aman Tandon [mailto:amantandon...@gmail.com] Sent: Wednesday, May 27, 2015 6:42 PM To: solr-user@lucene.apache.org Subject: Re: docValues: Can we apply synonym Ok, and what synonym processor are you talking about? Maybe it could help?
With Regards Aman Tandon On Thu, May 28, 2015 at 4:01 AM, Reitzel, Charles charles.reit...@tiaa-cref.org wrote: Sorry, my bad. The synonym processor I mention works differently. It's an extension of the EDisMax query processor and doesn't require field level synonym configs. -Original Message- From: Reitzel, Charles [mailto:charles.reit...@tiaa-cref.org] Sent: Wednesday, May 27, 2015 6:12 PM To: solr-user@lucene.apache.org Subject: RE: docValues: Can we apply synonym But the query analysis isn't on a specific field, it is applied to the query string. -Original Message- From: Aman Tandon [mailto:amantandon...@gmail.com] Sent: Wednesday, May 27, 2015 6:08 PM To: solr-user@lucene.apache.org Subject: Re: docValues: Can we apply synonym Hi Charles, The problem here is that the docValues works only with primitives data type only like String, int, etc So how could we apply synonym on primitive data type. With Regards Aman Tandon On Thu, May 28, 2015 at 3:19 AM, Reitzel, Charles charles.reit...@tiaa-cref.org wrote: Is there any reason you cannot apply the synonyms at query time? Applying synonyms at indexing time has problems, e.g. polluting the term frequency for synonyms added, preventing distance queries, ... Since city names often have multiple terms, e.g. New York, Den Hague, etc., I would recommend using Nolan Lawson's SynonymExpandingExtendedDismaxQParserPlugin. Tastes great, less filling. http://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/ We found this to fix synonyms like ny for New York and vice versa. Haven't tried it with docValues, tho. -Original Message- From: Aman Tandon [mailto:amantandon...@gmail.com] Sent: Tuesday, May 26, 2015 11:15 PM To: solr-user@lucene.apache.org Subject: Re: docValues: Can we apply synonym Yes it could be :) Anyway thanks for helping. With Regards Aman Tandon On Tue, May 26, 2015 at 10:22 PM, Alessandro Benedetti benedetti.ale...@gmail.com wrote: I should investigate that, as usually synonyms are analysis stage. 
A simple way is to replace the word with all its synonyms ( including original word), but simply using this kind of processor will change the token position and offsets, modifying the actual content of the document . I am from Bombay will become I am from Bombay Mumbai which can be annoying. So a clever approach must be investigated. 2015-05-26 17:36 GMT+01:00 Aman Tandon amantandon...@gmail.com : Okay So how could I do it with UpdateProcessors? With Regards Aman Tandon On Tue
Re: How To: Debugging the whole indexing process
Thanks Alex, yes it is for my testing, to understand the code/process flow actually. Any other ideas? With Regards Aman Tandon On Fri, May 29, 2015 at 12:48 PM, Alexandre Rafalovitch arafa...@gmail.com wrote: In production or in test? I assume in test. This level of detail usually implies some sort of Java debugger and Java instrumentation enabled. E.g. Chronon, which is commercial but can be tried as a plugin with the IntelliJ IDEA full-version trial. Regards, Alex On 29 May 2015 4:38 pm, Aman Tandon amantandon...@gmail.com wrote: Hi, I want to debug the whole indexing process, i.e. the life cycle of indexing (each and every function call, going function by function), from the posting of data.xml to the creation of the various index files (_fnm, _fdt, etc.). So what should I set up, and how do I start? Please help; I will be thankful to you.

<add>
  <doc>
    <field name="title"><![CDATA[Aman Tandon]]></field>
    <field name="job_role"><![CDATA[Search Engineer]]></field>
  </doc>
</add>

With Regards Aman Tandon
Re: docValues: Can we apply synonym
Thanks chris. Yes we are using it for handling multiword synonym problem. With Regards Aman Tandon On Fri, May 29, 2015 at 12:38 AM, Reitzel, Charles charles.reit...@tiaa-cref.org wrote: Again, I would recommend using Nolan Lawson's SynonymExpandingExtendedDismaxQParserPlugin. http://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/ -Original Message- From: Aman Tandon [mailto:amantandon...@gmail.com] Sent: Wednesday, May 27, 2015 6:42 PM To: solr-user@lucene.apache.org Subject: Re: docValues: Can we apply synonym Ok and what synonym processor you is talking about maybe it could help ? With Regards Aman Tandon On Thu, May 28, 2015 at 4:01 AM, Reitzel, Charles charles.reit...@tiaa-cref.org wrote: Sorry, my bad. The synonym processor I mention works differently. It's an extension of the EDisMax query processor and doesn't require field level synonym configs. -Original Message- From: Reitzel, Charles [mailto:charles.reit...@tiaa-cref.org] Sent: Wednesday, May 27, 2015 6:12 PM To: solr-user@lucene.apache.org Subject: RE: docValues: Can we apply synonym But the query analysis isn't on a specific field, it is applied to the query string. -Original Message- From: Aman Tandon [mailto:amantandon...@gmail.com] Sent: Wednesday, May 27, 2015 6:08 PM To: solr-user@lucene.apache.org Subject: Re: docValues: Can we apply synonym Hi Charles, The problem here is that the docValues works only with primitives data type only like String, int, etc So how could we apply synonym on primitive data type. With Regards Aman Tandon On Thu, May 28, 2015 at 3:19 AM, Reitzel, Charles charles.reit...@tiaa-cref.org wrote: Is there any reason you cannot apply the synonyms at query time? Applying synonyms at indexing time has problems, e.g. polluting the term frequency for synonyms added, preventing distance queries, ... Since city names often have multiple terms, e.g. New York, Den Hague, etc., I would recommend using Nolan Lawson's SynonymExpandingExtendedDismaxQParserPlugin. 
Tastes great, less filling. http://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/ We found this to fix synonyms like ny for New York and vice versa. Haven't tried it with docValues, tho. -Original Message- From: Aman Tandon [mailto:amantandon...@gmail.com] Sent: Tuesday, May 26, 2015 11:15 PM To: solr-user@lucene.apache.org Subject: Re: docValues: Can we apply synonym Yes it could be :) Anyway thanks for helping. With Regards Aman Tandon On Tue, May 26, 2015 at 10:22 PM, Alessandro Benedetti benedetti.ale...@gmail.com wrote: I should investigate that, as usually synonyms are analysis stage. A simple way is to replace the word with all its synonyms ( including original word), but simply using this kind of processor will change the token position and offsets, modifying the actual content of the document . I am from Bombay will become I am from Bombay Mumbai which can be annoying. So a clever approach must be investigated. 2015-05-26 17:36 GMT+01:00 Aman Tandon amantandon...@gmail.com: Okay So how could I do it with UpdateProcessors? With Regards Aman Tandon On Tue, May 26, 2015 at 10:00 PM, Alessandro Benedetti benedetti.ale...@gmail.com wrote: mmm this is different ! Without any customisation, right now you could : - use docValues to provide exact value facets. - Than you can use a copy field, with the proper analysis, to search when a user click on a filter ! So you will see in your facets : Mumbai(3) Bombay(2) And when clicking you see 5 results. A little bit misleading for the users … On the other hand if you you want to apply the synonyms before, the indexing pipeline ( because docValues field can not be analysed), I think you should play with UpdateProcessors. Cheers 2015-05-26 17:18 GMT+01:00 Aman Tandon amantandon...@gmail.com : We are interested in using docValues for better memory utilization and speed. Currently we are faceting the search results on *city. 
*In city we have also added the synonym for cities like mumbai, bombay (These are Indian cities). So that result of mumbai is also eligible when somebody will applying filter of bombay on search results. I need this functionality to apply with docValues enabled field. With Regards Aman Tandon On Tue, May 26, 2015 at 9:19 PM, Alessandro Benedetti benedetti.ale...@gmail.com wrote: I checked in the Documentation to be sure, but apparently : DocValues are only available
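[Editor's note] Since docValues fields cannot be analyzed, the multivalued workaround discussed in this thread amounts to expanding synonyms before the document reaches the index, either client-side or in an UpdateRequestProcessor. A minimal Python sketch of that expansion step follows; the synonym map and the function name are invented for illustration, and a real deployment would load the map from a synonyms file.

```python
# Illustrative bidirectional synonym map for city names.
CITY_SYNONYMS = {"mumbai": ["bombay"], "bombay": ["mumbai"]}

def expand_city_values(values):
    """Return the original city values plus their synonyms, deduplicated,
    so a docValues (string) field matches either spelling at facet/filter time."""
    out = []
    for v in values:
        for candidate in [v] + CITY_SYNONYMS.get(v.lower(), []):
            if candidate not in out:
                out.append(candidate)
    return out
```

For example, a document with city "mumbai" would be indexed with both "mumbai" and "bombay". The trade-off noted above still applies: facet counts will then show both values, so the alias entries must be merged or hidden when rendering facets to users.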
Re: SolrCloud: Will creating more shards at runtime lower the load?
Thank you Alessandro. With Regards Aman Tandon On Thu, May 28, 2015 at 3:57 PM, Alessandro Benedetti benedetti.ale...@gmail.com wrote: Hi Aman, this feature may be interesting for you: Shard Splitting. When you create a collection in SolrCloud, you decide on the initial number of shards to be used. But it can be difficult to know in advance the number of shards that you need, particularly when organizational requirements can change at a moment's notice, and the cost of finding out later that you chose wrong can be high, involving creating new cores and re-indexing all of your data. The ability to split shards is in the Collections API. It currently allows splitting a shard into two pieces. The existing shard is left as-is, so the split action effectively makes two copies of the data as new shards. You can delete the old shard at a later time when you're ready. More details on how to use shard splitting are in the section on the Collections API https://cwiki.apache.org/confluence/display/solr/Collections+API. To answer your questions: 1) If your shard is properly split, and you use SolrCloud to distribute requests and balance load, users will not notice anything. 2) It can, but you must be careful: you may want to add replicas instead if load is your main concern. Usually sharding addresses an increasing amount of content to process and search; adding replicas addresses an increasing volume of queries and high load on the servers. Let me know more details if you like! Cheers 2015-05-28 4:44 GMT+01:00 Aman Tandon amantandon...@gmail.com: Hi, I have a question regarding SolrCloud. The load on our search servers is increasing day by day as our number of visitors keeps growing. I want to slice the data at runtime by creating more shards. *i)* Does it affect the current queries? *ii)* Does it lower the load on our search servers? 
With Regards Aman Tandon -- -- Benedetti Alessandro Visiting card : http://about.me/alessandro_benedetti Tyger, tyger burning bright In the forests of the night, What immortal hand or eye Could frame thy fearful symmetry? William Blake - Songs of Experience -1794 England
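For intuition about what the SPLITSHARD action does, it carves the parent shard's hash range into two contiguous sub-ranges and indexes each half as a new shard. A minimal Python sketch of that arithmetic (the function name is illustrative, not Solr API):

```python
def split_range(lo: int, hi: int):
    """Split an inclusive hash range [lo, hi] into two contiguous halves,
    roughly what SPLITSHARD does to a parent shard's range."""
    mid = (lo + hi) // 2
    return (lo, mid), (mid + 1, hi)

# The actual split is triggered via the Collections API, e.g.:
# /admin/collections?action=SPLITSHARD&collection=mycoll&shard=shard1
left, right = split_range(-2**31, 2**31 - 1)  # full signed 32-bit hash space
```

Queries keep being served by the parent shard during the split, which is why clients do not notice the operation.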
Guidance needed to modify ExtendedDismaxQParserPlugin
Hi, *Problem Statement: *query - i need leather jute bags. We are searching on the *title *field using pf2 ( *server:8003/solr/core0/select?q=i%20need%20leather%20jute%20bagspf2=titlexdebug=querydefType=edismaxwt=xmlrows=0*). Currently it creates shingled phrases like i need, need leather, leather jute, jute bags. *str name=parsedquery_toString+(((title:i)~0.01 (title:need)~0.01 (title:leather)~0.01 (title:jute)~0.01 (title:bag)~0.01)~3) ((titlex:i need)~0.01 (titlex:need leather)~0.01 (titlex:leather jute)~0.01 (titlex:jute bag)~0.01)/str* *Requirement: * I want to customize ExtendedDismaxQParserPlugin to generate custom phrase queries for pf2: only the phrase tokens jute bags and leather jute bags, so that irrelevant tokens like *i need* and *need leather* don't match any search results. In most scenarios in our business we have observed (from Google Analytics) that the last two words of the query matter most. So I need to generate only these tokens by calling my own function instead of *addShingledPhraseQueries*. Please guide me: should I modify the same Java class or create another class? And in the case of another class, how and where should I define our customized *defType*? With Regards Aman Tandon
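To illustrate the requested change, here is a hedged Python sketch (function names are mine, not Solr's): the default pf2 behavior shingles every adjacent word pair, while the proposed variant keeps only the trailing shingles. In Solr itself this would mean subclassing the edismax query parser plugin, overriding the method that builds the shingled phrase queries, and registering the subclass under a custom name in solrconfig.xml so it can be selected with defType.

```python
def all_bigrams(tokens):
    """Default edismax pf2: every adjacent word pair becomes a phrase query."""
    return [" ".join(tokens[i:i + 2]) for i in range(len(tokens) - 1)]

def trailing_shingles(tokens, sizes=(2, 3)):
    """Proposed behavior: only the last two- and three-word phrases, since
    analytics showed the trailing words of the query carry the intent."""
    return [" ".join(tokens[-n:]) for n in sizes if len(tokens) >= n]

query = "i need leather jute bags".split()
# all_bigrams(query)       -> i need / need leather / leather jute / jute bags
# trailing_shingles(query) -> jute bags / leather jute bags
```

The trailing-shingle variant drops exactly the pairs ("i need", "need leather") the poster wants excluded, while keeping the two business-relevant phrases.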
SolrCloud: Will creating more shards at runtime lower the load?
Hi, I have a question regarding SolrCloud. The load on our search servers is increasing day by day as our number of visitors keeps growing. I want to slice the data at runtime by creating more shards. *i)* Does it affect the current queries? *ii)* Does it lower the load on our search servers? With Regards Aman Tandon
Re: docValues: Can we apply synonym
OK, and which synonym processor are you talking about? Maybe it could help. With Regards Aman Tandon On Thu, May 28, 2015 at 4:01 AM, Reitzel, Charles charles.reit...@tiaa-cref.org wrote: Sorry, my bad. The synonym processor I mentioned works differently. It's an extension of the eDisMax query parser and doesn't require field-level synonym configs. -Original Message- From: Reitzel, Charles [mailto:charles.reit...@tiaa-cref.org] Sent: Wednesday, May 27, 2015 6:12 PM To: solr-user@lucene.apache.org Subject: RE: docValues: Can we apply synonym But the query analysis isn't on a specific field, it is applied to the query string. -Original Message- From: Aman Tandon [mailto:amantandon...@gmail.com] Sent: Wednesday, May 27, 2015 6:08 PM To: solr-user@lucene.apache.org Subject: Re: docValues: Can we apply synonym Hi Charles, The problem here is that docValues works only with primitive data types like String, int, etc. So how could we apply synonyms to a primitive data type? With Regards Aman Tandon On Thu, May 28, 2015 at 3:19 AM, Reitzel, Charles charles.reit...@tiaa-cref.org wrote: Is there any reason you cannot apply the synonyms at query time? Applying synonyms at indexing time has problems, e.g. polluting the term frequency for synonyms added, preventing distance queries, ... Since city names often have multiple terms, e.g. New York, Den Hague, etc., I would recommend using Nolan Lawson's SynonymExpandingExtendedDismaxQParserPlugin. Tastes great, less filling. http://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/ We found this to fix synonyms like ny for New York and vice versa. Haven't tried it with docValues, tho. -Original Message- From: Aman Tandon [mailto:amantandon...@gmail.com] Sent: Tuesday, May 26, 2015 11:15 PM To: solr-user@lucene.apache.org Subject: Re: docValues: Can we apply synonym Yes it could be :) Anyway thanks for helping. 
With Regards Aman Tandon On Tue, May 26, 2015 at 10:22 PM, Alessandro Benedetti benedetti.ale...@gmail.com wrote: I should investigate that, as usually synonyms are analysis stage. A simple way is to replace the word with all its synonyms ( including original word), but simply using this kind of processor will change the token position and offsets, modifying the actual content of the document . I am from Bombay will become I am from Bombay Mumbai which can be annoying. So a clever approach must be investigated. 2015-05-26 17:36 GMT+01:00 Aman Tandon amantandon...@gmail.com: Okay So how could I do it with UpdateProcessors? With Regards Aman Tandon On Tue, May 26, 2015 at 10:00 PM, Alessandro Benedetti benedetti.ale...@gmail.com wrote: mmm this is different ! Without any customisation, right now you could : - use docValues to provide exact value facets. - Than you can use a copy field, with the proper analysis, to search when a user click on a filter ! So you will see in your facets : Mumbai(3) Bombay(2) And when clicking you see 5 results. A little bit misleading for the users … On the other hand if you you want to apply the synonyms before, the indexing pipeline ( because docValues field can not be analysed), I think you should play with UpdateProcessors. Cheers 2015-05-26 17:18 GMT+01:00 Aman Tandon amantandon...@gmail.com: We are interested in using docValues for better memory utilization and speed. Currently we are faceting the search results on *city. *In city we have also added the synonym for cities like mumbai, bombay (These are Indian cities). So that result of mumbai is also eligible when somebody will applying filter of bombay on search results. I need this functionality to apply with docValues enabled field. With Regards Aman Tandon On Tue, May 26, 2015 at 9:19 PM, Alessandro Benedetti benedetti.ale...@gmail.com wrote: I checked in the Documentation to be sure, but apparently : DocValues are only available for specific field types. 
The types chosen determine the underlying Lucene docValue type that will be used. The available Solr field types are: - StrField and UUIDField. - If the field is single-valued (i.e., multi-valued is false), Lucene will use the SORTED type. - If the field is multi-valued, Lucene will use the SORTED_SET type. - Any Trie* numeric fields and EnumField. - If the field is single-valued (i.e., multi-valued is false), Lucene will use the NUMERIC type. - If the field is multi-valued, Lucene will use
Re: docValues: Can we apply synonym
Hi Charles, The problem here is that the docValues works only with primitives data type only like String, int, etc So how could we apply synonym on primitive data type. With Regards Aman Tandon On Thu, May 28, 2015 at 3:19 AM, Reitzel, Charles charles.reit...@tiaa-cref.org wrote: Is there any reason you cannot apply the synonyms at query time? Applying synonyms at indexing time has problems, e.g. polluting the term frequency for synonyms added, preventing distance queries, ... Since city names often have multiple terms, e.g. New York, Den Hague, etc., I would recommend using Nolan Lawson's SynonymExpandingExtendedDismaxQParserPlugin. Tastes great, less filling. http://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/ We found this to fix synonyms like ny for New York and vice versa. Haven't tried it with docValues, tho. -Original Message- From: Aman Tandon [mailto:amantandon...@gmail.com] Sent: Tuesday, May 26, 2015 11:15 PM To: solr-user@lucene.apache.org Subject: Re: docValues: Can we apply synonym Yes it could be :) Anyway thanks for helping. With Regards Aman Tandon On Tue, May 26, 2015 at 10:22 PM, Alessandro Benedetti benedetti.ale...@gmail.com wrote: I should investigate that, as usually synonyms are analysis stage. A simple way is to replace the word with all its synonyms ( including original word), but simply using this kind of processor will change the token position and offsets, modifying the actual content of the document . I am from Bombay will become I am from Bombay Mumbai which can be annoying. So a clever approach must be investigated. 2015-05-26 17:36 GMT+01:00 Aman Tandon amantandon...@gmail.com: Okay So how could I do it with UpdateProcessors? With Regards Aman Tandon On Tue, May 26, 2015 at 10:00 PM, Alessandro Benedetti benedetti.ale...@gmail.com wrote: mmm this is different ! Without any customisation, right now you could : - use docValues to provide exact value facets. 
- Than you can use a copy field, with the proper analysis, to search when a user click on a filter ! So you will see in your facets : Mumbai(3) Bombay(2) And when clicking you see 5 results. A little bit misleading for the users … On the other hand if you you want to apply the synonyms before, the indexing pipeline ( because docValues field can not be analysed), I think you should play with UpdateProcessors. Cheers 2015-05-26 17:18 GMT+01:00 Aman Tandon amantandon...@gmail.com: We are interested in using docValues for better memory utilization and speed. Currently we are faceting the search results on *city. *In city we have also added the synonym for cities like mumbai, bombay (These are Indian cities). So that result of mumbai is also eligible when somebody will applying filter of bombay on search results. I need this functionality to apply with docValues enabled field. With Regards Aman Tandon On Tue, May 26, 2015 at 9:19 PM, Alessandro Benedetti benedetti.ale...@gmail.com wrote: I checked in the Documentation to be sure, but apparently : DocValues are only available for specific field types. The types chosen determine the underlying Lucene docValue type that will be used. The available Solr field types are: - StrField and UUIDField. - If the field is single-valued (i.e., multi-valued is false), Lucene will use the SORTED type. - If the field is multi-valued, Lucene will use the SORTED_SET type. - Any Trie* numeric fields and EnumField. - If the field is single-valued (i.e., multi-valued is false), Lucene will use the NUMERIC type. - If the field is multi-valued, Lucene will use the SORTED_SET type. This means you should not analyse a field where DocValues is enabled. Can your explain us your use case ? Why are you interested in synonyms DocValues level ? Cheers 2015-05-26 13:32 GMT+01:00 Upayavira u...@odoko.co.uk: To my understanding, docValues are just an uninverted index. That is, it contains the terms that are generated at the end of an analysis chain. 
Therefore, you simply enable docValues and include the SynonymFilterFactory in your analysis. Is that enough, or are you struggling with some other issue? Upayavira On Tue, May 26, 2015, at 12:03 PM, Aman Tandon wrote: Hi, We have some field *city* in which the docValues are enabled. We need to add the synonym in that field so how could we do it? With Regards Aman Tandon
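The UpdateProcessor idea suggested in this thread amounts to normalizing the field value before it reaches the un-analyzable docValues field. A minimal sketch of that logic in Python (the mapping and function names are illustrative; in Solr this would live in a custom UpdateRequestProcessorFactory wired into an updateRequestProcessorChain in solrconfig.xml):

```python
# Illustrative synonym table: every variant maps to one canonical value.
CITY_CANONICAL = {"bombay": "mumbai", "mumbai": "mumbai"}

def canonicalize_city(doc, field="city"):
    """Rewrite the city value to its canonical form at index time, so a
    facet on the docValues-backed string field groups Bombay and Mumbai
    into a single bucket instead of two misleading ones."""
    value = doc.get(field)
    if value is not None:
        doc[field] = CITY_CANONICAL.get(value.lower(), value.lower())
    return doc
```

This avoids the "Mumbai(3) Bombay(2)" split Alessandro describes, because both variants are collapsed before the document is indexed.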
Re: docValues: Can we apply synonym
Yes it could be :) Anyway thanks for helping. With Regards Aman Tandon On Tue, May 26, 2015 at 10:22 PM, Alessandro Benedetti benedetti.ale...@gmail.com wrote: I should investigate that, as usually synonyms are analysis stage. A simple way is to replace the word with all its synonyms ( including original word), but simply using this kind of processor will change the token position and offsets, modifying the actual content of the document . I am from Bombay will become I am from Bombay Mumbai which can be annoying. So a clever approach must be investigated. 2015-05-26 17:36 GMT+01:00 Aman Tandon amantandon...@gmail.com: Okay So how could I do it with UpdateProcessors? With Regards Aman Tandon On Tue, May 26, 2015 at 10:00 PM, Alessandro Benedetti benedetti.ale...@gmail.com wrote: mmm this is different ! Without any customisation, right now you could : - use docValues to provide exact value facets. - Than you can use a copy field, with the proper analysis, to search when a user click on a filter ! So you will see in your facets : Mumbai(3) Bombay(2) And when clicking you see 5 results. A little bit misleading for the users … On the other hand if you you want to apply the synonyms before, the indexing pipeline ( because docValues field can not be analysed), I think you should play with UpdateProcessors. Cheers 2015-05-26 17:18 GMT+01:00 Aman Tandon amantandon...@gmail.com: We are interested in using docValues for better memory utilization and speed. Currently we are faceting the search results on *city. *In city we have also added the synonym for cities like mumbai, bombay (These are Indian cities). So that result of mumbai is also eligible when somebody will applying filter of bombay on search results. I need this functionality to apply with docValues enabled field. 
With Regards Aman Tandon On Tue, May 26, 2015 at 9:19 PM, Alessandro Benedetti benedetti.ale...@gmail.com wrote: I checked in the Documentation to be sure, but apparently : DocValues are only available for specific field types. The types chosen determine the underlying Lucene docValue type that will be used. The available Solr field types are: - StrField and UUIDField. - If the field is single-valued (i.e., multi-valued is false), Lucene will use the SORTED type. - If the field is multi-valued, Lucene will use the SORTED_SET type. - Any Trie* numeric fields and EnumField. - If the field is single-valued (i.e., multi-valued is false), Lucene will use the NUMERIC type. - If the field is multi-valued, Lucene will use the SORTED_SET type. This means you should not analyse a field where DocValues is enabled. Can your explain us your use case ? Why are you interested in synonyms DocValues level ? Cheers 2015-05-26 13:32 GMT+01:00 Upayavira u...@odoko.co.uk: To my understanding, docValues are just an uninverted index. That is, it contains the terms that are generated at the end of an analysis chain. Therefore, you simply enable docValues and include the SynonymFilterFactory in your analysis. Is that enough, or are you struggling with some other issue? Upayavira On Tue, May 26, 2015, at 12:03 PM, Aman Tandon wrote: Hi, We have some field *city* in which the docValues are enabled. We need to add the synonym in that field so how could we do it? With Regards Aman Tandon -- -- Benedetti Alessandro Visiting card : http://about.me/alessandro_benedetti Tyger, tyger burning bright In the forests of the night, What immortal hand or eye Could frame thy fearful symmetry? William Blake - Songs of Experience -1794 England -- -- Benedetti Alessandro Visiting card : http://about.me/alessandro_benedetti Tyger, tyger burning bright In the forests of the night, What immortal hand or eye Could frame thy fearful symmetry? 
William Blake - Songs of Experience -1794 England -- -- Benedetti Alessandro Visiting card : http://about.me/alessandro_benedetti Tyger, tyger burning bright In the forests of the night, What immortal hand or eye Could frame thy fearful symmetry? William Blake - Songs of Experience -1794 England
Re: Help/Guidance Needed: To reload kstem protword hash without full core reload
Thank you so much Ahmet :) With Regards Aman Tandon On Wed, May 27, 2015 at 1:29 AM, Ahmet Arslan iori...@yahoo.com wrote: Hi Aman, Start by creating a Jira account and voting for/watching that issue. Post on the issue to see if there is still interest in it. Say that you are willing to volunteer and ask kindly for guidance. The creator of the issue or one of the watchers may respond. Try to digest the ideas discussed on the issue. Raise yours. Collaborate. Don't get discouraged if nobody responds; please remember that committers are busy people. If you have implemented something you want to share, upload a patch: https://wiki.apache.org/solr/HowToContribute Good luck, Ahmet On Tuesday, May 26, 2015 7:47 PM, Aman Tandon amantandon...@gmail.com wrote: Hi Ahmet, Can you please guide me on how to contribute to this *issue*? I haven't done this before, so I need to know what I should learn and how I should start: which IDE, or whatever else you think a novice needs to know. I will be thankful to you :) With Regards Aman Tandon On Tue, May 19, 2015 at 8:10 PM, Aman Tandon amantandon...@gmail.com wrote: That link you provided is exactly what I want to do. Thanks Ahmet. With Regards Aman Tandon On Tue, May 19, 2015 at 5:06 PM, Ahmet Arslan iori...@yahoo.com.invalid wrote: Hi Aman, changing protected words without reindexing makes little or no sense. Regarding protected words, the trend is to use solr.KeywordMarkerFilterFactory. Instead I suggest you work on a more general issue: https://issues.apache.org/jira/browse/SOLR-1307 Ahmet On Tuesday, May 19, 2015 3:16 AM, Aman Tandon amantandon...@gmail.com wrote: Please help, or was my question unclear? With Regards Aman Tandon On Mon, May 18, 2015 at 9:47 PM, Aman Tandon amantandon...@gmail.com wrote: Hi, *Problem Statement: *I want to reload the hash of protwords created by the kstem filter without reloading the whole index core. 
*My Thought: *I am thinking to reload the hash by passing a parameter like *r=1 *to analysis url request (to somehow pass the parameter via url). And I am thinking if somehow by changing the IndexSchema.java I might can pass this parameter though my analyzer chain to KStemFilter. In which I will call the initializeDictionary function to make protwords hash again from the file if *r=1*, instead of making full core reload request. Please guide me, I know question might be stupid, the thought came in my mind and I want to share and ask some suggestions here. Is it possible or not and how can i achieve the same? I will be thankful for guidance. With Regards Aman Tandon On Tuesday, May 19, 2015 3:16 AM, Aman Tandon amantandon...@gmail.com wrote: Please help or I am not clear here? With Regards Aman Tandon On Mon, May 18, 2015 at 9:47 PM, Aman Tandon amantandon...@gmail.com wrote: Hi, *Problem Statement: *I want to reload an hash of protwords created by the kstem filter without reloading the whole index core. *My Thought: *I am thinking to reload the hash by passing a parameter like *r=1 *to analysis url request (to somehow pass the parameter via url). And I am thinking if somehow by changing the IndexSchema.java I might can pass this parameter though my analyzer chain to KStemFilter. In which I will call the initializeDictionary function to make protwords hash again from the file if *r=1*, instead of making full core reload request. Please guide me, I know question might be stupid, the thought came in my mind and I want to share and ask some suggestions here. Is it possible or not and how can i achieve the same? I will be thankful for guidance. With Regards Aman Tandon
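The reload-without-core-reload idea can be sketched independently of Solr: rebuild the protword set whenever the backing file's modification time changes, rather than via an explicit URL parameter. Everything below is illustrative Python, not Solr or Lucene API:

```python
import os

class ReloadableWordSet:
    """Sketch of the idea in this thread: rebuild the protected-word set
    when the backing file changes, instead of reloading the whole core.
    Names are illustrative, not Solr's."""

    def __init__(self, path):
        self.path = path
        self.mtime = None
        self.words = set()
        self.refresh()

    def refresh(self):
        """Re-read the file only if its modification time has changed."""
        mtime = os.path.getmtime(self.path)
        if mtime != self.mtime:
            with open(self.path) as f:
                self.words = {line.strip() for line in f if line.strip()}
            self.mtime = mtime

    def is_protected(self, word):
        self.refresh()
        return word in self.words
```

The mtime check makes the common case (file unchanged) a single stat call, so lookups stay cheap while edits to the file take effect without any reload request.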
docValues: Can we apply synonym
Hi, We have a field *city* on which docValues are enabled. We need to add synonyms to that field; how can we do it? With Regards Aman Tandon
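For reference, a docValues field definition in schema.xml looks like the sketch below (field name taken from the thread; attribute values are illustrative). Because the string type is not analyzed, no analysis chain, and hence no SynonymFilter, can run on it, which is the crux of the discussion that follows:

```xml
<!-- Un-analyzed string field: docValues works, but no SynonymFilter can run. -->
<field name="city" type="string" indexed="true" stored="true" docValues="true"/>
```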
Re: Help/Guidance Needed: To reload kstem protword hash without full core reload
Hi Ahmet, Can you please guide me to contribute for this *issue*. I haven't did this before. So I need to know...what should I need to know and how should I start..what IDE or whatever you thought is need to know for a novice. I will be thankful to you :) With Regards Aman Tandon On Tue, May 19, 2015 at 8:10 PM, Aman Tandon amantandon...@gmail.com wrote: That link you provided is exactly I want to do. Thanks Ahmet. With Regards Aman Tandon On Tue, May 19, 2015 at 5:06 PM, Ahmet Arslan iori...@yahoo.com.invalid wrote: Hi Aman, changing protected words without reindexing makes little or no sense. Regarding protected words, trend is to use solr.KeywordMarkerFilterFactory. Instead I suggest you to work on a more general issue: https://issues.apache.org/jira/browse/SOLR-1307 Ahmet On Tuesday, May 19, 2015 3:16 AM, Aman Tandon amantandon...@gmail.com wrote: Please help or I am not clear here? With Regards Aman Tandon On Mon, May 18, 2015 at 9:47 PM, Aman Tandon amantandon...@gmail.com wrote: Hi, *Problem Statement: *I want to reload an hash of protwords created by the kstem filter without reloading the whole index core. *My Thought: *I am thinking to reload the hash by passing a parameter like *r=1 *to analysis url request (to somehow pass the parameter via url). And I am thinking if somehow by changing the IndexSchema.java I might can pass this parameter though my analyzer chain to KStemFilter. In which I will call the initializeDictionary function to make protwords hash again from the file if *r=1*, instead of making full core reload request. Please guide me, I know question might be stupid, the thought came in my mind and I want to share and ask some suggestions here. Is it possible or not and how can i achieve the same? I will be thankful for guidance. With Regards Aman Tandon On Tuesday, May 19, 2015 3:16 AM, Aman Tandon amantandon...@gmail.com wrote: Please help or I am not clear here? 
With Regards Aman Tandon On Mon, May 18, 2015 at 9:47 PM, Aman Tandon amantandon...@gmail.com wrote: Hi, *Problem Statement: *I want to reload an hash of protwords created by the kstem filter without reloading the whole index core. *My Thought: *I am thinking to reload the hash by passing a parameter like *r=1 *to analysis url request (to somehow pass the parameter via url). And I am thinking if somehow by changing the IndexSchema.java I might can pass this parameter though my analyzer chain to KStemFilter. In which I will call the initializeDictionary function to make protwords hash again from the file if *r=1*, instead of making full core reload request. Please guide me, I know question might be stupid, the thought came in my mind and I want to share and ask some suggestions here. Is it possible or not and how can i achieve the same? I will be thankful for guidance. With Regards Aman Tandon
Re: docValues: Can we apply synonym
Okay So how could I do it with UpdateProcessors? With Regards Aman Tandon On Tue, May 26, 2015 at 10:00 PM, Alessandro Benedetti benedetti.ale...@gmail.com wrote: mmm this is different ! Without any customisation, right now you could : - use docValues to provide exact value facets. - Than you can use a copy field, with the proper analysis, to search when a user click on a filter ! So you will see in your facets : Mumbai(3) Bombay(2) And when clicking you see 5 results. A little bit misleading for the users … On the other hand if you you want to apply the synonyms before, the indexing pipeline ( because docValues field can not be analysed), I think you should play with UpdateProcessors. Cheers 2015-05-26 17:18 GMT+01:00 Aman Tandon amantandon...@gmail.com: We are interested in using docValues for better memory utilization and speed. Currently we are faceting the search results on *city. *In city we have also added the synonym for cities like mumbai, bombay (These are Indian cities). So that result of mumbai is also eligible when somebody will applying filter of bombay on search results. I need this functionality to apply with docValues enabled field. With Regards Aman Tandon On Tue, May 26, 2015 at 9:19 PM, Alessandro Benedetti benedetti.ale...@gmail.com wrote: I checked in the Documentation to be sure, but apparently : DocValues are only available for specific field types. The types chosen determine the underlying Lucene docValue type that will be used. The available Solr field types are: - StrField and UUIDField. - If the field is single-valued (i.e., multi-valued is false), Lucene will use the SORTED type. - If the field is multi-valued, Lucene will use the SORTED_SET type. - Any Trie* numeric fields and EnumField. - If the field is single-valued (i.e., multi-valued is false), Lucene will use the NUMERIC type. - If the field is multi-valued, Lucene will use the SORTED_SET type. This means you should not analyse a field where DocValues is enabled. 
Can your explain us your use case ? Why are you interested in synonyms DocValues level ? Cheers 2015-05-26 13:32 GMT+01:00 Upayavira u...@odoko.co.uk: To my understanding, docValues are just an uninverted index. That is, it contains the terms that are generated at the end of an analysis chain. Therefore, you simply enable docValues and include the SynonymFilterFactory in your analysis. Is that enough, or are you struggling with some other issue? Upayavira On Tue, May 26, 2015, at 12:03 PM, Aman Tandon wrote: Hi, We have some field *city* in which the docValues are enabled. We need to add the synonym in that field so how could we do it? With Regards Aman Tandon -- -- Benedetti Alessandro Visiting card : http://about.me/alessandro_benedetti Tyger, tyger burning bright In the forests of the night, What immortal hand or eye Could frame thy fearful symmetry? William Blake - Songs of Experience -1794 England -- -- Benedetti Alessandro Visiting card : http://about.me/alessandro_benedetti Tyger, tyger burning bright In the forests of the night, What immortal hand or eye Could frame thy fearful symmetry? William Blake - Songs of Experience -1794 England
Re: docValues: Can we apply synonym
We are interested in using docValues for better memory utilization and speed. Currently we are faceting the search results on *city. *In city we have also added the synonym for cities like mumbai, bombay (These are Indian cities). So that result of mumbai is also eligible when somebody will applying filter of bombay on search results. I need this functionality to apply with docValues enabled field. With Regards Aman Tandon On Tue, May 26, 2015 at 9:19 PM, Alessandro Benedetti benedetti.ale...@gmail.com wrote: I checked in the Documentation to be sure, but apparently : DocValues are only available for specific field types. The types chosen determine the underlying Lucene docValue type that will be used. The available Solr field types are: - StrField and UUIDField. - If the field is single-valued (i.e., multi-valued is false), Lucene will use the SORTED type. - If the field is multi-valued, Lucene will use the SORTED_SET type. - Any Trie* numeric fields and EnumField. - If the field is single-valued (i.e., multi-valued is false), Lucene will use the NUMERIC type. - If the field is multi-valued, Lucene will use the SORTED_SET type. This means you should not analyse a field where DocValues is enabled. Can your explain us your use case ? Why are you interested in synonyms DocValues level ? Cheers 2015-05-26 13:32 GMT+01:00 Upayavira u...@odoko.co.uk: To my understanding, docValues are just an uninverted index. That is, it contains the terms that are generated at the end of an analysis chain. Therefore, you simply enable docValues and include the SynonymFilterFactory in your analysis. Is that enough, or are you struggling with some other issue? Upayavira On Tue, May 26, 2015, at 12:03 PM, Aman Tandon wrote: Hi, We have some field *city* in which the docValues are enabled. We need to add the synonym in that field so how could we do it? 
With Regards Aman Tandon -- -- Benedetti Alessandro Visiting card : http://about.me/alessandro_benedetti Tyger, tyger burning bright In the forests of the night, What immortal hand or eye Could frame thy fearful symmetry? William Blake - Songs of Experience -1794 England
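Charles's query-time suggestion can be sketched as expanding one filter value into an OR over its equivalents, leaving the indexed docValues field untouched. A hedged Python illustration (the synonym table and function are mine, not the plugin's API):

```python
# Illustrative bidirectional synonym table.
SYNONYMS = {"bombay": ["mumbai"], "mumbai": ["bombay"], "ny": ["new york"]}

def expand_filter(field, value):
    """Turn one filter click into an OR over all synonym variants, the same
    effect a query-time synonym-expanding parser achieves without touching
    the un-analyzed indexed values."""
    variants = [value.lower()] + SYNONYMS.get(value.lower(), [])
    return " OR ".join('{}:"{}"'.format(field, v) for v in variants)

# expand_filter("city", "Bombay") matches documents indexed under either name.
```

Because the expansion happens at query time, term frequencies and docValues facet counts stay exactly as indexed, avoiding the index-time pollution Charles warns about.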
Re: Help/Guidance Needed: To reload kstem protword hash without full core reload
That link you provided is exactly I want to do. Thanks Ahmet. With Regards Aman Tandon On Tue, May 19, 2015 at 5:06 PM, Ahmet Arslan iori...@yahoo.com.invalid wrote: Hi Aman, changing protected words without reindexing makes little or no sense. Regarding protected words, trend is to use solr.KeywordMarkerFilterFactory. Instead I suggest you to work on a more general issue: https://issues.apache.org/jira/browse/SOLR-1307 Ahmet On Tuesday, May 19, 2015 3:16 AM, Aman Tandon amantandon...@gmail.com wrote: Please help or I am not clear here? With Regards Aman Tandon On Mon, May 18, 2015 at 9:47 PM, Aman Tandon amantandon...@gmail.com wrote: Hi, *Problem Statement: *I want to reload an hash of protwords created by the kstem filter without reloading the whole index core. *My Thought: *I am thinking to reload the hash by passing a parameter like *r=1 *to analysis url request (to somehow pass the parameter via url). And I am thinking if somehow by changing the IndexSchema.java I might can pass this parameter though my analyzer chain to KStemFilter. In which I will call the initializeDictionary function to make protwords hash again from the file if *r=1*, instead of making full core reload request. Please guide me, I know question might be stupid, the thought came in my mind and I want to share and ask some suggestions here. Is it possible or not and how can i achieve the same? I will be thankful for guidance. With Regards Aman Tandon On Tuesday, May 19, 2015 3:16 AM, Aman Tandon amantandon...@gmail.com wrote: Please help or I am not clear here? With Regards Aman Tandon On Mon, May 18, 2015 at 9:47 PM, Aman Tandon amantandon...@gmail.com wrote: Hi, *Problem Statement: *I want to reload an hash of protwords created by the kstem filter without reloading the whole index core. *My Thought: *I am thinking to reload the hash by passing a parameter like *r=1 *to analysis url request (to somehow pass the parameter via url). 
And I am thinking if somehow by changing the IndexSchema.java I might can pass this parameter though my analyzer chain to KStemFilter. In which I will call the initializeDictionary function to make protwords hash again from the file if *r=1*, instead of making full core reload request. Please guide me, I know question might be stupid, the thought came in my mind and I want to share and ask some suggestions here. Is it possible or not and how can i achieve the same? I will be thankful for guidance. With Regards Aman Tandon
Help/Guidance Needed: To reload kstem protword hash without full core reload
Hi, *Problem Statement: *I want to reload the hash of protwords created by the kstem filter without reloading the whole index core. *My Thought: *I am thinking of reloading the hash by passing a parameter like *r=1 *to the analysis URL request. By changing IndexSchema.java I might be able to pass this parameter through my analyzer chain to KStemFilter, which would then call the initializeDictionary function to rebuild the protwords hash from the file if *r=1*, instead of requiring a full core reload. Please guide me; I know the question might be naive, but the thought came to mind and I want to ask for suggestions here. Is it possible, and how can I achieve it? I will be thankful for guidance. With Regards Aman Tandon
Re: Help/Guidance Needed: To reload kstem protwords hash without full core reload
Please help, or was I not clear? With Regards Aman Tandon

On Mon, May 18, 2015 at 9:47 PM, Aman Tandon amantandon...@gmail.com wrote:
Hi,

*Problem Statement:* I want to reload the hash of protected words built by the KStem filter without reloading the whole index core.

*My Thought:* I am thinking of reloading the hash by passing a parameter like *r=1* to the analysis URL request (somehow passing the parameter via the URL). By changing IndexSchema.java, I might be able to pass this parameter through my analyzer chain to KStemFilter, in which I would call the initializeDictionary function to rebuild the protwords hash from the file when *r=1*, instead of making a full core reload request.

Please guide me. I know the question might be stupid, but the thought came to my mind and I wanted to share it and ask for suggestions here. Is it possible or not, and how can I achieve it? I will be thankful for guidance.

With Regards
Aman Tandon
Re: Searcher is opening twice on Reload
Thanks Chris, but the issue says the firstSearcher listener is opened twice, while in my case the firstSearcher opens first and then the newSearcher. Is it the same? With Regards Aman Tandon On Thu, May 14, 2015 at 11:05 PM, Chris Hostetter hossman_luc...@fucit.org wrote: I suspect you aren't doing anything wrong, I think it's the same as this bug... https://issues.apache.org/jira/browse/SOLR-7035 : Date: Thu, 14 May 2015 12:53:34 +0530 : From: Aman Tandon amantandon...@gmail.com : Reply-To: solr-user@lucene.apache.org : To: solr-user@lucene.apache.org solr-user@lucene.apache.org : Subject: Searcher is opening twice on Reload : : Hi, : : Please help me here: when I reload the core, my searcher is : opened twice. I am attaching the log output; please tell me what I am : doing wrong, or whether this is the default behavior on reload. : : May 14, 2015 12:47:38 PM org.apache.solr.spelling.DirectSolrSpellChecker : INFO: init: : {name=default,field=titlews,classname=solr.DirectSolrSpellChecker,distanceMeasure=internal,accuracy=0.5,maxEdits=1,minPrefix=1,maxInspections=5,minQueryLength=5,maxQueryFrequency=100.0,thresholdTokenFrequency=100.0} : May 14, 2015 12:47:38 PM : org.apache.solr.handler.component.SpellCheckComponent : INFO: No queryConverter defined, using default converter : May 14, 2015 12:47:38 PM : org.apache.solr.handler.component.QueryElevationComponent : INFO: Loading QueryElevation from data dir: elevate.xml : May 14, 2015 12:47:38 PM org.apache.solr.handler.ReplicationHandler : INFO: Commits will be reserved for 1 : May 14, 2015 12:47:38 PM org.apache.solr.core.QuerySenderListener : INFO: QuerySenderListener sending requests to Searcher@41dc3c83 [IM-Search] : main{StandardDirectoryReader(segments_dd4:82296:nrt : _jdq(4.8):C5602938/2310052:delGen=3132 : _jkq(4.8):C6860454/1398005:delGen=2992 : _jx2(4.8):C5237053/1505048:delGen=3241 : _joo(4.8):C5825253/1599671:delGen=3323 : _k4d(4.8):C5860360/1916531:delGen=3150 :
_o27(4.8):C5290435/1018865:delGen=370 : _mju(4.8):C5074973/1602707:delGen=1474 : _jka(4.8):C5172599/1774839:delGen=3202 : _nik(4.8):C4698916/1512091:delGen=804 _o8y(4.8):C1137592/521423:delGen=190 : _oeu(4.8):C469094/86291:delGen=29 _odq(4.8):C217505/65596:delGen=55 : _ogd(4.8):C50454/4155:delGen=5 _oea(4.8):C40833/7192:delGen=37 : _ofy(4.8):C73614/7273:delGen=13 _ogx(4.8):C395681/1388:delGen=4 : _ogh(4.8):C7676/70:delGen=2 _ohf(4.8):C108769/21:delGen=2 : _ogc(4.8):C24435/384:delGen=4 _ogi(4.8):C23088/158:delGen=3 : _ogj(4.8):C4217/2:delGen=1 _ohs(4.8):C7 _oh6(4.8):C20509/205:delGen=5 : _oh7(4.8):C3171 _oho(4.8):C6/1:delGen=1 _ohq(4.8):C1 : _ohv(4.8):C10484/996:delGen=2 _ohx(4.8):C500 _ohy(4.8):C1 _ohz(4.8):C1)} : May 14, 2015 12:47:43 PM org.apache.solr.core.SolrCore : INFO: [IM-Search] webapp=/solr path=/select : params={spellcheck=true&lon=0&q=q&wt=json&qt=opsview.monitor&lat=0&rows=0&ps=1} : hits=6 status=0 QTime=1 : May 14, 2015 12:47:44 PM org.apache.solr.core.SolrCore : INFO: [IM-Search] webapp=null path=null : params={start=0&event=firstSearcher&q=rice&distrib=false&qt=im.search.intent&rows=25} : hits=42749 status=0 QTime=5667 : May 14, 2015 12:47:58 PM org.apache.solr.request.UnInvertedField : INFO: UnInverted multi-valued field : {field=city,memSize=209216385,tindexSize=11029,time=3904,phase1=3783,nTerms=77614,bigTerms=3,termInstances=31291566,uses=0} : May 14, 2015 12:48:01 PM org.apache.solr.request.UnInvertedField : INFO: UnInverted multi-valued field : {field=biztype,memSize=208847178,tindexSize=40,time=1318,phase1=1193,nTerms=9,bigTerms=4,termInstances=1607459,uses=0} : May 14, 2015 12:48:01 PM org.apache.solr.core.SolrCore : INFO: [IM-Search] webapp=null path=null : params={start=0&event=firstSearcher&q=rice&distrib=false&qt=im.search&rows=25} : hits=57619 status=0 QTime=17194 : May 14, 2015 12:48:04 PM org.apache.solr.core.SolrCore : INFO: [IM-Search] webapp=null path=null :
params={start=0&event=firstSearcher&q=potassium+cyanide&distrib=false&qt=eto.search.offer&rows=20} : hits=443 status=0 QTime=3272 : May 14, 2015 12:48:09 PM org.apache.solr.core.SolrCore : INFO: [IM-Search] webapp=null path=null : params={start=0&event=firstSearcher&q=motor+spare+parts&distrib=false&qt=im.search&fq=attribs:(locprefglobal+locprefnational+locprefcity)&rows=20} : hits=107297 status=0 QTime=5254 : May 14, 2015 12:48:09 PM org.apache.solr.core.QuerySenderListener : INFO: QuerySenderListener done. : May 14, 2015 12:48:09 PM : org.apache.solr.handler.component.SpellCheckComponent$SpellCheckerListener : INFO: Loading spell index for spellchecker: default : May 14, 2015 12:48:09 PM : org.apache.solr.handler.component.SpellCheckComponent$SpellCheckerListener : INFO: Loading spell index for spellchecker: wordbreak : May 14, 2015 12:48:09 PM org.apache.solr.core.SolrCore : INFO: [IM-Search] Registered new
Re: Searcher is opening twice on Reload
Any help here? With Regards Aman Tandon

On Fri, May 15, 2015 at 1:24 PM, Aman Tandon amantandon...@gmail.com wrote:
Thanks Chris, but the issue says the firstSearcher listener is opened twice, while in my case the firstSearcher opens first and then the newSearcher. Is it the same?

On Thu, May 14, 2015 at 11:05 PM, Chris Hostetter hossman_luc...@fucit.org wrote:
I suspect you aren't doing anything wrong, I think it's the same as this bug... https://issues.apache.org/jira/browse/SOLR-7035
[quoted logs snipped; identical to the output quoted earlier in the thread]
Re: Searcher is opening twice on Reload
Please help. The Solr version is 4.8.1. With Regards Aman Tandon

On Thu, May 14, 2015 at 12:53 PM, Aman Tandon amantandon...@gmail.com wrote:
Hi, please help me here: when I reload the core, my searcher is opened twice. I am attaching the log output; please tell me what I am doing wrong, or whether this is the default behavior on reload.
[quoted logs snipped; identical to the output quoted earlier in the thread]
Searcher is opening twice on Reload
:48:53 PM org.apache.solr.core.SolrCore INFO: [IM-Search] Registered new searcher Searcher@49093738[IM-Search] main{StandardDirectoryReader(segments_dd4:82296:nrt _jdq(4.8):C5602938/2310052:delGen=3132 _jkq(4.8):C6860454/1398005:delGen=2992 _jx2(4.8):C5237053/1505048:delGen=3241 _joo(4.8):C5825253/1599671:delGen=3323 _k4d(4.8):C5860360/1916531:delGen=3150 _o27(4.8):C5290435/1018865:delGen=370 _mju(4.8):C5074973/1602707:delGen=1474 _jka(4.8):C5172599/1774839:delGen=3202 _nik(4.8):C4698916/1512091:delGen=804 _o8y(4.8):C1137592/521423:delGen=190 _oeu(4.8):C469094/86291:delGen=29 _odq(4.8):C217505/65596:delGen=55 _ogd(4.8):C50454/4155:delGen=5 _oea(4.8):C40833/7192:delGen=37 _ofy(4.8):C73614/7273:delGen=13 _ogx(4.8):C395681/1388:delGen=4 _ogh(4.8):C7676/70:delGen=2 _ohf(4.8):C108769/21:delGen=2 _ogc(4.8):C24435/384:delGen=4 _ogi(4.8):C23088/158:delGen=3 _ogj(4.8):C4217/2:delGen=1 _ohs(4.8):C7 _oh6(4.8):C20509/205:delGen=5 _oh7(4.8):C3171 _oho(4.8):C6/1:delGen=1 _ohq(4.8):C1 _ohv(4.8):C10484/996:delGen=2 _ohx(4.8):C500 _ohy(4.8):C1 _ohz(4.8):C1)} With Regards Aman Tandon
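For context on the two events in play: warming queries like the firstSearcher ones visible in the logs are configured in solrconfig.xml with QuerySenderListener entries along the lines of the sketch below. The queries shown are illustrative placeholders, since the actual warming configuration is not included in the thread. firstSearcher fires only for the first searcher after a core is loaded, and because a RELOAD creates a fresh core object, its first searcher fires firstSearcher again; newSearcher fires for searchers registered after that, which is consistent with Aman seeing both events back to back on reload.

```xml
<!-- Sketch only: the real warming queries behind this log (qt=im.search
     etc.) are not shown in the thread; these entries are placeholders. -->
<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">rice</str><str name="qt">im.search</str><str name="rows">25</str></lst>
  </arr>
</listener>
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">rice</str><str name="qt">im.search</str><str name="rows">25</str></lst>
  </arr>
</listener>
```

Whether the double opening here is the exact duplication tracked in SOLR-7035 is the open question of the thread; the config above only explains why both event types can legitimately appear on a reload.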