Re: How To: Debugging the whole indexing process
In production or in test? I assume in test. This level of detail usually implies some sort of Java debugger with Java instrumentation enabled, e.g. Chronon, which is commercial but can be tried as a plugin with the IntelliJ IDEA full-version trial. Regards, Alex On 29 May 2015 4:38 pm, Aman Tandon amantandon...@gmail.com wrote: Hi, I want to debug the whole indexing process, the life cycle of the indexing process (each and every function call, going from function to function), from the posting of data.xml to the creation of the various index files (_fnm, _fdt, etc.). So how/what should I set up and start? Please help. I will be thankful to you. <add><doc><field name="title"><![CDATA[Aman Tandon]]></field><field name="job_role"><![CDATA[Search Engineer]]></field></doc></add> With Regards Aman Tandon
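For stepping through the indexing code path function by function, a zero-cost alternative to Chronon is plain JDWP remote debugging: start the Solr JVM with the debug agent enabled and attach IntelliJ IDEA (or Eclipse) as a remote debugger. A sketch, assuming a Solr 5.x bin/solr script (the -a option for passing extra JVM arguments may differ between versions, and port 5005 is an arbitrary choice):

```shell
# Start Solr with the JDWP agent listening on port 5005, then create a
# "Remote" run configuration in the IDE pointing at that port and set
# breakpoints in the update/indexing classes of interest
# (e.g. DirectUpdateHandler2).
bin/solr start -a "-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005"
```

With suspend=y instead of suspend=n, the JVM waits for the debugger to attach before starting, which is useful for breakpoints that fire during startup.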
Re: Ability to load solrcore.properties from zookeeper
Yeah, you could do it like that. But looking at it further, I think solrcore.properties is actually being loaded in entirely the wrong place - it should be done by whatever is creating the CoreDescriptor, and then passed in as a Properties object to the CD constructor. At the moment, you can't refer to a property defined in solrcore.properties within your core.properties file. I'll open a JIRA if Steve hasn't already done so. Alan Woodward www.flax.co.uk On 28 May 2015, at 17:57, Chris Hostetter wrote: : certainly didn't intend to write it like this!). The problem here will : be that CoreDescriptors are currently built entirely from : core.properties files, and the CoreLocators that construct them don't : have any access to zookeeper. But they do have access to the CoreContainer which is passed to the CoreDescriptor constructor -- it has all the ZK access you'd need at the time when loadExtraProperties() is called. Correct? As fleshed out in my last email... : patch: IIUC CoreDescriptor.loadExtraProperties is the relevant method ... : it would need to build up the path including the core name and get the : system level resource loader (CoreContainer.getResourceLoader()) to access : it since the core doesn't exist yet so there is no core level : ResourceLoader to use. -Hoss http://www.lucidworks.com/
Re: SolrCloud 4.8.0 - Snapshot directories take a lot of space
bump On Fri, May 8, 2015 at 4:45 PM, Vincenzo D'Amore v.dam...@gmail.com wrote: Hi All, Looking at the data directory in my solrcloud cluster I have found a lot of old snapshot directories, like these: snapshot.20150506003702765 snapshot.20150506003702760 snapshot.20150507002849492 snapshot.20150507002849473 snapshot.20150507002849459 or even a month older. These directories take up really a lot of space, 2 or 3 times the whole index. May I delete these directories? If yes, is there a best practice? -- Vincenzo D'Amore email: v.dam...@gmail.com skype: free.dev mobile: +39 349 8513251 -- Vincenzo D'Amore email: v.dam...@gmail.com skype: free.dev mobile: +39 349 8513251
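The snapshot.* directories are typically left behind by replication backup commands; the live index lives under index/ and is never named snapshot.*, so removing old snapshots you no longer need is generally safe. A cautious cleanup sketch (DATA_DIR is a placeholder for the core's data directory, and the 7-day cutoff is an example, not a recommendation):

```shell
# List, then delete, snapshot directories older than 7 days. The live
# index directory ("index/") does not match the snapshot.* pattern and
# is left untouched.
DATA_DIR=${DATA_DIR:-/var/solr/data/collection1/data}
if [ -d "$DATA_DIR" ]; then
  # Dry run first: see what would be removed.
  find "$DATA_DIR" -maxdepth 1 -type d -name 'snapshot.*' -mtime +7
  # Then actually remove them.
  find "$DATA_DIR" -maxdepth 1 -type d -name 'snapshot.*' -mtime +7 -exec rm -rf {} +
fi
```

Run the dry-run find on its own first and eyeball the list before adding the -exec rm part.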
Re: Index optimize runs in background.
I have not added any timeout in the indexer except the zk client timeout, which is 30 seconds. I am simply calling client.close() at the end of indexing. The same code was not running the optimize in the background with solr-4.10.3 and org.apache.solr.client.solrj.impl.CloudSolrServer. On Fri, May 29, 2015 at 11:13 AM, Erick Erickson erickerick...@gmail.com wrote: Are you timing out on the client request? The theory here is that it's still a synchronous call, but you're just timing out at the client level. At that point, the optimize is still running, it's just that the connection has been dropped. Shot in the dark. Erick On Thu, May 28, 2015 at 10:31 PM, Modassar Ather modather1...@gmail.com wrote: I could not notice it, but with my past experience a commit which used to take around 2 minutes is now taking around 8 seconds. I think this is also running in the background. On Fri, May 29, 2015 at 10:52 AM, Modassar Ather modather1...@gmail.com wrote: The indexer takes almost 2 hours to optimize. It has a multi-threaded add of batches of documents to org.apache.solr.client.solrj.impl.CloudSolrClient. Once all the documents are indexed it invokes commit and optimize. I have seen that the optimize goes into the background after 10 minutes and the indexer exits. I am not sure why the indexer hangs for these 10 minutes. This behavior I have seen in multiple iterations of indexing the same data. There is nothing significant I found in the log which I can share. I can see the following in the log: org.apache.solr.update.DirectUpdateHandler2; start commit{,optimize=true,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false} On Wed, May 27, 2015 at 10:59 PM, Erick Erickson erickerick...@gmail.com wrote: All strange of course. What do your Solr logs show when this happens? And how reproducible is this? Best, Erick On Wed, May 27, 2015 at 4:00 AM, Upayavira u...@odoko.co.uk wrote: In this case, optimising makes sense; once the index is generated, you are not updating it. 
Upayavira On Wed, May 27, 2015, at 06:14 AM, Modassar Ather wrote: Our index has almost 100M documents running on a SolrCloud of 5 shards, and each shard has an index size of about 170+GB (for the record, we are not using stored fields - our documents are pretty large). We perform a full indexing every weekend and during the week there are no updates made to the index. Most of the queries that we run are pretty complex with hundreds of terms using PhraseQuery, BooleanQuery, SpanQuery, Wildcards, boosts etc. and take many minutes to execute. A difference of 10-20% is also a big advantage for us. We have been optimizing the index after indexing for years and it has worked well for us. Every once in a while, we upgrade Solr to the latest version and try without optimizing so that we can save the many hours it takes to optimize such a huge index, but find the optimized index works well for us. Erick, I was indexing the documents today and saw the optimize happening in the background. On Tue, May 26, 2015 at 9:12 PM, Erick Erickson erickerick...@gmail.com wrote: No results yet. I finished the test harness last night (not really a unit test, a stand-alone program that endlessly adds stuff and tests that every commit returns the correct number of docs). 8,000 cycles later there aren't any problems reported. Siiigh. On Tue, May 26, 2015 at 1:51 AM, Modassar Ather modather1...@gmail.com wrote: Hi, Erick you mentioned a unit test to test the optimize running in the background. Kindly share your findings if any. Thanks, Modassar On Mon, May 25, 2015 at 11:47 AM, Modassar Ather modather1...@gmail.com wrote: Thanks everybody for your replies. I have noticed the optimization running in the background every time I indexed. This is a 5 node cluster with solr-5.1.0 and uses the CloudSolrClient. Kindly share your findings on this issue. Our index has almost 100M documents running on SolrCloud. We have been optimizing the index after indexing for years and it has worked well for us. 
Thanks, Modassar On Fri, May 22, 2015 at 11:55 PM, Erick Erickson erickerick...@gmail.com wrote: Actually, I've recently seen very similar behavior in Solr 4.10.3, but involving hard commits with openSearcher=true, see: https://issues.apache.org/jira/browse/SOLR-7572. Of course I can't reproduce this at will, sigh. A unit test should be very simple to write though, maybe I can get to it today. Erick On Fri, May 22, 2015 at 8:27 AM, Upayavira u...@odoko.co.uk wrote: On Fri, May 22, 2015, at 03:55 PM, Shawn
How To: Debugging the whole indexing process
Hi, I want to debug the whole indexing process, the life cycle of the indexing process (each and every function call, going from function to function), from the posting of data.xml to the creation of the various index files (_fnm, _fdt, etc.). So how/what should I set up and start? Please help. I will be thankful to you. <add><doc><field name="title"><![CDATA[Aman Tandon]]></field><field name="job_role"><![CDATA[Search Engineer]]></field></doc></add> With Regards Aman Tandon
Help for a field in my schema?
Dear Solr-Users, (SOLR 5.0, Ubuntu) I have xml files with tags like this: <claimXXYYY>, where XX is a language code like FR EN DE PT etc. (I don't know how many language codes I can have) and YYY is a number [1..999], i.e.: <claimen1> <claimen2> <claimen3> <claimfr1> <claimfr2> <claimfr3> I would like to define fields named: *claimen* equal to claimenYYY (EN language, all numbers, indexed=true, stored=true) (search needed and must be displayed) *claim* equal to all claimXXYYY (all languages, all numbers, indexed=true, stored=false) (search not needed but must be displayed) Is it possible to have these 2 fields? Could you help me to declare them in my schema.xml? Thanks a lot for your help! Bruno --- This email contains no viruses or malware because avast! Antivirus protection is active. http://www.avast.com
Re: How to index 20 000 files with a command line?
Hello Bruno, You can use the find command with the exec option. Regards, Sergey Friday, May 29, 2015, 3:11:37 PM, you wrote: Dear Solr Users, Habitually I use this command line to index my files: bin/post -c hbl /data/hbl-201522/*.xml but today I have a big update, so there are 20 000 xml files (each file 1 KB to 150 KB) I get this error: Error: bin/post argument too long How could I index the whole directory? Thanks a lot for your help, Solr 5.0 - Ubuntu Bruno -- Best regards, Sergey mailto:ser...@bintime.com
Re: How to index 20 000 files with a command line?
oh yes, like this: find /data/hbl-201522/ -name '*.xml' -exec bin/post -c hbl {} \; ? On 29/05/2015 14:15, Sergey Shvets wrote: Hello Bruno, You can use the find command with the exec option. Regards, Sergey Friday, May 29, 2015, 3:11:37 PM, you wrote: Dear Solr Users, Habitually I use this command line to index my files: bin/post -c hbl /data/hbl-201522/*.xml but today I have a big update, so there are 20 000 xml files (each file 1 KB to 150 KB) I get this error: Error: bin/post argument too long How could I index the whole directory? Thanks a lot for your help, Solr 5.0 - Ubuntu Bruno
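A variation on the command above worth considering: with "-exec ... {} \;" find launches bin/post (and its JVM) once per file, which is slow for 20 000 files. The "+" terminator appends as many file names as fit per invocation, xargs-style, so bin/post starts only a handful of times. A sketch, using the directory from the thread:

```shell
# Batch many XML files per bin/post invocation instead of one each.
# DIR is the directory from the thread; adjust to your layout.
DIR=/data/hbl-201522
if [ -d "$DIR" ]; then
  find "$DIR" -name '*.xml' -exec bin/post -c hbl {} +
fi
```

Quoting '*.xml' matters: unquoted, the shell would expand the glob itself and reproduce the original "argument too long" error in a large directory.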
How to index 20 000 files with a command line?
Dear Solr Users, Habitually I use this command line to index my files: bin/post -c hbl /data/hbl-201522/*.xml but today I have a big update, so there are 20 000 xml files (each file 1 KB to 150 KB) I get this error: Error: bin/post argument too long How could I index the whole directory? Thanks a lot for your help, Solr 5.0 - Ubuntu Bruno
Re: Number of clustering labels to show
Hi, The number of clusters primarily depends on the parameters of the specific clustering algorithm. If you're using the default Lingo algorithm, the number of clusters is governed by the LingoClusteringAlgorithm.desiredClusterCountBase parameter. Take a look at the documentation ( https://cwiki.apache.org/confluence/display/solr/Result+Clustering#ResultClustering-TweakingAlgorithmSettings) for some more details (the Tweaking at Query-Time section shows how to pass the specific parameters at request time). A complete overview of the Lingo clustering algorithm parameters is here: http://doc.carrot2.org/#section.component.lingo. Stanislaw -- Stanislaw Osinski, stanislaw.osin...@carrotsearch.com http://carrotsearch.com On Fri, May 29, 2015 at 4:29 AM, Zheng Lin Edwin Yeo edwinye...@gmail.com wrote: Hi, I'm trying to increase the number of cluster results to be shown during the search. I tried to set carrot.fragSize=20 but only 15 cluster labels are shown. Even when I tried to set carrot.fragSize=5, there are also 15 labels shown. Is this the correct way to do this? I understand that setting it to 20 might not necessarily mean 20 labels will be shown, as the setting is a maximum. But when I set this to 5, shouldn't it reduce the number of labels to 5? I'm using Solr 5.1. Regards, Edwin
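Concretely, the query-time tweaking the reply describes means passing the Carrot2 attribute name straight through as a Solr request parameter (carrot.fragSize only controls snippet length, which is why it had no effect on the cluster count). A sketch of such a request, where the host, collection name, and clustering handler path are placeholders for your own setup and the parameter name comes from the linked Carrot2 Lingo documentation:

```
http://localhost:8983/solr/collection1/clustering?q=*:*&rows=100&LingoClusteringAlgorithm.desiredClusterCountBase=30
```

Note it is a "base" value the algorithm aims for, not an exact cluster count.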
Re: docValues: Can we apply synonym
Even if a little bit outdated, that query parser is really, really cool for managing synonyms! +1! 2015-05-29 1:01 GMT+01:00 Aman Tandon amantandon...@gmail.com: Thanks, Chris. Yes we are using it for handling the multiword synonym problem. With Regards Aman Tandon On Fri, May 29, 2015 at 12:38 AM, Reitzel, Charles charles.reit...@tiaa-cref.org wrote: Again, I would recommend using Nolan Lawson's SynonymExpandingExtendedDismaxQParserPlugin. http://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/ -Original Message- From: Aman Tandon [mailto:amantandon...@gmail.com] Sent: Wednesday, May 27, 2015 6:42 PM To: solr-user@lucene.apache.org Subject: Re: docValues: Can we apply synonym OK, and what synonym processor are you talking about? Maybe it could help. With Regards Aman Tandon On Thu, May 28, 2015 at 4:01 AM, Reitzel, Charles charles.reit...@tiaa-cref.org wrote: Sorry, my bad. The synonym processor I mention works differently. It's an extension of the EDisMax query processor and doesn't require field level synonym configs. -Original Message- From: Reitzel, Charles [mailto:charles.reit...@tiaa-cref.org] Sent: Wednesday, May 27, 2015 6:12 PM To: solr-user@lucene.apache.org Subject: RE: docValues: Can we apply synonym But the query analysis isn't on a specific field, it is applied to the query string. -Original Message- From: Aman Tandon [mailto:amantandon...@gmail.com] Sent: Wednesday, May 27, 2015 6:08 PM To: solr-user@lucene.apache.org Subject: Re: docValues: Can we apply synonym Hi Charles, The problem here is that docValues works only with primitive data types like String, int, etc. So how could we apply a synonym on a primitive data type? With Regards Aman Tandon On Thu, May 28, 2015 at 3:19 AM, Reitzel, Charles charles.reit...@tiaa-cref.org wrote: Is there any reason you cannot apply the synonyms at query time? Applying synonyms at indexing time has problems, e.g. polluting the term frequency for synonyms added, preventing distance queries, ... 
Since city names often have multiple terms, e.g. New York, Den Hague, etc., I would recommend using Nolan Lawson's SynonymExpandingExtendedDismaxQParserPlugin. Tastes great, less filling. http://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/ We found this to fix synonyms like ny for New York and vice versa. Haven't tried it with docValues, tho. -Original Message- From: Aman Tandon [mailto:amantandon...@gmail.com] Sent: Tuesday, May 26, 2015 11:15 PM To: solr-user@lucene.apache.org Subject: Re: docValues: Can we apply synonym Yes it could be :) Anyway thanks for helping. With Regards Aman Tandon On Tue, May 26, 2015 at 10:22 PM, Alessandro Benedetti benedetti.ale...@gmail.com wrote: I should investigate that, as usually synonyms are an analysis stage. A simple way is to replace the word with all its synonyms (including the original word), but simply using this kind of processor will change the token positions and offsets, modifying the actual content of the document. "I am from Bombay" will become "I am from Bombay Mumbai", which can be annoying. So a clever approach must be investigated. 2015-05-26 17:36 GMT+01:00 Aman Tandon amantandon...@gmail.com: Okay. So how could I do it with UpdateProcessors? With Regards Aman Tandon On Tue, May 26, 2015 at 10:00 PM, Alessandro Benedetti benedetti.ale...@gmail.com wrote: mmm this is different! Without any customisation, right now you could: - use docValues to provide exact value facets. - Then you can use a copy field, with the proper analysis, to search when a user clicks on a filter! So you will see in your facets: Mumbai(3) Bombay(2) And when clicking you see 5 results. A little bit misleading for the users … On the other hand, if you want to apply the synonyms in the indexing pipeline (because docValues fields cannot be analysed), I think you should play with UpdateProcessors. 
Cheers 2015-05-26 17:18 GMT+01:00 Aman Tandon amantandon...@gmail.com : We are interested in using docValues for better memory utilization and speed. Currently we are faceting the search results on *city*. In city we have also added synonyms for cities like mumbai, bombay (these are Indian cities), so that results for mumbai are also eligible when somebody applies a filter of bombay on the search results. I need this functionality to
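The "apply the synonyms in the indexing pipeline" idea from the thread can be illustrated without any Solr customisation: rewrite each synonym to one canonical value in the documents before they are posted, so the docValues city field facets Bombay and Mumbai under a single label. A minimal sketch using sed as a stand-in for an UpdateProcessor (the field name and file names here are invented for illustration):

```shell
# Canonicalize the city field in the XML that will be sent to bin/post:
# every Bombay value becomes Mumbai, so the docValues facet shows one
# combined count instead of two labels.
if [ -f data.xml ]; then
  sed 's|<field name="city">Bombay</field>|<field name="city">Mumbai</field>|g' \
    data.xml > data_canonical.xml
fi
```

In practice you would want the same normalization applied to filter queries (fq=city:Bombay), or use the copy-field approach described above for searching.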
Re: Index optimize runs in background.
I'm not talking about you setting a timeout, but the underlying connection timing out... The "10 minutes then the indexer exits" comment points in that direction. Best, Erick On Thu, May 28, 2015 at 11:43 PM, Modassar Ather modather1...@gmail.com wrote: I have not added any timeout in the indexer except the zk client timeout, which is 30 seconds. I am simply calling client.close() at the end of indexing. The same code was not running the optimize in the background with solr-4.10.3 and org.apache.solr.client.solrj.impl.CloudSolrServer. On Fri, May 29, 2015 at 11:13 AM, Erick Erickson erickerick...@gmail.com wrote: Are you timing out on the client request? The theory here is that it's still a synchronous call, but you're just timing out at the client level. At that point, the optimize is still running, it's just that the connection has been dropped. Shot in the dark. Erick On Thu, May 28, 2015 at 10:31 PM, Modassar Ather modather1...@gmail.com wrote: I could not notice it, but with my past experience a commit which used to take around 2 minutes is now taking around 8 seconds. I think this is also running in the background. On Fri, May 29, 2015 at 10:52 AM, Modassar Ather modather1...@gmail.com wrote: The indexer takes almost 2 hours to optimize. It has a multi-threaded add of batches of documents to org.apache.solr.client.solrj.impl.CloudSolrClient. Once all the documents are indexed it invokes commit and optimize. I have seen that the optimize goes into the background after 10 minutes and the indexer exits. I am not sure why the indexer hangs for these 10 minutes. This behavior I have seen in multiple iterations of indexing the same data. There is nothing significant I found in the log which I can share. I can see the following in the log: org.apache.solr.update.DirectUpdateHandler2; start commit{,optimize=true,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false} On Wed, May 27, 2015 at 10:59 PM, Erick Erickson erickerick...@gmail.com wrote: All strange of course. 
What do your Solr logs show when this happens? And how reproducible is this? Best, Erick On Wed, May 27, 2015 at 4:00 AM, Upayavira u...@odoko.co.uk wrote: In this case, optimising makes sense; once the index is generated, you are not updating it. Upayavira On Wed, May 27, 2015, at 06:14 AM, Modassar Ather wrote: Our index has almost 100M documents running on a SolrCloud of 5 shards, and each shard has an index size of about 170+GB (for the record, we are not using stored fields - our documents are pretty large). We perform a full indexing every weekend and during the week there are no updates made to the index. Most of the queries that we run are pretty complex with hundreds of terms using PhraseQuery, BooleanQuery, SpanQuery, Wildcards, boosts etc. and take many minutes to execute. A difference of 10-20% is also a big advantage for us. We have been optimizing the index after indexing for years and it has worked well for us. Every once in a while, we upgrade Solr to the latest version and try without optimizing so that we can save the many hours it takes to optimize such a huge index, but find the optimized index works well for us. Erick, I was indexing the documents today and saw the optimize happening in the background. On Tue, May 26, 2015 at 9:12 PM, Erick Erickson erickerick...@gmail.com wrote: No results yet. I finished the test harness last night (not really a unit test, a stand-alone program that endlessly adds stuff and tests that every commit returns the correct number of docs). 8,000 cycles later there aren't any problems reported. Siiigh. On Tue, May 26, 2015 at 1:51 AM, Modassar Ather modather1...@gmail.com wrote: Hi, Erick you mentioned a unit test to test the optimize running in the background. Kindly share your findings if any. Thanks, Modassar On Mon, May 25, 2015 at 11:47 AM, Modassar Ather modather1...@gmail.com wrote: Thanks everybody for your replies. I have noticed the optimization running in the background every time I indexed. 
This is a 5 node cluster with solr-5.1.0 and uses the CloudSolrClient. Kindly share your findings on this issue. Our index has almost 100M documents running on SolrCloud. We have been optimizing the index after indexing for years and it has worked well for us. Thanks, Modassar On Fri, May 22, 2015 at 11:55 PM, Erick Erickson erickerick...@gmail.com wrote: Actually, I've recently seen very similar behavior in Solr 4.10.3, but involving hard commits with openSearcher=true, see: https://issues.apache.org/jira/browse/SOLR-7572. Of course I can't reproduce this at will, sigh.
Re: docValues: Can we apply synonym
Do take time for performance testing with that parser. It can be slow depending on your data, as I remember. That said, it solves the problem it set out to solve, so if it meets your SLAs it can be a life-saver. Best, Erick On Fri, May 29, 2015 at 2:35 AM, Alessandro Benedetti benedetti.ale...@gmail.com wrote: Even if a little bit outdated, that query parser is really, really cool for managing synonyms! +1! 2015-05-29 1:01 GMT+01:00 Aman Tandon amantandon...@gmail.com: Thanks, Chris. Yes we are using it for handling the multiword synonym problem. With Regards Aman Tandon On Fri, May 29, 2015 at 12:38 AM, Reitzel, Charles charles.reit...@tiaa-cref.org wrote: Again, I would recommend using Nolan Lawson's SynonymExpandingExtendedDismaxQParserPlugin. http://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/ -Original Message- From: Aman Tandon [mailto:amantandon...@gmail.com] Sent: Wednesday, May 27, 2015 6:42 PM To: solr-user@lucene.apache.org Subject: Re: docValues: Can we apply synonym OK, and what synonym processor are you talking about? Maybe it could help. With Regards Aman Tandon On Thu, May 28, 2015 at 4:01 AM, Reitzel, Charles charles.reit...@tiaa-cref.org wrote: Sorry, my bad. The synonym processor I mention works differently. It's an extension of the EDisMax query processor and doesn't require field level synonym configs. -Original Message- From: Reitzel, Charles [mailto:charles.reit...@tiaa-cref.org] Sent: Wednesday, May 27, 2015 6:12 PM To: solr-user@lucene.apache.org Subject: RE: docValues: Can we apply synonym But the query analysis isn't on a specific field, it is applied to the query string. -Original Message- From: Aman Tandon [mailto:amantandon...@gmail.com] Sent: Wednesday, May 27, 2015 6:08 PM To: solr-user@lucene.apache.org Subject: Re: docValues: Can we apply synonym Hi Charles, The problem here is that docValues works only with primitive data types like String, int, etc. So how could we apply a synonym on a primitive data type? 
With Regards Aman Tandon On Thu, May 28, 2015 at 3:19 AM, Reitzel, Charles charles.reit...@tiaa-cref.org wrote: Is there any reason you cannot apply the synonyms at query time? Applying synonyms at indexing time has problems, e.g. polluting the term frequency for synonyms added, preventing distance queries, ... Since city names often have multiple terms, e.g. New York, Den Hague, etc., I would recommend using Nolan Lawson's SynonymExpandingExtendedDismaxQParserPlugin. Tastes great, less filling. http://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/ We found this to fix synonyms like ny for New York and vice versa. Haven't tried it with docValues, tho. -Original Message- From: Aman Tandon [mailto:amantandon...@gmail.com] Sent: Tuesday, May 26, 2015 11:15 PM To: solr-user@lucene.apache.org Subject: Re: docValues: Can we apply synonym Yes it could be :) Anyway thanks for helping. With Regards Aman Tandon On Tue, May 26, 2015 at 10:22 PM, Alessandro Benedetti benedetti.ale...@gmail.com wrote: I should investigate that, as usually synonyms are an analysis stage. A simple way is to replace the word with all its synonyms (including the original word), but simply using this kind of processor will change the token positions and offsets, modifying the actual content of the document. "I am from Bombay" will become "I am from Bombay Mumbai", which can be annoying. So a clever approach must be investigated. 2015-05-26 17:36 GMT+01:00 Aman Tandon amantandon...@gmail.com: Okay. So how could I do it with UpdateProcessors? With Regards Aman Tandon On Tue, May 26, 2015 at 10:00 PM, Alessandro Benedetti benedetti.ale...@gmail.com wrote: mmm this is different! Without any customisation, right now you could: - use docValues to provide exact value facets. - Then you can use a copy field, with the proper analysis, to search when a user clicks on a filter! So you will see in your facets: Mumbai(3) Bombay(2) And when clicking you see 5 results. 
A little bit misleading for the users … On the other hand, if you want to apply the synonyms in the indexing pipeline (because docValues fields cannot be analysed), I think you should play with UpdateProcessors. Cheers 2015-05-26 17:18 GMT+01:00 Aman Tandon amantandon...@gmail.com : We are interested in using docValues for better memory utilization and speed. Currently we are faceting the search
Re: Ignoring the Document Cache per query
Thanks, Erick. I realize this really makes no sense, but I was looking to work around a problem. Here is the scenario... Using Solr 5.1, we have a service that utilizes the new mlt query parser to get recommendations. So we start up the application, ask for recommendations for a document, and everything works. Another feature is to dislike a document, and once it is disliked it shouldn't show up as a recommended document. It does this by looking up the disliked documents for a user and adding a filter query to the recommendation call which excludes the disliked documents. So now we dislike a document that was in the original list of recommendations above, then ask for the recommendations again, and now we get nothing back. If we restart Solr, or reload the collection, then we can get it to work, but as soon as we dislike another document we get back into a weird state. Through trial and error I narrowed down that if we set the documentCache size to 0, then this problem doesn't happen. Since we can't really figure out why this is happening in Solr, we were hoping there was some way to not use the document cache on the call where we use the mlt query parser. On Thu, May 28, 2015 at 5:44 PM, Erick Erickson erickerick...@gmail.com wrote: First, there isn't that I know of. But why would you want to do this? On the face of it, it makes no sense to ignore the doc cache. One of its purposes is to hold the document (read off disk) for successive search components _in the same query_. Otherwise, each component might have to do a disk seek. So I must be missing why you want to do this. Best, Erick On Thu, May 28, 2015 at 1:23 PM, Bryan Bende bbe...@gmail.com wrote: Is there a way to bypass the document cache on a per-query basis? It looks like there's {!cache=false} for preventing the filter cache from being used for a given query; I'm looking for the same thing for the document cache. Thanks, Bryan
Re: Help for a field in my schema?
Well yes, but the second doesn't do what you say you want, bq: *claim* equal to all claimXXYYY (all languages, all numbers, indexed=true, stored=false) (search not needed but must be displayed) You can search this field, but specifying it in a field list (fl) will return nothing; you need indexed=false and stored=true. But there seems to be a problem here. You say "I don't know how many language codes there are", so I'm assuming you want claimen, claimfr, claimde etc., that you want to search separately. So somewhere on the ingestion side or in a custom update processor (personally I'd do it in a SolrJ program in the ETL pipeline) you need to figure out which of these fields to populate. A dynamic field would work, something like: <dynamicField name="claim*" type="(some text type)" indexed="true" stored="false"/> Now, anything that starts with claim will get its own field. Then a copyField from claim* to display_claim (indexed=false, stored=true) will show the contents. But the problem here is that all your different languages get the same analysis applied, so you can't do, say, language-specific stemming. If all your languages are Western, you might be able to use one of the folding filters to ignore diacritics etc. and get good enough results. There is no need to store these twice, so the searchable forms should have stored=false; just always specify display_claim in your fl list. Best, Erick On Fri, May 29, 2015 at 5:27 AM, Bruno Mannina bmann...@free.fr wrote: Dear Solr-Users, (SOLR 5.0, Ubuntu) I have xml files with tags like this: <claimXXYYY>, where XX is a language code like FR EN DE PT etc.
(I don't know how many language codes I can have) and YYY is a number [1..999], i.e.: <claimen1> <claimen2> <claimen3> <claimfr1> <claimfr2> <claimfr3> I would like to define fields named: *claimen* equal to claimenYYY (EN language, all numbers, indexed=true, stored=true) (search needed and must be displayed) *claim* equal to all claimXXYYY (all languages, all numbers, indexed=true, stored=false) (search not needed but must be displayed) Is it possible to have these 2 fields? Could you help me to declare them in my schema.xml? Thanks a lot for your help! Bruno
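Erick's answer could be sketched in schema.xml roughly as follows. The field type names are placeholders (pick an analysis chain that suits your languages), and the indexed/stored settings follow his correction that the display field must be stored but not indexed:

```xml
<!-- One searchable, analyzed, unstored field per claimXXYYY tag. -->
<dynamicField name="claim*" type="text_general" indexed="true" stored="false"/>

<!-- A single stored, non-indexed field that aggregates everything
     for display; multiValued because many source fields copy into it. -->
<field name="display_claim" type="string" indexed="false" stored="true" multiValued="true"/>
<copyField source="claim*" dest="display_claim"/>
```

Queries would then search claimen, claimfr, etc. (or claim* fields via qf) and always request display_claim in the fl list.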
user interface
Hi, My name is Mustafa. I'm a master's student at YTU in Turkey. I am building a crawler for a VoIP problem for my job and school. I want to configure Solr's user interface. For example, how can I add an image or a comment to the user interface? I searched for this but couldn't find a good answer. Could you help me? Best Regards, Mustafa KIZILDAĞ
Re: CLUSTERSTATUS timeout
I'm also getting this error with 5.1.0 and a 27 shard setup.

null:org.apache.solr.common.SolrException: CLUSTERSTATUS the collection time out:180s
  at org.apache.solr.handler.admin.CollectionsHandler.handleResponse(CollectionsHandler.java:740)
  at org.apache.solr.handler.admin.CollectionsHandler.handleResponse(CollectionsHandler.java:692)
  at org.apache.solr.handler.admin.CollectionsHandler.handleClusterStatus(CollectionsHandler.java:1042)
  at org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:259)
  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
  at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:783)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:282)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:220)
  at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
  at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
  at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
  at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
  at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
  at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
  at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
  at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
  at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
  at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
  at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
  at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
  at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
  at org.eclipse.jetty.server.Server.handle(Server.java:368)
  at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
  at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
  at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
  at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
  at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
  at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
  at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
  at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
  at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
  at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
  at java.lang.Thread.run(Thread.java:745)

Just another data point. -Joe On 12/17/2014 8:44 AM, adfel70 wrote: Hi Jonathan, We are having the exact same problem with Solr 4.8.0. Did you manage to resolve this one? Thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/CLUSTERSTATUS-timeout-tp4173224p4174741.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: When is too many fields in qf is too many?
Before giving up, I might try a copyField per field group and see how that works. Won't that get you down to 10-20 fields per query and be stable wrt view changes? But Solr is column oriented, in that the core query logic is a scatter/gather over the qf list. Perhaps there is a reason qf does not support wildcards. Not sure. But it seems likely. That said, having thousands of columns is not weird at all in some applications. You might be better served with a product oriented to this type of usage. Maybe HBase? -Original Message- From: Steven White [mailto:swhite4...@gmail.com] Sent: Thursday, May 28, 2015 5:59 PM To: solr-user@lucene.apache.org Subject: Re: When is too many fields in qf is too many? Hi Folks, First, thanks for taking the time to read and reply to this subject, it is much appreciated. I have yet to come up with a final solution that optimizes Solr. To give you more context, let me give you the big picture of how the application and the database are structured, on which I'm trying to enable Solr search. Application: Has the concept of views. A view contains one or more object types. An object type may exist in any view. An object type has one or more field groups. A field group has a set of fields. A field group can be used with any object type of any view. Notice how field groups are free standing, that they can be linked to an object type of any view? Here is a diagram of the above:

FieldGroup-#1 == Field-1, Field-2, Field-5, etc.
FieldGroup-#2 == Field-1, Field-5, Field-6, Field-7, Field-8, etc.
FieldGroup-#3 == Field-2, Field-5, Field-8, etc.

View-#1 == ObjType-#2 (using FieldGroup-#1, #3) + ObjType-#4 (using FieldGroup-#1) + ObjType-#5 (using FieldGroup-#1, #2, #3, etc).
View-#2 == ObjType-#1 (using FieldGroup-#3, #15, #16, #19, etc.) + ObjType-#4 (using FieldGroup-#1, #4, #19, etc.) + etc.
View-#3 == ObjType-#1 (using FieldGroup-#1, #8) + etc.

Do you see where this is heading?
To make it even a bit more interesting, ObjType-#4 (which is in View-#1 and #2 per the above) uses FieldGroup-#1 in both views, but in one view it can be configured to have its own fields off FieldGroup-#1. With the above setting, a user is assigned a view and can be moved around views but cannot be in multiple views at the same time. Based on which view that user is in, that user will see different fields of ObjType-#1 (the example I gave for FieldGroup-#1) or even not see an object type that he was able to see in another view. If I have not lost you with the above, you can see that per view, there can be many fields. To make it even yet more interesting, a field in FieldGroup-#1 may have the exact same name as a field in another FieldGroup and the two could be of different types (one is date, the other is string type). Thus when I build my Solr doc object (and create the list of Solr fields) those fields must be prefixed with the FieldGroup name, otherwise I could end up overwriting the type of another field. Are you still with me? :-) Now you see how a view can end up with many fields (over 3500 in my case), but a doc I post to Solr for indexing will have on average 50 fields, worst case maybe 200 fields. This is fine, and it is not my issue, but I want to call it out to get it out of our way. Another thing I need to mention is this (in case it is not clear from the above). Users create and edit records in the DB via an instance of ObjType-#N. Those object types that are created do NOT belong to a view, in fact they do NOT have any view concept in them. They simply have the concept of what fields the user can see / edit based on which view that user is in. In effect, in the DB, we have instances of object type data. One last thing I should point out is that views and field groups are dynamic. This month, View-#3 may have ObjType-#1, but next month it may not, or a new object type may be added to it. Still with me? If so, you are my hero!!
:-) So, I set up my Solr schema.xml to include all fields off each field group that exists in the database, like so:

<field name="FieldGroup-1.Headline" type="text" multiValued="true" indexed="true" stored="false" required="false"/>
<field name="FieldGroup-1.Summary" type="text" multiValued="true" indexed="true" stored="false" required="false"/>
<field name="FieldGroup-1. ... " ... />
<field name="FieldGroup-2.Headline" type="text" multiValued="true" indexed="true" stored="false" required="false"/>
<field name="FieldGroup-2.Summary" type="text" multiValued="true" indexed="true" stored="false" required="false"/>
<field name="FieldGroup-2.Date" type="text" multiValued="true" indexed="true" stored="false" required="false"/>
<field name="FieldGroup-2. ... " ... />
<field name="FieldGroup-3. ... " ... />
<field name="FieldGroup-4. ... " ... />

You get the idea. Each record of an object type I index contains ALL the fields of that object type REGARDLESS which view that object type is set to
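[Editor's sketch, not from the thread.] The copy-fields-per-group idea suggested at the top of this thread could look roughly like this in schema.xml (hypothetical field names; one catch-all field per group, so qf shrinks to one entry per field group in the view):

```xml
<!-- One catch-all field per field group (hypothetical names) -->
<field name="FieldGroup-1.all" type="text" multiValued="true" indexed="true" stored="false"/>
<field name="FieldGroup-2.all" type="text" multiValued="true" indexed="true" stored="false"/>

<!-- Copy every field of a group into its catch-all field -->
<copyField source="FieldGroup-1.*" dest="FieldGroup-1.all"/>
<copyField source="FieldGroup-2.*" dest="FieldGroup-2.all"/>
```

The tradeoff is index size (each value is indexed twice), and the caveat about same-named fields with different types across groups still applies.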
RE: optimal shard assignment with low shard key cardinality using compositeId to enable shard splitting
Thanks, Erick. I appreciate the sanity check. -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Thursday, May 28, 2015 5:50 PM To: solr-user@lucene.apache.org Subject: Re: optimal shard assignment with low shard key cardinality using compositeId to enable shard splitting Charles: You raise good points, and I didn't mean to say that co-locating docs due to some criteria was never a good idea. That said, it does add administrative complexity that I'd prefer to avoid unless necessary. I suppose it largely depends on what the load and response SLAs are. If there's 1 query/second peak load, the sharding overhead for queries is probably not noticeable. If there are 1,000 QPS, then it might be worth it. Measure, measure, measure... I think your composite ID understanding is fine. Best, Erick On Thu, May 28, 2015 at 1:40 PM, Reitzel, Charles charles.reit...@tiaa-cref.org wrote: We have used a similar sharding strategy for exactly the reasons you say. But we are fairly certain that the # of documents per user ID is < 5000 and, typically, < 500. Thus, we think the overhead of distributed searches clearly outweighs the benefits. Would you agree? We have done some load testing (with 100's of simultaneous users) and performance has been good with data and queries distributed evenly across shards. In Matteo's case, this model appears to apply well to user types B and C. Not sure about user type A, though. At 100,000 docs per user per year, on average, that load seems ok for one node. But, is it enough to benefit significantly from a parallel search? With a 2-part composite ID, each part will contribute 16 bits to a 32-bit hash value, which is then compared to the set of hash ranges for each active shard. Since the user ID will contribute the high-order bytes, it will dominate in matching the target shard(s). But dominance doesn't mean the lower order 16 bits will always be ignored, does it? I.e.
if the original shard has been split, perhaps multiple times, isn't it possible that one user ID's documents will be spread over multiple shards? In Matteo's case, it might make sense to specify fewer bits to the user ID for user category A. I.e. what I described above is the default for userId!docId. But if you use userId/8!docId/24 (8 bits for userId and 24 bits for the document ID), then couldn't one user's docs be split over multiple shards, even without shard splitting? I'm just making sure I understand how composite ID sharding works correctly. Have I got it right? Has any of this logic changed in 5.x? -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Thursday, May 21, 2015 11:30 AM To: solr-user@lucene.apache.org Subject: Re: optimal shard assignment with low shard key cardinality using compositeId to enable shard splitting I question your base assumption: bq: So shard by document producer seems a good choice Because what this _also_ does is force all of the work for a query onto one node and all indexing for a particular producer ditto. And it will cause you to manually monitor your shards to see if some of them grow out of proportion to others. I think it would be much less hassle to just let Solr distribute the docs as it may based on the uniqueKey and forget about it. Unless you want, say, to do joins etc. There will, of course, be some overhead that you pay here, but unless you can measure it and it's a pain I wouldn't add the complexity you're talking about, especially at the volumes you're talking.
Best, Erick On Thu, May 21, 2015 at 3:20 AM, Matteo Grolla matteo.gro...@gmail.com wrote: Hi, I'd like some feedback on how I'd like to solve the following sharding problem. I have a collection that will eventually become big. Average document size is 1.5kb. Every year 30 million documents will be indexed. Data come from different document producers (a person, owner of his documents) and queries are almost always performed by a document producer, who can only query his own documents. So shard by document producer seems a good choice. There are 3 types of doc producer: type A, cardinality 105 (there are 105 producers of this type), produce 17M docs/year (the aggregated production of all type A producers); type B, cardinality ~10k, produce 4M docs/year; type C, cardinality ~10M, produce 9M docs/year. I'm thinking about using compositeId ( solrDocId = producerId!docId ) to send all docs of the same producer to the same shards. When a shard becomes too large I can use shard splitting. Problems: documents from type A producers could be oddly distributed among shards, because hashing doesn't work well on small numbers (105), see Appendix. As a solution I could do this when a new type A producer (producerA1) arrives: 1) client app: generate a producer code 2) client app: simulate murmurhashing
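To make the routing discussed above concrete, here is a toy sketch of how a two-part compositeId maps to a shard: the high 16 bits of the 32-bit route hash come from the producer ID and the low 16 bits from the document ID. This is illustrative only; Solr actually uses MurmurHash3, and `hash16` below is a stand-in whose only job is to show the bit composition.

```python
# Toy sketch of Solr compositeId-style routing (producerId!docId).
# NOT Solr's real hashing: Solr uses MurmurHash3; hash16() is a stand-in.

def hash16(s):
    """Stand-in 16-bit string hash (not Solr's MurmurHash3)."""
    h = 0
    for ch in s:
        h = (h * 31 + ord(ch)) & 0xFFFF
    return h

def composite_hash(producer_id, doc_id):
    """High 16 bits come from the route key, low 16 bits from the doc id."""
    return (hash16(producer_id) << 16) | hash16(doc_id)

def shard_for(h, ranges):
    """ranges: list of (lo, hi, shard_name) tuples covering 0..0xFFFFFFFF."""
    for lo, hi, name in ranges:
        if lo <= h <= hi:
            return name
    return None

# All docs of one producer share the hash's high 16 bits, so they stay on
# one shard until a split cuts inside that producer's 16-bit sub-range.
ranges = [(0x00000000, 0x7FFFFFFF, "shard1"),
          (0x80000000, 0xFFFFFFFF, "shard2")]
```

This also shows why a split can scatter one producer: if a split boundary lands inside the 0x10000-wide sub-range owned by a producer's high bits, that producer's docs end up on both child shards.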
Re: Ignoring the Document Cache per query
This is totally weird. The document cache should really have nothing to do with whether MLT returns documents or not AFAIK. So either I'm totally misunderstanding MLT, you're leaving out a step, or there's some bug in Solr. The fact that setting the document cache to 0 changes the behavior, or restarting Solr and submitting the exact same request gives different behavior, is strong evidence it's a problem with Solr. Could I ask you to open a JIRA and add all the relevant details you can? Especially if you could get it to work (well actually fail) with the techproducts data. But barring that, the (perhaps sanitized) queries you send to get different results before and after. Best, Erick On Fri, May 29, 2015 at 7:10 AM, Bryan Bende bbe...@gmail.com wrote: Thanks, Erick. I realize this really makes no sense, but I was looking to work around a problem. Here is the scenario... Using Solr 5.1 we have a service that utilizes the new mlt query parser to get recommendations. So we start up the application, ask for recommendations for a document, and everything works. Another feature is to dislike a document, and once it is disliked it shouldn't show up as a recommended document. It does this by looking up the disliked documents for a user and adding a filter query to the recommendation call which excludes the disliked documents. So now we dislike a document that was in the original list of recommendations above, then ask for the recommendations again, and now we get nothing back. If we restart Solr, or reload the collection, then we can get it to work, but as soon as we dislike another document we get back into a weird state. Through trial and error I narrowed down that if we set the documentCache size to 0, then this problem doesn't happen. Since we can't really figure out why this is happening in Solr, we were hoping there was some way to not use the document cache on the call where we use the mlt query parser.
On Thu, May 28, 2015 at 5:44 PM, Erick Erickson erickerick...@gmail.com wrote: First, there isn't one that I know of. But why would you want to do this? On the face of it, it makes no sense to ignore the doc cache. One of its purposes is to hold the document (read off disk) for successive search components _in the same query_. Otherwise, each component might have to do a disk seek. So I must be missing why you want to do this. Best, Erick On Thu, May 28, 2015 at 1:23 PM, Bryan Bende bbe...@gmail.com wrote: Is there a way to bypass the document cache on a per-query basis? It looks like there's {!cache=false} for preventing the filter cache from being used for a given query; I'm looking for the same thing for the document cache. Thanks, Bryan
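[Editor's note, for reference.] The local param mentioned above is applied per filter query like this (hypothetical filter values); note it affects only the filterCache, and there is no analogous documented switch for the documentCache:

```
fq={!cache=false}-id:(disliked1 disliked2)
```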
Re: Deleting Fields
On 5/29/2015 5:08 PM, Joseph Obernberger wrote: Hi All - I have a lot of fields to delete, but noticed that once I started deleting them, I quickly ran out of heap space. Is delete-field a memory intensive operation? Should I delete one field, wait a while, then delete the next? I'm not aware of a way to delete a field. I may have a different definition of what a field is than you do, though. Solr lets you delete entire documents, but deleting a field from the entire index would involve re-indexing every document in the index, excluding that field. Can you be more specific about exactly what you are doing, what you are seeing, and what you want to see instead? Also, please be aware of this: http://people.apache.org/~hossman/#threadhijack Thanks, Shawn
Re: How to setup solr in cluster
You really have to tell us more about what you mean. You have two problems to solve: (1) putting Solr on all the nodes and starting/stopping it. Puppet or Chef help here, although it's perfectly possible to do this manually. (2) creating collections etc. For this you just need all your Solr instances communicating with the Zookeeper you have set up. So tell us what you have tried and what you are having problems with, and perhaps we can offer more specific suggestions. Best, Erick On Fri, May 29, 2015 at 4:40 PM, Purohit, Sumit sumit.puro...@pnnl.gov wrote: Hi All, I am trying to set up solr on a cluster with 16 nodes. The only documentation I could find talks about a local cluster which behaves like a real cluster. https://cwiki.apache.org/confluence/display/solr/Getting+Started+with+SolrCloud I read about using tools like Chef or Puppet to configure solr on a production-level cluster. Does this group have any suggestions about what is the best way to set it up? Thanks Sumit Purohit
RE: How to setup solr in cluster
Thanks for the reply. I have tried the example cloud setup using the link I mentioned. I am trying to set up solr on all 16 nodes + 1 external zookeeper on one of the nodes. That's when I found out about Chef and Puppet. My problem is that manually setting up and starting/stopping solr does not seem that efficient to me, and I wanted to seek the community's suggestion. Thanks Sumit -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Friday, May 29, 2015 4:43 PM To: solr-user@lucene.apache.org Subject: Re: How to setup solr in cluster You really have to tell us more about what you mean. You have two problems to solve: (1) putting Solr on all the nodes and starting/stopping it. Puppet or Chef help here, although it's perfectly possible to do this manually. (2) creating collections etc. For this you just need all your Solr instances communicating with the Zookeeper you have set up. So tell us what you have tried and what you are having problems with, and perhaps we can offer more specific suggestions. Best, Erick On Fri, May 29, 2015 at 4:40 PM, Purohit, Sumit sumit.puro...@pnnl.gov wrote: Hi All, I am trying to set up solr on a cluster with 16 nodes. The only documentation I could find talks about a local cluster which behaves like a real cluster. https://cwiki.apache.org/confluence/display/solr/Getting+Started+with+SolrCloud I read about using tools like Chef or Puppet to configure solr on a production-level cluster. Does this group have any suggestions about what is the best way to set it up? Thanks Sumit Purohit
Re: docValues: Can we apply synonym
Hi Upayavira, How will copyField help in my scenario when I have to add the synonym to a docValues-enabled field? With Regards Aman Tandon On Sat, May 30, 2015 at 1:18 AM, Upayavira u...@odoko.co.uk wrote: Use copyField to clone the field for faceting purposes. Upayavira On Fri, May 29, 2015, at 08:06 PM, Aman Tandon wrote: Hi Erick, Thanks for the suggestion. We are using this query parser plugin (*SynonymExpandingExtendedDismaxQParserPlugin*) to manage multi-word synonyms. So it does work slower than edismax; is that why it is not in contrib? (I am asking this question because we are using it for all our searches to handle 10 multiword synonyms: ice cube, icecube, etc.) *Moreover, I thought of a solution for this docValues problem.* I need to make the city field *multivalued*, and by this I mean I will add the synonym (*mumbai, bombay*) as an extra value to that field if present. Searching will then work fine as before. *<field name="city">mumbai</field><field name="city">bombay</field>* The only problem is that we have to remove the 'city alias/synonym facets' when we are providing results to the clients. *mumbai, 1000* With Regards Aman Tandon On Fri, May 29, 2015 at 7:26 PM, Erick Erickson erickerick...@gmail.com wrote: Do take time for performance testing with that parser. It can be slow depending on your data, as I remember. That said, it solves the problem it set out to solve, so if it meets your SLAs, it can be a life-saver. Best, Erick On Fri, May 29, 2015 at 2:35 AM, Alessandro Benedetti benedetti.ale...@gmail.com wrote: Even if a little bit outdated, that query parser is really really cool for managing synonyms! +1! 2015-05-29 1:01 GMT+01:00 Aman Tandon amantandon...@gmail.com: Thanks, Chris. Yes, we are using it for handling the multiword synonym problem. With Regards Aman Tandon On Fri, May 29, 2015 at 12:38 AM, Reitzel, Charles charles.reit...@tiaa-cref.org wrote: Again, I would recommend using Nolan Lawson's SynonymExpandingExtendedDismaxQParserPlugin.
http://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/ -Original Message- From: Aman Tandon [mailto:amantandon...@gmail.com] Sent: Wednesday, May 27, 2015 6:42 PM To: solr-user@lucene.apache.org Subject: Re: docValues: Can we apply synonym OK, and which synonym processor are you talking about? Maybe it could help. With Regards Aman Tandon On Thu, May 28, 2015 at 4:01 AM, Reitzel, Charles charles.reit...@tiaa-cref.org wrote: Sorry, my bad. The synonym processor I mention works differently. It's an extension of the EDisMax query processor and doesn't require field-level synonym configs. -Original Message- From: Reitzel, Charles [mailto:charles.reit...@tiaa-cref.org] Sent: Wednesday, May 27, 2015 6:12 PM To: solr-user@lucene.apache.org Subject: RE: docValues: Can we apply synonym But the query analysis isn't on a specific field, it is applied to the query string. -Original Message- From: Aman Tandon [mailto:amantandon...@gmail.com] Sent: Wednesday, May 27, 2015 6:08 PM To: solr-user@lucene.apache.org Subject: Re: docValues: Can we apply synonym Hi Charles, The problem here is that docValues works only with primitive data types like String, int, etc. So how could we apply a synonym to a primitive data type? With Regards Aman Tandon On Thu, May 28, 2015 at 3:19 AM, Reitzel, Charles charles.reit...@tiaa-cref.org wrote: Is there any reason you cannot apply the synonyms at query time? Applying synonyms at indexing time has problems, e.g. polluting the term frequency for the synonyms added, preventing distance queries, ... Since city names often have multiple terms, e.g. New York, Den Haag, etc., I would recommend using Nolan Lawson's SynonymExpandingExtendedDismaxQParserPlugin. Tastes great, less filling. http://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/ We found this to fix synonyms like ny for New York and vice versa. Haven't tried it with docValues, tho.
-Original Message- From: Aman Tandon [mailto:amantandon...@gmail.com] Sent: Tuesday, May 26, 2015 11:15 PM To: solr-user@lucene.apache.org Subject: Re: docValues: Can we apply synonym Yes it could be :) Anyway thanks for helping. With Regards Aman Tandon On Tue, May 26, 2015 at 10:22 PM, Alessandro Benedetti benedetti.ale...@gmail.com wrote: I should investigate that, as
RE: How to setup solr in cluster
Sorry for this second email, but another problem of mine is: when I copy the solr folder to each node and start them, should I run it as a 1-node cluster on each node and use the same name for the collection, OR do I have to create an individual shard on each node? Thanks for your help. Thanks sumit -Original Message- From: Purohit, Sumit Sent: Friday, May 29, 2015 5:10 PM To: solr-user@lucene.apache.org Subject: RE: How to setup solr in cluster Thanks for the reply. I have tried the example cloud setup using the link I mentioned. I am trying to set up solr on all 16 nodes + 1 external zookeeper on one of the nodes. That's when I found out about Chef and Puppet. My problem is that manually setting up and starting/stopping solr does not seem that efficient to me, and I wanted to seek the community's suggestion. Thanks Sumit -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Friday, May 29, 2015 4:43 PM To: solr-user@lucene.apache.org Subject: Re: How to setup solr in cluster You really have to tell us more about what you mean. You have two problems to solve: (1) putting Solr on all the nodes and starting/stopping it. Puppet or Chef help here, although it's perfectly possible to do this manually. (2) creating collections etc. For this you just need all your Solr instances communicating with the Zookeeper you have set up. So tell us what you have tried and what you are having problems with, and perhaps we can offer more specific suggestions. Best, Erick On Fri, May 29, 2015 at 4:40 PM, Purohit, Sumit sumit.puro...@pnnl.gov wrote: Hi All, I am trying to set up solr on a cluster with 16 nodes. The only documentation I could find talks about a local cluster which behaves like a real cluster. https://cwiki.apache.org/confluence/display/solr/Getting+Started+with+SolrCloud I read about using tools like Chef or Puppet to configure solr on a production-level cluster. Does this group have any suggestions about what is the best way to set it up? Thanks Sumit Purohit
How to setup solr in cluster
Hi All, I am trying to set up solr on a cluster with 16 nodes. The only documentation I could find talks about a local cluster which behaves like a real cluster. https://cwiki.apache.org/confluence/display/solr/Getting+Started+with+SolrCloud I read about using tools like Chef or Puppet to configure solr on a production-level cluster. Does this group have any suggestions about what is the best way to set it up? Thanks Sumit Purohit
Deleting Fields
Hi All - I have a lot of fields to delete, but noticed that once I started deleting them, I quickly ran out of heap space. Is delete-field a memory intensive operation? Should I delete one field, wait a while, then delete the next? Thank you! -Joe
Re: Deleting Fields
Thank you Shawn - I'm referring to fields in the schema. With Solr 5, you can delete fields from the schema. https://cwiki.apache.org/confluence/display/solr/Schema+API#SchemaAPI-DeleteaField -Joe On 5/29/2015 7:30 PM, Shawn Heisey wrote: On 5/29/2015 5:08 PM, Joseph Obernberger wrote: Hi All - I have a lot of fields to delete, but noticed that once I started deleting them, I quickly ran out of heap space. Is delete-field a memory intensive operation? Should I delete one field, wait a while, then delete the next? I'm not aware of a way to delete a field. I may have a different definition of what a field is than you do, though. Solr lets you delete entire documents, but deleting a field from the entire index would involve re-indexing every document in the index, excluding that field. Can you be more specific about exactly what you are doing, what you are seeing, and what you want to see instead? Also, please be aware of this: http://people.apache.org/~hossman/#threadhijack Thanks, Shawn
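[Editor's note.] Per the Schema API page linked above, a delete-field call is a POST of a small JSON command to the collection's schema endpoint (hypothetical collection and field names):

```
curl -X POST -H 'Content-type:application/json' \
  http://localhost:8983/solr/mycollection/schema \
  -d '{ "delete-field": { "name": "FieldGroup-1.Headline" } }'
```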
Re: How to setup solr in cluster
None of the above. You simply start Solr on each node, then use the Collections API to create your collection. Solr will take care of creating the individual replicas on each of the nodes according to the parameters you pass to the CREATE command. Best, Erick On Fri, May 29, 2015 at 5:46 PM, Purohit, Sumit sumit.puro...@pnnl.gov wrote: Sorry for this second email, but another problem of mine is: when I copy the solr folder to each node and start them, should I run it as a 1-node cluster on each node and use the same name for the collection, OR do I have to create an individual shard on each node? Thanks for your help. Thanks sumit -Original Message- From: Purohit, Sumit Sent: Friday, May 29, 2015 5:10 PM To: solr-user@lucene.apache.org Subject: RE: How to setup solr in cluster Thanks for the reply. I have tried the example cloud setup using the link I mentioned. I am trying to set up solr on all 16 nodes + 1 external zookeeper on one of the nodes. That's when I found out about Chef and Puppet. My problem is that manually setting up and starting/stopping solr does not seem that efficient to me, and I wanted to seek the community's suggestion. Thanks Sumit -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Friday, May 29, 2015 4:43 PM To: solr-user@lucene.apache.org Subject: Re: How to setup solr in cluster You really have to tell us more about what you mean. You have two problems to solve: (1) putting Solr on all the nodes and starting/stopping it. Puppet or Chef help here, although it's perfectly possible to do this manually. (2) creating collections etc. For this you just need all your Solr instances communicating with the Zookeeper you have set up. So tell us what you have tried and what you are having problems with, and perhaps we can offer more specific suggestions. Best, Erick On Fri, May 29, 2015 at 4:40 PM, Purohit, Sumit sumit.puro...@pnnl.gov wrote: Hi All, I am trying to set up solr on a cluster with 16 nodes.
The only documentation I could find talks about a local cluster which behaves like a real cluster. https://cwiki.apache.org/confluence/display/solr/Getting+Started+with+SolrCloud I read about using tools like Chef or Puppet to configure solr on a production-level cluster. Does this group have any suggestions about what is the best way to set it up? Thanks Sumit Purohit
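[Editor's note.] The Collections API CREATE call Erick mentions looks like this (hypothetical host, collection, config name, and counts; numShards and replicationFactor control how Solr spreads replicas across the 16 nodes):

```
http://anynode:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=8&replicationFactor=2&collection.configName=myconfig
```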
Re: Deleting Fields
Yes, but deleting fields from the schema only means that _future_ documents will throw an undefined field error. All the documents currently in the index will retain that field. Why you're hitting an OOM is a mystery though. But delete-field isn't removing the contents of indexed documents. Showing us the full stack trace when you hit an OOM would be helpful. Best, Erick On Fri, May 29, 2015 at 4:58 PM, Joseph Obernberger j...@lovehorsepower.com wrote: Thank you Shawn - I'm referring to fields in the schema. With Solr 5, you can delete fields from the schema. https://cwiki.apache.org/confluence/display/solr/Schema+API#SchemaAPI-DeleteaField -Joe On 5/29/2015 7:30 PM, Shawn Heisey wrote: On 5/29/2015 5:08 PM, Joseph Obernberger wrote: Hi All - I have a lot of fields to delete, but noticed that once I started deleting them, I quickly ran out of heap space. Is delete-field a memory intensive operation? Should I delete one field, wait a while, then delete the next? I'm not aware of a way to delete a field. I may have a different definition of what a field is than you do, though. Solr lets you delete entire documents, but deleting a field from the entire index would involve re-indexing every document in the index, excluding that field. Can you be more specific about exactly what you are doing, what you are seeing, and what you want to see instead? Also, please be aware of this: http://people.apache.org/~hossman/#threadhijack Thanks, Shawn
Re: docValues: Can we apply synonym
Hi Erick, Thanks for the suggestion. We are using this query parser plugin (*SynonymExpandingExtendedDismaxQParserPlugin*) to manage multi-word synonyms. So it does work slower than edismax; is that why it is not in contrib? (I am asking this question because we are using it for all our searches to handle 10 multiword synonyms: ice cube, icecube, etc.) *Moreover, I thought of a solution for this docValues problem.* I need to make the city field *multivalued*, and by this I mean I will add the synonym (*mumbai, bombay*) as an extra value to that field if present. Searching will then work fine as before. *<field name="city">mumbai</field><field name="city">bombay</field>* The only problem is that we have to remove the 'city alias/synonym facets' when we are providing results to the clients. *mumbai, 1000* With Regards Aman Tandon On Fri, May 29, 2015 at 7:26 PM, Erick Erickson erickerick...@gmail.com wrote: Do take time for performance testing with that parser. It can be slow depending on your data, as I remember. That said, it solves the problem it set out to solve, so if it meets your SLAs, it can be a life-saver. Best, Erick On Fri, May 29, 2015 at 2:35 AM, Alessandro Benedetti benedetti.ale...@gmail.com wrote: Even if a little bit outdated, that query parser is really really cool for managing synonyms! +1! 2015-05-29 1:01 GMT+01:00 Aman Tandon amantandon...@gmail.com: Thanks, Chris. Yes, we are using it for handling the multiword synonym problem. With Regards Aman Tandon On Fri, May 29, 2015 at 12:38 AM, Reitzel, Charles charles.reit...@tiaa-cref.org wrote: Again, I would recommend using Nolan Lawson's SynonymExpandingExtendedDismaxQParserPlugin. http://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/ -Original Message- From: Aman Tandon [mailto:amantandon...@gmail.com] Sent: Wednesday, May 27, 2015 6:42 PM To: solr-user@lucene.apache.org Subject: Re: docValues: Can we apply synonym OK, and which synonym processor are you talking about? Maybe it could help.
With Regards Aman Tandon On Thu, May 28, 2015 at 4:01 AM, Reitzel, Charles charles.reit...@tiaa-cref.org wrote: Sorry, my bad. The synonym processor I mention works differently. It's an extension of the EDisMax query processor and doesn't require field-level synonym configs. -Original Message- From: Reitzel, Charles [mailto:charles.reit...@tiaa-cref.org] Sent: Wednesday, May 27, 2015 6:12 PM To: solr-user@lucene.apache.org Subject: RE: docValues: Can we apply synonym But the query analysis isn't on a specific field, it is applied to the query string. -Original Message- From: Aman Tandon [mailto:amantandon...@gmail.com] Sent: Wednesday, May 27, 2015 6:08 PM To: solr-user@lucene.apache.org Subject: Re: docValues: Can we apply synonym Hi Charles, The problem here is that docValues works only with primitive data types like String, int, etc. So how could we apply a synonym to a primitive data type? With Regards Aman Tandon On Thu, May 28, 2015 at 3:19 AM, Reitzel, Charles charles.reit...@tiaa-cref.org wrote: Is there any reason you cannot apply the synonyms at query time? Applying synonyms at indexing time has problems, e.g. polluting the term frequency for the synonyms added, preventing distance queries, ... Since city names often have multiple terms, e.g. New York, Den Haag, etc., I would recommend using Nolan Lawson's SynonymExpandingExtendedDismaxQParserPlugin. Tastes great, less filling. http://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/ We found this to fix synonyms like ny for New York and vice versa. Haven't tried it with docValues, tho. -Original Message- From: Aman Tandon [mailto:amantandon...@gmail.com] Sent: Tuesday, May 26, 2015 11:15 PM To: solr-user@lucene.apache.org Subject: Re: docValues: Can we apply synonym Yes it could be :) Anyway thanks for helping. With Regards Aman Tandon On Tue, May 26, 2015 at 10:22 PM, Alessandro Benedetti benedetti.ale...@gmail.com wrote: I should investigate that, as usually synonyms are an analysis stage.
A simple way is to replace the word with all its synonyms (including the original word), but simply using this kind of processor will change the token positions and offsets, modifying the actual content of the document. "I am from Bombay" will become "I am from Bombay Mumbai", which can be annoying. So a cleverer approach must be investigated. 2015-05-26 17:36 GMT+01:00 Aman Tandon amantandon...@gmail.com : Okay. So how could I do it with UpdateProcessors? With Regards Aman Tandon On Tue,
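Charles's query-time suggestion in this thread could be sketched in schema.xml roughly as below. This is an untested sketch, not anything from the thread: the field type name and the synonyms.txt contents are assumptions. Note that the stock SynonymFilterFactory at query time is exactly what struggles with multi-word synonyms, which is why the EDisMax plugin being discussed exists.

```xml
<!-- Sketch only: synonyms applied at query time, so the index keeps its
     original term frequencies and token positions intact. -->
<fieldType name="text_syn" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- synonyms.txt might contain a line like: mumbai,bombay -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
  </analyzer>
</fieldType>
```

With expand="true", a query for "bombay" is expanded to match documents containing either term, without any index-time content changes.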
Re: user interface
Which user interface? Do you mean the admin UI? Or perhaps /browse? — Erik Hatcher, Senior Solutions Architect http://www.lucidworks.com On May 29, 2015, at 1:34 PM, Mustafa KIZILDAĞ mustafakizilda...@gmail.com wrote: Hi, My name is Mustafa. I'm a master's student at YTU in Turkey. I am building a crawler for a VoIP problem for my job and school. I want to configure Solr's user interface, for example to add an image or a comment to it. I searched about it but couldn't find a good result. Could you help me? Best Regards, Mustafa KIZILDAĞ
Re: How To: Debugging the whole indexing process
Thanks Alex, yes, it is for my testing, to understand the code/process flow. Any other ideas? With Regards Aman Tandon On Fri, May 29, 2015 at 12:48 PM, Alexandre Rafalovitch arafa...@gmail.com wrote: In production or in test? I assume in test. This level of detail usually implies some sort of Java debugger and Java instrumentation enabled, e.g. Chronon, which is commercial but can be tried as a plugin with the IntelliJ IDEA full version trial. Regards, Alex On 29 May 2015 4:38 pm, Aman Tandon amantandon...@gmail.com wrote: Hi, I want to debug the whole indexing process, the life cycle of the indexing process (each and every function call, going from function to function), from the posting of data.xml to the creation of the various index files (_fnm, _fdt, etc.). So how/what should I set up and start? Please help. I will be thankful to you. <add><doc><field name="title"><![CDATA[Aman Tandon]]></field><field name="job_role"><![CDATA[Search Engineer]]></field></doc></add> With Regards Aman Tandon
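Following Alex's debugger suggestion, one common setup (a sketch; whether bin/solr accepts extra JVM arguments via -a depends on your Solr version, so check bin/solr -help) is to start Solr with the JDWP agent and attach an IDE, then set breakpoints in the update chain, e.g. in DirectUpdateHandler2.addDoc, to step from the posted XML down to the Lucene segment writes:

```shell
# Sketch: build the JDWP agent string; suspend=y makes the JVM wait for the
# debugger to attach, so no document is indexed before breakpoints are set.
DEBUG_OPTS="-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=18983"
# Assumed invocation (verify the -a flag exists in your bin/solr):
echo bin/solr start -f -a "$DEBUG_OPTS"
```

The IDE then attaches as a "Remote" debug configuration on port 18983 of the Solr host.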
Re: docValues: Can we apply synonym
Use copyField to clone the field for faceting purposes. Upayavira On Fri, May 29, 2015, at 08:06 PM, Aman Tandon wrote: Hi Erick, Thanks for the suggestion. We are using this query parser plugin ( *SynonymExpandingExtendedDismaxQParserPlugin*) to manage multi-word synonyms. ...
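Upayavira's copyField suggestion could look roughly like the schema fragment below. This is a sketch; the field and type names are assumptions, not from the thread. The idea is to search against the analyzed city field while faceting on an unanalyzed docValues copy, so synonym handling never pollutes the facet values.

```xml
<!-- Sketch: facet on a docValues string copy; search the analyzed original. -->
<field name="city" type="text_general" indexed="true" stored="true"
       multiValued="true"/>
<field name="city_facet" type="string" indexed="false" stored="false"
       docValues="true" multiValued="true"/>
<copyField source="city" dest="city_facet"/>
```

Queries would then use q against city and facet.field=city_facet.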
Re: [solr 5.1] Looking for full text + collation search field
On 5/21/15, 5:19 AM, Björn Keil wrote: Thanks for the advice. I have tried the field type and it seems to do what it is supposed to, in combination with a lower-case filter. However, that raises another slight problem: German umlauts are supposed to be treated slightly differently for the purpose of searching than for sorting. For sorting, a normal ICUCollationField with standard rules should suffice*; for the purpose of searching, I cannot just replace an ü with a u: ü is supposed to equal ue, or, in terms of RuleBasedCollators, there is a secondary difference. I haven't used this personally, but GermanNormalizationFilter seems to do the job: https://lucene.apache.org/core/5_1_0/analyzers-common/org/apache/lucene/analysis/de/GermanNormalizationFilter.html
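The GermanNormalizationFilter pointer above could be wired into a search-only field type roughly as below (an untested sketch; the type name is an assumption). Per its javadoc, the filter folds ü to u and also folds ue toward u in most contexts, so Müller, Mueller and Muller should all reduce to the same term, while sorting stays on a separate ICUCollationField as described.

```xml
<!-- Sketch: normalization for searching only; keep an ICUCollationField
     (not shown here) for sorting. -->
<fieldType name="text_de_search" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.GermanNormalizationFilterFactory"/>
  </analyzer>
</fieldType>
```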