Re: improving search response time
I am using the spellchecker in the query part. Now my search time has increased: initially it was 1000ms, now it is 3000ms. I have a data index of size 9GB. My query: http://localhost:8983/solr/spellCheckCompRH/?q=+search+&spellcheck=true&fl=spellcheck,title,url,hl&hl=true&start=0&rows=10&indent=on How can I improve the search time? I have: 1) Fedora 11 as OS 2) Solr running on a Jetty server 3) Front page (search page) on Tomcat 6 4) Index size is 9GB 5) RAM is 1GB - Kumar Anurag -- View this message in context: http://lucene.472066.n3.nabble.com/improving-search-response-time-tp1204491p2125220.html Sent from the Solr - User mailing list archive at Nabble.com.
Explanation of the different caches.
Hi, I want to do a quick-and-dirty load test - but all my results are cached. I commented out all the Solr caches - but still everything is cached. * Can the caching come from the 'Field Collapsing Cache'? -- although I don't see this element in my config file. ( As the system now jumps from 1GB to 7GB of RAM when I do a load test with lots of queries. ) * Can it be a Lucene cache? I want to lower the caches so they cache only some 100 or 1000 documents. ( Right now - when I do 50,000 unique queries Solr will use 7GB of RAM and everything fits in some cache! ) Any suggestions on how I could properly stress test my Solr - with a small number of queries (some 100 - not in the millions as some testers have)?
Re: Dismax score - maximum of any one field?
Also take a look at the debugQuery=on output. It takes a while to decipher what this is telling you, but it'll let you know exactly. Best Erick On Mon, Dec 20, 2010 at 5:37 AM, Jason Brown jason.br...@sjp.co.uk wrote: Can anyone tell me how the dismax score is computed? Is it the maximum score for any of the component fields that are searched? Thank You. If you wish to view the St. James's Place email disclaimer, please use the link below http://www.sjp.co.uk/portal/internet/SJPemaildisclaimer
Consequences for using multivalued on all fields
In our application we use dynamic fields; there can be about 50 of them and up to 100 million documents. Are there any disadvantages to having multiValued=true on all fields in the schema? An admin of the application can specify dynamic fields and whether they should be indexed or stored. The question is whether we gain anything by letting them choose multiValued as well, or if it just adds complexity to the user interface? Thanks, Tim
Re: Consequences for using multivalued on all fields
I have about 30 million documents and, with the exception of the unique ID, type and a couple of date fields, every document is made up of dynamic fields. Now, I only have maybe 1 in 5 being multi-valued, but search and facet performance doesn't look appreciably different from a fixed-schema solution. I don't do some of the fancier things: highlighting, spell check, etc. And I use a lot more string or lowercase field types than I do Text (so not as many fully tokenized fields); that probably helps with performance. The only disadvantage I know of is dealing with field names at runtime. Depending on your architecture, you don't really know what your document looks like until you have it in a result set. For what I'm doing, that isn't a problem.
Re: Consequences for using multivalued on all fields
Someone please correct me if I am wrong, but as far as I am aware the index format is identical in either case. One benefit of allowing one to specify a field as single-valued is similar to specifying that a field is required: providing a safeguard that index data conforms to requirements. So making all fields multivalued forgoes that integrity check for fields which by definition should be singular. Also, depending on the response writer and, for the XMLResponseWriter, the requested response version (see http://wiki.apache.org/solr/XMLResponseFormat), the multi-valued setting can determine whether the document values returned from a query will be scalars (e.g. <str name="year">2010</str>) or arrays of scalars (<arr name="year"><str>2010</str></arr>), regardless of how many values are actually stored. But the most significant gotcha of not specifying the actual arity (1 or N) arises if any of those fields is used for field-faceting: by default the field-faceting logic chooses a different algorithm depending on whether the field is multi-valued, and the default choice for multi-valued is only appropriate for a small set of enumerated values, since it creates a filter query for each value in the set. And this can have a profound effect on Solr memory utilization. So if you are not relying on the field arity setting to select the algorithm, you or your users might need to specify it explicitly with the f.<field>.facet.method argument; see http://wiki.apache.org/solr/SolrFacetingOverview for more info. So while all-multivalued isn't a showstopper, if it were up to me I'd want to give users the option to specify arity and whether the field is required. - J.J. At 2:13 PM +0100 12/21/10, Tim Terlegård wrote: In our application we use dynamic fields and there can be about 50 of them and there can be up to 100 million documents. Are there any disadvantages having multivalued=true on all fields in the schema? 
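For reference, the per-field override J.J. mentions takes this shape on the request (the field name `year` here is just an illustration, not from the original thread):

```
facet=true&facet.field=year&f.year.facet.method=enum
```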
An admin of the application can specify dynamic fields and if they should be indexed or stored. Question is if we gain anything by letting them to choose multivalued as well or if it just adds complexity to the user interface? Thanks, Tim
RE: Explanation of the different caches.
Stijn Vanhoorelbeke [stijn.vanhoorelb...@gmail.com] wrote: I want to do a quick-and-dirty load test - but all my results are cached. I commented out all the Solr caches - but still everything is cached. * Can the caching come from the 'Field Collapsing Cache'? -- although I don't see this element in my config file. ( As the system now jumps from 1GB to 7GB of RAM when I do a load test with lots of queries. ) If you allow the JVM to use a maximum of 7GB heap, it is not that surprising that it allocates it when you hammer the searcher. Whether the heap is used for caching or just filled with dead objects waiting for garbage collection is hard to say at this point. Try lowering the maximum heap to 1GB and do your testing again. Also note that Lucene/Solr performance on conventional hard disks benefits a lot from disk caching: if you perform the same search more than once, the speed will increase significantly as relevant parts of the index will (probably) be in RAM. Remember to flush your disk cache between tests.
Re: Explanation of the different caches.
I am aware of the power of the caches. I do not want to completely remove the caches - I want them to be small. - So I can launch a stress test with a small amount of data. ( Some items may come from the cache - some need to be looked up - right now everything comes from the cache... ) 2010/12/21 Toke Eskildsen t...@statsbiblioteket.dk: Stijn Vanhoorelbeke [stijn.vanhoorelb...@gmail.com] wrote: I want to do a quick-and-dirty load test - but all my results are cached. I commented out all the Solr caches - but still everything is cached. * Can the caching come from the 'Field Collapsing Cache'? -- although I don't see this element in my config file. ( As the system now jumps from 1GB to 7GB of RAM when I do a load test with lots of queries. ) If you allow the JVM to use a maximum of 7GB heap, it is not that surprising that it allocates it when you hammer the searcher. Whether the heap is used for caching or just filled with dead objects waiting for garbage collection is hard to say at this point. Try lowering the maximum heap to 1GB and do your testing again. Also note that Lucene/Solr performance on conventional hard disks benefits a lot from disk caching: if you perform the same search more than once, the speed will increase significantly as relevant parts of the index will (probably) be in RAM. Remember to flush your disk cache between tests.
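For anyone wanting to shrink rather than remove the caches, a minimal sketch of the relevant solrconfig.xml entries (sizes here are illustrative for testing, not recommendations):

```xml
<!-- solrconfig.xml, inside the <query> section: keep the standard caches
     tiny so most load-test queries miss the cache and hit the index. -->
<filterCache      class="solr.LRUCache" size="100" initialSize="100" autowarmCount="0"/>
<queryResultCache class="solr.LRUCache" size="100" initialSize="100" autowarmCount="0"/>
<documentCache    class="solr.LRUCache" size="100" initialSize="100" autowarmCount="0"/>
```

Note this only limits Solr's own caches; the OS disk cache and JVM heap behavior discussed above are separate concerns.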
backup of Index or Snapshoot ?
Hello. I am working with the shell scripts for Solr for performing a snapshot of the index. Taking a snapshot is really easy and works fine. But how can I install a snapshot for multiple cores? I wrote a little script which installs each snapshot for each core: cd $HOME_DIR/solr/bin ./snapinstaller -M http://localhost:$PORT/solr/core -S $DATA_DIR/payment -d $DATA_DIR/core But when I start this command I get "ssh: cannot connect to localhost". Why is it not possible to set the port in this script? -- e.g. with -p 8983 it works, but why? I want no errors when using this script ...
Re: improving search response time
On 12/21/2010 3:02 AM, Anurag wrote: I am using the spellchecker in the query part. Now my search time has increased: initially it was 1000ms, now it is 3000ms. I have a data index of size 9GB. My query: http://localhost:8983/solr/spellCheckCompRH/?q=+search+&spellcheck=true&fl=spellcheck,title,url,hl&hl=true&start=0&rows=10&indent=on How can I improve the search time? I have: 1) Fedora 11 as OS 2) Solr running on a Jetty server 3) Front page (search page) on Tomcat 6 4) Index size is 9GB 5) RAM is 1GB Install more memory. 8GB would be a good place to be; more would let you fit your entire index into RAM for incredible speed. Once you get above 4GB RAM, it's best if you run a 64-bit OS and Java, which requires 64-bit processors. If your index is growing, you might want to have more memory than that. Shawn
Re: Consequences for using multivalued on all fields
You should be aware that the behavior of sorting on a multi-valued field is undefined. After all, which of the multiple values should be used for sorting? So if you need sorting on the field, you shouldn't make it multi-valued. Geert-Jan 2010/12/21 J.J. Larrea j...@panix.com Someone please correct me if I am wrong, but as far as I am aware the index format is identical in either case. One benefit of allowing one to specify a field as single-valued is similar to specifying that a field is required: providing a safeguard that index data conforms to requirements. So making all fields multivalued forgoes that integrity check for fields which by definition should be singular. Also, depending on the response writer and, for the XMLResponseWriter, the requested response version (see http://wiki.apache.org/solr/XMLResponseFormat), the multi-valued setting can determine whether the document values returned from a query will be scalars (e.g. <str name="year">2010</str>) or arrays of scalars (<arr name="year"><str>2010</str></arr>), regardless of how many values are actually stored. But the most significant gotcha of not specifying the actual arity (1 or N) arises if any of those fields is used for field-faceting: by default the field-faceting logic chooses a different algorithm depending on whether the field is multi-valued, and the default choice for multi-valued is only appropriate for a small set of enumerated values, since it creates a filter query for each value in the set. And this can have a profound effect on Solr memory utilization. So if you are not relying on the field arity setting to select the algorithm, you or your users might need to specify it explicitly with the f.<field>.facet.method argument; see http://wiki.apache.org/solr/SolrFacetingOverview for more info. So while all-multivalued isn't a showstopper, if it were up to me I'd want to give users the option to specify arity and whether the field is required. - J.J. 
At 2:13 PM +0100 12/21/10, Tim Terlegård wrote: In our application we use dynamic fields and there can be about 50 of them and there can be up to 100 million documents. Are there any disadvantages having multivalued=true on all fields in the schema? An admin of the application can specify dynamic fields and if they should be indexed or stored. Question is if we gain anything by letting them to choose multivalued as well or if it just adds complexity to the user interface? Thanks, Tim
Re: Consequences for using multivalued on all fields
Thank you for the input. You might have seen my posts about doing a flexible schema for derived objects. Sounds like dynamic fields might be the ticket. We'll be ready to test the idea in about a month, maybe 3 weeks. I'll post a comment about it when it gets there. I don't know if I would gain anything, but I think that ALL booleans that were NOT in the base object but were in the derived objects could be put into one field as textually positioned key:value pairs, at least for search purposes. Since the derived object would have its own, additional methods, one of those methods could be to 'unserialize' the 'boolean column'. In fact, that could be a base object function - empty boolean column values just end up not populating any extra base object attributes. Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501tag=nl.e036' EARTH has a Right To Life, otherwise we all die. - Original Message From: kenf_nc ken.fos...@realestate.com To: solr-user@lucene.apache.org Sent: Tue, December 21, 2010 6:07:51 AM Subject: Re: Consequences for using multivalued on all fields I have about 30 million documents and, with the exception of the unique ID, type and a couple of date fields, every document is made up of dynamic fields. Now, I only have maybe 1 in 5 being multi-valued, but search and facet performance doesn't look appreciably different from a fixed-schema solution. I don't do some of the fancier things: highlighting, spell check, etc. And I use a lot more string or lowercase field types than I do Text (so not as many fully tokenized fields); that probably helps with performance. The only disadvantage I know of is dealing with field names at runtime. Depending on your architecture, you don't really know what your document looks like until you have it in a result set. 
For what I'm doing, that isn't a problem.
Re: improving search response time
Thanks a lot! You mean I have to increase the resources. 1. Can distributed search improve the speed? 2. I have read in some threads that the spellchecker takes time. Is the spellchecker one of the culprits for the longer response time? On Tue, Dec 21, 2010 at 10:20 PM, Shawn Heisey-4 [via Lucene] wrote: On 12/21/2010 3:02 AM, Anurag wrote: I am using the spellchecker in the query part. Now my search time has increased: initially it was 1000ms, now it is 3000ms. I have a data index of size 9GB. My query: http://localhost:8983/solr/spellCheckCompRH/?q=+search+&spellcheck=true&fl=spellcheck,title,url,hl&hl=true&start=0&rows=10&indent=on How can I improve the search time? I have: 1) Fedora 11 as OS 2) Solr running on a Jetty server 3) Front page (search page) on Tomcat 6 4) Index size is 9GB 5) RAM is 1GB Install more memory. 8GB would be a good place to be; more would let you fit your entire index into RAM for incredible speed. Once you get above 4GB RAM, it's best if you run a 64-bit OS and Java, which requires 64-bit processors. If your index is growing, you might want to have more memory than that. Shawn -- Kumar Anurag
Re: Case Insensitive sorting while preserving case during faceted search
: I am trying to do a facet search and sort the facet values too. ... : Then I followed the sample example schema.xml, created a copyField of type ... : <fieldType name="alphaOnlySort" class="solr.TextField" : sortMissingLast="true" omitNorms="true"> ... : But the sorted facet values don't have their case preserved anymore. : : How can I get around this? Did you look at how/why/when alphaOnlySort is used in the example? The FAQ entry you referred to addresses almost the exact same scenario of wanting to search/sort on the same data... http://wiki.apache.org/solr/FAQ#Why_Isn.27t_Sorting_Working_on_my_Text_Fields.3F ...the simplest thing to do is to use copyField to index a second version of your field using the StrField class. So have one version of your field using StrField that you facet on, and copyField that to another version (using TextField and KeywordTokenizer) that you sort on. -Hoss
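A minimal sketch of the copyField arrangement Hoss describes, using the `string` and `alphaOnlySort` types from the example schema.xml (the field names are illustrative, not from the original thread):

```xml
<!-- schema.xml sketch: facet on the raw string field to preserve case,
     sort on the lowercased alphaOnlySort copy. -->
<field name="category"      type="string"        indexed="true" stored="true"/>
<field name="category_sort" type="alphaOnlySort" indexed="true" stored="false"/>
<copyField source="category" dest="category_sort"/>
```

Then facet on `category` and sort query results on `category_sort`.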
Faceting memory requirements
Dear all, I have created an index with approx. 1.1 billion documents (around 500GB) running on Solr 1.4.1 (64-bit JVM). I want to enable faceted navigation on an int field, which contains around 250 unique values. According to the wiki there are two methods: facet.method=fc, which uses the field cache. This method should use MaxDoc*4 bytes of memory, which is around 4.1GB. facet.method=enum, which creates a bitset for each unique value. This method should use NumberOfUniqueValues * SizeOfBitSet, which is around 32GB. Are my calculations correct? My memory settings in Tomcat (Windows) are: Initial memory pool: 4096 MB Maximum memory pool: 8192 MB (12GB total in my test machine) I have tried to run a query (...facet=true&facet.field=PublisherId&facet.method=fc) but I am still getting OOM: HTTP Status 500 - Java heap space java.lang.OutOfMemoryError: Java heap space at org.apache.lucene.search.FieldCacheImpl$StringIndexCache.createValue(FieldCacheImpl.java:703) at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:224) at org.apache.lucene.search.FieldCacheImpl.getStringIndex(FieldCacheImpl.java:692) at org.apache.solr.request.SimpleFacets.getFieldCacheCounts(SimpleFacets.java:350) at org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:255) at org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:283) at org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:166) at org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:72) at ... Any idea what I am doing wrong, or have I miscalculated the memory requirements? Many thanks, Rok
Re: Case Insensitive sorting while preserving case during faceted search
Hoss, I think the use case being asked about is specifically sorting facet values with facet.sort, not sorting records -- while still presenting the facet values with their original case, but sorting them case-insensitively. The solutions offered at those URLs don't address this. I'm pretty sure there isn't really any good solution for this; Solr just won't do that, that's just how it goes. On 12/21/2010 2:33 PM, Chris Hostetter wrote: : I am trying to do a facet search and sort the facet values too. ... : Then I followed the sample example schema.xml, created a copyField of type ... : <fieldType name="alphaOnlySort" class="solr.TextField" : sortMissingLast="true" omitNorms="true"> ... : But the sorted facet values don't have their case preserved anymore. : : How can I get around this? Did you look at how/why/when alphaOnlySort is used in the example? The FAQ entry you referred to addresses almost the exact same scenario of wanting to search/sort on the same data... http://wiki.apache.org/solr/FAQ#Why_Isn.27t_Sorting_Working_on_my_Text_Fields.3F ...the simplest thing to do is to use copyField to index a second version of your field using the StrField class. So have one version of your field using StrField that you facet on, and copyField that to another version (using TextField and KeywordTokenizer) that you sort on. -Hoss
Re: Faceting memory requirements
On Tue, Dec 21, 2010 at 4:02 PM, Rok Rejc rokrej...@gmail.com wrote: Dear all, I have created an index with approx. 1.1 billion documents (around 500GB) running on Solr 1.4.1 (64-bit JVM). I want to enable faceted navigation on an int field, which contains around 250 unique values. According to the wiki there are two methods: facet.method=fc, which uses the field cache. This method should use MaxDoc*4 bytes of memory, which is around 4.1GB. facet.method=fc uses the FieldCache, but it currently uses the StringIndex for all field types, so you need to add in space for the string representation of all the unique values. But this is only 250, so given the large number of docs, your estimate should still be close. facet.method=enum, which creates a bitset for each unique value. This method should use NumberOfUniqueValues * SizeOfBitSet, which is around 32GB. A more efficient representation is used for a set when the set size is less than maxDoc/64. This set type uses an int per doc in the set, so it should use roughly the same amount of memory as a numeric fieldcache entry. Are my calculations correct? 
My memory settings in Tomcat (Windows) are: Initial memory pool: 4096 MB Maximum memory pool: 8192 MB (12GB total in my test machine) I have tried to run a query (...facet=true&facet.field=PublisherId&facet.method=fc) but I am still getting OOM: HTTP Status 500 - Java heap space java.lang.OutOfMemoryError: Java heap space at org.apache.lucene.search.FieldCacheImpl$StringIndexCache.createValue(FieldCacheImpl.java:703) at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:224) at org.apache.lucene.search.FieldCacheImpl.getStringIndex(FieldCacheImpl.java:692) at org.apache.solr.request.SimpleFacets.getFieldCacheCounts(SimpleFacets.java:350) at org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:255) at org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:283) at org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:166) at org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:72) at ... Any idea what am I doing wrong, or have I miscalculated the memory requirements? Perhaps you are already sorting by another field or faceting on another field that is causing a lot of memory to already be used, and this pushes it over the edge? Or perhaps the JVM simply can't find a contiguous area of memory this large? Line 703 is this: final int[] retArray = new int[reader.maxDoc()]; so it's failing to create the first array. Although the line after it is even more troublesome: String[] mterms = new String[reader.maxDoc()+1]; Although you only need an array of 250 to contain all the unique terms, the FieldCacheImpl starts out with maxDoc. I think trunk will be far better in this regard. You should also try facet.method=enum though too. -Yonik http://www.lucidimagination.com
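As a sanity check on the numbers in this thread, a quick back-of-the-envelope calculation (pure arithmetic reproducing the estimates above, not a measurement of actual Solr memory use):

```python
# Rough memory estimates for faceting on ~1.1 billion docs, as discussed above.

max_doc = 1_100_000_000      # documents in the index
unique_values = 250          # distinct values in the faceted field

# facet.method=fc: the StringIndex order array holds one 4-byte int per document
fc_bytes = max_doc * 4
print(f"fc estimate: {fc_bytes / 2**30:.1f} GiB")                # ~4.1 GiB

# naive facet.method=enum estimate: one bitset (1 bit per doc) per unique value
enum_bytes = unique_values * (max_doc / 8)
print(f"enum (bitset) estimate: {enum_bytes / 2**30:.1f} GiB")   # ~32.0 GiB
```

So Rok's arithmetic checks out; as Yonik notes, the real enum cost is lower because small sets get a more compact representation.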
Re: [Reload-Config] not working
I also noticed that when I run the reload-config command, the following warning is thrown. I changed all my PK=id to see if that changed anything. Anyone have any ideas why this is not working for me? INFO: id is a required field in SolrSchema. But not found in DataConfig. Regards, Adam On Mon, Dec 20, 2010 at 10:58 AM, Adam Estrada estrada.a...@gmail.com wrote: This is the response I get... Does it matter that the configuration file is called something other than data-config.xml? After I get this I still have to restart the service. I wonder... do I need to commit the change?

<?xml version="1.0" encoding="UTF-8" ?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">520</int>
  </lst>
  <lst name="initArgs">
    <lst name="defaults">
      <str name="config">./solr/conf/dataimporthandler/rss.xml</str>
    </lst>
  </lst>
  <str name="command">reload-config</str>
  <str name="status">idle</str>
  <str name="importResponse">Configuration Re-loaded sucessfully</str>
  <lst name="statusMessages"/>
  <str name="WARNING">This response format is experimental. It is likely to change in the future.</str>
</response>

On Sun, Dec 19, 2010 at 11:12 PM, Ahmet Arslan iori...@yahoo.com wrote: http://localhost:8983/solr/select?clean=false&commit=true&qt=%2Fdataimport&command=full-import (Full Import) http://localhost:8983/solr/select?clean=false&commit=true&qt=%2Fdataimport&command=reload-config (Reload Configuration) All, The links above are meant for me to reload the configuration file after a change is made and the other is to perform the full import. My problem is that the reload-config option does not seem to be working. 
Am I doing anything wrong? Your expertise is greatly appreciated! I am sorry, I hit the reply button accidentally. Are you receiving/checking the message <str name="importResponse">Configuration Re-loaded sucessfully</str> after the reload? And are you checking that data-config.xml is valid XML after editing it programmatically? And instead of editing the data-config.xml file, can't you use a variable resolver? http://search-lucene.com/m/qYzPk2n86iIsubj
[Import Timeout] using /dataimport
All, I've noticed that there are some RSS feeds that are slow to respond, especially during high usage times throughout the day. Is there a way to set the timeout to something really high or have it just wait until the feed is returned? The entire thing stops working when the feed doesn't respond. Your ideas are greatly appreciated. Adam
Re: [Import Timeout] using /dataimport
(10/12/22 9:35), Adam Estrada wrote: All, I've noticed that there are some RSS feeds that are slow to respond, especially during high usage times throughout the day. Is there a way to set the timeout to something really high or have it just wait until the feed is returned? The entire thing stops working when the feed doesn't respond. Your ideas are greatly appreciated. Adam readTimeout? http://wiki.apache.org/solr/DataImportHandler#Configuration_of_URLDataSource_or_HttpDataSource Koji -- http://www.rondhuit.com/en/
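Following Koji's pointer, a minimal sketch of what that might look like in the DIH config (timeout values in milliseconds; the numbers are illustrative, check the wiki page above for the authoritative attribute list):

```xml
<!-- data-config.xml sketch: raise the HTTP timeouts on the feed datasource
     so slow RSS endpoints don't abort the whole import. -->
<dataSource type="URLDataSource"
            connectionTimeout="5000"
            readTimeout="60000"/>
```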
Solr branch_3x problems
Hello guys, We at scribd.com have recently deployed our new search cluster based on the Dec 1st, 2010 branch_3x Solr code and we're very happy about the new features it brings. Though it looks like we have a weird problem here: once a day our servers handling sharded search queries (frontend servers that receive requests and then fan them out to backend machines) die. Everything looks cool for a day, memory usage is stable, GC is doing its work as usual, and then eventually we get a weird GC activity spike that kills the whole VM, and the only way to bring it back is to kill -9 the Tomcat6 VM and restart it. We've tried different GC tuning options, tried to reduce caches to almost zero size, still no luck. So I was wondering if there were any known issues with Solr branch_3x in the last month that could have caused this kind of problem, or if we could provide any more information that could help track down the issue. Thanks. -- Alexey Kovyrin http://kovyrin.net/
White space in facet values
How do I handle facet values that contain whitespace? Say I have a field Product that I want to facet on. A value for Product could be Electric Guitar. How should I handle the white space in Electric Guitar during indexing? What about when I apply the constraint fq=Product:Electric Guitar?
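One common approach is to facet on an untokenized (string) field so "Electric Guitar" stays a single term, and then quote or backslash-escape the value in the filter query. A small sketch of building the URL-encoded parameter (the field and value are from the question; the phrasing of the fix is a suggestion, not from the original thread):

```python
from urllib.parse import urlencode

# Quote the value so the whitespace is part of one term/phrase,
# instead of Product:Electric plus a stray term "Guitar".
fq = 'Product:"Electric Guitar"'

# Alternatively, escape the space character itself:
fq_escaped = r"Product:Electric\ Guitar"

params = urlencode({"fq": fq})
print(params)  # fq=Product%3A%22Electric+Guitar%22
```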
Duplicate values in multiValued field
If I put duplicate values into a multiValued field, would that cause any issues? For example I have a multiValued field Color. Some of my documents have duplicate values for that field, such as: Green, Red, Blue, Green, Green. Would the above (having 3 duplicate Green) be the same as having the duplicated values of: Green, Red, Blue? Or do I need to clean my data and remove duplicate values before indexing? Thanks.
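If you do decide to clean the data first, deduplicating while preserving order is cheap at indexing time; a sketch (the color values are from the question):

```python
def dedupe_preserve_order(values):
    """Drop repeated values while keeping first-seen order."""
    return list(dict.fromkeys(values))

colors = ["Green", "Red", "Blue", "Green", "Green"]
print(dedupe_preserve_order(colors))  # ['Green', 'Red', 'Blue']
```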