Occasionally hit ArrayIndexOutOfBoundsException when searching
Hey All, We are using Solr for an enterprise product. Recently we did an upgrade from 4.7.0 to 4.9.1 and are seeing this exception. It's an EmbeddedSolrServer (I know it's a bad choice and we are moving to SolrCloud very soon :)). I used Maven for the upgrade; the following is the snippet from pom.xml:

    <dependency>
      <groupId>org.apache.solr</groupId>
      <artifactId>solr-clustering</artifactId>
      <version>4.9.1</version>
    </dependency>

Stack trace:

    Caused by: org.apache.solr.client.solrj.SolrServerException: java.lang.ArrayIndexOutOfBoundsException: 31
        at com.sabax.datastore.impl.DatastoreImpl$EmbeddedSolrServer.request(DatastoreImpl.java:607) ~[sabasearch.jar:na]
        ... 196 common frames omitted
    Caused by: java.lang.ArrayIndexOutOfBoundsException: 31
        at org.apache.lucene.util.FixedBitSet.nextSetBit(FixedBitSet.java:294) ~[sabasearch.jar:na]
        at org.apache.solr.search.DocSetBase$1$1$1.advance(DocSetBase.java:202) ~[sabasearch.jar:na]
        at org.apache.lucene.search.ConstantScoreQuery$ConstantScorer.advance(ConstantScoreQuery.java:278) ~[sabasearch.jar:na]
        at org.apache.lucene.search.ConjunctionScorer.doNext(ConjunctionScorer.java:69) ~[sabasearch.jar:na]
        at org.apache.lucene.search.ConjunctionScorer.nextDoc(ConjunctionScorer.java:100) ~[sabasearch.jar:na]
        at org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll(Weight.java:192) ~[sabasearch.jar:na]
        at org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:163) ~[sabasearch.jar:na]
        at org.apache.lucene.search.BulkScorer.score(BulkScorer.java:35) ~[sabasearch.jar:na]
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:621) ~[sabasearch.jar:na]
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:297) ~[sabasearch.jar:na]
        at org.apache.solr.search.SolrIndexSearcher.numDocs(SolrIndexSearcher.java:2040) ~[sabasearch.jar:na]
        at org.apache.solr.request.SimpleFacets.rangeCount(SimpleFacets.java:1338) ~[sabasearch.jar:na]
        at org.apache.solr.request.SimpleFacets.getFacetRangeCounts(SimpleFacets.java:1262) ~[sabasearch.jar:na]
        at org.apache.solr.request.SimpleFacets.getFacetRangeCounts(SimpleFacets.java:1197) ~[sabasearch.jar:na]
        at org.apache.solr.request.SimpleFacets.getFacetRangeCounts(SimpleFacets.java:1141) ~[sabasearch.jar:na]
        at org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:262) ~[sabasearch.jar:na]
        at org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:84) ~[sabasearch.jar:na]
        at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:218) ~[sabasearch.jar:na]
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) ~[sabasearch.jar:na]
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1962) ~[sabasearch.jar:na]
        at com.sabax.datastore.impl.DatastoreImpl$EmbeddedSolrServer.request(DatastoreImpl.java:602) ~[sabasearch.jar:na]
        ... 196 common frames omitted

Please guide me, what must have gone wrong?

Thanks
-Hussain
Re: Solr exceptions during batch indexing
Just trying to understand: what's the challenge in returning the bad doc id(s)? Solr already knows which doc(s) failed on update and can return their id(s) in the response or a callback. Can we have a JIRA ticket on it if one doesn't already exist? This looks like a common use case, and every Solr consumer might be writing their own version of the handling for this issue.

On Sat, Nov 8, 2014 at 1:17 AM, Walter Underwood wun...@wunderwood.org wrote:

Right, that is why we batch. When a batch of 1000 fails, drop to a batch size of 1 and start the batch over. Then it can report the exact document with problems. If you want to continue, go back to the bigger batch size. I usually fail the whole batch on one error.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/

On Nov 7, 2014, at 11:44 AM, Peter Keegan peterlkee...@gmail.com wrote:

I'm seeing 9X throughput with 1000 docs/batch vs 1 doc/batch, with a single thread, so it's certainly worth it. Thanks, Peter

On Fri, Nov 7, 2014 at 2:18 PM, Erick Erickson erickerick...@gmail.com wrote:

And Walter has also been around for a _long_ time ;) (sorry, couldn't resist) Erick

On Fri, Nov 7, 2014 at 11:12 AM, Walter Underwood wun...@wunderwood.org wrote:

Yes, I implemented exactly that fallback for Solr 1.2 at Netflix. It isn't too hard if the code is structured for it; retry with a batch size of 1. wunder

On Nov 7, 2014, at 11:01 AM, Erick Erickson erickerick...@gmail.com wrote:

Yeah, this has been an ongoing issue for a _long_ time. Basically, you can't. So far, people have essentially written fallback logic to index the docs of a failing packet one at a time and report it. I'd really like better reporting back, but we haven't gotten there yet. Best, Erick

On Fri, Nov 7, 2014 at 8:25 AM, Peter Keegan peterlkee...@gmail.com wrote:

How are folks handling Solr exceptions that occur during batch indexing? Solr stops parsing the docs stream when an error occurs (e.g. a doc with a missing mandatory field), and stops indexing the batch. The bad document is not identified, so it would be hard for the client to recover by skipping over it. Peter
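Below is a minimal SolrJ sketch of the fallback Walter and Erick describe; the method name and the "id" field are assumptions for illustration, not a drop-in implementation:

    import java.util.List;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.common.SolrInputDocument;

    // Try the whole batch first; if anything fails, retry the same docs one
    // at a time so the offending document(s) can be identified and reported.
    void addWithFallback(SolrServer server, List<SolrInputDocument> batch) {
        try {
            server.add(batch);                 // fast path: one round trip
        } catch (Exception batchFailure) {
            for (SolrInputDocument doc : batch) {
                try {
                    server.add(doc);           // slow path: isolates bad docs
                } catch (Exception docFailure) {
                    System.err.println("Bad doc " + doc.getFieldValue("id")
                            + ": " + docFailure.getMessage());
                }
            }
        }
    }

Going back to the larger batch size once the bad docs are isolated, as Walter suggests, keeps the 9X throughput for the common case.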
Re: Synonym for Numbers
If you are searching for a single document, can a real-time get on the doc id, as below, serve your use case?

    http://localhost:8983/solr/get?id=mydoc

Real-time get for multiple docs:

    http://localhost:8983/solr/get?id=mydoc1&id=mydoc2

On Sat, Nov 8, 2014 at 12:52 AM, EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions) external.ravi.tamin...@us.bosch.com wrote:

Hi Group, I am working on implementing synonyms for numbers, like:

    10,2010
    14,2014

i.e. a 2-digit number should also get the documents with the four-digit one. I added the above lines in synonyms.txt and everything works. But now I need it to work in one direction only. I tried 10=>2010, but a search for 2010 still gets the records belonging to 10. I want to get only 2010 documents if I search 2010, not 10. I have expand=true in the synonym filter:

    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>

Any help is really appreciated. Thanks, Ravi
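P.S. If memory serves, the real-time get handler also accepts a comma-separated ids parameter for fetching several documents in one call; worth checking against your handler configuration:

    curl "http://localhost:8983/solr/get?ids=mydoc1,mydoc2,mydoc3"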
Re: Delete data from stored documents
Since the data already exists and the need is to remove unwanted fields, using a custom update processor looks less useful here. Erick's recommendation of re-indexing into a new collection, if at all possible, looks simple and safe.

On Sat, Nov 8, 2014 at 12:44 AM, Erick Erickson erickerick...@gmail.com wrote:

bq: My question is if I can delete the field definition from the schema.xml and do an optimize and the fields "magically" disappears

No. schema.xml is really just about regularizing how Lucene indexes things. Lucene (where this would have to take place) doesn't have any understanding of schema.xml, so changing it and then optimizing (optimizing is also a Lucene function) won't have any effect. If you 1> change the schema and 2> update documents, the data will be purged as background merges happen. But really, I'd recommend re-indexing into a new collection if at all possible. Best, Erick

On Fri, Nov 7, 2014 at 4:26 AM, Yago Riveiro yago.rive...@gmail.com wrote:

Jack, I have some data indexed that I don't need any more. My question is if I can delete the field definition from the schema.xml, do an optimize, and have the fields "magically" disappear (and free space on disk). Re-indexing data to delete fields is too expensive in collections with hundreds of millions of documents. The optimize operation seems to be a good place to shrink the documents...

/Yago Riveiro

On Fri, Nov 7, 2014 at 12:19 PM, Jack Krupansky j...@basetechnology.com wrote:

Could you clarify exactly what you are trying to do, like with an example? I mean, how exactly are you determining what fields are unwanted? Are you simply asking whether fields can be deleted from the index (and schema)?

-- Jack Krupansky

-----Original Message----- From: yriveiro Sent: Thursday, November 6, 2014 9:19 AM To: solr-user@lucene.apache.org Subject: Delete data from stored documents

Hi, is it possible to remove stored data from an index by deleting the unwanted fields from schema.xml and then doing an optimize over the index? Thanks, /yago
Re: How to dynamically create Solr cores with schema
For more advanced dynamic fields, refer to the dynamicField elements and their suffix-convention patterns in the example schema.xml:

http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/server/solr/configsets/basic_configs/conf/schema.xml

The core admin CREATE API can be used to create a core dynamically, e.g.:

    curl "http://localhost:8080/solr/admin/cores?action=CREATE&name=$name&instanceDir=/etc/solr/conf/$name"

On Fri, Nov 7, 2014 at 10:29 PM, Alexandre Rafalovitch arafa...@gmail.com wrote:

The usual solution to that is to have dynamic fields with suffixes indicating the types. So, your int fields are mapped to *_i, your date fields to *_d. Solr has schemaless support, but it is auto-detect for now. Creating fields of particular types via API is, I think, in JIRA on the trunk for 5.0.

Regards, Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853

On 6 November 2014 10:04, Andreas Hubold andreas.hub...@coremedia.com wrote:

Hi, I have a use-case where Java applications need to create Solr indexes dynamically. Schema fields of these indexes differ and should be defined by the Java application upon creation. So I'm trying to use the Core Admin API [1] to create new cores and the Schema API [2] to define fields. When creating a core, I have to specify solrconfig.xml (with ManagedIndexSchemaFactory enabled) and the schema to start with. I thought it would be a good idea to use named config sets [3] for this purpose:

    curl "http://localhost:8082/solr/admin/cores?action=CREATE&name=m1&instanceDir=cores/m1&configSet=myconfig&dataDir=data"

But when I add a field to the core m1, the field actually gets added to the config set. Is this a bug or a feature?

    curl http://localhost:8082/solr/m1/schema/fields -X POST -H 'Content-type:application/json' --data-binary '[{"name":"foo","type":"tdate","stored":true}]'

All cores created from the config set myconfig will get the new field foo in their schema. So this obviously does not work to create cores with different schemas. I also tried to use the config/schema parameters of the CREATE core command (instead of config sets) to specify some existing solrconfig.xml/schema.xml. I tried relative paths here (e.g. some levels upward) but I could not get it to work. The documentation [1] tells me that relative paths are allowed. Should this work?

The next thing that comes to mind is to use dynamic fields instead of a correct managed schema, but that does not sound as nice. Or maybe I should implement a custom CoreAdminHandler which takes a list of field definitions, if that's possible somehow...? I don't know. What's your recommended approach? We're using Solr 4.10.1 non-SolrCloud. Would this be simpler or different with SolrCloud?

Thank you,
Andreas

[1] https://cwiki.apache.org/confluence/display/solr/CoreAdmin+API#CoreAdminAPI-CREATE
[2] https://cwiki.apache.org/confluence/display/solr/Schema+API#SchemaAPI-Modifytheschema
[3] https://cwiki.apache.org/confluence/display/solr/Config+Sets
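For reference, the suffix convention Alexandre mentions looks like this in schema.xml; the type names (int, string, date) are illustrative and must match field types actually defined in your schema:

    <dynamicField name="*_i"  type="int"    indexed="true" stored="true"/>
    <dynamicField name="*_s"  type="string" indexed="true" stored="true"/>
    <dynamicField name="*_dt" type="date"   indexed="true" stored="true"/>

An application can then index into new_field_i or new_field_dt without any schema change.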
Re: Sort documents by exist(multivalued field)
Is it possible to describe the exact use case here?

On Fri, Nov 7, 2014 at 10:26 PM, Alexandre Rafalovitch arafa...@gmail.com wrote:

You encode that knowledge by using an UpdateRequestProcessor. Clone the field, replace it with true, map it to boolean. That way, you will pay the price once per document indexed, not (documentCount*) times per request.

Regards, Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853

On 7 November 2014 06:43, Nickolay41189 klin892...@yandex.ru wrote:

I want to sort by a multivalued field like boolean values, something like: sort=exist(multivalued_field_name) desc. Is it possible? P.S. I know that sorting doesn't work for multivalued fields, but it works for a single boolean field...
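For what it's worth, here is a sketch of a count-based variant of Alexandre's clone-and-map idea, using stock processors; the chain and field names are invented for illustration, and myMultiField_count would be a single-valued int in the schema:

    <updateRequestProcessorChain name="multivalued-count">
      <processor class="solr.CloneFieldUpdateProcessorFactory">
        <str name="source">myMultiField</str>
        <str name="dest">myMultiField_count</str>
      </processor>
      <!-- replaces the cloned values with how many values there were -->
      <processor class="solr.CountFieldValuesUpdateProcessorFactory">
        <str name="fieldName">myMultiField_count</str>
      </processor>
      <processor class="solr.RunUpdateProcessorFactory"/>
    </updateRequestProcessorChain>

Sorting on myMultiField_count desc then behaves like sorting on exist(...), since documents without the field get no count.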
Re: Sort documents by exist(multivalued field)
Re-indexing is bad for me: it's 5TB of data, and the time to re-index it is too much, but it seems to be the only option I have.

On Sat 8 Nov 2014 at 13:10 Anurag Sharma anura...@gmail.com wrote:

Is it possible to describe the exact use case here? [...]
Re: Minimum Term Matching in More Like This Queries
There is no direct way of retrieving docs based on a minimum term match in Solr. The mlt params 'mlt.mintf' and 'mlt.match.offset' can be explored to see if they meet the criteria. Refer to the links below for more details:

http://wiki.apache.org/solr/MoreLikeThisHandler
https://wiki.apache.org/solr/MoreLikeThis

In case you are using the Lucene library directly, the setPercentTermsToMatch() method of the MoreLikeThisQuery class can be used. Refer to the code:

https://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/queries/src/java/org/apache/lucene/queries/mlt/MoreLikeThisQuery.java

On Fri, Nov 7, 2014 at 9:45 PM, Tim Hearn timseman...@gmail.com wrote:

Hi! I'm fairly new to Solr. Is there a feature which enforces minimum term matching for MLT queries? More precisely, a document will match the MLT query if and only if at least x terms in the query are found in the document, with x defined by the user. I could not find such a feature in the documentation, and switching to the edismax query parser and using the 'mm' parameter does not work for me. Thanks!
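A request against the MoreLikeThisHandler tightening term selection might look like this (the handler path, field, and values are assumptions about your setup):

    curl "http://localhost:8983/solr/mlt?q=id:doc1&mlt.fl=body&mlt.mintf=2&mlt.mindf=5"

Note mlt.mintf only filters which terms from the source document are used to build the query; it raises the bar indirectly but is not a strict "at least x terms must match" guarantee.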
Re: Term count in multivalue fields
Since 'omitTermFreqAndPositions' is enabled, what does the function query 'totaltermfreq(field,term)' return? Another way (not sure this is the correct approach): while indexing, add a field containing the number, then filter and sum (via function query) on that field while querying. A range query can also be done on this field.

On Fri, Nov 7, 2014 at 7:23 PM, Nickolay41189 klin892...@yandex.ru wrote:

Andrey, thank you for the reply. Can you explain what you mean by a faceting query with a prefix? I'm new to the world of Solr; can you give me an example of this query?
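For example (field and term are placeholders), totaltermfreq can be returned and sorted on as a function query:

    curl "http://localhost:8983/solr/select?q=*:*&fl=id,cnt:totaltermfreq(tags,'solr')&sort=totaltermfreq(tags,'solr')+desc"

One caveat: it relies on indexed term frequencies, so with omitTermFreqAndPositions enabled it would presumably degenerate to the document frequency of the term.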
Re: Sort documents by first value in multivalued field
What is the 'first value' here? Any example?

On Fri, Nov 7, 2014 at 5:04 PM, Nickolay41189 klin892...@yandex.ru wrote:

How can I sort documents by the first value in a multivalued field (without adding a copyField and without changes in schema.xml)?
Re: Solr exceptions during batch indexing
bq: Just trying to understand what's the challenge in returning the bad doc

Mostly, nobody has done it yet. There's some complication around async updates, ConcurrentUpdateSolrServer for instance. I suspect also that one has to write error-handling logic in the client anyway, so the motivation is reduced. And now it would need to handle SolrCloud mode. All that said, this has bugged me for a long time, but I haven't gotten around to it. Which says something about the priority, I suspect. FWIW, Erick

On Sat, Nov 8, 2014 at 2:51 AM, Anurag Sharma anura...@gmail.com wrote:

Just trying to understand: what's the challenge in returning the bad doc id(s)? Solr already knows which doc(s) failed on update and can return their id(s) in the response or a callback. [...]
Re: Sort documents by exist(multivalued field)
Well, if you can write a custom function that does the right thing with multiValued fields, you could sort by that. You still haven't defined the exact use case. The problem here is that sorting by a multiValued field is meaningless. Consider a field with "aardvark" and "zebra": where should it sort? Of course you can define rules like "the minimum value of the field", which will at least give consistent results... until the next person wants to sort by the average of all the numbers in a field.

On Sat, Nov 8, 2014 at 5:32 AM, Yago Riveiro yago.rive...@gmail.com wrote:

Re-indexing is bad for me: it's 5TB of data, and the time to re-index it is too much, but it seems to be the only option I have. [...]
Re: Delete data from stored documents
Agreed, but I think it would be great if Lucene and Solr provided an API to delete a single field for the entire index. We could file a Jira, but can Lucene accommodate it? Maybe we'll just have to wait for Elasticsearch to implement this feature!

-- Jack Krupansky

-----Original Message----- From: Anurag Sharma Sent: Saturday, November 8, 2014 6:46 AM To: solr-user@lucene.apache.org Subject: Re: Delete data from stored documents

Since the data already exists and the need is to remove unwanted fields, using a custom update processor looks less useful here. Erick's recommendation of re-indexing into a new collection, if at all possible, looks simple and safe. [...]
Re: Synonym for Numbers
Are you using the synonyms for both indexing and query? It sounds like you want to use these synonyms only at query time. Otherwise, a 10 in the input becomes a 2010 in the index.

-- Jack Krupansky

-----Original Message----- From: EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions) Sent: Friday, November 7, 2014 2:22 PM To: solr-user@lucene.apache.org Subject: Synonym for Numbers

Hi Group, I am working on implementing synonyms for numbers, like 10,2010 and 14,2014. [...]
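A sketch of that split (the field type name and file contents are invented for illustration): keep the index analyzer synonym-free and put one-way rules in synonyms.txt, applied only at query time:

    # synonyms.txt
    10 => 2010
    14 => 2014

    <fieldType name="text_num" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

With this, a query for 10 is rewritten to 2010, while a query for 2010 is left alone and matches only documents containing 2010. If synonyms were previously applied at index time, the old documents need re-indexing.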
Help with SolrCloud exceptions while recovering
Hi, I am a newbie SolrCloud enthusiast. My goal is to implement an infrastructure to enable text analysis (clustering, classification, information extraction, sentiment analysis, etc.). My development environment consists of one machine: quad-core processor, 16GB RAM and a 1TB HD. I have started by implementing Apache Flume with Twitter as the source and SolrCloud (within JBoss AS 7) as the sink, using ZooKeeper (5 servers) to upload configuration and manage the cluster. The pseudo-distributed cluster consists of one collection with three shards, each with three replicas.

Everything runs smoothly for a while. After about 50,000 tweets have been committed (actually CloudSolrServer commits every batch of 500 documents), SolrCloud randomly starts logging exceptions: Lucene file not found, IndexWriter cannot be opened, replication unsuccessful, and the like. Recovery starts with no success until the replica goes down. I have tried different Solr versions (4.10.2, 4.9.1 and lastly 4.8.1) with the same results.

I have looked everywhere for help before writing this email. My guess right now is that the problem lies with the SolrCloud and ZooKeeper connection, although I haven't seen any such exception. Any reference or help will be welcomed.

Cheers, B.
Re: Solrcloud replicas do not match
Hi Erick,

I found the issue to be related to my other question (about shared solrconfig.xml), which you also answered. It turns out that I had set the data.dir variable in solrconfig.xml to an absolute path that coincided with a different index. So the replica tried to be created there and something nasty probably happened. When I removed the variable value, the replica started to be created where expected (and appropriately grows in size).

During this recovery process (copying 60GB of data), the Solr Admin console is unusable, however. Anything I could do about this?

Thank you a lot, Michal

2014-11-07 20:16 GMT+01:00 Erick Erickson erickerick...@gmail.com:

How did you create the replica? Does the admin screen show it attached to the proper shard? What I'd do is set up my SolrCloud instance with (presumably) a single node (leader) and insure my searches were working. Then (and only then) use the Collection API ADDREPLICA command. You should see your replica be updated and be good-to-go. Best, Erick

On Fri, Nov 7, 2014 at 9:13 AM, Michal Krajňanský michal.krajnan...@gmail.com wrote:

Hi all, I have a SolrCloud setup with a manually created collection whose index was obtained via other means than Solr (the data come from Lucene). I created a replica for the index and expected to see the data being copied to the replica, which does not happen. In the Admin interface I see something like:

                          Version        Gen      Size
    Master (Searching)    1415379668601  5853288  60.13 GB
    Master (Replicable)   1415379668601  5853288  -
    Slave (Searching)     1415379668601  3        1.84 KB

The versions seem to match, but obviously the replica only contains the handful of documents I indexed AFTER the replica was created. How do I replicate the documents that were already in the index? Or am I missing something?

Best, Michal Krajnansky
Re: Solrcloud solrconfig.xml
Hi Erick,

Thank you for making this clearer (it helped me solve the replication issue I asked about in a different thread). However, I suspect I am still doing something wrong.

I am running a single Tomcat instance with two instances of Solr. The shared solrconfig.xml contains:

    <dataDir>${solr.data.dir:data}</dataDir>

And the Tomcat contexts set solr/home as follows:

    <Environment name="solr/home" type="java.lang.String" value=".../solrcloud/solr1" override="true" />
    <Environment name="solr/home" type="java.lang.String" value=".../solrcloud/solr2" override="true" />

The directory structure is as follows:

    .../solrcloud/solr1/solr.xml
    .../solrcloud/solr1/core1
    .../solrcloud/solr1/core1/core.properties
    .../solrcloud/solr1/core1/data
    .../solrcloud/solr2/solr.xml

After having issued ADDREPLICA on the collection managed by core1, I would expect to see the new data dir under .../solrcloud/solr2/core2/data. However, I saw something like this (the core names were a little different):

    ...
    .../solrcloud/solr2/solr.xml
    .../solrcloud/solr2/core2
    .../solrcloud/solr2/core2/core.properties
    .../solrcloud/data    (!)

I.e. the new core's data dir was created relative to the parent solrcloud folder. Makes me confused...

Best, Michal Krajnansky

2014-11-07 19:59 GMT+01:00 Erick Erickson erickerick...@gmail.com:

Each of those data dirs is relative to the instance in question. So if you're running on different machines, they're physically separate even though named identically. If you're running multiple nodes on a single machine a-la the getting-started docs, then each one is in its own directory (e.g. solr/node1, solr/node2) and since the dirs are relative to that directory, you get things like:

    ..solr/node1/solr/gettingstarted_shard1_replica1/data
    ..solr/node2/solr/gettingstarted_shard1_replica1/data

Best, Erick

On Fri, Nov 7, 2014 at 5:26 AM, Michal Krajňanský michal.krajnan...@gmail.com wrote:

Hi Everyone, I am quite a bit confused about managing configuration files with Zookeeper for running Solr in cloud mode. To be precise, I was able to upload the config files (schema.xml, solrconfig.xml) into Zookeeper and run SolrCloud. What confuses me are properties like data.dir, or the replication request handlers. It seems like these should be different for each of the servers in the cloud. So how does it work? (I did google to understand the matter, unsuccessfully.) Best, Michal
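If the goal is one shared solrconfig.xml, a possible workaround (worth verifying on your Solr version) is to leave dataDir out of solrconfig.xml entirely and pin it per core in core.properties, where a relative value should resolve against that core's own instanceDir. A hypothetical example:

    # .../solrcloud/solr2/core2/core.properties
    name=core2
    dataDir=data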
Solr 4.10 very slow on build()
I have a ~4GB index which takes a minute (or over) to build() when starting the server. I noticed that this happens when I upgrade from Solr 4.0 to 4.10. The index was fully rebuilt with Solr 4.10 (using DIH). How can I speed up startup time?

Here is the slow part of the startup log:

    INFO 141101-23:48:18.239 Loading spell index for spellchecker: wordbreak
    INFO 141101-23:48:18.239 Loading suggester index for: mySuggester
    INFO 141101-23:48:18.239 reload()
    INFO 141101-23:48:18.239 build()
    INFO 141101-23:49:15.270 [admin] webapp=null path=/admin/cores params={_=1414873135659&wt=json} status=0 QTime=11
    INFO 141101-23:49:22.503 [news] Registered new searcher Searcher@28195344[news] main{StandardDirectoryReader(segments_1b6:65731:nrt _fgm(4.10.1):C244111 _1pw(4.10.1):C191483/156:delGen=140 _1wg(4.10.1):C174054/11:delGen=11 _236(4.10.1):C1920/1:delGen=1 _23h(4.10.1):C1756 _67x(4.10.1):C2120/144:delGen=126 _23l(4.10.1):C2185/2:delGen=2 _4ch(4.10.1):C784/145:delGen=126 _3b5(4.10.1):C758/80:delGen=79 _23q(4.10.1):C3391 _97s(4.10.1):C1218/136:delGen=127 _buo(4.10.1):C1096/86:delGen=84 _eh8(4.10.1):C819/73:delGen=69 _fg8(4.10.1):C413/94:delGen=81 _geb(4.10.1):C229/5:delGen=5 _g4b(4.10.1):C130/24:delGen=23 _g6c(4.10.1):C144/15:delGen=14 _ghj(4.10.1):C21/2:delGen=2 _gj6(4.10.1):C25/3:delGen=3 _gfz(4.10.1):C10/1:delGen=1 _ghe(4.10.1):C1 _gir(4.10.1):C3/2:delGen=1 _gis(4.10.1):C2/1:delGen=1 _gja(4.10.1):C1 _gjb(4.10.1):C2/1:delGen=1 _gjd(4.10.1):C1 _gjj(4.10.1):C1 _gjo(4.10.1):C1 _gjp(4.10.1):C1 _gjq(4.10.1):C1 _gjs(4.10.1):C1)}
    INFO 141101-23:49:22.505 Creating new IndexWriter...
    INFO 141101-23:49:22.506 Waiting until IndexWriter is unused... core=news
    INFO 141101-23:49:22.506 Closing old IndexWriter... core=news
    INFO 141101-23:49:22.650 SolrDeletionPolicy.onInit: commits: num=1 commit{dir=NRTCachingDirectory(MMapDirectory@/app/solr/solrhome/news/data/index lockFactory=NativeFSLockFactory@/app/solr/solrhome/news/data/index; maxCacheMB=48.0 maxMergeSizeMB=4.0),segFN=segments_bpm,generation=15178}
    INFO 141101-23:49:22.650 newest commit generation = 15178
Re: How to dynamically create Solr cores with schema
I remember a talk by CareerBuilder where they wrote an API using the approach explained by Alexandre, and they got really good results.

----- Original Message ----- From: Anurag Sharma anura...@gmail.com To: solr-user@lucene.apache.org Sent: Saturday, November 8, 2014 7:58:48 AM Subject: Re: How to dynamically create Solr cores with schema

For more advanced dynamic fields, refer to the dynamicField elements and their suffix-convention patterns in the example schema.xml. [...]
Re: Solr 4.10 very slow on build()
Try commenting out the suggester component handler in solrconfig.xml: https://issues.apache.org/jira/browse/SOLR-6679

-Yonik
http://heliosearch.org - native code faceting, facet functions, sub-facets, off-heap data

On Sat, Nov 8, 2014 at 2:03 PM, Mohsen Saboorian mohs...@gmail.com wrote:

I have a ~4GB index which takes a minute (or over) to build() when starting the server. [...]
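Concretely, the workaround is to comment out the suggester search component (and any request handler referencing it) in solrconfig.xml; the block below is only an illustration of what to look for, with made-up names, not anyone's exact config:

    <!--
    <searchComponent name="suggest" class="solr.SuggestComponent">
      <lst name="suggester">
        <str name="name">mySuggester</str>
        <str name="lookupImpl">FuzzyLookupFactory</str>
        <str name="dictionaryImpl">DocumentDictionaryFactory</str>
        <str name="field">title</str>
      </lst>
    </searchComponent>
    -->

When the component is needed, the suggester can be rebuilt on demand with suggest.build=true rather than paying the cost on every startup.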
Re: Occasionally hit ArrayIndexOutOfBoundsException when searching
Can anyone here provide help on this? Any further logs or environment details I can provide to help the analysis?

On Nov 8, 2014 12:31 AM, Mohmed Hussain mohd.huss...@gmail.com wrote:

Hey All, We are using Solr for an enterprise product. Recently we did an upgrade from 4.7.0 to 4.9.1 and are seeing this exception. [...]
Re: Help with SolrCloud exceptions while recovering
First: for tweets, committing every 500 docs is much too frequent, especially from the client, and super-especially if you have multiple clients running. I'd recommend you just configure solrconfig this way as a place to start, and do NOT commit from any clients:

1> a hard commit (openSearcher=false) every minute (or maybe 5 minutes)
2> a soft commit every minute

The latter governs how long it'll be between when a doc is indexed and when it can be searched. Here's a long post about how all this works:

https://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

As far as the rest, it's a puzzle, definitely. If it continues, a complete stack trace would be a good thing to start with.

Best, Erick

On Sat, Nov 8, 2014 at 9:47 AM, Bruno Osiek baos...@gmail.com wrote:

Hi, I am a newbie SolrCloud enthusiast. My goal is to implement an infrastructure to enable text analysis (clustering, classification, information extraction, sentiment analysis, etc.). [...]
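In solrconfig.xml terms, that recommendation is roughly the following (a sketch; times are in milliseconds, tune to taste):

    <autoCommit>
      <maxTime>60000</maxTime>            <!-- hard commit every minute -->
      <openSearcher>false</openSearcher>  <!-- don't open a new searcher -->
    </autoCommit>

    <autoSoftCommit>
      <maxTime>60000</maxTime>            <!-- visibility every minute -->
    </autoSoftCommit>

With this in place, the CloudSolrServer client should stop issuing explicit commits entirely.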
Re: Solrcloud replicas do not match
re: the Solr admin console. Hmmm, switch it to a different node? It gets you the same info no matter which node you're pointing at in your SolrCloud. Not sure why this happens, though.

Best, Erick

On Sat, Nov 8, 2014 at 10:12 AM, Michal Krajňanský michal.krajnan...@gmail.com wrote:

Hi Erick, I found the issue to be related to my other question (about shared solrconfig.xml), which you also answered. [...]
Re: Occasionally hit ArrayIndexOutOfBoundException when searching
Hi All, More analysis revealed that it fails when we have indexed documents containing many Japanese characters, indexed using the Tika parser. The search is successful when we turn OFF facets on the single date param used. Following is the SolrParams query, one parameter per line:

    q=(+(+((+resource_type:CURRICULUM^0.001+(+(audtype_id:audiexxx^0.001)+(+disc_from:[2014-11-09T14:27:27.137Z TO *]^0.001 +status:200^0.001 +disp_web:true^0.001 +avail_from:[* TO 2014-11-08T14:27:27.137Z]^0.001)))(+resource_type:OFFERING^1.0+(+is_private:false^0.001 +course_avail_from:[* TO 2014-11-08T14:27:27.137Z]^0.001 +course_disp_web:true^0.001 +course_disc_from:[2014-11-08T14:27:27.137Z TO *]^0.001 +is_recurring_course:0^0.001 +course_ispublished:true^0.001 +disp_web:true^0.001+((+offering_enroll_close:[2014-11-08T22:27:27.137Z TO *]^0.001 +base_delivery_type:100^0.001 +endDate:[2014-11-07T22:27:27.137Z TO *]^0.001 +offering_open_enroll:[* TO 2014-11-08T22:27:27.137Z]^0.001 +status:100^0.001)(+base_delivery_type:200^0.001 +disc_from:[2014-11-08T14:27:27.137Z TO *]^0.001 +avail_from:[* TO 2014-11-08T14:27:27.137Z]^0.001))+(audtype_id:audiexxx^0.001)))(+resource_type:CERTIFICATION^0.001+(+(+disp_web:true^0.001 +status:200^0.001 +avail_from:[* TO 2014-11-08T14:27:27.053Z]^0.001 +disc_from:[2014-11-09T14:27:27.053Z TO *]^0.001)+(audtype_id:audiexxx^0.001+(+(+(is_alumni:0^1.0))+(description_lower:ras_course_with_content_02*^100.0 name_tokenized:ras_course_with_content_02*^1.0 name_lower:ras_course_with_content_02^1.0 course_description_lower:ras_course_with_content_02*^100.0 offering_template_no:ras_course_with_content_02*^1000.0 part_no:ras_course_with_content_02*^1000.0 tag_name:ras_course_with_content_02*^1000.0 name_tokenized:ras_course_with_content_02*^1.0 tag_name:ras_course_with_content_02*^1000.0 name_tokenized:ras_course_with_content_02*^1.0 description_lower:ras_course_with_content_02*^100.0 part_no:ras_course_with_content_02*^1000.0 tag_name:ras_course_with_content_02*^1000.0 part_no:ras_course_with_content_02*^1000.0 name_lower:ras_course_with_content_02^1.0 offering_template_no:ras_course_with_content_02*^1000.0 name_tokenized:ras_course_with_content_02*^1.0 description_lower:ras_course_with_content_02*^100.0 part_no:ras_course_with_content_02*^1000.0 name_lower:ras_course_with_content_02^1.0 tag_name:ras_course_with_content_02*^1000.0 part_no:ras_course_with_content_02*^1000.0 name_lower:ras_course_with_content_02^1.0 description_lower:ras_course_with_content_02*^100.0 name_tokenized:ras_course_with_content_02*^1.0 name_lower:ras_course_with_content_02^1.0 part_no:ras_course_with_content_02*^1000.0 keywords:ras_course_with_content_02*^1000.0 description_lower:ras_course_with_content_02*^100.0 name_lower:ras_course_with_content_02^1.0 description_lower:ras_course_with_content_02*^100.0 course_description_lower:ras_course_with_content_02*^100.0 tag_name:ras_course_with_content_02*^1000.0 keywords:ras_course_with_content_02*^1000.0 keywords:ras_course_with_content_02*^1000.0 tag_name:ras_course_with_content_02*^1000.0 name_tokenized:ras_course_with_content_02*^1.0 keywords:ras_course_with_content_02*^1000.0
    &facet=true
    &hl.start=1
    &group.limit=101
    &facet.method=enum
    &hl=false
    &com.saba.datastore.paging.start=1
    &f.startDate.facet.date.start=2014-11-01T00:00:00.000Z
    &debugQuery=true
    &fl=*
    &fl=score
    &f.startDate.facet.date.gap=+1MONTH
    &com.saba.datastore.security.userId=emplo0169692
    &group.field=groupbyid
    &facet.field=lrnEventType
    &facet.field=category_id_facet
    &facet.field=location_id_facet
    &facet.field=resource_type
    &facet.field=delivery_id
    &facet.field=offering_language_id
    &hl.requireFieldMatch=true
    &group.format=grouped
    &group.ngroups=true
    &com.saba.datastore.paging.rows=60
    &facet.mincount=1
    &f.startDate.facet.date.end=2015-11-30T00:00:00.000Z
    &facet.date=startDate
    &hl.end=61
    &com.saba.datastore.security.tenantId=SomeSite
    &facet.sort=index
    &group=true
    &indexKey=SomeSite:SocialSearchIndex
    &datastoreId=/127.0.0.1:8098
    &shards=

Thanks -Hussain

On Sat, Nov 8, 2014 at 12:11 PM, anil raju anillr...@gmail.com wrote:

Can anyone here provide help on this? Any further logs or environment details I can provide to help the analysis? [...]
Re: Regarding Solr and NoSQL storage integration
There is no double storage of data: the Solr index for DataStax Enterprise ignores the stored attribute and only stores the primary key data to allow the Solr document to reference the Cassandra row, which is where the data is stored. The exception would be doc values, where the data does need to be kept in the index for efficient operation of Lucene and Solr, but that would only be done for fields such as facet fields and is under the complete control of the developer. DataStax Enterprise also utilizes an indexing queue so that Cassandra inserts and updates can occur at full speed, with indexing in a background thread, maximizing ingestion performance.

-- Jack Krupansky

-----Original Message----- From: andrey prokopenko Sent: Friday, November 7, 2014 5:00 AM To: solr-user@lucene.apache.org Subject: Re: Regarding Solr and NoSQL storage integration

Thanks for the reply. I've considered DataStax, but dropped it, first due to the commercial model they're using and second due to the integration model they chose for Cassandra. In their docs (found here: http://www.datastax.com/docs/datastax_enterprise3.1/solutions/dse_search_load_data), they do not disclose the architecture and details of their integration solution, yet examination of the Solr configuration and handlers from their distribution package reveals that they essentially let the docs rest both in the Solr index and in Cassandra storage. To safely propagate documents to Cassandra on each Solr index update, they use their own update handler + custom update log. In my opinion, this is not very efficient, because it doubles doc storage and leaves the Solr index as heavy as it currently is. My approach completely delegates stored-field storage to the NoSQL database, using a user-defined unique key. This lets users quickly do partial updates of stored but non-indexed fields, and greatly reduces the time required for replication in case of heavy write load.

On Wed, Nov 5, 2014 at 4:04 PM, Alexandre Rafalovitch arafa...@gmail.com wrote:

On 5 November 2014 08:52, andrey prokopenko andrey4...@gmail.com wrote:

I assume there might be other developers trying to solve similar problems, so I'd be interested to hear about similar attempts and issues encountered while trying to implement such an integration between Solr and other NoSQL databases.

I think DataStax does Solr+Cassandra and Cloudera does Solr+Hadoop, with the underlying content stored in the databases. Also Neo4j has graph+search integration, but I think it's directly using the Lucene engine, not Solr. Disclaimer: this is a very high-level understanding; hopefully other people can confirm.

Regards, Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853
Re: Term count in multivalue fields
"While indexing, add a field containing the number" isn't suitable for my case. I can't add a new field and re-index.