RE: Question about dates and SolrJ
In 3.6.1 I also got back a Date instance; as of 4.0 I receive a String instead. I don't like this, but I have adapted my software now. Is there no way to change this behavior in the config?

-Original Message-
From: Shawn Heisey [mailto:s...@elyograg.org]
Sent: Sunday, January 13, 2013 7:53 AM
To: solr-user@lucene.apache.org
Subject: Re: Question about dates and SolrJ

On 1/12/2013 7:51 PM, Jack Park wrote: My work engages SolrJ, with which I send documents off to Solr 4, which stores them properly, as viewed in the admin panel, as in this example: 2013-02-04T02:11:39.995Z. When I retrieve a document with that date, I use the SolrDocument returned as a Map<String,Object>, in which the date now looks like this: Sun Feb 03 18:11:39 PST 2013. I am thinking that I am missing something in the SolrJ configuration, though it could be in how I structure the query; for now, here is the simplistic way I set up SolrJ: HttpSolrServer server = new HttpSolrServer(solrURL); server.setParser(new XMLResponseParser()); Is there something I am missing to retain dates as Solr stores them?

Quick note: setting the parser is NOT necessary unless you are trying to connect radically different versions of Solr and SolrJ (1.x and 3.x/later, to be precise), and it will in fact make SolrJ slightly slower when contacting Solr. Just let it use the default javabin parser -- it's faster. If your date field in Solr is an actual date type, then you should be getting back a Date object in Java which you can manipulate in all the usual Java ways. The format that you are seeing matches the toString() output from a Date object: http://docs.oracle.com/javase/6/docs/api/java/util/Date.html#toString%28%29 You'll almost certainly have to cast the object so it's the right type: Date dateField = (Date) doc.get(datefieldname); Thanks, Shawn
SolrJ | Atomic Updates | How does it work exactly?
I have very big documents in the index. I want to update a multi-valued field of a document without loading the whole document. How can I do this? Is there good documentation somewhere? Regards -- View this message in context: http://lucene.472066.n3.nabble.com/SolrJ-Atomic-Updates-How-works-exactly-tp4032976.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: CoreAdmin STATUS performance
Thanks for sharing this info, Per - it may prove to be valuable for me in the future. Shahar.

-Original Message-
From: Per Steffensen [mailto:st...@designware.dk]
Sent: Thursday, January 10, 2013 6:10 PM
To: solr-user@lucene.apache.org
Subject: Re: CoreAdmin STATUS performance

The collections are created dynamically - not on update, though. We use one collection per month, and we have a timer-job running (every hour or so) which checks whether all collections that need to exist actually do exist - if not, it creates the collection(s). The rule is that the collection for next month has to exist as soon as we enter the current month, so the first time the timer-job runs on e.g. July 1st it will create the August collection. We never get data with a timestamp in the future, so as long as the timer-job gets to run once within every month, we will always have the needed collections ready. We create collections using the new Collection API in Solr. We used to manage creation of every single shard/replica/core of the collections through the Core Admin API, but since a Collection API was introduced we decided that we'd better use that. In 4.0 it did not have the features we needed, which triggered SOLR-4114, SOLR-4120 and SOLR-4140, which will be available in 4.1. With those features we are now using the Collection API. BTW, our timer-job also handles deletion of old collections. In our system you can configure how many historic month-collections to keep before it is OK to delete them. Let's say this is configured to 3: as soon as it becomes July 1st, the timer-job will delete the March collection (the historic collections to keep will just have become the April, May and June collections). This way we always have at least 3 months of historic data, and late in a month close to 4 months of history. It does not matter that we have a little too much history, as long as we do not go below the lower limit on the length of historic data.
We also use the new Collection API for deletion. Regards, Per Steffensen

On 1/10/13 3:04 PM, Shahar Davidson wrote: Hi Per, Thanks for your reply! That's a very interesting approach. In your system, how are the collections created? In other words, are the collections created dynamically upon an update (for example, per new day)? If they are created dynamically, who handles their creation (client/server) and how is it done? I'd love to hear more about it! Appreciate your help, Shahar.

-Original Message-
From: Per Steffensen [mailto:st...@designware.dk]
Sent: Thursday, January 10, 2013 1:23 PM
To: solr-user@lucene.apache.org
Subject: Re: CoreAdmin STATUS performance

On 1/10/13 10:09 AM, Shahar Davidson wrote: search request, the system must be aware of all available cores in order to execute distributed search on _all_ relevant cores

For this purpose I would definitely recommend that you go SolrCloud. Furthermore, we do something extra: we have several collections, each containing data from a specific period in time - the timestamp of incoming data decides which collection it is indexed into. One important search criterion for our clients is search on a timestamp interval, so most searches can be restricted to only consider a subset of all our collections. Instead of putting the logic that calculates the subset of collections to search (given the timestamp search interval) in the clients, we just let clients do dumb searches by giving the timestamp interval. The subset of collections to search is calculated on the server side from the timestamp interval in the search query. We handle this in a Solr SearchComponent which we place early in the chain of SearchComponents. Maybe you can get some inspiration from this approach, if it is relevant for you. Regards, Per Steffensen
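The rolling-window logic Per describes above (create next month's collection ahead of time, delete month-collections older than a configured keep window) can be sketched roughly like this. The class and the collection-naming convention are my assumptions, and the actual create/delete calls to the Collection API are left out:

```java
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Calendar;
import java.util.List;

// Hypothetical sketch of the timer-job logic: given "now" and a configured
// number of historic months to keep, compute which month-collections must
// exist and which one is old enough to delete.
public class CollectionCalendar {
    private static final SimpleDateFormat FMT = new SimpleDateFormat("yyyyMM");

    static String collectionFor(Calendar month) {
        return "collection_" + FMT.format(month.getTime());
    }

    // The collection for next month must exist as soon as we enter the
    // current month, so both are always required.
    static List<String> requiredCollections(Calendar now) {
        List<String> required = new ArrayList<>();
        Calendar c = (Calendar) now.clone();
        required.add(collectionFor(c));
        c.add(Calendar.MONTH, 1);
        required.add(collectionFor(c));
        return required;
    }

    // With keepMonths = 3, on July 1st the March collection becomes deletable
    // (April, May and June remain as the kept history).
    static String oldestDeletable(Calendar now, int keepMonths) {
        Calendar c = (Calendar) now.clone();
        c.add(Calendar.MONTH, -(keepMonths + 1));
        return collectionFor(c);
    }
}
```

A real timer-job would run this every hour, issue a Collection API CREATE for any required collection that does not exist, and a DELETE for collections at or before the oldest deletable month.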
RE: CoreAdmin STATUS performance
Shawn, Per and anyone else who has participated in this thread - thank you! I have finally resorted to applying a minor patch to the Solr code. I noticed that most of the time of the STATUS request is spent collecting index-related info (such as segmentCount, sizeInBytes, numDocs, etc.). In the STATUS request I added support for a new parameter which, if present, will skip collection of the index info (hence it will only return general static info, among it the core name) - this, in fact, cuts the request time down by two orders of magnitude! In my case, it decreased the request time from around 800ms to around 1ms-4ms. Regards, Shahar.

-Original Message-
From: Shawn Heisey [mailto:s...@elyograg.org]
Sent: Thursday, January 10, 2013 5:14 PM
To: solr-user@lucene.apache.org
Subject: Re: CoreAdmin STATUS performance

On 1/10/2013 2:09 AM, Shahar Davidson wrote: As for your first question, the core info needs to be gathered upon every search request because cores are created dynamically. When a user initiates a search request, the system must be aware of all available cores in order to execute distributed search on _all_ relevant cores. (The user must get reliable and the most up-to-date data.) The reason that 800ms seems a lot to me is that the overall execution time takes about 2500ms, and a large part of it is due to the STATUS request. The minimal-interval concept is a good idea and indeed we've considered it, yet it poses a slight problem when building an RT system which needs to return the most up-to-date data. I am just trying to understand if there's some other way to hasten the STATUS reply (for example, by asking the STATUS request to return just certain core attributes, such as name, instead of collecting everything).

Are there a *huge* number of SolrJ clients in the wild, or is it something like a server farm where you are in control of everything?
If it's the latter, what I think I would do is have an asynchronous thread that periodically (every few seconds) updates the client's view of what cores exist. When a query is made, it will use that information, speeding up your queries by 800 milliseconds and ensuring that new cores will not have long delays before they become searchable. If you have a huge number of clients in the wild, it would still be possible, but ensuring that those clients get updated might be hard. If you also delete cores as well as add them, that complicates things. You'd have to have the clients be smart enough to exclude the last core on the list (by whatever sorting mechanism you require), and you'd have to wait long enough (30 seconds, maybe?) before *actually* deleting the last core to be sure that no clients are accessing it. Or you could use SolrCloud, as Per suggested, but with 4.1, not the released 4.0. SolrCloud manages your cores for you automatically. You'd probably be using a slightly customized SolrCloud, including the custom hashing capability added by SOLR-2592. I don't know what other customizations you might need. Thanks, Shawn
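Shawn's background-refresh idea can be sketched minimally as below. All names are hypothetical; the fetch itself is abstracted behind a Supplier, which in practice would call the CoreAdmin STATUS API and parse out the core names:

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical sketch: keep a cached view of the core list, refreshed in the
// background, so individual queries avoid the ~800ms STATUS round-trip.
public class CoreListCache {
    private final Supplier<List<String>> statusFetcher; // e.g. a CoreAdmin STATUS call
    private volatile List<String> cores = Collections.emptyList();

    public CoreListCache(Supplier<List<String>> statusFetcher) {
        this.statusFetcher = statusFetcher;
    }

    // Re-fetch the core list; volatile write publishes it to query threads.
    public void refresh() {
        cores = statusFetcher.get();
    }

    public void start(ScheduledExecutorService pool, long periodSeconds) {
        pool.scheduleAtFixedRate(this::refresh, 0, periodSeconds, TimeUnit.SECONDS);
    }

    // Queries use the cached view; as discussed above, a caller that also
    // deletes cores may want to exclude the newest core from this list.
    public List<String> currentCores() {
        return cores;
    }
}
```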
Re: CoreAdmin STATUS performance
Shahar, would you mind if I ask you to open a JIRA issue for that, attaching your changes as a typical patch? Perhaps we could use that for the UI, in those cases where we don't need the full set of information. Stefan

On Sunday, January 13, 2013 at 12:28 PM, Shahar Davidson wrote: Shawn, Per and anyone else who has participated in this thread - thank you! I have finally resorted to applying a minor patch to the Solr code. I noticed that most of the time of the STATUS request is spent collecting index-related info (such as segmentCount, sizeInBytes, numDocs, etc.). In the STATUS request I added support for a new parameter which, if present, will skip collection of the index info (hence it will only return general static info, among it the core name) - this, in fact, cuts the request time down by two orders of magnitude! In my case, it decreased the request time from around 800ms to around 1ms-4ms. Regards, Shahar.
Re: Problems running Solr 4.0 in WebSphere 7
This looks like you're getting old jars somewhere in your classpath on the WebSphere box. I know some classes have moved around between 3.6 and 4.0. It's tedious, but take a look at your Solr log file; you should see a bunch of messages about what jars are being loaded. Do all of them look correct? Best, Erick

On Thu, Jan 10, 2013 at 3:46 PM, Riad I.A riad...@hotmail.com wrote: I'm trying to run Solr 4.0 on WebSphere 7 and have problems starting Solr. I tried with Solr 3.6 and everything's OK, as I can access the admin UI. For Solr 4.0, when I try to access the admin page I get a ClassNotFoundException on solr.WhitespaceTokenizerFactory. I noticed that many analyzer classes were moved from solr-core to the lucene-core analysis packages. To solve the problem I replaced the shorthand with org.apache.lucene.analysis.core in the schema.xml file and the problem disappeared, but then I got another ClassNotFoundException about another TokenizerFactory or filter, and again replaced it with the right package, and so forth. Strangely I didn't have those kinds of errors under Tomcat using the same WAR! Is there any special config needed to get it to run on WebSphere 7? Thanks for your help
Re: Suggestion that preserves original phrase case
One way I've seen this done is to index pairs like lowercaseversion:LowerCaseVersion. You can't push this whole thing through your field as defined, since it'll all be lowercased; you have to produce the left-hand side of the above yourself and just use KeywordTokenizer without LowerCaseFilter. Then your application displays the right-hand side of the returned token. Simple solution, not very elegant, but sometimes the easiest... Best, Erick

On Fri, Jan 11, 2013 at 1:30 AM, Selvam s.selvams...@gmail.com wrote: Hi, I have been trying to figure out a way to do case-insensitive suggestion which returns the original phrase as the result. I am using Solr 3.5. For example: if I index 'Hello world' and search for 'hello', it needs to return 'Hello world', not 'hello world'. My configurations are as follows.

New field type:

<fieldType class="solr.TextField" name="text_auto">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Field definitions:

<field name="label" type="text" indexed="true" stored="true" termVectors="true" omitNorms="true"/>
<field name="label_autocomplete" type="text_auto" indexed="true" stored="true" multiValued="false"/>
<copyField source="label" dest="label_autocomplete"/>

Spellcheck component:

<searchComponent name="suggest" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">text_auto</str>
  <lst name="spellchecker">
    <str name="name">suggest</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
    <str name="buildOnOptimize">true</str>
    <str name="buildOnCommit">true</str>
    <str name="field">label_autocomplete</str>
  </lst>
</searchComponent>

Kindly share your suggestions to implement this behavior. -- Regards, Selvam KnackForge http://knackforge.com Acquia Service Partner No. 1, 12th Line, K.K. Road, Venkatapuram, Ambattur, Chennai, Tamil Nadu, India. PIN - 600 053.
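The pair trick Erick describes can be sketched as below. The ':' separator and the helper names are my assumptions; the application would index toIndexedToken("Hello world") into the suggest field (analyzed with KeywordTokenizer and no LowerCaseFilter, so the token is matched case-insensitively via its lowercased left-hand side) and run displayForm() on each returned suggestion:

```java
import java.util.Locale;

// Sketch of case-preserving suggestion pairs: the left-hand side of the
// token is what gets matched, the right-hand side is what gets displayed.
public class SuggestPair {
    // Built by the application before indexing, e.g. "hello world:Hello world".
    static String toIndexedToken(String original) {
        return original.toLowerCase(Locale.ROOT) + ":" + original;
    }

    // Applied by the application to each suggestion returned by Solr.
    static String displayForm(String indexedToken) {
        int sep = indexedToken.indexOf(':');
        return sep < 0 ? indexedToken : indexedToken.substring(sep + 1);
    }
}
```

One caveat of this scheme: the separator must be a character that cannot occur in the phrases themselves, which is why it is flagged here as an assumption.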
Re: SolrCloud removing shard (how to not lose data)
I don't think this will work in the long run with Solr 4 (I'm not sure whether you're using it or not). Solr 4 will assign updates to a shard based on a hash of the uniqueKey. So let's say you have docs on your original three shards:
shard 1 has docs 1, 4, 7
shard 2 has docs 2, 5, 8
shard 3 has docs 3, 6, 9
Now you merge shards 2 and 3, and you have
shard 1 - 1, 4, 7
shard 2 - 2, 3, 5, 6, 8, 9
Now if you update docs 1 or 2, everything's fine. But if you re-index doc 3, it'll be assigned to shard 1. Now you have two live documents on different shards with the same ID. You'll get both back for searches, one will be stale, etc. This is a Bad Thing. And even if you're on 3.x and assigning docs to shards yourself, you now have pretty unbalanced shards; shard 2 is twice as big as shard 1. NOTE: the actual doc-shard assignment is NOT a simple round-robin; this is just for illustration. Unless re-indexing is _really_ expensive, I'd just count on re-indexing when changing the number of shards, at least until shard splitting is in place for Solr 4. And I'm not sure shard splitting will also handle shard merging; I'd check before assuming so... Best, Erick

On Fri, Jan 11, 2013 at 8:47 AM, mizayah miza...@gmail.com wrote: Seems I'm too lazy. I found this http://wiki.apache.org/solr/MergingSolrIndexes, and it really works. -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-removing-shard-how-to-not-loose-data-tp4032138p4032508.html Sent from the Solr - User mailing list archive at Nabble.com.
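Erick's point can be seen with a toy routing function. This is an illustration only - Solr's actual router hashes the uniqueKey (with a murmur hash over hash ranges), not String.hashCode() modulo the shard count - but the failure mode is the same: the target shard depends on the number of shards, so changing the shard count silently re-routes existing keys.

```java
// Toy stand-in for hash-based document routing. Illustration only: Solr's
// real router uses a murmur hash of the uniqueKey over hash ranges.
public class ShardRouting {
    static int shardFor(String uniqueKey, int numShards) {
        // floorMod keeps the result non-negative for negative hash codes
        return Math.floorMod(uniqueKey.hashCode(), numShards);
    }
}
```

With three shards, the key "3" routes to shard 0; collapse to two shards and the same key routes to shard 1, so a re-index creates a second live copy while the old one on the merged shard goes stale.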
Re: How to disable\clear filterCache (from SolrIndexSearcher) in a custom searchComponent
I admit I only skimmed your post, and at the level you're at I'm not sure how to hook it in, but see https://issues.apache.org/jira/browse/SOLR-2429 (Solr 3.4), which allows you to specify {!cache=false} on a filter query (e.g. fq={!cache=false}field:value), which will specifically NOT put the filter into the filter cache at the Solr level. Best, Erick

On Fri, Jan 11, 2013 at 11:24 AM, radu radu.moldo...@atigeo.com wrote: Hello, thank you in advance for your help!

Context: I have implemented a custom search component that receives 3 parameters: field, termValue and payloadX. The component should search for termValue in the requested Lucene field and, for each termValue, check payloadX against the information in its associated payload.

Constraints: I don't want to disable the filterCache in solrconfig.xml (the <filterCache class="solr.FastLRUCache" .../> entry), since I have other search components that could use the filterCache. I have implemented the payload search using SpanTermQuery and attached it to q:field=termValue.

public class MySearchComponent extends XPatternsSearchComponent {
  public void prepare(ResponseBuilder rb) {
    ...
    rb.setQueryString(parameters.get(CommonParams.Q))...
  }
  public void process(ResponseBuilder rb) {
    ...
    SolrIndexSearcher.QueryResult queryResult = new SolrIndexSearcher.QueryResult(); // ??? question for help
    // search for the payload criteria in a specific field for a specific term
    CustomSpanTermQuery customFilterQuery = new CustomSpanTermQuery(field, term, payload);
    QueryCommand queryCommand = rb.getQueryCommand().setFilterList(customFilterQuery);
    rb.req.getSearcher().search(queryResult, queryCommand);
    ...
  }
}

Issue: If I call the search component with field1, termValue1 and:
- payload1 (the first search): the result of the filtering is saved in the filterCache.
- payload2 (the second time): the results of the first search (from the filterCache) are returned, not the different expected result set.

Findings: I noticed that in SolrIndexSearcher, filterCache is private, so I cannot change\clear it through inheritance. I also tried to use rb.getQueryCommand().replaceFlags(), but SolrIndexSearcher.NO_CHECK_FILTERCACHE|NO_CHECK_QCACHE|NO_SET_QCACHE are not public either.

Question: How can I disable\clear the filterCache (from SolrIndexSearcher) only for a custom search component? Do I have other options\approaches? Best regards, Radu
Re: Solr 4.0, slow opening searchers
In addition to Alan's comment: are you doing any warmup queries? Your Solr logs should show you some interesting stats, and the admin page also has some stats about warmup times. Although I'd expect similar issues when reopening searchers if it were just warmup queries. But 267M docs on a single machine (spread over 9 cores or not) is quite a lot (depending, of course, on how beefy the machine is and the characteristics of your corpus). It's possible you're just I/O bound at startup, experiencing memory pressure, etc. - that is, your index is just too large for your hardware. I've seen machines vary from 10M to 300M docs being reasonable. FWIW, Erick

On Fri, Jan 11, 2013 at 12:31 PM, Alan Woodward a...@flax.co.uk wrote: Hi Marcel, Are you committing data with hard commits or soft commits? I've seen systems where we've inadvertently only used soft commits, which means that the entire transaction log has to be re-read on startup, which can take a long time. Hard commits flush indexed data to disk and make it a lot quicker to restart. Alan Woodward a...@flax.co.uk

On 11 Jan 2013, at 13:51, Marcel Bremer wrote: Hi, We're experiencing slow startup times of searchers in Solr when it contains a large number of documents. We use Solr v4.0 with Jetty and currently have 267.657.634 documents stored, spread across 9 cores. These documents contain keywords, with additional statistics, which we are using for suggestions and related keywords. When we (re)start Solr on one of our servers it can take up to two hours before Solr has opened all of its searchers and starts accepting connections again. We can't figure out why it takes so long to open those searchers. Also, the CPU and memory usage of Solr while opening searchers is not extremely high. Are there any known issues or tips someone could give us to speed up opening searchers? If you need more details, please ping me. Best regards, Marcel Bremer Vinden.nl BV
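Alan's hard-commit point is usually addressed with an autoCommit block in solrconfig.xml; the values below are illustrative, not a recommendation for this particular setup. A periodic hard commit with openSearcher=false keeps the transaction log short (so it is not replayed at startup) without paying the cost of opening a new searcher each time:

```xml
<!-- solrconfig.xml (illustrative values): hard-commit at most every 60s
     without opening a new searcher; soft commits handle visibility -->
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>60000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
</updateHandler>
```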
Re: SolrJ | Atomic Updates | How does it work exactly?
Atomic updates work by storing (stored="true") all the fields (note, you don't have to set stored="true" for the destinations of copyField). Anyway, when you use the atomic update syntax, under the covers Solr reads all the stored fields out, re-assembles the document and re-indexes it. So your index may be significantly larger. Also note that in the 4.1 world, stored fields are automatically compressed, so this may not be so much of a problem. And there have been at least one or two fixes to this since 4.0 as I remember, so you might want to wait for 4.1 to experiment (there's talk of cutting RC1 for Solr 4.1 early next week) or use a nightly build. Best, Erick

On Sun, Jan 13, 2013 at 3:43 AM, uwe72 uwe.clem...@exxcellent.de wrote: I have very big documents in the index. I want to update a multi-valued field of a document without loading the whole document. How can I do this? Is there good documentation somewhere? Regards
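For the "how do I send one" part of the question: in Solr 4 an atomic update is an ordinary update request in which field values are wrapped in an operation map ("set", "add", "inc") instead of being plain values. The id and field names below are hypothetical:

```json
[
  {"id": "doc1",
   "my_multivalued_field": {"add": "another value"}}
]
```

POST this to the update handler (e.g. http://localhost:8983/solr/update with Content-Type: application/json). Only the id and the fields being changed travel over the wire, though, as Erick explains above, Solr still re-reads all stored fields server-side to rebuild the document.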
Re: Index sharing between multiple slaves
Sorry, I should have shared more info. We're planning to have the index files on a NAS and share these index files across multiple nodes. We have 4 slave nodes; for redundancy we might have 2 nodes per shared index. Any issues you foresee with this? I will post details once we test this. Cheers, -- View this message in context: http://lucene.472066.n3.nabble.com/Index-sharing-between-multiple-slaves-tp4025996p4033006.html Sent from the Solr - User mailing list archive at Nabble.com.
How to manage solr cloud collections-sharding?
Hi, I know a few questions on this issue have already been posted, but I didn't find full answers in any of those posts. I'm using Solr 4.0.0. I need my Solr cluster to have multiple collections, each collection with a different configuration (at least a different schema.xml file). I followed the SolrCloud tutorial page and executed this command:

java -Dbootstrap_confdir=./solr/collection1/conf -Dcollection.configName=myconf -DzkRun -DnumShards=5 -jar start.jar

When I start the Solr servers I have collection1 in clusterstate.json with each node assigned to some shard. Questions so far:
1. Is this first command 100% necessary?
2. Do I have to define the number of shards before starting the Solr instances?
3. What if I want to add a shard after I have started all Solr instances but haven't indexed yet?
4. What if I want to add a shard after indexing?
5. What role does clusterstate.json play? Is it just a JSON file to show in the GUI, or is it the only file that persists the current state of the cluster?
6. Can I edit it manually? Should I?

I added another schema-B.xml file to ZooKeeper and opened another collection using the CoreAdmin REST API. I want this collection to have 10 shards, not 5 as I defined for the previous collection. So I run

http://server:port/solr/admin/cores?action=CREATE&name=coreX&instanceDir=path_to_instance_directory&config=config_file_name.xml&schema=schema_file_name.xml&dataDir=data&shard=shard_id

10 times, with a different shard id each run. Questions:
1. Is this an appropriate way to use the CoreAdmin API? Should I specify the shard id? I do it because it gives me a way to control the number of shards (each new shard id creates a new shard), but should I use it this way?
2. Can I have a different number of shards in different collections on the same cluster?
3. If yes - then what is the purpose of the first bootstrap command?

Another question: I saw that in the 4.1 version, each shard has another parameter - range. What is this parameter used for? Would I have to re-index when upgrading from 4.0 to 4.1? This will help a lot in understanding the whole collection-sharding architecture in SolrCloud. Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-manage-solr-cloud-collections-sharding-tp4033009.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: NULL POINTER EXCEPTION WITH SOLR SUGGESTER
What URL did you use? What is your data like? I tried your exact config but with the field name of name rather than spell_check, using the Solr 4.0 example. Then I added the following data:

curl http://localhost:8983/solr/update?commit=true -H 'Content-type:application/csv' -d '
id,name
sug-1,aardvark abacus ball bill cat cello
sug-2,abate accord band bell cattle check
sug-3,adorn border clean clock'

Then I issued a suggest request using curl and got the expected response:

curl "http://localhost:8983/solr/suggest?q=b&indent=true"
<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1</int>
  </lst>
  <lst name="spellcheck">
    <lst name="suggestions">
      <lst name="b">
        <int name="numFound">5</int>
        <int name="startOffset">0</int>
        <int name="endOffset">1</int>
        <arr name="suggestion">
          <str>ball</str>
          <str>band</str>
          <str>bell</str>
          <str>bill</str>
          <str>border</str>
        </arr>
      </lst>
      <str name="collation">ball</str>
    </lst>
  </lst>
</response>

So, try that simple example first and make sure it works for you, then see what else is different in your failing scenario. -- Jack Krupansky

-Original Message- From: obi240 Sent: Saturday, January 12, 2013 12:15 PM To: solr-user@lucene.apache.org Subject: NULL POINTER EXCEPTION WITH SOLR SUGGESTER

Hi, I'm currently working with Solr 4.
I tried calling my suggester feature and got the error below (HTTP 500):

java.lang.NullPointerException
at org.apache.lucene.search.suggest.fst.FSTCompletionLookup.lookup(FSTCompletionLookup.java:237)
at org.apache.solr.spelling.suggest.Suggester.getSuggestions(Suggester.java:190)
at org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:172)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:206)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1699)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:455)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:276)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:224)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:169)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:100)
at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:929)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:405)
at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:964)
at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:515)
at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:302)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)

My suggest searchComponent and request handler are as below:

<searchComponent class="solr.SpellCheckComponent" name="suggest">
  <lst name="spellchecker">
    <str name="name">suggest</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.FSTLookup</str>
    <str name="field">spell_check</str>
    <float name="threshold">0.005</float>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>

<requestHandler class="org.apache.solr.handler.component.SearchHandler" name="/suggest">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">suggest</str>
    <str name="spellcheck.onlyMorePopular">true</str>
    <str name="spellcheck.count">5</str>
    <str name="spellcheck.collate">true</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>

Can anyone point out what I'm doing wrong here? Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/NULL-POINTER-EXCEPTION-WITH-SOLR-SUGGESTER-tp4032763.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Index sharing between multiple slaves
It can work, I believe. However, it is not normal Solr usage, so you are less likely to find people who can support you with it. Upayavira

On Sun, Jan 13, 2013, at 03:59 PM, suri wrote: Sorry, I should have shared more info. Planning to have index files on a NAS and share these index files across multiple nodes. We have 4 slave nodes. For redundancy we might be having 2 nodes per shared index. Any issues you foresee with this? I will post details once we test this. Cheers,
RE: SolrJ | Atomic Updates | How does it work exactly?
Thanks Erick. The main reason I want to use atomic updates is to speed up updating existing, rather large documents. So if under the covers everything is the same (loading the whole doc, updating it, re-indexing the whole doc), it is not interesting for me anymore. What is the most performant way to update a large document? Any recommendations? Thanks!

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Sunday, January 13, 2013 4:53 PM
To: solr-user@lucene.apache.org
Subject: Re: SolrJ | Atomic Updates | How works exactly?

Atomic updates work by storing (stored=true) all the fields (note, you don't have to set stored=true for the destinations of copyField). Anyway, when you use the atomic update syntax under the covers Solr reads all the stored fields out, re-assembles the document and re-indexes it. So your index may be significantly larger. Also note that in the 4.1 world, stored fields are automatically compressed so this may not be so much of a problem. And, there's been at least 1 or 2 fixes to this since 4.0 as I remember, so you might want to wait for 4.1 to experiment with (there's talk of cutting RC1 for Solr 4.1 early next week) or use a nightly build. Best Erick
Re: SolrJ | Atomic Updates | How works exactly?
On Sun, Jan 13, 2013 at 1:51 PM, Uwe Clement <uwe.clem...@exxcellent.de> wrote:
> What is the most performant way to update a large document?

That *is* the best way to update a large document that we currently have. Although it re-indexes under the covers, it ensures that the update is atomic, and it's faster because it does everything in a single request.

-Yonik
http://lucidworks.com
AW: SolrJ | Atomic Updates | How works exactly?
Thanks Yonik. Is this already working well in Solr 4.0, or is it better to wait for Solr 4.1?

-----Original Message-----
From: ysee...@gmail.com [mailto:ysee...@gmail.com] on behalf of Yonik Seeley
Sent: Sunday, 13 January 2013, 20:24
To: solr-user@lucene.apache.org
Subject: Re: SolrJ | Atomic Updates | How works exactly?

> On Sun, Jan 13, 2013 at 1:51 PM, Uwe Clement <uwe.clem...@exxcellent.de> wrote:
>> What is the most performant way to update a large document?
>
> That *is* the best way to update a large document that we currently have. Although it re-indexes under the covers, it ensures that it's atomic, and it's faster because it does everything in a single request.
>
> -Yonik
> http://lucidworks.com
Re: Question about dates and SolrJ
Thanks Shawn. I stopped setting the parser as suggested. I found that what I had to do is just store Date objects in my documents and then, at the last minute, when building a SolrDocument to send, convert them with DateField. When I export to XML, I export the DateField string, then convert the Zulu string back to a Date object as needed. Seems to be working fine now.

Many thanks,
Jack

On Sat, Jan 12, 2013 at 10:52 PM, Shawn Heisey <s...@elyograg.org> wrote:
> On 1/12/2013 7:51 PM, Jack Park wrote:
>> My work engages SolrJ, with which I send documents off to Solr 4, which stores them properly, as viewed in the admin panel, as this example: 2013-02-04T02:11:39.995Z
>> When I retrieve a document with that date, I use the SolrDocument returned as a Map<String,Object>, in which the date now looks like this: Sun Feb 03 18:11:39 PST 2013
>> I am thinking that I am missing something in the SolrJ configuration, though it could be in how I structure the query; for now, here is the simplistic way I set up SolrJ:
>> HttpSolrServer server = new HttpSolrServer(solrURL);
>> server.setParser(new XMLResponseParser());
>> Is there something I am missing to retain dates as Solr stores them?
>
> Quick note: setting the parser is NOT necessary unless you are trying to connect radically different versions of Solr and SolrJ (1.x and 3.x/later, to be precise), and will in fact make SolrJ slightly slower when contacting Solr. Just let it use the default javabin parser -- it's faster.
>
> If your date field in Solr is an actual date type, then you should be getting back a Date object in Java, which you can manipulate in all the usual Java ways. The format that you are seeing matches the toString() output from a Date object:
> http://docs.oracle.com/javase/6/docs/api/java/util/Date.html#toString%28%29
>
> You'll almost certainly have to cast the object so it's the right type:
> Date dateField = (Date) doc.get(datefieldname);
>
> Thanks,
> Shawn
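[Editorial aside] To illustrate the round trip Jack describes: Solr stores dates in the ISO-8601 "Zulu" form shown above (e.g. 2013-02-04T02:11:39.995Z), while Date.toString() produces the local-timezone form. A small sketch of converting between java.util.Date and the Zulu string using only JDK classes; the pattern is an assumption based on the example in the thread:

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class SolrDateFormat {

    // Pattern for Solr's Zulu form, e.g. 2013-02-04T02:11:39.995Z.
    // SimpleDateFormat is not thread-safe, so a fresh instance per call.
    private static SimpleDateFormat zulu() {
        SimpleDateFormat f = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'");
        f.setTimeZone(TimeZone.getTimeZone("UTC")); // the trailing Z means UTC, not local time
        return f;
    }

    // Date -> Solr string, as when building a document to send.
    static String toSolr(Date d) {
        return zulu().format(d);
    }

    // Solr string -> Date, as when reading the Zulu string back from XML.
    static Date fromSolr(String s) throws ParseException {
        return zulu().parse(s);
    }

    public static void main(String[] args) throws ParseException {
        String zuluForm = toSolr(new Date(0L));
        System.out.println(zuluForm);                      // prints 1970-01-01T00:00:00.000Z
        System.out.println(fromSolr(zuluForm).getTime());  // prints 0
    }
}
```

SolrJ's default javabin parser already returns java.util.Date for date-typed fields, so the cast Shawn shows is usually all that's needed when querying; explicit string conversion like this only comes up when serializing to XML or similar by hand.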
SolrCloud sort inconsistency
How is it possible that this sorted query returns different results? The highest value is the id P2450024023, but sometimes the value returned is not the highest. Below is an example; the second and third curl requests show the correct result. NOTE: I ran the queries while an indexing process was running.

➜ ~ curl -H "Cache-Control: no-cache" http://192.168.1.241:8983/solr/ST-SHARD_0212/query\?q\=id:\*\&rows\=10\&fl\=id\=sort\=id%20desc\&cache\=False
{
  "responseHeader":{
    "status":0,
    "QTime":5,
    "params":{
      "cache":"False",
      "rows":"10",
      "fl":"id=sort=id desc",
      "q":"id:*"}},
  "response":{"numFound":2387312,"start":0,"maxScore":1.0,"docs":[
      {"id":"P2443605077"},
      {"id":"P2443588094"},
      {"id":"P2443647855"},
      {"id":"P2443613193"},
      {"id":"P2443572098"},
      {"id":"P2443562507"},
      {"id":"P2443643935"},
      {"id":"P2443556464"},
      {"id":"P2443625267"},
      {"id":"P2443580781"}]
  }}

➜ ~ curl -H "Cache-Control: no-cache" http://192.168.1.241:8983/solr/ST-SHARD_0212/query\?q\=id:\*\&rows\=10\&fl\=id\=sort\=id%20desc\&cache\=False
{
  "responseHeader":{
    "status":0,
    "QTime":4,
    "params":{
      "cache":"False",
      "rows":"10",
      "fl":"id=sort=id desc",
      "q":"id:*"}},
  "response":{"numFound":2387312,"start":0,"maxScore":1.0,"docs":[
      {"id":"P2450024023"},
      {"id":"P2450017490"},
      {"id":"P2450062568"},
      {"id":"P2450053498"},
      {"id":"P2449990839"},
      {"id":"P2449973572"},
      {"id":"P2449957535"},
      {"id":"P2450099098"},
      {"id":"P2450090195"},
      {"id":"P2450072528"}]
  }}

➜ ~ curl -H "Cache-Control: no-cache" http://192.168.1.241:8983/solr/ST-SHARD_0212/query\?q\=id:\*\&rows\=10\&fl\=id\=sort\=id%20desc\&cache\=False
{
  "responseHeader":{
    "status":0,
    "QTime":6,
    "params":{
      "cache":"False",
      "rows":"10",
      "fl":"id=sort=id desc",
      "q":"id:*"}},
  "response":{"numFound":2387312,"start":0,"maxScore":1.0,"docs":[
      {"id":"P2450024023"},
      {"id":"P2450017490"},
      {"id":"P2450062568"},
      {"id":"P2450053498"},
      {"id":"P2449990839"},
      {"id":"P2449973572"},
      {"id":"P2449957535"},
      {"id":"P2450099098"},
      {"id":"P2450090195"},
      {"id":"P2450072528"}]
  }}

➜ ~

- Best regards
--
View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-sort-inconsistency-tp4033046.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: SolrJ | Atomic Updates | How works exactly?
This is present in 4.0. Not sure if there are any improvements in 4.1.

Upayavira

On Sun, Jan 13, 2013, at 07:35 PM, Uwe Clement wrote:
> Thanks Yonik. Is this already working well in Solr 4.0, or is it better to wait for Solr 4.1?
>
> -----Original Message-----
> From: ysee...@gmail.com [mailto:ysee...@gmail.com] on behalf of Yonik Seeley
> Sent: Sunday, 13 January 2013, 20:24
> To: solr-user@lucene.apache.org
> Subject: Re: SolrJ | Atomic Updates | How works exactly?
>
>> On Sun, Jan 13, 2013 at 1:51 PM, Uwe Clement <uwe.clem...@exxcellent.de> wrote:
>>> What is the most performant way to update a large document?
>>
>> That *is* the best way to update a large document that we currently have. Although it re-indexes under the covers, it ensures that it's atomic, and it's faster because it does everything in a single request.
>>
>> -Yonik
>> http://lucidworks.com
Re: SolrJ | Atomic Updates | How works exactly?
There are several JIRA issues, but several of them were duplicates of the same underlying issue:

https://issues.apache.org/jira/issues/?jql=project%20%3D%20SOLR%20AND%20issuetype%20%3D%20Bug%20AND%20fixVersion%20%3D%20%224.1%22%20AND%20status%20%3D%20Resolved%20AND%20text%20~%20%22atomic%20update%22

Erik

On Jan 13, 2013, at 19:49, Upayavira wrote:
> This is present in 4.0. Not sure if there are any improvements in 4.1.
>
> Upayavira
>
> On Sun, Jan 13, 2013, at 07:35 PM, Uwe Clement wrote:
>> Thanks Yonik. Is this already working well in Solr 4.0, or is it better to wait for Solr 4.1?
>>
>> -----Original Message-----
>> From: ysee...@gmail.com [mailto:ysee...@gmail.com] on behalf of Yonik Seeley
>> Sent: Sunday, 13 January 2013, 20:24
>> To: solr-user@lucene.apache.org
>> Subject: Re: SolrJ | Atomic Updates | How works exactly?
>>
>>> On Sun, Jan 13, 2013 at 1:51 PM, Uwe Clement <uwe.clem...@exxcellent.de> wrote:
>>>> What is the most performant way to update a large document?
>>>
>>> That *is* the best way to update a large document that we currently have. Although it re-indexes under the covers, it ensures that it's atomic, and it's faster because it does everything in a single request.
>>>
>>> -Yonik
>>> http://lucidworks.com
RSS tutorial that comes with the apache-solr not indexing
Hi,

I am trying to use the RSS tutorial that comes with apache-solr. I am not sure if I missed anything, but when I do a full-import, no indexing happens. These are the steps I am taking:

1) Download apache-solr-3.6.2 (http://lucene.apache.org/solr/)
2) Start Solr with: java -Dsolr.solr.home=./example-DIH/solr/ -jar start.jar
3) Go to the URL: http://192.168.1.12:8983/solr/rss/dataimport?command=full-import
4) When I do this, it says: "Indexing completed. Added/Updated: 0 documents. Deleted 0 documents."

Now, I know that the default example gets the RSS from http://rss.slashdot.org/Slashdot/slashdot. This default feed appears empty when I view it in Chrome. It does have XML data in the source, but I am not sure if this has anything to do with the import failure. I also modified the rss-config so that I could test other RSS sources. I used http://www.feedforall.com/sample.xml and updated the rss-config.xml, but this did the same thing and did not add/update any documents.

Any help is appreciated.
--
View this message in context: http://lucene.472066.n3.nabble.com/RSS-tutorial-that-comes-with-the-apache-solr-not-indexing-tp4033067.html
Sent from the Solr - User mailing list archive at Nabble.com.
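[Editorial aside] For readers hitting the same wall: with the DataImportHandler RSS example, "Added/Updated: 0 documents" usually means the forEach XPath in the data config does not match the feed's actual structure (Slashdot's feed, for instance, is RSS 1.0/RDF rather than RSS 2.0, so /rss/channel/item paths match nothing). A rough sketch of what a data config for a plain RSS 2.0 feed like the feedforall sample might look like; the exact field names follow the shipped example only loosely:

```xml
<dataConfig>
  <dataSource type="URLDataSource" />
  <document>
    <entity name="rss"
            processor="XPathEntityProcessor"
            url="http://www.feedforall.com/sample.xml"
            forEach="/rss/channel/item">
      <!-- every field xpath must share forEach's prefix exactly;
           a mismatch silently yields zero documents -->
      <field column="title"       xpath="/rss/channel/item/title" />
      <field column="link"        xpath="/rss/channel/item/link" />
      <field column="description" xpath="/rss/channel/item/description" />
    </entity>
  </document>
</dataConfig>
```

URLDataSource and XPathEntityProcessor are the DIH classes the 3.6 example-DIH setup uses; checking the Solr log during a full-import should show whether the feed was fetched at all, which separates XPath problems from network problems.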